KISTI Institutional Repository: Domain Independent Recognition of Multi-word Technology Using Web Search

This item is licensed under the Korea Open Government License.

Title: Domain Independent Recognition of Multi-word Technology Using Web Search

Abstract: Multi-word terminology is generally recognized using statistical information obtained from a large number of documents. However, such approaches depend on domain information, so they are of limited use when they must be applied to a new domain or when statistical information is hard to extract from the domain. We have therefore developed a domain-independent multi-word terminology recognition system that is more effective than methods based on local statistical information. It uses dictionaries, syntactic features, and web search results, and excludes statistical information extracted from the target literature records. The proposed approach achieved an F-score of 80.9, a 6.4% improvement over C-value, a widely used related approach based on local domain frequencies.
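For readers unfamiliar with the C-value baseline named in the abstract, the following is a minimal sketch of its commonly stated form, not the implementation evaluated in the paper. It scores each candidate term by the logarithm of its length times its frequency, after subtracting the average frequency of the longer candidates that contain it. The function names `contains` and `c_value` and the toy frequencies are assumptions made for illustration only.

```python
import math


def contains(longer, shorter):
    """Return True if token tuple `shorter` occurs contiguously inside the strictly longer `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freq):
    """Score candidate terms with the simplified C-value formula.

    `freq` maps each candidate term (a tuple of tokens) to its frequency
    in the domain corpus. Single-word candidates score 0 because log2(1) = 0.
    """
    scores = {}
    for term, f in freq.items():
        # Frequencies of longer candidates in which this term is nested.
        nesting = [f_b for other, f_b in freq.items() if contains(other, term)]
        if nesting:
            # Discount occurrences that are explained by the longer terms.
            f = f - sum(nesting) / len(nesting)
        scores[term] = math.log2(len(term)) * f
    return scores


# Toy, hypothetical frequencies (not data from the paper).
candidates = {
    ("soft", "contact", "lens"): 5,
    ("contact", "lens"): 12,
    ("web", "search"): 8,
}
scores = c_value(candidates)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

Because every quantity here comes from counts within the corpus at hand, the score depends on having enough in-domain text, which is the limitation the abstract attributes to local-frequency methods.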

Keywords: Terminology Recognition; Text Mining; Machine Learning; Information Extraction

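KISTI National Science and Technology Data Division, Digital Curation Center, Data Standardization Team
Korea Institute of Science and Technology Information, 245 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
Tel 042) 869-1004, 1234 Fax 042) 869-1091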

The KISTI Institutional Repository was built through the National Library of Korea's OAK distribution project.