Multi-word terminology is generally recognized using statistical information obtained from a large number of documents. However, this kind of approach depends on domain information, so its usefulness is limited when it must be applied to a new domain or when statistical information is hard to extract from the domain. Hence, we have developed a domain-independent multi-word terminology recognition system, which is more effective than the methodology based on local statistical information. Instead of statistical information extracted from the target literature records, it uses dictionaries, syntactic features, and web search results. Compared with C-value, a widely used related approach based on local domain frequencies, the proposed approach achieved an F-score of 80.9, a 6.4% improvement.
Terminology Recognition; Text Mining; Machine Learning; Information Extraction
Information : an international interdisciplinary journal