KISTI Institutional Repository: TAKES: Two-step Approach for Knowledge Extraction in Biomedical Digital Libraries

KISTI Institutional Repository > 8. KISTI Publications > JISTaP Vol. 2, No. 1

This item is licensed under a Creative Commons License.

dc.contributor.author: Min Song

dc.date.accessioned: 2018-10-12T04:51:11Z

dc.date.available: 2018-10-12T04:51:11Z

dc.date.issued: 2014-03-30

dc.identifier.issn: 2287-4577

dc.identifier.uri: https://repository.kisti.re.kr/handle/10580/8646

dc.description.abstract: This paper proposes a novel knowledge extraction system, TAKES (Two-step Approach for Knowledge Extraction System), which integrates techniques from Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP). In particular, TAKES adopts a novel keyphrase extraction-based query expansion technique to collect promising documents, and it uses a Conditional Random Field (CRF)-based machine learning technique to extract important biological entities and relations. TAKES is applied to biological knowledge extraction, specifically to retrieving promising documents that contain Protein-Protein Interactions (PPI) and extracting PPI pairs from them. TAKES consists of two major components: DocSpotter, which queries and retrieves promising documents for extraction, and FCRF, a CRF-based entity extraction component. The paper investigates research problems that arise in building a knowledge extraction system and reports a series of experiments conducted to test the author's hypotheses. The findings from the experiments are as follows. First, using three different test collections to measure the performance of the query expansion technique, the author verified that DocSpotter is robust and highly accurate compared to Okapi BM25 and SLIPPER. Second, the author verified that the relation extraction algorithm, FCRF, is highly accurate in terms of F-Measure compared to four other competitive extraction algorithms: Support Vector Machine, Maximum Entropy, Single POS HMM, and Rapier.
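
The abstract reports extraction accuracy in terms of F-Measure. As a rough illustration of how such a score could be computed over extracted PPI pairs, the following is a minimal sketch, not the paper's evaluation code. It assumes that a pair counts as correct only on an exact match with a gold-standard pair, that pairs are unordered, and that the balanced F1 variant (beta = 1) is intended. The function name `f_measure` and the protein labels `P1` to `P6` are hypothetical placeholders.

```python
def f_measure(predicted, gold, beta=1.0):
    """Return (precision, recall, F-beta) over sets of protein pairs.

    Assumption: (A, B) and (B, A) denote the same interaction, so each
    pair is stored as a frozenset before comparison.
    """
    pred = {frozenset(pair) for pair in predicted}
    ref = {frozenset(pair) for pair in gold}

    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0

    # Avoid division by zero when nothing was extracted correctly.
    if precision + recall == 0:
        return precision, recall, 0.0

    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


# Toy data with placeholder protein labels (not taken from the paper).
gold_pairs = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]
extracted_pairs = [("P2", "P1"), ("P3", "P4"), ("P3", "P5"), ("P1", "P6")]

precision, recall, f1 = f_measure(extracted_pairs, gold_pairs)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy case two of the four extracted pairs match the gold standard, giving a precision of 0.5, a recall of 2/3, and an F1 of 4/7 (about 0.571).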