This item is licensed under the Korea Open Government License
Title
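Development of Intelligent Technologies for Detecting and Tracking National and Social Issues Based on Ultra-High-Performance Computing (초고성능컴퓨팅 기반 지능형 국가∙사회 이슈 탐지∙추적 기술 개발)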
Alternative Title
Development of Intelligent Horizon Scanning Technologies
Publisher
한국과학기술정보연구원 Korea Institute of Science and Technology Information
Publication Year
2018-01
Description
Funder: Ministry of Science and ICT
Abstract
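This project aims to secure technologies, based on ultra-high-performance computing and artificial intelligence, that can continuously and intelligently detect and track issues that may pose a threat or an opportunity to the nation and society, and to use these technologies to identify issue-solving research projects and support decision-making. To this end, the following research was carried out in the current year.
□ Data-driven issue analysis modeling
○ Collection and analysis of the requirements of issue analysis users
○ Derivation of usage scenarios for issue detection and tracking technologies
○ Domain knowledge-based modeling of infectious disease issue analysis
□ Research on deep learning-based issue mining technologies
○ Research on deep learning-based issue entity mining
○ Research on deep learning-based issue event mining
○ Construction of issue mining training data for the infectious disease domain
○ Research on techniques for extending issue mining training data
(Source: report abstract, p. 5)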
Ⅳ. Study results
■ Building a data-driven issue analysis model
○ Collection and analysis of issue analysts’ requirements
- We organized a group of experts from industry and academia to gather their requirements. We also held workshops open to the public (four times a year)
- We collected various knowledge and experience from issue domain experts, analysts, and field managers, and then derived functional requirements for infectious disease issue detection and analysis
- To detect and track emerging issues related to infectious diseases that may be brought into our country, we defined the types of data to be analyzed and how to obtain them
○ Deriving application scenarios for issue detection and tracking techniques
- We developed detailed scenarios that incorporate the main elements of the technology and reflect the experts' requirements, improving understanding of the research tasks both internally and externally
○ Building a model for analysis of infectious diseases based on domain knowledge
- We surveyed two real-world cases, HSC and RAHS, which have implemented and run horizon scanning tasks for years
- Factor analysis and causal analysis of the influx of foreign infectious diseases into our country
- Expert verification of the derived factors and causal mappings
- Development of an infectious disease detection and tracking model using the System Dynamics technique
- Verification of the model through simulation based on two actual cases, the MERS and Zika viruses
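As an illustration only, the stock-and-flow idea behind a System Dynamics model can be sketched as follows. The stocks, the flows, and every parameter value here (`import_rate`, `contact_rate`, `recovery_days`, `population`) are hypothetical and are not taken from the model described in the report.

```python
def simulate(days, import_rate=0.5, contact_rate=0.3, recovery_days=10.0,
             population=1_000_000, dt=1.0):
    """Euler-integrate a toy stock-and-flow model of imported infections.

    All parameter values are hypothetical illustration values.
    """
    susceptible = float(population)  # stock: people who can still be infected
    infected = 0.0                   # stock: currently infected people
    recovered = 0.0                  # stock: people who have recovered
    history = []
    for _ in range(int(days / dt)):
        # Flows, expressed per day
        imported = import_rate  # cases arriving from abroad
        new_local = contact_rate * infected * susceptible / population
        recoveries = infected / recovery_days
        # Update each stock from its inflows and outflows
        susceptible -= new_local * dt
        infected += (imported + new_local - recoveries) * dt
        recovered += recoveries * dt
        history.append((susceptible, infected, recovered))
    return history


history = simulate(30)
print(history[-1])
```

Verifying such a model against a real case would mean comparing the simulated stocks with the observed case counts, which this sketch does not attempt.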
■ Development of deep learning-based issue mining techniques
○ Study on deep learning-based issue entity mining techniques
- Building various word embeddings from a very large corpus using the CBOW, Skip-gram, and GloVe algorithms
- Design and development of a named entity recognition model with deep learning techniques (reaching an F1 score of 89.84% on CoNLL-03 data, which is 98.06% of the current world-best score of 91.62%)
- Development of a solution for unseen words, i.e., words that do not appear in the corpora used to build the word embeddings. The solution achieved a 31.05% improvement over using simple word vectors alone
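The relative figure quoted for the named entity recognition model can be checked with a few lines of arithmetic. The function names below are illustrative only; the two scores are the ones reported above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_score(score, best):
    """Express a score as a percentage of a reference best score."""
    return 100.0 * score / best


ratio = relative_score(89.84, 91.62)
print(f"{ratio:.2f}%")  # prints "98.06%"
```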
○ Study on deep learning-based event mining techniques
- Survey of deep learning models and features suitable for event detection
- Preprocessing the TAC KBP 2015 training and evaluation datasets for training the deep learning model
- Development of an event trigger detection/classification model composed of multi-layered bidirectional LSTM RNNs whose inputs are sentences split into n-grams
- Performance analysis of the model with respect to various input unit sizes and types of models
- Building a word embedding database that efficiently provides various word embeddings for given words
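One possible reading of sentences split into n-grams is a fixed-size token window centred on each candidate trigger word. The sketch below follows that reading; the windowing scheme, the padding token, and the function name are assumptions and may differ from the preprocessing actually used in the project.

```python
def ngram_windows(tokens, n=3, pad="<PAD>"):
    """Return one n-token window centred on each token (n must be odd)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    half = n // 2
    # Pad both ends so that the first and last tokens also get full windows
    padded = [pad] * half + list(tokens) + [pad] * half
    return [padded[i:i + n] for i in range(len(tokens))]


sentence = "Officials confirmed a new outbreak".split()
for word, window in zip(sentence, ngram_windows(sentence, n=3)):
    print(word, window)
```

Each window would then be mapped to word embeddings and passed to the sequence model, and varying `n` corresponds to varying the input unit size.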
○ Building a training dataset for issues in the infectious disease domain
- We defined an annotation guideline, along with event and entity classes, for information about infectious diseases, referring to existing dataset guidelines including the ACE and TAC KBP guidelines.
- We collected and preprocessed news articles and ProMed email threads containing information about infectious diseases
- By following the guideline we defined, we also tagged the text dataset to identify event information related to infectious diseases
○ Study on extending the training dataset for issue mining tasks
- We proposed a rule-based conversion scheme for character data, which takes discrete values, in contrast to image data, which has continuous numerical ranges
- We designed an insertion module that can insert words related to the context by utilizing word associations
- We designed a replacement module that can replace words with similar, contextually related words by using semantic language resources such as word embeddings
- We used syntactic language resources such as a parser to delete or move words that do not play an important role in the context
- We compared the performance of the extended data on the named entity recognition task
(Source: SUMMARY, p. 14)
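The insertion, replacement, and deletion operations listed above for extending training data can be sketched in a simplified form. The lookup tables stand in for the word associations, word embeddings, and parser output mentioned in the summary; they and the function names are hypothetical and do not reproduce the project's modules.

```python
def replace_words(tokens, similar):
    """Swap each token for a listed similar word, if one exists."""
    return [similar.get(t, t) for t in tokens]


def insert_words(tokens, associated):
    """Insert an associated word just before each token that has one."""
    out = []
    for t in tokens:
        if t in associated:
            out.append(associated[t])
        out.append(t)
    return out


def delete_words(tokens, removable):
    """Drop tokens judged unimportant to the context."""
    return [t for t in tokens if t not in removable]


tokens = "the virus spread quickly".split()
print(replace_words(tokens, {"spread": "circulated"}))
print(insert_words(tokens, {"virus": "deadly"}))
print(delete_words(tokens, {"quickly"}))
```

In a named entity recognition setting, the entity labels would also have to be kept aligned with the modified tokens, which this sketch omits.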
Keyword
horizon scanning; issue detection; issue tracking; issue analysis; weak signal detection; machine learning; text mining