Park, Jin-Seo; Kim, Eun-Jin; Kim, Jae-Seong; Park, Gyeong-Seok; Park, Dong-Un; Park, Seon-Yeong; So, Dae-Seop; Lee, Dong-Ho; Lee, Yong-Ho; Lee, Jun-Yeong; Ju, Won-Gyun
Korea Institute of Science and Technology Information
funder : Ministry of Science, ICT and Future Planning funder : KA agency : Korea Institute of Science and Technology Information
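□ Need for the Research
○ The importance of evidence-based decision-making using data has come to the fore
○ The biggest obstacle in data utilization strategies for evidence-based decision-making is 'visualization'
○ A notable point of the U.S. Big Data Research and Development Initiative is that 'visualization' is the most important technical element in creating value from big data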
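□ Purpose of the Research
○ Establish a methodology for mapping science and technology issues and controversies to support evidence-based decision-making
○ Examine the validity of network visualization using unstructured data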
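□ Main Research Content
○ Theoretical background of text analysis
∙ Natural language processing and semantic networks
∙ ANT (Actor-Network Theory) and co-word mapping
○ Types of scientific controversies and implications of controversy studies
∙ Review of previous studies on scientific controversies
∙ Scientific controversies and approaches: 1) positivist approach, 2) group politics, 3) constructivism, 4) social structure
∙ Background of the emergence of scientific controversies and issues
○ Application of co-word mapping and visualization methodology
∙ Application cases of co-word mapping
∙ Similarity measurement and community detection
∙ Co-word mapping visualization
∙ Visualization tools
∙ Changes in the radioactivity issue in newspaper articles
∙ Modelling of changes in the social issues of science and technology
∙ Stem cells and carcinogens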
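□ Significance of the Research
○ Identified, through interdisciplinary research, the major research issues required for the future analysis of unstructured big data
○ Science and technology issues and controversies have so far been the research domain of experts in the relevant fields, but this project confirmed the possibility of building a real-time issue monitoring system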
Ⅲ. Major Study Results
□ Theoretical Background of Text Analysis
○ Natural language processing and semantic networks
- Outline of natural language processing
∙ The accuracy of Korean natural language processing needs to be improved, because it is the technical infrastructure for the analysis of issues and controversies. This covers text data preprocessing, word spacing, typo correction, handling of allomorphs with the same meaning, and morpheme analysis.
∙ After the preliminary morpheme analysis, the major key words need to be reviewed by specialists to raise the reliability of the morpheme analysis results.
- Core concepts of the semantic network
○ ANT (Actor-Network Theory) and Co-word Mapping
- Reviewed the major literature of the ANT (Actor-Network Theory) research group that proposed co-word mapping.
- Although it stands within the tradition of text analysis, co-word analysis rests on a new principle. Rather than belonging only to quantitative methodology, it can create synergy by accommodating both qualitative and quantitative methodologies.
- Key implications
∙ Most existing studies focus on visualizing and interpreting changes in scientific fields based on scientific papers (and patents), and on making predictions from them.
∙ Such efforts can be extended to the analysis of diverse social controversies, including those over scientific outcomes.
∙ If controversies can be visualized, their current status can be understood and their evolution anticipated, which would contribute significantly to reducing social costs and to understanding the nature of controversies.
□ Types of Scientific Controversies and Implications of Controversy Studies
○ Review of the previous studies on scientific controversies
○ Scientific controversies and approaches: 1) positivist approach, 2) group politics, 3) constructivism, 4) social structure
○ Background of the emergence of scientific controversies and issues
- A natural phenomenon in the age of post-normal science, which is characterized by uncertainty
□ Application of Co-word Mapping and Visualization Methods
○ Application Cases of Co-word Mapping
- To make better use of issue and controversy mapping in policy, this study first reviewed existing papers that used issue and controversy mapping in the major social science journals (PUSS, SSS, etc.).
○ Similarity Measurement and Community Detection
- The similarity measure is important because it must be chosen before communities (issues, clusters, etc.) can be detected through network visualization or statistical techniques.
- The concept of community structure is used to capture the tendency of nodes to be organized into modules (communities, clusters, topics, groups, etc.). This means, for example, that members within a community are more similar to each other than to members outside it.
- A community generally corresponds to a cluster of nodes in a social network graph, and detecting it assumes that the graph is sparse. In other words, the fewer the connecting lines, the easier the detection.
- If the graph is too dense, cluster detection using structural attributes becomes meaningless. This is where the difficulty in co-word mapping lies, because co-word maps are generally very dense.
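The points above on similarity, density, and community detection can be illustrated with a minimal sketch. It is not the study's procedure: the documents are toy data, the normalization `c_ij / (s_i * s_j)` is one common form of association strength (assumed here, with `s_i` taken as the number of documents containing word i), the threshold of 0.25 is arbitrary, and connected components serve only as a crude stand-in for a real community detection algorithm.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count documents containing each word and each unordered word pair."""
    occ, pair = Counter(), Counter()
    for words in docs:
        unique = sorted(set(words))
        occ.update(unique)
        pair.update(combinations(unique, 2))
    return occ, pair


def association_strength(occ, pair):
    """Normalize co-occurrence counts as c_ij / (s_i * s_j) (assumed form)."""
    return {(a, b): c / (occ[a] * occ[b]) for (a, b), c in pair.items()}


def density(n_nodes, n_edges):
    """Share of all possible undirected edges that are present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return n_edges / possible if possible else 0.0


def components(nodes, edges):
    """Connected components, a crude stand-in for communities."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Toy documents (hypothetical keywords, not data from the study).
docs = [
    ["radiation", "food", "safety"],
    ["radiation", "food", "import"],
    ["radiation", "reactor", "accident"],
    ["reactor", "accident", "evacuation"],
    ["radiation", "safety", "reactor"],
]
occ, pair = cooccurrence(docs)
strength = association_strength(occ, pair)
raw_density = density(len(occ), len(pair))

# Sparsify: keep only pairs whose normalized similarity reaches the threshold.
threshold = 0.25
kept = [p for p, s in strength.items() if s >= threshold]
sparse_density = density(len(occ), len(kept))
groups = components(occ.keys(), kept)
print(f"density before: {raw_density:.2f}, after: {sparse_density:.2f}")
print([sorted(g) for g in groups])
```

On raw co-occurrence counts the high-frequency word links almost everything together. Normalizing and thresholding removes the weakest links, after which the toy graph separates into two groups. The choice of similarity measure and threshold changes which groups appear, which is why the measure has to be chosen with care.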
○ Co-word Mapping Visualization
- van Eck & Waltman (2010) categorized the maps used in bibliometric analysis into distance-based maps and graph-based maps.
- Leydesdorff (2014) divided visualization broadly into Latent Semantic Analysis (LSA) and Social Network Analysis (SNA).
- While LSA concentrates on the meanings latent in text data, SNA focuses on the network of observable relations. In other words, LSA visualizes what is hidden in the structure, and SNA visualizes the network of observable relations.
○ Visualization Tools
- Review of major literature
∙ Social Network Analysis Tools In Communication Systems and Network Technologies: compared NetworkX, Gephi, Pajek, igraph.
∙ A Comparative study of social network analysis tools: compared GraphViz, Tulip, UCInet, JUNG, GUESS.
∙ Recent Large Graph Visualization Tools - A Review: compared MultiNet, NetMiner 3, ORA, Pajek, statnet/sna, UCINET + NetDraw, etc.
□ Case Study
○ Changes in the radioactivity issue in newspaper articles
- Data: newspaper articles containing the word 'radioactivity', collected from the Internet search portal Naver for the period from Jan. 1, 2010 to Dec. 31, 2012 (24,802 articles in total)
- Purpose of analysis: to track changes in the radioactivity issue in media articles before and after the Fukushima nuclear accident
- To trace the changes, the search period was divided into 1) the year before the accident (2010), 2) six months before the accident (Sep. 2010-Feb. 2011), 3) immediately after the accident (Mar.-May 2011), 4) six months after the accident (Jun.-Nov. 2011), and 5) the year following the accident (2012)
- Word extraction and selection of the words for analysis
∙ 460,204 words extracted through the analysis of morphemes
∙ 1st selection of words: Out of common nouns (nc), proper nouns (nq), common nouns denoting movement (nca), and common nouns denoting status (ncs), 800 words were selected based on the frequency and timeliness and co-word mapping was conducted.
∙ 2nd selection of words: based on the co-word mapping results of the words from the 1st selection, 450 words were selected and mapping was conducted.
∙ Selection of the final words: based on the co-word mapping results of the words from the 2nd selection, synonyms were processed and important words were re-selected, resulting in a total of 191 keywords for analysis.
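The period split and the frequency-based word selection described above can be sketched as follows. This is an illustration under stated assumptions: the exact start and end days of each period are assumed from the months given, the tokens and the synonym table are toy data, the tag `pv` is a hypothetical non-noun tag, and only the frequency criterion is modelled (the timeliness criterion and the review of mapping results are not).

```python
from collections import Counter
from datetime import date

# Period boundaries follow the text; the exact days are assumptions.
PERIODS = {
    "year before (2010)": (date(2010, 1, 1), date(2010, 12, 31)),
    "six months before": (date(2010, 9, 1), date(2011, 2, 28)),
    "immediately after": (date(2011, 3, 1), date(2011, 5, 31)),
    "six months after": (date(2011, 6, 1), date(2011, 11, 30)),
    "following year (2012)": (date(2012, 1, 1), date(2012, 12, 31)),
}

# Part-of-speech tags kept for analysis, as listed in the text.
NOUN_TAGS = {"nc", "nq", "nca", "ncs"}


def periods_for(day):
    """Return every period containing the date (periods 1 and 2 overlap)."""
    return [name for name, (start, end) in PERIODS.items() if start <= day <= end]


def select_words(tagged_tokens, synonyms, top_n):
    """Keep noun tags, merge synonyms, and return the top_n most frequent words."""
    counts = Counter(
        synonyms.get(word, word)
        for word, tag in tagged_tokens
        if tag in NOUN_TAGS
    )
    return [word for word, _ in counts.most_common(top_n)]


# Toy morpheme-analysis output as (word, tag) pairs; not data from the study.
tokens = [
    ("radioactivity", "nc"),
    ("Fukushima", "nq"),
    ("radiation", "nc"),
    ("leak", "nca"),
    ("spread", "pv"),
    ("Fukushima", "nq"),
    ("radioactivity", "nc"),
]
synonyms = {"radiation": "radioactivity"}  # hypothetical synonym table
selected = select_words(tokens, synonyms, top_n=2)

print(periods_for(date(2010, 10, 15)))
print(selected)
```

Because periods 1 and 2 overlap, `periods_for` returns a list rather than a single label. Under the assumed boundaries, an article dated December 2011 falls into none of the five periods.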
○ Modelling of changes in the social issues of science and technology
- To find out whether changes in the social issues of science and technology can be modelled in advance, the study analysed the process consisting of word extraction → keyword mapping → issue verification → changes in issues.
- Major results of the study
∙ Unlike a typical social network, the degree distribution of the nodes in the co-word network does not follow a power law, while the weight distribution of the edges shows a typical power law.
∙ The co-word map was visualized with VOSviewer, which uses association strength as its similarity measure, and the major issues of each area were verified to cluster by period.
∙ Where issues (communities, modules, etc.) can be verified, changes in issues were judged to be better schematized on the distance-based map than on the strategic map frequently used in co-word mapping. The schematization is based on changes in location seen from two perspectives: how centrally the issue is located in relation to other issues, and how strong its exposure is.
∙ Central or peripheral issues: location in the distance-based map (center of the issue ≈ clue to the solution)
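The contrast between the degree distribution and the edge-weight distribution reported above can be tabulated with a short sketch. The edge list is toy data, and the log-log least-squares slope is only a crude first look at power-law behaviour, not a proper power-law fit and not the method used in the study.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each degree to the number of nodes having that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def weight_distribution(edges):
    """Map each edge weight to the number of edges having that weight."""
    return Counter(edges.values())


def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(value); None if undefined."""
    if len(distribution) < 2:
        return None
    xs = [math.log(value) for value in distribution]
    ys = [math.log(count) for count in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy weighted co-word edges (hypothetical): every word co-occurs with every
# other word, but most pairs co-occur only once.
edges = {
    ("a", "b"): 1,
    ("a", "c"): 1,
    ("a", "d"): 1,
    ("b", "c"): 1,
    ("b", "d"): 2,
    ("c", "d"): 4,
}

print(degree_distribution(edges))
print(weight_distribution(edges))
print(loglog_slope(weight_distribution(edges)))
```

In this toy graph every node has the same degree, so the degree distribution carries no information, while the edge weights are skewed toward low values. That is the qualitative pattern the study describes for dense co-word networks.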
○ Stem Cells and Carcinogens
- Stem cells
∙ Analysis of newspaper articles across the entire period showed that stem cell issues were covered evenly. Words such as Hwang Woo-Suk, ovum, and embryonic stem cell appeared frequently, presumably reflecting the heightened public interest in stem cells prompted by the Hwang Woo-Suk research scandal since 2005.
∙ The mapping analysis also clearly showed that the heightened interest in stem cells after the Hwang Woo-Suk scandal extended to the bio market, biological treatment, new drugs, etc. The newspaper articles also covered the controversies surrounding the roles of the president and political circles while Hwang Woo-Suk rose to the position of a national star scientist.
∙ Meanwhile, Agora, a community forum of the search portal Daum, differed somewhat in that most of its key words centered on patents and stem cell research. The word 'patent' was closely related to key words such as 'economic value', 'embryonic stem cell drug', and 'expectations', which points to strong public interest in the practical benefits that stem cells might deliver.
∙ Internet cafes and blogs stood out the most. Co-word mapping showed that blogs frequently carried very specific key words such as treatment, adult stem cell, stem cell drug, and cosmetic product, and that cafes and blogs were closely related.
- Carcinogens
∙ In newspapers, word groups such as food product-consumer, cancer-tobacco, asbestos-local resident-building, and environment-government-standards-dioxin appeared frequently around the central concept of carcinogenic materials.
∙ Agora, however, showed words related to social issues and to responsibility for events, such as US-food-BSE, Japan-environment-accident, people-government-media, and public servant.
∙ Cafes and blogs seemed to take a middle way between newspapers and Agora, carrying word groups such as cancer-food (saccharin, sweetener, meat), food-atopy research, US-environment-BSE, chemicals-air pollutant-water, and tobacco-smoking. Posts in cafes and blogs seemed to be more in-depth and to touch on issues of personal interest.
□ Significance of the Study
○ Identified, through interdisciplinary research, the major research issues required for the future analysis of unstructured big data
- Attributes of structured and unstructured data, differences in results depending on the similarity measure and the community detection algorithm, differences between LSA and SNA in the analysis of arguments, etc.
- In the analysis of unstructured big data, few interdisciplinary studies have been conducted because of the differences in perception between the technology-based approach and the analysis-based approach. This study helped identify the possibility of KISTI-led R&D projects.
- Analytical capability has been built across the entire process: data collection ⇒ morpheme extraction through natural language processing ⇒ selection of the words for analysis ⇒ matrix creation and similarity measurement ⇒ visualization ⇒ issue identification and verification ⇒ monitoring of changes in issues.
○ Science and technology issues and controversies have been the exclusive territory of experts in the relevant fields, but this study showed that a system capable of real-time issue monitoring may be feasible.
- For example, in the case of stem cells, issues such as stem-cell plastic surgery and stem-cell cosmetic products were found to be emerging in the field in addition to the issue of ethics.
- In the future, by raising the accuracy of morpheme analysis so that types of issues and arguments can be categorized, the study is expected to contribute to visualizing the relations among those leading the issues and arguments, as well as to identifying changes in the issues and arguments.
Network Visualization; Co-word Mapping; Stem Cell; Carcinogen; Actor-network Theory