This item is licensed under the Korea Open Government License
기계학습을 활용한 중소기업 R&D 정보지원 수요 기업 예측 모형 연구
A Study on Demand Predictive Model for SME R&D Information Supporting
Korea Institute of Science and Technology Information
Funder: Ministry of Science and ICT
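As the title indicates, this study sought a data-driven service method for addressing SMEs' demand for R&D information support. To this end, the demand for R&D information support was first divided according to the three information support functions of the Technology Commercialization Analysis Center: R&D planning, technology valuation, and technology (market) information support. In addition, to develop a consulting function (prescriptive service) for SMEs, work was carried out on a recommendation algorithm for SME joint research strategies. Specifically, the study takes an early step toward identifying demand for joint research partner matching, with the alternatives divided into five: universities, government-funded research institutes, large enterprises, SMEs, and private research institutes. Meanwhile, to understand the SMEs that receive information support, clusters of potential demand groups were identified. For this purpose, an additional survey of ASTI companies was used to compare the characteristics of SMEs performing R&D with those of SMEs actually receiving services, thereby validating the cluster analysis results and confirming the generalizability of the findings.
(Source: report abstract, p. 5)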
II. Necessity of Research Project
□ Various support systems and measures are needed for continuous growth and fostering of small and medium-sized enterprises.
○ Compared with large companies, SMEs lack the decision support information and planning capability needed at each growth stage, and they urgently need to develop new businesses and enter global markets in order to grow sustainably.
○ In SME R&D and commercialization, acquiring technology and market information and planning around it takes a great deal of cost and time, so a support system is needed.
○ Drawing on KISTI's information and its analysis and evaluation capabilities, we can play a significant role by proposing measures that make government support for the continuous growth of SMEs more efficient.
□ Technologies that use machine learning to extract insight from data and analyze big data are spreading.
○ Machine learning has long been used to analyze the varied data generated in real time in stock investment and capital markets.
○ In recent years, big data-based machine learning analysis has been used in government to select work supervision sites, revitalize local tourism, and forecast farm productivity from weather data.
○ Decision tree analysis is one of the simplest yet most powerful machine learning (supervised learning) methods and is used in various fields through general-purpose software.
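To illustrate why decision trees are regarded as simple, the core step of growing a tree (choosing the split that best separates the classes) can be sketched in a few lines of Python. This is a minimal sketch and not the model used in the study: the Gini impurity criterion, the function names `gini` and `best_split`, and the toy firm data are all assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    A row goes to the left child when row[feature_index] <= threshold.
    """
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send rows to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical toy data: (number of R&D staff, years in business) per firm,
# labeled by whether the firm requested information support.
rows = [(2, 3), (3, 10), (8, 4), (12, 7), (15, 2), (1, 12)]
labels = ["no", "no", "yes", "yes", "yes", "no"]

split = best_split(rows, labels)
print(split)  # feature index, threshold, weighted impurity of the chosen split
```

A full tree repeats this search recursively on each child node until the nodes are pure or a stopping rule is reached; general-purpose packages add pruning and handling of categorical variables on top of the same idea.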
□ KISTI needs a deep and objective understanding of the SMEs that receive its information support (technology, market, planning, M&S, valuation, etc.).
○ Until now, the information support demand and characteristics of SMEs have been identified for ASTI companies through short self-report surveys or interviews.
○ Now that more usable information is available, thanks in part to Government 3.0, analyzing diverse data makes it possible to identify SMEs with high demand for information support by uncovering new relationships, patterns, and trends.
○ By predicting which companies demand R&D information support, data mining (profiling) can be used to predict customer churn, target product (service) offerings, and recommend products (services).
○ Identifying companies with high demand for information support makes it possible to concentrate customer management on them and provide services efficiently. A better understanding of high-demand companies should also lead to greater satisfaction (performance).
□ The results can provide new reference information for evaluation in government or KISTI R&D support projects and for selecting companies for valuation support (development of new business or areas).
○ The need for objective and transparent execution of the national budget is increasing, and overseas it is already mandatory to use a variety of information, in addition to expert judgment, when making budget support decisions.
○ The quantitative information used in selecting companies for existing government support projects was mostly financial, and the rest was collected as self-reported information. However, this information has influenced experts' judgments subjectively rather than serving as an objective basis for decisions.
○ By using data mining (profiling), we can provide reference indicators for evaluating and selecting companies by analyzing an applicant's suitability for a support project, detecting fraud (abnormal cases), and estimating the likelihood of success.
○ In this study, we propose a method for profiling companies with high demand for information support and for applying those profiles to newly supported companies (i.e., a methodology that predicts whether a new company is a high-demand enterprise for information support, or which type of information support is appropriate for it).
III. Scope and contents of R&D
□ Research Purpose
○ The purpose of this study is to identify the policy needs of government-funded research institutes and SMEs regarding technological cooperation, and to identify the groups of SMEs that need technical cooperation with government-funded research institutes. In addition, from the viewpoint of the government-funded research institutes, we seek ways to identify companies that are highly likely to cooperate with them on technology, as well as companies that need R&D information support, and thereby contribute to more advanced and efficient implementation of related science and technology policy.
□ Main contents of research
○ In this study, we sought a data-based service method for the R&D information support needs of SMEs. To do this, we first divided the demand for R&D information support according to our center's three support functions: R&D planning, technology valuation, and technology (market) information support.
- In addition, we worked on a recommendation algorithm for SME joint research strategies in order to develop a consulting function (prescriptive service) for SMEs.
○ Next, we identified clusters of potential demand groups in order to understand the SMEs that receive information support.
- Through an additional survey of ASTI companies, we validated the cluster analysis results by comparing the characteristics of SMEs performing R&D with those of SMEs actually receiving services.
□ Research Methodology
○ The research followed the standard data mining implementation process proposed by IBM.
1) Business Understanding. Understand the business. For example, basic knowledge of a field such as insurance, credit cards, or distribution should be acquired through reference materials and communication with the people in charge, and the problems that data mining can address should be identified. Accordingly, this study reviews previous research in order to understand KISTI's services and the government's SME support policy.
2) Data Understanding. Understand the data that the organization holds and manages: the number of records, the types of variables, the quality of the data values, and the data management system. Because there are often hundreds of variables and the data are scattered, it takes time to understand even one organization's information system accurately. In this study, we understand the data through a basic analysis of the SME technical statistics survey results of the Small and Medium Business Administration (SMBA).
3) Data Preparation. Data are gathered and cleaned to bring them into an analyzable state. For example, customer names, addresses, and telephone numbers are arranged in one standard form. If the company's DB consists of a customer DB, a product DB, and a transaction DB, the required information must be pulled from all three into a single data set. This process often requires data maintenance; for example, dates recorded in various forms should be unified into one format. This step requires a lot of effort and usually takes more than 50% of the total project schedule. The variables needed for analysis must also be created. For example, if the total transaction value for each customer is needed, the individual transaction records must be aggregated by customer. In this study, we consolidate the survey results for the 10,000 cases collected over the last 6 years and prepare the data by combining the variables used in each survey.
4) Modeling. Carry out the necessary modeling, including data description and exploration. This includes unsupervised learning, such as clustering and association analysis, and supervised learning, such as neural networks and tree-structured models such as decision trees. In this study, we profiled related variables and companies using both supervised and unsupervised learning.
5) Evaluation. Evaluate whether the model generated at the modeling stage can be interpreted well and whether it is reproducible when applied to independent new data. In this study, the modeling results can be applied to ASTI companies that benefited from mentoring services or to companies that benefited from R&D planning support projects of the Small and Medium Business Administration.
6) Deployment. The step of applying the reviewed model to actual business. For example, a churn probability score is calculated for every customer and sent to the customer managers so that they can take the necessary measures. This study presents the model built on all of the analyzed variables as the full model. In addition, as a reduced model, it examines the possibility of further developing the ASTI enterprise database through additional modeling with the variables available there.
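The two Data Preparation examples given in step 3 (unifying dates written in different forms, and bundling transaction records into a total per customer) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the list of date layouts in `DATE_FORMATS`, the field names `customer_id`, `date`, and `amount`, and the sample records are hypothetical and do not come from the survey data used in the study.

```python
from collections import defaultdict
from datetime import datetime

# Date layouts assumed to occur in the raw records (an illustrative guess).
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y", "%Y%m%d")


def normalize_date(text):
    """Parse a date written in any assumed layout and return ISO 'YYYY-MM-DD'."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"unrecognized date format: {text!r}")


def total_by_customer(transactions):
    """Sum the 'amount' field of the transaction records for each customer."""
    totals = defaultdict(int)
    for record in transactions:
        totals[record["customer_id"]] += record["amount"]
    return dict(totals)


# Hypothetical transaction records with inconsistent date layouts.
transactions = [
    {"customer_id": "C001", "date": "2017-01-05", "amount": 120},
    {"customer_id": "C002", "date": "2017/3/9", "amount": 80},
    {"customer_id": "C001", "date": "05.02.2017", "amount": 30},
    {"customer_id": "C002", "date": "20170410", "amount": 45},
]

# Unify the dates first, then derive the per-customer variable.
cleaned = [dict(record, date=normalize_date(record["date"])) for record in transactions]
totals = total_by_customer(cleaned)

print([record["date"] for record in cleaned])
print(totals)
```

Raising an error on an unknown layout, rather than guessing, keeps malformed records visible during cleaning; in practice the list of accepted layouts would be built up while exploring the data in step 2.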
□ Research process
○ 10,000 cases related to SME R&D activities
- Basic analysis of SME technical statistics survey data for 2011-2015 (Small and Medium Business Administration survey)
- Data exploration and understanding (Step 2)
- Data cleaning and preparation of analysis variables (Step 3)
○ Profiling of companies demanding R&D through data mining
- Business understanding: review previous research and examine output variables in order to understand the government's SME support policy and KISTI's services (Step 1)
- Perform the necessary modeling (profiling, clustering, etc.), including data description and exploration (Step 4): combine supervised learning with unsupervised learning (association analysis, clustering, etc.), use ensemble models, develop both an accuracy (stability) model and an economical model, and use commercial software (SPSS Statistics, Modeler)
- Review and evaluate the reproducibility of the model generated at the modeling stage (Step 5)
○ Validation of results and linkage to project
- Verify that the reviewed modeling results can be developed into real projects (R&D support projects, technology valuation, etc.)
- Derive service scenarios for extending the modeling results (Step 6)
- Devise a way to link the results to support for ASTI companies (systematization)
○ Comparison of demand company groups for generalization
- Cluster analysis of R&D SMEs
- Usage survey of ASTI companies
- Comparison of R&D SMEs and ASTI companies
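The cluster comparison outlined in the last item above might be approached as in the following sketch: cluster one group of firms, assign the second group to the nearest cluster, and compare how the two groups are distributed. Everything specific here is an assumption for illustration, including the choice of k-means, the two features, the function names `kmeans` and `cluster_shares`, and the toy values; the report itself states only that cluster analysis was performed with commercial software.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of numbers; returns (centroids, assignments).

    Initial centroids are the first k points, which keeps the run deterministic
    (an illustrative simplification; random restarts are more usual).
    """
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignments


def cluster_shares(points, centroids):
    """Share of points whose nearest centroid is each cluster."""
    counts = [0] * len(centroids)
    for p in points:
        nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        counts[nearest] += 1
    return [n / len(points) for n in counts]


# Hypothetical features per firm on an arbitrary scale, e.g. (R&D intensity, export share).
rd_smes = [(1.0, 2.0), (8.0, 9.0), (1.5, 1.8), (9.0, 8.5), (1.2, 2.2), (8.5, 9.5)]
asti_firms = [(1.1, 2.1), (8.8, 9.1), (9.2, 8.9), (8.4, 9.3)]

centroids, assignments = kmeans(rd_smes, k=2)
rd_shares = cluster_shares(rd_smes, centroids)
asti_shares = cluster_shares(asti_firms, centroids)
print(rd_shares, asti_shares)
```

If the two groups fall into the clusters in similar proportions, that would support generalizing results from one group to the other; a large difference would suggest that the service recipients are a distinct subgroup.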
IV. R&D achievements
□ Qualitative achievements
○ In terms of technology, laying a theoretical basis for building objective decision support information for efficient SME R&D
- We present a way of deriving theory and hypotheses through an inductive research methodology, departing from the existing deductive approach
- Using machine learning or artificial intelligence can strengthen research capability on SMEs and on methods of supporting their R&D.
- This work enhanced KISTI’s research capability and position in data-based SME R&D support
○ Contributing to the efficiency and advancement of government-funded research institute administration in terms of economy and industry.
- Reduce the time invested in finding new SMEs when responding to SME support policy
- Provide customized services to the SMEs that need them, or use the results for target marketing of policy projects, to increase the efficiency of R&D investment and support
- The profiling and machine learning-based statistical approach presented in this study provides more objective decision support information than the existing administrative method of finding and selecting support companies based on experts or performance.
- Therefore, it is expected to improve administrative efficiency by raising the success rate of policy projects.
○ Establishing the rationale for building objective decision support information for efficient SME R&D.
- Contributing to enhancement of R&D success rate and shortening development period by suggesting optimal R&D support policy or investment direction for SMEs
□ Quantitative achievements
○ 10,000 cases of SME R&D innovation
○ Deriving more than 5 profiles (8 derived through linkage)
○ More than 2 articles in domestic and international journals (including one article published in a top 10% SSCI journal)
○ Two papers in domestic journals
V. Plan to utilize R&D results
□ Find new research projects
○ Provide a basis for winning new research projects from the SMBA or STEPI (enhancement of research capacity and external verification through publications and paper submissions)
□ Use in the main project in 2018 or in outsourced projects
○ Based on the results of the 2017 research, MyKISTI will develop diagnosis and recommendation services, and a project to provide those services will then be proposed to the Small and Medium Business Administration.
○ Consider carrying out in-house survey data collection through major projects in 2019
○ Provide profiling services by extending linkage to suitability assessment of implementing bodies for technology valuation, ASTI, etc.
(Source: SUMMARY, p. 13)
SMEs; R&D; Machine Learning; Profile; Data Mining; Predictive model