Background: Influenza continues to pose a serious threat to human health worldwide, so detecting patterns of influenza infection is critical. However, because influenza epidemics spread sporadically and rapidly, it is difficult to estimate how infection levels will vary in the future. Accumulating influenza-related data is also difficult, because few types of data are associated with influenza. For these reasons, identifying useful data and building a prediction model on them are necessary steps toward predicting whether the number of patients will increase or decrease. Numerous press releases that reflect current issues are published on the Internet every day.
Results: In this research, we collected Internet articles related to infectious diseases from the Centre for Health Protection (CHP), which is maintained by the Hong Kong Department of Health, to see whether news text data could be used to predict the spread of influenza. In total, 7769 articles related to infectious diseases, published from January 2004 to January 2018, were collected. We evaluated the predictive ability of article text data from the period 2013–2018 for each of the weekly time horizons. A support vector machine (SVM) model was used for prediction, to examine whether the information embedded in the web articles could be used to detect patterns of variation in influenza spread. Prediction using news text data with the SVM achieved a mean accuracy of 86.7% in predicting whether the weekly influenza-like illness (ILI) patient ratio would increase or decrease, and a root mean square error of 0.611 in estimating the weekly ILI patient ratio.
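The two figures reported above, directional accuracy and root mean square error, can be illustrated with a minimal sketch. It assumes that a week counts as a predicted "increase" when the predicted ILI ratio exceeds the previous week's observed ratio; the abstract does not state the exact definition used in the study. The function names and the sample values are hypothetical and are not taken from the study's data.

```python
import math


def direction_accuracy(actual, predicted):
    """Share of weeks where the predicted direction (up vs. not up) matches
    the observed direction, both measured against the previous week's
    observed value (an assumed definition, see the note above)."""
    hits = 0
    total = 0
    for t in range(1, len(actual)):
        actual_up = actual[t] > actual[t - 1]
        predicted_up = predicted[t] > actual[t - 1]
        hits += actual_up == predicted_up
        total += 1
    return hits / total


def rmse(actual, predicted):
    """Root mean square error between observed and predicted weekly ratios."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical weekly ILI patient ratios, for illustration only.
actual = [4.0, 5.0, 7.0, 6.0, 3.0]
predicted = [4.0, 6.0, 6.0, 8.0, 4.0]

print(direction_accuracy(actual, predicted))  # direction matched in 3 of 4 weeks
print(rmse(actual, predicted))
```

The two metrics answer different questions: directional accuracy only checks whether the sign of the weekly change is right, while the root mean square error penalizes the size of the miss in the estimated ratio itself.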
Conclusions: News articles can be a suitable choice for remedying the problems of conventional data because, as our results show, they can help estimate both whether the ILI patient ratio will increase or decrease and how many patients will be affected. Research on using news articles for influenza prediction should therefore continue to be pursued, as the results showed acceptable performance compared with existing influenza prediction studies.
Keywords
Epidemics; Influenza; Machine Learning; News Article Data; Support Vector Machine