
This item is licensed under the Korea Open Government License (KOGL, 공공누리).

Title
The Prediction of Diatom Abundance by Comparison of Various Machine Learning Methods
Author(s)
신유나; 허태영; 김택근; 김재훈; 홍석수; 정보미; 이영주; 이희석; 이재경
Publisher
Hindawi Publishing Corporation
Publication Year
2019-05-27
Abstract
This study adopts two approaches to analyzing the occurrence of algae at Haman Weir on the Nakdong River: one is traditional statistical modeling, such as logistic regression, and the other is machine learning, such as k-nearest neighbors (kNN), artificial neural networks (ANN), random forest (RF), bagging, boosting, and support vector machines (SVM). To compare the performance of the models, this study measured accuracy, specificity, sensitivity, and AUC, which are representative model evaluation measures. The ROC curve is created by plotting sensitivity against (1 - specificity) as the classification threshold varies, and the AUC, the area under the ROC curve, summarizes sensitivity and specificity in a single number. The AUC has two advantages over other evaluation measures. First, it is scale-invariant: it measures how well the model ranks positive cases above negative ones, rather than the absolute values of the predicted scores.
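
The evaluation measures described above can be sketched in Python. This is a minimal illustration on hypothetical toy data, not the study's code: `confusion_metrics`, `roc_points`, `auc_trapezoid`, and `auc_rank` are hypothetical helper names, and a score at or above the threshold is assumed to mean "positive". The sketch computes accuracy, sensitivity, and specificity at one threshold, traces the ROC curve by sweeping the threshold over every observed score, integrates it with the trapezoidal rule, and checks the result against the rank-based (Mann-Whitney) form of the AUC. The rank form makes the scale-invariance visible: any strictly increasing rescaling of the scores leaves the AUC unchanged.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def confusion_metrics(y_true: List[int], scores: List[float],
                      threshold: float) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity), labeling score >= threshold as positive."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


def roc_points(y_true: List[int], scores: List[float]) -> List[Point]:
    """ROC curve as (1 - specificity, sensitivity) pairs, one per distinct score used as threshold."""
    points: List[Point] = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        _, sensitivity, specificity = confusion_metrics(y_true, scores, threshold)
        points.append((1.0 - specificity, sensitivity))
    return points


def auc_trapezoid(points: List[Point]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    positives = [s for label, s in zip(y_true, scores) if label == 1]
    negatives = [s for label, s in zip(y_true, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = algae occurrence, 0 = no occurrence.
    y = [1, 1, 0, 1, 0, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    for t in (0.5, 0.65):
        print(t, confusion_metrics(y, s, t))  # sensitivity/specificity shift with t
    print(auc_trapezoid(roc_points(y, s)), auc_rank(y, s))  # both approximately 8/9
    rescaled = [10 * v + 3 for v in s]  # strictly increasing transform of the scores
    print(auc_rank(y, rescaled) == auc_rank(y, s))  # True: AUC depends only on ranking
```

On this toy data both AUC routines give 8/9, and rescaling the scores as 10s + 3 returns the same value, whereas sensitivity and specificity move as the threshold changes from 0.5 to 0.65. Grouping tied scores into a single threshold step is what makes the trapezoidal area agree with the rank form, which scores ties as 1/2.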
Second, the AUC is classification-threshold-invariant: because the ROC curve is built from the sensitivity and (1 - specificity) obtained at every threshold, the AUC does not depend on any single choice of threshold. Given these two advantages, we chose AUC as the final model evaluation measure. Variable selection was conducted using the Boruta algorithm, and models built with and without variable selection were compared to identify the better model. The Boruta algorithm selected PO4, DO, BOD, NH3, Susp, pH, TOC, Temp, TN, and TP as significant explanatory variables. Among the models without variable selection, RF achieved the highest accuracy, while ANN showed the highest AUC. In conclusion, the ANN model using variable selection showed the best performance among all models, with and without variable selection.
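
The Boruta variable selection mentioned above can be illustrated with a simplified, Boruta-style shadow-feature test. This is a sketch under stated assumptions, not the study's implementation: real Boruta refits a random forest on every iteration and compares Z-scored importances, whereas here the absolute Pearson correlation with the target stands in for random-forest importance so the example runs on the standard library alone. The helper names (`abs_corr`, `binom_half_cdf`, `binom_half_sf`, `boruta_style_selection`) and the synthetic data are hypothetical. Each iteration shuffles a copy of every feature to create "shadow" features, records a hit for each real feature whose importance exceeds the largest shadow importance, and finally applies one-sided binomial tests (hit probability 0.5) to label each feature confirmed, rejected, or tentative.

```python
import random
from math import comb
from typing import Dict, List, Sequence


def abs_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """|Pearson correlation|, a stand-in importance score (0.0 for constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def binom_half_cdf(k: int, n: int) -> float:
    """P(X <= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


def binom_half_sf(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def boruta_style_selection(columns: Dict[str, List[float]], y: List[float],
                           n_iter: int = 20, alpha: float = 0.05,
                           seed: int = 0) -> Dict[str, str]:
    """Label each feature 'confirmed', 'rejected', or 'tentative' from shadow-feature hits."""
    rng = random.Random(seed)
    # Correlation is deterministic, so real importances are computed once;
    # real Boruta refits its random forest and recomputes them each iteration.
    real_importance = {name: abs_corr(col, y) for name, col in columns.items()}
    hits = {name: 0 for name in columns}
    for _ in range(n_iter):
        shadow_max = 0.0
        for col in columns.values():
            shadow = list(col)
            rng.shuffle(shadow)  # shuffling destroys any real relation to y
            shadow_max = max(shadow_max, abs_corr(shadow, y))
        for name, importance in real_importance.items():
            if importance > shadow_max:
                hits[name] += 1
    decisions = {}
    for name, h in hits.items():
        if binom_half_sf(h, n_iter) < alpha:  # far more hits than chance
            decisions[name] = "confirmed"
        elif binom_half_cdf(h, n_iter) < alpha:  # far fewer hits than chance
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return decisions


if __name__ == "__main__":
    # Hypothetical synthetic data: "signal" tracks y; "noise" has zero correlation with y by construction.
    y = [i % 2 for i in range(200)]
    cols = {
        "signal": [y[i] + 0.01 * (i % 7) for i in range(200)],
        "noise": [(i // 2) % 2 for i in range(200)],
    }
    print(boruta_style_selection(cols, y))  # signal confirmed, noise rejected
```

Comparing against the maximum shadow importance, rather than a fixed cutoff, is the core idea of Boruta: a variable is kept only if it is consistently more informative than randomized copies of the data. Substituting random-forest importances for `abs_corr` would bring the sketch closer to the original algorithm.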
Keyword
Diatom; water quality; machine learning
Journal Title
Mathematical Problems in Engineering
Citation Volume
2019
ISSN
1024-123X
DOI
10.1155/2019/5749746
Files in This Item:
There are no files associated with this item.
Appears in Collections:
7. KISTI Research Outputs > Journal Articles
URI
https://repository.kisti.re.kr/handle/10580/16041
Fulltext
https://scienceon.kisti.re.kr/srch/selectPORSrchArticle.do?cn=NART106415710
