Automatic text classification is a basic and widely used method for analyzing data. Classification methods such as support vector machines (SVMs) long delivered the best performance, but deep learning has recently driven many advances in text classification. This study presents a deep learning-based classification model for national research and development (R&D) information, which is characterized by complex document structure, long texts, and a large number of classification classes. Beyond the word–sentence structure of a simple document, the model also takes the higher-level structure of items into account, which increases the number of stacked layers in the deep model. In experiments on 180,000 datasets and 366 classification schemes, the proposed model improved performance by 22.7% over a conventional SVM and by 15.7% over a conventional model that uses structured word–sentence modeling. This improvement stems from the multi-layered stacking method, which strengthens learning by stacking to 5 to 10 times the depth of the conventional model and by effectively combining the features of heterogeneous items. Although datasets with such complex structures are scarce, the model proposed for national R&D information is equally applicable to other datasets with similar structures.
Keywords
Structured document; Text classification; Deep learning model; Deep model architecture