An essential matter in the knowledge-based information society is how to extract useful information quickly from a large volume of literature. Since most existing data mining frameworks deal with structured input data, many limitations are faced in analyzing unstructured scientific literature and extracting new information. This study proposes a scientific-knowledge processing framework, which offers high performance by using grid computing technology for extracting important entities and their relations from the scientific literature. Since the grid computing provides a large volume of data storage and high-speed computing, the proposed framework can efficiently analyze the massive body of scientific literature and process knowledge. The workflow tool that we have developed for the proposed framework enables users to easily design and execute complicated applications that consist of complicated scientific-knowledge processes. The experimental results showed that the proposed framework reduced working time by approximately 83% when the number of running nodes was assigned in accordance with the workload ratio of each step in scientific-knowledge processes. As a result, it is possible to effectively process a large volume of scientific literature by flexibly adjusting the number of computing nodes that constitute the grid environment as the number of documents for processing increases.
scientific knowledge processing; Grid computing; workflow; data mining framework; text mining