A Study on Automating Bibliographic Information Extraction and Material Processing and on Developing an Integrated Document Viewer
Alternative Author(s)
An, Myung-Soo; Cha, Jae-Choon; Hwang, Deok-Chang; Chun, Yong-Ju; Kim, Geun-Sook
Korea Institute of Science and Technology Information
Publication Year
funder : Office for Government Policy Coordination
agency : Korea Institute of Science and Technology Information
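In this research we developed DVITOTXT, a tool that converts Korean DVI documents into plain text documents, and on top of it built a system that extracts bibliographic information and tables of contents from master's and doctoral theses and stores them in SGML format. In addition, we developed a tool that generates, from the SGML file, a table-of-contents DVI file with working hyperlinks. In particular, word spacing is detected while text is extracted from DVI so that full-text search results read cleanly, and the table of contents is also generated as a DVI file so that the cache function of the DVI plug-in can be used while a reader moves from the table of contents to the content. We also developed a TIFF viewer that runs as a plug-in inside the web browser. The integrated Multi-TIFF viewer developed in this project was designed with particular emphasis on providing TIFF documents that can be linked with web technology. For multi-page TIFF files containing many pages, it displays the document even while the download is in progress, and it supports the frame and window features of HTML so that web pages containing TIFF can be designed in various ways. It also offers an interface consistent with TeXplus Viewer, the DVI document viewer, for the user's convenience.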

This research aims to automate the indexing of theses and articles and the extraction of their bibliographic information, which would enable the materials to be searched and read online, and to develop an online TIFF Viewer needed for providing TIFF documents on the internet. Indexing has been done manually so far, although automating it has been considered necessary because of the huge amount of data that will have to be processed in the future. As a result of this research, the Digital Library managed by KORDIC will be equipped with appropriate tools for searching and serving the contents and text of electronic documents, which was considered impossible in classic library systems. Technology for original document delivery is now a requirement for web-based document services. Although the use of more advanced document formats such as DVI and PDF is gradually increasing, far more documents already exist in TIFF format, so methods for delivering these documents on the web are needed. Until now there has been no satisfactory application for viewing multi-page TIFF documents on the web. In this research we developed TIFF Viewer as a solution for delivering multi-page TIFF documents on the web. This service is the first of its kind and would be a landmark in the domestic information industry. In this research we also developed DVITOTXT, a tool for converting Korean DVI documents into text files, and FINDBIB, which extracts the bibliographic information and tables of contents from Korean master's and doctoral theses and converts them into SGML format. In addition, we developed a process that generates, from the SGML file, a table-of-contents DVI file with working hyperlinks.
In particular, when converting DVI files into text we paid careful attention to word spacing so that the search output would be more readable. We generated the table-of-contents file in DVI format in order to use the cache mechanism of the TeXplus DVI plug-in. The TIFF Viewer is a plug-in application for web browsers. This integrated multi-TIFF viewer is designed with emphasis on linkage with web technology. For multi-page TIFF documents containing many pages, TIFF Viewer shows the pages already downloaded while retrieving the remaining pages. The frame and window features of HTML are supported so that web pages containing TIFF files can be designed in various ways. The user interface is consistent with that of TeXplus Viewer for the user's convenience. As a result of this research, the Digital Library of KORDIC will be equipped with an effective method for providing materials to users. The online service for searching theses in DVI format will be of great value to those who need the information. Furthermore, it provides evidence that extracting bibliographic information from Korean DVI documents is actually possible, demonstrating the strength of DVI as the content format of Korean digital libraries. The development of TIFF Viewer is expected to bring the following results. First, a method for delivering TIFF documents through a web service is acquired, and second, a much wider variety of web interface designs becomes possible. Users can view TIFF documents in the web browser without any external application and are given a consistent user interface. Online document providers are free to use DVI or TIFF documents, whichever are available, and can provide an identical user interface for both file formats.
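
The word-spacing step described in the abstract can be illustrated with a small sketch. This is not the DVITOTXT implementation, whose details the abstract does not give. It assumes that each glyph on a line is available with its horizontal position and width, and that a space can be inferred wherever the gap between neighbouring glyphs exceeds a fraction of the glyph width. The names `Glyph`, `glyphs_to_text`, and `gap_ratio` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph, in arbitrary page units
    width: float  # advance width, in the same units


def glyphs_to_text(glyphs, gap_ratio=0.25):
    """Join the glyphs of one line, inserting a space at large horizontal gaps.

    Assumption: a gap wider than gap_ratio times the previous glyph's width
    marks a word boundary. A real converter would likely tune this per font.
    """
    out = []
    prev = None
    for g in sorted(glyphs, key=lambda item: item.x):
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            if gap > gap_ratio * prev.width:
                out.append(" ")
        out.append(g.char)
        prev = g
    return "".join(out)


# Four glyphs with a visible gap between the second and the third.
line = [Glyph("서", 0, 10), Glyph("지", 10, 10), Glyph("정", 24, 10), Glyph("보", 34, 10)]
print(glyphs_to_text(line))
```

A position-based rule of this kind is needed because a typeset page records where glyphs are placed rather than storing space characters, so word boundaries have to be recovered from layout.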
