
This item is licensed under a Creative Commons Attribution (CC BY) license.

Title
Query Formulation for Heuristic Retrieval in Obfuscated and Translated Partially Derived Text
Author(s)
Aarti Kumar; Sujoy Das
Publication Year
2015-03-30
Abstract
Pre-retrieval query formulation is an important step in identifying local text reuse. When the reused passage is heavily obfuscated, paraphrased, or translated, locating it within a document is challenging. This paper studies three pre-retrieval query formulation strategies for heuristic retrieval of low-obfuscation, high-obfuscation, and translated text: (a) query formulation using proper nouns; (b) query formulation using unique words (Hapax); and (c) query formulation using the most frequent words. For low and high obfuscation and simulated paraphrasing, keywords with Hapax proved slightly more efficient. Initial results nonetheless indicate that the simple strategy of query formulation using proper nouns gives promising results and may prove better at reducing the size of the corpus for post-processing when identifying local text reuse in obfuscated and translated text.
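The three strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the tiny stopword list, and the capitalization heuristic used in place of real proper-noun detection (which would normally rely on a part-of-speech tagger) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "was", "by", "for", "on", "it"}


def tokenize(text):
    """Split text into alphabetic word tokens, preserving case."""
    return re.findall(r"[A-Za-z]+", text)


def proper_noun_query(text):
    """Strategy (a), approximated: capitalized words that do not begin a sentence.

    This is a crude stand-in for proper-noun detection, not a POS tagger.
    """
    seen, query = set(), []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for word in tokenize(sentence)[1:]:  # skip the sentence-initial token
            if word[0].isupper() and word.lower() not in STOPWORDS and word not in seen:
                seen.add(word)
                query.append(word)
    return query


def content_word_counts(text):
    """Count lowercased tokens, ignoring stopwords."""
    return Counter(w.lower() for w in tokenize(text) if w.lower() not in STOPWORDS)


def hapax_query(text):
    """Strategy (b): words that occur exactly once in the passage."""
    counts = content_word_counts(text)
    return sorted(w for w, c in counts.items() if c == 1)


def frequent_query(text, k=3):
    """Strategy (c): the k most frequent content words (ties broken alphabetically)."""
    counts = content_word_counts(text)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]
```

Each function returns a list of query terms for a suspicious passage; in a pre-retrieval setting these terms would be submitted to a search index to shrink the candidate corpus before detailed comparison.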
Keyword
Heuristic; obfuscated; translated; simulated paraphrasing; retrieval; Hapax; query formulation; pre-retrieval
Journal Title
Journal of Information Science Theory and Practice
Citation Volume
3
ISSN
2287-4577
DOI
10.1633/JISTaP.2015.3.1.2
Files in This Item:
E1JSCH_2015_v3n1_24.pdf (1.5 MB)
Appears in Collections:
8. KISTI Publications > JISTaP > Vol. 3 - No. 1
Type
Article
URI
https://repository.kisti.re.kr/handle/10580/8672