This paper introduces DICE, a Domain-Independent text Classification Engine. DICE is robust, efficient, and domain-independent in both its software and its architecture. Each component of the system is modularized and encapsulated for extensibility. This modular architecture allows simple, continuous verification and makes the system easy to change over multiple development cycles, even after its major development period is complete.
Document classification; test bed; machine learning; text categorization; feature selection; text mining; lemmatization
Transactions on Internet and Information Systems (TIIS)