Semisupervised Text Classification Using Unsupervised Topic Information
PRESENTER:
Sylvie Ratté and Ruben Dorado
École de technologie supérieure, Montréal
ABSTRACT:
Labeling corpora is a time-consuming and recurring problem when
developing practical NLP applications. This study proposes a method
to speed up the development of categorized corpora for text
classification. We report advances on this task by presenting a
semi-supervised method to build a text classifier using unsupervised topic
information. The objective is to use as little labeled data as possible to
accelerate the creation of corpora for classification in specific domains.
We show that it is possible to obtain performance similar to that of
state-of-the-art methods despite the limited quantity of labeled data.
Finally, we discuss the overall objective of the research and directions
for future work.