DATE: | Monday, Oct 18, 2010 |
TIME: | 3:30 pm |
PLACE: | Council Room (SITE 5-084) |
TITLE: | A Scalable Semi-supervised Multinomial Naive Bayes Method for Text Classification |
PRESENTER: | Jelber Sayyad Shirabad, University of Ottawa |
ABSTRACT:
Multinomial Naive Bayes (MNB) has been widely used in text classification due to its simplicity and computational efficiency. A number of semi-supervised learning methods have been proposed to improve the accuracy of MNB using unlabeled documents. However, there are applications where improving the area under the ROC curve (AUC) is desirable. This talk presents a method to improve the performance of MNB in terms of both AUC and accuracy using a large number of unlabeled documents. We propose a new method called Semi-supervised Frequency Estimate (SFE) and compare its performance to Expectation Maximization (EM) on the same task. Our experiments show that while EM+MNB improves the accuracy of MNB, the same cannot be said of its AUC. SFE, on the other hand, improves both accuracy and AUC compared to MNB and to the semi-supervised alternative EM+MNB. In terms of running time, SFE was often at least two orders of magnitude faster than EM+MNB. |
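
For readers unfamiliar with the two evaluation measures named in the abstract, the following is a minimal sketch of a plain supervised MNB baseline scored by both accuracy and AUC. It is not the SFE method presented in the talk, and it does not use unlabeled documents. The toy documents, the function names (train_mnb, posterior, auc), and the Laplace smoothing constant alpha are all illustrative assumptions.

```python
import math
from collections import Counter


def train_mnb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on tokenized documents.

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    classes = sorted(set(labels))
    vocab = sorted({w for d in docs for w in d})
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(d)
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    log_like = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((counts[c][w] + alpha) / total) for w in vocab}
    return classes, log_prior, log_like


def posterior(model, doc):
    """Return P(class | doc) for every class; out-of-vocabulary words are skipped."""
    classes, log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in doc if w in log_like[c])
        for c in classes
    }
    m = max(scores.values())  # subtract the max for numerical stability
    expd = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(expd.values())
    return {c: v / z for c, v in expd.items()}


def auc(scores, labels, positive):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, invented for illustration only.
train_docs = [["goal", "match", "team"], ["team", "win", "goal"],
              ["vote", "election", "party"], ["party", "vote", "law"]]
train_labels = ["sport", "sport", "politics", "politics"]
test_docs = [["goal", "team"], ["vote", "law"],
             ["match", "party"], ["election", "party", "win"]]
test_labels = ["sport", "politics", "sport", "politics"]

model = train_mnb(train_docs, train_labels)
posteriors = [posterior(model, d) for d in test_docs]
predictions = [max(p, key=p.get) for p in posteriors]

accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
auc_value = auc([p["sport"] for p in posteriors], test_labels, positive="sport")
print(accuracy, auc_value)
```

On this toy data the classifier mislabels one test document, giving an accuracy of 0.75, yet it ranks every "sport" document above every "politics" document, giving an AUC of 1.0. The two measures capture different things, which is why a method can improve one without improving the other.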