DATE: | Tuesday, Nov 10, 2009 |
TIME: | 3:30 pm |
PLACE: | Council Room (SITE 5-084) |
TITLE: | Compact Features for Sentiment Analysis |
PRESENTER: | Lisa Gaudette, University of Ottawa |
ABSTRACT: Sentiment analysis is the problem of automatically identifying opinions in text. While a wide variety of approaches have been attempted, the Bag of Words representation of a document is common in approaches involving machine learning. This representation generally involves thousands of features and is very sparse, particularly for short documents. I will discuss a method that uses word scoring techniques to learn scores for words, and then builds features for a machine learning algorithm from the distribution of these scores in each document. This results in a very compact representation of the documents. I will then present the results of a thorough test of this method across a wide range of datasets, showing that this compact representation frequently performs better than a baseline Bag of Words classifier while being considerably faster to train. The speed advantage of this method allows for the processing of very large collections of documents, which are frequently available in the form of online user reviews. I will also demonstrate that this method can be used for other, similar text classification problems. |
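As a rough illustration of the idea in the abstract, the following Python sketch learns a score for each word from labeled documents and then summarizes each document by a small histogram of its word scores. The abstract does not say which scoring technique or which distribution features the talk uses, so everything here is an assumption made for illustration: the smoothed positive-versus-negative frequency score, the fixed number of histogram bins, and the hypothetical function names `learn_word_scores` and `compact_features`.

```python
from collections import Counter


def learn_word_scores(docs, labels, smoothing=1.0):
    """Assign each word a score in (-1, 1).

    Assumed scoring rule (not taken from the talk): the smoothed
    difference between a word's count in positive documents (label 1)
    and in negative documents (any other label), divided by their sum.
    """
    pos, neg = Counter(), Counter()
    for text, label in zip(docs, labels):
        (pos if label == 1 else neg).update(text.lower().split())
    scores = {}
    for word in set(pos) | set(neg):
        p = pos[word] + smoothing
        n = neg[word] + smoothing
        scores[word] = (p - n) / (p + n)
    return scores


def compact_features(text, scores, n_bins=4):
    """Describe a document by the distribution of its word scores.

    Returns a normalized histogram over n_bins equal-width bins on
    [-1, 1]. Words without a learned score are ignored, and a document
    with no scored words yields an all-zero vector.
    """
    values = [scores[w] for w in text.lower().split() if w in scores]
    hist = [0.0] * n_bins
    for v in values:
        # Map a score in [-1, 1] to a bin index, clamping the top edge.
        idx = min(int((v + 1.0) / 2.0 * n_bins), n_bins - 1)
        hist[idx] += 1.0
    if values:
        hist = [h / len(values) for h in hist]
    return hist


# Toy data, invented purely for this example.
train_docs = ["good great film", "bad awful film"]
train_labels = [1, 0]
scores = learn_word_scores(train_docs, train_labels)
features = compact_features("bad film", scores)
```

Each document is reduced to `n_bins` numbers regardless of vocabulary size, which is the contrast with a Bag of Words vector of thousands of sparse features. The resulting vectors could be passed to any standard classifier.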