DATE: | Wed, Dec 3, 2014 |
TIME: | 12:00 pm |
PLACE: | Council Room (SITE 5-084) |
TITLE: | Topic Modeling of Short Social Messages |
PRESENTER: | Kenton White, Girih Inc. |
ABSTRACT: Topic modeling discovers the abstract topics that occur in a collection of documents. Latent Dirichlet Allocation (LDA), perhaps the most popular topic modeling algorithm, uses the statistical occurrence of words in a document to infer a topic distribution across the document collection. Such techniques assume that each document is a mixture of related topics. A collection of short social messages (SSMs), such as Tweets, breaks this assumption. With SSMs, each document covers a single topic, and the collection of documents may not be a mixture of related topics. Instead, I explore using Non-Negative Matrix Factorization (NMF) to model topics in SSMs. NMF segments SSMs into topics based on inferred similarities of the authors, using author identity from the social graph. Working with a corpus of 801,943 Tweets collected from Ottawa, ON, in August 2014, I compare the topics extracted by LDA and NMF. I will show how NMF can learn to extract local news topics from Twitter streams. |