| DATE: | Wed, Dec 3, 2014 |
| TIME: | 12:00 pm |
| PLACE: | Council Room (SITE 5-084) |
| TITLE: | Topic Modeling of Short Social Messages |
| PRESENTER: | Kenton White, Girih Inc. |
| ABSTRACT: | Topic modeling discovers the abstract topics that occur in a collection of documents. Latent Dirichlet Allocation (LDA), perhaps the most popular topic modeling algorithm, uses the statistics of word occurrence within documents to infer a distribution of topics across the document collection. These techniques assume that each document is a mixture of related topics. A collection of short social messages (SSMs), such as Tweets, breaks this assumption: each SSM is about a single topic, and the collection as a whole may not be a mixture of related topics. Instead, I explore using Non-Negative Matrix Factorization (NMF) to model topics in SSMs. NMF segments SSMs into topics based on inferred similarities between authors, using author identity from the social graph. Working with a corpus of 801,943 Tweets collected in Ottawa, ON, in August 2014, I compare the topics extracted by LDA and NMF. I will show how NMF can learn to extract local news topics from Twitter streams. |
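To make the factorization idea concrete, here is a minimal sketch of plain NMF applied to a word-count matrix of short messages, using Lee and Seung's multiplicative updates for squared Frobenius error. This is an illustrative assumption, not the presenter's method: the talk's approach derives similarities from author identity in the social graph, which this sketch does not model. The vocabulary, the six toy messages, and the helper names (`nmf`, `matmul`, `transpose`, `frobenius_error`) are hypothetical.

```python
import random

def matmul(A, B):
    """Multiply an n x p matrix by a p x m matrix (lists of lists)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Approximate a non-negative docs x terms matrix V as W H, where
    W (docs x k) and H (k x terms) stay non-negative, using Lee & Seung's
    multiplicative updates for the squared Frobenius loss."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

# Hypothetical toy corpus of short messages (not data from the talk).
vocab = ["bus", "route", "delay", "sens", "goal", "game"]
tweets = [
    "bus route delay",
    "bus delay",
    "route delay bus",
    "sens goal game",
    "sens game",
    "game goal sens",
]
# Document-term count matrix: one row per message, one column per word.
V = [[t.split().count(w) for w in vocab] for t in tweets]

W, H = nmf(V, k=2)
for text, weights in zip(tweets, W):
    # Assign each message to the topic with the largest weight in W.
    topic = max(range(len(weights)), key=lambda a: weights[a])
    print(f"topic {topic}: {text}")
for a, row in enumerate(H):
    top = sorted(range(len(vocab)), key=lambda j: -row[j])[:3]
    print(f"topic {a} top words:", [vocab[j] for j in top])
print("squared error:", round(frobenius_error(V, W, H), 3))
```

Each row of `W` weights one message across the k topics, and each row of `H` weights the vocabulary for one topic. On toy input like this the transit and hockey messages usually fall into separate topics, though the outcome depends on the random initialization. Because every update multiplies non-negative quantities, W and H never go negative, and each update step does not increase the reconstruction error.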