DATE: Tuesday, Jan 19, 2010
TIME: 3:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Toward the Twuring Test: Conversation Modeling using Twitter
PRESENTER: Colin Cherry
Institute for Information Technology, NRC
ABSTRACT:

The growing popularity of social media has had an interesting side effect for language researchers: services such as Twitter have led people to hold instant-messenger-style conversations over a public medium. This creates a unique opportunity to collect, study, and model large-scale conversation data. We present a method for mining conversations from Twitter's public feed. The resulting conversation corpus, which will be made publicly available, contains more than 1.3 million conversations, providing a rich resource for the study of both Twitter and Internet chat. We then present several methods that attempt to model the flow of conversation by discovering latent classes over Tweets. We show that a repurposed content model (Barzilay and Lee, 2004) can discover meaningful dialogue acts, such as "question" and "comment", which indicate not only the role a Tweet plays in its conversation but also the sorts of Tweets that are likely to follow. We improve and extend this model with a Bayesian approach, which allows us to model a conversation's topic and to introduce sparse priors during learning.