DATE: Thursday, Mar 3, 2011
TIME: 3:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Text Segmentation Using Affinity Propagation
PRESENTER: Anna Kazantseva
University of Ottawa
ABSTRACT:

Text segmentation is the task of splitting a document into segments, each characterized by a relatively constant topic. Alternatively, one can view it as the task of identifying topical shifts. Text segmentation provides a simple picture of the informational structure of a document. As such, it is a useful intermediate step for many higher-level language-related tasks (e.g., text summarization, question answering, and co-reference resolution).

In this talk I will present a new algorithm for linear text segmentation. It is an adaptation of a state-of-the-art clustering algorithm, Affinity Propagation (Givoni and Frey 2009). The algorithm takes as input a (possibly sparse) matrix of pairwise similarities between sentences. It outputs segment boundaries as well as segment centers: the sentences that best capture the informational content of each segment.
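
To make the input concrete, here is a minimal sketch of how a sparse matrix of pairwise sentence similarities might be built. It is illustrative only: the abstract does not specify the similarity metric, so the bag-of-words cosine similarity, the `window` parameter, and the function names are all assumptions, and the segmentation algorithm itself is not shown.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercase the sentence and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sparse_similarities(sentences, window=2):
    """Return {(i, j): similarity} for sentence pairs at most `window` apart.

    Pairs farther apart are left out entirely, which is what makes the
    matrix sparse. The metric and the window are assumed, illustrative
    choices, not the ones used in the talk.
    """
    vectors = [bag_of_words(s) for s in sentences]
    sims = {}
    for i in range(len(vectors)):
        for j in range(i + 1, min(i + window + 1, len(vectors))):
            score = cosine(vectors[i], vectors[j])
            sims[(i, j)] = score
            sims[(j, i)] = score  # this metric is symmetric
    return sims


sentences = [
    "The cat sat on the mat.",
    "The cat slept on the mat.",
    "Stock prices fell sharply today.",
    "Investors sold stock as prices fell.",
]
sims = sparse_similarities(sentences, window=1)
```

In this toy input, the first two sentences share most of their words and the second and third share none, so a topical shift between sentences two and three shows up as a drop in similarity.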

Using a very simple similarity metric, the algorithm performs on par with or outperforms two state-of-the-art segmenters on several benchmark datasets.