DATE: Wed, Oct 1, 2014
TIME: 12:00 pm
PLACE: Council Room (SITE 5-084)
TITLE: Hierarchical Topical Segmentation Using Affinity Propagation
PRESENTER: Anna Kazantseva
University of Ottawa
ABSTRACT:

Topical segmentation is the task of identifying the places in a document where the topic under discussion changes. While there has been a considerable amount of research on single-level topical segmentation, relatively little has been done on hierarchical topical segmentation. In this talk I will present an algorithm for hierarchical segmentation of free text. I will also present a relatively large multi-annotator corpus created for this task.

Hierarchical Affinity Propagation for Segmentation (HAPS) is derived from the clustering algorithm Affinity Propagation. Given a document, HAPS builds a topical tree. The nodes at the top level correspond to the most prominent shifts of topic in the document. Nodes at lower levels correspond to finer topical fluctuations. For each segment in the tree, HAPS identifies a segment centre: a sentence or a paragraph that best describes the segment's contents. We evaluate the segmenter on a subset of a novel manually segmented by several annotators, and on a dataset of Wikipedia articles. The results suggest that hierarchical segmentations produced by HAPS are better than those obtained by iteratively running several one-level segmenters. An additional advantage of HAPS is that it does not require the "gold standard" number of segments in advance.
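
For readers unfamiliar with the underlying clustering method, the following is a minimal sketch of standard, flat Affinity Propagation. It is not HAPS: it builds no hierarchy and does not force clusters to be contiguous segments. The function name, the toy data and the preference value of -10 are assumptions made for illustration only. The sketch shows two ideas mentioned in the abstract: each cluster is represented by one of its own items (an exemplar, analogous to a segment centre), and the number of clusters is not given in advance but emerges from the similarities and the preference values on the diagonal.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, n_iter=200):
    """Flat Affinity Propagation on an n x n similarity matrix S.

    S[k, k] holds the "preference" of item k to act as an exemplar.
    Returns, for each item, the index of the exemplar it selects.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(n_iter):
        # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        # a(k, k) = sum of positive r(i', k), i' != k
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
    # Each item picks the exemplar that maximises a(i, k) + r(i, k).
    return (A + R).argmax(axis=1)

# Toy data (hypothetical): six points on a line forming two obvious groups.
x = np.array([0.0, 0.1, 0.25, 10.0, 10.1, 10.25])
S = -(x[:, None] - x[None, :]) ** 2   # similarity = negative squared distance
np.fill_diagonal(S, -10.0)            # preference, chosen for this example
labels = affinity_propagation(S)
print(labels)
```

In this sketch the preference plays the role that a fixed number of clusters plays in other methods: lower values tend to yield fewer exemplars, higher values more.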