DATE: Thursday, April 5, 2012
TIME: 3:00 pm
PLACE: Council Room (SITE 5-084)
TITLE: Datasets for Topical Segmentation of Literature
PRESENTER: Anna Kazantseva
University of Ottawa
ABSTRACT:

In this talk I will describe two relatively large datasets for topical segmentation of literature. To study how well people find topical shifts in literature, we ran two studies, recruiting 27 and 23 undergraduate students of English respectively, and asked them to segment a portion of the novel The Moonstone by Wilkie Collins into topically coherent segments. In the first study, the students were asked to find only one level of segmentation. In the second, we asked them to build a hierarchical structure of topical changes.

I will describe both datasets and some of the interesting trends we noticed when analyzing the data. I will also touch upon some issues in using these datasets to evaluate automatic topical segmenters.