DATE: Thursday, March 21, 2013
TIME: 3:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Evaluating Text Segmentation
PRESENTER: Chris Fournier
University of Ottawa
ABSTRACT:

This talk focuses on the evaluation of automatic and manual text segmentations. Text segmentation is the process of placing boundaries within a text to create segments according to some task-dependent criterion. One example is topical segmentation, which aims to segment a text according to a subjective definition of what constitutes a topic. A number of automatic segmenters have been created, and the question that this work answers is how to select the best automatic segmenter for such a task. Evaluating automatic text segmenters requires three elements: a suitable segmentation comparison method to judge the similarity between an automatic segmentation and a manual segmentation; an inter-coder agreement coefficient to ensure that the manual segmentations can be relied upon; and a methodology for selecting the best automatic segmenter from a set. In this work, the state of the art for all three evaluation elements is examined, and new elements are proposed and evaluated that address a variety of flaws in the existing ones. This work contributes: a new edit distance for segmentations, referred to as boundary edit distance; a new segmentation comparison method based upon this edit distance, referred to as boundary similarity (B); a new inter-coder agreement coefficient based upon boundary similarity; and a recommended methodology for applying these contributions. Further, a meta-evaluation is performed that compares these contributions to the existing state of the art.
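
As a rough illustration of what comparing two segmentations involves, the sketch below represents a segmentation as a list of segment sizes (for example, in sentences) and scores two segmentations by the exact overlap of their boundary positions. This is only an assumed, simplified stand-in for a comparison method: it is not boundary edit distance or boundary similarity (B), and the names boundary_positions and naive_boundary_overlap are hypothetical.

```python
from itertools import accumulate


def boundary_positions(segment_sizes):
    """Return the set of internal boundary positions of a segmentation.

    The segmentation is given as a sequence of segment sizes, e.g.
    [2, 3, 6] for an 11-unit text split after units 2 and 5.
    """
    # Cumulative sums give the position at the end of each segment; the
    # last one is the end of the text, not a boundary, so it is dropped.
    return set(list(accumulate(segment_sizes))[:-1])


def naive_boundary_overlap(sizes_a, sizes_b):
    """Jaccard overlap of exact boundary positions (illustrative only)."""
    if sum(sizes_a) != sum(sizes_b):
        raise ValueError("segmentations must cover the same number of units")
    a = boundary_positions(sizes_a)
    b = boundary_positions(sizes_b)
    if not a and not b:
        # Neither segmentation places a boundary: treat as identical.
        return 1.0
    return len(a & b) / len(a | b)


manual = [2, 3, 6]     # hypothetical manual segmentation
automatic = [2, 2, 7]  # hypothetical automatic segmentation
print(naive_boundary_overlap(manual, automatic))
```

Here the two segmentations agree on the boundary after unit 2 but differ by one unit on the second boundary. An exact-match score such as this one gives no partial credit for that near miss.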