DATE: Tuesday, Feb. 13, 2007
TIME: 2:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Filtering noise from gazetteer generated for unsupervised NER
PRESENTER: David Nadeau
University of Ottawa
ABSTRACT:

Unsupervised Named Entity Recognition (uNER) offers a solution to the NER problem for virtually any entity type (person, location, car brand, book title, etc.) without requiring large quantities of manually annotated data. In a U. Ottawa proof-of-concept uNER system, one of the key components is a bootstrapping algorithm that generates large gazetteers (i.e., lists of named entities) starting from very few examples. The main problem with this algorithm, and presumably with most bootstrapping algorithms, is handling noise, for instance noise due to concept drift. In this talk, we compare two approaches to noise filtering. The first approach is based on information redundancy and has performed well on the task of filtering noise from lists of films, cities, countries, and mayors. The second approach is based on a new model that we call the General Feature Model (GFM). We show experimentally how GFM-based filtering compares to filtering based on information redundancy, and also show how the two approaches can be combined.