DATE: Tuesday, Jan. 23, 2007
TIME: 2:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Analysis and Construction of Noun Hypernym Hierarchies to Enhance Roget's Thesaurus
PRESENTER: Alistair Kennedy
University of Ottawa
ABSTRACT:

Lexical resources are machine-readable dictionaries or lists of terms in which semantic relationships between the terms are expressed in some form.  These lexical resources have been used for many tasks, such as word sense disambiguation and determining semantic similarity between terms.  In recent years, some research has gone into automatically building lexical resources from large corpora.  I examine methods of enhancing an existing lexical resource, rather than constructing one from scratch.  Roget's Thesaurus is a lexical resource that groups terms and phrases together based on degrees of semantic relatedness.  One weakness of Roget's Thesaurus is that it does not specify the nature of the relationships between its terms; it only indicates that a relationship exists.  I attempt to label the noun hypernym relationships that appear within Roget's Thesaurus.
I examine several methods of mining noun hypernym relationships from a range of resources, including large corpora, dictionaries, and existing lexical resources.  Human annotators manually evaluate samples from each resource.  Over 50,000 hypernym relationships are incorporated into Roget's Thesaurus.  These newly imported relationships are used to improve Roget's capacity for calculating semantic similarity between terms.  The improved similarity function is tested on several applications that make use of semantic similarity, including identifying synonyms and solving SAT-style analogy questions.