DATE: | Thursday, Nov 3, 2011 |
TIME: | 3:30 pm |
PLACE: | Council Room (SITE 5-084) |
TITLE: | A Supervised Method of Feature Weighting for Measuring Semantic Relatedness |
PRESENTER: | Alistair Kennedy, University of Ottawa |
ABSTRACT: Clustering of related words is crucial for a variety of Natural Language Processing applications. A popular technique is to use the contexts in which a word appears to build a vector that represents the word's meaning. The distance between two vectors is then used to determine whether the corresponding words have similar meanings. Usually these contexts are weighted by some measure of association between the word and the context. Such measures increase the weight of contexts where a word appears regularly but other words do not, and decrease the weight of contexts where many words may appear. Essentially, this is unsupervised feature weighting. I will present and discuss a method of supervised feature weighting. It identifies contexts shared by pairs of words known to be semantically related or unrelated, and then uses this information to weight these contexts according to how well they indicate word relatedness. The system can be trained with data from resources such as WordNet or Roget’s Thesaurus and can also be used with various measures of association, including Pointwise Mutual Information. This work is a step towards adding new terms to Roget’s Thesaurus automatically, and doing so with high confidence. |
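
The unsupervised baseline described in the abstract (context vectors weighted by Pointwise Mutual Information, then compared by vector similarity) can be sketched roughly as follows. This is only an illustration, not the presenter's supervised method: the toy observations, the context labels, the function names `pmi_vectors` and `cosine`, and the choice to keep only positive PMI values are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def pmi_vectors(pairs):
    """Build PMI-weighted context vectors from a list of (word, context) observations."""
    pair_counts = Counter(pairs)
    word_counts = Counter(word for word, _ in pairs)
    context_counts = Counter(context for _, context in pairs)
    total = len(pairs)

    vectors = defaultdict(dict)
    for (word, context), n in pair_counts.items():
        # PMI = log2( P(word, context) / (P(word) * P(context)) )
        pmi = math.log2((n * total) / (word_counts[word] * context_counts[context]))
        if pmi > 0:  # assumption: keep only positive associations
            vectors[word][context] = pmi
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[context] for context, weight in u.items() if context in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: each pair is one observation of a word in a context.
observations = [
    ("cat", "obj_of:feed"), ("cat", "mod:furry"),
    ("dog", "obj_of:feed"), ("dog", "mod:furry"), ("dog", "mod:fast"),
    ("car", "obj_of:drive"), ("car", "mod:fast"),
    ("truck", "obj_of:drive"),
]

vectors = pmi_vectors(observations)
print(cosine(vectors["cat"], vectors["dog"]))  # shared contexts, so similarity is high
print(cosine(vectors["cat"], vectors["car"]))  # no shared contexts
```

A supervised scheme of the kind the talk describes would replace or adjust these PMI weights using word pairs known to be related or unrelated; the details of that step are not given in the abstract, so they are not sketched here.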