DATE: Friday, Jan. 17, 2003
TIME: 3:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Roget's Thesaurus and Semantic Similarity
PRESENTER: Mario Jarmasz
University of Ottawa
ABSTRACT:

People identify synonyms -- strictly speaking, near-synonyms -- such as angel-cherub, without being able to define synonymy properly. The term tends to be used loosely, even in the crucially synonymy-oriented WordNet with the synset as the basic semantic unit. For NLP systems it is often more useful to establish the degree of synonymy between two words, referred to as semantic similarity.

In this talk I set out to validate the intuition that Roget's Thesaurus, sometimes treated as a book of synonyms, allows us to measure semantic similarity effectively. In a few typical tests, we compare how Roget's and WordNet help measure semantic similarity. One of the benchmarks is Miller and Charles' list of 30 noun pairs to which human judges had assigned similarity measures. The 30 pairs can be traced back to Rubenstein and Goodenough's 65 pairs, which we have also studied. Our Roget's-based system achieves correlations of .878 for the smaller and .818 for the larger list of noun pairs; the .878 is quite close to the .885 that Resnik obtained when he employed humans to replicate the Miller and Charles experiment.
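
As a rough illustration of how a similarity measure can be scored against human judgments on such a list, the sketch below computes a Pearson correlation between two lists of scores. It assumes Pearson's coefficient is the intended correlation, which the abstract does not state. The function name `pearson` and the score lists are hypothetical placeholders, not the benchmark data or the system's output.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only: one entry per noun pair, invented for illustration.
human_ratings = [3.9, 3.5, 1.2, 0.4]   # hypothetical human similarity ratings
system_scores = [16.0, 14.0, 6.0, 2.0]  # hypothetical system similarity scores

if __name__ == "__main__":
    print(round(pearson(human_ratings, system_scores), 3))
```

A value near 1 would mean the system ranks and spaces the pairs much as the human judges do; the .878 and .818 figures above are of this kind.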

We further evaluate our measure by using Roget's and WordNet to answer 80 TOEFL, 50 ESL and 300 Reader's Digest questions: the correct synonym must be selected from a group of four candidate words. Our system answers 78.75%, 82.00% and 74.33% of the questions correctly, respectively, better than any published results.
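
The question format can be sketched as follows: given a problem word and four candidates, pick the candidate with the highest similarity score, then report the percentage of questions answered correctly. This is a minimal sketch, not the system described in the talk. `toy_similarity`, `TOY_SCORES`, and `TOY_QUESTIONS` are invented stand-ins for a real Roget's- or WordNet-based measure and for the actual test sets.

```python
def choose_synonym(target, candidates, similarity):
    """Return the candidate that the similarity function scores highest."""
    return max(candidates, key=lambda c: similarity(target, c))


def percent_correct(questions, similarity):
    """Percentage of (target, candidates, answer) questions answered correctly."""
    correct = sum(
        1
        for target, candidates, answer in questions
        if choose_synonym(target, candidates, similarity) == answer
    )
    return 100.0 * correct / len(questions)


# Hypothetical similarity table; a real system would consult a thesaurus.
TOY_SCORES = {
    frozenset(("angel", "cherub")): 0.90,
    frozenset(("car", "automobile")): 0.95,
}


def toy_similarity(a, b):
    """Look up an invented score; unknown pairs count as dissimilar (0.0)."""
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


# Invented questions in the four-candidate format described above.
TOY_QUESTIONS = [
    ("angel", ["table", "cherub", "river", "engine"], "cherub"),
    ("car", ["banana", "cloud", "automobile", "pencil"], "automobile"),
]

if __name__ == "__main__":
    print(percent_correct(TOY_QUESTIONS, toy_similarity))
```

If every question is scored simply as right or wrong, the reported percentages are consistent with 63 of 80, 41 of 50 and 223 of 300 correct answers; these counts are inferred from the percentages and are not stated in the abstract.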