DATE: Wednesday, Oct. 3, 2007
TIME: 4:00 pm
PLACE: Council Room (SITE 5-084)
TITLE: Semantic Similarity of Short Texts
PRESENTER: Aminul Islam
University of Ottawa
ABSTRACT:

We present a method for measuring the semantic similarity of texts using normalized and modified versions of the Longest Common Subsequence (LCS) string matching algorithm, a corpus-based measure of semantic word similarity, and an optional measure of common word order similarity. Existing methods for computing text similarity have focused mainly on either large documents or individual words. Here, we focus on computing the similarity between short texts of sentence length. The proposed method can be applied in a variety of settings involving textual knowledge representation and knowledge discovery. Evaluation results on two different data sets show that our method outperforms several competing methods.
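
As a rough illustration of the string matching component only, the sketch below computes a length-normalized LCS score between two strings. The abstract does not give the normalization, so the formula used here (squared LCS length divided by the product of the two string lengths) is an assumption, and the function names lcs_length and normalized_lcs are hypothetical. The corpus-based word similarity and the word order measure are not covered.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """Assumed normalization: LCS length squared over the product of lengths.

    Returns a value in [0, 1]; identical non-empty strings score 1.0.
    """
    if not a or not b:
        return 0.0
    n = lcs_length(a, b)
    return (n * n) / (len(a) * len(b))


if __name__ == "__main__":
    print(normalized_lcs("similarity", "similar"))
```

Squaring the LCS length before dividing keeps the score symmetric in its two arguments and penalizes pairs whose lengths differ greatly, which is one plausible reason to prefer it over dividing by a single length.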