DATE: Thursday, Jan. 13, 2005
TIME: 1:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Comparison between Similarity Measurements of Vector Space Model and Probabilistic Model on Arabic Text
PRESENTER: Muath Alzghool
University of Ottawa
ABSTRACT:

We designed and built an automatic information retrieval system for Arabic text. For evaluation, we selected 242 Arabic documents and 20 queries; all of the documents are about computer science and information systems. The system implements two traditional retrieval models: the Vector Space Model (VSM) and the Probabilistic Model. For the VSM, we used the Cosine, Dice, and Jaccard similarity measures. We compared the retrieval results of the two models and found that the Probabilistic Model performs better than the Vector Space Model on Arabic documents. We also compared two indexing methods: full-word indexing and root indexing. Root indexing improved retrieval performance over full-word indexing on the Arabic documents and also reduced the size of the stored data. Our results are better than previously reported results in similar settings, due to a better root extraction method.
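
For readers unfamiliar with the three VSM similarity measures named in the abstract, the following is a minimal sketch of their common textbook forms for weighted term vectors. It is not the presenter's implementation: the function names, the example vectors, and the choice of the weighted (dot-product) variants of Dice and Jaccard are assumptions made for illustration only.

```python
import math


def dot(x, y):
    """Inner product of two equal-length term-weight vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    """Cosine measure: dot product divided by the product of vector lengths."""
    denom = math.sqrt(dot(x, x)) * math.sqrt(dot(y, y))
    return dot(x, y) / denom if denom else 0.0


def dice(x, y):
    """Dice measure (weighted form): 2 * dot / (|x|^2 + |y|^2)."""
    denom = dot(x, x) + dot(y, y)
    return 2 * dot(x, y) / denom if denom else 0.0


def jaccard(x, y):
    """Jaccard measure (weighted form): dot / (|x|^2 + |y|^2 - dot)."""
    denom = dot(x, x) + dot(y, y) - dot(x, y)
    return dot(x, y) / denom if denom else 0.0


# Hypothetical term-weight vectors over a three-term vocabulary.
query = [1.0, 1.0, 0.0]
document = [1.0, 0.0, 1.0]

for name, measure in (("cosine", cosine), ("dice", dice), ("jaccard", jaccard)):
    print(f"{name}: {measure(query, document):.3f}")
```

In a retrieval setting, each document would be scored against the query with one of these functions and the documents ranked by descending score; how the term weights are computed (for example, from full words or from extracted roots) is outside this sketch.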