DATE: Thursday, Feb. 10, 2005
TIME: 1:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Cross-Lingual Information Retrieval and the Amazing Utility of Comparable Corpora
PRESENTER: Fatiha Sadat
NRC
ABSTRACT:

Expanded international collaboration, the growing availability of electronic foreign-language texts and resources, and the rising number of non-English-speaking users compel us to develop Cross-Lingual Information Retrieval (CLIR) tools capable of bridging the language barrier. CLIR does so by enabling a person to search in one language and retrieve documents written in other languages.

In this talk, I will focus on the major problems associated with CLIR and on their solutions, such as the use of comparable corpora. Evaluations on the Japanese-English language pair, using different weighting schemes of the SMART retrieval system, showed that combining different resources for query translation greatly improves the effectiveness of CLIR.