DATE: Thursday, March 12, 2009
TIME: 2:45 pm
PLACE: Council Room (SITE 5-084)
TITLE: Building specialized-language comparable corpora from the Web
PRESENTER: Caroline Barriere
Interactive Language Technology Group, IIT-NRC
ABSTRACT:

This work establishes the feasibility of building specialized-language comparable corpora from the Web for the purpose of finding term equivalents. We define a "difficult" setting in which a term from a source language (here English) is taken out of context, and the task is to build a corpus in a target language (here French) that contains an equivalent for the term. In such a setting, very encouraging results of over 60% coverage are obtained for small corpora (fewer than 50 documents). These results are based on tests with 30 terms randomly chosen from the Grand Dictionnaire Terminologique (from the Office Québécois de la Langue Française). This presentation describes and discusses the different steps for building the specialized-language comparable corpora, and gives details on the experiments performed and the results obtained.