DATE: | Tuesday, Mar 9, 2010 |
TIME: | 3:30 pm |
PLACE: | Council Room (SITE 5-084) |
TITLE: | Contextual search for term equivalents in terminological databases |
PRESENTER: | Caroline Barriere, NRC |
ABSTRACT:
When a translator looks up an unknown technical term in a term bank, he or she hopes to find a term equivalent (translation) that properly expresses the meaning of the term in the text being translated. Many terms, even specialised ones, are polysemous and are assigned multiple records in the term bank, one for each meaning. The problem of finding the appropriate record in the term bank is very similar to the word-sense disambiguation problem of finding the appropriate entry in a dictionary. Although word-sense disambiguation is widely studied within the computational linguistics community, the solutions proposed for general-language dictionaries rely mainly on definitional information, and unfortunately such information is not mandatory in a term bank. In the term bank, many terms have either very short definitions or no definitions at all, but every term must be assigned to one or more domains from a pre-established list of domains. Our research makes use of this domain information. We developed one algorithm that automatically assigns a domain profile to a source text, and a second algorithm that matches a term's domains (as found in the term bank) against the text's domain profile. For our experiments, bilingual abstracts (French-English) from eight scientific journals provide 1130 pairs of term equivalents. The Grand Dictionnaire Terminologique (Office Québécois de la Langue Française) is used as the terminological resource. On our data set, we show a 75% reduction in the average rank of the correct equivalent, compared with a random choice. |
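The abstract does not spell out either algorithm, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: the toy term bank, its domain labels, and the functions `domain_profile` and `rank_records` are all hypothetical and are not taken from the talk or from the Grand Dictionnaire Terminologique. The profile here is assumed to be the relative frequency of the domains attached to every known term in the text, and a record is assumed to score by the highest-weighted domain it shares with that profile.

```python
import re
from collections import Counter

# Hypothetical toy term bank (illustrative entries only): each source term
# maps to one record per meaning, with an equivalent and a set of domains.
TERM_BANK = {
    "cell": [
        {"equivalent": "pile", "domains": {"electricity"}},
        {"equivalent": "cellule", "domains": {"biology"}},
    ],
    "membrane": [{"equivalent": "membrane", "domains": {"biology"}}],
    "culture": [
        {"equivalent": "culture", "domains": {"biology", "agriculture"}},
    ],
}


def domain_profile(text, term_bank=TERM_BANK):
    """Assumed profile: relative frequency of the domains of all known terms."""
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        for record in term_bank.get(token, []):
            counts.update(record["domains"])
    total = sum(counts.values())
    if total == 0:
        return {}  # no known term in the text, so no domain evidence
    return {domain: n / total for domain, n in counts.items()}


def rank_records(term, profile, term_bank=TERM_BANK):
    """Order a term's records by their best-matching domain weight."""
    def score(record):
        return max((profile.get(d, 0.0) for d in record["domains"]), default=0.0)

    # sorted() is stable, so records with equal scores keep term-bank order.
    return sorted(term_bank.get(term, []), key=score, reverse=True)


if __name__ == "__main__":
    text = "The cell membrane separates the cell from the culture medium."
    profile = domain_profile(text)
    for record in rank_records("cell", profile):
        print(record["equivalent"], sorted(record["domains"]))
```

In this sketch the ambiguous term itself also contributes to the profile; whether the actual method includes or excludes it is not stated in the abstract.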