DATE: | Wed, Sept 25, 2013 |
TIME: | 11:45 am |
PLACE: | Council Room (SITE 5-084) |
TITLE: | Making Sense in Translation: Lexical Choice Errors When Translating Across Domains |
PRESENTER: | Marine Carpuat, NRC |
ABSTRACT:
While Statistical Machine Translation has achieved significant progress in recent years, state-of-the-art
systems cannot yet be trusted to convey the correct semantics of the original language. Performance is
particularly poor when systems are applied to test domains that differ from their training domain. In this
talk, I will present an analysis of lexical choice errors observed when porting a French-English system
trained on the Canadian Hansard to very different new domains (e.g., scientific papers or movie
subtitles). I will show that many errors fall into a category that has not been addressed in the machine
translation literature: French words that acquire new senses in the new domain. For instance, the word
"régime" is frequently used in the "political regime" sense in the Hansard, while the previously unseen
"diet" sense is more frequent in scientific articles. I will introduce a novel approach for detecting such
words automatically, using cues inspired by word sense disambiguation/induction models. This case study
highlights potential for future research at the intersection of machine translation and lexical semantics.
|