DATE: Thu, Feb 25, 2016
TIME: 1:30 pm
PLACE: SITE 5084
TITLE: Sampling to Efficiently Train Bilingual Neural Network Language Models
PRESENTER: Colin Cherry
NRC
ABSTRACT:

The neural network joint model of translation (NNJM) combines source and target context in a 15-gram, feed-forward neural network language model to produce a powerful translation feature. However, its softmax top layer means that probability and gradient calculations require a sum over the entire output vocabulary, resulting in very slow maximum likelihood estimation (MLE) training. This has led some groups to train using Noise Contrastive Estimation (NCE), which sidesteps this sum by sampling from the output vocabulary. We carry out the first direct comparison of MLE and NCE training objectives for the NNJM, showing that NCE is significantly outperformed by MLE on large-scale Arabic-English and Chinese-English translation tasks. We also show that this drop can be avoided by using a recently proposed translation noise distribution. In addition to these translation-specific results, this talk will include a tutorial on Noise Contrastive Estimation, which is a generally useful technique for efficient training of any log-linear model with a large output space.
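The cost contrast between the two objectives is easy to see in code. The sketch below is a minimal illustration only, not the presenter's implementation: the vocabulary size, hidden size, sample count, and unigram noise distribution are all placeholder assumptions, and it covers just the loss computation (not gradients or the translation noise distribution discussed in the talk). It shows why full-softmax MLE touches all V output words while NCE touches only the target word plus K sampled noise words, scoring each candidate with the standard NCE log-odds s(w) - log(K * q(w)).

```python
import numpy as np

rng = np.random.default_rng(0)

V, H, K = 50_000, 32, 100          # vocab size, hidden size, noise samples (toy values)
W = rng.normal(0, 0.01, (V, H))    # output-layer weights, one row per output word
b = np.zeros(V)                    # output-layer biases

def full_softmax_nll(h, target):
    """MLE loss: normalizing requires scores for ALL V output words."""
    scores = W @ h + b                       # O(V * H) -- the training bottleneck
    scores -= scores.max()                   # numerical stability
    log_z = np.log(np.exp(scores).sum())
    return log_z - scores[target]

def nce_loss(h, target, noise_dist):
    """NCE loss: touches only the target row and K sampled noise rows."""
    noise = rng.choice(V, size=K, p=noise_dist)
    words = np.concatenate(([target], noise))
    s = W[words] @ h + b[words]              # O((K + 1) * H)
    # log-odds that a word came from the data rather than the noise
    # distribution q: s(w) - log(K * q(w))
    logits = s - np.log(K * noise_dist[words])
    p_data = 1.0 / (1.0 + np.exp(-logits))
    # words[0] is the true target; the remaining K entries are noise
    return -(np.log(p_data[0]) + np.log(1.0 - p_data[1:]).sum())

# toy unigram noise distribution and a random hidden vector for demonstration
noise_dist = rng.dirichlet(np.ones(V))
h = rng.normal(size=H)
print(full_softmax_nll(h, target=42))
print(nce_loss(h, target=42, noise_dist=noise_dist))
```

Per training example, the full-softmax loss does work proportional to the whole vocabulary while the NCE loss depends only on K + 1 words, which is the efficiency argument the abstract refers to; the choice of noise distribution q is exactly the knob the talk examines.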