DATE: Wed, Oct 30, 2013
TIME: 11:45 am
PLACE: Council Room (SITE 5-084)
TITLE: Feature Space Selection and Combination for Native Language Identification
PRESENTER: Cyril Goutte (NRC)
ABSTRACT:

We describe the National Research Council Canada's submission to a shared task on Native Language Identification, and provide an analysis of the results. Our systems rely on support vector machine (SVM) classifiers trained on various combinations of feature spaces describing lexical and syntactic characteristics of documents written by ESL learners. Somewhat surprisingly, classifiers using only surface-form information perform very well, yielding an error rate of around 20% over the 11 classes. However, the best performance is obtained by combining different models using majority voting.
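
As a rough illustration of the combination step mentioned in the abstract, the sketch below shows plain majority voting over the label predictions of several classifiers. It is a minimal sketch and not the submitted system: the function name majority_vote, the example language codes, and the tie-breaking rule (prefer the label from the earliest-listed model) are all assumptions made here for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label predictions by majority voting.

    `predictions` is a list with one entry per model; each entry is a list
    of predicted labels, one per document, in the same document order.
    Ties are broken in favour of the earliest-listed model (an assumption).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_docs = len(predictions[0])
    if any(len(p) != n_docs for p in predictions):
        raise ValueError("all models must predict the same number of documents")

    combined = []
    for votes in zip(*predictions):  # the votes for a single document
        counts = Counter(votes)
        top = max(counts.values())
        # First label, in model order, that reaches the highest vote count.
        winner = next(label for label in votes if counts[label] == top)
        combined.append(winner)
    return combined


# Hypothetical predictions from three models on three documents.
model_outputs = [
    ["FRE", "GER", "SPA"],
    ["FRE", "SPA", "SPA"],
    ["GER", "GER", "ITA"],
]
combined_labels = majority_vote(model_outputs)
```

Here each document receives the label chosen by at least two of the three hypothetical models. In practice the tie-breaking rule matters most when the number of models is even or when many classes split the vote.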