DATE: Friday, Nov. 22, 2002
TIME: 3:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Learning Sparse Classifiers with Data-Dependent Features
PRESENTER: Mario Marchand
University of Ottawa
ABSTRACT:

A substantial amount of machine learning research is currently directed towards building sparse (compact) classifiers that also generalize well (i.e., predict well on unseen data). We present learning algorithms that achieve this goal by constructing simple functions (conjunctions, disjunctions, or decision lists) over Boolean-valued features that are defined only with respect to a training set. These algorithms generalize classical PAC learning algorithms (proposed by Valiant, Haussler, and Rivest) in two ways: they allow a tradeoff between complexity and accuracy, and they accept arbitrary feature sets.

Since the class of functions explored by these learning algorithms is data-dependent, standard VC theory cannot be used to assess their performance. However, by extending an approach pioneered by Floyd, Littlestone, and Warmuth, we are able to bound the generalization error in terms of the sparsity achieved on the training data.

On real-world data sets from the UCI repository, our proposed learning algorithms (called set covering machines and decision list machines) are generally as accurate as (and sometimes more accurate than) the well-known support vector machine, yet produce much sparser classifiers; on some data sets, our classifiers are more than 100 times sparser (smaller) than support vector machines. We also show that our generalization error bounds provide an effective guide for choosing the model-selection parameters of our learning algorithms.
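To make the set-covering idea concrete, here is a minimal sketch of a greedy construction of a conjunction over data-dependent Boolean features. This is an illustration only, not the speaker's implementation: the feature set (simple thresholds derived from training values), the penalty scheme, and the toy data are all assumptions.

```python
# Illustrative sketch of a greedy set-covering construction: build a
# conjunction of Boolean features, at each step adding the feature that
# rules out the most still-uncovered negative examples, minus a penalty
# for the positive examples it misclassifies. Assumed setup, not the
# algorithm presented in the talk.

def scm_conjunction(features, X, y, penalty=1.0, max_features=4):
    """Greedily select features whose conjunction classifies y == 1."""
    chosen = []
    # Negative examples not yet ruled out by any chosen feature.
    uncovered = {i for i, label in enumerate(y) if label == 0}
    while uncovered and len(chosen) < max_features:
        best, best_score = None, 0
        for f in features:
            if f in chosen:
                continue
            ruled_out = sum(1 for i in uncovered if not f(X[i]))
            pos_errors = sum(1 for i, label in enumerate(y)
                             if label == 1 and not f(X[i]))
            score = ruled_out - penalty * pos_errors
            if score > best_score:
                best, best_score = f, score
        if best is None:  # no remaining feature improves coverage
            break
        chosen.append(best)
        uncovered = {i for i in uncovered if best(X[i])}
    return chosen, (lambda x: all(f(x) for f in chosen))

# Tiny 1-D example: positives lie in [4, 5], negatives outside.
values = [1.0, 2.0, 4.0, 5.0, 8.0, 9.0]
labels = [0, 0, 1, 1, 0, 0]
# Data-dependent features: one threshold per training value, each direction.
feats = ([(lambda x, v=v: x >= v) for v in values]
         + [(lambda x, v=v: x <= v) for v in values])
chosen, classify = scm_conjunction(feats, values, labels)
```

On this toy data the greedy procedure selects just two of the twelve candidate features, the conjunction (x >= 4) and (x <= 5), which classifies every training point correctly; this is the kind of sparsity the abstract emphasizes.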
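The sparsity-based bound mentioned in the abstract can be illustrated with the standard sample-compression bound in the spirit of Littlestone and Warmuth; the exact bound used in the talk may differ, so treat this as an assumed, simplified form.

```python
# Sketch of a sample-compression generalization bound: if a classifier can
# be reconstructed from d of the m training examples and makes no training
# errors, then with probability at least 1 - delta its true error is below
# eps(m, d, delta). Simplified, assumed form -- not necessarily the bound
# proved in the talk.

from math import comb

def compression_bound(m, d, delta):
    """Error bound for a consistent classifier compressed to d of m examples.

    Solves C(m, d) * (1 - eps)^(m - d) <= delta for eps:
        eps = 1 - (delta / C(m, d)) ** (1 / (m - d))
    """
    assert 0 < d < m and 0 < delta < 1
    return 1 - (delta / comb(m, d)) ** (1.0 / (m - d))
```

The bound tightens as the classifier gets sparser (smaller d) and as the training set grows (larger m), which is why the sparsity achieved on the training data can serve as a guide for model selection.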