Jan. 27, 2005

DATE:	Thursday, Jan. 27, 2005
TIME:	1:30 pm
PLACE:	Council Room (SITE 5-084)
TITLE:	PAC-Bayes Learning for Classification of Gene-Expression Data
PRESENTER:	Mohak Shah, University of Ottawa
ABSTRACT: Successfully classifying high-dimensional data, such as gene-expression data, using the smallest possible number of attributes remains a challenge. The standard methods for this purpose fall into two broad categories: filter-based (processing the data prior to learning) and wrapper-based (using the base learning algorithm for feature selection). Both kinds of methods have sometimes yielded good empirical results, but they are not theoretically justified, since no provable guarantees exist for them. We propose a "soft greedy" learning algorithm for building small conjunctions of simple threshold functions, called rays, defined on single real-valued attributes. We also propose a PAC-Bayes risk bound that is minimized by classifiers achieving a non-trivial tradeoff between sparsity (the number of rays used) and the magnitude of the separating margin of each individual ray. Finally, we test the soft greedy algorithm on some DNA microarray data sets.
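
To make the classifier family in the abstract concrete, the following is a minimal Python sketch of a conjunction of rays. It is an illustration only, not the speaker's implementation: the names Ray and conjunction, the direction convention, and the example thresholds are all assumptions, and the sketch covers neither the soft greedy learning algorithm nor the PAC-Bayes bound. It assumes a ray is true when a single attribute lies strictly on one side of a threshold, and that the conjunction predicts the positive class only when every ray is true.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Ray:
    """Hypothetical threshold function on one real-valued attribute."""
    attribute: int    # index of the attribute (e.g. one gene's expression level)
    threshold: float  # cut point on that attribute
    direction: int    # +1: true when value > threshold; -1: true when value < threshold

    def __call__(self, x: Sequence[float]) -> bool:
        # The signed distance to the threshold must be strictly positive.
        return self.direction * (x[self.attribute] - self.threshold) > 0


def conjunction(rays: Sequence[Ray], x: Sequence[float]) -> bool:
    """Predict the positive class only if every ray is true on x."""
    return all(ray(x) for ray in rays)


# Illustrative classifier using two rays (sparsity = 2); the values are made up.
rays = [Ray(attribute=0, threshold=2.0, direction=+1),
        Ray(attribute=3, threshold=0.5, direction=-1)]
```

In this sketch, sparsity corresponds to len(rays), and the distance between an example's attribute value and a ray's threshold plays the role of that ray's margin on the example.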