DATE: Wednesday, Oct. 10, 2007
TIME: 4:00 pm
PLACE: Council Room (SITE 5-084)
TITLE: Progressive Border Sampling
PRESENTER: John Li
University of Ottawa
ABSTRACT:

Selecting a small training set from the original dataset can reduce the cost of learning from massive amounts of data and help learn better classifiers. The Progressive Sampling (PS) approach proposed in previous research learns from small samples by combining random sampling with progressive learning techniques. This paper proposes an extension of PS called Progressive Border Sampling (PBS). Rather than sampling at random, PBS performs stratified sampling from the inherent border defined by the labeled training examples. This border, consisting of data points lying close to the boundary that separates the training examples of different classes, is divided into two parts: the near border and the far border. The data points located on this border are believed to be more informative than the others, and a few of them should suffice to learn better classifiers. As a result, PBS is expected to discover, in a more tractable way than PS, smaller sets of data points to substitute for the full training set in the learning task. Our experimental results on 30 selected benchmark datasets from the UCI repository show the improvement and strength that PBS yields over PS.
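To illustrate the intuition behind border points, the following minimal sketch flags points whose nearest neighbor belongs to a different class. This is only one simple criterion for "lying close to the boundary" and is an assumption for illustration; it is not necessarily the border definition used by PBS itself.

```python
import numpy as np

def border_points(X, y):
    """Flag points whose nearest neighbor (Euclidean) has a different label.

    A simple illustrative border criterion; the actual PBS border
    construction may differ.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)   # exclude each point itself
    nn = d2.argmin(axis=1)         # index of each point's nearest neighbor
    return y[nn] != y              # True where the neighbor's label differs

# Two small 1-D clusters; only the points facing the other class are flagged.
X = np.array([[0.0], [1.0], [2.5], [3.0], [4.5], [5.5]])
y = np.array([0, 0, 0, 1, 1, 1])
print(border_points(X, y))  # the two interior points near the boundary are True
```

Under this criterion, interior points surrounded by their own class are discarded, while the pair of points straddling the class boundary is kept, matching the abstract's claim that a few boundary-adjacent points can stand in for the full training set.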

This is joint work by Guichong Li and Nathalie Japkowicz of the University of Ottawa, and Trevor J. Stocki and R. Kurt Ungar of the Radiation Protection Bureau, Health Canada.