DATE: | Wednesday, Oct. 10, 2007 |
TIME: | 4:00 pm |
PLACE: | Council Room (SITE 5-084) |
TITLE: | Progressive Border Sampling |
PRESENTER: | John Li, University of Ottawa |
ABSTRACT:
Selecting a small training set from the original dataset can reduce the
cost of learning on massive amounts of data and help learn better
classifiers. The Progressive Sampling (PS) approach proposed in previous
research learns from small samples by combining random sampling with
progressive learning techniques. This paper proposes an extension of PS
called Progressive Border Sampling (PBS). Rather than sampling at random,
PBS performs stratified sampling from the inherent border defined by the
labeled training examples. This border, consisting of the data points
lying close to the boundary that separates the training examples of
different classes, is divided into two parts: the near border and the far
border. The data points located on this border are believed to be more
informative than the others, so a few of them should suffice to learn
better classifiers. As a result, PBS is expected to discover, in a
tractable way, smaller sets of data points than PS to substitute in the
learning task. Our experimental results on 30 selected benchmark datasets
from the UCI repository show the improvement and strength that PBS yields
over PS.
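The border-sampling idea in the abstract can be illustrated with a minimal sketch. This is a hypothetical illustration, not the authors' PBS algorithm: it ranks each training point by its distance to the nearest differently-labeled example, then takes the closest points as the "near border" and the next closest as the "far border". The function names, the distance criterion, and the fraction parameters are all assumptions for illustration.

```python
# Hypothetical border-sampling sketch (not the authors' PBS algorithm).
# Idea: points closest to an example of another class lie near the decision
# boundary and are assumed to be the most informative for learning.
import math

def nearest_opposite_distance(points, labels, i):
    """Distance from point i to its nearest differently-labeled neighbor."""
    best = math.inf
    for p, y in zip(points, labels):
        if y != labels[i]:
            best = min(best, math.dist(points[i], p))
    return best

def border_sample(points, labels, near_frac=0.25, far_frac=0.25):
    """Split indices into near-border and far-border subsets by distance rank.

    near_frac / far_frac are illustrative parameters controlling how many
    points land in each border subset.
    """
    dists = [nearest_opposite_distance(points, labels, i)
             for i in range(len(points))]
    order = sorted(range(len(points)), key=lambda i: dists[i])
    n_near = max(1, int(near_frac * len(points)))
    n_far = max(1, int(far_frac * len(points)))
    return order[:n_near], order[n_near:n_near + n_far]

# Two well-separated one-dimensional clusters along the x-axis: the points
# facing the opposite cluster are identified as the border.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
ys = [0, 0, 0, 1, 1, 1]
near, far = border_sample(pts, ys)
print(near, far)  # the innermost points of each cluster rank first
```

A learner would then be trained on the union of the selected border subsets instead of the full dataset, mirroring the substitution role the abstract describes.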