Jan 21, 2015

DATE:	Wed, Jan 21, 2015
TIME:	12:00 pm
PLACE:	Council Room (SITE 5-084)
TITLE:	Learning from Imbalanced Data Using Ensemble Methods and Cluster-Based Undersampling
PRESENTER:	Parinaz Sobhani, University of Ottawa
ABSTRACT: Imbalanced data, in which one class has far more instances than the others, are common in many domains such as fraud detection, telecommunications management, oil spill detection, and text classification. Traditional classifiers perform poorly on data affected by both within-class and between-class imbalance. In this paper, we propose the ClusFirstClass algorithm, which uses cluster analysis to help classifiers build accurate models from such imbalanced datasets. To obtain balanced classes, all minority instances are used together with an equal number of majority instances. To further reduce the impact of within-class imbalance, the majority instances are clustered into groups and at least one instance is selected from each cluster. Experimental results on several highly imbalanced datasets show that ClusFirstClass yields promising results compared to state-of-the-art classification approaches.
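The balancing step described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the ClusFirstClass implementation: it assumes plain k-means as the clustering method, a user-chosen number of clusters `k`, and uniform random sampling to fill the majority slots left after one point is taken from each cluster. The names `kmeans`, `cluster_undersample`, and `squared_distance` are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering method); returns k clusters, some possibly empty."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k randomly chosen points
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep the old centroid if the cluster is empty
        centroids = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return clusters


def cluster_undersample(minority, majority, k, seed=0):
    """Hypothetical sketch: every minority point plus len(minority) majority points,
    with at least one majority point taken from each non-empty cluster."""
    n = len(minority)
    if not 1 <= k <= n <= len(majority):
        raise ValueError("require 1 <= k <= len(minority) <= len(majority)")
    rng = random.Random(seed)
    clusters = [cl for cl in kmeans(majority, k, seed=seed) if cl]
    chosen, leftover = [], []
    for cl in clusters:
        pool = cl[:]
        rng.shuffle(pool)
        chosen.append(pool.pop())  # guaranteed representative of this cluster
        leftover.extend(pool)
    # Fill the remaining quota uniformly at random (an assumption) from unpicked majority points
    chosen.extend(rng.sample(leftover, n - len(chosen)))
    # Label 1 = minority class, 0 = majority class
    return [(p, 1) for p in minority] + [(p, 0) for p in chosen]


if __name__ == "__main__":
    gen = random.Random(1)
    majority = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(90)]
    majority += [(gen.gauss(10, 0.5), gen.gauss(10, 0.5)) for _ in range(8)]  # small sub-concept
    minority = [(gen.gauss(5, 1), gen.gauss(-5, 1)) for _ in range(12)]
    train = cluster_undersample(minority, majority, k=3)
    print(sum(1 for _, y in train if y == 1), sum(1 for _, y in train if y == 0))  # 12 12
```

Taking one point per cluster first ensures that any sub-concept k-means isolates in its own cluster is not dropped entirely, while uniform filling keeps the rest of the sample roughly proportional to cluster sizes. The method presented in the talk may use a different clustering algorithm or selection rule.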