DATE: Tuesday, Nov 17, 2009
TIME: 3:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Evaluating Machine Learning: Scored Receiver Operating Characteristics Curves (SROC)
PRESENTER: William Klement
University of Ottawa
ABSTRACT:

In machine learning, standard learning tasks include learning to classify, learning to rank, and learning to estimate probabilities. Such tasks may be performed by estimating scores related to class membership. These scores can produce classification decisions when a classification threshold is imposed, can be ordered to produce a ranking, or may be used as estimates of class-membership probabilities, provided that certain statistical assumptions are met.
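
As a minimal sketch of two of the uses of scores described above, the Python snippet below thresholds a list of hypothetical scores into class decisions and orders them into a ranking. The scores, the threshold of 0.5, and the function names `classify` and `rank` are illustrative assumptions and are not part of the presented work. Reading the same scores as probability estimates would additionally require them to satisfy the statistical assumptions mentioned above, for example being well calibrated.

```python
# Hypothetical scores for five examples (higher = more likely positive).
scores = [0.92, 0.35, 0.71, 0.10, 0.55]

def classify(scores, threshold=0.5):
    """Turn scores into 0/1 class decisions by imposing a threshold (assumed 0.5)."""
    return [1 if s >= threshold else 0 for s in scores]

def rank(scores):
    """Return example indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

print(classify(scores))  # [1, 0, 1, 0, 1]
print(rank(scores))      # [0, 2, 4, 1, 3]
```

Changing the threshold changes the decisions but not the ranking. This is why a single set of scores can support several evaluation views.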

In recent years, the machine learning community has developed well-established methods to evaluate such learning tasks. In particular, the Receiver Operating Characteristics (ROC) curve is an evaluation method that depicts the ranking performance of a learning algorithm based on its classification decisions, which are obtained by imposing thresholds on the scores. For the task of estimating probabilities, statistics provides extensive evaluation methods; however, these methods rely on assumptions.
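
The following sketch uses assumed toy data to show how an ROC curve can be traced by sweeping a threshold over the scores. Each threshold yields one set of classification decisions and therefore one (false positive rate, true positive rate) point. The helper names `roc_points` and `auc` and the example data are hypothetical. The area under the curve is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Compute ROC points (FPR, TPR) by sweeping a threshold over the scores.

    Each distinct score serves as a threshold; an example is predicted
    positive when its score is >= the threshold.
    """
    pos = sum(labels)              # number of positive examples (label 1)
    neg = len(labels) - pos        # number of negative examples (label 0)
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores and true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]

pts = roc_points(scores, labels)
print(round(auc(pts), 3))  # 0.889
```

In this example the area is 8/9. That value equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, which is why the curve summarizes ranking quality.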

In this research, we claim that measuring similarities and differences between the scores produced by a learning algorithm conveys performance information that the standard ROC curve omits. On the one hand, for a classification or ranking task, the ROC curve ignores the magnitudes of, and margins between, such scores once the curve is determined. On the other hand, these scores may not meet the assumptions needed to treat them as probabilities. Therefore, our work is concerned with measuring similarities or differences between sets of scores resulting from such tasks. This research develops the Scored ROC (SROC) curve, which extends the standard ROC curve to incorporate such scores. Our experimental results show that measuring similarities or differences between learning algorithms using the Scored ROC curve provides an intuitive performance evaluation.
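
The claim that the ROC curve ignores score magnitudes and margins can be illustrated with two hypothetical scorers. Both order the same examples identically but spread their scores very differently, and their ROC points coincide exactly. The `score_gap` helper (mean positive score minus mean negative score) is a simple contrast measure assumed here for illustration only. It is not the Scored ROC curve developed in this research.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) from sweeping a threshold over the distinct scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / neg, tp / pos))
    return pts

def score_gap(scores, labels):
    """Hypothetical contrast measure: mean positive score minus mean negative score."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)

labels = [1, 1, 0, 1, 0, 0]
# Hypothetical scorer A: same ordering as B, with large gaps between scores.
scores_a = [0.95, 0.90, 0.60, 0.55, 0.10, 0.05]
# Hypothetical scorer B: same ordering, all scores crowded just above 0.5.
scores_b = [0.56, 0.55, 0.54, 0.53, 0.52, 0.51]

same_curve = roc_points(scores_a, labels) == roc_points(scores_b, labels)
print(same_curve)                            # True
print(round(score_gap(scores_a, labels), 2))  # 0.55
print(round(score_gap(scores_b, labels), 2))  # 0.02
```

The ROC curve cannot distinguish A from B because only the ordering of the scores enters the threshold sweep. A score-aware measure separates them.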