ABSTRACT:
In machine learning, standard learning tasks include learning to
classify, learning to rank, and learning to estimate probabilities.
Such tasks may be performed by estimating scores related to class
memberships. These scores can produce classification decisions once a
classification threshold is imposed, can be ordered to produce a
ranking, or may serve as probability estimates of class memberships
provided that certain statistical assumptions are met.
In recent years, the machine learning community has developed
well-established methods to evaluate such learning tasks.
In particular, the Receiver Operating Characteristic (ROC) curve is an
evaluation method that depicts the ranking performance of a learning
algorithm based on its classification decisions, which are obtained by
imposing a threshold on the scores. For the task of estimating
probabilities, statistics provides extensive methods for evaluating
performance; these methods, however, require making assumptions.
In this research, we claim that measuring similarities and differences
between the scores produced by a learning algorithm conveys performance
information omitted by the standard ROC curve. On the one hand, for a
classification or a ranking task, the ROC curve ignores the magnitudes
of, and the margins between, such scores once the curve is determined.
On the other hand, these scores may not meet the assumptions needed to
treat them as probabilities. Our work is therefore concerned with
measuring similarities or differences between sets of scores resulting
from such tasks. This research develops the Scored ROC Curve, which
extends the standard ROC curve to incorporate these scores. Our
experimental results show that measuring similarities or differences
between learning algorithms using the Scored ROC curve provides an
intuitive evaluation of their performance.