DATE: | Wednesday, May 23, 2012 |
TIME: | 3:00 pm |
PLACE: | Council Room (SITE 5-084) |
TITLE: | Document Clustering with Dual Supervision |
PRESENTER: | Evangelos Milios, Dalhousie University |
ABSTRACT: Academic researchers typically maintain personal libraries of papers,
which they would like to organize according to their own needs.
Clustering techniques are often employed to achieve this goal by
grouping the document collection into different topics.
Unsupervised clustering does not require any user effort but only
produces one universal output with which users may not be
satisfied. Therefore, document clustering needs user input for guidance
to generate personalized clusters. Semi-supervised
clustering incorporates prior information and has the potential to
produce customized clusters. Traditional semi-supervised
clustering is based on user supervision in the form of labeled instances
or pairwise instance constraints. However, alternative
forms of user supervision exist such as labeling features. The joint use
of document-level and feature-level supervision has been
called dual supervision. We first explore a framework to use feature
supervision for feature selection by indicating
whether a feature is useful for clustering. Second, we enhance
semi-supervised clustering with feature supervision using feature
re-weighting. Third, we propose a unified framework to combine document
supervision and feature supervision through seeding. The
newly proposed algorithms are evaluated using oracles and are shown to
produce clusters that match a single user's point of view better than
document clustering without any supervision or with only document
supervision. Finally, we
conduct a user study to confirm that different users have different
understandings of the same document collection and prefer
personalized clusters. At the same time, we demonstrate that document
clustering with dual supervision is able to produce good
personalized clusters even with noisy user input. Dual supervision is
also demonstrated to work better in personalization than any
single form of supervision.
|
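
To make the idea of dual supervision through seeding more concrete, the
following is a minimal illustrative sketch, not the presenter's algorithm.
It assumes bag-of-words documents stored as term-count dictionaries, a
spherical k-means style loop with cosine similarity, and a fixed boost
factor for user-labeled features. All names (reweight, dual_seeded_kmeans,
doc_seeds, feature_seeds, boost) are hypothetical and chosen only for
this example.

```python
import math
from collections import defaultdict

def reweight(doc, useful_features, boost=2.0):
    """Feature supervision: scale up terms the user labeled as useful."""
    return {t: w * (boost if t in useful_features else 1.0) for t, w in doc.items()}

def normalize(vec):
    """Scale a sparse vector to unit length (empty vectors stay empty)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def mean(vectors):
    """Unit-length centroid of a list of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return normalize(total)

def dual_seeded_kmeans(docs, doc_seeds, feature_seeds, boost=2.0, iters=10):
    """docs: list of {term: count}.
    doc_seeds: {cluster: [document indices]} (document supervision).
    feature_seeds: {cluster: set of terms} (feature supervision)."""
    useful = set().union(*feature_seeds.values())
    vecs = [normalize(reweight(d, useful, boost)) for d in docs]
    clusters = sorted(set(doc_seeds) | set(feature_seeds))

    # Seed each centroid from its labeled documents plus a pseudo-document
    # built from its labeled features (an assumption of this sketch).
    centroids = {}
    for c in clusters:
        seeds = [vecs[i] for i in doc_seeds.get(c, [])]
        pseudo = normalize({t: 1.0 for t in feature_seeds.get(c, ())})
        if pseudo:
            seeds.append(pseudo)
        centroids[c] = mean(seeds)

    # Labeled documents keep their user-assigned cluster throughout.
    fixed = {i: c for c, idxs in doc_seeds.items() for i in idxs}
    labels = []
    for _ in range(iters):
        new_labels = [
            fixed.get(i, max(clusters, key=lambda c: cosine(v, centroids[c])))
            for i, v in enumerate(vecs)
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in clusters:
            members = [vecs[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels

# Toy usage: two seed documents and one labeled feature per cluster.
docs = [
    {"cluster": 2, "kmeans": 1},    # labeled "ml" by the user
    {"protein": 2, "gene": 1},      # labeled "bio" by the user
    {"kmeans": 1, "centroid": 1},
    {"gene": 1, "genome": 2},
    {"svm": 2, "data": 1},          # shares no term with the "ml" seed document
]
labels = dual_seeded_kmeans(
    docs,
    doc_seeds={"ml": [0], "bio": [1]},
    feature_seeds={"ml": {"svm"}, "bio": {"genome"}},
)
print(labels)
```

In this toy run the last document shares no term with the "ml" seed
document, so only the labeled feature "svm" can pull it toward that
cluster. That is the kind of case where feature-level supervision may
add information beyond document-level supervision alone.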