DATE: | Monday, Nov 22, 2010 |
TIME: | 3:30 pm |
PLACE: | Council Room (SITE 5-084) |
TITLE: | Accelerating K-Means Revisited - Sparse Data Vectors |
PRESENTER: | Andrew McPherson, IT Research and Development, CSEC |
ABSTRACT:
In 2003 Charles Elkan reported a considerable gain in speed for K-Means clustering using the triangle inequality. In this talk we present further work specifically targeting the case where the entities being clustered are represented by very sparse vectors in a very high-dimensional space. This situation arises naturally when clustering a large set of documents represented by bag-of-words vectors. We are concerned only with speed, and we concentrate on the small numbers of clusters used in iterative clustering schemes such as X-Means. |
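
For readers unfamiliar with the triangle-inequality idea, the following is a minimal sketch of the basic pruning test only, not the method presented in the talk. It relies on the fact that if d(c, c') >= 2 d(x, c), then d(x, c') >= d(x, c), so the distance from x to c' need not be computed. The representation (sparse points as dicts mapping index to value, dense centers as lists) and the names `sparse_dist` and `assign_with_pruning` are assumptions made for illustration.

```python
import math


def sparse_dist(x, center, center_sq_norm):
    """Euclidean distance between a sparse point and a dense center.

    x is a dict {index: value} holding only the nonzero entries.
    Only the nonzero entries of x are visited:
    ||x - c||^2 = ||c||^2 + sum_i ((x_i - c_i)^2 - c_i^2) over nonzero x_i.
    """
    sq = center_sq_norm
    for i, xi in x.items():
        ci = center[i]
        sq += (xi - ci) ** 2 - ci ** 2
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping distance
    computations that the triangle inequality proves unnecessary.

    Returns (labels, skipped), where skipped counts avoided computations.
    """
    k = len(centers)
    sq_norms = [sum(v * v for v in c) for c in centers]
    # Center-to-center distances, computed once per assignment pass.
    cc = [[math.dist(centers[a], centers[b]) for b in range(k)]
          for a in range(k)]

    labels, skipped = [], 0
    for x in points:
        best = 0
        best_d = sparse_dist(x, centers[0], sq_norms[0])
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 d(x, c_best), c_j cannot be closer.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = sparse_dist(x, centers[j], sq_norms[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped
```

The sketch applies the test within a single assignment pass. Elkan's full algorithm additionally carries distance bounds from one iteration to the next, which is not shown here.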