| DATE: | Monday, Nov 22, 2010 |
| TIME: | 3:30 pm |
| PLACE: | Council Room (SITE 5-084) |
| TITLE: | Accelerating K-Means Revisited - Sparse Data Vectors |
| PRESENTER: | Andrew McPherson, IT Research and Development, CSEC |
| ABSTRACT:
In 2003 Charles Elkan reported a considerable gain in speed for K-Means clustering using the triangle inequality. In this talk we present further work specifically targeting the case where the entities being clustered are represented by very sparse vectors in a very high-dimensional space. This situation arises naturally when clustering a large set of documents represented by bag-of-words vectors. We are concerned only with speed, and we concentrate on small numbers of clusters of the kind used in iterative clustering schemes such as X-Means. |
|
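
For readers unfamiliar with the triangle-inequality idea the abstract refers to, the following is a minimal illustrative sketch, not the presenter's method. It shows one assignment pass that skips a point-to-centroid distance whenever d(c_best, c_j) >= 2 d(x, c_best), since that guarantees d(x, c_j) >= d(x, c_best). The names `sparse_dist` and `assign`, the dict-based sparse vectors, and the dense centroid lists are all assumptions made for illustration.

```python
import math

def sparse_dist(x, c, c_norm2):
    """Euclidean distance between a sparse point x (dict: index -> value)
    and a dense centroid c, touching only the nonzero entries of x.
    Uses ||x - c||^2 = ||c||^2 + sum_i (x_i^2 - 2 * x_i * c_i)."""
    s = c_norm2 + sum(v * v - 2.0 * v * c[i] for i, v in x.items())
    return math.sqrt(max(0.0, s))  # clamp tiny negative rounding errors

def assign(points, centroids):
    """One assignment pass with triangle-inequality pruning (illustrative).
    Returns (labels, number of skipped distance computations)."""
    k = len(centroids)
    c_norm2 = [sum(v * v for v in c) for c in centroids]
    # Centroid-to-centroid distances, computed once per pass.
    cc = [[math.dist(a, b) for b in centroids] for a in centroids]
    labels, skipped = [], 0
    for x in points:
        best = 0
        best_d = sparse_dist(x, centroids[0], c_norm2[0])
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 d(x, c_best), c_j cannot be closer.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = sparse_dist(x, centroids[j], c_norm2[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped

# Hypothetical toy data: two sparse points in a 4-dimensional space.
points = [{0: 1.0}, {3: 9.0}]
centroids = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 10.0]]
labels, skipped = assign(points, centroids)
```

In this toy run the first point coincides with the first centroid, so the comparison against the second centroid is pruned; the second point requires both distances. With few clusters the centroid-to-centroid table is small, which is one reason the pruning test is cheap in that setting.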