DATE: Wed, Jan 17, 2018
TIME: 1 pm
TITLE: Clustering in the Presence of Concept Drift
PRESENTER: Richard Hugh Moulton
University of Ottawa

Cybersecurity is a problem representative of the challenging environment of data streams: potentially infinite data that arrives quickly and evolves over time. An additional challenge inherent in the problem is that labels cannot realistically be assumed to arrive in a timely manner. Clustering is a natural approach to addressing these characteristics, and many algorithms have been proposed for use with data streams. The literature does not, however, provide quantitative descriptions of how these algorithms can be expected to behave in given circumstances. This work addresses this gap by analysing the performance of a wide range of data stream clustering algorithms (DSCAs) applied to categorical and real-valued artificial data streams, as well as to data streams that are representative of cybersecurity use cases.