DATE: Thursday, Mar. 10, 2005
TIME: 1:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Building Predictors for Horizontally and Vertically Distributed Data
PRESENTER: Sabine McConnell
Queen's University
ABSTRACT:

Due to privacy concerns and the large volume of data available today, data are often distributed across institutional, geographical, and organizational boundaries. Existing parallel and distributed data mining approaches either require communicating the data to a central site, are communication-intensive, or lose prediction accuracy. In this talk, we will demonstrate that a simple strategy, building local predictors for data partitioned by both samples and attributes and then combining them through simple voting schemes, can be as effective as a predictor built from a centralized dataset. This method exchanges models or classification results rather than raw data, which makes it suitable for privacy-preserving data mining. In addition, both final model size and runtime are typically reduced compared to a centralized model. We will also discuss the extension of this approach to applications in sensor networks and to astronomical data.
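
The strategy described in the abstract can be illustrated with a minimal sketch. It is not the presenter's implementation: the nearest-centroid classifier, the toy data, and all names (`train_centroids`, `build_local_predictors`, `vote`, `sample_blocks`, `attribute_blocks`) are assumptions chosen for illustration. Each site sees only its own block of rows (samples) and columns (attributes), trains a local predictor, and only the predicted labels are combined by majority vote.

```python
from collections import Counter

def train_centroids(rows, labels):
    """Nearest-centroid classifier (an assumed stand-in for any local predictor)."""
    counts = Counter(labels)
    sums = {}
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    # Mean feature vector per class.
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict_centroid(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(sorted(model), key=lambda y: dist(model[y]))

def build_local_predictors(X, y, sample_blocks, attribute_blocks):
    """Train one predictor per (sample block, attribute block) pair.

    Each predictor only ever sees its own rows and columns, mimicking data
    that are partitioned both horizontally and vertically.
    """
    predictors = []
    for rows_idx in sample_blocks:
        for cols in attribute_blocks:
            local_X = [[X[r][c] for c in cols] for r in rows_idx]
            local_y = [y[r] for r in rows_idx]
            predictors.append((cols, train_centroids(local_X, local_y)))
    return predictors

def vote(predictors, x):
    """Combine local predictions by simple majority vote.

    Ties go to the label that was voted for first (Counter ordering).
    """
    votes = Counter(
        predict_centroid(model, [x[c] for c in cols]) for cols, model in predictors
    )
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): 8 samples, 4 attributes, 2 well-separated classes.
X = [
    [0, 1, 0, 1], [1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1],
    [9, 10, 9, 10], [10, 9, 10, 9], [10, 10, 9, 9], [9, 9, 10, 10],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

sample_blocks = [[0, 1, 4, 5], [2, 3, 6, 7]]   # horizontal partition (by samples)
attribute_blocks = [[0, 1], [2, 3]]            # vertical partition (by attributes)

predictors = build_local_predictors(X, y, sample_blocks, attribute_blocks)
print(vote(predictors, [0.5, 0.5, 0.5, 0.5]))
print(vote(predictors, [9.5, 9.5, 9.5, 9.5]))
```

In this sketch only the per-class centroids (the models) or the predicted labels would need to leave a site, never the raw rows, which mirrors the privacy argument made in the abstract. Whether voting matches a centralized predictor on real data depends on the dataset and the local learner, which is the subject of the talk.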