DATE: Thu, Oct 22, 2015
TIME: 1:30 pm
PLACE: SITE 5084
TITLE: Task Oriented Privacy-preserving (TOP) Data Publishing Framework Using Feature Selection
PRESENTER: Yasser Jafer
University of Ottawa
ABSTRACT:

The large amount of digital information collected and stored in databases creates vast opportunities for knowledge discovery and data mining. These datasets, however, may contain sensitive information about individuals, and it is therefore imperative to ensure that their privacy is protected. Most work in privacy-preserving data publishing makes no assumption about the analysis task that will be applied to the dataset. In many domains, such as healthcare and finance, however, it is possible to identify the analysis task beforehand. Incorporating knowledge of this ultimate analysis task may improve the quality of the anonymized data while still protecting the privacy of individuals. Furthermore, existing works that do consider the ultimate analysis task (e.g., classification) are not suitable for high-dimensional data. We show that feature selection, a well-known dimensionality reduction tool, can be used to address privacy and utility simultaneously, and that in doing so it can enhance existing privacy-preserving techniques. We incorporate the concept of privacy-by-design within the feature selection process itself and, accordingly, propose techniques that turn filter-based and wrapper-based feature selection into privacy-aware processes. In another direction, we introduce a framework for a privacy-aware feature selection evaluation measure. That is, we incorporate privacy "during" feature selection and obtain a list of candidate privacy-aware attribute subsets that consider (and satisfy) both efficacy and privacy requirements. Finally, we propose a multi-dimensional privacy-aware evaluation function (for automatic feature selection) that incorporates efficacy, privacy, and dimensionality weights and enables the data holder to obtain the best attribute subset according to his or her preferences.
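The idea of a weighted, multi-dimensional evaluation function can be illustrated with a minimal sketch. It assumes, purely for illustration, that the function is a linear weighted sum of an efficacy score (e.g., classification accuracy), a privacy score, and a dimensionality score, each normalized to [0, 1]. The names `Candidate`, `score`, and `best_subset`, the attribute names, and all numbers are hypothetical and are not the presenter's actual formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate attribute subset with precomputed scores (hypothetical)."""
    attributes: frozenset  # the attribute subset
    efficacy: float        # e.g. classification accuracy on this subset, in [0, 1]
    privacy: float         # assumed privacy score in [0, 1]; higher means more private


def score(c: Candidate, n_total: int, w_eff: float, w_priv: float, w_dim: float) -> float:
    # Assumed linear combination; smaller subsets earn a higher dimensionality score.
    dimensionality = 1.0 - len(c.attributes) / n_total
    return w_eff * c.efficacy + w_priv * c.privacy + w_dim * dimensionality


def best_subset(cands, n_total, w_eff, w_priv, w_dim):
    # Return the candidate with the highest weighted score.
    return max(cands, key=lambda c: score(c, n_total, w_eff, w_priv, w_dim))


# Made-up candidates drawn from a hypothetical dataset with 10 attributes.
cands = [
    Candidate(frozenset({"age", "zip", "diagnosis"}), efficacy=0.90, privacy=0.40),
    Candidate(frozenset({"age", "diagnosis"}), efficacy=0.85, privacy=0.70),
    Candidate(frozenset({"diagnosis"}), efficacy=0.70, privacy=0.95),
]

if __name__ == "__main__":
    for weights in [(1.0, 0.0, 0.0), (0.6, 0.3, 0.1), (0.2, 0.7, 0.1)]:
        chosen = best_subset(cands, 10, *weights)
        print(weights, sorted(chosen.attributes))
```

Changing the weights changes which subset wins: with these made-up values, an efficacy-only weighting selects the three-attribute subset, while a privacy-heavy weighting selects the single attribute `diagnosis`.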