Creating Longitudinal Data: Challenges and Solutions
PRESENTER:
Luiza Antonie
University of Guelph
ABSTRACT:
Linking multiple databases to create longitudinal data is an important
research problem with multiple applications. Longitudinal data allow
analysts to perform studies that would otherwise be infeasible. In this
talk, I discuss a system we designed to link historical census databases
and create longitudinal data that allow people to be tracked over time.
The goal of the linking is to
identify the same person in multiple census collections. Data
imprecision in historical census data and the lack of unique personal
identifiers make this task a challenging one. We design and employ a
record linkage system that incorporates a supervised learning module for
classifying pairs of records as matches and
non-matches. We show that our system performs large-scale linkage,
producing high-quality links and generating sufficient longitudinal data
to allow meaningful social science studies. These longitudinal data have
already been used by social scientists and historians to investigate
historical trends and to address questions about society, history and
economy. This comparative, systematic research would not be
possible without the linked data.
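
As a rough illustration of what classifying pairs of records as matches and non-matches can look like, here is a minimal Python sketch. It is not the system described in the talk: the field names (name, age, birthplace), the three similarity features, the 10-year census gap, and the nearest-centroid classifier are all assumptions chosen to keep the example short. A real linkage system would use richer features and a stronger supervised learner.

```python
from difflib import SequenceMatcher


def pair_features(a, b, years_between=10):
    """Turn two census records into a small similarity vector (hypothetical features)."""
    # Fuzzy name similarity in [0, 1]; tolerates spelling variants.
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    # How far the recorded age change deviates from the gap between censuses.
    age_error = abs((b["age"] - a["age"]) - years_between)
    age_score = 1.0 / (1.0 + age_error)
    same_birthplace = 1.0 if a["birthplace"] == b["birthplace"] else 0.0
    return [name_sim, age_score, same_birthplace]


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(labeled_pairs):
    """Learn one centroid per class from (record_a, record_b, is_match) triples."""
    matches = [pair_features(a, b) for a, b, y in labeled_pairs if y]
    non_matches = [pair_features(a, b) for a, b, y in labeled_pairs if not y]
    return centroid(matches), centroid(non_matches)


def is_match(model, a, b):
    """Classify a pair by whichever class centroid is closer (squared distance)."""
    features = pair_features(a, b)

    def dist(center):
        return sum((x - y) ** 2 for x, y in zip(features, center))

    match_center, non_match_center = model
    return dist(match_center) < dist(non_match_center)


# Toy labeled data, invented purely for illustration.
TRAINING = [
    ({"name": "Mary Smith", "age": 30, "birthplace": "Quebec"},
     {"name": "Mary Smyth", "age": 40, "birthplace": "Quebec"}, True),
    ({"name": "Mary Smith", "age": 30, "birthplace": "Quebec"},
     {"name": "Peter Brown", "age": 62, "birthplace": "Ontario"}, False),
]
model = train(TRAINING)
```

Calling is_match(model, earlier_record, later_record) then returns True or False for a candidate pair. The point of the sketch is the shape of the pipeline: compare records field by field, then let labeled examples decide where the boundary between match and non-match falls, rather than hand-tuning thresholds.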