DATE: Tuesday, Apr 27, 2010
TIME: 3:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Automatic extraction of clinical trial characteristics from journal publications
PRESENTERS: Berry de Bruijn and Svetlana Kiritchenko
NRC
ABSTRACT:

The Human Studies Database Project is an international project that collects clinical trial characteristics to make it easier for researchers to conduct systematic reviews and meta-analyses, and to plan new studies. The database contains trial parameters such as eligibility rules, population sample size, various intervention and control details, funding details, and outcome names and their time points. However, most of this information is available only as free text, for example in journal publications, which is labour-intensive to process. Automated information extraction techniques can assist, but they still require human supervision.

In this presentation, we will describe the architecture of the NRC Information Extraction engine, which combines a text classifier with a weak regular expression matcher. The same approach is applied uniformly to all 21 clinical trial parameters selected for this study, even though these parameters differ greatly in structure and complexity. The system also integrates a web browser-based user interface that lets curators review and modify the suggested entries before committing them to the database. We will discuss the design considerations behind the user interface and present the results of the system evaluation. Finally, we will demonstrate the system and its user interface.
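The abstract does not say exactly how the classifier and the weak matcher divide the work. The Python sketch below assumes one plausible arrangement, for a single parameter (population sample size): a sentence classifier scores each sentence, and a deliberately permissive regular expression extracts candidate values only from sentences the classifier accepts. The word weights, bias, threshold, pattern, and function names are all hypothetical stand-ins, not the NRC engine's actual model or code; the ranked output mimics suggestions a curator would then review.

```python
import math
import re

# Hypothetical per-word weights standing in for a trained sentence classifier
# for one parameter (population sample size). Not the NRC engine's real model.
SAMPLE_SIZE_WEIGHTS = {"randomized": 1.0, "randomised": 1.0, "enrolled": 1.5,
                       "patients": 1.0, "participants": 1.0, "total": 0.5}
BIAS = -2.0

# A deliberately "weak" pattern: any number followed by a population noun.
# It over-matches on its own and relies on the classifier to pick sentences.
SAMPLE_SIZE_RE = re.compile(
    r"\b(\d[\d,]*)\s+(?:patients|participants|subjects)\b", re.IGNORECASE)


def sentence_score(sentence: str) -> float:
    """Logistic score of a sentence under the toy bag-of-words weights."""
    words = re.findall(r"[a-z]+", sentence.lower())
    z = BIAS + sum(SAMPLE_SIZE_WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-z))


def extract_sample_size(text: str, threshold: float = 0.5):
    """Return (value, sentence, score) candidates, best first, for curator review."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    candidates = []
    for s in sentences:
        score = sentence_score(s)
        if score < threshold:
            continue  # classifier rejects the sentence; the matcher never sees it
        for m in SAMPLE_SIZE_RE.finditer(s):
            candidates.append((int(m.group(1).replace(",", "")), s, score))
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates


if __name__ == "__main__":
    text = ("Background: 12 patients were interviewed in a pilot. "
            "We enrolled 1,204 patients who were randomized to two arms.")
    for value, sentence, score in extract_sample_size(text):
        print(value, round(score, 2), sentence)
```

With this input, the regular expression alone would match both "12 patients" and "1,204 patients"; the classifier filters out the pilot sentence, so only 1204 is suggested. Splitting the task this way lets a simple, reusable pattern stay weak while the per-parameter classifier supplies the context.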