DATE: Tuesday, Nov 3, 2009
TIME: 3:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Data Fusion for a Spontaneous Speech Retrieval Task
PRESENTER: Muath Alzghool
University of Ottawa
ABSTRACT:

Searching through recordings of interviews, teleconferences, and other conversational speech is difficult because transcripts produced by Automatic Speech Recognition (ASR) systems tend to contain many recognition errors, leading to low Information Retrieval (IR) performance. This contrasts with retrieval from broadcast speech, where the lower word error rate does not appear to harm retrieval. A large number of IR systems and retrieval strategies have been proposed and implemented over the last 30 years, and one way to benefit from them is to combine their results with a data fusion technique. We will present two novel model fusion techniques that combine results from several IR models, or from variations of the same model, and test them on a collection of spontaneous speech transcripts. We also fuse results obtained with different document representations (automatic transcripts or manual transcripts). Our first fusion model trains the combination weights on training data in an efficient and novel way. The second fusion model is designed for results with high variation, such as those obtained from automatic versus manual document representations.
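
The abstract does not give the fusion formulas; as background, a minimal sketch of weighted score-based fusion (weighted CombSUM-style) is shown below. The run names, weights, and min-max normalization are illustrative assumptions, not the presenter's actual method.

    # Minimal sketch of weighted score-based fusion (weighted CombSUM-style).
    # Weights, run names, and normalization are illustrative assumptions only.

    def normalize(scores):
        """Min-max normalize a {doc_id: score} dict so runs are comparable."""
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {doc: 1.0 for doc in scores}
        return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

    def fuse(runs, weights):
        """Combine several retrieval runs into one ranking.

        runs    -- list of {doc_id: score} dicts, one per IR model/representation
        weights -- list of floats, one per run (e.g. learned on training data)
        """
        fused = {}
        for run, w in zip(runs, weights):
            for doc, s in normalize(run).items():
                fused[doc] = fused.get(doc, 0.0) + w * s
        return sorted(fused.items(), key=lambda item: item[1], reverse=True)

    # Example: fuse a run on ASR transcripts with a run on manual transcripts.
    asr_run    = {"d1": 12.0, "d2": 7.5, "d3": 3.1}
    manual_run = {"d1": 9.4, "d4": 8.8, "d2": 2.0}
    print(fuse([asr_run, manual_run], weights=[0.4, 0.6]))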