Mar. 31, 2005

DATE:	Thursday, Mar. 31, 2005
TIME:	1:30 pm
PLACE:	Council Room (SITE 5-084)
TITLE:	Detecting Semantic Outliers in Automatic Speech Transcripts and in Keyphrases Extracted from Speech
PRESENTER:	Diana Inkpen, University of Ottawa
ABSTRACT: Browsing through large volumes of spoken audio is known to be a challenging task for end users. Two ways to alleviate this problem are: (1) to allow users to gist a spoken audio document by glancing over a transcript generated through Automatic Speech Recognition; (2) to present the user with keyphrases extracted from these transcripts. Unfortunately, the transcripts typically contain many recognition errors, which are highly distracting and make gisting more difficult. The keyphrases may also contain keywords that are recognition errors. We present an approach that detects recognition errors by identifying words that have low semantic similarity to other words in the transcript or in the keyphrases. We describe several variants of the semantic outlier detection algorithm. We test them on transcripts with high recognition accuracy (27.6% initial WER) and on less-than-broadcast quality transcripts (62.3% initial WER). We are able to substantially reduce the number of recognition errors, while losing only a small number of good words. This is joint work with Alain Desilets, IIT Group, National Research Council of Canada.
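
ILLUSTRATION: The abstract does not say which similarity measure, context window, or threshold the presenters use, so the following is only a minimal sketch of the general idea of a semantic outlier: a word is flagged when its mean similarity to the other words falls below a cutoff. The function names, the toy similarity scores, and the threshold value are all hypothetical and are not taken from the talk.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical, hand-made similarity scores for illustration only.
# A real system would use a corpus-based or thesaurus-based measure.
TOY_SCORES: Dict[FrozenSet[str], float] = {
    frozenset(("bank", "loan")): 0.8,
    frozenset(("bank", "interest")): 0.7,
    frozenset(("loan", "interest")): 0.9,
}


def toy_similarity(a: str, b: str) -> float:
    """Symmetric lookup in the toy table; unknown pairs score 0.0."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


def semantic_outliers(
    words: List[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Return the words whose mean similarity to all other words is below threshold."""
    outliers = []
    for i, word in enumerate(words):
        others = words[:i] + words[i + 1:]
        if not others:
            continue  # a single word has no context to compare against
        mean_sim = sum(similarity(word, other) for other in others) / len(others)
        if mean_sim < threshold:
            outliers.append(word)
    return outliers


# "giraffe" stands in for a misrecognized word in an otherwise coherent passage.
words = ["bank", "loan", "interest", "giraffe"]
flagged = semantic_outliers(words, toy_similarity, threshold=0.3)
print(flagged)
```

Raising the threshold in this sketch flags more words, which removes more errors but also discards more correctly recognized words; the abstract describes the same trade-off in terms of reducing errors while losing few good words.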