DATE: Tuesday, Mar 23, 2010
TIME: 3:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Finding salient segments in noisy text: some studies on speech transcripts and semi-structured Web text
PRESENTER: Xiaodan Zhu
Institute for Information Technology, NRC
ABSTRACT:

Automatically identifying salient content in written or spoken documents plays an important role in helping people acquire information efficiently. In this talk, I will focus mainly on my PhD thesis work, in which I studied the role of noisy text (automatic transcripts) in the task of identifying salient content in spoken documents. I will also briefly discuss an information extraction effort that aims to identify, across the whole Web repository, related entity names that are embedded in noisy HTML/XML tags.