Mar 23, 2010

DATE:	Tuesday, Mar 23, 2010
TIME:	3:30 pm
PLACE:	Council Room (SITE 5-084)
TITLE:	Finding salient segments in noisy text: some studies on speech transcripts and semi-structured Web text
PRESENTER:	Xiaodan Zhu, Institute for Information Technology, NRC
ABSTRACT: Automatically identifying salient content in written or spoken documents plays an important role in helping people acquire information efficiently. In this talk, I will focus mainly on my PhD thesis work, in which I studied the role of noisy text (automatic transcripts) in identifying salient content in spoken documents. I will also briefly discuss an information extraction effort that aims to identify, from the whole Web repository, related entity names that are embedded in noisy HTML/XML tags.