DATE: Tuesday, Nov. 8, 2005
TIME: 2:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Web Page Wrapper and Natural Language Processing
PRESENTER: David Nadeau
University of Ottawa
ABSTRACT:

This talk presents an extension to a web page wrapper algorithm. The extension uses NLP techniques to broaden the granularity of information that such an algorithm can capture. A web page wrapper is a filter that selects the specific HTML nodes holding the desired information. For instance, in a newspaper article, a wrapper can tag the author's name, the date, or even every instance of a location or company name. We extend a web page wrapper algorithm beyond the boundary of the HTML node so that it can extract information from within the text passage that a node contains. We apply this technique to automatic lexicon generation.
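
As a rough illustration of the idea in the abstract (not the algorithm presented in the talk), the sketch below first filters HTML nodes the way a basic wrapper would, then looks inside the text of one node for finer-grained items. The names NodeWrapper, wrap and extract_locations, the sample page, and the toy "capitalized words after 'in'" pattern are all assumptions made for this example; a real system would replace the regular expression with proper NLP techniques.

```python
import re
from html.parser import HTMLParser


class NodeWrapper(HTMLParser):
    """Hypothetical node-level wrapper: collects the text of every
    element with a given tag name and class attribute."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text pieces of the current match
        self.results = []   # text of all matched nodes

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Track nested elements with the same tag name.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1
                self.buffer = []

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def wrap(html, tag, css_class):
    """Node-level step: return the text of each matching HTML node."""
    parser = NodeWrapper(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results


def extract_locations(passage):
    """Sub-node step (toy stand-in for NLP): capitalized word
    sequences that directly follow the word 'in'."""
    return re.findall(r"\bin ((?:[A-Z][a-z]+)(?: [A-Z][a-z]+)*)", passage)


# Made-up sample page, for illustration only.
PAGE = (
    '<div><span class="author">Jane Doe</span>'
    '<p class="body">The company opened offices in Ottawa '
    'and in New York last year.</p></div>'
)

if __name__ == "__main__":
    print(wrap(PAGE, "span", "author"))       # ['Jane Doe']
    body = wrap(PAGE, "p", "body")[0]
    print(extract_locations(body))            # ['Ottawa', 'New York']
```

The first call stops at the node boundary and returns whole nodes; the second works inside a node's text, which is the kind of finer granularity the abstract describes. Terms collected this way over many pages could feed a lexicon.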