DATE: | Tuesday, Nov. 8, 2005 |
TIME: | 2:30 pm |
PLACE: | Council Room (SITE 5-084) |
TITLE: | Web Page Wrapper and Natural Language Processing |
PRESENTER: | David Nadeau, University of Ottawa |
ABSTRACT:
This talk presents an extension to a web page wrapper algorithm. The extension uses NLP techniques to diversify the granularity of information that such an algorithm can capture. A web page wrapper is a filter over specific HTML nodes that hold the desired information. For instance, in a newspaper article, a wrapper may allow tagging the author name, the date, or even all instances of a location or a company name. We extend a web page wrapper algorithm to remove the limit imposed by the HTML node boundary and enable extracting information from the textual passages within a node. We apply this technique to automatic lexicon generation. |
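To make the distinction between node-level and sub-node extraction concrete, here is a minimal Python sketch. It is not the algorithm presented in the talk: the sample page, the `NodeWrapper` class, the `wrap` helper, and the `LOCATION_AFTER_IN` pattern are all hypothetical, and the regular expression is only a crude stand-in for a real NLP technique.

```python
import re
from html.parser import HTMLParser


class NodeWrapper(HTMLParser):
    """Hypothetical wrapper: collects the text of elements matching (tag, class)."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text pieces of the element being read
        self.matches = []   # text of every matching element found

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements with the same tag name.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.matches.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def wrap(html, tag, css_class):
    """Return the text of every <tag class="css_class"> element in html."""
    parser = NodeWrapper(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.matches


# Invented sample page, for illustration only.
PAGE = """
<html><body>
<p class="byline">By Jane Doe</p>
<p class="body">The company opened an office in Ottawa, and another in Toronto.</p>
</body></html>
"""

# Node-level granularity: the wrapper returns whole HTML nodes.
byline_nodes = wrap(PAGE, "p", "byline")
body_nodes = wrap(PAGE, "p", "body")

# Sub-node granularity: look inside the node text.
# Assumption: "in <Capitalized word>" marks a location (a toy heuristic).
LOCATION_AFTER_IN = re.compile(r"\bin ([A-Z][a-z]+)")
locations = [m for text in body_nodes for m in LOCATION_AFTER_IN.findall(text)]

print(byline_nodes)
print(locations)
```

In this sketch, the node-level wrapper can only return the byline or the article body as a whole, while the second step pulls individual candidate location names out of the body text. Collecting such candidates across many pages is one way a lexicon could be accumulated, under the assumptions stated above.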