DATE: Thursday, February 7, 2013
TIME: 3:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Query-Structure Based Web Page Indexing
PRESENTER: Falah Al-akashi
University of Ottawa
ABSTRACT:

Indexing is a crucial technique for dealing with the massive amount of data on the web. In our third participation in the Web track at TREC 2012, we explore the idea of building an efficient query-based indexing system over a collection of web pages. Our prototype examines trends in user queries and indexes text accordingly, using particular attributes available in the documents. This paper provides an in-depth description of our approach to indexing web documents efficiently: topics in the documents are discovered with the help of knowledge available in Wikipedia. The well-defined articles in Wikipedia are shown to be valuable as a training set when indexing web pages. Our complex index structure also records information from titles and URLs, and takes web domains into account.
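
As a rough illustration of an index that records titles, URLs, and web domains separately, the sketch below builds a small field-aware inverted index in Python. It is not the presenter's system: the class name FieldIndex, the tokenizer, and the field weights are all assumptions made for this example, and the Wikipedia-assisted topic discovery described in the abstract is not covered.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldIndex:
    """Hypothetical inverted index that keeps postings per document field."""

    def __init__(self):
        # term -> field name -> set of document ids containing the term there
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id, url, title, body):
        parsed = urlparse(url)
        fields = {
            "title": tokenize(title),
            "url": tokenize(parsed.path),       # path part of the URL
            "domain": tokenize(parsed.netloc),  # host name, e.g. www.example.org
            "body": tokenize(body),
        }
        for field, terms in fields.items():
            for term in terms:
                self.postings[term][field].add(doc_id)

    def search(self, query, weights=None):
        # Assumed weights: a match in the title, URL, or domain counts for
        # more than a match in the body. The values are illustrative only.
        weights = weights or {"title": 3.0, "url": 2.0, "domain": 2.0, "body": 1.0}
        scores = defaultdict(float)
        for term in tokenize(query):
            for field, docs in self.postings.get(term, {}).items():
                for doc_id in docs:
                    scores[doc_id] += weights.get(field, 0.0)
        # Highest score first; ties are broken by document id.
        return sorted(scores, key=lambda d: (-scores[d], d))


index = FieldIndex()
index.add("d1", "http://www.uottawa.ca/site/seminars", "Seminar schedule", "talks about indexing")
index.add("d2", "http://example.org/a", "Web indexing", "notes")
ranked = index.search("indexing")
```

Here a query term found in a title outranks the same term found only in a page body, which is one simple way a query-driven index could exploit document attributes.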