DATE: Thursday, Mar. 4, 2004
TIME: 11:30 am
PLACE: Council Room (SITE 5-084)
TITLE: Lexical resources: harnessing people power to build them, and launching them on the express lane
PRESENTER: Vivi Nastase
University of Ottawa
ABSTRACT:

Research in NLP increasingly focuses on developing robust technologies that can be deployed on unseen texts to produce summaries, answer questions, and so on. These tasks require lexical resources that provide information about word senses and relate different senses to one another. Lexical resources that can support robust working paradigms must have broad vocabulary coverage and a high degree of connectivity.

I will talk about three issues related to lexical resources:

  1. how we build them at low cost while maintaining quality and quantity,
  2. how we put them to work to help us obtain accurately annotated data,
  3. how we encode them so we can exploit them efficiently despite their size.

A Web-based system whose purpose is to build a WordNet for Romanian serves as an example for the first issue. This resource is put to use in the Open Mind Word Expert project to collect Romanian sentences with annotated word senses for a SENSEVAL 3 task. Finally, I will show an encoding of WordNet as an ordered set, which allows us to compute various measures practically instantaneously and to use a semantic wildcard for information retrieval with excellent results.
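
The abstract does not say how the ordered-set encoding works. As a rough illustration of the general idea only, and not the presenter's method, the sketch below numbers a toy hypernym tree in pre-order so that each concept owns a contiguous interval of integers. An "is-a" test then becomes a constant-time range check, and a semantic wildcard becomes a filter on that range. The data and the names TOY_HYPONYMS, number_tree, is_a and wildcard_match are all hypothetical, and the sketch assumes a strict tree, whereas WordNet allows a sense to have more than one hypernym.

```python
# Toy hypernym tree: concept -> list of direct hyponyms.
# Hypothetical data for illustration, not taken from WordNet.
TOY_HYPONYMS = {
    "entity": ["animal", "artifact"],
    "animal": ["dog", "cat"],
    "artifact": ["car"],
    "dog": [],
    "cat": [],
    "car": [],
}


def number_tree(children, root):
    """Map each node to (start, end): its pre-order index and the
    largest pre-order index found anywhere in its subtree."""
    intervals = {}
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in children[node]:
            visit(child)
        # After visiting the subtree, counter - 1 is its last index.
        intervals[node] = (start, counter - 1)

    visit(root)
    return intervals


def is_a(intervals, word, concept):
    """True if word lies in the subtree rooted at concept (range check)."""
    start, end = intervals[concept]
    return start <= intervals[word][0] <= end


def wildcard_match(intervals, concept, tokens):
    """Keep the tokens that are known and fall under concept."""
    return [t for t in tokens if t in intervals and is_a(intervals, t, concept)]


intervals = number_tree(TOY_HYPONYMS, "entity")
query_hits = wildcard_match(
    intervals, "animal", ["the", "dog", "chased", "a", "car", "and", "cat"]
)
```

Here query_hits keeps only "dog" and "cat", because their indices fall inside the interval assigned to "animal". Handling multiple hypernyms would need a richer encoding than this single interval per concept.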