DATE: Tuesday, Nov. 29, 2005
TIME: 2:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Exploring Knowledge Patterns for Building Knowledge-Rich Specialized Corpora
PRESENTER: Jakob Halskov
Department of Computational Linguistics
Copenhagen Business School
ABSTRACT:

When bootstrapping knowledge-rich specialized corpora from the Internet, the density of Knowledge Patterns (Meyer, 2001) in a document has been shown to be a reliable indicator of its knowledge richness, and thus a sound basis for re-ranking documents (Agbago & Barriere, 2005). The presentation will address three questions that remain open:

  1. Are all Knowledge Patterns (KPs) equally useful in assessing the richness of specialized knowledge in a document?
  2. Are KPs universal across domains and text types?
  3. Can we measure the likelihood that a given KP in a given web document does, in fact, realize a semantic relation between two concepts?

The questions are addressed by computing "weirdness" scores (Ahmad, 1993) and context vectors for the individual KPs, based on a general language corpus (the BNC) and a number of specialized corpora. A small pilot study is evaluated by comparing the output of an automatic approach against a small set of manually annotated KPs in context.
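
As an illustration of the "weirdness" score mentioned above, the measure is commonly computed as the ratio of a term's relative frequency in a specialized corpus to its relative frequency in a general corpus. The sketch below applies that ratio to a multi-word KP. It is a minimal example under stated assumptions, not the presenter's implementation: the function names, the whitespace tokenization, and the handling of patterns absent from the general corpus are all hypothetical choices made for illustration.

```python
def count_pattern(pattern, tokens):
    """Count occurrences of a (possibly multi-word) pattern in a token list."""
    pat = pattern.lower().split()
    toks = [t.lower() for t in tokens]
    n = len(pat)
    return sum(1 for i in range(len(toks) - n + 1) if toks[i:i + n] == pat)


def weirdness(pattern, special_tokens, general_tokens):
    """Ratio of relative frequencies: specialized corpus over general corpus.

    Assumption: a pattern unseen in the general corpus gets an infinite score
    if it occurs in the specialized corpus, and 0.0 otherwise.
    """
    f_special = count_pattern(pattern, special_tokens)
    f_general = count_pattern(pattern, general_tokens)
    if f_general == 0:
        return float("inf") if f_special > 0 else 0.0
    rel_special = f_special / len(special_tokens)
    rel_general = f_general / len(general_tokens)
    return rel_special / rel_general


# Toy corpora (invented for illustration only).
special = "a gene is a kind of unit a virus is a kind of agent".split()
general = ["the"] * 24 + "is a kind of".split()
score = weirdness("is a kind of", special, general)
```

A score well above 1 suggests the pattern is more characteristic of the specialized corpus than of general language; a score near 1 suggests no particular association with the domain.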