DATE: | Tuesday, Nov. 15, 2005 |
TIME: | 2:30 pm |
PLACE: | Council Room (SITE 5-084) |
TITLE: | Towards a Text Representation for Sentence Selection |
PRESENTER: | Maria Fernanda Caropreso, University of Ottawa |
ABSTRACT:
The performance of the Bag of Words (BOW) text representation, commonly used in text mining and classification, seems surprising given its total lack of syntactic and semantic information. We address the task of sentence selection, working on a corpus of texts on genetics. We believe that, because of the short length and highly specific vocabulary of this corpus, syntactic and semantic knowledge could be even more beneficial here than in a collection of a more general nature. We present the different representations and methodologies that we have tried in our preliminary work, including noun phrases and other syntactic links, dictionaries and hierarchies, first-order predicates, and grammar learners. |