The Text Analysis and Machine Learning Group
SPEAKER: Messaouda Ouerd
TOPIC: Learning in Belief Networks and its Application in Distributed Databases
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
In this talk we present the problem of learning in belief networks and its
application to caching data with repeated read-only accesses in distributed
databases. Our goal is to build a probabilistic network from the
distribution of the data which adequately represents the data. We describe
two classes of techniques for the induction of Bayesian networks from data,
methods based on probabilistic-graph models and methods using a Bayesian
learning approach. The probabilistic methods for learning Bayesian Belief
Networks (BBNs) focus on finding the most likely structure, implicitly
assuming that there is a true underlying structure. The Bayesian methods for
learning BBNs search the space of network structures for the one that
maximizes the a posteriori probability. Once constructed,
such a network can provide insight into probabilistic dependencies that
exist among the variables. We consider representations using Chow's
dependence trees and Polytrees (Singly Connected Networks) as structures for
inferring causal knowledge. We apply this approach to learn patterns or
sequences in query accesses to distributed databases.
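The abstract mentions Chow's dependence trees as a structure for inferring causal knowledge. The core of that construction can be sketched as follows; this is a minimal illustration of the Chow-Liu procedure, not the speaker's implementation, and all identifiers are invented for the example.

```python
# Minimal sketch of the Chow-Liu procedure: fit a dependence tree to
# discrete data by joining the variable pairs with the highest empirical
# mutual information (maximum-weight spanning tree via Kruskal's method).
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information between columns i and j."""
    n = len(data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    pij = Counter((row[i], row[j]) for row in data)
    mi = 0.0
    for (a, b), c in pij.items():
        mi += (c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
    return mi

def chow_liu_tree(data, n_vars):
    """Maximum-weight spanning tree over mutual-information edges."""
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)), reverse=True)
    parent = list(range(n_vars))       # union-find forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                   # joining i and j creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Toy data: variable 1 copies variable 0; variable 2 is independent.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 5
print(chow_liu_tree(data, 3))  # the strongly dependent pair (0, 1) is joined first
```

The tree maximizes the total mutual information of its edges, which is what makes it the best tree-structured approximation to the joint distribution.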
SPEAKER: Dr. Ken Barker
TOPIC: Identifying Semantic Relationships in Complex Noun Phrases
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
Complex noun phrases carry much of the information in English texts.
Unfortunately for systems that want to get at that information, there are
few surface indicators of the underlying meaning of a noun phrase. Such
systems must compensate for the lack of clues with other information. One
way is to load the system with lexical semantics for nouns and adjectives.
This merely shifts the problem elsewhere: how do we define the lexical
semantics and build large semantic lexicons? Another way is to find
constructions similar to a given noun phrase, for which the semantic
relationships among components are already known. In this talk I will
present a semi-automatic system that identifies semantic relationships in
noun phrases without using precoded noun or adjective semantics. Instead,
partial matching on similar, previously analyzed noun phrases leads to a
tentative interpretation of a new input, which is accepted or corrected by a
cooperative user. I will break the bad news: similarity is not easily
assessed, similar analyzed constructions may not exist, and if they do
exist, their analyses may not be appropriate for the current phrase. I will
also share the good news: processing can start with no prior analyses, and
as more noun phrases are analyzed, the system learns to find better
interpretations and reduces its reliance on the user. This talk contains no
sliding boxes.
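The interplay between partial matching and the cooperative user can be illustrated with a toy memory of analyzed phrases. This is a hypothetical design sketch, not Barker's system; all names and the matching rule are invented for the example.

```python
# Toy sketch of interpretation by partial matching on previously
# analysed noun phrases: reuse the relation of a stored phrase that
# shares a head or a modifier, and fall back to asking the user,
# whose answer is then remembered for future inputs.
memory = {}  # (modifier, head) -> semantic relation

def interpret(modifier, head, ask_user):
    # Exact match first, then partial match on head or modifier.
    if (modifier, head) in memory:
        return memory[(modifier, head)]
    for (m, h), rel in memory.items():
        if h == head or m == modifier:
            return rel  # tentative interpretation from a similar phrase
    rel = ask_user(modifier, head)   # cooperative user supplies the answer
    memory[(modifier, head)] = rel   # reliance on the user shrinks over time
    return rel

# With an empty memory, the first phrase needs the user...
print(interpret("wood", "stove", lambda m, h: "material"))  # material
# ...but a similar phrase ("gas stove" shares the head) does not.
print(interpret("gas", "stove", lambda m, h: "unused"))     # material
```

The second call also shows the "bad news" from the abstract: the reused analysis ("material") may not be appropriate for the new phrase, which is why the user is kept in the loop to correct tentative interpretations.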
SPEAKER: Johanne Morin
TOPIC: Learning Relational Clichés with Contextual Generalization
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
Concept learners learn the definition of a concept from positive and
negative examples of the concept. The definitions learned describe as many
of the positives and as few of the negatives as possible. These definitions
are then used to classify unknown examples as positive or negative examples.
Many existing systems learn concepts one feature at a time. These systems
have trouble learning definitions with interdependent features. The FOCL
system (Pazzani et al. 1991) solved this problem by giving the concept
learner hand-made "clichés" which are combinations of features.
The problem is that these clichés are hard to derive. I developed CLUse
(Clichés Learned and Used) to learn clichés automatically. Empirical
testing shows that CLUse can help concept learners with useful clichés
learned across domains.
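Why a learner that adds one feature at a time fails on interdependent features, and how a cliché helps, can be shown on a tiny dataset. This is an illustrative toy, not CLUse or FOCL; the data and tests are invented.

```python
# Toy illustration of why clichés help: for the concept "a equals b",
# no single-attribute test separates positives from negatives, but a
# two-attribute cliché (a combination of features) does in one step.
examples = [((0, 0), True), ((1, 1), True), ((0, 1), False), ((1, 0), False)]

def accuracy(test):
    return sum(test(x) == label for x, label in examples) / len(examples)

# All single-feature candidate tests of the form x[i] == v.
single = [lambda x, i=i, v=v: x[i] == v for i in (0, 1) for v in (0, 1)]
cliche = lambda x: x[0] == x[1]  # a combination of two features

best_single = max(accuracy(t) for t in single)
print(best_single)       # 0.5 -- no better than chance
print(accuracy(cliche))  # 1.0 -- the cliché captures the interdependence
```

A greedy one-feature-at-a-time learner sees no gain from any first step here, so it never starts down the right path; supplying (or, as in CLUse, learning) the combined test removes that blind spot.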
SPEAKER: Mauricio de Almeida
TOPIC: Learning (Tree/Rule)-like Boolean C++ Methods
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
Decision trees and rule sets are commonly used languages for describing
learned concepts. Although these representations are easy to read, and the
learners that generate them can often evaluate their performance on a
testing set, the resulting rule sets and trees are not directly
implementable in the systems that are to use them.
This problem, among others, suggested to us an approach in which C++
classes, equivalent to a decision tree, a rule set, or a mixture of the
two, are learned directly from a set of examples represented as
attribute-value vectors. In this seminar we present the main ideas behind
the Knowledge Embedding Learning system, which we are now implementing.
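The step from a learned tree to directly implementable code can be sketched by emitting a Boolean C++ method from a tree structure. This is a hypothetical illustration of the general idea, not the Knowledge Embedding Learning system; the tree, class and method names are invented.

```python
# Toy sketch: turn a learned decision tree directly into a Boolean C++
# method, so the learned concept is executable code rather than a
# description that still has to be hand-coded.
def emit(tree, indent="    "):
    """tree is either a bool leaf or (attribute, then_branch, else_branch)."""
    if isinstance(tree, bool):
        return f"{indent}return {'true' if tree else 'false'};\n"
    attr, then_branch, else_branch = tree
    return (f"{indent}if ({attr}) {{\n"
            + emit(then_branch, indent + "    ")
            + f"{indent}}}\n"
            + emit(else_branch, indent))

# A tiny hand-written "learned" tree over Boolean attributes.
tree = ("windy", False, ("sunny", True, False))
cpp = "bool Example::playTennis() const {\n" + emit(tree) + "}\n"
print(cpp)
```

The emitted method is a chain of `if`/`return` statements mirroring the tree's branches, so a rule set (a disjunction of conjunctions) could be emitted by the same generator, one `if` per rule.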
SPEAKER: Mario Jarmasz
TOPIC: Corpus Linguistics: a Paradigm for Solving NLP Problems
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
Development of large electronic corpora for use in Computational Linguistics
started in the late 1970s. Advances in software and NLP technologies have
facilitated the transformation of text archives into electronic corpora.
Many researchers have turned to Corpus Linguistics in the past decade to
develop large-scale linguistic applications. The use of large corpora is not
a new concept in Linguistics. The richness of the corpora, the increase in
their size and the fact that many are easily accessible are some reasons
that make Corpus Linguistics attractive today. In this talk I will present
the different aspects of Corpus Linguistics. A definition of the corpus will
be introduced along with the various types of corpora that are currently
available. I will give an overview of the fields interested in corpora and
of possible applications such as the construction of an electronic
thesaurus, information retrieval systems and machine translation systems. I
will also present some statistical methods for empirical investigations of
corpora, as well as the steps involved in creating an electronic corpus.
This presentation is based on the book Les linguistiques de corpus (Habert,
Nazarenko, Salem, 1997).
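One of the statistical methods commonly used in empirical corpus work is scoring collocations by pointwise mutual information. The following is a minimal sketch on an invented toy corpus, not an example from the talk or the book.

```python
# Scoring bigram collocations by pointwise mutual information (PMI):
# pairs that co-occur more often than their individual frequencies
# predict score above zero, chance pairings score at or below it.
import math
from collections import Counter

corpus = ("the strong tea was served with the strong coffee "
          "while the powerful computer ran the powerful program").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
n = len(corpus)

def pmi(w1, w2):
    """log of P(w1, w2) / (P(w1) * P(w2)), estimated from bigram counts."""
    return math.log((bigrams[(w1, w2)] / (n - 1))
                    / ((unigrams[w1] / n) * (unigrams[w2] / n)))

# Genuine collocations outscore frequent but uninformative pairings.
print(pmi("strong", "tea"))
print(pmi("the", "strong"))
```

On a real corpus the same counts would be taken over millions of tokens, which is where the size and accessibility of modern corpora become decisive.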