The Text Analysis and Machine Learning Group
SPEAKER: Messaouda Ouerd
TOPIC: Learning in Belief Networks and its Application in Distributed Databases
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
In this talk we present the problem of learning in belief networks and its
application to caching data with repeated read-only accesses in distributed
databases. Our goal is to build a probabilistic network from the
distribution of the data which adequately represents the data. We describe
two classes of techniques for the induction of Bayesian networks from data,
methods based on probabilistic-graph models and methods using a Bayesian
learning approach. The probabilistic methods for learning Bayesian Belief
Networks (BBNs) focus on finding the most likely structure, implicitly
assuming that there is a true underlying structure. The Bayesian methods for
learning BBNs search the space of network structures for the one that
maximizes the a posteriori probability. Once constructed,
such a network can provide insight into probabilistic dependencies that
exist among the variables. We consider representations using Chow's
dependence trees and Polytrees (Singly Connected Networks) as structures for
inferring causal knowledge. We apply this approach to learn patterns or
sequences in query accesses to distributed databases.
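The abstract mentions Chow's dependence trees as a structure for inferring causal knowledge. The core of that construction can be sketched as follows; this is a minimal illustration of the Chow-Liu procedure, not the speaker's implementation, and all identifiers are invented for the example.

```python
# Minimal sketch of the Chow-Liu procedure: fit a dependence tree to
# discrete data by joining the variable pairs with the highest empirical
# mutual information (maximum-weight spanning tree via Kruskal's method).
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information between columns i and j."""
    n = len(data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    pij = Counter((row[i], row[j]) for row in data)
    mi = 0.0
    for (a, b), c in pij.items():
        mi += (c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
    return mi

def chow_liu_tree(data, n_vars):
    """Maximum-weight spanning tree over mutual-information edges."""
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)), reverse=True)
    parent = list(range(n_vars))       # union-find forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                   # joining i and j creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Toy data: variable 1 copies variable 0; variable 2 is independent.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 5
print(chow_liu_tree(data, 3))  # the strongly dependent pair (0, 1) is joined first
```

The tree maximizes the total mutual information of its edges, which is what makes it the best tree-structured approximation to the joint distribution.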
SPEAKER: Dr. Ken Barker
TOPIC: Identifying Semantic Relationships in Complex Noun Phrases
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
Complex noun phrases carry much of the information in English texts.
Unfortunately for systems that want to get at that information, there are
few surface indicators of the underlying meaning of a noun phrase. Such
systems must compensate for the lack of clues with other information. One
way is to load the system with lexical semantics for nouns and adjectives.
This merely shifts the problem elsewhere: how do we define the lexical
semantics and build large semantic lexicons? Another way is to find
constructions similar to a given noun phrase, for which the semantic
relationships among components are already known. In this talk I will
present a semi-automatic system that identifies semantic relationships in
noun phrases without using precoded noun or adjective semantics. Instead,
partial matching on similar, previously analyzed noun phrases leads to a
tentative interpretation of a new input, which is accepted or corrected by a
cooperative user. I will break the bad news: similarity is not easily
assessed, similar analyzed constructions may not exist, and if they do
exist, their analyses may not be appropriate for the current phrase. I will
also share the good news: processing can start with no prior analyses, and
as more noun phrases are analyzed, the system learns to find better
interpretations and reduces its reliance on the user. This talk contains no
sliding boxes.
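The interplay between partial matching and the cooperative user can be illustrated with a toy memory of analyzed phrases. This is a hypothetical design sketch, not Barker's system; all names and the matching rule are invented for the example.

```python
# Toy sketch of interpretation by partial matching on previously
# analysed noun phrases: reuse the relation of a stored phrase that
# shares a head or a modifier, and fall back to asking the user,
# whose answer is then remembered for future inputs.
memory = {}  # (modifier, head) -> semantic relation

def interpret(modifier, head, ask_user):
    # Exact match first, then partial match on head or modifier.
    if (modifier, head) in memory:
        return memory[(modifier, head)]
    for (m, h), rel in memory.items():
        if h == head or m == modifier:
            return rel  # tentative interpretation from a similar phrase
    rel = ask_user(modifier, head)   # cooperative user supplies the answer
    memory[(modifier, head)] = rel   # reliance on the user shrinks over time
    return rel

# With an empty memory, the first phrase needs the user...
print(interpret("wood", "stove", lambda m, h: "material"))  # material
# ...but a similar phrase ("gas stove" shares the head) does not.
print(interpret("gas", "stove", lambda m, h: "unused"))     # material
```

The second call also shows the "bad news" from the abstract: the reused analysis ("material") may not be appropriate for the new phrase, which is why the user is kept in the loop to correct tentative interpretations.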
SPEAKER: Johanne Morin
TOPIC: Learning Relational Clichés with Contextual Generalization
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
Concept learners learn the definition of a concept from positive and
negative examples of the concept. The definitions learned describe as many
of the positives and as few of the negatives as possible. These definitions
are then used to classify unknown examples as positive or negative examples.
Many existing systems learn concepts one feature at a time. These systems
have trouble learning definitions with interdependent features. The FOCL
system (Pazzani et al. 1991) solved this problem by giving the concept
learner hand-made "clichés" which are combinations of features.
The problem is that these clichés are hard to derive. I developed CLUse
(Clichés Learned and Used) to learn clichés automatically. Empirical
testing shows that CLUse can help concept learners with useful clichés
learned across domains.
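Why a learner that adds one feature at a time fails on interdependent features, and how a cliché helps, can be shown on a tiny dataset. This is an illustrative toy, not CLUse or FOCL; the data and tests are invented.

```python
# Toy illustration of why clichés help: for the concept "a equals b",
# no single-attribute test separates positives from negatives, but a
# two-attribute cliché (a combination of features) does in one step.
examples = [((0, 0), True), ((1, 1), True), ((0, 1), False), ((1, 0), False)]

def accuracy(test):
    return sum(test(x) == label for x, label in examples) / len(examples)

# All single-feature candidate tests of the form x[i] == v.
single = [lambda x, i=i, v=v: x[i] == v for i in (0, 1) for v in (0, 1)]
cliche = lambda x: x[0] == x[1]  # a combination of two features

best_single = max(accuracy(t) for t in single)
print(best_single)       # 0.5 -- no better than chance
print(accuracy(cliche))  # 1.0 -- the cliché captures the interdependence
```

A greedy one-feature-at-a-time learner sees no gain from any first step here, so it never starts down the right path; supplying (or, as in CLUse, learning) the combined test removes that blind spot.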
SPEAKER: Mauricio de Almeida
TOPIC: Learning (Tree/Rule)-like Boolean C++ Methods
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
Decision trees and rule sets are commonly used languages for describing
learned concepts. Although these representations are easy to read, and the
learners that generate them can often evaluate their performance on a
testing set, the resulting rule sets and trees are not directly
implementable in the systems that are to use them.
This problem, among others, suggested to us an approach in which C++
classes, equivalent to a decision tree, a rule set, or a mixture of the
two, are learned directly from a set of examples represented as
attribute-value vectors. In this seminar we present the main ideas behind
the Knowledge Embedding Learning system, which we are now implementing.
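The step from a learned tree to directly implementable code can be sketched by emitting a Boolean C++ method from a tree structure. This is a hypothetical illustration of the general idea, not the Knowledge Embedding Learning system; the tree, class and method names are invented.

```python
# Toy sketch: turn a learned decision tree directly into a Boolean C++
# method, so the learned concept is executable code rather than a
# description that still has to be hand-coded.
def emit(tree, indent="    "):
    """tree is either a bool leaf or (attribute, then_branch, else_branch)."""
    if isinstance(tree, bool):
        return f"{indent}return {'true' if tree else 'false'};\n"
    attr, then_branch, else_branch = tree
    return (f"{indent}if ({attr}) {{\n"
            + emit(then_branch, indent + "    ")
            + f"{indent}}}\n"
            + emit(else_branch, indent))

# A tiny hand-written "learned" tree over Boolean attributes.
tree = ("windy", False, ("sunny", True, False))
cpp = "bool Example::playTennis() const {\n" + emit(tree) + "}\n"
print(cpp)
```

The emitted method is a chain of `if`/`return` statements mirroring the tree's branches, so a rule set (a disjunction of conjunctions) could be emitted by the same generator, one `if` per rule.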
SPEAKER: Mario Jarmasz
TOPIC: Corpus Linguistics: a Paradigm for Solving NLP Problems
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
Development of large electronic corpora for use in Computational Linguistics
started in the late 1970s. Advances in software and NLP technologies have
facilitated the transformation of text archives into electronic corpora.
Many researchers have turned to Corpus Linguistics in the past decade to
develop large-scale linguistic applications. The use of large corpora is not
a new concept in Linguistics. The richness of the corpora, the increase in
their size and the fact that many are easily accessible are some reasons
that make Corpus Linguistics attractive today. In this talk I will present
the different aspects of Corpus Linguistics. A definition of the corpus will
be introduced along with the various types of corpora that are currently
available. I will give an overview of the fields interested in corpora and
of possible applications such as the construction of an electronic
thesaurus, information retrieval systems and machine translation systems. I
will also present some statistical methods for empirical investigations of
corpora, as well as the steps involved in creating an electronic corpus.
This presentation is based on the book Les linguistiques de corpus (Habert,
Nazarenko, Salem, 1997).
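One of the statistical methods commonly used in empirical corpus work is scoring collocations by pointwise mutual information. The following is a minimal sketch on an invented toy corpus, not an example from the talk or the book.

```python
# Scoring bigram collocations by pointwise mutual information (PMI):
# pairs that co-occur more often than their individual frequencies
# predict score above zero, chance pairings score at or below it.
import math
from collections import Counter

corpus = ("the strong tea was served with the strong coffee "
          "while the powerful computer ran the powerful program").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
n = len(corpus)

def pmi(w1, w2):
    """log of P(w1, w2) / (P(w1) * P(w2)), estimated from bigram counts."""
    return math.log((bigrams[(w1, w2)] / (n - 1))
                    / ((unigrams[w1] / n) * (unigrams[w2] / n)))

# Genuine collocations outscore frequent but uninformative pairings.
print(pmi("strong", "tea"))
print(pmi("the", "strong"))
```

On a real corpus the same counts would be taken over millions of tokens, which is where the size and accessibility of modern corpora become decisive.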