The Text Analysis and Machine Learning Group
SPEAKER: Chris Drummond
TOPIC: Symbols, Systematicity and Synergisms
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
In the solving of complex tasks, it is important to exploit the
results of prior learning. If transfer occurs at the level of the whole
task, the likelihood of previous learning being relevant is small. If the
complex task can be broken down into smaller parts then this likelihood will
be considerably increased. It is this structure sensitivity which is the
critical property of Fodor's Language of Thought hypothesis. The main
argument in support of this view is that thought is strongly systematic,
that being able to think some thoughts is intrinsically linked to being able
to think other thoughts. This work looks at the idea that being able to
solve some tasks is intrinsically linked to being able to solve other tasks.
This talk discusses an approach that realises this systematicity property by
combining symbolic and associative processes. An associative learning
algorithm generates solutions to subtasks. These are composed by a process
much like symbolic planning to form a solution to a complex task.
This is further refined by the associative learning algorithm so that
it becomes more synergistic, the solutions to subtasks becoming more
interdependent. This talk will contrast this approach with others from the
Artificial Intelligence community and related fields.
It will demonstrate its viability by showing how it is implemented
and used to solve a series of robot navigation tasks.

SPEAKER: Dr Joel Martin, National Research Council, email: Joel.Martin@iit.nrc.ca
TOPIC: Clustering Documents in Any Language?
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
A collection of documents can be more useful if it is organized or
clustered, but most automatic clustering techniques rely on a preprocessing
step that identifies words or stems and discards a known list of irrelevant
words. Designing the
preprocessing step for a new language is usually time consuming because a
human language user must choose a set of stemming rules and stop words by
hand using some form of trial and error. I will present some work in
progress that would allow clustering in an arbitrary language without
requiring a language user to identify stems and a stoplist.
The system learns a suffix tree description (essentially a grammar)
of frequent subsequences in a large collection of documents and then uses
that knowledge to cluster the documents and produce descriptive labels for
those documents. Initial results
suggest that in English, the automatic technique is comparable to using
hand-generated stemming rules and hand-picked stoplists.

SPEAKER: Dr Yllias Chali
TOPIC: Lexical Chains as an Indicator of the Text Segment Topic
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
Lexical cohesion is a device for creating unity in text; it arises
from the semantic relationship between words. We investigate a technique
relying on a model of the topic progression in the multi-paragraph text,
derived from lexical chains, and without requiring its full semantic
interpretation. We present two algorithms for the computation of these
chains: Roget's thesaurus based version and WordNet thesaurus based version.
Once the original text has been segmented, lexical chaining proceeds in
three steps: select a set of candidate words, find the relatedness among the
members of the chains, and build up the chains. Finally, we show the use of
lexical chaining in the process of segment selection for the purpose of text
summarization.

SPEAKER: Dr
TOPIC: An Automated Method for Studying Interactive Systems
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
Information Retrieval (IR) is usually an interactive process, which
makes evaluation of IR systems tricky. System studies - studies that measure
relevance and precision on well defined tasks in a well defined document
environment - are limited in their application because they don't do justice
to user aspects and interactivity. User studies are either costly and time
consuming, or of a small scale. In our group, we explore the possibility of
adding interactivity to system studies, which may help to bridge the gap
between user studies and traditional system studies. In the simulations,
samples are taken from the set of all possible user actions. Evaluation is
done by comparing the performances after different sequences of such
interactions. Several machine learning methods are then employed to identify
those categories of actions that are most likely to lead to the best end
results. In the seminar, I intend to discuss this methodology and present
the results of a first run of simulations.

SPEAKER: Jerome Mathieu, Computer Systems Officer, National Research Council
TOPIC: Adaptation of a Keyphrase Extractor for Japanese Text
DATE: Friday, February 5th, 1999
PLACE: Room 318, MacDonald Hall
ABSTRACT:
Compared to keyphrase extraction from an English or French corpus,
extracting keyphrases from Japanese text needs a different approach,
especially for parsing, stemming, and scoring.
This talk will discuss the
characteristics of a typical Japanese document and explain the
implementation of an efficient and effective way of dealing with
Japanese keyphrase extraction.

SPEAKER: Dr Claire Cardie
TOPIC: Machine Learning for Information Extraction Systems
DATE: Friday, February 12th, 1999
PLACE: Room 318, MacDonald Hall
ABSTRACT:
A major obstacle to building robust systems that can read, summarize,
and extract information from text is the need for large amounts of
linguistic knowledge to handle the myriad syntactic, semantic, and pragmatic
ambiguities that pervade virtually all aspects of text analysis.
This talk will first briefly summarize existing work that addresses
this knowledge engineering bottleneck for information extraction systems.
We will then present a new approach to partial parsing of natural
language texts that supports large-scale information extraction applications
and relies on machine learning methods.
The approach combines corpus-based grammar induction with a very
simple pattern-matching algorithm and an optional constituent verification
step. In spite of its
simplicity, we will show that performance is surprisingly good for
applications that require or prefer fairly simple constituent bracketing.

SPEAKER: Dr Rob Holte
TOPIC: Machine Learning with Skewed Class Distributions
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
Many real-world concept learning applications involve detecting rare
events. Datasets for such
applications will be highly skewed, with positive examples (the events of
interest) being far outnumbered by negative examples.
This severe imbalance often causes existing concept learning systems
to perform poorly. In this talk I will report on my current investigations
into this phenomenon, which focus on the nearest neighbour learning
algorithm (IB1).

SPEAKER: Anantha Mahadevan
TOPIC: Using data mining and logistic analysis on an E-Commerce dataset
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
The InterNeg project (http://interneg.org) was created to study
anonymous negotiations through the development of net-centric systems, and
one of the systems developed is called INSPIRE. Data generated by INSPIRE is
analyzed using naive Bayes, entropy-based decision tree, and CHAID methods.
These data mining methods provide exploratory models that indicate
general interaction between
variables. Statistical logistic
analysis, which is based on log measures and generalized linear modelling,
is used to further analyze the initial models. This talk will present the
data mining and logistic methods, as well as initial results from the
analyses.

SPEAKER: Dr. Marek Zaremba, Departement d'informatique, UQAH
TOPIC: Design of Heterogeneous Intelligent Systems for Concurrent Processing
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
This talk will briefly summarise the issues involved in the design of
systems that combine different computational intelligence techniques. The
main part of the talk will present a design and programming tool that allows
the components of a heterogeneous system to communicate and synchronise at
run time in a dynamic distributed computing environment. The approach used
in the design of the concurrent programming tool builds on the strength of
process-algebraic formal models for system specification and verification
tasks. The distributed system design process will be
illustrated with an example of a multi-spectral image classification
problem.

SPEAKER: Dr. Jean-Francois Delannoy, RES International
TOPIC: On argumentation analysis
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
For computational linguists, discourse analysis is mostly restricted
to text organization and the forms of exchange in conversation.
More attention is due to the analysis of argumentation proper, that
is: the claim/proof core (claims, model description, explicit inferences,
evidence, hypothesis formulation and testing), and its rhetorical
supplements (irony, analogy, ...). Argumentation analysis can render services
in summarization, information extraction, information retrieval, critical
reading, scientific communication, and decision making. I will present a
draft notation for the mark-up of argumentation, and show ways to mark up
texts automatically or semi-automatically.

SPEAKER: Messaouda Ouerd
TOPIC: Learning Bayes Belief Networks: Test Data Generation
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
In this talk we present the problem of learning in belief networks.
Our goal is to build a probabilistic network from the distribution of
the data which adequately represents the data. It is assumed that no
information about the probability is available. We consider representations
using Chow's dependence trees and Polytrees (Singly Connected Networks) as
structures for inferring causal knowledge from given training samples.
To test the validity of the theoretical algorithms that we have developed we
propose a method for generating sample data obeying an underlying
structure which can be any Directed Acyclic Graph. The results of this
process will be used to test the learning algorithms.

SPEAKER: Dr Joel Martin, National Research Council, email: Joel.Martin@iit.nrc.ca
TOPIC: Which System Configuration is Best?
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
Many complex systems have parameters that allow them to be endlessly
configured, but not even the designers know which configurations are the
best for a new application. Each
configuration of the system can be simulated (or run if it is a computer
system) on a representative set of problems.
One way to find the best configuration is to perform a randomized or
a hill-climbing search trying to optimize the simulation results.
Although this technique will find an optimal configuration, it does
not guarantee that the observed optimum is the only one or that the observed
optimum will still be best for a different representative sample of
problems. Another approach is to
simulate as many different configurations as possible from across the entire
space. These simulation results
would then be analyzed to make general statements about which configurations
are reliably better than others. This
talk will discuss how to analyze the results of simulating a large number of
system configurations. These
techniques have been applied to human-computer systems, machine learning
systems, and traditional discrete-event systems.

SPEAKER: Jelber Sayyad Shirabad
TOPIC: Classifying Software Problem Reports
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
In this presentation I will talk about the task of classifying
problem reports about the operation of telephone switch software into
a fixed set of problem types. While the intention of using a problem type is
to aid the software engineers in finding the cause of the problem, in
practice, for various reasons, the type is often not assigned to a
problem. The talk will discuss issues surrounding modeling this task as a
text classification problem, and decisions that one has to make regarding
the nature of the training sets, and their impact on measures such as
accuracy, precision, and recall. We will present some of the results obtained by
altering different parameters and factors in this particular learning task.
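
For readers unfamiliar with the measures named in this abstract, the following is a minimal sketch of how accuracy and per-class precision and recall can be computed for a classifier that assigns each report one type from a fixed set. It is not taken from the talk: the function name `evaluate`, the problem types, and the sample labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def evaluate(gold, predicted):
    """Return accuracy and per-class (precision, recall) for parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    true_pos = Counter()               # correct predictions, counted per class
    pred_count = Counter(predicted)    # how often each class was predicted
    gold_count = Counter(gold)         # how often each class really occurs
    for g, p in zip(gold, predicted):
        if g == p:
            true_pos[g] += 1
    accuracy = sum(true_pos.values()) / len(gold) if gold else 0.0
    per_class = {}
    for label in sorted(set(gold) | set(predicted)):
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / gold_count[label] if gold_count[label] else 0.0
        per_class[label] = (precision, recall)
    return accuracy, per_class


# Hypothetical problem types and labels, invented for illustration only.
gold = ["hardware", "software", "software", "config", "software", "hardware"]
predicted = ["hardware", "software", "config", "config", "software", "software"]
accuracy, per_class = evaluate(gold, predicted)
for label, (precision, recall) in per_class.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Reporting precision and recall per class, rather than accuracy alone, shows how a rarely assigned problem type fares even when overall accuracy looks acceptable.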
SPEAKERS: Dr Rob Holte and Istvan Hernadvolgyi
TOPIC: A Space-Time Tradeoff for Memory-based Heuristics
DATE:
PLACE: Room 318, MacDonald Hall
ABSTRACT:
A memory-based heuristic is a function, h(s), stored in the form of a
lookup table (pattern database): h(s) is computed by mapping s to an index
and then retrieving the appropriate entry in the table. Korf (1997)
conjectures for search using memory-based heuristics that m*t is a constant,
where m is the size of the heuristic's lookup table and t is search time. In
this paper we present a method for automatically generating memory-based
heuristics and use this to test Korf's conjecture in a large-scale
experiment. This is an extended version of a talk that will be given at
AAAI'99 in July.
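
As an illustration of the lookup-table idea described in this abstract, the sketch below builds a small pattern database for the 3x3 sliding-tile puzzle. It is not the speakers' method for generating heuristics automatically. The puzzle, the choice of `PATTERN`, and all function names are assumptions made for this example. The abstraction keeps the blank and three tiles distinct and treats every other tile as indistinguishable; the table stores the distance from each abstract state to the abstract goal.

```python
from collections import deque

SIDE = 3                              # 3x3 sliding-tile puzzle; 0 is the blank
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)
PATTERN = {0, 1, 2, 3}                # tiles kept distinct (an arbitrary choice)


def abstract(state):
    """Map s to its table index: tiles outside PATTERN all become -1."""
    return tuple(t if t in PATTERN else -1 for t in state)


def neighbours(state):
    """Yield the states reachable by sliding one adjacent tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, SIDE)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < SIDE and 0 <= c < SIDE:
            other = r * SIDE + c
            nxt = list(state)
            nxt[blank], nxt[other] = nxt[other], nxt[blank]
            yield tuple(nxt)


def build_pattern_database():
    """Breadth-first search outward from the abstract goal.

    Every move is reversible, so the BFS depth of an abstract state equals
    its distance to the abstract goal.
    """
    start = abstract(GOAL)
    table = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours(current):
            if nxt not in table:
                table[nxt] = table[current] + 1
                queue.append(nxt)
    return table


TABLE = build_pattern_database()      # m, the table size, is len(TABLE)


def h(state):
    """Memory-based heuristic: map s to an index, then retrieve the entry."""
    return TABLE[abstract(state)]
```

Here `len(TABLE)` plays the role of m in the conjecture: keeping more tiles in `PATTERN` enlarges the table and gives a better-informed h(s), which is the kind of space-time tradeoff the abstract refers to.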