The Text Analysis and Machine Learning Group


 


Winter 1999


 

SPEAKER:             Chris Drummond, University of Ottawa, email: cdrummon@csi.uottawa.ca

TOPIC:                   Symbols, Systematicity and Synergisms  

DATE:                    Friday, January 8th, 1999

PLACE:                   Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         In the solving of complex tasks, it is important to exploit the results of prior learning. If transfer occurs at the level of the whole task, the likelihood of previous learning being relevant is small. If the complex task can be broken down into smaller parts then this likelihood will be considerably increased. It is this structure sensitivity which is the critical property of Fodor's Language of Thought hypothesis. The main argument in support of this view is that thought is strongly systematic, that being able to think some thoughts is intrinsically linked to being able to think other thoughts. This work looks at the idea that being able to solve some tasks is intrinsically linked to being able to solve other tasks. This talk discusses an approach that realises this systematicity property by combining symbolic and associative processes. An associative learning algorithm generates solutions to subtasks. These are composed by a process much like symbolic planning to form a solution to a complex task.  This is further refined by the associative learning algorithm so that it becomes more synergistic, the solutions to subtasks becoming more interdependent. This talk will contrast this approach to others from the Artificial Intelligence community and related fields.  It will demonstrate its viability by showing how it is implemented and used to solve a series of robot navigation tasks.  

   

SPEAKER:             Dr Joel Martin, National Research Council, email: Joel.Martin@iit.nrc.ca               

TOPIC:                   Clustering Documents in Any Language? Clustering Sequences based on Frequent Subsequences.

DATE:                      Friday, January 15th, 1999  

PLACE:                  Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         A collection of documents can be more useful if it is organized or clustered, but most automatic clustering techniques rely on a preprocessing step that identifies words or stems and discards a known list of irrelevant words.  Designing the preprocessing step for a new language is usually time consuming because a human language user must choose a set of stemming rules and stop words by hand using some form of trial and error. I will present some work in progress that would allow clustering in an arbitrary language without requiring a language user to identify stems and a stoplist.  The system learns a suffix tree description (essentially a grammar) of frequent subsequences in a large collection of documents and then uses that knowledge to cluster the documents and produce descriptive labels for those documents.  Initial results suggest that in English, the automatic technique is comparable to using hand-generated stemming rules and hand-picked stoplists.  
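The idea of comparing documents through frequent subsequences, with no stemmer or stoplist, can be illustrated with a minimal Python sketch. It is not the system described in the abstract: it swaps the suffix tree for plain character trigram counts, and the function names, the toy documents, and the `min_docs` threshold are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

def ngrams(text, n=3):
    """Return the set of character n-grams in a lowercased document."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def frequent_ngrams(docs, n=3, min_docs=2):
    """Keep n-grams that occur in at least `min_docs` documents (assumed threshold)."""
    doc_freq = Counter(g for d in docs for g in ngrams(d, n))
    return {g for g, df in doc_freq.items() if df >= min_docs}

def vector(doc, vocab, n=3):
    """Count occurrences of each frequent n-gram in one document."""
    text = doc.lower()
    grams = (text[i:i + n] for i in range(len(text) - n + 1))
    return Counter(g for g in grams if g in vocab)

def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[g] * v[g] for g in u if g in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy collection; no language-specific preprocessing is applied.
docs = [
    "clustering documents clustering",
    "documents clustering",
    "robot navigation tasks",
]
vocab = frequent_ngrams(docs)
vectors = [vector(d, vocab) for d in docs]
sim_ab = cosine(vectors[0], vectors[1])  # two documents on the same topic
sim_ac = cosine(vectors[0], vectors[2])  # documents sharing no frequent trigram
print(sim_ab, sim_ac)
```

The resulting similarities could be passed to any standard clustering routine; the point of the sketch is only that the features are found from the collection itself rather than from hand-written language rules.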

   

SPEAKER:             Dr Yllias Chali, University of Ottawa, email: ychali@csi.uottawa.ca

TOPIC:                   Lexical Chains as an Indicator of the Text Segment Topic               

DATE:                    Friday, January 22nd, 1999

PLACE:                  Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         Lexical cohesion is a device for creating unity in text; it arises from the semantic relationships between words. We investigate a technique that relies on a model of topic progression in multi-paragraph text, derived from lexical chains, without requiring full semantic interpretation of the text. We present two algorithms for computing these chains: a version based on Roget's thesaurus and a version based on the WordNet thesaurus. The original text is first segmented. Lexical chaining then proceeds in three steps: selecting a set of candidate words, finding the relatedness among the members of the chains, and building up the chains. Finally, we show the use of lexical chaining in the process of segment selection for the purpose of text summarization.

   

SPEAKER:             Dr Berry Debruijn, University of Ottawa, email: debruijn@csi.uottawa.ca

TOPIC:                   An Automated Method for Studying Interactive Systems                 

DATE:                    Friday, January 29th, 1999  

PLACE:                  Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         Information Retrieval (IR) is usually an interactive process, which makes evaluation of IR systems tricky. System studies - studies that measure relevance and precision on well-defined tasks in a well-defined document environment - are limited in their application because they don't do justice to user aspects and interactivity. User studies are either costly and time-consuming, or small in scale. In our group, we explore the possibility of adding interactivity to system studies, which may help to bridge the gap between user studies and traditional system studies. In the simulations, samples are taken from the set of all possible user actions. Evaluation is done by comparing the performance after different sequences of such interactions. Several machine learning methods are then employed to identify those categories of actions that are most likely to lead to the best end results. In the seminar, I intend to discuss this methodology and present the results of a first run of simulations.

   

SPEAKER:             Jerome Mathieu, Computer Systems Officer, National Research Council, email: Jerome.Mathieu@iit.nrc.ca

TOPIC:                   Adaptation of a Keyphrase Extractor for Japanese Text                 

DATE:                    Friday, February 5th, 1999

PLACE:                  Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         Compared to keyphrase extraction from an English or French corpus, extracting keyphrases from Japanese text needs a different approach, especially for parsing, stemming, and scoring. This talk will discuss the characteristics of a typical Japanese document and explain the implementation of an efficient and effective way of dealing with Japanese keyphrase extraction.

   

SPEAKER:             Dr Claire Cardie, Cornell University, email: cardie@cs.cornell.ca

TOPIC:                   Machine Learning for Information Extraction Systems                

DATE:                    Friday, February 12th, 1999

PLACE:                  Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         A major obstacle to building robust systems that can read, summarize, and extract information from text is the need for large amounts of linguistic knowledge to handle the myriad syntactic, semantic, and pragmatic ambiguities that pervade virtually all aspects of text analysis.  This talk will first briefly summarize existing work that addresses this knowledge engineering bottleneck for information extraction systems.  We will then present a new approach to partial parsing of natural language texts that supports large-scale information extraction applications and relies on machine learning methods.  The approach combines corpus-based grammar induction with a very simple pattern-matching algorithm and an optional constituent verification step.  In spite of its simplicity, we will show that performance is surprisingly good for applications that require or prefer fairly simple constituent bracketing.

   

SPEAKER:             Dr Rob Holte, University of Ottawa, email: Holte@site.uottawa.ca

TOPIC:                   Machine Learning with Skewed Class Distributions  

DATE:                    Friday, March 12th, 1999  

PLACE:                  Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         Many real-world concept learning applications involve detecting rare events.  Datasets for such applications will be highly skewed, with positive examples (the events of interest) being far outnumbered by negative examples.  This severe imbalance often causes existing concept learning systems to perform poorly. In this talk I will report on my current investigations into this phenomenon, which focus on the nearest neighbour learning algorithm (IB1).
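A small Python sketch may help make the effect of skew concrete. It is a toy illustration, not the speaker's experiment: it uses a plain 1-nearest-neighbour rule on invented one-dimensional data, and the data points, names, and resulting figures are assumptions chosen only to show that overall accuracy can look good while recall on the rare class stays low.

```python
def nearest_label(x, training):
    """Return the label of the training point closest to x (1-nearest-neighbour)."""
    return min(training, key=lambda pair: abs(pair[0] - x))[1]

# Hypothetical skewed training set: 20 negatives (label 0), 2 positives (label 1).
train = [(float(i), 0) for i in range(20)] + [(25.0, 1), (26.0, 1)]

# Hypothetical test set: 16 negatives and 4 positives, two of which
# lie closer to the dense negative region than to any stored positive.
test = [(i + 0.5, 0) for i in range(16)] + [(20.5, 1), (21.0, 1), (23.0, 1), (27.0, 1)]

predictions = [nearest_label(x, train) for x, _ in test]
correct = sum(p == y for p, (_, y) in zip(predictions, test))
true_pos = sum(p == 1 and y == 1 for p, (_, y) in zip(predictions, test))
positives = sum(y for _, y in test)

accuracy = correct / len(test)          # dominated by the many negatives
recall_pos = true_pos / positives       # fraction of rare events actually found
print(accuracy, recall_pos)
```

On this toy data, half of the positive test examples are assigned to the negative class even though most predictions are correct, which is why accuracy alone is a weak measure when the classes are heavily imbalanced.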

   

SPEAKER:             Anantha Mahadevan, Carleton University, email: amahadev@business.carleton.ca

TOPIC:                   Using data mining and logistic analysis on an E-Commerce dataset  

DATE:                    Friday, March 26th, 1999

PLACE:                  Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         The InterNeg project (http://interneg.org) was created to study anonymous negotiations through the development of net-centric systems, and one of the systems developed is called INSPIRE. Data generated by INSPIRE is analyzed using naive Bayes, entropy-based decision tree, and CHAID methods.  These data mining methods provide exploratory models that indicate general interactions between variables.  Statistical logistic analysis, which is based on log measures and generalized linear modelling, is used to further analyze the initial models. This talk will present the data mining and logistic methods, as well as initial results from the analyses.

   

SPEAKER:             Dr. Marek Zaremba, Departement d'informatique, UQAH, email: Marek_Zaremba@UQAH.UQuebec.CA

TOPIC:                   Design of Heterogeneous Intelligent Systems for Concurrent Processing  

DATE:                    Friday, April 9th, 1999  

PLACE:                  Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         This talk will briefly summarise the issues involved in the design of systems that combine different computational intelligence techniques. The main part of the talk will present a design and programming tool that allows the components of a heterogeneous system to communicate and synchronise at run time in a dynamic distributed computing environment. The approach used in the design of the concurrent programming tool builds on the strength of process-algebraic formal models for system specification and verification tasks. The distributed system design process will be illustrated with an example of a multi-spectral image classification problem.

   

SPEAKER:             Dr. Jean-Francois Delannoy, RES International, Ottawa, email: delannoy@res.ca

TOPIC:                   On argumentation analysis  

DATE:                    Friday, April 16th, 1999  

PLACE:                  Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         For computational linguists, discourse analysis is mostly restricted to text organization and the forms of exchange in conversation.  More attention is due to the analysis of argumentation proper, that is, the claim/proof core (claims, model description, explicit inferences, evidence, hypothesis formulation and testing) and its rhetorical supplements (irony, analogy, ...). Argumentation analysis can render services in summarization, information extraction, information retrieval, critical reading, scientific communication, and decision making. I will present a draft notation for the mark-up of argumentation, and show ways to mark up texts automatically or semi-automatically.

   

SPEAKER:             Messaouda Ouerd, University of Ottawa, email: ouerd@site.uottawa.ca

TOPIC:                   Learning Bayes Belief Networks: Test Data Generation

DATE:                    Friday, April 23rd, 1999

PLACE:                  Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         In this talk we present the problem of learning in belief networks.  Our goal is to build a probabilistic network from the distribution of the data which adequately represents the data. It is assumed that no information about the probability is available. We consider representations using Chow's dependence trees and Polytrees (Singly Connected Networks) as structures for inferring causal knowledge if the training samples are given. To test the validity of the theoretical algorithms that we have developed, we propose a method for generating sample data obeying an underlying structure, which can be any Directed Acyclic Graph. The results of this process will be used to test the learning algorithms.

   

SPEAKER:             Dr Joel Martin, National Research Council, email: Joel.Martin@iit.nrc.ca               

TOPIC:                   Which System Configuration is Best?                               

DATE:                    Friday, April 30th, 1999 (not on the 23rd)

PLACE:                  Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         Many complex systems have parameters that allow them to be endlessly configured, but not even the designers know which configurations are the best for a new application.  Each configuration of the system can be simulated (or run if it is a computer system) on a representative set of problems.  One way to find the best configuration is to perform a randomized or a hill-climbing search trying to optimize the simulation results.  Although this technique will find an optimal configuration, it does not guarantee that the observed optimal is the only one or that the observed optimal will still be best for a different representative sample of problems.  Another approach is to simulate as many different configurations as possible from across the entire space.  These simulation results would then be analyzed to make general statements about which configurations are reliably better than others.  This talk will discuss how to analyze the results of simulating a large number of system configurations.  These techniques have been applied to human-computer systems, machine learning systems, and traditional discrete-event systems.

 

SPEAKER:             Jelber Sayyad Shirabad, University of Ottawa, email: jsayyad@csi.uottawa.ca

TOPIC:                   Classifying Software Problem Reports  

DATE:                    Friday, May 7th, 1999  

PLACE:                  Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         In this presentation I will talk about the task of classifying problem reports about the operation of telephone switch software into a fixed set of problem types. While the intention of using a problem type is to aid the software engineers in finding the cause of the problem, in practice, for various reasons, the type is often not assigned to a problem. The talk will discuss issues surrounding modeling this task as a text classification problem, and decisions that one has to make regarding the nature of the training sets, and their impact on measures such as accuracy, precision, and recall. We will present some of the results obtained by altering different parameters and factors in this particular learning task.
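For readers unfamiliar with the measures named above, the following Python sketch computes accuracy, precision, and recall for one problem type. The label names and the small gold/predicted lists are hypothetical and are not taken from the telephone switch data.

```python
def evaluate(gold, predicted, label):
    """Return (accuracy, precision, recall) with `label` as the class of interest."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)  # correctly assigned
    fp = sum(g != label and p == label for g, p in pairs)  # wrongly assigned
    fn = sum(g == label and p != label for g, p in pairs)  # missed
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical problem types: "hw" (hardware), "sw" (software), "cfg" (configuration).
gold      = ["hw", "sw", "sw", "cfg", "sw", "hw"]
predicted = ["hw", "sw", "hw", "cfg", "sw", "sw"]

accuracy, precision, recall = evaluate(gold, predicted, "sw")
print(accuracy, precision, recall)
```

Accuracy is computed over all problem types at once, while precision and recall are computed per type, so a change in the make-up of the training set can move the three figures in different directions.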

   

SPEAKERS:          Dr Rob Holte and Istvan Hernadvolgyi, University of Ottawa  

TOPIC:                   A Space-Time Tradeoff for Memory-based Heuristics  

DATE:                    Friday, May 14th, 1999  

PLACE:                  Room 318, MacDonald Hall, University of Ottawa  

ABSTRACT:         A memory-based heuristic is a function, h(s), stored in the form of a lookup table (pattern database): h(s) is computed by mapping s to an index and then retrieving the appropriate entry in the table. Korf (1997) conjectures for search using memory-based heuristics that m*t is a constant, where m is the size of the heuristic's lookup table and t is search time. In this paper we present a method for automatically generating memory-based heuristics and use this to test Korf's conjecture in a large-scale experiment. This is an extended version of a talk that will be given at AAAI'99 in July.
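The lookup described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' generation method: the state space (permutations of five elements, moved by swapping adjacent positions), the abstraction that blanks out some elements, and all function names are invented for the example.

```python
from collections import deque

GOAL = (0, 1, 2, 3, 4)  # hypothetical toy goal state

def neighbours(state):
    """Yield states reachable by swapping two adjacent positions (unit cost)."""
    for i in range(len(state) - 1):
        s = list(state)
        s[i], s[i + 1] = s[i + 1], s[i]
        yield tuple(s)

def abstract(state, kept):
    """Map a state to its table index: elements outside `kept` become -1."""
    return tuple(x if x in kept else -1 for x in state)

def build_table(kept):
    """Breadth-first search backwards from the abstract goal (moves are symmetric)."""
    start = abstract(GOAL, kept)
    table = {start: 0}
    queue = deque([start])
    while queue:
        pattern = queue.popleft()
        for nxt in neighbours(pattern):
            if nxt not in table:
                table[nxt] = table[pattern] + 1
                queue.append(nxt)
    return table

def h(state, table, kept):
    """Memory-based heuristic: map the state to an index, then look it up."""
    return table[abstract(state, kept)]

small_kept, large_kept = {0, 1}, {0, 1, 2}
small, large = build_table(small_kept), build_table(large_kept)

state = (4, 3, 2, 1, 0)
print(len(small), h(state, small, small_kept))
print(len(large), h(state, large, large_kept))
```

Keeping more elements in the abstraction produces a larger table (larger m) whose entries are never smaller than those of the coarser table, which is the kind of space-time tradeoff the conjecture concerns; the sketch does not measure search time t.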

 
