The Text Analysis and Machine Learning Group
SPEAKER:
Heide Brucher
TOPIC:
Clustering queries
DATE:
PLACE:
Room 318, MacDonald Hall
ABSTRACT:
Every time a user issues a query, he tells us something about his
actual information needs. He may want to reuse queries he executed some
time ago, perhaps with a slight topic shift; a query he issued earlier
would probably be a good starting point for him. Within a sequence of
queries, some subsets belong together because they deal with the same
topic. Usually, topically related queries are not issued in one
contiguous stream, so a user who wants to reuse a query from such a group
first has to recover it. Recovering earlier queries requires organizing
them in a way that supports reuse. One way to organize the queries is to
cluster them according to the topic they relate to (content-based
clustering).

SPEAKER:
Jérôme Tétreault
TOPIC:
Bilingual text alignment based on word occurrence information
DATE:
PLACE:
Room 318, MacDonald Hall
ABSTRACT:
Bilingual parallel corpora are one of the most valuable sources of
information for developing translation resources. Aligned corpora, obtained
by aligning corresponding segments (usually sentences) of texts, have proved
very useful in many tasks, such as statistical machine translation, bilingual
lexicography, and word sense disambiguation. In this talk, I will give a
brief overview of published work on parallel text alignment, outlining
different approaches and their domains of application. I will then present
in more detail an algorithm that uses dynamic programming techniques to
compare word occurrence vectors. This algorithm is based on previous work by
Fung, with some modifications. It aims at extracting approximate bilingual
lexicons from bilingual corpora, assuming no knowledge of either language and
no prior sentence-level or paragraph-level alignment. Results for
bi-lexicons extracted from the Hansard corpus will be presented. We envisage
that the extracted bi-lexicon could then be used to produce a set of anchor
points between the texts, allowing high-accuracy alignment at a finer level.

SPEAKER:
Ebenezer Ntienjem, ntienjem@usa.net
TOPIC:
Completion of Logic Programs with respect to Negation-As-Finite-Failure
DATE:
PLACE:
Room 318, MacDonald Hall
ABSTRACT:
The procedural semantics of logic programs, expressed by the
so-called SLDNF-resolution, treats a variable occurring in a negative
literal in the body but not in the head of a program clause as universally
quantified whereas
SPEAKER:
Tom Mitchell, President-Elect, American Association for Artificial Intelligence
DATE:
TITLE:
Extracting Information from the World Wide Web
ABSTRACT:
Consider the fact that although your computer workstation can now
retrieve any of 600,000,000 pages on the World Wide Web, it unfortunately
cannot understand their content. This
is, of course, because web pages are written to be understandable to people,
not computers. The goal of our research is to automatically extract a very
large database of facts that mirror the content of the Web, and that can be
manipulated by computer. If we can achieve this goal, we will be able to use
the web as a gargantuan database and knowledge base supporting a rich variety
of applications.
Our approach is to use machine learning algorithms to train a system to
automatically extract information from web hypertext.
For example, in one set of experiments our system was trained to
extract descriptions of faculty, students, research projects, and courses from
web sites of computer science departments.
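The abstract does not name the learning algorithms used. As one hedged illustration of training a system on labeled department pages, here is a minimal multinomial naive Bayes page classifier with Laplace smoothing. This is a standard text-classification technique swapped in for illustration, not the system described in the talk; the function names `tokenize`, `train`, and `classify`, and the toy training pages, are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_pages):
    """Estimate class counts and per-class word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_pages:
        class_counts[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log posterior (Laplace-smoothed)."""
    class_counts, word_counts, vocabulary = model
    total_pages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_pages)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training pages, one per category named in the abstract.
training_pages = [
    ("Professor of computer science, my research interests and publications", "faculty"),
    ("PhD student, my advisor is Professor Smith, my homepage", "student"),
    ("Course syllabus, lectures, homework assignments and exams", "course"),
    ("Research project funded by grant, project members and papers", "project"),
]

model = train(training_pages)
print(classify(model, "Homework assignments for the lectures this term"))  # prints: course
```

A real extractor would need far more training data and would also have to pull out fields and relations, not just page categories; the sketch only shows the page-classification step.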
The system then used these learned extraction routines to build a database
containing thousands of new entries by automatically browsing new university
web sites. The system is currently
running 24 hours per day, and over the past eight months has built a knowledge
base containing over 100,000 assertions, with an accuracy of roughly 70%. This
talk will present the machine learning algorithms we have developed to date,
along with experimental results suggesting these methods can be quite
effective for information extraction in certain
domains.

BIO: Tom M. Mitchell is the Fredkin Professor of Artificial Intelligence and Learning in the

SPEAKER:
TOPIC:
Evaluation of Interactive Information Retrieval
DATE:
PLACE:
Room 318, MacDonald Hall
ABSTRACT:
Our experimental platform, a full-text information retrieval system,
supports Query Expansion (QE) by Relevance Feedback (RF); that is, the
system suggests additional query terms that are typical of relevant
documents and atypical of irrelevant or unretrieved documents.
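The abstract does not say how suggested terms are scored. A minimal sketch, assuming a simple difference of document-frequency shares between the documents the user marked relevant and the remaining (irrelevant or unretrieved) ones; the function names `document_frequencies` and `suggest_terms` and the toy documents are hypothetical, and weighting schemes such as Rocchio or Robertson/Sparck Jones term weights are common alternatives.

```python
from collections import Counter


def document_frequencies(documents):
    """Count, for each term, how many of the documents contain it."""
    df = Counter()
    for doc in documents:
        df.update(set(doc.lower().split()))
    return df


def suggest_terms(query, relevant, non_relevant, k=3):
    """Rank candidate expansion terms: terms common in the relevant documents
    and rare in the irrelevant or unretrieved ones score highest."""
    rel_df = document_frequencies(relevant)
    non_df = document_frequencies(non_relevant)
    query_terms = set(query.lower().split())
    scores = {}
    for term, count in rel_df.items():
        if term in query_terms:
            continue  # only suggest terms the user has not already typed
        rel_share = count / len(relevant)
        non_share = non_df[term] / len(non_relevant) if non_relevant else 0.0
        scores[term] = rel_share - non_share
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    # Keep at most k terms, and only those with a positive score.
    return [term for term, score in ranked[:k] if score > 0]


# Hypothetical feedback: documents the user marked relevant vs. the rest.
relevant = ["jaguar cat habitat", "jaguar cat prey", "big cat speed"]
non_relevant = ["jaguar car dealer", "car speed record", "classic car"]

print(suggest_terms("jaguar speed", relevant, non_relevant))  # ['cat', 'big', 'habitat']
```

Here "cat" ranks first because it appears in every relevant document and in no non-relevant one, while "car" is never suggested because it occurs only in the non-relevant set.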
Evaluating QE/RF is somewhat problematic. A user study is costly, and
because of tensions between experiment size, expected variation between
users, and variation between retrieval tasks, effects are difficult to
establish statistically and the results are difficult to generalize. A
system study lacks user input and disregards the interactive character of
the system. A system study with a single simulated user lacks credibility.
A system study with a "cohort" of simulated users, each behaving slightly
differently, could be a step toward solving these evaluation problems. The
presentation describes: (1) the place of this new experimental method among
existing methods, (2) the design of the study, (3) the design of the group
of simulated users, (4) the results of the experiments, (5) the
restrictions and assumptions that apply, and (6) examples of possible future
studies in the framework of the Intelligent Information Access project.
Relation to my previous Tamale presentation: a more extensive methodological
discussion, more results, further analyses, in-depth interpretation, still
no clip-art, still no donuts.

SPEAKER:
Joel Martin, National Research Council, Email: joel.martin@iit.nrc.ca
TOPIC:
Design of a Better Question Answering System
DATE:
PLACE:
Room 318, MacDonald Hall
ABSTRACT:
A better search engine would allow you to ask a natural language
question and would return an answer instead of 10,000 web pages.
In this talk I will review our current design for question
answering and summarize the design of all the other systems that were
presented at TREC-8 in