The Text Analysis and Machine Learning Group
SPEAKER: Lee Graham
DATE:
TITLE: A complete implementation of John Holland's Echo model for complex
adaptive systems,
ABSTRACT:
Many natural and man-made systems exhibit high-level emergent behaviors,
which result from numerous intricate interactions within a large population
of primitive evolving components. Such systems are known as "Complex
Adaptive Systems" or "cas" and are extremely difficult to
model using conventional modeling techniques. John Holland, of the Santa Fe
Institute in
SPEAKER: Peter Turney
DATE:
TITLE: Thumbs Up or Thumbs Down? Semantic Orientation Applied to
Unsupervised Classification of Reviews
ABSTRACT: This talk presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). The semantic orientation of a phrase is estimated using the PMI-IR algorithm, which combines Pointwise Mutual Information (PMI) with Information Retrieval (IR). The semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A given review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.
SPEAKER: George Foster
DATE:
TITLE: User-Friendly Text Prediction for Translators
ABSTRACT: Text
prediction is a form of interactive machine translation that is well suited
to skilled translators. In principle it can aid in the production of a
translation with minimal disruption to a translator's normal routine.
However, recent evaluations of a prototype prediction tool showed that it
significantly decreased the productivity of most translators who used it. In
this talk, I will analyze the reasons for this and describe a solution which
consists in seeking predictions that maximize the expected benefit to the
user, rather than just trying to anticipate some amount of upcoming text.
Using a model of a "typical translator" constructed from data
collected in the evaluations of the prediction prototype, I show that this
approach can turn text prediction into a help rather than a hindrance to a
translator.
SPEAKER: Rada Mihalcea
DATE:
TITLE: Efficient Data-Driven Methods for Word Sense Disambiguation
ABSTRACT:
Word Sense Disambiguation (WSD) is well known to be one of the hardest
problems in Natural Language Processing, and yet a necessary step in a large
range of applications, including Machine Translation, Information Retrieval,
Knowledge Acquisition, and others. While humans usually encounter no
difficulties in identifying the correct sense of an ambiguous word, the task
turns out to be tremendously harder when it needs to be performed by a
computer. This
talk will present a novel approach for data driven WSD, which relies on an
instance based learning algorithm improved with automatic feature selection.
I will first describe a large set of features that may be good indicators of
word sense, and then show how a subset of these features can be
automatically extracted to create an efficient classifier tailored to the
behavior of each ambiguous word. This algorithm was implemented in a system
that achieved excellent performance during the WSD Senseval-2 competition. A
useful side effect of the approach is that it provides us with interesting
insights into the efficiency of various features in automatic WSD. Since the
main drawback of data driven methods for WSD is the lack of sense tagged
corpora, the talk will also address this problem and investigate the possibility of
automatically building a partially sense tagged corpus out of WordNet.
SPEAKER: Sylvain Letourneau
DATE:
TITLE: ANOREL: A technique for the analysis of attribute dependencies in inductive learning
ABSTRACT: Several
machine learning algorithms assume that the attributes are conditionally
independent given the class attribute. When the data violates the
independence assumption, these learning algorithms are likely to infer
inadequate models. General solutions that relax the independence
requirement are typically impractical because of the enormous increase in
computational resources they demand. We argue for an alternative approach with two
main steps. First, we try to efficiently identify the attribute
relationships that are likely to hurt the learning process. Second, we
apply targeted remedial measures to these relationships. In
this talk, we will focus on the first step (i.e. the identification of
dependencies) but we will also give an example of a remedial measure.
We will introduce two complementary techniques: a numerical one to
quickly find potentially important attribute relationships, and a
visualization tool providing additional insights on the nature of the
dependencies. We named these two techniques ANOREL (ANalysis Of
RELevance) and ANOREL Graphs, respectively. We will also discuss the
relations between the ANOREL and the ANOVA techniques. Finally, we will
use real world data to show how the ANOREL Graphs can facilitate the
identification of remedial measures and the impact on learning accuracy.
SPEAKER: Caroline Barrière
DATE:
TITLE: SERT - a tool for extracting
knowledge from texts
ABSTRACT:
SPEAKER:
Chris Drummond
DATE: Thursday 28th March, 2002
TITLE: Inferring and Revising Theories with Confidence:
ABSTRACT: I
hope this talk will interest you in two ways.
Firstly as an exploration of an important social phenomenon in
SPEAKER: Andrew Vardy
DATE:
TITLE: From Bugs to Bots:
ABSTRACT: Insects
such as bees and ants exhibit impressive navigational abilities despite
having small brains and limited sensory capacity. They can reliably make
return trips to places of interest despite having to travel immense
distances (thousands of times their body length) while avoiding obstacles
and other dangers. To achieve
this they must operate in the face of constantly changing lighting and
environmental conditions. The
goal of this work is to seek inspiration from what is known of insect
navigation and apply it to the design of robot navigators.
A subsidiary goal is to provide concrete instantiations of models
from biology and psychology and allow them to be evaluated in a real-world
context. The work to be presented focuses on vision-based navigational
strategies. These strategies
attempt to presume as little as possible about the cognitive capacity of
insects (yes, bugs are dumb). The
discipline of Artificial Life provides the main design philosophy. Thus,
only low-level local processing will be applied to achieve the globally
emergent phenomenon of navigation. The work completed so far is preliminary
and is focussed on the sub-problem of returning to a single place of
interest following a displacement from that place.
An introduction to the "snapshot model" proposed by
Cartwright and Collett will be provided followed by the application of this
model to navigating in a 3-D world. A
simulation of this model will be presented along with some preliminary
results.
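As an illustration of the homing idea behind the snapshot model, the following is a minimal 2-D sketch. It is not the model as published by Cartwright and Collett, and it is not the 3-D simulation presented in the talk. It assumes point landmarks whose identities are known (the original model has to pair unlabelled features), it uses landmark bearings only, and it weights each correction by the bearing error. All names (`bearings`, `wrap_angle`, `home_vector`, `homing_run`) and parameters (`steps`, `gain`) are hypothetical.

```python
import math


def bearings(position, landmarks):
    """Bearing (radians) of each landmark as seen from position."""
    px, py = position
    return [math.atan2(ly - py, lx - px) for lx, ly in landmarks]


def wrap_angle(angle):
    """Map an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def home_vector(position, landmarks, snapshot):
    """Sum of tangential corrections, one per landmark.

    snapshot holds the bearings recorded at the goal. For a landmark
    currently seen at bearing theta, moving along (sin theta, -cos theta)
    increases that bearing, so each term pushes the current bearing
    toward the remembered one.
    """
    vx = vy = 0.0
    for theta, phi in zip(bearings(position, landmarks), snapshot):
        error = wrap_angle(phi - theta)
        vx += error * math.sin(theta)
        vy += error * -math.cos(theta)
    return vx, vy


def homing_run(start, landmarks, goal, steps=200, gain=1.0):
    """Take the snapshot at the goal, then follow the home vector from start."""
    snapshot = bearings(goal, landmarks)
    x, y = start
    for _ in range(steps):
        vx, vy = home_vector((x, y), landmarks, snapshot)
        x += gain * vx
        y += gain * vy
    return x, y


if __name__ == "__main__":
    landmarks = [(5.0, 0.0), (0.0, 5.0), (-4.0, -3.0)]
    print(homing_run((1.0, 0.5), landmarks, (0.0, 0.0)))
```

With these three landmarks, a goal at the origin and a small displacement such as (1.0, 0.5), the agent should end up very close to the goal. The sketch only uses local comparisons between the current view and the stored view, which is in keeping with the low-level processing described above. Larger displacements or nearly collinear landmarks may not converge.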