ABSTRACT:
Reinforcement learning is an approach to learning how to make optimal
sequential decisions, in real time, by interacting with a stochastic
environment. The main idea is that if an action results in an improved
situation, then the tendency to produce that action is strengthened, i.e.
reinforced. Reinforcement learning methods have developed at the
confluence of control theory, neuroscience, animal learning, machine
learning and operations research. Notable success stories include
world-best computer game players for Backgammon and Go, helicopter and
robotic control, computer network design, cell phone routing,
transportation and logistics.
A key challenge in reinforcement learning is dealing with big data: a
very large or infinite number of environment configurations, many possible
actions, or a very fast sampling and decision rate. In this talk, I will
present three approaches to these challenges. First,
I will present temporal abstraction, an approach in which extended actions
are used to control the environment. I will describe the theoretical
framework of "options", which provides well-founded and very efficient
learning and planning algorithms, and has been used extensively by other
research groups in big data applications. Second, I will discuss
"off-policy" learning methods, which compute improved action choices from
data gathered under pre-existing control strategies, such as observational
studies in medical applications, or existing advertising policies in
e-commerce. I will describe the theory behind such methods and an
application to building a world-class computer network design system
(joint work with Nortel). Finally, I will discuss constructive function
approximation methods, which incrementally build a representation of the
value function (expected long-term return), based on a stream of data.
These methods offer an automated way of providing appropriate
regularization for the amount of data available.

Short bio: Dr. Doina Precup is an Associate Professor in the School of
Computer Science at McGill University. She earned her B.Sc. degree from
the Technical University of Cluj-Napoca, Romania (1994) and her M.Sc. (1997)
and Ph.D. (2000) degrees from the University of Massachusetts Amherst,
where she was a Fulbright fellow. Her research interests lie in the area
of machine learning, with an emphasis on reinforcement learning and time
series data, as well as applications of machine learning and artificial
intelligence to activity recognition, medicine, electronic commerce and
robotics. She currently serves as chair of the NSERC Discovery Evaluation
Group for Computer Science.