DATE: Friday, May 16, 2003
TIME: 3:30 pm
PLACE: Council Room (SITE 5-084)
TITLE: Characteristics of Markov Decision Processes And Their Use in Exploration Strategies
PRESENTER: Bohdana Ratitch
McGill University
ABSTRACT:

Reinforcement Learning (RL) is a framework for learning sequential decision-making strategies in stochastic dynamic environments, where the learning problem is modelled as a Markov Decision Process (MDP). This framework is very general and flexible; however, it has long been observed that domain characteristics can significantly influence the difficulty of learning. We work on characterizing MDPs by means of quantitative, measurable attributes, and we study how different characteristics affect the performance of value-based reinforcement learning algorithms. The attributes measure mainly the amount of randomness in the environment, as well as some structural properties of the transition and reward components of the MDP model. These attributes can be used to facilitate the design of reinforcement learning systems. For instance, in the talk I will present a new approach to exploration in RL that we developed based on two such attributes. Our strategy facilitates a more uniform visitation of the state space and a more extensive sampling of actions with potentially high variance of the action-value estimates, and it encourages the RL agent to focus on states where it has the most control over the outcomes of its actions. In contrast to other directed exploration methods, the exploration-relevant information can be precomputed beforehand and then used during learning without additional computational cost. Our exploration strategy can be used in combination with other existing exploration techniques, and we experimentally demonstrate that it can improve the performance of both undirected and directed exploration methods.
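
The abstract does not give formulas, so the following is a minimal sketch, under assumptions of my own, of how MDP attributes of this flavour could be precomputed for a small tabular MDP and folded into an exploration bonus at action-selection time. The specific measures used here (Shannon entropy of each transition distribution as a randomness proxy, and an L1-distance "controllability" proxy across actions), as well as the toy MDP P, the bonus weight beta, and epsilon, are illustrative assumptions and not the definitions presented in the talk.

import math
import random

# Toy 3-state, 2-action MDP; P[s][a] is a dict {next_state: probability}.
P = {
    0: {0: {0: 0.1, 1: 0.9}, 1: {0: 0.5, 2: 0.5}},
    1: {0: {1: 1.0},          1: {0: 0.3, 2: 0.7}},
    2: {0: {2: 0.8, 0: 0.2},  1: {1: 1.0}},
}

def transition_entropy(dist):
    """Shannon entropy of a next-state distribution (higher = more randomness)."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def controllability(state_actions):
    """Crude controllability proxy: average L1 distance between the next-state
    distributions induced by different actions in the same state."""
    actions = list(state_actions)
    if len(actions) < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(len(actions)):
        for j in range(i + 1, len(actions)):
            d1, d2 = state_actions[actions[i]], state_actions[actions[j]]
            states = set(d1) | set(d2)
            total += sum(abs(d1.get(s, 0.0) - d2.get(s, 0.0)) for s in states)
            pairs += 1
    return total / pairs

# Precompute the attributes once, before learning starts (no extra cost during learning).
entropy = {s: {a: transition_entropy(P[s][a]) for a in P[s]} for s in P}
control = {s: controllability(P[s]) for s in P}

def select_action(s, Q, epsilon=0.1, beta=0.5):
    """Epsilon-greedy selection with a precomputed bonus that favours more
    stochastic (high-entropy) actions in states the agent can actually control."""
    if random.random() < epsilon:
        return random.choice(list(P[s]))
    return max(P[s], key=lambda a: Q[s][a] + beta * control[s] * entropy[s][a])

# Example usage with an all-zero value table.
Q = {s: {a: 0.0 for a in P[s]} for s in P}
print(select_action(0, Q))

Because the entropy and controllability tables are computed from the model before learning, the only per-step overhead is a table lookup, which is the kind of property the abstract claims for the proposed strategy; the bonus can also be layered on top of other exploration schemes such as epsilon-greedy, as done above.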