PILCO: A Model-Based and Data-Efficient Approach to Policy Search

Marc Deisenroth; Carl Rasmussen

Venue: International Conference on Machine Learning (ICML) 2011
Recognition: Most Influential ICML 2011 Paper (Rank No. 5)
Edition: 2026-03
Impact factor: 9
Certificate ID: 9e706a80db99ca05

Abstract

In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
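The abstract's central idea is to learn a probabilistic dynamics model and let its uncertainty shape long-term predictions of a policy's cost. A minimal Python/NumPy sketch can illustrate this idea, but it is not the authors' method. It assumes a one-dimensional state, a Gaussian-process dynamics model with fixed rather than learned hyperparameters, a linear policy u = -k x, and a saturating cost. It also swaps in simpler techniques in two places. Monte Carlo sampling, with particles treated independently, replaces PILCO's closed-form approximate inference, and a comparison of two fixed gains replaces analytic policy gradients. All names (`se_kernel`, `GPDynamics`, `expected_cost`) and the toy system are hypothetical.

```python
import numpy as np


def se_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / lengthscale**2)


class GPDynamics:
    """Hypothetical GP model of the state change x_{t+1} - x_t given (x_t, u_t).

    Hyperparameters are fixed here (an assumption); a real implementation
    would learn them, e.g. by maximizing the marginal likelihood.
    """

    def __init__(self, inputs, deltas, noise_var=1e-2, lengthscale=1.0, signal_var=1.0):
        self.inputs, self.lengthscale, self.signal_var = inputs, lengthscale, signal_var
        K = se_kernel(inputs, inputs, lengthscale, signal_var) + noise_var * np.eye(len(inputs))
        self.chol = np.linalg.cholesky(K)  # lower-triangular L with K = L L^T
        self.alpha = np.linalg.solve(self.chol.T, np.linalg.solve(self.chol, deltas))

    def predict(self, test_inputs):
        """Posterior mean and variance of the latent state change."""
        k_star = se_kernel(test_inputs, self.inputs, self.lengthscale, self.signal_var)
        mean = k_star @ self.alpha
        v = np.linalg.solve(self.chol, k_star.T)
        var = self.signal_var - np.sum(v**2, axis=0)
        return mean, np.maximum(var, 0.0)  # clip tiny negative round-off


def expected_cost(model, gain, x0_mean=1.0, x0_std=0.1, horizon=10, n_particles=500, seed=0):
    """Monte Carlo estimate of the summed expected cost of the policy u = -gain * x.

    Each particle's next state is sampled from the GP's marginal predictive
    distribution, so model uncertainty widens the state distribution over time.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(x0_mean, x0_std, size=n_particles)
    total = 0.0
    for _ in range(horizon):
        u = -gain * x
        mean, var = model.predict(np.stack([x, u], axis=1))
        x = x + mean + np.sqrt(var) * rng.standard_normal(n_particles)
        total += np.mean(1.0 - np.exp(-0.5 * x**2))  # saturating cost, target x = 0
    return total


# Usage: learn from 30 random transitions of a hypothetical unstable system
# x_{t+1} = x_t + 0.2 x_t + 0.5 u_t + noise, then compare two policies.
rng = np.random.default_rng(42)
inputs = rng.uniform(-2.0, 2.0, size=(30, 2))  # columns: state x, action u
deltas = 0.2 * inputs[:, 0] + 0.5 * inputs[:, 1] + 0.05 * rng.standard_normal(30)
model = GPDynamics(inputs, deltas)

_, var_near = model.predict(np.array([[0.0, 0.0]]))
_, var_far = model.predict(np.array([[10.0, 10.0]]))
cost_open_loop = expected_cost(model, gain=0.0)
cost_stabilizing = expected_cost(model, gain=1.0)
print(f"variance near data {var_near[0]:.4f}, far from data {var_far[0]:.4f}")
print(f"expected cost: gain 0 -> {cost_open_loop:.2f}, gain 1 -> {cost_stabilizing:.2f}")
```

In this sketch, the GP's predictive variance is small near the 30 observed transitions and reverts to the prior signal variance far from them. The rollout feeds this variance forward, so predictions in poorly explored regions count as uncertain instead of being taken at the model's mean. A faithful implementation would instead propagate a Gaussian state distribution analytically, as the abstract's closed-form policy evaluation describes. It would then follow the analytic gradient of the expected cost with respect to the policy parameters.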
