PAPER DIGEST
Most Influential ICML 2006 Paper · 2026-03 edition

PAC Model-free Reinforcement Learning

Alexander L. Strehl; Lihong Li; Eric Wiewiora; John Langford; Michael L. Littman

Venue: International Conference on Machine Learning (ICML) 2006
Recognition: Most Influential ICML 2006 Paper (Rank No. 13)
Edition: 2026-03
Impact factor: 7
Certificate ID: 33293fee99b146dd

Abstract

For a Markov Decision Process with finite state (size S) and action spaces (size A per state), we propose a new algorithm: Delayed Q-learning. We prove it is PAC, achieving near-optimal performance on all but Õ(SA) timesteps while using O(SA) space, improving on the Õ(S²A) bounds of the best previous algorithms. This result proves that efficient reinforcement learning is possible without learning a model of the MDP from experience. Learning takes place from a single continuous thread of experience; neither resets nor parallel sampling is used. Beyond its smaller storage and experience requirements, Delayed Q-learning's per-experience computation cost is much lower than that of previous PAC algorithms.
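
The abstract does not spell out the update rule, so the following Python sketch is only an illustration of how a Delayed Q-learning agent is commonly described: optimistic initial values, one batched update attempt per state-action pair after m samples, and a per-pair flag that pauses learning when an attempt fails. The class name `DelayedQ`, the parameter names `m` and `eps1`, the assumption that rewards lie in [0, 1], and the toy one-state MDP are all assumptions made here, not details taken from the digest. Every table is indexed by (state, action), which is where the O(SA) space figure comes from, and no transition model is stored.

```python
class DelayedQ:
    """Illustrative sketch of a Delayed Q-learning agent (assumed form)."""

    def __init__(self, n_states, n_actions, gamma=0.9, m=5, eps1=0.05):
        self.n_actions = n_actions
        self.gamma, self.m, self.eps1 = gamma, m, eps1
        vmax = 1.0 / (1.0 - gamma)  # optimistic start, rewards assumed in [0, 1]

        def table(value):
            return [[value] * n_actions for _ in range(n_states)]

        self.q = table(vmax)          # action-value estimates
        self.acc = table(0.0)         # running sum of sampled targets
        self.count = table(0)         # samples gathered since last attempt
        self.last_attempt = table(0)  # time of last attempted update
        self.learn = table(True)      # whether samples are being collected
        self.t = 0                    # current timestep
        self.last_change = 0          # time of most recent change to any q value

    def act(self, s):
        # Greedy with respect to the current estimates; ties go to the lowest index.
        return max(range(self.n_actions), key=lambda a: self.q[s][a])

    def observe(self, s, a, r, s_next):
        self.t += 1
        if self.learn[s][a]:
            self.acc[s][a] += r + self.gamma * max(self.q[s_next])
            self.count[s][a] += 1
            if self.count[s][a] == self.m:
                target = self.acc[s][a] / self.m
                if self.q[s][a] - target >= 2 * self.eps1:
                    # Successful update: the estimate drops by at least eps1.
                    self.q[s][a] = target + self.eps1
                    self.last_change = self.t
                elif self.last_attempt[s][a] >= self.last_change:
                    # Nothing changed since the previous attempt: stop sampling.
                    self.learn[s][a] = False
                self.last_attempt[s][a] = self.t
                self.acc[s][a] = 0.0
                self.count[s][a] = 0
        elif self.last_attempt[s][a] < self.last_change:
            # Some estimate changed since the last attempt: resume sampling.
            self.learn[s][a] = True


# Hypothetical one-state MDP: action 1 pays reward 1, action 0 pays 0.
agent = DelayedQ(n_states=1, n_actions=2, gamma=0.5, m=5, eps1=0.05)
state = 0
for _ in range(200):
    action = agent.act(state)
    reward = 1.0 if action == 1 else 0.0
    agent.observe(state, action, reward, state)  # single thread, no resets
```

In this toy run the agent first tries action 0 because of the optimistic tie, lowers that estimate after one batch of m samples, and then settles on action 1. The loop is one continuous stream of experience, matching the setting the abstract describes.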
