PAC Subset Selection in Stochastic Multi-armed Bandits

Shivaram Kalyanakrishnan; Ambuj Tewari; Peter Auer; Peter Stone

Venue: International Conference on Machine Learning (ICML) 2012
Recognition: Most Influential ICML 2012 Paper (Rank No. 14)
Edition: 2026-03
Impact factor: 6
Certificate ID: b378a86e2a2a9fdd

Abstract

We consider the problem of selecting, from among the arms of a stochastic n-armed bandit, a subset of size m of those arms with the highest expected rewards, based on efficiently sampling the arms. This "subset selection" problem finds application in a variety of areas. Kalyanakrishnan & Stone (2010) frame this problem under a PAC setting (denoting it "Explore-m") and analyze corresponding sampling algorithms both formally and experimentally. Whereas their formal analysis is restricted to the worst-case sample complexity of algorithms, in this paper we design and analyze an algorithm ("LUCB") with improved expected sample complexity. Interestingly, LUCB bears a close resemblance to the well-known UCB algorithm for regret minimization. We obtain a sample complexity bound for LUCB that matches the best existing bound for single-arm selection (that is, when m = 1). We also provide a lower bound on the worst-case sample complexity of PAC algorithms for Explore-m.
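The abstract does not spell out how LUCB works, so the following is only a minimal illustrative sketch of a confidence-bound sampler for Explore-m, not the paper's exact procedure. It assumes Bernoulli arms, a Hoeffding-style confidence radius, and a stopping rule that compares the weakest lower bound among the current top m arms with the strongest upper bound among the rest. The function name `lucb_sketch`, its parameters, the radius formula, and the `max_rounds` cap are all assumptions made for illustration.

```python
import math
import random


def lucb_sketch(means, m, epsilon, delta, rng, max_rounds=100_000):
    """Illustrative sampler; returns (sorted selected arm indices, total samples).

    `means` holds the true Bernoulli means, used here only to simulate pulls.
    """
    n = len(means)
    if not 0 < m < n:
        raise ValueError("m must satisfy 0 < m < n")
    pulls = [0] * n
    sums = [0.0] * n

    def pull(a):
        # Simulate one Bernoulli reward from arm a.
        sums[a] += 1.0 if rng.random() < means[a] else 0.0
        pulls[a] += 1

    def mean(a):
        return sums[a] / pulls[a]

    def radius(a, t):
        # Assumed Hoeffding-style confidence radius for round t.
        return math.sqrt(math.log(1.25 * n * t ** 4 / delta) / (2 * pulls[a]))

    for a in range(n):  # sample every arm once to initialise the estimates
        pull(a)

    for t in range(1, max_rounds + 1):
        order = sorted(range(n), key=mean, reverse=True)
        high, low = order[:m], order[m:]
        # Weakest member of the current top-m set (lowest lower bound).
        h = min(high, key=lambda a: mean(a) - radius(a, t))
        # Strongest challenger outside the set (highest upper bound).
        c = max(low, key=lambda a: mean(a) + radius(a, t))
        if (mean(c) + radius(c, t)) - (mean(h) - radius(h, t)) < epsilon:
            break  # the two groups are separated up to tolerance epsilon
        pull(h)
        pull(c)

    return sorted(high), sum(pulls)
```

For example, `lucb_sketch([0.9, 0.8, 0.2, 0.1], m=2, epsilon=0.1, delta=0.1, rng=random.Random(0))` returns the chosen subset together with the number of samples spent. Sampling only the two arms on the boundary between the groups is what lets the sample count adapt to how hard the instance is, rather than to the worst case.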
