PAPER DIGEST
Most Influential ICML 2007 Paper · 2026-03 edition

Combining Online And Offline Knowledge In UCT

Sylvain Gelly; David Silver

Venue: International Conference on Machine Learning (ICML) 2007
Recognition: Most Influential ICML 2007 Paper (Rank No. 12)
Edition: 2026-03
Impact factor: 7
Certificate ID: 925b5ec5e90f9fc5

Abstract

The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo simulation. Second, the UCT value function is combined with a rapid online estimate of action values. Third, the offline value function is used as prior knowledge in the UCT search tree. We evaluate these algorithms in 9 × 9 Go against GnuGo 3.7.10. The first algorithm performs better than UCT with a random simulation policy, but surprisingly, worse than UCT with a weaker, handcrafted simulation policy. The second algorithm outperforms UCT altogether. The third algorithm outperforms UCT with handcrafted prior knowledge. We combine these algorithms in MoGo, the world's strongest 9 × 9 Go program. Each technique significantly improves MoGo's playing strength.
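
The second and third approaches can be pictured with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: the `ActionStats` structure, the function names, the `equivalent_visits` parameter, and the weighting `beta = k / (k + n)` are all hypothetical choices made for this example and are not taken from the abstract. The sketch shows an offline value seeding a new tree entry as if it had already been observed a few times, and a rapid online estimate being blended with the search value so that the rapid estimate dominates early and fades as real simulations accumulate.

```python
import math
from dataclasses import dataclass


@dataclass
class ActionStats:
    """Per-action statistics at one tree node (hypothetical structure)."""
    n: int = 0            # simulations counted for this action (may include prior visits)
    q: float = 0.0        # mean outcome over those simulations
    q_rapid: float = 0.0  # rapid online estimate of the action value


def with_prior(prior_value: float, equivalent_visits: int) -> ActionStats:
    """Seed a new action with an offline value, counted as equivalent_visits samples."""
    return ActionStats(n=equivalent_visits, q=prior_value)


def update(stats: ActionStats, outcome: float) -> None:
    """Incrementally update the mean outcome with one simulation result."""
    stats.n += 1
    stats.q += (outcome - stats.q) / stats.n


def blended_value(stats: ActionStats, k: float = 100.0) -> float:
    """Mix the rapid estimate with the search value.

    The weight beta = k / (k + n) is an assumed schedule for illustration:
    it equals 1 when n == 0 and decays toward 0 as n grows.
    """
    beta = k / (k + stats.n)
    return beta * stats.q_rapid + (1.0 - beta) * stats.q


def ucb_score(stats: ActionStats, parent_visits: int,
              c: float = 1.0, k: float = 100.0) -> float:
    """UCB1-style score: blended value plus an exploration bonus."""
    if stats.n == 0:
        return math.inf  # unvisited actions are tried first
    bonus = c * math.sqrt(math.log(parent_visits) / stats.n)
    return blended_value(stats, k) + bonus
```

In this sketch a larger `equivalent_visits` makes the search slower to override the offline value, and a larger `k` keeps the rapid estimate influential for longer. Both would need tuning in practice.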
