# Paper Digest: COLT 2016 Highlights

Readers can also choose to read this highlight article on our console, which allows users to filter out papers using keywords and find related papers.

The Annual Conference on Learning Theory (COLT) focuses on theoretical aspects of machine learning and related topics.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to **sign up for our free daily paper digest service** to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team

team@paperdigest.org

#### TABLE 1: COLT 2016 Papers

No. | Title | Authors | Highlight |
---|---|---|---|

1 | Conference on Learning Theory 2016: Preface | Vitaly Feldman, Alexander Rakhlin | Conference on Learning Theory 2016: Preface |

2 | Open Problem: Approximate Planning of POMDPs in the class of Memoryless Policies | Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar | Fitting this model to the environment, where the partial observation is given to the agent, generally gives dramatic performance improvement, sometimes unbounded improvement, compared to MDP. |

3 | Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture | Lijie Chen, Jian Li | We propose to study the instance-wise optimality for the BEST-1-ARM problem. |

4 | Open Problem: Kernel methods on manifolds and metric spaces. What is the probability of a positive definite geodesic exponential kernel? | Aasa Feragen, Søren Hauberg | From this we formulate conjectures on the probability of a positive definite kernel matrix for a finite random sample, depending on the geometry of the data space and the spread of the sample. |

5 | Open Problem: Second order regret bounds based on scaling time | Yoav Freund | We propose a different form of a second order bound and conjecture that it is satisfied by NormalHedge (Chaudhuri, Freund, and Hsu, 2009). |

6 | Open Problem: Property Elicitation and Elicitation Complexity | Rafael Frongillo, Ian Kash, Stephen Becker | We give concrete instantiations of these important questions, which have numerous applications to machine learning and related fields. |

7 | Open Problem: Parameter-Free and Scale-Free Online Algorithms | Francesco Orabona, Dávid Pál | Is it possible to design algorithms that are simultaneously *parameter-free* and *scale-free*? |

8 | An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives | Shipra Agrawal, Nikhil R. Devanur, Lihong Li | We consider a contextual version of the multi-armed bandit problem with global knapsack constraints. |

9 | Learning and Testing Junta Distributions | Maryam Aliakbarpour, Eric Blais, Ronitt Rubinfeld | We consider the problem of learning distributions in the presence of irrelevant features. |

10 | Sign rank versus VC dimension | Noga Alon, Shay Moran, Amir Yehudayoff | We show that the sign rank of the signed version of this matrix is at least ∆/λ. |

11 | Efficient approaches for escaping higher order saddle points in non-convex optimization | Animashree Anandkumar, Rong Ge | In this paper we use higher order derivatives to escape these saddle points: we design the first efficient algorithm guaranteed to converge to a third order local optimum (while existing techniques are at most second order). |

12 | Monte Carlo Markov Chain Algorithms for Sampling Strongly Rayleigh Distributions and Determinantal Point Processes | Nima Anari, Shayan Oveis Gharan, Alireza Rezaei | We show that the "natural" Monte Carlo Markov Chain (MCMC) algorithm mixes rapidly in the support of a homogeneous strongly Rayleigh distribution. |

13 | An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits | Peter Auer, Chao-Kai Chiang | We present an algorithm that achieves almost optimal pseudo-regret bounds against adversarial and stochastic bandits. |

14 | Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models | Bernardo Ávila Pires, Csaba Szepesvári | In this paper we study a model-based approach to calculating approximately optimal policies in Markovian Decision Processes. |

15 | Learning and 1-bit Compressed Sensing under Asymmetric Noise | Pranjal Awasthi, Maria-Florina Balcan, Nika Haghtalab, Hongyang Zhang | In this work, we introduce algorithms with nearly optimal guarantees for both problems under two realistic noise models, *bounded* (Massart) noise and *adversarial* (agnostic) noise, when the measurements $x_i$ are drawn from any isotropic log-concave distribution. |

16 | Reinforcement Learning of POMDPs using Spectral Methods | Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar | We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. |

17 | Highly-Smooth Zero-th Order Online Optimization | Francis Bach, Vianney Perchet | In this paper we consider convex optimization with noisy zero-th order information, that is noisy function evaluations at any desired point. |

18 | An Improved Gap-Dependency Analysis of the Noisy Power Method | Maria-Florina Balcan, Simon Shaolei Du, Yining Wang, Adams Wei Yu | In this paper, we present a new analysis of the noisy power method that achieves improved gap dependency for both sample complexity and noise tolerance bounds. |

19 | Learning Combinatorial Functions from Pairwise Comparisons | Maria-Florina Balcan, Ellen Vitercik, Colin White | In this model, we present a series of novel algorithms that learn over a wide variety of combinatorial function classes. |

20 | Instance-dependent Regret Bounds for Dueling Bandits | Akshay Balsubramani, Zohar Karnin, Robert E. Schapire, Masrour Zoghi | Following Dudik et al. (2015), we aim for algorithms whose performance approaches that of the optimal randomized choice of actions, the von Neumann winner, expressly avoiding more restrictive assumptions, for instance, regarding the existence of a single best action (a Condorcet winner). |

21 | On the low-rank approach for semidefinite programs arising in synchronization and community detection | Afonso S. Bandeira, Nicolas Boumal, Vladislav Voroninski | We focus on Synchronization and Community Detection problems and provide theoretical guarantees shedding light on the remarkable efficiency of this heuristic. |

22 | Information-theoretic thresholds for community detection in sparse networks | Jess Banks, Cristopher Moore, Joe Neeman, Praneeth Netrapalli | We give upper and lower bounds on the information-theoretic threshold for community detection in the stochastic block model. |

23 | Noisy Tensor Completion via the Sum-of-Squares Hierarchy | Boaz Barak, Ankur Moitra | In the noisy tensor completion problem we observe $m$ entries (whose location is chosen uniformly at random) from an unknown $n_1 \times n_2 \times n_3$ tensor $T$. |

24 | Basis Learning as an Algorithmic Primitive | Mikhail Belkin, Luis Rademacher, James Voss | Specifically, we describe a class of BEFs such that their local maxima on the unit sphere are in one-to-one correspondence with the basis elements. |

25 | Aggregation of supports along the Lasso path | Pierre C. Bellec | In linear regression with fixed design, we propose two procedures that aggregate a data-driven collection of supports. |

26 | Dropping Convexity for Faster Semi-definite Optimization | Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi | We study the minimization of a convex function $f(X)$ over the set of $n \times n$ positive semi-definite matrices, but when the problem is recast as $\min_U g(U) := f(UU^\top)$, with $U \in \mathbb{R}^{n \times r}$ and $r \le n$. |

27 | Multi-scale exploration of convex functions and bandit convex optimization | Sébastien Bubeck, Ronen Eldan | We construct a new map from a convex function to a distribution on its domain, with the property that this distribution is a multi-scale exploration of the function. |

28 | Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem | Alexandra Carpentier, Andrea Locatelli | We consider the problem of *best arm identification* with a *fixed budget* $T$, in the $K$-armed stochastic bandit setting, with arm distributions defined on $[0,1]$. |

29 | Delay and Cooperation in Nonstochastic Bandits | Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour, Alberto Minora | We introduce Exp3-Coop, a cooperative version of the Exp3 algorithm, and prove that with $K$ actions and $N$ agents the average per-agent regret after $T$ rounds is at most of order $\sqrt{\left(d+1+\frac{K}{N}\alpha_{\le d}\right)(T\ln K)}$, where $\alpha_{\le d}$ is the independence number of the $d$-th power of the communication graph $G$. |

30 | On the Approximability of Sparse PCA | Siu On Chan, Dimitris Papailliopoulos, Aviad Rubinstein | Our contributions include: (1) a simple and efficient algorithm that achieves an $n^{-1/3}$-approximation; (2) NP-hardness of approximation to within $(1-\varepsilon)$, for some small constant $\varepsilon > 0$; (3) SSE-hardness of approximation to within *any* constant factor; and (4) an $\exp\exp\left(\Omega\left(\sqrt{\log\log n}\right)\right)$ (“quasi-quasi-polynomial”) gap for the standard semidefinite program. |

31 | Pure Exploration of Multi-armed Bandit Under Matroid Constraints | Lijie Chen, Anupam Gupta, Jian Li | Our goal is to identify a basis of $\mathcal{M}$ with the maximum total weight, using as few samples as possible. |

32 | Provably manipulation-resistant reputation systems | Paul Christiano | We propose a reputation system which provably achieves a very strong correctness guarantee under extremely pessimistic assumptions—it works even given a supermajority of malicious users, converges to optimal behavior after a constant number of interactions per user, does not require repeated interactions, and accommodates time-varying quality of resources. |

33 | On the Expressive Power of Deep Learning: A Tensor Analysis | Nadav Cohen, Or Sharir, Amnon Shashua | In this work we derive a deep network architecture based on arithmetic circuits that inherently employs locality, sharing and pooling. |

34 | A Light Touch for Heavily Constrained SGD | Andrew Cotter, Maya Gupta, Jan Pfeifer | For heavily-constrained objectives, we propose an efficient extension of SGD that stays close to the feasible region while only applying constraints probabilistically at each iteration. |

35 | Adaptive Learning with Robust Generalization Guarantees | Rachel Cummings, Katrina Ligett, Kobbi Nissim, Aaron Roth, Zhiwei Steven Wu | In this paper, we study three notions of generalization, increasing in strength, that are *robust* to postprocessing and amenable to adaptive composition, and examine the relationships between them. |

36 | Complexity Theoretic Limitations on Learning DNF's | Amit Daniely, Shai Shalev-Shwartz | Using the recently developed framework of Daniely, Linial and Shalev-Shwartz, we show that under a natural assumption on the complexity of random K-SAT, learning DNF formulas is hard. |

37 | Optimal Learning via the Fourier Transform for Sums of Independent Integer Random Variables | I. Diakonikolas, D. M. Kane, A. Stewart | In this paper, we tightly characterize the sample and computational complexity of this problem. |

38 | Properly Learning Poisson Binomial Distributions in Almost Polynomial Time | I. Diakonikolas, D. M. Kane, A. Stewart | We give an algorithm for properly learning Poisson binomial distributions. |

39 | Asymptotic behavior of $\ell_p$-based Laplacian regularization in semi-supervised learning | Ahmed El Alaoui, Xiang Cheng, Aaditya Ramdas, Martin J. Wainwright, Michael I. Jordan | We present a theoretical study of $\ell_p$-based Laplacian regularization under a $d$-dimensional geometric random graph model. |

40 | The Power of Depth for Feedforward Neural Networks | Ronen Eldan, Ohad Shamir | We show that there is a simple (approximately radial) function on $\mathbb{R}^d$, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy, unless its width is exponential in the dimension. |

41 | Online Learning and Blackwell Approachability in Quitting Games | Janos Flesch, Rida Laraki, Vianney Perchet | We consider the sequential decision problem known as regret minimization, or more precisely its generalization to the vectorial or multi-criteria setup called Blackwell approachability. |

42 | Spectral thresholds in the bipartite stochastic block model | Laura Florescu, Will Perkins | Nevertheless, we propose a simple spectral algorithm, Diagonal Deletion SVD, which recovers the partition at a nearly optimal edge density. |

43 | Online Sparse Linear Regression | Dean Foster, Satyen Kale, Howard Karloff | We consider the online sparse linear regression problem, which is the problem of sequentially making predictions observing only a limited number of features in each round, to minimize regret with respect to the best sparse linear regressor, where prediction accuracy is measured by square loss. |

44 | Preference-based Teaching | Ziyuan Gao, Christoph Ries, Hans Simon, Sandra Zilles | We introduce a new model of teaching named “preference-based teaching” and a corresponding complexity parameter—the preference-based teaching dimension (PBTD)—representing the worst-case number of examples needed to teach any concept in a given concept class. |

45 | Optimal Best Arm Identification with Fixed Confidence | Aurélien Garivier, Emilie Kaufmann | We propose the ‘Track-and-Stop’ strategy, which we prove to be asymptotically optimal. |

46 | Maximin Action Identification: A New Bandit Framework for Games | Aurélien Garivier, Emilie Kaufmann, Wouter M. Koolen | We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower and upper confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. |

47 | Semidefinite Programs for Exact Recovery of a Hidden Community | Bruce Hajek, Yihong Wu, Jiaming Xu | We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality $K$ from an $n \times n$ symmetric data matrix $A$, where for distinct indices $i, j$, $A_{ij} \sim P$ if $i, j$ are both in the community and $A_{ij} \sim Q$ otherwise, for two known probability distributions $P$ and $Q$. |

48 | Online Learning with Low Rank Experts | Elad Hazan, Tomer Koren, Roi Livni, Yishay Mansour | We consider the problem of prediction with expert advice when the losses of the experts have low-dimensional structure: they are restricted to an unknown d-dimensional subspace. |

49 | Optimal rates for total variation denoising | Jan-Christian Hütter, Philippe Rigollet | Motivated by its practical success, we show that the 2D total variation denoiser satisfies a sharp oracle inequality that leads to near optimal rates of estimation for a large class of image models such as bi-isotonic, Hölder smooth and cartoons. |

50 | Streaming PCA: Matching Matrix Bernstein and Near-Optimal Finite Sample Guarantees for Oja's Algorithm | Prateek Jain, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford | In this paper we provide improved guarantees for streaming principal component analysis (PCA). |

51 | Online Isotonic Regression | Wojciech Kotlowski, Wouter M. Koolen, Alan Malek | We consider the online version of the isotonic regression problem. |

52 | Time series prediction and online learning | Vitaly Kuznetsov, Mehryar Mohri | We present a series of theoretical and algorithmic results combining the benefits of the statistical learning approach to time series prediction with that of on-line learning. |

53 | Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits | Tor Lattimore | Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. |

54 | Gradient Descent Only Converges to Minimizers | Jason D. Lee, Max Simchowitz, Michael I. Jordan, Benjamin Recht | We show that gradient descent converges to a local minimizer, almost surely with random initialization. |

55 | Learning Communities in the Presence of Errors | Konstantin Makarychev, Yury Makarychev, Aravindan Vijayaraghavan | In this paper, we initiate the study of robust algorithms for partial recovery in SBM with modeling errors or noise. |

56 | On the capacity of information processing systems | Laurent Massoulie, Kuang Xu | We propose and analyze a family of *information processing systems*, where a finite set of experts or servers are employed to extract information about a stream of incoming jobs. |

57 | Learning Simple Auctions | Jamie Morgenstern, Tim Roughgarden | We present a general framework for proving polynomial sample complexity bounds for the problem of learning from samples the best auction in a class of “simple” auctions. |

58 | Density Evolution in the Degree-correlated Stochastic Block Model | Elchanan Mossel, Jiaming Xu | In this paper, we address the more refined question of how many vertices will be misclassified on average. |

59 | Cortical Computation via Iterative Constructions | Christos Papadimitriou, Samantha Petti, Santosh Vempala | We study Boolean functions of an arbitrary number of input variables that can be realized by simple iterative constructions based on constant-size primitives. |

60 | When can we rank well from comparisons of $O(n\log(n))$ non-actively chosen pairs? | Arun Rajkumar, Shivani Agarwal | Here we introduce a broad family of ‘low-rank’ conditions on the probabilities $P_{ij}$ under which the resulting preference matrix $P$ has low rank under some link function, and show these conditions encompass the BTL and Thurstone classes as special cases, but are considerably more general. |

61 | How to calculate partition functions using convex programming hierarchies: provable bounds for variational methods | Andrej Risteski | We consider the problem of approximating partition functions for Ising models. |

62 | Simple Bayesian Algorithms for Best Arm Identification | Daniel Russo | I propose three simple Bayesian algorithms for adaptively allocating measurement effort. |

63 | Interactive Algorithms: from Pool to Stream | Sivan Sabato, Tom Hess | We consider interactive algorithms in the pool-based setting, and in the stream-based setting. |

64 | Best-of-K-bandits | Max Simchowitz, Kevin Jamieson, Benjamin Recht | We present distribution-dependent lower bounds based on a particular construction which force a learner to consider all N-choose-K subsets, and match naive extensions of known upper bounds in the bandit setting obtained by treating each subset as a separate arm. |

65 | Memory, Communication, and Statistical Queries | Jacob Steinhardt, Gregory Valiant, Stefan Wager | We introduce a formal framework for studying these questions, and investigate the relationship between the fundamental resources of memory or communication and the sample complexity of the learning task. |

66 | benefits of depth in neural networks | Matus Telgarsky | benefits of depth in neural networks |

67 | A Guide to Learning Arithmetic Circuits | Ilya Volkovich | In this paper, we exhibit several connections between learning algorithms for arithmetic circuits and other problems. |

68 | Online learning in repeated auctions | Jonathan Weed, Vianney Perchet, Philippe Rigollet | We adopt an online learning approach with bandit feedback to model this problem and derive bidding strategies for two models: stochastic and adversarial. |

69 | The Extended Littlestone's Dimension for Learning with Mistakes and Abstentions | Chicheng Zhang, Kamalika Chaudhuri | In this paper, we address both challenges. |

70 | First-order Methods for Geodesically Convex Optimization | Hongyi Zhang, Suvrit Sra | In this paper we contribute to the understanding of g-convex optimization by developing iteration complexity analysis for several first-order algorithms on Hadamard manifolds. |