# Paper Digest: COLT 2015 Highlights

Readers can also choose to read this highlight article on our console, which allows users to filter papers by keyword and find related papers.

The Annual Conference on Learning Theory (COLT) focuses on theoretical aspects of machine learning and related topics.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically describing the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly grasp the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to **sign up for our free daily paper digest service** to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to receive new conference digests.

Paper Digest Team

team@paperdigest.org

#### TABLE 1: COLT 2015 Papers

# | Title | Authors | Highlight
---|---|---|---
1 | Conference on Learning Theory 2015: Preface | Peter Grünwald, Elad Hazan | Conference on Learning Theory 2015: Preface
2 | Open Problem: Restricted Eigenvalue Condition for Heavy Tailed Designs | Arindam Banerjee, Sheng Chen, Vidyashankar Sivakumar | We pose the equivalent question for heavy-tailed distributions: Given a random design matrix drawn from a heavy-tailed distribution satisfying the small-ball property (Mendelson, 2015), does the design matrix satisfy the RE condition with the same order of sample complexity as sub-Gaussian distributions?
3 | Open Problem: The landscape of the loss surfaces of multilayer networks | Anna Choromanska, Yann LeCun, Gérard Ben Arous | The question we pose is whether it is possible to drop some of these assumptions to establish a stronger connection between both models.
4 | Open Problem: The Oracle Complexity of Smooth Convex Optimization in Nonstandard Settings | Cristóbal Guzmán | We propose a conjecture on the optimal convergence rates for these settings, for which a positive answer would lead to significant improvements on minimization algorithms for parsimonious regression models.
5 | Open Problem: Online Sabotaged Shortest Path | Wouter M. Koolen, Manfred K. Warmuth, Dmitri Adamskiy | In this note we revisit this online routing problem in the case where in each trial some of the edges or components are sabotaged/blocked.
6 | Open Problem: Learning Quantum Circuits with Queries | Jeremy Kun, Lev Reyzin | We pose an open problem on the complexity of learning the behavior of a quantum circuit with value injection queries.
7 | Open Problem: Recursive Teaching Dimension Versus VC Dimension | Hans U. Simon, Sandra Zilles | We pose the following question: is the RTD upper-bounded by a function that grows only linearly in the VCD?
8 | On Consistent Surrogate Risk Minimization and Property Elicitation | Arpit Agarwal, Shivani Agarwal | In this paper, we connect these two themes by showing that calibrated surrogate losses in supervised learning can essentially be viewed as eliciting or estimating certain properties of the underlying conditional label distribution that are sufficient to construct an optimal classifier under the target loss of interest.
9 | Online Learning with Feedback Graphs: Beyond Bandits | Noga Alon, Nicolò Cesa-Bianchi, Ofer Dekel, Tomer Koren | We study a general class of online learning problems where the feedback is specified by a graph.
10 | Learning Overcomplete Latent Variable Models through Tensor Methods | Animashree Anandkumar, Rong Ge, Majid Janzamin | In the unsupervised setting, a simple initialization algorithm based on SVD of the tensor slices is proposed, and the guarantees are provided under the stricter condition that k ≤ βd (where the constant β can be larger than 1).
11 | Simple, Efficient, and Neural Algorithms for Sparse Coding | Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra | Here we give a general framework for understanding alternating minimization which we leverage to analyze existing heuristics and to design new ones also with provable guarantees.
12 | Label optimal regret bounds for online local learning | Pranjal Awasthi, Moses Charikar, Kevin A Lai, Andrej Risteski | In this work, we provide a complete answer to the question above via two main results.
13 | Efficient Learning of Linear Separators under Bounded Noise | Pranjal Awasthi, Maria-Florina Balcan, Nika Haghtalab, Ruth Urner | We provide the first polynomial time algorithm that can learn linear separators to arbitrarily small excess error in this noise model under the uniform distribution over the unit sphere in ℝ^d, for some constant value of η.
14 | Efficient Representations for Lifelong Learning and Autoencoding | Maria-Florina Balcan, Avrim Blum, Santosh Vempala | In this work we pose and provide efficient algorithms for several natural theoretical formulations of this goal.
15 | Optimally Combining Classifiers Using Unlabeled Data | Akshay Balsubramani, Yoav Freund | We develop a worst-case analysis of aggregation of classifier ensembles for binary classification.
16 | Minimax Fixed-Design Linear Regression | Peter L. Bartlett, Wouter M. Koolen, Alan Malek, Eiji Takimoto, Manfred K. Warmuth | We consider a linear regression game in which the covariates are known in advance: at each round, the learner predicts a real value, the adversary reveals a label, and the learner incurs a squared-error loss.
17 | Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions | Alexandre Belloni, Tengyuan Liang, Hariharan Narayanan, Alexander Rakhlin | Other applications of the method discussed in this work include private computation of empirical risk minimizers, two-stage stochastic programming, and approximate dynamic programming for online learning.
18 | Bandit Convex Optimization: √T Regret in One Dimension | Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres | Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate.
19 | The entropic barrier: a simple and optimal universal self-concordant barrier | Sébastien Bubeck, Ronen Eldan | We prove that the Fenchel dual of the log-Laplace transform of the uniform measure on a convex body in ℝ^n is a (1+o(1))·n-self-concordant barrier, improving a seminal result of Nesterov and Nemirovski.
20 | Optimum Statistical Estimation with Strategic Data Sources | Yang Cai, Constantinos Daskalakis, Christos Papadimitriou | We propose an optimum mechanism for providing monetary incentives to the data sources of a statistical estimator such as linear regression, so that high quality data is provided at low cost, in the sense that the weighted sum of payments and estimation error is minimized.
21 | On the Complexity of Learning with Kernels | Nicolò Cesa-Bianchi, Yishay Mansour, Ohad Shamir | In this paper, we study lower bounds on the error attainable by such methods as a function of the number of entries observed in the kernel matrix or the rank of an approximate kernel matrix.
22 | Learnability of Solutions to Conjunctive Queries: The Full Dichotomy | Hubie Chen, Matthew Valeriote | In this article, we study a family of such learning problems; this family contains, for each relational structure, the problem of learning the solution space of an unknown conjunctive query evaluated on the structure.
23 | Sequential Information Maximization: When is Greedy Near-optimal? | Yuxin Chen, S. Hamed Hassani, Amin Karbasi, Andreas Krause | In this paper, we analyze the widely used greedy policy for this task, and identify problem instances where it provides provably near-maximal utility, even in the challenging setting of persistent noise.
24 | Efficient Sampling for Gaussian Graphical Models via Spectral Sparsification | Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, Shang-Hua Teng | We give two sparsification methods for this approach that may be of independent interest.
25 | Stochastic Block Model and Community Detection in Sparse Graphs: A spectral algorithm with optimal rate of recovery | Peter Chin, Anup Rao, Van Vu | In this paper, we present and analyze a simple and robust spectral algorithm for the stochastic block model with k blocks, for any fixed k.
26 | On-Line Learning Algorithms for Path Experts with Non-Additive Losses | Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Manfred Warmuth | We give new algorithms extending the Follow-the-Perturbed-Leader (FPL) algorithm to both of these families of loss functions and similarly give new algorithms extending the Randomized Weighted Majority (RWM) algorithm to both of these families.
27 | Truthful Linear Regression | Rachel Cummings, Stratis Ioannidis, Katrina Ligett | We consider the problem of fitting a linear model to data held by individuals who are concerned about their privacy.
28 | A PTAS for Agnostically Learning Halfspaces | Amit Daniely | We present a PTAS for agnostically learning halfspaces w.r.t. the uniform distribution on the d-dimensional sphere.
29 | S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification | Gautam Dasarathy, Robert Nowak, Xiaojin Zhu | We introduce a simple and label-efficient algorithm called S^2 for this task.
30 | Improved Sum-of-Squares Lower Bounds for Hidden Clique and Hidden Submatrix Problems | Yash Deshpande, Andrea Montanari | Recently, Meka et al. (2013) proposed a method to establish lower bounds for the hidden clique problem within the Sum-of-Squares (SOS) semidefinite hierarchy.
31 | Contextual Dueling Bandits | Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi | Here, we propose a new and natural solution concept, rooted in game theory, called a *von Neumann winner*, a randomized policy that beats or ties every other policy.
32 | Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering | Justin Eldridge, Mikhail Belkin, Yusu Wang | In this paper we identify two limit properties, *separation* and *minimality*, which address both over-segmentation and improper nesting and together imply (but are not implied by) Hartigan consistency.
33 | Faster Algorithms for Testing under Conditional Sampling | Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Theertha Suresh | We study two of the most important tests under the conditional-sampling model where each query specifies a subset S of the domain, and the response is a sample drawn from S according to the underlying distribution.
34 | Learning and inference in the presence of corrupted inputs | Uriel Feige, Yishay Mansour, Robert Schapire | We model the classification and inference problems as a zero-sum game between a learner, minimizing the expected error, and an adversary, maximizing the expected error.
35 | From Averaging to Acceleration, There is Only a Step-size | Nicolas Flammarion, Francis Bach | We provide a detailed analysis of the eigenvalues of the corresponding linear dynamical system, showing various oscillatory and non-oscillatory behaviors, together with a sharp stability result with explicit constants.
36 | Variable Selection is Hard | Dean Foster, Howard Karloff, Justin Thaler | Assuming a standard complexity hypothesis, we show that no polynomial-time algorithm can find a k′-sparse x with ‖Bx − y‖^2 ≤ h(m,p), where k′ = k·2^(log^(1−δ) p) and h(m,p) = p^(C_1)·m^(1−C_2), for arbitrary δ > 0, C_1 > 0, C_2 > 0.
37 | Vector-Valued Property Elicitation | Rafael Frongillo, Ian A. Kash | We show that linear and ratio-of-linear do admit nonseparable scores, and provide evidence for a conjecture that these are the only such properties (up to link functions).
38 | Competing with the Empirical Risk Minimizer in a Single Pass | Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford | Our goal in this work is to perform as well as the ERM, on *every* problem, while minimizing the use of computational resources such as running time and space usage.
39 | A Chaining Algorithm for Online Nonparametric Regression | Pierre Gaillard, Sébastien Gerchinovitz | We consider the problem of online nonparametric regression with arbitrary deterministic sequences.
40 | Escaping From Saddle Points – Online Stochastic Gradient for Tensor Decomposition | Rong Ge, Furong Huang, Chi Jin, Yang Yuan | In this paper we identify the *strict saddle* property for non-convex problems that allows for efficient optimization.
41 | Learning the dependence structure of rare events: a non-asymptotic study | Nicolas Goix, Anne Sabourin, Stéphan Clémençon | The main purpose of this paper is to fill this gap.
42 | Thompson Sampling for Learning Parameterized Markov Decision Processes | Aditya Gopalan, Shie Mannor | We present a version of Thompson sampling for parameterized reinforcement learning problems, and derive a frequentist regret bound for priors over general parameter spaces.
43 | Computational Lower Bounds for Community Detection on Random Graphs | Bruce Hajek, Yihong Wu, Jiaming Xu | This paper studies the problem of detecting the presence of a small dense community planted in a large Erdős–Rényi random graph G(N, q), where the edge probability within the community exceeds q by a constant factor.
44 | Adaptive Recovery of Signals by Convex Optimization | Zaid Harchaoui, Anatoli Juditsky, Arkadi Nemirovski, Dmitry Ostrovsky | We present a theoretical framework for adaptive estimation and prediction of signals of unknown structure in the presence of noise.
45 | Tensor principal component analysis via sum-of-square proofs | Samuel B. Hopkins, Jonathan Shi, David Steurer | We study a statistical model for the *tensor principal component analysis* problem introduced by Montanari and Richard: Given an order-3 tensor T of the form T = τ·v_0^⊗3 + A, where τ ≥ 0 is a signal-to-noise ratio, v_0 is a unit vector, and A is a random noise tensor, the goal is to recover the planted vector v_0.
46 | Fast Exact Matrix Completion with Finite Samples | Prateek Jain, Praneeth Netrapalli | In this paper, we present a fast iterative algorithm that solves the matrix completion problem by observing O(n r^5 log^3 n) entries, which is independent of the condition number and the desired accuracy.
47 | Exp-Concavity of Proper Composite Losses | Parameswaran Kamalaruban, Robert Williamson, Xinhua Zhang | In this paper we provide a complete characterization of the exp-concavity of any proper composite loss.
48 | On Learning Distributions from their Samples | Sudeep Kamath, Alon Orlitsky, Dheeraj Pichapati, Ananda Theertha Suresh | We study distribution approximations for general loss measures.
49 | MCMC Learning | Varun Kanade, Elchanan Mossel | In this paper we initiate the investigation of extending central ideas, methods and algorithms from the theory of learning under the uniform distribution to the setup of learning concepts given examples from MRF distributions.
50 | Online PCA with Spectral Bounds | Zohar Karnin, Edo Liberty | We describe two simple and deterministic algorithms.
51 | Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem | Junpei Komiyama, Junya Honda, Hisashi Kashima, Hiroshi Nakagawa | We introduce a tight asymptotic regret lower bound that is based on the information divergence.
52 | Second-order Quantile Methods for Experts and Combinatorial Games | Wouter M. Koolen, Tim Van Erven | We aim to design strategies for sequential decision making that adjust to the difficulty of the learning problem.
53 | Hierarchical Label Queries with Data-Dependent Partitions | Samory Kpotufe, Ruth Urner, Shai Ben-David | Given a joint distribution P_{X,Y} over a space 𝒳 and a label set 𝒴 = {0, 1}, we consider the problem of recovering the labels of an unlabeled sample with as few label queries as possible.
54 | Algorithms for Lipschitz Learning on Graphs | Rasmus Kyng, Anup Rao, Sushant Sachdeva, Daniel A. Spielman | We present an algorithm that computes a minimal Lipschitz extension in expected linear time, and an algorithm that computes an absolutely minimal Lipschitz extension in expected time Õ(mn).
55 | Low Rank Matrix Completion with Exponential Family Noise | Jean Lafond | When the sampling distribution is known, we propose another estimator and prove an oracle inequality w.r.t. the Kullback–Leibler prediction risk, which translates immediately into an upper bound on the Frobenius prediction risk.
56 | Bad Universal Priors and Notions of Optimality | Jan Leike, Marcus Hutter | We show that Legg-Hutter intelligence and thus balanced Pareto optimality is entirely subjective, and that every policy is Pareto optimal in the class of all computable environments.
57 | Learning with Square Loss: Localization through Offset Rademacher Complexity | Tengyuan Liang, Alexander Rakhlin, Karthik Sridharan | We introduce a notion of offset Rademacher complexity that provides a transparent way to study localization both in expectation and in high probability.
58 | Achieving All with No Parameters: AdaNormalHedge | Haipeng Luo, Robert E. Schapire | We study the classic online learning problem of predicting with expert advice, and propose a truly parameter-free and adaptive algorithm that achieves several objectives simultaneously without using any prior information.
59 | Lower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization | Mehrdad Mahdavi, Lijun Zhang, Rong Jin | In this paper we derive *high-probability* lower and upper bounds on the excess risk of stochastic optimization of exponentially concave loss functions.
60 | Correlation Clustering with Noisy Partial Information | Konstantin Makarychev, Yury Makarychev, Aravindan Vijayaraghavan | In this paper, we propose and study a semi-random model for the Correlation Clustering problem on arbitrary graphs G.
61 | Online Density Estimation of Bradley-Terry Models | Issei Matsumoto, Kohei Hatano, Eiji Takimoto | We consider an online density estimation problem for the Bradley-Terry model, where each model parameter defines the probability of a match result between any pair in a set of n teams.
62 | First-order regret bounds for combinatorial semi-bandits | Gergely Neu | In this paper, we propose an algorithm that improves this scaling to Õ(√(L_T^*)), where L_T^* is the total loss of the best action.
63 | Norm-Based Capacity Control in Neural Networks | Behnam Neyshabur, Ryota Tomioka, Nathan Srebro | We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks.
64 | Cortical Learning via Prediction | Christos H. Papadimitriou, Santosh S. Vempala | Using Valiant's neuronal model as a foundation, we introduce PJOIN (for "predictive join"), a primitive that combines association and prediction.
65 | Partitioning Well-Clustered Graphs: Spectral Clustering Works! | Richard Peng, He Sun, Luca Zanetti | In this work we study the widely used *spectral clustering* algorithms, i.e., partitioning a graph into k clusters by (1) embedding the vertices of the graph into a low-dimensional space using the bottom eigenvectors of the Laplacian matrix, and (2) partitioning the embedded points via k-means algorithms.
66 | Batched Bandit Problems | Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg | Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches.
67 | Hierarchies of Relaxations for Online Prediction Problems with Evolving Constraints | Alexander Rakhlin, Karthik Sridharan | Despite this, we provide polynomial-time prediction algorithms that achieve low regret against combinatorial benchmark sets.
68 | Fast Mixing for Discrete Point Processes | Patrick Rebeschini, Amin Karbasi | We investigate the systematic mechanism for designing fast mixing Markov chain Monte Carlo algorithms to sample from discrete point processes under the Dobrushin uniqueness condition for Gibbs measures.
69 | Generalized Mixability via Entropic Duality | Mark D. Reid, Rafael M. Frongillo, Robert C. Williamson, Nishant Mehta | In doing so we introduce a more general notion of Φ-mixability, where Φ is a general entropy (i.e., any convex function on probabilities).
70 | On the Complexity of Bandit Linear Optimization | Ohad Shamir | This and other results we present highlight some interesting differences between full-information and bandit learning, which were not considered in previous literature.
71 | An Almost Optimal PAC Algorithm | Hans U. Simon | In contrast to this result, we show that every consistent algorithm L (even a provably suboptimal one) induces a family (L_K)_{K≥1} of PAC algorithms (with 2K−1 calls of L as a subroutine) which come very close to optimality: the number of labeled examples needed by L_K exceeds the general lower bound only by a factor of ℓ_K(1/ε), where ℓ_K denotes (a truncated version of) the K-times iterated logarithm.
72 | Minimax rates for memory-bounded sparse linear regression | Jacob Steinhardt, John Duchi | We establish a minimax lower bound of Ω(kd/(Bε)) on the sample size needed to estimate parameters in a k-sparse linear regression of dimension d under memory restrictions to B bits, where ε is the ℓ_2 parameter error.
73 | Interactive Fingerprinting Codes and the Hardness of Preventing False Discovery | Thomas Steinke, Jonathan Ullman | In order to optimize our hardness result, we give a new Fourier-analytic approach to analyzing fingerprinting codes that is simpler, more flexible, and yields better parameters than previous constructions.
74 | Convex Risk Minimization and Conditional Probability Estimation | Matus Telgarsky, Miroslav Dudík, Robert Schapire | This paper proves, in very general settings, that convex risk minimization is a procedure to select a unique conditional probability model determined by the classification problem.
75 | Regularized Linear Regression: A Precise Analysis of the Estimation Error | Christos Thrampoulidis, Samet Oymak, Babak Hassibi | We focus on the problem of linear regression and consider a general class of optimization methods that minimize a loss function measuring the misfit of the model to the observations with an added structure-inducing regularization term.
76 | Max vs Min: Tensor Decomposition and ICA with nearly Linear Sample Complexity | Santosh S. Vempala, Ying Xiao | We present a simple, general technique for reducing the sample complexity of matrix and tensor decomposition algorithms applied to distributions.
77 | On Convergence of Emphatic Temporal-Difference Learning | H. Yu | We present in this paper the first convergence proofs for two emphatic algorithms, ETD(λ) and ELSTD(λ).