# Paper Digest: COLT 2020 Highlights

Readers can also choose to read this highlight article on our console, which allows users to filter out papers using keywords and find related papers.

The Annual Conference on Learning Theory (COLT) focuses on addressing theoretical aspects of machine learning and related topics.

To help the community quickly catch up on the work presented in this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to **sign up for our free daily paper digest service** to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to get updated with new conference digests.

Paper Digest Team

team@paperdigest.org

#### TABLE 1: COLT 2020 Papers

No. | Title | Authors | Highlight |
---|---|---|---|

1 | Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit | Jayadev Acharya, Clément L Canonne, Yanjun Han, Ziteng Sun, Himanshu Tyagi | In this work, we provide a complete understanding of the interplay between the amount of shared randomness available, the stringency of information constraints, and the sample complexity of the testing problem by characterizing a tight trade-off between these three parameters. |

2 | Distributed Signal Detection under Communication Constraints | Jayadev Acharya, Clément L Canonne, Himanshu Tyagi | We study this distributed testing problem with and without the availability of a common randomness shared by the users. |

3 | Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes | Alekh Agarwal, Sham M Kakade, Jason D Lee, Gaurav Mahajan | This work provides provable characterizations of computational, approximation, and sample size issues with regards to policy gradient methods in the context of discounted Markov Decision Processes (MDPs). |

4 | Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal | Alekh Agarwal, Sham Kakade, Lin F. Yang | This work considers the sample and computational complexity of obtaining an $\epsilon$-optimal policy in a discounted Markov Decision Process (MDP), given only access to a generative model. |

5 | From Nesterov’s Estimate Sequence to Riemannian Acceleration | Kwangjun Ahn, Suvrit Sra | We propose the first global accelerated gradient method for Riemannian manifolds. |

6 | Closure Properties for Private Classification and Online Prediction | Noga Alon, Amos Beimel, Shay Moran, Uri Stemmer | As a corollary, we derive closure properties for online learning and private PAC learning. |

7 | Hierarchical Clustering: A 0.585 Revenue Approximation | Noga Alon, Yossi Azar, Danny Vainstein | Hierarchical Clustering: A 0.585 Revenue Approximation |

8 | Winnowing with Gradient Descent | Ehsan Amid, Manfred K. Warmuth | The performance of multiplicative updates is typically logarithmic in the number of features when the targets are sparse. Strikingly, we show that the same property can also be achieved with gradient descent updates. |

9 | Pan-Private Uniformity Testing | Kareem Amin, Matthew Joseph, Jieming Mao | We study the intermediate model of *pan-privacy*. |

10 | Dimension-Free Bounds for Chasing Convex Functions | C.J. Argue, Anupam Gupta, Guru Guruganesh | We consider the problem of chasing convex functions, where functions arrive over time. |

11 | Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations | Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan | We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. |

12 | Data-driven confidence bands for distributed nonparametric regression | Valeriy Avanesov | In this paper we suggest a novel computationally efficient fully data-driven algorithm, quantifying uncertainty of this method, yielding frequentist $L_2$-confidence bands. |

13 | Estimating Principal Components under Adversarial Perturbations | Pranjal Awasthi, Xue Chen, Aravindan Vijayaraghavan | We study a natural model of robustness for high-dimensional statistical estimation problems that we call the *adversarial perturbation model*. |

14 | Active Local Learning | Arturs Backurs, Avrim Blum, Neha Gupta | In this work we consider active *local learning*: given a query point $x$, and active access to an unlabeled training set $S$, output the prediction $h(x)$ of a near-optimal $h \in H$ using significantly fewer labels than would be needed to actually learn $h$ fully. |

15 | Finite Regret and Cycles with Fixed Step-Size via Alternating Gradient Descent-Ascent | James P. Bailey, Gauthier Gidel, Georgios Piliouras | In this paper, we eliminate these negative properties by considering a different implementation to obtain $O(1/T)$ time-average regret via arbitrary fixed step-size. |

16 | Calibrated Surrogate Losses for Adversarially Robust Classification | Han Bao, Clay Scott, Masashi Sugiyama | In this work, we consider the question of which surrogate losses are *calibrated* with respect to the adversarial 0-1 loss, meaning that minimization of the former implies minimization of the latter. |

17 | Complexity Guarantees for Polyak Steps with Momentum | Mathieu Barré, Adrien Taylor, Alexandre d’Aspremont | In this work, we study a class of methods, based on Polyak steps, where this knowledge is substituted by that of the optimal value, $f_*$. |

18 | Free Energy Wells and Overlap Gap Property in Sparse PCA | Gérard Ben Arous, Alexander S. Wein, Ilias Zadik | We study a variant of the sparse PCA (principal component analysis) problem in the “hard” regime, where the inference task is possible yet no polynomial-time algorithm is known to exist. |

19 | Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process | Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant | We consider networks, trained via stochastic gradient descent to minimize $\ell_2$ loss, with the training labels perturbed by independent noise at each iteration. |

20 | Hardness of Identity Testing for Restricted Boltzmann Machines and Potts models | Antonio Blanca, Zongchen Chen, Daniel Štefankovič, Eric Vigoda | We study identity testing for restricted Boltzmann machines (RBMs), and more generally for undirected graphical models. |

21 | Selfish Robustness and Equilibria in Multi-Player Bandits | Etienne Boursier, Vianney Perchet | We provide the first algorithm robust to selfish players (a.k.a. Nash equilibrium) with a logarithmic regret, when the arm performance is observed. |

22 | Proper Learning, Helly Number, and an Optimal SVM Bound | Olivier Bousquet, Steve Hanneke, Shay Moran, Nikita Zhivotovskiy | In this paper we aim to characterize the classes for which the optimal sample complexity can be achieved by a proper learning algorithm. |

23 | Sharper Bounds for Uniformly Stable Algorithms | Olivier Bousquet, Yegor Klochkov, Nikita Zhivotovskiy | This paper is devoted to these questions: firstly, inspired by the original arguments of Feldman and Vondrak (2019), we provide a short proof of the moment bound that implies the generalization bound stronger than both recent results in Feldman and Vondrak (2018, 2019). |

24 | The Gradient Complexity of Linear Regression | Mark Braverman, Elad Hazan, Max Simchowitz, Blake Woodworth | We investigate the computational complexity of several basic linear algebra primitives, including largest eigenvector computation and linear regression, in the computational model that allows access to the data via a matrix-vector product oracle. |

25 | Reducibility and Statistical-Computational Gaps from Secret Leakage | Matthew Brennan, Guy Bresler | The insight in this work is that a slight generalization of the planted clique conjecture – secret leakage planted clique ($\textsc{pc}_\rho$), wherein a small amount of information about the hidden clique is revealed – gives rise to a variety of new average-case reduction techniques, yielding a web of reductions relating statistical problems with very different structure. |

26 | A Corrective View of Neural Networks: Representation, Memorization and Learning | Guy Bresler, Dheeraj Nagaraj | We develop a *corrective mechanism* for neural network approximation: the total available non-linear units are divided into multiple groups and the first group approximates the function under consideration, the second approximates the error in approximation produced by the first group and corrects it, the third group approximates the error produced by the first and second groups together and so on. |

27 | ID3 Learns Juntas for Smoothed Product Distributions | Alon Brutzkus, Amit Daniely, Eran Malach | In this paper, we analyze the ID3 algorithm, when the target function is a $k$-Junta, a function that depends on $k$ out of $n$ variables of the input. |

28 | Coordination without communication: optimal regret in two players multi-armed bandits | Sébastien Bubeck, Thomas Budzinski | Under the assumption that shared randomness is available, we propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. |

29 | How to Trap a Gradient Flow | Sébastien Bubeck, Dan Mikulincer | We consider the problem of finding an $\varepsilon$-approximate stationary point of a smooth function on a compact domain of $\mathbb{R}^d$. |

30 | Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without | Sébastien Bubeck, Yuanzhi Li, Yuval Peres, Mark Sellke | We prove the first $\sqrt{T}$-type regret guarantee for this problem, assuming only two players, and under the feedback model where collisions are announced to the colliding players. |

31 | Highly smooth minimization of non-smooth problems | Brian Bullins | We establish improved rates for structured *non-smooth* optimization problems by means of near-optimal higher-order accelerated methods. |

32 | Efficient, Noise-Tolerant, and Private Learning via Boosting | Mark Bun, Marco Leandro Carmosino, Jessica Sorrell | We introduce a simple framework for designing private boosting algorithms. |

33 | The estimation error of general first order methods | Michael Celentano, Andrea Montanari, Yuchen Wu | Here we consider two families of high-dimensional estimation problems: high-dimensional regression and low-rank matrix estimation, and introduce a class of ‘general first order methods’ that aim at efficiently estimating the underlying parameters. |

34 | Bounds in query learning | Hunter Chase, James Freitag | We introduce new combinatorial quantities for concept classes, and prove lower and upper bounds for learning complexity in several models of learning in terms of various combinatorial quantities. |

35 | Learning Polynomials in Few Relevant Dimensions | Sitan Chen, Raghu Meka | In this work we consider the important case where the covariates are Gaussian. |

36 | The Influence of Shape Constraints on the Thresholding Bandit Problem | James Cheshire, Pierre Menard, Alexandra Carpentier | We investigate the stochastic *Thresholding Bandit problem* (*TBP*) under several *shape constraints*. |

37 | Gradient descent algorithms for Bures-Wasserstein barycenters | Sinho Chewi, Tyler Maunu, Philippe Rigollet, Austin J. Stromme | We study first order methods to compute the barycenter of a probability distribution $P$ over the space of probability measures with finite second moment. |

38 | Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss | Lénaïc Chizat, Francis Bach | Towards understanding this phenomenon, we analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations. |

39 | ODE-Inspired Analysis for the Biological Version of Oja’s Rule in Solving Streaming PCA | Chi-Ning Chou, Mien Brabeeba Wang | In this work, we give the first convergence rate analysis for the biological version of Oja’s rule in solving streaming PCA. |

40 | Pessimism About Unknown Unknowns Inspires Conservatism | Michael K. Cohen, Marcus Hutter | Our first main contribution is: given an assumption about the agent’s model class, a sufficiently pessimistic agent does not cause “unprecedented events” with probability $1-\delta$, whether or not designers know how to precisely specify those precedents they are concerned with. |

41 | Optimal Group Testing | Amin Coja-Oghlan, Oliver Gebhard, Max Hahn-Klimroth, Philipp Loick | In the group testing problem, which goes back to the work of Dorfman (1943), we aim to identify a small set of $k\sim n^\theta$ infected individuals out of a population size $n$, $0<\theta<1$. |

42 | PAC learning with stable and private predictions | Yuval Dagan, Vitaly Feldman | We study binary classification algorithms for which the prediction on any point is not too sensitive to individual examples in the dataset. |

43 | High probability guarantees for stochastic convex optimization | Damek Davis, Dmitriy Drusvyatskiy | In this work, we show that a wide class of stochastic optimization algorithms for strongly convex problems can be augmented with high confidence bounds at an overhead cost that is only logarithmic in the confidence level and polylogarithmic in the condition number. |

44 | Halpern Iteration for Near-Optimal and Parameter-Free Monotone Inclusion and Strong Solutions to Variational Inequalities | Jelena Diakonikolas | We leverage the connections between nonexpansive maps, monotone Lipschitz operators, and proximal mappings to obtain near-optimal (i.e., optimal up to poly-log factors in terms of iteration complexity) and parameter-free methods for solving monotone inclusion problems. |

45 | Approximation Schemes for ReLU Regression | Ilias Diakonikolas, Surbhi Goel, Sushrut Karmalkar, Adam R. Klivans, Mahdi Soltanolkotabi | Our main insight is a new characterization of *surrogate losses* for nonconvex activations. |

46 | Learning Halfspaces with Massart Noise Under Structured Distributions | Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis | We study the problem of learning halfspaces with Massart noise in the distribution-specific PAC model. |

47 | Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU Networks | Ilias Diakonikolas, Daniel M. Kane, Vasilis Kontonis, Nikos Zarifis | We study the problem of PAC learning one-hidden-layer ReLU networks with $k$ hidden units on $\mathbb{R}^d$ under Gaussian marginals in the presence of additive label noise. |

48 | Consistent recovery threshold of hidden nearest neighbor graphs | Jian Ding, Yihong Wu, Jiaming Xu, Dana Yang | Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete graph, whose edge weights are independent and distributed according to $P_n$ for edges in the hidden $2k$-NN graph and $Q_n$ otherwise. |

49 | Root-n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank | Kefan Dong, Jian Peng, Yining Wang, Yuan Zhou | In this paper, we consider the problem of online learning of Markov decision processes (MDPs) with very large state spaces. |

50 | Embedding Dimension of Polyhedral Losses | Jessie Finocchiaro, Rafael Frongillo, Bo Waggoner | In this work, we study the notion of the embedding dimension of a given discrete loss: the minimum dimension d such that an embedding exists. |

51 | Efficient Parameter Estimation of Truncated Boolean Product Distributions | Dimitris Fotakis, Alkis Kalavasis, Christos Tzamos | We study the problem of estimating the parameters of a Boolean product distribution in $d$ dimensions, when the samples are truncated by a set $S \subset \{0, 1\}^d$ accessible through a membership oracle. |

52 | Rigorous Guarantees for Tyler’s M-Estimator via Quantum Expansion | William Cole Franks, Ankur Moitra | Here we observe a surprising connection between Tyler’s M-estimator and operator scaling, which has been intensively studied in recent years in part because of its connections to the Brascamp-Lieb inequality in analysis. |

53 | From tree matching to sparse graph alignment | Luca Ganassali, Laurent Massoulié | In this paper we consider alignment of sparse graphs, for which we introduce the Neighborhood Tree Matching Algorithm (NTMA). |

54 | On the Convergence of Stochastic Gradient Descent with Low-Rank Projections for Convex Low-Rank Matrix Problems | Dan Garber | We revisit the use of Stochastic Gradient Descent (SGD) for solving convex optimization problems that serve as highly popular convex relaxations for many important low-rank matrix recovery problems such as matrix completion, phase retrieval, and more. |

55 | Asymptotic Errors for High-Dimensional Convex Penalized Linear Regression beyond Gaussian Matrices | Cédric Gerbelot, Alia Abbara, Florent Krzakala | We consider the problem of learning a coefficient vector $\mathbf{x}_0 \in \mathbb{R}^N$ from noisy linear observations $\mathbf{y} = \mathbf{F}\mathbf{x}_0+\mathbf{w} \in \mathbb{R}^M$ in the high-dimensional limit $M,N \to \infty$ with $\alpha \equiv M/N$ fixed. |

56 | No-Regret Prediction in Marginally Stable Systems | Udaya Ghai, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang | We consider the problem of online prediction in a marginally stable linear dynamical system subject to bounded adversarial or (non-isotropic) stochastic perturbations. |

57 | Last Iterate is Slower than Averaged Iterate in Smooth Convex-Concave Saddle Point Problems | Noah Golowich, Sarath Pattathil, Constantinos Daskalakis, Asuman Ozdaglar | In this paper we study the smooth convex-concave saddle point problem. |

58 | Locally Private Hypothesis Selection | Sivakanth Gopi, Gautam Kamath, Janardhan Kulkarni, Aleksandar Nikolov, Zhiwei Steven Wu, Huanyu Zhang | Given samples from an unknown probability distribution $p$ and a set of $k$ probability distributions $\mathcal{Q}$, we aim to output, under the constraints of $\varepsilon$-differential privacy, a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution. |

59 | Bessel Smoothing and Multi-Distribution Property Estimation | Yi Hao, Ping Li | We consider a basic problem in statistical learning: estimating properties of multiple discrete distributions. |

60 | Faster Projection-free Online Learning | Elad Hazan, Edgar Minasyan | In this paper we give an efficient projection-free algorithm that guarantees $T^{2/3}$ regret for general online convex optimization with smooth cost functions and one linear optimization computation per iteration. |

61 | Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond | Oliver Hinder, Aaron Sidford, Nimit Sohoni | In this paper, we provide near-optimal accelerated first-order methods for minimizing a broad class of smooth nonconvex functions that are unimodal on all lines through a minimizer. |

62 | A Greedy Anytime Algorithm for Sparse PCA | Guy Holtzman, Adam Soffer, Dan Vilenchik | We propose a new greedy algorithm for the $\ell_0$-sparse PCA problem which supports the calibration principle. |

63 | Noise-tolerant, Reliable Active Classification with Comparison Queries | Max Hopkins, Daniel Kane, Shachar Lovett, Gaurav Mahajan | By introducing comparisons, an additional type of query comparing two points, we provide the first time and query efficient algorithms for learning non-homogeneous linear separators robust to bounded (Massart) noise. |

64 | Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes | Yichun Hu, Nathan Kallus, Xiaojie Mao | We study a nonparametric contextual bandit problem where the expected reward functions belong to a Hölder class with smoothness parameter $\beta$. |

65 | Extrapolating the profile of a finite population | Soham Jana, Yury Polyanskiy, Yihong Wu | We study a prototypical problem in empirical Bayes. |

66 | Precise Tradeoffs in Adversarial Training for Linear Regression | Adel Javanmard, Mahdi Soltanolkotabi, Hamed Hassani | In this paper we provide a precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features. |

67 | Robust causal inference under covariate shift via worst-case subpopulation treatment effects | Sookyo Jeong, Hongseok Namkoong | We propose a notion of worst-case treatment effect (WTE) across all subpopulations of a given size, a conservative notion of topline treatment effect. |

68 | Efficient improper learning for online logistic regression | Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi | In this work, we design an efficient improper algorithm that avoids this exponential constant while preserving a logarithmic regret. |

69 | Gradient descent follows the regularization path for general losses | Ziwei Ji, Miroslav Dudík, Robert E. Schapire, Matus Telgarsky | In this work, we show that for empirical risk minimization over linear predictors with *arbitrary* convex, strictly decreasing losses, if the risk does not attain its infimum, then the gradient-descent path and the *algorithm-independent* regularization path converge to the same direction (whenever either converges to a direction). |

70 | Provably efficient reinforcement learning with linear function approximation | Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I Jordan | This paper presents the first provable RL algorithm with both polynomial runtime and polynomial sample complexity in this linear setting, without requiring a “simulator” or additional assumptions. |

71 | Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise | Maxim Kaledin, Eric Moulines, Alexey Naumov, Vladislav Tadic, Hoi-To Wai | In this paper, we provide a finite-time analysis for linear two timescale SA. |

72 | Private Mean Estimation of Heavy-Tailed Distributions | Gautam Kamath, Vikrant Singhal, Jonathan Ullman | We give new upper and lower bounds on the minimax sample complexity of differentially private mean estimation of distributions with bounded $k$-th moments. |

73 | Approximate is Good Enough: Probabilistic Variants of Dimensional and Margin Complexity | Pritish Kamath, Omar Montasser, Nathan Srebro | We present and study approximate notions of dimensional and margin complexity, which correspond to the minimal dimension or norm of an embedding required to *approximate*, rather than exactly represent, a given hypothesis class. |

74 | Privately Learning Thresholds: Closing the Exponential Gap | Haim Kaplan, Katrina Ligett, Yishay Mansour, Moni Naor, Uri Stemmer | In this work we reduce this gap significantly, almost settling the sample complexity. |

75 | Online Learning with Vector Costs and Bandits with Knapsacks | Thomas Kesselheim, Sahil Singla | We introduce online learning with vector costs ($OLVC_p$) where in each time step $t \in \{1,\ldots, T\}$, we need to play an action $i \in \{1,\ldots,n\}$ that incurs an unknown vector cost in $[0,1]^d$. |

76 | Universal Approximation with Deep Narrow Networks | Patrick Kidger, Terry Lyons | Here we consider the natural ‘dual’ scenario for networks of bounded width and arbitrary depth. |

77 | Information Directed Sampling for Linear Partial Monitoring | Johannes Kirschner, Tor Lattimore, Andreas Krause | We introduce *information directed sampling* (IDS) for stochastic partial monitoring with a linear reward and observation structure. |

78 | New Potential-Based Bounds for Prediction with Expert Advice | Vladimir A. Kobzar, Robert V. Kohn, Zhilei Wang | This work addresses the classic machine learning problem of online prediction with expert advice. |

79 | On Suboptimality of Least Squares with Application to Estimation of Convex Bodies | Gil Kur, Alexander Rakhlin, Adityanand Guntuboyina | We develop a technique for establishing lower bounds on the sample complexity of Least Squares (or, Empirical Risk Minimization) for large classes of functions. |

80 | The EM Algorithm gives Sample-Optimality for Learning Mixtures of Well-Separated Gaussians | Jeongyeol Kwon, Constantine Caramanis | We consider the problem of spherical Gaussian Mixture models with $k \geq 3$ components when the components are well separated. |

81 | Exploration by Optimisation in Partial Monitoring | Tor Lattimore, Csaba Szepesvári | We provide a novel algorithm for adversarial k-action d-outcome partial monitoring that is adaptive, intuitive and efficient. |

82 | A Closer Look at Small-loss Bounds for Bandits with Graph Feedback | Chung-Wei Lee, Haipeng Luo, Mengxiao Zhang | Specifically, we develop an algorithm with regret $\mathcal{\tilde{O}}(\sqrt{\kappa L_*})$ where $\kappa$ is the clique partition number and $L_*$ is the loss of the best arm, and for the special case of self-aware graphs where every arm has a self-loop, we improve the regret to $\mathcal{\tilde{O}}(\min\{\sqrt{\alpha T}, \sqrt{\kappa L_*}\})$ where $\alpha \leq \kappa$ is the independence number. |

83 | Logsmooth Gradient Concentration and Tighter Runtimes for Metropolized Hamiltonian Monte Carlo | Yin Tat Lee, Ruoqi Shen, Kevin Tian | We show that the gradient norm $\|\nabla f(x)\|$ for $x \sim \exp(-f(x))$, where $f$ is strongly convex and smooth, concentrates tightly around its mean. |

84 | A Fast Spectral Algorithm for Mean Estimation with Sub-Gaussian Rates | Zhixian Lei, Kyle Luh, Prayaag Venkat, Fred Zhang | In this work, we show that it is possible to go beyond SDP and achieve better computational efficiency. |

85 | Learning Over-Parametrized Two-Layer Neural Networks beyond NTK | Yuanzhi Li, Tengyu Ma, Hongyang R. Zhang | We consider the dynamic of gradient descent for learning a two-layer neural network. |

86 | On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels | Tengyuan Liang, Alexander Rakhlin, Xiyu Zhai | We study the risk of minimum-norm interpolants of data in Reproducing Kernel Hilbert Spaces. |

87 | Learning Entangled Single-Sample Gaussians in the Subset-of-Signals Model | Yingyu Liang, Hui Yuan | We propose the subset-of-signals model where an unknown subset of $m$ variances are bounded by 1 while there are no assumptions on the other variances. |

88 | Near-Optimal Algorithms for Minimax Optimization | T. Lin, C. Jin, M. I. Jordan | This paper presents the first algorithm with $\tilde{O}(\sqrt{\kappa_x \kappa_y})$ gradient complexity, matching the lower bound up to logarithmic factors. |

89 | Better Algorithms for Estimating Non-Parametric Models in Crowd-Sourcing and Rank Aggregation | Allen Liu, Ankur Moitra | Here we introduce a framework for exploiting global information in shape-constrained estimation problems. |

90 | Tight Lower Bounds for Combinatorial Multi-Armed Bandits | Nadav Merlis, Shie Mannor | In this work, we prove regret lower bounds for combinatorial bandits that hold under mild assumptions for all smooth reward functions. |

91 | Lipschitz and Comparator-Norm Adaptivity in Online Learning | Zakaria Mhammedi, Wouter M. Koolen | We study Online Convex Optimization in the unbounded setting where neither predictions nor gradient are constrained. The goal is to simultaneously adapt to both the sequence of gradients and the comparator. |

92 | Information Theoretic Optimal Learning of Gaussian Graphical Models | Sidhant Misra, Marc Vuffray, Andrey Y. Lokhov | In this paper, we constructively answer this question and propose an algorithm, termed DICE, whose sample complexity matches the information-theoretic lower bound up to a universal constant factor. |

93 | Parallels Between Phase Transitions and Circuit Complexity? | Ankur Moitra, Elchanan Mossel, Colin Sandon | In this work, we study the circuit complexity of inference in the broadcast tree model, which has important applications in phylogenetic reconstruction and close connections to community detection. |

94 | On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration | Wenlong Mou, Chris Junchi Li, Martin J Wainwright, Peter L Bartlett, Michael I Jordan | When the matrix $\bar{A}$ is Hurwitz, we prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity. |

95 | Extending Learnability to Auxiliary-Input Cryptographic Primitives and Meta-PAC Learning | Mikito Nanashima | In this paper, we formulate a task of determining efficient learnability as a meta-PAC learning problem and show that our meta-PAC learning is exactly as hard as PAC learning. |

96 | Fast Rates for Online Prediction with Abstention | Gergely Neu, Nikita Zhivotovskiy | In the setting of sequential prediction of individual $(0, 1)$-sequences with expert advice, we show that by allowing the learner to abstain from the prediction by paying a cost marginally smaller than $0.5$ (say, $0.49$), it is possible to achieve expected regret bounds that are independent of the time horizon T. |

97 | Efficient and robust algorithms for adversarial linear contextual bandits | Gergely Neu, Julia Olkhovskaya | Under the assumption that the $d$-dimensional contexts are generated i.i.d. at random from a known distribution, we develop computationally efficient algorithms based on the classic Exp3 algorithm. |

98 | An $\widetilde{\mathcal{O}}(m/\varepsilon^{3.5})$-Cost Algorithm for Semidefinite Programs with Diagonal Constraints | Yin Tat Lee, Swati Padmanabhan | Our key technical contribution is to combine an approximate variant of the Arora-Kale framework of mirror descent for SDPs with the idea of trading off exact computations in every iteration for variance-reduced estimations in most iterations, only periodically resetting the accumulated error with exact computations. |

99 | Costly Zero Order Oracles | Renato Paes Leme, Jon Schneider | We study optimization with an approximate zero order oracle where there is a cost $c(\epsilon)$ associated with querying the oracle with $\epsilon$ accuracy. |

100 | Adaptive Submodular Maximization under Stochastic Item Costs | Srinivasan Parthasarathy | In this work, we develop adaptive policies for maximizing such functions when both the utility function and item costs may be stochastic. |

101 | Covariance-adapting algorithm for semi-bandits with application to sparse outcomes | Pierre Perrault, Michal Valko, Vianney Perchet | We investigate *stochastic combinatorial semi-bandits*, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in the standard bandits). |

102 | Finite-Time Analysis of Asynchronous Stochastic Approximation and $Q$-Learning | Guannan Qu, Adam Wierman | We consider a general asynchronous Stochastic Approximation (SA) scheme featuring a weighted infinity-norm contractive operator, and prove a bound on its finite-time convergence rate on a single trajectory. |

103 | List Decodable Subspace Recovery | Prasad Raghavendra, Morris Yau | In this work, we study robust statistics in the presence of overwhelming outliers for the fundamental problem of subspace recovery. |

104 | Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits | Chloé Rouyer, Yevgeny Seldin | We derive a new algorithm using regularization by Tsallis entropy to achieve best of both worlds guarantees. |

105 | How Good is SGD with Random Shuffling? | Itay Safran, Ohad Shamir | In this paper, we provide lower bounds on the expected optimization error with these heuristics (using SGD with any constant step size), which elucidate their advantages and disadvantages. |

106 | A Nearly Optimal Variant of the Perceptron Algorithm for the Uniform Distribution on the Unit Sphere | Marco Schmalhofer | We show a simple perceptron-like algorithm to learn origin-centered halfspaces in $\mathbb{R}^n$ with accuracy $1-\epsilon$ and confidence $1-\delta$ in time $\mathcal{O}\left(\frac{n^2}{\epsilon}\left(\log \frac{1}{\epsilon}+\log \frac{1}{\delta}\right)\right)$ using $\mathcal{O}\left(\frac{n}{\epsilon}\left(\log \frac{1}{\epsilon}+\log \frac{1}{\delta}\right)\right)$ labeled examples drawn uniformly from the unit $n$-sphere. |

107 | Logistic Regression Regret: What's the Catch? | Gil I Shamir | We address the problem of the achievable regret rates with online logistic regression. |

108 | Improper Learning for Non-Stochastic Control | Max Simchowitz, Karan Singh, Elad Hazan | We introduce a controller parametrization based on the denoised observations, and prove that applying online gradient descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies. |

109 | Reasoning About Generalization via Conditional Mutual Information | Thomas Steinke, Lydia Zakynthinou | We provide an information-theoretic framework for studying the generalization properties of machine learning algorithms. |

110 | Estimation and Inference with Trees and Forests in High Dimensions | Vasilis Syrgkanis, Manolis Zampetakis | In this work, we analyze the performance of regression trees and forests with binary features in the high-dimensional regime, where the number of features can grow exponentially with the number of samples. |

111 | Balancing Gaussian vectors in high dimension | Paxton Turner, Raghu Meka, Philippe Rigollet | We present a randomized polynomial-time algorithm that achieves discrepancy $e^{-\Omega(\log^2(n)/m)}$ with high probability, provided that $m = O(\sqrt{\log{n}})$. |

112 | Active Learning for Identification of Linear Dynamical Systems | Andrew Wagenmaker, Kevin Jamieson | We propose an algorithm to actively estimate the parameters of a linear dynamical system. |

113 | Taking a hint: How to leverage loss predictors in contextual bandits? | Chen-Yu Wei, Haipeng Luo, Alekh Agarwal | We initiate the study of learning in contextual bandits with the help of loss predictors. |

114 | Kernel and Rich Regimes in Overparametrized Models | Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, Nathan Srebro | We provide a complete and detailed analysis for a family of simple depth-$D$ linear networks that exhibit an interesting and meaningful transition between the kernel and rich regimes, and highlight an interesting role for the *width* of the models. |

115 | Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium | Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang | In this work, we develop provably efficient reinforcement learning algorithms for two-player zero-sum Markov games with simultaneous moves. |

116 | Tree-projected gradient descent for estimating gradient-sparse parameters on graphs | Sheng Xu, Zhou Fan, Sahand Negahban | Given observations $Z_1,\ldots,Z_n$ and a smooth, convex loss function $\mathcal{L}$ for which $\boldsymbol{\theta}^*$ minimizes the population risk $\mathbb{E}[\mathcal{L}(\boldsymbol{\theta};Z_1,\ldots,Z_n)]$, we propose to estimate $\boldsymbol{\theta}^*$ by a projected gradient descent algorithm that iteratively and approximately projects gradient steps onto spaces of vectors having small gradient-sparsity over low-degree spanning trees of $G$. |

117 | Non-asymptotic Analysis for Nonparametric Testing | Yun Yang, Zuofeng Shang, Guang Cheng | We develop a non-asymptotic framework for hypothesis testing in nonparametric regression where the true regression function belongs to a Sobolev space. |

118 | Learning a Single Neuron with Gradient Methods | Gilad Yehudai, Ohad Shamir | We consider the fundamental problem of learning a single neuron $\mathbf{x}\mapsto \sigma(\mathbf{w}^\top\mathbf{x})$ in a realizable setting, using standard gradient methods with random initialization, and under general families of input distributions and activations. |

119 | Nearly Non-Expansive Bounds for Mahalanobis Hard Thresholding | Xiao-Tong Yuan, Ping Li | The core contribution of this paper is to prove that for any $\bar k$-sparse vector $\bar w$ with $\bar k < k$, the estimation error $\|\mathcal{H}_{A,k}(w) - \bar w\|_A$ satisfies \[ \|\mathcal{H}_{A,k}(w) - \bar w\|^2_A \le \left(1+ \mathcal{O}\left(\kappa(A,2k) \sqrt{\frac{\bar k }{k - \bar k}}\right)\right) \|w - \bar w\|^2_A, \] where $\kappa(A,2k)$ is the restricted strong condition number of $A$ over $(2k)$-sparse subspace. |

120 | Wasserstein Control of Mirror Langevin Monte Carlo | Kelvin Shuangjian Zhang, Gabriel Peyré, Jalal Fadili, Marcelo Pereyra | In this paper, we consider Langevin diffusions on a Hessian-type manifold and study a discretization that is closely related to the mirror-descent scheme. |

121 | Open Problem: Model Selection for Contextual Bandits | Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo | We ask whether similar guarantees are possible for contextual bandit learning. |

122 | Open Problem: Tight Convergence of SGD in Constant Dimension | Tomer Koren, Shahar Segal | We point out a gap that remains between the known upper and lower bounds for the expected suboptimality of the last SGD point whenever the dimension is a constant independent of the number of SGD iterations $T$, and in particular, that the gap is still unaddressed even in the one-dimensional case. |

123 | Open Problem: Average-Case Hardness of Hypergraphic Planted Clique Detection | Yuetian Luo, Anru R Zhang | In particular, we ask whether it is possible to establish the equivalence of the computational hardness between HPC and PC detection. |

124 | Open Problem: Information Complexity of VC Learning | Thomas Steinke, Lydia Zakynthinou | We ask whether all VC classes admit a learner with low information complexity which achieves the generalization bounds guaranteed by uniform convergence. |

125 | Open Problem: Fast and Optimal Online Portfolio Selection | Tim Van Erven, Dirk Van der Hoeven, Wojciech Kotlowski, Wouter M. Koolen | The open problem we put before the community is to formally prove whether this approach achieves the optimal regret. |