# Paper Digest: COLT 2018 Highlights

Readers can also choose to read this highlight article on our console, which allows users to filter papers by keyword and find related papers.

The Annual Conference on Learning Theory (COLT) focuses on theoretical aspects of machine learning and related topics.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly grasp the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to **sign up for our free daily paper digest service** to receive updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated with new conference digests.

Paper Digest Team

team@paperdigest.org

#### TABLE 1: COLT 2018 Papers

No. | Title | Authors | Highlight |
---|---|---|---|

1 | Conference on Learning Theory 2018: Preface | Sébastien Bubeck, Philippe Rigollet | Conference on Learning Theory 2018: Preface |

2 | Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations | Yuanzhi Li, Tengyu Ma, Hongyang Zhang | We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. |

3 | Reducibility and Computational Lower Bounds for Problems with Planted Sparse Structure | Matthew Brennan, Guy Bresler, Wasim Huleihel | We introduce several new techniques to give a web of average-case reductions showing strong computational lower bounds based on the planted clique conjecture. |

4 | Logistic Regression: The Importance of Being Improper | Dylan J. Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan | Starting with the simple observation that the logistic loss is $1$-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm. |

5 | Actively Avoiding Nonsense in Generative Models | Steve Hanneke, Adam Tauman Kalai, Gautam Kamath, Christos Tzamos | To address this, we propose a model of active distribution learning using a binary invalidity oracle that identifies some examples as clearly invalid, together with random positive examples sampled from the true distribution. |

6 | A Faster Approximation Algorithm for the Gibbs Partition Function | Vladimir Kolmogorov | We consider the problem of estimating the partition function $Z(\beta)=\sum_x \exp(-\beta H(x))$ of a Gibbs distribution with a Hamiltonian $H(\cdot)$, or more precisely the logarithm of the ratio $q=\ln Z(0)/Z(\beta)$. |
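The quantities in this highlight are easy to make concrete by brute force on a small state space. The four-state Hamiltonian below is purely hypothetical, chosen only to illustrate $Z(\beta)$ and the log-ratio $q$; the paper itself is about estimating these quantities efficiently, not by enumeration:

```python
import math

def partition_function(states, H, beta):
    """Brute-force Z(beta) = sum_x exp(-beta * H(x)) over a finite state space."""
    return sum(math.exp(-beta * H(x)) for x in states)

# Hypothetical toy Hamiltonian on four states.
states = [0, 1, 2, 3]
H = lambda x: float(x)

beta = 1.0
Z0 = partition_function(states, H, 0.0)   # at beta = 0, Z(0) = number of states
Zb = partition_function(states, H, beta)
q = math.log(Z0 / Zb)                     # the log-ratio q = ln(Z(0)/Z(beta))
```

For this toy instance, `Z0` is exactly 4 and `q` is roughly 0.946; any approximation algorithm for $q$ can be sanity-checked against such an exhaustive computation.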

7 | Exponential Convergence of Testing Error for Stochastic Gradient Methods | Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach | We consider binary classification problems with positive definite kernels and square loss, and study the convergence rates of stochastic gradient methods. |

8 | Size-Independent Sample Complexity of Neural Networks | Noah Golowich, Alexander Rakhlin, Ohad Shamir | We study the sample complexity of learning neural networks, by providing new bounds on their Rademacher complexity assuming norm constraints on the parameter matrix of each layer. |

9 | Underdamped Langevin MCMC: A non-asymptotic analysis | Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael I. Jordan | We present an MCMC algorithm based on its discretization and show that it achieves $\varepsilon$ error (in 2-Wasserstein distance) in $\mathcal{O}(\sqrt{d}/\varepsilon)$ steps. |

10 | Online Variance Reduction for Stochastic Optimization | Zalan Borsos, Andreas Krause, Kfir Y. Levy | In this work, we investigate a recently proposed setting which poses variance reduction as an online optimization problem with bandit feedback. |

11 | Information Directed Sampling and Bandits with Heteroscedastic Noise | Johannes Kirschner, Andreas Krause | In this work, we consider bandits with heteroscedastic noise, where we explicitly allow the noise distribution to depend on the evaluation point. |

12 | Testing Symmetric Markov Chains From a Single Trajectory | Constantinos Daskalakis, Nishanth Dikkala, Nick Gravin | We propose a measure of difference between two Markov chains, motivated by the early work of Kazakos [78], which captures the scaling behavior of the total variation distance between trajectories sampled from the Markov chains as the length of these trajectories grows. |

13 | Detection limits in the high-dimensional spiked rectangular model | Ahmed El Alaoui, Michael I. Jordan | We present a probabilistic approach capable of treating generic product priors. |

14 | Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification | Max Simchowitz, Horia Mania, Stephen Tu, Michael I. Jordan, Benjamin Recht | We generalize our technique to provide bounds for a more general class of linear response time-series. |

15 | Active Tolerant Testing | Avrim Blum, Lunjia Hu | In this work, we show that for a nontrivial hypothesis class $\mathcal C$, we can estimate the distance of a target function $f$ to $\mathcal C$ (estimate the error rate of the best $h\in \mathcal C$) using substantially fewer labeled examples than would be needed to actually {\em learn} a good $h \in \mathcal C$. |

16 | Polynomial Time and Sample Complexity for Non-Gaussian Component Analysis: Spectral Methods | Yan Shuo Tan, Roman Vershynin | In this paper, we propose a simple spectral algorithm called \textsc{Reweighted PCA}, and prove that it possesses the same guarantee. |

17 | Calibrating Noise to Variance in Adaptive Data Analysis | Vitaly Feldman, Thomas Steinke | Here we propose a relaxed notion of stability based on KL divergence that also composes adaptively. |

18 | Accelerating Stochastic Gradient Descent for Least Squares Regression | Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford | In particular, this work introduces an accelerated stochastic gradient method that provably achieves the minimax optimal statistical risk faster than stochastic gradient descent. |

19 | Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints | Wenlong Mou, Liwei Wang, Xiyu Zhai, Kai Zheng | Two theories are proposed with non-asymptotic discrete-time analysis, using stability and PAC-Bayesian theory respectively. |

20 | Optimal approximation of continuous functions by very deep ReLU networks | Dmitry Yarotsky | We consider approximations of general continuous functions on finite-dimensional cubes by general deep ReLU neural networks and study the approximation rates with respect to the modulus of continuity of the function and the total number of weights $W$ in the network. |

21 | Averaging Stochastic Gradient Descent on Riemannian Manifolds | Nilesh Tripuraneni, Nicolas Flammarion, Francis Bach, Michael I. Jordan | We develop a geometric framework to transform a sequence of slowly converging iterates generated from stochastic gradient descent (SGD) on $\mathcal{M}$ to an averaged iterate sequence with a robust and fast $O(1/n)$ convergence rate. |

22 | Fitting a Putative Manifold to Noisy Data | Charles Fefferman, Sergei Ivanov, Yaroslav Kurylev, Matti Lassas, Hariharan Narayanan | In the present work, we give a solution to the following question from manifold learning. |

23 | Private Sequential Learning | John Tsitsiklis, Kuang Xu, Zhi Xu | We formulate a private learning model to study an intrinsic tradeoff between privacy and query complexity in sequential learning. |

24 | Optimal Errors and Phase Transitions in High-Dimensional Generalized Linear Models | Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, Lenka Zdeborová | In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes or benchmark models in neural networks. |

25 | Exact and Robust Conformal Inference Methods for Predictive Machine Learning with Dependent Data | Victor Chernozhukov, Kaspar Wüthrich, Zhu Yinchu | We extend conformal inference to general settings that allow for time series data. |

26 | Nonstochastic Bandits with Composite Anonymous Feedback | Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour | Our main contribution is a general reduction transforming a standard bandit algorithm into one that can operate in this harder setting. |

27 | Lower Bounds for Higher-Order Convex Optimization | Naman Agarwal, Elad Hazan | As a special case, we show Nesterov’s accelerated cubic regularization method and higher-order methods to be nearly tight. |

28 | Log-concave sampling: Metropolis-Hastings algorithms are fast! | Raaz Dwivedi, Yuansi Chen, Martin J Wainwright, Bin Yu | We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove a non-asymptotic upper bound on the mixing time of the Metropolis-adjusted Langevin algorithm (MALA). |
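A minimal one-dimensional MALA step is sketched below for the standard Gaussian target $U(x) = x^2/2$, which is strongly log-concave. The step size and target are hypothetical choices for illustration; the paper's contribution is the non-asymptotic mixing-time analysis in $\mathbb{R}^d$, not this basic recipe:

```python
import math
import random

def mala_step(x, U, grad_U, h, rng=random):
    """One Metropolis-adjusted Langevin step for the target density ∝ exp(-U(x))."""
    # Langevin proposal: gradient step plus Gaussian noise.
    y = x - h * grad_U(x) + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)

    def log_q(b, a):
        # Log density (up to a constant) of proposing b from a.
        return -((b - a + h * grad_U(a)) ** 2) / (4.0 * h)

    # Metropolis-Hastings correction removes the discretization bias.
    log_alpha = (U(x) - U(y)) + (log_q(x, y) - log_q(y, x))
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return y  # accept
    return x      # reject

# Hypothetical strongly log-concave target: standard Gaussian.
U = lambda x: 0.5 * x * x
grad_U = lambda x: x

random.seed(0)
x = 3.0
for _ in range(2000):  # burn-in from a bad start
    x = mala_step(x, U, grad_U, h=0.2)
```

After burn-in, long-run averages of the chain should match the moments of the target, which is how a sketch like this is usually sanity-checked.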

29 | Incentivizing Exploration by Heterogeneous Users | Bangrui Chen, Peter Frazier, David Kempe | We consider the problem of incentivizing exploration with heterogeneous agents. |

30 | Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms | Ilias Diakonikolas, Jerry Li, Ludwig Schmidt | Our goal is to output a hypothesis that is $O(\mathrm{OPT}) + \epsilon$ close to $f$, in $L_1$-distance. |

31 | Time-Space Tradeoffs for Learning Finite Functions from Random Evaluations, with Applications to Polynomials | Paul Beame, Shayan Oveis Gharan, Xin Yang | With our methods we can obtain bounds for learning concept classes of finite functions from random evaluations even when the sample space of random inputs can be significantly smaller than the concept class of functions and the function values can be from an arbitrary finite set. |

32 | Local Optimality and Generalization Guarantees for the Langevin Algorithm via Empirical Metastability | Belinda Tzen, Tengyuan Liang, Maxim Raginsky | We study the detailed path-wise behavior of the discrete-time Langevin algorithm for non-convex Empirical Risk Minimization (ERM) through the lens of metastability, adopting some techniques from Berglund and Gentz (2003). |

33 | Hardness of Learning Noisy Halfspaces using Polynomial Thresholds | Arnab Bhattacharyya, Suprovat Ghoshal, Rishi Saket | We prove the hardness of weakly learning halfspaces in the presence of adversarial noise using polynomial threshold functions (PTFs). |

34 | Best of both worlds: Stochastic & adversarial best-arm identification | Yasin Abbasi-Yadkori, Peter Bartlett, Victor Gabillon, Alan Malek, Michal Valko | We study bandit best-arm identification with arbitrary and potentially adversarial rewards. |

35 | Learning Patterns for Detection with Multiscale Scan Statistics | James Sharpnack | We consider the problem of simultaneously learning and detecting the anomalous pattern from a dictionary of smooth patterns and a database of many tensors. |

36 | Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk | Paul Hand, Vladislav Voroninski | In particular we consider two models, one in which the task is to invert a generative neural network given access to its last layer and another in which the task is to invert a generative neural network given only compressive linear observations of its last layer. |

37 | Small-loss bounds for online learning with partial information | Thodoris Lykouris, Karthik Sridharan, Éva Tardos | We consider the problem of adversarial (non-stochastic) online learning with partial information feedback, where at each round, a decision maker selects an action from a finite set of alternatives. |

38 | Empirical bounds for functions with weak interactions | Andreas Maurer, Massimiliano Pontil | We provide sharp empirical estimates of expectation, variance and normal approximation for a class of statistics whose variation in any argument does not change too much when another argument is modified. |

39 | Restricted Eigenvalue from Stable Rank with Applications to Sparse Linear Regression | Shiva Prasad Kasiviswanathan, Mark Rudelson | We give two applications of this construction to sparse linear regression problems, including one to a compressed sparse regression setting where the regression algorithm only has access to a compressed representation of a fixed design matrix $X$. |

40 | Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent | Chi Jin, Praneeth Netrapalli, Michael I. Jordan | Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent |

41 | Convex Optimization with Unbounded Nonconvex Oracles using Simulated Annealing | Oren Mangoubi, Nisheeth K. Vishnoi | In this paper we study the more general case when the noise has magnitude $\alpha F(x) + \beta$ for some $\alpha, \beta > 0$, and present a polynomial time algorithm that finds an approximate minimizer of $F$ for this noise model. |

42 | Learning Mixtures of Linear Regressions with Nearly Optimal Complexity | Yuanzhi Li, Yingyu Liang | This paper proposes a fixed parameter tractable algorithm for the problem under general conditions, which achieves global convergence and the sample complexity scales nearly linearly in the dimension. |

43 | Detecting Correlations with Little Memory and Communication | Yuval Dagan, Ohad Shamir | We study the problem of identifying correlations in multivariate data, under information constraints: Either on the amount of memory that can be used by the algorithm, or the amount of communication when the data is distributed across several machines. |

44 | Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning | Gal Dalal, Gugan Thoppe, Balázs Szörényi, Shie Mannor | In this work, we develop a novel recipe for their finite sample analysis. |

45 | Near-Optimal Sample Complexity Bounds for Maximum Likelihood Estimation of Multivariate Log-concave Densities | Timothy Carpenter, Ilias Diakonikolas, Anastasios Sidiropoulos, Alistair Stewart | We study the problem of learning multivariate log-concave densities with respect to a global loss function. |

46 | More Adaptive Algorithms for Adversarial Bandits | Chen-Yu Wei, Haipeng Luo | We develop a novel and generic algorithm for the adversarial multi-armed bandit problem (or more generally the combinatorial semi-bandit problem). |

47 | Efficient Convex Optimization with Membership Oracles | Yin Tat Lee, Aaron Sidford, Santosh S. Vempala | We consider the problem of minimizing a convex function over a convex set given access only to an evaluation oracle for the function and a membership oracle for the set. |

48 | A General Approach to Multi-Armed Bandits Under Risk Criteria | Asaf Cassel, Shie Mannor, Assaf Zeevi | In this paper we provide a more systematic approach to analyzing such risk criteria within a stochastic multi-armed bandit (MAB) formulation. |

49 | An Optimal Learning Algorithm for Online Unconstrained Submodular Maximization | Tim Roughgarden, Joshua R. Wang | We consider a basic problem at the interface of two fundamental fields: {\em submodular optimization} and {\em online learning}. |

50 | The Mean-Field Approximation: Information Inequalities, Algorithms, and Complexity | Vishesh Jain, Frederic Koehler, Elchanan Mossel | Building on the methods used to prove the bound, along with techniques from combinatorics and optimization, we study the algorithmic problem of estimating the (variational) free energy for Ising models and general Markov random fields. |

51 | Approximation beats concentration? An approximation view on inference with smooth radial kernels | Mikhail Belkin | In this paper we take the approximation theory point of view to explore various aspects of smooth kernels related to their inferential properties. |

52 | Non-Convex Matrix Completion Against a Semi-Random Adversary | Yu Cheng, Rong Ge | In this paper, we investigate a more realistic semi-random model, where the probability of observing each entry is {\em at least} $p$. |

53 | The Vertex Sample Complexity of Free Energy is Polynomial | Vishesh Jain, Frederic Koehler, Elchanan Mossel | For Markov random fields of order $r$, we obtain an algorithm that achieves $\epsilon$ approximation using a number of samples polynomial in $r$ and $1/\epsilon$ and running time that is $2^{O(1/\epsilon^2)}$ up to polynomial factors in $r$ and $\epsilon$. |

54 | Efficient Algorithms for Outlier-Robust Regression | Adam Klivans, Pravesh K. Kothari, Raghu Meka | We give the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels. |

55 | Action-Constrained Markov Decision Processes With Kullback-Leibler Cost | Ana Bušić, Sean Meyn | This paper introduces a technique to solve a more general class of action-constrained MDPs. |

56 | Fundamental Limits of Weak Recovery with Applications to Phase Retrieval | Marco Mondelli, Andrea Montanari | We consider the case of Gaussian vectors $\boldsymbol a_i$. |

57 | Cutting plane methods can be extended into nonconvex optimization | Oliver Hinder | We show that it is possible to obtain an $O(\epsilon^{-4/3})$ runtime — including computational cost — for finding $\epsilon$-stationary points of nonconvex functions using cutting plane methods. |

58 | An Analysis of the t-SNE Algorithm for Data Visualization | Sanjeev Arora, Wei Hu, Pravesh K. Kothari | This work gives a formal framework for the problem of data visualization – finding a 2-dimensional embedding of clusterable data that correctly separates individual clusters to make them visually identifiable. |

59 | Adaptivity to Smoothness in X-armed bandits | Andrea Locatelli, Alexandra Carpentier | We study the stochastic continuum-armed bandit problem from the angle of adaptivity to \emph{unknown regularity} of the reward function $f$. |

60 | Black-Box Reductions for Parameter-free Online Learning in Banach Spaces | Ashok Cutkosky, Francesco Orabona | We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime. |

61 | A Data Prism: Semi-verified learning in the small-alpha regime | Michela Meister, Gregory Valiant | We consider a simple model of unreliable or crowdsourced data where there is an underlying set of $n$ binary variables, each “evaluator” contributes a (possibly unreliable or adversarial) estimate of the values of some subset of $r$ of the variables, and the learner is given the true value of a \emph{constant} number of variables. |

62 | A Direct Sum Result for the Information Complexity of Learning | Ido Nachum, Jonathan Shafer, Amir Yehudayoff | We introduce a class of functions of VC dimension $d$ over the domain $\mathcal{X}$ with information complexity at least $\Omega \left(d\log \log \frac{|\mathcal{X}|}{d}\right)$ bits for any consistent and proper algorithm (deterministic or random). |

63 | Online learning over a finite action set with limited switching | Jason Altschuler, Kunal Talwar | Next, to investigate the value of switching actions at a more granular level, we introduce the setting of \textit{switching budgets}, in which the algorithm is limited to $S \leq T$ switches between actions. |

64 | Smoothed Online Convex Optimization in High Dimensions via Online Balanced Descent | Niangjun Chen, Gautam Goel, Adam Wierman | We introduce a novel algorithmic framework for this problem, Online Balanced Descent (OBD), which works by iteratively projecting the previous point onto a carefully chosen level set of the current cost function so as to balance the switching costs and hitting costs. |

65 | Faster Rates for Convex-Concave Games | Jacob Abernethy, Kevin A. Lai, Kfir Y. Levy, Jun-Kun Wang | In this work we go further, showing that for a particular class of games one achieves a $O(1/T^2)$ rate, and we show how this applies to the Frank-Wolfe method and recovers a similar bound \citep{D15}. |

66 | $\ell_1$ Regression using Lewis Weights Preconditioning and Stochastic Gradient Descent | David Durfee, Kevin A. Lai, Saurabh Sawlani | We present preconditioned stochastic gradient descent (SGD) algorithms for the $\ell_1$ minimization problem $\min_{\boldsymbol{\mathit{x}}}\|\boldsymbol{\mathit{A}} \boldsymbol{\mathit{x}} - \boldsymbol{\mathit{b}}\|_1$ in the overdetermined case, where there are far more constraints than variables. |
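For intuition, the objective $\min_x \|Ax - b\|_1$ can be attacked on a toy overdetermined system with plain subgradient descent. This sketch deliberately omits the Lewis-weights preconditioning that is the paper's actual contribution, and the tiny system below is a made-up example:

```python
def l1_regression_subgradient(A, b, steps=2000, lr=0.01):
    """Minimize ||Ax - b||_1 by plain (unpreconditioned) subgradient descent.

    A subgradient of the l1 loss at x is A^T sign(Ax - b).
    """
    n, d = len(A), len(A[0])
    x = [0.0] * d
    sign = lambda t: (t > 0) - (t < 0)
    for _ in range(steps):
        # Residuals r = Ax - b, then subgradient g = A^T sign(r).
        r = [sum(A[i][j] * x[j] for j in range(d)) - b[i] for i in range(n)]
        g = [sum(A[i][j] * sign(r[i]) for i in range(n)) for j in range(d)]
        x = [x[j] - lr * g[j] for j in range(d)]
    return x

# Hypothetical overdetermined system: 4 constraints, 1 variable, true x* = 2.
A = [[1.0], [2.0], [1.0], [3.0]]
b = [2.0, 4.0, 2.0, 6.0]
x_hat = l1_regression_subgradient(A, b)
```

With a fixed step size the iterate oscillates in a small band around the minimizer $x^* = 2$; preconditioning (as in the paper) is what makes the convergence rate robust to badly conditioned $A$.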

67 | Optimal Single Sample Tests for Structured versus Unstructured Network Data | Guy Bresler, Dheeraj Nagaraj | Our goal is to test without knowing the parameter values of the underlying models: only the structure of dependencies is known. |

68 | A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation | Jalaj Bhandari, Daniel Russo, Raghav Singal | In this work, we provide a \emph{simple and explicit finite time analysis} of temporal difference learning with linear function approximation. |

69 | Privacy-preserving Prediction | Cynthia Dwork, Vitaly Feldman | Here we formulate the problem of ensuring privacy of individual predictions and investigate the overheads required to achieve it in several standard models of classification and regression. |

70 | An Estimate Sequence for Geodesically Convex Optimization | Hongyi Zhang, Suvrit Sra | We propose a Riemannian version of Nesterov’s Accelerated Gradient algorithm (\textsc{Ragd}), and show that for \emph{geodesically} smooth and strongly convex problems, within a neighborhood of the minimizer whose radius depends on the condition number as well as the sectional curvature of the manifold, \textsc{Ragd} converges to the minimizer with acceleration. |

71 | The Externalities of Exploration and How Data Diversity Helps Exploitation | Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu | We introduce the notion of a group externality, measuring the extent to which the presence of one population of users (the majority) impacts the rewards of another (the minority). |

72 | Efficient Contextual Bandits in Non-stationary Worlds | Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford | In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically adapt to a change in distribution. |

73 | Langevin Monte Carlo and JKO splitting | Espen Bernton | We develop novel connections between such Monte Carlo algorithms, the theory of Wasserstein gradient flow, and the operator splitting approach to solving PDEs. |

74 | Subpolynomial trace reconstruction for random strings and arbitrary deletion probability | Nina Holden, Robin Pemantle, Yuval Peres | We show that if $\bf x$ is chosen uniformly at random, then $\exp(O(\log^{1/3} n))$ traces suffice to reconstruct $\bf x$ with high probability. |

75 | An explicit analysis of the entropic penalty in linear programming | Jonathan Weed | We provide matching lower bounds and show that the entropic approach does not lead to a near-linear time approximation scheme for the linear assignment problem. |

76 | Efficient active learning of sparse halfspaces | Chicheng Zhang | In this paper, we provide a computationally efficient algorithm that achieves this goal. |

77 | Marginal Singularity, and the Benefits of Labels in Covariate-Shift | Samory Kpotufe, Guillaume Martinet | We present new minimax results that concisely capture the relative benefits of source and target labeled data, under {covariate-shift}. |

78 | Learning Single-Index Models in Gaussian Space | Rishabh Dudeja, Daniel Hsu | We consider regression problems where the response is a smooth but non-linear function of a $k$-dimensional projection of $p$ normally-distributed covariates, contaminated with additive Gaussian noise. |

79 | Hidden Integrality of SDP Relaxations for Sub-Gaussian Mixture Models | Yingjie Fei, Yudong Chen | We consider the problem of finding discrete clustering structures under Sub-Gaussian Mixture Models. |

80 | Counting Motifs with Graph Sampling | Jason M. Klusowski, Yihong Wu | In this paper, we study the problem of estimating the number of motifs as induced subgraphs under both models from a statistical perspective. |

81 | Approximate Nearest Neighbors in Limited Space | Piotr Indyk, Tal Wagner | We consider the $(1+\epsilon)$-approximate nearest neighbor search problem: given a set $X$ of $n$ points in a $d$-dimensional space, build a data structure that, given any query point $y$, finds a point $x \in X$ whose distance to $y$ is at most $(1+\epsilon) \min_{x \in X} \|x-y\|$ for an accuracy parameter $\epsilon \in (0,1)$. |

82 | Breaking the $1/\sqrt{n}$ Barrier: Faster Rates for Permutation-based Models in Polynomial Time | Cheng Mao, Ashwin Pananjady, Martin J. Wainwright | We consider the problem of estimating such a matrix based on noisy observations of a subset of its entries, and design and analyze a polynomial-time algorithm that improves upon the state of the art. |

83 | Unleashing Linear Optimizers for Group-Fair Learning and Optimization | Daniel Alabi, Nicole Immorlica, Adam Kalai | Most systems and learning algorithms optimize average performance or average loss – one reason being computational complexity. |

84 | The Many Faces of Exponential Weights in Online Learning | Dirk Hoeven, Tim Erven, Wojciech Kotlowski | Here we explore the alternative approach of putting Exponential Weights (EW) first. |

85 | Sampling as optimization in the space of measures: The Langevin dynamics as a composite optimization problem | Andre Wibisono | We propose the symmetrized Langevin algorithm (SLA), which should have a smaller bias than ULA, at the price of implementing a proximal gradient step in space. |

86 | Online Learning: Sufficient Statistics and the Burkholder Method | Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan | To demonstrate the scope and effectiveness of the Burkholder method, we develop a novel online strategy for matrix prediction that attains a regret bound corresponding to the variance term in matrix concentration inequalities. |

87 | Minimax Bounds on Stochastic Batched Convex Optimization | John Duchi, Feng Ruan, Chulhee Yun | We study the stochastic batched convex optimization problem, in which we use many \emph{parallel} observations to optimize a convex function given limited rounds of interaction. |

88 | Geometric Lower Bounds for Distributed Parameter Estimation under Communication Constraints | Yanjun Han, Ayfer Özgür, Tsachy Weissman | For other models however, we show that the sample size reduction is re-mediated only linearly with increasing $k$, e.g. when some sub-Gaussian structure is available. |

89 | Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance | Yanjun Han, Jiantao Jiao, Tsachy Weissman | We present \emph{Local Moment Matching (LMM)}, a unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance. |

90 | Iterate Averaging as Regularization for Stochastic Gradient Descent | Gergely Neu, Lorenzo Rosasco | We propose and analyze a variant of the classic Polyak–Ruppert averaging scheme, broadly used in stochastic gradient methods. |

91 | Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form | Srinadh Bhojanapalli, Nicolas Boumal, Prateek Jain, Praneeth Netrapalli | In pursuit of low-rank solutions and low complexity algorithms, we consider the Burer–Monteiro factorization approach for solving SDPs. |

92 | Certified Computation from Unreliable Datasets | Themis Gouleakis, Christos Tzamos, Manolis Zampetakis | In this work, we provide a generic approach that is based on \textit{verification} of only few records of the data set to guarantee high quality learning outcomes for various optimization objectives. |

93 | Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon | Nan Jiang, Alekh Agarwal | How can we explain such a difference? |

94 | Open problem: Improper learning of mixtures of Gaussians | Elad Hazan, Roi Livni | Open problem: Improper learning of mixtures of Gaussians |