Paper Digest: AISTATS 2018 Highlights

June 17, 2018 / June 18, 2020, admin

Readers can also choose to read this highlight article on our console, which allows users to filter papers by keyword and find related papers.

The International Conference on Artificial Intelligence and Statistics (AISTATS) is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: AISTATS 2018 Papers

No.	Title	Authors	Highlight
1	The Geometry of Random Features	Krzysztof Choromanski, Mark Rowland, Tamas Sarlos, Vikas Sindhwani, Richard Turner, Adrian Weller	We present an in-depth examination of the effectiveness of radial basis function kernel (beyond Gaussian) estimators based on orthogonal random feature maps.
2	Gauged Mini-Bucket Elimination for Approximate Inference	Sungsoo Ahn, Michael Chertkov, Jinwoo Shin, Adrian Weller	In this paper, we propose a new gauge-variational approach, termed WMBE-G, which combines gauge transformations with the weighted mini-bucket elimination (WMBE) method.
3	A Fast Algorithm for Separated Sparsity via Perturbed Lagrangians	Aleksander Madry, Slobodan Mitrovic, Ludwig Schmidt	In this paper, we make progress in this direction in the context of separated sparsity – a fundamental sparsity notion that captures exclusion constraints in linearly ordered data such as time series.
4	An Analysis of Categorical Distributional Reinforcement Learning	Mark Rowland, Marc Bellemare, Will Dabney, Remi Munos, Yee Whye Teh	In this paper, we introduce a framework to analyse CDRL algorithms, establish the importance of the projected distributional Bellman operator in distributional RL, draw fundamental connections between CDRL and the Cramer distance, and give a proof of convergence for sample-based categorical distributional reinforcement learning algorithms.
5	Combinatorial Preconditioners for Proximal Algorithms on Graphs	Thomas Möllenhoff, Zhenzhang Ye, Tao Wu, Daniel Cremers	We present a novel preconditioning technique for proximal optimization methods that relies on graph algorithms to construct effective preconditioners.
6	Growth-Optimal Portfolio Selection under CVaR Constraints	Guy Uziel, Ran El-Yaniv	We characterize the asymptotically optimal risk-adjusted performance and present an investment strategy whose portfolios are guaranteed to achieve the asymptotic optimal solution while fulfilling the desired risk constraint.
7	Accelerated Stochastic Power Iteration	Peng Xu, Bryan He, Christopher De Sa, Ioannis Mitliagkas, Chris Re	In this paper, we study methods to accelerate power iteration in the stochastic setting by adding a momentum term.
8	Multi-scale Nystrom Method	Woosang Lim, Rundong Du, Bo Dai, Kyomin Jung, Le Song, Haesun Park	In this paper, we propose Nested Nystrom Method (NNM) which achieves a delicate balance between the approximation accuracy and computational efficiency by exploiting the multilayer structure and multiple compressions.
9	Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach	Satoshi Hara, Kohei Hayashi	In this study, we propose a method to make a complex tree ensemble interpretable by simplifying the model.
10	Mixed Membership Word Embeddings for Computational Social Science	James Foulds	I propose a probabilistic model-based word embedding method which can recover interpretable embeddings, without big data.
11	Fast Threshold Tests for Detecting Discrimination	Emma Pierson, Sam Corbett-Davies, Sharad Goel	To achieve these performance gains, we introduce and analyze a flexible family of probability distributions on the interval [0, 1] – which we call discriminant distributions – that is computationally efficient to work with.
12	Iterative Supervised Principal Components	Juho Piironen, Aki Vehtari	To do this, we propose a new dimension reduction technique, called iterative supervised principal components (ISPCs), which combines variable screening and dimension reduction and can be considered as an extension to the existing technique of supervised principal components (SPCs).
13	Iterative Spectral Method for Alternative Clustering	Chieh Wu, Stratis Ioannidis, Mario Sznaier, Xiangyu Li, David Kaeli, Jennifer Dy	We propose a novel Iterative Spectral Method (ISM) that greatly improves the scalability of KDAC.
14	Can clustering scale sublinearly with its clusters? A variational EM acceleration of GMMs and k-means	Dennis Forster, Jörg Lücke	In this study, we explore whether one iteration of k-means or EM for GMMs can scale sublinearly with C at run-time, while improving the clustering objective remains effective.
15	Parallelised Bayesian Optimisation via Thompson Sampling	Kirthevasan Kandasamy, Akshay Krishnamurthy, Jeff Schneider, Barnabas Poczos	We design and analyse variations of the classical Thompson sampling (TS) procedure for Bayesian optimisation (BO) in settings where function evaluations are expensive but can be performed in parallel.
16	On the challenges of learning with inference networks on sparse, high-dimensional data	Rahul Krishnan, Dawen Liang, Matthew Hoffman	We propose methods to tackle it via iterative optimization inspired by stochastic variational inference (Hoffman et al., 2013) and improvements in the data representation used for inference.
17	Post Selection Inference with Kernels	Makoto Yamada, Yuta Umezu, Kenji Fukumizu, Ichiro Takeuchi	In this paper, we propose a kernel-based post-selection inference (PSI) algorithm that can find a set of statistically significant features from non-linearly related data.
18	On how complexity affects the stability of a predictor	Joel Ratsaby	We define the predictor’s complexity to be the amount of uncertainty in detecting that the criterion fails given that it fails.
19	On Truly Block Eigensolvers via Riemannian Optimization	Zhiqiang Xu, Xin Gao	We thus propose the concept of generalized k-th gap.
20	Layerwise Systematic Scan: Deep Boltzmann Machines and Beyond	Heng Guo, Kaan Kara, Ce Zhang	We show that the Gibbs sampler with a layerwise alternating scan order has its relaxation time (in terms of epochs) no larger than that of a random-update Gibbs sampler (in terms of variable updates).
21	IHT dies hard: Provable accelerated Iterative Hard Thresholding	Rajiv Khanna, Anastasios Kyrillidis	We study – both in theory and practice – the use of momentum motions in classic iterative hard thresholding (IHT) methods.
22	Finding Global Optima in Nonconvex Stochastic Semidefinite Optimization with Variance Reduction	Jinshan Zeng, Ke Ma, Yuan Yao	In this paper, we show that a stochastic gradient descent method with variance reduction can be adapted to solve the nonconvex reformulation of the original convex problem with global linear convergence, i.e., converging to a global optimum exponentially fast, given a proper initial choice in the restricted strongly convex case.
23	Outlier Detection and Robust Estimation in Nonparametric Regression	Dehan Kong, Howard Bondell, Weining Shen	We propose to include a subject-specific mean shift parameter for each data point such that a nonzero parameter will identify its corresponding data point as an outlier.
24	Integral Transforms from Finite Data: An Application of Gaussian Process Regression to Fourier Analysis	Luca Ambrogioni, Eric Maris	In this paper, we use Gaussian process regression to estimate the Fourier transform (or any other integral transform) without making these assumptions.
25	AdaGeo: Adaptive Geometric Learning for Optimization and Sampling	Gabriele Abbati, Alessandra Tosi, Michael Osborne, Seth Flaxman	In order to overcome these difficulties, we propose AdaGeo, a preconditioning framework for adaptively learning the geometry of the parameter space during optimization or sampling.
26	Online Learning with Non-Convex Losses and Non-Stationary Regret	Xiang Gao, Xiaobo Li, Shuzhong Zhang	In this paper, we consider online learning with non-convex loss functions.
27	Learning Determinantal Point Processes in Sublinear Time	Christophe Dupuy, Francis Bach	We propose a new class of determinantal point processes (DPPs) which can be manipulated for inference and parameter learning in potentially sublinear time in the number of items.
28	Nonlinear Structured Signal Estimation in High Dimensions via Iterative Hard Thresholding	Kaiqing Zhang, Zhuoran Yang, Zhaoran Wang	We study the high-dimensional signal estimation problem with nonlinear measurements, where the signal of interest is either sparse or low-rank.
29	Riemannian stochastic quasi-Newton algorithm with variance reduction and its convergence analysis	Hiroyuki Kasai, Hiroyuki Sato, Bamdev Mishra	The present paper proposes a Riemannian stochastic quasi-Newton algorithm with variance reduction (R-SQN-VR).
30	Online Boosting Algorithms for Multi-label Ranking	Young Hun Jung, Ambuj Tewari	We consider the multi-label ranking approach to multi-label learning.
31	Zeroth-Order Online Alternating Direction Method of Multipliers: Convergence Analysis and Applications	Sijia Liu, Jie Chen, Pin-Yu Chen, Alfred Hero	In this paper, we design and analyze a new zeroth-order online algorithm, namely, the zeroth-order online alternating direction method of multipliers (ZOO-ADMM), which enjoys dual advantages of being gradient-free operation and employing the ADMM to accommodate complex structured regularizers.
32	High-Dimensional Bayesian Optimization via Additive Models with Overlapping Groups	Paul Rolland, Jonathan Scarlett, Ilija Bogunovic, Volkan Cevher	In this paper, we consider the approach of Kandasamy et al. (2015), in which the high-dimensional function decomposes as a sum of lower-dimensional functions on subsets of the underlying variables.
33	Robust Active Label Correction	Jan Kremer, Fei Sha, Christian Igel	We approximate the true label noise by a model that learns the aspects of the noise that are class-conditional (i.e., independent of the input given the observed label).
34	Factorial HMMs with Collapsed Gibbs Sampling for Optimizing Long-term HIV Therapy	Amit Gruber, Chen Yanover, Tal El-Hay, Anders Sönnerborg, Vanni Borghi, Francesca Incardona, Yaara Goldschmidt	We present a novel generative model for HIV drug resistance evolution.
35	Optimal Submodular Extensions for Marginal Estimation	Pankaj Pansari, Chris Russell, M Pawan Kumar	Submodular extensions of an energy function can be used to efficiently compute approximate marginals via variational inference.
36	Semi-Supervised Learning with Competitive Infection Models	Nir Rosenfeld, Amir Globerson	Our goal in this work is to explore alternative mechanisms for propagating labels.
37	Discriminative Learning of Prediction Intervals	Nir Rosenfeld, Yishay Mansour, Elad Yom-Tov	In this work we consider the task of constructing prediction intervals in an inductive batch setting.
38	Topic Compositional Neural Language Model	Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, Sanjeev Satheesh, Lawrence Carin	We propose a Topic Compositional Neural Language Model (TCNLM), a novel method designed to simultaneously capture both the global semantic meaning and the local word-ordering structure in a document.
39	Learning Priors for Invariance	Eric Nalisnick, Padhraic Smyth	In this paper, we address the problem of how to specify an informative prior when the problem of interest is known to exhibit invariance properties.
40	Optimal Cooperative Inference	Scott Cheng-Hsin Yang, Yue Yu, Arash Givchi, Pei Wang, Wai Keen Vong, Patrick Shafto	We present such a framework.
41	Stochastic Multi-armed Bandits in Constant Space	David Liau, Zhao Song, Eric Price, Ger Yang	We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms.
42	Matrix completability analysis via graph k-connectivity	Dehua Cheng, Natali Ruchansky, Yan Liu	In this paper, we make the observation that even when the observed matrix is too sparse for accurate completion, there may be portions of the data where completion is still possible.
43	FLAG n' FLARE: Fast Linearly-Coupled Adaptive Gradient Methods	Xiang Cheng, Fred Roosta, Stefan Palombo, Peter Bartlett, Michael Mahoney	We present accelerated and adaptive gradient methods, called FLAG and FLARE, which can offer the best of both worlds.
44	Multi-view Metric Learning in Vector-valued Kernel Spaces	Riikka Huusari, Hachem Kadri, Cécile Capponi	We consider the problem of metric learning for multi-view data and present a novel method for learning within-view as well as between-view metrics in vector-valued kernel spaces, as a way to capture multi-modal structure of the data.
45	Gaussian Process Subset Scanning for Anomalous Pattern Detection in Non-iid Data	William Herlands, Edward McFowland, Andrew Wilson, Daniel Neill	We introduce methods for identifying anomalous patterns in non-iid data by combining Gaussian processes with novel log-likelihood ratio statistic and subset scanning techniques.
46	Dropout as a Low-Rank Regularizer for Matrix Factorization	Jacopo Cavazza, Pietro Morerio, Benjamin Haeffele, Connor Lane, Vittorio Murino, Rene Vidal	In this paper, we present a theoretical analysis of dropout for MF, where Bernoulli random variables are used to drop columns of the factors.
47	A Simple Analysis for Exp-concave Empirical Minimization with Arbitrary Convex Regularizer	Tianbao Yang, Zhe Li, Lijun Zhang	In this paper, we present a simple analysis of fast rates with high probability of empirical minimization for stochastic composite optimization over a finite-dimensional bounded convex set with exponential concave loss functions and an arbitrary convex regularization.
48	Independently Interpretable Lasso: A New Regularizer for Sparse Regression with Uncorrelated Variables	Masaaki Takada, Taiji Suzuki, Hironori Fujisawa	In this paper, we propose a new regularization method, “Independently Interpretable Lasso” (IILasso).
49	Boosting Variational Inference: an Optimization Perspective	Francesco Locatello, Rajiv Khanna, Joydeep Ghosh, Gunnar Ratsch	In the present work, we study the convergence properties of this approach from a modern optimization viewpoint by establishing connections to the classic Frank-Wolfe algorithm.
50	Personalized and Private Peer-to-Peer Machine Learning	Aurélien Bellet, Rachid Guerraoui, Mahsa Taziki, Marc Tommasi	In this paper, we introduce an efficient algorithm to address the above problem in a fully decentralized (peer-to-peer) and asynchronous fashion, with provable convergence rate.
51	Tensor Regression Meets Gaussian Processes	Rose Yu, Guangyu Li, Yan Liu	In this paper, we demonstrate interesting connections between the two, especially for multi-way data analysis.
52	A Nonconvex Proximal Splitting Algorithm under Moreau-Yosida Regularization	Emanuel Laude, Tao Wu, Daniel Cremers	To overcome this difficulty, in this work we consider a lifted variant of the Moreau-Yosida regularized model and propose a novel multiblock primal-dual algorithm that intrinsically stabilizes the dual block.
53	Medoids in Almost-Linear Time via Multi-Armed Bandits	Vivek Bagaria, Govinda Kamath, Vasilis Ntranos, Martin Zhang, David Tse	We present an algorithm Med-dit to compute the medoid with high probability, which uses $O(n\log n)$ distance evaluations.
54	Regional Multi-Armed Bandits	Zhiyang Wang, Ruida Zhou, Cong Shen	We propose an efficient algorithm, UCB-g, that solves the regional bandit problem by combining the Upper Confidence Bound (UCB) and greedy principles.
55	Nearly second-order optimality of online joint detection and estimation via one-sample update schemes	Yang Cao, Liyan Xie, Yao Xie, Huan Xu	We show that for such problems, detection procedures based on sequential likelihood ratios with simple one-sample update estimates such as online mirror descent are nearly second-order optimal.
56	Sum-Product-Quotient Networks	Or Sharir, Amnon Shashua	We present a novel tractable generative model that extends Sum-Product Networks (SPNs) and significantly boosts their power.
57	Exploiting Strategy-Space Diversity for Batch Bayesian Optimization	Sunil Gupta, Alistair Shilton, Santu Rana, Svetha Venkatesh	This paper proposes a novel approach to batch Bayesian optimisation using a multi-objective optimisation framework with exploitation and exploration forming two objectives.
58	Beating Monte Carlo Integration: a Nonasymptotic Study of Kernel Smoothing Methods	Stephan Clémençon, François Portier	This paper is devoted to the study of a kernel smoothing based competitor built from a sequence of $n\geq 1$ i.i.d. random vectors with arbitrary continuous probability distribution $f(x)dx$, originally proposed in Delyon et al. (2016), from a nonasymptotic perspective.
59	Group invariance principles for causal generative models	Michel Besserve, Naji Shajarisales, Bernhard Schölkopf, Dominik Janzing	Our aim in this paper is to propose a group theoretic framework for ICM to unify and generalize these approaches.
60	A Provable Algorithm for Learning Interpretable Scoring Systems	Nataliya Sokolovska, Yann Chevaleyre, Jean-Daniel Zucker	In this contribution, we introduce an original methodology to simultaneously learn interpretable binning mapped to a class variable, and the weights associated with these bins contributing to the score.
61	Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes	Hyunjik Kim, Yee Whye Teh	We propose Scalable Kernel Composition (SKC), a scalable kernel search algorithm that extends the Automatic Statistician to bigger data sets.
62	Efficient Bandit Combinatorial Optimization Algorithm with Zero-suppressed Binary Decision Diagrams	Shinsaku Sakaue, Masakazu Ishihata, Shin-ichi Minato	To avoid dealing with such huge action sets directly, we propose an algorithm that takes advantage of zero-suppressed binary decision diagrams, which encode action sets as compact graphs.
63	Transfer Learning on fMRI Datasets	Hejia Zhang, Po-Hsuan Chen, Peter Ramadge	A method is introduced to improve prediction accuracy on a primary fMRI dataset by jointly learning a model using other secondary fMRI datasets.
64	An Optimization Approach to Learning Falling Rule Lists	Chaofan Chen, Cynthia Rudin	We propose an optimization approach to learning falling rule lists and "softly" falling rule lists, along with Monte-Carlo search algorithms that use bounds on the optimal solution to prune the search space.
65	Catalyst for Gradient-based Nonconvex Optimization	Courtney Paquette, Hongzhou Lin, Dmitriy Drusvyatskiy, Julien Mairal, Zaid Harchaoui	We introduce a generic scheme to solve nonconvex optimization problems using gradient-based algorithms originally designed for minimizing convex functions.
66	Benefits from Superposed Hawkes Processes	Hongteng Xu, Dixin Luo, Xu Chen, Lawrence Carin	We investigate superposed Hawkes process as an important class of such models, with properties studied in the framework of least squares estimation.
67	Nonparametric Preference Completion	Julian Katz-Samuels, Clayton Scott	We propose a k-nearest neighbors-like algorithm and prove that it is consistent.
68	Non-parametric estimation of Jensen-Shannon Divergence in Generative Adversarial Network training	Mathieu Sinn, Ambrish Rawat	This work presents a rigorous statistical analysis of GANs providing straightforward explanations for common training pathologies such as vanishing gradients.
69	Efficient and principled score estimation with Nyström kernel exponential families	Dougal Sutherland, Heiko Strathmann, Michael Arbel, Arthur Gretton	We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite dimensional.
70	Symmetric Variational Autoencoder and Connections to Adversarial Learning	Liqun Chen, Shuyang Dai, Yunchen Pu, Erjin Zhou, Chunyuan Li, Qinliang Su, Changyou Chen, Lawrence Carin	Symmetric Variational Autoencoder and Connections to Adversarial Learning
71	Few-shot Generative Modelling with Generative Matching Networks	Sergey Bartunov, Dmitry Vetrov	We develop a new generative model called Generative Matching Network which is inspired by the recently proposed matching networks for one-shot learning in discriminative tasks.
72	Nonlinear Weighted Finite Automata	Tianyu Li, Guillaume Rabusseau, Doina Precup	Weighted finite automata (WFA) can expressively model functions defined over strings but are inherently linear models. Given the recent successes of non-linear models in machine learning, it is natural to wonder whether extending WFA to the non-linear setting would be beneficial. In this paper, we propose a novel neural network based nonlinear WFA model (NL-WFA) along with a learning algorithm.
73	Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models	Hugh Salimbeni, Stefanos Eleftheriadis, James Hensman	The natural gradient method has been used effectively in conjugate Gaussian process models, but the non-conjugate case has been largely unexplored.
74	Variational inference for the multi-armed contextual bandit	Iñigo Urteaga, Chris Wiggins	We consider contextual multi-armed bandit applications where the true reward distribution is unknown and complex, which we approximate with a mixture model whose parameters are inferred via variational inference.
75	Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods	Robert Gower, Nicolas Le Roux, Francis Bach	Our goal is to improve variance reducing stochastic methods through better control variates.
76	Subsampling for Ridge Regression via Regularized Volume Sampling	Michal Derezinski, Manfred Warmuth	We propose a new procedure for selecting the subset of vectors, such that the ridge estimator obtained from that subset offers strong statistical guarantees in terms of the mean squared prediction error over the entire dataset of n labeled vectors.
77	Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition	Pavel Izmailov, Alexander Novikov, Dmitry Kropotov	We propose a method (TT-GP) for approximate inference in Gaussian Process (GP) models.
78	Batch-Expansion Training: An Efficient Optimization Framework	Michal Derezinski, Dhruv Mahajan, S. Sathiya Keerthi, S. V. N. Vishwanathan, Markus Weimer	We propose Batch-Expansion Training (BET), a framework for running a batch optimizer on a gradually expanding dataset.
79	Batched Large-scale Bayesian Optimization in High-dimensional Spaces	Zi Wang, Clement Gehring, Pushmeet Kohli, Stefanie Jegelka	In this paper, we propose ensemble Bayesian optimization (EBO) to address three current challenges in BO simultaneously: (1) large-scale observations; (2) high dimensional input spaces; and (3) selections of batch queries that balance quality and diversity.
80	Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series	Feras Saad, Vikash Mansinghka	This article proposes a Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series data.
81	Stochastic Three-Composite Convex Minimization with a Linear Operator	Renbo Zhao, Volkan Cevher	We develop a primal-dual convex minimization framework to solve a class of stochastic convex three-composite problem with a linear operator.
82	Direct Learning to Rank And Rerank	Cynthia Rudin, Yining Wang	Learning-to-rank techniques have proven to be extremely useful for prioritization problems, where we rank items in order of their estimated probabilities, and dedicate our limited resources to the top-ranked items.
83	One-shot Coresets: The Case of k-Clustering	Olivier Bachem, Mario Lucic, Silvio Lattanzi	In this work, we affirmatively answer this question by proposing an efficient algorithm that constructs such one-shot summaries for k-clustering problems while retaining strong theoretical guarantees.
84	Random Warping Series: A Random Features Method for Time-Series Embedding	Lingfei Wu, Ian En-Hsu Yen, Jinfeng Yi, Fangli Xu, Qi Lei, Michael Witbrock	In this work, we study a family of alignment-aware positive definite (p.d.) kernels, with its feature embedding given by a distribution of Random Warping Series (RWS).
85	Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD	Sanghamitra Dutta, Gauri Joshi, Soumyadip Ghosh, Parijat Dube, Priya Nagpurkar	In this work we present the first theoretical characterization of the speed-up offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime (wallclock time).
86	Variational Inference based on Robust Divergences	Futoshi Futami, Issei Sato, Masashi Sugiyama	In this paper, based on Zellner’s optimization and variational formulation of Bayesian inference, we propose an outlier-robust pseudo-Bayesian variational method by replacing the Kullback-Leibler divergence used for data fitting with a robust divergence such as the beta- and gamma-divergences.
87	Variational Rejection Sampling	Aditya Grover, Ramki Gummadi, Miguel Lazaro-Gredilla, Dale Schuurmans, Stefano Ermon	We propose a novel rejection sampling step that discards samples from the variational posterior which are assigned low likelihoods by the model.
88	Best arm identification in multi-armed bandits with delayed feedback	Aditya Grover, Todor Markov, Peter Attia, Norman Jin, Nicolas Perkins, Bryan Cheong, Michael Chen, Zi Yang, Stephen Harris, William Chueh, Stefano Ermon	In this paper, we propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback.
89	A fully adaptive algorithm for pure exploration in linear bandits	Liyuan Xu, Junya Honda, Masashi Sugiyama	We propose the first fully-adaptive algorithm for pure exploration in linear bandits—the task to find the arm with the largest expected reward, which depends on an unknown parameter linearly.
90	Contextual Bandits with Stochastic Experts	Rajat Sen, Karthikeyan Shanmugam, Sanjay Shakkottai	We propose upper-confidence bound (UCB) algorithms for this problem, which employ two different importance sampling based estimators for the mean reward for each expert.
91	Human Interaction with Recommendation Systems	Sven Schmit, Carlos Riquelme	We propose a simple model where users with heterogeneous preferences arrive over time.
92	Community Detection in Hypergraphs: Optimal Statistical Limit and Efficient Algorithms	I Chien, Chung-Yi Lin, I-Hsiang Wang	In this paper, community detection in hypergraphs is explored.
93	Smooth and Sparse Optimal Transport	Mathieu Blondel, Vivien Seguy, Antoine Rolet	In this paper, we explore regularizing the primal and dual OT formulations with a strongly convex term, which corresponds to relaxing the dual and primal constraints with smooth approximations.
94	Robust Maximization of Non-Submodular Objectives	Ilija Bogunovic, Junyao Zhao, Volkan Cevher	In this work, we present a new algorithm OBLIVIOUS-GREEDY and prove the first constant-factor approximation guarantees for a wider class of non-submodular objectives.
95	Cause-Effect Inference by Comparing Regression Errors	Patrick Bloebaum, Dominik Janzing, Takashi Washio, Shohei Shimizu, Bernhard Schoelkopf	We address the problem of inferring the causal relation between two variables by comparing the least-squares errors of the predictions in both possible causal directions.
96	Tree-based Bayesian Mixture Model for Competing Risks	Alexis Bellot, Mihaela Schaar	We aim with this setting to provide accurate individual estimates but also interpretable conclusions for use as a clinical decision support tool.
97	Actor-Critic Fictitious Play in Simultaneous Move Multistage Games	Julien Perolat, Bilal Piot, Olivier Pietquin	Using an architecture inspired by actor-critic algorithms, we build a stochastic approximation of the fictitious play process.
98	Random Subspace with Trees for Feature Selection Under Memory Constraints	Antonio Sutera, Célia Châtel, Gilles Louppe, Louis Wehenkel, Pierre Geurts	In this paper, we consider the problem of feature selection in applications where the memory is not large enough to contain all features.
99	Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information	Jakob Runge	Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information
100	Quotient Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures	Tomi Silander, Janne Leppä-aho, Elias Jääsaari, Teemu Roos	We introduce an information theoretic criterion for Bayesian network structure learning which we call quotient normalized maximum likelihood (qNML).
101	Convex Optimization over Intersection of Simple Sets: improved Convergence Rate Guarantees via an Exact Penalty Approach	Achintya Kundu, Francis Bach, Chiranjib Bhattacharya	We consider the problem of minimizing a convex function over the intersection of finitely many simple sets which are easy to project onto.
102	Variational Sequential Monte Carlo	Christian Naesseth, Scott Linderman, Rajesh Ranganath, David Blei	In this paper we present a new approximating family of distributions, the variational sequential Monte Carlo (VSMC) family, and show how to optimize it in variational inference.
103	Statistically Efficient Estimation for Non-Smooth Probability Densities	Masaaki Imaizumi, Takanori Maehara, Yuichi Yoshida	In this paper, we propose new estimators for non-smooth density functions by employing the notion of Szemeredi partitions from graph theory.
104	SDCA-Powered Inexact Dual Augmented Lagrangian Method for Fast CRF Learning	Xu Hu, Guillaume Obozinski	We propose an efficient dual augmented Lagrangian formulation to learn conditional random fields (CRF).
105	Generalized Concomitant Multi-Task Lasso for Sparse Multimodal Regression	Mathurin Massias, Olivier Fercoq, Alexandre Gramfort, Joseph Salmon	We provide new statistical and computational solutions to perform heteroscedastic regression, with an emphasis on functional brain imaging with magneto- and electroencephalography (M/EEG).
106	Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models	Atsushi Nitanda, Taiji Suzuki	We propose a new technique that boosts the convergence of training generative adversarial networks.
107	Statistical Sparse Online Regression: A Diffusion Approximation Perspective	Jianqing Fan, Wenyan Gong, Chris Junchi Li, Qiang Sun	In this paper, we propose to adopt the diffusion approximation techniques to study online regression.
108	Guaranteed Sufficient Decrease for Stochastic Variance Reduced Gradient Optimization	Fanhua Shang, Yuanyuan Liu, Kaiwen Zhou, James Cheng, Kelvin Kai Wing Ng, Yuichi Yoshida	In this paper, we propose a novel sufficient decrease technique for stochastic variance reduced gradient descent methods such as SVRG and SAGA.
109	Delayed Sampling and Automatic Rao-Blackwellization of Probabilistic Programs	Lawrence Murray, Daniel Lundén, Jan Kudlicka, David Broman, Thomas Schön	We introduce a dynamic mechanism for the solution of analytically-tractable substructure in probabilistic programs, using conjugate priors and affine transformations to reduce variance in Monte Carlo estimators.
110	Learning to Round for Discrete Labeling Problems	Pritish Mohapatra, Jawahar C.V., M Pawan Kumar	We present a novel interpretation of rounding procedures as sampling from a latent variable model, which opens the door to the use of powerful machine learning formulations in their design.
111	Approximate Ranking from Pairwise Comparisons	Reinhard Heckel, Max Simchowitz, Kannan Ramchandran, Martin Wainwright	In this paper we consider the problem of finding approximate rankings from pairwise comparisons.
112	Semi-Supervised Prediction-Constrained Topic Models	Michael Hughes, Gabriel Hope, Leah Weiner, Thomas McCoy, Roy Perlis, Erik Sudderth, Finale Doshi-Velez	We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels.
113	A Stochastic Differential Equation Framework for Guiding Online User Activities in Closed Loop	Yichen Wang, Evangelos Theodorou, Apurv Verma, Le Song	In this paper, we propose a framework to reformulate point processes into stochastic differential equations, which allows us to extend methods from stochastic optimal control to address the activity guiding problem.
114	Accelerated Stochastic Mirror Descent: From Continuous-time Dynamics to Discrete-time Algorithms	Pan Xu, Tianhao Wang, Quanquan Gu	We present a new framework to analyze accelerated stochastic mirror descent through the lens of continuous-time stochastic dynamic systems.
115	A Unified Framework for Nonconvex Low-Rank plus Sparse Matrix Recovery	Xiao Zhang, Lingxiao Wang, Quanquan Gu	We propose a unified framework to solve general low-rank plus sparse matrix recovery problems based on matrix factorization, which covers a broad family of objective functions satisfying the restricted strong convexity and smoothness conditions.
116	Bayesian Nonparametric Poisson-Process Allocation for Time-Sequence Modeling	Hongyi Ding, Mohammad Khan, Issei Sato, Masashi Sugiyama	In this work, we present the Bayesian nonparametric Poisson process allocation (BaNPPA), a latent-function model for time-sequences, which automatically infers the number of latent functions.
117	Factor Analysis on a Graph	Masayuki Karasuyama, Hiroshi Mamitsuka	We propose a Gaussian based analysis which is a combination of graph constrained covariance matrix estimation and factor analysis (FA).
118	Crowdclustering with Partition Labels	Junxiang Chen, Yale Chang, Peter Castaldi, Michael Cho, Brian Hobbs, Jennifer Dy	In this paper, we propose a crowdclustering model that directly analyzes partition labels.
119	Learning Structural Weight Uncertainty for Sequential Decision-Making	Ruiyi Zhang, Chunyuan Li, Changyou Chen, Lawrence Carin	We propose efficient posterior learning of structural weight uncertainty, within an SVGD framework, by employing matrix variate Gaussian priors on NN parameters.
120	Towards Memory-Friendly Deterministic Incremental Gradient Method	Jiahao Xie, Hui Qian, Zebang Shen, Chao Zhang	In this paper, we propose a new deterministic variant of the IG method SVRG that blends a periodically updated full gradient with a component function gradient selected in a cyclic order.
121	Optimality of Approximate Inference Algorithms on Stable Instances	Hunter Lang, David Sontag, Aravindan Vijayaraghavan	The goal of this paper is to partially explain the performance of α-expansion and an LP relaxation algorithm on MAP inference in Ferromagnetic Potts models (FPMs).
122	Bayesian Approaches to Distribution Regression	Ho Chung Leon Law, Dougal Sutherland, Dino Sejdinovic, Seth Flaxman	We frame our models in a neural network style, allowing for simple MAP inference using backpropagation to learn the parameters, as well as MCMC-based inference which can fully propagate uncertainty.
123	Submodularity on Hypergraphs: From Sets to Sequences	Marko Mitrovic, Moran Feldman, Andreas Krause, Amin Karbasi	In this paper, we introduce two new algorithms that provably give constant factor approximations for general graphs and hypergraphs having bounded in or out degrees.
124	Provable Estimation of the Number of Blocks in Block Models	Bowei Yan, Purnamrita Sarkar, Xiuyuan Cheng	In this paper, we propose an approach based on semi-definite relaxations, which does not require prior knowledge of model parameters like many existing convex relaxation methods and recovers the number of clusters and the clustering matrix exactly under a broad parameter regime, with probability tending to one.
125	Differentially Private Regression with Gaussian Processes	Michael Smith, Mauricio Álvarez, Max Zwiessele, Neil D. Lawrence	We propose a method using GPs to provide differentially private (DP) regression.
126	Adaptive balancing of gradient and update computation times using global geometry and approximate subproblems	Sai Praneeth Reddy Karimireddy, Sebastian Stich, Martin Jaggi	In this work, we propose a new framework, Approx Composite Minimization (ACM) that uses approximate update steps to ensure balance between the two operations.
127	VAE with a VampPrior	Jakub Tomczak, Max Welling	In this paper, we propose to extend the variational auto-encoder (VAE) framework with a new type of prior which we call "Variational Mixture of Posteriors" prior, or VampPrior for short.
128	Structured Factored Inference for Probabilistic Programming	Avi Pfeffer, Brian Ruttenberg, William Kretschmer, Alison O'Connor	We present structured factored inference (SFI), a framework that enables factored inference algorithms to scale to significantly more complex programs.
129	A Generic Approach for Escaping Saddle points	Sashank Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, Alex Smola	To tackle this challenge, we introduce a generic framework that minimizes Hessian-based computations while at the same time provably converging to second-order critical points.
130	Policy Evaluation and Optimization with Continuous Treatments	Nathan Kallus, Angela Zhou	We study the problem of policy evaluation and learning from batched contextual bandit data when treatments are continuous, going beyond previous work on discrete treatments.
131	Multiphase MCMC Sampling for Parameter Inference in Nonlinear Ordinary Differential Equations	Alan Lazarus, Dirk Husmeier, Theodore Papamarkou	This paper presents a multiphase MCMC approach that attempts to close the gap between efficiency and accuracy.
132	Why Adaptively Collected Data Have Negative Bias and How to Correct for It	Xinkun Nie, Xiaoying Tian, Jonathan Taylor, James Zou	In this paper, we prove that when the data collection procedure satisfies natural conditions, then sample means of the data have systematic negative biases.
133	Sparse Linear Isotonic Models	Sheng Chen, Arindam Banerjee	In this paper, we introduce sparse linear isotonic models (SLIMs) for high-dimensional problems by hybridizing ideas in parametric sparse linear models and AIMs, which enjoy a few appealing advantages over both.
134	Robustness of classifiers to uniform $\ell_p$ and Gaussian noise	Jean-Yves Franceschi, Alhussein Fawzi, Omar Fawzi	We study the robustness of classifiers to various kinds of random noise models.
135	Nested CRP with Hawkes-Gaussian Processes	Xi Tan, Vinayak Rao, Jennifer Neville	In this paper, we propose a novel nonparametric Bayesian model that incorporates senders and receivers of messages into a hierarchical structure that governs the content and reciprocity of communications.
136	Sketching for Kronecker Product Regression and P-splines	Huaian Diao, Zhao Song, Wen Sun, David Woodruff	We take TensorSketch outside of the context of polynomial kernels, and show its utility in applications in which the underlying design matrix is a Kronecker product of smaller matrices.
137	Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models	Ardavan Saeedi, Matthew Hoffman, Stephen DiVerdi, Asma Ghandeharioun, Matthew Johnson, Ryan Adams	In this work, we develop a statistical model that meets these objectives.
138	Cheap Checking for Cloud Computing: Statistical Analysis via Annotated Data Streams	Chris Hickey, Graham Cormode	Our work aims to provide fast and practical methods to verify analysis of large data sets, where the client’s computation and memory costs are kept to a minimum.
139	Minimax Reconstruction Risk of Convolutional Sparse Dictionary Learning	Shashank Singh, Barnabas Poczos, Jian Ma	We compare our results to similar results for IID SDL and verify our theory with synthetic experiments.
140	Kernel Conditional Exponential Family	Michael Arbel, Arthur Gretton	An algorithm is provided for learning the generalized natural parameter, and consistency of the estimator is established in the well specified case.
141	Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go?	Chandrashekar Lakshminarayanan, Csaba Szepesvari	In this paper, we study a constant step-size averaged linear stochastic approximation (CALSA) algorithm, and for a given class of problems, we ask whether properties of $i)$ a universal constant step-size and $ii)$ a uniform fast rate of $\frac{C}{t}$ for the mean square-error hold for all instances of the class, where the constant $C>0$ does not depend on the problem instance.
142	Stochastic Zeroth-order Optimization in High Dimensions	Yining Wang, Simon Du, Sivaraman Balakrishnan, Aarti Singh	Under sparsity assumptions on the gradients or function values, we present two algorithms: a successive component/feature selection algorithm and a noisy mirror descent algorithm using Lasso gradient estimates, and show that both algorithms have convergence rates that depend only logarithmically on the ambient dimension of the problem.
143	Teacher Improves Learning by Selecting a Training Subset	Yuzhe Ma, Robert Nowak, Philippe Rigollet, Xuezhou Zhang, Xiaojin Zhu	For general learners, we provide a mixed-integer nonlinear programming-based algorithm to find a super teaching set.
144	Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation	Penporn Koanantakool, Alnur Ali, Ariful Azad, Aydin Buluc, Dmitriy Morozov, Leonid Oliker, Katherine Yelick, Sang-Yun Oh	To address these deficiencies, we introduce HP-CONCORD, a highly scalable optimization method for estimating a sparse inverse covariance matrix based on a regularized pseudolikelihood framework, without assuming Gaussianity.
145	Robust Vertex Enumeration for Convex Hulls in High Dimensions	Pranjal Awasthi, Bahman Kalantari, Yikai Zhang	We design a fast and robust algorithm named All Vertex Triangle Algorithm (AVTA) for detecting the vertices of the convex hull of a set of points in high dimensions.
146	Fast generalization error bound of deep learning from a kernel perspective	Taiji Suzuki	We show that the optimal width of the internal layers can be determined through the degree of freedom and derive the optimal convergence rate that is faster than $O(1/\sqrt{n})$ rate which has been shown in the existing studies.
147	Product Kernel Interpolation for Scalable Gaussian Processes	Jacob Gardner, Geoff Pleiss, Ruihan Wu, Kilian Weinberger, Andrew Wilson	We develop a new technique for MVM based learning that exploits product kernel structure.
148	Towards Provable Learning of Polynomial Neural Networks Using Low-Rank Matrix Estimation	Mohammadreza Soltani, Chinmay Hegde	In this context, we propose two novel, non-convex training algorithms which do not need any extra tuning parameters other than the number of hidden neurons.
149	Scalable Generalized Dynamic Topic Models	Patrick Jähnichen, Florian Wenzel, Marius Kloft, Stephan Mandt	In this paper, we present several new results around DTMs.
150	Bayesian Structure Learning for Dynamic Brain Connectivity	Michael Andersen, Ole Winther, Lars Kai Hansen, Russell Poldrack, Oluwasanmi Koyejo	This manuscript proposes a novel Bayesian model for dynamic brain connectivity.
151	Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method	Mark Eisen, Aryan Mokhtari, Alejandro Ribeiro	This paper proposes a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition.
152	Frank-Wolfe Splitting via Augmented Lagrangian Method	Gauthier Gidel, Fabian Pedregosa, Simon Lacoste-Julien	In this work, we develop and analyze the Frank-Wolfe Augmented Lagrangian (FW-AL) algorithm, a method for minimizing a smooth function over convex compact sets related by a “linear consistency” constraint that only requires access to a linear minimization oracle over the individual constraints.
153	Learning linear structural equation models in polynomial time and sample complexity	Asish Ghoshal, Jean Honorio	We develop a new algorithm — which is computationally and statistically efficient and works in the high-dimensional regime — for learning linear SEMs from purely observational data with arbitrary noise distribution.
154	Convergence diagnostics for stochastic gradient descent with constant learning rate	Jerry Chee, Panos Toulis	In this paper, we develop a statistical diagnostic test to detect such phase transition in the context of stochastic gradient descent with constant learning rate.
155	Learning Sparse Polymatrix Games in Polynomial Time and Sample Complexity	Asish Ghoshal, Jean Honorio	We consider the problem of learning sparse polymatrix games from observations of strategic interactions.
156	Nonparametric Sharpe Ratio Function Estimation in Heteroscedastic Regression Models via Convex Optimization	Seung-Jean Kim, Johan Lim, Joong-Ho Won	We propose to solve the problem by solving a sequence of finite-dimensional convex programs with increasing dimensions, which can be done globally and efficiently.
157	Stochastic algorithms for entropy-regularized optimal transport problems	Brahim Khalil Abid, Robert Gower	In this work we develop a family of fast and practical stochastic algorithms for solving the optimal transport problem with an entropic penalization.
158	Plug-in Estimators for Conditional Expectations and Probabilities	Steffen Grunewalder	We study plug-in estimators of conditional expectations and probabilities, and we provide a systematic analysis of their rates of convergence.
159	Factorized Recurrent Neural Architectures for Longer Range Dependence	Francois Belletti, Alex Beutel, Sagar Jain, Ed Chi	In this article, we apply the theory of LRD stochastic processes to modern recurrent architectures, such as LSTMs and GRUs, and prove they do not provide LRD under assumptions sufficient for gradients to vanish.
160	On the Statistical Efficiency of Compositional Nonparametric Prediction	Yixi Xu, Jean Honorio, Xiao Wang	In this paper, we propose a compositional nonparametric method in which a model is expressed as a labeled binary tree of $2k+1$ nodes, where each node is either a summation, a multiplication, or the application of one of the $q$ basis functions to one of the $p$ covariates.
161	Metrics for Deep Generative Models	Nutan Chen, Alexej Klushyn, Richard Kurle, Xueyan Jiang, Justin Bayer, Patrick van der Smagt	The method yields a principled distance measure, provides a tool for visual inspection of deep generative models, and an alternative to linear interpolation in latent space.
162	Combinatorial Penalties: Which structures are preserved by convex relaxations?	Marwa El Halabi, Francis Bach, Volkan Cevher	We consider the homogeneous and the non-homogeneous convex relaxations for combinatorial penalty functions defined on support sets.
163	Generalized Binary Search For Split-Neighborly Problems	Stephen Mussmann, Percy Liang	In this paper, we introduce a weaker condition, split-neighborly, which requires that for the set of hypotheses two neighbors disagree on, any subset is splittable by some test.
164	Intersection-Validation: A Method for Evaluating Structure Learning without Ground Truth	Jussi Viinikka, Ralf Eggeling, Mikko Koivisto	This work introduces a method to compare algorithms’ ability to learn the model structure, assuming no ground truth is given.
165	On Statistical Optimality of Variational Bayes	Debdeep Pati, Anirban Bhattacharya, Yun Yang	The article addresses a long-standing open problem on the justification of using variational Bayes methods for parameter estimation.
166	Minimax-Optimal Privacy-Preserving Sparse PCA in Distributed Systems	Jason Ge, Zhaoran Wang, Mengdi Wang, Han Liu	This paper proposes a distributed privacy-preserving sparse PCA (DPS-PCA) algorithm that generates a minimax-optimal sparse PCA estimator under differential privacy constraints.
167	Online Regression with Partial Information: Generalization and Linear Projection	Shinji Ito, Daisuke Hatano, Hanna Sumita, Akihiro Yabe, Takuro Fukunaga, Naonori Kakimura, Ken-Ichi Kawarabayashi	In this paper, we propose a general setting for the limitation of the available information, where the observed information is determined by a function chosen from a given set of observation functions.
168	Learning Generative Models with Sinkhorn Divergences	Aude Genevay, Gabriel Peyre, Marco Cuturi	This paper presents the first tractable method to train large scale generative models using an OT-based loss called Sinkhorn loss which tackles these three issues by relying on two key ideas: (a) entropic smoothing, which turns the original OT loss into a differentiable and more robust quantity that can be computed using Sinkhorn fixed point iterations; (b) algorithmic (automatic) differentiation of these iterations with seamless GPU execution.
169	Reparameterizing the Birkhoff Polytope for Variational Permutation Inference	Scott Linderman, Gonzalo Mena, Hal Cooper, Liam Paninski, John Cunningham	Combinatorial optimization algorithms may enable efficient point estimation, but fully Bayesian inference poses a severe challenge in this high-dimensional, discrete space.
170	Achieving the time of 1-NN, but the accuracy of k-NN	Lirong Xue, Samory Kpotufe	We propose a simple approach which, given distributed computing resources, can nearly achieve the accuracy of k-NN prediction, while matching (or improving) the faster prediction time of 1-NN.
171	Efficient Weight Learning in High-Dimensional Untied MLNs	Khan Mohammad Al Farabi, Somdeb Sarkhel, Deepak Venugopal	In this paper, we present an approach to perform efficient weight learning in MLNs containing high-dimensional, untied formulas.
172	Learning with Complex Loss Functions and Constraints	Harikrishna Narasimhan	We develop a general approach for solving constrained classification problems, where the loss and constraints are defined in terms of a general function of the confusion matrix.
173	Solving lp-norm regularization with tensor kernels	Saverio Salzo, Lorenzo Rosasco, Johan Suykens	In this paper, we discuss how a suitable family of tensor kernels can be used to efficiently solve nonparametric extensions of lp regularized learning methods.
174	Weighted Tensor Decomposition for Learning Latent Variables with Partial Data	Omer Gottesman, Weiwei Pan, Finale Doshi-Velez	In this work, we consider the case in which certain dimensions of the data are not always observed–common in applied settings, where not all measurements may be taken for all observations–resulting in moment estimates of varying quality.
175	Multi-objective Contextual Bandit Problem with Similarity Information	Eralp Turgay, Doruk Oner, Cem Tekin	In this paper we propose the multi-objective contextual bandit problem with similarity information.
176	Turing: A Language for Flexible Probabilistic Inference	Hong Ge, Kai Xu, Zoubin Ghahramani	In this work, we present a system called Turing for building MCMC algorithms for probabilistic programming inference.
177	Fast and Scalable Learning of Sparse Changes in High-Dimensional Gaussian Graphical Model Structure	Beilun Wang, Arshdeep Sekhon, Yanjun Qi	We propose a novel method, DIFFEE, for estimating DIFFerential networks via an Elementary Estimator under a high-dimensional situation.
178	Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control	Sanket Kamthe, Marc Deisenroth	To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC).
179	Approximate Bayesian Computation with Kullback-Leibler Divergence as Data Discrepancy	Bai Jiang	To bypass this difficulty, we adopt a Kullback-Leibler divergence estimator to assess the data discrepancy.
180	Practical Bayesian optimization in the presence of outliers	Ruben Martinez-Cantin, Kevin Tee, Michael McCourt	In this paper, we present an empirical evaluation of Bayesian optimization methods in the presence of outliers.
181	Competing with Automata-based Expert Sequences	Mehryar Mohri, Scott Yang	We consider a general framework of online learning with expert advice where regret is defined with respect to sequences of experts accepted by a weighted automaton.
182	Reducing Crowdsourcing to Graphon Estimation, Statistically	Devavrat Shah, Christina Lee	In this paper, we utilize a statistical reduction from crowdsourcing to graphon estimation to advance the state of the art for both of these challenges.
183	Robust Locally-Linear Controllable Embedding	Ershad Banijamali, Rui Shu, Mohammad Ghavamzadeh, Hung Bui, Ali Ghodsi	In this paper, we present a new model for learning robust locally-linear controllable embedding (RCE).
184	Combinatorial Semi-Bandits with Knapsacks	Karthik Abinav Sankararaman, Aleksandrs Slivkins	We define a common generalization, support it with several motivating examples, and design an algorithm for it.
185	Structured Optimal Transport	David Alvarez-Melis, Tommi Jaakkola, Stefanie Jegelka	In this work, we develop a nonlinear generalization of (discrete) optimal transport that is able to reflect much additional structure.
186	Graphical Models for Non-Negative Data Using Generalized Score Matching	Shiqing Yu, Mathias Drton, Ali Shojaie	In this paper, we give a generalized form of score matching for non-negative data that improves estimation efficiency.
187	Asynchronous Doubly Stochastic Group Regularized Learning	Bin Gu, Zhouyuan Huo, Heng Huang	To address this challenging problem, in this paper, we propose a novel asynchronous doubly stochastic proximal gradient algorithm with variance reduction (AsyDSPG+).
188	Convergence of Value Aggregation for Imitation Learning	Ching-An Cheng, Byron Boots	In this paper, we debunk the common belief that value aggregation always produces a convergent policy sequence with improving performance.
189	Inference in Sparse Graphs with Pairwise Measurements and Side Information	Dylan Foster, Karthik Sridharan, Daniel Reichman	We present new algorithms and a sharp finite-sample analysis for this problem on trees and sparse graphs with poor expansion properties such as hypergrids and ring lattices.
190	Parallel and Distributed MCMC via Shepherding Distributions	Arkabandhu Chowdhury, Christopher Jermaine	In this paper, we present a general algorithmic framework for developing easily parallelizable/distributable Markov Chain Monte Carlo (MCMC) algorithms.
191	The Power Mean Laplacian for Multilayer Graph Clustering	Pedro Mercado, Antoine Gautier, Francesco Tudisco, Matthias Hein	We introduce in this paper a one-parameter family of matrix power means for merging the Laplacians from different layers and analyze it in expectation in the stochastic block model.
192	Adaptive Sampling for Coarse Ranking	Sumeet Katariya, Lalit Jain, Nandana Sengupta, James Evans, Robert Nowak	We propose a computationally efficient PAC algorithm LUCBRank for coarse ranking, and derive an upper bound on its sample complexity.
193	Comparison Based Learning from Weak Oracles	Ehsan Kazemi, Lin Chen, Sanjoy Dasgupta, Amin Karbasi	In this paper, we introduce a new weak oracle model, where a non-malicious user responds to a pairwise comparison query only when she is quite sure about the answer.
194	The Binary Space Partitioning-Tree Process	Xuhui Fan, Bin Li, Scott Sisson	In this work, we propose a self-consistent Binary Space Partitioning (BSP)-Tree process to generalize the Mondrian process.
195	On denoising modulo 1 samples of a function	Mihai Cucuringu, Hemant Tyagi	Given the samples $(x_i,y_i)_{i=1}^{n}$ our goal is to recover smooth, robust estimates of the clean samples $f(x_i) \bmod 1$.
196	Scalable Hash-Based Estimation of Divergence Measures	Morteza Noshad, Alfred Hero	We propose a scalable divergence estimation method based on hashing.
197	Conditional Gradient Method for Stochastic Submodular Maximization: Closing the Gap	Aryan Mokhtari, Hamed Hassani, Amin Karbasi	In this paper, we study the problem of constrained and stochastic continuous submodular maximization.
198	Online Continuous Submodular Maximization	Lin Chen, Hamed Hassani, Amin Karbasi	In this paper, we consider an online optimization process, where the objective functions are not convex (nor concave) but instead belong to a broad class of continuous submodular functions.
199	Efficient Bayesian Methods for Counting Processes in Partially Observable Environments	Ferdian Jovan, Jeremy Wyatt, Nick Hawes	We present two tractable approximations, which we combine in a switching filter.
200	Matrix-normal models for fMRI analysis	Michael Shvartsman, Narayanan Sundaram, Mikio Aoi, Adam Charles, Theodore Willke, Jonathan Cohen	Our primary theoretical contribution shows how some of these methods can be written as instantiations of the same model, allowing us to generalize them to flexibly modeling structured noise covariances.
201	The emergence of spectral universality in deep networks	Jeffrey Pennington, Samuel Schoenholz, Surya Ganguli	To this end, we leverage powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network’s Jacobian spectrum depends on various hyperparameters including the nonlinearity, the weight and bias distributions, and the depth.
202	Spectral Algorithms for Computing Fair Support Vector Machines	Mahbod Olfat, Anil Aswani	This paper develops computationally tractable algorithms for designing accurate but fair support vector machines (SVM’s).
203	Bayesian Multi-label Learning with Sparse Features and Labels, and Label Co-occurrences	He Zhao, Piyush Rai, Lan Du, Wray Buntine	We present a probabilistic, fully Bayesian framework for multi-label learning.
204	Nonparametric Bayesian sparse graph linear dynamical systems	Rahi Kalantari, Joydeep Ghosh, Mingyuan Zhou	Nonparametric Bayesian sparse graph linear dynamical systems
205	Proximity Variational Inference	Jaan Altosaar, Rajesh Ranganath, David Blei	In this paper, we develop proximity variational inference (PVI).
206	Near-Optimal Machine Teaching via Explanatory Teaching Sets	Yuxin Chen, Oisin Mac Aodha, Shihan Su, Pietro Perona, Yisong Yue	In this paper, we propose NOTES, a principled framework for constructing interpretable teaching sets, utilizing explanations to accelerate the teaching process.
207	Learning Hidden Quantum Markov Models	Siddarth Srinivasan, Geoff Gordon, Byron Boots	We extend previous work on HQMMs with three contributions: (1) we show how classical hidden Markov models (HMMs) can be simulated on a quantum circuit, (2) we reformulate HQMMs by relaxing the constraints for modeling HMMs on quantum circuits, and (3) we present a learning algorithm to estimate the parameters of an HQMM from data.
208	Labeled Graph Clustering via Projected Gradient Descent	Shiau Hong Lim, Gregory Calvez	Inspired by recent advances in non-convex approaches to low-rank recovery problems, we propose an algorithm based on projected gradient descent that enjoys similar provable guarantees as the convex counterpart, but can be orders of magnitude faster.
209	Gradient Diversity: a Key Ingredient for Scalable Distributed Learning	Dong Yin, Ashwin Pananjady, Max Lam, Dimitris Papailiopoulos, Kannan Ramchandran, Peter Bartlett	In this work, we present an analysis hinting that high similarity between concurrently processed gradients may be a cause of this performance degradation.
210	HONES: A Fast and Tuning-free Homotopy Method For Online Newton Step	Yuting Ye, Lihua Lei, Cheng Ju	In this article, we develop and analyze a homotopy continuation method, referred to as HONES, for solving the sequential generalized projections in Online Newton Step (Hazan et al., 2006b), as well as the generalized problem known as sequential standard quadratic programming.
211	Probability-Revealing Samples	Krzysztof Onak, Xiaorui Sun	We introduce a model in which every sample comes with the information about the probability of selecting it.
212	Derivative Free Optimization Via Repeated Classification	Tatsunori Hashimoto, Steve Yadlowsky, John Duchi	We develop a procedure for minimizing a function using $n$ batched function value measurements at each of $T$ rounds by using classifiers to identify a function’s sublevel set.
213	Online Ensemble Multi-kernel Learning Adaptive to Non-stationary and Adversarial Environments	Yanning Shen, Tianyi Chen, Georgios Giannakis	Leveraging the random feature approximation and its recent orthogonality-promoting variant, the present contribution develops an online multi-kernel learning scheme to infer the intended nonlinear function ‘on the fly.’
214	A Unified Dynamic Approach to Sparse Model Selection	Chendi Huang, Yuan Yao	In this paper, we introduce a simple iterative regularization path, which follows the dynamics of a sparse Mirror Descent algorithm or a generalization of Linearized Bregman Iterations with nonlinear loss.
215	Bootstrapping EM via Power EM and Convergence in the Naive Bayes Model	Costis Daskalakis, Christos Tzamos, Manolis Zampetakis	We study the convergence properties of the Expectation-Maximization algorithm in the Naive Bayes model.
216	Dimensionality Reduced $\ell^{0}$-Sparse Subspace Clustering	Yingzhen Yang	In this paper, we present Dimensionality Reduced $\ell^{0}$-Sparse Subspace Clustering (DR-$\ell^{0}$-SSC).