# Paper Digest: AISTATS 2019 Highlights

Readers can also read this highlight article on our console, which lets users filter papers by keyword and find related papers.

The International Conference on Artificial Intelligence and Statistics (AISTATS) is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to **sign up for our free daily paper digest service** to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to get updates on new conference digests.

Paper Digest Team

team@paperdigest.org

#### TABLE 1: AISTATS 2019 Papers

No. | Title | Authors | Highlight |
---|---|---|---|

1 | Proximal Splitting Meets Variance Reduction | Fabian Pedregosa, Kilian Fatras, Mattia Casotto | In this work we propose and analyze VRTOS, a variance-reduced method to solve problems with an arbitrary number of non-smooth terms. |

2 | Optimal Noise-Adding Mechanism in Additive Differential Privacy | Quan Geng, Wei Ding, Ruiqi Guo, Sanjiv Kumar | We derive the optimal $(0, \delta)$-differentially private query-output independent noise-adding mechanism for single real-valued query function under a general cost-minimization framework. |

3 | Tossing Coins Under Monotonicity | Matey Neykov | As an application of this framework we propose a two step procedure for the binary monotone single index model, which consists of running LASSO and consequently running an isotonic regression. |

4 | Gaussian Regression with Convex Constraints | Matey Neykov | The focus of this paper is the linear model with Gaussian design under convex constraints. |

5 | Risk-Averse Stochastic Convex Bandit | Adrian Rivera Cardoso, Huan Xu | Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. |

6 | Error bounds for sparse classifiers in high-dimensions | Antoine Dedieu | Thus, we propose a tractable proximal algorithm to compute it and assess its empirical performance. |

7 | Boosting Transfer Learning with Survival Data from Heterogeneous Domains | Alexis Bellot, Mihaela Schaar | In this work we propose a survival prediction model that is able to improve predictions on a small data domain of interest – such as a local hospital – by leveraging related data from other domains – such as data from other hospitals. |

8 | Resampled Priors for Variational Autoencoders | Matthias Bauer, Andriy Mnih | We propose Learned Accept/Reject Sampling (LARS), a method for constructing richer priors using rejection sampling with a learned acceptance function. |

9 | Scalable Bayesian Learning for State Space Models using Variational Inference with SMC Samplers | Marcel Hirt, Petros Dellaportas | We present a scalable approach to performing approximate fully Bayesian inference in generic state space models. |

10 | Scalable Thompson Sampling via Optimal Transport | Ruiyi Zhang, Zheng Wen, Changyou Chen, Chen Fang, Tong Yu, Lawrence Carin | In this paper, we use distribution optimization techniques to approximate the posterior distribution, solved via Wasserstein gradient flows. |

11 | Inferring Multidimensional Rates of Aging from Cross-Sectional Data | Emma Pierson, Pang Wei Koh, Tatsunori Hashimoto, Daphne Koller, Jure Leskovec, Nick Eriksson, Percy Liang | Motivated by the study of human aging, we present an interpretable latent-variable model that learns temporal dynamics from cross-sectional data. |

12 | Interaction Detection with Bayesian Decision Tree Ensembles | Junliang Du, Antonio R. Linero | We address this problem by introducing Dirichlet process forests (DP-Forests), which leverage the presence of low-order interactions by clustering the trees so that trees within the same cluster focus on detecting a specific interaction. |

13 | On the Interaction Effects Between Prediction and Clustering | Matt Barnes, Artur Dubrawski | In particular, we focus on the problem of estimating the out-of-cluster (OOC) prediction loss given an approximate clustering with probabilistic error rate $p_0$. |

14 | Towards a Theoretical Understanding of Hashing-Based Neural Nets | Yibo Lin, Zhao Song, Lin F. Yang | In this paper, we provide provable guarantees on some hashing-based parameter reduction methods in neural nets. |

15 | Faster First-Order Methods for Stochastic Non-Convex Optimization on Riemannian Manifolds | Pan Zhou, Xiao-Tong Yuan, Jiashi Feng | In this paper, we introduce the Riemannian SPIDER (R-SPIDER) method as a novel nonlinear-metric extension of SPIDER for efficient non-convex optimization on Riemannian manifolds. |

16 | LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models | Yuan Zhou, Bradley J. Gram-Hansen, Tobias Kohn, Tom Rainforth, Hongseok Yang, Frank Wood | We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables. |

17 | Identifiability of Generalized Hypergeometric Distribution (GHD) Directed Acyclic Graphical Models | Gunwoong Park, Hyewon Park | We introduce a new class of identifiable DAG models where the conditional distribution of each node given its parents belongs to a family of generalized hypergeometric distributions (GHD). |

18 | Unbiased Implicit Variational Inference | Michalis K. Titsias, Francisco Ruiz | We develop unbiased implicit variational inference (UIVI), a method that expands the applicability of variational inference by defining an expressive variational family. |

19 | Efficient Linear Bandits through Matrix Sketching | Ilja Kuzborskij, Leonardo Cella, Nicolò Cesa-Bianchi | We prove that two popular linear contextual bandit algorithms, OFUL and Thompson Sampling, can be made efficient using Frequent Directions, a deterministic online sketching technique. |

20 | Orthogonal Estimation of Wasserstein Distances | Mark Rowland, Jiri Hron, Yunhao Tang, Krzysztof Choromanski, Tamas Sarlos, Adrian Weller | In this paper, we propose a new variant of sliced Wasserstein distance, study the use of orthogonal coupling in Monte Carlo estimation of Wasserstein distances and draw connections with stratified sampling, and evaluate our approaches experimentally in a range of large-scale experiments in generative modelling and reinforcement learning. |

21 | Linear Convergence of the Primal-Dual Gradient Method for Convex-Concave Saddle Point Problems without Strong Convexity | Simon S. Du, Wei Hu | We consider the convex-concave saddle point problem $\min_{x}\max_{y} f(x)+y^\top A x-g(y)$ where $f$ is smooth and convex and $g$ is smooth and strongly convex. |

22 | Greedy and IHT Algorithms for Non-convex Optimization with Monotone Costs of Non-zeros | Shinsaku Sakaue | In this paper, we consider the case where non-zero patterns are specified by monotone set functions. |

23 | Block Stability for MAP Inference | Hunter Lang, David Sontag, Aravindan Vijayaraghavan | We introduce a significantly more relaxed condition that only requires portions of an input instance to be stable. |

24 | A Stein–Papangelou Goodness-of-Fit Test for Point Processes | Jiasen Yang, Vinayak Rao, Jennifer Neville | In this work, we propose a general Stein operator for point processes based on the Papangelou conditional intensity function. |

25 | KAMA-NNs: Low-dimensional Rotation Based Neural Networks | Krzysztof Choromanski, Aldo Pacchiano, Jeffrey Pennington, Yunhao Tang | In the adaptive setting we propose efficient algorithms for learning products of low-dimensional rotations and show how our architectures can be used to improve space and time complexity of state of the art reinforcement learning (RL) algorithms (e.g. PPO, TRPO). |

26 | Statistical Windows in Testing for the Initial Distribution of a Reversible Markov Chain | Quentin Berthet, Varun Kanade | We study the problem of hypothesis testing between two discrete distributions, where we only have access to samples after the action of a known reversible Markov chain, playing the role of noise. |

27 | Sketching for Latent Dirichlet-Categorical Models | Joseph Tassarotti, Jean-Baptiste Tristan, Michael Wick | We examine a related problem in which the parameters of a Bayesian model are very large and expensive to store in memory, and propose more compact representations of parameter values that can be used during inference. |

28 | Adaptive Activity Monitoring with Uncertainty Quantification in Switching Gaussian Process Models | Randy Ardywibowo, Guang Zhao, Zhangyang Wang, Bobak Mortazavi, Shuai Huang, Xiaoning Qian | To efficiently compute the Gaussian process model likelihood and quantify the context prediction uncertainty, we propose a block circulant embedding technique and use Fast Fourier Transforms (FFT) for inference. |

29 | Near Optimal Algorithms for Hard Submodular Programs with Discounted Cooperative Costs | Rishabh Iyer, Jeffrey Bilmes | In this paper, we investigate a class of submodular problems which in general are very hard. |

30 | Fast Stochastic Algorithms for Low-rank and Nonsmooth Matrix Problems | Dan Garber, Atara Kaplan | In this paper we develop efficient algorithms for *stochastic* optimization of a strongly-convex objective which includes both a nonsmooth term and a low-rank promoting term. |

31 | Logarithmic Regret for Online Gradient Descent Beyond Strong Convexity | Dan Garber | In this work, we use this classical result for the first time to obtain faster rates for *online convex optimization* over polyhedral sets with curved convex, though not strongly convex, loss functions. |

32 | Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches | Filip Hanzely, Peter Richtarik | In this paper we design and analyze an accelerated coordinate descent (`ACD`) method which in each iteration updates a random subset of coordinates according to an arbitrary but fixed probability law, which is a parameter of the method. |

33 | Globally-convergent Iteratively Reweighted Least Squares for Robust Regression Problems | Bhaskar Mukhoty, Govind Gopakumar, Prateek Jain, Purushottam Kar | We provide the first global model recovery results for the IRLS (iteratively reweighted least squares) heuristic for robust regression problems. |

34 | Modularity-based Sparse Soft Graph Clustering | Alexandre Hollocou, Thomas Bonald, Marc Lelarge | We introduce an efficient optimization algorithm to solve this relaxation, that is both memory efficient and local. |

35 | Pathwise Derivatives for Multivariate Distributions | Martin Jankowiak, Theofanis Karaletsos | We exploit the link between the transport equation and derivatives of expectations to construct efficient pathwise gradient estimators for multivariate distributions. |

36 | Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning | Bo Liu, Xiao-Tong Yuan, Lezi Wang, Qingshan Liu, Junzhou Huang, Dimitris N. Metaxas | In this paper, we present a sample distributed greedy pursuit method for non-convex sparse learning under cardinality constraint. |

37 | Vine copula structure learning via Monte Carlo tree search | Bo Chang, Shenyi Pan, Harry Joe | To tackle this problem, we propose a novel approach to learning vine structures using MCTS. |

38 | Blind Demixing via Wirtinger Flow with Random Initialization | Jialin Dong, Yuanming Shi | To find a simple initialization approach that works as well as spectral initialization, we propose to solve the blind demixing problem via Wirtinger flow with random initialization, which yields a natural implementation. |

39 | Performance Metric Elicitation from Pairwise Classifier Comparisons | Gaurush Hiranandani, Shant Boodaghians, Ruta Mehta, Oluwasanmi Koyejo | By exploiting key geometric properties of the space of confusion matrices, we obtain provably query efficient algorithms for eliciting linear and linear-fractional performance metrics. |

40 | Analysis of Network Lasso for Semi-Supervised Regression | Alexander Jung, Natalia Vesselinova | We apply network Lasso to semi-supervised regression problems involving network-structured data. |

41 | Learning Mixtures of Smooth Product Distributions: Identifiability and Algorithm | Nikos Kargas, Nicholas D. Sidiropoulos | In this work, we focus on multivariate mixtures of non-parametric product distributions and propose a two-stage approach which recovers the component distributions of the mixture under a smoothness condition. |

42 | Robust Matrix Completion from Quantized Observations | Jie Shen, Pranjal Awasthi, Ping Li | In this work, we consider the noisy statistical model where each observed entry can be flipped with some probability after quantization. |

43 | Foundations of Sequence-to-Sequence Modeling for Time Series | Zelda Mariet, Vitaly Kuznetsov | We provide the first theoretical analysis of this time series forecasting framework. |

44 | Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit | Yang Cao, Zheng Wen, Branislav Kveton, Yao Xie | We consider a scenario where the reward distributions may change in a piecewise-stationary fashion at unknown time steps. |

45 | An Optimal Algorithm for Stochastic Three-Composite Optimization | Renbo Zhao, William B. Haskell, Vincent Y. F. Tan | We develop an optimal primal-dual first-order algorithm for a class of stochastic three-composite convex minimization problems. |

46 | A Thompson Sampling Algorithm for Cascading Bandits | Wang Chi Cheung, Vincent Tan, Zixin Zhong | We design and analyze TS-Cascade, a Thompson sampling algorithm for the cascading bandit problem. |

47 | Lifelong Optimization with Low Regret | Yi-Shan Wu, Po-An Wang, Chi-Jen Lu | In this work, we study a problem arising from two lines of works: online optimization and lifelong learning. |

48 | Sparse Multivariate Bernoulli Processes in High Dimensions | Parthe Pandit, Mojtaba Sahraee-Ardakan, Arash Amini, Sundeep Rangan, Alyson K. Fletcher | We propose and analyze an $\ell_1$-regularized maximum likelihood (ML) estimator under the assumption that the parameter tensor is approximately sparse. |

49 | An Optimal Algorithm for Stochastic and Adversarial Bandits | Julian Zimmert, Yevgeny Seldin | We derive an algorithm that achieves the optimal (up to constants) pseudo-regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon. |

50 | Efficient Bayesian Experimental Design for Implicit Models | Steven Kleinegesse, Michael U. Gutmann | We devise a novel experimental design framework for implicit models that improves upon previous work in two ways. |

51 | Local Saddle Point Optimization: A Curvature Exploitation Approach | Leonard Adolphs, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann | We propose a novel optimization approach that exploits curvature information in order to escape from these undesired stationary points. |

52 | Testing Conditional Independence on Discrete Data using Stochastic Complexity | Alexander Marx, Jilles Vreeken | We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. |

53 | Distributionally Robust Submodular Maximization | Matthew Staib, Bryan Wilder, Stefanie Jegelka | In this paper, we achieve better performance on the actual underlying function f by directly optimizing a combination of bias and variance. |

54 | A Robust Zero-Sum Game Framework for Pool-based Active Learning | Dixian Zhu, Zhe Li, Xiaoyu Wang, Boqing Gong, Tianbao Yang | In this paper, we present a novel robust zero-sum game framework for pool-based active learning grounded on advanced statistical learning theory. |

55 | Support and Invertibility in Domain-Invariant Representations | Fredrik D. Johansson, David Sontag, Rajesh Ranganath | In this work, we give generalization bounds for unsupervised domain adaptation that hold for any representation function by acknowledging the cost of non-invertibility. |

56 | Efficient Inference in Multi-task Cox Process Models | Virginia Aglietti, Theodoros Damoulas, Edwin V. Bonilla | We derive closed-form expressions for the moments of the intensity functions and develop an efficient variational inference algorithm that is orders of magnitude faster than competing deterministic and stochastic approximations of multivariate LGCPs, coregionalization models, and multi-task permanental processes. |

57 | Optimization of Inf-Convolution Regularized Nonconvex Composite Problems | Emanuel Laude, Tao Wu, Daniel Cremers | In this work, we consider nonconvex composite problems that involve inf-convolution with a Legendre function, which gives rise to an anisotropic generalization of the proximal mapping and Moreau-envelope. |

58 | On Connecting Stochastic Gradient MCMC and Differential Privacy | Bai Li, Changyou Chen, Hao Liu, Lawrence Carin | We develop theory on the performance of the proposed differentially-private SG-MCMC method. |

59 | What made you do this? Understanding black-box decisions with sufficient input subsets | Brandon Carter, Jonas Mueller, Siddhartha Jain, David Gifford | Local explanation frameworks aim to rationalize particular decisions made by a black-box prediction model. We instead propose sufficient input subsets that identify minimal subsets of features whose observed values alone suffice for the same decision to be reached, even if all other input feature values are missing. |

60 | Computation Efficient Coded Linear Transform | Sinong Wang, Jiashang Liu, Ness Shroff, Pengyu Yang | In this paper, we propose a coded computation strategy, referred to as diagonal code, that achieves the optimum recovery threshold and the optimum computation load. |

61 | Mixing of Hamiltonian Monte Carlo on strongly log-concave distributions 2: Numerical integrators | Oren Mangoubi, Aaron Smith | We obtain quantitative bounds on the mixing properties of the Hamiltonian Monte Carlo (HMC) algorithm with target distribution in d-dimensional Euclidean space, showing that HMC mixes quickly whenever the target log-distribution is strongly concave and has Lipschitz gradients. |

62 | Temporal Quilting for Survival Analysis | Changhee Lee, William Zame, Ahmed Alaa, Mihaela Schaar | This paper develops a novel approach that combines the collective intelligence of different underlying survival models to produce a valid survival function that is well-calibrated and offers superior discriminative performance at different time horizons. |

63 | Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms | Mathieu Blondel, Andre Martins, Vlad Niculae | We analyze their properties in depth, showing that they unify many well-known loss functions and allow to create useful new ones easily. |

64 | On Target Shift in Adversarial Domain Adaptation | Yitong Li, Michael Murias, Samantha Major, Geraldine Dawson, David Carlson | In this work, we propose a method called Domain Adversarial nets for Target Shift (DATS) to address label shift while learning a domain invariant representation. |

65 | Optimal Testing in the Experiment-rich Regime | Sven Schmit, Virag Shah, Ramesh Johari | Motivated by the widespread adoption of large-scale A/B testing in industry, we propose a new experimentation framework for the setting where potential experiments are abundant (i.e., many hypotheses are available to test), and observations are costly; we refer to this as the experiment-rich regime. |

66 | Reversible Jump Probabilistic Programming | David A. Roberts, Marcus Gallagher, Thomas Taimre | In this paper we present a method for automatically deriving a Reversible Jump Markov chain Monte Carlo sampler from probabilistic programs that specify the target and proposal distributions. |

67 | Graph Embedding with Shifted Inner Product Similarity and Its Improved Approximation Capability | Akifumi Okuno, Geewook Kim, Hidetoshi Shimodaira | We propose shifted inner-product similarity (SIPS), which is a novel yet very simple extension of the ordinary inner-product similarity (IPS) for neural-network based graph embedding (GE). |

68 | High-dimensional Mixed Graphical Model with Ordinal Data: Parameter Estimation and Statistical Inference | Huijie Feng, Yang Ning | We propose a flexible model called Latent Mixed Gaussian Copula Model that simultaneously deals with such mixed data by assuming that the observed ordinal variables are generated by latent variables. |

69 | Robust Graph Embedding with Noisy Link Weights | Akifumi Okuno, Hidetoshi Shimodaira | We propose $\beta$-graph embedding for robustly learning feature vectors from data vectors and noisy link weights. |

70 | Exploring Fast and Communication-Efficient Algorithms in Large-Scale Distributed Networks | Yue Yu, Jiaxiang Wu, Junzhou Huang | In this work, we propose a new algorithm LPC-SVRG with quantized gradients and its acceleration ALPC-SVRG to effectively reduce the communication complexity while maintaining the same convergence as the unquantized algorithms. |

71 | Defending against Whitebox Adversarial Attacks via Randomized Discretization | Yuchen Zhang, Percy Liang | In this paper, we propose and analyze a simple and computationally efficient defense strategy: inject random Gaussian noise, discretize each pixel, and then feed the result into any pre-trained classifier. |

72 | Fisher Information and Natural Gradient Learning in Random Deep Networks | Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi | The present paper uses statistical neurodynamical method to reveal the properties of the Fisher information matrix in a net of random connections. |

73 | Robust descent using smoothed multiplicative noise | Matthew J. Holland | In this work, we propose a novel robust gradient descent procedure which makes use of a smoothed multiplicative noise applied directly to observations before constructing a sum of soft-truncated gradient coordinates. |

74 | Classification using margin pursuit | Matthew J. Holland | In this work, we study a new approach to optimizing the margin distribution realized by binary classifiers, in which the learner searches the hypothesis space in such a way that a pre-set margin level ends up being a distribution-robust estimator of the margin location. |

75 | Linear Queries Estimation with Local Differential Privacy | Raef Bassily | We study the problem of estimating a set of d linear queries with respect to some unknown distribution p over a domain $[J]$ based on a sensitive data set of n individuals under the constraint of local differential privacy. |

76 | Bayesian Learning of Neural Network Architectures | Georgi Dikov, Justin Bayer | In this paper we propose a Bayesian method for estimating architectural parameters of neural networks, namely layer size and network depth. |

77 | Nonlinear Acceleration of Primal-Dual Algorithms | Raghu Bollapragada, Damien Scieur, Alexandre d'Aspremont | We describe a convergence acceleration scheme for multi-step optimization algorithms. |

78 | Gaussian Process Latent Variable Alignment Learning | Ieva Kazlauskaite, Carl Henrik Ek, Neill Campbell | We present a model that can automatically learn alignments between high-dimensional data in an unsupervised manner. |

79 | A Bayesian model for sparse graphs with flexible degree distribution and overlapping community structure | Juho Lee, Lancelot James, Seungjin Choi, Francois Caron | We propose an extension of the model that can accommodate an overlapping community structure. |

80 | Pseudo-Bayesian Learning with Kernel Fourier Transform as Prior | Gaël Letarte, Emilie Morvant, Pascal Germain | We revisit Rahimi and Recht (2007)’s kernel random Fourier features (RFF) method through the lens of the PAC-Bayesian theory. |

81 | Forward Amortized Inference for Likelihood-Free Variational Marginalization | Luca Ambrogioni, Umut Güçlü, Julia Berezutskaya, Eva Borne, Yagmur Güçlütürk, Max Hinne, Eric Maris, Marcel Gerven | In this paper, we introduce a new form of amortized variational inference by using the forward KL divergence in a joint-contrastive variational loss. |

82 | SpikeCaKe: Semi-Analytic Nonparametric Bayesian Inference for Spike-Spike Neuronal Connectivity | Luca Ambrogioni, Patrick Ebel, Max Hinne, Umut Güçlü, Marcel Gerven, Eric Maris | In this paper we introduce a semi-analytic variational framework for approximating the posterior of a Gaussian processes coupled through non-linear emission models. |

83 | Scalable Gaussian Process Inference with Finite-data Mean and Variance Guarantees | Jonathan H. Huggins, Trevor Campbell, Mikolaj Kasprzak, Tamara Broderick | Our main contribution is a novel objective for approximate inference in the nonparametric setting: the preconditioned Fisher (pF) divergence. |

84 | Exponential convergence rates for Batch Normalization: The power of length-direction decoupling in non-convex optimization | Jonas Kohler, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann, Ming Zhou, Klaus Neymeyr | Our main contribution towards this goal is the identification of various problem instances in the realm of machine learning where Batch Normalization can provably accelerate optimization. |

85 | A new evaluation framework for topic modeling algorithms based on synthetic corpora | Hanyu Shi, Martin Gerlach, Isabel Diersen, Doug Downey, Luis Amaral | Here, we propose a new framework for the evaluation of topic modeling algorithms based on synthetic corpora containing an unambiguously defined ground truth topic structure. |

86 | On Kernel Derivative Approximation with Random Fourier Features | Zoltan Szabo, Bharath Sriperumbudur | Particularly, in this paper, we focus on the approximation quality of RFFs for kernel derivatives and prove that the existing finite-sample guarantees can be improved exponentially in terms of the domain where they hold, using recent tools from unbounded empirical process theory. |

87 | Sequential Neural Likelihood: Fast Likelihood-free Inference with Autoregressive Flows | George Papamakarios, David Sterratt, Iain Murray | We present Sequential Neural Likelihood (SNL), a new method for Bayesian inference in simulator models, where the likelihood is intractable but simulating data from the model is possible. |

88 | Optimal Transport for Multi-source Domain Adaptation under Target Shift | Ievgen Redko, Nicolas Courty, Rémi Flamary, Devis Tuia | In this paper, we tackle the problem of reducing discrepancies between multiple domains, i.e. multi-source domain adaptation, and consider it under the target shift assumption: in all domains we aim to solve a classification problem with the same output classes, but with different label proportions. |

89 | Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning | Aapo Hyvarinen, Hiroaki Sasaki, Richard Turner | Here, we propose a general framework for nonlinear ICA, which, as a special case, can make use of temporal structure. |

90 | Deep Neural Networks Learn Non-Smooth Functions Effectively | Masaaki Imaizumi, Kenji Fukumizu | We derive the generalization error of estimators by DNNs with a ReLU activation, and show that convergence rates of the generalization by DNNs are almost optimal to estimate the non-smooth functions, while some of the popular models do not attain the optimal rate. |

91 | Attenuating Bias in Word vectors | Sunipa Dev, Jeff Phillips | In this work, we explore new simple ways to detect the most stereotypically gendered words in an embedding and remove the bias from them. |

92 | Fisher-Rao Metric, Geometry, and Complexity of Neural Networks | Tengyuan Liang, Tomaso Poggio, Alexander Rakhlin, James Stokes | We introduce a new notion of capacity — the Fisher-Rao norm — that possesses desirable invariance properties and is motivated by Information Geometry. |

93 | Accelerated Decentralized Optimization with Local Updates for Smooth and Strongly Convex Objectives | Hadrien Hendrikx, Francis Bach, Laurent Massoulie | In this paper, we study the problem of minimizing a sum of smooth and strongly convex functions split over the nodes of a network in a decentralized fashion. |

94 | Interaction Matters: A Note on Non-asymptotic Local Convergence of Generative Adversarial Networks | Tengyuan Liang, James Stokes | Motivated by the pursuit of a systematic computational and algorithmic understanding of Generative Adversarial Networks (GANs), we present a simple yet unified non-asymptotic local convergence theory for smooth two-player games, which subsumes several discrete-time gradient-based saddle point dynamics. |

95 | On Constrained Nonconvex Stochastic Optimization: A Case Study for Generalized Eigenvalue Decomposition | Zhehui Chen, Xingguo Li, Lin Yang, Jarvis Haupt, Tuo Zhao | Motivated by these neat geometric structures, we propose a simple, efficient, and stochastic primal-dual algorithm solving the online GEV problem. |

96 | Generalized Boltzmann Machine with Deep Neural Structure | Yingru Liu, Dongliang Xie, Xin Wang | In this paper, to bridge the gap between RBM and artificial neural network, we propose an energy-based probabilistic model that is more flexible on modeling continuous data. |

97 | Extreme Stochastic Variational Inference: Distributed Inference for Large Scale Mixture Models | Jiong Zhang, Parameswaran Raman, Shihao Ji, Hsiang-Fu Yu, S.V.N. Vishwanathan, Inderjit Dhillon | In this paper, we present extreme stochastic variational inference (ESVI), a distributed, asynchronous and lock-free algorithm to perform variational inference for mixture models on massive real world datasets. |

98 | Correcting the bias in least squares regression with volume-rescaled sampling | Michal Derezinski, Manfred K. Warmuth, Daniel Hsu | We show this when the additional sample consists of d points drawn jointly according to the input distribution rescaled by the squared volume spanned by the points. |

99 | Conservative Exploration using Interleaving | Sumeet Katariya, Branislav Kveton, Zheng Wen, Vamsi K. Potluru | We design efficient learning algorithms for this problem, bound their n-step regret, and evaluate them on both synthetic and real-world problems. |

100 | Conditionally Independent Multiresolution Gaussian Processes | Jalil Taghia, Thomas Schön | Our contribution is a new construction which instead assumes conditional independence among GPs across resolutions. |

101 | Active Exploration in Markov Decision Processes | Jean Tarbouriech, Alessandro Lazaric | We introduce the active exploration problem in Markov decision processes (MDPs). We introduce a novel learning algorithm to solve this problem, showing that active exploration in MDPs may be significantly more difficult than in MAB. |

102 | On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes | Xiaoyu Li, Francesco Orabona | In this paper, we start closing this gap: we theoretically analyze in the convex and non-convex settings a generalized version of the AdaGrad stepsizes. |

103 | Bandit Online Learning with Unknown Delays | Bingcong Li, Tianyi Chen, Georgios B. Giannakis | This paper deals with bandit online learning, where feedback of unknown delay can emerge in non-stochastic multi-armed bandit (MAB) and bandit convex optimization (BCO) settings. |

104 | Learning Invariant Representations with Kernel Warping | Yingyi Ma, Vignesh Ganapathiraman, Xinhua Zhang | In this work, we break this limitation by designing a new representation learning algorithm that incorporates invariances *beyond transformation*. |

105 | $\beta^3$-IRT: A New Item Response Model and its Applications | Yu Chen, Telmo Silva Filho, Ricardo B. Prudencio, Tom Diethe, Peter Flach | In this paper, we propose the $\beta^3$-IRT model, which models continuous responses and can generate a much enriched family of Item Characteristic Curves. |

106 | Can You Trust This Prediction? Auditing Pointwise Reliability After Learning | Peter Schulam, Suchi Saria | In this paper, we describe resampling uncertainty estimation (RUE), an algorithm to audit the pointwise reliability of predictions. |

107 | Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach | Ryo Karakida, Shotaro Akaho, Shun-ichi Amari | The present study reveals novel statistics of FIM that are universal among a wide class of DNNs. |

108 | Conditional Sparse $L_p$-norm Regression With Optimal Probability | John Hainline, Brendan Juba, Hai S. Le, David Woodruff | We consider the following conditional linear regression problem: the task is to identify both (i) a $k$-DNF condition $c$ and (ii) a linear rule $f$ such that the probability of $c$ is (approximately) at least some given bound $\mu$, and minimizing the $l_p$ loss of $f$ at predicting the target $z$ in the distribution conditioned on $c$. |

109 | On the Connection Between Learning Two-Layer Neural Networks and Tensor Decomposition | Marco Mondelli, Andrea Montanari | We consider a model with feature vectors $x$, $r$ hidden units with weights $w_i$ and output $y$, i.e., $y=\sum_{i=1}^r \sigma(w_i^{T} x)$, with activation functions given by low-degree polynomials. |

110 | Autoencoding any Data through Kernel Autoencoders | Pierre Laforgue, Stéphan Clémençon, Florence d'Alche-Buc | This paper investigates a novel algorithmic approach to data representation based on kernel methods. |

111 | Towards Understanding the Generalization Bias of Two Layer Convolutional Linear Classifiers with Gradient Descent | Yifan Wu, Barnabas Poczos, Aarti Singh | A major challenge in understanding the generalization of deep learning is to explain why (stochastic) gradient descent can exploit the network architecture to find solutions that have good generalization performance when using high capacity models. |

112 | Learning to Optimize under Non-Stationarity | Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu | We introduce algorithms that achieve state-of-the-art dynamic regret bounds for non-stationary linear stochastic bandit setting. |

113 | SPONGE: A generalized eigenproblem for clustering signed networks | Mihai Cucuringu, Peter Davies, Aldo Glielmo, Hemant Tyagi | We introduce a principled and theoretically sound spectral method for k-way clustering in signed graphs, where the affinity measure between nodes takes either positive or negative values. |

114 | Deep Neural Networks with Multi-Branch Architectures Are Intrinsically Less Non-Convex | Hongyang Zhang, Junru Shao, Ruslan Salakhutdinov | In this work, we provide strong guarantees of this quantity for two classes of network architectures. |

115 | Are we there yet? Manifold identification of gradient-related proximal methods | Yifan Sun, Halyun Jeong, Julie Nutini, Mark Schmidt | In this work we provide a unified view by giving a simple condition under which any proximal method using a constant step size can achieve finite-iteration manifold detection. |

116 | Hadamard Response: Estimating Distributions Privately, Efficiently, and with Little Communication | Jayadev Acharya, Ziteng Sun, Huanyu Zhang | We propose Hadamard Response (HR), a local privatization scheme that requires no shared randomness and is symmetric with respect to the users. |

117 | XBART: Accelerated Bayesian Additive Regression Trees | Jingyu He, Saar Yalov, P. Richard Hahn | We present a stochastic hill climbing algorithm that matches the remarkable predictive accuracy of previous BART implementations, but is many times faster and less memory intensive. |

118 | A Swiss Army Infinitesimal Jackknife | Ryan Giordano, William Stephenson, Runjing Liu, Michael Jordan, Tamara Broderick | In this work, we use a linear approximation to the dependence of the fitting procedure on the weights, producing results that can be faster than repeated re-fitting by an order of magnitude. |

119 | Online Multiclass Boosting with Bandit Feedback | Daniel T. Zhang, Young Hun Jung, Ambuj Tewari | We present online boosting algorithms for multiclass classification with bandit feedback, where the learner only receives feedback about the correctness of its prediction. |

120 | Auto-Encoding Total Correlation Explanation | Shuyang Gao, Rob Brekelmans, Greg Ver Steeg, Aram Galstyan | We propose an information-theoretic approach to characterizing disentanglement and dependence in representation learning using multivariate mutual information, also called total correlation. |

121 | Towards Efficient Data Valuation Based on the Shapley Value | Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gürel, Bo Li, Ce Zhang, Dawn Song, Costas J. Spanos | In this paper, we study the problem of *data valuation* by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory. |

122 | Bayesian optimisation under uncertain inputs | Rafael Oliveira, Lionel Ott, Fabio Ramos | In this context, we propose an upper confidence bound (UCB) algorithm for BO problems where both the outcome of a query and the true query location are uncertain. |

123 | Optimal Minimization of the Sum of Three Convex Functions with a Linear Operator | Seyoon Ko, Joong-Ho Won | We propose a class of optimal-rate primal-dual algorithms for minimization of the sum of three convex functions with a linear operator. |

124 | Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron | Sharan Vaswani, Francis Bach, Mark Schmidt | For such models, we show that the stochastic gradients of common loss functions satisfy a strong growth condition. |

125 | No-regret algorithms for online $k$-submodular maximization | Tasuku Soma | We present a polynomial time algorithm for online $k$-submodular maximization. |

126 | Lagrange Coded Computing: Optimal Design for Resiliency, Security, and Privacy | Qian Yu, Songze Li, Netanel Raviv, Seyed Mohammadreza Mousavi Kalan, Mahdi Soltanolkotabi, Salman A. Avestimehr | We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. |

127 | Subsampled Renyi Differential Privacy and Analytical Moments Accountant | Yu-Xiang Wang, Borja Balle, Shiva Prasad Kasiviswanathan | We study the problem of subsampling in differential privacy (DP), a question that is the centerpiece behind many successful differentially private machine learning algorithms. |

128 | Model Consistency for Learning with Mirror-Stratifiable Regularizers | Jalal Fadili, Guillaume Garrigos, Jérôme Malick, Gabriel Peyré | In this work, we provide the theoretical underpinning of this behavior using the notion of mirror-stratifiable regularizers. |

129 | From Cost-Sensitive to Tight F-measure Bounds | Kevin Bascol, Rémi Emonet, Elisa Fromont, Amaury Habrard, Guillaume Metzler, Marc Sebban | In this article, we derive theoretical guarantees that give tight bounds on the best F-measure that can be obtained from cost-sensitive learning. |

130 | Feature subset selection for the multinomial logit model via mixed-integer optimization | Shunsuke Kamiya, Ryuhei Miyashiro, Yuichi Takano | In this paper, we propose an exact algorithm to solve the problem for the MNL model. |

131 | Low-Precision Random Fourier Features for Memory-constrained Kernel Approximation | Jian Zhang, Avner May, Tri Dao, Christopher Re | Because storing a high-rank approximation is memory intensive, we propose using a low-precision quantization of random Fourier features (LP-RFFs) to build a high-rank approximation under a memory budget. |

132 | Restarting Frank-Wolfe | Thomas Kerdreux, Alexandre d'Aspremont, Sebastian Pokutta | Here, we present a new variant of Conditional Gradients that can dynamically adapt to the function’s geometric properties using restarts and thus smoothly interpolates between the sublinear and linear regimes. |

133 | Adaptive Ensemble Prediction for Deep Neural Networks based on Confidence Level | Hiroshi Inoue | We evaluated the adaptive ensembling by using various datasets and showed that it reduces the computation cost significantly while achieving accuracy similar to that of static ensembling using a pre-defined number of local predictions. |

134 | Infinite Task Learning in RKHSs | Romain Brault, Alex Lambert, Zoltan Szabo, Maxime Sangnier, Florence d'Alche-Buc | In this work, we introduce a novel approach called Infinite Task Learning whose goal is to learn a function whose output is a function over the hyperparameter space. |

135 | Detection of Planted Solutions for Flat Satisfiability Problems | Quentin Berthet, Jordan Ellenberg | We describe the properties of random instances of flat satisfiability, as well as the optimal rates of detection of the associated hypothesis testing problem. |

136 | Markov Properties of Discrete Determinantal Point Processes | Kayvan Sadeghi, Alessandro Rinaldo | In this note, we derive the Markov properties of discrete DPPs and show how they can be expressed using graphical models. |

137 | Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms | Alihan Huyuk, Cem Tekin | We analyze the regret of combinatorial Thompson sampling (CTS) for the combinatorial multi-armed bandit with probabilistically triggered arms under the semi-bandit feedback setting. |

138 | Distilling Policy Distillation | Wojciech M. Czarnecki, Razvan Pascanu, Simon Osindero, Siddhant Jayakumar, Grzegorz Swirszcz, Max Jaderberg | In this work, we rigorously explore the entire landscape of policy distillation, comparing the motivations and strengths of each variant through theoretical and empirical analysis. |

139 | Support Localization and the Fisher Metric for off-the-grid Sparse Regularization | Clarice Poon, Nicolas Keriven, Gabriel Peyré | In this article, we study the BLASSO (i.e. the off-the-grid version of l1 LASSO regularization) and show that the Fisher-Rao distance is the natural way to ensure and quantify support recovery, since it preserves the invariance of the problem under reparameterization. |

140 | Fast Gaussian process based gradient matching for parameter identification in systems of nonlinear ODEs | Philippe Wenk, Alkis Gotovos, Stefan Bauer, Nico S. Gorbach, Andreas Krause, Joachim M. Buhmann | We offer a novel interpretation which leads to a better understanding, improvements in state-of-the-art performance in terms of accuracy and robustness and a decrease in run time due to a more efficient setup for general nonlinear dynamical systems. |

141 | Semi-Generative Modelling: Covariate-Shift Adaptation with Cause and Effect Features | Julius Kügelgen, Alexander Mey, Marco Loog | Using ideas from causality, we argue that this requires learning with both causes, $X_C$, and effects, $X_E$, of a target variable, $Y$, and show how this setting leads to what we call a semi-generative model, $P(Y,X_E|X_C,\theta)$. |

142 | A Continuous-Time View of Early Stopping for Least Squares Regression | Alnur Ali, J. Zico Kolter, Ryan J. Tibshirani | We study the statistical properties of the iterates generated by gradient descent, applied to the fundamental problem of least squares regression. |

143 | Towards Clustering High-dimensional Gaussian Mixture Clouds in Linear Running Time | Dan Kushnir, Shirin Jalali, Iraj Saniee | In this paper, we focus on clustering $n$ samples from an arbitrary mixture of $c$-separated Gaussians in $\mathbb{R}^p$ in time that is linear in $p$ and $n$, and sample complexity that is independent of $p$. |

144 | Classifying Signals on Irregular Domains via Convolutional Cluster Pooling | Angelo Porrello, Davide Abati, Simone Calderara, Rita Cucchiara | We present a novel and hierarchical approach for supervised classification of signals spanning over a fixed graph, reflecting shared properties of the dataset. |

145 | Learning Rules-First Classifiers | Deborah Cohen, Amit Daniely, Amir Globerson, Gal Elidan | In this work, we focus on one such setting, where a label is perfectly predictable if the input contains certain features, or rules, and otherwise it is predictable by a linear classifier. |

146 | Wasserstein regularization for sparse multi-task regression | Hicham Janati, Marco Cuturi, Alexandre Gramfort | We focus in this paper on high-dimensional regression problems where each regressor can be associated to a location in a physical space, or more generally a generic geometric space. |

147 | Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors | Atsushi Nitanda, Taiji Suzuki | In this paper, we show an exponential convergence of the expected classification error in the final phase of the stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions. |

148 | Black Box Quantiles for Kernel Learning | Anthony Tompkins, Ransalu Senanayake, Philippe Morere, Fabio Ramos | In this paper, we present a novel technique to learn kernels that best fit the data. |

149 | Adversarial Variational Optimization of Non-Differentiable Simulators | Gilles Louppe, Joeri Hermans, Kyle Cranmer | We introduce Adversarial Variational Optimization (AVO), a likelihood-free inference algorithm for fitting a non-differentiable generative model incorporating ideas from generative adversarial networks, variational optimization and empirical Bayes. |

150 | Active Probabilistic Inference on Matrices for Pre-Conditioning in Stochastic Optimization | Filip Roos, Philipp Hennig | We propose an iterative algorithm inspired by classic iterative linear solvers that uses a probabilistic model to actively infer a pre-conditioner in situations where Hessian-projections can only be constructed with strong Gaussian noise. |

151 | Projection Free Online Learning over Smooth Sets | Kfir Levy, Andreas Krause | In this work we focus on the special case where the constraint set is smooth and we have an access to gradient and value oracles of the constraint function. |

152 | Confidence Scoring Using Whitebox Meta-models with Linear Classifier Probes | Tongfei Chen, Jiri Navratil, Vijay Iyengar, Karthikeyan Shanmugam | We propose a novel confidence scoring mechanism for deep neural networks based on a two-model paradigm involving a base model and a meta-model. |

153 | Learning Influence-Receptivity Network Structure with Guarantee | Ming Yu, Varun Gupta, Mladen Kolar | In this paper we consider the case where we observe a sequence of noisy adjacency matrices triggered by information/events with different topic distributions. |

154 | Iterative Bayesian Learning for Crowdsourced Regression | Jungseul Ok, Sewoong Oh, Yunhun Jang, Jinwoo Shin, Yung Yi | In particular, we introduce a Bayesian iterative scheme and show that it provably achieves the optimal mean squared error. |

155 | Nonconvex Matrix Factorization from Rank-One Measurements | Yuanxin Li, Cong Ma, Yuxin Chen, Yuejie Chi | We consider the problem of recovering low-rank matrices from random rank-one measurements, which spans numerous applications including phase retrieval, quantum state tomography, and learning shallow neural networks with quadratic activations, among others. |

156 | Fast and Robust Shortest Paths on Manifolds Learned from Data | Georgios Arvanitidis, Soren Hauberg, Philipp Hennig, Michael Schober | We propose a fast, simple and robust algorithm for computing shortest paths and distances on Riemannian manifolds learned from data. |

157 | Training a Spiking Neural Network with Equilibrium Propagation | Peter O'Connor, Efstratios Gavves, Max Welling | In this paper, we address problem (2) by proposing a way in which Equilibrium Propagation can be implemented with neurons which are constrained to just communicate binary values at each time step. |

158 | Learning One-hidden-layer ReLU Networks via Gradient Descent | Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu | We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian distribution and the outputs are generated from a noisy teacher network. |

159 | Gain estimation of linear dynamical systems using Thompson Sampling | Matias I. Müller, Cristian R. Rojas | We present the gain estimation problem for linear dynamical systems as a multi-armed bandit. |

160 | Universal Hypothesis Testing with Kernels: Asymptotically Optimal Tests for Goodness of Fit | Shengyu Zhu, Biao Chen, Pengfei Yang, Zhitang Chen | We characterize the asymptotic performance of nonparametric goodness of fit testing. |

161 | Calibrating Deep Convolutional Gaussian Processes | Gia-Lac Tran, Edwin V. Bonilla, John Cunningham, Pietro Michiardi, Maurizio Filippone | In this paper we show that, in fact, current combinations of CNNs and GPs are miscalibrated. |

162 | Stochastic algorithms with descent guarantees for ICA | Pierre Ablin, Alexandre Gramfort, Jean-François Cardoso, Francis Bach | We develop a new majorization-minimization framework adapted to this loss function. |

163 | Sample Complexity of Sinkhorn Divergences | Aude Genevay, Lénaïc Chizat, Francis Bach, Marco Cuturi, Gabriel Peyré | We focus in this paper on Sinkhorn divergences (SDs), a regularized variant of OT distances which can interpolate, depending on the regularization strength $\varepsilon$, between OT ($\varepsilon=0$) and MMD ($\varepsilon=\infty$). |

164 | Adaptive Gaussian Copula ABC | Yanzhi Chen, Michael U. Gutmann | This work presents a simple yet effective ABC algorithm based on the combination of two classical ABC approaches — regression ABC and sequential ABC. |

165 | Top Feasible Arm Identification | Julian Katz-Samuels, Clayton Scott | We propose a new variant of the top arm identification problem, *top feasible arm identification*, where there are $K$ arms associated with $D$-dimensional distributions and the goal is to find $m$ arms that maximize some known linear function of their means subject to the constraint that their means belong to a given set $P \subset \mathbb{R}^D$. |

166 | Direct Acceleration of SAGA using Sampled Negative Momentum | Kaiwen Zhou, Qinghua Ding, Fanhua Shang, James Cheng, Danli Li, Zhi-Quan Luo | In this paper, we propose a directly accelerated variant of SAGA using a novel Sampled Negative Momentum (SSNM), which achieves the best known oracle complexity for strongly convex problems (with known strong convexity parameter). |

167 | Does data interpolation contradict statistical optimality? | Mikhail Belkin, Alexander Rakhlin, Alexandre B. Tsybakov | We show that classical learning methods interpolating the training data can achieve optimal rates for the problems of nonparametric regression and prediction with square loss. |

168 | Inverting Supervised Representations with Autoregressive Neural Density Models | Charlie Nash, Nate Kushman, Christopher K.I. Williams | We present a method for feature interpretation that makes use of recent advances in autoregressive density estimation models to invert model representations. |

169 | Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning | Guillaume Rabusseau, Tianyu Li, Doina Precup | In this paper, we unravel a fundamental connection between weighted finite automata (WFAs) and second-order recurrent neural networks (2-RNNs): in the case of sequences of discrete symbols, WFAs and 2-RNNs with linear activation functions are expressively equivalent. |

170 | A Family of Exact Goodness-of-Fit Tests for High-Dimensional Discrete Distributions | Feras A. Saad, Cameron E. Freer, Nathanael L. Ackerman, Vikash K. Mansinghka | This paper presents a rank-based family of goodness-of-fit tests that is specialized to discrete distributions on high-dimensional domains. |

171 | Differentially Private Online Submodular Minimization | Adrian Rivera Cardoso, Rachel Cummings | In this paper we develop the first algorithms for online submodular minimization that preserve differential privacy under full information feedback and bandit feedback. |

172 | Semi-supervised clustering for de-duplication | Shrinu Kushagra, Shai Ben-David, Ihab Ilyas | In this work, we view de-duplication as a clustering problem where the goal is to put records corresponding to the same physical entity in the same cluster and records corresponding to different physical entities into different clusters. |

173 | Finding the bandit in a graph: Sequential search-and-stop | Pierre Perrault, Vianney Perchet, Michal Valko | In this paper, we address a learning setting where we allow the agent to stop before having found the object and restart searching on a new independent instance of the same problem. |

174 | Statistical Learning under Nonstationary Mixing Processes | Steve Hanneke, Liu Yang | Under these conditions, we propose a learning method, and establish that for bounded VC subgraph classes, the cumulative excess risk grows sublinearly in the number of predictions, at a quantified rate. |

175 | On Structure Priors for Learning Bayesian Networks | Ralf Eggeling, Jussi Viinikka, Aleksis Vuoksenmaa, Mikko Koivisto | We investigate empirically several structure priors in combination with different scores, using benchmark data sets and data sets generated from benchmark networks. |

176 | Partial Optimality of Dual Decomposition for MAP Inference in Pairwise MRFs | Alexander Bauer, Shinichi Nakajima, Nico Goernitz, Klaus-Robert Müller | In this paper, we explain how the fully integral assignments obtained via DD partially agree with the optimal fractional assignments via LP relaxation when the latter is not tight. |

177 | Sparse Feature Selection in Kernel Discriminant Analysis via Optimal Scoring | Alexander F. Lapanowski, Irina Gaynanova | We consider the two-group classification problem and propose a kernel classifier based on the optimal scoring framework. |

178 | Learning Natural Programs from a Few Examples in Real-Time | Nagarajan Natarajan, Danny Simmons, Naren Datha, Prateek Jain, Sumit Gulwani | In this work, we develop a novel, real-time, ML-based program ranking algorithm that enables synthesis of natural, user-intended, personalized programs. |

179 | Truncated Back-propagation for Bilevel Optimization | Amirreza Shaban, Ching-An Cheng, Nathan Hatch, Byron Boots | In this paper, we analyze the properties of this family of approximate gradients and establish sufficient conditions for convergence. |

180 | Empirical Risk Minimization and Stochastic Gradient Descent for Relational Data | Victor Veitch, Morgane Austern, Wenda Zhou, David M. Blei, Peter Orbanz | We solve this problem using recent ideas from graph sampling theory to (i) define an empirical risk for relational data and (ii) obtain stochastic gradients for this empirical risk that are automatically unbiased. |

181 | Variable selection for Gaussian processes via sensitivity analysis of the posterior predictive distribution | Topi Paananen, Juho Piironen, Michael Riis Andersen, Aki Vehtari | To improve on this, we propose two novel variable selection methods for Gaussian process models that utilize the predictions of a full model in the vicinity of the training points and thereby rank the variables based on their predictive relevance. |

182 | Lifted Weight Learning of Markov Logic Networks Revisited | Ondrej Kuzelka, Vyacheslav Kungurtsev | We study lifted weight learning of Markov logic networks. |

183 | Causal Discovery in the Presence of Missing Data | Ruibo Tu, Cheng Zhang, Paul Ackermann, Karthika Mohan, Hedvig Kjellström, Kun Zhang | In this paper, we aim at developing a causal discovery method to recover the underlying causal structure from observed data that are missing under different mechanisms, including missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). |

184 | Learning Tree Structures from Noisy Data | Konstantinos E. Nikolakakis, Dionysios S. Kalogerias, Anand D. Sarwate | We study the impact of measurement noise on the task of learning the underlying tree structure via the well-known *Chow-Liu algorithm* and provide formal sample complexity guarantees for exact recovery. |

185 | Active multiple matrix completion with adaptive confidence sets | Andrea Locatelli, Alexandra Carpentier, Michal Valko | We address an active setting for matrix completion, where the learner can choose from which matrix it receives a sample (drawn uniformly at random). |

186 | Confidence-based Graph Convolutional Networks for Semi-Supervised Learning | Shikhar Vashishth, Prateek Yadav, Manik Bhandari, Partha Talukdar | We fill this important gap in this paper and propose ConfGCN, which jointly estimates label scores along with their confidences in a GCN-based setting. |

187 | Negative Momentum for Improved Game Dynamics | Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, Rémi Le Priol, Gabriel Huang, Simon Lacoste-Julien, Ioannis Mitliagkas | In this paper, we analyze gradient-based methods with momentum on simple games. |

188 | Deep learning with differential Gaussian process flows | Pashupati Hegde, Markus Heinonen, Harri Lähdesmäki, Samuel Kaski | We propose a novel deep learning paradigm of differential flows that learn stochastic differential equation transformations of inputs prior to a standard classification or regression function. |

189 | Data-dependent compression of random features for large-scale kernel approximation | Raj Agrawal, Trevor Campbell, Jonathan Huggins, Tamara Broderick | We propose to combine the simplicity and generality of RFMs with a data-dependent feature selection scheme to achieve desirable theoretical approximation properties of Nyström with just $O(\log J+)$ features. |

190 | Large-Margin Classification in Hyperbolic Space | Hyunghoon Cho, Benjamin DeMeo, Jian Peng, Bonnie Berger | To enable accurate classification of points in hyperbolic space while respecting their hyperbolic geometry, we introduce hyperbolic SVM, a hyperbolic formulation of support vector machine classifiers, and describe its theoretical connection to the Euclidean counterpart. |

191 | Generalizing the theory of cooperative inference | Pei Wang, Pushpi Paranamana, Patrick Shafto | We provide geometric interpretations, connections to and implications for optimal transport, and connections to importance sampling, and conclude by outlining open questions and challenges to realizing the promise of Cooperative Inference. |

192 | MaxHedge: Maximizing a Maximum Online | Stephen Pasteris, Fabio Vitale, Kevin Chan, Shiqiang Wang, Mark Herbster | We introduce a new online learning framework where, at each trial, the learner is required to select a subset of actions from a given known action set. |

193 | The Gaussian Process Autoregressive Regression Model (GPAR) | James Requeima, William Tebbutt, Wessel Bruinsma, Richard E. Turner | We present the Gaussian Process Autoregressive Regression (GPAR) model, a scalable multi-output GP model that is able to capture nonlinear, possibly input-varying, dependencies between outputs in a simple and tractable way: the product rule is used to decompose the joint distribution over the outputs into a set of conditionals, each of which is modelled by a standard GP. |

194 | Towards Optimal Transport with Global Invariances | David Alvarez-Melis, Stefanie Jegelka, Tommi S. Jaakkola | In this work, we propose a general framework for optimal transport in the presence of latent global transformations. |

195 | Unsupervised Alignment of Embeddings with Wasserstein Procrustes | Edouard Grave, Armand Joulin, Quentin Berthet | In this paper, we propose to use an alternative formulation, based on the joint estimation of an orthogonal matrix and a permutation matrix. |

196 | Sequential Patient Recruitment and Allocation for Adaptive Clinical Trials | Onur Atan, William R. Zame, Mihaela Schaar | Because finding the exact solution to this Markov Decision Process is computationally intractable, we propose an algorithm, Knowledge Gradient for Randomized Controlled Trials (RCT-KG), that yields an approximate solution. |

197 | Probabilistic Forecasting with Spline Quantile Function RNNs | Jan Gasthaus, Konstantinos Benidis, Yuyang Wang, Syama Sundar Rangapuram, David Salinas, Valentin Flunkert, Tim Januschowski | In this paper, we propose a flexible method for probabilistic modeling with conditional quantile functions using monotonic regression splines. |

198 | Exponential Weights on the Hypercube in Polynomial Time | Sudeep Raja Putta, Abhishek Shetty | In this paper, we present a polynomial time algorithm called PolyExp for OLO on the hypercube. |

199 | Sharp Analysis of Learning with Discrete Losses | Alex Nowak, Francis Bach, Alessandro Rudi | In this paper we study a least-squares framework to systematically design learning algorithms for discrete losses, with quantitative characterizations in terms of statistical and computational complexity. |

200 | Designing Optimal Binary Rating Systems | Nikhil Garg, Ramesh Johari | We consider the optimal design of rating systems that collect binary feedback after transactions. |

201 | Stochastic Negative Mining for Learning with Large Output Spaces | Sashank J. Reddi, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Jiecao Chen, Sanjiv Kumar | We consider the problem of retrieving the most relevant labels for a given input when the size of the output space is very large. |

202 | Learning One-hidden-layer Neural Networks under General Input Distributions | Weihao Gao, Ashok V. Makkuva, Sewoong Oh, Pramod Viswanath | In this paper, we provide a novel unified framework to design loss functions with desirable landscape properties for a wide range of general input distributions. |

203 | A Geometric Perspective on the Transferability of Adversarial Directions | Zachary Charles, Harrison Rosenberg, Dimitris Papailiopoulos | In this work, we show that in the context of linear classifiers and two-layer ReLU networks, there provably exist directions that give rise to adversarial perturbations for many classifiers and data points simultaneously. |

204 | Non-linear process convolutions for multi-output Gaussian processes | Mauricio A. Alvarez, Wil Ward, Cristian Guarnizo | The paper introduces a non-linear version of the process convolution formalism for building covariance functions for multi-output Gaussian processes. |

205 | Lovasz Convolutional Networks | Prateek Yadav, Madhav Nimishakavi, Naganand Yadati, Shikhar Vashishth, Arun Rajkumar, Partha Talukdar | In this work, we propose Lovasz Convolutional Networks (LCNs), which are capable of incorporating global graph properties. |

206 | Bridging the gap between regret minimization and best arm identification, with application to A/B tests | Rémy Degenne, Thomas Nedelec, Clement Calauzenes, Vianney Perchet | This analysis sheds light on the common observation that ill-calibrated UCB algorithms minimize regret while still quickly identifying the best arm. |

207 | Gaussian Process Modulated Cox Processes under Linear Inequality Constraints | Andrés F. Lopez-lopera, ST John, Nicolas Durrande | We introduce a novel finite approximation of GP-modulated Cox processes where positiveness conditions can be imposed directly on the GP, with no restrictions on the covariance function. |

208 | Implicit Kernel Learning | Chun-Liang Li, Wei-Cheng Chang, Youssef Mroueh, Yiming Yang, Barnabas Poczos | We investigate two applications of the proposed IKL as examples, including generative adversarial networks with MMD (MMD GAN) and standard supervised learning. |

209 | Bounding Inefficiency of Equilibria in Continuous Actions Games using Submodularity and Curvature | Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause | In this paper, we define the class of valid utility games with continuous strategies and provide efficiency bounds for their CCEs. |

210 | Variational Information Planning for Sequential Decision Making | Jason Pacheco, John Fisher | As MI typically lacks a closed form, we propose an approach which maintains variational approximations of both the posterior and the MI utility. |

211 | Renyi Differentially Private ERM for Smooth Objectives | Chen Chen, Jaewoo Lee, Dan Kifer | In this paper, we present a Renyi Differentially Private stochastic gradient descent (SGD) algorithm for convex empirical risk minimization. |

212 | Projection-Free Bandit Convex Optimization | Lin Chen, Mingrui Zhang, Amin Karbasi | In this paper, we propose the first computationally efficient projection-free algorithm for bandit convex optimization (BCO) with a general convex constraint. |

213 | Provable Robustness of ReLU networks via Maximization of Linear Regions | Francesco Croce, Maksym Andriushchenko, Matthias Hein | We propose in this paper a regularization scheme for ReLU networks which provably improves the robustness of the classifier by maximizing the linear regions of the classifier as well as the distance to the decision boundary. |

214 | Test without Trust: Optimal Locally Private Distribution Testing | Jayadev Acharya, Clement Canonne, Cody Freitag, Himanshu Tyagi | We study the problem of distribution testing when the samples can only be accessed using a locally differentially private mechanism and focus on two representative testing questions of identity (goodness-of-fit) and independence testing for discrete distributions. |

215 | Distributed Maximization of “Submodular plus Diversity” Functions for Multi-label Feature Selection on Huge Datasets | Mehrdad Ghadiri, Mark Schmidt | In this paper, we consider this problem as an optimization problem that seeks to maximize the sum of a sum-sum diversity function and a non-negative monotone submodular function. |

216 | On Euclidean k-Means Clustering with alpha-Center Proximity | Amit Deshpande, Anand Louis, Apoorv Singh | $k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are *stable* under additive or multiplicative perturbation of data. |

217 | Noisy Blackbox Optimization using Multi-fidelity Queries: A Tree Search Approach | Rajat Sen, Kirthevasan Kandasamy, Sanjay Shakkottai | In this work, we combine structured state-space exploration through hierarchical partitioning with querying these partitions at multiple fidelities, and develop a multi-fidelity bandit based tree-search algorithm for noisy black-box optimization. |

218 | Safe Convex Learning under Uncertain Constraints | Ilnura Usmanova, Andreas Krause, Maryam Kamgarpour | We propose a new variant of the Frank-Wolfe algorithm, which applies to the case of uncertain linear constraints. |

219 | The non-parametric bootstrap and spectral analysis in moderate and high-dimension | Noureddine El Karoui, Elizabeth Purdom | We focus on the modern framework where p/n is not close to 0 but remains bounded as n and p tend to infinity. |

220 | Knockoffs for the Mass: New Feature Importance Statistics with False Discovery Guarantees | Jaime Roquero Gimenez, Amirata Ghorbani, James Zou | We develop an efficient algorithm to generate valid knockoffs from Bayesian Networks. |

221 | Training Variational Autoencoders with Buffered Stochastic Variational Inference | Rui Shu, Hung Bui, Jay Whang, Stefano Ermon | In this paper, we propose the Buffered Stochastic Variational Inference (BSVI), a new refinement procedure that makes use of SVI’s sequence of intermediate variational proposal distributions and their corresponding importance weights to construct a new generalized importance-weighted lower bound. |

222 | Regularized Contextual Bandits | Xavier Fontaine, Quentin Berthet, Vianney Perchet | To tackle this problem we use a nonparametric model and propose an algorithm splitting the context space into bins, and solving simultaneously — and independently — regularized multi-armed bandit instances on each bin. |

223 | Risk-Sensitive Generative Adversarial Imitation Learning | Jonathan Lacotte, Mohammad Ghavamzadeh, Yinlam Chow, Marco Pavone | We consider the generative adversarial approach to imitation learning (GAIL) and derive an optimization problem for our formulation, which we call risk-sensitive GAIL (RS-GAIL). |

224 | Learning Controllable Fair Representations | Jiaming Song, Pratyusha Kalluri, Aditya Grover, Shengjia Zhao, Stefano Ermon | We propose an information-theoretically motivated objective for learning maximally expressive representations subject to fairness constraints. |

225 | Multi-Task Time Series Analysis applied to Drug Response Modelling | Alex Bird, Chris Williams, Christopher Hawthorne | In this paper we show how these models can be personalised to an individual level while retaining statistical power, via use of multi-task learning (MTL). |

226 | Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization | Jaime Roquero Gimenez, James Zou | Here we present a generalization of the knockoff procedure that we call simultaneous multi-knockoffs. |

227 | Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features | Arno Solin, Manon Kok | Gaussian processes (GPs) provide a powerful framework for extrapolation, interpolation, and noise removal in regression and classification. |

228 | Distributional reinforcement learning with linear function approximation | Marc G. Bellemare, Nicolas Le Roux, Pablo Samuel Castro, Subhodeep Moitra | In this paper we adapt the Cramer distance to deal with arbitrary vectors. |

229 | Matroids, Matchings, and Fairness | Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii | In this work we extend this line of work to include algorithms for optimization subject to one or multiple matroid constraints. |

230 | Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function | Wojciech Tarnowski, Piotr Warchol, Stanislaw Jastrzębski, Jacek Tabor, Maciej Nowak | We propose that in ResNets this can be resolved based on our results by ensuring the same level of dynamical isometry at initialization. |

231 | The Termination Critic | Anna Harutyunyan, Will Dabney, Diana Borsa, Nicolas Heess, Remi Munos, Doina Precup | In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. |

232 | Consistent Online Optimization: Convex and Submodular | Mohammad Reza Karimi Jaghargh, Andreas Krause, Silvio Lattanzi, Sergei Vassilvitskii | In this work we develop online learning algorithms that update a sublinear number of times. |

233 | Learning Determinantal Point Processes by Corrective Negative Sampling | Zelda Mariet, Mike Gartrell, Suvrit Sra | To address this issue, which reduces the quality of the learned model, we introduce a novel optimization problem, Contrastive Estimation (CE), which encodes information about "negative" samples into the basic learning model. |

234 | Probabilistic Semantic Inpainting with Pixel Constrained CNNs | Emilien Dupont, Suhas Suresha | In this paper, we propose a method to perform probabilistic semantic inpainting by building a model, based on PixelCNNs, that learns a distribution of images conditioned on a subset of visible pixels. |

235 | Least Squares Estimation of Weakly Convex Functions | Sun Sun, Yaoliang Yu | In this work we argue that convexity, as a global property, is too strict and prone to outliers. |

236 | Interval Estimation of Individual-Level Causal Effects Under Unobserved Confounding | Nathan Kallus, Xiaojie Mao, Angela Zhou | We study the problem of learning conditional average treatment effects (CATE) from observational data with unobserved confounders. |

237 | Amortized Variational Inference with Graph Convolutional Networks for Gaussian Processes | Linfeng Liu, Liping Liu | In this paper, we focus on the approximation of GP posterior at a local level: we define a reusable template to approximate the posterior at neighborhoods while maintaining a global approximation. |

238 | Online Decentralized Leverage Score Sampling for Streaming Multidimensional Time Series | Rui Xie, Zengyan Wang, Shuyang Bai, Ping Ma, Wenxuan Zhong | We propose a leverage score sampling (LSS) method for efficient online inference of the streaming vector autoregressive (VAR) model. |

239 | Interpretable Cascade Classifiers with Abstention | Matthieu Clertant, Nataliya Sokolovska, Yann Chevaleyre, Blaise Hanczar | In this contribution, we develop a POMDP-based framework to learn cost-sensitive heterogeneous cascading systems. |

240 | Kernel Exponential Family Estimation via Doubly Dual Embedding | Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He | We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space. |

241 | Revisiting Adversarial Risk | Arun Sai Suggala, Adarsh Prasad, Vaishnavh Nagarajan, Pradeep Ravikumar | In this work, we show that the main reason behind this confusion is the inaccurate definition of adversarial perturbation that is used in the literature. |

242 | A Memoization Framework for Scaling Submodular Optimization to Large Scale Problems | Rishabh Iyer, Jeffrey Bilmes | In this paper, we present a new model called the pre-computational complexity model, along with a unifying memoization based framework, which looks at the specific form of the given submodular function. |

243 | Bernoulli Race Particle Filters | Sebastian M. Schmon, Arnaud Doucet, George Deligiannidis | We propose here a novel algorithm that allows for resampling according to the true intractable weights when only an unbiased estimator of the weights is available. |

244 | Augmented Ensemble MCMC sampling in Factorial Hidden Markov Models | Kaspar Märtens, Michalis Titsias, Christopher Yau | We introduce a general purpose ensemble Markov Chain Monte Carlo (MCMC) technique to improve on existing poorly mixing samplers. |

245 | Probabilistic Riemannian submanifold learning with wrapped Gaussian process latent variable models | Anton Mallasto, Søren Hauberg, Aasa Feragen | Latent variable models (LVMs) learn probabilistic models of data manifolds lying in an ambient Euclidean space. |

246 | Unbiased Smoothing using Particle Independent Metropolis-Hastings | Lawrence Middleton, George Deligiannidis, Arnaud Doucet, Pierre E. Jacob | We propose a simple way of coupling two MCMC chains built using Particle Independent Metropolis-Hastings (PIMH) to produce unbiased smoothing estimators. |

247 | Two-temperature logistic regression based on the Tsallis divergence | Ehsan Amid, Manfred K. Warmuth, Sriram Srinivasan | We develop a variant of multiclass logistic regression that is significantly more robust to noise. |

248 | Avoiding Latent Variable Collapse with Generative Skip Models | Adji B. Dieng, Yoon Kim, Alexander M. Rush, David M. Blei | In this paper, we propose a simple new way to avoid latent variable collapse by including skip connections in our generative model; these connections enforce strong links between the latent variables and the likelihood function. |

249 | SMOGS: Social Network Metrics of Game Success | Fan Bu, Sonia Xu, Katherine Heller, Alexander Volfovsky | In this paper we propose a novel metric of basketball game success, derived from a team’s dynamic social network of game play. |

250 | Fast Algorithms for Sparse Reduced-Rank Regression | Benjamin Dubois, Jean-François Delmas, Guillaume Obozinski | In particular, based on an analysis of the geometry of the problem, we establish that a proximal Polyak-Łojasiewicz inequality is satisfied in a neighborhood of the set of optima under a condition on the regularization parameter. |

251 | Modeling simple structures and geometry for better stochastic optimization algorithms | Hilal Asi, John C. Duchi | We develop model-based methods for stochastic optimization problems, introducing the approximate-proximal point, or aProx, family, which includes stochastic subgradient, proximal point, and bundle methods. |

252 | Online learning with feedback graphs and switching costs | Anshuka Rangi, Massimo Franceschetti | We propose two new algorithms: Threshold Based EXP3 and EXP3.SC. |

253 | Interpretable Almost-Exact Matching for Causal Inference | Awa Dieng, Yameng Liu, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky | The method proposed in this work aims to match units on a weighted Hamming distance, taking into account the relative importance of the covariates; the algorithm aims to match units on as many relevant variables as possible. |

254 | Statistical Optimal Transport via Factored Couplings | Aden Forrow, Jan-Christian Hütter, Mor Nitzan, Philippe Rigollet, Geoffrey Schiebinger, Jonathan Weed | We propose a new method to estimate Wasserstein distances and optimal transport plans between two probability distributions from samples in high dimension. |

255 | $HS^2$: Active learning over hypergraphs with pointwise and pairwise queries | I (Eli) Chien, Huozhi Zhou, Pan Li | We propose a hypergraph-based active learning scheme which we term $HS^2$; $HS^2$ generalizes the previously reported algorithm $S^2$ originally proposed for graph-based active learning with pointwise queries. |

256 | Clustering Time Series with Nonlinear Dynamics: A Bayesian Non-Parametric and Particle-Based Approach | Alexander Lin, Yingzhuo Zhang, Jeremy Heng, Stephen A. Allsop, Kay M. Tye, Pierre E. Jacob, Demba Ba | We propose a general statistical framework for clustering multiple time series that exhibit nonlinear dynamics into an a-priori-unknown number of sub-groups. |

257 | Efficient Nonconvex Empirical Risk Minimization via Adaptive Sample Size Methods | Aryan Mokhtari, Asuman Ozdaglar, Ali Jadbabaie | In this paper, we are interested in finding a local minimizer of an empirical risk minimization (ERM) problem where the loss associated with each sample is possibly a nonconvex function. |

258 | An Optimal Control Approach to Sequential Machine Teaching | Laurent Lessard, Xuezhou Zhang, Xiaojin Zhu | Our key insight is to formulate sequential machine teaching as a time-optimal control problem. |

259 | An Online Algorithm for Smoothed Regression and LQR Control | Gautam Goel, Adam Wierman | We consider Online Convex Optimization (OCO) in the setting where the costs are $m$-strongly convex and the online learner pays a switching cost for changing decisions between rounds. |

260 | Uncertainty Autoencoders: Learning Compressed Representations via Variational Information Maximization | Aditya Grover, Stefano Ermon | In this work, we propose Uncertainty Autoencoders, a learning framework for unsupervised representation learning inspired by compressed sensing. |

261 | Structured Disentangled Representations | Babak Esmaeili, Hao Wu, Sarthak Jain, Alican Bozkurt, N Siddharth, Brooks Paige, Dana H. Brooks, Jennifer Dy, Jan-Willem van de Meent | We propose a two-level hierarchical objective to control relative degree of statistical independence between blocks of variables and individual variables within blocks. |

262 | Estimating Network Structure from Incomplete Event Data | Benjamin Mark, Garvesh Raskutti, Rebecca Willett | In this work, we develop a novel approach to estimating the parameters of a BAR process in the presence of unobserved events via an unbiased estimator of the complete data log-likelihood function. |

263 | Locally Private Mean Estimation: $Z$-test and Tight Confidence Intervals | Marco Gaboardi, Ryan Rogers, Or Sheffet | This work provides tight upper- and lower-bounds for the problem of mean estimation under differential privacy in the local-model, when the input is composed of $n$ i.i.d. drawn samples from a Gaussian. |

264 | Estimation of Non-Normalized Mixture Models | Takeru Matsuda, Aapo Hyvärinen | We develop a general method for estimating a finite mixture of non-normalized models. |

265 | Rotting bandits are no harder than stochastic ones | Julien Seznec, Andrea Locatelli, Alexandra Carpentier, Alessandro Lazaric, Michal Valko | In this paper, we consider the non-parametric rotting bandit setting, where rewards can only decrease. |

266 | A Topological Regularizer for Classifiers via Persistent Homology | Chao Chen, Xiuyan Ni, Qinxun Bai, Yusu Wang | In this paper, we initiate a new direction and propose to enforce the structural simplicity of the classification boundary by regularizing over its topological complexity. |

267 | Overcomplete Independent Component Analysis via SDP | Anastasia Podosinnikova, Amelia Perry, Alexander S. Wein, Francis Bach, Alexandre d’Aspremont, David Sontag | We present a novel algorithm for overcomplete independent components analysis (ICA), where the number of latent sources k exceeds the dimension p of observed variables. |

268 | Doubly Semi-Implicit Variational Inference | Dmitry Molchanov, Valery Kharitonov, Artem Sobolev, Dmitry Vetrov | We extend the existing framework of semi-implicit variational inference (SIVI) and introduce doubly semi-implicit variational inference (DSIVI), a way to perform variational inference and learning when both the approximate posterior and the prior distribution are semi-implicit. |

269 | Reducing training time by efficient localized kernel regression | Nicole Mücke | We study generalization properties of kernel regularized least squares regression based on a partitioning approach. |

270 | Scalable High-Order Gaussian Process Regression | Shandian Zhe, Wei Xing, Robert M. Kirby | To address these issues, we propose HOGPR, a High-Order Gaussian Process Regression model, which can flexibly capture complex correlations among the outputs and scale up to a large number of outputs. |

271 | A Higher-Order Kolmogorov-Smirnov Test | Veeranjaneyulu Sadhanala, Yu-Xiang Wang, Aaditya Ramdas, Ryan J. Tibshirani | We present an extension of the Kolmogorov-Smirnov (KS) two-sample test, which can be more sensitive to differences in the tails. |

272 | Bayesian Learning of Conditional Kernel Mean Embeddings for Automatic Likelihood-Free Inference | Kelvin Hsu, Fabio Ramos | In this paper, we present kernel embedding likelihood-free inference (KELFI), a holistic framework that automatically learns model hyperparameters to improve inference accuracy given limited simulation budget. |

273 | Parallel Asynchronous Stochastic Coordinate Descent with Auxiliary Variables | Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit S. Dhillon | In this paper, we consider a rich family of objective functions where AUX-PCD can be applied. |

274 | Credit Assignment Techniques in Stochastic Computation Graphs | Théophane Weber, Nicolas Heess, Lars Buesing, David Silver | In this work, we address these problems by generalizing concepts from the reinforcement learning literature. |

275 | Efficient Bayesian Optimization for Target Vector Estimation | Anders Kirk Uhrenholt, Bjørn Sand Jensen | We address this issue by proposing a Gaussian process model that considers the individual function outputs and derive a distribution over the resulting 2-norm. |

276 | Correspondence Analysis Using Neural Networks | Hsiang Hsu, Salman Salamatian, Flavio P. Calmon | By re-interpreting the objective in CA using an information-theoretic tool called the principal inertia components, we demonstrate that performing CA is equivalent to solving a functional optimization problem over the space of finite variance functions of two random variables. |

277 | Interpolating between Optimal Transport and MMD using Sinkhorn Divergences | Jean Feydy, Thibault Séjourné, François-Xavier Vialard, Shun-ichi Amari, Alain Trouvé, Gabriel Peyré | This paper studies the Sinkhorn divergences, a family of geometric divergences that interpolates between MMD and OT. |

278 | Multi-Observation Regression | Rafael Frongillo, Nishant A. Mehta, Tom Morgan, Bo Waggoner | We propose four algorithms formalizing the concept of ERM for this problem, two of which have statistical guarantees in settings allowing both slow and fast convergence rates, but which are out-performed empirically by the other two. |

279 | Adaptive MCMC via Combining Local Samplers | Kiárash Shaloudegi, András György | Here we take a different approach and, similarly to parallel MCMC methods, instead of trying to find a single chain that samples from the whole distribution, we combine samples from several chains run in parallel, each exploring only parts of the state space (e.g., a few modes only). |

280 | Variance reduction properties of the reparameterization trick | Ming Xu, Matias Quiroz, Robert Kohn, Scott A. Sisson | We explore this under the idealized assumptions that the variational approximation is a mean-field Gaussian density and that the log of the joint density of the model parameters and the data is a quadratic function that depends on the variational mean. |

281 | Hierarchical Clustering for Euclidean Data | Moses Charikar, Vaggos Chatziafratis, Rad Niazadeh, Grigory Yaroslavtsev | In this paper we take the first step and give novel scalable algorithms for this problem tailored to Euclidean data in R^d and under vector-based similarity measures, a prevalent model in several typical machine learning applications. |

282 | Stochastic Variance-Reduced Cubic Regularization for Nonconvex Optimization | Zhe Wang, Yi Zhou, Yingbin Liang, Guanghui Lan | In this paper, we propose a stochastic variance-reduced cubic-regularization (SVRC) method under random sampling, and study its convergence guarantee as well as sample complexity. |

283 | Variational Noise-Contrastive Estimation | Benjamin Rhodes, Michael U. Gutmann | To increase the number of techniques in our arsenal, we propose variational noise-contrastive estimation (VNCE), building on NCE which is a method that only applies to unnormalised models. |

284 | Improving Quadrature for Constrained Integrands | Henry R. Chai, Roman Garnett | We present an improved Bayesian framework for performing inference of affine transformations of constrained functions. |

285 | High Dimensional Inference in Partially Linear Models | Ying Zhu, Zhuqing Yu, Guang Cheng | We propose two semiparametric versions of the debiased Lasso procedure for the model $Y_{i}=X_{i}\beta_{0}+g_{0}(Z_{i})+\varepsilon_{i}$, where the parameter vector of interest $\beta_{0}$ is high dimensional but sparse (exactly or approximately) and $g_{0}$ is an unknown nuisance function. |

286 | Cost aware Inference for IoT Devices | Pengkai Zhu, Durmus Alp Emre Acar, Nan Feng, Prateek Jain, Venkatesh Saligrama | We propose to train a decentralized gated network that, given an observed instance at test-time, allows for activation of select devices to transmit information to a central node, which then performs inference. |

287 | Banded Matrix Operators for Gaussian Markov Models in the Automatic Differentiation Era | Nicolas Durrande, Vincent Adam, Lucas Bordeaux, Stefanos Eleftheriadis, James Hensman | The aim of the paper is to make modern inference methods (such as variational inference or gradient-based sampling) available for Gaussian models with banded precision. |

288 | A Unified Weight Learning Paradigm for Multi-view Learning | Lai Tian, Feiping Nie, Xuelong Li | In this paper, we present a Unified Paradigm (UP) that contains the aforementioned three popular paradigms as special cases. |

289 | Region-Based Active Learning | Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Ningshan Zhang | We give a detailed theoretical analysis of ORIWAL, including generalization error guarantees and bounds on the number of points labeled, in terms of both the hypothesis set used in each region and the probability mass of that region. |

290 | Precision Matrix Estimation with Noisy and Missing Data | Roger Fan, Byoungwook Jang, Yuekai Sun, Shuheng Zhou | We develop an alternating direction method of multipliers (ADMM) algorithm for these problems, providing a feasible algorithm to estimate precision matrices with indefinite input and potentially nonconvex penalties. |

291 | Exploring $k$ out of Top $\rho$ Fraction of Arms in Stochastic Bandits | Wenbo Ren, Jia Liu, Ness B. Shroff | We prove lower bounds for the four variants (finite or infinite arms, and known or unknown threshold), and propose algorithms for each. |

292 | AutoML from Service Provider’s Perspective: Multi-device, Multi-tenant Model Selection with GP-EI | Chen Yu, Bojan Karlaš, Jie Zhong, Ce Zhang, Ji Liu | In this paper, we focus on the AutoML problem from the *service provider’s perspective*, motivated by the following practical consideration: When an AutoML service needs to serve *multiple users* with *multiple devices* at the same time, how can we allocate these devices to users in an efficient way? |

293 | On Theory for BART | Veronika Ročková, Enakshi Saha | Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. |

294 | Deep Topic Models for Multi-label Learning | Rajat Panda, Ankit Pensia, Nikhil Mehta, Mingyuan Zhou, Piyush Rai | We present a probabilistic framework for multi-label learning based on a deep generative model for the binary label vector associated with each observation. |

295 | On the Dynamics of Gradient Descent for Autoencoders | Thanh V. Nguyen, Raymond K. W. Wong, Chinmay Hegde | We focus on three generative models for data that are common in statistical machine learning: (i) the mixture-of-Gaussians model, (ii) the sparse coding model, and (iii) the sparsity model with non-negative coefficients. |

296 | Complexities in Projection-Free Stochastic Non-convex Minimization | Zebang Shen, Cong Fang, Peilin Zhao, Junzhou Huang, Hui Qian | For constrained nonconvex minimization problems, we propose a meta stochastic projection-free optimization algorithm, named Normalized Frank Wolfe Updating, that can take any Gradient Estimator (GE) as input. |

297 | Differentiable Antithetic Sampling for Variance Reduction in Stochastic Variational Inference | Mike Wu, Noah Goodman, Stefano Ermon | In this paper, we explore a technique that uses correlated, but more representative, samples to reduce estimator variance. |

298 | Efficient Greedy Coordinate Descent for Composite Problems | Sai Praneeth Karimireddy, Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi | In this paper, we consider greedy updates that are based on subgradients for a class of non-smooth composite problems, including $L1$-regularized problems, SVMs and related applications. |

299 | Decentralized Gradient Tracking for Continuous DR-Submodular Maximization | Jiahao Xie, Chao Zhang, Zebang Shen, Chao Mi, Hui Qian | In this paper, we focus on the continuous DR-submodular maximization over a network. |

300 | Adaptive Rao-Blackwellisation in Gibbs Sampling for Probabilistic Graphical Models | Craig Kelly, Somdeb Sarkhel, Deepak Venugopal | In this paper, we propose an adaptive approach for Rao-Blackwellisation, where we add parallel Markov chains defined over different collapsed PGM structures. |

301 | Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems | Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter Bartlett, Martin Wainwright | We study derivative-free methods for policy optimization over the class of linear policies. |

302 | Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective | Anirudh Vemula, Wen Sun, J. Bagnell | We examine these black-box methods closely to identify situations in which they are worse than action space exploration methods and those in which they are superior. |

303 | Sampling from Non-Log-Concave Distributions via Variance-Reduced Gradient Langevin Dynamics | Difan Zou, Pan Xu, Quanquan Gu | We study stochastic variance reduction-based Langevin dynamic algorithms, SVRG-LD and SAGA-LD (Dubey et al., 2016), for sampling from non-log-concave distributions. |

304 | Graph to Graph: a Topology Aware Approach for Graph Structures Learning and Generation | Mingming Sun, Ping Li | Primarily, we focus on the issue of how to effectively learn the topology of the source graph and then decode it to form the topology of the target graph. |

305 | Imitation-Regularized Offline Learning | Yifei Ma, Yu-Xiang Wang, Balakrishnan Narayanaswamy | With probability logging, our main theoretical contribution connects IML-underfitting to the existence of either confounding variables or model misspecification. |

306 | A maximum-mean-discrepancy goodness-of-fit test for censored data | Tamara Fernandez, Arthur Gretton | We introduce a kernel-based goodness-of-fit test for censored data, where observations may be missing in random time intervals: a common occurrence in clinical trials and industrial life-testing. |

307 | Sobolev Descent | Youssef Mroueh, Tom Sercu, Anant Raj | We introduce Sobolev descent that constructs similar paths by following gradient flows of a critic function in a kernel space or parametrized by a neural network. |

308 | Learning the Structure of a Nonstationary Vector Autoregression | Daniel Malinsky, Peter Spirtes | We demonstrate our approach on both simulated and real macroeconomic data. |

309 | Theoretical Analysis of Efficiency and Robustness of Softmax and Gap-Increasing Operators in Reinforcement Learning | Tadashi Kozuno, Eiji Uchibe, Kenji Doya | In this paper, we propose and analyze conservative value iteration, which unifies value iteration, soft value iteration, advantage learning, and dynamic policy programming. |

310 | A Fast Sampling Algorithm for Maximum Inner Product Search | Qin Ding, Hsiang-Fu Yu, Cho-Jui Hsieh | In this paper, we propose Sampling-MIPS, which is the first sampling based algorithm that can be applied to the MIPS problem on a set of general vectors with both positive and negative values. |

311 | Minimum Volume Topic Modeling | Byoungwook Jang, Alfred Hero | We propose a new topic modeling procedure that takes advantage of the fact that the Latent Dirichlet Allocation (LDA) log-likelihood function is asymptotically equivalent to the logarithm of the volume of the topic simplex. |

312 | Binary Space Partitioning Forest | Xuhui Fan, Bin Li, Scott Sisson | We propose to generate a cutting hyperplane, which is assumed to be parallel to $d-2$ dimensions, to cut each node in the $d$-dimensional BSP-tree. |

313 | Improved Semi-Supervised Learning with Multiple Graphs | Krishnamurthy Viswanathan, Sushant Sachdeva, Andrew Tomkins, Sujith Ravi | We present a new approach for graph based semi-supervised learning based on a multi-component extension to the Gaussian MRF model. |

314 | Optimizing over a Restricted Policy Class in MDPs | Ershad Banijamali, Yasin Abbasi-Yadkori, Mohammad Ghavamzadeh, Nikos Vlassis | We address the problem of finding an optimal policy in a Markov decision process (MDP) under a restricted policy class defined by the convex hull of a set of base policies. |

315 | Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate | Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry | We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate — in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data. |

316 | Deep Switch Networks for Generating Discrete Data and Language | Payam Delgosha, Naveen Goela | An interpretable, statistical framework is introduced for training these nonlinear networks based on a maximum-likelihood objective function. |

317 | A recurrent Markov state-space generative model for sequences | Anand Ramachandran, Steve Lumetta, Eric Klee, Deming Chen | In this article, we present a new generative model for sequences that combines both aspects, the ability to perform exact inferences and the ability to model long-term structure, by augmenting the HMM with a deterministic, continuous state variable modeled through a Recurrent Neural Network. |

318 | A Potential Outcomes Calculus for Identifying Conditional Path-Specific Effects | Daniel Malinsky, Ilya Shpitser, Thomas Richardson | In this paper we present the potential outcome calculus (po-calculus), a natural generalization of do-calculus for arbitrary potential outcomes. |

319 | Adversarial Discrete Sequence Generation without Explicit Neural Networks as Discriminators | Zhongliang Li, Tian Xia, Xingyu Lou, Kaihe Xu, Shaojun Wang, Jing Xiao | This paper presents a novel approach to train GANs for discrete sequence generation without resorting to an explicit neural network as the discriminator. |

320 | Adaptive Estimation for Approximate $k$-Nearest-Neighbor Computations | Daniel LeJeune, Reinhard Heckel, Richard Baraniuk | In this paper, we consider the approximate $k$-nearest-neighbor problem, which is the problem of finding a subset of $O(k)$ points in a given set of points that contains the set of $k$ nearest neighbors of a given query point. |

321 | Model-Free Linear Quadratic Control via Reduction to Expert Prediction | Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari | In this work, we present a new model-free algorithm for controlling linear quadratic (LQ) systems, and show that its regret scales as $O(T^{\xi+2/3})$ for any small $\xi>0$ if time horizon satisfies $T>C^{1/\xi}$ for a constant $C$. |

322 | Preventing Failures Due to Dataset Shift: Learning Predictive Models That Transport | Adarsh Subbaswamy, Peter Schulam, Suchi Saria | We propose a proactive approach which learns a relationship in the training domain that will generalize to the target domain by incorporating prior knowledge of aspects of the data generating process that are expected to differ as expressed in a causal selection diagram. |

323 | Structured Robust Submodular Maximization: Offline and Online Algorithms | Nima Anari, Nika Haghtalab, Seffi Naor, Sebastian Pokutta, Mohit Singh, Alfredo Torrico | In this work, we consider robust submodular maximization with structured combinatorial constraints and give efficient algorithms with provable guarantees. |

324 | Sample-Efficient Imitation Learning via Generative Adversarial Nets | Lionel Blondé, Alexandros Kalousis | We show that our approach is simple to implement and that the learned agents remain remarkably stable, as shown in our experiments that span a variety of continuous control tasks. |

325 | Probabilistic Multilevel Clustering via Composite Transportation Distance | Nhat Ho, Viet Huynh, Dinh Phung, Michael Jordan | We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. |

326 | A General Framework for Multi-fidelity Bayesian Optimization with Gaussian Processes | Jialin Song, Yuxin Chen, Yisong Yue | In this paper, we study multi-fidelity Bayesian optimization with complex structural dependencies among multiple outputs, and propose MF-MI-Greedy, a principled algorithmic framework for addressing this problem. |

327 | Online Algorithm for Unsupervised Sensor Selection | Arun Verma, Manjesh Hanawal, Csaba Szepesvari, Venkatesh Saligrama | Our objective is to learn strategies for selecting a test that gives the best trade-off between accuracy and costs in such unsupervised sensor selection (USS) problems. |

328 | Best of many worlds: Robust model selection for online supervised learning | Vidya Muthukumar, Mitas Ray, Anant Sahai, Peter Bartlett | We introduce algorithms for online, full-information prediction that are computationally efficient and competitive with contextual tree experts of unknown complexity, in both probabilistic and adversarial settings. |

329 | Accelerating Imitation Learning with Predictive Models | Ching-An Cheng, Xinyan Yan, Evangelos Theodorou, Byron Boots | In this work, we seek to further accelerate the convergence rate of online imitation learning, thereby making it more sample efficient. |

330 | Online Learning in Kernelized Markov Decision Processes | Sayak Ray Chowdhury, Aditya Gopalan | We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions. |

331 | Lifting high-dimensional non-linear models with Gaussian regressors | Christos Thrampoulidis, Ankit Singh Rawat | We resolve this issue by proposing and analyzing an alternative convex recovery method. |

332 | Domain-Size Aware Markov Logic Networks | Happy Mittal, Ayush Bhardwaj, Vibhav Gogate, Parag Singla | As our second contribution, we present a principled solution to this problem by defining Domain-size Aware Markov Logic Networks (DA-MLNs) which can be seen as re-parameterizing the MLNs after taking domain size into consideration. |

333 | Database Alignment with Gaussian Features | Osman E. Dai, Daniel Cullina, Negar Kiyavash | We consider the problem of aligning a pair of databases with jointly Gaussian features. |

334 | Size of Interventional Markov Equivalence Classes in random DAG models | Dmitriy Katz, Karthikeyan Shanmugam, Chandler Squires, Caroline Uhler | For constant density, we show that the expected $\log$ observational MEC size asymptotically (in the number of vertices) approaches a constant. |

335 | Reparameterizing Distributions on Lie Groups | Luca Falorsi, Pim de Haan, Tim R. Davidson, Patrick Forré | In this work we define a general framework to create reparameterizable densities on arbitrary Lie groups, and provide a detailed practitioner's guide to further the ease of usage. |

336 | Revisit Batch Normalization: New Understanding and Refinement via Composition Optimization | Xiangru Lian, Ji Liu | In this paper, we try to understand BN from an optimization perspective by providing an explicit objective function associated with BN. |

337 | Multi-Order Information for Working Set Selection of Sequential Minimal Optimization | Qimao Yang, Changrong Li, Jun Guo | A new working set selection method for sequential minimal optimization (SMO) is proposed in this paper. |

338 | Harmonizable mixture kernels with variational Fourier features | Zheyang Shen, Markus Heinonen, Samuel Kaski | In this work we propose the novel harmonizable mixture kernel (HMK), a family of expressive, interpretable, non-stationary kernels derived from mixture models on the generalized spectral representation. |

339 | Multiscale Gaussian Process Level Set Estimation | Shubhanshu Shekhar, Tara Javidi | In this paper, the problem of estimating the level set of a black-box function from noisy and expensive evaluation queries is considered. |

340 | The LORACs Prior for VAEs: Letting the Trees Speak for the Data | Sharad Vikram, Matthew D. Hoffman, Matthew J. Johnson | To alleviate this problem, we propose using a flexible Bayesian nonparametric hierarchical clustering prior based on the time-marginalized coalescent (TMC). |

341 | Adversarial Learning of a Sampler Based on an Unnormalized Distribution | Chunyuan Li, Ke Bai, Jianqiao Li, Guoyin Wang, Changyou Chen, Lawrence Carin | The proposed method is compared to alternative approaches, with encouraging results demonstrated across a range of applications, including deep soft Q-learning. |

342 | Active Ranking with Subset-wise Preferences | Aadirupa Saha, Aditya Gopalan | We consider the problem of probably approximately correct (PAC) ranking $n$ items by adaptively eliciting subset-wise preference feedback. |

343 | Recovery Guarantees For Quadratic Tensors With Sparse Observations | Hongyang Zhang, Vatsal Sharan, Moses Charikar, Yingyu Liang | We consider the tensor completion problem of predicting the missing entries of a tensor. |

344 | Sample Efficient Graph-Based Optimization with Noisy Observations | Thanh Tan Nguyen, Ali Shameli, Yasin Abbasi-Yadkori, Anup Rao, Branislav Kveton | We study sample complexity of optimizing “hill-climbing friendly” functions defined on a graph under noisy observations. |

345 | Robustness Guarantees for Density Clustering | Heinrich Jiang, Jennifer Jang, Ofir Nachum | We show both robustness and consistency guarantees for a simple modification of the popular DBSCAN algorithm. |

346 | Fixing Mini-batch Sequences with Hierarchical Robust Partitioning | Shengjie Wang, Wenruo Bai, Chandrashekhar Lavania, Jeff Bilmes | We propose a general and efficient hierarchical robust partitioning framework to generate a deterministic sequence of mini-batches, one that offers assurances of being high quality, unlike a randomly drawn sequence. |

347 | Multitask Metric Learning: Theory and Algorithm | Boyu Wang, Hejia Zhang, Peng Liu, Zebang Shen, Joelle Pineau | In this paper, we study the problem of multitask metric learning (mtML). |

348 | Efficient Bayes Risk Estimation for Cost-Sensitive Classification | Daniel Andrade, Yuzuru Okajima | In this work, we show that for linear classifiers this task can be considerably simplified, leading to a one dimensional integral for which we propose an efficient approximation. |

349 | Interpreting Black Box Predictions using Fisher Kernels | Rajiv Khanna, Been Kim, Joydeep Ghosh, Sanmi Koyejo | Our goal is to ask “which training examples are most responsible for a given set of predictions?” |

350 | Representation Learning on Graphs: A Reinforcement Learning Application | Sephora Madjiheurem, Laura Toni | In this work, we study value function approximation in reinforcement learning (RL) problems with high dimensional state or action spaces via a generalized version of representation policy iteration (RPI). |

351 | ABCD-Strategy: Budgeted Experimental Design for Targeted Causal Structure Discovery | Raj Agrawal, Chandler Squires, Karren Yang, Karthikeyan Shanmugam, Caroline Uhler | Given that randomized experiments are usually expensive to perform, we propose a general framework and theory based on optimal Bayesian experimental design to select experiments for targeted causal discovery. |

352 | Batched Stochastic Bayesian Optimization via Combinatorial Constraints Design | Kevin K. Yang, Yuxin Chen, Alycia Lee, Yisong Yue | Motivated by these considerations, we investigate *Batched Stochastic Bayesian Optimization* (BSBO), a novel Bayesian optimization scheme for choosing the constraints in order to guide exploration towards items with greater utility. |

353 | Convergence of Gradient Descent on Separable Data | Mor Shpigel Nacson, Jason Lee, Suriya Gunasekar, Pedro Henrique Pamplona Savarese, Nathan Srebro, Daniel Soudry | Within this family, for simple linear models we show that the optimal rate with fixed step size is indeed obtained for the commonly used exponentially tailed losses such as logistic loss. |

354 | Structured Neural Topic Models for Reviews | Babak Esmaeili, Hongyi Huang, Byron Wallace, Jan-Willem van de Meent | We present Variational Aspect-based Latent Topic Allocation (VALTA), a family of autoencoding topic models that learn aspect-based representations of reviews. |

355 | Adaptive Minimax Regret against Smooth Logarithmic Losses over High-Dimensional l1-Balls via Envelope Complexity | Kohei Miyaguchi, Kenji Yamanishi | We develop a new theoretical framework, the envelope complexity, to analyze the minimax regret with logarithmic loss functions. |

356 | Low-Dimensional Density Ratio Estimation for Covariate Shift Correction | Petar Stojanov, Mingming Gong, Jaime Carbonell, Kun Zhang | In this work, we investigate the dependence of covariate shift correction performance on the dimensionality of the features, and propose a correction method that finds a low-dimensional representation of the features, which takes into account features relevant to the target $Y$, and exploits the density ratio of this representation for importance reweighting. |

357 | Evaluating model calibration in classification | Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas Schön | In this work, we develop a general theoretical calibration evaluation framework grounded in probability theory, and point out subtleties present in model calibration evaluation that lead to refined interpretations of existing evaluation techniques. |

358 | Towards Gradient Free and Projection Free Stochastic Optimization | Anit Kumar Sahu, Manzil Zaheer, Soummya Kar | A zeroth order Frank-Wolfe algorithm is proposed, which in addition to the projection-free nature of the vanilla Frank-Wolfe algorithm makes it gradient free. |

359 | On Multi-Cause Approaches to Causal Inference with Unobserved Confounding: Two Cautionary Failure Cases and A Promising Alternative | Alexander D'Amour | In this paper, we present two simple, analytical counterexamples that challenge the general claims that are central to these approaches. |

360 | Data-Driven Approach to Multiple-Source Domain Adaptation | Petar Stojanov, Mingming Gong, Jaime Carbonell, Kun Zhang | We propose a data-driven method to represent these changes across multiple source domains and perform unsupervised domain adaptation. |