Paper Digest: AISTATS 2019 Highlights

June 17, 2019 | June 18, 2020 | admin

Readers can also read this highlight article on our console, which lets users filter papers by keyword and find related papers.

The International Conference on Artificial Intelligence and Statistics (AISTATS) is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: AISTATS 2019 Papers

#	Title	Authors	Highlight
1	Proximal Splitting Meets Variance Reduction	Fabian Pedregosa, Kilian Fatras, Mattia Casotto	In this work we propose and analyze VRTOS, a variance-reduced method to solve problems with an arbitrary number of non-smooth terms.
2	Optimal Noise-Adding Mechanism in Additive Differential Privacy	Quan Geng, Wei Ding, Ruiqi Guo, Sanjiv Kumar	We derive the optimal $(0, \delta)$-differentially private query-output independent noise-adding mechanism for single real-valued query function under a general cost-minimization framework.
3	Tossing Coins Under Monotonicity	Matey Neykov	As an application of this framework we propose a two-step procedure for the binary monotone single index model, which consists of running LASSO and subsequently running an isotonic regression.
4	Gaussian Regression with Convex Constraints	Matey Neykov	The focus of this paper is the linear model with Gaussian design under convex constraints.
5	Risk-Averse Stochastic Convex Bandit	Adrian Rivera Cardoso, Huan Xu	Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse.
6	Error bounds for sparse classifiers in high-dimensions	Antoine Dedieu	Thus, we propose a tractable proximal algorithm to compute it and assess its empirical performance.
7	Boosting Transfer Learning with Survival Data from Heterogeneous Domains	Alexis Bellot, Mihaela Schaar	In this work we propose a survival prediction model that is able to improve predictions on a small data domain of interest – such as a local hospital – by leveraging related data from other domains – such as data from other hospitals.
8	Resampled Priors for Variational Autoencoders	Matthias Bauer, Andriy Mnih	We propose Learned Accept/Reject Sampling (LARS), a method for constructing richer priors using rejection sampling with a learned acceptance function.
9	Scalable Bayesian Learning for State Space Models using Variational Inference with SMC Samplers	Marcel Hirt, Petros Dellaportas	We present a scalable approach to performing approximate fully Bayesian inference in generic state space models.
10	Scalable Thompson Sampling via Optimal Transport	Ruiyi Zhang, Zheng Wen, Changyou Chen, Chen Fang, Tong Yu, Lawrence Carin	In this paper, we use distribution optimization techniques to approximate the posterior distribution, solved via Wasserstein gradient flows.
11	Inferring Multidimensional Rates of Aging from Cross-Sectional Data	Emma Pierson, Pang Wei Koh, Tatsunori Hashimoto, Daphne Koller, Jure Leskovec, Nick Eriksson, Percy Liang	Motivated by the study of human aging, we present an interpretable latent-variable model that learns temporal dynamics from cross-sectional data.
12	Interaction Detection with Bayesian Decision Tree Ensembles	Junliang Du, Antonio R. Linero	We address this problem by introducing Dirichlet process forests (DP-Forests), which leverage the presence of low-order interactions by clustering the trees so that trees within the same cluster focus on detecting a specific interaction.
13	On the Interaction Effects Between Prediction and Clustering	Matt Barnes, Artur Dubrawski	In particular, we focus on the problem of estimating the out-of-cluster (OOC) prediction loss given an approximate clustering with probabilistic error rate p_0.
14	Towards a Theoretical Understanding of Hashing-Based Neural Nets	Yibo Lin, Zhao Song, Lin F. Yang	In this paper, we provide provable guarantees on some hashing-based parameter reduction methods in neural nets.
15	Faster First-Order Methods for Stochastic Non-Convex Optimization on Riemannian Manifolds	Pan Zhou, Xiao-Tong Yuan, Jiashi Feng	In this paper, we introduce the Riemannian SPIDER (R-SPIDER) method as a novel nonlinear-metric extension of SPIDER for efficient non-convex optimization on Riemannian manifolds.
16	LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models	Yuan Zhou, Bradley J. Gram-Hansen, Tobias Kohn, Tom Rainforth, Hongseok Yang, Frank Wood	We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables.
17	Identifiability of Generalized Hypergeometric Distribution (GHD) Directed Acyclic Graphical Models	Gunwoong Park, Hyewon Park	We introduce a new class of identifiable DAG models where the conditional distribution of each node given its parents belongs to a family of generalized hypergeometric distributions (GHD).
18	Unbiased Implicit Variational Inference	Michalis K. Titsias, Francisco Ruiz	We develop unbiased implicit variational inference (UIVI), a method that expands the applicability of variational inference by defining an expressive variational family.
19	Efficient Linear Bandits through Matrix Sketching	Ilja Kuzborskij, Leonardo Cella, Nicolò Cesa-Bianchi	We prove that two popular linear contextual bandit algorithms, OFUL and Thompson Sampling, can be made efficient using Frequent Directions, a deterministic online sketching technique.
20	Orthogonal Estimation of Wasserstein Distances	Mark Rowland, Jiri Hron, Yunhao Tang, Krzysztof Choromanski, Tamas Sarlos, Adrian Weller	In this paper, we propose a new variant of sliced Wasserstein distance, study the use of orthogonal coupling in Monte Carlo estimation of Wasserstein distances and draw connections with stratified sampling, and evaluate our approaches experimentally in a range of large-scale experiments in generative modelling and reinforcement learning.
21	Linear Convergence of the Primal-Dual Gradient Method for Convex-Concave Saddle Point Problems without Strong Convexity	Simon S. Du, Wei Hu	We consider the convex-concave saddle point problem $\min_{x}\max_{y} f(x)+y^\top A x-g(y)$ where $f$ is smooth and convex and $g$ is smooth and strongly convex.
22	Greedy and IHT Algorithms for Non-convex Optimization with Monotone Costs of Non-zeros	Shinsaku Sakaue	In this paper, we consider the case where non-zero patterns are specified by monotone set functions.
23	Block Stability for MAP Inference	Hunter Lang, David Sontag, Aravindan Vijayaraghavan	We introduce a significantly more relaxed condition that only requires portions of an input instance to be stable.
24	A Stein–Papangelou Goodness-of-Fit Test for Point Processes	Jiasen Yang, Vinayak Rao, Jennifer Neville	In this work, we propose a general Stein operator for point processes based on the Papangelou conditional intensity function.
25	KAMA-NNs: Low-dimensional Rotation Based Neural Networks	Krzysztof Choromanski, Aldo Pacchiano, Jeffrey Pennington, Yunhao Tang	In the adaptive setting we propose efficient algorithms for learning products of low-dimensional rotations and show how our architectures can be used to improve space and time complexity of state of the art reinforcement learning (RL) algorithms (e.g. PPO, TRPO).
26	Statistical Windows in Testing for the Initial Distribution of a Reversible Markov Chain	Quentin Berthet, Varun Kanade	We study the problem of hypothesis testing between two discrete distributions, where we only have access to samples after the action of a known reversible Markov chain, playing the role of noise.
27	Sketching for Latent Dirichlet-Categorical Models	Joseph Tassarotti, Jean-Baptiste Tristan, Michael Wick	We examine a related problem in which the parameters of a Bayesian model are very large and expensive to store in memory, and propose more compact representations of parameter values that can be used during inference.
28	Adaptive Activity Monitoring with Uncertainty Quantification in Switching Gaussian Process Models	Randy Ardywibowo, Guang Zhao, Zhangyang Wang, Bobak Mortazavi, Shuai Huang, Xiaoning Qian	To efficiently compute the Gaussian process model likelihood and quantify the context prediction uncertainty, we propose a block circulant embedding technique and use Fast Fourier Transforms (FFT) for inference.
29	Near Optimal Algorithms for Hard Submodular Programs with Discounted Cooperative Costs	Rishabh Iyer, Jeffrey Bilmes	In this paper, we investigate a class of submodular problems which in general are very hard.
30	Fast Stochastic Algorithms for Low-rank and Nonsmooth Matrix Problems	Dan Garber, Atara Kaplan	In this paper we develop efficient algorithms for stochastic optimization of a strongly-convex objective which includes both a nonsmooth term and a low-rank promoting term.
31	Logarithmic Regret for Online Gradient Descent Beyond Strong Convexity	Dan Garber	In this work, we use this classical result for the first time to obtain faster rates for online convex optimization over polyhedral sets with curved convex, though not strongly convex, loss functions.
32	Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches	Filip Hanzely, Peter Richtarik	In this paper we design and analyze an accelerated coordinate descent (ACD) method which in each iteration updates a random subset of coordinates according to an arbitrary but fixed probability law, which is a parameter of the method.
33	Globally-convergent Iteratively Reweighted Least Squares for Robust Regression Problems	Bhaskar Mukhoty, Govind Gopakumar, Prateek Jain, Purushottam Kar	We provide the first global model recovery results for the IRLS (iteratively reweighted least squares) heuristic for robust regression problems.
34	Modularity-based Sparse Soft Graph Clustering	Alexandre Hollocou, Thomas Bonald, Marc Lelarge	We introduce an efficient optimization algorithm to solve this relaxation, that is both memory efficient and local.
35	Pathwise Derivatives for Multivariate Distributions	Martin Jankowiak, Theofanis Karaletsos	We exploit the link between the transport equation and derivatives of expectations to construct efficient pathwise gradient estimators for multivariate distributions.
36	Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning	Bo Liu, Xiao-Tong Yuan, Lezi Wang, Qingshan Liu, Junzhou Huang, Dimitris N. Metaxas	In this paper, we present a sample distributed greedy pursuit method for non-convex sparse learning under cardinality constraint.
37	Vine copula structure learning via Monte Carlo tree search	Bo Chang, Shenyi Pan, Harry Joe	To tackle this problem, we propose a novel approach to learning vine structures using MCTS.
38	Blind Demixing via Wirtinger Flow with Random Initialization	Jialin Dong, Yuanming Shi	To find a simple initialization approach that works as well as spectral initialization, we propose to solve the blind demixing problem via Wirtinger flow with random initialization, which yields a natural implementation.
39	Performance Metric Elicitation from Pairwise Classifier Comparisons	Gaurush Hiranandani, Shant Boodaghians, Ruta Mehta, Oluwasanmi Koyejo	By exploiting key geometric properties of the space of confusion matrices, we obtain provably query efficient algorithms for eliciting linear and linear-fractional performance metrics.
40	Analysis of Network Lasso for Semi-Supervised Regression	Alexander Jung, Natalia Vesselinova	We apply network Lasso to semi-supervised regression problems involving network-structured data.
41	Learning Mixtures of Smooth Product Distributions: Identifiability and Algorithm	Nikos Kargas, Nicholas D. Sidiropoulos	In this work, we focus on multivariate mixtures of non-parametric product distributions and propose a two-stage approach which recovers the component distributions of the mixture under a smoothness condition.
42	Robust Matrix Completion from Quantized Observations	Jie Shen, Pranjal Awasthi, Ping Li	In this work, we consider the noisy statistical model where each observed entry can be flipped with some probability after quantization.
43	Foundations of Sequence-to-Sequence Modeling for Time Series	Zelda Mariet, Vitaly Kuznetsov	We provide the first theoretical analysis of this time series forecasting framework.
44	Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit	Yang Cao, Zheng Wen, Branislav Kveton, Yao Xie	We consider a scenario where the reward distributions may change in a piecewise-stationary fashion at unknown time steps.
45	An Optimal Algorithm for Stochastic Three-Composite Optimization	Renbo Zhao, William B. Haskell, Vincent Y. F. Tan	We develop an optimal primal-dual first-order algorithm for a class of stochastic three-composite convex minimization problems.
46	A Thompson Sampling Algorithm for Cascading Bandits	Wang Chi Cheung, Vincent Tan, Zixin Zhong	We design and analyze TS-Cascade, a Thompson sampling algorithm for the cascading bandit problem.
47	Lifelong Optimization with Low Regret	Yi-Shan Wu, Po-An Wang, Chi-Jen Lu	In this work, we study a problem arising from two lines of works: online optimization and lifelong learning.
48	Sparse Multivariate Bernoulli Processes in High Dimensions	Parthe Pandit, Mojtaba Sahraee-Ardakan, Arash Amini, Sundeep Rangan, Alyson K. Fletcher	We propose and analyze an $\ell_1$-regularized maximum likelihood (ML) estimator under the assumption that the parameter tensor is approximately sparse.
49	An Optimal Algorithm for Stochastic and Adversarial Bandits	Julian Zimmert, Yevgeny Seldin	We derive an algorithm that achieves the optimal (up to constants) pseudo-regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon.
50	Efficient Bayesian Experimental Design for Implicit Models	Steven Kleinegesse, Michael U. Gutmann	We devise a novel experimental design framework for implicit models that improves upon previous work in two ways.
51	Local Saddle Point Optimization: A Curvature Exploitation Approach	Leonard Adolphs, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann	We propose a novel optimization approach that exploits curvature information in order to escape from these undesired stationary points.
52	Testing Conditional Independence on Discrete Data using Stochastic Complexity	Alexander Marx, Jilles Vreeken	We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity.
53	Distributionally Robust Submodular Maximization	Matthew Staib, Bryan Wilder, Stefanie Jegelka	In this paper, we achieve better performance on the actual underlying function f by directly optimizing a combination of bias and variance.
54	A Robust Zero-Sum Game Framework for Pool-based Active Learning	Dixian Zhu, Zhe Li, Xiaoyu Wang, Boqing Gong, Tianbao Yang	In this paper, we present a novel robust zero-sum game framework for pool-based active learning grounded on advanced statistical learning theory.
55	Support and Invertibility in Domain-Invariant Representations	Fredrik D. Johansson, David Sontag, Rajesh Ranganath	In this work, we give generalization bounds for unsupervised domain adaptation that hold for any representation function by acknowledging the cost of non-invertibility.
56	Efficient Inference in Multi-task Cox Process Models	Virginia Aglietti, Theodoros Damoulas, Edwin V. Bonilla	We derive closed-form expressions for the moments of the intensity functions and develop an efficient variational inference algorithm that is orders of magnitude faster than competing deterministic and stochastic approximations of multivariate LGCPs, coregionalization models, and multi-task permanental processes.
57	Optimization of Inf-Convolution Regularized Nonconvex Composite Problems	Emanuel Laude, Tao Wu, Daniel Cremers	In this work, we consider nonconvex composite problems that involve inf-convolution with a Legendre function, which gives rise to an anisotropic generalization of the proximal mapping and Moreau-envelope.
58	On Connecting Stochastic Gradient MCMC and Differential Privacy	Bai Li, Changyou Chen, Hao Liu, Lawrence Carin	We develop theory on the performance of the proposed differentially-private SG-MCMC method.
59	What made you do this? Understanding black-box decisions with sufficient input subsets	Brandon Carter, Jonas Mueller, Siddhartha Jain, David Gifford	Local explanation frameworks aim to rationalize particular decisions made by a black-box prediction model. We instead propose sufficient input subsets that identify minimal subsets of features whose observed values alone suffice for the same decision to be reached, even if all other input feature values are missing.
60	Computation Efficient Coded Linear Transform	Sinong Wang, Jiashang Liu, Ness Shroff, Pengyu Yang	In this paper, we propose a coded computation strategy, referred to as diagonal code, that achieves the optimum recovery threshold and the optimum computation load.
61	Mixing of Hamiltonian Monte Carlo on strongly log-concave distributions 2: Numerical integrators	Oren Mangoubi, Aaron Smith	We obtain quantitative bounds on the mixing properties of the Hamiltonian Monte Carlo (HMC) algorithm with target distribution in d-dimensional Euclidean space, showing that HMC mixes quickly whenever the target log-distribution is strongly concave and has Lipschitz gradients.
62	Temporal Quilting for Survival Analysis	Changhee Lee, William Zame, Ahmed Alaa, Mihaela Schaar	This paper develops a novel approach that combines the collective intelligence of different underlying survival models to produce a valid survival function that is well-calibrated and offers superior discriminative performance at different time horizons.
63	Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms	Mathieu Blondel, Andre Martins, Vlad Niculae	We analyze their properties in depth, showing that they unify many well-known loss functions and make it easy to create useful new ones.
64	On Target Shift in Adversarial Domain Adaptation	Yitong Li, Michael Murias, Samantha Major, Geraldine Dawson, David Carlson	In this work, we propose a method called Domain Adversarial nets for Target Shift (DATS) to address label shift while learning a domain invariant representation.
65	Optimal Testing in the Experiment-rich Regime	Sven Schmit, Virag Shah, Ramesh Johari	Motivated by the widespread adoption of large-scale A/B testing in industry, we propose a new experimentation framework for the setting where potential experiments are abundant (i.e., many hypotheses are available to test), and observations are costly; we refer to this as the experiment-rich regime.
66	Reversible Jump Probabilistic Programming	David A. Roberts, Marcus Gallagher, Thomas Taimre	In this paper we present a method for automatically deriving a Reversible Jump Markov chain Monte Carlo sampler from probabilistic programs that specify the target and proposal distributions.
67	Graph Embedding with Shifted Inner Product Similarity and Its Improved Approximation Capability	Akifumi Okuno, Geewook Kim, Hidetoshi Shimodaira	We propose shifted inner-product similarity (SIPS), which is a novel yet very simple extension of the ordinary inner-product similarity (IPS) for neural-network based graph embedding (GE).
68	High-dimensional Mixed Graphical Model with Ordinal Data: Parameter Estimation and Statistical Inference	Huijie Feng, Yang Ning	We propose a flexible model called Latent Mixed Gaussian Copula Model that simultaneously deals with such mixed data by assuming that the observed ordinal variables are generated by latent variables.
69	Robust Graph Embedding with Noisy Link Weights	Akifumi Okuno, Hidetoshi Shimodaira	We propose $\beta$-graph embedding for robustly learning feature vectors from data vectors and noisy link weights.
70	Exploring Fast and Communication-Efficient Algorithms in Large-Scale Distributed Networks	Yue Yu, Jiaxiang Wu, Junzhou Huang	In this work, we propose a new algorithm LPC-SVRG with quantized gradients and its acceleration ALPC-SVRG to effectively reduce the communication complexity while maintaining the same convergence as the unquantized algorithms.
71	Defending against Whitebox Adversarial Attacks via Randomized Discretization	Yuchen Zhang, Percy Liang	In this paper, we propose and analyze a simple and computationally efficient defense strategy: inject random Gaussian noise, discretize each pixel, and then feed the result into any pre-trained classifier.
72	Fisher Information and Natural Gradient Learning in Random Deep Networks	Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi	The present paper uses a statistical neurodynamical method to reveal the properties of the Fisher information matrix in a net of random connections.
73	Robust descent using smoothed multiplicative noise	Matthew J. Holland	In this work, we propose a novel robust gradient descent procedure which makes use of a smoothed multiplicative noise applied directly to observations before constructing a sum of soft-truncated gradient coordinates.
74	Classification using margin pursuit	Matthew J. Holland	In this work, we study a new approach to optimizing the margin distribution realized by binary classifiers, in which the learner searches the hypothesis space in such a way that a pre-set margin level ends up being a distribution-robust estimator of the margin location.
75	Linear Queries Estimation with Local Differential Privacy	Raef Bassily	We study the problem of estimating a set of d linear queries with respect to some unknown distribution p over a domain $[J]$ based on a sensitive data set of n individuals under the constraint of local differential privacy.
76	Bayesian Learning of Neural Network Architectures	Georgi Dikov, Justin Bayer	In this paper we propose a Bayesian method for estimating architectural parameters of neural networks, namely layer size and network depth.
77	Nonlinear Acceleration of Primal-Dual Algorithms	Raghu Bollapragada, Damien Scieur, Alexandre d'Aspremont	We describe a convergence acceleration scheme for multi-step optimization algorithms.
78	Gaussian Process Latent Variable Alignment Learning	Ieva Kazlauskaite, Carl Henrik Ek, Neill Campbell	We present a model that can automatically learn alignments between high-dimensional data in an unsupervised manner.
79	A Bayesian model for sparse graphs with flexible degree distribution and overlapping community structure	Juho Lee, Lancelot James, Seungjin Choi, Francois Caron	We propose an extension of the model that can accommodate an overlapping community structure.
80	Pseudo-Bayesian Learning with Kernel Fourier Transform as Prior	Gaël Letarte, Emilie Morvant, Pascal Germain	We revisit Rahimi and Recht (2007)’s kernel random Fourier features (RFF) method through the lens of the PAC-Bayesian theory.
81	Forward Amortized Inference for Likelihood-Free Variational Marginalization	Luca Ambrogioni, Umut Güçlü, Julia Berezutskaya, Eva Borne, Yagmur Güçlütürk, Max Hinne, Eric Maris, Marcel Gerven	In this paper, we introduce a new form of amortized variational inference by using the forward KL divergence in a joint-contrastive variational loss.
82	SpikeCaKe: Semi-Analytic Nonparametric Bayesian Inference for Spike-Spike Neuronal Connectivity	Luca Ambrogioni, Patrick Ebel, Max Hinne, Umut Güçlü, Marcel Gerven, Eric Maris	In this paper we introduce a semi-analytic variational framework for approximating the posterior of Gaussian processes coupled through non-linear emission models.
83	Scalable Gaussian Process Inference with Finite-data Mean and Variance Guarantees	Jonathan H. Huggins, Trevor Campbell, Mikolaj Kasprzak, Tamara Broderick	Our main contribution is a novel objective for approximate inference in the nonparametric setting: the preconditioned Fisher (pF) divergence.
84	Exponential convergence rates for Batch Normalization: The power of length-direction decoupling in non-convex optimization	Jonas Kohler, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann, Ming Zhou, Klaus Neymeyr	Our main contribution towards this goal is the identification of various problem instances in the realm of machine learning where Batch Normalization can provably accelerate optimization.
85	A new evaluation framework for topic modeling algorithms based on synthetic corpora	Hanyu Shi, Martin Gerlach, Isabel Diersen, Doug Downey, Luis Amaral	Here, we propose a new framework for the evaluation of topic modeling algorithms based on synthetic corpora containing an unambiguously defined ground truth topic structure.
86	On Kernel Derivative Approximation with Random Fourier Features	Zoltan Szabo, Bharath Sriperumbudur	Particularly, in this paper, we focus on the approximation quality of RFFs for kernel derivatives and prove that the existing finite-sample guarantees can be improved exponentially in terms of the domain where they hold, using recent tools from unbounded empirical process theory.
87	Sequential Neural Likelihood: Fast Likelihood-free Inference with Autoregressive Flows	George Papamakarios, David Sterratt, Iain Murray	We present Sequential Neural Likelihood (SNL), a new method for Bayesian inference in simulator models, where the likelihood is intractable but simulating data from the model is possible.
88	Optimal Transport for Multi-source Domain Adaptation under Target Shift	Ievgen Redko, Nicolas Courty, Rémi Flamary, Devis Tuia	In this paper, we tackle the problem of reducing discrepancies between multiple domains, i.e. multi-source domain adaptation, and consider it under the target shift assumption: in all domains we aim to solve a classification problem with the same output classes, but with different label proportions.
89	Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning	Aapo Hyvarinen, Hiroaki Sasaki, Richard Turner	Here, we propose a general framework for nonlinear ICA, which, as a special case, can make use of temporal structure.
90	Deep Neural Networks Learn Non-Smooth Functions Effectively	Masaaki Imaizumi, Kenji Fukumizu	We derive the generalization error of estimators by DNNs with a ReLU activation, and show that convergence rates of the generalization by DNNs are almost optimal to estimate the non-smooth functions, while some of the popular models do not attain the optimal rate.
91	Attenuating Bias in Word vectors	Sunipa Dev, Jeff Phillips	In this work, we explore new simple ways to detect the most stereotypically gendered words in an embedding and remove the bias from them.
92	Fisher-Rao Metric, Geometry, and Complexity of Neural Networks	Tengyuan Liang, Tomaso Poggio, Alexander Rakhlin, James Stokes	We introduce a new notion of capacity — the Fisher-Rao norm — that possesses desirable invariance properties and is motivated by Information Geometry.
93	Accelerated Decentralized Optimization with Local Updates for Smooth and Strongly Convex Objectives	Hadrien Hendrikx, Francis Bach, Laurent Massoulie	In this paper, we study the problem of minimizing a sum of smooth and strongly convex functions split over the nodes of a network in a decentralized fashion.
94	Interaction Matters: A Note on Non-asymptotic Local Convergence of Generative Adversarial Networks	Tengyuan Liang, James Stokes	Motivated by the pursuit of a systematic computational and algorithmic understanding of Generative Adversarial Networks (GANs), we present a simple yet unified non-asymptotic local convergence theory for smooth two-player games, which subsumes several discrete-time gradient-based saddle point dynamics.
95	On Constrained Nonconvex Stochastic Optimization: A Case Study for Generalized Eigenvalue Decomposition	Zhehui Chen, Xingguo Li, Lin Yang, Jarvis Haupt, Tuo Zhao	Motivated by these neat geometric structures, we propose a simple, efficient, and stochastic primal-dual algorithm solving the online GEV problem.
96	Generalized Boltzmann Machine with Deep Neural Structure	Yingru Liu, Dongliang Xie, Xin Wang	In this paper, to bridge the gap between RBM and artificial neural network, we propose an energy-based probabilistic model that is more flexible on modeling continuous data.
97	Extreme Stochastic Variational Inference: Distributed Inference for Large Scale Mixture Models	Jiong Zhang, Parameswaran Raman, Shihao Ji, Hsiang-Fu Yu, S.V.N. Vishwanathan, Inderjit Dhillon	In this paper, we present extreme stochastic variational inference (ESVI), a distributed, asynchronous and lock-free algorithm to perform variational inference for mixture models on massive real world datasets.
98	Correcting the bias in least squares regression with volume-rescaled sampling	Michal Derezinski, Manfred K. Warmuth, Daniel Hsu	We show this when the additional sample consists of d points drawn jointly according to the input distribution rescaled by the squared volume spanned by the points.
99	Conservative Exploration using Interleaving	Sumeet Katariya, Branislav Kveton, Zheng Wen, Vamsi K. Potluru	We design efficient learning algorithms for this problem, bound their n-step regret, and evaluate them on both synthetic and real-world problems.
100	Conditionally Independent Multiresolution Gaussian Processes	Jalil Taghia, Thomas Schön	Our contribution is a new construction which instead assumes conditional independence among GPs across resolutions.
101	Active Exploration in Markov Decision Processes	Jean Tarbouriech, Alessandro Lazaric	We introduce the active exploration problem in Markov decision processes (MDPs) and a novel learning algorithm to solve it, showing that active exploration in MDPs may be significantly more difficult than in MAB.
102	On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes	Xiaoyu Li, Francesco Orabona	In this paper, we start closing this gap: we theoretically analyze in the convex and non-convex settings a generalized version of the AdaGrad stepsizes.
103	Bandit Online Learning with Unknown Delays	Bingcong Li, Tianyi Chen, Georgios B. Giannakis	This paper deals with bandit online learning, where feedback of unknown delay can emerge in non-stochastic multi-armed bandit (MAB) and bandit convex optimization (BCO) settings.
104	Learning Invariant Representations with Kernel Warping	Yingyi Ma, Vignesh Ganapathiraman, Xinhua Zhang	In this work, we break this limitation by designing a new representation learning algorithm that incorporates invariances beyond transformation.
105	$\beta^3$-IRT: A New Item Response Model and its Applications	Yu Chen, Telmo Silva Filho, Ricardo B. Prudencio, Tom Diethe, Peter Flach	In this paper, we propose the $\beta^3$-IRT model, which models continuous responses and can generate a much enriched family of Item Characteristic Curves.
106	Can You Trust This Prediction? Auditing Pointwise Reliability After Learning	Peter Schulam, Suchi Saria	In this paper, we describe resampling uncertainty estimation (RUE), an algorithm to audit the pointwise reliability of predictions.
107	Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach	Ryo Karakida, Shotaro Akaho, Shun-ichi Amari	The present study reveals novel statistics of FIM that are universal among a wide class of DNNs.
108	Conditional Sparse $L_p$-norm Regression With Optimal Probability	John Hainline, Brendan Juba, Hai S. Le, David Woodruff	We consider the following conditional linear regression problem: the task is to identify both (i) a $k$-DNF condition $c$ and (ii) a linear rule $f$ such that the probability of $c$ is (approximately) at least some given bound $\mu$, and minimizing the $l_p$ loss of $f$ at predicting the target $z$ in the distribution conditioned on $c$.
109	On the Connection Between Learning Two-Layer Neural Networks and Tensor Decomposition	Marco Mondelli, Andrea Montanari	We consider a model with feature vectors $x$, $r$ hidden units with weights $w_i$ and output $y$, i.e., $y=\sum_{i=1}^r \sigma(w_i^{T} x)$, with activation functions given by low-degree polynomials.
110	Autoencoding any Data through Kernel Autoencoders	Pierre Laforgue, Stéphan Clémençon, Florence d’Alche-Buc	This paper investigates a novel algorithmic approach to data representation based on kernel methods.
111	Towards Understanding the Generalization Bias of Two Layer Convolutional Linear Classifiers with Gradient Descent	Yifan Wu, Barnabas Poczos, Aarti Singh	A major challenge in understanding the generalization of deep learning is to explain why (stochastic) gradient descent can exploit the network architecture to find solutions that have good generalization performance when using high capacity models.
112	Learning to Optimize under Non-Stationarity	Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu	We introduce algorithms that achieve state-of-the-art dynamic regret bounds for non-stationary linear stochastic bandit setting.
113	SPONGE: A generalized eigenproblem for clustering signed networks	Mihai Cucuringu, Peter Davies, Aldo Glielmo, Hemant Tyagi	We introduce a principled and theoretically sound spectral method for k-way clustering in signed graphs, where the affinity measure between nodes takes either positive or negative values.
114	Deep Neural Networks with Multi-Branch Architectures Are Intrinsically Less Non-Convex	Hongyang Zhang, Junru Shao, Ruslan Salakhutdinov	In this work, we provide strong guarantees of this quantity for two classes of network architectures.
115	Are we there yet? Manifold identification of gradient-related proximal methods	Yifan Sun, Halyun Jeong, Julie Nutini, Mark Schmidt	In this work we provide a unified view by giving a simple condition under which any proximal method using a constant step size can achieve finite-iteration manifold detection.
116	Hadamard Response: Estimating Distributions Privately, Efficiently, and with Little Communication	Jayadev Acharya, Ziteng Sun, Huanyu Zhang	We propose Hadamard Response (HR), a local privatization scheme that requires no shared randomness and is symmetric with respect to the users.
117	XBART: Accelerated Bayesian Additive Regression Trees	Jingyu He, Saar Yalov, P. Richard Hahn	We present a stochastic hill climbing algorithm that matches the remarkable predictive accuracy of previous BART implementations, but is many times faster and less memory intensive.
118	A Swiss Army Infinitesimal Jackknife	Ryan Giordano, William Stephenson, Runjing Liu, Michael Jordan, Tamara Broderick	In this work, we use a linear approximation to the dependence of the fitting procedure on the weights, producing results that can be faster than repeated re-fitting by an order of magnitude.
119	Online Multiclass Boosting with Bandit Feedback	Daniel T. Zhang, Young Hun Jung, Ambuj Tewari	We present online boosting algorithms for multiclass classification with bandit feedback, where the learner only receives feedback about the correctness of its prediction.
120	Auto-Encoding Total Correlation Explanation	Shuyang Gao, Rob Brekelmans, Greg Ver Steeg, Aram Galstyan	We propose an information-theoretic approach to characterizing disentanglement and dependence in representation learning using multivariate mutual information, also called total correlation.
121	Towards Efficient Data Valuation Based on the Shapley Value	Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gürel, Bo Li, Ce Zhang, Dawn Song, Costas J. Spanos	In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory.
122	Bayesian optimisation under uncertain inputs	Rafael Oliveira, Lionel Ott, Fabio Ramos	In this context, we propose an upper confidence bound (UCB) algorithm for BO problems where both the outcome of a query and the true query location are uncertain.
123	Optimal Minimization of the Sum of Three Convex Functions with a Linear Operator	Seyoon Ko, Joong-Ho Won	We propose a class of optimal-rate primal-dual algorithms for minimization of the sum of three convex functions with a linear operator.
124	Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron	Sharan Vaswani, Francis Bach, Mark Schmidt	For such models, we show that the stochastic gradients of common loss functions satisfy a strong growth condition.
125	No-regret algorithms for online $k$-submodular maximization	Tasuku Soma	We present a polynomial time algorithm for online maximization of $k$-submodular functions.
126	Lagrange Coded Computing: Optimal Design for Resiliency, Security, and Privacy	Qian Yu, Songze Li, Netanel Raviv, Seyed Mohammadreza Mousavi Kalan, Mahdi Soltanolkotabi, Salman A. Avestimehr	We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers.
127	Subsampled Renyi Differential Privacy and Analytical Moments Accountant	Yu-Xiang Wang, Borja Balle, Shiva Prasad Kasiviswanathan	We study the problem of subsampling in differential privacy (DP), a question that is the centerpiece behind many successful differentially private machine learning algorithms.
128	Model Consistency for Learning with Mirror-Stratifiable Regularizers	Jalal Fadili, Guillaume Garrigos, Jérôme Malick, Gabriel Peyré	In this work, we provide the theoretical underpinning of this behavior using the notion of mirror-stratifiable regularizers.
129	From Cost-Sensitive to Tight F-measure Bounds	Kevin Bascol, Rémi Emonet, Elisa Fromont, Amaury Habrard, Guillaume Metzler, Marc Sebban	In this article, we derive theoretical guarantees that give tight bounds on the best F-measure that can be obtained from cost-sensitive learning.
130	Feature subset selection for the multinomial logit model via mixed-integer optimization	Shunsuke Kamiya, Ryuhei Miyashiro, Yuichi Takano	In this paper, we propose an exact algorithm to solve the problem for the MNL model.
131	Low-Precision Random Fourier Features for Memory-constrained Kernel Approximation	Jian Zhang, Avner May, Tri Dao, Christopher Re	Because storing a high-rank approximation is memory intensive, we propose using a low-precision quantization of random Fourier features (LP-RFFs) to build a high-rank approximation under a memory budget.
132	Restarting Frank-Wolfe	Thomas Kerdreux, Alexandre d’Aspremont, Sebastian Pokutta	Here, we present a new variant of Conditional Gradients that can dynamically adapt to the function’s geometric properties using restarts and thus smoothly interpolates between the sublinear and linear regimes.
133	Adaptive Ensemble Prediction for Deep Neural Networks based on Confidence Level	Hiroshi Inoue	We evaluated the adaptive ensembling by using various datasets and showed that it reduces the computation cost significantly while achieving accuracy similar to that of static ensembling using a pre-defined number of local predictions.
134	Infinite Task Learning in RKHSs	Romain Brault, Alex Lambert, Zoltan Szabo, Maxime Sangnier, Florence d’Alche-Buc	In this work, we introduce a novel approach called Infinite Task Learning whose goal is to learn a function whose output is a function over the hyperparameter space.
135	Detection of Planted Solutions for Flat Satisfiability Problems	Quentin Berthet, Jordan Ellenberg	We describe the properties of random instances of flat satisfiability, as well as the optimal rates of detection of the associated hypothesis testing problem.
136	Markov Properties of Discrete Determinantal Point Processes	Kayvan Sadeghi, Alessandro Rinaldo	In this note, we derive the Markov properties of discrete DPPs and show how they can be expressed using graphical models.
137	Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms	Alihan Huyuk, Cem Tekin	We analyze the regret of combinatorial Thompson sampling (CTS) for the combinatorial multi-armed bandit with probabilistically triggered arms under the semi-bandit feedback setting.
138	Distilling Policy Distillation	Wojciech M. Czarnecki, Razvan Pascanu, Simon Osindero, Siddhant Jayakumar, Grzegorz Swirszcz, Max Jaderberg	In this work, we rigorously explore the entire landscape of policy distillation, comparing the motivations and strengths of each variant through theoretical and empirical analysis.
139	Support Localization and the Fisher Metric for off-the-grid Sparse Regularization	Clarice Poon, Nicolas Keriven, Gabriel Peyré	In this article, we study the BLASSO (i.e. the off-the-grid version of l1 LASSO regularization) and show that the Fisher-Rao distance is the natural way to ensure and quantify support recovery, since it preserves the invariance of the problem under reparameterization.
140	Fast Gaussian process based gradient matching for parameter identification in systems of nonlinear ODEs	Philippe Wenk, Alkis Gotovos, Stefan Bauer, Nico S. Gorbach, Andreas Krause, Joachim M. Buhmann	We offer a novel interpretation which leads to a better understanding, improvements in state-of-the-art performance in terms of accuracy and robustness and a decrease in run time due to a more efficient setup for general nonlinear dynamical systems.
141	Semi-Generative Modelling: Covariate-Shift Adaptation with Cause and Effect Features	Julius Kügelgen, Alexander Mey, Marco Loog	Using ideas from causality, we argue that this requires learning with both causes, $X_C$, and effects, $X_E$, of a target variable, $Y$, and show how this setting leads to what we call a semi-generative model, $P(Y,X_E \mid X_C,\theta)$.
142	A Continuous-Time View of Early Stopping for Least Squares Regression	Alnur Ali, J. Zico Kolter, Ryan J. Tibshirani	We study the statistical properties of the iterates generated by gradient descent, applied to the fundamental problem of least squares regression.
143	Towards Clustering High-dimensional Gaussian Mixture Clouds in Linear Running Time	Dan Kushnir, Shirin Jalali, Iraj Saniee	In this paper, we focus on clustering $n$ samples from an arbitrary mixture of $c$-separated Gaussians in $\mathbb{R}^p$ in time that is linear in $p$ and $n$, and sample complexity that is independent of $p$.
144	Classifying Signals on Irregular Domains via Convolutional Cluster Pooling	Angelo Porrello, Davide Abati, Simone Calderara, Rita Cucchiara	We present a novel and hierarchical approach for supervised classification of signals spanning over a fixed graph, reflecting shared properties of the dataset.
145	Learning Rules-First Classifiers	Deborah Cohen, Amit Daniely, Amir Globerson, Gal Elidan	In this work, we focus on one such setting, where a label is perfectly predictable if the input contains certain features, or rules, and otherwise it is predictable by a linear classifier.
146	Wasserstein regularization for sparse multi-task regression	Hicham Janati, Marco Cuturi, Alexandre Gramfort	We focus in this paper on high-dimensional regression problems where each regressor can be associated to a location in a physical space, or more generally a generic geometric space.
147	Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors	Atsushi Nitanda, Taiji Suzuki	In this paper, we show an exponential convergence of the expected classification error in the final phase of the stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions.
148	Black Box Quantiles for Kernel Learning	Anthony Tompkins, Ransalu Senanayake, Philippe Morere, Fabio Ramos	In this paper, we present a novel technique to learn kernels that best fit the data.
149	Adversarial Variational Optimization of Non-Differentiable Simulators	Gilles Louppe, Joeri Hermans, Kyle Cranmer	We introduce Adversarial Variational Optimization (AVO), a likelihood-free inference algorithm for fitting a non-differentiable generative model incorporating ideas from generative adversarial networks, variational optimization and empirical Bayes.
150	Active Probabilistic Inference on Matrices for Pre-Conditioning in Stochastic Optimization	Filip Roos, Philipp Hennig	We propose an iterative algorithm inspired by classic iterative linear solvers that uses a probabilistic model to actively infer a pre-conditioner in situations where Hessian-projections can only be constructed with strong Gaussian noise.
151	Projection Free Online Learning over Smooth Sets	Kfir Levy, Andreas Krause	In this work we focus on the special case where the constraint set is smooth and we have an access to gradient and value oracles of the constraint function.
152	Confidence Scoring Using Whitebox Meta-models with Linear Classifier Probes	Tongfei Chen, Jiri Navratil, Vijay Iyengar, Karthikeyan Shanmugam	We propose a novel confidence scoring mechanism for deep neural networks based on a two-model paradigm involving a base model and a meta-model.
153	Learning Influence-Receptivity Network Structure with Guarantee	Ming Yu, Varun Gupta, Mladen Kolar	In this paper we consider the case where we observe a sequence of noisy adjacency matrices triggered by information/events with different topic distributions.
154	Iterative Bayesian Learning for Crowdsourced Regression	Jungseul Ok, Sewoong Oh, Yunhun Jang, Jinwoo Shin, Yung Yi	In particular, we introduce a Bayesian iterative scheme and show that it provably achieves the optimal mean squared error.
155	Nonconvex Matrix Factorization from Rank-One Measurements	Yuanxin Li, Cong Ma, Yuxin Chen, Yuejie Chi	We consider the problem of recovering low-rank matrices from random rank-one measurements, which spans numerous applications including phase retrieval, quantum state tomography, and learning shallow neural networks with quadratic activations, among others.
156	Fast and Robust Shortest Paths on Manifolds Learned from Data	Georgios Arvanitidis, Soren Hauberg, Philipp Hennig, Michael Schober	We propose a fast, simple and robust algorithm for computing shortest paths and distances on Riemannian manifolds learned from data.
157	Training a Spiking Neural Network with Equilibrium Propagation	Peter O’Connor, Efstratios Gavves, Max Welling	In this paper, we address problem (2) by proposing a way in which Equilibrium Propagation can be implemented with neurons which are constrained to just communicate binary values at each time step.
158	Learning One-hidden-layer ReLU Networks via Gradient Descent	Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu	We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian distribution and the outputs are generated from a noisy teacher network.
159	Gain estimation of linear dynamical systems using Thompson Sampling	Matias I. Müller, Cristian R. Rojas	We present the gain estimation problem for linear dynamical systems as a multi-armed bandit.
160	Universal Hypothesis Testing with Kernels: Asymptotically Optimal Tests for Goodness of Fit	Shengyu Zhu, Biao Chen, Pengfei Yang, Zhitang Chen	We characterize the asymptotic performance of nonparametric goodness of fit testing.
161	Calibrating Deep Convolutional Gaussian Processes	Gia-Lac Tran, Edwin V. Bonilla, John Cunningham, Pietro Michiardi, Maurizio Filippone	In this paper we show that, in fact, current combinations of CNNs and GPs are miscalibrated.
162	Stochastic algorithms with descent guarantees for ICA	Pierre Ablin, Alexandre Gramfort, Jean-François Cardoso, Francis Bach	We develop a new majorization-minimization framework adapted to this loss function.
163	Sample Complexity of Sinkhorn Divergences	Aude Genevay, Lénaïc Chizat, Francis Bach, Marco Cuturi, Gabriel Peyré	We focus in this paper on Sinkhorn divergences (SDs), a regularized variant of OT distances which can interpolate, depending on the regularization strength $\varepsilon$, between OT ($\varepsilon=0$) and MMD ($\varepsilon=\infty$).
164	Adaptive Gaussian Copula ABC	Yanzhi Chen, Michael U. Gutmann	This work presents a simple yet effective ABC algorithm based on the combination of two classical ABC approaches — regression ABC and sequential ABC.
165	Top Feasible Arm Identification	Julian Katz-Samuels, Clayton Scott	We propose a new variant of the top arm identification problem, top feasible arm identification, where there are $K$ arms associated with $D$-dimensional distributions and the goal is to find $m$ arms that maximize some known linear function of their means subject to the constraint that their means belong to a given set $P \subset R^D$.
166	Direct Acceleration of SAGA using Sampled Negative Momentum	Kaiwen Zhou, Qinghua Ding, Fanhua Shang, James Cheng, Danli Li, Zhi-Quan Luo	In this paper, we propose a directly accelerated variant of SAGA using a novel Sampled Negative Momentum (SSNM), which achieves the best known oracle complexity for strongly convex problems (with known strong convexity parameter).
167	Does data interpolation contradict statistical optimality?	Mikhail Belkin, Alexander Rakhlin, Alexandre B. Tsybakov	We show that classical learning methods interpolating the training data can achieve optimal rates for the problems of nonparametric regression and prediction with square loss.
168	Inverting Supervised Representations with Autoregressive Neural Density Models	Charlie Nash, Nate Kushman, Christopher K.I. Williams	We present a method for feature interpretation that makes use of recent advances in autoregressive density estimation models to invert model representations.
169	Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning	Guillaume Rabusseau, Tianyu Li, Doina Precup	In this paper, we unravel a fundamental connection between weighted finite automata (WFAs) and second-order recurrent neural networks (2-RNNs): in the case of sequences of discrete symbols, WFAs and 2-RNNs with linear activation functions are expressively equivalent.
170	A Family of Exact Goodness-of-Fit Tests for High-Dimensional Discrete Distributions	Feras A. Saad, Cameron E. Freer, Nathanael L. Ackerman, Vikash K. Mansinghka	This paper presents a rank-based family of goodness-of-fit tests that is specialized to discrete distributions on high-dimensional domains.
171	Differentially Private Online Submodular Minimization	Adrian Rivera Cardoso, Rachel Cummings	In this paper we develop the first algorithms for online submodular minimization that preserve differential privacy under full information feedback and bandit feedback.
172	Semi-supervised clustering for de-duplication	Shrinu Kushagra, Shai Ben-David, Ihab Ilyas	In this work, we view de-duplication as a clustering problem where the goal is to put records corresponding to the same physical entity in the same cluster and records corresponding to different physical entities into different clusters.
173	Finding the bandit in a graph: Sequential search-and-stop	Pierre Perrault, Vianney Perchet, Michal Valko	In this paper, we address a learning setting where we allow the agent to stop before having found the object and restart searching on a new independent instance of the same problem.
174	Statistical Learning under Nonstationary Mixing Processes	Steve Hanneke, Liu Yang	Under these conditions, we propose a learning method, and establish that for bounded VC subgraph classes, the cumulative excess risk grows sublinearly in the number of predictions, at a quantified rate.
175	On Structure Priors for Learning Bayesian Networks	Ralf Eggeling, Jussi Viinikka, Aleksis Vuoksenmaa, Mikko Koivisto	We investigate empirically several structure priors in combination with different scores, using benchmark data sets and data sets generated from benchmark networks.
176	Partial Optimality of Dual Decomposition for MAP Inference in Pairwise MRFs	Alexander Bauer, Shinichi Nakajima, Nico Goernitz, Klaus-Robert Müller	In this paper, we explain how the fully integral assignments obtained via DD partially agree with the optimal fractional assignments via LP relaxation when the latter is not tight.
177	Sparse Feature Selection in Kernel Discriminant Analysis via Optimal Scoring	Alexander F. Lapanowski, Irina Gaynanova	We consider the two-group classification problem and propose a kernel classifier based on the optimal scoring framework.
178	Learning Natural Programs from a Few Examples in Real-Time	Nagarajan Natarajan, Danny Simmons, Naren Datha, Prateek Jain, Sumit Gulwani	In this work, we develop a novel, real-time, ML-based program ranking algorithm that enables synthesis of natural, user-intended, personalized programs.
179	Truncated Back-propagation for Bilevel Optimization	Amirreza Shaban, Ching-An Cheng, Nathan Hatch, Byron Boots	In this paper, we analyze the properties of this family of approximate gradients and establish sufficient conditions for convergence.
180	Empirical Risk Minimization and Stochastic Gradient Descent for Relational Data	Victor Veitch, Morgane Austern, Wenda Zhou, David M. Blei, Peter Orbanz	We solve this problem using recent ideas from graph sampling theory to (i) define an empirical risk for relational data and (ii) obtain stochastic gradients for this empirical risk that are automatically unbiased.
181	Variable selection for Gaussian processes via sensitivity analysis of the posterior predictive distribution	Topi Paananen, Juho Piironen, Michael Riis Andersen, Aki Vehtari	To improve on this, we propose two novel variable selection methods for Gaussian process models that utilize the predictions of a full model in the vicinity of the training points and thereby rank the variables based on their predictive relevance.
182	Lifted Weight Learning of Markov Logic Networks Revisited	Ondrej Kuzelka, Vyacheslav Kungurtsev	We study lifted weight learning of Markov logic networks.
183	Causal Discovery in the Presence of Missing Data	Ruibo Tu, Cheng Zhang, Paul Ackermann, Karthika Mohan, Hedvig Kjellström, Kun Zhang	In this paper, we aim at developing a causal discovery method to recover the underlying causal structure from observed data that are missing under different mechanisms, including missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).
184	Learning Tree Structures from Noisy Data	Konstantinos E. Nikolakakis, Dionysios S. Kalogerias, Anand D. Sarwate	We study the impact of measurement noise on the task of learning the underlying tree structure via the well-known Chow-Liu algorithm and provide formal sample complexity guarantees for exact recovery.
185	Active multiple matrix completion with adaptive confidence sets	Andrea Locatelli, Alexandra Carpentier, Michal Valko	We address the problem of an active setting for matrix completion, where the learner can choose from which matrix it receives a sample (drawn uniformly at random).
186	Confidence-based Graph Convolutional Networks for Semi-Supervised Learning	Shikhar Vashishth, Prateek Yadav, Manik Bhandari, Partha Talukdar	We fill this important gap in this paper and propose ConfGCN, which estimates label scores along with their confidences jointly in a GCN-based setting.
187	Negative Momentum for Improved Game Dynamics	Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, Rémi Le Priol, Gabriel Huang, Simon Lacoste-Julien, Ioannis Mitliagkas	In this paper, we analyze gradient-based methods with momentum on simple games.
188	Deep learning with differential Gaussian process flows	Pashupati Hegde, Markus Heinonen, Harri Lähdesmäki, Samuel Kaski	We propose a novel deep learning paradigm of differential flows that learn stochastic differential equation transformations of inputs prior to a standard classification or regression function.
189	Data-dependent compression of random features for large-scale kernel approximation	Raj Agrawal, Trevor Campbell, Jonathan Huggins, Tamara Broderick	We propose to combine the simplicity and generality of RFMs with a data-dependent feature selection scheme to achieve desirable theoretical approximation properties of Nyström with just $O(\log J+)$ features.
190	Large-Margin Classification in Hyperbolic Space	Hyunghoon Cho, Benjamin DeMeo, Jian Peng, Bonnie Berger	To enable accurate classification of points in hyperbolic space while respecting their hyperbolic geometry, we introduce hyperbolic SVM, a hyperbolic formulation of support vector machine classifiers, and describe its theoretical connection to the Euclidean counterpart.
191	Generalizing the theory of cooperative inference	Pei Wang, Pushpi Paranamana, Patrick Shafto	We provide geometric interpretations, connections to and implications for optimal transport, and connections to importance sampling, and conclude by outlining open questions and challenges to realizing the promise of Cooperative Inference.
192	MaxHedge: Maximizing a Maximum Online	Stephen Pasteris, Fabio Vitale, Kevin Chan, Shiqiang Wang, Mark Herbster	We introduce a new online learning framework where, at each trial, the learner is required to select a subset of actions from a given known action set.
193	The Gaussian Process Autoregressive Regression Model (GPAR)	James Requeima, William Tebbutt, Wessel Bruinsma, Richard E. Turner	We present the Gaussian Process Autoregressive Regression (GPAR) model, a scalable multi-output GP model that is able to capture nonlinear, possibly input-varying, dependencies between outputs in a simple and tractable way: the product rule is used to decompose the joint distribution over the outputs into a set of conditionals, each of which is modelled by a standard GP.
194	Towards Optimal Transport with Global Invariances	David Alvarez-Melis, Stefanie Jegelka, Tommi S. Jaakkola	In this work, we propose a general framework for optimal transport in the presence of latent global transformations.
195	Unsupervised Alignment of Embeddings with Wasserstein Procrustes	Edouard Grave, Armand Joulin, Quentin Berthet	In this paper, we propose to use an alternative formulation, based on the joint estimation of an orthogonal matrix and a permutation matrix.
196	Sequential Patient Recruitment and Allocation for Adaptive Clinical Trials	Onur Atan, William R. Zame, Mihaela Schaar	Because finding the exact solution to this Markov Decision Process is computationally intractable, we propose an algorithm, Knowledge Gradient for Randomized Controlled Trials (RCT-KG), that yields an approximate solution.
197	Probabilistic Forecasting with Spline Quantile Function RNNs	Jan Gasthaus, Konstantinos Benidis, Yuyang Wang, Syama Sundar Rangapuram, David Salinas, Valentin Flunkert, Tim Januschowski	In this paper, we propose a flexible method for probabilistic modeling with conditional quantile functions using monotonic regression splines.
198	Exponential Weights on the Hypercube in Polynomial Time	Sudeep Raja Putta, Abhishek Shetty	In this paper, we present a polynomial time algorithm called PolyExp for OLO on the hypercube.
199	Sharp Analysis of Learning with Discrete Losses	Alex Nowak, Francis Bach, Alessandro Rudi	In this paper we study a least-squares framework to systematically design learning algorithms for discrete losses, with quantitative characterizations in terms of statistical and computational complexity.
200	Designing Optimal Binary Rating Systems	Nikhil Garg, Ramesh Johari	We consider the optimal design of rating systems that collect binary feedback after transactions.
201	Stochastic Negative Mining for Learning with Large Output Spaces	Sashank J. Reddi, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Jiecao Chen, Sanjiv Kumar	We consider the problem of retrieving the most relevant labels for a given input when the size of the output space is very large.
202	Learning One-hidden-layer Neural Networks under General Input Distributions	Weihao Gao, Ashok V. Makkuva, Sewoong Oh, Pramod Viswanath	In this paper, we provide a novel unified framework to design loss functions with desirable landscape properties for a wide range of general input distributions.
203	A Geometric Perspective on the Transferability of Adversarial Directions	Zachary Charles, Harrison Rosenberg, Dimitris Papailiopoulos	In this work, we show that in the context of linear classifiers and two-layer ReLU networks, there provably exist directions that give rise to adversarial perturbations for many classifiers and data points simultaneously.
204	Non-linear process convolutions for multi-output Gaussian processes	Mauricio A. Alvarez, Wil Ward, Cristian Guarnizo	The paper introduces a non-linear version of the process convolution formalism for building covariance functions for multi-output Gaussian processes.
205	Lovasz Convolutional Networks	Prateek Yadav, Madhav Nimishakavi, Naganand Yadati, Shikhar Vashishth, Arun Rajkumar, Partha Talukdar	In this work, we propose Lovasz Convolutional Networks (LCNs), which are capable of incorporating global graph properties.
206	Bridging the gap between regret minimization and best arm identification, with application to A/B tests	Rémy Degenne, Thomas Nedelec, Clement Calauzenes, Vianney Perchet	This analysis sheds light on the common observation that ill-calibrated UCB-algorithms minimize regret while still identifying quickly the best arm.
207	Gaussian Process Modulated Cox Processes under Linear Inequality Constraints	Andrés F. Lopez-Lopera, ST John, Nicolas Durrande	We introduce a novel finite approximation of GP-modulated Cox processes where positiveness conditions can be imposed directly on the GP, with no restrictions on the covariance function.
208	Implicit Kernel Learning	Chun-Liang Li, Wei-Cheng Chang, Youssef Mroueh, Yiming Yang, Barnabas Poczos	We investigate two applications of the proposed IKL as examples, including generative adversarial networks with MMD (MMD GAN) and standard supervised learning.
209	Bounding Inefficiency of Equilibria in Continuous Actions Games using Submodularity and Curvature	Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause	In this paper, we define the class of valid utility games with continuous strategies and provide efficiency bounds for their CCEs.
210	Variational Information Planning for Sequential Decision Making	Jason Pacheco, John Fisher	As MI typically lacks a closed form, we propose an approach which maintains variational approximations of both the posterior and the MI utility.
211	Renyi Differentially Private ERM for Smooth Objectives	Chen Chen, Jaewoo Lee, Dan Kifer	In this paper, we present a Renyi Differentially Private stochastic gradient descent (SGD) algorithm for convex empirical risk minimization.
212	Projection-Free Bandit Convex Optimization	Lin Chen, Mingrui Zhang, Amin Karbasi	In this paper, we propose the first computationally efficient projection-free algorithm for bandit convex optimization (BCO) with a general convex constraint.
213	Provable Robustness of ReLU networks via Maximization of Linear Regions	Francesco Croce, Maksym Andriushchenko, Matthias Hein	We propose in this paper a regularization scheme for ReLU networks which provably improves the robustness of the classifier by maximizing the linear regions of the classifier as well as the distance to the decision boundary.
214	Test without Trust: Optimal Locally Private Distribution Testing	Jayadev Acharya, Clement Canonne, Cody Freitag, Himanshu Tyagi	We study the problem of distribution testing when the samples can only be accessed using a locally differentially private mechanism and focus on two representative testing questions of identity (goodness-of-fit) and independence testing for discrete distributions.
215	Distributed Maximization of “Submodular plus Diversity” Functions for Multi-label Feature Selection on Huge Datasets	Mehrdad Ghadiri, Mark Schmidt	In this paper, we consider this problem as an optimization problem that seeks to maximize the sum of a sum-sum diversity function and a non-negative monotone submodular function.
216	On Euclidean k-Means Clustering with alpha-Center Proximity	Amit Deshpande, Anand Louis, Apoorv Singh	$k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are stable under additive or multiplicative perturbation of data.
217	Noisy Blackbox Optimization using Multi-fidelity Queries: A Tree Search Approach	Rajat Sen, Kirthevasan Kandasamy, Sanjay Shakkottai	In this work, we combine structured state-space exploration through hierarchical partitioning with querying these partitions at multiple fidelities, and develop a multi-fidelity bandit based tree-search algorithm for noisy black-box optimization.
218	Safe Convex Learning under Uncertain Constraints	Ilnura Usmanova, Andreas Krause, Maryam Kamgarpour	We propose a new variant of the Frank-Wolfe algorithm, which applies to the case of uncertain linear constraints.
219	The non-parametric bootstrap and spectral analysis in moderate and high-dimension	Noureddine El Karoui, Elizabeth Purdom	We focus on the modern framework where p/n is not close to 0 but remains bounded as n and p tend to infinity.
220	Knockoffs for the Mass: New Feature Importance Statistics with False Discovery Guarantees	Jaime Roquero Gimenez, Amirata Ghorbani, James Zou	We develop an efficient algorithm to generate valid knockoffs from Bayesian Networks.
221	Training Variational Autoencoders with Buffered Stochastic Variational Inference	Rui Shu, Hung Bui, Jay Whang, Stefano Ermon	In this paper, we propose the Buffered Stochastic Variational Inference (BSVI), a new refinement procedure that makes use of SVI’s sequence of intermediate variational proposal distributions and their corresponding importance weights to construct a new generalized importance-weighted lower bound.
222	Regularized Contextual Bandits	Xavier Fontaine, Quentin Berthet, Vianney Perchet	To tackle this problem we use a nonparametric model and propose an algorithm splitting the context space into bins, and solving simultaneously — and independently — regularized multi-armed bandit instances on each bin.
223	Risk-Sensitive Generative Adversarial Imitation Learning	Jonathan Lacotte, Mohammad Ghavamzadeh, Yinlam Chow, Marco Pavone	We consider the generative adversarial approach to imitation learning (GAIL) and derive an optimization problem for our formulation, which we call risk-sensitive GAIL (RS-GAIL).
224	Learning Controllable Fair Representations	Jiaming Song, Pratyusha Kalluri, Aditya Grover, Shengjia Zhao, Stefano Ermon	We propose an information-theoretically motivated objective for learning maximally expressive representations subject to fairness constraints.
225	Multi-Task Time Series Analysis applied to Drug Response Modelling	Alex Bird, Chris Williams, Christopher Hawthorne	In this paper we show how these models can be personalised to an individual level while retaining statistical power, via use of multi-task learning (MTL).
226	Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization	Jaime Roquero Gimenez, James Zou	Here we present a generalization of the knockoff procedure that we call simultaneous multi-knockoffs.
227	Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features	Arno Solin, Manon Kok	Gaussian processes (GPs) provide a powerful framework for extrapolation, interpolation, and noise removal in regression and classification.
228	Distributional reinforcement learning with linear function approximation	Marc G. Bellemare, Nicolas Le Roux, Pablo Samuel Castro, Subhodeep Moitra	In this paper we adapt the Cramer distance to deal with arbitrary vectors.
229	Matroids, Matchings, and Fairness	Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii	In this work we extend this line of work to include algorithms for optimization subject to one or multiple matroid constraints.
230	Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function	Wojciech Tarnowski, Piotr Warchol, Stanislaw Jastrzębski, Jacek Tabor, Maciej Nowak	We propose that in ResNets this can be resolved based on our results by ensuring the same level of dynamical isometry at initialization.
231	The Termination Critic	Anna Harutyunyan, Will Dabney, Diana Borsa, Nicolas Heess, Remi Munos, Doina Precup	In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents.
232	Consistent Online Optimization: Convex and Submodular	Mohammad Reza Karimi Jaghargh, Andreas Krause, Silvio Lattanzi, Sergei Vassilvitskii	In this work we develop online learning algorithms that update a sublinear number of times.
233	Learning Determinantal Point Processes by Corrective Negative Sampling	Zelda Mariet, Mike Gartrell, Suvrit Sra	To address this issue, which reduces the quality of the learned model, we introduce a novel optimization problem, Contrastive Estimation (CE), which encodes information about "negative" samples into the basic learning model.
234	Probabilistic Semantic Inpainting with Pixel Constrained CNNs	Emilien Dupont, Suhas Suresha	In this paper, we propose a method to perform probabilistic semantic inpainting by building a model, based on PixelCNNs, that learns a distribution of images conditioned on a subset of visible pixels.
235	Least Squares Estimation of Weakly Convex Functions	Sun Sun, Yaoliang Yu	In this work we argue that convexity, as a global property, is too strict and prone to outliers.
236	Interval Estimation of Individual-Level Causal Effects Under Unobserved Confounding	Nathan Kallus, Xiaojie Mao, Angela Zhou	We study the problem of learning conditional average treatment effects (CATE) from observational data with unobserved confounders.
237	Amortized Variational Inference with Graph Convolutional Networks for Gaussian Processes	Linfeng Liu, Liping Liu	In this paper, we focus on the approximation of GP posterior at a local level: we define a reusable template to approximate the posterior at neighborhoods while maintaining a global approximation.
238	Online Decentralized Leverage Score Sampling for Streaming Multidimensional Time Series	Rui Xie, Zengyan Wang, Shuyang Bai, Ping Ma, Wenxuan Zhong	We propose a leverage score sampling (LSS) method for efficient online inference of the streaming vector autoregressive (VAR) model.
239	Interpretable Cascade Classifiers with Abstention	Matthieu Clertant, Nataliya Sokolovska, Yann Chevaleyre, Blaise Hanczar	In this contribution, we develop a POMDP-based framework to learn cost-sensitive heterogeneous cascading systems.
240	Kernel Exponential Family Estimation via Doubly Dual Embedding	Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He	We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space.
241	Revisiting Adversarial Risk	Arun Sai Suggala, Adarsh Prasad, Vaishnavh Nagarajan, Pradeep Ravikumar	In this work, we show that the main reason behind this confusion is the inaccurate definition of adversarial perturbation that is used in the literature.
242	A Memoization Framework for Scaling Submodular Optimization to Large Scale Problems	Rishabh Iyer, Jeffrey Bilmes	In this paper, we present a new model called the pre-computational complexity model, along with a unifying memoization based framework, which looks at the specific form of the given submodular function.
243	Bernoulli Race Particle Filters	Sebastian M. Schmon, Arnaud Doucet, George Deligiannidis	We propose here a novel algorithm that allows for resampling according to the true intractable weights when only an unbiased estimator of the weights is available.
244	Augmented Ensemble MCMC sampling in Factorial Hidden Markov Models	Kaspar Märtens, Michalis Titsias, Christopher Yau	We introduce a general purpose ensemble Markov Chain Monte Carlo (MCMC) technique to improve on existing poorly mixing samplers.
245	Probabilistic Riemannian submanifold learning with wrapped Gaussian process latent variable models	Anton Mallasto, Søren Hauberg, Aasa Feragen	Latent variable models (LVMs) learn probabilistic models of data manifolds lying in an ambient Euclidean space.
246	Unbiased Smoothing using Particle Independent Metropolis-Hastings	Lawrence Middleton, George Deligiannidis, Arnaud Doucet, Pierre E. Jacob	We propose a simple way of coupling two MCMC chains built using Particle Independent Metropolis-Hastings (PIMH) to produce unbiased smoothing estimators.
247	Two-temperature logistic regression based on the Tsallis divergence	Ehsan Amid, Manfred K. Warmuth, Sriram Srinivasan	We develop a variant of multiclass logistic regression that is significantly more robust to noise.
248	Avoiding Latent Variable Collapse with Generative Skip Models	Adji B. Dieng, Yoon Kim, Alexander M. Rush, David M. Blei	In this paper, we propose a simple new way to avoid latent variable collapse by including skip connections in our generative model; these connections enforce strong links between the latent variables and the likelihood function.
249	SMOGS: Social Network Metrics of Game Success	Fan Bu, Sonia Xu, Katherine Heller, Alexander Volfovsky	In this paper we propose a novel metric of basketball game success, derived from a team’s dynamic social network of game play.
250	Fast Algorithms for Sparse Reduced-Rank Regression	Benjamin Dubois, Jean-François Delmas, Guillaume Obozinski	In particular, based on an analysis of the geometry of the problem, we establish that a proximal Polyak-Łojasiewicz inequality is satisfied in a neighborhood of the set of optima under a condition on the regularization parameter.
251	Modeling simple structures and geometry for better stochastic optimization algorithms	Hilal Asi, John C. Duchi	We develop model-based methods for stochastic optimization problems, introducing the approximate-proximal point, or aProx, family, which includes stochastic subgradient, proximal point, and bundle methods.
252	Online learning with feedback graphs and switching costs	Anshuka Rangi, Massimo Franceschetti	We propose two new algorithms: Threshold Based EXP3 and EXP3.SC.
253	Interpretable Almost-Exact Matching for Causal Inference	Awa Dieng, Yameng Liu, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky	The method proposed in this work aims to match units on a weighted Hamming distance, taking into account the relative importance of the covariates; the algorithm aims to match units on as many relevant variables as possible.
254	Statistical Optimal Transport via Factored Couplings	Aden Forrow, Jan-Christian Hütter, Mor Nitzan, Philippe Rigollet, Geoffrey Schiebinger, Jonathan Weed	We propose a new method to estimate Wasserstein distances and optimal transport plans between two probability distributions from samples in high dimension.
255	$HS^2$: Active learning over hypergraphs with pointwise and pairwise queries	I (Eli) Chien, Huozhi Zhou, Pan Li	We propose a hypergraph-based active learning scheme which we term $HS^2$; $HS^2$ generalizes the previously reported algorithm $S^2$ originally proposed for graph-based active learning with pointwise queries.
256	Clustering Time Series with Nonlinear Dynamics: A Bayesian Non-Parametric and Particle-Based Approach	Alexander Lin, Yingzhuo Zhang, Jeremy Heng, Stephen A. Allsop, Kay M. Tye, Pierre E. Jacob, Demba Ba	We propose a general statistical framework for clustering multiple time series that exhibit nonlinear dynamics into an a-priori-unknown number of sub-groups.
257	Efficient Nonconvex Empirical Risk Minimization via Adaptive Sample Size Methods	Aryan Mokhtari, Asuman Ozdaglar, Ali Jadbabaie	In this paper, we are interested in finding a local minimizer of an empirical risk minimization (ERM) problem where the loss associated with each sample is possibly a nonconvex function.
258	An Optimal Control Approach to Sequential Machine Teaching	Laurent Lessard, Xuezhou Zhang, Xiaojin Zhu	Our key insight is to formulate sequential machine teaching as a time-optimal control problem.
259	An Online Algorithm for Smoothed Regression and LQR Control	Gautam Goel, Adam Wierman	We consider Online Convex Optimization (OCO) in the setting where the costs are $m$-strongly convex and the online learner pays a switching cost for changing decisions between rounds.
260	Uncertainty Autoencoders: Learning Compressed Representations via Variational Information Maximization	Aditya Grover, Stefano Ermon	In this work, we propose Uncertainty Autoencoders, a learning framework for unsupervised representation learning inspired by compressed sensing.
261	Structured Disentangled Representations	Babak Esmaeili, Hao Wu, Sarthak Jain, Alican Bozkurt, N Siddharth, Brooks Paige, Dana H. Brooks, Jennifer Dy, Jan-Willem van de Meent	We propose a two-level hierarchical objective to control relative degree of statistical independence between blocks of variables and individual variables within blocks.
262	Estimating Network Structure from Incomplete Event Data	Benjamin Mark, Garvesh Raskutti, Rebecca Willett	In this work, we develop a novel approach to estimating the parameters of a BAR process in the presence of unobserved events via an unbiased estimator of the complete data log-likelihood function.
263	Locally Private Mean Estimation: $Z$-test and Tight Confidence Intervals	Marco Gaboardi, Ryan Rogers, Or Sheffet	This work provides tight upper- and lower-bounds for the problem of mean estimation under differential privacy in the local-model, when the input is composed of $n$ i.i.d. drawn samples from a Gaussian.
264	Estimation of Non-Normalized Mixture Models	Takeru Matsuda, Aapo Hyvärinen	We develop a general method for estimating a finite mixture of non-normalized models.
265	Rotting bandits are no harder than stochastic ones	Julien Seznec, Andrea Locatelli, Alexandra Carpentier, Alessandro Lazaric, Michal Valko	In this paper, we consider the non-parametric rotting bandit setting, where rewards can only decrease.
266	A Topological Regularizer for Classifiers via Persistent Homology	Chao Chen, Xiuyan Ni, Qinxun Bai, Yusu Wang	In this paper, we initiate a new direction and propose to enforce the structural simplicity of the classification boundary by regularizing over its topological complexity.
267	Overcomplete Independent Component Analysis via SDP	Anastasia Podosinnikova, Amelia Perry, Alexander S. Wein, Francis Bach, Alexandre d’Aspremont, David Sontag	We present a novel algorithm for overcomplete independent components analysis (ICA), where the number of latent sources k exceeds the dimension p of observed variables.
268	Doubly Semi-Implicit Variational Inference	Dmitry Molchanov, Valery Kharitonov, Artem Sobolev, Dmitry Vetrov	We extend the existing framework of semi-implicit variational inference (SIVI) and introduce doubly semi-implicit variational inference (DSIVI), a way to perform variational inference and learning when both the approximate posterior and the prior distribution are semi-implicit.
269	Reducing training time by efficient localized kernel regression	Nicole Mücke	We study generalization properties of kernel regularized least squares regression based on a partitioning approach.
270	Scalable High-Order Gaussian Process Regression	Shandian Zhe, Wei Xing, Robert M. Kirby	To address these issues, we propose HOGPR, a High-Order Gaussian Process Regression model, which can flexibly capture complex correlations among the outputs and scale up to a large number of outputs.
271	A Higher-Order Kolmogorov-Smirnov Test	Veeranjaneyulu Sadhanala, Yu-Xiang Wang, Aaditya Ramdas, Ryan J. Tibshirani	We present an extension of the Kolmogorov-Smirnov (KS) two-sample test, which can be more sensitive to differences in the tails.
272	Bayesian Learning of Conditional Kernel Mean Embeddings for Automatic Likelihood-Free Inference	Kelvin Hsu, Fabio Ramos	In this paper, we present kernel embedding likelihood-free inference (KELFI), a holistic framework that automatically learns model hyperparameters to improve inference accuracy given limited simulation budget.
273	Parallel Asynchronous Stochastic Coordinate Descent with Auxiliary Variables	Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit S. Dhillon	In this paper, we consider a rich family of objective functions where AUX-PCD can be applied.
274	Credit Assignment Techniques in Stochastic Computation Graphs	Théophane Weber, Nicolas Heess, Lars Buesing, David Silver	In this work, we address these problems by generalizing concepts from the reinforcement learning literature.
275	Efficient Bayesian Optimization for Target Vector Estimation	Anders Kirk Uhrenholt, Bjøern Sand Jensen	We address this issue by proposing a Gaussian process model that considers the individual function outputs and derive a distribution over the resulting 2-norm.
276	Correspondence Analysis Using Neural Networks	Hsiang Hsu, Salman Salamatian, Flavio P. Calmon	By re-interpreting the objective in CA using an information-theoretic tool called the principal inertia components, we demonstrate that performing CA is equivalent to solving a functional optimization problem over the space of finite variance functions of two random variables.
277	Interpolating between Optimal Transport and MMD using Sinkhorn Divergences	Jean Feydy, Thibault Séjourné, François-Xavier Vialard, Shun-ichi Amari, Alain Trouvé, Gabriel Peyré	This paper studies the Sinkhorn divergences, a family of geometric divergences that interpolates between MMD and OT.
278	Multi-Observation Regression	Rafael Frongillo, Nishant A. Mehta, Tom Morgan, Bo Waggoner	We propose four algorithms formalizing the concept of ERM for this problem, two of which have statistical guarantees in settings allowing both slow and fast convergence rates, but which are out-performed empirically by the other two.
279	Adaptive MCMC via Combining Local Samplers	Kiárash Shaloudegi, András György	Here we take a different approach and, similarly to parallel MCMC methods, instead of trying to find a single chain that samples from the whole distribution, we combine samples from several chains run in parallel, each exploring only parts of the state space (e.g., a few modes only).
280	Variance reduction properties of the reparameterization trick	Ming Xu, Matias Quiroz, Robert Kohn, Scott A. Sisson	We explore this under the idealized assumptions that the variational approximation is a mean-field Gaussian density and that the log of the joint density of the model parameters and the data is a quadratic function that depends on the variational mean.
281	Hierarchical Clustering for Euclidean Data	Moses Charikar, Vaggos Chatziafratis, Rad Niazadeh, Grigory Yaroslavtsev	In this paper we take the first step and give novel scalable algorithms for this problem tailored to Euclidean data in R^d and under vector-based similarity measures, a prevalent model in several typical machine learning applications.
282	Stochastic Variance-Reduced Cubic Regularization for Nonconvex Optimization	Zhe Wang, Yi Zhou, Yingbin Liang, Guanghui Lan	In this paper, we propose a stochastic variance-reduced cubic-regularization (SVRC) method under random sampling, and study its convergence guarantee as well as sample complexity.
283	Variational Noise-Contrastive Estimation	Benjamin Rhodes, Michael U. Gutmann	To increase the number of techniques in our arsenal, we propose variational noise-contrastive estimation (VNCE), building on NCE which is a method that only applies to unnormalised models.
284	Improving Quadrature for Constrained Integrands	Henry R. Chai, Roman Garnett	We present an improved Bayesian framework for performing inference of affine transformations of constrained functions.
285	High Dimensional Inference in Partially Linear Models	Ying Zhu, Zhuqing Yu, Guang Cheng	We propose two semiparametric versions of the debiased Lasso procedure for the model $Y_{i}=X_{i}\beta_{0}+g_{0}(Z_{i})+\varepsilon_{i}$, where the parameter vector of interest $\beta_{0}$ is high dimensional but sparse (exactly or approximately) and $g_{0}$ is an unknown nuisance function.
286	Cost aware Inference for IoT Devices	Pengkai Zhu, Durmus Alp Emre Acar, Nan Feng, Prateek Jain, Venkatesh Saligrama	We propose to train a decentralized gated network that, given an observed instance at test-time, allows for activation of select devices to transmit information to a central node, which then performs inference.
287	Banded Matrix Operators for Gaussian Markov Models in the Automatic Differentiation Era	Nicolas Durrande, Vincent Adam, Lucas Bordeaux, Stefanos Eleftheriadis, James Hensman	The aim of the paper is to make modern inference methods (such as variational inference or gradient-based sampling) available for Gaussian models with banded precision.
288	A Unified Weight Learning Paradigm for Multi-view Learning	Lai Tian, Feiping Nie, Xuelong Li	In this paper, we present a Unified Paradigm (UP) that contains the aforementioned three popular paradigms as special cases.
289	Region-Based Active Learning	Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Ningshan Zhang	We give a detailed theoretical analysis of ORIWAL, including generalization error guarantees and bounds on the number of points labeled, in terms of both the hypothesis set used in each region and the probability mass of that region.
290	Precision Matrix Estimation with Noisy and Missing Data	Roger Fan, Byoungwook Jang, Yuekai Sun, Shuheng Zhou	We develop an alternating direction method of multipliers (ADMM) algorithm for these problems, providing a feasible algorithm to estimate precision matrices with indefinite input and potentially nonconvex penalties.
291	Exploring $k$ out of Top $\rho$ Fraction of Arms in Stochastic Bandits	Wenbo Ren, Jia Liu, Ness B. Shroff	We prove lower bounds for the four variants (finite or infinite arms, and known or unknown threshold), and propose algorithms for each.
292	AutoML from Service Provider’s Perspective: Multi-device, Multi-tenant Model Selection with GP-EI	Chen Yu, Bojan Karlaš, Jie Zhong, Ce Zhang, Ji Liu	In this paper, we focus on the AutoML problem from the service provider’s perspective, motivated by the following practical consideration: When an AutoML service needs to serve multiple users with multiple devices at the same time, how can we allocate these devices to users in an efficient way?
293	On Theory for BART	Veronika Ročková, Enakshi Saha	Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties.
294	Deep Topic Models for Multi-label Learning	Rajat Panda, Ankit Pensia, Nikhil Mehta, Mingyuan Zhou, Piyush Rai	We present a probabilistic framework for multi-label learning based on a deep generative model for the binary label vector associated with each observation.
295	On the Dynamics of Gradient Descent for Autoencoders	Thanh V. Nguyen, Raymond K. W. Wong, Chinmay Hegde	We focus on three generative models for data that are common in statistical machine learning: (i) the mixture-of-gaussians model, (ii) the sparse coding model, and (iii) the sparsity model with non-negative coefficients.
296	Complexities in Projection-Free Stochastic Non-convex Minimization	Zebang Shen, Cong Fang, Peilin Zhao, Junzhou Huang, Hui Qian	For constrained nonconvex minimization problems, we propose a meta stochastic projection-free optimization algorithm, named Normalized Frank Wolfe Updating, that can take any Gradient Estimator (GE) as input.
297	Differentiable Antithetic Sampling for Variance Reduction in Stochastic Variational Inference	Mike Wu, Noah Goodman, Stefano Ermon	In this paper, we explore a technique that uses correlated, but more representative, samples to reduce estimator variance.
298	Efficient Greedy Coordinate Descent for Composite Problems	Sai Praneeth Karimireddy, Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi	In this paper, we consider greedy updates that are based on subgradients for a class of non-smooth composite problems, including $L1$-regularized problems, SVMs and related applications.
299	Decentralized Gradient Tracking for Continuous DR-Submodular Maximization	Jiahao Xie, Chao Zhang, Zebang Shen, Chao Mi, Hui Qian	In this paper, we focus on the continuous DR-submodular maximization over a network.
300	Adaptive Rao-Blackwellisation in Gibbs Sampling for Probabilistic Graphical Models	Craig Kelly, Somdeb Sarkhel, Deepak Venugopal	In this paper, we propose an adaptive approach for Rao-Blackwellisation, where we add parallel Markov chains defined over different collapsed PGM structures.
301	Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems	Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter Bartlett, Martin Wainwright	We study derivative-free methods for policy optimization over the class of linear policies.
302	Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective	Anirudh Vemula, Wen Sun, J. Bagnell	We examine these black-box methods closely to identify situations in which they are worse than action space exploration methods and those in which they are superior.
303	Sampling from Non-Log-Concave Distributions via Variance-Reduced Gradient Langevin Dynamics	Difan Zou, Pan Xu, Quanquan Gu	We study stochastic variance reduction-based Langevin dynamic algorithms, SVRG-LD and SAGA-LD (Dubey et al., 2016), for sampling from non-log-concave distributions.
304	Graph to Graph: a Topology Aware Approach for Graph Structures Learning and Generation	Mingming Sun, Ping Li	Primarily, we focus on the issue of how to effectively learn the topology of the source graph and then decode it to form the topology of the target graph.
305	Imitation-Regularized Offline Learning	Yifei Ma, Yu-Xiang Wang, Balakrishnan Narayanaswamy	With probability logging, our main theoretical contribution connects IML-underfitting to the existence of either confounding variables or model misspecification.
306	A maximum-mean-discrepancy goodness-of-fit test for censored data	Tamara Fernandez, Arthur Gretton	We introduce a kernel-based goodness-of-fit test for censored data, where observations may be missing in random time intervals: a common occurrence in clinical trials and industrial life-testing.
307	Sobolev Descent	Youssef Mroueh, Tom Sercu, Anant Raj	We introduce Sobolev descent that constructs similar paths by following gradient flows of a critic function in a kernel space or parametrized by a neural network.
308	Learning the Structure of a Nonstationary Vector Autoregression	Daniel Malinsky, Peter Spirtes	We demonstrate our approach on both simulated and real macroeconomic data.
309	Theoretical Analysis of Efficiency and Robustness of Softmax and Gap-Increasing Operators in Reinforcement Learning	Tadashi Kozuno, Eiji Uchibe, Kenji Doya	In this paper, we propose and analyze conservative value iteration, which unifies value iteration, soft value iteration, advantage learning, and dynamic policy programming.
310	A Fast Sampling Algorithm for Maximum Inner Product Search	Qin Ding, Hsiang-Fu Yu, Cho-Jui Hsieh	In this paper, we propose Sampling-MIPS, which is the first sampling based algorithm that can be applied to the MIPS problem on a set of general vectors with both positive and negative values.
311	Minimum Volume Topic Modeling	Byoungwook Jang, Alfred Hero	We propose a new topic modeling procedure that takes advantage of the fact that the Latent Dirichlet Allocation (LDA) log-likelihood function is asymptotically equivalent to the logarithm of the volume of the topic simplex.
312	Binary Space Partitioning Forest	Xuhui Fan, Bin Li, Scott Sisson	We propose to generate a cutting hyperplane, which is assumed to be parallel to $d-2$ dimensions, to cut each node in the $d$-dimensional BSP-tree.
313	Improved Semi-Supervised Learning with Multiple Graphs	Krishnamurthy Viswanathan, Sushant Sachdeva, Andrew Tomkins, Sujith Ravi	We present a new approach for graph based semi-supervised learning based on a multi-component extension to the Gaussian MRF model.
314	Optimizing over a Restricted Policy Class in MDPs	Ershad Banijamali, Yasin Abbasi-Yadkori, Mohammad Ghavamzadeh, Nikos Vlassis	We address the problem of finding an optimal policy in a Markov decision process (MDP) under a restricted policy class defined by the convex hull of a set of base policies.
315	Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate	Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry	We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate — in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data.
316	Deep Switch Networks for Generating Discrete Data and Language	Payam Delgosha, Naveen Goela	An interpretable, statistical framework is introduced for training these nonlinear networks based on a maximum-likelihood objective function.
317	A recurrent Markov state-space generative model for sequences	Anand Ramachandran, Steve Lumetta, Eric Klee, Deming Chen	In this article, we present a new generative model for sequences that combines both aspects, the ability to perform exact inferences and the ability to model long-term structure, by augmenting the HMM with a deterministic, continuous state variable modeled through a Recurrent Neural Network.
318	A Potential Outcomes Calculus for Identifying Conditional Path-Specific Effects	Daniel Malinsky, Ilya Shpitser, Thomas Richardson	In this paper we present the potential outcome calculus (po-calculus), a natural generalization of do-calculus for arbitrary potential outcomes.
319	Adversarial Discrete Sequence Generation without Explicit Neural Networks as Discriminators	Zhongliang Li, Tian Xia, Xingyu Lou, Kaihe Xu, Shaojun Wang, Jing Xiao	This paper presents a novel approach to train GANs for discrete sequence generation without resorting to an explicit neural network as the discriminator.
320	Adaptive Estimation for Approximate $k$-Nearest-Neighbor Computations	Daniel LeJeune, Reinhard Heckel, Richard Baraniuk	In this paper, we consider the approximate $k$-nearest-neighbor problem, which is the problem of finding a subset of O(k) points in a given set of points that contains the set of $k$ nearest neighbors of a given query point.
321	Model-Free Linear Quadratic Control via Reduction to Expert Prediction	Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari	In this work, we present a new model-free algorithm for controlling linear quadratic (LQ) systems, and show that its regret scales as $O(T^{\xi+2/3})$ for any small $\xi>0$ if time horizon satisfies $T>C^{1/\xi}$ for a constant $C$.
322	Preventing Failures Due to Dataset Shift: Learning Predictive Models That Transport	Adarsh Subbaswamy, Peter Schulam, Suchi Saria	We propose a proactive approach which learns a relationship in the training domain that will generalize to the target domain by incorporating prior knowledge of aspects of the data generating process that are expected to differ as expressed in a causal selection diagram.
323	Structured Robust Submodular Maximization: Offline and Online Algorithms	Nima Anari, Nika Haghtalab, Seffi Naor, Sebastian Pokutta, Mohit Singh, Alfredo Torrico	In this work, we consider robust submodular maximization with structured combinatorial constraints and give efficient algorithms with provable guarantees.
324	Sample-Efficient Imitation Learning via Generative Adversarial Nets	Lionel Blondé, Alexandros Kalousis	We show that our approach is simple to implement and that the learned agents remain remarkably stable, as shown in our experiments that span a variety of continuous control tasks.
325	Probabilistic Multilevel Clustering via Composite Transportation Distance	Nhat Ho, Viet Huynh, Dinh Phung, Michael Jordan	We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence.
326	A General Framework for Multi-fidelity Bayesian Optimization with Gaussian Processes	Jialin Song, Yuxin Chen, Yisong Yue	In this paper, we study multi-fidelity Bayesian optimization with complex structural dependencies among multiple outputs, and propose MF-MI-Greedy, a principled algorithmic framework for addressing this problem.
327	Online Algorithm for Unsupervised Sensor Selection	Arun Verma, Manjesh Hanawal, Csaba Szepesvari, Venkatesh Saligrama	Our objective is to learn strategies for selecting a test that gives the best trade-off between accuracy and costs in such unsupervised sensor selection (USS) problems.
328	Best of many worlds: Robust model selection for online supervised learning	Vidya Muthukumar, Mitas Ray, Anant Sahai, Peter Bartlett	We introduce algorithms for online, full-information prediction that are computationally efficient and competitive with contextual tree experts of unknown complexity, in both probabilistic and adversarial settings.
329	Accelerating Imitation Learning with Predictive Models	Ching-An Cheng, Xinyan Yan, Evangelos Theodorou, Byron Boots	In this work, we seek to further accelerate the convergence rate of online imitation learning, thereby making it more sample efficient.
330	Online Learning in Kernelized Markov Decision Processes	Sayak Ray Chowdhury, Aditya Gopalan	We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions.
331	Lifting high-dimensional non-linear models with Gaussian regressors	Christos Thrampoulidis, Ankit Singh Rawat	We resolve this issue by proposing and analyzing an alternative convex recovery method.
332	Domain-Size Aware Markov Logic Networks	Happy Mittal, Ayush Bhardwaj, Vibhav Gogate, Parag Singla	As our second contribution, we present a principled solution to this problem by defining Domain-size Aware Markov Logic Networks (DA-MLNs) which can be seen as re-parameterizing the MLNs after taking domain size into consideration.
333	Database Alignment with Gaussian Features	Osman E. Dai, Daniel Cullina, Negar Kiyavash	We consider the problem of aligning a pair of databases with jointly Gaussian features.
334	Size of Interventional Markov Equivalence Classes in random DAG models	Dmitriy Katz, Karthikeyan Shanmugam, Chandler Squires, Caroline Uhler	For constant density, we show that the expected $\log$ observational MEC size asymptotically (in the number of vertices) approaches a constant.
335	Reparameterizing Distributions on Lie Groups	Luca Falorsi, Pim de Haan, Tim R. Davidson, Patrick Forré	In this work we define a general framework to create reparameterizable densities on arbitrary Lie groups, and provide a detailed practitioner's guide to further the ease of usage.
336	Revisit Batch Normalization: New Understanding and Refinement via Composition Optimization	Xiangru Lian, Ji Liu	In this paper, we try to understand BN from an optimization perspective by providing an explicit objective function associated with BN.
337	Multi-Order Information for Working Set Selection of Sequential Minimal Optimization	Qimao Yang, Changrong Li, Jun Guo	A new working set selection method for sequential minimal optimization (SMO) is proposed in this paper.
338	Harmonizable mixture kernels with variational Fourier features	Zheyang Shen, Markus Heinonen, Samuel Kaski	In this work we propose the novel harmonizable mixture kernel (HMK), a family of expressive, interpretable, non-stationary kernels derived from mixture models on the generalized spectral representation.
339	Multiscale Gaussian Process Level Set Estimation	Shubhanshu Shekhar, Tara Javidi	In this paper, the problem of estimating the level set of a black-box function from noisy and expensive evaluation queries is considered.
340	The LORACs Prior for VAEs: Letting the Trees Speak for the Data	Sharad Vikram, Matthew D. Hoffman, Matthew J. Johnson	To alleviate this problem, we propose using a flexible Bayesian nonparametric hierarchical clustering prior based on the time-marginalized coalescent (TMC).
341	Adversarial Learning of a Sampler Based on an Unnormalized Distribution	Chunyuan Li, Ke Bai, Jianqiao Li, Guoyin Wang, Changyou Chen, Lawrence Carin	The proposed method is compared to alternative approaches, with encouraging results demonstrated across a range of applications, including deep soft Q-learning.
342	Active Ranking with Subset-wise Preferences	Aadirupa Saha, Aditya Gopalan	We consider the problem of probably approximately correct (PAC) ranking $n$ items by adaptively eliciting subset-wise preference feedback.
343	Recovery Guarantees For Quadratic Tensors With Sparse Observations	Hongyang Zhang, Vatsal Sharan, Moses Charikar, Yingyu Liang	We consider the tensor completion problem of predicting the missing entries of a tensor.
344	Sample Efficient Graph-Based Optimization with Noisy Observations	Thanh Tan Nguyen, Ali Shameli, Yasin Abbasi-Yadkori, Anup Rao, Branislav Kveton	We study sample complexity of optimizing “hill-climbing friendly” functions defined on a graph under noisy observations.
345	Robustness Guarantees for Density Clustering	Heinrich Jiang, Jennifer Jang, Ofir Nachum	We show both robustness and consistency guarantees for a simple modification of the popular DBSCAN algorithm.
346	Fixing Mini-batch Sequences with Hierarchical Robust Partitioning	Shengjie Wang, Wenruo Bai, Chandrashekhar Lavania, Jeff Bilmes	We propose a general and efficient hierarchical robust partitioning framework to generate a deterministic sequence of mini-batches, one that offers assurances of being high quality, unlike a randomly drawn sequence.
347	Multitask Metric Learning: Theory and Algorithm	Boyu Wang, Hejia Zhang, Peng Liu, Zebang Shen, Joelle Pineau	In this paper, we study the problem of multitask metric learning (mtML).
348	Efficient Bayes Risk Estimation for Cost-Sensitive Classification	Daniel Andrade, Yuzuru Okajima	In this work, we show that for linear classifiers this task can be considerably simplified, leading to a one dimensional integral for which we propose an efficient approximation.
349	Interpreting Black Box Predictions using Fisher Kernels	Rajiv Khanna, Been Kim, Joydeep Ghosh, Sanmi Koyejo	Our goal is to ask “which training examples are most responsible for a given set of predictions?”
350	Representation Learning on Graphs: A Reinforcement Learning Application	Sephora Madjiheurem, Laura Toni	In this work, we study value function approximation in reinforcement learning (RL) problems with high dimensional state or action spaces via a generalized version of representation policy iteration (RPI).
351	ABCD-Strategy: Budgeted Experimental Design for Targeted Causal Structure Discovery	Raj Agrawal, Chandler Squires, Karren Yang, Karthikeyan Shanmugam, Caroline Uhler	Given that randomized experiments are usually expensive to perform, we propose a general framework and theory based on optimal Bayesian experimental design to select experiments for targeted causal discovery.
352	Batched Stochastic Bayesian Optimization via Combinatorial Constraints Design	Kevin K. Yang, Yuxin Chen, Alycia Lee, Yisong Yue	Motivated by these considerations, we investigate Batched Stochastic Bayesian Optimization (BSBO), a novel Bayesian optimization scheme for choosing the constraints in order to guide exploration towards items with greater utility.
353	Convergence of Gradient Descent on Separable Data	Mor Shpigel Nacson, Jason Lee, Suriya Gunasekar, Pedro Henrique Pamplona Savarese, Nathan Srebro, Daniel Soudry	Within this family, for simple linear models we show that the optimal rate with fixed step size is indeed obtained for the commonly used exponentially tailed losses such as logistic loss.
354	Structured Neural Topic Models for Reviews	Babak Esmaeili, Hongyi Huang, Byron Wallace, Jan-Willem van de Meent	We present Variational Aspect-based Latent Topic Allocation (VALTA), a family of autoencoding topic models that learn aspect-based representations of reviews.
355	Adaptive Minimax Regret against Smooth Logarithmic Losses over High-Dimensional l1-Balls via Envelope Complexity	Kohei Miyaguchi, Kenji Yamanishi	We develop a new theoretical framework, the envelope complexity, to analyze the minimax regret with logarithmic loss functions.
356	Low-Dimensional Density Ratio Estimation for Covariate Shift Correction	Petar Stojanov, Mingming Gong, Jaime Carbonell, Kun Zhang	In this work, we investigate the dependence of covariate shift correction performance on the dimensionality of the features, and propose a correction method that finds a low-dimensional representation of the features, which takes into account features relevant to the target $Y$, and exploits the density ratio of this representation for importance reweighting.
357	Evaluating model calibration in classification	Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas Schön	In this work, we develop a general theoretical calibration evaluation framework grounded in probability theory, and point out subtleties present in model calibration evaluation that lead to refined interpretations of existing evaluation techniques.
358	Towards Gradient Free and Projection Free Stochastic Optimization	Anit Kumar Sahu, Manzil Zaheer, Soummya Kar	A zeroth order Frank-Wolfe algorithm is proposed, which in addition to the projection-free nature of the vanilla Frank-Wolfe algorithm makes it gradient free.
359	On Multi-Cause Approaches to Causal Inference with Unobserved Counfounding: Two Cautionary Failure Cases and A Promising Alternative	Alexander D'Amour	In this paper, we present two simple, analytical counterexamples that challenge the general claims that are central to these approaches.
360	Data-Driven Approach to Multiple-Source Domain Adaptation	Petar Stojanov, Mingming Gong, Jaime Carbonell, Kun Zhang	We propose a data-driven method to represent these changes across multiple source domains and perform unsupervised domain adaptation.