Paper Digest: AISTATS 2020 Highlights

June 17, 2020 (updated August 18, 2020) admin

Readers can also choose to read this highlight article on our console, which allows users to filter out papers using keywords and find related papers.

The International Conference on Artificial Intelligence and Statistics (AISTATS) is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas. In 2020, it is to be held virtually due to the COVID-19 pandemic.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: AISTATS 2020 Papers

	Title	Authors	Highlight
1	Linearly Convergent Frank-Wolfe without Line-Search	Fabian Pedregosa, Geoffrey Negiar, Armin Askari, Martin Jaggi	In this paper we propose variants of Away-steps and Pairwise FW that lift both restrictions simultaneously.
2	Guarantees of Stochastic Greedy Algorithms for Non-monotone Submodular Maximization	Shinsaku Sakaue	In this paper, we prove that SG (with slight modification) can achieve almost $1/4$-approximation guarantees in expectation in linear time even if objective functions are non-monotone.
3	On Maximization of Weakly Modular Functions: Guarantees of Multi-stage Algorithms, Tractability, and Hardness	Shinsaku Sakaue	In this paper, we study cardinality-constrained maximization of weakly modular functions, whose closeness to being modular is measured by submodularity and supermodularity ratios, and reveal what we can and cannot do by using the weak modularity.
4	Adaptive Trade-Offs in Off-Policy Learning	Mark Rowland, Will Dabney, Remi Munos	In this paper, we take a unifying view of this space of algorithms, and consider their trade-offs of three fundamental quantities: update variance, fixed-point bias, and contraction rate.
5	Conditional Importance Sampling for Off-Policy Learning	Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Remi Munos, Will Dabney	The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios.
6	Multiplicative Gaussian Particle Filter	Xuan Su, Wee Sun Lee, Zhen Zhang	We propose a new sampling-based approach for approximate inference in filtering problems.
7	Stretching the Effectiveness of MLE from Accuracy to Bias for Pairwise Comparisons	Jingyan Wang, Nihar Shah, R Ravi	In this work, we consider one specific type of fairness, which is the notion of bias in statistics.
8	Fast and Accurate Ranking Regression	Ilkay Yildiz, Jennifer Dy, Deniz Erdogmus, Jayashree Kalpathy-Cramer, Susan Ostmo, J. Peter Campbell, Michael F. Chiang, Stratis Ioannidis	Using this equivalence, we propose two spectral algorithms for ranking regression that learn model parameters up to 579 times faster than Newton’s method.
9	Tight Analysis of Privacy and Utility Tradeoff in Approximate Differential Privacy	Quan Geng, Wei Ding, Ruiqi Guo, Sanjiv Kumar	We characterize the minimum noise amplitude and power for noise-adding mechanisms in (epsilon, delta)-differential privacy for a single real-valued query function.
10	Long-and Short-Term Forecasting for Portfolio Selection with Transaction Costs	Guy Uziel, Ran El-Yaniv	In this paper we focus on the problem of online portfolio selection with transaction costs.
11	Nonparametric Sequential Prediction While Deep Learning the Kernel	Guy Uziel	In this paper, therefore, we propose a novel algorithm that simultaneously satisfies a short-term goal, to perform as well as the best choice in hindsight of a data-adaptive kernel, learned using a deep neural network, and a long-term goal, to achieve the same theoretical asymptotic guarantee.
12	Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation	Yuxuan Song, Ning Miao, Hao Zhou, Lantao Yu, Mingxuan Wang, Lei Li	In this paper, we propose $\psi$-MLE, a new training scheme for autoregressive sequence generative models, which is effective and stable when operating at large sample space encountered in text generation.
13	A Double Residual Compression Algorithm for Efficient Distributed Learning	Xiaorui Liu, Yao Li, Jiliang Tang, Ming Yan	In this paper, we propose DORE, a DOuble REsidual compression stochastic gradient descent algorithm, to reduce over 95% of the overall communication such that the obstacle can be immensely mitigated.
14	Asynchronous Gibbs Sampling	Alexander Terenin, Daniel Simpson, David Draper	We introduce a theoretical framework for analyzing asynchronous Gibbs sampling and other extensions of MCMC that do not possess the Markov property.
15	Learning Fair Representations for Kernel Models	Zilong Tan, Samuel Yeom, Matt Fredrikson, Ameet Talwalkar	We leverage the classical Sufficient Dimension Reduction (SDR) framework to construct representations as subspaces of the reproducing kernel Hilbert space (RKHS), whose member functions are guaranteed to satisfy fairness.
16	A Nonparametric Off-Policy Policy Gradient	Samuele Tosatto, Joao Carvalho, Hany Abdulsamad, Jan Peters	We address this issue by building on the general sample efficiency of off-policy algorithms.
17	Non-Parametric Calibration for Classification	Jonathan Wenger, Hedvig Kjellström, Rudolph Triebel	In this paper, we propose a method that adjusts the confidence estimates of a general classifier such that they approach the probability of classifying correctly.
18	Minimax Testing of Identity to a Reference Ergodic Markov Chain	Geoffrey Wolfer, Aryeh Kontorovich	We obtain nearly matching (up to logarithmic factors) upper and lower sample complexity bounds for our notion of distance, which is based on total variation.
19	A Linear-time Independence Criterion Based on a Finite Basis Approximation	Longfei Yan, W. Bastiaan Kleijn, Thushara Abhayapala	We propose a novel independence criterion for two random variables with linear-time complexity.
20	Minimax Bounds for Structured Prediction Based on Factor Graphs	Kevin Bello, Asish Ghoshal, Jean Honorio	In this work, we provide minimax lower bounds for a class of general factor-graph inference models in the context of structured prediction. That is, we characterize the necessary sample complexity for any conceivable algorithm to achieve learning of general factor-graph predictors.
21	On the Convergence of SARAH and Beyond	Bingcong Li, Meng Ma, Georgios B. Giannakis	The main theme of this work is a unifying algorithm, LoopLess SARAH (L2S), for problems formulated as a summation of n individual loss functions.
22	Uncertainty in Neural Networks: Approximately Bayesian Ensembling	Tim Pearce, Felix Leibfried, Alexandra Brintrup	This work proposes one modification to the usual process that we argue does result in approximate Bayesian inference; regularising parameters about values drawn from a distribution which can be set equal to the prior.
23	LIBRE: Learning Interpretable Boolean Rule Ensembles	Graziano Mita, Paolo Papotti, Maurizio Filippone, Pietro Michiardi	We present a novel method, LIBRE, to learn an interpretable classifier, which materializes as a set of Boolean rules.
24	Marginal Densities, Factor Graph Duality, and High-Temperature Series Expansions	Mehdi Molkaraie	We prove that the marginal densities of a global probability mass function in a primal normal factor graph and the corresponding marginal densities in the dual normal factor graph are related via local mappings.
25	Neighborhood Growth Determines Geometric Priors for Relational Representation Learning	Melanie Weber	In this paper, we propose a combinatorial approach to evaluating embeddability, i.e., to decide whether a data set is best represented in Euclidean, Hyperbolic or Spherical space.
26	Fair Decisions Despite Imperfect Predictions	Niki Kilbertus, Manuel Gomez Rodriguez, Bernhard Schölkopf, Krikamol Muandet, Isabel Valera	In this paper, we show that, in this selective labels setting, learning to predict is suboptimal in terms of both fairness and utility.
27	A Characterization of Mean Squared Error for Estimator with Bagging	Martin Mihelich, Charles Dognin, Yan Shu, Michael Blot	In this paper, we theoretically investigate how the bagging method can reduce the Mean Squared Error (MSE) when applied on a statistical estimator.
28	Uncertainty Quantification for Sparse Deep Learning	Yuexi Wang, Veronika Rockova	This paper takes a step forward in this important direction by taking a Bayesian point of view.
29	Minimizing Dynamic Regret and Adaptive Regret Simultaneously	Lijun Zhang, Shiyin Lu, Tianbao Yang	In this paper, we bridge this gap by proposing novel online algorithms that are able to minimize the dynamic regret and adaptive regret simultaneously.
30	A Stein Goodness-of-fit Test for Directional Distributions	Wenkai Xu, Takeru Matsuda	In this study, we propose nonparametric goodness-of-fit testing procedures for general directional distributions based on kernel Stein discrepancy.
31	Unsupervised Neural Universal Denoiser for Finite-Input General-Output Noisy Channel	Taeeon Park, Taesup Moon	We devise a novel neural network-based universal denoiser for the finite-input, general-output (FIGO) channel.
32	Leave-One-Out Cross-Validation for Bayesian Model Comparison in Large Data	Måns Magnusson, Aki Vehtari, Johan Jonasson, Michael Andersen	We propose an efficient method for estimating differences in predictive performance by combining fast approximate LOO surrogates with exact LOO sub-sampling using the difference estimator and supply proofs with regard to scaling characteristics.
33	Robust Importance Weighting for Covariate Shift	Fengpei Li, Henry Lam, Siddharth Prusty	In this paper, we propose and analyze a new estimator that systematically integrates the residuals of NR with KMM reweighting, based on a control-variate perspective.
34	Adaptive Online Kernel Sampling for Vertex Classification	Peng Yang, Ping Li	In this paper, we introduce an online kernel sampling (OKS) technique, a new second-order OKL method that slightly improves the bound from $O(d \log(T))$ down to $O(r \log(T))$ where $r$ is the rank of the learned data and is usually much smaller than $d$.
35	A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning	Nhan Pham, Lam Nguyen, Dzung Phan, Phuong Ha Nguyen, Marten van Dijk, Quoc Tran-Dinh	We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with another biased one, an adapted SARAH estimator for policy optimization.
36	Stopping criterion for active learning based on deterministic generalization bounds	Hideaki Ishibashi, Hideitsu Hino	In this study, we propose a criterion for automatically stopping active learning.
37	Ivy: Instrumental Variable Synthesis for Causal Inference	Zhaobin Kuang, Frederic Sala, Nimit Sohoni, Sen Wu, Aldo Córdova-Palomera, Jared Dunnmon, James Priest, Christopher Ré	To relax these assumptions, we propose Ivy, a new method to combine IV candidates that can handle correlated and invalid IV candidates in a robust manner.
38	High Dimensional Robust Sparse Regression	Liu Liu, Yanyao Shen, Tianyang Li, Constantine Caramanis	Our main contribution is a robust variant of Iterative Hard Thresholding.
39	Nested-Wasserstein Self-Imitation Learning for Sequence Generation	Ruiyi Zhang, Changyou Chen, Zhe Gan, Zheng Wen, Wenlin Wang, Lawrence Carin	To alleviate these issues, we propose the concept of nested-Wasserstein distance for distributional semantic matching.
40	Greed Meets Sparsity: Understanding and Improving Greedy Coordinate Descent for Sparse Optimization	Huang Fang, Zhenan Fan, Yifan Sun, Michael Friedlander	We present an improved convergence analysis of GCD for sparse optimization, and a formal analysis of its screening properties.
41	Recommendation on a Budget: Column Space Recovery from Partially Observed Entries with Random or Active Sampling	Carolyn Kim, Mohsen Bayati	We analyze alternating minimization for column space recovery of a partially observed, approximately low rank matrix with a growing number of columns and a fixed budget of observations per column.
42	Fast Noise Removal for k-Means Clustering	Sungjin Im, Mahshid Montazer Qaem, Benjamin Moseley, Xiaorui Sun, Rudy Zhou	This paper considers k-means clustering in the presence of noise.
43	Sketching Transformed Matrices with Applications to Natural Language Processing	Yingyu Liang, Zhao Song, Mengdi Wang, Lin Yang, Xin Yang	In this paper, we first propose a space-efficient sketching algorithm for computing the product of a given small matrix with the transformed matrix.
44	Unconditional Coresets for Regularized Loss Minimization	Alireza Samadian, Kirk Pruhs, Benjamin Moseley, Sungjin Im, Ryan Curtin	Our main result is that if the regularizer’s effect does not become negligible as the norm of the hypothesis scales, and as the data scales, then a uniform sample of modest size is with high probability a coreset.
45	ASAP: Architecture Search, Anneal and Prune	Asaf Noy, Niv Nayman, Tal Ridnik, Nadav Zamir, Sivan Doveh, Itamar Friedman, Raja Giryes, Lihi Zelnik	In this paper, we propose a differentiable search space that allows the annealing of architecture weights, while gradually pruning inferior operations, thus the search converges to a single output network in a continuous manner.
46	Understanding Generalization in Deep Learning via Tensor Methods	Jingling Li, Yanchao Sun, Jiahao Su, Taiji Suzuki, Furong Huang	In this work, we advance the understanding of the relations between the network’s architecture and its generalizability from the compression perspective.
47	Accelerating Gradient Boosting Machines	Haihao Lu, Sai Praneeth Karimireddy, Natalia Ponomareva, Vahab Mirrokni	In this work, we propose an Accelerated Gradient Boosting Machine (AGBM) by incorporating Nesterov’s acceleration techniques into the design of GBM.
48	Online Binary Space Partitioning Forests	Xuhui Fan, Bin Li, Scott Sisson	In this paper, we develop an online BSP-Forest framework to address this limitation.
49	Sparse Hilbert-Schmidt Independence Criterion Regression	Benjamin Poignard, Makoto Yamada	In this paper, we propose the sparse Hilbert–Schmidt Independence Criterion (SpHSIC) regression, which is a versatile nonlinear feature selection algorithm based on the HSIC and is a continuous optimization variant of the well-known minimum redundancy maximum relevance (mRMR) feature selection algorithm.
50	Sharp Thresholds of the Information Cascade Fragility Under a Mismatched Model	Wasim Huleihel, Ofer Shayevitz	Accordingly, in this paper we study a mismatch model where players believe that the revealing probabilities are $\{q_\ell\}_{\ell\in\mathbb{N}}$ when they truly are $\{p_\ell\}_{\ell\in\mathbb{N}}$, and study the effect of this mismatch on information cascades.
51	Optimal sampling in unbiased active learning	Henrik Imberg, Johan Jonasson, Marina Axelson-Fisk	We argue that this produces suboptimal predictions and present sampling schemes for unbiased pool-based active learning that minimise the actual prediction error, and demonstrate a better predictive performance than competing methods on a number of benchmark datasets.
52	The Area of the Convex Hull of Sampled Curves: a Robust Functional Statistical Depth measure	Guillaume Staerman, Pavlo Mozharovskyi, Stéphan Clémençon	In this paper, we propose a novel notion of functional depth based on the area of the convex hull of sampled curves, capturing gradual departures from centrality, even beyond the envelope of the data, in a natural fashion.
53	Diameter-based Interactive Structure Discovery	Christopher Tosh, Daniel Hsu	We introduce interactive structure discovery, a generic framework that encompasses many interactive learning settings, including active learning, top-k item identification, interactive drug discovery, and others.
54	Utility/Privacy Trade-off through the lens of Optimal Transport	Etienne Boursier, Vianney Perchet	Unlike classical solutions that focus on the first point, we consider instead agents that optimize a natural trade-off between both objectives.
55	A Lyapunov analysis for accelerated gradient methods: from deterministic to stochastic case	Maxime Laborde, Adam Oberman	We show that this connection can be extended to the case of stochastic gradients, and develop a Lyapunov-function-based proof of convergence rates for Nesterov’s accelerated stochastic gradient descent.
56	Interpretable Deep Gaussian Processes with Moments	Chi-Ken Lu, Scott Cheng-Hsin Yang, Xiaoran Hao, Patrick Shafto	We propose interpretable DGP based on approximating DGP as a GP by calculating the exact moments, which additionally identify the heavy-tailed nature of some DGP distributions.
57	Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions	Lars Buesing, Nicolas Heess, Theophane Weber	In particular, we propose the TreeSample algorithm, an adaptation of Monte Carlo Tree Search to approximate inference.
58	Accelerated Bayesian Optimisation through Weight-Prior Tuning	Alistair Shilton, Sunil Gupta, Santu Rana, Pratibha Vellanki, Cheng Li, Svetha Venkatesh, Laurence Park, Alessandra Sutti, David Rubin, Thomas Dorin, Alireza Vahid, Murray Height, Teo Slezak	In this paper we show how such auxiliary data may be used to construct a GP covariance corresponding to a more appropriate weight prior for the objective function.
59	Variance Reduction for Evolution Strategies via Structured Control Variates	Yunhao Tang, Krzysztof Choromanski, Alp Kucukelbir	We propose a new method for improving the accuracy of ES algorithms that, as opposed to recent approaches utilizing only the Monte Carlo structure of the gradient estimator, takes advantage of the underlying MDP structure to reduce the variance.
60	Optimization of Graph Total Variation via Active-Set-based Combinatorial Reconditioning	Zhenzhang Ye, Thomas Möllenhoff, Tao Wu, Daniel Cremers	In this work, we propose a novel adaptive preconditioning strategy for proximal algorithms on this problem class.
61	Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization	Kenji Kawaguchi, Haihao Lu	We propose a new stochastic optimization framework for empirical risk minimization problems such as those that arise in machine learning.
62	A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent	Eduard Gorbunov, Filip Hanzely, Peter Richtarik	In this paper we introduce a unified analysis of a large family of variants of proximal stochastic gradient descent (SGD) which so far have required different intuitions, convergence analyses, have different applications, and which have been developed separately in various communities.
63	Entropy Weighted Power k-Means Clustering	Saptarshi Chakraborty, Debolina Paul, Swagatam Das, Jason Xu	This paper addresses these issues by introducing entropy regularization to learn feature relevance while annealing.
64	Identifying and Correcting Label Bias in Machine Learning	Heinrich Jiang, Ofir Nachum	In this paper, we provide a mathematical formulation of how this bias can arise.
65	AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity	Yibo Zeng, Fei Feng, Wotao Yin	In this paper, we propose AsyncQVI, an asynchronous-parallel Q-value iteration for discounted Markov decision processes whose transition and reward can only be sampled through a generative model.
66	Active Community Detection with Maximal Expected Model Change	Dan Kushnir, Benjamin Mirabelli	We present a novel active learning algorithm for community detection on networks.
67	RCD: Repetitive causal discovery of linear non-Gaussian acyclic models with latent confounders	Takashi Nicholas Maeda, Shohei Shimizu	This paper proposes a causal functional model-based method called repetitive causal discovery (RCD) to discover the causal structure of observed variables affected by latent confounders.
68	A Simple Approach for Non-stationary Linear Bandits	Peng Zhao, Lijun Zhang, Yuan Jiang, Zhi-Hua Zhou	In this paper, we demonstrate that a simple restarted strategy is sufficient to attain the same regret guarantee.
69	Distributionally Robust Formulation and Model Selection for the Graphical Lasso	Pedro Cisneros, Alexander Petersen, Sang-Yun Oh	Building on a recent framework for distributionally robust optimization, we consider inverse covariance matrix estimation for multivariate data.
70	Efficient Spectrum-Revealing CUR Matrix Decomposition	Cheng Chen, Ming Gu, Zhihua Zhang, Weinan Zhang, Yong Yu	In this paper, we propose a novel CUR algorithm based on truncated LU factorization with an efficient variant of complete pivoting.
71	Graph DNA: Deep Neighborhood Aware Graph Encoding for Collaborative Filtering	Liwei Wu, Hsiang-Fu Yu, Nikhil Rao, James Sharpnack, Cho-Jui Hsieh	In this paper, we consider recommender systems with side information in the form of graphs.
72	Characterization of Overlap in Observational Studies	Michael Oberst, Fredrik Johansson, Dennis Wei, Tian Gao, Gabriel Brat, David Sontag, Kush Varshney	We formalize overlap estimation as a problem of finding minimum volume sets subject to coverage constraints and reduce this problem to binary classification with Boolean rule classifiers.
73	Modular Block-diagonal Curvature Approximations for Feedforward Architectures	Felix Dangel, Stefan Harmeling, Philipp Hennig	We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, generalized Gauss-Newton, and positive-curvature Hessian).
74	A Unified Statistically Efficient Estimation Framework for Unnormalized Models	Masatoshi Uehara, Takafumi Kanamori, Takashi Takenouchi, Takeru Matsuda	In this study, we propose a unified, statistically efficient estimation framework for unnormalized models and several efficient estimators, whose asymptotic variance is the same as that of the MLE.
75	More Powerful Selective Kernel Tests for Feature Selection	Jen Ning Lim, Makoto Yamada, Wittawat Jitkrittum, Yoshikazu Terada, Shigeyuki Matsui, Hidetoshi Shimodaira	We show how recent advances in multiscale bootstrap make this possible and demonstrate our proposal over a range of synthetic and real world experiments.
76	Imputation estimators for unnormalized models with missing data	Masatoshi Uehara, Takeru Matsuda, Jae Kwang Kim	We propose estimation methods for such unnormalized models with missing data.
77	Wasserstein Style Transfer	Youssef Mroueh	We propose Gaussian optimal transport for image style transfer in an Encoder/Decoder framework.
78	Elimination of All Bad Local Minima in Deep Learning	Kenji Kawaguchi, Leslie Kaelbling	In this paper, we theoretically prove that adding one special neuron per output unit eliminates all suboptimal local minima of any deep neural network, for multi-class classification, binary classification, and regression with an arbitrary loss function, under practical assumptions.
79	Fully Decentralized Joint Learning of Personalized Models and Collaboration Graphs	Valentina Zantedeschi, Aurélien Bellet, Marc Tommasi	We propose to train personalized models that leverage a collaboration graph describing the relationships between user personal tasks, which we learn jointly with the models.
80	Formal Limitations on the Measurement of Mutual Information	David McAllester, Karl Stratos	In this paper, we prove that serious statistical limitations are inherent to any method of measuring mutual information.
81	Scalable Feature Selection for (Multitask) Gradient Boosted Trees	Cuize Han, Nikhil Rao, Daria Sorokina, Karthik Subbian	We develop a scalable forward feature selection variant for GBDT, via a novel group testing procedure that works well in high dimensions, and enjoys favorable theoretical performance and computational guarantees.
82	Model-Agnostic Counterfactual Explanations for Consequential Decisions	Amir-Hossein Karimi, Gilles Barthe, Borja Balle, Isabel Valera	In contrast, we build on standard theory and tools from formal verification and propose a novel algorithm that solves a sequence of satisfiability problems, where both the distance function (objective) and predictive model (constraints) are represented as logic formulae.
83	Obfuscation via Information Density Estimation	Hsiang Hsu, Shahab Asoodeh, Flavio Calmon	In this paper, we propose a framework to identify information-leaking features via information density estimation.
84	Linear Dynamics: Clustering without identification	Chloe Hsu, Michaela Hardt, Moritz Hardt	We analyze a computationally efficient and provably convergent algorithm to estimate the eigenvalues of the state-transition matrix in a linear dynamical system.
85	Low-rank regularization and solution uniqueness in over-parameterized matrix sensing	Kelly Geyer, Anastasios Kyrillidis, Amir Kalev	In this contribution, we prove that in fact, under certain conditions, the PSD constraint by itself is sufficient to lead to a unique low-rank matrix recovery, without explicit or implicit regularization.
86	Robustness for Non-Parametric Classification: A Generic Attack and Defense	Yao-Yuan Yang, Cyrus Rashtchian, Yizhen Wang, Kamalika Chaudhuri	In this work, we take a holistic look at adversarial examples for non-parametric classifiers, including nearest neighbors, decision trees, and random forests.
87	Contextual Online False Discovery Rate Control	Shiyun Chen, Shiva Kasiviswanathan	In this paper, we consider a setting where an ordered (possibly infinite) sequence of hypotheses arrives in a stream, and for each hypothesis we observe a p-value along with a set of features specific to that hypothesis.
88	Sequential no-Substitution k-Median-Clustering	Tom Hess, Sivan Sabato	We study the sample-based k-median clustering objective under a sequential setting without substitutions.
89	Robust Learning from Discriminative Feature Feedback	Sanjoy Dasgupta, Sivan Sabato	In this paper, we introduce a more realistic, robust version of the framework, in which the annotator is allowed to make mistakes.
90	Hermitian matrices for clustering directed graphs: insights and applications	Mihai Cucuringu, Huan Li, He Sun, Luca Zanetti	To overcome these downsides, we propose a spectral clustering algorithm based on a complex-valued matrix representation of digraphs.
91	Kernel Conditional Density Operators	Ingmar Schuster, Mattes Mollenhauer, Stefan Klus, Krikamol Muandet	We introduce a novel conditional density estimation model termed the conditional density operator (CDO).
92	Learning Overlapping Representations for the Estimation of Individualized Treatment Effects	Yao Zhang, Alexis Bellot, Mihaela van der Schaar	Based on these results, we develop a deep kernel regression algorithm and posterior regularization framework that substantially outperforms the state-of-the-art on a variety of benchmark data sets.
93	Additive Tree-Structured Covariance Function for Conditional Parameter Spaces in Bayesian Optimization	Xingchen Ma, Matthew Blaschko	In this work, we generalize the additive assumption to tree-structured functions and propose an additive tree-structured covariance function, showing improved sample-efficiency, wider applicability and greater flexibility.
94	Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms	Ping Ma, Xinlian Zhang, Xin Xing, Jingyi Ma, Michael Mahoney	In this article, we develop asymptotic analysis to derive the distribution of RandNLA sampling estimators for the least-squares problem.
95	The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions	Feras Saad, Cameron Freer, Martin Rinard, Vikash Mansinghka	This paper introduces a new algorithm for the fundamental problem of generating a random integer from a discrete probability distribution using a source of independent and unbiased random coin flips.
96	A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization	Zhize Li, Jian Li	In this paper, we show that Anderson acceleration with Chebyshev polynomial can achieve the optimal convergence rate $O(\sqrt{\kappa}\ln\frac{1}{\epsilon})$, which improves the previous result $O(\kappa\ln\frac{1}{\epsilon})$ provided by (Toth & Kelley, 2015) for quadratic functions.
97	Black Box Submodular Maximization: Discrete and Continuous Settings	Lin Chen, Mingrui Zhang, Hamed Hassani, Amin Karbasi	In this paper, we consider the problem of black box continuous submodular maximization where we only have access to the function values and no information about the derivatives is provided.
98	Corruption-Tolerant Gaussian Process Bandit Optimization	Ilija Bogunovic, Andreas Krause, Jonathan Scarlett	We introduce an algorithm, Fast-Slow GP-UCB, based on Gaussian process methods, randomized selection between two instances labeled ‘fast’ (but non-robust) and ‘slow’ (but robust), enlarged confidence bounds, and the principle of optimism under uncertainty.
99	On the Convergence Theory of Gradient-Based Model-Agnostic Meta-Learning Algorithms	Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar	We start with the MAML method and its first-order approximation (FO-MAML) and highlight the challenges that emerge in their analysis.
100	Alternating Minimization Converges Super-Linearly for Mixed Linear Regression	Avishek Ghosh, Kannan Ramchandran	In this paper, we close this gap between theory and practice for the special case of a mixture of $2$ linear regressions.
101	Learning Gaussian Graphical Models via Multiplicative Weights	Anamay Chaturvedi, Jonathan Scarlett	In this paper, we adapt a recently proposed algorithm of Klivans and Meka (FOCS, 2017), based on the method of multiplicative weight updates, from the Ising model to the Gaussian model, via non-trivial modifications to both the algorithm and its analysis.
102	Mitigating Overfitting in Supervised Classification from Two Unlabeled Datasets: A Consistent Risk Correction Approach	Nan Lu, Tianyi Zhang, Gang Niu, Masashi Sugiyama	Therefore, we propose to wrap the terms that cause a negative empirical risk by certain correction functions.
103	Infinitely deep neural networks as diffusion processes	Stefano Peluchetti, Stefano Favaro	We consider parameter distributions that shrink as the number of layers increases in order to recover well-behaved stochastic processes in the limit of infinite depth.
104	Stable behaviour of infinitely wide deep neural networks	Stefano Peluchetti, Stefano Favaro, Sandra Fortini	We consider fully connected feed-forward deep neural networks (NNs) where weights and biases are independent and identically distributed as symmetric centered stable distributions.
105	Neural Topic Model with Attention for Supervised Learning	Xinyi Wang, Yi Yang	This paper presents Topic Attention Model (TAM), a supervised neural topic model that integrates an attention recurrent neural network (RNN) model.
106	Causal Mosaic: Cause-Effect Inference via Nonlinear ICA and Ensemble Method	Pengzhou Wu, Kenji Fukumizu	We address the problem of distinguishing cause from effect in the bivariate setting.
107	Stochastic Bandits with Delay-Dependent Payoffs	Leonardo Cella, Nicolò Cesa-Bianchi	Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled.
108	Risk Bounds for Learning Multiple Components with Permutation-Invariant Losses	Fabien Lauer	This paper proposes a simple approach to derive efficient error bounds for learning multiple components with sparsity-inducing regularization.
109	Balancing Learning Speed and Stability in Policy Gradient via Adaptive Exploration	Matteo Papini, Andrea Battistello, Marcello Restelli	Using tools from the safe PG literature, we design a surrogate objective for the policy variance that captures the effects this parameter has on the learning speed and on the quality of the final solution.
110	Independent Subspace Analysis for Unsupervised Learning of Disentangled Representations	Jan Stuehmer, Richard Turner, Sebastian Nowozin	We first show that these modifications, e.g. beta-VAE, simplify the tendency of variational inference to underfit, causing pathological over-pruning and over-orthogonalization of learned components. Second, we propose a complementary approach: to modify the probabilistic model with a structured latent prior.
111	A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players	Abbas Mehrabian, Etienne Boursier, Emilie Kaufmann, Vianney Perchet	We consider the challenging heterogeneous setting, in which different arms may have different means for different players, and propose a new and efficient algorithm that combines the idea of leveraging forced collisions for implicit communication and that of performing matching eliminations.
112	Regularity as Regularization: Smooth and Strongly Convex Brenier Potentials in Optimal Transport	François-Pierre Paty, Alexandre d'Aspremont, Marco Cuturi	We propose in this work to draw inspiration from this theory and use regularity as a regularization tool.
113	On Generalization Bounds of a Family of Recurrent Neural Networks	Minshuo Chen, Xingguo Li, Tuo Zhao	To connect theory and practice, we study the generalization properties of vanilla RNNs as well as their variants, including Minimal Gated Unit (MGU), Long Short Term Memory (LSTM), and Convolutional (Conv) RNNs.
114	Simulator Calibration under Covariate Shift with Kernels	Keiichi Kisamori, Motonobu Kanagawa, Keisuke Yamazaki	We propose a novel calibration method for computer simulators, dealing with the problem of covariate shift. Covariate shift is the situation where input distributions for training and test are different, and ubiquitous in applications of simulations.
115	Convergence Rates of Gradient Descent and MM Algorithms for Bradley-Terry Models	Milan Vojnovic, Se-Young Yun, Kaifang Zhou	We present tight convergence rate bounds for gradient descent and MM algorithms for maximum likelihood (ML) estimation and maximum a posteriori probability (MAP) estimation of a popular Bayesian inference method, for Bradley-Terry models of ranking data.
116	A Locally Adaptive Bayesian Cubature Method	Matthew Fisher, Chris Oates, Catherine Powell, Aretha Teckentrup	The main contributions of this work are twofold: first, we establish that existing BC methods do not possess local adaptivity in the sense of many classical adaptive methods, and second, we develop a novel BC method whose behaviour, demonstrated empirically, is analogous to such methods.
117	Fast and Bayes-consistent nearest neighbors	Klim Efremenko, Aryeh Kontorovich, Moshe Noivirt	This paper aims at bridging these realms: to reap the advantages of fast evaluation time while maintaining Bayes consistency, and further without sacrificing too much in the risk decay rate.
118	Explaining the Explainer: A First Theoretical Analysis of LIME	Damien Garreau, Ulrike von Luxburg	In this paper, we provide the first theoretical analysis of LIME.
119	A Continuous-time Perspective for Modeling Acceleration in Riemannian Optimization	Foivos Alimisis, Antonio Orvieto, Gary Becigneul, Aurelien Lucchi	We propose a novel second-order ODE as the continuous-time limit of a Riemannian accelerated gradient-based method on a manifold with curvature bounded from below.
120	Deep Active Learning: Unified and Principled Method for Query and Training	Changjian Shui, Fan Zhou, Christian Gagné, Boyu Wang	In this paper, we are proposing a unified and principled method for both the querying and training processes in deep batch active learning.
121	Sparse and Low-rank Tensor Estimation via Cubic Sketchings	Botao Hao, Anru R. Zhang, Guang Cheng	In this paper, we propose a general framework for sparse and low-rank tensor estimation from cubic sketchings.
122	A nonasymptotic law of iterated logarithm for general M-estimators	Arnak Dalalyan, Nicolas Schreuder, Victor-Emmanuel Brunel	In this paper, we propose the first non-asymptotic ’any-time’ deviation bounds for general M-estimators, where ’any-time’ means that the bound holds with a prescribed probability for every sample size.
123	Robust Stackelberg buyers in repeated auctions	Thomas Nedelec, Clement Calauzenes, Vianney Perchet, Noureddine El Karoui	We consider the practical and classical setting where the seller is using an exploration stage to learn the value distributions of the bidders before running a revenue-maximizing auction in an exploitation phase.
124	Radial Bayesian Neural Networks: Beyond Discrete Support In Large-Scale Bayesian Deep Learning	Sebastian Farquhar, Michael A. Osborne, Yarin Gal	We propose Radial Bayesian Neural Networks (BNNs): a variational approximate posterior for BNNs which scales well to large models.
125	Practical Nonisotropic Monte Carlo Sampling in High Dimensions via Determinantal Point Processes	Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang	We propose a new class of practical structured methods for nonisotropic Monte Carlo (MC) sampling, called DPPMC, designed for high-dimensional nonisotropic distributions where samples are correlated to reduce the variance of the estimator via determinantal point processes.
126	Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation	Si Yi Meng, Sharan Vaswani, Issam Hadj Laradji, Mark Schmidt, Simon Lacoste-Julien	We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models.
127	Two-sample Testing Using Deep Learning	Matthias Kirchler, Shahryar Khorasani, Marius Kloft, Christoph Lippert	We propose a two-sample testing procedure based on learned deep neural network representations.
128	RATQ: A Universal Fixed-Length Quantizer for Stochastic Optimization	Prathamesh Mayekar, Himanshu Tyagi	We present Rotated Adaptive Tetra-iterated Quantizer (RATQ), a fixed-length quantizer for gradients in first order stochastic optimization.
129	Rep the Set: Neural Networks for Learning Set Representations	Konstantinos Skianis, Giannis Nikolentzos, Stratis Limnios, Michalis Vazirgiannis	In this paper, we present a new neural network architecture, called RepSet, that can handle examples that are represented as sets of vectors.
130	A Multiclass Classification Approach to Label Ranking	Robin Vogel, Stéphan Clémençon	This article is devoted to the analysis of this statistical learning problem, halfway between multiclass classification and posterior probability estimation (regression) and referred to as \textit{label ranking} here.
131	Conservative Exploration in Reinforcement Learning	Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta	We present two optimistic algorithms that guarantee (w.h.p.) that the conservative constraint is never violated during learning.
132	A principled approach for generating adversarial images under non-smooth dissimilarity metrics	Aram-Alexandre Pooladian, Chris Finlay, Tim Hoheisel, Adam Oberman	In this work, we propose an attack methodology not only for cases where the perturbations are measured by Lp norms, but in fact any adversarial dissimilarity metric with a closed proximal form.
133	Regularization via Structural Label Smoothing	Weizhi Li, Gautam Dasarathy, Visar Berisha	In this paper, we focus on label smoothing, a form of output distribution regularization that prevents overfitting of a neural network by softening the ground-truth labels in the training data in an attempt to penalize overconfident outputs.
134	Communication-Efficient Asynchronous Stochastic Frank-Wolfe over Nuclear-norm Balls	Jiacheng Zhuo, Qi Lei, Alex Dimakis, Constantine Caramanis	In this work, we propose an asynchronous Stochastic Frank Wolfe (SFW-asyn) method, which, for the first time, solves the two problems simultaneously, while successfully maintaining the same convergence rate as the vanilla SFW.
135	Linear Convergence of Adaptive Stochastic Gradient Descent	Yuege Xie, Xiaoxia Wu, Rachel Ward	We prove that the norm version of the adaptive stochastic gradient method (AdaGrad-Norm) achieves a linear convergence rate for a subset of either strongly convex functions or non-convex functions that satisfy the Polyak Lojasiewicz (PL) inequality.
136	Contextual Combinatorial Volatile Multi-armed Bandit with Adaptive Discretization	Andi Nika, Sepehr Elahi, Cem Tekin	Under the semi-bandit feedback setting and assuming that the contexts lie in a space ${\cal X}$ endowed with the Euclidean norm and that the expected base arm outcomes (expected rewards) are Lipschitz continuous in the contexts (expected base arm outcomes), we propose an algorithm called Adaptive Contextual Combinatorial Upper Confidence Bound (ACC-UCB).
137	A Unified Analysis of Extra-gradient and Optimistic Gradient Methods for Saddle Point Problems: Proximal Point Approach	Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil	In this paper we consider solving saddle point problems using two variants of Gradient Descent-Ascent algorithms, Extra-gradient (EG) and Optimistic Gradient Descent Ascent (OGDA) methods.
138	Bandit Convex Optimization in Non-stationary Environments	Peng Zhao, Guanghui Wang, Lijun Zhang, Zhi-Hua Zhou	In this paper, we investigate BCO in non-stationary environments and choose the dynamic regret as the performance measure, which is defined as the difference between the cumulative loss incurred by the algorithm and that of any feasible comparator sequence.
139	Decentralized Multi-player Multi-armed Bandits with No Collision Information	Chengshuai Shi, Wei Xiong, Cong Shen, Jing Yang	The decentralized stochastic multi-player multi-armed bandit (MP-MAB) problem, where the collision information is not available to the players, is studied in this paper.
140	Bayesian Image Classification with Deep Convolutional Gaussian Processes	Vincent Dutordoir, Mark van der Wilk, Artem Artemev, James Hensman	We propose a translation insensitive convolutional kernel, which relaxes the translation invariance constraint imposed by previous convolutional GPs.
141	Optimizing Millions of Hyperparameters by Implicit Differentiation	Jonathan Lorraine, Paul Vicol, David Duvenaud	We propose an algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations.
142	A Topology Layer for Machine Learning	Rickard Brüel Gabrielsson, Bradley J. Nelson, Anjan Dwaraknath, Primoz Skraba	We present a differentiable topology layer that computes persistent homology based on level set filtrations and edge-based filtrations.
143	Differentiable Feature Selection by Discrete Relaxation	Rishit Sheth, Nicolò Fusi	In this paper, we introduce Differentiable Feature Selection, a gradient-based search algorithm for feature selection.
144	Private Protocols for U-Statistics in the Local Model and Beyond	James Bell, Aurélien Bellet, Adria Gascon, Tejas Kulkarni	In this paper, we study the problem of computing $U$-statistics of degree $2$, i.e., quantities that come in the form of averages over pairs of data points, in the local model of differential privacy (LDP).
145	Automatic Differentiation of Some First-Order Methods in Parametric Optimization	Sheheryar Mehmood, Peter Ochs	We aim at computing the derivative of the solution to a parametric optimization problem with respect to the involved parameters.
146	DYNOTEARS: Structure Learning from Time-Series Data	Roxana Pamfil, Nisara Sriwattanaworachai, Shaan Desai, Philip Pilgerstorfer, Konstantinos Georgatzis, Paul Beaumont, Bryon Aragam	We revisit the structure learning problem for dynamic Bayesian networks and propose a method that simultaneously estimates contemporaneous (intra-slice) and time-lagged (inter-slice) relationships between variables in a time-series.
147	Unsupervised Hierarchy Matching with Optimal Transport over Hyperbolic Spaces	David Alvarez-Melis, Youssef Mroueh, Tommi Jaakkola	In contrast, we approach the problem from a purely geometric perspective: given only a vector-space representation of the items in the two hierarchies, we seek to infer correspondences across them.
148	Competing Bandits in Matching Markets	Lydia T. Liu, Horia Mania, Michael Jordan	We propose a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards.
149	Revisiting the Landscape of Matrix Factorization	Hossein Valavi, Sulin Liu, Peter Ramadge	We revisit this problem and provide simple, intuitive proofs of a set of extended results for low-rank and general-rank problems.
150	Value Preserving State-Action Abstractions	David Abel, Nate Umbanhowar, Khimya Khetarpal, Dilip Arumugam, Doina Precup, Michael Littman	To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve representation of near-optimal policies.
151	GP-VAE: Deep Probabilistic Time Series Imputation	Vincent Fortuin, Dmitry Baranchuk, Gunnar Raetsch, Stephan Mandt	We propose a new deep sequential latent variable model for dimensionality reduction and data imputation.
152	Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction	Boyue Li, Shicong Cen, Yuxin Chen, Yuejie Chi	This paper focuses on distributed optimization in the network setting (also known as the decentralized setting), where each agent is only allowed to aggregate information from its neighbors over a graph.
153	Optimized Score Transformation for Fair Classification	Dennis Wei, Karthikeyan Natesan Ramamurthy, Flavio Calmon	In the finite sample setting, we propose to approach this solution using a combination of standard probabilistic classifiers and ADMM.
154	Variational Autoencoders for Sparse and Overdispersed Discrete Data	He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung, Mingyuan Zhou	To address these issues, we develop a VAE-based framework using the negative binomial distribution as the data distribution.
155	Spatio-temporal alignments: Optimal transport through space and time	Hicham Janati, Marco Cuturi, Alexandre Gramfort	In this paper, we propose Spatio-Temporal Alignments (STA), a new differentiable formulation of DTW that captures spatial and temporal variability.
156	Accelerating Smooth Games by Manipulating Spectral Shapes	Waïss Azizian, Damien Scieur, Ioannis Mitliagkas, Simon Lacoste-Julien, Gauthier Gidel	In this framework, we describe gradient-based methods, such as extragradient, as transformations on the spectral shape.
157	Langevin Monte Carlo without smoothness	Niladri Chatterji, Jelena Diakonikolas, Michael I. Jordan, Peter Bartlett	In this paper, we remove this limitation, providing polynomial-time convergence guarantees for a variant of LMC in the setting of nonsmooth log-concave distributions.
158	EM Converges for a Mixture of Many Linear Regressions	Jeongyeol Kwon, Constantine Caramanis	We study the convergence of the Expectation-Maximization (EM) algorithm for mixtures of linear regressions with an arbitrary number $k$ of components.
159	Locally Accelerated Conditional Gradients	Jelena Diakonikolas, Alejandro Carderera, Sebastian Pokutta	To address this issue, we present Locally Accelerated Conditional Gradients – an algorithmic framework that couples accelerated steps with conditional gradient steps to achieve \emph{local} acceleration on smooth strongly convex problems.
160	Coping With Simulators That Don't Always Return	Andrew Warrington, Frank Wood, Saeid Naderiparizi	We investigate and address computational inefficiencies that arise from adding process noise to deterministic simulators that fail to return for certain inputs; a property we describe as ’brittle’.
161	Post-Estimation Smoothing: A Simple Baseline for Learning with Side Information	Esther Rolf, Michael I. Jordan, Benjamin Recht	We propose a post-estimation smoothing operator as a fast and effective method for incorporating structural index data into prediction.
162	Equalized odds postprocessing under imperfect group information	Pranjal Awasthi, Matthäus Kleindessner, Jamie Morgenstern	In this paper, we ask to what extent fairness interventions can be effective even when only imperfect information about the protected attribute is available.
163	The True Sample Complexity of Identifying Good Arms	Julian Katz-Samuels, Kevin Jamieson	We consider two multi-armed bandit problems with $n$ arms: \emph{(i)} given an $\epsilon > 0$, identify an arm with mean that is within $\epsilon$ of the largest mean and \emph{(ii)} given a threshold $\mu_0$ and integer $k$, identify $k$ arms with means larger than $\mu_0$.
164	Validated Variational Inference via Practical Posterior Error Bounds	Jonathan Huggins, Mikolaj Kasprzak, Trevor Campbell, Tamara Broderick	In this paper, we provide rigorous bounds on the error of posterior mean and uncertainty estimates that arise from full-distribution approximations, as in variational inference.
165	A Rule for Gradient Estimator Selection, with an Application to Variational Inference	Tomas Geffner, Justin Domke	Inspired by this principle, we propose a technique to automatically select an estimator when a finite pool of estimators is given.
166	Naive Feature Selection: Sparsity in Naive Bayes	Armin Askari, Alexandre d'Aspremont, Laurent El Ghaoui	We propose a sparse version of naive Bayes, which can be used for feature selection.
167	Fixed-confidence guarantees for Bayesian best-arm identification	Xuedong Shang, Rianne de Heide, Pierre Menard, Emilie Kaufmann, Michal Valko	As our main contribution, we provide the first sample complexity analysis of TTTS and T3C when coupled with a very natural Bayesian stopping rule, for bandits with Gaussian rewards, solving one of the open questions raised by Russo (2016).
168	Learning Hierarchical Interactions at Scale: A Convex Optimization Approach	Hussein Hazimeh, Rahul Mazumder	In this paper, we study a convex relaxation which enforces strong hierarchy and develop a highly scalable algorithm based on proximal gradient descent.
169	OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits	Niladri Chatterji, Vidya Muthukumar, Peter Bartlett	We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information.
170	Optimization Methods for Interpretable Differentiable Decision Trees Applied to Reinforcement Learning	Andrew Silva, Matthew Gombolay, Taylor Killian, Ivan Jimenez, Sung-Hyun Son	We overcome this limitation by allowing for a gradient update over the entire tree that improves sample complexity and affords interpretable policy extraction.
171	Sharp Analysis of Expectation-Maximization for Weakly Identifiable Models	Raaz Dwivedi, Nhat Ho, Koulik Khamaru, Martin Wainwright, Michael Jordan, Bin Yu	We study a class of weakly identifiable location-scale mixture models for which the maximum likelihood estimates based on $n$ i.i.d. samples are known to have lower accuracy than the classical $n^{- \frac{1}{2}}$ error.
172	Stochastic Particle-Optimization Sampling and the Non-Asymptotic Convergence Theory	Jianyi Zhang, Ruiyi Zhang, Lawrence Carin, Changyou Chen	Notably, for the first time, we develop non-asymptotic convergence theory for the SPOS framework (related to SVGD), characterizing algorithm convergence in terms of the 1-Wasserstein distance w.r.t. the numbers of particles and iterations.
173	Dynamical Systems Theory for Causal Inference with Application to Synthetic Control Methods	Yi Ding, Panos Toulis	In this paper, we adopt results in nonlinear time series analysis for causal inference in dynamical settings.
174	RelatIF: Identifying Explanatory Training Samples via Relative Influence	Elnaz Barshan, Marc-Etienne Brunet, Gintare Karolina Dziugaite	In this work, we focus on the use of influence functions to identify relevant training examples that one might hope “explain” the predictions of a machine learning model.
175	Ensemble Gaussian Processes with Spectral Features for Online Interactive Learning with Scalability	Qin Lu, Georgios Karanikolas, Yanning Shen, Georgios B. Giannakis	While most GP approaches rely on a single preselected prior, the present work employs a weighted ensemble of GP priors, each having a unique covariance (kernel) belonging to a prescribed kernel dictionary – which leads to a richer space of learning functions.
176	Distributionally Robust Bayesian Quadrature Optimization	Thanh Nguyen, Sunil Gupta, Huong Ha, Santu Rana, Svetha Venkatesh	In this work, we study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
177	Sparse Orthogonal Variational Inference for Gaussian Processes	Jiaxin Shi, Michalis Titsias, Andriy Mnih	We introduce a new interpretation of sparse variational approximations for Gaussian processes using inducing points, which can lead to more scalable algorithms than previous methods.
178	The Sylvester Graphical Lasso (SyGlasso)	Yu Wang, Byoungwook Jang, Alfred Hero	This paper introduces the Sylvester graphical lasso (SyGlasso) that captures multiway dependencies present in tensor-valued data.
179	Frequentist Regret Bounds for Randomized Least-Squares Value Iteration	Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, Alessandro Lazaric	In this paper, we introduce an optimistically-initialized variant of the popular randomized least-squares value iteration (RLSVI), a model-free algorithm where exploration is induced by perturbing the least-squares approximation of the action-value function.
180	DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate	Saeed Soori, Konstantin Mishchenko, Aryan Mokhtari, Maryam Mehri Dehnavi, Mert Gurbuzbalaban	In this paper, we consider distributed algorithms for solving the empirical risk minimization problem under the master/worker communication model.
181	Discrete Action On-Policy Learning with Action-Value Critic	Yuguang Yue, Yunhao Tang, Mingzhang Yin, Mingyuan Zhou	To effectively operate in multidimensional discrete action spaces, we construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
182	Old Dog Learns New Tricks: Randomized UCB for Bandit Problems	Sharan Vaswani, Abbas Mehrabian, Audrey Durand, Branislav Kveton	We propose RandUCB, a bandit strategy that uses theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms, but akin to Thompson sampling (TS), uses randomization to trade off exploration and exploitation.
183	Thompson Sampling for Linearly Constrained Bandits	Vidit Saxena, Joakim Jalden, Joseph Gonzalez	In this paper, we describe LinConTS, a TS-based algorithm for bandits that place a linear constraint on the probability of earning a reward in every round.
184	Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles	Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder Singh	In this paper, we consider a setting where we have access to an ensemble of pre-trained and possibly inaccurate simulators (models).
185	FedPAQ: A Communication-Efficient Federated Learning Method with Periodic Averaging and Quantization	Amirhossein Reisizadeh, Aryan Mokhtari, Hamed Hassani, Ali Jadbabaie, Ramtin Pedarsani	In this paper, we present FedPAQ, a communication-efficient Federated Learning method with Periodic Averaging and Quantization.
186	Online Learning Using Only Peer Prediction	Yang Liu, Dave Helmbold	We propose an approach that uses peer prediction and identify conditions where it succeeds.
187	Deontological Ethics By Monotonicity Shape Constraints	Serena Wang, Maya Gupta	We propose that in some cases such ethical principles can be incorporated into a machine-learned model by adding shape constraints that constrain the model to respond only positively to relevant inputs.
188	On Random Subsampling of Gaussian Process Regression: A Graphon-Based Analysis	Kohei Hayashi, Masaaki Imaizumi, Yuichi Yoshida	In this paper, we study random subsampling of Gaussian process regression, one of the simplest approximation baselines, from a theoretical perspective.
189	Randomized Exploration in Generalized Linear Bandits	Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier	We study two randomized algorithms for generalized linear bandits.
190	Assessing Local Generalization Capability in Deep Models	Huan Wang, Nitish Shirish Keskar, Caiming Xiong, Richard Socher	Guided by the proof, we propose a metric to score the generalization capability of a model, as well as an algorithm that optimizes the perturbed model accordingly.
191	Fast Algorithms for Computational Optimal Transport and Wasserstein Barycenter	Wenshuo Guo, Nhat Ho, Michael Jordan	We provide theoretical complexity analysis for new algorithms to compute the optimal transport (OT) distance between two discrete probability distributions, and demonstrate their favorable practical performance compared to state-of-the-art primal-dual algorithms.
192	Adaptive Discretization for Evaluation of Probabilistic Cost Functions	Christoph Zimmer, Danny Driess, Mona Meister, Duy Nguyen-Tuong	In this paper, we propose an approach for evaluating continuous candidate paths by employing an adaptive discretization scheme, with a probabilistic cost function learned from observations.
193	Censored Quantile Regression Forest	Alexander Hanbo Li, Jelena Bradic	Based on a local adaptive representation of random forests, we develop its regression adjustment for randomly censored regression quantile models.
194	Choosing the Sample with Lowest Loss makes SGD Robust	Vatsal Shah, Xiaoxia Wu, Sujay Sanghavi	In this paper we propose a simple variant of the simple SGD method: in each step, first choose a set of k samples, then from these choose the one with the smallest current loss, and do an SGD-like update with this chosen sample.
195	Learning with minibatch Wasserstein: asymptotic and gradient properties	Kilian Fatras, Younes Zine, Rémi Flamary, Remi Gribonval, Nicolas Courty	We propose in this paper an analysis of this practice, whose effects are not well understood so far.
196	AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC	Ruqi Zhang, A. Feder Cooper, Christopher De Sa	To address this tension, we propose a novel second-order SG-MCMC algorithm—AMAGOLD—that infrequently uses Metropolis-Hastings (M-H) corrections to remove bias.
197	On casting importance weighted autoencoder to an EM algorithm to learn deep generative models	Dongha Kim, Jaesung Hwang, Yongdai Kim	We propose a new and general approach to learn deep generative models.
198	Conditional Linear Regression	Diego Calderon, Brendan Juba, Sirui Li, Zongyi Li, Lisa Ruan	We give a computationally efficient algorithm with theoretical analysis for the conditional linear regression task, which is the joint task of identifying a significant portion of the data distribution, described by a k-DNF, along with a linear predictor on that portion with a small loss.
199	Distributionally Robust Bayesian Optimization	Johannes Kirschner, Ilija Bogunovic, Stefanie Jegelka, Andreas Krause	In this paper, we study such a problem when the distributional shift is measured via the maximum mean discrepancy (MMD).
200	On the optimality of kernels for high-dimensional clustering	Leena C Vankadara, Debarghya Ghoshdastidar	We consider the problem of high dimensional Gaussian clustering and show that, with the exponential kernel function, the sufficient conditions for partial recovery of clusters using the NP-hard kernel k-means objective matches the known information-theoretic limit up to a factor of $\sqrt{2}$.
201	Improved Regret Bounds for Projection-free Bandit Convex Optimization	Dan Garber, Ben Kretzu	We present the first such algorithm that attains $O(T^{3/4})$ expected regret using only $O(T)$ overall calls to the linear optimization oracle, in expectation, where $T$ is the number of prediction rounds.
202	Variational Autoencoders and Nonlinear ICA: A Unifying Framework	Ilyes Khemakhem, Diederik Kingma, Ricardo Monti, Aapo Hyvarinen	We address this issue by showing that for a broad family of deep latent-variable models, identification of the true joint distribution over observed and latent variables is actually possible up to very simple transformations, thus achieving a principled and powerful form of disentanglement.
203	Online Learning with Continuous Variations: Dynamic Regret and Reductions	Ching-An Cheng, Jonathan Lee, Ken Goldberg, Byron Boots	Motivated by this observation, we establish a new setup, called Continuous Online Learning (COL), where the gradient of online loss function changes continuously across rounds with respect to the learner’s decisions.
204	An Optimal Algorithm for Bandit Convex Optimization with Strongly-Convex and Smooth Loss	Shinji Ito	Our study resolves these two issues by introducing an algorithm that achieves an optimal regret bound of $\tilde{O}(d \sqrt{T})$ under a mild assumption, without self-concordant barriers.
205	A Deep Generative Model for Fragment-Based Molecule Generation	Marco Podda, Davide Bacciu, Alessio Micheli	In this work, we address two limitations of the former: generation of invalid and duplicate molecules.
206	Deep Structured Mixtures of Gaussian Processes	Martin Trapp, Robert Peharz, Franz Pernkopf, Carl Edward Rasmussen	In this paper, we introduce deep structured mixtures of GP experts, a stochastic process model which i) allows exact posterior inference, ii) has attractive computational and memory costs, and iii) when used as GP approximation, captures predictive uncertainties consistently better than previous expert-based approximations.
207	Noisy-Input Entropy Search for Efficient Robust Bayesian Optimization	Lukas Fröhlich, Edgar Klenske, Julia Vinogradska, Christian Daniel, Melanie Zeilinger	We consider the problem of robust optimization within the well-established Bayesian Optimization (BO) framework.
208	Dependent randomized rounding for clustering and partition systems with knapsack constraints	David Harris, Thomas Pensyl, Aravind Srinivasan, Khoa Trinh	We develop new randomized algorithms targeting such problems, and study two in particular: multi-knapsack median and multi-knapsack center.
209	Domain-Liftability of Relational Marginal Polytopes	Ondrej Kuzelka, Yuyi Wang	In this paper we study the following two problems: (i) Do domain-liftability results for the partition functions of Markov logic networks (MLNs) carry over to the problem of relational marginal polytope construction?
210	Derivative-Free & Order-Robust Optimisation	Haitham Ammar, Victor Gabillon, Rasul Tutunov, Michal Valko	In this paper, we formalise order-robust optimisation as an instance of online learning minimising simple regret, and propose Vroom, a zero’th order optimisation algorithm capable of achieving vanishing regret in non-stationary environments, while recovering favorable rates under stochastic reward-generating processes.
211	Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning	Yao Zhang, Daniel Jarrett, Mihaela van der Schaar	In this paper, we propose a novel Bayesian optimization (BO) algorithm to tackle the challenge of model selection in this setting.
212	Dynamic content based ranking	Seppo Virtanen, Mark Girolami	We introduce a novel state space model for a set of sequentially time-stamped partial rankings of items and textual descriptions for the items.
213	Fairness Evaluation in Presence of Biased Noisy Labels	Riccardo Fogliato, Alexandra Chouldechova, Max G’Sell	We propose a sensitivity analysis framework for assessing how assumptions on the noise across groups affect the predictive bias properties of the risk assessment model as a predictor of reoffense.
214	Calibrated Surrogate Maximization of Linear-fractional Utility in Binary Classification	Han Bao, Masashi Sugiyama	In this paper, we consider linear-fractional metrics, which are a family of classification performance metrics that encompasses many standard ones such as the F-measure and Jaccard index, and propose methods to directly maximize performances under those metrics.
215	Decentralized gradient methods: does topology matter?	Giovanni Neglia, Chuan Xu, Don Towsley, Gianmarco Calbi	While theoretical results suggest that worker communication topology should have a strong impact on the number of epochs needed to converge, previous experiments have shown the opposite conclusion. This paper sheds light on this apparent contradiction and shows how sparse topologies can lead to faster convergence even in the absence of communication delays.
216	Truly Batch Model-Free Inverse Reinforcement Learning about Multiple Intentions	Giorgia Ramponi, Amarildo Likmeta, Alberto Maria Metelli, Andrea Tirinzoni, Marcello Restelli	In this paper, we address the IRL about multiple intentions in a fully model-free and batch setting.
217	Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness	Antônio H. Ribeiro, Koen Tiels, Luis A. Aguirre, Thomas Schön	In this paper, we argue that this principle, while powerful, might need some refinement to explain recent developments.
218	Accelerated Primal-Dual Algorithms for Distributed Smooth Convex Optimization over Networks	Jinming Xu, Ye Tian, Ying Sun, Gesualdo Scutari	This paper proposes a novel family of primal-dual-based distributed algorithms for smooth, convex, multi-agent optimization over networks that uses only gradient information and gossip communications.
219	Stochastic Linear Contextual Bandits with Diverse Contexts	Weiqiang Wu, Jing Yang, Cong Shen	In this paper, we investigate the impact of context diversity on stochastic linear contextual bandits.
220	Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models	Benjamin Lengerich, Sarah Tan, Chun-Hao Chang, Giles Hooker, Rich Caruana	To compute this decomposition, we present a fast, exact algorithm that transforms any piecewise-constant function (such as a tree-based model) into a purified, canonical representation.
221	Balanced Off-Policy Evaluation in General Action Spaces	Arjun Sondhi, David Arbour, Drew Dimmery	In this work we present balanced off-policy evaluation (B-OPE), a generic method for estimating weights which minimize this imbalance.
222	Approximate Cross-Validation in High Dimensions with Guarantees	William Stephenson, Tamara Broderick	Approximate Cross-Validation in High Dimensions with Guarantees
223	How fine can fine-tuning be? Learning efficient language models	Evani Radiya-Dixit, Xin Wang	In this work, we address these questions by using Bidirectional Encoder Representations from Transformers (BERT) as an example.
224	Interpretable Companions for Black-Box Models	Danqing Pan, Tong Wang, Satoshi Hara	We present an interpretable companion model for any pre-trained black-box classifiers.
225	A PTAS for the Bayesian Thresholding Bandit Problem	Yue Qin, Jian Peng, Yuan Zhou	In this paper, we study the Bayesian thresholding bandit problem (BTBP), where the goal is to adaptively make a budget of $Q$ queries to $n$ stochastic arms and determine the label for each arm (whether its mean reward is closer to $0$ or $1$).
226	Learning Rate Adaptation for Differentially Private Learning	Antti Koskela, Antti Honkela	In this paper, we propose a differentially private algorithm for the adaptation of the learning rate for differentially private stochastic gradient descent (SGD) that avoids the need for validation set use.
227	Thresholding Graph Bandits with GrAPL	Daniel LeJeune, Gautam Dasarathy, Richard Baraniuk	In this paper, we introduce a new online decision making paradigm that we call Thresholding Graph Bandits.
228	Bandit optimisation of functions in the Matérn kernel RKHS	David Janz, David Burt, Javier Gonzalez	We consider the problem of optimising functions in the reproducing kernel Hilbert space (RKHS) of a Matérn kernel with smoothness parameter $\nu$ over the domain $[0,1]^d$ under noisy bandit feedback.
229	Hypothesis Testing Interpretations and Renyi Differential Privacy	Borja Balle, Gilles Barthe, Marco Gaboardi, Justin Hsu, Tetsuya Sato	In this paper, we identify some conditions under which a privacy definition given in terms of a statistical divergence satisfies a similar interpretation.
230	Lipschitz Continuous Autoencoders in Application to Anomaly Detection	Young-geun Kim, Yongchan Kwon, Hyunwoong Chang, Myunghee Cho Paik	In this work, we formalize current practices, build a theoretical framework of anomaly detection algorithms equipped with an objective function and a hypothesis space, and establish a desirable property of the anomaly detection algorithm, namely, admissibility.
231	Private k-Means Clustering with Stability Assumptions	Moshe Shechner, Or Sheffet, Uri Stemmer	In this work we improve upon this line of work on multiple axes.
232	Momentum in Reinforcement Learning	Nino Vieillard, Bruno Scherrer, Olivier Pietquin, Matthieu Geist	Specifically, we propose a simple improvement on DQN based on MoVI, and experiment with it on Atari games.
233	A Primal-Dual Solver for Large-Scale Tracking-by-Assignment	Stefan Haller, Mangal Prakash, Lisa Hutschenreiter, Tobias Pietzsch, Carsten Rother, Florian Jug, Paul Swoboda, Bogdan Savchynskyy	We propose a fast approximate solver for the combinatorial problem known as tracking-by-assignment, which we apply to cell tracking.
234	Precision-Recall Curves Using Information Divergence Frontiers	Josip Djolonga, Mario Lucic, Marco Cuturi, Olivier Bachem, Olivier Bousquet, Sylvain Gelly	In this paper, we present a general evaluation framework for generative models that measures the trade-off between precision and recall using Renyi divergences.
235	Computing Tight Differential Privacy Guarantees Using FFT	Antti Koskela, Joonas Jälkö, Antti Honkela	In this paper, we propose a numerical accountant for evaluating the privacy loss for algorithms with continuous one dimensional output.
236	Hyperbolic Manifold Regression	Gian Marconi, Carlo Ciliberto, Lorenzo Rosasco	In this work, we consider the task of regression onto hyperbolic space for which we propose two approaches: a non-parametric kernel-method for which we also prove excess risk bounds and a parametric deep learning model that is informed by the geodesics of the target space.
237	Approximate Inference with Wasserstein Gradient Flows	Charlie Frogner, Tomaso Poggio	We present a novel approximate inference method for diffusion processes, based on the Wasserstein gradient flow formulation of the diffusion.
238	Thresholding Bandit Problem with Both Duels and Pulls	Yichong Xu, Xi Chen, Aarti Singh, Artur Dubrawski	This paper provides an algorithm called Rank-Search (RS) for solving TBP-DC by alternating between ranking and binary search.
239	GAIT: A Geometric Approach to Information Theory	Jose Gallego Posada, Ankit Vani, Max Schwarzer, Simon Lacoste-Julien	Based on this notion of entropy, we introduce geometry-aware counterparts for several concepts and theorems in information theory.
240	On Thompson Sampling for Smoother-than-Lipschitz Bandits	James Grant, David Leslie	We provide the first bounds on the regret of Thompson Sampling for continuum armed bandits under weak conditions on the function class containing the true function and sub-exponential observation noise.
241	Safe-Bayesian Generalized Linear Regression	Rianne Heide, Alisa Kirichenko, Peter Grunwald, Nishant Mehta	We show that for generalized linear models (GLMs), $\eta$-generalized Bayes concentrates around the best approximation of the truth within the model for specific $\eta \neq 1$, even under severely misspecified noise, as long as the tails of the true distribution are exponential.
242	Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy	Majid Jahani, Xi He, Chenxin Ma, Aryan Mokhtari, Dheevatsa Mudigere, Alejandro Ribeiro, Martin Takac	In this paper, we propose a Distributed Accumulated Newton Conjugate gradiEnt (DANCE) method in which sample size is gradually increasing to quickly obtain a solution whose empirical loss is under satisfactory statistical accuracy.
243	Contextual Constrained Learning for Dose-Finding Clinical Trials	Hyun-Suk Lee, Cong Shen, James Jordon, Mihaela Schaar	In this paper, we propose C3T-Budget, a contextual constrained clinical trial algorithm for dose-finding under both budget and safety constraints.
244	Support recovery and sup-norm convergence rates for sparse pivotal estimation	Mathurin Massias, Quentin Bertrand, Alexandre Gramfort, Joseph Salmon	In this work we show minimax sup-norm convergence rates for non smoothed and smoothed, single task and multitask square-root Lasso-type estimators.
245	Learning Entangled Single-Sample Distributions via Iterative Trimming	Hui Yuan, Yingyu Liang	We study mean estimation and linear regression under general conditions, and analyze a simple and computationally efficient method based on iteratively trimming samples and re-estimating the parameter on the trimmed sample set.
246	The Quantile Snapshot Scan: Comparing Quantiles of Spatial Data from Two Snapshots in Time	Travis Moore, Weng-Keen Wong	We introduce the Quantile Snapshot Scan (Qsnap), a spatial scan algorithm which identifies spatial regions that differ the most between two snapshots in time.
247	Statistical guarantees for local graph clustering	Wooseok Ha, Kimon Fountoulakis, Michael Mahoney	In this paper, we adopt a statistical perspective on local graph clustering, and we analyze the performance of the l1-regularized PageRank method for the recovery of a single target cluster, given a seed node inside the cluster.
248	Learning High-dimensional Gaussian Graphical Models under Total Positivity without Adjustment of Tuning Parameters	Yuhao Wang, Uma Roy, Caroline Uhler	We here propose a new method to estimate the underlying undirected graphical model under MTP2 and show that it is provably consistent in structure recovery without adjusting the tuning parameters.
249	On Pruning for Score-Based Bayesian Network Structure Learning	Alvaro Henrique Chaim Correia, James Cussens, Cassio Campos	We derive new non-trivial theoretical upper bounds for the BDeu score that considerably improve on the state-of-the-art.
250	Statistical and Computational Rates in Graph Logistic Regression	Quentin Berthet, Nicolai Baldin	We consider the problem of graph logistic regression, based on partial observation of a large network, and on side information associated to its vertices.
251	Kernels over Sets of Finite Sets using RKHS Embeddings, with Application to Bayesian (Combinatorial) Optimization	Poompol Buathong, David Ginsbourger, Tipaluck Krityakierne	We focus on kernel methods for set-valued inputs and their application to Bayesian set optimization, notably combinatorial optimization.
252	Rk-means: Fast Clustering for Relational Data	Ryan Curtin, Benjamin Moseley, Hung Ngo, XuanLong Nguyen, Dan Olteanu, Maximilian Schleich	This paper introduces Rk-means, or relational k-means algorithm, for clustering relational data tuples without having to access the full data matrix.
253	Statistical Estimation of the Poincaré constant and Application to Sampling Multimodal Distributions	Loucas Pillaud-Vivien	In this paper, we show both theoretically and experimentally that, given sufficiently many samples of a measure, we can estimate its Poincaré constant.
254	Integrals over Gaussians under Linear Domain Constraints	Alexandra Gessner, Oindrila Kanjilal, Philipp Hennig	We present an efficient black-box algorithm that exploits geometry for the estimation of integrals over a small, truncated Gaussian volume, and to simulate therefrom.
255	Taxonomy of Dual Block-Coordinate Ascent Methods for Discrete Energy Minimization	Siddharth Tourani, Alexander Shekhovtsov, Carsten Rother, Bogdan Savchynskyy	We consider the maximum-a-posteriori inference problem in discrete graphical models and study solvers based on the dual block-coordinate ascent rule.
256	PersLay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures	Mathieu Carriere, Frederic Chazal, Yuichi Ike, Theo Lacombe, Martin Royer, Yuhei Umeda	In this work, we focus on persistence diagrams built on top of graphs.
257	MAP Inference for Customized Determinantal Point Processes via Maximum Inner Product Search	Insu Han, Jennifer Gillenwater	In this work, we propose a new MAP algorithm: we show that, by performing a one-time preprocessing step on a basic DPP, it is possible to run an approximate version of the standard greedy MAP approximation algorithm on any customized version of the DPP in time sublinear in the number of items.
258	Why Non-myopic Bayesian Optimization is Promising and How Far Should We Look-ahead? A Study via Rollout	Xubo Yue, Raed AL Kontar	In this work we focus on the rollout approximation for solving the intractable DP.
259	Robust Optimisation Monte Carlo	Borislav Ikonomov, Michael U. Gutmann	In this paper, we demonstrate an important previously unrecognised failure mode of OMC: It generates strongly overconfident approximations by collapsing regions of similar or near-constant likelihood into a single point.
260	Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis	Ryan Rogers, Aaron Roth, Adam Smith, Nathan Srebro, Om Dipakbhai Thakkar, Blake Woodworth	Our main contribution is to design a framework for providing valid, instance-specific confidence intervals for point estimates that can be generated by heuristics.
261	Fast Markov chain Monte Carlo algorithms via Lie groups	Steve Huntsman	From basic considerations of the Lie group that preserves a target probability measure, we derive the Barker, Metropolis, and ensemble Markov chain Monte Carlo (MCMC) algorithms, as well as variants of waste-recycling Metropolis-Hastings and an altogether new MCMC algorithm.
262	Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning	Tianyu Li, Bogdan Mazoure, Doina Precup, Guillaume Rabusseau	In this paper, we propose a novel algorithm that incorporates reward information into the representations of the environment to unify these two stages.
263	A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Differentiable Games	Waïss Azizian, Ioannis Mitliagkas, Simon Lacoste-Julien, Gauthier Gidel	We provide new analysis of the EG’s local and global convergence properties and use it to get a tighter global convergence rate for OG and CO.
264	Doubly Sparse Variational Gaussian Processes	Vincent Adam, Stefanos Eleftheriadis, Artem Artemev, Nicolas Durrande, James Hensman	In this work, we propose to take the best of both worlds: we show that the inducing point framework is still valid for state space models and that it can bring further computational and memory savings.
265	Online Convex Optimization with Perturbed Constraints: Optimal Rates against Stronger Benchmarks	Victor Valls, George Iosifidis, Douglas Leith, Leandros Tassiulas	To this end, we present an online primal-dual proximal gradient algorithm that has $O(T^\epsilon \vee T^{1-\epsilon})$ regret and $O(T^\epsilon)$ constraint violation, where $\epsilon \in [0,1)$ is a parameter in the learning rate.
266	Persistence Enhanced Graph Neural Network	Qi Zhao, Ze Ye, Chao Chen, Yusu Wang	To fully exploit such structural information in real world graphs, we propose a new network architecture which learns to use persistent homology information to reweight messages passed between graph nodes during convolution.
267	Feature relevance quantification in explainable AI: A causal problem	Dominik Janzing, Lenon Minorics, Patrick Bloebaum	We conclude that unconditional rather than conditional expectations provide the right notion of dropping features.
268	Neural Decomposition: Functional ANOVA with Variational Autoencoders	Kaspar Märtens, Christopher Yau	In this paper, we focus on characterising the sources of variation in Conditional VAEs.
269	BasisVAE: Translation-invariant feature-level clustering with Variational Autoencoders	Kaspar Märtens, Christopher Yau	In this paper, we propose to achieve this through the BasisVAE: a combination of the VAE and a probabilistic clustering prior, which lets us learn a one-hot basis function representation as part of the decoder network.
270	How To Backdoor Federated Learning	Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, Vitaly Shmatikov	We show that this makes federated learning vulnerable to a model-poisoning attack that is significantly more powerful than poisoning attacks that target only the training data.
271	Exploiting Categorical Structure Using Tree-Based Methods	Brian Lucena	We develop a mathematical framework for representing the structure of categorical variables and show how to generalize decision trees to make use of this structure.
272	A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments	Adam Foster, Martin Jankowiak, Matthew O’Meara, Yee Whye Teh, Tom Rainforth	We introduce a fully stochastic gradient based approach to Bayesian optimal experimental design (BOED).
273	Mixed Strategies for Robust Optimization of Unknown Objectives	Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause	We consider robust optimization problems, where the goal is to optimize an unknown objective function against the worst-case realization of an uncertain parameter.
274	Functional Gradient Boosting for Learning Residual-like Networks with Statistical Guarantees	Atsushi Nitanda, Taiji Suzuki	In this paper, to resolve these problems, we propose a new functional gradient boosting for learning deep residual-like networks in a layer-wise fashion with its statistical guarantees on multi-class classification tasks.
275	Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity	Aaron Sidford, Mengdi Wang, Lin Yang, Yinyu Ye	In this paper we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors.
276	Convergence Rates of Smooth Message Passing with Rounding in Entropy-Regularized MAP Inference	Jonathan Lee, Aldo Pacchiano, Michael Jordan	With an appropriately chosen regularization constant, we present a theoretical guarantee on the number of iterations sufficient to recover the true integral MAP solution when the LP is tight and the solution is unique.
277	Finite-Time Error Bounds for Biased Stochastic Approximation with Applications to Q-Learning	Gang Wang, Georgios B. Giannakis	Leveraging a \emph{multistep Lyapunov function} that looks ahead to several future updates to accommodate the gradient bias, we prove a general result on the convergence of the iterates, and use it to derive finite-time bounds on the mean-square error in the case of constant stepsizes.
278	Automated Augmented Conjugate Inference for Non-conjugate Gaussian Process Models	Theo Galy-Fajou, Florian Wenzel, Manfred Opper	We propose automated augmented conjugate inference, a new inference method for non-conjugate Gaussian processes (GP) models. Our method automatically constructs an auxiliary variable augmentation that renders the GP model conditionally conjugate.
279	Bayesian Reinforcement Learning via Deep, Sparse Sampling	Divya Grover, Debabrota Basu, Christos Dimitrakakis	We propose an optimism-free Bayes-adaptive algorithm to induce deeper and sparser exploration with a theoretical bound on its performance relative to the Bayes optimal as well as lower computational complexity.
280	Deterministic Decoding for Discrete Data in Variational Autoencoders	Daniil Polykovskiy, Dmitry Vetrov	In this paper, we study a VAE model with a deterministic decoder (DD-VAE) for sequential data that selects the highest-scoring tokens instead of sampling.
281	Monotonic Gaussian Process Flows	Ivan Ustyuzhaninov, Ieva Kazlauskaite, Carl Henrik Ek, Neill Campbell	We propose a new framework for imposing monotonicity constraints in a Bayesian non-parametric setting based on numerical solutions of stochastic differential equations.
282	Flexible distribution-free conditional predictive bands using density estimators	Rafael Izbicki, Gilson Shimizu, Rafael Stern	We introduce two conformal methods based on conditional density estimators that do not depend on this type of assumption to obtain asymptotic conditional coverage: Dist-split and CD-split.
283	Variational Integrator Networks for Physically Structured Embeddings	Steindor Saemundsson, Alexander Terenin, Katja Hofmann, Marc Deisenroth	By leveraging recent work connecting deep neural networks to systems of differential equations, we propose \emph{variational integrator networks}, a class of neural network architectures designed to preserve the geometric structure of physical systems.
284	Black-Box Inference for Non-Linear Latent Force Models	Wil Ward, Tom Ryder, Dennis Prangle, Mauricio Alvarez	We compare estimates on systems where the posterior is known, demonstrating the effectiveness of the approximation, and apply to problems with non-linear dynamics, multi-output systems and models with non-Gaussian likelihoods.
285	Importance Sampling via Local Sensitivity	Anant Raj, Cameron Musco, Lester Mackey	To overcome both obstacles we introduce \emph{local sensitivity}, which measures data point importance in a ball around some center $x_0$.
286	Convergence Analysis of Block Coordinate Algorithms with Determinantal Sampling	Mojmir Mutny, Michal Derezinski, Andreas Krause	However, we show that when the coordinate blocks are sampled with probability proportional to their determinant, the convergence rate depends solely on the eigenvalue distribution of matrix M, and has an analytically tractable form.
287	Bisect and Conquer: Hierarchical Clustering via Max-Uncut Bisection	Vaggos Chatziafratis, Grigory Yaroslavtsev, Euiwoong Lee, Konstantin Makarychev, Sara Ahmadian, Alessandro Epasto, Mohammad Mahdian	Here, for the maximization dual of Dasgupta’s objective (introduced by Moseley-Wang), we present polynomial-time 42.46% approximation algorithms that use Max-Uncut Bisection as a subroutine.
288	Laplacian-Regularized Graph Bandits: Algorithms and Theoretical Analysis	Kaige Yang, Laura Toni, Xiaowen Dong	We introduce a novel bandit algorithm where the smoothness prior is imposed via the random-walk graph Laplacian, which leads to a single-user cumulative regret scaling as $\Tilde{\mathcal{O}}(\Psi d \sqrt{T})$ with time horizon $T$, feature dimensionality $d$, and the scalar parameter $\Psi \in (0,1)$ that depends on the graph connectivity.
289	Enriched mixtures of generalised Gaussian process experts	Charles Gadd, Sara Wade, Alexis Boukouvalas	We focus on alternative mixtures of GP experts, which model the joint distribution of the inputs and targets explicitly.
290	Causal Bayesian Optimization	Virginia Aglietti, Xiaoyu Lu, Andrei Paleyes, Javier González	We propose a new algorithm called Causal Bayesian Optimization (CBO).
291	Linear predictor on linearly-generated data with missing values: non consistency and solutions	Marine Le Morvan, Nicolas Prost, Julie Josse, Erwan Scornet, Gael Varoquaux	We consider building predictors when the data have missing values.
292	A Novel Confidence-Based Algorithm for Structured Bandits	Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli	We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem and rapidly discard all sub-optimal arms.
293	Quantitative stability of optimal transport maps and linearization of the 2-Wasserstein space	Quentin Mérigot, Alex Delalande, Frederic Chazal	This work studies an explicit embedding of the set of probability measures into a Hilbert space, defined using optimal transport maps from a reference probability density.
294	Bayesian experimental design using regularized determinantal point processes	Michal Derezinski, Feynman Liang, Michael Mahoney	A key novelty is that we offer improved guarantees under the Bayesian framework, where prior knowledge is incorporated into the criteria.
295	Non-exchangeable feature allocation models with sublinear growth of the feature sizes	Giuseppe Di Benedetto, Francois Caron, Yee Whye Teh	In this article, we describe a class of non-exchangeable feature allocation models where the number of objects sharing a given feature grows sublinearly, where the rate can be controlled by a tuning parameter.
296	Calibrated Prediction with Covariate Shift via Unsupervised Domain Adaptation	Sangdon Park, Osbert Bastani, James Weimer, Insup Lee	We propose an algorithm for calibrating predictions that accounts for the possibility of covariate shift, given labeled examples from the training distribution and unlabeled examples from the real-world distribution.
297	Inference of Dynamic Graph Changes for Functional Connectome	Dingjue Ji, Junwei Lu, Yiliang Zhang, Siyuan Gao, Hongyu Zhao	We propose an inferential method to detect the dynamic changes of brain networks based on time-varying graphical models.
298	An approximate KLD based experimental design for models with intractable likelihoods	Ziqiao Ao, Jinglai Li	In this work we consider a special type of ED problems where the likelihoods are not available in a closed form.
299	Almost-Matching-Exactly for Treatment Effect Estimation under Network Interference	Usaid Awan, Marco Morucci, Vittorio Orlandi, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky	We propose a matching method that recovers direct treatment effects from randomized experiments where units are connected in an observed network, and units that share edges can potentially influence each other’s outcomes.
300	“Bring Your Own Greedy”+Max: Near-Optimal 1/2-Approximations for Submodular Knapsack	Grigory Yaroslavtsev, Samson Zhou, Dmitrii Avdiukhin	Motivated by applications to recommendation systems and other scenarios with query-limited access to vast amounts of data, we propose a new rigorous algorithmic framework for a standard formulation of this problem as a submodular maximization subject to a linear (knapsack) constraint.
301	Sample complexity bounds for localized sketching	Rakshith Sharma Srinivasa, Mark Davenport, Justin Romberg	We consider sketched approximate matrix multiplication and ridge regression in the novel setting of localized sketching, where at any given point, only part of the data matrix is available.
302	An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays	Julian Zimmert, Yevgeny Seldin	We propose a new algorithm for adversarial multi-armed bandits with unrestricted delays.
303	Learning Dynamic and Personalized Comorbidity Networks from Event Data using Deep Diffusion Processes	Zhaozhi Qian, Ahmed Alaa, Alexis Bellot, Mihaela Schaar, Jem Rashbass	To this end, we develop deep diffusion processes (DDP) to model ’dynamic comorbidity networks’, i.e., the temporal relationships between comorbid disease onsets expressed through a dynamic graph.
304	Tensorized Random Projections	Beheshteh Rakhshan, Guillaume Rabusseau	We introduce a novel random projection technique for efficiently reducing the dimension of very high-dimensional tensors.
305	Nonparametric Estimation in the Dynamic Bradley-Terry Model	Heejong Bong, Wanshan Li, Shamindra Shrotriya, Alessandro Rinaldo	We propose a time-varying generalization of the Bradley-Terry model that allows for nonparametric modeling of dynamic global rankings of distinct teams.
306	Gaussian-Smoothed Optimal Transport: Metric Structure and Statistical Efficiency	Ziv Goldfeld, Kristjan Greenewald	This work proposes a novel Gaussian-smoothed OT (GOT) framework, that achieves the best of both worlds: preserving the 1-Wasserstein metric structure while alleviating the empirical approximation curse of dimensionality.
307	Learning in Gated Neural Networks	Ashok Makkuva, Sewoong Oh, Sreeram Kannan, Pramod Viswanath	In this paper, we perform a careful analysis of the optimization landscape and show that with appropriately designed loss functions, gradient descent can indeed learn the parameters accurately.
308	Validation of Approximate Likelihood and Emulator Models for Computationally Intensive Simulations	Niccolo Dalmasso, Ann Lee, Rafael Izbicki, Taylor Pospisil, Ilmun Kim, Chieh-An Lin	Here we propose a statistical framework that can distinguish any arbitrary misspecified model from the target likelihood, and that in addition can identify with statistical confidence the regions of parameter as well as feature space where the fit is inadequate.
309	Fenchel Lifted Networks: A Lagrange Relaxation of Neural Network Training	Fangda Gu, Armin Askari, Laurent El Ghaoui	In this paper, we introduce a new class of lifted models, Fenchel lifted networks, that enjoy the same benefits as previous lifted models, without suffering a degradation in performance over classical networks.
310	Adversarial Robustness Guarantees for Classification with Gaussian Processes	Arno Blaas, Andrea Patane, Luca Laurenti, Luca Cardelli, Marta Kwiatkowska, Stephen Roberts	Specifically, given a compact subset of the input space $T\subseteq \mathbb{R}^d$ enclosing a test point $x^*$ and a GPC trained on a dataset $\mathcal{D}$, we aim to compute the minimum and the maximum classification probability for the GPC over all the points in $T$.
311	Causal inference in degenerate systems: An impossibility result	Yue Wang, Linbo Wang	In this paper, we characterize a degenerate causal system using multiplicity of Markov boundaries.
312	ChemBO: Bayesian Optimization of Small Organic Molecules with Synthesizable Recommendations	Ksenia Korovina, Sailun Xu, Kirthevasan Kandasamy, Willie Neiswanger, Barnabas Poczos, Jeff Schneider, Eric Xing	We describe ChemBO, a Bayesian optimization framework for generating and optimizing organic molecules for desired molecular properties.
313	Local Differential Privacy for Sampling	Hisham Husain, Borja Balle, Zac Cranko, Richard Nock	We propose to model this scenario by assuming each individual holds a distribution over the space of data records, and develop novel local DP methods to sample privately from these distributions.
314	Learning Sparse Nonparametric DAGs	Xun Zheng, Chen Dan, Bryon Aragam, Pradeep Ravikumar, Eric Xing	Unlike existing approaches that require specific modeling choices, loss functions, or algorithms, we present a completely general framework that can be applied to general nonlinear models (e.g. without additive noise), general differentiable loss functions, and generic black-box optimization routines.
315	Minimax Rank-$1$ Matrix Factorization	Venkatesh Saligrama, Alexander Olshevsky, Julien Hendrickx	We propose a method based on least squares in the log-space and show its performance matches the lower bounds that we derive for this problem in the small-perturbation regime, which are related to the spectral gap of a graph representing the revealed entries.
316	Context Mover’s Distance & Barycenters: Optimal Transport of Contexts for Building Representations	Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi	We present a framework for building unsupervised representations of entities and their compositions, where each entity is viewed as a probability distribution rather than a vector embedding.
317	Data Generation for Neural Programming by Example	Judith Clymo, Adria Gascon, Brooks Paige, Nathanael Fijalkow, Haik Manukian	In this paper we introduce a novel approach using an SMT solver to synthesize inputs which cover a diverse set of behaviors for a given program.
318	An Inverse-free Truncated Rayleigh-Ritz Method for Sparse Generalized Eigenvalue Problem	Yunfeng Cai, Ping Li	In this paper, we focus on the development of a three-stage algorithm named inverse-free truncated Rayleigh-Ritz method (IFTRR) to efficiently solve SGEP.
319	The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits	Ronshee Chawla, Abishek Sankararaman, Ayalvadi Ganesh, Sanjay Shakkottai	We consider a decentralized multi-agent Multi Armed Bandit (MAB) setup consisting of N agents, solving the same MAB instance to minimize individual cumulative regret.
320	Understanding the Effects of Batching in Online Active Learning	Kareem Amin, Corinna Cortes, Giulia DeSalvo, Afshin Rostamizadeh	In this work, we present an analysis for a generic class of batch online AL algorithms, which reveals that the effects of batching are in fact mild and only result in an additional label complexity term that is quasilinear in the batch size.
321	Adaptive multi-fidelity optimization with fast learning rates	Côme Fiegel, Victor Gabillon, Michal Valko	This paper studies the problem of optimizing a locally smooth function with a limited budget, where the learner has to make a tradeoff between the cost and the bias of these approximations.
322	On the interplay between noise and curvature and its effect on optimization and generalization	Valentin Thomas, Fabian Pedregosa, Bart Merriënboer, Pierre-Antoine Manzagol, Yoshua Bengio, Nicolas Le Roux	While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed.
323	A Reduction from Reinforcement Learning to No-Regret Online Learning	Ching-An Cheng, Remi Tachet Combes, Byron Boots, Geoff Gordon	We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "any" online algorithm with sublinear regret can generate policies with provable performance guarantees.
324	The Implicit Regularization of Ordinary Least Squares Ensembles	Daniel LeJeune, Hamid Javadi, Richard Baraniuk	We study the case of an ensemble of linear predictors, where each individual predictor is fit using ordinary least squares on a random submatrix of the data matrix.
325	Adaptive Exploration in Linear Contextual Bandit	Botao Hao, Tor Lattimore, Csaba Szepesvari	We start to bridge the gap by designing an algorithm that is asymptotically optimal and has good finite-time empirical performance.
326	A Three Sample Hypothesis Test for Evaluating Generative Models	Casey Meehan, Kamalika Chaudhuri, Sanjoy Dasgupta	In this work, we formalize a form of overfitting that we call data-copying, where the generative model memorizes and outputs training samples or small variations thereof.
327	Learning Ising and Potts Models with Latent Variables	Surbhi Goel	We study the problem of learning graphical models with latent variables.
328	Learning piecewise Lipschitz functions in changing environments	Dravyansh Sharma, Maria-Florina Balcan, Travis Dick	In this work we provide an $O(\sqrt{sdT\log T}+sT^{1-\beta})$ regret bound for $\beta$-dispersed functions, where $\beta$ roughly quantifies the rate at which discontinuities appear in the utility functions in expectation (typically $\beta\ge1/2$ in problems of practical interest \cite{2019arXiv190409014B,balcan2018dispersion}).
329	POPCORN: Partially Observed Prediction Constrained Reinforcement Learning	Joseph Futoma, Michael Hughes, Finale Doshi-Velez	We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available.
330	Optimal Approximation of Doubly Stochastic Matrices	Nikitas Rontsis, Paul Goulart	We consider the least-squares approximation of a matrix C in the set of doubly stochastic matrices with the same sparsity pattern as C.
331	The Expressive Power of a Class of Normalizing Flow Models	Zhifeng Kong, Kamalika Chaudhuri	In this work, we study some basic normalizing flows and rigorously establish bounds on their expressive power.
332	Screening Data Points in Empirical Risk Minimization via Ellipsoidal Regions and Safe Loss Functions	Grégoire Mialon, Julien Mairal, Alexandre d'Aspremont	We design simple screening tests to automatically discard data samples in empirical risk minimization without losing optimization guarantees.
333	An Empirical Study of Stochastic Gradient Descent with Structured Covariance Noise	Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba	To address the problem of improving generalization while maintaining optimal convergence in large-batch training, we propose to add covariance noise to the gradients.
334	Amortized Inference of Variational Bounds for Learning Noisy-OR	Yiming Yan, Melissa Ailem, Fei Sha	In this paper, we propose Amortized Conjugate Posterior (ACP), a hybrid approach taking advantage of both types of approaches.
335	Gain with no Pain: Efficiency of Kernel-PCA by Nyström Sampling	Nicholas Sterge, Bharath Sriperumbudur, Lorenzo Rosasco, Alessandro Rudi	In this paper, we analyze a Nyström based approach to efficient large scale kernel principal component analysis (PCA).
336	Logistic regression with peer-group effects via inference in higher-order Ising models	Constantinos Daskalakis, Nishanth Dikkala, Ioannis Panageas	In this work we study extensions of these models to models with higher-order sufficient statistics, modeling behavior on a social network with peer-group effects.
337	An Asymptotic Rate for the LASSO Loss	Cynthia Rush	We study the linear asymptotic regime where the under sampling ratio, n/p, approaches a constant greater than 0 in the limit.
338	Constructing a provably adversarially-robust classifier from a high accuracy one	Grzegorz Gluch, Rüdiger Urbanke	In this paper we focus on our conceptual contribution, but we do present two examples to illustrate our framework.
339	Distributed, partially collapsed MCMC for Bayesian Nonparametrics	Kumar Avinava Dubey, Michael Zhang, Eric Xing, Sinead Williamson	We exploit the fact that completely random measures, in terms of which commonly used models like the Dirichlet process and the beta-Bernoulli process can be expressed, are decomposable into independent sub-measures.
340	Quantized Frank-Wolfe: Faster Optimization, Lower Communication, and Projection Free	Mingrui Zhang, Lin Chen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi	In this paper, we propose Quantized Frank-Wolfe (QFW), the first projection-free and communication-efficient algorithm for solving constrained optimization problems at scale.
341	A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option	P Sharoff, Nishant Mehta, Ravi Ganti	We consider a sequential decision-making problem where an agent can take one action at a time and each action has a stochastic temporal extent, i.e., a new action cannot be taken until the previous one is finished.
342	Prophets, Secretaries, and Maximizing the Probability of Choosing the Best	Hossein Esfandiari, MohammadTaghi Hajiaghayi, Brendan Lucier, Michael Mitzenmacher	Along the way, we show that the best achievable success probability for the random-order case matches that of the i.i.d. case, which is approximately 0.5801, under a “no-superstars” condition that no single distribution is very likely ex ante to generate the maximum value.
343	A Wasserstein Minimum Velocity Approach to Learning Unnormalized Models	Ziyu Wang, Shuyu Cheng, Li Yueru, Jun Zhu, Bo Zhang	In this paper, we present a scalable approximation to a general family of learning objectives including score matching, by observing a new connection between these objectives and Wasserstein gradient flows.
344	Sharp Asymptotics and Optimal Performance for Inference in Binary Models	Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis	We study convex empirical risk minimization for high-dimensional inference in binary models.
345	A Theoretical Case Study of Structured Variational Inference for Community Detection	Mingzhang Yin, Y. X. Rachel Wang, Purnamrita Sarkar	In this paper, we study the advantage of structured variational inference in the context of the two-class Stochastic Blockmodel.
346	Orthogonal Gradient Descent for Continual Learning	Mehrdad Farajtabar, Navid Azizan, Alex Mott, Ang Li	In this paper, we propose to address this issue from a parameter space perspective and study an approach to restrict the direction of the gradient updates to avoid forgetting previously-learned data.
347	Hamiltonian Monte Carlo Swindles	Dan Piponi, Matthew Hoffman, Pavel Sountsov	In this work, we explore a complementary approach to variance reduction based on two classical Monte Carlo ’swindles’: first, running an auxiliary coupled chain targeting a tractable approximation to the target distribution, and using the auxiliary samples as control variates; and second, generating anti-correlated (“antithetic”) samples by running two chains with flipped randomness.
348	A single algorithm for both restless and rested rotting bandits	Julien Seznec, Pierre Menard, Alessandro Lazaric, Michal Valko	In this paper, we introduce a novel algorithm, Rotting Adaptive Window UCB (RAW-UCB), that achieves near-optimal regret in both rested and restless rotting bandits, without any prior knowledge of the setting (rested or restless) and the type of non-stationarity (e.g., piece-wise constant, bounded variation).
349	Adversarial Robustness of Flow-Based Generative Models	Phillip Pope, Yogesh Balaji, Soheil Feizi	In this paper, we study adversarial robustness of flow-based generative models both theoretically (for some simple models) and empirically (for more complex ones).
350	The Power of Batching in Multiple Hypothesis Testing	Tijana Zrnic, Daniel Jiang, Aaditya Ramdas, Michael Jordan	To this end, we introduce Batch-BH and Batch-St-BH, algorithms for controlling the FDR when a possibly infinite sequence of batches of hypotheses is tested by repeated application of one of the most widely used offline algorithms, the Benjamini-Hochberg (BH) method or Storey’s improvement of the BH method.
351	Adversarial Risk Bounds through Sparsity based Compression	Emilio Balda, Niklas Koep, Arash Behboodi, Rudolf Mathar	In this work, we focus on $\ell_\infty$ attacks with $\ell_\infty$ bounded inputs and prove margin-based bounds. Specifically, we use a compression-based approach that relies on efficiently compressing the set of tunable parameters without distorting the adversarial risk.
352	Learning spectrograms with convolutional spectral kernels	Zheyang Shen, Markus Heinonen, Samuel Kaski	We present a principled framework to interpret CSK, as well as other deep probabilistic models, using approximated Fourier transform, yielding a concise representation of input-frequency spectrogram.
353	Federated Heavy Hitters Discovery with Differential Privacy	Wennan Zhu, Peter Kairouz, Brendan McMahan, Haicheng Sun, Wei Li	To address these risks, we propose a distributed and privacy-preserving algorithm for discovering the heavy hitters in a population of user-generated data streams.
354	Online Batch Decision-Making with High-Dimensional Covariates	Chi-Hua Wang, Guang Cheng	We propose and investigate a class of new algorithms for sequential decision making that interacts with a batch of users simultaneously instead of a user at each decision epoch.
355	Sample Complexity of Estimating the Policy Gradient for Nearly Deterministic Dynamical Systems	Osbert Bastani	We propose a theoretical framework for understanding this phenomenon.
356	Scalable Gradients for Stochastic Differential Equations	Xuechen Li, Ting-Kam Leonard Wong, Ricky T. Q. Chen, David Duvenaud	We generalize this method to stochastic differential equations, allowing time-efficient and constant-memory computation of gradients with high-order adaptive solvers.
357	Understanding the Intrinsic Robustness of Image Distributions using Conditional Generative Models	Xiao Zhang, Jinghui Chen, Quanquan Gu, David Evans	In this work, we assume the underlying data distribution is captured by some conditional generative model, and prove intrinsic robustness bounds for a general class of classifiers, which solves an open problem in Fawzi et al. (2018).
358	Uncertainty Quantification for Deep Context-Aware Mobile Activity Recognition and Unknown Context Discovery	Zepeng Huo, Arash PakBin, Xiaohan Chen, Nathan Hurley, Ye Yuan, Xiaoning Qian, Zhangyang Wang, Shuai Huang, Bobak Mortazavi	We develop a context-aware mixture of deep models termed the $\alpha$-$\beta$ network coupled with uncertainty quantification (UQ) based upon maximum entropy to enhance human activity recognition performance.
359	Learnable Bernoulli Dropout for Bayesian Deep Learning	Shahin Boluki, Randy Ardywibowo, Siamak Zamani Dadaneh, Mingyuan Zhou, Xiaoning Qian	In this work, we propose learnable Bernoulli dropout (LBD), a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters.
360	General Identification of Dynamic Treatment Regimes Under Interference	Eli Sherman, David Arbour, Ilya Shpitser	In this paper we consider the problem of identifying optimal treatment policies in the presence of interference.
361	Gaussian Sketching yields a J-L Lemma in RKHS	Samory Kpotufe, Bharath Sriperumbudur	The main contribution of the paper is to show that Gaussian sketching of a kernel-Gram matrix $\bm K$ yields an operator whose counterpart in an RKHS $\cal H$ is a random projection operator, in the spirit of the Johnson-Lindenstrauss (J-L) lemma.
362	Wasserstein Smoothing: Certified Robustness against Wasserstein Adversarial Attacks	Alexander Levine, Soheil Feizi	In this work, we propose the first defense with certified robustness against Wasserstein adversarial attacks using randomized smoothing.
363	Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning	Ming Yin, Yu-Xiang Wang	In this paper, we prove that with a simple modification to the MIS estimator, we can asymptotically attain the Cramer-Rao lower bound, provided that the action space is finite.
364	Learning Dynamic Hierarchical Topic Graph with Graph Convolutional Network for Document Classification	Zhengjue Wang, Chaojie Wang, Hao Zhang, Zhibin Duan, Mingyuan Zhou, Bo Chen	To address these constraints, we integrate a probabilistic deep topic model into graph construction, and propose a novel trainable hierarchical topic graph (HTG), including word-level, hierarchical topic-level and document-level nodes, exhibiting semantic variation from fine-grained to coarse.
365	Differentiable Causal Backdoor Discovery	Limor Gultchin, Matt Kusner, Varun Kanade, Ricardo Silva	In this work, we present an algorithm that exploits auxiliary variables, similar to instruments, in order to find an appropriate adjustment by a gradient-based optimization method.
366	Stochastic Recursive Variance-Reduced Cubic Regularization Methods	Dongruo Zhou, Quanquan Gu	In this paper, we first present a Stochastic Recursive Variance-Reduced Cubic regularization method (SRVRC) using a recursively updated semi-stochastic gradient and Hessian estimators.
367	Better Long-Range Dependency By Bootstrapping A Mutual Information Regularizer	Yanshuai Cao, Peng Xu	In this work, we develop a novel regularizer to improve the learning of long-range dependency of sequence data.
368	On the Completeness of Causal Discovery in the Presence of Latent Confounding with Tiered Background Knowledge	Bryan Andrews	In this paper, we define tiered background knowledge and show that FCI is sound and complete with the incorporation of this knowledge.
369	One Sample Stochastic Frank-Wolfe	Mingrui Zhang, Zebang Shen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi	The aim of this paper is to bring them back without sacrificing the efficiency.
370	Convex Geometry of Two-Layer ReLU Networks: Implicit Autoencoding and Interpretable Models	Tolga Ergen, Mert Pilanci	We develop a convex analytic framework for ReLU neural networks which elucidates the inner workings of hidden neurons and their function space characteristics.
371	A Robust Univariate Mean Estimator is All You Need	Adarsh Prasad, Sivaraman Balakrishnan, Pradeep Ravikumar	In such an adversarial setup, we aim to design statistically optimal estimators for flexible non-parametric distribution classes such as distributions with bounded-2k moments and symmetric distributions.
372	Patient-Specific Effects of Medication Using Latent Force Models with Gaussian Processes	Li-Fang Cheng, Bianca Dumitrascu, Michael Zhang, Corey Chivers, Michael Draugelis, Kai Li, Barbara Engelhardt	We propose a novel approach that models the effect of interventions as a hybrid Gaussian process composed of a GP capturing patient baseline physiology convolved with a latent force model capturing effects of treatments on specific physiological features.
373	Robust Variational Autoencoders for Outlier Detection and Repair of Mixed-Type Data	Simao Eduardo, Alfredo Nazabal, Christopher K. I. Williams, Charles Sutton	We introduce the Robust Variational Autoencoder (RVAE), a deep generative model that learns the joint distribution of the clean data while identifying the outlier cells, allowing their imputation (repair).
374	Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions	Kamiar Rahnama Rad, Wenda Zhou, Arian Maleki	This paper aims to fill this gap for penalized regression in the generalized linear family.
375	A Diversity-aware Model for Majority Vote Ensemble Accuracy	Bob Durrant, Nick Lim	In this paper, we explore the predictive power of several common diversity measures and show, with extensive experiments, that, contrary to earlier work that finds no clear link between these diversity measures (in isolation) and ensemble accuracy, by using the $\rho$ diversity measure of Sneath and Sokal as an estimator for the dispersion parameter of a Polya-Eggenberger distribution we can predict, independently of the choice of base classifier family, the accuracy of a majority vote classifier ensemble ridiculously well.
376	Scaling up Kernel Ridge Regression via Locality Sensitive Hashing	Amir Zandieh, Navid Nouri, Ameya Velingker, Michael Kapralov, Ilya Razenshteyn	In this paper we introduce a simple weighted version of random binning features, and show that the corresponding kernel function generates Gaussian processes of any desired smoothness.
377	Ordering-Based Causal Structure Learning in the Presence of Latent Variables	Daniel Bernstein, Basil Saeed, Chandler Squires, Caroline Uhler	Motivated by this result, we propose a greedy algorithm over the space of posets for causal structure discovery in the presence of latent confounders and compare its performance to the current state-of-the-art algorithms FCI and FCI+ on synthetic data.
378	Budget Learning via Bracketing	Durmus Alp Emre Acar, Aditya Gangrade, Venkatesh Saligrama	We propose a new formulation for the BL problem via the concept of bracketings.
379	Optimal Algorithms for Multiplayer Multi-Armed Bandits	Po-An Wang, Alexandre Proutiere, Kaito Ariu, Yassir Jedra, Alessio Russo	For this problem, we present DPE1 (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same asymptotic regret as that obtained by an optimal centralized algorithm.
380	AP-Perf: Incorporating Generic Performance Metrics in Differentiable Learning	Rizal Fathony, Zico Kolter	We propose a method that enables practitioners to conveniently incorporate custom non-decomposable performance metrics into differentiable learning pipelines, notably those based upon neural network architectures.
381	Optimal Deterministic Coresets for Ridge Regression	Praneeth Kacham, David Woodruff	We consider the ridge regression problem, for which we are given an $n \times d$ matrix $A$ of examples and a corresponding $n \times d'$ matrix $B$ of labels, as well as a ridge parameter $\lambda \geq 0$, and would like to output an $X' \in \mathbb{R}^{d \times d'}$ for which $$\|AX'-B\|_F^2 + \lambda \|X'\|_F^2 \leq (1+\epsilon)\,\mathrm{OPT},$$ where $\mathrm{OPT} = \min_{Y \in \mathbb{R}^{d \times d'}} \|AY-B\|_F^2 + \lambda \|Y\|_F^2$.
382	Expressiveness and Learning of Hidden Quantum Markov Models	Sandesh Adhikary, Siddarth Srinivasan, Geoff Gordon, Byron Boots	We tackle these problems by showing that HQMMs are a special subclass of the general class of observable operator models (OOMs) that do not suffer from the negative probability problem by design.
383	Solving the Robust Matrix Completion Problem via a System of Nonlinear Equations	Yunfeng Cai, Ping Li	We consider the problem of robust matrix completion, which aims to recover a low rank matrix $L_*$ and a sparse matrix $S_*$ from incomplete observations of their sum $M=L_*+S_*\in\mathbb{R}^{m\times n}$.
384	Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation	Shuhang Chen, Adithya Devraj, Ana Busic, Sean Meyn	This paper concerns error bounds for recursive equations subject to Markovian disturbances.
385	Stochastic Neural Network with Kronecker Flow	Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville	In this work, we address this need and present the Kronecker Flow, a generalization of the Kronecker product to invertible mappings designed for stochastic neural networks.
386	Fair Correlation Clustering	Sara Ahmadian, Alessandro Epasto, Ravi Kumar, Mohammad Mahdian	In this paper, we study correlation clustering under fairness constraints.
387	Towards Competitive N-gram Smoothing	Moein Falahatgar, Mesrob Ohannessian, Alon Orlitsky, Venkatadheeraj Pichapati	In the hopes of explaining this performance, we study it through the lens of competitive distribution estimation: the ability to perform as well as an oracle aware of further structure in the data.
388	Multi-level Gaussian Graphical Models Conditional on Covariates	Gi Bum Kim, Seyoung Kim	We propose a statistical model called multi-level conditional Gaussian graphical models for modeling multi-level output networks influenced by both individual-level and group-level inputs.
389	Semi-Modular Inference: enhanced learning in multi-modular models by tempering the influence of components	Christian Carmona, Geoff Nicholls	Working within an existing coherent loss-based generalisation of Bayesian inference, we show existing Modular/Cut-model inference is coherent, and write down a new family of Semi-Modular Inference (SMI) schemes, indexed by an influence parameter, with Bayesian inference and Cut-models as special cases.
390	Invertible Generative Modeling using Linear Rational Splines	Hadi Mohaghegh Dolatabadi, Sarah Erfani, Christopher Leckie	In this paper, we explore using linear rational splines as a replacement for affine transformations used in coupling layers.
391	LdSM: Logarithm-depth Streaming Multi-label Decision Trees	Maryam Majzoubi, Anna Choromanska	In this paper we develop the LdSM algorithm for the construction and training of multi-label decision trees, where in every node of the tree we optimize a novel objective function that favors balanced splits, maintains high class purity of children nodes, and allows sending examples to multiple directions but with a penalty that prevents tree over-growth.
392	Prior-aware Composition Inference for Spectral Topic Models	Moontae Lee, David Bindel, David Mimno	We propose two novel estimation methods that respect previously unclear prior structures of spectral topic models.
393	Variational Optimization on Lie Groups, with Examples of Leading (Generalized) Eigenvalue Problems	Molei Tao, Tomoki Ohsawa	The article considers smooth optimization of functions on Lie groups.
394	Best-item Learning in Random Utility Models with Subset Choices	Aadirupa Saha, Aditya Gopalan	We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities.
395	Regularized Autoencoders via Relaxed Injective Probability Flow	Abhishek Kumar, Ben Poole, Kevin Murphy	We propose a generative model based on probability flows that does away with the bijectivity requirement on the model and only assumes injectivity.
396	Stochastic Variance-Reduced Algorithms for PCA with Arbitrary Mini-Batch Sizes	Cheolmin Kim, Diego Klabjan	We present two stochastic variance-reduced PCA algorithms and their convergence analyses.
397	Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks	Mingchen Li, Mahdi Soltanolkotabi, Samet Oymak	Despite this (over)fitting capacity, in this paper we demonstrate that such overparameterized networks have an intriguing robustness capability: they are surprisingly robust to label noise when first order methods with early stopping are used to train them.
398	Scalable Nonparametric Factorization for High-Order Interaction Events	Zhimeng Pan, Zheng Wang, Shandian Zhe	To address these issues, we propose a Bayesian nonparametric factorization model for high-order interaction events, which can flexibly estimate/embed the static, nonlinear relationships and capture various long-term and short-term excitation effects, encoding these effects and their decaying patterns into the latent factors.
399	Gaussianization Flows	Chenlin Meng, Yang Song, Jiaming Song, Stefano Ermon	Based on iterative Gaussianization, we propose a new type of normalizing flow model that grants both efficient computation of likelihoods and efficient inversion for sample generation.
400	Adaptive, Distribution-Free Prediction Intervals for Deep Networks	Danijel Kivaranovic, Kory D. Johnson, Hannes Leeb	We present methods from the statistics literature that can be used efficiently with neural networks under minimal assumptions with guaranteed performance.
401	A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms	Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare	We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
402	Automatic Differentiation of Sketched Regression	Hang Liao, Barak Pearlmutter, Vamsi Potluru, David Woodruff	Sketching for speeding up regression problems involves using a sketching matrix $S$ to quickly find the approximate solution to a linear least squares regression (LLS) problem: given $A$ of size $n \times d$, with $n \gg d$, along with $b$ of size $n \times 1$, we seek a vector $y$ with minimal regression error $\lVert A y - b\rVert_2$.
403	Sublinear Optimal Policy Value Estimation in Contextual Bandits	Weihao Kong, Emma Brunskill, Gregory Valiant	We study the problem of estimating the expected reward of the optimal policy in the stochastic disjoint linear bandit setting.
404	Budget-Constrained Bandits over General Cost and Reward Distributions	Semih Cayci, Atilla Eryilmaz, R Srikant	In order to achieve tight regret bounds, we propose algorithms that exploit the correlation between the cost and reward of each arm by extracting the common information via linear minimum mean-square error estimation.
405	Measuring Mutual Information Between All Pairs of Variables in Subquadratic Complexity	Mohsen Ferdosi, Arash Gholamidavoodi, Hosein Mohimani	In this paper, we consider the problem of finding pairs of variables with high mutual information in sub-quadratic complexity.
406	Online Continuous DR-Submodular Maximization with Long-Term Budget Constraints	Omid Sadeghi, Maryam Fazel	In this paper, we study a class of online optimization problems with long-term budget constraints where the objective functions are not necessarily concave (nor convex), but they instead satisfy the Diminishing Returns (DR) property.
407	Prediction Focused Topic Models via Feature Selection	Jason Ren, Russell Kunes, Finale Doshi-Velez	We introduce a novel approach, the prediction-focused topic model, that uses the supervisory signal to retain only vocabulary terms that improve, or at least do not hinder, prediction performance.
408	Accelerated Factored Gradient Descent for Low-Rank Matrix Factorization	Dongruo Zhou, Yuan Cao, Quanquan Gu	In this paper, we answer this question affirmatively by proposing a novel and practical accelerated factored gradient descent method motivated by Nesterov’s accelerated gradient descent.
409	Structured Conditional Continuous Normalizing Flows for Efficient Amortized Inference in Graphical Models	Christian Weilbach, Boyan Beronov, Frank Wood, William Harvey	We exploit minimally faithful inversion of graphical model structures to specify sparse continuous normalizing flows (CNFs) for amortized inference.
410	Graph Coarsening with Preserved Spectral Properties	Yu Jin, Andreas Loukas, Joseph JaJa	We show that the proposed spectral distance captures the structural differences in the graph coarsening process.
411	A Theoretical and Practical Framework for Regression and Classification from Truncated Samples	Andrew Ilyas, Emmanouil Zampetakis, Constantinos Daskalakis	We present a general framework for regression and classification from samples that are truncated according to the value of the dependent variable.
412	Permutation Invariant Graph Generation via Score-Based Generative Modeling	Chenhao Niu, Yang Song, Jiaming Song, Shengjia Zhao, Aditya Grover, Stefano Ermon	To address this difficulty, we propose a permutation invariant approach to modeling graphs, using the recent framework of score-based generative modeling.
413	Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation	Jun Sun, Gang Wang, Georgios B. Giannakis, Qinmin Yang, Zaiyue Yang	In this paper, we provide a finite-time analysis of the fully decentralized TD(0) learning under both i.i.d. as well as Markovian samples, and prove that all local estimates converge linearly to a small neighborhood of the optimum.
414	Multi-attribute Bayesian optimization with interactive preference learning	Raul Astudillo, Peter Frazier	We propose a novel multi-attribute Bayesian optimization with preference learning approach.
415	On the Sample Complexity of Learning Sum-Product Networks	Ishaq Aden-Ali, Hassan Ashtiani	In this work, we initiate the study of the sample complexity of PAC-learning the set of distributions that correspond to SPNs.
416	Tighter Theory for Local SGD on Identical and Heterogeneous Data	Ahmed Khaled Ragab Bayoumi, Konstantin Mishchenko, Peter Richtarik	We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous.
417	Approximate Cross-validation: Guarantees for Model Assessment and Selection	Ashia Wilson, Maximilian Kasy, Lester Mackey	We address these questions with three main contributions: (i) we provide uniform non-asymptotic, deterministic model assessment guarantees for approximate CV; (ii) we show that (roughly) the same conditions also guarantee model selection performance comparable to CV; (iii) we provide a proximal Newton extension of the approximate CV framework for non-smooth prediction problems and develop improved assessment guarantees for problems such as L1-regularized ERM.
418	On Minimax Optimality of GANs for Robust Mean Estimation	Kaiwen Wu, Gavin Weiguang Ding, Ruitong Huang, Yaoliang Yu	In this work, we study the statistical and robust properties of GANs for Gaussian mean estimation under Huber’s contamination model, where an epsilon proportion of training data may be arbitrarily corrupted.
419	Auditing ML Models for Individual Bias and Unfairness	Songkai Xue, Mikhail Yurochkin, Yuekai Sun	We formalize the task in an optimization problem and develop a suite of inferential tools for the optimal value.
420	Stein Variational Inference for Discrete Distributions	Jun Han, Fan Ding, Xianglong Liu, Lorenzo Torresani, Jian Peng, Qiang Liu	In this work, we fill this gap by proposing a simple general-purpose framework that transforms discrete distributions to equivalent piecewise continuous distributions, on which we apply gradient-free Stein variational gradient descent to perform efficient approximate inference.
421	Revisiting Stochastic Extragradient	Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtarik, Yura Malitsky	We fix a fundamental issue in the stochastic extragradient method by providing a new sampling strategy that is motivated by approximating implicit updates.
422	A Framework for Sample Efficient Interval Estimation with Control Variates	Shengjia Zhao, Christopher Yeh, Stefano Ermon	We consider the problem of estimating confidence intervals for the mean of a random variable, where the goal is to produce the smallest possible interval for a given number of samples.
423	Nonmyopic Gaussian Process Optimization with Macro-Actions	Dmitrii Kharkovskii, Chun Kai Ling, Bryan Kian Hsiang Low	This paper presents a multi-staged approach to nonmyopic adaptive Gaussian process optimization (GPO) for Bayesian optimization (BO) of unknown, highly complex objective functions that, in contrast to existing nonmyopic adaptive BO algorithms, exploits the notion of macro-actions for scaling up to a further lookahead to match up to a larger available budget.