# Paper Digest: UAI 2020 Highlights

Readers can also read this highlight article on our console, which lets users filter papers by keyword and find related papers.

The Conference on Uncertainty in Artificial Intelligence (UAI) is one of the premier international conferences on research related to knowledge representation, learning, and reasoning in the presence of uncertainty. In 2020, it is to be held virtually due to the COVID-19 pandemic.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to **sign up for our free daily paper digest service** to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team

team@paperdigest.org

#### TABLE 1: Paper Digest: UAI 2020 Highlights

No. | Title | Authors | Highlight |
---|---|---|---|

1 | Semi-supervised learning, causality, and the conditional cluster assumption | Julius Kügelgen, Alexander Mey, Marco Loog, Bernhard Schölkopf | Since both these cases are restrictive, we extend their work by considering classification using cause and effect features at the same time, such as predicting a disease from both risk factors and symptoms. |

2 | Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise | Yue Wang, Shaofeng Zou | Our techniques in this paper provide a general framework for finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control. |

3 | PAC-Bayesian Contrastive Unsupervised Representation Learning | Kento Nozawa, Pascal Germain, Benjamin Guedj | We present PAC-Bayesian generalisation bounds for CURL, which are then used to derive a new representation learning algorithm. |

4 | Static and Dynamic Values of Computation in MCTS | Eren Sezener, Peter Dayan | In this work, we take a more direct approach, and explicitly quantify the value of a computation based on its expected impact on the quality of the action eventually chosen. |

5 | Kernel Conditional Moment Test via Maximum Moment Restriction | Krikamol Muandet, Wittawat Jitkrittum, Jonas Kübler | We propose a new family of specification tests called kernel conditional moment (KCM) tests. |

6 | Bounding the expected run-time of nonconvex optimization with early stopping | Thomas Flynn, Kwangmin Yu, Abid Malik, Nicholas D'Imperio, Shinjae Yoo | We develop the approach in the general setting of a first-order optimization algorithm, with possibly biased update directions subject to a geometric drift condition. |

7 | Amortized variance reduction for doubly stochastic objective | Ayman Boustati, Sattar Vakili, James Hensman, ST John | We propose a new approach in which we use a recognition network to cheaply approximate the optimal control variate for each mini-batch, with no additional model gradient computations. |

8 | Randomized Exploration for Non-Stationary Stochastic Linear Bandits | Baekjin Kim, Ambuj Tewari | For non-stationary linear bandits, where each action is associated with a $d$-dimensional feature and the unknown parameter is time-varying with total variation $B_T$, we propose two randomized algorithms, Discounted Randomized LinUCB (D-RandLinUCB) and Discounted Linear Thompson Sampling (D-LinTS) via the two perturbation approaches. |

9 | Divergence-Based Motivation for Online EM and Combining Hidden Variable Models | Ehsan Amid, Manfred K. Warmuth | As a result, we formulate an online version of the EM algorithm by adding an analogous inertia term which is a relative entropy divergence to the old model. |

10 | Iterative Channel Estimation for Discrete Denoising under Channel Uncertainty | Hongjoon Ahn, Taesup Moon | We propose a novel iterative channel estimation (ICE) algorithm that essentially removes the critical known noisy channel assumption for universal discrete denoising problem. |

11 | Nonparametric Fisher Geometry with Application to Density Estimation | Andrew Holbrook, Shiwei Lan, Jeffrey Streets, Babak Shahbaba | Following recent work, we consider the nonparametric generalization of the Fisher geometry. |

12 | Learning Intrinsic Rewards as a Bi-Level Optimization Problem | Bradly Stadie, Lunjun Zhang, Jimmy Ba | To facilitate our methods, we introduce a new general conditioning layer: Conditional Layer Normalization (CLN). |

13 | Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems | Seyed Mohammad Asghari, Yi Ouyang, Ashutosh Nayyar | When only one system’s dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. |

14 | Learning Behaviors with Uncertain Human Feedback | Xu He, Haipeng Chen, Bo An | To address this challenge, we introduce a novel feedback model that considers the uncertainty of human feedback. |

15 | Regret Analysis of Bandit Problems with Causal Background Knowledge | Yangyi Lu, Amirhossein Meisami, Ambuj Tewari, William Yan | We propose two algorithms, causal upper confidence bound (C-UCB) and causal Thompson Sampling (C-TS), that enjoy improved cumulative regret bounds compared with algorithms that do not use causal information. |

16 | Evaluation of Causal Structure Learning Algorithms via Risk Estimation | Marco Eigenmann, Sach Mukherjee, Marloes Maathuis | Motivated by this gap, we pose the following question: how can one assess, in a given problem setting, the practical efficacy of one or more causal structure learning methods? |

17 | Kidney Exchange with Inhomogeneous Edge Existence Uncertainty | Hoda Bidkhori, John Dickerson, Duncan McElfresh, Ke Ren | Subsequently, we propose a sample-average-approximation (SAA) based approach to solve this problem. |

18 | On the design of consequential ranking algorithms | Behzad Tabibian, Vicenç Gómez, Abir De, Bernhard Schölkopf, Manuel Gomez Rodriguez | To overcome this, we introduce an efficient gradient-based algorithm to learn parameterized consequential ranking models that effectively approximate optimal ones. |

19 | Fair Contextual Multi-Armed Bandits: Theory and Experiments | Yifang Chen, Alex Cuellar, Haipeng Luo, Jignesh Modi, Heramb Nemlekar, Stefanos Nikolaidis | We introduce a Multi-Armed Bandit algorithm with fairness constraints, where fairness is defined as a minimum rate at which a task or a resource is assigned to a user. |

20 | Submodular Bandit Problem Under Multiple Constraints | Sho Takemori, Masahiro Sato, Takashi Sonoda, Janmajay Singh, Tomoko Ohkuma | Thus, motivated by diversified retrieval considering budget constraints, we introduce a submodular bandit problem under the intersection of $l$ knapsacks and a $k$-system constraint. |

21 | Exploration Analysis in Finite-Horizon Turn-based Stochastic Games | Jialian Li, Yichi Zhou, Tongzheng Ren, Jun Zhu | In this work, we address the exploration issue for the $N$-player finite-horizon turn-based stochastic games (FTSG). |

22 | Amortized Nesterov’s Momentum: A Robust Momentum and Its Application to Deep Learning | Kaiwen Zhou, Yanghua Jin, Qinghua Ding, James Cheng | This work proposes a novel momentum technique, the Amortized Nesterov’s Momentum, for stochastic convex optimization. |

23 | Testing Goodness of Fit of Conditional Density Models with Kernels | Wittawat Jitkrittum, Heishiro Kanagawa, Bernhard Schölkopf | We propose two nonparametric statistical tests of goodness of fit for conditional distributions: given a conditional probability density function p(y|x) and a joint sample, decide whether the sample is drawn from p(y|x)q(x) for some density q(x). |

24 | Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical Dirichlet Processes | Or Dinari, Oren Freifeld | As a remedy, we propose another, but closely related, hierarchical Bayesian nonparametric framework. |

25 | Statistically Efficient Greedy Equivalence Search | Max Chickering | We present simple synthetic experiments that compare a backward-only variant of the new algorithm to GES using finite data, showing increasing benefit of the new algorithm as the complexity of the generative model increases. |

26 | Robust Collective Classification against Structural Attacks | Kai Zhou, Yevgeniy Vorobeychik | We formulate the task of learning a robust AMN classifier as a bi-level program, where the inner problem is a challenging non-linear integer program that computes optimal structural changes to the AMN. |

27 | Efficient Rollout Strategies for Bayesian Optimization | Eric Lee, David Eriksson, David Bindel, Bolong Cheng, Mike McCourt | We show that a combination of quasi-Monte Carlo, common random numbers, and control variates significantly reduces the computational burden of rollout. We then formulate a policy-search based approach that removes the need to optimize the rollout acquisition function. |

28 | IDA with Background Knowledge | Zhuangyan Fang, Yangbo He | In this paper, we consider the problem of estimating all possible causal effects from observational data with two types of background knowledge: direct causal information and non-ancestral information. |

29 | Complete Dictionary Learning via $\ell_p$-norm Maximization | Yifei Shen, Ye Xue, Jun Zhang, Khaled Letaief, Vincent Lau | In this paper, we investigate a family of $\ell_p$-norm ($p>2$, $p \in \mathbb{N}$) maximization approaches for the complete dictionary learning problem from theoretical and algorithmic aspects. |

30 | Collapsible IDA: Collapsing Parental Sets for Locally Estimating Possible Causal Effects | Yue Liu, Zhuangyan Fang, Yangbo He, Zhi Geng | In this paper, we combine the adjustment set selection procedure with the original IDA framework. |

31 | Learning Joint Nonlinear Effects from Single-variable Interventions in the Presence of Hidden Confounders | Sorawit Saengkyongam, Ricardo Silva | We propose an approach to estimate the effect of multiple simultaneous interventions in the presence of hidden confounders. |

32 | Causal screening in dynamical systems | Søren Wengel Mogensen | We suggest some inexpensive methods for causal screening which provide output with a sound causal interpretation under the assumption of ancestral faithfulness. |

33 | Bayesian Online Prediction of Change Points | Diego Agudelo-España, Sebastian Gomez-Gonzalez, Stefan Bauer, Bernhard Schölkopf, Jan Peters | We extend the Bayesian Online Change Point Detection algorithm to also infer the number of time steps until the next change point (i.e., the residual time). |

34 | Walking on Two Legs: Learning Image Segmentation with Noisy Labels | Guohua Cheng, Hongli Ji, Yan Tian | In this paper, we propose a novel label refinement and sample reweighting method, and a novel generative adversarial network (GAN) is introduced to fuse these two models into an integrated framework. |

35 | Election Control by Manipulating Issue Significance | Andrew Estornell, Sanmay Das, Edith Elkind, Yevgeniy Vorobeychik | We study a variation of this model in which voters have judgments about relative importance of issues, and a malicious actor can manipulate these judgments. |

36 | Coresets for Estimating Means and Mean Square Error with Limited Greedy Samples | Saeed Vahidian, Baharan Mirzasoleiman, Alexander Cloninger | We introduce a scalable optimization algorithm with no correction steps (in contrast to Frank–Wolfe and its variants), a variant of gradient ascent for coreset selection in graphs, that greedily selects a weighted subset of vertices that are deemed most important to sample. |

37 | Robust Spatial-Temporal Incident Prediction | Ayan Mukhopadhyay, Kai Wang, Andrew Perrault, Mykel Kochenderfer, Milind Tambe, Yevgeniy Vorobeychik | We present a general approach for incident forecasting that is robust to spatial shifts. |

38 | Lagrangian Decomposition for Neural Network Verification | Rudy Bunel, Alessandro De Palma, Alban Desmaison, Krishnamurthy Dvijotham, Pushmeet Kohli, Philip Torr, M. Pawan Kumar | We propose a novel approach based on Lagrangian Decomposition. |

39 | Robust modal regression with direct gradient approximation of modal regression risk | Hiroaki Sasaki, Tomoya Sakai, Takafumi Kanamori | In this paper, we take a novel approach of *directly* approximating the gradient of MRR in modal regression. |

40 | A Simple Online Algorithm for Competing with Dynamic Comparators | Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou | In this paper, we design a simple algorithm based on the online ensemble, which provably enjoys the same (even slightly stronger) guarantee as the state-of-the-art rate, yet is much more efficient because our algorithm does not involve any non-convex problem solving. |

41 | Skewness Ranking Optimization for Personalized Recommendation | Yu-Neng Chuang, Chih-Ming Chen, Chuan-Ju Wang, Ming-Feng Tsai | In this paper, we propose a novel optimization criterion that leverages features of the skew normal distribution to better model the problem of personalized recommendation. |

42 | High Dimensional Discrete Integration over the Hypergrid | Raj Kumar Maity, Arya Mazumdar, Soumyabrata Pal | In this paper, we show an improved technique to obtain an approximation factor of $4+O(1/q^2)$ to this problem. |

43 | Neural Likelihoods via Cumulative Distribution Functions | Pawel Chilinski, Ricardo Silva | We leverage neural networks as universal approximators of monotonic functions to build a parameterization of conditional cumulative distribution functions (CDFs). |

44 | Unknown mixing times in apprenticeship and reinforcement learning | Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour | We derive and analyze learning algorithms for apprenticeship learning, policy evaluation and policy gradient for average reward criteria. |

45 | TX-Ray: Quantifying and Explaining Model-Knowledge Transfer in (Un-)Supervised NLP | Nils Rethmeier, Vageesh Kumar Saxena, Isabelle Augenstein | Thus, for TX-Ray, we modify the established computer vision explainability principle of ‘visualizing preferred inputs of neurons’ to make it usable for both NLP and for transfer analysis. |

46 | What You See May Not Be What You Get: UCB Bandit Algorithms Robust to $\varepsilon$-Contamination | Laura Niss, Ambuj Tewari | Motivated by applications of bandit algorithms in education, we consider a stochastic multi-armed bandit problem with $\varepsilon$-contaminated rewards. |

47 | The Hawkes Edge Partition Model for Continuous-time Event-based Temporal Networks | Sikun Yang, Heinz Koeppl | We propose a novel probabilistic framework to model continuously generated interaction events data. |

48 | Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect | Priyank Agrawal, Theja Tulabandhula | We provide novel algorithms that achieve sublinear regret in time and the relevant wear-in/wear-out parameters. |

49 | Compositional uncertainty in deep Gaussian processes | Ivan Ustyuzhaninov, Ieva Kazlauskaite, Markus Kaiser, Erik Bodin, Neill Campbell, Carl Henrik Ek | We show that the application of simplifying mean-field assumptions across the hierarchy leads to the layers of a DGP collapsing to near-deterministic transformations. |

50 | Streaming Nonlinear Bayesian Tensor Decomposition | Zhimeng Pan, Zheng Wang, Shandian Zhe | To address this problem, we propose Streaming Nonlinear Bayesian Tensor Decomposition (SNBTD) that can conduct high-quality, closed-form and iteration-free updates upon receiving new tensor entries. |

51 | Relaxed Multivariate Bernoulli Distribution and Its Applications to Deep Generative Models | Xi Wang, Junming Yin | In this work, we propose a multivariate generalization of the Relaxed Bernoulli distribution, which can be reparameterized and can capture the correlation between variables via a Gaussian copula. |

52 | One-Bit Compressed Sensing via One-Shot Hard Thresholding | Jie Shen | We study a non-convex sparsity-constrained program and present a novel and concise analysis that moves away from the widely used notion of Gaussian width. |

53 | GPIRT: A Gaussian Process Model for Item Response Theory | JBrandon Duck-Mayr, Roman Garnett, Jacob Montgomery | We propose a Bayesian nonparametric model that solves this problem by placing Gaussian process priors on the latent functions defining the IRFs. |

54 | Identifying causal effects in maximally oriented partially directed acyclic graphs | Emilija Perkovic | We develop a necessary and sufficient causal identification criterion for maximally oriented partially directed acyclic graphs (MPDAGs). |

55 | Pairwise Supervised Hashing with Bernoulli Variational Auto-Encoder and Self-Control Gradient Estimator | Siamak Zamani Dadaneh, Shahin Boluki, Mingzhang Yin, Mingyuan Zhou, Xiaoning Qian | We propose a pairwise loss function with discrete latent VAE to reward within-class similarity and between-class dissimilarity for supervised hashing. |

56 | Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison | Tengyang Xie, Nan Jiang | We prove performance guarantees of two algorithms for approximating Q* in batch reinforcement learning. |

57 | Towards Threshold Invariant Fair Classification | Mingliang Chen, Min Wu | To achieve this goal, this paper proposes to equalize the risk distributions among the groups via two approximation methods. |

58 | Optimal Statistical Hypothesis Testing for Social Choice | Lirong Xia | We address the following question in this paper: “What are the most robust statistical methods for social choice?” |

59 | A SUPER* Algorithm to Optimize Paper Bidding in Peer Review | Tanner Fiez, Nihar Shah, Lillian Ratliff | In this paper, we develop a framework to study this problem in a principled manner. |

60 | Measurement Dependence Inducing Latent Causal Models | Alex Markham, Moritz Grosse-Wentrup | We show that this task can be framed in terms of the graph theoretic problem of finding edge clique covers, resulting in an algorithm for returning minimal MeDIL causal models (minMCMs). |

61 | The Indian Chefs Process | Patrick Dallaire, Luca Ambrogioni, Ludovic Trottier, Umut Güçlü, Max Hinne, Philippe Giguère, Marcel Gerven, François Laviolette | This paper introduces the Indian chefs process (ICP) as a Bayesian nonparametric prior on the joint space of infinite directed acyclic graphs (DAGs) and orders that generalizes the Indian buffet process. |

62 | Spectral Methods for Ranking with Scarce Data | Lalit Jain, Anna Gilbert, Umang Varma | In this paper we modify RankCentrality, a popular and well-studied method for rank aggregation, to account for few comparisons and to incorporate additional feature information. |

63 | Anchored Causal Inference in the Presence of Measurement Error | Basil Saeed, Anastasiya Belyaeva, Yuhao Wang, Caroline Uhler | We consider the problem of learning a causal graph in the presence of measurement error. |

64 | How Private Are Commonly-Used Voting Rules? | Ao Liu, Yun Lu, Lirong Xia, Vassilis Zikas | In this work, we present the first framework for answering the question: “How private are commonly-used voting rules?” |

65 | Differentially Private Small Dataset Release Using Random Projections | Lovedeep Gondara, Ke Wang | As a solution, we propose DPRP (Differentially Private Data Release via Random Projections), a reconstruction based approach for releasing differentially private small datasets. |

66 | Semi-supervised Sequential Generative Models | Michael Teng, Tuan Anh Le, Adam Scibior, Frank Wood | We introduce a novel objective for training deep generative time-series models with discrete latent variables for which supervision is only sparsely available. |

67 | Robust contrastive learning and nonlinear ICA in the presence of outliers | Hiroaki Sasaki, Takashi Takenouchi, Ricardo Monti, Aapo Hyvarinen | We develop two robust nonlinear ICA methods based on the γ-divergence, which is a robust alternative to the KL-divergence in logistic regression. |

68 | Selling Data at an Auction under Privacy Constraints | Mengxiao Zhang, Fernando Beltran, Jiamou Liu | For a data broker who wants to purchase data from such data owners, we propose the SingleMindedQuery (SMQ) mechanism, which uses a reverse auction to select data owners and determine compensations. |

69 | Mixed-Membership Stochastic Block Models for Weighted Networks | Adrien Dulac, Eric Gaussier, Christine Largeron | We address in this study the problem of modeling weighted networks through generalized stochastic block models. |

70 | MaskAAE: Latent space optimization for Adversarial Auto-Encoders | Arnab Mondal, Sankalan Pal Chowdhury, Aravind Jayendran, Himanshu Asnani, Parag Singla, Prathosh A P | In this work, we hypothesise that the dimensionality of the AE model’s latent space has a critical effect on the quality of generated data. |

71 | Slice Sampling for General Completely Random Measures | Peiyuan Zhu, Alexandre Bouchard-Cote, Trevor Campbell | In this paper we present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables, enabling efficient, parallelized computation without sacrificing flexibility. |

72 | Semi-Supervised Learning: the Case When Unlabeled Data is Equally Useful | Jingge Zhu | In this work, we consider statistical models where the data distributions can be characterized by continuous parameters. |

73 | Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks | Meet Vadera, Brian Jalaian, Benjamin Marlin | In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier, extending prior work on the Bayesian Dark Knowledge framework. |

74 | Complex Markov Logic Networks: Expressivity and Liftability | Ondrej Kuzelka | We introduce complex MLNs, which use complex-valued weights, and show that, unlike standard MLNs with real-valued weights, complex MLNs are "fully expressive". |

75 | Faster algorithms for Markov equivalence | Zhongyi Hu, Robin Evans | Several criteria for this already exist, but in this paper we give a new non-parametric characterization in terms of the heads and tails that arise in the parameterization for discrete models. |

76 | Verifying Individual Fairness in Machine Learning Models | Philips George John, Deepak Vijaykeerthy, Diptikalyan Saha | Our objective is to construct verifiers for proving individual fairness of a given model, and we do so by considering appropriate relaxations of the problem. |

77 | An Interpretable and Sample Efficient Deep Kernel for Gaussian Process | Yijue Dai, Tianjian Zhang, Zhidi Lin, Feng Yin, Sergios Theodoridis, Shuguang Cui | We propose a novel Gaussian process kernel that takes advantage of a deep neural network (DNN) structure but retains good interpretability. |

78 | Amortized Bayesian Optimization over Discrete Spaces | Kevin Swersky, Yulia Rubanova, David Dohan, Kevin Murphy | Our key insight is that we can train a generative model to generate candidates that maximize the acquisition function. |

79 | Batch simulations and uncertainty quantification in Gaussian process surrogate approximate Bayesian computation | Marko Jarvenpaa, Aki Vehtari, Pekka Marttinen | In this article we improve this approach in several ways. |

80 | Deep Sigma Point Processes | Martin Jankowiak, Geoff Pleiss, Jacob Gardner | We introduce Deep Sigma Point Processes, a class of parametric models inspired by the compositional structure of Deep Gaussian Processes (DGPs). |

81 | Robust $k$-means++ | Amit Deshpande, Praneeth Kacham, Rameshwar Pratap | For any 0 |

82 | On Counterfactual Explanations under Predictive Multiplicity | Martin Pawelczyk, Klaus Broelemann, Gjergji Kasneci | In this work, we derive a general upper bound for the costs of counterfactual explanations under predictive multiplicity. |

83 | A Practical Riemannian Algorithm for Computing Dominant Generalized Eigenspace | Zhiqiang Xu, Ping Li | In this work, we propose a practical Riemannian algorithm based on the first-order optimization on generalized Stiefel manifolds while efficiently leveraging second-order information. |

84 | No-regret Exploration in Contextual Reinforcement Learning | Aditya Modi, Ambuj Tewari | In this paper, we propose a no-regret online RL algorithm in the setting where the MDP parameters are obtained from the context using generalized linear mappings (GLMs). |

85 | Layering-MCMC for Structure Learning in Bayesian Networks | Jussi Viinikka, Mikko Koivisto | This renders the computation of the (unnormalized) posterior probability of a state, called layering, more involved, for which task we give an efficient dynamic programming algorithm. |

86 | C-MI-GAN: Estimation of Conditional Mutual Information using MinMax formulation | Arnab Mondal, Arnab Bhattacharjee, Sudipto Mukherjee, Himanshu Asnani, Sreeram Kannan, Prathosh A P | In this work, we focus on conditional mutual information (CMI) estimation by utilizing its formulation as a *minmax* optimization problem. |

87 | Stochastic Variational Inference for Dynamic Correlated Topic Models | Federico Tomasi, Praveen Chandar, Gal Levy-Fix, Mounia Lalmas-Roelleke, Zhenwen Dai | We propose an extension to CTM that models the evolution of both topic correlation and word co-occurrence over time. |

88 | Adversarial Learning for 3D Matching | Wei Xing, Brian Ziebart | In this paper, we explore extending an adversarial approach to learning bipartite matchings that avoids these weaknesses to the three dimensional setting. |

89 | Ordering Variables for Weighted Model Integration | Vincent Derkinderen, Evert Heylen, Pedro Zuidberg Dos Martires, Samuel Kolb, Luc Raedt | We show how variable ordering heuristics from the discrete domain can be ported to the discrete-continuous domain. |

90 | Online Parameter-Free Learning of Multiple Low Variance Tasks | Giulia Denevi, Massimiliano Pontil, Dimitrios Stamos | We propose a method to learn a common bias vector for a growing sequence of low-variance tasks. |

91 | Zeroth Order Non-convex optimization with Dueling-Choice Bandits | Yichong Xu, Aparna Joshi, Aarti Singh, Artur Dubrawski | We give the COMP-GP-UCB algorithm based on GP-UCB (Srinivas et al., 2009), where instead of directly querying the point with the maximum Upper Confidence Bound (UCB), we perform constrained optimization and use comparisons to filter out suboptimal points. |

92 | Semi-bandit Optimization in the Dispersed Setting | Travis Dick, Wesley Pegden, Maria-Florina Balcan | In the online setting, we must choose algorithm parameters for each instance as they arrive, and our goal is to be competitive with the best fixed algorithm in hindsight. There are two major challenges in online data-driven algorithm design. |

93 | Adapting Text Embeddings for Causal Inference | Victor Veitch, Dhanya Sridhar, David Blei | This paper develops a method to estimate such causal effects from observational text data, adjusting for confounding features of the text such as the subject or writing quality. |

94 | Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models | Zhijian Ou, Yunfu Song | In this paper, we show that the annoying difficulty of obtaining reliable stochastic gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed in a new method based on stochastic approximation (SA) theory of the Robbins-Monro type. |

95 | Hidden Markov Nonlinear ICA: Unsupervised Learning from Nonstationary Time Series | Hermanni Hälvä, Aapo Hyvarinen | The central idea in such works is that the latent components are assumed to be independent conditional on some observed auxiliary variables, such as the time-segment index. |

96 | Identification and Estimation of Causal Effects Defined by Shift Interventions | Numair Sani, Jaron Lee, Ilya Shpitser | We give sound and complete identification algorithms for both types of shift interventions, and derive efficient semi-parametric estimators for the mean response to a shift intervention in a special case motivated by a healthcare problem. |

97 | Risk Bounds for Low Cost Bipartite Ranking | San Gultekin, John Paisley | In this paper we consider an alternative approach, which leverages the structure of the widely-adopted pairwise squared loss, to obtain a stochastic and low cost algorithm that does not require stochastic gradients or learning rates. |

98 | Multitask Soft Option Learning | Maximilian Igl, Andrew Gambardella, Jinke He, Nantas Nardelli, N Siddharth, Wendelin Boehmer, Shimon Whiteson | We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference. |

99 | 99% of Worker-Master Communication in Distributed Optimization Is Not Needed | Konstantin Mishchenko, Filip Hanzely, Peter Richtarik | In this paper we discuss sparsification of worker-to-server communication in large distributed systems. |

100 | Graphical continuous Lyapunov models | Gherardo Varando, Niels Richard Hansen | This parametrization can be interpreted as a new graphical model class, and we show how the model class behaves under marginalization and introduce a method for structure learning via $\ell_1$-penalized loss minimization. |

101 | Structure Learning for Cyclic Linear Causal Models | Carlos Amendola, Philipp Dettling, Mathias Drton, Federica Onori, Jun Wu | We consider the problem of structure learning for linear causal models based on observational data. |

102 | Sensor Placement for Spatial Gaussian Processes with Integral Observations | Krista Longi, Chang Rajani, Tom Sillanpää, Joni Mäkinen, Timo Rauhala, Ari Salmi, Edward Haeggström, Arto Klami | We provide tools for optimizing sensor locations for GPs using integral observations, extending both model-based and geometric strategies for GP sensor placement. We demonstrate the techniques in ultrasonic detection of fouling in closed pipes. |

103 | Active Model Estimation in Markov Decision Processes | Jean Tarbouriech, Shubhanshu Shekhar, Matteo Pirotta, Mohammad Ghavamzadeh, Alessandro Lazaric | To address this issue, we propose an algorithm that is based on maximum weighted entropy, a heuristic that stems from common sense and our theoretical analysis. |

104 | Dueling Posterior Sampling for Preference-Based Reinforcement Learning | Ellen Novoseller, Yibing Wei, Yanan Sui, Yisong Yue, Joel Burdick | Building upon ideas from preference-based bandit learning and posterior sampling in RL, we present Dueling Posterior Sampling (DPS), which employs preference-based posterior sampling to learn both the system dynamics and the underlying utility function that governs the preference feedback. |

105 | Permutation-Based Causal Structure Learning with Unknown Intervention Targets | Chandler Squires, Yuhao Wang, Caroline Uhler | We consider the problem of estimating causal DAG models from a mix of observational and interventional data, when the intervention targets are partially or completely unknown. |

106 | MASSIVE: Tractable and Robust Bayesian Learning of Many-Dimensional Instrumental Variable Models | Ioan Gabriel Bucur, Tom Claassen, Tom Heskes | We propose a general and efficient causal inference algorithm that accounts for model uncertainty by performing Bayesian model averaging over the most promising many-dimensional instrumental variable models, while at the same time employing weaker assumptions regarding the data generating process. |

107 | Popularity Agnostic Evaluation of Knowledge Graph Embeddings | Aisha Mohamed, Shameem Parambath, Zoi Kaoudi, Ashraf Aboulnaga | In this paper, we show that the distribution of entities and relations in common knowledge graphs is highly skewed, with some entities and relations being much more popular than the rest. |

108 | Learning LWF Chain Graphs: A Markov Blanket Discovery Approach | Mohammad Ali Javidian, Marco Valtorta, Pooyan Jamshidi | We provide a sound and scalable constraint-based framework for learning the structure of LWF CGs from faithful causally sufficient data and prove its correctness when the Markov blanket discovery algorithms in this paper are used. |

109 | Batch norm with entropic regularization turns deterministic autoencoders into generative models | Amur Ghose, Abdullah Rashwan, Pascal Poupart | We show in this work that utilizing batch normalization as a source for non-determinism suffices to turn deterministic autoencoders into generative models on par with variational ones, so long as we add a suitable entropic regularization to the training objective. |

110 | Adaptive Hyper-box Matching for Interpretable Individualized Treatment Effect Estimation | Marco Morucci, Vittorio Orlandi, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky | We propose a matching method for observational data that matches units with others in unit-specific, hyper-box-shaped regions of the covariate space. |

111 | Generalized Policy Elimination: an efficient algorithm for Nonparametric Contextual Bandits | Aurelien Bibaut, Antoine Chambaz, Mark van der Laan | We propose the Generalized Policy Elimination (GPE) algorithm, an oracle-efficient contextual bandit (CB) algorithm inspired by the Policy Elimination algorithm of Dudik et al. [2011]. |

112 | Differentially Private Top-k Selection via Stability on Unknown Domain | Ricardo Silva Carvalho, Ke Wang, Lovedeep Gondara, Chunyan Miao | We propose a new method that satisfies approximate differential privacy for top-$k$ selection with unordered output in the unknown data domain setting, not relying on the full knowledge of the domain universe. |

113 | Active Learning of Conditional Mean Embeddings via Bayesian Optimisation | Sayak Ray Chowdhury, Rafael Oliveira, Fabio Ramos | We consider the problem of sequentially optimising the conditional expectation of an objective function, with both the conditional distribution and the objective function assumed to be fixed but unknown. |

114 | Flexible Prior Elicitation via the Prior Predictive Distribution | Marcelo Hartmann, Georgi Agiashvili, Paul Bürkner, Arto Klami | We propose an alternative approach that only requires knowledge about the observable outcomes – knowledge which is often much easier for experts to provide. |

115 | Model-Augmented Conditional Mutual Information Estimation for Feature Selection | Alan Yang, AmirEmad Ghassami, Maxim Raginsky, Negar Kiyavash, Elyse Rosenbaum | We propose a novel two-step approach which facilitates Markov blanket feature selection in high dimensions. |

116 | Finite-Memory Near-Optimal Learning for Markov Decision Processes with Long-Run Average Reward | Jan Kretinsky, Fabian Michel, Lukas Michel, Guillermo Perez | We consider learning policies online in Markov decision processes with the long-run average reward (a.k.a. mean payoff). |

117 | Constraint-Based Causal Discovery using Partial Ancestral Graphs in the presence of Cycles | Joris M. Mooij, Tom Claassen | In this work, we show that—surprisingly—the output of the Fast Causal Inference (FCI) algorithm is correct if it is applied to observational data generated by a system that involves feedback. |

118 | Estimation Rates for Sparse Linear Cyclic Causal Models | Jan-Christian Huetter, Philippe Rigollet | In this work, allowing for cycles, we study linear structural equations models with homoscedastic Gaussian noise and in the presence of interventions that make the model identifiable. |

119 | Prediction Intervals: Split Normal Mixture from Quality-Driven Deep Ensembles | Tárik S. Salem, Helge Langseth, Heri Ramampiaro | In this paper, we present a method for generating prediction intervals along with point estimates from an ensemble of neural networks. |

120 | On the Relationship Between Probabilistic Circuits and Determinantal Point Processes | Honghua Zhang, Steven Holtzen, Guy Van den Broeck | We propose a unified analysis and shared language for discussing DPPs and PCs. |

121 | Probabilistic Safety for Bayesian Neural Networks | Matthew Wicker, Luca Laurenti, Andrea Patane, Marta Kwiatkowska | We apply our methods to BNNs trained on a regression task, airborne collision avoidance, and MNIST, empirically showing that our approach allows one to certify probabilistic safety of BNNs with millions of parameters. |

122 | Distortion estimates for approximate Bayesian inference | Hanwen Xing, Geoff Nicholls, Jeong (Kate) Lee | We give graphical diagnostics for posterior approximation at the observed data. |

123 | Mutual Information Based Knowledge Transfer Under State-Action Dimension Mismatch | Michael Wan, Tanmay Gangwani, Jian Peng | In this paper, we propose a new framework for transfer learning where the teacher and the student can have arbitrarily different state- and action-spaces. |

124 | Provably Efficient Third-Person Imitation from Offline Observation | Aaron Zweig, Joan Bruna | We present problem-dependent, statistical learning guarantees for third-person imitation from observation in an offline setting, and a lower bound on performance in the online setting. |

125 | Automated Dependence Plots | David Inouye, Liu Leqi, Joon Sik Kim, Bryon Aragam, Pradeep Ravikumar | To address these drawbacks, we formalize a method for automating the selection of interesting PDPs and extend PDPs beyond showing single features to show the model response along arbitrary directions, for example in raw feature space or a latent space arising from some generative model. |

126 | EiGLasso: Scalable Estimation of Cartesian Product of Sparse Inverse Covariance Matrices | Jun Ho Yoon, Seyoung Kim | In this paper, we address the problem of jointly estimating dependencies across samples and dependencies across multiple features, where each set of dependencies is modeled as an inverse covariance matrix. |

127 | Improved Vector Pruning in Exact Algorithms for Solving POMDPs | Eric Hansen, Thomas Bowman | In this paper, we show how to similarly limit the number of variables. |

128 | Symbolic Querying of Vector Spaces: Probabilistic Databases Meets Relational Embeddings | Tal Friedman, Guy Van den Broeck | We propose unifying techniques from probabilistic databases and relational embedding models with the goal of performing complex queries on incomplete and uncertain data. |

129 | Learning to learn generative programs with Memoised Wake-Sleep | Luke Hewitt, Tuan Anh Le, Joshua Tenenbaum | To tackle the challenge of performing program induction as an ‘inner-loop’ to learning, we propose the Memoised Wake-Sleep (MWS) algorithm, which extends Wake-Sleep by explicitly storing and reusing the best programs discovered by the inference network throughout training. |

130 | Flexible Approximate Inference via Stratified Normalizing Flows | Chris Cundy, Stefano Ermon | We develop an approximate inference procedure that allows explicit control of the bias/variance tradeoff, interpolating between the sampling and the variational regime. |

131 | Bounded Rationality in Las Vegas: Probabilistic Finite Automata Play Multi-Armed Bandits | Xinming Liu, Joseph Halpern | To test this hypothesis, we consider the multi-armed bandit (MAB) problem. |

132 | Greedy Policy Search: A Simple Baseline for Learnable Test-Time Augmentation | Alexander Lyzhov, Yuliya Molchanova, Arsenii Ashukha, Dmitry Molchanov, Dmitry Vetrov | The primary goal of this paper is to demonstrate that test-time augmentation policies can be successfully learned too. |

133 | Non Parametric Graph Learning for Bayesian Graph Neural Networks | Soumyasundar Pal, Saber Malekmohammadi, Florence Regol, Yingxue Zhang, Yishi Xu, Mark Coates | In this paper, we propose a novel non-parametric graph model for constructing the posterior distribution of graph adjacency matrices. |

134 | Stable Policy Optimization via Off-Policy Divergence Regularization | Ahmed Touati, Amy Zhang, Joelle Pineau, Pascal Vincent | In this paper we revisit the theoretical foundations of these algorithms and propose a new algorithm which stabilizes the policy improvement through a proximity term that constrains the discounted state-action visitation distribution induced by consecutive policies to be close to one another. |

135 | PoRB-Nets: Poisson Process Radial Basis Function Networks | Beau Coker, Melanie Fernandez Pradier, Finale Doshi-Velez | We present a novel prior over radial basis function networks (RBFNs) that allows for independent specification of functional amplitude variance and lengthscale (i.e., smoothness), where the inverse lengthscale corresponds to the concentration of radial basis functions. |

136 | Deriving Bounds And Inequality Constraints Using Logical Relations Among Counterfactuals | Noam Finkelstein, Ilya Shpitser | We develop a new general method for obtaining bounds on causal parameters using rules of probability and restrictions on counterfactuals implied by causal graphical models. |

137 | Locally Masked Convolution for Autoregressive Models | Ajay Jain, Pieter Abbeel, Deepak Pathak | To generate data in arbitrary orders, we introduce LMConv: a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image. |

138 | Time Series Analysis using a Kernel based Multi-Modal Uncertainty Decomposition Framework | Rishabh Singh, Jose Principe | We demonstrate in this paper using synthesized datasets that such uncertainty features provide a better ability for online detection of statistical change points in time series data when compared to existing non-parametric and unsupervised methods. |

139 | OCEAN: Online Task Inference for Compositional Tasks with Context Adaptation | Hongyu Ren, Yuke Zhu, Jure Leskovec, Animashree Anandkumar, Animesh Garg | We propose a variational inference framework OCEAN to perform online task inference for compositional tasks. |

140 | Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets | Jakob Runge | The paper introduces a novel conditional independence (CI) based method for linear and nonlinear, lagged and contemporaneous causal discovery from observational time series in the causally sufficient case. |