Paper Digest: NeurIPS 2019 Highlights

November 14, 2018 | November 16, 2019 | admin

Download NIPS-2019-Paper-Digests.pdf: highlights of all 1,427 NeurIPS 2019 papers.

The Conference on Neural Information Processing Systems (NeurIPS, formerly NIPS) is one of the top machine learning conferences in the world. In 2019, it is to be held in Vancouver, Canada. There were 6,743 paper submissions, of which 1,427 were accepted. The authors of many papers also published their code (download link).

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: NeurIPS 2019 Papers

#	Title	Authors	Highlight
1	Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation	Risto Vuorio, Shao-Hua Sun, Hexiang Hu, Joseph J. Lim	In this paper, we augment MAML with the capability to identify the mode of tasks sampled from a multimodal task distribution and adapt quickly through gradient updates.
2	ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks	Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee	We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language.
3	Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers	Liwei Wu, Shuqing Li, Cho-Jui Hsieh, James L. Sharpnack	Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD).
4	Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video	Jiawang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, Ian Reid	This paper tackles these challenges by proposing a geometry consistency loss for scale-consistent predictions and an induced self-discovered mask for handling moving objects and occlusions.
5	Zero-shot Learning via Simultaneous Generating and Learning	Hyeonwoo Yu, Beomhee Lee	Beyond exploiting relations between classes of seen and unseen, we present a deep generative model to provide the model with experience about both seen and unseen classes.
6	Ask not what AI can do, but what AI should do: Towards a framework of task delegability	Brian Lubars, Chenhao Tan	We approach this problem of task delegability from a human-centered perspective by developing a framework on human perception of task delegation to AI.
7	Stand-Alone Self-Attention in Vision Models	Niki Parmar, Prajit Ramachandran, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jon Shlens	In developing and testing a pure self-attention vision model, we verify that self-attention can indeed be an effective stand-alone layer.
8	High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks	Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee	In this work, we question if such handcrafted architectures are necessary and instead propose a different approach: finding minimal inductive bias for video prediction while maximizing network capacity.
9	Unsupervised learning of object structure and dynamics from videos	Matthias Minderer, Chen Sun, Ruben Villegas, Forrester Cole, Kevin P. Murphy, Honglak Lee	To address this challenge, we adopt a keypoint-based image representation and learn a stochastic dynamics model of the keypoints.
10	GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism	Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen	To address the need for efficient and task-independent model parallelism, we introduce TensorPipe, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers.
11	Meta-Learning with Implicit Gradients	Aravind Rajeswaran, Chelsea Finn, Sham M. Kakade, Sergey Levine	By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner level optimization and not the path taken by the inner loop optimizer.
12	Adversarial Examples Are Not Bugs, They Are Features	Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry	We demonstrate that adversarial examples can be directly attributed to the presence of non-robust features: features (derived from patterns in the data distribution) that are highly predictive, yet brittle and (thus) incomprehensible to humans.
13	Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks	Vineet Kosaraju, Amir Sadeghian, Roberto Martín-Martín, Ian Reid, Hamid Rezatofighi, Silvio Savarese	In this paper, we present Social-BiGAT, a graph-based generative adversarial network that generates realistic, multimodal trajectory predictions for multiple pedestrians in a scene.
14	FreeAnchor: Learning to Match Anchors for Visual Object Detection	Xiaosong Zhang, Fang Wan, Chang Liu, Rongrong Ji, Qixiang Ye	In this study, we propose a learning-to-match approach to break IoU restriction, allowing objects to match anchors in a flexible manner.
15	Private Hypothesis Selection	Mark Bun, Gautam Kamath, Thomas Steinke, Steven Z. Wu	We provide a differentially private algorithm for hypothesis selection.
16	Differentially Private Algorithms for Learning Mixtures of Separated Gaussians	Gautam Kamath, Or Sheffet, Vikrant Singhal, Jonathan Ullman	In this work, we give new algorithms for learning the parameters of a high-dimensional, well separated, Gaussian mixture model subject to the strong constraint of differential privacy.
17	Average-Case Averages: Private Algorithms for Smooth Sensitivity and Mean Estimation	Mark Bun, Thomas Steinke	We propose the trimmed mean estimator, which interpolates between the mean and the median, as a way of attaining much lower sensitivity on average while losing very little in terms of statistical accuracy.
18	Multi-Resolution Weak Supervision for Sequential Data	Paroma Varma, Frederic Sala, Shiori Sagawa, Jason Fries, Daniel Fu, Saelig Khattar, Ashwini Ramamoorthy, Ke Xiao, Kayvon Fatahalian, James Priest, Christopher Ré	We propose Dugong, the first framework to model multi-resolution weak supervision sources with complex correlations to assign probabilistic labels to training data.
19	DeepUSPS: Deep Robust Unsupervised Saliency Prediction via Self-supervision	Tam Nguyen, Maximilian Dax, Chaithanya Kumar Mummadi, Nhung Ngo, Thi Hoai Phuong Nguyen, Zhongyu Lou, Thomas Brox	In this work, we propose a two-stage mechanism for robust unsupervised object saliency prediction, where the first stage involves refinement of the noisy pseudo labels generated from different handcrafted methods.
20	The Point Where Reality Meets Fantasy: Mixed Adversarial Generators for Image Splice Detection	Vladimir V. Kniaz, Vladimir Knyaz, Fabio Remondino	In this paper, we propose a new framework for training of discriminative segmentation model via an adversarial process.
21	You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle	Dinghuai Zhang, Tianyuan Zhang, Yiping Lu, Zhanxing Zhu, Bin Dong	In this paper, we show that adversarial training can be cast as a discrete time differential game.
22	Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement	Chao Yang, Xiaojian Ma, Wenbing Huang, Fuchun Sun, Huaping Liu, Junzhou Huang, Chuang Gan	In this paper, we investigate LfO and its difference with LfD in both theoretical and practical perspectives.
23	Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance	Kimia Nadjahi, Alain Durmus, Umut Simsekli, Roland Badeau	In this study, we investigate the asymptotic properties of estimators that are obtained by minimizing SW.
24	Generalized Sliced Wasserstein Distances	Soheil Kolouri, Kimia Nadjahi, Umut Simsekli, Roland Badeau, Gustavo Rohde	In this paper, we first clarify the mathematical connection between the SW distance and the Radon transform. We then utilize the generalized Radon transform to define a new family of distances for probability measures, which we call generalized sliced-Wasserstein (GSW) distances.
25	First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise	Thanh Huy Nguyen, Umut Simsekli, Mert Gurbuzbalaban, Gaël Richard	In this study, we provide formal theoretical analysis where we derive explicit conditions for the step-size such that the metastability behavior of the discrete-time system is similar to its continuous-time limit.
26	Blind Super-Resolution Kernel Estimation using an Internal-GAN	Sefi Bell-Kligler, Assaf Shocher, Michal Irani	In this paper we show how this powerful cross-scale recurrence property can be realized using Deep Internal Learning.
27	Noise-tolerant fair classification	Alex Lamy, Ziyuan Zhong	In this paper, we answer the question in the affirmative: we show that if one measures fairness using the mean-difference score, and sensitive features are subject to noise from the mutually contaminated learning model, then owing to a simple identity we only need to change the desired fairness-tolerance.
28	Generalization in Generative Adversarial Networks: A Novel Perspective from Privacy Protection	Bingzhe Wu, Shiwan Zhao, Chaochao Chen, Haoyang Xu, Li Wang, Xiaolu Zhang, Guangyu Sun, Jun Zhou	In this paper, we aim to understand the generalization properties of generative adversarial networks (GANs) from a new perspective of privacy protection.
29	Joint-task Self-supervised Learning for Temporal Correspondence	Xueting Li, Sifei Liu, Shalini De Mello, Xiaolong Wang, Jan Kautz, Ming-Hsuan Yang	This paper proposes to learn reliable dense correspondence from videos in a self-supervised manner.
30	Provable Gradient Variance Guarantees for Black-Box Variational Inference	Justin Domke	Recent variational inference methods use stochastic gradient estimators whose variance is not well understood.
31	Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation	Justin Domke, Daniel R. Sheldon	This paper gives bounds for the common “reparameterization” estimators when the target is smooth and the variational family is a location-scale distribution.
32	Experience Replay for Continual Learning	David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, Gregory Wayne	Here, we introduce CLEAR, a replay-based method that greatly reduces catastrophic forgetting in multi-task reinforcement learning.
33	Deep ReLU Networks Have Surprisingly Few Activation Patterns	Boris Hanin, David Rolnick	In this paper, we show that the average number of activation patterns for ReLU networks at initialization is bounded by the total number of neurons raised to the input dimension.
34	Chasing Ghosts: Instruction Following as Bayesian State Tracking	Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee	Based on this intuition, we formulate the problem of finding the goal location in Vision-and-Language Navigation (VLN) within the framework of Bayesian state tracking – learning observation and motion models conditioned on these expectable events.
35	Block Coordinate Regularization by Denoising	Yu Sun, Jiaming Liu, Ulugbek Kamilov	In this work, we develop a new block coordinate RED algorithm that decomposes a large-scale estimation problem into a sequence of updates over a small subset of the unknown variables.
36	Reducing Noise in GAN Training with Variance Reduced Extragradient	Tatjana Chavdarova, Gauthier Gidel, François Fleuret, Simon Lacoste-Julien	We address this issue with a novel stochastic variance-reduced extragradient (SVRE) optimization algorithm, which for a large class of games improves upon the previous convergence rates proposed in the literature.
37	Learning Erdos-Renyi Random Graphs via Edge Detecting Queries	Zihan Li, Matthias Fresacher, Jonathan Scarlett	In this paper, we consider the problem of learning an unknown graph via queries on groups of nodes, with the result indicating whether or not at least one edge is present among those nodes.
38	A Primal-Dual link between GANs and Autoencoders	Hisham Husain, Richard Nock, Robert C. Williamson	In this work, we study the $f$-GAN and WAE models and make two main discoveries.
39	muSSP: Efficient Min-cost Flow Algorithm for Multi-object Tracking	Congchao Wang, Yizhi Wang, Yinxue Wang, Chiung-Ting Wu, Guoqiang Yu	In this paper, by exploiting the special structures and properties of the graphs formulated in MOT problems, we develop an efficient min-cost flow algorithm, namely, minimum-update Successive Shortest Path (muSSP).
40	Category Anchor-Guided Unsupervised Domain Adaptation for Semantic Segmentation	Qiming Zhang, Jing Zhang, Wei Liu, Dacheng Tao	In this paper, we propose a novel category anchor-guided (CAG) UDA model for semantic segmentation, which explicitly enforces category-aware feature alignment to learn shared discriminative features and classifiers simultaneously.
41	Invert to Learn to Invert	Patrick Putzky, Max Welling	In this work, we propose an iterative inverse model with constant memory that relies on invertible networks to avoid storing intermediate activations.
42	Equitable Stable Matchings in Quadratic Time	Nikolaos Tziavelis, Ioannis Giannakopoulos, Katerina Doka, Nectarios Koziris, Panagiotis Karras	In this paper, we propose an alternative that is computationally simpler and achieves high equity too.
43	Zero-Shot Semantic Segmentation	Maxime Bucher, Tuan-Hung Vu, Matthieu Cord, Patrick Pérez	In this paper, we introduce the new task of zero-shot semantic segmentation: learning pixel-wise classifiers for never-seen object categories with zero training examples.
44	Metric Learning for Adversarial Robustness	Chengzhi Mao, Ziyuan Zhong, Junfeng Yang, Carl Vondrick, Baishakhi Ray	Motivated by this observation, we propose to regularize the representation space under attack with metric learning to produce more robust classifiers.
45	DISN: Deep Implicit Surface Network for High-quality Single-view 3D Reconstruction	Qiangeng Xu, Weiyue Wang, Duygu Ceylan, Radomir Mech, Ulrich Neumann	In this paper, we present DISN, a Deep Implicit Surface Network which can generate a high-quality detail-rich 3D mesh from a 2D image by predicting the underlying signed distance fields.
46	Batched Multi-armed Bandits Problem	Zijun Gao, Yanjun Han, Zhimei Ren, Zhengqing Zhou	In this paper, we study the multi-armed bandit problem in the batched setting where the employed policy must split data into a small number of batches.
47	vGraph: A Generative Model for Joint Community Detection and Node Representation Learning	Fan-Yun Sun, Meng Qu, Jordan Hoffmann, Chin-Wei Huang, Jian Tang	We propose a probabilistic generative model called vGraph to learn community membership and node representation collaboratively.
48	Differentially Private Bayesian Linear Regression	Garrett Bernstein, Daniel R. Sheldon	We investigate the problem of Bayesian linear regression, with the goal of computing posterior distributions that correctly quantify uncertainty given privately released statistics.
49	Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos	Yitian Yuan, Lin Ma, Jingwen Wang, Wei Liu, Wenwu Zhu	In this paper, we propose a novel semantic conditioned dynamic modulation (SCDM) mechanism, which relies on the sentence semantics to modulate the temporal convolution operations for better correlating and composing the sentence related video contents over time.
50	AGEM: Solving Linear Inverse Problems via Deep Priors and Sampling	Bichuan Guo, Yuxing Han, Jiangtao Wen	In this paper we propose to use a denoising autoencoder (DAE) prior to simultaneously solve a linear inverse problem and estimate its noise parameter.
51	CPM-Nets: Cross Partial Multi-View Networks	Changqing Zhang, Zongbo Han, Yajie Cui, Huazhu Fu, Joey Tianyi Zhou, Qinghua Hu	To address the challenge, we propose a novel framework termed Cross Partial Multi-View Networks (CPM-Nets).
52	Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis	Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li	In order to better exploit the semantic layout for the image generator, we propose to predict convolutional kernels conditioned on the semantic label map to generate the intermediate feature maps from the noise maps and eventually generate the images.
53	Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling	Andrey Kolobov, Yuval Peres, Cheng Lu, Eric J. Horvitz	We propose a novel optimization objective for this setting that has several practically desirable properties, and efficient algorithms for it with optimality guarantees even in the face of mixed content change observability and initially unknown change model parameters.
54	SySCD: A System-Aware Parallel Coordinate Descent Algorithm	Nikolas Ioannou, Celestine Mendler-Dünner, Thomas Parnell	In this paper we propose a novel parallel stochastic coordinate descent (SCD) algorithm with convergence guarantees that exhibits strong scalability.
55	Importance Weighted Hierarchical Variational Inference	Artem Sobolev, Dmitry P. Vetrov	To overcome this roadblock, we introduce a new family of variational upper bounds on a marginal log-density in the case of hierarchical models (also known as latent variable models).
56	RSN: Randomized Subspace Newton	Robert Gower, Dmitry Kovalev, Felix Lieder, Peter Richtarik	We develop a randomized Newton method capable of solving learning problems with huge dimensional feature spaces, which is a common setting in applications such as medical imaging, genomics and seismology.
57	Trust Region-Guided Proximal Policy Optimization	Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan	In this paper, we give an in-depth analysis on the exploration behavior of PPO, and show that PPO is prone to suffer from the risk of lack of exploration especially under the case of bad initialization, which may lead to the failure of training or being trapped in bad local optima.
58	Adversarial Self-Defense for Cycle-Consistent GANs	Dina Bashkirova, Ben Usman, Kate Saenko	In this paper, we show how such self-attacking behavior of unsupervised translation methods affects their performance and provide two defense techniques.
59	Towards closing the gap between the theory and practice of SVRG	Othmane Sebbouh, Nidham Gazagnadou, Samy Jelassi, Francis Bach, Robert Gower	Our first contribution is that we take several steps towards closing this gap.
60	Uniform Error Bounds for Gaussian Process Regression with Application to Safe Control	Armin Lederer, Jonas Umlauft, Sandra Hirche	In this paper, we employ the Gaussian process distribution and continuity arguments to derive a novel uniform error bound under weaker assumptions.
61	ETNet: Error Transition Network for Arbitrary Style Transfer	Chunjin Song, Zhijie Wu, Yang Zhou, Minglun Gong, Hui Huang	Inspired by the works on error-correction, instead, we propose a self-correcting model to predict what is wrong with the current stylization and refine it accordingly in an iterative manner.
62	No Pressure! Addressing the Problem of Local Minima in Manifold Learning Algorithms	Max Vladymyrov	We propose a natural extension to several manifold learning methods aimed at identifying pressured points, i.e., points that are stuck in poor local minima and have poor embedding quality.
63	Deep Equilibrium Models	Shaojie Bai, J. Zico Kolter, Vladlen Koltun	We present a new approach to modeling sequential data: the deep equilibrium model (DEQ).
64	Saccader: Improving Accuracy of Hard Attention Models for Vision	Gamaleldin Elsayed, Simon Kornblith, Quoc V. Le	Here, we propose a novel hard attention model, which we term Saccader.
65	Multiway clustering via tensor block models	Miaoyan Wang, Yuchen Zeng	We propose a tensor block model, develop a unified least-square estimation, and obtain the theoretical accuracy guarantees for multiway clustering.
66	Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives	Wang Chi Cheung	We propose a no-regret algorithm based on the Frank-Wolfe algorithm (Frank and Wolfe 1956), UCRL2 (Jaksch et al. 2010), as well as a crucial and novel gradient threshold procedure.
67	NAT: Neural Architecture Transformer for Accurate and Compact Architectures	Yong Guo, Yin Zheng, Mingkui Tan, Qi Chen, Jian Chen, Peilin Zhao, Junzhou Huang	To make the problem feasible, we cast the optimization problem into a Markov decision process (MDP) and seek to learn a Neural Architecture Transformer (NAT) to replace the redundant operations with the more computationally efficient ones (e.g., skip connection or directly removing the connection).
68	Selecting Optimal Decisions via Distributionally Robust Nearest-Neighbor Regression	Ruidi Chen, Ioannis Paschalidis	This paper develops a prediction-based prescriptive model for optimal decision making that (i) predicts the outcome under each action using a robust nonlinear model, and (ii) adopts a randomized prescriptive policy determined by the predicted outcomes.
69	Network Pruning via Transformable Architecture Search	Xuanyi Dong, Yi Yang	To break the structure limitation of the pruned networks, we propose to apply neural architecture search to search directly for a network with flexible channel and layer sizes.
70	Differentiable Cloth Simulation for Inverse Problems	Junbang Liang, Ming Lin, Vladlen Koltun	We propose a differentiable cloth simulator that can be embedded as a layer in deep neural networks.
71	Poisson-Randomized Gamma Dynamical Systems	Aaron Schein, Scott Linderman, Mingyuan Zhou, David Blei, Hanna Wallach	This paper presents the Poisson-randomized gamma dynamical system (PRGDS), a model for sequentially observed count tensors that encodes a strong inductive bias toward sparsity and burstiness.
72	Volumetric Correspondence Networks for Optical Flow	Gengshan Yang, Deva Ramanan	Instead, we introduce several simple modifications that dramatically simplify the use of volumetric layers – (1) volumetric encoder-decoder architectures that efficiently capture large receptive fields, (2) multi-channel cost volumes that capture multi-dimensional notions of pixel similarities, and finally, (3) separable volumetric filtering that significantly reduces computation and parameters while preserving accuracy.
73	Learning Conditional Deformable Templates with Convolutional Networks	Adrian Dalca, Marianne Rakic, John Guttag, Mert Sabuncu	In this work, we present a probabilistic model and efficient learning strategy that yields either universal or conditional templates, jointly with a neural network that provides efficient alignment of the images to these templates.
74	Fast Low-rank Metric Learning for Large-scale and High-dimensional Data	Han Liu, Zhizhong Han, Yu-Shen Liu, Ming Gu	To address this issue, we present a novel fast low-rank metric learning (FLRML) method.
75	Efficient Symmetric Norm Regression via Linear Sketching	Zhao Song, Ruosong Wang, Lin Yang, Hongyang Zhang, Peilin Zhong	We provide efficient algorithms for overconstrained linear regression problems with size $n \times d$ when the loss function is a symmetric norm (a norm invariant under sign-flips and coordinate-permutations).
76	RUBi: Reducing Unimodal Biases for Visual Question Answering	Remi Cadene, Corentin Dancette, Hedi Ben younes, Matthieu Cord, Devi Parikh	We propose RUBi, a new learning strategy to reduce biases in any VQA model.
77	Why Can’t I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition	Jinwoo Choi, Chen Gao, Joseph C. E. Messou, Jia-Bin Huang	In this paper, we propose to mitigate scene bias for video representation learning.
78	NeurVPS: Neural Vanishing Point Scanning via Conic Convolution	Yichao Zhou, Haozhi Qi, Jingwei Huang, Yi Ma	In this work, we identify a canonical conic space in which the neural network can effectively compute the global geometric information of vanishing points locally, and we propose a novel operator named conic convolution that can be implemented as regular convolutions in this space.
79	DATA: Differentiable ArchiTecture Approximation	Jianlong Chang, Xinbang Zhang, Yiwen Guo, Gaofeng Meng, Shiming Xiang, Chunhong Pan	To bridge this gap, we develop Differentiable ArchiTecture Approximation (DATA) with an Ensemble Gumbel-Softmax (EGS) estimator to automatically approximate architectures during searching and validating in a differentiable manner.
80	Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge	Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao	In this paper, and inspired by this process, we propose a novel text-to-image method called LeicaGAN to combine the above three phases in a unified framework.
81	Memory-oriented Decoder for Light Field Salient Object Detection	Miao Zhang, Jingjing Li, Ji Wei, Yongri Piao, Huchuan Lu	In this paper, we present a deep-learning-based method where a novel memory-oriented decoder is tailored for light field saliency detection.
82	Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition	Xuesong Niu, Hu Han, Shiguang Shan, Xilin Chen	In this work, we propose a semi-supervised approach for AU recognition utilizing a large number of web face images without AU labels and a small face dataset with AU labels inspired by the co-training methods.
83	Correlated Uncertainty for Learning Dense Correspondences from Noisy Labels	Natalia Neverova, David Novotny, Andrea Vedaldi	We address this issue by augmenting neural network predictors with the ability to output a distribution over labels, thus explicitly and introspectively capturing the aleatoric uncertainty in the annotations.
84	Powerset Convolutional Neural Networks	Chris Wendler, Markus Püschel, Dan Alistarh	We present a novel class of convolutional neural networks (CNNs) for set functions, i.e., data indexed with the powerset of a finite set.
85	Optimal Pricing in Repeated Posted-Price Auctions with Different Patience of the Seller and the Buyer	Arsenii Vanunts, Alexey Drutsa	We study revenue optimization pricing algorithms for repeated posted-price auctions where a seller interacts with a single strategic buyer that holds a fixed private valuation.
86	An Accelerated Decentralized Stochastic Proximal Algorithm for Finite Sums	Hadrien Hendrikx, Francis Bach, Laurent Massoulié	In this work, we propose an efficient Accelerated Decentralized stochastic algorithm for Finite Sums named ADFS, which uses local stochastic proximal updates and randomized pairwise communications between nodes.
87	Point-Voxel CNN for Efficient 3D Deep Learning	Zhijian Liu, Haotian Tang, Yujun Lin, Song Han	In this paper, we propose PVCNN that represents the 3D input data in points to reduce the memory consumption, while performing the convolutions in voxels to reduce the irregular, sparse data access and improve the locality.
88	Deep Learning without Weight Transport	Mohamed Akrout, Collin Wilson, Peter Humphreys, Timothy Lillicrap, Douglas B. Tweed	Here we describe two mechanisms — a neural circuit called a weight mirror and a modification of an algorithm proposed by Kolen and Pollack in 1994 — both of which let the feedback path learn appropriate synaptic weights quickly and accurately even in large networks, without weight transport or complex wiring.
89	Combinatorial Bandits with Relative Feedback	Aadirupa Saha, Aditya Gopalan	For both settings, we devise instance-dependent and order-optimal regret algorithms with regret $O(\frac{n}{m} \ln T)$ and $O(\frac{n}{k} \ln T)$, respectively.
90	General Proximal Incremental Aggregated Gradient Algorithms: Better and Novel Results under General Scheme	Tao Sun, Yuejiao Sun, Dongsheng Li, Qing Liao	In this paper, we propose a general proximal incremental aggregated gradient algorithm, which contains various existing algorithms including the basic incremental aggregated gradient method.
91	A Condition Number for Joint Optimization of Cycle-Consistent Networks	Leonidas J. Guibas, Qixing Huang, Zhenxiao Liang	This paper presents an algorithm that selects a subset of weighted cycles to minimize a condition number of the induced joint optimization problem.
92	Explicit Disentanglement of Appearance and Perspective in Generative Models	Nicki Skafte, Søren Hauberg	Specifically, we propose a model with two latent spaces: one that represents spatial transformations of the input data, and another that represents the transformed data.
93	Polynomial Cost of Adaptation for X-Armed Bandits	Hedi Hadiji	In the context of stochastic continuum-armed bandits, we present an algorithm that adapts to the unknown smoothness of the objective function.
94	Learning to Propagate for Graph Meta-Learning	Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang	In this paper, we show that a meta-learner that explicitly relates tasks on a graph describing the relations of their output dimensions (e.g., classes) can significantly improve few-shot learning.
95	Secretary Ranking with Minimal Inversions	Sepehr Assadi, Eric Balkanski, Renato Leme	We present an algorithm that ranks $n$ elements with only $O(n^{3/2})$ inversions in expectation, and show that any algorithm necessarily suffers $\Omega(n^{3/2})$ inversions when there are $n$ available positions.
96	Nonparametric Regressive Point Processes Based on Conditional Gaussian Processes	Siqi Liu, Milos Hauskrecht	In this work, we propose and develop a new nonparametric regressive point process model based on Gaussian processes.
97	Learning Perceptual Inference by Contrasting	Chi Zhang, Baoxiong Jia, Feng Gao, Yixin Zhu, HongJing Lu, Song-Chun Zhu	In this work, we study how to improve machines’ reasoning ability on one challenging task of this kind: Raven’s Progressive Matrices (RPM).
98	Selecting the independent coordinates of manifolds with large aspect ratios	Yu-Chia Chen, Marina Meila	Hence, we propose a bicriterial Independent Eigencoordinate Selection (IES) algorithm that selects smooth embeddings with few eigenvectors.
99	Region-specific Diffeomorphic Metric Mapping	Zhengyang Shen, Francois-Xavier Vialard, Marc Niethammer	We introduce a region-specific diffeomorphic metric mapping (RDMM) registration approach.
100	Deep Supervised Summarization: Algorithm and Application to Learning Instructions	Chengguang Xu, Ehsan Elhamifar	To do so, we propose to learn representations of data so that the input of transformed data to the facility location recovers their ground-truth representatives.
101	Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations	Vincent Sitzmann, Michael Zollhoefer, Gordon Wetzstein	We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance.
102	Reconciling λ-Returns with Experience Replay	Brett Daley, Christopher Amato	Towards this, we propose the first method to enable practical use of λ-returns in arbitrary replay-based methods without relying on other forms of decorrelation such as asynchronous gradient updates.
103	Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence	Fengxiang He, Tongliang Liu, Dacheng Tao	This paper reports both theoretical and empirical evidence for a training strategy: the ratio of batch size to learning rate should be kept from growing too large in order to achieve good generalization.
104	Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs	Max Simchowitz, Kevin G. Jamieson	This paper establishes that optimistic algorithms attain gap-dependent and non-asymptotic logarithmic regret for episodic MDPs.
105	A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation	Mitsuru Kusumoto, Takuya Inoue, Gentaro Watanabe, Takuya Akiba, Masanori Koyama	In this paper, we will propose a novel and efficient recomputation method that can be applied to a wider range of neural nets than previous methods.
106	Combinatorial Inference against Label Noise	Paul Hongsuck Seo, Geeho Kim, Bohyung Han	To handle the label noise issue in a principled way, we propose a unique classification framework of constructing multiple models in heterogeneous coarse-grained meta-class spaces and making joint inference of the trained models for the final predictions in the original (base) class space.
107	Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning	Chao Qu, Shie Mannor, Huan Xu, Yuan Qi, Le Song, Junwu Xiong	We consider the networked multi-agent reinforcement learning (MARL) problem in a fully decentralized setting, where agents learn to coordinate to achieve joint success.
108	Convolution with even-sized kernels and symmetric padding	Shuang Wu, Guanrui Wang, Pei Tang, Feng Chen, Luping Shi	In this work, we quantify the shift problem that occurs in even-sized kernel convolutions by an information erosion hypothesis, and eliminate it by proposing symmetric padding on four sides of the feature maps (C2sp, C4sp).
109	On The Classification-Distortion-Perception Tradeoff	Dong Liu, Haochen Zhang, Zhiwei Xiong	In this paper, we extend the previous perception-distortion tradeoff to the case of classification-distortion-perception (CDP) tradeoff, where we introduced the classification error rate of the restored signal in addition to distortion and perceptual difference.
110	Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up	Dominic Richards, Patrick Rebeschini	We analyse the learning performance of Distributed Gradient Descent in the context of multi-agent decentralised non-parametric regression with the square loss function when i.i.d. samples are assigned to agents.
111	Online sampling from log-concave distributions	Holden Lee, Oren Mangoubi, Nisheeth Vishnoi	Technically, lack of strong convexity is a significant barrier to analysis and, here, our main contribution is a martingale exit time argument that shows our Markov chain remains in a ball of radius roughly poly-logarithmic in $T$ for enough time to reach within $\epsilon$ of $\pi_t$.
112	Envy-Free Classification	Maria-Florina F. Balcan, Travis Dick, Ritesh Noothigattu, Ariel D. Procaccia	On a conceptual level, we argue that envy-freeness also provides a compelling notion of fairness for classification tasks, especially when individuals have heterogeneous preferences.
113	Finding Friend and Foe in Multi-Agent Games	Jack Serrino, Max Kleiman-Weiner, David C. Parkes, Josh Tenenbaum	Here we develop the DeepRole algorithm, a multi-agent reinforcement learning agent that we test on “The Resistance: Avalon”, the most popular hidden role game.
114	Image Synthesis with a Single (Robust) Classifier	Shibani Santurkar, Andrew Ilyas, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry	We show that the basic classification framework alone can be used to tackle some of the most challenging tasks in image synthesis.
115	Model Compression with Adversarial Robustness: A Unified Optimization Framework	Shupeng Gui, Haotao N. Wang, Haichuan Yang, Chen Yu, Zhangyang Wang, Ji Liu	We propose a novel Adversarially Trained Model Compression (ATMC) framework.
116	Cross-channel Communication Networks	Jianwei Yang, Zhile Ren, Chuang Gan, Hongyuan Zhu, Devi Parikh	We introduce a novel network unit called Cross-channel Communication (C3) block, a simple yet effective module to encourage the neuron communication within the same layer.
117	CondConv: Conditionally Parameterized Convolutions for Efficient Inference	Brandon Yang, Gabriel Bender, Quoc V. Le, Jiquan Ngiam	We propose conditionally parameterized convolutions (CondConv), which learn specialized convolutional kernels for each example.
118	Regression Planning Networks	Danfei Xu, Roberto Martín-Martín, De-An Huang, Yuke Zhu, Silvio Savarese, Li F. Fei-Fei	In this work, we combine the benefits of these two paradigms and propose a learning-to-plan method that can directly generate a long-term symbolic plan conditioned on high-dimensional observations.
119	Twin Auxilary Classifiers GAN	Mingming Gong, Yanwu Xu, Chunyuan Li, Kun Zhang, Kayhan Batmanghelich	In this paper, we identify the source of low diversity issue theoretically and propose a practical solution to the problem.
120	Conditional Structure Generation through Graph Variational Generative Adversarial Nets	Carl Yang, Peiye Zhuang, Wenhan Shi, Alan Luu, Pan Li	While existing graph generative models only consider graph structures without semantic contexts, we formulate the novel problem of conditional structure generation, and propose a novel unified model of graph variational generative adversarial nets (CondGen) to handle the intrinsic challenges of flexible context-structure conditioning and permutation-invariant generation.
121	Distributional Policy Optimization: An Alternative Approach for Continuous Control	Chen Tessler, Guy Tennenholtz, Shie Mannor	We identify a fundamental problem in policy gradient-based methods in continuous control.
122	Sampling Sketches for Concave Sublinear Functions of Frequencies	Edith Cohen, Ofir Geri	Our main contribution is the design of composable sampling sketches that can be tailored to any concave sublinear function of the frequencies.
123	Deliberative Explanations: visualizing network insecurities	Pei Wang, Nuno Vasconcelos	A new approach to explainable AI, denoted deliberative explanations, is proposed.
124	Computing Full Conformal Prediction Set with Approximate Homotopy	Eugene Ndiaye, Ichiro Takeuchi	We propose efficient algorithms to compute conformal prediction set using approximated solution of (convex) regularized empirical risk minimization.
125	Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift	Stephan Rabanser, Stephan Günnemann, Zachary Lipton	This paper explores the problem of building ML systems that fail loudly, investigating methods for detecting dataset shift, identifying exemplars that most typify the shift, and quantifying shift malignancy.
126	Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards	Siyuan Li, Rui Wang, Minxue Tang, Chongjie Zhang	In this paper, we aim to adapt low-level skills to downstream tasks while maintaining the generality of reward design.
127	Multi-View Reinforcement Learning	Minne Li, Lisheng Wu, Jun Wang, Haitham Bou Ammar	We define the MVRL framework by extending partially observable Markov decision processes (POMDPs) to support more than one observation model and propose two solution methods through observation augmentation and cross-view policy transfer.
128	Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution	Thang Vu, Hyunjun Jang, Trung X. Pham, Chang Yoo	This paper considers an architecture referred to as Cascade Region Proposal Network (Cascade RPN) for improving the region-proposal quality and detection performance by systematically addressing the limitation of the conventional RPN that heuristically defines the anchors and aligns the features to the anchors.
129	Neural Diffusion Distance for Image Segmentation	Jian Sun, Zongben Xu	In this work, we propose a spec-diff-net for computing diffusion distance on graph based on approximate spectral decomposition.
130	Fine-grained Optimization of Deep Neural Networks	Mete Ozay	In this work, we conjecture that if we can impose multiple constraints on weights of DNNs to upper bound the norms of the weights, and train the DNNs with these weights, then we can attain empirical generalization errors closer to the derived theoretical bounds, and improve accuracy of the DNNs. To this end, we pose two problems.
131	Extending Stein's unbiased risk estimator to train deep denoisers with correlated pairs of noisy images	Magauiya Zhussip, Shakarim Soltanayev, Se Young Chun	Here, we propose an extended SURE (eSURE) to train deep denoisers with correlated pairs of noise realizations per image and applied it to the case with two uncorrelated realizations per image to achieve better performance than SURE based method and comparable results to Noise2Noise.
132	Fixing Implicit Derivatives: Trust-Region Based Learning of Continuous Energy Functions	Chris Russell, Matteo Toso, Neill Campbell	We present a new technique for the learning of continuous energy functions that we refer to as Wibergian Learning.
133	Hyperspherical Prototype Networks	Pascal Mettes, Elise van der Pol, Cees Snoek	This paper introduces hyperspherical prototype networks, which unify classification and regression with prototypes on hyperspherical output spaces.
134	Expressive power of tensor-network factorizations for probabilistic modeling	Ivan Glasser, Ryan Sweke, Nicola Pancotti, Jens Eisert, Ignacio Cirac	Inspired by these developments, and the natural correspondence between tensor networks and probabilistic graphical models, we provide a rigorous analysis of the expressive power of various tensor-network factorizations of discrete multivariate probability distributions.
135	HyperGCN: A New Method For Training Graph Convolutional Networks on Hypergraphs	Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Vikram Nitin, Anand Louis, Partha Talukdar	Motivated by the fact that a graph convolutional network (GCN) has been effective for graph-based SSL, we propose HyperGCN, a novel GCN for SSL on attributed hypergraphs.
136	SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points	Zhize Li	We analyze stochastic gradient algorithms for optimizing nonconvex problems.
137	Efficient Meta Learning via Minibatch Proximal Update	Pan Zhou, Xiaotong Yuan, Huan Xu, Shuicheng Yan, Jiashi Feng	To remedy this deficiency, in this paper we propose a minibatch proximal update based meta-learning approach for learning efficient hypothesis transfer.
138	Unconstrained Monotonic Neural Networks	Antoine Wehenkel, Gilles Louppe	In this work, we propose the Unconstrained Monotonic Neural Network (UMNN) architecture based on the insight that a function is monotonic as long as its derivative is strictly positive.
139	Guided Similarity Separation for Image Retrieval	Chundi Liu, Guangwei Yu, Maksims Volkovs, Cheng Chang, Himanshu Rai, Junwei Ma, Satya Krishna Gorti	In this work we propose a different approach where we leverage graph convolutional networks to directly encode neighbor information into image descriptors.
140	Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss	Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma	Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes. We design two novel methods to improve performance in such scenarios.
141	Strategizing against No-regret Learners	Yuan Deng, Jon Schneider, Balasubramanian Sivan	We study this question and show that under some mild assumptions, the player can always guarantee himself a utility of at least what he would get in a Stackelberg equilibrium.
142	D-VAE: A Variational Autoencoder for Directed Acyclic Graphs	Muhan Zhang, Shali Jiang, Zhicheng Cui, Roman Garnett, Yixin Chen	In this paper, we study deep generative models for DAGs, and propose a novel DAG variational autoencoder (D-VAE).
143	Hierarchical Optimal Transport for Document Representation	Mikhail Yurochkin, Sebastian Claici, Edward Chien, Farzaneh Mirzazadeh, Justin M. Solomon	As an alternative, we introduce hierarchical optimal transport as a meta-distance between documents, where documents are modeled as distributions over topics, which themselves are modeled as distributions over words.
144	Multivariate Sparse Coding of Nonstationary Covariances with Gaussian Processes	Rui Li	We propose a unified nonstationary modeling framework to jointly encode the observation correlations to generate a piece-wise representation with a hyper-level Gaussian process (GP) governing the overall contour of the pieces.
145	Positional Normalization	Boyi Li, Felix Wu, Kilian Q. Weinberger, Serge Belongie	In this paper, we propose a novel normalization method that deviates from this theme.
146	A New Defense Against Adversarial Images: Turning a Weakness into a Strength	Shengyuan Hu, Tao Yu, Chuan Guo, Wei-Lun Chao, Kilian Q. Weinberger	In this paper, we adopt a novel perspective and regard the omnipresence of adversarial perturbations as a strength rather than a weakness.
147	Quadratic Video Interpolation	Xiangyu Xu, Li Siyao, Wenxiu Sun, Qian Yin, Ming-Hsuan Yang	To address these issues, we propose a quadratic video interpolation method which exploits the acceleration information in videos.
148	ResNets Ensemble via the Feynman-Kac Formalism to Improve Natural and Robust Accuracies	Bao Wang, Zuoqiang Shi, Stanley Osher	Based on this unified viewpoint, we propose a simple yet effective ResNets ensemble algorithm to boost the accuracy of the robustly trained model on both clean and adversarial images.
149	Incremental Scene Synthesis	Benjamin Planche, Xuejian Rong, Ziyan Wu, Srikrishna Karanam, Harald Kosch, YingLi Tian, Jan Ernst, Andreas Hutter	We present a method to incrementally generate complete 2D or 3D scenes with the following properties: (a) it is globally consistent at each step according to a learned scene prior, (b) real observations of a scene can be incorporated while observing global consistency, (c) unobserved regions can be hallucinated locally in consistence with previous observations, hallucinations and global priors, and (d) hallucinations are statistical in nature, i.e., different scenes can be generated from the same observations.
150	Self-Supervised Generalisation with Meta Auxiliary Learning	Shikun Liu, Andrew Davison, Edward Johns	We propose a new method which automatically learns appropriate labels for an auxiliary task, such that any supervised learning task can be improved without requiring access to any further data.
151	Variational Denoising Network: Toward Blind Noise Modeling and Removal	Zongsheng Yue, Hongwei Yong, Qian Zhao, Deyu Meng, Lei Zhang	In this work we propose a new variational inference method, which integrates both noise estimation and image denoising into a unique Bayesian framework, for blind image denoising.
152	Fast Sparse Group Lasso	Yasutoshi Ida, Yasuhiro Fujiwara, Hisashi Kashima	This paper proposes a fast Block Coordinate Descent for Sparse Group Lasso.
153	Learnable Tree Filter for Structure-preserving Feature Transform	Lin Song, Yanwei Li, Zeming Li, Gang Yu, Hongbin Sun, Jian Sun, Nanning Zheng	In this paper, we propose the learnable tree filter to form a generic tree filtering module that leverages the structural property of minimal spanning tree to model long-range dependencies while preserving the details.
154	Data-Dependence of Plateau Phenomenon in Learning with Neural Network — Statistical Mechanical Analysis	Yuki Yoshida, Masato Okada	In this paper, using statistical mechanical formulation, we clarified the relationship between the plateau phenomenon and the statistical property of the data learned.
155	Coordinated hippocampal-entorhinal replay as structural inference	Talfan Evans, Neil Burgess	We propose that this offline inference corresponds to coordinated hippocampal-entorhinal replay during sharp wave ripples.
156	Cascaded Dilated Dense Network with Two-step Data Consistency for MRI Reconstruction	Hao Zheng, Faming Fang, Guixu Zhang	Inspired by recent deep learning methods, we propose a Cascaded Dilated Dense Network (CDDN) for MRI reconstruction.
157	On the Ineffectiveness of Variance Reduced Optimization for Deep Learning	Aaron Defazio, Leon Bottou	We show that naive application of the SVRG technique and related approaches fail, and explore why.
158	On the Curved Geometry of Accelerated Optimization	Aaron Defazio	In this work we propose a differential geometric motivation for Nesterov’s accelerated gradient method (AGM) for strongly-convex problems.
159	Multi-marginal Wasserstein GAN	Jiezhang Cao, Langyuan Mo, Yifan Zhang, Kui Jia, Chunhua Shen, Mingkui Tan	In this paper, we propose a novel Multi-marginal Wasserstein GAN (MWGAN) to minimize Wasserstein distance among domains.
160	Better Exploration with Optimistic Actor Critic	Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann	To address both of these phenomena, we introduce a new algorithm, Optimistic Actor Critic, which approximates a lower and upper confidence bound on the state-action value function.
161	Importance Resampling for Off-policy Prediction	Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White	In this work, we explore a resampling strategy as an alternative to reweighting.
162	The Label Complexity of Active Learning from Observational Data	Songbai Yan, Kamalika Chaudhuri, Tara Javidi	We provably demonstrate that the result of this is an algorithm which is statistically consistent as well as more label-efficient than prior work.
163	Meta-Learning Representations for Continual Learning	Khurram Javed, Martha White	In this paper, we propose OML, an objective that directly minimizes catastrophic interference by learning representations that accelerate future learning and are robust to forgetting under online updates in continual learning.
164	Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training	Haichao Zhang, Jianyu Wang	We introduce a feature scattering-based adversarial training approach for improving model robustness against adversarial attacks.
165	Visualizing the PHATE of Neural Networks	Scott Gigante, Adam S. Charles, Smita Krishnaswamy, Gal Mishne	To this end, we introduce a novel visualization algorithm that reveals the internal geometry of such networks: Multislice PHATE (M-PHATE), the first method designed explicitly to visualize how a neural network’s hidden representations of data evolve throughout the course of training.
166	The Cells Out of Sample (COOS) dataset and benchmarks for measuring out-of-sample generalization of image classifiers	Alex Lu, Amy Lu, Wiebke Schormann, Marzyeh Ghassemi, David Andrews, Alan Moses	We created a public dataset of 132,209 images of mouse cells, COOS-7 (Cells Out Of Sample 7-Class).
167	Nonconvex Low-Rank Tensor Completion from Noisy Data	Changxiao Cai, Gen Li, H. Vincent Poor, Yuxin Chen	Focusing on “incoherent” and well-conditioned tensors of a constant CP rank, we propose a two-stage nonconvex algorithm — (vanilla) gradient descent following a rough initialization — that achieves the best of both worlds.
168	Beyond Online Balanced Descent: An Optimal Algorithm for Smoothed Online Optimization	Gautam Goel, Yiheng Lin, Haoyuan Sun, Adam Wierman	No existing algorithms have competitive ratios matching this bound, and we show that the state-of-the-art algorithm, Online Balanced Descent (OBD), has a competitive ratio that is $\Omega(m^{-2/3})$.
169	Channel Gating Neural Networks	Weizhe Hua, Yuan Zhou, Christopher M. De Sa, Zhiru Zhang, G. Edward Suh	This paper introduces channel gating, a dynamic, fine-grained, and hardware-efficient pruning scheme to reduce the computation cost for convolutional neural networks (CNNs).
170	Neural networks grown and self-organized by noise	Guruprasad Raghavan, Matt Thomson	In this paper, we propose a biologically inspired developmental algorithm that can ‘grow’ a functional, layered neural network from a single initial cell.
171	Catastrophic Forgetting Meets Negative Transfer: Batch Spectral Shrinkage for Safe Transfer Learning	Xinyang Chen, Sinan Wang, Bo Fu, Mingsheng Long, Jianmin Wang	In this paper, we launch an in-depth empirical investigation into negative transfer in fine-tuning and find that, for the weight parameters and feature representations, transferability of their spectral components is diverse.
172	Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting	Jun Shu, Qi Xie, Lixuan Yi, Qian Zhao, Sanping Zhou, Zongben Xu, Deyu Meng	To address this issue, we propose a method capable of adaptively learning an explicit weighting function directly from data.
173	Variational Structured Semantic Inference for Diverse Image Captioning	Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang, Yan Wang	To model these two inherent diversities in image captioning, we propose a Variational Structured Semantic Inferring model (termed VSSI-cap) executed in a novel structured encoder-inferer-decoder schema.
174	Mapping State Space using Landmarks for Universal Goal Reaching	Zhiao Huang, Hao Su, Fangchen Liu	We propose a method to address this issue in large MDPs with sparse rewards, in which exploration and routing across remote states are both extremely challenging.
175	Transferable Normalization: Towards Improving Transferability of Deep Neural Networks	Ximei Wang, Ying Jin, Mingsheng Long, Jianmin Wang, Michael I. Jordan	In this paper, we delve into the components of DNN architectures and propose Transferable Normalization (TransNorm) in place of existing normalization techniques.
176	Random deep neural networks are biased towards simple functions	Giacomo De Palma, Bobak Kiani, Seth Lloyd	We prove that the binary classifiers of bit strings generated by random wide deep neural networks with ReLU activation function are biased towards simple functions.
177	XNAS: Neural Architecture Search with Expert Advice	Niv Nayman, Asaf Noy, Tal Ridnik, Itamar Friedman, Rong Jin, Lihi Zelnik	This paper introduces a novel optimization method for differential neural architecture search, based on the theory of prediction with expert advice.
178	CNN^{2}: Viewpoint Generalization via a Binocular Vision	Wei-Da Chen, Shan-Hung (Brandon) Wu	Observing that humans use binocular vision to understand the world, we study in this paper whether the 3D viewpoint generalizability of CNNs can be achieved via a binocular vision.
179	Generalized Off-Policy Actor-Critic	Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson	We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting.
180	DAC: The Double Actor-Critic Architecture for Learning Options	Shangtong Zhang, Shimon Whiteson	We apply an actor-critic algorithm on each augmented MDP, yielding the Double Actor-Critic (DAC) architecture.
181	Numerically Accurate Hyperbolic Embeddings Using Tiling-Based Models	Tao Yu, Christopher M. De Sa	To address this, we propose a new model which uses an integer-based tiling to represent any point in hyperbolic space with provably bounded numerical error.
182	Controlling Neural Level Sets	Matan Atzmon, Niv Haim, Lior Yariv, Ofer Israelov, Haggai Maron, Yaron Lipman	In this paper we present a simple and scalable approach to directly control level sets of a deep neural network.
183	Blended Matching Pursuit	Cyrille Combettes, Sebastian Pokutta	We present a blended matching pursuit algorithm, combining coordinate descent-like steps with stronger gradient descent steps, for minimizing a smooth convex function over a linear space spanned by a set of atoms.
184	An Improved Analysis of Training Over-parameterized Deep Neural Networks	Difan Zou, Quanquan Gu	In this paper, we provide an improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks, which only requires a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters.
185	Controllable Text-to-Image Generation	Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip Torr	In this paper, we propose a novel controllable text-to-image generative adversarial network (ControlGAN), which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions.
186	Improving Textual Network Learning with Variational Homophilic Embeddings	Wenlin Wang, Chenyang Tao, Zhe Gan, Guoyin Wang, Liqun Chen, Xinyuan Zhang, Ruiyi Zhang, Qian Yang, Ricardo Henao, Lawrence Carin	Different from most existing methods that optimize a discriminative objective, we introduce Variational Homophilic Embedding (VHE), a fully generative model that learns network embeddings by modeling the semantic (textual) information with a variational autoencoder, while accounting for the structural (topology) information through a novel homophilic prior design.
187	Rethinking Generative Mode Coverage: A Pointwise Guaranteed Approach	Peilin Zhong, Yuchen Mo, Chang Xiao, Pengyu Chen, Changxi Zheng	Rethinking this problem from a game-theoretic perspective, we show that a complete mode coverage is firmly attainable.
188	The Randomized Midpoint Method for Log-Concave Sampling	Ruoqi Shen, Yin Tat Lee	To solve the sampling problem, we propose a new framework to discretize stochastic differential equations.
189	Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update	Su Young Lee, Choi Sungik, Sae-Young Chung	We propose Episodic Backward Update (EBU) – a novel deep reinforcement learning algorithm with a direct value propagation.
190	Fully Neural Network based Model for General Temporal Point Processes	Takahiro Omi, Naonori Ueda, Kazuyuki Aihara	We herein propose a novel RNN based model in which the time course of the intensity function is represented in a general manner.
191	Gate Decorator: Global Filter Pruning Method for Accelerating Deep Convolutional Neural Networks	Zhonghui You, Kun Yan, Jinmian Ye, Meng Ma, Ping Wang	In this work, we propose a global filter pruning algorithm called Gate Decorator, which transforms a vanilla CNN module by multiplying its output by the channel-wise scaling factors (i.e. gate).
192	Discrimination in Online Markets: Effects of Social Bias on Learning from Reviews and Policy Design	Faidra Georgia Monachou, Itai Ashlagi	We study this problem using a two-sided large market model with employers and workers mediated by a platform.
193	Provably Powerful Graph Networks	Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, Yaron Lipman	Differently put, we suggest a simple model that interleaves applications of standard Multilayer-Perceptron (MLP) applied to the feature dimension and matrix multiplication.
194	Order Optimal One-Shot Distributed Learning	Arsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani	We propose an algorithm called Multi-Resolution Estimator (MRE) whose expected error is no larger than $\tilde{O}( m^{-1/\max(d,2)} n^{-1/2})$, where $d$ is the dimension of the parameter space.
195	Information Competing Process for Learning Diversified Representations	Jie Hu, Rongrong Ji, ShengChuan Zhang, Xiaoshuai Sun, Qixiang Ye, Chia-Wen Lin, Qi Tian	Towards learning diversified representations, a new approach, termed Information Competing Process (ICP), is proposed in this paper.
196	GENO — GENeric Optimization for Classical Machine Learning	Soeren Laue, Matthias Mitterreiter, Joachim Giesen	We show on a wide variety of classical but also some recently suggested problems that the automatically generated solvers are (1) as efficient as well engineered, specialized solvers, (2) more efficient by a decent margin than recent state-of-the-art solvers, and (3) orders of magnitude more efficient than classical modeling language plus solver approaches.
197	Conditional Independence Testing using Generative Adversarial Networks	Alexis Bellot, Mihaela van der Schaar	Our contribution is a new test statistic based on samples from a generative adversarial network designed to approximate directly a conditional distribution that encodes the null hypothesis, in a manner that maximizes power (the rate of true negatives).
198	Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function	Aviv Rosenberg, Yishay Mansour	We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes.
199	Partitioning Structure Learning for Segmented Linear Regression Trees	Xiangyu Zheng, Song Xi Chen	This paper proposes a partitioning structure learning method for segmented linear regression trees (SLRT), which assigns linear predictors over the terminal nodes.
200	A Tensorized Transformer for Language Modeling	Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Ming Zhou, Dawei Song	In this paper, based on the ideas of tensor decomposition and parameters sharing, we propose a novel self-attention model (namely Multi-linear attention) with Block-Term Tensor Decomposition (BTD).
201	Kernel Stein Tests for Multiple Model Comparison	Jen Ning Lim, Makoto Yamada, Bernhard Schölkopf, Wittawat Jitkrittum	We address the problem of non-parametric multiple model comparison: given $l$ candidate models, decide whether each candidate is as good as the best one(s) or worse than it.
202	Disentangled behavioural representations	Amir Dezfouli, Hassan Ashtiani, Omar Ghattas, Richard Nock, Peter Dayan, Cheng Soon Ong	To achieve this, we propose a novel end-to-end learning framework in which an encoder is trained to map the behavior of subjects into a low-dimensional latent space.
203	More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation	Quanfu Fan, Chun-Fu (Richard) Chen, Hilde Kuehne, Marco Pistoia, David Cox	To address this problem, we present a lightweight and memory-friendly architecture for action recognition that performs on par with or better than current architectures by using only a fraction of resources.
204	Rethinking the CSC Model for Natural Images	Dror Simon, Michael Elad	In this work we provide new insights regarding the CSC model and its capability to represent natural images, and suggest a Bayesian connection between this model and its patch-based ancestor.
205	Integrating Bayesian and Discriminative Sparse Kernel Machines for Multi-class Active Learning	Weishi Shi, Qi Yu	We propose a novel active learning (AL) model that integrates Bayesian and discriminative kernel machines for fast and accurate multi-class data sampling.
206	Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity	Deepak Pathak, Christopher Lu, Trevor Darrell, Phillip Isola, Alexei A. Efros	In contrast, this paper investigates a modular co-evolution strategy: a collection of primitive agents learns to dynamically self-assemble into composite bodies while also learning to coordinate their behavior to control these bodies.
207	Perceiving the arrow of time in autoregressive motion	Kristof Meding, Dominik Janzing, Bernhard Schölkopf, Felix A. Wichmann	We employ a so-called frozen noise paradigm enabling us to compare human performance with four different algorithms on a trial-by-trial basis: a causal inference algorithm exploiting the dependence structure of additive noise terms, a neurally inspired network, a Bayesian ideal observer model as well as a simple heuristic.
208	DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections	Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li	In this work, we propose an algorithm, DualDICE, for estimating these quantities.
209	Hyper-Graph-Network Decoders for Block Codes	Eliya Nachmani, Lior Wolf	In this work, we extend these results to much larger families of algebraic block codes, by performing message passing with graph neural networks.
210	Large Scale Markov Decision Processes with Changing Rewards	Adrian Rivera Cardoso, He Wang, Huan Xu	By approximating the state-action occupancy measures with a linear architecture of dimension $d \ll |S|$, we propose a modified algorithm with a computational complexity polynomial in $d$ and independent of $|S|$.
211	Multiview Aggregation for Learning Category-Specific Shape Reconstruction	Srinath Sridhar, Davis Rempe, Julien Valentin, Sofien Bouaziz, Leonidas J. Guibas	We present a method that can estimate dense 3D shape, and aggregate shape across multiple and varying number of input views.
212	Semi-Parametric Dynamic Contextual Pricing	Virag Shah, Ramesh Johari, Jose Blanchet	Motivated by the application of real-time pricing in e-commerce platforms, we consider the problem of revenue-maximization in a setting where the seller can leverage contextual information describing the customer’s history and the product’s type to predict her valuation of the product.
213	Interlaced Greedy Algorithm for Maximization of Submodular Functions in Nearly Linear Time	Alan Kuhnle	A deterministic approximation algorithm is presented for the maximization of non-monotone submodular functions over a ground set of size $n$ subject to cardinality constraint $k$; the algorithm is based upon the idea of interlacing two greedy procedures.
214	Initialization of ReLUs for Dynamical Isometry	Rebekka Burkholz, Alina Dubatovka	We derive the joint signal output distribution exactly, without mean field assumptions, for fully-connected networks with Gaussian weights and biases, and analyze deviations from the mean field results.
215	Gradient Information for Representation and Modeling	Jie Ding, Robert Calderbank, Vahid Tarokh	Motivated by Fisher divergence, in this paper we present a new set of information quantities which we refer to as gradient information.
216	SpiderBoost and Momentum: Faster Variance Reduction Algorithms	Zhe Wang, Kaiyi Ji, Yi Zhou, Yingbin Liang, Vahid Tarokh	In this paper, we propose SpiderBoost as an improved scheme, which allows the use of a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and can be extended with proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with a provable convergence guarantee.
217	Minimax Optimal Estimation of Approximate Differential Privacy on Neighboring Databases	Xiyang Liu, Sewoong Oh	We pose it as a property estimation problem, and study the fundamental trade-offs involved in the accuracy in estimated privacy guarantees and the number of samples required.
218	Backprop with Approximate Activations for Memory-efficient Network Training	Ayan Chakrabarti, Benjamin Moseley	In this paper, we propose a new implementation for back-propagation that significantly reduces memory usage, by enabling the use of approximations with negligible computational cost and minimal effect on training performance.
219	Training Image Estimators without Image Ground Truth	Zhihao Xia, Ayan Chakrabarti	In this paper, we introduce an unsupervised framework for training image estimation networks, from a training set that contains only measurements—with two varied measurements per image—but no ground-truth for the full images desired as output.
220	Deep Structured Prediction for Facial Landmark Detection	Lisha Chen, Hui Su, Qiang Ji	This paper proposes a method for deep structured facial landmark detection based on combining a deep Convolutional Network with a Conditional Random Field.
221	Information-Theoretic Confidence Bounds for Reinforcement Learning	Xiuyuan Lu, Benjamin Van Roy	We integrate information-theoretic concepts into the design and analysis of optimistic algorithms and Thompson sampling.
222	Transfer Anomaly Detection by Inferring Latent Domain Representations	Atsutoshi Kumagai, Tomoharu Iwata, Yasuhiro Fujiwara	We propose a method to improve the anomaly detection performance on target domains by transferring knowledge on related domains.
223	Total Least Squares Regression in Input Sparsity Time	Huaian Diao, Zhao Song, David Woodruff, Xin Yang	We give an algorithm for finding a solution $X$ to the linear system $\hat{A}X=\hat{B}$ for which the cost $\|A-\hat{A}\|_F^2 + \|B-\hat{B}\|_F^2$ is at most a multiplicative $(1+\epsilon)$ factor times the optimal cost, up to an additive error $\eta$ that may be an arbitrarily small function of $n$.
224	Park: An Open Platform for Learning-Augmented Computer Systems	Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Ravichandra Addanki, Mehrdad Khani Shirkoohi, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska, Mohammad Alizadeh	We present Park, a platform for researchers to experiment with Reinforcement Learning (RL) for computer systems.
225	Adapting Neural Networks for the Estimation of Treatment Effects	Claudia Shi, David Blei, Victor Veitch	We propose two adaptations based on insights from the statistical literature on the estimation of treatment effects.
226	Learning Transferable Graph Exploration	Hanjun Dai, Yujia Li, Chenglong Wang, Rishabh Singh, Po-Sen Huang, Pushmeet Kohli	We propose a ‘learning to explore’ framework where we learn a policy from a distribution of environments.
227	Conformal Prediction Under Covariate Shift	Ryan J. Tibshirani, Rina Foygel Barber, Emmanuel Candes, Aaditya Ramdas	We extend conformal prediction methodology beyond the case of exchangeable data.
228	Optimal Analysis of Subset-Selection Based L_p Low-Rank Approximation	Chen Dan, Hong Wang, Hongyang Zhang, Yuchen Zhou, Pradeep K. Ravikumar	We show that for the problem of $\ell_p$ rank-$k$ approximation of any given matrix over $R^{n\times m}$ and $C^{n\times m}$, the algorithm of column subset selection enjoys approximation ratio $(k+1)^{1/p}$ for $1\le p\le 2$ and $(k+1)^{1-1/p}$ for $p\ge 2$.
229	Asymmetric Valleys: Beyond Sharp and Flat Local Minima	Haowei He, Gao Huang, Yang Yuan	In this paper, we observe that local minima of modern deep networks are more than being flat or sharp.
230	Positive-Unlabeled Compression on the Cloud	Yixing Xu, Yunhe Wang, Jia Zeng, Kai Han, Chunjing Xu, Dacheng Tao, Chang Xu	In this paper, we present a novel positive-unlabeled (PU) setting for addressing this problem.
231	Direct Estimation of Differential Functional Graphical Models	Boxin Zhao, Y. Samuel Wang, Mladen Kolar	We consider the problem of estimating the difference between two functional undirected graphical models with shared structures.
232	On the Calibration of Multiclass Classification with Rejection	Chenri Ni, Nontawat Charoenphakdee, Junya Honda, Masashi Sugiyama	We propose rejection criteria for more general losses for this approach and guarantee calibration to the Bayes-optimal solution.
233	Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller	Pratyusha Sharma, Deepak Pathak, Abhinav Gupta	Our central insight is to enforce this structure explicitly during learning by decoupling what to achieve (intended task) from how to perform it (controller).
234	Stagewise Training Accelerates Convergence of Testing Error Over SGD	Zhuoning Yuan, Yan Yan, Rong Jin, Tianbao Yang	This paper provides some theoretical evidence for explaining this faster convergence.
235	Learning Robust Options by Conditional Value at Risk Optimization	Takuya Hiraoka, Takahisa Imagawa, Tatsuya Mori, Takashi Onishi, Yoshimasa Tsuruoka	In this paper, we propose a conditional value at risk (CVaR)-based method to learn options that work well in both the average and worst cases.
236	Non-asymptotic Analysis of Stochastic Methods for Non-Smooth Non-Convex Regularized Problems	Yi Xu, Rong Jin, Tianbao Yang	Our contributions are two-fold: (i) we show that they enjoy the same complexities as their counterparts for solving convex regularized non-convex problems in terms of finding an approximate stationary point; (ii) we develop more practical variants using dynamic mini-batch size instead of a fixed mini-batch size without requiring the target accuracy level of solution.
237	On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective	Lili Su, Pengkun Yang	We consider training over-parameterized two-layer neural networks with Rectified Linear Unit (ReLU) using gradient descent (GD) method.
238	Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries	Fuwen Tan, Paola Cascante-Bonilla, Xiaoxiao Guo, Hui Wu, Song Feng, Vicente Ordonez	We propose Drill-down, an effective framework for encoding multiple queries with an efficient compact state representation that significantly extends current methods for single-round image retrieval.
239	Visual Sequence Learning in Hierarchical Prediction Networks and Primate Visual Cortex	Jielin Qiu, Ge Huang, Tai Sing Lee	In this paper we developed a computational hierarchical network model to understand the spatiotemporal sequence learning effects observed in the primate visual cortex.
240	Dual Variational Generation for Low Shot Heterogeneous Face Recognition	Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He	This paper considers HFR as a dual generation problem, and proposes a novel Dual Variational Generation (DVG) framework.
241	Discovering Neural Wirings	Mitchell Wortsman, Ali Farhadi, Mohammad Rastegari	In this work we propose a method for discovering neural wirings.
242	On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems	Baekjin Kim, Ambuj Tewari	We investigate the optimality of perturbation based algorithms in the stochastic and adversarial multi-armed bandit problems.
243	Knowledge Extraction with No Observable Data	Jaemin Yoo, Minyong Cho, Taebum Kim, U Kang	In this work, we propose KegNet (Knowledge Extraction with Generative Networks), a novel approach to extract the knowledge of a trained deep neural network and to generate artificial data points that replace the missing training data in knowledge distillation.
244	PAC-Bayes under potentially heavy tails	Matthew Holland	We derive PAC-Bayesian learning guarantees for heavy-tailed losses, and obtain a novel optimal Gibbs posterior which enjoys finite-sample excess risk bounds at logarithmic confidence.
245	One-Shot Object Detection with Co-Attention and Co-Excitation	Ting-I Hsieh, Yi-Chen Lo, Hwann-Tzong Chen, Tyng-Luh Liu	This paper aims to tackle the challenging problem of one-shot object detection.
246	Quaternion Knowledge Graph Embeddings	Shuai Zhang, Yi Tay, Lina Yao, Qi Liu	In this work, we move beyond the traditional complex-valued representations, introducing more expressive hypercomplex representations to model entities and relations for knowledge graph embeddings.
247	Glyce: Glyph-vectors for Chinese Character Representations	Yuxian Meng, Wei Wu, Fei Wang, Xiaoya Li, Ping Nie, Fan Yin, Muyu Li, Qinghong Han, Xiaofei Sun, Jiwei Li	In this paper, we address this gap by presenting Glyce, the glyph-vectors for Chinese character representations.
248	Turbo Autoencoder: Deep learning based channel codes for point-to-point communication channels	Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, Pramod Viswanath	In this work, we make significant progress on this problem by designing a fully end-to-end jointly trained neural encoder and decoder, namely, Turbo Autoencoder (TurboAE), with the following contributions: (a) under moderate block lengths, TurboAE approaches state-of-the-art performance under canonical channels; (b) moreover, TurboAE outperforms the state-of-the-art codes under non-canonical settings in terms of reliability.
249	Heterogeneous Graph Learning for Visual Commonsense Reasoning	Weijiang Yu, Jingwen Zhou, Weihao Yu, Xiaodan Liang, Nong Xiao	In this paper, we propose a new Heterogeneous Graph Learning (HGL) framework for seamlessly integrating the intra-graph and inter-graph reasoning in order to bridge the vision and language domain.
250	Probabilistic Watershed: Sampling all spanning forests for seeded segmentation and semi-supervised learning	Enrique Fita Sanmartin, Sebastian Damrich, Fred A. Hamprecht	We propose instead to consider all possible spanning forests and calculate, for every node, the probability of sampling a forest connecting a certain seed with that node.
251	Classification-by-Components: Probabilistic Modeling of Reasoning over a Set of Components	Sascha Saralajew, Lars Holdijk, Maike Rees, Ebubekir Asan, Thomas Villmann	In this work, a network architecture, denoted as Classification-By-Components network (CBC), is proposed.
252	Identifying Causal Effects via Context-specific Independence Relations	Santtu Tikka, Antti Hyttinen, Juha Karvanen	Motivated by this, we design a calculus and an automated search procedure for identifying causal effects in the presence of CSIs.
253	Bridging Machine Learning and Logical Reasoning by Abductive Learning	Wang-Zhou Dai, Qiuling Xu, Yang Yu, Zhi-Hua Zhou	In this paper, we present the abductive learning targeted at unifying the two AI paradigms in a mutually beneficial way, where the machine learning model learns to perceive primitive logic facts from data, while logical reasoning can exploit symbolic domain knowledge and correct the wrongly perceived facts for improving the machine learning models.
254	Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function	Zihan Zhang, Xiangyang Ji	We present an algorithm based on the Optimism in the Face of Uncertainty (OFU) principle which is able to learn Reinforcement Learning (RL) modeled by Markov decision process (MDP) with finite state-action space efficiently.
255	On the Global Convergence of (Fast) Incremental Expectation Maximization Methods	Belhal Karimi, Hoi-To Wai, Eric Moulines, Marc Lavielle	In this paper, we analyze incremental and stochastic version of the EM algorithm as well as the variance reduced-version of [Chen et al., 2018] in a common unifying framework.
256	A Linearly Convergent Proximal Gradient Algorithm for Decentralized Optimization	Sulaiman Alghunaim, Kun Yuan, Ali H. Sayed	This work studies decentralized composite optimization problems with non-smooth regularization terms.
257	Regularizing Trajectory Optimization with Denoising Autoencoders	Rinu Boney, Norman Di Palo, Mathias Berglund, Alexander Ilin, Juho Kannala, Antti Rasmus, Harri Valpola	We propose to regularize trajectory optimization by means of a denoising autoencoder that is trained on the same trajectories as the model of the environment.
258	Learning Hierarchical Priors in VAEs	Alexej Klushyn, Nutan Chen, Richard Kurle, Botond Cseke, Patrick van der Smagt	We introduce a graph-based interpolation method, which shows that the topology of the learned latent representation corresponds to the topology of the data manifold—and present several examples, where desired properties of latent representation such as smoothness and simple explanatory factors are learned by the prior.
259	Epsilon-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits	Sivan Sabato	We provide an algorithm for this setting, that with a high probability returns an epsilon-best arm, while incurring a cost that depends only linearly on the total expected reward of all arms, and does not depend at all on the number of arms.
260	Safe Exploration for Interactive Machine Learning	Matteo Turchetta, Felix Berkenkamp, Andreas Krause	In this paper, we introduce a novel framework that renders any existing unsafe IML algorithm safe.
261	Addressing Failure Detection by Learning Model Confidence	Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, Patrick Pérez	In this paper, we propose a new target criterion for model confidence, corresponding to the True Class Probability (TCP).
262	Combinatorial Bayesian Optimization using the Graph Cartesian Product	Changyong Oh, Jakub Tomczak, Efstratios Gavves, Max Welling	We introduce COMBO, a new Gaussian Process (GP) BO.
263	Fooling Neural Network Interpretations via Adversarial Model Manipulation	Juyeon Heo, Sunghwan Joo, Taesup Moon	We propose two types of fooling, Passive and Active, and demonstrate such foolings generalize well to the entire validation set as well as transfer to other interpretation methods.
264	On Lazy Training in Differentiable Programming	Lénaïc Chizat, Edouard Oyallon, Francis Bach	In this work, we show that this “lazy training” phenomenon is not specific to over-parameterized neural networks, and is due to a choice of scaling, often implicit, that makes the model behave as its linearization around the initialization, thus yielding a model equivalent to learning with positive-definite kernels.
265	Quality Aware Generative Adversarial Networks	Kancharla Parimala, Sumohana Channappayya	In this work, we show how a distance metric that is a variant of the Structural SIMilarity (SSIM) index (a popular full-reference image quality assessment algorithm), and a novel quality aware discriminator gradient penalty function that is inspired by the Natural Image Quality Evaluator (NIQE, a popular no-reference image quality assessment algorithm) can each be used as excellent regularizers for GAN objective functions.
266	Copula-like Variational Inference	Marcel Hirt, Petros Dellaportas, Alain Durmus	This paper considers a new family of variational distributions motivated by Sklar’s theorem.
267	Implicit Regularization for Optimal Sparse Recovery	Tomas Vaskevicius, Varun Kanade, Patrick Rebeschini	We investigate implicit regularization schemes for gradient descent methods applied to unpenalized least squares regression to solve the problem of reconstructing a sparse signal from an underdetermined system of linear measurements under the restricted isometry assumption.
268	Locally Private Gaussian Estimation	Matthew Joseph, Janardhan Kulkarni, Jieming Mao, Steven Z. Wu	We study a basic private estimation problem: each of $n$ users draws a single i.i.d. sample from an unknown Gaussian distribution $N(\mu,\sigma^2)$, and the goal is to estimate $\mu$ while guaranteeing local differential privacy for each user.
269	Multi-mapping Image-to-Image Translation via Learning Disentanglement	Xiaoming Yu, Yuanqi Chen, Shan Liu, Thomas Li, Ge Li	To address this issue, we propose a novel unified model, which bridges these two objectives.
270	Spatially Aggregated Gaussian Processes with Multivariate Areal Outputs	Yusuke Tanaka, Toshiyuki Tanaka, Tomoharu Iwata, Takeshi Kurashima, Maya Okawa, Yasunori Akagi, Hiroyuki Toda	We propose a probabilistic model for inferring the multivariate function from multiple areal data sets with various granularities.
271	Fast Structured Decoding for Sequence Models	Zhiqing Sun, Zhuohan Li, Haoqing Wang, Di He, Zi Lin, Zhihong Deng	Specifically, we design an efficient approximation for Conditional Random Fields (CRF) for non-autoregressive sequence models, and further propose a dynamic transition technique to model positional contexts in the CRF.
272	Learning Temporal Pose Estimation from Sparsely-Labeled Videos	Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani	To reduce the need for dense annotations, we propose a PoseWarper network that leverages training videos with sparse annotations (every k frames) to learn to perform dense temporal pose propagation and estimation.
273	Putting An End to End-to-End: Gradient-Isolated Learning of Representations	Sindy Löwe, Peter O’Connor, Bastiaan Veeling	We propose a novel deep learning method for local self-supervised representation learning that does not require labels nor end-to-end backpropagation but exploits the natural order in data instead.
274	Scalable Gromov-Wasserstein Learning for Graph Partitioning and Matching	Hongteng Xu, Dixin Luo, Lawrence Carin	We propose a scalable Gromov-Wasserstein learning (S-GWL) method and establish a novel and theoretically-supported paradigm for large-scale graph analysis.
275	Meta-Reinforced Synthetic Data for One-Shot Fine-Grained Visual Recognition	Satoshi Tsutsui, Yanwei Fu, David Crandall	To this end, this paper proposes a meta-learning framework to reinforce the generated images by original images so that these images can facilitate one-shot learning.
276	Real-Time Reinforcement Learning	Simon Ramstedt, Chris Pal	In this paper, we introduce a new framework, in which states and actions evolve simultaneously and show how it is related to the classical MDP formulation.
277	Robust Multi-agent Counterfactual Prediction	Alexander Peysakhovich, Christian Kroer, Adam Lerer	We propose a method for analyzing the sensitivity of counterfactual conclusions to violations of these assumptions, which we call robust multi-agent counterfactual prediction (RMAC).
278	Approximate Inference Turns Deep Networks into Gaussian Processes	Mohammad Emtiyaz E. Khan, Alexander Immer, Ehsan Abedi, Maciej Korzepa	In this paper, we show that certain Gaussian posterior approximations for Bayesian DNNs are equivalent to GP posteriors.
279	Deep Signature Transforms	Patrick Kidger, Patric Bonnier, Imanol Perez Arribas, Cristopher Salvi, Terry Lyons	We propose a novel approach which combines the advantages of the signature transform with modern deep learning frameworks.
280	Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits	Yogev Bar-On, Yishay Mansour	We present algorithms both for the case that the communication graph is known to all the agents, and for the case that the graph is unknown.
281	Convergent Policy Optimization for Safe Reinforcement Learning	Ming Yu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang	We study the safe reinforcement learning problem with nonlinear function approximation, where policy optimization is formulated as a constrained optimization problem with both the objective and the constraint being nonconvex functions.
282	Augmented Neural ODEs	Emilien Dupont, Arnaud Doucet, Yee Whye Teh	To address these limitations, we introduce Augmented Neural ODEs which, in addition to being more expressive models, are empirically more stable, generalize better and have a lower computational cost than Neural ODEs.
283	Thompson Sampling for Multinomial Logit Contextual Bandits	Min-hwan Oh, Garud Iyengar	The distinguishing feature in this work is that this feedback has a multinomial logistic distribution.
284	Backpropagation-Friendly Eigendecomposition	Wei Wang, Zheng Dang, Yinlin Hu, Pascal Fua, Mathieu Salzmann	In this paper, we introduce a numerically stable and differentiable approach to leveraging eigenvectors in deep networks.
285	FastSpeech: Fast, Robust and Controllable Text to Speech	Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu	In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS.
286	Ultrametric Fitting by Gradient Descent	Giovanni Chierchia, Benjamin Perret	We aim to overcome this limitation by presenting a general optimization framework for ultrametric fitting.
287	Distinguishing Distributions When Samples Are Strategically Transformed	Hanrui Zhang, Yu Cheng, Vincent Conitzer	In this paper, we give necessary and sufficient conditions for when the principal can distinguish between agents of “good” and “bad” types, when the type affects the distribution of samples that the agent has access to.
288	Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks	Gauthier Gidel, Francis Bach, Simon Lacoste-Julien	Using a time rescaling, we show that, with a vanishing initialization and a small enough step size, this dynamics sequentially learns the solutions of a reduced-rank regression with a gradually increasing rank.
289	Deep Set Prediction Networks	Yan Zhang, Jonathon Hare, Adam Prugel-Bennett	We propose a general model for predicting sets that properly respects the structure of sets and avoids this problem.
290	DppNet: Approximating Determinantal Point Processes with Deep Networks	Zelda E. Mariet, Yaniv Ovadia, Jasper Snoek	We approach this problem by introducing DppNets: generative deep models that produce DPP-like samples for arbitrary ground sets.
291	Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control	Sai Qian Zhang, Qi Zhang, Jieyu Lin	In this work, we propose Variance Based Control (VBC), a simple yet efficient technique to improve communication efficiency in MARL.
292	Neural Lyapunov Control	Ya-Chien Chang, Nima Roohi, Sicun Gao	We propose new methods for learning control policies and neural network Lyapunov functions for nonlinear control problems, with provable guarantee of stability.
293	Fully Dynamic Consistent Facility Location	Vincent Cohen-Addad, Niklas Oskar D. Hjuler, Nikos Parotsidis, David Saulpic, Chris Schwiegelshohn	In this paper, we focus on general metric spaces and mainly on the facility location problem.
294	SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems	Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel Bowman	In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard.
295	A Flexible Generative Framework for Graph-based Semi-supervised Learning	Jiaqi Ma, Weijing Tang, Ji Zhu, Qiaozhu Mei	In this work, we propose a flexible generative framework for graph-based semi-supervised learning, which approaches the joint distribution of the node features, labels, and the graph structure.
296	Inherent Weight Normalization in Stochastic Neural Networks	Georgios Detorakis, Sourav Dutta, Abhishek Khanna, Matthew Jerry, Suman Datta, Emre Neftci	Here, we further demonstrate that always-on multiplicative stochasticity combined with simple threshold neurons provide a sufficient substrate for deep learning machines.
297	Optimal Decision Tree with Noisy Outcomes	Su Jia, Viswanath Nagarajan, Fatemeh Navidi, R Ravi	We design new approximation algorithms for both the non-adaptive setting, where the test sequence must be fixed a-priori, and the adaptive setting where the test sequence depends on the outcomes of prior tests.
298	Meta-Curvature	Eunbyung Park, Junier B. Oliva	We propose meta-curvature (MC), a framework to learn curvature information for better generalization and fast model adaptation.
299	Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning	Nathan Kallus, Masatoshi Uehara	We propose new estimators for OPE based on empirical likelihood that are always more efficient than IS, SNIS, and DR and satisfy the same stability and boundedness properties as SNIS.
300	KerGM: Kernelized Graph Matching	Zhen Zhang, Yijian Xiang, Lingfei Wu, Bing Xue, Arye Nehorai	In our paper, we provide a unifying view for these two problems by introducing new rules for array operations in Hilbert spaces.
301	Transfusion: Understanding Transfer Learning for Medical Imaging	Maithra Raghu, Chiyuan Zhang, Jon Kleinberg, Samy Bengio	In this paper, we explore properties of transfer learning for medical imaging.
302	Adversarial training for free!	Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, Tom Goldstein	We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters.
303	Communication-Efficient Distributed Learning via Lazily Aggregated Quantized Gradients	Jun Sun, Tianyi Chen, Georgios Giannakis, Zaiyue Yang	The present paper develops a novel aggregated gradient approach for distributed machine learning that adaptively compresses the gradient communication.
304	Implicitly learning to reason in first-order logic	Vaishak Belle, Brendan Juba	In this work, we present a new theoretical approach to robustly learning to reason in first-order logic, and consider universally quantified clauses over a countably infinite domain.
305	Kernel-Based Approaches for Sequence Modeling: Connections to Neural Methods	Kevin Liang, Guoyin Wang, Yitong Li, Ricardo Henao, Lawrence Carin	We investigate time-dependent data analysis from the perspective of recurrent kernel machines, from which models with hidden units and gated memory cells arise naturally.
306	PC-Fairness: A Unified Framework for Measuring Causality-based Fairness	Yongkai Wu, Lu Zhang, Xintao Wu, Hanghang Tong	In this paper, we develop a framework for measuring different causality-based fairness.
307	Arbicon-Net: Arbitrary Continuous Geometric Transformation Networks for Image Registration	Jianchun Chen, Lingjing Wang, Xiang Li, Yi Fang	To address this issue, we present an end-to-end trainable deep neural network, named Arbitrary Continuous Geometric Transformation Networks (Arbicon-Net), to directly predict the dense displacement field for pairwise image alignment.
308	Assessing Disparate Impact of Personalized Interventions: Identifiability and Bounds	Nathan Kallus, Angela Zhou	We prove how we can nonetheless point-identify these quantities under the additional assumption of monotone treatment response, which may be reasonable in many applications.
309	The Fairness of Risk Scores Beyond Classification: Bipartite Ranking and the XAUC Metric	Nathan Kallus, Angela Zhou	To better account for this, in this paper, we investigate the fairness of predictive risk scores from the point of view of a bipartite ranking task, where one seeks to rank positive examples higher than negative ones.
310	HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models	Sharon Zhou, Mitchell Gordon, Ranjay Krishna, Austin Narcomey, Li Fei-Fei, Michael Bernstein	Our work establishes a gold standard human benchmark for generative realism.
311	First order expansion of convex regularized estimators	Pierre Bellec, Arun Kuchibhotla	We consider first order expansions of convex penalized estimators in high-dimensional regression problems with random designs.
312	Capacity Bounded Differential Privacy	Kamalika Chaudhuri, Jacob Imola, Ashwin Machanavajjhala	In this work, we present a novel relaxation of differential privacy, capacity bounded differential privacy, where the adversary that distinguishes output distributions is assumed to be capacity-bounded — i.e. bounded not in computational power, but in terms of the function class from which their attack algorithm is drawn.
313	Universal Boosting Variational Inference	Trevor Campbell, Xinglong Li	We thus develop universal boosting variational inference (UBVI), a BVI scheme that exploits the simple geometry of probability densities under the Hellinger metric to prevent the degeneracy of other gradient-based BVI methods, avoid difficult joint optimizations of both component and weight, and simplify fully-corrective weight optimizations.
314	SGD on Neural Networks Learns Functions of Increasing Complexity	Dimitris Kalimeris, Gal Kaplun, Preetum Nakkiran, Benjamin Edelman, Tristan Yang, Boaz Barak, Haofeng Zhang	More generally, we give evidence for the hypothesis that, as iterations progress, SGD learns functions of increasing complexity.
315	The Landscape of Non-convex Empirical Risk with Degenerate Population Risk	Shuang Li, Gongguo Tang, Michael B. Wakin	In this work, we focus on the situation where the corresponding population risk is a degenerate non-convex loss function, namely, the Hessian of the population risk can have zero eigenvalues.
316	Making AI Forget You: Data Deletion in Machine Learning	Antonio Ginart, Melody Guan, Gregory Valiant, James Y. Zou	In this paper we initiate a framework studying what to do when it is no longer permissible to deploy models derivative from specific user data.
317	Practical Differentially Private Top-k Selection with Pay-what-you-get Composition	David Durfee, Ryan M. Rogers	We study the problem of top-k selection over a large domain universe subject to user-level differential privacy.
318	Conformalized Quantile Regression	Yaniv Romano, Evan Patterson, Emmanuel Candes	In this paper we propose a new method that is fully adaptive to heteroscedasticity.
319	Thompson Sampling with Information Relaxation Penalties	Seungki Min, Costis Maglaras, Ciamac C. Moallemi	We consider a finite-horizon multi-armed bandit (MAB) problem in a Bayesian setting, for which we propose an information relaxation sampling framework.
320	Deep Generalized Method of Moments for Instrumental Variable Analysis	Andrew Bennett, Nathan Kallus, Tobias Schnabel	In this paper, we propose the DeepGMM algorithm to overcome this.
321	Learning Sample-Specific Models with Low-Rank Personalized Regression	Ben Lengerich, Bryon Aragam, Eric P. Xing	To address this challenge, we propose to estimate sample-specific models that tailor inference and prediction at the individual level.
322	Dancing to Music	Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz	In this paper, we propose a synthesis-by-analysis learning framework to generate dance from music.
323	Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask	Hattie Zhou, Janice Lan, Rosanne Liu, Jason Yosinski	In this paper we study the three critical components of the Lottery Ticket (LT) algorithm, showing that each may be varied significantly without impacting the overall results.
324	Implicit Generation and Modeling with Energy Based Models	Yilun Du, Igor Mordatch	We present techniques to scale MCMC based EBM training on continuous neural networks, and we show its success on the high-dimensional data domains of ImageNet32x32, ImageNet128x128, CIFAR-10, and robotic hand trajectories, achieving better samples than other likelihood models and nearing the performance of contemporary GAN approaches, while covering all modes of the data.
325	LCA: Loss Change Allocation for Neural Network Training	Janice Lan, Rosanne Liu, Hattie Zhou, Jason Yosinski	We propose a new window into training called Loss Change Allocation (LCA), in which credit for changes to the network loss is conservatively partitioned to the parameters.
326	Predicting the Politics of an Image Using Webly Supervised Data	Christopher Thomas, Adriana Kovashka	In this paper, we model visual political bias in contemporary media sources at scale, using webly supervised data. We collect a dataset of over one million unique images and associated news articles from left- and right-leaning news sources, and develop a method to predict the image’s political leaning.
327	Adaptive GNN for Image Analysis and Editing	Lingyu Liang, LianWen Jin, Yong Xu	In mathematical analysis, we propose an adaptive GNN model by recursive definition, and derive its relation with two basic operations in CV: filtering and propagation operations.
328	Ultra Fast Medoid Identification via Correlated Sequential Halving	Tavor Baharav, David Tse	In this work, we show that we can better exploit the structure of the underlying computation problem by modifying the traditional bandit sampling strategy and using it in conjunction with a suitably chosen multi-armed bandit algorithm.
329	Tight Dimension Independent Lower Bound on the Expected Convergence Rate for Diminishing Step Sizes in SGD	Phuong Ha Nguyen, Lam Nguyen, Marten van Dijk	We study the convergence of Stochastic Gradient Descent (SGD) for strongly convex objective functions.
330	Asymptotics for Sketching in Least Squares Regression	Edgar Dobriban, Sifan Liu	In this paper, we make progress on this problem, working in an asymptotic framework where the number of datapoints and dimension of features goes to infinity.
331	MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies	Xue Bin Peng, Michael Chang, Grace Zhang, Pieter Abbeel, Sergey Levine	In this work, we propose multiplicative compositional policies (MCP), a method for learning reusable motor skills that can be composed to produce a range of complex behaviors.
332	Exact inference in structured prediction	Kevin Bello, Jean Honorio	We consider the generative process proposed by Globerson et al. (2015) and apply it to general connected graphs.
333	Coda: An End-to-End Neural Program Decompiler	Cheng Fu, Huili Chen, Haolan Liu, Xinyun Chen, Yuandong Tian, Farinaz Koushanfar, Jishen Zhao	To address the above problems, we propose Coda, the first end-to-end neural-based framework for code decompilation.
334	Bat-G net: Bat-inspired High-Resolution 3D Image Reconstruction using Ultrasonic Echoes	Gunpil Hwang, Seohyeon Kim, Hyeon-Min Bae	In this paper, a bat-inspired high-resolution ultrasound 3D imaging system is presented.
335	Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates	Sharan Vaswani, Aaron Mishkin, Issam Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien	We propose to use line-search techniques to automatically set the step-size when training models that can interpolate the data.
336	Scalable Structure Learning of Continuous-Time Bayesian Networks from Incomplete Data	Dominik Linzner, Michael Schmidt, Heinz Koeppl	Instead of sampling and scoring all possible structures individually, we assume the generator of the CTBN to be composed as a mixture of generators stemming from different structures. In this framework, structure learning can be performed via a gradient-based optimization of mixture weights.
337	Privacy-Preserving Classification of Personal Text Messages with Secure Multi-Party Computation	Devin Reich, Ariel Todoki, Rafael Dowsley, Martine De Cock, Anderson Nascimento	We propose the first privacy-preserving solution for text classification that is provably secure.
338	Efficiently Estimating Erdos-Renyi Graphs with Node Differential Privacy	Jonathan Ullman, Adam Sealfon	We give a simple, computationally efficient, and node-differentially-private algorithm for estimating the parameter of an Erdos-Renyi graph—that is, estimating p in a G(n,p)—with near-optimal accuracy.
339	Learning Representations for Time Series Clustering	Qianli Ma, Jiawei Zheng, Sen Li, Gary W. Cottrell	Here we propose a novel unsupervised temporal representation learning model, named Deep Temporal Clustering Representation (DTCR), which integrates the temporal reconstruction and K-means objective into the seq2seq model.
340	Verified Uncertainty Calibration	Ananya Kumar, Percy S. Liang, Tengyu Ma	To get the best of both worlds, we introduce the scaling-binning calibrator, which first fits a parametric function that acts like a baseline for variance reduction and then bins the function values to actually ensure calibration.
341	A Normative Theory for Causal Inference and Bayes Factor Computation in Neural Circuits	Wenhao Zhang, Si Wu, Brent Doiron, Tai Sing Lee	In this paper, we consider the causal inference in multisensory processing and propose a novel generative model based on neural population code that takes into account both stimulus feature and stimulus reliability in the inference.
342	Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction	Yunji Kim, Seonghyeon Nam, In Cho, Seon Joo Kim	We propose a deep video prediction model conditioned on a single image and an action class.
343	Subspace Attack: Exploiting Promising Subspaces for Query-Efficient Black-box Attacks	Yiwen Guo, Ziang Yan, Changshui Zhang	In this paper, we aim at reducing the query complexity of black-box attacks in this category.
344	Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction	Difan Zou, Pan Xu, Quanquan Gu	In this paper, we propose a Stochastic Recursive Variance-Reduced gradient HMC (SRVR-HMC) algorithm.
345	Learning Latent Process from High-Dimensional Event Sequences via Efficient Sampling	Qitian Wu, Zixuan Zhang, Xiaofeng Gao, Junchi Yan, Guihai Chen	To these ends, in this paper, we propose a seminal adversarial imitation learning framework for high-dimension event sequence generation which could be decomposed into: 1) a latent structural intensity model that estimates the adjacent nodes without explicit networks and learns to capture the temporal dynamics in the latent space of markers over observed sequence; 2) an efficient random walk based generation model that aims at imitating the generation process of high-dimension event sequences from a bottom-up view; 3) a discriminator specified as a seq2seq network optimizing the rewards to help the generator output event sequences as real as possible.
346	Cross-sectional Learning of Extremal Dependence among Financial Assets	Xing Yan, Qi Wu, Wen Zhang	We propose a novel probabilistic model to facilitate the learning of multivariate tail dependence of multiple financial assets.
347	Principal Component Projection and Regression in Nearly Linear Time through Asymmetric SVRG	Yujia Jin, Aaron Sidford	In this paper we provide the first algorithms that solve these problems in nearly linear time for fixed eigenvalue distribution and large n.
348	Compression with Flows via Local Bits-Back Coding	Jonathan Ho, Evan Lohn, Pieter Abbeel	To fill in this gap, we introduce local bits-back coding, a new compression technique for flow models.
349	Exact Rate-Distortion in Autoencoders via Echo Noise	Rob Brekelmans, Daniel Moyer, Aram Galstyan, Greg Ver Steeg	We introduce a new noise channel, Echo noise, that admits a simple, exact expression for mutual information for arbitrary input distributions.
350	iSplit LBI: Individualized Partial Ranking with Ties via Split LBI	Qianqian Xu, Xinwei Sun, Zhiyong Yang, Xiaochun Cao, Qingming Huang, Yuan Yao	In this paper, instead of learning a global ranking that agrees with the consensus, we pursue the tie-aware partial ranking from an individualized perspective.
351	Domes to Drones: Self-Supervised Active Triangulation for 3D Human Pose Reconstruction	Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu	In order to address the view selection problem in a principled way, we here introduce ACTOR, an active triangulation agent for 3D human pose reconstruction.
352	MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization	Shangyu Chen, Wenya Wang, Sinno Jialin Pan	In this paper, we propose to learn $g_r$ by a neural network.
353	Improved Precision and Recall Metric for Assessing Generative Models	Tuomas Kynkäänniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen, Timo Aila	We present an evaluation metric that can separately and reliably measure both of these aspects in image generation tasks by forming explicit, non-parametric representations of the manifolds of real and generated data.
354	A First-Order Algorithmic Framework for Distributionally Robust Logistic Regression	Jiajin Li, Sen Huang, Anthony Man-Cho So	In this paper, we take a first step towards resolving the above difficulty by developing a first-order algorithmic framework for tackling a class of Wasserstein distance-based distributionally robust logistic regression (DRLR) problems.
355	PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph	Yikang Li, Tao Ma, Yeqi Bai, Nan Duan, Sining Wei, Xiaogang Wang	Therefore, to generate the images with preferred objects and rich interactions, we propose a semi-parametric method, PasteGAN, for generating the image from the scene graph and the image crops, where spatial arrangements of the objects and their pair-wise relationships are defined by the scene graph and the object appearances are determined by the given object crops.
356	Handling correlated and repeated measurements with the smoothed multivariate square-root Lasso	Quentin Bertrand, Mathurin Massias, Alexandre Gramfort, Joseph Salmon	In this work, we propose a concomitant estimator that can cope with complex noise structure by using non-averaged measurements, its data-fitting term arising as a smoothing of the nuclear norm.
357	Joint Optimization of Tree-based Index and Deep Model for Recommender Systems	Han Zhu, Daqing Chang, Ziru Xu, Pengye Zhang, Xiang Li, Jie He, Han Li, Jian Xu, Kun Gai	Our purpose, in this paper, is to develop a method to jointly learn the index structure and user preference prediction model.
358	Learning Generalizable Device Placement Algorithms for Distributed Machine Learning	Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh	We present Placeto, a reinforcement learning (RL) approach to efficiently find device placements for distributed neural network training.
359	Uncoupled Regression from Pairwise Comparison Data	Liyuan Xu, Junya Honda, Gang Niu, Masashi Sugiyama	We propose two practical methods for uncoupled regression from pairwise comparison data and show that the learned regression model converges to the optimal model with the optimal parametric convergence rate when the target variable distributes uniformly.
360	Cross Attention Network for Few-shot Classification	Ruibing Hou, Hong Chang, Bingpeng Ma, Shiguang Shan, Xilin Chen	In this work, we propose a novel Cross Attention Network to address the challenging problems in few-shot classification.
361	A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution	Qing Qu, Xiao Li, Zhihui Zhu	We study the multi-channel sparse blind deconvolution (MCS-BD) problem, whose task is to simultaneously recover a kernel $\mathbf a$ and multiple sparse inputs $\{\mathbf x_i\}_{i=1}^p$ from their circulant convolution $\mathbf y_i = \mathbf a \circledast \mathbf x_i$ ($i=1,\cdots,p$).
362	SCAN: A Scalable Neural Networks Framework Towards Compact and Efficient Models	Linfeng Zhang, Zhanhong Tan, Jiebo Song, Jingwei Chen, Chenglong Bao, Kaisheng Ma	To address this problem, we propose the so-called SCAN framework for networks training and inference, which is orthogonal and complementary to existing acceleration and compression methods.
363	Revisiting the Bethe-Hessian: Improved Community Detection in Sparse Heterogeneous Graphs	Lorenzo Dall’Amico, Romain Couillet, Nicolas Tremblay	This article studies spectral clustering based on the Bethe-Hessian matrix $H_r = (r^2-1)I_n + D - rA$ for sparse heterogeneous graphs (following the degree-corrected stochastic block model) in a two-class setting.
364	Teaching Multiple Concepts to a Forgetful Learner	Anette Hunziker, Yuxin Chen, Oisin Mac Aodha, Manuel Gomez Rodriguez, Andreas Krause, Pietro Perona, Yisong Yue, Adish Singla	In this paper, we look at the problem from the perspective of discrete optimization and introduce a novel algorithmic framework for teaching multiple concepts with strong performance guarantees.
365	Regularized Weighted Low Rank Approximation	Frank Ban, David Woodruff, Richard Zhang	We derive provably sharper guarantees for the regularized version by obtaining parameterized complexity bounds in terms of the statistical dimension rather than the rank, allowing for a rank-independent runtime that can be significantly faster.
366	Practical and Consistent Estimation of f-Divergences	Paul Rubenstein, Olivier Bousquet, Josip Djolonga, Carlos Riquelme, Ilya O. Tolstikhin	Under these assumptions we propose and study an estimator that can be easily implemented, works well in high dimensions, and enjoys faster rates of convergence.
367	Approximation Ratios of Graph Neural Networks for Combinatorial Problems	Ryoma Sato, Makoto Yamada, Hisashi Kashima	In this paper, from a theoretical perspective, we study how powerful graph neural networks (GNNs) can be for learning approximation algorithms for combinatorial problems.
368	Thinning for Accelerating the Learning of Point Processes	Tianbo Li, Yiping Ke	We propose thinning as a downsampling method for accelerating the learning of point processes.
369	A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models	Maxim Kuznetsov, Daniil Polykovskiy, Dmitry P. Vetrov, Alex Zhebrak	Altogether, we propose a novel plug-and-play framework for generative models that can be utilized in any GAN and VAE-like architectures.
370	Differentially Private Markov Chain Monte Carlo	Mikko Heikkilä, Joonas Jälkö, Onur Dikmen, Antti Honkela	In this paper, we further extend the applicability of DP Bayesian learning by presenting the first general DP Markov chain Monte Carlo (MCMC) algorithm whose privacy-guarantees are not subject to unrealistic assumptions on Markov chain convergence and that is applicable to posterior inference in arbitrary models.
371	Full-Gradient Representation for Neural Network Visualization	Suraj Srinivas, François Fleuret	We introduce a new tool for interpreting neural nets, namely full-gradients, which decomposes the neural net response into input sensitivity and per-neuron sensitivity components.
372	q-means: A quantum algorithm for unsupervised machine learning	Iordanis Kerenidis, Jonas Landman, Alessandro Luongo, Anupam Prakash	In this paper, we introduce q-means, a new quantum algorithm for clustering.
373	Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints	Sebastian Tschiatschek, Ahana Ghosh, Luis Haug, Rati Devidze, Adish Singla	In this paper, we consider the setting where the learner has its own preferences that it additionally takes into consideration.
374	Limitations of the empirical Fisher approximation for natural gradient descent	Frederik Kunstner, Philipp Hennig, Lukas Balles	We dispute this argument by showing that the empirical Fisher—unlike the Fisher—does not generally capture second-order information.
375	Flow-based Image-to-Image Translation with Feature Disentanglement	Ruho Kondo, Keisuke Kawano, Satoshi Koide, Takuro Kutsuna	To this end we propose a flow-based image-to-image model, called Flow U-Net with Squeeze modules (FUNS), that allows us to disentangle the features while retaining the ability to generate high-quality diverse images from condition images.
376	Learning dynamic polynomial proofs	Alhussein Fawzi, Mateusz Malinowski, Hamza Fawzi, Omar Fawzi	In this paper, we consider the fundamental computational task of automatically searching for proofs of polynomial inequalities.
377	Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models	Vincent Le Guen, Nicolas Thome	To handle this challenging task, we introduce DILATE (DIstortion Loss including shApe and TimE), a new objective function for training deep neural networks.
378	Understanding Attention and Generalization in Graph Neural Networks	Boris Knyazev, Graham W. Taylor, Mohamed Amer	Motivated by insights from the work on Graph Isomorphism Networks, we design simple graph reasoning tasks that allow us to study attention in a controlled environment.
379	Data Cleansing for Models Trained with SGD	Satoshi Hara, Atsushi Nitanda, Takanori Maehara	In this paper, we propose an algorithm that can identify influential instances without using any domain knowledge.
380	Curvilinear Distance Metric Learning	Shuo Chen, Lei Luo, Jian Yang, Chen Gong, Jun Li, Heng Huang	After that, by extending such straight lines to general curved forms, we propose a Curvilinear Distance Metric Learning (CDML) method, which adaptively learns the nonlinear geometries of the training data.
381	Embedding Symbolic Knowledge into Deep Networks	Xie Yaqi, Ziwei Xu, Kuldeep S Meel, Mohan Kankanhalli, Harold Soh	In this work, we aim to leverage prior symbolic knowledge to improve the performance of deep models.
382	Modeling Uncertainty by Learning a Hierarchy of Deep Neural Connections	Raanan Yehezkel Rohekar, Yaniv Gurwicz, Shami Nisimov, Gal Novik	We propose an approach for modeling this confounder by sharing neural connectivity patterns between the generative and discriminative networks.
383	Efficient Graph Generation with Graph Recurrent Attention Networks	Renjie Liao, Yujia Li, Yang Song, Shenlong Wang, Will Hamilton, David K. Duvenaud, Raquel Urtasun, Richard Zemel	We propose a new family of efficient and expressive deep generative models of graphs, called Graph Recurrent Attention Networks (GRANs).
384	Beyond Alternating Updates for Matrix Factorization with Inertial Bregman Proximal Gradient Algorithms	Mahesh Chandra Mukkamala, Peter Ochs	We exploit this theory by proposing a novel Bregman distance for matrix factorization problems, which, at the same time, allows for simple/closed form update steps.
385	Learning Deep Bilinear Transformation for Fine-grained Image Representation	Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo	In this paper, we propose a deep bilinear transformation (DBT) block, which can be deeply stacked in convolutional neural networks to learn fine-grained image representations.
386	Practical Deep Learning with Bayesian Principles	Kazuki Osawa, Siddharth Swaroop, Mohammad Emtiyaz E. Khan, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota	In this paper, we demonstrate practical training of deep networks with natural-gradient variational inference.
387	Training Language GANs from Scratch	Cyprien de Masson d’Autume, Shakir Mohamed, Mihaela Rosca, Jack Rae	We combine existing techniques such as large batch sizes, dense rewards and discriminator regularization to stabilize and improve language GANs.
388	Pseudo-Extended Markov chain Monte Carlo	Christopher Nemeth, Fredrik Lindsten, Maurizio Filippone, James Hensman	In this paper, we introduce the pseudo-extended MCMC method as a simple approach for improving the mixing of the MCMC sampler for multi-modal posterior distributions.
389	Differentially Private Bagging: Improved utility and cheaper privacy than subsample-and-aggregate	James Jordon, Jinsung Yoon, Mihaela van der Schaar	In this paper, we extend this approach by dividing the data several times (rather than just once) and learning models on each chunk within each division.
390	Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters	Alberto Maria Metelli, Amarildo Likmeta, Marcello Restelli	In this paper, we address this question by proposing a Bayesian framework in which we employ approximate posterior distributions to model the uncertainty of the value function and Wasserstein barycenters to propagate it across state-action pairs.
391	On Adversarial Mixup Resynthesis	Christopher Beckham, Sina Honari, Alex M. Lamb, Vikas Verma, Farnoosh Ghadiri, R Devon Hjelm, Yoshua Bengio, Chris Pal	In this paper, we explore new approaches to combining information encoded within the learned representations of auto-encoders.
392	A Geometric Perspective on Optimal Representations for Reinforcement Learning	Marc Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle	We propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions.
393	Learning New Tricks From Old Dogs: Multi-Source Transfer Learning From Pre-Trained Networks	Joshua Lee, Prasanna Sattigeri, Gregory Wornell	For such scenarios, we consider the multi-source learning problem of training a classifier using an ensemble of pre-trained neural networks for a set of classes that have not been observed by any of the source networks, and for which we have very few training samples.
394	Understanding and Improving Layer Normalization	Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin	In this paper, our main contribution is to take a step further in understanding LayerNorm.
395	Uncertainty-based Continual Learning with Adaptive Regularization	Hongjoon Ahn, Sungmin Cha, Donggyu Lee, Taesup Moon	We introduce a new neural network-based continual learning algorithm, dubbed as Uncertainty-regularized Continual Learning (UCL), which builds on traditional Bayesian online learning framework with variational inference.
396	LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning	Yali Du, Lei Han, Meng Fang, Ji Liu, Tianhong Dai, Dacheng Tao	In this paper, we propose to merge the two directions and learn an intrinsic reward function for each agent, which diversely stimulates the agents at each time step.
397	U-Time: A Fully Convolutional Network for Time Series Segmentation Applied to Sleep Staging	Mathias Perslev, Michael Jensen, Sune Darkner, Poul Jørgen Jennum, Christian Igel	We propose U-Time, a fully feed-forward deep learning approach to physiological time series segmentation developed for the analysis of sleep data.
398	Massively scalable Sinkhorn distances via the Nystrom method	Jason Altschuler, Francis Bach, Alessandro Rudi, Jonathan Niles-Weed	In this work, we show that this challenge is surprisingly easy to circumvent: combining two simple techniques—the Nyström method and Sinkhorn scaling—provably yields an accurate approximation of the Sinkhorn distance with significantly lower time and memory requirements than other approaches.
399	Double Quantization for Communication-Efficient Distributed Optimization	Yue Yu, Jiaxiang Wu, Longbo Huang	In this paper, to reduce the communication complexity, we propose double quantization, a general scheme for quantizing both model parameters and gradients.
400	Globally optimal score-based learning of directed acyclic graphs in high-dimensions	Bryon Aragam, Arash Amini, Qing Zhou	We prove that $\Omega(s\log p)$ samples suffice to learn a sparse Gaussian directed acyclic graph (DAG) from data, where $s$ is the maximum Markov blanket size.
401	Multi-relational Poincare Graph Embeddings	Ivana Balazevic, Carl Allen, Timothy Hospedales	To address this, we propose a model that embeds multi-relational graph data in the Poincaré ball model of hyperbolic space.
402	No-Press Diplomacy: Modeling Multi-Agent Gameplay	Philip Paquette, Yuchen Lu, Seton Steven Bocco, Max Smith, Satya O.-G., Jonathan K. Kummerfeld, Joelle Pineau, Satinder Singh, Aaron C. Courville	In this work, we focus on training an agent that learns to play the No Press version of Diplomacy where there is no dedicated communication channel between players.
403	State Aggregation Learning from Markov Transition Data	Yaqi Duan, Tracy Ke, Mengdi Wang	In this paper, we propose a tractable algorithm that estimates the probabilistic aggregation map from the system’s trajectory.
404	Disentangling Influence: Using disentangled representations to audit model predictions	Charles Marx, Richard Phillips, Sorelle Friedler, Carlos Scheidegger, Suresh Venkatasubramanian	In this paper, we develop disentangled influence audits, a procedure to audit the indirect influence of features.
405	Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning	David Janz, Jiri Hron, Przemyslaw Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek	We use these insights to design Successor Uncertainties (SU), a cheap and easy to implement RVF algorithm that retains key properties of PSRL.
406	Partially Encrypted Deep Learning using Functional Encryption	Théo Ryffel, David Pointcheval, Francis Bach, Edouard Dufour-Sans, Romain Gay	We propose a practical framework to perform partially encrypted and privacy-preserving predictions which combines adversarial training and functional encryption.
407	Decentralized Cooperative Stochastic Bandits	David Martínez-Rubio, Varun Kanade, Patrick Rebeschini	We study a decentralized cooperative stochastic multi-armed bandit problem with K arms on a network of N agents.
408	Statistical bounds for entropic optimal transport: sample complexity and the central limit theorem	Gonzalo Mena, Jonathan Niles-Weed	We prove several fundamental statistical bounds for entropic OT with the squared Euclidean cost between subgaussian probability measures in arbitrary dimension.
409	Efficient Deep Approximation of GMMs	Shirin Jalali, Carl Nuzman, Iraj Saniee	In this work, we extend this idea to a rich class of functions, namely the discriminant functions that arise in optimal Bayesian classification of Gaussian mixture models (GMMs) in $\mathbb{R}^n$.
410	Learning low-dimensional state embeddings and metastable clusters from time series data	Yifan Sun, Yaqi Duan, Hao Gong, Mengdi Wang	In the spirit of diffusion map, we propose an efficient method for learning a low-dimensional state embedding and capturing the process’s dynamics.
411	Exploiting Local and Global Structure for Point Cloud Semantic Segmentation with Contextual Point Representations	Xu Wang, Jingming He, Lin Ma	In this paper, we propose a novel model for point cloud semantic segmentation, which exploits both the local and global structures within the point cloud based on the contextual point representations.
412	Scalable Bayesian dynamic covariance modeling with variational Wishart and inverse Wishart processes	Creighton Heaukulani, Mark van der Wilk	We implement gradient-based variational inference routines for Wishart and inverse Wishart processes, which we apply as Bayesian models for the dynamic, heteroskedastic covariance matrix of a multivariate time series.
413	Kernel Instrumental Variable Regression	Rahul Singh, Maneesh Sahani, Arthur Gretton	We propose kernel instrumental variable regression (KIV), a nonparametric generalization of 2SLS, modeling relations among X, Y, and Z as nonlinear functions in reproducing kernel Hilbert spaces (RKHSs).
414	Symmetry-Based Disentangled Representation Learning requires Interaction with Environments	Hugo Caselles-Dupré, Michael Garcia Ortiz, David Filliat	We build on their work and make observations, theoretical and empirical, that lead us to argue that Symmetry-Based Disentangled Representation Learning cannot only be based on static observations: agents should interact with the environment to discover its symmetries.
415	Fast Efficient Hyperparameter Tuning for Policy Gradient Methods	Supratik Paul, Vitaly Kurin, Shimon Whiteson	In this paper, we propose Hyperparameter Optimisation on the Fly (HOOF), a gradient-free algorithm that requires no more than one training run to automatically adapt the hyperparameters that affect the policy update directly through the gradient.
416	Offline Contextual Bayesian Optimization	Ian Char, Youngseog Chung, Willie Neiswanger, Kirthevasan Kandasamy, Oak Nelson, Mark Boyer, Egemen Kolemen	In this work, we describe a theoretically grounded Bayesian optimization method to tackle this problem.
417	Making the Cut: A Bandit-based Approach to Tiered Interviewing	Candice Schumann, Zhi Lang, Jeffrey Foster, John Dickerson	We present new algorithms in both the probably approximately correct (PAC) and fixed-budget settings that select a near-optimal cohort with provable guarantees.
418	Unsupervised Scalable Representation Learning for Multivariate Time Series	Jean-Yves Franceschi, Aymeric Dieuleveut, Martin Jaggi	In this paper, we tackle this challenge by proposing an unsupervised method to learn universal embeddings of time series.
419	A state-space model for inferring effective connectivity of latent neural dynamics from simultaneous EEG/fMRI	Tao Tu, John Paisley, Stefan Haufe, Paul Sajda	In this study, we develop a linear state-space model to infer the effective connectivity in a distributed brain network based on simultaneously recorded EEG and fMRI data.
420	End to end learning and optimization on graphs	Bryan Wilder, Eric Ewing, Bistra Dilkina, Milind Tambe	Here, we propose an alternative decision-focused learning approach that integrates a differentiable proxy for common graph optimization problems as a layer in learned systems.
421	Game Design for Eliciting Distinguishable Behavior	Fan Yang, Liu Leqi, Yifan Wu, Zachary Lipton, Pradeep K. Ravikumar, Tom M. Mitchell, William W. Cohen	In this paper, we formulate the task of designing behavior diagnostic games that elicit distinguishable behavior as a mutual information maximization problem, which can be solved by optimizing a variational lower bound.
422	When does label smoothing help?	Rafael Müller, Simon Kornblith, Geoffrey E. Hinton	We show that label smoothing encourages the representations of training examples from the same class to group in tight clusters.
423	Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning	Harsh Gupta, R. Srikant, Lei Ying	We present finite-time performance bounds for the case where the learning rate is fixed.
424	Rethinking Deep Neural Network Ownership Verification: Embedding Passports to Defeat Ambiguity Attacks	Lixin Fan, Kam Woh Ng, Chee Seng Chan	As remedies to the above-mentioned loophole, this paper proposes novel passport-based DNN ownership verification schemes which are both robust to network modifications and resilient to ambiguity attacks.
425	Scalable Spike Source Localization in Extracellular Recordings using Amortized Variational Inference	Cole Hurwitz, Kai Xu, Akash Srivastava, Alessio Buccino, Matthias Hennig	In this work, we present a Bayesian modelling approach for localizing the source of individual spikes on high-density, microelectrode arrays.
426	Optimal Sketching for Kronecker Product Regression and Low Rank Approximation	Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David Woodruff	In this work, we provide significantly faster algorithms.
427	Distribution-Independent PAC Learning of Halfspaces with Massart Noise	Ilias Diakonikolas, Themis Gouleakis, Christos Tzamos	We study the problem of distribution-independent PAC learning of halfspaces in the presence of Massart noise.
428	The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies	Ronen Basri, David Jacobs, Yoni Kasten, Shira Kritchman	We study the relationship between the frequency of a function and the speed at which a neural network learns it.
429	Adaptive Auxiliary Task Weighting for Reinforcement Learning	Xingyu Lin, Harjatin Baweja, George Kantor, David Held	In this work, we propose a principled online learning algorithm that dynamically combines different auxiliary tasks to speed up training for reinforcement learning.
430	Blocking Bandits	Soumya Basu, Rajat Sen, Sujay Sanghavi, Sanjay Shakkottai	We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter.
431	Global Convergence of Least Squares EM for Demixing Two Log-Concave Densities	Wei Qian, Yuqian Zhang, Yudong Chen	We demonstrate that Least Squares EM, a variant of the EM algorithm, converges to the true location parameter from a randomly initialized point.
432	Prior-Free Dynamic Auctions with Low Regret Buyers	Yuan Deng, Jon Schneider, Balasubramanian Sivan	In this work, we do away with this assumption and consider the prior-free setting where the buyer’s value each round is chosen adversarially (possibly adaptively).
433	On Single Source Robustness in Deep Fusion Models	Taewan Kim, Joydeep Ghosh	Motivated by this discovery, two possible approaches are proposed to increase robustness: a carefully designed loss with corresponding training algorithms for deep fusion models, and a simple convolutional fusion layer that has a structural advantage in dealing with noise.
434	Policy Evaluation with Latent Confounders via Optimal Balance	Andrew Bennett, Nathan Kallus	Instead, we propose an adversarial objective and weights that minimize it, ensuring sufficient balance in the latent confounders regardless of outcome model.
435	Think Globally, Act Locally: A Deep Neural Network Approach to High-Dimensional Time Series Forecasting	Rajat Sen, Hsiang-Fu Yu, Inderjit S. Dhillon	In this paper, we seek to correct this deficiency and propose DeepGLO, a deep forecasting model which thinks globally and acts locally.
436	Adaptive Cross-Modal Few-shot Learning	Chen Xing, Negar Rostamzadeh, Boris Oreshkin, Pedro O. O. Pinheiro	In this paper, we propose to leverage cross-modal information to enhance metric-based few-shot learning methods.
437	Spectral Modification of Graphs for Improved Spectral Clustering	Ioannis Koutis, Huong Le	In this paper we show that for any graph $G$, there exists a 'spectral maximizer' graph $H$ which is cut-similar to $G$, but has eigenvalues that are near the theoretical limit implied by the cut structure of $G$.
438	Hyperbolic Graph Convolutional Neural Networks	Ines Chami, Zhitao Ying, Christopher Ré, Jure Leskovec	Here we propose Hyperbolic Graph Convolutional Neural Network (HGCN), the first inductive hyperbolic GCN that leverages both the expressiveness of GCNs and hyperbolic geometry to learn inductive node representations for hierarchical and scale-free graphs.
439	Cost Effective Active Search	Shali Jiang, Roman Garnett, Benjamin Moseley	We propose simple and fast approximations for computing its expectation, which plays an essential role in our proposed policy.
440	Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs	Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric	While it has been analyzed in infinite-horizon discounted and finite-horizon problems, we focus on designing and analysing the exploration bonus in the more challenging infinite-horizon undiscounted setting.
441	Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks	Xiao Sun, Jungwook Choi, Chia-Yu Chen, Naigang Wang, Swagath Venkataramani, Vijayalakshmi (Viji) Srinivasan, Xiaodong Cui, Wei Zhang, Kailash Gopalakrishnan	Using theoretical insights, we propose a hybrid FP8 (HFP8) format and DNN end-to-end distributed training procedure.
442	Tight Certificates of Adversarial Robustness for Randomly Smoothed Classifiers	Guang-He Lee, Yang Yuan, Shiyu Chang, Tommi Jaakkola	In particular, we offer adversarial robustness guarantees and associated algorithms for the discrete case where the adversary is $\ell_0$ bounded.
443	Poisson-Minibatching for Gibbs Sampling with Convergence Rate Guarantees	Ruqi Zhang, Christopher M. De Sa	In this paper, we propose a new auxiliary-variable minibatched Gibbs sampling method, Poisson-minibatching Gibbs, which both produces unbiased samples and has a theoretical guarantee on its convergence rate.
444	One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers	Ari Morcos, Haonan Yu, Michela Paganini, Yuandong Tian	Here, we attempt to answer this question by generating winning tickets for one training configuration (optimizer and dataset) and evaluating their performance on another configuration.
445	Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces	Chuan Guo, Ali Mousavi, Xiang Wu, Daniel N. Holtmann-Rice, Satyen Kale, Sashank Reddi, Sanjiv Kumar	In this paper, we demonstrate that theoretically there is no limitation to using low-dimensional embedding-based methods, and provide experimental evidence that overfitting is the root cause of the poor performance of embedding-based methods.
446	Fair Algorithms for Clustering	Suman Bera, Deeparnab Chakrabarty, Nicolas Flores, Maryam Negahbani	We study the problem of finding low-cost fair clusterings in data where each data point may belong to many protected groups.
447	Learning Mean-Field Games	Xin Guo, Anran Hu, Renyuan Xu, Junzi Zhang	This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population.
448	SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers	Igor Fedorov, Ryan P. Adams, Matthew Mattina, Paul Whatmough	This paper challenges the idea that CNNs are not suitable for deployment on MCUs. We demonstrate that it is possible to automatically design CNNs which generalize well, while also being small enough to fit onto memory-limited MCUs.
449	Deep imitation learning for molecular inverse problems	Eric Jonas	We treat this as a problem of graph-structured prediction, where armed with per-vertex information on a subset of the vertices, we infer the edges and edge types.
450	Visual Concept-Metaconcept Learning	Chi Han, Jiayuan Mao, Chuang Gan, Josh Tenenbaum, Jiajun Wu	In this paper, we propose the visual concept-metaconcept learner (VCML) for joint learning of concepts and metaconcepts from images and associated question-answer pairs.
451	Few-shot Video-to-Video Synthesis	Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Bryan Catanzaro, Jan Kautz	To address the limitations, we propose a few-shot vid2vid framework, which learns to synthesize videos of previously unseen subjects or scenes by leveraging few example images of the target at test time.
452	Neural Similarity Learning	Weiyang Liu, Zhen Liu, James M. Rehg, Le Song	By generalizing inner product with a bilinear matrix, we propose the neural similarity which serves as a learnable parametric similarity measure for CNNs.
453	Ordered Memory	Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron C. Courville	In this paper, we propose the Ordered Memory architecture.
454	MixMatch: A Holistic Approach to Semi-Supervised Learning	David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, Colin A. Raffel	In this work, we unify the current dominant approaches for semi-supervised learning to produce a new algorithm, MixMatch, that guesses low-entropy labels for data-augmented unlabeled examples and mixes labeled and unlabeled data using MixUp.
455	Multivariate Triangular Quantile Maps for Novelty Detection	Jingjing Wang, Sun Sun, Yaoliang Yu	In this work, we present a general framework for neural novelty detection that centers around a multivariate extension of the univariate quantile function.
456	Fast Parallel Algorithms for Statistical Subset Selection Problems	Sharon Qian, Yaron Singer	In this paper, we propose a new framework for designing fast parallel algorithms for fundamental statistical subset selection tasks that include feature selection and experimental design.
457	PHYRE: A New Benchmark for Physical Reasoning	Anton Bakhtin, Laurens van der Maaten, Justin Johnson, Laura Gustafson, Ross Girshick	We develop the PHYRE benchmark for physical reasoning that contains a set of simple classical mechanics puzzles in a 2D physical environment.
458	On the number of variables to use in principal component regression	Ji Xu, Daniel J. Hsu	We study least squares linear regression over uncorrelated Gaussian features that are selected in order of decreasing variance.
459	Factor Group-Sparse Regularization for Efficient Low-Rank Matrix Recovery	Jicong Fan, Lijun Ding, Yudong Chen, Madeleine Udell	This paper develops a new class of nonconvex regularizers for low-rank matrix recovery.
460	Mutually Regressive Point Processes	Ifigeneia Apostolopoulou, Scott Linderman, Kyle Miller, Artur Dubrawski	In this paper, we introduce the first general class of Bayesian point process models extended with a nonlinear component that allows both excitatory and inhibitory relationships in continuous time.
461	Data-driven Estimation of Sinusoid Frequencies	Gautier Izacard, Sreyas Mohan, Carlos Fernandez-Granda	In this work, we propose a novel neural-network architecture that produces a significantly more accurate representation, and combine it with an additional neural-network module trained to detect the number of frequencies.
462	E2-Train: Training State-of-the-art CNNs with Over 80% Less Energy	Ziyu Jiang, Yue Wang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang	This paper attempts to explore an orthogonal direction: how to conduct more energy-efficient training of CNNs, so as to enable on-device training?
463	ANODEV2: A Coupled Neural ODE Framework	Tianjun Zhang, Zhewei Yao, Amir Gholami, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney, George Biros	Here, we propose ANODEV2, which extends this approach by introducing a framework that allows ODE-based evolution for both the weights and the activations, in a coupled formulation.
464	Estimating Entropy of Distributions in Constant Space	Jayadev Acharya, Sourbh Bhadane, Piotr Indyk, Ziteng Sun	Our main contribution is an algorithm that requires $O\left(\frac{k \log (1/\varepsilon)^2}{\varepsilon^3}\right)$ samples and a constant $O(1)$ memory words of space and outputs a $\pm\varepsilon$ estimate of $H(p)$.
465	On the Utility of Learning about Humans for Human-AI Coordination	Micah Carroll, Rohin Shah, Mark K. Ho, Tom Griffiths, Sanjit Seshia, Pieter Abbeel, Anca Dragan	To demonstrate this, we introduce a simple environment that requires challenging coordination, based on the popular game Overcooked, and learn a simple model that mimics human play.
466	Efficient Regret Minimization Algorithm for Extensive-Form Correlated Equilibrium	Gabriele Farina, Chun Kai Ling, Fei Fang, Tuomas Sandholm	In this paper, we introduce the first efficient regret minimization algorithm for computing extensive-form correlated equilibria in large two-player general-sum games with no chance moves.
467	Learning in Generalized Linear Contextual Bandits with Stochastic Delays	Zhengyuan Zhou, Renyuan Xu, Jose Blanchet	In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed.
468	Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness	Saeed Mahloujifar, Xiao Zhang, Mohammad Mahmoody, David Evans	This paper presents a method for empirically measuring and bounding the concentration of a concrete dataset which is proven to converge to the actual concentration.
469	Optimistic Regret Minimization for Extensive-Form Games via Dilated Distance-Generating Functions	Gabriele Farina, Christian Kroer, Tuomas Sandholm	We study the performance of optimistic regret-minimization algorithms for both minimizing regret in, and computing Nash equilibria of, zero-sum extensive-form games.
470	Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model	Erik Nijkamp, Mitch Hill, Song-Chun Zhu, Ying Nian Wu	This paper studies a curious phenomenon in learning an energy-based model (EBM) using MCMC.
471	Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting	Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, Xifeng Yan	In this paper, we propose to tackle such forecasting problems with the Transformer.
472	On the Accuracy of Influence Functions for Measuring Group Effects	Pang Wei W. Koh, Kai-Siang Ang, Hubert Teo, Percy S. Liang	In this paper, we find that across many different types of groups and for a range of real-world datasets, the predicted effect (using influence functions) of a group correlates surprisingly well with its actual effect, even if the absolute and relative errors are large.
473	Face Reconstruction from Voice using Generative Adversarial Networks	Yandong Wen, Bhiksha Raj, Rita Singh	In this paper, we address the challenge posed by a subtask of voice profiling – reconstructing someone’s face from their voice.
474	Incremental Few-Shot Learning with Attention Attractor Networks	Mengye Ren, Renjie Liao, Ethan Fetaya, Richard Zemel	To this end, we propose a meta-learning model, the Attention Attractor Network, which regularizes the learning of novel classes.
475	On Testing for Biases in Peer Review	Ivan Stelmakh, Nihar Shah, Aarti Singh	We consider the issue of biases in scholarly research, specifically, in peer review.
476	Learning Disentangled Representation for Robust Person Re-identification	Chanho Eom, Bumsub Ham	To tackle this problem, we propose to disentangle identity-related and -unrelated features from person images.
477	Balancing Efficiency and Fairness in On-Demand Ridesourcing	Nixie S. Lesmana, Xuan Zhang, Xiaohui Bei	In this paper, we focus on both the system efficiency and the fairness among drivers and quantitatively analyze the trade-offs between these two objectives.
478	Latent Ordinary Differential Equations for Irregularly-Sampled Time Series	Yulia Rubanova, Tian Qi Chen, David K. Duvenaud	We generalize RNNs to have continuous-time hidden dynamics defined by ordinary differential equations (ODEs), a model we call ODE-RNNs.
479	Deep RGB-D Canonical Correlation Analysis For Sparse Depth Completion	Yiqi Zhong, Cho-Ying Wu, Suya You, Ulrich Neumann	In this paper, we propose our Correlation For Completion Network (CFCNet), an end-to-end deep learning model that uses the correlation between two data sources to perform sparse depth completion.
480	Input Similarity from the Neural Network Perspective	Guillaume Charpiat, Nicolas Girard, Loris Felardos, Yuliya Tarabalka	Given a trained neural network, we aim at understanding how similar it considers any two samples.
481	Adaptive Sequence Submodularity	Marko Mitrovic, Ehsan Kazemi, Moran Feldman, Andreas Krause, Amin Karbasi	In this paper, we view the problem of adaptive and sequential decision making through the lens of submodularity and propose an adaptive greedy policy with strong theoretical guarantees.
482	Weight Agnostic Neural Networks	Adam Gaier, David Ha	In this work, we question to what extent neural network architectures alone, without learning any weight parameters, can encode solutions for a given task.
483	Learning to Predict Without Looking Ahead: World Models Without Forward Prediction	Daniel Freeman, David Ha, Luke Metz	In this work, we introduce a modification to traditional reinforcement learning which we call observational dropout, whereby we limit the agent's ability to observe the real environment at each timestep.
484	Reducing the variance in online optimization by transporting past gradients	Sébastien Arnold, Pierre-Antoine Manzagol, Reza Babanezhad Harikandeh, Ioannis Mitliagkas, Nicolas Le Roux	We propose to correct this staleness using the idea of implicit gradient transport (IGT) which transforms gradients computed at previous iterates into gradients evaluated at the current iterate without using the Hessian explicitly.
485	Characterizing Bias in Classifiers using Generative Models	Daniel McDuff, Shuang Ma, Yale Song, Ashish Kapoor	We propose a simulation-based approach for interrogating classifiers using generative adversarial models in a systematic manner.
486	Optimal Stochastic and Online Learning with Individual Iterates	Yunwen Lei, Peng Yang, Ke Tang, Ding-Xuan Zhou	In this paper, we propose a theoretically sound strategy to select an individual iterate of the vanilla SCMD, which is able to achieve optimal rates for both convex and strongly convex problems in a non-smooth learning setting.
487	Policy Learning for Fairness in Ranking	Ashudeep Singh, Thorsten Joachims	To address this need, we propose a general LTR framework that can optimize a wide range of utility metrics (e.g. NDCG) while satisfying fairness of exposure constraints with respect to the items.
488	Off-Policy Evaluation via Off-Policy Classification	Alexander Irpan, Kanishka Rao, Konstantinos Bousmalis, Chris Harris, Julian Ibarz, Sergey Levine	In this work, we consider the problem of model selection for deep reinforcement learning (RL) in real-world environments.
489	Regularized Gradient Boosting	Corinna Cortes, Mehryar Mohri, Dmitry Storcheus	We introduce a new algorithm, called rgb, that directly benefits from these generalization bounds and that, at every boosting round, applies the Structural Risk Minimization principle to search for a base predictor with the best empirical fit versus complexity trade-off.
490	Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model	Atilim Gunes Baydin, Lei Shao, Wahid Bhimji, Lukas Heinrich, Saeid Naderiparizi, Andreas Munk, Jialin Liu, Bradley Gram-Hansen, Gilles Louppe, Lawrence Meadows, Philip Torr, Victor Lee, Kyle Cranmer, Mr. Prabhat, Frank Wood	We present a novel probabilistic programming framework that couples directly to existing large-scale simulators through a cross-platform probabilistic execution protocol, which allows general-purpose inference engines to record and control random number draws within simulators in a language-agnostic way.
491	Markov Random Fields for Collaborative Filtering	Harald Steck	In this paper, we model the dependencies among the items that are recommended to a user in a collaborative-filtering problem via a Gaussian Markov Random Field (MRF).
492	A Step Toward Quantifying Independently Reproducible Machine Learning Research	Edward Raff	We take the first step toward a quantifiable answer by manually attempting to implement 255 papers published from 1984 until 2017, recording features of each paper, and performing statistical analysis of the results.
493	Scalable Global Optimization via Local Bayesian Optimization	David Eriksson, Michael Pearce, Jacob Gardner, Ryan D. Turner, Matthias Poloczek	We propose the TuRBO algorithm that fits a collection of local models and performs a principled global allocation of samples across these models via an implicit bandit approach.
494	Time-series Generative Adversarial Networks	Jinsung Yoon, Daniel Jarrett, Mihaela van der Schaar	We propose a novel framework for generating realistic time-series data that combines the flexibility of the unsupervised paradigm with the control afforded by supervised training.
495	Ouroboros: On Accelerating Training of Transformer-Based Language Models	Qian Yang, Zhouyuan Huo, Wenlin Wang, Lawrence Carin	We propose the first model-parallel algorithm that speeds the training of Transformer-based language models.
496	A Refined Margin Distribution Analysis for Forest Representation Learning	Shen-Huan Lyu, Liang Yang, Zhi-Hua Zhou	In this paper, we formulate the forest representation learning approach called CasDF as an additive model which boosts the augmented feature instead of the prediction.
497	Robustness to Adversarial Perturbations in Learning from Incomplete Data	Amir Najafi, Shin-ichi Maeda, Masanori Koyama, Takeru Miyato	We develop a generalization theory for our framework based on a number of novel complexity measures, such as an adversarial extension of Rademacher complexity and its semi-supervised analogue.
498	Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks	Kohei Hayashi, Taiki Yamaguchi, Yohei Sugawara, Shin-ichi Maeda	In this study, we first characterize a decomposition class specific to CNNs by adopting a flexible graphical notation.
499	An Adaptive Empirical Bayesian Method for Sparse Deep Learning	Wei Deng, Xiao Zhang, Faming Liang, Guang Lin	We propose a novel adaptive empirical Bayesian (AEB) method for sparse deep learning, where the sparsity is ensured via a class of self-adaptive spike-and-slab priors.
500	Adaptive Influence Maximization with Myopic Feedback	Binghui Peng, Wei Chen	We study the adaptive influence maximization problem with myopic feedback under the independent cascade model: one sequentially selects k nodes as seeds one by one from a social network, and each selected seed returns the immediate neighbors it activates as the feedback available for later selections, and the goal is to maximize the expected number of total activated nodes, referred to as the influence spread.
501	Focused Quantization for Sparse CNNs	Yiren Zhao, Xitong Gao, Daniel Bates, Robert Mullins, Cheng-Zhong Xu	In this paper, we attend to the statistical properties of sparse CNNs and present focused quantization, a novel quantization strategy based on power-of-two values, which exploits the weight distributions after fine-grained pruning.
502	Quantum Embedding of Knowledge for Reasoning	Dinesh Garg, Shajith Ikbal Mohamed, Santosh K. Srivastava, Harit Vishwakarma, Hima Karanam, L Venkata Subramaniam	We present a novel approach called Embed2Reason (E2R) that embeds a symbolic KB into a vector space in a logical structure preserving manner.
503	Optimal Best Markovian Arm Identification with Fixed Confidence	Vrettos Moulos	We give a complete characterization of the sampling complexity of best Markovian arm identification in one-parameter Markovian bandit models.
504	Limiting Extrapolation in Linear Approximate Value Iteration	Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill	We introduce an algorithm that approximates value functions by combining Q-values estimated at a set of anchor states.
505	Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model	Andrea Zanette, Mykel J. Kochenderfer, Emma Brunskill	We propose an algorithm that is initially agnostic to the MDP but that can leverage the specific MDP structure, expressed in terms of variances of the rewards and next-state value function, and gaps in the optimal action-value function to reduce the sample complexity needed to find a good policy, precisely highlighting the contribution of each state-action pair to the final sample complexity.
506	Invertible Convolutional Flow	Mahdi Karami, Dale Schuurmans, Jascha Sohl-Dickstein, Laurent Dinh, Daniel Duckworth	As an alternative, we investigate a set of novel normalizing flows based on the circular and symmetric convolutions.
507	A Latent Variational Framework for Stochastic Optimization	Philippe Casgrain	This paper provides a unifying theoretical framework for stochastic optimization algorithms by means of a latent stochastic variational problem.
508	Topology-Preserving Deep Image Segmentation	Xiaoling Hu, Fuxin Li, Dimitris Samaras, Chao Chen	We propose a novel method that learns to segment with correct topology.
509	Connective Cognition Network for Directional Visual Commonsense Reasoning	Aming Wu, Linchao Zhu, Yahong Han, Yi Yang	Inspired by this idea, towards VCR, we propose a connective cognition network (CCN) to dynamically reorganize the visual neuron connectivity that is contextualized by the meaning of questions and answers.
510	Online Markov Decoding: Lower Bounds and Near-Optimal Approximation Algorithms	Vikas Garg, Tamar Pichkhadze	We resolve the fundamental problem of online decoding with general nth order ergodic Markov chain models.
511	A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning	Francisco Garcia, Philip S. Thomas	In this paper we consider the problem of how a reinforcement learning agent that is tasked with solving a sequence of reinforcement learning problems (a sequence of Markov decision processes) can use knowledge acquired early in its lifetime to improve its ability to solve new problems.
512	Push-pull Feedback Implements Hierarchical Information Retrieval Efficiently	Xiao Liu, Xiaolong Zou, Zilong Ji, Gengshuo Tian, Yuanyuan Mi, Tiejun Huang, K. Y. Michael Wong, Si Wu	Here, we investigate the role of feedback in hierarchical information retrieval.
513	Learning Disentangled Representations for Recommendation	Jianxin Ma, Chang Zhou, Peng Cui, Hongxia Yang, Wenwu Zhu	In this paper, we present the MACRo-mIcro Disentangled Variational Auto-Encoder (MacridVAE) for learning disentangled representations from user behavior.
514	Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels	Simon S. Du, Kangcheng Hou, Russ R. Salakhutdinov, Barnabas Poczos, Ruosong Wang, Keyulu Xu	The current paper presents a new class of graph kernels, Graph Neural Tangent Kernels (GNTKs), which correspond to infinitely wide multi-layer GNNs trained by gradient descent.
515	In-Place Zero-Space Memory Protection for CNN	Hui Guan, Lin Ning, Zhen Lin, Xipeng Shen, Huiyang Zhou, Seung-Hwan Lim	This paper introduces in-place zero-space ECC, assisted by a new training scheme called weight distribution-oriented training.
516	Acceleration via Symplectic Discretization of High-Resolution Differential Equations	Bin Shi, Simon S. Du, Weijie Su, Michael I. Jordan	We study first-order optimization algorithms obtained by discretizing ordinary differential equations (ODEs) corresponding to Nesterov’s accelerated gradient methods (NAGs) and Polyak’s heavy-ball method.
517	XLNet: Generalized Autoregressive Pretraining for Language Understanding	Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, Quoc V. Le	In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation.
518	Comparison Against Task Driven Artificial Neural Networks Reveals Functional Properties in Mouse Visual Cortex	Jianghong Shi, Eric Shea-Brown, Michael Buice	We find that the comparison procedure is robust to different choices of stimuli set and the level of sub-sampling that one might expect in a large scale brain survey with thousands of neurons.
519	Variance Reduced Policy Evaluation with Smooth Function Approximation	Hoi-To Wai, Mingyi Hong, Zhuoran Yang, Zhaoran Wang, Kexin Tang	We formulate the policy evaluation problem as a non-convex primal-dual, finite-sum optimization problem, whose primal sub-problem is non-convex and dual sub-problem is strongly concave.
520	Learning GANs and Ensembles Using Discrepancy	Ben Adlam, Corinna Cortes, Mehryar Mohri, Ningshan Zhang	We present efficient algorithms using discrepancy for two tasks: training a GAN directly, namely DGAN, and mixing previously trained generative models, namely EDGAN.
521	Co-Generation with GANs using AIS based HMC	Tiantian Fang, Alexander Schwing	Therefore, in this paper, we study the occurring challenges for co-generation with GANs.
522	AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification	Ronghui You, Zihan Zhang, Ziye Wang, Suyang Dai, Hiroshi Mamitsuka, Shanfeng Zhu	We propose a new label tree-based deep learning model for XMTC, called AttentionXML, with two unique features: 1) a multi-label attention mechanism with raw text as input, which makes it possible to capture the part of the text most relevant to each label; and 2) a shallow and wide probabilistic label tree (PLT), which makes it possible to handle millions of labels, especially “tail labels”.
523	Addressing Sample Complexity in Visual Tasks Using HER and Hallucinatory GANs	Himanshu Sahni, Toby Buckley, Pieter Abbeel, Ilya Kuzovkin	In this work, we show how visual trajectories can be hallucinated to appear successful by altering agent observations using a generative model trained on relatively few snapshots of the goal.
524	Abstract Reasoning with Distracting Features	Kecheng Zheng, Zheng-Jun Zha, Wei Wei	Inspired by this fact, we propose the feature robust abstract reasoning (FRAR) model, which consists of a reinforcement learning based teacher network to determine the sequence of training and a student network for predictions.
525	Generalized Block-Diagonal Structure Pursuit: Learning Soft Latent Task Assignment against Negative Transfer	Zhiyong Yang, Qianqian Xu, Yangbangyan Jiang, Xiaochun Cao, Qingming Huang	To circumvent this issue, we propose a novel multi-task learning method, which simultaneously learns latent task representations and a block-diagonal Latent Task Assignment Matrix (LTAM).
526	Adversarial Training and Robustness for Multiple Perturbations	Florian Tramèr, Dan Boneh	Our aim is to understand the reasons underlying this robustness trade-off, and to train models that are simultaneously robust to multiple perturbation types.
527	Doubly-Robust Lasso Bandit	Gi-Soo Kim, Myunghee Cho Paik	We consider the stochastic linear contextual bandit problem and propose a novel algorithm, namely the Doubly-Robust Lasso Bandit algorithm, which exploits the sparse structure of the regression parameter as in Lasso, while blending the doubly-robust technique used in missing data literature.
528	DM2C: Deep Mixed-Modal Clustering	Yangbangyan Jiang, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang	In this paper, we consider a more challenging task where each instance is represented in only one modality, which we call mixed-modal data.
529	MaCow: Masked Convolutional Generative Flow	Xuezhe Ma, Xiang Kong, Shanghang Zhang, Eduard Hovy	In this work, we introduce masked convolutional generative flow (MaCow), a simple yet effective architecture of generative flow using masked convolution.
530	Learning by Abstraction: The Neural State Machine	Drew Hudson, Christopher D. Manning	We introduce the Neural State Machine, seeking to bridge the gap between the neural and symbolic views of AI and integrate their complementary strengths for the task of visual reasoning.
531	Adaptive Gradient-Based Meta-Learning Methods	Mikhail Khodak, Maria-Florina F. Balcan, Ameet S. Talwalkar	We build a theoretical framework for designing and understanding practical meta-learning methods that integrates sophisticated formalizations of task-similarity with the extensive literature on online convex optimization and sequential prediction algorithms.
532	Equipping Experts/Bandits with Long-term Memory	Kai Zheng, Haipeng Luo, Ilias Diakonikolas, Liwei Wang	We propose the first black-box approach to obtaining long-term memory guarantees for online learning in the sense of Bousquet and Warmuth, 2002, by reducing the problem to achieving typical switching regret.
533	A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning	Wenhao Yang, Xiang Li, Zhihua Zhang	We propose and study a general framework for regularized Markov decision processes (MDPs) where the goal is to find an optimal policy that maximizes the expected discounted total reward plus a policy regularization term.
534	Scalable inference of topic evolution via models for latent geometric structures	Mikhail Yurochkin, Zhiwei Fan, Aritra Guha, Paraschos Koutris, XuanLong Nguyen	We develop new models and algorithms for learning the temporal dynamics of the topic polytopes and related geometric objects that arise in topic model based inference.
535	Effective End-to-end Unsupervised Outlier Detection via Inlier Priority of Discriminative Network	Siqi Wang, Yijie Zeng, Xinwang Liu, En Zhu, Jianping Yin, Chuanfu Xu, Marius Kloft	In this paper, we propose a framework named E^3Outlier, which can perform UOD in a both effective and end-to-end manner: First, instead of the commonly-used autoencoders in previous end-to-end UOD methods, E^3Outlier for the first time leverages a discriminative DNN for better representation learning, by using surrogate supervision to create multiple pseudo classes from original unlabelled data.
536	Deep Active Learning with a Neural Architecture Search	Yonatan Geifman, Ran El-Yaniv	We challenge this assumption and propose a novel active strategy whereby the learning algorithm searches for effective architectures on the fly, while actively learning.
537	Efficiently escaping saddle points on manifolds	Christopher Criscitiello, Nicolas Boumal	Generalizing Jin et al.’s recent work on perturbed gradient descent (PGD) for optimization on linear spaces [How to Escape Saddle Points Efficiently (2017), Stochastic Gradient Descent Escapes Saddle Points Efficiently (2019)], we study a version of perturbed Riemannian gradient descent (PRGD) to show that necessary optimality conditions can be met approximately with high probability, without evaluating the Hessian.
538	AutoAssist: A Framework to Accelerate Training of Deep Neural Networks	Jiong Zhang, Hsiang-Fu Yu, Inderjit S. Dhillon	In this paper, we propose AutoAssist, a simple framework to accelerate training of a deep neural network.
539	DFNets: Spectral CNNs for Graphs with Feedback-Looped Filters	W. O. K. Asiri Suranga Wijesinghe, Qing Wang	We propose a novel spectral convolutional neural network (CNN) model on graph structured data, namely Distributed Feedback-Looped Networks (DFNets).
540	Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning	Wonjae Kim, Yoonho Lee	We propose Dynamics of Attention for Focus Transition (DAFT) as a human prior for machine reasoning.
541	Comparing Unsupervised Word Translation Methods Step by Step	Mareike Hartmann, Yova Kementchedjhieva, Anders Søgaard	We focus on the first step and compare distribution matching techniques in the context of language pairs for which mixed training stability and evaluation scores have been reported.
542	Learning from Bad Data via Generation	Tianyu Guo, Chang Xu, Boxin Shi, Chao Xu, Dacheng Tao	We suppose the real data distribution lies in a distribution set supported by the empirical distribution of bad data. A worst-case formulation can be developed over this distribution set, and then be interpreted as a generation task in an adversarial manner.
543	Constrained deep neural network architecture search for IoT devices accounting for hardware calibration	Florian Scheidegger, Luca Benini, Costas Bekas, A. Cristiano I. Malossi	We propose a unique narrow-space architecture search that focuses on delivering low-cost and rapidly executing networks that respect strict memory and time requirements typical of Internet-of-Things (IoT) near-sensor computing platforms.
544	Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection	Yihe Dong, Samuel Hopkins, Jerry Li	We study two problems in high-dimensional robust statistics: robust mean estimation and outlier detection.
545	Iterative Least Trimmed Squares for Mixed Linear Regression	Yanyao Shen, Sujay Sanghavi	In this paper, we analyze ILTS in the setting of mixed linear regression with corruptions (MLR-C).
546	Dynamic Ensemble Modeling Approach to Nonstationary Neural Decoding in Brain-Computer Interfaces	Yu Qi, Bin Liu, Yueming Wang, Gang Pan	We propose a dynamic ensemble modeling (DyEnsemble) approach that is capable of adapting to changes in neural signals by employing a proper combination of decoding functions.
547	Divergence-Augmented Policy Optimization	Qing Wang, Yingru Li, Jiechao Xiong, Tong Zhang	This paper introduces a method to stabilize policy optimization when off-policy data are reused.
548	Intrinsic dimension of data representations in deep neural networks	Alessio Ansuini, Alessandro Laio, Jakob H. Macke, Davide Zoccolan	Here we study the intrinsic dimensionality (ID) of data representations, i.e. the minimal number of parameters needed to describe a representation.
549	Towards a Zero-One Law for Column Subset Selection	Zhao Song, David Woodruff, Peilin Zhong	In this work we give approximation algorithms for every function $g$ which is approximately monotone and satisfies an approximate triangle inequality, and we show both of these conditions are necessary.
550	Compositional De-Attention Networks	Yi Tay, Anh Tuan Luu, Aston Zhang, Shuohang Wang, Siu Cheung Hui	This paper proposes a new quasi-attention that is compositional in nature, i.e., learning whether to add, subtract or nullify a certain vector when learning representations.
551	Dual Adversarial Semantics-Consistent Network for Generalized Zero-Shot Learning	Jian Ni, Shanghang Zhang, Haiyong Xie	To address these limitations, we propose a Dual Adversarial Semantics-Consistent Network (referred to as DASCN), which learns both primal and dual Generative Adversarial Networks (GANs) in a unified framework for GZSL.
552	Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers	Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang	In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two and three-layer networks with fewer parameters and smooth activations.
553	Mining GOLD Samples for Conditional GANs	Sangwoo Mo, Chiheon Kim, Sungwoong Kim, Minsu Cho, Jinwoo Shin	We introduce a simple yet effective approach to improving cGANs by measuring the discrepancy between the data distribution and the model distribution on given samples.
554	Deep Model Transferability from Attribution Maps	Jie Song, Yixin Chen, Xinchao Wang, Chengchao Shen, Mingli Song	In this paper, we propose an embarrassingly simple yet very efficacious approach to estimating the transferability of deep networks, especially those handling vision tasks.
555	Fully Parameterized Quantile Function for Distributional Reinforcement Learning	Derek Yang, Li Zhao, Zichuan Lin, Tao Qin, Jiang Bian, Tie-Yan Liu	In this paper, we propose a fully parameterized quantile function that parameterizes both the quantile fraction axis (i.e., the x-axis) and the value axis (i.e., the y-axis) for distributional RL.
556	Direct Optimization through `\arg \max` for Discrete Variational Auto-Encoder	Guy Lorberbom, Tommi Jaakkola, Andreea Gane, Tamir Hazan	In contrast to previous works which resort to softmax-based relaxations, we propose to optimize it directly by applying the direct loss minimization approach.
557	Distributional Reward Decomposition for Reinforcement Learning	Zichuan Lin, Li Zhao, Derek Yang, Tao Qin, Tie-Yan Liu, Guangwen Yang	In this paper, we propose Distributional Reward Decomposition for Reinforcement Learning (DRDRL), a novel reward decomposition algorithm which captures the multiple reward channel structure under distributional setting.
558	L_DMI: A Novel Information-theoretic Loss Function for Training Deep Nets Robust to Label Noise	Yilun Xu, Peng Cao, Yuqing Kong, Yizhou Wang	In this paper, we propose a novel information-theoretic loss function, L_DMI, for training deep neural networks robust to label noise.
559	Convergence Guarantees for Adaptive Bayesian Quadrature Methods	Motonobu Kanagawa, Philipp Hennig	In this work, for a broad class of adaptive Bayesian quadrature methods, we prove consistency, deriving non-tight but informative convergence rates.
560	Progressive Augmentation of GANs	Dan Zhang, Anna Khoreva	To mitigate this issue we introduce a new regularization technique – progressive augmentation of GANs (PA-GAN).
561	UniXGrad: A Universal, Adaptive Algorithm with Optimal Guarantees for Constrained Optimization	Ali Kavis, Kfir Y. Levy, Francis Bach, Volkan Cevher	We propose a novel adaptive, accelerated algorithm for the stochastic constrained convex optimization setting. Our method, which is inspired by the Mirror-Prox method, simultaneously achieves the optimal rates for smooth/non-smooth problems with either deterministic/stochastic first-order oracles.
562	Meta-Surrogate Benchmarking for Hyperparameter Optimization	Aaron Klein, Zhenwen Dai, Frank Hutter, Neil Lawrence, Javier Gonzalez	This work proposes a method to alleviate these issues by means of a meta-surrogate model for HPO tasks trained on off-line generated data.
563	Learning to Perform Local Rewriting for Combinatorial Optimization	Xinyun Chen, Yuandong Tian	In this paper, we propose NeuRewriter that learns a policy to pick heuristics and rewrite the local components of the current solution to iteratively improve it until convergence.
564	Anti-efficient encoding in emergent communication	Rahma Chaabouni, Eugene Kharitonov, Emmanuel Dupoux, Marco Baroni	We study whether the same pattern emerges when two neural networks, a “speaker” and a “listener”, are trained to play a signaling game.
565	Singleshot : a scalable Tucker tensor decomposition	Abraham Traore, Maxime Berar, Alain Rakotomamonjy	This paper introduces a new approach for the scalable Tucker decomposition problem.
566	Neural Machine Translation with Soft Prototype	Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Cheng Xiang Zhai, Tie-Yan Liu	In this work, we propose a new framework that introduces a soft prototype into the encoder-decoder architecture, which allows the decoder to have indirect access to both past and future information, such that each target word can be generated based on the better global understanding.
567	Reliable training and estimation of variance networks	Nicki Skafte, Martin Jørgensen, Søren Hauberg	We propose and investigate new complementary methodologies for estimating predictive variance networks in regression neural networks.
568	Copula Multi-label Learning	Weiwei Liu	In particular, the paper first leverages the kernel trick to construct continuous distribution in the output space, and then estimates our proposed model semiparametrically where the copula is modeled parametrically, while the marginal distributions are modeled nonparametrically.
569	Bayesian Learning of Sum-Product Networks	Martin Trapp, Robert Peharz, Hong Ge, Franz Pernkopf, Zoubin Ghahramani	In this paper, we introduce a well-principled Bayesian framework for SPN structure learning.
570	Bayesian Batch Active Learning as Sparse Subset Approximation	Robert Pinsler, Jonathan Gordon, Eric Nalisnick, José Miguel Hernández-Lobato	In this paper, we introduce a novel Bayesian batch active learning approach that mitigates these issues.
571	Optimal Sparsity-Sensitive Bounds for Distributed Mean Estimation	Zengfeng Huang, Ziyue Huang, Yilei Wang, Ke Yi	We propose a new sparsity-aware algorithm, which improves previous results both theoretically and empirically.
572	Global Sparse Momentum SGD for Pruning Very Deep Neural Networks	Xiaohan Ding, Guiguang Ding, Xiangxin Zhou, Yuchen Guo, Jungong Han, Ji Liu	In this paper, we propose a novel momentum-SGD-based optimization method to reduce the network complexity by on-the-fly pruning.
573	Variational Bayesian Decision-making for Continuous Utilities	Tomasz Kusmierczyk, Joseph Sakaya, Arto Klami	We present an automatic pipeline that co-opts continuous utilities into variational inference algorithms to account for decision-making.
574	The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks	Ryo Karakida, Shotaro Akaho, Shun-ichi Amari	We reveal that batch normalization in the last layer contributes to drastically decreasing such pathological sharpness if the width and sample number satisfy a specific condition.
575	Single-Model Uncertainties for Deep Learning	Natasa Tagasovska, David Lopez-Paz	To estimate aleatoric uncertainty, we propose Simultaneous Quantile Regression (SQR), a loss function to learn all the conditional quantiles of a given target variable.
576	Is Deeper Better only when Shallow is Good?	Eran Malach, Shai Shalev-Shwartz	In this work we explore the relation between expressivity properties of deep networks and the ability to train them efficiently using gradient-based algorithms.
577	Wasserstein Weisfeiler-Lehman Graph Kernels	Matteo Togninalli, Elisabetta Ghisu, Felipe Llinares-López, Bastian Rieck, Karsten Borgwardt	We propose a novel method that relies on the Wasserstein distance between the node feature vector distributions of two graphs, which allows us to find subtler differences in data sets by considering graphs as high-dimensional objects, rather than simple means.
578	Domain Generalization via Model-Agnostic Learning of Semantic Features	Qi Dou, Daniel Coelho de Castro, Konstantinos Kamnitsas, Ben Glocker	We investigate the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics.
579	Grid Saliency for Context Explanations of Semantic Segmentation	Lukas Hoyer, Mauricio Munoz, Prateek Katiyar, Anna Khoreva, Volker Fischer	To overcome this limitation, we extend the existing approaches to generate grid saliencies, which provide spatially coherent visual explanations for (pixel-level) dense prediction networks.
580	First-order methods almost always avoid saddle points: The case of vanishing step-sizes	Ioannis Panageas, Georgios Piliouras, Xiao Wang	In this paper, we resolve this question in the affirmative for gradient descent, mirror descent, manifold descent and proximal point.
581	Maximum Mean Discrepancy Gradient Flow	Michael Arbel, Anna Korba, Adil Salim, Arthur Gretton	We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties.
582	Oblivious Sampling Algorithms for Private Data Analysis	Sajin Sasy, Olga Ohrimenko	We study secure and privacy-preserving data analysis based on queries executed on samples from a dataset.
583	Semi-supervisedly Co-embedding Attributed Networks	Zaiqiao Meng, Shangsong Liang, Jinyuan Fang, Teng Xiao	In this paper, to deal with the problem, we present a semi-supervised co-embedding model for attributed networks (SCAN) based on the generalized SVAE for the heterogeneous data, which collaboratively learns low-dimensional vector representations of both nodes and attributes for partially labelled attributed networks in a semi-supervised manner.
584	From voxels to pixels and back: Self-supervision in natural-image reconstruction from fMRI	Roman Beliy, Guy Gaziv, Assaf Hoogi, Francesca Strappini, Tal Golan, Michal Irani	We present a novel approach which, in addition to the scarce labeled data (training pairs), allows training fMRI-to-image reconstruction networks on “unlabeled” data as well (i.e., images without fMRI recording, and fMRI recording without images).
585	Copulas as High-Dimensional Generative Models: Vine Copula Autoencoders	Natasa Tagasovska, Damien Ackerer, Thibault Vatter	We introduce the vine copula autoencoder (VCAE), a flexible generative model for high-dimensional distributions built in a straightforward three-step procedure.
586	Nonstochastic Multiarmed Bandits with Unrestricted Delays	Tobias Sommer Thune, Nicolò Cesa-Bianchi, Yevgeny Seldin	We investigate multiarmed bandits with delayed feedback, where the delays need neither be identical nor bounded.
587	BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modeling	Lars Maaløe, Marco Fraccaro, Valentin Liévin, Ole Winther	In this paper we close the performance gap by constructing VAE models that can effectively utilize a deep hierarchy of stochastic variables and model complex covariance structures.
588	Code Generation as a Dual Task of Code Summarization	Bolin Wei, Ge Li, Xin Xia, Zhiyi Fu, Zhi Jin	In this paper, we apply the relations between two tasks to improve the performance of both tasks.
589	Diffeomorphic Temporal Alignment Nets	Ron A. Shapira Weber, Matan Eyal, Nicki Skafte, Oren Shriki, Oren Freifeld	Here we propose the Diffeomorphic Temporal alignment Net (DTAN), a learning-based method for time-series joint alignment.
590	Weakly Supervised Instance Segmentation using the Bounding Box Tightness Prior	Cheng-Chun Hsu, Kuang-Jui Hsu, Chung-Chi Tsai, Yen-Yu Lin, Yung-Yu Chuang	This paper presents a weakly supervised instance segmentation method that consumes training data with tight bounding box annotations.
591	On the Power and Limitations of Random Features for Understanding Neural Networks	Gilad Yehudai, Ohad Shamir	In this paper, we formalize the link between existing results and random features, and argue that despite the impressive positive results, random feature approaches are also inherently limited in what they can explain.
592	Efficient Pure Exploration in Adaptive Round model	Tianyuan Jin, Jieming Shi, Xiaokui Xiao, Enhong Chen	In this paper, we study both PAC and exact top-$k$ arm identification problems and design efficient algorithms considering both round complexity and query complexity.
593	Multi-objects Generation with Amortized Structural Regularization	Taufik Xu, Chongxuan Li, Jun Zhu, Bo Zhang	In this paper, we propose amortized structural regularization (ASR), which adopts posterior regularization (PR) to embed human knowledge into DGMs via a set of structural constraints.
594	Neural Shuffle-Exchange Networks – Sequence Processing in O(n log n) Time	Karlis Freivalds, Emils Ozoliņš, Agris Šostaks	We introduce a new Shuffle-Exchange neural network model for sequence to sequence tasks which has O(log n) depth and O(n log n) total complexity.
595	DetNAS: Backbone Search for Object Detection	Yukang Chen, Tong Yang, Xiangyu Zhang, Gaofeng Meng, Xinyu Xiao, Jian Sun	In this work, we present DetNAS to use Neural Architecture Search (NAS) for the design of better backbones for object detection.
596	Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates	Adil Salim, Dmitry Kovalev, Peter Richtarik	We propose a new algorithm—Stochastic Proximal Langevin Algorithm (SPLA)—for sampling from a log concave distribution.
597	Fast AutoAugment	Sungbin Lim, Ildoo Kim, Taesup Kim, Chiheon Kim, Sungwoong Kim	In this paper, we propose an algorithm called Fast AutoAugment that finds effective augmentation policies via a more efficient search strategy based on density matching.
598	On the Convergence Rate of Training Recurrent Neural Networks	Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song	More importantly, in this paper we build general toolkits to analyze multi-layer networks with ReLU activations.
599	Interval timing in deep reinforcement learning agents	Ben Deverett, Ryan Faulkner, Meire Fortunato, Gregory Wayne, Joel Z. Leibo	In artificial agents, little work has directly addressed (1) which architectural components are necessary for successful development of this ability, (2) how this timing ability comes to be represented in the units and actions of the agent, and (3) whether the resulting behavior of the system converges on solutions similar to those of biology.
600	Graph-based Discriminators: Sample Complexity and Expressiveness	Roi Livni, Yishay Mansour	For $k\geq 2$ we introduce a notion similar to the VC-dimension, and show that it controls the sample complexity.
601	Large Scale Structure of Neural Network Loss Landscapes	Stanislav Fort, Stanislaw Jastrzebski	We propose and experimentally verify a unified phenomenological model of the loss landscape that incorporates many of them.
602	Learning Nonsymmetric Determinantal Point Processes	Mike Gartrell, Victor-Emmanuel Brunel, Elvis Dohmatob, Syrine Krichene	We present a method that enables a tractable algorithm, based on maximum likelihood estimation, for learning nonsymmetric DPPs from data composed of observed subsets.
603	Hypothesis Set Stability and Generalization	Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan	We present a study of generalization for data-dependent hypothesis sets.
604	Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds	Bo Yang, Jianan Wang, Ronald Clark, Qingyong Hu, Sen Wang, Andrew Markham, Niki Trigoni	We propose a novel, conceptually simple and general framework for instance segmentation on 3D point clouds.
605	Precision-Recall Balanced Topic Modelling	Seppo Virtanen, Mark Girolami	We formulate topic modelling as an information retrieval task, where the goal is, based on the latent topic representation, to capture relevant term co-occurrence patterns.
606	Learning Sparse Distributions using Iterative Hard Thresholding	Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi O. Koyejo	In this work, we consider IHT as a solution to the problem of learning sparse discrete distributions.
607	Discriminative Topic Modeling with Logistic LDA	Iryna Korshunova, Hanchen Xiong, Mateusz Fedoryszak, Lucas Theis	We propose logistic LDA, a novel discriminative variant of latent Dirichlet allocation which is easy to apply to arbitrary inputs.
608	Quantum Wasserstein Generative Adversarial Networks	Shouvanik Chakrabarti, Huang Yiming, Tongyang Li, Soheil Feizi, Xiaodi Wu	Inspired by previous studies on the adversarial training of classical and quantum generative models, we propose the first design of quantum Wasserstein Generative Adversarial Networks (WGANs), which has been shown to improve the robustness and the scalability of the adversarial training of quantum generative models even on noisy quantum hardware.
609	Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion	Joan Serrà, Santiago Pascual, Carlos Segura Perales	In this paper, we propose Blow, a single-scale normalizing flow using hypernetwork conditioning to perform many-to-many voice conversion between raw audio.
610	Hyperparameter Learning via Distributional Transfer	Ho Chung Law, Peilin Zhao, Leung Sing Chan, Junzhou Huang, Dino Sejdinovic	We propose to transfer information across tasks using learnt representations of training datasets used in those tasks.
611	Discriminator optimal transport	Akinori Tanaka	Based on some experiments and a bit of OT theory, we propose discriminator optimal transport (DOT) scheme to improve generated images.
612	High-dimensional multivariate forecasting with low-rank Gaussian Copula Processes	David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico, Jan Gasthaus	We propose to combine an RNN-based time series model with a Gaussian copula process output model with a low-rank covariance structure to reduce the computational complexity and handle non-Gaussian marginal distributions.
613	Are Anchor Points Really Indispensable in Label-Noise Learning?	Xiaobo Xia, Tongliang Liu, Nannan Wang, Bo Han, Chen Gong, Gang Niu, Masashi Sugiyama	In this paper, without employing anchor points, we propose a transition-revision (T-Revision) method to effectively learn transition matrices, leading to better classifiers.
614	Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations	Fenglin Liu, Yuanxin Liu, Xuancheng Ren, Xiaodong He, Xu Sun	In this work, we aim at representing an image with a set of integrated visual regions and corresponding textual concepts, reflecting certain semantics.
615	Differentiable Ranking and Sorting using Optimal Transport	Marco Cuturi, Olivier Teboul, Jean-Philippe Vert	We propose in this paper to replace the usual sort procedure with a differentiable proxy.
616	Dichotomize and Generalize: PAC-Bayesian Binary Activated Deep Neural Networks	Gaël Letarte, Pascal Germain, Benjamin Guedj, Francois Laviolette	We present a comprehensive study of multilayer neural networks with binary activation, relying on the PAC-Bayesian theory.
617	Likelihood-Free Overcomplete ICA and Applications In Causal Discovery	Chenwei Ding, Mingming Gong, Kun Zhang, Dacheng Tao	To tackle these problems, we present a Likelihood-Free Overcomplete ICA algorithm (LFOICA) that estimates the mixing matrix directly by back-propagation without any explicit assumptions on the density function of independent components.
618	Interior-Point Methods Strike Back: Solving the Wasserstein Barycenter Problem	DongDong Ge, Haoyue Wang, Zikai Xiong, Yinyu Ye	In this paper, we overcome the difficulty by developing a new adapted interior-point method that fully exploits the problem’s special matrix structure to reduce the iteration complexity and speed up the Newton procedure.
619	Beyond Vector Spaces: Compact Data Representation as Differentiable Weighted Graphs	Denis Mazur, Vage Egiazarian, Stanislav Morozov, Artem Babenko	In this paper, we aim to eliminate the inductive bias imposed by the embedding space geometry.
620	Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections	Boris Muzellec, Marco Cuturi	We propose in this work two methods to extrapolate, from a transport map that is optimal on a subspace, one that is nearly optimal in the entire space.
621	Efficient Smooth Non-Convex Stochastic Compositional Optimization via Stochastic Recursive Gradient Descent	Huizhuo Yuan, Xiangru Lian, Chris Junchi Li, Ji Liu, Wenqing Hu	In this paper, we investigate the stochastic compositional optimization in the general smooth non-convex setting.
622	On the convergence of single-call stochastic extra-gradient methods	Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos	In this paper, we develop a synthetic view of such algorithms, and we complement the existing literature by showing that they retain an $O(1/t)$ ergodic convergence rate in smooth, deterministic problems.
623	Infra-slow brain dynamics as a marker for cognitive function and decline	Shagun Ajmera Shyam Sunder Ajmera, Shreya Rajagopal, Razi Rehman, Devarajan Sridharan	We investigated this question with a novel application of Gaussian Process Factor Analysis (GPFA) and machine learning to fMRI data.
624	Robust Principal Component Analysis with Adaptive Neighbors	Rui Zhang, Hanghang Tong	To tackle the issue, we propose a general framework namely robust weight learning with adaptive neighbors (RWL-AN), via which adaptive weight vector is automatically obtained with both robustness and sparse neighbors.
625	High-Quality Self-Supervised Deep Image Denoising	Samuli Laine, Tero Karras, Jaakko Lehtinen, Timo Aila	We describe a novel method for training high-quality image denoising models based on unorganized collections of corrupted images.
626	Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup	Sebastian Goldt, Madhu Advani, Andrew M. Saxe, Florent Krzakala, Lenka Zdeborová	We show how the dynamics of stochastic gradient descent (SGD) is captured by a set of differential equations and prove that this description is asymptotically exact in the limit of large inputs.
627	GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs	Yuan Liu, Zehong Shen, Zhixuan Lin, Sida Peng, Hujun Bao, Xiaowei Zhou	In this paper, we introduce a novel visual descriptor named Group Invariant Feature Transform (GIFT), which is both discriminative and robust to geometric transformations.
628	Online Prediction of Switching Graph Labelings with Cluster Specialists	Mark Herbster, James Robinson	We present an algorithm based on a specialist approach; we develop the machinery of cluster specialists which probabilistically exploits the cluster structure in the graph.
629	Graph-Based Semi-Supervised Learning with Non-ignorable Non-response	Fan Zhou, Tengfei Li, Haibo Zhou, Hongtu Zhu, Ye Jieping	To solve the problem, we propose a Graph-based joint model with Non-ignorable Non-response (GNN), followed by a joint inverse weighting estimation procedure incorporated with sampling imputation approach.
630	BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning	Andreas Kirsch, Joost van Amersfoort, Yarin Gal	We develop BatchBALD, a tractable approximation to the mutual information between a batch of points and model parameters, which we use as an acquisition function to select multiple informative points jointly for the task of deep Bayesian active learning.
631	A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off	Yaniv Blumenfeld, Dar Gilboa, Daniel Soudry	We apply mean field techniques to networks with quantized activations in order to evaluate the degree to which quantization degrades signal propagation at initialization.
632	Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs	Marek Petrik, Reazul Hasan Russel	This paper proposes a new paradigm that can achieve better solutions with the same robustness guarantees without using confidence regions as ambiguity sets.
633	Cross-lingual Language Model Pretraining	Alexis Conneau, Guillaume Lample	In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining.
634	Approximate Bayesian Inference for a Mechanistic Model of Vesicle Release at a Ribbon Synapse	Cornelius Schröder, Ben James, Leon Lagnado, Philipp Berens	Here, we develop an approximate Bayesian inference scheme for a fully stochastic, biophysically inspired model of glutamate release at the ribbon synapse, a highly specialized synapse found in different sensory systems.
635	Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input	Maxence Ernoult, Benjamin Scellier, Yoshua Bengio, Damien Querlioz, Julie Grollier	In this work, we introduce a discrete-time version of EP with simplified equations and with reduced simulation time, bringing EP closer to practical machine learning tasks.
636	Universal Invariant and Equivariant Graph Neural Networks	Nicolas Keriven, Gabriel Peyré	In this paper, we consider a specific class of invariant and equivariant networks, for which we prove new universality theorems.
637	Are sample means in multi-armed bandits positively or negatively biased?	Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo	In this paper, we decouple three different sources of this selection bias: adaptive sampling of arms, adaptive stopping of the experiment, and adaptively choosing which arm to study.
638	On the Correctness and Sample Complexity of Inverse Reinforcement Learning	Abi Komanduru, Jean Honorio	An L1-regularized Support Vector Machine formulation of the IRL problem motivated by the geometric analysis is then proposed with the basic objective of the inverse reinforcement problem in mind: to find a reward function that generates a specified optimal policy.
639	VIREL: A Variational Inference Framework for Reinforcement Learning	Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson	We propose VIREL, a theoretically grounded probabilistic inference framework for RL that utilises a parametrised action-value function to summarise future dynamics of the underlying MDP, generalising existing approaches.
640	First Order Motion Model for Image Animation	Aliaksandr Siarohin, Stephane Lathuillere, Sergey Tulyakov, Elisa Ricci, Nicu Sebe	Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video. Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
641	Tensor Monte Carlo: Particle Methods for the GPU era	Laurence Aitchison	To address these issues, we developed tensor Monte-Carlo (TMC) which gives exponentially many importance samples by separately drawing samples for each of the latent variables, then averaging over all possible combinations.
642	Unsupervised Emergence of Egocentric Spatial Structure from Sensorimotor Prediction	Alban Laflaquière, Michael Garcia Ortiz	We propose a simple sensorimotor predictive scheme, apply it to different agents and types of exploration, and evaluate the pertinence of these hypotheses.
643	Learning from Label Proportions with Generative Adversarial Networks	Jiabin Liu, Bo Wang, Zhiquan Qi, YingJie Tian, Yong Shi	In this paper, we leverage generative adversarial networks (GANs) to derive an effective algorithm LLP-GAN for learning from label proportions (LLP), where only the bag-level proportional information in labels is available.
644	Efficient and Thrifty Voting by Any Means Necessary	Debmalya Mandal, Ariel D. Procaccia, Nisarg Shah, David Woodruff	We take an unorthodox view of voting by expanding the design space to include both the elicitation rule, whereby voters map their (cardinal) preferences to votes, and the aggregation rule, which transforms the reported votes into collective decisions.
645	PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation	Can Qin, Haoxuan You, Lichen Wang, C.-C. Jay Kuo, Yun Fu	In this paper, we propose a novel 3D Domain Adaptation Network for point cloud data (PointDAN).
646	ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization	Xiangyi Chen, Sijia Liu, Kaidi Xu, Xingguo Li, Xue Lin, Mingyi Hong, David Cox	In this paper, we propose a zeroth-order AdaMM (ZO-AdaMM) algorithm, that generalizes AdaMM to the gradient-free regime.
647	Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning	Erwan Lecarpentier, Emmanuel Rachelson	This work tackles the problem of robust zero-shot planning in non-stationary stochastic environments.
648	Depth-First Proof-Number Search with Heuristic Edge Cost and Application to Chemical Synthesis Planning	Akihiro Kishimoto, Beat Buesser, Bei Chen, Adi Botea	We address this disadvantage of DFPN in RA with a novel approach to combine DFPN with Heuristic Edge Initialization.
649	Toward a Characterization of Loss Functions for Distribution Learning	Nika Haghtalab, Cameron Musco, Bo Waggoner	In this work we study loss functions for learning and evaluating probability distributions over large discrete domains.
650	Coresets for Archetypal Analysis	Sebastian Mair, Ulf Brefeld	In this paper, we propose efficient coresets for archetypal analysis.
651	Emergence of Object Segmentation in Perturbed Generative Models	Adam Bielski, Paolo Favaro	We introduce a novel framework to build a model that can learn how to segment objects from a collection of images without any human annotation.
652	Optimal Sparse Decision Trees	Xiyang Hu, Cynthia Rudin, Margo Seltzer	This work introduces the first practical algorithm for optimal decision trees for binary variables.
653	Escaping from saddle points on Riemannian manifolds	Yue Sun, Nicolas Flammarion, Maryam Fazel	We consider minimizing a nonconvex, smooth function $f$ on a Riemannian manifold $\mathcal{M}$.
654	Multi-source Domain Adaptation for Semantic Segmentation	Sicheng Zhao, Bo Li, Xiangyu Yue, Yang Gu, Pengfei Xu, Runbo Hu, Hua Chai, Kurt Keutzer	In this paper, we propose to investigate multi-source domain adaptation for semantic segmentation.
655	Localized Structured Prediction	Carlo Ciliberto, Francis Bach, Alessandro Rudi	In this work we propose the first theoretical framework to deal with part-based data from a general perspective and study a novel method within the setting of statistical learning theory.
656	Nonzero-sum Adversarial Hypothesis Testing Games	Sarath Yasodharan, Patrick Loiseau	We study nonzero-sum hypothesis testing games that arise in the context of adversarial classification, in both the Bayesian as well as the Neyman-Pearson frameworks.
657	Manifold-regression to predict from MEG/EEG brain signals without source modeling	David Sabbagh, Pierre Ablin, Gael Varoquaux, Alexandre Gramfort, Denis A. Engemann	In this article, we focus on the task of regression with rank-reduced covariance matrices.
658	Modeling Tabular data using Conditional GAN	Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni	We design CTGAN, which uses a conditional generative adversarial network to address these challenges.
659	Normalization Helps Training of Quantized LSTM	Lu Hou, Jinhua Zhu, James Kwok, Fei Gao, Tao Qin, Tie-Yan Liu	In this paper, we first show theoretically that training a quantized LSTM is difficult because quantization makes the exploding gradient problem more severe, particularly when the LSTM weight matrices are large. We then show that the popularly used weight/layer/batch normalization schemes can help stabilize the gradient magnitude in training quantized LSTMs.
660	Trajectory of Alternating Direction Method of Multipliers and Adaptive Acceleration	Clarice Poon, Jingwei Liang	By studying the geometric properties of ADMM, we discuss the limitations of current inertial accelerated ADMM and then present and analyze an adaptive acceleration scheme for the method.
661	Deep Scale-spaces: Equivariance Over Scale	Daniel Worrall, Max Welling	We introduce deep scale-spaces, a generalization of convolutional neural networks, exploiting the scale symmetry structure of conventional image recognition tasks.
662	GRU-ODE-Bayes: Continuous Modeling of Sporadically-Observed Time Series	Edward De Brouwer, Jaak Simm, Adam Arany, Yves Moreau	To address these challenges, we propose (1) a continuous-time version of the Gated Recurrent Unit, building upon the recent Neural Ordinary Differential Equations (Chen et al., 2018), and (2) a Bayesian update network that processes the sporadic observations.
663	Estimating Convergence of Markov chains with L-Lag Couplings	Niloy Biswas, Pierre E. Jacob, Paul Vanetti	We introduce L-lag couplings to generate computable, non-asymptotic upper bound estimates for the total variation or the Wasserstein distance of general Markov chains.
664	Learning-Based Low-Rank Approximations	Piotr Indyk, Ali Vakilian, Yang Yuan	We introduce a “learning-based” algorithm for the low-rank decomposition problem: given an $n \times d$ matrix $A$, and a parameter $k$, compute a rank-$k$ matrix $A'$ that minimizes the approximation loss $\|A-A'\|_F$.
665	Implicit Regularization in Deep Matrix Factorization	Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo	We study the implicit regularization of gradient descent over deep linear neural networks for matrix completion and sensing, a model referred to as deep matrix factorization.
666	List-decodable Linear Regression	Sushrut Karmalkar, Adam Klivans, Pravesh Kothari	To solve the problem we introduce a new framework for list-decodable learning that strengthens the “identifiability to algorithms” paradigm based on the sum-of-squares method.
667	Learning elementary structures for 3D shape generation and matching	Theo Deprelle, Thibault Groueix, Matthew Fisher, Vladimir Kim, Bryan Russell, Mathieu Aubry	More precisely, we present two complementary approaches to learn elementary structures in a deep learning framework: (i) continuous surface deformation learning and (ii) 3D structure points learning.
668	On the Hardness of Robust Classification	Pascale Gourdeau, Varun Kanade, Marta Kwiatkowska, James Worrell	In this paper we study the feasibility of robust learning from the perspective of computational learning theory, considering both sample and computational complexity.
669	Foundations of Comparison-Based Hierarchical Clustering	Debarghya Ghoshdastidar, Michaël Perrot, Ulrike von Luxburg	We address the classical problem of hierarchical clustering, but in a framework where one does not have access to a representation of the objects or their pairwise similarities.
670	What the Vec? Towards Probabilistically Grounded Embeddings	Carl Allen, Ivana Balazevic, Timothy Hospedales	We show that different interactions of PMI vectors encode semantic properties that can be captured in low dimensional word embeddings by suitable projection, theoretically explaining why the embeddings of W2V and Glove work, and, in turn, revealing an interesting mathematical interconnection between the semantic relationships of relatedness, similarity, paraphrase and analogy.
671	Minimizers of the Empirical Risk and Risk Monotonicity	Marco Loog, Tom Viering, Alexander Mey	Our work introduces the formal notion of risk monotonicity, which asks the risk to not deteriorate with increasing training set sizes in expectation over the training samples.
672	Explicit Planning for Efficient Exploration in Reinforcement Learning	Liangpeng Zhang, Ke Tang, Xin Yao	We argue that explicit planning for exploration can help alleviate such a problem, and propose a Value Iteration for Exploration Cost (VIEC) algorithm which computes the optimal exploration scheme by solving an augmented MDP.
673	Lower Bounds on Adversarial Robustness from Optimal Transport	Arjun Nitin Bhagoji, Daniel Cullina, Prateek Mittal	In this paper, we use optimal transport to characterize the maximum achievable accuracy in an adversarial classification scenario.
674	Neural Spline Flows	Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios	Building upon recent work, we propose a fully-differentiable module based on monotonic rational-quadratic splines, which enhances the flexibility of both coupling and autoregressive transforms while retaining analytic invertibility.
675	Phase Transitions and Cyclic Phenomena in Bandits with Switching Constraints	David Simchi-Levi, Yunzong Xu	We consider the classical stochastic multi-armed bandit problem with a constraint on the total cost incurred by switching between actions.
676	Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization	Koen Helwegen, James Widdicombe, Lukas Geiger, Zechun Liu, Kwang-Ting Cheng, Roeland Nusselder	In this paper, we argue that these latent weights cannot be treated analogously to weights in real-valued networks.
677	Nonlinear scaling of resource allocation in sensory bottlenecks	Laura Rose Edmondson, Alejandro Jimenez Rodriguez, Hannes P. Saal	Here, we show analytically and numerically that resource allocation scales nonlinearly in efficient coding models that maximize information transfer, when inputs arise from separate regions with different receptor densities.
678	Constrained Reinforcement Learning Has Zero Duality Gap	Santiago Paternain, Luiz Chamon, Miguel Calvo-Fullana, Alejandro Ribeiro	This work provides theoretical support to these approaches by establishing that despite its non-convexity, this problem has zero duality gap, i.e., it can be solved exactly in the dual domain, where it becomes convex.
679	Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules	Niklas Gebauer, Michael Gastegger, Kristof Schütt	Here, we introduce a generative neural network for 3d point sets that respects the rotational invariance of the targeted structures.
680	An adaptive nearest neighbor rule for classification	Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund, Shay Moran	We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter.
681	Coresets for Clustering with Fairness Constraints	Lingxiao Huang, Shaofeng Jiang, Nisheeth Vishnoi	The main contribution of this paper is an approach to clustering with fairness constraints that involve multiple, non-disjoint attributes, that is also scalable.
682	PerspectiveNet: A Scene-consistent Image Generator for New View Synthesis in Real Indoor Environments	Ben Graham, David Novotny, Jeremy Reizenstein	Given a set of reference RGBD views of an indoor environment, and a new viewpoint, our goal is to predict the view from that location.
683	MAVEN: Multi-Agent Variational Exploration	Anuj Mahajan, Tabish Rashid, Mikayel Samvelyan, Shimon Whiteson	In this paper, we analyse value-based methods that are known to have superior performance in complex environments.
684	Competitive Gradient Descent	Florian Schaefer, Anima Anandkumar	We introduce a new algorithm for the numerical computation of Nash equilibria of competitive two-player games.
685	Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses	Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi	In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression.
686	Continual Unsupervised Representation Learning	Dushyant Rao, Francesco Visin, Andrei Rusu, Razvan Pascanu, Yee Whye Teh, Raia Hadsell	In this work, we propose an approach (CURL) to tackle a more general problem that we will refer to as unsupervised continual learning.
687	Self-Routing Capsule Networks	Taeyoung Hahn, Myeongjang Pyeon, Gunhee Kim	In this work, we propose a novel and surprisingly simple routing strategy called self-routing where each capsule is routed independently by its subordinate routing network.
688	The Parameterized Complexity of Cascading Portfolio Scheduling	Eduard Eiben, Robert Ganian, Iyad Kanj, Stefan Szeider	In this paper we study the parameterized complexity of this problem and establish its fixed-parameter tractability by utilizing structural properties of the success relation between algorithms and test instances.
689	Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards	Falcon Dai, Matthew Walter	We propose a new complexity measure for Markov decision processes (MDPs), the maximum expected hitting cost (MEHC).
690	Bipartite expander Hopfield networks as self-decoding high-capacity error correcting codes	Rishidev Chaudhuri, Ila Fiete	We prove that it is possible to construct an associative content-addressable network that combines the properties of strong error correcting codes and Hopfield networks: it simultaneously possesses exponentially many stable states, these states are robust enough, with large enough basins of attraction that they can be correctly recovered despite errors in a finite fraction of all nodes, and the errors are intrinsically corrected by the network’s own dynamics.
691	Sequence Modeling with Unconstrained Generation Order	Dmitrii Emelianenko, Elena Voita, Pavel Serdyukov	In contrast, we propose a more general model that can generate the output sequence by inserting tokens in any arbitrary order.
692	Probabilistic Logic Neural Networks for Reasoning	Meng Qu, Jian Tang	In this paper, we propose the probabilistic Logic Neural Network (pLogicNet), which combines the advantages of both methods.
693	A Polynomial Time Algorithm for Log-Concave Maximum Likelihood via Locally Exponential Families	Brian Axelrod, Ilias Diakonikolas, Alistair Stewart, Anastasios Sidiropoulos, Gregory Valiant	Specifically, we present an algorithm which, given $n$ points in $\mathbb{R}^d$ and an accuracy parameter $\eps>0$, runs in time $\poly(n,d,1/\eps),$ and returns a log-concave distribution which, with high probability, has the property that the likelihood of the $n$ points under the returned distribution is at most an additive $\eps$ less than the maximum likelihood that could be achieved via any log-concave distribution.
694	A Unifying Framework for Spectrum-Preserving Graph Sparsification and Coarsening	Gecia Bravo Hermsdorff, Lee Gunderson	In this work, we provide a unifying framework that captures both of these operations, allowing one to simultaneously sparsify and coarsen a graph while preserving its large-scale structure.
695	Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond	Xuechen Li, Yi Wu, Lester Mackey	In this paper, we establish the convergence rate of sampling algorithms obtained by discretizing smooth Itô diffusions exhibiting fast $2$-Wasserstein contraction, based on local deviation properties of the integration scheme.
696	The Implicit Bias of AdaGrad on Separable Data	Qian Qian, Xiaoyuan Qian	We study the implicit bias of AdaGrad on separable linear classification problems.
697	On two ways to use determinantal point processes for Monte Carlo integration	Guillaume Gautier, Rémi Bardenet, Michal Valko	In this paper, we first take the EZ estimator out of the cellar, and analyze it using modern arguments. Second, we provide an efficient implementation to sample exactly a particular multidimensional DPP called multivariate Jacobi ensemble.
698	LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition	Zuxuan Wu, Caiming Xiong, Yu-Gang Jiang, Larry S. Davis	This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource efficient video recognition, suitable for both online and offline scenarios.
699	How degenerate is the parametrization of neural networks with the ReLU activation function?	Dennis Maximilian Elbrächter, Julius Berner, Philipp Grohs	We present pathologies which prevent inverse stability in general, and, for shallow networks, proceed to establish a restricted space of parametrizations on which we have inverse stability w.r.t. a Sobolev norm.
700	Spike-Train Level Backpropagation for Training Deep Recurrent Spiking Neural Networks	Wenrui Zhang, Peng Li	To enable supervised training of RSNNs under a well-defined loss function, we present a novel Spike-Train level RSNNs Backpropagation (ST-RSBP) algorithm for training deep RSNNs.
701	Re-examination of the Role of Latent Variables in Sequence Modeling	Guokun Lai, Zihang Dai, Yiming Yang, Shinjae Yoo	Our analysis reveals that under the restriction of fully factorized output distribution in previous evaluations, the stochastic variants were implicitly leveraging intra-step correlation but the deterministic recurrent baselines were prohibited from doing so, resulting in an unfair comparison.
702	Max-value Entropy Search for Multi-Objective Bayesian Optimization	Syrine Belakaria, Aryan Deshwal, Janardhan Rao Doppa	We propose a novel approach referred to as Max-value Entropy Search for Multi-objective Optimization (MESMO) to solve this problem.
703	Stein Variational Gradient Descent With Matrix-Valued Kernels	Dilin Wang, Ziyang Tang, Chandrajit Bajaj, Qiang Liu	In this work, we enhance SVGD by leveraging preconditioning matrices, such as the Hessian and Fisher information matrix, to incorporate geometric information into SVGD updates.
704	Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms	Shahana Ibrahim, Xiao Fu, Nikolaos Kargas, Kejun Huang	We propose an algebraic algorithm reminiscent of convex geometry-based structured matrix factorization to solve the model identification problem efficiently, and an identifiability-enhanced algorithm for handling more challenging and critical scenarios.
705	Detecting Overfitting via Adversarial Examples	Roman Werpachowski, András György, Csaba Szepesvari	We propose a new hypothesis test that uses only the original test data to detect overfitting.
706	A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment	Felix Leibfried, Sergio Pascual-Díaz, Jordi Grau-Moya	In this paper, we investigate the use of empowerment in the presence of an extrinsic reward signal.
707	SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies	Seyed Kamyar Seyed Ghasemipour, Shixiang (Shane) Gu, Richard Zemel	In this work, we propose SMILe, a scalable framework for Meta Inverse Reinforcement Learning (Meta-IRL) based on maximum entropy IRL, which can learn high-quality policies from few demonstrations.
708	Towards Understanding the Importance of Shortcut Connections in Residual Networks	Tianyi Liu, Minshuo Chen, Mo Zhou, Simon S. Du, Enlu Zhou, Tuo Zhao	In this paper, we study a two-layer non-overlapping convolutional ResNet.
709	Modular Universal Reparameterization: Deep Multi-task Learning Across Diverse Domains	Elliot Meyerson, Risto Miikkulainen	To approach this question, deep multi-task learning is extended in this paper to the setting where there is no obvious overlap between task architectures.
710	Solving Interpretable Kernel Dimensionality Reduction	Chieh Wu, Jared Miller, Yale Chang, Mario Sznaier, Jennifer Dy	This work extends the theoretical guarantees of ISM to an entire family of kernels, thereby empowering ISM to solve any kernel method of the same objective.
711	Interaction Hard Thresholding: Consistent Sparse Quadratic Regression in Sub-quadratic Time and Space	Shuo Yang, Yanyao Shen, Sujay Sanghavi	In this paper, we provide a new algorithm – Interaction Hard Thresholding (IntHT) which is the first one to provably accurately solve this problem in sub-quadratic time and space.
712	A Model to Search for Synthesizable Molecules	John Bradshaw, Brooks Paige, Matt J. Kusner, Marwin Segler, José Miguel Hernández-Lobato	We propose a new molecule generation model, mirroring a more realistic real-world process, where (a) reactants are selected, and (b) combined to form more complex molecules.
713	Post training 4-bit quantization of convolutional networks for rapid-deployment	Ron Banner, Yury Nahshan, Daniel Soudry	This paper introduces the first practical 4-bit post training quantization approach: it does not involve training the quantized model (fine-tuning), nor does it require the availability of the full dataset.
714	Fast and Flexible Multi-Task Classification using Conditional Neural Adaptive Processes	James Requeima, Jonathan Gordon, John Bronskill, Sebastian Nowozin, Richard E. Turner	The goal of this paper is to design image classification systems that, after an initial multi-task training phase, can automatically adapt to new tasks encountered at test time.
715	Differentially Private Anonymized Histograms	Ananda Theertha Suresh	Motivated by these applications, we propose the first differentially private mechanism to release anonymized histograms that achieves near-optimal privacy utility trade-off both in terms of number of items and the privacy parameter.
716	Dynamic Local Regret for Non-convex Online Forecasting	Sergul Aydore, Tianhao Zhu, Dean P. Foster	We introduce a local regret for non-convex models in a dynamic environment.
717	Learning Local Search Heuristics for Boolean Satisfiability	Emre Yolcu, Barnabas Poczos	We present an approach to learn SAT solver heuristics from scratch through deep reinforcement learning with a curriculum.
718	Provably Efficient Q-Learning with Low Switching Cost	Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang	Our main contribution, Q-Learning with UCB2 exploration, is a model-free algorithm for $H$-step episodic MDP that achieves sublinear regret whose local switching cost in $K$ episodes is $O(H^3SA\log K)$, and we provide a lower bound of $\Omega(HSA)$ on the local switching cost for any no-regret algorithm.
719	Solving graph compression via optimal transport	Vikas Garg, Tommi Jaakkola	We propose a new approach to graph compression by appeal to optimal transport.
720	PyTorch: An Imperative Style, High-Performance Deep Learning Library	Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala	In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture.
721	Stability of Graph Scattering Transforms	Fernando Gama, Alejandro Ribeiro, Joan Bruna	In this work, we extend scattering transforms to network data by using multi-resolution graph wavelets, whose computation can be obtained by means of graph convolutions.
722	A Debiased MDI Feature Importance Measure for Random Forests	Xiao Li, Yu Wang, Sumanta Basu, Karl Kumbier, Bin Yu	In this paper, we address the feature selection bias of MDI from both theoretical and methodological perspectives.
723	Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle	Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang	The current paper presents a provably efficient algorithm for Q-learning with linear function approximation.
724	Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models	Shanshan Wu, Sujay Sanghavi, Alexandros G. Dimakis	The algorithm is (appropriately regularized) maximum conditional log-likelihood, which involves solving a convex program for each node; for Ising models this is $\ell_1$-constrained logistic regression, while for more general alphabets an $\ell_{2,1}$ group-norm constraint needs to be used.
725	Fast Convergence of Natural Gradient Descent for Over-Parameterized Neural Networks	Guodong Zhang, James Martens, Roger B. Grosse	In this work, we analyze for the first time the speed of convergence to global optimum for natural gradient descent on non-linear neural networks with the squared error loss.
726	Rapid Convergence of the Unadjusted Langevin Algorithm: Isoperimetry Suffices	Santosh Vempala, Andre Wibisono	We study the Unadjusted Langevin Algorithm (ULA) for sampling from a probability distribution $\nu = e^{-f}$ on $\R^n$.
727	Learning Distributions Generated by One-Layer ReLU Networks	Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi	We consider the problem of estimating the parameters of a $d$-dimensional rectified Gaussian distribution from i.i.d. samples.
728	Large-scale optimal transport map estimation using projection pursuit	Cheng Meng, Yuan Ke, Jingyi Zhang, Mengrui Zhang, Wenxuan Zhong, Ping Ma	Instead, we propose an estimation method of large-scale OTM by combining the idea of projection pursuit regression and sufficient dimension reduction.
729	A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning	Nicolas Carion, Nicolas Usunier, Gabriel Synnaeve, Alessandro Lazaric	By leveraging this property, we introduce a novel structured prediction approach to assign agents to tasks.
730	On Exact Computation with an Infinitely Wide Neural Net	Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Russ R. Salakhutdinov, Ruosong Wang	The current paper gives the first efficient exact algorithm for computing the extension of NTK to convolutional neural nets, which we call Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm.
731	Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning	Gregory Farquhar, Shimon Whiteson, Jakob Foerster	Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation.
732	Chirality Nets for Human Pose Regression	Raymond Yeh, Yuan-Ting Hu, Alexander Schwing	We propose Chirality Nets, a family of deep nets that is equivariant to the “chirality transform,” i.e., the transformation to create a chiral pair.
733	Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds	Minshuo Chen, Haoming Jiang, Wenjing Liao, Tuo Zhao	In this paper, we prove that neural networks can efficiently approximate functions supported on low dimensional manifolds.
734	Fast Decomposable Submodular Function Minimization using Constrained Total Variation	Senanayak Sesh Kumar Karri, Francis Bach, Thomas Pock	In this paper, we consider a modified convex problem requiring constrained version of the total variation oracles that can be solved with significantly fewer calls to the simple minimization oracles.
735	Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model	Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George Dahl, Chris Shallue, Roger B. Grosse	In this work, we study how the critical batch size changes based on properties of the optimization algorithm, including acceleration and preconditioning, through two different lenses: large scale experiments and analysis using a simple noisy quadratic model (NQM).
736	Spherical Text Embedding	Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, Jiawei Han	To close this gap, we propose a spherical generative model based on which unsupervised word and paragraph embeddings are jointly learned.
737	Mobius Transformation for Fast Inner Product Search on Graph	Zhixin Zhou, Shulong Tan, Zhaozhuo Xu, Ping Li	We present a fast search on graph algorithm for Maximum Inner Product Search (MIPS).
738	Hyperbolic Graph Neural Networks	Qi Liu, Maximilian Nickel, Douwe Kiela	Motivated by recent advances in geometric representation learning, we propose a novel GNN architecture for learning representations on Riemannian manifolds with differentiable exponential and logarithmic maps.
739	Average Individual Fairness: Algorithms, Generalization and Experiments	Saeed Sharifi-Malvajerdi, Michael Kearns, Aaron Roth	We propose a new family of fairness definitions for classification problems that combine some of the best properties of both statistical and individual notions of fairness.
740	Fixing the train-test resolution discrepancy	Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Herve Jegou	This paper first shows that existing augmentations induce a significant discrepancy between the size of the objects seen by the classifier at train and test time: in fact, a lower train resolution improves the classification at test time! We then propose a simple strategy to optimize the classifier performance, that employs different train and test resolutions.
741	Modeling Dynamic Functional Connectivity with Latent Factor Gaussian Processes	Lingge Li, Dustin Pluta, Babak Shahbaba, Norbert Fortin, Hernando Ombao, Pierre Baldi	We present a latent factor Gaussian process model which addresses these challenges by learning a parsimonious representation of connectivity dynamics.
742	Manipulating a Learning Defender and Ways to Counteract	Jiarui Gan, Qingyu Guo, Long Tran-Thanh, Bo An, Michael Wooldridge	In this paper, we show that, however, these algorithms can be easily manipulated if the attacker responds untruthfully.
743	Learning-In-The-Loop Optimization: End-To-End Control And Co-Design Of Soft Robots Through Learned Deep Latent Representations	Andrew Spielberg, Allan Zhao, Yuanming Hu, Tao Du, Wojciech Matusik, Daniela Rus	We present a learning-in-the-loop co-optimization algorithm in which a latent state representation is learned as the robot figures out how to solve the task.
744	Learning to Infer Implicit Surfaces without 3D Supervision	Shichen Liu, Shunsuke Saito, Weikai Chen, Hao Li	To this end, we propose a novel ray-based field probing technique for efficient image-to-field supervision, as well as a general geometric regularizer for implicit surfaces, which provides natural shape priors in unconstrained regions.
745	Fast and Accurate Least-Mean-Squares Solvers	Ibrahim Jubran, Alaa Maalouf, Dan Feldman	We suggest an algorithm that gets a finite set of $n$ $d$-dimensional real vectors and returns a weighted subset of $d+1$ vectors whose sum is exactly the same.
746	Certifiable Robustness to Graph Perturbations	Aleksandar Bojchevski, Stephan Günnemann	We propose the first method for verifying certifiable (non-)robustness to graph perturbations for a general class of models that includes graph neural networks and label/feature propagation.
747	Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay	Frederic Koehler	We show that under a natural initialization, BP converges quickly to the global optimum of the Bethe free energy for Ising models on arbitrary graphs, as long as the Ising model is ferromagnetic (i.e. neighbors prefer to be aligned).
748	Paradoxes in Fair Machine Learning	Paul Goelz, Anson Kahng, Ariel D. Procaccia	We extend equalized odds to the setting of cardinality-constrained fair classification, where we have a bounded amount of a resource to distribute.
749	Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost	Zhuoran Yang, Yongxin Chen, Mingyi Hong, Zhaoran Wang	To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning.
750	The spiked matrix model with generative priors	Benjamin Aubin, Bruno Loureiro, Antoine Maillard, Florent Krzakala, Lenka Zdeborová	In this paper we study spiked matrix models, where a low-rank matrix is observed through a noisy channel.
751	Gradient Dynamics of Shallow Univariate ReLU Networks	Francis Williams, Matthew Trager, Daniele Panozzo, Claudio Silva, Denis Zorin, Joan Bruna	We present a theoretical and empirical study of the gradient dynamics of overparameterized shallow ReLU networks with one-dimensional input, solving least-squares interpolation.
752	Robust and Communication-Efficient Collaborative Learning	Amirhossein Reisizadeh, Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani	In this paper, we tackle these bottlenecks by proposing a novel decentralized and gradient-based optimization algorithm named QuanTimed-DSGD.
753	Multiclass Learning from Contradictions	Sauptik Dhar, Vladimir Cherkassky, Mohak Shah	We introduce the notion of learning from contradictions, a.k.a. Universum learning, for multiclass problems and propose a novel formulation for multiclass Universum SVM (MU-SVM).
754	Learning from Trajectories via Subgoal Discovery	Sujoy Paul, Jeroen van Baar, Amit Roy-Chowdhury	In this paper, we propose an approach which uses the expert trajectories and learns to decompose the complex main task into smaller sub-goals.
755	Distributed Low-rank Matrix Factorization With Exact Consensus	Zhihui Zhu, Qiuwei Li, Xinshuo Yang, Gongguo Tang, Michael B. Wakin	In this paper, we study low-rank matrix factorization in the distributed setting, where local variables at each node encode parts of the overall matrix factors, and consensus is encouraged among certain such variables.
756	Online Normalization for Training Neural Networks	Vitaliy Chiley, Ilya Sharapov, Atli Kosson, Urs Koster, Ryan Reece, Sofia Samaniego de la Fuente, Vishal Subbiah, Michael James	We resolve a theoretical limitation of Batch Normalization by introducing an unbiased technique for computing the gradient of normalized activations.
757	The Synthesis of XNOR Recurrent Neural Networks with Stochastic Logic	Arash Ardakani, Zhengyun Ji, Amir Ardakani, Warren Gross	In this paper, we propose a method that converts all the multiplications in LSTMs to XNOR operations using stochastic computing.
758	An adaptive Mirror-Prox method for variational inequalities with singular operators	Kimon Antonakopoulos, Veronica Belmega, Panayotis Mertikopoulos	To address this issue, we propose a novel smoothness condition which we call Bregman smoothness, and which relates the variation of the operator to that of a suitably chosen Bregman function.
759	N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules	Shengchao Liu, Mehmet F. Demirel, Yingyu Liang	This paper introduces the N-gram graph, a simple unsupervised representation for molecules.
760	Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory	Bin Hu, Usman Syed	In this paper, we provide a unified analysis of temporal difference learning algorithms with linear function approximators by exploiting their connections to Markov jump linear systems (MJLS).
761	Facility Location Problem in Differential Privacy Model Revisited	Yunus Esencayi, Marco Gaboardi, Shi Li, Di Wang	In this paper we study the facility location problem in the model of differential privacy (DP) with uniform facility cost.
762	Energy-Inspired Models: Learning with Sampler-Induced Distributions	John Lawson, George Tucker, Bo Dai, Rajesh Ranganath	This yields a class of energy-inspired models (EIMs) that incorporate learned energy functions while still providing exact samples and tractable log-likelihood lower bounds.
763	Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator	Karl Krauth, Stephen Tu, Benjamin Recht	We study the sample complexity of approximate policy iteration (PI) for the Linear Quadratic Regulator (LQR), building on a recent line of work using LQR as a testbed to understand the limits of reinforcement learning (RL) algorithms on continuous control tasks.
764	A Universally Optimal Multistage Accelerated Stochastic Gradient Method	Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar	We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal rate both in the deterministic and stochastic case and operates without knowledge of noise characteristics.
765	From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction	Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen Baccus, Surya Ganguli	We develop such a systematic approach by combining dimensionality reduction and modern attribution methods for determining the relative importance of interneurons for specific visual computations.
766	Large Memory Layers with Product Keys	Guillaume Lample, Alexandre Sablayrolles, Marc’Aurelio Ranzato, Ludovic Denoyer, Herve Jegou	This paper introduces a structured memory which can be easily integrated into a neural network.
767	Learning Deterministic Weighted Automata with Queries and Counterexamples	Gail Weiss, Yoav Goldberg, Eran Yahav	We present an algorithm for reconstruction of a probabilistic deterministic finite automaton (PDFA) from a given black-box language model, such as a recurrent neural network (RNN).
768	Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent	Jaehoon Lee, Lechao Xiao, Samuel Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, Jeffrey Pennington	In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
769	Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals	Surbhi Goel, Sushrut Karmalkar, Adam Klivans	We consider the problem of computing the best-fitting ReLU with respect to square-loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary).
770	Visualizing and Measuring the Geometry of BERT	Emily Reif, Ann Yuan, Martin Wattenberg, Fernanda B. Viegas, Andy Coenen, Adam Pearce, Been Kim	This paper describes qualitative and quantitative investigations of one particularly effective model, BERT.
771	Self-Critical Reasoning for Robust Visual Question Answering	Jialin Wu, Raymond Mooney	To address this issue, we introduce a self-critical training objective that ensures that visual explanations of correct answers match the most influential image regions more than other competitive answer candidates.
772	Learning to Screen	Alon Cohen, Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Shay Moran	We model such scenarios as an assignment problem between items (candidates) and categories (departments): the items arrive one-by-one in an online manner, and upon processing each item the algorithm decides, based on its value and the categories it can be matched with, whether to retain or discard it (this decision is irrevocable).
773	A Communication Efficient Stochastic Multi-Block Alternating Direction Method of Multipliers	Hao Yu	In this paper, we propose a new parallel multi-block stochastic ADMM for distributed stochastic optimization, where each node is only required to perform simple stochastic gradient descent updates.
774	A Little Is Enough: Circumventing Defenses For Distributed Learning	Gilad Baruch, Moran Baruch, Yoav Goldberg	We observe that if the empirical variance between the gradients of workers is high enough, an attacker could take advantage of this and launch a non-omniscient attack that operates within the population variance.
775	Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks	Gunjan Verma, Ananthram Swami	In this paper, we propose a fundamentally different approach which instead changes the way the output is represented and decoded.
776	A Robust Non-Clairvoyant Dynamic Mechanism for Contextual Auctions	Yuan Deng, Sébastien Lahaie, Vahab Mirrokni	In this paper, we consider the problem of contextual auctions where the seller gradually learns a model of the buyer’s valuation as a function of the context (e.g., item features) and seeks a pricing policy that optimizes revenue.
777	Finite-Sample Analysis for SARSA with Linear Function Approximation	Shaofeng Zou, Tengyu Xu, Yingbin Liang	In this paper, we develop a novel technique to explicitly characterize the stochastic bias of a type of stochastic approximation procedures with time-varying Markov transition kernels.
778	Who is Afraid of Big Bad Minima? Analysis of gradient-flow in spiked matrix-tensor models	Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Lenka Zdeborová	Here we present a quantitative theory explaining this behaviour in a spiked matrix-tensor model.
779	Graph Structured Prediction Energy Networks	Colin Graber, Alexander Schwing	To address this shortcoming, we introduce ‘Graph Structured Prediction Energy Networks,’ for which we develop inference techniques that model both explicit local and implicit higher-order correlations while maintaining tractability of inference.
780	Private Learning Implies Online Learning: An Efficient Reduction	Alon Gonen, Elad Hazan, Shay Moran	In this paper we resolve this open question in the context of pure differential privacy.
781	Graph Agreement Models for Semi-Supervised Learning	Otilia Stretcu, Krishnamurthy Viswanathan, Dana Movshovitz-Attias, Emmanouil Platanios, Sujith Ravi, Andrew Tomkins	To address this, we propose Graph Agreement Models (GAM), which introduces an auxiliary model that predicts the probability of two nodes sharing the same label as a learned function of their features.
782	Latent distance estimation for random geometric graphs	Ernesto Araya Valdivia, Yohann De Castro	We introduce a spectral estimator of the pairwise distance between latent points and we prove that its rate of convergence is the same as the nonparametric estimation of a function on $\mathbb{S}^{d-1}$, up to a logarithmic factor.
783	Seeing the Wind: Visual Wind Speed Prediction with a Coupled Convolutional and Recurrent Neural Network	Jennifer Cardona, Michael Howland, John Dabiri	Here, we demonstrate a coupled convolutional neural network and recurrent neural network architecture that extracts the wind speed encoded in visually recorded flow-structure interactions of a flag and tree in naturally occurring wind.
784	The Functional Neural Process	Christos Louizos, Xiahan Shi, Klamer Schutte, Max Welling	We present a new family of exchangeable stochastic processes, the Functional Neural Processes (FNPs).
785	Recurrent Registration Neural Networks for Deformable Image Registration	Robin Sandkühler, Simon Andermatt, Grzegorz Bauman, Sylvia Nyilas, Christoph Jud, Philippe C. Cattin	We reformulate the pairwise registration problem as a recursive sequence of successive alignments.
786	Unsupervised State Representation Learning in Atari	Ankesh Anand, Evan Racah, Sherjil Ozair, Yoshua Bengio, Marc-Alexandre Côté, R Devon Hjelm	We introduce a method that tries to learn better state representations by maximizing mutual information across spatially and temporally distinct features of a neural encoder of the observations.
787	Unlocking Fairness: a Trade-off Revisited	Michael Wick, Swetasudha Panda, Jean-Baptiste Tristan	We investigate fairness and accuracy, but this time under a variety of controlled conditions in which we vary the amount and type of bias.
788	Fisher Efficient Inference of Intractable Models	Song Liu, Takafumi Kanamori, Wittawat Jitkrittum, Yu Chen	In this paper, we derive a Discriminative Likelihood Estimator (DLE) from the Kullback-Leibler divergence minimization criterion implemented via density ratio estimation and a Stein operator.
789	Thompson Sampling and Approximate Inference	My Phan, Yasin Abbasi Yadkori, Justin Domke	We study the effects of approximate inference on the performance of Thompson sampling in the $k$-armed bandit problems.
790	PRNet: Self-Supervised Learning for Partial-to-Partial Registration	Yue Wang, Justin M. Solomon	We present a simple, flexible, and general framework titled Partial Registration Network (PRNet), for partial-to-partial point cloud registration.
791	Surrogate Objectives for Batch Policy Optimization in One-step Decision Making	Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans	We investigate batch policy optimization for cost-sensitive classification and contextual bandits—two related tasks that obviate exploration but require generalizing from observed rewards to action selections in unseen contexts.
792	Modelling heterogeneous distributions with an Uncountable Mixture of Asymmetric Laplacians	Axel Brando, Jose A. Rodriguez, Jordi Vitria, Alberto Rubio Muñoz	In this paper, we propose a generic deep learning framework that learns an Uncountable Mixture of Asymmetric Laplacians (UMAL), which allows us to estimate heterogeneous distributions of the output variable, and we show its connections to quantile regression.
793	Learning Macroscopic Brain Connectomes via Group-Sparse Factorization	Farzane Aminmansour, Andrew Patterson, Lei Le, Yisu Peng, Daniel Mitchell, Franco Pestilli, Cesar F. Caiafa, Russell Greiner, Martha White	In this work, we explore a framework that facilitates applying learning algorithms to automatically extract brain connectomes.
794	Approximating the Permanent by Sampling from Adaptive Partitions	Jonathan Kuck, Tri Dao, Hamid Rezatofighi, Ashish Sabharwal, Stefano Ermon	We present ADAPART, a simple and efficient method for exact sampling of permutations, each associated with a weight as determined by a matrix.
795	Retrosynthesis Prediction with Conditional Graph Logic Network	Hanjun Dai, Chengtao Li, Connor Coley, Bo Dai, Le Song	In this work, we propose a new approach to this task using the Conditional Graph Logic Network, a conditional graphical model built upon graph neural networks that learns when rules from reaction templates should be applied, implicitly considering whether the resulting reaction would be both chemically feasible and strategic.
796	Procrastinating with Confidence: Near-Optimal, Anytime, Adaptive Algorithm Configuration	Robert Kleinberg, Kevin Leyton-Brown, Brendan Lucier, Devon Graham	This paper introduces a new algorithm, “Structured Procrastination with Confidence”, that preserves the near-optimality and anytime properties of Structured Procrastination while adding adaptivity.
797	Online Learning via the Differential Privacy Lens	Jacob D. Abernethy, Young Hun Jung, Chansoo Lee, Audra McMillan, Ambuj Tewari	In this paper, we use differential privacy as a lens to examine online learning in both full and partial information settings.
798	PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points	Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu	To address this challenge, we propose to adopt perspective points as a new intermediate representation for 3D object detection, defined as the 2D projections of local Manhattan 3D keypoints to locate an object; these perspective points satisfy geometric constraints imposed by the perspective projection.
799	Parameter elimination in particle Gibbs sampling	Anna Wigren, Riccardo Sven Risuleo, Lawrence Murray, Fredrik Lindsten	We focus on particle Gibbs (PG) and particle Gibbs with ancestor sampling (PGAS), improving their performance beyond that of the ideal Gibbs sampler (which they approximate) by marginalizing out one or more parameters.
800	This Looks Like That: Deep Learning for Interpretable Image Recognition	Chaofan Chen, Oscar Li, Daniel Tao, Alina Barnett, Cynthia Rudin, Jonathan K. Su	In this work, we introduce a deep network architecture — prototypical part network (ProtoPNet), that reasons in a similar way: the network dissects the image by finding prototypical parts, and combines evidence from the prototypes to make a final classification.
801	Adaptively Aligned Image Captioning via Adaptive Attention Time	Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen	In this paper, we propose a novel attention model, namely Adaptive Attention Time (AAT), to align the source and the target adaptively for image captioning.
802	Accurate Uncertainty Estimation and Decomposition in Ensemble Learning	Jeremiah Liu, John Paisley, Marianthi-Anna Kioumourtzoglou, Brent Coull	We introduce a Bayesian nonparametric ensemble (BNE) approach that augments an existing ensemble model to account for different sources of model uncertainty.
803	Learning Bayesian Networks with Low Rank Conditional Probability Tables	Adarsh Barik, Jean Honorio	In this paper, we provide a method to learn the directed structure of a Bayesian network using data.
804	Equal Opportunity in Online Classification with Partial Feedback	Yahav Bechavod, Katrina Ligett, Aaron Roth, Bo Waggoner, Steven Z. Wu	We study an online classification problem with partial feedback in which individuals arrive one at a time from a fixed but unknown distribution, and must be classified as positive or negative.
805	Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations	Kevin Smith, Lingjie Mei, Shunyu Yao, Jiajun Wu, Elizabeth Spelke, Josh Tenenbaum, Tomer Ullman	We propose ADEPT, a model that uses a coarse (approximate geometry) object-centric representation for dynamic 3D scene understanding.
806	Neural Multisensory Scene Inference	Jae Hyun Lim, Pedro O. O. Pinheiro, Negar Rostamzadeh, Chris Pal, Sungjin Ahn	In this paper, we propose the Generative Multisensory Network (GMN) for learning latent representations of 3D scenes which are partially observable through multiple sensory modalities.
807	Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems	Young Hun Jung, Ambuj Tewari	In this paper, we analyze the performance of Thompson sampling in episodic restless bandits with unknown parameters.
808	What Can ResNet Learn Efficiently, Going Beyond Kernels?	Zeyuan Allen-Zhu, Yuanzhi Li	We prove neural networks can efficiently learn a notable class of functions, including those defined by three-layer residual networks with smooth activations, without any distributional assumption.
809	Better Transfer Learning with Inferred Successor Maps	Tamas Madarasz, Tim Behrens	We thus provide a novel algorithmic approach for multi-task learning, as well as a common normative framework that links together these different characteristics of the brain’s spatial representation.
810	Unsupervised Co-Learning on $\mathcal{G}$-Manifolds Across Irreducible Representations	Yifeng Fan, Tingran Gao, Zhizhen Jane Zhao	We introduce a novel co-learning paradigm for manifolds naturally admitting an action of a transformation group $\mathcal{G}$, motivated by recent developments on learning a manifold from attached fibre bundle structures.
811	Defending Against Neural Fake News	Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi	We thus present a model for controllable text generation called Grover.
812	Sample Adaptive MCMC	Michael Zhu	In this paper, we present Sample Adaptive MCMC (SA-MCMC), an MCMC method based on a reversible Markov chain for $\pi^{\otimes N}$ that uses an adaptive proposal distribution based on the current state of $N$ points and a sequential substitution procedure with one new likelihood evaluation per iteration and at most one updated point each iteration.
813	A Stochastic Composite Gradient Method with Incremental Variance Reduction	Junyu Zhang, Lin Xiao	We propose a stochastic composite gradient method that employs incremental variance-reduced estimators for both the inner vector mapping and its Jacobian.
814	Nonparametric Density Estimation & Convergence Rates for GANs under Besov IPM Losses	Ananya Uppal, Shashank Singh, Barnabas Poczos	We study the problem of estimating a nonparametric probability distribution under a family of losses called Besov IPMs.
815	STAR-Caps: Capsule Networks with Straight-Through Attentive Routing	Karim Ahmed, Lorenzo Torresani	In this work, we propose STAR-Caps, a capsule-based network that exploits straight-through attentive routing to address the drawbacks of capsule networks.
816	Limitations of Lazy Training of Two-layers Neural Network	Song Mei, Theodor Misiakiewicz, Behrooz Ghorbani, Andrea Montanari	We study the supervised learning problem under either of the following two models: (1) Feature vectors $x_i$ are $d$-dimensional Gaussian and responses are $y_i = f_*(x_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors $x_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and the $y_i$’s are the corresponding class labels.
817	Reconciling meta-learning and continual learning with online mixtures of tasks	Ghassen Jerfel, Erin Grant, Tom Griffiths, Katherine A. Heller	We use the connection between gradient-based meta-learning and hierarchical Bayes to propose a Dirichlet process mixture of hierarchical Bayesian models over the parameters of an arbitrary parametric model such as a neural network.
818	Distributionally Robust Optimization and Generalization in Kernel Methods	Matthew Staib, Stefanie Jegelka	In this paper, we study DRO with uncertainty sets measured via maximum mean discrepancy (MMD).
819	A General Theory of Equivariant CNNs on Homogeneous Spaces	Taco S. Cohen, Mario Geiger, Maurice Weiler	We present a general theory of Group equivariant Convolutional Neural Networks (G-CNNs) on homogeneous spaces such as Euclidean space and the sphere.
820	Trivializations for Gradient-Based Optimization on Manifolds	Mario Lezcano Casado	We introduce a framework to study the transformation of problems with manifold constraints into unconstrained problems through parametrizations in terms of a Euclidean space.
821	Write, Execute, Assess: Program Synthesis with a REPL	Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, Armando Solar-Lezama	We present a neural program synthesis approach integrating components which write, execute, and assess code to navigate the search space of possible programs.
822	A Meta-Analysis of Overfitting in Machine Learning	Rebecca Roelofs, Vaishaal Shankar, Benjamin Recht, Sara Fridovich-Keil, Moritz Hardt, John Miller, Ludwig Schmidt	We conduct the first large meta-analysis of overfitting due to test set reuse in the machine learning community.
823	(Nearly) Efficient Algorithms for the Graph Matching Problem on Correlated Random Graphs	Boaz Barak, Chi-Ning Chou, Zhixian Lei, Tselil Schramm, Yueqi Sheng	We consider the graph matching/similarity problem of determining how similar two given graphs $G_0,G_1$ are and recovering the permutation $\pi$ on the vertices of $G_1$ that minimizes the symmetric difference between the edges of $G_0$ and $\pi(G_1)$.
824	Preference-Based Batch and Sequential Teaching: Towards a Unified View of Models	Farnam Mansouri, Yuxin Chen, Ara Vartanian, Jerry Zhu, Adish Singla	To better understand the connections between these different batch and sequential models, we develop a novel framework which captures the teaching process via preference functions.
825	Online Continuous Submodular Maximization: From Full-Information to Bandit Feedback	Mingrui Zhang, Lin Chen, Hamed Hassani, Amin Karbasi	In this paper, we propose three online algorithms for submodular maximization.
826	Sampling Networks and Aggregate Simulation for Online POMDP Planning	Hao (Jackson) Cui, Roni Khardon	The paper introduces a new algorithm for planning in partially observable Markov decision processes (POMDPs) based on the idea of aggregate simulation.
827	Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks	Gabriele Farina, Chun Kai Ling, Fei Fang, Tuomas Sandholm	To showcase how this novel formulation can inspire new algorithms to compute EFCEs, we propose a simple subgradient descent method which exploits this formulation and structural properties of EFCEs.
828	GNNExplainer: Generating Explanations for Graph Neural Networks	Zhitao Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, Jure Leskovec	Here we propose GNNExplainer, the first general, model-agnostic approach for providing interpretable explanations for predictions of any GNN-based model on any graph-based machine learning task.
829	Linear Stochastic Bandits Under Safety Constraints	Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis	In this paper, we formulate a linear stochastic multi-armed bandit problem with safety constraints that depend (linearly) on an unknown parameter vector.
830	A coupled autoencoder approach for multi-modal analysis of cell types	Rohan Gala, Nathan Gouwens, Zizhen Yao, Agata Budzillo, Osnat Penn, Bosiljka Tasic, Gabe Murphy, Hongkui Zeng, Uygar Sümbül	We pose this issue of cross-modal alignment as an optimization problem and develop an approach based on coupled training of autoencoders as a framework for such analyses.
831	Towards Automatic Concept-based Explanations	Amirata Ghorbani, James Wexler, James Y. Zou, Been Kim	In this work, we propose principles and desiderata for concept-based explanation, which goes beyond per-sample features to identify higher level human-understandable concepts that apply across the entire dataset.
832	Deep Generative Video Compression	Salvator Lombardo, Jun Han, Christopher Schroers, Stephan Mandt	Here, we propose an end-to-end, deep generative modeling approach to compress temporal sequences with a focus on video.
833	Budgeted Reinforcement Learning in Continuous State Space	Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin	This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs.
834	Discovery of Useful Questions as Auxiliary Tasks	Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Janarthanan Rajendran, Richard L. Lewis, Junhyuk Oh, Hado P. van Hasselt, David Silver, Satinder Singh	We present a novel method for a reinforcement learning (RL) agent to discover questions formulated as general value functions or GVFs, a fairly rich form of knowledge representation.
835	Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm	Giulia Luise, Saverio Salzo, Massimiliano Pontil, Carlo Ciliberto	We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence.
836	Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias	Stéphane d’Ascoli, Levent Sagun, Giulio Biroli, Joan Bruna	The aim of this work is to understand this fact through the lens of dynamics in the loss landscape.
837	Correlation clustering with local objectives	Sanchit Kalhan, Konstantin Makarychev, Timothy Zhou	In this paper, we study algorithms for minimizing $\ell_q$ norms ($q \geq 1$) of the disagreements vector for both arbitrary and complete graphs.
838	Multiclass Performance Metric Elicitation	Gaurush Hiranandani, Shant Boodaghians, Ruta Mehta, Oluwasanmi O. Koyejo	In this paper, we propose novel strategies for eliciting multiclass classification performance metrics using only relative preference feedback.
839	Algorithmic Analysis and Statistical Estimation of SLOPE via Approximate Message Passing	Zhiqi Bu, Jason Klusowski, Cynthia Rush, Weijie Su	In this paper, we develop an asymptotically exact characterization of the SLOPE solution under Gaussian random designs through solving the SLOPE problem using approximate message passing (AMP).
840	Explicit Explore-Exploit Algorithms in Continuous State Spaces	Mikael Henaff	We present a new model-based algorithm for reinforcement learning (RL) which consists of explicit exploration and exploitation phases, and is applicable in large or infinite state spaces.
841	ADDIS: an adaptive discarding algorithm for online FDR control with conservative nulls	Jinjin Tian, Aaditya Ramdas	In this work, we introduce a new adaptive discarding method called ADDIS that provably controls the FDR and achieves the best of both worlds: it enjoys appreciable power increase over all existing methods if nulls are conservative (the practical case), and rarely loses power if nulls are exactly uniformly distributed (the ideal case).
842	Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices	Vincent Chen, Sen Wu, Alexander J. Ratner, Jen Weng, Christopher Ré	We propose Slice-based Learning, a new programming model in which the slicing function (SF), a programmer abstraction, is used to specify additional model capacity for each slice.
843	Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse	James Lucas, George Tucker, Roger B. Grosse, Mohammad Norouzi	This paper presents a simple and intuitive explanation for posterior collapse through the analysis of linear VAEs and their direct correspondence with Probabilistic PCA (pPCA).
844	Language as an Abstraction for Hierarchical Deep Reinforcement Learning	YiDing Jiang, Shixiang (Shane) Gu, Kevin P. Murphy, Chelsea Finn	In this paper, we propose to use language as the abstraction, as it provides unique compositional structure, enabling fast learning and combinatorial generalization, while retaining tremendous flexibility, making it suitable for a variety of problems.
845	Efficient online learning with kernels for adversarial large scale problems	Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi	Our contributions are twofold: 1) For the Gaussian kernel, we propose to build the basis beforehand (independently of the data) through Taylor expansion.
846	A Linearly Convergent Method for Non-Smooth Non-Convex Optimization on the Grassmannian with Applications to Robust Subspace and Dictionary Learning	Zhihui Zhu, Tianyu Ding, Daniel Robinson, Manolis Tsakiris, René Vidal	In this paper we show that if the objective satisfies a certain Riemannian regularity condition with respect to some point in the Grassmannian, then a Riemannian subgradient method with appropriate initialization and geometrically diminishing step size converges at a linear rate to that point.
847	ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models	Andrei Barbu, David Mayo, Julian Alverio, William Luo, Christopher Wang, Dan Gutfreund, Josh Tenenbaum, Boris Katz	We collect a large real-world test set, ObjectNet, for object recognition with controls where object backgrounds, rotations, and imaging viewpoints are random.
848	Certified Adversarial Robustness with Additive Noise	Bai Li, Changyou Chen, Wenlin Wang, Lawrence Carin	To address these limitations, we introduce a framework that is scalable and provides certified bounds on the norm of the input manipulation for constructing adversarial examples.
849	Tight Dimensionality Reduction for Sketching Low Degree Polynomial Kernels	Michela Meister, Tamas Sarlos, David Woodruff	We give a new analysis of this sketch, providing nearly optimal bounds.
850	Non-Cooperative Inverse Reinforcement Learning	Xiangyuan Zhang, Kaiqing Zhang, Erik Miehling, Tamer Basar	To describe such strategic situations, we introduce the non-cooperative inverse reinforcement learning (N-CIRL) formalism.
851	DINGO: Distributed Newton-Type Method for Gradient-Norm Optimization	Rixon Crane, Fred Roosta	For optimization of a large sum of functions in a distributed computing environment, we present a novel communication efficient Newton-type algorithm that enjoys a variety of advantages over similar existing methods.
852	Sobolev Independence Criterion	Youssef Mroueh, Tom Sercu, Mattia Rigotti, Inkit Padhi, Cicero Nogueira dos Santos	We propose the Sobolev Independence Criterion (SIC), an interpretable dependency measure between a high dimensional random variable X and a response variable Y. SIC decomposes to the sum of feature importance scores and hence can be used for nonlinear feature selection.
853	Maximum Entropy Monte-Carlo Planning	Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller	We develop a new algorithm for online planning in large scale sequential decision problems that improves upon the worst case efficiency of UCT.
854	Learning from brains how to regularize machines	Zhe Li, Wieland Brendel, Edgar Walker, Erick Cobos, Taliah Muhammad, Jacob Reimer, Matthias Bethge, Fabian Sinz, Zachary Pitkow, Andreas Tolias	We propose to regularize CNNs using large-scale neuroscience data to learn more robust neural features in terms of representational similarity.
855	Using Statistics to Automate Stochastic Optimization	Hunter Lang, Lin Xiao, Pengchuan Zhang	Rather than changing the learning rate at each iteration, we propose an approach that automates the most common hand-tuning heuristic: use a constant learning rate until “progress stops,” then drop.
856	Zero-shot Knowledge Transfer via Adversarial Belief Matching	Paul Micaelli, Amos J. Storkey	We propose a novel method which trains a student to match the predictions of its teacher without using any data or metadata.
857	Differentiable Convex Optimization Layers	Akshay Agrawal, Brandon Amos, Shane Barratt, Stephen Boyd, Steven Diamond, J. Zico Kolter	In this paper, we propose an approach to differentiating through disciplined convex programs, a subclass of convex optimization problems used by domain-specific languages (DSLs) for convex optimization.
858	Random Tessellation Forests	Shufei Ge, Shijia Wang, Yee Whye Teh, Liangliang Wang, Lloyd Elliott	Motivated by the need for a multi-dimensional partitioning tree with non-axis aligned cuts, we propose the Random Tessellation Process, a framework that includes the Mondrian process as a special case.
859	Learning Nearest Neighbor Graphs from Noisy Distance Samples	Blake Mason, Ardhendu Tripathy, Robert Nowak	In this paper, we propose an active algorithm to find the graph with high probability and analyze its query complexity.
860	Lookahead Optimizer: k steps forward, 1 step back	Michael Zhang, James Lucas, Jimmy Ba, Geoffrey E. Hinton	In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights.
861	Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer	Wenzheng Chen, Huan Ling, Jun Gao, Edward Smith, Jaakko Lehtinen, Alec Jacobson, Sanja Fidler	In this paper, we present DIB-Render, a novel rendering framework through which gradients can be analytically computed.
862	Covariate-Powered Empirical Bayes Estimation	Nikolaos Ignatiadis, Stefan Wager	In this paper, we propose a flexible plug-in empirical Bayes estimator that synthesizes both sources of information and may leverage any black-box predictive model.
863	Understanding the Role of Momentum in Stochastic Gradient Methods	Igor Gitman, Hunter Lang, Pengchuan Zhang, Lin Xiao	In this paper, we use the general formulation of QHM to give a unified analysis of several popular algorithms, covering their asymptotic convergence conditions, stability regions, and properties of their stationary distributions.
864	A neurally plausible model for online recognition and postdiction in a dynamical environment	Li Kevin Wenliang, Maneesh Sahani	Here, we propose a general framework for neural probabilistic inference in dynamic models based on the distributed distributional code (DDC) representation of uncertainty, naturally extending the underlying encoding to incorporate implicit probabilistic beliefs about both present and past.
865	Guided Meta-Policy Search	Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn	To this end, we propose to learn a reinforcement learning procedure in a federated way, where individual off-policy learners can solve the individual meta-training tasks, and then consolidate these solutions into a single meta-learner.
866	Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling	Tengyang Xie, Yifei Ma, Yu-Xiang Wang	Motivated by the many real-world applications of reinforcement learning (RL) that require safe-policy iterations, we consider the problem of off-policy evaluation (OPE) — the problem of evaluating a new policy using the historical data obtained by different behavior policies — under the model of nonstationary episodic Markov Decision Processes (MDP) with a long horizon and a large action space.
867	Contextual Bandits with Cross-Learning	Santiago Balseiro, Negin Golrezaei, Mohammad Mahdian, Vahab Mirrokni, Jon Schneider	We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on $C$ and achieve regret $\tilde{O}(\sqrt{KT})$ (when contexts are stochastic with known distribution), $\tilde{O}(K^{1/3}T^{2/3})$ (when contexts are stochastic with unknown distribution), and $\tilde{O}(\sqrt{KT})$ (when contexts are adversarial but rewards are stochastic).
868	Evaluating Protein Transfer Learning with TAPE	Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Peter Chen, John Canny, Pieter Abbeel, Yun Song	To facilitate progress in this field, we introduce the Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology.
869	A Bayesian Theory of Conformity in Collective Decision Making	Koosha Khalvati, Saghar Mirbagheri, Seongmin A. Park, Jean-Claude Dreher, Rajesh PN Rao	In this paper, we present a new Bayesian theory of collective decision making based on a simple yet most commonly observed behavior: conformity.
870	Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel	Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma	We show that sample efficiency can indeed depend on the presence of the regularizer: we construct a simple distribution in $d$ dimensions which the optimal regularized neural net learns with $O(d)$ samples but the NTK requires $\Omega(d^2)$ samples to learn.
871	Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation	Colin Wei, Tengyu Ma	For feedforward neural nets as well as RNNs, we obtain tighter Rademacher complexity bounds by considering additional data-dependent properties of the network: the norms of the hidden layers of the network, and the norms of the Jacobians of each layer with respect to all previous layers.
872	A Benchmark for Interpretability Methods in Deep Neural Networks	Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim	We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks.
873	Memory Efficient Adaptive Optimization	Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer	We describe an effective and flexible adaptive optimization method with greatly reduced memory overhead.
874	Dynamic Incentive-Aware Learning: Robust Pricing in Contextual Auctions	Negin Golrezaei, Adel Javanmard, Vahab Mirrokni	We propose two learning policies that are robust to such strategic behavior.
875	Convergence-Rate-Matching Discretization of Accelerated Optimization Flows Through Opportunistic State-Triggered Control	Miguel Vaquero, Jorge Cortes	This paper provides a novel approach through the idea of opportunistic state-triggered control.
876	A Unified Framework for Data Poisoning Attack to Graph-based Semi-supervised Learning	Xuanqing Liu, Si Si, Jerry Zhu, Yang Li, Cho-Jui Hsieh	In this paper, we propose a general framework for data poisoning attacks to graph-based semi-supervised learning (G-SSL).
877	Compositional generalization through meta sequence-to-sequence learning	Brenden M. Lake	In this paper, I show how memory-augmented neural networks can be trained to generalize compositionally through meta seq2seq learning.
878	Bayesian Joint Estimation of Multiple Graphical Models	Lingrui Gan, Xinming Yang, Naveen Narisetty, Feng Liang	In this paper, we propose a novel Bayesian group regularization method based on the spike and slab Lasso priors for jointly estimating multiple graphical models.
879	Practical Two-Step Lookahead Bayesian Optimization	Jian Wu, Peter Frazier	This paper proposes a computationally efficient algorithm that provides an accurate solution to the two-step lookahead Bayesian optimization problem in seconds to at most several minutes of computation per batch of evaluations.
880	Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models	Yunfei Teng, Wenbo Gao, François Chalus, Anna E. Choromanska, Donald Goldfarb, Adrian Weller	We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader).
881	A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks	Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, Pengchuan Zhang	In this paper, we unify all existing LP-relaxed verifiers, to the best of our knowledge, under a general convex relaxation framework.
882	Neural Jump Stochastic Differential Equations	Junteng Jia, Austin R. Benson	To this end, we introduce Neural Jump Stochastic Differential Equations that provide a data-driven approach to learn continuous and discrete dynamic behavior, i.e., hybrid systems that both flow and jump.
883	Learning metrics for persistence-based summaries and applications for graph classification	Qi Zhao, Yusu Wang	We study this problem and develop a new weighted kernel, called WKPI, for persistence summaries, as well as an optimization framework to learn the weight (and thus kernel).
884	On the Value of Target Data in Transfer Learning	Steve Hanneke, Samory Kpotufe	We aim to understand the value of additional labeled or unlabeled target data in transfer learning, for any given amount of source data; this is motivated by practical questions around minimizing sampling costs, whereby, target data is usually harder or costlier to acquire than source data, but can yield better accuracy.
885	Stochastic Variance Reduced Primal Dual Algorithms for Empirical Composition Optimization	Adithya M Devraj, Jianshu Chen	We take a novel approach to solving this problem by reformulating the original minimization objective into an equivalent min-max objective, which brings out all the empirical averages that are originally inside the nonlinear loss functions.
886	On Robustness of Principal Component Regression	Anish Agarwal, Devavrat Shah, Dennis Shen, Dogyoon Song	As the main contribution of this work, we address this challenge by rigorously establishing that PCR is robust to noisy, sparse, and possibly mixed valued covariates.
887	Meta Learning with Relational Information for Short Sequences	Yujia Xie, Haoming Jiang, Feng Liu, Tuo Zhao, Hongyuan Zha	This paper proposes a new meta-learning method — named HARMLESS (HAwkes Relational Meta Learning method for Short Sequences) for learning heterogeneous point process models from a collection of short event sequence data along with a relational network.
888	Residual Flows for Invertible Generative Modeling	Tian Qi Chen, Jens Behrmann, David K. Duvenaud, Joern-Henrik Jacobsen	We give a tractable unbiased estimate of the log density, and reduce the memory required during training by a factor of ten.
889	Multi-Agent Common Knowledge Reinforcement Learning	Christian Schroeder de Witt, Jakob Foerster, Gregory Farquhar, Philip Torr, Wendelin Boehmer, Shimon Whiteson	In this paper, we show that common knowledge between agents allows for complex decentralised coordination.
890	Learning to Learn By Self-Critique	Antreas Antoniou, Amos J. Storkey	In this paper, we propose a framework called Self-Critique and Adapt, or SCA.
891	Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes	Greg Yang	More generally, we introduce a language for expressing neural network computations, and our result encompasses all such expressible neural networks.
892	Neural Networks with Cheap Differential Operators	Tian Qi Chen, David K. Duvenaud	We describe a family of neural network architectures that allow easy access to a family of differential operators involving dimension-wise derivatives, and we show how to modify the backward computation graph to compute them efficiently.
893	Transductive Zero-Shot Learning with Visual Structure Constraint	Ziyu Wan, Dongdong Chen, Yan Li, Xingguang Yan, Junge Zhang, Yizhou Yu, Jing Liao	Based on the observation that visual features of test instances can be separated into different clusters, we propose a new visual structure constraint on class centers for transductive ZSL, to improve the generality of the projection function (i.e., alleviate the above domain shift problem).
894	Dying Experts: Efficient Algorithms with Optimal Regret Bounds	Hamid Shayestehmanesh, Sajjad Azami, Nishant A. Mehta	We study a variant of decision-theoretic online learning in which the set of experts that are available to Learner can shrink over time.
895	Model Similarity Mitigates Test Set Overuse	Horia Mania, John Miller, Ludwig Schmidt, Moritz Hardt, Benjamin Recht	We proffer a new explanation for the apparent longevity of test data: Many proposed models are similar in their predictions and we prove that this similarity mitigates overfitting.
896	A unified theory for the origin of grid cells through the lens of pattern formation	Ben Sorscher, Gabriel Mel, Surya Ganguli, Samuel Ocko	Here we provide an analytic theory that unifies the two perspectives by casting the learning dynamics of neural networks trained on navigational tasks as a pattern forming dynamical system.
897	On Sample Complexity Upper and Lower Bounds for Exact Ranking from Noisy Comparisons	Wenbo Ren, Jia (Kevin) Liu, Ness Shroff	Different from most previous works, in this paper, we have three main novelties: (i) compared to prior works, our upper bounds (algorithms) and lower bounds on the sample complexity (aka number of comparisons) require the minimal assumptions on the instances, and are not restricted to specific models; (ii) we give lower bounds and upper bounds on instances with unequal noise levels; and (iii) this paper aims at the exact ranking without knowledge on the instances, while most of the previous works either focus on approximate rankings or study exact ranking but require prior knowledge.
898	Hierarchical Decision Making by Generating and Following Natural Language Instructions	Hengyuan Hu, Denis Yarats, Qucheng Gong, Yuandong Tian, Mike Lewis	We introduce a challenging real-time strategy game environment in which the actions of a large number of units must be coordinated across long time scales.
899	SHE: A Fast and Accurate Deep Neural Network for Encrypted Data	Qian Lou, Lei Jiang	In this paper, we propose a Shift-accumulation-based LHE-enabled deep neural network (SHE) for fast and accurate inferences on encrypted data.
900	Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond	Lin Chen, Hossein Esfandiari, Gang Fu, Vahab Mirrokni	In this paper, we aim to develop LSH schemes for distance functions that measure the distance between two probability distributions, particularly for f-divergences as well as a generalization to capture mutual information loss.
901	A Game Theoretic Approach to Class-wise Selective Rationalization	Shiyu Chang, Yang Zhang, Mo Yu, Tommi Jaakkola	To this end, we propose a new game theoretic approach to class-dependent rationalization, where the method is specifically trained to highlight evidence supporting alternative conclusions.
902	Efficiently avoiding saddle points with zero order methods: No gradients required	Emmanouil-Vasileios Vlatakis-Gkaragkounis, Lampros Flokas, Georgios Piliouras	We consider the case of derivative-free algorithms for non-convex optimization, also known as zero order algorithms, that use only function evaluations rather than gradients.
903	Metamers of neural networks reveal divergence from human perceptual systems	Jenelle Feather, Alex Durango, Ray Gonzalez, Josh McDermott	To more thoroughly investigate their similarity to biological systems, we synthesized model metamers – stimuli that produce the same responses at some stage of a network’s representation.
904	Spatial-Aware Feature Aggregation for Image based Cross-View Geo-Localization	Yujiao Shi, Liu Liu, Xin Yu, Hongdong Li	In this paper, we develop a new deep network to explicitly address these inherent differences between ground and aerial views.
905	Decentralized sketching of low rank matrices	Rakshith Sharma Srinivasa, Kiryung Lee, Marius Junge, Justin Romberg	Leveraging the joint structure between the columns, we propose a method to recover the matrix to within an epsilon relative error in the Frobenius norm from a total of $O(r(d_1 + d_2)\log^6(d_1 + d_2)/\epsilon^2)$ observations.
906	Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss	Zhao Song, David Woodruff, Peilin Zhong	We study the column subset selection problem with respect to the entrywise $\ell_1$-norm loss.
907	Efficient Forward Architecture Search	Hanzhang Hu, John Langford, Rich Caruana, Saurajit Mukherjee, Eric J. Horvitz, Debadeepta Dey	We propose a neural architecture search (NAS) algorithm, Petridish, to iteratively add shortcut connections to existing network layers.
908	Unsupervised Meta-Learning for Few-Shot Image Classification	Siavash Khodadadeh, Ladislau Boloni, Mubarak Shah	In this paper, we propose UMTRA, an algorithm that performs unsupervised, model-agnostic meta-learning for classification tasks.
909	Learning Mixtures of Plackett-Luce Models from Structured Partial Orders	Zhibing Zhao, Lirong Xia	In this paper, we focus on three popular structures of partial orders: ranked top-$l_1$, $l_2$-way, and choice data over a subset of alternatives.
910	Certainty Equivalence is Efficient for Linear Quadratic Control	Horia Mania, Stephen Tu, Benjamin Recht	We show that for both the fully and partially observed settings, the sub-optimality gap between the cost incurred by playing the certainty equivalent controller on the true system and the cost incurred by using the optimal LQ controller enjoys a fast statistical rate, scaling as the square of the parameter error.
911	Scalable Bayesian inference of dendritic voltage via spatiotemporal recurrent state space models	Ruoxi Sun, Ian Kinsella, Scott Linderman, Liam Paninski	Here we introduce a scalable fully Bayesian approach.
912	Logarithmic Regret for Online Control	Naman Agarwal, Elad Hazan, Karan Singh	We study optimal regret bounds for control in linear dynamical systems under adversarially changing strongly convex cost functions, given the knowledge of transition dynamics.
913	Elliptical Perturbations for Differential Privacy	Matthew Reimherr, Jordan Awan	We study elliptical distributions in locally convex vector spaces, and determine conditions when they can or cannot be used to satisfy differential privacy (DP).
914	Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks	Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, Yang Liu	Inspired by the work on manually-defined patterns of vulnerabilities from various code representation graphs and the recent advance on graph neural networks, we propose Devign, a general graph neural network based model for graph-level classification through learning on a rich set of code semantic representations.
915	KNG: The K-Norm Gradient Mechanism	Matthew Reimherr, Jordan Awan	This paper presents a new mechanism for producing sanitized statistical summaries that achieve differential privacy, called the K-Norm Gradient Mechanism, or KNG.
916	CXPlain: Causal Explanations for Model Interpretation under Uncertainty	Patrick Schwab, Walter Karlen	We present experiments that demonstrate that CXPlain is significantly more accurate and faster than existing model-agnostic methods for estimating feature importance.
917	Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning	Wenjie Shi, Shiji Song, Hui Wu, Ya-Chu Hsu, Cheng Wu, Gao Huang	To tackle this problem, we propose a general acceleration method for model-free, off-policy deep RL algorithms by drawing the idea underlying regularized Anderson acceleration (RAA), which is an effective approach to accelerating the solving of fixed point problems with perturbations.
918	STREETS: A Novel Camera Network Dataset for Traffic Flow	Corey Snyder, Minh Do	In this paper, we introduce STREETS, a novel traffic flow dataset from publicly available web cameras in the suburbs of Chicago, IL.
919	Sequential Neural Processes	Gautam Singh, Jaesik Yoon, Youngsung Son, Sungjin Ahn	In this paper, we propose Sequential Neural Processes (SNP) which incorporates a temporal state-transition model of stochastic processes and thus extends its modeling capabilities to dynamic stochastic processes.
920	Policy Continuation with Hindsight Inverse Dynamics	Hao Sun, Zhizhong Li, Xiaotong Liu, Bolei Zhou, Dahua Lin	To tackle this difficulty, we propose a new approach called Policy Continuation with Hindsight Inverse Dynamics (PCHID).
921	Learning to Self-Train for Semi-Supervised Few-Shot Classification	Xinzhe Li, Qianru Sun, Yaoyao Liu, Qin Zhou, Shibao Zheng, Tat-Seng Chua, Bernt Schiele	In this paper we propose a novel semi-supervised meta-learning method called learning to self-train (LST) that leverages unlabeled data and specifically meta-learns how to cherry-pick and label such unsupervised data to further improve performance.
922	Temporal FiLM: Capturing Long-Range Sequence Dependencies with Feature-Wise Modulations.	Sawyer Birnbaum, Volodymyr Kuleshov, Zayd Enam, Pang Wei W. Koh, Stefano Ermon	Here, we propose Temporal Feature-Wise Linear Modulation (TFiLM) — a novel architectural component inspired by adaptive batch normalization and its extensions — that uses a recurrent neural network to alter the activations of a convolutional model.
923	From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization	Krzysztof M. Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Vikas Sindhwani	We present a new algorithm (ASEBO) for optimizing high-dimensional blackbox functions.
924	On the Expressive Power of Deep Polynomial Neural Networks	Joe Kileel, Matthew Trager, Joan Bruna	This paper proposes the dimension of this variety as a precise measure of the expressive power of polynomial neural networks.
925	DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation	Shashank Rajput, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos	In this work, we present DETOX, a Byzantine-resilient distributed training framework that combines algorithmic redundancy with robust aggregation.
926	Can SGD Learn Recurrent Neural Networks with Provable Generalization?	Zeyuan Allen-Zhu, Yuanzhi Li	In this paper, we show that, using vanilla stochastic gradient descent (SGD), an RNN can actually learn some notable concept class efficiently, meaning that both time and sample complexity scale polynomially in the input length (or almost polynomially, depending on the concept).
927	Limits of Private Learning with Access to Public Data	Raef Bassily, Shay Moran, Noga Alon	We consider learning problems where the training set consists of two types of examples: private and public. The goal is to design a learning algorithm that satisfies differential privacy only with respect to the private examples.
928	Discrete Object Generation with Reversible Inductive Construction	Ari Seff, Wenda Zhou, Farhan Damani, Abigail Doyle, Ryan P. Adams	Here, we present a generative model for discrete objects employing a Markov chain where transitions are restricted to a set of local operations that preserve validity.
929	Efficient Near-Optimal Testing of Community Changes in Balanced Stochastic Block Models	Aditya Gangrade, Praveen Venkatesh, Bobak Nazer, Venkatesh Saligrama	We propose and analyze the problems of community goodness-of-fit and two-sample testing for stochastic block models (SBM), where changes arise due to modification in community memberships of nodes.
930	Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards	Alexander Trott, Stephan Zheng, Caiming Xiong, Richard Socher	We introduce a simple and effective model-free method to learn from shaped distance-to-goal rewards on tasks where success depends on reaching a goal state.
931	Superset Technique for Approximate Recovery in One-Bit Compressed Sensing	Larkin Flodin, Venkata Gandikota, Arya Mazumdar	In this paper, we propose a generic approach for signal recovery in nonadaptive 1bCS that leads to improved sample complexity for approximate recovery for a variety of signal models, including nonnegative signals and binary signals.
932	Bandits with Feedback Graphs and Switching Costs	Raman Arora, Teodor Vanislavov Marinov, Mehryar Mohri	We study the adversarial multi-armed bandit problem where the learner is supplied with partial observations modeled by a feedback graph and where shifting to a new action incurs a fixed switching cost.
933	Functional Adversarial Attacks	Cassidy Laidlaw, Soheil Feizi	We propose functional adversarial attacks, a novel class of threat models for crafting adversarial examples to fool machine learning models.
934	Statistical-Computational Tradeoff in Single Index Models	Lingxiao Wang, Zhuoran Yang, Zhaoran Wang	In this paper, we investigate the case when this critical assumption fails to hold, where the problem becomes considerably harder.
935	On Fenchel Mini-Max Learning	Chenyang Tao, Liqun Chen, Shuyang Dai, Junya Chen, Ke Bai, Dong Wang, Jianfeng Feng, Wenlian Lu, Georgiy Bobashev, Lawrence Carin	We present a novel probabilistic learning framework, called Fenchel Mini-Max Learning (FML), that accommodates all four desiderata in a flexible and scalable manner.
936	MarginGAN: Adversarial Training in Semi-Supervised Learning	Jinhao Dong, Tong Lin	Our method is motivated by the success of large-margin classifiers and the recent viewpoint that good semi-supervised learning requires a “bad” GAN.
937	Poincare Recurrence, Cycles and Spurious Equilibria in Gradient-Descent-Ascent for Non-Convex Non-Concave Zero-Sum Games	Emmanouil-Vasileios Vlatakis-Gkaragkounis, Lampros Flokas, Georgios Piliouras	We study a wide class of non-convex non-concave min-max games that generalizes over standard bilinear zero-sum games.
938	A unified variance-reduced accelerated gradient method for convex optimization	Guanghui Lan, Zhize Li, Yi Zhou	We propose a novel randomized incremental gradient algorithm, namely, VAriance-Reduced Accelerated Gradient (Varag), for finite-sum optimization.
939	Nearly Tight Bounds for Robust Proper Learning of Halfspaces with a Margin	Ilias Diakonikolas, Daniel Kane, Pasin Manurangsi	We study the problem of properly learning large margin halfspaces in the agnostic PAC model.
940	Same-Cluster Querying for Overlapping Clusters	Wasim Huleihel, Arya Mazumdar, Muriel Medard, Soumyabrata Pal	In this paper, we look at the more practical scenario of overlapping clusters, and provide upper bounds (with algorithms) on the sufficient number of queries.
941	Efficient Convex Relaxations for Streaming PCA	Raman Arora, Teodor Vanislavov Marinov	In this work, we give improved bounds on per iteration cost of mini-batched variants of both MSG and $\ell_2$-RMSG and arrive at an algorithm with total computational complexity matching that of Oja’s algorithm.
942	Learning Robust Global Representations by Penalizing Local Predictive Power	Haohan Wang, Songwei Ge, Zachary Lipton, Eric P. Xing	This paper proposes a method for training robust convolutional networks by penalizing the predictive power of the local representations learned by earlier layers.
943	Unsupervised Curricula for Visual Meta-Reinforcement Learning	Allan Jabri, Kyle Hsu, Abhishek Gupta, Ben Eysenbach, Sergey Levine, Chelsea Finn	We formulate unsupervised meta-RL as information maximization between a latent task variable and the meta-learner’s data distribution, and describe a practical instantiation which alternates between integration of recent experience into the task distribution and meta-learning of the updated tasks.
944	Sample Complexity of Learning Mixture of Sparse Linear Regressions	Akshay Krishnamurthy, Arya Mazumdar, Andrew McGregor, Soumyabrata Pal	In this paper, we consider the case where the signal vectors are sparse; this generalizes the popular compressed sensing paradigm.
945	Large Scale Adversarial Representation Learning	Jeff Donahue, Karen Simonyan	In this work we show that progress in image generation quality translates to substantially improved representation learning performance.
946	G2SAT: Learning to Generate SAT Formulas	Jiaxuan You, Haoze Wu, Clark Barrett, Raghuram Ramanujan, Jure Leskovec	In this work, we present G2SAT, the first deep generative framework that learns to generate SAT formulas from a given set of input formulas.
947	Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy	Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang	In this paper, we prove that a variant of PPO and TRPO equipped with overparametrized neural networks converges to the globally optimal policy at a sublinear rate.
948	Dimensionality reduction: theoretical perspective on practical measures	Yair Bartal, Nova Fandina, Ofer Neiman	The goal of this paper is to bridge the gap between theory and practice view-points of metric dimensionality reduction, laying the foundation for a theoretical study of more practically oriented analysis.
949	Oracle-Efficient Algorithms for Online Linear Optimization with Bandit Feedback	Shinji Ito, Daisuke Hatano, Hanna Sumita, Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-Ichi Kawarabayashi	We propose computationally efficient algorithms for online linear optimization with bandit feedback, in which a player chooses an action vector from a given (possibly infinite) set $\mathcal{A} \subseteq \mathbb{R}^d$, and then suffers a loss that can be expressed as a linear function in action vectors.
950	Multilabel reductions: what is my loss optimising?		In this paper, we study five commonly used reductions, including the one-versus-all reduction, a reduction to multiclass classification, and normalised versions of the same, wherein the contribution of each instance is normalised by the number of relevant labels.
951	Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks	Yuan Cao, Quanquan Gu	We propose a novel algorithm called approximate gradient descent for training CNNs, and show that, with high probability, the proposed algorithm with random initialization grants a linear convergence to the ground-truth parameters up to statistical precision.
952	Deep Gamblers: Learning to Abstain with Portfolio Theory	Ziyin Liu, Zhikang Wang, Paul Pu Liang, Russ R. Salakhutdinov, Louis-Philippe Morency, Masahito Ueda	Inspired by portfolio theory, we propose a loss function for the selective classification problem based on the doubling rate of gambling.
953	Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples	Tengyu Xu, Shaofeng Zou, Yingbin Liang	In contrast to previous studies that characterized the non-asymptotic convergence rate of TDC only under independent and identically distributed (i.i.d.) data samples, we provide the first non-asymptotic convergence analysis for two time-scale TDC under a non-i.i.d. Markovian sample path and linear function approximation.
954	Transfer Learning via Minimizing the Performance Gap Between Domains	Boyu Wang, Jorge Mendez, Mingbo Cai, Eric Eaton	To formalize this intuition, we define the performance gap as a measure of the discrepancy between the source and target domains.
955	Splitting Steepest Descent for Growing Neural Architectures	Lemeng Wu, Dilin Wang, Qiang Liu	We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons into multiple offspring.
956	Sequential Experimental Design for Transductive Linear Bandits	Lalit Jain, Kevin G. Jamieson, Tanner Fiez, Lillian Ratliff	In this paper we introduce the pure exploration transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence $\delta$, and an unknown vector $\theta^{\ast}\in \mathbb{R}^d$, the goal is to infer $\arg\max_{z\in \mathcal{Z}} z^\top\theta^\ast$ with probability $1-\delta$ by making as few sequentially chosen noisy measurements of the form $x^\top\theta^{\ast}$ as possible.
957	Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence	Aditya Sharad Golatkar, Alessandro Achille, Stefano Soatto	Deep neural networks (DNNs), however, challenge this view: We show that removing regularization after an initial transient period has little effect on generalization, even if the final loss landscape is the same as if there had been no regularization.
958	Outlier-Robust High-Dimensional Sparse Estimation via Iterative Filtering	Ilias Diakonikolas, Daniel Kane, Sushrut Karmalkar, Eric Price, Alistair Stewart	Specifically, we focus on the fundamental problems of robust sparse mean estimation and robust sparse PCA. We give the first practically viable robust estimators for these problems.
959	Variational Graph Recurrent Neural Networks	Ehsan Hajiramezanali, Arman Hasanzadeh, Krishna Narayanan, Nick Duffield, Mingyuan Zhou, Xiaoning Qian	In this paper, we develop a novel hierarchical variational model that introduces additional latent random variables to jointly model the hidden states of a graph recurrent neural network (GRNN) to capture both topology and node attribute changes in dynamic graphs.
960	Semi-Implicit Graph Variational Auto-Encoders	Arman Hasanzadeh, Ehsan Hajiramezanali, Krishna Narayanan, Nick Duffield, Mingyuan Zhou, Xiaoning Qian	Semi-implicit graph variational auto-encoder (SIG-VAE) is proposed to expand the flexibility of variational graph auto-encoders (VGAE) to model graph data.
961	Unsupervised Learning of Object Keypoints for Perception and Control	Tejas D. Kulkarni, Ankush Gupta, Catalin Ionescu, Sebastian Borgeaud, Malcolm Reynolds, Andrew Zisserman, Volodymyr Mnih	In this work we aim to learn object representations that are useful for control and reinforcement learning (RL).
962	A Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation	Xueying Bai, Jian Guan, Hongning Wang	In this work, we propose a model-based reinforcement learning solution which models the user-agent interaction for offline policy learning via a generative adversarial network.
963	Optimizing Generalized Rate Metrics with Three Players	Harikrishna Narasimhan, Andrew Cotter, Maya Gupta	We present a general framework for solving a large class of learning problems with non-linear functions of classification rates.
964	Consistency-based Semi-supervised Learning for Object detection	Jisoo Jeong, Seungeui Lee, Jeesoo Kim, Nojun Kwak	To alleviate this problem, we propose a Consistency-based Semi-supervised learning method for object Detection (CSD), which is a way of using consistency constraints as a tool for enhancing detection performance by making full use of available unlabeled data.
965	Rates of Convergence for Large-scale Nearest Neighbor Classification	Xingye Qiao, Jiexin Duan, Guang Cheng	For a large data set which cannot be loaded into the memory of a single machine due to computation, communication, privacy, or ownership limitations, we consider the divide and conquer scheme: the entire data set is divided into small subsamples, on which nearest neighbor predictions are made, and then a final decision is reached by aggregating the predictions on subsamples by majority voting.
966	An Embedding Framework for Consistent Polyhedral Surrogates	Jessica Finocchiaro, Rafael Frongillo, Bo Waggoner	We formalize and study the natural approach of designing convex surrogate loss functions via embeddings for problems such as classification or ranking.
967	Cross-Modal Learning with Adversarial Samples	Chao Li, Shangqian Gao, Cheng Deng, De Xie, Wei Liu	In this paper, we propose a novel Cross-Modal correlation Learning with Adversarial samples, namely CMLA, which for the first time presents the existence of adversarial samples in cross-modal data.
968	Fast-rate PAC-Bayes Generalization Bounds via Shifted Rademacher Processes	Jun Yang, Shengyang Sun, Daniel M. Roy	The goal of this paper is to extend this bridge between Rademacher complexity and state-of-the-art PAC-Bayesian theory.
969	Input-Cell Attention Reduces Vanishing Saliency of Recurrent Neural Networks	Aya Abdelsalam Ismail, Mohamed Gunady, Luiz Pessoa, Hector Corrada Bravo, Soheil Feizi	In this work we analyze saliency-based methods for RNNs, both classical and gated cell architectures.
970	Program Synthesis and Semantic Parsing with Learned Code Idioms	Eui Chul Shin, Miltiadis Allamanis, Marc Brockschmidt, Alex Polozov	In this work, we present Patois, a system that allows a neural program synthesizer to explicitly interleave high-level and low-level reasoning at every generation step.
971	Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks	Yuan Cao, Quanquan Gu	We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points.
972	High-Dimensional Optimization in Adaptive Random Subspaces	Jonathan Lacotte, Mert Pilanci, Marco Pavone	We propose a new randomized optimization method for high-dimensional problems which can be seen as a generalization of coordinate descent to random subspaces.
973	Random Projections with Asymmetric Quantization	Xiaoyun Li, Ping Li	In this paper, we investigate the cosine similarity estimators derived in such setting under the Lloyd-Max (LM) quantization scheme.
974	Superposition of many models into one	Brian Cheung, Alexander Terekhov, Yubei Chen, Pulkit Agrawal, Bruno Olshausen	We present a method for storing multiple models within a single set of parameters.
975	Private Testing of Distributions via Sample Permutations	Maryam Aliakbarpour, Ilias Diakonikolas, Daniel Kane, Ronitt Rubinfeld	In this paper, we use the framework of property testing to design algorithms to test the properties of the distribution that the data is drawn from with respect to differential privacy.
976	McDiarmid-Type Inequalities for Graph-Dependent Variables and Stability Bounds	Rui (Ray) Zhang, Xingwu Liu, Yuyi Wang, Liwei Wang	We consider learning problems in which examples are dependent and their dependency relation is characterized by a graph.
977	How to Initialize your Network? Robust Initialization for WeightNorm & ResNets	Devansh Arpit, Víctor Campos, Yoshua Bengio	To address these issues, we propose a novel parameter initialization strategy that avoids explosion/vanishment of information across layers for weight normalized networks with and without residual connections.
978	On Making Stochastic Classifiers Deterministic	Andrew Cotter, Maya Gupta, Harikrishna Narasimhan	In this paper, we attempt to answer the theoretical question of how well a stochastic classifier can be approximated by a deterministic one, and compare several different approaches, proving lower and upper bounds.
979	Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection	Xiaoyi Gu, Leman Akoglu, Alessandro Rinaldo	In this paper we are concerned with investigating the performance of NN-based methods for anomaly detection.
980	Improving Black-box Adversarial Attacks with a Transfer-based Prior	Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, Jun Zhu	To address these problems, we propose a prior-guided random gradient-free (P-RGF) method to improve black-box adversarial attacks, which takes advantage of a transfer-based prior and the query information simultaneously.
981	Break the Ceiling: Stronger Multi-scale Deep Graph Convolutional Networks	Sitao Luan, Mingde Zhao, Xiao-Wen Chang, Doina Precup	In this paper, we first analyze key factors constraining the expressive power of existing Graph Convolutional Networks (GCNs), including the activation function and shallow learning mechanisms. Then, we generalize spectral graph convolution and deep GCN in block Krylov subspace forms, upon which we devise two architectures, both scalable in depth but making use of multi-scale information differently.
982	Statistical Model Aggregation via Parameter Matching	Mikhail Yurochkin, Mayank Agarwal, Soumya Ghosh, Kristjan Greenewald, Nghia Hoang	Exploiting tools from Bayesian nonparametrics, we develop a general meta-modeling framework that learns shared global latent structures by identifying correspondences among local model parameterizations.
983	On the (In)fidelity and Sensitivity of Explanations	Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Suggala, David I. Inouye, Pradeep K. Ravikumar	We propose simple robust variants of two notions that have been considered in recent literature: (in)fidelity, and sensitivity.
984	Exponential Family Estimation via Adversarial Dynamics Embedding	Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans	We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks.
985	The Broad Optimality of Profile Maximum Likelihood	Yi Hao, Alon Orlitsky	We study three fundamental statistical-learning problems: distribution estimation, property estimation, and property testing.
986	MintNet: Building Invertible Neural Networks with Masked Convolutions	Yang Song, Chenlin Meng, Stefano Ermon	We propose a new way of constructing invertible neural networks by combining simple building blocks with a novel set of composition rules.
987	Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates	Jeffrey Negrea, Mahdi Haghifam, Gintare Karolina Dziugaite, Ashish Khisti, Daniel M. Roy	In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019).
988	On Distributed Averaging for Stochastic k-PCA	Aditya Bhaskara, Pruthuvi Maheshakya Wijewardena	We consider a slight variant of the well-studied “distributed averaging” approach, and prove that this leads to significantly better bounds on the dependence between ‘n’ and the eigenvalue gaps.
989	Controllable Unsupervised Text Attribute Transfer via Editing Entangled Latent Representation	Ke Wang, Hang Hua, Xiaojun Wan	To address the above problems, we propose a more flexible unsupervised text attribute transfer framework which replaces the process of modeling attribute with minimal editing of latent representations based on an attribute classifier.
990	MaxGap Bandit: Adaptive Algorithms for Approximate Ranking	Sumeet Katariya, Ardhendu Tripathy, Robert Nowak	We propose elimination and UCB-style algorithms and show that they are minimax optimal.
991	Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting	Aditya Grover, Jiaming Song, Ashish Kapoor, Kenneth Tran, Alekh Agarwal, Eric J. Horvitz, Stefano Ermon	We employ this likelihood-free importance weighting method to correct for the bias in generative models.
992	Online Forecasting of Total-Variation-bounded Sequences	Dheeraj Baby, Yu-Xiang Wang	We consider the problem of online forecasting of sequences of length $n$ with total-variation at most $C_n$ using observations contaminated by independent $\sigma$-subgaussian noise.
993	Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization	Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Viveck Cadambe	In this paper, we study local distributed SGD, where data is partitioned among computation nodes, and the computation nodes perform local updates with periodically exchanging the model among the workers to perform averaging.
994	Data Parameters: A New Family of Parameters for Learning a Differentiable Curriculum	Shreyas Saxena, Oncel Tuzel, Dennis DeCoste	In this work, we address this problem by introducing data parameters.
995	Unified Sample-Optimal Property Estimation in Near-Linear Time	Yi Hao, Alon Orlitsky	We consider the fundamental learning problem of estimating properties of distributions over large domains.
996	Region Mutual Information Loss for Semantic Segmentation	Shuai Zhao, Yang Wang, Zheng Yang, Deng Cai	In this paper, we develop a region mutual information (RMI) loss to model the dependencies among pixels more simply and efficiently.
997	Learning Stable Deep Dynamics Models	J. Zico Kolter, Gaurav Manek	In this paper, we propose an approach for learning dynamical systems that are guaranteed to be stable over the entire state space.
998	Image Captioning: Transforming Objects into Words	Simao Herdade, Armin Kappeler, Kofi Boakye, Joao Soares	In this work we introduce the Object Relation Transformer, that builds upon this approach by explicitly incorporating information about the spatial relationship between input detected objects through geometric attention.
999	Greedy Sampling for Approximate Clustering in the Presence of Outliers	Aditya Bhaskara, Sharvaree Vadgama, Hong Xu	In this work we show that for k-means and k-center clustering, simple modifications to the well-studied greedy algorithms result in nearly identical guarantees, while additionally being robust to outliers.
1000	Adversarial Fisher Vectors for Unsupervised Representation Learning	Joshua Susskind, Shuangfei Zhai, Walter Talbott, Carlos Guestrin	We examine Generative Adversarial Networks (GANs) through the lens of deep Energy Based Models (EBMs), with the goal of exploiting the density model that follows from this formulation.
1001	On Tractable Computation of Expected Predictions	Pasha Khosravi, YooJung Choi, Yitao Liang, Antonio Vergari, Guy Van den Broeck	In this paper, we identify a pair of generative and discriminative models that enables tractable computation of expectations, as well as moments of any order, of the latter with respect to the former in case of regression.
1002	Levenshtein Transformer	Jiatao Gu, Changhan Wang, Junbo Zhao	In this work, we develop Levenshtein Transformer, a new partially autoregressive model devised for more flexible and amenable sequence generation.
1003	Unlabeled Data Improves Adversarial Robustness	Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, John C. Duchi, Percy S. Liang	We prove that unlabeled data bridges this gap: a simple semisupervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for achieving high standard accuracy.
1004	Machine Teaching of Active Sequential Learners	Tomi Peltola, Mustafa Mert Çelikok, Pedram Daee, Samuel Kaski	We formulate this sequential teaching problem, which current techniques in machine teaching do not address, as a Markov decision process, with the dynamics nesting a model of the learner and the actions being the teacher’s responses.
1005	Gaussian-Based Pooling for Convolutional Neural Networks	Takumi Kobayashi	In this paper, to improve performance of CNNs, we propose a novel local pooling method based on the Gaussian-based probabilistic model over local neuron activations for flexibly pooling (extracting) features, in contrast to the previous model restricting the output within the convex hull of local neurons.
1006	Meta Architecture Search	Albert Shaw, Wei Wei, Weiyang Liu, Le Song, Bo Dai	We propose the Bayesian Meta Architecture SEarch (BASE) framework which takes advantage of a Bayesian formulation of the architecture search problem to learn over an entire set of tasks simultaneously.
1007	NAOMI: Non-Autoregressive Multiresolution Sequence Imputation	Yukai Liu, Rose Yu, Stephan Zheng, Eric Zhan, Yisong Yue	In this paper, we take a non-autoregressive approach and propose a novel deep generative model: Non-AutOregressive Multiresolution Imputation (NAOMI) to impute long-range sequences given arbitrary missing patterns.
1008	Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks	Difan Zou, Ziniu Hu, Yewen Wang, Song Jiang, Yizhou Sun, Quanquan Gu	To deal with the above two problems, we propose a new effective sampling algorithm called LAyer-Dependent ImportancE Sampling (LADIES).
1009	Two Generator Game: Learning to Sample via Linear Goodness-of-Fit Test	Lizhong Ding, Mengyang Yu, Li Liu, Fan Zhu, Yong Liu, Yu Li, Ling Shao	To solve this problem, we formulate a deep energy adversarial network (DEAN), which casts the energy model learned from real data into an optimization of a goodness-of-fit (GOF) test statistic.
1010	Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards	Anmol Kagrecha, Jayakrishnan Nair, Krishna Jagannathan	In this paper, we consider the problem of selecting the arm that optimizes a linear combination of the expected reward and the associated Conditional Value at Risk (CVaR) in a fixed budget best-arm identification framework.
1011	Private Stochastic Convex Optimization with Optimal Rates	Raef Bassily, Vitaly Feldman, Kunal Talwar, Abhradeep Guha Thakurta	We study differentially private (DP) algorithms for stochastic convex optimization (SCO).
1012	Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers	Hadi Salman, Jerry Li, Ilya Razenshteyn, Pengchuan Zhang, Huan Zhang, Sebastien Bubeck, Greg Yang	In this paper, we employ adversarial training to improve the performance of randomized smoothing.
1013	Demystifying Black-box Models with Symbolic Metamodels	Ahmed M. Alaa, Mihaela van der Schaar	To address this issue, we introduce the symbolic metamodeling framework – a general methodology for interpreting predictions by converting “black-box” models into “white-box” functions that are understandable to human subjects.
1014	Neural Temporal-Difference Learning Converges to Global Optima	Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang	In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error for policy evaluation.
1015	Privacy-Preserving Q-Learning with Functional Noise in Continuous Spaces	Baoxiang Wang, Nidhi Hegde	Our aim is to protect the value function approximator, without regard to the number of states queried to the function.
1016	Attentive State-Space Modeling of Disease Progression	Ahmed M. Alaa, Mihaela van der Schaar	In this paper, we develop the attentive state-space model, a deep probabilistic model that learns accurate and interpretable structured representations for disease trajectories.
1017	Online EXP3 Learning in Adversarial Bandits with Delayed Feedback	Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet	For the case where $\sum_{t=1}^{T}d_{t}$ and $T$ are unknown, we propose a novel doubling trick for online learning with delays and prove that this adaptive EXP3 achieves a regret of $O\left(\sqrt{\ln K\left(K^{2}T+\sum_{t=1}^{T}d_{t}\right)}\right)$.
1018	A Direct $\tilde{O}(1/\epsilon)$ Iteration Parallel Algorithm for Optimal Transport	Arun Jambulapati, Aaron Sidford, Kevin Tian	We give an algorithm which solves the problem to additive $\epsilon$ accuracy with $\tilde{O}(1/\epsilon)$ parallel depth and $\tilde{O}\left(n^2/\epsilon\right)$ work.
1019	Faster Boosting with Smaller Memory	Julaiti Alafate, Yoav S. Freund	This paper presents an alternative approach to implementing the boosted trees, which achieves a significant speedup over XGBoost and LightGBM, especially when the memory size is small.
1020	Variance Reduction for Matrix Games	Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian	We present a randomized primal-dual algorithm that solves the problem $\min_x \max_y y^\top A x$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)\, n}/\epsilon$, for matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries.
1021	Learning Neural Networks with Adaptive Regularization	Han Zhao, Yao-Hung Hubert Tsai, Russ R. Salakhutdinov, Geoffrey J. Gordon	To optimize the model, we present an efficient block coordinate descent algorithm with analytical solutions.
1022	Distributed estimation of the inverse Hessian by determinantal averaging	Michal Derezinski, Michael W. Mahoney	To address this, we propose determinantal averaging, a new approach for correcting the inversion bias.
1023	Smoothing Structured Decomposable Circuits	Andy Shih, Guy Van den Broeck, Paul Beame, Antoine Amarilli	We propose a near-linear time algorithm for this task and explore lower bounds for smoothing decomposable circuits, using existing results on range-sum queries.
1024	Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks	Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, George Pappas	In this paper, we present a convex optimization framework to compute guaranteed upper bounds on the Lipschitz constant of DNNs both accurately and efficiently.
1025	Provable Non-linear Inductive Matrix Completion	Kai Zhong, Zhao Song, Prateek Jain, Inderjit S. Dhillon	In this paper, we provide the first theoretical analysis for a simple NIMC model in the realizable setting, where the relevance score of a (query, item) pair is formulated as the inner product between their single-layer neural representations.
1026	Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback	Shuai Zheng, Ziyue Huang, James Kwok	In this paper, we propose a general distributed compressed SGD with Nesterov’s momentum.
1027	Sparse Variational Inference: Bayesian Coresets from Scratch	Trevor Campbell, Boyan Beronov	In the present work we remove this requirement by formulating coreset construction as sparsity-constrained variational inference within an exponential family.
1028	Personalizing Many Decisions with High-Dimensional Covariates	Nima Hamidi, Mohsen Bayati, Kapil Gupta	The main contribution of this paper is to introduce and theoretically analyze a new algorithm (REAL Bandit) with a regret that scales by $r^2(k+d)$ when $r$ is the rank of the $k \times d$ matrix of unknown parameters.
1029	A Necessary and Sufficient Stability Notion for Adaptive Generalization	Moshe Shenfeld, Katrina Ligett	We introduce a new notion of the stability of computations, which holds under post-processing and adaptive composition.
1030	Necessary and Sufficient Geometries for Gradient Methods	Daniel Levy, John C. Duchi	We study the impact of the constraint set and gradient geometry on the convergence of online and stochastic methods for convex optimization, providing a characterization of the geometries for which stochastic gradient and adaptive gradient methods are (minimax) optimal.
1031	Landmark Ordinal Embedding	Nikhil Ghosh, Yuxin Chen, Yisong Yue	In this paper, we aim to learn a low-dimensional Euclidean representation from a set of constraints of the form “item j is closer to item i than item k”.
1032	Identification of Conditional Causal Effects under Markov Equivalence	Amin Jaber, Jiji Zhang, Elias Bareinboim	In this work, we derive an algorithm to identify conditional effects, which are particularly useful for evaluating conditional plans or policies.
1033	The Thermodynamic Variational Objective	Vaden Masrani, Tuan Anh Le, Frank Wood	We introduce the thermodynamic variational objective (TVO) for learning in both continuous and discrete deep generative models.
1034	Global Guarantees for Blind Demodulation with Generative Priors	Paul Hand, Babhru Joshi	We study a deep learning inspired formulation for the blind demodulation problem, which is the task of recovering two unknown vectors from their entrywise multiplication.
1035	Exact sampling of determinantal point processes with sublinear time preprocessing	Michal Derezinski, Daniele Calandriello, Michal Valko	For this purpose we provide DPP-VFX, a new algorithm which, given access only to L, samples exactly from a determinantal point process while satisfying the following two properties: (1) its preprocessing cost is n poly(k), i.e., sublinear in the size of L, and (2) its sampling cost is poly(k), i.e., independent of the size of L. Prior to our results, state-of-the-art exact samplers required O(n^3) preprocessing time and sampling time linear in n or dependent on the spectral properties of L.
1036	Geometry-Aware Neural Rendering	Joshua Tobin, Wojciech Zaremba, Pieter Abbeel	We propose Epipolar Cross Attention (ECA), an attention mechanism that leverages the geometry of the scene to perform efficient non-local operations, requiring only $O(n)$ comparisons per spatial dimension instead of $O(n^2)$.
1037	Variational Temporal Abstraction	Taesup Kim, Sungjin Ahn, Yoshua Bengio	We introduce a variational approach to learning and inference of temporally hierarchical structure and representation for sequential data.
1038	Subquadratic High-Dimensional Hierarchical Clustering	Amir Abboud, Vincent Cohen-Addad, Hussein Houdrouge	We consider the widely-used average-linkage, single-linkage, and Ward’s methods for computing hierarchical clusterings of high-dimensional Euclidean inputs.
1039	Learning Auctions with Robust Incentive Guarantees	Jacob D. Abernethy, Rachel Cummings, Bhuvesh Kumar, Sam Taggart, Jamie H. Morgenstern	In this paper, we combine tools from differential privacy, mechanism design, and sample complexity to give a repeated auction that (1) learns bidder demand from past data, (2) is approximately revenue-optimal, and (3) is strategically robust, as it incentivizes bidders to behave truthfully.
1040	Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games	Kaiqing Zhang, Zhuoran Yang, Tamer Basar	Building upon this, we develop three projected nested-gradient methods that are guaranteed to converge to the NE of the game.
1041	Uniform convergence may be unable to explain generalization in deep learning	Vaishnavh Nagarajan, J. Zico Kolter	Through these findings, we cast doubt on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.
1042	A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions	Mejbah Alam, Justin Gottschlich, Nesime Tatbul, Javier S. Turek, Tim Mattson, Abdullah Muzahid	In this paper, we apply MP to the automation of software performance regression testing.
1043	DTWNet: a Dynamic Time Warping Network	Xingyu Cai, Tingyang Xu, Jinfeng Yi, Junzhou Huang, Sanguthevar Rajasekaran	In this paper, we propose a novel component in an artificial neural network.
1044	Structured Graph Learning Via Laplacian Spectral Constraints	Sandeep Kumar, Jiaxi Ying, Jose Vinicius de Miranda Cardoso, Daniel Palomar	In this paper, we first show that, for a set of important graph families, it is possible to convert the combinatorial constraints of structure into eigenvalue constraints of the graph Laplacian matrix. Then we introduce a unified graph learning framework lying at the integration of the spectral properties of the Laplacian matrix with Gaussian graphical modeling, which is capable of learning structures of a large class of graph families.
1045	Thresholding Bandit with Optimal Aggregate Regret	Chao Tao, Saúl Blanco, Jian Peng, Yuan Zhou	We introduce LSA, a new, simple and anytime algorithm that aims to minimize the aggregate regret (or the expected number of mis-classified arms).
1046	Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks	Yuanzhi Li, Colin Wei, Tengyu Ma	Towards explaining this phenomenon, we devise a setting in which we can prove that a two layer network trained with large initial learning rate and annealing provably generalizes better than the same network trained with a small learning rate from the start.
1047	Rethinking Kernel Methods for Node Representation Learning on Graphs	Yu Tian, Long Zhao, Xi Peng, Dimitris Metaxas	Here, we present a novel theoretical kernel-based framework for node classification that can bridge the gap between these two representation learning problems on graphs.
1048	Causal Confusion in Imitation Learning	Pim de Haan, Dinesh Jayaraman, Sergey Levine	We investigate how this problem arises, and propose a solution to combat it through targeted interventions—either environment interaction or expert queries—to determine the correct causal model.
1049	Optimizing Generalized PageRank Methods for Seed-Expansion Community Detection	Pan Li, I Chien, Olgica Milenkovic	Given this result, we propose a new GPR, termed Inverse PR (IPR), with LP weights that increase for the initial few steps of the walks.
1050	The Case for Evaluating Causal Models Using Interventional Measures and Empirical Data	Amanda Gentzel, Dan Garant, David Jensen	We argue for more frequent use of evaluation techniques that examine interventional measures rather than structural or observational measures, and that evaluate those measures on empirical data rather than synthetic data.
1051	Dimension-Free Bounds for Low-Precision Training	Zheng Li, Christopher M. De Sa	In this paper, we derive new bounds for low-precision training algorithms that do not contain the dimension $d$, which lets us better understand what affects the convergence of these algorithms as parameters scale.
1052	Concentration of risk measures: A Wasserstein distance approach	Sanjay P. Bhat, Prashanth L.A.	Concentration of risk measures: A Wasserstein distance approach
1053	Meta-Inverse Reinforcement Learning with Probabilistic Context Variables	Lantao Yu, Tianhe Yu, Chelsea Finn, Stefano Ermon	To this end, we propose a deep latent variable model that is capable of learning rewards from unstructured, multi-task demonstration data, and critically, use this experience to infer robust rewards for new, structurally-similar tasks from a single demonstration.
1054	Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction	Aviral Kumar, Justin Fu, Matthew Soh, George Tucker, Sergey Levine	Based on our analysis, we propose a practical algorithm, bootstrapping error accumulation reduction (BEAR).
1055	Bayesian Optimization with Unknown Search Space	Huong Ha, Santu Rana, Sunil Gupta, Thanh Nguyen, Hung Tran-The, Svetha Venkatesh	To address this problem, we propose a systematic volume expansion strategy for the Bayesian optimization.
1056	On the Downstream Performance of Compressed Word Embeddings	Avner May, Jian Zhang, Tri Dao, Christopher Ré	We thus propose the eigenspace overlap score as a new measure.
1057	Multivariate Distributionally Robust Convex Regression under Absolute Error Loss	Jose Blanchet, Peter W. Glynn, Jun Yan, Zhengqing Zhou	This paper proposes a novel non-parametric multidimensional convex regression estimator which is designed to be robust to adversarial perturbations in the empirical measure.
1058	Neural Relational Inference with Fast Modular Meta-learning	Ferran Alet, Erica Weng, Tomás Lozano-Pérez, Leslie Pack Kaelbling	We frame relational inference as a modular meta-learning problem, where neural modules are trained to be composed in different ways to solve many tasks.
1059	Gradient based sample selection for online continual learning	Rahaf Aljundi, Min Lin, Baptiste Goujaud, Yoshua Bengio	In this work, we formulate sample selection as a constraint reduction problem based on the constrained optimization view of continual learning.
1060	Attribution-Based Confidence Metric For Deep Neural Networks	Susmit Jha, Sunny Raj, Steven Fernandes, Sumit K. Jha, Somesh Jha, Brian Jalaian, Gunjan Verma, Ananthram Swami	We propose a novel confidence metric, namely, attribution-based confidence (ABC) for deep neural networks (DNNs).
1061	Theoretical evidence for adversarial robustness through randomization	Rafael Pinot, Laurent Meunier, Alexandre Araujo, Hisashi Kashima, Florian Yger, Cedric Gouy-Pailler, Jamal Atif	This paper investigates the theory of robustness against adversarial attacks.
1062	Online Continual Learning with Maximal Interfered Retrieval	Rahaf Aljundi, Eugene Belilovsky, Tinne Tuytelaars, Laurent Charlin, Massimo Caccia, Min Lin, Lucas Page-Caccia	In this work, we consider a controlled sampling of memories for replay.
1063	Neural Attribution for Semantic Bug-Localization in Student Programs	Rahul Gupta, Aditya Kanade, Shirish Shevade	In this work, we present NeuralBugLocator, a deep learning based technique, that can localize the bugs in a faulty program with respect to a failing test, without even running the program.
1064	Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates	Carlos Riquelme, Hugo Penedones, Damien Vincent, Hartmut Maennel, Sylvain Gelly, Timothy A. Mann, Andre Barreto, Gergely Neu	In this paper, we argue that the larger bias of TD can be a result of the amplification of local approximation errors.
1065	SPoC: Search-based Pseudocode to Code	Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, Percy S. Liang	We consider the task of mapping pseudocode to executable code, assuming a one-to-one correspondence between lines of pseudocode and lines of code.
1066	Generative Modeling by Estimating Gradients of the Data Distribution	Yang Song, Stefano Ermon	We introduce a new generative model where samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching.
1067	Adversarial Music: Real world Audio Adversary against Wake-word Detection System	Juncheng Li, Shuhui Qu, Xinjian Li, Joseph Szurley, J. Zico Kolter, Florian Metze	In this work, we target our attack on the wake-word detection system.
1068	Prediction of Spatial Point Processes: Regularized Method with Out-of-Sample Guarantees	Muhammad Osama, Dave Zachariah, Peter Stoica	In this paper, we develop a method to infer predictive intensity intervals by learning a spatial model using a regularized criterion.
1069	Debiased Bayesian inference for average treatment effects	Kolyan Ray, Botond Szabo	Working in the standard potential outcomes framework, we propose a data-driven modification to an arbitrary (nonparametric) prior based on the propensity score that corrects for the first-order posterior bias, thereby improving performance.
1070	Margin-Based Generalization Lower Bounds for Boosted Classifiers	Allan Grønlund, Lior Kamma, Kasper Green Larsen, Alexander Mathiasen, Jelani Nelson	In this paper, we give the first margin-based lower bounds on the generalization error of boosted classifiers.
1071	Connections Between Mirror Descent, Thompson Sampling and the Information Ratio	Julian Zimmert, Tor Lattimore	We make a formal connection, showing that the information-theoretic bounds in most applications are derived from existing techniques from online convex optimisation.
1072	Graph Transformer Networks	Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, Hyunwoo J. Kim	In this paper, we propose Graph Transformer Networks (GTNs) that are capable of generating new graph structures, which involve identifying useful connections between unconnected nodes on the original graph, while learning effective node representation on the new graphs in an end-to-end fashion.
1073	Learning to Confuse: Generating Training Time Adversarial Data with Auto-Encoder	Ji Feng, Qi-Zhi Cai, Zhi-Hua Zhou	In this work, we consider one challenging training time attack by modifying training data with bounded perturbation, hoping to manipulate the behavior (both targeted or non-targeted) of any corresponding trained classifier during test time when facing clean samples.
1074	The Impact of Regularization on High-dimensional Logistic Regression	Fariborz Salehi, Ehsan Abbasi, Babak Hassibi	In the high-dimensional regime the underlying parameter vector is often structured (sparse, block-sparse, finite-alphabet, etc.) and so in this paper we study regularized logistic regression (RLR), where a convex regularizer that encourages the desired structure is added to the negative of the log-likelihood function.
1075	Adaptive Density Estimation for Generative Models	Thomas Lucas, Konstantin Shmelkov, Karteek Alahari, Cordelia Schmid, Jakob Verbeek	As a solution, we propose the use of deep invertible transformations in the latent variable decoder.
1076	Fast and Provable ADMM for Learning with Generative Priors	Fabian Latorre, Armin Eftekhari, Volkan Cevher	In this work, we propose a (linearized) Alternating Direction Method-of-Multipliers (ADMM) algorithm for minimizing a convex function subject to a nonconvex constraint.
1077	Weighted Linear Bandits for Non-Stationary Environments	Yoan Russac, Claire Vernade, Olivier Cappé	To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past.
1078	Improved Regret Bounds for Bandit Combinatorial Optimization	Shinji Ito, Daisuke Hatano, Hanna Sumita, Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-Ichi Kawarabayashi	In this paper, we aim to reveal the property, which makes the bandit combinatorial optimization hard.
1079	Pareto Multi-Task Learning	Xi Lin, Hui-Ling Zhen, Zhenhua Li, Qing-Fu Zhang, Sam Kwong	In this paper, we generalize this idea and propose a novel Pareto multi-task learning algorithm (Pareto MTL) to find a set of well-distributed Pareto solutions which can represent different trade-offs among different tasks.
1080	SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits	Etienne Boursier, Vianney Perchet	We present a decentralized algorithm that achieves the same performance as a centralized one, contradicting the existing lower bounds for that problem.
1081	Novel positional encodings to enable tree-based transformers	Vighnesh Shiv, Chris Quirk	Motivated by this property, we propose a method to extend transformers to tree-structured data, enabling sequence-to-tree, tree-to-sequence, and tree-to-tree mappings.
1082	A Domain Agnostic Measure for Monitoring and Evaluating GANs	Paulina Grnarova, Kfir Y. Levy, Aurelien Lucchi, Nathanael Perraudin, Ian Goodfellow, Thomas Hofmann, Andreas Krause	We leverage the notion of duality gap from game theory to propose a measure that addresses both (i) and (ii) at a low computational cost.
1083	Submodular Function Minimization with Noisy Evaluation Oracle	Shinji Ito	For this problem, we provide an algorithm that returns an $O(n^{3/2}/\sqrt{T})$-additive approximate solution in expectation, where $n$ and $T$ stand for the size of the problem and the number of oracle calls, respectively.
1084	Counting the Optimal Solutions in Graphical Models	Radu Marinescu, Rina Dechter	We introduce #opt, a new inference task for graphical models which calls for counting the number of optimal solutions of the model.
1085	Modelling the Dynamics of Multiagent Q-Learning in Repeated Symmetric Games: a Mean Field Theoretic Approach	Shuyue Hu, Chin-wing Leung, Ho-fung Leung	In this paper, we study an n-agent setting with n tending to infinity, such that agents learn their policies concurrently over repeated symmetric bimatrix games with some other agents.
1086	Deep Multimodal Multilinear Fusion with High-order Polynomial Pooling	Ming Hou, Jiajia Tang, Jianhai Zhang, Wanzeng Kong, Qibin Zhao	In this work, we first propose a polynomial tensor pooling (PTP) block for integrating multimodal features by considering high-order moments, followed by a tensorized fully connected layer. Treating PTP as a building block, we further establish a hierarchical polynomial fusion network (HPFN) to recursively transmit local correlations into global ones.
1087	Bootstrapping Upper Confidence Bound	Botao Hao, Yasin Abbasi Yadkori, Zheng Wen, Guang Cheng	In this paper, we propose a non-parametric and data-dependent UCB algorithm based on the multiplier bootstrap.
1088	Integer Discrete Flows and Lossless Compression	Emiel Hoogeboom, Jorn Peters, Rianne van den Berg, Max Welling	For that reason, we introduce a flow-based generative model for ordinal discrete data called Integer Discrete Flow (IDF): a bijective integer map that can learn rich transformations on high-dimensional data.
1089	Structured Prediction with Projection Oracles	Mathieu Blondel	We propose in this paper a general framework for deriving loss functions for structured prediction.
1090	A Primal Dual Formulation For Deep Learning With Constraints	Yatin Nandwani, Abhishek Pathak, Mausam, Parag Singla	In this paper, we present a constrained optimization formulation for training a deep network with a given set of hard constraints on output labels.
1091	Screening Sinkhorn Algorithm for Regularized Optimal Transport	Mokhtar Z. Alaya, Maxime Berar, Gilles Gasso, Alain Rakotomamonjy	We introduce in this paper a novel strategy for efficiently approximating the Sinkhorn distance between two discrete measures.
1092	PAC-Bayes Un-Expected Bernstein Inequality	Zakaria Mhammedi, Peter Grünwald, Benjamin Guedj	We present a new PAC-Bayesian generalization bound.
1093	Are Labels Required for Improving Adversarial Robustness?	Jean-Baptiste Alayrac, Jonathan Uesato, Po-Sen Huang, Alhussein Fawzi, Robert Stanforth, Pushmeet Kohli	Our main insight is that unlabeled data can be a competitive alternative to labeled data for training adversarially robust models.
1094	Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies	Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor	In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with greedy policies (acting by 1-step planning) can achieve tight minimax performance in terms of regret, $O(\sqrt{HSAT})$.
1095	Multi-objective Bayesian optimisation with preferences over objectives	Majid Abdolshah, Alistair Shilton, Santu Rana, Sunil Gupta, Svetha Venkatesh	We present a multi-objective Bayesian optimisation algorithm that allows the user to express preference-order constraints on the objectives of the type objective A is more important than objective B.
1096	Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging	Pooria Joulani, András György, Csaba Szepesvari	We present two new algorithms, ASYNCADA and HEDGEHOG, for asynchronous sparse online and stochastic optimization.
1097	Calibration tests in multi-class classification: A unifying framework	David Widmann, Fredrik Lindsten, Dave Zachariah	We propose and study calibration measures for multi-class classification that generalize existing measures such as the expected calibration error, the maximum calibration error, and the maximum mean calibration error.
1098	Classification Accuracy Score for Conditional Generative Models	Suman Ravuri, Oriol Vinyals	To test this latter hypothesis, we use class-conditional generative models from a number of model classes – variational autoencoders, autoregressive models, and generative adversarial networks (GANs) – to infer the class labels of real data.
1099	Theoretical Analysis of Adversarial Learning: A Minimax Approach	Zhuozhuo Tu, Jingwei Zhang, Dacheng Tao	In this paper, we propose a general theoretical method for analyzing the risk bound in the presence of adversaries.
1100	Multiagent Evaluation under Incomplete Information	Mark Rowland, Shayegan Omidshafiei, Karl Tuyls, Julien Perolat, Michal Valko, Georgios Piliouras, Remi Munos	We propose adaptive algorithms for accurate ranking, provide correctness and sample complexity guarantees, then introduce a means of connecting uncertainties in noisy match outcomes to uncertainties in rankings.
1101	Tree-Sliced Variants of Wasserstein Distances	Tam Le, Makoto Yamada, Kenji Fukumizu, Marco Cuturi	We consider in this work a more general family of ground metrics, namely tree metrics, which also yield fast closed-form computations and are negative definite, and of which the sliced-Wasserstein distance is a particular case (the tree is a chain).
1102	Beyond temperature scaling: Obtaining well-calibrated multi-class probabilities with Dirichlet calibration	Meelis Kull, Miquel Perello Nieto, Markus Kängsepp, Telmo Silva Filho, Hao Song, Peter Flach	We propose a natively multiclass calibration method applicable to classifiers from any model class, derived from Dirichlet distributions and generalising the beta calibration method from binary classification.
1103	Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing	Meyer Scetbon, Gael Varoquaux	Here, we show that $L^p$ distances (with $p\geq 1$) between these distribution representatives give metrics on the space of distributions that are well-behaved to detect differences between distributions as they metrize the weak convergence.
1104	Robustness Verification of Tree-based Models	Hongge Chen, Huan Zhang, Si Si, Yang Li, Duane Boning, Cho-Jui Hsieh	For general problems, by exploiting the boxicity of the graph, we devise an efficient verification algorithm that can give tight lower bounds on robustness of decision tree ensembles, and allows iterative improvement and any-time termination.
1105	Towards Interpretable Reinforcement Learning Using Attention Augmented Agents	Alexander Mott, Daniel Zoran, Mike Chrzanowski, Daan Wierstra, Danilo Jimenez Rezende	Inspired by recent work in attention models for image captioning and question answering, we present a soft attention model for the reinforcement learning domain.
1106	Fast and Accurate Stochastic Gradient Estimation	Beidi Chen, Yingchen Xu, Anshumali Shrivastava	In this paper, we break this barrier by providing the first demonstration of a scheme, Locality sensitive hashing (LSH) sampled Stochastic Gradient Descent (LGD), which leads to superior gradient estimation while keeping the sampling cost per iteration similar to that of the uniform sampling.
1107	Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning	Igor Colin, Ludovic Dos Santos, Kevin Scaman	We investigate the theoretical limits of pipeline parallel learning of deep learning architectures, a distributed setup in which the computation is distributed per layer instead of per example.
1108	Root Mean Square Layer Normalization	Biao Zhang, Rico Sennrich	In this paper, we hypothesize that re-centering invariance in LayerNorm is dispensable and propose root mean square layer normalization, or RMSNorm.
1109	Universality in Learning from Linear Measurements	Ehsan Abbasi, Fariborz Salehi, Babak Hassibi	We study the problem of recovering a structured signal from independently and identically drawn linear measurements.
1110	Planning in entropy-regularized Markov decision processes and games	Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Menard, Remi Munos, Michal Valko	We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment.
1111	Exponentially convergent stochastic k-PCA without variance reduction	Cheng Tang	We present Matrix Krasulina, an algorithm for online k-PCA, by generalizing the classic Krasulina’s method (Krasulina, 1969) from vector to matrix case.
1112	R2D2: Reliable and Repeatable Detector and Descriptor	Jerome Revaud, Cesar De Souza, Martin Humenberger, Philippe Weinzaepfel	In this work, we argue that repeatable regions are not necessarily discriminative and can therefore lead to selecting suboptimal keypoints.
1113	Selective Sampling-based Scalable Sparse Subspace Clustering	Shin Matsushima, Maria Brbic	To overcome this limitation, we introduce Selective Sampling-based Scalable Sparse Subspace Clustering (S5C) algorithm which selects subsamples based on the approximated subgradients and linearly scales with the number of data points in terms of time and memory requirements.
1114	A General Framework for Symmetric Property Estimation	Moses Charikar, Kirankumar Shiragur, Aaron Sidford	In this paper we provide a general framework for estimating symmetric properties of distributions from i.i.d. samples.
1115	Structured Variational Inference in Continuous Cox Process Models	Virginia Aglietti, Edwin V. Bonilla, Theodoros Damoulas, Sally Cripps	We propose a scalable framework for inference in a continuous sigmoidal Cox process that assumes the corresponding intensity function is given by a Gaussian process (GP) prior transformed with a scaled logistic sigmoid function.
1116	Generalization of Reinforcement Learners with Working and Episodic Memory	Meire Fortunato, Melissa Tan, Ryan Faulkner, Steven Hansen, Adrià Puigdomènech Badia, Gavin Buttimore, Charles Deck, Joel Z. Leibo, Charles Blundell	In this paper, we aim to develop a comprehensive methodology to test different kinds of memory in an agent and assess how well the agent can apply what it learns in training to a holdout set that differs from the training set along dimensions that we suggest are relevant for evaluating memory-specific generalization.
1117	Distribution Learning of a Random Spatial Field with a Location-Unaware Mobile Sensor	Meera Pai, Animesh Kumar	While GPS or other localization methods can reduce this uncertainty, we address a more fundamental question: can a location-unaware mobile sensor, recording samples on a directed non-uniform random walk, learn the statistical distribution (as a function of space) of an underlying random process (spatial field)?
1118	Hindsight Credit Assignment	Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Gheshlaghi Azar, Bilal Piot, Nicolas Heess, Hado P. van Hasselt, Gregory Wayne, Satinder Singh, Doina Precup, Remi Munos	We consider the problem of efficient credit assignment in reinforcement learning.
1119	Efficient Identification in Linear Structural Causal Models with Instrumental Cutsets	Daniel Kumor, Bryant Chen, Elias Bareinboim	In this paper, we investigate graphical conditions to allow efficient identification in arbitrary linear structural causal models (SCMs).
1120	Kernelized Bayesian Softmax for Text Generation	Ning Miao, Hao Zhou, Chengqi Zhao, Wenxian Shi, Lei Li	In this paper, we propose KerBS, a novel approach for learning better embeddings for text generation.
1121	When to Trust Your Model: Model-Based Policy Optimization	Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine	In this paper, we study the role of model usage in policy optimization both theoretically and empirically.
1122	Correlation Clustering with Adaptive Similarity Queries	Marco Bressan, Nicolò Cesa-Bianchi, Andrea Paudice, Fabio Vitale	In this work we investigate correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disagreements and the total number of queries.
1123	Control What You Can: Intrinsically Motivated Task-Planning Agent	Sebastian Blaes, Marin Vlastelica Pogancic, Jiajie Zhu, Georg Martius	We present a novel intrinsically motivated agent that learns how to control the environment in a sample efficient manner, that is with as few environment interactions as possible, by optimizing learning progress.
1124	Selecting causal brain features with a single conditional independence test per feature	Atalanti Mastakouri, Bernhard Schölkopf, Dominik Janzing	We propose a constraint-based causal feature selection method for identifying causes of a given target variable, selecting from a set of candidate variables, while there can also be hidden variables acting as common causes with the target.
1125	Continuous Hierarchical Representations with Poincaré Variational Auto-Encoders	Emile Mathieu, Charline Le Lan, Chris J. Maddison, Ryota Tomioka, Yee Whye Teh	We therefore endow VAEs with a Poincaré ball model of hyperbolic geometry as a latent space and rigorously derive the necessary methods to work with two main Gaussian generalisations on that space.
1126	A Generic Acceleration Framework for Stochastic Composite Optimization	Andrei Kulunchakov, Julien Mairal	In this paper, we introduce various mechanisms to obtain accelerated first-order stochastic optimization algorithms when the objective function is convex or strongly convex.
1127	Beating SGD Saturation with Tail-Averaging and Minibatching	Nicole Muecke, Gergely Neu, Lorenzo Rosasco	In this paper, we consider least squares learning in a nonparametric setting and contribute to filling this gap by focusing on the effect and interplay of multiple passes, mini-batching and averaging, in particular tail averaging.
1128	Random Quadratic Forms with Dependence: Applications to Restricted Isometry and Beyond	Arindam Banerjee, Qilong Gu, Vidyashankar Sivakumar, Steven Z. Wu	In this paper, we show that such independence is in fact not needed for such results which continue to hold under fairly general dependence structures.
1129	Continuous-time Models for Stochastic Optimization Algorithms	Antonio Orvieto, Aurelien Lucchi	We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods.
1130	Curriculum-guided Hindsight Experience Replay	Meng Fang, Tianyi Zhou, Yali Du, Lei Han, Zhengyou Zhang	In this paper, we propose to 1) adaptively select the failed experiences for replay according to the proximity to the true goals and the curiosity of exploration over diverse pseudo goals, and 2) gradually change the proportion of the goal-proximity and the diversity-based curiosity in the selection criteria: we adopt a human-like learning strategy that enforces more curiosity in earlier stages and changes to larger goal-proximity later.
1131	Implicit Semantic Data Augmentation for Deep Networks	Yulin Wang, Xuran Pan, Shiji Song, Hong Zhang, Gao Huang, Cheng Wu	In this paper, we propose a novel implicit semantic data augmentation (ISDA) approach to complement traditional augmentation techniques like flipping, translation or rotation.
1132	MetaInit: Initializing learning by learning to initialize	Yann N. Dauphin, Samuel Schoenholz	In this work, we introduce an algorithm called MetaInit as a step towards automating the search for good initializations using meta-learning.
1133	Scalable Deep Generative Relational Model with High-Order Node Dependence	Xuhui Fan, Bin Li, Caoyuan Li, Scott Sisson, Ling Chen	In this work, we propose a probabilistic framework for relational data modelling and latent structure exploring.
1134	Random Path Selection for Continual Learning	Jathushan Rajasegaran, Munawar Hayat, Salman H. Khan, Fahad Shahbaz Khan, Ling Shao	In this paper, we propose a random path selection algorithm, called RPS-Net, that progressively chooses optimal paths for the new tasks while encouraging parameter sharing and reuse.
1135	Efficient Algorithms for Smooth Minimax Optimization	Kiran K. Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh	For strongly-convex $g(\cdot, y),\ \forall y$, we propose a new direct optimal algorithm combining Mirror-Prox and Nesterov’s AGD, and show that it can find global optimum in $\widetilde{O}\left(1/k^2 \right)$ iterations, improving over current state-of-the-art rate of $O(1/k)$.
1136	Shadowing Properties of Optimization Algorithms	Antonio Orvieto, Aurelien Lucchi	In an attempt to encourage the use of continuous-time methods in optimization, we show that, if some additional regularity on the objective is assumed, the ODE representations of Gradient Descent and Heavy-ball do not suffer from the aforementioned problem, once we allow for a small perturbation on the algorithm initial condition.
1137	Causal Regularization	Dominik Janzing	We argue that regularizing terms in standard regression methods not only help against overfitting finite data, but sometimes also help in getting better causal models.
1138	Learning Hawkes Processes from a handful of events	Farnood Salehi, William Trouleau, Matthias Grossglauser, Patrick Thiran	To solve both issues, we develop in this work an efficient algorithm based on variational expectation-maximization.
1139	Unsupervised Object Segmentation by Redrawing	Mickaël Chen, Thierry Artières, Ludovic Denoyer	We present ReDO, a new model able to extract objects from images without any annotation in an unsupervised way.
1140	Regret Bounds for Learning State Representations in Reinforcement Learning	Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard	We propose an algorithm (UCB-MS) with $O(\sqrt{T})$ regret in any communicating Markov decision process.
1141	Band-Limited Gaussian Processes: The Sinc Kernel	Felipe Tobar	We propose a novel class of Gaussian processes (GPs) whose spectra have compact support, meaning that their sample trajectories are almost-surely band limited.
1142	Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification	Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, Luca Oneto, Massimiliano Pontil	We study the problem of fair binary classification using the notion of Equal Opportunity.
1143	Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning	Valerio Perrone, Huibin Shen	In this work, we introduce a method to automatically design the BO search space by relying on evaluations of previous black-box functions.
1144	Streaming Bayesian Inference for Crowdsourced Classification	Edoardo Manino, Long Tran-Thanh, Nicholas Jennings	In this paper, we revisit the problem of binary classification from crowdsourced data.
1145	Neuropathic Pain Diagnosis Simulator for Causal Discovery Algorithm Evaluation	Ruibo Tu, Kun Zhang, Bo Bertilson, Hedvig Kjellstrom, Cheng Zhang	In this work, we handle the problem of evaluating causal discovery algorithms by building a flexible simulator in the medical setting. We develop a neuropathic pain diagnosis simulator, inspired by the fact that the biological processes of neuropathic pathophysiology are well studied with well-understood causal influences.
1146	Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs	Jonas Kubilius, Martin Schrimpf, Ha Hong, Najib Majaj, Rishi Rajalingham, Elias Issa, Kohitij Kar, Pouya Bashivan, Jonathan Prescott-Roy, Kailyn Schmidt, Aran Nayebi, Daniel Bear, Daniel L. Yamins, James J. DiCarlo	Here we demonstrate that better anatomical alignment to the brain and high performance on machine learning as well as neuroscience measures do not have to be in contradiction.
1147	k-Means Clustering of Lines for Big Data	Yair Marom, Dan Feldman	We suggest the first PTAS that computes a $(1+\epsilon)$-approximation to this problem in time $O(n \log n)$ for any constant approximation error $\epsilon \in (0, 1)$, and constant integers $k, d \geq 1$.
1148	Random Projections and Sampling Algorithms for Clustering of High-Dimensional Polygonal Curves	Stefan Meintrup, Alexander Munteanu, Dennis Rohde	We study the $k$-median clustering problem for high-dimensional polygonal curves with finite but unbounded number of vertices.
1149	Recurrent Space-time Graph Neural Networks	Andrei Nicolicioiu, Iulia Duta, Marius Leordeanu	We propose a neural graph model, recurrent in space and time, suitable for capturing both the local appearance and the complex higher-level interactions of different entities and objects within the changing world scene.
1150	Uncertainty on Asynchronous Time Event Prediction	Bertrand Charpentier, Marin Biloš, Stephan Günnemann	In this work, we tackle the task of predicting the next event (given a history), and how this prediction changes with the passage of time.
1151	Accurate, reliable and fast robustness evaluation	Wieland Brendel, Jonas Rauber, Matthias Kümmerer, Ivan Ustyuzhaninov, Matthias Bethge	We here develop a new set of gradient-based adversarial attacks which (a) are more reliable in the face of gradient-masking than other gradient-based attacks, (b) perform better and are more query efficient than current state-of-the-art gradient-based attacks, (c) can be flexibly adapted to a wide range of adversarial criteria and (d) require virtually no hyperparameter tuning.
1152	Sparse High-Dimensional Isotonic Regression	David Gamarnik, Julia Gaudio	We consider the problem of estimating an unknown coordinate-wise monotone function given noisy measurements, known as the isotonic regression problem.
1153	Triad Constraints for Learning Causal Structure of Latent Variables	Ruichu Cai, Feng Xie, Clark Glymour, Zhifeng Hao, Kun Zhang	In this paper, by properly leveraging the non-Gaussianity of the data, we propose to estimate the structure over latent variables with the so-called Triad constraints: we design a form of “pseudo-residual” from three variables, and show that when causal relations are linear and noise terms are non-Gaussian, the causal direction between the latent variables for the three observed variables is identifiable by checking a certain kind of independence relationship.
1154	On the Inductive Bias of Neural Tangent Kernels	Alberto Bietti, Julien Mairal	In particular, we study smoothness, approximation, and stability properties of functions with finite norm, including stability to image deformations in the case of convolutional networks, and compare to other known kernels for similar architectures.
1155	Cross-Domain Transferability of Adversarial Perturbations	Muhammad Muzammal Naseer, Salman H. Khan, Muhammad Haris Khan, Fahad Shahbaz Khan, Fatih Porikli	To this end, we propose a framework capable of launching highly transferable attacks that crafts adversarial patterns to mislead networks trained on wholly different domains.
1156	Shallow RNN: Accurate Time-series Classification on Resource Constrained Devices	Don Dennis, Durmus Alp Emre Acar, Vikram Mandikal, Vinu Sankar Sadasivan, Venkatesh Saligrama, Harsha Vardhan Simhadri, Prateek Jain	To induce long-term dependencies, and yet admit parallelization, we introduce novel shallow RNNs.
1157	Kernel quadrature with DPPs	Ayoub Belhadji, Rémi Bardenet, Pierre Chainais	We study quadrature rules for functions living in an RKHS, using nodes sampled from a projection determinantal point process (DPP).
1158	REM: From Structural Entropy to Community Structure Deception	Yiwei Liu, Jiamou Liu, Zijian Zhang, Liehuang Zhu, Angsheng Li	To this end, we propose a community-based structural entropy to express the amount of information revealed by a community structure.
1159	Sim2real transfer learning for 3D human pose estimation: motion to the rescue	Carl Doersch, Andrew Zisserman	In this paper, we show that standard neural-network approaches, which perform poorly when trained on synthetic RGB images, can perform well when the data is pre-processed to extract cues about the person’s motion, notably as optical flow and the motion of 2D keypoints.
1160	Self-Supervised Deep Learning on Point Clouds by Reconstructing Space	Bjarne Sievers, Jonathan Sauder	We propose a self-supervised learning task for deep learning on raw point cloud data in which a neural network is trained to reconstruct point clouds whose parts have been randomly rearranged.
1161	Piecewise Strong Convexity of Neural Networks	Tristan Milne	We study the loss surface of a feed-forward neural network with ReLU non-linearities, regularized with weight decay.
1162	Minimum Stein Discrepancy Estimators	Alessandro Barp, Francois-Xavier Briol, Andrew Duncan, Mark Girolami, Lester Mackey	We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths.
1163	Fast and Furious Learning in Zero-Sum Games: Vanishing Regret with Non-Vanishing Step Sizes	James Bailey, Georgios Piliouras	We show for the first time that it is possible to reconcile in online learning in zero-sum games two seemingly contradictory objectives: vanishing time-average regret and non-vanishing step sizes.
1164	Generalization Bounds for Neural Networks via Approximate Description Length	Amit Daniely, Elad Granot	We investigate the sample complexity of networks with bounds on the magnitude of their weights.
1165	Provably robust boosted decision stumps and trees against adversarial attacks	Maksym Andriushchenko, Matthias Hein	We show in this paper that for boosted decision stumps the exact min-max robust loss and test error for an $l_\infty$-attack can be computed in $O(T\log T)$ time per input, where $T$ is the number of decision stumps, and the optimal update step of the ensemble can be done in $O(n^2\,T\log T)$, where $n$ is the number of data points.
1166	Convergence of Adversarial Training in Overparametrized Neural Networks	Ruiqi Gao, Tianle Cai, Haochuan Li, Cho-Jui Hsieh, Liwei Wang, Jason D. Lee	This paper provides a partial answer to the success of adversarial training, by showing that it converges to a network where the surrogate loss with respect to the attack algorithm is within $\epsilon$ of the optimal robust loss.
1167	A Composable Specification Language for Reinforcement Learning Tasks	Kishor Jothimurugan, Rajeev Alur, Osbert Bastani	We propose a language for specifying complex control tasks, along with an algorithm that compiles specifications in our language into a reward function and automatically performs reward shaping.
1168	The Option Keyboard: Combining Skills in Reinforcement Learning	Andre Barreto, Diana Borsa, Shaobo Hou, Gheorghe Comanici, Eser Aygün, Philippe Hamel, Daniel Toyama, Jonathan Hunt, Shibl Mourad, David Silver, Doina Precup	Based on this premise, we propose a framework for combining skills using the formalism of options.
1169	Unified Language Model Pre-training for Natural Language Understanding and Generation	Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon	This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks.
1170	Learning to Correlate in Multi-Player General-Sum Sequential Games	Andrea Celli, Alberto Marchesi, Tommaso Bianchi, Nicola Gatti	In this paper, we focus on coarse correlated equilibria (CCEs) in sequential games.
1171	Stochastic Continuous Greedy ++: When Upper and Lower Bounds Match	Amin Karbasi, Hamed Hassani, Aryan Mokhtari, Zebang Shen	In this paper, we develop Stochastic Continuous Greedy++ (SCG++), the first efficient variant of a conditional gradient method for maximizing a continuous submodular function subject to a convex constraint.
1172	Generative Well-intentioned Networks	Justin Cosentino, Jun Zhu	We propose Generative Well-intentioned Networks (GWINs), a novel framework for increasing the accuracy of certainty-based, closed-world classifiers.
1173	Online-Within-Online Meta-Learning	Giulia Denevi, Dimitris Stamos, Carlo Ciliberto, Massimiliano Pontil	We study the problem of learning a series of tasks in a fully online Meta-Learning setting.
1174	Learning step sizes for unfolded sparse coding	Pierre Ablin, Thomas Moreau, Mathurin Massias, Alexandre Gramfort	In this paper, we study the selection of adapted step sizes for ISTA.
1175	Biases for Emergent Communication in Multi-agent Reinforcement Learning	Tom Eccles, Yoram Bachrach, Guy Lever, Angeliki Lazaridou, Thore Graepel	We introduce inductive biases for positive signalling and positive listening, which ease this problem.
1176	Episodic Memory in Lifelong Language Learning	Cyprien de Masson d’Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama	We propose an episodic memory model that performs sparse experience replay and local adaptation to mitigate catastrophic forgetting in this setup.
1177	A Simple Baseline for Bayesian Uncertainty in Deep Learning	Wesley J. Maddox, Pavel Izmailov, Timur Garipov, Dmitry P. Vetrov, Andrew Gordon Wilson	We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning.
1178	Communication-efficient Distributed SGD with Sketching	Nikita Ivkin, Daniel Rothchild, Enayat Ullah, Vladimir Braverman, Ion Stoica, Raman Arora	Motivated by the success of sketching methods in sub-linear/streaming algorithms, we introduce Sketched-SGD, an algorithm for carrying out distributed SGD by communicating sketches instead of full gradients.
1179	Modeling Conceptual Understanding in Image Reference Games	Rodolfo Corona Rodriguez, Stephan Alaniz, Zeynep Akata	In this work, we present both an image reference game between a speaker and a population of listeners where reasoning about the concepts other agents can comprehend is necessary and a model formulation with this capability.
1180	Kalman Filter, Sensor Fusion, and Constrained Regression: Equivalences and Insights	Maria Jahja, David Farrow, Roni Rosenfeld, Ryan J. Tibshirani	In this work, we show that the state estimates from the KF in a standard linear dynamical system setting are equivalent to those given by the KF in a transformed system, with infinite process noise (i.e., a “flat prior”) and an augmented measurement space.
1181	Near Neighbor: Who is the Fairest of Them All?	Sariel Har-Peled, Sepideh Mahabadi	In this work we study a “fair” variant of the near neighbor problem.
1182	Outlier-robust estimation of a sparse linear model using $\ell_1$-penalized Huber's $M$-estimator	Arnak Dalalyan, Philip Thompson	We study the problem of estimating a $p$-dimensional $s$-sparse vector in a linear model with Gaussian design.
1183	Learning nonlinear level sets for dimensionality reduction in function approximation	Guannan Zhang, Jiaxin Zhang, Jacob Hinkle	We developed a Nonlinear Level-set Learning (NLL) method for dimensionality reduction in high-dimensional function approximation with small data.
1184	Assessing Social and Intersectional Biases in Contextualized Word Representations	Yi Chern Tan, L. Elisa Celis	In this paper, we analyze the extent to which state-of-the-art models for contextual word representations, such as BERT and GPT-2, encode biases with respect to gender, race, and intersectional identities.
1185	Online Convex Matrix Factorization with Representative Regions	Jianhao Peng, Olgica Milenkovic, Abhishek Agarwal	We address both problems by proposing the first online convex MF algorithm that maintains a collection of constant-size sets of representative data samples needed for interpreting each of the basis (Ding et al., 2010) and has the same almost sure convergence guarantees as the online learning algorithm of Mairal et al., 2010.
1186	Self-supervised GAN: Analysis and Improvement with Multi-class Minimax Game	Ngoc-Trung Tran, Viet-Hung Tran, Bao-Ngoc Nguyen, Linxiao Yang, Ngai-Man (Man) Cheung	In this work, we perform an in-depth analysis to understand how self-supervised (SS) tasks interact with the learning of the generator.
1187	Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products	Tharun Kumar Reddy Medini, Qixuan Huang, Yiqiu Wang, Vijai Mohan, Anshumali Shrivastava	To alleviate this problem, we present Merged-Average Classifiers via Hashing (MACH), a generic $K$-classification algorithm where memory provably scales at $O(\log K)$ without any assumption on the relation between classes.
1188	A Fourier Perspective on Model Robustness in Computer Vision	Dong Yin, Raphael Gontijo Lopes, Jon Shlens, Ekin Dogus Cubuk, Justin Gilmer	We find that both methods improve robustness to corruptions that are concentrated in the high frequency domain while reducing robustness to corruptions that are concentrated in the low frequency domain.
1189	The continuous Bernoulli: fixing a pervasive error in variational autoencoders	Gabriel Loaiza-Ganem, John P. Cunningham	We introduce and fully characterize a new [0,1]-supported, single parameter distribution: the continuous Bernoulli, which patches this pervasive bug in VAE.
1190	Privacy Amplification by Mixing and Diffusion Mechanisms	Borja Balle, Gilles Barthe, Marco Gaboardi, Joseph Geumlek	In this paper we investigate under what conditions stochastic post-processing can amplify the privacy of a mechanism.
1191	Variance Reduction in Bipartite Experiments through Correlation Clustering	Jean Pouget-Abadie, Kevin Aydin, Warren Schudy, Kay Brodersen, Vahab Mirrokni	This paper introduces a novel clustering objective and a corresponding algorithm that partitions a bipartite graph so as to maximize the statistical power of a bipartite experiment on that graph.
1192	Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning	Mahmoud (“Mido”) Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Mike Rabbat	In this work, we propose Gossip-based Actor-Learner Architectures (GALA) where several actor-learners (such as A2C agents) are organized in a peer-to-peer communication topology, and exchange information through asynchronous gossip in order to take advantage of a large number of distributed simulators.
1193	Metalearned Neural Memory	Tsendsuren Munkhdalai, Alessandro Sordoni, Tong Wang, Adam Trischler	We augment recurrent neural networks with an external memory mechanism that builds upon recent progress in metalearning.
1194	Learning Multiple Markov Chains via Adaptive Allocation	Mohammad Sadegh Talebi, Odalric-Ambrym Maillard	We present a novel learning algorithm that efficiently balances exploration and exploitation intrinsic to this problem, without any prior knowledge of the chains.
1195	Diffusion Improves Graph Learning	Johannes Klicpera, Stefan Weißenberger, Stephan Günnemann	In this work, we remove the restriction of using only the direct neighbors by introducing a powerful, yet spatially localized graph convolution: Graph diffusion convolution (GDC).
1196	Deep Random Splines for Point Process Intensity Estimation of Neural Population Data	Gabriel Loaiza-Ganem, Sean Perkins, Karen Schroeder, Mark Churchland, John P. Cunningham	Here we propose Deep Random Splines, a flexible class of random functions obtained by transforming Gaussian noise through a deep neural network whose outputs are the parameters of a spline.
1197	Variational Bayes under Model Misspecification	Yixin Wang, David Blei	In this work, we study VB under model misspecification.
1198	Global Convergence of Gradient Descent for Deep Linear Residual Networks	Lei Wu, Qingcan Wang, Chao Ma	We analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization: zero-asymmetric (ZAS) initialization.
1199	On Differentially Private Graph Sparsification and Applications	Raman Arora, Jalaj Upadhyay	In this paper, we study private sparsification of graphs.
1200	Manifold denoising by Nonlinear Robust Principal Component Analysis	He Lyu, Ningyu Sha, Shuyang Qin, Ming Yan, Yuying Xie, Rongrong Wang	We answer these two questions affirmatively by proposing and analyzing an optimization framework that separates the sparse component from the manifold under noisy data.
1201	Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes	Junzhe Zhang, Elias Bareinboim	In this paper, we investigate the online reinforcement learning (RL) problem for selecting optimal DTRs provided that observational data is available.
1202	ODE2VAE: Deep generative second order ODEs with Bayesian neural networks	Cagatay Yildiz, Markus Heinonen, Harri Lahdesmaki	We present Ordinary Differential Equation Variational Auto-Encoder (ODE2VAE), a latent second order ODE model for high-dimensional sequential data.
1203	Optimal Sampling and Clustering in the Stochastic Block Model	Se-Young Yun, Alexandre Proutiere	This paper investigates the design of joint adaptive sampling and clustering algorithms in networks whose structure follows the celebrated Stochastic Block Model (SBM).
1204	Recurrent Kernel Networks	Dexiong Chen, Laurent Jacob, Julien Mairal	In this paper, we revisit this link by generalizing convolutional kernel networks—originally related to a relaxation of the mismatch kernel—to model gaps in sequences.
1205	Cold Case: The Lost MNIST Digits	Chhavi Yadav, Leon Bottou	We propose a reconstruction that is accurate enough to serve as a replacement for the MNIST dataset, with insignificant changes in accuracy.
1206	Hierarchical Optimal Transport for Multimodal Distribution Alignment	John Lee, Max Dabagia, Eva Dyer, Christopher Rozell	To solve this numerically, we propose a distributed ADMM algorithm that also exploits the Sinkhorn distance, thus it has an efficient computational complexity that scales quadratically with the size of the largest cluster.
1207	Exploration via Hindsight Goal Generation	Zhizhou Ren, Kefan Dong, Yuan Zhou, Qiang Liu, Jian Peng	In this paper, we introduce Hindsight Goal Generation (HGG), a novel algorithmic framework that generates valuable hindsight goals which are easy for an agent to achieve in the short term and also have the potential to guide the agent toward the actual goal in the long term.
1208	Shaping Belief States with Generative Environment Models for RL	Karol Gregor, Danilo Jimenez Rezende, Frederic Besse, Yan Wu, Hamza Merzic, Aaron van den Oord	We propose a way to efficiently train expressive generative models in complex environments.
1209	Globally Optimal Learning for Structured Elliptical Losses	Yoav Wald, Nofar Noy, Gal Elidan, Ami Wiesel	In this work, we analyze robust alternatives.
1210	Object landmark discovery through unsupervised adaptation	Enrique Sanchez, Georgios Tzimiropoulos	This paper proposes a method to ease the unsupervised learning of object landmark detectors.
1211	Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering	Biwei Huang, Kun Zhang, Pengtao Xie, Mingming Gong, Eric P. Xing, Clark Glymour	In this paper, we develop a unified framework for causal discovery and mechanism-based group identification.
1212	Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks	Amirmohammad Rooshenas, Dongxu Zhang, Gopal Sharma, Andrew McCallum	In this paper, we instead use efficient truncated randomized search in this reward function to train structured prediction energy networks (SPENs), which provide efficient test-time inference using gradient-based search on a smooth, learned representation of the score landscape, and have previously yielded state-of-the-art results in structured prediction.
1213	Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions	Ashia C. Wilson, Lester Mackey, Andre Wibisono	We present a family of algorithms, called descent algorithms, for optimizing convex and non-convex functions.
1214	RUDDER: Return Decomposition for Delayed Rewards	Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich, Thomas Unterthiner, Johannes Brandstetter, Sepp Hochreiter	We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes (MDPs).
1215	Graph Normalizing Flows	Jenny Liu, Aviral Kumar, Jimmy Ba, Jamie Kiros, Kevin Swersky	We introduce graph normalizing flows: a new, reversible graph neural network model for prediction and generation.
1216	Explanations can be manipulated and geometry is to blame	Ann-Kathrin Dombrowski, Maximillian Alber, Christopher Anders, Marcel Ackermann, Klaus-Robert Müller, Pan Kessel	In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes.
1217	Communication trade-offs for Local-SGD with large step size	Aymeric Dieuleveut, Kumar Kshitij Patel	We propose a non-asymptotic error analysis, which enables comparison to one-shot averaging, i.e., a single communication round among independent workers, and mini-batch averaging, i.e., communicating at every step.
1218	Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics	Giancarlo Kerg, Kyle Goyette, Maximilian Puelma Touzel, Gauthier Gidel, Eugene Vorontsov, Yoshua Bengio, Guillaume Lajoie	We propose a novel connectivity structure based on the Schur decomposition and a splitting of the Schur form into normal and non-normal parts.
1219	No-Regret Learning in Unknown Games with Correlated Payoffs	Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause	In this paper, we consider a natural model where, besides a noisy measurement of the obtained reward, the player can also observe the opponents’ actions.
1220	Alleviating Label Switching with Optimal Transport	Pierre Monteiller, Sebastian Claici, Edward Chien, Farzaneh Mirzazadeh, Justin M. Solomon, Mikhail Yurochkin	We propose a resolution to label switching that leverages machinery from optimal transport.
1221	Paraphrase Generation with Latent Bag of Words	Yao Fu, Yansong Feng, John P. Cunningham	Inspired by variational autoencoders with discrete latent structures, in this work, we propose a latent bag of words (BOW) model for paraphrase generation.
1222	An Algorithmic Framework For Differentially Private Data Analysis on Trusted Processors	Joshua Allen, Bolin Ding, Janardhan Kulkarni, Harsha Nori, Olga Ohrimenko, Sergey Yekhanin	In this work, we propose a framework based on trusted processors and a new definition of differential privacy called Oblivious Differential Privacy, which combines the best of both local and global models.
1223	Compacting, Picking and Growing for Unforgetting Continual Learning	Ching-Yi Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan, Chu-Song Chen	In this paper, we propose a simple but effective approach to continual deep learning.
1224	Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems	Asma Ghandeharioun, Judy Hanwen Shen, Natasha Jaques, Craig Ferguson, Noah Jones, Agata Lapedriza, Rosalind Picard	In particular, we propose a self-play scenario where the dialog system talks to itself and we calculate a combination of proxies such as sentiment and semantic coherence on the conversation trajectory.
1225	A New Distribution on the Simplex with Auto-Encoding Applications	Andrew Stirn, Tony Jebara, David Knowles	We construct a new distribution for the simplex using the Kumaraswamy distribution and an ordered stick-breaking process.
1226	AutoPrune: Automatic Network Pruning by Regularizing Auxiliary Parameters	Xia Xiao, Zigeng Wang, Sanguthevar Rajasekaran	To build a better generalized and easy-to-use pruning method, we propose AutoPrune, which prunes the network through optimizing a set of trainable auxiliary parameters instead of original weights.
1227	A neurally plausible model learns successor representations in partially observable environments	Eszter Vértes, Maneesh Sahani	Here, we introduce a neurally plausible model using distributional successor features, which builds on the distributed distributional code for the representation and computation of uncertainty, and which allows for efficient value function computation in partially observed environments via the successor representation.
1228	Learning about an exponential amount of conditional distributions	Mohamed Belghazi, Maxime Oquab, David Lopez-Paz	We introduce the Neural Conditioner (NC), a self-supervised machine able to learn about all the conditional distributions of a random vector X.
1229	Towards modular and programmable architecture search	Renato Negrinho, Matthew Gormley, Geoffrey J. Gordon, Darshan Patil, Nghia Le, Daniel Ferreira	In this work, we propose a formal language for encoding search spaces over general computational graphs.
1230	Towards Hardware-Aware Tractable Learning of Probabilistic Models	Laura I. Galindez Olascoaga, Wannes Meert, Nimish Shah, Marian Verhelst, Guy Van den Broeck	We propose a novel resource-aware cost metric that takes into consideration the hardware’s properties in determining whether the inference task can be efficiently deployed.
1231	On Robustness to Adversarial Examples and Polynomial Optimization	Pranjal Awasthi, Abhratanu Dutta, Aravindan Vijayaraghavan	The main contribution of this work is to exhibit a strong connection between achieving robustness to adversarial examples, and a rich class of polynomial optimization problems, thereby making progress on the above questions.
1232	Rand-NSG: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node	Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnaswamy, Rohan Kadekodi	We present a new graph-based indexing and search system called DiskANN that can index, store, and search a billion point database on a single workstation with just 64GB RAM and an inexpensive solid-state drive (SSD).
1233	A Solvable High-Dimensional Model of GAN	Chuang Wang, Hong Hu, Yue Lu	We present a theoretical analysis of the training process for a single-layer GAN fed by high-dimensional input data.
1234	Using Embeddings to Correct for Unobserved Confounding in Networks	Victor Veitch, Yixin Wang, David Blei	We consider causal inference in the presence of unobserved confounding.
1235	MonoForest framework for tree ensemble analysis	Igor Kuralenok, Vasilii Ershov, Igor Labutin	In this work, we introduce a new decision tree ensemble representation framework: instead of using a graph model we transform each tree into a well-known polynomial form.
1236	Bayesian Optimization under Heavy-tailed Payoffs	Sayak Ray Chowdhury, Aditya Gopalan	We consider black box optimization of an unknown function in the nonparametric Gaussian process setting when the noise in the observed function values can be heavy tailed.
1237	Combining Generative and Discriminative Models for Hybrid Inference	Victor Garcia Satorras, Max Welling, Zeynep Akata	In this work we propose a hybrid model that combines graphical inference with a learned inverse model, which we structure as in a graph neural network, while the iterative algorithm as a whole is formulated as a recurrent neural network.
1238	A Graph Theoretic Additive Approximation of Optimal Transport	Nathaniel Lahn, Deepika Mulchandani, Sharath Raghvendra	We present an adaptation of the classical graph algorithm of Gabow and Tarjan and provide a novel analysis of this algorithm that bounds its execution time by $O(\frac{n^2 C}{\delta}+ \frac{nC^2}{\delta^2})$.
1239	Adversarial Robustness through Local Linearization	Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, Pushmeet Kohli	In this work, we introduce a novel regularizer that encourages the loss to behave linearly in the vicinity of the training data, thereby penalizing gradient obfuscation while encouraging robustness.
1240	Sampled Softmax with Random Fourier Features	Ankit Singh Rawat, Jiecao Chen, Felix Xinnan X. Yu, Ananda Theertha Suresh, Sanjiv Kumar	Motivated by our analysis and the work on kernel-based sampling, we propose the Random Fourier Softmax (RF-softmax) method that utilizes the powerful Random Fourier Features to enable more efficient and accurate sampling from an approximate softmax distribution.
1241	Semi-flat minima and saddle points by embedding neural networks to overparameterization	Kenji Fukumizu, Shoichiro Yamaguchi, Yoh-ichi Mototake, Mirai Tanaka	We consider three basic methods for embedding a network into a wider one with more hidden units, and discuss whether a minimum point of the narrower network gives a minimum or saddle point of the wider one.
1242	Learning Fairness in Multi-Agent Systems	Jiechuan Jiang, Zongqing Lu	To tackle these difficulties, we propose FEN, a novel hierarchical reinforcement learning model.
1243	Primal-Dual Block Generalized Frank-Wolfe	Qi Lei, Jiacheng Zhuo, Constantine Caramanis, Inderjit S. Dhillon, Alexandros G. Dimakis	We propose a generalized variant of the Frank-Wolfe algorithm for solving a class of sparse/low-rank optimization problems.
1244	GOT: An Optimal Transport framework for Graph comparison	Hermina Petric Maretic, Mireille El Gheche, Giovanni Chierchia, Pascal Frossard	We present a novel framework based on optimal transport for the challenging problem of comparing graphs.
1245	On Mixup Training: Improved Calibration and Predictive Uncertainty for Deep Neural Networks	Sunil Thulasidasan, Gopinath Chennupati, Jeff A. Bilmes, Tanmoy Bhattacharya, Sarah Michalak	In this work, we discuss a hitherto untouched aspect of mixup training — the calibration and predictive uncertainty of models trained with mixup.
1246	Complexity of Highly Parallel Non-Smooth Convex Optimization	Sebastien Bubeck, Qijia Jiang, Yin-Tat Lee, Yuanzhi Li, Aaron Sidford	Namely we consider optimization algorithms interacting with a highly parallel gradient oracle, that is one that can answer poly(d) gradient queries in parallel.
1247	Inverting Deep Generative models, One layer at a time	Qi Lei, Ajil Jalal, Inderjit S. Dhillon, Alexandros G. Dimakis	In this paper we obtain several novel theoretical results for the inversion problem.
1248	Calculating Optimistic Likelihoods Using (Geodesically) Convex Optimization	Viet Anh Nguyen, Soroosh Shafieezadeh Abadeh, Man-Chung Yue, Daniel Kuhn, Wolfram Wiesemann	We thus propose to replace each nominal distribution with an ambiguity set containing all distributions in its vicinity and to evaluate an optimistic likelihood, that is, the maximum of the likelihood over all distributions in the ambiguity set.
1249	The Implicit Metropolis-Hastings Algorithm	Kirill Neklyudov, Evgenii Egorov, Dmitry P. Vetrov	For any implicit probabilistic model and a target distribution represented by a set of samples, implicit Metropolis-Hastings operates by learning a discriminator to estimate the density-ratio and then generating a chain of samples.
1250	An Inexact Augmented Lagrangian Framework for Nonconvex Optimization with Nonlinear Constraints	Mehmet Fatih Sahin, Armin Eftekhari, Ahmet Alacaoglu, Fabian Latorre, Volkan Cevher	We propose a practical inexact augmented Lagrangian method (iALM) for nonconvex problems with nonlinear constraints.
1251	Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck	Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann	We discuss those differences and propose modifications to existing regularization techniques in order to better adapt them to RL.
1252	Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift	Jasper Snoek, Yaniv Ovadia, Emily Fertig, Balaji Lakshminarayanan, Sebastian Nowozin, D. Sculley, Joshua Dillon, Jie Ren, Zachary Nado	We present a large-scale benchmark of existing state-of-the-art methods on classification problems and investigate the effect of dataset shift on accuracy and calibration.
1253	Accurate Layerwise Interpretable Competence Estimation	Vickram Rajendran, William LeVine	In this paper, we seek to examine, understand, and predict the pointwise competence of classification models.
1254	A New Perspective on Pool-Based Active Classification and False-Discovery Control	Lalit Jain, Kevin G. Jamieson	In this paper, we provide the first provably sample efficient adaptive algorithm for this problem.
1255	Defending Neural Backdoors via Generative Distribution Modeling	Ximing Qiao, Yukun Yang, Hai Li	In this work, we explore the space formed by the pixel values of all possible backdoor triggers.
1256	Are Sixteen Heads Really Better than One?	Paul Michel, Omer Levy, Graham Neubig	However we observe that, in practice, a large proportion of attention heads can be removed at test time without significantly impacting performance, and that some layers can even be reduced to a single head.
1257	Multi-resolution Multi-task Gaussian Processes	Oliver Hamelijnck, Theodoros Damoulas, Kangrui Wang, Mark Girolami	We offer a multi-resolution multi-task (MRGP) framework that allows for both inter-task and intra-task multi-resolution and multi-fidelity.
1258	Variational Bayesian Optimal Experimental Design	Adam Foster, Martin Jankowiak, Elias Bingham, Paul Horsfall, Yee Whye Teh, Thomas Rainforth, Noah Goodman	To address this, we introduce several classes of fast EIG estimators by building on ideas from amortized variational inference.
1259	Universal Approximation of Input-Output Maps by Temporal Convolutional Nets	Joshua Hanson, Maxim Raginsky	We prove that TCNs can approximate a large class of input-output maps having approximately finite memory to arbitrary error tolerance.
1260	Provable Certificates for Adversarial Examples: Fitting a Ball in the Union of Polytopes	Matt Jordan, Justin Lewis, Alexandros G. Dimakis	We propose a novel method for computing exact pointwise robustness of deep neural networks for all convex $\ell_p$ norms.
1261	Reinforcement Learning with Convex Constraints	Sobhan Miryoosefi, Kianté Brantley, Hal Daume III, Miro Dudik, Robert E. Schapire	In this paper, we propose an algorithmic scheme that can handle a wide class of constraints in RL tasks: specifically, any constraints that require expected values of some vector measurements (such as the use of an action) to lie in a convex set.
1262	User-Specified Local Differential Privacy in Unconstrained Adaptive Online Learning	Dirk van der Hoeven	In this paper we generalize this approach by allowing the provider of the data to choose the distribution of the noise without disclosing any parameters of the distribution to the learner, under the constraint that the distribution is symmetrical.
1263	Stochastic Bandits with Context Distributions	Johannes Kirschner, Andreas Krause	We introduce a stochastic contextual bandit model where at each time step the environment chooses a distribution over a context set and samples the context from this distribution.
1264	Inducing brain-relevant bias in natural language processing models	Dan Schwartz, Mariya Toneva, Leila Wehbe	We demonstrate that a version of BERT, a recently introduced and powerful language model, can improve the prediction of brain activity after fine-tuning.
1265	Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning	Harm Van Seijen, Mehdi Fatemi, Arash Tavakoli	We propose an alternative hypothesis that identifies the size-difference of the action-gap across the state-space as the primary cause.
1266	Recovering Bandits	Ciara Pike-Burke, Steffen Grunewalder	In this work, we explore the use of Gaussian processes to tackle the estimation and planning problem.
1267	Computing Linear Restrictions of Neural Networks	Matthew Sotoudeh, Aditya V. Thakur	We present an efficient algorithm for computing ExactLine for networks that use ReLU, MaxPool, batch normalization, fully-connected, convolutional, and other layers, along with several applications.
1268	Learning Positive Functions with Pseudo Mirror Descent	Yingxiang Yang, Haoxiang Wang, Negar Kiyavash, Niao He	In this paper, we propose a novel algorithm, pseudo mirror descent, that performs efficient estimation of positive functions within a Hilbert space without expensive projections.
1269	Correlation Priors for Reinforcement Learning	Bastian Alt, Adrian Šošić, Heinz Koeppl	In this work, we present a Bayesian learning framework based on Pólya-Gamma augmentation that enables an analogous reasoning in such cases.
1270	Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression	Deeksha Adil, Richard Peng, Sushant Sachdeva	We propose p-IRLS, the first IRLS algorithm that provably converges geometrically for any $p \in [2,\infty)$.
1271	A Similarity-preserving Network Trained on Transformed Images Recapitulates Salient Features of the Fly Motion Detection Circuit	Yanis Bahroun, Dmitri Chklovskii, Anirvan Sengupta	Here we propose a biologically plausible model of motion detection.
1272	Differentially Private Covariance Estimation	Kareem Amin, Travis Dick, Alex Kulesza, Andres Munoz, Sergei Vassilvitskii	In this work we propose a new epsilon-differentially private algorithm for computing the covariance matrix of a dataset that addresses both of these limitations.
1273	Outlier Detection and Robust PCA Using a Convex Measure of Innovation	Mostafa Rahmani, Ping Li	This paper presents a provable and strong algorithm, termed Innovation Search (iSearch), for robust Principal Component Analysis (PCA) and outlier detection.
1274	Integrating Markov processes with structural causal modeling enables counterfactual inference in complex systems	Robert Ness, Kaushal Paneri, Olga Vitek	This manuscript contributes a general and practical framework for casting a Markov process model of a system at equilibrium as a structural causal model, and carrying out counterfactual inference.
1275	Are Disentangled Representations Helpful for Abstract Visual Reasoning?	Sjoerd van Steenkiste, Francesco Locatello, Jürgen Schmidhuber, Olivier Bachem	In this paper, we conduct a large-scale study that investigates whether disentangled representations are more suitable for abstract reasoning tasks.
1276	PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization	Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi	We propose a low-rank gradient compressor that can i) compress gradients rapidly, ii) efficiently aggregate the compressed gradients using all-reduce, and iii) achieve test performance on par with SGD.
1277	Stochastic Frank-Wolfe for Composite Convex Minimization	Francesco Locatello, Alp Yurtsever, Olivier Fercoq, Volkan Cevher	In this work, we propose the first conditional-gradient-type method for solving stochastic optimization problems under affine constraints.
1278	Constraint-based Causal Structure Learning with Consistent Separating Sets	Honghao Li, Vincent Cabeli, Nadir Sella, Herve Isambert	In this paper, we propose a simple modification of PC and PC-derived algorithms so as to ensure that all separating sets identified to remove dispensable edges are consistent with the final graph, thus enhancing the explainability of constraint-based methods.
1279	Unsupervised Discovery of Temporal Structure in Noisy Data with Dynamical Components Analysis	David Clark, Jesse Livezey, Kristofer Bouchard	Combining these approaches, we introduce Dynamical Components Analysis (DCA), a linear dimensionality reduction method which discovers a subspace of high-dimensional time series data with maximal predictive information, defined as the mutual information between the past and future.
1280	Sample Efficient Active Learning of Causal Trees	Kristjan Greenewald, Dmitriy Katz, Karthikeyan Shanmugam, Sara Magliacane, Murat Kocaoglu, Enric Boix Adsera, Guy Bresler	We propose an adaptive framework that determines the next intervention based on a Bayesian prior updated with the outcomes of previous experiments, focusing on the setting where observational data is cheap (assumed infinite) and interventional data is expensive.
1281	Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection	Junran Peng, Ming Sun, ZHAO-XIANG ZHANG, Tieniu Tan, Junjie Yan	To overcome this obstacle, we introduce a practical neural architecture transformation search (NATS) algorithm for object detection in this paper.
1282	Robust Attribution Regularization	Jiefeng Chen, Xi Wu, Vaibhav Rastogi, Yingyu Liang, Somesh Jha	We propose training objectives in classic robust optimization models to achieve robust IG attributions.
1283	Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization	Miika Aittala, Prafull Sharma, Lukas Murmann, Adam Yedidia, Gregory Wornell, Bill Freeman, Fredo Durand	We solve this problem by factoring the observed video into a matrix product between the unknown hidden scene video and an unknown light transport matrix.
1284	When to use parametric models in reinforcement learning?	Hado P. van Hasselt, Matteo Hessel, John Aslanides	We examine the question of when and how parametric models are most useful in reinforcement learning.
1285	General E(2)-Equivariant Steerable CNNs	Maurice Weiler, Gabriele Cesa	Here we give a general description of E(2)-equivariant convolutions in the framework of Steerable CNNs.
1286	Characterization and Learning of Causal Graphs with Latent Variables from Soft Interventions	Murat Kocaoglu, Amin Jaber, Karthikeyan Shanmugam, Elias Bareinboim	In this paper, we investigate the more general scenario where multiple observational and experimental distributions are available.
1287	Structure Learning with Side Information: Sample Complexity	Saurabh Sihag, Ali Tajer	This paper focuses on Ising graphical models, and considers the problem of simultaneously learning the structures of two partially similar graphs, where any inference about the structure of one graph offers side information for the other graph.
1288	Untangling in Invariant Speech Recognition	Cory Stephenson, Jenelle Feather, Suchismita Padhy, Oguz Elibol, Hanlin Tang, Josh McDermott, SueYeon Chung	In this work, we employ a recently developed statistical mechanical theory that connects geometric properties of network representations and the separability of classes to probe how information is untangled within neural networks trained to recognize speech.
1289	Flexible information routing in neural populations through stochastic comodulation	Caroline Haimerl, Cristina Savin, Eero Simoncelli	Here, we propose a novel solution based on functionally-targeted stochastic modulation.
1290	Generalization Bounds in the Predict-then-Optimize Framework	Othman El Balghiti, Adam Elmachtoub, Paul Grigas, Ambuj Tewari	In this work, we provide an assortment of generalization bounds for the SPO loss function.
1291	Categorized Bandits	Matthieu Jedor, Vianney Perchet, Jonathan Louedec	We introduce a new stochastic multi-armed bandit setting where arms are grouped inside “ordered” categories.
1292	Worst-Case Regret Bounds for Exploration via Randomized Value Functions	Daniel Russo	By providing a worst-case regret bound for tabular finite-horizon Markov decision processes, we show that planning with respect to these randomized value functions can induce provably efficient exploration.
1293	Efficient characterization of electrically evoked responses for neural interfaces	Nishal Shah, Sasidhar Madugula, Pawel Hottowy, Alexander Sher, Alan Litke, Liam Paninski, E.J. Chichilnisky	This work tests the idea that using prior information from previous experiments and closed-loop measurements may greatly increase the efficiency of the neural interface.
1294	Differentially Private Distributed Data Summarization under Covariate Shift	Kanthi Sarpatwar, Karthikeyan Shanmugam, Venkata Sitaramagiridharganesh Ganapavarapu, Ashish Jagmohan, Roman Vaculin	We introduce a novel “noiseless” differentially private auctioning protocol, which may be of independent interest.
1295	Hamiltonian descent for composite objectives	Brendan O’Donoghue, Chris J. Maddison	In this paper we consider a convex optimization problem consisting of the sum of two convex functions, sometimes referred to as a composite objective, and we identify the duality gap to be the ‘energy’ of the system.
1296	Implicit Regularization of Accelerated Methods in Hilbert Spaces	Nicolò Pagliana, Lorenzo Rosasco	We study learning properties of accelerated gradient descent methods for linear least-squares in Hilbert spaces.
1297	Non-Asymptotic Pure Exploration by Solving Games	Rémy Degenne, Wouter M. Koolen, Pierre Ménard	We interpret the optimisation problem as an unknown game, and propose sampling rules based on iterative strategies to estimate and converge to its saddle point.
1298	Implicit Posterior Variational Inference for Deep Gaussian Processes	Haibin YU, Yizhou Chen, Bryan Kian Hsiang Low, Patrick Jaillet, Zhongxiang Dai	This paper presents an implicit posterior variational inference (IPVI) framework for DGPs that can ideally recover an unbiased posterior belief and still preserve time efficiency.
1299	Deep Multi-State Dynamic Recurrent Neural Networks Operating on Wavelet Based Neural Features for Robust Brain Machine Interfaces	Benyamin Allahgholizadeh Haghi, Spencer Kellis, Sahil Shah, Maitreyi Ashok, Luke Bashford, Daniel Kramer, Brian Lee, Charles Liu, Richard Andersen, Azita Emami	We present a new deep multi-state Dynamic Recurrent Neural Network (DRNN) architecture for Brain Machine Interface (BMI) applications.
1300	Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback	Arun Verma, Manjesh Hanawal, Arun Rajkumar, Raman Sankaran	In this paper, we study Censored Semi-Bandits, a novel variant of the semi-bandits problem.
1301	Cormorant: Covariant Molecular Neural Networks	Brandon Anderson, Truong Son Hy, Risi Kondor	We propose Cormorant, a rotationally covariant neural network architecture for learning the behavior and properties of complex many-body physical systems.
1302	Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness	Andrey Malinin, Mark Gales	First, we show that the appropriate training criterion for Prior Networks is the reverse KL-divergence between Dirichlet distributions. This addresses issues in the nature of the training data target distributions, enabling prior networks to be successfully trained on classification tasks with arbitrarily many classes, as well as improving out-of-distribution detection performance. Second, taking advantage of this new training criterion, this paper investigates using Prior Networks to detect adversarial attacks and proposes a generalized form of adversarial training.
1303	Reflection Separation using a Pair of Unpolarized and Polarized Images	Youwei Lyu, Zhaopeng Cui, Si Li, Marc Pollefeys, Boxin Shi	In this paper, we propose to exploit physical constraints from a pair of unpolarized and polarized images to separate reflection and transmission layers.
1304	Policy Poisoning in Batch Reinforcement Learning and Control	Yuzhe Ma, Xuezhou Zhang, Wen Sun, Jerry Zhu	We present a unified framework for solving batch policy poisoning attacks, and instantiate the attack on two standard victims: tabular certainty equivalence learner in reinforcement learning and linear quadratic regulator in control.
1305	Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees	Alix LHERITIER, Frederic Cazals	We propose a novel nonparametric online predictor for discrete labels conditioned on multivariate continuous features.
1306	Pure Exploration with Multiple Correct Answers	Rémy Degenne, Wouter M. Koolen	We present a new algorithm which extends Track-and-Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound.
1307	Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets	Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Rong Ge, Sanjeev Arora	We give mathematical explanations for this phenomenon, assuming generic properties (such as dropout stability and noise stability) of well-trained deep nets, which have previously been identified as part of understanding the generalization properties of deep nets.
1308	On the Fairness of Disentangled Representations	Francesco Locatello, Gabriele Abbati, Thomas Rainforth, Stefan Bauer, Bernhard Schölkopf, Olivier Bachem	In this paper, we investigate the usefulness of different notions of disentanglement for improving the fairness of downstream prediction tasks based on representations.
1309	Compiler Auto-Vectorization with Imitation Learning	Charith Mendis, Cambridge Yang, Yewen Pu, Saman Amarasinghe, Michael Carbin	In this work, we explore whether it is feasible to imitate optimal decisions made by their ILP solution by fitting a graph neural network policy.
1310	A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation	Runzhe Yang, Xingyuan Sun, Karthik Narasimhan	We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks.
1311	Exact Gaussian Processes on a Million Data Points	Ke Wang, Geoff Pleiss, Jacob Gardner, Stephen Tyree, Kilian Q. Weinberger, Andrew Gordon Wilson	In this paper, we develop a scalable approach for exact GPs that leverages multi-GPU parallelization and methods like linear conjugate gradients, accessing the kernel matrix only through matrix multiplication.
1312	Bayesian Layers: A Module for Neural Network Uncertainty	Dustin Tran, Mike Dusenberry, Mark van der Wilk, Danijar Hafner	We describe Bayesian Layers, a module designed for fast experimentation with neural network uncertainty.
1313	Learning Compositional Neural Programs with Recursive Tree Search and Planning	Thomas PIERROT, Guillaume Ligner, Scott E. Reed, Olivier Sigaud, Nicolas Perrin, Alexandre Laterre, David Kas, Karim Beguir, Nando de Freitas	We propose a novel reinforcement learning algorithm, AlphaNPI, that incorporates the strengths of Neural Programmer-Interpreters (NPI) and AlphaZero.
1314	Nonparametric Contextual Bandits in Metric Spaces with Unknown Metric	Nirandika Wanigasekara, Christina Yu	We present a novel algorithm which learns data-driven similarities amongst the arms, in order to implement adaptive partitioning of the context-arm space for more efficient learning.
1315	Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations	Debraj Basu, Deepesh Data, Can Karakus, Suhas Diggavi	In this paper we propose the Qsparse-local-SGD algorithm, which combines aggressive sparsification with quantization and local computation along with error compensation, by keeping track of the difference between the true and compressed gradients.
1316	Likelihood Ratios for Out-of-Distribution Detection	Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark Depristo, Joshua Dillon, Balaji Lakshminarayanan	We propose a likelihood ratio method for deep generative models which effectively corrects for these confounding background statistics.
1317	Discrete Flows: Invertible Generative Models of Discrete Data	Dustin Tran, Keyon Vafa, Kumar Agrawal, Laurent Dinh, Ben Poole	In this paper, we show that flows can in fact be extended to discrete events—and under a simple change-of-variables formula not requiring log-determinant-Jacobian computations.
1318	A Self Validation Network for Object-Level Human Attention Estimation	Zehua Zhang, Chen Yu, David Crandall	In this paper, we propose a novel unified model that incorporates both spatial and temporal evidence in identifying as well as locating the attended object in first-person videos.
1319	Model Selection for Contextual Bandits	Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo	We introduce the problem of model selection for contextual bandits, where a learner must adapt to the complexity of the optimal policy while balancing exploration and exploitation.
1320	Sliced Gromov-Wasserstein	Vayer Titouan, Rémi Flamary, Nicolas Courty, Romain Tavenard, Laetitia Chapel	This paper proposes a new divergence based on GW akin to SW.
1321	Towards Practical Alternating Least-Squares for CCA	Zhiqiang Xu, Ping Li	To promote the practical use of ALS for CCA, we propose truly alternating least-squares.
1322	Deep Leakage from Gradients	Ligeng Zhu, Zhijian Liu, Song Han	However, in this paper, we show that we can obtain the private training set from the publicly shared gradients.
1323	Invariance-inducing regularization using worst-case transformations suffices to boost accuracy and spatial robustness	Fanny Yang, Zuowen Wang, Christina Heinze-Deml	This work provides theoretical and empirical evidence that invariance-inducing regularizers can increase predictive accuracy for worst-case spatial transformations (spatial robustness).
1324	Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks	Spencer Frei, Yuan Cao, Quanquan Gu	In this work, we analyze overparameterized deep residual networks trained by gradient descent following random initialization, and demonstrate that (i) the class of networks learned by gradient descent constitutes a small subset of the entire neural network function class, and (ii) this subclass of networks is sufficiently large to guarantee small training error.
1325	Value Function in Frequency Domain and the Characteristic Value Iteration Algorithm	Amir-massoud Farahmand	It presents a new representational framework to maintain the uncertainty of returns and provides mathematical tools to compute it.
1326	Icebreaker: Element-wise Efficient Information Acquisition with a Bayesian Deep Latent Gaussian Model	Wenbo Gong, Sebastian Tschiatschek, Sebastian Nowozin, Richard E. Turner, José Miguel Hernández-Lobato, Cheng Zhang	In this paper, we address the ice-start problem, i.e., the challenge of deploying machine learning models when only a little or no training data is initially available, and acquiring each feature element of data is associated with costs.
1327	Algorithmic Guarantees for Inverse Imaging with Untrained Network Priors	Gauri Jagatap, Chinmay Hegde	Specifically, we consider the problem of solving linear inverse problems, such as compressive sensing, as well as non-linear problems, such as compressive phase retrieval.
1328	Planning with Goal-Conditioned Policies	Soroush Nasiriany, Vitchyr Pong, Steven Lin, Sergey Levine	We show that goal-conditioned policies learned with RL can be incorporated into planning, such that a planner can focus on which states to reach, rather than how those states are reached.
1329	Don't take it lightly: Phasing optical random projections with unknown operators	Sidharth Gupta, Remi Gribonval, Laurent Daudet, Ivan Dokmanic	In this paper we tackle the problem of recovering the phase of complex linear measurements when only magnitude information is available and we control the input.
1330	Generating Diverse High-Fidelity Images with VQ-VAE-2	Ali Razavi, Aaron van den Oord, Oriol Vinyals	We explore the use of Vector Quantized Variational AutoEncoder (VQ-VAE) models for large scale image generation.
1331	Generalized Matrix Means for Semi-Supervised Learning with Multilayer Graphs	Pedro Mercado, Francesco Tudisco, Matthias Hein	We propose a regularizer based on the generalized matrix mean, which is a one-parameter family of matrix means that includes the arithmetic, geometric and harmonic means as particular cases.
1332	Online Optimal Control with Linear Dynamics and Predictions: Algorithms and Regret Analysis	Yingying Li, Xin Chen, Na Li	We design online algorithms, Receding Horizon Gradient-based Control (RHGC), that utilize the predictions through finite steps of gradient computations.
1333	Missing Not at Random in Matrix Completion: The Effectiveness of Estimating Missingness Probabilities Under a Low Nuclear Norm Assumption	Wei Ma, George H. Chen	In this paper, we suggest a simple approach to estimating these probabilities that avoids these shortcomings.
1334	MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis	Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brébisson, Yoshua Bengio, Aaron C. Courville	In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by introducing a set of architectural changes and simple training techniques.
1335	Offline Contextual Bandits with High Probability Fairness Guarantees	Blossom Metevier, Stephen Giguere, Sarah Brockman, Ari Kobren, Yuriy Brun, Emma Brunskill, Philip S. Thomas	We present RobinHood, an offline contextual bandit algorithm designed to satisfy a broad family of fairness constraints.
1336	Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods	Maher Nouiehed, Maziar Sanjabi, Tianjian Huang, Jason D. Lee, Meisam Razaviyayn	In this paper, we study the problem in the non-convex regime and show that an $\varepsilon$-first-order stationary point of the game can be computed when one of the players’ objectives can be optimized to global optimality efficiently.
1337	Semantic-Guided Multi-Attention Localization for Zero-Shot Learning	Yizhe Zhu, Jianwen Xie, Zhiqiang Tang, Xi Peng, Ahmed Elgammal	In this paper, we study the significance of the discriminative region localization.
1338	Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)	Mariya Toneva, Leila Wehbe	We propose here a novel interpretation approach that relies on the only processing system we have that does understand language: the human brain.
1339	Function-Space Distributions over Kernels	Gregory Benton, Wesley J. Maddox, Jayson Salkey, Julio Albinati, Andrew Gordon Wilson	In this paper, we develop functional kernel learning (FKL) to directly infer functional posteriors over kernels.
1340	The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares	Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli	Motivated by this observation, this work provides a detailed study of the following question: what rate is achievable using the final iterate of SGD for the streaming least squares regression problem with and without strong convexity?
1341	Compositional Plan Vectors	Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine	We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision.
1342	Locally Private Learning without Interaction Requires Separation	Amit Daniely, Vitaly Feldman	We consider learning under the constraint of local differential privacy (LDP).
1343	Robust Bi-Tempered Logistic Loss Based on Bregman Divergences	Ehsan Amid, Manfred K. K. Warmuth, Rohan Anil, Tomer Koren	We introduce a temperature into the exponential function and replace the softmax output layer of the neural networks by a high-temperature generalization.
1344	Computational Separations between Sampling and Optimization	Kunal Talwar	We present a simpler and stronger separation.
1345	Surfing: Iterative Optimization Over Incrementally Trained Deep Networks	Ganlin Song, Zhou Fan, John Lafferty	We investigate a sequential optimization procedure to minimize the empirical risk functional $f_{\hat\theta}(x) = \frac{1}{2}\|G_{\hat\theta}(x) - y\|^2$ for certain families of deep networks $G_{\theta}(x)$.
1346	Learning to Optimize in Swarms	Yue Cao, Tianlong Chen, Zhangyang Wang, Yang Shen	To overcome the limitations, we propose a meta-optimizer that learns in the algorithmic space of both point-based and population-based optimization algorithms.
1347	On Human-Aligned Risk Minimization	Liu Leqi, Adarsh Prasad, Pradeep K. Ravikumar	In this paper, we pose the following simple question: in contrast to minimizing expected loss, could we minimize a better human-aligned risk measure?
1348	Semi-Parametric Efficient Policy Learning with Continuous Actions	Victor Chernozhukov, Mert Demirer, Greg Lewis, Vasilis Syrgkanis	We propose a doubly robust off-policy estimate for this setting and show that off-policy optimization based on this doubly robust estimate is robust to estimation errors of the policy function or the regression model.
1349	Multi-task Learning for Aggregated Data using Gaussian Processes	Fariba Yousefi, Michael T. Smith, Mauricio Álvarez	In this paper, we present a novel multi-task learning model based on Gaussian processes for joint learning of variables that have been aggregated at different input scales.
1350	Minimal Variance Sampling in Stochastic Gradient Boosting	Bulat Ibragimov, Gleb Gusev	In this paper, we formulate the problem of randomization in SGB in terms of optimization of sampling probabilities to maximize the estimation accuracy of split scoring used to train decision trees.
1351	Beyond the Single Neuron Convex Barrier for Neural Network Certification	Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, Martin Vechev	We propose a new parametric framework, called k-ReLU, for computing precise and scalable convex relaxations used to certify neural networks.
1352	An Algorithm to Learn Polytree Networks with Hidden Nodes	Firoozeh Sepehr, Donatello Materassi	In this article, we develop an algorithm to exactly recover graphical models of random variables with underlying polytree structures when the latent nodes satisfy specific degree conditions.
1353	Efficiently Learning Fourier Sparse Set Functions	Andisheh Amrollahi, Amir Zandieh, Michael Kapralov, Andreas Krause	In this paper we consider the problem of efficiently learning set functions that are defined over a ground set of size $n$ and that are sparse (say $k$-sparse) in the Fourier domain.
1354	Projected Stein Variational Newton: A Fast and Scalable Bayesian Inference Method in High Dimensions	Peng Chen, Keyi Wu, Joshua Chen, Tom O’Leary-Roseberry, Omar Ghattas	We propose a projected Stein variational Newton (pSVN) method for high-dimensional Bayesian inference.
1355	Invariance and identifiability issues for word embeddings	Rachel Carrington, Karthik Bharath, Simon Preston	We provide a formal treatment of the above identifiability issue, present some numerical examples, and discuss possible resolutions.
1356	Generalization Error Analysis of Quantized Compressive Learning	Xiaoyun Li, Ping Li	In this paper, we consider the learning problem where the projected data is further compressed by scalar quantization, which is called quantized compressive learning.
1357	Multi-Criteria Dimensionality Reduction with Applications to Fairness	Uthaipon Tantipongpipat, Samira Samadi, Mohit Singh, Jamie H. Morgenstern, Santosh Vempala	In this paper, we introduce the multi-criteria dimensionality reduction problem where we are given multiple objectives that need to be optimized simultaneously.
1358	Efficient Rematerialization for Deep Networks	Ravi Kumar, Manish Purohit, Zoya Svitkina, Erik Vee, Joshua Wang	In this work we consider the rematerialization problem and devise efficient algorithms that use structural characterizations of computation graphs—treewidth and pathwidth—to obtain provably efficient rematerialization schedules.
1359	Mo’ States Mo’ Problems: Emergency Stop Mechanisms from Observation	Samuel Ainsworth, Matt Barnes, Siddhartha Srinivasa	We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon.
1360	Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments	Vasilis Syrgkanis, Victor Lei, Miruna Oprescu, Maggie Hei, Keith Battocchi, Greg Lewis	We develop a statistical learning approach to the estimation of heterogeneous effects, reducing the problem to the minimization of an appropriate loss function that depends on a set of auxiliary models (each corresponding to a separate prediction task).
1361	Understanding Sparse JL for Feature Hashing	Meena Jagadeesan	In this paper, we demonstrate the benefits of using sparsity s greater than 1 in sparse JL on feature vectors.
1362	Text-Based Interactive Recommendation via Constraint-Augmented Reinforcement Learning	Ruiyi Zhang, Tong Yu, Yilin Shen, Hongxia Jin, Changyou Chen	To alleviate this issue, we propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time.
1363	Flexible Modeling of Diversity with Strongly Log-Concave Distributions	Joshua Robinson, Suvrit Sra, Stefanie Jegelka	We propose SLC as the right extension of SR that enables easier, more intuitive control over diversity, illustrating this via examples of practical importance.
1364	Momentum-Based Variance Reduction in Non-Convex SGD	Ashok Cutkosky, Francesco Orabona	We present a new algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning.
1365	Search on the Replay Buffer: Bridging Planning and Reinforcement Learning	Ben Eysenbach, Russ R. Salakhutdinov, Sergey Levine	We introduce a general-purpose control algorithm that combines the strengths of planning and reinforcement learning to effectively solve these tasks.
1366	Can Unconditional Language Models Recover Arbitrary Sentences?	Nishant Subramani, Samuel Bowman, Kyunghyun Cho	To do this, we introduce a pair of effective, complementary methods for feeding representations into pretrained unconditional language models and a corresponding set of methods to map sentences into and out of this representation space, the reparametrized sentence space.
1367	Group Retention when Using Machine Learning in Sequential Decision Making: the Interplay between User Dynamics and Fairness	Xueru Zhang, Mohammadmahdi Khaliligarekani, Cem Tekin, Mingyan Liu	In this study, we seek to understand the interplay between ML decisions and the underlying group representation, how they evolve in a sequential framework, and how the use of fairness criteria plays a role in this process.
1368	Faster width-dependent algorithm for mixed packing and covering LPs	Digvijay Boob, Saurabh Sawlani, Di Wang	In this paper, we give a faster width-dependent algorithm for mixed packing-covering LPs.
1369	Flattening a Hierarchical Clustering through Active Learning	Fabio Vitale, Anand Rajagopalan, Claudio Gentile	We investigate active learning by pairwise similarity over the leaves of trees originating from hierarchical clustering procedures.
1370	DeepWave: A Recurrent Neural-Network for Real-Time Acoustic Imaging	Matthieu SIMEONI, Sepand Kashani, Paul Hurley, Martin Vetterli	We propose a recurrent neural-network for real-time reconstruction of acoustic camera spherical maps.
1371	Certifying Geometric Robustness of Neural Networks	Mislav Balunovic, Maximilian Baader, Gagandeep Singh, Timon Gehr, Martin Vechev	In this work, we propose a new method to compute sound and asymptotically optimal linear relaxations for any composition of transformations.
1372	Goal-conditioned Imitation Learning	Yiming Ding, Carlos Florensa, Pieter Abbeel, Mariano Phielipp	In this work we propose a novel algorithm goalGAIL, which incorporates demonstrations to drastically speed up the convergence to a policy able to reach any goal, surpassing the performance of an agent trained with other Imitation Learning algorithms.
1373	Robust exploration in linear quadratic reinforcement learning	Jack Umenberger, Mina Ferizbegovic, Thomas B. Schön, Håkan Hjalmarsson	We present a method, based on convex optimization, that accomplishes this task ‘robustly’, i.e., the worst-case cost, accounting for system uncertainty given the observed data, is minimized.
1374	DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs	Ali Sadeghian, Mohammadreza Armandpour, Patrick Ding, Daisy Zhe Wang	In this paper, we study the problem of learning probabilistic logical rules for inductive and interpretable link prediction.
1375	Kernel Truncated Randomized Ridge Regression: Optimal Rates and Low Noise Acceleration	Kwang-Sung Jun, Ashok Cutkosky, Francesco Orabona	In this paper we consider the nonparametric least square regression in a Reproducing Kernel Hilbert Space (RKHS).
1376	Input-Output Equivalence of Unitary and Contractive RNNs	Melikasadat Emami, Mojtaba Sahraee Ardakan, Sundeep Rangan, Alyson K. Fletcher	This work shows that for any contractive RNN with ReLU activations, there is a URNN with at most twice the number of hidden states and the identical input-output mapping.
1377	Hamiltonian Neural Networks	Samuel Greydanus, Misko Dzamba, Jason Yosinski	In this paper, we draw inspiration from Hamiltonian mechanics to train models that learn and respect exact conservation laws in an unsupervised manner.
1378	Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks	Qiyang Li, Saminul Haque, Cem Anil, James Lucas, Roger B. Grosse, Joern-Henrik Jacobsen	In particular, we present the Block Convolution Orthogonal Parameterization (BCOP), an expressive parameterization of orthogonal convolution operations.
1379	Structured and Deep Similarity Matching via Structured and Deep Hebbian Networks	Dina Obeid, Hugo Ramambason, Cengiz Pehlevan	In this paper, we introduce structured and deep similarity matching cost functions, and show how they can be optimized in a gradient-based manner by neural networks with local learning rules.
1380	Understanding the Representation Power of Graph Neural Networks in Learning Graph Topology	Nima Dehmamy, Albert-Laszlo Barabasi, Rose Yu	To deepen our understanding of graph neural networks, we investigate the representation power of Graph Convolutional Networks (GCN) through the looking glass of graph moments, a key property of graph topology encoding path of various lengths.
1381	Multiple Futures Prediction	Charlie Tang, Russ R. Salakhutdinov	Towards these goals, we introduce a probabilistic framework that efficiently learns latent variables to jointly model the multi-step future motions of agents in a scene.
1382	Explicitly disentangling image content from translation and rotation with spatial-VAE	Tristan Bepler, Ellen Zhong, Kotaro Kelley, Edward Brignole, Bonnie Berger	We propose a method for explicitly disentangling image rotation and translation from other unstructured latent factors in a variational autoencoder (VAE) framework.
1383	Power analysis of knockoff filters for correlated designs	Jingbo Liu, Philippe Rigollet	In this work we study the case where the predictors have a general covariance matrix $\boldsymbol{\Sigma}$.
1384	A Kernel Loss for Solving the Bellman Equation	Yihao Feng, Lihong Li, Qiang Liu	In this paper, we propose a novel loss function, which can be optimized using standard gradient-based methods with guaranteed convergence.
1385	Low-Rank Bandit Methods for High-Dimensional Dynamic Pricing	Jonas W. Mueller, Vasilis Syrgkanis, Matt Taddy	We consider dynamic pricing with many products under an evolving but low-dimensional demand model.
1386	Differential Privacy Has Disparate Impact on Model Accuracy	Eugene Bagdasaryan, Omid Poursaeed, Vitaly Shmatikov	We demonstrate that in the neural networks trained using differentially private stochastic gradient descent (DP-SGD), this cost is not borne equally: accuracy of DP models drops much more for the underrepresented classes and subgroups.
1387	Riemannian batch normalization for SPD neural networks	Daniel Brooks, Olivier Schwander, Frederic Barbaresco, Jean-Yves Schneider, Matthieu Cord	In our article, we introduce a Riemannian batch normalization (batchnorm) algorithm, which generalizes the one used in Euclidean nets.
1388	Neural Taskonomy: Inferring the Similarity of Task-Derived Representations from Brain Activity	Yuan Wang, Michael Tarr, Leila Wehbe	To address this problem, we used learned representations drawn from 21 computer vision tasks to construct encoding models for predicting brain responses from BOLD5000—a large-scale dataset comprised of fMRI scans collected while observers viewed over 5000 naturalistic scene and object images.
1389	Stacked Capsule Autoencoders	Adam Kosiorek, Sara Sabour, Yee Whye Teh, Geoffrey E. Hinton	We introduce an unsupervised capsule autoencoder (SCAE), which explicitly uses geometric relationships between parts to reason about objects.
1390	Learning Reward Machines for Partially Observable Reinforcement Learning	Rodrigo Toro Icarte, Ethan Waldie, Toryn Klassen, Rick Valenzano, Margarita Castro, Sheila McIlraith	We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem.
1391	Learning Representations by Maximizing Mutual Information Across Views	Philip Bachman, R Devon Hjelm, William Buchwalter	We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context.
1392	Amortized Bethe Free Energy Minimization for Learning MRFs	Sam Wiseman, Yoon Kim	We propose to learn deep undirected graphical models (i.e., MRFs) with a non-ELBO objective for which we can calculate exact gradients.
1393	Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity	Chulhee Yun, Suvrit Sra, Ali Jadbabaie	We study finite sample expressivity, i.e., memorization power of ReLU networks.
1394	Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks	Aaron Voelker, Ivana Kajic, Chris Eliasmith	We propose a novel memory cell for recurrent neural networks that dynamically maintains information across long windows of time using relatively few resources.
1395	Exact Combinatorial Optimization with Graph Convolutional Neural Networks	Maxime Gasse, Didier Chetelat, Nicola Ferroni, Laurent Charlin, Andrea Lodi	We propose a new graph convolutional neural network model for learning branch-and-bound variable selection policies, which leverages the natural variable-constraint bipartite graph representation of mixed-integer linear programs.
1396	Fast structure learning with modular regularization	Greg Ver Steeg, Hrayr Harutyunyan, Daniel Moyer, Aram Galstyan	We introduce a novel method that leverages a newly discovered connection between information-theoretic measures and structured latent factor models to derive an optimization objective which encourages modular structures where each observed variable has a single latent parent.
1397	Wasserstein Dependency Measure for Representation Learning	Sherjil Ozair, Corey Lynch, Yoshua Bengio, Aaron van den Oord, Sergey Levine, Pierre Sermanet	In this work, we empirically demonstrate that mutual information-based representation learning approaches do fail to learn complete representations on a number of designed and real-world tasks.
1398	TAB-VCR: Tags and Attributes based VCR Baselines	Jingxiang Lin, Unnat Jain, Alexander Schwing	Here we show that a much simpler model obtained by ablating and pruning the existing intricate baseline can perform better with half the number of trainable parameters.
1399	Universality and individuality in neural dynamics across large populations of recurrent networks	Niru Maheswaranathan, Alex Williams, Matthew Golub, Surya Ganguli, David Sussillo	To address these foundational questions, we study populations of thousands of networks of commonly used RNN architectures trained to solve neuroscientifically motivated tasks and characterize their low-dimensional dynamics via CCA and nonlinear dynamical systems analysis.
1400	End-to-End Learning on 3D Protein Structure for Interface Prediction	Raphael Townshend, Rishi Bedi, Patricia Suriana, Ron Dror	To address this question, we focused on a central problem in biology: predicting how proteins interact with one another—that is, which surfaces of one protein bind to those of another protein.
1401	A Family of Robust Stochastic Operators for Reinforcement Learning	Yingdong Lu, Mark Squillante, Chai Wah Wu	We consider a new family of stochastic operators for reinforcement learning with the goal of alleviating negative effects and becoming more robust to approximation or estimation errors.
1402	Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty	Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, Dawn Song	We find that self-supervision can benefit robustness in a variety of ways, including robustness to adversarial examples, label corruption, and common input corruptions.
1403	Inherent Tradeoffs in Learning Fair Representations	Han Zhao, Geoff Gordon	In this paper, through the lens of information theory, we provide the first result that quantitatively characterizes the tradeoff between demographic parity and the joint utility across different population groups.
1404	Are deep ResNets provably better than linear predictors?	Chulhee Yun, Suvrit Sra, Ali Jadbabaie	Our main theorem on deep ResNets shows that, under simple geometric conditions, any critical point in the optimization landscape is either (i) at least as good as the best linear predictor or (ii) a point at which the Hessian has a strictly negative eigenvalue.
1405	Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics	Niru Maheswaranathan, Alex Williams, Matthew Golub, Surya Ganguli, David Sussillo	In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task.
1406	BehaveNet: nonlinear embedding and Bayesian neural decoding of behavioral videos	Eleanor Batty, Matthew Whiteway, Shreya Saxena, Dan Biderman, Taiga Abe, Simon Musall, Winthrop Gillis, Jeffrey Markowitz, Anne Churchland, John P. Cunningham, Sandeep R. Datta, Scott Linderman, Liam Paninski	Here we introduce a probabilistic framework for the analysis of behavioral video and neural activity.
1407	Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models	Yuge Shi, Siddharth N, Brooks Paige, Philip Torr	In this work, we characterise successful learning of such models as the fulfilment of four criteria: i) implicit latent decomposition into shared and private subspaces, ii) coherent joint generation over all modalities, iii) coherent cross-generation across individual modalities, and iv) improved model learning for individual modalities through multi-modal integration.
1408	Gradient-based Adaptive Markov Chain Monte Carlo	Michalis Titsias, Petros Dellaportas	We introduce a gradient-based learning method to automatically adapt Markov chain Monte Carlo (MCMC) proposal distributions to intractable targets.
1409	On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset	Muhammad Waleed Gondal, Manuel Wuthrich, Djordje Miladinovic, Francesco Locatello, Martin Breidt, Valentin Volchkov, Joel Akpo, Olivier Bachem, Bernhard Schölkopf, Stefan Bauer	In this paper, we propose a novel data-set which consists of over 1 million images of physical 3D objects with seven factors of variation, such as object color, shape, size and position.
1410	Imitation-Projected Programmatic Reinforcement Learning	Abhinav Verma, Hoang Le, Yisong Yue, Swarat Chaudhuri	We study the problem of programmatic reinforcement learning, in which policies are represented as short programs in a symbolic language.
1411	Learning Data Manipulation for Augmentation and Weighting	Zhiting Hu, Bowen Tan, Russ R. Salakhutdinov, Tom M. Mitchell, Eric P. Xing	In this work, we propose a new method that supports learning different manipulation schemes with the same gradient-based algorithm.
1412	Exploring Algorithmic Fairness in Robust Graph Covering Problems	Aida Rahmattalabi, Phebe Vayanos, Anthony Fulginiti, Eric Rice, Bryan Wilder, Amulya Yadav, Milind Tambe	To remediate this issue, we propose a novel formulation of the robust covering problem with fairness constraints and a tractable approximation scheme applicable to real world instances.
1413	Abstraction based Output Range Analysis for Neural Networks	Pavithra Prabhakar, Zahra Rahimi Afzal	In this paper, we consider the problem of output range analysis for feed-forward neural networks.
1414	Space and Time Efficient Kernel Density Estimation in High Dimensions	Arturs Backurs, Piotr Indyk, Tal Wagner	In this work, we present an improvement to their framework that retains the same query time, while requiring only linear space and linear preprocessing time.
1415	PIDForest: Anomaly Detection via Partial Identification	Parikshit Gopalan, Vatsal Sharan, Udi Wieder	We propose a framework called Partial Identification which captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values.
1416	Generative Models for Graph-Based Protein Design	John Ingraham, Vikas Garg, Regina Barzilay, Tommi Jaakkola	We develop relational language models for protein sequences that directly condition on a graph specification of the target structure.
1417	The Geometry of Deep Networks: Power Diagram Subdivision	Randall Balestriero, Romain Cosentino, Behnaam Aazhang, Richard Baraniuk	We study the geometry of deep (neural) networks (DNs) with piecewise affine and convex nonlinearities.
1418	Approximate Feature Collisions in Neural Nets	Ke Li, Tianhao Zhang, Jitendra Malik	In this paper, we show the opposite: neural nets could be surprisingly insensitive to adversarially chosen changes of large magnitude.
1419	Ease-of-Teaching and Language Structure from Emergent Communication	Fushan Li, Michael Bowling	By introducing new agents periodically to replace old ones, sequentially and within a population, we explore such a new pressure – ease of teaching – and show its impact on the structure of the resulting language.
1420	Generalization in multitask deep neural classifiers: a statistical physics approach	Anthony Ndirango, Tyler Lee	We develop an analytic theory of the nonlinear dynamics of generalization of deep neural networks trained to solve classification tasks using softmax outputs and cross-entropy loss, addressing both single task and multitask settings.
1421	Optimistic Distributionally Robust Optimization for Nonparametric Likelihood Approximation	Viet Anh Nguyen, Soroosh Shafieezadeh Abadeh, Man-Chung Yue, Daniel Kuhn, Wolfram Wiesemann	In this paper, we propose a non-parametric approximation of the likelihood that identifies a probability measure which lies in the neighborhood of the nominal measure and that maximizes the probability of observing the given sample point.
1422	On Relating Explanations and Adversarial Examples	Alexey Ignatiev, Nina Narodytska, Joao Marques-Silva	This paper demonstrates that explanations and adversarial examples are related by a generalized form of hitting set duality, which extends earlier work on hitting set duality observed in model-based diagnosis and knowledge compilation.
1423	On the equivalence between graph isomorphism testing and function approximation with GNNs	Zhengdao Chen, Soledad Villar, Lei Chen, Joan Bruna	Our work connects these two perspectives and proves their equivalence.
1424	Surround Modulation: A Bio-inspired Connectivity Structure for Convolutional Neural Networks	Hosein Hasani, Mahdieh Soleymani, Hamid Aghajan	Inspired by the notion of surround modulation, we designed new excitatory-inhibitory connections between a unit and its surrounding units in the convolutional neural network (CNN) to achieve a more biologically plausible network.
1425	Self-attention with Functional Time Representation Learning	Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan	We propose several models to learn the functional time representation and the interactions with event representation.
1426	Re-randomized Densification for One Permutation Hashing and Bin-wise Consistent Weighted Sampling	Ping Li, Xiaoyun Li, Cun-Hui Zhang	In this paper, we propose a strategy named “re-randomization” in the process of densification that could achieve the smallest variance among all densification schemes.
1427	Enabling hyperparameter optimization in sequential autoencoders for spiking neural data	Mohammad Reza Keshtkaran, Chethan Pandarinath	We develop and test two potential solutions: an alternate validation method (“sample validation”) and a novel regularization method (“coordinated dropout”). These innovations prevent overfitting quite effectively, and allow us to test whether SAEs can achieve good performance on limited data through large-scale HP optimization.