Paper Digest: AISTATS 2016 Highlights

June 17, 2016 / June 18, 2020, admin

Readers can also choose to read this highlight article on our console, which allows users to filter out papers using keywords and find related papers.

The International Conference on Artificial Intelligence and Statistics (AISTATS) is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to get updated with new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: AISTATS 2016 Papers

	Title	Authors	Highlight
1	Strong Coresets for Hard and Soft Bregman Clustering with Applications to Exponential Family Mixtures	Mario Lucic, Olivier Bachem, Andreas Krause	We propose a single, practical algorithm to construct strong coresets for a large class of hard and soft clustering problems based on Bregman divergences.
2	Revealing Graph Bandits for Maximizing Local Influence	Alexandra Carpentier, Michal Valko	In this paper, we do not assume any knowledge of the graph, but we consider a setting where it can be gradually discovered in a sequential and active way.
3	Convex Block-sparse Linear Regression with Expanders - Provably	Anastasios Kyrillidis, Bubacarr Bah, Rouzbeh Hasheminezhad, Quoc Tran Dinh, Luca Baldassarre, Volkan Cevher	Our aim here is to theoretically characterize the performance of convex approaches under such a setting.
4	C3: Lightweight Incrementalized MCMC for Probabilistic Programs using Continuations and Callsite Caching	Daniel Ritchie, Andreas Stuhlmüller, Noah Goodman	We present a new extension to the lightweight approach, C3, which enables efficient, incrementalized re-execution of MH proposals.
5	Clamping Improves TRW and Mean Field Approximations	Adrian Weller, Justin Domke	We explore the value of our methods by empirical analysis and draw lessons to guide practitioners.
6	Tightness of LP Relaxations for Almost Balanced Models	Adrian Weller, Mark Rowland, David Sontag	Here we consider binary pairwise models and derive sufficient conditions for guaranteed tightness of (i) the standard LP relaxation on the local polytope LP+LOC, and (ii) the LP relaxation on the triplet-consistent polytope LP+TRI (the next level in the Sherali-Adams hierarchy).
7	Control Functionals for Quasi-Monte Carlo Integration	Chris Oates, Mark Girolami	Quasi-Monte Carlo (QMC) methods are being adopted in statistical applications due to the increasingly challenging nature of numerical integrals that are now routinely encountered.
8	Probability Inequalities for Kernel Embeddings in Sampling without Replacement	Markus Schneider	In this work we generalize the results of (Serfling 1974) to quantify the difference between these two estimates.
9	Sparse Representation of Multivariate Extremes with Applications to Anomaly Ranking	Nicolas Goix, Anne Sabourin, Stéphan Clémençon	This paper proposes a new algorithm based on multivariate EVT to learn how to rank observations in a high dimensional space with respect to their degree of ‘abnormality’.
10	A Robust-Equitable Copula Dependence Measure for Feature Selection	Yale Chang, Yi Li, Adam Ding, Jennifer Dy	In this paper we introduce the concept of robust-equitability and a robust-equitable dependence measure copula correlation (Ccor).
11	Random Forest for the Contextual Bandit Problem	Raphaël Féraud, Robin Allesiardo, Tanguy Urvoy, Fabrice Clérot	To address the contextual bandit problem, we propose an online random forest algorithm.
12	Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics	Michael Herman, Tobias Gindele, Jörg Wagner, Felix Schmitt, Wolfram Burgard	To overcome this, we present a gradient-based IRL approach that simultaneously estimates the system’s dynamics.
13	Learning Sparse Additive Models with Interactions in High Dimensions	Hemant Tyagi, Anastasios Kyrillidis, Bernd Gärtner, Andreas Krause	In this work, we consider a generalized SPAM, allowing for second order interaction terms.
14	Bipartite Correlation Clustering: Maximizing Agreements	Megasthenis Asteris, Anastasios Kyrillidis, Dimitris Papailiopoulos, Alexandros Dimakis	We present a novel approximation algorithm for k-BCC, a variant of BCC with an upper bound k on the number of clusters.
15	Breaking Sticks and Ambiguities with Adaptive Skip-gram	Sergey Bartunov, Dmitry Kondrashkin, Anton Osokin, Dmitry Vetrov	In this paper we propose the Adaptive Skip-gram model, a nonparametric Bayesian extension of Skip-gram capable of automatically learning the required number of representations for all words at the desired semantic resolution.
16	Top Arm Identification in Multi-Armed Bandits with Batch Arm Pulls	Kwang-Sung Jun, Kevin Jamieson, Robert Nowak, Xiaojin Zhu	We introduce a new multi-armed bandit (MAB) problem in which arms must be sampled in batches, rather than one at a time.
17	Limits on Sparse Support Recovery via Linear Sketching with Random Expander Matrices	Jonathan Scarlett, Volkan Cevher	Motivated by applications where the positions of the non-zero entries in a sparse vector are of primary interest, we consider the problem of support recovery from a linear sketch taking the form $\mathbf{Y} = \mathbf{X}\beta + \mathbf{Z}$.
18	Maximum Likelihood for Variance Estimation in High-Dimensional Linear Models	Lee H. Dicker, Murat A. Erdogdu	More broadly, the results in this paper illustrate a strategy for drawing connections between fixed- and random-effects models in high dimensions, which may be useful in other applications.
19	Scalable Gaussian Process Classification via Expectation Propagation	Daniel Hernandez-Lobato, Jose Miguel Hernandez-Lobato	As an alternative, we describe here how to train these classifiers efficiently using expectation propagation (EP).
20	Precision Matrix Estimation in High Dimensional Gaussian Graphical Models with Faster Rates	Lingxiao Wang, Xiang Ren, Quanquan Gu	In this paper, we present a new estimator for precision matrix in high dimensional Gaussian graphical models.
21	On the Reducibility of Submodular Functions	Jincheng Mei, Hao Zhang, Bao-Liang Lu	In this paper, we study the reducibility of submodular functions, a property that enables us to reduce the solution space of submodular optimization problems without performance loss.
22	Accelerated Stochastic Gradient Descent for Minimizing Finite Sums	Atsushi Nitanda	We propose an optimization method for minimizing the finite sums of smooth convex functions.
23	Fast Convergence of Online Pairwise Learning Algorithms	Martin Boissier, Siwei Lyu, Yiming Ying, Ding-Xuan Zhou	In this paper, we focus on online learning algorithms for pairwise learning problems without strong convexity, for which all previously known algorithms achieve a convergence rate of $\mathcal{O}(1/\sqrt{T})$ after $T$ iterations.
24	Computationally Efficient Bayesian Learning of Gaussian Process State Space Models	Andreas Svensson, Arno Solin, Simo Särkkä, Thomas Schön	We present a procedure for efficient Bayesian learning in Gaussian process state space models, where the representation is formed by projecting the problem onto a set of approximate eigenfunctions derived from the prior covariance structure.
25	Generalized Ideal Parent (GIP): Discovering non-Gaussian Hidden Variables	Yaniv Tenzer, Gal Elidan	We propose a novel general purpose approach for discovering hidden variables in flexible non-Gaussian domains using the powerful class of Gaussian copula networks.
26	On Sparse Variational Methods and the Kullback-Leibler Divergence between Stochastic Processes	Alexander G. de G. Matthews, James Hensman, Richard Turner, Zoubin Ghahramani	In this paper we give a substantial generalization of the literature on this topic.
27	Non-stochastic Best Arm Identification and Hyperparameter Optimization	Kevin Jamieson, Ameet Talwalkar	Motivated by the task of hyperparameter optimization, we introduce the non-stochastic best-arm identification problem.
28	A Linearly-Convergent Stochastic L-BFGS Algorithm	Philipp Moritz, Robert Nishihara, Michael Jordan	We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions.
29	No Regret Bound for Extreme Bandits	Robert Nishihara, David Lopez-Paz, Leon Bottou	Motivated by the general challenge of sequentially choosing which algorithm to use, we study the more specific task of choosing among distributions to use for random hyperparameter optimization.
30	Tensor vs. Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations	Anima Anandkumar, Prateek Jain, Yang Shi, U. N. Niranjan	We propose a novel non-convex iterative algorithm with guaranteed recovery.
31	Online Learning to Rank with Feedback at the Top	Sougata Chaudhuri, Ambuj Tewari	We develop efficient algorithms for well known losses in the pointwise, pairwise and listwise families.
32	Survey Propagation beyond Constraint Satisfaction Problems	Christopher Srinivasa, Siamak Ravanbakhsh, Brendan Frey	We propose an approximation scheme to efficiently extend the application of SP to marginalization in binary pairwise graphical models.
33	Score Permutation Based Finite Sample Inference for Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) Models	Balázs Csanád Csáji	Here, we suggest a finite sample approach, called ScoPe, to construct distribution-free confidence regions around the QML estimate, which have exact coverage probabilities even though no additional assumptions about moments are made.
34	CRAFT: ClusteR-specific Assorted Feature selecTion	Vikas K. Garg, Cynthia Rudin, Tommi Jaakkola	We present a hierarchical Bayesian framework for clustering with cluster-specific feature selection.
35	Time-Varying Gaussian Process Bandit Optimization	Ilija Bogunovic, Jonathan Scarlett, Volkan Cevher	We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm.
36	Bayes-Optimal Effort Allocation in Crowdsourcing: Bounds and Index Policies	Weici Hu, Peter Frazier	Following a similar approach to the Lagrangian Relaxation technique in Adelman and Mersereau (2008), we provide a computationally tractable instance-specific upper bound on the value of this Bayes-optimal policy, which can in turn be used to bound the optimality gap of any other sub-optimal policy.
37	Bayesian Markov Blanket Estimation	Dinu Kaufmann, Sonali Parbhoo, Aleksander Wieczorek, Sebastian Keller, David Adametz, Volker Roth	This paper considers a Bayesian view for estimating the Markov blanket of a set of query variables, where the set of potential neighbours here is big.
38	Dreaming More Data: Class-dependent Distributions over Diffeomorphisms for Learned Data Augmentation	Søren Hauberg, Oren Freifeld, Anders Boesen Lindbo Larsen, John Fisher, Lars Hansen	With an eye towards true end-to-end learning, we suggest learning the applied transformations on a per-class basis.
39	Unsupervised Ensemble Learning with Dependent Classifiers	Ariel Jaffe, Ethan Fetaya, Boaz Nadler, Tingting Jiang, Yuval Kluger	To this end we introduce a statistical model that allows for dependencies between classifiers.
40	Multi-Level Cause-Effect Systems	Krzysztof Chalupka, Frederick Eberhardt, Pietro Perona	We present a domain-general account of causation that applies to settings in which macro-level causal relations between two systems are of interest, but the relevant causal features are poorly understood and have to be aggregated from vast arrays of micro-measurements.
41	Deep Kernel Learning	Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, Eric P. Xing	We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods.
42	Nearly Optimal Classification for Semimetrics	Lee-Ad Gottlieb, Aryeh Kontorovich, Pinhas Nisnevitch	We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric, but need not satisfy the triangle inequality.
43	Latent Point Process Allocation	Chris Lloyd, Tom Gunter, Michael Osborne, Stephen Roberts, Tom Nickson	We introduce a probabilistic model for the factorisation of continuous Poisson process rate functions.
44	K2-ABC: Approximate Bayesian Computation with Kernel Embeddings	Mijung Park, Wittawat Jitkrittum, Dino Sejdinovic	In this paper, we propose a fully nonparametric ABC paradigm which circumvents the need for manually selecting summary statistics.
45	Bayesian Generalised Ensemble Markov Chain Monte Carlo	Jes Frellsen, Ole Winther, Zoubin Ghahramani, Jesper Ferkinghoff-Borg	BayesGE uses a Bayesian approach to iteratively update the belief about the density of states (distribution of the log likelihood under the prior) for the model, with the dual purpose of enhancing the sampling efficiency and making the estimation of the partition function tractable.
46	A Lasso-based Sparse Knowledge Gradient Policy for Sequential Optimal Learning	Yan Li, Han Liu, Warren Powell	We propose a sequential learning policy for noisy discrete global optimization and ranking and selection (R&S) problems with high dimensional sparse belief functions, where there are hundreds or even thousands of features, but only a small portion of these features contain explanatory power.
47	Optimal Statistical and Computational Rates for One Bit Matrix Completion	Renkun Ni, Quanquan Gu	We present an estimator based on rank constrained maximum likelihood estimation, and an efficient greedy algorithm to solve it approximately based on an extension of conditional gradient descent.
48	PAC-Bayesian Bounds based on the Rényi Divergence	Luc Bégin, Pascal Germain, François Laviolette, Jean-Francis Roy	We propose a simplified proof process for PAC-Bayesian generalization bounds that divides the proof into four successive inequalities, easing the "customization" of PAC-Bayesian theorems.
49	Simple and Scalable Constrained Clustering: a Generalized Spectral Method	Mihai Cucuringu, Ioannis Koutis, Sanjay Chawla, Gary Miller, Richard Peng	We present a simple spectral approach to the well-studied constrained clustering problem.
50	Geometry Aware Mappings for High Dimensional Sparse Factors	Avradeep Bhowmik, Nathan Liu, Erheng Zhong, Badri Bhaskar, Suju Rajan	In this manuscript we present a novel framework that exploits structural properties of sparse vectors, using the inverted index representation, to significantly reduce the run time computational cost of factorisation models.
51	Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree	Chen-Yu Lee, Patrick W. Gallagher, Zhuowen Tu	We seek to improve deep neural networks by generalizing the pooling operations that play a central role in current architectures.
52	Rivalry of Two Families of Algorithms for Memory-Restricted Streaming PCA	Chun-Liang Li, Hsuan-Tien Lin, Chi-Jen Lu	In this paper, we analyze the convergence rate of a representative algorithm with decayed learning rate (Oja and Karhunen, 1985) in the first family for the general k>1 case.
53	Quantization based Fast Inner Product Search	Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, David Simcha	We propose a quantization based approach for fast approximate Maximum Inner Product Search (MIPS).
54	An Improved Convergence Analysis of Cyclic Block Coordinate Descent-type Methods for Strongly Convex Minimization	Xingguo Li, Tuo Zhao, Raman Arora, Han Liu, Mingyi Hong	To bridge this theoretical gap, we propose an improved convergence analysis for the CBCD-type methods.
55	Learning Structured Low-Rank Representation via Matrix Factorization	Jie Shen, Ping Li	In this paper, we propose to learn structured LRR by factorizing the nuclear norm regularized matrix, which leads to our proposed non-convex formulation NLRR.
56	A PAC RL Algorithm for Episodic POMDPs	Zhaohan Daniel Guo, Shayan Doroudi, Emma Brunskill	We give, to our knowledge, the first partially observable RL algorithm with a polynomial bound on the number of episodes on which the algorithm may not achieve near-optimal performance.
57	Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation	Sujith Ravi, Qiming Diao	Traditional graph-based semi-supervised learning (SSL) approaches are not suited for massive data and large label scenarios since they scale linearly with the number of edges |E| and distinct labels m. To deal with the large label size problem, recent works propose sketch-based methods to approximate the label distribution per node thereby achieving a space reduction from O(m) to O(log m), under certain conditions.
58	Large-Scale Optimization Algorithms for Sparse Conditional Gaussian Graphical Models	Calvin McCarter, Seyoung Kim	In this paper, we propose a new optimization procedure based on a Newton method that efficiently iterates over two sub-problems, leading to drastic improvement in computation time compared to the previous methods.
59	Graph Connectivity in Noisy Sparse Subspace Clustering	Yining Wang, Yu-Xiang Wang, Aarti Singh	In this paper, we investigate the graph connectivity problem for noisy sparse sub-space clustering and show that a simple post-processing procedure is capable of delivering consistent clustering under certain “general position” or “restricted eigenvalue” assumptions.
60	The Nonparametric Kernel Bayes Smoother	Yu Nishiyama, Amir Afsharinejad, Shunsuke Naruse, Byron Boots, Le Song	We expand upon this work by introducing a smoothing algorithm, the nonparametric kernel Bayes’ smoother (nKB-smoother) which relies on kernel Bayesian inference through the kernel sum rule and kernel Bayes’ rule.
61	Universal Models of Multivariate Temporal Point Processes	Asela Gunawardana, Chris Meek	In this paper, we study the expressive power and learnability of Graphical Event Models (GEMs) – the analogue of directed graphical models for multivariate temporal point processes.
62	Online Relative Entropy Policy Search using Reproducing Kernel Hilbert Space Embeddings	Zhitang Chen, Pascal Poupart, Yanhui Geng	In this paper, we develop an online policy search algorithm based on a recent state-of-the-art algorithm REPS-RKHS that uses conditional kernel embeddings.
63	Relationship between PreTraining and Maximum Likelihood Estimation in Deep Boltzmann Machines	Muneki Yasuda	A pretraining algorithm, which is a layer-by-layer greedy learning algorithm, for a deep Boltzmann machine (DBM) is presented in this paper.
64	Enumerating Equivalence Classes of Bayesian Networks using EC Graphs	Eunice Yuh-Jie Chen, Arthur Choi, Adnan Darwiche	We propose a new search space for A* search, called the EC graph, that facilitates the enumeration of equivalence classes, by representing the space of completed, partially directed acyclic graphs.
65	Low-Rank and Sparse Structure Pursuit via Alternating Minimization	Quanquan Gu, Zhaoran Wang, Han Liu	In this paper, we present a nonconvex alternating minimization optimization algorithm for low-rank and sparse structure pursuit.
66	NuC-MKL: A Convex Approach to Non Linear Multiple Kernel Learning	Eli Meirom, Pavel Kisilev	In this paper, we propose a new non-linear MKL method that utilizes nuclear norm regularization and leads to a convex optimization problem.
67	Tractable and Scalable Schatten Quasi-Norm Approximations for Rank Minimization	Fanhua Shang, Yuanyuan Liu, James Cheng	Motivated by the equivalence relation between the trace norm and its bilinear spectral penalty, we define two tractable Schatten norms, i.e. the bi-trace and tri-trace norms, and prove that they are in essence the Schatten-1/2 and 1/3 quasi-norms, respectively.
68	Fast Dictionary Learning with a Smoothed Wasserstein Loss	Antoine Rolet, Marco Cuturi, Gabriel Peyré	We consider in this paper the dictionary learning problem when the observations are normalized histograms of features.
69	New Resistance Distances with Global Information on Large Graphs	Canh Hao Nguyen, Hiroshi Mamitsuka	We propose new distance functions between nodes for this problem.
70	Batch Bayesian Optimization via Local Penalization	Javier Gonzalez, Zhenwen Dai, Philipp Hennig, Neil Lawrence	We investigate this issue and propose a highly effective heuristic based on an estimate of the function’s Lipschitz constant that captures the most important aspect of this interaction–local repulsion–at negligible computational overhead.
71	Nonparametric Budgeted Stochastic Gradient Descent	Trung Le, Vu Nguyen, Tu Dinh Nguyen, Dinh Phung	In this paper, we propose the Nonparametric Budgeted Stochastic Gradient Descent that allows the model size to automatically grow with data in a principled way.
72	Learning Relationships between Data Obtained Independently	Alexandra Carpentier, Teresa Schlueter	The aim of this paper is to provide a new method for learning the relationships between data that have been obtained independently.
73	Fast and Scalable Structural SVM with Slack Rescaling	Heejin Choi, Ofer Meshi, Nathan Srebro	We present an efficient method for training slack-rescaled structural SVM.
74	Probabilistic Approximate Least-Squares	Simon Bartels, Philipp Hennig	Leveraging recent results casting elementary linear algebra operations as probabilistic inference, we propose a new approximate method for nonparametric least-squares that affords a probabilistic uncertainty estimate over the error between the approximate and exact least-squares solution (this is not the same as the posterior variance of the associated Gaussian process regressor).
75	Approximate Inference Using DC Programming For Collective Graphical Models	Thien Nguyen, Akshat Kumar, Hoong Chuin Lau, Daniel Sheldon	Collective graphical models (CGMs) provide a framework for reasoning about a population of independent and identically distributed individuals when only noisy and aggregate observations are given.
76	Sequential Inference for Deep Gaussian Process	Yali Wang, Marcus Brubaker, Brahim Chaib-Draa, Raquel Urtasun	In this paper, we propose an efficient sequential inference framework for DGP, where the data is processed sequentially.
77	Variational Tempering	Stephan Mandt, James McInerney, Farhan Abrol, Rajesh Ranganath, David Blei	We therefore introduce variational tempering, a variational algorithm that introduces a temperature latent variable to the model.
78	On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System	Yi Zhou, Yaoliang Yu, Wei Dai, Yingbin Liang, Eric Xing	In this work we propose mspg, an extension of the flexible proximal gradient algorithm to the model parallel and stale synchronous setting.
79	Scalable MCMC for Mixed Membership Stochastic Blockmodels	Wenzhe Li, Sungjin Ahn, Max Welling	We propose a stochastic gradient Markov chain Monte Carlo (SG-MCMC) algorithm for scalable inference in mixed-membership stochastic blockmodels (MMSB).
80	Non-Stationary Gaussian Process Regression with Hamiltonian Monte Carlo	Markus Heinonen, Henrik Mannerström, Juho Rousu, Samuel Kaski, Harri Lähdesmäki	We present a novel approach for non-stationary Gaussian process regression (GPR), where the three key parameters – noise variance, signal variance and lengthscale – can be simultaneously input-dependent.
81	A Deep Generative Deconvolutional Image Model	Yunchen Pu, Win Yuan, Andrew Stevens, Chunyuan Li, Lawrence Carin	A deep generative model is developed for representation and analysis of images, based on a hierarchical convolutional dictionary-learning framework.
82	Distributed Multi-Task Learning	Jialei Wang, Mladen Kolar, Nathan Srebro	We present a communication-efficient estimator based on the debiased lasso and show that it is comparable with the optimal centralized method.
83	A Fixed-Point Operator for Inference in Variational Bayesian Latent Gaussian Models	Rishit Sheth, Roni Khardon	Recent work proposed a fixed-point (FP) update procedure to optimize the covariance matrix in the variational solution and demonstrated its efficacy in specific models.
84	Learning Probabilistic Submodular Diversity Models Via Noise Contrastive Estimation	Sebastian Tschiatschek, Josip Djolonga, Andreas Krause	In this paper, we propose FLID, a novel log-submodular diversity model that scales to large numbers of items and can be efficiently learned using noise contrastive estimation.
85	Fast Saddle-Point Algorithm for Generalized Dantzig Selector and FDR Control with Ordered L1-Norm	Sangkyun Lee, Damian Brzyski, Malgorzata Bogdan	In this paper we propose a primal-dual proximal extragradient algorithm to solve the generalized Dantzig selector (GDS) estimation problem, based on a new convex-concave saddle-point (SP) reformulation.
86	GLASSES: Relieving The Myopia Of Bayesian Optimisation	Javier Gonzalez, Michael Osborne, Neil Lawrence	We present GLASSES: Global optimisation with Look-Ahead through Stochastic Simulation and Expected-loss Search.
87	Stochastic Variational Inference for the HDP-HMM	Aonan Zhang, San Gultekin, John Paisley	In this paper we provide a solution to this problem by deriving a variational inference algorithm for the HDP-HMM, as well as its stochastic extension, for which all parameter updates are in closed form.
88	Stochastic Neural Networks with Monotonic Activation Functions	Siamak Ravanbakhsh, Barnabas Poczos, Jeff Schneider, Dale Schuurmans, Russell Greiner	We propose a Laplace approximation that creates a stochastic unit from any smooth monotonic activation function, using only Gaussian noise.
89	(Bandit) Convex Optimization with Biased Noisy Gradient Oracles	Xiaowei Hu, Prashanth L.A., András György, Csaba Szepesvari	In this paper we propose a novel framework that replaces the specific gradient estimation methods with an abstract oracle model.
90	Variational Gaussian Copula Inference	Shaobo Han, Xuejun Liao, David Dunson, Lawrence Carin	For models with continuous and non-Gaussian hidden variables, we propose a semiparametric and automated variational Gaussian copula approach, in which the parametric Gaussian copula family is able to preserve multivariate posterior dependence, and the nonparametric transformations based on Bernstein polynomials provide ample flexibility in characterizing the univariate marginal posteriors.
91	Low-Rank Approximation of Weighted Tree Automata	Guillaume Rabusseau, Borja Balle, Shay Cohen	We describe a technique to minimize weighted tree automata (WTA), a powerful formalism that subsumes probabilistic context-free grammars (PCFGs) and latent-variable PCFGs.
92	Accelerating Online Convex Optimization via Adaptive Prediction	Mehryar Mohri, Scott Yang	We present a powerful general framework for designing data-dependent online convex optimization algorithms, building upon and unifying recent techniques in adaptive regularization, optimistic gradient predictions, and problem-dependent randomization.
93	Scalable geometric density estimation	Ye Wang, Antonio Canale, David Dunson	We introduce a novel empirical Bayes method that we term geometric density estimation (GEODE) and show that, with mild conditions and among all d-dimensional linear subspaces, the span of the d leading principal axes of the data maximizes the model posterior.
94	Model-based Co-clustering for High Dimensional Sparse Data	Aghiles Salah, Nicoleta Rogovschi, Mohamed Nadif	We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high dimensional sparse matrices.
95	DUAL-LOCO: Distributing Statistical Estimation Using Random Projections	Christina Heinze, Brian McWilliams, Nicolai Meinshausen	We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation.
96	High Dimensional Bayesian Optimization via Restricted Projection Pursuit Models	Chun-Liang Li, Kirthevasan Kandasamy, Barnabas Poczos, Jeff Schneider	Our generalization provides the benefits of i) greatly increasing the space of functions that can be modeled by our approach, which covers the previous works (Wang et al., 2013; Kandasamy et al., 2015) as special cases, and ii) efficiently handling the learning in a larger model space.
97	On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games	Julien Pérolat, Bilal Piot, Bruno Scherrer, Olivier Pietquin	The main contribution of this paper consists in extending several non-stationary Reinforcement Learning (RL) algorithms and their theoretical guarantees to the case of γ-discounted zero-sum Markov Games (MGs).
98	Semi-Supervised Learning with Adaptive Spectral Transform	Hanxiao Liu, Yiming Yang	This paper proposes a novel nonparametric framework for semi-supervised learning and for optimizing the Laplacian spectrum of the data manifold simultaneously.
99	Pseudo-Marginal Slice Sampling	Iain Murray, Matthew Graham	We describe a general way to clamp and update the random numbers used in a pseudo-marginal method’s unbiased estimator.
100	How to Learn a Graph from Smooth Signals	Vassilis Kalofolias	We propose a framework to learn the graph structure underlying a set of smooth signals.
101	Ordered Weighted L1 Regularized Regression with Strongly Correlated Covariates: Theoretical Aspects	Mario Figueiredo, Robert Nowak	This paper studies the ordered weighted L1 (OWL) family of regularizers for sparse linear regression with strongly correlated covariates.
102	Pareto Front Identification from Stochastic Bandit Feedback	Peter Auer, Chao-Kai Chiang, Ronald Ortner, Madalina Drugan	We propose a confidence bound algorithm to approximate the Pareto front, and prove problem specific lower and upper bounds, showing that the sample complexity is characterized by some natural geometric properties of the operating points.
103	Sketching, Embedding and Dimensionality Reduction in Information Theoretic Spaces	Amirali Abdullah, Ravi Kumar, Andrew McGregor, Sergei Vassilvitskii, Suresh Venkatasubramanian	In this paper we show how to embed information distances like the χ^2 and Jensen-Shannon divergences efficiently in low dimensional spaces while preserving all pairwise distances.
104	AdaDelay: Delay Adaptive Distributed Stochastic Optimization	Suvrit Sra, Adams Wei Yu, Mu Li, Alex Smola	We develop distributed stochastic convex optimization algorithms under a delayed gradient model in which server nodes update parameters and worker nodes compute stochastic (sub)gradients.
105	Exponential Stochastic Cellular Automata for Massively Parallel Inference	Manzil Zaheer, Michael Wick, Jean-Baptiste Tristan, Alex Smola, Guy Steele	We propose an embarrassingly parallel, memory efficient inference algorithm for latent variable models in which the complete data likelihood is in the exponential family.
106	Globally Sparse Probabilistic PCA	Pierre-Alexandre Mattei, Charles Bouveyron, Pierre Latouche	To overcome this drawback, we propose a Bayesian procedure that makes it possible to obtain several sparse components with the same sparsity pattern.
107	Provable Bayesian Inference via Particle Mirror Descent	Bo Dai, Niao He, Hanjun Dai, Le Song	To tackle this challenge, we propose a simple yet provable algorithm, Particle Mirror Descent (PMD), to iteratively approximate the posterior density.
108	Unsupervised Feature Selection by Preserving Stochastic Neighbors	Xiaokai Wei, Philip S. Yu	In this paper, we present an effective method, Stochastic Neighbor-preserving Feature Selection (SNFS), for selecting discriminative features in an unsupervised setting.
109	Improved Learning Complexity in Combinatorial Pure Exploration Bandits	Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Ronald Ortner, Peter Bartlett	We study the problem of combinatorial pure exploration in the stochastic multi-armed bandit problem.
110	Scalable Gaussian Processes for Characterizing Multidimensional Change Surfaces	William Herlands, Andrew Wilson, Hannes Nickisch, Seth Flaxman, Daniel Neill, Wilbert Van Panhuis, Eric Xing	We present a scalable Gaussian process model for identifying and characterizing smooth multidimensional changepoints, and automatically learning changes in expressive covariance structure.
111	Optimization as Estimation with Gaussian Processes in Bandit Settings	Zi Wang, Bolei Zhou, Stefanie Jegelka	We study an optimization strategy that directly uses an estimate of the argmax of the function.
112	A Convex Surrogate Operator for General Non-Modular Loss Functions	Jiaqian Yu, Matthew Blaschko	In this work, a novel generic convex surrogate for general non-modular loss functions is introduced, which provides for the first time a tractable solution for loss functions that are neither supermodular nor submodular.
113	Inference for High-dimensional Exponential Family Graphical Models	Jialei Wang, Mladen Kolar	In this paper, we propose a novel estimator for edge parameters in exponential family graphical models.
114	Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization	Changyou Chen, David Carlson, Zhe Gan, Chunyuan Li, Lawrence Carin	We explore this relationship by applying simulated annealing to an SG-MCMC algorithm.
115	Fitting Spectral Decay with the k-Support Norm	Andrew McDonald, Massimiliano Pontil, Dimitris Stamos	In this paper we generalize the norm to the spectral (k,p)-support norm, whose additional parameter p can be used to tailor the norm to the decay of the spectrum of the underlying model.
116	Early Stopping as Nonparametric Variational Inference	David Duvenaud, Dougal Maclaurin, Ryan Adams	We show that unconverged stochastic gradient descent can be interpreted as sampling from a nonparametric approximate posterior distribution.
117	Bayesian Nonparametric Kernel-Learning	Junier B. Oliva, Avinava Dubey, Andrew G. Wilson, Barnabas Poczos, Jeff Schneider, Eric P. Xing	In this paper we introduce Bayesian nonparametric kernel-learning (BaNK), a generic, data-driven framework for scalable learning of kernels.
118	Tight Variational Bounds via Random Projections and I-Projections	Lun-Kai Hsu, Tudor Achim, Stefano Ermon	To overcome this issue, we introduce a new class of random projections to reduce the dimensionality and hence the complexity of the original model.
119	Bethe Learning of Graphical Models via MAP Decoding	Kui Tang, Nicholas Ruozzi, David Belanger, Tony Jebara	We introduce MLE-Struct, a method for learning discrete exponential family models using the Bethe approximation to the partition function.
120	Determinantal Regularization for Ensemble Variable Selection	Veronika Rockova, Gemma Moran, Edward George	Motivated by non-parametric variational Bayes strategies, we move beyond this limitation by proposing an ensemble optimization approach to identify a collection of representative posterior modes.
121	Scalable and Sound Low-Rank Tensor Learning	Hao Cheng, Yaoliang Yu, Xinhua Zhang, Eric Xing, Dale Schuurmans	To address this problem, we propose directly optimizing the tensor trace norm by approximating its dual spectral norm, and we show that the approximation bounds can be efficiently converted to the original problem via the generalized conditional gradient algorithm.
122	Non-negative Matrix Factorization for Discrete Data with Hierarchical Side-Information	Changwei Hu, Piyush Rai, Lawrence Carin	We present a probabilistic framework for efficient non-negative matrix factorization of discrete (count/binary) data with side-information.
123	Topic-Based Embeddings for Learning from Large Knowledge Graphs	Changwei Hu, Piyush Rai, Lawrence Carin	We present a scalable probabilistic framework for learning from multi-relational data given in the form of entity-relation-entity triplets, with a potentially massive number of entities and relations (e.g., in multi-relational networks, knowledge bases, etc.).
124	Consistently Estimating Markov Chains with Noisy Aggregate Data	Garrett Bernstein, Daniel Sheldon	We address the problem of estimating the parameters of a time-homogeneous Markov chain given only noisy, aggregate data.
125	Unwrapping ADMM: Efficient Distributed Computing via Transpose Reduction	Tom Goldstein, Gavin Taylor, Kawika Barabin, Kent Sayre	We propose iterative methods that solve global sub-problems over an entire distributed dataset.
126	Improper Deep Kernels	Uri Heinemann, Roi Livni, Elad Eban, Gal Elidan, Amir Globerson	Here we address this difficulty by turning to "improper learning" of neural nets.
127	Unbounded Bayesian Optimization via Regularization	Bobak Shahriari, Alexandre Bouchard-Côté, Nando de Freitas	In this work, we modify the standard Bayesian optimization framework in a principled way to allow for unconstrained exploration of the search space.
128	Non-Gaussian Component Analysis with Log-Density Gradient Estimation	Hiroaki Sasaki, Gang Niu, Masashi Sugiyama	In this paper, we propose a novel NGCA algorithm based on log-density gradient estimation.
129	Online Learning with Noisy Side Observations	Tomáš Kocák, Gergely Neu, Michal Valko	We propose a new partial-observability model for online learning problems where the learner, besides its own loss, also observes some noisy feedback about the other actions, depending on the underlying structure of the problem.
130	Black-Box Policy Search with Probabilistic Programs	Jan-Willem van de Meent, Brooks Paige, David Tolpin, Frank Wood	In this work we show how to represent policies as programs: that is, as stochastic simulators with tunable parameters.
131	Efficient Bregman Projections onto the Permutahedron and Related Polytopes	Cong Han Lim, Stephen J. Wright	In summary, this work describes a fast unified approach to this well-known class of problems.
132	On Searching for Generalized Instrumental Variables	Benito Zander, Maciej Liskiewicz	We provide fast algorithms for searching and testing restricted cases of GIVs.
133	Provable Tensor Methods for Learning Mixtures of Generalized Linear Models	Hanie Sedghi, Majid Janzamin, Anima Anandkumar	In contrast, we present a tensor decomposition method which is guaranteed to correctly recover the parameters.
134	Controlling Bias in Adaptive Data Analysis Using Information Theory	Daniel Russo, James Zou	In this paper, we propose a general information-theoretic framework to quantify and provably bound the bias of an arbitrary adaptive analysis process.
135	A Column Generation Bound Minimization Approach with PAC-Bayesian Generalization Guarantees	Jean-Francis Roy, Mario Marchand, François Laviolette	In this work, we design a column generation algorithm, which we call CqBoost, that optimizes the C-bound and outputs a sparse distribution on a possibly infinite set of voters.
136	Graph Sparsification Approaches for Laplacian Smoothing	Veeru Sadhanala, Yu-Xiang Wang, Ryan Tibshirani	Given a statistical estimation problem where regularization is performed according to the structure of a large, dense graph G, we consider fitting the statistical estimate using a sparsified surrogate graph \mathbf{G}, which shares the vertices of G but has far fewer edges, and is thus more tractable to work with computationally.
137	Scalable Exemplar Clustering and Facility Location via Augmented Block Coordinate Descent with Column Generation	Ian En-Hsu Yen, Dmitry Malioutov, Abhishek Kumar	In this work, we propose an Augmented-Lagrangian with Block Coordinate Descent (AL-BCD) algorithm that utilizes problem structure to obtain a closed-form solution for each block sub-problem, and exploits a low-rank representation of the dissimilarity matrix to search active columns without computing the entire matrix.
138	Robust Covariate Shift Regression	Xiangli Chen, Mathew Monfort, Anqi Liu, Brian D. Ziebart	We propose a robust approach for regression under covariate shift that embraces the uncertainty resulting from sample selection bias by producing regression models that are explicitly robust to it.
139	On Lloyd’s Algorithm: New Theoretical Insights for Clustering in Practice	Cheng Tang, Claire Monteleoni	We provide new analyses of Lloyd’s algorithm (1982), commonly known as the k-means clustering algorithm.
140	Towards Stability and Optimality in Stochastic Gradient Descent	Panos Toulis, Dustin Tran, Edo Airoldi	To address these two issues we propose an iterative estimation procedure termed averaged implicit SGD (AI-SGD).
141	Communication Efficient Distributed Agnostic Boosting	Shang-Tse Chen, Maria-Florina Balcan, Duen Horng Chau	Our main contribution is a general distributed boosting-based procedure for learning an arbitrary concept space, that is simultaneously noise tolerant, communication efficient, and computationally efficient.
142	Private Causal Inference	Matt J. Kusner, Yu Sun, Karthik Sridharan, Kilian Q. Weinberger	We study the problem of inferring causality using the current, popular causal inference framework, the additive noise model (ANM) while simultaneously ensuring privacy of the users.
143	Parallel Markov Chain Monte Carlo via Spectral Clustering	Guillaume Basse, Aaron Smith, Natesh Pillai	In this paper, we present a parallelization scheme for Markov chain Monte Carlo (MCMC) methods based on spectral clustering of the underlying state space, generalizing earlier work on parallelization of MCMC methods by state space partitioning.
144	Efficient Sampling for k-Determinantal Point Processes	Chengtao Li, Stefanie Jegelka, Suvrit Sra	In light of this, we propose a new method for approximate sampling from discrete k-DPPs.
145	A Fast and Reliable Policy Improvement Algorithm	Yasin Abbasi-Yadkori, Peter L. Bartlett, Stephen J. Wright	We introduce a simple, efficient method that improves stochastic policies for Markov decision processes.
146	Learning Sigmoid Belief Networks via Monte Carlo Expectation Maximization	Zhao Song, Ricardo Henao, David Carlson, Lawrence Carin	We propose using an online Monte Carlo expectation-maximization (MCEM) algorithm to learn the maximum a posteriori (MAP) estimator of the generative model or optimize the variational lower bound of a recognition network.
147	Active Learning Algorithms for Graphical Model Selection	Gautam Dasarathy, Aarti Singh, Maria-Florina Balcan, Jong H. Park	We propose a general paradigm for graphical model selection where feedback is used to guide the sampling to high-degree vertices, while obtaining only a few samples from the ones with low degrees.
148	Streaming Kernel Principal Component Analysis	Mina Ghashami, Daniel J. Perry, Jeff Phillips	Kernel principal component analysis (KPCA) provides a concise set of basis vectors which capture non-linear structures within large data sets, and is a central tool in data analysis and learning.
149	Back to the Future: Radial Basis Function Networks Revisited	Qichao Que, Mikhail Belkin	In this paper we aim to revisit some of the older approaches to training the RBF networks from a more modern perspective.
150	Cut Pursuit: Fast Algorithms to Learn Piecewise Constant Functions	Loic Landrieu, Guillaume Obozinski	We propose working-set/greedy algorithms to efficiently solve problems penalized respectively by the total variation and the Mumford-Shah boundary size when the piecewise constant solution has a small number of level sets.
151	Loss Bounds and Time Complexity for Speed Priors	Daniel Filan, Jan Leike, Marcus Hutter	We propose a variant to the original speed prior (Schmidhuber, 2002), and show that our prior can predict sequences drawn from probability measures that are estimable in polynomial time.
152	NYTRO: When Subsampling Meets Early Stopping	Raffaello Camoriano, Tomás Angles, Alessandro Rudi, Lorenzo Rosasco	In this paper we ask whether early stopping and subsampling ideas can be combined in a fruitful way.
153	Randomization and The Pernicious Effects of Limited Budgets on Auction Experiments	Guillaume W. Basse, Hossein Azari Soufiani, Diane Lambert	This paper shows that if an A/B experiment affects only bids, then the observed treatment effect is an unbiased estimator when all the bidders in the same auction are randomly assigned to A or B but the observed treatment effect can be severely biased otherwise, even in the absence of throttling.
154	Spectral M-estimation with Applications to Hidden Markov Models	Dustin Tran, Minjae Kim, Finale Doshi-Velez	In this paper, we apply the framework of M-estimation to develop both a generalized method of moments procedure and a principled method for regularization.
155	Chained Gaussian Processes	Alan D. Saul, James Hensman, Aki Vehtari, Neil D. Lawrence	We develop an approximate inference procedure for Chained GPs that is scalable and applicable to any factorized likelihood.
156	Multiresolution Matrix Compression	Nedelina Teneva, Pramod Kaushik Mudrakarta, Risi Kondor	In this paper we describe pMMF, a fast parallel MMF algorithm, which can scale to n in the range of millions.
157	Supervised Neighborhoods for Distributed Nonparametric Regression	Adam Bloniarz, Ameet Talwalkar, Bin Yu, Christopher Wu	We propose a new method, SILO, for fitting prediction-time local models that uses supervised neighborhoods that adapt to the local shape of the regression surface.
158	Global Convergence of a Grassmannian Gradient Descent Algorithm for Subspace Estimation	Dejiao Zhang, Laura Balzano	In this paper, we propose an adaptive step size scheme that is greedy for the noiseless case, that maximizes the improvement of our metric of convergence at each data index t, and yields an expected improvement for the noisy case.
159	Online and Distributed Bayesian Moment Matching for Parameter Learning in Sum-Product Networks	Abdullah Rashwan, Han Zhao, Pascal Poupart	More specifically, we propose a new Bayesian moment matching (BMM) algorithm that operates naturally in an online fashion and that can be easily distributed.
160	Mondrian Forests for Large-Scale Regression when Uncertainty Matters	Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh	Through a combination of illustrative examples, real-world large-scale datasets and Bayesian optimization benchmarks, we demonstrate that Mondrian forests outperform approximate GPs on large-scale regression tasks and deliver better-calibrated uncertainty assessments than decision-forest-based methods.
161	Online (and Offline) Robust PCA: Novel Algorithms and Performance Guarantees	Jinchun Zhan, Brian Lois, Han Guo, Namrata Vaswani	In this work we develop and study a novel online robust principal components analysis (RPCA) algorithm based on the recently introduced ReProCS framework.
162	Parallel Majorization Minimization with Dynamically Restricted Domains for Nonconvex Optimization	Yan Kaganovsky, Ikenna Odinaka, David Carlson, Lawrence Carin	We propose an optimization framework for nonconvex problems based on majorization-minimization that is particularly well-suited for parallel computing.
163	Discriminative Structure Learning of Arithmetic Circuits	Amirmohammad Rooshenas, Daniel Lowd	In this paper, we present the first discriminative structure learning algorithm for ACs, DACLearn (Discriminative AC Learner).
164	One Scan 1-Bit Compressed Sensing	Ping Li	Based on α-stable random projections with small α, we develop a simple algorithm for compressed sensing (sparse signal recovery) by utilizing only the signs (i.e., 1-bit) of the measurements.