Paper Digest: ICML 2015 Highlights
The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. In 2015, it was held in Lille, France.
To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to browse these machine-generated highlights / summaries to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free paper digest service to get new paper updates, customized to your own interests, on a daily basis.
Paper Digest Team
team@paperdigest.org
TABLE 1: ICML 2015 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Stochastic Optimization with Importance Sampling for Regularized Loss Minimization | Peilin Zhao, Tong Zhang | In this paper we study stochastic optimization, including prox-SMD and prox-SDCA, with importance sampling, which improves the convergence rate by reducing the stochastic variance. (An illustrative sketch of importance-sampled SGD follows the table.) |
2 | Approval Voting and Incentives in Crowdsourcing | Nihar Shah, Dengyong Zhou, Yuval Peres | In this paper, we address these issues by introducing approval voting to utilize the expertise of workers who have partial knowledge of the true answer, and coupling it with a (“strictly proper”) incentive-compatible compensation mechanism. |
3 | A low variance consistent test of relative dependency | Wacha Bounliphone, Arthur Gretton, Arthur Tenenhaus, Matthew Blaschko | We describe a novel non-parametric statistical hypothesis test of relative dependence between a source variable and two candidate target variables. |
4 | An Aligned Subtree Kernel for Weighted Graphs | Lu Bai, Luca Rossi, Zhihong Zhang, Edwin Hancock | In this paper, we develop a new entropic matching kernel for weighted graphs by aligning depth-based representations. |
5 | Spectral Clustering via the Power Method – Provably | Christos Boutsidis, Prabhanjan Kambadur, Alex Gittens | Specifically, we prove that solving the k-means clustering problem on the approximate eigenvectors obtained via the power method gives an additive-error approximation to solving the k-means problem on the optimal eigenvectors. (A sketch of this power-method pipeline follows the table.) |
6 | Information Geometry and Minimum Description Length Networks | Ke Sun, Jun Wang, Alexandros Kalousis, Stephan Marchand-Maillet | We present a geometric picture, where all these representations are regarded as free points in the space of probability distributions. |
7 | Efficient Training of LDA on a GPU by Mean-for-Mode Estimation | Jean-Baptiste Tristan, Joseph Tassarotti, Guy Steele | We introduce Mean-for-Mode estimation, a variant of an uncollapsed Gibbs sampler that we use to train LDA on a GPU. |
8 | Adaptive Stochastic Alternating Direction Method of Multipliers | Peilin Zhao, Jinwei Yang, Tong Zhang, Ping Li | In this paper, we present a new family of stochastic ADMM algorithms with optimal 2nd-order proximal functions, which produce a new family of adaptive stochastic ADMM methods. |
9 | A Lower Bound for the Optimization of Finite Sums | Alekh Agarwal, Leon Bottou | This paper presents a lower bound for optimizing a finite sum of n functions, where each function is L-smooth and the sum is μ-strongly convex. |
10 | Learning Word Representations with Hierarchical Sparse Coding | Dani Yogatama, Manaal Faruqui, Chris Dyer, Noah Smith | We propose a new method for learning word representations using hierarchical regularization in sparse coding inspired by the linguistic study of word meanings. |
11 | Learning Transferable Features with Deep Adaptation Networks | Mingsheng Long, Yue Cao, Jianmin Wang, Michael Jordan | In this paper, we propose a new Deep Adaptation Network (DAN) architecture, which generalizes deep convolutional neural network to the domain adaptation scenario. |
12 | Robust partially observable Markov decision process | Takayuki Osogami | Based on the convexity, we design a value-iteration algorithm for finding the robust policy. |
13 | On the Relationship between Sum-Product Networks and Bayesian Networks | Han Zhao, Mazen Melibari, Pascal Poupart | In this paper, we establish some theoretical connections between Sum-Product Networks (SPNs) and Bayesian Networks (BNs). |
14 | Learning from Corrupted Binary Labels via Class-Probability Estimation | Aditya Menon, Brendan Van Rooyen, Cheng Soon Ong, Bob Williamson | This paper uses class-probability estimation to study these and other corruption processes belonging to the mutually contaminated distributions framework (Scott et al., 2013), with three conclusions. |
15 | An Explicit Sampling Dependent Spectral Error Bound for Column Subset Selection | Tianbao Yang, Lijun Zhang, Rong Jin, Shenghuo Zhu | In this paper, we consider the problem of column subset selection. |
16 | A Stochastic PCA and SVD Algorithm with an Exponential Convergence Rate | Ohad Shamir | We describe and analyze a simple algorithm for principal component analysis and singular value decomposition, VR-PCA, which uses computationally cheap stochastic iterations, yet converges exponentially fast to the optimal solution. |
17 | Attribute Efficient Linear Regression with Distribution-Dependent Sampling | Doron Kukliansky, Ohad Shamir | We develop efficient algorithms for Ridge and Lasso linear regression, which utilize the geometry of the data by a novel distribution-dependent sampling scheme, and have excess risk bounds that are better by a factor of up to O(d/k) than the state-of-the-art, where d is the dimension and k+1 is the number of observed attributes per example. |
18 | Learning Local Invariant Mahalanobis Distances | Ethan Fetaya, Shimon Ullman | In this paper we propose a novel and computationally efficient way to learn a local Mahalanobis metric per datum, and show how we can learn a local invariant metric to any transformation in order to improve performance. |
19 | Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis | Zhuang Ma, Yichao Lu, Dean Foster | In this paper, we tackle the problem of large scale CCA, where classical algorithms, usually requiring computing the product of two huge matrices and huge matrix decomposition, are computationally and storage expensive. |
20 | Abstraction Selection in Model-based Reinforcement Learning | Nan Jiang, Alex Kulesza, Satinder Singh | Existing approaches have theoretical guarantees only under strong assumptions on the domain or asymptotically large amounts of data, but in this paper we propose a simple algorithm based on statistical hypothesis testing that comes with a finite-sample guarantee under assumptions on candidate abstractions. |
21 | Surrogate Functions for Maximizing Precision at the Top | Purushottam Kar, Harikrishna Narasimhan, Prateek Jain | In this paper we make key contributions in these directions. |
22 | Optimizing Non-decomposable Performance Measures: A Tale of Two Classes | Harikrishna Narasimhan, Purushottam Kar, Prateek Jain | In this paper we reveal that for two large families of performance measures that can be expressed as functions of true positive/negative rates, it is indeed possible to implement point stochastic updates. |
23 | Coresets for Nonparametric Estimation – the Case of DP-Means | Olivier Bachem, Mario Lucic, Andreas Krause | We explore the use of coresets – a data summarization technique originating from computational geometry – for this task. |
24 | A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits | Pratik Gajane, Tanguy Urvoy, Fabrice Clérot | We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. |
25 | Functional Subspace Clustering with Application to Time Series | Mohammad Taha Bahadori, David Kale, Yingying Fan, Yan Liu | To address these challenges, we propose a new framework called Functional Subspace Clustering (FSC). |
26 | Accelerated Online Low Rank Tensor Learning for Multivariate Spatiotemporal Streams | Rose Yu, Dehua Cheng, Yan Liu | In this paper, we propose an online accelerated low-rank tensor learning algorithm (ALTO) to solve the problem. |
27 | Atomic Spatial Processes | Sean Jewell, Neil Spencer, Alexandre Bouchard-Côté | We employ techniques from Bayesian non-parametric statistics to develop a process which captures a common characteristic of urban spatial datasets. |
28 | Classification with Low Rank and Missing Data | Elad Hazan, Roi Livni, Yishay Mansour | Nevertheless, using a non-proper formulation we give an efficient agnostic algorithm that classifies as well as the best linear classifier coupled with the best low-dimensional subspace in which the data resides. |
29 | Dynamic Sensing: Better Classification under Acquisition Constraints | Oran Richman, Shie Mannor | In this paper we propose to actively allocate resources to each sample such that resources are used optimally overall. |
30 | A Modified Orthant-Wise Limited Memory Quasi-Newton Method with Convergence Analysis | Pinghua Gong, Jieping Ye | In this paper, we propose a modified Orthant-Wise Limited memory Quasi-Newton (mOWL-QN) algorithm by slightly modifying the OWL-QN algorithm. |
31 | Telling cause from effect in deterministic linear dynamical systems | Naji Shajarisales, Dominik Janzing, Bernhard Schoelkopf, Michel Besserve | Assuming the effect is generated by the cause through a linear system, we propose a new approach based on the hypothesis that nature chooses the “cause” and the “mechanism generating the effect from the cause” independently of each other. |
32 | High Dimensional Bayesian Optimisation and Bandits via Additive Models | Kirthevasan Kandasamy, Jeff Schneider, Barnabas Poczos | In this paper, we identify two key challenges in this endeavour. |
33 | Theory of Dual-sparse Regularized Randomized Reduction | Tianbao Yang, Lijun Zhang, Rong Jin, Shenghuo Zhu | In this paper, we study randomized reduction methods, which reduce high-dimensional features into low-dimensional space by randomized methods (e.g., random projection, random hashing), for large-scale high-dimensional classification. |
34 | Generalization error bounds for learning to rank: Does the length of document lists matter? | Ambuj Tewari, Sougata Chaudhuri | We consider the generalization ability of algorithms for learning to rank at a query level, a problem also called subset ranking. |
35 | PeakSeg: constrained optimal segmentation and supervised penalty learning for peak detection in count data | Toby Hocking, Guillem Rigaill, Guillaume Bourque | We propose PeakSeg, a new constrained maximum likelihood segmentation model for peak detection with an efficient inference algorithm: constrained dynamic programming. |
36 | Mind the duality gap: safer rules for the Lasso | Olivier Fercoq, Alexandre Gramfort, Joseph Salmon | In this paper, we propose new versions of the so-called *safe* rules for the Lasso. |
37 | A General Analysis of the Convergence of ADMM | Robert Nishihara, Laurent Lessard, Ben Recht, Andrew Packard, Michael Jordan | We provide a new proof of the linear convergence of the alternating direction method of multipliers (ADMM) when one of the objective terms is strongly convex. |
38 | Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization | Yuchen Zhang, Xiao Lin | We propose a stochastic primal-dual coordinate method, which alternates between maximizing over one (or more) randomly chosen dual variable and minimizing over the primal variable. |
39 | DiSCO: Distributed Optimization for Self-Concordant Empirical Loss | Yuchen Zhang, Xiao Lin | We propose a new distributed algorithm for empirical risk minimization in machine learning. |
40 | Spectral MLE: Top-K Rank Aggregation from Pairwise Comparisons | Yuxin Chen, Changho Suh | To approach this minimax limit, we propose a nearly linear-time ranking scheme, called Spectral MLE, that returns the indices of the top-K items in accordance to a careful score estimate. |
41 | Paired-Dual Learning for Fast Training of Latent Variable Hinge-Loss MRFs | Stephen Bach, Bert Huang, Jordan Boyd-Graber, Lise Getoor | We introduce paired-dual learning, a framework that greatly speeds up training by using tractable entropy surrogates and avoiding repeated inferences. |
42 | Structural Maxent Models | Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Umar Syed | We present a new class of density estimation models, Structural Maxent models, with feature functions selected from possibly very complex families. |
43 | A Provable Generalized Tensor Spectral Method for Uniform Hypergraph Partitioning | Debarghya Ghoshdastidar, Ambedkar Dukkipati | In this paper, we develop a unified approach for partitioning uniform hypergraphs by means of a tensor trace optimization problem involving the affinity tensor, and a number of existing higher-order methods turn out to be special cases of the proposed formulation. |
44 | The Benefits of Learning with Strongly Convex Approximate Inference | Ben London, Bert Huang, Lise Getoor | Our insights for the latter suggest a novel counting number optimization framework, which guarantees strong convexity for any given modulus. |
45 | Pushing the Limits of Affine Rank Minimization by Adapting Probabilistic PCA | Bo Xin, David Wipf | Against this backdrop we derive a deceptively simple and parameter-free probabilistic PCA-like algorithm that is capable, over a wide battery of empirical tests, of successful recovery even at the theoretical limit where the number of measurements equals the degrees of freedom in the unknown low-rank matrix. |
46 | Budget Allocation Problem with Multiple Advertisers: A Game Theoretic View | Takanori Maehara, Akihiro Yabe, Ken-ichi Kawarabayashi | By extending the budget allocation problem with a bipartite influence model, we propose a game-theoretic model problem that considers many advertisers. |
47 | Tracking Approximate Solutions of Parameterized Optimization Problems over Multi-Dimensional (Hyper-)Parameter Domains | Katharina Blechschmidt, Joachim Giesen, Soeren Laue | Many machine learning methods are given as parameterized optimization problems. |
48 | Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift | Sergey Ioffe, Christian Szegedy | We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. (A minimal forward-pass sketch follows the table.) |
49 | Distributed Estimation of Generalized Matrix Rank: Efficient Algorithms and Lower Bounds | Yuchen Zhang, Martin Wainwright, Michael Jordan | In contrast, we propose a randomized algorithm that communicates only O(n) bits. |
50 | Landmarking Manifolds with Gaussian Processes | Dawen Liang, John Paisley | We present an algorithm for finding landmarks along a manifold. |
51 | Markov Mixed Membership Models | Aonan Zhang, John Paisley | We present a Markov mixed membership model (Markov M3) for grouped data that learns a fully connected graph structure among mixing components. |
52 | A Unified Framework for Outlier-Robust PCA-like Algorithms | Wenzhuo Yang, Huan Xu | We propose a unified framework for making a wide range of PCA-like algorithms – including the standard PCA, sparse PCA and non-negative sparse PCA, etc. – robust when facing a constant fraction of arbitrarily corrupted outliers. |
53 | Streaming Sparse Principal Component Analysis | Wenzhuo Yang, Huan Xu | We develop and analyze two memory and computational efficient algorithms called streaming sparse PCA and streaming sparse ECA for analyzing data generated according to the spike model and the elliptical model respectively. |
54 | A Divide and Conquer Framework for Distributed Graph Clustering | Wenzhuo Yang, Huan Xu | In order to improve the scalability of existing graph clustering algorithms, we propose a novel divide and conquer framework for graph clustering, and establish theoretical guarantees of exact recovery of the clusters. |
55 | How Can Deep Rectifier Networks Achieve Linear Separability and Preserve Distances? | Senjian An, Farid Boussaid, Mohammed Bennamoun | This paper investigates how hidden layers of deep rectifier networks are capable of transforming two or more pattern sets to be linearly separable while preserving the distances with a guaranteed degree, and proves the universal classification power of such distance preserving rectifier networks. |
56 | Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning | K. Lakshmanan, Ronald Ortner, Daniil Ryabko | We consider the problem of undiscounted reinforcement learning in continuous state space. |
57 | The Fundamental Incompatibility of Scalable Hamiltonian Monte Carlo and Naive Data Subsampling | Michael Betancourt | In this paper I demonstrate how data subsampling fundamentally compromises the scalability of Hamiltonian Monte Carlo. |
58 | Faster Rates for the Frank-Wolfe Method over Strongly-Convex Sets | Dan Garber, Elad Hazan | In this paper we consider the special case of optimization over strongly convex sets, for which we prove that the vanilla FW method converges at a rate of 1/t^2. |
59 | Ordered Stick-Breaking Prior for Sequential MCMC Inference of Bayesian Nonparametric Models | Mrinal Das, Trapit Bansal, Chiranjib Bhattacharyya | One of the major contributions of this paper is SUMO, an MCMC algorithm, for solving the inference problem arising from applying OSBP to BNP models. |
60 | Online Learning of Eigenvectors | Dan Garber, Elad Hazan, Tengyu Ma | In this paper we present new algorithms that avoid both issues. |
61 | A Unifying Framework of Anytime Sparse Gaussian Process Regression Models with Stochastic Variational Inference for Big Data | Trong Nghia Hoang, Quang Minh Hoang, Bryan Kian Hsiang Low | This paper presents a novel unifying framework of anytime sparse Gaussian process regression (SGPR) models that can produce good predictive performance fast and improve their predictive performance over time. |
62 | Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup | Yufei Ding, Yue Zhao, Xipeng Shen, Madanlal Musuvathi, Todd Mytkowicz | This paper presents Yinyang K-means, a new algorithm for K-means clustering. |
63 | Ordinal Mixed Membership Models | Seppo Virtanen, Mark Girolami | In this work, by way of illustration, we apply the models to a collection of consumer-generated reviews of mobile software applications, where each review contains unstructured text data accompanied with an ordinal rating, and demonstrate that the models infer useful and meaningful recurring patterns of consumer feedback. |
64 | Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network | Seunghoon Hong, Tackgeun You, Suha Kwak, Bohyung Han | We propose an online visual tracking algorithm by learning discriminative saliency map using Convolutional Neural Network (CNN). |
65 | Fast Kronecker Inference in Gaussian Processes with non-Gaussian Likelihoods | Seth Flaxman, Andrew Wilson, Daniel Neill, Hannes Nickisch, Alex Smola | We propose new scalable Kronecker methods for Gaussian processes with non-Gaussian likelihoods, using a Laplace approximation which involves linear conjugate gradients for inference, and a lower bound on the GP marginal likelihood for kernel learning. |
66 | Statistical and Algorithmic Perspectives on Randomized Sketching for Ordinary Least-Squares | Garvesh Raskutti, Michael Mahoney | In this paper, we provide a rigorous comparison of both perspectives leading to insights on how they differ. |
67 | On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence | Nathaniel Korda, Prashanth La | We provide non-asymptotic bounds for the well-known temporal difference learning algorithm TD(0) with linear function approximators. |
68 | Learning Parametric-Output HMMs with Two Aliased States | Roi Weiss, Boaz Nadler | In this paper we focus on parametric-output HMMs, whose output distributions come from a parametric family, and that have exactly two aliased states. |
69 | Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data | Yarin Gal, Yutian Chen, Zoubin Ghahramani | Building on these ideas we propose a Bayesian model for the unsupervised task of distribution estimation of multivariate categorical data. |
70 | Improving the Gaussian Process Sparse Spectrum Approximation by Representing Uncertainty in Frequency Inputs | Yarin Gal, Richard Turner | We model the covariance function with a finite Fourier series approximation and treat it as a random variable. |
71 | Ranking from Stochastic Pairwise Preferences: Recovering Condorcet Winners and Tournament Solution Sets at the Top | Arun Rajkumar, Suprovat Ghoshal, Lek-Heng Lim, Shivani Agarwal | In this paper, we consider settings where pairwise preferences can contain cycles. |
72 | Stochastic Dual Coordinate Ascent with Adaptive Probabilities | Dominik Csiba, Zheng Qu, Peter Richtarik | This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving the regularized empirical risk minimization problems. |
73 | Vector-Space Markov Random Fields via Exponential Families | Wesley Tansey, Oscar Hernan Madrid Padilla, Arun Sai Suggala, Pradeep Ravikumar | We present Vector-Space Markov Random Fields (VS-MRFs), a novel class of undirected graphical models where each variable can belong to an arbitrary vector space. |
74 | JUMP-Means: Small-Variance Asymptotics for Markov Jump Processes | Jonathan Huggins, Karthik Narasimhan, Ardavan Saeedi, Vikash Mansinghka | We propose algorithms for each of these formulations, which we call *JUMP-means*. |
75 | Low Rank Approximation using Error Correcting Coding Matrices | Shashanka Ubaru, Arya Mazumdar, Yousef Saad | In this paper, we show how matrices from error correcting codes can be used to find such low rank approximations. |
76 | Off-policy Model-based Learning under Unknown Factored Dynamics | Assaf Hallak, Francois Schnitzler, Timothy Mann, Shie Mannor | To answer this question, we introduce the G-SCOPE algorithm that evaluates a new policy based on data generated by the existing policy. |
77 | Log-Euclidean Metric Learning on Symmetric Positive Definite Manifold with Application to Image Set Classification | Zhiwu Huang, Ruiping Wang, Shiguang Shan, Xianqiu Li, Xilin Chen | To overcome this limitation, we propose a novel metric learning approach to work directly on logarithms of SPD matrices. |
78 | Asymmetric Transfer Learning with Deep Gaussian Processes | Melih Kandemir | We introduce a novel Gaussian process based Bayesian model for asymmetric transfer learning. |
79 | Towards a Lower Sample Complexity for Robust One-bit Compressed Sensing | Rongda Zhu, Quanquan Gu | In this paper, we propose a novel algorithm based on nonconvex sparsity-inducing penalty for one-bit compressed sensing. |
80 | BilBOWA: Fast Bilingual Distributed Representations without Word Alignments | Stephan Gouws, Yoshua Bengio, Greg Corrado | We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data. |
81 | Multi-view Sparse Co-clustering via Proximal Alternating Linearized Minimization | Jiangwen Sun, Jin Lu, Tingyang Xu, Jinbo Bi | We propose a proximal alternating linearized minimization algorithm that simultaneously decomposes multiple data matrices into sparse row and columns vectors. |
82 | Cascading Bandits: Learning to Rank in the Cascade Model | Branislav Kveton, Csaba Szepesvari, Zheng Wen, Azin Ashkan | In this paper, we propose cascading bandits, a learning variant of the cascade model where the objective is to identify K most attractive items. |
83 | Latent Topic Networks: A Versatile Probabilistic Programming Framework for Topic Models | James Foulds, Shachi Kumar, Lise Getoor | In this paper we introduce latent topic networks, a flexible class of richly structured topic models designed to facilitate applied research. |
84 | Random Coordinate Descent Methods for Minimizing Decomposable Submodular Functions | Alina Ene, Huy Nguyen | In this paper, we use random coordinate descent methods to obtain algorithms with faster *linear* convergence rates and cheaper iteration costs. |
85 | Alpha-Beta Divergences Discover Micro and Macro Structures in Data | Karthik Narayan, Ali Punjani, Pieter Abbeel | We study this relationship, theoretically and through an empirical analysis over 10 datasets. |
86 | Fictitious Self-Play in Extensive-Form Games | Johannes Heinrich, Marc Lanctot, David Silver | This paper introduces two variants of fictitious play that are implemented in behavioural strategies of an extensive-form game. |
87 | Counterfactual Risk Minimization: Learning from Logged Bandit Feedback | Adith Swaminathan, Thorsten Joachims | We present a decomposition of the POEM objective that enables efficient stochastic gradient optimization. |
88 | The Hedge Algorithm on a Continuum | Walid Krichene, Maximilian Balandat, Claire Tomlin, Alexandre Bayen | We consider an online optimization problem on a subset S of R^n (not necessarily convex), in which a decision maker chooses, at each iteration t, a probability distribution x^(t) over S, and seeks to minimize a cumulative expected loss, where each loss is a Lipschitz function revealed at the end of iteration t. Building on previous work, we propose a generalized Hedge algorithm and show a O(√(t log t)) bound on the regret when the losses are uniformly Lipschitz and S is uniformly fat (a weaker condition than convexity). |
89 | A Linear Dynamical System Model for Text | David Belanger, Sham Kakade | Our learning algorithm is extremely scalable, operating on simple co-occurrence counts for both parameter initialization using the method of moments and subsequent iterations of EM. |
90 | Unsupervised Learning of Video Representations using LSTMs | Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov | We use Long Short Term Memory (LSTM) networks to learn representations of video sequences. |
91 | Message Passing for Collective Graphical Models | Tao Sun, Dan Sheldon, Akshat Kumar | Collective graphical models (CGMs) are a formalism for inference and learning about a population of independent and identically distributed individuals when only noisy aggregate data are available. |
92 | DP-space: Bayesian Nonparametric Subspace Clustering with Small-variance Asymptotics | Yining Wang, Jun Zhu | This paper presents a novel nonparametric Bayesian subspace clustering model that infers both the number of subspaces and the dimension of each subspace from the observed data. |
93 | HawkesTopic: A Joint Model for Network Inference and Topic Modeling from Text-Based Cascades | Xinran He, Theodoros Rekatsinas, James Foulds, Lise Getoor, Yan Liu | In this work, we develop the HawkesTopic model (HTM) for analyzing text-based cascades, such as “retweeting a post” or “publishing a follow-up blog post”. |
94 | MADE: Masked Autoencoder for Distribution Estimation | Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle | We introduce a simple modification for autoencoder neural networks that yields powerful generative models. |
95 | An Online Learning Algorithm for Bilinear Models | Yuanbin Wu, Shiliang Sun | A new online learning algorithm is proposed to train the model parameters. |
96 | Adaptive Belief Propagation | Georgios Papachristoudis, John Fisher | Graphical models are widely used in inference problems. |
97 | Large-scale log-determinant computation through stochastic Chebyshev expansions | Insu Han, Dmitry Malioutov, Jinwoo Shin | We propose a linear-time randomized algorithm to approximate log-determinants for very large-scale positive definite and general non-singular matrices using a stochastic trace approximation, called the Hutchinson method, coupled with Chebyshev polynomial expansions that both rely on efficient matrix-vector multiplications. |
98 | Differentially Private Bayesian Optimization | Matt Kusner, Jacob Gardner, Roman Garnett, Kilian Weinberger | To address this, we introduce methods for releasing the best hyper-parameters and classifier accuracy privately. |
99 | A Nearly-Linear Time Framework for Graph-Structured Sparsity | Chinmay Hegde, Piotr Indyk, Ludwig Schmidt | We introduce a framework for sparsity structures defined via graphs. |
100 | Support Matrix Machines | Luo Luo, Yubo Xie, Zhihua Zhang, Wu-Jun Li | To leverage this kind of structure information, we propose a new classification method that we call support matrix machine (SMM). |
101 | Rademacher Observations, Private Data, and Boosting | Richard Nock, Giorgio Patrini, Arik Friedman | We provide a learning algorithm over rados with boosting-compliant convergence rates on the *logistic* loss (computed over examples). |
102 | From Word Embeddings To Document Distances | Matt Kusner, Yu Sun, Nicholas Kolkin, Kilian Weinberger | We present the Word Mover’s Distance (WMD), a novel distance function between text documents. (A sketch of the paper’s relaxed lower bound follows the table.) |
103 | Bayesian and Empirical Bayesian Forests | Matt Taddy, Chun-Sheng Chen, Jun Yu, Mitch Wyle | We derive ensembles of decision trees through a nonparametric Bayesian model, allowing us to view such ensembles as samples from a posterior distribution. |
104 | Inferring Graphs from Cascades: A Sparse Recovery Framework | Jean Pouget-Abadie, Thibaut Horel | In this paper, we approach this problem from the sparse recovery perspective. |
105 | Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM | Ching-Pei Lee, Dan Roth | This paper proposes an efficient box-constrained quadratic optimization algorithm for distributedly training linear support vector machines (SVMs) with large data. |
106 | Safe Exploration for Optimization with Gaussian Processes | Yanan Sui, Alkis Gotovos, Joel Burdick, Andreas Krause | We consider sequential decision problems under uncertainty, where we seek to optimize an unknown function from noisy samples. |
107 | The Ladder: A Reliable Leaderboard for Machine Learning Competitions | Avrim Blum, Moritz Hardt | We introduce a natural algorithm called the Ladder and demonstrate that it simultaneously supports strong theoretical guarantees in a fully adaptive model of estimation, withstands practical adversarial attacks, and achieves high utility on real submission files from a Kaggle competition. |
108 | Enabling scalable stochastic gradient-based inference for Gaussian processes by employing the Unbiased LInear System SolvEr (ULISSE) | Maurizio Filippone, Raphael Engler | This paper proposes an adaptation of the Stochastic Gradient Langevin Dynamics algorithm to draw samples from the posterior distribution over covariance parameters with negligible bias and without the need to compute the marginal likelihood. |
109 | Finding Galaxies in the Shadows of Quasars with Gaussian Processes | Roman Garnett, Shirley Ho, Jeff Schneider | We develop an automated technique for detecting damped Lyman-α absorbers (DLAs) along spectroscopic sightlines to quasi-stellar objects (QSOs or quasars). |
110 | Following the Perturbed Leader for Online Structured Learning | Alon Cohen, Tamir Hazan | To better understand FTPL algorithms for online structured learning, we present a lower bound on the regret for a large and natural class of FTPL algorithms that use logconcave perturbations. |
111 | Reified Context Models | Jacob Steinhardt, Percy Liang | In this work, we introduce a new approach, reified context models, to reconcile this tension. |
112 | Large-Scale Markov Decision Problems with KL Control Cost and its Application to Crowdsourcing | Yasin Abbasi-Yadkori, Peter Bartlett, Xi Chen, Alan Malek | We study average and total cost Markov decision problems with large state spaces. |
113 | Learning Fast-Mixing Models for Structured Prediction | Jacob Steinhardt, Percy Liang | Markov Chain Monte Carlo (MCMC) algorithms are often used for approximate inference inside learning, but their slow mixing can be difficult to diagnose and the resulting approximate gradients can seriously degrade learning. |
114 | A Probabilistic Model for Dirty Multi-task Feature Selection | Daniel Hernandez-Lobato, Jose Miguel Hernandez-Lobato, Zoubin Ghahramani | To account for this, we propose a model for multi-task feature selection based on a robust prior distribution that introduces a set of binary latent variables to identify outlier tasks and outlier features. |
115 | On Deep Multi-View Representation Learning | Weiran Wang, Raman Arora, Karen Livescu, Jeff Bilmes | Previous work on this problem has proposed several techniques based on deep neural networks, typically involving either autoencoder-like networks with a reconstruction objective or paired feedforward networks with a correlation-based objective. |
116 | Learning Program Embeddings to Propagate Feedback on Student Code | Chris Piech, Jonathan Huang, Andy Nguyen, Mike Phulsuksombati, Mehran Sahami, Leonidas Guibas | We introduce a neural network method to encode programs as a linear mapping from an embedded precondition space to an embedded postcondition space and propose an algorithm for feedback at scale using these linear maps as features. |
117 | Safe Subspace Screening for Nuclear Norm Regularized Least Squares Problems | Qiang Zhou, Qi Zhao | In this work, we propose a novel method called safe subspace screening (SSS), to improve the efficiency of the solver for nuclear norm regularized least squares problems. |
118 | Efficient Learning in Large-Scale Combinatorial Semi-Bandits | Zheng Wen, Branislav Kveton, Azin Ashkan | In this paper, we consider efficient learning in large-scale combinatorial semi-bandits with linear generalization, and as a solution, propose two learning algorithms called Combinatorial Linear Thompson Sampling (CombLinTS) and Combinatorial Linear UCB (CombLinUCB). |
119 | Swept Approximate Message Passing for Sparse Estimation | Andre Manoel, Florent Krzakala, Eric Tramel, Lenka Zdeborová | We propose a new approach to stabilizing AMP in these contexts by applying AMP updates to individual coefficients rather than in parallel. |
120 | Simple regret for infinitely many armed bandits | Alexandra Carpentier, Michal Valko | In this paper, we propose an algorithm aiming at minimizing the simple regret. |
121 | Exponential Integration for Hamiltonian Monte Carlo | Wei-Lun Chao, Justin Solomon, Dominik Michels, Fei Sha | We consider various ways to derive Gaussian approximations and conduct extensive empirical studies applying the proposed “exponential HMC” to several benchmarked learning problems. |
122 | Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays | Junpei Komiyama, Junya Honda, Hiroshi Nakagawa | In this paper, we propose the multiple-play Thompson sampling (MP-TS) algorithm, an extension of TS to the multiple-play MAB problem, and discuss its regret analysis. (A toy simulation sketch follows the table.) |
123 | Faster cover trees | Mike Izbicki, Christian Shelton | This paper makes cover trees even faster. |
124 | Blitz: A Principled Meta-Algorithm for Scaling Sparse Optimization | Tyler Johnson, Carlos Guestrin | We propose Blitz, a fast working set algorithm accompanied by useful guarantees. |
125 | Unsupervised Domain Adaptation by Backpropagation | Yaroslav Ganin, Victor Lempitsky | Here, we propose a new approach to domain adaptation in deep architectures that can be trained on large amount of labeled data from the source domain and large amount of unlabeled data from the target domain (no labeled target-domain data is necessary). |
126 | Non-Linear Cross-Domain Collaborative Filtering via Hyper-Structure Transfer | Yan-Fu Liu, Cheng-Yu Hsu, Shan-Hung Wu | In this paper, we propose the notion of Hyper-Structure Transfer (HST) that requires the rating matrices to be explained by the projections of some more complex structure, called the hyper-structure, shared by all domains, and thus allows the non-linearly correlated knowledge between domains to be identified and transferred. |
127 | Manifold-valued Dirichlet Processes | Hyunwoo Kim, Jia Xu, Baba Vemuri, Vikas Singh | To address this ’locality’ problem, we propose a novel nonparametric model which unifies multivariate general linear models (MGLMs) using multiple tangent spaces. |
128 | Multi-Task Learning for Subspace Segmentation | Yu Wang, David Wipf, Qing Ling, Wei Chen, Ian Wassell | Multi-Task Learning for Subspace Segmentation |
129 | Markov Chain Monte Carlo and Variational Inference: Bridging the Gap | Tim Salimans, Diederik Kingma, Max Welling | We describe the theoretical foundations that make this possible and show some promising first results. |
130 | Scalable Model Selection for Large-Scale Factorial Relational Models | Chunchen Liu, Lu Feng, Ryohei Fujimaki, Yusuke Muraoka | For scalable model selection of BMFs, this paper proposes stochastic factorized asymptotic Bayesian (sFAB) inference that combines concepts in two recently-developed techniques: stochastic variational inference (SVI) and FAB inference. |
131 | The Power of Randomization: Distributed Submodular Maximization on Massive Datasets | Rafael Barbosa, Alina Ene, Huy Nguyen, Justin Ward | We consider a distributed, greedy algorithm that combines previous approaches with randomization. |
132 | Dealing with small data: On the generalization of context trees | Ralf Eggeling, Mikko Koivisto, Ivo Grosse | In this work, we investigate to which degree CTs can be generalized to increase statistical efficiency while still keeping the learning computationally feasible. |
133 | Non-Gaussian Discriminative Factor Models via the Max-Margin Rank-Likelihood | Xin Yuan, Ricardo Henao, Ephraim Tsalik, Raymond Langley, Lawrence Carin | A Bayesian model based on the ranks of the data is proposed. |
134 | A Bayesian nonparametric procedure for comparing algorithms | Alessio Benavoli, Giorgio Corani, Francesca Mangili, Marco Zaffalon | We show by simulation that our approach is competitive both in terms of accuracy and speed in identifying the best algorithm. |
135 | Convergence rate of Bayesian tensor estimator and its minimax optimality | Taiji Suzuki | We investigate the statistical convergence rate of a Bayesian low-rank tensor estimator, and derive the minimax optimal rate for learning a low-rank tensor. |
136 | On Identifying Good Options under Combinatorially Structured Feedback in Finite Noisy Environments | Yifan Wu, Andras Gyorgy, Csaba Szepesvari | We consider the problem of identifying a good option out of finite set of options under combinatorially structured, noisy feedback about the quality of the options in a sequential process: In each round, a subset of the options, from an available set of subsets, can be selected to receive noisy information about the quality of the options in the chosen subset. |
137 | Nested Sequential Monte Carlo Methods | Christian Naesseth, Fredrik Lindsten, Thomas Schon | We propose nested sequential Monte Carlo (NSMC), a methodology to sample from sequences of probability distributions, even where the random variables are high-dimensional. |
138 | Sparse Variational Inference for Generalized GP Models | Rishit Sheth, Yuyang Wang, Roni Khardon | This paper develops a variational sparse solution for GPs under general likelihoods by providing a new characterization of the gradients required for inference in terms of individual observation likelihood terms. |
139 | Universal Value Function Approximators | Tom Schaul, Daniel Horgan, Karol Gregor, David Silver | In this paper we introduce universal value function approximators (UVFAs) V(s,g;θ) that generalise not just over states s but also over goals g. |
140 | Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games | Julien Perolat, Bruno Scherrer, Bilal Piot, Olivier Pietquin | This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. |
141 | On Greedy Maximization of Entropy | Dravyansh Sharma, Ashish Kapoor, Amit Deshpande | The main goal of this paper is to explore and answer why the greedy selection does significantly better than the theoretical guarantee of (1 – 1/e). |
142 | Metadata Dependent Mondrian Processes | Yi Wang, Bin Li, Yang Wang, Fang Chen | In this paper, we propose a metadata dependent Mondrian process (MDMP) to incorporate meta information into the stochastic partition process in the product space and the entity allocation process on the resulting block structure. |
143 | Complex Event Detection using Semantic Saliency and Nearly-Isotonic SVM | Xiaojun Chang, Yi Yang, Eric Xing, Yaoliang Yu | We aim to detect complex events in long Internet videos that may last for hours. |
144 | Rebuilding Factorized Information Criterion: Asymptotically Accurate Marginal Likelihood | Kohei Hayashi, Shin-ichi Maeda, Ryohei Fujimaki | Factorized information criterion (FIC) is a recently developed approximation technique for the marginal log-likelihood, which provides an automatic model selection framework for a few latent variable models (LVMs) with tractable inference algorithms. |
145 | Double Nyström Method: An Efficient and Accurate Nyström Scheme for Large-Scale Data Sets | Woosang Lim, Minhwan Kim, Haesun Park, Kyomin Jung | In this paper, we present a novel Nyström method that improves both accuracy and efficiency based on a new theoretical analysis. |
146 | The Composition Theorem for Differential Privacy | Peter Kairouz, Sewoong Oh, Pramod Viswanath | In this paper we answer the fundamental question of characterizing the level of privacy degradation as a function of the number of adaptive interactions and the differential privacy levels maintained by the individual queries. |
147 | Convex Formulation for Learning from Positive and Unlabeled Data | Marthinus Du Plessis, Gang Niu, Masashi Sugiyama | In this paper, we discuss a convex formulation for PU classification that can still cancel the bias. |
148 | Threshold Influence Model for Allocating Advertising Budgets | Atsushi Miyauchi, Yuni Iwamasa, Takuro Fukunaga, Naonori Kakimura | We propose a new influence model for allocating budgets to advertising channels. |
149 | Strongly Adaptive Online Learning | Amit Daniely, Alon Gonen, Shai Shalev-Shwartz | We present a reduction that can transform standard low-regret algorithms into strongly adaptive ones. |
150 | CUR Algorithm for Partially Observed Matrices | Miao Xu, Rong Jin, Zhi-Hua Zhou | In this work, we alleviate this limitation by developing a CUR decomposition algorithm for partially observed matrices. |
151 | A Deterministic Analysis of Noisy Sparse Subspace Clustering for Dimensionality-reduced Data | Yining Wang, Yu-Xiang Wang, Aarti Singh | In this paper, we propose a theoretical framework to analyze a popular optimization-based algorithm, Sparse Subspace Clustering (SSC), when the data dimension is compressed via some random projection algorithms. |
152 | MRA-based Statistical Learning from Incomplete Rankings | Eric Sibony, Stéphan Clémençon, Jérémie Jakubowicz | The goal of this paper is twofold: it develops a rigorous mathematical framework for the problem of learning a ranking model from incomplete rankings and introduces a novel general statistical method to address it. |
153 | Risk and Regret of Hierarchical Bayesian Learners | Jonathan Huggins, Josh Tenenbaum | We present a set of analytical tools for understanding hierarchical priors in both the online and batch learning settings. |
154 | Towards a Learning Theory of Cause-Effect Inference | David Lopez-Paz, Krikamol Muandet, Bernhard Schölkopf, Iliya Tolstikhin | We pose causal inference as the problem of learning to classify probability distributions. |
155 | DRAW: A Recurrent Neural Network For Image Generation | Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Rezende, Daan Wierstra | This paper introduces the Deep Recurrent Attentive Writer (DRAW) architecture for image generation with neural networks. |
156 | Multiview Triplet Embedding: Learning Attributes in Multiple Maps | Ehsan Amid, Antti Ukkonen | In this paper, we consider the problem of uncovering these hidden attributes given a set of relative distance judgments in the form of triplets. |
157 | Distributed Gaussian Processes | Marc Deisenroth, Jun Wei Ng | To scale Gaussian processes (GPs) to large data sets we introduce the robust Bayesian Committee Machine (rBCM), a practical and scalable product-of-experts model for large-scale distributed GP regression. |
158 | Guaranteed Tensor Decomposition: A Moment Approach | Gongguo Tang, Parikshit Shah | To address the computational challenge, we present a hierarchy of semidefinite programs based on sums-of-squares relaxations of the measure optimization problem. |
159 | ℓ_{1,p}-Norm Regularization: Error Bounds and Convergence Rate Analysis of First-Order Methods | Zirui Zhou, Qi Zhang, Anthony Man-Cho So | Motivated by the desire to analyze the convergence rate of first-order methods, we show that for a large class of ℓ_{1,p}-regularized problems, an error bound condition is satisfied when p ∈ [1,2] or p = ∞ but fails to hold for any p ∈ (2,∞). |
160 | Consistent estimation of dynamic and multi-layer block models | Qiuyi Han, Kevin Xu, Edoardo Airoldi | In this paper, we consider the multi-graph SBM, which serves as a foundation for many application settings including dynamic and multi-layer networks. |
161 | On the Rate of Convergence and Error Bounds for LSTD(λ) | Manel Tagorti, Bruno Scherrer | We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). |
162 | Variational Inference with Normalizing Flows | Danilo Rezende, Shakir Mohamed | We introduce a new approach for specifying flexible, arbitrarily complex and scalable approximate posterior distributions. |
163 | Controversy in mechanistic modelling with Gaussian processes | Benn Macdonald, Catherine Higham, Dirk Husmeier | In the present article, we offer a new interpretation of the second paradigm, which highlights the underlying assumptions, approximations and limitations. |
164 | Convex Learning of Multiple Tasks and their Structure | Carlo Ciliberto, Youssef Mroueh, Tomaso Poggio, Lorenzo Rosasco | Within this framework, we show that tasks and their structure can be efficiently learned considering a convex optimization problem that can be approached by means of block coordinate methods such as alternating minimization and for which we prove convergence to the global minimum. |
165 | K-hyperplane Hinge-Minimax Classifier | Margarita Osadchy, Tamir Hazan, Daniel Keren | We propose an efficient algorithm for training an intersection of a finite number of hyperplanes and demonstrate its effectiveness on real data, including letter and scene recognition. |
166 | Non-Stationary Approximate Modified Policy Iteration | Boris Lesner, Bruno Scherrer | We consider the infinite-horizon γ-discounted optimal control problem formalized by Markov Decision Processes. |
167 | Entropy evaluation based on confidence intervals of frequency estimates : Application to the learning of decision trees | Mathieu Serrurier, Henri Prade | We propose a new cumulative entropy function based on confidence intervals on frequency estimates that together considers the entropy of the probability distribution and the uncertainty around the estimation of its parameters. |
168 | Geometric Conditions for Subspace-Sparse Recovery | Chong You, Rene Vidal | In this work, we consider the more general case where ξ lies in a low-dimensional subspace spanned by a few columns of Π, which are possibly *linearly* dependent. |
169 | An Empirical Study of Stochastic Variational Inference Algorithms for the Beta Bernoulli Process | Amar Shah, David Knowles, Zoubin Ghahramani | Deriving several new algorithms, and using synthetic, image and genomic datasets, we investigate whether the understanding gleaned from LDA applies in the setting of sparse latent factor models, specifically beta process factor analysis (BPFA). |
170 | Long Short-Term Memory Over Recursive Structures | Xiaodan Zhu, Parinaz Sobhani, Hongyu Guo | In this paper, we propose to extend it to tree structures, in which a memory cell can reflect the history memories of multiple child cells or multiple descendant cells in a recursive process. |
171 | Weight Uncertainty in Neural Network | Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, Daan Wierstra | We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. |
172 | Learning Submodular Losses with the Lovasz Hinge | Jiaqian Yu, Matthew Blaschko | In this work, we show that these strategies lead to tight convex surrogates iff the underlying loss function is increasing in the number of incorrect predictions. |
173 | Coordinate Descent Converges Faster with the Gauss-Southwell Rule Than Random Selection | Julie Nutini, Mark Schmidt, Issam Laradji, Michael Friedlander, Hoyt Koepke | We give a simple analysis of the Gauss-Southwell rule showing that—except in extreme cases—its convergence rate is faster than choosing random coordinates. (A sketch of the rule follows the table.) |
174 | Hashing for Distributed Data | Cong Leng, Jiaxiang Wu, Jian Cheng, Xi Zhang, Hanqing Lu | In this paper, we develop a novel hashing model to learn hash functions in a distributed setting. |
175 | Large-scale Distributed Dependent Nonparametric Trees | Zhiting Hu, Ho Qirong, Avinava Dubey, Eric Xing | In this paper, we consider dependent nonparametric trees (DNTs), a powerful infinite model that captures time-evolving hierarchies, and develop a large-scale distributed training system. |
176 | Qualitative Multi-Armed Bandits: A Quantile-Based Approach | Balazs Szorenyi, Robert Busa-Fekete, Paul Weng, Eyke Hüllermeier | For both cases, we propose suitable algorithms and analyze their properties. |
177 | Deep Edge-Aware Filters | Li Xu, Jimmy Ren, Qiong Yan, Renjie Liao, Jiaya Jia | We attempt to learn a large and important family of edge-aware operators from data. |
178 | A Convex Optimization Framework for Bi-Clustering | Shiau Hong Lim, Yudong Chen, Huan Xu | We present a framework for biclustering and clustering where the observations are general labels. |
179 | Is Feature Selection Secure against Training Data Poisoning? | Huang Xiao, Battista Biggio, Gavin Brown, Giorgio Fumera, Claudia Eckert, Fabio Roli | In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. |
180 | Predictive Entropy Search for Bayesian Optimization with Unknown Constraints | Jose Miguel Hernandez-Lobato, Michael Gelbart, Matthew Hoffman, Ryan Adams, Zoubin Ghahramani | In this paper, we present a new information-based method called Predictive Entropy Search with Constraints (PESC). |
181 | A Theoretical Analysis of Metric Hypothesis Transfer Learning | Michaël Perrot, Amaury Habrard | We propose an on-average-replace-two-stability model allowing us to prove fast generalization rates when an auxiliary source metric is used to bias the regularizer. |
182 | Generative Moment Matching Networks | Yujia Li, Kevin Swersky, Rich Zemel | We consider the problem of learning deep generative models from data. |
183 | Stay on path: PCA along graph paths | Megasthenis Asteris, Anastasios Kyrillidis, Alex Dimakis, Han-Gyol Yi, Bharath Chandrasekaran | We propose two algorithms to approximate the solution of the constrained quadratic maximization, and recover a component with the desired properties. |
184 | Deep Learning with Limited Numerical Precision | Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, Pritish Narayanan | We study the effect of limited precision data representation and computation on neural network training. |
185 | Safe Screening for Multi-Task Feature Learning with Multiple Data Matrices | Jie Wang, Jieping Ye | In this paper, we propose a novel screening rule—that is based on the dual projection onto convex sets (DPC)—to quickly identify the inactive features—that have zero coefficients in the solution vectors across all tasks. |
186 | Harmonic Exponential Families on Manifolds | Taco Cohen, Max Welling | We define an extremely flexible class of exponential family distributions on manifolds such as the torus, sphere, and rotation groups, and show that for these distributions the gradient of the log-likelihood can be computed efficiently using a non-commutative generalization of the Fast Fourier Transform (FFT). |
187 | Training Deep Convolutional Neural Networks to Play Go | Christopher Clark, Amos Storkey | To solve this problem we introduce a number of novel techniques, including a method of tying weights in the network to ’hard code’ symmetries that are expected to exist in the target function, and demonstrate in an ablation study they considerably improve performance. |
188 | Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP) | Andrew Wilson, Hannes Nickisch | We introduce a new structured kernel interpolation (SKI) framework, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs). |
189 | Learning Deep Structured Models | Liang-Chieh Chen, Alexander Schwing, Alan Yuille, Raquel Urtasun | The goal of this paper is to combine MRFs with deep learning to estimate complex representations while taking into account the dependencies between the output random variables. |
190 | Community Detection Using Time-Dependent Personalized PageRank | Haim Avron, Lior Horesh | We present an efficient local algorithm for approximating a graph diffusion that generalizes both the celebrated personalized PageRank and its recent competitor/companion – the heat kernel. |
191 | Scalable Variational Inference in Log-supermodular Models | Josip Djolonga, Andreas Krause | We consider the problem of approximate Bayesian inference in log-supermodular models. |
192 | Variational Inference for Gaussian Process Modulated Poisson Processes | Chris Lloyd, Tom Gunter, Michael Osborne, Stephen Roberts | We present the first fully variational Bayesian inference scheme for continuous Gaussian-process-modulated Poisson processes. |
193 | Scalable Deep Poisson Factor Analysis for Topic Modeling | Zhe Gan, Changyou Chen, Ricardo Henao, David Carlson, Lawrence Carin | A new framework for topic modeling is developed, based on deep graphical models, where interactions between topics are inferred through deep latent binary hierarchies. |
194 | Hidden Markov Anomaly Detection | Nico Goernitz, Mikio Braun, Marius Kloft | We introduce a new anomaly detection methodology for data with latent dependency structure. |
195 | Robust Estimation of Transition Matrices in High Dimensional Heavy-tailed Vector Autoregressive Processes | Huitong Qiu, Sheng Xu, Fang Han, Han Liu, Brian Caffo | In this paper, we develop a unified framework for modeling and estimating heavy-tailed VAR processes. |
196 | Convex Calibrated Surrogates for Hierarchical Classification | Harish Ramaswamy, Ambuj Tewari, Shivani Agarwal | In this work, we study the consistency of hierarchical classification algorithms with respect to a natural loss, namely the tree distance metric on the hierarchy tree of class labels, via the usage of calibrated surrogates. |
197 | Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks | Jose Miguel Hernandez-Lobato, Ryan Adams | In this work we present a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP). |
198 | Active Nearest Neighbors in Changing Environments | Christopher Berlind, Ruth Urner | We propose a novel nonparametric algorithm, ANDA, that combines an active nearest neighbor querying strategy with nearest neighbor prediction. |
199 | Bipartite Edge Prediction via Transductive Learning over Product Graphs | Hanxiao Liu, Yiming Yang | We propose a new optimization framework to map the two sides of the intrinsic structures onto the manifold structure of the edges via a graph product, and to reduce the original problem to vertex label propagation over the product graph. |
200 | Trust Region Policy Optimization | John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, Philipp Moritz | In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement. |
201 | Discovering Temporal Causal Relations from Subsampled Data | Mingming Gong, Kun Zhang, Bernhard Schoelkopf, Dacheng Tao, Philipp Geiger | In this paper, we assume that the time series at the true causal frequency follow the vector autoregressive model. |
202 | Preference Completion: Large-scale Collaborative Ranking from Pairwise Comparisons | Dohyung Park, Joe Neeman, Jin Zhang, Sujay Sanghavi, Inderjit Dhillon | In this paper we consider the collaborative ranking setting: a pool of users each provides a set of pairwise preferences over a small subset of the set of d possible items; from these we need to predict each user’s preferences for items s/he has not yet seen. |
203 | Causal Inference by Identification of Vector Autoregressive Processes with Hidden Components | Philipp Geiger, Kun Zhang, Bernhard Schoelkopf, Mingming Gong, Dominik Janzing | In this paper we take a different approach: We assume that X together with some hidden Z forms a first order vector autoregressive (VAR) process with transition matrix A, and argue why it is more valid to interpret A causally instead of \hat{B}. |
204 | On Symmetric and Asymmetric LSHs for Inner Product Search | Behnam Neyshabur, Nathan Srebro | We consider the problem of designing locality sensitive hashes (LSH) for inner product similarity, and the power of asymmetric hashes in this context. |
205 | The Kendall and Mallows Kernels for Permutations | Yunlong Jiao, Jean-Philippe Vert | We show that the widely used Kendall tau correlation coefficient is a positive definite kernel for permutations. (See the sketch after this table.) |
206 | Bayesian Multiple Target Localization | Purnima Rajan, Weidong Han, Raphael Sznitman, Peter Frazier, Bruno Jedynak | We present an empirical evaluation of this policy on simulated data for the problem of detecting multiple instances of the same object in an image. |
207 | Submodularity in Data Subset Selection and Active Learning | Kai Wei, Rishabh Iyer, Jeff Bilmes | We study the problem of selecting a subset of big data to train a classifier while incurring minimal performance loss. |
208 | Variational Generative Stochastic Networks with Collaborative Shaping | Philip Bachman, Doina Precup | We present empirical results on the MNIST and TFD datasets which show that our approach offers state-of-the-art performance, both quantitatively and qualitatively. |
209 | Adding vs. Averaging in Distributed Primal-Dual Optimization | Chenxin Ma, Virginia Smith, Martin Jaggi, Michael Jordan, Peter Richtarik, Martin Takac | In this paper, we present a novel generalization of the recent communication-efficient primal-dual framework (COCOA) for distributed optimization. |
210 | Feature-Budgeted Random Forest | Feng Nan, Joseph Wang, Venkatesh Saligrama | We propose a novel random forest algorithm to minimize prediction error for a user-specified average feature acquisition budget. |
211 | Entropic Graph-based Posterior Regularization | Maxwell Libbrecht, Michael Hoffman, Jeff Bilmes, William Noble | We present a three-way alternating optimization algorithm with closed-form updates for performing inference on this joint model and learning its parameters. |
212 | Unsupervised Riemannian Metric Learning for Histograms Using Aitchison Transformations | Tam Le, Marco Cuturi | We consider in this paper the problem of learning a Riemannian metric on the simplex given unlabeled histogram data. |
213 | Low-Rank Matrix Recovery from Row-and-Column Affine Measurements | Or Zuk, Avishai Wagner | We propose a simple algorithm for the problem based on Singular Value Decomposition (SVD) and least-squares (LS), which we term alg. |
214 | Algorithms for the Hard Pre-Image Problem of String Kernels and the General Problem of String Prediction | Sébastien Giguère, Amélie Rolland, Francois Laviolette, Mario Marchand | For this problem, we propose an upper bound on the prediction function which has low computational complexity and which can be used in a branch and bound search algorithm to obtain optimal solutions. |
215 | A Multitask Point Process Predictive Model | Wenzhao Lian, Ricardo Henao, Vinayak Rao, Joseph Lucas, Lawrence Carin | In this work we propose a multitask point process model, leveraging information from all tasks via a hierarchical Gaussian process (GP). |
216 | A Hybrid Approach for Probabilistic Inference using Random Projections | Michael Zhu, Stefano Ermon | We introduce a new meta-algorithm for probabilistic inference in graphical models based on random projections. |
217 | Show, Attend and Tell: Neural Image Caption Generation with Visual Attention | Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, Yoshua Bengio | Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. |
218 | Learning to Search Better than Your Teacher | Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daume, John Langford | We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. |
219 | Gated Feedback Recurrent Neural Networks | Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio | In this work, we propose a novel recurrent neural network (RNN) architecture. |
220 | Context-based Unsupervised Data Fusion for Decision Making | Erfan Soltanmohammadi, Mort Naraghi-Pour, Mihaela Schaar | In this paper, we propose an unsupervised joint estimation-detection scheme to estimate the accuracies of the local classifiers as functions of data context and to fuse the local decisions of the classifiers. |
221 | Phrase-based Image Captioning | Remi Lebret, Pedro Pinheiro, Ronan Collobert | In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. |
222 | Celeste: Variational inference for a generative model of astronomical images | Jeffrey Regier, Andrew Miller, Jon McAuliffe, Ryan Adams, Matt Hoffman, Dustin Lang, David Schlegel, Mr Prabhat | We present a new, fully generative model of optical telescope image sets, along with a variational procedure for inference. |
223 | Distributional Rank Aggregation, and an Axiomatic Analysis | Adarsh Prasad, Harsh Pareek, Pradeep Ravikumar | We introduce a variant of this problem we call distributional rank aggregation, where the ranking data is only available via the induced distribution over the set of all permutations. |
224 | Gradient-based Hyperparameter Optimization through Reversible Learning | Dougal Maclaurin, David Duvenaud, Ryan Adams | Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. |
225 | Bimodal Modelling of Source Code and Natural Language | Miltos Allamanis, Daniel Tarlow, Andrew Gordon, Yi Wei | We consider the problem of building probabilistic models that jointly model short natural language utterances and source code snippets. |
226 | Cheap Bandits | Manjesh Hanawal, Venkatesh Saligrama, Michal Valko, Remi Munos | In this paper we propose CheapUCB, an algorithm that matches the regret guarantees of the known algorithms for this setting and at the same time guarantees a linear cost gain over them. |
227 | Subsampling Methods for Persistent Homology | Frederic Chazal, Brittany Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, Larry Wasserman | We study the risk of two estimators and we prove that the subsampling approach carries stable topological information while achieving a great reduction in computational complexity. |
228 | An embarrassingly simple approach to zero-shot learning | Bernardino Romera-Paredes, Philip Torr | In this paper we describe a zero-shot learning approach that can be implemented in just one line of code, yet it is able to outperform state of the art approaches on standard datasets. |
229 | Binary Embedding: Fundamental Limits and Fast Algorithm | Xinyang Yi, Constantine Caramanis, Eric Price | Specifically, for arbitrary N distinct points on the unit sphere S^{p-1}, our goal is to encode each point using an m-dimensional binary string such that we can reconstruct their geodesic distances up to δ uniform distortion. (See the sketch after this table.) |
230 | Scalable Bayesian Optimization Using Deep Neural Networks | Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Mostofa Patwary, Mr Prabhat, Ryan Adams | In this work, we explore the use of neural networks as an alternative to GPs to model distributions over functions. |
231 | How Hard is Inference for Structured Prediction? | Amir Globerson, Tim Roughgarden, David Sontag, Cafer Yildirim | The goal of this paper is to develop a theoretical explanation of the empirical effectiveness of heuristic inference algorithms for solving such structured prediction problems. |
232 | Online Time Series Prediction with Missing Data | Oren Anava, Elad Hazan, Assaf Zeevi | We consider the problem of time series prediction in the presence of missing data. |
233 | Proteins, Particles, and Pseudo-Max-Marginals: A Submodular Approach | Jason Pacheco, Erik Sudderth | Motivated by the challenging problem of protein side chain prediction, we extend D-PMP in several key ways to create a generic MAP inference algorithm for loopy models. |
234 | A Fast Variational Approach for Learning Markov Random Field Language Models | Yacine Jernite, Alexander Rush, David Sontag | In this work, we take a step towards overcoming these difficulties. |
235 | Removing systematic errors for exoplanet search via latent causes | Bernhard Schölkopf, David Hogg, Dun Wang, Dan Foreman-Mackey, Dominik Janzing, Carl-Johann Simon-Gabriel, Jonas Peters | We describe a method for removing the effect of confounders in order to reconstruct a latent quantity of interest. |
236 | Scalable Nonparametric Bayesian Inference on Point Processes with Gaussian Processes | Yves-Laurent Kom Samo, Stephen Roberts | In this paper we propose an efficient, scalable non-parametric Gaussian process model for inference on Poisson point processes. |
237 | Correlation Clustering in Data Streams | KookJin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, Anthony Wirth | In this paper, we address the problem of correlation clustering in the dynamic data stream model. |
238 | Learning Scale-Free Networks by Dynamic Node Specific Degree Prior | Qingming Tang, Siqi Sun, Jinbo Xu | To this end, this paper proposes a ranking-based method to dynamically estimate the degree of each node, which makes the resultant optimization problem challenging to solve. |
239 | Deep Unsupervised Learning using Nonequilibrium Thermodynamics | Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, Surya Ganguli | Here, we develop an approach that simultaneously achieves both flexibility and tractability. |
240 | Modeling Order in Neural Word Embeddings at Scale | Andrew Trask, David Gilmore, Matthew Russell | We propose a new neural language model incorporating both word order and character order in its embedding. |
241 | Distributed Inference for Dirichlet Process Mixture Models | Hong Ge, Yutian Chen, Moquan Wan, Zoubin Ghahramani | In this paper, we propose an efficient distributed inference algorithm for the DP and the HDP mixture model. |
242 | Compressing Neural Networks with the Hashing Trick | Wenlin Chen, James Wilson, Stephen Tyree, Kilian Weinberger, Yixin Chen | We present a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes. (See the sketch after this table.) |
243 | Intersecting Faces: Non-negative Matrix Factorization With New Guarantees | Rong Ge, James Zou | In this paper, we propose the notion of subset-separable NMF, which substantially generalizes the property of separability. |
244 | Scaling up Natural Gradient by Sparsely Factorizing the Inverse Fisher Matrix | Roger Grosse, Ruslan Salakhudinov | We present FActorized Natural Gradient (FANG), an approximation to natural gradient descent where the Fisher matrix is approximated with a Gaussian graphical model whose precision matrix can be computed efficiently. |
245 | A Deeper Look at Planning as Learning from Replay | Harm Vanseijen, Rich Sutton | In this paper, we look more deeply at how replay blurs the line between model-based and model-free methods. |
246 | Optimal and Adaptive Algorithms for Online Boosting | Alina Beygelzimer, Satyen Kale, Haipeng Luo | We study online boosting, the task of converting any weak online learner into a strong online learner. |
247 | Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems | Christopher De Sa, Christopher Re, Kunle Olukotun | In this paper, we exhibit a step size scheme for SGD on a low-rank least-squares problem, and we prove that, under broad sampling conditions, our method converges globally from a random starting point within O(ε^{-1} n log n) steps with constant probability for constant-rank problems. |
248 | An Empirical Exploration of Recurrent Network Architectures | Rafal Jozefowicz, Wojciech Zaremba, Ilya Sutskever | In this work, we aim to determine whether the LSTM architecture is optimal or whether much better architectures exist. |
249 | Complete Dictionary Recovery Using Nonconvex Optimization | Ju Sun, Qing Qu, John Wright | We consider the problem of recovering a complete (i.e., square and invertible) dictionary A_0 from Y = A_0 X_0, where Y ∈ ℝ^{n×p}. |
250 | Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret | Haitham Bou Ammar, Rasul Tutunov, Eric Eaton | Lifelong reinforcement learning provides a promising framework for developing versatile agents that can accumulate knowledge over a lifetime of experience and rapidly learn new tasks by building upon prior knowledge. |
251 | PASSCoDe: Parallel ASynchronous Stochastic dual Co-ordinate Descent | Cho-Jui Hsieh, Hsiang-Fu Yu, Inderjit Dhillon | In this paper, we parallelize the DCD algorithms in LIBLINEAR. |
252 | High Confidence Policy Improvement | Philip Thomas, Georgios Theocharous, Mohammad Ghavamzadeh | We present a batch reinforcement learning (RL) algorithm that provides probabilistic guarantees about the quality of each policy that it proposes, and which has no hyper-parameter that requires expert tuning. |
253 | Fixed-point algorithms for learning determinantal point processes | Zelda Mariet, Suvrit Sra | We present experimental results on both real and simulated data to illustrate the numerical performance of our technique. |
254 | Consistent Multiclass Algorithms for Complex Performance Measures | Harikrishna Narasimhan, Harish Ramaswamy, Aadirupa Saha, Shivani Agarwal | This paper presents new consistent algorithms for multiclass learning with complex performance measures, defined by arbitrary functions of the confusion matrix. |
255 | Optimizing Neural Networks with Kronecker-factored Approximate Curvature | James Martens, Roger Grosse | We propose an efficient method for approximating natural gradient descent in neural networks which we call Kronecker-factored Approximate Curvature (K-FAC). |
256 | A Convex Exemplar-based Approach to MAD-Bayes Dirichlet Process Mixture Models | En-Hsu Yen, Xin Lin, Kai Zhong, Pradeep Ravikumar, Inderjit Dhillon | In this paper, we consider the exemplar-based version of MAD-Bayes formulation for DP and Hierarchical DP (HDP) mixture model. |
257 | Multi-instance multi-label learning in the presence of novel class instances | Anh Pham, Raviv Raich, Xiaoli Fern, Jesús Pérez Arriaga | In this paper, this problem is addressed using a discriminative probabilistic model that accounts for novel instances. |
258 | Entropy-Based Concentration Inequalities for Dependent Variables | Liva Ralaivola, Massih-Reza Amini | Along the way, we prove a new Talagrand concentration inequality for fractionally sub-additive functions of dependent variables. |
259 | PU Learning for Matrix Completion | Cho-Jui Hsieh, Nagarajan Natarajan, Inderjit Dhillon | In this paper, we consider the matrix completion problem when the observations are one-bit measurements of some underlying matrix M, and in particular the observed samples consist only of ones and no zeros. |
260 | An Asynchronous Distributed Proximal Gradient Method for Composite Convex Optimization | Necdet Aybat, Zi Wang, Garud Iyengar | We propose a distributed first-order augmented Lagrangian (DFAL) algorithm to minimize the sum of composite convex functions, where each term in the sum is a private cost function belonging to a node, and only nodes connected by an edge can directly communicate with each other. |
261 | Sparse Subspace Clustering with Missing Entries | Congyuan Yang, Daniel Robinson, Rene Vidal | We consider the problem of clustering incomplete data drawn from a union of subspaces. |
262 | Moderated and Drifting Linear Dynamical Systems | Jinyan Guan, Kyle Simek, Ernesto Brau, Clayton Morrison, Emily Butler, Kobus Barnard | This change of focus reduces opportunities for efficient inference, and we propose sampling procedures to learn and fit the models. |
263 | Boosted Categorical Restricted Boltzmann Machine for Computational Prediction of Splice Junctions | Taehoon Lee, Sungroh Yoon | In this paper, we propose a deep belief network-based methodology for computational splice junction prediction. |
264 | Privacy for Free: Posterior Sampling and Stochastic Gradient Monte Carlo | Yu-Xiang Wang, Stephen Fienberg, Alex Smola | We consider the problem of Bayesian learning on sensitive datasets and present two simple but somewhat surprising results that connect Bayesian learning to “differential privacy”, a cryptographic approach to protect individual-level privacy while permitting database-level utility. |
265 | A trust-region method for stochastic variational inference with applications to streaming data | Lucas Theis, Matt Hoffman | We address this problem by replacing the natural gradient step of stochastic variational inference with a trust-region update. |
266 | Inference in a Partially Observed Queuing Model with Applications in Ecology | Kevin Winner, Garrett Bernstein, Dan Sheldon | The contribution of this paper is to formulate a latent variable model and develop a novel Gibbs sampler based on Markov bases to perform inference using the correct, but intractable, likelihood function. |
267 | Deterministic Independent Component Analysis | Ruitong Huang, Andras Gyorgy, Csaba Szepesvári | We present, for the first time in the literature, consistent, polynomial-time algorithms to recover non-Gaussian source signals and the mixing matrix with a reconstruction error that vanishes at a 1/√T rate using T observations and scales only polynomially with the natural parameters of the problem. |
268 | On the Optimality of Multi-Label Classification under Subset Zero-One Loss for Distributions Satisfying the Composition Property | Maxime Gasse, Alexandre Aussem, Haytham Elghazel | In this paper, we show that the subsets of labels that appear as irreducible factors in the factorization of the conditional distribution of the label set given the input features play a pivotal role for multi-label classification in the context of subset Zero-One loss minimization, as they divide the learning task into simpler independent multi-class problems. |
269 | Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization | Roy Frostig, Rong Ge, Sham Kakade, Aaron Sidford | To achieve this, we establish a framework, based on the classical proximal point algorithm, useful for accelerating recent fast stochastic algorithms in a black-box fashion. |
270 | A New Generalized Error Path Algorithm for Model Selection | Bin Gu, Charles Ling | Recently, various solution path algorithms have been proposed for several important learning algorithms, including support vector classification and the Lasso. |
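A few of the entries above describe techniques simple enough to illustrate in code; three toy Python sketches follow. Entry 205 (The Kendall and Mallows Kernels for Permutations) shows that the Kendall tau correlation is itself a positive definite kernel, so a Gram matrix of pairwise Kendall tau values can be fed directly to any kernel method. Below is a minimal sketch, assuming SciPy's kendalltau as the pairwise statistic; the helper name kendall_kernel and the eigenvalue check are ours, not the paper's experiment.

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_kernel(perms):
    """Gram matrix of the Kendall tau correlation over a list of permutations."""
    n = len(perms)
    K = np.empty((n, n))
    for a in range(n):
        for b in range(n):
            tau, _ = kendalltau(perms[a], perms[b])  # (concordant - discordant) / C(n, 2)
            K[a, b] = tau
    return K

rng = np.random.default_rng(0)
perms = [rng.permutation(8) for _ in range(5)]
K = kendall_kernel(perms)
# The paper proves this kernel is positive definite, so the eigenvalues
# of the Gram matrix should all be non-negative (up to round-off).
print(np.round(np.linalg.eigvalsh(K), 4))
```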
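Entry 229 (Binary Embedding: Fundamental Limits and Fast Algorithm) concerns binary codes that preserve geodesic distance on the sphere. The paper's contribution is a fast structured construction, but the distance-preservation idea is already visible in the classical unstructured baseline of one random Gaussian hyperplane per bit, sketched below with illustrative dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 64, 4096                    # ambient dimension, number of bits

# Two unit vectors on the sphere S^{p-1}
x, y = rng.standard_normal((2, p))
x /= np.linalg.norm(x)
y /= np.linalg.norm(y)

G = rng.standard_normal((m, p))    # one random hyperplane per bit
bx, by = (G @ x) > 0, (G @ y) > 0  # m-bit binary codes

# For sign bits from random hyperplanes, the expected normalized Hamming
# distance equals angle(x, y) / pi, so the geodesic distance is recovered
# with small uniform distortion as m grows.
est_angle = np.mean(bx != by) * np.pi
true_angle = np.arccos(np.clip(x @ y, -1.0, 1.0))
print(f"estimated angle: {est_angle:.3f}  true angle: {true_angle:.3f}")
```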
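Entry 242 (Compressing Neural Networks with the Hashing Trick) ties each virtual weight of a layer to a slot in a small shared parameter vector selected by a hash of the weight's indices, with a second sign hash to reduce bias. Below is a toy sketch of that weight tying, using a seeded RNG in place of a real hash function; hashed_linear and all dimensions are illustrative, not from the paper.

```python
import numpy as np

def hashed_linear(x, shared, n_out, seed=0):
    """Apply a virtual (n_out x len(x)) layer whose weights all live in `shared`."""
    n_in = x.size
    # A seeded RNG stands in for the index hash h(i, j) and the sign hash
    # xi(i, j); reusing the same seed reproduces the same weight tying.
    h = np.random.default_rng(seed)
    idx = h.integers(0, shared.size, size=(n_out, n_in))
    sign = h.choice([-1.0, 1.0], size=(n_out, n_in))
    # Materialize the virtual weight matrix for clarity; a real
    # implementation would index `shared` on the fly to keep memory low.
    W = sign * shared[idx]
    return W @ x

n_in, n_out, budget = 512, 256, 1000   # 1,000 reals instead of 131,072
shared = np.random.default_rng(1).standard_normal(budget) * 0.05
print(hashed_linear(np.ones(n_in), shared, n_out).shape)  # (256,)
```

During training, gradients for all virtual weights mapped to the same slot accumulate into that slot, which is what makes the compressed parameterization trainable end to end.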