Paper Digest: NIPS 2013 Highlights

December 4, 2013 | October 6, 2019 | admin

The Conference on Neural Information Processing Systems (NIPS) is one of the top machine learning conferences in the world. In 2013, it is to be held in Lake Tahoe, Nevada.

To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.

We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free paper digest service to get new paper updates customized to your own interests on a daily basis.

Paper Digest Team
team@paperdigest.org

TABLE 1: NIPS 2013 Papers

No.	Title	Authors	Highlight
1	The Randomized Dependence Coefficient	David Lopez-Paz, Philipp Hennig, Bernhard Schölkopf	We introduce the Randomized Dependence Coefficient (RDC), a measure of non-linear dependence between random variables of arbitrary dimension based on the Hirschfeld-Gebelein-Rényi Maximum Correlation Coefficient.
2	Documents as multiple overlapping windows into grids of counts	Alessandro Perina, Nebojsa Jojic, Manuele Bicego, Andrzej Truski	In this paper, we overcome this issue with the Componential Counting Grid, which brings the componential nature of topic models to the basic counting grid.
3	Reciprocally Coupled Local Estimators Implement Bayesian Information Integration Distributively	Wen-Hao Zhang, Si Wu	The present study proposes a novel mechanism to achieve this.
4	Latent Maximum Margin Clustering	Guang-Tong Zhou, Tian Lan, Arash Vahdat, Greg Mori	We present a maximum margin framework that clusters data using latent variables.
5	Data-driven Distributionally Robust Polynomial Optimization	Martin Mevissen, Emanuele Ragnoli, Jia Yuan Yu	We consider robust optimization for polynomial optimization problems where the uncertainty set is a set of candidate probability density functions.
6	Transfer Learning in a Transductive Setting	Marcus Rohrbach, Sandra Ebert, Bernt Schiele	In this work, we extend transfer learning with semi-supervised learning to exploit unlabeled instances of (novel) categories with no or only a few labeled instances.
7	Bayesian optimization explains human active search	Ali Borji, Laurent Itti	We try to unravel the general underlying algorithm people may be using while searching for the maximum of an invisible 1D function.
8	Provable Subspace Clustering: When LRR meets SSC	Yu-Xiang Wang, Huan Xu, Chenlei Leng	Because the representation matrix is often simultaneously sparse and low-rank, we propose a new algorithm, termed Low-Rank Sparse Subspace Clustering (LRSSC), by combining SSC and LRR, and develop theoretical guarantees of when the algorithm succeeds.
9	Generalized Random Utility Models with Multiple Types	Hossein Azari Soufiani, Hansheng Diao, Zhenyu Lai, David C. Parkes	We propose a model for demand estimation in multi-agent, differentiated product settings and present an estimation algorithm that uses reversible jump MCMC techniques to classify agents’ types.
10	Polar Operators for Structured Sparse Estimation	Xinhua Zhang, Yao-Liang Yu, Dale Schuurmans	Our first contribution is to uncover a rich class of structured sparse regularizers whose polar operator can be evaluated efficiently.
11	On Decomposing the Proximal Map	Yao-Liang Yu	Motivated by the need of combining regularizers to simultaneously induce different types of structures, this paper initiates a systematic investigation of when the proximal map of a sum of functions decomposes into the composition of the proximal maps of the individual summands.
12	Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs	Liam C. MacDermed, Charles L. Isbell	We present a method to transform any DecPOMDP into a DecPOMDP with bounded beliefs (the number of beliefs is a free parameter) using optimal (not lossless) belief compression.
13	PAC-Bayes-Empirical-Bernstein Inequality	Ilya O. Tolstikhin, Yevgeny Seldin	We present the PAC-Bayes-Empirical-Bernstein inequality.
14	Modeling Clutter Perception using Parametric Proto-object Partitioning	Chen-Ping Yu, Wen-Yu Hua, Dimitris Samaras, Greg Zelinsky	We introduce a novel parametric method of merging superpixels by modeling mixture of Weibull distributions on similarity distance statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception.
15	Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching	Marcelo Fiori, Pablo Sprechmann, Joshua Vogelstein, Pablo Muse, Guillermo Sapiro	We propose a robust graph matching algorithm inspired by sparsity-related techniques.
16	Transportability from Multiple Environments with Limited Experiments	Elias Bareinboim, Sanghack Lee, Vasant Honavar, Judea Pearl	This paper considers the problem of transferring experimental findings learned from multiple heterogeneous domains to a target environment, in which only limited experiments can be performed.
17	More data speeds up training time in learning halfspaces over sparse vectors	Amit Daniely, Nati Linial, Shai Shalev-Shwartz	Our main contribution is a novel, non-cryptographic, methodology for establishing computational-statistical gaps, which allows us to show that, under a widely believed assumption that refuting random $\mathrm{3CNF}$ formulas is hard, efficiently learning this class using $O\left(n/\epsilon^2\right)$ examples is impossible.
18	Causal Inference on Time Series using Restricted Structural Equation Models	Jonas Peters, Dominik Janzing, Bernhard Schölkopf	If there are no feedback loops between time series, we propose an algorithm based on non-linear independence tests of time series.
19	Deep Fisher Networks for Large-Scale Image Classification	Karen Simonyan, Andrea Vedaldi, Andrew Zisserman	In this paper, we explore the extent of this analogy, proposing a version of the state-of-the-art Fisher vector image encoding that can be stacked in multiple layers.
20	Sparse Additive Text Models with Low Rank Background	Lei Shi	This paper proposes a sparse additive model with low rank background (SAM-LRB), together with a simple yet efficient estimation procedure.
21	Variance Reduction for Stochastic Gradient Optimization	Chong Wang, Xi Chen, Alexander J. Smola, Eric P. Xing	In this paper, we develop a general approach of using control variates for variance reduction in stochastic gradient optimization.
22	Training and Analysing Deep Recurrent Neural Networks	Michiel Hermans, Benjamin Schrauwen	Common recurrent neural networks, however, do not explicitly accommodate such a hierarchy, and most research on them has been focusing on training algorithms rather than on their basic architecture.
23	A simple example of Dirichlet process mixture inconsistency for the number of components	Jeffrey W. Miller, Matthew T. Harrison	In this note, we give an elementary demonstration of this inconsistency in what is perhaps the simplest possible setting: a DPM with normal components of unit variance, applied to data from a “mixture” with one standard normal component.
24	Variational Policy Search via Trajectory Optimization	Sergey Levine, Vladlen Koltun	We present a method that uses trajectory optimization as a powerful exploration strategy that guides the policy search.
25	Scalable kernels for graphs with continuous attributes	Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, Karsten Borgwardt	In this paper, we present a class of path kernels with computational complexity $\mathcal{O}(n^2 (m + \delta^2))$, where $\delta$ is the graph diameter and $m$ the number of edges.
26	Density estimation from unweighted k-nearest neighbor graphs: a roadmap	Ulrike von Luxburg, Morteza Alamgir	We prove how one can estimate the density p just from the unweighted adjacency matrix of the graph, without knowing the points themselves or their distance or similarity scores.
27	Decision Jungles: Compact and Rich Models for Classification	Jamie Shotton, Toby Sharp, Pushmeet Kohli, Sebastian Nowozin, John Winn, Antonio Criminisi	We present and compare two new node merging algorithms that jointly optimize both the features and the structure of the DAGs efficiently.
28	What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach	Zhenwen Dai, Georgios Exarchakis, Jörg Lücke	Here, we for the first time apply a model with non-linear feature superposition and explicit position encoding.
29	Actor-Critic Algorithms for Risk-Sensitive MDPs	Prashanth L.A., Mohammad Ghavamzadeh	In this paper, we consider both discounted and average reward Markov decision processes.
30	Summary Statistics for Partitionings and Feature Allocations	Isik B. Fidaner, Taylan Cemgil	In this paper, we introduce novel statistics based on block sizes for representing sample sets of partitionings and feature allocations.
31	One-shot learning and big data with n=2	Lee H. Dicker, Dean P. Foster	We model a “one-shot learning” situation, where very few (scalar) observations $y_1,\ldots,y_n$ are available.
32	Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression	Michalis Titsias, Miguel Lazaro-Gredilla	We introduce a novel variational method that allows us to approximately integrate out kernel hyperparameters, such as length-scales, in Gaussian process regression.
33	Correlations strike back (again): the case of associative memory retrieval	Cristina Savin, Peter Dayan, Mate Lengyel	We show that activity-dependent learning generically produces such correlations, and failing to take them into account in the dynamics of memory retrieval leads to catastrophically poor recall.
34	Optimal Neural Population Codes for High-dimensional Stimulus Variables	Zhuo Wang, Alan A. Stocker, Daniel D. Lee	We consider solutions for a minimal case where the number of neurons in the population is equal to the number of stimulus dimensions (diffeomorphic).
35	Online Variational Approximations to non-Exponential Family Change Point Models: With Application to Radar Tracking	Ryan D. Turner, Steven Bottone, Clay J. Stanek	We apply our methodology to a tracking problem using radar data with a signal-to-noise feature that is Rice distributed.
36	Accelerating Stochastic Gradient Descent using Predictive Variance Reduction	Rie Johnson, Tong Zhang	To remedy this problem, we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG).
37	Using multiple samples to learn mixture models	Jason D. Lee, Ran Gilad-Bachrach, Rich Caruana	In this work we make the assumption that we have access to several samples drawn from the same $K$ underlying distributions, but with different mixing weights.
38	Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition	Tzu-Kuo Huang, Jeff Schneider	Inspired by recent advances in spectral learning methods, we propose to study this problem from a different perspective: moment matching and spectral decomposition.
39	On model selection consistency of penalized M-estimators: a geometric theory	Jason D. Lee, Yuekai Sun, Jonathan E. Taylor	We generalize the notion of irrepresentable to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties.
40	Dropout Training as Adaptive Regularization	Stefan Wager, Sida Wang, Percy S. Liang	By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer.
41	New Subsampling Algorithms for Fast Least Squares Regression	Paramveer Dhillon, Yichao Lu, Dean P. Foster, Lyle Ungar	We propose three methods which solve the big data problem by subsampling the covariance matrix using either a single or two stage estimation.
42	Faster Ridge Regression via the Subsampled Randomized Hadamard Transform	Yichao Lu, Paramveer Dhillon, Dean P. Foster, Lyle Ungar	We propose a fast algorithm for ridge regression when the number of features is much larger than the number of observations ($p \gg n$).
43	Accelerated Mini-Batch Stochastic Dual Coordinate Ascent	Shai Shalev-Shwartz, Tong Zhang	Our main contribution is to introduce an accelerated mini-batch version of SDCA and prove a fast convergence rate for this method.
44	Improved and Generalized Upper Bounds on the Complexity of Policy Iteration	Bruno Scherrer	Given a Markov Decision Process (MDP) with $n$ states and $m$ actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal $\gamma$-discounted policy.
45	Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation	Dahua Lin	To tackle this problem, we propose a Bayesian learning algorithm for DP mixture models.
46	Online Robust PCA via Stochastic Optimization	Jiashi Feng, Huan Xu, Shuicheng Yan	In this paper, we develop an Online Robust Principal Component Analysis (OR-PCA) that processes one sample per time instance and hence its memory cost is independent of the data size, significantly enhancing the computation and storage efficiency.
47	Least Informative Dimensions	Fabian Sinz, Anna Stockl, Jan Grewe, Jan Benda	We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system.
48	A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks	Junming Yin, Qirong Ho, Eric P. Xing	We propose a scalable approach for making inference about latent spaces of large networks.
49	Understanding variable importances in forests of randomized trees	Gilles Louppe, Louis Wehenkel, Antonio Sutera, Pierre Geurts	In this work we characterize the Mean Decrease Impurity (MDI) variable importances as measured by an ensemble of totally randomized trees in asymptotic sample and ensemble size conditions.
50	Correlated random features for fast semi-supervised learning	Brian McWilliams, David Balduzzi, Joachim M. Buhmann	This paper presents Correlated Nystrom Views (XNV), a fast semi-supervised algorithm for regression and classification.
51	Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture	Trevor Campbell, Miao Liu, Brian Kulis, Jonathan P. How, Lawrence Carin	This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters.
52	Better Approximation and Faster Algorithm Using the Proximal Average	Yao-Liang Yu	Better Approximation and Faster Algorithm Using the Proximal Average
53	Rapid Distance-Based Outlier Detection via Sampling	Mahito Sugiyama, Karsten Borgwardt	We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets.
54	Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima	Po-Ling Loh, Martin J. Wainwright	Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models; regression in generalized linear models using nonconvex regularizers such as SCAD and MCP; and graph and inverse covariance matrix estimation.
55	Non-Linear Domain Adaptation with Boosting	Carlos J. Becker, Christos M. Christoudias, Pascal Fua	In this paper we present a multi-task learning algorithm for domain adaptation based on boosting.
56	Mid-level Visual Element Discovery as Discriminative Mode Seeking	Carl Doersch, Abhinav Gupta, Alexei A. Efros	In this work, we pose visual element discovery as discriminative mode seeking, drawing connections to the well-known and well-studied mean-shift algorithm.
57	q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions	Assaf Glazer, Michael Lindenbaum, Shaul Markovitch	In this paper we introduce a novel method that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions.
58	Auditing: Active Learning with Outcome-Dependent Query Costs	Sivan Sabato, Anand D. Sarwate, Nati Srebro	We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance.
59	A message-passing algorithm for multi-agent trajectory planning	José Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan S. Yedidia	We describe a novel approach for computing collision-free global trajectories for $p$ agents with specified initial and final configurations, based on an improved version of the alternating direction method of multipliers (ADMM) algorithm.
60	Learning Stochastic Feedforward Neural Networks	Yichuan Tang, Ruslan R. Salakhutdinov	In this paper, we propose a stochastic feedforward network with hidden layers having both deterministic and stochastic variables.
61	Inferring neural population dynamics from multiple partial recordings of the same neural circuit	Srini Turaga, Lars Buesing, Adam M. Packer, Henry Dalgleish, Noah Pettit, Michael Hausser, Jakob H. Macke	Here we contribute a statistical method for “stitching” together sequentially imaged sets of neurons into one model by phrasing the problem as fitting a latent dynamical system with missing observations.
62	Multi-Prediction Deep Boltzmann Machines	Ian Goodfellow, Mehdi Mirza, Aaron Courville, Yoshua Bengio	We introduce the Multi-Prediction Deep Boltzmann Machine (MP-DBM).
63	Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation	Vibhav Vineet, Carsten Rother, Philip Torr	In this work we explore the synergy effects between intrinsic scene properties recovered from an image, and the objects and attributes present in the scene.
64	Blind Calibration in Compressed Sensing using Message Passing Algorithms	Christophe Schulke, Francesco Caltagirone, Florent Krzakala, Lenka Zdeborová	In this paper we study the so-called blind calibration, i.e. when the training signals that are available to perform the calibration are sparse but unknown.
65	Learning Trajectory Preferences for Manipulators via Iterative Improvement	Ashesh Jain, Brian Wojcik, Thorsten Joachims, Ashutosh Saxena	In this paper, we propose a co-active online learning framework for teaching robots the preferences of their users for object manipulation tasks.
66	Large Scale Distributed Sparse Precision Estimation	Huahua Wang, Arindam Banerjee, Cho-Jui Hsieh, Pradeep K. Ravikumar, Inderjit S. Dhillon	We present an inexact alternating direction method of multipliers (ADMM) algorithm for CLIME, and establish rates of convergence for both the objective and optimality conditions.
67	Neural representation of action sequences: how far can a simple snippet-matching model take us?	Cheston Tan, Jedediah M. Singer, Thomas Serre, David Sheinberg, Tomaso Poggio	We find that a baseline model, one that simply computes a linear weighted sum of ventral and dorsal responses to short action “snippets”, produces surprisingly good fits to the neural data.
68	On Algorithms for Sparse Multi-factor NMF	Siwei Lyu, Xin Wang	In this work, we describe a new simple and efficient algorithm for the multi-factor nonnegative matrix factorization problem (mfNMF), which generalizes the original NMF problem to more than two factors.
69	Dirty Statistical Models	Eunho Yang, Pradeep K. Ravikumar	We provide a unified framework for the high-dimensional analysis of “superposition-structured” or “dirty” statistical models: where the model parameters are a “superposition” of structurally constrained parameters.
70	Parallel Sampling of DP Mixture Models using Sub-Cluster Splits	Jason Chang, John W. Fisher III	We present a novel MCMC sampler for Dirichlet process mixture models that can be used for conjugate or non-conjugate prior distributions.
71	Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent	Tianbao Yang	We present and study a distributed optimization algorithm by employing a stochastic dual coordinate ascent method.
72	Prior-free and prior-dependent regret bounds for Thompson Sampling	Sebastien Bubeck, Che-Yu Liu	We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions.
73	Structured Learning via Logistic Regression	Justin Domke	This paper observes that if the inference problem is “smoothed” through the addition of entropy terms, for fixed messages, the learning objective reduces to a traditional (non-structured) logistic regression problem with respect to parameters.
74	Which Space Partitioning Tree to Use for Search?	Parikshit Ram, Alexander Gray	To this end, we present the theoretical results which imply that trees with better vector quantization performance have better search performance guarantees.
75	Projecting Ising Model Parameters for Fast Mixing	Justin Domke, Xianghang Liu	We present an algorithm to project Ising model parameters onto a parameter set that is guaranteed to be fast mixing, under several divergences.
76	Mixed Optimization for Smooth Functions	Mehrdad Mahdavi, Lijun Zhang, Rong Jin	In this work, we consider a new setup for optimizing smooth functions, termed Mixed Optimization, which allows access to both a stochastic oracle and a full gradient oracle.
77	Conditional Random Fields via Univariate Exponential Families	Eunho Yang, Pradeep K. Ravikumar, Genevera I. Allen, Zhandong Liu	We thus introduce a “novel subclass of CRFs”, derived by imposing node-wise conditional distributions of response variables conditioned on the rest of the responses and the covariates as arising from univariate exponential families.
78	Stochastic blockmodel approximation of a graphon: Theory and consistent estimation	Edo M. Airoldi, Thiago B. Costa, Stanley H. Chan	In this paper, we propose a computationally efficient algorithm to estimate a graphon from a set of observed graphs generated from it.
79	Reinforcement Learning in Robust Markov Decision Processes	Shiau Hong Lim, Huan Xu, Shie Mannor	We consider a problem setting where some unknown parts of the state space can have arbitrary transitions while other parts are purely stochastic.
80	On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization	Ke Hou, Zirui Zhou, Anthony Man-Cho So, Zhi-Quan Luo	In this paper, we show that for a large class of loss functions, the convergence rate of the PGM is in fact linear.
81	Recurrent networks of coupled Winner-Take-All oscillators for solving constraint satisfaction problems	Hesham Mostafa, Lorenz K. Mueller, Giacomo Indiveri	We present a recurrent neuronal network, modeled as a continuous-time dynamical system, that can solve constraint satisfaction problems.
82	Latent Structured Active Learning	Wenjie Luo, Alex Schwing, Raquel Urtasun	In this paper we present active learning algorithms in the context of structured prediction problems.
83	A Gang of Bandits	Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella	In this paper, we introduce novel algorithmic approaches to the solution of such networked bandit problems.
84	Learning Feature Selection Dependencies in Multi-task Learning	Daniel Hernández-Lobato, José Miguel Hernández-Lobato	A probabilistic model based on the horseshoe prior is proposed for learning dependencies in the process of identifying relevant features for prediction.
85	B-test: A Non-parametric, Low Variance Kernel Two-sample Test	Wojciech Zaremba, Arthur Gretton, Matthew Blaschko	We propose a family of maximum mean discrepancy (MMD) kernel two-sample tests that have low sample complexity and are consistent.
86	Online PCA for Contaminated Data	Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan	Here we propose the online robust PCA algorithm, which is able to improve the PCs estimation upon an initial one steadily, even when faced with a constant fraction of outliers.
87	Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)	Francis Bach, Eric Moulines	We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk.
88	Efficient Algorithm for Privately Releasing Smooth Queries	Ziteng Wang, Kai Fan, Jiaqi Zhang, Liwei Wang	We study differentially private mechanisms for answering smooth queries on databases consisting of data points in $\mathbb{R}^d$.
89	Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search	Anshumali Shrivastava, Ping Li	In this paper, we focus on problems related to 3-way Jaccard similarity: $\mathcal{R}^{3way}= \frac{\|S_1 \cap S_2 \cap S_3\|}{\|S_1 \cup S_2 \cup S_3\|}$, $S_1, S_2, S_3 \in \mathcal{C}$, where $\mathcal{C}$ is a size $n$ collection of sets (or binary vectors).
90	Unsupervised Spectral Learning of Finite State Transducers	Raphael Bailly, Xavier Carreras, Ariadna Quattoni	In this paper we address the more realistic, yet challenging setting where the alignments are unknown to the learning algorithm.
91	Learning a Deep Compact Image Representation for Visual Tracking	Naiyan Wang, Dit-Yan Yeung	In this paper, we study the challenging problem of tracking the trajectory of a moving object in a video with possibly very complex background.
92	Learning Multi-level Sparse Representations	Ferran Diego Andilla, Fred A. Hamprecht	Driven by this concrete problem, we propose a decomposition of the matrix of observations into a product of more than two sparse matrices, with the rank decreasing from lower to higher levels.
93	Robust Data-Driven Dynamic Programming	Grani Adiwena Hanasusanto, Daniel Kuhn	To mitigate these small sample effects, we propose a robust data-driven DP scheme, which replaces the expectations in the DP recursions with worst-case expectations over a set of distributions close to the best estimate.
94	Low-Rank Matrix and Tensor Completion via Adaptive Sampling	Akshay Krishnamurthy, Aarti Singh	We study low rank matrix and tensor completion and propose novel algorithms that employ adaptive sampling schemes to obtain strong performance guarantees for these problems.
95	Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms	Adrien Todeschini, François Caron, Marie Chavent	We propose a novel class of algorithms for low rank matrix completion.
96	Distributed Exploration in Multi-Armed Bandits	Eshcar Hillel, Zohar S. Karnin, Tomer Koren, Ronny Lempel, Oren Somekh	On the other extreme, we present an algorithm that achieves the ideal factor $k$ speed-up in learning performance, with communication only logarithmic in $1/\epsilon$.
97	The Pareto Regret Frontier	Wouter M. Koolen	We study which such regret trade-offs can be achieved, and how.
98	Direct 0-1 Loss Minimization and Margin Maximization with Boosting	Shaodan Zhai, Tian Xia, Ming Tan, Shaojun Wang	We propose a boosting method, DirectBoost, a greedy coordinate descent algorithm that builds an ensemble classifier of weak classifiers through directly minimizing empirical classification error over labeled training examples; once the training classification error is reduced to a local coordinatewise minimum, DirectBoost runs a greedy coordinate ascent algorithm that continuously adds weak classifiers to maximize any targeted arbitrarily defined margins until reaching a local coordinatewise maximum of the margins in a certain sense.
99	Regret based Robust Solutions for Uncertain Markov Decision Processes	Asrar Ahmed, Pradeep Varakantham, Yossiri Adulyasak, Patrick Jaillet	In this paper, we seek robust policies for uncertain Markov Decision Processes (MDPs).
100	Speeding up Permutation Testing in Neuroimaging	Chris Hinrichs, Vamsi K. Ithapu, Qinyuan Sun, Sterling C. Johnson, Vikas Singh	In this paper, we observe that permutation testing in fact amounts to populating the columns of a very large matrix P. By analyzing the spectrum of this matrix, under certain conditions, we see that P has a low-rank plus a low-variance residual decomposition, which makes it suitable for highly sub-sampled (on the order of 0.5%) matrix completion methods.
101	Generalized Denoising Auto-Encoders as Generative Models	Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent	We propose here a different attack on the problem, which deals with all these issues: arbitrary (but noisy enough) corruption, arbitrary reconstruction loss (seen as a log-likelihood), handling both discrete and continuous-valued variables, and removing the bias due to non-infinitesimal corruption noise (or non-infinitesimal contractive penalty).
102	Supervised Sparse Analysis and Synthesis Operators	Pablo Sprechmann, Roee Litman, Tal Ben Yakar, Alexander M. Bronstein, Guillermo Sapiro	In this paper, we propose a new and computationally efficient framework for learning sparse models.
103	Low-rank matrix reconstruction and clustering via approximate message passing	Ryosuke Matsushita, Toshiyuki Tanaka	We propose an efficient approximate message passing algorithm, derived from the belief propagation algorithm, to perform the Bayesian inference for matrix reconstruction.
104	Reasoning With Neural Tensor Networks for Knowledge Base Completion	Richard Socher, Danqi Chen, Christopher D. Manning, Andrew Ng	The goal of this paper is to develop a more powerful neural network model suitable for inference over these relationships.
105	Zero-Shot Learning Through Cross-Modal Transfer	Richard Socher, Milind Ganjoo, Christopher D. Manning, Andrew Ng	This work introduces a model that can recognize objects in images even if no training data is available for the object class.
106	Estimating LASSO Risk and Noise Level	Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari	We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling.
107	Learning Adaptive Value of Information for Structured Prediction	David J. Weiss, Ben Taskar	We propose an architecture that uses a rich feedback loop between extraction and prediction.
108	Efficient Online Inference for Bayesian Nonparametric Relational Models	Dae Il Kim, Prem K. Gopalan, David Blei, Erik Sudderth	We introduce a new model for these phenomena, the hierarchical Dirichlet process relational model, which allows nodes to have mixed membership in an unbounded set of communities.
109	Approximate inference in latent Gaussian-Markov models from continuous time observations	Botond Cseke, Manfred Opper, Guido Sanguinetti	We propose an approximate inference algorithm for continuous time Gaussian-Markov process models with both discrete and continuous time likelihoods.
110	Linear Convergence with Condition Number Independent Access of Full Gradients	Lijun Zhang, Mehrdad Mahdavi, Rong Jin	In this paper, we propose to reduce the number of full gradients required by allowing the algorithm to access the stochastic gradients of the objective function.
111	When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements	Divyanshu Vats, Richard Baraniuk	We consider the problem of accurately estimating a high-dimensional sparse vector using a small number of linear measurements that are contaminated by noise.
112	Wavelets on Graphs via Deep Learning	Raif Rustamov, Leonidas J. Guibas	This paper introduces a machine learning framework for constructing graph wavelets that can sparsely represent a given class of signals.
113	Robust Spatial Filtering with Beta Divergence	Wojciech Samek, Duncan Blythe, Klaus-Robert Müller, Motoaki Kawanabe	Inspired by concepts from the field of information geometry we propose a novel approach for robustifying CSP.
114	Convex Relaxations for Permutation Problems	Fajwel Fogel, Rodolphe Jenatton, Francis Bach, Alexandre D’Aspremont	We present numerical experiments on archeological data, Markov chains and gene sequences.
115	High-Dimensional Gaussian Process Bandits	Josip Djolonga, Andreas Krause, Volkan Cevher	In particular, we present the SI-BO algorithm, which leverages recent low-rank matrix recovery techniques to learn the underlying subspace of the unknown function and applies Gaussian Process Upper Confidence sampling for optimization of the function.
116	A memory frontier for complex synapses	Subhaneil Lahiri, Surya Ganguli	To address this, we develop new mathematical theorems elucidating the relationship between the structural organization and memory properties of complex synapses that are themselves molecular networks.
117	Marginals-to-Models Reducibility	Tim Roughgarden, Michael Kearns	We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution.
118	First-order Decomposition Trees	Nima Taghipour, Jesse Davis, Hendrik Blockeel	In this paper, we introduce FO-dtrees, which upgrade propositional dtrees to the first-order level.
119	A Comparative Framework for Preconditioned Lasso Algorithms	Fabian L. Wauthier, Nebojsa Jojic, Michael I. Jordan	In this paper we propose an agnostic, theoretical framework for comparing Preconditioned Lasso algorithms to the Lasso without having to choose $\lambda$.
120	Lasso Screening Rules via Dual Polytope Projection	Jie Wang, Jiayu Zhou, Peter Wonka, Jieping Ye	In this paper, we propose an efficient and effective screening rule via Dual Polytope Projections (DPP), which is mainly based on the uniqueness and nonexpansiveness of the optimal dual solution due to the fact that the feasible set in the dual space is a convex and closed polytope.
121	Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent	Yuening Hu, Jordan L. Boyd-Graber, Hal Daume III, Z. Irene Ying	We present results on both synthetic and real data that show the beta coalescent outperforms Kingman’s coalescent on real datasets and is qualitatively better at capturing data in bushy hierarchies.
122	A Latent Source Model for Nonparametric Time Series Classification	George H. Chen, Stanislav Nikolov, Devavrat Shah	To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a “weighted majority voting” classification rule that can be approximated by a nearest-neighbor classifier.
123	Efficient Optimization for Sparse Gaussian Process Regression	Yanshuai Cao, Marcus A. Brubaker, David J. Fleet, Aaron Hertzmann	We propose an efficient discrete optimization algorithm for selecting a subset of training data to induce sparsity for Gaussian process regression.
124	Lexical and Hierarchical Topic Regression	Viet-An Nguyen, Jordan L. Boyd-Graber, Philip Resnik	Inspired by a two-level theory that unifies agenda setting and ideological framing, we propose supervised hierarchical latent Dirichlet allocation (SHLDA) which jointly captures documents’ multi-level topic structure and their polar response variables.
125	Stochastic Convex Optimization with Multiple Objectives	Mehrdad Mahdavi, Tianbao Yang, Rong Jin	In this paper, we are interested in the development of efficient algorithms for convex optimization problems in the simultaneous presence of multiple objectives and stochasticity in the first-order information.
126	A Kernel Test for Three-Variable Interactions	Dino Sejdinovic, Arthur Gretton, Wicher Bergsma	We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space.
127	Memoized Online Variational Inference for Dirichlet Process Mixture Models	Michael C. Hughes, Erik Sudderth	We present a new algorithm, memoized online variational inference, which scales to very large (yet finite) datasets while avoiding the complexities of stochastic gradient.
128	Designed Measurements for Vector Count Data	Liming Wang, David E. Carlson, Miguel Rodrigues, David Wilcox, Robert Calderbank, Lawrence Carin	We consider design of linear projection measurements for a vector Poisson signal model.
129	Robust Transfer Principal Component Analysis with Rank Constraints	Yuhong Guo	In this paper, we tackle the challenging problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method.
130	Online Learning with Switching Costs and Other Adaptive Adversaries	Nicolò Cesa-Bianchi, Ofer Dekel, Ohad Shamir	We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback.
131	Learning Prices for Repeated Auctions with Strategic Buyers	Kareem Amin, Afshin Rostamizadeh, Umar Syed	We present seller algorithms that are no-regret when the buyer discounts her future surplus — i.e. the buyer prefers showing advertisements to users sooner rather than later.
132	Probabilistic Principal Geodesic Analysis	Miaomiao Zhang, Tom Fletcher	Inspired by probabilistic PCA, we present a latent variable model for PGA that provides a probabilistic framework for factor analysis on manifolds.
133	Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models	Adel Javanmard, Andrea Montanari	We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values.
134	Learning with Noisy Labels	Nagarajan Natarajan, Inderjit S. Dhillon, Pradeep K. Ravikumar, Ambuj Tewari	In this paper, we theoretically study the problem of binary classification in the presence of random classification noise — the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability.
135	Tracking Time-varying Graphical Structure	Erich Kummerfeld, David Danks	In this paper, we present LoSST, a novel, heuristic structure learning algorithm that tracks changes in graphical model structure or parameters in a dynamic, real-time manner.
136	Factorized Asymptotic Bayesian Inference for Latent Feature Models	Kohei Hayashi, Ryohei Fujimaki	This paper extends factorized asymptotic Bayesian (FAB) inference for latent feature models (LFMs).
137	More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server	Qirong Ho, James Cipar, Henggang Cui, Seunghak Lee, Jin Kyu Kim, Phillip B. Gibbons, Garth A. Gibson, Greg Ganger, Eric P. Xing	We propose a parameter server system for distributed ML, which follows a Stale Synchronous Parallel (SSP) model of computation that maximizes the time computational workers spend doing useful work on ML algorithms, while still providing correctness guarantees.
138	Bayesian Estimation of Latently-grouped Parameters in Undirected Graphical Models	Jie Liu, David Page	Posterior inference usually involves calculating intractable terms, and we propose two approximation algorithms, namely a Metropolis-Hastings algorithm with auxiliary variables and a Gibbs sampling algorithm with stripped Beta approximation (Gibbs_SBA).
139	Online Learning with Costly Features and Labels	Navid Zolghadr, Gabor Bartok, Russell Greiner, András György, Csaba Szepesvari	We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback.
140	Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions	Eftychios A. Pnevmatikakis, Liam Paninski	We propose a compressed sensing (CS) calcium imaging framework for monitoring large neuronal populations, where we image randomized projections of the spatial calcium concentration at each timestep, instead of measuring the concentration at individual locations.
141	A Novel Two-Step Method for Cross Language Representation Learning	Min Xiao, Yuhong Guo	In this paper, we propose a two-step representation learning method to bridge the feature spaces of different languages by exploiting a set of parallel bilingual documents.
142	On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations	Tamir Hazan, Subhransu Maji, Tommi Jaakkola	In this paper we describe how MAP inference can be used to sample efficiently from Gibbs distributions.
143	Graphical Models for Inference with Missing Data	Karthika Mohan, Judea Pearl, Jin Tian	We address the problem of deciding whether there exists a consistent estimator of a given relation Q, when data are missing not at random.
144	Reshaping Visual Datasets for Domain Adaptation	Boqing Gong, Kristen Grauman, Fei Sha	We extensively evaluate our approach on object recognition and human activity recognition tasks.
145	Statistical Active Learning Algorithms	Maria-Florina F. Balcan, Vitaly Feldman	We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise.
146	Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits	Ben Shababo, Brooks Paige, Ari Pakman, Liam Paninski	We present a realistic statistical model which accounts for the main sources of variability in this experiment and allows for large amounts of information about the biological system to be incorporated if available.
147	Reflection methods for user-friendly submodular optimization	Stefanie Jegelka, Francis Bach, Suvrit Sra	It is solved through a sequence of reflections and its solution can be automatically thresholded to obtain an optimal discrete solution.
148	Unsupervised Structure Learning of Stochastic And-Or Grammars	Kewei Tu, Maria Pavlovskaia, Song-Chun Zhu	We present a unified formalization of stochastic And-Or grammars that is agnostic to the type of the data being modeled, and propose an unsupervised approach to learning the structures as well as the parameters of such grammars.
149	Convex Tensor Decomposition via Structured Schatten Norm Regularization	Ryota Tomioka, Taiji Suzuki	We propose a new class of structured Schatten norms for tensors that includes two recently proposed norms (“overlapped” and “latent”) for convex-optimization-based tensor decomposition.
150	Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs	Yann Dauphin, Yoshua Bengio	To generalize this idea to RBMs, we propose a stochastic ratio-matching algorithm that inherits all the computational advantages and unbiasedness of the importance sampling scheme.
151	Learning Chordal Markov Networks by Constraint Satisfaction	Jukka Corander, Tomi Janhunen, Jussi Rintanen, Henrik Nyman, Johan Pensar	We investigate the problem of learning the structure of a Markov network from data.
152	Parametric Task Learning	Ichiro Takeuchi, Tatsuya Hongo, Masashi Sugiyama, Shinichi Nakajima	We introduce a novel formulation of multi-task learning (MTL) called parametric task learning (PTL) that can systematically handle infinitely many tasks parameterized by a continuous parameter.
153	A Deep Architecture for Matching Short Texts	Zhengdong Lu, Hang Li	In this paper, we propose a new deep architecture to more effectively model the complicated matching relations between two objects from heterogeneous domains.
154	Computing the Stationary Distribution Locally	Christina E. Lee, Asuman Ozdaglar, Devavrat Shah	In this paper, we provide a novel algorithm that answers whether a chosen state in a MC has stationary probability larger than some $\Delta \in (0,1)$.
155	Nonparametric Multi-group Membership Model for Dynamic Networks	Myunghwan Kim, Jure Leskovec	We propose a nonparametric multi-group membership model for dynamic networks.
156	Adaptive Step-Size for Policy Gradient Methods	Matteo Pirotta, Marcello Restelli, Luca Bascetta	In this paper, we propose to determine the learning rate by maximizing a lower bound to the expected performance gain.
157	Optimistic Concurrency Control for Distributed Unsupervised Learning	Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, Michael I. Jordan	We demonstrate our approach in three problem areas: clustering, feature learning and online facility location.
158	Reservoir Boosting : Between Online and Offline Ensemble Learning	Leonidas Lefakis, François Fleuret	We propose to train an ensemble with the help of a reservoir in which the learning algorithm can store a limited number of samples.
159	Multiclass Total Variation Clustering	Xavier Bresson, Thomas Laurent, David Uminsky, James von Brecht	This paper presents a general framework for multiclass total variation clustering that does not rely on recursion.
160	Approximate Inference in Continuous Determinantal Processes	Raja Hafiz Affandi, Emily Fox, Ben Taskar	In this paper, we present efficient approximate DPP sampling schemes based on Nystrom and random Fourier feature approximations that apply to a wide range of kernel functions.
161	Global Solver and Its Efficient Approximation for Variational Bayesian Low-rank Subspace Clustering	Shinichi Nakajima, Akiko Takeda, S. Derin Babacan, Masashi Sugiyama, Ichiro Takeuchi	In this paper, we overcome this difficulty for low-rank subspace clustering (LRSC) by providing an exact global solver and its efficient approximation.
162	Thompson Sampling for 1-Dimensional Exponential Family Bandits	Nathaniel Korda, Emilie Kaufmann, Remi Munos	Here we extend them by proving asymptotic optimality of the algorithm using the Jeffreys prior for $1$-dimensional exponential family bandits.
163	Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion	Nguyen Viet Cuong, Wee Sun Lee, Nan Ye, Kian Ming A. Chai, Hai Leong Chieu	We introduce a new objective function for pool-based Bayesian active learning with probabilistic hypotheses.
164	It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals	Barbara Rakitsch, Christoph Lippert, Karsten Borgwardt, Oliver Stegle	Here, we propose a multi-task Gaussian process approach to model both the relatedness between regressors as well as the task correlations in the residuals, in order to more accurately identify true sharing between regressors.
165	Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses	Harish G. Ramaswamy, Shivani Agarwal, Ambuj Tewari	We give an explicit construction of a convex least-squares type surrogate loss that can be designed to be calibrated for any multiclass learning problem for which the target loss matrix has a low-rank structure; the surrogate loss operates on a surrogate target space of dimension at most the rank of the target loss.
166	Inverse Density as an Inverse Problem: the Fredholm Equation Approach	Qichao Que, Mikhail Belkin	We address the problem of estimating the ratio $\frac{q}{p}$ where $p$ is a density function and $q$ is another density, or, more generally an arbitrary function.
167	Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising	Forest Agostinelli, Michael R. Anderson, Honglak Lee	We present the multi-column stacked sparse denoising autoencoder, a novel technique of combining multiple SSDAs into a multi-column SSDA (MC-SSDA) by combining the outputs of each SSDA.
168	EDML for Learning Parameters in Directed and Undirected Graphical Models	Khaled S. Refaat, Arthur Choi, Adnan Darwiche	In this paper, we propose a greatly simplified perspective on EDML, which casts it as a general approach to continuous optimization.
169	Similarity Component Analysis	Soravit Changpinyo, Kuan Liu, Fei Sha	In this paper, we propose Similarity Component Analysis (SCA), a probabilistic graphical model that discovers those latent components from data.
170	Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs	Vikash K. Mansinghka, Tejas D. Kulkarni, Yura N. Perov, Josh Tenenbaum	We describe two applications: reading sequences of degraded and adversarially obscured alphanumeric characters, and inferring 3D road models from vehicle-mounted camera images.
171	Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation	John Duchi, Martin J. Wainwright, Michael I. Jordan	We provide a detailed study of the estimation of probability distributions—discrete and continuous—in a stringent setting in which data is kept private even from the statistician.
172	Firing rate predictions in optimal balanced networks	David G. Barrett, Sophie Denève, Christian K. Machens	This is an important problem because firing rates are one of the most important measures of network activity, in both the study of neural computation and neural network dynamics.
173	Manifold-based Similarity Adaptation for Label Propagation	Masayuki Karasuyama, Hiroshi Mamitsuka	We propose a method for a graph to capture the manifold structure of input features using edge weights parameterized by a similarity function.
174	Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty	Haichao Zhang, David Wipf	Using ideas from Bayesian inference and convex analysis, this paper derives a non-uniform blind deblurring algorithm with several desirable, yet previously-unexplored attributes.
175	Near-Optimal Entrywise Sampling for Data Matrices	Dimitris Achlioptas, Zohar S. Karnin, Edo Liberty	We consider the problem of independently sampling $s$ non-zero entries of a matrix $A$ in order to produce a sparse sketch of it, $B$, that minimizes $\|A-B\|_2$.
176	Learning to Prune in Metric and Non-Metric Spaces	Leonid Boytsov, Bilegsaikhan Naidan	We employ a VP-tree and explore two simple yet effective learning-to-prune approaches: density estimation through sampling and “stretching” of the triangle inequality.
177	Online learning in episodic Markovian decision processes by relative entropy policy search	Alexander Zimin, Gergely Neu	We study the problem of online learning in finite episodic Markov decision processes where the loss function is allowed to change between episodes.
178	Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result	Paul Wagner	As our second main result, we show for a substantial subset of soft-greedy value function approaches that, while having the potential to avoid policy oscillation and policy chattering, this subset can never converge toward any optimal policy, except in a certain pathological case.
179	Bayesian Hierarchical Community Discovery	Charles Blundell, Yee Whye Teh	We propose an efficient Bayesian nonparametric model for discovering hierarchical community structure in social networks.
180	From Bandits to Experts: A Tale of Domination and Independence	Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour	We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir (2011).
181	Predictive PAC Learning and Process Decompositions	Cosma Shalizi, Aryeh Kontorovich	In this paper, we argue that it is natural in predictive PAC to condition not on the past observations but on the mixture component of the sample path.
182	Pass-efficient unsupervised feature selection	Crystal Maung, Haim Schweitzer	We propose a new algorithm, a modification of the classical pivoted QR algorithm of Businger and Golub, that requires a small number of passes over the data.
183	Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors	Xiaoqin Zhang, Di Wang, Zhengyuan Zhou, Yi Ma	In this work, we propose a general method for recovering low-rank three-order tensors, in which the data can be deformed by some unknown transformation and corrupted by arbitrary sparse errors.
184	Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search	Aijun Bai, Feng Wu, Xiaoping Chen	In this paper we present a novel Bayesian mixture modelling and inference based Thompson sampling approach to addressing this dilemma.
185	Solving inverse problem of Markov chain with partial observations	Tetsuro Morimura, Takayuki Osogami, Tsuyoshi Ide	We formulate this task as a regularized optimization problem for probability functions, which is efficiently solved using the notion of natural gradient.
186	Locally Adaptive Bayesian Multivariate Time Series	Daniele Durante, Bruno Scarpa, David B. Dunson	We propose a continuous multivariate stochastic process for time series having locally varying smoothness in both the mean and covariance matrix.
187	Mapping paradigm ontologies to and from the brain	Yannick Schwartz, Bertrand Thirion, Gael Varoquaux	To that end, we propose a method that predicts the experimental paradigms across different studies.
188	Noise-Enhanced Associative Memories	Amin Karbasi, Amir Hesam Salavati, Amin Shokrollahi, Lav R. Varshney	Here we consider associative memories with noisy internal computations and analytically characterize performance.
189	Exact and Stable Recovery of Pairwise Interaction Tensors	Shouyuan Chen, Michael R. Lyu, Irwin King, Zenglin Xu	In this paper, we study the recovery algorithm for pairwise interaction tensors, which has recently gained considerable attention for modeling multiple attribute data due to its simplicity and effectiveness.
190	Bayesian entropy estimation for binary spike train data using parametric prior knowledge	Evan W. Archer, Il Memming Park, Jonathan W. Pillow	The parametric model captures high-level statistical features of the data, such as the average spike count in a spike word, which allows the posterior over entropy to concentrate more rapidly than with standard estimators (e.g., in cases where the probability of spiking differs strongly from 0.5).
191	Perfect Associative Learning with Spike-Timing-Dependent Plasticity	Christian Albers, Maren Westkott, Klaus Pawelzik	Recent extensions of the Perceptron, such as the Tempotron, suggest that this theoretical concept is also highly relevant for understanding networks of spiking neurons in the brain.
192	On Poisson Graphical Models	Eunho Yang, Pradeep K. Ravikumar, Genevera I. Allen, Zhandong Liu	In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables.
193	Streaming Variational Bayes	Tamara Broderick, Nicholas Boyd, Andre Wibisono, Ashia C. Wilson, Michael I. Jordan	We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior.
194	Gaussian Process Conditional Copulas with Applications to Financial Time Series	José Miguel Hernández-Lobato, James R. Lloyd, Daniel Hernández-Lobato	To account for this, a Bayesian framework for the estimation of conditional copulas is proposed.
195	Extracting regions of interest from biological images with convolutional sparse block coding	Marius Pachitariu, Adam M. Packer, Noah Pettit, Henry Dalgleish, Michael Hausser, Maneesh Sahani	Formally, the model can be described as convolutional sparse block coding.
196	Approximate Dynamic Programming Finally Performs Well in the Game of Tetris	Victor Gabillon, Mohammad Ghavamzadeh, Bruno Scherrer	In this paper, we put our conjecture to test by applying such an ADP algorithm, called classification-based modified policy iteration (CBMPI), to the game of Tetris.
197	Third-Order Edge Statistics: Contour Continuation, Curvature, and Cortical Connections	Matthew Lawlor, Steven W. Zucker	Association field models have been used to explain human contour grouping performance and to explain the mean frequency of long-range horizontal connections across cortical columns in V1.
198	DESPOT: Online POMDP Planning with Regularization	Adhiraj Somani, Nan Ye, David Hsu, Wee Sun Lee	This paper presents an online lookahead search algorithm that alleviates these difficulties by limiting the search to a set of sampled scenarios.
199	Matrix Completion From any Given Set of Observations	Troy Lee, Adi Shraibman	We present a means to obtain performance guarantees with respect to any set of initial observations.
200	Regression-tree Tuning in a Streaming Setting	Samory Kpotufe, Francesco Orabona	We consider the problem of maintaining the data-structures of a partition-based regression procedure in a setting where the training data arrives sequentially over time.
201	Multiscale Dictionary Learning for Estimating Conditional Distributions	Francesca Petralia, Joshua T. Vogelstein, David B. Dunson	We propose a multiscale dictionary learning model, which expresses the conditional response density as a convex combination of dictionary densities, with the densities used and their weights dependent on the path through a tree decomposition of the feature space.
202	Dimension-Free Exponentiated Gradient	Francesco Orabona	We present a new online learning algorithm that extends the exponentiated gradient to infinite dimensional spaces.
203	Stochastic Optimization of PCA with Capped MSG	Raman Arora, Andy Cotter, Nati Srebro	We study PCA as a stochastic optimization problem and propose a novel stochastic approximation algorithm which we refer to as “Matrix Stochastic Gradient” (MSG), as well as a practical variant, Capped MSG.
204	On Flat versus Hierarchical Classification in Large-Scale Taxonomies	Rohit Babbar, Ioannis Partalas, Eric Gaussier, Massih R. Amini	We study in this paper flat and hierarchical classification strategies in the context of large-scale taxonomies. To this end, we first propose a multiclass, hierarchical data dependent bound on the generalization error of classifiers deployed in large-scale taxonomies.
205	Learning Gaussian Graphical Models with Observed or Latent FVSs	Ying Liu, Alan Willsky	In this paper, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles.
206	Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies	Yangqing Jia, Joshua T. Abbott, Joseph L. Austerweil, Tom Griffiths, Trevor Darrell	We present an algorithm for learning visual concepts directly from images, using probabilistic predictions generated by visual classifiers as the input to a Bayesian generalization model.
207	Robust Bloom Filters for Large MultiLabel Classification Tasks	Moustapha M. Cisse, Nicolas Usunier, Thierry Artières, Patrick Gallinari	This paper presents an approach to multilabel classification (MLC) with a large number of labels.
208	Solving the multi-way matching problem by permutation synchronization	Deepti Pachauri, Risi Kondor, Vikas Singh	In contrast, we propose a new method, permutation synchronization, which finds all the matchings jointly, in one shot, via a relaxation to eigenvector decomposition.
209	Generalizing Analytic Shrinkage for Arbitrary Covariance Structures	Daniel Bartz, Klaus-Robert Müller	We show that the proof of consistency implies bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data.
210	Top-Down Regularization of Deep Belief Networks	Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim	We propose to implement the scheme using a method to regularize deep belief networks with top-down information.
211	Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions	Tamir Hazan, Subhransu Maji, Joseph Keshet, Tommi Jaakkola	In this work we develop efficient methods for learning random MAP predictors for structured label problems.
212	Heterogeneous-Neighborhood-based Multi-Task Local Learning Algorithms	Yu Zhang	In this paper, different from existing methods, we propose local learning methods for multi-task classification and regression problems based on a heterogeneous neighborhood, which is defined on data points from all tasks.
213	Machine Teaching for Bayesian Learners in the Exponential Family	Jerry Zhu	We propose an optimal teaching framework aimed at learners who employ Bayesian models.
214	Scoring Workers in Crowdsourcing: How Many Control Questions are Enough?	Qiang Liu, Alexander T. Ihler, Mark Steyvers	We study the problem of estimating continuous quantities, such as prices, probabilities, and point spreads, using a crowdsourcing approach.
215	Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths	Stefan Mathe, Cristian Sminchisescu	Our work makes three contributions towards addressing this problem.
216	A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data	Jasper Snoek, Richard Zemel, Ryan P. Adams	We develop a novel model based on a determinantal point process over latent embeddings of neurons that effectively captures and helps visualize complex inhibitory and competitive interaction.
217	Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model	Fang Han, Han Liu	In this paper we focus on the principal component regression and its application to high dimension non-Gaussian data.
218	Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation	Bogdan Savchynskyy, Jörg Hendrik Kappes, Paul Swoboda, Christoph Schnörr	We propose a novel method of combining combinatorial and convex programming techniques to obtain a global solution of the initial combinatorial problem.
219	Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic	James L. Sharpnack, Akshay Krishnamurthy, Aarti Singh	In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise.
220	Demixing odors – fast inference in olfaction	Agnieszka Grabska-Barwinska, Jeff Beck, Alexandre Pouget, Peter Latham	Here we derive neural implementations of two approximate inference algorithms that could be used by the brain.
221	Learning Multiple Models via Regularized Weighting	Daniel Vainsencher, Shie Mannor, Huan Xu	We propose a different general formulation that seeks for each model a distribution over data points; the weights are regularized to be sufficiently spread out.
222	When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity	Anima Anandkumar, Daniel J. Hsu, Majid Janzamin, Sham M. Kakade	In this paper, we specify which overcomplete models can be identified given observable moments of a certain order.
223	Distributed $k$-means and $k$-median Clustering on General Topologies	Maria-Florina F. Balcan, Steven Ehrlich, Yingyu Liang	This paper provides new algorithms for distributed clustering for two popular center-based objectives, $k$-median and $k$-means.
224	Multi-Task Bayesian Optimization	Kevin Swersky, Jasper Snoek, Ryan P. Adams	In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently.
225	Online Learning of Dynamic Parameters in Social Networks	Shahin Shahrampour, Sasha Rakhlin, Ali Jadbabaie	Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state.
226	A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles	Jinwoo Shin, Andrew E. Gelfand, Misha Chertkov	In this paper, we design a BP algorithm for the Maximum Weight Matching (MWM) problem over general graphs.
227	Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space	Xinhua Zhang, Wee Sun Lee, Yee Whye Teh	In this paper, we propose a framework for learning in reproducing kernel Hilbert spaces (RKHS) using local invariances that explicitly characterize the behavior of the target function around data instances.
228	Approximate Gaussian process inference for the drift function in stochastic differential equations	Andreas Ruttor, Philipp Batz, Manfred Opper	We introduce a nonparametric approach for estimating drift functions in systems of stochastic differential equations from incomplete observations of the state vector.
229	Distributed Submodular Maximization: Identifying Representative Elements in Massive Data	Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, Andreas Krause	In this paper, we consider the problem of submodular function maximization in a distributed fashion.
230	Adaptive Market Making via Online Learning	Jacob Abernethy, Satyen Kale	We propose a class of spread-based market making strategies whose performance can be controlled even under worst-case (adversarial) settings.
231	On the Sample Complexity of Subspace Learning	Alessandro Rudi, Guillermo D. Canas, Lorenzo Rosasco	In this paper we introduce a general formulation of this problem and derive novel learning error estimates.
232	Spike train entropy-rate estimation using hierarchical Dirichlet process priors	Karin C. Knudson, Jonathan W. Pillow	We present both a fully Bayesian and empirical Bayes entropy rate estimator based on this model, and demonstrate their performance on simulated and real neural spike train data.
233	Embed and Project: Discrete Sampling with Universal Hashing	Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman	We propose a sampling algorithm, called PAWS, based on embedding the set into a higher-dimensional space which is then randomly projected using universal hash functions to a lower-dimensional subspace and explored using combinatorial search methods.
234	Discriminative Transfer Learning with Tree-based Priors	Nitish Srivastava, Ruslan R. Salakhutdinov	This paper proposes a way of improving classification performance for classes which have very few training examples.
235	Small-Variance Asymptotics for Hidden Markov Models	Anirban Roychowdhury, Ke Jiang, Brian Kulis	We present a small-variance asymptotic analysis of the Hidden Markov Model and its infinite-state Bayesian nonparametric extension.
236	Convergence of Monte Carlo Tree Search in Simultaneous Move Games	Viliam Lisy, Vojta Kovarik, Marc Lanctot, Branislav Bosansky	In this paper, we study Monte Carlo tree search (MCTS) in zero-sum extensive-form games with perfect information and simultaneous moves.
237	DeViSE: A Deep Visual-Semantic Embedding Model	Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc’Aurelio Ranzato, Tomas Mikolov	In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data as well as semantic information gleaned from unannotated text.
238	Reward Mapping for Transfer in Long-Lived Agents	Xiaoxiao Guo, Satinder Singh, Richard L. Lewis	We consider how to transfer knowledge from previous tasks to a current task in long-lived and bounded agents that must solve a sequence of MDPs over a finite lifetime.
239	Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation	Martin Azizyan, Aarti Singh, Larry Wasserman	In this paper, we provide precise information theoretic bounds on the clustering accuracy and sample complexity of learning a mixture of two isotropic Gaussians in high dimensions under small mean separation.
240	Predicting Parameters in Deep Learning	Misha Denil, Babak Shakibi, Laurent Dinh, Marc’Aurelio Ranzato, Nando de Freitas	We demonstrate that there is significant redundancy in the parameterization of several deep learning models.
241	Estimating the Unseen: Improved Estimators for Entropy and other Properties	Paul Valiant, Gregory Valiant	We propose a novel modification of this approach and show: 1) theoretically, our estimator is optimal (to constant factors, over worst-case instances), and 2) in practice, it performs exceptionally well for a variety of estimation tasks, on a variety of natural distributions, for a wide range of parameters.
242	What do row and column marginals reveal about your dataset?	Behzad Golshan, John Byers, Evimaria Terzi	Here, we investigate how these data can be exploited to make inferences about the underlying matrix H. Instead of assuming a generative model for H, we view the input marginals as constraints on the dataspace of possible realizations of H and compute the probability density function of particular entries H(i,j) of interest.
243	RNADE: The real-valued neural autoregressive density-estimator	Benigno Uria, Iain Murray, Hugo Larochelle	We introduce RNADE, a new model for joint density estimation of real-valued vectors.
244	Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards	Thomas Bonald, Alexandre Proutiere	We propose a novel algorithm where the decision to exploit any arm is based on two successive targets, namely, the total number of successes until the first failure and the first $m$ failures, respectively, where $m$ is a fixed parameter.
245	Reconciling "priors" & "priors" without prejudice?	Remi Gribonval, Pierre Machart	The contribution of this paper is twofold.
246	Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis	Nikhil Rao, Christopher Cox, Rob Nowak, Timothy T. Rogers	The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization that automatically selects similar features for related learning tasks.
247	Sensor Selection in High-Dimensional Gaussian Trees with Nuisances	Daniel S. Levine, Jonathan P. How	We consider the sensor selection problem on multivariate Gaussian distributions where only a subset of latent variables is of inferential interest.
248	Sequential Transfer in Multi-armed Bandit with Finite Set of Models	Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill	We introduce a novel bandit algorithm based on a method-of-moments approach for the estimation of the possible tasks and derive regret bounds for it.
249	Buy-in-Bulk Active Learning	Liu Yang, Jaime Carbonell	In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed.
250	Contrastive Learning Using Spectral Methods	James Y. Zou, Daniel J. Hsu, David C. Parkes, Ryan P. Adams	This paper formalizes this notion of contrastive learning for mixture models, and develops spectral algorithms for inferring mixture components specific to a foreground data set when contrasted with a background data set.
251	Message Passing Inference with Chemical Reaction Networks	Nils E. Napp, Ryan P. Adams	In this work, we develop a procedure that can take arbitrary probabilistic graphical models, represented as factor graphs over discrete random variables, and compile them into chemical reaction networks that implement inference.
252	Eluder Dimension and the Sample Complexity of Optimistic Exploration	Daniel Russo, Benjamin Van Roy	In this paper, we develop a regret bound that holds for both classes of algorithms.
253	Learning word embeddings efficiently with noise-contrastive estimation	Andriy Mnih, Koray Kavukcuoglu	We propose a simple and scalable new approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation.
254	Sparse Inverse Covariance Estimation with Calibration	Tuo Zhao, Han Liu	We propose a semiparametric procedure for estimating high dimensional sparse inverse covariance matrix.
255	Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization	Julien Mairal	In this paper, we intend to make this principle scalable.
256	Sinkhorn Distances: Lightspeed Computation of Optimal Transport	Marco Cuturi	We propose in this work a new family of optimal transportation distances that look at transportation problems from a maximum-entropy perspective.
257	Speedup Matrix Completion with Side Information: Application to Multi-Label Learning	Miao Xu, Rong Jin, Zhi-Hua Zhou	In this work, we develop a novel theory of matrix completion that explicitly explores the side information to reduce the requirement on the number of observed entries.
258	Compete to Compute	Rupesh K. Srivastava, Jonathan Masci, Sohrob Kazerounian, Faustino Gomez, Jürgen Schmidhuber	We apply the concept to gradient-based, backprop-trained artificial multilayer NNs.
259	Fast Determinantal Point Process Sampling with Application to Clustering	Byungkon Kang	In this paper, we address this problem by constructing a rapidly mixing Markov chain, from which we can acquire a sample from the given DPP in sub-cubic time.
260	Information-theoretic lower bounds for distributed statistical estimation with communication constraints	Yuchen Zhang, John Duchi, Michael I. Jordan, Martin J. Wainwright	We establish minimax risk lower bounds for distributed statistical estimation given a budget $B$ of the total number of bits that may be communicated.
261	Projected Natural Actor-Critic	Philip S. Thomas, William C. Dabney, Stephen Giguere, Sridhar Mahadevan	In this paper we address a drawback of natural actor-critics that limits their real-world applicability – their lack of safety guarantees.
262	How to Hedge an Option Against an Adversary: Black-Scholes Pricing is Minimax Optimal	Jacob Abernethy, Peter L. Bartlett, Rafael Frongillo, Andre Wibisono	We consider a popular problem in finance, option pricing, through the lens of an online learning game between Nature and an Investor.
263	Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests	Yacine Jernite, Yonatan Halpern, David Sontag	We give a polynomial-time algorithm for provably learning the structure and parameters of bipartite noisy-or Bayesian networks of binary variables where the top layer is completely hidden.
264	Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion	Franz Kiraly, Louis Theran	We propose a general framework for reconstructing and denoising single entries of incomplete and noisy entries.
265	Learning the Local Statistics of Optical Flow	Dan Rosenbaum, Daniel Zoran, Yair Weiss	Motivated by recent progress in natural image statistics, we use newly available datasets with ground truth optical flow to learn the local statistics of optical flow and rigorously compare the learned model to prior models assumed by computer vision optical flow algorithms.
266	Aggregating Optimistic Planning Trees for Solving Markov Decision Processes	Gunnar Kedenburg, Raphael Fonteneau, Remi Munos	We propose a new algorithm which is based on the construction of a forest of single successor state planning trees.
267	Robust learning of low-dimensional dynamics from large neural ensembles	David Pfau, Eftychios A. Pnevmatikakis, Liam Paninski	Here, we present an approach to dimensionality reduction for neural data that is convex, does not make strong assumptions about dynamics, does not require averaging over many trials and is extensible to more complex statistical models that combine local and global influences.
268	Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising	Min Xu, Tao Qin, Tie-Yan Liu	In this paper, we show that the naive application of MAB algorithms to search advertising for advertisement selection will produce sample selection bias that harms the search engine by decreasing expected revenue and “estimation of the largest mean” (ELM) bias that harms the advertisers by increasing game-theoretic player-regret.
269	Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization	Nataliya Shapovalova, Michalis Raptis, Leonid Sigal, Greg Mori	We propose a new weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video.
270	A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables	Jing Xiang, Seyoung Kim	In this paper, we propose a single-stage method, called A* lasso, that recovers the optimal sparse Bayesian network structure by solving a single optimization problem with A* search algorithm that uses lasso in its scoring system.
271	The Total Variation on Hypergraphs – Learning on Hypergraphs Revisited	Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram	In this paper we present a new learning framework on hypergraphs which fully uses the hypergraph structure.
272	Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints	Rishabh K. Iyer, Jeff A. Bilmes	We investigate two new optimization problems — minimizing a submodular function subject to a submodular lower bound constraint (submodular cover) and maximizing a submodular function subject to a submodular upper bound constraint (submodular knapsack).
273	Scalable Inference for Logistic-Normal Topic Models	Jianfei Chen, Jun Zhu, Zi Wang, Xun Zheng, Bo Zhang	This paper presents a partially collapsed Gibbs sampling algorithm that approaches the provably correct distribution by exploring the ideas of data augmentation.
274	Spectral methods for neural characterization using generalized quadratic models	Il Memming Park, Evan W. Archer, Nicholas Priebe, Jonathan W. Pillow	We describe a set of fast, tractable methods for characterizing neural responses to high-dimensional sensory stimuli using a model we refer to as the generalized quadratic model (GQM).
275	Universal models for binary spike patterns using centered Dirichlet processes	Il Memming Park, Evan W. Archer, Kenneth Latimer, Jonathan W. Pillow	To overcome these limitations, we propose a family of “universal” models for binary spike patterns, where universality refers to the ability to model arbitrary distributions over all $2^m$ binary patterns.
276	Synthesizing Robust Plans under Incomplete Domain Models	Tuan A. Nguyen, Subbarao Kambhampati, Minh Do	Most current planners assume complete domain models and focus on generating correct plans.
277	Integrated Non-Factorized Variational Inference	Shaobo Han, Xuejun Liao, Lawrence Carin	We present a non-factorized variational method for full posterior inference in Bayesian hierarchical models, with the goal of capturing the posterior variable dependencies via efficient and possibly parallel computation.
278	Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions	Ari Pakman, Liam Paninski	We present a new approach to sample from generic binary distributions, based on an exact Hamiltonian Monte Carlo algorithm applied to a piecewise continuous augmentation of the binary distribution of interest.
279	Symbolic Opportunistic Policy Iteration for Factored-Action MDPs	Aswin Raghavan, Roni Khardon, Alan Fern, Prasad Tadepalli	Our first contribution is a novel method for symbolic policy backups via the application of constraints, which is used to yield a new efficient symbolic implementation of modified PI (MPI) for factored action spaces.
280	Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions	Yasin Abbasi, Peter L. Bartlett, Varun Kanade, Yevgeny Seldin, Csaba Szepesvari	We present an algorithm that, under a mixing assumption, achieves $O(\sqrt{T\log|\Pi|}+\log|\Pi|)$ regret with respect to a comparison set of policies $\Pi$.
281	Flexible sampling of discrete data correlations without the marginal distributions	Alfredo Kalaitzis, Ricardo Silva	We present an efficient algorithm based on recent advances on constrained Hamiltonian Markov chain Monte Carlo that is simple to implement and does not require paying for a quadratic cost in sample size.
282	One-shot learning by inverting a compositional causal process	Brenden M. Lake, Ruslan R. Salakhutdinov, Josh Tenenbaum	Here we present a Hierarchical Bayesian model based on compositionality and causality that can learn a wide range of natural (although simple) visual concepts, generalizing in human-like ways from just one image.
283	Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators	Michel Besserve, Nikos K. Logothetis, Bernhard Schölkopf	Here we provide a general framework for the statistical analysis of these interactions when random variables are sampled from stationary time-series of arbitrary objects.
284	Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis	James R. Voss, Luis Rademacher, Mikhail Belkin	The two main contributions of this work are as follows: 1.
285	Deep Neural Networks for Object Detection	Christian Szegedy, Alexander Toshev, Dumitru Erhan	In this paper we go one step further and address the problem of object detection — not only classifying but also precisely localizing objects of various classes using DNNs.
286	Geometric optimisation on positive definite matrices for elliptically contoured distributions	Suvrit Sra, Reshad Hosseini	In this paper we develop geometric optimisation for globally optimising certain nonconvex loss functions arising in the modelling of data via elliptically contoured distributions (ECDs).
287	Sign Cauchy Projections and Chi-Square Kernel	Ping Li, Gennady Samorodnitsky, John Hopcroft	In this paper, we propose to use only the signs of the projected data and show that the probability of collision (i.e., when the two signs differ) can be accurately approximated as a function of the chi-square ($\chi^2$) similarity, which is a popular measure for nonnegative data (e.g., when features are generated from histograms as common in text and vision applications).
288	Relevance Topic Model for Unstructured Social Group Activity Recognition	Fang Zhao, Yongzhen Huang, Liang Wang, Tieniu Tan	To tackle this problem, we propose a “relevance topic model” for jointly learning meaningful mid-level representations upon bag-of-words (BoW) video representations and a classifier with sparse weights.
289	k-Prototype Learning for 3D Rigid Structures	Hu Ding, Ronald Berezney, Jinhui Xu	In this paper, we study the following new variant of prototype learning, called the $k$-prototype learning problem for 3D rigid structures: Given a set of 3D rigid structures, find a set of $k$ rigid structures so that each of them is a prototype for a cluster of the given rigid structures and the total cost (or dissimilarity) is minimized.
290	Restricting exchangeable nonparametric distributions	Sinead A. Williamson, Steve N. MacEachern, Eric P. Xing	In this paper, we propose a class of exchangeable nonparametric priors obtained by restricting the domain of existing models.
291	Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting	Shunan Zhang, Angela J. Yu	We investigate this behavior in the context of a multi-armed bandit task.
292	Probabilistic Movement Primitives	Alexandros Paraschos, Christian Daniel, Jan R. Peters, Gerhard Neumann	We present a probabilistic formulation of the MP concept that maintains a distribution over trajectories.
293	Policy Shaping: Integrating Human Feedback with Reinforcement Learning	Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles L. Isbell, Andrea L. Thomaz	In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping.
294	Multilinear Dynamical Systems for Tensor Time Series	Mark Rogers, Lei Li, Stuart J. Russell	In this paper, we propose the multilinear dynamical system (MLDS) for modeling tensor time series and an expectation-maximization (EM) algorithm to estimate the parameters.
295	Deep content-based music recommendation	Aaron van den Oord, Sander Dieleman, Benjamin Schrauwen	In this paper, we propose to use a latent factor model for recommendation, and predict the latent factors from music audio when they cannot be obtained from usage data.
296	A Stability-based Validation Procedure for Differentially Private Machine Learning	Kamalika Chaudhuri, Staal A. Vinterbo	In this paper, we introduce a generic validation procedure for differentially private machine learning algorithms that apply when a certain stability condition holds on the training algorithm and the validation performance metric.
297	Capacity of strong attractor patterns to model behavioural and cognitive prototypes	Abbas Edalat	We solve the mean field equations for a stochastic Hopfield network with temperature (noise) in the presence of strong, i.e., multiply stored patterns, and use this solution to obtain the storage capacity of such a network.
298	Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA	Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe	We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-$d$ projection matrices (the Fantope).
299	Cluster Trees on Manifolds	Sivaraman Balakrishnan, Srivatsan Narayanan, Alessandro Rinaldo, Aarti Singh, Larry Wasserman	We investigate the problem of estimating the cluster tree for a density $f$ supported on or near a smooth $d$-dimensional manifold $M$ isometrically embedded in $\mathbb{R}^D$.
300	Bayesian inference for low rank spatiotemporal neural receptive fields	Mijung Park, Jonathan W. Pillow	In particular, we introduce a novel prior over low-rank RFs using the restriction of a matrix normal prior to the manifold of low-rank matrices.
301	Adaptive Submodular Maximization in Bandit Setting	Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan	We propose an efficient algorithm for solving our problem and prove that its expected cumulative regret increases logarithmically with time.
302	Generalized Method-of-Moments for Rank Aggregation	Hossein Azari Soufiani, William Chen, David C. Parkes, Lirong Xia	In this paper we propose a class of efficient Generalized Method-of-Moments (GMM) algorithms for computing parameters of the Plackett-Luce model, where the data consists of full rankings over alternatives.
303	Analyzing Hogwild Parallel Gaussian Gibbs Sampling	Matthew J. Johnson, James Saunderson, Alan Willsky	We develop a framework which provides convergence conditions and error bounds along with simple proofs and connections to methods in numerical linear algebra.
304	Minimax Optimal Algorithms for Unconstrained Linear Optimization	Brendan McMahan, Jacob Abernethy	We design and analyze minimax-optimal algorithms for online linear optimization games where the player’s choice is unconstrained.
305	(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings	Abhradeep Guha Thakurta, Adam Smith	We provide a general technique for making online learning algorithms differentially private, in both the full information and bandit settings.
306	Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions	Rishabh K. Iyer, Stefanie Jegelka, Jeff A. Bilmes	In the former two problems, we obtain these bounds through a generic black-box transformation (which can potentially work for any algorithm), while in the case of submodular minimization, we propose a framework of algorithms which depend on choosing an appropriate surrogate for the submodular function.
307	Σ-Optimality for Active Learning on Gaussian Random Fields	Yifei Ma, Roman Garnett, Jeff Schneider	In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs.
308	Learning Kernels Using Local Rademacher Complexity	Corinna Cortes, Marius Kloft, Mehryar Mohri	We devise two new learning kernel algorithms: one based on a convex optimization problem for which we give an efficient solution using existing learning kernel techniques, and another one that can be formulated as a DC-programming problem for which we describe a solution in detail.
309	Annealing between distributions by averaging moments	Roger B. Grosse, Chris J. Maddison, Ruslan R. Salakhutdinov	We present a novel sequence of intermediate distributions for exponential families: averaging the moments of the initial and target distributions.
310	Optimizing Instructional Policies	Robert V. Lindsey, Michael C. Mozer, William J. Huggins, Harold Pashler	We propose an experimental technique for searching policy spaces using Gaussian process surrogate-based optimization and a generative model of student performance.
311	Translating Embeddings for Modeling Multi-relational Data	Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, Oksana Yakhnenko	Our objective is to propose a canonical model which is easy to train, contains a reduced number of parameters and can scale up to very large databases.
312	Phase Retrieval using Alternating Minimization	Praneeth Netrapalli, Prateek Jain, Sujay Sanghavi	In this paper, we show that a simple alternating minimization algorithm geometrically converges to the solution of one such problem, namely finding a vector $x$ from $y,A$, where $y = |A'x|$ and $|z|$ denotes a vector of element-wise magnitudes of $z$, under the assumption that $A$ is Gaussian.
313	Real-Time Inference for a Gamma Process Model of Neural Spiking	David E. Carlson, Vinayak Rao, Joshua T. Vogelstein, Lawrence Carin	Via exploratory data analysis, using data with partial ground truth as well as two novel data sets, we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) detecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using multiple channels.
314	Understanding Dropout	Pierre Baldi, Peter J. Sadowski	We introduce a general formalism for studying dropout on either units or connections, with arbitrary probability values, and use it to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks.
315	The Power of Asymmetry in Binary Hashing	Behnam Neyshabur, Nati Srebro, Ruslan R. Salakhutdinov, Yury Makarychev, Payman Yadollahpour	When approximating binary similarity using the Hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps.
316	Estimation, Optimization, and Parallelism when Data is Sparse	John Duchi, Michael I. Jordan, Brendan McMahan	We study stochastic optimization problems when the \emph{data} is sparse, which is in a sense dual to the current understanding of high-dimensional statistical learning and optimization.
317	A multi-agent control framework for co-adaptation in brain-computer interfaces	Josh S. Merel, Roy Fox, Tony Jebara, Liam Paninski	We present an approach to model this process of co-adaptation between the encoding model of the neural signal and the decoding algorithm as a multi-agent formulation of the linear quadratic Gaussian (LQG) control problem.
318	Modeling Overlapping Communities with Node Popularities	Prem K. Gopalan, Chong Wang, David Blei	We develop a probabilistic approach for accurate network modeling using node popularities within the framework of the mixed-membership stochastic blockmodel (MMSB).
319	Learning from Limited Demonstrations	Beomjoon Kim, Amir-massoud Farahmand, Joelle Pineau, Doina Precup	We propose an approach to learning from demonstration (LfD) which leverages expert data, even if the expert examples are very few or inaccurate.
320	On the Complexity and Approximation of Binary Evidence in Lifted Inference	Guy Van den Broeck, Adnan Darwiche	In this paper, we balance this grim result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference.
321	On the Representational Efficiency of Restricted Boltzmann Machines	James Martens, Arkadev Chattopadhyay, Toni Pitassi, Richard Zemel	This paper examines the question: What kinds of distributions can be efficiently represented by Restricted Boltzmann Machines (RBMs)?
322	Memory Limited, Streaming PCA	Ioannis Mitliagkas, Constantine Caramanis, Prateek Jain	We present an algorithm that achieves both: it uses $O(kp)$ memory (meaning storage of any kind) and is able to compute the $k$-dimensional spike with $O(p \log p)$ sample-complexity — the first algorithm of its kind.
323	An Approximate, Efficient LP Solver for LP Rounding	Srikrishna Sridhar, Stephen Wright, Christopher Re, Ji Liu, Victor Bittorf, Ce Zhang	We propose a scheme that is based on a quadratic program relaxation which allows us to use parallel stochastic-coordinate-descent to approximately solve large linear programs efficiently.
324	Linear decision rule as aspiration for simple decision heuristics	Özgür Simsek	This research has identified three environmental structures that aid heuristics: dominance, cumulative dominance, and noncompensatoriness.
325	On the Relationship Between Binary Classification, Bipartite Ranking, and Binary Class Probability Estimation	Harikrishna Narasimhan, Shivani Agarwal	In this paper, we introduce the notion of weak regret transfer bounds, where the mapping needed to transform a model from one problem to another depends on the underlying probability distribution (and in practice, must be estimated from data).
326	Bayesian inference as iterated random functions with applications to sequential inference in graphical models	Arash Amini, XuanLong Nguyen	We propose a general formalism of iterated random functions with semigroup property, under which exact and approximate Bayesian posterior updates can be viewed as specific instances.
327	Compressive Feature Learning	Hristo S. Paskov, Robert West, John C. Mitchell, Trevor Hastie	This paper addresses the problem of unsupervised feature learning for text data.
328	Moment-based Uniform Deviation Bounds for `k`-means and Friends	Matus J. Telgarsky, Sanjoy Dasgupta	Moment-based Uniform Deviation Bounds for `k`-means and Friends
329	Fast Template Evaluation with Vector Quantization	Mohammad Amin Sadeghi, David Forsyth	We describe a method that achieves a substantial end-to-end speedup over the best current methods, without loss of accuracy.
330	Context-sensitive active sensing in humans	Sheeraz Ahmad, He Huang, Angela J. Yu	Here, we propose a myopic approximation to C-DAC, which also takes behavioral costs into account, but achieves a significant reduction in complexity by looking only one step ahead. We also present data from a human active visual search experiment, and compare the performance of the various models against human behavior.
331	A New Convex Relaxation for Tensor Completion	Bernardino Romera-Paredes, Massimiliano Pontil	In this paper, we highlight some limitations of this approach and propose an alternative convex relaxation on the Euclidean unit ball.
332	Variational Planning for Graph-based MDPs	Qiang Cheng, Qiang Liu, Feng Chen, Alexander T. Ihler	We present a new variational framework to describe and solve the planning problem of MDPs, and derive both exact and approximate planning algorithms.
333	Convex Two-Layer Modeling	Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans	Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training.
334	Sketching Structured Matrices for Faster Nonlinear Regression	Haim Avron, Vikas Sindhwani, David Woodruff	We present empirical results confirming both the practical value of our modeling framework, as well as speedup benefits of randomized regression.
335	(More) Efficient Reinforcement Learning via Posterior Sampling	Ian Osband, Daniel Russo, Benjamin Van Roy	Most provably efficient learning algorithms introduce optimism about poorly-understood states and actions to encourage exploration.
336	Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition	Adel Javanmard, Andrea Montanari	We assume that only a small subset of covariates is ‘active’ (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates.
337	Efficient Exploration and Value Function Generalization in Deterministic Systems	Zheng Wen, Benjamin Van Roy	We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization.
338	Bellman Error Based Feature Generation using Random Projections on Sparse Spaces	Mahdi Milani Fard, Yuri Grinberg, Amir-massoud Farahmand, Joelle Pineau, Doina Precup	We propose a simple, fast and robust algorithm based on random projections, which generates BEBFs for sparse feature spaces.
339	Learning and using language via recursive pragmatic reasoning about other agents	Nathaniel J. Smith, Noah Goodman, Michael Frank	We describe a model in which language learners assume that they jointly approximate a shared, external lexicon and reason recursively about the goals of others in using this lexicon.
340	Learning Stochastic Inverses	Andreas Stuhlmüller, Jacob Taylor, Noah Goodman	To make use of inverses before convergence, we describe the Inverse MCMC algorithm, which uses stochastic inverses to make block proposals for a Metropolis-Hastings sampler.
341	Learning invariant representations and applications to face verification	Qianli Liao, Joel Z. Leibo, Tomaso Poggio	In accord with a recent theory of transformation-invariance, we propose a model that, while capturing other common convolutional networks as special cases, can also be used with arbitrary identity-preserving transformations.
342	Optimization, Learning, and Games with Predictable Sequences	Sasha Rakhlin, Karthik Sridharan	We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences.
343	Adaptivity to Local Smoothness and Dimension in Kernel Regression	Samory Kpotufe, Vikas Garg	We present the first result for kernel regression where the procedure adapts locally at a point $x$ to both the unknown local dimension of the metric and the unknown Hölder-continuity of the regression function at $x$.
344	Adaptive dropout for training deep neural networks	Jimmy Ba, Brendan Frey	We describe a model in which a binary belief network is overlaid on a neural network and is used to decrease the information content of its hidden units by selectively setting activities to zero.
345	Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream	Daniel L. Yamins, Ha Hong, Charles Cadieu, James J. DiCarlo	In this work, we construct models of the ventral stream using a novel optimization procedure for category-level object recognition problems, and produce RDMs resembling both macaque IT and human ventral stream.
346	Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex	Sam Patterson, Yee Whye Teh	In this paper we investigate the use of Langevin Monte Carlo methods on the probability simplex and propose a new method, Stochastic gradient Riemannian Langevin dynamics, which is simple to implement and can be applied online.
347	Distributed Representations of Words and Phrases and their Compositionality	Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, Jeff Dean	In this paper we present several improvements that make the Skip-gram model more expressive and enable it to learn higher quality vectors more rapidly.
348	Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel	Tai Qin, Karl Rohe	The current paper extends the previous theoretical results to the more canonical spectral clustering algorithm in a way that removes any assumption on the minimum degree and provides guidance on the choice of tuning parameter.
349	Analyzing the Harmonic Structure in Graph-Based Learning	Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang	In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost.
350	Recurrent linear models of simultaneously-recorded neural populations	Marius Pachitariu, Biljana Petreska, Maneesh Sahani	Here we describe a new, scalable approach to discovering the low-dimensional dynamics that underlie simultaneously recorded spike trains from a neural population.
351	Scalable Influence Estimation in Continuous-Time Diffusion Networks	Nan Du, Le Song, Manuel Gomez Rodriguez, Hongyuan Zha	In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks.
352	Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC	Roger Frigola, Fredrik Lindsten, Thomas B. Schön, Carl Edward Rasmussen	We present a fully Bayesian approach to inference and learning in nonlinear nonparametric state-space models.
353	BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables	Cho-Jui Hsieh, Matyas A. Sustik, Inderjit S. Dhillon, Pradeep K. Ravikumar, Russell Poldrack	In this paper, we develop an algorithm BigQUIC, which can solve 1 million dimensional l1-regularized Gaussian MLE problems (which would thus have 1000 billion parameters) using a single machine, with bounded memory.
354	The Fast Convergence of Incremental PCA	Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund	We prove the first finite-sample convergence rates for any incremental PCA algorithm using sub-quadratic time and memory per iteration.
355	Multisensory Encoding, Decoding, and Identification	Aurel A. Lazar, Yevgeniy Slutskiy	We investigate a spiking neuron model of multisensory integration.
356	Adaptive Anonymity via `b`-Matching	Krzysztof M. Choromanski, Tony Jebara, Kui Tang	Novel algorithms and theory are provided to implement adaptive anonymity.
357	Optimal integration of visual speed across different spatiotemporal frequency channels	Matjaz Jogan, Alan A. Stocker	Here we propose that perceived speed is the result of optimal integration of speed information from independent spatiotemporal frequency tuned channels.
358	Matrix factorization with binary components	Martin Slawski, Matthias Hein, Pavlo Lutsik	Motivated by an application in computational biology, we consider constrained low-rank matrix factorization problems with $\{0,1\}$-constraints on one of the factors.
359	Learning to Pass Expectation Propagation Messages	Nicolas Heess, Daniel Tarlow, John Winn	In this work, we study the question of whether it is possible to automatically derive fast and accurate EP updates by learning a discriminative model (e.g., a neural network or random forest) to map EP message inputs to EP message outputs.
360	Robust Low Rank Kernel Embeddings of Multivariate Distributions	Le Song, Bo Dai	In this paper, we propose a hierarchical low rank decomposition of kernel embeddings which can exploit such low rank structures in data while being robust to model misspecification.