# Paper Digest: NIPS 2013 Highlights

The Conference on Neural Information Processing Systems (NIPS) is one of the top machine learning conferences in the world. In 2013, it was held in Lake Tahoe, Nevada.

To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.

We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to **sign up for our free paper digest service** to get new paper updates customized to your own interests on a daily basis.

Paper Digest Team

team@paperdigest.org

#### TABLE 1: NIPS 2013 Papers

No. | Title | Authors | Highlight |
---|---|---|---|

1 | The Randomized Dependence Coefficient | David Lopez-Paz, Philipp Hennig, Bernhard Schölkopf | We introduce the Randomized Dependence Coefficient (RDC), a measure of non-linear dependence between random variables of arbitrary dimension based on the Hirschfeld-Gebelein-Rényi Maximum Correlation Coefficient. |

2 | Documents as multiple overlapping windows into grids of counts | Alessandro Perina, Nebojsa Jojic, Manuele Bicego, Andrzej Truski | In this paper, we overcome this issue with the *Componential Counting Grid*, which brings the componential nature of topic models to the basic counting grid. |

3 | Reciprocally Coupled Local Estimators Implement Bayesian Information Integration Distributively | Wen-Hao Zhang, Si Wu | The present study proposes a novel mechanism to achieve this. |

4 | Latent Maximum Margin Clustering | Guang-Tong Zhou, Tian Lan, Arash Vahdat, Greg Mori | We present a maximum margin framework that clusters data using latent variables. |

5 | Data-driven Distributionally Robust Polynomial Optimization | Martin Mevissen, Emanuele Ragnoli, Jia Yuan Yu | We consider robust optimization for polynomial optimization problems where the uncertainty set is a set of candidate probability density functions. |

6 | Transfer Learning in a Transductive Setting | Marcus Rohrbach, Sandra Ebert, Bernt Schiele | In this work, we extend transfer learning with semi-supervised learning to exploit unlabeled instances of (novel) categories with no or only a few labeled instances. |

7 | Bayesian optimization explains human active search | Ali Borji, Laurent Itti | We try to unravel the general underlying algorithm people may be using while searching for the maximum of an invisible 1D function. |

8 | Provable Subspace Clustering: When LRR meets SSC | Yu-Xiang Wang, Huan Xu, Chenlei Leng | Because the representation matrix is often simultaneously sparse and low-rank, we propose a new algorithm, termed Low-Rank Sparse Subspace Clustering (LRSSC), by combining SSC and LRR, and develop theoretical guarantees of when the algorithm succeeds. |

9 | Generalized Random Utility Models with Multiple Types | Hossein Azari Soufiani, Hansheng Diao, Zhenyu Lai, David C. Parkes | We propose a model for demand estimation in multi-agent, differentiated product settings and present an estimation algorithm that uses reversible jump MCMC techniques to classify agents’ types. |

10 | Polar Operators for Structured Sparse Estimation | Xinhua Zhang, Yao-Liang Yu, Dale Schuurmans | Our first contribution is to uncover a rich class of structured sparse regularizers whose polar operator can be evaluated efficiently. |

11 | On Decomposing the Proximal Map | Yao-Liang Yu | Motivated by the need of combining regularizers to simultaneously induce different types of structures, this paper initiates a systematic investigation of when the proximal map of a sum of functions decomposes into the composition of the proximal maps of the individual summands. |

12 | Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs | Liam C. MacDermed, Charles L. Isbell | We present a method to transform any DecPOMDP into a DecPOMDP with bounded beliefs (the number of beliefs is a free parameter) using optimal (not lossless) belief compression. |

13 | PAC-Bayes-Empirical-Bernstein Inequality | Ilya O. Tolstikhin, Yevgeny Seldin | We present PAC-Bayes-Empirical-Bernstein inequality. |

14 | Modeling Clutter Perception using Parametric Proto-object Partitioning | Chen-Ping Yu, Wen-Yu Hua, Dimitris Samaras, Greg Zelinsky | We introduce a novel parametric method of merging superpixels by modeling a mixture of Weibull distributions on similarity distance statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception. |

15 | Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching | Marcelo Fiori, Pablo Sprechmann, Joshua Vogelstein, Pablo Muse, Guillermo Sapiro | We propose a robust graph matching algorithm inspired by sparsity-related techniques. |

16 | Transportability from Multiple Environments with Limited Experiments | Elias Bareinboim, Sanghack Lee, Vasant Honavar, Judea Pearl | This paper considers the problem of transferring experimental findings learned from multiple heterogeneous domains to a target environment, in which only limited experiments can be performed. |

17 | More data speeds up training time in learning halfspaces over sparse vectors | Amit Daniely, Nati Linial, Shai Shalev-Shwartz | Our main contribution is a novel, non-cryptographic, methodology for establishing computational-statistical gaps, which allows us to show that, under a widely believed assumption that refuting random $\mathrm{3CNF}$ formulas is hard, efficiently learning this class using $O\left(n/\epsilon^2\right)$ examples is impossible. |

18 | Causal Inference on Time Series using Restricted Structural Equation Models | Jonas Peters, Dominik Janzing, Bernhard Schölkopf | Practical: If there are no feedback loops between time series, we propose an algorithm based on non-linear independence tests of time series. |

19 | Deep Fisher Networks for Large-Scale Image Classification | Karen Simonyan, Andrea Vedaldi, Andrew Zisserman | In this paper, we explore the extent of this analogy, proposing a version of the state-of-the-art Fisher vector image encoding that can be stacked in multiple layers. |

20 | Sparse Additive Text Models with Low Rank Background | Lei Shi | This paper proposes the sparse additive model with low rank background (SAM-LRB), together with a simple yet efficient estimation method. |

21 | Variance Reduction for Stochastic Gradient Optimization | Chong Wang, Xi Chen, Alexander J. Smola, Eric P. Xing | In this paper, we develop a general approach that uses control variates for variance reduction in stochastic gradient optimization. |

22 | Training and Analysing Deep Recurrent Neural Networks | Michiel Hermans, Benjamin Schrauwen | Common recurrent neural networks, however, do not explicitly accommodate such a hierarchy, and most research on them has been focusing on training algorithms rather than on their basic architecture. |

23 | A simple example of Dirichlet process mixture inconsistency for the number of components | Jeffrey W. Miller, Matthew T. Harrison | In this note, we give an elementary demonstration of this inconsistency in what is perhaps the simplest possible setting: a DPM with normal components of unit variance, applied to data from a “mixture” with one standard normal component. |

24 | Variational Policy Search via Trajectory Optimization | Sergey Levine, Vladlen Koltun | We present a method that uses trajectory optimization as a powerful exploration strategy that guides the policy search. |

25 | Scalable kernels for graphs with continuous attributes | Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, Karsten Borgwardt | In this paper, we present a class of path kernels with computational complexity $\mathcal{O}(n^2 (m + \delta^2))$, where $\delta$ is the graph diameter and $m$ the number of edges. |

26 | Density estimation from unweighted k-nearest neighbor graphs: a roadmap | Ulrike Von Luxburg, Morteza Alamgir | We prove how one can estimate the density p just from the unweighted adjacency matrix of the graph, without knowing the points themselves or their distance or similarity scores. |

27 | Decision Jungles: Compact and Rich Models for Classification | Jamie Shotton, Toby Sharp, Pushmeet Kohli, Sebastian Nowozin, John Winn, Antonio Criminisi | We present and compare two new node merging algorithms that jointly optimize both the features and the structure of the DAGs efficiently. |

28 | What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach | Zhenwen Dai, Georgios Exarchakis, Jörg Lücke | Here, we for the first time apply a model with non-linear feature superposition and explicit position encoding. |

29 | Actor-Critic Algorithms for Risk-Sensitive MDPs | Prashanth L.A., Mohammad Ghavamzadeh | In this paper, we consider both discounted and average reward Markov decision processes. |

30 | Summary Statistics for Partitionings and Feature Allocations | Isik B. Fidaner, Taylan Cemgil | In this paper, we introduce novel statistics based on block sizes for representing sample sets of partitionings and feature allocations. |

31 | One-shot learning and big data with n=2 | Lee H. Dicker, Dean P. Foster | We model a “one-shot learning” situation, where very few (scalar) observations $y_1,\ldots,y_n$ are available. |

32 | Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression | Michalis Titsias, Miguel Lazaro-Gredilla | We introduce a novel variational method that approximately integrates out kernel hyperparameters, such as length-scales, in Gaussian process regression. |

33 | Correlations strike back (again): the case of associative memory retrieval | Cristina Savin, Peter Dayan, Mate Lengyel | We show that activity-dependent learning generically produces such correlations, and failing to take them into account in the dynamics of memory retrieval leads to catastrophically poor recall. |

34 | Optimal Neural Population Codes for High-dimensional Stimulus Variables | Zhuo Wang, Alan A. Stocker, Daniel D. Lee | We consider solutions for a minimal case where the number of neurons in the population is equal to the number of stimulus dimensions (diffeomorphic). |

35 | Online Variational Approximations to non-Exponential Family Change Point Models: With Application to Radar Tracking | Ryan D. Turner, Steven Bottone, Clay J. Stanek | We apply our methodology to a tracking problem using radar data with a signal-to-noise feature that is Rice distributed. |

36 | Accelerating Stochastic Gradient Descent using Predictive Variance Reduction | Rie Johnson, Tong Zhang | To remedy this problem, we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG). |

37 | Using multiple samples to learn mixture models | Jason D. Lee, Ran Gilad-Bachrach, Rich Caruana | In this work we make the assumption that we have access to several samples drawn from the same $K$ underlying distributions, but with different mixing weights. |

38 | Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition | Tzu-Kuo Huang, Jeff Schneider | Inspired by recent advances in spectral learning methods, we propose to study this problem from a different perspective: moment matching and spectral decomposition. |

39 | On model selection consistency of penalized M-estimators: a geometric theory | Jason D. Lee, Yuekai Sun, Jonathan E. Taylor | We generalize the notion of irrepresentable to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties. |

40 | Dropout Training as Adaptive Regularization | Stefan Wager, Sida Wang, Percy S. Liang | By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. |

41 | New Subsampling Algorithms for Fast Least Squares Regression | Paramveer Dhillon, Yichao Lu, Dean P. Foster, Lyle Ungar | We propose three methods which solve the big data problem by subsampling the covariance matrix using either a single or two stage estimation. |

42 | Faster Ridge Regression via the Subsampled Randomized Hadamard Transform | Yichao Lu, Paramveer Dhillon, Dean P. Foster, Lyle Ungar | We propose a fast algorithm for ridge regression when the number of features is much larger than the number of observations ($p \gg n$). |

43 | Accelerated Mini-Batch Stochastic Dual Coordinate Ascent | Shai Shalev-Shwartz, Tong Zhang | Our main contribution is to introduce an accelerated mini-batch version of SDCA and prove a fast convergence rate for this method. |

44 | Improved and Generalized Upper Bounds on the Complexity of Policy Iteration | Bruno Scherrer | Given a Markov Decision Process (MDP) with $n$ states and $m$ actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal $\gamma$-discounted policy. |

45 | Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation | Dahua Lin | To tackle this problem, we propose a Bayesian learning algorithm for DP mixture models. |

46 | Online Robust PCA via Stochastic Optimization | Jiashi Feng, Huan Xu, Shuicheng Yan | In this paper, we develop an Online Robust Principal Component Analysis (OR-PCA) that processes one sample per time instance and hence its memory cost is independent of the data size, significantly enhancing the computation and storage efficiency. |

47 | Least Informative Dimensions | Fabian Sinz, Anna Stockl, Jan Grewe, Jan Benda | We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. |

48 | A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks | Junming Yin, Qirong Ho, Eric P. Xing | We propose a scalable approach for making inference about latent spaces of large networks. |

49 | Understanding variable importances in forests of randomized trees | Gilles Louppe, Louis Wehenkel, Antonio Sutera, Pierre Geurts | In this work we characterize the Mean Decrease Impurity (MDI) variable importances as measured by an ensemble of totally randomized trees in asymptotic sample and ensemble size conditions. |

50 | Correlated random features for fast semi-supervised learning | Brian McWilliams, David Balduzzi, Joachim M. Buhmann | This paper presents Correlated Nystrom Views (XNV), a fast semi-supervised algorithm for regression and classification. |

51 | Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture | Trevor Campbell, Miao Liu, Brian Kulis, Jonathan P. How, Lawrence Carin | This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. |

52 | Better Approximation and Faster Algorithm Using the Proximal Average | Yao-Liang Yu | Better Approximation and Faster Algorithm Using the Proximal Average |

53 | Rapid Distance-Based Outlier Detection via Sampling | Mahito Sugiyama, Karsten Borgwardt | We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. |

54 | Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima | Po-Ling Loh, Martin J. Wainwright | Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models; regression in generalized linear models using nonconvex regularizers such as SCAD and MCP; and graph and inverse covariance matrix estimation. |

55 | Non-Linear Domain Adaptation with Boosting | Carlos J. Becker, Christos M. Christoudias, Pascal Fua | In this paper we present a multi-task learning algorithm for domain adaptation based on boosting. |

56 | Mid-level Visual Element Discovery as Discriminative Mode Seeking | Carl Doersch, Abhinav Gupta, Alexei A. Efros | In this work, we pose visual element discovery as discriminative mode seeking, drawing connections to the well-known and well-studied mean-shift algorithm. |

57 | q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions | Assaf Glazer, Michael Lindenbaum, Shaul Markovitch | In this paper we introduce a novel method that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. |

58 | Auditing: Active Learning with Outcome-Dependent Query Costs | Sivan Sabato, Anand D. Sarwate, Nati Srebro | We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. |

59 | A message-passing algorithm for multi-agent trajectory planning | José Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan S. Yedidia | We describe a novel approach for computing collision-free *global* trajectories for $p$ agents with specified initial and final configurations, based on an improved version of the alternating direction method of multipliers (ADMM) algorithm. |

60 | Learning Stochastic Feedforward Neural Networks | Yichuan Tang, Ruslan R. Salakhutdinov | In this paper, we propose a stochastic feedforward network with hidden layers having *both deterministic and stochastic* variables. |

61 | Inferring neural population dynamics from multiple partial recordings of the same neural circuit | Srini Turaga, Lars Buesing, Adam M. Packer, Henry Dalgleish, Noah Pettit, Michael Hausser, Jakob H. Macke | Here we contribute a statistical method for “stitching” together sequentially imaged sets of neurons into one model by phrasing the problem as fitting a latent dynamical system with missing observations. |

62 | Multi-Prediction Deep Boltzmann Machines | Ian Goodfellow, Mehdi Mirza, Aaron Courville, Yoshua Bengio | We introduce the Multi-Prediction Deep Boltzmann Machine (MP-DBM). |

63 | Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation | Vibhav Vineet, Carsten Rother, Philip Torr | In this work we explore the synergy effects between intrinsic scene properties recovered from an image, and the objects and attributes present in the scene. |

64 | Blind Calibration in Compressed Sensing using Message Passing Algorithms | Christophe Schulke, Francesco Caltagirone, Florent Krzakala, Lenka Zdeborová | In this paper we study the so-called blind calibration, i.e. when the training signals that are available to perform the calibration are sparse but unknown. |

65 | Learning Trajectory Preferences for Manipulators via Iterative Improvement | Ashesh Jain, Brian Wojcik, Thorsten Joachims, Ashutosh Saxena | In this paper, we propose a co-active online learning framework for teaching robots the preferences of their users for object manipulation tasks. |

66 | Large Scale Distributed Sparse Precision Estimation | Huahua Wang, Arindam Banerjee, Cho-Jui Hsieh, Pradeep K. Ravikumar, Inderjit S. Dhillon | We present an inexact alternating direction method of multipliers (ADMM) algorithm for CLIME, and establish rates of convergence for both the objective and optimality conditions. |

67 | Neural representation of action sequences: how far can a simple snippet-matching model take us? | Cheston Tan, Jedediah M. Singer, Thomas Serre, David Sheinberg, Tomaso Poggio | We find that a baseline model, one that simply computes a linear weighted sum of ventral and dorsal responses to short action “snippets”, produces surprisingly good fits to the neural data. |

68 | On Algorithms for Sparse Multi-factor NMF | Siwei Lyu, Xin Wang | In this work, we describe a new, simple and efficient algorithm for the multi-factor nonnegative matrix factorization (mfNMF) problem, which generalizes the original NMF problem to more than two factors. |

69 | Dirty Statistical Models | Eunho Yang, Pradeep K. Ravikumar | We provide a unified framework for the high-dimensional analysis of “superposition-structured” or “dirty” statistical models: where the model parameters are a “superposition” of structurally constrained parameters. |

70 | Parallel Sampling of DP Mixture Models using Sub-Cluster Splits | Jason Chang, John W. Fisher III | We present a novel MCMC sampler for Dirichlet process mixture models that can be used for conjugate or non-conjugate prior distributions. |

71 | Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent | Tianbao Yang | We present and study a distributed optimization algorithm by employing a stochastic dual coordinate ascent method. |

72 | Prior-free and prior-dependent regret bounds for Thompson Sampling | Sebastien Bubeck, Che-Yu Liu | We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. |

73 | Structured Learning via Logistic Regression | Justin Domke | This paper observes that if the inference problem is “smoothed” through the addition of entropy terms, for fixed messages, the learning objective reduces to a traditional (non-structured) logistic regression problem with respect to parameters. |

74 | Which Space Partitioning Tree to Use for Search? | Parikshit Ram, Alexander Gray | To this end, we present the theoretical results which imply that trees with better vector quantization performance have better search performance guarantees. |

75 | Projecting Ising Model Parameters for Fast Mixing | Justin Domke, Xianghang Liu | We present an algorithm to project Ising model parameters onto a parameter set that is guaranteed to be fast mixing, under several divergences. |

76 | Mixed Optimization for Smooth Functions | Mehrdad Mahdavi, Lijun Zhang, Rong Jin | In this work, we consider a new setup for optimizing smooth functions, termed **Mixed Optimization**, which allows access to both a stochastic oracle and a full gradient oracle. |

77 | Conditional Random Fields via Univariate Exponential Families | Eunho Yang, Pradeep K. Ravikumar, Genevera I. Allen, Zhandong Liu | We thus introduce a “novel subclass of CRFs”, derived by imposing node-wise conditional distributions of response variables conditioned on the rest of the responses and the covariates as arising from univariate exponential families. |

78 | Stochastic blockmodel approximation of a graphon: Theory and consistent estimation | Edo M. Airoldi, Thiago B. Costa, Stanley H. Chan | In this paper, we propose a computationally efficient algorithm to estimate a graphon from a set of observed graphs generated from it. |

79 | Reinforcement Learning in Robust Markov Decision Processes | Shiau Hong Lim, Huan Xu, Shie Mannor | We consider a problem setting where some unknown parts of the state space can have arbitrary transitions while other parts are purely stochastic. |

80 | On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization | Ke Hou, Zirui Zhou, Anthony Man-Cho So, Zhi-Quan Luo | In this paper, we show that for a large class of loss functions, the convergence rate of the PGM is in fact linear. |

81 | Recurrent networks of coupled Winner-Take-All oscillators for solving constraint satisfaction problems | Hesham Mostafa, Lorenz K. Mueller, Giacomo Indiveri | We present a recurrent neuronal network, modeled as a continuous-time dynamical system, that can solve constraint satisfaction problems. |

82 | Latent Structured Active Learning | Wenjie Luo, Alex Schwing, Raquel Urtasun | In this paper we present active learning algorithms in the context of structured prediction problems. |

83 | A Gang of Bandits | Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella | In this paper, we introduce novel algorithmic approaches to the solution of such networked bandit problems. |

84 | Learning Feature Selection Dependencies in Multi-task Learning | Daniel Hernández-Lobato, José Miguel Hernández-Lobato | A probabilistic model based on the horseshoe prior is proposed for learning dependencies in the process of identifying relevant features for prediction. |

85 | B-test: A Non-parametric, Low Variance Kernel Two-sample Test | Wojciech Zaremba, Arthur Gretton, Matthew Blaschko | We propose a family of maximum mean discrepancy (MMD) kernel two-sample tests that have low sample complexity and are consistent. |

86 | Online PCA for Contaminated Data | Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan | Here we propose the online robust PCA algorithm, which is able to improve the PCs estimation upon an initial one steadily, even when faced with a constant fraction of outliers. |

87 | Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) | Francis Bach, Eric Moulines | We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk. |

88 | Efficient Algorithm for Privately Releasing Smooth Queries | Ziteng Wang, Kai Fan, Jiaqi Zhang, Liwei Wang | We study differentially private mechanisms for answering *smooth* queries on databases consisting of data points in $\mathbb{R}^d$. |

89 | Beyond Pairwise: Provably Fast Algorithms for Approximate k-Way Similarity Search | Anshumali Shrivastava, Ping Li | In this paper, we focus on problems related to *3-way Jaccard* similarity: $\mathcal{R}^{3way}= \frac{\lvert S_1 \cap S_2 \cap S_3 \rvert}{\lvert S_1 \cup S_2 \cup S_3 \rvert}$, $S_1, S_2, S_3 \in \mathcal{C}$, where $\mathcal{C}$ is a size $n$ collection of sets (or binary vectors). |

90 | Unsupervised Spectral Learning of Finite State Transducers | Raphael Bailly, Xavier Carreras, Ariadna Quattoni | In this paper we address the more realistic, yet challenging setting where the alignments are unknown to the learning algorithm. |

91 | Learning a Deep Compact Image Representation for Visual Tracking | Naiyan Wang, Dit-Yan Yeung | In this paper, we study the challenging problem of tracking the trajectory of a moving object in a video with possibly very complex background. |

92 | Learning Multi-level Sparse Representations | Ferran Diego Andilla, Fred A. Hamprecht | Driven by this concrete problem, we propose a decomposition of the matrix of observations into a product of more than two sparse matrices, with the rank decreasing from lower to higher levels. |

93 | Robust Data-Driven Dynamic Programming | Grani Adiwena Hanasusanto, Daniel Kuhn | To mitigate these small sample effects, we propose a robust data-driven DP scheme, which replaces the expectations in the DP recursions with worst-case expectations over a set of distributions close to the best estimate. |

94 | Low-Rank Matrix and Tensor Completion via Adaptive Sampling | Akshay Krishnamurthy, Aarti Singh | We study low rank matrix and tensor completion and propose novel algorithms that employ adaptive sampling schemes to obtain strong performance guarantees for these problems. |

95 | Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms | Adrien Todeschini, François Caron, Marie Chavent | We propose a novel class of algorithms for low rank matrix completion. |

96 | Distributed Exploration in Multi-Armed Bandits | Eshcar Hillel, Zohar S. Karnin, Tomer Koren, Ronny Lempel, Oren Somekh | On the other extreme, we present an algorithm that achieves the ideal factor $k$ speed-up in learning performance, with communication only logarithmic in $1/\epsilon$. |

97 | The Pareto Regret Frontier | Wouter M. Koolen | We study which such regret trade-offs can be achieved, and how. |

98 | Direct 0-1 Loss Minimization and Margin Maximization with Boosting | Shaodan Zhai, Tian Xia, Ming Tan, Shaojun Wang | We propose a boosting method, DirectBoost, a greedy coordinate descent algorithm that builds an ensemble classifier of weak classifiers through directly minimizing empirical classification error over labeled training examples; once the training classification error is reduced to a local coordinatewise minimum, DirectBoost runs a greedy coordinate ascent algorithm that continuously adds weak classifiers to maximize any targeted arbitrarily defined margins until reaching a local coordinatewise maximum of the margins in a certain sense. |

99 | Regret based Robust Solutions for Uncertain Markov Decision Processes | Asrar Ahmed, Pradeep Varakantham, Yossiri Adulyasak, Patrick Jaillet | In this paper, we seek robust policies for uncertain Markov Decision Processes (MDPs). |

100 | Speeding up Permutation Testing in Neuroimaging | Chris Hinrichs, Vamsi K. Ithapu, Qinyuan Sun, Sterling C. Johnson, Vikas Singh | In this paper, we observe that permutation testing in fact amounts to populating the columns of a very large matrix P. By analyzing the spectrum of this matrix, under certain conditions, we see that P has a low-rank plus a low-variance residual decomposition which makes it suitable for highly sub-sampled (on the order of 0.5%) matrix completion methods. |

101 | Generalized Denoising Auto-Encoders as Generative Models | Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent | We propose here a different attack on the problem, which deals with all these issues: arbitrary (but noisy enough) corruption, arbitrary reconstruction loss (seen as a log-likelihood), handling both discrete and continuous-valued variables, and removing the bias due to non-infinitesimal corruption noise (or non-infinitesimal contractive penalty). |

102 | Supervised Sparse Analysis and Synthesis Operators | Pablo Sprechmann, Roee Litman, Tal Ben Yakar, Alexander M. Bronstein, Guillermo Sapiro | In this paper, we propose a new and computationally efficient framework for learning sparse models. |

103 | Low-rank matrix reconstruction and clustering via approximate message passing | Ryosuke Matsushita, Toshiyuki Tanaka | We propose an efficient approximate message passing algorithm, derived from the belief propagation algorithm, to perform the Bayesian inference for matrix reconstruction. |

104 | Reasoning With Neural Tensor Networks for Knowledge Base Completion | Richard Socher, Danqi Chen, Christopher D. Manning, Andrew Ng | The goal of this paper is to develop a more powerful neural network model suitable for inference over these relationships. |

105 | Zero-Shot Learning Through Cross-Modal Transfer | Richard Socher, Milind Ganjoo, Christopher D. Manning, Andrew Ng | This work introduces a model that can recognize objects in images even if no training data is available for the object class. |

106 | Estimating LASSO Risk and Noise Level | Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari | We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. |

107 | Learning Adaptive Value of Information for Structured Prediction | David J. Weiss, Ben Taskar | We propose an architecture that uses a rich feedback loop between extraction and prediction. |

108 | Efficient Online Inference for Bayesian Nonparametric Relational Models | Dae Il Kim, Prem K. Gopalan, David Blei, Erik Sudderth | We introduce a new model for these phenomena, the hierarchical Dirichlet process relational model, which allows nodes to have mixed membership in an unbounded set of communities. |

109 | Approximate inference in latent Gaussian-Markov models from continuous time observations | Botond Cseke, Manfred Opper, Guido Sanguinetti | We propose an approximate inference algorithm for continuous time Gaussian-Markov process models with both discrete and continuous time likelihoods. |

110 | Linear Convergence with Condition Number Independent Access of Full Gradients | Lijun Zhang, Mehrdad Mahdavi, Rong Jin | In this paper, we propose to reduce the number of full gradients required by allowing the algorithm to access the stochastic gradients of the objective function. |

111 | When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements | Divyanshu Vats, Richard Baraniuk | We consider the problem of accurately estimating a high-dimensional sparse vector using a small number of linear measurements that are contaminated by noise. |

112 | Wavelets on Graphs via Deep Learning | Raif Rustamov, Leonidas J. Guibas | This paper introduces a machine learning framework for constructing graph wavelets that can sparsely represent a given class of signals. |

113 | Robust Spatial Filtering with Beta Divergence | Wojciech Samek, Duncan Blythe, Klaus-Robert Müller, Motoaki Kawanabe | Inspired by concepts from the field of information geometry we propose a novel approach for robustifying CSP. |

114 | Convex Relaxations for Permutation Problems | Fajwel Fogel, Rodolphe Jenatton, Francis Bach, Alexandre D’Aspremont | We present numerical experiments on archeological data, Markov chains and gene sequences. |

115 | High-Dimensional Gaussian Process Bandits | Josip Djolonga, Andreas Krause, Volkan Cevher | In particular, we present the SI-BO algorithm, which leverages recent low-rank matrix recovery techniques to learn the underlying subspace of the unknown function and applies Gaussian Process Upper Confidence sampling for optimization of the function. |

116 | A memory frontier for complex synapses | Subhaneil Lahiri, Surya Ganguli | To address this, we develop new mathematical theorems elucidating the relationship between the structural organization and memory properties of complex synapses that are themselves molecular networks. |

117 | Marginals-to-Models Reducibility | Tim Roughgarden, Michael Kearns | We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. |

118 | First-order Decomposition Trees | Nima Taghipour, Jesse Davis, Hendrik Blockeel | In this paper, we introduce FO-dtrees, which upgrade propositional dtrees to the first-order level. |

119 | A Comparative Framework for Preconditioned Lasso Algorithms | Fabian L. Wauthier, Nebojsa Jojic, Michael I. Jordan | In this paper we propose an agnostic, theoretical framework for comparing Preconditioned Lasso algorithms to the Lasso without having to choose $\lambda$. |

120 | Lasso Screening Rules via Dual Polytope Projection | Jie Wang, Jiayu Zhou, Peter Wonka, Jieping Ye | In this paper, we propose an efficient and effective screening rule via Dual Polytope Projections (DPP), which is mainly based on the uniqueness and nonexpansiveness of the optimal dual solution due to the fact that the feasible set in the dual space is a convex and closed polytope. |

121 | Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent | Yuening Hu, Jordan Boyd-Graber, Hal Daumé III, Z. Irene Ying | We present results on both synthetic and real data that show the beta coalescent outperforms Kingman’s coalescent on real datasets and is qualitatively better at capturing data in bushy hierarchies. |

122 | A Latent Source Model for Nonparametric Time Series Classification | George H. Chen, Stanislav Nikolov, Devavrat Shah | To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a “weighted majority voting” classification rule that can be approximated by a nearest-neighbor classifier. |

123 | Efficient Optimization for Sparse Gaussian Process Regression | Yanshuai Cao, Marcus A. Brubaker, David J. Fleet, Aaron Hertzmann | We propose an efficient discrete optimization algorithm for selecting a subset of training data to induce sparsity for Gaussian process regression. |

124 | Lexical and Hierarchical Topic Regression | Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik | Inspired by a two-level theory that unifies agenda setting and ideological framing, we propose supervised hierarchical latent Dirichlet allocation (SHLDA) which jointly captures documents’ multi-level topic structure and their polar response variables. |

125 | Stochastic Convex Optimization with Multiple Objectives | Mehrdad Mahdavi, Tianbao Yang, Rong Jin | In this paper, we are interested in the development of efficient algorithms for convex optimization problems in the simultaneous presence of multiple objectives and stochasticity in the first-order information. |

126 | A Kernel Test for Three-Variable Interactions | Dino Sejdinovic, Arthur Gretton, Wicher Bergsma | We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. |

127 | Memoized Online Variational Inference for Dirichlet Process Mixture Models | Michael C. Hughes, Erik Sudderth | We present a new algorithm, memoized online variational inference, which scales to very large (yet finite) datasets while avoiding the complexities of stochastic gradient. |

128 | Designed Measurements for Vector Count Data | Liming Wang, David E. Carlson, Miguel Rodrigues, David Wilcox, Robert Calderbank, Lawrence Carin | We consider design of linear projection measurements for a vector Poisson signal model. |

129 | Robust Transfer Principal Component Analysis with Rank Constraints | Yuhong Guo | In this paper, we tackle the challenging problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. |

130 | Online Learning with Switching Costs and Other Adaptive Adversaries | Nicolò Cesa-Bianchi, Ofer Dekel, Ohad Shamir | We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. |

131 | Learning Prices for Repeated Auctions with Strategic Buyers | Kareem Amin, Afshin Rostamizadeh, Umar Syed | We present seller algorithms that are no-regret when the buyer discounts her future surplus — i.e. the buyer prefers showing advertisements to users sooner rather than later. |

132 | Probabilistic Principal Geodesic Analysis | Miaomiao Zhang, Tom Fletcher | Inspired by probabilistic PCA, we present a latent variable model for PGA that provides a probabilistic framework for factor analysis on manifolds. |

133 | Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models | Adel Javanmard, Andrea Montanari | We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. |

134 | Learning with Noisy Labels | Nagarajan Natarajan, Inderjit S. Dhillon, Pradeep K. Ravikumar, Ambuj Tewari | In this paper, we theoretically study the problem of binary classification in the presence of random classification noise — the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability. |

135 | Tracking Time-varying Graphical Structure | Erich Kummerfeld, David Danks | In this paper, we present LoSST, a novel, heuristic structure learning algorithm that tracks changes in graphical model structure or parameters in a dynamic, real-time manner. |

136 | Factorized Asymptotic Bayesian Inference for Latent Feature Models | Kohei Hayashi, Ryohei Fujimaki | This paper extends factorized asymptotic Bayesian (FAB) inference for latent feature models (LFMs). |

137 | More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server | Qirong Ho, James Cipar, Henggang Cui, Seunghak Lee, Jin Kyu Kim, Phillip B. Gibbons, Garth A. Gibson, Greg Ganger, Eric P. Xing | We propose a parameter server system for distributed ML, which follows a Stale Synchronous Parallel (SSP) model of computation that maximizes the time computational workers spend doing useful work on ML algorithms, while still providing correctness guarantees. |

138 | Bayesian Estimation of Latently-grouped Parameters in Undirected Graphical Models | Jie Liu, David Page | Posterior inference usually involves calculating intractable terms, and we propose two approximation algorithms, namely a Metropolis-Hastings algorithm with auxiliary variables and a Gibbs sampling algorithm with stripped Beta approximation (Gibbs_SBA). |

139 | Online Learning with Costly Features and Labels | Navid Zolghadr, Gabor Bartok, Russell Greiner, András György, Csaba Szepesvari | We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. |

140 | Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions | Eftychios A. Pnevmatikakis, Liam Paninski | We propose a compressed sensing (CS) calcium imaging framework for monitoring large neuronal populations, where we image randomized projections of the spatial calcium concentration at each timestep, instead of measuring the concentration at individual locations. |

141 | A Novel Two-Step Method for Cross Language Representation Learning | Min Xiao, Yuhong Guo | In this paper, we propose a two-step representation learning method to bridge the feature spaces of different languages by exploiting a set of parallel bilingual documents. |

142 | On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations | Tamir Hazan, Subhransu Maji, Tommi Jaakkola | In this paper we describe how MAP inference can be used to sample efficiently from Gibbs distributions. |

143 | Graphical Models for Inference with Missing Data | Karthika Mohan, Judea Pearl, Jin Tian | We address the problem of deciding whether there exists a consistent estimator of a given relation Q, when data are missing not at random. |

144 | Reshaping Visual Datasets for Domain Adaptation | Boqing Gong, Kristen Grauman, Fei Sha | We extensively evaluate our approach on object recognition and human activity recognition tasks. |

145 | Statistical Active Learning Algorithms | Maria-Florina F. Balcan, Vitaly Feldman | We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise. |

146 | Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits | Ben Shababo, Brooks Paige, Ari Pakman, Liam Paninski | We present a realistic statistical model which accounts for the main sources of variability in this experiment and allows for large amounts of information about the biological system to be incorporated if available. |

147 | Reflection methods for user-friendly submodular optimization | Stefanie Jegelka, Francis Bach, Suvrit Sra | It is solved through a sequence of reflections and its solution can be automatically thresholded to obtain an optimal discrete solution. |

148 | Unsupervised Structure Learning of Stochastic And-Or Grammars | Kewei Tu, Maria Pavlovskaia, Song-Chun Zhu | We present a unified formalization of stochastic And-Or grammars that is agnostic to the type of the data being modeled, and propose an unsupervised approach to learning the structures as well as the parameters of such grammars. |

149 | Convex Tensor Decomposition via Structured Schatten Norm Regularization | Ryota Tomioka, Taiji Suzuki | We propose a new class of structured Schatten norms for tensors that includes two recently proposed norms (“overlapped” and “latent”) for convex-optimization-based tensor decomposition. |

150 | Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs | Yann Dauphin, Yoshua Bengio | To generalize this idea to RBMs, we propose a stochastic ratio-matching algorithm that inherits all the computational advantages and unbiasedness of the importance sampling scheme. |

151 | Learning Chordal Markov Networks by Constraint Satisfaction | Jukka Corander, Tomi Janhunen, Jussi Rintanen, Henrik Nyman, Johan Pensar | We investigate the problem of learning the structure of a Markov network from data. |

152 | Parametric Task Learning | Ichiro Takeuchi, Tatsuya Hongo, Masashi Sugiyama, Shinichi Nakajima | We introduce a novel formulation of multi-task learning (MTL) called parametric task learning (PTL) that can systematically handle infinitely many tasks parameterized by a continuous parameter. |

153 | A Deep Architecture for Matching Short Texts | Zhengdong Lu, Hang Li | In this paper, we propose a new deep architecture to more effectively model the complicated matching relations between two objects from heterogeneous domains. |

154 | Computing the Stationary Distribution Locally | Christina E. Lee, Asuman Ozdaglar, Devavrat Shah | In this paper, we provide a novel algorithm that answers whether a chosen state in a Markov chain (MC) has stationary probability larger than some $\Delta \in (0,1)$. |

155 | Nonparametric Multi-group Membership Model for Dynamic Networks | Myunghwan Kim, Jure Leskovec | We propose a nonparametric multi-group membership model for dynamic networks. |

156 | Adaptive Step-Size for Policy Gradient Methods | Matteo Pirotta, Marcello Restelli, Luca Bascetta | In this paper, we propose to determine the learning rate by maximizing a lower bound to the expected performance gain. |

157 | Optimistic Concurrency Control for Distributed Unsupervised Learning | Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, Michael I. Jordan | We demonstrate our approach in three problem areas: clustering, feature learning and online facility location. |

158 | Reservoir Boosting: Between Online and Offline Ensemble Learning | Leonidas Lefakis, François Fleuret | We propose to train an ensemble with the help of a reservoir in which the learning algorithm can store a limited number of samples. |

159 | Multiclass Total Variation Clustering | Xavier Bresson, Thomas Laurent, David Uminsky, James von Brecht | This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. |

160 | Approximate Inference in Continuous Determinantal Processes | Raja Hafiz Affandi, Emily Fox, Ben Taskar | In this paper, we present efficient approximate DPP sampling schemes based on Nystrom and random Fourier feature approximations that apply to a wide range of kernel functions. |

161 | Global Solver and Its Efficient Approximation for Variational Bayesian Low-rank Subspace Clustering | Shinichi Nakajima, Akiko Takeda, S. Derin Babacan, Masashi Sugiyama, Ichiro Takeuchi | In this paper, we overcome this difficulty for low-rank subspace clustering (LRSC) by providing an exact global solver and its efficient approximation. |

162 | Thompson Sampling for 1-Dimensional Exponential Family Bandits | Nathaniel Korda, Emilie Kaufmann, Remi Munos | Here we extend them by proving asymptotic optimality of the algorithm using the Jeffreys prior for $1$-dimensional exponential family bandits. |

163 | Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion | Nguyen Viet Cuong, Wee Sun Lee, Nan Ye, Kian Ming A. Chai, Hai Leong Chieu | We introduce a new objective function for pool-based Bayesian active learning with probabilistic hypotheses. |

164 | It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals | Barbara Rakitsch, Christoph Lippert, Karsten Borgwardt, Oliver Stegle | Here, we propose a multi-task Gaussian process approach to model both the relatedness between regressors as well as the task correlations in the residuals, in order to more accurately identify true sharing between regressors. |

165 | Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses | Harish G. Ramaswamy, Shivani Agarwal, Ambuj Tewari | We give an explicit construction of a convex least-squares type surrogate loss that can be designed to be calibrated for any multiclass learning problem for which the target loss matrix has a low-rank structure; the surrogate loss operates on a surrogate target space of dimension at most the rank of the target loss. |

166 | Inverse Density as an Inverse Problem: the Fredholm Equation Approach | Qichao Que, Mikhail Belkin | We address the problem of estimating the ratio $\frac{q}{p}$ where $p$ is a density function and $q$ is another density, or, more generally an arbitrary function. |

167 | Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising | Forest Agostinelli, Michael R. Anderson, Honglak Lee | We present the multi-column stacked sparse denoising autoencoder, a novel technique of combining multiple SSDAs into a multi-column SSDA (MC-SSDA) by combining the outputs of each SSDA. |

168 | EDML for Learning Parameters in Directed and Undirected Graphical Models | Khaled S. Refaat, Arthur Choi, Adnan Darwiche | In this paper, we propose a greatly simplified perspective on EDML, which casts it as a general approach to continuous optimization. |

169 | Similarity Component Analysis | Soravit Changpinyo, Kuan Liu, Fei Sha | In this paper, we propose Similarity Component Analysis (SCA), a probabilistic graphical model that discovers those latent components from data. |

170 | Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs | Vikash K. Mansinghka, Tejas D. Kulkarni, Yura N. Perov, Josh Tenenbaum | We describe two applications: reading sequences of degraded and adversarially obscured alphanumeric characters, and inferring 3D road models from vehicle-mounted camera images. |

171 | Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation | John Duchi, Martin J. Wainwright, Michael I. Jordan | We provide a detailed study of the estimation of probability distributions—discrete and continuous—in a stringent setting in which data is kept private even from the statistician. |

172 | Firing rate predictions in optimal balanced networks | David G. Barrett, Sophie Denève, Christian K. Machens | This is an important problem because firing rates are one of the most important measures of network activity, in both the study of neural computation and neural network dynamics. |

173 | Manifold-based Similarity Adaptation for Label Propagation | Masayuki Karasuyama, Hiroshi Mamitsuka | We propose a method for a graph to capture the manifold structure of input features using edge weights parameterized by a similarity function. |

174 | Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty | Haichao Zhang, David Wipf | Using ideas from Bayesian inference and convex analysis, this paper derives a non-uniform blind deblurring algorithm with several desirable, yet previously-unexplored attributes. |

175 | Near-Optimal Entrywise Sampling for Data Matrices | Dimitris Achlioptas, Zohar S. Karnin, Edo Liberty | We consider the problem of independently sampling $s$ non-zero entries of a matrix $A$ in order to produce a sparse sketch of it, $B$, that minimizes $\|A-B\|_2$. |

176 | Learning to Prune in Metric and Non-Metric Spaces | Leonid Boytsov, Bilegsaikhan Naidan | We employ a VP-tree and explore two simple yet effective learning-to-prune approaches: density estimation through sampling and “stretching” of the triangle inequality. |

177 | Online learning in episodic Markovian decision processes by relative entropy policy search | Alexander Zimin, Gergely Neu | We study the problem of online learning in finite episodic Markov decision processes where the loss function is allowed to change between episodes. |

178 | Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result | Paul Wagner | As our second main result, we show for a substantial subset of soft-greedy value function approaches that, while having the potential to avoid policy oscillation and policy chattering, this subset can never converge toward any optimal policy, except in a certain pathological case. |

179 | Bayesian Hierarchical Community Discovery | Charles Blundell, Yee Whye Teh | We propose an efficient Bayesian nonparametric model for discovering hierarchical community structure in social networks. |

180 | From Bandits to Experts: A Tale of Domination and Independence | Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour | We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir (2011). |

181 | Predictive PAC Learning and Process Decompositions | Cosma Shalizi, Aryeh Kontorovich | In this paper, we argue that it is natural in predictive PAC to condition not on the past observations but on the mixture component of the sample path. |

182 | Pass-efficient unsupervised feature selection | Crystal Maung, Haim Schweitzer | We propose a new algorithm, a modification of the classical pivoted QR algorithm of Businger and Golub, that requires a small number of passes over the data. |

183 | Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors | Xiaoqin Zhang, Di Wang, Zhengyuan Zhou, Yi Ma | In this work, we propose a general method for recovering low-rank third-order tensors, in which the data can be deformed by some unknown transformation and corrupted by arbitrary sparse errors. |

184 | Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search | Aijun Bai, Feng Wu, Xiaoping Chen | In this paper we present a novel Bayesian mixture modelling and inference based Thompson sampling approach to addressing this dilemma. |

185 | Solving inverse problem of Markov chain with partial observations | Tetsuro Morimura, Takayuki Osogami, Tsuyoshi Ide | We formulate this task as a regularized optimization problem for probability functions, which is efficiently solved using the notion of natural gradient. |

186 | Locally Adaptive Bayesian Multivariate Time Series | Daniele Durante, Bruno Scarpa, David B. Dunson | We propose a continuous multivariate stochastic process for time series having locally varying smoothness in both the mean and covariance matrix. |

187 | Mapping paradigm ontologies to and from the brain | Yannick Schwartz, Bertrand Thirion, Gael Varoquaux | To that end, we propose a method that predicts the experimental paradigms across different studies. |

188 | Noise-Enhanced Associative Memories | Amin Karbasi, Amir Hesam Salavati, Amin Shokrollahi, Lav R. Varshney | Here we consider associative memories with noisy internal computations and analytically characterize performance. |

189 | Exact and Stable Recovery of Pairwise Interaction Tensors | Shouyuan Chen, Michael R. Lyu, Irwin King, Zenglin Xu | In this paper, we study the recovery algorithm for pairwise interaction tensors, which has recently gained considerable attention for modeling multiple attribute data due to its simplicity and effectiveness. |

190 | Bayesian entropy estimation for binary spike train data using parametric prior knowledge | Evan W. Archer, Il Memming Park, Jonathan W. Pillow | The parametric model captures high-level statistical features of the data, such as the average spike count in a spike word, which allows the posterior over entropy to concentrate more rapidly than with standard estimators (e.g., in cases where the probability of spiking differs strongly from 0.5). |

191 | Perfect Associative Learning with Spike-Timing-Dependent Plasticity | Christian Albers, Maren Westkott, Klaus Pawelzik | Recent extensions of the Perceptron, such as the Tempotron, suggest that this theoretical concept is also highly relevant for understanding networks of spiking neurons in the brain. |

192 | On Poisson Graphical Models | Eunho Yang, Pradeep K. Ravikumar, Genevera I. Allen, Zhandong Liu | In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables. |

193 | Streaming Variational Bayes | Tamara Broderick, Nicholas Boyd, Andre Wibisono, Ashia C. Wilson, Michael I. Jordan | We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. |

194 | Gaussian Process Conditional Copulas with Applications to Financial Time Series | José Miguel Hernández-Lobato, James R. Lloyd, Daniel Hernández-Lobato | To account for this, a Bayesian framework for the estimation of conditional copulas is proposed. |

195 | Extracting regions of interest from biological images with convolutional sparse block coding | Marius Pachitariu, Adam M. Packer, Noah Pettit, Henry Dalgleish, Michael Hausser, Maneesh Sahani | Formally, the model can be described as convolutional sparse block coding. |

196 | Approximate Dynamic Programming Finally Performs Well in the Game of Tetris | Victor Gabillon, Mohammad Ghavamzadeh, Bruno Scherrer | In this paper, we put our conjecture to test by applying such an ADP algorithm, called classification-based modified policy iteration (CBMPI), to the game of Tetris. |

197 | Third-Order Edge Statistics: Contour Continuation, Curvature, and Cortical Connections | Matthew Lawlor, Steven W. Zucker | Association field models have been used to explain human contour grouping performance and to explain the mean frequency of long-range horizontal connections across cortical columns in V1. |

198 | DESPOT: Online POMDP Planning with Regularization | Adhiraj Somani, Nan Ye, David Hsu, Wee Sun Lee | This paper presents an online lookahead search algorithm that alleviates these difficulties by limiting the search to a set of sampled scenarios. |

199 | Matrix Completion From any Given Set of Observations | Troy Lee, Adi Shraibman | We present a means to obtain performance guarantees with respect to any set of initial observations. |

200 | Regression-tree Tuning in a Streaming Setting | Samory Kpotufe, Francesco Orabona | We consider the problem of maintaining the data-structures of a partition-based regression procedure in a setting where the training data arrives sequentially over time. |

201 | Multiscale Dictionary Learning for Estimating Conditional Distributions | Francesca Petralia, Joshua T. Vogelstein, David B. Dunson | We propose a multiscale dictionary learning model, which expresses the conditional response density as a convex combination of dictionary densities, with the densities used and their weights dependent on the path through a tree decomposition of the feature space. |

202 | Dimension-Free Exponentiated Gradient | Francesco Orabona | We present a new online learning algorithm that extends the exponentiated gradient to infinite dimensional spaces. |

203 | Stochastic Optimization of PCA with Capped MSG | Raman Arora, Andy Cotter, Nati Srebro | We study PCA as a stochastic optimization problem and propose a novel stochastic approximation algorithm which we refer to as “Matrix Stochastic Gradient” (MSG), as well as a practical variant, Capped MSG. |

204 | On Flat versus Hierarchical Classification in Large-Scale Taxonomies | Rohit Babbar, Ioannis Partalas, Eric Gaussier, Massih R. Amini | We study in this paper flat and hierarchical classification strategies in the context of large-scale taxonomies. To this end, we first propose a multiclass, hierarchical data dependent bound on the generalization error of classifiers deployed in large-scale taxonomies. |

205 | Learning Gaussian Graphical Models with Observed or Latent FVSs | Ying Liu, Alan Willsky | In this paper, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles. |

206 | Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies | Yangqing Jia, Joshua T. Abbott, Joseph L. Austerweil, Tom Griffiths, Trevor Darrell | We present an algorithm for learning visual concepts directly from images, using probabilistic predictions generated by visual classifiers as the input to a Bayesian generalization model. |

207 | Robust Bloom Filters for Large MultiLabel Classification Tasks | Moustapha M. Cisse, Nicolas Usunier, Thierry Artières, Patrick Gallinari | This paper presents an approach to multilabel classification (MLC) with a large number of labels. |

208 | Solving the multi-way matching problem by permutation synchronization | Deepti Pachauri, Risi Kondor, Vikas Singh | In contrast, we propose a new method, permutation synchronization, which finds all the matchings jointly, in one shot, via a relaxation to eigenvector decomposition. |

209 | Generalizing Analytic Shrinkage for Arbitrary Covariance Structures | Daniel Bartz, Klaus-Robert Müller | We show that the proof of consistency implies bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. |

210 | Top-Down Regularization of Deep Belief Networks | Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim | We propose to implement the scheme using a method to regularize deep belief networks with top-down information. |

211 | Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions | Tamir Hazan, Subhransu Maji, Joseph Keshet, Tommi Jaakkola | In this work we develop efficient methods for learning random MAP predictors for structured label problems. |

212 | Heterogeneous-Neighborhood-based Multi-Task Local Learning Algorithms | Yu Zhang | In this paper, different from existing methods, we propose local learning methods for multi-task classification and regression problems based on heterogeneous neighborhood which is defined on data points from all tasks. |

213 | Machine Teaching for Bayesian Learners in the Exponential Family | Jerry Zhu | We propose an optimal teaching framework aimed at learners who employ Bayesian models. |

214 | Scoring Workers in Crowdsourcing: How Many Control Questions are Enough? | Qiang Liu, Alexander T. Ihler, Mark Steyvers | We study the problem of estimating continuous quantities, such as prices, probabilities, and point spreads, using a crowdsourcing approach. |

215 | Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths | Stefan Mathe, Cristian Sminchisescu | Our work makes three contributions towards addressing this problem. |

216 | A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data | Jasper Snoek, Richard Zemel, Ryan P. Adams | We develop a novel model based on a determinantal point process over latent embeddings of neurons that effectively captures and helps visualize complex inhibitory and competitive interaction. |

217 | Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model | Fang Han, Han Liu | In this paper we focus on the principal component regression and its application to high dimension non-Gaussian data. |

218 | Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation | Bogdan Savchynskyy, Jörg Hendrik Kappes, Paul Swoboda, Christoph Schnörr | We propose a novel method of combining combinatorial and convex programming techniques to obtain a global solution of the initial combinatorial problem. |

219 | Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic | James L. Sharpnack, Akshay Krishnamurthy, Aarti Singh | In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. |

220 | Demixing odors – fast inference in olfaction | Agnieszka Grabska-Barwinska, Jeff Beck, Alexandre Pouget, Peter Latham | Here we derive neural implementations of two approximate inference algorithms that could be used by the brain. |

221 | Learning Multiple Models via Regularized Weighting | Daniel Vainsencher, Shie Mannor, Huan Xu | We propose a different general formulation that seeks for each model a distribution over data points; the weights are regularized to be sufficiently spread out. |

222 | When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity | Anima Anandkumar, Daniel J. Hsu, Majid Janzamin, Sham M. Kakade | In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. |

223 | Distributed k-means and k-median Clustering on General Topologies | Maria-Florina F. Balcan, Steven Ehrlich, Yingyu Liang | This paper provides new algorithms for distributed clustering for two popular center-based objectives, $k$-median and $k$-means. |

224 | Multi-Task Bayesian Optimization | Kevin Swersky, Jasper Snoek, Ryan P. Adams | In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. |

225 | Online Learning of Dynamic Parameters in Social Networks | Shahin Shahrampour, Sasha Rakhlin, Ali Jadbabaie | Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. |

226 | A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles | Jinwoo Shin, Andrew E. Gelfand, Misha Chertkov | In this paper, we design a BP algorithm for the Maximum Weight Matching (MWM) problem over general graphs. |

227 | Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space | Xinhua Zhang, Wee Sun Lee, Yee Whye Teh | In this paper, we propose a framework for learning in reproducing kernel Hilbert spaces (RKHS) using local invariances that explicitly characterize the behavior of the target function around data instances. |

228 | Approximate Gaussian process inference for the drift function in stochastic differential equations | Andreas Ruttor, Philipp Batz, Manfred Opper | We introduce a nonparametric approach for estimating drift functions in systems of stochastic differential equations from incomplete observations of the state vector. |

229 | Distributed Submodular Maximization: Identifying Representative Elements in Massive Data | Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, Andreas Krause | In this paper, we consider the problem of submodular function maximization in a distributed fashion. |

230 | Adaptive Market Making via Online Learning | Jacob Abernethy, Satyen Kale | We propose a class of spread-based market making strategies whose performance can be controlled even under worst-case (adversarial) settings. |

231 | On the Sample Complexity of Subspace Learning | Alessandro Rudi, Guillermo D. Canas, Lorenzo Rosasco | In this paper we introduce a general formulation of this problem and derive novel learning error estimates. |

232 | Spike train entropy-rate estimation using hierarchical Dirichlet process priors | Karin C. Knudson, Jonathan W. Pillow | We present both a fully Bayesian and empirical Bayes entropy rate estimator based on this model, and demonstrate their performance on simulated and real neural spike train data. |

233 | Embed and Project: Discrete Sampling with Universal Hashing | Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman | We propose a sampling algorithm, called PAWS, based on embedding the set into a higher-dimensional space which is then randomly projected using universal hash functions to a lower-dimensional subspace and explored using combinatorial search methods. |

234 | Discriminative Transfer Learning with Tree-based Priors | Nitish Srivastava, Ruslan R. Salakhutdinov | This paper proposes a way of improving classification performance for classes which have very few training examples. |

235 | Small-Variance Asymptotics for Hidden Markov Models | Anirban Roychowdhury, Ke Jiang, Brian Kulis | We present a small-variance asymptotic analysis of the Hidden Markov Model and its infinite-state Bayesian nonparametric extension. |

236 | Convergence of Monte Carlo Tree Search in Simultaneous Move Games | Viliam Lisy, Vojta Kovarik, Marc Lanctot, Branislav Bosansky | In this paper, we study Monte Carlo tree search (MCTS) in zero-sum extensive-form games with perfect information and simultaneous moves. |

237 | DeViSE: A Deep Visual-Semantic Embedding Model | Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc’Aurelio Ranzato, Tomas Mikolov | In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data as well as semantic information gleaned from unannotated text. |

238 | Reward Mapping for Transfer in Long-Lived Agents | Xiaoxiao Guo, Satinder Singh, Richard L. Lewis | We consider how to transfer knowledge from previous tasks to a current task in long-lived and bounded agents that must solve a sequence of MDPs over a finite lifetime. |

239 | Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation | Martin Azizyan, Aarti Singh, Larry Wasserman | In this paper, we provide precise information theoretic bounds on the clustering accuracy and sample complexity of learning a mixture of two isotropic Gaussians in high dimensions under small mean separation. |

240 | Predicting Parameters in Deep Learning | Misha Denil, Babak Shakibi, Laurent Dinh, Marc’Aurelio Ranzato, Nando de Freitas | We demonstrate that there is significant redundancy in the parameterization of several deep learning models. |

241 | Estimating the Unseen: Improved Estimators for Entropy and other Properties | Paul Valiant, Gregory Valiant | We propose a novel modification of this approach and show: 1) theoretically, our estimator is optimal (to constant factors, over worst-case instances), and 2) in practice, it performs exceptionally well for a variety of estimation tasks, on a variety of natural distributions, for a wide range of parameters. |

242 | What do row and column marginals reveal about your dataset? | Behzad Golshan, John Byers, Evimaria Terzi | Here, we investigate how these data can be exploited to make inferences about the underlying matrix H. Instead of assuming a generative model for H, we view the input marginals as constraints on the dataspace of possible realizations of H and compute the probability density function of particular entries H(i,j) of interest. |

243 | RNADE: The real-valued neural autoregressive density-estimator | Benigno Uria, Iain Murray, Hugo Larochelle | We introduce RNADE, a new model for joint density estimation of real-valued vectors. |

244 | Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards | Thomas Bonald, Alexandre Proutiere | We propose a novel algorithm where the decision to exploit any arm is based on two successive targets, namely, the total number of successes until the first failure and the first $m$ failures, respectively, where $m$ is a fixed parameter. |

245 | Reconciling "priors" & "priors" without prejudice? | Remi Gribonval, Pierre Machart | The contribution of this paper is twofold. |

246 | Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis | Nikhil Rao, Christopher Cox, Rob Nowak, Timothy T. Rogers | The main contribution of this paper is a new procedure called *Sparse Overlapping Sets (SOS) lasso*, a convex optimization that automatically selects similar features for related learning tasks. |

247 | Sensor Selection in High-Dimensional Gaussian Trees with Nuisances | Daniel S. Levine, Jonathan P. How | We consider the sensor selection problem on multivariate Gaussian distributions where only a *subset* of latent variables is of inferential interest. |

248 | Sequential Transfer in Multi-armed Bandit with Finite Set of Models | Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill | We introduce a novel bandit algorithm based on a method-of-moments approach for the estimation of the possible tasks and derive regret bounds for it. |

249 | Buy-in-Bulk Active Learning | Liu Yang, Jaime Carbonell | In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. |

250 | Contrastive Learning Using Spectral Methods | James Y. Zou, Daniel J. Hsu, David C. Parkes, Ryan P. Adams | This paper formalizes this notion of contrastive learning for mixture models, and develops spectral algorithms for inferring mixture components specific to a foreground data set when contrasted with a background data set. |

251 | Message Passing Inference with Chemical Reaction Networks | Nils E. Napp, Ryan P. Adams | In this work, we develop a procedure that can take arbitrary probabilistic graphical models, represented as factor graphs over discrete random variables, and compile them into chemical reaction networks that implement inference. |

252 | Eluder Dimension and the Sample Complexity of Optimistic Exploration | Daniel Russo, Benjamin Van Roy | In this paper, we develop a regret bound that holds for both classes of algorithms. |

253 | Learning word embeddings efficiently with noise-contrastive estimation | Andriy Mnih, Koray Kavukcuoglu | We propose a simple and scalable new approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation. |

254 | Sparse Inverse Covariance Estimation with Calibration | Tuo Zhao, Han Liu | We propose a semiparametric procedure for estimating a high-dimensional sparse inverse covariance matrix. |

255 | Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization | Julien Mairal | In this paper, we intend to make this principle scalable. |

256 | Sinkhorn Distances: Lightspeed Computation of Optimal Transport | Marco Cuturi | We propose in this work a new family of optimal transportation distances that look at transportation problems from a maximum-entropy perspective. |

257 | Speedup Matrix Completion with Side Information: Application to Multi-Label Learning | Miao Xu, Rong Jin, Zhi-Hua Zhou | In this work, we develop a novel theory of matrix completion that explicitly explores side information to reduce the required number of observed entries. |

258 | Compete to Compute | Rupesh K. Srivastava, Jonathan Masci, Sohrob Kazerounian, Faustino Gomez, Jürgen Schmidhuber | We apply the concept to gradient-based, backprop-trained artificial multilayer NNs. |

259 | Fast Determinantal Point Process Sampling with Application to Clustering | Byungkon Kang | In this paper, we address this problem by constructing a rapidly mixing Markov chain, from which we can acquire a sample from the given DPP in sub-cubic time. |

260 | Information-theoretic lower bounds for distributed statistical estimation with communication constraints | Yuchen Zhang, John Duchi, Michael I. Jordan, Martin J. Wainwright | We establish minimax risk lower bounds for distributed statistical estimation given a budget $B$ of the total number of bits that may be communicated. |

261 | Projected Natural Actor-Critic | Philip S. Thomas, William C. Dabney, Stephen Giguere, Sridhar Mahadevan | In this paper we address a drawback of natural actor-critics that limits their real-world applicability – their lack of safety guarantees. |

262 | How to Hedge an Option Against an Adversary: Black-Scholes Pricing is Minimax Optimal | Jacob Abernethy, Peter L. Bartlett, Rafael Frongillo, Andre Wibisono | We consider a popular problem in finance, option pricing, through the lens of an online learning game between Nature and an Investor. |

263 | Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests | Yacine Jernite, Yonatan Halpern, David Sontag | We give a polynomial-time algorithm for provably learning the structure and parameters of bipartite noisy-or Bayesian networks of binary variables where the top layer is completely hidden. |

264 | Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion | Franz Kiraly, Louis Theran | We propose a general framework for reconstructing and denoising single entries of incomplete and noisy matrices. |

265 | Learning the Local Statistics of Optical Flow | Dan Rosenbaum, Daniel Zoran, Yair Weiss | Motivated by recent progress in natural image statistics, we use newly available datasets with ground truth optical flow to learn the local statistics of optical flow and rigorously compare the learned model to prior models assumed by computer vision optical flow algorithms. |

266 | Aggregating Optimistic Planning Trees for Solving Markov Decision Processes | Gunnar Kedenburg, Raphael Fonteneau, Remi Munos | We propose a new algorithm which is based on the construction of a forest of single successor state planning trees. |

267 | Robust learning of low-dimensional dynamics from large neural ensembles | David Pfau, Eftychios A. Pnevmatikakis, Liam Paninski | Here, we present an approach to dimensionality reduction for neural data that is convex, does not make strong assumptions about dynamics, does not require averaging over many trials and is extensible to more complex statistical models that combine local and global influences. |

268 | Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising | Min Xu, Tao Qin, Tie-Yan Liu | In this paper, we show that the naive application of MAB algorithms to search advertising for advertisement selection will produce sample selection bias that harms the search engine by decreasing expected revenue and “estimation of the largest mean” (ELM) bias that harms the advertisers by increasing game-theoretic player-regret. |

269 | Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization | Nataliya Shapovalova, Michalis Raptis, Leonid Sigal, Greg Mori | We propose a new weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. |

270 | A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables | Jing Xiang, Seyoung Kim | In this paper, we propose a single-stage method, called A* lasso, that recovers the optimal sparse Bayesian network structure by solving a single optimization problem with A* search algorithm that uses lasso in its scoring system. |

271 | The Total Variation on Hypergraphs – Learning on Hypergraphs Revisited | Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram | In this paper we present a new learning framework on hypergraphs which fully uses the hypergraph structure. |

272 | Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints | Rishabh K. Iyer, Jeff A. Bilmes | We investigate two new optimization problems — minimizing a submodular function subject to a submodular lower bound constraint (submodular cover) and maximizing a submodular function subject to a submodular upper bound constraint (submodular knapsack). |

273 | Scalable Inference for Logistic-Normal Topic Models | Jianfei Chen, Jun Zhu, Zi Wang, Xun Zheng, Bo Zhang | This paper presents a partially collapsed Gibbs sampling algorithm that approaches the provably correct distribution by exploring the ideas of data augmentation. |

274 | Spectral methods for neural characterization using generalized quadratic models | Il Memming Park, Evan W. Archer, Nicholas Priebe, Jonathan W. Pillow | We describe a set of fast, tractable methods for characterizing neural responses to high-dimensional sensory stimuli using a model we refer to as the generalized quadratic model (GQM). |

275 | Universal models for binary spike patterns using centered Dirichlet processes | Il Memming Park, Evan W. Archer, Kenneth Latimer, Jonathan W. Pillow | To overcome these limitations, we propose a family of “universal” models for binary spike patterns, where universality refers to the ability to model arbitrary distributions over all $2^m$ binary patterns. |

276 | Synthesizing Robust Plans under Incomplete Domain Models | Tuan A. Nguyen, Subbarao Kambhampati, Minh Do | Most current planners assume complete domain models and focus on generating correct plans. |

277 | Integrated Non-Factorized Variational Inference | Shaobo Han, Xuejun Liao, Lawrence Carin | We present a non-factorized variational method for full posterior inference in Bayesian hierarchical models, with the goal of capturing the posterior variable dependencies via efficient and possibly parallel computation. |

278 | Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions | Ari Pakman, Liam Paninski | We present a new approach to sample from generic binary distributions, based on an exact Hamiltonian Monte Carlo algorithm applied to a piecewise continuous augmentation of the binary distribution of interest. |

279 | Symbolic Opportunistic Policy Iteration for Factored-Action MDPs | Aswin Raghavan, Roni Khardon, Alan Fern, Prasad Tadepalli | Our first contribution is a novel method for symbolic policy backups via the application of constraints, which is used to yield a new efficient symbolic implementation of modified PI (MPI) for factored action spaces. |

280 | Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions | Yasin Abbasi, Peter L. Bartlett, Varun Kanade, Yevgeny Seldin, Csaba Szepesvari | We present an algorithm that, under a mixing assumption, achieves $O(\sqrt{T\log|\Pi|}+\log|\Pi|)$ regret with respect to a comparison set of policies $\Pi$. |

281 | Flexible sampling of discrete data correlations without the marginal distributions | Alfredo Kalaitzis, Ricardo Silva | We present an efficient algorithm based on recent advances on constrained Hamiltonian Markov chain Monte Carlo that is simple to implement and does not require paying for a quadratic cost in sample size. |

282 | One-shot learning by inverting a compositional causal process | Brenden M. Lake, Ruslan R. Salakhutdinov, Josh Tenenbaum | Here we present a Hierarchical Bayesian model based on compositionality and causality that can learn a wide range of natural (although simple) visual concepts, generalizing in human-like ways from just one image. |

283 | Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators. | Michel Besserve, Nikos K. Logothetis, Bernhard Schölkopf | Here we provide a general framework for the statistical analysis of these interactions when random variables are sampled from stationary time-series of arbitrary objects. |

284 | Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis | James R. Voss, Luis Rademacher, Mikhail Belkin | The two main contributions of this work are as follows: 1. |

285 | Deep Neural Networks for Object Detection | Christian Szegedy, Alexander Toshev, Dumitru Erhan | In this paper we go one step further and address the problem of object detection — not only classifying but also precisely localizing objects of various classes using DNNs. |

286 | Geometric optimisation on positive definite matrices for elliptically contoured distributions | Suvrit Sra, Reshad Hosseini | In this paper we develop *geometric optimisation* for globally optimising certain nonconvex loss functions arising in the modelling of data via elliptically contoured distributions (ECDs). |

287 | Sign Cauchy Projections and Chi-Square Kernel | Ping Li, Gennady Samorodnitsky, John Hopcroft | In this paper, we propose to use only the signs of the projected data and show that the probability of collision (i.e., when the two signs differ) can be accurately approximated as a function of the chi-square ($\chi^2$) similarity, which is a popular measure for nonnegative data (e.g., when features are generated from histograms as common in text and vision applications). |

288 | Relevance Topic Model for Unstructured Social Group Activity Recognition | Fang Zhao, Yongzhen Huang, Liang Wang, Tieniu Tan | To tackle this problem, we propose a “relevance topic model” for jointly learning meaningful mid-level representations upon bag-of-words (BoW) video representations and a classifier with sparse weights. |

289 | k-Prototype Learning for 3D Rigid Structures | Hu Ding, Ronald Berezney, Jinhui Xu | In this paper, we study the following new variant of prototype learning, called the *$k$-prototype learning problem for 3D rigid structures*: Given a set of 3D rigid structures, find a set of $k$ rigid structures so that each of them is a prototype for a cluster of the given rigid structures and the total cost (or dissimilarity) is minimized. |

290 | Restricting exchangeable nonparametric distributions | Sinead A. Williamson, Steve N. MacEachern, Eric P. Xing | In this paper, we propose a class of exchangeable nonparametric priors obtained by restricting the domain of existing models. |

291 | Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting | Shunan Zhang, Angela J. Yu | We investigate this behavior in the context of a multi-armed bandit task. |

292 | Probabilistic Movement Primitives | Alexandros Paraschos, Christian Daniel, Jan R. Peters, Gerhard Neumann | We present a probabilistic formulation of the MP concept that maintains a distribution over trajectories. |

293 | Policy Shaping: Integrating Human Feedback with Reinforcement Learning | Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles L. Isbell, Andrea L. Thomaz | In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. |

294 | Multilinear Dynamical Systems for Tensor Time Series | Mark Rogers, Lei Li, Stuart J. Russell | In this paper, we propose the multilinear dynamical system (MLDS) for modeling tensor time series and an expectation-maximization (EM) algorithm to estimate the parameters. |

295 | Deep content-based music recommendation | Aaron van den Oord, Sander Dieleman, Benjamin Schrauwen | In this paper, we propose to use a latent factor model for recommendation, and predict the latent factors from music audio when they cannot be obtained from usage data. |

296 | A Stability-based Validation Procedure for Differentially Private Machine Learning | Kamalika Chaudhuri, Staal A. Vinterbo | In this paper, we introduce a generic validation procedure for differentially private machine learning algorithms that applies when a certain stability condition holds on the training algorithm and the validation performance metric. |

297 | Capacity of strong attractor patterns to model behavioural and cognitive prototypes | Abbas Edalat | We solve the mean field equations for a stochastic Hopfield network with temperature (noise) in the presence of strong, i.e., multiply stored patterns, and use this solution to obtain the storage capacity of such a network. |

298 | Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA | Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe | We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-$d$ projection matrices (the Fantope). |

299 | Cluster Trees on Manifolds | Sivaraman Balakrishnan, Srivatsan Narayanan, Alessandro Rinaldo, Aarti Singh, Larry Wasserman | We investigate the problem of estimating the cluster tree for a density $f$ supported on or near a smooth $d$-dimensional manifold $M$ isometrically embedded in $\mathbb{R}^D$. |

300 | Bayesian inference for low rank spatiotemporal neural receptive fields | Mijung Park, Jonathan W. Pillow | In particular, we introduce a novel prior over low-rank RFs using the restriction of a matrix normal prior to the manifold of low-rank matrices. |

301 | Adaptive Submodular Maximization in Bandit Setting | Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan | We propose an efficient algorithm for solving our problem and prove that its expected cumulative regret increases logarithmically with time. |

302 | Generalized Method-of-Moments for Rank Aggregation | Hossein Azari Soufiani, William Chen, David C. Parkes, Lirong Xia | In this paper we propose a class of efficient Generalized Method-of-Moments (GMM) algorithms for computing parameters of the Plackett-Luce model, where the data consists of full rankings over alternatives. |

303 | Analyzing Hogwild Parallel Gaussian Gibbs Sampling | Matthew J. Johnson, James Saunderson, Alan Willsky | We develop a framework which provides convergence conditions and error bounds along with simple proofs and connections to methods in numerical linear algebra. |

304 | Minimax Optimal Algorithms for Unconstrained Linear Optimization | Brendan McMahan, Jacob Abernethy | We design and analyze minimax-optimal algorithms for online linear optimization games where the player’s choice is unconstrained. |

305 | (Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings | Abhradeep Guha Thakurta, Adam Smith | We provide a general technique for making online learning algorithms differentially private, in both the full information and bandit settings. |

306 | Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions | Rishabh K. Iyer, Stefanie Jegelka, Jeff A. Bilmes | In the former two problems, we obtain these bounds through a generic black-box transformation (which can potentially work for any algorithm), while in the case of submodular minimization, we propose a framework of algorithms which depend on choosing an appropriate surrogate for the submodular function. |

307 | Σ-Optimality for Active Learning on Gaussian Random Fields | Yifei Ma, Roman Garnett, Jeff Schneider | In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. |

308 | Learning Kernels Using Local Rademacher Complexity | Corinna Cortes, Marius Kloft, Mehryar Mohri | We devise two new learning kernel algorithms: one based on a convex optimization problem for which we give an efficient solution using existing learning kernel techniques, and another one that can be formulated as a DC-programming problem for which we describe a solution in detail. |

309 | Annealing between distributions by averaging moments | Roger B. Grosse, Chris J. Maddison, Ruslan R. Salakhutdinov | We present a novel sequence of intermediate distributions for exponential families: averaging the moments of the initial and target distributions. |

310 | Optimizing Instructional Policies | Robert V. Lindsey, Michael C. Mozer, William J. Huggins, Harold Pashler | We propose an experimental technique for searching policy spaces using Gaussian process surrogate-based optimization and a generative model of student performance. |

311 | Translating Embeddings for Modeling Multi-relational Data | Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, Oksana Yakhnenko | Our objective is to propose a canonical model which is easy to train, contains a reduced number of parameters and can scale up to very large databases. |

312 | Phase Retrieval using Alternating Minimization | Praneeth Netrapalli, Prateek Jain, Sujay Sanghavi | In this paper, we show that a simple alternating minimization algorithm geometrically converges to the solution of one such problem, finding a vector $x$ from $y, A$ (where $y = |A'x|$ and $|z|$ denotes a vector of element-wise magnitudes of $z$), under the assumption that $A$ is Gaussian. |

313 | Real-Time Inference for a Gamma Process Model of Neural Spiking | David E. Carlson, Vinayak Rao, Joshua T. Vogelstein, Lawrence Carin | Via exploratory data analysis, using data with partial ground truth as well as two novel data sets, we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) detecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using multiple channels. |

314 | Understanding Dropout | Pierre Baldi, Peter J. Sadowski | We introduce a general formalism for studying dropout on either units or connections, with arbitrary probability values, and use it to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks. |

315 | The Power of Asymmetry in Binary Hashing | Behnam Neyshabur, Nati Srebro, Ruslan R. Salakhutdinov, Yury Makarychev, Payman Yadollahpour | When approximating binary similarity using the Hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps. |

316 | Estimation, Optimization, and Parallelism when Data is Sparse | John Duchi, Michael I. Jordan, Brendan McMahan | We study stochastic optimization problems when the *data* is sparse, which is in a sense dual to the current understanding of high-dimensional statistical learning and optimization. |

317 | A multi-agent control framework for co-adaptation in brain-computer interfaces | Josh S. Merel, Roy Fox, Tony Jebara, Liam Paninski | We present an approach to model this process of co-adaptation between the encoding model of the neural signal and the decoding algorithm as a multi-agent formulation of the linear quadratic Gaussian (LQG) control problem. |

318 | Modeling Overlapping Communities with Node Popularities | Prem K. Gopalan, Chong Wang, David Blei | We develop a probabilistic approach for accurate network modeling using node popularities within the framework of the mixed-membership stochastic blockmodel (MMSB). |

319 | Learning from Limited Demonstrations | Beomjoon Kim, Amir-massoud Farahmand, Joelle Pineau, Doina Precup | We propose an approach to learning from demonstration (LfD) which leverages expert data, even if the expert examples are very few or inaccurate. |

320 | On the Complexity and Approximation of Binary Evidence in Lifted Inference | Guy Van den Broeck, Adnan Darwiche | In this paper, we balance this grim result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. |

321 | On the Representational Efficiency of Restricted Boltzmann Machines | James Martens, Arkadev Chattopadhyay, Toni Pitassi, Richard Zemel | This paper examines the question: What kinds of distributions can be efficiently represented by Restricted Boltzmann Machines (RBMs)? |

322 | Memory Limited, Streaming PCA | Ioannis Mitliagkas, Constantine Caramanis, Prateek Jain | We present an algorithm that achieves both: it uses $O(kp)$ memory (meaning storage of any kind) and is able to compute the $k$-dimensional spike with $O(p \log p)$ sample-complexity — the first algorithm of its kind. |

323 | An Approximate, Efficient LP Solver for LP Rounding | Srikrishna Sridhar, Stephen Wright, Christopher Re, Ji Liu, Victor Bittorf, Ce Zhang | We propose a scheme that is based on a quadratic program relaxation which allows us to use parallel stochastic-coordinate-descent to approximately solve large linear programs efficiently. |

324 | Linear decision rule as aspiration for simple decision heuristics | Özgür Simsek | This research has identified three environmental structures that aid heuristics: dominance, cumulative dominance, and noncompensatoriness. |

325 | On the Relationship Between Binary Classification, Bipartite Ranking, and Binary Class Probability Estimation | Harikrishna Narasimhan, Shivani Agarwal | In this paper, we introduce the notion of weak regret transfer bounds, where the mapping needed to transform a model from one problem to another depends on the underlying probability distribution (and in practice, must be estimated from data). |

326 | Bayesian inference as iterated random functions with applications to sequential inference in graphical models | Arash Amini, XuanLong Nguyen | We propose a general formalism of iterated random functions with semigroup property, under which exact and approximate Bayesian posterior updates can be viewed as specific instances. |

327 | Compressive Feature Learning | Hristo S. Paskov, Robert West, John C. Mitchell, Trevor Hastie | This paper addresses the problem of unsupervised feature learning for text data. |

328 | Moment-based Uniform Deviation Bounds for k-means and Friends | Matus J. Telgarsky, Sanjoy Dasgupta | Moment-based Uniform Deviation Bounds for k-means and Friends |

329 | Fast Template Evaluation with Vector Quantization | Mohammad Amin Sadeghi, David Forsyth | We describe a method that achieves a substantial end-to-end speedup over the best current methods, without loss of accuracy. |

330 | Context-sensitive active sensing in humans | Sheeraz Ahmad, He Huang, Angela J. Yu | Here, we propose a myopic approximation to C-DAC, which also takes behavioral costs into account, but achieves a significant reduction in complexity by looking only one step ahead. We also present data from a human active visual search experiment, and compare the performance of the various models against human behavior. |

331 | A New Convex Relaxation for Tensor Completion | Bernardino Romera-Paredes, Massimiliano Pontil | In this paper, we highlight some limitations of this approach and propose an alternative convex relaxation on the Euclidean unit ball. |

332 | Variational Planning for Graph-based MDPs | Qiang Cheng, Qiang Liu, Feng Chen, Alexander T. Ihler | We present a new variational framework to describe and solve the planning problem of MDPs, and derive both exact and approximate planning algorithms. |

333 | Convex Two-Layer Modeling | Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans | Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. |

334 | Sketching Structured Matrices for Faster Nonlinear Regression | Haim Avron, Vikas Sindhwani, David Woodruff | We present empirical results confirming both the practical value of our modeling framework, as well as speedup benefits of randomized regression. |

335 | (More) Efficient Reinforcement Learning via Posterior Sampling | Ian Osband, Daniel Russo, Benjamin Van Roy | Most provably efficient learning algorithms introduce optimism about poorly-understood states and actions to encourage exploration. |

336 | Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition | Adel Javanmard, Andrea Montanari | We assume that only a small subset of covariates is 'active' (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. |

337 | Efficient Exploration and Value Function Generalization in Deterministic Systems | Zheng Wen, Benjamin Van Roy | We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization. |

338 | Bellman Error Based Feature Generation using Random Projections on Sparse Spaces | Mahdi Milani Fard, Yuri Grinberg, Amir-massoud Farahmand, Joelle Pineau, Doina Precup | We propose a simple, fast and robust algorithm based on random projections, which generates BEBFs for sparse feature spaces. |

339 | Learning and using language via recursive pragmatic reasoning about other agents | Nathaniel J. Smith, Noah Goodman, Michael Frank | We describe a model in which language learners assume that they jointly approximate a shared, external lexicon and reason recursively about the goals of others in using this lexicon. |

340 | Learning Stochastic Inverses | Andreas Stuhlmüller, Jacob Taylor, Noah Goodman | To make use of inverses before convergence, we describe the Inverse MCMC algorithm, which uses stochastic inverses to make block proposals for a Metropolis-Hastings sampler. |

341 | Learning invariant representations and applications to face verification | Qianli Liao, Joel Z. Leibo, Tomaso Poggio | In accord with a recent theory of transformation-invariance, we propose a model that, while capturing other common convolutional networks as special cases, can also be used with arbitrary identity-preserving transformations. |

342 | Optimization, Learning, and Games with Predictable Sequences | Sasha Rakhlin, Karthik Sridharan | We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. |

343 | Adaptivity to Local Smoothness and Dimension in Kernel Regression | Samory Kpotufe, Vikas Garg | We present the first result for kernel regression where the procedure adapts locally at a point $x$ to both the unknown local dimension of the metric and the unknown Hölder-continuity of the regression function at $x$. |

344 | Adaptive dropout for training deep neural networks | Jimmy Ba, Brendan Frey | We describe a model in which a binary belief network is overlaid on a neural network and is used to decrease the information content of its hidden units by selectively setting activities to zero. |

345 | Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream | Daniel L. Yamins, Ha Hong, Charles Cadieu, James J. DiCarlo | In this work, we construct models of the ventral stream using a novel optimization procedure for category-level object recognition problems, and produce RDMs resembling both macaque IT and human ventral stream. |

346 | Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex | Sam Patterson, Yee Whye Teh | In this paper we investigate the use of Langevin Monte Carlo methods on the probability simplex and propose a new method, Stochastic gradient Riemannian Langevin dynamics, which is simple to implement and can be applied online. |

347 | Distributed Representations of Words and Phrases and their Compositionality | Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, Jeff Dean | In this paper we present several improvements that make the Skip-gram model more expressive and enable it to learn higher quality vectors more rapidly. |

348 | Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel | Tai Qin, Karl Rohe | The current paper extends the previous theoretical results to the more canonical spectral clustering algorithm in a way that removes any assumption on the minimum degree and provides guidance on the choice of tuning parameter. |

349 | Analyzing the Harmonic Structure in Graph-Based Learning | Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang | In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. |

350 | Recurrent linear models of simultaneously-recorded neural populations | Marius Pachitariu, Biljana Petreska, Maneesh Sahani | Here we describe a new, scalable approach to discovering the low-dimensional dynamics that underlie simultaneously recorded spike trains from a neural population. |

351 | Scalable Influence Estimation in Continuous-Time Diffusion Networks | Nan Du, Le Song, Manuel Gomez Rodriguez, Hongyuan Zha | In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. |

352 | Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC | Roger Frigola, Fredrik Lindsten, Thomas B. Schön, Carl Edward Rasmussen | We present a fully Bayesian approach to inference and learning in nonlinear nonparametric state-space models. |

353 | BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables | Cho-Jui Hsieh, Matyas A. Sustik, Inderjit S. Dhillon, Pradeep K. Ravikumar, Russell Poldrack | In this paper, we develop an algorithm BigQUIC, which can solve 1 million dimensional l1-regularized Gaussian MLE problems (which would thus have 1000 billion parameters) using a single machine, with bounded memory. |

354 | The Fast Convergence of Incremental PCA | Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund | We prove the first finite-sample convergence rates for any incremental PCA algorithm using sub-quadratic time and memory per iteration. |

355 | Multisensory Encoding, Decoding, and Identification | Aurel A. Lazar, Yevgeniy Slutskiy | We investigate a spiking neuron model of multisensory integration. |

356 | Adaptive Anonymity via b-Matching | Krzysztof M. Choromanski, Tony Jebara, Kui Tang | Novel algorithms and theory are provided to implement this type of anonymity. |

357 | Optimal integration of visual speed across different spatiotemporal frequency channels | Matjaz Jogan, Alan A. Stocker | Here we propose that perceived speed is the result of optimal integration of speed information from independent spatiotemporal frequency tuned channels. |

358 | Matrix factorization with binary components | Martin Slawski, Matthias Hein, Pavlo Lutsik | Motivated by an application in computational biology, we consider constrained low-rank matrix factorization problems with $\{0,1\}$-constraints on one of the factors. |

359 | Learning to Pass Expectation Propagation Messages | Nicolas Heess, Daniel Tarlow, John Winn | In this work, we study the question of whether it is possible to automatically derive fast and accurate EP updates by learning a discriminative model (e.g., a neural network or random forest) to map EP message inputs to EP message outputs. |

360 | Robust Low Rank Kernel Embeddings of Multivariate Distributions | Le Song, Bo Dai | In this paper, we propose a hierarchical low rank decomposition of kernel embeddings which can exploit such low rank structures in data while being robust to model misspecification. |