Paper Digest: NIPS 2014 Highlights

December 7, 2014 / October 6, 2019 / admin

The Conference on Neural Information Processing Systems (NIPS) is one of the top machine learning conferences in the world. In 2014, it is to be held in Montreal, Canada.

To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free paper digest service to get new paper updates customized to your own interests on a daily basis.

Paper Digest Team
team@paperdigest.org

TABLE 1: NIPS 2014 Papers

No.	Title	Authors	Highlight
1	Kernel Mean Estimation via Spectral Filtering	Krikamol Muandet, Bharath Sriperumbudur, Bernhard Schölkopf	Using the kernel PCA basis, we show that some of these estimators can be constructed using spectral filtering algorithms which are shown to be consistent under some technical assumptions.
2	Semi-Separable Hamiltonian Monte Carlo for Inference in Bayesian Hierarchical Models	Yichuan Zhang, Charles Sutton	We introduce a new RMHMC method, which we call semi-separable Hamiltonian Monte Carlo, which uses a specially designed mass matrix that allows the joint Hamiltonian over model parameters and hyperparameters to decompose into two simpler Hamiltonians.
3	Communication Efficient Distributed Machine Learning with the Parameter Server	Mu Li, David G. Andersen, Alexander J. Smola, Kai Yu	This paper describes a third-generation parameter server framework for distributed machine learning.
4	The Infinite Mixture of Infinite Gaussian Mixtures	Halid Z. Yerebakan, Bartek Rajwa, Murat Dundar	Herein, we present the infinite mixture of infinite Gaussian mixtures (I2GMM) for more flexible modeling of data sets with skewed and multi-modal cluster distributions.
5	Robust Classification Under Sample Selection Bias	Anqi Liu, Brian Ziebart	We develop a framework for robustly learning a probabilistic classifier that adapts to different sample selection biases using a minimax estimation formulation.
6	Zeta Hull Pursuits: Learning Nonconvex Data Hulls	Yuanjun Xiong, Wei Liu, Deli Zhao, Xiaoou Tang	Employing this measure, we present a greedy algorithmic framework, dubbed Zeta Hulls, to perform structured column sampling.
7	Grouping-Based Low-Rank Trajectory Completion and 3D Reconstruction	Katerina Fragkiadaki, Marta Salas, Pablo Arbelaez, Jitendra Malik	In this paper, we make a first attempt towards this goal, and propose a method that combines dense optical flow tracking, motion trajectory clustering and NRSfM for 3D reconstruction of objects in videos.
8	Sparse Space-Time Deconvolution for Calcium Image Analysis	Ferran Diego Andilla, Fred A. Hamprecht	We describe a unified formulation and algorithm to find an extremely sparse representation for Calcium image sequences in terms of cell locations, cell shapes, spike timings and impulse responses.
9	Restricted Boltzmann machines modeling human choice	Takayuki Osogami, Makoto Otsuka	We extend the multinomial logit model to represent some of the empirical phenomena that are frequently observed in the choices made by humans.
10	Multiscale Fields of Patterns	Pedro Felzenszwalb, John G. Oberlin	We describe a framework for defining high-order image models that can be used in a variety of applications.
11	Large Scale Canonical Correlation Analysis with Iterative Least Squares	Yichao Lu, Dean P. Foster	In this paper we introduce L-CCA, an iterative algorithm which can compute CCA fast on huge sparse datasets.
12	Altitude Training: Strong Bounds for Single-Layer Dropout	Stefan Wager, William Fithian, Sida Wang, Percy S. Liang	This paper proposes a theoretical explanation for this phenomenon: we show that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization.
13	Rounding-based Moves for Metric Labeling	M. Pawan Kumar	We consider a large class of parallel rounding procedures, and design move-making algorithms that closely mimic them.
14	Parallel Double Greedy Submodular Maximization	Xinghao Pan, Stefanie Jegelka, Joseph E. Gonzalez, Joseph K. Bradley, Michael I. Jordan	In this work, we propose two methods to parallelize the double-greedy algorithm.
15	Multivariate Regression with Calibration	Han Liu, Lie Wang, Tuo Zhao	We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models.
16	Exact Post Model Selection Inference for Marginal Screening	Jason D. Lee, Jonathan E. Taylor	We develop a framework for post model selection inference, via marginal screening, in linear regression.
17	On a Theory of Nonparametric Pairwise Similarity for Clustering: Connecting Clustering to Classification	Yingzhen Yang, Feng Liang, Shuicheng Yan, Zhangyang Wang, Thomas S. Huang	In this paper, we present a novel pairwise clustering framework by bridging the gap between clustering and multi-class classification.
18	Just-In-Time Learning for Fast and Flexible Inference	S. M. Ali Eslami, Daniel Tarlow, Pushmeet Kohli, John Winn	Motivated by this, we introduce just-in-time learning, a framework for fast and flexible inference that learns to speed up inference at run-time.
19	Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation	Ohad Shamir	In this paper, we describe how a single set of results implies positive answers to the above, for several different settings.
20	Quantized Kernel Learning for Feature Matching	Danfeng Qin, Xuanli Chen, Matthieu Guillaumin, Luc V. Gool	In this paper, we introduce a simple and flexible family of non-linear kernels which we refer to as Quantized Kernels (QK).
21	Parallel Direction Method of Multipliers	Huahua Wang, Arindam Banerjee, Zhi-Quan Luo	In this paper, we propose a parallel randomized block coordinate method named Parallel Direction Method of Multipliers (PDMM) to solve the optimization problems with multi-block linear constraints.
22	(Almost) No Label No Cry	Giorgio Patrini, Richard Nock, Paul Rivera, Tiberio Caetano	We provide a fast learning algorithm that estimates the mean operator via a manifold regularizer with guaranteed approximation bounds.
23	Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards	Omar Besbes, Yonatan Gur, Assaf Zeevi	In this paper, we focus on a MAB formulation which allows for a broad range of temporal uncertainties in the rewards, while still maintaining mathematical tractability.
24	Object Localization based on Structural SVM using Privileged Information	Jan Feyereisl, Suha Kwak, Jeany Son, Bohyung Han	We propose a structured prediction algorithm for object localization based on Support Vector Machines (SVMs) using privileged information.
25	Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations	Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang	To take into account this instinct, this paper proposes a novel deep neural net, named multi-view perceptron (MVP), which can untangle the identity and view features, and infer a full spectrum of multi-view images in the meanwhile, given a single 2D face image.
26	Shape and Illumination from Shading using the Generic Viewpoint Assumption	Daniel Zoran, Dilip Krishnan, José Bento, Bill Freeman	In this paper, we show how to utilize the GVA to estimate shape and illumination from a single shading image, without the use of other priors.
27	Parallel Sampling of HDPs using Sub-Cluster Splits	Jason Chang, John W. Fisher III	We develop a sampling technique for Hierarchical Dirichlet process models.
28	From MAP to Marginals: Variational Inference in Bayesian Submodular Models	Josip Djolonga, Andreas Krause	In particular, we present L-Field, a variational approach to general log-submodular and log-supermodular distributions based on sub- and supergradients.
29	Robust Logistic Regression and Classification	Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan	We propose a new robust logistic regression algorithm, called RoLR, that estimates the parameter through a simple linear programming procedure.
30	Extracting Certainty from Uncertainty: Transductive Pairwise Classification from Pairwise Similarities	Tianbao Yang, Rong Jin	In this work, we study the problem of transductive pairwise classification from pairwise similarities (the pairwise similarities are usually derived from some side information instead of the underlying class labels).
31	A Unified Semantic Embedding: Relating Taxonomies and Attributes	Sung Ju Hwang, Leonid Sigal	We propose a method that learns a discriminative yet semantic space for object categorization, where we also embed auxiliary semantic entities such as supercategories and attributes.
32	Transportability from Multiple Environments with Limited Experiments: Completeness Results	Elias Bareinboim, Judea Pearl	This paper addresses the problem of $mz$-transportability, that is, transferring causal knowledge collected in several heterogeneous domains to a target domain in which only passive observations and limited experimental data can be collected.
33	Augmentative Message Passing for Traveling Salesman Problem and Graph Partitioning	Siamak Ravanbakhsh, Reihaneh Rabbany, Russell Greiner	For graph-partitioning (a.k.a. community mining) using modularity optimization, we introduce a binary variable model with a large number of constraints that enforce formation of cliques.
34	Causal Inference through a Witness Protection Program	Ricardo Silva, Robin Evans	We introduce a novel approach for estimating causal effects that exploits observational conditional independencies to suggest “weak” paths in an unknown causal graph.
35	Incremental Clustering: The Case for Extra Clusters	Margareta Ackerman, Sanjoy Dasgupta	In this paper, we initiate the formal analysis of incremental clustering methods focusing on the types of cluster structure that they are able to detect.
36	Multi-scale Graphical Models for Spatio-Temporal Processes	Firdaus Janoos, Huseyin Denli, Niranjan Subrahmanya	In this paper, we present a hierarchical graphical model with physically derived priors that better represents the multi-scale character of these dynamical systems.
37	Iterative Neural Autoregressive Distribution Estimator NADE-k	Tapani Raiko, Yao Li, Kyunghyun Cho, Yoshua Bengio	We propose a new model that extends this inference scheme to multiple steps, arguing that it is easier to learn to improve a reconstruction in $k$ steps rather than to learn to reconstruct in a single inference step.
38	Sparse PCA via Covariance Thresholding	Yash Deshpande, Andrea Montanari	In an influential paper, Johnstone and Lu (2004) introduced a simple algorithm that estimates the support of the principal vectors $\mathbf{v}_1,\dots,\mathbf{v}_r$ by the largest entries in the diagonal of the empirical covariance.
39	Low-dimensional models of neural population activity in sensory cortical circuits	Evan W. Archer, Urs Koster, Jonathan W. Pillow, Jakob H. Macke	We introduce a fast estimation method using online expectation maximization with Laplace approximations.
40	A Synaptical Story of Persistent Activity with Graded Lifetime in a Neural System	Yuanyuan Mi, Luozheng Li, Dahui Wang, Si Wu	We propose a simple yet effective mechanism to achieve this goal, which utilizes the property of short-term plasticity (STP) of neuronal synapses.
41	A Representation Theory for Ranking Functions	Harsh H. Pareek, Pradeep K. Ravikumar	This paper presents a representation theory for permutation-valued functions, which in their general form can also be called listwise ranking functions.
42	Near-optimal sample compression for nearest neighbors	Lee-Ad Gottlieb, Aryeh Kontorovich, Pinhas Nisnevitch	We present the first sample compression algorithm for nearest neighbors with non-trivial performance guarantees.
43	Combinatorial Pure Exploration of Multi-Armed Bandits	Shouyuan Chen, Tian Lin, Irwin King, Michael R. Lyu, Wei Chen	We present general learning algorithms which work for all decision classes that admit offline maximization oracles in both fixed confidence and fixed budget settings.
44	Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces	Minh Ha Quang, Marco San Biagio, Vittorio Murino	This paper introduces a novel mathematical and computational framework, namely the Log-Hilbert-Schmidt metric between positive definite operators on a Hilbert space.
45	Consistency of Spectral Partitioning of Uniform Hypergraphs under Planted Partition Model	Debarghya Ghoshdastidar, Ambedkar Dukkipati	In this paper, we view these models as random m-uniform hypergraphs and establish the consistency of spectral algorithm in this general setting.
46	Spectral Clustering of graphs with the Bethe Hessian	Alaa Saade, Florent Krzakala, Lenka Zdeborová	Here, we propose to use instead a simpler object, a symmetric real matrix known as the Bethe Hessian operator, or deformed Laplacian.
47	Fast and Robust Least Squares Estimation in Corrupted Linear Models	Brian McWilliams, Gabriel Krummenacher, Mario Lucic, Joachim M. Buhmann	The concept of influence that was developed for regression diagnostics can be used to detect such corrupted observations as shown in this paper.
48	Local Decorrelation For Improved Pedestrian Detection	Woonhyun Nam, Piotr Dollar, Joon Hee Han	Inspired by recent work on discriminative decorrelation of HOG features, we instead propose an efficient feature transform that removes correlations in local neighborhoods.
49	Robust Kernel Density Estimation by Scaling and Projection in Hilbert Space	Robert A. Vandermeulen, Clayton Scott	We present a robust version of the popular kernel density estimator (KDE).
50	Beyond Disagreement-Based Agnostic Active Learning	Chicheng Zhang, Kamalika Chaudhuri	In this paper, we provide such an algorithm.
51	Bayes-Adaptive Simulation-based Search with Value Function Approximation	Arthur Guez, Nicolas Heess, David Silver, Peter Dayan	We present a tractable method for approximating the Bayes-adaptive solution by combining simulation-based search with a novel value function approximation technique that generalises over belief space.
52	A State-Space Model for Decoding Auditory Attentional Modulation from MEG in a Competing-Speaker Environment	Sahar Akram, Jonathan Z. Simon, Shihab A. Shamma, Behtash Babadi	We present simulation studies as well as application to real MEG data from two human subjects.
53	Active Regression by Stratification	Sivan Sabato, Remi Munos	We propose a new active learning algorithm for parametric linear regression with random design.
54	Sensory Integration and Density Estimation	Joseph G. Makin, Philip N. Sabes	We prove here an analytical connection between these seemingly different tasks, density estimation and sensory integration; that the former implies the latter for the model used in [2]; but that this does not appear to be true for all models.
55	Learning Deep Features for Scene Recognition using Places Database	Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, Aude Oliva	We propose new methods to compare the density and diversity of image datasets and show that Places is as dense as other scene datasets and has more diversity. Here, we introduce a new scene-centric database called Places with over 7 million labeled pictures of scenes.
56	A Complete Variational Tracker	Ryan D. Turner, Steven Bottone, Bhargav Avasarala	We introduce a novel probabilistic tracking algorithm that incorporates combinatorial data association constraints and model-based track management using variational Bayes.
57	Spike Frequency Adaptation Implements Anticipative Tracking in Continuous Attractor Neural Networks	Yuanyuan Mi, C. C. Alan Fung, K. Y. Michael Wong, Si Wu	Here we propose a simple yet effective mechanism to implement anticipative tracking in neural systems.
58	Efficient Sampling for Learning Sparse Additive Models in High Dimensions	Hemant Tyagi, Bernd Gärtner, Andreas Krause	Assuming $\phi_l$’s to be smooth, we propose a set of points at which to sample $f$ and an efficient randomized algorithm that recovers a uniform approximation to each unknown $\phi_l$.
59	Deep Joint Task Learning for Generic Object Extraction	Xiaolong Wang, Liliang Zhang, Liang Lin, Zhujin Liang, Wangmeng Zuo	We present a general joint task learning framework, in which each task (either object localization or object segmentation) is tackled via a multi-layer convolutional neural network, and the two networks work collaboratively to boost performance.
60	Robust Bayesian Max-Margin Clustering	Changyou Chen, Jun Zhu, Xinhua Zhang	We present max-margin Bayesian clustering (BMC), a general and robust framework that incorporates the max-margin criterion into Bayesian clustering models, as well as two concrete models of BMC to demonstrate its flexibility and effectiveness in dealing with different clustering tasks.
61	Permutation Diffusion Maps (PDM) with Application to the Image Association Problem in Computer Vision	Deepti Pachauri, Risi Kondor, Gautam Sargur, Vikas Singh	In this paper we introduce a new method, Permutations Diffusion Maps (PDM), to solve the matching problem, as well as a related new affinity measure, derived using ideas from harmonic analysis on the symmetric group.
62	Bounded Regret for Finite-Armed Structured Bandits	Tor Lattimore, Remi Munos	We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret.
63	Coresets for k-Segmentation of Streaming Data	Guy Rosman, Mikhail Volkov, Dan Feldman, John W. Fisher III, Daniela Rus	We consider the problem of computing optimal segmentation of such signals by k-piecewise linear function, using only one pass over the data by maintaining a coreset for the signal.
64	Two-Stream Convolutional Networks for Action Recognition in Videos	Karen Simonyan, Andrew Zisserman	Our contribution is three-fold.
65	Discovering Structure in High-Dimensional Data Through Correlation Explanation	Greg Ver Steeg, Aram Galstyan	We introduce a method to learn a hierarchy of successively more abstract representations of complex data based on optimizing an information-theoretic objective.
66	Positive Curvature and Hamiltonian Monte Carlo	Christof Seiler, Simon Rubinstein-Salzedo, Susan Holmes	We show that positive curvature can be used to prove theoretical concentration results for HMC Markov chains.
67	Learning Mixed Multinomial Logit Model from Ordinal Data	Sewoong Oh, Devavrat Shah	To that end, we present a sufficient condition as well as an efficient algorithm for learning mixed MNL models from partial preferences/comparisons data.
68	Near-optimal Reinforcement Learning in Factored MDPs	Ian Osband, Benjamin Van Roy	We provide two algorithms that satisfy near-optimal regret bounds in this context: posterior sampling reinforcement learning (PSRL) and an upper confidence bound algorithm (UCRL-Factored).
69	Efficient learning by implicit exploration in bandit problems with side observations	Tomáš Kocák, Gergely Neu, Michal Valko, Remi Munos	For this setting, we propose the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions.
70	Repeated Contextual Auctions with Strategic Buyers	Kareem Amin, Afshin Rostamizadeh, Umar Syed	Motivated by real-time advertising exchanges, we analyze the problem of pricing inventory in a repeated posted-price auction.
71	Recursive Inversion Models for Permutations	Christopher Meek, Marina Meila	We describe how one can do parameter estimation and propose an approach to structure search for this class of models.
72	On the Convergence Rate of Decomposable Submodular Function Minimization	Robert Nishihara, Stefanie Jegelka, Michael I. Jordan	In this paper, we show that the algorithm converges linearly, and we provide upper and lower bounds on the rate of convergence.
73	New Rules for Domain Independent Lifted MAP Inference	Happy Mittal, Prasoon Goyal, Vibhav G. Gogate, Parag Singla	In this paper, we present two new lifting rules, which enable fast MAP inference in a large class of MLNs.
74	PAC-Bayesian AUC classification and scoring	James Ridgway, Pierre Alquier, Nicolas Chopin, Feng Liang	We derive in particular a Sequential Monte Carlo algorithm, as an efficient method which may be used as a gold standard, and an Expectation-Propagation algorithm, as a much faster but approximate method.
75	Optimization Methods for Sparse Pseudo-Likelihood Graphical Model Selection	Sang Oh, Onkar Dalal, Kshitij Khare, Bala Rajaratnam	In this paper, we address this crucial gap by proposing two proximal gradient methods (CONCORD-ISTA and CONCORD-FISTA) for performing $\ell_1$-regularized inverse covariance matrix estimation in the pseudo-likelihood framework.
76	On Prior Distributions and Approximate Inference for Structured Variables	Oluwasanmi O. Koyejo, Rajiv Khanna, Joydeep Ghosh, Russell Poldrack	We present a general framework for constructing prior distributions with structured variables.
77	On Iterative Hard Thresholding Methods for High-dimensional M-Estimation	Prateek Jain, Ambuj Tewari, Purushottam Kar	In this work we bridge this gap by providing the first analysis for IHT-style methods in the high dimensional statistical setting.
78	Online and Stochastic Gradient Methods for Non-decomposable Loss Functions	Purushottam Kar, Harikrishna Narasimhan, Prateek Jain	In this work we initiate a study of online learning techniques for such non-decomposable loss functions with an aim to enable incremental learning as well as design scalable solvers for batch problems.
79	Analysis of Learning from Positive and Unlabeled Data	Marthinus C. du Plessis, Gang Niu, Masashi Sugiyama	Analysis of Learning from Positive and Unlabeled Data
80	Dimensionality Reduction with Subspace Structure Preservation	Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju	In this paper, we propose a novel dimensionality reduction algorithm that theoretically preserves this structure for a given dataset.
81	Constrained convex minimization via model-based excessive gap	Quoc Tran-Dinh, Volkan Cevher	We introduce a model-based excessive gap technique to analyze first-order primal-dual methods for constrained convex minimization.
82	Poisson Process Jumping between an Unknown Number of Rates: Application to Neural Spike Data	Florian Stimberg, Andreas Ruttor, Manfred Opper	We introduce a model where the rate of an inhomogeneous Poisson process is modified by a Chinese restaurant process.
83	Probabilistic ODE Solvers with Runge-Kutta Means	Michael Schober, David K. Duvenaud, Philipp Hennig	We construct a family of probabilistic numerical methods that instead return a Gauss-Markov process defining a probability distribution over the ODE solution.
84	Optimal decision-making with time-varying evidence reliability	Jan Drugowitsch, Ruben Moreno-Bote, Alexandre Pouget	The work presented here describes the computation and characterization of optimal decision-making in the more realistic case of an evidence reliability that varies across time even within a trial.
85	Learning Shuffle Ideals Under Restricted Distributions	Dongqu Chen	In the empirical direction, we propose a heuristic algorithm for learning shuffle ideals from given labeled strings under general unrestricted distributions.
86	Discriminative Unsupervised Feature Learning with Convolutional Neural Networks	Alexey Dosovitskiy, Jost Tobias Springenberg, Martin Riedmiller, Thomas Brox	In this paper we present an approach for training a convolutional neural network using only unlabeled data.
87	Distance-Based Network Recovery under Feature Correlation	David Adametz, Volker Roth	We present an inference method for Gaussian graphical models when only pairwise distances of n objects are observed.
88	Bandit Convex Optimization: Towards Tight Bounds	Elad Hazan, Kfir Levy	In this paper we take a step towards understanding the best attainable regret bounds for BCO: we give an efficient and near-optimal regret algorithm for BCO with strongly-convex and smooth loss functions.
89	Projective dictionary pair learning for pattern classification	Shuhang Gu, Lei Zhang, Wangmeng Zuo, Xiangchu Feng	We propose a new discriminative DL framework, namely projective dictionary pair learning (DPL), which learns a synthesis dictionary and an analysis dictionary jointly to achieve the goal of signal representation and discrimination.
90	Provable Submodular Minimization using Wolfe's Algorithm	Deeparnab Chakrabarty, Prateek Jain, Pravesh Kothari	In this paper we give a maiden convergence analysis of Wolfe’s algorithm.
91	Exploiting easy data in online optimization	Amir Sani, Gergely Neu, Alessandro Lazaric	In this paper we introduce a general algorithm that, provided with a safe learning algorithm and an opportunistic benchmark, can effectively combine good worst-case guarantees with much improved performance on easy data.
92	Sparse Multi-Task Reinforcement Learning	Daniele Calandriello, Alessandro Lazaric, Marcello Restelli	In this paper we investigate the case when all the tasks can be accurately represented in a linear approximation space using the same small subset of the original (large) set of features.
93	Best-Arm Identification in Linear Bandits	Marta Soare, Alessandro Lazaric, Remi Munos	We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget.
94	Mind the Nuisance: Gaussian Process Classification using Privileged Noise	Daniel Hernández-Lobato, Viktoriia Sharmanska, Kristian Kersting, Christoph H. Lampert, Novi Quadrianto	Here, we show that privileged information can naturally be treated as noise in the latent function of a Gaussian process classifier (GPC).
95	Tight Bounds for Influence in Diffusion Networks and Application to Bond Percolation and Epidemiology	Remi Lemonnier, Kevin Scaman, Nicolas Vayatis	In this paper, we derive theoretical bounds for the long-term influence of a node in an Independent Cascade Model (ICM).
96	On the Computational Efficiency of Training Neural Networks	Roi Livni, Shai Shalev-Shwartz, Ohad Shamir	In this paper we revisit the computational complexity of training neural networks from a modern perspective.
97	Self-Adaptable Templates for Feature Coding	Xavier Boix, Gemma Roig, Salomon Diether, Luc V. Gool	In this paper, we analyze O2P as a coding-pooling scheme.
98	Multilabel Structured Output Learning with Random Spanning Trees of Max-Margin Markov Networks	Mario Marchand, Hongyu Su, Emilie Morvant, Juho Rousu, John S. Shawe-Taylor	We show that the usual score function for conditional Markov networks can be written as the expectation over the scores of their spanning trees.
99	Stochastic Network Design in Bidirected Trees	Xiaojian Wu, Daniel R. Sheldon, Shlomo Zilberstein	We investigate the problem of stochastic network design in bidirected trees.
100	Learning convolution filters for inverse covariance estimation of neural network connectivity	George Mohler	In this paper we show how inverse covariance estimation can be dramatically improved using a simple convolution filter prior to applying sample covariance.
101	SerialRank: Spectral Ranking using Seriation	Fajwel Fogel, Alexandre d’Aspremont, Milan Vojnovic	We describe a seriation algorithm for ranking a set of n items given pairwise comparisons between these items.
102	Clamping Variables and Approximate Inference	Adrian Weller, Tony Jebara	For an attractive model, we show that summing over the Bethe partition functions for each sub-model obtained after clamping any variable can only raise (and hence improve) the approximation.
103	Predictive Entropy Search for Efficient Global Optimization of Black-box Functions	José Miguel Hernández-Lobato, Matthew W. Hoffman, Zoubin Ghahramani	We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES).
104	A Block-Coordinate Descent Approach for Large-scale Sparse Inverse Covariance Estimation	Eran Treister, Javier S. Turek	In this paper we present a new block-coordinate descent approach for solving the problem for large-scale data sets.
105	Efficient Inference of Continuous Markov Random Fields with Polynomial Potentials	Shenlong Wang, Alex Schwing, Raquel Urtasun	In this paper, we prove that every multivariate polynomial with even degree can be decomposed into a sum of convex and concave polynomials.
106	Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices	Austin R. Benson, Jason D. Lee, Bartek Rajwa, David F. Gleich	In this paper, we show how to make these algorithms scalable for data matrices that have many more rows than columns, so-called tall-and-skinny matrices.
107	Inferring synaptic conductances from spike trains with a biophysically inspired point process model	Kenneth W. Latimer, E.J. Chichilnisky, Fred Rieke, Jonathan W. Pillow	Here we introduce a biophysically inspired point process model that explicitly incorporates stimulus-induced changes in synaptic conductance in a dynamical model of neuronal membrane potential.
108	Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights	Daniel Soudry, Itay Hubara, Ron Meir	Multilayer Neural Networks (MNNs) are commonly trained using gradient descent-based methods, such as BackPropagation (BP).
109	Incremental Local Gaussian Regression	Franziska Meier, Philipp Hennig, Stefan Schaal	In this paper, we suggest a path from Gaussian (process) regression to locally weighted regression, where we retain the best of both approaches.
110	General Table Completion using a Bayesian Nonparametric Model	Isabel Valera, Zoubin Ghahramani	In this paper, we provide an efficient and robust table completion tool, based on a Bayesian nonparametric latent feature model.
111	Universal Option Models	Hengshuai Yao, Csaba Szepesvari, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar	We introduce a new model for an option that is independent of any reward function, called the universal option model (UOM).
112	Approximating Hierarchical MV-sets for Hierarchical Clustering	Assaf Glazer, Omer Weissbrod, Michael Lindenbaum, Shaul Markovitch	We present empirical results that demonstrate the superiority of our method over existing ones.
113	Constant Nullspace Strong Convexity and Fast Convergence of Proximal Methods under High-Dimensional Settings	Ian En-Hsu Yen, Cho-Jui Hsieh, Pradeep K. Ravikumar, Inderjit S. Dhillon	In this paper, we investigate a novel variant of strong convexity, which we call Constant Nullspace Strong Convexity (CNSC), where we require that the objective function be strongly convex only over a constant subspace.
114	Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm	Deanna Needell, Rachel Ward, Nati Srebro	Our results are based on a connection we make between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods.
115	A Framework for Testing Identifiability of Bayesian Models of Perception	Luigi Acerbi, Wei Ji Ma, Sethu Vijayakumar	We propose a novel framework for a systematic testing of the identifiability of a significant class of Bayesian observer models, with practical applications for improving experimental design.
116	Optimistic Planning in Markov Decision Processes Using a Generative Model	Bal�zs Sz�r�nyi, Gunnar Kedenburg, Remi Munos	We consider the problem of online planning in a Markov decision process with discounted rewards for any given initial state.
117	Gaussian Process Volatility Model	Yue Wu, José Miguel Hernández-Lobato, Zoubin Ghahramani	To address these problems we introduce GP-Vol, a novel non-parametric model for time-changing variances based on Gaussian Processes.
118	A Safe Screening Rule for Sparse Logistic Regression	Jie Wang, Jiayu Zhou, Jun Liu, Peter Wonka, Jieping Ye	In this paper, we present a fast and effective sparse logistic regression screening rule (Slores) to identify the zero components in the solution vector, which may lead to a substantial reduction in the number of features to be entered to the optimization.
119	Hardness of parameter estimation in graphical models	Guy Bresler, David Gamarnik, Devavrat Shah	The goal of this paper is to investigate the computational feasibility of this statistical task.
120	Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics	Sergey Levine, Pieter Abbeel	We present a policy search method that uses iteratively refitted local linear models to optimize trajectory distributions for large, continuous problems.
121	Magnitude-sensitive preference formation	Nisheeth Srivastava, Ed Vul, Paul R. Schrater	In this paper, we develop a theory of magnitude-sensitive preference learning that permits an agent to rationally infer its preferences for items compared with money options of different magnitudes.
122	Extreme bandits	Alexandra Carpentier, Michal Valko	In this paper, we study an efficient way to allocate these resources sequentially under limited feedback.
123	Distributed Estimation, Information Loss and Exponential Families	Qiang Liu, Alexander T. Ihler	We study a simple communication-efficient learning framework that first calculates the local maximum likelihood estimates (MLE) based on the data subsets, and then combines the local MLEs to achieve the best possible approximation to the global MLE, based on the whole dataset jointly.
124	Non-convex Robust PCA	Praneeth Netrapalli, Niranjan U N, Sujay Sanghavi, Animashree Anandkumar, Prateek Jain	We propose a new provable method for robust PCA, where the task is to recover a low-rank matrix, which is corrupted with sparse perturbations.
125	Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning	Francesco Orabona	In this paper, we propose a new kernel-based stochastic gradient descent algorithm that performs model selection while training, with no parameters to tune, nor any form of cross-validation.
126	Learning From Weakly Supervised Data by The Expectation Loss SVM (e-SVM) algorithm	Jun Zhu, Junhua Mao, Alan L. Yuille	We propose a novel learning algorithm called expectation loss SVM (e-SVM) that is devoted to the problems where only the “positiveness” instead of a binary label of each training sample is available.
127	Message Passing Inference for Large Scale Graphical Models with High Order Potentials	Jian Zhang, Alex Schwing, Raquel Urtasun	In this paper we propose a partitioning strategy followed by a message passing algorithm which is able to exploit pre-computations.
128	Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors	Lingqiao Liu, Chunhua Shen, Lei Wang, Anton van den Hengel, Chao Wang	In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace.
129	Dependent nonparametric trees for dynamic hierarchical clustering	Kumar Avinava Dubey, Qirong Ho, Sinead A. Williamson, Eric P. Xing	In this paper, we present a distribution over collections of time-dependent, infinite-dimensional trees that can be used to model evolving hierarchies, and present an efficient and scalable algorithm for performing approximate inference in such a model.
130	Causal Strategic Inference in Networked Microfinance Economies	Mohammad T. Irfan, Luis E. Ortiz	We propose causal strategic inference as a framework for conducting interventions and apply it to large, networked microfinance economies.
131	Learning Multiple Tasks in Parallel with a Shared Annotator	Haim Cohen, Koby Crammer	We introduce a new multi-task framework, in which $K$ online learners are sharing a single annotator with limited bandwidth.
132	Reducing the Rank in Relational Factorization Models by Including Observable Patterns	Maximilian Nickel, Xueyan Jiang, Volker Tresp	Based on our findings, we propose a novel additive tensor factorization model for learning from latent and observable patterns in multi-relational data and present a scalable algorithm for computing the factorization.
133	Clustering from Labels and Time-Varying Graphs	Shiau Hong Lim, Yudong Chen, Huan Xu	We present a general framework for graph clustering where a label is observed to each pair of nodes.
134	From Stochastic Mixability to Fast Rates	Nishant A. Mehta, Robert C. Williamson	The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of $(\ell,\mathcal{F}, \mathsf{P})$, and in so doing provides new insight into the fast-rates phenomenon.
135	Recovery of Coherent Data via Low-Rank Dictionary Pursuit	Guangcan Liu, Ping Li	The recently established RPCA method provides a convenient way to restore low-rank matrices from grossly corrupted observations.
136	Inferring sparse representations of continuous signals with continuous orthogonal matching pursuit	Karin C. Knudson, Jacob Yates, Alexander Huk, Jonathan W. Pillow	Here we present a fast method for recovering these identities, amplitudes, and translations.
137	Analysis of Variational Bayesian Latent Dirichlet Allocation: Weaker Sparsity Than MAP	Shinichi Nakajima, Issei Sato, Masashi Sugiyama, Kazuho Watanabe, Hiroko Kobayashi	In this paper, we theoretically investigate variational Bayesian (VB) learning in LDA.
138	Discovering, Learning and Exploiting Relevance	Cem Tekin, Mihaela Van Der Schaar	In this paper we consider the problem of learning online what is the information to consider when making sequential decisions.
139	Divide-and-Conquer Learning by Anchoring a Conical Hull	Tianyi Zhou, Jeff A. Bilmes, Carlos Guestrin	To find the $k$ anchors, we propose a novel divide-and-conquer learning scheme “DCA” that distributes the problem to $\mathcal O(k\log k)$ same-type sub-problems on different low-D random hyperplanes, each of which can be solved by any solver.
140	Extended and Unscented Gaussian Processes	Daniel M. Steinberg, Edwin V. Bonilla	We present two new methods for inference in Gaussian process (GP) models with general nonlinear likelihoods.
141	Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing	Yuchen Zhang, Xi Chen, Dengyong Zhou, Michael I. Jordan	In this paper, we propose a two-stage efficient algorithm for multi-class crowd labeling problems.
142	Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation	Emily L. Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus	We present techniques for speeding up the test-time evaluation of large convolutional networks, designed for object recognition tasks.
143	Learning to Discover Efficient Mathematical Identities	Wojciech Zaremba, Karol Kurach, Rob Fergus	In this paper we explore how machine learning techniques can be applied to the discovery of efficient mathematical identities.
144	The Large Margin Mechanism for Differentially Private Maximization	Kamalika Chaudhuri, Daniel J. Hsu, Shuang Song	This work provides the first general purpose, range-independent algorithm for private maximization that guarantees approximate differential privacy.
145	DFacTo: Distributed Factorization of Tensors	Joon Hee Choi, S. Vishwanathan	We present a technique for significantly speeding up Alternating Least Squares (ALS) and Gradient Descent (GD), two widely used algorithms for tensor factorization.
146	Localized Data Fusion for Kernel k-Means Clustering with Application to Cancer Biology	Mehmet Gönen, Adam A. Margolin	In this paper, we propose a novel multiple kernel learning algorithm that extends kernel k-means clustering to the multiview setting, which combines kernels calculated on the views in a localized way to better capture sample-specific characteristics of the data.
147	Conditional Swap Regret and Conditional Correlated Equilibrium	Mehryar Mohri, Scott Yang	We present algorithms for minimizing conditional swap regret with bounded conditioning history.
148	Mode Estimation for High Dimensional Discrete Tree Graphical Models	Chao Chen, Han Liu, Dimitris Metaxas, Tianqi Zhao	An efficient algorithm with provable theoretical guarantees is proposed and is applied to applications like data analysis and multiple predictions.
149	Large-scale L-BFGS using MapReduce	Weizhu Chen, Zhenghao Wang, Jingren Zhou	In this paper, we study the problem of parallelizing the L-BFGS algorithm in large clusters of tens of thousands of shared-nothing commodity machines.
150	Submodular Attribute Selection for Action Recognition in Video	Jingjing Zheng, Zhuolin Jiang, Rama Chellappa, Jonathon P. Phillips	In this work, we encode actions based on attributes that describe actions as high-level concepts, e.g., jump forward and motion in the air.
151	Efficient Structured Matrix Rank Minimization	Adams Wei Yu, Wanli Ma, Yaoliang Yu, Jaime Carbonell, Suvrit Sra	We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map.
152	On Integrated Clustering and Outlier Detection	Lionel Ott, Linsey Pang, Fabio T. Ramos, Sanjay Chawla	We model the joint clustering and outlier detection problem using an extension of the facility location formulation.
153	A Drifting-Games Analysis for Online Learning and Applications to Boosting	Haipeng Luo, Robert E. Schapire	We provide a general mechanism to design online learning algorithms based on a minimax analysis within a drifting-games framework.
154	Projecting Markov Random Field Parameters for Fast Mixing	Xianghang Liu, Justin Domke	This paper gives sufficient conditions to guarantee that univariate Gibbs sampling on Markov Random Fields (MRFs) will be fast mixing, in a precise sense.
155	Automatic Discovery of Cognitive Skills to Improve the Prediction of Student Learning	Robert V. Lindsey, Mohammad Khajah, Michael C. Mozer	We propose a technique that uses student performance data to automatically discover the skills needed in a discipline.
156	Near-Optimal-Sample Estimators for Spherical Gaussian Mixtures	Ananda Theertha Suresh, Alon Orlitsky, Jayadev Acharya, Ashkan Jafarpour	Conversely, we show that any estimator requires $\Omega\bigl({dk}/{\epsilon^2}\bigr)$ samples, hence the algorithm’s sample complexity is nearly optimal in the dimension.
157	Automated Variational Inference for Gaussian Process Models	Trung V. Nguyen, Edwin V. Bonilla	We develop an automated variational method for approximate inference in Gaussian process (GP) models whose posteriors are often intractable.
158	Learning Mixtures of Submodular Functions for Image Collection Summarization	Sebastian Tschiatschek, Rishabh K. Iyer, Haochen Wei, Jeff A. Bilmes	We address the problem of image collection summarization by learning mixtures of submodular functions. We provide a new dataset consisting of 14 real-world image collections along with many human-generated ground truth summaries collected using Mechanical Turk.
159	Robust Tensor Decomposition with Gross Corruption	Quanquan Gu, Huan Gui, Jiawei Han	In this paper, we study the statistical performance of robust tensor decomposition with gross corruption.
160	Provable Tensor Factorization with Missing Data	Prateek Jain, Sewoong Oh	We propose a novel alternating minimization based method which iteratively refines estimates of the singular vectors.
161	Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization	Meisam Razaviyayn, Mingyi Hong, Zhi-Quan Luo, Jong-Shi Pang	In this work, we propose an inexact parallel BCD approach where at each iteration, a subset of the variables is updated in parallel by minimizing convex approximations of the original objective function.
162	Using Convolutional Neural Networks to Recognize Rhythm Stimuli from Electroencephalography Recordings	Sebastian Stober, Daniel J. Cameron, Jessica A. Grahn	We investigate the impact of the data representation and the pre-processing steps for this classification task and compare different network structures.
163	Blossom Tree Graphical Models	Zhe Liu, John Lafferty	We combine the ideas behind trees and Gaussian graphical models to form a new nonparametric family of graphical models.
164	Model-based Reinforcement Learning and the Eluder Dimension	Ian Osband, Benjamin Van Roy	We consider the problem of learning to optimize an unknown Markov decision process (MDP).
165	Minimax-optimal Inference from Partial Rankings	Bruce Hajek, Sewoong Oh, Jiaming Xu	This paper studies the problem of rank aggregation under the Plackett-Luce model.
166	Spectral Methods for Indian Buffet Process Inference	Hsiao-Yu Tung, Alexander J. Smola	We provide an efficient spectral algorithm as an alternative to costly Variational Bayes and sampling-based algorithms.
167	On the Statistical Consistency of Plug-in Classifiers for Non-decomposable Performance Measures	Harikrishna Narasimhan, Rohit Vaish, Shivani Agarwal	In this work, we consider plug-in algorithms that learn a classifier by applying an empirically determined threshold to a suitable ‘estimate’ of the class probability, and provide a general methodology to show consistency of these methods for any non-decomposable measure that can be expressed as a continuous function of true positive rate (TPR) and true negative rate (TNR), and for which the Bayes optimal classifier is the class probability function thresholded suitably.
168	Top Rank Optimization in Linear Time	Nan Li, Rong Jin, Zhi-Hua Zhou	We propose a highly efficient approach, titled TopPush, for optimizing accuracy at the top that has computational complexity linear in the number of training instances.
169	Spectral Methods for Supervised Topic Models	Yining Wang, Jun Zhu	This paper presents a novel spectral decomposition algorithm to recover the parameters of supervised latent Dirichlet allocation (sLDA) models.
170	Graphical Models for Recovering Probabilistic and Causal Queries from Missing Data	Karthika Mohan, Judea Pearl	We address the problem of deciding whether a causal or probabilistic query is estimable from data corrupted by missing entries, given a model of missingness process.
171	Sparse PCA with Oracle Property	Quanquan Gu, Zhaoran Wang, Han Liu	In this paper, we study the estimation of the $k$-dimensional sparse principal subspace of covariance matrix $\Sigma$ in the high-dimensional setting.
172	Unsupervised Transcription of Piano Music	Taylor Berg-Kirkpatrick, Jacob Andreas, Dan Klein	We present a new probabilistic model for transcribing piano music from audio to a symbolic form.
173	Decoupled Variational Gaussian Inference	Mohammad E. Khan	We propose decoupled variational inference that brings the best of both worlds together.
174	Estimation with Norm Regularization	Arindam Banerjee, Sheng Chen, Farideh Fazayeli, Vidyashankar Sivakumar	We characterize the restricted error set, establish relations between error sets for the constrained and regularized problems, and present an estimation error bound applicable to any norm.
175	Decomposing Parameter Estimation Problems	Khaled S. Refaat, Arthur Choi, Adnan Darwiche	We propose a technique for decomposing the parameter learning problem in Bayesian networks into independent learning problems.
176	Stochastic Proximal Gradient Descent with Acceleration Techniques	Atsushi Nitanda	In this paper, we propose and analyze an accelerated variant of these methods in the mini-batch setting.
177	Learning to Optimize via Information-Directed Sampling	Daniel Russo, Benjamin Van Roy	We propose information-directed sampling — a new algorithm for online optimization problems in which a decision-maker must balance between exploration and exploitation while learning from partial feedback.
178	Covariance shrinkage for autocorrelated data	Daniel Bartz, Klaus-Robert Müller	We contribute in this work by showing that the Sancetta estimator, while being consistent in the high-dimensional limit, suffers from a high bias in finite sample sizes.
179	Do Convnets Learn Correspondence?	Jonathan L. Long, Ning Zhang, Trevor Darrell	In this paper, we study the effectiveness of convnet activation features for tasks requiring correspondence.
180	The Blinded Bandit: Learning with Adaptive Feedback	Ofer Dekel, Elad Hazan, Tomer Koren	We develop efficient online learning algorithms for this problem and prove that they guarantee the same asymptotic regret as the optimal algorithms for the standard multi-armed bandit problem.
181	Convex Optimization Procedure for Clustering: Theoretical Revisit	Changbo Zhu, Huan Xu, Chenlei Leng, Shuicheng Yan	In this paper, we present a theoretical analysis of SON, a convex optimization procedure for clustering that uses a recently proposed sum-of-norms (SON) regularization.
182	Sparse Bayesian structure learning with “dependent relevance determination” priors	Anqi Wu, Mijung Park, Oluwasanmi O. Koyejo, Jonathan W. Pillow	Here we introduce a hierarchical model for smooth, region-sparse weight vectors and tensors in a linear regression setting.
183	Weakly-supervised Discovery of Visual Pattern Configurations	Hyun Oh Song, Yong Jae Lee, Stefanie Jegelka, Trevor Darrell	We propose an approach that automatically identifies discriminative configurations of visual patterns that are characteristic of a given object class.
184	SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives	Aaron Defazio, Francis Bach, Simon Lacoste-Julien	In this work we introduce a new fast incremental gradient method SAGA, in the spirit of SAG, SDCA, MISO and SVRG.
185	Exclusive Feature Learning on Arbitrary Structures via $\ell_{1,2}$-norm	Deguang Kong, Ryohei Fujimaki, Ji Liu, Feiping Nie, Chris Ding	In this paper, we propose a new formulation called “exclusive group lasso”, which brings out sparsity at intra-group level in the context of feature selection.
186	Time–Data Tradeoffs by Aggressive Smoothing	John J. Bruer, Joel A. Tropp, Volkan Cevher, Stephen Becker	This paper proposes a tradeoff between sample complexity and computation time that applies to statistical estimators based on convex optimization.
187	Distributed Power-law Graph Computing: Theoretical and Empirical Analysis	Cong Xie, Ling Yan, Wu-Jun Li, Zhihua Zhang	In this paper, we propose a novel vertex-cut method, called degree-based hashing (DBH), for graph partitioning (GP).
188	A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input	Mateusz Malinowski, Mario Fritz	We propose a method for automatically answering questions about images by bringing together recent advances from natural language processing and computer vision.
189	Efficient Partial Monitoring with Prior Information	Hastagiri P. Vanchinathan, Gábor Bartók, Andreas Krause	In this paper, we assume that we are given some prior information about the distribution based on which the opponent generates the outcomes.
190	Distributed Parameter Estimation in Probabilistic Graphical Models	Yariv D. Mizrahi, Misha Denil, Nando de Freitas	This paper presents foundational theoretical results on distributed parameter estimation for undirected probabilistic graphical models.
191	Unsupervised Deep Haar Scattering on Graphs	Xu Chen, Xiuyuan Cheng, Stephane Mallat	We introduce a Haar scattering transform on graphs, which computes invariant signal descriptors.
192	Online Optimization for Max-Norm Regularization	Jie Shen, Huan Xu, Ping Li	In this paper, we propose an online algorithm for solving max-norm regularized problems that is scalable to large problems.
193	Probabilistic low-rank matrix completion on finite alphabets	Jean Lafond, Olga Klopp, Eric Moulines, Joseph Salmon	Most works have focused on recovering an unknown real-valued low-rank matrix from randomly sub-sampling its entries.
194	Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations	Xianjie Chen, Alan L. Yuille	We present a method for estimating articulated human pose from a single static image based on a graphical model with novel pairwise relations that make adaptive use of local image measurements.
195	Bayesian Inference for Structured Spike and Slab Priors	Michael R. Andersen, Ole Winther, Lars K. Hansen	We propose a novel prior formulation, the structured spike and slab prior, which allows incorporating a priori knowledge of the sparsity pattern by imposing a spatial Gaussian process on the spike and slab probabilities.
196	Bayesian Nonlinear Support Vector Machines and Discriminative Factor Modeling	Ricardo Henao, Xin Yuan, Lawrence Carin	Bayesian Nonlinear Support Vector Machines and Discriminative Factor Modeling
197	Generalized Higher-Order Orthogonal Iteration for Tensor Decomposition and Completion	Yuanyuan Liu, Fanhua Shang, Wei Fan, James Cheng, Hong Cheng	To address this difficulty, we therefore propose an efficient and scalable core tensor Schatten 1-norm minimization method for simultaneous tensor decomposition and completion, with a much lower computational complexity.
198	Making Pairwise Binary Graphical Models Attractive	Nicholas Ruozzi, Tony Jebara	In this work, we propose a novel scheme that has better convergence properties than BP and provably provides better partition function estimates in many instances than TRBP.
199	Low Rank Approximation Lower Bounds in Row-Update Streams	David Woodruff	We study low-rank approximation in the streaming model in which the rows of an $n \times d$ matrix $A$ are presented one at a time in an arbitrary order.
200	Deep Convolutional Neural Network for Image Deconvolution	Li Xu, Jimmy SJ Ren, Ce Liu, Jiaya Jia	Instead of perfectly modeling outliers, which is rather challenging from a generative model perspective, we develop a deep convolutional neural network to capture the characteristics of degradation.
201	Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation	Jonathan J. Tompson, Arjun Jain, Yann LeCun, Christoph Bregler	This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field.
202	Learning Generative Models with Visual Attention	Yichuan Tang, Nitish Srivastava, Ruslan R. Salakhutdinov	Inspired by the attention models in visual neuroscience and the need for object-centered data for generative models, we propose a deep-learning based generative framework using attention.
203	Metric Learning for Temporal Sequence Alignment	Damien Garreau, Rémi Lajugie, Sylvain Arlot, Francis Bach	In this paper, we propose to learn a Mahalanobis distance to perform alignment of multivariate time series.
204	Learning Optimal Commitment to Overcome Insecurity	Avrim Blum, Nika Haghtalab, Ariel D. Procaccia	We design an algorithm that optimizes the defender’s strategy with no prior information, by observing the attacker’s responses to randomized deployments of resources and learning his priorities.
205	“How hard is my MDP?” The distribution-norm to the rescue	Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor	This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) we call the distribution-norm.
206	Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms	Siu On Chan, Ilias Diakonikolas, Rocco A. Servedio, Xiaorui Sun	The main contribution of this paper is a highly efficient density estimation algorithm for learning using a variable-width histogram, i.e., a hypothesis distribution with a piecewise constant probability density function.
207	An Autoencoder Approach to Learning Bilingual Word Representations	Sarath Chandar A P, Stanislas Lauly, Hugo Larochelle, Mitesh Khapra, Balaraman Ravindran, Vikas C. Raykar, Amrita Saha	In this work we explore the use of autoencoder-based methods for cross-language learning of vectorial word representations that are aligned between two languages, while not relying on word-level alignments.
208	Sequential Monte Carlo for Graphical Models	Christian Andersson Naesseth, Fredrik Lindsten, Thomas B. Schön	We propose a new framework for how to use sequential Monte Carlo (SMC) algorithms for inference in probabilistic graphical models (PGM).
209	Optimal Regret Minimization in Posted-Price Auctions with Strategic Buyers	Mehryar Mohri, Andres Munoz	We analyze a very broad family of monotone regret minimization algorithms for this problem, which includes the previous best known algorithm, and show that no algorithm in that family admits a strategic regret more favorable than $\Omega(\sqrt{T})$.
210	Optimal prior-dependent neural population codes under shared input noise	Agnieszka Grabska-Barwinska, Jonathan W. Pillow	Here we analyze population coding under a simple alternative model in which latent “input noise” corrupts the stimulus before it is encoded by the population.
211	Deep Fragment Embeddings for Bidirectional Image Sentence Mapping	Andrej Karpathy, Armand Joulin, Li F. Fei-Fei	We introduce a model for bidirectional retrieval of images and sentences through a deep, multi-modal embedding of visual and natural language data.
212	Flexible Transfer Learning under Support and Model Shift	Xuezhi Wang, Jeff Schneider	In this paper, we consider a general case where both the support and the model change across domains.
213	Probabilistic Differential Dynamic Programming	Yunpeng Pan, Evangelos Theodorou	We present a data-driven, probabilistic trajectory optimization framework for systems with unknown dynamics, called Probabilistic Differential Dynamic Programming (PDDP).
214	Predicting Useful Neighborhoods for Lazy Local Learning	Aron Yu, Kristen Grauman	Rather than simply gather the test example’s nearest neighbors, we propose to predict the subset of training data that is jointly relevant to training its local model.
215	Modeling Deep Temporal Dependencies with Recurrent “Grammar Cells”	Vincent Michalski, Roland Memisevic, Kishore Konda	We propose modeling time series by representing the transformations that take a frame at time t to a frame at time t+1.
216	Generalized Dantzig Selector: Application to the k-support norm	Soumyadeep Chatterjee, Sheng Chen, Arindam Banerjee	We propose a Generalized Dantzig Selector (GDS) for linear models, in which any norm encoding the parameter structure can be leveraged for estimation.
217	Neurons as Monte Carlo Samplers: Bayesian Inference and Learning in Spiking Networks	Yanping Huang, Rajesh P. Rao	We propose a two-layer spiking network capable of performing approximate inference and learning for a hidden Markov model.
218	The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification	Been Kim, Cynthia Rudin, Julie A. Shah	We present the Bayesian Case Model (BCM), a general framework for Bayesian case-based reasoning (CBR) and prototype classification and clustering.
219	Latent Support Measure Machines for Bag-of-Words Data Classification	Yuya Yoshikawa, Tomoharu Iwata, Hiroshi Sawada	With the latent SMM, a latent vector is associated with each vocabulary term, and each document is represented as a distribution of the latent vectors for words appearing in the document.
220	Local Linear Convergence of Forward–Backward under Partial Smoothness	Jingwei Liang, Jalal Fadili, Gabriel Peyré	In this paper, we consider the Forward–Backward proximal splitting algorithm to minimize the sum of two proper closed convex functions, one of which has a Lipschitz continuous gradient and the other of which is partly smooth relative to an active manifold $\mathcal{M}$.
221	RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning	Marek Petrik, Dharmashankar Subramanian	We describe how to use robust Markov decision processes for value function approximation with state aggregation.
222	Deep Learning Face Representation by Joint Identification-Verification	Yi Sun, Yuheng Chen, Xiaogang Wang, Xiaoou Tang	In this paper, we show that it can be well solved with deep learning and using both face identification and verification signals as supervision.
223	A provable SVD-based algorithm for learning topics in dominant admixture corpus	Trapit Bansal, Chiranjib Bhattacharyya, Ravindran Kannan	A major contribution of the paper is to show that under this more realistic assumption, which is empirically verified on real corpora, a singular value decomposition (SVD) based algorithm with a crucial pre-processing step of thresholding, can provably recover the topics from a collection of documents drawn from Dominant admixtures.
224	QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models	Cho-Jui Hsieh, Inderjit S. Dhillon, Pradeep K. Ravikumar, Stephen Becker, Peder A. Olsen	In this paper, we develop a family of algorithms for optimizing “superposition-structured” or “dirty” statistical estimators for high-dimensional problems involving the minimization of the sum of a smooth loss function with a hybrid regularization.
225	General Stochastic Networks for Classification	Matthias Zöhrer, Franz Pernkopf	In particular, we introduce a hybrid training objective considering a generative and discriminative cost function governed by a trade-off parameter lambda.
226	Spatio-temporal Representations of Uncertainty in Spiking Neural Networks	Cristina Savin, Sophie Denève	Here we present a novel circuit model for representing multidimensional real-valued distributions using a spike based spatio-temporal code.
227	Attentional Neural Network: Feature Selection Using Cognitive Feedback	Qian Wang, Jiaxing Zhang, Sen Song, Zheng Zhang	We obtain classification accuracy better than or competitive with state-of-the-art results on the MNIST variation dataset, and successfully disentangle overlaid digits with high success rates.
228	Convolutional Neural Network Architectures for Matching Natural Language Sentences	Baotian Hu, Zhengdong Lu, Hang Li, Qingcai Chen	As a step toward this goal, we propose convolutional neural network models for matching two sentences, by adapting the convolutional strategy in vision and speech.
229	Scalable Non-linear Learning with Adaptive Polynomial Expansions	Alekh Agarwal, Alina Beygelzimer, Daniel J. Hsu, John Langford, Matus J. Telgarsky	We describe a new algorithm that explicitly and adaptively expands higher-order interaction features over base linear representations.
230	On the relations of LFPs & Neural Spike Trains	David E. Carlson, Jana Schaich Borg, Kafui Dzirasa, Lawrence Carin	This work proposes a strategy for identifying neural networks characterized by time- and frequency-dependent connectivity patterns, using convolutional dictionary learning that links spike-train data to local field potentials (LFPs) across multiple areas of the brain.
231	Diverse Sequential Subset Selection for Supervised Video Summarization	Boqing Gong, Wei-Lun Chao, Kristen Grauman, Fei Sha	To this end, we propose the sequential determinantal point process (seqDPP), a probabilistic model for diverse sequential subset selection.
232	Self-Paced Learning with Diversity	Lu Jiang, Deyu Meng, Shoou-I Yu, Zhenzhong Lan, Shiguang Shan, Alexander Hauptmann	To incorporate this information, we propose an approach called self-paced learning with diversity (SPLD) which formalizes the preference for both easy and diverse samples into a general regularizer.
233	Feature Cross-Substitution in Adversarial Classification	Bo Li, Yevgeniy Vorobeychik	We investigate both the problem of modeling the objectives of such adversaries, as well as the algorithmic problem of accounting for rational, objective-driven adversaries.
234	Deep Recursive Neural Networks for Compositionality in Language	Ozan Irsoy, Claire Cardie	In this work we introduce a new architecture — a deep recursive neural network (deep RNN) — constructed by stacking multiple recursive layers.
235	Inference by Learning: Speeding-up Graphical Model Optimization via a Coarse-to-Fine Cascade of Pruning Classifiers	Bruno Conejo, Nikos Komodakis, Sebastien Leprince, Jean Philippe Avouac	We propose a general and versatile framework that significantly speeds up graphical model optimization while maintaining excellent solution accuracy.
236	A Filtering Approach to Stochastic Variational Inference	Neil Houlsby, David Blei	We present an alternative perspective on SVI as approximate parallel coordinate ascent.
237	Optimizing F-Measures by Cost-Sensitive Classification	Shameem Puthiya Parambath, Nicolas Usunier, Yves Grandvalet	We present a theoretical analysis of F-measures for binary, multiclass and multilabel classification.
238	Two-Layer Feature Reduction for Sparse-Group Lasso via Decomposition of Convex Sets	Jie Wang, Jieping Ye	In this paper, we propose a novel two-layer feature reduction method (TLFre) for SGL via a decomposition of its dual feasible set.
239	Improved Multimodal Deep Learning with Variation of Information	Kihyuk Sohn, Wenling Shang, Honglak Lee	In this paper, we propose a novel multimodal representation learning framework that explicitly aims at this goal.
240	PEWA: Patch-based Exponentially Weighted Aggregation for image denoising	Charles Kervrann	In this paper, we propose a general statistical aggregation method which combines image patches denoised with several commonly-used algorithms.
241	Elementary Estimators for Graphical Models	Eunho Yang, Aurelie C. Lozano, Pradeep K. Ravikumar	We propose a class of closed-form estimators for sparsity-structured graphical models, expressed as exponential family distributions, under high-dimensional settings.
242	Beyond the Birkhoff Polytope: Convex Relaxations for Vector Permutation Problems	Cong Han Lim, Stephen Wright	Using a recent construction of Goemans (2010), we show that when optimizing over the convex hull of the permutation vectors (the permutahedron), we can reduce the number of variables and constraints to $\Theta(n \log n)$ in theory and $\Theta(n \log^2 n)$ in practice.
243	Neural Word Embedding as Implicit Matrix Factorization	Omer Levy, Yoav Goldberg	We analyze skip-gram with negative-sampling (SGNS), a word embedding method introduced by Mikolov et al., and show that it is implicitly factorizing a word-context matrix, whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant.
244	Multi-Resolution Cascades for Multiclass Object Detection	Mohammad Saberian, Nuno Vasconcelos	An algorithm for learning fast multiclass object detection cascades is introduced.
245	Median Selection Subset Aggregation for Parallel Inference	Xiangyu Wang, Peichao Peng, David B. Dunson	We propose a MEdian Selection Subset AGgregation Estimator (message) algorithm, which attempts to solve these problems.
246	Recurrent Models of Visual Attention	Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu	We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution.
247	Tree-structured Gaussian Process Approximations	Thang D. Bui, Richard E. Turner	In this paper we devise an approximation whose complexity grows linearly with the number of pseudo-datapoints.
248	Active Learning and Best-Response Dynamics	Maria-Florina F. Balcan, Christopher Berlind, Avrim Blum, Emma Cohen, Kaushik Patnaik, Le Song	We consider a setting in which low-power distributed sensors are each making highly noisy measurements of some unknown target function.
249	Analog Memories in a Balanced Rate-Based Network of E-I Neurons	Dylan Festa, Guillaume Hennequin, Mate Lengyel	We propose a novel control-theoretic framework to build functioning attractor networks that satisfy a set of relevant physiological constraints.
250	Fast Sampling-Based Inference in Balanced Neuronal Networks	Guillaume Hennequin, Laurence Aitchison, Mate Lengyel	We explore this bottleneck in a simple, linear-Gaussian latent variable model, in which posterior sampling can be achieved by stochastic neural networks with linear dynamics.
251	Spectral Learning of Mixture of Hidden Markov Models	Cem Subakan, Johannes Traa, Paris Smaragdis	In this paper, we propose a learning approach for the Mixture of Hidden Markov Models (MHMM) based on the Method of Moments (MoM).
252	Subspace Embeddings for the Polynomial Kernel	Haim Avron, Huy Nguyen, David Woodruff	In particular, we propose an embedding for mappings induced by the polynomial kernel.
253	A Boosting Framework on Grounds of Online Learning	Tofigh Naghibi Mohamadpoor, Beat Pfister	By exploiting the duality between boosting and online learning, we present a boosting framework which proves to be extremely powerful thanks to employing the vast knowledge available in the online learning area.
254	A Dual Algorithm for Olfactory Computation in the Locust Brain	Sina Tootoonian, Mate Lengyel	We study the early locust olfactory system in an attempt to explain its well-characterized structure and dynamics.
255	Advances in Learning Bayesian Networks of Bounded Treewidth	Siqi Nie, Denis D. Maua, Cassio P. de Campos, Qiang Ji	This work presents novel algorithms for learning Bayesian networks of bounded treewidth.
256	Learning the Learning Rate for Prediction with Expert Advice	Wouter M. Koolen, Tim van Erven, Peter Grünwald	To close the gap between theory and practice we introduce an approach to learn the learning rate.
257	On the Information Theoretic Limits of Learning Ising Models	Rashish Tandon, Karthikeyan Shanmugam, Pradeep K. Ravikumar, Alexandros G. Dimakis	We provide a general framework for computing lower-bounds on the sample complexity of recovering the underlying graphs of Ising models, given i.i.d. samples.
258	Efficient Optimization for Average Precision SVM	Pritish Mohapatra, C.V. Jawahar, M. Pawan Kumar	To alleviate this deficiency, we propose three complementary approaches.
259	Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS)	Anshumali Shrivastava, Ping Li	We present the first provably sublinear time hashing algorithm for approximate Maximum Inner Product Search (MIPS).
260	A framework for studying synaptic plasticity with neural spike train data	Scott Linderman, Christopher H. Stock, Ryan P. Adams	The computational rules according to which synaptic weights change over time are the subject of much research, and are not precisely understood.
261	Randomized Experimental Design for Causal Graph Discovery	Huining Hu, Zhentao Li, Adrian R. Vetta	We examine the number of controlled experiments required to discover a causal graph.
262	A Multiplicative Model for Learning Distributed Text-Based Attribute Representations	Ryan Kiros, Richard Zemel, Ruslan R. Salakhutdinov	In this paper we propose a general framework for learning distributed representations of attributes: characteristics of text whose representations can be jointly learned with word embeddings.
263	Learning Chordal Markov Networks by Dynamic Programming	Kustaa Kangas, Mikko Koivisto, Teppo Niinimäki	We present an algorithm for finding a chordal Markov network that maximizes any given decomposable scoring function.
264	Depth Map Prediction from a Single Image using a Multi-Scale Deep Network	David Eigen, Christian Puhrsch, Rob Fergus	In this paper, we present a new method that addresses this task by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally.
265	Proximal Quasi-Newton for Computationally Intensive L1-regularized M-estimators	Kai Zhong, Ian En-Hsu Yen, Inderjit S. Dhillon, Pradeep K. Ravikumar	In this work, we propose the use of a carefully constructed proximal quasi-Newton algorithm for such computationally intensive M-estimation problems, where we employ an aggressive active set selection technique.
266	A Probabilistic Framework for Multimodal Retrieval using Integrative Indian Buffet Process	Bahadir Ozdemir, Larry S. Davis	We propose a multimodal retrieval procedure based on latent feature models.
267	Searching for Higgs Boson Decay Modes with Deep Learning	Peter J. Sadowski, Daniel Whiteson, Pierre Baldi	In this paper, we train artificial neural networks to detect the decay of the Higgs boson to tau leptons on a dataset of 82 million simulated collision events.
268	Structure Regularization for Structured Prediction	Xu Sun	To control structure-based overfitting, we propose a structure regularization framework via structure decomposition, which decomposes training samples into mini-samples with simpler structures, deriving a model with better generalization power.
269	On Multiplicative Multitask Feature Learning	Xin Wang, Jinbo Bi, Shipeng Yu, Jiangwen Sun	We propose two new learning formulations by varying the parameters in the proposed framework.
270	Multivariate f-divergence Estimation With Confidence	Kevin Moon, Alfred Hero	This theory enables us to perform divergence-based inference tasks such as testing equality of pairs of distributions based on empirical samples.
271	Generalized Unsupervised Manifold Alignment	Zhen Cui, Hong Chang, Shiguang Shan, Xilin Chen	In this paper, we propose a generalized Unsupervised Manifold Alignment (GUMA) method to build the connections between different but correlated datasets without any known correspondences.
272	Smoothed Gradients for Stochastic Variational Inference	Stephan Mandt, David Blei	In this paper, we explore the idea of following biased stochastic gradients in SVI.
273	Recursive Context Propagation Network for Semantic Scene Labeling	Abhishek Sharma, Oncel Tuzel, Ming-Yu Liu	We propose a deep feed-forward neural network architecture for pixel-wise semantic scene labeling.
274	Sparse Random Feature Algorithm as Coordinate Descent in Hilbert Space	Ian En-Hsu Yen, Ting-Wei Lin, Shou-De Lin, Pradeep K. Ravikumar, Inderjit S. Dhillon	In this paper, we propose a Sparse Random Feature algorithm, which learns a sparse non-linear predictor by minimizing an $\ell_1$-regularized objective function over the Hilbert space induced by the kernel function.
275	Optimal Teaching for Limited-Capacity Human Learners	Kaustubh R. Patil, Jerry Zhu, Lukasz Kopec, Bradley C. Love	In this contribution, we take a first principles approach to constructing idealized training sets.
276	Shaping Social Activity by Incentivizing Users	Mehrdad Farajtabar, Nan Du, Manuel Gomez Rodriguez, Isabel Valera, Hongyuan Zha, Le Song	In this paper, we model social events using multivariate Hawkes processes, which can capture both endogenous and exogenous event intensities, and derive a time dependent linear relation between the intensity of exogenous events and the overall network activity.
277	Analysis of Brain States from Multi-Region LFP Time-Series	Kyle R. Ulrich, David E. Carlson, Wenzhao Lian, Jana S. Borg, Kafui Dzirasa, Lawrence Carin	An infinite hidden Markov model (iHMM) is proposed to model the evolution of brain states, based on electrophysiological LFP data measured at multiple brain regions.
278	Reputation-based Worker Filtering in Crowdsourcing	Srikanth Jagabathula, Lakshminarayanan Subramanian, Ashwin Venkataraman	In this paper, we study the problem of aggregating noisy labels from crowd workers to infer the underlying true labels of binary tasks.
279	Multi-Class Deep Boosting	Vitaly Kuznetsov, Mehryar Mohri, Umar Syed	We present new ensemble learning algorithms for multi-class classification.
280	A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights	Weijie Su, Stephen Boyd, Emmanuel Candes	We derive a second-order ordinary differential equation (ODE), which is the limit of Nesterov’s accelerated gradient method.
281	Difference of Convex Functions Programming for Reinforcement Learning	Bilal Piot, Matthieu Geist, Olivier Pietquin	The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming.
282	Design Principles of the Hippocampal Cognitive Map	Kimberly L. Stachenfeld, Matthew Botvinick, Samuel J. Gershman	We hypothesize a set of design principles for the hippocampal cognitive map that explain how place fields represent space in a way that facilitates navigation and reinforcement learning.
283	Deep Symmetry Networks	Robert Gens, Pedro M. Domingos	In this paper, we introduce deep symmetry networks (symnets), a generalization of convnets that forms feature maps over arbitrary symmetry groups.
284	Nonparametric Bayesian inference on multivariate exponential families	William R. Vega-Brown, Marek Doniec, Nicholas G. Roy	We develop a model by choosing the maximum entropy distribution from the set of models satisfying certain smoothness and independence criteria; we show that inference on this model generalizes local kernel estimation to the context of Bayesian inference on stochastic processes.
285	Optimal rates for k-NN density and mode estimation	Sanjoy Dasgupta, Samory Kpotufe	We present two related contributions of independent interest: (1) high-probability finite sample rates for $k$-NN density estimation, and (2) practical mode estimators — based on $k$-NN — which attain minimax-optimal rates under surprisingly general distributional conditions.
286	Feedforward Learning of Mixture Models	Matthew Lawlor, Steven W. Zucker	We develop a biologically-plausible learning rule that provably converges to the class means of general mixture models.
287	Diverse Randomized Agents Vote to Win	Albert Jiang, Leandro Soriano Marcolino, Ariel D. Procaccia, Tuomas Sandholm, Nisarg Shah, Milind Tambe	With teams of computer Go agents in mind, we develop a novel theoretical model of two-stage noisy voting that builds on recent work in machine learning.
288	Ranking via Robust Binary Classification	Hyokun Yun, Parameswaran Raman, S. Vishwanathan	We propose RoBiRank, a ranking algorithm that is motivated by observing a close connection between evaluation metrics for learning to rank and loss functions for robust classification.
289	Distributed Balanced Clustering via Mapping Coresets	Mohammadhossein Bateni, Aditya Bhaskara, Silvio Lattanzi, Vahab Mirrokni	In the present paper we develop a general framework based on “mapping coresets” to tackle this issue.
290	Augur: Data-Parallel Probabilistic Modeling	Jean-Baptiste Tristan, Daniel Huang, Joseph Tassarotti, Adam C. Pocock, Stephen Green, Guy L. Steele	In this paper we present Augur, a probabilistic modeling language and compiler for Bayesian networks designed to make effective use of data-parallel architectures such as GPUs.
291	Learning Mixtures of Ranking Models	Pranjal Awasthi, Avrim Blum, Or Sheffet, Aravindan Vijayaraghavan	We present the first polynomial time algorithm which provably learns the parameters of a mixture of two Mallows models.
292	Controlling privacy in recommender systems	Yu Xin, Tommi Jaakkola	In this paper, we explore a two-tiered notion of privacy where there is a small set of “public” users who are willing to share their preferences openly, and a large set of “private” users who require privacy guarantees.
293	Convolutional Kernel Networks	Julien Mairal, Piotr Koniusz, Zaid Harchaoui, Cordelia Schmid	In this paper, we address this goal with a new type of convolutional neural network (CNN) whose invariance is encoded by a reproducing kernel.
294	Fairness in Multi-Agent Sequential Decision-Making	Chongjie Zhang, Julie A. Shah	We develop a simple linear programming approach and a more scalable game-theoretic approach for computing an optimal fairness policy.
295	Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets	Adarsh Prasad, Stefanie Jegelka, Dhruv Batra	We study greedy algorithms for finding a diverse subset of solutions in structured-output spaces by drawing new connections between submodular functions over combinatorial item sets and High-Order Potentials (HOPs) studied for graphical models.
296	Do Deep Nets Really Need to be Deep?	Jimmy Ba, Rich Caruana	In this paper we empirically demonstrate that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models.
297	Dynamic Rank Factor Model for Text Streams	Shaobo Han, Lin Du, Esther Salazar, Lawrence Carin	We propose a semi-parametric and dynamic rank factor model for topic modeling, capable of (1) discovering topic prevalence over time, and (2) learning contemporary multi-scale dependence structures, providing topic and word correlations as a byproduct.
298	Generative Adversarial Nets	Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio	We propose a new framework for estimating generative models via adversarial nets, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
299	Testing Unfaithful Gaussian Graphical Models	De Wen Soh, Sekhar C. Tatikonda	Testing Unfaithful Gaussian Graphical Models
300	Global Sensitivity Analysis for MAP Inference in Graphical Models	Jasper De Bock, Cassio P. de Campos, Alessandro Antonucci	Our main contribution is an exact algorithm that can check whether the MAP configuration is robust with respect to given perturbations.
301	Deconvolution of High Dimensional Mixtures via Boosting, with Application to Diffusion-Weighted MRI of Human Brain	Charles Y. Zheng, Franco Pestilli, Ariel Rokem	Here, we propose a more general algorithm that fits mixture models of any dimensionality without discretization.
302	Efficient Minimax Signal Detection on Graphs	Jing Qian, Venkatesh Saligrama	We overcome the combinatorics of connectivity by embedding connected subgraphs into linear matrix inequalities (LMI).
303	Cone-Constrained Principal Component Analysis	Yash Deshpande, Andrea Montanari, Emile Richard	We consider a simple model for noisy quadratic observation of an unknown vector $\mathbf{v}_0$.
304	On Communication Cost of Distributed Statistical Estimation and Dimensionality	Ankit Garg, Tengyu Ma, Huy Nguyen	Specifically, we study the problem of estimating the mean $\vec{\theta}$ of an unknown $d$-dimensional Gaussian distribution in the distributed setting.
305	Computing Nash Equilibria in Generalized Interdependent Security Games	Hau Chan, Luis E. Ortiz	We study the computational complexity of computing Nash equilibria in generalized interdependent-security (IDS) games.
306	Consistent Binary Classification with Generalized Performance Metrics	Oluwasanmi O. Koyejo, Nagarajan Natarajan, Pradeep K. Ravikumar, Inderjit S. Dhillon	We propose two algorithms for estimating the optimal classifiers, and prove their statistical consistency.
307	Greedy Subspace Clustering	Dohyung Park, Constantine Caramanis, Sujay Sanghavi	In this paper, we provide new simple and efficient algorithms for this problem.
308	Deterministic Symmetric Positive Semidefinite Matrix Completion	William E. Bishop, Byron M. Yu	We consider the problem of recovering a symmetric, positive semidefinite (SPSD) matrix from a subset of its entries, possibly corrupted by noise. We develop a set of sufficient conditions for the recovery of an SPSD matrix from a set of its principal submatrices, present necessity results based on this set of conditions, and develop an algorithm that can exactly recover a matrix when these conditions are met.
309	Multi-Step Stochastic ADMM in High Dimensions: Applications to Sparse Optimization and Matrix Decomposition	Hanie Sedghi, Anima Anandkumar, Edmond Jonckheere	In this paper, we consider a multi-step version of the stochastic ADMM method with efficient guarantees for high-dimensional problems.
310	Online combinatorial optimization with stochastic decision sets and adversarial losses	Gergely Neu, Michal Valko	In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions.
311	Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature	Tom Gunter, Michael A. Osborne, Roman Garnett, Philipp Hennig, Stephen J. Roberts	We propose a novel sampling framework for inference in probabilistic models: an active learning approach that converges more quickly (in wall-clock time) than Markov chain Monte Carlo (MCMC) benchmarks.
312	Multi-Scale Spectral Decomposition of Massive Graphs	Si Si, Donghyuk Shin, Inderjit S. Dhillon, Beresford N. Parlett	In this paper, we propose and analyze a novel multi-scale spectral decomposition method (MSEIGS), which first clusters the graph into smaller clusters whose spectral decomposition can be computed efficiently and independently.
313	The limits of squared Euclidean distance regularization	Michal Derezinski, Manfred K. Warmuth	We give a random construction for sets of examples where the target linear weight vector is trivial to learn but any algorithm from the above family is drastically sub-optimal.
314	Bregman Alternating Direction Method of Multipliers	Huahua Wang, Arindam Banerjee	In this paper, we similarly generalize the alternating direction method of multipliers (ADMM) to Bregman ADMM (BADMM), which allows the choice of different Bregman divergences to exploit the structure of problems.
315	Multitask learning meets tensor factorization: task imputation via convex optimization	Kishan Wimalawarne, Masashi Sugiyama, Ryota Tomioka	We propose a new norm, which we call the scaled latent trace norm, and analyze the excess risk of all three norms.
316	On Model Parallelization and Scheduling Strategies for Distributed Machine Learning	Seunghak Lee, Jin Kyu Kim, Xun Zheng, Qirong Ho, Garth A. Gibson, Eric P. Xing	In this paper, we develop a system for model-parallelism, STRADS, that provides a programming abstraction for scheduling parameter updates by discovering and leveraging changing structural properties of ML programs.
317	Scalable Inference for Neuronal Connectivity from Calcium Imaging	Alyson K. Fletcher, Sundeep Rangan	In this work, we propose a computationally fast method for the state estimation based on a hybrid of loopy belief propagation and approximate message passing (AMP).
318	Structure learning of antiferromagnetic Ising models	Guy Bresler, David Gamarnik, Devavrat Shah	In this paper we investigate the computational complexity of learning the graph structure underlying a discrete undirected graphical model from i.i.d. samples.
319	The Noisy Power Method: A Meta Algorithm with Applications	Moritz Hardt, Eric Price	We provide a new robust convergence analysis of the well-known power method for computing the dominant singular vectors of a matrix, which we call the noisy power method.
320	Algorithm selection by rational metareasoning as a model of human strategy selection	Falk Lieder, Dillon Plunkett, Jessica B. Hamrick, Stuart J. Russell, Nicholas Hay, Tom Griffiths	Here, we view the algorithm selection problem as a special case of metareasoning and derive a solution that outperforms existing methods in sorting algorithm selection.
321	Extremal Mechanisms for Local Differential Privacy	Peter Kairouz, Sewoong Oh, Pramod Viswanath	We introduce a family of extremal privatization mechanisms, which we call staircase mechanisms, and prove that it contains the optimal privatization mechanism that maximizes utility.
322	Global Belief Recursive Neural Networks	Romain Paulus, Richard Socher, Christopher D. Manning	We introduce global belief recursive neural networks (GB-RNNs) which are based on the idea of extending purely feedforward neural networks to include one feedbackward step during inference.
323	A statistical model for tensor PCA	Emile Richard, Andrea Montanari	For moderate dimensions, we propose a hybrid approach that uses unfolding together with power iteration, and show that it significantly outperforms baseline methods.
324	Real-Time Decoding of an Integrate and Fire Encoder	Shreya Saxena, Munther Dahleh	Here, we introduce a causal, real-time decoder of the biophysically-based Integrate and Fire encoding neuron model.
325	Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning	Brendan McMahan, Matthew Streeter	We analyze new online gradient descent algorithms for distributed systems with large delays between gradient computations and the corresponding updates.
326	On the Number of Linear Regions of Deep Neural Networks	Guido F. Montufar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio	We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have.
327	Identifying and attacking the saddle point problem in high-dimensional non-convex optimization	Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio	Motivated by these arguments, we propose a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods.
328	Extracting Latent Structure From Multiple Interacting Neural Populations	Joao Semedo, Amin Zandvakili, Adam Kohn, Christian K. Machens, Byron M. Yu	Rather than attempting to identify direct interactions between neurons (where the number of interactions grows with the number of neurons squared), we propose to extract a smaller number of latent variables from each population and study how the latent variables interact.
329	Learning with Fredholm Kernels	Qichao Que, Mikhail Belkin, Yusu Wang	In this paper we propose a framework for supervised and semi-supervised learning based on reformulating the learning problem as a regularized Fredholm integral equation.
330	Hamming Ball Auxiliary Sampling for Factorial Hidden Markov Models	Michalis Titsias, Christopher Yau	We introduce a novel sampling algorithm for Markov chain Monte Carlo-based Bayesian inference for factorial hidden Markov models.
331	Optimizing Energy Production Using Policy Search and Predictive State Representations	Yuri Grinberg, Doina Precup, Michel Gendreau	We propose a policy-search-based approach coupled with predictive modelling to address this problem.
332	Scaling-up Importance Sampling for Markov Logic Networks	Deepak Venugopal, Vibhav G. Gogate	In this paper, we address both problems by presenting a scalable, lifted importance sampling-based approach that never grounds the full MLN.
333	Optimal Neural Codes for Control and Estimation	Alex K. Susemihl, Ron Meir, Manfred Opper	In this work we show that sensory adaptation for control differs from sensory adaptation for perception, even for simple control setups.
334	Graph Clustering With Missing Data: Convex Algorithms and Analysis	Ramya Korlakai Vinayak, Samet Oymak, Babak Hassibi	We consider the problem of finding clusters in an unweighted graph, when the graph is partially observed.
335	Scale Adaptive Blind Deblurring	Haichao Zhang, Jianchao Yang	We present a scale space perspective on blind deblurring algorithms, and introduce a cascaded scale space formulation for blind deblurring.
336	Weighted importance sampling for off-policy learning with linear function approximation	A. Rupam Mahmood, Hado P. van Hasselt, Richard S. Sutton	In this paper, we take two steps toward bridging this gap.
337	Information-based learning by agents in unbounded state spaces	Shariq A. Mobin, James A. Arnemann, Fritz Sommer	The idea that animals might use information-driven planning to explore an unknown environment and build an internal model of it has been proposed for quite some time.
338	Exponential Concentration of a Density Functional Estimator	Shashank Singh, Barnabas Poczos	We analyse a plug-in estimator for a large class of integral functionals of one or more continuous probability densities.
339	Scalable Kernel Methods via Doubly Stochastic Gradients	Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, Maria-Florina F. Balcan, Le Song	In this paper, we propose an approach that scales up kernel methods using a novel concept called “doubly stochastic functional gradients”.
340	Fast Training of Pose Detectors in the Fourier Domain	João F. Henriques, Pedro Martins, Rui F. Caseiro, Jorge Batista	By assuming that this transformation is norm-preserving and cyclic, we propose a closed-form solution in the Fourier domain that can eliminate most redundancies.
341	An Accelerated Proximal Coordinate Gradient Method	Qihang Lin, Zhaosong Lu, Lin Xiao	We develop an accelerated randomized proximal coordinate gradient (APCG) method, for solving a broad class of composite convex optimization problems.
342	Communication-Efficient Distributed Dual Coordinate Ascent	Martin Jaggi, Virginia Smith, Martin Takac, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, Michael I. Jordan	In this paper, we propose a communication-efficient framework, COCOA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication.
343	Simple MAP Inference via Low-Rank Relaxations	Roy Frostig, Sida Wang, Percy S. Liang, Christopher D. Manning	We focus on the problem of maximum a posteriori (MAP) inference in Markov random fields with binary variables and pairwise interactions.
344	A* Sampling	Chris J. Maddison, Daniel Tarlow, Tom Minka	In this work, we show how sampling from a continuous distribution can be converted into an optimization problem over continuous space.
345	A Bayesian model for identifying hierarchically organised states in neural population activity	Patrick Putzky, Florian Franzen, Giacomo Bassetto, Jakob H. Macke	Here, we present a statistical model for extracting hierarchically organised neural population states from multi-channel recordings of neural spiking activity.
346	Sequence to Sequence Learning with Neural Networks	Ilya Sutskever, Oriol Vinyals, Quoc V. Le	In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure.
347	Improved Distributed Principal Component Analysis	Yingyu Liang, Maria-Florina F. Balcan, Vandana Kanchanapally, David Woodruff	We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for $k$-means clustering and related problems.
348	Sparse Polynomial Learning and Graph Sketching	Murat Kocaoglu, Karthikeyan Shanmugam, Alexandros G. Dimakis, Adam Klivans	We give an algorithm for exactly reconstructing $f$ given random examples from the uniform distribution on $\{-1,1\}^n$ that runs in time polynomial in $n$ and $2^{s}$ and succeeds if the function satisfies the unique sign property: there is one output value which corresponds to a unique set of values of the participating parities.
349	Tight Continuous Relaxation of the Balanced k-Cut Problem	Syama Sundar Rangapuram, Pramod Kaushik Mudrakarta, Matthias Hein	In this paper we propose a new tight continuous relaxation for any balanced k-cut problem and show that a related recently proposed relaxation is in most cases loose, leading to poor performance in practice.
350	Mondrian Forests: Efficient Online Random Forests	Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh	In this work, we use Mondrian processes (Roy and Teh, 2009) to construct ensembles of random decision trees we call Mondrian forests.
351	Expectation-Maximization for Learning Determinantal Point Processes	Jennifer A. Gillenwater, Alex Kulesza, Emily Fox, Ben Taskar	In this work we propose a novel algorithm for learning the full kernel matrix.
352	Capturing Semantically Meaningful Word Dependencies with an Admixture of Poisson MRFs	David I. Inouye, Pradeep K. Ravikumar, Inderjit S. Dhillon	We develop a fast algorithm for the Admixture of Poisson MRFs (APM) topic model and propose a novel metric to directly evaluate this model.
353	Streaming, Memory Limited Algorithms for Community Detection	Se-Young Yun, Marc Lelarge, Alexandre Proutiere	In this paper, we consider sparse networks consisting of a finite number of non-overlapping communities, i.e. disjoint clusters, so that there is higher density within clusters than across clusters.
354	Content-based recommendations with Poisson factorization	Prem K. Gopalan, Laurent Charlin, David Blei	We develop collaborative topic Poisson factorization (CTPF), a generative model of articles and reader preferences.
355	A Statistical Decision-Theoretic Framework for Social Choice	Hossein Azari Soufiani, David C. Parkes, Lirong Xia	In this paper, we take a statistical decision-theoretic viewpoint on social choice, putting a focus on the decision to be made on behalf of a system of agents.
356	Compressive Sensing of Signals from a GMM with Sparse Precision Matrices	Jianbo Yang, Xuejun Liao, Minhua Chen, Lawrence Carin	Under these challenging assumptions, we develop a hierarchical Bayesian method to simultaneously estimate the GMM and recover the signals using solely the incomplete measurements and a Bayesian shrinkage prior that promotes sparsity of the Gaussian precision matrices.
357	Bayesian Sampling Using Stochastic Gradient Thermostats	Nan Ding, Youhan Fang, Ryan Babbush, Changyou Chen, Robert D. Skeel, Hartmut Neven	Dynamics-based sampling methods, such as Hybrid Monte Carlo (HMC) and Langevin dynamics (LD), are commonly used to sample target distributions.
358	On Sparse Gaussian Chain Graph Models	Calvin McCarter, Seyoung Kim	In this paper, we address the problem of learning the structure of Gaussian chain graph models in a high-dimensional space.
359	Orbit Regularization	Renato Negrinho, Andre Martins	We propose a general framework for regularization based on group majorization.
360	Efficient Minimax Strategies for Square Loss Games	Wouter M. Koolen, Alan Malek, Peter L. Bartlett	We consider online prediction problems where the loss between the prediction and the outcome is measured by the squared Euclidean distance and its generalization, the squared Mahalanobis distance.
361	A Residual Bootstrap for High-Dimensional Regression with Near Low-Rank Designs	Miles Lopes	In this setting, we propose a version of RB that resamples residuals obtained from ridge regression.
362	Large-Margin Convex Polytope Machine	Alex Kantchelian, Michael C. Tschantz, Ling Huang, Peter L. Bartlett, Anthony D. Joseph, J. D. Tygar	We present the Convex Polytope Machine (CPM), a novel non-linear learning algorithm for large-scale binary classification tasks.
363	Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models	Yarin Gal, Mark van der Wilk, Carl Edward Rasmussen	We introduce a novel re-parametrisation of variational inference for sparse GP regression and latent variable models that allows for an efficient distributed algorithm.
364	Learning Distributed Representations for Structured Output Prediction	Vivek Srikumar, Christopher D. Manning	In this paper, we present a new formulation for structured prediction where we represent individual labels in a structure as dense vectors and allow semantically similar labels to share parameters.
365	Convex Deep Learning via Normalized Kernels	Özlem Aslan, Xinhua Zhang, Dale Schuurmans	In this paper, we develop a new architecture for nested nonlinearities that allows arbitrarily deep compositions to be trained to global optimality.
366	Tight convex relaxations for sparse matrix factorization	Emile Richard, Guillaume R. Obozinski, Jean-Philippe Vert	Based on a new atomic norm, we propose a new convex formulation for sparse matrix factorization problems in which the number of nonzero elements of the factors is assumed fixed and known.
367	Learning to Search in Branch and Bound Algorithms	He He, Hal Daume III, Jason M. Eisner	We address the key challenge of learning an adaptive node searching order for any class of problem solvable by branch-and-bound.
368	An Integer Polynomial Programming Based Framework for Lifted MAP Inference	Somdeb Sarkhel, Deepak Venugopal, Parag Singla, Vibhav G. Gogate	In this paper, we present a new approach for lifted MAP inference in Markov logic networks (MLNs).
369	Conditional Random Field Autoencoders for Unsupervised Structured Prediction	Waleed Ammar, Chris Dyer, Noah A. Smith	We introduce a framework for unsupervised learning of structured predictors with overlapping, global features.
370	How transferable are features in deep neural networks?	Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson	In this paper we experimentally quantify the generality versus specificity of neurons in each layer of a deep convolutional neural network and report a few surprising results.
371	Accelerated Mini-batch Randomized Block Coordinate Descent Method	Tuo Zhao, Mo Yu, Yiming Wang, Raman Arora, Han Liu	In this paper, we propose a mini-batch randomized block coordinate descent (MRBCD) method, which estimates the partial gradient of the selected block based on a mini-batch of randomly sampled data in each iteration.
372	Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning	Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, Xiaoshi Wang	Our main goal in this work is to build a better real-time Atari game playing agent than DQN.
373	A Latent Source Model for Online Collaborative Filtering	Guy Bresler, George H. Chen, Devavrat Shah	We address this theoretical gap by introducing a model for online recommendation systems, cast item recommendation under the model as a learning problem, and analyze the performance of a cosine-similarity collaborative filtering method.
374	Distributed Bayesian Posterior Sampling via Moment Sharing	Minjie Xu, Balaji Lakshminarayanan, Yee Whye Teh, Jun Zhu, Bo Zhang	We propose a distributed Markov chain Monte Carlo (MCMC) inference algorithm for large scale Bayesian posterior simulation.
375	Learning with Pseudo-Ensembles	Philip Bachman, Ouais Alsharif, Doina Precup	We present a novel regularizer based on making the behavior of a pseudo-ensemble robust with respect to the noise process generating it.
376	Learning Time-Varying Coverage Functions	Nan Du, Yingyu Liang, Maria-Florina F. Balcan, Le Song	In this paper, we propose a new problem of learning time-varying coverage functions which arise naturally from applications in social network analysis, machine learning, and algorithmic game theory.
377	Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time	Zhaoran Wang, Huanran Lu, Han Liu	In this paper, we propose a two-stage sparse PCA procedure that attains the optimal principal subspace estimator in polynomial time.
378	Discriminative Metric Learning by Neighborhood Gerrymandering	Shubhendu Trivedi, David McAllester, Greg Shakhnarovich	We describe an efficient algorithm for exact loss augmented inference, and a fast gradient descent algorithm for learning in this model.
379	Finding a sparse vector in a subspace: Linear sparsity using alternating directions	Qing Qu, Ju Sun, John Wright	We consider the problem of recovering the sparsest vector in a subspace $\mathcal{S} \subseteq \mathbb{R}^p$ with $\text{dim}(\mathcal{S})=n$.
380	Asynchronous Anytime Sequential Monte Carlo	Brooks Paige, Frank Wood, Arnaud Doucet, Yee Whye Teh	We introduce a new sequential Monte Carlo algorithm we call the particle cascade.
381	Discrete Graph Hashing	Wei Liu, Cun Mu, Sanjiv Kumar, Shih-Fu Chang	This paper presents a graph-based unsupervised hashing model to preserve the neighborhood structure of massive data in a discrete code space.
382	Feedback Detection for Live Predictors	Stefan Wager, Nick Chamandy, Omkar Muralidharan, Amir Najmi	In this paper we analyze predictor feedback detection as a causal inference problem, and introduce a local randomization scheme that can be used to detect non-linear feedback in real-world problems.
383	Rates of Convergence for Nearest Neighbor Classification	Kamalika Chaudhuri, Sanjoy Dasgupta	We analyze the behavior of nearest neighbor classification in metric spaces and provide finite-sample, distribution-dependent rates of convergence under minimal assumptions.
384	Consistency of weighted majority votes	Daniel Berend, Aryeh Kontorovich	We revisit from a statistical learning perspective the classical decision-theoretic problem of weighted expert voting.
385	Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling	Mingyuan Zhou	This paper introduces an exchangeable partition probability function to explicitly describe how the BNBP clusters the data points of each group into a random number of exchangeable partitions, which are shared across all the groups.
386	Zero-shot recognition with unreliable attributes	Dinesh Jayaraman, Kristen Grauman	We propose a novel random forest approach to train zero-shot models that explicitly accounts for the unreliability of attribute predictions.
387	Concavity of reweighted Kikuchi approximation	Po-Ling Loh, Andre Wibisono	We analyze a reweighted version of the Kikuchi approximation for estimating the log partition function of a product distribution defined over a region graph.
388	Online Decision-Making in General Combinatorial Spaces	Arun Rajkumar, Shivani Agarwal	We study online combinatorial decision problems, where one must make sequential decisions in some combinatorial space without knowing in advance the cost of decisions on each trial; the goal is to minimize the total regret over some sequence of trials relative to the best fixed decision in hindsight.
389	Fast Multivariate Spatio-temporal Analysis via Low Rank Tensor Learning	Mohammad Taha Bahadori, Qi (Rose) Yu, Yan Liu	We propose a unified low rank tensor learning framework for multivariate spatio-temporal analysis, which can conveniently incorporate different properties in spatio-temporal data, such as spatial clustering and shared structure among variables.
390	Clustered factor analysis of multineuronal spike data	Lars Buesing, Timothy A. Machado, John P. Cunningham, Liam Paninski	Here, we extend unstructured factor models by proposing a model that discovers subpopulations or groups of cells from the pool of recorded neurons.
391	Algorithms for CVaR Optimization in MDPs	Yinlam Chow, Mohammad Ghavamzadeh	In this paper, we consider the mean-CVaR optimization problem in MDPs.
392	Factoring Variations in Natural Images with Deep Gaussian Mixture Models	Aaron van den Oord, Benjamin Schrauwen	In this paper we propose a new scalable deep generative model for images, called the Deep Gaussian Mixture Model, that is a straightforward but powerful generalization of GMMs to multiple layers.
393	Partition-wise Linear Models	Hidekazu Oiwa, Ryohei Fujimaki	This paper proposes novel convex region-specific linear models, which we refer to as partition-wise linear models.
394	LSDA: Large Scale Detection through Adaptation	Judy Hoffman, Sergio Guadarrama, Eric S. Tzeng, Ronghang Hu, Jeff Donahue, Ross Girshick, Trevor Darrell, Kate Saenko	In this paper, we propose Large Scale Detection through Adaptation (LSDA), an algorithm which learns the difference between the two tasks and transfers this knowledge to classifiers for categories without bounding box annotated data, turning them into detectors.
395	Deep Networks with Internal Selective Attention through Feedback Connections	Marijn F. Stollenga, Jonathan Masci, Faustino Gomez, Jürgen Schmidhuber	Deep Networks with Internal Selective Attention through Feedback Connections
396	Parallel Feature Selection Inspired by Group Testing	Yingbo Zhou, Utkarsh Porwal, Ce Zhang, Hung Q. Ngo, XuanLong Nguyen, Christopher Ré, Venu Govindaraju	This paper presents a parallel feature selection method for classification that scales up to very high dimensions and large data sizes.
397	Low-Rank Time-Frequency Synthesis	Cédric Févotte, Matthieu Kowalski	The work presented in this paper fills in this ultimate gap by proposing a novel signal synthesis model with low-rank time-frequency structure.
398	Pre-training of Recurrent Neural Networks via Linear Autoencoders	Luca Pasa, Alessandro Sperduti	We propose a pre-training technique for recurrent neural networks based on linear autoencoder networks for sequences, i.e. linear dynamical systems modelling the target sequences.
399	Semi-supervised Learning with Deep Generative Models	Durk P. Kingma, Shakir Mohamed, Danilo Jimenez Rezende, Max Welling	We revisit the approach to semi-supervised learning with generative models and develop new models that allow for effective generalisation from small labelled data sets to large unlabelled ones.
400	Signal Aggregate Constraints in Additive Factorial HMMs, with Application to Energy Disaggregation	Mingjun Zhong, Nigel Goddard, Charles Sutton	We introduce a way of incorporating domain knowledge into this problem, called signal aggregate constraints (SACs).
401	Stochastic variational inference for hidden Markov models	Nick Foti, Jason Xu, Dillon Laird, Emily Fox	We propose an algorithm that harnesses the memory decay of the chain to adaptively bound errors arising from edge effects.
402	A Wild Bootstrap for Degenerate Kernel Tests	Kacper P. Chwialkowski, Dino Sejdinovic, Arthur Gretton	A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed.
403	Biclustering Using Message Passing	Luke O’Connor, Soheil Feizi	We propose a biclustering algorithm that maximizes a global objective function using message passing.
404	Fast Kernel Learning for Multidimensional Pattern Extrapolation	Andrew G. Wilson, Elad Gilboa, Arye Nehorai, John P. Cunningham	In this paper, we propose a Gaussian process approach for large scale multidimensional pattern extrapolation.
405	Learning on graphs using Orthonormal Representation is Statistically Consistent	Rakesh Shivanna, Chiranjib Bhattacharyya	In this paper, we show that there exist orthonormal representations which are statistically consistent over a large class of graphs, including power law and random graphs.
406	Spectral k-Support Norm Regularization	Andrew M. McDonald, Massimiliano Pontil, Dimitris Stamos	We extend the norms from the vector to the matrix setting and we introduce the spectral $k$-support norm.
407	Unsupervised learning of an efficient short-term memory network	Pietro Vertechi, Wieland Brendel, Christian K. Machens	We develop our main theory for linear networks, but then sketch how the learning rules could be transferred to balanced, spiking networks.
408	Quantized Estimation of Gaussian Sequence Models in Euclidean Balls	Yuancheng Zhu, John Lafferty	In this paper, we present an extension to Pinsker’s theorem where estimation is carried out under storage or communication constraints.
409	Learning a Concept Hierarchy from Multi-labeled Documents	Viet-An Nguyen, Jordan L. Ying, Philip Resnik, Jonathan Chang	In this paper, we present a model, Label to Hierarchy (L2H), that can induce a hierarchy of user-generated labels and the topics associated with those labels from a set of multi-labeled documents.
410	Variational Gaussian Process State-Space Models	Roger Frigola, Yutian Chen, Carl Edward Rasmussen	We present a procedure for efficient variational Bayesian learning of nonlinear state-space models based on sparse Gaussian processes.
411	Fast Prediction for Large-Scale Kernel Machines	Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon	In this paper, we present two novel insights for improving the prediction efficiency of kernel machines.