# Paper Digest: NIPS 2014 Highlights

The Conference on Neural Information Processing Systems (NIPS) is one of the top machine learning conferences in the world. In 2014, it was held in Montreal, Canada.

To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.

We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to **sign up for our free paper digest service** to get new paper updates customized to your own interests on a daily basis.

Paper Digest Team

team@paperdigest.org

#### TABLE 1: NIPS 2014 Papers

No. | Title | Authors | Highlight |
---|---|---|---|

1 | Kernel Mean Estimation via Spectral Filtering | Krikamol Muandet, Bharath Sriperumbudur, Bernhard Schölkopf | Using the kernel PCA basis, we show that some of these estimators can be constructed using spectral filtering algorithms which are shown to be consistent under some technical assumptions. |

2 | Semi-Separable Hamiltonian Monte Carlo for Inference in Bayesian Hierarchical Models | Yichuan Zhang, Charles Sutton | We introduce a new RMHMC method, which we call semi-separable Hamiltonian Monte Carlo, which uses a specially designed mass matrix that allows the joint Hamiltonian over model parameters and hyperparameters to decompose into two simpler Hamiltonians. |

3 | Communication Efficient Distributed Machine Learning with the Parameter Server | Mu Li, David G. Andersen, Alexander J. Smola, Kai Yu | This paper describes a third-generation parameter server framework for distributed machine learning. |

4 | The Infinite Mixture of Infinite Gaussian Mixtures | Halid Z. Yerebakan, Bartek Rajwa, Murat Dundar | Herein, we present the infinite mixture of infinite Gaussian mixtures (I2GMM) for more flexible modeling of data sets with skewed and multi-modal cluster distributions. |

5 | Robust Classification Under Sample Selection Bias | Anqi Liu, Brian Ziebart | We develop a framework for robustly learning a probabilistic classifier that adapts to different sample selection biases using a minimax estimation formulation. |

6 | Zeta Hull Pursuits: Learning Nonconvex Data Hulls | Yuanjun Xiong, Wei Liu, Deli Zhao, Xiaoou Tang | Employing this measure, we present a greedy algorithmic framework, dubbed Zeta Hulls, to perform structured column sampling. |

7 | Grouping-Based Low-Rank Trajectory Completion and 3D Reconstruction | Katerina Fragkiadaki, Marta Salas, Pablo Arbelaez, Jitendra Malik | In this paper, we make a first attempt towards this goal, and propose a method that combines dense optical flow tracking, motion trajectory clustering and NRSfM for 3D reconstruction of objects in videos. |

8 | Sparse Space-Time Deconvolution for Calcium Image Analysis | Ferran Diego Andilla, Fred A. Hamprecht | We describe a unified formulation and algorithm to find an extremely sparse representation for Calcium image sequences in terms of cell locations, cell shapes, spike timings and impulse responses. |

9 | Restricted Boltzmann machines modeling human choice | Takayuki Osogami, Makoto Otsuka | We extend the multinomial logit model to represent some of the empirical phenomena that are frequently observed in the choices made by humans. |

10 | Multiscale Fields of Patterns | Pedro Felzenszwalb, John G. Oberlin | We describe a framework for defining high-order image models that can be used in a variety of applications. |

11 | Large scale canonical correlation analysis with iterative least squares | Yichao Lu, Dean P. Foster | In this paper we introduce L-CCA, an iterative algorithm which can compute CCA fast on huge sparse datasets. |

12 | Altitude Training: Strong Bounds for Single-Layer Dropout | Stefan Wager, William Fithian, Sida Wang, Percy S. Liang | This paper proposes a theoretical explanation for this phenomenon: we show that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization. |

13 | Rounding-based Moves for Metric Labeling | M. Pawan Kumar | We consider a large class of parallel rounding procedures, and design move-making algorithms that closely mimic them. |

14 | Parallel Double Greedy Submodular Maximization | Xinghao Pan, Stefanie Jegelka, Joseph E. Gonzalez, Joseph K. Bradley, Michael I. Jordan | In this work, we propose two methods to parallelize the double-greedy algorithm. |

15 | Multivariate Regression with Calibration | Han Liu, Lie Wang, Tuo Zhao | We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models. |

16 | Exact Post Model Selection Inference for Marginal Screening | Jason D. Lee, Jonathan E. Taylor | We develop a framework for post model selection inference, via marginal screening, in linear regression. |

17 | On a Theory of Nonparametric Pairwise Similarity for Clustering: Connecting Clustering to Classification | Yingzhen Yang, Feng Liang, Shuicheng Yan, Zhangyang Wang, Thomas S. Huang | In this paper, we present a novel pairwise clustering framework by bridging the gap between clustering and multi-class classification. |

18 | Just-In-Time Learning for Fast and Flexible Inference | S. M. Ali Eslami, Daniel Tarlow, Pushmeet Kohli, John Winn | Motivated by this, we introduce just-in-time learning, a framework for fast and flexible inference that learns to speed up inference at run-time. |

19 | Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation | Ohad Shamir | In this paper, we describe how a single set of results implies positive answers to the above, for several different settings. |

20 | Quantized Kernel Learning for Feature Matching | Danfeng Qin, Xuanli Chen, Matthieu Guillaumin, Luc V. Gool | In this paper, we introduce a simple and flexible family of non-linear kernels which we refer to as Quantized Kernels (QK). |

21 | Parallel Direction Method of Multipliers | Huahua Wang, Arindam Banerjee, Zhi-Quan Luo | In this paper, we propose a parallel randomized block coordinate method named Parallel Direction Method of Multipliers (PDMM) to solve the optimization problems with multi-block linear constraints. |

22 | (Almost) No Label No Cry | Giorgio Patrini, Richard Nock, Paul Rivera, Tiberio Caetano | We provide a fast learning algorithm that estimates the mean operator via a manifold regularizer with guaranteed approximation bounds. |

23 | Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards | Omar Besbes, Yonatan Gur, Assaf Zeevi | In this paper, we focus on a MAB formulation which allows for a broad range of temporal uncertainties in the rewards, while still maintaining mathematical tractability. |

24 | Object Localization based on Structural SVM using Privileged Information | Jan Feyereisl, Suha Kwak, Jeany Son, Bohyung Han | We propose a structured prediction algorithm for object localization based on Support Vector Machines (SVMs) using privileged information. |

25 | Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations | Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang | To take into account this instinct, this paper proposes a novel deep neural net, named multi-view perceptron (MVP), which can untangle the identity and view features, and infer a full spectrum of multi-view images in the meanwhile, given a single 2D face image. |

26 | Shape and Illumination from Shading using the Generic Viewpoint Assumption | Daniel Zoran, Dilip Krishnan, José Bento, Bill Freeman | In this paper, we show how to utilize the GVA to estimate shape and illumination from a single shading image, without the use of other priors. |

27 | Parallel Sampling of HDPs using Sub-Cluster Splits | Jason Chang, John W. Fisher III | We develop a sampling technique for Hierarchical Dirichlet process models. |

28 | From MAP to Marginals: Variational Inference in Bayesian Submodular Models | Josip Djolonga, Andreas Krause | In particular, we present L-Field, a variational approach to general log-submodular and log-supermodular distributions based on sub- and supergradients. |

29 | Robust Logistic Regression and Classification | Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan | We propose a new robust logistic regression algorithm, called RoLR, that estimates the parameter through a simple linear programming procedure. |

30 | Extracting Certainty from Uncertainty: Transductive Pairwise Classification from Pairwise Similarities | Tianbao Yang, Rong Jin | In this work, we study the problem of transductive pairwise classification from pairwise similarities (the pairwise similarities are usually derived from some side information instead of the underlying class labels). |

31 | A Unified Semantic Embedding: Relating Taxonomies and Attributes | Sung Ju Hwang, Leonid Sigal | We propose a method that learns a discriminative yet semantic space for object categorization, where we also embed auxiliary semantic entities such as supercategories and attributes. |

32 | Transportability from Multiple Environments with Limited Experiments: Completeness Results | Elias Bareinboim, Judea Pearl | This paper addresses the problem of $mz$-transportability, that is, transferring causal knowledge collected in several heterogeneous domains to a target domain in which only passive observations and limited experimental data can be collected. |

33 | Augmentative Message Passing for Traveling Salesman Problem and Graph Partitioning | Siamak Ravanbakhsh, Reihaneh Rabbany, Russell Greiner | For graph-partitioning (a.k.a. community mining) using modularity optimization, we introduce a binary variable model with a large number of constraints that enforce formation of cliques. |

34 | Causal Inference through a Witness Protection Program | Ricardo Silva, Robin Evans | We introduce a novel approach for estimating causal effects that exploits observational conditional independencies to suggest “weak” paths in an unknown causal graph. |

35 | Incremental Clustering: The Case for Extra Clusters | Margareta Ackerman, Sanjoy Dasgupta | In this paper, we initiate the formal analysis of incremental clustering methods focusing on the types of cluster structure that they are able to detect. |

36 | Multi-scale Graphical Models for Spatio-Temporal Processes | Firdaus Janoos, Huseyin Denli, Niranjan Subrahmanya | In this paper, we present a hierarchical graphical model with physically derived priors that better represents the multi-scale character of these dynamical systems. |

37 | Iterative Neural Autoregressive Distribution Estimator NADE-k | Tapani Raiko, Yao Li, Kyunghyun Cho, Yoshua Bengio | We propose a new model that extends this inference scheme to multiple steps, arguing that it is easier to learn to improve a reconstruction in $k$ steps rather than to learn to reconstruct in a single inference step. |

38 | Sparse PCA via Covariance Thresholding | Yash Deshpande, Andrea Montanari | In an influential paper, Johnstone and Lu (2004) introduced a simple algorithm that estimates the support of the principal vectors $\mathbf{v}_1,\dots,\mathbf{v}_r$ by the largest entries in the diagonal of the empirical covariance. |

39 | Low-dimensional models of neural population activity in sensory cortical circuits | Evan W. Archer, Urs Koster, Jonathan W. Pillow, Jakob H. Macke | We introduce a fast estimation method using online expectation maximization with Laplace approximations. |

40 | A Synaptical Story of Persistent Activity with Graded Lifetime in a Neural System | Yuanyuan Mi, Luozheng Li, Dahui Wang, Si Wu | We propose a simple yet effective mechanism to achieve this goal, which utilizes the property of short-term plasticity (STP) of neuronal synapses. |

41 | A Representation Theory for Ranking Functions | Harsh H. Pareek, Pradeep K. Ravikumar | This paper presents a representation theory for permutation-valued functions, which in their general form can also be called listwise ranking functions. |

42 | Near-optimal sample compression for nearest neighbors | Lee-Ad Gottlieb, Aryeh Kontorovich, Pinhas Nisnevitch | We present the first sample compression algorithm for nearest neighbors with non-trivial performance guarantees. |

43 | Combinatorial Pure Exploration of Multi-Armed Bandits | Shouyuan Chen, Tian Lin, Irwin King, Michael R. Lyu, Wei Chen | We present general learning algorithms which work for all decision classes that admit offline maximization oracles in both fixed confidence and fixed budget settings. |

44 | Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces | Minh Ha Quang, Marco San Biagio, Vittorio Murino | This paper introduces a novel mathematical and computational framework, namely the *Log-Hilbert-Schmidt metric* between positive definite operators on a Hilbert space. |

45 | Consistency of Spectral Partitioning of Uniform Hypergraphs under Planted Partition Model | Debarghya Ghoshdastidar, Ambedkar Dukkipati | In this paper, we view these models as random m-uniform hypergraphs and establish the consistency of spectral algorithm in this general setting. |

46 | Spectral Clustering of graphs with the Bethe Hessian | Alaa Saade, Florent Krzakala, Lenka Zdeborová | Here, we propose to use instead a simpler object, a symmetric real matrix known as the Bethe Hessian operator, or deformed Laplacian. |

47 | Fast and Robust Least Squares Estimation in Corrupted Linear Models | Brian McWilliams, Gabriel Krummenacher, Mario Lucic, Joachim M. Buhmann | The concept of influence that was developed for regression diagnostics can be used to detect such corrupted observations as shown in this paper. |

48 | Local Decorrelation For Improved Pedestrian Detection | Woonhyun Nam, Piotr Dollar, Joon Hee Han | Inspired by recent work on discriminative decorrelation of HOG features, we instead propose an efficient feature transform that removes correlations in local neighborhoods. |

49 | Robust Kernel Density Estimation by Scaling and Projection in Hilbert Space | Robert A. Vandermeulen, Clayton Scott | We present a robust version of the popular kernel density estimator (KDE). |

50 | Beyond Disagreement-Based Agnostic Active Learning | Chicheng Zhang, Kamalika Chaudhuri | In this paper, we provide such an algorithm. |

51 | Bayes-Adaptive Simulation-based Search with Value Function Approximation | Arthur Guez, Nicolas Heess, David Silver, Peter Dayan | We present a tractable method for approximating the Bayes-adaptive solution by combining simulation-based search with a novel value function approximation technique that generalises over belief space. |

52 | A State-Space Model for Decoding Auditory Attentional Modulation from MEG in a Competing-Speaker Environment | Sahar Akram, Jonathan Z. Simon, Shihab A. Shamma, Behtash Babadi | We present simulation studies as well as application to real MEG data from two human subjects. |

53 | Active Regression by Stratification | Sivan Sabato, Remi Munos | We propose a new active learning algorithm for parametric linear regression with random design. |

54 | Sensory Integration and Density Estimation | Joseph G. Makin, Philip N. Sabes | We prove here an analytical connection between these seemingly different tasks, density estimation and sensory integration; that the former implies the latter for the model used in [2]; but that this does not appear to be true for all models. |

55 | Learning Deep Features for Scene Recognition using Places Database | Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, Aude Oliva | Here, we introduce a new scene-centric database called Places with over 7 million labeled pictures of scenes. We propose new methods to compare the density and diversity of image datasets and show that Places is as dense as other scene datasets and has more diversity. |

56 | A Complete Variational Tracker | Ryan D. Turner, Steven Bottone, Bhargav Avasarala | We introduce a novel probabilistic tracking algorithm that incorporates combinatorial data association constraints and model-based track management using variational Bayes. |

57 | Spike Frequency Adaptation Implements Anticipative Tracking in Continuous Attractor Neural Networks | Yuanyuan Mi, C. C. Alan Fung, K. Y. Michael Wong, Si Wu | Here we propose a simple yet effective mechanism to implement anticipative tracking in neural systems. |

58 | Efficient Sampling for Learning Sparse Additive Models in High Dimensions | Hemant Tyagi, Bernd Gärtner, Andreas Krause | Assuming the $\phi_l$’s to be smooth, we propose a set of points at which to sample $f$ and an efficient randomized algorithm that recovers a *uniform approximation* to each unknown $\phi_l$. |

59 | Deep Joint Task Learning for Generic Object Extraction | Xiaolong Wang, Liliang Zhang, Liang Lin, Zhujin Liang, Wangmeng Zuo | We present a general joint task learning framework, in which each task (either object localization or object segmentation) is tackled via a multi-layer convolutional neural network, and the two networks work collaboratively to boost performance. |

60 | Robust Bayesian Max-Margin Clustering | Changyou Chen, Jun Zhu, Xinhua Zhang | We present max-margin Bayesian clustering (BMC), a general and robust framework that incorporates the max-margin criterion into Bayesian clustering models, as well as two concrete models of BMC to demonstrate its flexibility and effectiveness in dealing with different clustering tasks. |

61 | Permutation Diffusion Maps (PDM) with Application to the Image Association Problem in Computer Vision | Deepti Pachauri, Risi Kondor, Gautam Sargur, Vikas Singh | In this paper we introduce a new method, Permutations Diffusion Maps (PDM), to solve the matching problem, as well as a related new affinity measure, derived using ideas from harmonic analysis on the symmetric group. |

62 | Bounded Regret for Finite-Armed Structured Bandits | Tor Lattimore, Remi Munos | We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. |

63 | Coresets for k-Segmentation of Streaming Data | Guy Rosman, Mikhail Volkov, Dan Feldman, John W. Fisher III, Daniela Rus | We consider the problem of computing optimal segmentation of such signals by k-piecewise linear function, using only one pass over the data by maintaining a coreset for the signal. |

64 | Two-Stream Convolutional Networks for Action Recognition in Videos | Karen Simonyan, Andrew Zisserman | Our contribution is three-fold. |

65 | Discovering Structure in High-Dimensional Data Through Correlation Explanation | Greg Ver Steeg, Aram Galstyan | We introduce a method to learn a hierarchy of successively more abstract representations of complex data based on optimizing an information-theoretic objective. |

66 | Positive Curvature and Hamiltonian Monte Carlo | Christof Seiler, Simon Rubinstein-Salzedo, Susan Holmes | We show that positive curvature can be used to prove theoretical concentration results for HMC Markov chains. |

67 | Learning Mixed Multinomial Logit Model from Ordinal Data | Sewoong Oh, Devavrat Shah | To that end, we present a sufficient condition as well as an efficient algorithm for learning mixed MNL models from partial preferences/comparisons data. |

68 | Near-optimal Reinforcement Learning in Factored MDPs | Ian Osband, Benjamin Van Roy | We provide two algorithms that satisfy near-optimal regret bounds in this context: posterior sampling reinforcement learning (PSRL) and an upper confidence bound algorithm (UCRL-Factored). |

69 | Efficient learning by implicit exploration in bandit problems with side observations | Tomáš Kocák, Gergely Neu, Michal Valko, Remi Munos | For this setting, we propose the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions. |

70 | Repeated Contextual Auctions with Strategic Buyers | Kareem Amin, Afshin Rostamizadeh, Umar Syed | Motivated by real-time advertising exchanges, we analyze the problem of pricing inventory in a repeated posted-price auction. |

71 | Recursive Inversion Models for Permutations | Christopher Meek, Marina Meila | We describe how one can do parameter estimation and propose an approach to structure search for this class of models. |

72 | On the Convergence Rate of Decomposable Submodular Function Minimization | Robert Nishihara, Stefanie Jegelka, Michael I. Jordan | In this paper, we show that the algorithm converges linearly, and we provide upper and lower bounds on the rate of convergence. |

73 | New Rules for Domain Independent Lifted MAP Inference | Happy Mittal, Prasoon Goyal, Vibhav G. Gogate, Parag Singla | In this paper, we present two new lifting rules, which enable fast MAP inference in a large class of MLNs. |

74 | PAC-Bayesian AUC classification and scoring | James Ridgway, Pierre Alquier, Nicolas Chopin, Feng Liang | We derive in particular a Sequential Monte Carlo algorithm, as an efficient method which may be used as a gold standard, and an Expectation-Propagation algorithm, as a much faster but approximate method. |

75 | Optimization Methods for Sparse Pseudo-Likelihood Graphical Model Selection | Sang Oh, Onkar Dalal, Kshitij Khare, Bala Rajaratnam | In this paper, we address this crucial gap by proposing two proximal gradient methods (CONCORD-ISTA and CONCORD-FISTA) for performing $\ell_1$-regularized inverse covariance matrix estimation in the pseudo-likelihood framework. |

76 | On Prior Distributions and Approximate Inference for Structured Variables | Oluwasanmi O. Koyejo, Rajiv Khanna, Joydeep Ghosh, Russell Poldrack | We present a general framework for constructing prior distributions with structured variables. |

77 | On Iterative Hard Thresholding Methods for High-dimensional M-Estimation | Prateek Jain, Ambuj Tewari, Purushottam Kar | In this work we bridge this gap by providing the first analysis for IHT-style methods in the high dimensional statistical setting. |

78 | Online and Stochastic Gradient Methods for Non-decomposable Loss Functions | Purushottam Kar, Harikrishna Narasimhan, Prateek Jain | In this work we initiate a study of online learning techniques for such non-decomposable loss functions with an aim to enable incremental learning as well as design scalable solvers for batch problems. |

79 | Analysis of Learning from Positive and Unlabeled Data | Marthinus C. du Plessis, Gang Niu, Masashi Sugiyama | Analysis of Learning from Positive and Unlabeled Data |

80 | Dimensionality Reduction with Subspace Structure Preservation | Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju | In this paper, we propose a novel dimensionality reduction algorithm that theoretically preserves this structure for a given dataset. |

81 | Constrained convex minimization via model-based excessive gap | Quoc Tran-Dinh, Volkan Cevher | We introduce a model-based excessive gap technique to analyze first-order primal-dual methods for constrained convex minimization. |

82 | Poisson Process Jumping between an Unknown Number of Rates: Application to Neural Spike Data | Florian Stimberg, Andreas Ruttor, Manfred Opper | We introduce a model where the rate of an inhomogeneous Poisson process is modified by a Chinese restaurant process. |

83 | Probabilistic ODE Solvers with Runge-Kutta Means | Michael Schober, David K. Duvenaud, Philipp Hennig | We construct a family of probabilistic numerical methods that instead return a Gauss-Markov process defining a probability distribution over the ODE solution. |

84 | Optimal decision-making with time-varying evidence reliability | Jan Drugowitsch, Ruben Moreno-Bote, Alexandre Pouget | The work presented here describes the computation and characterization of optimal decision-making in the more realistic case of an evidence reliability that varies across time even within a trial. |

85 | Learning Shuffle Ideals Under Restricted Distributions | Dongqu Chen | In the empirical direction, we propose a heuristic algorithm for learning shuffle ideals from given labeled strings under general unrestricted distributions. |

86 | Discriminative Unsupervised Feature Learning with Convolutional Neural Networks | Alexey Dosovitskiy, Jost Tobias Springenberg, Martin Riedmiller, Thomas Brox | In this paper we present an approach for training a convolutional neural network using only unlabeled data. |

87 | Distance-Based Network Recovery under Feature Correlation | David Adametz, Volker Roth | We present an inference method for Gaussian graphical models when only pairwise distances of n objects are observed. |

88 | Bandit Convex Optimization: Towards Tight Bounds | Elad Hazan, Kfir Levy | In this paper we take a step towards understanding the best attainable regret bounds for BCO: we give an efficient and near-optimal regret algorithm for BCO with strongly-convex and smooth loss functions. |

89 | Projective dictionary pair learning for pattern classification | Shuhang Gu, Lei Zhang, Wangmeng Zuo, Xiangchu Feng | We propose a new discriminative DL framework, namely projective dictionary pair learning (DPL), which learns a synthesis dictionary and an analysis dictionary jointly to achieve the goal of signal representation and discrimination. |

90 | Provable Submodular Minimization using Wolfe's Algorithm | Deeparnab Chakrabarty, Prateek Jain, Pravesh Kothari | In this paper we give a maiden convergence analysis of Wolfe’s algorithm. |

91 | Exploiting easy data in online optimization | Amir Sani, Gergely Neu, Alessandro Lazaric | In this paper we introduce a general algorithm that, provided with a safe learning algorithm and an opportunistic benchmark, can effectively combine good worst-case guarantees with much improved performance on easy data. |

92 | Sparse Multi-Task Reinforcement Learning | Daniele Calandriello, Alessandro Lazaric, Marcello Restelli | In this paper we investigate the case when all the tasks can be accurately represented in a linear approximation space using the same small subset of the original (large) set of features. |

93 | Best-Arm Identification in Linear Bandits | Marta Soare, Alessandro Lazaric, Remi Munos | We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. |

94 | Mind the Nuisance: Gaussian Process Classification using Privileged Noise | Daniel Hernández-Lobato, Viktoriia Sharmanska, Kristian Kersting, Christoph H. Lampert, Novi Quadrianto | Here, we show that privileged information can naturally be treated as noise in the latent function of a Gaussian process classifier (GPC). |

95 | Tight Bounds for Influence in Diffusion Networks and Application to Bond Percolation and Epidemiology | Remi Lemonnier, Kevin Scaman, Nicolas Vayatis | In this paper, we derive theoretical bounds for the long-term influence of a node in an Independent Cascade Model (ICM). |

96 | On the Computational Efficiency of Training Neural Networks | Roi Livni, Shai Shalev-Shwartz, Ohad Shamir | In this paper we revisit the computational complexity of training neural networks from a modern perspective. |

97 | Self-Adaptable Templates for Feature Coding | Xavier Boix, Gemma Roig, Salomon Diether, Luc V. Gool | In this paper, we analyze O2P as a coding-pooling scheme. |

98 | Multilabel Structured Output Learning with Random Spanning Trees of Max-Margin Markov Networks | Mario Marchand, Hongyu Su, Emilie Morvant, Juho Rousu, John S. Shawe-Taylor | We show that the usual score function for conditional Markov networks can be written as the expectation over the scores of their spanning trees. |

99 | Stochastic Network Design in Bidirected Trees | Xiaojian Wu, Daniel R. Sheldon, Shlomo Zilberstein | We investigate the problem of stochastic network design in bidirected trees. |

100 | Learning convolution filters for inverse covariance estimation of neural network connectivity | George Mohler | In this paper we show how inverse covariance estimation can be dramatically improved using a simple convolution filter prior to applying sample covariance. |

101 | SerialRank: Spectral Ranking using Seriation | Fajwel Fogel, Alexandre d’Aspremont, Milan Vojnovic | We describe a seriation algorithm for ranking a set of n items given pairwise comparisons between these items. |

102 | Clamping Variables and Approximate Inference | Adrian Weller, Tony Jebara | For an attractive model, we show that summing over the Bethe partition functions for each sub-model obtained after clamping any variable can only raise (and hence improve) the approximation. |

103 | Predictive Entropy Search for Efficient Global Optimization of Black-box Functions | José Miguel Hernández-Lobato, Matthew W. Hoffman, Zoubin Ghahramani | We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). |

104 | A Block-Coordinate Descent Approach for Large-scale Sparse Inverse Covariance Estimation | Eran Treister, Javier S. Turek | In this paper we present a new block-coordinate descent approach for solving the problem for large-scale data sets. |

105 | Efficient Inference of Continuous Markov Random Fields with Polynomial Potentials | Shenlong Wang, Alex Schwing, Raquel Urtasun | In this paper, we prove that every multivariate polynomial with even degree can be decomposed into a sum of convex and concave polynomials. |

106 | Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices | Austin R. Benson, Jason D. Lee, Bartek Rajwa, David F. Gleich | In this paper, we show how to make these algorithms scalable for data matrices that have many more rows than columns, so-called “tall-and-skinny matrices.” |

107 | Inferring synaptic conductances from spike trains with a biophysically inspired point process model | Kenneth W. Latimer, E.J. Chichilnisky, Fred Rieke, Jonathan W. Pillow | Here we introduce a biophysically inspired point process model that explicitly incorporates stimulus-induced changes in synaptic conductance in a dynamical model of neuronal membrane potential. |

108 | Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights | Daniel Soudry, Itay Hubara, Ron Meir | Multilayer Neural Networks (MNNs) are commonly trained using gradient descent-based methods, such as BackPropagation (BP). |

109 | Incremental Local Gaussian Regression | Franziska Meier, Philipp Hennig, Stefan Schaal | In this paper, we suggest a path from Gaussian (process) regression to locally weighted regression, where we retain the best of both approaches. |

110 | General Table Completion using a Bayesian Nonparametric Model | Isabel Valera, Zoubin Ghahramani | In this paper, we provide an efficient and robust table completion tool, based on a Bayesian nonparametric latent feature model. |

111 | Universal Option Models | Hengshuai Yao, Csaba Szepesvari, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar | We introduce a new model for an option that is independent of any reward function, called the *universal option model (UOM)*. |

112 | Approximating Hierarchical MV-sets for Hierarchical Clustering | Assaf Glazer, Omer Weissbrod, Michael Lindenbaum, Shaul Markovitch | We present empirical results that demonstrate the superiority of our method over existing ones. |

113 | Constant Nullspace Strong Convexity and Fast Convergence of Proximal Methods under High-Dimensional Settings | Ian En-Hsu Yen, Cho-Jui Hsieh, Pradeep K. Ravikumar, Inderjit S. Dhillon | In this paper, we investigate a novel variant of strong convexity, which we call Constant Nullspace Strong Convexity (CNSC), where we require that the objective function be strongly convex only over a constant subspace. |

114 | Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm | Deanna Needell, Rachel Ward, Nati Srebro | Our results are based on a connection we make between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods. |

115 | A Framework for Testing Identifiability of Bayesian Models of Perception | Luigi Acerbi, Wei Ji Ma, Sethu Vijayakumar | We propose a novel framework for a systematic testing of the identifiability of a significant class of Bayesian observer models, with practical applications for improving experimental design. |

116 | Optimistic Planning in Markov Decision Processes Using a Generative Model | Balázs Szörényi, Gunnar Kedenburg, Remi Munos | We consider the problem of online planning in a Markov decision process with discounted rewards for any given initial state. |

117 | Gaussian Process Volatility Model | Yue Wu, José Miguel Hernández-Lobato, Zoubin Ghahramani | To address these problems we introduce GP-Vol, a novel non-parametric model for time-changing variances based on Gaussian Processes. |

118 | A Safe Screening Rule for Sparse Logistic Regression | Jie Wang, Jiayu Zhou, Jun Liu, Peter Wonka, Jieping Ye | In this paper, we present a fast and effective sparse logistic regression screening rule (Slores) to identify the zero components in the solution vector, which may lead to a substantial reduction in the number of features to be entered to the optimization. |

119 | Hardness of parameter estimation in graphical models | Guy Bresler, David Gamarnik, Devavrat Shah | The goal of this paper is to investigate the computational feasibility of this statistical task. |

120 | Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics | Sergey Levine, Pieter Abbeel | We present a policy search method that uses iteratively refitted local linear models to optimize trajectory distributions for large, continuous problems. |

121 | Magnitude-sensitive preference formation | Nisheeth Srivastava, Ed Vul, Paul R. Schrater | In this paper, we develop a theory of magnitude-sensitive preference learning that permits an agent to rationally infer its preferences for items compared with money options of different magnitudes. |

122 | Extreme bandits | Alexandra Carpentier, Michal Valko | In this paper, we study an efficient way to allocate these resources sequentially under limited feedback. |

123 | Distributed Estimation, Information Loss and Exponential Families | Qiang Liu, Alexander T. Ihler | We study a simple communication-efficient learning framework that first calculates the local maximum likelihood estimates (MLE) based on the data subsets, and then combines the local MLEs to achieve the best possible approximation to the global MLE, based on the whole dataset jointly. |

124 | Non-convex Robust PCA | Praneeth Netrapalli, Niranjan U N, Sujay Sanghavi, Animashree Anandkumar, Prateek Jain | We propose a new provable method for robust PCA, where the task is to recover a low-rank matrix, which is corrupted with sparse perturbations. |

125 | Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning | Francesco Orabona | In this paper, we propose a new kernel-based stochastic gradient descent algorithm that performs model selection while training, with no parameters to tune, nor any form of cross-validation. |

126 | Learning From Weakly Supervised Data by The Expectation Loss SVM (e-SVM) algorithm | Jun Zhu, Junhua Mao, Alan L. Yuille | We propose a novel learning algorithm called expectation loss SVM (e-SVM) that is devoted to the problems where only the “positiveness” instead of a binary label of each training sample is available. |

127 | Message Passing Inference for Large Scale Graphical Models with High Order Potentials | Jian Zhang, Alex Schwing, Raquel Urtasun | In this paper we propose a partitioning strategy followed by a message passing algorithm which is able to exploit pre-computations. |

128 | Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors | Lingqiao Liu, Chunhua Shen, Lei Wang, Anton van den Hengel, Chao Wang | In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. |

129 | Dependent nonparametric trees for dynamic hierarchical clustering | Kumar Avinava Dubey, Qirong Ho, Sinead A. Williamson, Eric P. Xing | In this paper, we present a distribution over collections of time-dependent, infinite-dimensional trees that can be used to model evolving hierarchies, and present an efficient and scalable algorithm for performing approximate inference in such a model. |

130 | Causal Strategic Inference in Networked Microfinance Economies | Mohammad T. Irfan, Luis E. Ortiz | We propose causal strategic inference as a framework for conducting interventions and apply it to large, networked microfinance economies. |

131 | Learning Multiple Tasks in Parallel with a Shared Annotator | Haim Cohen, Koby Crammer | We introduce a new multi-task framework, in which $K$ online learners are sharing a single annotator with limited bandwidth. |

132 | Reducing the Rank in Relational Factorization Models by Including Observable Patterns | Maximilian Nickel, Xueyan Jiang, Volker Tresp | Based on our findings, we propose a novel additive tensor factorization model for learning from latent and observable patterns in multi-relational data and present a scalable algorithm for computing the factorization. |

133 | Clustering from Labels and Time-Varying Graphs | Shiau Hong Lim, Yudong Chen, Huan Xu | We present a general framework for graph clustering where a label is observed for each pair of nodes. |

134 | From Stochastic Mixability to Fast Rates | Nishant A. Mehta, Robert C. Williamson | The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of $(\ell,\mathcal{F}, \mathsf{P})$, and in so doing provides new insight into the fast-rates phenomenon. |

135 | Recovery of Coherent Data via Low-Rank Dictionary Pursuit | Guangcan Liu, Ping Li | The recently established RPCA method provides a convenient way to restore low-rank matrices from grossly corrupted observations. |

136 | Inferring sparse representations of continuous signals with continuous orthogonal matching pursuit | Karin C. Knudson, Jacob Yates, Alexander Huk, Jonathan W. Pillow | Here we present a fast method for recovering these identities, amplitudes, and translations. |

137 | Analysis of Variational Bayesian Latent Dirichlet Allocation: Weaker Sparsity Than MAP | Shinichi Nakajima, Issei Sato, Masashi Sugiyama, Kazuho Watanabe, Hiroko Kobayashi | In this paper, we theoretically investigate variational Bayesian (VB) learning in LDA. |

138 | Discovering, Learning and Exploiting Relevance | Cem Tekin, Mihaela Van Der Schaar | In this paper we consider the problem of learning online which information to consider when making sequential decisions. |

139 | Divide-and-Conquer Learning by Anchoring a Conical Hull | Tianyi Zhou, Jeff A. Bilmes, Carlos Guestrin | To find the $k$ anchors, we propose a novel divide-and-conquer learning scheme “DCA” that distributes the problem to $\mathcal O(k\log k)$ same-type sub-problems on different low-D random hyperplanes, each of which can be solved by any solver. |

140 | Extended and Unscented Gaussian Processes | Daniel M. Steinberg, Edwin V. Bonilla | We present two new methods for inference in Gaussian process (GP) models with general nonlinear likelihoods. |

141 | Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing | Yuchen Zhang, Xi Chen, Dengyong Zhou, Michael I. Jordan | In this paper, we propose a two-stage efficient algorithm for multi-class crowd labeling problems. |

142 | Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation | Emily L. Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus | We present techniques for speeding up the test-time evaluation of large convolutional networks, designed for object recognition tasks. |

143 | Learning to Discover Efficient Mathematical Identities | Wojciech Zaremba, Karol Kurach, Rob Fergus | In this paper we explore how machine learning techniques can be applied to the discovery of efficient mathematical identities. |

144 | The Large Margin Mechanism for Differentially Private Maximization | Kamalika Chaudhuri, Daniel J. Hsu, Shuang Song | This work provides the first general purpose, range-independent algorithm for private maximization that guarantees approximate differential privacy. |

145 | DFacTo: Distributed Factorization of Tensors | Joon Hee Choi, S. Vishwanathan | We present a technique for significantly speeding up Alternating Least Squares (ALS) and Gradient Descent (GD), two widely used algorithms for tensor factorization. |

146 | Localized Data Fusion for Kernel k-Means Clustering with Application to Cancer Biology | Mehmet Gönen, Adam A. Margolin | In this paper, we propose a novel multiple kernel learning algorithm that extends kernel k-means clustering to the multiview setting, which combines kernels calculated on the views in a localized way to better capture sample-specific characteristics of the data. |

147 | Conditional Swap Regret and Conditional Correlated Equilibrium | Mehryar Mohri, Scott Yang | We present algorithms for minimizing conditional swap regret with bounded conditioning history. |

148 | Mode Estimation for High Dimensional Discrete Tree Graphical Models | Chao Chen, Han Liu, Dimitris Metaxas, Tianqi Zhao | An efficient algorithm with provable theoretical guarantees is proposed and is applied to applications like data analysis and multiple predictions. |

149 | Large-scale L-BFGS using MapReduce | Weizhu Chen, Zhenghao Wang, Jingren Zhou | In this paper, we study the problem of parallelizing the L-BFGS algorithm in large clusters of tens of thousands of shared-nothing commodity machines. |

150 | Submodular Attribute Selection for Action Recognition in Video | Jingjing Zheng, Zhuolin Jiang, Rama Chellappa, Jonathon P. Phillips | In this work, we encode actions based on attributes that describe actions as high-level concepts, e.g., jump forward and motion in the air. |

151 | Efficient Structured Matrix Rank Minimization | Adams Wei Yu, Wanli Ma, Yaoliang Yu, Jaime Carbonell, Suvrit Sra | We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map. |

152 | On Integrated Clustering and Outlier Detection | Lionel Ott, Linsey Pang, Fabio T. Ramos, Sanjay Chawla | We model the joint clustering and outlier detection problem using an extension of the facility location formulation. |

153 | A Drifting-Games Analysis for Online Learning and Applications to Boosting | Haipeng Luo, Robert E. Schapire | We provide a general mechanism to design online learning algorithms based on a minimax analysis within a drifting-games framework. |

154 | Projecting Markov Random Field Parameters for Fast Mixing | Xianghang Liu, Justin Domke | This paper gives sufficient conditions to guarantee that univariate Gibbs sampling on Markov Random Fields (MRFs) will be fast mixing, in a precise sense. |

155 | Automatic Discovery of Cognitive Skills to Improve the Prediction of Student Learning | Robert V. Lindsey, Mohammad Khajah, Michael C. Mozer | We propose a technique that uses student performance data to automatically discover the skills needed in a discipline. |

156 | Near-Optimal-Sample Estimators for Spherical Gaussian Mixtures | Ananda Theertha Suresh, Alon Orlitsky, Jayadev Acharya, Ashkan Jafarpour | Conversely, we show that any estimator requires $\Omega\bigl({dk}/{\epsilon^2}\bigr)$ samples, hence the algorithm’s sample complexity is nearly optimal in the dimension. |

157 | Automated Variational Inference for Gaussian Process Models | Trung V. Nguyen, Edwin V. Bonilla | We develop an automated variational method for approximate inference in Gaussian process (GP) models whose posteriors are often intractable. |

158 | Learning Mixtures of Submodular Functions for Image Collection Summarization | Sebastian Tschiatschek, Rishabh K. Iyer, Haochen Wei, Jeff A. Bilmes | We address the problem of image collection summarization by learning mixtures of submodular functions. We provide a new dataset consisting of 14 real-world image collections along with many human-generated ground truth summaries collected using Mechanical Turk. |

159 | Robust Tensor Decomposition with Gross Corruption | Quanquan Gu, Huan Gui, Jiawei Han | In this paper, we study the statistical performance of robust tensor decomposition with gross corruption. |

160 | Provable Tensor Factorization with Missing Data | Prateek Jain, Sewoong Oh | We propose a novel alternating minimization based method which iteratively refines estimates of the singular vectors. |

161 | Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization | Meisam Razaviyayn, Mingyi Hong, Zhi-Quan Luo, Jong-Shi Pang | In this work, we propose an inexact parallel BCD approach where at each iteration, a subset of the variables is updated in parallel by minimizing convex approximations of the original objective function. |

162 | Using Convolutional Neural Networks to Recognize Rhythm Stimuli from Electroencephalography Recordings | Sebastian Stober, Daniel J. Cameron, Jessica A. Grahn | We investigate the impact of the data representation and the pre-processing steps for this classification task and compare different network structures. |

163 | Blossom Tree Graphical Models | Zhe Liu, John Lafferty | We combine the ideas behind trees and Gaussian graphical models to form a new nonparametric family of graphical models. |

164 | Model-based Reinforcement Learning and the Eluder Dimension | Ian Osband, Benjamin Van Roy | We consider the problem of learning to optimize an unknown Markov decision process (MDP). |

165 | Minimax-optimal Inference from Partial Rankings | Bruce Hajek, Sewoong Oh, Jiaming Xu | This paper studies the problem of rank aggregation under the Plackett-Luce model. |

166 | Spectral Methods for Indian Buffet Process Inference | Hsiao-Yu Tung, Alexander J. Smola | We provide an efficient spectral algorithm as an alternative to costly Variational Bayes and sampling-based algorithms. |

167 | On the Statistical Consistency of Plug-in Classifiers for Non-decomposable Performance Measures | Harikrishna Narasimhan, Rohit Vaish, Shivani Agarwal | In this work, we consider plug-in algorithms that learn a classifier by applying an empirically determined threshold to a suitable ‘estimate’ of the class probability, and provide a general methodology to show consistency of these methods for any non-decomposable measure that can be expressed as a continuous function of true positive rate (TPR) and true negative rate (TNR), and for which the Bayes optimal classifier is the class probability function thresholded suitably. |

168 | Top Rank Optimization in Linear Time | Nan Li, Rong Jin, Zhi-Hua Zhou | We propose a highly efficient approach, titled TopPush, for optimizing accuracy at the top that has computational complexity linear in the number of training instances. |

169 | Spectral Methods for Supervised Topic Models | Yining Wang, Jun Zhu | This paper presents a novel spectral decomposition algorithm to recover the parameters of supervised latent Dirichlet allocation (sLDA) models. |

170 | Graphical Models for Recovering Probabilistic and Causal Queries from Missing Data | Karthika Mohan, Judea Pearl | We address the problem of deciding whether a causal or probabilistic query is estimable from data corrupted by missing entries, given a model of missingness process. |

171 | Sparse PCA with Oracle Property | Quanquan Gu, Zhaoran Wang, Han Liu | In this paper, we study the estimation of the $k$-dimensional sparse principal subspace of covariance matrix $\Sigma$ in the high-dimensional setting. |

172 | Unsupervised Transcription of Piano Music | Taylor Berg-Kirkpatrick, Jacob Andreas, Dan Klein | We present a new probabilistic model for transcribing piano music from audio to a symbolic form. |

173 | Decoupled Variational Gaussian Inference | Mohammad E. Khan | We propose decoupled variational inference that brings the best of both worlds together. |

174 | Estimation with Norm Regularization | Arindam Banerjee, Sheng Chen, Farideh Fazayeli, Vidyashankar Sivakumar | We characterize the restricted error set, establish relations between error sets for the constrained and regularized problems, and present an estimation error bound applicable to any norm. |

175 | Decomposing Parameter Estimation Problems | Khaled S. Refaat, Arthur Choi, Adnan Darwiche | We propose a technique for decomposing the parameter learning problem in Bayesian networks into independent learning problems. |

176 | Stochastic Proximal Gradient Descent with Acceleration Techniques | Atsushi Nitanda | In this paper, we propose and analyze an accelerated variant of these methods in the mini-batch setting. |

177 | Learning to Optimize via Information-Directed Sampling | Daniel Russo, Benjamin Van Roy | We propose information-directed sampling — a new algorithm for online optimization problems in which a decision-maker must balance between exploration and exploitation while learning from partial feedback. |

178 | Covariance shrinkage for autocorrelated data | Daniel Bartz, Klaus-Robert Müller | We contribute in this work by showing that the Sancetta estimator, while being consistent in the high-dimensional limit, suffers from a high bias in finite sample sizes. |

179 | Do Convnets Learn Correspondence? | Jonathan L. Long, Ning Zhang, Trevor Darrell | In this paper, we study the effectiveness of convnet activation features for tasks requiring correspondence. |

180 | The Blinded Bandit: Learning with Adaptive Feedback | Ofer Dekel, Elad Hazan, Tomer Koren | We develop efficient online learning algorithms for this problem and prove that they guarantee the same asymptotic regret as the optimal algorithms for the standard multi-armed bandit problem. |

181 | Convex Optimization Procedure for Clustering: Theoretical Revisit | Changbo Zhu, Huan Xu, Chenlei Leng, Shuicheng Yan | In this paper, we present a theoretical analysis of SON, a convex optimization procedure for clustering that uses a recently proposed sum-of-norms (SON) regularization. |

182 | Sparse Bayesian structure learning with “dependent relevance determination” priors | Anqi Wu, Mijung Park, Oluwasanmi O. Koyejo, Jonathan W. Pillow | Here we introduce a hierarchical model for smooth, region-sparse weight vectors and tensors in a linear regression setting. |

183 | Weakly-supervised Discovery of Visual Pattern Configurations | Hyun Oh Song, Yong Jae Lee, Stefanie Jegelka, Trevor Darrell | We propose an approach that automatically identifies discriminative configurations of visual patterns that are characteristic of a given object class. |

184 | SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives | Aaron Defazio, Francis Bach, Simon Lacoste-Julien | In this work we introduce a new fast incremental gradient method SAGA, in the spirit of SAG, SDCA, MISO and SVRG. |

185 | Exclusive Feature Learning on Arbitrary Structures via $\ell_{1,2}$-norm | Deguang Kong, Ryohei Fujimaki, Ji Liu, Feiping Nie, Chris Ding | In this paper, we propose a new formulation called “exclusive group lasso”, which brings out sparsity at intra-group level in the context of feature selection. |

186 | Time–Data Tradeoffs by Aggressive Smoothing | John J. Bruer, Joel A. Tropp, Volkan Cevher, Stephen Becker | This paper proposes a tradeoff between sample complexity and computation time that applies to statistical estimators based on convex optimization. |

187 | Distributed Power-law Graph Computing: Theoretical and Empirical Analysis | Cong Xie, Ling Yan, Wu-Jun Li, Zhihua Zhang | In this paper, we propose a novel vertex-cut method, called degree-based hashing (DBH), for graph partitioning (GP). |

188 | A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input | Mateusz Malinowski, Mario Fritz | We propose a method for automatically answering questions about images by bringing together recent advances from natural language processing and computer vision. |

189 | Efficient Partial Monitoring with Prior Information | Hastagiri P. Vanchinathan, Gábor Bartók, Andreas Krause | In this paper, we assume that we are given some prior information about the distribution based on which the opponent generates the outcomes. |

190 | Distributed Parameter Estimation in Probabilistic Graphical Models | Yariv D. Mizrahi, Misha Denil, Nando de Freitas | This paper presents foundational theoretical results on distributed parameter estimation for undirected probabilistic graphical models. |

191 | Unsupervised Deep Haar Scattering on Graphs | Xu Chen, Xiuyuan Cheng, Stephane Mallat | We introduce a Haar scattering transform on graphs, which computes invariant signal descriptors. |

192 | Online Optimization for Max-Norm Regularization | Jie Shen, Huan Xu, Ping Li | In this paper, we propose an online algorithm for solving max-norm regularized problems that is scalable to large problems. |

193 | Probabilistic low-rank matrix completion on finite alphabets | Jean Lafond, Olga Klopp, Eric Moulines, Joseph Salmon | Most works have focused on recovering an unknown real-valued low-rank matrix from randomly sub-sampling its entries. |

194 | Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations | Xianjie Chen, Alan L. Yuille | We present a method for estimating articulated human pose from a single static image based on a graphical model with novel pairwise relations that make adaptive use of local image measurements. |

195 | Bayesian Inference for Structured Spike and Slab Priors | Michael R. Andersen, Ole Winther, Lars K. Hansen | We propose a novel prior formulation, the structured spike and slab prior, which allows a priori knowledge of the sparsity pattern to be incorporated by imposing a spatial Gaussian process on the spike and slab probabilities. |

196 | Bayesian Nonlinear Support Vector Machines and Discriminative Factor Modeling | Ricardo Henao, Xin Yuan, Lawrence Carin | Bayesian Nonlinear Support Vector Machines and Discriminative Factor Modeling |

197 | Generalized Higher-Order Orthogonal Iteration for Tensor Decomposition and Completion | Yuanyuan Liu, Fanhua Shang, Wei Fan, James Cheng, Hong Cheng | To address this difficulty, we therefore propose an efficient and scalable core tensor Schatten 1-norm minimization method for simultaneous tensor decomposition and completion, with a much lower computational complexity. |

198 | Making Pairwise Binary Graphical Models Attractive | Nicholas Ruozzi, Tony Jebara | In this work, we propose a novel scheme that has better convergence properties than BP and provably provides better partition function estimates in many instances than TRBP. |

199 | Low Rank Approximation Lower Bounds in Row-Update Streams | David Woodruff | We study low-rank approximation in the streaming model in which the rows of an $n \times d$ matrix $A$ are presented one at a time in an arbitrary order. |

200 | Deep Convolutional Neural Network for Image Deconvolution | Li Xu, Jimmy SJ Ren, Ce Liu, Jiaya Jia | Instead of perfectly modeling outliers, which is rather challenging from a generative model perspective, we develop a deep convolutional neural network to capture the characteristics of degradation. |

201 | Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation | Jonathan J. Tompson, Arjun Jain, Yann LeCun, Christoph Bregler | This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. |

202 | Learning Generative Models with Visual Attention | Yichuan Tang, Nitish Srivastava, Ruslan R. Salakhutdinov | Inspired by the attention models in visual neuroscience and the need for object-centered data for generative models, we propose a deep-learning based generative framework using attention. |

203 | Metric Learning for Temporal Sequence Alignment | Damien Garreau, Rémi Lajugie, Sylvain Arlot, Francis Bach | In this paper, we propose to learn a Mahalanobis distance to perform alignment of multivariate time series. |

204 | Learning Optimal Commitment to Overcome Insecurity | Avrim Blum, Nika Haghtalab, Ariel D. Procaccia | We design an algorithm that optimizes the defender’s strategy with no prior information, by observing the attacker’s responses to randomized deployments of resources and learning his priorities. |

205 | “How hard is my MDP?” The distribution-norm to the rescue | Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor | This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) we call the distribution-norm. |

206 | Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms | Siu On Chan, Ilias Diakonikolas, Rocco A. Servedio, Xiaorui Sun | The main contribution of this paper is a highly efficient density estimation algorithm for learning using a variable-width histogram, i.e., a hypothesis distribution with a piecewise constant probability density function. |

207 | An Autoencoder Approach to Learning Bilingual Word Representations | Sarath Chandar A P, Stanislas Lauly, Hugo Larochelle, Mitesh Khapra, Balaraman Ravindran, Vikas C. Raykar, Amrita Saha | In this work we explore the use of autoencoder-based methods for cross-language learning of vectorial word representations that are aligned between two languages, while not relying on word-level alignments. |

208 | Sequential Monte Carlo for Graphical Models | Christian Andersson Naesseth, Fredrik Lindsten, Thomas B. Schön | We propose a new framework for how to use sequential Monte Carlo (SMC) algorithms for inference in probabilistic graphical models (PGM). |

209 | Optimal Regret Minimization in Posted-Price Auctions with Strategic Buyers | Mehryar Mohri, Andres Munoz | We analyze a very broad family of monotone regret minimization algorithms for this problem, which includes the previous best known algorithm, and show that no algorithm in that family admits a strategic regret more favorable than $\Omega(\sqrt{T})$. |

210 | Optimal prior-dependent neural population codes under shared input noise | Agnieszka Grabska-Barwinska, Jonathan W. Pillow | Here we analyze population coding under a simple alternative model in which “latent input noise” corrupts the stimulus before it is encoded by the population. |

211 | Deep Fragment Embeddings for Bidirectional Image Sentence Mapping | Andrej Karpathy, Armand Joulin, Li F. Fei-Fei | We introduce a model for bidirectional retrieval of images and sentences through a deep, multi-modal embedding of visual and natural language data. |

212 | Flexible Transfer Learning under Support and Model Shift | Xuezhi Wang, Jeff Schneider | In this paper, we consider a general case where both the support and the model change across domains. |

213 | Probabilistic Differential Dynamic Programming | Yunpeng Pan, Evangelos Theodorou | We present a data-driven, probabilistic trajectory optimization framework for systems with unknown dynamics, called Probabilistic Differential Dynamic Programming (PDDP). |

214 | Predicting Useful Neighborhoods for Lazy Local Learning | Aron Yu, Kristen Grauman | Rather than simply gather the test example’s nearest neighbors, we propose to predict the subset of training data that is jointly relevant to training its local model. |

215 | Modeling Deep Temporal Dependencies with Recurrent “Grammar Cells” | Vincent Michalski, Roland Memisevic, Kishore Konda | We propose modeling time series by representing the transformations that take a frame at time t to a frame at time t+1. |

216 | Generalized Dantzig Selector: Application to the k-support norm | Soumyadeep Chatterjee, Sheng Chen, Arindam Banerjee | We propose a Generalized Dantzig Selector (GDS) for linear models, in which any norm encoding the parameter structure can be leveraged for estimation. |

217 | Neurons as Monte Carlo Samplers: Bayesian Inference and Learning in Spiking Networks | Yanping Huang, Rajesh P. Rao | We propose a two-layer spiking network capable of performing approximate inference and learning for a hidden Markov model. |

218 | The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification | Been Kim, Cynthia Rudin, Julie A. Shah | We present the Bayesian Case Model (BCM), a general framework for Bayesian case-based reasoning (CBR) and prototype classification and clustering. |

219 | Latent Support Measure Machines for Bag-of-Words Data Classification | Yuya Yoshikawa, Tomoharu Iwata, Hiroshi Sawada | With the latent SMM, a latent vector is associated with each vocabulary term, and each document is represented as a distribution of the latent vectors for words appearing in the document. |

220 | Local Linear Convergence of Forward–Backward under Partial Smoothness | Jingwei Liang, Jalal Fadili, Gabriel Peyré | In this paper, we consider the Forward–Backward proximal splitting algorithm to minimize the sum of two proper closed convex functions, one of which has a Lipschitz continuous gradient while the other is partly smooth relative to an active manifold $\mathcal{M}$. |

221 | RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning | Marek Petrik, Dharmashankar Subramanian | We describe how to use robust Markov decision processes for value function approximation with state aggregation. |

222 | Deep Learning Face Representation by Joint Identification-Verification | Yi Sun, Yuheng Chen, Xiaogang Wang, Xiaoou Tang | In this paper, we show that it can be well solved with deep learning and using both face identification and verification signals as supervision. |

223 | A provable SVD-based algorithm for learning topics in dominant admixture corpus | Trapit Bansal, Chiranjib Bhattacharyya, Ravindran Kannan | A major contribution of the paper is to show that under this more realistic assumption, which is empirically verified on real corpora, a singular value decomposition (SVD) based algorithm with a crucial pre-processing step of thresholding, can provably recover the topics from a collection of documents drawn from Dominant admixtures. |

224 | QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models | Cho-Jui Hsieh, Inderjit S. Dhillon, Pradeep K. Ravikumar, Stephen Becker, Peder A. Olsen | In this paper, we develop a family of algorithms for optimizing “superposition-structured” or “dirty” statistical estimators for high-dimensional problems involving the minimization of the sum of a smooth loss function with a hybrid regularization. |

225 | General Stochastic Networks for Classification | Matthias Zöhrer, Franz Pernkopf | In particular, we introduce a hybrid training objective considering a generative and discriminative cost function governed by a trade-off parameter lambda. |

226 | Spatio-temporal Representations of Uncertainty in Spiking Neural Networks | Cristina Savin, Sophie Denève | Here we present a novel circuit model for representing multidimensional real-valued distributions using a spike based spatio-temporal code. |

227 | Attentional Neural Network: Feature Selection Using Cognitive Feedback | Qian Wang, Jiaxing Zhang, Sen Song, Zheng Zhang | We obtain classification accuracy better than or competitive with state-of-the-art results on the MNIST variation dataset, and successfully disentangle overlaid digits with high success rates. |

228 | Convolutional Neural Network Architectures for Matching Natural Language Sentences | Baotian Hu, Zhengdong Lu, Hang Li, Qingcai Chen | As a step toward this goal, we propose convolutional neural network models for matching two sentences, by adapting the convolutional strategy in vision and speech. |

229 | Scalable Non-linear Learning with Adaptive Polynomial Expansions | Alekh Agarwal, Alina Beygelzimer, Daniel J. Hsu, John Langford, Matus J. Telgarsky | We describe a new algorithm that explicitly and adaptively expands higher-order interaction features over base linear representations. |

230 | On the relations of LFPs & Neural Spike Trains | David E. Carlson, Jana Schaich Borg, Kafui Dzirasa, Lawrence Carin | This work proposes a strategy for identifying neural networks characterized by time- and frequency-dependent connectivity patterns, using convolutional dictionary learning that links spike-train data to local field potentials (LFPs) across multiple areas of the brain. |

231 | Diverse Sequential Subset Selection for Supervised Video Summarization | Boqing Gong, Wei-Lun Chao, Kristen Grauman, Fei Sha | To this end, we propose the sequential determinantal point process (seqDPP), a probabilistic model for diverse sequential subset selection. |

232 | Self-Paced Learning with Diversity | Lu Jiang, Deyu Meng, Shoou-I Yu, Zhenzhong Lan, Shiguang Shan, Alexander Hauptmann | To incorporate this information, we propose an approach called self-paced learning with diversity (SPLD) which formalizes the preference for both easy and diverse samples into a general regularizer. |

233 | Feature Cross-Substitution in Adversarial Classification | Bo Li, Yevgeniy Vorobeychik | We investigate both the problem of modeling the objectives of such adversaries, as well as the algorithmic problem of accounting for rational, objective-driven adversaries. |

234 | Deep Recursive Neural Networks for Compositionality in Language | Ozan Irsoy, Claire Cardie | In this work we introduce a new architecture — a deep recursive neural network (deep RNN) — constructed by stacking multiple recursive layers. |

235 | Inference by Learning: Speeding-up Graphical Model Optimization via a Coarse-to-Fine Cascade of Pruning Classifiers | Bruno Conejo, Nikos Komodakis, Sebastien Leprince, Jean Philippe Avouac | We propose a general and versatile framework that significantly speeds up graphical model optimization while maintaining an excellent solution accuracy. |

236 | A Filtering Approach to Stochastic Variational Inference | Neil Houlsby, David Blei | We present an alternative perspective on SVI as approximate parallel coordinate ascent. |

237 | Optimizing F-Measures by Cost-Sensitive Classification | Shameem Puthiya Parambath, Nicolas Usunier, Yves Grandvalet | We present a theoretical analysis of F-measures for binary, multiclass and multilabel classification. |

238 | Two-Layer Feature Reduction for Sparse-Group Lasso via Decomposition of Convex Sets | Jie Wang, Jieping Ye | In this paper, we propose a novel two-layer feature reduction method (TLFre) for SGL via a decomposition of its dual feasible set. |

239 | Improved Multimodal Deep Learning with Variation of Information | Kihyuk Sohn, Wenling Shang, Honglak Lee | In this paper, we propose a novel multimodal representation learning framework that explicitly aims at this goal. |

240 | PEWA: Patch-based Exponentially Weighted Aggregation for image denoising | Charles Kervrann | In this paper, we propose a general statistical aggregation method which combines image patches denoised with several commonly-used algorithms. |

241 | Elementary Estimators for Graphical Models | Eunho Yang, Aurelie C. Lozano, Pradeep K. Ravikumar | We propose a class of closed-form estimators for sparsity-structured graphical models, expressed as exponential family distributions, under high-dimensional settings. |

242 | Beyond the Birkhoff Polytope: Convex Relaxations for Vector Permutation Problems | Cong Han Lim, Stephen Wright | Using a recent construction of Goemans (2010), we show that when optimizing over the convex hull of the permutation vectors (the permutahedron), we can reduce the number of variables and constraints to $\Theta(n \log n)$ in theory and $\Theta(n \log^2 n)$ in practice. |

243 | Neural Word Embedding as Implicit Matrix Factorization | Omer Levy, Yoav Goldberg | We analyze skip-gram with negative-sampling (SGNS), a word embedding method introduced by Mikolov et al., and show that it is implicitly factorizing a word-context matrix, whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant. |

244 | Multi-Resolution Cascades for Multiclass Object Detection | Mohammad Saberian, Nuno Vasconcelos | An algorithm for learning fast multiclass object detection cascades is introduced. |

245 | Median Selection Subset Aggregation for Parallel Inference | Xiangyu Wang, Peichao Peng, David B. Dunson | We propose a MEdian Selection Subset AGgregation Estimator (message) algorithm, which attempts to solve these problems. |

246 | Recurrent Models of Visual Attention | Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu | We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. |

247 | Tree-structured Gaussian Process Approximations | Thang D. Bui, Richard E. Turner | In this paper we devise an approximation whose complexity grows linearly with the number of pseudo-datapoints. |

248 | Active Learning and Best-Response Dynamics | Maria-Florina F. Balcan, Christopher Berlind, Avrim Blum, Emma Cohen, Kaushik Patnaik, Le Song | We consider a setting in which low-power distributed sensors are each making highly noisy measurements of some unknown target function. |

249 | Analog Memories in a Balanced Rate-Based Network of E-I Neurons | Dylan Festa, Guillaume Hennequin, Mate Lengyel | We propose a novel control-theoretic framework to build functioning attractor networks that satisfy a set of relevant physiological constraints. |

250 | Fast Sampling-Based Inference in Balanced Neuronal Networks | Guillaume Hennequin, Laurence Aitchison, Mate Lengyel | We explore this bottleneck in a simple, linear-Gaussian latent variable model, in which posterior sampling can be achieved by stochastic neural networks with linear dynamics. |

251 | Spectral Learning of Mixture of Hidden Markov Models | Cem Subakan, Johannes Traa, Paris Smaragdis | In this paper, we propose a learning approach for the Mixture of Hidden Markov Models (MHMM) based on the Method of Moments (MoM). |

252 | Subspace Embeddings for the Polynomial Kernel | Haim Avron, Huy Nguyen, David Woodruff | In particular, we propose an embedding for mappings induced by the polynomial kernel. |

253 | A Boosting Framework on Grounds of Online Learning | Tofigh Naghibi Mohamadpoor, Beat Pfister | By exploiting the duality between boosting and online learning, we present a boosting framework which proves to be extremely powerful thanks to employing the vast knowledge available in the online learning area. |

254 | A Dual Algorithm for Olfactory Computation in the Locust Brain | Sina Tootoonian, Mate Lengyel | We study the early locust olfactory system in an attempt to explain its well-characterized structure and dynamics. |

255 | Advances in Learning Bayesian Networks of Bounded Treewidth | Siqi Nie, Denis D. Maua, Cassio P. de Campos, Qiang Ji | This work presents novel algorithms for learning Bayesian networks of bounded treewidth. |

256 | Learning the Learning Rate for Prediction with Expert Advice | Wouter M. Koolen, Tim van Erven, Peter Grünwald | To close the gap between theory and practice we introduce an approach to learn the learning rate. |

257 | On the Information Theoretic Limits of Learning Ising Models | Rashish Tandon, Karthikeyan Shanmugam, Pradeep K. Ravikumar, Alexandros G. Dimakis | We provide a general framework for computing lower-bounds on the sample complexity of recovering the underlying graphs of Ising models, given i.i.d. samples. |

258 | Efficient Optimization for Average Precision SVM | Pritish Mohapatra, C.V. Jawahar, M. Pawan Kumar | To alleviate this deficiency, we propose three complementary approaches. |

259 | Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) | Anshumali Shrivastava, Ping Li | We present the first provably sublinear time hashing algorithm for approximate Maximum Inner Product Search (MIPS). |

260 | A framework for studying synaptic plasticity with neural spike train data | Scott Linderman, Christopher H. Stock, Ryan P. Adams | The computational rules according to which synaptic weights change over time are the subject of much research, and are not precisely understood. |

261 | Randomized Experimental Design for Causal Graph Discovery | Huining Hu, Zhentao Li, Adrian R. Vetta | We examine the number of controlled experiments required to discover a causal graph. |

262 | A Multiplicative Model for Learning Distributed Text-Based Attribute Representations | Ryan Kiros, Richard Zemel, Ruslan R. Salakhutdinov | In this paper we propose a general framework for learning distributed representations of attributes: characteristics of text whose representations can be jointly learned with word embeddings. |

263 | Learning Chordal Markov Networks by Dynamic Programming | Kustaa Kangas, Mikko Koivisto, Teppo Niinimäki | We present an algorithm for finding a chordal Markov network that maximizes any given decomposable scoring function. |

264 | Depth Map Prediction from a Single Image using a Multi-Scale Deep Network | David Eigen, Christian Puhrsch, Rob Fergus | In this paper, we present a new method that addresses this task by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. |

265 | Proximal Quasi-Newton for Computationally Intensive L1-regularized M-estimators | Kai Zhong, Ian En-Hsu Yen, Inderjit S. Dhillon, Pradeep K. Ravikumar | In this work, we propose the use of a carefully constructed proximal quasi-Newton algorithm for such computationally intensive M-estimation problems, where we employ an aggressive active set selection technique. |

266 | A Probabilistic Framework for Multimodal Retrieval using Integrative Indian Buffet Process | Bahadir Ozdemir, Larry S. Davis | We propose a multimodal retrieval procedure based on latent feature models. |

267 | Searching for Higgs Boson Decay Modes with Deep Learning | Peter J. Sadowski, Daniel Whiteson, Pierre Baldi | In this paper, we train artificial neural networks to detect the decay of the Higgs boson to tau leptons on a dataset of 82 million simulated collision events. |

268 | Structure Regularization for Structured Prediction | Xu Sun | To control structure-based overfitting, we propose a structure regularization framework via structure decomposition, which decomposes training samples into mini-samples with simpler structures, deriving a model with better generalization power. |

269 | On Multiplicative Multitask Feature Learning | Xin Wang, Jinbo Bi, Shipeng Yu, Jiangwen Sun | We propose two new learning formulations by varying the parameters in the proposed framework. |

270 | Multivariate f-divergence Estimation With Confidence | Kevin Moon, Alfred Hero | This theory enables us to perform divergence-based inference tasks such as testing equality of pairs of distributions based on empirical samples. |

271 | Generalized Unsupervised Manifold Alignment | Zhen Cui, Hong Chang, Shiguang Shan, Xilin Chen | In this paper, we propose a generalized Unsupervised Manifold Alignment (GUMA) method to build the connections between different but correlated datasets without any known correspondences. |

272 | Smoothed Gradients for Stochastic Variational Inference | Stephan Mandt, David Blei | In this paper, we explore the idea of following biased stochastic gradients in SVI. |

273 | Recursive Context Propagation Network for Semantic Scene Labeling | Abhishek Sharma, Oncel Tuzel, Ming-Yu Liu | We propose a deep feed-forward neural network architecture for pixel-wise semantic scene labeling. |

274 | Sparse Random Feature Algorithm as Coordinate Descent in Hilbert Space | Ian En-Hsu Yen, Ting-Wei Lin, Shou-De Lin, Pradeep K. Ravikumar, Inderjit S. Dhillon | In this paper, we propose a Sparse Random Feature algorithm, which learns a sparse non-linear predictor by minimizing an $\ell_1$-regularized objective function over the Hilbert Space induced from kernel function. |

275 | Optimal Teaching for Limited-Capacity Human Learners | Kaustubh R. Patil, Jerry Zhu, Lukasz Kopec, Bradley C. Love | In this contribution, we take a first principles approach to constructing idealized training sets. |

276 | Shaping Social Activity by Incentivizing Users | Mehrdad Farajtabar, Nan Du, Manuel Gomez Rodriguez, Isabel Valera, Hongyuan Zha, Le Song | In this paper, we model social events using multivariate Hawkes processes, which can capture both endogenous and exogenous event intensities, and derive a time dependent linear relation between the intensity of exogenous events and the overall network activity. |

277 | Analysis of Brain States from Multi-Region LFP Time-Series | Kyle R. Ulrich, David E. Carlson, Wenzhao Lian, Jana S. Borg, Kafui Dzirasa, Lawrence Carin | An infinite hidden Markov model (iHMM) is proposed to model the evolution of brain states, based on electrophysiological LFP data measured at multiple brain regions. |

278 | Reputation-based Worker Filtering in Crowdsourcing | Srikanth Jagabathula, Lakshminarayanan Subramanian, Ashwin Venkataraman | In this paper, we study the problem of aggregating noisy labels from crowd workers to infer the underlying true labels of binary tasks. |

279 | Multi-Class Deep Boosting | Vitaly Kuznetsov, Mehryar Mohri, Umar Syed | We present new ensemble learning algorithms for multi-class classification. |

280 | A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights | Weijie Su, Stephen Boyd, Emmanuel Candes | We derive a second-order ordinary differential equation (ODE), which is the limit of Nesterov’s accelerated gradient method. |

281 | Difference of Convex Functions Programming for Reinforcement Learning | Bilal Piot, Matthieu Geist, Olivier Pietquin | The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming. |

282 | Design Principles of the Hippocampal Cognitive Map | Kimberly L. Stachenfeld, Matthew Botvinick, Samuel J. Gershman | We hypothesize a set of design principles for the hippocampal cognitive map that explain how place fields represent space in a way that facilitates navigation and reinforcement learning. |

283 | Deep Symmetry Networks | Robert Gens, Pedro M. Domingos | In this paper, we introduce deep symmetry networks (symnets), a generalization of convnets that forms feature maps over arbitrary symmetry groups. |

284 | Nonparametric Bayesian inference on multivariate exponential families | William R. Vega-Brown, Marek Doniec, Nicholas G. Roy | We develop a model by choosing the maximum entropy distribution from the set of models satisfying certain smoothness and independence criteria; we show that inference on this model generalizes local kernel estimation to the context of Bayesian inference on stochastic processes. |

285 | Optimal rates for k-NN density and mode estimation | Sanjoy Dasgupta, Samory Kpotufe | We present two related contributions of independent interest: (1) high-probability finite sample rates for $k$-NN density estimation, and (2) practical mode estimators — based on $k$-NN — which attain minimax-optimal rates under surprisingly general distributional conditions. |

286 | Feedforward Learning of Mixture Models | Matthew Lawlor, Steven W. Zucker | We develop a biologically-plausible learning rule that provably converges to the class means of general mixture models. |

287 | Diverse Randomized Agents Vote to Win | Albert Jiang, Leandro Soriano Marcolino, Ariel D. Procaccia, Tuomas Sandholm, Nisarg Shah, Milind Tambe | With teams of computer Go agents in mind, we develop a novel theoretical model of two-stage noisy voting that builds on recent work in machine learning. |

288 | Ranking via Robust Binary Classification | Hyokun Yun, Parameswaran Raman, S. Vishwanathan | We propose RoBiRank, a ranking algorithm that is motivated by observing a close connection between evaluation metrics for learning to rank and loss functions for robust classification. |

289 | Distributed Balanced Clustering via Mapping Coresets | Mohammadhossein Bateni, Aditya Bhaskara, Silvio Lattanzi, Vahab Mirrokni | In the present paper we develop a general framework based on “mapping coresets” to tackle this issue. |

290 | Augur: Data-Parallel Probabilistic Modeling | Jean-Baptiste Tristan, Daniel Huang, Joseph Tassarotti, Adam C. Pocock, Stephen Green, Guy L. Steele | In this paper we present Augur, a probabilistic modeling language and compiler for Bayesian networks designed to make effective use of data-parallel architectures such as GPUs. |

291 | Learning Mixtures of Ranking Models | Pranjal Awasthi, Avrim Blum, Or Sheffet, Aravindan Vijayaraghavan | We present the first polynomial time algorithm which provably learns the parameters of a mixture of two Mallows models. |

292 | Controlling privacy in recommender systems | Yu Xin, Tommi Jaakkola | In this paper, we explore a two-tiered notion of privacy where there is a small set of “public” users who are willing to share their preferences openly, and a large set of “private” users who require privacy guarantees. |

293 | Convolutional Kernel Networks | Julien Mairal, Piotr Koniusz, Zaid Harchaoui, Cordelia Schmid | In this paper, we address this goal with a new type of convolutional neural network (CNN) whose invariance is encoded by a reproducing kernel. |

294 | Fairness in Multi-Agent Sequential Decision-Making | Chongjie Zhang, Julie A. Shah | We develop a simple linear programming approach and a more scalable game-theoretic approach for computing an optimal fairness policy. |

295 | Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets | Adarsh Prasad, Stefanie Jegelka, Dhruv Batra | We study greedy algorithms for finding a diverse subset of solutions in structured-output spaces by drawing new connections between submodular functions over combinatorial item sets and High-Order Potentials (HOPs) studied for graphical models. |

296 | Do Deep Nets Really Need to be Deep? | Jimmy Ba, Rich Caruana | In this paper we empirically demonstrate that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models. |

297 | Dynamic Rank Factor Model for Text Streams | Shaobo Han, Lin Du, Esther Salazar, Lawrence Carin | We propose a semi-parametric and dynamic rank factor model for topic modeling, capable of (1) discovering topic prevalence over time, and (2) learning contemporary multi-scale dependence structures, providing topic and word correlations as a byproduct. |

298 | Generative Adversarial Nets | Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio | We propose a new framework for estimating generative models via adversarial nets, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. |

299 | Testing Unfaithful Gaussian Graphical Models | De Wen Soh, Sekhar C. Tatikonda | Testing Unfaithful Gaussian Graphical Models |

300 | Global Sensitivity Analysis for MAP Inference in Graphical Models | Jasper De Bock, Cassio P. de Campos, Alessandro Antonucci | Our main contribution is an exact algorithm that can check whether the MAP configuration is robust with respect to given perturbations. |

301 | Deconvolution of High Dimensional Mixtures via Boosting, with Application to Diffusion-Weighted MRI of Human Brain | Charles Y. Zheng, Franco Pestilli, Ariel Rokem | Here, we propose a more general algorithm that fits mixture models of any dimensionality without discretization. |

302 | Efficient Minimax Signal Detection on Graphs | Jing Qian, Venkatesh Saligrama | We overcome the combinatorics of connectivity by embedding connected subgraphs into linear matrix inequalities (LMI). |

303 | Cone-Constrained Principal Component Analysis | Yash Deshpande, Andrea Montanari, Emile Richard | We consider a simple model for noisy quadratic observation of an unknown vector $\mathbf{v}_0$. |

304 | On Communication Cost of Distributed Statistical Estimation and Dimensionality | Ankit Garg, Tengyu Ma, Huy Nguyen | Specifically we study the problem of estimating the mean $\vec{\theta}$ of an unknown $d$-dimensional Gaussian distribution in the distributed setting. |

305 | Computing Nash Equilibria in Generalized Interdependent Security Games | Hau Chan, Luis E. Ortiz | We study the computational complexity of computing Nash equilibria in generalized interdependent-security (IDS) games. |

306 | Consistent Binary Classification with Generalized Performance Metrics | Oluwasanmi O. Koyejo, Nagarajan Natarajan, Pradeep K. Ravikumar, Inderjit S. Dhillon | We propose two algorithms for estimating the optimal classifiers, and prove their statistical consistency. |

307 | Greedy Subspace Clustering | Dohyung Park, Constantine Caramanis, Sujay Sanghavi | In this paper, we provide new simple and efficient algorithms for this problem. |

308 | Deterministic Symmetric Positive Semidefinite Matrix Completion | William E. Bishop, Byron M. Yu | We consider the problem of recovering a symmetric, positive semidefinite (SPSD) matrix from a subset of its entries, possibly corrupted by noise. We develop a set of sufficient conditions for the recovery of an SPSD matrix from a set of its principal submatrices, present necessity results based on this set of conditions, and develop an algorithm that can exactly recover a matrix when these conditions are met. |

309 | Multi-Step Stochastic ADMM in High Dimensions: Applications to Sparse Optimization and Matrix Decomposition | Hanie Sedghi, Anima Anandkumar, Edmond Jonckheere | In this paper, we consider a multi-step version of the stochastic ADMM method with efficient guarantees for high-dimensional problems. |

310 | Online combinatorial optimization with stochastic decision sets and adversarial losses | Gergely Neu, Michal Valko | In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. |

311 | Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature | Tom Gunter, Michael A. Osborne, Roman Garnett, Philipp Hennig, Stephen J. Roberts | We propose a novel sampling framework for inference in probabilistic models: an active learning approach that converges more quickly (in wall-clock time) than Markov chain Monte Carlo (MCMC) benchmarks. |

312 | Multi-Scale Spectral Decomposition of Massive Graphs | Si Si, Donghyuk Shin, Inderjit S. Dhillon, Beresford N. Parlett | In this paper, we propose and analyze a novel multi-scale spectral decomposition method (MSEIGS), which first clusters the graph into smaller clusters whose spectral decomposition can be computed efficiently and independently. |

313 | The limits of squared Euclidean distance regularization | Michal Derezinski, Manfred K. Warmuth | We give a random construction for sets of examples where the target linear weight vector is trivial to learn but any algorithm from the above family is drastically sub-optimal. |

314 | Bregman Alternating Direction Method of Multipliers | Huahua Wang, Arindam Banerjee | In this paper, we similarly generalize the alternating direction method of multipliers (ADMM) to Bregman ADMM (BADMM), which allows the choice of different Bregman divergences to exploit the structure of problems. |

315 | Multitask learning meets tensor factorization: task imputation via convex optimization | Kishan Wimalawarne, Masashi Sugiyama, Ryota Tomioka | We propose a new norm, which we call the scaled latent trace norm and analyze the excess risk of all the three norms. |

316 | On Model Parallelization and Scheduling Strategies for Distributed Machine Learning | Seunghak Lee, Jin Kyu Kim, Xun Zheng, Qirong Ho, Garth A. Gibson, Eric P. Xing | In this paper, we develop a system for model-parallelism, STRADS, that provides a programming abstraction for scheduling parameter updates by discovering and leveraging changing structural properties of ML programs. |

317 | Scalable Inference for Neuronal Connectivity from Calcium Imaging | Alyson K. Fletcher, Sundeep Rangan | In this work, we propose a computationally fast method for the state estimation based on a hybrid of loopy belief propagation and approximate message passing (AMP). |

318 | Structure learning of antiferromagnetic Ising models | Guy Bresler, David Gamarnik, Devavrat Shah | In this paper we investigate the computational complexity of learning the graph structure underlying a discrete undirected graphical model from i.i.d. samples. |

319 | The Noisy Power Method: A Meta Algorithm with Applications | Moritz Hardt, Eric Price | We provide a new robust convergence analysis of the well-known power method for computing the dominant singular vectors of a matrix, which we call the noisy power method. |

320 | Algorithm selection by rational metareasoning as a model of human strategy selection | Falk Lieder, Dillon Plunkett, Jessica B. Hamrick, Stuart J. Russell, Nicholas Hay, Tom Griffiths | Here, we view the algorithm selection problem as a special case of metareasoning and derive a solution that outperforms existing methods in sorting algorithm selection. |

321 | Extremal Mechanisms for Local Differential Privacy | Peter Kairouz, Sewoong Oh, Pramod Viswanath | We introduce a family of extremal privatization mechanisms, which we call staircase mechanisms, and prove that it contains the optimal privatization mechanism that maximizes utility. |

322 | Global Belief Recursive Neural Networks | Romain Paulus, Richard Socher, Christopher D. Manning | We introduce global belief recursive neural networks (GB-RNNs) which are based on the idea of extending purely feedforward neural networks to include one feedbackward step during inference. |

323 | A statistical model for tensor PCA | Emile Richard, Andrea Montanari | For moderate dimensions, we propose a hybrid approach that uses unfolding together with power iteration, and show that it significantly outperforms baseline methods. |

324 | Real-Time Decoding of an Integrate and Fire Encoder | Shreya Saxena, Munther Dahleh | Here, we introduce a causal, real-time decoder of the biophysically-based Integrate and Fire encoding neuron model. |

325 | Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning | Brendan McMahan, Matthew Streeter | We analyze new online gradient descent algorithms for distributed systems with large delays between gradient computations and the corresponding updates. |

326 | On the Number of Linear Regions of Deep Neural Networks | Guido F. Montufar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio | We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. |

327 | Identifying and attacking the saddle point problem in high-dimensional non-convex optimization | Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio | Motivated by these arguments, we propose a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods. |

328 | Extracting Latent Structure From Multiple Interacting Neural Populations | Joao Semedo, Amin Zandvakili, Adam Kohn, Christian K. Machens, Byron M. Yu | Rather than attempting to identify direct interactions between neurons (where the number of interactions grows with the number of neurons squared), we propose to extract a smaller number of latent variables from each population and study how the latent variables interact. |

329 | Learning with Fredholm Kernels | Qichao Que, Mikhail Belkin, Yusu Wang | In this paper we propose a framework for supervised and semi-supervised learning based on reformulating the learning problem as a regularized Fredholm integral equation. |

330 | Hamming Ball Auxiliary Sampling for Factorial Hidden Markov Models | Michalis Titsias, Christopher Yau | We introduce a novel sampling algorithm for Markov chain Monte Carlo-based Bayesian inference for factorial hidden Markov models. |

331 | Optimizing Energy Production Using Policy Search and Predictive State Representations | Yuri Grinberg, Doina Precup, Michel Gendreau | We propose a policy-search-based approach coupled with predictive modelling to address this problem. |

332 | Scaling-up Importance Sampling for Markov Logic Networks | Deepak Venugopal, Vibhav G. Gogate | In this paper, we address both problems by presenting a scalable, lifted importance sampling-based approach that never grounds the full MLN. |

333 | Optimal Neural Codes for Control and Estimation | Alex K. Susemihl, Ron Meir, Manfred Opper | In this work we show that sensory adaptation for control differs from sensory adaptation for perception, even for simple control setups. |

334 | Graph Clustering With Missing Data: Convex Algorithms and Analysis | Ramya Korlakai Vinayak, Samet Oymak, Babak Hassibi | We consider the problem of finding clusters in an unweighted graph, when the graph is partially observed. |

335 | Scale Adaptive Blind Deblurring | Haichao Zhang, Jianchao Yang | We present a scale space perspective on blind deblurring algorithms, and introduce a cascaded scale space formulation for blind deblurring. |

336 | Weighted importance sampling for off-policy learning with linear function approximation | A. Rupam Mahmood, Hado P. van Hasselt, Richard S. Sutton | In this paper, we take two steps toward bridging this gap. |

337 | Information-based learning by agents in unbounded state spaces | Shariq A. Mobin, James A. Arnemann, Fritz Sommer | The idea that animals might use information-driven planning to explore an unknown environment and build an internal model of it has been proposed for quite some time. |

338 | Exponential Concentration of a Density Functional Estimator | Shashank Singh, Barnabas Poczos | We analyse a plug-in estimator for a large class of integral functionals of one or more continuous probability densities. |

339 | Scalable Kernel Methods via Doubly Stochastic Gradients | Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, Maria-Florina F. Balcan, Le Song | In this paper, we propose an approach that scales up kernel methods using a novel concept called “doubly stochastic functional gradients”. |

340 | Fast Training of Pose Detectors in the Fourier Domain | João F. Henriques, Pedro Martins, Rui F. Caseiro, Jorge Batista | By assuming that this transformation is norm-preserving and cyclic, we propose a closed-form solution in the Fourier domain that can eliminate most redundancies. |

341 | An Accelerated Proximal Coordinate Gradient Method | Qihang Lin, Zhaosong Lu, Lin Xiao | We develop an accelerated randomized proximal coordinate gradient (APCG) method, for solving a broad class of composite convex optimization problems. |

342 | Communication-Efficient Distributed Dual Coordinate Ascent | Martin Jaggi, Virginia Smith, Martin Takac, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, Michael I. Jordan | In this paper, we propose a communication-efficient framework, COCOA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. |

343 | Simple MAP Inference via Low-Rank Relaxations | Roy Frostig, Sida Wang, Percy S. Liang, Christopher D. Manning | We focus on the problem of maximum a posteriori (MAP) inference in Markov random fields with binary variables and pairwise interactions. |

344 | A* Sampling | Chris J. Maddison, Daniel Tarlow, Tom Minka | In this work, we show how sampling from a continuous distribution can be converted into an optimization problem over continuous space. |

345 | A Bayesian model for identifying hierarchically organised states in neural population activity | Patrick Putzky, Florian Franzen, Giacomo Bassetto, Jakob H. Macke | Here, we present a statistical model for extracting hierarchically organised neural population states from multi-channel recordings of neural spiking activity. |

346 | Sequence to Sequence Learning with Neural Networks | Ilya Sutskever, Oriol Vinyals, Quoc V. Le | In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. |

347 | Improved Distributed Principal Component Analysis | Yingyu Liang, Maria-Florina F. Balcan, Vandana Kanchanapally, David Woodruff | We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for $k$-means clustering and related problems. |

348 | Sparse Polynomial Learning and Graph Sketching | Murat Kocaoglu, Karthikeyan Shanmugam, Alexandros G. Dimakis, Adam Klivans | We give an algorithm for exactly reconstructing $f$ given random examples from the uniform distribution on $\{-1,1\}^n$ that runs in time polynomial in $n$ and $2^{s}$ and succeeds if the function satisfies the *unique sign property*: there is one output value which corresponds to a unique set of values of the participating parities. |

349 | Tight Continuous Relaxation of the Balanced k-Cut Problem | Syama Sundar Rangapuram, Pramod Kaushik Mudrakarta, Matthias Hein | In this paper we propose a new tight continuous relaxation for any balanced k-cut problem and show that a related, recently proposed relaxation is in most cases loose, leading to poor performance in practice. |

350 | Mondrian Forests: Efficient Online Random Forests | Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh | In this work, we use Mondrian processes (Roy and Teh, 2009) to construct ensembles of random decision trees we call Mondrian forests. |

351 | Expectation-Maximization for Learning Determinantal Point Processes | Jennifer A. Gillenwater, Alex Kulesza, Emily Fox, Ben Taskar | In this work we propose a novel algorithm for learning the full kernel matrix. |

352 | Capturing Semantically Meaningful Word Dependencies with an Admixture of Poisson MRFs | David I. Inouye, Pradeep K. Ravikumar, Inderjit S. Dhillon | We develop a fast algorithm for the Admixture of Poisson MRFs (APM) topic model and propose a novel metric to directly evaluate this model. |

353 | Streaming, Memory Limited Algorithms for Community Detection | Se-Young Yun, Marc Lelarge, Alexandre Proutiere | In this paper, we consider sparse networks consisting of a finite number of non-overlapping communities, i.e. disjoint clusters, so that there is higher density within clusters than across clusters. |

354 | Content-based recommendations with Poisson factorization | Prem K. Gopalan, Laurent Charlin, David Blei | We develop collaborative topic Poisson factorization (CTPF), a generative model of articles and reader preferences. |

355 | A Statistical Decision-Theoretic Framework for Social Choice | Hossein Azari Soufiani, David C. Parkes, Lirong Xia | In this paper, we take a statistical decision-theoretic viewpoint on social choice, putting a focus on the decision to be made on behalf of a system of agents. |

356 | Compressive Sensing of Signals from a GMM with Sparse Precision Matrices | Jianbo Yang, Xuejun Liao, Minhua Chen, Lawrence Carin | Under these challenging assumptions, we develop a hierarchical Bayesian method to simultaneously estimate the GMM and recover the signals using solely the incomplete measurements and a Bayesian shrinkage prior that promotes sparsity of the Gaussian precision matrices. |

357 | Bayesian Sampling Using Stochastic Gradient Thermostats | Nan Ding, Youhan Fang, Ryan Babbush, Changyou Chen, Robert D. Skeel, Hartmut Neven | Dynamics-based sampling methods, such as Hybrid Monte Carlo (HMC) and Langevin dynamics (LD), are commonly used to sample target distributions. |

358 | On Sparse Gaussian Chain Graph Models | Calvin McCarter, Seyoung Kim | In this paper, we address the problem of learning the structure of Gaussian chain graph models in a high-dimensional space. |

359 | Orbit Regularization | Renato Negrinho, Andre Martins | We propose a general framework for regularization based on group majorization. |

360 | Efficient Minimax Strategies for Square Loss Games | Wouter M. Koolen, Alan Malek, Peter L. Bartlett | We consider online prediction problems where the loss between the prediction and the outcome is measured by the squared Euclidean distance and its generalization, the squared Mahalanobis distance. |

361 | A Residual Bootstrap for High-Dimensional Regression with Near Low-Rank Designs | Miles Lopes | In this setting, we propose a version of RB that resamples residuals obtained from ridge regression. |

362 | Large-Margin Convex Polytope Machine | Alex Kantchelian, Michael C. Tschantz, Ling Huang, Peter L. Bartlett, Anthony D. Joseph, J. D. Tygar | We present the Convex Polytope Machine (CPM), a novel non-linear learning algorithm for large-scale binary classification tasks. |

363 | Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models | Yarin Gal, Mark van der Wilk, Carl Edward Rasmussen | We introduce a novel re-parametrisation of variational inference for sparse GP regression and latent variable models that allows for an efficient distributed algorithm. |

364 | Learning Distributed Representations for Structured Output Prediction | Vivek Srikumar, Christopher D. Manning | In this paper, we present a new formulation for structured prediction where we represent individual labels in a structure as dense vectors and allow semantically similar labels to share parameters. |

365 | Convex Deep Learning via Normalized Kernels | Özlem Aslan, Xinhua Zhang, Dale Schuurmans | In this paper, we develop a new architecture for nested nonlinearities that allows arbitrarily deep compositions to be trained to global optimality. |

366 | Tight convex relaxations for sparse matrix factorization | Emile Richard, Guillaume R. Obozinski, Jean-Philippe Vert | Based on a new atomic norm, we propose a new convex formulation for sparse matrix factorization problems in which the number of nonzero elements of the factors is assumed fixed and known. |

367 | Learning to Search in Branch and Bound Algorithms | He He, Hal Daume III, Jason M. Eisner | We address the key challenge of learning an adaptive node searching order for any class of problem solvable by branch-and-bound. |

368 | An Integer Polynomial Programming Based Framework for Lifted MAP Inference | Somdeb Sarkhel, Deepak Venugopal, Parag Singla, Vibhav G. Gogate | In this paper, we present a new approach for lifted MAP inference in Markov logic networks (MLNs). |

369 | Conditional Random Field Autoencoders for Unsupervised Structured Prediction | Waleed Ammar, Chris Dyer, Noah A. Smith | We introduce a framework for unsupervised learning of structured predictors with overlapping, global features. |

370 | How transferable are features in deep neural networks? | Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson | In this paper we experimentally quantify the generality versus specificity of neurons in each layer of a deep convolutional neural network and report a few surprising results. |

371 | Accelerated Mini-batch Randomized Block Coordinate Descent Method | Tuo Zhao, Mo Yu, Yiming Wang, Raman Arora, Han Liu | In this paper, we propose a mini-batch randomized block coordinate descent (MRBCD) method, which estimates the partial gradient of the selected block based on a mini-batch of randomly sampled data in each iteration. |

372 | Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning | Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, Xiaoshi Wang | Our main goal in this work is to build a better real-time Atari game playing agent than DQN. |

373 | A Latent Source Model for Online Collaborative Filtering | Guy Bresler, George H. Chen, Devavrat Shah | We address this theoretical gap by introducing a model for online recommendation systems, cast item recommendation under the model as a learning problem, and analyze the performance of a cosine-similarity collaborative filtering method. |

374 | Distributed Bayesian Posterior Sampling via Moment Sharing | Minjie Xu, Balaji Lakshminarayanan, Yee Whye Teh, Jun Zhu, Bo Zhang | We propose a distributed Markov chain Monte Carlo (MCMC) inference algorithm for large scale Bayesian posterior simulation. |

375 | Learning with Pseudo-Ensembles | Philip Bachman, Ouais Alsharif, Doina Precup | We present a novel regularizer based on making the behavior of a pseudo-ensemble robust with respect to the noise process generating it. |

376 | Learning Time-Varying Coverage Functions | Nan Du, Yingyu Liang, Maria-Florina F. Balcan, Le Song | In this paper, we propose a new problem of learning time-varying coverage functions which arise naturally from applications in social network analysis, machine learning, and algorithmic game theory. |

377 | Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time | Zhaoran Wang, Huanran Lu, Han Liu | In this paper, we propose a two-stage sparse PCA procedure that attains the optimal principal subspace estimator in polynomial time. |

378 | Discriminative Metric Learning by Neighborhood Gerrymandering | Shubhendu Trivedi, David McAllester, Greg Shakhnarovich | We describe an efficient algorithm for exact loss augmented inference, and a fast gradient descent algorithm for learning in this model. |

379 | Finding a sparse vector in a subspace: Linear sparsity using alternating directions | Qing Qu, Ju Sun, John Wright | We consider the problem of recovering the sparsest vector in a subspace $\mathcal{S} \subseteq \mathbb{R}^p$ with $\dim(\mathcal{S}) = n$. |

380 | Asynchronous Anytime Sequential Monte Carlo | Brooks Paige, Frank Wood, Arnaud Doucet, Yee Whye Teh | We introduce a new sequential Monte Carlo algorithm we call the particle cascade. |

381 | Discrete Graph Hashing | Wei Liu, Cun Mu, Sanjiv Kumar, Shih-Fu Chang | This paper presents a graph-based unsupervised hashing model to preserve the neighborhood structure of massive data in a discrete code space. |

382 | Feedback Detection for Live Predictors | Stefan Wager, Nick Chamandy, Omkar Muralidharan, Amir Najmi | In this paper we analyze predictor feedback detection as a causal inference problem, and introduce a local randomization scheme that can be used to detect non-linear feedback in real-world problems. |

383 | Rates of Convergence for Nearest Neighbor Classification | Kamalika Chaudhuri, Sanjoy Dasgupta | We analyze the behavior of nearest neighbor classification in metric spaces and provide finite-sample, distribution-dependent rates of convergence under minimal assumptions. |

384 | Consistency of weighted majority votes | Daniel Berend, Aryeh Kontorovich | We revisit from a statistical learning perspective the classical decision-theoretic problem of weighted expert voting. |

385 | Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling | Mingyuan Zhou | This paper introduces an exchangeable partition probability function to explicitly describe how the BNBP clusters the data points of each group into a random number of exchangeable partitions, which are shared across all the groups. |

386 | Zero-shot recognition with unreliable attributes | Dinesh Jayaraman, Kristen Grauman | We propose a novel random forest approach to train zero-shot models that explicitly accounts for the unreliability of attribute predictions. |

387 | Concavity of reweighted Kikuchi approximation | Po-Ling Loh, Andre Wibisono | We analyze a reweighted version of the Kikuchi approximation for estimating the log partition function of a product distribution defined over a region graph. |

388 | Online Decision-Making in General Combinatorial Spaces | Arun Rajkumar, Shivani Agarwal | We study online combinatorial decision problems, where one must make sequential decisions in some combinatorial space without knowing in advance the cost of decisions on each trial; the goal is to minimize the total regret over some sequence of trials relative to the best fixed decision in hindsight. |

389 | Fast Multivariate Spatio-temporal Analysis via Low Rank Tensor Learning | Mohammad Taha Bahadori, Qi (Rose) Yu, Yan Liu | We propose a unified low rank tensor learning framework for multivariate spatio-temporal analysis, which can conveniently incorporate different properties in spatio-temporal data, such as spatial clustering and shared structure among variables. |

390 | Clustered factor analysis of multineuronal spike data | Lars Buesing, Timothy A. Machado, John P. Cunningham, Liam Paninski | Here, we extend unstructured factor models by proposing a model that discovers subpopulations or groups of cells from the pool of recorded neurons. |

391 | Algorithms for CVaR Optimization in MDPs | Yinlam Chow, Mohammad Ghavamzadeh | In this paper, we consider the mean-CVaR optimization problem in MDPs. |

392 | Factoring Variations in Natural Images with Deep Gaussian Mixture Models | Aaron van den Oord, Benjamin Schrauwen | In this paper we propose a new scalable deep generative model for images, called the Deep Gaussian Mixture Model, that is a straightforward but powerful generalization of GMMs to multiple layers. |

393 | Partition-wise Linear Models | Hidekazu Oiwa, Ryohei Fujimaki | This paper proposes novel convex region-specific linear models, which we refer to as partition-wise linear models. |

394 | LSDA: Large Scale Detection through Adaptation | Judy Hoffman, Sergio Guadarrama, Eric S. Tzeng, Ronghang Hu, Jeff Donahue, Ross Girshick, Trevor Darrell, Kate Saenko | In this paper, we propose Large Scale Detection through Adaptation (LSDA), an algorithm which learns the difference between the two tasks and transfers this knowledge to classifiers for categories without bounding box annotated data, turning them into detectors. |

395 | Deep Networks with Internal Selective Attention through Feedback Connections | Marijn F. Stollenga, Jonathan Masci, Faustino Gomez, Jürgen Schmidhuber | Deep Networks with Internal Selective Attention through Feedback Connections |

396 | Parallel Feature Selection Inspired by Group Testing | Yingbo Zhou, Utkarsh Porwal, Ce Zhang, Hung Q. Ngo, XuanLong Nguyen, Christopher Ré, Venu Govindaraju | This paper presents a parallel feature selection method for classification that scales up to very high dimensions and large data sizes. |

397 | Low-Rank Time-Frequency Synthesis | Cédric Févotte, Matthieu Kowalski | The work presented in this paper fills in this ultimate gap by proposing a novel signal synthesis model with low-rank time-frequency structure. |

398 | Pre-training of Recurrent Neural Networks via Linear Autoencoders | Luca Pasa, Alessandro Sperduti | We propose a pre-training technique for recurrent neural networks based on linear autoencoder networks for sequences, i.e. linear dynamical systems modelling the target sequences. |

399 | Semi-supervised Learning with Deep Generative Models | Durk P. Kingma, Shakir Mohamed, Danilo Jimenez Rezende, Max Welling | We revisit the approach to semi-supervised learning with generative models and develop new models that allow for effective generalisation from small labelled data sets to large unlabelled ones. |

400 | Signal Aggregate Constraints in Additive Factorial HMMs, with Application to Energy Disaggregation | Mingjun Zhong, Nigel Goddard, Charles Sutton | We introduce a way of incorporating domain knowledge into this problem, called signal aggregate constraints (SACs). |

401 | Stochastic variational inference for hidden Markov models | Nick Foti, Jason Xu, Dillon Laird, Emily Fox | We propose an algorithm that harnesses the memory decay of the chain to adaptively bound errors arising from edge effects. |

402 | A Wild Bootstrap for Degenerate Kernel Tests | Kacper P. Chwialkowski, Dino Sejdinovic, Arthur Gretton | A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed. |

403 | Biclustering Using Message Passing | Luke O’Connor, Soheil Feizi | We propose a biclustering algorithm that maximizes a global objective function using message passing. |

404 | Fast Kernel Learning for Multidimensional Pattern Extrapolation | Andrew G. Wilson, Elad Gilboa, Arye Nehorai, John P. Cunningham | In this paper, we propose a Gaussian process approach for large scale multidimensional pattern extrapolation. |

405 | Learning on graphs using Orthonormal Representation is Statistically Consistent | Rakesh Shivanna, Chiranjib Bhattacharyya | In this paper, we show that there exist orthonormal representations which are statistically consistent over a large class of graphs, including power law and random graphs. |

406 | Spectral k-Support Norm Regularization | Andrew M. McDonald, Massimiliano Pontil, Dimitris Stamos | We extend the norms from the vector to the matrix setting and we introduce the spectral $k$-support norm. |

407 | Unsupervised learning of an efficient short-term memory network | Pietro Vertechi, Wieland Brendel, Christian K. Machens | We develop our main theory for linear networks, but then sketch how the learning rules could be transferred to balanced, spiking networks. |

408 | Quantized Estimation of Gaussian Sequence Models in Euclidean Balls | Yuancheng Zhu, John Lafferty | In this paper, we present an extension to Pinsker’s theorem where estimation is carried out under storage or communication constraints. |

409 | Learning a Concept Hierarchy from Multi-labeled Documents | Viet-An Nguyen, Jordan L. Ying, Philip Resnik, Jonathan Chang | In this paper, we present a model, Label to Hierarchy (L2H), that can induce a hierarchy of user-generated labels and the topics associated with those labels from a set of multi-labeled documents. |

410 | Variational Gaussian Process State-Space Models | Roger Frigola, Yutian Chen, Carl Edward Rasmussen | We present a procedure for efficient variational Bayesian learning of nonlinear state-space models based on sparse Gaussian processes. |

411 | Fast Prediction for Large-Scale Kernel Machines | Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon | In this paper, we present two novel insights for improving the prediction efficiency of kernel machines. |