Paper Digest: NIPS 2015 Highlights

December 6, 2015 | October 6, 2019 | admin

The Conference on Neural Information Processing Systems (NIPS) is one of the top machine learning conferences in the world. In 2015, it is to be held in Montreal, Canada.

To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free paper digest service to get new paper updates customized to your own interests on a daily basis.

Paper Digest Team
team@paperdigest.org

TABLE 1: NIPS 2015 Papers

#	Title	Authors	Highlight
1	Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing	Nihar Bhadresh Shah, Dengyong Zhou	To address this fundamental challenge in crowdsourcing, we propose a simple payment mechanism to incentivize workers to answer only the questions that they are sure of and skip the rest.
2	Learning with Symmetric Label Noise: The Importance of Being Unhinged	Brendan van Rooyen, Aditya Menon, Robert C. Williamson	In this paper, we propose a convex, classification-calibrated loss and prove that it is SLN-robust.
3	Algorithmic Stability and Uniform Generalization	Ibrahim M. Alabdulmohsin	In this paper, we prove that algorithmic stability in the inference process is equivalent to uniform generalization across all parametric loss functions.
4	Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models	Theodoros Tsiligkaridis, Keith Forsythe	Motivated by large-sample asymptotics, we propose a novel adaptive low-complexity design for the Dirichlet process concentration parameter and show that the number of classes grows at most at a logarithmic rate.
5	Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling	Xiaocheng Shang, Zhanxing Zhu, Benedict Leimkuhler, Amos J. Storkey	In this article, we propose a covariance-controlled adaptive Langevin thermostat that can effectively dissipate parameter-dependent noise while maintaining a desired target distribution.
6	Robust Portfolio Optimization	Huitong Qiu, Fang Han, Han Liu, Brian Caffo	We propose a robust portfolio optimization approach based on quantile statistics.
7	Logarithmic Time Online Multiclass prediction	Anna E. Choromanska, John Langford	We study the problem of multiclass classification with an extremely large number of classes (k), with the goal of obtaining train and test time complexity logarithmic in the number of classes.
8	Planar Ultrametrics for Image Segmentation	Julian E. Yarkony, Charless Fowlkes	We study the problem of hierarchical clustering on planar graphs.
9	Expressing an Image Stream with a Sequence of Natural Sentences	Cesc C. Park, Gunhee Kim	We propose an approach for generating a sequence of natural sentences for an image stream.
10	Parallel Correlation Clustering on Big Graphs	Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan	We show that our algorithms can cluster billion-edge graphs in under 5 seconds on 32 cores, while achieving a 15x speedup.
11	Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks	Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun	In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
12	Space-Time Local Embeddings	Ke Sun, Jun Wang, Alexandros Kalousis, Stephane Marchand-Maillet	We present basic definitions with interesting counter-intuitions.
13	A Convergent Gradient Descent Algorithm for Rank Minimization and Semidefinite Programming from Random Linear Measurements	Qinqing Zheng, John Lafferty	We propose a simple, scalable, and fast gradient descent algorithm to optimize a nonconvex objective for the rank minimization problem and a closely related family of semidefinite programs.
14	Smooth Interactive Submodular Set Cover	Bryan D. He, Yisong Yue	In this paper, we propose a new extension, which we call smooth interactive submodular set cover, that allows the target threshold to vary depending on the plausibility of each hypothesis.
15	Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning	Jiajun Wu, Ilker Yildirim, Joseph J. Lim, Bill Freeman, Josh Tenenbaum	We propose a generative model for solving these problems of physical scene understanding from real-world videos and images.
16	On the Pseudo-Dimension of Nearly Optimal Auctions	Jamie H. Morgenstern, Tim Roughgarden	We introduce t-level auctions to interpolate between simple auctions, such as welfare maximization with reserve prices, and optimal auctions, thereby balancing the competing demands of expressivity and simplicity.
17	Unlocking neural population non-stationarities using hierarchical dynamics models	Mijung Park, Gergo Bohner, Jakob H. Macke	To better understand the nature of co-variability in neural circuits and their impact on cortical information processing, we introduce a hierarchical dynamics model that is able to capture inter-trial modulations in firing rates, as well as neural population dynamics.
18	Bayesian Manifold Learning: The Locally Linear Latent Variable Model (LL-LVM)	Mijung Park, Wittawat Jitkrittum, Ahmad Qamar, Zoltan Szabo, Lars Buesing, Maneesh Sahani	We introduce the Locally Linear Latent Variable Model (LL-LVM), a probabilistic model for non-linear manifold discovery that describes a joint distribution over observations, their manifold coordinates and locally linear maps conditioned on a set of neighbourhood relationships.
19	Color Constancy by Learning to Predict Chromaticity from Luminance	Ayan Chakrabarti	In this paper, we show that the per-pixel color statistics of natural scenes—without any spatial or semantic context—can by themselves be a powerful cue for color constancy.
20	Fast and Accurate Inference of Plackett–Luce Models	Lucas Maystre, Matthias Grossglauser	We take advantage of this perspective and formulate a new spectral algorithm that is significantly more accurate than previous ones for the Plackett–Luce model.
21	Probabilistic Line Searches for Stochastic Optimization	Maren Mahsereci, Philipp Hennig	Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent.
22	Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets	Armand Joulin, Tomas Mikolov	In this paper, we discuss the limitations of standard deep learning approaches and show that some of these limitations can be overcome by learning how to grow the complexity of a model in a structured way.
23	Where are they looking?	Adria Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba	In this paper, we propose a deep neural network-based approach for gaze-following and a new benchmark dataset for thorough evaluation.
24	The Pareto Regret Frontier for Bandits	Tor Lattimore	I show that the price for such unbalanced worst-case regret guarantees is rather high.
25	On the Limitation of Spectral Methods: From the Gaussian Hidden Clique Problem to Rank-One Perturbations of Gaussian Tensors	Andrea Montanari, Daniel Reichman, Ofer Zeitouni	We consider the following detection problem: given a realization of a symmetric matrix $X$ of dimension $n$, distinguish between the hypothesis that all upper triangular variables are i.i.d. Gaussian variables with mean $0$ and variance $1$ and the hypothesis that there is a planted principal submatrix $B$ of dimension $L$ for which all upper triangular variables are i.i.d. Gaussians with mean $1$ and variance $1$, whereas all other upper triangular elements of $X$ not in $B$ are i.i.d. Gaussian variables with mean $0$ and variance $1$.
26	Measuring Sample Quality with Stein's Method	Jackson Gorham, Lester Mackey	To address these challenges, we introduce a new computable quality measure based on Stein’s method that bounds the discrepancy between sample and target expectations over a large class of test functions.
27	Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution	Yan Huang, Wei Wang, Liang Wang	Considering that recurrent neural network (RNN) can model long-term contextual information of temporal sequences well, we propose a bidirectional recurrent convolutional network for efficient multi-frame SR. Different from vanilla RNN, 1) the commonly-used recurrent full connections are replaced with weight-sharing convolutional connections and 2) conditional convolutional connections from previous input layers to current hidden layer are added for enhancing visual-temporal dependency modelling.
28	Bounding errors of Expectation-Propagation	Guillaume P. Dehaene, Simon Barthelmé	In this article, we prove that the approximation errors made by EP can be bounded.
29	A fast, universal algorithm to learn parametric nonlinear embeddings	Miguel A. Carreira-Perpinan, Max Vladymyrov	Using the method of auxiliary coordinates, we derive a training algorithm that works by alternating steps that train an auxiliary embedding with steps that train the mapping.
30	Texture Synthesis Using Convolutional Neural Networks	Leon Gatys, Alexander S. Ecker, Matthias Bethge	Here we introduce a new model of natural textures based on the feature spaces of convolutional neural networks optimised for object recognition.
31	Extending Gossip Algorithms to Distributed Estimation of U-statistics	Igor Colin, Aurélien Bellet, Joseph Salmon, Stéphan Clémençon	This paper proposes new synchronous and asynchronous randomized gossip algorithms which simultaneously propagate data across the network and maintain local estimates of the U-statistic of interest.
32	Streaming, Distributed Variational Inference for Bayesian Nonparametrics	Trevor Campbell, Julian Straub, John W. Fisher III, Jonathan P. How	This paper presents a methodology for creating streaming, distributed inference algorithms for Bayesian nonparametric (BNP) models. To address this, the paper develops a combinatorial optimization problem over component correspondences, and provides an efficient solution technique.
33	Learning visual biases from human imagination	Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba	In this paper, we investigate whether we can extract these biases and transfer them into a machine recognition system. We introduce a novel method that, inspired by well-known tools in human psychophysics, estimates the biases that the human visual system might use for recognition, but in computer vision feature spaces.
34	Smooth and Strong: MAP Inference with Linear Convergence	Ofer Meshi, Mehrdad Mahdavi, Alex Schwing	Specifically, we introduce strong convexity by adding a quadratic term to the LP relaxation objective.
35	Copeland Dueling Bandits	Masrour Zoghi, Zohar S. Karnin, Shimon Whiteson, Maarten de Rijke	Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist.
36	Optimal Ridge Detection using Coverage Risk	Yen-Chi Chen, Christopher R. Genovese, Shirley Ho, Larry Wasserman	We introduce the concept of coverage risk as an error measure for density ridge estimation. The coverage risk generalizes the mean integrated square error to set estimation. We propose two risk estimators for the coverage risk and we show that we can select tuning parameters by minimizing the estimated risk. We study the rate of convergence for coverage risk and prove consistency of the risk estimators. We apply our method to three simulated datasets and to cosmology data. In all the examples, the proposed method successfully recovers the underlying density structure.
37	Top-k Multiclass SVM	Maksim Lapin, Matthias Hein, Bernt Schiele	We propose top-k multiclass SVM as a direct method to optimize for top-k performance.
38	Policy Evaluation Using the Ω-Return	Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Konidaris	We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms.
39	Orthogonal NMF through Subspace Exploration	Megasthenis Asteris, Dimitris Papailiopoulos, Alexandros G. Dimakis	Existing algorithms rely mostly on heuristics, which despite their good empirical performance, lack provable performance guarantees. We present a new ONMF algorithm with provable approximation guarantees. For any constant dimension $k$, we obtain an additive EPTAS without any assumptions on the input.
40	Stochastic Online Greedy Learning with Semi-bandit Feedbacks	Tian Lin, Jian Li, Wei Chen	In this paper, we address the online learning problem when the input to the greedy algorithm is stochastic with unknown parameters that have to be learned over time.
41	Deeply Learning the Messages in Message Passing Inference	Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel	We apply our method to semantic image segmentation and achieve impressive performance, which demonstrates the effectiveness and usefulness of our CNN message learning method.
42	Synaptic Sampling: A Bayesian Approach to Neural Network Plasticity and Rewiring	David Kappel, Stefan Habenschuss, Robert Legenstein, Wolfgang Maass	We reexamine in this article the conceptual and mathematical framework for understanding the organization of plasticity in spiking neural networks.
43	Accelerated Proximal Gradient Methods for Nonconvex Programming	Huan Li, Zhouchen Lin	To address this issue, we introduce a monitor-corrector step and extend APG for general nonconvex and nonsmooth programs.
44	Approximating Sparse PCA from Incomplete Data	Abhisek Kundu, Petros Drineas, Malik Magdon-Ismail	We study how well one can recover sparse principal components of a data matrix using a sketch formed from a few of its elements.
45	Nonparametric von Mises Estimators for Entropies, Divergences and Mutual Informations	Kirthevasan Kandasamy, Akshay Krishnamurthy, Barnabas Poczos, Larry Wasserman, James M. Robins	We propose and analyse estimators for statistical functionals of one or more distributions under nonparametric assumptions. Our estimators are derived from the von Mises expansion and are based on the theory of influence functions, which appear in the semiparametric statistics literature. We show that estimators based either on data-splitting or a leave-one-out technique enjoy fast rates of convergence and other favorable theoretical properties. We apply this framework to derive estimators for several popular information theoretic quantities, and via empirical evaluation, show the advantage of this approach over existing estimators.
46	Column Selection via Adaptive Sampling	Saurabh Paul, Malik Magdon-Ismail, Petros Drineas	We propose a new adaptive sampling algorithm that can be used to improve any relative-error column selection algorithm.
47	HONOR: Hybrid Optimization for NOn-convex Regularized problems	Pinghua Gong, Jieping Ye	In this paper, we propose an efficient Hybrid Optimization algorithm for NOn-convex Regularized problems (HONOR).
48	3D Object Proposals for Accurate Object Class Detection	Xiaozhi Chen, Kaustav Kundu, Yukun Zhu, Andrew G. Berneshawi, Huimin Ma, Sanja Fidler, Raquel Urtasun	The goal of this paper is to generate high-quality 3D object proposals in the context of autonomous driving.
49	Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits	Huasen Wu, R. Srikant, Xin Liu, Chong Jiang	We show that the proposed UCB-ALP algorithm achieves logarithmic regret except in certain boundary cases. Further, we design algorithms and obtain similar regret analysis results for more general systems with unknown context distribution or heterogeneous costs.
50	Tensorizing Neural Networks	Alexander Novikov, Dmitrii Podoprikhin, Anton Osokin, Dmitry P. Vetrov	In this paper we convert the dense weight matrices of the fully-connected layers to the Tensor Train format such that the number of parameters is reduced by a huge factor and at the same time the expressive power of the layer is preserved. In particular, for the Very Deep VGG networks we report the compression factor of the dense weight matrix of a fully-connected layer up to 200000 times leading to the compression factor of the whole network up to 7 times.
51	Parallelizing MCMC with Random Partition Trees	Xiangyu Wang, Fangjian Guo, Katherine A. Heller, David B. Dunson	In this article, we propose a new EP-MCMC algorithm PART that solves these problems.
52	A Reduced-Dimension fMRI Shared Response Model	Po-Hsuan (Cameron) Chen, Janice Chen, Yaara Yeshurun, Uri Hasson, James Haxby, Peter J. Ramadge	We develop a shared response model for aggregating multi-subject fMRI data that accounts for different functional topographies among anatomically aligned datasets.
53	Spectral Learning of Large Structured HMMs for Comparative Epigenomics	Chicheng Zhang, Jimin Song, Kamalika Chaudhuri, Kevin Chen	We develop a latent variable model and an efficient spectral algorithm motivated by the recent emergence of very large data sets of chromatin marks from multiple human cell types.
54	Individual Planning in Infinite-Horizon Multiagent Settings: Inference, Structure and Scalability	Xia Qu, Prashant Doshi	We exploit the graphical model structure specific to I-POMDPs, and present a new approach based on block-coordinate descent for further speed up.
55	Estimating Mixture Models via Mixtures of Polynomials	Sida Wang, Arun Tejasvi Chaganty, Percy S. Liang	In this work, we present Polymom, a unifying framework based on the method of moments in which estimation procedures are easily derivable, just as in EM.
56	On the Global Linear Convergence of Frank-Wolfe Optimization Variants	Simon Lacoste-Julien, Martin Jaggi	In this paper, we highlight and clarify several variants of the Frank-Wolfe optimization algorithm that have been successfully applied in practice: FW with away steps, pairwise FW, fully-corrective FW and Wolfe’s minimum norm point algorithm, and prove for the first time that they all enjoy global linear convergence under a weaker condition than strong convexity.
57	Deep Knowledge Tracing	Chris Piech, Jonathan Bassen, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas J. Guibas, Jascha Sohl-Dickstein	Knowledge tracing, where a machine models the knowledge of a student as they interact with coursework, is an established and significantly unsolved problem in computer supported education. In this paper we explore the benefit of using recurrent neural networks to model student learning. This family of models has important advantages over current state of the art methods in that they do not require the explicit encoding of human domain knowledge, and have a far more flexible functional form which can capture substantially more complex student interactions. We show that these neural networks outperform the current state of the art in prediction on real student data, while allowing straightforward interpretation and discovery of structure in the curriculum. These results suggest a promising new line of research for knowledge tracing.
58	Rethinking LDA: Moment Matching for Discrete ICA	Anastasia Podosinnikova, Francis Bach, Simon Lacoste-Julien	We consider moment matching techniques for estimation in Latent Dirichlet Allocation (LDA).
59	Efficient Compressive Phase Retrieval with Constrained Sensing Vectors	Sohail Bahmani, Justin Romberg	We propose a robust and efficient approach to the problem of compressive phase retrieval in which the goal is to reconstruct a sparse vector from the magnitude of a number of its linear measurements.
60	Barrier Frank-Wolfe for Marginal Inference	Rahul G. Krishnan, Simon Lacoste-Julien, David Sontag	We introduce a globally-convergent algorithm for optimizing the tree-reweighted (TRW) variational objective over the marginal polytope.
61	Learning Theory and Algorithms for Forecasting Non-stationary Time Series	Vitaly Kuznetsov, Mehryar Mohri	We present data-dependent learning bounds for the general scenario of non-stationary non-mixing stochastic processes.
62	Compressive spectral embedding: sidestepping the SVD	Dinesh Ramasamy, Upamanyu Madhow	In this paper, we propose a low-complexity compressive spectral embedding algorithm, which employs random projections and finite order polynomial expansions to compute approximations to SVD-based embedding.
63	A Nonconvex Optimization Framework for Low Rank Matrix Estimation	Tuo Zhao, Zhaoran Wang, Han Liu	In this paper, we define the notion of projected oracle divergence based on which we establish sufficient conditions for the success of nonconvex optimization.
64	Automatic Variational Inference in Stan	Alp Kucukelbir, Rajesh Ranganath, Andrew Gelman, David Blei	We propose an automatic variational inference algorithm, automatic differentiation variational inference (ADVI); we implement it in Stan (code available), a probabilistic programming system.
65	Attention-Based Models for Speech Recognition	Jan K. Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio	We offer a qualitative explanation of this failure and propose a novel and generic method of adding location-awareness to the attention mechanism to alleviate this issue.
66	Closed-form Estimators for High-dimensional Generalized Linear Models	Eunho Yang, Aurelie C. Lozano, Pradeep K. Ravikumar	We propose a class of closed-form estimators for GLMs under high-dimensional sampling regimes.
67	Online F-Measure Optimization	Róbert Busa-Fekete, Balázs Szörényi, Krzysztof Dembczynski, Eyke Hüllermeier	In this paper, we study the problem of F-measure maximization in the setting of online learning.
68	Online Rank Elicitation for Plackett-Luce: A Dueling Bandits Approach	Balázs Szörényi, Róbert Busa-Fekete, Adil Paul, Eyke Hüllermeier	We study the problem of online rank elicitation, assuming that rankings of a set of alternatives obey the Plackett-Luce distribution.
69	M-Best-Diverse Labelings for Submodular Energies and Beyond	Alexander Kirillov, Dmytro Shlezinger, Dmitry P. Vetrov, Carsten Rother, Bogdan Savchynskyy	In this work we show that the joint inference of $M$ best diverse solutions can be formulated as a submodular energy minimization if the original MAP-inference problem is submodular, hence fast inference techniques can be used.
70	Tractable Bayesian Network Structure Learning with Bounded Vertex Cover Number	Janne H. Korhonen, Pekka Parviainen	In this paper, we propose bounded vertex cover number Bayesian networks as an alternative to bounded tree-width networks.
71	Learning Large-Scale Poisson DAG Models based on OverDispersion Scoring	Gunwoong Park, Garvesh Raskutti	In this paper, we address the question of identifiability and learning algorithms for large-scale Poisson Directed Acyclic Graphical (DAG) models.
72	Training Restricted Boltzmann Machine via the Thouless-Anderson-Palmer free energy	Marylou Gabrie, Eric W. Tramel, Florent Krzakala	We propose an alternative deterministic iterative procedure based on an improved mean field method from statistical physics known as the Thouless-Anderson-Palmer approach.
73	Character-level Convolutional Networks for Text Classification	Xiang Zhang, Junbo Zhao, Yann LeCun	This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results.
74	Robust Feature-Sample Linear Discriminant Analysis for Brain Disorders Diagnosis	Ehsan Adeli-Mosabbeb, Kim-Han Thung, Le An, Feng Shi, Dinggang Shen	In this paper, we propose a classification method based on the least-squares formulation of linear discriminant analysis, which simultaneously detects the sample-outliers and feature-noises.
75	Black-box optimization of noisy functions with unknown smoothness	Jean-Bastien Grill, Michal Valko, Remi Munos	Our contribution is an adaptive optimization algorithm, POO or parallel optimistic optimization, that is able to deal with this setting.
76	Recovering Communities in the General Stochastic Block Model Without Knowing the Parameters	Emmanuel Abbe, Colin Sandon	This paper introduces efficient algorithms that do not require such knowledge and yet achieve the optimal information-theoretic tradeoffs identified in Abbe-Sandon FOCS15.
77	Deep learning with Elastic Averaging SGD	Sixin Zhang, Anna E. Choromanska, Yann LeCun	We propose synchronous and asynchronous variants of the new algorithm.
78	Monotone k-Submodular Function Maximization with Size Constraints	Naoto Ohsaka, Yuichi Yoshida	A $k$-submodular function is a generalization of a submodular function, where the input consists of $k$ disjoint subsets, instead of a single subset, of the domain. Many machine learning problems, including influence maximization with $k$ kinds of topics and sensor placement with $k$ kinds of sensors, can be naturally modeled as the problem of maximizing monotone $k$-submodular functions. In this paper, we give constant-factor approximation algorithms for maximizing monotone $k$-submodular functions subject to several size constraints. The running times of our algorithms are almost linear in the domain size. We experimentally demonstrate that our algorithms outperform baseline algorithms in terms of the solution quality.
79	Active Learning from Weak and Strong Labelers	Chicheng Zhang, Kamalika Chaudhuri	Our goal is to learn a classifier with low error on data labeled by the oracle, while using the weak labeler to reduce the number of label queries made to this labeler.
80	On the Optimality of Classifier Chain for Multi-label Classification	Weiwei Liu, Ivor Tsang	Based on our results, we propose a dynamic programming based classifier chain (CC-DP) algorithm to search the globally optimal label order for CC and a greedy classifier chain (CC-Greedy) algorithm to find a locally optimal CC.
81	Robust Regression via Hard Thresholding	Kush Bhatia, Prateek Jain, Purushottam Kar	We study the problem of Robust Least Squares Regression (RLSR) where several response variables can be adversarially corrupted.
82	Sparse Local Embeddings for Extreme Multi-label Classification	Kush Bhatia, Himanshu Jain, Purushottam Kar, Manik Varma, Prateek Jain	We conducted extensive experiments on several real-world as well as benchmark data sets and compare our method against state-of-the-art methods for extreme multi-label classification.
83	Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems	Yuxin Chen, Emmanuel Candes	This paper is concerned with finding a solution $x$ to a quadratic system of equations $y_i = |\langle a_i, x \rangle|^2$, $i = 1, 2, \ldots, m$.
84	A Framework for Individualizing Predictions of Disease Trajectories by Exploiting Multi-Resolution Structure	Peter Schulam, Suchi Saria	We propose a hierarchical latent variable model that individualizes predictions of disease trajectories.
85	Subspace Clustering with Irrelevant Features via Robust Dantzig Selector	Chao Qu, Huan Xu	We propose a method termed “robust Dantzig selector” which can successfully identify the clustering structure even with the presence of irrelevant features.
86	Sparse PCA via Bipartite Matchings	Megasthenis Asteris, Dimitris Papailiopoulos, Anastasios Kyrillidis, Alexandros G. Dimakis	We consider the following multi-component sparse PCA problem: given a set of data points, we seek to extract a small number of sparse components with disjoint supports that jointly capture the maximum possible variance. Such components can be computed one by one, repeatedly solving the single-component problem and deflating the input data matrix, but this greedy procedure is suboptimal. We present a novel algorithm for sparse PCA that jointly optimizes multiple disjoint components.
87	Fast Randomized Kernel Ridge Regression with Statistical Guarantees	Ahmed Alaoui, Michael W. Mahoney	Here, we describe a version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance. By extending the notion of statistical leverage scores to the setting of kernel ridge regression, we are able to identify a sampling distribution that reduces the size of the sketch (i.e., the required number of columns to be sampled) to the effective dimensionality of the problem.
88	Online Learning for Adversaries with Memory: Price of Past Mistakes	Oren Anava, Elad Hazan, Shie Mannor	In this work we extend the notion of learning with memory to the general Online Convex Optimization (OCO) framework, and present two algorithms that attain low regret.
89	Convolutional spike-triggered covariance analysis for neural subunit models	Anqi Wu, Il Memming Park, Jonathan W. Pillow	Here we address this problem by forging a theoretical connection between spike-triggered covariance analysis and nonlinear subunit models.
90	Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting	Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo	In this paper, we formulate precipitation nowcasting as a spatiotemporal sequence forecasting problem in which both the input and the prediction target are spatiotemporal sequences.
91	GAP Safe screening rules for sparse multi-task and multi-class models	Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, Joseph Salmon	In this paper we derive new safe rules for generalized linear models regularized with L1 and L1/L2 norms.
92	Empirical Localization of Homogeneous Divergences on Discrete Sample Spaces	Takashi Takenouchi, Takafumi Kanamori	In this paper, we propose a novel parameter estimator for probabilistic models on discrete space.
93	Statistical Model Criticism using Kernel Two Sample Tests	James R. Lloyd, Zoubin Ghahramani	We propose an exploratory approach to statistical model criticism using maximum mean discrepancy (MMD) two sample tests.
94	Precision-Recall-Gain Curves: PR Analysis Done Right	Peter Flach, Meelis Kull	We demonstrate in this paper that this practice is fraught with difficulties, mainly because of incoherent scale assumptions — e.g., the area under a PR curve takes the arithmetic mean of precision values whereas the $F_{\beta}$ score applies the harmonic mean.
95	A Generalization of Submodular Cover via the Diminishing Return Property on the Integer Lattice	Tasuku Soma, Yuichi Yoshida	We consider a generalization of the submodular cover problem based on the concept of diminishing return property on the integer lattice.
96	Bidirectional Recurrent Neural Networks as Generative Models	Mathias Berglund, Tapani Raiko, Mikko Honkala, Leo Kärkkäinen, Akos Vetek, Juha T. Karhunen	We propose two probabilistic interpretations of bidirectional RNNs that can be used to reconstruct missing gaps efficiently.
97	Quartz: Randomized Dual Coordinate Ascent with Arbitrary Sampling	Zheng Qu, Peter Richtarik, Tong Zhang	We propose and analyze a novel primal-dual method (Quartz) which at every iteration samples and updates a random subset of the dual variables, chosen according to an arbitrary distribution.
98	Maximum Likelihood Learning With Arbitrary Treewidth via Fast-Mixing Parameter Sets	Justin Domke	This paper explores an alternative notion of a tractable set, namely a set of “fast-mixing parameters” where Markov chain Monte Carlo (MCMC) inference can be guaranteed to quickly converge to the stationary distribution.
99	Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks	Minhyung Cho, Chandra Dhir, Jaehyung Lee	Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks
100	Large-scale probabilistic predictors with and without guarantees of validity	Vladimir Vovk, Ivan Petej, Valentina Fedorova	Large-scale probabilistic predictors with and without guarantees of validity
101	Shepard Convolutional Neural Networks	Jimmy SJ Ren, Li Xu, Qiong Yan, Wenxiu Sun	In this paper, we draw on Shepard interpolation and design Shepard Convolutional Neural Networks (ShCNN) which efficiently realizes end-to-end trainable TVI operators in the network.
102	Matrix Manifold Optimization for Gaussian Mixtures	Reshad Hosseini, Suvrit Sra	To bring our ideas to fruition, we develop a well-tuned Riemannian LBFGS method that proves superior to known competing methods (e.g., Riemannian conjugate gradient).
103	Semi-supervised Convolutional Neural Networks for Text Categorization via Region Embedding	Rie Johnson, Tong Zhang	This paper presents a new semi-supervised framework with convolutional neural networks (CNNs) for text categorization.
104	Parallel Recursive Best-First AND/OR Search for Exact MAP Inference in Graphical Models	Akihiro Kishimoto, Radu Marinescu, Adi Botea	We introduce a new parallel shared-memory recursive best-first AND/OR search algorithm, called SPRBFAOO, that explores the search space in a best-first manner while operating with restricted memory.
105	Convolutional Neural Networks with Intra-Layer Recurrent Connections for Scene Labeling	Ming Liang, Xiaolin Hu, Bo Zhang	We adopt a deep recurrent convolutional neural network (RCNN) for this task, which was originally proposed for object recognition.
106	Bounding the Cost of Search-Based Lifted Inference	David B. Smith, Vibhav G. Gogate	In this paper, we present a principled approach to address this problem.
107	Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families	Heiko Strathmann, Dino Sejdinovic, Samuel Livingstone, Zoltan Szabo, Arthur Gretton	We propose Kernel Hamiltonian Monte Carlo (KMC), a gradient-free adaptive MCMC algorithm based on Hamiltonian Monte Carlo (HMC).
108	Linear Multi-Resource Allocation with Semi-Bandit Feedback	Tor Lattimore, Koby Crammer, Csaba Szepesvari	Our main contribution is the new setting and an algorithm with nearly-optimal regret analysis.
109	Unsupervised Learning by Program Synthesis	Kevin Ellis, Armando Solar-Lezama, Josh Tenenbaum	We introduce an unsupervised learning algorithm that combines probabilistic modeling with solver-based techniques for program synthesis. We apply our techniques to both a visual learning domain and a language learning problem, showing that our algorithm can learn many visual concepts from only a few examples and that it can recover some English inflectional morphology. Taken together, these results give both a new approach to unsupervised learning of symbolic compositional structures, and a technique for applying program synthesis tools to noisy data.
110	Enforcing balance allows local supervised learning in spiking recurrent networks	Ralph Bourdoukan, Sophie Denève	Using a top-down approach, we show how networks of integrate-and-fire neurons can learn arbitrary linear dynamical systems by feeding back their error as a feed-forward input.
111	Fast and Guaranteed Tensor Decomposition via Sketching	Yining Wang, Hsiao-Yu Tung, Alexander J. Smola, Anima Anandkumar	In this paper, we propose fast and randomized tensor CP decomposition algorithms based on sketching.
112	Differentially private subspace clustering	Yining Wang, Yu-Xiang Wang, Aarti Singh	In this work, we build on the framework of “differential privacy” and present two provably private subspace clustering algorithms.
113	Predtron: A Family of Online Algorithms for General Prediction Problems	Prateek Jain, Nagarajan Natarajan, Ambuj Tewari	We offer a general framework to derive mistake driven online algorithms and associated loss bounds.
114	Weighted Theta Functions and Embeddings with Applications to Max-Cut, Clustering and Summarization	Fredrik D. Johansson, Ankani Chattoraj, Chiranjib Bhattacharyya, Devdatt Dubhashi	We introduce a unifying generalization of the Lovász theta function, and the associated geometric embedding, for graphs with weights on both nodes and edges.
115	SGD Algorithms based on Incomplete U-statistics: Large-Scale Minimization of Empirical Risk	Guillaume Papa, Stéphan Clémençon, Aurélien Bellet	In this paper, we focus on how to best implement a stochastic approximation approach to solve such risk minimization problems.
116	On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs	Wei Cao, Jian Li, Yufei Tao, Zhize Li	This paper discusses how to efficiently choose from $n$ unknown distributions the $k$ ones whose means are the greatest by a certain metric, up to a small relative error.
117	The Brain Uses Reliability of Stimulus Information when Making Perceptual Decisions	Sebastian Bitzer, Stefan Kiebel	We here show that even the basic drift diffusion model, which has frequently been used to explain experimental findings in perceptual decision making, implicitly relies on estimates of stimulus reliability.
118	Fast Classification Rates for High-dimensional Gaussian Generative Models	Tianyang Li, Adarsh Prasad, Pradeep K. Ravikumar	We present a novel analysis of the classification error of any linear discriminant approach given conditional Gaussian models.
119	Fast Distributed k-Center Clustering with Outliers on Massive Data	Gustavo Malkomes, Matt J. Kusner, Wenlin Chen, Kilian Q. Weinberger, Benjamin Moseley	In this work, we consider the widely used k-center clustering problem and its variant used to handle noisy data, k-center with outliers.
120	Human Memory Search as Initial-Visit Emitting Random Walk	Kwang-Sung Jun, Jerry Zhu, Timothy T. Rogers, Zhuoran Yang, Ming Yuan	In this paper, we propose the first efficient maximum likelihood estimate (MLE) for INVITE by decomposing the censored output into a series of absorbing random walks.
121	Non-convex Statistical Optimization for Sparse Tensor Graphical Model	Wei Sun, Zhaoran Wang, Han Liu, Guang Cheng	We consider the estimation of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data.
122	Convergence Rates of Active Learning for Maximum Likelihood Estimation	Kamalika Chaudhuri, Sham M. Kakade, Praneeth Netrapalli, Sujay Sanghavi	In this paper, we shift our attention to a more general setting — maximum likelihood estimation.
123	Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis	Jimei Yang, Scott E. Reed, Ming-Hsuan Yang, Honglak Lee	In this paper, we propose a novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image.
124	Efficient Exact Gradient Update for training Deep Networks with Very Large Sparse Targets	Pascal Vincent, Alexandre de Brébisson, Xavier Bouthillier	In this work we develop an original algorithmic approach that, for a family of loss functions that includes squared error and spherical softmax, can compute the exact loss, gradient update for the output weights, and gradient for backpropagation, all in $O(d^2)$ per example instead of $O(Dd)$, remarkably without ever computing the D-dimensional output.
125	Backpropagation for Energy-Efficient Neuromorphic Computing	Steve K. Esser, Rathinakumar Appuswamy, Paul Merolla, John V. Arthur, Dharmendra S. Modha	To demonstrate, we trained a sparsely connected network that runs on the TrueNorth chip using the MNIST dataset.
126	Alternating Minimization for Regression Problems with Vector-valued Outputs	Prateek Jain, Ambuj Tewari	We provide finite sample upper and lower bounds on the estimation error of OLS and MLE, in two popular models: a) Pooled model, b) Seemingly Unrelated Regression (SUR) model.
127	Learning both Weights and Connections for Efficient Neural Network	Song Han, Jeff Pool, John Tran, William Dally	To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections.
128	Optimal Rates for Random Fourier Features	Bharath Sriperumbudur, Zoltan Szabo	In this paper, we provide a detailed finite-sample theoretical analysis about the approximation quality of RFFs by (i) establishing optimal (in terms of the RFF dimension, and growing set size) performance guarantees in uniform norm, and (ii) presenting guarantees in L^r (1 ≤ r < ∞) norms.
129	The Population Posterior and Bayesian Modeling on Streams	James McInerney, Rajesh Ranganath, David Blei	We develop population variational Bayes, a new approach for using Bayesian modeling to analyze streams of data.
130	Frank-Wolfe Bayesian Quadrature: Probabilistic Integration with Theoretical Guarantees	François-Xavier Briol, Chris Oates, Mark Girolami, Michael A. Osborne	In this paper, we present the first probabilistic integrator that admits such theoretical treatment, called Frank-Wolfe Bayesian Quadrature (FWBQ).
131	Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks	Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer	We propose a curriculum learning strategy to gently change the training process from a fully guided scheme using the true previous token, towards a less guided scheme which mostly uses the generated token instead.
132	Unified View of Matrix Completion under General Structural Constraints	Suriya Gunasekar, Arindam Banerjee, Joydeep Ghosh	In this paper, we present a unified analysis of matrix completion under general low-dimensional structural constraints induced by any norm regularization. We consider two estimators for the general problem of structured matrix completion, and provide unified upper bounds on the sample complexity and the estimation error.
133	Efficient Output Kernel Learning for Multiple Tasks	Pratik Kumar Jawanpuria, Maksim Lapin, Matthias Hein, Bernt Schiele	Using the theory of positive semidefinite kernels we show in this paper that for a certain class of regularizers on the output kernel, the constraint of being positive semidefinite can be dropped as it is automatically satisfied for the relaxed problem.
134	Scalable Adaptation of State Complexity for Nonparametric Hidden Markov Models	Michael C. Hughes, William T. Stephenson, Erik Sudderth	We develop an inference algorithm for the sticky hierarchical Dirichlet process hidden Markov model that scales to big datasets by processing a few sequences at a time yet allows rapid adaptation of the state space cardinality.
135	Variational Consensus Monte Carlo	Maxim Rabinovich, Elaine Angelino, Michael I. Jordan	We introduce variational consensus Monte Carlo (VCMC), a variational Bayes algorithm that optimizes over aggregation functions to obtain samples from a distribution that better approximates the target.
136	Newton-Stein Method: A Second Order Method for GLMs via Stein's Lemma	Murat A. Erdogdu	We consider the problem of efficiently computing the maximum likelihood estimator in Generalized Linear Models (GLMs) when the number of observations is much larger than the number of coefficients ($n \gg p \gg 1$).
137	Practical and Optimal LSH for Angular Distance	Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn, Ludwig Schmidt	We show the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent.
138	Learning to Linearize Under Uncertainty	Ross Goroshin, Michael F. Mathieu, Yann LeCun	In this work we suggest a new architecture and loss for training deep feature hierarchies that linearize the transformations observed in unlabeled natural video sequences.
139	Finite-Time Analysis of Projected Langevin Monte Carlo	Sebastien Bubeck, Ronen Eldan, Joseph Lehec	We analyze the projected Langevin Monte Carlo (LMC) algorithm, a close cousin of projected Stochastic Gradient Descent (SGD).
140	Deep Visual Analogy-Making	Scott E. Reed, Yi Zhang, Yuting Zhang, Honglak Lee	In this paper we develop a novel deep network trained end-to-end to perform visual analogy making, which is the task of transforming a query image according to an example pair of related images.
141	Matrix Completion from Fewer Entries: Spectral Detectability and Rank Estimation	Alaa Saade, Florent Krzakala, Lenka Zdeborová	We propose a spectral algorithm for these two tasks called MaCBetH (for Matrix Completion with the Bethe Hessian).
142	Online Learning with Adversarial Delays	Kent Quanrud, Daniel Khashabi	Our main contribution is to show that standard algorithms for online learning already have simple regret bounds in the most general setting of delayed feedback, making adjustments to the analysis and not to the algorithms themselves.
143	Multi-Layer Feature Reduction for Tree Structured Group Lasso via Hierarchical Projection	Jie Wang, Jieping Ye	In this paper, we propose a novel Multi-Layer Feature reduction method (MLFre) to quickly identify the inactive nodes (the groups of features with zero coefficients in the solution) hierarchically in a top-down fashion, which are guaranteed to be irrelevant to the response.
144	Minimum Weight Perfect Matching via Blossom Belief Propagation	Sung-Soo Ahn, Sejun Park, Michael Chertkov, Jinwoo Shin	In this paper, we develop the first such algorithm, coined Blossom-BP, for solving the minimum weight matching problem over arbitrary graphs.
145	Efficient Thompson Sampling for Online Matrix-Factorization Recommendation	Jaya Kawale, Hung H. Bui, Branislav Kveton, Long Tran-Thanh, Sanjay Chawla	Efficient Thompson Sampling for Online Matrix-Factorization Recommendation
146	Improved Iteration Complexity Bounds of Cyclic Block Coordinate Descent for Convex Problems	Ruoyu Sun, Mingyi Hong	Improved Iteration Complexity Bounds of Cyclic Block Coordinate Descent for Convex Problems
147	Lifted Symmetry Detection and Breaking for MAP Inference	Timothy Kopp, Parag Singla, Henry Kautz	In this work, we extend symmetry breaking to the problem of model finding in weighted and unweighted relational theories, a class of problems that includes MAP inference in Markov Logic and similar statistical-relational languages.
148	Evaluating the statistical significance of biclusters	Jason D. Lee, Yuekai Sun, Jonathan E. Taylor	We develop a framework for performing statistical inference on biclusters found by score-based algorithms.
149	Discriminative Robust Transformation Learning	Jiaji Huang, Qiang Qiu, Guillermo Sapiro, Robert Calderbank	This paper proposes a framework for learning features that are robust to data variation, which is particularly important when only a limited number of training samples are available.
150	Bandits with Unobserved Confounders: A Causal Approach	Elias Bareinboim, Andrew Forney, Judea Pearl	In this paper, we show that formalizing this distinction has conceptual and algorithmic implications to the bandit setting.
151	Scalable Semi-Supervised Aggregation of Classifiers	Akshay Balsubramani, Yoav Freund	We present and empirically evaluate an efficient algorithm that learns to aggregate the predictions of an ensemble of binary classifiers.
152	Online Learning with Gaussian Payoffs and Side Observations	Yifan Wu, András György, Csaba Szepesvari	We consider a sequential learning problem with Gaussian payoffs and side information: after selecting an action $i$, the learner receives information about the payoff of every action $j$ in the form of Gaussian observations whose mean is the same as the mean payoff, but the variance depends on the pair $(i,j)$ (and may be infinite).
153	Private Graphon Estimation for Sparse Graphs	Christian Borgs, Jennifer Chayes, Adam Smith	We design algorithms for fitting a high-dimensional statistical model to a large, sparse network without revealing sensitive information of individual members.
154	SubmodBoxes: Near-Optimal Search for a Set of Diverse Object Proposals	Qing Sun, Dhruv Batra	In order to speed up repeated application of B&B, we propose a novel generalization of Minoux’s ‘lazy greedy’ algorithm to the B&B tree.
155	Fast Second Order Stochastic Backpropagation for Variational Inference	Kai Fan, Ziteng Wang, Jeff Beck, James Kwok, Katherine A. Heller	We propose a second-order (Hessian or Hessian-free) based optimization method for variational inference inspired by Gaussian backpropagation, and argue that quasi-Newton optimization can be developed as well.
156	Randomized Block Krylov Methods for Stronger and Faster Approximate Singular Value Decomposition	Cameron Musco, Christopher Musco	We address this problem for the first time by showing that both Block Krylov Iteration and Simultaneous Iteration give nearly optimal PCA for any matrix.
157	Cross-Domain Matching for Bag-of-Words Data via Kernel Embeddings of Latent Distributions	Yuya Yoshikawa, Tomoharu Iwata, Hiroshi Sawada, Takeshi Yamada	We propose a kernel-based method for finding matching between instances across different domains, such as multilingual documents and images with annotations.
158	Scalable Inference for Gaussian Process Models with Black-Box Likelihoods	Amir Dezfouli, Edwin V. Bonilla	We propose a sparse method for scalable automated variational inference (AVI) in a large class of models with Gaussian process (GP) priors, multiple latent functions, multiple outputs and non-linear likelihoods.
159	Fast Bidirectional Probability Estimation in Markov Models	Siddhartha Banerjee, Peter Lofgren	We develop a new bidirectional algorithm for estimating Markov chain multi-step transition probabilities: given a Markov chain, we want to estimate the probability of hitting a given target state in $\ell$ steps after starting from a given source distribution.
160	Probabilistic Variational Bounds for Graphical Models	Qiang Liu, John W. Fisher III, Alexander T. Ihler	We propose a simple Monte Carlo based inference method that augments convex variational bounds by adding importance sampling (IS).
161	Linear Response Methods for Accurate Covariance Estimates from Mean Field Variational Bayes	Ryan J. Giordano, Tamara Broderick, Michael I. Jordan	We generalize linear response methods from statistical physics to deliver accurate uncertainty estimates for model variables—both for individual variables and coherently across variables.
162	Combinatorial Cascading Bandits	Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari	We propose a UCB-like algorithm for solving our problems, CombCascade; and prove gap-dependent and gap-free upper bounds on its n-step regret.
163	Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path	Daniel J. Hsu, Aryeh Kontorovich, Csaba Szepesvari	This article provides the first procedure for computing a fully data-dependent interval that traps the mixing time $t_{mix}$ of a finite reversible ergodic Markov chain at a prescribed confidence level.
164	Policy Gradient for Coherent Risk Measures	Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor	In this work, we extend the policy gradient method to the whole class of coherent risk measures, which is widely accepted in finance and operations research, among other fields.
165	Fast Rates for Exp-concave Empirical Risk Minimization	Tomer Koren, Kfir Levy	We consider Empirical Risk Minimization (ERM) in the context of stochastic optimization with exp-concave and smooth losses—a general optimization framework that captures several important learning problems including linear and logistic regression, learning SVMs with the squared hinge-loss, portfolio selection and more.
166	Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks	Emily L. Denton, Soumith Chintala, Arthur Szlam, Rob Fergus	In this paper we introduce a generative model capable of producing high quality samples of natural images.
167	Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation	Seunghoon Hong, Hyeonwoo Noh, Bohyung Han	We propose a novel deep neural network architecture for semi-supervised semantic segmentation using heterogeneous annotations.
168	Equilibrated adaptive learning rates for non-convex optimization	Yann Dauphin, Harm de Vries, Yoshua Bengio	We introduce a novel adaptive learning rate scheme, called ESGD, based on the equilibration preconditioner.
169	BACKSHIFT: Learning causal cyclic graphs from unknown shift interventions	Dominik Rothenhäusler, Christina Heinze, Jonas Peters, Nicolai Meinshausen	We propose a simple method to learn linear causal cyclic models in the presence of latent variables.
170	Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach	Yinlam Chow, Aviv Tamar, Shie Mannor, Marco Pavone	In this paper we address the problem of decision making within a Markov decision process (MDP) framework where risk and modeling errors are taken into account.
171	Asynchronous stochastic convex optimization: the noise is in the noise and SGD don't care	Sorathan Chaturapruek, John C. Duchi, Christopher Ré	We show that asymptotically, completely asynchronous stochastic gradient procedures achieve optimal (even to constant factors) convergence rates for the solution of convex optimization problems under nearly the same conditions required for asymptotic optimality of standard stochastic gradient procedures.
172	Lifelong Learning with Non-i.i.d. Tasks	Anastasia Pentina, Christoph H. Lampert	In this work we aim at extending theoretical foundations of lifelong learning.
173	Optimal Linear Estimation under Unknown Nonlinear Transform	Xinyang Yi, Zhaoran Wang, Constantine Caramanis, Han Liu	We propose a novel spectral-based estimation procedure and show that we can recover $\beta^*$ in settings (i.e., classes of link function $f$) where previous algorithms fail.
174	Learning with Group Invariant Features: A Kernel Perspective.	Youssef Mroueh, Stephen Voinea, Tomaso A. Poggio	We analyze in this paper a random feature map based on a theory of invariance (I-theory) introduced in [AnselmiLRMTP13].
175	Regularized EM Algorithms: A Unified Framework and Statistical Guarantees	Xinyang Yi, Constantine Caramanis	We address precisely this setting through a unified treatment using regularization.
176	Distributionally Robust Logistic Regression	Soroosh Shafieezadeh Abadeh, Peyman Mohajerin Esfahani, Daniel Kuhn	This paper proposes a distributionally robust approach to logistic regression.
177	Adaptive Stochastic Optimization: From Sets to Paths	Zhan Wei Lim, David Hsu, Wee Sun Lee	We describe Recursive Adaptive Coverage (RAC), a new adaptive stochastic optimization algorithm that exploits these conditions, and apply it to two planning tasks under uncertainty.
178	Beyond Convexity: Stochastic Quasi-Convex Optimization	Elad Hazan, Kfir Levy, Shai Shalev-Shwartz	In this paper we analyze a stochastic version of NGD and prove its convergence to a global minimum for a wider class of functions: we require the functions to be quasi-convex and locally-Lipschitz.
179	A Tractable Approximation to Optimal Point Process Filtering: Application to Neural Encoding	Yuval Harel, Ron Meir, Manfred Opper	We develop an analytically tractable Bayesian approximation to optimal filtering based on point process observations, which allows us to introduce distributional assumptions about sensory cell properties, that greatly facilitates the analysis of optimal encoding in situations deviating from common assumptions of uniform coding.
180	Sum-of-Squares Lower Bounds for Sparse PCA	Tengyu Ma, Avi Wigderson	Specifically, we consider the Sparse Principal Component Analysis (Sparse PCA) problem, and the family of Sum-of-Squares (SoS, aka Lasserre/Parrilo) convex relaxations.
181	Max-Margin Majority Voting for Learning from Crowds	Tian Tian, Jun Zhu	This paper presents max-margin majority voting (M^3V) to improve the discriminative ability of majority voting and further presents a Bayesian generalization to incorporate the flexibility of generative methods on modeling noisy observations with worker confusion matrices.
182	Learning with Incremental Iterative Regularization	Lorenzo Rosasco, Silvia Villa	Within a statistical learning setting, we propose and study an iterative regularization algorithm for least squares defined by an incremental gradient method.
183	Halting in Random Walk Kernels	Mahito Sugiyama, Karsten Borgwardt	We theoretically show that halting may occur in geometric random walk kernels.
184	MCMC for Variationally Sparse Gaussian Processes	James Hensman, Alexander G. Matthews, Maurizio Filippone, Zoubin Ghahramani	This paper simultaneously addresses these, using a variational approximation to the posterior which is sparse in support of the function but otherwise free-form.
185	Less is More: Nyström Computational Regularization	Alessandro Rudi, Raffaello Camoriano, Lorenzo Rosasco	We study Nyström type subsampling approaches to large scale kernel methods, and prove learning bounds in the statistical learning setting, where random sampling and high probability estimates are considered.
186	Infinite Factorial Dynamical Model	Isabel Valera, Francisco Ruiz, Lennart Svensson, Fernando Perez-Cruz	We propose the infinite factorial dynamic model (iFDM), a general Bayesian nonparametric model for source separation.
187	Regularization Path of Cross-Validation Error Lower Bounds	Atsushi Shibagaki, Yoshiki Suzuki, Masayuki Karasuyama, Ichiro Takeuchi	Careful tuning of a regularization parameter is indispensable in many machine learning tasks because it has a significant impact on generalization performance. Nevertheless, current practice of regularization parameter tuning is more of an art than a science, e.g., it is hard to tell how many grid points would be needed in cross-validation (CV) for obtaining a solution with sufficiently small CV error. In this paper we propose a novel framework for computing a lower bound of the CV errors as a function of the regularization parameter, which we call the regularization path of CV error lower bounds. The proposed framework can be used to provide a theoretical approximation guarantee on a set of solutions, in the sense of how far the CV error of the current best solution could be from the best possible CV error in the entire range of the regularization parameters. We demonstrate through numerical experiments that a theoretically guaranteed choice of regularization parameter in the above sense is possible with reasonable computational costs.
188	Attractor Network Dynamics Enable Preplay and Rapid Path Planning in Maze-like Environments	Dane S. Corneil, Wulfram Gerstner	Here, we show how a particular mapping of space allows for the immediate generation of trajectories between arbitrary start and goal locations in an environment, based only on the mapped representation of the goal.
189	Teaching Machines to Read and Comprehend	Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, Phil Blunsom	In this work we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data.
190	Principal Differences Analysis: Interpretable Characterization of Differences between Distributions	Jonas W. Mueller, Tommi Jaakkola	We introduce principal differences analysis for analyzing differences between high-dimensional distributions.
191	When are Kalman-Filter Restless Bandits Indexable?	Christopher R. Dance, Tomi Silander	We study the restless bandit associated with an extremely simple scalar Kalman filter model in discrete time.
192	Segregated Graphs and Marginals of Chain Graph Models	Ilya Shpitser	In this paper, we show that special mixed graphs which we call segregated graphs can be associated, via a Markov property, with supermodels of a marginal of chain graphs defined only by conditional independences.
193	Efficient Non-greedy Optimization of Decision Trees	Mohammad Norouzi, Maxwell Collins, Matthew A. Johnson, David J. Fleet, Pushmeet Kohli	In this paper, we present an algorithm for optimizing the split functions at all levels of the tree jointly with the leaf parameters, based on a global objective.
194	Probabilistic Curve Learning: Coulomb Repulsion and the Electrostatic Gaussian Process	Ye Wang, David B. Dunson	We solve these issues by proposing a novel Coulomb repulsive process (Corp) for locations of points on the manifold, inspired by physical models of electrostatic interactions among particles.
195	Inverse Reinforcement Learning with Locally Consistent Reward Functions	Quoc Phong Nguyen, Bryan Kian Hsiang Low, Patrick Jaillet	This paper presents a novel generalization of the IRL problem that allows each trajectory to be generated by multiple locally consistent reward functions, hence catering to more realistic and complex experts’ behaviors.
196	Communication Complexity of Distributed Convex Learning and Optimization	Yossi Arjevani, Ohad Shamir	We study the fundamental limits to communication-efficient distributed methods for convex learning and optimization, under different assumptions on the information available to individual machines, and the types of functions considered.
197	End-to-end Learning of LDA by Mirror-Descent Back Propagation over a Deep Architecture	Jianshu Chen, Ji He, Yelong Shen, Lin Xiao, Xiaodong He, Jianfeng Gao, Xinying Song, Li Deng	We develop a fully discriminative learning approach for supervised Latent Dirichlet Allocation (LDA) model using Back Propagation (i.e., BP-sLDA), which maximizes the posterior probability of the prediction variable given the input document.
198	Subset Selection by Pareto Optimization	Chao Qian, Yang Yu, Zhi-Hua Zhou	In this paper, we propose the POSS approach which employs evolutionary Pareto optimization to find a small-sized subset with good performance.
199	On the Accuracy of Self-Normalized Log-Linear Models	Jacob Andreas, Maxim Rabinovich, Michael I. Jordan, Dan Klein	In this paper, we analyze a recently proposed technique known as “self-normalization”, which introduces a regularization term in training to penalize log normalizers for deviating from zero.
200	Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring	Junpei Komiyama, Junya Honda, Hiroshi Nakagawa	In this paper, we study partial monitoring with finite actions and stochastic outcomes.
201	Is Approval Voting Optimal Given Approval Votes?	Ariel D. Procaccia, Nisarg Shah	We challenge this assertion by proposing a probabilistic framework of noisy voting, and asking whether approval voting yields an alternative that is most likely to be the best alternative, given k-approval votes.
202	Regressive Virtual Metric Learning	Michaël Perrot, Amaury Habrard	In this paper, instead of bringing closer examples of the same class and pushing far away examples of different classes we propose to move the examples with respect to virtual points.
203	Analysis of Robust PCA via Local Incoherence	Huishuai Zhang, Yi Zhou, Yingbin Liang	We investigate the robust PCA problem of decomposing an observed matrix into the sum of a low-rank and a sparse error matrices via convex programming Principal Component Pursuit (PCP).
204	Learning to Transduce with Unbounded Memory	Edward Grefenstette, Karl Moritz Hermann, Mustafa Suleyman, Phil Blunsom	In this paper we explore the representational power of these models using synthetic grammars designed to exhibit phenomena similar to those found in real transduction problems such as machine translation.
205	Max-Margin Deep Generative Models	Chongxuan Li, Jun Zhu, Tianlin Shi, Bo Zhang	This paper presents max-margin deep generative models (mmDGMs), which explore the strongly discriminative principle of max-margin learning to improve the discriminative power of DGMs, while retaining the generative capability.
206	Spherical Random Features for Polynomial Kernels	Jeffrey Pennington, Felix Xinnan X. Yu, Sanjiv Kumar	The question we address in this work is: if we know a priori that data is so normalized, can we devise a more compact map?
207	Rectified Factor Networks	Djork-Arné Clevert, Andreas Mayr, Thomas Unterthiner, Sepp Hochreiter	We propose rectified factor networks (RFNs) to efficiently construct very sparse, non-linear, high-dimensional representations of the input.
208	Learning Bayesian Networks with Thousands of Variables	Mauro Scanagatta, Cassio P. de Campos, Giorgio Corani, Marco Zaffalon	We present a method for learning Bayesian networks from data sets containing thousands of variables without the need for structure constraints.
209	Matrix Completion Under Monotonic Single Index Models	Ravi Sastry Ganti, Laura Balzano, Rebecca Willett	We propose a novel matrix completion method that alternates between low-rank matrix estimation and monotonic function estimation to estimate the missing matrix elements.
210	Visalogy: Answering Visual Analogy Questions	Fereshteh Sadeghi, C. Lawrence Zitnick, Ali Farhadi	In this paper, we study the problem of answering visual analogy questions. We pose this problem as learning an embedding that encourages pairs of analogous images with similar transformations to be close together using convolutional neural networks with a quadruple Siamese architecture.
211	Tree-Guided MCMC Inference for Normalized Random Measure Mixture Models	Juho Lee, Seungjin Choi	In this paper, we present a hybrid inference algorithm for NRMM models, which combines the merits of both MCMC and IBHC.
212	Streaming Min-max Hypergraph Partitioning	Dan Alistarh, Jennifer Iglesias, Milan Vojnovic	We consider the problem of partitioning the set of items into a given number of parts such that the maximum number of topics covered by a part of the partition is minimized.
213	Collaboratively Learning Preferences from Ordinal Data	Sewoong Oh, Kiran K. Thekumparampil, Jiaming Xu	We present the convex relaxation approach in two contexts of interest: collaborative ranking and bundled choice modeling.
214	Biologically Inspired Dynamic Textures for Probing Motion Perception	Jonathan Vacher, Andrew Isaac Meso, Laurent U. Perrinet, Gabriel Peyré	Importantly, we show that this model can equivalently be described as a stochastic partial differential equation.
215	Generative Image Modeling Using Spatial LSTMs	Lucas Theis, Matthias Bethge	We here introduce a recurrent image model based on multi-dimensional long short-term memory units which are particularly suited for image modeling due to their spatial structure.
216	Robust PCA with compressed data	Wooseok Ha, Rina Foygel Barber	We examine the robust principal component analysis (RPCA) problem under data compression, where the data $Y$ is approximately given by $(L + S)\cdot C$, that is, a low-rank $+$ sparse data matrix that has been compressed to size $n\times m$ (with $m$ substantially smaller than the original dimension $d$) via multiplication with a compression matrix $C$.
217	Sampling from Probabilistic Submodular Models	Alkis Gotovos, Hamed Hassani, Andreas Krause	In this paper, we investigate the use of Markov chain Monte Carlo sampling to perform approximate inference in general log-submodular and log-supermodular models.
218	COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution	Mehrdad Farajtabar, Yichen Wang, Manuel Gomez Rodriguez, Shuang Li, Hongyuan Zha, Le Song	We experimented with both synthetic data and data gathered from Twitter, and show that our model provides a good fit to the data as well as more accurate predictions than alternatives.
219	Supervised Learning for Dynamical System Learning	Ahmed Hefny, Carlton Downey, Geoffrey J. Gordon	We demonstrate the effectiveness of our framework by showing examples where nonlinear regression or lasso let us learn better state representations than plain linear regression does; the correctness of these instances follows directly from our general analysis.
220	Regret-Based Pruning in Extensive-Form Games	Noam Brown, Tuomas Sandholm	The new algorithm maintains CFR’s convergence guarantees while making iterations significantly faster—even if previously known pruning techniques are used in the comparison.
221	Fast Two-Sample Testing with Analytic Representations of Probability Measures	Kacper P. Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, Arthur Gretton	We propose a class of nonparametric two-sample tests with a cost linear in the sample size.
222	Learning to Segment Object Candidates	Pedro O. Pinheiro, Ronan Collobert, Piotr Dollar	In this paper, we propose a new way to generate object proposals, introducing an approach based on a discriminative convolutional network.
223	GP Kernels for Cross-Spectrum Analysis	Kyle R. Ulrich, David E. Carlson, Kafui Dzirasa, Lawrence Carin	In this paper, we develop a novel covariance kernel for multiple outputs, called the cross-spectral mixture (CSM) kernel.
224	Secure Multi-party Differential Privacy	Peter Kairouz, Sewoong Oh, Pramod Viswanath	We study the problem of multi-party interactive function computation under differential privacy.
225	Spatial Transformer Networks	Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu	In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network.
226	Anytime Influence Bounds and the Explosive Behavior of Continuous-Time Diffusion Networks	Kevin Scaman, Rémi Lemonnier, Nicolas Vayatis	We introduce the Laplace Hazard matrix and show that its spectral radius fully characterizes the dynamics of the contagion both in terms of influence and of explosion time.
227	Multi-class SVMs: From Tighter Data-Dependent Generalization Bounds to Novel Algorithms	Yunwen Lei, Urun Dogan, Alexander Binder, Marius Kloft	This paper studies the generalization performance of multi-class classification algorithms, for which we obtain, for the first time, a data-dependent generalization error bound with a logarithmic dependence on the class size, substantially improving the state-of-the-art linear dependence in the existing data-dependent generalization analysis.
228	High-dimensional neural spike train analysis with generalized count linear dynamical systems	Yuanjun Gao, Lars Busing, Krishna V. Shenoy, John P. Cunningham	We apply our model to data from primate motor cortex and demonstrate performance improvements over state-of-the-art methods, both in capturing the variance structure of the data and in held-out prediction.
229	Learning with a Wasserstein Loss	Charlie Frogner, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya, Tomaso A. Poggio	In this paper we develop a loss function for multi-label learning, based on the Wasserstein distance.
230	b-bit Marginal Regression	Martin Slawski, Ping Li	We consider the problem of sparse signal recovery from $m$ linear measurements quantized to $b$ bits.
231	Natural Neural Networks	Guillaume Desjardins, Karen Simonyan, Razvan Pascanu, Koray Kavukcuoglu	We introduce Natural Neural Networks, a novel family of algorithms that speed up convergence by adapting their internal representation during training to improve conditioning of the Fisher matrix.
232	Optimization Monte Carlo: Efficient and Embarrassingly Parallel Likelihood-Free Inference	Ted Meeds, Max Welling	We describe an embarrassingly parallel, anytime Monte Carlo method for likelihood-free models.
233	Adaptive Primal-Dual Splitting Methods for Statistical Learning and Image Processing	Tom Goldstein, Min Li, Xiaoming Yuan	We propose self-adaptive stepsize rules that automatically tune PDHG parameters for optimal convergence.
234	On some provably correct cases of variational inference for topic models	Pranjal Awasthi, Andrej Risteski	We provide the first analysis of instances where variational inference algorithms converge to the global optimum, in the setting of topic models.
235	Collaborative Filtering with Graph Information: Consistency and Scalable Methods	Nikhil Rao, Hsiang-Fu Yu, Pradeep K. Ravikumar, Inderjit S. Dhillon	We tackle the problem of matrix completion when pairwise relationships among variables are known, via a graph.
236	Combinatorial Bandits Revisited	Richard Combes, Mohammad Sadegh Talebi Mazraeh Shahi, Alexandre Proutiere, Marc Lelarge	We propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret.
237	Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning	Shakir Mohamed, Danilo Jimenez Rezende	This paper provides a new approach for scalable optimisation of the mutual information by merging techniques from variational inference and deep learning.
238	A Structural Smoothing Framework For Robust Graph Comparison	Pinar Yanardag, S.V.N. Vishwanathan	In this paper, we propose a general smoothing framework for graph kernels by taking structural similarity into account, and apply it to derive smoothed variants of popular graph kernels.
239	Competitive Distribution Estimation: Why is Good-Turing Good	Alon Orlitsky, Ananda Theertha Suresh	Conversely, we show that any estimator must have a KL divergence $\ge\tilde\Omega(\min(k/n,1/ n^{2/3}))$ over the best estimator for the first comparison, and $\ge\tilde\Omega(\min(k/n,1/\sqrt{n}))$ for the second.
240	Efficient Learning by Directed Acyclic Graph For Resource Constrained Prediction	Joseph Wang, Kirill Trapeznikov, Venkatesh Saligrama	Rather than jointly optimizing such a highly coupled and non-convex problem over all decision nodes, we propose an efficient algorithm motivated by dynamic programming.
241	A hybrid sampler for Poisson-Kingman mixture models	Maria Lomeli, Stefano Favaro, Yee Whye Teh	We present a novel and compact way of representing the infinite dimensional component of the model such that while explicitly representing this infinite component it has less memory and storage requirements than previous MCMC schemes.
242	An Active Learning Framework using Sparse-Graph Codes for Sparse Polynomials and Graph Sketching	Xiao Li, Kannan Ramchandran	We introduce an active learning framework that is associated with a low query cost and computational runtime.
243	Local Smoothness in Variance Reduced Optimization	Daniel Vainsencher, Han Liu, Tong Zhang	We propose a family of non-uniform sampling strategies to provably speed up a class of stochastic optimization algorithms with linear convergence including Stochastic Variance Reduced Gradient (SVRG) and Stochastic Dual Coordinate Ascent (SDCA).
244	Saliency, Scale and Information: Towards a Unifying Theory	Shafin Rahman, Neil Bruce	In this paper we present a definition for visual saliency grounded in information theory.
245	Fighting Bandits with a New Kind of Smoothness	Jacob D. Abernethy, Chansoo Lee, Ambuj Tewari	In the present work, we provide a new set of analysis tools, using the notion of convex smoothing, to provide several novel algorithms with optimal guarantees.
246	Beyond Sub-Gaussian Measurements: High-Dimensional Structured Estimation with Sub-Exponential Designs	Vidyashankar Sivakumar, Arindam Banerjee, Pradeep K. Ravikumar	We consider the problem of high-dimensional structured estimation with norm-regularized estimators, such as Lasso, when the design matrix and noise are drawn from sub-exponential distributions. Existing results only consider sub-Gaussian designs and noise, and both the sample complexity and non-asymptotic estimation error have been shown to depend on the Gaussian width of suitable sets.
247	Spectral Norm Regularization of Orthonormal Representations for Graph Transduction	Rakesh Shivanna, Bibaswan K. Chatterjee, Raman Sankaran, Chiranjib Bhattacharyya, Francis Bach	In this paper, we show that orthonormal representations, a class of unit-sphere graph embeddings, are PAC learnable.
248	Convolutional Networks on Graphs for Learning Molecular Fingerprints	David K. Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alan Aspuru-Guzik, Ryan P. Adams	We introduce a convolutional neural network that operates directly on graphs. These networks allow end-to-end learning of prediction pipelines whose inputs are graphs of arbitrary size and shape. The architecture we present generalizes standard molecular feature extraction methods based on circular fingerprints. We show that these data-driven features are more interpretable, and have better predictive performance on a variety of tasks.
249	Mixed Robust/Average Submodular Partitioning: Fast Algorithms, Guarantees, and Applications	Kai Wei, Rishabh K. Iyer, Shengjie Wang, Wenruo Bai, Jeff A. Bilmes	In the present paper, we bridge this gap, by proposing several new algorithms (including greedy, majorization-minimization, minorization-maximization, and relaxation algorithms) that not only scale to large datasets but that also achieve theoretical approximation guarantees comparable to the state-of-the-art.
250	Tractable Learning for Complex Probability Queries	Jessa Bekker, Jesse Davis, Arthur Choi, Adnan Darwiche, Guy Van den Broeck	We propose a tractable learner that guarantees efficient inference for a broader class of queries.
251	Stop Wasting My Gradients: Practical SVRG	Reza Harikandeh, Mohamed Osama Ahmed, Alim Virani, Mark Schmidt, Jakub Konečný, Scott Sallinen	We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods.
252	Mind the Gap: A Generative Approach to Interpretable Feature Selection and Extraction	Been Kim, Julie A. Shah, Finale Doshi-Velez	We present the Mind the Gap Model (MGM), an approach for interpretable feature extraction and selection.
253	A Normative Theory of Adaptive Dimensionality Reduction in Neural Networks	Cengiz Pehlevan, Dmitri Chklovskii	Here, we derive biologically plausible dimensionality reduction algorithms which adapt the number of output dimensions to the eigenspectrum of the input covariance matrix.
254	On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators	Changyou Chen, Nan Ding, Lawrence Carin	In this paper we consider general SG-MCMCs with high-order integrators, and develop theory to analyze finite-time convergence properties and their asymptotic invariant measures.
255	Learning structured densities via infinite dimensional exponential families	Siqi Sun, Mladen Kolar, Jinbo Xu	In this paper, we study the problem of estimating the structure of a probabilistic graphical model without assuming a particular parametric model.
256	Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question	Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu	In this paper, we present the mQA model, which is able to answer questions about the content of an image. We construct a Freestyle Multilingual Image Question Answering (FM-IQA) dataset to train and evaluate our mQA model.
257	Variance Reduced Stochastic Gradient Descent with Neighbors	Thomas Hofmann, Aurelien Lucchi, Simon Lacoste-Julien, Brian McWilliams	Recently, variance reduction techniques such as SVRG and SAGA have been proposed to overcome this weakness.
258	Sample Efficient Path Integral Control under Uncertainty	Yunpeng Pan, Evangelos Theodorou, Michail Kontitsis	We present a data-driven stochastic optimal control framework that is derived using the path integral (PI) control approach.
259	Stochastic Expectation Propagation	Yingzhen Li, José Miguel Hernández-Lobato, Richard E. Turner	This paper presents an extension to EP, called stochastic expectation propagation (SEP), that maintains a global posterior approximation (like VI) but updates it in a local way (like EP).
260	Exactness of Approximate MAP Inference in Continuous MRFs	Nicholas Ruozzi	In this work, we use graph covers to provide necessary and sufficient conditions for continuous MAP relaxations to be tight.
261	Scale Up Nonlinear Component Analysis with Doubly Stochastic Gradients	Bo Xie, Yingyu Liang, Le Song	We demonstrate the effectiveness and scalability of our algorithm on large scale synthetic and real world datasets.
262	Generalization in Adaptive Data Analysis and Holdout Reuse	Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toni Pitassi, Omer Reingold, Aaron Roth	We give an algorithm that enables the validation of a large number of adaptively chosen hypotheses, while provably avoiding overfitting.
263	Market Scoring Rules Act As Opinion Pools For Risk-Averse Agents	Mithun Chakraborty, Sanmay Das	In this paper, we add to a growing body of research aimed at understanding the precise manner in which the price process induced by a MSR incorporates private information from agents who deviate from the assumption of risk-neutrality.
264	Sparse Linear Programming via Primal and Dual Augmented Coordinate Descent	Ian En-Hsu Yen, Kai Zhong, Cho-Jui Hsieh, Pradeep K. Ravikumar, Inderjit S. Dhillon	In this paper, we investigate a general LP algorithm based on the combination of Augmented Lagrangian and Coordinate Descent (AL-CD), giving an iteration complexity of $O((\log(1/\epsilon))^2)$ with $O(nnz(A))$ cost per iteration, where $nnz(A)$ is the number of non-zeros in the $m\times n$ constraint matrix $A$, and in practice, one can further reduce cost per iteration to the order of non-zeros in columns (rows) corresponding to the active primal (dual) variables through an active-set strategy.
265	Training Very Deep Networks	Rupesh K. Srivastava, Klaus Greff, Jürgen Schmidhuber	Here we introduce a new architecture designed to overcome this.
266	Bayesian Active Model Selection with an Application to Automated Audiometry	Jacob Gardner, Gustavo Malkomes, Roman Garnett, Kilian Q. Weinberger, Dennis Barbour, John P. Cunningham	We introduce a novel information-theoretic approach for active model selection and demonstrate its effectiveness in a real-world application.
267	Particle Gibbs for Infinite Hidden Markov Models	Nilesh Tripuraneni, Shixiang (Shane) Gu, Hong Ge, Zoubin Ghahramani	In this paper, we present an infinite-state Particle Gibbs (PG) algorithm to resample state trajectories for the iHMM.
268	Learning spatiotemporal trajectories from manifold-valued longitudinal data	Jean-Baptiste SCHIRATTI, Stéphanie ALLASSONNIERE, Olivier Colliot, Stanley DURRLEMAN	We propose a Bayesian mixed-effects model to learn typical scenarios of changes from longitudinal manifold-valued data, namely repeated measurements of the same objects or individuals at several points in time.
269	A Bayesian Framework for Modeling Confidence in Perceptual Decision Making	Koosha Khalvati, Rajesh P. Rao	In this paper, we introduce a Bayesian framework to model confidence in perceptual decision making.
270	Path-SGD: Path-Normalized Optimization in Deep Neural Networks	Behnam Neyshabur, Ruslan R. Salakhutdinov, Nati Srebro	We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights.
271	On the consistency theory of high dimensional variable screening	Xiangyu Wang, Chenlei Leng, David B. Dunson	When the data dimension $p$ is substantially larger than the sample size $n$, variable screening becomes crucial as 1) faster feature selection algorithms are needed; 2) conditions guaranteeing selection consistency might fail to hold. This article studies a class of linear screening methods and establishes consistency theory for this special class.
272	End-To-End Memory Networks	Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus	We introduce a neural network with a recurrent attention model over a possibly large external memory.
273	Spectral Representations for Convolutional Neural Networks	Oren Rippel, Jasper Snoek, Ryan P. Adams	In this work, we demonstrate that, beyond its advantages for efficient computation, the spectral domain also provides a powerful representation in which to model and train convolutional neural networks (CNNs).
274	Online Gradient Boosting	Alina Beygelzimer, Elad Hazan, Satyen Kale, Haipeng Luo	We extend the theory of boosting for regression problems to the online learning setting.
275	Deep Temporal Sigmoid Belief Networks for Sequence Modeling	Zhe Gan, Chunyuan Li, Ricardo Henao, David E. Carlson, Lawrence Carin	Scalable learning and inference algorithms are derived by introducing a recognition model that yields fast sampling from the variational posterior.
276	Recognizing retinal ganglion cells in the dark	Emile Richard, Georges A. Goetz, E.J. Chichilnisky	Here, we develop automated classifiers for functional identification of retinal ganglion cells, the output neurons of the retina, based solely on recorded voltage patterns on a large scale array.
277	A Theory of Decision Making Under Dynamic Context	Michael Shvartsman, Vaibhav Srivastava, Jonathan D. Cohen	In this work, we describe a computational theory of decision making under dynamically shifting context.
278	A Gaussian Process Model of Quasar Spectral Energy Distributions	Andrew Miller, Albert Wu, Jeff Regier, Jon McAuliffe, Dustin Lang, Mr. Prabhat, David Schlegel, Ryan P. Adams	We propose a method for combining two sources of astronomical data, spectroscopy and photometry, that carry information about sources of light (e.g., stars, galaxies, and quasars) at extremely different spectral resolutions.
279	Hidden Technical Debt in Machine Learning Systems	D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison	This paper argues it is dangerous to think of these quick wins as coming for free.
280	Local Causal Discovery of Direct Causes and Effects	Tian Gao, Qiang Ji	We propose a new local causal discovery algorithm, called Causal Markov Blanket (CMB), to identify the direct causes and effects of a target variable based on Markov Blanket Discovery.
281	High Dimensional EM Algorithm: Statistical Optimization and Asymptotic Normality	Zhaoran Wang, Quanquan Gu, Yang Ning, Han Liu	In particular, we make two contributions: (i) For parameter estimation, we propose a novel high dimensional EM algorithm which naturally incorporates sparsity structure into parameter estimation.
282	Revenue Optimization against Strategic Buyers	Mehryar Mohri, Andres Munoz	We present a revenue optimization algorithm for posted-price auctions when facing a buyer with random valuations who seeks to optimize his $\gamma$-discounted surplus.
283	Deep Convolutional Inverse Graphics Network	Tejas D. Kulkarni, William F. Whitney, Pushmeet Kohli, Josh Tenenbaum	This paper presents the Deep Convolution Inverse Graphics Network (DC-IGN), a model that aims to learn an interpretable representation of images, disentangled with respect to three-dimensional scene structure and viewing transformations such as depth rotations and lighting variations.
284	Sparse and Low-Rank Tensor Decomposition	Parikshit Shah, Nikhil Rao, Gongguo Tang	We present an efficient computational algorithm that modifies Leurgans’ algorithm for tensor factorization.
285	Minimax Time Series Prediction	Wouter M. Koolen, Alan Malek, Peter L. Bartlett, Yasin Abbasi	We derive the minimax strategy for all problems of this type and show that it can be implemented efficiently.
286	Differentially Private Learning of Structured Discrete Distributions	Ilias Diakonikolas, Moritz Hardt, Ludwig Schmidt	Our goal is to design efficient algorithms that simultaneously achieve low error in total variation norm while guaranteeing Differential Privacy to the individuals of the population. We describe a general approach that yields near sample-optimal and computationally efficient differentially private estimators for a wide range of well-studied and natural distribution families.
287	Variational Dropout and the Local Reparameterization Trick	Durk P. Kingma, Tim Salimans, Max Welling	Our method allows inference of more flexibly parameterized posteriors; specifically, we propose variational dropout, a generalization of Gaussian dropout, but with a more flexibly parameterized posterior, often leading to better generalization.
288	Sample Complexity of Learning Mahalanobis Distance Metrics	Nakul Verma, Kristin Branson	In this work we provide PAC-style sample complexity rates for supervised metric learning.
289	Learning Wake-Sleep Recurrent Attention Models	Jimmy Ba, Ruslan R. Salakhutdinov, Roger B. Grosse, Brendan J. Frey	Borrowing techniques from the literature on training deep generative models, we present the Wake-Sleep Recurrent Attention Model, a method for training stochastic attention networks which improves posterior inference and which reduces the variability in the stochastic gradients.
290	Robust Gaussian Graphical Modeling with the Trimmed Graphical Lasso	Eunho Yang, Aurelie C. Lozano	In this paper, we propose the Trimmed Graphical Lasso for robust estimation of sparse GGMs.
291	Testing Closeness With Unequal Sized Samples	Bhaswar Bhattacharya, Gregory Valiant	Specifically, given a target error parameter $\epsilon > 0$, $m_1$ independent draws from an unknown distribution $p$ with discrete support, and $m_2$ draws from an unknown distribution $q$ of discrete support, we describe a test for distinguishing the case that $p=q$ from the case that $\|p-q\|_1 \geq \epsilon$.
292	Estimating Jaccard Index with Missing Observations: A Matrix Calibration Approach	Wenye Li	This paper investigates the problem of estimating a Jaccard index matrix when there are missing observations in data samples.
293	Neural Adaptive Sequential Monte Carlo	Shixiang (Shane) Gu, Zoubin Ghahramani, Richard E. Turner	This paper presents a new method for automatically adapting the proposal using an approximation of the Kullback-Leibler divergence between the true posterior and the proposal distribution.
294	Local Expectation Gradients for Black Box Variational Inference	Michalis Titsias, Miguel Lázaro-Gredilla	We introduce local expectation gradients, a general-purpose stochastic variational inference algorithm for constructing stochastic gradients by sampling from the variational distribution.
295	On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants	Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alexander J. Smola	We bridge this gap by presenting a unifying framework that captures many variance reduction techniques. Subsequently, we propose an asynchronous algorithm grounded in our framework, with fast convergence rates.
296	NEXT: A System for Real-World Development, Evaluation, and Application of Active Learning	Kevin G. Jamieson, Lalit Jain, Chris Fernandez, Nicholas J. Glattard, Rob Nowak	Active learning methods automatically adapt data collection by selecting the most informative samples in order to accelerate machine learning.
297	Super-Resolution Off the Grid	Qingqing Huang, Sham M. Kakade	This work provides an algorithm with the following favorable guarantees:1.
298	Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms	Christopher M. De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré	Specifically, we use our new analysis in three ways: (1) we derive convergence rates for the convex case (Hogwild) with relaxed assumptions on the sparsity of the problem; (2) we analyze asynchronous SGD algorithms for non-convex matrix problems including matrix completion; and (3) we design and analyze an asynchronous SGD algorithm, called Buckwild, that uses lower-precision arithmetic.
299	The Return of the Gating Network: Combining Generative Models and Discriminative Training in Natural Image Priors	Dan Rosenbaum, Yair Weiss	In this paper we show how to combine the strengths of both approaches by training a discriminative, feed-forward architecture to predict the state of latent variables in a generative model of natural images.
300	Pointer Networks	Oriol Vinyals, Meire Fortunato, Navdeep Jaitly	We introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence. Such problems cannot be trivially addressed by existent approaches such as sequence-to-sequence and Neural Turing Machines, because the number of target classes in each step of the output depends on the length of the input, which is variable. Problems such as sorting variable sized sequences, and various combinatorial optimization problems belong to this class.
301	Associative Memory via a Sparse Recovery Model	Arya Mazumdar, Ankit Singh Rawat	In this paper, for the first time, we propose a model of associative memory based on sparse recovery of signals.
302	Robust Spectral Inference for Joint Stochastic Matrix Factorization	Moontae Lee, David Bindel, David Mimno	Spectral inference provides fast algorithms and provable optimality for latent topic analysis.
303	Fast, Provable Algorithms for Isotonic Regression in all L_p-norms	Rasmus Kyng, Anup Rao, Sushant Sachdeva	This paper gives improved algorithms for computing the Isotonic Regression for all weighted $\ell_{p}$-norms with rigorous performance guarantees.
304	Adversarial Prediction Games for Multivariate Losses	Hong Wang, Wei Xing, Kaiser Asif, Brian Ziebart	We propose to approximate the training data instead of the loss function by posing multivariate prediction as an adversarial game between a loss-minimizing prediction player and a loss-maximizing evaluation player constrained to match specified properties of training data.
305	Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization	Xiangru Lian, Yijun Huang, Yuncheng Li, Ji Liu	We establish an ergodic convergence rate $O(1/\sqrt{K})$ for both algorithms and prove that the linear speedup is achievable if the number of workers is bounded by $\sqrt{K}$ ($K$ is the total number of iterations).
306	Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images	Manuel Watter, Jost Springenberg, Joschka Boedecker, Martin Riedmiller	We introduce Embed to Control (E2C), a method for model learning and control of non-linear dynamical systems from raw pixel images.
307	Efficient and Parsimonious Agnostic Active Learning	Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire	We develop a new active learning algorithm for the streaming setting satisfying three important properties: 1) It provably works for any classifier representation and classification problem including those with severe noise.
308	Softstar: Heuristic-Guided Probabilistic Inference	Mathew Monfort, Brenden M. Lake, Brian Ziebart, Patrick Lucey, Josh Tenenbaum	We propose the Softstar algorithm, a softened heuristic-guided search technique for the maximum entropy inverse optimal control model of sequential behavior.
309	Grammar as a Foreign Language	Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton	In this paper we show that the domain agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers.
310	Regularization-Free Estimation in Trace Regression with Symmetric Positive Semidefinite Matrices	Martin Slawski, Ping Li, Matthias Hein	In this paper, we argue that such regularization may no longer be necessary if the underlying matrix is symmetric positive semidefinite (spd) and the design satisfies certain conditions.
311	Winner-Take-All Autoencoders	Alireza Makhzani, Brendan J. Frey	In this paper, we propose a winner-take-all method for learning hierarchical sparse representations in an unsupervised fashion.
312	Deep Poisson Factor Modeling	Ricardo Henao, Zhe Gan, James Lu, Lawrence Carin	We propose a new deep architecture for topic modeling, based on Poisson Factor Analysis (PFA) modules.
313	Bayesian Optimization with Exponential Convergence	Kenji Kawaguchi, Leslie Pack Kaelbling, Tomás Lozano-Pérez	This paper presents a Bayesian optimization method with exponential convergence without the need of auxiliary optimization and without the delta-cover sampling.
314	Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning	Christoph Dann, Emma Brunskill	In this paper, we derive an upper PAC bound of order O(|S|²|A|H² log(1/δ)/ɛ²) and a lower PAC bound Ω(|S||A|H² log(1/(δ+c))/ɛ²) (ignoring log-terms) that match up to log-terms and an additional linear dependency on the number of states |S|.
315	Learning with Relaxed Supervision	Jacob Steinhardt, Percy S. Liang	In this paper, we develop a rigorous approach to relaxing the supervision, which yields asymptotically consistent parameter estimates despite altering the supervision.
316	Subsampled Power Iteration: a Unified Algorithm for Block Models and Planted CSP's	Vitaly Feldman, Will Perkins, Santosh Vempala	We present an algorithm for recovering planted solutions in two well-known models, the stochastic block model and planted constraint satisfaction problems (CSP), via a common generalization in terms of random bipartite graphs.
317	Accelerated Mirror Descent in Continuous and Discrete Time	Walid Krichene, Alexandre Bayen, Peter L. Bartlett	Combining the original continuous-time motivation of mirror descent with a recent ODE interpretation of Nesterov’s accelerated method, we propose a family of continuous-time descent dynamics for convex functions with Lipschitz gradients, such that the solution trajectories are guaranteed to converge to the optimum at an $O(1/t^2)$ rate.
318	The Human Kernel	Andrew G. Wilson, Christoph Dann, Chris Lucas, Eric P. Xing	Bayesian nonparametric models, such as Gaussian processes, provide a compelling framework for automatic statistical modelling: these models have a high degree of flexibility, and automatically calibrated complexity. In this paper, we create function extrapolation problems and acquire human responses, and then design a kernel learning framework to reverse engineer the inductive biases of human learners across a set of behavioral experiments.
319	Action-Conditional Video Prediction using Deep Networks in Atari Games	Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard L. Lewis, Satinder Singh	We propose and evaluate two deep neural network architectures that consist of encoding, action-conditional transformation, and decoding layers based on convolutional neural networks and recurrent neural networks.
320	A Pseudo-Euclidean Iteration for Optimal Recovery in Noisy ICA	James R. Voss, Mikhail Belkin, Luis Rademacher	We propose a new algorithm, PEGI (for pseudo-Euclidean Gradient Iteration), for provable model recovery for ICA with Gaussian noise.
321	Distributed Submodular Cover: Succinctly Summarizing Massive Data	Baharan Mirzasoleiman, Amin Karbasi, Ashwinkumar Badanidiyuru, Andreas Krause	In this paper, we formalize this challenge as a submodular cover problem.
322	Community Detection via Measure Space Embedding	Mark Kozdoba, Shie Mannor	We present a new algorithm for community detection.
323	Basis refinement strategies for linear value function approximation in MDPs	Gheorghe Comanici, Doina Precup, Prakash Panangaden	We provide a theoretical framework for analyzing basis function construction for linear value function approximation in Markov Decision Processes (MDPs).
324	Structured Estimation with Atomic Norms: General Bounds and Applications	Sheng Chen, Arindam Banerjee	In this paper, we present general upper bounds for such geometric measures, which only require simple information of the atomic norm under consideration, and we establish tightness of these bounds by providing the corresponding lower bounds.
325	A Complete Recipe for Stochastic Gradient MCMC	Yi-An Ma, Tianqi Chen, Emily Fox	In this paper, we provide a general recipe for constructing MCMC samplers, including stochastic gradient versions, based on continuous Markov processes specified via two matrices.
326	Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff	Ofer Dekel, Ronen Eldan, Tomer Koren	We present an efficient algorithm for the bandit smooth convex optimization problem that guarantees a regret of $\widetilde{O}(T^{5/8})$.
327	Online Prediction at the Limit of Zero Temperature	Mark Herbster, Stephen Pasteris, Shaona Ghosh	We design an online algorithm to classify the vertices of a graph.
328	Learning Continuous Control Policies by Stochastic Value Gradients	Nicolas Heess, Gregory Wayne, David Silver, Timothy Lillicrap, Tom Erez, Yuval Tassa	We present a unified framework for learning continuous control policies using backpropagation.
329	Exploring Models and Data for Image Question Answering	Mengye Ren, Ryan Kiros, Richard Zemel	This work aims to address the problem of image-based question-answering (QA) with new models and datasets.
330	Efficient and Robust Automated Machine Learning	Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, Frank Hutter	In this work we introduce a robust new AutoML system based on scikit-learn (using 15 classifiers, 14 feature preprocessing methods, and 4 data preprocessing methods, giving rise to a structured hypothesis space with 110 hyperparameters).
331	Preconditioned Spectral Descent for Deep Learning	David E. Carlson, Edo Collins, Ya-Ping Hsieh, Lawrence Carin, Volkan Cevher	We theoretically formalize our arguments and derive novel preconditioned non-Euclidean algorithms.
332	A Recurrent Latent Variable Model for Sequential Data	Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C. Courville, Yoshua Bengio	In this paper, we explore the inclusion of latent random variables into the hidden state of a recurrent neural network (RNN) by combining the elements of the variational autoencoder.
333	Fast Convergence of Regularized Learning in Games	Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, Robert E. Schapire	We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal form games.
334	Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation	Marijn F. Stollenga, Wonmin Byeon, Marcus Liwicki, Jürgen Schmidhuber	Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation
335	Reflection, Refraction, and Hamiltonian Monte Carlo	Hadi Mohasel Afshar, Justin Domke	We introduce a modification of the Leapfrog discretization of Hamiltonian dynamics on piecewise continuous energies, where intersections of the trajectory with discontinuities are detected, and the momentum is reflected or refracted to compensate for the change in energy.
336	The Consistency of Common Neighbors for Link Prediction in Stochastic Blockmodels	Purnamrita Sarkar, Deepayan Chakrabarti, Peter J. Bickel	The Consistency of Common Neighbors for Link Prediction in Stochastic Blockmodels
337	Nearly Optimal Private LASSO	Kunal Talwar, Abhradeep Guha Thakurta, Li Zhang	We present a nearly optimal differentially private version of the well known LASSO estimator.
338	Convergence Analysis of Prediction Markets via Randomized Subspace Descent	Rafael Frongillo, Mark D. Reid	We establish convergence rates for RSD and leverage them to prove rates for the two prediction market models above, answering the open questions.
339	The Poisson Gamma Belief Network	Mingyuan Zhou, Yulai Cong, Bo Chen	To infer a multilayer representation of high-dimensional count vectors, we propose the Poisson gamma belief network (PGBN) that factorizes each of its layers into the product of a connection weight matrix and the nonnegative real hidden units of the next layer.
340	Convergence rates of sub-sampled Newton methods	Murat A. Erdogdu, Andrea Montanari	In this paper, we shift our attention to a more general setting — maximum likelihood estimation.
341	No-Regret Learning in Bayesian Games	Jason Hartline, Vasilis Syrgkanis, Eva Tardos	Recent price-of-anarchy analyses of games of complete information suggest that coarse correlated equilibria, which characterize outcomes resulting from no-regret learning dynamics, have near-optimal welfare.
342	Statistical Topological Data Analysis – A Kernel Perspective	Roland Kwitt, Stefan Huber, Marc Niethammer, Weili Lin, Ulrich Bauer	Our contribution is to close this gap by proving universality of a variant of the original kernel, and to demonstrate its effective use in two-sample hypothesis testing on synthetic as well as real-world data.
343	Semi-supervised Sequence Learning	Andrew M. Dai, Quoc V. Le	We present two approaches to use unlabeled data to improve sequence learning with recurrent networks.
344	Structured Transforms for Small-Footprint Deep Learning	Vikas Sindhwani, Tara Sainath, Sanjiv Kumar	We propose a unified framework to learn a broad family of structured parameter matrices that are characterized by the notion of low displacement rank.
345	Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width	Christopher M. De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré	To help understand the behavior of Gibbs sampling, we introduce a new (hyper)graph property, called hierarchy width.
346	Interpolating Convex and Non-Convex Tensor Decompositions via the Subspace Norm	Qinqing Zheng, Ryota Tomioka	We consider the problem of recovering a low-rank tensor from its noisy observation.
347	Sample Complexity Bounds for Iterative Stochastic Policy Optimization	Marin Kobilarov	This paper is concerned with robustness analysis of decision making under uncertainty.
348	BinaryConnect: Training Deep Neural Networks with binary weights during propagations	Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David	We introduce BinaryConnect, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated.
349	Interactive Control of Diverse Complex Characters with Neural Networks	Igor Mordatch, Kendall Lowrey, Galen Andrew, Zoran Popovic, Emanuel V. Todorov	We present a method for training recurrent neural networks to act as near-optimal feedback controllers.
350	Submodular Hamming Metrics	Jennifer A. Gillenwater, Rishabh K. Iyer, Bethany Lusch, Rahul Kidambi, Jeff A. Bilmes	We show that there is a largely unexplored class of functions (positive polymatroids) that can define proper discrete metrics over pairs of binary vectors and that are fairly tractable to optimize over.
351	A Universal Primal-Dual Convex Optimization Framework	Alp Yurtsever, Quoc Tran Dinh, Volkan Cevher	We propose a new primal-dual algorithmic framework for a prototypical constrained convex optimization template.
352	Learning From Small Samples: An Analysis of Simple Decision Heuristics	Özgür Simsek, Marcus Buckmann	We focus on three families of heuristics: single-cue decision making, lexicographic decision making, and tallying.
353	Explore no more: Improved high-probability regret bounds for non-stochastic bandits	Gergely Neu	This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability.
354	Fast and Memory Optimal Low-Rank Matrix Approximation	Se-Young Yun, Marc Lelarge, Alexandre Proutiere	In this paper, we revisit the problem of constructing a near-optimal rank $k$ approximation of a matrix $M\in [0,1]^{m\times n}$ under the streaming data model where the columns of $M$ are revealed sequentially.
355	Learnability of Influence in Networks	Harikrishna Narasimhan, David C. Parkes, Yaron Singer	We establish PAC learnability of influence functions for three common influence models, namely, the Linear Threshold (LT), Independent Cascade (IC) and Voter models, and present concrete sample complexity results in each case.
356	Learning Causal Graphs with Small Interventions	Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath	We consider the problem of learning causal networks with interventions, when each intervention is limited in size under Pearl’s Structural Equation Model with independent errors (SEM-IE).
357	Information-theoretic lower bounds for convex optimization with erroneous oracles	Yaron Singer, Jan Vondrak	We consider the problem of optimizing convex and concave functions with access to an erroneous zeroth-order oracle.
358	Fixed-Length Poisson MRF: Adding Dependencies to the Multinomial	David I. Inouye, Pradeep K. Ravikumar, Inderjit S. Dhillon	We propose a novel distribution that generalizes the Multinomial distribution to enable dependencies between dimensions.
359	Large-Scale Bayesian Multi-Label Learning via Topic-Based Label Embeddings	Piyush Rai, Changwei Hu, Ricardo Henao, Lawrence Carin	We present a scalable Bayesian multi-label learning model based on learning low-dimensional label embeddings.
360	The Self-Normalized Estimator for Counterfactual Learning	Adith Swaminathan, Thorsten Joachims	This paper identifies a severe problem of the counterfactual risk estimator typically used in batch learning from logged bandit feedback (BLBF), and proposes the use of an alternative estimator that avoids this problem. In the BLBF setting, the learner does not receive full-information feedback like in supervised learning, but observes feedback only for the actions taken by a historical policy. This makes BLBF algorithms particularly attractive for training online systems (e.g., ad placement, web search, recommendation) using their historical logs. The Counterfactual Risk Minimization (CRM) principle offers a general recipe for designing BLBF algorithms.
361	Fast Lifted MAP Inference via Partitioning	Somdeb Sarkhel, Parag Singla, Vibhav G. Gogate	In this paper, we present a novel approach, which cleverly introduces new symmetries at the time of grounding.
362	Data Generation as Sequential Decision Making	Philip Bachman, Doina Precup	We formulate data imputation as an MDP and develop models capable of representing effective policies for it.
363	On Elicitation Complexity	Rafael Frongillo, Ian Kash	Specifically, what is the minimum number of regression parameters needed to compute the property? Building on previous work, we introduce a new notion of elicitation complexity and lay the foundations for a calculus of elicitation.
364	Decomposition Bounds for Marginal MAP	Wei Ping, Qiang Liu, Alexander T. Ihler	In this work, we generalize dual decomposition to a generic powered-sum inference task, which includes marginal MAP, along with pure marginalization and MAP, as special cases.
365	Discrete Rényi Classifiers	Meisam Razaviyayn, Farzan Farnia, David Tse	In this work, we consider the problem of designing the optimum classifier based on some estimated low order marginals of (X,Y).
366	A class of network models recoverable by spectral clustering	Yali Wan, Marina Meila	Here we show that essentially the same algorithm used for the SBM and for its extension called Degree-Corrected SBM, works on a wider class of Block-Models, which we call Preference Frame Models, with essentially the same guarantees.
367	Skip-Thought Vectors	Ryan Kiros, Yukun Zhu, Ruslan R. Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, Sanja Fidler	We describe an approach for unsupervised learning of a generic, distributed sentence encoder.
368	Rate-Agnostic (Causal) Structure Learning	Sergey Plis, David Danks, Cynthia Freeman, Vince Calhoun	We apply these algorithms to data from simulations.
369	Principal Geodesic Analysis for Probability Measures under the Optimal Transport Metric	Vivien Seguy, Marco Cuturi	We consider in this work the space of probability measures $P(X)$ on a Hilbert space $X$ endowed with the 2-Wasserstein metric.
370	Consistent Multilabel Classification	Oluwasanmi O. Koyejo, Nagarajan Natarajan, Pradeep K. Ravikumar, Inderjit S. Dhillon	Based on the population-optimal classifier, we propose a computationally efficient and general-purpose plug-in classification algorithm, and prove its consistency with respect to the metric of interest.
371	Parallel Predictive Entropy Search for Batch Global Optimization of Expensive Objective Functions	Amar Shah, Zoubin Ghahramani	We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions.
372	Cornering Stationary and Restless Mixing Bandits with Remix-UCB	Julien Audiffren, Liva Ralaivola	As we shall see, the bandit problem we tackle requires us to address the exploration/exploitation/independence trade-off, which we do by considering the idea of a waiting arm in the new Remix-UCB algorithm, a generalization of Improved-UCB for the problem at hand, that we introduce.
373	Semi-Supervised Factored Logistic Regression for High-Dimensional Neuroimaging Data	Danilo Bzdok, Michael Eickenberg, Olivier Grisel, Bertrand Thirion, Gael Varoquaux	We therefore propose to blend representation modelling and task classification into a unified statistical learning problem.
374	Gaussian Process Random Fields	David Moore, Stuart J. Russell	We introduce a new approximation for large-scale Gaussian processes, the Gaussian Process Random Field (GPRF), in which local GPs are coupled via pairwise potentials.
375	M-Statistic for Kernel Change-Point Detection	Shuang Li, Yao Xie, Hanjun Dai, Le Song	In this paper we propose two related computationally efficient M-statistics for kernel-based change-point detection when the amount of background data is large.
376	Adaptive Online Learning	Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan	We propose a general framework for studying adaptive regret bounds in the online learning setting, subsuming model selection and data-dependent bounds.
377	A Universal Catalyst for First-Order Optimization	Hongzhou Lin, Julien Mairal, Zaid Harchaoui	We introduce a generic scheme for accelerating first-order optimization methods in the sense of Nesterov, which builds upon a new analysis of the accelerated proximal point algorithm.
378	Inference for determinantal point processes without spectral knowledge	Rémi Bardenet, Michalis Titsias	Our main contribution is to derive bounds on the likelihood of a DPP, both for finite and continuous domains.
379	Kullback-Leibler Proximal Variational Inference	Mohammad E. Khan, Pierre Baque, François Fleuret, Pascal Fua	We propose a new variational inference method based on the Kullback-Leibler (KL) proximal term.
380	Semi-Proximal Mirror-Prox for Nonsmooth Composite Minimization	Niao He, Zaid Harchaoui	We propose a new first-order optimization algorithm to solve high-dimensional non-smooth composite minimization problems.
381	LASSO with Non-linear Measurements is Equivalent to One With Linear Measurements	Christos Thrampoulidis, Ehsan Abbasi, Babak Hassibi	In this work, we considerably strengthen these results by obtaining explicit expressions for $\|\hat x-\mu x_0\|_2$, for the regularized Generalized-LASSO, that are asymptotically precise when $m$ and $n$ grow large.
382	From random walks to distances on unweighted graphs	Tatsunori Hashimoto, Yi Sun, Tommi Jaakkola	We establish a general correspondence between hitting times of the Brownian motion and analogous hitting times on the graph.
383	Bayesian dark knowledge	Anoop Korattikara Balan, Vivek Rathod, Kevin P. Murphy, Max Welling	We describe a method for “distilling” a Monte Carlo approximation to the posterior predictive density into a more compact form, namely a single deep neural network.
384	Matrix Completion with Noisy Side Information	Kai-Yang Chiang, Cho-Jui Hsieh, Inderjit S. Dhillon	In this paper, we propose a novel model that balances between features and observations simultaneously, enabling us to leverage feature information yet to be robust to feature noise.
385	Dependent Multinomial Models Made Easy: Stick-Breaking with the Polya-gamma Augmentation	Scott Linderman, Matthew J. Johnson, Ryan P. Adams	Here, we leverage a logistic stick-breaking representation and recent innovations in Pólya-gamma augmentation to reformulate the multinomial distribution in terms of latent variables with jointly Gaussian likelihoods, enabling us to take advantage of a host of Bayesian inference techniques for Gaussian models with minimal overhead.
386	On-the-Job Learning with Bayesian Decision Theory	Keenon Werling, Arun Tejasvi Chaganty, Percy S. Liang, Christopher D. Manning	Our goal is to deploy a high-accuracy system starting with zero training examples.
387	Calibrated Structured Prediction	Volodymyr Kuleshov, Percy S. Liang	We explore a range of features appropriate for structured recalibration, and demonstrate their efficacy on three real-world datasets.
388	Learning Structured Output Representation using Deep Conditional Generative Models	Kihyuk Sohn, Honglak Lee, Xinchen Yan	In this work, we develop a scalable deep conditional generative model for structured output variables using Gaussian latent variables.
389	Time-Sensitive Recommendation From Recurrent User Activities	Nan Du, Yichen Wang, Niao He, Jimeng Sun, Le Song	To address these questions, we propose a novel framework which connects self-exciting point processes and low-rank models to capture the recurrent temporal patterns in a large collection of user-item consumption pairs.
390	Learning Stationary Time Series using Gaussian Processes with Nonparametric Kernels	Felipe Tobar, Thang D. Bui, Richard E. Turner	We introduce the Gaussian Process Convolution Model (GPCM), a two-stage nonparametric generative procedure to model stationary signals as the convolution between a continuous-time white-noise process and a continuous-time linear filter drawn from a Gaussian process.
391	A Market Framework for Eliciting Private Data	Bo Waggoner, Rafael Frongillo, Jacob D. Abernethy	We propose a mechanism for purchasing information from a sequence of participants. The participants may simply hold data points they wish to sell, or may have more sophisticated information; either way, they are incentivized to participate as long as they believe their data points are representative or their information will improve the mechanism’s future prediction on a test set. The mechanism, which draws on the principles of prediction markets, has a bounded budget and minimizes generalization error for Bregman divergence loss functions. We then show how to modify this mechanism to preserve the privacy of participants’ information: At any given time, the current prices and predictions of the mechanism reveal almost no information about any one participant, yet in total over all participants, information is accurately aggregated.
392	Lifted Inference Rules With Constraints	Happy Mittal, Anuj Mahajan, Vibhav G. Gogate, Parag Singla	Computational complexity of these rules is highly dependent on the choice of the constraint language they operate on and therefore coming up with the right kind of representation is critical to the success of lifted inference. In this paper, we propose a new constraint language, called setineq, which allows subset, equality and inequality constraints, to represent substitutions over the variables in the theory.
393	Gradient Estimation Using Stochastic Computation Graphs	John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel	We introduce the formalism of stochastic computation graphs (directed acyclic graphs that include both deterministic functions and conditional probability distributions) and describe how to easily and automatically derive an unbiased estimator of the loss function’s gradient.
394	Model-Based Relative Entropy Stochastic Search	Abbas Abdolmaleki, Rudolf Lioutikov, Jan R. Peters, Nuno Lau, Luis Paulo Reis, Gerhard Neumann	To alleviate these problems, we introduce a new surrogate-based stochastic search approach.
395	Semi-supervised Learning with Ladder Networks	Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, Tapani Raiko	We combine supervised learning with unsupervised learning in deep neural networks.
396	Embedding Inference for Structured Multilabel Prediction	Farzaneh Mirzazadeh, Siamak Ravanbakhsh, Nan Ding, Dale Schuurmans	Rather than using approximate inference or tailoring a specialized inference method for a particular structure—standard responses to the scaling challenge—we propose to embed prediction constraints directly into the learned representation.
397	Copula variational inference	Dustin Tran, David Blei, Edo M. Airoldi	We develop a general variational inference method that preserves dependency among the latent variables.
398	Recursive Training of 2D-3D Convolutional Networks for Neuronal Boundary Prediction	Kisuk Lee, Aleksandar Zlateski, Vishwanathan Ashwin, H. Sebastian Seung	Here we achieve a substantial gain in accuracy through three innovations.
399	A Dual Augmented Block Minimization Framework for Learning with Limited Memory	Ian En-Hsu Yen, Shan-Wei Lin, Shou-De Lin	In this paper, we consider the more general setting of regularized Empirical Risk Minimization (ERM) when data cannot fit into memory.
400	Optimal Testing for Properties of Distributions	Jayadev Acharya, Constantinos Daskalakis, Gautam Kamath	Nevertheless, even for basic classes of distributions such as monotone, log-concave, unimodal, and monotone hazard rate, the optimal sample complexity is unknown. We provide a general approach via which we obtain sample-optimal and computationally efficient testers for all these distribution families.
401	Efficient Learning of Continuous-Time Hidden Markov Models for Disease Progression	Yu-Ying Liu, Shuang Li, Fuxin Li, Le Song, James M. Rehg	In this paper, we present the first complete characterization of efficient EM-based learning methods for CT-HMM models.
402	Expectation Particle Belief Propagation	Thibaut Lienart, Yee Whye Teh, Arnaud Doucet	We propose an original particle-based implementation of the Loopy Belief Propagation (LBP) algorithm for pairwise Markov Random Fields (MRF) on a continuous state space.
403	Latent Bayesian melding for integrating individual and population models	Mingjun Zhong, Nigel Goddard, Charles Sutton	We propose latent Bayesian melding, which is motivated by averaging the distributions over population statistics of both the individual-level and the population-level models under a logarithmic opinion pool framework.