# Paper Digest: ICML 2014 Highlights

The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. In 2014, it was held in Beijing, China.

To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to browse these machine-generated highlights to quickly grasp the main idea of each paper.

We thank all authors for writing these interesting papers, and our readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to **sign up for our free paper digest service** to receive new paper updates customized to your own interests on a daily basis.

Paper Digest Team

team@paperdigest.org

#### TABLE 1: ICML 2014 Papers

No. | Title | Authors | Highlight
---|---|---|---

1 | A Discriminative Latent Variable Model for Online Clustering | Rajhans Samdani, Kai-Wei Chang, Dan Roth | This paper presents a latent variable structured prediction model for discriminative supervised clustering of items called the Latent Left-linking Model (L3M). |

2 | Kernel Mean Estimation and Stein Effect | Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, Arthur Gretton, Bernhard Schoelkopf | Focusing on a subset of this class, we propose efficient shrinkage estimators for the kernel mean. |

3 | Demystifying Information-Theoretic Clustering | Greg Ver Steeg, Aram Galstyan, Fei Sha, Simon DeDeo | We propose a novel method for clustering data which is grounded in information-theoretic principles and requires no parametric assumptions. |

4 | Covering Number for Efficient Heuristic-based POMDP Planning | Zongzhang Zhang, David Hsu, Wee Sun Lee | In this paper, we use the covering number to characterize the size of the search space reachable under heuristics and connect the complexity of POMDP planning to the effectiveness of heuristics. |

5 | The Coherent Loss Function for Classification | Wenzhuo Yang, Melvyn Sim, Huan Xu | To address the intractability, previous methods consider minimizing the cumulative loss – the sum of convex surrogates of the 0-1 loss of each sample. In this paper, we revisit this paradigm and develop instead an axiomatic framework by proposing a set of salient properties on functions for binary classification and then propose the coherent loss approach, which is a tractable upper-bound of the empirical classification error over the entire sample set. |

6 | Fast Stochastic Alternating Direction Method of Multipliers | Wenliang Zhong, James Kwok | We propose a new stochastic alternating direction method of multipliers (ADMM) algorithm, which incrementally approximates the full gradient in the linearized ADMM formulation. |

7 | Active Detection via Adaptive Submodularity | Yuxin Chen, Hiroaki Shioi, Cesar Fuentes Montesinos, Lian Pin Koh, Serge Wich, Andreas Krause | In this paper, we propose a principled approach to active object detection, and show that for a rich class of base detectors, one can derive a natural sequential decision problem for deciding when to invoke expert supervision. |

8 | Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization | Shai Shalev-Shwartz, Tong Zhang | We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure. |

9 | An Adaptive Accelerated Proximal Gradient Method and its Homotopy Continuation for Sparse Optimization | Qihang Lin, Lin Xiao | This method incorporates a restarting scheme to automatically estimate the strong convexity parameter and achieves a nearly optimal iteration complexity. |

10 | Recurrent Convolutional Neural Networks for Scene Labeling | Pedro Pinheiro, Ronan Collobert | We propose an approach that consists of a recurrent convolutional neural network which allows us to consider a large input context while limiting the capacity of the model. |

11 | A Statistical Perspective on Algorithmic Leveraging | Ping Ma, Michael Mahoney, Bin Yu | Based on these theoretical results, we propose and analyze two new leveraging algorithms: one constructs a smaller least-squares problem with “shrinked” leverage scores (SLEV), and the other solves a smaller and unweighted (or biased) least-squares problem (LEVUNW). |

12 | Thompson Sampling for Complex Online Problems | Aditya Gopalan, Shie Mannor, Yishay Mansour | We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round. |

13 | Boosting multi-step autoregressive forecasts | Souhaib Ben Taieb, Rob Hyndman | To address this issue, we propose a new forecasting strategy which boosts traditional recursive linear forecasts with a direct strategy using a boosting autoregression procedure at each horizon. |

14 | A Statistical Convergence Perspective of Algorithms for Rank Aggregation from Pairwise Data | Arun Rajkumar, Shivani Agarwal | In this paper, we consider this question in a natural setting where pairwise comparisons are drawn randomly and independently from some underlying probability distribution. |

15 | Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations | Timothy Mann, Shie Mannor | We show how options, a class of control structures encompassing primitive and temporally extended actions, can play a valuable role in planning in MDPs with continuous state-spaces. |

16 | Latent Bandits. | Odalric-Ambrym Maillard, Shie Mannor | In each setting, we introduce specific algorithms and derive non-trivial regret performance. |

17 | Fast Allocation of Gaussian Process Experts | Trung Nguyen, Edwin Bonilla | We propose a scalable nonparametric Bayesian regression model based on a mixture of Gaussian process (GP) experts and the inducing points formalism underpinning sparse GP approximations. |

18 | Von Mises-Fisher Clustering Models | Siddharth Gopal, Yiming Yang | This paper proposes a suite of models for clustering high-dimensional data on a unit sphere based on Von Mises-Fisher (vMF) distribution and for discovering more intuitive clusters than existing approaches. |

19 | Convergence rates for persistence diagram estimation in Topological Data Analysis | Frédéric Chazal, Marc Glisse, Catherine Labruère, Bertrand Michel | We show that the use of persistent homology can be naturally considered in general statistical frameworks. |

20 | Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs | Fabian Gieseke, Justin Heinermann, Cosmin Oancea, Christian Igel | We present a new approach for combining k-d trees and graphics processing units for nearest neighbor search. |

21 | Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget | Anoop Korattikara, Yutian Chen, Max Welling | We introduce an approximate MH rule based on a sequential hypothesis test that allows us to accept or reject samples with high confidence using only a fraction of the data required for the exact MH rule. |

22 | Understanding the Limiting Factors of Topic Modeling via Posterior Contraction Analysis | Jian Tang, Zhaoshi Meng, Xuanlong Nguyen, Qiaozhu Mei, Ming Zhang | We present theorems elucidating the posterior contraction rates of the topics as the amount of data increases, and a thorough supporting empirical study using synthetic and real data sets, including news and web-based articles and tweet messages. |

23 | The Inverse Regression Topic Model | Maxim Rabinovich, David Blei | In this paper, we introduce the inverse regression topic model (IRTM), a mixed-membership extension of MNIR that combines the strengths of both methodologies. |

24 | A Consistent Histogram Estimator for Exchangeable Graph Models | Stanley Chan, Edoardo Airoldi | In this paper, we propose a histogram estimator of a graphon that is provably consistent and numerically efficient. |

25 | Latent Variable Copula Inference for Bundle Pricing from Retail Transaction Data | Benjamin Letham, Wei Sun, Anshul Sheopuri | We develop a statistically consistent and computationally tractable inference procedure for fitting a copula model over correlated valuations, using only sales transaction data for the individual items. |

26 | Towards Minimax Online Learning with Unknown Time Horizon | Haipeng Luo, Robert Schapire | For the random horizon setting with restricted losses, we derive a fully optimal minimax algorithm. |

27 | Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball | Andrew Miller, Luke Bornn, Ryan Adams, Kirk Goldsberry | Modeling shot attempt data as a point process, we create a low dimensional representation of offensive player types in the NBA. |

28 | Margins, Kernels and Non-linear Smoothed Perceptrons | Aaditya Ramdas, Javier Peña | We focus on the problem of finding a non-linear classification function that lies in a Reproducing Kernel Hilbert Space (RKHS) both from the primal point of view (finding a perfect separator when one exists) and the dual point of view (giving a certificate of non-existence), with special focus on generalizations of two classical schemes – the Perceptron (primal) and Von-Neumann (dual) algorithms. |

29 | Robust RegBayes: Selectively Incorporating First-Order Logic Domain Knowledge into Bayesian Models | Shike Mei, Jun Zhu, Jerry Zhu | We present a novel and more direct approach by imposing First-Order Logic (FOL) rules on the posterior distribution. |

30 | Learning Theory and Algorithms for revenue optimization in second price auctions with reserve | Mehryar Mohri, Andres Munoz Medina | We cast the problem of selecting the reserve price to optimize revenue as a learning problem and present a full theoretical analysis dealing with the complex properties of the corresponding loss function (it is non-convex and discontinuous). |

31 | Low-density Parity Constraints for Hashing-Based Discrete Integration | Stefano Ermon, Carla Gomes, Ashish Sabharwal, Bart Selman | Inspired by the success of LDPC codes, we propose the use of low-density parity constraints to make inference more tractable in practice. |

32 | Prediction with Limited Advice and Multiarmed Bandits with Paid Observations | Yevgeny Seldin, Peter Bartlett, Koby Crammer, Yasin Abbasi-Yadkori | We present an algorithm that achieves O(\sqrt{(N/M)T\ln N}) regret on T rounds of this game. |

33 | Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts | Tien Vu Nguyen, Dinh Phung, Xuanlong Nguyen, Svetha Venkatesh, Hung Bui | We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partitions groups into clusters. |

34 | Large-Margin Metric Learning for Constrained Partitioning Problems | Rémi Lajugie, Francis Bach, Sylvain Arlot | We aim at learning a Mahalanobis metric for these unsupervised problems, leading to feature weighting and/or selection. |

35 | Wasserstein Propagation for Semi-Supervised Learning | Justin Solomon, Raif Rustamov, Leonidas Guibas, Adrian Butscher | Thus, this paper introduces a technique for graph-based semi-supervised learning of histograms, derived from the theory of optimal transportation. |

36 | Max-Margin Infinite Hidden Markov Models | Aonan Zhang, Jun Zhu, Bo Zhang | Our paper introduces max-margin infinite HMMs (M2iHMMs), new infinite HMMs that explore the max-margin principle for discriminative learning. |

37 | Efficient Approximation of Cross-Validation for Kernel Methods using Bouligand Influence Function | Yong Liu, Shali Jiang, Shizhong Liao | In this paper, we present a novel strategy for approximating the cross-validation based on the Bouligand influence function (BIF), which only requires the solution of the algorithm once. |

38 | Generalized Exponential Concentration Inequality for Renyi Divergence Estimation | Shashank Singh, Barnabas Poczos | The main contribution of our work is to provide such a bound for an estimator of Renyi divergence for a smooth Holder class of densities on the d-dimensional unit cube. |

39 | Boosting with Online Binary Learners for the Multiclass Bandit Problem | Shang-Tse Chen, Hsuan-Tien Lin, Chi-Jen Lu | In this paper, we propose an approach that systematically converts existing online binary classifiers to promising bandit learners with strong theoretical guarantee. |

40 | Optimal Budget Allocation: Theoretical Guarantee and Efficient Algorithm | Tasuku Soma, Naonori Kakimura, Kazuhiro Inaba, Ken-ichi Kawarabayashi | We consider the budget allocation problem over bipartite influence model proposed by Alon et al. |

41 | Computing Parametric Ranking Models via Rank-Breaking | Hossein Azari Soufiani, David Parkes, Lirong Xia | We characterize the breakings for which the estimator is consistent for random utility models (RUMs) including Plackett-Luce and Normal-RUM, develop a general sufficient condition for a full breaking to be the only consistent breaking, and provide a trichotomy theorem in regard to single-edge breakings. |

42 | Tracking Adversarial Targets | Yasin Abbasi-Yadkori, Peter Bartlett, Varun Kanade | We present an efficient algorithm for this problem and show that, under standard conditions on the linear system, its regret with respect to an optimal linear policy grows as O(\log^2 T), where T is the number of rounds of the game. |

43 | Online Bayesian Passive-Aggressive Learning | Tianlin Shi, Jun Zhu | This paper presents online Bayesian Passive-Aggressive (BayesPA) learning, which subsumes the online PA and extends naturally to incorporate latent variables and perform nonparametric Bayesian inference, thus providing great flexibility for explorative analysis. |

44 | Deterministic Policy Gradient Algorithms | David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller | In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. |

45 | Modeling Correlated Arrival Events with Latent Semi-Markov Processes | Wenzhao Lian, Vinayak Rao, Brian Eriksson, Lawrence Carin | In this work, we model such data as generated by a latent collection of continuous-time binary semi-Markov processes, corresponding to external events appearing and disappearing. |

46 | Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach | Rémi Bardenet, Arnaud Doucet, Chris Holmes | This paper describes a methodology that aims to scale up the Metropolis-Hastings (MH) algorithm in this context. |

47 | Diagnosis determination: decision trees optimizing simultaneously worst and expected testing cost | Ferdinando Cicalese, Eduardo Laber, Aline Medeiros Saettler | We provide an algorithm that builds a strategy (decision tree) whose expected cost and worst-case cost are simultaneously within an O(\log n) factor of the minimum possible expected cost and the minimum possible worst-case cost, respectively. |

48 | Condensed Filter Tree for Cost-Sensitive Multi-Label Classification | Chun-Liang Li, Hsuan-Tien Lin | In this paper, we propose a novel algorithm, called condensed filter tree (CFT), for optimizing any criteria in CSMLC. |

49 | On Measure Concentration of Random Maximum A-Posteriori Perturbations | Francesco Orabona, Tamir Hazan, Anand Sarwate, Tommi Jaakkola | More efficient algorithms use sequential sampling strategies based on the expected value of low dimensional MAP perturbations. |

50 | Bias in Natural Actor-Critic Algorithms | Philip Thomas | We show that several popular discounted reward natural actor-critics, including the popular NAC-LSTD and eNAC algorithms, do not generate unbiased estimates of the natural policy gradient as claimed. |

51 | Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning | François Denis, Mattias Gybels, Amaury Habrard | Spectral methods propose elegant solutions to the problem of inferring weighted automata from finite samples of variable-length strings drawn from an unknown target distribution. |

52 | On Modelling Non-linear Topical Dependencies | Zhixing Li, Siqiang Wen, Juanzi Li, Peng Zhang, Jie Tang | In this paper, sentences are represented as dependency trees and a Global Topic Random Field (GTRF) is presented to model the non-linear dependencies between words. |

53 | A Deep and Tractable Density Estimator | Benigno Uria, Iain Murray, Hugo Larochelle | In this work we introduce an efficient procedure to simultaneously train a NADE model for each possible ordering of the variables, by sharing parameters across all these models. |

54 | (Near) Dimension Independent Risk Bounds for Differentially Private Learning | Prateek Jain, Abhradeep Guha Thakurta | In this paper, we study the problem of differentially private risk minimization where the goal is to provide differentially private algorithms that have small excess risk. |

55 | Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels | Jiyan Yang, Vikas Sindhwani, Haim Avron, Michael Mahoney | In this paper, we propose to use Quasi-Monte Carlo (QMC) approximations instead where the relevant integrands are evaluated on a low-discrepancy sequence of points as opposed to random point sets as in the Monte Carlo approach. |

56 | Discriminative Features via Generalized Eigenvectors | Nikos Karampatziakis, Paul Mineiro | In this paper we investigate scalable techniques for inducing discriminative features by taking advantage of simple second order structure in the data. |

57 | Forward-Backward Greedy Algorithms for General Convex Smooth Functions over A Cardinality Constraint | Ji Liu, Jieping Ye, Ryohei Fujimaki | In this paper, we systematically analyze the theoretical properties of both algorithms. |

58 | Online Learning in Markov Decision Processes with Changing Cost Sequences | Travis Dick, Andras Gyorgy, Csaba Szepesvari | In this paper we consider online learning in finite Markov decision processes (MDPs) with changing cost sequences under full and bandit information. |

59 | Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms | Richard Combes, Alexandre Proutiere | For discrete unimodal bandits, we derive asymptotic lower bounds for the regret achieved under any algorithm, and propose OSUB, an algorithm whose regret matches this lower bound. |

60 | Maximum Mean Discrepancy for Class Ratio Estimation: Convergence Bounds and Kernel Selection | Arun Iyer, Saketha Nath, Sunita Sarawagi | In this paper we investigate the use of maximum mean discrepancy (MMD) in a reproducing kernel Hilbert space (RKHS) for estimating such ratios. |

61 | Asymptotically consistent estimation of the number of change points in highly dependent time series | Azadeh Khaleghi, Daniil Ryabko | Based on this reduction, an algorithm is proposed that finds the number of change points and locates the changes. |

62 | Coordinate-descent for learning orthogonal matrices through Givens rotations | Uri Shalit, Gal Chechik | Here we propose a framework for optimizing orthogonal matrices, that is the parallel of coordinate-descent in Euclidean spaces. |

63 | Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search | Anshumali Shrivastava, Ping Li | In this paper, we propose a hashing technique which generates all the necessary hash evaluations needed for similarity search, using one single permutation. |

64 | A Divide-and-Conquer Solver for Kernel Support Vector Machines | Cho-Jui Hsieh, Si Si, Inderjit Dhillon | In this paper, we propose and analyze a novel divide-and-conquer solver for kernel SVMs (DC-SVM). |

65 | Nuclear Norm Minimization via Active Subspace Selection | Cho-Jui Hsieh, Peder Olsen | We describe a novel approach to optimizing matrix problems involving nuclear norm regularization and apply it to the matrix completion problem. |

66 | Provable Bounds for Learning Some Deep Representations | Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma | We give algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others. |

67 | Large-scale Multi-label Learning with Missing Labels | Hsiang-Fu Yu, Prateek Jain, Purushottam Kar, Inderjit Dhillon | In this paper, we directly address both these problems by studying the multi-label problem in a generic empirical risk minimization (ERM) framework. |

68 | Learning Graphs with a Few Hubs | Rashish Tandon, Pradeep Ravikumar | We consider the problem of recovering the graph structure of a “hub-networked” Ising model given iid samples, under high-dimensional settings, where the number of nodes p could potentially be larger than the number of samples n. By a “hub-networked” graph, we mean a graph with a few “hub nodes” with very large degrees. |

69 | Agnostic Bayesian Learning of Ensembles | Alexandre Lacoste, Mario Marchand, François Laviolette, Hugo Larochelle | We propose a method for producing ensembles of predictors based on holdout estimations of their generalization performances. |

70 | Towards an optimal stochastic alternating direction method of multipliers | Samaneh Azadi, Suvrit Sra | This paper presents two new SADMM methods: (i) the first attains the minimax optimal rate of O(1/k) for nonsmooth strongly-convex stochastic problems; while (ii) the second progresses towards an optimal rate by exhibiting an O(1/k^2) rate for the smooth part. |

71 | Spherical Hamiltonian Monte Carlo for Constrained Target Distributions | Shiwei Lan, Bo Zhou, Babak Shahbaba | For such problems, we propose a novel Markov Chain Monte Carlo (MCMC) method that provides a general and computationally efficient framework for handling boundary conditions. |

72 | Efficient Continuous-Time Markov Chain Estimation | Monir Hajiaghayi, Bonnie Kirkpatrick, Liangliang Wang, Alexandre Bouchard-Côté | We propose a particle-based Monte Carlo approach where the holding times are marginalized analytically. |

73 | DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition | Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell | We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. |

74 | Making the Most of Bag of Words: Sentence Regularization with Alternating Direction Method of Multipliers | Dani Yogatama, Noah Smith | We introduce a learning algorithm that exploits this intuition by encoding it in a regularizer. |

75 | Narrowing the Gap: Random Forests In Theory and In Practice | Misha Denil, David Matheson, Nando De Freitas | In this paper we contribute to this understanding in two ways. |

76 | Coherent Matrix Completion | Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, Rachel Ward | Here, we show that nuclear norm minimization can recover an arbitrary n \times n matrix of rank r from O(nr \log^2 n) revealed entries, provided that revealed entries are drawn proportionally to the local row and column coherences (closely related to leverage scores) of the underlying matrix. |

77 | Admixture of Poisson MRFs: A Topic Model with Word Dependencies | David Inouye, Pradeep Ravikumar, Inderjit Dhillon | This paper introduces a new topic model based on an admixture of Poisson Markov Random Fields (APM), which can model dependencies between words as opposed to previous independent topic models such as PLSA (Hofmann, 1999), LDA (Blei et al., 2003) or SAM (Reisinger et al., 2010). |

78 | True Online TD(lambda) | Harm Seijen, Rich Sutton | In this paper we introduce a new forward view that takes into account the possibility of changing estimates and a new variant of TD(lambda) that exactly achieves it. |

79 | Memory Efficient Kernel Approximation | Si Si, Cho-Jui Hsieh, Inderjit Dhillon | Based on this observation, we propose a new kernel approximation algorithm – Memory Efficient Kernel Approximation (MEKA), which considers both low-rank and clustering structure of the kernel matrix. |

80 | Learning Sum-Product Networks with Direct and Indirect Variable Interactions | Amirmohammad Rooshenas, Daniel Lowd | In this paper, we present ID-SPN, a new algorithm for learning SPN structure that unifies the two approaches. |

81 | Hamiltonian Monte Carlo Without Detailed Balance | Jascha Sohl-Dickstein, Mayur Mudigonda, Michael DeWeese | We present a method for performing Hamiltonian Monte Carlo that largely eliminates sample rejection. |

82 | Filtering with Abstract Particles | Jacob Steinhardt, Percy Liang | We present a new filtering method that addresses this issue by using “abstract particles” that each represent an entire region of the state space. |

83 | Stochastic Dual Coordinate Ascent with Alternating Direction Method of Multipliers | Taiji Suzuki | We propose a new stochastic dual coordinate ascent technique that can be applied to a wide range of regularized learning problems. |

84 | Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction | Jian Zhou, Olga Troyanskaya | Here we present a new supervised generative stochastic network (GSN) based method to predict local secondary structure with deep hierarchical representations. |

85 | An Efficient Approach for Assessing Hyperparameter Importance | Frank Hutter, Holger Hoos, Kevin Leyton-Brown | This paper describes efficient methods that can be used to gain such insight, leveraging random forest models fit on the data already gathered by Bayesian optimization. |

86 | An Information Geometry of Statistical Manifold Learning | Ke Sun, Stéphane Marchand-Maillet | We develop a manifold learning theory in a hypothesis space consisting of models. |

87 | Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem | Masrour Zoghi, Shimon Whiteson, Remi Munos, Maarten de Rijke | This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. |

88 | Compact Random Feature Maps | Raffay Hamid, Ying Xiao, Alex Gittens, Dennis Decoste | We show how structured random matrices can be used to efficiently generate CRAFTMaps, and present a single-pass algorithm using CRAFTMaps to learn non-linear multi-class classifiers. |

89 | Concentration in unbounded metric spaces and algorithmic stability | Aryeh Kontorovich | To this end, we introduce the notion of the \em subgaussian diameter, which is a distribution-dependent refinement of the metric diameter. |

90 | Heavy-tailed regression with a generalized median-of-means | Daniel Hsu, Sivan Sabato | This work proposes a simple and computationally efficient estimator for linear regression, and other smooth and strongly convex loss minimization problems. |

91 | Spectral Bandits for Smooth Graph Functions | Michal Valko, Remi Munos, Branislav Kveton, Tomáš Kocák | In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. |

92 | Robust Principal Component Analysis with Complex Noise | Qian Zhao, Deyu Meng, Zongben Xu, Wangmeng Zuo, Lei Zhang | We propose a generative RPCA model under the Bayesian framework by modeling data noise as a mixture of Gaussians (MoG). |

93 | Scalable Semidefinite Relaxation for Maximum A Posterior Estimation | Qixing Huang, Yuxin Chen, Leonidas Guibas | In this paper, we propose a novel semidefinite relaxation formulation (referred to as SDR) to estimate the MAP assignment. |

94 | Square Deal: Lower Bounds and Improved Relaxations for Tensor Recovery | Cun Mu, Bo Huang, John Wright, Donald Goldfarb | We introduce a simple, new convex relaxation, which partially bridges this gap. |

95 | Automated inference of point of view from user interactions in collective intelligence venues | Sanmay Das, Allen Lavoie | We introduce a statistical framework which classifies point of view based on user interactions. |

96 | Rank-One Matrix Pursuit for Matrix Completion | Zheng Wang, Ming-Jun Lai, Zhaosong Lu, Wei Fan, Hasan Davulcu, Jieping Ye | In this paper, we present an efficient and scalable algorithm for matrix completion. |

97 | Near-Optimal Joint Object Matching via Convex Relaxation | Yuxin Chen, Leonidas Guibas, Qixing Huang | In this paper, we propose an algorithm to jointly match multiple objects that exhibit only partial similarities, where the provided pairwise feature correspondences can be densely corrupted. |

98 | Convex Total Least Squares | Dmitry Malioutov, Nikolai Slavov | We describe a fast solution based on augmented Lagrangian formulation, and apply our approach to an important class of biological problems that use population average measurements to infer cell-type and physiological-state specific expression levels that are very hard to measure directly. |

99 | On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection | Pratik Jawanpuria, Manik Varma, Saketha Nath | Our objective is to develop formulations and algorithms for efficiently computing the feature selection path – i.e. the variation in classification accuracy as the fraction of selected features is varied from null to unity. |

100 | Gradient Hard Thresholding Pursuit for Sparsity-Constrained Optimization | Xiaotong Yuan, Ping Li, Tong Zhang | In this paper, we generalize HTP from compressed sensing to a generic problem setup of sparsity-constrained convex optimization. |

101 | A Unified Framework for Consistency of Regularized Loss Minimizers | Jean Honorio, Tommi Jaakkola | We characterize a family of regularized loss minimization problems that satisfy three properties: scaled uniform convergence, super-norm regularization, and norm-loss monotonicity. |

102 | Geodesic Distance Function Learning via Heat Flow on Vector Fields | Binbin Lin, Ji Yang, Xiaofei He, Jieping Ye | In this paper, we propose to learn the distance function directly on the manifold without embedding. |

103 | Near-Optimally Teaching the Crowd to Classify | Adish Singla, Ilija Bogunovic, Gabor Bartok, Amin Karbasi, Andreas Krause | We propose a natural stochastic model of the learners, modeling them as randomly switching among hypotheses based on observed feedback. |

104 | On the convergence of no-regret learning in selfish routing | Walid Krichene, Benjamin Drighès, Alexandre Bayen | We consider a model in which players use regret-minimizing algorithms as the learning mechanism, and study the resulting dynamics. |

105 | Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques | Jérémie Mary, Philippe Preux, Olivier Nicol | After highlighting the limitations of the previous methods, we present a new method, based on bootstrapping techniques. |

106 | Scaling Up Robust MDPs using Function Approximation | Aviv Tamar, Shie Mannor, Huan Xu | In this work we employ a reinforcement learning approach to tackle this planning problem: we develop a robust approximate dynamic programming method based on a projected fixed point equation to approximately solve large scale robust MDPs. |

107 | Marginal Structured SVM with Hidden Variables | Wei Ping, Qiang Liu, Alex Ihler | In this work, we propose the marginal structured SVM (MSSVM) for structured prediction with hidden variables. |

108 | Linear and Parallel Learning of Markov Random Fields | Yariv Mizrahi, Misha Denil, Nando De Freitas | We introduce a new embarrassingly parallel parameter learning algorithm for Markov random fields which is efficient for a large class of practical models. |

109 | Pitfalls in the use of Parallel Inference for the Dirichlet Process | Yarin Gal, Zoubin Ghahramani | In this paper we show that the approach suggested is impractical due to an extremely unbalanced distribution of the data. |

110 | Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing | Yuan Zhou, Xi Chen, Jian Li | We propose a new PAC algorithm, which, with probability at least 1-δ, identifies a set of K arms with regret at most ε. |

111 | Deep Generative Stochastic Networks Trainable by Backprop | Yoshua Bengio, Eric Laufer, Guillaume Alain, Jason Yosinski | We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood. |

112 | A Highly Scalable Parallel Algorithm for Isotropic Total Variation Models | Jie Wang, Qingyang Li, Sen Yang, Wei Fan, Peter Wonka, Jieping Ye | In this paper, we propose a highly scalable parallel algorithm for TV models that is based on a novel decomposition strategy of the problem domain. |

113 | Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting | Yudong Chen, Jiaming Xu | The planted models assume that a graph is generated from some unknown clusters by randomly placing edges between nodes according to their cluster memberships; the task is to recover the clusters given the graph. |

114 | Gaussian Process Optimization with Mutual Information | Emile Contal, Vianney Perchet, Nicolas Vayatis | In this paper, we analyze a generic algorithm scheme for sequential global optimization using Gaussian processes. |

115 | Aggregating Ordinal Labels from Crowds by Minimax Conditional Entropy | Dengyong Zhou, Qiang Liu, John Platt, Christopher Meek | We propose a method to aggregate noisy ordinal labels collected from a crowd of workers or annotators. |

116 | Exchangeable Variable Models | Mathias Niepert, Pedro Domingos | We introduce exchangeable variable models (EVMs) as a novel class of probabilistic models whose basic building blocks are partially exchangeable sequences, a generalization of exchangeable sequences. |

117 | Clustering in the Presence of Background Noise | Shai Ben-David, Nika Haghtalab | We propose a simple and efficient method to turn any centroid-based clustering algorithm into a noise-robust one, and prove robustness guarantees for our method with respect to these measures. |

118 | Safe Screening with Variational Inequalities and Its Application to Lasso | Jun Liu, Zheng Zhao, Jie Wang, Jieping Ye | In this paper, we propose an approach called “Sasvi” (Safe screening with variational inequalities). |

119 | Learning the Consistent Behavior of Common Users for Target Node Prediction across Social Networks | Shan-Hung Wu, Hao-Heng Chien, Kuan-Hua Lin, Philip Yu | In this paper, we propose learning the consistent behavior of common users to help the knowledge transfer. |

120 | Signal recovery from Pooling Representations | Joan Bruna Estrach, Arthur Szlam, Yann LeCun | We address this latter question by computing the upper and lower Lipschitz bounds of \ell_p pooling operators for p = 1, 2, ∞, as well as their half-rectified equivalents, which give sufficient conditions for the design of invertible pooling layers. |
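The highlight above concerns Lipschitz bounds of ℓ_p pooling. As a hedged illustration (a minimal sketch, not the authors' code), the pooling of a single region is just the ℓ_p norm of its activations, which the reverse triangle inequality immediately makes 1-Lipschitz:

```python
import numpy as np

def lp_pool(x, p):
    """l_p pooling of one region's activations: the l_p norm of the vector."""
    if np.isinf(p):
        return np.max(np.abs(x))
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
x, y = rng.normal(size=8), rng.normal(size=8)
for p in (1, 2, np.inf):
    # reverse triangle inequality: |pool(x) - pool(y)| <= ||x - y||_p,
    # the kind of upper bound the paper's analysis makes precise
    assert abs(lp_pool(x, p) - lp_pool(y, p)) <= np.linalg.norm(x - y, p) + 1e-12
```

The paper's lower bounds, which are what make inversion possible, require the finer analysis developed there.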

121 | PAC-inspired Option Discovery in Lifelong Reinforcement Learning | Emma Brunskill, Lihong Li | In this work, we provide the first formal analysis of the sample complexity, a measure of learning speed, of reinforcement learning with options. |

122 | Multi-label Classification via Feature-aware Implicit Label Space Encoding | Zijia Lin, Guiguang Ding, Mingqing Hu, Jianmin Wang | In this paper, we propose a novel method termed FaIE to perform LSDR via Feature-aware Implicit label space Encoding. |

123 | Scalable Gaussian Process Structured Prediction for Grid Factor Graph Applications | Sebastien Bratieres, Novi Quadrianto, Sebastian Nowozin, Zoubin Ghahramani | Here we explore a scalable approach to learning GPstruct models based on ensemble learning, with weak learners (predictors) trained on subsets of the latent variables and bootstrap data, which can easily be distributed. |

124 | Anomaly Ranking as Supervised Bipartite Ranking | Stephan Clémençon, Sylvain Robbiano | In this paper, it is proved that, in the case where the data generating probability distribution has compact support, anomaly ranking is equivalent to (supervised) bipartite ranking, where the goal is to discriminate between the underlying probability distribution and the uniform distribution with same support. |

125 | Hierarchical Quasi-Clustering Methods for Asymmetric Networks | Gunnar Carlsson, Facundo Mémoli, Alejandro Ribeiro, Santiago Segarra | This paper introduces hierarchical quasi-clustering methods, a generalization of hierarchical clustering for asymmetric networks where the output structure preserves the asymmetry of the input data. |

126 | Rectangular Tiling Process | Masahiro Nakano, Katsuhiko Ishiguro, Akisato Kimura, Takeshi Yamada, Naonori Ueda | In this paper, we propose a new probabilistic model of arbitrary partitioning called the rectangular tiling process (RTP). |

127 | Two-Stage Metric Learning | Jun Wang, Ke Sun, Fei Sha, Stéphane Marchand-Maillet, Alexandros Kalousis | In this paper, we present a novel two-stage metric learning algorithm. |

128 | Stochastic Inference for Scalable Probabilistic Modeling of Binary Matrices | Jose Miguel Hernandez-Lobato, Neil Houlsby, Zoubin Ghahramani | We derive an efficient stochastic inference algorithm for PMF models of fully observed binary matrices. |

129 | Elementary Estimators for High-Dimensional Linear Regression | Eunho Yang, Aurelie Lozano, Pradeep Ravikumar | In this paper, we attempt to address this scaling issue at the source, by asking whether one can build *simpler*, possibly closed-form, estimators that nonetheless come with statistical guarantees comparable to those of regularized likelihood estimators. |

130 | Elementary Estimators for Sparse Covariance Matrices and other Structured Moments | Eunho Yang, Aurelie Lozano, Pradeep Ravikumar | We propose a class of elementary convex estimators, that in many cases are available in *closed-form*, for estimating general structured moments. |

131 | Graph-based Semi-supervised Learning: Realizing Pointwise Smoothness Probabilistically | Yuan Fang, Kevin Chang, Hady Lauw | In this paper, we study two complementary dimensions of smoothness: its pointwise nature and probabilistic modeling. |

132 | Bayesian Max-margin Multi-Task Learning with Data Augmentation | Chengtao Li, Jun Zhu, Jianfei Chen | We present Bayesian max-margin multi-task learning, which conjoins the two schools of methods, thus allowing the discriminative max-margin methods to enjoy the great flexibility of Bayesian methods on incorporating rich prior information as well as performing nonparametric Bayesian feature learning with the latent dimensionality resolved from data. |

133 | Sparse Reinforcement Learning via Convex Optimization | Zhiwei Qin, Weichang Li, Firdaus Janoos | We propose two new algorithms for the sparse reinforcement learning problem based on different formulations. |

134 | Gaussian Process Classification and Active Learning with Multiple Annotators | Filipe Rodrigues, Francisco Pereira, Bernardete Ribeiro | In this paper, we extend GP classification in order to account for multiple annotators with different levels of expertise. |

135 | Structured Prediction of Network Response | Hongyu Su, Aristides Gionis, Juho Rousu | We introduce the following network response problem: given a complex network and an action, predict the subnetwork that responds to the action, that is, which nodes perform the action and which directed edges relay it to adjacent nodes. To solve this problem, we present an approximate inference method based on a semi-definite programming (SDP) relaxation, as well as a more scalable greedy heuristic algorithm. |

136 | An Analysis of State-Relevance Weights and Sampling Distributions on L1-Regularized Approximate Linear Programming Approximation Accuracy | Gavin Taylor, Connor Geer, David Piekut | In this paper, we discuss and explain the effects of choices in the state-relevance weights and sampling distribution on approximation quality, using both theoretical and experimental illustrations. |

137 | Optimization Equivalence of Divergences Improves Neighbor Embedding | Zhirong Yang, Jaakko Peltonen, Samuel Kaski | Through the equivalences we represent several nonlinear dimensionality reduction and graph drawing methods in a generalized stochastic neighbor embedding setting, where information divergences are minimized between similarities in input and output spaces, and the optimal connection scalar provides a natural choice for the tradeoff between attractive and repulsive forces. |

138 | An Asynchronous Parallel Stochastic Coordinate Descent Algorithm | Ji Liu, Steve Wright, Christopher Re, Victor Bittorf, Srikrishna Sridhar | We describe an asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. |
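The paper's contribution is the asynchronous parallel analysis; the underlying update is ordinary randomized coordinate descent. A serial sketch for least squares (illustrative, under the assumption of a full-rank design; not the authors' code):

```python
import numpy as np

def coord_descent_lstsq(A, b, iters=3000, seed=0):
    """Randomized coordinate descent for min_x 0.5 * ||Ax - b||^2.
    Serial version of the update the paper runs asynchronously in parallel."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    r = A @ x - b                      # residual, maintained incrementally
    col_sq = (A ** 2).sum(axis=0)      # per-coordinate curvature ||A_i||^2
    for _ in range(iters):
        i = rng.integers(n)            # pick a random coordinate
        step = (A[:, i] @ r) / col_sq[i]  # exact minimization along coordinate i
        x[i] -= step
        r -= step * A[:, i]            # update residual in O(rows)
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 5))
x_true = rng.normal(size=5)
x_hat = coord_descent_lstsq(A, A @ x_true)
assert np.allclose(x_hat, x_true, atol=1e-6)
```

In the asynchronous setting, multiple workers run this loop on shared `x` without locking; the paper quantifies how much staleness such updates can tolerate.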

139 | Consistency of Causal Inference under the Additive Noise Model | Samory Kpotufe, Eleni Sgouritsa, Dominik Janzing, Bernhard Schölkopf | We analyze a family of methods for statistical causal inference from sample under the so-called Additive Noise Model. |

140 | Globally Convergent Parallel MAP LP Relaxation Solver using the Frank-Wolfe Algorithm | Alexander Schwing, Tamir Hazan, Marc Pollefeys, Raquel Urtasun | In this paper we suggest to decouple the quadratic program based on the Frank-Wolfe approach. |

141 | Linear Programming for Large-Scale Markov Decision Problems | Alan Malek, Yasin Abbasi-Yadkori, Peter Bartlett | We propose two techniques, one based on stochastic convex optimization, and one based on constraint sampling. |

142 | Linear Time Solver for Primal SVM | Feiping Nie, Yizhen Huang, Heng Huang | This paper presents a new L2-norm regularized primal SVM solver using Augmented Lagrange Multipliers, with linear-time computational cost for Lp-norm loss functions. |

143 | Memory (and Time) Efficient Sequential Monte Carlo | Seong-Hwan Jun, Alexandre Bouchard-Côté | Our contribution is a simple scheme that makes the memory cost of SMC methods depend on the number of distinct particles that survive resampling. |

144 | Scaling SVM and Least Absolute Deviations via Exact Data Reduction | Jie Wang, Peter Wonka, Jieping Ye | Motivated by this observation, we present fast and efficient screening rules to discard non-support vectors by analyzing the dual problem of SVM via variational inequalities (DVI). |

145 | Latent Semantic Representation Learning for Scene Classification | Xin Li, Yuhong Guo | In this work, we address this problem by proposing a novel patch-based latent variable model to integrate latent contextual representation learning and classification model training in one joint optimization framework. |

146 | Least Squares Revisited: Scalable Approaches for Multi-class Prediction | Alekh Agarwal, Sham Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant | On the theoretical front, we present several variants with convergence guarantees. |

147 | Local algorithms for interactive clustering | Pranjal Awasthi, Maria Balcan, Konstantin Voevodski | We study the design of interactive clustering algorithms for data sets satisfying natural stability assumptions. |

148 | Model-Based Relational RL When Object Existence is Partially Observable | Ngo Anh Vien, Marc Toussaint | We propose a computationally efficient extension of model-based relational RL methods that approximates these beliefs using discrete uncertainty predicates. |

149 | A new Q(lambda) with interim forward view and Monte Carlo equivalence | Rich Sutton, Ashique Rupam Mahmood, Doina Precup, Hado Hasselt | In this paper, we introduce a new version of Q(lambda) that does exactly that, without significantly increased algorithmic complexity. |

150 | On Robustness and Regularization of Structural Support Vector Machines | Mohamad Ali Torkamani, Daniel Lowd | In this paper, we explore the problem of learning robust models for structured prediction problems. |

151 | Guess-Averse Loss Functions For Cost-Sensitive Multiclass Boosting | Oscar Beijbom, Mohammad Saberian, David Kriegman, Nuno Vasconcelos | Guess-Averse Loss Functions For Cost-Sensitive Multiclass Boosting |

152 | Multimodal Neural Language Models | Ryan Kiros, Ruslan Salakhutdinov, Rich Zemel | We introduce two multimodal neural language models: models of natural language that can be conditioned on other modalities. |

153 | Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods | Jascha Sohl-Dickstein, Ben Poole, Surya Ganguli | We present an algorithm for minimizing a sum of functions that combines the computational efficiency of stochastic gradient descent (SGD) with the second order curvature information leveraged by quasi-Newton methods. |

154 | Alternating Minimization for Mixed Linear Regression | Xinyang Yi, Constantine Caramanis, Sujay Sanghavi | In this paper we provide a new initialization procedure for EM, based on finding the leading two eigenvectors of an appropriate matrix. |

155 | Stochastic Neighbor Compression | Matt Kusner, Stephen Tyree, Kilian Weinberger, Kunal Agrawal | We present Stochastic Neighborhood Compression (SNC), an algorithm to compress a dataset for the purpose of k-nearest neighbor (kNN) classification. |

156 | Robust Learning under Uncertain Test Distributions: Relating Covariate Shift to Model Misspecification | Junfeng Wen, Chun-Nam Yu, Russell Greiner | Our empirical studies, on UCI datasets and a real-world cancer prognostic prediction dataset, show that our analysis applies, and that our RCSA works effectively. |

157 | Nonparametric Estimation of Multi-View Latent Variable Models | Le Song, Animashree Anandkumar, Bo Dai, Bo Xie | In this paper, we propose a kernel method for learning multi-view latent variable models, allowing each mixture component to be nonparametric and learned from data in an unsupervised fashion. |

158 | Structured Generative Models of Natural Source Code | Chris Maddison, Daniel Tarlow | We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans. |

159 | A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data | Jinfeng Yi, Lijun Zhang, Jun Wang, Rong Jin, Anil Jain | In this work, we focus on the problem of clustering high-dimensional data with sparse centers. |

160 | Statistical analysis of stochastic gradient methods for generalized linear models | Panagiotis Toulis, Edoardo Airoldi, Jason Rennie | We study the statistical properties of stochastic gradient descent (SGD) using explicit and implicit updates for fitting generalized linear models (GLMs). |
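For squared loss, the implicit (proximal) SGD update studied in this line of work has a closed form, which makes its stability easy to see. A hedged sketch (toy noiseless linear model, illustrative names, not the authors' code):

```python
import numpy as np

def explicit_sgd_step(theta, x, y, lr):
    # standard SGD: gradient evaluated at the current iterate
    return theta - lr * (x @ theta - y) * x

def implicit_sgd_step(theta, x, y, lr):
    # implicit update solves theta' = theta - lr * (x @ theta' - y) * x;
    # for squared loss this gives the shrunken explicit step below
    return theta - lr / (1.0 + lr * (x @ x)) * (x @ theta - y) * x

rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
for _ in range(1000):
    x = rng.normal(size=3)
    y = x @ theta_true
    # lr = 5.0 is large enough that the explicit update would diverge here,
    # yet the implicit step remains contractive
    theta = implicit_sgd_step(theta, x, y, lr=5.0)
assert np.allclose(theta, theta_true, atol=1e-6)
```

The effective step size `lr / (1 + lr * ||x||^2)` never exceeds `1 / ||x||^2`, which is the stability advantage the paper's statistical analysis quantifies.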

161 | Coding for Random Projections | Ping Li, Michael Mitzenmacher, Anshumali Shrivastava | In this paper, we study a number of simple coding schemes, focusing on the task of similarity estimation and on an application to training linear classifiers. |

162 | Fast Computation of Wasserstein Barycenters | Marco Cuturi, Arnaud Doucet | We present new algorithms to compute the mean of a set of N empirical probability measures under the optimal transport metric. |
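The barycenter algorithms in this line of work build on entropy-regularized optimal transport. A hedged sketch of the Sinkhorn subroutine between two histograms on a small grid (illustrative sizes and names; the barycenter solver calls this kind of smoothed OT repeatedly):

```python
import numpy as np

def sinkhorn(a, b, C, reg=1.0, iters=500):
    """Entropy-regularized optimal transport between histograms a and b
    with cost matrix C, via Sinkhorn's matrix-scaling iterations."""
    K = np.exp(-C / reg)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)             # scale to match column marginals
        u = a / (K @ v)               # scale to match row marginals
    return u[:, None] * K * v[None, :]  # transport plan

n = 5
a = np.full(n, 1.0 / n)
b = np.full(n, 1.0 / n)
xs = np.arange(n, dtype=float)
C = (xs[:, None] - xs[None, :]) ** 2  # squared-distance cost on the grid
P = sinkhorn(a, b, C)
assert np.allclose(P.sum(axis=1), a, atol=1e-6)  # marginals match
assert np.allclose(P.sum(axis=0), b, atol=1e-6)
```

Each iteration is two matrix-vector products, which is what makes the smoothed approach scale to many measures.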

163 | Global graph kernels using geometric embeddings | Fredrik Johansson, Vinay Jethava, Devdatt Dubhashi, Chiranjib Bhattacharyya | This paper presents two graph kernels defined on unlabeled graphs which capture global properties of graphs using the celebrated Lovász number and its associated orthonormal representation. |

164 | Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data | Zhiyuan Chen, Bing Liu | To address this problem, several knowledge-based topic models have been proposed to incorporate prior domain knowledge from the user. |

165 | K-means recovers ICA filters when independent components are sparse | Alon Vinnikov, Shai Shalev-Shwartz | The goal of this work is to shed light on the success of K-means with whitening for the task of unsupervised feature learning. |

166 | Learning Mixtures of Linear Classifiers | Yuekai Sun, Stratis Ioannidis, Andrea Montanari | We consider a discriminative learning (regression) problem, whereby the regression function is a convex combination of k linear classifiers. |

167 | The Falling Factorial Basis and Its Statistical Applications | Yu-Xiang Wang, Alex Smola, Ryan Tibshirani | We study a novel spline-like basis, which we name the *falling factorial* basis, bearing many similarities to the classic truncated power basis. |

168 | Nonmyopic e-Bayes-Optimal Active Learning of Gaussian Processes | Trong Nghia Hoang, Bryan Kian Hsiang Low, Patrick Jaillet, Mohan Kankanhalli | This paper presents a novel nonmyopic ε-Bayes-optimal active learning (ε-BAL) approach that jointly and naturally optimizes the trade-off. |

169 | A Unifying View of Representer Theorems | Andreas Argyriou, Francesco Dinuzzo | In this paper we propose a unified view, which generalizes the concept of representer theorems and extends necessary and sufficient conditions for such theorems to hold. |

170 | Online Clustering of Bandits | Claudio Gentile, Shuai Li, Giovanni Zappella | We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation (“bandit”) strategies. |

171 | Cold-start Active Learning with Robust Ordinal Matrix Factorization | Neil Houlsby, Jose Miguel Hernandez-Lobato, Zoubin Ghahramani | We present a new matrix factorization model for rating data and a corresponding active learning strategy to address the cold-start problem. |

172 | Multivariate Maximal Correlation Analysis | Hoang Vu Nguyen, Emmanuel Müller, Jilles Vreeken, Pavel Efros, Klemens Böhm | We propose MAC, a novel multivariate correlation measure designed for discovering multi-dimensional patterns. |

173 | Efficient Label Propagation | Yasuhiro Fujiwara, Go Irie | This paper proposes an efficient label propagation algorithm that guarantees exactly the same labeling results as those yielded by optimal labeling scores. |
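For context, the labeling scheme being accelerated is the classic iterative label propagation. A minimal baseline sketch on a toy graph (not the paper's efficient algorithm, whose point is to reach the same labels faster):

```python
import numpy as np

# Toy graph: nodes 0-1-2 form a triangle, with a path 2-3-4 attached.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
labels = {0: 0, 4: 1}                    # two labeled seed nodes, two classes
n, k = W.shape[0], 2
F = np.zeros((n, k))
for node, lab in labels.items():
    F[node, lab] = 1.0
D_inv = np.diag(1.0 / W.sum(axis=1))     # inverse degree matrix
for _ in range(100):
    F = D_inv @ W @ F                    # average label scores over neighbors
    for node, lab in labels.items():     # clamp the labeled nodes
        F[node] = 0.0
        F[node, lab] = 1.0
pred = F.argmax(axis=1)
assert pred[1] == 0 and pred[3] == 1     # each node inherits its nearer seed's label
```

The iteration converges to the harmonic solution; the paper's contribution is computing those optimal labeling scores' labels without running it to convergence on large graphs.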

174 | Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-thresholding Algorithm | Hadi Daneshmand, Manuel Gomez-Rodriguez, Le Song, Bernhard Schoelkopf | In this paper, we investigate the network structure inference problem for a general family of continuous-time diffusion models using an l1-regularized likelihood maximization framework. |

175 | Coupled Group Lasso for Web-Scale CTR Prediction in Display Advertising | Ling Yan, Wu-Jun Li, Gui-Rong Xue, Dingyi Han | In this paper, we propose a novel model, called coupled group lasso (CGL), for CTR prediction in display advertising. |

176 | Putting MRFs on a Tensor Train | Alexander Novikov, Anton Rodomanov, Anton Osokin, Dmitry Vetrov | In the paper we present a new framework for dealing with probabilistic graphical models. |

177 | Efficient Algorithms for Robust One-bit Compressive Sensing | Lijun Zhang, Jinfeng Yi, Rong Jin | In this paper, we study the vector recovery problem from noisy one-bit measurements, and develop two novel algorithms with formal theoretical guarantees. |

178 | Learning Complex Neural Network Policies with Trajectory Optimization | Sergey Levine, Vladlen Koltun | In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies, represented by neural networks. |

179 | Composite Quantization for Approximate Nearest Neighbor Search | Ting Zhang, Chao Du, Jingdong Wang | This paper presents a novel compact coding approach, composite quantization, for approximate nearest neighbor search. |

180 | Local Ordinal Embedding | Yoshikazu Terada, Ulrike Luxburg | We study the problem of ordinal embedding: given a set of ordinal constraints of the form distance(i,j) < distance(k,l) for some quadruples (i,j,k,l) of indices, the goal is to construct a point configuration x̂_1, …, x̂_n in R^p that preserves these constraints as well as possible. |

181 | Reducing Dueling Bandits to Cardinal Bandits | Nir Ailon, Zohar Karnin, Thorsten Joachims | We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. |

182 | Large-margin Weakly Supervised Dimensionality Reduction | Chang Xu, Dacheng Tao, Chao Xu, Yong Rui | A novel framework is proposed that integrates two aspects of the large margin principle (angle and distance), which simultaneously encourage angle consistency between preference pairs and maximize the distance between examples in preference pairs. |

183 | Joint Inference of Multiple Label Types in Large Networks | Deepayan Chakrabarti, Stanislav Funiak, Jonathan Chang, Sofus Macskassy | We tackle the problem of inferring node labels in a partially labeled graph where each node in the graph has multiple label types and each label type has a large number of possible labels. |

184 | Hard-Margin Active Linear Regression | Elad Hazan, Zohar Karnin | We consider the fundamental problem of linear regression in which the designer can actively choose observations. |

185 | Maximum Margin Multiclass Nearest Neighbors | Aryeh Kontorovich, Roi Weiss | We develop a general framework for margin-based multicategory classification in metric spaces. |

186 | Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications | Tian Lin, Bruno Abrahao, Robert Kleinberg, John Lui, Wei Chen | In this paper, we propose the model of combinatorial partial monitoring games with linear feedback, a model which simultaneously addresses limited feedback, infinite outcome space of the environment and exponentially large action space of the player. |

187 | Sparse meta-Gaussian information bottleneck | Melani Rey, Volker Roth, Thomas Fuchs | We present a new sparse compression technique based on the information bottleneck (IB) principle, which takes into account side information. |

188 | Nonparametric Estimation of Renyi Divergence and Friends | Akshay Krishnamurthy, Kirthevasan Kandasamy, Barnabas Poczos, Larry Wasserman | We consider nonparametric estimation of L_2, Renyi-α, and Tsallis-α divergences between continuous distributions. |

189 | Robust Inverse Covariance Estimation under Noisy Measurements | Jun-Kun Wang, Shou-de Lin | This paper proposes a robust method to estimate the inverse covariance under noisy measurements. |

190 | Bayesian Optimization with Inequality Constraints | Jacob Gardner, Matt Kusner, Zhixiang Xu, Kilian Weinberger, John Cunningham | Here we present constrained Bayesian optimization, which places a prior distribution on both the objective and the constraint functions. |

191 | Circulant Binary Embedding | Felix Yu, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang | To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix. |
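The computational point of CBE is that a circulant projection is a circular convolution, so it can be applied with the FFT in O(d log d) instead of O(d²). A hedged sketch of that trick (illustrative names, not the authors' implementation):

```python
import numpy as np

def circulant_binary_embed(x, r):
    """Binary code sign(C x), where C is the circulant matrix generated by r.
    The matvec is a circular convolution, computed here via the FFT."""
    proj = np.real(np.fft.ifft(np.fft.fft(r) * np.fft.fft(x)))
    return np.sign(proj)

rng = np.random.default_rng(0)
d = 8
r = rng.normal(size=d)   # random generating vector (the CBE paper also learns it)
x = rng.normal(size=d)

# sanity check: the FFT route matches the dense circulant multiply
C = np.array([[r[(i - k) % d] for k in range(d)] for i in range(d)])
assert np.allclose(C @ x, np.real(np.fft.ifft(np.fft.fft(r) * np.fft.fft(x))))
code = circulant_binary_embed(x, r)
```

The circulant structure also cuts storage from O(d²) to O(d), since only `r` needs to be kept.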

192 | Multiple Testing under Dependence via Semiparametric Graphical Models | Jie Liu, Chunming Zhang, Elizabeth Burnside, David Page | We propose a novel semiparametric approach for multiple testing under dependence, which estimates f1 adaptively. |

193 | Making Fisher Discriminant Analysis Scalable | Bojun Tu, Zhihua Zhang, Shusen Wang, Hui Qian | In this paper we present theoretical analysis on the approximation error of a two-stage algorithm. |

194 | Hierarchical Dirichlet Scaling Process | Dongwoo Kim, Alice Oh | We present the hierarchical Dirichlet scaling process (HDSP), a Bayesian nonparametric mixed membership model for multi-labeled data. |

195 | Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process | Issei Sato, Hiroshi Nakagawa | We theoretically analyze the SGLD algorithm with constant stepsize in two ways. |

196 | A PAC-Bayesian bound for Lifelong Learning | Anastasia Pentina, Christoph Lampert | In this work we study lifelong learning from a theoretical perspective. |

197 | Communication-Efficient Distributed Optimization using an Approximate Newton-type Method | Ohad Shamir, Nati Srebro, Tong Zhang | We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems. |

198 | Concept Drift Detection Through Resampling | Maayan Harel, Shie Mannor, Ran El-Yaniv, Koby Crammer | We present theoretical guarantees for the proposed procedure based on the stability of the underlying learning algorithms. |

199 | Anti-differentiating approximation algorithms: A case study with min-cuts, spectral, and flow | David Gleich, Michael Mahoney | We explore this concept with a case study of approximation algorithms for finding locally-biased partitions in data graphs, demonstrating connections between min-cut objectives, a personalized version of the popular PageRank vector, and the highly effective “push” procedure for computing an approximation to personalized PageRank. |

200 | A Bayesian Wilcoxon signed-rank test based on the Dirichlet process | Alessio Benavoli, Giorgio Corani, Francesca Mangili, Marco Zaffalon, Fabrizio Ruggeri | We propose a nonparametric Bayesian version of the Wilcoxon signed-rank test using a Dirichlet process (DP) based prior. |
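The frequentist test the paper generalizes is the classical Wilcoxon signed-rank statistic, which is short enough to sketch by hand (toy paired data, no ties among the absolute differences; this is the baseline, not the paper's Dirichlet-process version):

```python
import numpy as np

def signed_rank_statistic(x, y):
    """Classical Wilcoxon signed-rank statistic W+ for paired samples.
    Assumes no ties among the nonzero |differences|."""
    d = np.asarray(x, float) - np.asarray(y, float)
    d = d[d != 0.0]                          # drop zero differences
    ranks = np.empty(d.size)
    ranks[np.abs(d).argsort()] = np.arange(1, d.size + 1)  # rank the |d|
    return float(ranks[d > 0].sum())         # sum of ranks of positive diffs

x = np.array([1.1, 2.3, 3.0, 4.8, 5.2])
y = np.array([0.9, 2.0, 3.5, 4.0, 5.1])
W = signed_rank_statistic(x, y)              # here W+ = 11, W- = 4
```

The DP-based version replaces the point statistic with a posterior distribution over it, which is what allows the decisions with uncertainty that the paper advocates.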

201 | Min-Max Problems on Factor Graphs | Siamak Ravanbakhsh, Christopher Srinivasa, Brendan Frey, Russell Greiner | We study the min-max problem in factor graphs, which seeks the assignment that minimizes the maximum value over all factors. |

202 | Distributed Stochastic Gradient MCMC | Sungjin Ahn, Babak Shahbaba, Max Welling | Here we introduce the first fully distributed MCMC algorithm based on stochastic gradients. |

203 | Nearest Neighbors Using Compact Sparse Codes | Anoop Cherian | In this paper, we propose a novel scheme for approximate nearest neighbor (ANN) retrieval based on dictionary learning and sparse coding. |

204 | Optimal Mean Robust Principal Component Analysis | Feiping Nie, Jianjun Yuan, Heng Huang | In this paper, we propose novel robust PCA objective functions with removing optimal mean automatically. |

205 | Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows | Robert Busa-Fekete, Eyke Huellermeier, Balázs Szörényi | We address the problem of rank elicitation assuming that the underlying data generating process is characterized by a probability distribution on the set of all rankings (total orders) of a given set of items. |

206 | Hierarchical Conditional Random Fields for Outlier Detection: An Application to Detecting Epileptogenic Cortical Malformations | Bilal Ahmed, Thomas Thesen, Karen Blackmon, Yijun Zhao, Orrin Devinsky, Ruben Kuzniecky, Carla Brodley | We cast the problem of detecting and isolating regions of abnormal cortical tissue in the MRIs of epilepsy patients in an image segmentation framework. |

207 | A Physics-Based Model Prior for Object-Oriented MDPs | Jonathan Scholz, Martin Levihn, Charles Isbell, David Wingate | We present a physics-based approach that exploits modern simulation tools to efficiently parameterize physical dynamics. |

208 | Outlier Path: A Homotopy Algorithm for Robust SVM | Shinya Suzumura, Kohei Ogawa, Masashi Sugiyama, Ichiro Takeuchi | In this paper, we address these two issues simultaneously in an integrated way by introducing a novel homotopy approach to RSVM learning. |

209 | Ensemble-Based Tracking: Aggregating Crowdsourced Structured Time Series Data | Naiyan Wang, Dit-Yan Yeung | We propose a factorial hidden Markov model (FHMM) for ensemble-based tracking by learning jointly the unknown trajectory of the target and the reliability of each tracker in the ensemble. |

210 | Latent Confusion Analysis by Normalized Gamma Construction | Issei Sato, Hisashi Kashima, Hiroshi Nakagawa | We aim to summarize the workers’ confusion matrices with a small number of latent principal confusion matrices, because a large number of individual confusion matrices is difficult to analyze. |

211 | Finito: A faster, permutable incremental gradient method for big data problems | Aaron Defazio, Justin Domke, Tiberio Caetano | In this work we introduce a new method in this class with a theoretical convergence rate four times faster than existing methods, for sums with sufficiently many terms. |

212 | Ensemble Methods for Structured Prediction | Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri | We present a series of learning algorithms and theoretical guarantees for designing accurate ensembles of structured prediction tasks. |

213 | Standardized Mutual Information for Clustering Comparisons: One Step Further in Adjustment for Chance | Simone Romano, James Bailey, Vinh Nguyen, Karin Verspoor | In this paper, we argue that a further type of statistical adjustment for the mutual information is also beneficial – an adjustment to correct selection bias. |

214 | Preserving Modes and Messages via Diverse Particle Selection | Jason Pacheco, Silvia Zuffi, Michael Black, Erik Sudderth | We develop a particle-based max-product algorithm which maintains a diverse set of posterior mode hypotheses, and is robust to initialization. |

215 | Nonlinear Information-Theoretic Compressive Measurement Design | Liming Wang, Abolfazl Razi, Miguel Rodrigues, Robert Calderbank, Lawrence Carin | We investigate design of general nonlinear functions for mapping high-dimensional data into a lower-dimensional (compressive) space. |

216 | Dual Query: Practical Private Query Release for High Dimensional Data | Marco Gaboardi, Emilio Jesus Gallego Arias, Justin Hsu, Aaron Roth, Zhiwei Steven Wu | We present a practical, differentially private algorithm for answering a large number of queries on high dimensional datasets. |

217 | Deep Boosting | Corinna Cortes, Mehryar Mohri, Umar Syed | We present a new ensemble learning algorithm, DeepBoost, which can use as base classifiers a hypothesis set containing deep decision trees, or members of other rich or complex families, and succeed in achieving high accuracy without overfitting the data. |

218 | Distributed Representations of Sentences and Documents | Quoc Le, Tomas Mikolov | In this paper, we propose an unsupervised algorithm that learns vector representations of sentences and text documents. |

219 | Understanding Protein Dynamics with L1-Regularized Reversible Hidden Markov Models | Robert McGibbon, Bharath Ramsundar, Mohammad Sultan, Gert Kiss, Vijay Pande | We present a machine learning framework for modeling protein dynamics. |

220 | Online Multi-Task Learning for Policy Gradient Methods | Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, Matthew Taylor | To make agents more sample-efficient, we developed a multi-task policy gradient method to learn decision making tasks consecutively, transferring knowledge between tasks to accelerate learning. |

221 | Affinity Weighted Embedding | Jason Weston, Ron Weiss, Hector Yee | We propose a new class of models which aim to provide improved performance while retaining many of the benefits of the existing class of embedding models. |

222 | Learning the Parameters of Determinantal Point Process Kernels | Raja Hafiz Affandi, Emily Fox, Ryan Adams, Ben Taskar | Here we propose Bayesian methods for learning the DPP kernel parameters. |

223 | Discrete Chebyshev Classifiers | Elad Eban, Elad Mezuman, Amir Globerson | Here we present a framework for discriminative learning given a set of statistics. |

224 | Deep AutoRegressive Networks | Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blundell, Daan Wierstra | We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data. |

225 | A Convergence Rate Analysis for LogitBoost, MART and Their Variant | Peng Sun, Tong Zhang, Jie Zhou | We analyze their convergence rates based on a new weak learnability formulation. |

226 | Inferning with High Girth Graphical Models | Uri Heinemann, Amir Globerson | Motivated by this, we propose an algorithm that always returns models of this type, and hence in the models it returns inference is approximately correct. |

227 | Learning Latent Variable Gaussian Graphical Models | Zhaoshi Meng, Brian Eriksson, Al Hero | In this paper, we focus on a family of latent variable Gaussian graphical models (LVGGM), where the model is conditionally sparse given latent variables, but marginally non-sparse. |

228 | Stochastic Backpropagation and Approximate Inference in Deep Generative Models | Danilo Jimenez Rezende, Shakir Mohamed, Daan Wierstra | Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. |

229 | One Practical Algorithm for Both Stochastic and Adversarial Bandits | Yevgeny Seldin, Aleksandrs Slivkins | We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stochastic and adversarial regimes without prior knowledge about the nature of the environment. |

230 | Robust and Efficient Kernel Hyperparameter Paths with Guarantees | Joachim Giesen, Soeren Laue, Patrick Wieschollek | We use this algorithm to compute approximate kernel hyperparameter solution paths for support vector machines and robust kernel regression. |

231 | Active Transfer Learning under Model Shift | Xuezhi Wang, Tzu-Kuo Huang, Jeff Schneider | We propose two transfer learning algorithms that allow changes in all marginal and conditional distributions but assume the changes are smooth in order to achieve transfer between the tasks. |

232 | Approximate Policy Iteration Schemes: A Comparison | Bruno Scherrer | For all algorithms, we describe performance bounds, and make a comparison by paying a particular attention to the concentrability constants involved, the number of iterations and the memory required. |

233 | Stable and Efficient Representation Learning with Nonnegativity Constraints | Tsung-Han Lin, H. T. Kung | In this work, we provide extensive analysis and experimental results to examine and validate the stability advantage of NOMP. |

234 | Sample Efficient Reinforcement Learning with Gaussian Processes | Robert Grande, Thomas Walsh, Jonathan How | This paper derives sample complexity results for using Gaussian Processes (GPs) in both model-based and model-free reinforcement learning (RL). |

235 | Memory and Computation Efficient PCA via Very Sparse Random Projections | Farhad Pourkamali Anaraki, Shannon Hughes | In this paper, we propose an approach to principal component estimation that utilizes projections onto very sparse random vectors with Bernoulli-generated nonzero entries. |

236 | Time-Regularized Interrupting Options (TRIO) | Timothy Mann, Daniel Mankowitz, Shie Mannor | Therefore we introduce a regularization term that favors longer duration skills. |

237 | Randomized Nonlinear Component Analysis | David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, Bernhard Schoelkopf | In this paper we leverage randomness to design scalable new variants of nonlinear PCA and CCA; our ideas extend to key multivariate analysis tools such as spectral clustering or LDA. |

238 | High Order Regularization for Semi-Supervised Learning of Structured Output Problems | Yujia Li, Rich Zemel | We propose a new max-margin framework for semi-supervised structured output learning, that allows the use of powerful discrete optimization algorithms and high order regularizers defined directly on model predictions for the unlabeled examples. |

239 | Transductive Learning with Multi-class Volume Approximation | Gang Niu, Bo Dai, Marthinus Christoffel du Plessis, Masashi Sugiyama | In this paper, we propose a novel generalization to multiple classes, allowing applications of the large volume principle on more learning problems such as multi-class, multi-label and serendipitous learning in a transductive manner. |

240 | Methods of Moments for Learning Stochastic Languages: Unified Presentation and Empirical Comparison | Borja Balle, William Hamilton, Joelle Pineau | In this work, we provide a unified presentation and empirical comparison of three general moment-based methods in the context of modelling stochastic languages. |

241 | Effective Bayesian Modeling of Groups of Related Count Time Series | Nicolas Chapados | This paper introduces a hierarchical Bayesian formulation applicable to count time series that can easily account for explanatory variables and share statistical strength across groups of related time series. |

242 | Variational Inference for Sequential Distance Dependent Chinese Restaurant Process | Sergey Bartunov, Dmitry Vetrov | In this paper we propose a novel variational inference method for the important sequential case of the ddCRP (seqddCRP) by revealing its connection with the Laplacian of a random graph constructed by the process. |

243 | Discovering Latent Network Structure in Point Process Data | Scott Linderman, Ryan Adams | To enable analysis of these implicit networks, we develop a probabilistic model that combines mutually-exciting point processes with random graph models. |

244 | A Kernel Independence Test for Random Processes | Kacper Chwialkowski, Arthur Gretton | A non-parametric approach to the problem of testing the independence of two random processes is developed. |

245 | Learning to Disentangle Factors of Variation with Manifold Interaction | Scott Reed, Kihyuk Sohn, Yuting Zhang, Honglak Lee | We propose to learn manifold coordinates for the relevant factors of variation and to model their joint interaction. |

246 | Learning Modular Structures from Network Data and Node Variables | Elham Azizi, Edoardo Airoldi, James Galagan | Here, we propose an extended model that leverages direct observations about the network in addition to node-specific variables. |

247 | Probabilistic Partial Canonical Correlation Analysis | Yusuke Mukuta, Tatsuya Harada | In this paper, we address these problems by proposing a probabilistic interpretation of partial CCA and deriving a Bayesian estimation method based on the probabilistic model. |

248 | Skip Context Tree Switching | Marc Bellemare, Joel Veness, Erik Talvitie | In this paper we show how to generalize this technique to the class of K-skip prediction suffix trees. |

249 | Lower Bounds for the Gibbs Sampler over Mixtures of Gaussians | Christopher Tosh, Sanjoy Dasgupta | In this paper, we present lower bounds for the mixing time of the Gibbs sampler over Gaussian mixture models with Dirichlet priors. |

250 | Marginalized Denoising Auto-encoders for Nonlinear Representations | Minmin Chen, Kilian Weinberger, Fei Sha, Yoshua Bengio | In this paper we present the marginalized Denoising Auto-encoder (mDAE), which (approximately) marginalizes out the corruption during training. |

251 | Gaussian Processes for Bayesian Estimation in Ordinary Differential Equations | David Barber, Yali Wang | We propose a Gaussian process model that directly links state derivative information with system observations, simplifying previous approaches and providing a natural generative model. |

252 | Fast Multi-stage Submodular Maximization | Kai Wei, Rishabh Iyer, Jeff Bilmes | We introduce a new multi-stage algorithmic framework for submodular maximization. |

253 | Programming by Feedback | Marc Schoenauer, Riad Akrour, Michele Sebag, Jean-Christophe Souplet | This paper advocates a new ML-based programming framework, called Programming by Feedback (PF), which involves a sequence of interactions between the active computer and the user. |

254 | Probabilistic Matrix Factorization with Non-random Missing Data | Jose Miguel Hernandez-Lobato, Neil Houlsby, Zoubin Ghahramani | We propose a probabilistic matrix factorization model for collaborative filtering that learns from data that is missing not at random (MNAR). |

255 | Pursuit-Evasion Without Regret, with an Application to Trading | Lili Dworkin, Michael Kearns, Yuriy Nevmyvaka | We propose a state-based variant of the classical online learning problem of tracking the best expert. |

256 | The f-Adjusted Graph Laplacian: a Diagonal Modification with a Geometric Interpretation | Sven Kurras, Ulrike Luxburg, Gilles Blanchard | Our goal is to re-weight the graph’s edges such that all cuts and volumes behave as if the graph was built on a different sample drawn from an alternative density q. |

257 | Riemannian Pursuit for Big Matrix Recovery | Mingkui Tan, Ivor W. Tsang, Li Wang, Bart Vandereycken, Sinno Jialin Pan | In this paper, we therefore propose an efficient method, called Riemannian Pursuit (RP), that aims to address these two problems simultaneously. |

258 | Dynamic Programming Boosting for Discriminative Macro-Action Discovery | Leonidas Lefakis, Francois Fleuret | Our main contribution is a novel supervised learning algorithm which extends the classical Boosting framework by combining it with dynamic programming. |

259 | Online Stochastic Optimization under Correlated Bandit Feedback | Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill | In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. |

260 | Weighted Graph Clustering with Non-Uniform Uncertainties | Yudong Chen, Shiau Hong Lim, Huan Xu | We propose a clustering algorithm that is based on optimizing an appropriate weighted objective, where larger weights are given to observations with lower uncertainty. |

261 | GeNGA: A Generalization of Natural Gradient Ascent with Positive and Negative Convergence Results | Philip Thomas | In our first contribution, we derive generalized natural gradient ascent (GeNGA), a generalization of NGA which allows for positive semidefinite non-smooth metric tensors. |

262 | A Bayesian Framework for Online Classifier Ensemble | Qinxun Bai, Henry Lam, Stan Sclaroff | We propose a Bayesian framework for recursively estimating the classifier weights in online learning of a classifier ensemble. |

263 | Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm | Jacob Steinhardt, Percy Liang | We present an adaptive variant of the exponentiated gradient algorithm. |

264 | Gaussian Approximation of Collective Graphical Models | Liping Liu, Daniel Sheldon, Thomas Dietterich | The Collective Graphical Model (CGM) models a population of independent and identically distributed individuals when only collective statistics (i.e., counts of individuals) are observed. |

265 | On learning to localize objects with minimal supervision | Hyun Oh Song, Ross Girshick, Stefanie Jegelka, Julien Mairal, Zaid Harchaoui, Trevor Darrell | In this paper, we propose a new method that achieves this goal with only image-level labels of whether the objects are present or not. |

266 | Multiresolution Matrix Factorization | Risi Kondor, Nedelina Teneva, Vikas Garg | Inspired by ideas from multiresolution analysis, this paper introduces a new notion of matrix factorization that can capture structure in matrices at multiple different scales. |

267 | Learnability of the Superset Label Learning Problem | Liping Liu, Thomas Dietterich | In this paper, we analyze Empirical Risk Minimizing learners that use the superset error as the empirical risk measure. |

268 | Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits | Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert Schapire | We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K *actions* in response to the observed *context*, and observes the *reward* only for that action. |

269 | Structured Recurrent Temporal Restricted Boltzmann Machines | Roni Mittelman, Benjamin Kuipers, Silvio Savarese, Honglak Lee | In this work we propose a new class of RTRBM, which explicitly uses a dependency graph to model the structure in the problem and to define the energy function. |

270 | Scalable and Robust Bayesian Inference via the Median Posterior | Stanislav Minsker, Sanvesh Srivastava, Lizhen Lin, David Dunson | We propose a novel general approach to Bayesian inference that is scalable and robust to corruption in the data. |

271 | Kernel Adaptive Metropolis-Hastings | Dino Sejdinovic, Heiko Strathmann, Maria Lomeli Garcia, Christophe Andrieu, Arthur Gretton | A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support. |

272 | Input Warping for Bayesian Optimization of Non-Stationary Functions | Jasper Snoek, Kevin Swersky, Rich Zemel, Ryan Adams | We develop a methodology for automatically learning a wide family of bijective transformations or warpings of the input space using the Beta cumulative distribution function. |

273 | Stochastic Gradient Hamiltonian Monte Carlo | Tianqi Chen, Emily Fox, Carlos Guestrin | In this paper, we explore the properties of such a stochastic gradient HMC approach. |

274 | A Deep Semi-NMF Model for Learning Hidden Representations | George Trigeorgis, Konstantinos Bousmalis, Stefanos Zafeiriou, Bjoern Schuller | In this work we propose a novel model, Deep Semi-NMF, that is able to learn hidden representations that lend themselves to a clustering interpretation according to different, unknown attributes of a given dataset. |

275 | Asynchronous Distributed ADMM for Consensus Optimization | Ruiliang Zhang, James Kwok | In this paper, we propose an asynchronous ADMM algorithm by using two conditions to control the asynchrony: partial barrier and bounded delay. |

276 | Spectral Regularization for Max-Margin Sequence Tagging | Ariadna Quattoni, Borja Balle, Xavier Carreras, Amir Globerson | We frame max-margin learning of latent variable structured prediction models as a convex optimization problem, making use of scoring functions computed by input-output observable operator models. |

277 | Learning by Stretching Deep Networks | Gaurav Pandey, Ambedkar Dukkipati | In this paper, we propose a technique, called ‘stretching’, that allows the same models to perform considerably better with very little training. |

278 | Nonnegative Sparse PCA with Provable Guarantees | Megasthenis Asteris, Dimitris Papailiopoulos, Alexandros Dimakis | We introduce a novel algorithm to compute nonnegative sparse principal components of positive semidefinite (PSD) matrices. |

279 | Active Learning of Parameterized Skills | Bruno Da Silva, George Konidaris, Andrew Barto | We introduce a method for actively learning parameterized skills. |

280 | Learning Ordered Representations with Nested Dropout | Oren Rippel, Michael Gelbart, Ryan Adams | In this paper, we present results on ordered representations of data in which different dimensions have different degrees of importance. |

281 | Learning the Irreducible Representations of Commutative Lie Groups | Taco Cohen, Max Welling | We present a new probabilistic model of compact commutative Lie groups that produces invariant-equivariant and disentangled representations of data. |

282 | Towards End-To-End Speech Recognition with Recurrent Neural Networks | Alex Graves, Navdeep Jaitly | This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. |

283 | Multi-period Trading Prediction Markets with Connections to Machine Learning | Jinli Hu, Amos Storkey | We present a new model for prediction markets, in which we use risk measures to model agents and introduce a market maker to describe the trading process. |

284 | Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets | Diederik Kingma, Max Welling | We show that either of these types of models can often be transformed into an instance of the other, by switching between centered and differentiable non-centered parameterizations of the latent variables. |

285 | Neural Variational Inference and Learning in Belief Networks | Andriy Mnih, Karol Gregor | We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior. |

286 | Scalable Bayesian Low-Rank Decomposition of Incomplete Multiway Tensors | Piyush Rai, Yingjian Wang, Shengbo Guo, Gary Chen, David Dunson, Lawrence Carin | We present a scalable Bayesian framework for low-rank decomposition of multiway tensor data with missing observations. |

287 | Beta Diffusion Trees | Creighton Heaukulani, David Knowles, Zoubin Ghahramani | We define the beta diffusion tree, a random tree structure with a set of leaves that defines a collection of overlapping subsets of objects, known as a feature allocation. |

288 | Learning Character-level Representations for Part-of-Speech Tagging | Cicero Dos Santos, Bianca Zadrozny | In this paper, we propose a deep neural network that learns character-level representations of words and associates them with usual word representations to perform POS tagging. |

289 | Saddle Points and Accelerated Perceptron Algorithms | Adams Wei Yu, Fatma Kilinc-Karzan, Jaime Carbonell | In this paper, we consider the problem of finding a linear (binary) classifier or providing a near-infeasibility certificate if there is none. |

290 | Robust Distance Metric Learning via Simultaneous L1-Norm Minimization and Maximization | Hua Wang, Feiping Nie, Heng Huang | As an important theoretical contribution of this paper, we systematically derive an efficient iterative algorithm to solve the general L1-norm minmax problem, which is rarely studied in the literature. |

291 | Learning from Contagion (Without Timestamps) | Kareem Amin, Hoda Heidari, Michael Kearns | We introduce and study new models for learning from contagion processes in a network. |

292 | Stochastic Variational Inference for Bayesian Time Series Models | Matthew Johnson, Alan Willsky | In this paper we develop SVI algorithms for several common Bayesian time series models, namely the hidden Markov model (HMM), hidden semi-Markov model (HSMM), and the nonparametric HDP-HMM and HDP-HSMM. |

293 | A Clockwork RNN | Jan Koutnik, Klaus Greff, Faustino Gomez, Juergen Schmidhuber | This paper introduces a simple, yet powerful modification to the simple RNN (SRN) architecture, the Clockwork RNN (CW-RNN), in which the hidden layer is partitioned into separate modules, each processing inputs at its own temporal granularity, making computations only at its prescribed clock rate. |

294 | Estimating Latent-Variable Graphical Models using Moments and Likelihoods | Arun Tejasvi Chaganty, Percy Liang | In this work, we show that the method of moments in conjunction with a composite marginal likelihood objective yields consistent parameter estimates for a much broader class of directed and undirected graphical models, including loopy graphs with high treewidth. |

295 | Universal Matrix Completion | Srinadh Bhojanapalli, Prateek Jain | In this work, we address these issues by providing a universal recovery guarantee for matrix completion that works for a variety of sampling schemes. |

296 | Finding Dense Subgraphs via Low-Rank Bilinear Optimization | Dimitris Papailiopoulos, Ioannis Mitliagkas, Alexandros Dimakis, Constantine Caramanis | In this work, we develop a novel algorithm for the Densest k-Subgraph problem (DkS) that searches a low-dimensional space for provably good solutions. |

297 | Compositional Morphology for Word Representations and Language Modelling | Jan Botha, Phil Blunsom | This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model. |

298 | Learning Polynomials with Neural Networks | Alexandr Andoni, Rina Panigrahy, Gregory Valiant, Li Zhang | In this paper, we present several positive theoretical results to support the effectiveness of neural networks. |

299 | Exponential Family Matrix Completion under Structural Constraints | Suriya Gunasekar, Pradeep Ravikumar, Joydeep Ghosh | In this paper, we provide a vastly unified framework for generalized matrix completion by considering a matrix completion setting wherein the matrix entries are sampled from any member of the rich family of *exponential family distributions*; and impose general structural constraints on the underlying matrix, as captured by a general regularizer R(·). |

300 | Sample-based approximate regularization | Philip Bachman, Amir-Massoud Farahmand, Doina Precup | We introduce a method for regularizing linearly parameterized functions using general derivative-based penalties, which relies on sampling as well as finite-difference approximations of the relevant derivatives. |

301 | A Compilation Target for Probabilistic Programming Languages | Brooks Paige, Frank Wood | Forward inference techniques such as sequential Monte Carlo and particle Markov chain Monte Carlo for probabilistic programming can be implemented in any programming language by creative use of standardized operating system functionality including processes, forking, mutexes, and shared memory. |

302 | Adaptive Monte Carlo via Bandit Allocation | James Neufeld, Andras Gyorgy, Csaba Szepesvari, Dale Schuurmans | We consider the problem of sequentially choosing between a set of unbiased Monte Carlo estimators to minimize the mean-squared-error (MSE) of a final combined estimate. |

303 | Efficient Dimensionality Reduction for High-Dimensional Network Estimation | Safiye Celik, Benjamin Logsdon, Su-In Lee | We propose module graphical lasso (MGL), an aggressive dimensionality reduction and network estimation technique for a high-dimensional Gaussian graphical model (GGM). |

304 | Deterministic Anytime Inference for Stochastic Continuous-Time Markov Processes | E. Busra Celikkaya, Christian Shelton | We describe a deterministic anytime method for calculating filtered and smoothed distributions in large variable-based continuous time Markov processes. |

305 | Doubly Stochastic Variational Bayes for non-Conjugate Inference | Michalis Titsias, Miguel Lázaro-Gredilla | We propose a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces. |

306 | Efficient Learning of Mahalanobis Metrics for Ranking | Daryl Lim, Gert Lanckriet | We develop an efficient algorithm to learn a Mahalanobis distance metric by directly optimizing a ranking loss. |

307 | GEV-Canonical Regression for Accurate Binary Class Probability Estimation when One Class is Rare | Arpit Agarwal, Harikrishna Narasimhan, Shivaram Kalyanakrishnan, Shivani Agarwal | In this paper, we use tools from the theory of proper composite losses (Buja et al, 2005; Reid & Williamson, 2010) to construct a canonical underlying CPE loss corresponding to the GEV link, which yields a convex proper composite loss that we call the GEV-canonical loss; this loss is tailored for the task of CPE when one class is rare, and is easy to minimize using an IRLS-type algorithm similar to that used for logistic regression. |

308 | A reversible infinite HMM using normalised random measures | David Knowles, Zoubin Ghahramani, Konstantina Palla | We present a nonparametric prior over reversible Markov chains. |

309 | Structured Low-Rank Matrix Factorization: Optimality, Algorithm, and Applications to Image Processing | Benjamin Haeffele, Eric Young, Rene Vidal | In this paper we explore a matrix factorization technique suitable for large datasets that captures additional structure in the factors by using a projective tensor norm, which includes classical image regularizers such as total variation and the nuclear norm as particular cases. |

310 | Influence Function Learning in Information Diffusion Networks | Nan Du, Yingyu Liang, Maria Balcan, Le Song | In this paper, we exploit the insight that the influence functions in many diffusion models are coverage functions, and propose a novel parameterization of such functions using a convex combination of random basis functions. |