Paper Digest: ICML 2013 Highlights

June 16, 2013 | October 6, 2019 | admin

The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. In 2013, it is to be held in Atlanta, Georgia.

To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free paper digest service to get new paper updates customized to your own interests on a daily basis.

Paper Digest Team
team@paperdigest.org

TABLE 1: ICML 2013 Papers

#	Title	Authors	Highlight
1	An Optimal Policy for Target Localization with Application to Electron Microscopy	Raphael Sznitman, Aurelien Lucchi, Peter Frazier, Bruno Jedynak, Pascal Fua	Within a Bayesian framework, we study the problem of minimizing an objective that combines the entropy of the posterior distribution with the cost of the questions asked.
2	Domain Generalization via Invariant Feature Representation	Krikamol Muandet, David Balduzzi, Bernhard Schölkopf	We propose Domain-Invariant Component Analysis (DICA), a kernel-based optimization algorithm that learns an invariant transformation by minimizing the dissimilarity across domains, whilst preserving the functional relationship between input and output variables.
3	A Spectral Learning Approach to Range-Only SLAM	Byron Boots, Geoff Gordon	We present a novel spectral learning algorithm for simultaneous localization and mapping (SLAM) from range data with known correspondences.
4	Near-Optimal Bounds for Cross-Validation via Loss Stability	Ravi Kumar, Daniel Lokshtanov, Sergei Vassilvitskii, Andrea Vattani	Our work thus quantitatively improves the current best bounds on cross-validation.
5	Sparsity-Based Generalization Bounds for Predictive Sparse Coding	Nishant Mehta, Alexander Gray	We establish the first generalization error bounds for predictive sparse coding, in the overcomplete setting, where the number of features k exceeds the original dimensionality d.
6	Sparse Uncorrelated Linear Discriminant Analysis	Xiaowei Zhang, Delin Chu	In this paper, we develop a novel approach for sparse uncorrelated linear discriminant analysis (ULDA).
7	Block-Coordinate Frank-Wolfe Optimization for Structural SVMs	Simon Lacoste-Julien, Martin Jaggi, Mark Schmidt, Patrick Pletscher	We propose a randomized block-coordinate variant of the classic Frank-Wolfe algorithm for convex optimization with block-separable constraints.
8	Fast Probabilistic Optimization from Noisy Gradients	Philipp Hennig	Generalising recent results, this paper constructs a nonparametric Bayesian quasi-Newton algorithm that learns gradient and Hessian from noisy evaluations of the gradient.
9	Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes	Ohad Shamir, Tong Zhang	In this paper, we investigate the performance of SGD without such smoothness assumptions, as well as a running average scheme to convert the SGD iterates to a solution with optimal optimization accuracy.
10	Stochastic Alternating Direction Method of Multipliers	Hua Ouyang, Niao He, Long Tran, Alexander Gray	In this paper, we present a stochastic setting for optimization problems with non-smooth composite objective functions.
11	Noisy Sparse Subspace Clustering	Yu-Xiang Wang, Huan Xu	This paper considers the problem of subspace clustering under noise.
12	Parallel Markov Chain Monte Carlo for Nonparametric Mixture Models	Sinead Williamson, Avinava Dubey, Eric Xing	In this paper, we describe auxiliary variable representations for the Dirichlet process and the hierarchical Dirichlet process that allow us to perform MCMC using the correct equilibrium distribution, in a distributed manner.
13	Risk Bounds and Learning Algorithms for the Regression Approach to Structured Output Prediction	Sébastien Giguère, François Laviolette, Mario Marchand, Khadidja Sylla	We provide rigorous guarantees for the regression approach to structured output prediction.
14	Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures	James Bergstra, Daniel Yamins, David Cox	In this work, we propose a meta-modeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace hand-tuning with a reproducible and unbiased optimization process.
15	Gibbs Max-Margin Topic Models with Fast Sampling Algorithms	Jun Zhu, Ning Chen, Hugh Perkins, Bo Zhang	This paper presents Gibbs max-margin supervised topic models by minimizing an expected margin loss, an upper bound of the existing margin loss derived from an expected prediction rule.
16	Cost-Sensitive Tree of Classifiers	Zhixiang Xu, Matt Kusner, Kilian Weinberger, Minmin Chen	In this paper, we address the challenge of balancing test-time cost and the classifier accuracy in a principled fashion.
17	Learning Hash Functions Using Column Generation	Xi Li, Guosheng Lin, Chunhua Shen, Anton van den Hengel, Anthony Dick	In this work, we propose a column generation based method for learning data-dependent hash functions on the basis of proximity comparison information.
18	Combinatorial Multi-Armed Bandit: General Framework and Applications	Wei Chen, Yajun Wang, Yang Yuan	We define a general framework for a large class of combinatorial multi-armed bandit (CMAB) problems, where simple arms with unknown distributions form super arms.
19	Near-optimal Batch Mode Active Learning and Adaptive Submodular Optimization	Yuxin Chen, Andreas Krause	We consider batch mode active learning and more general information-parallel stochastic optimization problems that exhibit adaptive submodularity, a natural diminishing returns condition.
20	Convex formulations of radius-margin based Support Vector Machines	Huyen Do, Alexandros Kalousis	In this paper we present two novel algorithms: R-SVM_μ^+, an SVM radius-margin based feature selection algorithm, and R-SVM^+, a metric learning-based SVM.
21	Modelling Sparse Dynamical Systems with Compressed Predictive State Representations	William L. Hamilton, Mahdi Milani Fard, Joelle Pineau	We present a new algorithm, called Compressed Predictive State Representation (CPSR), for learning models of high-dimensional partially observable uncontrolled dynamical systems from small sample sets.
22	A Machine Learning Framework for Programming by Example	Aditya Menon, Omer Tamuz, Sumit Gulwani, Butler Lampson, Adam Kalai	We show how machine learning can be used to speed up this seemingly hopeless search problem, by learning weights that relate textual features describing the provided input-output examples to plausible sub-components of a program.
23	Discriminatively Activated Sparselets	Ross Girshick, Hyun Oh Song, Trevor Darrell	In this paper we describe a new training framework that learns which sparselets to activate in order to optimize a discriminative objective, leading to larger speedup factors with no decrease in task performance.
24	The Pairwise Piecewise-Linear Embedding for Efficient Non-Linear Classification	Ofir Pele, Ben Taskar, Amir Globerson, Michael Werman	In this work we propose a non-linear feature map that is both very efficient and highly expressive.
25	Fixed-Point Model For Structured Labeling	Quannan Li, Jingdong Wang, David Wipf, Zhuowen Tu	In this paper, we propose a simple but effective solution to the structured labeling problem: a fixed-point model.
26	Connecting the Dots with Landmarks: Discriminatively Learning Domain-Invariant Features for Unsupervised Domain Adaptation	Boqing Gong, Kristen Grauman, Fei Sha	In this paper, we propose a novel approach for learning such features.
27	Fast Conical Hull Algorithms for Near-separable Non-negative Matrix Factorization	Abhishek Kumar, Vikas Sindhwani, Prabhanjan Kambadur	In this paper, we reformulate the separable NMF problem as that of finding the extreme rays of the conical hull of a finite set of vectors.
28	Principal Component Analysis on non-Gaussian Dependent Data	Fang Han, Han Liu	In this paper, we analyze the performance of a semiparametric principal component analysis named Copula Component Analysis (COCA) (Han & Liu, 2012) when the data are dependent.
29	Learning Linear Bayesian Networks with Latent Variables	Animashree Anandkumar, Daniel Hsu, Adel Javanmard, Sham Kakade	This work considers the problem of learning linear Bayesian networks when some of the variables are unobserved.
30	Multiple Identifications in Multi-Armed Bandits	Sébastien Bubeck, Tengyao Wang, Nitin Viswanathan	We study the problem of identifying the top m arms in a multi-armed bandit game.
31	Learning Optimally Sparse Support Vector Machines	Andrew Cotter, Shai Shalev-Shwartz, Nati Srebro	We show how to train SVMs with an optimal guarantee on the number of support vectors (up to constants), and with sample complexity and training runtime bounds matching the best known for kernel SVM optimization (i.e. without any additional asymptotic cost beyond standard SVM training).
32	Dynamic Probabilistic Models for Latent Feature Propagation in Social Networks	Creighton Heaukulani, Zoubin Ghahramani	In this paper, we introduce a new probabilistic model for capturing this phenomenon, which we call latent feature propagation, in social networks.
33	Efficient Sparse Group Feature Selection via Nonconvex Optimization	Shuo Xiang, Xiaoshen Tong, Jieping Ye	In this paper, we expand a nonconvex paradigm to sparse group feature selection, which is motivated by applications that require identifying the underlying group structure and performing feature selection simultaneously.
34	Domain Adaptation for Sequence Labeling Tasks with a Probabilistic Language Adaptation Model	Min Xiao, Yuhong Guo	In this paper, we propose to address the problem of domain adaptation for sequence labeling tasks via distributed representation learning by using a log-bilinear language adaptation model.
35	Maximum Variance Correction with Application to A* Search	Wenlin Chen, Kilian Weinberger, Yixin Chen	In this paper we introduce Maximum Variance Correction (MVC), which finds large-scale feasible solutions to Maximum Variance Unfolding (MVU) by post-processing embeddings from any manifold learning algorithm.
36	Adaptive Sparsity in Gaussian Graphical Models	Eleanor Wong, Suyash Awate, P. Thomas Fletcher	In this paper, we introduce a parameter-free method for estimating a precision matrix with sparsity that adapts to the data automatically.
37	Average Reward Optimization Objective In Partially Observable Domains	Yuri Grinberg, Doina Precup	In this paper, we show that PSRs have a well-behaved stationary distribution, which is a rational function of policy parameters.
38	Feature Selection in High-Dimensional Classification	Mladen Kolar, Han Liu	In this paper, we bridge this gap by providing sharp sufficient conditions for consistent variable selection using the ROAD estimator (Fan et al., 2010).
39	Human Boosting	Harsh Pareek, Pradeep Ravikumar	In this paper, we consider the problem of “boosting” human learners to extend the learning ability of human learners and achieve improved performance on tasks which individual humans find difficult.
40	Efficient Dimensionality Reduction for Canonical Correlation Analysis	Haim Avron, Christos Boutsidis, Sivan Toledo, Anastasios Zouzias	We present a fast algorithm for approximate Canonical Correlation Analysis (CCA).
41	Parsing epileptic events using a Markov switching process model for correlated time series	Drausin Wulsin, Emily Fox, Brian Litt	A goal of our work is to parse these complex epileptic events into distinct dynamic regimes.
42	Optimal rates for stochastic convex optimization under Tsybakov noise condition	Aaditya Ramdas, Aarti Singh	We focus on the problem of minimizing a convex function f over a convex set S given T queries to a stochastic first order oracle.
43	A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning	Arash Afkanpour, András György, Csaba Szepesvari, Michael Bowling	We propose a randomized version of the mirror descent algorithm to overcome this issue, under the objective of minimizing the group p-norm penalized empirical risk.
44	Noisy and Missing Data Regression: Distribution-Oblivious Support Recovery	Yudong Chen, Constantine Caramanis	In this paper we develop a simple variant of orthogonal matching pursuit (OMP) for precisely this setting.
45	Dual Averaging and Proximal Gradient Descent for Online Alternating Direction Multiplier Method	Taiji Suzuki	We propose two types of online variants of ADMM, which correspond to online proximal gradient descent and regularized dual averaging respectively.
46	A New Frontier of Kernel Design for Structured Data	Kilho Shin	In this paper, we shed light on a novel important class of kernels within the framework: We give a mathematical characterization of the class, show a parametric method to optimize kernels of the class to specific problems, based on this characterization, and present some experimental results, which show the new kernels are promising in both accuracy and efficiency.
47	Learning with Marginalized Corrupted Features	Laurens van der Maaten, Minmin Chen, Stephen Tyree, Kilian Weinberger	We propose to corrupt training examples with noise from known distributions within the exponential family and present a novel learning algorithm, called marginalized corrupted features (MCF), that trains robust predictors by minimizing the expected value of the loss function under the corrupting distribution – essentially learning with infinitely many (corrupted) training examples.
48	Approximation properties of DBNs with binary hidden units and real-valued visible units	Oswin Krause, Asja Fischer, Tobias Glasmachers, Christian Igel	We analyze the approximation properties of DBNs with two layers of binary hidden units and visible units with conditional distributions from the exponential family.
49	Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization	Martin Jaggi	We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach.
50	General Functional Matrix Factorization Using Gradient Boosting	Tianqi Chen, Hang Li, Qiang Yang, Yong Yu	In this paper, we study the problem of utilizing auxiliary information as features of factorization and propose formalizing the problem as general functional matrix factorization, whose model includes conventional matrix factorization models as its special cases.
51	Iterative Learning and Denoising in Convolutional Neural Associative Memories	Amin Karbasi, Amir Hesam Salavati, Amin Shokrollahi	More specifically, we devise an iterative algorithm that learns the redundancy among the patterns.
52	Scaling Multidimensional Gaussian Processes using Projected Additive Approximations	Elad Gilboa, Yunus Saatçi, John Cunningham	This paper introduces and tests a novel method of projected additive approximation to multidimensional GPs.
53	Active Learning for Multi-Objective Optimization	Marcela Zuluaga, Guillaume Sergent, Andreas Krause, Markus Püschel	To address this challenge, we propose the Pareto Active Learning (PAL) algorithm, which intelligently samples the design space to predict the Pareto-optimal set.
54	A Generalized Kernel Approach to Structured Output Learning	Hachem Kadri, Mohammad Ghavamzadeh, Philippe Preux	We study the problem of structured output learning from a regression perspective.
55	Efficient Active Learning of Halfspaces: an Aggressive Approach	Alon Gonen, Sivan Sabato, Shai Shalev-Shwartz	We revisit the aggressive approach for active learning in the realizable case, and show that it can be made efficient and practical, while also having theoretical guarantees under reasonable assumptions.
56	Enhanced statistical rankings via targeted data collection	Braxton Osting, Christoph Brune, Stanley Osher	This reduction of the data collection problem to a spectral graph-theoretic question is one of the primary contributions of this work.
57	Online Feature Selection for Model-based Reinforcement Learning	Trung Nguyen, Zhuoru Li, Tomi Silander, Tze Yun Leong	We propose a new framework for learning the world dynamics of feature-rich environments in model-based reinforcement learning.
58	ELLA: An Efficient Lifelong Learning Algorithm	Paul Ruvolo, Eric Eaton	In this paper, we develop a method for online multi-task learning in the lifelong learning setting.
59	A Structural SVM Based Approach for Optimizing Partial AUC	Harikrishna Narasimhan, Shivani Agarwal	In this paper, we develop a structural SVM framework for directly optimizing the partial AUC between any two false positive rates.
60	Convex Relaxations for Learning Bounded-Treewidth Decomposable Graphs	K. S. Sesh Kumar, Francis Bach	In this paper, we pose it as a combinatorial optimization problem, which is then relaxed to a convex optimization problem that involves searching over the forest and hyperforest polytopes with special structures.
61	Adaptive Task Assignment for Crowdsourced Classification	Chien-Ju Ho, Shahin Jabbari, Jennifer Wortman Vaughan	We investigate the problem of task assignment and label inference for heterogeneous classification tasks.
62	Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning	Odalric-Ambrym Maillard, Phuong Nguyen, Ronald Ortner, Daniil Ryabko	We propose an algorithm whose regret after T time steps is O(√T), with all constants reasonably small.
63	Better Mixing via Deep Representations	Yoshua Bengio, Gregoire Mesnil, Yann Dauphin, Salah Rifai	To better understand this, we propose a secondary conjecture: the higher-level samples fill more uniformly the space they occupy and the high-density manifolds tend to unfold when represented at higher levels.
64	Online Latent Dirichlet Allocation with Infinite Vocabulary	Ke Zhai, Jordan Boyd-Graber	Topic models based on latent Dirichlet allocation (LDA) assume a predefined vocabulary a priori.
65	Characterizing the Representer Theorem	Yaoliang Yu, Hao Cheng, Dale Schuurmans, Csaba Szepesvari	In this paper we completely characterize the necessary and sufficient conditions on the regularizer that ensure the representer theorem holds.
66	Dynamical Models and tracking regret in online convex programming	Eric Hall, Rebecca Willett	This paper describes a new online convex optimization method which incorporates a family of candidate dynamical models and establishes novel tracking regret bounds that scale with the comparator’s deviation from the best dynamical model in this family.
67	Large-Scale Bandit Problems and KWIK Learning	Jacob Abernethy, Kareem Amin, Michael Kearns, Moez Draief	We provide such results in both the standard parametric MAB setting, as well as for a new model in which the action space is finite but growing with time.
68	Vanishing Component Analysis	Roi Livni, David Lehavi, Sagi Schein, Hila Nachliely, Shai Shalev-Shwartz, Amir Globerson	Here we describe and analyze an efficient procedure that constructs a set of generators of a vanishing ideal.
69	Learning an Internal Dynamics Model from Control Demonstration	Matthew Golub, Steven Chase, Byron Yu	Here, we consider the problem of learning the subject’s internal model from demonstrations of control and knowledge of task goals.
70	Robust Structural Metric Learning	Daryl Lim, Gert Lanckriet, Brian McFee	In this paper, we present an efficient and robust structural metric learning algorithm which enforces group sparsity on the learned transformation, while optimizing for structured ranking output prediction.
71	Constrained fractional set programs and their application in local clustering and community detection	Thomas Bühler, Shyam Sundar Rangapuram, Simon Setzer, Matthias Hein	In this paper we show that every constrained minimization problem of a ratio of non-negative set functions allows a tight relaxation into an unconstrained continuous optimization problem.
72	Efficient Semi-supervised and Active Learning of Disjunctions	Nina Balcan, Christopher Berlind, Steven Ehrlich, Yingyu Liang	We provide efficient algorithms for learning disjunctions in the semi-supervised setting under a natural regularity assumption introduced by (Balcan & Blum, 2005).
73	Convex Adversarial Collective Classification	MohamadAli Torkamani, Daniel Lowd	In this paper, we present a novel method for robustly performing collective classification in the presence of a malicious adversary that can modify up to a fixed number of binary-valued attributes.
74	Rounding Methods for Discrete Linear Classification	Yann Chevaleyre, Frédéric Koriche, Jean-Daniel Zucker	In this paper, the learning task is cast as a combinatorial optimization problem: given a set of positive and negative feature vectors in the Euclidean space, the goal is to find a discrete linear function that minimizes the cumulative hinge loss of this training set.
75	Mixture of Mutually Exciting Processes for Viral Diffusion	Shuang-Hong Yang, Hongyuan Zha	This paper shows that these two tasks can be addressed simultaneously with a probabilistic model involving a mixture of mutually exciting point processes.
76	Gaussian Process Vine Copulas for Multivariate Dependence	David Lopez-Paz, Jose Miguel Hernández-Lobato, Zoubin Ghahramani	In this paper, we relax this assumption by discovering the latent functions that specify the shape of a conditional copula given its conditioning variables.
77	Stochastic Simultaneous Optimistic Optimization	Michal Valko, Alexandra Carpentier, Rémi Munos	We study the problem of global maximization of a function f given a finite number of evaluations perturbed by noise.
78	Toward Optimal Stratification for Stratified Monte-Carlo Integration	Alexandra Carpentier, Rémi Munos	In this paper we provide an algorithm, Monte-Carlo Upper-Lower Confidence Bound, that selects online, among a large class of partitions, the partition that provides a near-optimal trade-off, and allocates the samples almost optimally on this partition.
79	A General Iterative Shrinkage and Thresholding Algorithm for Non-convex Regularized Optimization Problems	Pinghua Gong, Changshui Zhang, Zhaosong Lu, Jianhua Huang, Jieping Ye	In this paper, we propose a General Iterative Shrinkage and Thresholding (GIST) algorithm to solve the nonconvex optimization problem for a large class of non-convex penalties.
80	Thurstonian Boltzmann Machines: Learning from Multiple Inequalities	Truyen Tran, Dinh Phung, Svetha Venkatesh	We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time.
81	A Variational Approximation for Topic Modeling of Hierarchical Corpora	Do-kyum Kim, Geoffrey Voelker, Lawrence Saul	We study the problem of topic modeling in corpora whose documents are organized in a multi-level hierarchy.
82	Forecastable Component Analysis	Georg Goerg	I introduce Forecastable Component Analysis (ForeCA), a novel dimension reduction technique for temporally dependent signals.
83	Ellipsoidal Multiple Instance Learning	Gabriel Krummenacher, Cheng Soon Ong, Joachim Buhmann	We propose a large margin method for asymmetric learning with ellipsoids, called eMIL, suited to multiple instance learning (MIL). We introduce a MIL dataset from a real world application of detecting wheel defects from multiple partial observations, and show that eMIL outperforms competing approaches.
84	Local Low-Rank Matrix Approximation	Joonseok Lee, Seungyeon Kim, Guy Lebanon, Yoram Singer	We propose a new matrix approximation model where we assume instead that the matrix is locally of low-rank, leading to a representation of the observed matrix as a weighted sum of low-rank matrices.
85	Generic Exploration and K-armed Voting Bandits	Tanguy Urvoy, Fabrice Clerot, Raphael Féraud, Sami Naamane	We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits.
86	A unifying framework for vector-valued manifold regularization and multi-view learning	Minh Hà Quang, Loris Bazzani, Vittorio Murino	This paper presents a general vector-valued reproducing kernel Hilbert spaces (RKHS) formulation for the problem of learning an unknown functional dependency between a structured input space and a structured output space, in the Semi-Supervised Learning setting.
87	Learning Connections in Financial Time Series	Gartheeban Ganeshapillai, John Guttag, Andrew Lo	We present a machine learning-based method to build a connectedness matrix to address the shortcomings of correlation in capturing events such as large losses.
88	Fast dropout training	Sida Wang, Christopher Manning	Based on an examination of the implied objective function of dropout training, we show how to do fast dropout training by sampling from or integrating a Gaussian approximation, instead of doing Monte Carlo optimization of this objective.
89	Scalable Optimization of Neighbor Embedding for Visualization	Zhirong Yang, Jaakko Peltonen, Samuel Kaski	We demonstrate that the obvious approach of subsampling produces inferior results and propose a generic approximated optimization technique that reduces the NE optimization cost to O(n log n).
90	Precision-recall space to correct external indices for biclustering	Blaise Hanczar, Mohamed Nadif	All these algorithms aim to obtain coherent biclusters and it is crucial to have a reliable procedure for their validation.
91	Monochromatic Bi-Clustering	Sharon Wulff, Ruth Urner, Shai Ben-David	We propose a natural cost function for the bi-clustering task, the monochromatic cost.
92	Gated Autoencoders with Tied Input Weights	Alain Droniou, Olivier Sigaud	In this work, we extend the mathematical study of (Memisevic, 2012b) to show that it is possible to use a unique projection for both images in a way that turns intermediate layers as spectrum encoders of transformations.
93	Strict Monotonicity of Sum of Squares Error and Normalized Cut in the Lattice of Clusterings	Nicola Rebagliati	Here we study monotonicity not just on the minimizers but on the entire clustering lattice.
94	Transition Matrix Estimation in High Dimensional Time Series	Fang Han, Han Liu	In this paper, we propose a new method for estimating transition matrices of high dimensional vector autoregressive (VAR) models.
95	Label Partitioning For Sublinear Ranking	Jason Weston, Ameesh Makadia, Hector Yee	We present a general approach for converting an algorithm which has linear time in the size of the set to a sublinear one via label partitioning.
96	Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passing	Huayan Wang, Daphne Koller	In this paper, we show a “unified” message passing algorithm that: (a) subsumes MPLP, MSD, and TRW-S as special cases when applied to their respective choices of dual objective and blocks, and (b) is able to perform BCD under much more flexible choices of blocks (including very large blocks) as well as the dual objective itself (that arise from an arbitrary dual decomposition).
97	Collaborative hyperparameter tuning	Rémi Bardenet, Mátyás Brendel, Balázs Kégl, Michèle Sebag	In this paper, we propose a generic method to incorporate knowledge from previous experiments when simultaneously tuning a learning algorithm on new problems at hand.
98	SADA: A General Framework to Support Robust Causation Discovery	Ruichu Cai, Zhenjie Zhang, Zhifeng Hao	Motivated by the observations of the local sparsity properties on causal structures, we propose a general Split-and-Merge strategy, named SADA, to enhance the scalability of a wide class of causality discovery algorithms.
99	Learning and Selecting Features Jointly with Point-wise Gated Boltzmann Machines	Kihyuk Sohn, Guanyu Zhou, Chansoo Lee, Honglak Lee	To address this problem, we propose a point-wise gated Boltzmann machine, a unified generative model that combines feature learning and feature selection.
100	Sequential Bayesian Search	Zheng Wen, Branislav Kveton, Brian Eriksson, Sandilya Bhamidipati	We propose an efficient learning algorithm for solving the problem, sequential Bayesian search (SBS), and prove that it is Bayesian optimal.
101	Sparse projections onto the simplex	Anastasios Kyrillidis, Stephen Becker, Volkan Cevher, Christoph Koch	Most learning methods with rank or sparsity constraints use convex relaxations, which lead to optimization with the nuclear norm or the ℓ1-norm.
102	Modeling Musical Influence with Topic Models	Uri Shalit, Daphna Weinshall, Gal Chechik	In this work we approach the question of influence by applying topic-modeling tools (Blei & Lafferty, 2006; Gerrish & Blei, 2010) to a dataset of 24941 songs by 9222 artists, from the years 1922 to 2010.
103	Subtle Topic Models and Discovering Subtly Manifested Software Concerns Automatically	Mrinal Das, Suparna Bhattacharya, Chiranjib Bhattacharyya, Gopinath Kanchi	To address this issue we propose subtle topic models (STM). We pose this problem as that of discovering topics that rarely occur in individual documents, which we will refer to as subtle topics.
104	Exploring the Mind: Integrating Questionnaires and fMRI	Esther Salazar, Ryan Bogdan, Adam Gorka, Ahmad Hariri, Lawrence Carin	A new model is developed for joint analysis of ordered, categorical, real and count data.
105	A proximal Newton framework for composite minimization: Graph learning without Cholesky decompositions and matrix inversions	Quoc Tran Dinh, Anastasios Kyrillidis, Volkan Cevher	We propose an algorithmic framework for convex minimization problems of composite functions with two terms: a self-concordant part and a possibly nonsmooth regularization part.
106	A Practical Algorithm for Topic Modeling with Provable Guarantees	Sanjeev Arora, Rong Ge, Yonatan Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu	In this paper we present an algorithm for learning topic models that is both provable and practical.
107	Distributed training of Large-scale Logistic models	Siddharth Gopal, Yiming Yang	In this paper, we propose a training method for large-scale multinomial logistic models that breaks this bottleneck by enabling parallel optimization of the likelihood objective.
108	An Adaptive Learning Rate for Stochastic Variational Inference	Rajesh Ranganath, Chong Wang, David Blei, Eric Xing	We solve this problem by developing an adaptive learning rate for stochastic inference.
109	Margins, Shrinkage, and Boosting	Matus Telgarsky	This manuscript shows that AdaBoost and its immediate variants can produce approximately maximum margin classifiers simply by scaling their step size choices by a fixed small constant.
110	Canonical Correlation Analysis based on Hilbert-Schmidt Independence Criterion and Centered Kernel Target Alignment	Billy Chang, Uwe Kruger, Rafal Kustra, Junping Zhang	The aim of this article is to introduce two nonlinear CCA extensions that rely on the recently proposed Hilbert-Schmidt independence criterion and the centered kernel target alignment.
111	Large-Scale Learning with Less RAM via Randomization	Daniel Golovin, D. Sculley, Brendan McMahan, Michael Young	We reduce the memory footprint of popular large-scale online learning methods by projecting our weight vector onto a coarse discrete set using randomized rounding.
112	Taming the Curse of Dimensionality: Discrete Integration by Hashing and Optimization	Stefano Ermon, Carla Gomes, Ashish Sabharwal, Bart Selman	We propose a randomized algorithm that, with high probability, gives a constant-factor approximation of a general discrete integral defined over an exponentially large set.
113	Sparse coding for multitask and transfer learning	Andreas Maurer, Massi Pontil, Bernardino Romera-Paredes	We investigate the use of sparse coding and dictionary learning in the context of multitask and transfer learning.
114	Direct Modeling of Complex Invariances for Visual Object Features	Ka Yu Hui	In this paper we consider an alternate strategy of directly modeling complex invariances of object features.
115	Hierarchically-coupled hidden Markov models for learning kinetic rates from single-molecule data	Jan-Willem van de Meent, Jonathan Bronson, Frank Wood, Ruben Gonzalez Jr., Chris Wiggins	We address the problem of analyzing sets of noisy time-varying signals that all report on the same process but confound straightforward analyses due to complex inter-signal heterogeneities and measurement artifacts.
116	Activized Learning with Uniform Classification Noise	Liu Yang, Steve Hanneke	We prove that for any VC class, it is possible to transform any passive learning algorithm into an active learning algorithm with strong asymptotic improvements in label complexity for every nontrivial distribution satisfying a uniform classification noise condition.
117	Guided Policy Search	Sergey Levine, Vladlen Koltun	We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima.
118	Squared-loss Mutual Information Regularization: A Novel Information-theoretic Approach to Semi-supervised Learning	Gang Niu, Wittawat Jitkrittum, Bo Dai, Hirotaka Hachiya, Masashi Sugiyama	We propose squared-loss mutual information regularization (SMIR) for multi-class probabilistic classification, following the information maximization principle.
119	Gossip-based distributed stochastic bandit algorithms	Balazs Szorenyi, Robert Busa-Fekete, Istvan Hegedus, Robert Ormandi, Mark Jelasity, Balazs Kegl	Our goal is to adapt stochastic bandit algorithms to P2P networks.
120	The Sample-Complexity of General Reinforcement Learning	Tor Lattimore, Marcus Hutter, Peter Sunehag	We study the sample-complexity of reinforcement learning in a general setting without assuming ergodicity or finiteness of the environment.
121	Hierarchical Regularization Cascade for Joint Learning	Alon Zweig, Daphna Weinshall	We present a hierarchical approach which exploits information sharing among different classification tasks, in multi-task and multi-class settings.
122	Multi-Class Classification with Maximum Margin Multiple Kernel	Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh	We present a new algorithm for multi-class classification with multiple kernels.
123	Bayesian Games for Adversarial Regression Problems	Michael Großhans, Christoph Sawade, Michael Brückner, Tobias Scheffer	We study regression problems in which an adversary can exercise some control over the data generation process.
124	Optimistic Knowledge Gradient Policy for Optimal Budget Allocation in Crowdsourcing	Xi Chen, Qihang Lin, Dengyong Zhou	To solve the computational challenge, we propose a novel approximate policy which is called optimistic knowledge gradient.
125	Markov Network Estimation From Multi-attribute Data	Mladen Kolar, Han Liu, Eric Xing	In this paper, we propose a new principled framework for estimating multi-attribute graphs.
126	MILEAGE: Multiple Instance LEArning with Global Embedding	Dan Zhang, Jingrui He, Luo Si, Richard Lawrence	This paper proposes a novel framework – Multiple Instance LEArning with Global Embedding (MILEAGE), in which the global feature vectors for traditional learning methods are integrated into the MIL setting.
127	Guaranteed Sparse Recovery under Linear Transformation	Ji Liu, Lei Yuan, Jieping Ye	We consider the following signal recovery problem: given a measurement matrix Φ ∈ ℝ^{n×p} and a noisy observation vector c ∈ ℝ^n constructed from c = Φθ^* + ε, where ε ∈ ℝ^n is the noise vector whose entries follow i.i.d. centered sub-Gaussian distribution, how to recover the signal θ^* if Dθ^* is sparse under a linear transformation D ∈ ℝ^{m×p}?
128	Learning invariant features by harnessing the aperture problem	Roland Memisevic, Georgios Exarchakis	The energy model is a simple, biologically inspired approach to extracting relationships between images in tasks like stereopsis and motion analysis.
129	Efficient Ranking from Pairwise Comparisons	Fabian Wauthier, Michael Jordan, Nebojsa Jojic	In this paper we show that much simpler algorithms can match the same Ω(n) lower bound in expectation.
130	Differentially Private Learning with Kernels	Prateek Jain, Abhradeep Thakurta	In this paper, we consider the problem of differentially private learning where access to the training features is through a kernel function only.
131	Thompson Sampling for Contextual Bandits with Linear Payoffs	Shipra Agrawal, Navin Goyal	In this paper, we design and analyze Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary.
132	Learning Multiple Behaviors from Unlabeled Demonstrations in a Latent Controller Space	Javier Almingol, Luis Montesano, Manuel Lopes	In this paper we introduce a method to learn multiple behaviors in the form of motor primitives from an unlabeled dataset.
133	Inference algorithms for pattern-based CRFs on sequence data	Rustem Takhanov, Vladimir Kolmogorov	We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP.
134	One-Bit Compressed Sensing: Provable Support and Vector Recovery	Sivakant Gopi, Praneeth Netrapalli, Prateek Jain, Aditya Nori	In this paper, we study the problem of one-bit compressed sensing (1-bit CS), where the goal is to design a measurement matrix A and a recovery algorithm s.t. a k-sparse vector x^* can be efficiently recovered back from signed linear measurements, i.e., b = sign(Ax^*).
135	Tensor Analyzers	Yichuan Tang, Ruslan Salakhutdinov, Geoffrey Hinton	In this paper, we introduce Tensor Analyzers which are a multilinear generalization of Factor Analyzers.
136	Learning Sparse Penalties for Change-point Detection using Max Margin Interval Regression	Toby Hocking, Guillem Rigaill, Jean-Philippe Vert, Francis Bach	In this work, we propose to learn the penalty and its constants in databases of signals with weak change-point annotations.
137	Learning from Human-Generated Lists	Kwang-Sung Jun, Jerry Zhu, Burr Settles, Timothy Rogers	We propose a generative model – sampling with reduced replacement (SWIRL) – for such lists.
138	A Fast and Exact Energy Minimization Algorithm for Cycle MRFs	Huayan Wang, Daphne Koller	In this paper, we present a fast and exact algorithm for energy minimization in cycle MRFs, which can be used as a subroutine in tackling general MRFs.
139	Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning	Daniel Tarlow, Kevin Swersky, Laurent Charlin, Ilya Sutskever, Rich Zemel	Here we present kNCA, which generalizes NCA by learning distance metrics that are appropriate for kNN with arbitrary k.
140	An Efficient Posterior Regularized Latent Variable Model for Interactive Sound Source Separation	Nicholas Bryan, Gautham Mysore	For the constraints considered, an efficient expectation-maximization algorithm is derived with closed-form multiplicative updates, drawing connections to non-negative matrix factorization methods, and allowing for high-quality interactive-rate separation without explicit training data.
141	Estimating Unknown Sparsity in Compressed Sensing	Miles Lopes	In this paper, we propose to estimate a stable measure of sparsity s(x) := ‖x‖_1^2/‖x‖_2^2, which is a sharp lower bound on ‖x‖_0.
142	MAD-Bayes: MAP-based Asymptotic Derivations from Bayes	Tamara Broderick, Brian Kulis, Michael Jordan	We demonstrate several other algorithms, all of which are scalable and simple to implement.
143	The Most Generative Maximum Margin Bayesian Networks	Robert Peharz, Sebastian Tschiatschek, Franz Pernkopf	In this paper, we introduce a novel approach of hybrid generative-discriminative learning for Bayesian networks.
144	Fastfood – Computing Hilbert Space Expansions in loglinear time	Quoc Le, Tamas Sarlos, Alexander Smola	This paper proposes an improvement to random kitchen sinks that offers significantly faster computation in log-linear time without sacrificing accuracy.
145	Joint Transfer and Batch-mode Active Learning	Rita Chattopadhyay, Wei Fan, Ian Davidson, Sethuraman Panchanathan, Jieping Ye	In this work, we present an integrated framework that performs transfer and active learning simultaneously by solving a single convex optimization problem.
146	Message passing with l1 penalized KL minimization	Yuan Qi, Yandong Guo	To address this issue, we propose an approximate inference method, relaxed expectation propagation (REP), based on a new divergence with an l1 penalty.
147	Mean Reversion with a Variance Threshold	Marco Cuturi, Alexandre d'Aspremont	Starting from a multivariate data set, we study several techniques to isolate affine combinations of the variables with a maximum amount of mean reversion, while constraining the variance to be larger than a given threshold.
148	Top-down particle filtering for Bayesian decision trees	Balaji Lakshminarayanan, Daniel Roy, Yee Whye Teh	We present a sequential Monte Carlo (SMC) algorithm that instead works in a top-down manner, mimicking the behavior and speed of classic algorithms.
149	Smooth Sparse Coding via Marginal Regression for Learning Sparse Representations	Krishnakumar Balasubramanian, Kai Yu, Guy Lebanon	We propose and analyze a novel framework for learning sparse representations, based on two statistical techniques: kernel smoothing and marginal regression.
150	Robust and Discriminative Self-Taught Learning	Hua Wang, Feiping Nie, Heng Huang	To address this, we propose a novel robust and discriminative self-taught learning approach to utilize any unlabeled data without the above restrictions.
151	Safe Policy Iteration	Matteo Pirotta, Marcello Restelli, Alessio Pecorino, Daniele Calandriello	We propose two safe policy-iteration algorithms that differ in the way the next policy is chosen w.r.t. the estimated greedy policy.
152	Unfolding Latent Tree Structures using 4th Order Tensors	Mariya Ishteva, Haesun Park, Le Song	In this paper, we propose a quartet based approach which is agnostic to this number.
153	Learning Fair Representations	Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, Cynthia Dwork	We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the proportion in the population as a whole), and individual fairness (similar individuals should be treated similarly).
154	Hierarchical Tensor Decomposition of Latent Tree Graphical Models	Le Song, Mariya Ishteva, Ankur Parikh, Eric Xing, Haesun Park	We approach the problem of estimating the parameters of a latent tree graphical model from a hierarchical tensor decomposition point of view.
155	No more pesky learning rates	Tom Schaul, Sixin Zhang, Yann LeCun	We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time.
156	Multi-View Clustering and Feature Learning via Structured Sparsity	Hua Wang, Feiping Nie, Heng Huang	In this paper, we propose a novel multi-view learning model to integrate all features and learn the weight for every feature with respect to each cluster individually via new joint structured sparsity-inducing norms.
157	Planning by Prioritized Sweeping with Small Backups	Harm Van Seijen, Rich Sutton	In this paper, we introduce a new planning backup that uses only the current value of a single successor state and has a computation time independent of the number of successor states.
158	Solving Continuous POMDPs: Value Iteration with Incremental Learning of an Efficient Space Representation	Sebastian Brechtel, Tobias Gindele, Rüdiger Dillmann	We propose the novel concept of learning a discrete representation of the continuous state space to solve the integrals in continuous POMDPs efficiently and generalize sparse calculations over the continuous space.
159	Learning Heteroscedastic Models by Convex Programming under Group Sparsity	Arnak Dalalyan, Mohamed Hebiri, Katia Meziani, Joseph Salmon	In this paper, we propose a new approach to the joint estimation of the conditional mean and the conditional variance in a high-dimensional (auto-) regression setting.
160	Covariate Shift in Hilbert Space: A Solution via Surrogate Kernels	Kai Zhang, Vincent Zheng, Qiaojun Wang, James Kwok, Qiang Yang, Ivan Marsic	Motivated by this, we propose to match data distributions in the Hilbert space, which, given a pre-defined empirical kernel map, can be formulated as aligning kernel matrices across domains.
161	A Local Algorithm for Finding Well-Connected Clusters	Zeyuan Allen Zhu, Silvio Lattanzi, Vahab Mirrokni	Motivated by applications of large-scale graph clustering, we study random-walk-based LOCAL algorithms whose running times depend only on the size of the output cluster, rather than the entire graph.
162	Efficient Multi-label Classification with Many Labels	Wei Bi, James Kwok	In this paper, we address this problem by selecting a small subset of class labels that can approximately span the original label space.
163	Spectral Compressed Sensing via Structured Matrix Completion	Yuxin Chen, Yuejie Chi	To address this problem, we develop a novel nonparametric algorithm, called enhanced matrix completion (EMaC), based on structured matrix completion.
164	Multi-Task Learning with Gaussian Matrix Generalized Inverse Gaussian Model	Ming Yang, Yingming Li, Zhongfei Zhang	In this paper, we study the multi-task learning problem with a new perspective of considering the structure of the residue error matrix and the low-rank approximation to the task covariance matrix simultaneously.
165	Simple Sparsification Improves Sparse Denoising Autoencoders in Denoising Highly Corrupted Images	Kyunghyun Cho	We propose that a simple sparsification of the latent representation found by the encoder improves denoising performance, when the DAE was trained with sparsity regularization.
166	On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions	Purushottam Kar, Bharath Sriperumbudur, Prateek Jain, Harish Karnick	In this paper, we study the generalization properties of online learning based stochastic methods for supervised learning problems where the loss function is dependent on more than one training sample (e.g., metric learning, ranking).
167	Non-Linear Stationary Subspace Analysis with Application to Video Classification	Mahsa Baktashmotlagh, Mehrtash Harandi, Abbas Bigdeli, Brian Lovell, Mathieu Salzmann	In this paper, we introduce Non-Linear Stationary Subspace Analysis: A method that overcomes this issue by explicitly separating the stationary parts of the video signal (i.e., the parts shared across all videos in one class), from its non-stationary parts (i.e., specific to individual videos).
168	Two-Sided Exponential Concentration Bounds for Bayes Error Rate and Shannon Entropy	Jean Honorio, Tommi Jaakkola	We provide a method that approximates the Bayes error rate and the Shannon entropy with high probability.
169	That was fast! Speeding up NN search of high dimensional distributions.	Emanuele Coviello, Adeel Mumtaz, Antoni Chan, Gert Lanckriet	The main technical contribution of the paper is a novel and efficient algorithm for deciding whether to explore nodes during backtracking, based on a variational approximation.
170	Entropic Affinities: Properties and Efficient Numerical Computation	Max Vladymyrov, Miguel Carreira-Perpinan	Hinton and Roweis (2003) introduced a way to set the scale individually for each point so that it has a distribution over neighbors with a desired perplexity, or effective number of neighbors.
171	Local Deep Kernel Learning for Efficient Non-linear SVM Prediction	Cijo Jose, Prasoon Goyal, Parv Aggrwal, Manik Varma	Our objective is to speed up non-linear SVM prediction while maintaining classification accuracy above an acceptable limit.
172	Temporal Difference Methods for the Variance of the Reward To Go	Aviv Tamar, Dotan Di Castro, Shie Mannor	In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward.
173	∝SVM for Learning with Label Proportions	Felix Yu, Dong Liu, Sanjiv Kumar, Tony Jebara, Shih-Fu Chang	We propose a new method called proportion-SVM, or ∝SVM, which explicitly models the latent unknown instance labels together with the known group label proportions in a large-margin framework.
174	Parameter Learning and Convergent Inference for Dense Random Fields	Philipp Kraehenbuehl, Vladlen Koltun	In this paper, we present an efficient algorithm for learning parameters in dense random fields.
175	Loss-Proportional Subsampling for Subsequent ERM	Paul Mineiro, Nikos Karampatziakis	We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk.
176	Scalable Simple Random Sampling and Stratified Sampling	Xiangrui Meng	In this work, we describe a scalable simple random sampling algorithm, named ScaSRS, which uses probabilistic thresholds to decide on the fly whether to accept, reject, or wait-list an item independently of others.
177	Riemannian Similarity Learning	Li Cheng	We tackle this problem in a Riemannian optimization framework.
178	On Compact Codes for Spatially Pooled Features	Yangqing Jia, Oriol Vinyals, Trevor Darrell	In this paper we analyze the classification accuracy with respect to dictionary size by linking the encoding stage to kernel methods and Nyström sampling, and obtain useful bounds on accuracy as a function of size.
179	Dynamic Covariance Models for Multivariate Financial Time Series	Yue Wu, Jose Miguel Hernandez-Lobato, Zoubin Ghahramani	To address these problems we introduce a novel dynamic model for time-changing covariances.
180	Revisiting the Nystrom method for improved large-scale machine learning	Alex Gittens, Michael Mahoney	Revisiting the Nystrom method for improved large-scale machine learning
181	Infinite Positive Semidefinite Tensor Factorization for Source Separation of Mixture Signals	Kazuyoshi Yoshii, Ryota Tomioka, Daichi Mochihashi, Masataka Goto	This paper presents a new class of tensor factorization called positive semidefinite tensor factorization (PSDTF) that decomposes a set of positive semidefinite (PSD) matrices into the convex combinations of fewer PSD basis matrices.
182	A Unified Robust Regression Model for Lasso-like Algorithms	Wenzhuo Yang, Huan Xu	We develop a unified robust linear regression model and show that it is equivalent to a general regularization framework to encourage sparse-like structure that contains group Lasso and fused Lasso as specific examples.
183	Quickly Boosting Decision Trees – Pruning Underachieving Features Early	Ron Appel, Thomas Fuchs, Piotr Dollar, Pietro Perona	We propose a principled approach to overcome this drawback.
184	On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance	Aditya Menon, Harikrishna Narasimhan, Shivani Agarwal, Sanjay Chawla	In this paper, we study consistency with respect to one such performance measure, namely the arithmetic mean of the true positive and true negative rates (AM), and establish that some simple methods that have been used in practice, such as applying an empirically determined threshold to a suitable class probability estimate or performing an empirically balanced form of risk minimization, are in fact consistent with respect to the AM (under mild conditions on the underlying distribution).
185	Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment	Jason Chuang, Sonal Gupta, Christopher Manning, Jeffrey Heer	We introduce a framework to support large-scale assessment of topical relevance.
186	Online Kernel Learning with a Near Optimal Sparsity Bound	Lijun Zhang, Jinfeng Yi, Rong Jin, Ming Lin, Xiaofei He	In this work, we focus on Online Sparse Kernel Learning that aims to online learn a kernel classifier with a bounded number of support vectors.
187	Spectral Learning of Hidden Markov Models from Dynamic and Static Data	Tzu-Kuo Huang, Jeff Schneider	We develop spectral learning algorithms for Hidden Markov Models that learn not only from time series, or dynamic data, but also from static data drawn independently from the HMM’s stationary distribution.
188	Analogy-preserving Semantic Embedding for Visual Object Categorization	Sung Ju Hwang, Kristen Grauman, Fei Sha	We propose to model analogies that reflect the relationships between multiple pairs of classes simultaneously, in the form “p is to q, as r is to s”.
189	Algebraic classifiers: a generic approach to fast cross-validation, online training, and parallel training	Michael Izbicki	But not all classifiers have an obvious structure, so we introduce the Free HomTrainer.
190	Factorial Multi-Task Learning : A Bayesian Nonparametric Approach	Sunil Gupta, Dinh Phung, Svetha Venkatesh	To this end, we propose a framework that groups the tasks based on their relatedness in a low dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups.
191	Modeling Information Propagation with Survival Theory	Manuel Gomez-Rodriguez, Jure Leskovec, Bernhard Schölkopf	Here we apply survival theory to develop general additive and multiplicative risk models under which the network inference problems can be solved efficiently by exploiting their convexity.
192	Better Rates for Any Adversarial Deterministic MDP	Ofer Dekel, Elad Hazan	We devise a new algorithm that pushes the state-of-the-art forward in two ways: First, it attains a regret of O(T^{2/3}) with respect to the best fixed policy in hindsight, whereas the previous best regret bound was O(T^{3/4}).
193	ABC Reinforcement Learning	Christos Dimitrakakis, Nikolaos Tziortziotis	We introduce a simple, general framework for likelihood-free Bayesian reinforcement learning, through Approximate Bayesian Computation (ABC).
194	Sharp Generalization Error Bounds for Randomly-projected Classifiers	Robert Durrant, Ata Kaban	We derive sharp bounds on the generalization error of a generic linear classifier trained by empirical risk minimization on randomly-projected data.
195	On learning parametric-output HMMs	Aryeh Kontorovich, Boaz Nadler, Roi Weiss	We present a novel approach to learning an HMM whose outputs are distributed according to a parametric family.
196	LDA Topic Model with Soft Assignment of Descriptors to Words	Daphna Weinshall, Gal Levi, Dmitri Hanukaev	Here we extend the LDA model to deal with documents that are represented more naturally by bags of continuous descriptors.
197	On autoencoder scoring	Hanna Kamyshanska, Roland Memisevic	In this paper we show how an autoencoder can assign meaningful scores to data independently of training procedure and without reference to any probabilistic model, by interpreting it as a dynamical system.
198	Infinite Markov-Switching Maximum Entropy Discrimination Machines	Sotirios Chatzis	In this paper, we present a method that combines the merits of Bayesian nonparametrics, specifically stick-breaking priors, and large-margin kernel machines in the context of sequential data classification.
199	A PAC-Bayesian Approach for Domain Adaptation with Specialization to Linear Classifiers	Pascal Germain, Amaury Habrard, François Laviolette, Emilie Morvant	We provide a first PAC-Bayesian analysis for domain adaptation (DA) which arises when the learning and test distributions differ.
200	Sparse PCA through Low-rank Approximations	Dimitris Papailiopoulos, Alexandros Dimakis, Stavros Korokythakis	We introduce a novel algorithm that computes the k-sparse principal component of a positive semidefinite matrix A.
201	Computation-Risk Tradeoffs for Covariance-Thresholded Regression	Dinah Shender, John Lafferty	We present a family of linear regression estimators that provides a fine-grained tradeoff between statistical accuracy and computational efficiency.
202	Exact Rule Learning via Boolean Compressed Sensing	Dmitry Malioutov, Kush Varshney	We propose an interpretable rule-based classification system based on ideas from Boolean compressed sensing. Furthermore we construct rule sets from these learned clauses using set covering and boosting.
203	Robust Sparse Regression under Adversarial Corruption	Yudong Chen, Constantine Caramanis, Shie Mannor	We consider three popular algorithms in the uncorrupted setting: Thresholding Regression, Lasso, and the Dantzig selector, and show that the counterparts obtained using the trimmed inner product are provably robust.
204	Optimization with First-Order Surrogate Functions	Julien Mairal	In this paper, we study optimization methods consisting of iteratively minimizing surrogates of an objective function.
205	Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation	Hema Koppula, Ashutosh Saxena	In this paper, we reason about these alternate possibilities by reasoning over multiple possible graph structures.
206	Consistency versus Realizable H-Consistency for Multiclass Classification	Phil Long, Rocco Servedio	We identify a property of a loss function, realizable consistency with respect to a restricted class of scoring functions, that accounts for this difference.
207	Feature Multi-Selection among Subjective Features	Sivan Sabato, Adam Kalai	We give theoretically-motivated “feature multi-selection” algorithms that choose, among a large set of candidate features, not only which features to judge but how many times to judge each one.
208	Domain Adaptation under Target and Conditional Shift	Kun Zhang, Bernhard Schölkopf, Krikamol Muandet, Zhikun Wang	We exploit importance reweighting or sample transformation to find the learning machine that works well on test data, and propose to estimate the weights or transformations by reweighting or transforming training data to reproduce the covariate distribution on the test domain.
209	Collective Stability in Structured Prediction: Generalization from One Example	Ben London, Bert Huang, Ben Taskar, Lise Getoor	In this paper, we identify a set of natural conditions – weak dependence, hypothesis complexity and a new measure, collective stability – that are sufficient for generalization from even a single example, without imposing an explicit generative model of the data.
210	Stable Coactive Learning via Perturbation	Karthik Raman, Thorsten Joachims, Pannaga Shivaswamy, Tobias Schnabel	User feedback takes the form of preferences, and recent work has introduced online algorithms that learn from this weak feedback.
211	Max-Margin Multiple-Instance Dictionary Learning	Xinggang Wang, Baoyuan Wang, Xiang Bai, Wenyu Liu, Zhuowen Tu	In this paper, we propose a multiple instance learning (MIL) strategy (along the line of weakly supervised learning) for dictionary learning.
212	Fast Semidifferential-based Submodular Function Optimization	Rishabh Iyer, Stefanie Jegelka, Jeff Bilmes	We present a practical and powerful new framework for both unconstrained and constrained submodular function optimization based on discrete semidifferentials (sub- and super-differentials).
213	Kernelized Bayesian Matrix Factorization	Mehmet Gönen, Suleiman Khan, Samuel Kaski	We extend kernelized matrix factorization with a fully Bayesian treatment and with an ability to work with multiple side information sources expressed as different kernels.
214	Learning the Structure of Sum-Product Networks	Robert Gens, Pedro Domingos	We propose the first algorithm for learning the structure of SPNs that takes full advantage of their expressiveness.
215	Quantile Regression for Large-scale Applications	Jiyan Yang, Xiangrui Meng, Michael Mahoney	Here, we present a randomized algorithm that runs in time that is nearly linear in the size of the input and that, with constant probability, computes a (1+ε) approximate solution to an arbitrary quantile regression problem.
216	Robust Regression on MapReduce	Xiangrui Meng, Michael Mahoney	In this work, we propose an efficient algorithm for solving strongly over-determined (m ≫ n) robust ℓ_p regression problems to moderate precision on MapReduce.
217	Infinitesimal Annealing for Training Semi-Supervised Support Vector Machines	Kohei Ogawa, Motoki Imamura, Ichiro Takeuchi, Masashi Sugiyama	In this paper, we go beyond this trade-off by proposing a novel training algorithm that efficiently performs annealing with an infinitesimal resolution.
218	One-Pass AUC Optimization	Wei Gao, Rong Jin, Shenghuo Zhu, Zhi-Hua Zhou	In this work, we focus on one-pass AUC optimization that requires only going through the training data once without storing the entire training dataset, where conventional online learning algorithms cannot be applied directly because AUC is measured by a sum of losses defined over pairs of instances from different classes.
219	Learning Convex QP Relaxations for Structured Prediction	Jeremy Jancsary, Sebastian Nowozin, Carsten Rother	We introduce a new large margin approach to discriminative training of intractable discrete graphical models.
220	Concurrent Reinforcement Learning from Customer Interactions	David Silver, Leonard Newnham, David Barker, Suzanne Weller, Jason McFall	In this paper, we explore applications in which a company interacts concurrently with many customers.
221	Saving Evaluation Time for the Decision Function in Boosting: Representation and Reordering Base Learner	Peng Sun, Jie Zhou	In this paper, we present a novel method for ordering.
222	Stability and Hypothesis Transfer Learning	Ilja Kuzborskij, Francesco Orabona	Particularly, we conduct a theoretical analysis of HTL by considering the algorithmic stability of a class of HTL algorithms based on Regularized Least Squares with biased regularization.
223	Fast Dual Variational Inference for Non-Conjugate Latent Gaussian Models	Mohammad Emtiyaz Khan, Aleksandr Aravkin, Michael Friedlander, Matthias Seeger	In this paper, we derive a novel dual variational inference approach, which exploits the convexity property of the VG approximations.
224	Modeling Temporal Evolution and Multiscale Structure in Networks	Tue Herlau, Morten Mørup, Mikkel Schmidt	We propose a model for temporally correlated multifurcating hierarchies in complex networks which jointly capture both effects.
225	Dependent Normalized Random Measures	Changyou Chen, Vinayak Rao, Wray Buntine, Yee Whye Teh	In this paper we propose two constructions of dependent normalized random measures, a class of nonparametric priors over dependent probability measures.
226	Fast Max-Margin Matrix Factorization with Data Augmentation	Minjie Xu, Jun Zhu, Bo Zhang	In this paper we present a probabilistic M3F model that admits a highly efficient Gibbs sampling algorithm through data augmentation.
227	Natural Image Bases to Represent Neuroimaging Data	Ashish Gupta, Murat Ayhan, Anthony Maida	The key technique in this paper is the use of cross-domain features to represent MRI data.
228	Breaking the Small Cluster Barrier of Graph Clustering	Nir Ailon, Yudong Chen, Huan Xu	This paper investigates graph clustering in the planted cluster model in the presence of small clusters.
229	Approximate Inference in Collective Graphical Models	Daniel Sheldon, Tao Sun, Akshat Kumar, Tom Dietterich	We study the problem of approximate inference in collective graphical models (CGMs), which were recently introduced to model the problem of learning and inference with noisy aggregate observations.
230	Scaling the Indian Buffet Process via Submodular Maximization	Colorado Reed, Zoubin Ghahramani	In this work, we use Kurihara & Wellings (2008)’s maximization-expectation framework to perform approximate MAP inference for linear-Gaussian latent feature models with an Indian Buffet Process (IBP) prior.
231	Mini-Batch Primal and Dual Methods for SVMs	Martin Takac, Avleen Bijral, Peter Richtarik, Nati Srebro	We address the issue of using mini-batches in stochastic optimization of SVMs.
232	The lasso, persistence, and cross-validation	Darren Homrighausen, Daniel McDonald	We consider the high-dimensional setting wherein the number of predictors p=n^α, α>0 grows with the number of observations.
233	Spectral Experts for Estimating Mixtures of Linear Regressions	Arun Tejasvi Chaganty, Percy Liang	In this paper, we develop a new computationally efficient and provably consistent estimator for the mixture of linear regressions, a simple instance of discriminative latent-variable models.
234	Distribution to Distribution Regression	Junier Oliva, Barnabas Poczos, Jeff Schneider	We analyze ‘Distribution to Distribution regression’, where one regresses a mapping in which both the covariate (inputs) and response (outputs) are distributions.
235	Regularization of Neural Networks using DropConnect	Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus	We introduce DropConnect, a generalization of DropOut, for regularizing large fully-connected layers within neural networks.
236	Gaussian Process Kernels for Pattern Discovery and Extrapolation	Andrew Wilson, Ryan Adams	We introduce simple closed form kernels that can be used with Gaussian processes to discover patterns and enable extrapolation.
237	Anytime Representation Learning	Zhixiang Xu, Matt Kusner, Gao Huang, Kilian Weinberger	We introduce Anytime Feature Representations (AFR), a novel algorithm that explicitly addresses this trade-off in the data representation rather than in the classifier.
238	Algorithms for Direct 0–1 Loss Optimization in Binary Classification	Tan Nguyen, Scott Sanner	In this paper, however, we do just that: we explore a variety of practical methods for direct (approximate) optimization of the 0–1 loss based on branch and bound search, combinatorial search, and coordinate descent on smooth, differentiable relaxations of 0–1 loss.
239	Top-k Selection based on Adaptive Sampling of Noisy Preferences	Robert Busa-Fekete, Balazs Szorenyi, Weiwei Cheng, Paul Weng, Eyke Huellermeier	We propose and formally analyze a general preference-based racing algorithm that we instantiate with three specific ranking procedures and corresponding sampling schemes.
240	The Extended Parameter Filter	Yusuf Bugra Erol, Lei Li, Bharath Ramsundar, Stuart Russell	In this paper, we demonstrate a connection between Storvik’s filter and a Kalman filter in parameter space and establish more general conditions under which Storvik’s filter works.
241	Exploiting Ontology Structures and Unlabeled Data for Learning	Nina Balcan, Avrim Blum, Yishay Mansour	We present and analyze a theoretical model designed to understand and explain the effectiveness of ontologies for learning multiple related tasks from primarily unlabeled data.
242	O(logT) Projections for Stochastic Optimization of Smooth and Strongly Convex Functions	Lijun Zhang, Tianbao Yang, Rong Jin, Xiaofei He	In this paper, we present a novel algorithm that aims to reduce the number of projections for stochastic optimization.
243	Optimizing the F-Measure in Multi-Label Classification: Plug-in Rule Approach versus Structured Loss Minimization	Krzysztof Dembczynski, Arkadiusz Jachnik, Wojciech Kotlowski, Willem Waegeman, Eyke Huellermeier	We introduce a novel plug-in rule algorithm that estimates all parameters required for a Bayes-optimal prediction via a set of multinomial regression models, and we compare this algorithm with SSVMs in terms of computational complexity and statistical consistency.
244	On the importance of initialization and momentum in deep learning	Ilya Sutskever, James Martens, George Dahl, Geoffrey Hinton	In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-Free optimization.
245	A non-IID Framework for Collaborative Filtering with Restricted Boltzmann Machines	Kostadin Georgiev, Preslav Nakov	We propose a framework for collaborative filtering based on Restricted Boltzmann Machines (RBM), which extends previous RBM-based approaches in several important directions.
246	Intersecting singularities for multi-structured estimation	Emile Richard, Francis Bach, Jean-Philippe Vert	We address the problem of designing a convex nonsmooth regularizer encouraging multiple structural effects simultaneously.
247	Structure Discovery in Nonparametric Regression through Compositional Kernel Search	David Duvenaud, James Lloyd, Roger Grosse, Joshua Tenenbaum, Zoubin Ghahramani	We present a method for searching over this space of structures which mirrors the scientific discovery process.
248	Copy or Coincidence? A Model for Detecting Social Influence and Duplication Events	Lisa Friedland, David Jensen, Michael Lavine	In this paper, we analyze the task of inferring rare links between pairs of entities that seem too similar to have occurred by chance.
249	Smooth Operators	Steffen Grunewalder, Arthur Gretton, John Shawe-Taylor	We develop a generic approach to form smooth versions of basic mathematical operations like multiplication, composition, change of measure, and conditional expectation, among others.
250	The Cross-Entropy Method Optimizes for Quantiles	Sergiu Goschin, Ari Weinstein, Michael Littman	We provide a variant of CE (Proportional CE) that effectively optimizes the expected value.
251	Topic Discovery through Data Dependent and Random Projections	Weicong Ding, Mohammad Hossein Rohban, Prakash Ishwar, Venkatesh Saligrama	We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns.
252	Bayesian Learning of Recursively Factored Environments	Marc Bellemare, Joel Veness, Michael Bowling	In this paper we introduce the class of recursively decomposable factorizations, and show how exact Bayesian inference can be used to efficiently guarantee predictive performance close to the best factorization in this class.
253	Selective sampling algorithms for cost-sensitive multiclass prediction	Alekh Agarwal	In this paper, we study the problem of active learning for cost-sensitive multiclass classification.
254	The Bigraphical Lasso	Alfredo Kalaitzis, John Lafferty, Neil D. Lawrence, Shuheng Zhou	We introduce the bigraphical lasso, an estimator for precision matrices of matrix-normals based on the Cartesian product of graphs.
255	Almost Optimal Exploration in Multi-Armed Bandits	Zohar Karnin, Tomer Koren, Oren Somekh	We present two novel, parameter-free algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters.
256	Deep Canonical Correlation Analysis	Galen Andrew, Raman Arora, Jeff Bilmes, Karen Livescu	We introduce Deep Canonical Correlation Analysis (DCCA), a method to learn complex nonlinear transformations of two views of data such that the resulting representations are highly linearly correlated.
257	Consistency of Online Random Forests	Misha Denil, David Matheson, Nando de Freitas	In this paper, we take a step towards narrowing this gap by providing a consistency result for online random forests.
258	Sparse Gaussian Conditional Random Fields: Algorithms, Theory, and Application to Energy Forecasting	Matt Wytock, Zico Kolter	In this paper, we make three contributions: 1) we develop a second-order active-set method which is several orders of magnitude faster than previously proposed optimization approaches for this problem, 2) we analyze the model from a theoretical standpoint, improving upon past bounds with convergence rates that depend logarithmically on the data dimension, and 3) we apply the method to large-scale energy forecasting problems, demonstrating state-of-the-art performance on two real-world tasks.
259	Fast Image Tagging	Minmin Chen, Alice Zheng, Kilian Weinberger	We propose FastTag, a novel algorithm that achieves comparable results with two simple linear mappings that are co-regularized in a joint convex loss function.
260	Expensive Function Optimization with Stochastic Binary Outcomes	Matthew Tesch, Jeff Schneider, Howie Choset	In this work we define the stochastic binary optimization problem and propose an approach using an adaptation of Gaussian Processes for classification that presents a Bayesian optimization framework for this problem.
261	Multiple-source cross-validation	Krzysztof Geras, Charles Sutton	We present new estimators of the variance of the cross-validation, both in the multiple-source setting and in the standard iid setting.
262	Learning Triggering Kernels for Multi-dimensional Hawkes Processes	Ke Zhou, Hongyuan Zha, Le Song	In this paper, we study these critical questions in social network analysis quantitatively under the framework of multi-dimensional Hawkes processes.
263	On the difficulty of training recurrent neural networks	Razvan Pascanu, Tomas Mikolov, Yoshua Bengio	In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective.
264	Maxout Networks	Ian Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio	We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout.
265	Predictable Dual-View Hashing	Mohammad Rastegari, Jonghyun Choi, Shobeir Fakhraei, Hal Daumé III, Larry Davis	We propose a Predictable Dual-View Hashing (PDH) algorithm which embeds proximity of data samples in the original spaces.
266	Deep learning with COTS HPC systems	Adam Coates, Brody Huval, Tao Wang, David Wu, Bryan Catanzaro, Andrew Ng	In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI.
267	Nonparametric Mixture of Gaussian Processes with Constraints	James Ross, Jennifer Dy	Motivated by the need to identify new and clinically relevant categories of lung disease, we propose a novel clustering with constraints method using a Dirichlet process mixture of Gaussian processes in a variational Bayesian nonparametric framework.
268	Scale Invariant Conditional Dependence Measures	Sashank J Reddi, Barnabas Poczos	In this paper we develop new dependence and conditional dependence measures and provide their estimators.
269	Learning Policies for Contextual Submodular Prediction	Stephane Ross, Jiaji Zhou, Yisong Yue, Debadeepta Dey, Drew Bagnell	We propose a simple, efficient, and provably near-optimal approach to optimizing such prediction problems based on no-regret learning.
270	Manifold Preserving Hierarchical Topic Models for Quantization and Approximation	Minje Kim, Paris Smaragdis	We present two complementary topic models to address the analysis of mixture data lying on manifolds.
271	Safe Screening of Non-Support Vectors in Pathwise SVM Computation	Kohei Ogawa, Yoshiki Suzuki, Ichiro Takeuchi	In this paper, we claim that some of the non-support vectors (non-SVs) that have no influence on the classifier can be screened out prior to the training phase in the pathwise SVM computation scenario, in which one is asked to train a sequence (or path) of SVM classifiers for different regularization parameters.
272	Cost-sensitive Multiclass Classification Risk Bounds	Bernardo Ávila Pires, Csaba Szepesvari, Mohammad Ghavamzadeh	In this paper, we strengthen these results by showing how the 0-1 excess loss of a predictor can be upper bounded as a function of the excess loss of the predictor measured using the convex surrogate.
273	Semi-supervised Clustering by Input Pattern Assisted Pairwise Similarity Matrix Completion	Jinfeng Yi, Lijun Zhang, Rong Jin, Qi Qian, Anil Jain	Many semi-supervised clustering algorithms have been proposed to improve the clustering accuracy by effectively exploring the available side information that is usually in the form of pairwise constraints.
274	Learning the beta-Divergence in Tweedie Compound Poisson Matrix Factorization Models	Umut Simsekli, Ali Taylan Cemgil, Yusuf Kenan Yilmaz	In this study, we derive algorithms for estimating mixed β-divergences.
275	Fast algorithms for sparse principal component analysis based on Rayleigh quotient iteration	Volodymyr Kuleshov	We introduce new algorithms for sparse principal component analysis (sPCA), a variation of PCA which aims to represent data in a sparse low-dimensional basis.
276	Nested Chinese Restaurant Franchise Process: Applications to User Tracking and Document Modeling	Amr Ahmed, Liangjie Hong, Alexander Smola	We introduce the nested Chinese Restaurant Franchise Process as a means to obtain both hierarchical tree-structured representations for objects, akin to (but more general than) the nested Chinese Restaurant Process while sharing their structure akin to the Hierarchical Dirichlet Process.
277	Tree-Independent Dual-Tree Algorithms	Ryan Curtin, William March, Parikshit Ram, David Anderson, Alexander Gray, Charles Isbell	We introduce a four-part logical split: the tree, the traversal, the point-to-point base case, and the pruning rule.
278	Multilinear Multitask Learning	Bernardino Romera-Paredes, Hane Aung, Nadia Bianchi-Berthouze, Massimiliano Pontil	We present two learning methods; one is an adapted convex relaxation method used in the context of tensor completion.
279	Online Learning under Delayed Feedback	Pooria Joulani, Andras Gyorgy, Csaba Szepesvari	In this paper we provide a systematic study of the topic, and analyze the effect of delay on the regret of online learning algorithms.
280	Adaptive Hamiltonian and Riemann Manifold Monte Carlo	Ziyu Wang, Shakir Mohamed, Nando de Freitas	In this paper we address the widely-experienced difficulty in tuning Hamiltonian-based Monte Carlo samplers.
281	Coco-Q: Learning in Stochastic Games with Side Payments	Eric Sodomka, Elizabeth Hilliard, Michael Littman, Amy Greenwald	In this paper, we show that coco values can also be defined for stochastic games and can be learned using a simple variant of Q-learning that is provably convergent. We provide a set of examples showing how the strategies learned by the Coco-Q algorithm relate to those learned by existing multiagent Q-learning algorithms.
282	On A Nonlinear Generalization of Sparse Coding and Dictionary Learning	Jeffrey Ho, Yuchen Xie, Baba Vemuri	This paper proposes a novel framework for sparse coding and dictionary learning for data on a Riemannian manifold, and it shows that the existing sparse coding and dictionary learning methods can be considered as special (Euclidean) cases of the more general framework proposed here.
283	Estimation of Causal Peer Influence Effects	Panos Toulis, Edward Kao	In this paper, we extend potential outcomes to allow for interference, we introduce well-defined causal estimands of peer-influence, and we develop two estimation procedures: a frequentist procedure relying on a sequential randomization design that requires knowledge of the network but operates under complicated response functions, and a Bayesian procedure which accounts for network uncertainty but relies on a linear response assumption to increase estimation precision.