Paper Digest: ICML 2013 Highlights
To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free paper digest service to get daily updates on new papers customized to your interests.
Paper Digest Team
TABLE 1: ICML 2013 Papers
|An Optimal Policy for Target Localization with Application to Electron Microscopy
|Raphael Sznitman, Aurelien Lucchi, Peter Frazier, Bruno Jedynak, Pascal Fua
|Within a Bayesian framework, we study the problem of minimizing an objective that combines the entropy of the posterior distribution with the cost of the questions asked.
|Domain Generalization via Invariant Feature Representation
|Krikamol Muandet, David Balduzzi, Bernhard Schölkopf
|We propose Domain-Invariant Component Analysis (DICA), a kernel-based optimization algorithm that learns an invariant transformation by minimizing the dissimilarity across domains, whilst preserving the functional relationship between input and output variables.
|A Spectral Learning Approach to Range-Only SLAM
|Byron Boots, Geoff Gordon
|We present a novel spectral learning algorithm for simultaneous localization and mapping (SLAM) from range data with known correspondences.
|Near-Optimal Bounds for Cross-Validation via Loss Stability
|Ravi Kumar, Daniel Lokshtanov, Sergei Vassilvitskii, Andrea Vattani
|Our work thus quantitatively improves the current best bounds on cross-validation.
|Sparsity-Based Generalization Bounds for Predictive Sparse Coding
|Nishant Mehta, Alexander Gray
|We establish the first generalization error bounds for predictive sparse coding, in the overcomplete setting, where the number of features k exceeds the original dimensionality d.
|Sparse Uncorrelated Linear Discriminant Analysis
|Xiaowei Zhang, Delin Chu
|In this paper, we develop a novel approach for sparse uncorrelated linear discriminant analysis (ULDA).
|Block-Coordinate Frank-Wolfe Optimization for Structural SVMs
|Simon Lacoste-Julien, Martin Jaggi, Mark Schmidt, Patrick Pletscher
|We propose a randomized block-coordinate variant of the classic Frank-Wolfe algorithm for convex optimization with block-separable constraints.
|Fast Probabilistic Optimization from Noisy Gradients
|Generalising recent results, this paper constructs a nonparametric Bayesian quasi-Newton algorithm that learns gradient and Hessian from noisy evaluations of the gradient.
|Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
|Ohad Shamir, Tong Zhang
|In this paper, we investigate the performance of SGD without such smoothness assumptions, as well as a running average scheme to convert the SGD iterates to a solution with optimal optimization accuracy.
|Stochastic Alternating Direction Method of Multipliers
|Hua Ouyang, Niao He, Long Tran, Alexander Gray
|In this paper, we present a stochastic setting for optimization problems with non-smooth composite objective functions.
|Noisy Sparse Subspace Clustering
|Yu-Xiang Wang, Huan Xu
|This paper considers the problem of subspace clustering under noise.
|Parallel Markov Chain Monte Carlo for Nonparametric Mixture Models
|Sinead Williamson, Avinava Dubey, Eric Xing
|In this paper, we describe auxiliary variable representations for the Dirichlet process and the hierarchical Dirichlet process that allow us to perform MCMC using the correct equilibrium distribution, in a distributed manner.
|Risk Bounds and Learning Algorithms for the Regression Approach to Structured Output Prediction
|Sébastien Giguère, François Laviolette, Mario Marchand, Khadidja Sylla
|We provide rigorous guarantees for the regression approach to structured output prediction.
|Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures
|James Bergstra, Daniel Yamins, David Cox
|In this work, we propose a meta-modeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace hand-tuning with a reproducible and unbiased optimization process.
|Gibbs Max-Margin Topic Models with Fast Sampling Algorithms
|Jun Zhu, Ning Chen, Hugh Perkins, Bo Zhang
|This paper presents Gibbs max-margin supervised topic models by minimizing an expected margin loss, an upper bound of the existing margin loss derived from an expected prediction rule.
|Cost-Sensitive Tree of Classifiers
|Zhixiang Xu, Matt Kusner, Kilian Weinberger, Minmin Chen
|In this paper, we address the challenge of balancing test-time cost and the classifier accuracy in a principled fashion.
|Learning Hash Functions Using Column Generation
|Xi Li, Guosheng Lin, Chunhua Shen, Anton Hengel, Anthony Dick
|In this work, we propose a column generation based method for learning data-dependent hash functions on the basis of proximity comparison information.
|Combinatorial Multi-Armed Bandit: General Framework and Applications
|Wei Chen, Yajun Wang, Yang Yuan
|We define a general framework for a large class of combinatorial multi-armed bandit (CMAB) problems, where simple arms with unknown distributions form super arms.
|Near-optimal Batch Mode Active Learning and Adaptive Submodular Optimization
|Yuxin Chen, Andreas Krause
|We consider batch mode active learning and more general information-parallel stochastic optimization problems that exhibit adaptive submodularity, a natural diminishing returns condition.
|Convex formulations of radius-margin based Support Vector Machines
|Huyen Do, Alexandros Kalousis
|In this paper we present two novel algorithms: R-SVM_μ^+, an SVM radius-margin based feature selection algorithm, and R-SVM^+, a metric learning-based SVM.
|Modelling Sparse Dynamical Systems with Compressed Predictive State Representations
|William L. Hamilton, Mahdi Milani Fard, Joelle Pineau
|We present a new algorithm, called Compressed Predictive State Representation (CPSR), for learning models of high-dimensional partially observable uncontrolled dynamical systems from small sample sets.
|A Machine Learning Framework for Programming by Example
|Aditya Menon, Omer Tamuz, Sumit Gulwani, Butler Lampson, Adam Kalai
|We show how machine learning can be used to speed up this seemingly hopeless search problem, by learning weights that relate textual features describing the provided input-output examples to plausible sub-components of a program.
|Discriminatively Activated Sparselets
|Ross Girshick, Hyun Oh Song, Trevor Darrell
|In this paper we describe a new training framework that learns which sparselets to activate in order to optimize a discriminative objective, leading to larger speedup factors with no decrease in task performance.
|The Pairwise Piecewise-Linear Embedding for Efficient Non-Linear Classification
|Ofir Pele, Ben Taskar, Amir Globerson, Michael Werman
|In this work we propose a non-linear feature map that is both very efficient and highly expressive.
|Fixed-Point Model For Structured Labeling
|Quannan Li, Jingdong Wang, David Wipf, Zhuowen Tu
|In this paper, we propose a simple but effective solution to the structured labeling problem: a fixed-point model.
|Connecting the Dots with Landmarks: Discriminatively Learning Domain-Invariant Features for Unsupervised Domain Adaptation
|Boqing Gong, Kristen Grauman, Fei Sha
|In this paper, we propose a novel approach for learning such features.
|Fast Conical Hull Algorithms for Near-separable Non-negative Matrix Factorization
|Abhishek Kumar, Vikas Sindhwani, Prabhanjan Kambadur
|In this paper, we reformulate the separable NMF problem as that of finding the extreme rays of the conical hull of a finite set of vectors.
|Principal Component Analysis on non-Gaussian Dependent Data
|Fang Han, Han Liu
|In this paper, we analyze the performance of a semiparametric principal component analysis named Copula Component Analysis (COCA) (Han & Liu, 2012) when the data are dependent.
|Learning Linear Bayesian Networks with Latent Variables
|Animashree Anandkumar, Daniel Hsu, Adel Javanmard, Sham Kakade
|This work considers the problem of learning linear Bayesian networks when some of the variables are unobserved.
|Multiple Identifications in Multi-Armed Bandits
|Sébastien Bubeck, Tengyao Wang, Nitin Viswanathan
|We study the problem of identifying the top m arms in a multi-armed bandit game.
|Learning Optimally Sparse Support Vector Machines
|Andrew Cotter, Shai Shalev-Shwartz, Nati Srebro
|We show how to train SVMs with an optimal guarantee on the number of support vectors (up to constants), and with sample complexity and training runtime bounds matching the best known for kernel SVM optimization (i.e. without any additional asymptotic cost beyond standard SVM training).
|Dynamic Probabilistic Models for Latent Feature Propagation in Social Networks
|Creighton Heaukulani, Zoubin Ghahramani
|In this paper, we introduce a new probabilistic model for capturing this phenomenon, which we call latent feature propagation, in social networks.
|Efficient Sparse Group Feature Selection via Nonconvex Optimization
|Shuo Xiang, Xiaoshen Tong, Jieping Ye
|In this paper, we expand a nonconvex paradigm to sparse group feature selection, which is motivated by applications that require identifying the underlying group structure and performing feature selection simultaneously.
|Domain Adaptation for Sequence Labeling Tasks with a Probabilistic Language Adaptation Model
|Min Xiao, Yuhong Guo
|In this paper, we propose to address the problem of domain adaptation for sequence labeling tasks via distributed representation learning by using a log-bilinear language adaptation model.
|Maximum Variance Correction with Application to A* Search
|Wenlin Chen, Kilian Weinberger, Yixin Chen
|In this paper we introduce Maximum Variance Correction (MVC), which finds large-scale feasible solutions to Maximum Variance Unfolding (MVU) by post-processing embeddings from any manifold learning algorithm.
|Adaptive Sparsity in Gaussian Graphical Models
|Eleanor Wong, Suyash Awate, P. Thomas Fletcher
|In this paper, we introduce a parameter-free method for estimating a precision matrix with sparsity that adapts to the data automatically.
|Average Reward Optimization Objective In Partially Observable Domains
|Yuri Grinberg, Doina Precup
|In this paper, we show that PSRs have a well-behaved stationary distribution, which is a rational function of policy parameters.
|Feature Selection in High-Dimensional Classification
|Mladen Kolar, Han Liu
|In this paper, we bridge this gap by providing sharp sufficient conditions for consistent variable selection using the ROAD estimator (Fan et al., 2010).
|Harsh Pareek, Pradeep Ravikumar
|In this paper, we consider the problem of “boosting” human learners to extend the learning ability of human learners and achieve improved performance on tasks which individual humans find difficult.
|Efficient Dimensionality Reduction for Canonical Correlation Analysis
|Haim Avron, Christos Boutsidis, Sivan Toledo, Anastasios Zouzias
|We present a fast algorithm for approximate Canonical Correlation Analysis (CCA).
|Parsing epileptic events using a Markov switching process model for correlated time series
|Drausin Wulsin, Emily Fox, Brian Litt
|A goal of our work is to parse these complex epileptic events into distinct dynamic regimes.
|Optimal rates for stochastic convex optimization under Tsybakov noise condition
|Aaditya Ramdas, Aarti Singh
|We focus on the problem of minimizing a convex function f over a convex set S given T queries to a stochastic first order oracle.
|A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning
|Arash Afkanpour, András György, Csaba Szepesvari, Michael Bowling
|We propose a randomized version of the mirror descent algorithm to overcome this issue, under the objective of minimizing the group p-norm penalized empirical risk.
|Noisy and Missing Data Regression: Distribution-Oblivious Support Recovery
|Yudong Chen, Constantine Caramanis
|In this paper we develop a simple variant of orthogonal matching pursuit (OMP) for precisely this setting.
|Dual Averaging and Proximal Gradient Descent for Online Alternating Direction Multiplier Method
|We propose two types of online variants of ADMM, which correspond to online proximal gradient descent and regularized dual averaging respectively.
|A New Frontier of Kernel Design for Structured Data
|In this paper, we shed light on a novel important class of kernels within the framework: We give a mathematical characterization of the class, show a parametric method to optimize kernels of the class to specific problems, based on this characterization, and present some experimental results, which show the new kernels are promising in both accuracy and efficiency.
|Learning with Marginalized Corrupted Features
|Laurens Maaten, Minmin Chen, Stephen Tyree, Kilian Weinberger
|We propose to corrupt training examples with noise from known distributions within the exponential family and present a novel learning algorithm, called marginalized corrupted features (MCF), that trains robust predictors by minimizing the expected value of the loss function under the corrupting distribution – essentially learning with infinitely many (corrupted) training examples.
|Approximation properties of DBNs with binary hidden units and real-valued visible units
|Oswin Krause, Asja Fischer, Tobias Glasmachers, Christian Igel
|We analyze the approximation properties of DBNs with two layers of binary hidden units and visible units with conditional distributions from the exponential family.
|Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization
|We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach.
|General Functional Matrix Factorization Using Gradient Boosting
|Tianqi Chen, Hang Li, Qiang Yang, Yong Yu
|In this paper, we study the problem of utilizing auxiliary information as features of factorization and propose formalizing the problem as general functional matrix factorization, whose model includes conventional matrix factorization models as its special cases.
|Iterative Learning and Denoising in Convolutional Neural Associative Memories
|Amin Karbasi, Amir Hesam Salavati, Amin Shokrollahi
|More specifically, we devise an iterative algorithm that learns the redundancy among the patterns.
|Scaling Multidimensional Gaussian Processes using Projected Additive Approximations
|Elad Gilboa, Yunus Saatçi, John Cunningham
|This paper introduces and tests a novel method of projected additive approximation to multidimensional GPs.
|Active Learning for Multi-Objective Optimization
|Marcela Zuluaga, Guillaume Sergent, Andreas Krause, Markus Püschel
|To address this challenge, we propose the Pareto Active Learning (PAL) algorithm, which intelligently samples the design space to predict the Pareto-optimal set.
|A Generalized Kernel Approach to Structured Output Learning
|Hachem Kadri, Mohammad Ghavamzadeh, Philippe Preux
|We study the problem of structured output learning from a regression perspective.
|Efficient Active Learning of Halfspaces: an Aggressive Approach
|Alon Gonen, Sivan Sabato, Shai Shalev-Shwartz
|We revisit the aggressive approach for active learning in the realizable case, and show that it can be made efficient and practical, while also having theoretical guarantees under reasonable assumptions.
|Enhanced statistical rankings via targeted data collection
|Braxton Osting, Christoph Brune, Stanley Osher
|This reduction of the data collection problem to a spectral graph-theoretic question is one of the primary contributions of this work.
|Online Feature Selection for Model-based Reinforcement Learning
|Trung Nguyen, Zhuoru Li, Tomi Silander, Tze Yun Leong
|We propose a new framework for learning the world dynamics of feature-rich environments in model-based reinforcement learning.
|ELLA: An Efficient Lifelong Learning Algorithm
|Paul Ruvolo, Eric Eaton
|In this paper, we develop a method for online multi-task learning in the lifelong learning setting.
|A Structural SVM Based Approach for Optimizing Partial AUC
|Harikrishna Narasimhan, Shivani Agarwal
|In this paper, we develop a structural SVM framework for directly optimizing the partial AUC between any two false positive rates.
|Convex Relaxations for Learning Bounded-Treewidth Decomposable Graphs
|K. S. Sesh Kumar, Francis Bach
|In this paper, we pose it as a combinatorial optimization problem, which is then relaxed to a convex optimization problem that involves searching over the forest and hyperforest polytopes with special structures.
|Adaptive Task Assignment for Crowdsourced Classification
|Chien-Ju Ho, Shahin Jabbari, Jennifer Wortman Vaughan
|We investigate the problem of task assignment and label inference for heterogeneous classification tasks.
|Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning
|Odalric-Ambrym Maillard, Phuong Nguyen, Ronald Ortner, Daniil Ryabko
|We propose an algorithm whose regret after T time steps is O(\sqrt{T}), with all constants reasonably small.
|Better Mixing via Deep Representations
|Yoshua Bengio, Gregoire Mesnil, Yann Dauphin, Salah Rifai
|To better understand this, we propose a secondary conjecture: the higher-level samples fill the space they occupy more uniformly, and the high-density manifolds tend to unfold when represented at higher levels.
|Online Latent Dirichlet Allocation with Infinite Vocabulary
|Ke Zhai, Jordan Boyd-Graber
|Topic models based on latent Dirichlet allocation (LDA) assume a predefined vocabulary a priori.
|Characterizing the Representer Theorem
|Yaoliang Yu, Hao Cheng, Dale Schuurmans, Csaba Szepesvari
|In this paper we completely characterize the necessary and sufficient conditions on the regularizer that ensure the representer theorem holds.
|Dynamical Models and tracking regret in online convex programming
|Eric Hall, Rebecca Willett
|This paper describes a new online convex optimization method which incorporates a family of candidate dynamical models and establishes novel tracking regret bounds that scale with the comparator’s deviation from the best dynamical model in this family.
|Large-Scale Bandit Problems and KWIK Learning
|Jacob Abernethy, Kareem Amin, Michael Kearns, Moez Draief
|We provide such results in both the standard parametric MAB setting, as well as for a new model in which the action space is finite but growing with time.
|Vanishing Component Analysis
|Roi Livni, David Lehavi, Sagi Schein, Hila Nachliely, Shai Shalev-Shwartz, Amir Globerson
|Here we describe and analyze an efficient procedure that constructs a set of generators of a vanishing ideal.
|Learning an Internal Dynamics Model from Control Demonstration
|Matthew Golub, Steven Chase, Byron Yu
|Here, we consider the problem of learning the subject’s internal model from demonstrations of control and knowledge of task goals.
|Robust Structural Metric Learning
|Daryl Lim, Gert Lanckriet, Brian McFee
|In this paper, we present an efficient and robust structural metric learning algorithm which enforces group sparsity on the learned transformation, while optimizing for structured ranking output prediction.
|Constrained fractional set programs and their application in local clustering and community detection
|Thomas Bühler, Shyam Sundar Rangapuram, Simon Setzer, Matthias Hein
|In this paper we show that every constrained minimization problem of a ratio of non-negative set functions allows a tight relaxation into an unconstrained continuous optimization problem.
|Efficient Semi-supervised and Active Learning of Disjunctions
|Nina Balcan, Christopher Berlind, Steven Ehrlich, Yingyu Liang
|We provide efficient algorithms for learning disjunctions in the semi-supervised setting under a natural regularity assumption introduced by Balcan & Blum (2005).
|Convex Adversarial Collective Classification
|MohamadAli Torkamani, Daniel Lowd
|In this paper, we present a novel method for robustly performing collective classification in the presence of a malicious adversary that can modify up to a fixed number of binary-valued attributes.
|Rounding Methods for Discrete Linear Classification
|Yann Chevaleyre, Frédéric Koriche, Jean-Daniel Zucker
|In this paper, the learning task is cast as a combinatorial optimization problem: given a set of positive and negative feature vectors in the Euclidean space, the goal is to find a discrete linear function that minimizes the cumulative hinge loss of this training set.
|Mixture of Mutually Exciting Processes for Viral Diffusion
|Shuang-Hong Yang, Hongyuan Zha
|This paper shows that these two tasks can be addressed simultaneously with a probabilistic model involving a mixture of mutually exciting point processes.
|Gaussian Process Vine Copulas for Multivariate Dependence
|David Lopez-Paz, Jose Miguel Hernández-Lobato, Zoubin Ghahramani
|In this paper, we relax this assumption by discovering the latent functions that specify the shape of a conditional copula given its conditioning variables.
|Stochastic Simultaneous Optimistic Optimization
|Michal Valko, Alexandra Carpentier, Rémi Munos
|We study the problem of global maximization of a function f given a finite number of evaluations perturbed by noise.
|Toward Optimal Stratification for Stratified Monte-Carlo Integration
|Alexandra Carpentier, Rémi Munos
|In this paper we provide an algorithm, Monte-Carlo Upper-Lower Confidence Bound, that selects online, among a large class of partitions, the partition that provides a near-optimal trade-off, and allocates the samples almost optimally on this partition.
|A General Iterative Shrinkage and Thresholding Algorithm for Non-convex Regularized Optimization Problems
|Pinghua Gong, Changshui Zhang, Zhaosong Lu, Jianhua Huang, Jieping Ye
|In this paper, we propose a General Iterative Shrinkage and Thresholding (GIST) algorithm to solve the nonconvex optimization problem for a large class of non-convex penalties.
|Thurstonian Boltzmann Machines: Learning from Multiple Inequalities
|Truyen Tran, Dinh Phung, Svetha Venkatesh
|We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time.
|A Variational Approximation for Topic Modeling of Hierarchical Corpora
|Do-kyum Kim, Geoffrey Voelker, Lawrence Saul
|We study the problem of topic modeling in corpora whose documents are organized in a multi-level hierarchy.
|Forecastable Component Analysis
|I introduce Forecastable Component Analysis (ForeCA), a novel dimension reduction technique for temporally dependent signals.
|Ellipsoidal Multiple Instance Learning
|Gabriel Krummenacher, Cheng Soon Ong, Joachim Buhmann
|We propose a large margin method for asymmetric learning with ellipsoids, called eMIL, suited to multiple instance learning (MIL). We introduce a MIL dataset from a real world application of detecting wheel defects from multiple partial observations, and show that eMIL outperforms competing approaches.
|Local Low-Rank Matrix Approximation
|Joonseok Lee, Seungyeon Kim, Guy Lebanon, Yoram Singer
|We propose a new matrix approximation model where we assume instead that the matrix is locally of low-rank, leading to a representation of the observed matrix as a weighted sum of low-rank matrices.
|Generic Exploration and K-armed Voting Bandits
|Tanguy Urvoy, Fabrice Clerot, Raphael Féraud, Sami Naamane
|We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits.
|A unifying framework for vector-valued manifold regularization and multi-view learning
|Minh Hà Quang, Loris Bazzani, Vittorio Murino
|This paper presents a general vector-valued reproducing kernel Hilbert spaces (RKHS) formulation for the problem of learning an unknown functional dependency between a structured input space and a structured output space, in the Semi-Supervised Learning setting.
|Learning Connections in Financial Time Series
|Gartheeban Ganeshapillai, John Guttag, Andrew Lo
|We present a machine learning-based method to build a connectedness matrix to address the shortcomings of correlation in capturing events such as large losses.
|Fast dropout training
|Sida Wang, Christopher Manning
|Based on an examination of the implied objective function of dropout training, we show how to do fast dropout training by sampling from or integrating a Gaussian approximation, instead of doing Monte Carlo optimization of this objective.
|Scalable Optimization of Neighbor Embedding for Visualization
|Zhirong Yang, Jaakko Peltonen, Samuel Kaski
|We demonstrate that the obvious approach of subsampling produces inferior results and propose a generic approximated optimization technique that reduces the NE optimization cost to O(n log n).
|Precision-recall space to correct external indices for biclustering
|Blaise Hanczar, Mohamed Nadif
|All these algorithms aim to obtain coherent biclusters and it is crucial to have a reliable procedure for their validation.
|Sharon Wulff, Ruth Urner, Shai Ben-David
|We propose a natural cost function for the bi-clustering task, the monochromatic cost.
|Gated Autoencoders with Tied Input Weights
|Alain Droniou, Olivier Sigaud
|In this work, we extend the mathematical study of (Memisevic, 2012b) to show that it is possible to use a unique projection for both images in a way that turns intermediate layers into spectrum encoders of transformations.
|Strict Monotonicity of Sum of Squares Error and Normalized Cut in the Lattice of Clusterings
|Here we study monotonicity not just on the minimizers but on the entire clustering lattice.
|Transition Matrix Estimation in High Dimensional Time Series
|Fang Han, Han Liu
|In this paper, we propose a new method for estimating transition matrices of high dimensional vector autoregressive (VAR) models.
|Label Partitioning For Sublinear Ranking
|Jason Weston, Ameesh Makadia, Hector Yee
|We present a general approach for converting an algorithm which has linear time in the size of the set to a sublinear one via label partitioning.
|Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passing
|Huayan Wang, Daphne Koller
|In this paper, we show a “unified” message passing algorithm that: (a) subsumes MPLP, MSD, and TRW-S as special cases when applied to their respective choices of dual objective and blocks, and (b) is able to perform BCD under much more flexible choices of blocks (including very large blocks) as well as the dual objective itself (that arise from an arbitrary dual decomposition).
|Collaborative hyperparameter tuning
|Rémi Bardenet, Mátyás Brendel, Balázs Kégl, Michèle Sebag
|In this paper, we propose a generic method to incorporate knowledge from previous experiments when simultaneously tuning a learning algorithm on new problems at hand.
|SADA: A General Framework to Support Robust Causation Discovery
|Ruichu Cai, Zhenjie Zhang, Zhifeng Hao
|Motivated by the observations of the local sparsity properties on causal structures, we propose a general Split-and-Merge strategy, named SADA, to enhance the scalability of a wide class of causality discovery algorithms.
|Learning and Selecting Features Jointly with Point-wise Gated Boltzmann Machines
|Kihyuk Sohn, Guanyu Zhou, Chansoo Lee, Honglak Lee
|To address this problem, we propose a point-wise gated Boltzmann machine, a unified generative model that combines feature learning and feature selection.
|Sequential Bayesian Search
|Zheng Wen, Branislav Kveton, Brian Eriksson, Sandilya Bhamidipati
|We propose an efficient learning algorithm for solving the problem, sequential Bayesian search (SBS), and prove that it is Bayesian optimal.
|Sparse projections onto the simplex
|Anastasios Kyrillidis, Stephen Becker, Volkan Cevher, Christoph Koch
|Most learning methods with rank or sparsity constraints use convex relaxations, which lead to optimization with the nuclear norm or the \ell_1-norm.
|Modeling Musical Influence with Topic Models
|Uri Shalit, Daphna Weinshall, Gal Chechik
|In this work we approach the question of influence by applying topic-modeling tools (Blei & Lafferty, 2006; Gerrish & Blei, 2010) to a dataset of 24941 songs by 9222 artists, from the years 1922 to 2010.
|Subtle Topic Models and Discovering Subtly Manifested Software Concerns Automatically
|Mrinal Das, Suparna Bhattacharya, Chiranjib Bhattacharyya, Gopinath Kanchi
|To address this issue we propose subtle topic models (STM). We pose this problem as that of discovering topics that rarely occur in individual documents, which we will refer to as subtle topics.
|Exploring the Mind: Integrating Questionnaires and fMRI
|Esther Salazar, Ryan Bogdan, Adam Gorka, Ahmad Hariri, Lawrence Carin
|A new model is developed for joint analysis of ordered, categorical, real and count data.
|A proximal Newton framework for composite minimization: Graph learning without Cholesky decompositions and matrix inversions
|Quoc Tran Dinh, Anastasios Kyrillidis, Volkan Cevher
|We propose an algorithmic framework for convex minimization problems of composite functions with two terms: a self-concordant part and a possibly nonsmooth regularization part.
|A Practical Algorithm for Topic Modeling with Provable Guarantees
|Sanjeev Arora, Rong Ge, Yonatan Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu
|In this paper we present an algorithm for learning topic models that is both provable and practical.
|Distributed training of Large-scale Logistic models
|Siddharth Gopal, Yiming Yang
|In this paper, we propose a training method for large-scale multinomial logistic models that breaks this bottleneck by enabling parallel optimization of the likelihood objective.
|An Adaptive Learning Rate for Stochastic Variational Inference
|Rajesh Ranganath, Chong Wang, David Blei, Eric Xing
|We solve this problem by developing an adaptive learning rate for stochastic inference.
|Margins, Shrinkage, and Boosting
|This manuscript shows that AdaBoost and its immediate variants can produce approximately maximum margin classifiers simply by scaling their step size choices by a fixed small constant.
|Canonical Correlation Analysis based on Hilbert-Schmidt Independence Criterion and Centered Kernel Target Alignment
|Billy Chang, Uwe Kruger, Rafal Kustra, Junping Zhang
|The aim of this article is to introduce two nonlinear CCA extensions that rely on the recently proposed Hilbert-Schmidt independence criterion and the centered kernel target alignment.
|Large-Scale Learning with Less RAM via Randomization
|Daniel Golovin, D. Sculley, Brendan McMahan, Michael Young
|We reduce the memory footprint of popular large-scale online learning methods by projecting our weight vector onto a coarse discrete set using randomized rounding.
|Taming the Curse of Dimensionality: Discrete Integration by Hashing and Optimization
|Stefano Ermon, Carla Gomes, Ashish Sabharwal, Bart Selman
|We propose a randomized algorithm that, with high probability, gives a constant-factor approximation of a general discrete integral defined over an exponentially large set.
|Sparse coding for multitask and transfer learning
|Andreas Maurer, Massi Pontil, Bernardino Romera-Paredes
|We investigate the use of sparse coding and dictionary learning in the context of multitask and transfer learning.
|Direct Modeling of Complex Invariances for Visual Object Features
|Ka Yu Hui
|In this paper we consider an alternate strategy of directly modeling complex invariances of object features.
|Hierarchically-coupled hidden Markov models for learning kinetic rates from single-molecule data
|Jan-Willem Meent, Jonathan Bronson, Frank Wood, Ruben Gonzalez Jr., Chris Wiggins
|We address the problem of analyzing sets of noisy time-varying signals that all report on the same process but confound straightforward analyses due to complex inter-signal heterogeneities and measurement artifacts.
|Activized Learning with Uniform Classification Noise
|Liu Yang, Steve Hanneke
|We prove that for any VC class, it is possible to transform any passive learning algorithm into an active learning algorithm with strong asymptotic improvements in label complexity for every nontrivial distribution satisfying a uniform classification noise condition.
|Guided Policy Search
|Sergey Levine, Vladlen Koltun
|We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima.
|Squared-loss Mutual Information Regularization: A Novel Information-theoretic Approach to Semi-supervised Learning
|Gang Niu, Wittawat Jitkrittum, Bo Dai, Hirotaka Hachiya, Masashi Sugiyama
|We propose squared-loss mutual information regularization (SMIR) for multi-class probabilistic classification, following the information maximization principle.
|Gossip-based distributed stochastic bandit algorithms
|Balazs Szorenyi, Robert Busa-Fekete, Istvan Hegedus, Robert Ormandi, Mark Jelasity, Balazs Kegl
|Our goal is to adapt stochastic bandit algorithms to P2P networks.
|The Sample-Complexity of General Reinforcement Learning
|Tor Lattimore, Marcus Hutter, Peter Sunehag
|We study the sample-complexity of reinforcement learning in a general setting without assuming ergodicity or finiteness of the environment.
|Hierarchical Regularization Cascade for Joint Learning
|Alon Zweig, Daphna Weinshall
|We present a hierarchical approach which exploits information sharing among different classification tasks, in multi-task and multi-class settings.
|Multi-Class Classification with Maximum Margin Multiple Kernel
|Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh
|We present a new algorithm for multi-class classification with multiple kernels.
|Bayesian Games for Adversarial Regression Problems
|Michael Großhans, Christoph Sawade, Michael Brückner, Tobias Scheffer
|We study regression problems in which an adversary can exercise some control over the data generation process.
|Optimistic Knowledge Gradient Policy for Optimal Budget Allocation in Crowdsourcing
|Xi Chen, Qihang Lin, Dengyong Zhou
|To solve the computational challenge, we propose a novel approximate policy which is called optimistic knowledge gradient.
|Markov Network Estimation From Multi-attribute Data
|Mladen Kolar, Han Liu, Eric Xing
|In this paper, we propose a new principled framework for estimating multi-attribute graphs.
|MILEAGE: Multiple Instance LEArning with Global Embedding
|Dan Zhang, Jingrui He, Luo Si, Richard Lawrence
|This paper proposes a novel framework, Multiple Instance LEArning with Global Embedding (MILEAGE), in which the global feature vectors for traditional learning methods are integrated into the MIL setting.
|Guaranteed Sparse Recovery under Linear Transformation
|Ji Liu, Lei Yuan, Jieping Ye
|We consider the following signal recovery problem: given a measurement matrix Φ ∈ ℝ^{n×p} and a noisy observation vector c ∈ ℝ^n constructed from c = Φθ^* + ε, where ε ∈ ℝ^n is the noise vector whose entries follow an i.i.d. centered sub-Gaussian distribution, how to recover the signal θ^* if Dθ^* is sparse under a linear transformation D ∈ ℝ^{m×p}?
|Learning invariant features by harnessing the aperture problem
|Roland Memisevic, Georgios Exarchakis
|The energy model is a simple, biologically inspired approach to extracting relationships between images in tasks like stereopsis and motion analysis.
|Efficient Ranking from Pairwise Comparisons
|Fabian Wauthier, Michael Jordan, Nebojsa Jojic
|In this paper we show that much simpler algorithms can match the same Ω(n) lower bound in expectation.
|Differentially Private Learning with Kernels
|Prateek Jain, Abhradeep Thakurta
|In this paper, we consider the problem of differentially private learning where access to the training features is through a kernel function only.
|Thompson Sampling for Contextual Bandits with Linear Payoffs
|Shipra Agrawal, Navin Goyal
|In this paper, we design and analyze Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary.
|Learning Multiple Behaviors from Unlabeled Demonstrations in a Latent Controller Space
|Javier Almingol, Luis Montesano, Manuel Lopes
|In this paper we introduce a method to learn multiple behaviors in the form of motor primitives from an unlabeled dataset.
|Inference algorithms for pattern-based CRFs on sequence data
|Rustem Takhanov, Vladimir Kolmogorov
|We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) the MAP.
|One-Bit Compressed Sensing: Provable Support and Vector Recovery
|Sivakant Gopi, Praneeth Netrapalli, Prateek Jain, Aditya Nori
|In this paper, we study the problem of one-bit compressed sensing (1-bit CS), where the goal is to design a measurement matrix A and a recovery algorithm such that a k-sparse vector x^* can be efficiently recovered from signed linear measurements, i.e., b = sign(Ax^*).
|Tensor Analyzers
|Yichuan Tang, Ruslan Salakhutdinov, Geoffrey Hinton
|In this paper, we introduce Tensor Analyzers which are a multilinear generalization of Factor Analyzers.
|Learning Sparse Penalties for Change-point Detection using Max Margin Interval Regression
|Toby Hocking, Guillem Rigaill, Jean-Philippe Vert, Francis Bach
|In this work, we propose to learn the penalty and its constants in databases of signals with weak change-point annotations.
|Learning from Human-Generated Lists
|Kwang-Sung Jun, Jerry Zhu, Burr Settles, Timothy Rogers
|We propose a generative model – sampling with reduced replacement (SWIRL) – for such lists.
|A Fast and Exact Energy Minimization Algorithm for Cycle MRFs
|Huayan Wang, Daphne Koller
|In this paper, we present a fast and exact algorithm for energy minimization in cycle MRFs, which can be used as a subroutine in tackling general MRFs.
|Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning
|Daniel Tarlow, Kevin Swersky, Laurent Charlin, Ilya Sutskever, Rich Zemel
|Here we present kNCA, which generalizes NCA by learning distance metrics that are appropriate for kNN with arbitrary k.
|An Efficient Posterior Regularized Latent Variable Model for Interactive Sound Source Separation
|Nicholas Bryan, Gautham Mysore
|For the constraints considered, an efficient expectation-maximization algorithm is derived with closed-form multiplicative updates, drawing connections to non-negative matrix factorization methods, and allowing for high-quality interactive-rate separation without explicit training data.
|Estimating Unknown Sparsity in Compressed Sensing
|In this paper, we propose to estimate a stable measure of sparsity s(x) := ‖x‖_1^2 / ‖x‖_2^2, which is a sharp lower bound on ‖x‖_0.
|MAD-Bayes: MAP-based Asymptotic Derivations from Bayes
|Tamara Broderick, Brian Kulis, Michael Jordan
|We demonstrate several other algorithms, all of which are scalable and simple to implement.
|The Most Generative Maximum Margin Bayesian Networks
|Robert Peharz, Sebastian Tschiatschek, Franz Pernkopf
|In this paper, we introduce a novel approach of hybrid generative-discriminative learning for Bayesian networks.
|Fastfood – Computing Hilbert Space Expansions in loglinear time
|Quoc Le, Tamas Sarlos, Alexander Smola
|This paper proposes an improvement to random kitchen sinks that offers significantly faster computation in log-linear time without sacrificing accuracy.
|Joint Transfer and Batch-mode Active Learning
|Rita Chattopadhyay, Wei Fan, Ian Davidson, Sethuraman Panchanathan, Jieping Ye
|In this work, we present an integrated framework that performs transfer and active learning simultaneously by solving a single convex optimization problem.
|Message passing with l1 penalized KL minimization
|Yuan Qi, Yandong Guo
|To address this issue, we propose an approximate inference method, relaxed expectation propagation (REP), based on a new divergence with an l1 penalty.
|Mean Reversion with a Variance Threshold
|Marco Cuturi, Alexandre d'Aspremont
|Starting from a multivariate data set, we study several techniques to isolate affine combinations of the variables with a maximum amount of mean reversion, while constraining the variance to be larger than a given threshold.
|Top-down particle filtering for Bayesian decision trees
|Balaji Lakshminarayanan, Daniel Roy, Yee Whye Teh
|We present a sequential Monte Carlo (SMC) algorithm that instead works in a top-down manner, mimicking the behavior and speed of classic algorithms.
|Smooth Sparse Coding via Marginal Regression for Learning Sparse Representations
|Krishnakumar Balasubramanian, Kai Yu, Guy Lebanon
|We propose and analyze a novel framework for learning sparse representations, based on two statistical techniques: kernel smoothing and marginal regression.
|Robust and Discriminative Self-Taught Learning
|Hua Wang, Feiping Nie, Heng Huang
|To address this, we propose a novel robust and discriminative self-taught learning approach to utilize any unlabeled data without the above restrictions.
|Safe Policy Iteration
|Matteo Pirotta, Marcello Restelli, Alessio Pecorino, Daniele Calandriello
|We propose two safe policy-iteration algorithms that differ in the way the next policy is chosen w.r.t. the estimated greedy policy.
|Unfolding Latent Tree Structures using 4th Order Tensors
|Mariya Ishteva, Haesun Park, Le Song
|In this paper, we propose a quartet based approach which is agnostic to this number.
|Learning Fair Representations
|Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, Cynthia Dwork
|We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the proportion in the population as a whole), and individual fairness (similar individuals should be treated similarly).
|Hierarchical Tensor Decomposition of Latent Tree Graphical Models
|Le Song, Mariya Ishteva, Ankur Parikh, Eric Xing, Haesun Park
|We approach the problem of estimating the parameters of a latent tree graphical model from a hierarchical tensor decomposition point of view.
|No more pesky learning rates
|Tom Schaul, Sixin Zhang, Yann LeCun
|We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time.
|Multi-View Clustering and Feature Learning via Structured Sparsity
|Hua Wang, Feiping Nie, Heng Huang
|In this paper, we propose a novel multi-view learning model to integrate all features and learn the weight for every feature with respect to each cluster individually via new joint structured sparsity-inducing norms.
|Planning by Prioritized Sweeping with Small Backups
|Harm Van Seijen, Rich Sutton
|In this paper, we introduce a new planning backup that uses only the current value of a single successor state and has a computation time independent of the number of successor states.
|Solving Continuous POMDPs: Value Iteration with Incremental Learning of an Efficient Space Representation
|Sebastian Brechtel, Tobias Gindele, Rüdiger Dillmann
|We propose the novel concept of learning a discrete representation of the continuous state space to solve the integrals in continuous POMDPs efficiently and generalize sparse calculations over the continuous space.
|Learning Heteroscedastic Models by Convex Programming under Group Sparsity
|Arnak Dalalyan, Mohamed Hebiri, Katia Meziani, Joseph Salmon
|In this paper, we propose a new approach to the joint estimation of the conditional mean and the conditional variance in a high-dimensional (auto-)regression setting.
|Covariate Shift in Hilbert Space: A Solution via Surrogate Kernels
|Kai Zhang, Vincent Zheng, Qiaojun Wang, James Kwok, Qiang Yang, Ivan Marsic
|Motivated by this, we propose to match data distributions in the Hilbert space, which, given a pre-defined empirical kernel map, can be formulated as aligning kernel matrices across domains.
|A Local Algorithm for Finding Well-Connected Clusters
|Zeyuan Allen Zhu, Silvio Lattanzi, Vahab Mirrokni
|Motivated by applications of large-scale graph clustering, we study random-walk-based local algorithms whose running times depend only on the size of the output cluster, rather than the entire graph.
|Efficient Multi-label Classification with Many Labels
|Wei Bi, James Kwok
|In this paper, we address this problem by selecting a small subset of class labels that can approximately span the original label space.
|Spectral Compressed Sensing via Structured Matrix Completion
|Yuxin Chen, Yuejie Chi
|To address this problem, we develop a novel nonparametric algorithm, called enhanced matrix completion (EMaC), based on structured matrix completion.
|Multi-Task Learning with Gaussian Matrix Generalized Inverse Gaussian Model
|Ming Yang, Yingming Li, Zhongfei Zhang
|In this paper, we study the multi-task learning problem with a new perspective of considering the structure of the residue error matrix and the low-rank approximation to the task covariance matrix simultaneously.
|Simple Sparsification Improves Sparse Denoising Autoencoders in Denoising Highly Corrupted Images
|We propose that a simple sparsification of the latent representation found by the encoder improves denoising performance, when the DAE was trained with sparsity regularization.
|On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions
|Purushottam Kar, Bharath Sriperumbudur, Prateek Jain, Harish Karnick
|In this paper, we study the generalization properties of online learning based stochastic methods for supervised learning problems where the loss function is dependent on more than one training sample (e.g., metric learning, ranking).
|Non-Linear Stationary Subspace Analysis with Application to Video Classification
|Mahsa Baktashmotlagh, Mehrtash Harandi, Abbas Bigdeli, Brian Lovell, Mathieu Salzmann
|In this paper, we introduce Non-Linear Stationary Subspace Analysis, a method that overcomes this issue by explicitly separating the stationary parts of the video signal (i.e., the parts shared across all videos in one class) from its non-stationary parts (i.e., the parts specific to individual videos).
|Two-Sided Exponential Concentration Bounds for Bayes Error Rate and Shannon Entropy
|Jean Honorio, Tommi Jaakkola
|We provide a method that approximates the Bayes error rate and the Shannon entropy with high probability.
|That was fast! Speeding up NN search of high dimensional distributions.
|Emanuele Coviello, Adeel Mumtaz, Antoni Chan, Gert Lanckriet
|The main technical contribution of the paper is a novel and efficient algorithm for deciding whether to explore nodes during backtracking, based on a variational approximation.
|Entropic Affinities: Properties and Efficient Numerical Computation
|Max Vladymyrov, Miguel Carreira-Perpinan
|Hinton and Roweis (2003) introduced a way to set the scale individually for each point so that it has a distribution over neighbors with a desired perplexity, or effective number of neighbors.
|Local Deep Kernel Learning for Efficient Non-linear SVM Prediction
|Cijo Jose, Prasoon Goyal, Parv Aggrwal, Manik Varma
|Our objective is to speed up non-linear SVM prediction while maintaining classification accuracy above an acceptable limit.
|Temporal Difference Methods for the Variance of the Reward To Go
|Aviv Tamar, Dotan Di Castro, Shie Mannor
|In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward.
|∝SVM for Learning with Label Proportions
|Felix Yu, Dong Liu, Sanjiv Kumar, Tony Jebara, Shih-Fu Chang
|We propose a new method called proportion-SVM, or ∝SVM, which explicitly models the latent unknown instance labels together with the known group label proportions in a large-margin framework.
|Parameter Learning and Convergent Inference for Dense Random Fields
|Philipp Kraehenbuehl, Vladlen Koltun
|In this paper, we present an efficient algorithm for learning parameters in dense random fields.
|Loss-Proportional Subsampling for Subsequent ERM
|Paul Mineiro, Nikos Karampatziakis
|We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk.
|Scalable Simple Random Sampling and Stratified Sampling
|In this work, we describe a scalable simple random sampling algorithm, named ScaSRS, which uses probabilistic thresholds to decide on the fly whether to accept, reject, or wait-list an item independently of others.
|Riemannian Similarity Learning
|We tackle this problem in a Riemannian optimization framework.
|On Compact Codes for Spatially Pooled Features
|Yangqing Jia, Oriol Vinyals, Trevor Darrell
|In this paper we analyze the classification accuracy with respect to dictionary size by linking the encoding stage to kernel methods and Nyström sampling, and obtain useful bounds on accuracy as a function of size.
|Dynamic Covariance Models for Multivariate Financial Time Series
|Yue Wu, Jose Miguel Hernandez-Lobato, Zoubin Ghahramani
|To address these problems we introduce a novel dynamic model for time-changing covariances.
|Revisiting the Nystrom method for improved large-scale machine learning
|Alex Gittens, Michael Mahoney
|Revisiting the Nystrom method for improved large-scale machine learning
|Infinite Positive Semidefinite Tensor Factorization for Source Separation of Mixture Signals
|Kazuyoshi Yoshii, Ryota Tomioka, Daichi Mochihashi, Masataka Goto
|This paper presents a new class of tensor factorization called positive semidefinite tensor factorization (PSDTF) that decomposes a set of positive semidefinite (PSD) matrices into the convex combinations of fewer PSD basis matrices.
|A Unified Robust Regression Model for Lasso-like Algorithms
|Wenzhuo Yang, Huan Xu
|We develop a unified robust linear regression model and show that it is equivalent to a general regularization framework to encourage sparse-like structure that contains group Lasso and fused Lasso as specific examples.
|Quickly Boosting Decision Trees – Pruning Underachieving Features Early
|Ron Appel, Thomas Fuchs, Piotr Dollar, Pietro Perona
|We propose a principled approach to overcome this drawback.
|On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance
|Aditya Menon, Harikrishna Narasimhan, Shivani Agarwal, Sanjay Chawla
|In this paper, we study consistency with respect to one such performance measure, namely the arithmetic mean of the true positive and true negative rates (AM), and establish that some simple methods that have been used in practice, such as applying an empirically determined threshold to a suitable class probability estimate or performing an empirically balanced form of risk minimization, are in fact consistent with respect to the AM (under mild conditions on the underlying distribution).
|Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment
|Jason Chuang, Sonal Gupta, Christopher Manning, Jeffrey Heer
|We introduce a framework to support large-scale assessment of topical relevance.
|Online Kernel Learning with a Near Optimal Sparsity Bound
|Lijun Zhang, Jinfeng Yi, Rong Jin, Ming Lin, Xiaofei He
|In this work, we focus on Online Sparse Kernel Learning that aims to online learn a kernel classifier with a bounded number of support vectors.
|Spectral Learning of Hidden Markov Models from Dynamic and Static Data
|Tzu-Kuo Huang, Jeff Schneider
|We develop spectral learning algorithms for Hidden Markov Models that learn not only from time series, or dynamic data, but also from static data drawn independently from the HMM’s stationary distribution.
|Analogy-preserving Semantic Embedding for Visual Object Categorization
|Sung Ju Hwang, Kristen Grauman, Fei Sha
|We propose to model analogies that reflect the relationships between multiple pairs of classes simultaneously, in the form “p is to q, as r is to s”.
|Algebraic classifiers: a generic approach to fast cross-validation, online training, and parallel training
|But not all classifiers have an obvious structure, so we introduce the Free HomTrainer.
|Factorial Multi-Task Learning: A Bayesian Nonparametric Approach
|Sunil Gupta, Dinh Phung, Svetha Venkatesh
|To this end, we propose a framework that groups the tasks based on their relatedness in a low dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups.
|Modeling Information Propagation with Survival Theory
|Manuel Gomez-Rodriguez, Jure Leskovec, Bernhard Schölkopf
|Here we apply survival theory to develop general additive and multiplicative risk models under which the network inference problems can be solved efficiently by exploiting their convexity.
|Better Rates for Any Adversarial Deterministic MDP
|Ofer Dekel, Elad Hazan
|We devise a new algorithm that pushes the state-of-the-art forward in two ways: First, it attains a regret of O(T^{2/3}) with respect to the best fixed policy in hindsight, whereas the previous best regret bound was O(T^{3/4}).
|ABC Reinforcement Learning
|Christos Dimitrakakis, Nikolaos Tziortziotis
|We introduce a simple, general framework for likelihood-free Bayesian reinforcement learning, through Approximate Bayesian Computation (ABC).
|Sharp Generalization Error Bounds for Randomly-projected Classifiers
|Robert Durrant, Ata Kaban
|We derive sharp bounds on the generalization error of a generic linear classifier trained by empirical risk minimization on randomly-projected data.
|On learning parametric-output HMMs
|Aryeh Kontorovich, Boaz Nadler, Roi Weiss
|We present a novel approach to learning an HMM whose outputs are distributed according to a parametric family.
|LDA Topic Model with Soft Assignment of Descriptors to Words
|Daphna Weinshall, Gal Levi, Dmitri Hanukaev
|Here we extend the LDA model to deal with documents that are represented more naturally by bags of continuous descriptors.
|On autoencoder scoring
|Hanna Kamyshanska, Roland Memisevic
|In this paper we show how an autoencoder can assign meaningful scores to data independently of training procedure and without reference to any probabilistic model, by interpreting it as a dynamical system.
|Infinite Markov-Switching Maximum Entropy Discrimination Machines
|In this paper, we present a method that combines the merits of Bayesian nonparametrics, specifically stick-breaking priors, and large-margin kernel machines in the context of sequential data classification.
|A PAC-Bayesian Approach for Domain Adaptation with Specialization to Linear Classifiers
|Pascal Germain, Amaury Habrard, François Laviolette, Emilie Morvant
|We provide a first PAC-Bayesian analysis for domain adaptation (DA) which arises when the learning and test distributions differ.
|Sparse PCA through Low-rank Approximations
|Dimitris Papailiopoulos, Alexandros Dimakis, Stavros Korokythakis
|We introduce a novel algorithm that computes the k-sparse principal component of a positive semidefinite matrix A.
|Computation-Risk Tradeoffs for Covariance-Thresholded Regression
|Dinah Shender, John Lafferty
|We present a family of linear regression estimators that provides a fine-grained tradeoff between statistical accuracy and computational efficiency.
|Exact Rule Learning via Boolean Compressed Sensing
|Dmitry Malioutov, Kush Varshney
|We propose an interpretable rule-based classification system based on ideas from Boolean compressed sensing. Furthermore we construct rule sets from these learned clauses using set covering and boosting.
|Robust Sparse Regression under Adversarial Corruption
|Yudong Chen, Constantine Caramanis, Shie Mannor
|We consider three popular algorithms in the uncorrupted setting: Thresholding Regression, Lasso, and the Dantzig selector, and show that the counterparts obtained using the trimmed inner product are provably robust.
|Optimization with First-Order Surrogate Functions
|In this paper, we study optimization methods consisting of iteratively minimizing surrogates of an objective function.
|Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation
|Hema Koppula, Ashutosh Saxena
|In this paper, we reason about these alternate possibilities by reasoning over multiple possible graph structures.
|Consistency versus Realizable H-Consistency for Multiclass Classification
|Phil Long, Rocco Servedio
|We identify a property of a loss function, realizable consistency with respect to a restricted class of scoring functions, that accounts for this difference.
|Feature Multi-Selection among Subjective Features
|Sivan Sabato, Adam Kalai
|We give theoretically-motivated “feature multi-selection” algorithms that choose, among a large set of candidate features, not only which features to judge but how many times to judge each one.
|Domain Adaptation under Target and Conditional Shift
|Kun Zhang, Bernhard Schölkopf, Krikamol Muandet, Zhikun Wang
|We exploit importance reweighting or sample transformation to find the learning machine that works well on test data, and propose to estimate the weights or transformations by reweighting or transforming training data to reproduce the covariate distribution on the test domain.
|Collective Stability in Structured Prediction: Generalization from One Example
|Ben London, Bert Huang, Ben Taskar, Lise Getoor
|In this paper, we identify a set of natural conditions – weak dependence, hypothesis complexity and a new measure, collective stability – that are sufficient for generalization from even a single example, without imposing an explicit generative model of the data.
|Stable Coactive Learning via Perturbation
|Karthik Raman, Thorsten Joachims, Pannaga Shivaswamy, Tobias Schnabel
|User feedback takes the form of preferences, and recent work has introduced online algorithms that learn from this weak feedback.
|Max-Margin Multiple-Instance Dictionary Learning
|Xinggang Wang, Baoyuan Wang, Xiang Bai, Wenyu Liu, Zhuowen Tu
|In this paper, we propose a multiple instance learning (MIL) strategy (along the line of weakly supervised learning) for dictionary learning.
|Fast Semidifferential-based Submodular Function Optimization
|Rishabh Iyer, Stefanie Jegelka, Jeff Bilmes
|We present a practical and powerful new framework for both unconstrained and constrained submodular function optimization based on discrete semidifferentials (sub- and super-differentials).
|Kernelized Bayesian Matrix Factorization
|Mehmet Gönen, Suleiman Khan, Samuel Kaski
|We extend kernelized matrix factorization with a fully Bayesian treatment and with an ability to work with multiple side information sources expressed as different kernels.
|Learning the Structure of Sum-Product Networks
|Robert Gens, Pedro Domingos
|We propose the first algorithm for learning the structure of SPNs that takes full advantage of their expressiveness.
|Quantile Regression for Large-scale Applications
|Jiyan Yang, Xiangrui Meng, Michael Mahoney
|Here, we present a randomized algorithm that runs in time that is nearly linear in the size of the input and that, with constant probability, computes a (1+ε) approximate solution to an arbitrary quantile regression problem.
|Robust Regression on MapReduce
|Xiangrui Meng, Michael Mahoney
|In this work, we propose an efficient algorithm for solving strongly over-determined (m ≫ n) robust ℓ_p regression problems to moderate precision on MapReduce.
|Infinitesimal Annealing for Training Semi-Supervised Support Vector Machines
|Kohei Ogawa, Motoki Imamura, Ichiro Takeuchi, Masashi Sugiyama
|In this paper, we go beyond this trade-off by proposing a novel training algorithm that efficiently performs annealing with an infinitesimal resolution.
|One-Pass AUC Optimization
|Wei Gao, Rong Jin, Shenghuo Zhu, Zhi-Hua Zhou
|In this work, we focus on one-pass AUC optimization that requires only going through the training data once without storing the entire training dataset, where conventional online learning algorithms cannot be applied directly because AUC is measured by a sum of losses defined over pairs of instances from different classes.
|Learning Convex QP Relaxations for Structured Prediction
|Jeremy Jancsary, Sebastian Nowozin, Carsten Rother
|We introduce a new large margin approach to discriminative training of intractable discrete graphical models.
|Concurrent Reinforcement Learning from Customer Interactions
|David Silver, Leonard Newnham, David Barker, Suzanne Weller, Jason McFall
|In this paper, we explore applications in which a company interacts concurrently with many customers.
|Saving Evaluation Time for the Decision Function in Boosting: Representation and Reordering Base Learner
|Peng Sun, Jie Zhou
|In this paper, we present a novel method for ordering.
|Stability and Hypothesis Transfer Learning
|Ilja Kuzborskij, Francesco Orabona
|Particularly, we conduct a theoretical analysis of HTL by considering the algorithmic stability of a class of HTL algorithms based on Regularized Least Squares with biased regularization.
|Fast Dual Variational Inference for Non-Conjugate Latent Gaussian Models
|Mohammad Emtiyaz Khan, Aleksandr Aravkin, Michael Friedlander, Matthias Seeger
|In this paper, we derive a novel dual variational inference approach, which exploits the convexity property of the VG approximations.
|Modeling Temporal Evolution and Multiscale Structure in Networks
|Tue Herlau, Morten Mørup, Mikkel Schmidt
|We propose a model for temporally correlated multifurcating hierarchies in complex networks that jointly captures both effects.
|Dependent Normalized Random Measures
|Changyou Chen, Vinayak Rao, Wray Buntine, Yee Whye Teh
|In this paper we propose two constructions of dependent normalized random measures, a class of nonparametric priors over dependent probability measures.
|Fast Max-Margin Matrix Factorization with Data Augmentation
|Minjie Xu, Jun Zhu, Bo Zhang
|In this paper we present a probabilistic M3F model that admits a highly efficient Gibbs sampling algorithm through data augmentation.
|Natural Image Bases to Represent Neuroimaging Data
|Ashish Gupta, Murat Ayhan, Anthony Maida
|The key technique in this paper is the use of cross-domain features to represent MRI data.
|Breaking the Small Cluster Barrier of Graph Clustering
|Nir Ailon, Yudong Chen, Huan Xu
|This paper investigates graph clustering in the planted cluster model in the presence of small clusters.
|Approximate Inference in Collective Graphical Models
|Daniel Sheldon, Tao Sun, Akshat Kumar, Tom Dietterich
|We study the problem of approximate inference in collective graphical models (CGMs), which were recently introduced to model the problem of learning and inference with noisy aggregate observations.
|Scaling the Indian Buffet Process via Submodular Maximization
|Colorado Reed, Zoubin Ghahramani
|In this work, we use the maximization-expectation framework of Kurihara & Welling (2008) to perform approximate MAP inference for linear-Gaussian latent feature models with an Indian Buffet Process (IBP) prior.
|Mini-Batch Primal and Dual Methods for SVMs
|Martin Takac, Avleen Bijral, Peter Richtarik, Nati Srebro
|We address the issue of using mini-batches in stochastic optimization of SVMs.
|The lasso, persistence, and cross-validation
|Darren Homrighausen, Daniel McDonald
|We consider the high-dimensional setting wherein the number of predictors p = n^α, α > 0, grows with the number of observations.
|Spectral Experts for Estimating Mixtures of Linear Regressions
|Arun Tejasvi Chaganty, Percy Liang
|In this paper, we develop a new computationally efficient and provably consistent estimator for the mixture of linear regressions, a simple instance of discriminative latent-variable models.
|Distribution to Distribution Regression
|Junier Oliva, Barnabas Poczos, Jeff Schneider
|We analyze "distribution to distribution regression", in which both the covariates (inputs) and the responses (outputs) are distributions.
|Regularization of Neural Networks using DropConnect
|Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus
|We introduce DropConnect, a generalization of DropOut, for regularizing large fully-connected layers within neural networks.
|Gaussian Process Kernels for Pattern Discovery and Extrapolation
|Andrew Wilson, Ryan Adams
|We introduce simple closed form kernels that can be used with Gaussian processes to discover patterns and enable extrapolation.
|Anytime Representation Learning
|Zhixiang Xu, Matt Kusner, Gao Huang, Kilian Weinberger
|We introduce Anytime Feature Representations (AFR), a novel algorithm that explicitly addresses this trade-off in the data representation rather than in the classifier.
|Algorithms for Direct 0–1 Loss Optimization in Binary Classification
|Tan Nguyen, Scott Sanner
|We explore a variety of practical methods for direct (approximate) optimization of the 0–1 loss based on branch and bound search, combinatorial search, and coordinate descent on smooth, differentiable relaxations of the 0–1 loss.
|Top-k Selection based on Adaptive Sampling of Noisy Preferences
|Robert Busa-Fekete, Balazs Szorenyi, Weiwei Cheng, Paul Weng, Eyke Huellermeier
|We propose and formally analyze a general preference-based racing algorithm that we instantiate with three specific ranking procedures and corresponding sampling schemes.
|The Extended Parameter Filter
|Yusuf Bugra Erol, Lei Li, Bharath Ramsundar, Stuart Russell
|In this paper, we demonstrate a connection between Storvik’s filter and a Kalman filter in parameter space and establish more general conditions under which Storvik’s filter works.
|Exploiting Ontology Structures and Unlabeled Data for Learning
|Nina Balcan, Avrim Blum, Yishay Mansour
|We present and analyze a theoretical model designed to understand and explain the effectiveness of ontologies for learning multiple related tasks from primarily unlabeled data.
|O(logT) Projections for Stochastic Optimization of Smooth and Strongly Convex Functions
|Lijun Zhang, Tianbao Yang, Rong Jin, Xiaofei He
|In this paper, we present a novel algorithm that aims to reduce the number of projections for stochastic optimization.
|Optimizing the F-Measure in Multi-Label Classification: Plug-in Rule Approach versus Structured Loss Minimization
|Krzysztof Dembczynski, Arkadiusz Jachnik, Wojciech Kotlowski, Willem Waegeman, Eyke Huellermeier
|We introduce a novel plug-in rule algorithm that estimates all parameters required for a Bayes-optimal prediction via a set of multinomial regression models, and we compare this algorithm with SSVMs in terms of computational complexity and statistical consistency.
|On the importance of initialization and momentum in deep learning
|Ilya Sutskever, James Martens, George Dahl, Geoffrey Hinton
|In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-Free optimization.
|A non-IID Framework for Collaborative Filtering with Restricted Boltzmann Machines
|Kostadin Georgiev, Preslav Nakov
|We propose a framework for collaborative filtering based on Restricted Boltzmann Machines (RBM), which extends previous RBM-based approaches in several important directions.
|Intersecting singularities for multi-structured estimation
|Emile Richard, Francis Bach, Jean-Philippe Vert
|We address the problem of designing a convex nonsmooth regularizer encouraging multiple structural effects simultaneously.
|Structure Discovery in Nonparametric Regression through Compositional Kernel Search
|David Duvenaud, James Lloyd, Roger Grosse, Joshua Tenenbaum, Zoubin Ghahramani
|We present a method for searching over this space of structures which mirrors the scientific discovery process.
|Copy or Coincidence? A Model for Detecting Social Influence and Duplication Events
|Lisa Friedland, David Jensen, Michael Lavine
|In this paper, we analyze the task of inferring rare links between pairs of entities that seem too similar to have occurred by chance.
|Steffen Grunewalder, Arthur Gretton, John Shawe-Taylor
|We develop a generic approach to form smooth versions of basic mathematical operations like multiplication, composition, change of measure, and conditional expectation, among others.
|The Cross-Entropy Method Optimizes for Quantiles
|Sergiu Goschin, Ari Weinstein, Michael Littman
|We provide a variant of CE (Proportional CE) that effectively optimizes the expected value.
|Topic Discovery through Data Dependent and Random Projections
|Weicong Ding, Mohammad Hossein Rohban, Prakash Ishwar, Venkatesh Saligrama
|We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns.
|Bayesian Learning of Recursively Factored Environments
|Marc Bellemare, Joel Veness, Michael Bowling
|In this paper we introduce the class of recursively decomposable factorizations, and show how exact Bayesian inference can be used to efficiently guarantee predictive performance close to the best factorization in this class.
|Selective sampling algorithms for cost-sensitive multiclass prediction
|In this paper, we study the problem of active learning for cost-sensitive multiclass classification.
|The Bigraphical Lasso
|Alfredo Kalaitzis, John Lafferty, Neil D. Lawrence, Shuheng Zhou
|We introduce the bigraphical lasso, an estimator for precision matrices of matrix-normals based on the Cartesian product of graphs.
|Almost Optimal Exploration in Multi-Armed Bandits
|Zohar Karnin, Tomer Koren, Oren Somekh
|We present two novel, parameter-free algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters.
|Deep Canonical Correlation Analysis
|Galen Andrew, Raman Arora, Jeff Bilmes, Karen Livescu
|We introduce Deep Canonical Correlation Analysis (DCCA), a method to learn complex nonlinear transformations of two views of data such that the resulting representations are highly linearly correlated.
|Consistency of Online Random Forests
|Misha Denil, David Matheson, Nando de Freitas
|In this paper, we take a step towards narrowing this gap by providing a consistency result for online random forests.
|Sparse Gaussian Conditional Random Fields: Algorithms, Theory, and Application to Energy Forecasting
|Matt Wytock, Zico Kolter
|In this paper, we make three contributions: 1) we develop a second-order active-set method which is several orders of magnitude faster than previously proposed optimization approaches for this problem; 2) we analyze the model from a theoretical standpoint, improving upon past bounds with convergence rates that depend logarithmically on the data dimension; and 3) we apply the method to large-scale energy forecasting problems, demonstrating state-of-the-art performance on two real-world tasks.
|Fast Image Tagging
|Minmin Chen, Alice Zheng, Kilian Weinberger
|We propose FastTag, a novel algorithm that achieves comparable results with two simple linear mappings that are co-regularized in a joint convex loss function.
|Expensive Function Optimization with Stochastic Binary Outcomes
|Matthew Tesch, Jeff Schneider, Howie Choset
|In this work we define the stochastic binary optimization problem and propose an approach using an adaptation of Gaussian Processes for classification that presents a Bayesian optimization framework for this problem.
|Krzysztof Geras, Charles Sutton
|We present new estimators of the variance of the cross-validation, both in the multiple-source setting and in the standard iid setting.
|Learning Triggering Kernels for Multi-dimensional Hawkes Processes
|Ke Zhou, Hongyuan Zha, Le Song
|In this paper, we study these critical questions in social network analysis quantitatively under the framework of multi-dimensional Hawkes processes.
|On the difficulty of training recurrent neural networks
|Razvan Pascanu, Tomas Mikolov, Yoshua Bengio
|In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective.
|Ian Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio
|We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout.
|Predictable Dual-View Hashing
|Mohammad Rastegari, Jonghyun Choi, Shobeir Fakhraei, Hal Daume, Larry Davis
|We propose a Predictable Dual-View Hashing (PDH) algorithm which embeds proximity of data samples in the original spaces.
|Deep learning with COTS HPC systems
|Adam Coates, Brody Huval, Tao Wang, David Wu, Bryan Catanzaro, Andrew Ng
|In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI.
|Nonparametric Mixture of Gaussian Processes with Constraints
|James Ross, Jennifer Dy
|Motivated by the need to identify new and clinically relevant categories of lung disease, we propose a novel clustering with constraints method using a Dirichlet process mixture of Gaussian processes in a variational Bayesian nonparametric framework.
|Scale Invariant Conditional Dependence Measures
|Sashank J Reddi, Barnabas Poczos
|In this paper we develop new dependence and conditional dependence measures and provide their estimators.
|Learning Policies for Contextual Submodular Prediction
|Stephane Ross, Jiaji Zhou, Yisong Yue, Debadeepta Dey, Drew Bagnell
|We propose a simple, efficient, and provably near-optimal approach to optimizing such prediction problems based on no-regret learning.
|Manifold Preserving Hierarchical Topic Models for Quantization and Approximation
|Minje Kim, Paris Smaragdis
|We present two complementary topic models to address the analysis of mixture data lying on manifolds.
|Safe Screening of Non-Support Vectors in Pathwise SVM Computation
|Kohei Ogawa, Yoshiki Suzuki, Ichiro Takeuchi
|In this paper, we claim that some of the non-support vectors (non-SVs) that have no influence on the classifier can be screened out prior to the training phase in pathwise SVM computation scenario, in which one is asked to train a sequence (or path) of SVM classifiers for different regularization parameters.
|Cost-sensitive Multiclass Classification Risk Bounds
|Bernardo Ávila Pires, Csaba Szepesvari, Mohammad Ghavamzadeh
|In this paper, we strengthen these results by showing how the 0-1 excess loss of a predictor can be upper bounded as a function of the excess loss of the predictor measured using the convex surrogate.
|Semi-supervised Clustering by Input Pattern Assisted Pairwise Similarity Matrix Completion
|Jinfeng Yi, Lijun Zhang, Rong Jin, Qi Qian, Anil Jain
|Many semi-supervised clustering algorithms have been proposed to improve the clustering accuracy by effectively exploring the available side information that is usually in the form of pairwise constraints.
|Learning the beta-Divergence in Tweedie Compound Poisson Matrix Factorization Models
|Umut Simsekli, Ali Taylan Cemgil, Yusuf Kenan Yilmaz
|In this study, we derive algorithms for estimating mixed β-divergences.
|Fast algorithms for sparse principal component analysis based on Rayleigh quotient iteration
|We introduce new algorithms for sparse principal component analysis (sPCA), a variation of PCA which aims to represent data in a sparse low-dimensional basis.
|Nested Chinese Restaurant Franchise Process: Applications to User Tracking and Document Modeling
|Amr Ahmed, Liangjie Hong, Alexander Smola
|We introduce the nested Chinese Restaurant Franchise Process as a means to obtain both hierarchical tree-structured representations for objects, akin to (but more general than) the nested Chinese Restaurant Process while sharing their structure akin to the Hierarchical Dirichlet Process.
|Tree-Independent Dual-Tree Algorithms
|Ryan Curtin, William March, Parikshit Ram, David Anderson, Alexander Gray, Charles Isbell
|We introduce a four-part logical split: the tree, the traversal, the point-to-point base case, and the pruning rule.
|Multilinear Multitask Learning
|Bernardino Romera-Paredes, Hane Aung, Nadia Bianchi-Berthouze, Massimiliano Pontil
|We present two learning methods; one is an adapted convex relaxation method used in the context of tensor completion.
|Online Learning under Delayed Feedback
|Pooria Joulani, Andras Gyorgy, Csaba Szepesvari
|In this paper we provide a systematic study of the topic, and analyze the effect of delay on the regret of online learning algorithms.
|Adaptive Hamiltonian and Riemann Manifold Monte Carlo
|Ziyu Wang, Shakir Mohamed, Nando de Freitas
|In this paper we address the widely-experienced difficulty in tuning Hamiltonian-based Monte Carlo samplers.
|Coco-Q: Learning in Stochastic Games with Side Payments
|Eric Sodomka, Elizabeth Hilliard, Michael Littman, Amy Greenwald
|In this paper, we show that coco values can also be defined for stochastic games and can be learned using a simple variant of Q-learning that is provably convergent. We provide a set of examples showing how the strategies learned by the Coco-Q algorithm relate to those learned by existing multiagent Q-learning algorithms.
|On A Nonlinear Generalization of Sparse Coding and Dictionary Learning
|Jeffrey Ho, Yuchen Xie, Baba Vemuri
|This paper proposes a novel framework for sparse coding and dictionary learning for data on a Riemannian manifold, and it shows that the existing sparse coding and dictionary learning methods can be considered as special (Euclidean) cases of the more general framework proposed here.
|Estimation of Causal Peer Influence Effects
|Panos Toulis, Edward Kao
|In this paper, we extend potential outcomes to allow for interference, we introduce well-defined causal estimands of peer-influence, and we develop two estimation procedures: a frequentist procedure relying on a sequential randomization design that requires knowledge of the network but operates under complicated response functions, and a Bayesian procedure which accounts for network uncertainty but relies on a linear response assumption to increase estimation precision.