# Most Influential ICML Papers (2022-02)

The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. The Paper Digest team analyzes all papers published at ICML in past years and presents the 15 most influential papers for each year. This ranking is constructed automatically from citations in both research papers and granted patents, and is updated frequently to reflect recent changes. To find the most influential papers from other conferences and journals, visit the Best Paper Digest page. Note: the most influential papers may or may not include the papers that won best paper awards. **(Version: 2022-02)**

Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users around the world with a number of exclusive services for ranking, search, tracking, and literature review.

If you do not want to miss any interesting academic paper, you are welcome to **sign up for our free daily paper digest service** to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to receive new conference digests.

Paper Digest Team

New York City, New York, 10017

team@paperdigest.org

#### TABLE 1: Most Influential ICML Papers (2022-02)

Year | Rank | Paper | Author(s) |
---|---|---|---|
2021 | 1 | Training Data-efficient Image Transformers & Distillation Through Attention (IF: 7). Highlight: In this work, we produce competitive convolution-free transformers trained on ImageNet only using a single computer in less than 3 days. | HUGO TOUVRON et al. |
2021 | 2 | Learning Transferable Visual Models From Natural Language Supervision (IF: 7). Highlight: We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. | ALEC RADFORD et al. |
2021 | 3 | Zero-Shot Text-to-Image Generation (IF: 5). Highlight: We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. | ADITYA RAMESH et al. |
2021 | 4 | Barlow Twins: Self-Supervised Learning Via Redundancy Reduction (IF: 5). Highlight: We propose an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. | Jure Zbontar; Li Jing; Ishan Misra; Yann LeCun; Stephane Deny |
2021 | 5 | Is Space-Time Attention All You Need for Video Understanding? (IF: 4). Highlight: We present a convolution-free approach to video classification built exclusively on self-attention over space and time. | Gedas Bertasius; Heng Wang; Lorenzo Torresani |
2021 | 6 | WILDS: A Benchmark of In-the-Wild Distribution Shifts (IF: 4). Highlight: To address this gap, we present WILDS, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time and location in satellite imaging and poverty mapping. | PANG WEI KOH et al. |
2021 | 7 | Out-of-Distribution Generalization Via Risk Extrapolation (REx) (IF: 4). Highlight: We motivate this approach, Risk Extrapolation (REx), as a form of robust optimization over a perturbation set of extrapolated domains (MM-REx), and propose a penalty on the variance of training risks (V-REx) as a simpler variant. | DAVID KRUEGER et al. |
2021 | 8 | Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision (IF: 4). Highlight: In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps in the Conceptual Captions dataset. | CHAO JIA et al. |
2021 | 9 | EfficientNetV2: Smaller Models and Faster Training (IF: 3). Highlight: This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models. | Mingxing Tan; Quoc Le |
2021 | 10 | High-Performance Large-Scale Image Recognition Without Normalization (IF: 3). Highlight: In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets. | Andy Brock; Soham De; Samuel L Smith; Karen Simonyan |
2021 | 11 | Decoupling Representation Learning from Reinforcement Learning (IF: 3). Highlight: In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. | Adam Stooke; Kimin Lee; Pieter Abbeel; Michael Laskin |
2021 | 12 | ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision (IF: 3). Highlight: In this paper, we present a minimal VLP model, Vision-and-Language Transformer (ViLT), monolithic in the sense that the processing of visual inputs is drastically simplified to just the same convolution-free manner that we process textual inputs. | Wonjae Kim; Bokyung Son; Ildoo Kim |
2021 | 13 | Perceiver: General Perception with Iterative Attention (IF: 3). Highlight: In this paper we introduce the Perceiver – a model that builds upon Transformers and hence makes few architectural assumptions about the relationship between its inputs, but that also scales to hundreds of thousands of inputs, like ConvNets. | ANDREW JAEGLE et al. |
2021 | 14 | Improved Denoising Diffusion Probabilistic Models (IF: 3). Highlight: We show that with a few simple modifications, DDPMs can also achieve competitive log-likelihoods while maintaining high sample quality. | Alexander Quinn Nichol; Prafulla Dhariwal |
2021 | 15 | E(n) Equivariant Graph Neural Networks (IF: 3). Highlight: This paper introduces a new model to learn graph neural networks equivariant to rotations, translations, reflections and permutations, called E(n)-Equivariant Graph Neural Networks (EGNNs). | Víctor Garcia Satorras; Emiel Hoogeboom; Max Welling |
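
To make the Barlow Twins entry above concrete, here is a minimal NumPy sketch of the objective it describes: push the cross-correlation matrix of two embedded views toward the identity. The batch size, embedding dimension, and `lam` weight are illustrative choices, not the paper's settings.

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Diagonal of the cross-correlation -> 1 (invariance),
    off-diagonal -> 0 (redundancy reduction)."""
    # Standardize each embedding dimension over the batch.
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + 1e-8)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + 1e-8)
    n = z1.shape[0]
    c = z1.T @ z2 / n                                  # (dim, dim) cross-correlation
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(64, 8))                           # toy batch of embeddings
identical_views = barlow_twins_loss(z, z)              # aligned views: near-zero loss
noisy_views = barlow_twins_loss(z, z + rng.normal(size=(64, 8)))
```

Perfectly aligned views yield a near-zero loss, while distorted views decorrelate the diagonal and raise it, which is the collapse-avoiding pressure the highlight refers to.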

Year | Rank | Paper | Author(s) |
---|---|---|---|
2020 | 1 | A Simple Framework For Contrastive Learning Of Visual Representations (IF: 8). Highlight: This paper presents a simple framework for contrastive representation learning. | Ting Chen; Simon Kornblith; Mohammad Norouzi; Geoffrey Hinton |
2020 | 2 | Data-Efficient Image Recognition With Contrastive Predictive Coding (IF: 7). Highlight: We therefore revisit and improve Contrastive Predictive Coding, an unsupervised objective for learning such representations. | Olivier Henaff |
2020 | 3 | PEGASUS: Pre-training With Extracted Gap-sentences For Abstractive Summarization (IF: 6). Highlight: In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. | Jingqing Zhang; Yao Zhao; Mohammad Saleh; Peter Liu |
2020 | 4 | Generative Pretraining From Pixels (IF: 6). Highlight: Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. | MARK CHEN et al. |
2020 | 5 | Reliable Evaluation Of Adversarial Robustness With An Ensemble Of Diverse Parameter-free Attacks (IF: 5). Highlight: In this paper we first propose two extensions of the PGD attack overcoming failures due to suboptimal step size and problems of the objective function. We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness. | Francesco Croce; Matthias Hein |
2020 | 6 | SCAFFOLD: Stochastic Controlled Averaging For Federated Learning (IF: 5). Highlight: As a solution, we propose a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the "client drift". | SAI PRANEETH REDDY KARIMIREDDY et al. |
2020 | 7 | Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention (IF: 5). Highlight: To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from $O(N^2)$ to $O(N)$, where $N$ is the sequence length. | Angelos Katharopoulos; Apoorv Vyas; Nikolaos Pappas; Francois Fleuret |
2020 | 8 | Simple And Deep Graph Convolutional Networks (IF: 4). Highlight: In this paper, we study the problem of designing and analyzing deep graph convolutional networks. | Ming Chen; Zhewei Wei; Zengfeng Huang; Bolin Ding; Yaliang Li |
2020 | 9 | Overfitting In Adversarially Robust Deep Learning (IF: 4). Highlight: In this paper, we empirically study this phenomenon in the setting of adversarially trained deep networks, which are trained to minimize the loss under worst-case adversarial perturbations. | Eric Wong; Leslie Rice; Zico Kolter |
2020 | 10 | Agent57: Outperforming The Atari Human Benchmark (IF: 4). Highlight: We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games. | ADRIÀ PUIGDOMENECH BADIA et al. |
2020 | 11 | Contrastive Multi-View Representation Learning On Graphs (IF: 4). Highlight: We introduce a self-supervised approach for learning node and graph level representations by contrasting structural views of graphs. | Kaveh Hassani; Amir Hosein Khasahmadi |
2020 | 12 | On Gradient Descent Ascent For Nonconvex-Concave Minimax Problems (IF: 4). Highlight: In this paper, we present the complexity results on two-time-scale GDA for solving nonconvex-concave minimax problems, showing that the algorithm can find a stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in \mathcal{Y}} f(\cdot, \mathbf{y})$ efficiently. | Tianyi Lin; Chi Jin; Michael Jordan |
2020 | 13 | Do We Really Need To Access The Source Data? Source Hypothesis Transfer For Unsupervised Domain Adaptation (IF: 4). Highlight: In this work we tackle a novel setting where only a trained source model is available and investigate how we can effectively utilize such a model without source data to solve UDA problems. | Jian Liang; Dapeng Hu; Jiashi Feng |
2020 | 14 | Leveraging Procedural Generation To Benchmark Reinforcement Learning (IF: 4). Highlight: We introduce Procgen Benchmark, a suite of 16 procedurally generated game-like environments designed to benchmark both sample efficiency and generalization in reinforcement learning. | Karl Cobbe; Chris Hesse; Jacob Hilton; John Schulman |
2020 | 15 | The Many Shapley Values For Model Explanation (IF: 4). Highlight: In this paper, we use the axiomatic approach to study the differences between some of the many operationalizations of the Shapley value for attribution, and propose a technique called Baseline Shapley (BShap) that is backed by a proper uniqueness result. | Mukund Sundararajan; Amir Najmi |
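
The linear-attention entry above ("Transformers Are RNNs") rests on one algebraic trick: with a positive kernel feature map, the $N \times N$ similarity matrix never has to be formed, because matrix products associate. A minimal NumPy sketch (the `elu(x) + 1` feature map follows that paper; sizes are toy values):

```python
import numpy as np

def feature_map(x):
    # A simple positive feature map, elu(x) + 1.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(N): associate (phi(Q) phi(K)^T) V as phi(Q) (phi(K)^T V)."""
    Qf, Kf = feature_map(Q), feature_map(K)
    kv = Kf.T @ V                       # (d, d_v) summary, independent of N
    z = Kf.sum(axis=0)                  # normalizer, shape (d,)
    return (Qf @ kv) / (Qf @ z)[:, None]

def quadratic_attention(Q, K, V):
    """Equivalent O(N^2) form with an explicit N x N similarity matrix."""
    A = feature_map(Q) @ feature_map(K).T
    A = A / A.sum(axis=1, keepdims=True)
    return A @ V

rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(16, 4)) for _ in range(3))
```

Both functions return the same output up to floating-point error; only the cost in the sequence length $N$ differs.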

Year | Rank | Paper | Author(s) |
---|---|---|---|
2019 | 1 | EfficientNet: Rethinking Model Scaling For Convolutional Neural Networks (IF: 8). Highlight: In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. | Mingxing Tan; Quoc Le |
2019 | 2 | Self-Attention Generative Adversarial Networks (IF: 8). Highlight: In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. | Han Zhang; Ian Goodfellow; Dimitris Metaxas; Augustus Odena |
2019 | 3 | Theoretically Principled Trade-off Between Robustness And Accuracy (IF: 7). Highlight: In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. | HONGYANG ZHANG et al. |
2019 | 4 | Simplifying Graph Convolutional Networks (IF: 7). Highlight: In this paper, we reduce this excess complexity through successively removing nonlinearities and collapsing weight matrices between consecutive layers. | FELIX WU et al. |
2019 | 5 | A Convergence Theory For Deep Learning Via Over-Parameterization (IF: 7). Highlight: In this work, we prove simple algorithms such as stochastic gradient descent (SGD) can find global minima on the training objective of DNNs in polynomial time. | Zeyuan Allen-Zhu; Yuanzhi Li; Zhao Song |
2019 | 6 | Certified Adversarial Robustness Via Randomized Smoothing (IF: 7). Highlight: We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the L2 norm. | Jeremy Cohen; Elan Rosenfeld; Zico Kolter |
2019 | 7 | Gradient Descent Finds Global Minima Of Deep Neural Networks (IF: 7). Highlight: The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). | Simon Du; Jason Lee; Haochuan Li; Liwei Wang; Xiyu Zhai |
2019 | 8 | Challenging Common Assumptions In The Unsupervised Learning Of Disentangled Representations (IF: 7). Highlight: In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. | FRANCESCO LOCATELLO et al. |
2019 | 9 | MASS: Masked Sequence To Sequence Pre-training For Language Generation (IF: 7). Highlight: Inspired by the success of BERT, we propose MAsked Sequence to Sequence pre-training (MASS) for the encoder-decoder based language generation tasks. | Kaitao Song; Xu Tan; Tao Qin; Jianfeng Lu; Tie-Yan Liu |
2019 | 10 | Learning Latent Dynamics For Planning From Pixels (IF: 7). Highlight: We propose the Deep Planning Network (PlaNet), a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space. | DANIJAR HAFNER et al. |
2019 | 11 | Fine-Grained Analysis Of Optimization And Generalization For Overparameterized Two-Layer Neural Networks (IF: 6). Highlight: This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17]; (ii) a generalization bound independent of network size, using a data-dependent complexity measure. | Sanjeev Arora; Simon Du; Wei Hu; Zhiyuan Li; Ruosong Wang |
2019 | 12 | Do ImageNet Classifiers Generalize To ImageNet? (IF: 6). Highlight: We build new test sets for the CIFAR-10 and ImageNet datasets. | Benjamin Recht; Rebecca Roelofs; Ludwig Schmidt; Vaishaal Shankar |
2019 | 13 | Off-Policy Deep Reinforcement Learning Without Exploration (IF: 6). Highlight: In this paper, we demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are incapable of learning with data uncorrelated to the distribution under the current policy, making them ineffective for this fixed batch setting. | Scott Fujimoto; David Meger; Doina Precup |
2019 | 14 | Self-Attention Graph Pooling (IF: 6). Highlight: In this paper, we propose a graph pooling method based on self-attention. | Junhyun Lee; Inyeop Lee; Jaewoo Kang |
2019 | 15 | Manifold Mixup: Better Representations By Interpolating Hidden States (IF: 6). Highlight: To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. | VIKAS VERMA et al. |
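
The randomized-smoothing entry above turns any base classifier into a smoothed one by voting over Gaussian perturbations of the input. Here is a minimal sketch of that prediction step only (the paper's certified-radius computation is omitted, and the base classifier, `sigma`, and sample count are toy choices):

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=1000, seed=0):
    """Majority vote of the base classifier under Gaussian input noise."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=(n_samples,) + x.shape)
    preds = np.array([base_classifier(x + eps) for eps in noise])
    return int(np.bincount(preds).argmax())       # most frequent class label

# Toy base classifier: sign of the first coordinate (labels 0 / 1).
def base(x):
    return int(x[0] > 0.0)

vote = smoothed_predict(base, np.array([0.5, -0.2]))
```

For this input, the first coordinate sits two noise standard deviations inside class 1, so the vote is stable even though individual noisy passes occasionally flip.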

Year | Rank | Paper | Author(s) |
---|---|---|---|
2018 | 1 | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning With A Stochastic Actor (IF: 8). Highlight: In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. | Tuomas Haarnoja; Aurick Zhou; Pieter Abbeel; Sergey Levine |
2018 | 2 | Obfuscated Gradients Give A False Sense Of Security: Circumventing Defenses To Adversarial Examples (IF: 9). Highlight: We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. | Anish Athalye; Nicholas Carlini; David Wagner |
2018 | 3 | CyCADA: Cycle-Consistent Adversarial Domain Adaptation (IF: 8). Highlight: We propose a model which adapts between domains using both generative image space alignment and latent representation space alignment. | JUDY HOFFMAN et al. |
2018 | 4 | Addressing Function Approximation Error In Actor-Critic Methods (IF: 8). Highlight: We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested. | Scott Fujimoto; Herke Hoof; David Meger |
2018 | 5 | Provable Defenses Against Adversarial Examples Via The Convex Outer Adversarial Polytope (IF: 8). Highlight: We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. | Eric Wong; Zico Kolter |
2018 | 6 | Synthesizing Robust Adversarial Examples (IF: 8). Highlight: We demonstrate the existence of robust 3D adversarial objects, and we present the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations. | Anish Athalye; Logan Engstrom; Andrew Ilyas; Kevin Kwok |
2018 | 7 | IMPALA: Scalable Distributed Deep-RL With Importance Weighted Actor-Learner Architectures (IF: 8). Highlight: In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. | LASSE ESPEHOLT et al. |
2018 | 8 | Which Training Methods For GANs Do Actually Converge? (IF: 7). Highlight: In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. | Lars Mescheder; Andreas Geiger; Sebastian Nowozin |
2018 | 9 | Disentangling By Factorising (IF: 7). Highlight: We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. | Hyunjik Kim; Andriy Mnih |
2018 | 10 | Representation Learning On Graphs With Jumping Knowledge Networks (IF: 7). Highlight: We analyze some important properties of these models, and propose a strategy to overcome those. | KEYULU XU et al. |
2018 | 11 | MentorNet: Learning Data-Driven Curriculum For Very Deep Neural Networks On Corrupted Labels (IF: 7). Highlight: To overcome the overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep networks, namely, StudentNet. | Lu Jiang; Zhengyuan Zhou; Thomas Leung; Li-Jia Li; Li Fei-Fei |
2018 | 12 | Learning To Reweight Examples For Robust Deep Learning (IF: 7). Highlight: In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. | Mengye Ren; Wenyuan Zeng; Bin Yang; Raquel Urtasun |
2018 | 13 | Interpretability Beyond Feature Attribution: Quantitative Testing With Concept Activation Vectors (TCAV) (IF: 7). Highlight: To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. | BEEN KIM et al. |
2018 | 14 | Parallel WaveNet: Fast High-Fidelity Speech Synthesis (IF: 7). Highlight: This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. | AARON OORD et al. |
2018 | 15 | Learning Representations And Generative Models For 3D Point Clouds (IF: 7). Highlight: In this paper, we look at geometric data represented as point clouds. | Panos Achlioptas; Olga Diamanti; Ioannis Mitliagkas; Leonidas Guibas |
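
The highlight for "Addressing Function Approximation Error In Actor-Critic Methods" above only mentions the evaluation; the method itself (TD3) is best known for clipped double-Q learning, which takes the minimum of two critics when forming the Bellman target to counter overestimation bias. A toy sketch of that target alone (the scalar Q-values and `gamma` are illustrative, not tied to any environment):

```python
def clipped_double_q_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Bellman target using the minimum of two target-critic estimates.

    Taking min(q1, q2) biases the target downward, offsetting the upward
    bias a single learned critic tends to accumulate.
    """
    q_next = min(q1_next, q2_next)
    return reward + (0.0 if done else gamma * q_next)

target = clipped_double_q_target(reward=1.0, q1_next=10.0, q2_next=8.0)
```

Here the optimistic critic's estimate (10.0) is ignored in favor of the pessimistic one (8.0), so the target is 1.0 + 0.99 * 8.0.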

Year | Rank | Paper | Author(s) |
---|---|---|---|
2017 | 1 | Model-Agnostic Meta-Learning For Fast Adaptation Of Deep Networks (IF: 9). Highlight: We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. | Chelsea Finn; Pieter Abbeel; Sergey Levine |
2017 | 2 | Wasserstein Generative Adversarial Networks (IF: 9). Highlight: We introduce a new algorithm named WGAN, an alternative to traditional GAN training. | Martin Arjovsky; Soumith Chintala; Léon Bottou |
2017 | 3 | Neural Message Passing For Quantum Chemistry (IF: 9). Highlight: In this paper, we reformulate existing models into a single common framework we call Message Passing Neural Networks (MPNNs) and explore additional novel variations within this framework. | Justin Gilmer; Samuel S. Schoenholz; Patrick F. Riley; Oriol Vinyals; George E. Dahl |
2017 | 4 | Convolutional Sequence To Sequence Learning (IF: 9). Highlight: We introduce an architecture based entirely on convolutional neural networks. | Jonas Gehring; Michael Auli; David Grangier; Denis Yarats; Yann N. Dauphin |
2017 | 5 | Conditional Image Synthesis With Auxiliary Classifier GANs (IF: 9). Highlight: In this paper we introduce new methods for the improved training of generative adversarial networks (GANs) for image synthesis. | Augustus Odena; Christopher Olah; Jonathon Shlens |
2017 | 6 | Axiomatic Attribution For Deep Networks (IF: 9). Highlight: We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. | Mukund Sundararajan; Ankur Taly; Qiqi Yan |
2017 | 7 | On Calibration Of Modern Neural Networks (IF: 8). Highlight: We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. | Chuan Guo; Geoff Pleiss; Yu Sun; Kilian Q. Weinberger |
2017 | 8 | Learning Important Features Through Propagating Activation Differences (IF: 8). Highlight: Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. | Avanti Shrikumar; Peyton Greenside; Anshul Kundaje |
2017 | 9 | Understanding Black-box Predictions Via Influence Functions (IF: 9). Highlight: In this paper, we use influence functions, a classic technique from robust statistics, to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. | Pang Wei Koh; Percy Liang |
2017 | 10 | Learning To Discover Cross-Domain Relations With Generative Adversarial Networks (IF: 9). Highlight: We propose a method based on a generative adversarial network that learns to discover relations between different domains (DiscoGAN). | Taeksoo Kim; Moonsu Cha; Hyunsoo Kim; Jung Kwon Lee; Jiwon Kim |
2017 | 11 | Curiosity-driven Exploration By Self-supervised Prediction (IF: 9). Highlight: We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. | Deepak Pathak; Pulkit Agrawal; Alexei A. Efros; Trevor Darrell |
2017 | 12 | Language Modeling With Gated Convolutional Networks (IF: 9). Highlight: In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. | Yann N. Dauphin; Angela Fan; Michael Auli; David Grangier |
2017 | 13 | Deep Transfer Learning With Joint Adaptation Networks (IF: 8). Highlight: In this paper, we present joint adaptation networks (JAN), which learn a transfer network by aligning the joint distributions of multiple domain-specific layers across domains based on a joint maximum mean discrepancy (JMMD) criterion. | Mingsheng Long; Han Zhu; Jianmin Wang; Michael I. Jordan |
2017 | 14 | Large-Scale Evolution Of Image Classifiers (IF: 9). Highlight: Our goal is to minimize human participation, so we employ evolutionary algorithms to discover such networks automatically. | ESTEBAN REAL et al. |
2017 | 15 | Continual Learning Through Synaptic Intelligence (IF: 8). Highlight: In this study, we introduce intelligent synapses that bring some of this biological complexity into artificial neural networks. | Friedemann Zenke; Ben Poole; Surya Ganguli |
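
The "Axiomatic Attribution For Deep Networks" entry above is the Integrated Gradients paper: an attribution is (input - baseline) times the average gradient along the straight path from baseline to input. A minimal sketch with a Riemann-sum approximation; the linear model and step count are illustrative choices used here only because a linear model makes the result easy to verify:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint-rule approximation of Integrated Gradients.

    grad_fn(point) must return the model's gradient at that point.
    """
    alphas = (np.arange(steps) + 0.5) / steps        # midpoints of the path
    path_grads = np.array([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * path_grads.mean(axis=0)

# For a linear model f(x) = w . x, the gradient is constant (w), so the
# attributions equal (x - baseline) * w exactly.
w = np.array([2.0, -1.0, 0.5])
attr = integrated_gradients(lambda point: w, np.array([1.0, 1.0, 1.0]), np.zeros(3))
```

This constant-gradient case is a useful sanity check: the attributions sum to f(x) - f(baseline), which is the completeness axiom the paper's title refers to.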

2016 | 1 | Asynchronous Methods For Deep Reinforcement Learning (IF:9). Highlight: We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. |
VOLODYMYR MNIH et al. |

2016 | 2 | Dropout As A Bayesian Approximation: Representing Model Uncertainty In Deep Learning (IF:9). Highlight: In this paper we develop a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes. |
Yarin Gal; Zoubin Ghahramani; |

2016 | 3 | Deep Speech 2: End-to-End Speech Recognition In English And Mandarin (IF:8). Highlight: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. |
DARIO AMODEI et al. |

2016 | 4 | Generative Adversarial Text To Image Synthesis (IF:9). Highlight: In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image modeling, translating visual concepts from characters to pixels. |
SCOTT REED et al. |

2016 | 5 | Dueling Network Architectures For Deep Reinforcement Learning (IF:9). Highlight: In this paper, we present a new neural network architecture for model-free reinforcement learning. |
ZIYU WANG et al. |

2016 | 6 | Pixel Recurrent Neural Networks (IF:9). Highlight: We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. |
Aäron van den Oord; Nal Kalchbrenner; Koray Kavukcuoglu; |

2016 | 7 | Learning Convolutional Neural Networks For Graphs (IF:9). Highlight: We propose a framework for learning convolutional neural networks for arbitrary graphs. |
Mathias Niepert; Mohamed Ahmed; Konstantin Kutzkov; |

2016 | 8 | Autoencoding Beyond Pixels Using A Learned Similarity Metric (IF:9). Highlight: We present an autoencoder that leverages learned representations to better measure similarities in data space. |
Anders Boesen Lindbo Larsen; Søren Kaae Sønderby; Hugo Larochelle; Ole Winther; |

2016 | 9 | Unsupervised Deep Embedding For Clustering Analysis (IF:9). Highlight: In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. |
Junyuan Xie; Ross Girshick; Ali Farhadi; |

2016 | 10 | Benchmarking Deep Reinforcement Learning For Continuous Control (IF:9). Highlight: In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. |
Yan Duan; Xi Chen; Rein Houthooft; John Schulman; Pieter Abbeel; |

2016 | 11 | Complex Embeddings For Simple Link Prediction (IF:8). Highlight: As in previous studies, we propose to solve this problem through latent factorization. |
Théo Trouillon; Johannes Welbl; Sebastian Riedel; Eric Gaussier; Guillaume Bouchard; |

2016 | 12 | Ask Me Anything: Dynamic Memory Networks For Natural Language Processing (IF:8). Highlight: We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. |
ANKIT KUMAR et al. |

2016 | 13 | Revisiting Semi-Supervised Learning With Graph Embeddings (IF:8). Highlight: We present a semi-supervised learning framework based on graph embeddings. |
Zhilin Yang; William Cohen; Ruslan Salakhutdinov; |

2016 | 14 | CryptoNets: Applying Neural Networks To Encrypted Data With High Throughput And Accuracy (IF:8). Highlight: In this work, we will present a method to convert learned neural networks to CryptoNets, neural networks that can be applied to encrypted data. |
RAN GILAD-BACHRACH et al. |

2016 | 15 | Group Equivariant Convolutional Networks (IF:8). Highlight: We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. |
Taco Cohen; Max Welling; |

2015 | 1 | Batch Normalization: Accelerating Deep Network Training By Reducing Internal Covariate Shift (IF:10). Highlight: We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. |
Sergey Ioffe; Christian Szegedy; |

2015 | 2 | Show, Attend And Tell: Neural Image Caption Generation With Visual Attention (IF:9). Highlight: Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. |
KELVIN XU et al. |

2015 | 3 | Trust Region Policy Optimization (IF:9). Highlight: In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement. |
John Schulman; Sergey Levine; Pieter Abbeel; Michael Jordan; Philipp Moritz; |

2015 | 4 | Unsupervised Domain Adaptation By Backpropagation (IF:9). Highlight: Here, we propose a new approach to domain adaptation in deep architectures that can be trained on large amounts of labeled data from the source domain and large amounts of unlabeled data from the target domain (no labeled target-domain data is necessary). |
Yaroslav Ganin; Victor Lempitsky; |

2015 | 5 | Learning Transferable Features With Deep Adaptation Networks (IF:9). Highlight: In this paper, we propose a new Deep Adaptation Network (DAN) architecture, which generalizes deep convolutional neural network to the domain adaptation scenario. |
Mingsheng Long; Yue Cao; Jianmin Wang; Michael Jordan; |

2015 | 6 | Unsupervised Learning Of Video Representations Using LSTMs (IF:9). Highlight: We use Long Short Term Memory (LSTM) networks to learn representations of video sequences. |
Nitish Srivastava; Elman Mansimov; Ruslan Salakhutdinov; |

2015 | 7 | Variational Inference With Normalizing Flows (IF:9). Highlight: We introduce a new approach for specifying flexible, arbitrarily complex and scalable approximate posterior distributions. |
Danilo Rezende; Shakir Mohamed; |

2015 | 8 | DRAW: A Recurrent Neural Network For Image Generation (IF:9). Highlight: This paper introduces the Deep Recurrent Attentive Writer (DRAW) architecture for image generation with neural networks. |
Karol Gregor; Ivo Danihelka; Alex Graves; Danilo Rezende; Daan Wierstra; |

2015 | 9 | Deep Learning With Limited Numerical Precision (IF:9). Highlight: We study the effect of limited precision data representation and computation on neural network training. |
Suyog Gupta; Ankur Agrawal; Kailash Gopalakrishnan; Pritish Narayanan; |

2015 | 10 | From Word Embeddings To Document Distances (IF:9). Highlight: We present the Word Mover’s Distance (WMD), a novel distance function between text documents. |
Matt Kusner; Yu Sun; Nicholas Kolkin; Kilian Weinberger; |

2015 | 11 | An Empirical Exploration Of Recurrent Network Architectures (IF:9). Highlight: In this work, we aim to determine whether the LSTM architecture is optimal or whether much better architectures exist. |
Rafal Jozefowicz; Wojciech Zaremba; Ilya Sutskever; |

2015 | 12 | Compressing Neural Networks With The Hashing Trick (IF:8). Highlight: We present a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes. |
Wenlin Chen; James Wilson; Stephen Tyree; Kilian Weinberger; Yixin Chen; |

2015 | 13 | An Embarrassingly Simple Approach To Zero-shot Learning (IF:8). Highlight: In this paper we describe a zero-shot learning approach that can be implemented in just one line of code, yet it is able to outperform state of the art approaches on standard datasets. |
Bernardino Romera-Paredes; Philip Torr; |

2015 | 14 | Weight Uncertainty In Neural Network (IF:7). Highlight: We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. |
Charles Blundell; Julien Cornebise; Koray Kavukcuoglu; Daan Wierstra; |

2015 | 15 | Online Tracking By Learning Discriminative Saliency Map With Convolutional Neural Network (IF:7). Highlight: We propose an online visual tracking algorithm by learning discriminative saliency map using Convolutional Neural Network (CNN). |
Seunghoon Hong; Tackgeun You; Suha Kwak; Bohyung Han; |

2014 | 1 | Distributed Representations Of Sentences And Documents (IF:9). Highlight: In this paper, we propose an unsupervised algorithm that learns vector representations of sentences and text documents. |
Quoc Le; Tomas Mikolov; |

2014 | 2 | DeCAF: A Deep Convolutional Activation Feature For Generic Visual Recognition (IF:9). Highlight: We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. |
JEFF DONAHUE et al. |

2014 | 3 | Stochastic Backpropagation And Approximate Inference In Deep Generative Models (IF:9). Highlight: Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. |
Danilo Jimenez Rezende; Shakir Mohamed; Daan Wierstra; |

2014 | 4 | Deterministic Policy Gradient Algorithms (IF:9). Highlight: In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. |
DAVID SILVER et al. |

2014 | 5 | Towards End-To-End Speech Recognition With Recurrent Neural Networks (IF:9). Highlight: This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. |
Alex Graves; Navdeep Jaitly; |

2014 | 6 | Recurrent Convolutional Neural Networks For Scene Labeling (IF:7). Highlight: We propose an approach that consists of a recurrent convolutional neural network which allows us to consider a large input context while limiting the capacity of the model. |
Pedro Pinheiro; Ronan Collobert; |

2014 | 7 | Neural Variational Inference And Learning In Belief Networks (IF:7). Highlight: We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior. |
Andriy Mnih; Karol Gregor; |

2014 | 8 | Stochastic Gradient Hamiltonian Monte Carlo (IF:7). Highlight: In this paper, we explore the properties of such a stochastic gradient HMC approach. |
Tianqi Chen; Emily Fox; Carlos Guestrin; |

2014 | 9 | Multimodal Neural Language Models (IF:7). Highlight: We introduce two multimodal neural language models: models of natural language that can be conditioned on other modalities. |
Ryan Kiros; Ruslan Salakhutdinov; Rich Zemel; |

2014 | 10 | Learning Character-level Representations For Part-of-Speech Tagging (IF:7). Highlight: In this paper, we propose a deep neural network that learns character-level representations of words and associates them with usual word representations to perform POS tagging. |
Cicero Dos Santos; Bianca Zadrozny; |

2014 | 11 | Fast Computation Of Wasserstein Barycenters (IF:6). Highlight: We present new algorithms to compute the mean of a set of N empirical probability measures under the optimal transport metric. |
Marco Cuturi; Arnaud Doucet; |

2014 | 12 | Accelerated Proximal Stochastic Dual Coordinate Ascent For Regularized Loss Minimization (IF:6). Highlight: We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure. |
Shai Shalev-Shwartz; Tong Zhang; |

2014 | 13 | Large-scale Multi-label Learning With Missing Labels (IF:6). Highlight: In this paper, we directly address both these problems by studying the multi-label problem in a generic empirical risk minimization (ERM) framework. |
Hsiang-Fu Yu; Prateek Jain; Purushottam Kar; Inderjit Dhillon; |

2014 | 14 | A Clockwork RNN (IF:6). Highlight: This paper introduces a simple, yet powerful modification to the simple RNN (SRN) architecture, the Clockwork RNN (CW-RNN), in which the hidden layer is partitioned into separate modules, each processing inputs at its own temporal granularity, making computations only at its prescribed clock rate. |
Jan Koutnik; Klaus Greff; Faustino Gomez; Juergen Schmidhuber; |

2014 | 15 | Communication-Efficient Distributed Optimization Using An Approximate Newton-type Method (IF:6). Highlight: We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems. |
Ohad Shamir; Nati Srebro; Tong Zhang; |

2013 | 1 | On The Difficulty Of Training Recurrent Neural Networks (IF:9). Highlight: In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. |
Razvan Pascanu; Tomas Mikolov; Yoshua Bengio; |

2013 | 2 | On The Importance Of Initialization And Momentum In Deep Learning (IF:9). Highlight: In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-Free optimization. |
Ilya Sutskever; James Martens; George Dahl; Geoffrey Hinton; |

2013 | 3 | Regularization Of Neural Networks Using DropConnect (IF:9). Highlight: We introduce DropConnect, a generalization of DropOut, for regularizing large fully-connected layers within neural networks. |
Li Wan; Matthew Zeiler; Sixin Zhang; Yann Le Cun; Rob Fergus; |

2013 | 4 | Maxout Networks (IF:9). Highlight: We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. |
Ian Goodfellow; David Warde-Farley; Mehdi Mirza; Aaron Courville; Yoshua Bengio; |

2013 | 5 | Deep Canonical Correlation Analysis (IF:8). Highlight: We introduce Deep Canonical Correlation Analysis (DCCA), a method to learn complex nonlinear transformations of two views of data such that the resulting representations are highly linearly correlated. |
Galen Andrew; Raman Arora; Jeff Bilmes; Karen Livescu; |

2013 | 6 | Making A Science Of Model Search: Hyperparameter Optimization In Hundreds Of Dimensions For Vision Architectures (IF:9). Highlight: In this work, we propose a meta-modeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace hand-tuning with a reproducible and unbiased optimization process. |
James Bergstra; Daniel Yamins; David Cox; |

2013 | 7 | Learning Fair Representations (IF:8). Highlight: We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the proportion in the population as a whole), and individual fairness (similar individuals should be treated similarly). |
Rich Zemel; Yu Wu; Kevin Swersky; Toni Pitassi; Cynthia Dwork; |

2013 | 8 | Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization (IF:7). Highlight: We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach. |
Martin Jaggi; |

2013 | 9 | Deep Learning With COTS HPC Systems (IF:7). Highlight: In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. |
ADAM COATES et al. |

2013 | 10 | Guided Policy Search (IF:7). Highlight: We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. |
Sergey Levine; Vladlen Koltun; |

2013 | 11 | Thompson Sampling For Contextual Bandits With Linear Payoffs (IF:7). Highlight: In this paper, we design and analyze Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary. |
Shipra Agrawal; Navin Goyal; |

2013 | 12 | Domain Generalization Via Invariant Feature Representation (IF:6). Highlight: We propose Domain-Invariant Component Analysis (DICA), a kernel-based optimization algorithm that learns an invariant transformation by minimizing the dissimilarity across domains, whilst preserving the functional relationship between input and output variables. |
Krikamol Muandet; David Balduzzi; Bernhard Schölkopf; |

2013 | 13 | Connecting The Dots With Landmarks: Discriminatively Learning Domain-Invariant Features For Unsupervised Domain Adaptation (IF:6). Highlight: In this paper, we propose a novel approach for learning such features. |
Boqing Gong; Kristen Grauman; Fei Sha; |

2013 | 14 | Gaussian Process Kernels For Pattern Discovery And Extrapolation (IF:6). Highlight: We introduce simple closed form kernels that can be used with Gaussian processes to discover patterns and enable extrapolation. |
Andrew Wilson; Ryan Adams; |

2013 | 15 | Stochastic Gradient Descent For Non-smooth Optimization: Convergence Results And Optimal Averaging Schemes (IF:6). Highlight: In this paper, we investigate the performance of SGD *without* such smoothness assumptions, as well as a running average scheme to convert the SGD iterates to a solution with optimal optimization accuracy. |
Ohad Shamir; Tong Zhang; |

2012 | 1 | Building High-level Features Using Large Scale Unsupervised Learning (IF:9). Highlight: We consider the challenge of building feature detectors for high-level concepts from only unlabeled data. |
QUOC LE et al. |

2012 | 2 | Conversational Speech Transcription Using Context-Dependent Deep Neural Networks (IF:8). Abstract: Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, combine the classic artificial-neural-network HMMs with traditional context-dependent acoustic modeling and … |
Dong Yu; Frank Seide; Gang Li; |

2012 | 3 | Poisoning Attacks Against Support Vector Machines (IF:8). Highlight: As we demonstrate in this contribution, an intelligent adversary can to some extent predict the change of the SVM decision function in response to malicious input and use this ability to construct malicious data points. |
Battista Biggio; Blaine Nelson; Pavel Laskov; |

2012 | 4 | Marginalized Denoising Autoencoders For Domain Adaptation (IF:7). Highlight: In this paper, we introduce a variation, marginalized SDA (mSDA). |
Minmin Chen; Zhixiang Xu; Kilian Weinberger; Fei Sha; |

2012 | 5 | Modeling Temporal Dependencies In High-Dimensional Sequences: Application To Polyphonic Music Generation And Transcription (IF:7). Highlight: We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences. |
Nicolas Boulanger-Lewandowski; Yoshua Bengio; Pascal Vincent; |

2012 | 6 | Making Gradient Descent Optimal For Strongly Convex Stochastic Optimization (IF:7). Highlight: In this paper, we investigate the optimality of SGD in a stochastic setting. |
Alexander Rakhlin; Ohad Shamir; Karthik Sridharan; |

2012 | 7 | A Fast And Simple Algorithm For Training Neural Probabilistic Language Models (IF:7). Highlight: We propose a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions. |
Andriy Mnih; Yee Whye Teh; |

2012 | 8 | High Dimensional Semiparametric Gaussian Copula Graphical Models (IF:6). Highlight: In this paper, we propose a semiparametric approach named nonparanormal SKEPTIC for efficiently and robustly estimating high dimensional undirected graphical models. |
Han Liu; Fang Han; Ming Yuan; John Lafferty; Larry Wasserman; |

2012 | 9 | Learning Task Grouping And Overlap In Multi-task Learning (IF:6). Highlight: We propose a framework for multi-task learning that enables one to selectively share the information across the tasks. |
Abhishek Kumar; Hal Daumé III; |

2012 | 10 | Fast Approximation Of Matrix Coherence And Statistical Leverage (IF:6). Highlight: Interestingly, to achieve our result we judiciously apply random projections on both sides of A. |
Michael Mahoney; Petros Drineas; Malik Magdon-Ismail; David Woodruff; |

2012 | 11 | Parallelizing Exploration-Exploitation Tradeoffs With Gaussian Process Bandit Optimization (IF:6). Highlight: We formalize the task as a multi-armed bandit problem, where the unknown payoff function is sampled from a Gaussian process (GP), and instead of a single arm, in each round we pull a batch of several arms in parallel. |
Thomas Desautels; Andreas Krause; Joel Burdick; |

2012 | 12 | Variational Bayesian Inference With Stochastic Search (IF:6). Highlight: We present an algorithm based on stochastic optimization that allows for direct optimization of the variational lower bound in all models. |
John Paisley; David Blei; Michael Jordan; |

2012 | 13 | Learning To Label Aerial Images From Noisy Data (IF:6). Highlight: We propose two robust loss functions for dealing with these kinds of label noise and use the loss functions to train a deep neural network on two challenging aerial image datasets. |
Volodymyr Mnih; Geoffrey Hinton; |

2012 | 14 | Revisiting K-means: New Algorithms Via Bayesian Nonparametrics (IF:6). Highlight: In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. |
Brian Kulis; Michael Jordan; |

2012 | 15 | A Joint Model Of Language And Perception For Grounded Attribute Learning (IF:5). Highlight: In this paper, we present an approach for joint learning of language and perception models for grounded attribute induction. |
Cynthia Matuszek; Nicholas FitzGerald; Luke Zettlemoyer; Liefeng Bo; Dieter Fox; |

2011 | 1 | Multimodal Deep LearningIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this work, we propose a novel application of deep networks to learn features over multiple modalities. |
JIQUAN NGIAM et. al. |

2011 | 2 | Domain Adaptation For Large-Scale Sentiment Classification: A Deep Learning ApproachIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose a deep learning approach which learns to extract a meaningful representation for each review in an unsupervised fashion. |
Xavier Glorot; Antoine Bordes; Yoshua Bengio; |

2011 | 3 | Bayesian Learning Via Stochastic Gradient Langevin DynamicsIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. |
Max Welling; Yee Whye Teh; |

2011 | 4 | A Three-Way Model For Collective Learning On Multi-Relational DataIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Here, we present a novel approach to relational learning based on the factorization of a three-way tensor. |
Maximilian Nickel; Volker Tresp; Hans-Peter Kriegel; |

2011 | 5 | Parsing Natural Scenes And Natural Language With Recursive Neural NetworksIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We introduce a max-margin structure prediction architecture based on recursive neural networks that can successfully recover such structure both in complex scene images as well as sentences. |
Richard Socher; Cliff Chiung-Yu Lin; Andrew Ng; Chris Manning; |

2011 | 6 | Generating Text With Recurrent Neural NetworksIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we demonstrate the power of RNNs trained with the new Hessian-Free optimizer (HF) by applying them to character-level language modeling tasks. |
Ilya Sutskever; James Martens; Geoffrey Hinton; |

2011 | 7 | Contractive Auto-Encoders: Explicit Invariance During Feature ExtractionIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present in this paper a novel approach for training deterministic auto-encoders. |
Salah RIFAI; Pascal Vincent; Xavier Muller; Xavier Glorot; Yoshua Bengio; |

2011 | 8 | PILCO: A Model-Based And Data-Efficient Approach To Policy SearchIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. |
Marc Deisenroth; Carl Rasmussen; |

2011 | 9 | Hashing With GraphsIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we propose a novel graph-based hashing method which automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes. |
Wei Liu; Jun Wang; Sanjiv Kumar; Shih-Fu Chang; |

2011 | 10 | On Optimization Methods For Deep LearningIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we show that more sophisticated off-the-shelf optimization methods such as Limited memory BFGS (L-BFGS) and Conjugate gradient (CG) with linesearch can significantly simplify and speed up the process of pretraining deep algorithms. |
QUOC LE et. al. |

2011 | 11 | Minimal Loss Hashing For Compact Binary CodesIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose a method for learning similarity-preserving hash functions that map high-dimensional data onto binary codes. |
Mohammad Norouzi; David Fleet; |

2011 | 12 | A Co-training Approach For Multi-view Spectral ClusteringIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose a spectral clustering algorithm for the multi-view setting where we have access to multiple views of the data, each of which can be independently used for clustering. |
Abhishek Kumar; Hal Daume III; University of Maryland; |

2011 | 13 | The Importance Of Encoding Versus Training With Sparse Coding And Vector QuantizationIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this work, we investigate the reasons for the success of sparse coding over VQ by decoupling these phases, allowing us to separate out the contributions of the training and encoding in a controlled way. |
Adam Coates; Andrew Ng; |

2011 | 14 | Learning Recurrent Neural Networks With Hessian-Free OptimizationIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. |
James Martens; Ilya Sutskever; |

2011 | 15 | Doubly Robust Policy Evaluation And LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this work, we leverage the strength and overcome the weaknesses of the two approaches by applying the *doubly robust* technique to the problems of policy evaluation and optimization. |
Miroslav Dudik; John Langford; Lihong Li; |

2010 | 1 | Rectified Linear Units Improve Restricted Boltzmann MachinesIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsAbstract: Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all … |
Vinod Nair; Geoffrey Hinton; |

2010 | 2 | 3D Convolutional Neural Networks For Human Action RecognitionIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we develop a novel 3D CNN model for action recognition. |
Shuiwang Ji; Wei Xu; Ming Yang; Kai Yu; |

2010 | 3 | Gaussian Process Optimization In The Bandit Setting: No Regret And Experimental DesignIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. |
Niranjan Srinivas; Andreas Krause; Sham Kakade; Matthias Seeger; |

2010 | 4 | Robust Subspace Segmentation By Low-Rank RepresentationIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose low-rank representation(LRR) to segment data drawn from a union of multiple linear (or affine) subspaces. |
Guangcan Liu; Zhouchen Lin; Yong Yu; |

2010 | 5 | Learning Fast Approximations Of Sparse CodingIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We proposed two versions of a very fast algorithm that produces approximate estimates of the sparse code that can be used to compute good visual features, or to initialize exact iterative algorithms. |
Karol Gregor; Yann LeCun; |

2010 | 6 | Deep Learning Via Hessian-free OptimizationIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We develop a 2nd-order optimization method based on the “Hessian-free” approach, and apply it to training deep auto-encoders. |
James Martens; |

2010 | 7 | Large Graph Construction For Scalable Semi-Supervised LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we address the scalability issue plaguing graph-based semi-supervised learning via a small number of anchor points which adequately cover the entire point cloud. |
Wei Liu; Junfeng He; Shih-Fu Chang; |

2010 | 8 | Estimation Of (near) Low-rank Matrices With Noise And High-dimensional ScalingIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We study an instance of high-dimensional statistical inference in which the goal is to use $N$ noisy observations to estimate a matrix $\Theta^* \in \mathbb{R}^{k \times p}$ that is assumed to be either exactly low rank, or "near" low-rank, meaning that it can be well-approximated by a matrix with low rank. |
Sahand Negahban; Martin Wainwright; |

2010 | 9 | Tree-Guided Group Lasso For Multi-Task Regression With Structured SparsityIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Our goal is to recover the common set of relevant inputs for each output cluster. |
Seyoung Kim; Eric Xing; |

2010 | 10 | Bayes Optimal Multilabel Classification Via Probabilistic Classifier ChainsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: The goal of this paper is to elaborate on this postulate in a critical way. |
Krzysztof Dembczynski; Weiwei Cheng; Eyke Huellermeier; |

2010 | 11 | Application Of Machine Learning To Epileptic Seizure DetectionIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present and evaluate a machine learning approach to constructing patient-specific classifiers that detect the onset of an epileptic seizure through analysis of the scalp EEG, a non-invasive measure of the brain’s electrical activity. |
Ali Shoeb; John Guttag; |

2010 | 12 | Proximal Methods For Sparse Hierarchical Dictionary LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose to combine two approaches for modeling data admitting sparse representations: on the one hand, dictionary learning has proven effective for various signal processing tasks. |
Rodolphe Jenatton; Julien Mairal; Guillaume Obozinski; Francis Bach; |

2010 | 13 | Distance Dependent Chinese Restaurant ProcessesIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. |
David Blei; Peter Frazier; |

2010 | 14 | Sequential Projection Learning For Hashing With Compact CodesIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we propose a novel data-dependent projection learning method such that each hash function is designed to correct the errors made by the previous one sequentially. |
Jun Wang; Sanjiv Kumar; Shih-Fu Chang; |

2010 | 15 | Metric Learning To RankIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present a general metric learning algorithm, based on the structural SVM framework, to learn a metric such that rankings of data induced by distance from a query can be optimized against various ranking measures, such as AUC, Precision-at-k, MRR, MAP or NDCG. |
Brian McFee; Gert Lanckriet; |

2009 | 1 | Curriculum LearningIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Here, we formalize such training strategies in the context of machine learning, and call them "curriculum learning". |
Yoshua Bengio; Jérôme Louradour; Ronan Collobert; Jason Weston; |

2009 | 2 | Online Dictionary Learning For Sparse CodingIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper proposes a new online optimization algorithm for dictionary learning, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples. |
Julien Mairal; Francis Bach; Jean Ponce; Guillermo Sapiro; |

2009 | 3 | Group Lasso With Overlap And Graph LassoIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose a new penalty function which, when used as regularization for empirical risk minimization procedures, leads to sparse estimators. |
Laurent Jacob; Guillaume Obozinski; Jean-Philippe Vert; |

2009 | 4 | Learning Structural SVMs With Latent VariablesIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present a large-margin formulation and algorithm for structured output prediction that allows the use of latent variables. |
Chun-Nam John Yu; Thorsten Joachims; |

2009 | 5 | Multi-view Clustering Via Canonical Correlation AnalysisIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA). |
Kamalika Chaudhuri; Sham M. Kakade; Karen Livescu; Karthik Sridharan; |

2009 | 6 | Large-scale Deep Unsupervised Learning Using Graphics ProcessorsIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we suggest massively parallel methods to help resolve these problems. |
Rajat Raina; Anand Madhavan; Andrew Y. Ng; |

2009 | 7 | Information Theoretic Measures For Clusterings Comparison: Is A Correction For Chance Necessary?IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we discuss the necessity of correction for chance for information theoretic based measures for clusterings comparison. |
Nguyen Xuan Vinh; Julien Epps; James Bailey; |

2009 | 8 | An Accelerated Gradient Method For Trace Norm MinimizationIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we exploit the special structure of the trace norm, based on which we propose an extended gradient algorithm that converges as O(1/k). |
Shuiwang Ji; Jieping Ye; |

2009 | 9 | Identifying Suspicious URLs: An Application Of Large-scale Online LearningIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper explores online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. |
Justin Ma; Lawrence K. Saul; Stefan Savage; Geoffrey M. Voelker; |

2009 | 10 | Fast Gradient-descent Methods For Temporal-difference Learning With Linear Function ApproximationIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we introduce two new related algorithms with better convergence rates. |
RICHARD S. SUTTON et. al. |

2009 | 11 | Learning With Structured SparsityIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper investigates a new learning formulation called structured sparsity, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. |
Junzhou Huang; Tong Zhang; Dimitris Metaxas; |

2009 | 12 | More Generality In Efficient Multiple Kernel LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we observe that existing MKL formulations can be extended to learn general kernel combinations subject to general regularization. |
Manik Varma; Bodla Rakesh Babu; |

2009 | 13 | Incorporating Domain Knowledge Into Topic Modeling Via Dirichlet Forest PriorsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present its construction, and inference via collapsed Gibbs sampling. |
David Andrzejewski; Xiaojin Zhu; Mark Craven; |

2009 | 14 | Factored Conditional Restricted Boltzmann Machines For Modeling Motion StyleIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present a new model, based on the CRBM that preserves its most important computational properties and includes multiplicative three-way interactions that allow the effective interaction weight between two units to be modulated by the dynamic state of a third unit. |
Graham W. Taylor; Geoffrey E. Hinton; |

2009 | 15 | Multi-instance Learning By Treating Instances As Non-I.I.D. SamplesIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we propose two simple yet effective methods. |
Zhi-Hua Zhou; Yu-Yin Sun; Yu-Feng Li; |

2008 | 1 | Extracting And Composing Robust Features With Denoising AutoencodersIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We introduce and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern. |
Pascal Vincent; Hugo Larochelle; Yoshua Bengio; Pierre-Antoine Manzagol; |

2008 | 2 | A Unified Architecture For Natural Language Processing: Deep Neural Networks With Multitask LearningIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. |
Ronan Collobert; Jason Weston; |

2008 | 3 | Bayesian Probabilistic Matrix Factorization Using Markov Chain Monte CarloIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we present a fully Bayesian treatment of the Probabilistic Matrix Factorization (PMF) model in which model capacity is controlled automatically by integrating over all model parameters and hyperparameters. |
Ruslan Salakhutdinov; Andriy Mnih; |

2008 | 4 | A Dual Coordinate Descent Method For Large-scale Linear SVMIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper presents a novel dual coordinate descent method for linear SVM with L1-and L2-loss functions. |
Cho-Jui Hsieh; Kai-Wei Chang; Chih-Jen Lin; S. Sathiya Keerthi; S. Sundararajan; |

2008 | 5 | Training Restricted Boltzmann Machines Using Approximations To The Likelihood GradientIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: A new algorithm for training Restricted Boltzmann Machines is introduced. |
Tijmen Tieleman; |

2008 | 6 | Classification Using Discriminative Restricted Boltzmann MachinesIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we argue that RBMs provide a self-contained framework for deriving competitive non-linear classifiers. |
Hugo Larochelle; Yoshua Bengio; |

2008 | 7 | Listwise Approach To Learning To Rank: Theory And AlgorithmIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper aims to conduct a study on the listwise approach to learning to rank. |
Fen Xia; Tie-Yan Liu; Jue Wang; Wensheng Zhang; Hang Li; |

2008 | 8 | Deep Learning Via Semi-supervised EmbeddingIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We show how nonlinear embedding algorithms popular for use with shallow semi-supervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the architecture. |
Jason Weston; Frédéric Ratle; Ronan Collobert; |

2008 | 9 | Grassmann Discriminant Analysis: A Unifying View On Subspace-based LearningIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we propose a discriminant learning framework for problems in which data consist of linear subspaces instead of vectors. |
Jihun Hamm; Daniel D. Lee; |

2008 | 10 | An Empirical Evaluation Of Supervised Learning In High DimensionsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we perform an empirical evaluation of supervised learning on high-dimensional data. |
Rich Caruana; Nikos Karampatziakis; Ainur Yessenalina; |

2008 | 11 | Learning Diverse Rankings With Multi-armed BanditsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present two online learning algorithms that directly learn a diverse ranking of documents based on users’ clicking behavior. |
Filip Radlinski; Robert Kleinberg; Thorsten Joachims; |

2008 | 12 | On The Quantitative Analysis Of Deep Belief NetworksIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We show that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and we present a novel AIS scheme for comparing RBM’s with different architectures. |
Ruslan Salakhutdinov; Iain Murray; |

2008 | 13 | Hierarchical Sampling For Active LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present an active learning scheme that exploits cluster structure in data. |
Sanjoy Dasgupta; Daniel Hsu; |

2008 | 14 | Confidence-weighted Linear ClassificationIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. |
Mark Dredze; Koby Crammer; Fernando Pereira; |

2008 | 15 | Fast Support Vector Machine Training And Classification On Graphics ProcessorsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We describe a solver for Support Vector Machine training running on a GPU, using the Sequential Minimal Optimization algorithm and an adaptive first and second order working set selection heuristic, which achieves speedups of 9-35x over LIBSVM running on a traditional processor. |
Bryan Catanzaro; Narayanan Sundaram; Kurt Keutzer; |

2007 | 1 | Pegasos: Primal Estimated Sub-GrAdient SOlver For SVMIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We describe and analyze a simple and effective iterative algorithm for solving the optimization problem cast by Support Vector Machines (SVM). |
Shai Shalev-Shwartz; Yoram Singer; Nathan Srebro; |

2007 | 2 | Information-theoretic Metric LearningIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. |
Jason V. Davis; Brian Kulis; Prateek Jain; Suvrit Sra; Inderjit S. Dhillon; |

2007 | 3 | Restricted Boltzmann Machines For Collaborative FilteringIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBM’s), can be used to model tabular data, such as users’ ratings of movies. |
Ruslan Salakhutdinov; Andriy Mnih; Geoffrey Hinton; |

2007 | 4 | Learning To Rank: From Pairwise Approach To Listwise ApproachIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: The paper proposes a new probabilistic method for the approach. |
Zhe Cao; Tao Qin; Tie-Yan Liu; Ming-Feng Tsai; Hang Li; |

2007 | 5 | Self-taught Learning: Transfer Learning From Unlabeled DataIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We describe an approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data. |
Rajat Raina; Alexis Battle; Honglak Lee; Benjamin Packer; Andrew Y. Ng; |

2007 | 6 | Boosting For Transfer LearningIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we present a novel transfer learning framework called TrAdaBoost, which extends boosting-based learning algorithms (Freund & Schapire, 1997). |
Wenyuan Dai; Qiang Yang; Gui-Rong Xue; Yong Yu; |

2007 | 7 | An Empirical Evaluation Of Deep Architectures On Problems With Many Factors Of VariationIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Recently, several learning algorithms relying on models with deep architectures have been proposed. |
Hugo Larochelle; Dumitru Erhan; Aaron Courville; James Bergstra; Yoshua Bengio; |

2007 | 8 | Spectral Feature Selection For Supervised And Unsupervised LearningIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This work exploits intrinsic properties underlying supervised and unsupervised feature selection algorithms, and proposes a unified framework for feature selection based on spectral graph theory. |
Zheng Zhao; Huan Liu; |

2007 | 9 | Experimental Perspectives On Learning From Imbalanced DataIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We address these and other issues in this work, showing that sampling in many cases will improve classifier performance. |
Jason Van Hulse; Taghi M. Khoshgoftaar; Amri Napolitano; |

2007 | 10 | Three New Graphical Models For Statistical Language ModellingIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose three new probabilistic language models that define the distribution of the next word in a sequence given several preceding words by using distributed representations of those words. |
Andriy Mnih; Geoffrey Hinton; |

2007 | 11 | Combining Online And Offline Knowledge In UCTIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We consider three approaches for combining offline and online value functions in the UCT algorithm. |
Sylvain Gelly; David Silver; |

2007 | 12 | Spectral Clustering And Transductive Learning With Multiple ViewsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We consider spectral clustering and transductive inference for data with multiple views. |
Dengyong Zhou; Christopher J. C. Burges; |

2007 | 13 | Discriminative Learning For Differing Training And Test DistributionsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We address classification problems for which the training instances are governed by a distribution that is allowed to differ arbitrarily from the test distribution—problems also referred to as classification under covariate shift. |
Steffen Bickel; Michael Brückner; Tobias Scheffer; |

2007 | 14 | Uncovering Shared Structures In Multiclass ClassificationIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper suggests a method for multiclass learning with many classes by simultaneously learning shared characteristics common to the classes, and predictors for the classes in terms of these characteristics. |
Yonatan Amit; Michael Fink; Nathan Srebro; Shimon Ullman; |

2007 | 15 | More Efficiency In Multiple Kernel LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we address the MKL problem through an adaptive 2-norm regularization formulation. |
Alain Rakotomamonjy; Francis Bach; Stéphane Canu; Yves Grandvalet; |

2006 | 1 | The Relationship Between Precision-Recall And ROC CurvesIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. |
Jesse Davis; Mark Goadrich; |

2006 | 2 | Connectionist Temporal Classification: Labelling Unsegmented Sequence Data With Recurrent Neural NetworksIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. |
Alex Graves; Santiago Fernández; Faustino Gomez; Jürgen Schmidhuber; |

2006 | 3 | Dynamic Topic ModelsIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. |
David M. Blei; John D. Lafferty; |

2006 | 4 | An Empirical Comparison Of Supervised Learning AlgorithmsIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present a large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. |
Rich Caruana; Alexandru Niculescu-Mizil; |

2006 | 5 | Topic Modeling: Beyond Bag-of-wordsIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this work, I explore a hierarchical generative probabilistic model that incorporates both n-gram statistics and latent topic variables by extending a unigram topic model to include properties of a hierarchical Dirichlet bigram language model. |
Hanna M. Wallach; |

2006 | 6 | Cover Trees For Nearest NeighborIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces (where the data set consists of n points). |
Alina Beygelzimer; Sham Kakade; John Langford; |

2006 | 7 | Pachinko Allocation: DAG-structured Mixture Models Of Topic CorrelationsIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). |
Wei Li; Andrew McCallum; |

2006 | 8 | Fast Time Series Classification Using Numerosity ReductionIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this work, we propose an additional technique, numerosity reduction, to speed up one-nearest-neighbor DTW. |
Xiaopeng Xi; Eamonn Keogh; Christian Shelton; Li Wei; Chotirat Ann Ratanamahatana; |

2006 | 9 | Maximum Margin PlanningIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this approach, we learn mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert’s behavior. |
Nathan D. Ratliff; J. Andrew Bagnell; Martin A. Zinkevich; |

2006 | 10 | Probabilistic Inference For Solving Discrete And Continuous State Markov Decision ProcessesIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Here we present an Expectation Maximization algorithm for computing optimal policies. |
Marc Toussaint; Amos Storkey; |

2006 | 11 | Label Propagation Through Linear NeighborhoodsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: A novel semi-supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood. |
Fei Wang; Changshui Zhang; |

2006 | 12 | Agnostic Active LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We state and analyze the first active learning algorithm which works in the presence of arbitrary forms of noise. |
Maria-Florina Balcan; Alina Beygelzimer; John Langford; |

2006 | 13 | Batch Mode Active Learning And Its Application To Medical Image ClassificationIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we present a framework for "batch mode active learning" that applies the Fisher information matrix to select a number of informative examples simultaneously. |
Steven C. H. Hoi; Rong Jin; Jianke Zhu; Michael R. Lyu; |

2006 | 14 | PAC Model-free Reinforcement LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: For a Markov Decision Process with finite state (size S) and action spaces (size A per state), we propose a new algorithm—Delayed Q-Learning. |
Alexander L. Strehl; Lihong Li; Eric Wiewiora; John Langford; Michael L. Littman; |

2006 | 15 | Accelerated Training Of Conditional Random Fields With Stochastic Gradient MethodsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). |
S. V. N. Vishwanathan; Nicol N. Schraudolph; Mark W. Schmidt; Kevin P. Murphy; |

2005 | 1 | Learning To Rank Using Gradient DescentIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. |
CHRIS BURGES et. al. |

2005 | 2 | Fast Maximum Margin Matrix Factorization For Collaborative PredictionIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Here, we investigate a direct gradient-based optimization method for MMMF and demonstrate it on large collaborative prediction problems. |
Jasson D. M. Rennie; Nathan Srebro; |

2005 | 3 | Predicting Good Probabilities With Supervised LearningIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We examine the relationship between the predictions made by different learning algorithms and true posterior probabilities. |
Alexandru Niculescu-Mizil; Rich Caruana; |

2005 | 4 | A Support Vector Method For Multivariate Performance MeasuresIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1-score. |
Thorsten Joachims; |

2005 | 5 | Comparing Clusterings: An Axiomatic ViewIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsAbstract: This paper views clusterings as elements of a lattice. Distances between clusterings are analyzed in their relationship to the lattice. From this vantage point, we first give an … |
Marina Meilă; |

2005 | 6 | Learning Structured Prediction Models: A Large Margin ApproachIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Our goal is to learn parameters such that inference using the model reproduces correct answers on the training data. |
Ben Taskar; Vassil Chatalbashev; Daphne Koller; Carlos Guestrin; |

2005 | 7 | Non-negative Tensor Factorization With Applications To Statistics And Computer VisionIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We derive algorithms for finding a non-negative n-dimensional tensor factorization (n-NTF) which includes the non-negative matrix factorization (NMF) as a particular case when n = 2. |
Amnon Shashua; Tamir Hazan; |

2005 | 8 | Near-optimal Sensor Placements In Gaussian ProcessesIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose a mutual information criterion, and show that it produces better placements. |
Carlos Guestrin; Andreas Krause; Ajit Paul Singh; |

2005 | 9 | Beyond The Point Cloud: From Transductive To Semi-supervised LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we show how to turn transductive and standard supervised learning algorithms into semi-supervised learners. |
Vikas Sindhwani; Partha Niyogi; Mikhail Belkin; |

2005 | 10 | High Speed Obstacle Avoidance Using Monocular Vision And Reinforcement LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present an approach in which supervised learning is first used to estimate depths from single monocular images. |
Jeff Michels; Ashutosh Saxena; Andrew Y. Ng; |

2005 | 11 | Learning From Labeled And Unlabeled Data On A Directed Graph (IF:6). Highlight: We propose a general framework for learning from labeled and unlabeled data on a directed graph in which the structure of the graph including the directionality of the edges is considered. |
Dengyong Zhou; Jiayuan Huang; Bernhard Schölkopf; |

2005 | 12 | Learning Gaussian Processes From Multiple Tasks (IF:6). Highlight: We consider the problem of multi-task learning, that is, learning multiple related functions. |
Kai Yu; Volker Tresp; Anton Schwaighofer; |

2005 | 13 | Reinforcement Learning With Gaussian Processes (IF:6). Highlight: We present a new generative model for the value function, deduced from its relation with the discounted return. |
Yaakov Engel; Shie Mannor; Ron Meir; |

2005 | 14 | Bayesian Hierarchical Clustering (IF:6). Highlight: We present a novel algorithm for agglomerative hierarchical clustering based on evaluating marginal likelihoods of a probabilistic model. |
Katherine A. Heller; Zoubin Ghahramani; |

2005 | 15 | Learning The Structure Of Markov Logic Networks (IF:6). Highlight: In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. |
Stanley Kok; Pedro Domingos; |

2004 | 1 | Apprenticeship Learning Via Inverse Reinforcement Learning (IF:9). Highlight: We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. |
Pieter Abbeel; Andrew Y. Ng; |

2004 | 2 | A Maximum Entropy Approach To Species Distribution Modeling (IF:8). Highlight: We propose the use of maximum-entropy techniques for this problem, specifically, sequential-update algorithms that can handle a very large number of features. |
Steven J. Phillips; Miroslav Dudík; Robert E. Schapire; |

2004 | 3 | Multiple Kernel Learning, Conic Duality, And The SMO Algorithm (IF:8). Highlight: We propose a novel dual formulation of the QCQP as a second-order cone programming problem, and show how to exploit the technique of Moreau-Yosida regularization to yield a formulation to which SMO techniques can be applied. |
Francis R. Bach; Gert R. G. Lanckriet; Michael I. Jordan; |

2004 | 4 | Support Vector Machine Learning For Interdependent And Structured Output Spaces (IF:9). Highlight: We propose to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs. |
Ioannis Tsochantaridis; Thomas Hofmann; Thorsten Joachims; Yasemin Altun; |

2004 | 5 | K-means Clustering Via Principal Component Analysis (IF:9). Highlight: Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. |
Chris Ding; Xiaofeng He; |
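The highlight's result suggests a practical recipe: project the centered data onto its top k-1 principal components and cluster in that subspace. The sketch below illustrates that relaxation with a plain Lloyd iteration and a deterministic farthest-point initialization (the init is our simplification, not part of the paper):

```python
import numpy as np

def pca_relaxed_kmeans(X, k, iters=50):
    """Cluster in the span of the top k-1 principal components, the continuous
    relaxation of the K-means cluster-membership indicators."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k - 1].T                      # project onto k-1 leading PCs
    # deterministic farthest-point initialization
    centers = [Z[0]]
    for _ in range(k - 1):
        d = np.min([((Z - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    # plain Lloyd's algorithm in the reduced space
    for _ in range(iters):
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```

For well-separated clusters the leading principal subspace preserves the between-cluster structure, so Lloyd's algorithm in k-1 dimensions recovers the same partition as K-means in the full space.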

2004 | 6 | Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms (IF:8). Highlight: In this paper, we study stochastic gradient descent (SGD) algorithms on regularized forms of linear prediction methods. |
Tong Zhang; |
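A regularized linear prediction method of the kind the highlight refers to can be sketched as SGD on an L2-regularized logistic loss; this is a generic illustration of the setting, not the paper's specific analysis (step sizes and constants here are arbitrary choices):

```python
import numpy as np

def sgd_logistic(X, y, lam=1e-3, epochs=20, eta=0.5, seed=0):
    """Plain SGD on L2-regularized logistic loss; labels y are in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w)
            # gradient of log(1 + exp(-margin)) plus the L2 penalty term
            grad = -y[i] * X[i] / (1.0 + np.exp(margin)) + lam * w
            w -= eta * grad
    return w
```

Each step touches a single example, so the cost per update is independent of the dataset size, which is what makes SGD attractive for large-scale linear prediction.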

2004 | 7 | Integrating Constraints And Metric Learning In Semi-supervised Clustering (IF:8). Highlight: This paper provides new methods for the two approaches as well as presents a new semi-supervised clustering algorithm that integrates both of these techniques in a uniform, principled framework. |
Mikhail Bilenko; Sugato Basu; Raymond J. Mooney; |

2004 | 8 | Dynamic Conditional Random Fields: Factorized Probabilistic Models For Labeling And Segmenting Sequence Data (IF:8). Highlight: We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges, a distributed state representation as in dynamic Bayesian networks (DBNs), and parameters are tied across slices. |
Charles Sutton; Khashayar Rohanimanesh; Andrew McCallum; |

2004 | 9 | Learning And Evaluating Classifiers Under Sample Selection Bias (IF:8). Highlight: In this paper, we formalize the sample selection bias problem in machine learning terms and study analytically and experimentally how a number of well-known classifier learning methods are affected by it. |
Bianca Zadrozny; |

2004 | 10 | Ensemble Selection From Libraries Of Models (IF:7). Highlight: We present a method for constructing ensembles from libraries of thousands of models. |
Rich Caruana; Alexandru Niculescu-Mizil; Geoff Crew; Alex Ksikes; |

2004 | 11 | A Kernel View Of The Dimensionality Reduction Of Manifolds (IF:7). Highlight: We show how all three algorithms can be described as kernel PCA on specially constructed Gram matrices, and illustrate the similarities and differences between the algorithms with representative examples. |
Jihun Ham; Daniel D. Lee; Sebastian Mika; Bernhard Schölkopf; |
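The unifying object in this view is kernel PCA itself: center a Gram matrix and take its leading eigenvectors. A minimal sketch with an RBF kernel (the manifold-learning algorithms in the paper substitute their own specially constructed Gram matrices into the same procedure):

```python
import numpy as np

def kernel_pca(X, n_components, gamma=1.0):
    """Kernel PCA: double-center the Gram matrix and return its leading
    eigenvectors scaled by the square roots of their eigenvalues."""
    sq = ((X[:, None] - X[None]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                 # RBF Gram matrix
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                          # double-centering in feature space
    vals, vecs = np.linalg.eigh(Kc)         # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))
```

Replacing the RBF Gram matrix with a geodesic-distance-based one recovers an Isomap-style embedding, which is the sense in which these algorithms are all kernel PCA.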

2004 | 12 | Learning A Kernel Matrix For Nonlinear Dimensionality Reduction (IF:7). Highlight: We investigate how to learn a kernel matrix for high dimensional data that lies on or near a low dimensional manifold. |
Kilian Q. Weinberger; Fei Sha; Lawrence K. Saul; |

2004 | 13 | Active Learning Using Pre-clustering (IF:7). Highlight: The main contribution of the paper is a formal framework that incorporates clustering into active learning. |
Hieu T. Nguyen; Arnold Smeulders; |

2004 | 14 | Solving Cluster Ensemble Problems By Bipartite Graph Partitioning (IF:7). Highlight: We introduce a new reduction method that constructs a bipartite graph from a given cluster ensemble. |
Xiaoli Zhang Fern; Carla E. Brodley; |

2004 | 15 | Margin Based Feature Selection – Theory And Algorithms (IF:6). Highlight: In this paper we introduce a margin based feature selection criterion and apply it to measure the quality of sets of features. |
Ran Gilad-Bachrach; Amir Navot; Naftali Tishby; |