# Most Influential ICML Papers (2022-02)

The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. The Paper Digest team analyzes all papers published at ICML in past years and presents the 15 most influential papers for each year. This ranking is constructed automatically from citations in both research papers and granted patents, and is updated frequently to reflect recent changes. To find the most influential papers from other conferences and journals, visit the Best Paper Digest page. Note: the most influential papers may or may not include the papers that won best paper awards. **(Version: 2022-02)**

Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users around the world with a number of exclusive services for ranking, search, tracking, and literature review.

If you do not want to miss any interesting academic paper, you are welcome to **sign up for our free daily paper digest service** to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to receive new conference digests.

Paper Digest Team

New York City, New York, 10017

team@paperdigest.org

#### TABLE 1: Most Influential ICML Papers (2022-02)

Year | Rank | Paper | Author(s) |
---|---|---|---|
2021 | 1 | Training Data-efficient Image Transformers & Distillation Through Attention (IF: 7). Highlight: In this work, we produce competitive convolution-free transformers trained on ImageNet only using a single computer in less than 3 days. | HUGO TOUVRON et al. |
2021 | 2 | Learning Transferable Visual Models From Natural Language Supervision (IF: 7). Highlight: We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. | ALEC RADFORD et al. |
2021 | 3 | Zero-Shot Text-to-Image Generation (IF: 5). Highlight: We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. | ADITYA RAMESH et al. |
2021 | 4 | Barlow Twins: Self-Supervised Learning Via Redundancy Reduction (IF: 5). Highlight: We propose an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. | Jure Zbontar; Li Jing; Ishan Misra; Yann LeCun; Stephane Deny |
2021 | 5 | Is Space-Time Attention All You Need for Video Understanding? (IF: 4). Highlight: We present a convolution-free approach to video classification built exclusively on self-attention over space and time. | Gedas Bertasius; Heng Wang; Lorenzo Torresani |
2021 | 6 | WILDS: A Benchmark of In-the-Wild Distribution Shifts (IF: 4). Highlight: To address this gap, we present WILDS, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time and location in satellite imaging and poverty mapping. | PANG WEI KOH et al. |
2021 | 7 | Out-of-Distribution Generalization Via Risk Extrapolation (REx) (IF: 4). Highlight: We motivate this approach, Risk Extrapolation (REx), as a form of robust optimization over a perturbation set of extrapolated domains (MM-REx), and propose a penalty on the variance of training risks (V-REx) as a simpler variant. | DAVID KRUEGER et al. |
2021 | 8 | Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision (IF: 4). Highlight: In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps in the Conceptual Captions dataset. | CHAO JIA et al. |
2021 | 9 | EfficientNetV2: Smaller Models and Faster Training (IF: 3). Highlight: This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models. | Mingxing Tan; Quoc Le |
2021 | 10 | High-Performance Large-Scale Image Recognition Without Normalization (IF: 3). Highlight: In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets. | Andy Brock; Soham De; Samuel L Smith; Karen Simonyan |
2021 | 11 | Decoupling Representation Learning from Reinforcement Learning (IF: 3). Highlight: In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. | Adam Stooke; Kimin Lee; Pieter Abbeel; Michael Laskin |
2021 | 12 | ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision (IF: 3). Highlight: In this paper, we present a minimal VLP model, Vision-and-Language Transformer (ViLT), monolithic in the sense that the processing of visual inputs is drastically simplified to just the same convolution-free manner that we process textual inputs. | Wonjae Kim; Bokyung Son; Ildoo Kim |
2021 | 13 | Perceiver: General Perception with Iterative Attention (IF: 3). Highlight: In this paper we introduce the Perceiver – a model that builds upon Transformers and hence makes few architectural assumptions about the relationship between its inputs, but that also scales to hundreds of thousands of inputs, like ConvNets. | ANDREW JAEGLE et al. |
2021 | 14 | Improved Denoising Diffusion Probabilistic Models (IF: 3). Highlight: We show that with a few simple modifications, DDPMs can also achieve competitive log-likelihoods while maintaining high sample quality. | Alexander Quinn Nichol; Prafulla Dhariwal |
2021 | 15 | E(n) Equivariant Graph Neural Networks (IF: 3). Highlight: This paper introduces a new model to learn graph neural networks equivariant to rotations, translations, reflections and permutations, called E(n)-Equivariant Graph Neural Networks (EGNNs). | Víctor Garcia Satorras; Emiel Hoogeboom; Max Welling |
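
To make the Barlow Twins entry above concrete, here is a minimal NumPy sketch of the objective it describes: push the cross-correlation matrix of two embedded views toward the identity. The batch size, embedding dimension, and `lam` weight are illustrative choices, not the paper's settings.

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Diagonal of the cross-correlation -> 1 (invariance),
    off-diagonal -> 0 (redundancy reduction)."""
    # Standardize each embedding dimension over the batch.
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + 1e-8)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + 1e-8)
    n = z1.shape[0]
    c = z1.T @ z2 / n                                  # (dim, dim) cross-correlation
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(64, 8))                           # toy batch of embeddings
identical_views = barlow_twins_loss(z, z)              # aligned views: near-zero loss
noisy_views = barlow_twins_loss(z, z + rng.normal(size=(64, 8)))
```

Perfectly aligned views yield a near-zero loss, while distorted views decorrelate the diagonal and raise it, which is the collapse-avoiding pressure the highlight refers to.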

Year | Rank | Paper | Author(s) |
---|---|---|---|
2020 | 1 | A Simple Framework For Contrastive Learning Of Visual Representations (IF: 8). Highlight: This paper presents a simple framework for contrastive representation learning. | Ting Chen; Simon Kornblith; Mohammad Norouzi; Geoffrey Hinton |
2020 | 2 | Data-Efficient Image Recognition With Contrastive Predictive Coding (IF: 7). Highlight: We therefore revisit and improve Contrastive Predictive Coding, an unsupervised objective for learning such representations. | Olivier Henaff |
2020 | 3 | PEGASUS: Pre-training With Extracted Gap-sentences For Abstractive Summarization (IF: 6). Highlight: In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. | Jingqing Zhang; Yao Zhao; Mohammad Saleh; Peter Liu |
2020 | 4 | Generative Pretraining From Pixels (IF: 6). Highlight: Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. | MARK CHEN et al. |
2020 | 5 | Reliable Evaluation Of Adversarial Robustness With An Ensemble Of Diverse Parameter-free Attacks (IF: 5). Highlight: In this paper we first propose two extensions of the PGD attack overcoming failures due to suboptimal step size and problems of the objective function. We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness. | Francesco Croce; Matthias Hein |
2020 | 6 | SCAFFOLD: Stochastic Controlled Averaging For Federated Learning (IF: 5). Highlight: As a solution, we propose a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the "client drift". | SAI PRANEETH REDDY KARIMIREDDY et al. |
2020 | 7 | Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention (IF: 5). Highlight: To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from $O(N^2)$ to $O(N)$, where $N$ is the sequence length. | Angelos Katharopoulos; Apoorv Vyas; Nikolaos Pappas; Francois Fleuret |
2020 | 8 | Simple And Deep Graph Convolutional Networks (IF: 4). Highlight: In this paper, we study the problem of designing and analyzing deep graph convolutional networks. | Ming Chen; Zhewei Wei; Zengfeng Huang; Bolin Ding; Yaliang Li |
2020 | 9 | Overfitting In Adversarially Robust Deep Learning (IF: 4). Highlight: In this paper, we empirically study this phenomenon in the setting of adversarially trained deep networks, which are trained to minimize the loss under worst-case adversarial perturbations. | Eric Wong; Leslie Rice; Zico Kolter |
2020 | 10 | Agent57: Outperforming The Atari Human Benchmark (IF: 4). Highlight: We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games. | ADRIÀ PUIGDOMENECH BADIA et al. |
2020 | 11 | Contrastive Multi-View Representation Learning On Graphs (IF: 4). Highlight: We introduce a self-supervised approach for learning node and graph level representations by contrasting structural views of graphs. | Kaveh Hassani; Amir Hosein Khasahmadi |
2020 | 12 | On Gradient Descent Ascent For Nonconvex-Concave Minimax Problems (IF: 4). Highlight: In this paper, we present the complexity results on two-time-scale GDA for solving nonconvex-concave minimax problems, showing that the algorithm can find a stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in \mathcal{Y}} f(\cdot, \mathbf{y})$ efficiently. | Tianyi Lin; Chi Jin; Michael Jordan |
2020 | 13 | Do We Really Need To Access The Source Data? Source Hypothesis Transfer For Unsupervised Domain Adaptation (IF: 4). Highlight: In this work we tackle a novel setting where only a trained source model is available and investigate how we can effectively utilize such a model without source data to solve UDA problems. | Jian Liang; Dapeng Hu; Jiashi Feng |
2020 | 14 | Leveraging Procedural Generation To Benchmark Reinforcement Learning (IF: 4). Highlight: We introduce Procgen Benchmark, a suite of 16 procedurally generated game-like environments designed to benchmark both sample efficiency and generalization in reinforcement learning. | Karl Cobbe; Chris Hesse; Jacob Hilton; John Schulman |
2020 | 15 | The Many Shapley Values For Model Explanation (IF: 4). Highlight: In this paper, we use the axiomatic approach to study the differences between some of the many operationalizations of the Shapley value for attribution, and propose a technique called Baseline Shapley (BShap) that is backed by a proper uniqueness result. | Mukund Sundararajan; Amir Najmi |
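
The linear-attention entry above ("Transformers Are RNNs") rests on one algebraic trick: with a positive kernel feature map, the $N \times N$ similarity matrix never has to be formed, because matrix products associate. A minimal NumPy sketch (the `elu(x) + 1` feature map follows that paper; sizes are toy values):

```python
import numpy as np

def feature_map(x):
    # A simple positive feature map, elu(x) + 1.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(N): associate (phi(Q) phi(K)^T) V as phi(Q) (phi(K)^T V)."""
    Qf, Kf = feature_map(Q), feature_map(K)
    kv = Kf.T @ V                       # (d, d_v) summary, independent of N
    z = Kf.sum(axis=0)                  # normalizer, shape (d,)
    return (Qf @ kv) / (Qf @ z)[:, None]

def quadratic_attention(Q, K, V):
    """Equivalent O(N^2) form with an explicit N x N similarity matrix."""
    A = feature_map(Q) @ feature_map(K).T
    A = A / A.sum(axis=1, keepdims=True)
    return A @ V

rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(16, 4)) for _ in range(3))
```

Both functions return the same output up to floating-point error; only the cost in the sequence length $N$ differs.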

Year | Rank | Paper | Author(s) |
---|---|---|---|
2019 | 1 | EfficientNet: Rethinking Model Scaling For Convolutional Neural Networks (IF: 8). Highlight: In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. | Mingxing Tan; Quoc Le |
2019 | 2 | Self-Attention Generative Adversarial Networks (IF: 8). Highlight: In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. | Han Zhang; Ian Goodfellow; Dimitris Metaxas; Augustus Odena |
2019 | 3 | Theoretically Principled Trade-off Between Robustness And Accuracy (IF: 7). Highlight: In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. | HONGYANG ZHANG et al. |
2019 | 4 | Simplifying Graph Convolutional Networks (IF: 7). Highlight: In this paper, we reduce this excess complexity through successively removing nonlinearities and collapsing weight matrices between consecutive layers. | FELIX WU et al. |
2019 | 5 | A Convergence Theory For Deep Learning Via Over-Parameterization (IF: 7). Highlight: In this work, we prove simple algorithms such as stochastic gradient descent (SGD) can find global minima on the training objective of DNNs in polynomial time. | Zeyuan Allen-Zhu; Yuanzhi Li; Zhao Song |
2019 | 6 | Certified Adversarial Robustness Via Randomized Smoothing (IF: 7). Highlight: We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the L2 norm. | Jeremy Cohen; Elan Rosenfeld; Zico Kolter |
2019 | 7 | Gradient Descent Finds Global Minima Of Deep Neural Networks (IF: 7). Highlight: The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). | Simon Du; Jason Lee; Haochuan Li; Liwei Wang; Xiyu Zhai |
2019 | 8 | Challenging Common Assumptions In The Unsupervised Learning Of Disentangled Representations (IF: 7). Highlight: In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. | FRANCESCO LOCATELLO et al. |
2019 | 9 | MASS: Masked Sequence To Sequence Pre-training For Language Generation (IF: 7). Highlight: Inspired by the success of BERT, we propose MAsked Sequence to Sequence pre-training (MASS) for the encoder-decoder based language generation tasks. | Kaitao Song; Xu Tan; Tao Qin; Jianfeng Lu; Tie-Yan Liu |
2019 | 10 | Learning Latent Dynamics For Planning From Pixels (IF: 7). Highlight: We propose the Deep Planning Network (PlaNet), a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space. | DANIJAR HAFNER et al. |
2019 | 11 | Fine-Grained Analysis Of Optimization And Generalization For Overparameterized Two-Layer Neural Networks (IF: 6). Highlight: This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17]; (ii) a generalization bound independent of network size, using a data-dependent complexity measure. | Sanjeev Arora; Simon Du; Wei Hu; Zhiyuan Li; Ruosong Wang |
2019 | 12 | Do ImageNet Classifiers Generalize To ImageNet? (IF: 6). Highlight: We build new test sets for the CIFAR-10 and ImageNet datasets. | Benjamin Recht; Rebecca Roelofs; Ludwig Schmidt; Vaishaal Shankar |
2019 | 13 | Off-Policy Deep Reinforcement Learning Without Exploration (IF: 6). Highlight: In this paper, we demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are incapable of learning with data uncorrelated to the distribution under the current policy, making them ineffective for this fixed batch setting. | Scott Fujimoto; David Meger; Doina Precup |
2019 | 14 | Self-Attention Graph Pooling (IF: 6). Highlight: In this paper, we propose a graph pooling method based on self-attention. | Junhyun Lee; Inyeop Lee; Jaewoo Kang |
2019 | 15 | Manifold Mixup: Better Representations By Interpolating Hidden States (IF: 6). Highlight: To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. | VIKAS VERMA et al. |
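
The randomized-smoothing entry above turns any base classifier into a smoothed one by voting over Gaussian perturbations of the input. Here is a minimal sketch of that prediction step only (the paper's certified-radius computation is omitted, and the base classifier, `sigma`, and sample count are toy choices):

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=1000, seed=0):
    """Majority vote of the base classifier under Gaussian input noise."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=(n_samples,) + x.shape)
    preds = np.array([base_classifier(x + eps) for eps in noise])
    return int(np.bincount(preds).argmax())       # most frequent class label

# Toy base classifier: sign of the first coordinate (labels 0 / 1).
def base(x):
    return int(x[0] > 0.0)

vote = smoothed_predict(base, np.array([0.5, -0.2]))
```

For this input, the first coordinate sits two noise standard deviations inside class 1, so the vote is stable even though individual noisy passes occasionally flip.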

Year | Rank | Paper | Author(s) |
---|---|---|---|
2018 | 1 | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning With A Stochastic Actor (IF: 8). Highlight: In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. | Tuomas Haarnoja; Aurick Zhou; Pieter Abbeel; Sergey Levine |
2018 | 2 | Obfuscated Gradients Give A False Sense Of Security: Circumventing Defenses To Adversarial Examples (IF: 9). Highlight: We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. | Anish Athalye; Nicholas Carlini; David Wagner |
2018 | 3 | CyCADA: Cycle-Consistent Adversarial Domain Adaptation (IF: 8). Highlight: We propose a model which adapts between domains using both generative image space alignment and latent representation space alignment. | JUDY HOFFMAN et al. |
2018 | 4 | Addressing Function Approximation Error In Actor-Critic Methods (IF: 8). Highlight: We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested. | Scott Fujimoto; Herke Hoof; David Meger |
2018 | 5 | Provable Defenses Against Adversarial Examples Via The Convex Outer Adversarial Polytope (IF: 8). Highlight: We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. | Eric Wong; Zico Kolter |
2018 | 6 | Synthesizing Robust Adversarial Examples (IF: 8). Highlight: We demonstrate the existence of robust 3D adversarial objects, and we present the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations. | Anish Athalye; Logan Engstrom; Andrew Ilyas; Kevin Kwok |
2018 | 7 | IMPALA: Scalable Distributed Deep-RL With Importance Weighted Actor-Learner Architectures (IF: 8). Highlight: In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. | LASSE ESPEHOLT et al. |
2018 | 8 | Which Training Methods For GANs Do Actually Converge? (IF: 7). Highlight: In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. | Lars Mescheder; Andreas Geiger; Sebastian Nowozin |
2018 | 9 | Disentangling By Factorising (IF: 7). Highlight: We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. | Hyunjik Kim; Andriy Mnih |
2018 | 10 | Representation Learning On Graphs With Jumping Knowledge Networks (IF: 7). Highlight: We analyze some important properties of these models, and propose a strategy to overcome those. | KEYULU XU et al. |
2018 | 11 | MentorNet: Learning Data-Driven Curriculum For Very Deep Neural Networks On Corrupted Labels (IF: 7). Highlight: To overcome the overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep networks, namely, StudentNet. | Lu Jiang; Zhengyuan Zhou; Thomas Leung; Li-Jia Li; Li Fei-Fei |
2018 | 12 | Learning To Reweight Examples For Robust Deep Learning (IF: 7). Highlight: In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. | Mengye Ren; Wenyuan Zeng; Bin Yang; Raquel Urtasun |
2018 | 13 | Interpretability Beyond Feature Attribution: Quantitative Testing With Concept Activation Vectors (TCAV) (IF: 7). Highlight: To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. | BEEN KIM et al. |
2018 | 14 | Parallel WaveNet: Fast High-Fidelity Speech Synthesis (IF: 7). Highlight: This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. | AARON OORD et al. |
2018 | 15 | Learning Representations And Generative Models For 3D Point Clouds (IF: 7). Highlight: In this paper, we look at geometric data represented as point clouds. | Panos Achlioptas; Olga Diamanti; Ioannis Mitliagkas; Leonidas Guibas |
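
The highlight for "Addressing Function Approximation Error In Actor-Critic Methods" above only mentions the evaluation; the method itself (TD3) is best known for clipped double-Q learning, which takes the minimum of two critics when forming the Bellman target to counter overestimation bias. A toy sketch of that target alone (the scalar Q-values and `gamma` are illustrative, not tied to any environment):

```python
def clipped_double_q_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Bellman target using the minimum of two target-critic estimates.

    Taking min(q1, q2) biases the target downward, offsetting the upward
    bias a single learned critic tends to accumulate.
    """
    q_next = min(q1_next, q2_next)
    return reward + (0.0 if done else gamma * q_next)

target = clipped_double_q_target(reward=1.0, q1_next=10.0, q2_next=8.0)
```

Here the optimistic critic's estimate (10.0) is ignored in favor of the pessimistic one (8.0), so the target is 1.0 + 0.99 * 8.0.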

Year | Rank | Paper | Author(s) |
---|---|---|---|
2017 | 1 | Model-Agnostic Meta-Learning For Fast Adaptation Of Deep Networks (IF: 9). Highlight: We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. | Chelsea Finn; Pieter Abbeel; Sergey Levine |
2017 | 2 | Wasserstein Generative Adversarial Networks (IF: 9). Highlight: We introduce a new algorithm named WGAN, an alternative to traditional GAN training. | Martin Arjovsky; Soumith Chintala; Léon Bottou |
2017 | 3 | Neural Message Passing For Quantum Chemistry (IF: 9). Highlight: In this paper, we reformulate existing models into a single common framework we call Message Passing Neural Networks (MPNNs) and explore additional novel variations within this framework. | Justin Gilmer; Samuel S. Schoenholz; Patrick F. Riley; Oriol Vinyals; George E. Dahl |
2017 | 4 | Convolutional Sequence To Sequence Learning (IF: 9). Highlight: We introduce an architecture based entirely on convolutional neural networks. | Jonas Gehring; Michael Auli; David Grangier; Denis Yarats; Yann N. Dauphin |
2017 | 5 | Conditional Image Synthesis With Auxiliary Classifier GANs (IF: 9). Highlight: In this paper we introduce new methods for the improved training of generative adversarial networks (GANs) for image synthesis. | Augustus Odena; Christopher Olah; Jonathon Shlens |
2017 | 6 | Axiomatic Attribution For Deep Networks (IF: 9). Highlight: We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. | Mukund Sundararajan; Ankur Taly; Qiqi Yan |
2017 | 7 | On Calibration Of Modern Neural Networks (IF: 8). Highlight: We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. | Chuan Guo; Geoff Pleiss; Yu Sun; Kilian Q. Weinberger |
2017 | 8 | Learning Important Features Through Propagating Activation Differences (IF: 8). Highlight: Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. | Avanti Shrikumar; Peyton Greenside; Anshul Kundaje |
2017 | 9 | Understanding Black-box Predictions Via Influence Functions (IF: 9). Highlight: In this paper, we use influence functions, a classic technique from robust statistics, to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. | Pang Wei Koh; Percy Liang |
2017 | 10 | Learning To Discover Cross-Domain Relations With Generative Adversarial Networks (IF: 9). Highlight: We propose a method based on a generative adversarial network that learns to discover relations between different domains (DiscoGAN). | Taeksoo Kim; Moonsu Cha; Hyunsoo Kim; Jung Kwon Lee; Jiwon Kim |
2017 | 11 | Curiosity-driven Exploration By Self-supervised Prediction (IF: 9). Highlight: We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. | Deepak Pathak; Pulkit Agrawal; Alexei A. Efros; Trevor Darrell |
2017 | 12 | Language Modeling With Gated Convolutional Networks (IF: 9). Highlight: In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. | Yann N. Dauphin; Angela Fan; Michael Auli; David Grangier |
2017 | 13 | Deep Transfer Learning With Joint Adaptation Networks (IF: 8). Highlight: In this paper, we present joint adaptation networks (JAN), which learn a transfer network by aligning the joint distributions of multiple domain-specific layers across domains based on a joint maximum mean discrepancy (JMMD) criterion. | Mingsheng Long; Han Zhu; Jianmin Wang; Michael I. Jordan |
2017 | 14 | Large-Scale Evolution Of Image Classifiers (IF: 9). Highlight: Our goal is to minimize human participation, so we employ evolutionary algorithms to discover such networks automatically. | ESTEBAN REAL et al. |
2017 | 15 | Continual Learning Through Synaptic Intelligence (IF: 8). Highlight: In this study, we introduce intelligent synapses that bring some of this biological complexity into artificial neural networks. | Friedemann Zenke; Ben Poole; Surya Ganguli |
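
The "Axiomatic Attribution For Deep Networks" entry above is the Integrated Gradients paper: an attribution is (input - baseline) times the average gradient along the straight path from baseline to input. A minimal sketch with a Riemann-sum approximation; the linear model and step count are illustrative choices used here only because a linear model makes the result easy to verify:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint-rule approximation of Integrated Gradients.

    grad_fn(point) must return the model's gradient at that point.
    """
    alphas = (np.arange(steps) + 0.5) / steps        # midpoints of the path
    path_grads = np.array([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * path_grads.mean(axis=0)

# For a linear model f(x) = w . x, the gradient is constant (w), so the
# attributions equal (x - baseline) * w exactly.
w = np.array([2.0, -1.0, 0.5])
attr = integrated_gradients(lambda point: w, np.array([1.0, 1.0, 1.0]), np.zeros(3))
```

This constant-gradient case is a useful sanity check: the attributions sum to f(x) - f(baseline), which is the completeness axiom the paper's title refers to.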

2016 | 1 | Asynchronous Methods For Deep Reinforcement Learning (IF:9). Highlight: We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. |
VOLODYMYR MNIH et al. |

2016 | 2 | Dropout As A Bayesian Approximation: Representing Model Uncertainty In Deep Learning (IF:9). Highlight: In this paper we develop a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes. |
Yarin Gal; Zoubin Ghahramani; |

2016 | 3 | Deep Speech 2: End-to-End Speech Recognition In English And Mandarin (IF:8). Highlight: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. |
DARIO AMODEI et al. |

2016 | 4 | Generative Adversarial Text To Image Synthesis (IF:9). Highlight: In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image modeling, translating visual concepts from characters to pixels. |
SCOTT REED et al. |

2016 | 5 | Dueling Network Architectures For Deep Reinforcement Learning (IF:9). Highlight: In this paper, we present a new neural network architecture for model-free reinforcement learning. |
ZIYU WANG et al. |

2016 | 6 | Pixel Recurrent Neural Networks (IF:9). Highlight: We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. |
Aäron van den Oord; Nal Kalchbrenner; Koray Kavukcuoglu; |

2016 | 7 | Learning Convolutional Neural Networks For Graphs (IF:9). Highlight: We propose a framework for learning convolutional neural networks for arbitrary graphs. |
Mathias Niepert; Mohamed Ahmed; Konstantin Kutzkov; |

2016 | 8 | Autoencoding Beyond Pixels Using A Learned Similarity Metric (IF:9). Highlight: We present an autoencoder that leverages learned representations to better measure similarities in data space. |
Anders Boesen Lindbo Larsen; Søren Kaae Sønderby; Hugo Larochelle; Ole Winther; |

2016 | 9 | Unsupervised Deep Embedding For Clustering Analysis (IF:9). Highlight: In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. |
Junyuan Xie; Ross Girshick; Ali Farhadi; |

2016 | 10 | Benchmarking Deep Reinforcement Learning For Continuous Control (IF:9). Highlight: In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. |
Yan Duan; Xi Chen; Rein Houthooft; John Schulman; Pieter Abbeel; |

2016 | 11 | Complex Embeddings For Simple Link Prediction (IF:8). Highlight: As in previous studies, we propose to solve this problem through latent factorization. |
Théo Trouillon; Johannes Welbl; Sebastian Riedel; Eric Gaussier; Guillaume Bouchard; |

2016 | 12 | Ask Me Anything: Dynamic Memory Networks For Natural Language Processing (IF:8). Highlight: We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. |
ANKIT KUMAR et al. |

2016 | 13 | Revisiting Semi-Supervised Learning With Graph Embeddings (IF:8). Highlight: We present a semi-supervised learning framework based on graph embeddings. |
Zhilin Yang; William Cohen; Ruslan Salakhutdinov; |

2016 | 14 | CryptoNets: Applying Neural Networks To Encrypted Data With High Throughput And Accuracy (IF:8). Highlight: In this work, we will present a method to convert learned neural networks to CryptoNets, neural networks that can be applied to encrypted data. |
RAN GILAD-BACHRACH et al. |

2016 | 15 | Group Equivariant Convolutional Networks (IF:8). Highlight: We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. |
Taco Cohen; Max Welling; |

2015 | 1 | Batch Normalization: Accelerating Deep Network Training By Reducing Internal Covariate Shift (IF:10). Highlight: We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. |
Sergey Ioffe; Christian Szegedy; |

2015 | 2 | Show, Attend And Tell: Neural Image Caption Generation With Visual Attention (IF:9). Highlight: Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. |
KELVIN XU et al. |

2015 | 3 | Trust Region Policy Optimization (IF:9). Highlight: In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement. |
John Schulman; Sergey Levine; Pieter Abbeel; Michael Jordan; Philipp Moritz; |

2015 | 4 | Unsupervised Domain Adaptation By Backpropagation (IF:9). Highlight: Here, we propose a new approach to domain adaptation in deep architectures that can be trained on large amounts of labeled data from the source domain and large amounts of unlabeled data from the target domain (no labeled target-domain data is necessary). |
Yaroslav Ganin; Victor Lempitsky; |

2015 | 5 | Learning Transferable Features With Deep Adaptation Networks (IF:9). Highlight: In this paper, we propose a new Deep Adaptation Network (DAN) architecture, which generalizes deep convolutional neural network to the domain adaptation scenario. |
Mingsheng Long; Yue Cao; Jianmin Wang; Michael Jordan; |

2015 | 6 | Unsupervised Learning Of Video Representations Using LSTMs (IF:9). Highlight: We use Long Short Term Memory (LSTM) networks to learn representations of video sequences. |
Nitish Srivastava; Elman Mansimov; Ruslan Salakhutdinov; |

2015 | 7 | Variational Inference With Normalizing Flows (IF:9). Highlight: We introduce a new approach for specifying flexible, arbitrarily complex and scalable approximate posterior distributions. |
Danilo Rezende; Shakir Mohamed; |

2015 | 8 | DRAW: A Recurrent Neural Network For Image Generation (IF:9). Highlight: This paper introduces the Deep Recurrent Attentive Writer (DRAW) architecture for image generation with neural networks. |
Karol Gregor; Ivo Danihelka; Alex Graves; Danilo Rezende; Daan Wierstra; |

2015 | 9 | Deep Learning With Limited Numerical Precision (IF:9). Highlight: We study the effect of limited precision data representation and computation on neural network training. |
Suyog Gupta; Ankur Agrawal; Kailash Gopalakrishnan; Pritish Narayanan; |

2015 | 10 | From Word Embeddings To Document Distances (IF:9). Highlight: We present the Word Mover’s Distance (WMD), a novel distance function between text documents. |
Matt Kusner; Yu Sun; Nicholas Kolkin; Kilian Weinberger; |

2015 | 11 | An Empirical Exploration Of Recurrent Network Architectures (IF:9). Highlight: In this work, we aim to determine whether the LSTM architecture is optimal or whether much better architectures exist. |
Rafal Jozefowicz; Wojciech Zaremba; Ilya Sutskever; |

2015 | 12 | Compressing Neural Networks With The Hashing Trick (IF:8). Highlight: We present a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes. |
Wenlin Chen; James Wilson; Stephen Tyree; Kilian Weinberger; Yixin Chen; |

2015 | 13 | An Embarrassingly Simple Approach To Zero-shot Learning (IF:8). Highlight: In this paper we describe a zero-shot learning approach that can be implemented in just one line of code, yet it is able to outperform state of the art approaches on standard datasets. |
Bernardino Romera-Paredes; Philip Torr; |

2015 | 14 | Weight Uncertainty In Neural Network (IF:7). Highlight: We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. |
Charles Blundell; Julien Cornebise; Koray Kavukcuoglu; Daan Wierstra; |

2015 | 15 | Online Tracking By Learning Discriminative Saliency Map With Convolutional Neural Network (IF:7). Highlight: We propose an online visual tracking algorithm by learning discriminative saliency map using Convolutional Neural Network (CNN). |
Seunghoon Hong; Tackgeun You; Suha Kwak; Bohyung Han; |

2014 | 1 | Distributed Representations Of Sentences And Documents (IF:9). Highlight: In this paper, we propose an unsupervised algorithm that learns vector representations of sentences and text documents. |
Quoc Le; Tomas Mikolov; |

2014 | 2 | DeCAF: A Deep Convolutional Activation Feature For Generic Visual Recognition (IF:9). Highlight: We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. |
JEFF DONAHUE et al. |

2014 | 3 | Stochastic Backpropagation And Approximate Inference In Deep Generative Models (IF:9). Highlight: Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. |
Danilo Jimenez Rezende; Shakir Mohamed; Daan Wierstra; |

2014 | 4 | Deterministic Policy Gradient Algorithms (IF:9). Highlight: In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. |
DAVID SILVER et al. |

2014 | 5 | Towards End-To-End Speech Recognition With Recurrent Neural Networks (IF:9). Highlight: This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. |
Alex Graves; Navdeep Jaitly; |

2014 | 6 | Recurrent Convolutional Neural Networks For Scene Labeling (IF:7). Highlight: We propose an approach that consists of a recurrent convolutional neural network which allows us to consider a large input context while limiting the capacity of the model. |
Pedro Pinheiro; Ronan Collobert; |

2014 | 7 | Neural Variational Inference And Learning In Belief Networks (IF:7). Highlight: We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior. |
Andriy Mnih; Karol Gregor; |

2014 | 8 | Stochastic Gradient Hamiltonian Monte Carlo (IF:7). Highlight: In this paper, we explore the properties of such a stochastic gradient HMC approach. |
Tianqi Chen; Emily Fox; Carlos Guestrin; |

2014 | 9 | Multimodal Neural Language Models (IF:7). Highlight: We introduce two multimodal neural language models: models of natural language that can be conditioned on other modalities. |
Ryan Kiros; Ruslan Salakhutdinov; Rich Zemel; |

2014 | 10 | Learning Character-level Representations For Part-of-Speech Tagging (IF:7). Highlight: In this paper, we propose a deep neural network that learns character-level representations of words and associates them with usual word representations to perform POS tagging. |
Cicero Dos Santos; Bianca Zadrozny; |

2014 | 11 | Fast Computation Of Wasserstein Barycenters (IF:6). Highlight: We present new algorithms to compute the mean of a set of N empirical probability measures under the optimal transport metric. |
Marco Cuturi; Arnaud Doucet; |

2014 | 12 | Accelerated Proximal Stochastic Dual Coordinate Ascent For Regularized Loss Minimization (IF:6). Highlight: We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure. |
Shai Shalev-Shwartz; Tong Zhang; |

2014 | 13 | Large-scale Multi-label Learning With Missing Labels (IF:6). Highlight: In this paper, we directly address both these problems by studying the multi-label problem in a generic empirical risk minimization (ERM) framework. |
Hsiang-Fu Yu; Prateek Jain; Purushottam Kar; Inderjit Dhillon; |

2014 | 14 | A Clockwork RNN (IF:6). Highlight: This paper introduces a simple, yet powerful modification to the simple RNN (SRN) architecture, the Clockwork RNN (CW-RNN), in which the hidden layer is partitioned into separate modules, each processing inputs at its own temporal granularity, making computations only at its prescribed clock rate. |
Jan Koutnik; Klaus Greff; Faustino Gomez; Juergen Schmidhuber; |

2014 | 15 | Communication-Efficient Distributed Optimization Using An Approximate Newton-type Method (IF:6). Highlight: We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems. |
Ohad Shamir; Nati Srebro; Tong Zhang; |

2013 | 1 | On The Difficulty Of Training Recurrent Neural Networks (IF:9). Highlight: In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. |
Razvan Pascanu; Tomas Mikolov; Yoshua Bengio; |

2013 | 2 | On The Importance Of Initialization And Momentum In Deep Learning (IF:9). Highlight: In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-Free optimization. |
Ilya Sutskever; James Martens; George Dahl; Geoffrey Hinton; |

2013 | 3 | Regularization Of Neural Networks Using DropConnect (IF:9). Highlight: We introduce DropConnect, a generalization of DropOut, for regularizing large fully-connected layers within neural networks. |
Li Wan; Matthew Zeiler; Sixin Zhang; Yann Le Cun; Rob Fergus; |

2013 | 4 | Maxout Networks (IF:9). Highlight: We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. |
Ian Goodfellow; David Warde-Farley; Mehdi Mirza; Aaron Courville; Yoshua Bengio; |

2013 | 5 | Deep Canonical Correlation Analysis (IF:8). Highlight: We introduce Deep Canonical Correlation Analysis (DCCA), a method to learn complex nonlinear transformations of two views of data such that the resulting representations are highly linearly correlated. |
Galen Andrew; Raman Arora; Jeff Bilmes; Karen Livescu; |

2013 | 6 | Making A Science Of Model Search: Hyperparameter Optimization In Hundreds Of Dimensions For Vision Architectures (IF:9). Highlight: In this work, we propose a meta-modeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace hand-tuning with a reproducible and unbiased optimization process. |
James Bergstra; Daniel Yamins; David Cox; |

2013 | 7 | Learning Fair Representations (IF:8). Highlight: We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the proportion in the population as a whole), and individual fairness (similar individuals should be treated similarly). |
Rich Zemel; Yu Wu; Kevin Swersky; Toni Pitassi; Cynthia Dwork; |

2013 | 8 | Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization (IF:7). Highlight: We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach. |
Martin Jaggi; |

2013 | 9 | Deep Learning With COTS HPC Systems (IF:7). Highlight: In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. |
ADAM COATES et al. |

2013 | 10 | Guided Policy Search (IF:7). Highlight: We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. |
Sergey Levine; Vladlen Koltun; |

2013 | 11 | Thompson Sampling For Contextual Bandits With Linear Payoffs (IF:7). Highlight: In this paper, we design and analyze Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary. |
Shipra Agrawal; Navin Goyal; |

2013 | 12 | Domain Generalization Via Invariant Feature Representation (IF:6). Highlight: We propose Domain-Invariant Component Analysis (DICA), a kernel-based optimization algorithm that learns an invariant transformation by minimizing the dissimilarity across domains, whilst preserving the functional relationship between input and output variables. |
Krikamol Muandet; David Balduzzi; Bernhard Schölkopf; |

2013 | 13 | Connecting The Dots With Landmarks: Discriminatively Learning Domain-Invariant Features For Unsupervised Domain Adaptation (IF:6). Highlight: In this paper, we propose a novel approach for learning such features. |
Boqing Gong; Kristen Grauman; Fei Sha; |

2013 | 14 | Gaussian Process Kernels For Pattern Discovery And Extrapolation (IF:6). Highlight: We introduce simple closed form kernels that can be used with Gaussian processes to discover patterns and enable extrapolation. |
Andrew Wilson; Ryan Adams; |

2013 | 15 | Stochastic Gradient Descent For Non-smooth Optimization: Convergence Results And Optimal Averaging Schemes (IF:6). Highlight: In this paper, we investigate the performance of SGD *without* such smoothness assumptions, as well as a running average scheme to convert the SGD iterates to a solution with optimal optimization accuracy. |
Ohad Shamir; Tong Zhang; |

2012 | 1 | Building High-level Features Using Large Scale Unsupervised Learning (IF:9). Highlight: We consider the challenge of building feature detectors for high-level concepts from only unlabeled data. |
QUOC LE et al. |

2012 | 2 | Conversational Speech Transcription Using Context-Dependent Deep Neural Networks (IF:8). Abstract: Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, combine the classic artificial-neural-network HMMs with traditional context-dependent acoustic modeling and … |
Dong Yu; Frank Seide; Gang Li; |

2012 | 3 | Poisoning Attacks Against Support Vector Machines (IF:8). Highlight: As we demonstrate in this contribution, an intelligent adversary can to some extent predict the change of the SVM decision function in response to malicious input and use this ability to construct malicious data points. |
Battista Biggio; Blaine Nelson; Pavel Laskov; |

2012 | 4 | Marginalized Denoising Autoencoders For Domain Adaptation (IF:7). Highlight: In this paper, we introduce a variation, marginalized SDA (mSDA). |
Minmin Chen; Zhixiang Xu; Kilian Weinberger; Fei Sha; |

2012 | 5 | Modeling Temporal Dependencies In High-Dimensional Sequences: Application To Polyphonic Music Generation And Transcription (IF:7). Highlight: We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences. |
Nicolas Boulanger-Lewandowski; Yoshua Bengio; Pascal Vincent; |

2012 | 6 | Making Gradient Descent Optimal For Strongly Convex Stochastic Optimization (IF:7). Highlight: In this paper, we investigate the optimality of SGD in a stochastic setting. |
Alexander Rakhlin; Ohad Shamir; Karthik Sridharan; |

2012 | 7 | A Fast And Simple Algorithm For Training Neural Probabilistic Language Models (IF:7). Highlight: We propose a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions. |
Andriy Mnih; Yee Whye Teh; |

2012 | 8 | High Dimensional Semiparametric Gaussian Copula Graphical Models (IF:6). Highlight: In this paper, we propose a semiparametric approach named nonparanormal SKEPTIC for efficiently and robustly estimating high dimensional undirected graphical models. |
Han Liu; Fang Han; Ming Yuan; John Lafferty; Larry Wasserman; |

2012 | 9 | Learning Task Grouping And Overlap In Multi-task Learning (IF:6). Highlight: We propose a framework for multi-task learning that enables one to selectively share the information across the tasks. |
Abhishek Kumar; Hal Daumé III; |

2012 | 10 | Fast Approximation Of Matrix Coherence And Statistical Leverage (IF:6). Highlight: Interestingly, to achieve our result we judiciously apply random projections on both sides of A. |
Michael Mahoney; Petros Drineas; Malik Magdon-Ismail; David Woodruff; |

2012 | 11 | Parallelizing Exploration-Exploitation Tradeoffs With Gaussian Process Bandit Optimization (IF:6). Highlight: We formalize the task as a multi-armed bandit problem, where the unknown payoff function is sampled from a Gaussian process (GP), and instead of a single arm, in each round we pull a batch of several arms in parallel. |
Thomas Desautels; Andreas Krause; Joel Burdick; |

2012 | 12 | Variational Bayesian Inference With Stochastic Search (IF:6). Highlight: We present an algorithm based on stochastic optimization that allows for direct optimization of the variational lower bound in all models. |
John Paisley; David Blei; Michael Jordan; |

2012 | 13 | Learning To Label Aerial Images From Noisy Data (IF:6). Highlight: We propose two robust loss functions for dealing with these kinds of label noise and use the loss functions to train a deep neural network on two challenging aerial image datasets. |
Volodymyr Mnih; Geoffrey Hinton; |

2012 | 14 | Revisiting K-means: New Algorithms Via Bayesian Nonparametrics (IF:6). Highlight: In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. |
Brian Kulis; Michael Jordan; |

2012 | 15 | A Joint Model Of Language And Perception For Grounded Attribute Learning (IF:5). Highlight: In this paper, we present an approach for joint learning of language and perception models for grounded attribute induction. |
Cynthia Matuszek; Nicholas FitzGerald; Luke Zettlemoyer; Liefeng Bo; Dieter Fox; |

2011 | 1 | Multimodal Deep LearningIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this work, we propose a novel application of deep networks to learn features over multiple modalities. |
JIQUAN NGIAM et. al. |

2011 | 2 | Domain Adaptation For Large-Scale Sentiment Classification: A Deep Learning ApproachIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose a deep learning approach which learns to extract a meaningful representation for each review in an unsupervised fashion. |
Xavier Glorot; Antoine Bordes; Yoshua Bengio; |

2011 | 3 | Bayesian Learning Via Stochastic Gradient Langevin DynamicsIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. |
Max Welling; Yee Whye Teh; |

2011 | 4 | A Three-Way Model For Collective Learning On Multi-Relational DataIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Here, we present a novel approach to relational learning based on the factorization of a three-way tensor. |
Maximilian Nickel; Volker Tresp; Hans-Peter Kriegel; |

2011 | 5 | Parsing Natural Scenes And Natural Language With Recursive Neural NetworksIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We introduce a max-margin structure prediction architecture based on recursive neural networks that can successfully recover such structure both in complex scene images as well as sentences. |
Richard Socher; Cliff Chiung-Yu Lin; Andrew Ng; Chris Manning; |

2011 | 6 | Generating Text With Recurrent Neural NetworksIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we demonstrate the power of RNNs trained with the new Hessian-Free optimizer (HF) by applying them to character-level language modeling tasks. |
Ilya Sutskever; James Martens; Geoffrey Hinton; |

2011 | 7 | Contractive Auto-Encoders: Explicit Invariance During Feature ExtractionIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present in this paper a novel approach for training deterministic auto-encoders. |
Salah RIFAI; Pascal Vincent; Xavier Muller; Xavier Glorot; Yoshua Bengio; |

2011 | 8 | PILCO: A Model-Based And Data-Efficient Approach To Policy SearchIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. |
Marc Deisenroth; Carl Rasmussen; |

2011 | 9 | Hashing With GraphsIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we propose a novel graph-based hashing method which automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes. |
Wei Liu; Jun Wang; Sanjiv Kumar; Shih-Fu Chang; |

2011 | 10 | On Optimization Methods For Deep LearningIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we show that more sophisticated off-the-shelf optimization methods such as Limited memory BFGS (L-BFGS) and Conjugate gradient (CG) with linesearch can significantly simplify and speed up the process of pretraining deep algorithms. |
QUOC LE et. al. |

2011 | 11 | Minimal Loss Hashing For Compact Binary CodesIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose a method for learning similarity-preserving hash functions that map high-dimensional data onto binary codes. |
Mohammad Norouzi; David Fleet; |

2011 | 12 | A Co-training Approach For Multi-view Spectral ClusteringIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose a spectral clustering algorithm for the multi-view setting where we have access to multiple views of the data, each of which can be independently used for clustering. |
Abhishek Kumar; Hal Daume III; University of Maryland; |

2011 | 13 | The Importance Of Encoding Versus Training With Sparse Coding And Vector QuantizationIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this work, we investigate the reasons for the success of sparse coding over VQ by decoupling these phases, allowing us to separate out the contributions of the training and encoding in a controlled way. |
Adam Coates; Andrew Ng; |

2011 | 14 | Learning Recurrent Neural Networks With Hessian-Free OptimizationIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. |
James Martens; Ilya Sutskever; |

2011 | 15 | Doubly Robust Policy Evaluation And LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this work, we leverage the strength and overcome the weaknesses of the two approaches by applying the *doubly robust* technique to the problems of policy evaluation and optimization. |
Miroslav Dudik; John Langford; Lihong Li; |

2010 | 1 | Rectified Linear Units Improve Restricted Boltzmann MachinesIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsAbstract: Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all … |
Vinod Nair; Geoffrey Hinton; |

2010 | 2 | 3D Convolutional Neural Networks For Human Action RecognitionIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we develop a novel 3D CNN model for action recognition. |
Shuiwang Ji; Wei Xu; Ming Yang; Kai Yu; |

2010 | 3 | Gaussian Process Optimization In The Bandit Setting: No Regret And Experimental DesignIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. |
Niranjan Srinivas; Andreas Krause; Sham Kakade; Matthias Seeger; |

2010 | 4 | Robust Subspace Segmentation By Low-Rank RepresentationIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose low-rank representation(LRR) to segment data drawn from a union of multiple linear (or affine) subspaces. |
Guangcan Liu; Zhouchen Lin; Yong Yu; |

2010 | 5 | Learning Fast Approximations Of Sparse CodingIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We proposed two versions of a very fast algorithm that produces approximate estimates of the sparse code that can be used to compute good visual features, or to initialize exact iterative algorithms. |
Karol Gregor; Yann LeCun; |

2010 | 6 | Deep Learning Via Hessian-free OptimizationIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We develop a 2nd-order optimization method based on the “Hessian-free” approach, and apply it to training deep auto-encoders. |
James Martens; |

2010 | 7 | Large Graph Construction For Scalable Semi-Supervised LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we address the scalability issue plaguing graph-based semi-supervised learning via a small number of anchor points which adequately cover the entire point cloud. |
Wei Liu; Junfeng He; Shih-Fu Chang; |

2010 | 8 | Estimation Of (near) Low-rank Matrices With Noise And High-dimensional ScalingIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We study an instance of high-dimensional statistical inference in which the goal is to use $N$ noisy observations to estimate a matrix $\Theta^* \in \mathbb{R}^{k \times p}$ that is assumed to be either exactly low rank, or "near" low-rank, meaning that it can be well-approximated by a matrix with low rank. |
Sahand Negahban; Martin Wainwright; |

2010 | 9 | Tree-Guided Group Lasso For Multi-Task Regression With Structured SparsityIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Our goal is to recover the common set of relevant inputs for each output cluster. |
Seyoung Kim; Eric Xing; |

2010 | 10 | Bayes Optimal Multilabel Classification Via Probabilistic Classifier ChainsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: The goal of this paper is to elaborate on this postulate in a critical way. |
Krzysztof Dembczynski; Weiwei Cheng; Eyke Huellermeier; |

2010 | 11 | Application Of Machine Learning To Epileptic Seizure DetectionIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present and evaluate a machine learning approach to constructing patient-specific classifiers that detect the onset of an epileptic seizure through analysis of the scalp EEG, a non-invasive measure of the brain’s electrical activity. |
Ali Shoeb; John Guttag; |

2010 | 12 | Proximal Methods For Sparse Hierarchical Dictionary LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose to combine two approaches for modeling data admitting sparse representations: on the one hand, dictionary learning has proven effective for various signal processing tasks. |
Rodolphe Jenatton; Julien Mairal; Guillaume Obozinski; Francis Bach; |

2010 | 13 | Distance Dependent Chinese Restaurant ProcessesIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. |
David Blei; Peter Frazier; |

2010 | 14 | Sequential Projection Learning For Hashing With Compact CodesIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we propose a novel data-dependent projection learning method such that each hash function is designed to correct the errors made by the previous one sequentially. |
Jun Wang; Sanjiv Kumar; Shih-Fu Chang; |

2010 | 15 | Metric Learning To RankIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present a general metric learning algorithm, based on the structural SVM framework, to learn a metric such that rankings of data induced by distance from a query can be optimized against various ranking measures, such as AUC, Precision-at-k, MRR, MAP or NDCG. |
Brian McFee; Gert Lanckriet; |

2009 | 1 | Curriculum LearningIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Here, we formalize such training strategies in the context of machine learning, and call them "curriculum learning". |
Yoshua Bengio; Jérôme Louradour; Ronan Collobert; Jason Weston; |

2009 | 2 | Online Dictionary Learning For Sparse CodingIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper proposes a new online optimization algorithm for dictionary learning, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples. |
Julien Mairal; Francis Bach; Jean Ponce; Guillermo Sapiro; |

2009 | 3 | Group Lasso With Overlap And Graph LassoIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose a new penalty function which, when used as regularization for empirical risk minimization procedures, leads to sparse estimators. |
Laurent Jacob; Guillaume Obozinski; Jean-Philippe Vert; |

2009 | 4 | Learning Structural SVMs With Latent VariablesIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present a large-margin formulation and algorithm for structured output prediction that allows the use of latent variables. |
Chun-Nam John Yu; Thorsten Joachims; |

2009 | 5 | Multi-view Clustering Via Canonical Correlation AnalysisIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA). |
Kamalika Chaudhuri; Sham M. Kakade; Karen Livescu; Karthik Sridharan; |

2009 | 6 | Large-scale Deep Unsupervised Learning Using Graphics ProcessorsIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we suggest massively parallel methods to help resolve these problems. |
Rajat Raina; Anand Madhavan; Andrew Y. Ng; |

2009 | 7 | Information Theoretic Measures For Clusterings Comparison: Is A Correction For Chance Necessary?IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we discuss the necessity of correction for chance for information theoretic based measures for clusterings comparison. |
Nguyen Xuan Vinh; Julien Epps; James Bailey; |

2009 | 8 | An Accelerated Gradient Method For Trace Norm MinimizationIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we exploit the special structure of the trace norm, based on which we propose an extended gradient algorithm that converges as O(1/k). |
Shuiwang Ji; Jieping Ye; |

2009 | 9 | Identifying Suspicious URLs: An Application Of Large-scale Online LearningIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper explores online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. |
Justin Ma; Lawrence K. Saul; Stefan Savage; Geoffrey M. Voelker; |

2009 | 10 | Fast Gradient-descent Methods For Temporal-difference Learning With Linear Function ApproximationIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we introduce two new related algorithms with better convergence rates. |
RICHARD S. SUTTON et. al. |

2009 | 11 | Learning With Structured SparsityIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper investigates a new learning formulation called structured sparsity, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. |
Junzhou Huang; Tong Zhang; Dimitris Metaxas; |

2009 | 12 | More Generality In Efficient Multiple Kernel LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we observe that existing MKL formulations can be extended to learn general kernel combinations subject to general regularization. |
Manik Varma; Bodla Rakesh Babu; |

2009 | 13 | Incorporating Domain Knowledge Into Topic Modeling Via Dirichlet Forest PriorsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present its construction, and inference via collapsed Gibbs sampling. |
David Andrzejewski; Xiaojin Zhu; Mark Craven; |

2009 | 14 | Factored Conditional Restricted Boltzmann Machines For Modeling Motion StyleIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present a new model, based on the CRBM that preserves its most important computational properties and includes multiplicative three-way interactions that allow the effective interaction weight between two units to be modulated by the dynamic state of a third unit. |
Graham W. Taylor; Geoffrey E. Hinton; |

2009 | 15 | Multi-instance Learning By Treating Instances As Non-I.I.D. SamplesIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we propose two simple yet effective methods. |
Zhi-Hua Zhou; Yu-Yin Sun; Yu-Feng Li; |

2008 | 1 | Extracting And Composing Robust Features With Denoising AutoencodersIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We introduce and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern. |
Pascal Vincent; Hugo Larochelle; Yoshua Bengio; Pierre-Antoine Manzagol; |

2008 | 2 | A Unified Architecture For Natural Language Processing: Deep Neural Networks With Multitask LearningIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. |
Ronan Collobert; Jason Weston; |

2008 | 3 | Bayesian Probabilistic Matrix Factorization Using Markov Chain Monte CarloIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we present a fully Bayesian treatment of the Probabilistic Matrix Factorization (PMF) model in which model capacity is controlled automatically by integrating over all model parameters and hyperparameters. |
Ruslan Salakhutdinov; Andriy Mnih; |

2008 | 4 | A Dual Coordinate Descent Method For Large-scale Linear SVMIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper presents a novel dual coordinate descent method for linear SVM with L1-and L2-loss functions. |
Cho-Jui Hsieh; Kai-Wei Chang; Chih-Jen Lin; S. Sathiya Keerthi; S. Sundararajan; |

2008 | 5 | Training Restricted Boltzmann Machines Using Approximations To The Likelihood GradientIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: A new algorithm for training Restricted Boltzmann Machines is introduced. |
Tijmen Tieleman; |

2008 | 6 | Classification Using Discriminative Restricted Boltzmann MachinesIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we argue that RBMs provide a self-contained framework for deriving competitive non-linear classifiers. |
Hugo Larochelle; Yoshua Bengio; |

2008 | 7 | Listwise Approach To Learning To Rank: Theory And AlgorithmIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper aims to conduct a study on the listwise approach to learning to rank. |
Fen Xia; Tie-Yan Liu; Jue Wang; Wensheng Zhang; Hang Li; |

2008 | 8 | Deep Learning Via Semi-supervised EmbeddingIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We show how nonlinear embedding algorithms popular for use with shallow semi-supervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the architecture. |
Jason Weston; Frédéric Ratle; Ronan Collobert; |

2008 | 9 | Grassmann Discriminant Analysis: A Unifying View On Subspace-based LearningIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we propose a discriminant learning framework for problems in which data consist of linear subspaces instead of vectors. |
Jihun Hamm; Daniel D. Lee; |

2008 | 10 | An Empirical Evaluation Of Supervised Learning In High DimensionsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we perform an empirical evaluation of supervised learning on high-dimensional data. |
Rich Caruana; Nikos Karampatziakis; Ainur Yessenalina; |

2008 | 11 | Learning Diverse Rankings With Multi-armed BanditsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present two online learning algorithms that directly learn a diverse ranking of documents based on users’ clicking behavior. |
Filip Radlinski; Robert Kleinberg; Thorsten Joachims; |

2008 | 12 | On The Quantitative Analysis Of Deep Belief NetworksIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We show that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and we present a novel AIS scheme for comparing RBM’s with different architectures. |
Ruslan Salakhutdinov; Iain Murray; |

2008 | 13 | Hierarchical Sampling For Active LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present an active learning scheme that exploits cluster structure in data. |
Sanjoy Dasgupta; Daniel Hsu; |

2008 | 14 | Confidence-weighted Linear ClassificationIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. |
Mark Dredze; Koby Crammer; Fernando Pereira; |

2008 | 15 | Fast Support Vector Machine Training And Classification On Graphics ProcessorsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We describe a solver for Support Vector Machine training running on a GPU, using the Sequential Minimal Optimization algorithm and an adaptive first and second order working set selection heuristic, which achieves speedups of 9-35x over LIBSVM running on a traditional processor. |
Bryan Catanzaro; Narayanan Sundaram; Kurt Keutzer; |

2007 | 1 | Pegasos: Primal Estimated Sub-GrAdient SOlver For SVMIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We describe and analyze a simple and effective iterative algorithm for solving the optimization problem cast by Support Vector Machines (SVM). |
Shai Shalev-Shwartz; Yoram Singer; Nathan Srebro; |

2007 | 2 | Information-theoretic Metric LearningIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. |
Jason V. Davis; Brian Kulis; Prateek Jain; Suvrit Sra; Inderjit S. Dhillon; |

2007 | 3 | Restricted Boltzmann Machines For Collaborative FilteringIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBM’s), can be used to model tabular data, such as users’ ratings of movies. |
Ruslan Salakhutdinov; Andriy Mnih; Geoffrey Hinton; |

2007 | 4 | Learning To Rank: From Pairwise Approach To Listwise ApproachIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: The paper proposes a new probabilistic method for the approach. |
Zhe Cao; Tao Qin; Tie-Yan Liu; Ming-Feng Tsai; Hang Li; |

2007 | 5 | Self-taught Learning: Transfer Learning From Unlabeled DataIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We describe an approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data. |
Rajat Raina; Alexis Battle; Honglak Lee; Benjamin Packer; Andrew Y. Ng; |

2007 | 6 | Boosting For Transfer LearningIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we present a novel transfer learning framework called TrAdaBoost, which extends boosting-based learning algorithms (Freund & Schapire, 1997). |
Wenyuan Dai; Qiang Yang; Gui-Rong Xue; Yong Yu; |

2007 | 7 | An Empirical Evaluation Of Deep Architectures On Problems With Many Factors Of VariationIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Recently, several learning algorithms relying on models with deep architectures have been proposed. |
Hugo Larochelle; Dumitru Erhan; Aaron Courville; James Bergstra; Yoshua Bengio; |

2007 | 8 | Spectral Feature Selection For Supervised And Unsupervised LearningIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This work exploits intrinsic properties underlying supervised and unsupervised feature selection algorithms, and proposes a unified framework for feature selection based on spectral graph theory. |
Zheng Zhao; Huan Liu; |

2007 | 9 | Experimental Perspectives On Learning From Imbalanced DataIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We address these and other issues in this work, showing that sampling in many cases will improve classifier performance. |
Jason Van Hulse; Taghi M. Khoshgoftaar; Amri Napolitano; |

2007 | 10 | Three New Graphical Models For Statistical Language ModellingIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose three new probabilistic language models that define the distribution of the next word in a sequence given several preceding words by using distributed representations of those words. |
Andriy Mnih; Geoffrey Hinton; |

2007 | 11 | Combining Online And Offline Knowledge In UCTIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We consider three approaches for combining offline and online value functions in the UCT algorithm. |
Sylvain Gelly; David Silver; |

2007 | 12 | Spectral Clustering And Transductive Learning With Multiple ViewsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We consider spectral clustering and transductive inference for data with multiple views. |
Dengyong Zhou; Christopher J. C. Burges; |

2007 | 13 | Discriminative Learning For Differing Training And Test DistributionsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We address classification problems for which the training instances are governed by a distribution that is allowed to differ arbitrarily from the test distribution—problems also referred to as classification under covariate shift. |
Steffen Bickel; Michael Brückner; Tobias Scheffer; |

2007 | 14 | Uncovering Shared Structures In Multiclass ClassificationIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper suggests a method for multiclass learning with many classes by simultaneously learning shared characteristics common to the classes, and predictors for the classes in terms of these characteristics. |
Yonatan Amit; Michael Fink; Nathan Srebro; Shimon Ullman; |

2007 | 15 | More Efficiency In Multiple Kernel LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we address the MKL problem through an adaptive 2-norm regularization formulation. |
Alain Rakotomamonjy; Francis Bach; Stéphane Canu; Yves Grandvalet; |

2006 | 1 | The Relationship Between Precision-Recall And ROC CurvesIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. |
Jesse Davis; Mark Goadrich; |

2006 | 2 | Connectionist Temporal Classification: Labelling Unsegmented Sequence Data With Recurrent Neural NetworksIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. |
Alex Graves; Santiago Fernández; Faustino Gomez; Jürgen Schmidhuber; |

2006 | 3 | Dynamic Topic ModelsIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. |
David M. Blei; John D. Lafferty; |

2006 | 4 | An Empirical Comparison Of Supervised Learning AlgorithmsIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present a large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. |
Rich Caruana; Alexandru Niculescu-Mizil; |

2006 | 5 | Topic Modeling: Beyond Bag-of-wordsIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this work, I explore a hierarchical generative probabilistic model that incorporates both n-gram statistics and latent topic variables by extending a unigram topic model to include properties of a hierarchical Dirichlet bigram language model. |
Hanna M. Wallach; |

2006 | 6 | Cover Trees For Nearest NeighborIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces (where the data set consists of n points). |
Alina Beygelzimer; Sham Kakade; John Langford; |

2006 | 7 | Pachinko Allocation: DAG-structured Mixture Models Of Topic CorrelationsIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). |
Wei Li; Andrew McCallum; |

2006 | 8 | Fast Time Series Classification Using Numerosity ReductionIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this work, we propose an additional technique, numerosity reduction, to speed up one-nearest-neighbor DTW. |
Xiaopeng Xi; Eamonn Keogh; Christian Shelton; Li Wei; Chotirat Ann Ratanamahatana; |

2006 | 9 | Maximum Margin PlanningIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this approach, we learn mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert’s behavior. |
Nathan D. Ratliff; J. Andrew Bagnell; Martin A. Zinkevich; |

2006 | 10 | Probabilistic Inference For Solving Discrete And Continuous State Markov Decision ProcessesIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Here we present an Expectation Maximization algorithm for computing optimal policies. |
Marc Toussaint; Amos Storkey; |

2006 | 11 | Label Propagation Through Linear NeighborhoodsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: A novel semi-supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood. |
Fei Wang; Changshui Zhang; |

2006 | 12 | Agnostic Active LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We state and analyze the first active learning algorithm which works in the presence of arbitrary forms of noise. |
Maria-Florina Balcan; Alina Beygelzimer; John Langford; |

2006 | 13 | Batch Mode Active Learning And Its Application To Medical Image ClassificationIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we present a framework for "batch mode active learning" that applies the Fisher information matrix to select a number of informative examples simultaneously. |
Steven C. H. Hoi; Rong Jin; Jianke Zhu; Michael R. Lyu; |

2006 | 14 | PAC Model-free Reinforcement LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: For a Markov Decision Process with finite state (size S) and action spaces (size A per state), we propose a new algorithm—Delayed Q-Learning. |
Alexander L. Strehl; Lihong Li; Eric Wiewiora; John Langford; Michael L. Littman; |

2006 | 15 | Accelerated Training Of Conditional Random Fields With Stochastic Gradient MethodsIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). |
S. V. N. Vishwanathan; Nicol N. Schraudolph; Mark W. Schmidt; Kevin P. Murphy; |

2005 | 1 | Learning To Rank Using Gradient DescentIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. |
CHRIS BURGES et. al. |

2005 | 2 | Fast Maximum Margin Matrix Factorization For Collaborative PredictionIF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Here, we investigate a direct gradient-based optimization method for MMMF and demonstrate it on large collaborative prediction problems. |
Jasson D. M. Rennie; Nathan Srebro; |

2005 | 3 | Predicting Good Probabilities With Supervised LearningIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We examine the relationship between the predictions made by different learning algorithms and true posterior probabilities. |
Alexandru Niculescu-Mizil; Rich Caruana; |

2005 | 4 | A Support Vector Method For Multivariate Performance MeasuresIF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1-score. |
Thorsten Joachims; |

2005 | 5 | Comparing Clusterings: An Axiomatic ViewIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsAbstract: This paper views clusterings as elements of a lattice. Distances between clusterings are analyzed in their relationship to the lattice. From this vantage point, we first give an … |
Marina Meilă; |

2005 | 6 | Learning Structured Prediction Models: A Large Margin ApproachIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: Our goal is to learn parameters such that inference using the model reproduces correct answers on the training data. |
Ben Taskar; Vassil Chatalbashev; Daphne Koller; Carlos Guestrin; |

2005 | 7 | Non-negative Tensor Factorization With Applications To Statistics And Computer VisionIF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We derive algorithms for finding a non-negative n-dimensional tensor factorization (n-NTF) which includes the non-negative matrix factorization (NMF) as a particular case when n = 2. |
Amnon Shashua; Tamir Hazan; |

2005 | 8 | Near-optimal Sensor Placements In Gaussian ProcessesIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We propose a mutual information criterion, and show that it produces better placements. |
Carlos Guestrin; Andreas Krause; Ajit Paul Singh; |

2005 | 9 | Beyond The Point Cloud: From Transductive To Semi-supervised LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: In this paper, we show how to turn transductive and standard supervised learning algorithms into semi-supervised learners. |
Vikas Sindhwani; Partha Niyogi; Mikhail Belkin; |

2005 | 10 | High Speed Obstacle Avoidance Using Monocular Vision And Reinforcement LearningIF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts DetailsHighlight: We present an approach in which supervised learning is first used to estimate depths from single monocular images. |
Jeff Michels; Ashutosh Saxena; Andrew Y. Ng; |

2005 | 11 | Learning From Labeled And Unlabeled Data On A Directed Graph (IF:6). Highlight: We propose a general framework for learning from labeled and unlabeled data on a directed graph in which the structure of the graph including the directionality of the edges is considered. |
Dengyong Zhou; Jiayuan Huang; Bernhard Schölkopf; |

2005 | 12 | Learning Gaussian Processes From Multiple Tasks (IF:6). Highlight: We consider the problem of multi-task learning, that is, learning multiple related functions. |
Kai Yu; Volker Tresp; Anton Schwaighofer; |

2005 | 13 | Reinforcement Learning With Gaussian Processes (IF:6). Highlight: We present a new generative model for the value function, deduced from its relation with the discounted return. |
Yaakov Engel; Shie Mannor; Ron Meir; |

2005 | 14 | Bayesian Hierarchical Clustering (IF:6). Highlight: We present a novel algorithm for agglomerative hierarchical clustering based on evaluating marginal likelihoods of a probabilistic model. |
Katherine A. Heller; Zoubin Ghahramani; |

2005 | 15 | Learning The Structure Of Markov Logic Networks (IF:6). Highlight: In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. |
Stanley Kok; Pedro Domingos; |

2004 | 1 | Apprenticeship Learning Via Inverse Reinforcement Learning (IF:9). Highlight: We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. |
Pieter Abbeel; Andrew Y. Ng; |

2004 | 2 | A Maximum Entropy Approach To Species Distribution Modeling (IF:8). Highlight: We propose the use of maximum-entropy techniques for this problem, specifically, sequential-update algorithms that can handle a very large number of features. |
Steven J. Phillips; Miroslav Dudík; Robert E. Schapire; |

2004 | 3 | Multiple Kernel Learning, Conic Duality, And The SMO Algorithm (IF:8). Highlight: We propose a novel dual formulation of the QCQP as a second-order cone programming problem, and show how to exploit the technique of Moreau-Yosida regularization to yield a formulation to which SMO techniques can be applied. |
Francis R. Bach; Gert R. G. Lanckriet; Michael I. Jordan; |

2004 | 4 | Support Vector Machine Learning For Interdependent And Structured Output Spaces (IF:9). Highlight: We propose to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs. |
Ioannis Tsochantaridis; Thomas Hofmann; Thorsten Joachims; Yasemin Altun; |

2004 | 5 | K-means Clustering Via Principal Component Analysis (IF:9). Highlight: Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. |
Chris Ding; Xiaofeng He; |
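The highlight's result suggests a practical recipe: project the centered data onto its top k-1 principal components and cluster in that subspace. The sketch below illustrates that relaxation with a plain Lloyd iteration and a deterministic farthest-point initialization (the init is our simplification, not part of the paper):

```python
import numpy as np

def pca_relaxed_kmeans(X, k, iters=50):
    """Cluster in the span of the top k-1 principal components, the continuous
    relaxation of the K-means cluster-membership indicators."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k - 1].T                      # project onto k-1 leading PCs
    # deterministic farthest-point initialization
    centers = [Z[0]]
    for _ in range(k - 1):
        d = np.min([((Z - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    # plain Lloyd's algorithm in the reduced space
    for _ in range(iters):
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```

For well-separated clusters the leading principal subspace preserves the between-cluster structure, so Lloyd's algorithm in k-1 dimensions recovers the same partition as K-means in the full space.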

2004 | 6 | Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms (IF:8). Highlight: In this paper, we study stochastic gradient descent (SGD) algorithms on regularized forms of linear prediction methods. |
Tong Zhang; |
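A regularized linear prediction method of the kind the highlight refers to can be sketched as SGD on an L2-regularized logistic loss; this is a generic illustration of the setting, not the paper's specific analysis (step sizes and constants here are arbitrary choices):

```python
import numpy as np

def sgd_logistic(X, y, lam=1e-3, epochs=20, eta=0.5, seed=0):
    """Plain SGD on L2-regularized logistic loss; labels y are in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w)
            # gradient of log(1 + exp(-margin)) plus the L2 penalty term
            grad = -y[i] * X[i] / (1.0 + np.exp(margin)) + lam * w
            w -= eta * grad
    return w
```

Each step touches a single example, so the cost per update is independent of the dataset size, which is what makes SGD attractive for large-scale linear prediction.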

2004 | 7 | Integrating Constraints And Metric Learning In Semi-supervised Clustering (IF:8). Highlight: This paper provides new methods for the two approaches as well as presents a new semi-supervised clustering algorithm that integrates both of these techniques in a uniform, principled framework. |
Mikhail Bilenko; Sugato Basu; Raymond J. Mooney; |

2004 | 8 | Dynamic Conditional Random Fields: Factorized Probabilistic Models For Labeling And Segmenting Sequence Data (IF:8). Highlight: We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges, a distributed state representation as in dynamic Bayesian networks (DBNs), and parameters are tied across slices. |
Charles Sutton; Khashayar Rohanimanesh; Andrew McCallum; |

2004 | 9 | Learning And Evaluating Classifiers Under Sample Selection Bias (IF:8). Highlight: In this paper, we formalize the sample selection bias problem in machine learning terms and study analytically and experimentally how a number of well-known classifier learning methods are affected by it. |
Bianca Zadrozny; |

2004 | 10 | Ensemble Selection From Libraries Of Models (IF:7). Highlight: We present a method for constructing ensembles from libraries of thousands of models. |
Rich Caruana; Alexandru Niculescu-Mizil; Geoff Crew; Alex Ksikes; |

2004 | 11 | A Kernel View Of The Dimensionality Reduction Of Manifolds (IF:7). Highlight: We show how all three algorithms can be described as kernel PCA on specially constructed Gram matrices, and illustrate the similarities and differences between the algorithms with representative examples. |
Jihun Ham; Daniel D. Lee; Sebastian Mika; Bernhard Schölkopf; |
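The unifying object in this view is kernel PCA itself: center a Gram matrix and take its leading eigenvectors. A minimal sketch with an RBF kernel (the manifold-learning algorithms in the paper substitute their own specially constructed Gram matrices into the same procedure):

```python
import numpy as np

def kernel_pca(X, n_components, gamma=1.0):
    """Kernel PCA: double-center the Gram matrix and return its leading
    eigenvectors scaled by the square roots of their eigenvalues."""
    sq = ((X[:, None] - X[None]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                 # RBF Gram matrix
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                          # double-centering in feature space
    vals, vecs = np.linalg.eigh(Kc)         # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))
```

Replacing the RBF Gram matrix with a geodesic-distance-based one recovers an Isomap-style embedding, which is the sense in which these algorithms are all kernel PCA.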

2004 | 12 | Learning A Kernel Matrix For Nonlinear Dimensionality Reduction (IF:7). Highlight: We investigate how to learn a kernel matrix for high dimensional data that lies on or near a low dimensional manifold. |
Kilian Q. Weinberger; Fei Sha; Lawrence K. Saul; |

2004 | 13 | Active Learning Using Pre-clustering (IF:7). Highlight: The main contribution of the paper is a formal framework that incorporates clustering into active learning. |
Hieu T. Nguyen; Arnold Smeulders; |

2004 | 14 | Solving Cluster Ensemble Problems By Bipartite Graph Partitioning (IF:7). Highlight: We introduce a new reduction method that constructs a bipartite graph from a given cluster ensemble. |
Xiaoli Zhang Fern; Carla E. Brodley; |

2004 | 15 | Margin Based Feature Selection – Theory And Algorithms (IF:6). Highlight: In this paper we introduce a margin based feature selection criterion and apply it to measure the quality of sets of features. |
Ran Gilad-Bachrach; Amir Navot; Naftali Tishby; |