Most Influential AISTATS Papers (2025-09 Version)
The International Conference on Artificial Intelligence and Statistics (AISTATS) is held annually and is one of the top machine learning conferences in the world. The Paper Digest Team analyzes all papers published at AISTATS in past years and presents the 15 most influential papers for each year. This ranking list is constructed automatically from citations in both research papers and granted patents, and it is updated frequently to reflect the most recent changes. To find the latest version of this list, or the most influential papers from other conferences and journals, please visit the Best Paper Digest page. Note: the most influential papers may or may not include the papers that won the best paper awards. (Version: 2025-09)
To search or review AISTATS papers on a specific topic, please use the search by venue (AISTATS) and review by venue (AISTATS) services. To browse the most productive AISTATS authors by year, ranked by the number of accepted papers, see the most productive AISTATS authors grouped by year.
This list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that empowers you to read articles, write articles, get answers, conduct literature reviews, and generate research reports.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Most Influential AISTATS Papers (2025-09 Version)
| Year | Rank | Paper | Author(s) |
|---|---|---|---|
| 2024 | 1 | A General Theoretical Paradigm to Understand Learning from Human Preferences (IF:7) Highlight: However, this method still heavily relies on the first approximation. In this paper we try to gain a deeper theoretical understanding of these practical algorithms. | MOHAMMAD GHESHLAGHI AZAR et al. |
| 2024 | 2 | Generating and Imputing Tabular Data Via Diffusion and Flow-based Gradient-Boosted Trees (IF:3) Highlight: This paper introduces a novel approach for generating and imputing mixed-type (continuous and categorical) tabular data utilizing score-based diffusion and conditional flow matching. | Alexia Jolicoeur-Martineau; Kilian Fatras; Tal Kachman; |
| 2024 | 3 | Generative Flow Networks As Entropy-Regularized RL (IF:3) Highlight: Our work extends the connection between RL and GFlowNets to a general case. | Daniil Tiapkin; Nikita Morozov; Alexey Naumov; Dmitry P Vetrov; |
| 2024 | 4 | Multi-resolution Time-Series Transformer for Long-term Forecasting (IF:3) Highlight: The patch size controls the ability of transformers to learn the temporal patterns at different frequencies: shorter patches are effective for learning localized, high-frequency patterns, whereas mining long-term seasonalities and trends requires longer patches. Inspired by this observation, we propose a novel framework, Multi-resolution Time-Series Transformer (MTST), which consists of a multi-branch architecture for simultaneous modeling of diverse temporal patterns at different resolutions. | Yitian Zhang; Liheng Ma; Soumyasundar Pal; Yingxue Zhang; Mark Coates; |
| 2024 | 5 | Mechanics of Next Token Prediction with Self-Attention (IF:3) Highlight: In this work, we ask: What does a single self-attention layer learn from next-token prediction? | Yingcong Li; Yixiao Huang; Muhammed E. Ildiz; Ankit Singh Rawat; Samet Oymak; |
| 2024 | 6 | Identifying Spurious Biases Early in Training Through The Lens of Simplicity Bias (IF:3) Highlight: In this work, we provide the first theoretical analysis of the effect of simplicity bias on learning spurious correlations. | Yu Yang; Eric Gan; Gintare Karolina Dziugaite; Baharan Mirzasoleiman; |
| 2024 | 7 | Theoretically Grounded Loss Functions and Algorithms for Score-Based Multi-Class Abstention (IF:3) Highlight: In this paper, we analyze the score-based formulation of learning with abstention in the multi-class classification setting. | Anqi Mao; Mehryar Mohri; Yutao Zhong; |
| 2024 | 8 | Understanding Generalization of Federated Learning Via Stability: Heterogeneity Matters (IF:3) Highlight: In this paper, we aim to analyze the generalization performances of federated learning by means of algorithmic stability, which measures the change of the output model of an algorithm when perturbing one data point. | Zhenyu Sun; Xiaochun Niu; Ermin Wei; |
| 2024 | 9 | Parameter-Agnostic Optimization Under Relaxed Smoothness (IF:3) Highlight: However, as the assumption is relaxed to the more realistic $(L_0, L_1)$-smoothness, all existing convergence results still necessitate tuning of the stepsize. In this study, we demonstrate that Normalized Stochastic Gradient Descent with Momentum (NSGD-M) can achieve a (nearly) rate-optimal complexity without prior knowledge of any problem parameter, though this comes at the cost of introducing an exponential term dependent on $L_1$ in the complexity. | Florian Hübler; Junchi Yang; Xiang Li; Niao He; |
| 2024 | 10 | Maximum Entropy GFlowNets with Soft Q-learning (IF:3) Highlight: This paper addresses the connection by constructing an appropriate reward function, thereby establishing an exact relationship between GFNs and maximum entropy RL. | Sobhan Mohammadpour; Emmanuel Bengio; Emma Frejinger; Pierre-Luc Bacon; |
| 2024 | 11 | Fair Machine Unlearning: Data Removal While Mitigating Disparities (IF:3) Highlight: In this work, we demonstrate that most efficient unlearning methods cannot accommodate popular fairness interventions, and we propose the first fair machine unlearning method that can efficiently unlearn data instances from a fair objective. | Alex Oesterling; Jiaqi Ma; Flavio Calmon; Himabindu Lakkaraju; |
| 2024 | 12 | Exploring The Power of Graph Neural Networks in Solving Linear Optimization Problems (IF:3) Highlight: Recently, machine learning, particularly message-passing graph neural networks (MPNNs), has gained traction in enhancing exact optimization algorithms. | Chendi Qian; Didier Chételat; Christopher Morris; |
| 2024 | 13 | General Identifiability and Achievability for Causal Representation Learning (IF:3) Highlight: This paper focuses on causal representation learning (CRL) under a general nonparametric latent causal model and a general transformation model that maps the latent data to the observational data. | Burak Varici; Emre Acartürk; Karthikeyan Shanmugam; Ali Tajer; |
| 2024 | 14 | Mixture-of-Linear-Experts for Long-term Time Series Forecasting (IF:3) Highlight: However, due to their inherent simplicity, they are not able to adapt their prediction rules to periodic changes in time series patterns. To address this challenge, we propose a Mixture-of-Experts-style augmentation for linear-centric models and propose Mixture-of-Linear-Experts (MoLE). | Ronghao Ni; Zinan Lin; Shuaiqi Wang; Giulia Fanti; |
| 2024 | 15 | Quantifying Uncertainty in Natural Language Explanations of Large Language Models (IF:3) Highlight: In this work, we make one of the first attempts at quantifying the uncertainty in explanations of LLMs. | Sree Harsha Tanneru; Chirag Agarwal; Himabindu Lakkaraju; |
| 2023 | 1 | TabLLM: Few-shot Classification of Tabular Data with Large Language Models (IF:5) Highlight: We evaluate several serialization methods including templates, table-to-text models, and large language models. | STEFAN HEGSELMANN et al. |
| 2023 | 2 | Data Banzhaf: A Robust Data Valuation Framework for Machine Learning (IF:4) Highlight: We propose an algorithm to efficiently estimate the Banzhaf value based on the Maximum Sample Reuse (MSR) principle. | Jiachen T. Wang; Ruoxi Jia; |
| 2023 | 3 | Do Bayesian Neural Networks Need To Be Fully Stochastic? (IF:3) Highlight: We investigate the benefit of treating all the parameters in a Bayesian neural network stochastically and find compelling theoretical and empirical evidence that this standard construction may be unnecessary. | Mrinank Sharma; Sebastian Farquhar; Eric Nalisnick; Tom Rainforth; |
| 2023 | 4 | Fixing By Mixing: A Recipe for Optimal Byzantine ML Under Heterogeneity (IF:3) Highlight: We obtain similar guarantees (in expectation) by plugging NNM in the distributed stochastic heavy ball method, a practical substitute to distributed gradient descent. | YOUSSEF ALLOUAH et al. |
| 2023 | 5 | Federated Learning Under Distributed Concept Drift (IF:3) Highlight: We identify the problem of drift adaptation as a time-varying clustering problem, and we propose two new clustering algorithms for reacting to drifts based on local drift detection and hierarchical clustering. | Ellango Jothimurugesan; Kevin Hsieh; Jianyu Wang; Gauri Joshi; Phillip B. Gibbons; |
| 2023 | 6 | Who Should Predict? Exact Algorithms For Learning to Defer to Humans (IF:3) Highlight: In this work, we jointly train a classifier with a rejector, which decides on each data point whether the classifier or the human should predict. | HUSSEIN MOZANNAR et al. |
| 2023 | 7 | Membership Inference Attacks Against Synthetic Data Through Overfitting Detection (IF:3) Highlight: In this work we argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution. | Boris van Breugel; Hao Sun; Zhaozhi Qian; Mihaela van der Schaar; |
| 2023 | 8 | Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods (IF:3) Highlight: In this paper, we propose a unified convergence analysis that covers a large variety of stochastic gradient descent-ascent methods, which so far have required different intuitions, have different applications and have been developed separately in various communities. | Aleksandr Beznosikov; Eduard Gorbunov; Hugo Berard; Nicolas Loizou; |
| 2023 | 9 | From Shapley Values to Generalized Additive Models and Back (IF:3) Highlight: We introduce $n$-Shapley Values, a parametric family of local post-hoc explanation algorithms that explain individual predictions with interaction terms up to order $n$. | Sebastian Bordt; Ulrike von Luxburg; |
| 2023 | 10 | Improved Sample Complexity Bounds for Distributionally Robust Reinforcement Learning (IF:3) Highlight: We focus on the tabular episodic learning setting where the algorithm has access to a generative model of the nominal (training) environment around which the uncertainty set is defined. We propose the Robust Phased Value Learning (RPVL) algorithm to solve this problem for the uncertainty sets specified by four different divergences: total variation, chi-square, Kullback-Leibler, and Wasserstein. | Zaiyan Xu; Kishan Panaganti; Dileep Kalathil; |
| 2023 | 11 | Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data (IF:3) Highlight: A common approach to build such models is to use contrastive learning on paired data across the two modalities, as exemplified by Contrastive Language-Image Pre-Training (CLIP). In this paper, (i) we initiate the investigation of a general class of nonlinear loss functions for multimodal contrastive learning (MMCL) including CLIP loss and show its connection to singular value decomposition (SVD). | RYUMEI NAKADA et al. |
| 2023 | 12 | Generalization in Graph Neural Networks: Improved PAC-Bayesian Bounds on Graph Diffusion (IF:3) Highlight: In this paper, we present generalization bounds that instead scale with the largest singular value of the graph neural network’s feature diffusion matrix. | Haotian Ju; Dongyue Li; Aneesh Sharma; Hongyang R. Zhang; |
| 2023 | 13 | Learning to Defer to Multiple Experts: Consistent Surrogate Losses, Confidence Calibration, and Conformal Ensembles (IF:3) Highlight: In particular, we address the open problems of deriving a consistent surrogate loss, confidence calibration, and principled ensembling of experts. | Rajeev Verma; Daniel Barrejon; Eric Nalisnick; |
| 2023 | 14 | An Online and Unified Algorithm for Projection Matrix Vector Multiplication with Application to Empirical Risk Minimization (IF:3) Highlight: It is defined as follows: given a matrix at the pre-processing phase, at each iteration one receives a query vector and needs to form the matrix-vector product (approximately) before observing the next vector. In this work, we study a particular instance of such a problem called the online projection matrix vector multiplication. | Lianke Qin; Zhao Song; Lichen Zhang; Danyang Zhuo; |
| 2023 | 15 | Diffusion Generative Models in Infinite Dimensions (IF:3) Highlight: However, these models operate directly on the discretized data, and there are no semantics in the modeling process that relate the observed data to the underlying functional forms. We generalize diffusion models to operate directly in function space by developing the foundational theory for such models in terms of Gaussian measures on Hilbert spaces. | Gavin Kerrigan; Justin Ley; Padhraic Smyth; |
| 2022 | 1 | Federated Learning with Buffered Asynchronous Aggregation (IF:6) Highlight: However, aggregating individual client updates is incompatible with Secure Aggregation, which could result in an undesirable level of privacy for the system. To address these concerns, we propose a novel buffered asynchronous aggregation method, FedBuff, that is agnostic to the choice of optimizer, and combines the best properties of synchronous and asynchronous FL. | JOHN NGUYEN et al. |
| 2022 | 2 | Towards Understanding Biased Client Selection in Federated Learning (IF:5) Highlight: We show that biasing client selection towards clients with higher local loss yields faster error convergence. From this insight, we propose Power-of-Choice, a communication- and computation-efficient client selection framework that flexibly spans the trade-off between convergence speed and solution bias. | Yae Jee Cho; Jianyu Wang; Gauri Joshi; |
| 2022 | 3 | CF-GNNExplainer: Counterfactual Explanations for Graph Neural Networks (IF:4) Highlight: However, such methods are not counterfactual (CF) in nature: given a prediction, we want to understand how the prediction can be changed in order to achieve an alternative outcome. In this work, we propose a method for generating CF explanations for GNNs: the minimal perturbation to the input (graph) data such that the prediction changes. | Ana Lucic; Maartje A. Ter Hoeve; Gabriele Tolomei; Maarten De Rijke; Fabrizio Silvestri; |
| 2022 | 4 | Beta Shapley: A Unified and Noise-reduced Data Valuation Framework for Machine Learning (IF:4) Highlight: In this paper, we propose Beta Shapley, which is a substantial generalization of Data Shapley. | Yongchan Kwon; James Zou; |
| 2022 | 5 | Differentially Private Federated Learning on Heterogeneous Data (IF:4) Highlight: Federated Learning (FL) is a paradigm for large-scale distributed learning which faces two key challenges: (i) training efficiently from highly heterogeneous user data, and (ii) protecting the privacy of participating users. In this work, we propose a novel FL approach (DP-SCAFFOLD) to tackle these two challenges together by incorporating Differential Privacy (DP) constraints into the popular SCAFFOLD algorithm. | Maxence Noble; Aurélien Bellet; Aymeric Dieuleveut; |
| 2022 | 6 | Generative Models As Distributions of Functions (IF:4) Highlight: In this paper, we abandon discretized grids and instead parameterize individual data points by continuous functions. | Emilien Dupont; Yee Whye Teh; Arnaud Doucet; |
| 2022 | 7 | Optimal Accounting of Differential Privacy Via Characteristic Function (IF:4) Highlight: Algorithmically, we propose an analytical Fourier accountant that represents the complex logarithm of $\phi$-functions symbolically and uses Gaussian quadrature for numerical computation. | Yuqing Zhu; Jinshuo Dong; Yu-Xiang Wang; |
| 2022 | 8 | Proximal Optimal Transport Modeling of Population Dynamics (IF:3) Highlight: We propose a new approach to model the collective dynamics of a population of particles evolving with time. | Charlotte Bunne; Laetitia Papaxanthos; Andreas Krause; Marco Cuturi; |
| 2022 | 9 | SparseFed: Mitigating Model Poisoning Attacks in Federated Learning with Sparsification (IF:3) Highlight: In this paper we introduce SparseFed, a novel defense that uses global top-k update sparsification and device-level gradient clipping to mitigate model poisoning attacks. | Ashwinee Panda; Saeed Mahloujifar; Arjun Nitin Bhagoji; Supriyo Chakraborty; Prateek Mittal; |
| 2022 | 10 | MT3: Meta Test-Time Training for Self-Supervised Test-Time Adaption (IF:3) Highlight: We combine meta-learning, self-supervision and test-time training to learn to adapt to unseen test distributions. | Alexander Bartler; Andre Bühler; Felix Wiewel; Mario Döbler; Bin Yang; |
| 2022 | 11 | Sinkformers: Transformers with Doubly Stochastic Attention (IF:3) Highlight: In this paper, we propose instead to use Sinkhorn’s algorithm to make attention matrices doubly stochastic. | Michael E. Sander; Pierre Ablin; Mathieu Blondel; Gabriel Peyré; |
| 2022 | 12 | Sample Complexity of Robust Reinforcement Learning with A Generative Model (IF:3) Highlight: In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy when the nominal model is unknown. | Kishan Panaganti; Dileep Kalathil; |
| 2022 | 13 | Federated Reinforcement Learning with Environment Heterogeneity (IF:3) Highlight: We study the Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction. In this paper, we stress the constraint of environment heterogeneity, which means $n$ environments corresponding to these $n$ agents have different state-transitions. | Hao Jin; Yang Peng; Wenhao Yang; Shusen Wang; Zhihua Zhang; |
| 2022 | 14 | A Single-Timescale Method for Stochastic Bilevel Optimization (IF:3) Highlight: This paper develops a new optimization method for a class of stochastic bilevel problems that we term the Single-Timescale stochAstic BiLevEl optimization (STABLE) method. | Tianyi Chen; Yuejiao Sun; Quan Xiao; Wotao Yin; |
| 2022 | 15 | Causally Motivated Shortcut Removal Using Auxiliary Labels (IF:3) Highlight: We study a flexible, causally-motivated approach to training robust predictors by discouraging the use of specific shortcuts, focusing on a common setting where a robust predictor could achieve optimal i.i.d. generalization in principle, but is overshadowed by a shortcut predictor in practice. | MAGGIE MAKAR et al. |
| 2021 | 1 | Federated Learning with Compression: Unified Analysis and Sharp Guarantees (IF:5) Highlight: To mitigate data heterogeneity, we introduce a local gradient tracking scheme and obtain sharp convergence rates that match the best-known communication complexities without compression for convex, strongly convex, and nonconvex settings. | Farzin Haddadpour; Mohammad Mahdi Kamani; Aryan Mokhtari; Mehrdad Mahdavi; |
| 2021 | 2 | Benchmarking Simulation-Based Inference (IF:5) Highlight: This has made it difficult to compare algorithms and identify their strengths and weaknesses. We set out to fill this gap: we provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms including recent approaches employing neural networks and classical Approximate Bayesian Computation methods. | Jan-Matthis Lueckmann; Jan Boelts; David Greenberg; Pedro Goncalves; Jakob Macke; |
| 2021 | 3 | Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence (IF:4) Highlight: We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. | Nicolas Loizou; Sharan Vaswani; Issam Hadj Laradji; Simon Lacoste-Julien; |
| 2021 | 4 | Shuffled Model of Differential Privacy in Federated Learning (IF:4) Highlight: We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements, motivated by the federated learning (FL) framework. We propose a distributed communication-efficient and local differentially private stochastic gradient descent (CLDP-SGD) algorithm and analyze its communication, privacy, and convergence trade-offs. | Antonious Girgis; Deepesh Data; Suhas Diggavi; Peter Kairouz; Ananda Theertha Suresh; |
| 2021 | 5 | Provably Efficient Safe Exploration Via Primal-Dual Policy Optimization (IF:4) Highlight: To this end, we present an Optimistic Primal-Dual Proximal Policy OPtimization (OPDOP) algorithm where the value function is estimated by combining the least-squares policy evaluation and an additional bonus term for safe exploration. | Dongsheng Ding; Xiaohan Wei; Zhuoran Yang; Zhaoran Wang; Mihailo Jovanovic; |
| 2021 | 6 | Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms (IF:4) Highlight: To do so, a multitude of model-agnostic, nonparametric meta-learners have been proposed in recent years. Such learners decompose the treatment effect estimation problem into separate sub-problems, each solvable using standard supervised learning methods. | Alicia Curth; Mihaela van der Schaar; |
| 2021 | 7 | Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization (IF:4) Highlight: The use of min-max optimization in the adversarial training of deep neural network classifiers, and the training of generative adversarial networks has motivated the study of nonconvex-nonconcave optimization objectives, which frequently arise in these applications. | Jelena Diakonikolas; Constantinos Daskalakis; Michael I. Jordan; |
| 2021 | 8 | On Information Gain and Regret Bounds in Gaussian Process Bandits (IF:4) Highlight: We provide general bounds on $\gamma_T$ based on the decay rate of the eigenvalues of the GP kernel, whose specialisation for commonly used kernels improves the existing bounds on $\gamma_T$, and subsequently the regret bounds relying on $\gamma_T$ under numerous settings. | Sattar Vakili; Kia Khezeli; Victor Picheny; |
| 2021 | 9 | Improving Predictions of Bayesian Neural Nets Via Local Linearization (IF:4) Highlight: In this paper we argue that the GGN approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN), which turns the BNN into a generalized linear model (GLM). | Alexander Immer; Maciej Korzepa; Matthias Bauer; |
| 2021 | 10 | Free-rider Attacks on Model Aggregation in Federated Learning (IF:4) Highlight: This kind of attack is critical in sensitive applications of federated learning when data is scarce and the model has high commercial value. We introduce here the first theoretical and experimental analysis of free-rider attacks on federated learning schemes based on iterative parameters aggregation, such as FedAvg or FedProx, and provide formal guarantees for these attacks to converge to the aggregated models of the fair participants. | Yann Fraboni; Richard Vidal; Marco Lorenzi; |
| 2021 | 11 | Does Invariant Risk Minimization Capture Invariance? (IF:4) Highlight: We show that the Invariant Risk Minimization (IRM) formulation of Arjovsky et al. (2019) can fail to capture "natural" invariances, at least when used in its practical "linear" form, and even on very simple problems which directly follow the motivating examples for IRM. | Pritish Kamath; Akilesh Tangella; Danica Sutherland; Nathan Srebro; |
| 2021 | 12 | VqSGD: Vector Quantized Stochastic Gradient Descent (IF:4) Highlight: In this work, we present a family of vector quantization schemes vqSGD (Vector-Quantized Stochastic Gradient Descent) that provide an asymptotic reduction in the communication cost with convergence guarantees in first-order distributed optimization. | Venkata Gandikota; Daniel Kane; Raj Kumar Maity; Arya Mazumdar; |
| 2021 | 13 | Causal Autoregressive Flows (IF:4) Highlight: In this work, we highlight an intrinsic correspondence between a simple family of autoregressive normalizing flows and identifiable causal models. | Ilyes Khemakhem; Ricardo Monti; Robert Leech; Aapo Hyvarinen; |
| 2021 | 14 | Local SGD: Unified Theory and New Efficient Methods (IF:4) Highlight: We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models. | Eduard Gorbunov; Filip Hanzely; Peter Richtarik; |
| 2021 | 15 | Scalable Constrained Bayesian Optimization (IF:4) Highlight: We propose the scalable constrained Bayesian optimization (SCBO) algorithm that overcomes the above challenges and pushes the applicability of Bayesian optimization far beyond the state-of-the-art. To this end, we propose two new control problems that we expect to be of independent value for the scientific community. | David Eriksson; Matthias Poloczek; |
| 2020 | 1 | FedPAQ: A Communication-Efficient Federated Learning Method With Periodic Averaging And Quantization (IF:8) Highlight: In this paper, we present FedPAQ, a communication-efficient Federated Learning method with Periodic Averaging and Quantization. | Amirhossein Reisizadeh; Aryan Mokhtari; Hamed Hassani; Ali Jadbabaie; Ramtin Pedarsani; |
| 2020 | 2 | Variational Autoencoders And Nonlinear ICA: A Unifying Framework (IF:7) Highlight: We address this issue by showing that for a broad family of deep latent-variable models, identification of the true joint distribution over observed and latent variables is actually possible up to very simple transformations, thus achieving a principled and powerful form of disentanglement. | Ilyes Khemakhem; Diederik Kingma; Ricardo Monti; Aapo Hyvarinen; |
| 2020 | 3 | Tighter Theory For Local SGD On Identical And Heterogeneous Data (IF:6) Highlight: We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. | Ahmed Khaled Ragab Bayoumi; Konstantin Mishchenko; Peter Richtarik; |
| 2020 | 4 | Optimizing Millions Of Hyperparameters By Implicit Differentiation (IF:6) Highlight: We propose an algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations. | Jonathan Lorraine; Paul Vicol; David Duvenaud; |
| 2020 | 5 | Orthogonal Gradient Descent For Continual Learning (IF:6) Highlight: In this paper, we propose to address this issue from a parameter space perspective and study an approach to restrict the direction of the gradient updates to avoid forgetting previously-learned data. | Mehrdad Farajtabar; Navid Azizan; Alex Mott; Ang Li; |
| 2020 | 6 | Gradient Descent With Early Stopping Is Provably Robust To Label Noise For Overparameterized Neural Networks (IF:6) Highlight: Despite this (over)fitting capacity, in this paper we demonstrate that such overparameterized networks have an intriguing robustness capability: they are surprisingly robust to label noise when first-order methods with early stopping are used to train them. | Mingchen Li; Mahdi Soltanolkotabi; Samet Oymak; |
| 2020 | 7 | Scalable Gradients For Stochastic Differential Equations (IF:6) Highlight: We generalize this method to stochastic differential equations, allowing time-efficient and constant-memory computation of gradients with high-order adaptive solvers. | Xuechen Li; Ting-Kam Leonard Wong; Ricky T. Q. Chen; David Duvenaud; |
| 2020 | 8 | A Unified Analysis Of Extra-gradient And Optimistic Gradient Methods For Saddle Point Problems: Proximal Point Approach (IF:6) Highlight: In this paper we consider solving saddle point problems using two variants of Gradient Descent-Ascent algorithms, Extra-gradient (EG) and Optimistic Gradient Descent Ascent (OGDA) methods. | Aryan Mokhtari; Asuman Ozdaglar; Sarath Pattathil; |
| 2020 | 9 | Model-Agnostic Counterfactual Explanations For Consequential Decisions (IF:6) Highlight: In contrast, we build on standard theory and tools from formal verification and propose a novel algorithm that solves a sequence of satisfiability problems, where both the distance function (objective) and predictive model (constraints) are represented as logic formulae. | Amir-Hossein Karimi; Gilles Barthe; Borja Balle; Isabel Valera; |
| 2020 | 10 | Identifying And Correcting Label Bias In Machine Learning (IF:5) Highlight: In this paper, we provide a mathematical formulation of how this bias can arise. | Heinrich Jiang; Ofir Nachum; |
| 2020 | 11 | Permutation Invariant Graph Generation Via Score-Based Generative Modeling (IF:5) Highlight: To address this difficulty, we propose a permutation invariant approach to modeling graphs, using the recent framework of score-based generative modeling. | CHENHAO NIU et al. |
| 2020 | 12 | GP-VAE: Deep Probabilistic Time Series Imputation (IF:5) Highlight: We propose a new deep sequential latent variable model for dimensionality reduction and data imputation. | Vincent Fortuin; Dmitry Baranchuk; Gunnar Raetsch; Stephan Mandt; |
| 2020 | 13 | Learning Sparse Nonparametric DAGs (IF:5) Highlight: Unlike existing approaches that require specific modeling choices, loss functions, or algorithms, we present a completely general framework that can be applied to general nonlinear models (e.g. without additive noise), general differentiable loss functions, and generic black-box optimization routines. | Xun Zheng; Chen Dan; Bryon Aragam; Pradeep Ravikumar; Eric Xing; |
| 2020 | 14 | On The Convergence Theory Of Gradient-Based Model-Agnostic Meta-Learning Algorithms (IF:5) Highlight: We start with the MAML method and its first-order approximation (FO-MAML) and highlight the challenges that emerge in their analysis. | Alireza Fallah; Aryan Mokhtari; Asuman Ozdaglar; |
| 2020 | 15 | DYNOTEARS: Structure Learning From Time-Series Data (IF:5) Highlight: We revisit the structure learning problem for dynamic Bayesian networks and propose a method that simultaneously estimates contemporaneous (intra-slice) and time-lagged (inter-slice) relationships between variables in a time-series. | ROXANA PAMFIL et al. |
| 2019 | 1 | Interpolating Between Optimal Transport And MMD Using Sinkhorn Divergences (IF:7) Highlight: This paper studies the Sinkhorn divergences, a family of geometric divergences that interpolates between MMD and OT. | JEAN FEYDY et al. |
| 2019 | 2 | Towards Efficient Data Valuation Based On The Shapley Value (IF:6) Highlight: In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory. | RUOXI JIA et al. |
| 2019 | 3 | Lagrange Coded Computing: Optimal Design For Resiliency, Security, And Privacy (IF:6) Highlight: We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. | QIAN YU et al. |
| 2019 | 4 | Sequential Neural Likelihood: Fast Likelihood-free Inference With Autoregressive Flows (IF:6) Highlight: We present Sequential Neural Likelihood (SNL), a new method for Bayesian inference in simulator models, where the likelihood is intractable but simulating data from the model is possible. | George Papamakarios; David Sterratt; Iain Murray; |
| 2019 | 5 | Nonlinear ICA Using Auxiliary Variables And Generalized Contrastive Learning (IF:6) Highlight: Here, we propose a general framework for nonlinear ICA, which, as a special case, can make use of temporal structure. | Aapo Hyvarinen; Hiroaki Sasaki; Richard Turner; |
| 2019 | 6 | On The Convergence Of Stochastic Gradient Descent With Adaptive Stepsizes (IF:6) Highlight: In this paper, we start closing this gap: we theoretically analyze in the convex and non-convex settings a generalized version of the AdaGrad stepsizes. | Xiaoyu Li; Francesco Orabona; |
| 2019 | 7 | Fast And Faster Convergence Of SGD For Over-Parameterized Models And An Accelerated Perceptron (IF:6) Highlight: For such models, we show that the stochastic gradients of common loss functions satisfy a strong growth condition. | Sharan Vaswani; Francis Bach; Mark Schmidt; |
| 2019 | 8 | Sample Complexity Of Sinkhorn Divergences (IF:6) Highlight: We focus in this paper on Sinkhorn divergences (SDs), a regularized variant of OT distances which can interpolate, depending on the regularization strength $\varepsilon$, between OT ($\varepsilon=0$) and MMD ($\varepsilon=\infty$). | Aude Genevay; Lénaïc Chizat; Francis Bach; Marco Cuturi; Gabriel Peyré; |
| 2019 | 9 | Truncated Back-propagation For Bilevel Optimization (IF:5) Highlight: In this paper, we analyze the properties of this family of approximate gradients and establish sufficient conditions for convergence. | Amirreza Shaban; Ching-An Cheng; Nathan Hatch; Byron Boots; |
| 2019 | 10 | Does Data Interpolation Contradict Statistical Optimality? (IF:5) Highlight: We show that classical learning methods interpolating the training data can achieve optimal rates for the problems of nonparametric regression and prediction with square loss. | Mikhail Belkin; Alexander Rakhlin; Alexandre B. Tsybakov; |
| 2019 | 11 | Interaction Matters: A Note On Non-asymptotic Local Convergence Of Generative Adversarial Networks (IF:5) Highlight: Motivated by the pursuit of a systematic computational and algorithmic understanding of Generative Adversarial Networks (GANs), we present a simple yet unified non-asymptotic local convergence theory for smooth two-player games, which subsumes several discrete-time gradient-based saddle point dynamics. | Tengyuan Liang; James Stokes; |
| 2019 | 12 | Derivative-Free Methods For Policy Optimization: Guarantees For Linear Quadratic Systems (IF:5) Highlight: We study derivative-free methods for policy optimization over the class of linear policies. | DHRUV MALIK et al. |
| 2019 | 13 | Evaluating Model Calibration In Classification (IF:5) Highlight: In this work, we develop a general theoretical calibration evaluation framework grounded in probability theory, and point out subtleties present in model calibration evaluation that lead to refined interpretations of existing evaluation techniques. | JUOZAS VAICENAVICIUS et al. |
| 2019 | 14 | Unsupervised Alignment Of Embeddings With Wasserstein Procrustes (IF:5) Highlight: In this paper, we propose to use an alternative formulation, based on the joint estimation of an orthogonal matrix and a permutation matrix. | Edouard Grave; Armand Joulin; Quentin Berthet; |
| 2019 | 15 | Negative Momentum For Improved Game Dynamics (IF:4) Highlight: In this paper, we analyze gradient-based methods with momentum on simple games. | GAUTHIER GIDEL et al. |
| 2018 | 1 | VAE With A VampPrior (IF:7) Highlight: In this paper, we propose to extend the variational auto-encoder (VAE) framework with a new type of prior which we call "Variational Mixture of Posteriors" prior, or VampPrior for short. | Jakub Tomczak; Max Welling; |
| 2018 | 2 | Learning Generative Models With Sinkhorn Divergences (IF:7) Highlight: This paper presents the first tractable method to train large scale generative models using an OT-based loss called Sinkhorn loss which tackles these three issues by relying on two key ideas: (a) entropic smoothing, which turns the original OT loss into a differentiable and more robust quantity that can be computed using Sinkhorn fixed point iterations; (b) algorithmic (automatic) differentiation of these iterations with seamless GPU execution. | Aude Genevay; Gabriel Peyre; Marco Cuturi; |
| 2018 | 3 | Turing: A Language For Flexible Probabilistic Inference (IF:6) Highlight: In this work, we present a system called Turing for building MCMC algorithms for probabilistic programming inference. | Hong Ge; Kai Xu; Zoubin Ghahramani; |
| 2018 | 4 | Parallelised Bayesian Optimisation Via Thompson Sampling (IF:5) Highlight: We design and analyse variations of the classical Thompson sampling (TS) procedure for Bayesian optimisation (BO) in settings where function evaluations are expensive but can be performed in parallel. | Kirthevasan Kandasamy; Akshay Krishnamurthy; Jeff Schneider; Barnabas Poczos; |
| 2018 | 5 | Batched Large-scale Bayesian Optimization In High-dimensional Spaces (IF:5) Highlight: In this paper, we propose ensemble Bayesian optimization (EBO) to address three current challenges in BO simultaneously: (1) large-scale observations; (2) high dimensional input spaces; and (3) selections of batch queries that balance quality and diversity. | Zi Wang; Clement Gehring; Pushmeet Kohli; Stefanie Jegelka; |
| 2018 | 6 | Data-Efficient Reinforcement Learning With Probabilistic Model Predictive Control (IF:5) Highlight: To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). | Sanket Kamthe; Marc Deisenroth; |
| 2018 | 7 | Variational Sequential Monte Carlo (IF:5) Highlight: In this paper we present a new approximating family of distributions, the variational sequential Monte Carlo (VSMC) family, and show how to optimize it in variational inference. | Christian Naesseth; Scott Linderman; Rajesh Ranganath; David Blei; |
| 2018 | 8 | Smooth And Sparse Optimal Transport (IF:4) Highlight: In this paper, we explore regularizing the primal and dual OT formulations with a strongly convex term, which corresponds to relaxing the dual and primal constraints with smooth approximations. | Mathieu Blondel; Vivien Seguy; Antoine Rolet; |
| 2018 | 9 | The Emergence Of Spectral Universality In Deep Networks (IF:4) Highlight: To this end, we leverage powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network’s Jacobian spectrum depends on various hyperparameters including the nonlinearity, the weight and bias distributions, and the depth. | Jeffrey Pennington; Samuel Schoenholz; Surya Ganguli; |
| 2018 | 10 | Conditional Independence Testing Based On A Nearest-neighbor Estimator Of Conditional Mutual Information (IF:4) Abstract: Conditional independence testing is a fundamental problem underlying causal discovery and a particularly challenging task in the presence of nonlinear dependencies. Here a fully … | Jakob Runge; |
| 2018 | 11 | Personalized And Private Peer-to-Peer Machine Learning (IF:4) Highlight: In this paper, we introduce an efficient algorithm to address the above problem in a fully decentralized (peer-to-peer) and asynchronous fashion, with provable convergence rate. | Aurélien Bellet; Rachid Guerraoui; Mahsa Taziki; Marc Tommasi; |
| 2018 | 12 | Policy Evaluation And Optimization With Continuous Treatments (IF:4) Highlight: We study the problem of policy evaluation and learning from batched contextual bandit data when treatments are continuous, going beyond previous work on discrete treatments. | Nathan Kallus; Angela Zhou; |
| 2018 | 13 | Gradient Diversity: A Key Ingredient For Scalable Distributed Learning (IF:4) Highlight: In this work, we present an analysis hinting that high similarity between concurrently processed gradients may be a cause of this performance degradation. | DONG YIN et al. |
| 2018 | 14 | Linear Stochastic Approximation: How Far Does Constant Step-Size And Iterate Averaging Go? (IF:4) Highlight: In this paper, we study a constant step-size averaged linear stochastic approximation (CALSA) algorithm, and for a given class of problems, we ask whether properties of $i)$ a universal constant step-size and $ii)$ a uniform fast rate of $\frac{C}{t}$ for the mean square-error hold for all instances of the class, where the constant $C>0$ does not depend on the problem instance. | Chandrashekar Lakshminarayanan; Csaba Szepesvari; |
| 2018 | 15 | High-Dimensional Bayesian Optimization Via Additive Models With Overlapping Groups (IF:4) Highlight: In this paper, we consider the approach of Kandasamy et al. (2015), in which the high-dimensional function decomposes as a sum of lower-dimensional functions on subsets of the underlying variables. | Paul Rolland; Jonathan Scarlett; Ilija Bogunovic; Volkan Cevher; |
| 2017 | 1 | Communication-Efficient Learning Of Deep Networks From Decentralized Data (IF:9) Highlight: We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. | Brendan McMahan; Eider Moore; Daniel Ramage; Seth Hampson; Blaise Aguera y Arcas; |
| 2017 | 2 | Fast Bayesian Optimization Of Machine Learning Hyperparameters On Large Datasets (IF:7) Highlight: To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset. | Aaron Klein; Stefan Falkner; Simon Bartels; Philipp Hennig; Frank Hutter; |
| 2017 | 3 | Linear Thompson Sampling Revisited (IF:5) Highlight: We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. | Marc Abeille; Alessandro Lazaric; |
| 2017 | 4 | Decentralized Collaborative Learning Of Personalized Models Over Networks (IF:5) Highlight: The question addressed in this paper is: how can agents improve upon their locally trained model by communicating with other agents that have similar objectives? | Paul Vanhaesebrouck; Aurélien Bellet; Marc Tommasi; |
| 2017 | 5 | Bayesian Learning And Inference In Recurrent Switching Linear Dynamical Systems (IF:5) Highlight: Building on switching linear dynamical systems (SLDS), we develop a model class and Bayesian inference algorithms that not only discover these dynamical units but also, by learning how transition probabilities depend on observations or continuous latent states, explain their switching behavior. | SCOTT LINDERMAN et al. |
| 2017 | 6 | Nonlinear ICA Of Temporally Dependent Stationary Sources (IF:5) Highlight: We introduce a nonlinear generative model where the independent sources are assumed to be temporally dependent, non-Gaussian, and stationary, and we observe arbitrarily nonlinear mixtures of them. | Aapo Hyvarinen; Hiroshi Morioka; |
| 2017 | 7 | Beta Calibration: A Well-founded And Easily Implemented Improvement On Logistic Calibration For Binary Classifiers (IF:5) Highlight: In this paper we solve all these problems with a richer class of calibration maps based on the beta distribution. | Meelis Kull; Telmo Silva Filho; Peter Flach; |
| 2017 | 8 | Non-square Matrix Sensing Without Spurious Local Minima Via The Burer-Monteiro Approach (IF:4) Highlight: In this paper, we complement recent findings on the non-convex geometry of the analogous PSD setting [5], and show that matrix factorization does not introduce any spurious local minima, under RIP. | Dohyung Park; Anastasios Kyrillidis; Constantine Caramanis; Sujay Sanghavi; |
| 2017 | 9 | Guaranteed Non-convex Optimization: Submodular Maximization Over Continuous Domains (IF:4) Highlight: Specifically, i) we introduce the weak DR property that gives a unified characterization of submodularity for all set, integer-lattice and continuous functions; ii) for maximizing monotone DR-submodular continuous functions under general down-closed convex constraints, we propose a Frank-Wolfe variant with (1-1/e) approximation guarantee, and sub-linear convergence rate; iii) for maximizing general non-monotone submodular continuous functions subject to box constraints, we propose a DoubleGreedy algorithm with 1/3 approximation guarantee. | Andrew An Bian; Baharan Mirzasoleiman; Joachim Buhmann; Andreas Krause; |
| 2017 | 10 | Inference Compilation And Universal Probabilistic Programming (IF:4) Highlight: We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. | Tuan Anh Le; Atilim Gunes Baydin; Frank Wood; |
| 2017 | 11 | Diverse Neural Network Learns True Target Functions (IF:4) Highlight: In this paper, we answer these questions by analyzing one-hidden-layer neural networks with ReLU activation, and show that despite the non-convexity, neural networks with diverse units have no spurious local minima. | Bo Xie; Yingyu Liang; Le Song; |
| 2017 | 12 | Conjugate-Computation Variational Inference: Converting Variational Inference In Non-Conjugate Models To Inferences In Conjugate Models (IF:4) Highlight: In this paper, we propose a new algorithm called Conjugate-computation Variational Inference (CVI) which brings the best of the two worlds together – it uses conjugate computations for the conjugate terms and employs stochastic gradients for the rest. | Mohammad Khan; Wu Lin; |
| 2017 | 13 | Learning From Conditional Distributions Via Dual Embeddings (IF:4) Highlight: To address these challenges, we propose a novel approach which employs a new min-max reformulation of the learning from conditional distribution problem. | Bo Dai; Niao He; Yunpeng Pan; Byron Boots; Le Song; |
| 2017 | 14 | Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation (IF:4) Highlight: We propose a flexible convex relaxation for the phase retrieval problem that operates in the natural domain of the signal. | Sohail Bahmani; Justin Romberg; |
| 2017 | 15 | Learning Structured Weight Uncertainty In Bayesian Neural Networks (IF:4) Highlight: In this paper, we consider the matrix variate Gaussian (MVG) distribution to model structured correlations within the weights of a DNN. | Shengyang Sun; Changyou Chen; Lawrence Carin; |
| 2016 | 1 | Deep Kernel Learning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. |
Andrew Gordon Wilson; Zhiting Hu; Ruslan Salakhutdinov; Eric P. Xing; |
| 2016 | 2 | Non-stochastic Best Arm Identification And Hyperparameter Optimization IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the task of hyperparameter optimization, we introduce the \em non-stochastic best-arm identification problem. |
Kevin Jamieson; Ameet Talwalkar; |
| 2016 | 3 | How To Learn A Graph From Smooth Signals IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework to learn the graph structure underlying a set of smooth signals. |
Vassilis Kalofolias; |
| 2016 | 4 | Generalizing Pooling Functions In Convolutional Neural Networks: Mixed, Gated, And Tree IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We seek to improve deep neural networks by generalizing the pooling operations that play a central role in current architectures. |
Chen-Yu Lee; Patrick W. Gallagher; Zhuowen Tu; |
| 2016 | 5 | Batch Bayesian Optimization Via Local Penalization IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate this issue and propose a highly effective heuristic based on an estimate of the function’s Lipschitz constant that captures the most important aspect of this interaction–local repulsion–at negligible computational overhead. |
Javier Gonzalez; Zhenwen Dai; Philipp Hennig; Neil Lawrence; |
| 2016 | 6 | A Linearly-Convergent Stochastic L-BFGS Algorithm IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions. |
Philipp Moritz; Robert Nishihara; Michael Jordan; |
| 2016 | 7 | Controlling Bias In Adaptive Data Analysis Using Information Theory IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a general information-theoretic framework to quantify and provably bound the bias of an arbitrary adaptive analysis process. |
Daniel Russo; James Zou; |
| 2016 | 8 | On Sparse Variational Methods And The Kullback-Leibler Divergence Between Stochastic Processes IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we give a substantial generalization of the literature on this topic. |
Alexander G. de G. Matthews; James Hensman; Richard Turner; Zoubin Ghahramani; |
| 2016 | 9 | Breaking Sticks And Ambiguities With Adaptive Skip-gram IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose the Adaptive Skip-gram model which is a nonparametric Bayesian extension of Skip-gram capable of automatically learning the required number of representations for all words at the desired semantic resolution. |
Sergey Bartunov; Dmitry Kondrashkin; Anton Osokin; Dmitry Vetrov; |
| 2016 | 10 | Dreaming More Data: Class-dependent Distributions Over Diffeomorphisms For Learned Data Augmentation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With an eye towards true end-to-end learning, we suggest learning the applied transformations on a per-class basis. |
Søren Hauberg; Oren Freifeld; Anders Boesen Lindbo Larsen; John Fisher; Lars Hansen; |
| 2016 | 11 | Fast Dictionary Learning With A Smoothed Wasserstein Loss IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider in this paper the dictionary learning problem when the observations are normalized histograms of features. |
Antoine Rolet; Marco Cuturi; Gabriel Peyré; |
| 2016 | 12 | Quantization Based Fast Inner Product Search IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a quantization based approach for fast approximate Maximum Inner Product Search (MIPS). |
Ruiqi Guo; Sanjiv Kumar; Krzysztof Choromanski; David Simcha; |
| 2016 | 13 | GLASSES: Relieving The Myopia Of Bayesian Optimisation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present GLASSES: Global optimisation with Look-Ahead through Stochastic Simulation and Expected-loss Search. |
Javier Gonzalez; Michael Osborne; Neil Lawrence; |
| 2016 | 14 | Non-Stationary Gaussian Process Regression With Hamiltonian Monte Carlo IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel approach for non-stationary Gaussian process regression (GPR), where the three key parameters – noise variance, signal variance and lengthscale – can be simultaneously input-dependent. |
Markus Heinonen; Henrik Mannerström; Juho Rousu; Samuel Kaski; Harri Lähdesmäki; |
| 2016 | 15 | Time-Varying Gaussian Process Bandit Optimization IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. |
Ilija Bogunovic; Jonathan Scarlett; Volkan Cevher; |
| 2015 | 1 | Deeply-Supervised Nets IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose deeply-supervised nets (DSN), a method that simultaneously minimizes classification error and improves the directness and transparency of the hidden layer learning process. |
Chen-Yu Lee; Saining Xie; Patrick Gallagher; Zhengyou Zhang; Zhuowen Tu; |
| 2015 | 2 | The Loss Surfaces Of Multilayer Networks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. |
Anna Choromanska; Mikael Henaff; Michael Mathieu; Gerard Ben Arous; Yann LeCun; |
| 2015 | 3 | Scalable Variational Gaussian Process Classification IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show how to scale the model within a variational inducing point framework, out-performing the state of the art on benchmark datasets. |
James Hensman; Alexander Matthews; Zoubin Ghahramani; |
| 2015 | 4 | Falling Rule Lists IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a Bayesian framework for learning falling rule lists that does not rely on traditional greedy decision tree learning methods. |
Fulton Wang; Cynthia Rudin; |
| 2015 | 5 | Efficient Estimation Of Mutual Information For Strongly Dependent Variables IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new estimator that is robust to local non-uniformity, works well with limited data, and is able to capture relationship strengths over many orders of magnitude. |
Shuyang Gao; Greg Ver Steeg; Aram Galstyan; |
| 2015 | 6 | Trend Filtering On Graphs IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a family of adaptive estimators on graphs, based on penalizing the $\ell_1$ norm of discrete graph differences. |
Yu-Xiang Wang; James Sharpnack; Alex Smola; Ryan Tibshirani; |
| 2015 | 7 | DART: Dropouts Meet Multiple Additive Regression Trees IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore a different approach to address the problem, that of employing dropouts, a tool that has been recently proposed in the context of learning deep neural networks. |
Rashmi Korlakai Vinayak; Ran Gilad-Bachrach; |
| 2015 | 8 | Estimation From Pairwise Comparisons: Sharp Minimax Bounds With Topology Dependence IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The Bradley-Terry-Luce (BTL) and Thurstone models are the most widely used parametric models for such pairwise comparison data. |
NIHAR SHAH et. al. |
| 2015 | 9 | A Rate Of Convergence For Mixture Proportion Estimation, With Application To Learning From Noisy Labels IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we establish a rate of convergence for mixture proportion estimation under an appropriate distributional assumption, and argue that this rate of convergence is useful for analyzing weakly supervised learning algorithms that build on MPE. |
Clayton Scott; |
| 2015 | 10 | Deep Exponential Families IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe deep exponential families (DEFs), a class of latent variable models that are inspired by the hidden structures used in deep neural networks. |
Rajesh Ranganath; Linpeng Tang; Laurent Charlin; David Blei; |
| 2015 | 11 | WASP: Scalable Bayes Via Barycenters Of Subset Posteriors IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple, general, and highly efficient approach, which first runs a posterior sampling algorithm in parallel on different machines for subsets of a large data set. |
Sanvesh Srivastava; Volkan Cevher; Quoc Dinh; David Dunson; |
| 2015 | 12 | Infinite Edge Partition Models For Overlapping Community Detection And Link Prediction IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A hierarchical gamma process infinite edge partition model is proposed to factorize the binary adjacency matrix of an unweighted undirected relational network under a Bernoulli-Poisson link. |
Mingyuan Zhou; |
| 2015 | 13 | Toward Minimax Off-policy Value Estimation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. |
Lihong Li; Remi Munos; Csaba Szepesvari; |
| 2015 | 14 | Learning Deep Sigmoid Belief Networks With Data Augmentation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Deep directed generative models are developed. |
Zhe Gan; Ricardo Henao; David Carlson; Lawrence Carin; |
| 2015 | 15 | On Theoretical Properties Of Sum-Product Networks IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we fill some gaps in the theoretic foundation of SPNs. |
Robert Peharz; Sebastian Tschiatschek; Franz Pernkopf; Pedro Domingos; |
| 2014 | 1 | Black Box Variational Inference IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a “black box” variational inference algorithm, one that can be quickly applied to many models with little additional derivation. |
Rajesh Ranganath; Sean Gerrish; David Blei; |
| 2014 | 2 | A New Approach To Probabilistic Programming Inference IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce and demonstrate a new approach to inference in expressive probabilistic programming languages based on particle Markov chain Monte Carlo. |
Frank Wood; Jan Willem Meent; Vikash Mansinghka; |
| 2014 | 3 | Efficient Transfer Learning Method For Automatic Hyperparameter Tuning IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a fast and effective algorithm for automatic hyperparameter tuning that can generalize across datasets. |
Dani Yogatama; Gideon Mann; |
| 2014 | 4 | Student-t Processes As Alternatives To Gaussian Processes IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the Student-t process as an alternative to the Gaussian process as a nonparametric prior over functions. |
Amar Shah; Andrew Wilson; Zoubin Ghahramani; |
| 2014 | 5 | Cluster Canonical Correlation Analysis IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present cluster canonical correlation analysis (cluster-CCA) for joint dimensionality reduction of two sets of data points. |
Nikhil Rasiwasia; Dhruv Mahajan; Vijay Mahadevan; Gaurav Aggarwal; |
| 2014 | 6 | Avoiding Pathologies In Very Deep Networks IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an alternate network architecture which does not suffer from this pathology. |
David Duvenaud; Oren Rippel; Ryan Adams; Zoubin Ghahramani; |
| 2014 | 7 | Heterogeneous Domain Adaptation For Multiple Classes IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an efficient Multi-class Heterogeneous Domain Adaptation (HDA) method, where data from the source and target domains are represented by heterogeneous features with different dimensions. |
Joey Tianyi Zhou; Ivor W.Tsang; Sinno Jialin Pan; Mingkui Tan; |
| 2014 | 8 | On Correlation And Budget Constraints In Model-based Bandit Optimization With Application To Automatic Machine Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a Bayesian approach for this problem and show that it empirically outperforms both the existing frequentist counterpart and other Bayesian optimization methods. |
Matthew Hoffman; Bobak Shahriari; Nando Freitas; |
| 2014 | 9 | In Defense Of Minhash Over Simhash IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we provide a theoretical answer (validated by experiments) that MinHash virtually always outperforms SimHash when the data are binary, as common in practice such as search. |
Anshumali Shrivastava; Ping Li; |
| 2014 | 10 | Mixed Graphical Models Via Exponential Families IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study several instances of our model, and propose scalable M-estimators for recovering the underlying network structure. |
Eunho Yang; Yulia Baker; Pradeep Ravikumar; Genevera Allen; Zhandong Liu; |
| 2014 | 11 | Accelerating ABC Methods Using Gaussian Processes IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Gaussian process (GP) accelerated ABC, which we show can significantly reduce the number of simulations required. |
Richard Wilkinson; |
| 2014 | 12 | Bayesian Nonparametric Poisson Factorization For Recommendation Systems IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a Bayesian nonparametric Poisson factorization model for recommendation systems. |
Prem Gopalan; Francisco J. Ruiz; Rajesh Ranganath; David Blei; |
| 2014 | 13 | Bayesian Multi-Scale Optimistic Optimization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new technique for efficient global optimization that combines Gaussian process confidence bounds and treed simultaneous optimistic optimization to eliminate the need for auxiliary optimization of acquisition functions. |
Ziyu Wang; Babak Shakibi; Lin Jin; Nando Freitas; |
| 2014 | 14 | Explicit Link Between Periodic Covariance Functions And State Space Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper shows how periodic covariance functions in Gaussian process regression can be reformulated as state space models, which can be solved with classical Kalman filtering theory. |
Arno Solin; Simo Särkkä; |
| 2014 | 15 | Visual Boundary Prediction: A Deep Neural Prediction Network And Quality Dissection IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates visual boundary detection, i.e. prediction of the presence of a boundary at a given image location. |
Jyri Kivinen; Chris Williams; Nicolas Heess; |
| 2013 | 1 | Deep Gaussian Processes IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we introduce deep Gaussian process (GP) models. |
Andreas Damianou; Neil Lawrence; |
| 2013 | 2 | Further Optimal Regret Bounds For Thompson Sampling IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a novel regret analysis for Thompson Sampling that proves the first near-optimal problem-independent bound of $O(\sqrt{NT \ln T})$ on the expected regret of this algorithm. |
Shipra Agrawal; Navin Goyal; |
| 2013 | 3 | Learning Social Infectivity In Sparse Low-rank Networks Using Multi-dimensional Hawkes Processes IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a convex optimization approach to discover the hidden network of social influence by modeling the recurrent events at different individuals as multi-dimensional Hawkes processes. |
Ke Zhou; Hongyuan Zha; Le Song; |
| 2013 | 4 | ODE Parameter Inference Using Adaptive Gradient Matching With Gaussian Processes IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The present paper discusses a method based on nonparametric Bayesian statistics with Gaussian processes due to Calderhead et al. (2008), and shows how inference in this model can be substantially improved by consistently sampling from the joint distribution of the ODE parameters and GP hyperparameters. |
Frank Dondelinger; Dirk Husmeier; Simon Rogers; Maurizio Filippone; |
| 2013 | 5 | Clustered Support Vector Machines IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Clustered Support Vector Machine (CSVM), which tackles the data in a divide and conquer manner. |
Quanquan Gu; Jiawei Han; |
| 2013 | 6 | Supervised Sequential Classification Under Budget Constraints IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we develop a framework for a sequential decision making under budget constraints for multi-class classification. |
Kirill Trapeznikov; Venkatesh Saligrama; |
| 2013 | 7 | Distribution-Free Distribution Regression IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we develop theory and methods for distribution-free versions of distribution regression. |
Barnabas Poczos; Aarti Singh; Alessandro Rinaldo; Larry Wasserman; |
| 2013 | 8 | Uncover Topic-Sensitive Information Diffusion Networks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a continuous time model, TopicCascade, for topic-sensitive information diffusion networks, and infer the hidden diffusion networks and the topic dependent transmission rates from the observed time stamps and contents of cascades. |
Nan Du; Le Song; Hyenkyun Woo; Hongyuan Zha; |
| 2013 | 9 | Stochastic Blockmodeling Of Relational Event Dynamics IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Several approaches have recently been proposed for modeling of continuous-time network data via dyadic event rates conditioned on the observed history of events and nodal or dyadic covariates. |
Christopher DuBois; Carter Butts; Padhraic Smyth; |
| 2013 | 10 | Meta-Transportability Of Causal Effects: A Formal Approach IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper considers the problem of transferring experimental findings learned from multiple heterogeneous domains to a different environment, in which only passive observations can be collected. |
Elias Bareinboim; Judea Pearl; |
| 2013 | 11 | Changepoint Detection Over Graphs With The Spectral Scan Statistic IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the change-point detection problem of deciding, based on noisy measurements, whether an unknown signal over a given graph is constant or is instead piecewise constant over two induced subgraphs of relatively low cut size. |
James Sharpnack; Aarti Singh; Alessandro Rinaldo; |
| 2013 | 12 | Thompson Sampling In Switching Environments With Bayesian Online Change Detection IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. |
Joseph Mellor; Jonathan Shapiro; |
| 2013 | 13 | Data-driven Covariate Selection For Nonparametric Estimation Of Causal Effects IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this contribution, we analyze the problem of inferring whether a given variable has a causal effect on another and, if it does, inferring an adjustment set of covariates that yields a consistent and unbiased estimator of this effect, based on the (conditional) independence and dependence relationships among the observed variables. |
Doris Entner; Patrik Hoyer; Peter Spirtes; |
| 2013 | 14 | Convex Collective Matrix Factorization IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing algorithms to estimate parameters of collective matrix factorization models are based on non-convex formulations of the problem; in this paper, a convex formulation of this approach is proposed. |
Guillaume Bouchard; Dawei Yin; Shengbo Guo; |
| 2013 | 15 | Learning Markov Networks With Arithmetic Circuits IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce ACMN, the first ever method for learning efficient Markov networks with arbitrary conjunctive features. |
Daniel Lowd; Amirmohammad Rooshenas; |
| 2012 | 1 | Approximate Inference In Additive Factorial HMMs With Application To Energy Disaggregation IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose an alternative inference method for such models, which exploits their additive structure by 1) looking at the observed difference signal of the observation, 2) incorporating a “robust” mixture component that can account for unmodeled observations, and 3) constraining the posterior to allow at most one hidden state to change at a time. |
J. Zico Kolter; Tommi Jaakkola; |
| 2012 | 2 | Generic Methods For Optimization-Based Modeling IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: “Energy” models for continuous domains can be applied to many problems, but often suffer from high computational expense in training, due to the need to repeatedly minimize the energy function to high accuracy. |
Justin Domke; |
| 2012 | 3 | Joint Learning Of Words And Meaning Representations For Open-Text Semantic Parsing IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method that learns to assign MRs to a wide range of text (using a dictionary of more than 70,000 words mapped to more than 40,000 entities) thanks to a training scheme that combines learning from knowledge bases (e.g. WordNet) with learning from raw text. |
Antoine Bordes; Xavier Glorot; Jason Weston; Yoshua Bengio; |
| 2012 | 4 | On Bayesian Upper Confidence Bounds For Bandit Problems IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show in this paper that methods derived from this second perspective prove optimal when evaluated using the frequentist cumulated regret as a measure of performance. |
Emilie Kaufmann; Olivier Cappe; Aurelien Garivier; |
| 2012 | 5 | Bayesian Classifier Combination IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore a general framework for Bayesian model combination (which differs from model averaging) in the context of classification. |
Hyun-Chul Kim; Zoubin Ghahramani; |
| 2012 | 6 | Random Feature Maps For Dot Product Kernels IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Approximating non-linear kernels using feature maps has gained a lot of interest in recent years due to applications in reducing training and testing times of SVM classifiers and other kernel based learning algorithms. |
Purushottam Kar; Harish Karnick; |
| 2012 | 7 | Beta-Negative Binomial Process And Poisson Factor Analysis IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: A beta-negative binomial (BNB) process is proposed, leading to a beta-gamma-Poisson process, which may be viewed as a “multi-scoop” generalization of the beta-Bernoulli process. … |
Mingyuan Zhou; Lauren Hannah; David Dunson; Lawrence Carin; |
| 2012 | 8 | Online Incremental Feature Learning With Denoising Autoencoders IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an incremental feature learning algorithm to determine the optimal model complexity for large-scale, online datasets based on the denoising autoencoder. |
Guanyu Zhou; Kihyuk Sohn; Honglak Lee; |
| 2012 | 9 | Deep Learning Made Easier By Linear Transformations In Perceptrons IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We transform the outputs of each hidden neuron in a multi-layer perceptron network to have zero activation and zero slope on average, and use separate shortcut connections to model the linear dependencies instead. |
Tapani Raiko; Harri Valpola; Yann LeCun; |
| 2012 | 10 | Controlling Selection Bias In Causal Inference IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper highlights several graphical and algebraic methods capable of mitigating and sometimes eliminating this bias. |
Elias Bareinboim; Judea Pearl; |
| 2012 | 11 | On Estimation And Selection For Topic Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. |
Matt Taddy; |
| 2012 | 12 | Minimax Hypothesis Testing For Curve Registration IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We use the generalized likelihood ratio to build a nonadaptive procedure depending on a tuning parameter, which we choose in an optimal way according to the smoothness of the ambient space. |
Olivier Collier; |
| 2012 | 13 | Online-to-Confidence-Set Conversions And Application To Sparse Stochastic Bandits IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel technique, which we call online-to-confidence-set conversion. |
Yasin Abbasi-Yadkori; David Pal; Csaba Szepesvari; |
| 2012 | 14 | Krylov Subspace Descent For Deep Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a second order optimization method to learn models where both the dimensionality of the parameter space and the number of training samples is high. |
Oriol Vinyals; Daniel Povey; |
| 2012 | 15 | Minimax Rates Of Estimation For Sparse PCA In High Dimensions IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study sparse principal components analysis in the high-dimensional setting, where p (the number of variables) can be much larger than n (the number of observations). |
Vincent Vu; Jing Lei; |
| 2011 | 1 | Deep Sparse Rectifier Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability at zero, creating sparse representations with true zeros which seem remarkably suitable for naturally sparse data. |
Xavier Glorot; Antoine Bordes; Yoshua Bengio; |
| 2011 | 2 | An Analysis Of Single-Layer Networks In Unsupervised Feature Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we will apply several off-the-shelf feature learning algorithms (sparse auto-encoders, sparse RBMs, K-means clustering, and Gaussian mixtures) to CIFAR-10, NORB, and STL datasets using only single-layer networks. |
Adam Coates; Andrew Ng; Honglak Lee; |
| 2011 | 3 | A Reduction Of Imitation Learning And Structured Prediction To No-Regret Online Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no-regret algorithm in an online learning setting. |
Stephane Ross; Geoffrey Gordon; Drew Bagnell; |
| 2011 | 4 | Contextual Bandits With Linear Payoff Functions IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. |
Wei Chu; Lihong Li; Lev Reyzin; Robert Schapire; |
| 2011 | 5 | The Neural Autoregressive Distribution Estimator IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a new approach for modeling the distribution of high-dimensional vectors of discrete variables. |
Hugo Larochelle; Iain Murray; |
| 2011 | 6 | Online Variational Inference For The Hierarchical Dirichlet Process IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an online variational inference algorithm for the HDP, an algorithm that is easily applicable to massive and streaming data. |
Chong Wang; John Paisley; David Blei; |
| 2011 | 7 | Relative Entropy Inverse Reinforcement Learning IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a model-free IRL algorithm, where the relative entropy between the empirical distribution of the state-action trajectories under a baseline policy and their distribution under the learned policy is minimized by stochastic gradient descent. |
Abdeslam Boularias; Jens Kober; Jan Peters; |
| 2011 | 8 | Contextual Bandit Algorithms With Supervised Learning Guarantees IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a modification of the Exp4 algorithm of [Auer et al. 2002] called Exp4.P, which with high probability incurs regret at most $O(\sqrt{KT \ln N})$. |
Alina Beygelzimer; John Langford; Lihong Li; Lev Reyzin; Robert Schapire; |
| 2011 | 9 | Follow-the-Regularized-Leader And Mirror Descent: Equivalence Theorems And L1 Regularization IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The FTRL-Proximal algorithm, which we introduce, can be seen as a hybrid of these two algorithms, and significantly outperforms both on a large, real-world dataset. |
Brendan McMahan; |
| 2011 | 10 | Lightweight Implementations Of Probabilistic Programming Languages Via Transformational Compilation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a general method of transforming arbitrary programming languages into probabilistic programming languages with straightforward MCMC inference engines. |
David Wingate; Andreas Stuhlmueller; Noah Goodman; |
| 2011 | 11 | Deep Learning For Efficient Discriminative Parsing IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new fast purely discriminative algorithm for natural language parsing, based on a “deep” recurrent convolutional graph transformer network (GTN). |
Ronan Collobert; |
| 2011 | 12 | Parallel Gibbs Sampling: From Colored Fields To Thin Junction Trees IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose two methods to construct parallel Gibbs samplers guaranteed to draw from the targeted distribution. |
Joseph Gonzalez; Yucheng Low; Arthur Gretton; Carlos Guestrin; |
| 2011 | 13 | Deep Learners Benefit More From Out-of-Distribution Examples IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent theoretical and empirical work in statistical machine learning has demonstrated the potential of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple levels of representation. |
YOSHUA BENGIO et. al. |
| 2011 | 14 | Multi-Label Output Codes Using Canonical Correlation Analysis IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using canonical correlation analysis, we propose an error-correcting code for multi-label classification. |
Yi Zhang; Jeff Schneider; |
| 2011 | 15 | Empirical Risk Minimization Of Graphical Model Parameters Given Approximate Inference, Decoding, And Model Structure IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Graphical models are often used “inappropriately,” with approximations in the topology, inference, and prediction. |
Veselin Stoyanov; Alexander Ropson; Jason Eisner; |
| 2010 | 1 | Understanding The Difficulty Of Training Deep Feedforward Neural Networks IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence. |
Xavier Glorot; Yoshua Bengio; |
| 2010 | 2 | Noise-contrastive Estimation: A New Estimation Principle For Unnormalized Statistical Models IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new estimation principle for parameterized statistical models. |
Michael Gutmann; Aapo Hyvärinen; |
| 2010 | 3 | Why Does Unsupervised Pre-training Help Deep Learning? IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. |
Dumitru Erhan; Aaron Courville; Yoshua Bengio; Pascal Vincent; |
| 2010 | 4 | Efficient Reductions For Imitation Learning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction. |
Stephane Ross; Drew Bagnell; |
| 2010 | 5 | Bayesian Gaussian Process Latent Variable Model IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. |
Michalis Titsias; Neil D. Lawrence; |
| 2010 | 6 | Elliptical Slice Sampling IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Many probabilistic models introduce strong dependencies between variables using a latent multivariate Gaussian distribution or a Gaussian process. |
Iain Murray; Ryan Adams; David MacKay; |
| 2010 | 7 | Efficient Learning Of Deep Boltzmann Machines IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new approximate inference algorithm for Deep Boltzmann Machines (DBM’s), a generative model with many layers of hidden variables. |
Ruslan Salakhutdinov; Hugo Larochelle; |
| 2010 | 8 | Structured Sparse Principal Component Analysis IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes. |
Rodolphe Jenatton; Guillaume Obozinski; Francis Bach; |
| 2010 | 9 | Contextual Multi-Armed Bandits IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For any $\epsilon > 0$ we present an algorithm with regret $O(T^{\frac{a+b+1}{a+b+2} + \epsilon})$ where $a, b$ are the covering dimensions of the query space and the ad space respectively. |
Tyler Lu; David Pal; Martin Pal; |
| 2010 | 10 | Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a factored 3-way RBM that uses the states of its hidden units to represent abnormalities in the local covariance structure of an image. |
Marc'Aurelio Ranzato; Alex Krizhevsky; Geoffrey Hinton; |
| 2010 | 11 | Learning Bayesian Network Structure Using LP Relaxations IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to solve the combinatorial problem of finding the highest scoring Bayesian network structure from data. |
Tommi Jaakkola; David Sontag; Amir Globerson; Marina Meila; |
| 2010 | 12 | Modeling Annotator Expertise: Learning When Everybody Knows A Bit Of Something IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper develops a probabilistic approach to this problem when annotators may be unreliable (labels are noisy), but also their expertise varies depending on the data they observe (annotators may have knowledge about different parts of the input space). |
YAN YAN et. al. |
| 2010 | 13 | Exclusive Lasso For Multi-task Feature Selection IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a framework of kernel-based multi-task feature selection algorithm based on the proposed exclusive lasso regularizer. |
Yang Zhou; Rong Jin; Steven Chu-Hong Hoi; |
| 2010 | 14 | Gaussian Processes With Monotonicity Information IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A method for using monotonicity information in multivariate Gaussian process regression and classification is proposed. |
Jaakko Riihimäki; Aki Vehtari; |
| 2010 | 15 | Dirichlet Process Mixtures Of Generalized Linear Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLMs), a new method of nonparametric regression that accommodates continuous and categorical inputs, and models the response variable locally by a generalized linear model. |
Lauren Hannah; David Blei; Warren Powell; |
| 2009 | 1 | Deep Boltzmann Machines IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. |
Ruslan Salakhutdinov; Geoffrey Hinton; |
| 2009 | 2 | Variational Learning Of Inducing Variables In Sparse Gaussian Processes IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a variational formulation for sparse approximations that jointly infers the inducing inputs and the kernel hyperparameters by maximizing a lower bound of the true log marginal likelihood. |
Michalis Titsias; |
| 2009 | 3 | Efficient Graphlet Kernels For Large Graph Comparison IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article we propose to compare graphs by counting graphlets, i.e., subgraphs with $k$ nodes where $k \in \{3, 4, 5\}$. |
Nino Shervashidze; SVN Vishwanathan; Tobias Petri; Kurt Mehlhorn; Karsten Borgwardt; |
| 2009 | 4 | Relational Topic Models For Document Networks IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop the relational topic model (RTM), a model of documents and the links between them. |
Jonathan Chang; David Blei; |
| 2009 | 5 | Learning A Parametric Embedding By Preserving Local Structure IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper presents a new unsupervised dimensionality reduction technique, called parametric t-SNE, that learns a parametric mapping between the high-dimensional data space and the low-dimensional latent space. |
Laurens van der Maaten; |
| 2009 | 6 | Handling Sparsity Via The Horseshoe IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a general, fully Bayesian framework for sparse supervised-learning problems based on the horseshoe prior. |
Carlos M. Carvalho; Nicholas G. Polson; James G. Scott; |
| 2009 | 7 | The Difficulty Of Training Deep Architectures And The Effect Of Unsupervised Pre-Training IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Whereas theoretical work suggests that deep architectures might be more efficient at representing highly-varying functions, training deep architectures was unsuccessful until the recent advent of algorithms based on unsupervised pretraining. |
Dumitru Erhan; Pierre-Antoine Manzagol; Yoshua Bengio; Samy Bengio; Pascal Vincent; |
| 2009 | 8 | Optimizing Costly Functions With Simple Constraints: A Limited-Memory Projected Quasi-Newton Algorithm IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: An optimization algorithm for minimizing a smooth function over a convex set is described. |
Mark Schmidt; Ewout Berg; Michael Friedlander; Kevin Murphy; |
| 2009 | 9 | Online Inference Of Topics With Latent Dirichlet Allocation IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce two related Rao-Blackwellized online inference algorithms for the latent Dirichlet allocation (LDA) model – incremental Gibbs samplers and particle filters – and compare their runtime and performance to that of existing algorithms. |
Kevin Canini; Lei Shi; Thomas Griffiths; |
| 2009 | 10 | Latent Force Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modeling with a physical model of the system. |
Mauricio Álvarez; David Luengo; Neil D. Lawrence; |
| 2009 | 11 | A New Perspective For Information Theoretic Feature Selection IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The contribution of this paper is a unifying theoretical understanding of such filters. |
Gavin Brown; |
| 2009 | 12 | Residual Splash For Optimally Parallelizing Belief Propagation IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we focus on parallelizing message passing inference algorithms in graphical models. |
Joseph Gonzalez; Yucheng Low; Carlos Guestrin; |
| 2009 | 13 | Particle Belief Propagation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we describe a generic particle belief propagation (PBP) algorithm which is closely related to previously proposed methods. |
Alexander Ihler; David McAllester; |
| 2009 | 14 | Probabilistic Models For Incomplete Multi-dimensional Arrays IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a probabilistic framework for modeling structural dependency from partially observed multi-dimensional array data, known as pTucker. |
Wei Chu; Zoubin Ghahramani; |
| 2009 | 15 | Sampling Techniques For The Nyström Method IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an analysis of different sampling techniques for the Nyström method. |
Sanjiv Kumar; Mehryar Mohri; Ameet Talwalkar; |
| 2007 | 1 | Learning A Nonlinear Embedding By Preserving Class Neighbourhood Structure IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show how to pretrain and fine-tune a multilayer neural network to learn a nonlinear transformation from the input space to a low-dimensional feature space in which K-nearest neighbour classification performs well. |
Ruslan Salakhutdinov; Geoff Hinton; |
| 2007 | 2 | Hierarchical Beta Processes And The Indian Buffet Process IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that the beta process is the de Finetti mixing distribution underlying the Indian buffet process of [2]. |
Romain Thibaux; Michael I. Jordan; |
| 2007 | 3 | A Stochastic Quasi-Newton Method For Online Convex Optimization IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We are working on analyzing the convergence of online (L)BFGS, and extending it to nonconvex optimization problems. |
Nicol N. Schraudolph; Jin Yu; Simon Günter; |
| 2007 | 4 | Bayesian Inference And Optimal Design In The Sparse Linear Model IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate our framework on a gene network identification task. |
Matthias Seeger; Florian Steinke; Koji Tsuda; |
| 2007 | 5 | Hidden Topic Markov Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given these parameters, the topics of all words in the same document are assumed to be independent. |
Amit Gruber; Yair Weiss; Michal Rosen-Zvi; |
| 2007 | 6 | Stick-breaking Construction For The Indian Buffet Process IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we derive a stick-breaking representation for the IBP. |
Yee Whye Teh; Dilan Görür; Zoubin Ghahramani; |
| 2007 | 7 | Local And Global Sparse Gaussian Process Approximations IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we start by investigating the regimes in which these different approaches work well or fail. |
Edward Snelson; Zoubin Ghahramani; |
| 2007 | 8 | Learning Multilevel Distributed Representations For High-Dimensional Sequences IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a new family of non-linear sequence models that are substantially more powerful than hidden Markov models or linear dynamical systems. |
Ilya Sutskever; Geoffrey Hinton; |
| 2007 | 9 | Exact Bayesian Structure Learning From Uncertain Interventions IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We apply our exact technique to a biological data set that had previously been analyzed using MCMC [SPP+ 05, EW06, WGH06]. |
Daniel Eaton; Kevin Murphy; |
| 2007 | 10 | Generalized Non-metric Multidimensional Scaling IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we look at a generalization of this problem in which only a set of order relations of the form $d_{ij} < d_{kl}$ are provided. |
SAMEER AGARWAL et. al. |
| 2007 | 11 | Metric Learning For Kernel Regression IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we construct a novel algorithm for supervised metric learning, which learns a distance function by directly minimizing the leave-one-out regression error. |
Kilian Q. Weinberger; Gerald Tesauro; |
| 2007 | 12 | Transductive Classification Via Local Learning Regularization IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we adapt it for Transductive Classification (TC) problems. |
Mingrui Wu; Bernhard Schölkopf; |
| 2007 | 13 | Dissimilarity In Graph-Based Semi-Supervised Classification IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a semi-supervised classification algorithm that learns from dissimilarity and similarity information on labeled and unlabeled data. |
Andrew B. Goldberg; Xiaojin Zhu; Stephen Wright; |
| 2007 | 14 | A Unified Energy-Based Framework For Unsupervised Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a view of unsupervised learning that integrates probabilistic and nonprobabilistic methods for clustering, dimensionality reduction, and feature extraction in a unified framework. |
Marc'Aurelio Ranzato; Y-Lan Boureau; Sumit Chopra; Yann LeCun; |
| 2007 | 15 | Learning For Larger Datasets With The Gaussian Process Latent Variable Model IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we apply the latest techniques in sparse Gaussian process regression (GPR) to the Gaussian process latent variable model (GPLVM). |
Neil D. Lawrence; |
| 2005 | 1 | Convergent Tree-reweighted Message Passing for Energy Minimization IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Vladimir Kolmogorov; |
| 2005 | 2 | Hierarchical Probabilistic Neural Network Language Model IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Frederic Morin; Yoshua Bengio; |
| 2005 | 3 | Semi-Supervised Classification By Low Density Separation IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Olivier Chapelle; Alexander Zien; |
| 2005 | 4 | Semisupervised Alignment of Manifolds IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Jihun Ham; Daniel Lee; Lawrence Saul; |
| 2005 | 5 | Kernel Methods for Missing Variables IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Alex J. Smola; S. V. N. Vishwanathan; Thomas Hofmann; |
| 2005 | 6 | Semiparametric Latent Factor Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Yee Whye Teh; Matthias Seeger; Michael I. Jordan; |
| 2005 | 7 | Hilbertian Metrics and Positive Definite Kernels on Probability Measures IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Matthias Hein; Olivier Bousquet; |
| 2005 | 8 | Efficient Non-Parametric Function Induction in Semi-Supervised Learning IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Olivier Delalleau; Yoshua Bengio; Nicolas Le Roux; |
| 2005 | 9 | Nonlinear Dimensionality Reduction By Semidefinite Programming and Kernel Matrix Factorization IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Kilian Weinberger; Benjamin Packer; Lawrence Saul; |
| 2005 | 10 | Loss Functions for Discriminative Training of Energy-Based Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Yann LeCun; Fu Jie Huang; |
| 2005 | 11 | Inadequacy of Interval Estimates Corresponding to Variational Bayesian Approximations IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Bo Wang; D. M. Titterington; |
| 2005 | 12 | On Manifold Regularization IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Misha Belkin; Partha Niyogi; Vikas Sindhwani; |
| 2005 | 13 | Approximate Inference for Infinite Contingent Bayesian Networks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
BRIAN MILCH et. al. |
| 2005 | 14 | Bayesian Conditional Random Fields IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Yuan Qi; Martin Szummer; Tom Minka; |
| 2005 | 15 | Kernel Constrained Covariance for Dependence Measurement IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
ARTHUR GRETTON et. al. |
| 2003 | 1 | Fast Marginal Likelihood Maximisation for Sparse Bayesian Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a new and highly accelerated algorithm which exploits recently-elucidated properties of the marginal likelihood function to enable maximisation via a principled and efficient sequential addition and deletion of candidate basis functions. |
Michael E. Tipping; Anita C. Faul; |
| 2003 | 2 | Fast Forward Selection to Speed Up Sparse Gaussian Process Regression IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection. |
Matthias W. Seeger; Christopher K. I. Williams; Neil D. Lawrence; |
| 2003 | 3 | Planning By Probabilistic Inference IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents and demonstrates a new approach to the problem of planning under uncertainty. |
Hagai Attias; |
| 2003 | 4 | Tree-reweighted Belief Propagation Algorithms and Approximate ML Estimation By Pseudo-Moment Matching IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In previous work [10] we presented a class of upper bounds on the log partition function of an arbitrary undirected graphical model based on solving a convex variational problem. |
Martin J. Wainwright; Tommi S. Jaakkola; Alan S. Willsky; |
| 2003 | 5 | On The Naive Bayes Model for Text Categorization IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We examine the "naive Bayes" assumption in the four models and show that the multinomial model is a modified naive Bayes Poisson model that assumes independence of document length and document class. |
Susana Eyheramendy; David D. Lewis; David Madigan; |
| 2003 | 6 | A Generalized Linear Model for Principal Component Analysis of Binary Data IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we derive an alternating least squares method to estimate the basis vectors and generalized linear coefficients of the logistic PCA model. |
Andrew I. Schein; Lawrence K. Saul; Lyle H. Ungar; |
| 2003 | 7 | A Unifying Theorem for Spectral Embedding and Clustering IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a theorem that explains, for broad classes of affinity matrices and eigenbases, why this works: For successively smaller eigenbases (i.e., using fewer and fewer of the affinity matrix’s dominant eigenvalues and eigenvectors), the angles between "similar" vectors in the new representation shrink while the angles between "dissimilar" vectors grow. |
Matthew Brand; Kun Huang; |
| 2003 | 8 | Super-resolution Enhancement of Video IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of enhancing the resolution of video through the addition of perceptually plausible high frequency information. |
Christopher M. Bishop; Andrew Blake; Bhaskara Marthi; |
| 2003 | 9 | Solving Markov Random Fields Using Semi Definite Programming IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores a new generic method for matching, when there are conditional dependencies between the matches. |
Philip H. S. Torr; |
| 2003 | 10 | Fast Robust Logistic Regression for Large Sparse Datasets with Binary Outputs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper consists of an empirical examination of the first assumption, and surveys, implements and compares techniques by which logistic regression can be scaled to data with millions of attributes and records. |
Paul Komarek; Andrew W. Moore; |
| 2003 | 11 | Rapid Evaluation of Multiple Density Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present a generalization of that algorithm allowing multiple models with different bandwidths to be computed simultaneously, in substantially less time than either running the single-bandwidth algorithm for each model independently, or running the standard exhaustive method. |
Alexander G. Gray; Andrew W. Moore; |
| 2003 | 12 | Curve Clustering with Random Effects Regression Mixtures IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we address the problem of clustering sets of curve or trajectory data generated by groups of objects or individuals. |
Scott Gaffney; Padhraic Smyth; |
| 2003 | 13 | Bayesian Inference in The Presence of Determinism IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the problem of performing inference on Bayesian networks which exhibit a substantial degree of determinism. |
David Larkin; Rina Dechter; |
| 2003 | 14 | Combining Conjugate Direction Methods with Stochastic Approximation of Gradients IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The method of conjugate directions provides a very effective way to optimize large, deterministic systems by gradient descent. |
Nicol N. Schraudolph; Thore Graepel; |
| 2003 | 15 | On Improving The Efficiency of The Iterative Proportional Fitting Procedure IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When the junction tree has an intractably large maximum clique size we propose to maximize an approximate constrained entropy based on region graphs (Yedidia et al., 2002). |
Yee Whye Teh; Max Welling; |
| 2001 | 1 | A Random Walks View of Spectral Segmentation IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new view of clustering and segmentation by pairwise similarities. |
Marina Meila; Jianbo Shi; |
| 2001 | 2 | On The Effectiveness of The Skew Divergence for Statistical Language Analysis IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Many measures of distributional similarity have been proposed for use in the distance-weighted averaging framework; here, we empirically study their stability properties, finding that similarity-based estimation appears to make more efficient use of more reliable portions of the training data. |
Lillian Lee; |
| 2001 | 3 | An Anytime Algorithm for Causal Inference IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, I prove that the FCI algorithm can be interrupted at any stage and asked for output. |
Peter Spirtes; |
| 2001 | 4 | An Improved Training Algorithm for Kernel Fisher Discriminants IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a fast training algorithm for the kernel Fisher discriminant classifier. |
Sebastian Mika; Alexander J. Smola; Bernhard Schölkopf; |
| 2001 | 5 | Learning Bayesian Networks with Mixed Variables IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The paper considers conditional Gaussian networks. As conjugate local priors, we use the Dirichlet distribution for discrete variables and the Gaussian-inverse Gamma distribution … |
Susanne Bøttcher; |
| 2001 | 6 | Using Unsupervised Learning to Guide Resampling in Imbalanced Data Sets IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The purpose of this paper is to argue that methods that equalize class membership are not as effective as possible when applied blindly and that improvements can be obtained by adjusting for the within-class imbalance. |
Adam Nickerson; Nathalie Japkowicz; Evangelos E. Milios; |
| 2001 | 7 | Bagging and The Bayesian Bootstrap IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a Bayesian version of bagging based on the Bayesian bootstrap. |
Merlise Clyde; Herbert Lee; |
| 2001 | 8 | Bayesian Support Vector Regression IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that the Bayesian evidence framework can be applied to both $\epsilon$-support vector regression ($\epsilon$-SVR) and $\nu$-support vector regression ($\nu$-SVR) algorithms. |
Martin H. C. Law; James Tin-Yau Kwok; |
| 2001 | 9 | Why Averaging Classifiers Can Protect Against Overfitting IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study a simple learning algorithm for binary classification. |
Yoav Freund; Yishay Mansour; Robert E. Schapire; |
| 2001 | 10 | A Non-Parametric EM-Style Algorithm for Imputing Missing Values IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an iterative non-parametric algorithm for imputing missing values. |
Rich Caruana; |
| 2001 | 11 | Models for Conditional Probability Tables in Educational Assessment IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Experts in educational assessment can often identify the skills needed to provide a solution for a test item and which patterns of those skills produce better expected performance. |
RUSSELL G. ALMOND et. al. |
| 2001 | 12 | Products of Hidden Markov Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present products of hidden Markov models (PoHMM’s), a way of combining HMM’s to form a distributed state time series model. |
Andrew D. Brown; Geoffrey E. Hinton; |
| 2001 | 13 | A Kernel Approach for Vector Quantization with Guaranteed Distortion Bounds IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a kernel method for vector quantization and clustering. |
Michael E. Tipping; Bernhard Schölkopf; |
| 2001 | 14 | Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Empirical study of sensitivity analysis on a Bayesian network examines the effects of varying the network’s probability parameters on the posterior probabilities of the true hypothesis. |
Oscar Kipersztok; Haiqin Wang; |
| 2001 | 15 | Finding A Path Is Harder Than Finding A Tree IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This note shows that the problem of learning an optimal chain graphical model from data is NP-hard for the Bayesian, maximum likelihood, and minimum description length approaches. |
Christopher Meek; |
| 1999 | 1 | Model Choice: A Minimum Posterior Predictive Loss Approach IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a predictive criterion where the goal is good prediction of a replicate of the observed data but tempered by fidelity to the observed values. |
Sujit Kumar Ghosh; Alan E. Gelfand; |
| 1999 | 2 | Probabilistic Kernel Regression Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a class of flexible conditional probability models and techniques for classification/regression problems. |
Tommi S. Jaakkola; David Haussler; |
| 1999 | 3 | Boosting Methodology for Regression Problems IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we develop a new boosting method for regression problems. |
Greg Ridgeway; David Madigan; Thomas S. Richardson; |
| 1999 | 4 | Pattern Discovery Via Entropy Minimization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework for learning hidden-variable models by optimizing entropies, in which entropy minimization, posterior maximization, and free energy minimization are all equivalent. |
Matthew Brand; |
| 1999 | 5 | Stochastic Local Search for Bayesian Networks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper evaluates empirically the suitability of Stochastic Local Search algorithms (SLS) for finding most probable explanations in Bayesian networks. |
Kalev Kask; Rina Dechter; |
| 1999 | 6 | Modeling Decision Tree Performance with The Power Law IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper discusses the use of a power law to predict decision tree performance. |
Lewis J. Frey; Douglas H. Fisher; |
| 1999 | 7 | Conditional Products: An Alternative Approach to Conditional Independence IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new abstract approach to the study of conditional independence, founded on a concept analogous to the factorization properties of probabilistic independence, rather than the separation properties of a graph. |
A. Philip Dawid; Milan Studený; |
| 1999 | 8 | A Note on The Comparison of Polynomial Selection Methods IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present a detailed empirical evaluation of three model selection methods which include an MML based approach and two SRM based methods. |
Murlikrishna Viswanathan; Chris S. Wallace; |
| 1999 | 9 | Efficient Learning Using Constrained Sufficient Statistics IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new method for speeding up the computational process of learning Bayesian network structure. |
Nir Friedman; Lise Getoor; |
| 1997 | 1 | A Variational Approach to Bayesian Logistic Regression Models and Their Extensions IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider a logistic regression model with a Gaussian prior distribution over the parameters. |
Tommi S. Jaakkola; Michael I. Jordan; |
| 1997 | 2 | The Effects of Training Set Size on Decision Tree Complexity IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents experiments with 19 datasets and 5 decision tree pruning algorithms that show that increasing training set size often results in a linear increase in tree size, even when that additional complexity results in no significant increase in classification accuracy. |
Tim Oates; David Jensen; |
| 1997 | 3 | A Polynomial Time Algorithm for Determining DAG Equivalence in The Presence of Latent Variables and Selection Bias IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Peter Spirtes; Thomas S. Richardson; |
| 1997 | 4 | Comparing Predictive Inference Methods for Discrete Domains IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Predictive inference is seen here as the process of determining the predictive distribution of a discrete variable, given a data set of training examples and the values for the other problem domain variables. We consider three approaches for computing this predictive distribution, and assume that the joint probability distribution for the variables belongs to a set of distributions determined by a set of parametric models. |
Petri Kontkanen; Petri Myllymäki; Tomi Silander; Henry Tirri; Peter Grünwald; |
| 1997 | 5 | A Comparison of Scientific and Engineering Criteria for Bayesian Model Selection IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The approximation is useful because it is computationally efficient, and because it provides a model that facilitates understanding of the domain. |
David Heckerman; David Maxwell Chickering; |
| 1997 | 6 | A Family of Algorithms for Finding Temporal Structure in Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new member of that family for handling event-based data, and offer an empirical characterization of a time series based algorithm. |
Tim Oates; Matthew J. Schmill; David Jensen; Paul R. Cohen; |
| 1997 | 7 | An Algorithm for Bayesian Network Construction from Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an efficient algorithm for constructing Bayesian belief networks from databases. |
Jie Cheng; David A. Bell; Weiru Liu; |
| 1997 | 8 | PAC Learning with Constant-Partition Classification Noise and Applications to Decision Tree Induction IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we introduce a new model of noise called \emph{constant-partition classification noise} (CPCN) which generalizes the standard model of classification noise to allow different examples to have different rates of random misclassification. |
Scott E. Decatur; |
| 1997 | 9 | Robust Parameter Learning in Bayesian Networks with Missing Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a new method to learn the probabilities defining a BBNs from databases with missing data. |
Marco Ramoni; Paola Sebastiani; |
| 1997 | 10 | A Forward Monte Carlo Method for Solving Influence Diagrams Using Local Computation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main goal of this paper is to describe a Monte Carlo method for solving influence diagrams using local computation. |
John M. Charnes; Prakash P. Shenoy; |
| 1997 | 11 | Heuristic Greedy Search Algorithms for Latent Variable Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we will describe how to extend search algorithms developed for non-latent variable DAG models to the case of DAG models with latent variables. |
Peter Spirtes; Thomas S. Richardson; Christopher Meek; |
| 1997 | 12 | Using Prediction to Improve Combinatorial Optimization Search IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes a statistical approach to improving the performance of stochastic search algorithms for optimization. |
Justin A. Boyan; Andrew W. Moore; |
| 1997 | 13 | Conceptual Clustering with Numeric-and-Nominal Mixed Data – A New Similarity Based System IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new Similarity Based Agglomerative Clustering (SBAC) algorithm that works well for data with mixed numeric and nominal features. |
Cen Li; Gautam Biswas; |
| 1997 | 14 | MML Mixture Modelling of Multi-state, Poisson, VonMises Circular and Gaussian Distributions IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a brief overview of MML inductive inference (Wallace and Boulton (1968) , Wallace and Freeman (1987)), and how it has both an information-theoretic and a Bayesian interpretation. |
Chris S. Wallace; David L. Dowe; |
| 1997 | 15 | Mixed Memory Markov Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider how to parameterize Markov models with prohibitively large state spaces. |
Lawrence K. Saul; Michael I. Jordan; |