Most Influential AISTATS Papers (2026-03 Version)
To search or review AISTATS papers on a specific topic, please use the search by venue (AISTATS) and review by venue (AISTATS) services. To browse the most productive AISTATS authors, ranked by number of accepted papers, see the list of most productive AISTATS authors grouped by year.
As a pioneer in the field since 2018, Paper Digest has curated thousands of such lists, drawing on years of accumulated data across decades of conferences and research topics. To ensure users never miss a breakthrough, our daily digest service sifts through tens of thousands of new papers, clinical trials, news articles, and community posts every day, delivering only what matters most to each user's specific interests. Beyond discovery, Paper Digest offers built-in research tools to help users read and write articles, get answers, conduct literature reviews, and generate research reports more efficiently.
Paper Digest Team
New York City, New York, 10017
TABLE 1: Most Influential AISTATS Papers (2026-03 Version)
| Year | Rank | Paper | Author(s) |
|---|---|---|---|
| 2025 | 1 | What and How Does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization IF:3 Highlight: In this work, we conduct a comprehensive study to understand ICL from a statistical perspective. | Yufeng Zhang; Fengzhuo Zhang; Zhuoran Yang; Zhaoran Wang |
| 2025 | 2 | Ant Colony Sampling with GFlowNets for Combinatorial Optimization IF:3 Highlight: We present the Generative Flow Ant Colony Sampler (GFACS), a novel meta-heuristic method that hierarchically combines amortized inference and parallel stochastic search. | Minsu Kim et al. |
| 2025 | 3 | Convergence Analysis for General Probability Flow ODEs of Diffusion Models in Wasserstein Distances IF:3 Highlight: In this paper, we provide the first non-asymptotic convergence analysis for a general class of probability flow ODE samplers in 2-Wasserstein distance, assuming accurate score estimates and smooth log-concave data distributions. | Xuefeng Gao; Lingjiong Zhu |
| 2025 | 4 | An Iterative Algorithm for Rescaled Hyperbolic Functions Regression IF:3 Highlight: In this paper, we propose an iterative algorithm to solve a rescaled version of a slightly different formulation of the softmax regression problem that arises in attention mechanisms of large language models. | Yeqi Gao; Zhao Song; Junze Yin |
| 2025 | 5 | From Gradient Clipping to Normalization for Heavy Tailed SGD IF:3 Highlight: Even with this knowledge, current sample complexity upper bounds for the method are sub-optimal in nearly all parameters. To address these issues, and motivated by practical observations, we connect gradient clipping to its close relative, Normalized SGD (NSGD), and study its convergence properties. | Florian Hübler; Ilyas Fatkhullin; Niao He |
| 2025 | 6 | InfoNCE: Identifying The Gap Between Theory and Practice IF:3 Highlight: Hence, a more realistic assumption is that all latent factors change with a continuum of variability across all factors. We introduce AnInfoNCE, a generalization of InfoNCE that can provably uncover the latent factors in this anisotropic setting, broadly generalizing previous identifiability results in CL. | Evgenia Rusak et al. |
| 2025 | 7 | The Local Learning Coefficient: A Singularity-Aware Complexity Measure IF:3 Highlight: Recognizing the limitations of traditional complexity measures, the LLC leverages Singular Learning Theory (SLT), which has long recognized the significance of singularities in the loss landscape geometry. This paper provides an extensive exploration of the LLC's theoretical underpinnings, offering both a clear definition and intuitive insights into its application. | Edmund Lau; Zach Furman; George Wang; Daniel Murfet; Susan Wei |
| 2025 | 8 | Diffusion Models As Constrained Samplers for Optimization with Unknown Constraints IF:3 Highlight: Depending on the differentiability of the objective function, we propose two different sampling methods. | Lingkai Kong et al. |
| 2025 | 9 | Bypassing The Exponential Dependency: Looped Transformers Efficiently Learn In-context By Multi-step Gradient Descent IF:3 Highlight: In this paper, we study linear looped Transformers for in-context learning on linear vector generation tasks. | Bo Chen; Xiaoyu Li; Yingyu Liang; Zhenmei Shi; Zhao Song |
| 2025 | 10 | Towards A Mathematical Theory for Consistency Training in Diffusion Models IF:3 Highlight: Consistency models, which were proposed to mitigate the high computational overhead during the sampling phase of diffusion models, facilitate single-step sampling while attaining state-of-the-art empirical performance. When integrated into the training phase, consistency models attempt to train a sequence of consistency functions capable of mapping any point at any time step of the diffusion process to its starting point. | Gen Li; Zhihan Huang; Yuting Wei |
| 2025 | 11 | SteinDreamer: Variance Reduction for Text-to-3D Score Distillation Via Stein Identity IF:3 Highlight: In this paper, we reveal that gradient estimation in score distillation inherently suffers from high variance. | Peihao Wang et al. |
| 2025 | 12 | Looped ReLU MLPs May Be All You Need As Practical Programmable Computers IF:3 Highlight: In this work, we provide an affirmative answer: a looped 23-layer $\mathsf{ReLU}$-$\mathsf{MLP}$ is capable of performing the basic necessary operations, functioning more efficiently and effectively as a programmable computer than a looped Transformer. | Yingyu Liang; Zhizhou Sha; Zhenmei Shi; Zhao Song; Yufa Zhou |
| 2025 | 13 | Implicit Diffusion: Efficient Optimization Through Stochastic Sampling IF:3 Highlight: We introduce a general framework and a new algorithm for first-order optimization of parameterized stochastic diffusions, performing optimization and sampling steps jointly in a single loop. | Pierre Marion et al. |
| 2025 | 14 | Fourier Circuits in Neural Networks and Transformers: A Case Study of Modular Arithmetic with Multiple Inputs IF:3 Highlight: We direct our focus to the complex algebraic learning task of modular addition involving $k$ inputs. Our research presents a thorough analytical characterization of the features learned by stylized one-hidden-layer neural networks and one-layer Transformers in addressing this task. | Chenyang Li; Yingyu Liang; Zhenmei Shi; Zhao Song; Tianyi Zhou |
| 2025 | 15 | Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment IF:3 Highlight: In this paper, we investigate the vulnerability of LLMs aligned using two widely used methods, DPO and PPO, to membership inference attacks (MIAs). | Qizhang Feng et al. |
| 2024 | 1 | A General Theoretical Paradigm to Understand Learning from Human Preferences IF:7 Highlight: However, this method still heavily relies on the first approximation. In this paper we try to gain a deeper theoretical understanding of these practical algorithms. | Mohammad Gheshlaghi Azar et al. |
| 2024 | 2 | Multi-resolution Time-Series Transformer for Long-term Forecasting IF:3 Highlight: The patch size controls the ability of transformers to learn temporal patterns at different frequencies: shorter patches are effective for learning localized, high-frequency patterns, whereas mining long-term seasonalities and trends requires longer patches. Inspired by this observation, we propose a novel framework, Multi-resolution Time-Series Transformer (MTST), which consists of a multi-branch architecture for simultaneous modeling of diverse temporal patterns at different resolutions. | Yitian Zhang; Liheng Ma; Soumyasundar Pal; Yingxue Zhang; Mark Coates |
| 2024 | 3 | Generating and Imputing Tabular Data Via Diffusion and Flow-based Gradient-Boosted Trees IF:3 Highlight: This paper introduces a novel approach for generating and imputing mixed-type (continuous and categorical) tabular data using score-based diffusion and conditional flow matching. | Alexia Jolicoeur-Martineau; Kilian Fatras; Tal Kachman |
| 2024 | 4 | Generative Flow Networks As Entropy-Regularized RL IF:3 Highlight: Our work extends the connection between RL and GFlowNets to a general case. | Daniil Tiapkin; Nikita Morozov; Alexey Naumov; Dmitry P Vetrov |
| 2024 | 5 | Mechanics of Next Token Prediction with Self-Attention IF:3 Highlight: In this work, we ask: What does a single self-attention layer learn from next-token prediction? | Yingcong Li; Yixiao Huang; Muhammed E. Ildiz; Ankit Singh Rawat; Samet Oymak |
| 2024 | 6 | Mixture-of-Linear-Experts for Long-term Time Series Forecasting IF:3 Highlight: However, due to their inherent simplicity, they are not able to adapt their prediction rules to periodic changes in time series patterns. To address this challenge, we propose a Mixture-of-Experts-style augmentation for linear-centric models: Mixture-of-Linear-Experts (MoLE). | Ronghao Ni; Zinan Lin; Shuaiqi Wang; Giulia Fanti |
| 2024 | 7 | Identifying Spurious Biases Early in Training Through The Lens of Simplicity Bias IF:3 Highlight: In this work, we provide the first theoretical analysis of the effect of simplicity bias on learning spurious correlations. | Yu Yang; Eric Gan; Gintare Karolina Dziugaite; Baharan Mirzasoleiman |
| 2024 | 8 | Functional Flow Matching IF:3 Highlight: We propose Functional Flow Matching (FFM), a function-space generative model that generalizes the recently introduced Flow Matching model to operate directly in infinite-dimensional spaces. | Gavin Kerrigan; Giosue Migliorini; Padhraic Smyth |
| 2024 | 9 | Understanding Generalization of Federated Learning Via Stability: Heterogeneity Matters IF:3 Highlight: In this paper, we analyze the generalization performance of federated learning by means of algorithmic stability, which measures the change in the output model of an algorithm when one data point is perturbed. | Zhenyu Sun; Xiaochun Niu; Ermin Wei |
| 2024 | 10 | Theoretically Grounded Loss Functions and Algorithms for Score-Based Multi-Class Abstention IF:3 Highlight: In this paper, we analyze the score-based formulation of learning with abstention in the multi-class classification setting. | Anqi Mao; Mehryar Mohri; Yutao Zhong |
| 2024 | 11 | Adaptivity of Diffusion Models to Manifold Structures IF:3 Highlight: Empirical studies have demonstrated the effectiveness of (score-based) diffusion models in generating high-dimensional data, such as texts and images, which typically exhibit a low-dimensional manifold structure. | Rong Tang; Yun Yang |
| 2024 | 12 | Parameter-Agnostic Optimization Under Relaxed Smoothness IF:3 Highlight: However, when the assumption is relaxed to the more realistic $(L_0, L_1)$-smoothness, all existing convergence results still require tuning of the stepsize. In this study, we demonstrate that Normalized Stochastic Gradient Descent with Momentum (NSGD-M) can achieve a (nearly) rate-optimal complexity without prior knowledge of any problem parameter, though this comes at the cost of an exponential term in $L_1$ in the complexity. | Florian Hübler; Junchi Yang; Xiang Li; Niao He |
| 2024 | 13 | Quantifying Uncertainty in Natural Language Explanations of Large Language Models IF:3 Highlight: In this work, we make one of the first attempts at quantifying the uncertainty in explanations of LLMs. | Sree Harsha Tanneru; Chirag Agarwal; Himabindu Lakkaraju |
| 2024 | 14 | Maximum Entropy GFlowNets with Soft Q-learning IF:3 Highlight: This paper addresses the connection by constructing an appropriate reward function, thereby establishing an exact relationship between GFNs and maximum entropy RL. | Sobhan Mohammadpour; Emmanuel Bengio; Emma Frejinger; Pierre-Luc Bacon |
| 2024 | 15 | Fair Machine Unlearning: Data Removal While Mitigating Disparities IF:3 Highlight: In this work, we demonstrate that most efficient unlearning methods cannot accommodate popular fairness interventions, and we propose the first fair machine unlearning method that can efficiently unlearn data instances from a fair objective. | Alex Oesterling; Jiaqi Ma; Flavio Calmon; Himabindu Lakkaraju |
| 2023 | 1 | TabLLM: Few-shot Classification of Tabular Data with Large Language Models IF:6 Highlight: We evaluate several serialization methods, including templates, table-to-text models, and large language models. | Stefan Hegselmann et al. |
| 2023 | 2 | Data Banzhaf: A Robust Data Valuation Framework for Machine Learning IF:4 Highlight: We propose an algorithm to efficiently estimate the Banzhaf value based on the Maximum Sample Reuse (MSR) principle. | Jiachen T. Wang; Ruoxi Jia |
| 2023 | 3 | Dueling RL: Reinforcement Learning with Trajectory Preferences IF:4 Highlight: Assuming the transition model is known, we propose an algorithm with a regret guarantee of $\tilde{\mathcal{O}}\left( SH d \log (T / \delta) \sqrt{T} \right)$. | Aadirupa Saha; Aldo Pacchiano; Jonathan Lee |
| 2023 | 4 | Federated Learning Under Distributed Concept Drift IF:3 Highlight: We identify the problem of drift adaptation as a time-varying clustering problem, and we propose two new clustering algorithms for reacting to drifts based on local drift detection and hierarchical clustering. | Ellango Jothimurugesan; Kevin Hsieh; Jianyu Wang; Gauri Joshi; Phillip B. Gibbons |
| 2023 | 5 | Fixing By Mixing: A Recipe for Optimal Byzantine ML Under Heterogeneity IF:3 Highlight: We obtain similar guarantees (in expectation) by plugging NNM into the distributed stochastic heavy ball method, a practical substitute for distributed gradient descent. | Youssef Allouah et al. |
| 2023 | 6 | Do Bayesian Neural Networks Need To Be Fully Stochastic? IF:3 Highlight: We investigate the benefit of treating all the parameters in a Bayesian neural network stochastically and find compelling theoretical and empirical evidence that this standard construction may be unnecessary. | Mrinank Sharma; Sebastian Farquhar; Eric Nalisnick; Tom Rainforth |
| 2023 | 7 | Who Should Predict? Exact Algorithms For Learning to Defer to Humans IF:3 Highlight: In this work, we jointly train a classifier with a rejector, which decides on each data point whether the classifier or the human should predict. | Hussein Mozannar et al. |
| 2023 | 8 | Membership Inference Attacks Against Synthetic Data Through Overfitting Detection IF:3 Highlight: In this work we argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution. | Boris van Breugel; Hao Sun; Zhaozhi Qian; Mihaela van der Schaar |
| 2023 | 9 | From Shapley Values to Generalized Additive Models and Back IF:3 Highlight: We introduce $n$-Shapley Values, a parametric family of local post-hoc explanation algorithms that explain individual predictions with interaction terms up to order $n$. | Sebastian Bordt; Ulrike von Luxburg |
| 2023 | 10 | Reinforcement Learning for Adaptive Mesh Refinement IF:3 Highlight: We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning (RL) to train refinement policies directly from simulation. | Jiachen Yang et al. |
| 2023 | 11 | Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods IF:3 Highlight: In this paper, we propose a unified convergence analysis that covers a large variety of stochastic gradient descent-ascent methods, which so far have required different intuitions, have different applications, and have been developed separately in various communities. | Aleksandr Beznosikov; Eduard Gorbunov; Hugo Berard; Nicolas Loizou |
| 2023 | 12 | Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data IF:3 Highlight: A common approach to building such models is to use contrastive learning on paired data across the two modalities, as exemplified by Contrastive Language-Image Pre-Training (CLIP). In this paper, (i) we initiate the investigation of a general class of nonlinear loss functions for multimodal contrastive learning (MMCL), including the CLIP loss, and show its connection to singular value decomposition (SVD). | Ryumei Nakada et al. |
| 2023 | 13 | Learning to Defer to Multiple Experts: Consistent Surrogate Losses, Confidence Calibration, and Conformal Ensembles IF:3 Highlight: In particular, we address the open problems of deriving a consistent surrogate loss, confidence calibration, and principled ensembling of experts. | Rajeev Verma; Daniel Barrejon; Eric Nalisnick |
| 2023 | 14 | Improved Sample Complexity Bounds for Distributionally Robust Reinforcement Learning IF:3 Highlight: We focus on the tabular episodic learning setting where the algorithm has access to a generative model of the nominal (training) environment around which the uncertainty set is defined. We propose the Robust Phased Value Learning (RPVL) algorithm to solve this problem for uncertainty sets specified by four different divergences: total variation, chi-square, Kullback-Leibler, and Wasserstein. | Zaiyan Xu; Kishan Panaganti; Dileep Kalathil |
| 2023 | 15 | Generalization in Graph Neural Networks: Improved PAC-Bayesian Bounds on Graph Diffusion IF:3 Highlight: In this paper, we present generalization bounds that instead scale with the largest singular value of the graph neural network's feature diffusion matrix. | Haotian Ju; Dongyue Li; Aneesh Sharma; Hongyang R. Zhang |
| 2022 | 1 | Federated Learning with Buffered Asynchronous Aggregation IF:6 Highlight: However, aggregating individual client updates is incompatible with Secure Aggregation, which could result in an undesirable level of privacy for the system. To address these concerns, we propose a novel buffered asynchronous aggregation method, FedBuff, that is agnostic to the choice of optimizer and combines the best properties of synchronous and asynchronous FL. | John Nguyen et al. |
| 2022 | 2 | Towards Understanding Biased Client Selection in Federated Learning IF:5 Highlight: We show that biasing client selection towards clients with higher local loss yields faster error convergence. From this insight, we propose Power-of-Choice, a communication- and computation-efficient client selection framework that flexibly spans the trade-off between convergence speed and solution bias. | Yae Jee Cho; Jianyu Wang; Gauri Joshi |
| 2022 | 3 | Beta Shapley: A Unified and Noise-reduced Data Valuation Framework for Machine Learning IF:4 Highlight: In this paper, we propose Beta Shapley, a substantial generalization of Data Shapley. | Yongchan Kwon; James Zou |
| 2022 | 4 | CF-GNNExplainer: Counterfactual Explanations for Graph Neural Networks IF:4 Highlight: However, such methods are not counterfactual (CF) in nature: given a prediction, we want to understand how the prediction can be changed in order to achieve an alternative outcome. In this work, we propose a method for generating CF explanations for GNNs: the minimal perturbation to the input (graph) data such that the prediction changes. | Ana Lucic; Maartje A. Ter Hoeve; Gabriele Tolomei; Maarten De Rijke; Fabrizio Silvestri |
| 2022 | 5 | Differentially Private Federated Learning on Heterogeneous Data IF:4 Highlight: Federated Learning (FL) is a paradigm for large-scale distributed learning which faces two key challenges: (i) training efficiently from highly heterogeneous user data, and (ii) protecting the privacy of participating users. In this work, we propose a novel FL approach (DP-SCAFFOLD) to tackle these two challenges together by incorporating Differential Privacy (DP) constraints into the popular SCAFFOLD algorithm. | Maxence Noble; Aurélien Bellet; Aymeric Dieuleveut |
| 2022 | 6 | Optimal Accounting of Differential Privacy Via Characteristic Function IF:4 Highlight: Algorithmically, we propose an analytical Fourier accountant that represents the complex logarithm of $\phi$-functions symbolically and uses Gaussian quadrature for numerical computation. | Yuqing Zhu; Jinshuo Dong; Yu-Xiang Wang |
| 2022 | 7 | Sinkformers: Transformers with Doubly Stochastic Attention IF:4 Highlight: In this paper, we propose instead to use Sinkhorn's algorithm to make attention matrices doubly stochastic. | Michael E. Sander; Pierre Ablin; Mathieu Blondel; Gabriel Peyré |
| 2022 | 8 | SparseFed: Mitigating Model Poisoning Attacks in Federated Learning with Sparsification IF:4 Highlight: In this paper we introduce SparseFed, a novel defense that uses global top-k update sparsification and device-level gradient clipping to mitigate model poisoning attacks. | Ashwinee Panda; Saeed Mahloujifar; Arjun Nitin Bhagoji; Supriyo Chakraborty; Prateek Mittal |
| 2022 | 9 | Generative Models As Distributions of Functions IF:4 Highlight: In this paper, we abandon discretized grids and instead parameterize individual data points by continuous functions. | Emilien Dupont; Yee Whye Teh; Arnaud Doucet |
| 2022 | 10 | Mitigating Bias in Calibration Error Estimation IF:4 Highlight: We propose a framework through which we can compute the bias of a particular estimator for an evaluation data set of a given size. | Rebecca Roelofs; Nicholas Cain; Jonathon Shlens; Michael C. Mozer |
| 2022 | 11 | Proximal Optimal Transport Modeling of Population Dynamics IF:4 Highlight: We propose a new approach to model the collective dynamics of a population of particles evolving with time. | Charlotte Bunne; Laetitia Papaxanthos; Andreas Krause; Marco Cuturi |
| 2022 | 12 | Federated Reinforcement Learning with Environment Heterogeneity IF:4 Highlight: We study the Federated Reinforcement Learning (FedRL) problem, in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction. In this paper, we stress the constraint of environment heterogeneity, meaning the $n$ environments corresponding to these $n$ agents have different state transitions. | Hao Jin; Yang Peng; Wenhao Yang; Shusen Wang; Zhihua Zhang |
| 2022 | 13 | Performative Prediction in A Stateful World IF:4 Highlight: We propose a theoretical framework where the response of a target population to the deployed classifier is modeled as a function of the classifier and the current state (distribution) of the population. | Gavin Brown; Shlomi Hod; Iden Kalemaj |
| 2022 | 14 | Sample Complexity of Robust Reinforcement Learning with A Generative Model IF:3 Highlight: In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy when the nominal model is unknown. | Kishan Panaganti; Dileep Kalathil |
| 2022 | 15 | MT3: Meta Test-Time Training for Self-Supervised Test-Time Adaption IF:3 Highlight: We combine meta-learning, self-supervision, and test-time training to learn to adapt to unseen test distributions. | Alexander Bartler; Andre Bühler; Felix Wiewel; Mario Döbler; Bin Yang |
| 2021 | 1 | Federated Learning with Compression: Unified Analysis and Sharp Guarantees IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To mitigate data heterogeneity, we introduce a local gradient tracking scheme and obtain sharp convergence rates that match the best-known communication complexities without compression for convex, strongly convex, and nonconvex settings. |
Farzin Haddadpour; Mohammad Mahdi Kamani; Aryan Mokhtari; Mehrdad Mahdavi; |
| 2021 | 2 | Benchmarking Simulation-Based Inference IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This has made it difficult to compare algorithms and identify their strengths and weaknesses.We set out to fill this gap: We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms including recent approaches employing neural networks and classical Approximate Bayesian Computation methods. |
Jan-Matthis Lueckmann; Jan Boelts; David Greenberg; Pedro Goncalves; Jakob Macke; |
| 2021 | 3 | Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. |
Nicolas Loizou; Sharan Vaswani; Issam Hadj Laradji; Simon Lacoste-Julien; |
| 2021 | 4 | Shuffled Model of Differential Privacy in Federated Learning IF:4 Highlight: We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements, motivated by the federated learning (FL) framework. We propose a distributed communication-efficient and local differentially private stochastic gradient descent (CLDP-SGD) algorithm and analyze its communication, privacy, and convergence trade-offs. | Antonious Girgis; Deepesh Data; Suhas Diggavi; Peter Kairouz; Ananda Theertha Suresh |
| 2021 | 5 | Provably Efficient Safe Exploration Via Primal-Dual Policy Optimization IF:4 Highlight: To this end, we present an Optimistic Primal-Dual Proximal Policy Optimization (OPDOP) algorithm where the value function is estimated by combining the least-squares policy evaluation and an additional bonus term for safe exploration. | Dongsheng Ding; Xiaohan Wei; Zhuoran Yang; Zhaoran Wang; Mihailo Jovanovic |
| 2021 | 6 | Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms IF:4 Highlight: To do so, a multitude of model-agnostic, nonparametric meta-learners have been proposed in recent years. Such learners decompose the treatment effect estimation problem into separate sub-problems, each solvable using standard supervised learning methods. | Alicia Curth; Mihaela van der Schaar |
| 2021 | 7 | Free-rider Attacks on Model Aggregation in Federated Learning IF:4 Highlight: This kind of attack is critical in sensitive applications of federated learning when data is scarce and the model has high commercial value. We introduce here the first theoretical and experimental analysis of free-rider attacks on federated learning schemes based on iterative parameter aggregation, such as FedAvg or FedProx, and provide formal guarantees for these attacks to converge to the aggregated models of the fair participants. | Yann Fraboni; Richard Vidal; Marco Lorenzi |
| 2021 | 8 | Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization IF:4 Highlight: The use of min-max optimization in the adversarial training of deep neural network classifiers and the training of generative adversarial networks has motivated the study of nonconvex-nonconcave optimization objectives, which frequently arise in these applications. | Jelena Diakonikolas; Constantinos Daskalakis; Michael I. Jordan |
| 2021 | 9 | On Information Gain and Regret Bounds in Gaussian Process Bandits IF:4 Highlight: We provide general bounds on $\gamma_T$ based on the decay rate of the eigenvalues of the GP kernel, whose specialisation for commonly used kernels improves the existing bounds on $\gamma_T$, and subsequently the regret bounds relying on $\gamma_T$ under numerous settings. | Sattar Vakili; Kia Khezeli; Victor Picheny |
| 2021 | 10 | Improving Predictions of Bayesian Neural Nets Via Local Linearization IF:4 Highlight: In this paper we argue that the GGN approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN), which turns the BNN into a generalized linear model (GLM). | Alexander Immer; Maciej Korzepa; Matthias Bauer |
| 2021 | 11 | Does Invariant Risk Minimization Capture Invariance? IF:4 Highlight: We show that the Invariant Risk Minimization (IRM) formulation of Arjovsky et al. (2019) can fail to capture "natural" invariances, at least when used in its practical "linear" form, and even on very simple problems which directly follow the motivating examples for IRM. | Pritish Kamath; Akilesh Tangella; Danica Sutherland; Nathan Srebro |
| 2021 | 12 | Scalable Constrained Bayesian Optimization IF:4 Highlight: We propose the scalable constrained Bayesian optimization (SCBO) algorithm that overcomes the above challenges and pushes the applicability of Bayesian optimization far beyond the state-of-the-art. To this end, we propose two new control problems that we expect to be of independent value for the scientific community. | David Eriksson; Matthias Poloczek |
| 2021 | 13 | VqSGD: Vector Quantized Stochastic Gradient Descent IF:4 Highlight: In this work, we present a family of vector quantization schemes vqSGD (Vector-Quantized Stochastic Gradient Descent) that provide an asymptotic reduction in the communication cost with convergence guarantees in first-order distributed optimization. | Venkata Gandikota; Daniel Kane; Raj Kumar Maity; Arya Mazumdar |
| 2021 | 14 | Causal Autoregressive Flows IF:4 Highlight: In this work, we highlight an intrinsic correspondence between a simple family of autoregressive normalizing flows and identifiable causal models. | Ilyes Khemakhem; Ricardo Monti; Robert Leech; Aapo Hyvarinen |
| 2021 | 15 | Local SGD: Unified Theory and New Efficient Methods IF:4 Highlight: We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models. | Eduard Gorbunov; Filip Hanzely; Peter Richtarik |
| 2020 | 1 | How To Backdoor Federated Learning IF:8 Highlight: We show that this makes federated learning vulnerable to a model-poisoning attack that is significantly more powerful than poisoning attacks that target only the training data. | Eugene Bagdasaryan; Andreas Veit; Yiqing Hua; Deborah Estrin; Vitaly Shmatikov |
| 2020 | 2 | FedPAQ: A Communication-Efficient Federated Learning Method With Periodic Averaging And Quantization IF:8 Highlight: In this paper, we present FedPAQ, a communication-efficient Federated Learning method with Periodic Averaging and Quantization. | Amirhossein Reisizadeh; Aryan Mokhtari; Hamed Hassani; Ali Jadbabaie; Ramtin Pedarsani |
| 2020 | 3 | Variational Autoencoders And Nonlinear ICA: A Unifying Framework IF:7 Highlight: We address this issue by showing that for a broad family of deep latent-variable models, identification of the true joint distribution over observed and latent variables is actually possible up to very simple transformations, thus achieving a principled and powerful form of disentanglement. | Ilyes Khemakhem; Diederik Kingma; Ricardo Monti; Aapo Hyvarinen |
| 2020 | 4 | Tighter Theory For Local SGD On Identical And Heterogeneous Data IF:6 Highlight: We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. | Ahmed Khaled Ragab Bayoumi; Konstantin Mishchenko; Peter Richtarik |
| 2020 | 5 | Orthogonal Gradient Descent For Continual Learning IF:6 Highlight: In this paper, we propose to address this issue from a parameter space perspective and study an approach to restrict the direction of the gradient updates to avoid forgetting previously-learned data. | Mehrdad Farajtabar; Navid Azizan; Alex Mott; Ang Li |
| 2020 | 6 | Optimizing Millions Of Hyperparameters By Implicit Differentiation IF:6 Highlight: We propose an algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations. | Jonathan Lorraine; Paul Vicol; David Duvenaud |
| 2020 | 7 | Scalable Gradients For Stochastic Differential Equations IF:6 Highlight: We generalize this method to stochastic differential equations, allowing time-efficient and constant-memory computation of gradients with high-order adaptive solvers. | Xuechen Li; Ting-Kam Leonard Wong; Ricky T. Q. Chen; David Duvenaud |
| 2020 | 8 | Gradient Descent With Early Stopping Is Provably Robust To Label Noise For Overparameterized Neural Networks IF:6 Highlight: Despite this (over)fitting capacity, in this paper we demonstrate that such overparameterized networks have an intriguing robustness capability: they are surprisingly robust to label noise when first-order methods with early stopping are used to train them. | Mingchen Li; Mahdi Soltanolkotabi; Samet Oymak |
| 2020 | 9 | Model-Agnostic Counterfactual Explanations For Consequential Decisions IF:6 Highlight: In contrast, we build on standard theory and tools from formal verification and propose a novel algorithm that solves a sequence of satisfiability problems, where both the distance function (objective) and predictive model (constraints) are represented as logic formulae. | Amir-Hossein Karimi; Gilles Barthe; Borja Balle; Isabel Valera |
| 2020 | 10 | A Unified Analysis Of Extra-gradient And Optimistic Gradient Methods For Saddle Point Problems: Proximal Point Approach IF:6 Highlight: In this paper we consider solving saddle point problems using two variants of Gradient Descent-Ascent algorithms, the Extra-gradient (EG) and Optimistic Gradient Descent Ascent (OGDA) methods. | Aryan Mokhtari; Asuman Ozdaglar; Sarath Pattathil |
| 2020 | 11 | GP-VAE: Deep Probabilistic Time Series Imputation IF:6 Highlight: We propose a new deep sequential latent variable model for dimensionality reduction and data imputation. | Vincent Fortuin; Dmitry Baranchuk; Gunnar Raetsch; Stephan Mandt |
| 2020 | 12 | Permutation Invariant Graph Generation Via Score-Based Generative Modeling IF:6 Highlight: To address this difficulty, we propose a permutation invariant approach to modeling graphs, using the recent framework of score-based generative modeling. | Chenhao Niu et al. |
| 2020 | 13 | Formal Limitations On The Measurement Of Mutual Information IF:6 Highlight: In this paper, we prove that serious statistical limitations are inherent to any method of measuring mutual information. | David McAllester; Karl Stratos |
| 2020 | 14 | Identifying And Correcting Label Bias In Machine Learning IF:6 Highlight: In this paper, we provide a mathematical formulation of how this bias can arise. | Heinrich Jiang; Ofir Nachum |
| 2020 | 15 | Learning Sparse Nonparametric DAGs IF:6 Highlight: Unlike existing approaches that require specific modeling choices, loss functions, or algorithms, we present a completely general framework that can be applied to general nonlinear models (e.g. without additive noise), general differentiable loss functions, and generic black-box optimization routines. | Xun Zheng; Chen Dan; Bryon Aragam; Pradeep Ravikumar; Eric Xing |
| 2019 | 1 | Interpolating Between Optimal Transport And MMD Using Sinkhorn Divergences IF:7 Highlight: This paper studies the Sinkhorn divergences, a family of geometric divergences that interpolates between MMD and OT. | Jean Feydy et al. |
| 2019 | 2 | Towards Efficient Data Valuation Based On The Shapley Value IF:7 Highlight: In this paper, we study the problem of *data valuation* by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory. | Ruoxi Jia et al. |
| 2019 | 3 | Lagrange Coded Computing: Optimal Design For Resiliency, Security, And Privacy IF:6 Highlight: We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. | Qian Yu et al. |
| 2019 | 4 | Sequential Neural Likelihood: Fast Likelihood-free Inference With Autoregressive Flows IF:6 Highlight: We present Sequential Neural Likelihood (SNL), a new method for Bayesian inference in simulator models, where the likelihood is intractable but simulating data from the model is possible. | George Papamakarios; David Sterratt; Iain Murray |
| 2019 | 5 | Nonlinear ICA Using Auxiliary Variables And Generalized Contrastive Learning IF:6 Highlight: Here, we propose a general framework for nonlinear ICA, which, as a special case, can make use of temporal structure. | Aapo Hyvarinen; Hiroaki Sasaki; Richard Turner |
| 2019 | 6 | Sample Complexity Of Sinkhorn Divergences IF:6 Highlight: We focus in this paper on Sinkhorn divergences (SDs), a regularized variant of OT distances which can interpolate, depending on the regularization strength $\varepsilon$, between OT ($\varepsilon=0$) and MMD ($\varepsilon=\infty$). | Aude Genevay; Lénaïc Chizat; Francis Bach; Marco Cuturi; Gabriel Peyré |
| 2019 | 7 | On The Convergence Of Stochastic Gradient Descent With Adaptive Stepsizes IF:6 Highlight: In this paper, we start closing this gap: we theoretically analyze in the convex and non-convex settings a generalized version of the AdaGrad stepsizes. | Xiaoyu Li; Francesco Orabona |
| 2019 | 8 | Fast And Faster Convergence Of SGD For Over-Parameterized Models And An Accelerated Perceptron IF:6 Highlight: For such models, we show that the stochastic gradients of common loss functions satisfy a strong growth condition. | Sharan Vaswani; Francis Bach; Mark Schmidt |
| 2019 | 9 | Truncated Back-propagation For Bilevel Optimization IF:6 Highlight: In this paper, we analyze the properties of this family of approximate gradients and establish sufficient conditions for convergence. | Amirreza Shaban; Ching-An Cheng; Nathan Hatch; Byron Boots |
| 2019 | 10 | Fisher-Rao Metric, Geometry, And Complexity Of Neural Networks IF:5 Highlight: We introduce a new notion of capacity, the Fisher-Rao norm, that possesses desirable invariance properties and is motivated by Information Geometry. | Tengyuan Liang; Tomaso Poggio; Alexander Rakhlin; James Stokes |
| 2019 | 11 | Evaluating Model Calibration In Classification IF:5 Highlight: In this work, we develop a general theoretical calibration evaluation framework grounded in probability theory, and point out subtleties present in model calibration evaluation that lead to refined interpretations of existing evaluation techniques. | Juozas Vaicenavicius et al. |
| 2019 | 12 | Does Data Interpolation Contradict Statistical Optimality? IF:5 Highlight: We show that classical learning methods interpolating the training data can achieve optimal rates for the problems of nonparametric regression and prediction with square loss. | Mikhail Belkin; Alexander Rakhlin; Alexandre B. Tsybakov |
| 2019 | 13 | Interaction Matters: A Note On Non-asymptotic Local Convergence Of Generative Adversarial Networks IF:5 Highlight: Motivated by the pursuit of a systematic computational and algorithmic understanding of Generative Adversarial Networks (GANs), we present a simple yet unified non-asymptotic local convergence theory for smooth two-player games, which subsumes several discrete-time gradient-based saddle point dynamics. | Tengyuan Liang; James Stokes |
| 2019 | 14 | Derivative-Free Methods For Policy Optimization: Guarantees For Linear Quadratic Systems IF:5 Highlight: We study derivative-free methods for policy optimization over the class of linear policies. | Dhruv Malik et al. |
| 2019 | 15 | Unsupervised Alignment Of Embeddings With Wasserstein Procrustes IF:5 Highlight: In this paper, we propose to use an alternative formulation, based on the joint estimation of an orthogonal matrix and a permutation matrix. | Edouard Grave; Armand Joulin; Quentin Berthet |
| 2018 | 1 | VAE With A VampPrior IF:7 Highlight: In this paper, we propose to extend the variational auto-encoder (VAE) framework with a new type of prior which we call "Variational Mixture of Posteriors" prior, or VampPrior for short. | Jakub Tomczak; Max Welling |
| 2018 | 2 | Learning Generative Models With Sinkhorn Divergences IF:7 Highlight: This paper presents the first tractable method to train large scale generative models using an OT-based loss called Sinkhorn loss, which tackles these three issues by relying on two key ideas: (a) entropic smoothing, which turns the original OT loss into a differentiable and more robust quantity that can be computed using Sinkhorn fixed point iterations; (b) algorithmic (automatic) differentiation of these iterations with seamless GPU execution. | Aude Genevay; Gabriel Peyre; Marco Cuturi |
| 2018 | 3 | Turing: A Language For Flexible Probabilistic Inference IF:6 Highlight: In this work, we present a system called Turing for building MCMC algorithms for probabilistic programming inference. | Hong Ge; Kai Xu; Zoubin Ghahramani |
| 2018 | 4 | Parallelised Bayesian Optimisation Via Thompson Sampling IF:5 Highlight: We design and analyse variations of the classical Thompson sampling (TS) procedure for Bayesian optimisation (BO) in settings where function evaluations are expensive but can be performed in parallel. | Kirthevasan Kandasamy; Akshay Krishnamurthy; Jeff Schneider; Barnabas Poczos |
| 2018 | 5 | Batched Large-scale Bayesian Optimization In High-dimensional Spaces IF:5 Highlight: In this paper, we propose ensemble Bayesian optimization (EBO) to address three current challenges in BO simultaneously: (1) large-scale observations; (2) high dimensional input spaces; and (3) selections of batch queries that balance quality and diversity. | Zi Wang; Clement Gehring; Pushmeet Kohli; Stefanie Jegelka |
| 2018 | 6 | Variational Sequential Monte Carlo IF:5 Highlight: In this paper we present a new approximating family of distributions, the variational sequential Monte Carlo (VSMC) family, and show how to optimize it in variational inference. | Christian Naesseth; Scott Linderman; Rajesh Ranganath; David Blei |
| 2018 | 7 | Data-Efficient Reinforcement Learning With Probabilistic Model Predictive Control IF:5 Highlight: To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). | Sanket Kamthe; Marc Deisenroth |
| 2018 | 8 | Slow And Stale Gradients Can Win The Race: Error-Runtime Trade-offs In Distributed SGD IF:5 Highlight: In this work we present the first theoretical characterization of the speed-up offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime (wallclock time). | Sanghamitra Dutta; Gauri Joshi; Soumyadip Ghosh; Parijat Dube; Priya Nagpurkar |
| 2018 | 9 | Smooth And Sparse Optimal Transport IF:4 Highlight: In this paper, we explore regularizing the primal and dual OT formulations with a strongly convex term, which corresponds to relaxing the dual and primal constraints with smooth approximations. | Mathieu Blondel; Vivien Seguy; Antoine Rolet |
| 2018 | 10 | Conditional Independence Testing Based On A Nearest-neighbor Estimator Of Conditional Mutual Information IF:4 Abstract: Conditional independence testing is a fundamental problem underlying causal discovery and a particularly challenging task in the presence of nonlinear dependencies. Here a fully … | Jakob Runge |
| 2018 | 11 | The Emergence Of Spectral Universality In Deep Networks IF:4 Highlight: To this end, we leverage powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters including the nonlinearity, the weight and bias distributions, and the depth. | Jeffrey Pennington; Samuel Schoenholz; Surya Ganguli |
| 2018 | 12 | Personalized And Private Peer-to-Peer Machine Learning IF:4 Highlight: In this paper, we introduce an efficient algorithm to address the above problem in a fully decentralized (peer-to-peer) and asynchronous fashion, with a provable convergence rate. | Aurélien Bellet; Rachid Guerraoui; Mahsa Taziki; Marc Tommasi |
| 2018 | 13 | Policy Evaluation And Optimization With Continuous Treatments IF:4 Highlight: We study the problem of policy evaluation and learning from batched contextual bandit data when treatments are continuous, going beyond previous work on discrete treatments. | Nathan Kallus; Angela Zhou |
| 2018 | 14 | Gradient Diversity: A Key Ingredient For Scalable Distributed Learning IF:4 Highlight: In this work, we present an analysis hinting that high similarity between concurrently processed gradients may be a cause of this performance degradation. | Dong Yin et al. |
| 2018 | 15 | Linear Stochastic Approximation: How Far Does Constant Step-Size And Iterate Averaging Go? IF:4 Highlight: In this paper, we study a constant step-size averaged linear stochastic approximation (CALSA) algorithm, and for a given class of problems, we ask whether the properties of (i) a universal constant step-size and (ii) a uniform fast rate of $\frac{C}{t}$ for the mean square-error hold for all instances of the class, where the constant $C>0$ does not depend on the problem instance. | Chandrashekar Lakshminarayanan; Csaba Szepesvari |
| 2017 | 1 | Communication-Efficient Learning Of Deep Networks From Decentralized Data IF:9 Highlight: We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. | Brendan McMahan; Eider Moore; Daniel Ramage; Seth Hampson; Blaise Aguera y Arcas |
| 2017 | 2 | Fairness Constraints: Mechanisms For Fair Classification IF:9 Highlight: In this paper, we introduce a flexible mechanism to design fair classifiers by leveraging a novel intuitive measure of decision boundary (un)fairness. | Muhammad Bilal Zafar; Isabel Valera; Manuel Gomez Rodriguez; Krishna P. Gummadi |
| 2017 | 3 | Fast Bayesian Optimization Of Machine Learning Hyperparameters On Large Datasets IF:7 Highlight: To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset. | Aaron Klein; Stefan Falkner; Simon Bartels; Philipp Hennig; Frank Hutter |
| 2017 | 4 | Linear Thompson Sampling Revisited IF:5 Highlight: We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. | Marc Abeille; Alessandro Lazaric |
| 2017 | 5 | Bayesian Learning And Inference In Recurrent Switching Linear Dynamical Systems IF:5 Highlight: Building on switching linear dynamical systems (SLDS), we develop a model class and Bayesian inference algorithms that not only discover these dynamical units but also, by learning how transition probabilities depend on observations or continuous latent states, explain their switching behavior. | Scott Linderman et al. |
| 2017 | 6 | Decentralized Collaborative Learning Of Personalized Models Over Networks IF:5 Highlight: The question addressed in this paper is: how can agents improve upon their locally trained model by communicating with other agents that have similar objectives? | Paul Vanhaesebrouck; Aurélien Bellet; Marc Tommasi |
| 2017 | 7 | Beta Calibration: A Well-founded And Easily Implemented Improvement On Logistic Calibration For Binary Classifiers IF:5 Highlight: In this paper we solve all these problems with a richer class of calibration maps based on the beta distribution. | Meelis Kull; Telmo Silva Filho; Peter Flach |
| 2017 | 8 | Nonlinear ICA Of Temporally Dependent Stationary Sources IF:5 Highlight: We introduce a nonlinear generative model where the independent sources are assumed to be temporally dependent, non-Gaussian, and stationary, and we observe arbitrarily nonlinear mixtures of them. | Aapo Hyvarinen; Hiroshi Morioka |
| 2017 | 9 | Non-square Matrix Sensing Without Spurious Local Minima Via The Burer-Monteiro Approach IF:4 Highlight: In this paper, we complement recent findings on the non-convex geometry of the analogous PSD setting [5], and show that matrix factorization does not introduce any spurious local minima, under RIP. | Dohyung Park; Anastasios Kyrillidis; Constantine Caramanis; Sujay Sanghavi |
| 2017 | 10 | Guaranteed Non-convex Optimization: Submodular Maximization Over Continuous Domains IF:4 Highlight: Specifically, (i) we introduce the weak DR property that gives a unified characterization of submodularity for all set, integer-lattice, and continuous functions; (ii) for maximizing monotone DR-submodular continuous functions under general down-closed convex constraints, we propose a Frank-Wolfe variant with a (1-1/e) approximation guarantee and sub-linear convergence rate; (iii) for maximizing general non-monotone submodular continuous functions subject to box constraints, we propose a DoubleGreedy algorithm with a 1/3 approximation guarantee. | Andrew An Bian; Baharan Mirzasoleiman; Joachim Buhmann; Andreas Krause |
| 2017 | 11 | Conjugate-Computation Variational Inference: Converting Variational Inference In Non-Conjugate Models To Inferences In Conjugate Models IF:4 Highlight: In this paper, we propose a new algorithm called Conjugate-computation Variational Inference (CVI) which brings the best of the two worlds together: it uses conjugate computations for the conjugate terms and employs stochastic gradients for the rest. | Mohammad Khan; Wu Lin |
| 2017 | 12 | Diverse Neural Network Learns True Target Functions IF:4 Highlight: In this paper, we answer these questions by analyzing one-hidden-layer neural networks with ReLU activation, and show that despite the non-convexity, neural networks with diverse units have no spurious local minima. | Bo Xie; Yingyu Liang; Le Song |
| 2017 | 13 | Inference Compilation And Universal Probabilistic Programming IF:4 Highlight: We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. | Tuan Anh Le; Atilim Gunes Baydin; Frank Wood |
| 2017 | 14 | Learning From Conditional Distributions Via Dual Embeddings IF:4 Highlight: To address these challenges, we propose a novel approach which employs a new min-max reformulation of the learning-from-conditional-distributions problem. | Bo Dai; Niao He; Yunpeng Pan; Byron Boots; Le Song |
| 2017 | 15 | Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation IF:4 Highlight: We propose a flexible convex relaxation for the phase retrieval problem that operates in the natural domain of the signal. | Sohail Bahmani; Justin Romberg |
| 2016 | 1 | Deep Kernel Learning IF:8 Highlight: We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. | Andrew Gordon Wilson; Zhiting Hu; Ruslan Salakhutdinov; Eric P. Xing |
| 2016 | 2 | Non-stochastic Best Arm Identification And Hyperparameter Optimization IF:7 Highlight: Motivated by the task of hyperparameter optimization, we introduce the *non-stochastic best-arm identification* problem. | Kevin Jamieson; Ameet Talwalkar |
| 2016 | 3 | How To Learn A Graph From Smooth Signals IF:7 Highlight: We propose a framework to learn the graph structure underlying a set of smooth signals. | Vassilis Kalofolias |
| 2016 | 4 | Generalizing Pooling Functions In Convolutional Neural Networks: Mixed, Gated, And Tree IF:7 Highlight: We seek to improve deep neural networks by generalizing the pooling operations that play a central role in current architectures. | Chen-Yu Lee; Patrick W. Gallagher; Zhuowen Tu |
| 2016 | 5 | Batch Bayesian Optimization Via Local Penalization IF:6 Highlight: We investigate this issue and propose a highly effective heuristic based on an estimate of the function's Lipschitz constant that captures the most important aspect of this interaction, local repulsion, at negligible computational overhead. | Javier Gonzalez; Zhenwen Dai; Philipp Hennig; Neil Lawrence |
| 2016 | 6 | A Linearly-Convergent Stochastic L-BFGS Algorithm IF:5 Highlight: We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions. | Philipp Moritz; Robert Nishihara; Michael Jordan |
| 2016 | 7 | Controlling Bias In Adaptive Data Analysis Using Information Theory IF:5 Highlight: In this paper, we propose a general information-theoretic framework to quantify and provably bound the bias of an arbitrary adaptive analysis process. | Daniel Russo; James Zou |
| 2016 | 8 | On Sparse Variational Methods And The Kullback-Leibler Divergence Between Stochastic Processes IF:5 Highlight: In this paper we give a substantial generalization of the literature on this topic. | Alexander G. de G. Matthews; James Hensman; Richard Turner; Zoubin Ghahramani |
| 2016 | 9 | Breaking Sticks And Ambiguities With Adaptive Skip-gram IF:5 Highlight: In this paper we propose the Adaptive Skip-gram model, a nonparametric Bayesian extension of Skip-gram capable of automatically learning the required number of representations for all words at the desired semantic resolution. | Sergey Bartunov; Dmitry Kondrashkin; Anton Osokin; Dmitry Vetrov |
| 2016 | 10 | Dreaming More Data: Class-dependent Distributions Over Diffeomorphisms For Learned Data Augmentation IF:4 Highlight: With an eye towards true end-to-end learning, we suggest learning the applied transformations on a per-class basis. | Søren Hauberg; Oren Freifeld; Anders Boesen Lindbo Larsen; John Fisher; Lars Hansen |
| 2016 | 11 | Fast Dictionary Learning With A Smoothed Wasserstein Loss IF:4 Highlight: We consider in this paper the dictionary learning problem when the observations are normalized histograms of features. | Antoine Rolet; Marco Cuturi; Gabriel Peyré |
| 2016 | 12 | Quantization Based Fast Inner Product Search IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a quantization based approach for fast approximate Maximum Inner Product Search (MIPS). |
Ruiqi Guo; Sanjiv Kumar; Krzysztof Choromanski; David Simcha; |
| 2016 | 13 | GLASSES: Relieving The Myopia Of Bayesian Optimisation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present GLASSES: Global optimisation with Look-Ahead through Stochastic Simulation and Expected-loss Search. |
Javier Gonzalez; Michael Osborne; Neil Lawrence; |
| 2016 | 14 | Non-Stationary Gaussian Process Regression With Hamiltonian Monte Carlo IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a novel approach for non-stationary Gaussian process regression (GPR), where the three key parameters – noise variance, signal variance and lengthscale – can be simultaneously input-dependent. |
Markus Heinonen; Henrik Mannerstr�m; Juho Rousu; Samuel Kaski; Harri L�hdesm�ki; |
| 2016 | 15 | Time-Varying Gaussian Process Bandit Optimization IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. |
Ilija Bogunovic; Jonathan Scarlett; Volkan Cevher; |
| 2015 | 1 | Deeply-Supervised Nets IF:9 Highlight: We propose deeply-supervised nets (DSN), a method that simultaneously minimizes classification error and improves the directness and transparency of the hidden layer learning process. |
Chen-Yu Lee; Saining Xie; Patrick Gallagher; Zhengyou Zhang; Zhuowen Tu; |
| 2015 | 2 | The Loss Surfaces Of Multilayer Networks IF:8 Highlight: We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. |
Anna Choromanska; Mikael Henaff; Michael Mathieu; Gerard Ben Arous; Yann LeCun; |
| 2015 | 3 | Scalable Variational Gaussian Process Classification IF:7 Highlight: We show how to scale the model within a variational inducing point framework, out-performing the state of the art on benchmark datasets. |
James Hensman; Alexander Matthews; Zoubin Ghahramani; |
| 2015 | 4 | Falling Rule Lists IF:6 Highlight: We provide a Bayesian framework for learning falling rule lists that does not rely on traditional greedy decision tree learning methods. |
Fulton Wang; Cynthia Rudin; |
| 2015 | 5 | Online Optimization: Competing With Dynamic Comparators IF:5 Highlight: In this paper, we address these two directions together. |
Ali Jadbabaie; Alexander Rakhlin; Shahin Shahrampour; Karthik Sridharan; |
| 2015 | 6 | Trend Filtering On Graphs IF:5 Highlight: We introduce a family of adaptive estimators on graphs, based on penalizing the ℓ₁ norm of discrete graph differences. |
Yu-Xiang Wang; James Sharpnack; Alex Smola; Ryan Tibshirani; |
| 2015 | 7 | Efficient Estimation Of Mutual Information For Strongly Dependent Variables IF:6 Highlight: We introduce a new estimator that is robust to local non-uniformity, works well with limited data, and is able to capture relationship strengths over many orders of magnitude. |
Shuyang Gao; Greg Ver Steeg; Aram Galstyan; |
| 2015 | 8 | DART: Dropouts Meet Multiple Additive Regression Trees IF:5 Highlight: In this work, we explore a different approach to address the problem, that of employing dropouts, a tool that has been recently proposed in the context of learning deep neural networks. |
Rashmi Korlakai Vinayak; Ran Gilad-Bachrach; |
| 2015 | 9 | Estimation From Pairwise Comparisons: Sharp Minimax Bounds With Topology Dependence IF:4 Highlight: The Bradley-Terry-Luce (BTL) and Thurstone models are the most widely used parametric models for such pairwise comparison data. |
Nihar Shah et al. |
| 2015 | 10 | A Rate Of Convergence For Mixture Proportion Estimation, With Application To Learning From Noisy Labels IF:4 Highlight: In this work we establish a rate of convergence for mixture proportion estimation (MPE) under an appropriate distributional assumption, and argue that this rate of convergence is useful for analyzing weakly supervised learning algorithms that build on MPE. |
Clayton Scott; |
| 2015 | 11 | WASP: Scalable Bayes Via Barycenters Of Subset Posteriors IF:4 Highlight: We propose a simple, general, and highly efficient approach, which first runs a posterior sampling algorithm in parallel on different machines for subsets of a large data set. |
Sanvesh Srivastava; Volkan Cevher; Quoc Dinh; David Dunson; |
| 2015 | 12 | Deep Exponential Families IF:4 Highlight: We describe deep exponential families (DEFs), a class of latent variable models that are inspired by the hidden structures used in deep neural networks. |
Rajesh Ranganath; Linpeng Tang; Laurent Charlin; David Blei; |
| 2015 | 13 | Infinite Edge Partition Models For Overlapping Community Detection And Link Prediction IF:4 Highlight: A hierarchical gamma process infinite edge partition model is proposed to factorize the binary adjacency matrix of an unweighted undirected relational network under a Bernoulli-Poisson link. |
Mingyuan Zhou; |
| 2015 | 14 | Toward Minimax Off-policy Value Estimation IF:4 Highlight: This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. |
Lihong Li; Remi Munos; Csaba Szepesvari; |
| 2015 | 15 | On Theoretical Properties Of Sum-Product Networks IF:4 Highlight: In this paper we fill some gaps in the theoretical foundation of SPNs. |
Robert Peharz; Sebastian Tschiatschek; Franz Pernkopf; Pedro Domingos; |
| 2014 | 1 | Black Box Variational Inference IF:9 Highlight: In this paper, we present a “black box” variational inference algorithm, one that can be quickly applied to many models with little additional derivation. |
Rajesh Ranganath; Sean Gerrish; David Blei; |
| 2014 | 2 | A New Approach To Probabilistic Programming Inference IF:6 Highlight: We introduce and demonstrate a new approach to inference in expressive probabilistic programming languages based on particle Markov chain Monte Carlo. |
Frank Wood; Jan-Willem van de Meent; Vikash Mansinghka; |
| 2014 | 3 | Efficient Transfer Learning Method For Automatic Hyperparameter Tuning IF:5 Highlight: We propose a fast and effective algorithm for automatic hyperparameter tuning that can generalize across datasets. |
Dani Yogatama; Gideon Mann; |
| 2014 | 4 | Student-t Processes As Alternatives To Gaussian Processes IF:5 Highlight: We investigate the Student-t process as an alternative to the Gaussian process as a nonparametric prior over functions. |
Amar Shah; Andrew Wilson; Zoubin Ghahramani; |
| 2014 | 5 | Cluster Canonical Correlation Analysis IF:4 Highlight: In this paper we present cluster canonical correlation analysis (cluster-CCA) for joint dimensionality reduction of two sets of data points. |
Nikhil Rasiwasia; Dhruv Mahajan; Vijay Mahadevan; Gaurav Aggarwal; |
| 2014 | 6 | Avoiding Pathologies In Very Deep Networks IF:4 Highlight: We propose an alternate network architecture which does not suffer from this pathology. |
David Duvenaud; Oren Rippel; Ryan Adams; Zoubin Ghahramani; |
| 2014 | 7 | Heterogeneous Domain Adaptation For Multiple Classes IF:4 Highlight: In this paper, we present an efficient Multi-class Heterogeneous Domain Adaptation (HDA) method, where data from the source and target domains are represented by heterogeneous features with different dimensions. |
Joey Tianyi Zhou; Ivor W. Tsang; Sinno Jialin Pan; Mingkui Tan; |
| 2014 | 8 | On Correlation And Budget Constraints In Model-based Bandit Optimization With Application To Automatic Machine Learning IF:4 Highlight: We introduce a Bayesian approach for this problem and show that it empirically outperforms both the existing frequentist counterpart and other Bayesian optimization methods. |
Matthew Hoffman; Bobak Shahriari; Nando de Freitas; |
| 2014 | 9 | In Defense Of Minhash Over Simhash IF:4 Highlight: In this study, we provide a theoretical answer (validated by experiments) that MinHash virtually always outperforms SimHash when the data are binary, as is common in practice in applications such as search. |
Anshumali Shrivastava; Ping Li; |
| 2014 | 10 | Mixed Graphical Models Via Exponential Families IF:4 Highlight: We study several instances of our model, and propose scalable M-estimators for recovering the underlying network structure. |
Eunho Yang; Yulia Baker; Pradeep Ravikumar; Genevera Allen; Zhandong Liu; |
| 2014 | 11 | Accelerating ABC Methods Using Gaussian Processes IF:4 Highlight: We introduce Gaussian process (GP) accelerated ABC, which we show can significantly reduce the number of simulations required. |
Richard Wilkinson; |
| 2014 | 12 | Bayesian Nonparametric Poisson Factorization For Recommendation Systems IF:4 Highlight: We develop a Bayesian nonparametric Poisson factorization model for recommendation systems. |
Prem Gopalan; Francisco J. Ruiz; Rajesh Ranganath; David Blei; |
| 2014 | 13 | Bayesian Multi-Scale Optimistic Optimization IF:3 Highlight: In this paper, we introduce a new technique for efficient global optimization that combines Gaussian process confidence bounds and treed simultaneous optimistic optimization to eliminate the need for auxiliary optimization of acquisition functions. |
Ziyu Wang; Babak Shakibi; Lin Jin; Nando de Freitas; |
| 2014 | 14 | Explicit Link Between Periodic Covariance Functions And State Space Models IF:4 Highlight: This paper shows how periodic covariance functions in Gaussian process regression can be reformulated as state space models, which can be solved with classical Kalman filtering theory. |
Arno Solin; Simo Särkkä; |
| 2014 | 15 | Visual Boundary Prediction: A Deep Neural Prediction Network And Quality Dissection IF:4 Highlight: This paper investigates visual boundary detection, i.e. prediction of the presence of a boundary at a given image location. |
Jyri Kivinen; Chris Williams; Nicolas Heess; |
| 2013 | 1 | Deep Gaussian Processes IF:8 Highlight: In this paper we introduce deep Gaussian process (GP) models. |
Andreas Damianou; Neil Lawrence; |
| 2013 | 2 | Further Optimal Regret Bounds For Thompson Sampling IF:6 Highlight: In this paper, we provide a novel regret analysis for Thompson Sampling that proves the first near-optimal problem-independent bound of O(√(NT ln T)) on the expected regret of this algorithm. |
Shipra Agrawal; Navin Goyal; |
| 2013 | 3 | Learning Social Infectivity In Sparse Low-rank Networks Using Multi-dimensional Hawkes Processes IF:6 Highlight: We propose a convex optimization approach to discover the hidden network of social influence by modeling the recurrent events at different individuals as multi-dimensional Hawkes processes. |
Ke Zhou; Hongyuan Zha; Le Song; |
| 2013 | 4 | ODE Parameter Inference Using Adaptive Gradient Matching With Gaussian Processes IF:4 Highlight: The present paper discusses a method based on nonparametric Bayesian statistics with Gaussian processes due to Calderhead et al. (2008), and shows how inference in this model can be substantially improved by consistently sampling from the joint distribution of the ODE parameters and GP hyperparameters. |
Frank Dondelinger; Dirk Husmeier; Simon Rogers; Maurizio Filippone; |
| 2013 | 5 | Clustered Support Vector Machines IF:4 Highlight: In this paper, we propose a Clustered Support Vector Machine (CSVM), which tackles the data in a divide-and-conquer manner. |
Quanquan Gu; Jiawei Han; |
| 2013 | 6 | Supervised Sequential Classification Under Budget Constraints IF:4 Highlight: In this paper we develop a framework for sequential decision making under budget constraints for multi-class classification. |
Kirill Trapeznikov; Venkatesh Saligrama; |
| 2013 | 7 | Distribution-Free Distribution Regression IF:4 Highlight: In this paper we develop theory and methods for distribution-free versions of distribution regression. |
Barnabas Poczos; Aarti Singh; Alessandro Rinaldo; Larry Wasserman; |
| 2013 | 8 | Uncover Topic-Sensitive Information Diffusion Networks IF:3 Highlight: In this paper, we propose a continuous time model, TopicCascade, for topic-sensitive information diffusion networks, and infer the hidden diffusion networks and the topic-dependent transmission rates from the observed time stamps and contents of cascades. |
Nan Du; Le Song; Hyenkyun Woo; Hongyuan Zha; |
| 2013 | 9 | Stochastic Blockmodeling Of Relational Event Dynamics IF:3 Highlight: Several approaches have recently been proposed for modeling continuous-time network data via dyadic event rates conditioned on the observed history of events and nodal or dyadic covariates. |
Christopher DuBois; Carter Butts; Padhraic Smyth; |
| 2013 | 10 | Meta-Transportability Of Causal Effects: A Formal Approach IF:3 Highlight: This paper considers the problem of transferring experimental findings learned from multiple heterogeneous domains to a different environment, in which only passive observations can be collected. |
Elias Bareinboim; Judea Pearl; |
| 2013 | 11 | Changepoint Detection Over Graphs With The Spectral Scan Statistic IF:3 Highlight: We consider the change-point detection problem of deciding, based on noisy measurements, whether an unknown signal over a given graph is constant or is instead piecewise constant over two induced subgraphs of relatively low cut size. |
James Sharpnack; Aarti Singh; Alessandro Rinaldo; |
| 2013 | 12 | Thompson Sampling In Switching Environments With Bayesian Online Change Detection IF:3 Highlight: In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. |
Joseph Mellor; Jonathan Shapiro; |
| 2013 | 13 | Data-driven Covariate Selection For Nonparametric Estimation Of Causal Effects IF:3 Highlight: In this contribution, we analyze the problem of inferring whether a given variable has a causal effect on another and, if it does, inferring an adjustment set of covariates that yields a consistent and unbiased estimator of this effect, based on the (conditional) independence and dependence relationships among the observed variables. |
Doris Entner; Patrik Hoyer; Peter Spirtes; |
| 2013 | 14 | Convex Collective Matrix Factorization IF:4 Highlight: Existing algorithms to estimate parameters of collective matrix factorization models are based on non-convex formulations of the problem; in this paper, a convex formulation of this approach is proposed. |
Guillaume Bouchard; Dawei Yin; Shengbo Guo; |
| 2013 | 15 | Learning Markov Networks With Arithmetic Circuits IF:3 Highlight: In this paper, we introduce ACMN, the first method for learning efficient Markov networks with arbitrary conjunctive features. |
Daniel Lowd; Amirmohammad Rooshenas; |
| 2012 | 1 | Approximate Inference In Additive Factorial HMMs With Application To Energy Disaggregation IF:7 Highlight: In this paper we propose an alternative inference method for such models, which exploits their additive structure by 1) looking at the observed difference signal of the observation, 2) incorporating a “robust” mixture component that can account for unmodeled observations, and 3) constraining the posterior to allow at most one hidden state to change at a time. |
J. Zico Kolter; Tommi Jaakkola; |
| 2012 | 2 | Generic Methods For Optimization-Based Modeling IF:6 Highlight: “Energy” models for continuous domains can be applied to many problems, but often suffer from high computational expense in training, due to the need to repeatedly minimize the energy function to high accuracy. |
Justin Domke; |
| 2012 | 3 | On Bayesian Upper Confidence Bounds For Bandit Problems IF:6 Highlight: We show in this paper that methods derived from this second perspective prove optimal when evaluated using the frequentist cumulated regret as a measure of performance. |
Emilie Kaufmann; Olivier Cappe; Aurelien Garivier; |
| 2012 | 4 | Joint Learning Of Words And Meaning Representations For Open-Text Semantic Parsing IF:6 Highlight: We propose a method that learns to assign meaning representations (MRs) to a wide range of text (using a dictionary of more than 70,000 words mapped to more than 40,000 entities) thanks to a training scheme that combines learning from knowledge bases (e.g. WordNet) with learning from raw text. |
Antoine Bordes; Xavier Glorot; Jason Weston; Yoshua Bengio; |
| 2012 | 5 | Bayesian Classifier Combination IF:5 Highlight: We explore a general framework for Bayesian model combination (which differs from model averaging) in the context of classification. |
Hyun-Chul Kim; Zoubin Ghahramani; |
| 2012 | 6 | Random Feature Maps For Dot Product Kernels IF:5 Highlight: Approximating non-linear kernels using feature maps has gained a lot of interest in recent years due to applications in reducing training and testing times of SVM classifiers and other kernel-based learning algorithms. |
Purushottam Kar; Harish Karnick; |
| 2012 | 7 | Beta-Negative Binomial Process And Poisson Factor Analysis IF:5 Abstract: A beta-negative binomial (BNB) process is proposed, leading to a beta-gamma-Poisson process, which may be viewed as a “multi-scoop” generalization of the beta-Bernoulli process. … |
Mingyuan Zhou; Lauren Hannah; David Dunson; Lawrence Carin; |
| 2012 | 8 | Hierarchical Relative Entropy Policy Search IF:5 Highlight: In this paper, we reformulate learning a hierarchical policy as a latent variable estimation problem and subsequently extend Relative Entropy Policy Search (REPS) to the latent variable case. |
Christian Daniel; Gerhard Neumann; Jan Peters; |
| 2012 | 9 | Online Incremental Feature Learning With Denoising Autoencoders IF:5 Highlight: In this paper, we propose an incremental feature learning algorithm to determine the optimal model complexity for large-scale, online datasets based on the denoising autoencoder. |
Guanyu Zhou; Kihyuk Sohn; Honglak Lee; |
| 2012 | 10 | Deep Learning Made Easier By Linear Transformations In Perceptrons IF:6 Highlight: We transform the outputs of each hidden neuron in a multi-layer perceptron network to have zero activation and zero slope on average, and use separate shortcut connections to model the linear dependencies instead. |
Tapani Raiko; Harri Valpola; Yann LeCun; |
| 2012 | 11 | On Estimation And Selection For Topic Models IF:5 Highlight: This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. |
Matt Taddy; |
| 2012 | 12 | Controlling Selection Bias In Causal Inference IF:5 Highlight: This paper highlights several graphical and algebraic methods capable of mitigating and sometimes eliminating this bias. |
Elias Bareinboim; Judea Pearl; |
| 2012 | 13 | Minimax Hypothesis Testing For Curve Registration IF:4 Highlight: We use the generalized likelihood ratio to build a nonadaptive procedure depending on a tuning parameter, which we choose in an optimal way according to the smoothness of the ambient space. |
Olivier Collier; |
| 2012 | 14 | Online-to-Confidence-Set Conversions And Application To Sparse Stochastic Bandits IF:4 Highlight: We introduce a novel technique, which we call online-to-confidence-set conversion. |
Yasin Abbasi-Yadkori; David Pal; Csaba Szepesvari; |
| 2012 | 15 | Krylov Subspace Descent For Deep Learning IF:4 Highlight: In this paper, we propose a second-order optimization method to learn models where both the dimensionality of the parameter space and the number of training samples are high. |
Oriol Vinyals; Daniel Povey; |
| 2011 | 1 | Deep Sparse Rectifier Neural Networks IF:9 Highlight: This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks, in spite of the hard non-linearity and non-differentiability at zero, creating sparse representations with true zeros which seem remarkably suitable for naturally sparse data. |
Xavier Glorot; Antoine Bordes; Yoshua Bengio; |
| 2011 | 2 | An Analysis Of Single-Layer Networks In Unsupervised Feature Learning IF:9 Highlight: Specifically, we will apply several off-the-shelf feature learning algorithms (sparse auto-encoders, sparse RBMs, K-means clustering, and Gaussian mixtures) to the CIFAR-10, NORB, and STL datasets using only single-layer networks. |
Adam Coates; Andrew Ng; Honglak Lee; |
| 2011 | 3 | A Reduction Of Imitation Learning And Structured Prediction To No-Regret Online Learning IF:9 Highlight: In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no-regret algorithm in an online learning setting. |
Stephane Ross; Geoffrey Gordon; Drew Bagnell; |
| 2011 | 4 | Contextual Bandits With Linear Payoff Functions IF:9 Highlight: In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. |
Wei Chu; Lihong Li; Lev Reyzin; Robert Schapire; |
| 2011 | 5 | The Neural Autoregressive Distribution Estimator IF:7 Highlight: We describe a new approach for modeling the distribution of high-dimensional vectors of discrete variables. |
Hugo Larochelle; Iain Murray; |
| 2011 | 6 | Online Variational Inference For The Hierarchical Dirichlet Process IF:6 Highlight: We propose an online variational inference algorithm for the HDP, an algorithm that is easily applicable to massive and streaming data. |
Chong Wang; John Paisley; David Blei; |
| 2011 | 7 | Relative Entropy Inverse Reinforcement Learning IF:6 Highlight: In this paper, we propose a model-free IRL algorithm, where the relative entropy between the empirical distribution of the state-action trajectories under a baseline policy and their distribution under the learned policy is minimized by stochastic gradient descent. |
Abdeslam Boularias; Jens Kober; Jan Peters; |
| 2011 | 8 | Contextual Bandit Algorithms With Supervised Learning Guarantees IF:6 Highlight: We present a modification of the Exp4 algorithm of [Auer et al. 2002] called Exp4.P, which with high probability incurs regret at most O(√(KT ln N)). |
Alina Beygelzimer; John Langford; Lihong Li; Lev Reyzin; Robert Schapire; |
| 2011 | 9 | Follow-the-Regularized-Leader And Mirror Descent: Equivalence Theorems And L1 Regularization IF:5 Highlight: The FTRL-Proximal algorithm, which we introduce, can be seen as a hybrid of these two algorithms, and significantly outperforms both on a large, real-world dataset. |
Brendan McMahan; |
| 2011 | 10 | Lightweight Implementations Of Probabilistic Programming Languages Via Transformational Compilation IF:4 Highlight: We describe a general method of transforming arbitrary programming languages into probabilistic programming languages with straightforward MCMC inference engines. |
David Wingate; Andreas Stuhlmueller; Noah Goodman; |
| 2011 | 11 | Deep Learning For Efficient Discriminative Parsing IF:4 Highlight: We propose a new, fast, purely discriminative algorithm for natural language parsing, based on a “deep” recurrent convolutional graph transformer network (GTN). |
Ronan Collobert; |
| 2011 | 12 | Parallel Gibbs Sampling: From Colored Fields To Thin Junction Trees IF:4 Highlight: In this work, we propose two methods to construct parallel Gibbs samplers guaranteed to draw from the targeted distribution. |
Joseph Gonzalez; Yucheng Low; Arthur Gretton; Carlos Guestrin; |
| 2011 | 13 | Deep Learners Benefit More From Out-of-Distribution Examples IF:4 Highlight: Recent theoretical and empirical work in statistical machine learning has demonstrated the potential of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple levels of representation. |
Yoshua Bengio et al. |
| 2011 | 14 | Multi-Label Output Codes Using Canonical Correlation Analysis IF:5 Highlight: Using canonical correlation analysis, we propose an error-correcting code for multi-label classification. |
Yi Zhang; Jeff Schneider; |
| 2011 | 15 | Domain Adaptation With Coupled Subspaces IF:4 Highlight: Domain adaptation algorithms address a key issue in applied machine learning: how can we train a system under a source distribution but achieve high performance under a different target distribution? |
John Blitzer; Sham Kakade; Dean Foster; |
| 2010 | 1 | Understanding The Difficulty Of Training Deep Feedforward Neural Networks IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence. |
Xavier Glorot; Yoshua Bengio; |
| 2010 | 2 | Noise-contrastive Estimation: A New Estimation Principle For Unnormalized Statistical Models IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a new estimation principle for parameterized statistical models. |
Michael Gutmann; Aapo Hyvärinen; |
| 2010 | 3 | Why Does Unsupervised Pre-training Help Deep Learning? IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. |
Dumitru Erhan; Aaron Courville; Yoshua Bengio; Pascal Vincent; |
| 2010 | 4 | Efficient Reductions For Imitation Learning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction. |
Stephane Ross; Drew Bagnell; |
| 2010 | 5 | Bayesian Gaussian Process Latent Variable Model IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. |
Michalis Titsias; Neil D. Lawrence; |
| 2010 | 6 | Elliptical Slice Sampling IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Many probabilistic models introduce strong dependencies between variables using a latent multivariate Gaussian distribution or a Gaussian process. |
Iain Murray; Ryan Adams; David MacKay; |
| 2010 | 7 | Efficient Learning Of Deep Boltzmann Machines IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a new approximate inference algorithm for Deep Boltzmann Machines (DBM’s), a generative model with many layers of hidden variables. |
Ruslan Salakhutdinov; Hugo Larochelle; |
| 2010 | 8 | Structured Sparse Principal Component Analysis IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes. |
Rodolphe Jenatton; Guillaume Obozinski; Francis Bach; |
| 2010 | 9 | Contextual Multi-Armed Bandits IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: For any $\epsilon > 0$ we present an algorithm with regret $O(T^{\frac{a+b+1}{a+b+2} + \epsilon})$ where $a, b$ are the covering dimensions of the query space and the ad space respectively. |
Tyler Lu; David Pal; Martin Pal; |
| 2010 | 10 | Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a factored 3-way RBM that uses the states of its hidden units to represent abnormalities in the local covariance structure of an image. |
Marc'Aurelio Ranzato; Alex Krizhevsky; Geoffrey Hinton; |
| 2010 | 11 | Learning Bayesian Network Structure Using LP Relaxations IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose to solve the combinatorial problem of finding the highest scoring Bayesian network structure from data. |
Tommi Jaakkola; David Sontag; Amir Globerson; Marina Meila; |
| 2010 | 12 | Exclusive Lasso For Multi-task Feature Selection IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a framework of kernel-based multi-task feature selection algorithm based on the proposed exclusive lasso regularizer. |
Yang Zhou; Rong Jin; Steven Chu-Hong Hoi; |
| 2010 | 13 | Modeling Annotator Expertise: Learning When Everybody Knows A Bit Of Something IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper develops a probabilistic approach to this problem when annotators may be unreliable (labels are noisy), but also their expertise varies depending on the data they observe (annotators may have knowledge about different parts of the input space). |
YAN YAN et. al. |
| 2010 | 14 | Gaussian Processes With Monotonicity Information IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A method for using monotonicity information in multivariate Gaussian process regression and classification is proposed. |
Jaakko Riihimäki; Aki Vehtari; |
| 2010 | 15 | Dirichlet Process Mixtures Of Generalized Linear Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLMs), a new method of nonparametric regression that accommodates continuous and categorical inputs, models a response variable locally by a generalized linear model. |
Lauren Hannah; David Blei; Warren Powell; |
| 2009 | 1 | Deep Boltzmann Machines IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. |
Ruslan Salakhutdinov; Geoffrey Hinton; |
| 2009 | 2 | Variational Learning Of Inducing Variables In Sparse Gaussian Processes IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a variational formulation for sparse approximations that jointly infers the inducing inputs and the kernel hyperparameters by maximizing a lower bound of the true log marginal likelihood. |
Michalis Titsias; |
| 2009 | 3 | Efficient Graphlet Kernels For Large Graph Comparison IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this article we propose to compare graphs by counting *graphlets*, i.e., subgraphs with k nodes where k ∈ {3, 4, 5}. |
Nino Shervashidze; SVN Vishwanathan; Tobias Petri; Kurt Mehlhorn; Karsten Borgwardt; |
| 2009 | 4 | Handling Sparsity Via The Horseshoe IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents a general, fully Bayesian framework for sparse supervised-learning problems based on the horseshoe prior. |
Carlos M. Carvalho; Nicholas G. Polson; James G. Scott; |
| 2009 | 5 | Relational Topic Models For Document Networks IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We develop the relational topic model (RTM), a model of documents and the links between them. |
Jonathan Chang; David Blei; |
| 2009 | 6 | Learning A Parametric Embedding By Preserving Local Structure IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The paper presents a new unsupervised dimensionality reduction technique, called parametric t-SNE, that learns a parametric mapping between the high-dimensional data space and the low-dimensional latent space. |
Laurens van der Maaten; |
| 2009 | 7 | The Difficulty Of Training Deep Architectures And The Effect Of Unsupervised Pre-Training IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Whereas theoretical work suggests that deep architectures might be more efficient at representing highly-varying functions, training deep architectures was unsuccessful until the recent advent of algorithms based on unsupervised pretraining. |
Dumitru Erhan; Pierre-Antoine Manzagol; Yoshua Bengio; Samy Bengio; Pascal Vincent; |
| 2009 | 8 | Optimizing Costly Functions With Simple Constraints: A Limited-Memory Projected Quasi-Newton Algorithm IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: An optimization algorithm for minimizing a smooth function over a convex set is described. |
Mark Schmidt; Ewout Berg; Michael Friedlander; Kevin Murphy; |
| 2009 | 9 | Online Inference Of Topics With Latent Dirichlet Allocation IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce two related Rao-Blackwellized online inference algorithms for the latent Dirichlet allocation (LDA) model – incremental Gibbs samplers and particle filters – and compare their runtime and performance to that of existing algorithms. |
Kevin Canini; Lei Shi; Thomas Griffiths; |
| 2009 | 10 | Latent Force Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modeling with a physical model of the system. |
Mauricio Álvarez; David Luengo; Neil D. Lawrence; |
| 2009 | 11 | A New Perspective For Information Theoretic Feature Selection IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The contribution of this paper is a unifying theoretical understanding of such filters. |
Gavin Brown; |
| 2009 | 12 | Residual Splash For Optimally Parallelizing Belief Propagation IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we focus on parallelizing message passing inference algorithms in graphical models. |
Joseph Gonzalez; Yucheng Low; Carlos Guestrin; |
| 2009 | 13 | Particle Belief Propagation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we describe a generic particle belief propagation (PBP) algorithm which is closely related to previously proposed methods. |
Alexander Ihler; David McAllester; |
| 2009 | 14 | Probabilistic Models For Incomplete Multi-dimensional Arrays IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We develop a probabilistic framework for modeling structural dependency from partially observed multi-dimensional array data, known as pTucker. |
Wei Chu; Zoubin Ghahramani; |
| 2009 | 15 | Sampling Techniques For The Nystrom Method IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present an analysis of different sampling techniques for the Nystrom method. |
Sanjiv Kumar; Mehryar Mohri; Ameet Talwalkar; |
| 2007 | 1 | Learning A Nonlinear Embedding By Preserving Class Neighbourhood Structure IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show how to pretrain and fine-tune a multilayer neural network to learn a nonlinear transformation from the input space to a low-dimensional feature space in which K-nearest neighbour classification performs well. |
Ruslan Salakhutdinov; Geoff Hinton; |
| 2007 | 2 | Hierarchical Beta Processes And The Indian Buffet Process IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show that the beta process is the de Finetti mixing distribution underlying the Indian buffet process of [2]. |
Romain Thibaux; Michael I. Jordan; |
| 2007 | 3 | A Stochastic Quasi-Newton Method For Online Convex Optimization IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We are working on analyzing the convergence of online (L)BFGS, and extending it to nonconvex optimization problems. |
Nicol N. Schraudolph; Jin Yu; Simon Günter; |
| 2007 | 4 | Bayesian Inference And Optimal Design In The Sparse Linear Model IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We demonstrate our framework on a gene network identification task. |
Matthias Seeger; Florian Steinke; Koji Tsuda; |
| 2007 | 5 | Hidden Topic Markov Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Given these parameters, the topics of all words in the same document are assumed to be independent. |
Amit Gruber; Yair Weiss; Michal Rosen-Zvi; |
| 2007 | 6 | Stick-breaking Construction For The Indian Buffet Process IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we derive a stick-breaking representation for the IBP. |
Yee Whye Teh; Dilan Görür; Zoubin Ghahramani; |
| 2007 | 7 | Local And Global Sparse Gaussian Process Approximations IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we start by investigating the regimes in which these different approaches work well or fail. |
Edward Snelson; Zoubin Ghahramani; |
| 2007 | 8 | Learning Multilevel Distributed Representations For High-Dimensional Sequences IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We describe a new family of non-linear sequence models that are substantially more powerful than hidden Markov models or linear dynamical systems. |
Ilya Sutskever; Geoffrey Hinton; |
| 2007 | 9 | Exact Bayesian Structure Learning From Uncertain Interventions IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We apply our exact technique to a biological data set that had previously been analyzed using MCMC [SPP+ 05, EW06, WGH06]. |
Daniel Eaton; Kevin Murphy; |
| 2007 | 10 | Generalized Non-metric Multidimensional Scaling IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we look at a generalization of this problem in which only a set of order relations of the form $d_{ij} < d_{kl}$ are provided. |
SAMEER AGARWAL et. al. |
| 2007 | 11 | Metric Learning For Kernel Regression IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we construct a novel algorithm for supervised metric learning, which learns a distance function by directly minimizing the leave-one-out regression error. |
Kilian Q. Weinberger; Gerald Tesauro; |
| 2007 | 12 | Transductive Classification Via Local Learning Regularization IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we adapt it for Transductive Classification (TC) problems. |
Mingrui Wu; Bernhard Schölkopf; |
| 2007 | 13 | A Unified Energy-Based Framework For Unsupervised Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a view of unsupervised learning that integrates probabilistic and nonprobabilistic methods for clustering, dimensionality reduction, and feature extraction in a unified framework. |
Marc'Aurelio Ranzato; Y-Lan Boureau; Sumit Chopra; Yann LeCun; |
| 2007 | 14 | Dissimilarity In Graph-Based Semi-Supervised Classification IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a semi-supervised classification algorithm that learns from dissimilarity and similarity information on labeled and unlabeled data. |
Andrew B. Goldberg; Xiaojin Zhu; Stephen Wright; |
| 2007 | 15 | SVM Versus Least Squares SVM IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We study the relationship between Support Vector Machines (SVM) and Least Squares SVM (LS-SVM). |
Jieping Ye; Tao Xiong; |
| 2005 | 1 | Convergent Tree-reweighted Message Passing for Energy Minimization IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Vladimir Kolmogorov; |
| 2005 | 2 | Hierarchical Probabilistic Neural Network Language Model IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Frederic Morin; Yoshua Bengio; |
| 2005 | 3 | Semi-Supervised Classification By Low Density Separation IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Olivier Chapelle; Alexander Zien; |
| 2005 | 4 | Semisupervised Alignment of Manifolds IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Jihun Ham; Daniel Lee; Lawrence Saul; |
| 2005 | 5 | Kernel Methods for Missing Variables IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Alex J. Smola; S. V. N. Vishwanathan; Thomas Hofmann; |
| 2005 | 6 | Semiparametric Latent Factor Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Yee Whye Teh; Matthias Seeger; Michael I. Jordan; |
| 2005 | 7 | Hilbertian Metrics and Positive Definite Kernels on Probability Measures IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Matthias Hein; Olivier Bousquet; |
| 2005 | 8 | Efficient Non-Parametric Function Induction in Semi-Supervised Learning IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Olivier Delalleau; Yoshua Bengio; Nicolas Le Roux; |
| 2005 | 9 | Nonlinear Dimensionality Reduction By Semidefinite Programming and Kernel Matrix Factorization IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Kilian Weinberger; Benjamin Packer; Lawrence Saul; |
| 2005 | 10 | Loss Functions for Discriminative Training of Energy-Based Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Yann LeCun; Fu Jie Huang; |
| 2005 | 11 | Inadequacy of Interval Estimates Corresponding to Variational Bayesian Approximations IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Bo Wang; D. M. Titterington; |
| 2005 | 12 | On Manifold Regularization IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Misha Belkin; Partha Niyogi; Vikas Sindhwani; |
| 2005 | 13 | Kernel Constrained Covariance for Dependence Measurement IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
ARTHUR GRETTON et. al. |
| 2005 | 14 | Approximate Inference for Infinite Contingent Bayesian Networks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
BRIAN MILCH et. al. |
| 2005 | 15 | Bayesian Conditional Random Fields IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Yuan Qi; Martin Szummer; Tom Minka; |
| 2003 | 1 | Fast Marginal Likelihood Maximisation for Sparse Bayesian Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We describe a new and highly accelerated algorithm which exploits recently-elucidated properties of the marginal likelihood function to enable maximisation via a principled and efficient sequential addition and deletion of candidate basis functions. |
Michael E. Tipping; Anita C. Faul; |
| 2003 | 2 | Fast Forward Selection to Speed Up Sparse Gaussian Process Regression IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection. |
Matthias W. Seeger; Christopher K. I. Williams; Neil D. Lawrence; |
| 2003 | 3 | Planning By Probabilistic Inference IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents and demonstrates a new approach to the problem of planning under uncertainty. |
Hagai Attias; |
| 2003 | 4 | Tree-reweighted Belief Propagation Algorithms and Approximate ML Estimation By Pseudo-Moment Matching IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In previous work [10] we presented a class of upper bounds on the log partition function of an arbitrary undirected graphical model based on solving a convex variational problem. |
Martin J. Wainwright; Tommi S. Jaakkola; Alan S. Willsky; |
| 2003 | 5 | On The Naive Bayes Model for Text Categorization IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We examine the "naive Bayes" assumption in the four models and show that the multinomial model is a modified naive Bayes Poisson model that assumes independence of document length and document class. |
Susana Eyheramendy; David D. Lewis; David Madigan; |
| 2003 | 6 | A Generalized Linear Model for Principal Component Analysis of Binary Data IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we derive an alternating least squares method to estimate the basis vectors and generalized linear coefficients of the logistic PCA model. |
Andrew I. Schein; Lawrence K. Saul; Lyle H. Ungar; |
| 2003 | 7 | A Unifying Theorem for Spectral Embedding and Clustering IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a theorem that explains, for broad classes of affinity matrices and eigenbases, why this works: For successively smaller eigenbases (i.e., using fewer and fewer of the affinity matrix’s dominant eigenvalues and eigenvectors), the angles between "similar" vectors in the new representation shrink while the angles between "dissimilar" vectors grow. |
Matthew Brand; Kun Huang; |
| 2003 | 8 | Super-resolution Enhancement of Video IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We consider the problem of enhancing the resolution of video through the addition of perceptually plausible high frequency information. |
Christopher M. Bishop; Andrew Blake; Bhaskara Marthi; |
| 2003 | 9 | Solving Markov Random Fields Using Semi Definite Programming IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper explores a new generic method for matching, when there are conditional dependencies between the matches. |
Philip H. S. Torr; |
| 2003 | 10 | Fast Robust Logistic Regression for Large Sparse Datasets with Binary Outputs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper consists of an empirical examination of the first assumption, and surveys, implements and compares techniques by which logistic regression can be scaled to data with millions of attributes and records. |
Paul Komarek; Andrew W. Moore; |
| 2003 | 11 | Rapid Evaluation of Multiple Density Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we present a generalization of that algorithm allowing multiple models with different bandwidths to be computed simultaneously, in substantially less time than either running the singlebandwidth algorithm for each model independently, or running the standard exhaustive method. |
Alexander G. Gray; Andrew W. Moore; |
| 2003 | 12 | Curve Clustering with Random Effects Regression Mixtures IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we address the problem of clustering sets of curve or trajectory data generated by groups of objects or individuals. |
Scott Gaffney; Padhraic Smyth; |
| 2003 | 13 | Bayesian Inference in The Presence of Determinism IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we consider the problem of performing inference on Bayesian networks which exhibit a substantial degree of determinism. |
David Larkin; Rina Dechter; |
| 2003 | 14 | Combining Conjugate Direction Methods with Stochastic Approximation of Gradients IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The method of conjugate directions provides a very effective way to optimize large, deterministic systems by gradient descent. |
Nicol N. Schraudolph; Thore Graepel; |
| 2003 | 15 | On Improving The Efficiency of The Iterative Proportional Fitting Procedure IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: When the junction tree has an intractably large maximum clique size we propose to maximize an approximate constrained entropy based on region graphs (Yedidia et al., 2002). |
Yee Whye Teh; Max Welling; |
| 2001 | 1 | Online Bagging and Boosting IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present simple online bagging and boosting algorithms that we claim perform as well as their batch counterparts. |
Nikunj C. Oza; Stuart J. Russell; |
| 2001 | 2 | A Random Walks View of Spectral Segmentation IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a new view of clustering and segmentation by pairwise similarities. |
Marina Meila; Jianbo Shi; |
| 2001 | 3 | On The Effectiveness of The Skew Divergence for Statistical Language Analysis IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Many measures of distributional similarity have been proposed for use in the distance-weighted averaging framework; here, we empirically study their stability properties, finding that similarity-based estimation appears to make more efficient use of more reliable portions of the training data. |
Lillian Lee; |
| 2001 | 4 | An Anytime Algorithm for Causal Inference IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, I prove that the FCI algorithm can be interrupted at any stage and asked for output. |
Peter Spirtes; |
| 2001 | 5 | An Improved Training Algorithm for Kernel Fisher Discriminants IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a fast training algorithm for the kernel Fisher discriminant classifier. |
Sebastian Mika; Alexander J. Smola; Bernhard Schölkopf; |
| 2001 | 6 | Learning Bayesian Networks with Mixed Variables IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: The paper considers conditional Gaussian networks. As conjugate local priors, we use the Dirichlet distribution for discrete variables and the Gaussian-inverse Gamma distribution … |
Susanne Bottcher; |
| 2001 | 7 | Using Unsupervised Learning to Guide Resampling in Imbalanced Data Sets IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The purpose of this paper is to argue that methods that equalize class membership are not as effective as possible when applied blindly and that improvements can be obtained by adjusting for the within-class imbalance. |
Adam Nickerson; Nathalie Japkowicz; Evangelos E. Milios; |
| 2001 | 8 | Bagging and The Bayesian Bootstrap IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce a Bayesian version of bagging based on the Bayesian bootstrap. |
Merlise Clyde; Herbert Lee; |
| 2001 | 9 | Bayesian Support Vector Regression IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show that the Bayesian evidence framework can be applied to both $\epsilon$-support vector regression ($\epsilon$-SVR) and $\nu$-support vector regression ($\nu$-SVR) algorithms. |
Martin H. C. Law; James Tin-Yau Kwok; |
| 2001 | 10 | Why Averaging Classifiers Can Protect Against Overfitting IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We study a simple learning algorithm for binary classification. |
Yoav Freund; Yishay Mansour; Robert E. Schapire; |
| 2001 | 11 | A Non-Parametric EM-Style Algorithm for Imputing Missing Values IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present an iterative non-parametric algorithm for imputing missing values. |
Rich Caruana; |
| 2001 | 12 | Models for Conditional Probability Tables in Educational Assessment IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Experts in educational assessment can often identify the skills needed to provide a solution for a test item and which patterns of those skills produce better expected performance. |
RUSSELL G. ALMOND et. al. |
| 2001 | 13 | Products of Hidden Markov Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present products of hidden Markov models (PoHMM’s), a way of combining HMM’s to form a distributed state time series model. |
Andrew D. Brown; Geoffrey E. Hinton; |
| 2001 | 14 | A Kernel Approach for Vector Quantization with Guaranteed Distortion Bounds IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a kernel method for vector quantization and clustering. |
Michael E. Tipping; Bernhard Schölkopf; |
| 2001 | 15 | Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Empirical study of sensitivity analysis on a Bayesian network examines the effects of varying the network’s probability parameters on the posterior probabilities of the true hypothesis. |
Oscar Kipersztok; Haiqin Wang; |
| 1999 | 1 | Model Choice: A Minimum Posterior Predictive Loss Approach IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a predictive criterion where the goal is good prediction of a replicate of the observed data but tempered by fidelity to the observed values. |
Sujit Kumar Ghosh; Alan E. Gelfand; |
| 1999 | 2 | Probabilistic Kernel Regression Models IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a class of flexible conditional probability models and techniques for classification/regression problems. |
Tommi S. Jaakkola; David Haussler; |
| 1999 | 3 | Boosting Methodology for Regression Problems IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we develop a new boosting method for regression problems. |
Greg Ridgeway; David Madigan; Thomas S. Richardson; |
| 1999 | 4 | Pattern Discovery Via Entropy Minimization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a framework for learning hidden-variable models by optimizing entropies, in which entropy minimization, posterior maximization, and free energy minimization are all equivalent. |
Matthew Brand; |
| 1999 | 5 | Stochastic Local Search for Bayesian Networks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The paper evaluates empirically the suitability of Stochastic Local Search algorithms (SLS) for finding most probable explanations in Bayesian networks. |
Kalev Kask; Rina Dechter; |
| 1999 | 6 | Modeling Decision Tree Performance with The Power Law IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper discusses the use of a power law to predict decision tree performance. |
Lewis J. Frey; Douglas H. Fisher; |
| 1999 | 7 | Conditional Products: An Alternative Approach to Conditional Independence IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a new abstract approach to the study of conditional independence, founded on a concept analogous to the factorization properties of probabilistic independence, rather than the separation properties of a graph. |
A. Philip Dawid; Milan Studený; |
| 1999 | 8 | A Note on The Comparison of Polynomial Selection Methods IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we present a detailed empirical evaluation of three model selection methods which include an MML based approach and two SRM based methods. |
Murlikrishna Viswanathan; Chris S. Wallace; |
| 1999 | 9 | Efficient Learning Using Constrained Sufficient Statistics IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a new method for speeding up the computational process of learning Bayesian network structure. |
Nir Friedman; Lise Getoor; |
| 1997 | 1 | A Variational Approach to Bayesian Logistic Regression Models and Their Extensions IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We consider a logistic regression model with a Gaussian prior distribution over the parameters. |
Tommi S. Jaakkola; Michael I. Jordan; |
| 1997 | 2 | The Effects of Training Set Size on Decision Tree Complexity IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents experiments with 19 datasets and 5 decision tree pruning algorithms that show that increasing training set size often results in a linear increase in tree size, even when that additional complexity results in no significant increase in classification accuracy. |
Tim Oates; David Jensen; |
| 1997 | 3 | Comparing Predictive Inference Methods for Discrete Domains IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Predictive inference is seen here as the process of determining the predictive distribution of a discrete variable, given a data set of training examples and the values for the other problem domain variables. We consider three approaches for computing this predictive distribution, and assume that the joint probability distribution for the variables belongs to a set of distributions determined by a set of parametric models. |
Petri Kontkanen; Petri Myllymäki; Tomi Silander; Henry Tirri; Peter Grünwald; |
| 1997 | 4 | A Polynomial Time Algorithm for Determining DAG Equivalence in The Presence of Latent Variables and Selection Bias IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Peter Spirtes; Thomas S. Richardson; |
| 1997 | 5 | A Family of Algorithms for Finding Temporal Structure in Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce a new member of that family for handling event-based data, and offer an empirical characterization of a time series based algorithm. |
Tim Oates; Matthew J. Schmill; David Jensen; Paul R. Cohen; |
| 1997 | 6 | A Comparison of Scientific and Engineering Criteria for Bayesian Model Selection IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The approximation is useful because it is computationally efficient, and because it provides a model that facilitates understanding of the domain. |
David Heckerman; David Maxwell Chickering; |
| 1997 | 7 | An Algorithm for Bayesian Network Construction from Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents an efficient algorithm for constructing Bayesian belief networks from databases. |
Jie Cheng; David A. Bell; Weiru Liu; |
| 1997 | 8 | Robust Parameter Learning in Bayesian Networks with Missing Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces a new method to learn the probabilities defining a BBNs from databases with missing data. |
Marco Ramoni; Paola Sebastiani; |
| 1997 | 9 | PAC Learning with Constant-Partition Classification Noise and Applications to Decision Tree Induction IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Specifically, we introduce a new model of noise called *constant-partition classification noise* (CPCN) which generalizes the standard model of classification noise to allow different examples to have different rates of random misclassification. |
Scott E. Decatur; |
| 1997 | 10 | A Forward Monte Carlo Method for Solving Influence Diagrams Using Local Computation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The main goal of this paper is to describe a Monte Carlo method for solving influence diagrams using local computation. |
John M. Charnes; Prakash P. Shenoy; |
| 1997 | 11 | Heuristic Greedy Search Algorithms for Latent Variable Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we will describe how to extend search algorithms developed for non-latent variable DAG models to the case of DAG models with latent variables. |
Peter Spirtes; Thomas S. Richardson; Christopher Meek; |
| 1997 | 12 | Using Prediction to Improve Combinatorial Optimization Search IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper describes a statistical approach to improving the performance of stochastic search algorithms for optimization. |
Justin A. Boyan; Andrew W. Moore; |
| 1997 | 13 | Conceptual Clustering with Numeric-and-Nominal Mixed Data – A New Similarity Based System IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents a new Similarity Based Agglomerative Clustering (SBAC) algorithm that works well for data with mixed numeric and nominal features. |
Cen Li; Gautam Biswas; |
| 1997 | 14 | MML Mixture Modelling of Multi-state, Poisson, VonMises Circular and Gaussian Distributions IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We provide a brief overview of MML inductive inference (Wallace and Boulton (1968) , Wallace and Freeman (1987)), and how it has both an information-theoretic and a Bayesian interpretation. |
Chris S. Wallace; David L. Dowe; |
| 1997 | 15 | Mixed Memory Markov Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We consider how to parameterize Markov models with prohibitively large state spaces. |
Lawrence K. Saul; Michael I. Jordan; |