Most Influential ICLR Papers (2026-03 Version)
To search or review ICLR papers on a specific topic, please use the search by venue (ICLR) and review by venue (ICLR) services. To browse the most productive ICLR authors, ranked by number of accepted papers, see the list of most productive ICLR authors grouped by year.
As a pioneer in the field since 2018, Paper Digest has curated thousands of such lists, drawing on years of accumulated data across decades of conferences and research topics. To ensure users never miss a breakthrough, our daily digest service sifts through tens of thousands of new papers, clinical trials, news articles, and community posts every day, delivering only what matters most to each user's specific interests. Beyond discovery, Paper Digest offers built-in research tools that help users read articles, write articles, get answers, conduct literature reviews, and generate research reports more efficiently.
Paper Digest Team
New York City, New York, 10017
TABLE 1: Most Influential ICLR Papers (2026-03 Version)
| Year | Rank | Paper | Author(s) |
|---|---|---|---|
| 2025 | 1 | SAM 2: Segment Anything in Images and Videos IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. |
NIKHILA RAVI et. al. |
| 2025 | 2 | CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos that align seamlessly with text prompts, with a frame rate of 16 fps and resolution of 768 x 1360 pixels. |
ZHUOYI YANG et. al. |
| 2025 | 3 | KAN: Kolmogorov–Arnold Networks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). |
ZIMING LIU et. al. |
| 2025 | 4 | LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose LiveCodeBench, a comprehensive and contamination-free evaluation of LLMs for code, which collects new problems over time from contests across three competition platforms, Leetcode, Atcoder, and Codeforces. |
NAMAN JAIN et. al. |
| 2025 | 5 | WizardMath: Empowering Mathematical Reasoning for Large Language Models Via Reinforced Evol-Instruct IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of LLMs, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. |
HAIPENG LUO et. al. |
| 2025 | 6 | Show-o: One Single Transformer to Unify Multimodal Understanding and Generation IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. |
JINHENG XIE et. al. |
| 2025 | 7 | GSM-Symbolic: Understanding The Limitations of Mathematical Reasoning in Large Language Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions. |
SEYED IMAN MIRZADEH et. al. |
| 2025 | 8 | NV-Embed: Improved Techniques for Training LLMs As Generalist Embedding Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce the NV-Embed model, incorporating architectural designs, training procedures, and curated datasets to significantly enhance the performance of LLM as a versatile embedding model, while maintaining its simplicity and reproducibility. |
CHANKYU LEE et. al. |
| 2025 | 9 | BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To assess how well LLMs can solve challenging and practical tasks via programs, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks. |
TERRY YUE ZHUO et. al. |
| 2025 | 10 | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this way, we achieve 100\% attack success rate—according to GPT-4 as a judge—on Vicuna-13B, Mistral-7B, Phi-3-Mini, Nemotron-4-340B, Llama-2-Chat-7B/13B/70B, Llama-3-Instruct-8B, Gemma-7B, GPT-3.5, GPT-4o, and R2D2 from HarmBench that was adversarially trained against the GCG attack. |
Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion; |
| 2025 | 11 | RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present the Robotics Diffusion Transformer (RDT), a pioneering diffusion foundation model for bimanual manipulation. |
SONGMING LIU et. al. |
| 2025 | 12 | OpenHands: An Open Platform for AI Software Developers As Generalist Agents IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce OpenHands, a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to a human developer: by writing code, interacting with a command line, and browsing the web. |
XINGYAO WANG et. al. |
| 2025 | 13 | Generative Verifiers: Reward Modeling As Next-Token Prediction IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While LLM-based verifiers are typically trained as discriminative classifiers to score solutions, they do not utilize the text generation capabilities of pretrained LLMs. To overcome this limitation, we instead propose training verifiers using the ubiquitous next-token prediction objective, jointly on verification and solution generation. |
LUNJUN ZHANG et. al. |
| 2025 | 14 | Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recent studies have shown that the denoising process in (generative) diffusion models can induce meaningful (discriminative) representations inside the model, though the quality of these representations still lags behind those learned through recent self-supervised learning methods. We argue that one main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations. |
SIHYUN YU et. al. |
| 2025 | 15 | Training Language Models to Self-Correct Via Reinforcement Learning IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current methods for training self-correction typically depend on either multiple models, a more advanced model, or additional forms of supervision. To address these shortcomings, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM’s self-correction ability using entirely self-generated data. |
AVIRAL KUMAR et. al. |
| 2024 | 1 | SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Stable Diffusion XL (SDXL), a latent diffusion model for text-to-image synthesis. |
DUSTIN PODELL et. al. |
| 2024 | 2 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We believe that the enhanced multi-modal generation capabilities of GPT-4 stem from the utilization of sophisticated large language models (LLM). To examine this phenomenon, we present MiniGPT-4, which aligns a frozen visual encoder with a frozen advanced LLM, Vicuna, using one projection layer. |
Deyao Zhu; Jun Chen; Xiaoqian Shen; Xiang Li; Mohamed Elhoseiny; |
| 2024 | 3 | Let’s Verify Step By Step IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. |
HUNTER LIGHTMAN et. al. |
| 2024 | 4 | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We observe that the inefficiency is due to suboptimal work partitioning between different thread blocks and warps on the GPU, causing either low-occupancy or unnecessary shared memory reads/writes. We propose FlashAttention-2, with better work partitioning to address these issues. |
Tri Dao; |
| 2024 | 5 | SWE-bench: Can Language Models Resolve Real-world Github Issues? IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we introduce SWE-bench, an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories. |
CARLOS E JIMENEZ et. al. |
| 2024 | 6 | Self-RAG: Learning to Retrieve, Generate, and Critique Through Self-Reflection IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a new framework called **Self-Reflective Retrieval-Augmented Generation (Self-RAG)** that enhances an LM’s quality and factuality through retrieval and self-reflection. |
Akari Asai; Zeqiu Wu; Yizhong Wang; Avirup Sil; Hannaneh Hajishirzi; |
| 2024 | 7 | Efficient Streaming Language Models with Attention Sinks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we first demonstrate that the emergence of attention sink is due to the strong attention scores towards initial tokens as a “sink” even if they are not semantically important. Based on the above analysis, we introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. |
Guangxuan Xiao; Yuandong Tian; Beidi Chen; Song Han; Mike Lewis; |
| 2024 | 8 | ITransformer: Inverted Transformers Are Effective for Time Series Forecasting IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we reflect on the competent duties of Transformer components and repurpose the Transformer architecture without any modification to the basic components. |
YONG LIU et. al. |
| 2024 | 9 | AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models Without Specific Tuning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present AnimateDiff, a practical framework for animating personalized T2I models without requiring model-specific tuning. |
YUWEI GUO et. al. |
| 2024 | 10 | MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability in mathematical reasoning in visual contexts has not been systematically studied. To bridge this gap, we present MathVista, a benchmark designed to combine challenges from diverse mathematical and visual tasks. |
PAN LU et. al. |
| 2024 | 11 | ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This is in contrast to the excellent tool-use capabilities of state-of-the-art (SOTA) closed-source LLMs, e.g., ChatGPT. To bridge this gap, we introduce ToolLLM, a general tool-use framework encompassing data construction, model training, and evaluation. |
YUJIA QIN et. al. |
| 2024 | 12 | WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we show an avenue for creating large amounts of instruction data with varying levels of complexity using LLM instead of humans. |
CAN XU et. al. |
| 2024 | 13 | Teaching Large Language Models to Self-Debug IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, for complex programming tasks, generating the correct solution in one go becomes challenging, thus some prior works have designed program repair approaches to improve code generation performance. In this work, we propose self-debugging, which teaches a large language model to debug its predicted program. |
Xinyun Chen; Maxwell Lin; Nathanael Schärli; Denny Zhou; |
| 2024 | 14 | Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We outline and critically analyze potential mitigations and advocate for further research efforts toward reinforcing safety protocols for the customized fine-tuning of aligned LLMs. |
XIANGYU QI et. al. |
| 2024 | 15 | WebArena: A Realistic Web Environment for Building Autonomous Agents IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we build an environment for language-guided agents that is highly realistic and reproducible.Building upon our environment, we release a set of benchmark tasks focusing on evaluating the functional correctness of task completions. |
SHUYAN ZHOU et. al. |
| 2023 | 1 | Self-Consistency Improves Chain of Thought Reasoning in Language Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. |
XUEZHI WANG et. al. |
| 2023 | 2 | ReAct: Synergizing Reasoning and Acting in Language Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. |
SHUNYU YAO et. al. |
| 2023 | 3 | DreamFusion: Text-to-3D Using 2D Diffusion IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D or multiview data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. |
Ben Poole; Ajay Jain; Jonathan T. Barron; Ben Mildenhall; |
| 2023 | 4 | Flow Matching for Generative Modeling IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. |
Yaron Lipman; Ricky T. Q. Chen; Heli Ben-Hamu; Maximilian Nickel; Matthew Le; |
| 2023 | 5 | A Time Series Is Worth 64 Words: Long-term Forecasting with Transformers IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. |
Yuqi Nie; Nam H Nguyen; Phanwadee Sinthong; Jayant Kalagnanam; |
| 2023 | 6 | An Image Is Worth One Word: Personalizing Text-to-Image Generation Using Textual Inversion IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In other words, we ask: how can we use language-guided models to turn *our* cat into a painting, or imagine a new product based on *our* favorite toy? Here we present a simple approach that allows such creative freedom. |
RINON GAL et. al. |
| 2023 | 7 | DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present DINO (DETR with Improved deNoising anchOr boxes), a strong end-to-end object detector. |
HAO ZHANG et. al. |
| 2023 | 8 | Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present rectified flow, a simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions $\pi_0$ and $\pi_1$, hence providing a unified solution to generative modeling and domain transfer, among various other tasks involving distribution transport. |
Xingchao Liu; Chengyue Gong; qiang liu; |
| 2023 | 9 | Make-A-Video: Text-to-Video Generation Without Text-Video Data IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Make-A-Video — an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). |
URIEL SINGER et. al. |
| 2023 | 10 | TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Technically, we propose the TimesNet with TimesBlock as a task-general backbone for time series analysis. |
HAIXU WU et. al. |
| 2023 | 11 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The prevalence of large language models advances the state-of-the-art for program synthesis, though limited training resources and data impede open access to such models. To democratize this, we train and release a family of large language models up to 16.1B parameters, called CODEGEN, on natural language and programming language data, and open source the training library JAXFORMER. |
ERIK NIJKAMP et. al. |
| 2023 | 12 | Diffusion Posterior Sampling for General Noisy Inverse Problems IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we extend diffusion solvers to efficiently handle general noisy (non)linear inverse problems via the Laplace approximation of the posterior sampling. |
Hyungjin Chung; Jeongsol Kim; Michael Thompson Mccann; Marc Louis Klasky; Jong Chul Ye; |
| 2023 | 13 | GLM-130B: An Open Bilingual Pre-trained Model IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. |
AOHAN ZENG et. al. |
| 2023 | 14 | Large Language Models Are Human-Level Prompt Engineers IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by classical program synthesis and the human approach to prompt engineering, we propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection. |
YONGCHAO ZHOU et. al. |
| 2023 | 15 | Human Motion Diffusion Model IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce Motion Diffusion Model (MDM), a carefully adapted classifier-free diffusion-based generative model for human motion data. |
GUY TEVET et. al. |
| 2022 | 1 | LoRA: Low-Rank Adaptation of Large Language Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Finetuning updates have a low intrinsic rank which allows us to train only the rank decomposition matrices of certain weights, yielding better performance and practical benefits. |
EDWARD J HU et. al. |
| 2022 | 2 | MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Light-weight and general-purpose vision transformers for mobile devices |
Sachin Mehta; Mohammad Rastegari; |
| 2022 | 3 | SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these issues, we introduce a new image synthesis and editing method, Stochastic Differential Editing (SDEdit), based on a diffusion model generative prior, which synthesizes realistic images by iteratively denoising through a stochastic differential equation (SDE). |
CHENLIN MENG et. al. |
| 2022 | 4 | Multitask Prompted Training Enables Zero-Shot Task Generalization IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping general natural language tasks into a human-readable prompted form. |
VICTOR SANH et. al. |
| 2022 | 5 | How Attentive Are Graph Attention Networks? IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We identify that Graph Attention Networks (GAT) compute a very weak form of attention. We show its empirical implications and propose a fix. |
Shaked Brody; Uri Alon; Eran Yahav; |
| 2022 | 6 | Open-vocabulary Object Detection Via Vision and Language Knowledge Distillation IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose using knowledge distillation to train an object detector that can detect objects with arbitrary text inputs, outperforming its supervised counterparts on rare categories. |
Xiuye Gu; Tsung-Yi Lin; Weicheng Kuo; Yin Cui; |
| 2022 | 7 | VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Variance regularization prevents collapse in self-supervised representation learning |
Adrien Bardes; Jean Ponce; Yann LeCun; |
| 2022 | 8 | Towards A Unified View of Parameter-Efficient Transfer Learning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a unified framework for several state-of-the-art parameter-efficient tuning methods, |
Junxian He; Chunting Zhou; Xuezhe Ma; Taylor Berg-Kirkpatrick; Graham Neubig; |
| 2022 | 9 | DAB-DETR: Dynamic Anchor Boxes Are Better Queries for DETR IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present in this paper a novel query formulation using dynamic anchor boxes for DETR and offer a deeper understanding of the role of queries in DETR. |
SHILONG LIU et. al. |
| 2022 | 10 | Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show that our simple position method enables transformer LMs to efficiently and accurately perform inference on longer sequences than they were trained on. |
Ofir Press; Noah Smith; Mike Lewis; |
| 2022 | 11 | An Explanation of In-context Learning As Implicit Bayesian Inference IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In-context learning emerges both theoretically and empirically when the pretraining distribution is a mixture distribution, resulting in the language model implicitly performing Bayesian inference in its forward pass. |
Sang Michael Xie; Aditi Raghunathan; Percy Liang; Tengyu Ma; |
| 2022 | 12 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we relax these constraints and present a minimalist pretraining framework, named Simple Visual Language Model (SimVLM). |
ZIRUI WANG et. al. |
| 2022 | 13 | Reversible Instance Normalization for Accurate Time-Series Forecasting Against Distribution Shift IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a simple yet effective normalization method, reversible instance normalization (RevIN), which solves the time-series forecasting task against the distribution shift problem. |
TAESUNG KIM et. al. |
| 2022 | 14 | Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper detects time series anomalies from a new association-based dimension. We find an inherently normal-abnormal distinguishable evidence as Association Discrepancy. Co-designed with this evidence, our model achieves the SOTA on six benchmarks. |
Jiehui Xu; Haixu Wu; Jianmin Wang; Mingsheng Long; |
| 2022 | 15 | Pseudo Numerical Methods for Diffusion Models on Manifolds IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose PNDMs, a new kind of numerical method, to accelerate diffusion models on manifolds. |
Luping Liu; Yi Ren; Zhijie Lin; Zhou Zhao; |
| 2021 | 1 | An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Transformers applied directly to image patches and pre-trained on large datasets work really well on image classification. |
ALEXEY DOSOVITSKIY et. al. |
| 2021 | 2 | Denoising Diffusion Implicit Models IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show and justify a GAN-like iterative generative model with relatively fast sampling, high sample quality and without any adversarial training. |
Jiaming Song; Chenlin Meng; Stefano Ermon; |
| 2021 | 3 | Score-Based Generative Modeling Through Stochastic Differential Equations IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A general framework for training and sampling from score-based models that unifies and generalizes previous methods, allows likelihood computation, and enables controllable generation. |
YANG SONG et. al. |
| 2021 | 4 | Measuring Massive Multitask Language Understanding IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We test language models on 57 different multiple-choice tasks. |
DAN HENDRYCKS et. al. |
| 2021 | 5 | Deformable DETR: Deformable Transformers for End-to-End Object Detection IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Deformable DETR is an efficient and fast-converging end-to-end object detector. It mitigates the high complexity and slow convergence issues of DETR via a novel sampling-based efficient attention mechanism. |
XIZHOU ZHU et. al. |
| 2021 | 6 | DEBERTA: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A new model architecture DeBERTa is proposed that improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. |
Pengcheng He; Xiaodong Liu; Jianfeng Gao; Weizhu Chen; |
| 2021 | 7 | Fourier Neural Operator for Parametric Partial Differential Equations IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A novel neural operator based on Fourier transformation for learning partial differential equations. |
ZONGYI LI et. al. |
| 2021 | 8 | Rethinking Attention with Performers IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Performers, linear full-rank-attention Transformers via provable random feature approximation methods, without relying on sparsity or low-rankness. |
KRZYSZTOF MARCIN CHOROMANSKI et. al. |
| 2021 | 9 | Adaptive Federated Optimization IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose adaptive federated optimization techniques, and highlight their improved performance over popular methods such as FedAvg. |
SASHANK J. REDDI et. al. |
| 2021 | 10 | DiffWave: A Versatile Diffusion Model for Audio Synthesis IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: DiffWave is a versatile diffusion probabilistic model for waveform generation, which matches the state-of-the-art neural vocoder in terms of quality and can generate abundant realistic voices in time-domain without any conditional information. |
Zhifeng Kong; Wei Ping; Jiaji Huang; Kexin Zhao; Bryan Catanzaro; |
| 2021 | 11 | Sharpness-aware Minimization for Efficiently Improving Generalization IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Motivated by the connection between geometry of the loss landscape and generalization, we introduce a procedure for simultaneously minimizing loss value and loss sharpness. |
Pierre Foret; Ariel Kleiner; Hossein Mobahi; Behnam Neyshabur; |
| 2021 | 12 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we demonstrate conditional computation as a remedy to the above mentioned impediments, and demonstrate its efficacy and utility. |
DMITRY LEPIKHIN et. al. |
| 2021 | 13 | FastSpeech 2: Fast and High-Quality End-to-End Text to Speech IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a non-autoregressive TTS model named FastSpeech 2 to better solve the one-to-many mapping problem in TTS and surpass autoregressive models in voice quality. |
YI REN et. al. |
| 2021 | 14 | Tent: Fully Test-Time Adaptation By Entropy Minimization IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Deep networks can generalize better during testing by adapting to feedback from their own predictions. |
Dequan Wang; Evan Shelhamer; Shaoteng Liu; Bruno Olshausen; Trevor Darrell; |
| 2021 | 15 | GraphCodeBERT: Pre-training Code Representations with Data Flow IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code. |
DAYA GUO et. al. |
| 2020 | 1 | ELECTRA: Pre-training Text Encoders As Discriminators Rather Than Generators IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A text encoder trained to distinguish real input tokens from plausible fakes efficiently learns effective language representations. |
Kevin Clark; Minh-Thang Luong; Quoc V. Le; Christopher D. Manning; |
| 2020 | 2 | BERTScore: Evaluating Text Generation With BERT IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose BERTScore, an automatic evaluation metric for text generation, which correlates better with human judgments and provides stronger model selection performance than existing metrics. |
Tianyi Zhang*; Varsha Kishore*; Felix Wu*; Kilian Q. Weinberger; Yoav Artzi; |
| 2020 | 3 | ALBERT: A Lite BERT For Self-supervised Learning Of Language Representations IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A new pretraining method that establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large. |
ZHENZHONG LAN et. al. |
| 2020 | 4 | The Curious Case Of Neural Text Degeneration IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current language generation systems either aim for high likelihood and devolve into generic repetition or miscalibrate their stochasticity; we provide evidence of both and propose a solution: Nucleus Sampling. |
Ari Holtzman; Jan Buys; Leo Du; Maxwell Forbes; Yejin Choi; |
| 2020 | 5 | On The Convergence Of FedAvg On Non-IID Data IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. |
Xiang Li; Kaixuan Huang; Wenhao Yang; Shusen Wang; Zhihua Zhang; |
| 2020 | 6 | Reformer: The Efficient Transformer IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Efficient Transformer with locality-sensitive hashing and reversible layers |
Nikita Kitaev; Lukasz Kaiser; Anselm Levskaya; |
| 2020 | 7 | On The Variance Of The Adaptive Learning Rate And Beyond IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: If warmup is the answer, what is the question? |
LIYUAN LIU et. al. |
| 2020 | 8 | VL-BERT: Pre-training Of Generic Visual-Linguistic Representations IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: VL-BERT is a simple yet powerful pre-trainable generic representation for visual-linguistic tasks. It is pre-trained on a massive-scale caption dataset and text-only corpus, and can be fine-tuned for various downstream visual-linguistic tasks. |
WEIJIE SU et. al. |
| 2020 | 9 | Dream To Control: Learning Behaviors By Latent Imagination IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Dreamer, an agent that learns long-horizon behaviors purely by latent imagination using analytic value gradients. |
Danijar Hafner; Timothy Lillicrap; Jimmy Ba; Mohammad Norouzi; |
| 2020 | 10 | Strategies For Pre-training Graph Neural Networks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We develop a strategy for pre-training Graph Neural Networks (GNNs) and systematically study its effectiveness on multiple datasets, GNN architectures, and diverse downstream tasks. |
WEIHUA HU* et. al. |
| 2020 | 11 | DropEdge: Towards Deep Graph Convolutional Networks On Node Classification IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper proposes DropEdge, a novel and flexible technique to alleviate over-smoothing and overfitting issue in deep Graph Convolutional Networks. |
Yu Rong; Wenbing Huang; Tingyang Xu; Junzhou Huang; |
| 2020 | 12 | AugMix: A Simple Data Processing Method To Improve Robustness And Uncertainty IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We obtain state-of-the-art robustness to data shifts, and we maintain calibration under data shift even when accuracy drops. |
DAN HENDRYCKS* et. al. |
| 2020 | 13 | Once For All: Train One Network And Specialize It For Efficient Deployment IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce techniques to train a single once-for-all network that fits many hardware platforms. |
Han Cai; Chuang Gan; Tianzhe Wang; Zhekai Zhang; Song Han; |
| 2020 | 14 | N-BEATS: Neural Basis Expansion Analysis For Interpretable Time Series Forecasting IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A novel deep interpretable architecture that achieves state of the art on three large scale univariate time series forecasting datasets |
Boris N. Oreshkin; Dmitri Carpov; Nicolas Chapados; Yoshua Bengio; |
| 2020 | 15 | Decoupling Representation And Classifier For Long-Tailed Recognition IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we decouple the learning procedure into representation learning and classification, and systematically explore how different balancing strategies affect them for long-tailed recognition. |
BINGYI KANG et. al. |
| 2019 | 1 | Decoupled Weight Decay Regularization IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Novel variants of optimization methods that combine the benefits of both adaptive and non-adaptive methods. |
Ilya Loshchilov; Frank Hutter; |
| 2019 | 2 | How Powerful Are Graph Neural Networks? IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We develop theoretical foundations for the expressive power of GNNs and design a provably most powerful GNN. |
Keyulu Xu*; Weihua Hu*; Jure Leskovec; Stefanie Jegelka; |
| 2019 | 3 | GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a multi-task benchmark and analysis platform for evaluating generalization in natural language understanding systems. |
ALEX WANG et. al. |
| 2019 | 4 | Large Scale GAN Training for High Fidelity Natural Image Synthesis IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: GANs benefit from scaling up. |
Andrew Brock; Jeff Donahue; Karen Simonyan; |
| 2019 | 5 | DARTS: Differentiable Architecture Search IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a differentiable architecture search algorithm for both convolutional and recurrent networks, achieving competitive performance with the state of the art using orders of magnitude less computation resources. |
Hanxiao Liu; Karen Simonyan; Yiming Yang; |
| 2019 | 6 | Benchmarking Neural Network Robustness to Common Corruptions and Perturbations IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose ImageNet-C to measure classifier corruption robustness and ImageNet-P to measure perturbation robustness |
Dan Hendrycks; Thomas Dietterich; |
| 2019 | 7 | The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Feedforward neural networks that can have weights pruned after training could have had the same weights pruned before training |
Jonathan Frankle; Michael Carbin; |
| 2019 | 8 | ImageNet-trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: ImageNet-trained CNNs are biased towards object texture (instead of shape, as humans are). Overcoming this major difference between human and machine vision yields improved detection performance and previously unseen robustness to image distortions. |
ROBERT GEIRHOS et. al. |
| 2019 | 9 | Learning Deep Representations By Mutual Information Estimation and Maximization IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We learn deep representation by maximizing mutual information, leveraging structure in the objective, and are able to compute with fully supervised classifiers with comparable architectures |
R DEVON HJELM et. al. |
| 2019 | 10 | Deep Graph Infomax IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A new method for unsupervised representation learning on graphs, relying on maximizing mutual information between local and global representations in a graph. State-of-the-art results, competitive with supervised learning. |
PETAR VELICKOVIC et. al. |
| 2019 | 11 | RotatE: Knowledge Graph Embedding By Relational Rotation in Complex Space IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A new state-of-the-art approach for knowledge graph embedding. |
Zhiqing Sun; Zhi-Hong Deng; Jian-Yun Nie; Jian Tang; |
| 2019 | 12 | ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Proxy-less neural architecture search for directly learning architectures on large-scale target task (ImageNet) while reducing the cost to the same level of normal training. |
Han Cai; Ligeng Zhu; Song Han; |
| 2019 | 13 | A Closer Look at Few-shot Classification IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A detailed empirical study of few-shot classification that reveals challenges in the standard evaluation setting and points to a new direction. |
Wei-Yu Chen; Yen-Cheng Liu; Zsolt Kira; Yu-Chiang Frank Wang; Jia-Bin Huang; |
| 2019 | 14 | Predict Then Propagate: Graph Neural Networks Meet Personalized PageRank IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Personalized propagation of neural predictions (PPNP) improves graph neural networks by separating them into prediction and propagation via personalized PageRank. |
Johannes Klicpera; Aleksandar Bojchevski; Stephan Günnemann; |
| 2019 | 15 | Robustness May Be at Odds with Accuracy IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show that adversarial robustness might come at the cost of standard classification performance, but also yields unexpected benefits. |
Dimitris Tsipras; Shibani Santurkar; Logan Engstrom; Alexander Turner; Aleksander Madry; |
| 2018 | 1 | Graph Attention Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A novel approach to processing graph-structured data by neural networks, leveraging attention over a node’s neighborhood. Achieves state-of-the-art results on transductive citation network tasks and an inductive protein-protein interaction task. |
PETAR VELICKOVIC et. al. |
| 2018 | 2 | Towards Deep Learning Models Resistant to Adversarial Attacks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We provide a principled, optimization-based re-look at the notion of adversarial examples, and develop methods that produce models that are adversarially robust against a wide range of adversaries. |
Aleksander Madry; Aleksandar Makelov; Ludwig Schmidt; Dimitris Tsipras; Adrian Vladu; |
| 2018 | 3 | Mixup: Beyond Empirical Risk Minimization IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Training on convex combinations between random training examples and their labels improves generalization in deep neural networks |
Hongyi Zhang; Moustapha Cisse; Yann N. Dauphin; David Lopez-Paz; |
| 2018 | 4 | Progressive Growing of GANs for Improved Quality, Stability, and Variation IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We train generative adversarial networks in a progressive fashion, enabling us to generate high-resolution images with high quality. |
Tero Karras; Timo Aila; Samuli Laine; Jaakko Lehtinen; |
| 2018 | 5 | Spectral Normalization for Generative Adversarial Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator of GANs. |
Takeru Miyato; Toshiki Kataoka; Masanori Koyama; Yuichi Yoshida; |
| 2018 | 6 | Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A neural sequence model that learns to forecast on a directed graph. |
Yaguang Li; Rose Yu; Cyrus Shahabi; Yan Liu; |
| 2018 | 7 | Unsupervised Representation Learning By Predicting Image Rotations IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In our work we propose to learn image features by training ConvNets to recognize the 2d rotation that is applied to the image that it gets as input. |
Spyros Gidaris; Praveer Singh; Nikos Komodakis; |
| 2018 | 8 | Ensemble Adversarial Training: Attacks and Defenses IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Adversarial training with single-step methods overfits, and remains vulnerable to simple black-box and white-box attacks. We show that including adversarial examples from multiple sources helps defend against black-box attacks. |
FLORIAN TRAMÈR et. al. |
| 2018 | 9 | On The Convergence of Adam and Beyond IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We investigate the convergence of popular optimization algorithms such as Adam and RMSProp, and propose new variants of these methods that provably converge to the optimal solution in convex settings. |
Sashank J. Reddi; Satyen Kale; Sanjiv Kumar; |
| 2018 | 10 | Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose ODIN, a simple and effective method that does not require any change to a pre-trained neural network. |
Shiyu Liang; Yixuan Li; R. Srikant; |
| 2018 | 11 | Active Learning for Convolutional Neural Networks: A Core-Set Approach IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We approach the problem of active learning as core-set selection and show that this approach is especially useful in the batch active learning setting, which is crucial when training CNNs. |
Ozan Sener; Silvio Savarese; |
| 2018 | 12 | Mixed Precision Training IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Since the half-precision format has a narrower range than single precision, we propose three techniques for preventing the loss of critical information. |
PAULIUS MICIKEVICIUS et. al. |
| 2018 | 13 | Variational Image Compression with A Scale Hyperprior IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We describe an end-to-end trainable model for image compression based on variational autoencoders. |
Johannes Ballé; David Minnen; Saurabh Singh; Sung Jin Hwang; Nick Johnston; |
| 2018 | 14 | Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: An end-to-end trained deep neural network that leverages Gaussian Mixture Modeling to perform density estimation and unsupervised anomaly detection in a low-dimensional space learned by deep autoencoder. |
BO ZONG et. al. |
| 2018 | 15 | Demystifying MMD GANs IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We explain the bias situation with MMD GANs, show that MMD GANs work with smaller critic networks than WGAN-GPs, and propose a new GAN evaluation metric. |
Mikolaj Binkowski; Dougal J. Sutherland; Michael Arbel; Arthur Gretton; |
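Some of the techniques listed above are simple enough to sketch in a few lines. For example, the mixup entry (2018, rank 3) trains on convex combinations of random example pairs and their labels. The following minimal NumPy sketch illustrates that idea only; the function name and the Beta(alpha, alpha) default are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two training examples and their one-hot labels (mixup)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)      # mixing coefficient drawn from Beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2   # convex combination of the inputs
    y = lam * y1 + (1.0 - lam) * y2   # convex combination of the one-hot labels
    return x, y
```

In practice, the blended pairs are drawn per minibatch and fed to the usual cross-entropy loss; the concentration parameter alpha controls how far mixed samples stray from the originals.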
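Likewise, the Nucleus Sampling fix proposed in "The Curious Case Of Neural Text Degeneration" (2020, rank 4) samples only from the smallest set of tokens whose cumulative probability exceeds a threshold p. A minimal sketch of that idea, with illustrative names (not the authors' code):

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample a token index from the top-p ('nucleus') subset of a distribution."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]            # token ids, most probable first
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1  # smallest prefix with mass >= p
    nucleus = order[:cutoff]
    renorm = probs[nucleus] / probs[nucleus].sum()  # renormalize within the nucleus
    return int(rng.choice(nucleus, p=renorm))
```

For instance, with probabilities (0.5, 0.3, 0.15, 0.05) and p = 0.7, only the two most probable tokens form the nucleus, so the unreliable tail can never be sampled.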