Most Influential ECCV Papers (2026-03 Version)
To search or review ECCV papers on a specific topic, please use the search by venue (ECCV) and review by venue (ECCV) services. To browse the most productive ECCV authors, ranked by the number of papers accepted each year, see the list of most productive ECCV authors grouped by year.
As a pioneer in the field since 2018, Paper Digest has curated thousands of such lists, drawing on years of accumulated data across decades of conferences and research topics. To ensure users never miss a breakthrough, our daily digest service sifts through tens of thousands of new papers, clinical trials, news articles, and community posts every day, delivering only what matters most to your specific interests. Beyond discovery, Paper Digest offers built-in research tools that help users read articles, write articles, get answers, conduct literature reviews, and generate research reports more efficiently.
Paper Digest Team
New York City, New York, 10017
TABLE 1: Most Influential ECCV Papers (2026-03 Version)
| Year | Rank | Paper | Author(s) |
|---|---|---|---|
| 2024 | 1 | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection (IF: 8) Highlight: In this paper, we develop an open-set object detector, called Grounding DINO, by marrying the Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. | SHILONG LIU et al. |
| 2024 | 2 | YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information (IF: 8) Highlight: We propose the concept of programmable gradient information (PGI) to cope with the various changes required by deep networks to achieve multiple objectives. | Chien-Yao Wang; I-Hau Yeh; Hong-Yuan Mark Liao; |
| 2024 | 3 | MMBench: Is Your Multi-Modal Model An All-around Player? (IF: 8) Highlight: Meanwhile, subjective benchmarks, such as OwlEval, offer comprehensive evaluations of a model’s abilities by incorporating human labor, which is not scalable and may display significant bias. In response to these challenges, we propose MMBench, a bilingual benchmark for assessing the multi-modal capabilities of VLMs. | YUAN LIU et al. |
| 2024 | 4 | ShareGPT4V: Improving Large Multi-Modal Models with Better Captions (IF: 7) Highlight: In this paper, we delve into the influence of training data on LMMs, uncovering three pivotal findings: 1) highly detailed captions enable more nuanced vision-language alignment, significantly boosting the performance of LMMs on diverse benchmarks and surpassing outcomes from brief captions or VQA data; 2) cutting-edge LMMs can approach the captioning capability of costly human annotators, and open-source LMMs can reach similar quality after lightweight fine-tuning; 3) the performance of LMMs scales with the number of detailed captions, exhibiting remarkable improvements across a range from thousands to millions of captions. | LIN CHEN et al. |
| 2024 | 5 | LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation (IF: 7) Highlight: In this paper, we introduce the Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. | JIAXIANG TANG et al. |
| 2024 | 6 | Adversarial Diffusion Distillation (IF: 7) Highlight: We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1–4 steps while maintaining high image quality. | Axel Sauer; Dominik Lorenz; Andreas Blattmann; Robin Rombach; |
| 2024 | 7 | MambaIR: A Simple Baseline for Image Restoration with State-Space Model (IF: 7) Highlight: In this work, we introduce a simple but effective baseline, named MambaIR, which adds both local enhancement and channel attention to improve the vanilla Mamba. | HANG GUO et al. |
| 2024 | 8 | Grounding Image Matching in 3D with MASt3R (IF: 7) Highlight: In this work, we take a different stance and propose to cast matching as a 3D task, building on a recent and powerful 3D reconstruction framework based on Transformers. | Vincent Leroy; Yohann Cabon; Jerome Revaud; |
| 2024 | 9 | LLaMA-VID: An Image Is Worth 2 Tokens in Large Language Models (IF: 6) Highlight: In this work, we present a novel method, called LLaMA-VID, to tackle the token generation challenge in Vision Language Models (VLMs) for video and image understanding. | Yanwei Li; Chengyao Wang; Jiaya Jia; |
| 2024 | 10 | MathVerse: Does Your Multi-modal LLM Truly See The Diagrams in Visual Math Problems? (IF: 6) Highlight: To this end, we introduce MathVerse, an all-around visual math benchmark designed for an equitable and in-depth evaluation of MLLMs. We meticulously collect 2,612 high-quality, multi-subject math problems with diagrams from publicly available sources. | RENRUI ZHANG et al. |
| 2024 | 11 | CoTracker: It Is Better to Track Together (IF: 6) Highlight: We introduce CoTracker, a transformer-based model that tracks a large number of 2D points in long video sequences. | NIKITA KARAEV et al. |
| 2024 | 12 | SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers (IF: 6) Highlight: We present Scalable Interpolant Transformers (SiT), a family of generative models built on the backbone of Diffusion Transformers (DiT). | NANYE MA et al. |
| 2024 | 13 | DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors (IF: 6) Highlight: Traditional image animation techniques mainly focus on animating natural scenes with stochastic dynamics (e.g., clouds and fluid) or domain-specific motions (e.g., human hair or body motions), which limits their applicability to more general visual content. To overcome this limitation, we explore the synthesis of dynamic content for open-domain images, converting them into animated videos. | JINBO XING et al. |
| 2024 | 14 | VideoMamba: State Space Model for Efficient Video Understanding (IF: 6) Highlight: Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work innovatively adapts Mamba to the video domain. | KUNCHANG LI et al. |
| 2024 | 15 | MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images (IF: 6) Highlight: We introduce MVSplat, an efficient model that, given sparse multi-view images as input, predicts clean feed-forward 3D Gaussians. | YUEDONG CHEN et al. |
| 2022 | 1 | Visual Prompt Tuning (IF: 8) Highlight: This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. | MENGLIN JIA et al. |
| 2022 | 2 | ByteTrack: Multi-Object Tracking By Associating Every Detection Box (IF: 9) Highlight: Objects with low detection scores, e.g. occluded objects, are simply thrown away, which leads to non-negligible missed true objects and fragmented trajectories. To solve this problem, we present a simple, effective, and generic association method that tracks by associating almost every detection box instead of only the high-score ones. | YIFU ZHANG et al. |
| 2022 | 3 | BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images Via Spatiotemporal Transformers (IF: 8) Highlight: In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. | ZHIQI LI et al. |
| 2022 | 4 | TensoRF: Tensorial Radiance Fields (IF: 8) Highlight: We present TensoRF, a novel approach to model and reconstruct radiance fields. | Anpei Chen; Zexiang Xu; Andreas Geiger; Jingyi Yu; Hao Su; |
| 2022 | 5 | Simple Baselines for Image Restoration (IF: 8) Highlight: In this paper, we propose a simple baseline that exceeds state-of-the-art methods and is computationally efficient. | Liangyu Chen; Xiaojie Chu; Xiangyu Zhang; Jian Sun; |
| 2022 | 6 | Exploring Plain Vision Transformer Backbones for Object Detection (IF: 8) Highlight: We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for object detection. | Yanghao Li; Hanzi Mao; Ross Girshick; Kaiming He; |
| 2022 | 7 | MaxViT: Multi-axis Vision Transformer (IF: 7) Highlight: In this paper, we introduce an efficient and scalable attention model we call multi-axis attention, which consists of two aspects: blocked local and dilated global attention. | ZHENGZHONG TU et al. |
| 2022 | 8 | A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge (IF: 7) Highlight: We introduce A-OKVQA, a crowdsourced dataset composed of a diverse set of about 25K questions requiring a broad base of commonsense and world knowledge to answer. | Dustin Schwenk; Apoorv Khandelwal; Christopher Clark; Kenneth Marino; Roozbeh Mottaghi; |
| 2022 | 9 | Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework (IF: 7) Highlight: The currently popular two-stream, two-stage tracking framework extracts the template and search-region features separately and then performs relation modeling; thus the extracted features lack awareness of the target and have limited target-background discriminability. To tackle this issue, we propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling by bridging the template-search image pairs with bidirectional information flows. | Botao Ye; Hong Chang; Bingpeng Ma; Shiguang Shan; Xilin Chen; |
| 2022 | 10 | Detecting Twenty-Thousand Classes Using Image-Level Supervision (IF: 8) Highlight: We propose Detic, which simply trains the classifiers of a detector on image classification data and thus expands the vocabulary of detectors to tens of thousands of concepts. | Xingyi Zhou; Rohit Girdhar; Armand Joulin; Philipp Krähenbühl; Ishan Misra; |
| 2022 | 11 | PETR: Position Embedding Transformation for Multi-View 3D Object Detection (IF: 7) Highlight: In this paper, we develop position embedding transformation (PETR) for multi-view 3D object detection. | Yingfei Liu; Tiancai Wang; Xiangyu Zhang; Jian Sun; |
| 2022 | 12 | DualPrompt: Complementary Prompting for Rehearsal-Free Continual Learning (IF: 7) Highlight: In this work, we present a simple yet effective framework, DualPrompt, which learns a tiny set of parameters, called prompts, to properly instruct a pre-trained model to learn tasks arriving sequentially, without buffering past examples. | ZIFENG WANG et al. |
| 2022 | 13 | MOTR: End-to-End Multiple-Object Tracking with TRansformer (IF: 7) Highlight: In this paper, we propose MOTR, which extends DETR and introduces “track query” to model the tracked instances in the entire video. | FANGAO ZENG et al. |
| 2022 | 14 | Extract Free Dense Labels from CLIP (IF: 7) Highlight: In this paper, we wish to examine the intrinsic potential of CLIP for pixel-level dense prediction, specifically in semantic segmentation. | Chong Zhou; Chen Change Loy; Bo Dai; |
| 2022 | 15 | Compositional Visual Generation with Composable Diffusion Models (IF: 7) Highlight: In this paper, we propose an alternative structured approach for compositional generation using diffusion models. | Nan Liu; Shuang Li; Yilun Du; Antonio Torralba; Joshua B. Tenenbaum; |
| 2020 | 1 | End-to-End Object Detection With Transformers (IF: 9) Highlight: We present a new method that views object detection as a direct set prediction problem. | NICOLAS CARION et al. |
| 2020 | 2 | RAFT: Recurrent All-Pairs Field Transforms For Optical Flow (IF: 8) Highlight: We introduce Recurrent All-Pairs Field Transforms (RAFT), a new deep network architecture for estimating optical flow. | Zachary Teed; Jia Deng; |
| 2020 | 3 | Contrastive Multiview Coding (IF: 9) Highlight: We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. | Yonglong Tian; Dilip Krishnan; Phillip Isola; |
| 2020 | 4 | UNITER: UNiversal Image-TExt Representation Learning (IF: 8) Highlight: In this paper, we introduce UNITER, a UNiversal Image-TExt Representation, learned through large-scale pre-training over four image-text datasets (COCO, Visual Genome, Conceptual Captions, and SBU Captions), which can power heterogeneous downstream V+L tasks with joint multimodal embeddings. | YEN-CHUN CHEN et al. |
| 2020 | 5 | Oscar: Object-Semantics Aligned Pre-training For Vision-Language Tasks (IF: 8) Highlight: While existing methods simply concatenate image region features and text features as input to the model to be pre-trained and use self-attention to learn image-text semantic alignments in a brute-force manner, in this paper we propose a new learning method, Oscar, which uses object tags detected in images as anchor points to significantly ease the learning of alignments. | XIUJUN LI et al. |
| 2020 | 6 | Object-Contextual Representations For Semantic Segmentation (IF: 8) Highlight: In this paper, we address the semantic segmentation problem with a focus on the context aggregation strategy. | Yuhui Yuan; Xilin Chen; Jingdong Wang; |
| 2020 | 7 | Contrastive Learning For Unpaired Image-to-Image Translation (IF: 8) Highlight: We propose a straightforward method for doing so: maximizing mutual information between the two, using a framework based on contrastive learning. | Taesung Park; Alexei A. Efros; Richard Zhang; Jun-Yan Zhu; |
| 2020 | 8 | Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs By Implicitly Unprojecting To 3D (IF: 8) Highlight: We propose a new end-to-end architecture that directly extracts a bird’s-eye-view representation of a scene given image data from an arbitrary number of cameras. | Jonah Philion; Sanja Fidler; |
| 2020 | 9 | Big Transfer (BiT): General Visual Representation Learning (IF: 8) Highlight: We scale up pre-training and propose a simple recipe that we call Big Transfer (BiT). | ALEXANDER KOLESNIKOV et al. |
| 2020 | 10 | Tracking Objects As Points (IF: 8) Highlight: In this paper, we present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. | Xingyi Zhou; Vladlen Koltun; Philipp Krähenbühl; |
| 2020 | 11 | Square Attack: A Query-efficient Black-box Adversarial Attack Via Random Search (IF: 8) Highlight: We propose the Square Attack, a score-based black-box $l_2$ and $l_\infty$ adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. | Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion; Matthias Hein; |
| 2020 | 12 | Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data (IF: 9) Highlight: Towards this end, we present Trajectron++, a modular, graph-structured recurrent model that forecasts the trajectories of a general number of diverse agents while incorporating agent dynamics and heterogeneous data (e.g., semantic maps). | Tim Salzmann; Boris Ivanovic; Punarjay Chakravarty; Marco Pavone; |
| 2020 | 13 | NeRF: Representing Scenes As Neural Radiance Fields For View Synthesis (IF: 8) Highlight: We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. | BEN MILDENHALL et al. |
| 2020 | 14 | Convolutional Occupancy Networks (IF: 8) Highlight: In this paper, we propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes. | Songyou Peng; Michael Niemeyer; Lars Mescheder; Marc Pollefeys; Andreas Geiger; |
| 2020 | 15 | Towards Real-Time Multi-Object Tracking (IF: 8) Highlight: In this paper, we propose an MOT system that allows target detection and appearance embedding to be learned in a shared model. | Zhongdao Wang; Liang Zheng; Yixuan Liu; Yali Li; Shengjin Wang; |
| 2018 | 1 | CBAM: Convolutional Block Attention Module (IF: 9) Highlight: We propose the Convolutional Block Attention Module (CBAM), a simple and effective attention module that can be integrated into any feed-forward convolutional neural network. | Sanghyun Woo; Jongchan Park; Joon-Young Lee; In So Kweon; |
| 2018 | 2 | Encoder-Decoder With Atrous Separable Convolution For Semantic Image Segmentation (IF: 9) Highlight: In this work, we propose to combine the advantages of both methods. | Liang-Chieh Chen; Yukun Zhu; George Papandreou; Florian Schroff; Hartwig Adam; |
| 2018 | 3 | ShuffleNet V2: Practical Guidelines For Efficient CNN Architecture Design (IF: 9) Highlight: Taking these factors into account, this work proposes practical guidelines for efficient network design. | Ningning Ma; Xiangyu Zhang; Hai-Tao Zheng; Jian Sun; |
| 2018 | 4 | Image Super-Resolution Using Very Deep Residual Channel Attention Networks (IF: 9) Highlight: To solve these problems, we propose very deep residual channel attention networks (RCAN). | YULUN ZHANG et al. |
| 2018 | 5 | Group Normalization (IF: 9) Highlight: In this paper, we present Group Normalization (GN) as a simple alternative to BN. | Yuxin Wu; Kaiming He; |
| 2018 | 6 | CornerNet: Detecting Objects As Paired Keypoints (IF: 9) Highlight: We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network. | Hei Law; Jia Deng; |
| 2018 | 7 | Multimodal Unsupervised Image-to-image Translation (IF: 9) Highlight: To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. | Xun Huang; Ming-Yu Liu; Serge Belongie; Jan Kautz; |
| 2018 | 8 | BiSeNet: Bilateral Segmentation Network For Real-time Semantic Segmentation (IF: 9) Highlight: In this paper, we address this dilemma with a novel Bilateral Segmentation Network (BiSeNet). | CHANGQIAN YU et al. |
| 2018 | 9 | Unified Perceptual Parsing For Scene Understanding (IF: 8) Highlight: In this paper, we study a new task called Unified Perceptual Parsing, which requires machine vision systems to recognize as many visual concepts as possible from a given image. | Tete Xiao; Yingcheng Liu; Bolei Zhou; Yuning Jiang; Jian Sun; |
| 2018 | 10 | Deep Clustering For Unsupervised Learning Of Visual Features (IF: 9) Highlight: In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features. | Mathilde Caron; Piotr Bojanowski; Armand Joulin; Matthijs Douze; |
| 2018 | 11 | Image Inpainting For Irregular Holes Using Partial Convolutions (IF: 9) Highlight: We propose to use partial convolutions, where the convolution is masked and renormalized to be conditioned on only valid pixels. | GUILIN LIU et al. |
| 2018 | 12 | Progressive Neural Architecture Search (IF: 9) Highlight: We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms. | CHENXI LIU et al. |
| 2018 | 13 | Simple Baselines For Human Pose Estimation And Tracking (IF: 9) Highlight: This work provides simple and effective baseline methods. | Bin Xiao; Haiping Wu; Yichen Wei; |
| 2018 | 14 | Memory Aware Synapses: Learning What (not) To Forget (IF: 8) Highlight: In this paper, we argue that, given the limited model capacity and the unlimited new information to be learned, knowledge has to be preserved or erased selectively. | Rahaf Aljundi; Francesca Babiloni; Mohamed Elhoseiny; Marcus Rohrbach; Tinne Tuytelaars; |
| 2018 | 15 | ICNet For Real-Time Semantic Segmentation On High-Resolution Images (IF: 9) Highlight: We focus on the challenging task of real-time semantic segmentation in this paper. | Hengshuang Zhao; Xiaojuan Qi; Xiaoyong Shen; Jianping Shi; Jiaya Jia; |
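A few of the methods in the table are simple enough to illustrate directly. Group Normalization (2018, rank 5), for example, normalizes features over groups of channels, independently of the batch dimension, which is what makes it a drop-in alternative to BN at small batch sizes. Below is a minimal NumPy sketch under that description; the function name and tensor shapes are illustrative, not taken from the paper's code.

```python
import numpy as np

def group_norm(x, num_groups, gamma, beta, eps=1e-5):
    """Normalize x of shape (N, C, H, W) over groups of channels.

    gamma, beta: per-channel affine parameters of shape (C,).
    Unlike BatchNorm, statistics are computed per sample, so the
    result does not depend on the batch size.
    """
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    # Split channels into groups; normalize each (group, sample) slice.
    xg = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    xg = (xg - mean) / np.sqrt(var + eps)
    x_norm = xg.reshape(n, c, h, w)
    # Per-channel scale and shift, as in BN.
    return x_norm * gamma.reshape(1, c, 1, 1) + beta.reshape(1, c, 1, 1)
```

With `num_groups=1` this reduces to Layer Norm over (C, H, W); with `num_groups=C` it reduces to Instance Norm, which matches how the paper positions GN between those two extremes.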
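Similarly, the partial convolutions used for irregular-hole inpainting (2018, rank 11) mask the convolution window and renormalize by the fraction of valid pixels, so the output is conditioned only on known image content. The sketch below is a deliberately simplified single-channel, stride-1, no-padding version of that idea; the function name and the scalar-bias simplification are assumptions for illustration, not the paper's multi-channel formulation.

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias=0.0):
    """Masked, renormalized convolution on a single-channel image.

    x:      (H, W) image; mask: (H, W) binary validity mask (1 = valid).
    weight: (kh, kw) kernel. Returns (output, updated_mask), where a
    window with at least one valid pixel is marked valid in the update.
    """
    kh, kw = weight.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    new_mask = np.zeros_like(out)
    win_size = kh * kw
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            m = mask[i:i + kh, j:j + kw]
            valid = m.sum()
            if valid > 0:
                patch = x[i:i + kh, j:j + kw] * m  # zero out invalid pixels
                # Renormalize by the fraction of valid pixels in the window.
                out[i, j] = (weight * patch).sum() * (win_size / valid) + bias
                new_mask[i, j] = 1.0
    return out, new_mask
```

On a constant image, the renormalization makes the output near a hole identical to the hole-free output, which is the property that lets stacked partial convolutions progressively shrink the missing region.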