Key Computer Vision Trends: ICCV 2023 vs 2025

An analysis comparing research trends between ICCV 2023 and ICCV 2025.

Source: ICCV-2023 Paper Digest, ICCV-2025 Paper Digest

1. Generative AI Surge (Diffusion & Gaussian Splatting)

Diffusion models expanded from 2023's focus on image tasks to a broader range of applications in 2025, including video, 3D, and restoration. 3D Gaussian Splatting, almost absent in 2023, emerged in 2025 as a prominent approach to 3D scene representation and rendering.

2. Rise of Multimodal Large Language Models (MLLMs / VLMs)

Vision-Language Models (VLMs) saw substantial growth. While 2023 explored VLM applications, 2025 shows a marked increase in research on Multimodal Large Language Models (MLLMs), focusing on reasoning, instruction following, efficiency, and the mitigation of issues such as hallucination and bias.

3. Advancements in 3D Vision (Beyond NeRF)

While Neural Radiance Fields (NeRF) were popular in 2023, 3D Gaussian Splatting gained significant traction in 2025 for reconstruction and rendering. The overall focus expanded to scaling 3D generation, improving quality and efficiency, scene understanding, and handling dynamics.

4. Efficiency & Adaptation Emphasis

The increasing size of models drove a stronger focus on efficiency in 2025. Key areas include model compression (quantization, pruning), efficient architectures (such as Mamba), token reduction and merging strategies, and parameter-efficient fine-tuning (PEFT) methods for adapting large models.
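
As a rough illustration of one compression technique named above, the following is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is not taken from any ICCV paper; the function names (quantize_int8, dequantize) and the example weights are hypothetical, and practical systems typically use per-channel scales, calibration data, and tensor libraries rather than Python lists.

```python
def quantize_int8(weights):
    """Map a list of floats to integer codes in [-127, 127] with one shared scale.

    Assumption for illustration: symmetric, per-tensor quantization.
    """
    max_abs = max(abs(w) for w in weights)
    # Avoid division by zero when every weight is 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


# Hypothetical example weights.
weights = [0.5, -1.27, 0.003, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per weight is at most half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored as a single small integer plus one shared floating-point scale, which is where the memory saving comes from; the cost is the bounded rounding error reported by max_error.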

5. Deeper Integration in Autonomous Systems

Autonomous driving research moved towards deeper AI integration. While 2023 focused on prediction and cooperative perception, 2025 emphasizes end-to-end systems, world models, vision-language reasoning for driving decisions, and robustness in real-world conditions.

6. Video Takes Center Stage

Video analysis and generation gained significant momentum. Compared to 2023, 2025 shows a marked increase in research on video generation, controllable video editing, long-video understanding, and the use of video for tasks such as 3D reconstruction and motion synthesis.

Other Notable Shifts & Conclusion

Beyond the major trends, there is a growing emphasis on adapting large foundation models (such as CLIP and SAM) through prompting and adapters, often in zero-shot or few-shot settings. Research into ethics, fairness, robustness, and interpretability has also intensified, particularly for generative and multimodal models. Medical imaging remains a strong application area, increasingly integrating multimodal data and foundation models.

In conclusion, the ICCV landscape from 2023 to 2025 shows a rapid acceleration in generative modeling (especially diffusion models and Gaussian Splatting) and a significant shift towards leveraging and refining Large Multimodal Models. Core vision tasks persist, but the focus increasingly includes scaling these technologies and improving their efficiency, controllability, and robustness. It also includes applying them to complex real-world challenges such as autonomous driving and long-form video understanding, alongside a growing awareness of ethical implications.