Focusing on 3D Gaussian Splatting (3DGS) for scene representation, reconstruction, rendering, and manipulation, including efficiency improvements and extensions for dynamic scenes or specific object types.
Reconstructing 3D shapes, scenes, and objects from various inputs (single/multi-view images, point clouds, videos), including surface reconstruction, scene graphs, and understanding of object functionality.
Developing perception, planning, and control systems for autonomous vehicles and robots, including 3D detection, scene completion, trajectory prediction, embodied AI agents, and simulation.
Introducing new datasets and benchmarks for various tasks (e.g., VQA, segmentation, driving, robotics, fairness), and developing novel evaluation metrics and methodologies.
Exploring diffusion models for creating, modifying, and restoring images and videos, including techniques for control, efficiency, and specific applications like style transfer or inpainting.
Developing lightweight and computationally efficient models, including novel architectures (e.g., Mamba, State Space Models), quantization, pruning, efficient attention mechanisms, and model distillation.
Training models across decentralized data sources while preserving privacy, addressing challenges like data heterogeneity, communication efficiency, model merging, and security concerns like backdoor attacks.
Developing and applying generative models other than diffusion models, such as GANs, VAEs, autoregressive models, and flow-based models, for tasks like image/video synthesis, data augmentation, and representation learning.
Focusing on 3D/4D human reconstruction, pose estimation, motion generation/prediction, avatar creation, and understanding human-object interactions, often from monocular video or sparse inputs.
Improving image quality and generating new images, covering tasks like super-resolution, denoising, deblurring, inpainting, colorization, HDR generation, and style transfer (excluding papers primarily focused on diffusion models).
Exploring novel training strategies and adaptation techniques, including domain adaptation/generalization, few-shot/zero-shot learning, continual/lifelong learning, self-supervised learning, and dataset distillation.
Applying computer vision techniques to medical imaging, including segmentation (tumors, vessels, organs), reconstruction, synthesis, registration, anomaly detection, report generation, and representation learning specifically for medical data.
Integrating vision and language understanding, including model architectures, alignment techniques, evaluation benchmarks, and applications in tasks like VQA, captioning, reasoning, and embodied AI.
Core computer vision tasks including detecting, segmenting, and tracking objects in images and videos, with emphasis on open-vocabulary, weakly supervised, few-shot, multi-object, and anomaly scenarios.
Focusing on non-standard imaging modalities and sensors like event cameras, LiDAR, hyperspectral, polarization, thermal, radar, and single-photon detectors, including reconstruction, perception, and fusion techniques.
Addressing issues of fairness, bias, safety, privacy, robustness, and interpretability in AI models, including detection of AI-generated content, adversarial attacks/defenses, explainability methods, and unlearning.
Analyzing and synthesizing video content, including action recognition/detection, temporal grounding, video captioning, long video understanding, video generation/editing, and video-audio alignment.
Core topics in visual content understanding, including image classification, feature learning, metric learning, representation alignment (e.g., visual-textual), and analysis of model capabilities.