Paper Digest: ICCV 2023 Highlights
To search or review papers within ICCV-2023 related to a specific topic, please use the search by venue (ICCV-2023) and review by venue (ICCV-2023) services. To browse papers by author, here is a list of all authors (ICCV-2023). You may also like to explore our “Best Paper” Digest (ICCV), which lists the most influential ICCV papers since 1988.
Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users across the world with a number of exclusive services to track, search, review and rewrite scientific literature.
You are welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: ICCV 2023 Highlights
Paper | Author(s) |
---|---|
1 | Towards Attack-tolerant Federated Learning Via Critical Parameter Analysis. Highlight: This paper proposes a new defense strategy, FedCPA (Federated learning with Critical Parameter Analysis). |
Sungwon Han; Sungwon Park; Fangzhao Wu; Sundong Kim; Bin Zhu; Xing Xie; Meeyoung Cha; |
2 | Stochastic Segmentation with Conditional Categorical Diffusion Models. Highlight: In this context, stochastic semantic segmentation methods must learn to predict conditional distributions of labels given the image, but this is challenging due to the typically multimodal distributions, high-dimensional output spaces, and limited annotation data. To address these challenges, we propose a conditional categorical diffusion model (CCDM) for semantic segmentation based on Denoising Diffusion Probabilistic Models. |
Lukas Zbinden; Lars Doorenbos; Theodoros Pissas; Adrian Thomas Huber; Raphael Sznitman; Pablo Márquez-Neila; |
3 | Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model. Highlight: In this paper, we rethink the low-light image enhancement task and propose a physically explainable and generative diffusion model for low-light image enhancement, termed Diff-Retinex. |
Xunpeng Yi; Han Xu; Hao Zhang; Linfeng Tang; Jiayi Ma; |
4 | Bird’s-Eye-View Scene Graph for Vision-Language Navigation. Highlight: However, current agents are built upon panoramic observations, which hinders their ability to perceive 3D scene geometry and easily leads to ambiguous selection of panoramic view. To address these limitations, we present a BEV Scene Graph (BSG), which leverages multi-step BEV representations to encode scene layouts and geometric cues of indoor environment under the supervision of 3D detection. |
Rui Liu; Xiaohan Wang; Wenguan Wang; Yi Yang; |
5 | PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework. Highlight: In this work, we present a simple framework for end-to-end latency-aware tracking, i.e., end-to-end predictive visual tracking (PVT++). |
Bowen Li; Ziyuan Huang; Junjie Ye; Yiming Li; Sebastian Scherer; Hang Zhao; Changhong Fu; |
6 | A Dynamic Dual-Processing Object Detection Framework Inspired By The Brain’s Recognition Mechanism. Highlight: Research in neuroscience has shown that the recognition decision in the brain is based on two processes, namely familiarity and recollection. Based on this biological support, we propose an efficient and effective dual-processing object detection framework. |
Minying Zhang; Tianpeng Bu; Lulu Hu; |
7 | Hard No-Box Adversarial Attack on Skeleton-Based Human Action Recognition with Skeleton-Motion-Informed Gradient. Highlight: In this paper, we show that the vulnerability indeed exists. |
Zhengzhi Lu; He Wang; Ziyi Chang; Guoan Yang; Hubert P. H. Shum; |
8 | GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving. Highlight: Autonomous vehicles operating in complex real-world environments require accurate predictions of interactive behaviors between traffic participants. This paper tackles the interaction prediction problem by formulating it with hierarchical game theory and proposing the GameFormer model for its implementation. |
Zhiyu Huang; Haochen Liu; Chen Lv; |
9 | Towards Better Robustness Against Common Corruptions for Unsupervised Domain Adaptation. Highlight: Towards improving RaCC for UDA methods in an unsupervised manner, we propose a novel Distributionally and Discretely Adversarial Regularization (DDAR) framework in this paper. |
Zhiqiang Gao; Kaizhu Huang; Rui Zhang; Dawei Liu; Jieming Ma; |
10 | Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels. Highlight: Not surprisingly, we find that most LT-MLC and PL-MLC approaches fail to solve the PLT-MLC, resulting in significant performance degradation on the two proposed PLT-MLC benchmarks. Therefore, we propose an end-to-end learning framework: COrrection -> ModificatIon -> balanCe, abbreviated as COMC. |
Wenqiao Zhang; Changshuo Liu; Lingze Zeng; Bengchin Ooi; Siliang Tang; Yueting Zhuang; |
11 | Flexible Visual Recognition By Evidential Modeling of Confusion and Ignorance. Highlight: Two challenges emerge along with this novel task. First, prediction uncertainty should be separately quantified as confusion depicting inter-class uncertainties and ignorance identifying out-of-distribution samples. Second, both confusion and ignorance should be comparable between samples to enable effective decision-making. In this paper, we propose to model these two sources of uncertainty explicitly with the theory of Subjective Logic. |
Lei Fan; Bo Liu; Haoxiang Li; Ying Wu; Gang Hua; |
12 | Texture Generation on 3D Meshes with Point-UV Diffusion. Highlight: In this work, we focus on synthesizing high-quality textures on 3D meshes. |
Xin Yu; Peng Dai; Wenbo Li; Lan Ma; Zhengzhe Liu; Xiaojuan Qi; |
13 | Supervised Homography Learning with Realistic Dataset Generation. Highlight: In this paper, we propose an iterative framework, which consists of two phases: a generation phase and a training phase, to generate realistic training data and yield a supervised homography network. |
Hai Jiang; Haipeng Li; Songchen Han; Haoqiang Fan; Bing Zeng; Shuaicheng Liu; |
14 | E2E-LOAD: End-to-End Long-form Online Action Detection. Highlight: However, these methods are constrained by their fixed backbone design, which fails to leverage the potential benefits of a trainable backbone. This paper introduces an end-to-end learning network that revises these approaches, incorporating a backbone network design that improves effectiveness and efficiency. |
Shuqiang Cao; Weixin Luo; Bairui Wang; Wei Zhang; Lin Ma; |
15 | TALL: Thumbnail Layout for Deepfake Video Detection. Highlight: This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pre-defined layout to realize the preservation of spatial and temporal dependencies. |
Yuting Xu; Jian Liang; Gengyun Jia; Ziming Yang; Yanhao Zhang; Ran He; |
16 | Enhanced Soft Label for Semi-Supervised Semantic Segmentation. Highlight: However, modern self-training based SSL algorithms use a pre-defined constant threshold to select unlabeled pixel samples that contribute to the training, thus failing to be compatible with different learning difficulties of variant categories and different learning status of the model. To address these issues, we propose Enhanced Soft Label (ESL), a curriculum learning approach to fully leverage the high-value supervisory signals implicit in the untrustworthy pseudo label. |
Jie Ma; Chuan Wang; Yang Liu; Liang Lin; Guanbin Li; |
17 | Self-supervised Monocular Depth Estimation: Let’s Talk About The Weather. Highlight: While it is tempting to use such data augmentations for self-supervised depth, in the past this was shown to degrade performance instead of improving it. In this paper, we put forward a method that uses augmentations to remedy this problem. |
Kieran Saunders; George Vogiatzis; Luis J. Manso; |
18 | Bidirectional Alignment for Domain Adaptive Detection with Transformers. Highlight: We propose a Bidirectional Alignment for domain adaptive Detection with Transformers (BiADT) to improve cross-domain object detection performance. |
Liqiang He; Wei Wang; Albert Chen; Min Sun; Cheng-Hao Kuo; Sinisa Todorovic; |
19 | Fast Neural Scene Flow. Highlight: In this paper, we demonstrate that scene flow is different—with the dominant computational bottleneck stemming from the loss function itself (i.e., Chamfer distance). |
Xueqian Li; Jianqiao Zheng; Francesco Ferroni; Jhony Kaesemodel Pontes; Simon Lucey; |
20 | CAME: Contrastive Automated Model Evaluation. Highlight: In this work, we propose Contrastive Automated Model Evaluation (CAME), a novel AutoEval framework that does not involve the training set in the loop. |
Ru Peng; Qiuyang Duan; Haobo Wang; Jiachen Ma; Yanbo Jiang; Yongjun Tu; Xiu Jiang; Junbo Zhao; |
21 | ExposureDiffusion: Learning to Expose for Low-light Image Enhancement. Highlight: This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure model. Different from a vanilla diffusion model that has to perform Gaussian denoising, with the injected physics-based exposure model, our restoration process can directly start from a noisy image instead of pure noise. |
Yufei Wang; Yi Yu; Wenhan Yang; Lanqing Guo; Lap-Pui Chau; Alex C. Kot; Bihan Wen; |
22 | HM-ViT: Hetero-Modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer. Highlight: In this paper, we investigate the multi-agent hetero-modal cooperative perception problem where agents may have distinct sensor modalities. |
Hao Xiang; Runsheng Xu; Jiaqi Ma; |
23 | HyperReenact: One-Shot Reenactment Via Jointly Learning to Refine and Retarget Faces. Highlight: In this paper, we present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity, driven by a target facial pose. |
Stella Bounareli; Christos Tzelepis; Vasileios Argyriou; Ioannis Patras; Georgios Tzimiropoulos; |
24 | Order-preserving Consistency Regularization for Domain Adaptation and Generalization. Highlight: In this work, we propose the Order-preserving Consistency Regularization (OCR) for cross-domain tasks. |
Mengmeng Jing; Xiantong Zhen; Jingjing Li; Cees G. M. Snoek; |
25 | RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D. Highlight: Based on egocentric videos of Ego4D, we constructed RefEgo, a broad-coverage video-based referring expression comprehension dataset. |
Shuhei Kurita; Naoki Katsura; Eri Onami; |
26 | Exploring Temporal Frequency Spectrum in Deep Video Deblurring. Highlight: In this paper, we revisit the blurred sequence in the Fourier space and figure out some intrinsic frequency-temporal priors that imply the temporal blur degradation can be accessibly decoupled in the potential frequency domain. |
Qi Zhu; Man Zhou; Naishan Zheng; Chongyi Li; Jie Huang; Feng Zhao; |
27 | Unified Visual Relationship Detection with Vision and Language Models. Highlight: The issue is exacerbated in visual relationship detection when second-order visual semantics are introduced between pairs of objects. To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs). |
Long Zhao; Liangzhe Yuan; Boqing Gong; Yin Cui; Florian Schroff; Ming-Hsuan Yang; Hartwig Adam; Ting Liu; |
28 | Occ^2Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions. Highlight: In this paper, we propose Occ^2Net, a novel image matching method that models occlusion relations using 3D occupancy and infers matching points in occluded regions. |
Miao Fan; Mingrui Chen; Chen Hu; Shuchang Zhou; |
29 | Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation. Highlight: In this paper, we introduce Make-An-Animation, a text-conditioned human motion generation model which learns more diverse poses and prompts from large-scale image-text datasets, enabling significant improvement in performance over prior works. |
Samaneh Azadi; Akbar Shah; Thomas Hayes; Devi Parikh; Sonal Gupta; |
30 | Rickrolling The Artist: Injecting Backdoors Into Text Encoders for Text-to-Image Synthesis. Highlight: We introduce backdoor attacks against text-guided generative models and demonstrate that their text encoders pose a major tampering risk. |
Lukas Struppek; Dominik Hintersdorf; Kristian Kersting; |
31 | LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation. Highlight: This is because they have to synthesize intricate details about all objects in an image based on a text description. Therefore, we present a technique for segmenting real and AI-generated images using latent diffusion models (LDMs) trained on internet-scale datasets. |
Koutilya PNVR; Bharat Singh; Pallabi Ghosh; Behjat Siddiquie; David Jacobs; |
32 | Workie-Talkie: Accelerating Federated Learning By Overlapping Computing and Communications Via Contrastive Regularization. Highlight: To address the aforementioned challenges, in this paper, we propose a novel "workie-talkie" FL scheme, which can accelerate FL’s training by overlapping local computing and wireless communications via contrastive regularization (FedCR). |
Rui Chen; Qiyu Wan; Pavana Prakash; Lan Zhang; Xu Yuan; Yanmin Gong; Xin Fu; Miao Pan; |
33 | Downstream-agnostic Adversarial Examples. Highlight: In this paper, we propose AdvEncoder, the first framework for generating downstream-agnostic universal adversarial examples based on the pre-trained encoder. |
Ziqi Zhou; Shengshan Hu; Ruizhi Zhao; Qian Wang; Leo Yu Zhang; Junhui Hou; Hai Jin; |
34 | Late Stopping: Avoiding Confidently Learning from Mislabeled Examples. Highlight: In this paper, we propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process. |
Suqin Yuan; Lei Feng; Tongliang Liu; |
35 | AerialVLN: Vision-and-Language Navigation for UAVs. Highlight: Navigating in the sky is more complicated than on the ground because agents need to consider the flying height and more complex spatial relationship reasoning. To fill this gap and facilitate research in this field, we propose a new task named AerialVLN, which is UAV-based and towards outdoor environments. |
Shubo Liu; Hongsheng Zhang; Yuankai Qi; Peng Wang; Yanning Zhang; Qi Wu; |
36 | On The Robustness of Open-World Test-Time Training: Self-Training with Dynamic Prototype Expansion. Highlight: To improve the robustness of OWTTT, we first develop an adaptive strong OOD pruning which improves the efficacy of the self-training TTT method. We further propose a way to dynamically expand the prototypes to represent strong OOD samples for an improved weak/strong OOD data separation. |
Yushu Li; Xun Xu; Yongyi Su; Kui Jia; |
37 | Studying How to Efficiently and Effectively Guide Models with Explanations. Highlight: To better understand the effectiveness of the various design choices that have been explored in the context of model guidance, in this work we conduct an in-depth evaluation across various loss functions, attribution methods, models, and ‘guidance depths’ on the PASCAL VOC 2007 and MS COCO 2014 datasets. |
Sukrut Rao; Moritz Böhle; Amin Parchami-Araghi; Bernt Schiele; |
38 | Most Important Person-Guided Dual-Branch Cross-Patch Attention for Group Affect Recognition. Highlight: In this work, we propose a solution by incorporating the psychological concept of the Most Important Person (MIP), which represents the most noteworthy face in a crowd and has affective semantic meaning. |
Hongxia Xie; Ming-Xian Lee; Tzu-Jui Chen; Hung-Jen Chen; Hou-I Liu; Hong-Han Shuai; Wen-Huang Cheng; |
39 | SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training. Highlight: In this paper, we propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL). |
Hong Yan; Yang Liu; Yushen Wei; Zhen Li; Guanbin Li; Liang Lin; |
40 | Achievement-Based Training Progress Balancing for Multi-Task Learning. Highlight: To balance the training progress, we propose an achievement-based multi-task loss to modulate training speed based on the "achievement," defined as the ratio of current accuracy to single-task accuracy. |
Hayoung Yun; Hanjoo Cho; |
41 | Pose-Free Neural Radiance Fields Via Implicit Pose Regularization. Highlight: We design IR-NeRF, an innovative pose-free NeRF that introduces implicit pose regularization to refine the pose estimator with unposed real images and improve the robustness of pose estimation for real images. |
Jiahui Zhang; Fangneng Zhan; Yingchen Yu; Kunhao Liu; Rongliang Wu; Xiaoqin Zhang; Ling Shao; Shijian Lu; |
42 | Self-supervised Learning to Bring Dual Reversed Rolling Shutter Images Alive. Highlight: In this paper, we propose a Self-supervised learning framework for Dual reversed RS distortions Correction (SelfDRSC), where a DRSC network can be learned to generate a high framerate GS video only based on dual RS images with reversed distortions. |
Wei Shang; Dongwei Ren; Chaoyu Feng; Xiaotao Wang; Lei Lei; Wangmeng Zuo; |
43 | Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation. Highlight: Our key insight is that conflicts within pseudo labels, identified through symbolic knowledge, can serve as strong yet commonly ignored learning signals. |
Chen Liang; Wenguan Wang; Jiaxu Miao; Yi Yang; |
44 | Self-Supervised Monocular Depth Estimation By Direction-aware Cumulative Convolution Network. Highlight: In this paper, we find that self-supervised monocular depth estimation shows a direction sensitivity and environmental dependency in the feature representation. |
Wencheng Han; Junbo Yin; Jianbing Shen; |
45 | Encyclopedic VQA: Visual Questions About Detailed Properties of Fine-Grained Categories. Highlight: We propose Encyclopedic-VQA, a large-scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. |
Thomas Mensink; Jasper Uijlings; Lluis Castrejon; Arushi Goel; Felipe Cadar; Howard Zhou; Fei Sha; André Araujo; Vittorio Ferrari; |
46 | Towards Understanding The Generalization of Deepfake Detectors from A Game-Theoretical View. Highlight: This paper aims to explain the generalization of deepfake detectors from the novel perspective of multi-order interactions among visual concepts. |
Kelu Yao; Jin Wang; Boyu Diao; Chao Li; |
47 | Few-Shot Common Action Localization Via Cross-Attentional Fusion of Context and Temporal Dynamics. Highlight: The goal of this paper is to localize action instances in a long untrimmed query video using just meager trimmed support videos representing a common action whose class information is not given. |
Juntae Lee; Mihir Jain; Sungrack Yun; |
48 | Physically-Plausible Illumination Distribution Estimation. Highlight: As part of this effort, we extend the Cube++ illumination estimation dataset to provide ground truth illumination distributions per image. Using this new ground truth data, we describe how to train a lightweight neural network method to predict the scene’s illumination distribution. |
Egor Ershov; Vasily Tesalin; Ivan Ermakov; Michael S. Brown; |
49 | 3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection. Highlight: We hypothesize that 3D point locations can provide more information than rays. Therefore, we introduce 3D point positional encoding, 3DPPE, to the 3D detection Transformer decoder. |
Changyong Shu; Jiajun Deng; Fisher Yu; Yifan Liu; |
50 | Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach. Highlight: Specifically, we propose a novel clustering-based F&B separation algorithm. |
Qinying Liu; Zilei Wang; Shenghai Rong; Junjie Li; Yixin Zhang; |
51 | VertexSerum: Poisoning Graph Neural Networks for Link Inference. Highlight: The graph links, e.g., social relationships and transaction history, are sensitive and valuable information, which raises privacy concerns when using GNNs. To exploit these vulnerabilities, we propose VertexSerum, a novel graph poisoning attack that increases the effectiveness of graph link stealing by amplifying the link connectivity leakage. |
Ruyi Ding; Shijin Duan; Xiaolin Xu; Yunsi Fei; |
52 | NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection. Highlight: We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input. |
Chenfeng Xu; Bichen Wu; Ji Hou; Sam Tsai; Ruilong Li; Jialiang Wang; Wei Zhan; Zijian He; Peter Vajda; Kurt Keutzer; Masayoshi Tomizuka; |
53 | Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception. Highlight: In this paper, we propose SCOPE, a novel collaborative perception framework that aggregates the spatio-temporal awareness characteristics across on-road agents in an end-to-end manner. |
Kun Yang; Dingkang Yang; Jingyu Zhang; Mingcheng Li; Yang Liu; Jing Liu; Hanqi Wang; Peng Sun; Liang Song; |
54 | LPFF: A Portrait Dataset for Face Generators Across Large Poses. Highlight: To this end, we present LPFF, a large-pose Flickr face dataset comprised of 19,590 high-quality real large-pose portrait images. |
Yiqian Wu; Jing Zhang; Hongbo Fu; Xiaogang Jin; |
55 | Pseudo-label Alignment for Semi-supervised Instance Segmentation. Highlight: However, in existing pipelines, pseudo-labels that contain valuable information may be directly filtered out due to mismatches in class and mask quality. To address this issue, we propose a novel framework, called pseudo-label aligning instance segmentation (PAIS), in this paper. |
Jie Hu; Chen Chen; Liujuan Cao; Shengchuan Zhang; Annan Shu; Guannan Jiang; Rongrong Ji; |
56 | Deep Geometrized Cartoon Line Inbetweening. Highlight: To preserve the precision and detail of the line drawings, we propose a new approach, called AnimeInbet, which geometrizes raster line drawings into graphs of endpoints and reframes the inbetweening task as a graph fusion problem with vertex repositioning. |
Li Siyao; Tianpei Gu; Weiye Xiao; Henghui Ding; Ziwei Liu; Chen Change Loy; |
57 | MixBag: Bag-Level Data Augmentation for Learning from Label Proportions. Highlight: In this paper, we propose a bag-level data augmentation method for LLP called MixBag, which is based on the key observation from our preliminary experiments: the instance-level classification accuracy improves as the number of labeled bags increases even though the total number of instances is fixed. |
Takanori Asanomi; Shinnosuke Matsuo; Daiki Suehiro; Ryoma Bise; |
58 | Effective Real Image Editing with Accelerated Iterative Diffusion Inversion. Highlight: In this work we propose an Accelerated Iterative Diffusion Inversion method, dubbed AIDI, that significantly improves reconstruction accuracy with minimal additional overhead in space and time complexity. |
Zhihong Pan; Riccardo Gherardi; Xiufeng Xie; Stephen Huang; |
59 | 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation. Highlight: In particular, we propose a generative model of deep features based on a volumetric human representation with Gaussian ellipsoidal kernels emitting 3D pose-dependent feature vectors. |
Yi Zhang; Pengliang Ji; Angtian Wang; Jieru Mei; Adam Kortylewski; Alan Yuille; |
60 | Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning. Highlight: In this paper, inspired by the way humans recognize Chinese texts, we propose a two-stage framework for CTR. |
Haiyang Yu; Xiaocong Wang; Bin Li; Xiangyang Xue; |
61 | MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond. Highlight: Leveraging the Unreal Engine 5 City Sample project, we developed a pipeline to easily collect aerial and street city views, accompanied by ground-truth camera poses and a range of additional data modalities. |
Yixuan Li; Lihan Jiang; Linning Xu; Yuanbo Xiangli; Zhenzhi Wang; Dahua Lin; Bo Dai; |
62 | LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis. Highlight: This work presents an easy-to-use regularizer for GAN training, which helps explicitly link some axes of the latent space to a set of pixels in the synthesized image. |
Jiapeng Zhu; Ceyuan Yang; Yujun Shen; Zifan Shi; Bo Dai; Deli Zhao; Qifeng Chen; |
63 | Exploiting Proximity-Aware Tasks for Embodied Social Navigation. Highlight: In this paper, we propose an end-to-end architecture that exploits Proximity-Aware Tasks (referred to as Risk and Proximity Compass) to inject into a reinforcement learning navigation policy the ability to infer common-sense social behaviours. |
Enrico Cancelli; Tommaso Campari; Luciano Serafini; Angel X. Chang; Lamberto Ballan; |
64 | SVDiff: Compact Parameter Space for Diffusion Fine-Tuning. Highlight: In this paper, we propose a novel approach to address the limitations in existing text-to-image diffusion models for personalization and customization. |
Ligong Han; Yinxiao Li; Han Zhang; Peyman Milanfar; Dimitris Metaxas; Feng Yang; |
65 | UniFace: Unified Cross-Entropy Loss for Deep Face Recognition. Highlight: As a result, no unified threshold is available to separate positive sample-to-class pairs from negative sample-to-class pairs. To bridge this gap, we design a UCE (Unified Cross-Entropy) loss for face recognition model training, which is built on the vital constraint that all the positive sample-to-class similarities shall be larger than the negative ones. |
Jiancan Zhou; Xi Jia; Qiufu Li; Linlin Shen; Jinming Duan; |
66 | Jumping Through Local Minima: Quantization in The Loss Landscape of Vision Transformers. Highlight: In our work, dubbed Evol-Q, we use evolutionary search to effectively traverse the non-smooth landscape. Additionally, we propose using an infoNCE loss, which not only helps combat overfitting on the small (1,000 images) calibration dataset but also makes traversing such a highly non-smooth surface easier. |
Natalia Frumkin; Dibakar Gope; Diana Marculescu; |
67 | Hierarchical Contrastive Learning for Pattern-Generalizable Image Corruption Detection. Highlight: In this work, we present a novel method for automatic corruption detection, which allows for blind corruption restoration without known corruption masks. |
Xin Feng; Yifeng Xu; Guangming Lu; Wenjie Pei; |
68 | Learning Optical Flow from Event Camera with Rendered Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to render a physically correct event-flow dataset using computer graphics models. |
Xinglong Luo; Kunming Luo; Ao Luo; Zhengning Wang; Ping Tan; Shuaicheng Liu; |
69 | EPiC: Ensemble of Partial Point Clouds for Robust Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we propose a general ensemble framework, based on partial point cloud sampling. |
Meir Yossef Levi; Guy Gilboa; |
70 | Distilling Large Vision-Language Model with Out-of-Distribution Generalizability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose two principles from vision and language modality perspectives to enhance the student's OOD generalization: (1) by better imitating the teacher's visual representation space, and carefully promoting better coherence in vision-language alignment with the teacher; (2) by enriching the teacher's language representations with informative and fine-grained semantic attributes to effectively distinguish between different labels.
Xuanlin Li; Yunhao Fang; Minghua Liu; Zhan Ling; Zhuowen Tu; Hao Su; |
71 | Cross-Modal Learning with 3D Deformable Attention for Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a new 3D deformable transformer for action recognition with adaptive spatiotemporal receptive fields and a cross-modal learning scheme. |
Sangwon Kim; Dasom Ahn; Byoung Chul Ko; |
72 | What Do Neural Networks Learn in Image Classification? A Frequency Shortcut Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a metric to measure class-wise frequency characteristics and a method to identify frequency shortcuts. |
Shunxin Wang; Raymond Veldhuis; Christoph Brune; Nicola Strisciuglio; |
73 | Tracking By 3D Model Estimation of Unknown Objects in Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most model-free visual object tracking methods formulate the tracking task as object location estimation given by a 2D segmentation or a bounding box in each video frame. We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation, namely the textured 3D shape and 6DoF pose in each video frame. |
Denys Rozumnyi; Jiří Matas; Marc Pollefeys; Vittorio Ferrari; Martin R. Oswald; |
74 | ScatterNeRF: Seeing Through Fog with Physically-Based Inverse Neural Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce ScatterNeRF, a neural rendering method which adequately renders foggy scenes and decomposes the fog-free background from the participating media — exploiting the multiple views from a short automotive sequence without the need for a large training data corpus. |
Andrea Ramazzina; Mario Bijelic; Stefanie Walz; Alessandro Sanvito; Dominik Scheuble; Felix Heide; |
75 | Sigmoid Loss for Language Image Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple pairwise sigmoid loss for image-text pre-training. |
Xiaohua Zhai; Basil Mustafa; Alexander Kolesnikov; Lucas Beyer; |
76 | PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Generic image captions often miss visual details essential for the LM to answer visual questions correctly. To address this challenge, we propose PromptCap (Prompt-guided image Captioning), a captioning model designed to serve as a better connector between images and black-box LMs. |
Yushi Hu; Hang Hua; Zhengyuan Yang; Weijia Shi; Noah A. Smith; Jiebo Luo; |
77 | Neural Video Depth Stabilizer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: An alternative approach is to learn how to enforce temporal consistency from data, but this requires well-designed models and sufficient video depth data. To address these challenges, we propose a plug-and-play framework called Neural Video Depth Stabilizer (NVDS) that stabilizes inconsistent depth estimations and can be applied to different single-image depth models without extra effort. |
Yiran Wang; Min Shi; Jiaqi Li; Zihao Huang; Zhiguo Cao; Jianming Zhang; Ke Xian; Guosheng Lin; |
78 | Learning Symmetry-Aware Geometry Correspondences for 6D Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a geometry correspondence-based framework, termed GCPose, to estimate 6D pose of arbitrary unseen objects without any re-training. |
Heng Zhao; Shenxing Wei; Dahu Shi; Wenming Tan; Zheyang Li; Ye Ren; Xing Wei; Yi Yang; Shiliang Pu; |
79 | TrackFlow: Multi-Object Tracking with Normalizing Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The field of multi-object tracking has recently seen a renewed interest in the good old schema of tracking-by-detection, as its simplicity and strong priors spare it from the complex design and painful babysitting of tracking-by-attention approaches. In view of this, we aim at extending tracking-by-detection to multi-modal settings, where a comprehensive cost has to be computed from heterogeneous information, e.g., 2D motion cues, visual appearance, and pose estimates.
Gianluca Mancusi; Aniello Panariello; Angelo Porrello; Matteo Fabbri; Simone Calderara; Rita Cucchiara; |
80 | Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the success of recent learning-based approaches for image manipulation detection, they typically require expensive pixel-level annotations to train, while exhibiting degraded performance when testing on images that are differently manipulated compared with training images. To address these limitations, we propose weakly-supervised image manipulation detection, such that only binary image-level labels (authentic or tampered with) are required for training purposes.
Yuanhao Zhai; Tianyu Luan; David Doermann; Junsong Yuan; |
81 | PARF: Primitive-Aware Radiance Fusion for Indoor Scene Novel View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a method for fast scene radiance field reconstruction with strong novel view synthesis performance and convenient scene editing functionality. |
Haiyang Ying; Baowei Jiang; Jinzhi Zhang; Di Xu; Tao Yu; Qionghai Dai; Lu Fang; |
82 | DeePoint: Visual Pointing Recognition and Direction Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we realize automatic visual recognition and direction estimation of pointing. |
Shu Nakamura; Yasutomo Kawanishi; Shohei Nobuhara; Ko Nishino; |
83 | Periodically Exchange Teacher-Student for Source-Free Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such a paradigm can easily fall into a training instability problem: when the teacher model collapses uncontrollably due to the domain shift, the student model also suffers drastic performance degradation. To address this issue, we propose the Periodically Exchange Teacher-Student (PETS) method, a simple yet novel approach that introduces a multiple-teacher framework consisting of a static teacher, a dynamic teacher, and a student model.
Qipeng Liu; Luojun Lin; Zhifeng Shen; Zhifeng Yang; |
84 | Generating Instance-level Prompts for Rehearsal-free Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Domain-Adaptive Prompt (DAP), a novel method for continual learning using Vision Transformers (ViT). |
Dahuin Jung; Dongyoon Han; Jihwan Bang; Hwanjun Song; |
85 | Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To adaptively leverage the visual clue before and after the occlusion or blurring for robust hand pose estimation, we propose the Deformer: a framework that implicitly reasons about the relationship between hand parts within the same image (spatial dimension) and different timesteps (temporal dimension). |
Qichen Fu; Xingyu Liu; Ran Xu; Juan Carlos Niebles; Kris M. Kitani; |
86 | HSE: Hybrid Species Embedding for Deep Metric Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we introduce Hybrid Species Embedding (HSE), which employs mixed sample data augmentations to generate hybrid species and provide additional training signals. |
Bailin Yang; Haoqiang Sun; Frederick W. B. Li; Zheng Chen; Jianlu Cai; Chao Song; |
87 | Online Continual Learning on Hierarchical Label Expansion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our configuration allows a network to first learn coarse-grained classes, with data labels continually expanding to more fine-grained classes at various hierarchy depths. To tackle this new setup, we propose a rehearsal-based method that utilizes hierarchy-aware pseudo-labeling to incorporate hierarchical class information.
Byung Hyun Lee; Okchul Jung; Jonghyun Choi; Se Young Chun; |
88 | IDAG: Invariant DAG Searching for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first show that this failure of conventional ML models in DG is attributable to inadequate identification of causal structures. We further propose a novel and theoretically grounded invariant Directed Acyclic Graph (dubbed iDAG) searching framework that attains an invariant graphical relation as the proxy to the causality structure from the intrinsic data-generating process.
Zenan Huang; Haobo Wang; Junbo Zhao; Nenggan Zheng; |
89 | Spacetime Surface Regularization for Neural Dynamic Scene Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an algorithm, 4DRegSDF, for the spacetime surface regularization to improve the fidelity of neural rendering and reconstruction in dynamic scenes. |
Jaesung Choe; Christopher Choy; Jaesik Park; In So Kweon; Anima Anandkumar; |
90 | GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we found that, limited by the scale ambiguity across different scenes in the training dataset, naively introducing coarse geometric poses does not improve performance, which is counter-intuitive. To address this problem, we propose to refine those poses during training through rotation and translation/scale optimization.
Chaoqiang Zhao; Matteo Poggi; Fabio Tosi; Lei Zhou; Qiyu Sun; Yang Tang; Stefano Mattoccia; |
91 | 3D Motion Magnification: Visualizing Subtle Motions from Time-Varying Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a 3D motion magnification method that can magnify subtle motions from scenes captured by a moving camera, while supporting novel view rendering. |
Brandon Y. Feng; Hadi Alzayer; Michael Rubinstein; William T. Freeman; Jia-bin Huang; |
92 | Learning to Transform for Generalizable Instance-wise Invariance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Ideally, the appropriate invariance would be learned from data and inferred at test-time. We treat invariance as a prediction problem. Given any image, we predict a distribution over transformations. We use variational inference to learn this distribution end-to-end. |
Utkarsh Singhal; Carlos Esteves; Ameesh Makadia; Stella X. Yu; |
93 | Audio-Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite this, deception detection research is hindered by the lack of high-quality deception datasets, as well as the difficulties of learning multimodal features effectively. To address this issue, we introduce DOLOS, the largest gameshow deception detection dataset with rich deceptive conversations. |
Xiaobao Guo; Nithish Muthuchamy Selvaraj; Zitong Yu; Adams Wai-Kin Kong; Bingquan Shen; Alex Kot; |
94 | Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Some literature has revealed that hard examples are beneficial for modeling a discriminative boundary accurately. By applying such an idea at the instance level, we elaborate a novel MIL framework with masked hard instance mining (MHIM-MIL), which uses a Siamese structure (Teacher-Student) with a consistency constraint to explore the potential hard instances. |
Wenhao Tang; Sheng Huang; Xiaoxian Zhang; Fengtao Zhou; Yi Zhang; Bo Liu; |
95 | Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the inverse problem – given a collection of different images, can we discover the generative concepts that represent each image? |
Nan Liu; Yilun Du; Shuang Li; Joshua B. Tenenbaum; Antonio Torralba; |
96 | Partition-And-Debias: Agnostic Biases Mitigation Via A Mixture of Biases-Specific Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a more challenging scenario, agnostic biases mitigation, aiming at bias removal when neither the type nor the number of biases in the dataset is known. To address this difficult task, we present the Partition-and-Debias (PnD) method that uses a mixture of biases-specific experts to implicitly divide the bias space into multiple subspaces and a gating module to find a consensus among experts to achieve debiased classification.
Jiaxuan Li; Duc Minh Vo; Hideki Nakayama; |
97 | Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we heuristically propose a Spatial Self-Distillation based Object Detector (SSD-Det) to mine spatial information to refine the inaccurate box in a self-distillation fashion. |
Di Wu; Pengfei Chen; Xuehui Yu; Guorong Li; Zhenjun Han; Jianbin Jiao; |
98 | CC3D: Layout-Conditioned Generation of Compositional 3D Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts, trained using single-view images. |
Sherwin Bahmani; Jeong Joon Park; Despoina Paschalidou; Xingguang Yan; Gordon Wetzstein; Leonidas Guibas; Andrea Tagliasacchi; |
99 | Alleviating Catastrophic Forgetting of Incremental Object Detection Via Within-Class and Between-Class Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we discuss the cause of catastrophic forgetting in IOD task as destruction of semantic feature space. |
Mengxue Kang; Jinpeng Zhang; Jinming Zhang; Xiashuang Wang; Yang Chen; Zhe Ma; Xuhui Huang; |
100 | TextPSG: Panoptic Scene Graph Generation from Textual Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The problem is very challenging for three constraints: 1) no location priors; 2) no explicit links between visual regions and textual entities; and 3) no pre-defined concept sets. To tackle this problem, we propose a new framework TextPSG consisting of four modules, i.e., a region grouper, an entity grounder, a segment merger, and a label generator, with several novel techniques. |
Chengyang Zhao; Yikang Shen; Zhenfang Chen; Mingyu Ding; Chuang Gan; |
101 | Revisiting The Parameter Efficiency of Adapters from The Perspective of Precision Redundancy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate how to make adapters even more efficient, reaching a new minimum size required to store a task-specific fine-tuned network. |
Shibo Jie; Haoqing Wang; Zhi-Hong Deng; |
102 | EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a diversity-prompting selection strategy and compatibility screening protocol to avoid premature convergence and improve search efficiency.
Peijie Dong; Lujun Li; Zimian Wei; Xin Niu; Zhiliang Tian; Hengyue Pan; |
103 | Face Clustering Via Graph Convolutional Networks with Confidence Edges Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we define a new concept called confidence edge and guide the construction of graphs. |
Yang Wu; Zhiwei Ge; Yuhao Luo; Lin Liu; Sulong Xu; |
104 | Learning Spatial-context-aware Global Visual Feature Representation for Instance Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel feature learning framework for instance image retrieval, which embeds local spatial context information into the learned global feature representations. |
Zhongyan Zhang; Lei Wang; Luping Zhou; Piotr Koniusz; |
105 | Cross-modal Latent Space Alignment for Image to Avatar Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel method for automatic vectorized avatar generation from a single portrait image. |
Manuel Ladron de Guevara; Jose Echevarria; Yijun Li; Yannick Hold-Geoffroy; Cameron Smith; Daichi Ito; |
106 | Inspecting The Geographical Representativeness of Images from Text-to-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we measure the geographical representativeness of common nouns (e.g., a house) generated through DALL.E 2 and Stable Diffusion models using a crowdsourced study comprising 540 participants across 27 countries. |
Abhipsa Basu; R. Venkatesh Babu; Danish Pruthi; |
107 | Space-time Prompting for Video Class-incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, prompt-based learning has made impressive progress on image class-incremental learning, but it still lacks sufficient exploration in the video domain. In this paper, we fill this gap by learning multiple prompts based on a powerful image-language pre-trained model, i.e., CLIP, making it fit for video class-incremental learning (VCIL).
Yixuan Pei; Zhiwu Qing; Shiwei Zhang; Xiang Wang; Yingya Zhang; Deli Zhao; Xueming Qian; |
108 | Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Differently from previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches. |
Alberto Baldrati; Davide Morelli; Giuseppe Cartella; Marcella Cornia; Marco Bertini; Rita Cucchiara; |
109 | Time-to-Contact Map By Joint Estimation of Up-to-Scale Inverse Depth and Global Motion Using A Single Event Camera Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem of time-to-contact (TTC) estimation using a single event camera. |
Urbano Miguel Nunes; Laurent Udo Perrinet; Sio-Hoi Ieng; |
110 | Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to present an efficient and flexible mechanism to learn and model degradation relationships in a global view, thereby achieving a unified removal of intricate rain scenes. |
Sixiang Chen; Tian Ye; Jinbin Bai; Erkang Chen; Jun Shi; Lei Zhu; |
111 | A Benchmark for Chinese-English Scene Text Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a real-world Chinese-English benchmark dataset, namely Real-CE, for the task of STISR with the emphasis on restoring structurally complex Chinese characters. |
Jianqi Ma; Zhetong Liang; Wangmeng Xiang; Xi Yang; Lei Zhang; |
112 | HSR-Diff: Hyperspectral Image Super-Resolution Via Conditional Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by recent advancements in deep generative models, we propose an HSI Super-resolution (SR) approach with Conditional Diffusion Models (HSR-Diff) that merges a high-resolution (HR) multispectral image (MSI) with the corresponding LR-HSI. |
Chanyue Wu; Dong Wang; Yunpeng Bai; Hanyu Mao; Ying Li; Qiang Shen; |
113 | Replay: Multi-modal Multi-view Acted Videos for Casual Holography Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Replay, a collection of multi-view, multi-modal videos of humans interacting socially. |
Roman Shapovalov; Yanir Kleiman; Ignacio Rocco; David Novotny; Andrea Vedaldi; Changan Chen; Filippos Kokkinos; Ben Graham; Natalia Neverova; |
114 | Advancing Example Exploitation Can Alleviate Critical Challenges in Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, we investigate the role of examples in AT and find that examples which contribute primarily to accuracy or robustness are distinct. Based on this finding, we propose a novel example-exploitation idea that can further improve the performance of advanced AT methods. |
Yao Ge; Yun Li; Keji Han; Junyi Zhu; Xianzhong Long; |
115 | Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions and is trained collaboratively through two sub-networks, a global and a local network. |
Junjia Huang; Haofeng Li; Xiang Wan; Guanbin Li; |
116 | Removing Anomalies As Noises for Industrial Defect Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a denoising model to detect and localize the anomalies with a generative diffusion model. |
Fanbin Lu; Xufeng Yao; Chi-Wing Fu; Jiaya Jia; |
117 | GPGait: Generalized Pose-based Gait Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve the generalization ability of pose-based methods across datasets, we propose a Generalized Pose-based Gait recognition (GPGait) framework. |
Yang Fu; Shibei Meng; Saihui Hou; Xuecai Hu; Yongzhen Huang; |
118 | Stable and Causal Inference for Discriminative Self-supervised Deep Visual Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although many studies have demonstrated the empirical success of various learning methods, the resulting learned representations can exhibit instability and hinder downstream performance. In this study, we analyze discriminative self-supervised methods from a causal perspective to explain these unstable behaviors and propose solutions to overcome them. |
Yuewei Yang; Hai Li; Yiran Chen; |
119 | ShiftNAS: Improving One-shot NAS Via Probability Shift Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the performance gap and attribute it to the use of uniform sampling, which is a common approach in supernet training. |
Mingyang Zhang; Xinyi Yu; Haodong Zhao; Linlin Ou; |
120 | Semantic Attention Flow Fields for Monocular Dynamic Scene Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: From video, we reconstruct a neural volume that captures time-varying color, density, scene flow, semantics, and attention information. |
Yiqing Liang; Eliot Laidlaw; Alexander Meyerowitz; Srinath Sridhar; James Tompkin; |
121 | LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A critical gap emerges from representing continuous image data in a sparse vocabulary space. To bridge this gap, we introduce a novel pre-training framework, Lexicon-Bottlenecked Language-Image Pre-Training (LexLIP), that learns importance-aware lexicon representations. |
Ziyang Luo; Pu Zhao; Can Xu; Xiubo Geng; Tao Shen; Chongyang Tao; Jing Ma; Qingwei Lin; Daxin Jiang; |
122 | A Fast Unified System for 3D Object Detection and Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present FUS3D, a fast and lightweight system for real-time 3D object detection and tracking on edge devices. |
Thomas Heitzinger; Martin Kampel; |
123 | Adaptive Testing of Computer Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce AdaVision, an interactive process for testing vision models which helps users identify and fix coherent failure modes. |
Irena Gao; Gabriel Ilharco; Scott Lundberg; Marco Tulio Ribeiro; |
124 | LFS-GAN: Lifelong Few-Shot Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the other hand, the existing few-shot GANs suffer from severe catastrophic forgetting when learning multiple tasks. To alleviate these issues, we propose a framework called Lifelong Few-Shot GAN (LFS-GAN) that can generate high-quality and diverse images in lifelong few-shot image generation task. |
Juwon Seo; Ji-Su Kang; Gyeong-Moon Park; |
125 | AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle in naturalistic scenarios. |
Dingkang Yang; Shuai Huang; Zhi Xu; Zhenpeng Li; Shunli Wang; Mingcheng Li; Yuzheng Wang; Yang Liu; Kun Yang; Zhaoyu Chen; Yan Wang; Jing Liu; Peixuan Zhang; Peng Zhai; Lihua Zhang; |
126 | Feature Proliferation — The "Cancer" in StyleGAN and Its Treatments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although effective, it has long been noted that the truncation trick tends to reduce the diversity of synthesized images and unnecessarily sacrifices many distinct image features. To address this issue, in this paper, we first delve into the StyleGAN image synthesis mechanism and discover an important phenomenon, namely Feature Proliferation, which demonstrates how specific features reproduce with forward propagation. |
Shuang Song; Yuanbang Liang; Jing Wu; Yu-Kun Lai; Yipeng Qin; |
127 | Self-Supervised Character-to-Character Distillation for Text Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing self-supervised text recognition methods conduct sequence-to-sequence representation learning by roughly splitting the visual features along the horizontal axis, which limits the flexibility of the augmentations, as large geometric-based augmentations may lead to sequence-to-sequence feature inconsistency. Motivated by this, we propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate general text representation learning. |
Tongkun Guan; Wei Shen; Xue Yang; Qi Feng; Zekun Jiang; Xiaokang Yang; |
128 | MixCycle: Mixup Assisted Semi-Supervised 3D Single Object Tracking with Cycle Consistency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the great success of cycle tracking in unsupervised 2D SOT, we introduce the first semi-supervised approach to 3D SOT. |
Qiao Wu; Jiaqi Yang; Kun Sun; Chu’ai Zhang; Yanning Zhang; Mathieu Salzmann; |
129 | Multi-Label Self-Supervised Learning with Scene Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Self-supervised learning (SSL) methods targeting scene images have seen a rapid growth recently, and they mostly rely on either a dedicated dense matching mechanism or a costly unsupervised object discovery module. This paper shows that instead of hinging on these strenuous operations, quality image representations can be learned by treating scene/multi-label image SSL simply as a multi-label classification problem, which greatly simplifies the learning framework. |
Ke Zhu; Minghao Fu; Jianxin Wu; |
130 | Domain Adaptive Few-Shot Open-Set Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing techniques fall short when it comes to identifying target outliers under domain shifts by learning to reject pseudo-outliers from the source domain, resulting in an incomplete solution to both problems. To address these challenges comprehensively, we propose a novel approach called Domain Adaptive Few-Shot Open Set Recognition (DA-FSOS) and introduce a meta-learning-based architecture named DAFOS-Net. |
Debabrata Pal; Deeptej More; Sai Bhargav; Dipesh Tamboli; Vaneet Aggarwal; Biplab Banerjee; |
131 | DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DiffFacto, a novel probabilistic generative model that learns the distribution of shapes with part-level control. |
George Kiyohiro Nakayama; Mikaela Angelina Uy; Jiahui Huang; Shi-Min Hu; Ke Li; Leonidas Guibas; |
132 | Interactive Class-Agnostic Object Counting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel framework for interactive class-agnostic object counting, where a human user can interactively provide feedback to improve the accuracy of a counter. |
Yifeng Huang; Viresh Ranjan; Minh Hoai; |
133 | Spatio-temporal Prompting Network for Robust Video Feature Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a neat and unified framework, called Spatio-Temporal Prompting Network (STPN). |
Guanxiong Sun; Chi Wang; Zhaoyu Zhang; Jiankang Deng; Stefanos Zafeiriou; Yang Hua; |
134 | Enhancing Fine-Tuning Based Backdoor Defense with Sharpness-Aware Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enhance the fine-tuning based defense, inspired by the observation that the backdoor-related neurons often have larger weight norms, we propose FT-SAM, a novel backdoor defense paradigm that aims to shrink the norms of backdoor-related neurons by incorporating sharpness-aware minimization with fine-tuning. |
Mingli Zhu; Shaokui Wei; Li Shen; Yanbo Fan; Baoyuan Wu; |
135 | Deep Geometry-Aware Camera Self-Calibration from Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a camera self-calibration approach that infers camera intrinsics during application, from monocular videos in the wild. |
Annika Hagemann; Moritz Knorr; Christoph Stiller; |
136 | A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study weakly semi-supervised 3D object detection (WSS3D) with point annotations, where the dataset comprises a small number of fully labeled and massive weakly labeled data with a single point annotated for each 3D object. |
Dingyuan Zhang; Dingkang Liang; Zhikang Zou; Jingyu Li; Xiaoqing Ye; Zhe Liu; Xiao Tan; Xiang Bai; |
137 | Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fully take the gradient stability into consideration, we present a new perspective to the BNNs training, regarding it as the equilibrium between the estimating error and the gradient stability. |
Xiao-Ming Wu; Dian Zheng; Zuhao Liu; Wei-Shi Zheng; |
138 | Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a long-sequence modeling framework, named StreamPETR, for multi-view 3D object detection. |
Shihao Wang; Yingfei Liu; Tiancai Wang; Ying Li; Xiangyu Zhang; |
139 | Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing image classification benchmarks often evaluate recognition on a specific domain (e.g., outdoor images) or a specific task (e.g., classifying plant species), which falls short of evaluating whether pre-trained foundational models are universal visual recognizers. To address this, we formally present the task of Open-domain Visual Entity recognitioN (OVEN), where a model needs to link an image to a Wikipedia entity with respect to a text query. |
Hexiang Hu; Yi Luan; Yang Chen; Urvashi Khandelwal; Mandar Joshi; Kenton Lee; Kristina Toutanova; Ming-Wei Chang; |
140 | MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice. |
Chaoyi Wu; Xiaoman Zhang; Ya Zhang; Yanfeng Wang; Weidi Xie; |
141 | Automated Knowledge Distillation Via Monte Carlo Tree Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Auto-KD, the first automated search framework for optimal knowledge distillation design. |
Lujun Li; Peijie Dong; Zimian Wei; Ya Yang; |
142 | EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods often neglect emotional facial expressions or fail to disentangle them from speech content. To address this issue, this paper proposes an end-to-end neural network to disentangle different emotions in speech so as to generate rich 3D facial expressions. |
Ziqiao Peng; Haoyu Wu; Zhenbo Song; Hao Xu; Xiangyu Zhu; Jun He; Hongyan Liu; Zhaoxin Fan; |
143 | A Soft Nearest-Neighbor Framework for Continual Semi-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-supervised learning–a setting where not all the data samples are labeled. |
Zhiqi Kang; Enrico Fini; Moin Nabi; Elisa Ricci; Karteek Alahari; |
144 | Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent token-based approaches achieve competitive performance to diffusion-based models, their generation performance is still suboptimal as they sample multiple tokens simultaneously without considering the dependence among them. We empirically investigate this problem and propose a learnable sampling model, Text-Conditioned Token Selection (TCTS), to select optimal tokens via localized supervision with text information. |
Jaewoong Lee; Sangwon Jang; Jaehyeong Jo; Jaehong Yoon; Yunji Kim; Jin-Hwa Kim; Jung-Woo Ha; Sung Ju Hwang; |
145 | ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ScanNet++, a large-scale dataset that couples together capture of high-quality and commodity-level geometry and color of indoor scenes. |
Chandan Yeshwanth; Yueh-Cheng Liu; Matthias Nießner; Angela Dai; |
146 | Minimal Solutions to Uncalibrated Two-view Geometry with Known Epipoles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes minimal solutions to uncalibrated two-view geometry with known epipoles. |
Gaku Nakano; |
147 | Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the problem, we propose a novel method to find semantic variations of the target text in the CLIP space. |
Seogkyu Jeon; Bei Liu; Pilhyeon Lee; Kibeom Hong; Jianlong Fu; Hyeran Byun; |
148 | Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issue, we propose the CPEM (Context-aware Planner and Environment-aware Memory) embodied agent to incorporate the contextual information of previous actions for planning and maintaining spatial arrangement of objects with their states (e.g., if an object has been already moved or not) in the environment to the perception model for improving both visual navigation and object interactions. |
Byeonghwi Kim; Jinyeon Kim; Yuyeong Kim; Cheolhong Min; Jonghyun Choi; |
149 | Vox-E: Text-Guided Voxel Editing of 3D Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a technique that harnesses the power of latent diffusion models for editing existing 3D objects. |
Etai Sella; Gal Fiebelman; Peter Hedman; Hadar Averbuch-Elor; |
150 | Inverse Problem Regularization with Hierarchical Variational Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to regularize ill-posed inverse problems using a deep hierarchical Variational AutoEncoder (HVAE) as an image prior. |
Jean Prost; Antoine Houdard; Andrés Almansa; Nicolas Papadakis; |
151 | Unpaired Multi-domain Attribute Translation of 3D Facial Shapes with A Square and Symmetric Geometric Map Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is primarily limited by the lack of 3D generative models and ineffective usage of 3D facial data. We propose a learning framework for 3D facial attribute translation to relieve these limitations. |
Zhenfeng Fan; Zhiheng Zhang; Shuang Yang; Chongyang Zhong; Min Cao; Shihong Xia; |
152 | Passive Ultra-Wideband Single-Photon Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of imaging a dynamic scene over an extreme range of timescales simultaneously–seconds to picoseconds–and doing so passively, without much light, and without any timing signals from the light source(s) emitting it. |
Mian Wei; Sotiris Nousias; Rahul Gulve; David B. Lindell; Kiriakos N. Kutulakos; |
153 | Template Inversion Attack Against Face Recognition Systems Using 3D Face Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on template inversion attacks against face recognition systems and introduce a novel method (dubbed GaFaR) to reconstruct 3D face from facial templates. |
Hatef Otroshi Shahreza; Sébastien Marcel; |
154 | ETran: Energy-Based Transferability Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose ETran, an energy-based transferability assessment metric, which includes three scores: 1) energy score, 2) classification score, and 3) regression score. |
Mohsen Gholami; Mohammad Akbari; Xinglu Wang; Behnam Kamranian; Yong Zhang; |
155 | Predict to Detect: Prediction-guided 3D Object Detection Using Sequential Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These approaches do not fully exploit the potential of sequential images and show limited performance improvements. To address this limitation, we propose a novel 3D object detection model, P2D (Predict to Detect), that integrates a prediction scheme into a detection framework to explicitly extract and leverage motion features. |
Sanmin Kim; Youngseok Kim; In-Jae Lee; Dongsuk Kum; |
156 | Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation for Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation (UniCon-HA), taking into account both the requirements above. |
Guodong Wang; Yunhong Wang; Jie Qin; Dongming Zhang; Xiuguo Bao; Di Huang; |
157 | Learning Image-Adaptive Codebooks for Class-Agnostic Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose AdaCode for learning image-adaptive codebooks for class-agnostic image restoration. |
Kechun Liu; Yitong Jiang; Inchang Choi; Jinwei Gu; |
158 | 3D Segmentation of Humans in Point Clouds with Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Few works have attempted to directly segment humans in cluttered 3D scenes, which is largely due to the lack of annotated training data of humans interacting with 3D scenes. We address this challenge and propose a framework for generating training data of synthetic humans interacting with real 3D scenes. |
Ayça Takmaz; Jonas Schult; Irem Kaftan; Mertcan Akçay; Bastian Leibe; Robert Sumner; Francis Engelmann; Siyu Tang; |
159 | Mastering Spatial Graph Prediction of Road Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accurately predicting road networks from satellite images requires a global understanding of the network topology. We propose to capture such high-level information by introducing a graph-based framework that given a partially generated graph, sequentially adds new edges. |
Sotiris Anagnostidis; Aurelien Lucchi; Thomas Hofmann; |
160 | IDiff-Face: Synthetic-based Face Recognition Through Fizzy Identity-Conditioned Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent synthetic datasets that are used to train face recognition models suffer either from limitations in intra-class diversity or cross-class (identity) discrimination, leading to suboptimal accuracies that fall far short of those achieved by models trained on authentic data. This paper targets this issue by proposing IDiff-Face, a novel approach based on conditional latent diffusion models for synthetic identity generation with realistic identity variations for face recognition training. |
Fadi Boutros; Jonas Henry Grebe; Arjan Kuijper; Naser Damer; |
161 | Deep Video Demoireing Via Compact Invertible Dyadic Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By interpreting video demoireing as a multi-frame decomposition problem, we propose a compact invertible dyadic network called CIDNet that progressively decouples latent frames and the moire patterns from an input video sequence. |
Yuhui Quan; Haoran Huang; Shengfeng He; Ruotao Xu; |
162 | Rethinking Multi-Contrast MRI Super-Resolution: Rectangle-Window Cross-Attention Transformer and Arbitrary-Scale Upsampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, we propose the reference-aware implicit attention as an upsampling module, achieving arbitrary-scale super-resolution via implicit neural representation, further fusing supplementary information of the reference image. |
Guangyuan Li; Lei Zhao; Jiakai Sun; Zehua Lan; Zhanjie Zhang; Jiafu Chen; Zhijie Lin; Huaizhong Lin; Wei Xing; |
163 | Domain Generalization Via Rationale Invariance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For a well-generalized model, we suggest the rationale matrices for samples belonging to the same category should be similar, indicating the model relies on domain-invariant clues to make decisions, thereby ensuring robust results. To implement this idea, we introduce a rationale invariance loss as a simple regularization technique, requiring only a few lines of code. |
Liang Chen; Yong Zhang; Yibing Song; Anton van den Hengel; Lingqiao Liu; |
164 | ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner without needing large-scale datasets or computing. |
Uddeshya Upadhyay; Shyamgopal Karthik; Massimiliano Mancini; Zeynep Akata; |
165 | Towards Open-Set Test-Time Adaptation Utilizing The Wisdom of Crowds in Entropy Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Long-term stable adaptation is hampered by such noisy signals, so training models without such error accumulation is crucial for practical TTA. To address these issues, including open-set TTA, we propose a simple yet effective sample selection method inspired by the following crucial empirical finding. |
Jungsoo Lee; Debasmit Das; Jaegul Choo; Sungha Choi; |
166 | Scene Graph Contrastive Learning for Embodied Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Scene Graph Contrastive (SGC) loss, which uses scene graphs as training-only supervisory signals. |
Kunal Pratap Singh; Jordi Salvador; Luca Weihs; Aniruddha Kembhavi; |
167 | Long-Range Grouping Transformer for Multi-View 3D Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To alleviate this problem, recent methods compress the token number representing each view or discard the attention operations between the tokens from different views. Obviously, this has a negative impact on performance. Therefore, we propose long-range grouping attention (LGA) based on the divide-and-conquer principle. |
Liying Yang; Zhenwei Zhu; Xuxin Lin; Jian Nong; Yanyan Liang; |
168 | Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Latent-OFER, the proposed method, can detect occlusions, restore occluded parts of the face as if they were unoccluded, and recognize them, improving FER accuracy. |
Isack Lee; Eungi Lee; Seok Bong Yoo; |
169 | DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the DenseShift network, which significantly improves the accuracy of Shift networks, achieving competitive performance to full-precision networks for vision and speech applications. |
Xinlin Li; Bang Liu; Rui Heng Yang; Vanessa Courville; Chao Xing; Vahid Partovi Nia; |
170 | Preparing The Future for Continual Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we focus on Continual Semantic Segmentation (CSS) and present a novel approach to tackle the issue of existing methods struggling to learn new classes. |
Zihan Lin; Zilei Wang; Yixin Zhang; |
171 | Efficient Computation Sharing for Multi-Task Visual Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel computation- and parameter-sharing framework that balances efficiency and accuracy to perform multiple visual tasks utilizing individually-trained single-task transformers. |
Sara Shoouri; Mingyu Yang; Zichen Fan; Hun-Seok Kim; |
172 | Self-supervised Cross-view Representation Reconstruction for Change Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Its key challenge is how to learn a stable difference representation under pseudo changes caused by viewpoint change. In this paper, we address this by proposing a self-supervised cross-view representation reconstruction (SCORER) network. |
Yunbin Tu; Liang Li; Li Su; Zheng-Jun Zha; Chenggang Yan; Qingming Huang; |
173 | Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose an Unify, Align and then Refine (UAR) approach to learn multi-level cross-modal alignments and introduce three novel modules: Latent Space Unifier (LSU), Cross-modal Representation Aligner (CRA) and Text-to-Image Refiner (TIR). |
Yaowei Li; Bang Yang; Xuxin Cheng; Zhihong Zhu; Hongxiang Li; Yuexian Zou; |
174 | Synthesizing Diverse Human Motions in 3D Indoor Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel method for populating 3D indoor scenes with virtual humans that can navigate in the environment and interact with objects in a realistic manner. |
Kaifeng Zhao; Yan Zhang; Shaofei Wang; Thabo Beeler; Siyu Tang; |
175 | Deep Optics for Video Snapshot Compressive Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, there are two clouds in the sunshine of SCI: i) low dynamic range as a victim of high temporal multiplexing, and ii) existing deep learning algorithms' degradation on real systems. To address these challenges, this paper presents a deep optics framework to jointly optimize masks and a reconstruction network. |
Ping Wang; Lishun Wang; Xin Yuan; |
176 | DDIT: Semantic Scene Completion Via Deformable Deep Implicit Templates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the completed shapes may be rough and imprecise since respective methods rely on 3D convolution and/or lack effective shape constraints. To overcome these limitations, we propose a semantic scene completion method based on deformable deep implicit templates (DDIT). |
Haoang Li; Jinhu Dong; Binghui Wen; Ming Gao; Tianyu Huang; Yun-Hui Liu; Daniel Cremers; |
177 | Joint Demosaicing and Deghosting of Time-Varying Exposures for Single-Shot HDR Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, time-varying exposures are not ideal for dynamic scenes and require an additional deghosting method. To tackle this issue, we propose a single-shot HDR demosaicing method that takes time-varying multiple exposures as input and jointly solves both the demosaicing and deghosting problems. |
Jungwoo Kim; Min H. Kim; |
178 | Scene-Aware Feature Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This results in significant performance degradation when handling challenging scenes such as scenes with large viewpoint and illumination changes. To tackle this problem, we propose a novel model named SAM, which applies attentional grouping to guide Scene-Aware feature Matching. |
Xiaoyong Lu; Yaping Yan; Tong Wei; Songlin Du; |
179 | FDViT: Improve The Hierarchical Architecture of Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose FDViT to improve the hierarchical architecture of the vision transformer by using a flexible downsampling layer that is not limited to integer stride to smoothly reduce the sizes of the middle feature maps. |
Yixing Xu; Chao Li; Dong Li; Xiao Sheng; Fan Jiang; Lu Tian; Ashish Sirasao; |
180 | Tuning Pre-trained Model Via Moment Probing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Moment Probing (MP) method to further explore the potential of LP. |
Mingze Gao; Qilong Wang; Zhenyi Lin; Pengfei Zhu; Qinghua Hu; Jingbo Zhou; |
181 | Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, including document analysis, retrieval, and office automation. |
Haoyu Cao; Changcun Bao; Chaohu Liu; Huang Chen; Kun Yin; Hao Liu; Yinsong Liu; Deqiang Jiang; Xing Sun; |
182 | Task Agnostic Restoration of Natural Video Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: State-Of-The-Art (SOTA) techniques that address these inconsistencies rely on the availability of unprocessed videos to implicitly siphon and utilize consistent video dynamics to restore the temporal consistency of frame-wise processed videos which often jeopardizes the translation effect. We propose a general framework for this task that learns to infer and utilize consistent motion dynamics from inconsistent videos to mitigate the temporal flicker while preserving the perceptual quality for both the temporally neighboring and relatively distant frames without requiring the raw videos at test time. |
Muhammad Kashif Ali; Dongjin Kim; Tae Hyun Kim; |
183 | TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present TMR, a simple yet effective approach for text-to-3D human motion retrieval. |
Mathis Petrovich; Michael J. Black; Gül Varol; |
184 | 3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce probabilistic modeling to the inverse graphics framework to quantify uncertainty and achieve robustness in 6D pose estimation tasks. |
Guangyao Zhou; Nishad Gothoskar; Lirui Wang; Joshua B. Tenenbaum; Dan Gutfreund; Miguel Lázaro-Gredilla; Dileep George; Vikash K. Mansinghka; |
185 | Towards Robust Model Watermark Via Reducing Parametric Vulnerability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To further explore this vulnerability, we investigate the parametric space and find there exist many watermark-removed models in the vicinity of the watermarked one, which may be easily used by removal attacks. Inspired by this finding, we propose a minimax formulation to find these watermark-removed models and recover their watermark behavior. |
Guanhao Gan; Yiming Li; Dongxian Wu; Shu-Tao Xia; |
186 | SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel training strategy called SupFusion, which provides an auxiliary feature level supervision for effective LiDAR-Camera fusion and significantly boosts detection performance. |
Yiran Qin; Chaoqun Wang; Zijian Kang; Ningning Ma; Zhen Li; Ruimao Zhang; |
187 | EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle these issues, this paper proposes the Emotional Motion Memory Net (EMMN), which synthesizes overall expressions on the talking face via emotion embedding and lip motion rather than audio alone. |
Shuai Tan; Bin Ji; Ye Pan; |
188 | Rethinking Vision Transformers for MobileNet Size and Speed Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate a central question, can transformer models run as fast as MobileNet and maintain a similar size? |
Yanyu Li; Ju Hu; Yang Wen; Georgios Evangelidis; Kamyar Salahi; Yanzhi Wang; Sergey Tulyakov; Jian Ren; |
189 | Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, dramatic and complex motions in the driving video cause ambiguous generation, because the still source image cannot provide sufficient appearance information for occluded regions or delicate expression variations, which produces severe artifacts and significantly degrades the generation quality. To tackle this problem, we propose to learn a global facial representation space, and design a novel implicit identity representation conditioned memory compensation network, coined as MCNet, for high-fidelity talking head generation. |
Fa-Ting Hong; Dan Xu; |
190 | SINC: Self-Supervised In-Context Learning for Vision-Language Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To answer it, we propose a succinct and general framework, Self-supervised IN-Context learning (SINC), that introduces a meta-model to learn on self-supervised prompts consisting of tailored demonstrations. |
Yi-Syuan Chen; Yun-Zhu Song; Cheng Yu Yeo; Bei Liu; Jianlong Fu; Hong-Han Shuai; |
191 | LEA2: A Lightweight Ensemble Adversarial Attack Via Non-overlapping Vulnerable Frequency Regions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we find three types of models with non-overlapping vulnerable frequency regions, which can cover a large enough vulnerable subspace. |
Yaguan Qian; Shuke He; Chenyu Zhao; Jiaqiang Sha; Wei Wang; Bin Wang; |
192 | Chupa: Carving 3D Clothed Humans from Skinned Shape Priors Using 2D Diffusion Probabilistic Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a 3D generation pipeline that uses diffusion models to generate realistic human digital avatars. |
Byungjun Kim; Patrick Kwon; Kwangho Lee; Myunggi Lee; Sookwan Han; Daesik Kim; Hanbyul Joo; |
193 | Unsupervised Domain Adaptive Detection with Network Stability Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, drawing inspiration from the concept of stability from the control theory that a robust system requires to remain consistent both externally and internally regardless of disturbances, we propose a novel framework that achieves unsupervised domain adaptive detection through stability analysis. |
Wenzhang Zhou; Heng Fan; Tiejian Luo; Libo Zhang; |
194 | Learning A Room with The Occ-SDF Hybrid: Signed Distance Function Mingled with Occupancy Aids Scene Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our findings show that the color rendering loss creates an optimization bias against low-intensity areas, resulting in gradient vanishing and leaving these areas unoptimized. To address this issue, we propose a feature-based color rendering loss that utilizes non-zero feature values to bring back optimization signals. |
Xiaoyang Lyu; Peng Dai; Zizhang Li; Dongyu Yan; Yi Lin; Yifan Peng; Xiaojuan Qi; |
195 | Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we define and study a new Cloth2Body problem, which has the goal of generating 3D human body meshes from a 2D clothing image. |
Lu Dai; Liqian Ma; Shenhan Qian; Hao Liu; Ziwei Liu; Hui Xiong; |
196 | Spatially and Spectrally Consistent Deep Functional Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Cycle consistency has long been exploited as a powerful prior for jointly optimizing maps within a collection of shapes. In this paper, we investigate its utility in the approaches of Deep Functional Maps, which are considered state-of-the-art in non-rigid shape matching. |
Mingze Sun; Shiwei Mao; Puhua Jiang; Maks Ovsjanikov; Ruqi Huang; |
197 | Sparse Point Guided 3D Lane Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a sparse point-guided 3D lane detection, focusing on points related to 3D lanes. |
Chengtang Yao; Lidong Yu; Yuwei Wu; Yunde Jia; |
198 | Event-based Temporally Dense Optical Flow Estimation with Sequential Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that temporally dense flow estimation at 100 Hz can be achieved by treating flow estimation as a sequential problem using two different variants of recurrent networks: long short-term memory (LSTM) and spiking neural networks (SNN). |
Wachirawit Ponghiran; Chamika Mihiranga Liyanagedera; Kaushik Roy; |
199 | Going Beyond Nouns With Vision & Language Models Using Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For example, their difficulty in understanding Visual Language Concepts (VLC) that go 'beyond nouns', such as the meaning of non-object words (e.g., attributes, actions, relations, states, etc.), or difficulty in performing compositional reasoning, such as understanding the significance of the order of the words in a sentence. In this work, we investigate to what extent purely synthetic data could be leveraged to teach these models to overcome such shortcomings without compromising their zero-shot capabilities. |
Paola Cascante-Bonilla; Khaled Shehada; James Seale Smith; Sivan Doveh; Donghyun Kim; Rameswar Panda; Gul Varol; Aude Oliva; Vicente Ordonez; Rogerio Feris; Leonid Karlinsky; |
200 | Continual Zero-Shot Learning Through Semantically Guided Generative Random Walks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the challenge of continual zero-shot learning where unseen information is not provided during training, by leveraging generative modeling. |
Wenxuan Zhang; Paul Janson; Kai Yi; Ivan Skorokhodov; Mohamed Elhoseiny; |
201 | Foreground-Background Distribution Modeling Transformer for Visual Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the feature learning of these Transformer-based trackers is easily disturbed by complex backgrounds. To address the above limitations, we propose a novel foreground-background distribution modeling transformer for visual object tracking (F-BDMTrack), including a fore-background agent learning (FBAL) module and a distribution-aware attention (DA2) module in a unified transformer architecture. |
Dawei Yang; Jianfeng He; Yinchao Ma; Qianjin Yu; Tianzhu Zhang; |
202 | MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To investigate the feasibility of using motion expressions to ground and segment objects in videos, we propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments. |
Henghui Ding; Chang Liu; Shuting He; Xudong Jiang; Chen Change Loy; |
203 | OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Omni-suPErvised Representation leArning with hierarchical supervisions (OPERA) as a solution. |
Chengkun Wang; Wenzhao Zheng; Zheng Zhu; Jie Zhou; Jiwen Lu; |
204 | GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, from the perspective of feature extraction, most existing pFL methods only focus on extracting global or personalized feature information during local training, which fails to meet the collaborative learning and personalization goals of pFL. To address this, we propose a new pFL method, named GPFL, to simultaneously learn global and personalized feature information on each client. |
Jianqing Zhang; Yang Hua; Hao Wang; Tao Song; Zhengui Xue; Ruhui Ma; Jian Cao; Haibing Guan; |
205 | Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing methods require computationally expensive fine-tuning of diffusion models or additional neural networks. To address this, we propose a zero-shot contrastive loss for diffusion models that doesn’t require additional fine-tuning or auxiliary networks. |
Serin Yang; Hyunmin Hwang; Jong Chul Ye; |
206 | Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents ER-NeRF, a novel conditional Neural Radiance Fields (NeRF) based architecture for talking portrait synthesis that can concurrently achieve fast convergence, real-time rendering, and state-of-the-art performance with small model size. |
Jiahe Li; Jiawei Zhang; Xiao Bai; Jun Zhou; Lin Gu; |
207 | End2End Multi-View Feature Matching with Differentiable Pose Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a graph attention network to predict image correspondences along with confidence weights. |
Barbara Roessle; Matthias Nießner; |
208 | Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete Image Modelling Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel network structure with illumination-aware gamma correction and complete image modelling to solve the low-light image enhancement problem. |
Yinglong Wang; Zhen Liu; Jianzhuang Liu; Songcen Xu; Shuaicheng Liu; |
209 | Both Diverse and Realism Matter: Physical Attribute and Style Alignment for Rainy Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing RIG methods mainly focus on diversity at the expense of realism, or on realism while neglecting the diversity of the generation. To solve this dilemma, we propose a physical alignment and controllable generation network (PCGNet) for diverse and realistic rain generation. |
Changfeng Yu; Shiming Chen; Yi Chang; Yibing Song; Luxin Yan; |
210 | Exploring The Benefits of Visual Prompting in Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore the benefits of VP in constructing compelling neural network classifiers with differential privacy (DP). |
Yizhe Li; Yu-Lin Tsai; Chia-Mu Yu; Pin-Yu Chen; Xuebin Ren; |
211 | Single Image Reflection Separation Via Component Synergy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, based on the investigation of the weaknesses of existing models, we propose a more general form of the superposition model by introducing a learnable residue term, which can effectively capture residual information during decomposition, guiding the separated layers to be complete. |
Qiming Hu; Xiaojie Guo; |
212 | Mining Bias-target Alignment from Voronoi Cells Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a bias-agnostic approach to mitigate the impact of biases in deep neural networks. |
Rémi Nahon; Van-Tam Nguyen; Enzo Tartaglione; |
213 | The Victim and The Beneficiary: Exploiting A Poisoned Model to Train A Clean Model on Poisoned Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we find that the poisoned samples and benign samples can be distinguished with prediction entropy. |
Zixuan Zhu; Rui Wang; Cong Zou; Lihua Jing; |
214 | DIFFGUARD: Semantic Mismatch-Guided Out-of-Distribution Detection Using Pre-Trained Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As diffusion models are much easier to train and amenable to various conditions compared to cGANs, in this work, we propose to directly use pre-trained diffusion models for semantic mismatch-guided OOD detection, named DiffGuard. |
Ruiyuan Gao; Chenchen Zhao; Lanqing Hong; Qiang Xu; |
215 | Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose an Identity-seeking Self-supervised Representation learning (ISR) method. |
Zhaopeng Dou; Zhongdao Wang; Yali Li; Shengjin Wang; |
216 | 3D-Aware Generative Model for Improved Side-View Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose SideGAN, a novel 3D GAN training method to generate photo-realistic images irrespective of the camera pose, especially for faces of side-view angles. |
Kyungmin Jo; Wonjoon Jin; Jaegul Choo; Hyunjoon Lee; Sunghyun Cho; |
217 | Tracking Anything with Decoupled Video Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To ‘track anything’ without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. |
Ho Kei Cheng; Seoung Wug Oh; Brian Price; Alexander Schwing; Joon-Young Lee; |
218 | Generative Gradient Inversion Via Over-Parameterized Networks in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, our study shows that local participants in a federated learning system are vulnerable to potential data leakage issues. |
Chi Zhang; Xiaoman Zhang; Ekanut Sotthiwat; Yanyu Xu; Ping Liu; Liangli Zhen; Yong Liu; |
219 | EQ-Net: Elastic Quantization Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore a one-shot network quantization regime, named Elastic Quantization Neural Networks (EQ-Net), which aims to train a robust weight-sharing quantization supernet. |
Ke Xu; Lei Han; Ye Tian; Shangshang Yang; Xingyi Zhang; |
220 | OxfordTVG-HIC: Can Machine Make Humorous Captions from Images? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents OxfordTVG-HIC (Humorous Image Captions), a large-scale dataset for humour generation and understanding. |
Runjia Li; Shuyang Sun; Mohamed Elhoseiny; Philip Torr; |
221 | Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches often rely on expensive human annotations as supervision for model training, limiting their scalability to large, unlabeled datasets. To address this challenge, we present ZeroSeg, a novel method that leverages the existing pretrained vision-language (VL) model (e.g. CLIP vision encoder) to train open-vocabulary zero-shot semantic segmentation models. |
Jun Chen; Deyao Zhu; Guocheng Qian; Bernard Ghanem; Zhicheng Yan; Chenchen Zhu; Fanyi Xiao; Sean Chang Culatana; Mohamed Elhoseiny; |
222 | EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite being a crucial output of the perception stack, panoptic segmentation has been largely overlooked by the domain adaptation community. Therefore, we revisit well-performing domain adaptation strategies from other fields, adapt them to panoptic segmentation, and show that they can effectively enhance panoptic domain adaptation. |
Suman Saha; Lukas Hoyer; Anton Obukhov; Dengxin Dai; Luc Van Gool; |
223 | Parallax-Tolerant Unsupervised Deep Image Stitching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, deep stitching schemes overcome adverse conditions by adaptively learning robust semantic features, but they cannot handle large-parallax cases. To solve these issues, we propose a parallax-tolerant unsupervised deep image stitching technique. |
Lang Nie; Chunyu Lin; Kang Liao; Shuaicheng Liu; Yao Zhao; |
224 | Scratch Each Other’s Back: Incomplete Multi-Modal Brain Tumor Segmentation Via Category Aware Group Self-Support Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, considering the sensitivity of different modalities to diverse tumor regions, we propose a Category Aware Group Self-Support Learning framework, called GSS, to make up for the information deficit among the modalities in the individual modal feature extraction phase. |
Yansheng Qiu; Delin Chen; Hongdou Yao; Yongchao Xu; Zheng Wang; |
225 | SFHarmony: Source Free Domain Adaptation for Distributed Neuroimaging Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, neuroimaging data is inherently personal in nature, leading to data privacy concerns when sharing the data. To overcome these barriers, we propose an Unsupervised Source-Free Domain Adaptation (SFDA) method, SFHarmony. |
Nicola K Dinsdale; Mark Jenkinson; Ana IL Namburete; |
226 | M2T: Masking Transformers Twice for Faster Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show how bidirectional transformers trained for masked token prediction can be applied to neural image compression to achieve state-of-the-art results. |
Fabian Mentzer; Eirikur Agustsson; Michael Tschannen; |
227 | CoIn: Contrastive Instance Feature Mining for Outdoor 3D Object Detection with Very Limited Annotations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current detectors usually perform poorly under very limited annotations. To address this problem, we propose a novel Contrastive Instance feature mining method, named CoIn. |
Qiming Xia; Jinhao Deng; Chenglu Wen; Hai Wu; Shaoshuai Shi; Xin Li; Cheng Wang; |
228 | 3D Human Mesh Recovery with Sequentially Global Rotation Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes to directly estimate the global rotation of each joint to avoid error accumulation and pursue better accuracy. |
Dongkai Wang; Shiliang Zhang; |
229 | DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Driven by the belief that the ability to anticipate the consequences of future actions is crucial for the emergence of intelligent and interpretable planning behavior, we propose Dreamwalker, a world-model-based VLN-CE agent. |
Hanqing Wang; Wei Liang; Luc Van Gool; Wenguan Wang; |
230 | Computation and Data Efficient Backdoor Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is also very time-consuming as it needs to go through almost all the training stages for data selection. To address such limitations, we propose a novel confidence-based scoring methodology, which can efficiently measure the contribution of each poisoning sample based on the distance posteriors. |
Yutong Wu; Xingshuo Han; Han Qiu; Tianwei Zhang; |
231 | Agglomerative Transformer for Human-Object Interaction Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an agglomerative Transformer (AGER) that enables Transformer-based human-object interaction (HOI) detectors to flexibly exploit extra instance-level cues in a single-stage and end-to-end manner for the first time. |
Danyang Tu; Wei Sun; Guangtao Zhai; Wei Shen; |
232 | Decouple Before Interact: Multi-Modal Prompt Learning for Continual Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, neglecting the interactions between modalities will lead to poor performance. To tackle these challenging issues, we propose a comprehensive formulation for CL-VQA from the perspective of multi-modal vision-language fusion. |
Zi Qian; Xin Wang; Xuguang Duan; Pengda Qin; Yuhong Li; Wenwu Zhu; |
233 | Rethinking Fast Fourier Convolution in Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the fundamental flaws of using FFC in image inpainting, which are 1) spectrum shifting, 2) unexpected spatial activation, and 3) limited frequency receptive field. |
Tianyi Chu; Jiafu Chen; Jiakai Sun; Shuobin Lian; Zhizhong Wang; Zhiwen Zuo; Lei Zhao; Wei Xing; Dongming Lu; |
234 | Learning Robust Representations with Information Bottleneck and Memory Network for RGB-D-based Gesture Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a convenient and analytical framework to learn a robust feature representation that is impervious to gesture-irrelevant factors. |
Yunan Li; Huizhou Chen; Guanwen Feng; Qiguang Miao; |
235 | P1AC: Revisiting Absolute Pose From A Single Affine Correspondence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the first general solution to the problem of estimating the pose of a calibrated camera given a single observation of an oriented point and an affine correspondence. |
Jonathan Ventura; Zuzana Kukelova; Torsten Sattler; Dániel Baráth; |
236 | LAN-HDR: Luminance-based Alignment Network for High Dynamic Range Video Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an end-to-end HDR video composition framework, which aligns LDR frames in the feature space and then merges aligned features into an HDR frame, without relying on pixel-domain optical flow. |
Haesoo Chung; Nam Ik Cho; |
237 | Dancing in The Dark: A Benchmark Towards General Low-light Video Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current research in this area is limited by the lack of high-quality benchmark datasets. To address this issue, we design a camera system and collect a high-quality low-light video dataset with multiple exposures and cameras. |
Huiyuan Fu; Wenkai Zheng; Xicong Wang; Jiaxuan Wang; Heng Zhang; Huadong Ma; |
238 | RED-PSM: Regularization By Denoising of Partially Separable Models for Dynamic Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a partially separable objective with RED and an optimization scheme with variable splitting and ADMM. |
Berk Iskender; Marc L. Klasky; Yoram Bresler; |
239 | Unsupervised Manifold Linearizing and Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to optimize the Maximal Coding Rate Reduction metric with respect to both the data representation and a novel doubly stochastic cluster membership, inspired by state-of-the-art subspace clustering results. |
Tianjiao Ding; Shengbang Tong; Kwan Ho Ryan Chan; Xili Dai; Yi Ma; Benjamin D. Haeffele; |
240 | Lossy and Lossless (L2) Post-training Model Size Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes a post-training model size compression method that combines lossy and lossless compression in a unified way. |
Yumeng Shi; Shihao Bai; Xiuying Wei; Ruihao Gong; Jianlei Yang; |
241 | C2ST: Cross-Modal Contextualized Sequence Transduction for Continuous Sign Language Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Cross-modal Contextualized Sequence Transduction (C2ST) for CSLR, which effectively incorporates the knowledge of gloss sequence into the process of video representation learning and sequence transduction. |
Huaiwen Zhang; Zihang Guo; Yang Yang; Xin Liu; De Hu; |
242 | ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Object-centric Fusion (ObjectFusion) paradigm, which completely gets rid of camera-to-BEV transformation during fusion to align object-centric features across different modalities for 3D object detection. |
Qi Cai; Yingwei Pan; Ting Yao; Chong-Wah Ngo; Tao Mei; |
243 | D-IF: Uncertainty-aware Human Digitization Via Implicit Distribution Field Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose replacing the implicit value with an adaptive uncertainty distribution, to differentiate between points based on their distance to the surface. |
Xueting Yang; Yihao Luo; Yuliang Xiu; Wei Wang; Hao Xu; Zhaoxin Fan; |
244 | MMVP: Motion-Matrix-Based Video Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A central challenge of video prediction lies in how the system reasons about an object’s future motion from image frames while simultaneously maintaining the consistency of its appearance across frames. This work introduces an end-to-end trainable two-stream video prediction framework, Motion-Matrix-based Video Prediction (MMVP), to tackle this challenge. |
Yiqi Zhong; Luming Liang; Ilya Zharkov; Ulrich Neumann; |
245 | Human Preference Score: Better Aligning Text-to-Image Models with Human Preference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Using HPS, we propose a simple yet effective method to adapt Stable Diffusion to better align with human preferences. |
Xiaoshi Wu; Keqiang Sun; Feng Zhu; Rui Zhao; Hongsheng Li; |
246 | Guided Motion Diffusion for Controllable Human Motion Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, integrating spatial constraints, such as pre-defined motion trajectories and obstacles, remains a challenge despite being essential for bridging the gap between isolated human motion and its surrounding environment. To address this issue, we propose Guided Motion Diffusion (GMD), a method that incorporates spatial constraints into the motion generation process. |
Korrawe Karunratanakul; Konpat Preechakul; Supasorn Suwajanakorn; Siyu Tang; |
247 | AffordPose: A Large-Scale Dataset of Hand-Object Interactions with Affordance-Driven Hand Pose Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present AffordPose, a large-scale dataset of hand-object interactions with affordance-driven hand pose. |
Juntao Jian; Xiuping Liu; Manyi Li; Ruizhen Hu; Jian Liu; |
248 | Locomotion-Action-Manipulation: Synthesizing Human-Scene Interactions in Complex 3D Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present LAMA, Locomotion-Action-MAnipulation, to synthesize natural and plausible long term human movements in complex indoor environments. |
Jiye Lee; Hanbyul Joo; |
249 | NDDepth: Normal-Distance Assisted Monocular Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel physics (geometry)-driven deep learning framework for monocular depth estimation by assuming that 3D scenes are constituted by piece-wise planes. |
Shuwei Shao; Zhongcai Pei; Weihai Chen; Xingming Wu; Zhengguo Li; |
250 | Sequential Texts Driven Cohesive Motions Synthesis with Natural Transitions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a cohesive human motion sequence synthesis framework based on free-form sequential texts while ensuring semantic connection and natural transitions between adjacent motions. |
Shuai Li; Sisi Zhuang; Wenfeng Song; Xinyu Zhang; Hejia Chen; Aimin Hao; |
251 | Efficient Converted Spiking Neural Network for 3D and 2D Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient unified ANN-SNN conversion method for point cloud classification and image classification that significantly reduces the number of time steps needed for fast and lossless ANN-SNN conversion. |
Yuxiang Lan; Yachao Zhang; Xu Ma; Yanyun Qu; Yun Fu; |
252 | Eulerian Single-Photon Vision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we demonstrate computationally light-weight phase-based algorithms for the tasks of edge detection and motion estimation. |
Shantanu Gupta; Mohit Gupta; |
253 | Adaptive Calibrator Ensemble: Navigating Test Set Difficulty in Out-of-Distribution Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: If a test set has a drastically different difficulty level from the calibration set, a phenomenon that out-of-distribution (OOD) data often exhibit, the optimal calibration parameters of the two datasets would differ, rendering a calibrator that is optimal on the calibration set suboptimal on the OOD test set and thus degrading calibration performance. With this knowledge, we propose a simple and effective method named adaptive calibrator ensemble (ACE) to calibrate OOD datasets whose difficulty is usually higher than that of the calibration set. |
Yuli Zou; Weijian Deng; Liang Zheng; |
254 | Contrastive Learning Relies More on Spatial Inductive Bias Than Supervised Learning: An Empirical Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Different from most previous work that understands CL from learning objectives, we focus on an unexplored yet natural aspect: the spatial inductive bias which seems to be implicitly exploited via data augmentations in CL. |
Yuanyi Zhong; Haoran Tang; Jun-Kun Chen; Yu-Xiong Wang; |
255 | DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by the pre-trained Stable Diffusion, which uses only text-image pairs during training. |
Weijia Wu; Yuzhong Zhao; Mike Zheng Shou; Hong Zhou; Chunhua Shen; |
256 | NSF: Neural Surface Fields for Human Modeling from Monocular Depth Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, predicting per-vertex deformations on a pre-designed human template with a discrete surface lacks flexibility in resolution and topology. To overcome these limitations, we propose a novel method ‘NSF: Neural Surface Fields’ for modeling 3D clothed humans from monocular depth. |
Yuxuan Xue; Bharat Lal Bhatnagar; Riccardo Marin; Nikolaos Sarafianos; Yuanlu Xu; Gerard Pons-Moll; Tony Tung; |
257 | Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion Using Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By contrast, we propose a simple and novel 2D to 3D synthesis approach based on conditional diffusion with vector-quantized codes. |
Abril Corona-Figueroa; Sam Bond-Taylor; Neelanjan Bhowmik; Yona Falinie A. Gaus; Toby P. Breckon; Hubert P. H. Shum; Chris G. Willcocks; |
258 | DMNet: Delaunay Meshing Network for 3D Shape Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel learning-based method with Delaunay triangulation to achieve high-precision reconstruction. |
Chen Zhang; Ganzhangqin Yuan; Wenbing Tao; |
259 | StyleDomain: Efficient and Lightweight Parameterizations of StyleGAN for One-shot and Few-shot Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide a systematic and in-depth analysis of the domain adaptation problem of GANs, focusing on the StyleGAN model. |
Aibek Alanov; Vadim Titov; Maksim Nakhodnov; Dmitry Vetrov; |
260 | RankMixup: Ranking-Based Mixup Training for Network Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present RankMixup, a novel mixup-based framework alleviating the problem of the mixture of labels for network calibration. |
Jongyoun Noh; Hyekang Park; Junghyup Lee; Bumsub Ham; |
261 | Body Knowledge and Uncertainty Modeling for Monocular 3D Human Body Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose KNOWN, a framework that effectively utilizes body KNOWledge and uNcertainty modeling to compensate for insufficient 3D supervisions. |
Yufei Zhang; Hanjing Wang; Jeffrey O. Kephart; Qiang Ji; |
262 | Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the orthogonal channel dimension for generic data augmentation by exploiting precision redundancy. |
Huimin Wu; Chenyang Lei; Xiao Sun; Peng-Shuai Wang; Qifeng Chen; Kwang-Ting Cheng; Stephen Lin; Zhirong Wu; |
263 | Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel approach for enhancing text-image correspondence by leveraging available semantic layouts. |
Minho Park; Jooyeol Yun; Seunghwan Choi; Jaegul Choo; |
264 | Neural Radiance Field with LiDAR Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing NeRF methods usually require specially collected hypersampled source views and do not perform well with the open source camera-LiDAR datasets – significantly limiting the approach’s practical utility. In this paper, we demonstrate an approach that allows for these datasets to be utilized for high quality neural renderings. |
MingFang Chang; Akash Sharma; Michael Kaess; Simon Lucey; |
265 | AREA: Adaptive Reweighting Via Effective Area for Long-Tailed Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we reconsider reweighting from a totally new perspective of analyzing the spanned space of each class. |
Xiaohua Chen; Yucan Zhou; Dayan Wu; Chule Yang; Bo Li; Qinghua Hu; Weiping Wang; |
266 | Erasing Concepts from Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a fine-tuning method that can erase a visual concept from a pre-trained diffusion model, given only the name of the style and using negative guidance as a teacher. |
Rohit Gandikota; Joanna Materzynska; Jaden Fiotto-Kaufman; David Bau; |
267 | Fully Attentional Networks with Self-emerging Token Labeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we revisit the FAN models and improve their pre-training with a self-emerging token labeling (STL) framework. |
Bingyin Zhao; Zhiding Yu; Shiyi Lan; Yutao Cheng; Anima Anandkumar; Yingjie Lao; Jose M. Alvarez; |
268 | ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, universality and robustness in existing methods often fall short because the transferability aspect is overlooked, restricting their application to a specific target with limited performance. To address these challenges, we present Adversarial Camouflage for Transferable and Intensive Vehicle Evasion (ACTIVE), a state-of-the-art physical camouflage attack framework designed to generate universal and robust adversarial camouflage capable of concealing any 3D vehicle from detectors. |
Naufal Suryanto; Yongsu Kim; Harashta Tatimma Larasati; Hyoeun Kang; Thi-Thu-Huong Le; Yoonyoung Hong; Hunmin Yang; Se-Yoon Oh; Howon Kim; |
269 | Learning Adaptive Neighborhoods for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods typically fix the choice of node degree for the entire graph, which is suboptimal. Instead, we propose a novel end-to-end differentiable graph generator which builds graph topologies where each node selects both its neighborhood and its size. |
Avishkar Saha; Oscar Mendez; Chris Russell; Richard Bowden; |
270 | Equivariant Similarity for Vision-Language Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose EqSim, a regularization loss that can be efficiently calculated from any two matched training pairs and easily pluggable into existing image-text retrieval fine-tuning. |
Tan Wang; Kevin Lin; Linjie Li; Chung-Ching Lin; Zhengyuan Yang; Hanwang Zhang; Zicheng Liu; Lijuan Wang; |
271 | ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel reconfigurable graph model that first associates all detected objects across cameras spatially before reconfiguring it into a temporal graph for Temporal Association. |
Cheng-Che Cheng; Min-Xuan Qiu; Chen-Kuo Chiang; Shang-Hong Lai; |
272 | Too Large; Data Reduction for Vision-Language Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper examines the problems of severe image-text misalignment and high redundancy in the widely-used large-scale Vision-Language Pre-Training (VLP) datasets. To address these issues, we propose an efficient and straightforward Vision-Language learning algorithm called TL;DR, which aims to compress the existing large VLP data into a small, high-quality set. |
Alex Jinpeng Wang; Kevin Qinghong Lin; David Junhao Zhang; Stan Weixian Lei; Mike Zheng Shou; |
273 | Make-It-3D: High-fidelity 3D Creation from A Single Image with Diffusion Prior Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate the problem of creating high-fidelity 3D content from only a single image. |
Junshu Tang; Tengfei Wang; Bo Zhang; Ting Zhang; Ran Yi; Lizhuang Ma; Dong Chen; |
274 | Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a deeply unified framework for depth-aware panoptic segmentation, which performs joint segmentation and depth estimation both in a per-segment manner with identical object queries. |
Junwen He; Yifan Wang; Lijun Wang; Huchuan Lu; Bin Luo; Jun-Yan He; Jin-Peng Lan; Yifeng Geng; Xuansong Xie; |
275 | Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging Via Optimization Trajectory Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose optimization trajectory distillation, a unified approach to address the two technical challenges from a new perspective. |
Jianan Fan; Dongnan Liu; Hang Chang; Heng Huang; Mei Chen; Weidong Cai; |
276 | DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new formulation of temporal action detection (TAD) with denoising diffusion, DiffTAD in short. |
Sauradip Nag; Xiatian Zhu; Jiankang Deng; Yi-Zhe Song; Tao Xiang; |
277 | Ray Conditioning: Trading Photo-consistency for Photo-realism in Multi-view Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such explicit bias for photo-consistency sacrifices photo-realism, causing geometry artifacts and loss of fine-scale details when these methods are applied to edit real images. To address this issue, we propose ray conditioning, a geometry-free alternative that relaxes the photo-consistency constraint. |
Eric Ming Chen; Sidhanth Holalkere; Ruyu Yan; Kai Zhang; Abe Davis; |
278 | SCOB: Universal Text Understanding Via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the great success of language model (LM)-based pre-training, recent studies in visual document understanding have explored LM-based pre-training methods for modeling text within document images. |
Daehee Kim; Yoonsik Kim; DongHyun Kim; Yumin Lim; Geewook Kim; Taeho Kil; |
279 | Point-Query Quadtree for Crowd Counting, Localization, and More Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Too few imply underestimation; too many increase computational overhead. To address this dilemma, we introduce a decomposable structure, i.e., the point-query quadtree, and propose a new counting model, termed Point quEry Transformer (PET). |
Chengxin Liu; Hao Lu; Zhiguo Cao; Tongliang Liu; |
280 | Heterogeneous Diversity Driven Active Learning for Multi-Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To reduce the cost of human annotations, we propose Heterogeneous Diversity driven Active Multi-Object Tracking (HD-AMOT), to infer the most informative frames for any MOT tracker by observing the heterogeneous cues of samples. |
Rui Li; Baopeng Zhang; Jun Liu; Wei Liu; Jian Zhao; Zhu Teng; |
281 | Domain Generalization of 3D Semantic Segmentation in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite its importance, domain generalization is relatively unexplored in the case of 3D autonomous driving semantic segmentation. To fill this gap, this paper presents the first benchmark for this application by testing state-of-the-art methods and discussing the difficulty of tackling Laser Imaging Detection and Ranging (LiDAR) domain shifts. |
Jules Sanchez; Jean-Emmanuel Deschaud; François Goulette; |
282 | HaMuCo: Hand Pose Estimation Via Multiview Collaborative Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate the label-hungry limitation, we propose a self-supervised learning framework, HaMuCo, that learns a single view hand pose estimator from multi-view pseudo 2D labels. |
Xiaozheng Zheng; Chao Wen; Zhou Xue; Pengfei Ren; Jingyu Wang; |
283 | Efficient Model Personalization in Federated Learning Via Client-Specific Prompt Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To leverage robust representations from large-scale models while enabling efficient model personalization for heterogeneous clients, we propose a novel personalized FL framework of client-specific Prompt Generation (pFedPG), which learns to deploy a personalized prompt generator at the server for producing client-specific visual prompts that efficiently adapts frozen backbones to local data distributions. |
Fu-En Yang; Chien-Yi Wang; Yu-Chiang Frank Wang; |
284 | Dual Aggregation Transformer for Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This inspires us to combine the two dimensions in Transformer for a more powerful representation capability. Based on the above idea, we propose a novel Transformer model, Dual Aggregation Transformer (DAT), for image SR. |
Zheng Chen; Yulun Zhang; Jinjin Gu; Linghe Kong; Xiaokang Yang; Fisher Yu; |
285 | Zero-Shot Spatial Layout Conditioning for Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we consider image generation from text associated with segments on the image canvas, which combines an intuitive natural language interface with precise spatial control over the generated content. |
Guillaume Couairon; Marlène Careil; Matthieu Cord; Stéphane Lathuilière; Jakob Verbeek; |
286 | SegGPT: Towards Segmenting Everything in Context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SegGPT, a generalist model for segmenting everything in context. |
Xinlong Wang; Xiaosong Zhang; Yue Cao; Wen Wang; Chunhua Shen; Tiejun Huang; |
287 | Semantify: Simplifying The Control of 3D Morphable Models Using CLIP Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Semantify: a self-supervised method that utilizes the semantic power of the CLIP language-vision foundation model to simplify the control of 3D morphable models. |
Omer Gralnik; Guy Gafni; Ariel Shamir; |
288 | From Sky to The Ground: A Large-scale Benchmark and Simple Baseline Towards Real Rain Removal Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Learning-based image deraining methods have made great progress. However, the lack of large-scale high-quality paired training samples remains the main bottleneck hampering real image deraining (RID). To address this dilemma and advance RID, we construct a Large-scale High-quality Paired real rain benchmark (LHP-Rain), including 3000 video sequences with 1 million high-resolution (1920*1080) frame pairs. |
Yun Guo; Xueyao Xiao; Yi Chang; Shumin Deng; Luxin Yan; |
289 | Knowledge Restore and Transfer for Multi-Label Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although there have been many anti-forgetting methods to solve the problem of catastrophic forgetting in single-label class-incremental learning, these methods have difficulty in solving the MLCIL problem due to label absence and information dilution problems. To solve these problems, we propose a Knowledge Restore and Transfer (KRT) framework including a dynamic pseudo-label (DPL) module to solve the label absence problem by restoring the knowledge of old classes to the new data and an incremental cross-attention (ICA) module with session-specific knowledge retention tokens storing knowledge and a unified knowledge transfer token transferring knowledge to solve the information dilution problem. |
Songlin Dong; Haoyu Luo; Yuhang He; Xing Wei; Jie Cheng; Yihong Gong; |
290 | DDColor: Towards Photo-Realistic Image Colorization Via Dual Decoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While transformer-based methods can deliver better results, they often rely on manually designed priors, suffer from poor generalization ability, and introduce color bleeding effects. To address these issues, we propose DDColor, an end-to-end method with dual decoders for image colorization. |
Xiaoyang Kang; Tao Yang; Wenqi Ouyang; Peiran Ren; Lingzhi Li; Xuansong Xie; |
291 | Visual Explanations Via Iterated Integrated Attributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Iterated Integrated Attributions (IIA) – a generic method for explaining the predictions of vision models. |
Oren Barkan; Yehonatan Elisha; Yuval Asher; Amit Eshel; Noam Koenigstein; |
292 | PanFlowNet: A Flow-Based Deep Network for Pan-Sharpening Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing deep learning-based methods recover only one HRMS image from the LRMS image and PAN image using a deterministic mapping, thus ignoring the diversity of the HRMS image. In this paper, to alleviate this ill-posed issue, we propose a flow-based pan-sharpening network (PanFlowNet) to directly learn the conditional distribution of HRMS image given LRMS image and PAN image instead of learning a deterministic mapping. |
Gang Yang; Xiangyong Cao; Wenzhe Xiao; Man Zhou; Aiping Liu; Xun Chen; Deyu Meng; |
293 | Domain Generalization Via Balancing Training Difficulty and Model Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We design MoDify, a Momentum Difficulty framework that tackles the misalignment by balancing the seesaw between the model’s capability and the samples’ difficulties along the training process. |
Xueying Jiang; Jiaxing Huang; Sheng Jin; Shijian Lu; |
294 | Pairwise Similarity Learning Is SimPLE Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on a general yet important learning problem, pairwise similarity learning (PSL). |
Yandong Wen; Weiyang Liu; Yao Feng; Bhiksha Raj; Rita Singh; Adrian Weller; Michael J. Black; Bernhard Schölkopf; |
295 | GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Purposely, we present GO-SLAM, a deep-learning-based dense visual SLAM framework globally optimizing poses and 3D reconstruction in real-time. |
Youmin Zhang; Fabio Tosi; Stefano Mattoccia; Matteo Poggi; |
296 | JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we focus on the problem of 3D human mesh recovery from a single image under obscured conditions. |
Jiahao Li; Zongxin Yang; Xiaohan Wang; Jianxin Ma; Chang Zhou; Yi Yang; |
297 | CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, due to the small size and partially labeled problem of each dataset, as well as a limited investigation of diverse types of tumors, the resulting models are often limited to segmenting specific organs/tumors, ignore the semantics of anatomical structures, and cannot be extended to novel domains. To address these issues, we propose the CLIP-Driven Universal Model, which incorporates text embedding learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models. |
Jie Liu; Yixiao Zhang; Jie-Neng Chen; Junfei Xiao; Yongyi Lu; Bennett A Landman; Yixuan Yuan; Alan Yuille; Yucheng Tang; Zongwei Zhou; |
298 | NIR-assisted Video Enhancement Via Unpaired 24-hour Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we defend the feasibility and superiority of NIR-assisted low-light video enhancement results by using unpaired 24-hour data for the first time, which significantly eases data collection and improves generalization performance on in-the-wild data. |
Muyao Niu; Zhihang Zhong; Yinqiang Zheng; |
299 | FACTS: First Amplify Correlations and Then Slice to Discover Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Models trained on such datasets learn "shortcuts" and underperform on bias-conflicting slices of data where the correlation does not hold. In this work, we study the problem of identifying such slices to inform downstream bias mitigation strategies. |
Sriram Yenamandra; Pratik Ramesh; Viraj Prabhu; Judy Hoffman; |
300 | Anchor Structure Regularization Induced Multi-view Subspace Clustering Via Enhanced Tensor Rank Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Being aware of these, we propose Anchor Structure Regularization Induced Multi-view Subspace Clustering via Enhanced Tensor Rank Minimization (ASR-ETR). |
Jintian Ji; Songhe Feng; |
301 | VeRi3D: Generative Vertex-based Radiance Fields for 3D Controllable Human Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Some recent work demonstrates promising results of learning human generative models using neural articulated radiance fields, yet their generalization ability and controllability lag behind parametric human models, i.e., they do not perform well when generalizing to novel pose/shape and are not part controllable. To solve these problems, we propose VeRi3D, a generative human vertex-based radiance field parameterized by vertices of the parametric human template, SMPL. |
Xinya Chen; Jiaxin Huang; Yanrui Bin; Lu Yu; Yiyi Liao; |
302 | MOSE: A New Dataset for Video Object Segmentation in Complex Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To revisit VOS and make it more applicable in the real world, we collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study tracking and segmenting objects in complex environments. |
Henghui Ding; Chang Liu; Shuting He; Xudong Jiang; Philip H.S. Torr; Song Bai; |
303 | BoMD: Bag of Multi-label Descriptors for Noisy Chest X-ray Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new method designed for noisy multi-label CXR learning, which detects and smoothly re-labels noisy samples from the dataset to be used in the training of common multi-label classifiers. |
Yuanhong Chen; Fengbei Liu; Hu Wang; Chong Wang; Yuyuan Liu; Yu Tian; Gustavo Carneiro; |
304 | Mask-Attention-Free Transformer for 3D Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through center regression, we effectively overcome the low-recall issue and perform cross-attention by imposing positional prior. |
Xin Lai; Yuhui Yuan; Ruihang Chu; Yukang Chen; Han Hu; Jiaya Jia; |
305 | SHIFT3D: Synthesizing Hard Inputs For Tricking 3D Detectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SHIFT3D, a differentiable pipeline for generating 3D shapes that are structurally plausible yet challenging to 3D object detectors. |
Hongge Chen; Zhao Chen; Gregory P. Meyer; Dennis Park; Carl Vondrick; Ashish Shrivastava; Yuning Chai; |
306 | EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formalize a pipeline (we dub EgoLoc) that better entangles 3D multiview geometry with 2D object retrieval from egocentric videos. |
Jinjie Mai; Abdullah Hamdi; Silvio Giancola; Chen Zhao; Bernard Ghanem; |
307 | Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, their performance is severely limited by the lack of inter-person interactions in the spatial-temporal mesh recovery, as well as by detection and tracking defects. To address these challenges, we propose the Coordinate transFormer (CoordFormer) that directly models multi-person spatial-temporal relations and simultaneously performs multi-mesh recovery in an end-to-end manner. |
Haoyuan Li; Haoye Dong; Hanchao Jia; Dong Huang; Michael C. Kampffmeyer; Liang Lin; Xiaodan Liang; |
308 | FLatten Transformer: Vision Transformer Using Focused Linear Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness. |
Dongchen Han; Xuran Pan; Yizeng Han; Shiji Song; Gao Huang; |
309 | Q-Diffusion: Quantizing Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture of the diffusion models, which compresses the noise estimation network to accelerate the generation process. |
Xiuyu Li; Yijiang Liu; Long Lian; Huanrui Yang; Zhen Dong; Daniel Kang; Shanghang Zhang; Kurt Keutzer; |
310 | Robustifying Token Attention for Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: More critically, these tokens are not robust to corruptions, often leading to highly diverging attention patterns. In this paper, we intend to alleviate this overfocusing issue and make attention more stable through two general techniques: First, our Token-aware Average Pooling (TAP) module encourages the local neighborhood of each token to take part in the attention mechanism. |
Yong Guo; David Stutz; Bernt Schiele; |
311 | Boosting Positive Segments for Weakly-Supervised Audio-Visual Video Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem of weakly supervised Audio-Visual Video Parsing (AVVP), where the goal is to temporally localize events that are audible or visible and simultaneously classify them into known event categories. |
Kranthi Kumar Rachavarapu; Rajagopalan A. N.; |
312 | ADNet: Lane Shape Prediction Via Anchor Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit the limitations of anchor-based lane detection methods, which have predominantly focused on fixed anchors that stem from the edges of the image, disregarding their versatility and quality. |
Lingyu Xiao; Xiang Li; Sen Yang; Wankou Yang; |
313 | UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and The OpenPCSeg Codebase Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a unified multi-modal LiDAR segmentation network, termed UniSeg, which leverages the information of RGB images and three views of the point cloud, and accomplishes semantic segmentation and panoptic segmentation simultaneously. |
Youquan Liu; Runnan Chen; Xin Li; Lingdong Kong; Yuchen Yang; Zhaoyang Xia; Yeqi Bai; Xinge Zhu; Yuexin Ma; Yikang Li; Yu Qiao; Yuenan Hou; |
314 | Sign Language Translation with Iterative Prototype Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents IP-SLT, a simple yet effective framework for sign language translation (SLT). |
Huijie Yao; Wengang Zhou; Hao Feng; Hezhen Hu; Hao Zhou; Houqiang Li; |
315 | Pixel-Wise Contrastive Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a simple but effective pixel-level self-supervised distillation framework friendly to dense prediction tasks. |
Junqiang Huang; Zichao Guo; |
316 | Efficient Deep Space Filling Curve Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, MST generation is un-differentiable, which is infeasible to optimize via gradient descent. To remedy these issues, we propose a GNN-based SFC-search framework with a tailored algorithm that largely reduces computational cost of GNN. |
Wanli Chen; Xufeng Yao; Xinyun Zhang; Bei Yu; |
317 | GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The approach introduces a new training objective that leverages parallel corpora to align the representation spaces of different encoders. |
Can Qin; Ning Yu; Chen Xing; Shu Zhang; Zeyuan Chen; Stefano Ermon; Yun Fu; Caiming Xiong; Ran Xu; |
318 | Humans in 4D: Reconstructing and Tracking Humans with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an approach to reconstruct humans and track them over time. |
Shubham Goel; Georgios Pavlakos; Jathushan Rajasegaran; Angjoo Kanazawa; Jitendra Malik; |
319 | Ponder: Point Cloud Pre-training Via Neural Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering. |
Di Huang; Sida Peng; Tong He; Honghui Yang; Xiaowei Zhou; Wanli Ouyang; |
320 | Perpetual Humanoid Control for Real-time Simulated Avatars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a physics-based humanoid controller that achieves high-fidelity motion imitation and fault-tolerant behavior in the presence of noisy input (e.g. pose estimates from video or generated from language) and unexpected falls. |
Zhengyi Luo; Jinkun Cao; Alexander Winkler; Kris Kitani; Weipeng Xu; |
321 | HollowNeRF: Pruning Hashgrid-Based NeRFs with Trainable Collision Mitigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To cull away unnecessary regions of the feature grid, existing solutions rely on prior knowledge of object shape or periodically estimate object shape during training by repeated model evaluations, which are costly and wasteful. To address this issue, we propose HollowNeRF, a novel compression solution for hashgrid-based NeRF which automatically sparsifies the feature grid during the training phase. |
Xiufeng Xie; Riccardo Gherardi; Zhihong Pan; Stephen Huang; |
322 | A Complete Recipe for Diffusion Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Utilizing insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in SGMs, ensuring convergence to the desired target distribution. |
Kushagra Pandey; Stephan Mandt; |
323 | The Devil Is in The Crack Orientation: A New Perspective for Crack Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a first-of-its-kind oriented sub-crack detector, dubbed as CrackDet, which is derived from a novel piecewise angle definition, to ease the boundary discontinuity problem. |
Zhuangzhuang Chen; Jin Zhang; Zhuonan Lai; Guanming Zhu; Zun Liu; Jie Chen; Jianqiang Li; |
324 | FedPD: Federated Open Set Recognition with Parameter Disentanglement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a parameter disentanglement guided federated open-set recognition (FedPD) algorithm to address two core challenges of FedOSR: cross-client inter-set interference between learning closed-set and open-set knowledge and cross-client intra-set inconsistency by data heterogeneity. |
Chen Yang; Meilu Zhu; Yifan Liu; Yixuan Yuan; |
325 | WaterMask: Instance Segmentation for Underwater Imagery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first underwater image instance segmentation dataset (UIIS), which provides 4628 images for 7 categories with pixel-level annotations. |
Shijie Lian; Hua Li; Runmin Cong; Suqi Li; Wei Zhang; Sam Kwong; |
326 | Score Priors Guided Deep Variational Inference for Unsupervised Real-World Single Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, applying existing score-based methods for real-world denoising requires not only the explicit training of score priors on the target domain but also the careful design of sampling procedures for posterior inference, which is complicated and impractical. To address these limitations, we propose a score priors-guided deep variational inference, namely ScoreDVI, for practical real-world denoising. |
Jun Cheng; Tao Liu; Shan Tan; |
327 | L-DAWA: Layer-wise Divergence Aware Weight Aggregation in Federated Self-Supervised Visual Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new aggregation strategy termed Layer-wise Divergence Aware Weight Aggregation (L-DAWA) to mitigate the influence of client bias and divergence during FL aggregation. |
Yasar Abbas Ur Rehman; Yan Gao; Pedro Porto Buarque de Gusmao; Mina Alibeigi; Jiajun Shen; Nicholas D. Lane; |
328 | Improving Transformer-based Image Matching By Cascaded Capturing Spatially Informative Keypoints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: But correlations produced by transformer-based methods are spatially limited to the center of source views’ coarse patches, because of the costly attention learning. In this work, we rethink this issue and find that such matching formulation degrades pose estimation, especially for low-resolution images. |
Chenjie Cao; Yanwei Fu; |
329 | Controllable Guide-Space for Generalizable Face Forgery Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a controllable guide-space (GS) method to enhance the discrimination of different forgery domains, so as to increase the forgery relevance of features and thereby improve the generalization. |
Ying Guo; Cheng Zhen; Pengfei Yan; |
330 | Calibrating Uncertainty for Semi-Supervised Crowd Counting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method to calibrate model uncertainty for crowd counting. |
Chen Li; Xiaoling Hu; Shahira Abousamra; Chao Chen; |
331 | MosaiQ: Quantum Generative Adversarial Networks for Image Generation on NISQ Computers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, quantum image generation has been explored with many potential advantages over non-quantum techniques; however, previous techniques have suffered from poor quality and robustness. To address these problems, we introduce MosaiQ, a high-quality quantum image generation GAN framework that can be executed on today’s Near-term Intermediate Scale Quantum (NISQ) computers. |
Daniel Silver; Tirthak Patel; William Cutler; Aditya Ranjan; Harshitta Gandhi; Devesh Tiwari; |
332 | DVIS: Decoupled Video Instance Segmentation Framework Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Firstly, offline methods are limited by the tightly-coupled modeling paradigm, which treats all frames equally and disregards the interdependencies between adjacent frames. Consequently, this leads to the introduction of excessive noise during long-term temporal alignment. Secondly, online methods suffer from inadequate utilization of temporal information. To tackle these challenges, we propose a decoupling strategy for VIS by dividing it into three independent sub-tasks: segmentation, tracking, and refinement. |
Tao Zhang; Xingye Tian; Yu Wu; Shunping Ji; Xuebo Wang; Yuan Zhang; Pengfei Wan; |
333 | Segmentation of Tubular Structures Using Iterative Training with Tailored Samples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a minimal path method to simultaneously compute segmentation masks and extract centerlines of tubular structures with line-topology. |
Wei Liao; |
334 | Boundary-Aware Divide and Conquer: A Diffusion-Based Solution for Unsupervised Shadow Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel diffusion-based solution for unsupervised shadow removal, which separately models the shadow, non-shadow, and their boundary regions. |
Lanqing Guo; Chong Wang; Wenhan Yang; Yufei Wang; Bihan Wen; |
335 | Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper addresses the problem of rolling shutter correction in complex nonlinear and dynamic scenes with extreme occlusion. |
Delin Qu; Yizhen Lao; Zhigang Wang; Dong Wang; Bin Zhao; Xuelong Li; |
336 | Surface Extraction from Neural Unsigned Distance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method, named DualMesh-UDF, to extract a surface from unsigned distance functions (UDFs), encoded by neural networks, or neural UDFs. |
Congyi Zhang; Guying Lin; Lei Yang; Xin Li; Taku Komura; Scott Schaefer; John Keyser; Wenping Wang; |
337 | CBA: Improving Online Continual Learning Via Continual Bias Adaptor Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Due to the time-varying training setting, the model learned from a changing distribution easily forgets the previously learned knowledge and biases towards the newly received task. To address this problem, we propose a Continual Bias Adaptor (CBA) module to augment the classifier network to adapt to catastrophic distribution change during training, such that the classifier network is able to learn a stable consolidation of previously learned tasks. |
Quanziang Wang; Renzhen Wang; Yichen Wu; Xixi Jia; Deyu Meng; |
338 | GraphEcho: Graph-Driven Unsupervised Domain Adaptation for Echocardiogram Video Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a newly collected CardiacUDA dataset and a novel GraphEcho method for cardiac structure segmentation. |
Jiewen Yang; Xinpeng Ding; Ziyang Zheng; Xiaowei Xu; Xiaomeng Li; |
339 | Multi-view Spectral Polarization Propagation for Video Glass Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first polarization-guided video glass segmentation propagation solution (PGVS-Net) that can robustly and coherently propagate glass segmentation in RGB-P video sequences. |
Yu Qiao; Bo Dong; Ao Jin; Yu Fu; Seung-Hwan Baek; Felix Heide; Pieter Peers; Xiaopeng Wei; Xin Yang; |
340 | Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We thus propose an Efficient object-centric Representation amodal Segmentation (EoRaS). |
Ke Fan; Jingshi Lei; Xuelin Qian; Miaopeng Yu; Tianjun Xiao; Tong He; Zheng Zhang; Yanwei Fu; |
341 | Augmented Box Replay: Overcoming Foreground Shift for Incremental Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify the overlooked problem of foreground shift as the main reason for this. |
Yuyang Liu; Yang Cong; Dipam Goswami; Xialei Liu; Joost van de Weijer; |
342 | Distilled Reverse Attention Network for Open-world Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Distilled Reverse Attention Network to address the challenges. |
Yun Li; Zhe Liu; Saurav Jha; Lina Yao; |
343 | DandelionNet: Domain Composition with Instance Adaptive Classification for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To preserve more complementary information from multiple domains while reducing their domain gap, we propose that the multiple domains should not be tightly aligned but composed together, where all domains are pulled closer yet still preserve their individuality. |
Lanqing Hu; Meina Kan; Shiguang Shan; Xilin Chen; |
344 | TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present TexFusion(Texture Diffusion), a new method to synthesize textures for given 3D geometries, using only large-scale text-guided image diffusion models. |
Tianshi Cao; Karsten Kreis; Sanja Fidler; Nicholas Sharp; Kangxue Yin; |
345 | Shift from Texture-bias to Shape-bias: Edge Deformation-based Augmentation for Robust Object Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to augment the training dataset by generating semantically meaningful shapes and samples via a shape deformation-based online augmentation, namely SDbOA. |
Xilin He; Qinliang Lin; Cheng Luo; Weicheng Xie; Siyang Song; Feng Liu; Linlin Shen; |
346 | Lighting Every Darkness in Two Pairs: A Calibration-Free Pipeline for RAW Denoising Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these methods suffer from several main deficiencies: 1) the calibration procedure is laborious and time-consuming, 2) denoisers for different cameras are difficult to transfer, and 3) the discrepancy between synthetic noise and real noise is enlarged by high digital gain. To overcome the above shortcomings, we propose a calibration-free pipeline for Lighting Every Darkness (LED), regardless of the digital gain or camera sensor. |
Xin Jin; Jia-Wen Xiao; Ling-Hao Han; Chunle Guo; Ruixun Zhang; Xialei Liu; Chongyi Li; |
347 | Data-free Knowledge Distillation for Fine-grained Visual Categorization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although existing DFKD methods have achieved inspiring results in coarse-grained classification, they yield sub-optimal results in practical fine-grained classification tasks that require more detailed distinctions between similar categories. To address this issue, we propose an approach called DFKD-FGVC that extends DFKD to fine-grained vision categorization (FGVC) tasks. |
Renrong Shao; Wei Zhang; Jianhua Yin; Jun Wang; |
348 | MotionBERT: A Unified Perspective on Learning Human Motion Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources. |
Wentao Zhu; Xiaoxuan Ma; Zhaoyang Liu; Libin Liu; Wayne Wu; Yizhou Wang; |
349 | PASTA: Proportional Amplitude Spectrum Training Augmentation for Syn-to-Real Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Proportional Amplitude Spectrum Training Augmentation (PASTA), a simple and effective augmentation strategy to improve out-of-the-box synthetic-to-real (syn-to-real) generalization performance. |
Prithvijit Chattopadhyay; Kartik Sarangmath; Vivek Vijaykumar; Judy Hoffman; |
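The core idea of PASTA, jittering the amplitude spectrum more strongly at higher spatial frequencies, is easy to illustrate. The sketch below is a minimal, hypothetical rendering of that idea, not the paper's exact formulation: the jitter schedule `alpha + beta * r**k` and all parameter values are illustrative assumptions.

```python
import numpy as np

def pasta_augment(img, alpha=0.5, beta=0.5, k=2.0, rng=None):
    """Frequency-proportional amplitude-spectrum jitter (sketch).

    The amplitude of each FFT coefficient is scaled by a random factor
    whose standard deviation grows with normalized spatial frequency,
    so high-frequency (texture/style) content is perturbed more than
    low-frequency structure, while the phase is kept intact.
    """
    rng = np.random.default_rng() if rng is None else rng
    f = np.fft.fft2(np.asarray(img, dtype=np.float64))
    amp, phase = np.abs(f), np.angle(f)
    h, w = amp.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    r = np.sqrt(fy**2 + fx**2) / np.sqrt(0.5)  # normalized radius in [0, 1]
    sigma = alpha + beta * r**k                # jitter std per frequency
    amp = amp * np.abs(1.0 + sigma * rng.normal(size=amp.shape))
    return np.fft.ifft2(amp * np.exp(1j * phase)).real
```

With `alpha = beta = 0` the transform reduces to an FFT round-trip and returns the input unchanged, which is a convenient sanity check.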
350 | EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper rethinks and proposes a new framework as an infrastructure to advance Ego-HOI recognition by Probing, Curation and Adaption (EgoPCA). |
Yue Xu; Yong-Lu Li; Zhemin Huang; Michael Xu Liu; Cewu Lu; Yu-Wing Tai; Chi-Keung Tang; |
351 | Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models. |
Wei Yin; Chi Zhang; Hao Chen; Zhipeng Cai; Gang Yu; Kaixuan Wang; Xiaozhi Chen; Chunhua Shen; |
352 | I Can’t Believe There’s No Images! Learning Visual Tasks Using Only Language Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Many high-level skills that are required for computer vision tasks, such as parsing questions, comparing and contrasting semantics, and writing descriptions, are also required in other domains such as natural language processing. In this paper, we ask whether it is possible to learn those skills from text data and then transfer them to vision tasks without ever training on visual training data. |
Sophia Gu; Christopher Clark; Aniruddha Kembhavi; |
353 | Lightweight Image Super-Resolution with Superpixel Token Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, this conventional regular patch division is too coarse and lacks interpretability, resulting in artifacts and non-similar structure interference during attention operations. To address these challenges, we propose a novel super token interaction network (SPIN). |
Aiping Zhang; Wenqi Ren; Yi Liu; Xiaochun Cao; |
354 | Feature Prediction Diffusion Model for Video Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the impressive generative and anti-noise capacity of diffusion model (DM), in this work, we introduce a novel DM-based method to predict the features of video frames for anomaly detection. |
Cheng Yan; Shiyu Zhang; Yang Liu; Guansong Pang; Wenjun Wang; |
355 | RANA: Relightable Articulated Neural Avatars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose RANA, a relightable and articulated neural avatar for the photorealistic synthesis of humans under arbitrary viewpoints, body poses, and lighting. |
Umar Iqbal; Akin Caliskan; Koki Nagano; Sameh Khamis; Pavlo Molchanov; Jan Kautz; |
356 | Iterative Denoiser and Noise Estimator for Self-Supervised Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Denoise-Corrupt-Denoise pipeline (DCD-Net) for self-supervised image denoising. |
Yunhao Zou; Chenggang Yan; Ying Fu; |
357 | MasQCLIP for Open-Vocabulary Universal Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new method for open-vocabulary universal image segmentation, which is capable of performing instance, semantic, and panoptic segmentation under a unified framework. |
Xin Xu; Tianyi Xiong; Zheng Ding; Zhuowen Tu; |
358 | Memory-and-Anticipation Transformer for Online Action Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we rethink the temporal dependence of event evolution and propose a novel memory-anticipation-based paradigm to model an entire temporal structure, including the past, present, and future. Based on this idea, we present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks. |
Jiahao Wang; Guo Chen; Yifei Huang; Limin Wang; Tong Lu; |
359 | Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address it by proposing a novel one-step framework, named Self-similarity driven Scale-invariant Learning (SSL). |
Benzhi Wang; Yang Yang; Jinlin Wu; Guo-jun Qi; Zhen Lei; |
360 | MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a unified system for multi-person, diverse, and high-fidelity talking portrait generation. |
Yunfei Liu; Lijian Lin; Fei Yu; Changyin Zhou; Yu Li; |
361 | Realistic Full-Body Tracking from Sparse Observations Via Joint-Level Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a two-stage framework that can obtain accurate and smooth full-body motions with the three tracking signals of head and hands only. |
Xiaozheng Zheng; Zhuo Su; Chao Wen; Zhou Xue; Xiaojie Jin; |
362 | MetaF2N: Blind Image Super-Resolution By Learning Efficient Model Adaptation from Faces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate efficient model adaptation towards image-specific degradations, we propose a method dubbed MetaF2N, which leverages the contained faces to fine-tune model parameters for adapting to the whole natural image in a meta-learning framework. |
Zhicun Yin; Ming Liu; Xiaoming Li; Hui Yang; Longan Xiao; Wangmeng Zuo; |
363 | Lighting Up NeRF Via Unsupervised Decomposition and Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel approach, called Low-Light NeRF (or LLNeRF), to enhance the scene representation and synthesize normal-light novel views directly from sRGB low-light images in an unsupervised manner. |
Haoyuan Wang; Xiaogang Xu; Ke Xu; Rynson W.H. Lau; |
364 | ViM: Vision Middleware for Unified Downstream Transferring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents Vision Middleware (ViM), a new learning paradigm that targets unified transferring from a single foundation model to a variety of downstream tasks. |
Yutong Feng; Biao Gong; Jianwen Jiang; Yiliang Lv; Yujun Shen; Deli Zhao; Jingren Zhou; |
365 | DIRE for Diffusion-Generated Image Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we seek to build a detector for telling apart real images from diffusion-generated images. |
Zhendong Wang; Jianmin Bao; Wengang Zhou; Weilun Wang; Hezhen Hu; Hong Chen; Houqiang Li; |
366 | Ord2Seq: Regarding Ordinal Regression As Label Sequence Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple sequence prediction framework for ordinal regression called Ord2Seq, which, for the first time, transforms each ordinal category label into a special label sequence and thus regards an ordinal regression task as a sequence prediction process. |
Jinhong Wang; Yi Cheng; Jintai Chen; TingTing Chen; Danny Chen; Jian Wu; |
367 | Bring Clipart to Life Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new interaction method by guiding the editing with abstract clipart, composed of a set of simple semantic parts, allowing users to control across face photos with simple clicks. |
Nanxuan Zhao; Shengqi Dang; Hexun Lin; Yang Shi; Nan Cao; |
368 | Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing video-based methods generally recover human mesh by estimating the complex pose and shape parameters from coupled image features, whose high complexity and low representation ability often result in inconsistent pose motion and limited shape patterns. To alleviate this issue, we introduce 3D pose as the intermediary and propose a Pose and Mesh Co-Evolution network (PMCE) that decouples this task into two parts: 1) video-based 3D human pose estimation and 2) mesh vertices regression from the estimated 3D pose and temporal image feature. |
Yingxuan You; Hong Liu; Ti Wang; Wenhao Li; Runwei Ding; Xia Li; |
369 | Noise2Info: Noisy Image to Information of Noise for Self-Supervised Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, it is unrealistic to assume that σ_n is known for pursuing high model performance. To alleviate this issue, we propose Noise2Info to extract the critical information, the standard deviation σ_n of injected noise, only based on the noisy images. |
Jiachuan Wang; Shimin Di; Lei Chen; Charles Wang Wai Ng; |
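Estimating the noise standard deviation from noisy images alone has a classical baseline worth recalling: Donoho-style robust estimation from the median absolute deviation of a high-pass response. The sketch below is that classical estimator, not the Noise2Info procedure; the Haar-style diagonal filter and the Gaussian MAD constant 0.6745 are the standard assumptions.

```python
import numpy as np

def estimate_noise_std(img):
    """Robust noise-std estimate from a single noisy image.

    A Haar-style diagonal high-pass over non-overlapping 2x2 blocks
    suppresses smooth image content but preserves i.i.d. noise at its
    original variance; the median absolute deviation of the response
    is then converted to a std via the Gaussian constant 0.6745.
    """
    x = np.asarray(img, dtype=np.float64)
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    a = x[0:h:2, 0:w:2]
    b = x[0:h:2, 1:w:2]
    c = x[1:h:2, 0:w:2]
    d = x[1:h:2, 1:w:2]
    diag = (a - b - c + d) / 2.0  # variance equals the noise variance
    return float(np.median(np.abs(diag)) / 0.6745)
```

On smooth content the estimate is close to the true σ_n; strong fine textures leak into the high-pass response and bias it upward, which is one motivation for learned estimators.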
370 | Controllable Visual-Tactile Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we leverage deep generative models to create a multi-sensory experience where users can touch and see the synthesized object when sliding their fingers on a haptic surface. |
Ruihan Gao; Wenzhen Yuan; Jun-Yan Zhu; |
371 | Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By discussing the properties of each group of methods, we derive SimPool, a simple attention-based pooling mechanism as a replacement of the default one for both convolutional and transformer encoders. |
Bill Psomas; Ioannis Kakogeorgiou; Konstantinos Karantzalos; Yannis Avrithis; |
372 | SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To advance the diversity and annotation quality of human models, we introduce a new synthetic dataset, SynBody, with three appealing features: 1) a clothed parametric human model that can generate a diverse range of subjects; 2) the layered human representation that naturally offers high-quality 3D annotations to support multiple tasks; 3) a scalable system for producing realistic data to facilitate real-world tasks. |
Zhitao Yang; Zhongang Cai; Haiyi Mei; Shuai Liu; Zhaoxi Chen; Weiye Xiao; Yukun Wei; Zhongfei Qing; Chen Wei; Bo Dai; Wayne Wu; Chen Qian; Dahua Lin; Ziwei Liu; Lei Yang; |
373 | Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Viewset Diffusion, a diffusion-based generator that outputs 3D objects while only using multi-view 2D data for supervision. |
Stanislaw Szymanowicz; Christian Rupprecht; Andrea Vedaldi; |
374 | LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores our key insight: synthetic text images are good visual prompts for vision-language models! |
Cheng Shi; Sibei Yang; |
375 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing approaches that jointly learn 2D-3D feature matching suffer from low inliers due to representational differences between the two modalities, and the methods that bypass this problem into classification have an issue of poor refinement. In this work, we propose EP2P-Loc, a novel large-scale visual localization method that mitigates such appearance discrepancy and enables end-to-end training for pose estimation. |
Minjung Kim; Junseo Koo; Gunhee Kim; |
376 | SIRA-PCR: Sim-to-Real Adaptation for 3D Point Cloud Registration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We design SIRA-PCR, a new approach to 3D point cloud registration. |
Suyi Chen; Hao Xu; Ru Li; Guanghui Liu; Chi-Wing Fu; Shuaicheng Liu; |
377 | FeatEnHancer: Enhancing Hierarchical Features for Object Detection and Beyond Under Low-Light Vision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that optimizing the enhanced image representation with respect to the downstream task loss can result in more expressive representations. Therefore, in this work, we propose a novel module, FeatEnHancer, that hierarchically combines multiscale features using multiheaded attention guided by a task-related loss function to create suitable representations. |
Khurram Azeem Hashmi; Goutham Kallempudi; Didier Stricker; Muhammad Zeshan Afzal; |
378 | SOAR: Scene-debiasing Open-set Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This problem severely degrades the open-set action recognition performance when the testing samples exhibit scene distributions different from the training samples. To mitigate this scene bias, we propose a Scene-debiasing Open-set Action Recognition method (SOAR), which features an adversarial reconstruction module and an adaptive adversarial scene classification module. |
Yuanhao Zhai; Ziyi Liu; Zhenyu Wu; Yi Wu; Chunluan Zhou; David Doermann; Junsong Yuan; Gang Hua; |
379 | Physics-Augmented Autoencoder for 3D Skeleton-Based Gait Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce physics-augmented autoencoder (PAA), a framework for 3D skeleton-based human gait recognition. |
Hongji Guo; Qiang Ji; |
380 | Regularized Primitive Graph Learning for Unified Vector Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose GraphMapper, a unified framework for end-to-end vector map extraction from satellite images. |
Lei Wang; Min Dai; Jianan He; Jingwei Huang; |
381 | Saliency Regularization for Self-Training with Partial Annotations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose saliency regularization (SR) for a novel self-training framework. |
Shouwen Wang; Qian Wan; Xiang Xiang; Zhigang Zeng; |
382 | Stabilizing Visual Reinforcement Learning Via Asymmetric Interactive Cooperation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze that the training instability arises from the oscillating self-overfitting of the heavy-optimizable encoder. |
Yunpeng Zhai; Peixi Peng; Yifan Zhao; Yangru Huang; Yonghong Tian; |
383 | FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose FlipNeRF, a novel regularization method for few-shot novel view synthesis by utilizing our proposed flipped reflection rays. |
Seunghyeon Seo; Yeonjin Chang; Nojun Kwak; |
384 | Discovering Spatio-Temporal Rationales for Video Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the challenge, we highlight the importance of identifying question-critical temporal moments and spatial objects from the vast amount of video content. Towards this, we propose a Spatio-Temporal Rationalizer (STR), a differentiable selection module that adaptively collects question-critical moments and objects using cross-modal interaction. |
Yicong Li; Junbin Xiao; Chun Feng; Xiang Wang; Tat-Seng Chua; |
385 | Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly. |
Jiamian Wang; Huan Wang; Yulun Zhang; Yun Fu; Zhiqiang Tao; |
386 | Learning Hierarchical Features with Joint Latent Space Energy-Based Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a joint latent space EBM prior model with multi-layer latent variables for effective hierarchical representation learning. |
Jiali Cui; Ying Nian Wu; Tian Han; |
387 | UniFormerV2: Unlocking The Potential of Image ViTs for Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the emergence of powerful open-source image ViTs, we propose unlocking their potential for video understanding with efficient UniFormer designs. |
Kunchang Li; Yali Wang; Yinan He; Yizhuo Li; Yi Wang; Limin Wang; Yu Qiao; |
388 | G2L: Semantically Aligned and Uniform Video Grounding Via Geodesic and Game Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Geodesic and Game Localization (G2L), a semantically aligned and uniform video grounding framework via geodesic and game theory. |
Hongxiang Li; Meng Cao; Xuxin Cheng; Yaowei Li; Zhihong Zhu; Yuexian Zou; |
389 | TARGET: Federated Class-Continual Learning Via Exemplar-Free Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper focuses on an under-explored yet important problem: Federated Class-Continual Learning (FCCL), where new classes are dynamically added in federated learning. |
Jie Zhang; Chen Chen; Weiming Zhuang; Lingjuan Lyu; |
390 | FashionNTM: Multi-turn Fashion Image Retrieval Via Cascaded Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Multi-turn textual feedback-based fashion image retrieval focuses on a real-world setting, where users can iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. |
Anwesan Pal; Sahil Wadhwa; Ayush Jaiswal; Xu Zhang; Yue Wu; Rakesh Chada; Pradeep Natarajan; Henrik I. Christensen; |
391 | MolGrapher: Graph-based Visual Recognition of Chemical Structures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce MolGrapher to recognize chemical structures visually. |
Lucas Morin; Martin Danelljan; Maria Isabel Agea; Ahmed Nassar; Valery Weber; Ingmar Meijer; Peter Staar; Fisher Yu; |
392 | SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image based on improved multiplane images (MPI). |
Xiaoyu Zhou; Zhiwei Lin; Xiaojun Shan; Yongtao Wang; Deqing Sun; Ming-Hsuan Yang; |
393 | DiffV2S: Diffusion-Based Video-to-Speech Synthesis with Vision-Guided Speaker Embedding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel vision-guided speaker embedding extractor using a self-supervised pre-trained model and prompt tuning technique. |
Jeongsoo Choi; Joanna Hong; Yong Man Ro; |
394 | PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. |
Yang Zheng; Adam W. Harley; Bokui Shen; Gordon Wetzstein; Leonidas J. Guibas; |
395 | The Effectiveness of MAE Pre-Pretraining for Billion-Scale Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an additional pre-pretraining stage that is simple and uses the self supervised MAE technique to initialize the model. |
Mannat Singh; Quentin Duval; Kalyan Vasudev Alwala; Haoqi Fan; Vaibhav Aggarwal; Aaron Adcock; Armand Joulin; Piotr Dollar; Christoph Feichtenhofer; Ross Girshick; Rohit Girdhar; Ishan Misra; |
396 | Towards Zero Domain Gap: A Comprehensive Study of Realistic LiDAR Simulation for Autonomy Testing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel "paired-scenario" approach to evaluating the domain gap of a LiDAR simulator by reconstructing digital twins of real world scenarios. |
Sivabalan Manivasagam; Ioan Andrei Bârsan; Jingkang Wang; Ze Yang; Raquel Urtasun; |
397 | GPA-3D: Geometry-aware Prototype Alignment for Unsupervised Domain Adaptive 3D Object Detection from Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel unsupervised domain adaptive 3D detection framework, namely Geometry-aware Prototype Alignment (GPA-3D), which explicitly leverages the intrinsic geometric relationship from point cloud objects to reduce the feature discrepancy, thus facilitating cross-domain transferring. |
Ziyu Li; Jingming Guo; Tongtong Cao; Liu Bingbing; Wankou Yang; |
398 | TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the task of generalizable neural human rendering which trains conditional Neural Radiance Fields (NeRF) from multi-view videos of different characters. |
Xiao Pan; Zongxin Yang; Jianxin Ma; Chang Zhou; Yi Yang; |
399 | LNPL-MIL: Learning from Noisy Pseudo Labels for Promoting Multiple Instance Learning in Whole Slide Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In MIL, we propose a Transformer aware of instance Order and Distribution (TOD-MIL) that strengthens instance correlations and weakens semantic misalignment in the bag. |
Zhuchen Shao; Yifeng Wang; Yang Chen; Hao Bian; Shaohui Liu; Haoqian Wang; Yongbing Zhang; |
400 | Few-Shot Dataset Distillation Via Translative Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on few-shot dataset distillation, where a distilled dataset is synthesized with only a few or even a single network. |
Songhua Liu; Xinchao Wang; |
401 | Random Sub-Samples Generation for Self-Supervised Real Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, as a typical method for self-supervised denoising, the original blind spot network (BSN) assumes that the noise is pixel-wise independent, which is much different from the real cases. To solve this problem, we propose a novel self-supervised real image denoising framework named Sampling Difference As Perturbation (SDAP) based on Random Sub-samples Generation (RSG) with a cyclic sample difference loss. |
Yizhong Pan; Xiao Liu; Xiangyu Liao; Yuanzhouhan Cao; Chao Ren; |
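Random sub-sample generation from a single noisy image can be illustrated with a Neighbor2Neighbor-style scheme: within each non-overlapping 2x2 cell, two distinct pixels are drawn at random to form a pair of half-resolution sub-images whose noise realizations are independent. The sketch below follows that scheme as an illustrative assumption; it is not the exact RSG procedure of SDAP.

```python
import numpy as np

def random_subsample_pair(img, rng=None):
    """Draw two half-resolution sub-images from one noisy image.

    For every non-overlapping 2x2 cell, two *different* pixels are
    picked at random; the picks form two (H//2, W//2) images whose
    per-pixel noise is independent between the pair.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(img)
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    # Gather the four pixels of each 2x2 cell into the last axis.
    cells = x[:h, :w].reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3)
    cells = cells.reshape(h // 2, w // 2, 4)
    # Choose two distinct indices in {0, 1, 2, 3} per cell.
    i = rng.integers(0, 4, size=cells.shape[:2])
    j = (i + rng.integers(1, 4, size=cells.shape[:2])) % 4
    rows, cols = np.indices(cells.shape[:2])
    return cells[rows, cols, i], cells[rows, cols, j]
```

Training a denoiser to map one sub-image to the other then never requires clean targets, since the two views share content but not noise.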
402 | Waffling Around for Performance: Visual Classification with Random Words and Broad Concepts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, averaging over LLM-generated class descriptors, e.g. "waffle, which has a round shape", can notably improve generalization performance. In this work, we critically study this behavior and propose WaffleCLIP, a framework for zero-shot visual classification which simply replaces LLM-generated descriptors with random character and word descriptors. |
Karsten Roth; Jae Myung Kim; A. Sophia Koepke; Oriol Vinyals; Cordelia Schmid; Zeynep Akata; |
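The random-descriptor idea is simple to sketch: instead of querying an LLM for class descriptors, append random words or random character strings to each class prompt, then average the text embeddings of all prompts into one class prototype. The snippet below covers only the prompt-construction step; the template wording and the word pool are illustrative assumptions, not WaffleCLIP's exact lists, and the CLIP embedding/averaging step is omitted.

```python
import random
import string

def waffle_prompts(classname, n_desc=4, word_pool=None, seed=0):
    """Build zero-shot prompts with random descriptors for one class.

    Half the prompts use a random word from a small pool, half use a
    random lowercase character string; a zero-shot classifier would
    embed all of them and average the embeddings into one prototype.
    """
    rng = random.Random(seed)
    pool = word_pool or ["waffle", "gondola", "thimble", "quartz"]
    prompts = []
    for k in range(n_desc):
        if k % 2 == 0:
            desc = rng.choice(pool)
        else:
            desc = "".join(rng.choice(string.ascii_lowercase)
                           for _ in range(rng.randint(4, 8)))
        prompts.append(f"a photo of a {classname}, which has {desc}.")
    return prompts
```

For example, `waffle_prompts("dog")` yields four prompts such as "a photo of a dog, which has waffle." alongside pure-gibberish variants, which is exactly the kind of descriptor the entry's title alludes to.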
403 | Unsupervised Surface Anomaly Detection with Diffusion Probabilistic Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there are three major challenges to the practical application of this approach: 1) the reconstruction quality needs to be further improved since it has a great impact on the final result, especially for images with structural changes; 2) it is observed that for many neural networks, the anomalies can also be well reconstructed, which severely violates the underlying assumption; 3) since reconstruction is an ill-conditioned problem, a test instance may correspond to multiple normal patterns, but most current reconstruction-based methods have ignored this critical fact. In this paper, we propose DiffAD, a method for unsupervised anomaly detection based on the latent diffusion model, inspired by its ability to generate high-quality and diverse images. |
Xinyi Zhang; Naiqi Li; Jiawei Li; Tao Dai; Yong Jiang; Shu-Tao Xia; |
404 | AutoAD II: The Sequel – Who, When, and What in Movie Audio Description Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we develop a new model for automatically generating movie AD, given CLIP visual features of the frames, the cast list, and the temporal locations of the speech; addressing all three of the 'who', 'when', and 'what' questions: (i) who — we introduce a character bank consisting of the character's name, the actor that played the part, and a CLIP feature of their face, for the principal cast of each movie, and demonstrate how this can be used to improve naming in the generated AD; (ii) when — we investigate several models for determining whether an AD should be generated for a time interval or not, based on the visual content of the interval and its neighbours; and (iii) what — we implement a new vision-language model for this task, that can ingest the proposals from the character bank, whilst conditioning on the visual features using cross-attention, and demonstrate how this improves over previous architectures for AD text generation in an apples-to-apples comparison.
Tengda Han; Max Bain; Arsha Nagrani; Gul Varol; Weidi Xie; Andrew Zisserman; |
405 | TinyCLIP: CLIP Distillation Via Affinity Mimicking and Weight Inheritance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-scale language-image pre-trained models. |
Kan Wu; Houwen Peng; Zhenghong Zhou; Bin Xiao; Mengchen Liu; Lu Yuan; Hong Xuan; Michael Valenzuela; Xi (Stephen) Chen; Xinggang Wang; Hongyang Chao; Han Hu; |
406 | Hyperbolic Chamfer Distance for Point Cloud Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is well known that CD is vulnerable to outliers, leading to drift toward suboptimal models. In contrast to the literature, where most works address such issues in Euclidean space, we propose an extremely simple yet powerful metric for point cloud completion, namely Hyperbolic Chamfer Distance (HyperCD), that computes CD in hyperbolic space.
Fangzhou Lin; Yun Yue; Songlin Hou; Xuechu Yu; Yajun Xu; Kazunori D Yamada; Ziming Zhang; |
407 | Democratising 2D Sketch to 3D Shape Retrieval Through Pivoting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the problem of 2D sketch to 3D shape retrieval, but with a focus on democratising the process. |
Pinaki Nath Chowdhury; Ayan Kumar Bhunia; Aneeshan Sain; Subhadeep Koley; Tao Xiang; Yi-Zhe Song; |
408 | Simoun: Synergizing Interactive Motion-appearance Understanding for Vision-based Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the problem, we present Synergizing Interactive Motion-appearance Understanding (Simoun), a unified framework for vision-based RL. |
Yangru Huang; Peixi Peng; Yifan Zhao; Yunpeng Zhai; Haoran Xu; Yonghong Tian; |
409 | AG3D: Learning to Generate 3D Avatars from 2D Image Collections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new adversarial generative model of realistic 3D people from 2D images. |
Zijian Dong; Xu Chen; Jinlong Yang; Michael J. Black; Otmar Hilliges; Andreas Geiger; |
410 | KECOR: Kernel Coding Rate Maximization for Active 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we resort to a novel kernel coding rate maximization (KECOR) strategy which aims to identify the most informative point clouds to acquire labels through the lens of information theory. |
Yadan Luo; Zhuoxiao Chen; Zhen Fang; Zheng Zhang; Mahsa Baktashmotlagh; Zi Huang; |
411 | Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-spectral Image Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The success of deep neural networks for pan-sharpening is commonly in a form of black box, lacking transparency and interpretability. To alleviate this issue, we propose a novel model-driven deep unfolding framework with image reasoning prior tailored for the pan-sharpening task. |
Man Zhou; Jie Huang; Naishan Zheng; Chongyi Li; |
412 | Representation Disparity-aware Distillation for 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on developing knowledge distillation (KD) for compact 3D detectors. |
Yanjing Li; Sheng Xu; Mingbao Lin; Jihao Yin; Baochang Zhang; Xianbin Cao; |
413 | NCHO: Unsupervised Learning for Neural 3D Composition of Humans and Objects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a novel framework for learning a compositional generative model of humans and objects (backpacks, coats, scarves, and more) from real-world 3D scans. |
Taeksoo Kim; Shunsuke Saito; Hanbyul Joo; |
414 | Breaking The Limits of Text-conditioned 3D Motion Synthesis with Elaborative Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Differing from the majority of previous works, which regard actions as single entities and can only generate short sequences for simple motions, we propose EMS, an elaborative motion synthesis model conditioned on detailed natural language descriptions. |
Yijun Qian; Jack Urbanek; Alexander G. Hauptmann; Jungdam Won; |
415 | VL-PET: Vision-and-Language Parameter-Efficient Tuning Via Granularity Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Vision-and-Language Parameter-Efficient Tuning (VL-PET) framework to impose effective control over modular modifications via a novel granularity-controlled mechanism. |
Zi-Yuan Hu; Yanyang Li; Michael R. Lyu; Liwei Wang; |
416 | ROME: Robustifying Memory-Efficient NAS Via Topology Disentanglement and Gradient Accumulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new algorithm called RObustifying Memory-Efficient NAS (ROME) to remedy this issue.
Xiaoxing Wang; Xiangxiang Chu; Yuda Fan; Zhexi Zhang; Bo Zhang; Xiaokang Yang; Junchi Yan; |
417 | Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fill the gap, this paper makes progress from two distinct perspectives: (1) It presents a Hierarchical Concept Graph (HCG) that discriminates and associates multi-granularity concepts with a multi-layered hierarchical structure, aligning visual observations with knowledge across different levels to alleviate data biases.
Yifeng Zhang; Shi Chen; Qi Zhao; |
418 | 3D-aware Image Generation Using 2D Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel 3D-aware image generation method that leverages 2D diffusion models. |
Jianfeng Xiang; Jiaolong Yang; Binbin Huang; Xin Tong; |
419 | Locating Noise Is Halfway Denoising for Semi- |