Paper Digest: CVPR 2023 Highlights
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) is one of the top computer vision conferences in the world. In 2023, it was held in Vancouver, Canada.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. These models power this website and are behind our services, including "search engine", "summarization", "question answering", and "literature review".
If you do not want to miss interesting academic papers, you are welcome to sign up for our daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated with new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: CVPR 2023 Highlights
Paper | Author(s) | |
---|---|---|
1 | GFPose: Learning 3D Human Pose Prior With Gradient Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we present GFPose, a versatile framework to model plausible 3D human poses for various applications. |
Hai Ci; Mingdong Wu; Wentao Zhu; Xiaoxuan Ma; Hao Dong; Fangwei Zhong; Yizhou Wang; |
2 | CXTrack: Improving 3D Point Cloud Tracking With Contextual Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve accurate localization for objects of all sizes, we propose a transformer-based localization head with a novel center embedding module to distinguish the target from distractors. |
Tian-Xing Xu; Yuan-Chen Guo; Yu-Kun Lai; Song-Hai Zhang; |
3 | Deep Frequency Filtering for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Deep Frequency Filtering (DFF) for learning domain-generalizable features, which is the first endeavour to explicitly modulate the frequency components of different transfer difficulties across domains in the latent space during training. |
Shiqi Lin; Zhizheng Zhang; Zhipeng Huang; Yan Lu; Cuiling Lan; Peng Chu; Quanzeng You; Jiang Wang; Zicheng Liu; Amey Parulkar; Viraj Navkal; Zhibo Chen; |
4 | Frame Flexible Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: If we evaluate the model using other frames which are not used in training, we observe the performance will drop significantly (see Fig.1, which is summarized as Temporal Frequency Deviation phenomenon. To fix this issue, we propose a general framework, named Frame Flexible Network (FFN), which not only enables the model to be evaluated at different frames to adjust its computation, but also reduces the memory costs of storing multiple models significantly. |
Yitian Zhang; Yue Bai; Chang Liu; Huan Wang; Sheng Li; Yun Fu; |
5 | Unsupervised Cumulative Domain Adaptation for Foggy Scene Optical Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To handle the practical optical flow under real foggy scenes, in this work, we propose a novel unsupervised cumulative domain adaptation optical flow (UCDA-Flow) framework: depth-association motion adaptation and correlation-alignment motion adaptation. |
Hanyu Zhou; Yi Chang; Wending Yan; Luxin Yan; |
6 | NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With NoisyTwins, we first introduce an effective and inexpensive augmentation strategy for class embeddings, which then decorrelates the latents based on self-supervision in the W space. |
Harsh Rangwani; Lavish Bansal; Kartik Sharma; Tejan Karmali; Varun Jampani; R. Venkatesh Babu; |
7 | DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents DisCoScene: a 3D-aware generative model for high-quality and controllable scene synthesis. |
Yinghao Xu; Menglei Chai; Zifan Shi; Sida Peng; Ivan Skorokhodov; Aliaksandr Siarohin; Ceyuan Yang; Yujun Shen; Hsin-Ying Lee; Bolei Zhou; Sergey Tulyakov; |
8 | Revisiting Self-Similarity: Structural Embedding for Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we revisit the conventional self-similarity descriptor from a convolutional perspective, to encode both the visual and structural cues of the image to global image representation. |
Seongwon Lee; Suhyeon Lee; Hongje Seong; Euntai Kim; |
9 | Minimizing The Accumulated Trajectory Error To Improve Dataset Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these gradient-matching methods suffer from the accumulated trajectory error caused by the discrepancy between the distillation and subsequent evaluation. To alleviate the adverse impact of this accumulated trajectory error, we propose a novel approach that encourages the optimization algorithm to seek a flat trajectory. |
Jiawei Du; Yidi Jiang; Vincent Y. F. Tan; Joey Tianyi Zhou; Haizhou Li; |
10 | Decoupling-and-Aggregating for Image Exposure Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This will limit the statistical and structural modeling capacity for exposure correction. To address this issue, this paper proposes to decouple the contrast enhancement and detail restoration within each convolution process. |
Yang Wang; Long Peng; Liang Li; Yang Cao; Zheng-Jun Zha; |
11 | Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, both approaches employ many computational resources predicting areas or objects that might never be queried by the motion planner. This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network. |
Ben Agro; Quinlan Sykora; Sergio Casas; Raquel Urtasun; |
12 | CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses the open challenges and introduces the first quantum-hybrid approach for 3D shape multi-matching; in addition, it is also cycle-consistent. |
Harshil Bhatia; Edith Tretschk; Zorah Lähner; Marcel Seelbach Benkner; Michael Moeller; Christian Theobalt; Vladislav Golyanik; |
13 | TrojViT: Trojan Insertion in Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a stealth and practical ViT-specific backdoor attack TrojViT. |
Mengxin Zheng; Qian Lou; Lei Jiang; |
14 | MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MarS3D, a plug-and-play motion-aware model for semantic segmentation on multi-scan 3D point clouds. |
Jiahui Liu; Chirui Chang; Jianhui Liu; Xiaoyang Wu; Lan Ma; Xiaojuan Qi; |
15 | An Image Quality Assessment Dataset for Portraits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces PIQ23, a portrait-specific IQA dataset of 5116 images of 50 predefined scenarios acquired by 100 smartphones, covering a high variety of brands, models, and use cases. |
Nicolas Chahine; Stefania Calarasanu; Davide Garcia-Civiero; Théo Cayla; Sira Ferradans; Jean Ponce; |
16 | MSeg3D: Multi-Modal 3D Semantic Segmentation for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a multi-modal 3D semantic segmentation model (MSeg3D) with joint intra-modal feature extraction and inter-modal feature fusion to mitigate the modality heterogeneity. |
Jiale Li; Hang Dai; Hao Han; Yong Ding; |
17 | Robust Outlier Rejection for 3D Registration With Variational Bayes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop a novel variational non-local network-based outlier rejection framework for robust alignment. |
Haobo Jiang; Zheng Dang; Zhen Wei; Jin Xie; Jian Yang; Mathieu Salzmann; |
18 | Dynamically Instance-Guided Adaptation: A Backward-Free Approach for Test-Time Domain Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the application of Test-time domain adaptation in semantic segmentation (TTDA-Seg) where both efficiency and effectiveness are crucial. |
Wei Wang; Zhun Zhong; Weijie Wang; Xi Chen; Charles Ling; Boyu Wang; Nicu Sebe; |
19 | Painting 3D Nature in 2D: View Synthesis of Natural Scenes From A Single Semantic Mask Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel approach that takes a single semantic mask as input to synthesize multi-view consistent color images of natural scenes, trained with a collection of single images from the Internet. |
Shangzhan Zhang; Sida Peng; Tianrun Chen; Linzhan Mou; Haotong Lin; Kaicheng Yu; Yiyi Liao; Xiaowei Zhou; |
20 | LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To overcome these, we present LANguage-driven Image-to-image Translation model, dubbed LANIT. |
Jihye Park; Sunwoo Kim; Soohyun Kim; Seokju Cho; Jaejun Yoo; Youngjung Uh; Seungryong Kim; |
21 | MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they generally suffer from two limitations: i) the matching procedure between local frames tends to be inaccurate due to the lack of guidance to force long-range temporal perception; ii) explicit motion learning is usually ignored, leading to partial information loss. To address these issues, we develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components, including a long-short contrastive objective and a motion autodecoder. |
Xiang Wang; Shiwei Zhang; Zhiwu Qing; Changxin Gao; Yingya Zhang; Deli Zhao; Nong Sang; |
22 | Fast Point Cloud Generation With Straight Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While beneficial, the complexity of learning steps has limited its applications to many 3D real-world. To address this limitation, we propose Point Straight Flow (PSF), a model that exhibits impressive performance using one step. |
Lemeng Wu; Dilin Wang; Chengyue Gong; Xingchao Liu; Yunyang Xiong; Rakesh Ranjan; Raghuraman Krishnamoorthi; Vikas Chandra; Qiang Liu; |
23 | Text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To associate with preset attributes, most existing approaches focus on supervised learning for semantically meaningful latent space traversal directions, and each manipulation step is typically determined for an individual attribute. To address this limitation, we propose a Text-guided Unsupervised StyleGAN Latent Transformation (TUSLT) model, which adaptively infers a single transformation step in the latent space of StyleGAN to simultaneously manipulate multiple attributes on a given input image. |
Xiwen Wei; Zhen Xu; Cheng Liu; Si Wu; Zhiwen Yu; Hau San Wong; |
24 | Achieving A Better Stability-Plasticity Trade-Off Via Auxiliary Networks in Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Auxiliary Network Continual Learning (ANCL), a novel method that applies an additional auxiliary network which promotes plasticity to the continually learned model which mainly focuses on stability. |
Sanghwan Kim; Lorenzo Noci; Antonio Orvieto; Thomas Hofmann; |
25 | Power Bundle Adjustment for Large-Scale 3D Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Power Bundle Adjustment as an expansion type algorithm for solving large-scale bundle adjustment problems. |
Simon Weber; Nikolaus Demmel; Tin Chon Chan; Daniel Cremers; |
26 | Picture That Sketch: Photorealistic Image Generation From Abstract Sketches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our contribution at the outset is a decoupled encoder-decoder training paradigm, where the decoder is a StyleGAN trained on photos only. |
Subhadeep Koley; Ayan Kumar Bhunia; Aneeshan Sain; Pinaki Nath Chowdhury; Tao Xiang; Yi-Zhe Song; |
27 | Contrastive Semi-Supervised Learning for Underwater Image Restoration Via Reliable Bank Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a mean-teacher based Semi-supervised Underwater Image Restoration (Semi-UIR) framework to incorporate the unlabeled data into network training. |
Shirui Huang; Keyan Wang; Huan Liu; Jun Chen; Yunsong Li; |
28 | Video Event Restoration Based on Keyframes for Video Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel U-shaped Swin Transformer Network with Dual Skip Connections (USTN-DSC) for video event restoration, where a cross-attention and a temporal upsampling residual skip connection are introduced to further assist in restoring complex static and dynamic motion object features in the video. |
Zhiwei Yang; Jing Liu; Zhaoyang Wu; Peng Wu; Xiaotao Liu; |
29 | EcoTTA: Memory-Efficient Continual Test-Time Adaptation Via Self-Distilled Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a simple yet effective approach that improves continual test-time adaptation (TTA) in a memory-efficient manner. |
Junha Song; Jungsoo Lee; In So Kweon; Sungha Choi; |
30 | 3D-Aware Object Goal Navigation Via Simultaneous Exploration and Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a framework for the challenging 3D-aware ObjectNav based on two straightforward sub-policies. |
Jiazhao Zhang; Liu Dai; Fanpeng Meng; Qingnan Fan; Xuelin Chen; Kai Xu; He Wang; |
31 | Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite its better efficiency than voxel representation, it has difficulty describing the fine-grained 3D structure of a scene with a single plane. To address this, we propose a tri-perspective view (TPV) representation which accompanies BEV with two additional perpendicular planes. |
Yuanhui Huang; Wenzhao Zheng; Yunpeng Zhang; Jie Zhou; Jiwen Lu; |
32 | Castling-ViT: Compressing Self-Attention Via Switching Towards Linear-Angular Attention at Vision Transformer Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we ask an important research question: Can ViTs learn both global and local context while being more efficient during inference? |
Haoran You; Yunyang Xiong; Xiaoliang Dai; Bichen Wu; Peizhao Zhang; Haoqi Fan; Peter Vajda; Yingyan (Celine) Lin; |
33 | Shape, Pose, and Appearance From A Single Image Via Bootstrapped Radiance Field Inversion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available. |
Dario Pavllo; David Joseph Tan; Marie-Julie Rakotosaona; Federico Tombari; |
34 | Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose and promote a more practical label-agnostic setting, where the hackers may exploit the protected data quite differently from the protectors. |
Jiaming Zhang; Xingjun Ma; Qi Yi; Jitao Sang; Yu-Gang Jiang; Yaowei Wang; Changsheng Xu; |
35 | Rethinking Federated Learning With Domain Shift: A Prototype View Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Federated Prototypes Learning (FPL) for federated learning under domain shift. |
Wenke Huang; Mang Ye; Zekun Shi; He Li; Bo Du; |
36 | NoPe-NeRF: Optimising Neural Radiance Field With No Pose Prior Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these methods still face difficulties during dramatic camera movement. We tackle this challenging problem by incorporating undistorted monocular depth priors. |
Wenjing Bian; Zirui Wang; Kejie Li; Jia-Wang Bian; Victor Adrian Prisacariu; |
37 | HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel hierarchical grouping transformer (HGFormer) to explicitly group pixels to form part-level masks and then whole-level masks. |
Jian Ding; Nan Xue; Gui-Song Xia; Bernt Schiele; Dengxin Dai; |
38 | Distilling Vision-Language Pre-Training To Collaborate With Weakly-Supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle this issue without additional annotations, this paper considers to distill free action knowledge from Vision-Language Pre-training (VLP), as we surprisingly observe that the localization results of vanilla VLP have an over-complete issue, which is just complementary to the CBP results. To fuse such complementarity, we propose a novel distillation-collaboration framework with two branches acting as CBP and VLP respectively. |
Chen Ju; Kunhao Zheng; Jinxiang Liu; Peisen Zhao; Ya Zhang; Jianlong Chang; Qi Tian; Yanfeng Wang; |
39 | Augmentation Matters: A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Differently, in this work, we follow a standard teacher-student framework and propose AugSeg, a simple and clean approach that focuses mainly on data perturbations to boost the SSS performance. |
Zhen Zhao; Lihe Yang; Sifan Long; Jimin Pi; Luping Zhou; Jingdong Wang; |
40 | SIEDOB: Semantic Image Editing By Disentangling Object and Background Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Consequently, they remain limited in processing content-rich images and suffer from generating unrealistic objects and texture-inconsistent backgrounds. To address this issue, we propose a novel paradigm, Semantic Image Editing by Disentangling Object and Background (SIEDOB), the core idea of which is to explicitly leverages several heterogeneous subnetworks for objects and backgrounds. |
Wuyang Luo; Su Yang; Xinjian Zhang; Weishan Zhang; |
41 | Multiclass Confidence and Localization Calibration for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new train-time technique for calibrating modern object detection methods. |
Bimsara Pathiraja; Malitha Gunawardhana; Muhammad Haris Khan; |
42 | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, the relevance between text query and video contents is sometimes neglected when predicting the moment and its saliency. To tackle this issue, we introduce Query-Dependent DETR (QD-DETR), a detection transformer tailored for MR/HD. |
WonJun Moon; Sangeek Hyun; SangUk Park; Dongchan Park; Jae-Pil Heo; |
43 | Robust 3D Shape Classification Via Non-Local Graph Attention Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a non-local graph attention network (NLGAT), which generates a novel global descriptor through two sub-networks for robust 3D shape classification. |
Shengwei Qin; Zhong Li; Ligang Liu; |
44 | Boosting Verified Training for Robust Image Classifications Via Abstraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a novel, abstraction-based, certified training method for robust image classifiers. |
Zhaodi Zhang; Zhiyi Xue; Yang Chen; Si Liu; Yueling Zhang; Jing Liu; Min Zhang; |
45 | Exploring Structured Semantic Prior for Multi Label Recognition With Incomplete Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior about the label-to-label correspondence via a semantic prior prompter. |
Zixuan Ding; Ao Wang; Hui Chen; Qiang Zhang; Pengzhang Liu; Yongjun Bao; Weipeng Yan; Jungong Han; |
46 | Instance-Specific and Model-Adaptive Supervision for Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we emphasize the cruciality of instance differences and propose an instance-specific and model-adaptive supervision for semi-supervised semantic segmentation, named iMAS. |
Zhen Zhao; Sifan Long; Jimin Pi; Jingdong Wang; Luping Zhou; |
47 | 3D Shape Reconstruction of Semi-Transparent Worms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This approach is not viable when the subject is semi-transparent and moving in and out of focus. Here we overcome these challenges by rendering a candidate shape with adaptive blurring and transparency for comparison with the images. |
Thomas P. Ilett; Omer Yuval; Thomas Ranner; Netta Cohen; David C. Hogg; |
48 | Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection With Single Point Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Interestingly, during the training phase supervised by point labels, we discover that CNNs first learn to segment a cluster of pixels near the targets, and then gradually converge to predict groundtruth point labels. Motivated by this "mapping degeneration" phenomenon, we propose a label evolution framework named label evolution with single point supervision (LESPS) to progressively expand the point label by leveraging the intermediate predictions of CNNs. |
Xinyi Ying; Li Liu; Yingqian Wang; Ruojing Li; Nuo Chen; Zaiping Lin; Weidong Sheng; Shilin Zhou; |
49 | Swept-Angle Synthetic Wavelength Interferometry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new imaging technique, swept-angle synthetic wavelength interferometry, for full-field micron-scale 3D sensing. |
Alankar Kotwal; Anat Levin; Ioannis Gkioulekas; |
50 | Delving Into Shape-Aware Zero-Shot Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, translating this success to semantic segmentation is not trivial, because this dense prediction task requires not only accurate semantic understanding but also fine shape delineation and existing vision-language models are trained with image-level language descriptions. To bridge this gap, we pursue shape-aware zero-shot semantic segmentation in this study. |
Xinyu Liu; Beiwen Tian; Zhen Wang; Rui Wang; Kehua Sheng; Bo Zhang; Hao Zhao; Guyue Zhou; |
51 | Post-Training Quantization on Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we accelerate generation from the perspective of compressing the noise estimation network. |
Yuzhang Shang; Zhihang Yuan; Bin Xie; Bingzhe Wu; Yan Yan; |
52 | Adaptive Global Decay Process for Event Cameras Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We instead propose a novel decay process for event cameras that adapts to the global scene dynamics and whose latency is in the order of nanoseconds. |
Urbano Miguel Nunes; Ryad Benosman; Sio-Hoi Ieng; |
53 | Multi-Space Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead of calculating a single radiance field, we propose a multispace neural radiance field (MS-NeRF) that represents the scene using a group of feature fields in parallel sub-spaces, which leads to a better understanding of the neural network toward the existence of reflective and refractive objects. |
Ze-Xin Yin; Jiaxiong Qiu; Ming-Ming Cheng; Bo Ren; |
54 | Leveraging Inter-Rater Agreement for Classification in The Presence of Noisy Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we: (i) show how to leverage the inter-annotator statistics to estimate the noise distribution to which labels are subject; (ii) introduce methods that use the estimate of the noise distribution to learn from the noisy dataset; and (iii) establish generalization bounds in the empirical risk minimization framework that depend on the estimated quantities. |
Maria Sofia Bucarelli; Lucas Cassano; Federico Siciliano; Amin Mantrach; Fabrizio Silvestri; |
55 | Bitstream-Corrupted JPEG Images Are Restorable: Two-Stage Compensation and Alignment Framework for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study a real-world JPEG image restoration problem with bit errors on the encrypted bitstream. |
Wenyang Liu; Yi Wang; Kim-Hui Yap; Lap-Pui Chau; |
56 | Analyzing Physical Impacts Using Transient Surface Wave Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extract information from the transient surface vibrations simultaneously measured at a sparse set of object points using the dual-shutter camera described by Sheinin[31]. |
Tianyuan Zhang; Mark Sheinin; Dorian Chan; Mark Rau; Matthew O’Toole; Srinivasa G. Narasimhan; |
57 | X-Pruner: EXplainable Pruning for Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent studies have proposed to prune transformers in an unexplainable manner, which overlook the relationship between internal units of the model and the target class, thereby leading to inferior performance. To alleviate this problem, we propose a novel explainable pruning framework dubbed X-Pruner, which is designed by considering the explainability of the pruning criterion. |
Lu Yu; Wei Xiang; |
58 | Hard Sample Matters A Lot in Zero-Shot Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Accordingly, quantized models obtained by these methods suffer from significant performance degradation on hard samples. To address this issue, we propose HArd sample Synthesizing and Training (HAST). |
Huantong Li; Xiangmiao Wu; Fanbing Lv; Daihai Liao; Thomas H. Li; Yonggang Zhang; Bo Han; Mingkui Tan; |
59 | Meta Compositional Referring Expression Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, through the lens of meta learning, we propose a Meta Compositional Referring Expression Segmentation (MCRES) framework to enhance model compositional generalization performance. |
Li Xu; Mark He Huang; Xindi Shang; Zehuan Yuan; Ying Sun; Jun Liu; |
60 | Histopathology Whole Slide Image Analysis With Heterogeneous Graph Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel heterogeneous graph-based framework to leverage the inter-relationships among different types of nuclei for WSI analysis. |
Tsai Hor Chan; Fernando Julio Cendra; Lan Ma; Guosheng Yin; Lequan Yu; |
61 | ScanDMM: A Deep Markov Model of Scanpath Prediction for 360deg Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a scanpath prediction method for 360deg images by designing a novel Deep Markov Model (DMM) architecture, namely ScanDMM. |
Xiangjie Sui; Yuming Fang; Hanwei Zhu; Shiqi Wang; Zhou Wang; |
62 | Towards All-in-One Pre-Training Via Maximizing Multi-Modal Mutual Information Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first propose a general multi-modal mutual information formula as a unified optimization target and demonstrate that all mainstream approaches are special cases of our framework. Under this unified perspective, we propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training). |
Weijie Su; Xizhou Zhu; Chenxin Tao; Lewei Lu; Bin Li; Gao Huang; Yu Qiao; Xiaogang Wang; Jie Zhou; Jifeng Dai; |
63 | Aligning Bag of Regions for Open-Vocabulary Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to align the embedding of bag of regions beyond individual regions. |
Size Wu; Wenwei Zhang; Sheng Jin; Wentao Liu; Chen Change Loy; |
64 | Two-View Geometry Scoring Without Correspondences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a remedy, we propose the Fundamental Scoring Network (FSNet), which infers a score for a pair of overlapping images and any proposed fundamental matrix. |
Axel Barroso-Laguna; Eric Brachmann; Victor Adrian Prisacariu; Gabriel J. Brostow; Daniyar Turmukhambetov; |
65 | Annealing-Based Label-Transfer Learning for Open World Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we claim the learning of object detection could be seen as an object-level feature-entanglement process, where unknown traits are propagated to the known proposals through convolutional operations and could be distilled to benefit unknown recognition without manual selection. |
Yuqing Ma; Hainan Li; Zhange Zhang; Jinyang Guo; Shanghang Zhang; Ruihao Gong; Xianglong Liu; |
66 | Continual Semantic Segmentation With Automatic Memory Sample Selection Highlight: In this work, we propose a novel memory sample selection mechanism that selects informative samples for effective replay in a fully automatic way by considering comprehensive factors including sample diversity and class performance. |
Lanyun Zhu; Tianrun Chen; Jianxiong Yin; Simon See; Jun Liu; |
67 | Meta-Tuning Loss Functions and Data Augmentation for Few-Shot Object Detection Highlight: Despite their simplicity, fine-tuning based approaches typically yield competitive detection results. Based on this observation, we focus on the role of loss functions and augmentations as the force driving the fine-tuning process, and propose to tune their dynamics through meta-learning principles. |
Berkan Demirel; Orhun Buğra Baran; Ramazan Gokberk Cinbis; |
68 | A Light Weight Model for Active Speaker Detection Highlight: Although these methods have achieved excellent performance, their high memory and computational power consumption render their application to resource-limited scenarios difficult. Therefore, in this study, a lightweight active speaker detection architecture is constructed by reducing the number of input candidates, splitting 2D and 3D convolutions for audio-visual feature extraction, and applying gated recurrent units with low computational complexity for cross-modal modeling. |
Junhua Liao; Haihan Duan; Kanghui Feng; Wanbing Zhao; Yanbing Yang; Liangyin Chen; |
69 | Self-Supervised Video Forensics By Audio-Visual Anomaly Detection Highlight: Manipulated videos often contain subtle inconsistencies between their visual and audio signals. We propose a video forensics method, based on anomaly detection, that can identify these inconsistencies, and that can be trained solely using real, unlabeled data. |
Chao Feng; Ziyang Chen; Andrew Owens; |
70 | CLIP2Scene: Towards Label-Efficient 3D Scene Understanding By CLIP Highlight: We propose CLIP2Scene, a simple yet effective framework that transfers CLIP knowledge from 2D image-text pre-trained models to a 3D point cloud network. |
Runnan Chen; Youquan Liu; Lingdong Kong; Xinge Zhu; Yuexin Ma; Yikang Li; Yuenan Hou; Yu Qiao; Wenping Wang; |
71 | GCFAgg: Global and Cross-View Feature Aggregation for Multi-View Clustering Highlight: However, most existing deep clustering methods learn consensus representations or view-specific representations from multiple views via view-wise aggregation, ignoring the structural relationships among all samples. In this paper, we propose a novel multi-view clustering network to address these problems, called Global and Cross-view Feature Aggregation for Multi-View Clustering (GCFAggMVC). |
Weiqing Yan; Yuanyang Zhang; Chenlei Lv; Chang Tang; Guanghui Yue; Liang Liao; Weisi Lin; |
72 | Class Balanced Adaptive Pseudo Labeling for Federated Semi-Supervised Learning Highlight: Thus, problems in FSSL are still yet to be solved. To seek a fundamental solution to this problem, we present Class Balanced Adaptive Pseudo Labeling (CBAFed), to study FSSL from the perspective of pseudo labeling. |
Ming Li; Qingli Li; Yan Wang; |
73 | Rethinking Out-of-Distribution (OOD) Detection: Masked Image Modeling Is All You Need Highlight: In this work, we find surprisingly that simply using reconstruction-based methods could boost the performance of OOD detection significantly. |
Jingyao Li; Pengguang Chen; Zexin He; Shaozuo Yu; Shu Liu; Jiaya Jia; |
74 | DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting Highlight: In response, we propose guided posterior regularization (DeGPR), which assists an object detector by guiding it to exploit discriminative features among cells. |
Aayush Kumar Tyagi; Chirag Mohapatra; Prasenjit Das; Govind Makharia; Lalita Mehra; Prathosh AP; Mausam; |
75 | Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning Highlight: However, the trend of large-scale unsupervised learning in 3D has yet to emerge due to two stumbling blocks: the inefficiency of matching RGB-D frames as contrastive views and the annoying mode collapse phenomenon mentioned in previous works. Turning the two stumbling blocks into empirical stepping stones, we first propose an efficient and effective contrastive learning framework, which generates contrastive views directly on scene-level point clouds by a well-curated data augmentation pipeline and a practical view mixing strategy. Second, we introduce reconstructive learning on the contrastive learning framework with an exquisite design of contrastive cross masks, which targets the reconstruction of point color and surfel normal. |
Xiaoyang Wu; Xin Wen; Xihui Liu; Hengshuang Zhao; |
76 | Multi Domain Learning for Motion Magnification Highlight: The deep learning based approach has higher magnification but is prone to severe artifacts in some scenarios. We propose a new phase based deep network for video motion magnification that operates in both domains (frequency and spatial) to address this issue. |
Jasdeep Singh; Subrahmanyam Murala; G. Sankara Raju Kosuru; |
77 | LOGO: A Long-Form Video Dataset for Group Action Quality Assessment Highlight: However, most existing methods and datasets focus on single-person short-sequence scenes, hindering the application of AQA in more complex situations. To address this issue, we construct a new multi-person long-form video dataset for action quality assessment named LOGO. |
Shiyi Zhang; Wenxun Dai; Sujia Wang; Xiangwei Shen; Jiwen Lu; Jie Zhou; Yansong Tang; |
78 | A Simple Baseline for Video Restoration With Grouped Spatial-Temporal Shift Highlight: In this study, we propose a simple yet effective framework for video restoration. |
Dasong Li; Xiaoyu Shi; Yi Zhang; Ka Chun Cheung; Simon See; Xiaogang Wang; Hongwei Qin; Hongsheng Li; |
79 | UniSim: A Neural Closed-Loop Sensor Simulator Highlight: In this paper, we present UniSim, a neural sensor simulator that takes a single recorded log captured by a sensor-equipped vehicle and converts it into a realistic closed-loop multi-sensor simulation. |
Ze Yang; Yun Chen; Jingkang Wang; Sivabalan Manivasagam; Wei-Chiu Ma; Anqi Joyce Yang; Raquel Urtasun; |
80 | ItKD: Interchange Transfer-Based Knowledge Distillation for 3D Object Detection Highlight: To learn the map-view feature of a teacher network, the features from the teacher and student networks are independently passed through a shared autoencoder; here, we use a compressed representation loss that binds the channel-wise compression knowledge from both student and teacher networks as a kind of regularization. |
Hyeon Cho; Junyong Choi; Geonwoo Baek; Wonjun Hwang; |
81 | SliceMatch: Geometry-Guided Aggregation for Cross-View Pose Estimation Highlight: We propose SliceMatch, which consists of ground and aerial feature extractors, feature aggregators, and a pose predictor. |
Ted Lentsch; Zimin Xia; Holger Caesar; Julian F. P. Kooij; |
82 | 2PCNet: Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object Detection Highlight: To address errors that arise from low-light regions and other night-related attributes in images, we propose a night-specific augmentation pipeline called NightAug. |
Mikhail Kennerley; Jian-Gang Wang; Bharadwaj Veeravalli; Robby T. Tan; |
83 | Prefix Conditioning Unifies Language and Label Supervision Highlight: In this work, we study a pretraining strategy that uses both classification and caption datasets to unite their complementary benefits. |
Kuniaki Saito; Kihyuk Sohn; Xiang Zhang; Chun-Liang Li; Chen-Yu Lee; Kate Saenko; Tomas Pfister; |
84 | Panoptic Lifting for 3D Scene Understanding With Neural Fields Highlight: We propose Panoptic Lifting, a novel approach for learning panoptic 3D volumetric representations from images of in-the-wild scenes. |
Yawar Siddiqui; Lorenzo Porzi; Samuel Rota Bulò; Norman Müller; Matthias Nießner; Angela Dai; Peter Kontschieder; |
85 | WeatherStream: Light Transport Automation of Single Image Deweathering Highlight: We introduce WeatherStream, an automatic pipeline capturing all real-world weather effects (rain, snow, and rain fog degradations), along with their clean image pairs. |
Howard Zhang; Yunhao Ba; Ethan Yang; Varan Mehra; Blake Gella; Akira Suzuki; Arnold Pfahnl; Chethan Chinder Chandrappa; Alex Wong; Achuta Kadambi; |
86 | Learning To Detect Mirrors From Videos Via Dual Correspondences Highlight: Our observation is that there are often correspondences between the contents inside (reflected) and outside (real) of a mirror, but such correspondences may not always appear in every frame, e.g., due to the change of camera pose. This inspires us to propose a video mirror detection method, named VMD-Net, that can tolerate spatially missing correspondences by considering the mirror correspondences at both the intra-frame level as well as inter-frame level via a dual correspondence module that looks over multiple frames spatially and temporally for correlating correspondences. |
Jiaying Lin; Xin Tan; Rynson W.H. Lau; |
87 | Single View Scene Scale Estimation Using Scale Field Highlight: In this paper, we propose a single image scale estimation method based on a novel scale field representation. |
Byeong-Uk Lee; Jianming Zhang; Yannick Hold-Geoffroy; In So Kweon; |
88 | Learning Semantic-Aware Disentangled Representation for Flexible 3D Human Body Editing Highlight: In this paper, we propose a human body representation with fine-grained semantics and high reconstruction-accuracy in an unsupervised setting. |
Xiaokun Sun; Qiao Feng; Xiongzheng Li; Jinsong Zhang; Yu-Kun Lai; Jingyu Yang; Kun Li; |
89 | Generating Features With Increased Crop-Related Diversity for Few-Shot Object Detection Highlight: Training a robust classifier against this crop-related variability requires abundant training data, which is not available in few-shot settings. To mitigate this issue, we propose a novel variational autoencoder (VAE) based data generation model, which is capable of generating data with increased crop-related diversity. |
Jingyi Xu; Hieu Le; Dimitris Samaras; |
90 | Towards Scalable Neural Representation for Diverse Videos Highlight: We first show that instead of dividing videos into small subsets and encoding them with separate models, encoding long and diverse videos jointly with a unified model achieves better compression results. Based on this observation, we propose D-NeRV, a novel neural representation framework designed to encode diverse videos by (i) decoupling clip-specific visual content from motion information, (ii) introducing temporal reasoning into the implicit neural network, and (iii) employing the task-oriented flow as intermediate output to reduce spatial redundancies. |
Bo He; Xitong Yang; Hanyu Wang; Zuxuan Wu; Hao Chen; Shuaiyi Huang; Yixuan Ren; Ser-Nam Lim; Abhinav Shrivastava; |
91 | The Devil Is in The Points: Weakly Semi-Supervised Instance Segmentation Via Point-Guided Mask Representation Highlight: In this paper, we introduce a novel learning scheme named weakly semi-supervised instance segmentation (WSSIS) with point labels for budget-efficient and high-performance instance segmentation. |
Beomyoung Kim; Joonhyun Jeong; Dongyoon Han; Sung Ju Hwang; |
92 | Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations Highlight: In this paper, we first propose a novel method for generating composite adversarial examples. Our method can find the optimal attack composition by utilizing component-wise projected gradient descent and automatic attack-order scheduling. We then propose generalized adversarial training (GAT) to extend model robustness from Lp-ball to composite semantic perturbations, such as the combination of Hue, Saturation, Brightness, Contrast, and Rotation. |
Lei Hsiung; Yun-Yun Tsai; Pin-Yu Chen; Tsung-Yi Ho; |
93 | Language-Guided Audio-Visual Source Separation Via Trimodal Consistency Highlight: We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data. |
Reuben Tan; Arijit Ray; Andrea Burns; Bryan A. Plummer; Justin Salamon; Oriol Nieto; Bryan Russell; Kate Saenko; |
94 | CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition With Variational Alignment Highlight: In this work, we propose a novel contrastive visual-textual transformation for SLR, CVT-SLR, to fully explore the pretrained knowledge of both the visual and language modalities. |
Jiangbin Zheng; Yile Wang; Cheng Tan; Siyuan Li; Ge Wang; Jun Xia; Yidong Chen; Stan Z. Li; |
95 | DynaMask: Dynamic Mask Selection for Instance Segmentation Highlight: In this paper, we propose to dynamically select suitable masks for different object proposals. |
Ruihuang Li; Chenhang He; Shuai Li; Yabin Zhang; Lei Zhang; |
96 | Paint By Example: Exemplar-Based Image Editing With Diffusion Models Highlight: In this paper, we investigate exemplar-guided image editing for more precise control. |
Binxin Yang; Shuyang Gu; Bo Zhang; Ting Zhang; Xuejin Chen; Xiaoyan Sun; Dong Chen; Fang Wen; |
97 | Ego-Body Pose Estimation Via Ego-Head Pose Estimation Highlight: To eliminate the need for paired egocentric video and human motions, we propose a new method, Ego-Body Pose Estimation via Ego-Head Pose Estimation (EgoEgo), which decomposes the problem into two stages, connected by the head motion as an intermediate representation. |
Jiaman Li; Karen Liu; Jiajun Wu; |
98 | SAP-DETR: Bridging The Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency Highlight: To bridge the gap between the reference points of salient queries and Transformer detectors, we propose SAlient Point-based DETR (SAP-DETR) by treating object detection as a transformation from salient points to instance objects. |
Yang Liu; Yao Zhang; Yixin Wang; Yang Zhang; Jiang Tian; Zhongchao Shi; Jianping Fan; Zhiqiang He; |
99 | GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds Highlight: In contrast to previous 3D MAE frameworks, which either design a complex decoder to infer masked information from maintained regions or adopt sophisticated masking strategies, we instead propose a much simpler paradigm. |
Honghui Yang; Tong He; Jiaheng Liu; Hua Chen; Boxi Wu; Binbin Lin; Xiaofei He; Wanli Ouyang; |
100 | Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution Highlight: In this paper, we propose a novel framework to capture more fine-grained clues in complex scenarios for tampered text detection, termed Document Tampering Detector (DTD), which consists of a Frequency Perception Head (FPH) to compensate for the deficiencies caused by the inconspicuous visual features, and a Multi-view Iterative Decoder (MID) for fully utilizing the information of features at different scales. |
Chenfan Qu; Chongyu Liu; Yuliang Liu; Xinhong Chen; Dezhi Peng; Fengjun Guo; Lianwen Jin; |
101 | Learning Rotation-Equivariant Features for Visual Correspondence Highlight: In this work, we introduce a self-supervised learning framework to extract discriminative rotation-invariant descriptors using group-equivariant CNNs. |
Jongmin Lee; Byungjin Kim; Seungwook Kim; Minsu Cho; |
102 | DexArt: Benchmarking Generalizable Dexterous Manipulation With Articulated Objects Highlight: To this end, we propose a new benchmark called DexArt, which involves Dexterous manipulation with Articulated objects in a physical simulator. |
Chen Bao; Helin Xu; Yuzhe Qin; Xiaolong Wang; |
103 | DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection Highlight: In this study, we propose an improved model called DeSTSeg, which integrates a pre-trained teacher network, a denoising student encoder-decoder, and a segmentation network into one framework. |
Xuan Zhang; Shiyu Li; Xi Li; Ping Huang; Jiulong Shan; Ting Chen; |
104 | Neural Rate Estimator and Unsupervised Learning for Efficient Distributed Image Analytics in Split-DNN Models Highlight: Most works on compression for image analytics use heuristic approaches to estimate the rate, leading to suboptimal performance. We propose a high-quality ‘neural rate-estimator’ to address this gap. |
Nilesh Ahuja; Parual Datta; Bhavya Kanzariya; V. Srinivasa Somayazulu; Omesh Tickoo; |
105 | Object Pop-Up: Can We Infer 3D Objects and Their Poses From Human Interactions Alone? Highlight: However, the inverse perspective is significantly less explored: Can we infer 3D objects and their poses from human interactions alone? Our investigation follows this direction, showing that a generic 3D human point cloud is enough to pop up an unobserved object, even when the user is just imitating a functionality (e.g., looking through a binocular) without involving a tangible counterpart. |
Ilya A. Petrov; Riccardo Marin; Julian Chibane; Gerard Pons-Moll; |
106 | VoP: Text-Video Co-Operative Prompt Tuning for Cross-Modal Retrieval Highlight: In this work, we propose the VoP: Text-Video Co-operative Prompt Tuning for efficient tuning on the text-video retrieval task. |
Siteng Huang; Biao Gong; Yulin Pan; Jianwen Jiang; Yiliang Lv; Yuyuan Li; Donglin Wang; |
107 | Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR Highlight: For the former, we propose a simple modification to the standard triplet loss, that explicitly enforces separation amongst photos/sketch instances. |
Aneeshan Sain; Ayan Kumar Bhunia; Subhadeep Koley; Pinaki Nath Chowdhury; Soumitri Chattopadhyay; Tao Xiang; Yi-Zhe Song; |
108 | You Do Not Need Additional Priors or Regularizers in Retinex-Based Low-Light Image Enhancement Highlight: We propose a contrastive learning method and a self-knowledge distillation method that allow training our Retinex-based model for Retinex decomposition without elaborate hand-crafted regularization functions. |
Huiyuan Fu; Wenkai Zheng; Xiangyu Meng; Xin Wang; Chuanming Wang; Huadong Ma; |
109 | PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification Highlight: Driven by the principle of explainability-by-design, we introduce PIP-Net (Patch-based Intuitive Prototypes Network): an interpretable image classification model that learns prototypical parts in a self-supervised fashion which correlate better with human vision. |
Meike Nauta; Jörg Schlötterer; Maurice van Keulen; Christin Seifert; |
110 | SCADE: NeRFs from Space Carving With Ambiguity-Aware Depth Estimates Highlight: However, a well-known drawback of NeRFs is the less-than-ideal performance under a small number of views, due to insufficient constraints enforced by volumetric rendering. To address this issue, we introduce SCADE, a novel technique that improves NeRF reconstruction quality on sparse, unconstrained input views for in-the-wild indoor scenes. |
Mikaela Angelina Uy; Ricardo Martin-Brualla; Leonidas Guibas; Ke Li; |
111 | Re-Thinking Model Inversion Attacks Against Deep Neural Networks Highlight: In this work, we revisit MI, study two fundamental issues pertaining to all state-of-the-art (SOTA) MI algorithms, and propose solutions to these issues which lead to a significant boost in attack performance for all SOTA MI. |
Ngoc-Bao Nguyen; Keshigeyan Chandrasegaran; Milad Abdollahzadeh; Ngai-Man Cheung; |
112 | 1% VS 100%: Parameter-Efficient Low Rank Adapter for Dense Predictions Highlight: In this work, we propose LoRand, a method for fine-tuning large-scale vision models with a better trade-off between task performance and the number of trainable parameters. |
Dongshuo Yin; Yiran Yang; Zhechao Wang; Hongfeng Yu; Kaiwen Wei; Xian Sun; |
113 | ResFormer: Scaling ViTs With Multi-Resolution Training Highlight: We introduce ResFormer, a framework built upon the seminal idea of multi-resolution training for improved performance on a wide spectrum of, mostly unseen, testing resolutions. |
Rui Tian; Zuxuan Wu; Qi Dai; Han Hu; Yu Qiao; Yu-Gang Jiang; |
114 | You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model Highlight: It is suboptimal in terms of saving computation power to ignore early exiting in the encoder component. To handle this challenge, we propose a novel early exiting strategy for unified visual language models, named MuE, which dynamically skips layers in the encoder and decoder simultaneously based on input layer-wise similarities, allowing multiple early exits. |
Shengkun Tang; Yaqing Wang; Zhenglun Kong; Tianchi Zhang; Yao Li; Caiwen Ding; Yanzhi Wang; Yi Liang; Dongkuan Xu; |
115 | CloSET: Modeling Clothed Humans on Continuous Surface With Explicit Template Decomposition Highlight: Existing learning-based methods typically add pose-dependent deformations upon a minimally-clothed mesh template or a learned implicit template, which have limitations in capturing details or hinder end-to-end learning. In this paper, we revisit point-based solutions and propose to decompose explicit garment-related templates and then add pose-dependent wrinkles to them. |
Hongwen Zhang; Siyou Lin; Ruizhi Shao; Yuxiang Zhang; Zerong Zheng; Han Huang; Yandong Guo; Yebin Liu; |
116 | BUOL: A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D Scene Reconstruction From A Single Image Highlight: In this paper, we propose BUOL, a Bottom-Up framework with Occupancy-aware Lifting to address the two issues for panoptic 3D scene reconstruction from a single image. |
Tao Chu; Pan Zhang; Qiong Liu; Jiaqi Wang; |
117 | Hierarchical Video-Moment Retrieval and Step-Captioning Highlight: Such an end-to-end setup would allow for many interesting applications, e.g., a text-based search that finds a relevant video from a video corpus, extracts the most relevant moment from that video, and segments the moment into important steps with captions. To address this, we present the HiREST (HIerarchical REtrieval and STep-captioning) dataset and propose a new benchmark that covers hierarchical information retrieval and visual/textual stepwise summarization from an instructional video corpus. |
Abhay Zala; Jaemin Cho; Satwik Kottur; Xilun Chen; Barlas Oguz; Yashar Mehdad; Mohit Bansal; |
118 | PROB: Probabilistic Objectness for Open World Object Detection Highlight: Herein, we introduce a novel probabilistic framework for objectness estimation, where we alternate between probability distribution estimation and objectness likelihood maximization of known objects in the embedded feature space – ultimately allowing us to estimate the objectness probability of different proposals. |
Orr Zohar; Kuan-Chieh Wang; Serena Yeung; |
119 | PD-Quant: Post-Training Quantization Based on Prediction Difference Metric Highlight: Existing methods attempt to determine these parameters by minimizing the distance between features before and after quantization, but such an approach only considers local information and may not yield optimal quantization parameters. We analyze this issue and propose PD-Quant, a method that addresses this limitation by considering global information. |
Jiawei Liu; Lin Niu; Zhihang Yuan; Dawei Yang; Xinggang Wang; Wenyu Liu; |
120 | AUNet: Learning Relations Between Action Units for Face Forgery Detection Highlight: Observing that face manipulation may alter the relation between different facial action units (AU), we propose the Action Units Relation Learning framework to improve the generality of forgery detection. |
Weiming Bai; Yufan Liu; Zhipeng Zhang; Bing Li; Weiming Hu; |
121 | SparseFusion: Distilling View-Conditioned Diffusion for 3D Reconstruction Highlight: We propose SparseFusion, a sparse view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation. |
Zhizhuo Zhou; Shubham Tulsiani; |
122 | PolyFormer: Referring Image Segmentation As Sequential Polygon Generation Highlight: In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks. |
Jiang Liu; Hui Ding; Zhaowei Cai; Yuting Zhang; Ravi Kumar Satzoda; Vijay Mahadevan; R. Manmatha; |
123 | Seeing What You Miss: Vision-Language Pre-Training With Semantic Completion Learning Highlight: However, most of them pay little attention to the global semantic features generated for the masked data, resulting in a limited cross-modal alignment ability of global representations. Therefore, in this paper, we propose a novel Semantic Completion Learning (SCL) task, complementary to existing masked modeling tasks, to facilitate global-to-local alignment. |
Yatai Ji; Rongcheng Tu; Jie Jiang; Weijie Kong; Chengfei Cai; Wenzhe Zhao; Hongfa Wang; Yujiu Yang; Wei Liu; |
124 | Interactive Segmentation As Gaussian Process Classification Highlight: Albeit achieving promising performance, they do not fully and explicitly utilize and propagate the click information, inevitably leading to unsatisfactory segmentation results, even at clicked points. Against this issue, in this paper, we propose to formulate the IS task as a Gaussian process (GP)-based pixel-wise binary classification model on each image. |
Minghao Zhou; Hong Wang; Qian Zhao; Yuexiang Li; Yawen Huang; Deyu Meng; Yefeng Zheng; |
125 | Differentiable Shadow Mapping for Efficient Inverse Graphics Highlight: We demonstrate at several inverse graphics problems that differentiable shadow maps are orders of magnitude faster than differentiable light transport simulation with similar accuracy, while differentiable rasterization without shadows often fails to converge. |
Markus Worchel; Marc Alexa; |
126 | Dynamic Focus-Aware Positional Queries for Semantic Segmentation Highlight: In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features, simultaneously. |
Haoyu He; Jianfei Cai; Zizheng Pan; Jing Liu; Jing Zhang; Dacheng Tao; Bohan Zhuang; |
127 | A Practical Stereo Depth System for Smart Glasses Highlight: We present the design of a productionized end-to-end stereo depth sensing system that does pre-processing, online stereo rectification, and stereo depth estimation with a fallback to monocular depth estimation when rectification is unreliable. |
Jialiang Wang; Daniel Scharstein; Akash Bapat; Kevin Blackburn-Matzen; Matthew Yu; Jonathan Lehman; Suhib Alsisan; Yanghan Wang; Sam Tsai; Jan-Michael Frahm; Zijian He; Peter Vajda; Michael F. Cohen; Matt Uyttendaele; |
128 | Understanding and Constructing Latent Modality Structures in Multi-Modal Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose three general approaches to construct latent modality structures. |
Qian Jiang; Changyou Chen; Han Zhao; Liqun Chen; Qing Ping; Son Dinh Tran; Yi Xu; Belinda Zeng; Trishul Chilimbi; |
129 | PointConvFormer: Revenge of The Point-Based Convolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce PointConvFormer, a novel building block for point cloud based deep network architectures. |
Wenxuan Wu; Li Fuxin; Qi Shan; |
130 | Instant Volumetric Head Avatars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Instant Volumetric Head Avatars (INSTA), a novel approach for reconstructing photo-realistic digital avatars instantaneously. |
Wojciech Zielonka; Timo Bolkart; Justus Thies; |
131 | HARP: Personalized Hand Reconstruction From A Monocular RGB Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present HARP (HAnd Reconstruction and Personalization), a personalized hand avatar creation approach that takes a short monocular RGB video of a human hand as input and reconstructs a faithful hand avatar exhibiting a high-fidelity appearance and geometry. |
Korrawe Karunratanakul; Sergey Prokudin; Otmar Hilliges; Siyu Tang; |
132 | Variational Distribution Learning for Unsupervised Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a text-to-image generation algorithm based on deep neural networks when text captions for images are unavailable during training. |
Minsoo Kang; Doyup Lee; Jiseob Kim; Saehoon Kim; Bohyung Han; |
133 | MetaMix: Towards Corruption-Robust Continual Learning With Temporally Self-Adaptive Data Transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our empirical evaluation results show that existing state-of-the-art (SOTA) CL models are particularly vulnerable to various data corruptions during testing. To make them trustworthy and robust to corruptions when deployed in safety-critical scenarios, we propose a meta-learning framework of self-adaptive data augmentation to tackle the corruption robustness in CL. |
Zhenyi Wang; Li Shen; Donglin Zhan; Qiuling Suo; Yanjun Zhu; Tiehang Duan; Mingchen Gao; |
134 | Ultra-High Resolution Segmentation With Ultra-Rich Context: A Novel Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With the increasing interest and rapid development of methods for Ultra-High Resolution (UHR) segmentation, a large-scale benchmark covering a wide range of scenes with full fine-grained dense annotations is urgently needed to facilitate the field. To this end, the URUR dataset is introduced, standing for Ultra-High Resolution dataset with Ultra-Rich Context. |
Deyi Ji; Feng Zhao; Hongtao Lu; Mingyuan Tao; Jieping Ye; |
135 | DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first establish a surprisingly simple but strong benchmark for generalization which utilizes diverse augmentations within a training minibatch, and show that this can learn a more balanced distribution of features. Further, we propose Diversify-Aggregate-Repeat Training (DART) strategy that first trains diverse models using different augmentations (or domains) to explore the loss basin, and further Aggregates their weights to combine their expertise and obtain improved generalization. |
Samyak Jain; Sravanti Addepalli; Pawan Kumar Sahu; Priyam Dey; R. Venkatesh Babu; |
136 | Cross-Domain Image Captioning With Discriminative Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that fine-tuning an out-of-the-box neural captioner with a self-supervised discriminative communication objective helps to recover a plain, visually descriptive language that is more informative about image contents. |
Roberto Dessì; Michele Bevilacqua; Eleonora Gualdoni; Nathanaël Carraz Rakotonirina; Francesca Franzon; Marco Baroni; |
137 | Accelerating Vision-Language Pretraining With Free Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To accelerate the convergence of VLP, we propose a new pretraining task, namely, free language modeling (FLM), that enables a 100% prediction rate with arbitrary corruption rates. |
Teng Wang; Yixiao Ge; Feng Zheng; Ran Cheng; Ying Shan; Xiaohu Qie; Ping Luo; |
138 | Efficient Mask Correction for Click-Based Interactive Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose an efficient method to correct the mask with a lightweight mask correction network. |
Fei Du; Jianlong Yuan; Zhibin Wang; Fan Wang; |
139 | DBARF: Deep Bundle-Adjusting Generalizable Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first analyze the difficulties of jointly optimizing camera poses with GeNeRFs, and then further propose our DBARF to tackle these issues. |
Yu Chen; Gim Hee Lee; |
140 | EvShutter: Transforming Events for Unconstrained Rolling Shutter Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a new method, called Eventful Shutter (EvShutter), that corrects RS artifacts using a single RGB image and event information with high temporal resolution. |
Julius Erbach; Stepan Tulyakov; Patricia Vitoria; Alfredo Bochicchio; Yuanyou Li; |
141 | Graphics Capsule: Learning Hierarchical 3D Face Representations From 2D Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an Inverse Graphics Capsule Network (IGC-Net) to learn the hierarchical 3D face representations from large-scale unlabeled images. |
Chang Yu; Xiangyu Zhu; Xiaomei Zhang; Zhaoxiang Zhang; Zhen Lei; |
142 | Connecting The Dots: Floorplan Reconstruction Using Two-Level Queries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address 2D floorplan reconstruction from 3D scans. |
Yuanwen Yue; Theodora Kontogianni; Konrad Schindler; Francis Engelmann; |
143 | Analyzing and Diagnosing Pose Estimation With Attributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Pose Integrated Gradient (PoseIG), the first interpretability technique designed for pose estimation. |
Qiyuan He; Linlin Yang; Kerui Gu; Qiuxia Lin; Angela Yao; |
144 | Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We experimentally find that the root lies in two kinds of ambiguities: (1) Selection ambiguity that selected pseudo labels are less accurate, since classification scores cannot properly represent the localization quality. (2) Assignment ambiguity that samples are matched with improper labels in pseudo-label assignment, as the strategy is misguided by missed objects and inaccurate pseudo boxes. To tackle these problems, we propose an Ambiguity-Resistant Semi-supervised Learning (ARSL) method for one-stage detectors. |
Chang Liu; Weiming Zhang; Xiangru Lin; Wei Zhang; Xiao Tan; Junyu Han; Xiaomao Li; Errui Ding; Jingdong Wang; |
145 | Scalable, Detailed and Mask-Free Universal Photometric Stereo Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce SDM-UniPS, a groundbreaking Scalable, Detailed, Mask-free, and Universal Photometric Stereo network. |
Satoshi Ikehata; |
146 | Towards High-Quality and Efficient Video Super-Resolution Via Spatial-Temporal Data Overfitting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To reconcile these demands, we propose a novel method for high-quality and efficient video resolution upscaling tasks, which leverages the spatial-temporal information to accurately divide video into chunks, thus keeping the number of chunks as well as the model size to a minimum. |
Gen Li; Jie Ji; Minghai Qin; Wei Niu; Bin Ren; Fatemeh Afghah; Linke Guo; Xiaolong Ma; |
147 | Make-a-Story: Visual Memory Conditioned Consistent Story Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we address the aforementioned challenges and propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context across the generated frames. |
Tanzila Rahman; Hsin-Ying Lee; Jian Ren; Sergey Tulyakov; Shweta Mahajan; Leonid Sigal; |
148 | BiFormer: Vision Transformer With Bi-Level Routing Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A series of works attempt to alleviate this problem by introducing handcrafted and content-agnostic sparsity into attention, such as restricting the attention operation to be inside local windows, axial stripes, or dilated windows. In contrast to these approaches, we propose a novel dynamic sparse attention via bi-level routing to enable a more flexible allocation of computations with content awareness. |
Lei Zhu; Xinjiang Wang; Zhanghan Ke; Wayne Zhang; Rynson W.H. Lau; |
149 | Masked Autoencoders Enable Efficient Knowledge Distillers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper studies the potential of distilling knowledge from pre-trained models, especially Masked Autoencoders. |
Yutong Bai; Zeyu Wang; Junfei Xiao; Chen Wei; Huiyu Wang; Alan L. Yuille; Yuyin Zhou; Cihang Xie; |
150 | TinyMIM: An Empirical Study of Distilling MIM Pre-Trained Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. |
Sucheng Ren; Fangyun Wei; Zheng Zhang; Han Hu; |
151 | Persistent Nature: A Generative Model of Unbounded 3D Worlds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the task of unconditionally synthesizing unbounded nature scenes, enabling arbitrarily large camera motion while maintaining a persistent 3D world model. |
Lucy Chai; Richard Tucker; Zhengqi Li; Phillip Isola; Noah Snavely; |
152 | OneFormer: One Transformer To Rule Universal Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To that end, we propose OneFormer, a universal image segmentation framework that unifies segmentation with a multi-task train-once design. |
Jitesh Jain; Jiachen Li; Mang Tik Chiu; Ali Hassani; Nikita Orlov; Humphrey Shi; |
153 | Hierarchical Neural Memory Network for Low Latency Event Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a low latency neural network architecture for event-based dense prediction tasks. |
Ryuhei Hamaguchi; Yasutaka Furukawa; Masaki Onishi; Ken Sakurada; |
154 | Finding Geometric Models By Clustering in The Consensus Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies. |
Daniel Barath; Denys Rozumnyi; Ivan Eichhardt; Levente Hajder; Jiri Matas; |
155 | Leapfrog Diffusion Model for Stochastic Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To resolve the dilemma, we present LEapfrog Diffusion model (LED), a novel diffusion-based trajectory prediction model, which provides real-time, precise, and diverse predictions. |
Weibo Mao; Chenxin Xu; Qi Zhu; Siheng Chen; Yanfeng Wang; |
156 | DaFKD: Domain-Aware Federated Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new perspective that treats the local data in each client as a specific domain and design a novel domain knowledge aware federated distillation method, dubbed DaFKD, that can discern the importance of each model to the distillation sample, and thus is able to optimize the ensemble of soft predictions from diverse models. |
Haozhao Wang; Yichen Li; Wenchao Xu; Ruixuan Li; Yufeng Zhan; Zhigang Zeng; |
157 | GeoLayoutLM: Geometric Pre-Training for Visual Information Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, we reveal another factor that limits the performance of RE lies in the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose in this paper a multi-modal framework, named GeoLayoutLM, for VIE. |
Chuwei Luo; Changxu Cheng; Qi Zheng; Cong Yao; |
158 | Class-Incremental Exemplar Compression for Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an adaptive mask generation model called class-incremental masking (CIM) to explicitly resolve two difficulties of using CAM: 1) transforming the heatmaps of CAM to 0-1 masks with an arbitrary threshold leads to a trade-off between the coverage on discriminative pixels and the quantity of exemplars, as the total memory is fixed; and 2) optimal thresholds vary for different object classes, which is particularly obvious in the dynamic environment of CIL. |
Zilin Luo; Yaoyao Liu; Bernt Schiele; Qianru Sun; |
159 | Boost Vision Transformer With GPU-Friendly Sparsity and Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper thoroughly designs a compression scheme to maximally utilize the GPU-friendly 2:4 fine-grained structured sparsity and quantization. |
Chong Yu; Tao Chen; Zhongxue Gan; Jiayuan Fan; |
160 | Spectral Bayesian Uncertainty for Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to quantify spectral Bayesian uncertainty in image SR. |
Tao Liu; Jun Cheng; Shan Tan; |
161 | Behind The Scenes: Density Fields for Single View Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As an alternative, we propose to predict an implicit density field from a single image. |
Felix Wimbauer; Nan Yang; Christian Rupprecht; Daniel Cremers; |
162 | StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there remains a challenge in controlling the hallucinations to accurately transfer hairstyle and preserve the face shape and identity of the input. To overcome this, we propose a multi-view optimization framework that uses "two different views" of reference composites to semantically guide occluded or ambiguous regions. |
Sasikarn Khwanmuang; Pakkapon Phongthawee; Patsorn Sangkloy; Supasorn Suwajanakorn; |
163 | Resource-Efficient RGBD Aerial Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, in this paper, we explore RGBD aerial tracking in an overhead space, which can greatly enlarge the development of drone-based visual perception. |
Jinyu Yang; Shang Gao; Zhe Li; Feng Zheng; Aleš Leonardis; |
164 | Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel multi-frame human pose estimation framework, which employs temporal differences across frames to model dynamic contexts and engages mutual information objectively to facilitate useful motion information disentanglement. |
Runyang Feng; Yixing Gao; Xueqing Ma; Tze Ho Elden Tse; Hyung Jin Chang; |
165 | Bilateral Memory Consolidation for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the way our brains constantly rewrite and consolidate past recollections, we propose a novel Bilateral Memory Consolidation (BiMeCo) framework that focuses on enhancing memory interaction capabilities. |
Xing Nie; Shixiong Xu; Xiyan Liu; Gaofeng Meng; Chunlei Huo; Shiming Xiang; |
166 | SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, for the first time, we study the potential of leveraging synthetic visual data for VSR. |
Xubo Liu; Egor Lakomkin; Konstantinos Vougioukas; Pingchuan Ma; Honglie Chen; Ruiming Xie; Morrie Doulaty; Niko Moritz; Jachym Kolar; Stavros Petridis; Maja Pantic; Christian Fuegen; |
167 | BiasBed – Rigorous Texture Bias Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate difficulties and limitations when training networks with reduced texture bias. |
Nikolai Kalischek; Rodrigo Caye Daudt; Torben Peters; Reinhard Furrer; Jan D. Wegner; Konrad Schindler; |
168 | Open-Category Human-Object Interaction Pre-Training Via Language Modeling Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods trained from closed-set data predict HOIs as fixed-dimension logits, which restricts their scalability to open-set categories. To address this issue, we introduce OpenCat, a language modeling framework that reformulates HOI prediction as sequence generation. |
Sipeng Zheng; Boshen Xu; Qin Jin; |
169 | SFD2: Semantic-Guided Feature Detection and Description Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead, we propose to extract globally reliable features by implicitly embedding high-level semantics into both the detection and description processes. |
Fei Xue; Ignas Budvytis; Roberto Cipolla; |
170 | Search-Map-Search: A Frame Selection Paradigm for Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the limitations of existing methods, we propose a Search-Map-Search learning paradigm which combines the advantages of heuristic search and supervised learning to select the best combination of frames from a video as one entity. |
Mingjun Zhao; Yakun Yu; Xiaoli Wang; Lei Yang; Di Niu; |
171 | Uncovering The Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This limitation inevitably hinders the accuracy of trajectory prediction. To address this issue, our paper presents a unified framework, the Graph-based Conditional Variational Recurrent Neural Network (GC-VRNN), which can perform trajectory imputation and prediction simultaneously. |
Yi Xu; Armin Bazarjani; Hyung-gun Chi; Chiho Choi; Yun Fu; |
172 | CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we leverage CLIP for zero-shot sketch based image retrieval (ZS-SBIR). |
Aneeshan Sain; Ayan Kumar Bhunia; Pinaki Nath Chowdhury; Subhadeep Koley; Tao Xiang; Yi-Zhe Song; |
173 | FlexiViT: One Model for All Patch Sizes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. |
Lucas Beyer; Pavel Izmailov; Alexander Kolesnikov; Mathilde Caron; Simon Kornblith; Xiaohua Zhai; Matthias Minderer; Michael Tschannen; Ibrahim Alabdulmohsin; Filip Pavetic; |
174 | RIAV-MVS: Recurrent-Indexing An Asymmetric Volume for Multi-View Stereo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a learning-based method for multi-view depth estimation from posed images. |
Changjiang Cai; Pan Ji; Qingan Yan; Yi Xu; |
175 | Structured Kernel Estimation for Photon-Limited Deconvolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new blur estimation technique customized for photon-limited conditions. |
Yash Sanghvi; Zhiyuan Mao; Stanley H. Chan; |
176 | Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we tackle supervised anomaly detection, i.e., we learn AD models using a few available anomalies with the objective to detect both the seen and unseen anomalies. |
Xincheng Yao; Ruoqi Li; Jing Zhang; Jun Sun; Chongyang Zhang; |
177 | 3D Video Loops From Asynchronous Input Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take a step forward and propose a practical solution that enables an immersive experience on dynamic 3D looping scenes. |
Li Ma; Xiaoyu Li; Jing Liao; Pedro V. Sander; |
178 | Style Projected Clustering for Domain Generalized Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to existing methods, we instead utilize the difference between images to build a better representation space, where the distinct style features are extracted and stored as the bases of representation. Then, the generalization to unseen image styles is achieved by projecting features to this known space. |
Wei Huang; Chang Chen; Yong Li; Jiacheng Li; Cheng Li; Fenglong Song; Youliang Yan; Zhiwei Xiong; |
179 | DIP: Dual Incongruity Perceiving Network for Sarcasm Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Different from other multi-modal tasks, for the sarcastic data, there exists intrinsic incongruity between a pair of image and text as demonstrated in psychological theories. To tackle this issue, we propose a Dual Incongruity Perceiving (DIP) network consisting of two branches to mine the sarcastic information from factual and affective levels. |
Changsong Wen; Guoli Jia; Jufeng Yang; |
180 | Frame Interpolation Transformer and Uncertainty Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we are bridging the gap towards video production with a novel transformer-based interpolation network architecture capable of estimating the expected error together with the interpolated frame. |
Markus Plack; Karlis Martins Briedis; Abdelaziz Djelouah; Matthias B. Hullin; Markus Gross; Christopher Schroers; |
181 | Learning To Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, two knotty obstacles limit the practicability of current SGG methods in real-world scenarios: 1) training SGG models requires time-consuming ground-truth annotations, and 2) the closed-set object categories make the SGG models limited in their ability to recognize novel objects outside of training corpora. To address these issues, we novelly exploit a powerful pre-trained visual-semantic space (VSS) to trigger language-supervised and open-vocabulary SGG in a simple yet effective manner. |
Yong Zhang; Yingwei Pan; Ting Yao; Rui Huang; Tao Mei; Chang-Wen Chen; |
182 | VectorFloorSeg: Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address semantic segmentation of a typical VG, i.e., roughcast floorplans with bare wall structures, whose output can be directly used for further applications like interior furnishing and room space modeling. |
Bingchen Yang; Haiyong Jiang; Hao Pan; Jun Xiao; |
183 | Neural Preset for Color Style Transfer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a Neural Preset technique to address the limitations of existing color style transfer methods, including visual artifacts, vast memory requirement, and slow style switching speed. |
Zhanghan Ke; Yuhao Liu; Lei Zhu; Nanxuan Zhao; Rynson W.H. Lau; |
184 | DeCo: Decomposition and Reconstruction for Compositional Temporal Grounding Via Coarse-To-Fine Contrastive Ranking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method to learn a coarse-to-fine compositional representation by decomposing the original query sentence into different granular levels, and then learning the correct correspondences between the video and recombined queries through a contrastive ranking constraint. |
Lijin Yang; Quan Kong; Hsuan-Kung Yang; Wadim Kehl; Yoichi Sato; Norimasa Kobori; |
185 | Dynamic Aggregated Network for Gait Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new perspective that actual gait features include global motion patterns in multiple key regions, and each global motion pattern is composed of a series of local motion patterns. |
Kang Ma; Ying Fu; Dezhi Zheng; Chunshui Cao; Xuecai Hu; Yongzhen Huang; |
186 | Wavelet Diffusion Models Are Fast and Scalable Image Generators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper aims to reduce the speed gap by proposing a novel wavelet-based diffusion scheme. |
Hao Phung; Quan Dao; Anh Tran; |
187 | PA&DA: Jointly Sampling Path and Data for Consistent NAS Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We further find that large gradient variance occurs during supernet training, which degrades the supernet ranking consistency. To mitigate this issue, we propose to explicitly minimize the gradient variance of the supernet training by jointly optimizing the sampling distributions of PAth and DAta (PA&DA). |
Shun Lu; Yu Hu; Longxing Yang; Zihao Sun; Jilin Mei; Jianchao Tan; Chengru Song; |
188 | Sphere-Guided Training of Neural Implicit Surfaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These methods, however, apply the ray marching procedure for the entire scene volume, leading to reduced sampling efficiency and, as a result, lower reconstruction quality in the areas of high-frequency details. In this work, we address this problem via joint training of the implicit function and our new coarse sphere-based surface reconstruction. |
Andreea Dogaru; Andrei-Timotei Ardelean; Savva Ignatyev; Egor Zakharov; Evgeny Burnaev; |
189 | 3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in Point Cloud Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we find that the inherently hierarchical structures of physical space in 3D scenes aid in the automatic association of semantic and spatial arrangements, specifying clear patterns and leading to less ambiguous predictions. |
Mingtao Feng; Haoran Hou; Liang Zhang; Zijie Wu; Yulan Guo; Ajmal Mian; |
190 | Extracting Motion and Appearance Via Inter-Frame Attention for Efficient Video Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new module to explicitly extract motion and appearance information via a unified operation. |
Guozhen Zhang; Yuhan Zhu; Haonan Wang; Youxin Chen; Gangshan Wu; Limin Wang; |
191 | Bias Mimicking: A Simple Sampling Approach for Bias Mitigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, Undersampling drops a significant part of the input distribution per epoch while Oversampling repeats samples, causing overfitting. To address these shortcomings, we introduce a new class-conditioned sampling method: Bias Mimicking. |
Maan Qraitem; Kate Saenko; Bryan A. Plummer; |
192 | ViTs for SITS: Vision Transformers for Satellite Image Time Series Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we introduce the Temporo-Spatial Vision Transformer (TSViT), a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT). |
Michail Tarasiou; Erik Chavez; Stefanos Zafeiriou; |
193 | NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers. Highlight: Instead of tuning the quantizer to better fit the complicated activation distribution, this paper proposes NoisyQuant, a quantizer-agnostic enhancement for the post-training activation quantization performance of vision transformers. |
Yijiang Liu; Huanrui Yang; Zhen Dong; Kurt Keutzer; Li Du; Shanghang Zhang; |
194 | Semi-Supervised Stereo-Based 3D Object Detection Via Cross-View Consensus. Highlight: In this work, we propose to achieve semi-supervised learning for stereo-based 3D object detection through pseudo annotation generation from a temporal-aggregated teacher model, which temporally accumulates knowledge from a student model. |
Wenhao Wu; Hau San Wong; Si Wu; |
195 | Minimizing Maximum Model Discrepancy for Transferable Black-Box Targeted Attacks. Highlight: In this work, we study the black-box targeted attack problem from the model discrepancy perspective. |
Anqi Zhao; Tong Chu; Yahao Liu; Wen Li; Jingjing Li; Lixin Duan; |
196 | Efficient Loss Function By Minimizing The Detrimental Effect of Floating-Point Errors on Gradient-Based Attacks. Highlight: Correspondingly, we propose an efficient loss function by minimizing the detrimental impact of the floating-point errors on the attacks. |
Yunrui Yu; Cheng-Zhong Xu; |
197 | BAD-NeRF: Bundle Adjusted Deblur Neural Radiance Fields. Highlight: In this paper, we present a novel bundle adjusted deblur Neural Radiance Fields (BAD-NeRF), which can be robust to severely motion-blurred images and inaccurate camera poses. |
Peng Wang; Lingzhe Zhao; Ruijie Ma; Peidong Liu; |
198 | Video Compression With Entropy-Constrained Neural Representations. Highlight: We propose a novel convolutional architecture for video representation that better represents spatio-temporal information and a training strategy capable of jointly optimizing rate and distortion. |
Carlos Gomes; Roberto Azevedo; Christopher Schroers; |
199 | Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners. Highlight: In this paper, we propose CaFo, a Cascade of Foundation models that incorporates diverse prior knowledge of various pre-training paradigms for better few-shot learning. |
Renrui Zhang; Xiangfei Hu; Bohao Li; Siyuan Huang; Hanqiu Deng; Yu Qiao; Peng Gao; Hongsheng Li; |
200 | Deep Random Projector: Accelerated Deep Image Prior. Highlight: In this paper, we focus on IR, and propose two crucial modifications to DIP that help achieve substantial speedup: 1) optimizing the DIP seed while freezing randomly-initialized network weights, and 2) reducing the network depth. |
Taihui Li; Hengkang Wang; Zhong Zhuang; Ju Sun; |
201 | SCPNet: Semantic Scene Completion on Point Cloud. Highlight: To address the above-mentioned problems, we propose the following three solutions: 1) Redesigning the completion network. |
Zhaoyang Xia; Youquan Liu; Xin Li; Xinge Zhu; Yuexin Ma; Yikang Li; Yuenan Hou; Yu Qiao; |
202 | Revisiting Prototypical Network for Cross Domain Few-Shot Learning. Highlight: However, its performance drops dramatically when generalizing to the FSC tasks in new domains. In this study, we revisit this problem and argue that the devil lies in the simplicity bias pitfall in neural networks. |
Fei Zhou; Peng Wang; Lei Zhang; Wei Wei; Yanning Zhang; |
203 | QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation. Highlight: In addition, there is an inherent asynchronous relationship between human speech and gestures. To tackle these challenges, we introduce a novel quantization-based and phase-guided motion matching framework. |
Sicheng Yang; Zhiyong Wu; Minglei Li; Zhensong Zhang; Lei Hao; Weihong Bao; Haolin Zhuang; |
204 | Multiscale Tensor Decomposition and Rendering Equation Encoding for View Synthesis. Highlight: This paper aims to further advance the quality of view rendering by proposing a novel approach dubbed the neural radiance feature field (NRFF). |
Kang Han; Wei Xiang; |
205 | NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations. Highlight: Hence, essential desiderata for models are to be data-efficient, generalize to different data distributions and tasks with unseen semantic forms, as well as ground complex language semantics (e.g., view-point anchoring and multi-object reference). To address these challenges, we propose NS3D, a neuro-symbolic framework for 3D grounding. |
Joy Hsu; Jiayuan Mao; Jiajun Wu; |
206 | Learning Accurate 3D Shape Based on Stereo Polarimetric Imaging. Highlight: The accuracy of existing SfP methods is affected by two main problems. First, the ambiguity of polarization cues partially results in false normal estimation. Second, the widely-used assumption about orthographic projection is too ideal. To solve these problems, we propose the first approach that combines deep learning and stereo polarization information to recover not only normal but also disparity. |
Tianyu Huang; Haoang Li; Kejing He; Congying Sui; Bin Li; Yun-Hui Liu; |
207 | VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking. Highlight: Specifically, we present a dual masking strategy for efficient pre-training, with an encoder operating on a subset of video tokens and a decoder processing another subset of video tokens. |
Limin Wang; Bingkun Huang; Zhiyu Zhao; Zhan Tong; Yinan He; Yi Wang; Yali Wang; Yu Qiao; |
208 | GANmouflage: 3D Object Nondetection With Texture Fields. Highlight: We propose a method that learns to camouflage 3D objects within scenes. |
Rui Guo; Jasmine Collins; Oscar de Lima; Andrew Owens; |
209 | Perception and Semantic Aware Regularization for Sequential Confidence Calibration. Highlight: In this work, we find tokens/sequences with high perception and semantic correlations with the target ones contain more correlated and effective information and thus facilitate more effective regularization. |
Zhenghua Peng; Yu Luo; Tianshui Chen; Keke Xu; Shuangping Huang; |
210 | Revisiting Residual Networks for Adversarial Robustness. Highlight: In contrast, little attention was devoted to analyzing the role of architectural elements (e.g., topology, depth, and width) on adversarial robustness. This paper seeks to bridge this gap and present a holistic study on the impact of architectural design on adversarial robustness. |
Shihua Huang; Zhichao Lu; Kalyanmoy Deb; Vishnu Naresh Boddeti; |
211 | Vision Transformer With Super Token Sampling. Highlight: A challenge then arises: can we access efficient and effective global context modeling at the early stages of a neural network? To address this issue, we draw inspiration from the design of superpixels, which reduces the number of image primitives in subsequent processing, and introduce super tokens into vision transformer. |
Huaibo Huang; Xiaoqiang Zhou; Jie Cao; Ran He; Tieniu Tan; |
212 | RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training. Highlight: In this paper, we propose a novel and efficient framework: Retrieval Augmented Contrastive Language-Image Pre-training (RA-CLIP) to augment embeddings by online retrieval. |
Chen-Wei Xie; Siyang Sun; Xiong Xiong; Yun Zheng; Deli Zhao; Jingren Zhou; |
213 | PosterLayout: A New Benchmark and Approach for Content-Aware Visual-Textual Presentation Layout. Highlight: Since content-aware visual-textual presentation layout is a novel task, we first construct a new dataset named PKU PosterLayout, which consists of 9,974 poster-layout pairs and 905 images, i.e., non-empty canvases. It is more challenging and useful, with greater layout variety, domain diversity, and content diversity. Then, we propose design sequence formation (DSF) that reorganizes elements in layouts to imitate the design processes of human designers, and a novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts. |
Hsiao Yuan Hsu; Xiangteng He; Yuxin Peng; Hao Kong; Qing Zhang; |
214 | A Practical Upper Bound for The Worst-Case Attribution Deviations. Highlight: In this work, for the first time, a constrained optimization problem is formulated to derive an upper bound that measures the largest dissimilarity of attributions after the samples are perturbed by any noises within a certain region while the classification results remain the same. |
Fan Wang; Adams Wai-Kin Kong; |
215 | A General Regret Bound of Preconditioned Gradient Method for DNN Training. Highlight: In this paper, we present a general regret bound with a constrained full-matrix preconditioned gradient and show that the updating formula of the preconditioner can be derived by solving a cone-constrained optimization problem. |
Hongwei Yong; Ying Sun; Lei Zhang; |
216 | Teacher-Generated Spatial-Attention Labels Boost Robustness and Accuracy of Contrastive Models. Highlight: Yet, collecting such large-scale data is very expensive. To address this challenge, we construct an auxiliary teacher model to predict human attention, trained on a relatively small labeled dataset. |
Yushi Yao; Chang Ye; Junfeng He; Gamaleldin F. Elsayed; |
217 | Exploring and Exploiting Uncertainty for Incomplete Multi-View Classification. Highlight: To explore and exploit the uncertainty, we propose an Uncertainty-induced Incomplete Multi-View Data Classification (UIMC) model to classify the incomplete multi-view data under a stable and reliable framework. |
Mengyao Xie; Zongbo Han; Changqing Zhang; Yichen Bai; Qinghua Hu; |
218 | Vid2Seq: Large-Scale Pretraining of A Visual Language Model for Dense Video Captioning. Highlight: In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning model pretrained on narrated videos which are readily available at scale. |
Antoine Yang; Arsha Nagrani; Paul Hongsuck Seo; Antoine Miech; Jordi Pont-Tuset; Ivan Laptev; Josef Sivic; Cordelia Schmid; |
219 | Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection. Highlight: Though a few methods have been explored, most of them still suffer from longer training time and more complex deployment, and thus cannot be deployed in actual industrial applications. In this paper, we intend to bridge this gap and propose an Optimal Proposal Learning (OPL) framework for deployable end-to-end pedestrian detection. |
Xiaolin Song; Binghui Chen; Pengyu Li; Jun-Yan He; Biao Wang; Yifeng Geng; Xuansong Xie; Honggang Zhang; |
220 | Discovering The Real Association: Multimodal Causal Reasoning in Video Question Answering. Highlight: In our work, we investigate relational structure from a causal representation perspective on multimodal data and propose a novel inference framework. |
Chuanqi Zang; Hanqing Wang; Mingtao Pei; Wei Liang; |
221 | Temporal Interpolation Is All You Need for Dynamic Neural Radiance Fields. Highlight: In this paper, we propose a novel method to train spatiotemporal neural radiance fields of dynamic scenes based on temporal interpolation of feature vectors. |
Sungheon Park; Minjung Son; Seokhwan Jang; Young Chun Ahn; Ji-Yeon Kim; Nahyup Kang; |
222 | Graph Transformer GANs for Graph-Constrained House Generation. Highlight: We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task. |
Hao Tang; Zhenyu Zhang; Humphrey Shi; Bo Li; Ling Shao; Nicu Sebe; Radu Timofte; Luc Van Gool; |
223 | On The Benefits of 3D Pose and Tracking for Human Action Recognition. Highlight: In this work we study the benefits of using tracking and 3D poses for action recognition. |
Jathushan Rajasegaran; Georgios Pavlakos; Angjoo Kanazawa; Christoph Feichtenhofer; Jitendra Malik; |
224 | How to Backdoor Diffusion Models? Highlight: Specifically, we propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation. |
Sheng-Yen Chou; Pin-Yu Chen; Tsung-Yi Ho; |
225 | ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model With Knowledge-Enhanced Mixture-of-Denoising-Experts. Highlight: In this paper, we propose ERNIE-ViLG 2.0, a large-scale Chinese text-to-image diffusion model, to progressively upgrade the quality of generated images by: (1) incorporating fine-grained textual and visual knowledge of key elements in the scene, and (2) utilizing different denoising experts at different denoising stages. |
Zhida Feng; Zhenyu Zhang; Xintong Yu; Yewei Fang; Lanxin Li; Xuyi Chen; Yuxiang Lu; Jiaxiang Liu; Weichong Yin; Shikun Feng; Yu Sun; Li Chen; Hao Tian; Hua Wu; Haifeng Wang; |
226 | PACO: Parts and Attributes of Common Objects. Highlight: Hence, we introduce PACO: Parts and Attributes of Common Objects. |
Vignesh Ramanathan; Anmol Kalia; Vladan Petrovic; Yi Wen; Baixue Zheng; Baishan Guo; Rui Wang; Aaron Marquez; Rama Kovvuri; Abhishek Kadian; Amir Mousavi; Yiwen Song; Abhimanyu Dubey; Dhruv Mahajan; |
227 | Learning Transformations To Reduce The Geometric Shift in Object Detection. Highlight: Here, by contrast, we tackle geometric shifts emerging from variations in the image capture process, or due to the constraints of the environment causing differences in the apparent geometry of the content itself. We introduce a self-training approach that learns a set of geometric transformations to minimize these shifts without leveraging any labeled data in the new domain, nor any information about the cameras. |
Vidit Vidit; Martin Engilberge; Mathieu Salzmann; |
228 | OReX: Object Reconstruction From Planar Cross-Sections Using Neural Fields. Highlight: In this paper, we present OReX, a method for 3D shape reconstruction from slices alone, featuring a Neural Field as the interpolation prior. |
Haim Sawdayee; Amir Vaxman; Amit H. Bermano; |
229 | SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting With Neural Radiance Fields. Highlight: In 3D, solutions must be both consistent across multiple views and geometrically valid. In this paper, we propose a novel 3D inpainting method that addresses these challenges. |
Ashkan Mirzaei; Tristan Aumentado-Armstrong; Konstantinos G. Derpanis; Jonathan Kelly; Marcus A. Brubaker; Igor Gilitschenski; Alex Levinshtein; |
230 | Revisiting The Stack-Based Inverse Tone Mapping. Highlight: In this paper, we revisit the stack-based ITM approaches and propose a novel method to reconstruct HDR radiance from a single image, which only needs to estimate two exposure images. |
Ning Zhang; Yuyao Ye; Yang Zhao; Ronggang Wang; |
231 | Revisiting Rotation Averaging: Uncertainties and Robust Losses. Highlight: In this paper, we revisit the rotation averaging problem applied in global Structure-from-Motion pipelines. |
Ganlin Zhang; Viktor Larsson; Daniel Barath; |
232 | Continuous Sign Language Recognition With Correlation Network. Highlight: However, current methods in continuous sign language recognition (CSLR) usually process frames independently to capture frame-wise features, thus failing to capture cross-frame trajectories to effectively identify a sign. To handle this limitation, we propose correlation network (CorrNet) to explicitly leverage body trajectories across frames to identify signs. |
Lianyu Hu; Liqing Gao; Zekang Liu; Wei Feng; |
233 | A Simple Framework for Text-Supervised Semantic Segmentation. Highlight: This paper shows that a vanilla contrastive language-image pre-training (CLIP) model is an effective text-supervised semantic segmentor by itself. |
Muyang Yi; Quan Cui; Hao Wu; Cheng Yang; Osamu Yoshie; Hongtao Lu; |
234 | Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection. Highlight: As the pseudo labels play a crucial role, we propose an enhancement framework by exploiting completeness and uncertainty properties for effective self-training. |
Chen Zhang; Guorong Li; Yuankai Qi; Shuhui Wang; Laiyun Qing; Qingming Huang; Ming-Hsuan Yang; |
235 | PlenVDB: Memory Efficient VDB-Based Radiance Fields for Fast Training and Rendering. Highlight: In this paper, we present a new representation for neural radiance fields that accelerates both the training and the inference processes with VDB, a hierarchical data structure for sparse volumes. |
Han Yan; Celong Liu; Chao Ma; Xing Mei; |
236 | Patch-Based 3D Natural Scene Generation From A Single Example. Highlight: We target a 3D generative model for general natural scenes that are typically unique and intricate. |
Weiyu Li; Xuelin Chen; Jue Wang; Baoquan Chen; |
237 | Full or Weak Annotations? An Adaptive Strategy for Budget-Constrained Annotation Campaigns. Highlight: To this end, we propose a novel approach to determine annotation strategies for segmentation datasets, by estimating what proportion of segmentation and classification annotations should be collected given a fixed budget. |
Javier Gamazo Tejero; Martin S. Zinkernagel; Sebastian Wolf; Raphael Sznitman; Pablo Márquez-Neila; |
238 | Leveraging Hidden Positives for Unsupervised Semantic Segmentation. Highlight: Although the recent work employing the vision transformer (ViT) backbone shows exceptional performance, there is still a lack of consideration for task-specific training guidance and local semantic consistency. To tackle these issues, we leverage contrastive learning by excavating hidden positives to learn rich semantic relationships and ensure semantic consistency in local regions. |
Hyun Seok Seong; WonJun Moon; SuBeen Lee; Jae-Pil Heo; |
239 | Backdoor Defense Via Deconfounded Representation Learning. Highlight: Inspired by the causal understanding, we propose the Causality-inspired Backdoor Defense (CBD), to learn deconfounded representations by employing the front-door adjustment. |
Zaixi Zhang; Qi Liu; Zhicai Wang; Zepu Lu; Qingyong Hu; |
240 | LG-BPN: Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising. Highlight: In this paper, we present a novel method called LG-BPN for self-supervised real-world denoising, which takes the spatial correlation statistic into our network design for local detail restoration, and also brings the long-range dependencies modeling ability to previously CNN-based BSN methods. |
Zichun Wang; Ying Fu; Ji Liu; Yulun Zhang; |
241 | Efficient View Synthesis and 3D-Based Multi-Frame Denoising With Multiplane Feature Representations. Highlight: While current multi-frame restoration methods combine information from multiple input images using 2D alignment techniques, recent advances in novel view synthesis are paving the way for a new paradigm relying on volumetric scene representations. In this work, we introduce the first 3D-based multi-frame denoising method that significantly outperforms its 2D-based counterparts with lower computational requirements. |
Thomas Tanay; Aleš Leonardis; Matteo Maggioni; |
242 | An Actor-Centric Causality Graph for Asynchronous Temporal Inference in Group Activity. Highlight: In this paper, we propose an Actor-Centric Causality Graph Model, which learns the asynchronous temporal causality relation with three modules, i.e., an asynchronous temporal causality relation detection module, a causality feature fusion module, and a causality relation graph inference module. |
Zhao Xie; Tian Gao; Kewei Wu; Jiao Chang; |
243 | Color Backdoor: A Robust Poisoning Attack in Color Space. Highlight: This paper presents a novel color backdoor attack, which can exhibit robustness and stealthiness at the same time. |
Wenbo Jiang; Hongwei Li; Guowen Xu; Tianwei Zhang; |
244 | HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling. Highlight: In this work, we tackle the challenging problem of learning-based single-view 3D hair modeling. |
Yujian Zheng; Zirong Jin; Moran Li; Haibin Huang; Chongyang Ma; Shuguang Cui; Xiaoguang Han; |
245 | MoDAR: Using Motion Forecasting for 3D Object Detection in Point Cloud Sequences. Highlight: In this work, we propose MoDAR, using motion forecasting outputs as a type of virtual modality, to augment LiDAR point clouds. |
Yingwei Li; Charles R. Qi; Yin Zhou; Chenxi Liu; Dragomir Anguelov; |
246 | How You Feelin’? Learning Emotions and Mental States in Movie Scenes. Highlight: We propose EmoTx, a multimodal Transformer-based architecture that ingests videos, multiple characters, and dialog utterances to make joint predictions. |
Dhruv Srivastava; Aditya Kumar Singh; Makarand Tapaswi; |
247 | Dynamic Inference With Grounding Based Vision and Language Models. Highlight: On the other hand, there exists a large amount of computational redundancy in these large models, which hinders their run-time efficiency. To address this problem, we propose dynamic inference for grounding based vision and language models conditioned on the input image-text pair. |
Burak Uzkent; Amanmeet Garg; Wentao Zhu; Keval Doshi; Jingru Yi; Xiaolong Wang; Mohamed Omar; |
248 | ALSO: Automotive Lidar Self-Supervision By Occupancy Estimation. Highlight: We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds. |
Alexandre Boulch; Corentin Sautier; Björn Michele; Gilles Puy; Renaud Marlet; |
249 | Connecting Vision and Language With Video Localized Narratives. Highlight: We propose Video Localized Narratives, a new form of multimodal video annotations connecting vision and language. |
Paul Voigtlaender; Soravit Changpinyo; Jordi Pont-Tuset; Radu Soricut; Vittorio Ferrari; |
250 | Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-Identification. Highlight: However, the training samples are usually limited, while the modality gaps are too large, which means that existing methods cannot effectively mine diverse cross-modality clues. To handle this limitation, we propose a novel augmentation network in the embedding space, called diverse embedding expansion network (DEEN). |
Yukang Zhang; Hanzi Wang; |
251 | Model Barrier: A Compact Un-Transferable Isolation Domain for Model Intellectual Property Protection. Highlight: To this end, we propose a novel compact un-transferable isolation domain (CUTI-domain), which acts as a model barrier to block illegal transferring from the authorized domain to the unauthorized domain. |
Lianyu Wang; Meng Wang; Daoqiang Zhang; Huazhu Fu; |
252 | Object Detection With Self-Supervised Scene Adaptation. Highlight: This paper proposes a novel method to improve the performance of a trained object detector on scenes with fixed camera perspectives based on self-supervised adaptation. |
Zekun Zhang; Minh Hoai; |
253 | Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization. Highlight: However, the specific textual knowledge generalizes worse to unseen classes because it forgets the essential general textual knowledge, which has a strong generalization ability. To tackle this issue, we introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt for unseen classes. |
Hantao Yao; Rui Zhang; Changsheng Xu; |
254 | Weakly Supervised Video Representation Learning With Unaligned Text for Sequential Videos. Highlight: Specifically, we use a transformer to aggregate frame-level features for video representation and use a pre-trained text encoder to encode the texts corresponding to each action and the whole video, respectively. To model the correspondence between text and video, we propose a multiple granularity loss, where the video-paragraph contrastive loss enforces matching between the whole video and the complete script, and a fine-grained frame-sentence contrastive loss enforces the matching between each action and its description. |
Sixun Dong; Huazhang Hu; Dongze Lian; Weixin Luo; Yicheng Qian; Shenghua Gao; |
255 | Self-Positioning Point-Based Transformer for Point Cloud Understanding. Highlight: In this paper, we present a Self-Positioning point-based Transformer (SPoTr), which is designed to capture both local and global shape contexts with reduced complexity. |
Jinyoung Park; Sanghyeok Lee; Sihyeon Kim; Yunyang Xiong; Hyunwoo J. Kim; |
256 | Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery. Highlight: Existing works hold an impractical assumption that the novel class distribution prior is uniform, yet neglect the imbalanced nature of real-world data. In this paper, we relax this assumption by proposing a new challenging task: distribution-agnostic NCD, which allows data drawn from arbitrary unknown class distributions and thus renders existing methods useless or even harmful. |
Muli Yang; Liancheng Wang; Cheng Deng; Hanwang Zhang; |
257 | Learning To Generate Image Embeddings With User-Level Differential Privacy. Highlight: To achieve user-level DP for large image-to-embedding feature extractors, we propose DP-FedEmb, a variant of federated learning algorithms with per-user sensitivity control and noise addition, to train from user-partitioned data centralized in the datacenter. |
Zheng Xu; Maxwell Collins; Yuxiao Wang; Liviu Panait; Sewoong Oh; Sean Augenstein; Ting Liu; Florian Schroff; H. Brendan McMahan; |
258 | Open-Vocabulary Panoptic Segmentation With Text-to-Image Diffusion Models. Highlight: We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. |
Jiarui Xu; Sifei Liu; Arash Vahdat; Wonmin Byeon; Xiaolong Wang; Shalini De Mello; |
259 | Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision. Highlight: In this paper, we consider the problem of open-vocabulary semantic segmentation (OVS), which aims to segment objects of arbitrary classes instead of pre-defined, closed-set categories. |
Jilan Xu; Junlin Hou; Yuejie Zhang; Rui Feng; Yi Wang; Yu Qiao; Weidi Xie; |
260 | Learning Dynamic Style Kernels for Artistic Style Transfer. Highlight: To further enhance the flexibility of our style transfer method, we propose a Style Alignment Encoding (SAE) module complemented with a Content-based Gating Modulation (CGM) module for learning the dynamic style kernels in focusing regions. |
Wenju Xu; Chengjiang Long; Yongwei Nie; |
261 | DeepLSD: Line Segment Detection and Refinement With Deep Image Gradients. Highlight: We propose to combine traditional and learned approaches to get the best of both worlds: an accurate and robust line detector that can be trained in the wild without ground truth lines. |
Rémi Pautrat; Daniel Barath; Viktor Larsson; Martin R. Oswald; Marc Pollefeys; |
262 | OcTr: Octree-Based Transformer for 3D Object Detection. Highlight: Despite recent efforts made by Transformers with the long sequence modeling capability, they fail to properly balance accuracy and efficiency, suffering from inadequate receptive fields or coarse-grained holistic correlations. In this paper, we propose an Octree-based Transformer, named OcTr, to address this issue. |
Chao Zhou; Yanan Zhang; Jiaxin Chen; Di Huang; |
263 | Chat2Map: Efficient Scene Mapping From Multi-Ego Conversations. Highlight: To that end, we present an audio-visual deep reinforcement learning approach that works with our shared scene mapper to selectively turn on the camera to efficiently chart out the space. |
Sagnik Majumder; Hao Jiang; Pierre Moulon; Ethan Henderson; Paul Calamia; Kristen Grauman; Vamsi Krishna Ithapu; |
264 | Learning Distortion Invariant Representation for Image Restoration From A Causality Perspective. Highlight: In this paper, we are the first to propose a novel training strategy for image restoration from the causality perspective, to improve the generalization ability of DNNs for unknown degradations. |
Xin Li; Bingchen Li; Xin Jin; Cuiling Lan; Zhibo Chen; |
265 | MOT: Masked Optimal Transport for Partial Domain Adaptation. Highlight: In this work, we focus on the rigorous OT modeling for conditional distribution matching and label shift correction. |
You-Wei Luo; Chuan-Xian Ren; |
266 | Executing Your Commands Via Motion Diffusion in Latent Space. Highlight: We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors. |
Xin Chen; Biao Jiang; Wen Liu; Zilong Huang; Bin Fu; Tao Chen; Gang Yu; |
267 | GeoMAE: Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training. Highlight: This paper tries to address a fundamental question in point cloud self-supervised learning: what is a good signal we should leverage to learn features from point clouds without annotations? To answer that, we introduce a point cloud representation learning framework, based on geometric feature reconstruction. |
Xiaoyu Tian; Haoxi Ran; Yue Wang; Hang Zhao; |
268 | Learning Conditional Attributes for Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a solution, we provide analysis and argue that attributes are conditioned on the recognized object and input image and explore learning conditional attribute embeddings by a proposed attribute learning framework containing an attribute hyper learner and an attribute base learner. |
Qingsheng Wang; Lingqiao Liu; Chenchen Jing; Hao Chen; Guoqiang Liang; Peng Wang; Chunhua Shen; |
269 | Complete 3D Human Reconstruction From A Single Incomplete Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a method to reconstruct a complete human geometry and texture from an image of a person with only partial body observed, e.g., a torso. |
Junying Wang; Jae Shin Yoon; Tuanfeng Y. Wang; Krishna Kumar Singh; Ulrich Neumann; |
270 | PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the former requires time-consuming sampling while the latter introduces quantization errors. In this paper, we present a novel Point-Voxel Transformer for single-stage 3D detection (PVT-SSD) that takes advantage of these two representations. |
Honghui Yang; Wenxiao Wang; Minghao Chen; Binbin Lin; Tong He; Hua Chen; Xiaofei He; Wanli Ouyang; |
271 | Adaptive Human Matting for Dynamic Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the latest trimap-free methods showing promising results, their performance often degrades when dealing with highly diverse and unstructured videos. We address this limitation by introducing Adaptive Matting for Dynamic Videos, termed AdaM, which is a framework designed for simultaneously differentiating foregrounds from backgrounds and capturing alpha matte details of human subjects in the foreground. |
Chung-Ching Lin; Jiang Wang; Kun Luo; Kevin Lin; Linjie Li; Lijuan Wang; Zicheng Liu; |
272 | Learning Common Rationale To Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, both our preliminary investigation and recent studies suggest that they may be less effective in learning representations for fine-grained visual recognition (FGVR) since many features helpful for optimizing SSL objectives are not suitable for characterizing the subtle differences in FGVR. To overcome this issue, we propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes, dubbed as common rationales in this paper. |
Yangyang Shu; Anton van den Hengel; Lingqiao Liu; |
273 | Reconstructing Animatable Categories From Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present RAC, a method to build category-level 3D models from monocular videos, disentangling variations over instances and motion over time. |
Gengshan Yang; Chaoyang Wang; N. Dinesh Reddy; Deva Ramanan; |
274 | UDE: A Unified Driving Engine for Human Motion Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose "UDE", the first unified driving engine that enables generating human motion sequences from natural language or audio sequences (see Fig. 1). |
Zixiang Zhou; Baoyuan Wang; |
275 | High-Fidelity 3D Human Digitization From Single 2K Resolution Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: High-quality 3D human body reconstruction requires high-fidelity and large-scale training data and appropriate network design that effectively exploits the high-resolution input images. To tackle these problems, we propose a simple yet effective 3D human digitization method called 2K2K, which constructs a large-scale 2K human dataset and infers 3D human models from 2K resolution images. |
Sang-Hun Han; Min-Gyu Park; Ju Hong Yoon; Ju-Mi Kang; Young-Jae Park; Hae-Gon Jeon; |
276 | Co-Salient Object Detection With Uncertainty-Aware Group Exchange-Masking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This brings about model robustness defect under the condition of irrelevant images in the testing image group, which hinders the use of CoSOD models in real-world applications. To address this issue, this paper presents a group exchange-masking (GEM) strategy for robust CoSOD model learning. |
Yang Wu; Huihui Song; Bo Liu; Kaihua Zhang; Dong Liu; |
277 | Tangentially Elongated Gaussian Belief Propagation for Event-Based Incremental Optical Flow Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose tangentially elongated Gaussian (TEG) belief propagation (BP) that realizes incremental full-flow estimation. |
Jun Nagata; Yusuke Sekikawa; |
278 | Extracting Class Activation Maps From Non-Discriminative Features As Well Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The crux behind is that the weight of the classifier (used to compute CAM) captures only the discriminative features of objects. We tackle this by introducing a new computation method for CAM that explicitly captures non-discriminative features as well, thereby expanding CAM to cover whole objects. |
Zhaozheng Chen; Qianru Sun; |
279 | BlendFields: Few-Shot Example-Driven Facial Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods are either data-driven, requiring an extensive corpus of data not publicly accessible to the research community, or fail to capture fine details because they rely on geometric face models that cannot represent fine-grained details in texture with a mesh discretization and linear deformation designed to model only a coarse face geometry. We introduce a method that bridges this gap by drawing inspiration from traditional computer graphics techniques. |
Kacper Kania; Stephan J. Garbin; Andrea Tagliasacchi; Virginia Estellers; Kwang Moo Yi; Julien Valentin; Tomasz Trzciński; Marek Kowalski; |
280 | Adaptive Sparse Pairwise Loss for Object Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This dense sampling mechanism inevitably introduces positive pairs that share few visual similarities, which can be harmful to the training. To address this problem, we propose a novel loss paradigm termed Sparse Pairwise (SP) loss that only leverages few appropriate pairs for each class in a mini-batch, and empirically demonstrate that it is sufficient for the ReID tasks. |
Xiao Zhou; Yujie Zhong; Zhen Cheng; Fan Liang; Lin Ma; |
281 | NeFII: Inverse Rendering for Reflectance Decomposition With Near-Field Indirect Illumination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an end-to-end inverse rendering pipeline that decomposes materials and illumination from multi-view images, while considering near-field indirect illumination. |
Haoqian Wu; Zhipeng Hu; Lincheng Li; Yongqiang Zhang; Changjie Fan; Xin Yu; |
282 | Towards Professional Level Crowd Annotation of Expert Domain Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A new approach, based on semi-supervised learning (SSL) and denoted as SSL with human filtering (SSL-HF) is proposed. |
Pei Wang; Nuno Vasconcelos; |
283 | Fully Self-Supervised Depth Estimation From Defocus Clue Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such limitation discourages the applications of DFD methods. To tackle this issue, we propose a completely self-supervised framework that estimates depth purely from a sparse focal stack. |
Haozhe Si; Bin Zhao; Dong Wang; Yunpeng Gao; Mulin Chen; Zhigang Wang; Xuelong Li; |
284 | Semi-Weakly Supervised Object Kinematic Motion Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle the task of object kinematic motion prediction problem in a semi-weakly supervised manner. |
Gengxin Liu; Qian Sun; Haibin Huang; Chongyang Ma; Yulan Guo; Li Yi; Hui Huang; Ruizhen Hu; |
285 | Learning A Simple Low-Light Image Enhancer From Paired Low-Light Instances Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose PairLIE, an unsupervised approach that learns adaptive priors from low-light image pairs. |
Zhenqi Fu; Yan Yang; Xiaotong Tu; Yue Huang; Xinghao Ding; Kai-Kuang Ma; |
286 | Deep Stereo Video Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel deep stereo video inpainting network named SVINet, which is the first attempt for stereo video inpainting task utilizing deep convolutional neural networks. |
Zhiliang Wu; Changchang Sun; Hanyu Xuan; Yan Yan; |
287 | Prompting Large Language Models With Answer Heuristics for Knowledge-Based Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Prophet—a conceptually simple framework designed to prompt GPT-3 with answer heuristics for knowledge-based VQA. |
Zhenwei Shao; Zhou Yu; Meng Wang; Jun Yu; |
288 | IFSeg: Image-Free Semantic Segmentation Via Vision-Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel image-free segmentation task where the goal is to perform semantic segmentation given only a set of the target semantic categories, but without any task-specific images and annotations. |
Sukmin Yun; Seong Hyeon Park; Paul Hongsuck Seo; Jinwoo Shin; |
289 | Improving Robustness of Semantic Segmentation to Motion-Blur Using Class-Centric Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most research has focused on improving segmentation performance for sharp clean images, and the few works that deal with degradations consider motion-blur as one of many generic degradations. In this work, we focus exclusively on motion-blur and attempt to achieve robustness for semantic segmentation in its presence. |
Aakanksha; A. N. Rajagopalan; |
290 | Progressive Open Space Expansion for Open-Set Model Attribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we focus on a challenging task, namely Open-Set Model Attribution (OSMA), to simultaneously attribute images to known models and identify those from unknown ones. |
Tianyun Yang; Danding Wang; Fan Tang; Xinying Zhao; Juan Cao; Sheng Tang; |
291 | Backdoor Cleansing With Unlabeled Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such requirement may be unrealistic as the training data are often unavailable to end users. In this paper, we investigate the possibility of circumventing such barrier. |
Lu Pang; Tao Sun; Haibin Ling; Chao Chen; |
292 | Is BERT Blind? Exploring The Effect of Vision-and-Language Pretraining on Visual Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate whether vision-and-language pretraining can improve performance on text-only tasks that involve implicit visual reasoning, focusing primarily on zero-shot probing methods. |
Morris Alper; Michael Fiman; Hadar Averbuch-Elor; |
293 | PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to this perspective, the model lacks any explicit understanding of action boundaries and tends to focus only on the most discriminative parts of the video resulting in incomplete action localization. To address this, we present PivoTAL, Prior-driven Supervision for Weakly-supervised Temporal Action Localization, to approach WTAL from a localization-by-localization perspective by learning to localize the action snippets directly. |
Mamshad Nayeem Rizve; Gaurav Mittal; Ye Yu; Matthew Hall; Sandra Sajeev; Mubarak Shah; Mei Chen; |
294 | Harmonious Feature Learning for Interactive Hand-Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Harmonious Feature Learning Network (HFL-Net). |
Zhifeng Lin; Changxing Ding; Huan Yao; Zengsheng Kuang; Shaoli Huang; |
295 | 3D GAN Inversion With Facial Symmetry Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method to promote 3D GAN inversion by introducing facial symmetry prior. |
Fei Yin; Yong Zhang; Xuan Wang; Tengfei Wang; Xiaoyu Li; Yuan Gong; Yanbo Fan; Xiaodong Cun; Ying Shan; Cengiz Oztireli; Yujiu Yang; |
296 | CLOTH4D: A Dataset for Clothed Human Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce CLOTH4D, a clothed human dataset containing 1,000 subjects with varied appearances, 1,000 3D outfits, and over 100,000 clothed meshes with paired unclothed humans, to fill the gap in large-scale and high-quality 4D clothing data. |
Xingxing Zou; Xintong Han; Waikeung Wong; |
297 | SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a novel framework built to simplify 3D asset generation for amateur users. |
Yen-Chi Cheng; Hsin-Ying Lee; Sergey Tulyakov; Alexander G. Schwing; Liang-Yan Gui; |
298 | SMAE: Few-Shot Learning for HDR Deghosting With Saturation-Aware Masked Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel semi-supervised approach to realize few-shot HDR imaging via two stages of training, called SSHDR. |
Qingsen Yan; Song Zhang; Weiye Chen; Hao Tang; Yu Zhu; Jinqiu Sun; Luc Van Gool; Yanning Zhang; |
299 | Improving Generalization With Domain Convex Game Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our explorations empirically reveal that the correlation between model generalization and the diversity of domains may not be strictly positive, which limits the effectiveness of domain augmentation. This work therefore aims to guarantee and further enhance the validity of this strand. |
Fangrui Lv; Jian Liang; Shuang Li; Jinming Zhang; Di Liu; |
300 | Learning To Render Novel Views From Wide-Baseline Stereo Pairs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair. |
Yilun Du; Cameron Smith; Ayush Tewari; Vincent Sitzmann; |
301 | TryOnDiffusion: A Tale of Two UNets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body change in a single network. |
Luyang Zhu; Dawei Yang; Tyler Zhu; Fitsum Reda; William Chan; Chitwan Saharia; Mohammad Norouzi; Ira Kemelmacher-Shlizerman; |
302 | Fair Scratch Tickets: Finding Fair Sparse Networks Without Weight Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we lead a novel fairness-aware learning paradigm for in-processing methods through the lens of the lottery ticket hypothesis (LTH) in the context of computer vision fairness. |
Pengwei Tang; Wei Yao; Zhicong Li; Yong Liu; |
303 | Generative Bias for Robust Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, in order to better learn the bias a target VQA model suffers from, we propose a generative method to train the bias model directly from the target model, called GenB. |
Jae Won Cho; Dong-Jin Kim; Hyeonggon Ryu; In So Kweon; |
304 | Data-Free Sketch-Based Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a methodology for DF-SBIR, which can leverage knowledge from models independently trained to perform classification on photos and sketches. |
Abhra Chaudhuri; Ayan Kumar Bhunia; Yi-Zhe Song; Anjan Dutta; |
305 | Multi-Object Manipulation Via Object-Centric Neural Scattering Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose using object-centric neural scattering functions (OSFs) as object representations in a model-predictive control framework. |
Stephen Tian; Yancheng Cai; Hong-Xing Yu; Sergey Zakharov; Katherine Liu; Adrien Gaidon; Yunzhu Li; Jiajun Wu; |
306 | The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a bottleneck-based attention model that captures the evolution of the action, through progressive sampling over fine-to-coarse scales. |
Alexandros Stergiou; Dima Damen; |
307 | Invertible Neural Skinning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing reposing methods suffer from the limited expressiveness of Linear Blend Skinning (LBS), require costly mesh extraction to generate each new pose, and typically do not preserve surface correspondences across different poses. In this work, we introduce Invertible Neural Skinning (INS) to address these shortcomings. |
Yash Kant; Aliaksandr Siarohin; Riza Alp Guler; Menglei Chai; Jian Ren; Sergey Tulyakov; Igor Gilitschenski; |
308 | Weakly Supervised Semantic Segmentation Via Adversarial Learning of Classifier and Reconstructor Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issues, we propose a novel WSSS framework via adversarial learning of a classifier and an image reconstructor. |
Hyeokjun Kweon; Sung-Hoon Yoon; Kuk-Jin Yoon; |
309 | Intrinsic Physical Concepts Discovery With Object-Centric Predictive Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take a step forward and try to discover and represent intrinsic physical concepts such as mass and charge. |
Qu Tang; Xiangyu Zhu; Zhen Lei; Zhaoxiang Zhang; |
310 | Distilling Cross-Temporal Contexts for Continuous Sign Language Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a shallow temporal aggregation module cannot well capture both local and global temporal context information in sign language. To address this dilemma, we propose a cross-temporal context aggregation (CTCA) model. |
Leming Guo; Wanli Xue; Qing Guo; Bo Liu; Kaihua Zhang; Tiantian Yuan; Shengyong Chen; |
311 | Automatic High Resolution Wire Segmentation and Removal Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We thus propose a two-stage method that leverages both global and local context to accurately segment wires in high-resolution images efficiently, and a tile-based inpainting strategy to remove the wires given our predicted segmentation masks. |
Mang Tik Chiu; Xuaner Zhang; Zijun Wei; Yuqian Zhou; Eli Shechtman; Connelly Barnes; Zhe Lin; Florian Kainz; Sohrab Amirghodsi; Humphrey Shi; |
312 | The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that the use of sparsity can decrease the model size overhead by over 327x and the computation time by 3.34x compared to SOTA while maintaining equivalent total leakage rate, 77% even with 1000 clients in aggregation. |
Joshua C. Zhao; Ahmed Roushdy Elkordy; Atul Sharma; Yahya H. Ezzeldin; Salman Avestimehr; Saurabh Bagchi; |
313 | Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Deep point cloud registration methods face challenges to partial overlaps and rely on labeled data. To address these issues, we propose UDPReg, an unsupervised deep probabilistic registration framework for point clouds with partial overlaps. |
Guofeng Mei; Hao Tang; Xiaoshui Huang; Weijie Wang; Juan Liu; Jian Zhang; Luc Van Gool; Qiang Wu; |
314 | Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore multi-modal correlations derived from large-scale image-text data to facilitate generalisable VMR. |
Dezhao Luo; Jiabo Huang; Shaogang Gong; Hailin Jin; Yang Liu; |
315 | Learning Adaptive Dense Event Stereo From The Image Domain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, traditional UDA still needs the input event data with ground-truth in the source domain, which is more challenging and costly to obtain than image data. To tackle this issue, we propose a novel unsupervised domain Adaptive Dense Event Stereo (ADES), which resolves gaps between the different domains and input modalities. |
Hoonhee Cho; Jegyeong Cho; Kuk-Jin Yoon; |
316 | Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel and data-efficient framework for WILSS, named FMWISS. |
Chaohui Yu; Qiang Zhou; Jingliang Li; Jianlong Yuan; Zhibin Wang; Fan Wang; |
317 | Seeing A Rose in Five Thousand Ways Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With knowledge of these intrinsic properties, we may render roses of different sizes and shapes, in different poses, and under different lighting conditions. In this work, we build a generative model that learns to capture such object intrinsics from a single image, such as a photo of a bouquet. |
Yunzhi Zhang; Shangzhe Wu; Noah Snavely; Jiajun Wu; |
318 | Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel technique, Residual Radiance Field or ReRF, as a highly compact neural representation to achieve real-time FVV rendering on long-duration dynamic scenes. |
Liao Wang; Qiang Hu; Qihan He; Ziyu Wang; Jingyi Yu; Tinne Tuytelaars; Lan Xu; Minye Wu; |
319 | ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the pixel-level semantic aggregation in self-supervised ViT pre-trained models as image segmentation and propose the Adaptive Conceptualization approach for USS, termed ACSeg. |
Kehan Li; Zhennan Wang; Zesen Cheng; Runyi Yu; Yian Zhao; Guoli Song; Chang Liu; Li Yuan; Jie Chen; |
320 | NeRFVS: Neural Radiance Fields for Free View Synthesis Via Geometry Scaffolds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present NeRFVS, a novel neural radiance fields (NeRF) based method to enable free navigation in a room. |
Chen Yang; Peihao Li; Zanwei Zhou; Shanxin Yuan; Bingbing Liu; Xiaokang Yang; Weichao Qiu; Wei Shen; |
321 | Reproducible Scaling Laws for Contrastive Language-Image Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, previous work on scaling laws has primarily used private data & models or focused on uni-modal language or vision learning. To address these limitations, we investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository. |
Mehdi Cherti; Romain Beaumont; Ross Wightman; Mitchell Wortsman; Gabriel Ilharco; Cade Gordon; Christoph Schuhmann; Ludwig Schmidt; Jenia Jitsev; |
322 | Similarity Metric Learning for RGB-Infrared Group Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a metric learning method Closest Permutation Matching (CPM) for RGB-IR G-ReID. |
Jianghao Xiong; Jianhuang Lai; |
323 | Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-Time Mobile Telepresence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a framework called Auto-CARD, which for the first time enables real-time and robust driving of Codec Avatars when exclusively using merely on-device computing resources. |
Yonggan Fu; Yuecheng Li; Chenghui Li; Jason Saragih; Peizhao Zhang; Xiaoliang Dai; Yingyan (Celine) Lin; |
324 | Conjugate Product Graphs for Globally Optimal 2D-3D Shape Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While such problems can be solved to global optimality by finding a shortest path in the product graph between both shapes, existing solutions heavily rely on unrealistic prior assumptions to avoid degenerate solutions (e.g. knowledge to which region of the 3D shape each point of the 2D contour is matched). To address this, we propose a novel 2D-3D shape matching formalism based on the conjugate product graph between the 2D contour and the 3D shape. |
Paul Roetzer; Zorah Lähner; Florian Bernard; |
325 | PromptCAL: Contrastive Affinity Learning Via Auxiliary Prompts for Generalized Novel Category Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we target a pragmatic but under-explored Generalized Novel Category Discovery (GNCD) setting. |
Sheng Zhang; Salman Khan; Zhiqiang Shen; Muzammal Naseer; Guangyi Chen; Fahad Shahbaz Khan; |
326 | Train/Test-Time Adaptation With Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Train/Test-Time Adaptation with Retrieval (T3AR), a method to adapt models both at train and test time by means of a retrieval module and a searchable pool of external samples. |
Luca Zancato; Alessandro Achille; Tian Yu Liu; Matthew Trager; Pramuditha Perera; Stefano Soatto; |
327 | ProxyFormer: Proxy Alignment Assisted Point Cloud Completion With Missing Part Sensitive Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Therefore, recovering the complete point clouds from the partial ones plays a vital role in many practical tasks, and one of the keys lies in the prediction of the missing part. In this paper, we propose a novel point cloud completion approach, namely ProxyFormer, that divides point clouds into existing (input) and missing (to be predicted) parts, where each part communicates information through its proxies. |
Shanshan Li; Pan Gao; Xiaoyang Tan; Mingqiang Wei; |
328 | Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a ‘Squad’). |
Zitian Chen; Yikang Shen; Mingyu Ding; Zhenfang Chen; Hengshuang Zhao; Erik G. Learned-Miller; Chuang Gan; |
329 | Learning Customized Visual Models With Retrieval-Augmented Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Alternatively, we propose REACT, REtrieval-Augmented CusTomization, a framework to acquire the relevant web knowledge to build customized visual models for target domains. |
Haotian Liu; Kilho Son; Jianwei Yang; Ce Liu; Jianfeng Gao; Yong Jae Lee; Chunyuan Li; |
330 | Multi-Realism Image Compression With A Conditional Generator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a misleading reconstruction far from the input image is generated. In this work, we alleviate these concerns by training a decoder that can bridge the two regimes and navigate the distortion-realism trade-off. |
Eirikur Agustsson; David Minnen; George Toderici; Fabian Mentzer; |
331 | Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We hence propose a novel partial convolution (PConv) that extracts spatial features more efficiently, by cutting down redundant computation and memory access simultaneously. |
Jierun Chen; Shiu-hong Kao; Hao He; Weipeng Zhuo; Song Wen; Chul-Ho Lee; S.-H. Gary Chan; |
332 | A Unified Spatial-Angular Structured Light for Single-View Acquisition of Shape and Reflectance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a unified structured light, consisting of an LED array and an LCD mask, for high-quality acquisition of both shape and reflectance from a single view. |
Xianmin Xu; Yuxin Lin; Haoyang Zhou; Chong Zeng; Yaxin Yu; Kun Zhou; Hongzhi Wu; |
333 | Best of Both Worlds: Multimodal Contrastive Learning With Tabular and Imaging Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Combined with increasing medical dataset sizes and expensive annotation costs, the necessity for unsupervised methods that can pretrain multimodally and predict unimodally has risen. To address these needs, we propose the first self-supervised contrastive learning framework that takes advantage of images and tabular data to train unimodal encoders. |
Paul Hager; Martin J. Menten; Daniel Rueckert; |
334 | On The Difficulty of Unpaired Infrared-to-Visible Video Translation: Fine-Grained Content-Rich Patches Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we propose a novel CPTrans framework to tackle the challenge via balancing gradients of different patches, achieving the fine-grained Content-rich Patches Transferring. |
Zhenjie Yu; Shuang Li; Yirui Shen; Chi Harold Liu; Shuigen Wang; |
335 | Masked Images Are Counterfactual Samples for Robust Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, based on causal analysis of the aforementioned problems, we propose a novel fine-tuning method, which uses masked images as counterfactual samples that help improve the robustness of the fine-tuning model. |
Yao Xiao; Ziyi Tang; Pengxu Wei; Cong Liu; Liang Lin; |
336 | StepFormer: Self-Supervised Step Discovery and Localization in Instructional Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we tackle the problem with no human supervision and introduce StepFormer, a self-supervised model that discovers and localizes instruction steps in a video. |
Nikita Dvornik; Isma Hadji; Ran Zhang; Konstantinos G. Derpanis; Richard P. Wildes; Allan D. Jepson; |
337 | Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to learn video representation that encodes both action steps and their temporal ordering, based on a large-scale dataset of web instructional videos and their narrations, without using human annotations. |
Licheng Yu; Yang Bai; Shangwen Li; Xueting Yan; Yin Li; Yiwu Zhong; |
338 | Open Vocabulary Semantic Segmentation With Patch Aligned Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP’s contrastive loss, intending to train an alignment between the patch tokens of the vision encoder and the CLS token of the text encoder. |
Jishnu Mukhoti; Tsung-Yu Lin; Omid Poursaeed; Rui Wang; Ashish Shah; Philip H.S. Torr; Ser-Nam Lim; |
339 | CLIP The Gap: A Single Domain Generalization Approach for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenges of simultaneously learning robust object localization and representation, we propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts. |
Vidit Vidit; Martin Engilberge; Mathieu Salzmann; |
340 | Co-Training 2L Submodels for Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. |
Hugo Touvron; Matthieu Cord; Maxime Oquab; Piotr Bojanowski; Jakob Verbeek; Hervé Jégou; |
341 | On The Importance of Accurate Geometry Data for Dense 3D Vision Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper investigates the effect of sensor errors for the dense 3D vision tasks of depth estimation and reconstruction. |
HyunJun Jung; Patrick Ruhkamp; Guangyao Zhai; Nikolas Brasch; Yitong Li; Yannick Verdie; Jifei Song; Yiren Zhou; Anil Armagan; Slobodan Ilic; Aleš Leonardis; Nassir Navab; Benjamin Busam; |
342 | Camouflaged Instance Segmentation Via Explicit De-Camouflaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous instance segmentation methods perform poorly on this task as they are easily disturbed by the deceptive camouflage. To address these challenges, we propose a novel De-camouflaging Network (DCNet) including a pixel-level camouflage decoupling module and an instance-level camouflage suppression module. |
Naisong Luo; Yuwen Pan; Rui Sun; Tianzhu Zhang; Zhiwei Xiong; Feng Wu; |
343 | Understanding Masked Autoencoders Via Hierarchical Latent Variable Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formally characterize and justify existing empirical insights and provide theoretical guarantees of MAE. |
Lingjing Kong; Martin Q. Ma; Guangyi Chen; Eric P. Xing; Yuejie Chi; Louis-Philippe Morency; Kun Zhang; |
344 | K-Planes: Explicit Radiance Fields in Space, Time, and Appearance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions. |
Sara Fridovich-Keil; Giacomo Meanti; Frederik Rahbæk Warburg; Benjamin Recht; Angjoo Kanazawa; |
345 | Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a Multi-mode Online Knowledge Distillation method (MOKD) to boost self-supervised visual representation learning. |
Kaiyou Song; Jin Xie; Shan Zhang; Zimeng Luo; |
346 | Unbalanced Optimal Transport: A Unified Framework for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Each of these strategies comes with its own properties, underlying losses, and heuristics. We show how Unbalanced Optimal Transport unifies these different approaches and opens a whole continuum of methods in between. |
Henri De Plaen; Pierre-François De Plaen; Johan A. K. Suykens; Marc Proesmans; Tinne Tuytelaars; Luc Van Gool; |
347 | Viewpoint Equivariance for Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we gain intuition from the integral role of multi-view consistency in 3D scene understanding and geometric learning. |
Dian Chen; Jie Li; Vitor Guizilini; Rares Andrei Ambrus; Adrien Gaidon; |
348 | Photo Pre-Training, But for Sketch Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This lack of sketch data has imposed on the community a few "peculiar" design choices — the most representative of them all is perhaps the coerced utilisation of photo-based pre-training (i.e., no sketch), for many core tasks that otherwise dictate specific sketch understanding. In this paper, we ask just the one question — can we make such photo-based pre-training actually benefit sketch? |
Ke Li; Kaiyue Pang; Yi-Zhe Song; |
349 | NeuralPCI: Spatio-Temporal Neural Field for 3D Point Cloud Multi-Frame Non-Linear Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meanwhile, the existence of numerous nonlinear large motions in real-world scenarios makes the point cloud interpolation task more challenging. In light of these issues, we present NeuralPCI: an end-to-end 4D spatio-temporal Neural field for 3D Point Cloud Interpolation, which implicitly integrates multi-frame information to handle nonlinear large motions for both indoor and outdoor scenarios. |
Zehan Zheng; Danni Wu; Ruisi Lu; Fan Lu; Guang Chen; Changjun Jiang; |
350 | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition. |
Wenhao Wu; Xiaohan Wang; Haipeng Luo; Jingdong Wang; Yi Yang; Wanli Ouyang; |
351 | Adaptive Plasticity Improvement for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new method, called adaptive plasticity improvement (API), for continual learning. |
Yan-Shuo Liang; Wu-Jun Li; |
352 | Pic2Word: Mapping Pictures to Words for Zero-Shot Composed Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to study an important task, Zero-Shot Composed Image Retrieval (ZS-CIR), whose goal is to build a CIR model without requiring labeled triplets for training. |
Kuniaki Saito; Kihyuk Sohn; Xiang Zhang; Chun-Liang Li; Chen-Yu Lee; Kate Saenko; Tomas Pfister; |
353 | MMANet: Margin-Aware Distillation and Modality-Aware Regularization for Incomplete Multimodal Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a general framework called MMANet to assist incomplete multimodal learning. |
Shicai Wei; Chunbo Luo; Yang Luo; |
354 | Putting People in Their Place: Affordance-Aware Human Insertion Into Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. |
Sumith Kulal; Tim Brooks; Alex Aiken; Jiajun Wu; Jimei Yang; Jingwan Lu; Alexei A. Efros; Krishna Kumar Singh; |
355 | 3D Neural Field Generation Using Triplane Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present an efficient diffusion-based model for 3D-aware generation of neural fields. |
J. Ryan Shue; Eric Ryan Chan; Ryan Po; Zachary Ankner; Jiajun Wu; Gordon Wetzstein; |
356 | Regularized Vector Quantization for Tokenized Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, deterministic quantization suffers from severe codebook collapse and misaligned inference stage while stochastic quantization suffers from low codebook utilization and perturbed reconstruction objective. This paper presents a regularized vector quantization framework that effectively mitigates the above issues by applying regularization from two perspectives. |
Jiahui Zhang; Fangneng Zhan; Christian Theobalt; Shijian Lu; |
357 | Semantic Scene Completion With Cleaner Self Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we use the ground-truth 3D voxels to generate a perfect visible surface, called TSDF-CAD, and then train a cleaner SSC model. |
Fengyun Wang; Dong Zhang; Hanwang Zhang; Jinhui Tang; Qianru Sun; |
358 | Improving Image Recognition By Retrieving From Web-Scale Image-Text Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce an attention-based memory module, which learns the importance of each retrieved example from the memory. |
Ahmet Iscen; Alireza Fathi; Cordelia Schmid; |
359 | Deep Factorized Metric Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Differently, we propose a deep factorized metric learning method (DFML) to factorize the training signal and employ different samples to train various components of the backbone network. |
Chengkun Wang; Wenzhao Zheng; Junlong Li; Jie Zhou; Jiwen Lu; |
360 | High-Fidelity 3D Face Generation From Natural Language Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We argue the major obstacle lies in 1) the lack of high-quality 3D face data with descriptive text annotation, and 2) the complex mapping relationship between descriptive language space and shape/appearance space. |
Menghua Wu; Hao Zhu; Linjia Huang; Yiyu Zhuang; Yuanxun Lu; Xun Cao; |
361 | A Generalized Framework for Video Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To effectively bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. |
Miran Heo; Sukjun Hwang; Jeongseok Hyun; Hanjung Kim; Seoung Wug Oh; Joon-Young Lee; Seon Joo Kim; |
362 | Multi-Level Logit Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Concretely, we propose a simple yet effective approach to logit distillation via multi-level prediction alignment. |
Ying Jin; Jiaqi Wang; Dahua Lin; |
363 | On Distillation of Guided Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: Given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model to a diffusion model that requires far fewer sampling steps. |
Chenlin Meng; Robin Rombach; Ruiqi Gao; Diederik Kingma; Stefano Ermon; Jonathan Ho; Tim Salimans; |
364 | Dual-Path Adaptation From Image to Video Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we efficiently transfer the surpassing representation power of the vision foundation models, such as ViT and Swin, for video understanding with only a few trainable parameters. |
Jungin Park; Jiyoung Lee; Kwanghoon Sohn; |
365 | Towards Better Decision Forests: Forest Alternating Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, unlike for most other models, such as neural networks, optimizing forests or trees is not easy, because they define a non-differentiable function. We show, for the first time, that it is possible to learn a forest by optimizing a desirable loss and regularization jointly over all its trees and parameters. |
Miguel Á. Carreira-Perpiñán; Magzhan Gabidolla; Arman Zharmagambetov; |
366 | DA Wand: Distortion-Aware Selection Using Neural Mesh Parameterization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a neural technique for learning to select a local sub-region around a point which can be used for mesh parameterization. |
Richard Liu; Noam Aigerman; Vladimir G. Kim; Rana Hanocka; |
367 | Disentangled Representation Learning for Unsupervised Neural Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we first point out that an existing deep learning-based quantizer hardly benefits from the residual vector space, unlike conventional shallow quantizers. To cope with this problem, we introduce a novel disentangled representation learning for unsupervised neural quantization. |
Haechan Noh; Sangeek Hyun; Woojin Jeong; Hanshin Lim; Jae-Pil Heo; |
368 | Hierarchical Semantic Correspondence Networks for Video Paragraph Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel Hierarchical Semantic Correspondence Network (HSCNet), which explores multi-level visual-textual correspondence by learning hierarchical semantic alignment and utilizes dense supervision by grounding diverse levels of queries. |
Chaolei Tan; Zihang Lin; Jian-Fang Hu; Wei-Shi Zheng; Jianhuang Lai; |
369 | Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate existing methods and present a general framework of spatiotemporal predictive learning, in which the spatial encoder and decoder capture intra-frame features and the middle temporal module catches inter-frame correlations. |
Cheng Tan; Zhangyang Gao; Lirong Wu; Yongjie Xu; Jun Xia; Siyuan Li; Stan Z. Li; |
370 | Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a zero-shot approach that requires only the widely available deformed non-stylized avatars in training, and deforms stylized characters of significantly different shapes at inference. |
Jiashun Wang; Xueting Li; Sifei Liu; Shalini De Mello; Orazio Gallo; Xiaolong Wang; Jan Kautz; |
371 | Listening Human Behavior: 3D Human Pose Estimation With Acoustic Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accordingly, we introduce a framework that encodes multichannel audio features into 3D human poses. |
Yuto Shibata; Yutaka Kawashima; Mariko Isogawa; Go Irie; Akisato Kimura; Yoshimitsu Aoki; |
372 | Meta-Learning With A Geometry-Adaptive Preconditioner Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, they do not satisfy the Riemannian metric condition, which can enable steepest descent learning with a preconditioned gradient. In this study, we propose Geometry-Adaptive Preconditioned gradient descent (GAP) that can overcome the limitations in MAML; GAP can efficiently meta-learn a preconditioner that is dependent on task-specific parameters, and its preconditioner can be shown to be a Riemannian metric. |
Suhyun Kang; Duhun Hwang; Moonjung Eo; Taesup Kim; Wonjong Rhee; |
373 | Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the limitation, we propose a knowledge graph with Dynamic structure and nodes to facilitate chest X-ray report generation with Contrastive Learning, named DCL. |
Mingjie Li; Bingqian Lin; Zicong Chen; Haokun Lin; Xiaodan Liang; Xiaojun Chang; |
374 | BiCro: Noisy Correspondence Rectification for Multi-Modality Data Via Bi-Directional Cross-Modal Similarity Consistency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, the cheaply collected dataset unavoidably contains many mismatched data pairs, which have been proven to be harmful to the model’s performance. To address this, we propose a general framework called BiCro (Bidirectional Cross-modal similarity consistency), which can be easily integrated into existing cross-modal matching models and improve their robustness against noisy data. |
Shuo Yang; Zhaopan Xu; Kai Wang; Yang You; Hongxun Yao; Tongliang Liu; Min Xu; |
375 | Transfer Knowledge From Head to Tail: Uncertainty Calibration Under Long-Tailed Distribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the problem of calibrating the model trained from a long-tailed distribution. |
Jiahao Chen; Bing Su; |
376 | FrustumFormer: Adaptive Instance-Aware Resampling for Multi-View 3D Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel framework named FrustumFormer, which pays more attention to the features in instance regions via adaptive instance-aware resampling. |
Yuqi Wang; Yuntao Chen; Zhaoxiang Zhang; |
377 | Global Vision Transformer Pruning With Hessian-Aware Saliency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work aims to challenge the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage, where we redistribute the parameters both across transformer blocks and between different structures within the block via the first systematic attempt on global structural pruning. |
Huanrui Yang; Hongxu Yin; Maying Shen; Pavlo Molchanov; Hai Li; Jan Kautz; |
378 | Class-Conditional Sharpness-Aware Minimization for Deep Long-Tailed Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead, we propose an effective two-stage sharpness-aware optimization approach based on the decoupling paradigm in DLTR. |
Zhipeng Zhou; Lanqing Li; Peilin Zhao; Pheng-Ann Heng; Wei Gong; |
379 | ScarceNet: Animal Pose Estimation With Scarce Annotations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose the ScarceNet, a pseudo label-based approach to generate artificial labels for the unlabeled images. |
Chen Li; Gim Hee Lee; |
380 | OmniCity: Omnipotent City Understanding With Multi-Level and Multi-View Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents OmniCity, a new dataset for omnipotent city understanding from multi-level and multi-view images. |
Weijia Li; Yawen Lai; Linning Xu; Yuanbo Xiangli; Jinhua Yu; Conghui He; Gui-Song Xia; Dahua Lin; |
381 | Efficient On-Device Training Via Gradient Filtering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, in this paper, we propose a new gradient filtering approach which enables on-device CNN model training. |
Yuedong Yang; Guihong Li; Radu Marculescu; |
382 | SViTT: Temporal Learning of Sparse Video-Text Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we identify several key challenges in temporal learning of video-text transformers: the spatiotemporal trade-off from limited network size; the curse of dimensionality for multi-frame modeling; and the diminishing returns of semantic information by extending clip length. |
Yi Li; Kyle Min; Subarna Tripathi; Nuno Vasconcelos; |
383 | NeuralDome: A Neural Modeling Pipeline on Multi-View Human-Object Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To process the HODome dataset, we develop NeuralDome, a layer-wise neural processing pipeline tailored for multi-view video inputs to conduct accurate tracking, geometry reconstruction and free-view rendering, for both human subjects and objects. |
Juze Zhang; Haimin Luo; Hongdi Yang; Xinru Xu; Qianyang Wu; Ye Shi; Jingyi Yu; Lan Xu; Jingya Wang; |
384 | 3D Human Mesh Estimation From Virtual Markers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present an intermediate representation, named virtual markers, which learns 64 landmark keypoints on the body surface based on the large-scale mocap data in a generative style, mimicking the effects of physical markers. |
Xiaoxuan Ma; Jiajun Su; Chunyu Wang; Wentao Zhu; Yizhou Wang; |
385 | CUDA: Convolution-Based Unlearnable Datasets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel, model-free, Convolution-based Unlearnable DAtaset (CUDA) generation technique. |
Vinu Sankar Sadasivan; Mahdi Soltanolkotabi; Soheil Feizi; |
386 | No One Left Behind: Improving The Worst Categories in Long-Tailed Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a simple plug-in method that is applicable to a wide range of methods. |
Yingxiao Du; Jianxin Wu; |
387 | Deep Fair Clustering Via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although a number of works have been conducted and achieved huge success recently, most of them are heuristic, and a unified theory for algorithm design is lacking. In this work, we fill this gap by developing a mutual information theory for deep fair clustering and accordingly designing a novel algorithm, dubbed FCMI. |
Pengxin Zeng; Yunfan Li; Peng Hu; Dezhong Peng; Jiancheng Lv; Xi Peng; |
388 | MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the problem, we propose a multi-information aggregation network (MIANet) that effectively leverages the general knowledge, i.e., semantic word embeddings, and instance information for accurate segmentation. |
Yong Yang; Qiong Chen; Yuan Feng; Tianlin Huang; |
389 | High Fidelity 3D Hand Shape Reconstruction Via Scalable Graph Frequency Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To capture high-frequency personalized details, we transform the 3D mesh into the frequency domain, and propose a novel frequency decomposition loss to supervise each frequency component. |
Tianyu Luan; Yuanhao Zhai; Jingjing Meng; Zhong Li; Zhang Chen; Yi Xu; Junsong Yuan; |
390 | COT: Unsupervised Domain Adaptation With Clustering and Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To cope with two aforementioned issues, we propose a Clustering-based Optimal Transport (COT) algorithm, which formulates the alignment procedure as an Optimal Transport problem and constructs a mapping between clustering centers in the source and target domain via an end-to-end manner. |
Yang Liu; Zhipeng Zhou; Baigui Sun; |
391 | Target-Referenced Reactive Grasping for Dynamic Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to solve reactive grasping in a target-referenced setting by tracking through generated grasp spaces. |
Jirong Liu; Ruo Zhang; Hao-Shu Fang; Minghao Gou; Hongjie Fang; Chenxi Wang; Sheng Xu; Hengxu Yan; Cewu Lu; |
392 | Learning To Exploit The Sequence-Specific Prior Knowledge for Image Processing Pipelines Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a sequential ISP hyperparameter prediction framework that utilizes the sequential relationship within ISP modules and the similarity among parameters to guide the model sequence process. |
Haina Qin; Longfei Han; Weihua Xiong; Juan Wang; Wentao Ma; Bing Li; Weiming Hu; |
393 | Complexity-Guided Slimmable Decoder for Efficient Deep Video Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the complexity-guided slimmable decoder (cgSlimDecoder) in combination with skip-adaptive entropy coding (SaEC) for efficient deep video compression. |
Zhihao Hu; Dong Xu; |
394 | Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, the efficient combination of CNNs and Transformers is investigated, and a hybrid architecture called Lite-Mono is presented. |
Ning Zhang; Francesco Nex; George Vosselman; Norman Kerle; |
395 | MarginMatch: Improving Semi-Supervised Learning with Pseudo-Margins Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MarginMatch, a new SSL approach combining consistency regularization and pseudo-labeling, with its main novelty arising from the use of unlabeled data training dynamics to measure pseudo-label quality. |
Tiberiu Sosea; Cornelia Caragea; |
396 | Neural Scene Chronology Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to reconstruct a time-varying 3D model, capable of producing photo-realistic renderings with independent control of viewpoint, illumination, and time, from Internet photos of large-scale landmarks. |
Haotong Lin; Qianqian Wang; Ruojin Cai; Sida Peng; Hadar Averbuch-Elor; Xiaowei Zhou; Noah Snavely; |
397 | Starting From Non-Parametric Networks for 3D Point Cloud Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a Non-parametric Network for 3D point cloud analysis, Point-NN, which consists of purely non-learnable components: farthest point sampling (FPS), k-nearest neighbors (k-NN), and pooling operations, with trigonometric functions. |
Renrui Zhang; Liuhui Wang; Yali Wang; Peng Gao; Hongsheng Li; Jianbo Shi; |
398 | Light Source Separation and Intrinsic Image Decomposition Under AC Illumination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that the flickers due to AC illumination are useful for intrinsic image decomposition (IID). |
Yusaku Yoshida; Ryo Kawahara; Takahiro Okabe; |
399 | TIPI: Test Time Adaptation With Transformation Invariance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When deploying a machine learning model to a new environment, we often encounter the distribution shift problem — meaning the target data distribution is different from the model’s training distribution. In this paper, we assume that labels are not provided for this new domain, and that we do not store the source data (e.g., for privacy reasons). |
A. Tuan Nguyen; Thanh Nguyen-Tang; Ser-Nam Lim; Philip H.S. Torr; |
400 | OTAvatar: One-Shot Talking Face Avatar With Controllable Tri-Plane Rendering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose One-shot Talking face Avatar (OTAvatar), which constructs face avatars by a generalized controllable tri-plane rendering solution so that each personalized avatar can be constructed from only one portrait as the reference. |
Zhiyuan Ma; Xiangyu Zhu; Guo-Jun Qi; Zhen Lei; Lei Zhang; |
401 | Beyond Appearance: A Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to learn a general human representation from massive unlabeled human images which can benefit downstream human-centric tasks to the maximum extent. |
Weihua Chen; Xianzhe Xu; Jian Jia; Hao Luo; Yaohua Wang; Fan Wang; Rong Jin; Xiuyu Sun; |
402 | Large-Capacity and Flexible Video Steganography Via Invertible Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For large-capacity, we present a reversible pipeline to hide and recover multiple videos through a single invertible neural network (INN). |
Chong Mou; Youmin Xu; Jiechong Song; Chen Zhao; Bernard Ghanem; Jian Zhang; |
403 | CFA: Class-Wise Calibrated Fair Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we are the first to theoretically and empirically investigate the preference of different classes for adversarial configurations, including perturbation margin, regularization, and weight averaging. |
Zeming Wei; Yifei Wang; Yiwen Guo; Yisen Wang; |
404 | EVAL: Explainable Video Anomaly Localization Highlight: We develop a novel framework for single-scene video anomaly localization that allows for human-understandable reasons for the decisions the system makes. |
Ashish Singh; Michael J. Jones; Erik G. Learned-Miller; |
405 | Position-Guided Text Prompt for Vision-Language Pre-Training Highlight: In this work, we propose a novel Position-guided Text Prompt (PTP) paradigm to enhance the visual grounding ability of cross-modal models trained with VLP. |
Jinpeng Wang; Pan Zhou; Mike Zheng Shou; Shuicheng Yan; |
406 | HOLODIFFUSION: Training A 3D Diffusion Model Using 2D Images Highlight: We address the first challenge by introducing a new diffusion setup that can be trained, end-to-end, with only posed 2D images for supervision; and the second challenge by proposing an image formation model that decouples model memory from spatial memory. |
Animesh Karnewar; Andrea Vedaldi; David Novotny; Niloy J. Mitra; |
407 | Stimulus Verification Is A Universal and Effective Sampler in Multi-Modal Human Trajectory Prediction Highlight: In this paper, we propose stimulus verification, serving as a universal and effective sampling process to improve the multi-modal prediction capability, where a stimulus refers to a factor in the observation that may affect future movements, such as social interaction and scene context. |
Jianhua Sun; Yuxuan Li; Liang Chai; Cewu Lu; |
408 | 3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention Highlight: In this paper, we address this issue by decomposing correlation learning into space and time, and present a novel Spatio-Temporal Criss-cross attention (STC) block. |
Zhenhua Tang; Zhaofan Qiu; Yanbin Hao; Richang Hong; Ting Yao; |
409 | Plateau-Reduced Differentiable Path Tracing Highlight: Inverse rendering might not converge due to inherent plateaus, i.e., regions of zero gradient, in the objective function. We propose to alleviate this by convolving the high-dimensional rendering function, which maps scene parameters to images, with an additional kernel that blurs the parameter space. |
Michael Fischer; Tobias Ritschel; |
410 | LoGoNet: Towards Accurate 3D Object Detection With Local-to-Global Cross-Modal Fusion Highlight: In this paper, we present the novel Local-to-Global fusion network (LoGoNet), which performs LiDAR-camera fusion at both local and global levels. |
Xin Li; Tao Ma; Yuenan Hou; Botian Shi; Yuchen Yang; Youquan Liu; Xingjiao Wu; Qin Chen; Yikang Li; Yu Qiao; Liang He; |
411 | ScaleKD: Distilling Scale-Aware Knowledge in Small Object Detector Highlight: Unlike existing works that struggle to balance the trade-off between inference speed and SOD performance, in this paper, we propose a novel Scale-aware Knowledge Distillation (ScaleKD), which transfers knowledge of a complex teacher model to a compact student model. |
Yichen Zhu; Qiqi Zhou; Ning Liu; Zhiyuan Xu; Zhicai Ou; Xiaofeng Mou; Jian Tang; |
412 | An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling Highlight: In this work, we systematically examine the potential of MVM in the context of VidL learning. |
Tsu-Jui Fu; Linjie Li; Zhe Gan; Kevin Lin; William Yang Wang; Lijuan Wang; Zicheng Liu; |
413 | Glocal Energy-Based Learning for Few-Shot Open-Set Recognition Highlight: In this work, we approach the FSOR task by proposing a novel energy-based hybrid model. |
Haoyu Wang; Guansong Pang; Peng Wang; Lei Zhang; Wei Wei; Yanning Zhang; |
414 | Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring Highlight: In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transferring, which is the key point for extending image-text pretrained models to the video domain. |
Ruyang Liu; Jingjia Huang; Ge Li; Jiashi Feng; Xinglong Wu; Thomas H. Li; |
415 | MethaneMapper: Spectral Absorption Aware Hyperspectral Transformer for Methane Detection Highlight: Existing methods for analyzing this data are sensitive to local terrain conditions, often require manual inspection by domain experts, are prone to significant error, and hence are not scalable. To address these challenges, we propose a novel end-to-end spectral absorption wavelength aware transformer network, MethaneMapper, to detect and quantify the emissions. |
Satish Kumar; Ivan Arevalo; ASM Iftekhar; B S Manjunath; |
416 | Autonomous Manipulation Learning for Similar Deformable Objects Via Only One Demonstration