Paper Digest: CVPR 2022 Highlights
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) is one of the top computer vision conferences in the world. In 2022, it takes place in New Orleans.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly get the main idea of each paper.
Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. Such models power this website and are behind our services, including “search engine”, “summarization”, “question answering”, and “literature review”.
If you do not want to miss interesting academic papers, you are welcome to sign up for our daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: CVPR 2022 Highlights
| # | Paper | Author(s) |
|---|---|---|
| 1 | Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification. Highlight: In this work, we explore how to extend self-attention modules to better learn subtle feature embeddings for recognizing fine-grained objects, e.g., different bird species or person identities. | Haowei Zhu; Wenjing Ke; Dong Li; Ji Liu; Lu Tian; Yi Shan; |
| 2 | SimAN: Exploring Self-Supervised Representation Learning of Scene Text Via Similarity-Aware Normalization. Highlight: Specifically, we propose a Similarity-Aware Normalization (SimAN) module to identify the different patterns and align the corresponding styles from the guiding patch. | Canjie Luo; Lianwen Jin; Jingdong Chen; |
| 3 | GASP, A Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation. Highlight: We propose a theoretical framework that generalizes simple and fast algorithms for hierarchical agglomerative clustering to weighted graphs with both attractive and repulsive interactions between the nodes. | Alberto Bailoni; Constantin Pape; Nathan Hütsch; Steffen Wolf; Thorsten Beier; Anna Kreshuk; Fred A. Hamprecht; |
| 4 | Estimating Example Difficulty Using Variance of Gradients. Highlight: In this work, we propose Variance of Gradients (VoG) as a valuable and efficient metric to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing. | Chirag Agarwal; Daniel D’souza; Sara Hooker; |
| 5 | One Loss for Quantization: Deep Hashing With Discrete Wasserstein Distributional Matching. Highlight: This paper considers an alternative approach to learning the quantization constraints. | Khoa D. Doan; Peng Yang; Ping Li; |
| 6 | Pixel Screening Based Intermediate Correction for Blind Deblurring. Highlight: However, these methods still fail while dealing with images containing saturations and large blurs. To address this problem, we propose an intermediate image correction method which utilizes Bayes posterior estimation to screen through the intermediate image and exclude those unfavorable pixels to reduce their influence for kernel estimation. | Meina Zhang; Yingying Fang; Guoxi Ni; Tieyong Zeng; |
| 7 | Weakly Supervised Semantic Segmentation By Pixel-to-Prototype Contrast. Highlight: In this study, we propose weakly-supervised pixel-to-prototype contrast that can provide pixel-level supervisory signals to narrow the gap. | Ye Du; Zehua Fu; Qingjie Liu; Yunhong Wang; |
| 8 | Controllable Animation of Fluid Elements in Still Images. Highlight: We propose a method to interactively control the animation of fluid elements in still images to generate cinemagraphs. | Aniruddha Mahapatra; Kuldeep Kulkarni; |
| 9 | Holocurtains: Programming Light Curtains Via Binary Holography. Highlight: In this work, we propose Holocurtains: a light-efficient approach to producing light curtains of arbitrary shape. | Dorian Chan; Srinivasa G. Narasimhan; Matthew O’Toole; |
| 10 | Recurrent Dynamic Embedding for Video Object Segmentation. Highlight: In this paper, we propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size. | Mingxing Li; Li Hu; Zhiwei Xiong; Bang Zhang; Pan Pan; Dong Liu; |
| 11 | Deep Hierarchical Semantic Segmentation. Highlight: In this paper, we instead address hierarchical semantic segmentation (HSS), which aims at structured, pixel-wise description of visual observation in terms of a class hierarchy. | Liulei Li; Tianfei Zhou; Wenguan Wang; Jianwu Li; Yi Yang; |
| 12 | φ-SfT: Shape-From-Template With A Physics-Based Deformation Model. Highlight: In contrast to previous works, this paper proposes a new SfT approach explaining 2D observations through physical simulations accounting for forces and material properties. | Navami Kairanda; Edith Tretschk; Mohamed Elgharib; Christian Theobalt; Vladislav Golyanik; |
| 13 | Continual Object Detection Via Prototypical Task Correlation Guided Gating Mechanism. Highlight: Different from previous works that tune the whole network for all tasks, in this work, we present a simple and flexible framework for continual object detection via pRotOtypical taSk corrElaTion guided gaTing mechAnism (ROSETTA). | Binbin Yang; Xinchi Deng; Han Shi; Changlin Li; Gengwei Zhang; Hang Xu; Shen Zhao; Liang Lin; Xiaodan Liang; |
| 14 | DATA: Domain-Aware and Task-Aware Self-Supervised Learning. Highlight: In this paper, we present DATA, a simple yet effective NAS approach specialized for SSL that provides Domain-Aware and Task-Aware pre-training. | Qing Chang; Junran Peng; Lingxi Xie; Jiajun Sun; Haoran Yin; Qi Tian; Zhaoxiang Zhang; |
| 15 | TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation. Highlight: To leverage the unlabeled data to boost model performance, we present a novel Two-Way Inter-label Self-Training framework named TWIST. | Ruihang Chu; Xiaoqing Ye; Zhengzhe Liu; Xiao Tan; Xiaojuan Qi; Chi-Wing Fu; Jiaya Jia; |
| 16 | Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds. Highlight: In this paper, we propose a novel voxel-based architecture, namely Voxel Set Transformer (VoxSeT), to detect 3D objects from point clouds by means of set-to-set translation. | Chenhang He; Ruihuang Li; Shuai Li; Lei Zhang; |
| 17 | Learning Adaptive Warping for Real-World Rolling Shutter Correction. Highlight: This paper proposes a real-world rolling shutter (RS) correction dataset, BS-RSC, and a corresponding model to correct the RS frames in a distorted video. | Mingdeng Cao; Zhihang Zhong; Jiahao Wang; Yinqiang Zheng; Yujiu Yang; |
| 18 | Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning. Highlight: In this paper, we propose a novel Siamese Contrastive Embedding Network (SCEN) for unseen composition recognition. | Xiangyu Li; Xu Yang; Kun Wei; Cheng Deng; Muli Yang; |
| 19 | Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions. Highlight: We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. | Huaizu Jiang; Xiaojian Ma; Weili Nie; Zhiding Yu; Yuke Zhu; Anima Anandkumar; |
| 20 | RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures. Highlight: We introduce RIM-Net, a neural network which learns recursive implicit fields for unsupervised inference of hierarchical shape structures. | Chengjie Niu; Manyi Li; Kai Xu; Hao Zhang; |
| 21 | Do Learned Representations Respect Causal Relationships? Highlight: Data often has many semantic attributes that are causally associated with each other. But do attribute-specific learned representations of data also respect the same causal relations? We answer this question in three steps. | Lan Wang; Vishnu Naresh Boddeti; |
| 22 | ZebraPose: Coarse To Fine Surface Encoding for 6DoF Object Pose Estimation. Highlight: In this work, we present a discrete descriptor, which can represent the object surface densely. | Yongzhi Su; Mahdi Saleh; Torben Fetzer; Jason Rambach; Nassir Navab; Benjamin Busam; Didier Stricker; Federico Tombari; |
| 23 | ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic. Highlight: While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of generating a caption given an image. In this work, we repurpose such models to generate a descriptive text given an image at inference time, without any further training or tuning step. | Yoad Tewel; Yoav Shalev; Idan Schwartz; Lior Wolf; |
| 24 | Learning To Affiliate: Mutual Centralized Learning for Few-Shot Classification. Highlight: They generally explore a unidirectional paradigm, e.g., find the nearest support feature for every query feature and aggregate these local matches for a joint classification. In this paper, we propose a novel Mutual Centralized Learning (MCL) to fully affiliate these two disjoint dense feature sets in a bidirectional paradigm. | Yang Liu; Weifeng Zhang; Chao Xiang; Tu Zheng; Deng Cai; Xiaofei He; |
| 25 | CAPRI-Net: Learning Compact CAD Shapes With Adaptive Primitive Assembly. Highlight: We introduce CAPRI-Net, a self-supervised neural network for learning compact and interpretable implicit representations of 3D computer-aided design (CAD) models, in the form of adaptive primitive assemblies. | Fenggen Yu; Zhiqin Chen; Manyi Li; Aditya Sanghi; Hooman Shayani; Ali Mahdavi-Amiri; Hao Zhang; |
| 26 | ATPFL: Automatic Trajectory Prediction Model Design Under Federated Learning Framework. Highlight: Besides, the existing works ignore Federated Learning (FL) scenarios, failing to make full use of distributed multi-source datasets with rich actual scenes to learn a more powerful TP model. In this paper, we make up for the above defects and propose ATPFL to help users federate multi-source trajectory datasets to automatically design and train a powerful TP model. | Chunnan Wang; Xiang Chen; Junzhe Wang; Hongzhi Wang; |
| 27 | Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning. Highlight: These affine parameters were introduced to maintain the expressive powers of the model following normalization. While this hypothesis holds true for classification within the same domain, this work illustrates that these parameters are detrimental to downstream performance on common few-shot transfer tasks. | Moslem Yazdanpanah; Aamer Abdul Rahman; Muawiz Chaudhary; Christian Desrosiers; Mohammad Havaei; Eugene Belilovsky; Samira Ebrahimi Kahou; |
| 28 | Bridging The Gap Between Classification and Localization for Weakly Supervised Object Localization. Highlight: In this work, we find the gap between classification and localization in terms of the misalignment of the directions between an input feature and a class-specific weight. | Eunji Kim; Siwon Kim; Jungbeom Lee; Hyunwoo Kim; Sungroh Yoon; |
| 29 | Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation. Highlight: This paper proposes a new transformer-based framework to learn class-specific object localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS). | Lian Xu; Wanli Ouyang; Mohammed Bennamoun; Farid Boussaid; Dan Xu; |
| 30 | 3D Moments From Near-Duplicate Photos. Highlight: We introduce 3D Moments, a new computational photography effect. | Qianqian Wang; Zhengqi Li; David Salesin; Noah Snavely; Brian Curless; Janne Kontkanen; |
| 31 | Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization. Highlight: In this work, we, for the first time to our best knowledge, propose to perform Exact Feature Distribution Matching (EFDM) by exactly matching the empirical Cumulative Distribution Functions (eCDFs) of image features, which could be implemented by applying the Exact Histogram Matching (EHM) in the image feature space. | Yabin Zhang; Minghan Li; Ruihuang Li; Kui Jia; Lei Zhang; |
| 32 | Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots. Highlight: In this paper, we propose a simple yet efficient approach called Blind2Unblind to overcome the information loss in blindspot-driven denoising methods. | Zejin Wang; Jiazheng Liu; Guoqing Li; Hua Han; |
| 33 | Balanced and Hierarchical Relation Learning for One-Shot Object Detection. Highlight: In this paper, we introduce the balanced and hierarchical learning for our detector. | Hanqing Yang; Sijia Cai; Hualian Sheng; Bing Deng; Jianqiang Huang; Xian-Sheng Hua; Yong Tang; Yu Zhang; |
| 34 | End-to-End Generative Pretraining for Multimodal Video Captioning. Highlight: We present Multimodal Video Generative Pretraining (MV-GPT), a new pretraining framework for learning from unlabelled videos which can be effectively used for generative tasks such as multimodal video captioning. | Paul Hongsuck Seo; Arsha Nagrani; Anurag Arnab; Cordelia Schmid; |
| 35 | Delving Deep Into The Generalization of Vision Transformers Under Distribution Shifts. Highlight: In this work, we provide a comprehensive study on the out-of-distribution generalization of Vision Transformers. | Chongzhi Zhang; Mingyuan Zhang; Shanghang Zhang; Daisheng Jin; Qiang Zhou; Zhongang Cai; Haiyu Zhao; Xianglong Liu; Ziwei Liu; |
| 36 | NICE-SLAM: Neural Implicit Scalable Encoding for SLAM. Highlight: In this paper, we present NICE-SLAM, a dense SLAM system that incorporates multi-level local information by introducing a hierarchical scene representation. | Zihan Zhu; Songyou Peng; Viktor Larsson; Weiwei Xu; Hujun Bao; Zhaopeng Cui; Martin R. Oswald; Marc Pollefeys; |
| 37 | HyperDet3D: Learning A Scene-Conditioned 3D Object Detector. Highlight: In this paper, we propose HyperDet3D to explore scene-conditioned prior knowledge for 3D object detection. | Yu Zheng; Yueqi Duan; Jiwen Lu; Jie Zhou; Qi Tian; |
| 38 | Stochastic Trajectory Prediction Via Motion Indeterminacy Diffusion. Highlight: In this paper, we present a new framework to formulate the trajectory prediction task as a reverse process of motion indeterminacy diffusion (MID), in which we progressively discard indeterminacy from all the walkable areas until reaching the desired trajectory. | Tianpei Gu; Guangyi Chen; Junlong Li; Chunze Lin; Yongming Rao; Jie Zhou; Jiwen Lu; |
| 39 | CLRNet: Cross Layer Refinement Network for Lane Detection. Highlight: In this work, we present Cross Layer Refinement Network (CLRNet) aiming at fully utilizing both high-level and low-level features in lane detection. | Tu Zheng; Yifei Huang; Yang Liu; Wenjian Tang; Zheng Yang; Deng Cai; Xiaofei He; |
| 40 | Cross-Modal Map Learning for Vision and Language Navigation. Highlight: In this work, we propose a cross-modal map learning model for vision-and-language navigation that first learns to predict the top-down semantics on an egocentric map for both observed and unobserved regions, and then predicts a path towards the goal as a set of waypoints. | Georgios Georgakis; Karl Schmeckpeper; Karan Wanchoo; Soham Dan; Eleni Miltsakaki; Dan Roth; Kostas Daniilidis; |
| 41 | Motion-Aware Contrastive Video Representation Learning Via Foreground-Background Merging. Highlight: Such bias makes the model suffer from weak generalization ability, leading to worse performance on downstream tasks such as action recognition. To alleviate such bias, we propose Foreground-background Merging (FAME) to deliberately compose the moving foreground region of the selected video onto the static background of others. | Shuangrui Ding; Maomao Li; Tianyu Yang; Rui Qian; Haohang Xu; Qingyi Chen; Jue Wang; Hongkai Xiong; |
| 42 | Incremental Transformer Structure Enhanced Image Inpainting With Masking Positional Encoding. Highlight: On the other hand, attention-based models can learn better long-range dependency for the structure recovery, but they are limited by the heavy computation for inference with large image sizes. To address these issues, we propose to leverage an additional structure restorer to facilitate the image inpainting incrementally. | Qiaole Dong; Chenjie Cao; Yanwei Fu; |
| 43 | Pointly-Supervised Instance Segmentation. Highlight: We propose an embarrassingly simple point annotation scheme to collect weak supervision for instance segmentation. | Bowen Cheng; Omkar Parkhi; Alexander Kirillov; |
| 44 | Cross-Modal Clinical Graph Transformer for Ophthalmic Report Generation. Highlight: To endow models with the capability of incorporating expert knowledge, we propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG), in which clinical relation triples are injected into the visual features as prior knowledge to drive the decoding procedure. | Mingjie Li; Wenjia Cai; Karin Verspoor; Shirui Pan; Xiaodan Liang; Xiaojun Chang; |
| 45 | Human-Object Interaction Detection Via Disentangled Transformer. Highlight: Our main motivation is that detecting the human-object instances and classifying interactions accurately needs to learn representations that focus on different regions. To this end, we present Disentangled Transformer, where both encoder and decoder are disentangled to facilitate learning of two subtasks. | Desen Zhou; Zhichao Liu; Jian Wang; Leshan Wang; Tao Hu; Errui Ding; Jingdong Wang; |
| 46 | DINE: Domain Adaptation From Single and Multiple Black-Box Predictors. Highlight: This paper studies a practical and interesting setting for UDA, where only black-box source models (i.e., only network predictions are available) are provided during adaptation in the target domain. To solve this problem, we propose a new two-step knowledge adaptation framework called DIstill and fine-tuNE (DINE). | Jian Liang; Dapeng Hu; Jiashi Feng; Ran He; |
| 47 | LGT-Net: Indoor Panoramic Room Layout Estimation With Geometry-Aware Transformer Network. Highlight: We present that using horizon-depth along with room height can obtain omnidirectional-geometry awareness of room layout in both horizontal and vertical directions. | Zhigang Jiang; Zhongzheng Xiang; Jinhua Xu; Ming Zhao; |
| 48 | CRIS: CLIP-Driven Referring Image Segmentation. Highlight: Inspired by the recent advance in Contrastive Language-Image Pretraining (CLIP), in this paper, we propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS). | Zhaoqing Wang; Yu Lu; Qiang Li; Xunqiang Tao; Yandong Guo; Mingming Gong; Tongliang Liu; |
| 49 | Multi-View Mesh Reconstruction With Neural Deferred Shading. Highlight: We propose an analysis-by-synthesis method for fast multi-view 3D reconstruction of opaque objects with arbitrary materials and illumination. | Markus Worchel; Rodrigo Diaz; Weiwen Hu; Oliver Schreer; Ingo Feldmann; Peter Eisert; |
| 50 | CVF-SID: Cyclic Multi-Variate Function for Self-Supervised Image Denoising By Disentangling Noise From Image. Highlight: To address the aforementioned challenges, we propose a novel and powerful self-supervised denoising method called CVF-SID based on a Cyclic multi-Variate Function (CVF) module and a self-supervised image disentangling (SID) framework. | Reyhaneh Neshatavar; Mohsen Yavartanoo; Sanghyun Son; Kyoung Mu Lee; |
| 51 | Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World. Highlight: We proposed the infrared adversarial clothing, which could fool infrared pedestrian detectors at different angles. | Xiaopei Zhu; Zhanhao Hu; Siyuan Huang; Jianmin Li; Xiaolin Hu; |
| 52 | Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation. Highlight: In this paper, we present a novel Distribution-Aware Single-stage (DAS) model for tackling the challenging multi-person 3D pose estimation problem. | Zitian Wang; Xuecheng Nie; Xiaochao Qu; Yunpeng Chen; Si Liu; |
| 53 | FaceFormer: Speech-Driven 3D Facial Animation With Transformers. Highlight: Prior works typically focus on learning phoneme-level features of short audio windows with limited context, occasionally resulting in inaccurate lip movements. To tackle this limitation, we propose a Transformer-based autoregressive model, FaceFormer, which encodes the long-term audio context and autoregressively predicts a sequence of animated 3D face meshes. | Yingruo Fan; Zhaojiang Lin; Jun Saito; Wenping Wang; Taku Komura; |
| 54 | Exploring Patch-Wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks. Highlight: However, the methods often ignore the diverse semantic relation within the images. To address this, here we propose a novel semantic relation consistency (SRC) regularization along with the decoupled contrastive learning (DCL), which utilize the diverse semantics by focusing on the heterogeneous semantics between the image patches of a single image. | Chanyong Jung; Gihyun Kwon; Jong Chul Ye; |
| 55 | High-Resolution Face Swapping Via Latent Semantics Disentanglement. Highlight: We present a novel high-resolution face swapping method using the inherent prior knowledge of a pre-trained GAN model. | Yangyang Xu; Bailin Deng; Junle Wang; Yanqing Jing; Jia Pan; Shengfeng He; |
| 56 | Searching The Deployable Convolution Neural Networks for GPUs. Highlight: This paper intends to expedite the model customization with a model hub that contains the optimized models tiered by their inference latency using Neural Architecture Search (NAS). | Linnan Wang; Chenhan Yu; Satish Salian; Slawomir Kierat; Szymon Migacz; Alex Fit Florea; |
| 57 | Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning. Highlight: In this paper, we propose a Sparse Local Patch Transformer (SLPT) for learning the inherent relation. | Jiahao Xia; Weiwei Qu; Wenjian Huang; Jianguo Zhang; Xi Wang; Min Xu; |
| 58 | DeepFake Disrupter: The Detector of DeepFake Is My Friend. Highlight: In this paper, we propose a novel DeepFake disruption algorithm called "DeepFake Disrupter". | Xueyu Wang; Jiajun Huang; Siqi Ma; Surya Nepal; Chang Xu; |
| 59 | Rotationally Equivariant 3D Object Detection. Highlight: To incorporate object-level rotation equivariance into 3D object detectors, we need a mechanism to extract equivariant features with local object-level spatial support while being able to model cross-object context information. To this end, we propose Equivariant Object detection Network (EON) with a rotation equivariance suspension design to achieve object-level equivariance. | Hong-Xing Yu; Jiajun Wu; Li Yi; |
| 60 | Accelerating DETR Convergence Via Semantic-Aligned Matching. Highlight: We observe that the slow convergence is largely attributed to the complication in matching object queries with target features in different feature embedding spaces. This paper presents SAM-DETR, a Semantic-Aligned-Matching DETR that greatly accelerates DETR’s convergence without sacrificing its accuracy. | Gongjie Zhang; Zhipeng Luo; Yingchen Yu; Kaiwen Cui; Shijian Lu; |
| 61 | Long-Short Temporal Contrastive Learning of Video Transformers. Highlight: In this paper, we empirically demonstrate that self-supervised pretraining of video transformers on video-only datasets can lead to action recognition results that are on par or better than those obtained with supervised pretraining on large-scale image datasets, even massive ones such as ImageNet-21K. | Jue Wang; Gedas Bertasius; Du Tran; Lorenzo Torresani; |
| 62 | Vision Transformer With Deformable Attention. Highlight: This flexible scheme enables the self-attention module to focus on relevant regions and capture more informative features. On this basis, we present Deformable Attention Transformer, a general backbone model with deformable attention for both image classification and dense prediction tasks. | Zhuofan Xia; Xuran Pan; Shiji Song; Li Erran Li; Gao Huang; |
| 63 | Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture. Highlight: In this paper, we propose GPV-1, a task-agnostic vision-language architecture that can learn and perform tasks that involve receiving an image and producing text and/or bounding boxes, including classification, localization, visual question answering, captioning, and more. | Tanmay Gupta; Amita Kamath; Aniruddha Kembhavi; Derek Hoiem; |
| 64 | Deep Vanishing Point Detection: Geometric Priors Make Dataset Variations Vanish. Highlight: Yet, deep networks require expensive annotated datasets trained on costly hardware and do not generalize to even slightly different domains, and minor problem variants. Here, we address these issues by injecting deep vanishing point detection networks with prior knowledge. | Yancong Lin; Ruben Wiersma; Silvia L. Pintea; Klaus Hildebrandt; Elmar Eisemann; Jan C. van Gemert; |
| 65 | RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes. Highlight: In this paper, an unsupervised learning framework is proposed to jointly predict monocular depth and complete 3D motion including the motions of moving objects and camera. | Tak-Wai Hui; |
| 66 | LiT: Zero-Shot Transfer With Locked-Image Text Tuning. Highlight: This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training. | Xiaohua Zhai; Xiao Wang; Basil Mustafa; Andreas Steiner; Daniel Keysers; Alexander Kolesnikov; Lucas Beyer; |
| 67 | Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification. Highlight: However, synthesized persons in existing datasets are mostly cartoon-like and in random dress collocation, which limits their performance. To address this, in this work, an automatic approach is proposed to directly clone the whole outfits from real-world person images to virtual 3D characters, such that any virtual person thus created will appear very similar to its real-world counterpart. | Yanan Wang; Xuezhi Liang; Shengcai Liao; |
| 68 | GeoNeRF: Generalizing NeRF With Geometry Priors. Highlight: We present GeoNeRF, a generalizable photorealistic novel view synthesis method based on neural radiance fields. | Mohammad Mahdi Johari; Yann Lepoittevin; François Fleuret; |
| 69 | ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo. Highlight: In this paper, we propose a novel adaptive blend pyramid network, which aims to achieve fast local retouching on ultra high-resolution photos. | Biwen Lei; Xiefan Guo; Hongyu Yang; Miaomiao Cui; Xuansong Xie; Di Huang; |
| 70 | PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation With Photometrically Challenging Objects. Highlight: To provide a benchmark with high-quality ground truth annotations to the community, we introduce a multimodal dataset for category-level object pose estimation with photometrically challenging objects termed PhoCaL. | Pengyuan Wang; HyunJun Jung; Yitong Li; Siyuan Shen; Rahul Parthasarathy Srikanth; Lorenzo Garattoni; Sven Meier; Nassir Navab; Benjamin Busam; |
| 71 | Neural Compression-Based Feature Learning for Video Restoration. Highlight: This paper proposes learning noise-robust feature representations to help video restoration. | Cong Huang; Jiahao Li; Bin Li; Dong Liu; Yan Lu; |
| 72 | Expanding Low-Density Latent Regions for Open-Set Object Detection. Highlight: In this work, we propose to identify unknown objects by separating high/low-density regions in the latent space, based on the consensus that unknown objects are usually distributed in low-density latent regions. | Jiaming Han; Yuqiang Ren; Jian Ding; Xingjia Pan; Ke Yan; Gui-Song Xia; |
| 73 | Drop The GAN: In Defense of Patches Nearest Neighbors As Single Image Generative Models. Highlight: However, despite their impressiveness, single-image GANs require long training time (usually hours) for each image and each task and often suffer from visual artifacts. In this paper we revisit the classical patch-based methods, and show that, unlike previously believed, classical methods can be adapted to tackle these novel "GAN-only" tasks. | Niv Granot; Ben Feinstein; Assaf Shocher; Shai Bagon; Michal Irani; |
| 74 | Uformer: A General U-Shaped Transformer for Image Restoration. Highlight: In this paper, we present Uformer, an effective and efficient Transformer-based architecture for image restoration, in which we build a hierarchical encoder-decoder network using the Transformer block. | Zhendong Wang; Xiaodong Cun; Jianmin Bao; Wengang Zhou; Jianzhuang Liu; Houqiang Li; |
| 75 | Exploring Dual-Task Correlation for Pose Guided Person Image Generation. Highlight: Most of the existing methods only focus on the ill-posed source-to-target task and fail to capture reasonable texture mapping. To address this problem, we propose a novel Dual-task Pose Transformer Network (DPTN), which introduces an auxiliary task (i.e., source-to-source task) and exploits the dual-task correlation to promote the performance of PGPIG. | Pengze Zhang; Lingxiao Yang; Jian-Huang Lai; Xiaohua Xie; |
| 76 | Portrait Eyeglasses and Shadow Removal By Leveraging 3D Synthetic Data. Highlight: In this paper, we propose a novel framework to remove eyeglasses as well as their cast shadows from face images. | Junfeng Lyu; Zhibo Wang; Feng Xu; |
| 77 | Neural Rays for Occlusion-Aware Image-Based Rendering. Highlight: We present a new neural representation, called Neural Ray (NeuRay), for the novel view synthesis task. | Yuan Liu; Sida Peng; Lingjie Liu; Qianqian Wang; Peng Wang; Christian Theobalt; Xiaowei Zhou; Wenping Wang; |
| 78 | Modeling 3D Layout for Group Re-Identification. Highlight: However, layout ambiguity is introduced because these methods only consider the 2D layout on the imaging plane. In this paper, we overcome the above limitations by 3D layout modeling. | Quan Zhang; Kaiheng Dang; Jian-Huang Lai; Zhanxiang Feng; Xiaohua Xie; |
| 79 | Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity. Highlight: Here we propose a novel approach for mask proposals, Generic Grouping Networks (GGNs), constructed without semantic supervision. | Weiyao Wang; Matt Feiszli; Heng Wang; Jitendra Malik; Du Tran; |
| 80 | SIOD: Single Instance Annotated Per Category Per Image for Object Detection. Highlight: Under the SIOD setting, we propose a simple yet effective framework, termed Dual-Mining (DMiner), which consists of a Similarity-based Pseudo Label Generating module (SPLG) and a Pixel-level Group Contrastive Learning module (PGCL). | Hanjun Li; Xingjia Pan; Ke Yan; Fan Tang; Wei-Shi Zheng; |
| 81 | Toward Fast, Flexible, and Robust Low-Light Image Enhancement. Highlight: In this paper, we develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening images in real-world low-light scenarios. | Long Ma; Tengyu Ma; Risheng Liu; Xin Fan; Zhongxuan Luo; |
| 82 | Online Learning of Reusable Abstract Models for Object Goal Navigation. Highlight: In this paper, we present a novel approach to incrementally learn an Abstract Model of an unknown environment, and show how an agent can reuse the learned model for tackling the Object Goal Navigation task. | Tommaso Campari; Leonardo Lamanna; Paolo Traverso; Luciano Serafini; Lamberto Ballan; |
| 83 | Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos. Highlight: In this paper, we propose a prompt-based framework, Bridge-Prompt (Br-Prompt), to model the semantics across adjacent actions, so that it simultaneously exploits both out-of-context and contextual information from a series of ordinal actions in instructional videos. | Muheng Li; Lei Chen; Yueqi Duan; Zhilan Hu; Jianjiang Feng; Jie Zhou; Jiwen Lu; |
| 84 | SimMatch: Semi-Supervised Learning With Similarity Matching. Highlight: In this paper, we introduce a new semi-supervised learning framework, SimMatch, which simultaneously considers semantic similarity and instance similarity. | Mingkai Zheng; Shan You; Lang Huang; Fei Wang; Chen Qian; Chang Xu; |
| 85 | OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks. Highlight: This paper proposes a new eXplanation framework, called OrphicX, for generating causal explanations for any graph neural networks (GNNs) based on learned latent causal factors. | Wanyu Lin; Hao Lan; Hao Wang; Baochun Li; |
| 86 | HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network. Highlight: Thus, in this work, we propose a novel 3D hand mesh estimation network HandOccNet, that can fully exploit the information at occluded regions as a secondary means to enhance image features and make them much richer. | JoonKyu Park; Yeonguk Oh; Gyeongsik Moon; Hongsuk Choi; Kyoung Mu Lee; |
| 87 | EfficientNeRF Efficient Neural Radiance Fields. Highlight: In this paper, we present EfficientNeRF as an efficient NeRF-based method to represent 3D scene and synthesize novel-view images. | Tao Hu; Shu Liu; Yilun Chen; Tiancheng Shen; Jiaya Jia; |
| 88 | Quantifying Societal Bias Amplification in Image Captioning. Highlight: We provide a comprehensive study on the strengths and limitations of each metric, and propose LIC, a metric to study captioning bias amplification. | Yusuke Hirota; Yuta Nakashima; Noa Garcia; |
| 89 | Modular Action Concept Grounding in Semantic Video Prediction. Highlight: Inspired by the idea of Mixture of Experts, we embody each abstract label by a structured combination of various visual concept learners and propose a novel video prediction model, Modular Action Concept Network (MAC). | Wei Yu; Wenxin Chen; Songheng Yin; Steve Easterbrook; Animesh Garg; |
| 90 | StyleSwin: Transformer-Based GAN for High-Resolution Image Generation. Highlight: In this paper, we seek to explore using pure transformers to build a generative adversarial network for high-resolution image synthesis. | Bowen Zhang; Shuyang Gu; Bo Zhang; Jianmin Bao; Dong Chen; Fang Wen; Yong Wang; Baining Guo; |
| 91 | Reinforced Structured State-Evolution for Vision-Language Navigation. Highlight: In this paper, we propose a novel Structured state-Evolution (SEvol) model to effectively maintain the environment layout clues for VLN. | Jinyu Chen; Chen Gao; Erli Meng; Qiong Zhang; Si Liu; |
| 92 | Sub-Word Level Lip Reading With Visual Attention. Highlight: The goal of this paper is to learn strong lip reading models that can recognise speech in silent videos. | K R Prajwal; Triantafyllos Afouras; Andrew Zisserman; |
| 93 | Weakly Supervised High-Fidelity Clothing Model Generation. Highlight: However, the expensive proprietary model images challenge the existing image virtual try-on methods in this scenario, as most of them need to be trained on considerable amounts of model images accompanied with paired clothes images. In this paper, we propose a cheap yet scalable weakly-supervised method called Deep Generative Projection (DGP) to address this specific scenario. | Ruili Feng; Cheng Ma; Chengji Shen; Xin Gao; Zhenjiang Liu; Xiaobo Li; Kairi Ou; Deli Zhao; Zheng-Jun Zha; |
| 94 | Highly-Efficient Incomplete Large-Scale Multi-View Clustering With Consensus Bipartite Graph. Highlight: Although many IMVC methods have been recently proposed, they always encounter high complexity and expensive time expenditure from being applied into large-scale tasks. In this paper, we present a flexible highly-efficient incomplete large-scale multi-view clustering approach based on bipartite graph framework to solve these issues. | Siwei Wang; Xinwang Liu; Li Liu; Wenxuan Tu; Xinzhong Zhu; Jiyuan Liu; Sihang Zhou; En Zhu; |
| 95 | Towards Principled Disentanglement for Domain Generalization. Highlight: Based on the transformation, we propose a primal-dual algorithm for joint representation disentanglement and domain generalization. | Hanlin Zhang; Yi-Fan Zhang; Weiyang Liu; Adrian Weller; Bernhard Schölkopf; Eric P. Xing; |
| 96 | Discrete Cosine Transform Network for Guided Depth Map Super-Resolution. Highlight: To solve the challenges in interpreting the working mechanism, extracting cross-modal features and RGB texture over-transferred, we propose a novel Discrete Cosine Transform Network (DCTNet) to alleviate the problems from three aspects. | Zixiang Zhao; Jiangshe Zhang; Shuang Xu; Zudi Lin; Hanspeter Pfister; |
| 97 | Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing. Highlight: In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. | Xiaoxue Chen; Tianyu Liu; Hao Zhao; Guyue Zhou; Ya-Qin Zhang; |
| 98 | E2V-SDE: From Asynchronous Events to Fast and Continuous Video Reconstruction Via Neural Stochastic Differential Equations. Highlight: However, these works do not provide videos with sufficiently good quality due to unrealistic artifacts, such as lack of temporal information from irregular and discontinuous data and deterministic modeling for continuous-time stochastic process. In this study, we overcome these difficulties by introducing a new model called E2V-SDE, which is a neural continuous time-state model consisting of a latent stochastic differential equation and a conditional distribution of the observation. | Jongwan Kim; DongJin Lee; Byunggook Na; Seongsik Park; Sungroh Yoon; |
| 99 | CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning. Highlight: However, the more realistic setting of class-imbalanced data – called imbalanced SSL – is largely underexplored and standard SSL tends to underperform. In this paper, we propose a novel co-learning framework (CoSSL), which decouples representation and classifier learning while coupling them closely. | Yue Fan; Dengxin Dai; Anna Kukleva; Bernt Schiele; |
| 100 | Discovering Objects That Can Move. Highlight: This paper studies the problem of object discovery, separating objects from the background without manual labels. | Zhipeng Bao; Pavel Tokmakov; Allan Jabri; Yu-Xiong Wang; Adrien Gaidon; Martial Hebert; |
| 101 | Knowledge Mining With Scene Text for Fine-Grained Recognition. Highlight: We propose an end-to-end trainable network that mines implicit contextual knowledge behind scene text image and enhance the semantics and correlation to fine-tune the image representation. | Hao Wang; Junchao Liao; Tianheng Cheng; Zewen Gao; Hao Liu; Bo Ren; Xiang Bai; Wenyu Liu; |
| 102 | Self-Supervised Learning of Object Parts for Semantic Segmentation. Highlight: However, learning dense representations is challenging, as in the unsupervised context it is not clear how to guide the model to learn representations that correspond to various potential object categories. In this paper, we argue that self-supervised learning of object parts is a solution to this issue. | Adrian Ziegler; Yuki M. Asano; |
| 103 | Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects. Highlight: In the following, we thus propose ICG, a novel probabilistic tracker that fuses region and depth information and only requires the object geometry. | Manuel Stoiber; Martin Sundermeyer; Rudolph Triebel; |
| 104 | Single-Photon Structured Light. Highlight: We present a novel structured light technique that uses Single Photon Avalanche Diode (SPAD) arrays to enable 3D scanning at high-frame rates and low-light levels. | Varun Sundar; Sizhuo Ma; Aswin C. Sankaranarayanan; Mohit Gupta; |
| 105 | Deblurring Via Stochastic Refinement. Highlight: We present an alternative framework for blind deblurring based on conditional diffusion models. | Jay Whang; Mauricio Delbracio; Hossein Talebi; Chitwan Saharia; Alexandros G. Dimakis; Peyman Milanfar; |
| 106 | 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds. Highlight: Observing that the 3D captioning task and the 3D grounding task contain both shared and complementary information in nature, in this work, we propose a unified framework to jointly solve these two distinct but closely related tasks in a synergistic fashion, which consists of both shared task-agnostic modules and lightweight task-specific modules. | Daigang Cai; Lichen Zhao; Jing Zhang; Lu Sheng; Dong Xu; |
| 107 | TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization. Highlight: The dominant CNN-based methods for cross-view image geo-localization rely on polar transform and fail to model global correlation. We propose a pure transformer-based approach (TransGeo) to address these limitations from a different perspective. | Sijie Zhu; Mubarak Shah; Chen Chen; |
| 108 | R(Det)2: Randomized Decision Routing for Object Detection. Highlight: In this paper, we propose a novel approach to combine decision trees and deep neural networks in an end-to-end learning manner for object detection. | Yali Li; Shengjin Wang; |
| 109 | Abandoning The Bayer-Filter To See in The Dark. Highlight: Due to the fact that not all photons can pass the Bayer-Filter on the sensor of the color camera, in this work, we first present a De-Bayer-Filter simulator based on deep neural networks to generate a monochrome raw image from the colored raw image. Next, a fully convolutional network is proposed to achieve the low-light image enhancement by fusing colored raw data with synthesized monochrome data. | Xingbo Dong; Wanyan Xu; Zhihui Miao; Lan Ma; Chao Zhang; Jiewen Yang; Zhe Jin; Andrew Beng Jin Teoh; Jiajun Shen; |
| 110 | SASIC: Stereo Image Compression With Latent Shifts and Stereo Attention. Highlight: We propose a learned method for stereo image compression that leverages the similarity of the left and right images in a stereo pair due to overlapping fields of view. | Matthias Wödlinger; Jan Kotera; Jan Xu; Robert Sablatnig; |
| 111 | Exploiting Temporal Relations on Radar Perception for Autonomous Driving. Highlight: To enhance the capacity of automotive radar, in this work, we exploit the temporal information from successive ego-centric bird-eye-view radar image frames for radar object recognition. | Peizhao Li; Pu Wang; Karl Berntorp; Hongfu Liu; |
| 112 | Multi-Instance Point Cloud Registration By Efficient Correspondence Clustering. Highlight: We propose to directly group the set of noisy correspondences into different clusters based on a distance invariance matrix. | Weixuan Tang; Danping Zou; |
| 113 | Contrastive Boundary Learning for Point Cloud Segmentation. Highlight: In this paper, we focus on the segmentation of scene boundaries. | Liyao Tang; Yibing Zhan; Zhe Chen; Baosheng Yu; Dacheng Tao; |
| 114 | Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution. Highlight: In this paper, we demonstrate that it is possible to train a GAN-based SISR model which can stably generate perceptually realistic details while inhibiting visual artifacts. | Jie Liang; Hui Zeng; Lei Zhang; |
| 115 | CVNet: Contour Vibration Network for Building Extraction. Highlight: Inspired by the physical vibration theory, we propose a contour vibration network (CVNet) for automatic building boundary delineation. | Ziqiang Xu; Chunyan Xu; Zhen Cui; Xiangwei Zheng; Jian Yang; |
| 116 | Hyperbolic Image Segmentation. Highlight: In this work, we show that hyperbolic manifolds provide a valuable alternative for image segmentation and propose a tractable formulation of hierarchical pixel-level classification in hyperbolic space. | Mina Ghadimi Atigh; Julian Schoep; Erman Acar; Nanne van Noord; Pascal Mettes; |
| 117 | Forward Compatible Training for Large-Scale Embedding Retrieval Systems. Highlight: In this work, we propose a new learning paradigm for representation learning: forward compatible training (FCT). | Vivek Ramanujan; Pavan Kumar Anasosalu Vasu; Ali Farhadi; Oncel Tuzel; Hadi Pouransari; |
| 118 | Everything at Once – Multi-Modal Fusion Transformer for Video Retrieval. Highlight: Multi-modal learning from video data has seen increased attention recently as it allows training of semantically meaningful embeddings without human annotation, enabling tasks like zero-shot retrieval and action localization. In this work, we present a multi-modal, modality agnostic fusion transformer that learns to exchange information between multiple modalities, such as video, audio, and text, and integrate them into a fused representation in a joined multi-modal embedding space. | Nina Shvetsova; Brian Chen; Andrew Rouditchenko; Samuel Thomas; Brian Kingsbury; Rogerio S. Feris; David Harwath; James Glass; Hilde Kuehne; |
| 119 | Swin Transformer V2: Scaling Up Capacity and Resolution. Highlight: We present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536×1,536 resolution. | Ze Liu; Han Hu; Yutong Lin; Zhuliang Yao; Zhenda Xie; Yixuan Wei; Jia Ning; Yue Cao; Zheng Zhang; Li Dong; Furu Wei; Baining Guo; |
| 120 | Neural Template: Topology-Aware Reconstruction and Disentangled Generation of 3D Meshes. Highlight: This paper introduces a novel framework called DT-Net for 3D mesh reconstruction and generation via Disentangled Topology. | Ka-Hei Hui; Ruihui Li; Jingyu Hu; Chi-Wing Fu; |
121 | DEFEAT: Deep Hidden Feature Backdoor Attacks By Imperceptible Perturbation and Latent Representation Constraints Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, We propose a novel and stealthy backdoor attack – DEFEAT. |
Zhendong Zhao; Xiaojun Chen; Yuexin Xuan; Ye Dong; Dakui Wang; Kaitai Liang; |
122 | Projective Manifold Gradient Layer for Deep Rotation Regression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a manifold-aware gradient that directly backpropagates into deep network weights. |
Jiayi Chen; Yingda Yin; Tolga Birdal; Baoquan Chen; Leonidas J. Guibas; He Wang; |
123 | CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel Cross Language Image Matching (CLIMS) framework, based on the recently introduced Contrastive Language-Image Pre-training (CLIP) model, for WSSS. |
Jinheng Xie; Xianxu Hou; Kai Ye; Linlin Shen; |
124 | Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we explore two orthogonal but complementary aspects of a video snippet, i.e., the action features and the co-occurrence features. |
Kun Xia; Le Wang; Sanping Zhou; Nanning Zheng; Wei Tang; |
125 | It’s Time for Artistic Correspondence in Music and Video Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present an approach for recommending a music track for a given video, and vice versa, based on both their temporal alignment and their correspondence at an artistic level. |
Dídac Surís; Carl Vondrick; Bryan Russell; Justin Salamon; |
126 | Mixed Differential Privacy in Computer Vision Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce AdaMix, an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data. |
Aditya Golatkar; Alessandro Achille; Yu-Xiang Wang; Aaron Roth; Michael Kearns; Stefano Soatto; |
127 | AdaFace: Quality Adaptive Margin for Face Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce another aspect of adaptiveness in the loss function, namely the image quality. |
Minchul Kim; Anil K. Jain; Xiaoming Liu; |
128 | Learning Soft Estimator of Keypoint Scale and Orientation With Probabilistic Covariant Loss Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, hard estimators struggle to handle local patches containing structures of different objects or multiple edges. In this paper, a Soft Self-Supervised Estimator (S3Esti) is proposed to overcome this problem by learning to predict multiple scales and orientations. |
Pei Yan; Yihua Tan; Shengzhou Xiong; Yuan Tai; Yansheng Li; |
129 | DN-DETR: Accelerate DETR Training By Introducing Query DeNoising Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present in this paper a novel denoising training method to speed up DETR (DEtection TRansformer) training and offer a deepened understanding of the slow convergence issue of DETR-like methods. |
Feng Li; Hao Zhang; Shilong Liu; Jian Guo; Lionel M. Ni; Lei Zhang; |
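The core of query denoising is to feed noised copies of ground-truth boxes and labels to the decoder and train it to reconstruct the originals. The sketch below shows only that noising step, assuming boxes in normalized (cx, cy, w, h) format; the noise scales and flip probability are illustrative, and the full method (query grouping, attention masking, matching) is not shown.

```python
import torch

def noise_gt_boxes(boxes, labels, num_classes, box_noise_scale=0.4, label_flip_prob=0.2):
    """Jitter ground-truth boxes (cx, cy, w, h in [0, 1]) and randomly flip labels.
    The noised pairs would be fed to the decoder as extra 'denoising' queries that
    are trained to reconstruct the original boxes and labels."""
    noised = boxes.clone()
    # shift centers by up to half the box size, rescale w/h by up to the noise scale
    diff = torch.cat([boxes[:, 2:] / 2, boxes[:, 2:]], dim=1)
    noised += (torch.rand_like(boxes) * 2 - 1) * diff * box_noise_scale
    noised = noised.clamp(1e-4, 1 - 1e-4)

    flip = torch.rand(labels.shape) < label_flip_prob
    rand_labels = torch.randint_like(labels, num_classes)
    noised_labels = torch.where(flip, rand_labels, labels)
    return noised, noised_labels

gt_boxes = torch.tensor([[0.5, 0.5, 0.2, 0.3], [0.3, 0.7, 0.1, 0.1]])
gt_labels = torch.tensor([3, 7])
nb, nl = noise_gt_boxes(gt_boxes, gt_labels, num_classes=80)
print(nb, nl)
```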
130 | HCSC: Hierarchical Contrastive Selective Coding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In addition, the negative pairs used in these methods are not guaranteed to be semantically distinct, which could further hamper the structural correctness of learned image representations. To tackle these limitations, we propose a novel contrastive learning framework called Hierarchical Contrastive Selective Coding (HCSC). |
Yuanfan Guo; Minghao Xu; Jiawen Li; Bingbing Ni; Xuanyu Zhu; Zhenbang Sun; Yi Xu; |
131 | TransRank: Self-Supervised Video Representation Learning Via Ranking-Based Transformation Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on hard-label classification, existing RecogTrans approaches suffer from noisy supervision signals in pre-training. To mitigate this problem, we developed TransRank, a unified framework for recognizing Transformations in a Ranking formulation. |
Haodong Duan; Nanxuan Zhao; Kai Chen; Dahua Lin; |
132 | KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the theory of Non-Rigid Structure from Motion prescribes constraining the deformations for 3D reconstruction. We thus propose a new model that departs significantly from this prior work. |
David Novotny; Ignacio Rocco; Samarth Sinha; Alexandre Carlier; Gael Kerchenbaum; Roman Shapovalov; Nikita Smetanin; Natalia Neverova; Benjamin Graham; Andrea Vedaldi; |
133 | Invariant Grounding for Video Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we first take a causal look at VideoQA and argue that invariant grounding is the key to ruling out the spurious correlations. Towards this end, we propose a new learning framework, Invariant Grounding for VideoQA (IGV), to ground the question-critical scene, whose causal relations with answers are invariant across different interventions on the complement. |
Yicong Li; Xiang Wang; Junbin Xiao; Wei Ji; Tat-Seng Chua; |
134 | Prompt Distribution Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present prompt distribution learning for effectively adapting a pre-trained vision-language model to address downstream recognition tasks. |
Yuning Lu; Jianzhuang Liu; Yonggang Zhang; Yajing Liu; Xinmei Tian; |
135 | RAGO: Recurrent Graph Optimizer for Multiple Rotation Averaging Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a deep recurrent Rotation Averaging Graph Optimizer (RAGO) for Multiple Rotation Averaging (MRA). |
Heng Li; Zhaopeng Cui; Shuaicheng Liu; Ping Tan; |
136 | Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this line of research, effectively modeling task correlations is vital yet highly neglected. Therefore, we propose Arch-Graph, a transferable NAS method that predicts task-specific optimal architectures with respect to given task embeddings. |
Minbin Huang; Zhijian Huang; Changlin Li; Xin Chen; Hang Xu; Zhenguo Li; Xiaodan Liang; |
137 | On Aliased Resizing and Surprising Subtleties in GAN Evaluation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper shows that choices in low-level image processing have been an under-appreciated aspect of generative modeling. |
Gaurav Parmar; Richard Zhang; Jun-Yan Zhu; |
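A large part of the subtlety comes from image resizing during evaluation. The short sketch below (not the authors' evaluation code) contrasts an aliased nearest-neighbor resize with an antialiased bicubic resize on a high-frequency pattern; images passed through the two paths differ substantially, which is enough to shift FID-style statistics.

```python
import numpy as np
from PIL import Image

def resize_aliased(img_np, size):
    # Nearest-neighbor downsizing: drops pixels and introduces aliasing artifacts.
    return np.array(Image.fromarray(img_np).resize(size, resample=Image.NEAREST))

def resize_antialiased(img_np, size):
    # PIL's bicubic filter low-pass-filters before subsampling, avoiding aliasing.
    return np.array(Image.fromarray(img_np).resize(size, resample=Image.BICUBIC))

# A synthetic high-frequency pattern makes the difference obvious.
x = np.indices((256, 256)).sum(0) % 2 * 255
img = np.stack([x] * 3, axis=-1).astype(np.uint8)

a = resize_aliased(img, (64, 64)).astype(np.float64)
b = resize_antialiased(img, (64, 64)).astype(np.float64)
print("mean abs difference between the two resizes:", np.abs(a - b).mean())
```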
138 | Lepard: Learning Partial Point Cloud Matching in Rigid and Deformable Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Lepard, a Learning based approach for partial point cloud matching in rigid and deformable scenes. |
Yang Li; Tatsuya Harada; |
139 | Virtual Elastic Objects Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present Virtual Elastic Objects (VEOs): virtual objects that not only look like their real-world counterparts but also behave like them, even when subject to novel interactions. |
Hsiao-yu Chen; Edith Tretschk; Tuur Stuyck; Petr Kadlecek; Ladislav Kavan; Etienne Vouga; Christoph Lassner; |
140 | DiSparse: Disentangled Sparsification for Multitask Model Compression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose DiSparse, a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme. |
Xinglong Sun; Ali Hassani; Zhangyang Wang; Gao Huang; Humphrey Shi; |
141 | Pushing The Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make A Difference Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We seek to push the limits of a simple-but-effective pipeline for real-world few-shot image classification in practice. To this end, we explore few-shot learning from the perspective of neural architecture, as well as a three-stage pipeline of pre-training on external data, meta-training with labelled few-shot tasks, and task-specific fine-tuning on unseen tasks. |
Shell Xu Hu; Da Li; Jan Stühmer; Minyoung Kim; Timothy M. Hospedales; |
142 | Opening Up Open World Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper addresses this evaluation deficit and lays out the landscape and evaluation methodology for detecting and tracking both known and unknown objects in the open-world setting. |
Yang Liu; Idil Esen Zulfikar; Jonathon Luiten; Achal Dave; Deva Ramanan; Bastian Leibe; Aljoša Ošep; Laura Leal-Taixé; |
143 | Towards Efficient and Scalable Sharpness-Aware Minimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose LookSAM, a novel algorithm that only periodically calculates the inner gradient ascent, to significantly reduce the additional training cost of SAM. |
Yong Liu; Siqi Mai; Xiangning Chen; Cho-Jui Hsieh; Yang You; |
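A rough way to picture the saving is a SAM-style loop that recomputes the adversarial weight perturbation only every k steps and reuses the cached perturbation in between. This is a simplification (LookSAM actually reuses a decomposed component of the SAM gradient), so treat the following PyTorch sketch as an approximation with illustrative hyperparameters.

```python
import torch

def looksam_like_step(model, loss_fn, batch, opt, state, step, k=5, rho=0.05):
    """Simplified SAM step: recompute the weight perturbation only every k steps
    and reuse the cached perturbation in between (an approximation of LookSAM)."""
    params = [p for p in model.parameters() if p.requires_grad]

    if step % k == 0:
        # full SAM: ascent step towards the perturbed weights w + e(w)
        loss = loss_fn(model(batch[0]), batch[1])
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
        state["eps"] = [rho * g / norm for g in grads]

    # apply the cached perturbation, compute the gradient there, then restore weights
    with torch.no_grad():
        for p, e in zip(params, state["eps"]):
            p.add_(e)
    loss = loss_fn(model(batch[0]), batch[1])
    opt.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p, e in zip(params, state["eps"]):
            p.sub_(e)
    opt.step()
    return loss.item()

# toy usage
model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
state = {}
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
for step in range(10):
    looksam_like_step(model, torch.nn.functional.cross_entropy, (x, y), opt, state, step)
```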
144 | VISTA: Boosting 3D Object Detection Via Dual Cross-VIew SpaTial Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to adaptively fuse multi-view features in a global spatial context via Dual Cross-VIew SpaTial Attention (VISTA). |
Shengheng Deng; Zhihao Liang; Lin Sun; Kui Jia; |
145 | Rethinking Deep Face Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Because the human visual system is very sensitive to faces, even minor facial changes may alter the identity and significantly degrade the perceptual quality. In this work, we argue the problems of existing models can be traced back to the two sub-tasks of the face restoration problem, i.e. face generation and face reconstruction, and the fragile balance between them. |
Yang Zhao; Yu-Chuan Su; Chun-Te Chu; Yandong Li; Marius Renn; Yukun Zhu; Changyou Chen; Xuhui Jia; |
146 | OSSO: Obtaining Skeletal Shape From Outside Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We address the problem of inferring the anatomic skeleton of a person, in an arbitrary pose, from the 3D surface of the body; i.e. we predict the inside (bones) from the outside (skin). |
Marilyn Keller; Silvia Zuffi; Michael J. Black; Sergi Pujades; |
147 | Temporal Alignment Networks for Long-Term Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The objective of this paper is a temporal alignment network that ingests long term video sequences, and associated text sentences, in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, then determine its alignment. |
Tengda Han; Weidi Xie; Andrew Zisserman; |
148 | Few-Shot Head Swapping in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present the Head Swapper (HeSer), which achieves few-shot head swapping in the wild through two dedicated designed modules. |
Changyong Shu; Hemao Wu; Hang Zhou; Jiaming Liu; Zhibin Hong; Changxing Ding; Junyu Han; Jingtuo Liu; Errui Ding; Jingdong Wang; |
149 | A Study on The Distribution of Social Biases in Self-Supervised Learning Visual Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study the biases of a varied set of SSL visual models, trained using ImageNet data, using a method and dataset designed by psychological experts to measure social biases. |
Kirill Sirotkin; Pablo Carballeira; Marcos Escudero-Viñolo; |
150 | LAR-SR: A Local Autoregressive Model for Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Based on the fact that given the structural information, the textural details in the natural images are locally related without long term dependency, in this paper we propose a novel autoregressive model-based SR approach, namely LAR-SR, which can efficiently generate realistic SR images using a novel local autoregressive (LAR) module. |
Baisong Guo; Xiaoyun Zhang; Haoning Wu; Yu Wang; Ya Zhang; Yan-Feng Wang; |
151 | Bayesian Invariant Risk Minimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our empirical evidence also provides supports: IRM methods that work well in typical settings significantly deteriorate even if we slightly enlarge the model size or lessen the training data. To alleviate this issue, we propose Bayesian Invariant Risk Minimization (BIRM) by introducing Bayesian inference into the IRM. |
Yong Lin; Hanze Dong; Hao Wang; Tong Zhang; |
152 | Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we aim to mine comprehensive co-salient features with democracy and reduce background interference without introducing any extra information. |
Siyue Yu; Jimin Xiao; Bingfeng Zhang; Eng Gee Lim; |
153 | Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation Via Structure Consistency Constraint Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on the low-level I2I translation, where the structure of images is highly related to their semantics. |
Jiaxian Guo; Jiachen Li; Huan Fu; Mingming Gong; Kun Zhang; Dacheng Tao; |
154 | Doodle It Yourself: Class Incremental Learning By Drawing A Few Sketches Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: For that, we present a framework that infuses (i) gradient consensus for domain invariant learning, (ii) knowledge distillation for preserving old class information, and (iii) graph attention networks for message passing between old and novel classes. |
Ayan Kumar Bhunia; Viswanatha Reddy Gajjala; Subhadeep Koley; Rohit Kundu; Aneeshan Sain; Tao Xiang; Yi-Zhe Song; |
155 | Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, instead of following previous literature, we propose Self-Supervised Predictive Learning (SSPL), a negative-free method for sound localization via explicit positive mining. |
Zengjie Song; Yuxi Wang; Junsong Fan; Tieniu Tan; Zhaoxiang Zhang; |
156 | ICON: Implicit Clothed Humans Obtained From Normals Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, our goal is to learn the avatar from only 2D images of people in unconstrained poses. |
Yuliang Xiu; Jinlong Yang; Dimitrios Tzionas; Michael J. Black; |
157 | Comparing Correspondences: Video Prediction With Correspondence-Wise Losses Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Image prediction methods often struggle on tasks that require changing the positions of objects, such as video prediction, producing blurry images that average over the many positions that objects might occupy. In this paper, we propose a simple change to existing image similarity metrics that makes them more robust to positional errors: we match the images using optical flow, then measure the visual similarity of corresponding pixels. |
Daniel Geng; Max Hamilton; Andrew Owens; |
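The mechanism is simple to sketch: estimate flow between the prediction and the target (e.g., with an off-the-shelf estimator such as RAFT, assumed here rather than implemented), backward-warp the target along that flow, and apply an ordinary photometric loss on the aligned pixels. The warping helper and zero-flow usage below are a minimal illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Backward-warp `image` (B,C,H,W) with `flow` (B,2,H,W) given in pixels,
    so that output[b,:,y,x] = image[b,:, y+flow_y, x+flow_x] (bilinear)."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).float().to(image)           # (2,H,W)
    coords = grid.unsqueeze(0) + flow                                # (B,2,H,W)
    # normalize to [-1,1] for grid_sample, which expects (B,H,W,2) with x first
    coords_x = 2 * coords[:, 0] / (w - 1) - 1
    coords_y = 2 * coords[:, 1] / (h - 1) - 1
    norm_grid = torch.stack([coords_x, coords_y], dim=-1)
    return F.grid_sample(image, norm_grid, align_corners=True)

def correspondence_wise_l1(pred, target, flow_pred_to_target):
    """L1 between the prediction and the target pixels it corresponds to under the flow
    (a sketch of a correspondence-wise loss; any photometric loss could replace L1)."""
    return (pred - warp(target, flow_pred_to_target)).abs().mean()

# hypothetical usage: the flow would come from an off-the-shelf estimator such as RAFT
pred = torch.rand(1, 3, 32, 32)
target = torch.rand(1, 3, 32, 32)
flow = torch.zeros(1, 2, 32, 32)   # zero flow reduces to an ordinary per-pixel L1
print(correspondence_wise_l1(pred, target, flow))
```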
158 | Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a generic perception architecture named Uni-Perceiver, which processes a variety of modalities and tasks with unified modeling and shared parameters. |
Xizhou Zhu; Jinguo Zhu; Hao Li; Xiaoshi Wu; Hongsheng Li; Xiaohua Wang; Jifeng Dai; |
159 | The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To build computer vision systems that truly solve real-world problems at global scale, we need benchmarks that fully capture real-world complexity, including geographic domain shift, long-tailed distributions, and data noise. We propose urban forest monitoring as an ideal testbed for studying and improving upon these computer vision challenges, while simultaneously working towards filling a crucial environmental and societal need. |
Sara Beery; Guanhang Wu; Trevor Edwards; Filip Pavetic; Bo Majewski; Shreyasee Mukherjee; Stanley Chan; John Morgan; Vivek Rathod; Jonathan Huang; |
160 | On The Instability of Relative Pose Estimation and RANSAC’s Role Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: These cases arise due to numerical instability of the 5- and 7-point minimal problems. This paper characterizes these instabilities, both in terms of minimal world scene configurations that lead to infinite condition number in epipolar estimation, and also in terms of the related minimal image feature pair correspondence configurations. |
Hongyi Fan; Joe Kileel; Benjamin Kimia; |
161 | Shape From Polarization for Complex Scenes in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a new data-driven approach with physics-based priors to scene-level normal estimation from a single polarization image. |
Chenyang Lei; Chenyang Qi; Jiaxin Xie; Na Fan; Vladlen Koltun; Qifeng Chen; |
162 | Real-Time, Accurate, and Consistent Video Semantic Segmentation Via Unsupervised Adaptation and Cross-Unit Deployment on Mobile Device Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This demonstration showcases our innovations on efficient, accurate, and temporally consistent video semantic segmentation on mobile device. |
Hyojin Park; Alan Yessenbayev; Tushar Singhal; Navin Kumar Adhikari; Yizhe Zhang; Shubhankar Mangesh Borse; Hong Cai; Nilesh Prasad Pandey; Fei Yin; Frank Mayer; Balaji Calidas; Fatih Porikli; |
163 | SNUG: Self-Supervised Neural Dynamic Garments Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a self-supervised method to learn dynamic 3D deformations of garments worn by parametric human bodies. |
Igor Santesteban; Miguel A. Otaduy; Dan Casas; |
164 | Towards Fewer Annotations: Active Learning Via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simple region-based active learning approach for semantic segmentation under a domain shift, aiming to automatically query a small partition of image regions to be labeled while maximizing segmentation performance. |
Binhui Xie; Longhui Yuan; Shuang Li; Chi Harold Liu; Xinjing Cheng; |
165 | Glass Segmentation Using Intensity and Spectral Polarization Cues Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we exploit that the light-matter interactions on glass materials provide unique intensity-polarization cues for each observed wavelength of light. |
Haiyang Mei; Bo Dong; Wen Dong; Jiaxi Yang; Seung-Hwan Baek; Felix Heide; Pieter Peers; Xiaopeng Wei; Xin Yang; |
166 | CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We observe in the real world that humans are capable of mapping the visual concepts learnt from 2D images to understand the 3D world. Encouraged by this insight, we propose CrossPoint, a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations. |
Mohamed Afham; Isuru Dissanayake; Dinithi Dissanayake; Amaya Dharmasiri; Kanchana Thilakarathna; Ranga Rodrigo; |
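The cross-modal part can be illustrated with a symmetric InfoNCE objective between matched point-cloud and image embeddings; the actual method also includes an intra-modal objective and specific encoders, which are not reproduced here. The embedding dimensions and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cross_modal_nt_xent(z_pc, z_img, temperature=0.07):
    """Symmetric InfoNCE between matched point-cloud and image embeddings (row i of
    each batch is a positive pair); a sketch of cross-modal contrastive alignment."""
    z_pc = F.normalize(z_pc, dim=-1)
    z_img = F.normalize(z_img, dim=-1)
    logits = z_pc @ z_img.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(z_pc.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# hypothetical embeddings from a point-cloud encoder and an image encoder
z_pc = torch.randn(8, 256)
z_img = torch.randn(8, 256)
print(cross_modal_nt_xent(z_pc, z_img))
```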
167 | Few Shot Generative Model Adaption Via Relaxed Spatial Structural Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing methods are prone to model overfitting and collapse in the extremely few-shot setting (less than 10). To solve this problem, we propose a relaxed spatial structural alignment (RSSA) method to calibrate the target generative models during the adaption. |
Jiayu Xiao; Liang Li; Chaofei Wang; Zheng-Jun Zha; Qingming Huang; |
168 | Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Albeit great effort made in single source domain adaptation, a more generalized task with multiple source domains remains not being well explored, due to knowledge degradation during their combination. To address this issue, we propose a novel approach, namely target-relevant knowledge preservation (TRKP), to unsupervised multi-source DAOD. |
Jiaxi Wu; Jiaxin Chen; Mengzhe He; Yiru Wang; Bo Li; Bingqi Ma; Weihao Gan; Wei Wu; Yali Wang; Di Huang; |
169 | Pyramid Grafting Network for One-Stage High Resolution Saliency Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most existing SOD models designed for low-resolution input perform poorly on high-resolution images due to the contradiction between the sampling depth and the receptive field size. Aiming at resolving this contradiction, we propose a novel one-stage framework called Pyramid Grafting Network (PGNet), using transformer and CNN backbones to extract features from different resolution images independently and then graft the features from the transformer branch to the CNN branch. |
Chenxi Xie; Changqun Xia; Mingcan Ma; Zhirui Zhao; Xiaowu Chen; Jia Li; |
170 | A Style-Aware Discriminator for Controllable Image Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This limitation largely arises because labels do not consider the semantic distance. To mitigate such problems, we propose a style-aware discriminator that acts as a critic as well as a style encoder to provide conditions. |
Kunhee Kim; Sanghun Park; Eunyeong Jeon; Taehun Kim; Daijin Kim; |
171 | Non-Iterative Recovery From Nonlinear Observations Using Generative Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we aim to estimate the direction of an underlying signal from its nonlinear observations following the semi-parametric single index model (SIM). |
Jiulong Liu; Zhaoqiang Liu; |
172 | Incremental Cross-View Mutual Distillation for Self-Supervised Medical CT Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, this paper builds a novel medical slice synthesis framework to increase the inter-slice resolution. Considering that the ground-truth intermediate medical slices are always absent in clinical practice, we introduce the incremental cross-view mutual distillation strategy to accomplish this task in a self-supervised learning manner. |
Chaowei Fang; Liang Wang; Dingwen Zhang; Jun Xu; Yixuan Yuan; Junwei Han; |
173 | Enhancing Adversarial Training With Second-Order Statistics of Weights Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we show that treating model weights as random variables allows for enhancing adversarial training through Second-Order Statistics Optimization (S^2O) with respect to the weights. |
Gaojie Jin; Xinping Yi; Wei Huang; Sven Schewe; Xiaowei Huang; |
174 | Partially Does It: Towards Scene-Level FG-SBIR With Partial Input Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: A quick pilot study reveals: (i) a scene sketch does not necessarily contain all objects in the corresponding photo, due to the subjective holistic interpretation of scenes, (ii) there exist significant empty (white) regions as a result of object-level abstraction, and as a result, (iii) existing scene-level fine-grained sketch-based image retrieval methods collapse as scene sketches become more partial. To solve this "partial" problem, we advocate for a simple set-based approach using optimal transport (OT) to model cross-modal region associativity in a partially-aware fashion. |
Pinaki Nath Chowdhury; Ayan Kumar Bhunia; Viswanatha Reddy Gajjala; Aneeshan Sain; Tao Xiang; Yi-Zhe Song; |
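As a rough illustration of OT-based region matching (not the paper's partially-aware formulation), one can run Sinkhorn iterations between region descriptors of a sketch and of a candidate photo and score the pair by the similarity carried by the transport plan. The region counts, cost function, and entropic regularizer below are illustrative.

```python
import torch

def sinkhorn(cost, a, b, eps=0.1, n_iters=100):
    """Entropic-regularized optimal transport via Sinkhorn iterations.
    cost: (n, m) cost matrix; a: (n,) and b: (m,) marginals summing to 1."""
    K = torch.exp(-cost / eps)                # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):
        u = a / (K @ (b / (K.t() @ u)))
    v = b / (K.t() @ u)
    return u.unsqueeze(1) * K * v.unsqueeze(0)    # transport plan (n, m)

# hypothetical region descriptors for a partial scene sketch and a candidate photo
sketch_regions = torch.nn.functional.normalize(torch.randn(4, 128), dim=-1)
photo_regions = torch.nn.functional.normalize(torch.randn(9, 128), dim=-1)
cost = 1 - sketch_regions @ photo_regions.t()       # cosine distance
a = torch.full((4,), 1 / 4)
b = torch.full((9,), 1 / 9)
plan = sinkhorn(cost, a, b)
score = (plan * (1 - cost)).sum()   # higher = sketch regions find good matches in the photo
print(score)
```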
175 | Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our findings motivate us to simplify MoCo v2 via the removal of its dictionary as well as momentum. |
Chaoning Zhang; Kang Zhang; Trung X. Pham; Axi Niu; Zhinan Qiao; Chang D. Yoo; In So Kweon; |
176 | Moving Window Regression: A Novel Approach to Ordinal Regression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A novel ordinal regression algorithm, called moving window regression (MWR), is proposed in this paper. |
Nyeong-Ho Shin; Seon-Ho Lee; Chang-Su Kim; |
177 | UniCoRN: A Unified Conditional Image Repainting Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing methods based on two-phase architecture design assume dependency between phases and cause color-image incongruity. To solve these problems, we propose a novel Unified Conditional image Repainting Network (UniCoRN). |
Jimeng Sun; Shuchen Weng; Zheng Chang; Si Li; Boxin Shi; |
178 | Forecasting Characteristic 3D Poses of Human Actions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To predict characteristic poses, we propose a probabilistic approach that models the possible multi-modality in the distribution of likely characteristic poses. |
Christian Diller; Thomas Funkhouser; Angela Dai; |
179 | ACPL: Anti-Curriculum Pseudo-Labelling for Semi-Supervised Medical Image Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, unlike traditional methods that select confident pseudo-labels by thresholding, we propose a new SSL algorithm, called anti-curriculum pseudo-labelling (ACPL), which introduces novel techniques to select informative unlabelled samples, improving training balance and allowing the model to work for both multi-label and multi-class problems, and to estimate pseudo labels by an accurate ensemble of classifiers (improving pseudo label accuracy). |
Fengbei Liu; Yu Tian; Yuanhong Chen; Yuyuan Liu; Vasileios Belagiannis; Gustavo Carneiro; |
180 | Learning to Deblur Using Light Field Generated and Real Defocus Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel deep defocus deblurring network that leverages the strength and overcomes the shortcoming of light fields. |
Lingyan Ruan; Bin Chen; Jizhou Li; Miuling Lam; |
181 | Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Different from related methods, we propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block. |
Nicolae-Cătălin Ristea; Neelu Madan; Radu Tudor Ionescu; Kamal Nasrollahi; Fahad Shahbaz Khan; Thomas B. Moeslund; Mubarak Shah; |
182 | Safe Self-Refinement for Transformer-Based Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we propose a novel solution named SSRT (Safe Self-Refinement for Transformer-based domain adaptation), which brings improvement from two aspects. |
Tao Sun; Cheng Lu; Tianshuo Zhang; Haibin Ling; |
183 | Density-Preserving Deep Point Cloud Compression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Local density of point clouds is crucial for representing local details, but has been overlooked by existing point cloud compression methods. To address this, we propose a novel deep point cloud compression method that preserves local density information. |
Yun He; Xinlin Ren; Danhang Tang; Yinda Zhang; Xiangyang Xue; Yanwei Fu; |
184 | StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We apply style transfer on mesh reconstructions of indoor scenes. |
Lukas Höllein; Justin Johnson; Matthias Nießner; |
185 | Which Model To Transfer? Finding The Needle in The Growing Haystack Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As these repositories keep growing exponentially, efficiently selecting a good model for the task at hand becomes paramount. We provide a formalization of this problem through a familiar notion of regret and introduce the predominant strategies, namely task-agnostic (e.g. ranking models by their ImageNet performance) and task-aware search strategies (such as linear or kNN evaluation). |
Cedric Renggli; André Susano Pinto; Luka Rimanic; Joan Puigcerver; Carlos Riquelme; Ce Zhang; Mario Lučić; |
186 | Fast and Unsupervised Action Boundary Detection for Action Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To deal with the great number of untrimmed videos produced every day, we propose an efficient unsupervised action segmentation method by detecting boundaries, named action boundary detection (ABD). |
Zexing Du; Xue Wang; Guoqing Zhou; Qing Wang; |
187 | Class-Incremental Learning With Strong Pre-Trained Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a 2-stage training scheme, i) feature augmentation – cloning part of the backbone and fine-tuning it on the novel data, and ii) fusion – combining the base and novel classifiers into a unified classifier. |
Tz-Ying Wu; Gurumurthy Swaminathan; Zhizhong Li; Avinash Ravichandran; Nuno Vasconcelos; Rahul Bhotika; Stefano Soatto; |
188 | Robust Optimization As Data Augmentation for Large-Scale Graphs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training. |
Kezhi Kong; Guohao Li; Mucong Ding; Zuxuan Wu; Chen Zhu; Bernard Ghanem; Gavin Taylor; Tom Goldstein; |
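A simplified version of the FLAG inner loop, in PyTorch, perturbs the node features for a few ascent steps while accumulating an averaged loss, then takes one optimizer step on the weights. The stand-in model, step size, and number of ascent steps are illustrative; the released implementation differs in details.

```python
import torch

def flag_training_step(model, x, y, optimizer, loss_fn, step_size=1e-3, m=3):
    """Simplified FLAG-style step: adversarially perturb input node features for m
    ascent steps while accumulating the (averaged) loss, then update the weights."""
    delta = torch.empty_like(x).uniform_(-step_size, step_size).requires_grad_(True)
    optimizer.zero_grad()
    for _ in range(m):
        loss = loss_fn(model(x + delta), y) / m
        loss.backward()
        with torch.no_grad():
            # gradient ascent on the perturbation (sign step), then reset its grad
            delta.add_(step_size * delta.grad.sign())
            delta.grad.zero_()
    optimizer.step()
    return loss.item()

# toy usage on a stand-in "GNN": a linear layer over node features
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
x, y = torch.randn(50, 16), torch.randint(0, 4, (50,))
flag_training_step(model, x, y, optimizer, torch.nn.functional.cross_entropy)
```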
189 | Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast to the literature, we propose a family of robust structured declarative classifiers for point cloud classification, where the internal constrained optimization mechanism can effectively defend adversarial attacks through implicit gradients. |
Kaidong Li; Ziming Zhang; Cuncong Zhong; Guanghui Wang; |
190 | PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Most indoor 3D scene reconstruction methods focus on recovering 3D geometry and scene layout. In this work, we go beyond this to propose PhotoScene, a framework that takes input image(s) of a scene along with approximately aligned CAD geometry (either reconstructed automatically or manually specified) and builds a photorealistic digital twin with high-quality materials and similar lighting. |
Yu-Ying Yeh; Zhengqin Li; Yannick Hold-Geoffroy; Rui Zhu; Zexiang Xu; Miloš Hašan; Kalyan Sunkavalli; Manmohan Chandraker; |
191 | Improving The Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, prior works utilize simple image transformations such as resizing, which limits input diversity. To tackle this limitation, we propose the object-based diverse input (ODI) method that draws an adversarial image on a 3D object and induces the rendered image to be classified as the target class. |
Junyoung Byun; Seungju Cho; Myung-Joon Kwon; Hee-Seon Kim; Changick Kim; |
192 | IRON: Inverse Rendering By Optimizing Neural SDFs and Materials From Photometric Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a neural inverse rendering pipeline called IRON that operates on photometric images and outputs high-quality 3D content in the format of triangle meshes and material textures readily deployable in existing graphics pipelines. |
Kai Zhang; Fujun Luan; Zhengqi Li; Noah Snavely; |
193 | ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present ObjectFolder 2.0, a large-scale, multisensory dataset of common household objects in the form of implicit neural representations that significantly enhances ObjectFolder 1.0 in three aspects. |
Ruohan Gao; Zilin Si; Yen-Yu Chang; Samuel Clarke; Jeannette Bohg; Li Fei-Fei; Wenzhen Yuan; Jiajun Wu; |
194 | Versatile Multi-Modal Pre-Training for Human-Centric Perception Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Therefore, it is desirable to have a versatile pre-train model that serves as a foundation for data-efficient downstream tasks transfer. To this end, we propose the Human-Centric Multi-Modal Contrastive Learning framework HCMoCo that leverages the multi-modal nature of human data (e.g. RGB, depth, 2D keypoints) for effective representation learning. |
Fangzhou Hong; Liang Pan; Zhongang Cai; Ziwei Liu; |
195 | 360MonoDepth: High-Resolution 360° Monocular Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a flexible framework for monocular depth estimation from high-resolution 360° images using tangent images. |
Manuel Rey-Area; Mingze Yuan; Christian Richardt; |
196 | Splicing ViT Features for Semantic Appearance Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a method for semantically transferring the visual appearance of one natural image to another. |
Narek Tumanyan; Omer Bar-Tal; Shai Bagon; Tali Dekel; |
197 | Contrastive Regression for Domain Adaptation on Gaze Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel gaze adaptation approach, namely Contrastive Regression Gaze Adaptation (CRGA), for generalizing gaze estimation on the target domain in an unsupervised manner. |
Yaoming Wang; Yangzhou Jiang; Jin Li; Bingbing Ni; Wenrui Dai; Chenglin Li; Hongkai Xiong; Teng Li; |
198 | MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose MUSE-VAE, a new probabilistic modeling framework based on a cascade of Conditional VAEs, which tackles the long-term, uncertain trajectory prediction task using a coarse-to-fine multi-factor forecasting architecture. |
Mihee Lee; Samuel S. Sohn; Seonghyeon Moon; Sejong Yoon; Mubbasir Kapadia; Vladimir Pavlovic; |
199 | Multi-View Consistent Generative Adversarial Networks for 3D-Aware Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, one key challenge remains: existing approaches lack geometry constraints, hence usually fail to generate multi-view consistent images. To address this challenge, we propose Multi-View Consistent Generative Adversarial Networks (MVCGAN) for high-quality 3D-aware image synthesis with geometry constraints. |
Xuanmeng Zhang; Zhedong Zheng; Daiheng Gao; Bang Zhang; Pan Pan; Yi Yang; |
200 | Putting People in Their Place: Monocular Regression of 3D People in Depth Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Given an image with multiple people, our goal is to directly regress the pose and shape of all the people as well as their relative depth. |
Yu Sun; Wu Liu; Qian Bao; Yili Fu; Tao Mei; Michael J. Black; |
201 | POCO: Point Convolution for Surface Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Besides, relying on fixed patch sizes may require discretization tuning. To address these issues, we propose to use point cloud convolutions and compute latent vectors at each input point. |
Alexandre Boulch; Renaud Marlet; |
202 | Memory-Augmented Non-Local Attention for Video Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simple yet effective video super-resolution method that aims at generating high-fidelity high-resolution (HR) videos from low-resolution (LR) ones. |
Jiyang Yu; Jingen Liu; Liefeng Bo; Tao Mei; |
203 | Neural Texture Extraction and Distribution for Controllable Person Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Observing that person images are highly structured, we propose to generate desired images by extracting and distributing semantic entities of reference images. |
Yurui Ren; Xiaoqing Fan; Ge Li; Shan Liu; Thomas H. Li; |
204 | Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Today’s VidSGG models are all proposal-based methods, i.e., they first generate numerous paired subject-object snippets as proposals, and then conduct predicate classification for each proposal. In this paper, we argue that this prevalent proposal-based framework has three inherent drawbacks: 1) The ground-truth predicate labels for proposals are partially correct. |
Kaifeng Gao; Long Chen; Yulei Niu; Jian Shao; Jun Xiao; |
205 | Transformer-Empowered Multi-Scale Contextual Matching and Aggregation for Multi-Contrast MRI Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing methods still have two shortcomings: (1) they neglect that the multi-contrast features at different scales contain different anatomical details and hence lack effective mechanisms to match and fuse these features for better reconstruction; and (2) they are still deficient in capturing long-range dependencies, which are essential for the regions with complicated anatomical structures. We propose a novel network to comprehensively address these problems by developing a set of innovative Transformer-empowered multi-scale contextual matching and aggregation techniques; we call it McMRSR. |
Guangyuan Li; Jun Lv; Yapeng Tian; Qi Dou; Chengyan Wang; Chenliang Xu; Jing Qin; |
206 | GazeOnce: Real-Time Multi-Person Gaze Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the first one-stage end-to-end gaze estimation method, GazeOnce, which is capable of simultaneously predicting gaze directions for multiple faces (>10) in an image. |
Mingfang Zhang; Yunfei Liu; Feng Lu; |
207 | GateHUB: Gated History Unit With Background Suppression for Online Action Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: It is therefore important to accentuate parts of the history that are more informative to the prediction of the current frame. We present GateHUB, Gated History Unit with Background Suppression, that comprises a novel position-guided gated cross-attention mechanism to enhance or suppress parts of the history as per how informative they are for current frame prediction. |
Junwen Chen; Gaurav Mittal; Ye Yu; Yu Kong; Mei Chen; |
208 | Few-Shot Font Generation By Learning Fine-Grained Local Styles Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new font generation approach by learning 1) the fine-grained local styles from references, and 2) the spatial correspondence between the content and reference glyphs. |
Licheng Tang; Yiyang Cai; Jiaming Liu; Zhibin Hong; Mingming Gong; Minhu Fan; Junyu Han; Jingtuo Liu; Errui Ding; Jingdong Wang; |
209 | Bridging Video-Text Retrieval With Multiple Choice Questions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we enable fine-grained video-text interactions while maintaining high efficiency for retrieval via a novel pretext task, dubbed as Multiple Choice Questions (MCQ), where a parametric module BridgeFormer is trained to answer the "questions" constructed by the text features via resorting to the video features. |
Yuying Ge; Yixiao Ge; Xihui Liu; Dian Li; Ying Shan; Xiaohu Qie; Ping Luo; |
210 | Depth-Aware Generative Adversarial Network for Talking Head Video Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a self-supervised face-depth learning method to automatically recover dense 3D facial geometry (i.e. depth) from the face videos without the requirement of any expensive 3D annotation data. |
Fa-Ting Hong; Longhao Zhang; Li Shen; Dan Xu; |
211 | Dual-Path Image Inpainting With Auxiliary GAN Inversion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop a dual-path inpainting network with inversion path and feed-forward path, in which inversion path provides auxiliary information to help feed-forward path. |
Wentao Wang; Li Niu; Jianfu Zhang; Xue Yang; Liqing Zhang; |
212 | DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To these ends, we propose a simpler but more effective Deep Fusion Generative Adversarial Network (DF-GAN). |
Ming Tao; Hao Tang; Fei Wu; Xiao-Yuan Jing; Bing-Kun Bao; Changsheng Xu; |
213 | Generative Flows With Invertible Attentions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To fill the gap, this paper introduces two types of invertible attention mechanisms, i.e., map-based and transformer-based attentions, for both unconditional and conditional generative flows. |
Rhea Sanjay Sukthanker; Zhiwu Huang; Suryansh Kumar; Radu Timofte; Luc Van Gool; |
214 | Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an effective solution by simply clipping the Euclidean feature magnitude while training HNNs. |
Yunhui Guo; Xudong Wang; Yubei Chen; Stella X. Yu; |
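The fix itself is a one-liner: cap the Euclidean norm of backbone features before they are mapped into hyperbolic space, which keeps embeddings away from the boundary of the ball where optimization becomes unstable. A minimal sketch, with an illustrative clipping radius:

```python
import torch

def clip_feature_norm(x, r=1.0, eps=1e-6):
    """Clip the Euclidean norm of backbone features to at most r before they are
    mapped into hyperbolic space; a sketch of the feature-clipping idea."""
    norm = x.norm(dim=-1, keepdim=True)
    scale = torch.clamp(r / (norm + eps), max=1.0)
    return x * scale

feats = torch.randn(4, 64) * 10            # hypothetical large-magnitude features
clipped = clip_feature_norm(feats, r=2.3)  # r is a hyperparameter to tune
print(feats.norm(dim=-1), clipped.norm(dim=-1))
```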
215 | Estimating Fine-Grained Noise Model Via Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we combine both noise modeling and estimation, and propose an innovative noise model estimation and noise synthesis pipeline for realistic noisy image generation. |
Yunhao Zou; Ying Fu; |
216 | DiffPoseNet: Direct Differentiable Camera Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce NFlowNet, a network for normal flow estimation that is used to enforce robust and direct constraints. |
Chethan M. Parameshwara; Gokul Hari; Cornelia Fermüller; Nitin J. Sanket; Yiannis Aloimonos; |
217 | The Flag Median and FlagIRLS Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While a number of different subspace prototypes have been described, the calculation of some of these prototypes has proven to be computationally expensive while other prototypes are affected by outliers and produce highly imperfect clustering on noisy data. This work proposes a new subspace prototype, the flag median, and introduces the FlagIRLS algorithm for its calculation. |
Nathan Mankovich; Emily J. King; Chris Peterson; Michael Kirby; |
218 | Implicit Feature Decoupling With Depthwise Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Depthwise Quantization (DQ) where quantization is applied to a decomposed sub-tensor along the feature axis of weak statistical dependence. |
Iordanis Fostiropoulos; Barry Boehm; |
219 | Graph-Context Attention Networks for Size-Varied Deep Graph Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To tackle this, we firstly propose to formulate the combinatorial problem of graph matching as an Integer Linear Programming (ILP) problem, which is more flexible and efficient to facilitate comparing graphs of varied sizes. A novel Graph-context Attention Network (GCAN), which jointly captures intrinsic graph structure and cross-graph information for improving the discrimination of node features, is then proposed and trained to resolve this ILP problem with node correspondence supervision. |
Zheheng Jiang; Hossein Rahmani; Plamen Angelov; Sue Black; Bryan M. Williams; |
220 | FENeRF: Face Editing in Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: 3D-aware GAN methods can maintain view consistency but their generated images are not locally editable. To overcome these limitations, we propose FENeRF, a 3D-aware generator that can produce view-consistent and locally-editable portrait images. |
Jingxiang Sun; Xuan Wang; Yong Zhang; Xiaoyu Li; Qi Zhang; Yebin Liu; Jue Wang; |
221 | CoNeRF: Controllable Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our key idea is to treat the attributes as latent variables that are regressed by the neural network given the scene encoding. |
Kacper Kania; Kwang Moo Yi; Marek Kowalski; Tomasz Trzciński; Andrea Tagliasacchi; |
222 | Noise2NoiseFlow: Realistic Camera Noise Modeling Without Clean Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a framework for training a noise model and a denoiser simultaneously while relying only on pairs of noisy images rather than noisy/clean paired image data. |
Ali Maleky; Shayan Kousha; Michael S. Brown; Marcus A. Brubaker; |
223 | ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we take a step towards computer-aided waste detection and present the first in-the-wild industrial-grade waste detection and segmentation dataset, ZeroWaste. |
Dina Bashkirova; Mohamed Abdelfattah; Ziliang Zhu; James Akl; Fadi Alladkani; Ping Hu; Vitaly Ablavsky; Berk Calli; Sarah Adel Bargal; Kate Saenko; |
224 | Remember Intentions: Retrospective-Memory-Based Trajectory Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To provide a more explicit link between the current situation and the seen instances, we imitate the mechanism of retrospective memory in neuropsychology and propose MemoNet, an instance-based approach that predicts the movement intentions of agents by looking for similar scenarios in the training data. |
Chenxin Xu; Weibo Mao; Wenjun Zhang; Siheng Chen; |
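The retrieval flavor of the idea can be sketched as an instance-based memory: store encodings of observed trajectories together with their eventual destinations during training, and at inference return the destinations of the most similar stored instances as candidate intentions. This toy NumPy sketch ignores the paper's learned memory addressing and trajectory decoding, and all names and shapes are illustrative.

```python
import numpy as np

class TrajectoryMemory:
    """Toy instance-based memory: stores (past-trajectory feature, future destination)
    pairs and, at inference, returns the destinations of the k most similar instances."""
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, past_feat, destination):
        self.keys.append(np.asarray(past_feat, dtype=np.float64))
        self.values.append(np.asarray(destination, dtype=np.float64))

    def read(self, past_feat, k=3):
        keys = np.stack(self.keys)
        d = np.linalg.norm(keys - np.asarray(past_feat), axis=1)
        idx = np.argsort(d)[:k]
        return [self.values[i] for i in idx]

memory = TrajectoryMemory()
rng = np.random.default_rng(0)
for _ in range(100):                       # fill the memory from "training" instances
    past = rng.normal(size=8)              # e.g., an encoding of the observed trajectory
    memory.write(past, past[:2] + rng.normal(scale=0.1, size=2))
print(memory.read(rng.normal(size=8), k=3))    # candidate intentions for a new agent
```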
225 | Measuring Compositional Consistency for Video Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop a question decomposition engine that programmatically deconstructs a compositional question into a directed acyclic graph of sub-questions. |
Mona Gandhi; Mustafa Omer Gul; Eva Prakash; Madeleine Grunde-McLaughlin; Ranjay Krishna; Maneesh Agrawala; |
226 | Category Contrast for Unsupervised Domain Adaptation in Visual Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we explore the idea of instance contrastive learning in unsupervised domain adaptation (UDA) and propose a novel Category Contrast technique (CaCo) that introduces semantic priors on top of instance discrimination for visual UDA tasks. |
Jiaxing Huang; Dayan Guan; Aoran Xiao; Shijian Lu; Ling Shao; |
227 | SwapMix: Diagnosing and Regularizing The Over-Reliance on Visual Context in Visual Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we study the robustness of VQA models from a novel perspective: visual context. |
Vipul Gupta; Zhuowan Li; Adam Kortylewski; Chenyu Zhang; Yingwei Li; Alan Yuille; |
228 | UNIST: Unpaired Neural Implicit Shape Translation Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce UNIST, the first deep neural implicit model for general-purpose, unpaired shape-to-shape translation, in both 2D and 3D domains. |
Qimin Chen; Johannes Merz; Aditya Sanghi; Hooman Shayani; Ali Mahdavi-Amiri; Hao Zhang; |
229 | Local-Adaptive Face Recognition Via Graph-Based Meta-Clustering and Regularized Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To support continuous learning and fill the last-mile quality gap, we introduce a new problem setup called Local-Adaptive Face Recognition (LaFR). |
Wenbin Zhu; Chien-Yi Wang; Kuan-Lun Tseng; Shang-Hong Lai; Baoyuan Wang; |
230 | The DEVIL Is in The Details: A Diagnostic Evaluation Benchmark for Video Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although attributes such as camera and background scene motion inherently change the difficulty of the task and affect methods differently, existing evaluation schemes fail to control for them, thereby providing minimal insight into inpainting failure modes. To address this gap, we propose the Diagnostic Evaluation of Video Inpainting on Landscapes (DEVIL) benchmark, which consists of two contributions: (i) a novel dataset of videos and masks labeled according to several key inpainting failure modes, and (ii) an evaluation scheme that samples slices of the dataset characterized by a fixed content attribute, and scores performance on each slice according to reconstruction, realism, and temporal consistency quality. |
Ryan Szeto; Jason J. Corso; |
231 | Mutual Information-Driven Pan-Sharpening Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This leads to information redundancy not being handled well, which further limits the performance of these methods. To address the above issue, we propose a novel mutual information-driven Pan-sharpening framework in this paper. |
Man Zhou; Keyu Yan; Jie Huang; Zihe Yang; Xueyang Fu; Feng Zhao; |
232 | Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Query-modulated Refinement Network (QRNet) to address the inconsistent issue by adjusting intermediate features in the visual backbone with a novel Query-aware Dynamic Attention (QD-ATT) mechanism and query-aware multiscale fusion. |
Jiabo Ye; Junfeng Tian; Ming Yan; Xiaoshan Yang; Xuwu Wang; Ji Zhang; Liang He; Xin Lin; |
233 | A Framework for Learning Ante-Hoc Explainable Models Via Concepts Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Self-explaining deep models are designed to learn the latent concept-based explanations implicitly during training, which eliminates the requirement of any post-hoc explanation generation technique. In this work, we propose one such model that appends an explanation generation module on top of any basic network and jointly trains the whole module that shows high predictive performance and generates meaningful explanations in terms of concepts. |
Anirban Sarkar; Deepak Vijaykeerthy; Anindya Sarkar; Vineeth N Balasubramanian; |
234 | Generating Useful Accident-Prone Driving Scenarios Via A Learned Traffic Prior Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce STRIVE, a method to automatically generate challenging scenarios that cause a given planner to produce undesirable behavior, like collisions. |
Davis Rempe; Jonah Philion; Leonidas J. Guibas; Sanja Fidler; Or Litany; |
235 | FLOAT: Factorized Learning of Object Attributes for Improved Multi-Object Multi-Part Scene Parsing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose FLOAT, a factorized label space framework for scalable multi-object multi-part parsing. |
Rishubh Singh; Pranav Gupta; Pradeep Shenoy; Ravikiran Sarvadevabhatla; |
236 | Efficient Geometry-Aware 3D Generative Adversarial Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. |
Eric R. Chan; Connor Z. Lin; Matthew A. Chan; Koki Nagano; Boxiao Pan; Shalini De Mello; Orazio Gallo; Leonidas J. Guibas; Jonathan Tremblay; Sameh Khamis; Tero Karras; Gordon Wetzstein; |
237 | DO-GAN: A Double Oracle Framework for Generative Adversarial Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new approach to train Generative Adversarial Networks (GANs) where we deploy a double-oracle framework using the generator and discriminator oracles. |
Aye Phyu Phyu Aung; Xinrun Wang; Runsheng Yu; Bo An; Senthilnath Jayavelu; Xiaoli Li; |
238 | Dancing Under The Stars: Video Denoising in Starlight Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we demonstrate photorealistic video under starlight (no moon present, <0.001 lux) for the first time. |
Kristina Monakhova; Stephan R. Richter; Laura Waller; Vladlen Koltun; |
239 | FocusCut: Diving Into A Focus View in Interactive Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the global view makes the model lose focus from later clicks, and is not in line with user intentions. In this paper, we dive into the view of clicks’ eyes to endow them with the decisive role in object details again. |
Zheng Lin; Zheng-Peng Duan; Zhao Zhang; Chun-Le Guo; Ming-Ming Cheng; |
240 | Medial Spectral Coordinates for 3D Shape Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Yet, surprisingly, such coordinates have thus far typically considered only local surface positional or derivative information. In the present article, we propose to equip spectral coordinates with medial (object width) information, so as to enrich them. |
Morteza Rezanejad; Mohammad Khodadad; Hamidreza Mahyar; Herve Lombaert; Michael Gruninger; Dirk Walther; Kaleem Siddiqi; |
241 | Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present Contextualized Spatio-Temporal Contrastive Learning (ConST-CL) to effectively learn spatio-temporally fine-grained video representations via self-supervision. |
Liangzhe Yuan; Rui Qian; Yin Cui; Boqing Gong; Florian Schroff; Ming-Hsuan Yang; Hartwig Adam; Ting Liu; |
242 | Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we demonstrate that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. |
Liangqiong Qu; Yuyin Zhou; Paul Pu Liang; Yingda Xia; Feifei Wang; Ehsan Adeli; Li Fei-Fei; Daniel Rubin; |
243 | APES: Articulated Part Extraction From Sprite Sheets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Creating these puppets requires partitioning characters into independently moving parts. In this work, we present a method to automatically identify such articulated parts from a small set of character poses shown in a sprite sheet, which is an illustration of the character that artists often draw before puppet creation. |
Zhan Xu; Matthew Fisher; Yang Zhou; Deepali Aneja; Rushikesh Dudhat; Li Yi; Evangelos Kalogerakis; |
244 | Dressing in The Wild By Watching Dance Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper, therefore, attends to virtual try-on in real-world scenes and brings essential improvements in authenticity and naturalness, especially for loose garments (e.g., skirts, formal dresses), challenging poses (e.g., crossed arms, bent legs), and cluttered backgrounds. |
Xin Dong; Fuwei Zhao; Zhenyu Xie; Xijin Zhang; Daniel K. Du; Min Zheng; Xiang Long; Xiaodan Liang; Jianchao Yang; |
245 | SPAct: Self-Supervised Privacy Preservation for Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For the first time, we present a novel training framework which removes privacy information from input video in a self-supervised manner without requiring privacy labels. |
Ishan Rajendrakumar Dave; Chen Chen; Mubarak Shah; |
246 | Uni6D: A Unified CNN Framework Without Projection Breakdown for 6D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As a consequence, the 3D structure is no longer preserved by a modified depth image or feature. To address this issue, we propose a simple yet effective method denoted as Uni6D that explicitly takes the extra UV data along with RGB-D images as input. |
Xiaoke Jiang; Donghai Li; Hao Chen; Ye Zheng; Rui Zhao; Liwei Wu; |
247 | De-Rendering 3D Objects in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a weakly supervised method that is able to decompose a single image of an object into shape (depth and normals), material (albedo, reflectivity and shininess) and global lighting parameters. |
Felix Wimbauer; Shangzhe Wu; Christian Rupprecht; |
248 | SPAMs: Structured Implicit Parametric Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We observe that deformable object motion is often semantically structured, and thus propose to learn Structured-implicit PArametric Models (SPAMs) as a deformable object representation that structurally decomposes non-rigid object motion into part-based disentangled representations of shape and pose, with each being represented by deep implicit functions. |
Pablo Palafox; Nikolaos Sarafianos; Tony Tung; Angela Dai; |
249 | Global Sensing and Measurements Reuse for Image Compressed Sensing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Moreover, using measurements only once may not be enough for extracting richer information from measurements. To address these issues, we propose a novel Measurements Reuse Convolutional Compressed Sensing Network (MR-CCSNet) which employs a Global Sensing Module (GSM) to collect features at all levels for efficient sensing and a Measurements Reuse Block (MRB) to reuse measurements multiple times at multiple scales. |
Zi-En Fan; Feng Lian; Jia-Ni Quan; |
250 | SeeThroughNet: Resurrection of Auxiliary Loss By Preserving Class Probability Information Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce Class Probability Preserving (CPP) pooling to alleviate information loss in down-sampling the ground truth in semantic segmentation tasks. |
Dasol Han; Jaewook Yoo; Dokwan Oh; |
251 | Representing 3D Shapes With Probabilistic Directed Distance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we endeavour to address both shortcomings with a novel shape representation that allows fast differentiable rendering within an implicit architecture. |
Tristan Aumentado-Armstrong; Stavros Tsogkas; Sven Dickinson; Allan D. Jepson; |
252 | Learning ABCs: Approximate Bijective Correspondence for Isolating Factors of Variation With Weak Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel algorithm that utilizes a weak form of supervision where the data is partitioned into sets according to certain inactive (common) factors of variation which are invariant across elements of each set. |
Kieran A. Murphy; Varun Jampani; Srikumar Ramalingam; Ameesh Makadia; |
253 | ABO: Dataset and Benchmarks for Real-World 3D Object Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Amazon Berkeley Objects (ABO), a new large-scale dataset designed to help bridge the gap between real and virtual 3D worlds. |
Jasmine Collins; Shubham Goel; Kenan Deng; Achleshwar Luthra; Leon Xu; Erhan Gundogdu; Xi Zhang; Tomas F. Yago Vicente; Thomas Dideriksen; Himanshu Arora; Matthieu Guillaumin; Jitendra Malik; |
254 | DETReg: Unsupervised Pretraining With Region Priors for Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead, we introduce DETReg, a new self-supervised method that pretrains the entire object detection network, including the object localization and embedding components. |
Amir Bar; Xin Wang; Vadim Kantorov; Colorado J. Reed; Roei Herzig; Gal Chechik; Anna Rohrbach; Trevor Darrell; Amir Globerson; |
255 | Learning To Restore 3D Face From In-the-Wild Degraded Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In-the-wild 3D face modelling is a challenging problem as the predicted facial geometry and texture suffer from a lack of reliable clues or priors, when the input images are degraded. To address such a problem, in this paper we propose a novel Learning to Restore (L2R) 3D face framework for unsupervised high-quality face reconstruction from low-resolution images. |
Zhenyu Zhang; Yanhao Ge; Ying Tai; Xiaoming Huang; Chengjie Wang; Hao Tang; Dongjin Huang; Zhifeng Xie; |
256 | Practical Evaluation of Adversarial Robustness Via Adaptive Auto Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A practical evaluation method should be convenient (i.e., parameter-free), efficient (i.e., fewer iterations) and reliable (i.e., approaching the lower bound of robustness). Towards this target, we propose a parameter-free Adaptive Auto Attack (A3) evaluation method which addresses the efficiency and reliability in a test-time-training fashion. |
Ye Liu; Yaya Cheng; Lianli Gao; Xianglong Liu; Qilong Zhang; Jingkuan Song; |
257 | Convolutions for Spatial Interaction Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we consider the problem of spatial interaction modeling in the context of predicting the motion of actors around autonomous vehicles, and investigate alternatives to GNNs. |
Zhaoen Su; Chao Wang; David Bradley; Carlos Vallespi-Gonzalez; Carl Wellington; Nemanja Djuric; |
258 | MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For detecting actions in those complex videos, efficiently capturing both short-term and long-term temporal information in the video is critical. To this end, we propose a novel ConvTransformer network for action detection. |
Rui Dai; Srijan Das; Kumara Kahatapitiya; Michael S. Ryoo; François Brémond; |
259 | Salvage of Supervision in Weakly Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To bridge the performance and technical gaps between WSOD and FSOD, this paper proposes a new framework, Salvage of Supervision (SoS), with the key idea being to harness every potentially useful supervisory signal in WSOD: the weak image-level labels, the pseudo-labels, and the power of semi-supervised object detection. |
Lin Sui; Chen-Lin Zhang; Jianxin Wu; |
260 | Cross-View Transformers for Real-Time Map-View Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present cross-view transformers, an efficient attention-based model for map-view semantic segmentation from multiple cameras. |
Brady Zhou; Philipp Krähenbühl; |
261 | Distinguishing Unseen From Seen for Generalized Zero-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a novel method which leverages both visual and semantic modalities to distinguish seen and unseen categories. |
Hongzu Su; Jingjing Li; Zhi Chen; Lei Zhu; Ke Lu; |
262 | Online Continual Learning on A Contaminated Data Stream With Blurry Task Boundaries Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To balance diversity and purity in the episodic memory, we propose a novel strategy that manages and uses the memory through a unified approach of label-noise-aware diverse sampling and robust semi-supervised learning. |
Jihwan Bang; Hyunseo Koh; Seulki Park; Hwanjun Song; Jung-Woo Ha; Jonghyun Choi; |
263 | Controllable Dynamic Multi-Task Architectures Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This challenge motivates models which allow control over the relative importance of tasks and total compute cost during inference time. In this work, we propose such a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints. |
Dripta S. Raychaudhuri; Yumin Suh; Samuel Schulter; Xiang Yu; Masoud Faraki; Amit K. Roy-Chowdhury; Manmohan Chandraker; |
264 | Learning To Imagine: Diversify Memory for Incremental Learning Using Unlabeled Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although maintaining a handful of samples (called "exemplars") of each task could alleviate forgetting to some extent, existing methods are still limited by the small number of exemplars since these exemplars are too few to carry enough task-specific knowledge, and therefore forgetting remains. To overcome this problem, we propose to "imagine" diverse counterparts of given exemplars by referring to the abundant semantically irrelevant information in unlabeled data. |
Yu-Ming Tang; Yi-Xing Peng; Wei-Shi Zheng; |
265 | SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we ask, and answer, the wide-ranging question across all MBODFs: How to expose the right set of execution branches and then how to schedule the optimal one at inference time? |
Ran Xu; Fangzhou Mu; Jayoung Lee; Preeti Mukherjee; Somali Chaterji; Saurabh Bagchi; Yin Li; |
266 | VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Hence, in this paper, we introduce adapter-based parameter-efficient transfer learning techniques to V&L models such as VL-BART and VL-T5. |
Yi-Lin Sung; Jaemin Cho; Mohit Bansal; |
267 | Deep Hybrid Models for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a principled and practical method for out-of-distribution (OoD) detection with deep hybrid models (DHMs), which model the joint density p(x,y) of features and labels with a single forward pass. |
Senqi Cao; Zhongfei Zhang; |
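Entry 267 hinges on obtaining a class posterior and a density estimate from one forward pass so that the joint density p(x,y) can score out-of-distribution inputs. The sketch below is a simplified stand-in: a shared encoder, a classifier head, and a diagonal-Gaussian latent density instead of whatever density model the paper actually uses; all module names and sizes are assumptions.

```python
import math
import torch
import torch.nn as nn

class HybridOoDNet(nn.Module):
    """One forward pass yields both a class posterior and a latent density (minimal sketch)."""
    def __init__(self, in_dim=784, z_dim=32, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, z_dim))
        self.classifier = nn.Linear(z_dim, n_classes)
        # diagonal-Gaussian density head, a stand-in for a learned density model over features
        self.mu = nn.Parameter(torch.zeros(z_dim))
        self.log_sigma = nn.Parameter(torch.zeros(z_dim))

    def forward(self, x):
        z = self.encoder(x)
        logits = self.classifier(z)
        log_pz = (-0.5 * ((z - self.mu) / self.log_sigma.exp()) ** 2
                  - self.log_sigma - 0.5 * math.log(2 * math.pi)).sum(dim=1)
        return logits, log_pz

model = HybridOoDNet()
logits, log_pz = model(torch.randn(4, 784))
# joint score log p(z) + log p(y|z); low values flag likely out-of-distribution inputs
score = log_pz + logits.log_softmax(dim=1).max(dim=1).values
```

The point of the single shared pass is that the density term costs almost nothing on top of classification, unlike pipelines that run a separate density model.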
268 | Accelerating Video Object Segmentation With Compressed Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an efficient plug-and-play acceleration framework for semi-supervised video object segmentation by exploiting the temporal redundancies in videos presented by the compressed bitstream. |
Kai Xu; Angela Yao; |
269 | Exploring Domain-Invariant Parameters for Source Free Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The motivation behind this insight is clear: domain-invariant representations are governed by only a subset of the parameters of an available deep source model. We devise the Domain-Invariant Parameter Exploring (DIPE) approach to capture such domain-invariant parameters in the source model and generate domain-invariant representations. |
Fan Wang; Zhongyi Han; Yongshun Gong; Yilong Yin; |
270 | FastDOG: Fast Discrete Optimization on GPU Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a massively parallel Lagrange decomposition method for solving 0–1 integer linear programs occurring in structured prediction. |
Ahmed Abbas; Paul Swoboda; |
271 | Fire Together Wire Together: A Dynamic Pruning Approach With Self-Supervised Mask Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our contribution is two-fold: 1) decoupled task and pruning training, and 2) simple hyperparameter selection that enables estimating the FLOPs reduction before training. |
Sara Elkerdawy; Mostafa Elhoushi; Hong Zhang; Nilanjan Ray; |
272 | Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel multi-source uncertainty mining method to facilitate unsupervised deep learning from multiple noisy labels generated by traditional handcrafted SOD methods. |
Yifan Wang; Wenbo Zhang; Lijun Wang; Ting Liu; Huchuan Lu; |
273 | Self-Supervised Equivariant Learning for Oriented Keypoint Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To learn to detect robust oriented keypoints, we introduce a self-supervised learning framework using rotation-equivariant CNNs. |
Jongmin Lee; Byungjin Kim; Minsu Cho; |
274 | Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The results show that GANs, especially small GANs lack the ability to generate high-quality high frequency information. To address this problem, we propose a novel knowledge distillation method referred to as wavelet knowledge distillation. |
Linfeng Zhang; Xin Chen; Xiaobing Tu; Pengfei Wan; Ning Xu; Kaisheng Ma; |
275 | Focal and Global Knowledge Distillation for Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we point out that in object detection, the features of the teacher and student vary greatly in different areas, especially in the foreground and background. |
Zhendong Yang; Zhe Li; Xiaohu Jiang; Yuan Gong; Zehuan Yuan; Danpei Zhao; Chun Yuan; |
276 | Learning To Prompt for Continual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Typical methods rely on a rehearsal buffer or known task identity at test time to retrieve learned knowledge and address forgetting, while this work presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time. |
Zifeng Wang; Zizhao Zhang; Chen-Yu Lee; Han Zhang; Ruoxi Sun; Xiaoqi Ren; Guolong Su; Vincent Perot; Jennifer Dy; Tomas Pfister; |
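Entry 276 replaces a rehearsal buffer with a small pool of learnable prompts that are retrieved at test time without knowing the task identity. The sketch below shows one plausible prompt-pool mechanism based on key-query cosine matching; the pool size, prompt length, and selection rule are assumptions for illustration rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class PromptPool(nn.Module):
    """A small pool of learnable prompts selected by key-query matching (sketch)."""
    def __init__(self, pool_size=10, prompt_len=5, dim=768, top_k=3):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, dim))
        self.prompts = nn.Parameter(torch.randn(pool_size, prompt_len, dim))
        self.top_k = top_k

    def forward(self, query):                                  # query: (B, dim), e.g. a [CLS] feature
        sim = nn.functional.normalize(query, dim=-1) @ nn.functional.normalize(self.keys, dim=-1).T
        idx = sim.topk(self.top_k, dim=-1).indices             # (B, top_k) best-matching prompts
        selected = self.prompts[idx]                           # (B, top_k, prompt_len, dim)
        return selected.flatten(1, 2), sim                     # tokens to prepend to the sequence

pool = PromptPool()
query = torch.randn(2, 768)
prompt_tokens, sim = pool(query)     # (2, 15, 768): prepend to a frozen backbone's token sequence
```

Because retrieval depends only on the input-derived query, no task label is needed at inference, and the prompts themselves act as the compact memory that is updated across tasks.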
277 | Human Mesh Recovery From Multiple Shots Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the richness of data comes at the expense of fundamental challenges such as abrupt shot changes and close up shots of actors with heavy truncation, which limits the applicability of existing 3D human understanding methods. In this paper, we address these limitations with the insight that while shot changes of the same scene incur a discontinuity between frames, the 3D structure of the scene still changes smoothly. |
Georgios Pavlakos; Jitendra Malik; Angjoo Kanazawa; |
278 | Improving Adversarial Transferability Via Neuron Attribution-Based Attacks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing feature-level attacks generally employ inaccurate neuron importance estimations, which deteriorates their transferability. To overcome such pitfalls, in this paper, we propose the Neuron Attribution-based Attack (NAA), which conducts feature-level attacks with more accurate neuron importance estimations. |
Jianping Zhang; Weibin Wu; Jen-tse Huang; Yizhan Huang; Wenxuan Wang; Yuxin Su; Michael R. Lyu; |
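Entry 278 concerns feature-level transfer attacks whose success depends on how neuron importance is estimated. The code below is not NAA itself (the paper's attribution integrates over a path of scaled inputs); it is a simplified, hypothetical feature-level attack that weights intermediate activations by a single activation-times-gradient estimate and then suppresses the important ones.

```python
import torch
import torch.nn as nn

def feature_level_attack(model, layer, x, y, eps=8 / 255, steps=10):
    """Simplified feature-level attack: suppress neurons with positive attribution (sketch)."""
    feats = {}
    hook = layer.register_forward_hook(lambda m, i, o: feats.update(out=o))

    # 1) estimate neuron importance on the clean input: activation * gradient of the loss
    x0 = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x0), y)
    grad_feat = torch.autograd.grad(loss, feats["out"])[0]
    importance = (feats["out"] * grad_feat).detach()

    # 2) iteratively perturb the input to push down the activations of important neurons
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        model(x + delta)
        attack_loss = (feats["out"] * importance).sum()
        g = torch.autograd.grad(attack_loss, delta)[0]
        delta = (delta - (eps / steps) * g.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    hook.remove()
    return (x + delta).clamp(0, 1).detach()

# usage with a toy model; in practice `layer` would be a mid-level block of the target CNN
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.rand(2, 3, 32, 32), torch.randint(0, 10, (2,))
x_adv = feature_level_attack(model, layer=model[2], x=x, y=y)
```

The paper's claim is that better attribution estimates at this step translate into adversarial examples that transfer better across architectures.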
279 | Better Trigger Inversion Optimization in Backdoor Scanning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We develop a new optimization method that directly minimizes individual pixel changes, without using a mask. |
Guanhong Tao; Guangyu Shen; Yingqi Liu; Shengwei An; Qiuling Xu; Shiqing Ma; Pan Li; Xiangyu Zhang; |
280 | GANSeg: Learning To Segment By Unsupervised Hierarchical Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Weakly-supervised and unsupervised methods exist, but they depend on the comparison of pairs of images, such as from multi-views, frames of videos, and image augmentation, which limits their applicability. To address this, we propose a GAN-based approach that generates images conditioned on latent masks, thereby alleviating full or weak annotations required in previous approaches. |
Xingzhe He; Bastian Wandt; Helge Rhodin; |
281 | Dense Learning Based Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although a few works have proposed various self-training-based or consistency-regularization-based methods, they all target anchor-based detectors and ignore anchor-free detectors, on which actual industrial deployments often depend. To this end, we bridge this gap by proposing a DenSe Learning (DSL) based algorithm for anchor-free SSOD. |
Binghui Chen; Pengyu Li; Xiang Chen; Biao Wang; Lei Zhang; Xian-Sheng Hua; |
282 | Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To mimic humans’ mental simulation process, we present FixNet, a novel framework that seamlessly incorporates perception and physical dynamics. |
Yining Hong; Kaichun Mo; Li Yi; Leonidas J. Guibas; Antonio Torralba; Joshua B. Tenenbaum; Chuang Gan; |
283 | Convolution of Convolution: Let Kernels Spatially Collaborate Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In the biological visual pathway, especially the retina, neurons are tiled along spatial dimensions with the electrical coupling as their local association, while in a convolution layer, kernels are placed along the channel dimension singly. We propose Convolution of Convolution, associating kernels in a layer and letting them collaborate spatially. |
Rongzhen Zhao; Jian Li; Zhenzhi Wu; |
284 | Make It Move: Controllable Image-to-Video Generation With Text Descriptions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The key challenges of TI2V task lie both in aligning appearance and motion from different modalities, and in handling uncertainty in text descriptions. To address these challenges, we propose a Motion Anchor-based video GEnerator (MAGE) with an innovative motion anchor (MA) structure to store appearance-motion aligned representation. |
Yaosi Hu; Chong Luo; Zhenzhong Chen; |
285 | C2AM Loss: Chasing A Better Decision Boundary for Long-Tail Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Hence, we devise a Category-Aware Angular Margin Loss (C2AM Loss) to introduce an adaptive angular margin between any two categories. |
Tong Wang; Yousong Zhu; Yingying Chen; Chaoyang Zhao; Bin Yu; Jinqiao Wang; Ming Tang; |
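Entry 285 builds its decision boundary with an angular margin that adapts to the category pair. As a rough illustration only, the sketch below implements a cosine classifier whose additive angular margin grows for rarer classes; the margin rule, scale, and exponent are assumptions and do not reproduce the paper's exact C2AM formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAngularMarginLoss(nn.Module):
    """Cosine classifier with a per-class additive angular margin (illustrative sketch)."""
    def __init__(self, feat_dim, class_counts, scale=20.0, base_margin=0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(len(class_counts), feat_dim))
        counts = torch.as_tensor(class_counts, dtype=torch.float)
        # rarer classes get a larger margin (one plausible choice, assumed here)
        self.margins = base_margin * (counts.max() / counts) ** 0.25
        self.scale = scale

    def forward(self, feats, labels):
        cos = F.normalize(feats) @ F.normalize(self.weight).T          # (B, C) cosine logits
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        margin = self.margins.to(feats.device)[labels]                 # per-sample target margin
        one_hot = F.one_hot(labels, cos.size(1)).float()
        cos_margin = torch.cos(theta + margin.unsqueeze(1) * one_hot)  # margin on target class only
        return F.cross_entropy(self.scale * cos_margin, labels)

loss_fn = AdaptiveAngularMarginLoss(feat_dim=64, class_counts=[500, 50, 5])
loss = loss_fn(torch.randn(8, 64), torch.randint(0, 3, (8,)))
```

Pushing a larger margin onto rare (tail) classes widens their decision regions, which is the intuition behind margin-based remedies for long-tailed detection.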
286 | Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Neural Points, a novel point cloud representation and apply it to the arbitrary-factored upsampling task. |
Wanquan Feng; Jin Li; Hongrui Cai; Xiaonan Luo; Juyong Zhang; |
287 | Distribution Consistent Neural Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Instead, in this paper, we propose a novel distribution consistent one-shot neural architecture search algorithm. |
Junyi Pan; Chong Sun; Yizhou Zhou; Ying Zhang; Chen Li; |
288 | Video-Text Representation Learning Via Differentiable Weak Temporal Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel multi-modal self-supervised framework Video-Text Temporally Weak Alignment-based Contrastive Learning (VT-TWINS) to capture significant information from noisy and weakly correlated data using a variant of Dynamic Time Warping (DTW). |
Dohwan Ko; Joonmyung Choi; Juyeon Ko; Shinyeong Noh; Kyoung-Woon On; Eun-Sol Kim; Hyunwoo J. Kim; |
289 | Bi-Directional Object-Context Prioritization Learning for Saliency Ranking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This inspires us to model the region-level interactions, in addition to the object-level reasoning, for saliency ranking. To this end, we propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking. |
Xin Tian; Ke Xu; Xin Yang; Lin Du; Baocai Yin; Rynson W.H. Lau; |
290 | FreeSOLO: Learning To Segment Objects Without Annotations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present FreeSOLO, a self-supervised instance segmentation framework built on top of the simple instance segmentation method SOLO. |
Xinlong Wang; Zhiding Yu; Shalini De Mello; Jan Kautz; Anima Anandkumar; Chunhua Shen; Jose M. Alvarez; |
291 | What Do Navigation Agents Learn About Their Environment? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce the Interpretability System for Embodied agEnts (iSEE) for Point Goal (PointNav) and Object Goal (ObjectNav) navigation models. |
Kshitij Dwivedi; Gemma Roig; Aniruddha Kembhavi; Roozbeh Mottaghi; |
292 | Progressive Minimal Path Method With Embedded CNN Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Path-CNN, a method for the segmentation of centerlines of tubular structures by embedding convolutional neural networks (CNNs) into the progressive minimal path method. |
Wei Liao; |
293 | FIFO: Learning Fog-Invariant Features for Foggy Scene Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Robust visual recognition under adverse weather conditions is of great importance in real-world applications. In this context, we propose a new method for learning semantic segmentation models robust against fog. |
Sohyun Lee; Taeyoung Son; Suha Kwak; |
294 | 3D Human Tongue Reconstruction From Single "In-the-Wild" Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we present the first, to the best of our knowledge, end-to-end trainable pipeline that accurately reconstructs the 3D face together with the tongue. |
Stylianos Ploumpis; Stylianos Moschoglou; Vasileios Triantafyllou; Stefanos Zafeiriou; |
295 | Enhancing Adversarial Robustness for Deep Metric Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Conversely, we propose Hardness Manipulation to efficiently perturb the training triplet till a specified level of hardness for adversarial training, according to a harder benign triplet or a pseudo-hardness function. |
Mo Zhou; Vishal M. Patel; |
296 | Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, ViTs are mainly designed for image classification and generate single-scale low-resolution representations, which makes dense prediction tasks such as semantic segmentation challenging for ViTs. Therefore, we propose HRViT, which enhances ViTs to learn semantically rich and spatially precise multi-scale representations by integrating high-resolution multi-branch architectures with ViTs. |
Jiaqi Gu; Hyoukjun Kwon; Dilin Wang; Wei Ye; Meng Li; Yu-Hsin Chen; Liangzhen Lai; Vikas Chandra; David Z. Pan; |
297 | Lite-MDETR: A Lightweight Multi-Modal Detector Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a Lightweight modulated detector, Lite-MDETR, to facilitate efficient end-to-end multi-modal understanding on mobile devices. |
Qian Lou; Yen-Chang Hsu; Burak Uzkent; Ting Hua; Yilin Shen; Hongxia Jin; |
298 | CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce Coordinate GAN (CoordGAN), a structure-texture disentangled GAN that learns a dense correspondence map for each generated image. |
Jiteng Mu; Shalini De Mello; Zhiding Yu; Nuno Vasconcelos; Xiaolong Wang; Jan Kautz; Sifei Liu; |
299 | A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a simple transfer learning baseline for sign language translation. |
Yutong Chen; Fangyun Wei; Xiao Sun; Zhirong Wu; Stephen Lin; |
300 | Unsupervised Visual Representation Learning By Online Constrained K-Means Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address these challenges, we first investigate the objective of clustering-based representation learning. Based on this, we propose a novel clustering-based pretext task with online Constrained K-means (CoKe). |
Qi Qian; Yuanhong Xu; Juhua Hu; Hao Li; Rong Jin; |
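Entry 300 uses constrained k-means as a pretext task so that mini-batch cluster assignments stay balanced. The sketch below illustrates the constrained-assignment idea with a simple greedy per-batch capacity limit and an online centroid update; CoKe's actual objective uses lower-bound constraints and a different optimization, so treat the rule and constants here as assumptions.

```python
import torch

def constrained_assign(feats, centers, max_per_cluster):
    """Greedy balanced assignment: each cluster accepts at most `max_per_cluster` points (sketch)."""
    sim = torch.nn.functional.normalize(feats, dim=1) @ torch.nn.functional.normalize(centers, dim=1).T
    assign = torch.full((feats.size(0),), -1, dtype=torch.long)
    load = torch.zeros(centers.size(0), dtype=torch.long)
    # visit (point, cluster) pairs from most to least similar
    for flat in sim.flatten().argsort(descending=True):
        i, k = divmod(flat.item(), centers.size(0))
        if assign[i] == -1 and load[k] < max_per_cluster:
            assign[i] = k
            load[k] = load[k] + 1
    return assign

# online update: nudge each hit centroid toward the mean of its newly assigned points
feats, centers = torch.randn(32, 16), torch.randn(4, 16)
assign = constrained_assign(feats, centers, max_per_cluster=8)
for k in range(centers.size(0)):
    members = feats[assign == k]
    if len(members):
        centers[k] = 0.9 * centers[k] + 0.1 * members.mean(dim=0)
```

The constraint prevents the degenerate solution in which all representations collapse into a single cluster, which is the main failure mode of naive clustering-based pretext tasks.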
301 | Neural Point Light Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce Neural Point Light Fields that represent scenes implicitly with a light field living on a sparse point cloud. |
Julian Ost; Issam Laradji; Alejandro Newell; Yuval Bahat; Felix Heide; |
302 | Vehicle Trajectory Prediction Works, But Not Everywhere Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel method that automatically generates realistic scenes causing state-of-the-art models to go off-road. |
Mohammadhossein Bahari; Saeed Saadatnejad; Ahmad Rahimi; Mohammad Shaverdikondori; Amir Hossein Shahidzadeh; Seyed-Mohsen Moosavi-Dezfooli; Alexandre Alahi; |
303 | PSMNet: Position-Aware Stereo Merging Network for Room Layout Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new deep learning-based method for estimating room layout given a pair of 360 panoramas. |
Haiyan Wang; Will Hutchcroft; Yuguang Li; Zhiqiang Wan; Ivaylo Boyadzhiev; Yingli Tian; Sing Bing Kang; |
304 | MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but suffer from the additional computational burden and achieve limited performance due to inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. |
Kuan-Chih Huang; Tsung-Han Wu; Hung-Ting Su; Winston H. Hsu; |
305 | Learning Graph Regularisation for Guided Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a novel formulation for guided super-resolution. |
Riccardo de Lutio; Alexander Becker; Stefano D’Aronco; Stefania Russo; Jan D. Wegner; Konrad Schindler; |
306 | Instance-Wise Occlusion and Depth Orders in Natural Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a new dataset, named InstaOrder, that can be used to understand the spatial relationships of instances in a 3D space. |
Hyunmin Lee; Jaesik Park; |
307 | Look for The Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we seek to temporally localize object states (e.g. "empty" and "full" cup) together with the corresponding state-modifying actions ("pouring coffee") in long uncurated videos with minimal supervision. |
Tomáš Souček; Jean-Baptiste Alayrac; Antoine Miech; Ivan Laptev; Josef Sivic; |
308 | Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents new hierarchically cascaded transformers that can improve data efficiency through attribute surrogates learning and spectral tokens pooling. |
Yangji He; Weihan Liang; Dongyang Zhao; Hong-Yu Zhou; Weifeng Ge; Yizhou Yu; Wenqiang Zhang; |
309 | Generalized Category Discovery Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we consider a highly general image recognition setting wherein, given a labelled and unlabelled set of images, the task is to categorize all images in the unlabelled set. |
Sagar Vaze; Kai Han; Andrea Vedaldi; Andrew Zisserman; |
310 | Maximum Consensus By Weighted Influences of Monotone Boolean Functions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper studies the concept of weighted influences for solving MaxCon. |
Erchuan Zhang; David Suter; Ruwan Tennakoon; Tat-Jun Chin; Alireza Bab-Hadiashar; Giang Truong; Syed Zulqarnain Gilani; |
311 | TransforMatcher: Match-to-Match Attention for Semantic Correspondence Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce a strong semantic image matching learner, dubbed TransforMatcher, which builds on the success of transformer networks in vision domains. |
Seungwook Kim; Juhong Min; Minsu Cho; |
312 | Robust Outlier Detection By De-Biasing VAE Likelihoods Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose novel analytical and algorithmic approaches to ameliorate key biases with VAE likelihoods. |
Kushal Chauhan; Barath Mohan U; Pradeep Shenoy; Manish Gupta; Devarajan Sridharan; |
313 | Contour-Hugging Heatmaps for Landmark Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an effective and easy-to-implement method for simultaneously performing landmark detection in images and obtaining an ingenious uncertainty measurement for each landmark. |
James McCouat; Irina Voiculescu; |
314 | Voxel Field Fusion for 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion. |
Yanwei Li; Xiaojuan Qi; Yukang Chen; Liwei Wang; Zeming Li; Jian Sun; Jiaya Jia; |
315 | Divide and Conquer: Compositional Experts for Generalized Novel Class Discovery Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Despite showing competitive performance on novel classes, they fail to generalize to recognizing samples from both base and novel sets. In this paper, we focus on this generalized setting of NCD (GNCD), and propose to divide and conquer it with two groups of Compositional Experts (ComEx). |
Muli Yang; Yuehua Zhu; Jiaping Yu; Aming Wu; Cheng Deng; |
316 | Programmatic Concept Learning for Human Motion Description and Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce Programmatic Motion Concepts, a hierarchical motion representation for human actions that captures both low level motion and high level description as motion concepts. |
Sumith Kulal; Jiayuan Mao; Alex Aiken; Jiajun Wu; |
317 | Interpretable Part-Whole Hierarchies and Conceptual-Semantic Relationships in Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we want to make a step forward towards interpretability in neural networks, providing new tools to interpret their behavior. |
Nicola Garau; Niccolò Bisagno; Zeno Sambugaro; Nicola Conci; |
318 | Fast Algorithm for Low-Rank Tensor Completion in Delay-Embedded Space Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Recent studies have shown high completion performance with a relatively small window size, but experiments with large window sizes require a huge amount of memory and cannot be computed easily. In this study, we address this serious computational issue and propose a fast and efficient algorithm. |
Ryuki Yamamoto; Hidekata Hontani; Akira Imakura; Tatsuya Yokota; |
319 | Panoptic, Instance and Semantic Relations: A Relational Context Encoder To Enhance Panoptic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a novel framework to integrate both semantic and instance contexts for panoptic segmentation. |
Shubhankar Borse; Hyojin Park; Hong Cai; Debasmit Das; Risheek Garrepalli; Fatih Porikli; |
320 | Point2Seq: Detecting 3D Objects As Sequences Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds. |
Yujing Xue; Jiageng Mao; Minzhe Niu; Hang Xu; Michael Bi Mi; Wei Zhang; Xiaogang Wang; Xinchao Wang; |
321 | Less Is More: Generating Grounded Navigation Instructions From Landmarks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes. |
Su Wang; Ceslee Montgomery; Jordi Orbay; Vighnesh Birodkar; Aleksandra Faust; Izzeddin Gur; Natasha Jaques; Austin Waters; Jason Baldridge; Peter Anderson; |
322 | Task-Adaptive Negative Envision for Few-Shot Open-Set Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we instead propose task-adaptive negative class envision for FSOR to integrate threshold tuning into the learning process. |
Shiyuan Huang; Jiawei Ma; Guangxing Han; Shih-Fu Chang; |
323 | DisARM: Displacement Aware Relation Module for 3D Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Displacement Aware Relation Module (DisARM), a novel neural network module for enhancing the performance of 3D object detection in point cloud scenes. |
Yao Duan; Chenyang Zhu; Yuqing Lan; Renjiao Yi; Xinwang Liu; Kai Xu; |
324 | ETHSeg: An Amodel Instance Segmentation Network and A Real-World Dataset for X-Ray Waste Inspection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a novel problem of instance-level waste segmentation in X-ray images for intelligent waste inspection, and contribute a real dataset consisting of 5,038 X-ray images (30,881 waste items in total) with high-quality annotations (i.e., waste categories, object boxes, and instance-level masks) as a benchmark for this problem. |
Lingteng Qiu; Zhangyang Xiong; Xuhao Wang; Kenkun Liu; Yihan Li; Guanying Chen; Xiaoguang Han; Shuguang Cui; |
325 | MixFormer: Mixing Features Across Windows and Dimensions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While local-window self-attention performs notably well in vision tasks, it suffers from a limited receptive field and weak modeling capability. This is mainly because it performs self-attention within non-overlapping windows and shares weights along the channel dimension. We propose MixFormer to address these issues. |
Qiang Chen; Qiman Wu; Jian Wang; Qinghao Hu; Tao Hu; Errui Ding; Jian Cheng; Jingdong Wang; |
326 | Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs By Partial FC Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a sparsely updating variant of the FC layer, named Partial FC (PFC). |
Xiang An; Jiankang Deng; Jia Guo; Ziyong Feng; XuHan Zhu; Jing Yang; Tongliang Liu; |
327 | NeRF-Editing: Geometry Editing of Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a method that allows users to perform controllable shape deformation on the implicit representation of the scene, and synthesizes the novel view images of the edited scene without re-training the network. |
Yu-Jie Yuan; Yang-Tian Sun; Yu-Kun Lai; Yuewen Ma; Rongfei Jia; Lin Gao; |
328 | Optimal Correction Cost for Object Detection Evaluation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To alleviate the gap between downstream tasks and the evaluation scenario, we propose Optimal Correction Cost (OC-cost), which assesses detection accuracy at image level. |
Mayu Otani; Riku Togashi; Yuta Nakashima; Esa Rahtu; Janne Heikkilä; Shin’ichi Satoh; |
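Entry 328 scores a detector per image by the cost of correcting its output into the ground truth. The sketch below conveys that idea with a pairwise cost (a blend of 1 - IoU and a classification mismatch) and Hungarian matching padded with dummy slots for unmatched boxes; the real OC-cost is defined via optimal transport, so the miss cost, the blend weight `lam`, and the matcher here are all simplifying assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda bx: (bx[2] - bx[0]) * (bx[3] - bx[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def correction_cost(pred_boxes, pred_labels, gt_boxes, gt_labels, miss_cost=1.0, lam=0.5):
    """Image-level detection cost via bipartite matching with dummy slots (simplified sketch)."""
    P, G = len(pred_boxes), len(gt_boxes)
    n = P + G                                   # dummy rows/cols let boxes stay unmatched
    cost = np.full((n, n), miss_cost)
    for i in range(P):
        for j in range(G):
            loc = 1.0 - iou(pred_boxes[i], gt_boxes[j])
            cls = 0.0 if pred_labels[i] == gt_labels[j] else 1.0
            cost[i, j] = lam * loc + (1 - lam) * cls
    cost[P:, G:] = 0.0                          # dummy-to-dummy pairs are free
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].sum() / max(P + G, 1)

preds = [[10, 10, 50, 50], [60, 60, 90, 90]]
gts = [[12, 11, 49, 52]]
print(correction_cost(preds, ["cat", "dog"], gts, ["cat"]))
```

Unlike mAP, such a per-image cost penalizes both spurious and missing boxes directly, which is closer to how a downstream consumer of the detections experiences errors.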
329 | Contextual Similarity Distillation for Asymmetric Image Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, existing approaches either fail to achieve feature coherence or make strong assumptions, e.g., requiring labeled datasets or classifiers from the large model, which limits their practical application. To this end, we propose a flexible contextual similarity distillation framework to enhance the small query model and keep its output features compatible with those of the large gallery model, which is crucial for asymmetric retrieval. |
Hui Wu; Min Wang; Wengang Zhou; Houqiang Li; Qi Tian; |
330 | FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, we propose to parse pairwise query and exemplar action instances into consecutive steps with diverse semantic and temporal correspondences. |
Jinglin Xu; Yongming Rao; Xumin Yu; Guangyi Chen; Jie Zhou; Jiwen Lu; |
331 | Artistic Style Discovery With Independent Components Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In particular, we take a closer look into the mechanism of style transfer and obtain different artistic style components from the latent space consisting of different style features. |
Xin Xie; Yi Li; Huaibo Huang; Haiyan Fu; Wanwan Wang; Yanqing Guo; |
332 | HEAT: Holistic Edge Attention Transformer for Structured Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a novel attention-based neural network for structured reconstruction, which takes a 2D raster image as an input and reconstructs a planar graph depicting an underlying geometric structure. |
Jiacheng Chen; Yiming Qian; Yasutaka Furukawa; |
333 | HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce this approach into the realm of encoder-based inversion. |
Yuval Alaluf; Omer Tov; Ron Mokady; Rinon Gal; Amit Bermano; |
334 | DASO: Distribution-Aware Semantics-Oriented Pseudo-Label for Imbalanced Semi-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The capability of the traditional semi-supervised learning (SSL) methods is far from real-world application due to severely biased pseudo-labels caused by (1) class imbalance and (2) class distribution mismatch between labeled and unlabeled data. This paper addresses such a relatively under-explored problem. |
Youngtaek Oh; Dong-Jin Kim; In So Kweon; |
335 | Mobile-Former: Bridging MobileNet and Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Mobile-Former, a parallel design of MobileNet and transformer with a two-way bridge in between. |
Yinpeng Chen; Xiyang Dai; Dongdong Chen; Mengchen Liu; Xiaoyi Dong; Lu Yuan; Zicheng Liu; |
336 | Exploiting Pseudo Labels in A Self-Supervised Learning Framework for Improved Monocular Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel self-distillation based self-supervised monocular depth estimation (SD-SSMDE) learning framework. |
Andra Petrovai; Sergiu Nedevschi; |
337 | DESTR: Object Detection With Split Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: First, we propose a new Detection Split Transformer (DESTR) that separates estimation of cross-attention into two independent branches — one tailored for classification and the other for box regression. |
Liqiang He; Sinisa Todorovic; |
338 | LTP: Lane-Based Trajectory Prediction for Autonomous Driving Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a two-stage proposal-based motion forecasting method that exploits the sliced lane segments as fine-grained, shareable, and interpretable proposals. |
Jingke Wang; Tengju Ye; Ziqing Gu; Junbo Chen; |
339 | CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the difficulties, we propose a new framework for scribble learning-based medical image segmentation, which is composed of mix augmentation and cycle consistency and thus is referred to as CycleMix. |
Ke Zhang; Xiahai Zhuang; |
340 | VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most of them only support a fixed up-sampling scale, which limits their flexibility and applications. In this work, instead of following the discrete representations, we propose Video Implicit Neural Representation (VideoINR), and we show its applications for STVSR. |
Zeyuan Chen; Yinbo Chen; Jingwen Liu; Xingqian Xu; Vidit Goel; Zhangyang Wang; Humphrey Shi; Xiaolong Wang; |
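Entry 340 represents a video as a continuous function of space-time coordinates so that frames can be decoded at arbitrary resolutions and time steps. The sketch below is only a bare coordinate MLP mapping (x, y, t) to RGB; the actual VideoINR conditions on features encoded from the input frames and uses separate spatial and temporal representations, none of which is reproduced here.

```python
import torch
import torch.nn as nn

class VideoINRField(nn.Module):
    """Continuous video as a function of space-time coordinates (minimal, unconditioned sketch)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, coords):                  # coords: (..., 3) with (x, y, t) in [0, 1]
        return self.mlp(coords)

field = VideoINRField()
# query any spatial resolution at any time: here a 256x256 frame at t = 0.37
h, w, t = 256, 256, 0.37
ys, xs = torch.meshgrid(torch.linspace(0, 1, h), torch.linspace(0, 1, w), indexing="ij")
coords = torch.stack([xs, ys, torch.full_like(xs, t)], dim=-1)
frame = field(coords)                           # (256, 256, 3) RGB values
```

The key property is that nothing in the representation fixes an up-sampling factor: space and time are queried continuously, which is what enables one model to handle arbitrary space-time super-resolution.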
341 | Towards End-to-End Unified Scene Text Detection and Layout Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Scene text detection and document layout analysis have long been treated as two separate tasks in different image domains. In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis. |
Shangbang Long; Siyang Qin; Dmitry Panteleev; Alessandro Bissacco; Yasuhisa Fujii; Michalis Raptis; |
342 | Image Based Reconstruction of Liquids From 2D Surface Detections Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we present a solution to the challenging problem of reconstructing liquids from image data. |
Florian Richter; Ryan K. Orosco; Michael C. Yip; |
343 | Contextual Outpainting With Object-Level Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To explore the semantic cues provided by the remaining foreground contents, we propose a novel ConTextual Outpainting GAN (CTO-GAN), leveraging the semantic layout as a bridge to synthesize coherent and diverse background contents. |
Jiacheng Li; Chang Chen; Zhiwei Xiong; |
344 | AP-BSN: Self-Supervised Denoising for Real-World Images Via Asymmetric PD and Blind-Spot Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, it is not trivial to integrate PD and BSN directly, which prevents fully self-supervised denoising on real-world images. We propose an Asymmetric PD (AP) to address this issue, which introduces different PD stride factors for training and inference. |
Wooseok Lee; Sanghyun Son; Kyoung Mu Lee; |
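Entry 344 rests on pixel-shuffle downsampling (PD): an image is split into s x s spatially subsampled sub-images, and the stride s differs between training and inference. The sketch below shows a plain PD/inverse-PD pair; the concrete stride values used in the comment are illustrative assumptions rather than a statement of the paper's settings.

```python
import torch

def pd_down(x, s):
    """Pixel-shuffle downsampling: split an image into s*s spatially subsampled sub-images."""
    subs = [x[:, :, i::s, j::s] for i in range(s) for j in range(s)]
    return torch.stack(subs, dim=1).flatten(0, 1)            # (B*s*s, C, H/s, W/s)

def pd_up(subs, s):
    """Inverse of pd_down: reassemble the sub-images into the full-resolution image."""
    bs, c, h, w = subs.shape
    out = subs.new_zeros(bs // (s * s), c, h * s, w * s)
    for k in range(s * s):
        i, j = divmod(k, s)
        out[:, :, i::s, j::s] = subs[k::s * s]
    return out

noisy = torch.rand(1, 3, 20, 20)
# asymmetric strides: a larger factor while training the blind-spot network (to break up
# spatially correlated real noise) and a smaller one at inference (to preserve detail);
# the values 5 and 2 below are illustrative choices.
train_view = pd_down(noisy, s=5)
test_view = pd_down(noisy, s=2)
assert torch.allclose(pd_up(test_view, s=2), noisy)
```

The asymmetry is the whole point: the stride that makes noise look pixel-wise independent for the blind-spot network during training is too aggressive at test time, so a gentler stride is used for inference.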
345 | AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an autoregressive prior for 3D shapes to solve multimodal 3D tasks such as shape completion, reconstruction, and generation. |
Paritosh Mittal; Yen-Chi Cheng; Maneesh Singh; Shubham Tulsiani; |
346 | ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we first show that optimal neural architectures in the DIP framework are image-dependent. Leveraging this insight, we then propose an image-specific NAS strategy for the DIP framework that requires substantially less training than typical NAS approaches, effectively enabling image-specific NAS. |
Metin Ersin Arican; Ozgur Kara; Gustav Bredell; Ender Konukoglu; |
347 | Depth-Guided Sparse Structure-From-Motion for Movies and TV Shows Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The resulting small-motion parallax between video frames makes standard geometry-based SfM approaches not as effective for movies and TV shows. To address this challenge, we propose a simple yet effective approach that uses single-frame depth-prior obtained from a pretrained network to significantly improve geometry-based SfM for our small-parallax setting. |
Sheng Liu; Xiaohan Nie; Raffay Hamid; |
348 | End-to-End Referring Video Object Segmentation With Multimodal Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simple Transformer-based approach to RVOS. |
Adam Botach; Evgenii Zheltonozhskii; Chaim Baskin; |
349 | Unpaired Cartoon Image Synthesis Via Gated Cycle Mapping Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a general-purpose solution to cartoon image synthesis with unpaired training data. |
Yifang Men; Yuan Yao; Miaomiao Cui; Zhouhui Lian; Xuansong Xie; Xian-Sheng Hua; |
350 | IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present IterMVS, a new data-driven method for high-resolution multi-view stereo. |
Fangjinhua Wang; Silvano Galliani; Christoph Vogel; Marc Pollefeys; |
351 | Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In particular, the foreground points are inherently more important than background points for object detectors. Motivated by this, we propose a highly-efficient single-stage point-based 3D detector in this paper, termed IA-SSD. |
Yifan Zhang; Qingyong Hu; Guoquan Xu; Yanxin Ma; Jianwei Wan; Yulan Guo; |
352 | FedCorr: Multi-Stage Federated Learning for Label Noise Correction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose FedCorr, a general multi-stage framework to tackle heterogeneous label noise in FL, without making any assumptions on the noise models of local clients, while still maintaining client data privacy. |
Jingyi Xu; Zihan Chen; Tony Q.S. Quek; Kai Fong Ernest Chong; |
353 | Detecting Camouflaged Object in Frequency Domain Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To well involve the frequency clues into the CNN models, we present a powerful network with two special components. |
Yijie Zhong; Bo Li; Lv Tang; Senyun Kuang; Shuang Wu; Shouhong Ding; |
354 | RigNeRF: Fully Controllable Neural 3D Portraits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose RigNeRF, a system that goes beyond just novel view synthesis and enables full control of head pose and facial expressions learned from a single portrait video. |
ShahRukh Athar; Zexiang Xu; Kalyan Sunkavalli; Eli Shechtman; Zhixin Shu; |
355 | CLIP-Forge: Towards Zero-Shot Text-To-Shape Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While significant recent progress has been made in text-to-image generation, text-to-shape generation remains a challenging problem due to the unavailability of paired text and shape data at a large scale. We present a simple yet effective method for zero-shot text-to-shape generation that circumvents such data scarcity. |
Aditya Sanghi; Hang Chu; Joseph G. Lambourne; Ye Wang; Chin-Yi Cheng; Marco Fumero; Kamal Rahimi Malekshan; |
356 | Style-Based Global Appearance Flow for Virtual Try-On Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: They are thus intrinsically susceptible to difficult body poses/occlusions and large mis-alignments between person and garment images. To overcome this limitation, a novel global appearance flow estimation model is proposed in this work. |
Sen He; Yi-Zhe Song; Tao Xiang; |
357 | Source-Free Object Detection By Learning To Overlook Domain Style Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This approach suffers from both unsatisfactory accuracy of pseudo labels due to the presence of domain shift and limited use of target domain training data. In this work, we present a novel Learning to Overlook Domain Style (LODS) method with such limitations solved in a principled manner. |
Shuaifeng Li; Mao Ye; Xiatian Zhu; Lihua Zhou; Lin Xiong; |
358 | Active Learning for Open-Set Annotation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, in real annotation tasks, the unlabeled data usually contains a large amount of examples from unknown classes, resulting in the failure of most active learning methods. To tackle this open-set annotation (OSA) problem, we propose a new active learning framework called LfOSA, which boosts the classification performance with an effective sampling strategy to precisely detect examples from known classes for annotation. |
Kun-Peng Ning; Xun Zhao; Yu Li; Sheng-Jun Huang; |
359 | SceneSqueezer: Learning To Compress Scene for Camera Relocalization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We design a novel framework that compresses a scene while still maintaining localization accuracy. |
Luwei Yang; Rakesh Shrestha; Wenbo Li; Shuaicheng Liu; Guofeng Zhang; Zhaopeng Cui; Ping Tan; |
360 | SelfRecon: Self Reconstruction Your Digital Avatar From Monocular Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose SelfRecon, a clothed human body reconstruction method that combines implicit and explicit representations to recover space-time coherent geometries from a monocular self-rotating human video. |
Boyi Jiang; Yang Hong; Hujun Bao; Juyong Zhang; |
361 | Instance-Dependent Label-Noise Learning With Manifold-Regularized Transition Matrix Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address this problem, we note that there is psychological and physiological evidence showing that humans are more likely to annotate instances of similar appearance with the same classes, and thus poor-quality or ambiguous instances of similar appearance are more easily mislabeled with correlated or identical noisy classes. Therefore, we propose an assumption on the geometry of T(x): the closer two instances are, the more similar their corresponding transition matrices should be. |
De Cheng; Tongliang Liu; Yixiong Ning; Nannan Wang; Bo Han; Gang Niu; Xinbo Gao; Masashi Sugiyama; |
362 | Rethinking The Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance With Expanded Views Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Treating each type of augmentation equally during training makes the model learn non-optimal representations for various downstream tasks and limits the flexibility to choose augmentation types beforehand. Second, the strong data augmentations used in classic contrastive learning methods may bring too much invariance in some cases, and fine-grained information that is essential to some downstream tasks may be lost. This paper proposes a general method to alleviate these two problems by considering "where" and "what" to contrast in a general contrastive learning framework. |
Junbo Zhang; Kaisheng Ma; |
363 | Self-Supervised Models Are Continual Learners Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we show that self-supervised loss functions can be seamlessly converted into distillation mechanisms for CL by adding a predictor network that maps the current state of the representations to their past state. |
Enrico Fini; Victor G. Turrisi da Costa; Xavier Alameda-Pineda; Elisa Ricci; Karteek Alahari; Julien Mairal; |
364 | Dreaming To Prune Image Deraining Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We note that fine-tuning the compressed model on self-collected data is inadequate, as the resulting model generalizes poorly to images with different degradation characteristics. To address this problem, we propose a novel data-free compression framework for deraining networks. |
Weiqi Zou; Yang Wang; Xueyang Fu; Yang Cao; |
365 | Equivariant Point Cloud Analysis Via Learning Orientations for Message Passing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel and simple framework to achieve equivariance for point cloud analysis based on the message passing (graph neural network) scheme. |
Shitong Luo; Jiahan Li; Jiaqi Guan; Yufeng Su; Chaoran Cheng; Jian Peng; Jianzhu Ma; |
366 | When Does Contrastive Visual Representation Learning Work? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: By looking through the lenses of data quantity, data domain, data quality, and task granularity, we provide new insights into the necessary conditions for successful self-supervised learning. |
Elijah Cole; Xuan Yang; Kimberly Wilber; Oisin Mac Aodha; Serge Belongie; |
367 | One Step at A Time: Long-Horizon Vision-and-Language Navigation With Milestones Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, when it comes to long-horizon tasks with extended sequences of actions, an agent can easily ignore some instructions or get stuck in the middle of the long instructions and eventually fail the task. To address this challenge, we propose a model-agnostic milestone-based task tracker (M-Track) to guide the agent and monitor its progress. |
Chan Hee Song; Jihyung Kil; Tai-Yu Pan; Brian M. Sadler; Wei-Lun Chao; Yu Su; |
368 | Node Representation Learning in Graph Via Node-to-Neighbourhood Mutual Information Maximization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a simple-yet-effective self-supervised node representation learning strategy via directly maximizing the mutual information between the hidden representations of nodes and their neighbourhood, which can be theoretically justified by its link to graph smoothing. |
Wei Dong; Junsheng Wu; Yi Luo; Zongyuan Ge; Peng Wang; |
369 | Point Cloud Pre-Training With Natural 3D Structures Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Therefore, constructing a large-scale 3D point cloud dataset is difficult. To remedy this issue, we propose a newly developed point cloud fractal database (PC-FractalDB), which is a novel family of formula-driven supervised learning inspired by fractal geometry encountered in natural 3D structures. |
Ryosuke Yamada; Hirokatsu Kataoka; Naoya Chiba; Yukiyasu Domae; Tetsuya Ogata; |
370 | Scene Consistency Representation Learning for Video Scene Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Spotting the correct scene boundary from the long-term video is a challenging task, since a model must understand the storyline of the video to figure out where a scene starts and ends. To this end, we propose an effective Self-Supervised Learning (SSL) framework to learn better shot representations from unlabeled long-term videos. |
Haoqian Wu; Keyu Chen; Yanan Luo; Ruizhi Qiao; Bo Ren; Haozhe Liu; Weicheng Xie; Linlin Shen; |
371 | Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A complementary way towards robustness is to introduce a rejection option, allowing the model to not return predictions on uncertain inputs, where confidence is a commonly used certainty proxy. Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones. |
Tianyu Pang; Huishuai Zhang; Di He; Yinpeng Dong; Hang Su; Wei Chen; Jun Zhu; Tie-Yan Liu; |
372 | Exploiting Explainable Metrics for Augmented SGD Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we address the following question: can we probe intermediate layers of a deep neural network to identify and quantify the learning quality of each layer? |
Mahdi S. Hosseini; Mathieu Tuli; Konstantinos N. Plataniotis; |
373 | Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This is called inner-video overfitting, and it would actually lead to inferior performance. To tackle this issue, we propose a novel inter-frame feature reconstruction (IFR) technique to leverage the ground-truth labels to supervise the model training on unlabeled frames. |
Jiafan Zhuang; Zilei Wang; Yuan Gao; |
374 | GenDR: A Generalized Differentiable Renderer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present and study a generalized family of differentiable renderers. |
Felix Petersen; Bastian Goldluecke; Christian Borgelt; Oliver Deussen; |
375 | Improving Neural Implicit Surfaces Geometry With Patch Warping Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Neural implicit surfaces have become an important technique for multi-view 3D reconstruction but their accuracy remains limited. In this paper, we argue that this comes from the difficulty to learn and render high frequency textures with neural networks. |
François Darmon; Bénédicte Bascle; Jean-Clément Devaux; Pascal Monasse; Mathieu Aubry; |
376 | XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a robust layout-aware multimodal network named XYLayoutLM to capture and leverage rich layout information from proper reading orders produced by our Augmented XY Cut. |
Zhangxuan Gu; Changhua Meng; Ke Wang; Jun Lan; Weiqiang Wang; Ming Gu; Liqing Zhang; |
377 | Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization With A Bayesian Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This task is particularly challenging for deep neural networks because data is difficult to obtain and annotate. Therefore, we formulate amodal segmentation as an out-of-task and out-of-distribution generalization problem. |
Yihong Sun; Adam Kortylewski; Alan Yuille; |
378 | How Well Do Sparse ImageNet Models Transfer? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Generally, more accurate models on the "upstream" dataset tend to provide better transfer accuracy "downstream". In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset, which have been pruned, that is, compressed by sparsifying their connections. |
Eugenia Iofinova; Alexandra Peste; Mark Kurtz; Dan Alistarh; |
379 | REX: Reasoning-Aware and Grounded Explanation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper aims to close the gap from three distinct perspectives: first, we define a new type of multi-modal explanations that explain the decisions by progressively traversing the reasoning process and grounding keywords in the images. |
Shi Chen; Qi Zhao; |
380 | Dynamic Dual-Output Diffusion Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we reveal some of the causes that affect the generation quality of diffusion models, especially when sampling with few iterations, and come up with a simple, yet effective, solution to mitigate them. |
Yaniv Benny; Lior Wolf; |
381 | StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a new framework, StyleT2I, to improve the compositionality of text-to-image synthesis. |
Zhiheng Li; Martin Renqiang Min; Kai Li; Chenliang Xu; |
382 | JoinABLe: Learning Bottom-Up Assembly of Parametric CAD Joints Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we introduce JoinABLe, a learning-based method that assembles parts together to form joints. |
Karl D.D. Willis; Pradeep Kumar Jayaraman; Hang Chu; Yunsheng Tian; Yifei Li; Daniele Grandi; Aditya Sanghi; Linh Tran; Joseph G. Lambourne; Armando Solar-Lezama; Wojciech Matusik; |
383 | CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation Via Neural Homeomorphism Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Canonical Deformation Coordinate Space (CaDeX), a unified representation of both shape and nonrigid motion. |
Jiahui Lei; Kostas Daniilidis; |
384 | Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we disentangle the direct offset into Local Canonical Coordinates (LCC), box scales and box orientations. |
Yang You; Zelin Ye; Yujing Lou; Chengkun Li; Yong-Lu Li; Lizhuang Ma; Weiming Wang; Cewu Lu; |
385 | V-Doc: Visual Questions Answers With Documents Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose V-Doc, a question-answering tool using document images and PDF files, aimed at researchers and general non-deep-learning experts looking to generate, process, and understand document visual question answering tasks. |
Yihao Ding; Zhe Huang; Runlin Wang; YanHang Zhang; Xianru Chen; Yuzhong Ma; Hyunsuk Chung; Soyeon Caren Han; |
386 | AEGNN: Asynchronous Event-Based Graph Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: For this reason, recent works have adopted Graph Neural Networks (GNNs), which process events as static spatio-temporal graphs that are inherently sparse. We take this trend one step further by introducing Asynchronous, Event-based Graph Neural Networks (AEGNNs), a novel event-processing paradigm that generalizes standard GNNs to process events as evolving spatio-temporal graphs. |
Simon Schaefer; Daniel Gehrig; Davide Scaramuzza; |
387 | Layer-Wised Model Aggregation for Personalized Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel pFL training framework dubbed Layer-wised Personalized Federated learning (pFedLA) that can discern the importance of each layer from different clients, and thus is able to optimize the personalized model aggregation for clients with heterogeneous data. |
Xiaosong Ma; Jie Zhang; Song Guo; Wenchao Xu; |
388 | Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks Via Singular Values Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Polarity Sampling, a theoretically justified plug-and-play method for controlling the generation quality and diversity of any pre-trained deep generative network (DGN). |
Ahmed Imtiaz Humayun; Randall Balestriero; Richard Baraniuk; |
389 | Style-Structure Disentangled Features and Normalizing Flows for Diverse Icon Colorization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we present a colorization network that generates flat-color icons according to given sketches and semantic colorization styles. |
Yuan-kui Li; Yun-Hsuan Lien; Yu-Shuen Wang; |
390 | Object-Aware Video-Language Pre-Training for Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present Object-aware Transformers, an object-centric approach that extends video-language transformer to incorporate object representations. |
Jinpeng Wang; Yixiao Ge; Guanyu Cai; Rui Yan; Xudong Lin; Ying Shan; Xiaohu Qie; Mike Zheng Shou; |
391 | OSKDet: Orientation-Sensitive Keypoint Localization for Rotated Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an orientation-sensitive keypoint based rotated detector OSKDet. |
Dongchen Lu; Dongmei Li; Yali Li; Shengjin Wang; |
392 | MAT: Mask-Aware Transformer for Large Hole Image Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel transformer-based model for large hole inpainting, which unifies the merits of transformers and convolutions to efficiently process high-resolution images. |
Wenbo Li; Zhe Lin; Kun Zhou; Lu Qi; Yi Wang; Jiaya Jia; |
393 | Exploring Geometric Consistency for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, we design a series of geometric manipulations to diagnose existing detectors and then illustrate their vulnerability to consistently associate the depth with object apparent sizes and positions. To alleviate this issue, we propose four geometry-aware data augmentation approaches to enhance the geometric consistency of the detectors. |
Qing Lian; Botao Ye; Ruijia Xu; Weilong Yao; Tong Zhang; |
394 | Neural Window Fully-Connected CRFs for Monocular Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While recent works design increasingly complicated and powerful networks to directly regress the depth map, we take the path of CRFs optimization. |
Weihao Yuan; Xiaodong Gu; Zuozhuo Dai; Siyu Zhu; Ping Tan; |
395 | CodedVTR: Codebook-Based Sparse Voxel Transformer With Geometric Guidance Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose CodedVTR (Codebook-based Voxel TRansformer), which improves data efficiency and generalization ability for 3D sparse voxel transformers. |
Tianchen Zhao; Niansong Zhang; Xuefei Ning; He Wang; Li Yi; Yu Wang; |
396 | Uncertainty-Aware Deep Multi-View Photometric Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a simple and effective solution to the longstanding classical multi-view photometric stereo (MVPS) problem. |
Berk Kaya; Suryansh Kumar; Carlos Oliveira; Vittorio Ferrari; Luc Van Gool; |
397 | Coherent Point Drift Revisited for Non-Rigid Shape Matching and Registration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we explore a new type of extrinsic method to directly align two geometric shapes with point-to-point correspondences in ambient space by recovering a deformation, which allows more continuous and smooth maps to be obtained. |
Aoxiang Fan; Jiayi Ma; Xin Tian; Xiaoguang Mei; Wei Liu; |
398 | Unleashing Potential of Unsupervised Pre-Training With Intra-Identity Regularization for Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by the great success of self-supervised representation learning with contrastive objectives, in this paper, we design an Unsupervised Pre-training framework for ReID based on the contrastive learning (CL) pipeline, dubbed UP-ReID. |
Zizheng Yang; Xin Jin; Kecheng Zheng; Feng Zhao; |
399 | Align and Prompt: Video-and-Language Pre-Training With Entity Prompts Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Align and Prompt: an efficient and effective video-and-language pre-training framework with better cross-modal alignment. |
Dongxu Li; Junnan Li; Hongdong Li; Juan Carlos Niebles; Steven C.H. Hoi; |
400 | A Unified Query-Based Paradigm for Point Cloud Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel Embedding-Querying paradigm (EQ-Paradigm) for 3D understanding tasks including detection, segmentation and classification. |
Zetong Yang; Li Jiang; Yanan Sun; Bernt Schiele; Jiaya Jia; |
401 | It’s About Time: Analog Clock Reading in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a framework for reading analog clocks in natural images or videos. |
Charig Yang; Weidi Xie; Andrew Zisserman; |
402 | MSG-Transformer: Exchanging Local Spatial Information By Manipulating Messenger Tokens Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper aims to alleviate the conflict between efficiency and flexibility, for which we propose a specialized token for each region that serves as a messenger (MSG). |
Jiemin Fang; Lingxi Xie; Xinggang Wang; Xiaopeng Zhang; Wenyu Liu; Qi Tian; |
403 | Cross Modal Retrieval With Querybank Normalisation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Drawing inspiration from the NLP literature, we formulate a simple but effective framework called Querybank Normalisation (QB-Norm) that re-normalises query similarities to account for hubs in the embedding space. |
Simion-Vlad Bogolin; Ioana Croitoru; Hailin Jin; Yang Liu; Samuel Albanie; |
404 | Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, we found that such salience predictors cannot be easily trained when they are naively applied to contrastive learning from scratch. To address this issue, we propose contrastive dual gating (CDG), a novel dynamic pruning algorithm that skips the uninformative features during contrastive learning without hurting the trainability of the networks. |
Jian Meng; Li Yang; Jinwoo Shin; Deliang Fan; Jae-sun Seo; |
405 | Universal Photometric Stereo Network Using Global Lighting Contexts Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Unlike existing tasks that assume specific physical lighting models, which drastically limits their usability, a solution to this task is expected to work for objects with diverse shapes and materials under arbitrary lighting variations without assuming any specific model. To solve this extremely challenging task, we present a purely data-driven method, which eliminates the prior assumption of lighting by replacing the recovery of physical lighting parameters with the extraction of a generic lighting representation, named global lighting contexts. |
Satoshi Ikehata; |
406 | Hire-MLP: Vision MLP Via Hierarchical Rearrangement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents Hire-MLP, a simple yet competitive vision MLP architecture via Hierarchical rearrangement, which contains two levels of rearrangements. |
Jianyuan Guo; Yehui Tang; Kai Han; Xinghao Chen; Han Wu; Chao Xu; Chang Xu; Yunhe Wang; |
407 | Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Ray3D, a novel monocular ray-based method for absolute 3D human pose estimation with a calibrated camera. |
Yu Zhan; Fenghai Li; Renliang Weng; Wongun Choi; |
408 | Occluded Human Mesh Recovery Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Consequently, top-down methods have difficulties in recovering accurate 3D human meshes under severe person-person occlusion. To address this, we present Occluded Human Mesh Recovery (OCHMR) – a novel top-down mesh recovery approach that incorporates image spatial context to overcome the limitations of the single-human assumption. |
Rawal Khirodkar; Shashank Tripathi; Kris Kitani; |
409 | Multi-Object Tracking Meets Moving UAV Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a UAVMOT network specially for multi-object tracking in UAV views. |
Shuai Liu; Xin Li; Huchuan Lu; You He; |
410 | ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, this formulation typically treats snippets in a video as independent instances, ignoring the underlying temporal structures within and across action segments. To address this problem, we propose ASM-Loc, a novel WTAL framework that enables explicit, action-aware segment modeling beyond standard MIL-based methods. |
Bo He; Xitong Yang; Le Kang; Zhiyu Cheng; Xin Zhou; Abhinav Shrivastava; |
411 | Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper introduces a probabilistic model named Uncertainty-Guided Probabilistic Transformer (UGPT) for complex action recognition. |
Hongji Guo; Hanjing Wang; Qiang Ji; |
412 | Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Following the guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31×31, in contrast to commonly used 3×3. |
Xiaohan Ding; Xiangyu Zhang; Jungong Han; Guiguang Ding; |
413 | End-to-End Multi-Person Pose Estimation With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the first fully end-to-end multi-person Pose Estimation framework with TRansformers, termed PETR. |
Dahu Shi; Xing Wei; Liangqi Li; Ye Ren; Wenming Tan; |
414 | REGTR: End-to-End Point Cloud Correspondences With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we conjecture that attention mechanisms can replace the role of explicit feature matching and RANSAC, and thus propose an end-to-end framework to directly predict the final set of correspondences. |
Zi Jian Yew; Gim Hee Lee; |
415 | Neural 3D Scene Reconstruction With The Manhattan-World Assumption Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we show that the planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods. |
Haoyu Guo; Sida Peng; Haotong Lin; Qianqian Wang; Guofeng Zhang; Hujun Bao; Xiaowei Zhou; |
416 | V2C: Visual Voice Cloning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, there also exist many scenarios that cannot be well reflected by these VC tasks, such as movie dubbing, which requires the speech to carry emotions consistent with the movie plot. To fill this gap, we propose a new task named Visual Voice Cloning (V2C), which seeks to convert a paragraph of text to speech with both the desired voice specified by a reference audio and the desired emotion specified by a reference video. |
Qi Chen; Mingkui Tan; Yuankai Qi; Jiaqiu Zhou; Yuanqing Li; Qi Wu; |
417 | Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we revisit the average precision (AP) loss and reveal that the crucial element is that of selecting the ranking pairs between positive and negative samples. |
Dongli Xu; Jinhong Deng; Wen Li; |
418 | 3DeformRS: Certifying Spatial Deformations on Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose 3DeformRS, a method to certify the robustness of point cloud Deep Neural Networks (DNNs) against real-world deformations. |
Gabriel Pérez S.; Juan C. Pérez; Motasem Alfarra; Silvio Giancola; Bernard Ghanem; |
419 | ElePose: Unsupervised 3D Human Pose Estimation By Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Unfortunately, labeled training data does not yet exist for many human activities since 3D annotation requires dedicated motion capture systems. Therefore, we propose an unsupervised approach that learns to predict a 3D human pose from a single image while only being trained with 2D pose data, which can be crowd-sourced and is already widely available. |
Bastian Wandt; James J. Little; Helge Rhodin; |
420 |