CVPR 2022 Papers with Code/Data
Readers are also encouraged to read our CVPR 2022 highlights, which pairs each CVPR 2022 paper with a one-sentence highlight. You may also like to explore our “Best Paper” Digest (CVPR), which lists the most influential CVPR papers since 1988.
To help the community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly get the main idea of each paper. Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users across the world with a number of exclusive services for ranking, search, tracking, and automatic literature review.
If you do not want to miss interesting academic papers, you are welcome to sign up for our free daily paper digest service to receive updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: CVPR 2022 Papers with Code/Data
# | Paper | Author(s) | Code |
---|---|---|---|
1 | Controllable Animation of Fluid Elements in Still Images. Highlight: We propose a method to interactively control the animation of fluid elements in still images to generate cinemagraphs. |
Aniruddha Mahapatra; Kuldeep Kulkarni; | code |
2 | φ-SfT: Shape-From-Template With A Physics-Based Deformation Model. Highlight: In contrast to previous works, this paper proposes a new SfT approach explaining 2D observations through physical simulations accounting for forces and material properties. |
Navami Kairanda; Edith Tretschk; Mohamed Elgharib; Christian Theobalt; Vladislav Golyanik; | code |
3 | TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation. Highlight: To leverage the unlabeled data to boost model performance, we present a novel Two-Way Inter-label Self-Training framework named TWIST. |
Ruihang Chu; Xiaoqing Ye; Zhengzhe Liu; Xiao Tan; Xiaojuan Qi; Chi-Wing Fu; Jiaya Jia; | code |
4 | Do Learned Representations Respect Causal Relationships? Highlight: Data often has many semantic attributes that are causally associated with each other. But do attribute-specific learned representations of data also respect the same causal relations? We answer this question in three steps. |
Lan Wang; Vishnu Naresh Boddeti; | code |
5 | ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic. Highlight: While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of generating a caption given an image. In this work, we repurpose such models to generate a descriptive text given an image at inference time, without any further training or tuning step. |
Yoad Tewel; Yoav Shalev; Idan Schwartz; Lior Wolf; | code |
6 | 3D Moments From Near-Duplicate Photos. Highlight: We introduce 3D Moments, a new computational photography effect. |
Qianqian Wang; Zhengqi Li; David Salesin; Noah Snavely; Brian Curless; Janne Kontkanen; | code |
7 | Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization. Highlight: In this work, we, for the first time to the best of our knowledge, propose to perform Exact Feature Distribution Matching (EFDM) by exactly matching the empirical Cumulative Distribution Functions (eCDFs) of image features, which could be implemented by applying the Exact Histogram Matching (EHM) in the image feature space. |
Yabin Zhang; Minghan Li; Ruihuang Li; Kui Jia; Lei Zhang; | code |
8 | Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots. Highlight: In this paper, we propose a simple yet efficient approach called Blind2Unblind to overcome the information loss in blindspot-driven denoising methods. |
Zejin Wang; Jiazheng Liu; Guoqing Li; Hua Han; | code |
9 | Balanced and Hierarchical Relation Learning for One-Shot Object Detection. Highlight: In this paper, we introduce balanced and hierarchical learning for our detector. |
Hanqing Yang; Sijia Cai; Hualian Sheng; Bing Deng; Jianqiang Huang; Xian-Sheng Hua; Yong Tang; Yu Zhang; | code |
10 | NICE-SLAM: Neural Implicit Scalable Encoding for SLAM. Highlight: In this paper, we present NICE-SLAM, a dense SLAM system that incorporates multi-level local information by introducing a hierarchical scene representation. |
Zihan Zhu; Songyou Peng; Viktor Larsson; Weiwei Xu; Hujun Bao; Zhaopeng Cui; Martin R. Oswald; Marc Pollefeys; | code |
11 | Stochastic Trajectory Prediction Via Motion Indeterminacy Diffusion. Highlight: In this paper, we present a new framework to formulate the trajectory prediction task as a reverse process of motion indeterminacy diffusion (MID), in which we progressively discard indeterminacy from all the walkable areas until reaching the desired trajectory. |
Tianpei Gu; Guangyi Chen; Junlong Li; Chunze Lin; Yongming Rao; Jie Zhou; Jiwen Lu; | code |
12 | CLRNet: Cross Layer Refinement Network for Lane Detection. Highlight: In this work, we present Cross Layer Refinement Network (CLRNet) aiming at fully utilizing both high-level and low-level features in lane detection. |
Tu Zheng; Yifei Huang; Yang Liu; Wenjian Tang; Zheng Yang; Deng Cai; Xiaofei He; | code |
13 | Motion-Aware Contrastive Video Representation Learning Via Foreground-Background Merging. Highlight: Such bias makes the model suffer from weak generalization ability, leading to worse performance on downstream tasks such as action recognition. To alleviate such bias, we propose Foreground-background Merging (FAME) to deliberately compose the moving foreground region of the selected video onto the static background of others. |
Shuangrui Ding; Maomao Li; Tianyu Yang; Rui Qian; Haohang Xu; Qingyi Chen; Jue Wang; Hongkai Xiong; | code |
14 | DINE: Domain Adaptation From Single and Multiple Black-Box Predictors. Highlight: This paper studies a practical and interesting setting for UDA, where only black-box source models (i.e., only network predictions are available) are provided during adaptation in the target domain. To solve this problem, we propose a new two-step knowledge adaptation framework called DIstill and fine-tuNE (DINE). |
Jian Liang; Dapeng Hu; Jiashi Feng; Ran He; | code |
15 | FaceFormer: Speech-Driven 3D Facial Animation With Transformers. Highlight: Prior works typically focus on learning phoneme-level features of short audio windows with limited context, occasionally resulting in inaccurate lip movements. To tackle this limitation, we propose a Transformer-based autoregressive model, FaceFormer, which encodes the long-term audio context and autoregressively predicts a sequence of animated 3D face meshes. |
Yingruo Fan; Zhaojiang Lin; Jun Saito; Wenping Wang; Taku Komura; | code |
16 | Rotationally Equivariant 3D Object Detection. Highlight: To incorporate object-level rotation equivariance into 3D object detectors, we need a mechanism to extract equivariant features with local object-level spatial support while being able to model cross-object context information. To this end, we propose Equivariant Object detection Network (EON) with a rotation equivariance suspension design to achieve object-level equivariance. |
Hong-Xing Yu; Jiajun Wu; Li Yi; | code |
17 | Accelerating DETR Convergence Via Semantic-Aligned Matching. Highlight: We observe that the slow convergence is largely attributed to the complication in matching object queries with target features in different feature embedding spaces. This paper presents SAM-DETR, a Semantic-Aligned-Matching DETR that greatly accelerates DETR’s convergence without sacrificing its accuracy. |
Gongjie Zhang; Zhipeng Luo; Yingchen Yu; Kaiwen Cui; Shijian Lu; | code |
18 | Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification. Highlight: However, synthesized persons in existing datasets are mostly cartoon-like and in random dress collocation, which limits their performance. To address this, in this work, an automatic approach is proposed to directly clone the whole outfits from real-world person images to virtual 3D characters, such that any virtual person thus created will appear very similar to its real-world counterpart. |
Yanan Wang; Xuezhi Liang; Shengcai Liao; | code |
19 | GeoNeRF: Generalizing NeRF With Geometry Priors. Highlight: We present GeoNeRF, a generalizable photorealistic novel view synthesis method based on neural radiance fields. |
Mohammad Mahdi Johari; Yann Lepoittevin; François Fleuret; | code |
20 | ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo. Highlight: In this paper, we propose a novel adaptive blend pyramid network, which aims to achieve fast local retouching on ultra high-resolution photos. |
Biwen Lei; Xiefan Guo; Hongyu Yang; Miaomiao Cui; Xuansong Xie; Di Huang; | code |
21 | Expanding Low-Density Latent Regions for Open-Set Object Detection. Highlight: In this work, we propose to identify unknown objects by separating high/low-density regions in the latent space, based on the consensus that unknown objects are usually distributed in low-density latent regions. |
Jiaming Han; Yuqiang Ren; Jian Ding; Xingjia Pan; Ke Yan; Gui-Song Xia; | code |
22 | Uformer: A General U-Shaped Transformer for Image Restoration. Highlight: In this paper, we present Uformer, an effective and efficient Transformer-based architecture for image restoration, in which we build a hierarchical encoder-decoder network using the Transformer block. |
Zhendong Wang; Xiaodong Cun; Jianmin Bao; Wengang Zhou; Jianzhuang Liu; Houqiang Li; | code |
23 | Exploring Dual-Task Correlation for Pose Guided Person Image Generation. Highlight: Most of the existing methods only focus on the ill-posed source-to-target task and fail to capture reasonable texture mapping. To address this problem, we propose a novel Dual-task Pose Transformer Network (DPTN), which introduces an auxiliary task (i.e., source-to-source task) and exploits the dual-task correlation to promote the performance of PGPIG. |
Pengze Zhang; Lingxiao Yang; Jian-Huang Lai; Xiaohua Xie; | code |
24 | Portrait Eyeglasses and Shadow Removal By Leveraging 3D Synthetic Data. Highlight: In this paper, we propose a novel framework to remove eyeglasses as well as their cast shadows from face images. |
Junfeng Lyu; Zhibo Wang; Feng Xu; | code |
25 | Modeling 3D Layout for Group Re-Identification. Highlight: However, layout ambiguity is introduced because these methods only consider the 2D layout on the imaging plane. In this paper, we overcome the above limitations by 3D layout modeling. |
Quan Zhang; Kaiheng Dang; Jian-Huang Lai; Zhanxiang Feng; Xiaohua Xie; | code |
26 | Toward Fast, Flexible, and Robust Low-Light Image Enhancement. Highlight: In this paper, we develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening of images in real-world low-light scenarios. |
Long Ma; Tengyu Ma; Risheng Liu; Xin Fan; Zhongxuan Luo; | code |
27 | Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos. Highlight: In this paper, we propose a prompt-based framework, Bridge-Prompt (Br-Prompt), to model the semantics across adjacent actions, so that it simultaneously exploits both out-of-context and contextual information from a series of ordinal actions in instructional videos. |
Muheng Li; Lei Chen; Yueqi Duan; Zhilan Hu; Jianjiang Feng; Jie Zhou; Jiwen Lu; | code |
28 | HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network. Highlight: Thus, in this work, we propose HandOccNet, a novel 3D hand mesh estimation network that fully exploits the information in occluded regions as a secondary means to enhance image features and make them much richer. |
JoonKyu Park; Yeonguk Oh; Gyeongsik Moon; Hongsuk Choi; Kyoung Mu Lee; | code |
29 | Modular Action Concept Grounding in Semantic Video Prediction. Highlight: Inspired by the idea of Mixture of Experts, we embody each abstract label by a structured combination of various visual concept learners and propose a novel video prediction model, Modular Action Concept Network (MAC). |
Wei Yu; Wenxin Chen; Songheng Yin; Steve Easterbrook; Animesh Garg; | code |
30 | StyleSwin: Transformer-Based GAN for High-Resolution Image Generation. Highlight: In this paper, we seek to explore using pure transformers to build a generative adversarial network for high-resolution image synthesis. |
Bowen Zhang; Shuyang Gu; Bo Zhang; Jianmin Bao; Dong Chen; Fang Wen; Yong Wang; Baining Guo; | code |
31 | Discrete Cosine Transform Network for Guided Depth Map Super-Resolution. Highlight: To address the challenges of interpreting the working mechanism, extracting cross-modal features, and avoiding over-transferred RGB texture, we propose a novel Discrete Cosine Transform Network (DCTNet) to alleviate these problems from three aspects. |
Zixiang Zhao; Jiangshe Zhang; Shuang Xu; Zudi Lin; Hanspeter Pfister; | code |
32 | Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing. Highlight: In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. |
Xiaoxue Chen; Tianyu Liu; Hao Zhao; Guyue Zhou; Ya-Qin Zhang; | code |
33 | TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization. Highlight: The dominant CNN-based methods for cross-view image geo-localization rely on polar transform and fail to model global correlation. We propose a pure transformer-based approach (TransGeo) to address these limitations from a different perspective. |
Sijie Zhu; Mubarak Shah; Chen Chen; | code |
34 | Contrastive Boundary Learning for Point Cloud Segmentation. Highlight: In this paper, we focus on the segmentation of scene boundaries. |
Liyao Tang; Yibing Zhan; Zhe Chen; Baosheng Yu; Dacheng Tao; | code |
35 | Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution. Highlight: In this paper, we demonstrate that it is possible to train a GAN-based SISR model which can stably generate perceptually realistic details while inhibiting visual artifacts. |
Jie Liang; Hui Zeng; Lei Zhang; | code |
36 | CVNet: Contour Vibration Network for Building Extraction. Highlight: Inspired by the physical vibration theory, we propose a contour vibration network (CVNet) for automatic building boundary delineation. |
Ziqiang Xu; Chunyan Xu; Zhen Cui; Xiangwei Zheng; Jian Yang; | code |
37 | Swin Transformer V2: Scaling Up Capacity and Resolution. Highlight: We present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536×1,536 resolution. |
Ze Liu; Han Hu; Yutong Lin; Zhuliang Yao; Zhenda Xie; Yixuan Wei; Jia Ning; Yue Cao; Zheng Zhang; Li Dong; Furu Wei; Baining Guo; | code |
38 | Projective Manifold Gradient Layer for Deep Rotation Regression. Highlight: In this paper, we propose a manifold-aware gradient that directly backpropagates into deep network weights. |
Jiayi Chen; Yingda Yin; Tolga Birdal; Baoquan Chen; Leonidas J. Guibas; He Wang; | code |
39 | HCSC: Hierarchical Contrastive Selective Coding. Highlight: In addition, the negative pairs used in these methods are not guaranteed to be semantically distinct, which could further hamper the structural correctness of learned image representations. To tackle these limitations, we propose a novel contrastive learning framework called Hierarchical Contrastive Selective Coding (HCSC). |
Yuanfan Guo; Minghao Xu; Jiawen Li; Bingbing Ni; Xuanyu Zhu; Zhenbang Sun; Yi Xu; | code |
40 | TransRank: Self-Supervised Video Representation Learning Via Ranking-Based Transformation Recognition. Highlight: Based on hard-label classification, existing RecogTrans approaches suffer from noisy supervision signals in pre-training. To mitigate this problem, we developed TransRank, a unified framework for recognizing Transformations in a Ranking formulation. |
Haodong Duan; Nanxuan Zhao; Kai Chen; Dahua Lin; | code |
41 | DiSparse: Disentangled Sparsification for Multitask Model Compression. Highlight: In this paper, we propose DiSparse, a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme. |
Xinglong Sun; Ali Hassani; Zhangyang Wang; Gao Huang; Humphrey Shi; | code |
42 | Pushing The Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make A Difference. Highlight: We seek to push the limits of a simple-but-effective pipeline for real-world few-shot image classification in practice. To this end, we explore few-shot learning from the perspective of neural architecture, as well as a three-stage pipeline of pre-training on external data, meta-training with labelled few-shot tasks, and task-specific fine-tuning on unseen tasks. |
Shell Xu Hu; Da Li; Jan Stühmer; Minyoung Kim; Timothy M. Hospedales; | code |
43 | Towards Efficient and Scalable Sharpness-Aware Minimization. Highlight: In this paper, we propose LookSAM, a novel algorithm that only periodically calculates the inner gradient ascent, to significantly reduce the additional training cost of SAM. |
Yong Liu; Siqi Mai; Xiangning Chen; Cho-Jui Hsieh; Yang You; | code |
44 | OSSO: Obtaining Skeletal Shape From Outside. Highlight: We address the problem of inferring the anatomic skeleton of a person, in an arbitrary pose, from the 3D surface of the body; i.e. we predict the inside (bones) from the outside (skin). |
Marilyn Keller; Silvia Zuffi; Michael J. Black; Sergi Pujades; | code |
45 | A Study on The Distribution of Social Biases in Self-Supervised Learning Visual Models. Highlight: In this paper, we study the biases of a varied set of SSL visual models, trained using ImageNet data, using a method and dataset designed by psychological experts to measure social biases. |
Kirill Sirotkin; Pablo Carballeira; Marcos Escudero-Viñolo; | code |
46 | Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes. Highlight: In this paper, instead of following previous literature, we propose Self-Supervised Predictive Learning (SSPL), a negative-free method for sound localization via explicit positive mining. |
Zengjie Song; Yuxi Wang; Junsong Fan; Tieniu Tan; Zhaoxiang Zhang; | code |
47 | Comparing Correspondences: Video Prediction With Correspondence-Wise Losses. Highlight: Image prediction methods often struggle on tasks that require changing the positions of objects, such as video prediction, producing blurry images that average over the many positions that objects might occupy. In this paper, we propose a simple change to existing image similarity metrics that makes them more robust to positional errors: we match the images using optical flow, then measure the visual similarity of corresponding pixels. |
Daniel Geng; Max Hamilton; Andrew Owens; | code |
48 | Towards Fewer Annotations: Active Learning Via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation. Highlight: In this paper, we propose a simple region-based active learning approach for semantic segmentation under a domain shift, aiming to automatically query a small partition of image regions to be labeled while maximizing segmentation performance. |
Binhui Xie; Longhui Yuan; Shuang Li; Chi Harold Liu; Xinjing Cheng; | code |
49 | CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding. Highlight: We observe in the real world that humans are capable of mapping the visual concepts learnt from 2D images to understand the 3D world. Encouraged by this insight, we propose CrossPoint, a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations. |
Mohamed Afham; Isuru Dissanayake; Dinithi Dissanayake; Amaya Dharmasiri; Kanchana Thilakarathna; Ranga Rodrigo; | code |
50 | Few Shot Generative Model Adaption Via Relaxed Spatial Structural Alignment. Highlight: However, existing methods are prone to model overfitting and collapse in the extremely few-shot setting (fewer than 10). To solve this problem, we propose a relaxed spatial structural alignment (RSSA) method to calibrate the target generative models during the adaption. |
Jiayu Xiao; Liang Li; Chaofei Wang; Zheng-Jun Zha; Qingming Huang; | code |
51 | Enhancing Adversarial Training With Second-Order Statistics of Weights. Highlight: In this paper, we show that treating model weights as random variables allows for enhancing adversarial training through Second-Order Statistics Optimization (S^2O) with respect to the weights. |
Gaojie Jin; Xinping Yi; Wei Huang; Sven Schewe; Xiaowei Huang; | code |
52 | Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo. Highlight: Our findings motivate us to simplify MoCo v2 via the removal of its dictionary as well as momentum. |
Chaoning Zhang; Kang Zhang; Trung X. Pham; Axi Niu; Zhinan Qiao; Chang D. Yoo; In So Kweon; | code |
53 | Moving Window Regression: A Novel Approach to Ordinal Regression. Highlight: A novel ordinal regression algorithm, called moving window regression (MWR), is proposed in this paper. |
Nyeong-Ho Shin; Seon-Ho Lee; Chang-Su Kim; | code |
54 | Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection. Highlight: Different from related methods, we propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block. |
Nicolae-Cătălin Ristea; Neelu Madan; Radu Tudor Ionescu; Kamal Nasrollahi; Fahad Shahbaz Khan; Thomas B. Moeslund; Mubarak Shah; | code |
55 | Robust Optimization As Data Augmentation for Large-Scale Graphs. Highlight: We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training. |
Kezhi Kong; Guohao Li; Mucong Ding; Zuxuan Wu; Chen Zhu; Bernard Ghanem; Gavin Taylor; Tom Goldstein; | code |
56 | Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients. Highlight: In contrast to the literature, we propose a family of robust structured declarative classifiers for point cloud classification, where the internal constrained optimization mechanism can effectively defend adversarial attacks through implicit gradients. |
Kaidong Li; Ziming Zhang; Cuncong Zhong; Guanghui Wang; | code |
57 | Improving The Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input. Highlight: However, prior works utilize simple image transformations such as resizing, which limits input diversity. To tackle this limitation, we propose the object-based diverse input (ODI) method that draws an adversarial image on a 3D object and induces the rendered image to be classified as the target class. |
Junyoung Byun; Seungju Cho; Myung-Joon Kwon; Hee-Seon Kim; Changick Kim; | code |
58 | ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer. Highlight: We present ObjectFolder 2.0, a large-scale, multisensory dataset of common household objects in the form of implicit neural representations that significantly enhances ObjectFolder 1.0 in three aspects. |
Ruohan Gao; Zilin Si; Yen-Yu Chang; Samuel Clarke; Jeannette Bohg; Li Fei-Fei; Wenzhen Yuan; Jiajun Wu; | code |
59 | 360MonoDepth: High-Resolution 360° Monocular Depth Estimation. Highlight: In this work, we propose a flexible framework for monocular depth estimation from high-resolution 360° images using tangent images. |
Manuel Rey-Area; Mingze Yuan; Christian Richardt; | code |
60 | POCO: Point Convolution for Surface Reconstruction. Highlight: Besides, relying on fixed patch sizes may require discretization tuning. To address these issues, we propose to use point cloud convolutions and compute latent vectors at each input point. |
Alexandre Boulch; Renaud Marlet; | code |
61 | Neural Texture Extraction and Distribution for Controllable Person Image Synthesis. Highlight: Observing that person images are highly structured, we propose to generate desired images by extracting and distributing semantic entities of reference images. |
Yurui Ren; Xiaoqing Fan; Ge Li; Shan Liu; Thomas H. Li; | code |
62 | Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs. Highlight: Today’s VidSGG models are all proposal-based methods, i.e., they first generate numerous paired subject-object snippets as proposals, and then conduct predicate classification for each proposal. In this paper, we argue that this prevalent proposal-based framework has three inherent drawbacks: 1) The ground-truth predicate labels for proposals are partially correct. |
Kaifeng Gao; Long Chen; Yulei Niu; Jian Shao; Jun Xiao; | code |
63 | DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis. Highlight: To these ends, we propose a simpler but more effective Deep Fusion Generative Adversarial Network (DF-GAN). |
Ming Tao; Hao Tang; Fei Wu; Xiao-Yuan Jing; Bing-Kun Bao; Changsheng Xu; | code |
64 | ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes. Highlight: In this paper, we take a step towards computer-aided waste detection and present the first in-the-wild industrial-grade waste detection and segmentation dataset, ZeroWaste. |
Dina Bashkirova; Mohamed Abdelfattah; Ziliang Zhu; James Akl; Fadi Alladkani; Ping Hu; Vitaly Ablavsky; Berk Calli; Sarah Adel Bargal; Kate Saenko; | code |
65 | UNIST: Unpaired Neural Implicit Shape Translation Network. Highlight: We introduce UNIST, the first deep neural implicit model for general-purpose, unpaired shape-to-shape translation, in both 2D and 3D domains. |
Qimin Chen; Johannes Merz; Aditya Sanghi; Hooman Shayani; Ali Mahdavi-Amiri; Hao Zhang; | code |
66 | APES: Articulated Part Extraction From Sprite Sheets. Highlight: Creating these puppets requires partitioning characters into independently moving parts. In this work, we present a method to automatically identify such articulated parts from a small set of character poses shown in a sprite sheet, which is an illustration of the character that artists often draw before puppet creation. |
Zhan Xu; Matthew Fisher; Yang Zhou; Deepali Aneja; Rushikesh Dudhat; Li Yi; Evangelos Kalogerakis; | code |
67 | SPAct: Self-Supervised Privacy Preservation for Action Recognition. Highlight: For the first time, we present a novel training framework which removes privacy information from input video in a self-supervised manner without requiring privacy labels. |
Ishan Rajendrakumar Dave; Chen Chen; Mubarak Shah; | code |
68 | De-Rendering 3D Objects in The Wild. Highlight: We present a weakly supervised method that is able to decompose a single image of an object into shape (depth and normals), material (albedo, reflectivity and shininess) and global lighting parameters. |
Felix Wimbauer; Shangzhe Wu; Christian Rupprecht; | code |
69 | Global Sensing and Measurements Reuse for Image Compressed Sensing. Highlight: Moreover, using measurements only once may not be enough for extracting richer information from measurements. To address these issues, we propose a novel Measurements Reuse Convolutional Compressed Sensing Network (MR-CCSNet), which employs a Global Sensing Module (GSM) to collect all-level features for efficient sensing and a Measurements Reuse Block (MRB) to reuse measurements multiple times at multiple scales. |
Zi-En Fan; Feng Lian; Jia-Ni Quan; | code |
70 | Practical Evaluation of Adversarial Robustness Via Adaptive Auto Attack. Highlight: A practical evaluation method should be convenient (i.e., parameter-free), efficient (i.e., fewer iterations) and reliable (i.e., approaching the lower bound of robustness). Towards this target, we propose a parameter-free Adaptive Auto Attack (A3) evaluation method which addresses the efficiency and reliability in a test-time-training fashion. |
Ye Liu; Yaya Cheng; Lianli Gao; Xianglong Liu; Qilong Zhang; Jingkuan Song; | code |
71 | Cross-View Transformers for Real-Time Map-View Semantic Segmentation. Highlight: We present cross-view transformers, an efficient attention-based model for map-view semantic segmentation from multiple cameras. |
Brady Zhou; Philipp Krähenbühl; | code |
72 | Controllable Dynamic Multi-Task Architectures. Highlight: This challenge motivates models which allow control over the relative importance of tasks and total compute cost during inference time. In this work, we propose such a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints. |
Dripta S. Raychaudhuri; Yumin Suh; Samuel Schulter; Xiang Yu; Masoud Faraki; Amit K. Roy-Chowdhury; Manmohan Chandraker; | code |
73 | FastDOG: Fast Discrete Optimization on GPU. Highlight: We present a massively parallel Lagrange decomposition method for solving 0–1 integer linear programs occurring in structured prediction. |
Ahmed Abbas; Paul Swoboda; | code |
74 | Focal and Global Knowledge Distillation for Detectors. Highlight: In this paper, we point out that in object detection, the features of the teacher and student vary greatly in different areas, especially in the foreground and background. |
Zhendong Yang; Zhe Li; Xiaohu Jiang; Yuan Gong; Zehuan Yuan; Danpei Zhao; Chun Yuan; | code |
75 | Learning To Prompt for Continual Learning. Highlight: Typical methods rely on a rehearsal buffer or known task identity at test time to retrieve learned knowledge and address forgetting, while this work presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time. |
Zifeng Wang; Zizhao Zhang; Chen-Yu Lee; Han Zhang; Ruoxi Sun; Xiaoqi Ren; Guolong Su; Vincent Perot; Jennifer Dy; Tomas Pfister; | code |
76 | Human Mesh Recovery From Multiple Shots. Highlight: However, the richness of data comes at the expense of fundamental challenges such as abrupt shot changes and close-up shots of actors with heavy truncation, which limits the applicability of existing 3D human understanding methods. In this paper, we address these limitations with the insight that while shot changes of the same scene incur a discontinuity between frames, the 3D structure of the scene still changes smoothly. |
Georgios Pavlakos; Jitendra Malik; Angjoo Kanazawa; | code |
77 | Convolution of Convolution: Let Kernels Spatially Collaborate. Highlight: In the biological visual pathway, especially the retina, neurons are tiled along spatial dimensions with the electrical coupling as their local association, while in a convolution layer, kernels are placed along the channel dimension singly. We propose Convolution of Convolution, associating kernels in a layer and letting them collaborate spatially. |
Rongzhen Zhao; Jian Li; Zhenzhi Wu; | code |
78 | Make It Move: Controllable Image-to-Video Generation With Text Descriptions. Highlight: The key challenges of the TI2V task lie both in aligning appearance and motion from different modalities, and in handling uncertainty in text descriptions. To address these challenges, we propose a Motion Anchor-based video GEnerator (MAGE) with an innovative motion anchor (MA) structure to store appearance-motion aligned representation. |
Yaosi Hu; Chong Luo; Zhenzhong Chen; | code |
79 | Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling. Highlight: In this paper, we propose Neural Points, a novel point cloud representation and apply it to the arbitrary-factored upsampling task. |
Wanquan Feng; Jin Li; Hongrui Cai; Xiaonan Luo; Juyong Zhang; | code |
80 | Video-Text Representation Learning Via Differentiable Weak Temporal Alignment. Highlight: In this paper, we propose a novel multi-modal self-supervised framework Video-Text Temporally Weak Alignment-based Contrastive Learning (VT-TWINS) to capture significant information from noisy and weakly correlated data using a variant of Dynamic Time Warping (DTW). |
Dohwan Ko; Joonmyung Choi; Juyeon Ko; Shinyeong Noh; Kyoung-Woon On; Eun-Sol Kim; Hyunwoo J. Kim; | code |
81 | Bi-Directional Object-Context Prioritization Learning for Saliency Ranking. Highlight: This inspires us to model the region-level interactions, in addition to the object-level reasoning, for saliency ranking. To this end, we propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking. |
Xin Tian; Ke Xu; Xin Yang; Lin Du; Baocai Yin; Rynson W.H. Lau; | code |
82 | Vehicle Trajectory Prediction Works, But Not Everywhere. Highlight: We present a novel method that automatically generates realistic scenes causing state-of-the-art models to go off-road. |
Mohammadhossein Bahari; Saeed Saadatnejad; Ahmad Rahimi; Mohammad Shaverdikondori; Amir Hossein Shahidzadeh; Seyed-Mohsen Moosavi-Dezfooli; Alexandre Alahi; | code |
83 | MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer. Highlight: Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but suffer from an additional computational burden and achieve limited performance due to inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. |
Kuan-Chih Huang; Tsung-Han Wu; Hung-Ting Su; Winston H. Hsu; | code |
84 | Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning. Highlight: This paper presents new hierarchically cascaded transformers that can improve data efficiency through attribute surrogates learning and spectral tokens pooling. |
Yangji He; Weihan Liang; Dongyang Zhao; Hong-Yu Zhou; Weifeng Ge; Yizhou Yu; Wenqiang Zhang; | code |
85 | Generalized Category Discovery. Highlight: In this paper, we consider a highly general image recognition setting wherein, given a labelled and unlabelled set of images, the task is to categorize all images in the unlabelled set. |
Sagar Vaze; Kai Han; Andrea Vedaldi; Andrew Zisserman; | code |
86 | Contour-Hugging Heatmaps for Landmark Detection. Highlight: We propose an effective and easy-to-implement method for simultaneously performing landmark detection in images and obtaining an ingenious uncertainty measurement for each landmark. |
James McCouat; Irina Voiculescu; | code |
87 | Voxel Field Fusion for 3D Object Detection. Highlight: In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion. |
Yanwei Li; Xiaojuan Qi; Yukang Chen; Liwei Wang; Zeming Li; Jian Sun; Jiaya Jia; | code |
88 | DisARM: Displacement Aware Relation Module for 3D Detection. Highlight: We introduce Displacement Aware Relation Module (DisARM), a novel neural network module for enhancing the performance of 3D object detection in point cloud scenes. |
Yao Duan; Chenyang Zhu; Yuqing Lan; Renjiao Yi; Xinwang Liu; Kai Xu; | code |
89 | MixFormer: Mixing Features Across Windows and Dimensions. Highlight: While local-window self-attention performs notably in vision tasks, it suffers from limited receptive field and weak modeling capability issues. This is mainly because it performs self-attention within non-overlapped windows and shares weights on the channel dimension. We propose MixFormer to find a solution. |
Qiang Chen; Qiman Wu; Jian Wang; Qinghao Hu; Tao Hu; Errui Ding; Jian Cheng; Jingdong Wang; | code |
90 | FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment. Highlight: Specifically, we propose to parse pairwise query and exemplar action instances into consecutive steps with diverse semantic and temporal correspondences. |
Jinglin Xu; Yongming Rao; Xumin Yu; Guangyi Chen; Jie Zhou; Jiwen Lu; | code |
91 | HEAT: Holistic Edge Attention Transformer for Structured Reconstruction. Highlight: This paper presents a novel attention-based neural network for structured reconstruction, which takes a 2D raster image as an input and reconstructs a planar graph depicting an underlying geometric structure. |
Jiacheng Chen; Yiming Qian; Yasutaka Furukawa; | code |
92 | Mobile-Former: Bridging MobileNet and Transformer. Highlight: We present Mobile-Former, a parallel design of MobileNet and transformer with a two-way bridge in between. |
Yinpeng Chen; Xiyang Dai; Dongdong Chen; Mengchen Liu; Xiaoyi Dong; Lu Yuan; Zicheng Liu; | code |
93 | CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision. Highlight: To address the difficulties, we propose a new framework for scribble learning-based medical image segmentation, which is composed of mix augmentation and cycle consistency and thus is referred to as CycleMix. |
Ke Zhang; Xiahai Zhuang; | code |
94 | VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution. Highlight: However, most of them only support a fixed up-sampling scale, which limits their flexibility and applications. In this work, instead of following the discrete representations, we propose Video Implicit Neural Representation (VideoINR), and we show its applications for STVSR. |
Zeyuan Chen; Yinbo Chen; Jingwen Liu; Xingqian Xu; Vidit Goel; Zhangyang Wang; Humphrey Shi; Xiaolong Wang; | code |
95 | Towards End-to-End Unified Scene Text Detection and Layout Analysis. Highlight: Scene text detection and document layout analysis have long been treated as two separate tasks in different image domains. In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis. |
Shangbang Long; Siyang Qin; Dmitry Panteleev; Alessandro Bissacco; Yasuhisa Fujii; Michalis Raptis; | code |
96 | AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation. Highlight: In this paper, we propose an autoregressive prior for 3D shapes to solve multimodal 3D tasks such as shape completion, reconstruction, and generation. |
Paritosh Mittal; Yen-Chi Cheng; Maneesh Singh; Shubham Tulsiani; | code |
97 | ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior. Highlight: In this work, we first show that optimal neural architectures in the DIP framework are image-dependent. Leveraging this insight, we then propose an image-specific NAS strategy for the DIP framework that requires substantially less training than typical NAS approaches, effectively enabling image-specific NAS. |
Metin Ersin Arican; Ozgur Kara; Gustav Bredell; Ender Konukoglu; | code |
98 | End-to-End Referring Video Object Segmentation With Multimodal Transformers. Highlight: In this paper, we propose a simple Transformer-based approach to RVOS. |
Adam Botach; Evgenii Zheltonozhskii; Chaim Baskin; | code |
99 | IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo. Highlight: We present IterMVS, a new data-driven method for high-resolution multi-view stereo. |
Fangjinhua Wang; Silvano Galliani; Christoph Vogel; Marc Pollefeys; | code |
100 | Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds. Highlight: In particular, the foreground points are inherently more important than background points for object detectors. Motivated by this, we propose a highly-efficient single-stage point-based 3D detector in this paper, termed IA-SSD. |
Yifan Zhang; Qingyong Hu; Guoquan Xu; Yanxin Ma; Jianwei Wan; Yulan Guo; | code |
101 | Detecting Camouflaged Object in Frequency Domain. Highlight: To effectively incorporate frequency clues into CNN models, we present a powerful network with two special components. |
Yijie Zhong; Bo Li; Lv Tang; Senyun Kuang; Shuang Wu; Shouhong Ding; | code |
102 | SelfRecon: Self Reconstruction Your Digital Avatar From Monocular Video. Highlight: We propose SelfRecon, a clothed human body reconstruction method that combines implicit and explicit representations to recover space-time coherent geometries from a monocular self-rotating human video. |
Boyi Jiang; Yang Hong; Hujun Bao; Juyong Zhang; | code |
103 | Equivariant Point Cloud Analysis Via Learning Orientations for Message Passing. Highlight: In this work, we propose a novel and simple framework to achieve equivariance for point cloud analysis based on the message passing (graph neural network) scheme. |
Shitong Luo; Jiahan Li; Jiaqi Guan; Yufeng Su; Chaoran Cheng; Jian Peng; Jianzhu Ma; | code |
104 | Node Representation Learning in Graph Via Node-to-Neighbourhood Mutual Information Maximization. Highlight: In this work, we present a simple-yet-effective self-supervised node representation learning strategy via directly maximizing the mutual information between the hidden representations of nodes and their neighbourhood, which can be theoretically justified by its link to graph smoothing. |
Wei Dong; Junsheng Wu; Yi Luo; Zongyuan Ge; Peng Wang; | code |
105 | Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction. Highlight: This is called inner-video overfitting, and it would actually lead to inferior performance. To tackle this issue, we propose a novel inter-frame feature reconstruction (IFR) technique to leverage the ground-truth labels to supervise the model training on unlabeled frames. |
Jiafan Zhuang; Zilei Wang; Yuan Gao; | code |
106 | Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization With A Bayesian Model. Highlight: This task is particularly challenging for deep neural networks because data is difficult to obtain and annotate. Therefore, we formulate amodal segmentation as an out-of-task and out-of-distribution generalization problem. |
Yihong Sun; Adam Kortylewski; Alan Yuille; | code |
107 | How Well Do Sparse ImageNet Models Transfer? Highlight: Generally, more accurate models on the "upstream" dataset tend to provide better transfer accuracy "downstream". In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset, which have been pruned, that is, compressed by sparsifying their connections. |
Eugenia Iofinova; Alexandra Peste; Mark Kurtz; Dan Alistarh; | code |
108 | REX: Reasoning-Aware and Grounded Explanation. Highlight: This paper aims to close the gap from three distinct perspectives: first, we define a new type of multi-modal explanations that explain the decisions by progressively traversing the reasoning process and grounding keywords in the images. |
Shi Chen; Qi Zhao; | code |
109 | Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes. Highlight: In this work, we disentangle the direct offset into Local Canonical Coordinates (LCC), box scales and box orientations. |
Yang You; Zelin Ye; Yujing Lou; Chengkun Li; Yong-Lu Li; Lizhuang Ma; Weiming Wang; Cewu Lu; | code |
110 | Object-Aware Video-Language Pre-Training for Retrieval. Highlight: In this work, we present Object-aware Transformers, an object-centric approach that extends video-language transformer to incorporate object representations. |
Jinpeng Wang; Yixiao Ge; Guanyu Cai; Rui Yan; Xudong Lin; Ying Shan; Xiaohu Qie; Mike Zheng Shou; | code |
111 | MAT: Mask-Aware Transformer for Large Hole Image Inpainting. Highlight: In this paper, we present a novel transformer-based model for large hole inpainting, which unifies the merits of transformers and convolutions to efficiently process high-resolution images. |
Wenbo Li; Zhe Lin; Kun Zhou; Lu Qi; Yi Wang; Jiaya Jia; | code |
112 | Align and Prompt: Video-and-Language Pre-Training With Entity Prompts. Highlight: We propose Align and Prompt: an efficient and effective video-and-language pre-training framework with better cross-modal alignment. |
Dongxu Li; Junnan Li; Hongdong Li; Juan Carlos Niebles; Steven C.H. Hoi; | code |
113 | MSG-Transformer: Exchanging Local Spatial Information By Manipulating Messenger Tokens. Highlight: This paper aims to alleviate the conflict between efficiency and flexibility, for which we propose a specialized token for each region that serves as a messenger (MSG). |
Jiemin Fang; Lingxi Xie; Xinggang Wang; Xiaopeng Zhang; Wenyu Liu; Qi Tian; | code |
114 | Cross Modal Retrieval With Querybank Normalisation. Highlight: Drawing inspiration from the NLP literature, we formulate a simple but effective framework called Querybank Normalisation (QB-Norm) that re-normalises query similarities to account for hubs in the embedding space. |
Simion-Vlad Bogolin; Ioana Croitoru; Hailin Jin; Yang Liu; Samuel Albanie; | code |
115 | Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization. Highlight: In this paper, we propose a novel monocular ray-based 3D (Ray3D) absolute human pose estimation method with a calibrated camera. |
Yu Zhan; Fenghai Li; Renliang Weng; Wongun Choi; | code |
116 | ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization. Highlight: However, this formulation typically treats snippets in a video as independent instances, ignoring the underlying temporal structures within and across action segments. To address this problem, we propose ASM-Loc, a novel WTAL framework that enables explicit, action-aware segment modeling beyond standard MIL-based methods. |
Bo He; Xitong Yang; Le Kang; Zhiyu Cheng; Xin Zhou; Abhinav Shrivastava; | code |
117 | Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs. Highlight: Following the guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31×31, in contrast to commonly used 3×3. |
Xiaohan Ding; Xiangyu Zhang; Jungong Han; Guiguang Ding; | code |
118 | End-to-End Multi-Person Pose Estimation With Transformers. Highlight: In this paper, we propose the first fully end-to-end multi-person Pose Estimation framework with TRansformers, termed PETR. |
Dahu Shi; Xing Wei; Liangqi Li; Ye Ren; Wenming Tan; | code |
119 | REGTR: End-to-End Point Cloud Correspondences With Transformers. Highlight: In this work, we conjecture that attention mechanisms can replace the role of explicit feature matching and RANSAC, and thus propose an end-to-end framework to directly predict the final set of correspondences. |
Zi Jian Yew; Gim Hee Lee; | code |
120 | Neural 3D Scene Reconstruction With The Manhattan-World Assumption. Highlight: In this work, we show that the planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods. |
Haoyu Guo; Sida Peng; Haotong Lin; Qianqian Wang; Guofeng Zhang; Hujun Bao; Xiaowei Zhou; | code |
121 | V2C: Visual Voice Cloning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, there also exist many scenarios that cannot be well reflected by these VC tasks, such as movie dubbing, which requires the speech to carry emotions consistent with the movie plot. To fill this gap, we propose a new task named Visual Voice Cloning (V2C), which seeks to convert a paragraph of text to speech with both a desired voice specified by a reference audio and a desired emotion specified by a reference video. |
Qi Chen; Mingkui Tan; Yuankai Qi; Jiaqiu Zhou; Yuanqing Li; Qi Wu; | code |
122 | Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we revisit the average precision (AP) loss and reveal that the crucial element is that of selecting the ranking pairs between positive and negative samples. |
Dongli Xu; Jinhong Deng; Wen Li; | code |
123 | MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present MAD (Movie Audio Descriptions), a novel benchmark that departs from the paradigm of augmenting existing video datasets with text annotations and focuses on crawling and aligning available audio descriptions of mainstream movies. |
Mattia Soldan; Alejandro Pardo; Juan León Alcázar; Fabian Caba; Chen Zhao; Silvio Giancola; Bernard Ghanem; | code |
124 | Gait Recognition in The Wild With Dense 3D Representations and A Benchmark Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In particular, we propose a novel framework to explore the 3D Skinned Multi-Person Linear (SMPL) model of the human body for gait recognition, named SMPLGait. |
Jinkai Zheng; Xinchen Liu; Wu Liu; Lingxiao He; Chenggang Yan; Tao Mei; | code |
125 | ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation Via Online Exploration and Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, constructing both valid and diverse hand-object interactions and efficiently learning from the vast synthetic data is still challenging. To address the above issues, we propose ArtiBoost, a lightweight online data enhancement method. |
Lixin Yang; Kailin Li; Xinyu Zhan; Jun Lv; Wenqiang Xu; Jiefeng Li; Cewu Lu; | code |
126 | QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To get the best of two worlds, we propose QueryDet that uses a novel query mechanism to accelerate the inference speed of feature-pyramid based object detectors. |
Chenhongyi Yang; Zehao Huang; Naiyan Wang; | code |
127 | IDEA-Net: Dynamic 3D Point Cloud Interpolation Via Deep Embedding Alignment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To tackle the challenges, we propose IDEA-Net, an end-to-end deep learning framework, which disentangles the problem under the assistance of the explicitly learned temporal consistency. |
Yiming Zeng; Yue Qian; Qijian Zhang; Junhui Hou; Yixuan Yuan; Ying He; | code |
128 | BEHAVE: Dataset and Method for Tracking Human Object Interactions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our key insight is to predict correspondences from the human and the object to a statistical body model to obtain human-object contacts during interactions. |
Bharat Lal Bhatnagar; Xianghui Xie; Ilya A. Petrov; Cristian Sminchisescu; Christian Theobalt; Gerard Pons-Moll; | code |
129 | Revisiting Random Channel Pruning for Neural Network Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we try to determine the channel configuration of the pruned models by random search. |
Yawei Li; Kamil Adamczewski; Wen Li; Shuhang Gu; Radu Timofte; Luc Van Gool; | code |
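A minimal sketch of what random search over channel configurations could look like; the parameter-budget proxy and the placeholder evaluate function are assumptions, and in practice evaluation would involve fine-tuning and validating each pruned network:

```python
# Illustrative random search over per-layer channel widths under a budget.
import random

base_widths = [64, 128, 256, 512]                  # channels of the unpruned layers
budget = 0.5 * sum(w * w for w in base_widths)     # crude parameter-count proxy

def sample_config():
    # Randomly keep 20%-100% of each layer's channels (at least 8).
    return [max(8, int(w * random.uniform(0.2, 1.0))) for w in base_widths]

def cost(cfg):
    return sum(w * w for w in cfg)

def evaluate(cfg):
    # Placeholder score; replace with validation accuracy of the pruned network.
    return -abs(cost(cfg) - budget)

best, best_score = None, float("-inf")
for _ in range(1000):
    cfg = sample_config()
    if cost(cfg) <= budget and evaluate(cfg) > best_score:
        best, best_score = cfg, evaluate(cfg)
print(best)
```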
130 | Generating Diverse and Natural 3D Human Motions From Text Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Instead of directly engaging with pose sequences, we propose motion snippet code as our internal motion representation, which captures local semantic motion contexts and is empirically shown to facilitate the generation of plausible motions faithful to the input text. |
Chuan Guo; Shihao Zou; Xinxin Zuo; Sen Wang; Wei Ji; Xingyu Li; Li Cheng; | code |
131 | E-CIR: Event-Enhanced Continuous Intensity Recovery Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents E-CIR, which converts a blurry image into a sharp video represented as a parametric function from time to intensity. |
Chen Song; Qixing Huang; Chandrajit Bajaj; | code |
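To illustrate the representation named in the highlight, here is a tiny sketch of a per-pixel parametric intensity function (a polynomial in time; the degree and shapes are assumptions) that can be evaluated at arbitrary timestamps to render sharp frames:

```python
# Illustrative only: evaluate a per-pixel polynomial intensity function at time t.
import torch

coeffs = torch.randn(1, 4, 64, 64)   # per-pixel cubic coefficients (B, degree+1, H, W)

def render(coeffs, t):
    powers = torch.stack([t ** i for i in range(coeffs.shape[1])])   # [1, t, t^2, t^3]
    return (coeffs * powers.view(1, -1, 1, 1)).sum(dim=1)            # (B, H, W) intensities

print(render(coeffs, torch.tensor(0.5)).shape)  # torch.Size([1, 64, 64])
```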
132 | Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A systematic evaluation of key modules in existing methods is performed in terms of their robustness against adversarial attacks. From the insights of our analysis, we construct a more robust deraining method by integrating these effective modules. |
Yi Yu; Wenhan Yang; Yap-Peng Tan; Alex C. Kot; | code |
133 | Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a keypoint-based object-level SLAM framework that can provide globally consistent 6DoF pose estimates for symmetric and asymmetric objects alike. |
Nathaniel Merrill; Yuliang Guo; Xingxing Zuo; Xinyu Huang; Stefan Leutenegger; Xi Peng; Liu Ren; Guoquan Huang; | code |
134 | AziNorm: Exploiting The Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Point cloud, the most important data format for 3D environmental perception, is naturally endowed with strong radial symmetry. In this work, we exploit this radial symmetry via a divide-and-conquer strategy to boost 3D perception performance and ease optimization. |
Shaoyu Chen; Xinggang Wang; Tianheng Cheng; Wenqiang Zhang; Qian Zhang; Chang Huang; Wenyu Liu; | code |
135 | Weakly Supervised Rotation-Invariant Aerial Object Detection Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Meanwhile, current solutions are prone to producing unstable detectors, as they ignore lower-scored instances and may regard them as background. To address these issues, we construct a novel end-to-end weakly supervised Rotation-Invariant aerial object detection Network (RINet). |
Xiaoxu Feng; Xiwen Yao; Gong Cheng; Junwei Han; | code |
136 | Surface Reconstruction From Point Clouds By Learning Predictive Context Priors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, this requires the local context prior to generalize to a wide variety of unseen target regions, which is hard to achieve. To resolve this issue, we introduce Predictive Context Priors by learning Predictive Queries for each specific point cloud at inference time. |
Baorui Ma; Yu-Shen Liu; Matthias Zwicker; Zhizhong Han; | code |
137 | IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, our intuition is that the long-range attention learned by transformer architectures is ideally suited to solve longstanding challenges in single-image inverse rendering. |
Rui Zhu; Zhengqin Li; Janarbek Matai; Fatih Porikli; Manmohan Chandraker; | code |
138 | DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To that end, we propose the DynamicEarthNet dataset that consists of daily, multi-spectral satellite observations of 75 selected areas of interest distributed over the globe with imagery from Planet Labs. |
Aysim Toker; Lukas Kondmann; Mark Weber; Marvin Eisenberger; Andrés Camero; Jingliang Hu; Ariadna Pregel Hoderlein; Çağlar Şenaras; Timothy Davis; Daniel Cremers; Giovanni Marchisio; Xiao Xiang Zhu; Laura Leal-Taixé; | code |
139 | Weakly Supervised Temporal Action Localization Via Representative Snippet Knowledge Propagation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Many existing methods seek to generate pseudo labels for bridging the discrepancy between classification and localization, but usually only make use of limited contextual information for pseudo label generation. To alleviate this problem, we propose a representative snippet summarization and propagation framework. |
Linjiang Huang; Liang Wang; Hongsheng Li; | code |
140 | E2EC: An End-to-End Contour-Based Method for High-Quality High-Speed Instance Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a novel contour-based method, named E2EC, for high-quality instance segmentation. |
Tao Zhang; Shiqing Wei; Shunping Ji; | code |
141 | BatchFormer: Learning To Explore Sample Relationships for Robust Representation Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the above-mentioned issues, a variety of methods have been devised to explore the sample relationships in a vanilla way (i.e., from the perspectives of either the input or the loss function), failing to exploit the internal structure of deep neural networks for learning with sample relationships. Inspired by this, we propose to equip deep neural networks themselves with the ability to learn sample relationships from each mini-batch. |
Zhi Hou; Baosheng Yu; Dacheng Tao; | code |
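A simplified sketch of the idea: insert a transformer that attends across the samples of a mini-batch so the network itself can model sample relationships (layer sizes and placement are illustrative, not the paper's exact architecture):

```python
# Illustrative sketch: attention across the batch dimension of pooled features.
import torch
import torch.nn as nn

batch_former = nn.TransformerEncoderLayer(d_model=128, nhead=4)  # expects (seq, batch, dim)

feats = torch.randn(32, 128)                 # (batch, feature) from any backbone
# Treat the batch as the sequence dimension: (seq_len=batch_size, batch=1, dim).
mixed = batch_former(feats.unsqueeze(1)).squeeze(1)
print(mixed.shape)                           # torch.Size([32, 128])
```

A practical consideration with any such batch-level module is that test-time predictions should not depend on which other samples happen to be in the batch, so it would typically be designed to be removable or bypassed at inference.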
142 | Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the classifier focuses only on the discriminative regions while ignoring other useful information in each image, resulting in incomplete localization maps. To address this issue, we propose a Self-supervised Image-specific Prototype Exploration (SIPE) that consists of an Image-specific Prototype Exploration (IPE) and a General-Specific Consistency (GSC) loss. |
Qi Chen; Lingxiao Yang; Jian-Huang Lai; Xiaohua Xie; | code |
143 | Learning Multi-View Aggregation in The Wild for Large-Scale 3D Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In contrast, we propose an end-to-end trainable multi-view aggregation model leveraging the viewing conditions of 3D points to merge features from images taken at arbitrary positions. |
Damien Robert; Bruno Vallet; Loic Landrieu; | code |
144 | PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: These methods can be negatively influenced by strong illumination conditions causing shading-reflectance leakages. Therefore, in this paper, an end-to-end edge-driven hybrid CNN approach is proposed for intrinsic image decomposition. |
Partha Das; Sezer Karaoglu; Theo Gevers; | code |
145 | Clothes-Changing Person Re-Identification With RGB Modality Only Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Clothes-based Adversarial Loss (CAL) to mine clothes-irrelevant features from the original RGB images by penalizing the predictive power of re-id model w.r.t. clothes. |
Xinqian Gu; Hong Chang; Bingpeng Ma; Shutao Bai; Shiguang Shan; Xilin Chen; | code |
146 | Robust Image Forgery Detection Over Online Social Network Shared Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To fight against the OSN-shared forgeries, in this work, a novel robust training scheme is proposed. |
Haiwei Wu; Jiantao Zhou; Jinyu Tian; Jun Liu; | code |
147 | Representation Compensation Networks for Continual Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we study the continual semantic segmentation problem, where the deep neural networks are required to incorporate new classes continually without catastrophic forgetting. |
Chang-Bin Zhang; Jia-Wen Xiao; Xialei Liu; Ying-Cong Chen; Ming-Ming Cheng; | code |
148 | Tracking People By Predicting 3D Appearance, Location and Pose Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present an approach for tracking people in monocular videos by predicting their future 3D representations. |
Jathushan Rajasegaran; Georgios Pavlakos; Angjoo Kanazawa; Jitendra Malik; | code |
149 | Text2Mesh: Text-Driven Neural Stylization for Meshes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we develop intuitive controls for editing the style of 3D objects. |
Oscar Michel; Roi Bar-On; Richard Liu; Sagie Benaim; Rana Hanocka; | code |
150 | C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we find that there are two main challenges for medical images in WSSS: i) the boundary between object foreground and background is not clear; ii) the co-occurrence phenomenon is very severe in the training stage. We thus propose a Causal CAM (C-CAM) method to overcome these challenges. |
Zhang Chen; Zhiqiang Tian; Jihua Zhu; Ce Li; Shaoyi Du; | code |
151 | Forward Compatible Few-Shot Class-Incremental Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: By contrast, we suggest learning prospectively to prepare for future updates, and propose ForwArd Compatible Training (FACT) for FSCIL. |
Da-Wei Zhou; Fu-Yun Wang; Han-Jia Ye; Liang Ma; Shiliang Pu; De-Chuan Zhan; | code |
152 | Weakly Supervised Object Localization As Domain Adaption Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the MIL mechanism makes CAM only activate discriminative object parts rather than the whole object, weakening its performance for localizing objects. To avoid this problem, this work provides a novel perspective that models WSOL as a domain adaption (DA) task, where the score estimator trained on the source/image domain is tested on the target/pixel domain to locate objects. |
Lei Zhu; Qi She; Qian Chen; Yunfei You; Boyu Wang; Yanye Lu; | code |
153 | Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose the Tencent-MVSE dataset, which is the first benchmark dataset for the multi-modal video similarity evaluation task. |
Zhaoyang Zeng; Yongsheng Luo; Zhenhua Liu; Fengyun Rao; Dian Li; Weidong Guo; Zhen Wen; | code |
154 | Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Meanwhile, recent advances in the functional map framework allow to enforce orientation preservation using a functional representation for tangent vector field transfer, through so-called complex functional maps. Using this representation, we propose a new deep learning approach to learn orientation-aware features in a fully unsupervised setting. |
Nicolas Donati; Etienne Corman; Maks Ovsjanikov; | code |
155 | Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel tree energy loss for SASS by providing semantic guidance for unlabeled pixels. |
Zhiyuan Liang; Tiancai Wang; Xiangyu Zhang; Jian Sun; Jianbing Shen; | code |
156 | MatteFormer: Transformer-Based Image Matting Via Prior-Tokens Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a transformer-based image matting model called MatteFormer, which takes full advantage of trimap information in the transformer block. |
GyuTae Park; SungJoon Son; JaeYoung Yoo; SeHo Kim; Nojun Kwak; | code |
157 | Video Shadow Detection Via Spatio-Temporal Interpolation Consistency Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Applying a model trained on labeled images directly to video frames may lead to high generalization error and temporally inconsistent results. In this paper, we address these challenges by proposing a Spatio-Temporal Interpolation Consistency Training (STICT) framework to rationally feed the unlabeled video frames together with the labeled images into the training of an image shadow detection network. |
Xiao Lu; Yihong Cao; Sheng Liu; Chengjiang Long; Zipei Chen; Xuanyu Zhou; Yimin Yang; Chunxia Xiao; | code |
158 | Robust and Accurate Superquadric Recovery: A Probabilistic Approach Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The superquadric recovery is formulated as a Maximum Likelihood Estimation (MLE) problem. We propose an algorithm, Expectation, Maximization, and Switching (EMS), to solve this problem, where: (1) outliers are predicted from the posterior perspective; (2) the superquadric parameter is optimized by the trust-region reflective algorithm; and (3) local optima are avoided by globally searching and switching among parameters encoding similar superquadrics. |
Weixiao Liu; Yuwei Wu; Sipu Ruan; Gregory S. Chirikjian; | code |
159 | Grounding Answers for Visual Questions Asked By Visually Impaired People Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce the VizWiz-VQA-Grounding dataset, the first dataset that visually grounds answers to visual questions asked by people with visual impairments. |
Chongyan Chen; Samreen Anjum; Danna Gurari; | code |
160 | Sparse Instance Activation for Real-Time Instance Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation. |
Tianheng Cheng; Xinggang Wang; Shaoyu Chen; Wenqiang Zhang; Qian Zhang; Chang Huang; Zhaoxiang Zhang; Wenyu Liu; | code |
161 | VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose VisualGPT, which employs a novel self-resurrecting encoder-decoder attention mechanism to quickly adapt the PLM with a small amount of in-domain image-text data. |
Jun Chen; Han Guo; Kai Yi; Boyang Li; Mohamed Elhoseiny; | code |
162 | MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, those works ignore the fact that it is an inverse problem where multiple feasible solutions (i.e., hypotheses) exist. To relieve this limitation, we propose a Multi-Hypothesis Transformer (MHFormer) that learns spatio-temporal representations of multiple plausible pose hypotheses. |
Wenhao Li; Hong Liu; Hao Tang; Pichao Wang; Luc Van Gool; | code |
163 | Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a new method for reconstructing controllable implicit 3D human models from sparse multi-view RGB videos. |
Tianhan Xu; Yasuhiro Fujita; Eiichi Matsumoto; | code |
164 | Towards Implicit Text-Guided 3D Shape Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we explore the challenging task of generating 3D shapes from text. |
Zhengzhe Liu; Yi Wang; Xiaojuan Qi; Chi-Wing Fu; | code |
165 | SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose SoftCollage, a novel method that employs a neural-based differentiable probabilistic tree generator to produce the probability distribution of correlation-preserving collage tree conditioned on deep image feature, aspect ratio and canvas size. |
Jiahao Yu; Li Chen; Mingrui Zhang; Mading Li; | code |
166 | Query and Attention Augmentation for Knowledge-Based Explainable Reasoning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To bridge this research gap, we present Query and Attention Augmentation, a general approach that augments neural module networks to jointly reason about visual and external knowledge. |
Yifeng Zhang; Ming Jiang; Qi Zhao; | code |
167 | Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel task and dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning, which we call Winoground. |
Tristan Thrush; Ryan Jiang; Max Bartolo; Amanpreet Singh; Adina Williams; Douwe Kiela; Candace Ross; | code |
168 | Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The main challenge of this task is perceiving various temporal variations of diverse event boundaries. To this end, this paper presents an effective and end-to-end learnable framework (DDM-Net). |
Jiaqi Tang; Zhaoyang Liu; Chen Qian; Wayne Wu; Limin Wang; | code |
169 | Fine-Grained Object Classification Via Self-Supervised Pose Alignment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For discounting pose variations, this paper proposes to learn a novel graph based object representation to reveal a global configuration of local parts for self-supervised pose alignment across classes, which is employed as an auxiliary feature regularization on a deep representation learning network. |
Xuhui Yang; Yaowei Wang; Ke Chen; Yong Xu; Yonghong Tian; | code |
170 | Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, existing animal behavior datasets have limitations in multiple aspects, including limited numbers of animal classes, data samples and provided tasks, and also limited variations in environmental conditions and viewpoints. To address these limitations, we create a large and diverse dataset, Animal Kingdom, that provides multiple annotated tasks to enable a more thorough understanding of natural animal behaviors. |
Xun Long Ng; Kian Eng Ong; Qichen Zheng; Yun Ni; Si Yong Yeo; Jun Liu; | code |
171 | Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in WSAL and helps identify coherent action instances. |
Junyu Gao; Mengyuan Chen; Changsheng Xu; | code |
172 | Relieving Long-Tailed Instance Segmentation Via Pairwise Class Balance Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we explore exploiting the confusion matrix, which carries fine-grained misclassification details, to relieve pairwise biases, generalizing the coarse one. |
Yin-Yin He; Peizhen Zhang; Xiu-Shen Wei; Xiangyu Zhang; Jian Sun; | code |
173 | Online Convolutional Re-Parameterization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present online convolutional re-parameterization (OREPA), a two-stage pipeline, aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution. |
Mu Hu; Junyi Feng; Jiashen Hua; Baisheng Lai; Jianqiang Huang; Xiaojin Gong; Xian-Sheng Hua; | code |
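The re-parameterization arithmetic behind this family of methods can be illustrated with a short sketch: two parallel convolution branches (3×3 and 1×1) collapse into a single equivalent 3×3 convolution. OREPA performs a more elaborate, online version of such squeezing during training; the code below shows only the basic offline fusion, with illustrative shapes:

```python
# Illustrative fusion of parallel 3x3 + 1x1 conv branches into one 3x3 conv.
import torch
import torch.nn.functional as F

conv3 = torch.nn.Conv2d(8, 16, 3, padding=1, bias=True)
conv1 = torch.nn.Conv2d(8, 16, 1, bias=True)

# Pad the 1x1 kernel to 3x3, then add kernels and biases to get one equivalent conv.
fused_w = conv3.weight + F.pad(conv1.weight, [1, 1, 1, 1])
fused_b = conv3.bias + conv1.bias

x = torch.randn(2, 8, 32, 32)
branch_sum = conv3(x) + conv1(x)
fused_out = F.conv2d(x, fused_w, fused_b, padding=1)
print(torch.allclose(branch_sum, fused_out, atol=1e-5))  # True: identical outputs
```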
174 | Mimicking The Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We find that, with fewer training classes, the data representations of each class lie in a long and narrow region; with more training classes, the representations of each class scatter more uniformly. Inspired by this observation, we propose Class-wise Decorrelation (CwD) that effectively regularizes representations of each class to scatter more uniformly, thus mimicking the model jointly trained with all classes (i.e., the oracle model). |
Yujun Shi; Kuangqi Zhou; Jian Liang; Zihang Jiang; Jiashi Feng; Philip H.S. Torr; Song Bai; Vincent Y. F. Tan; | code |
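A rough sketch of a class-wise decorrelation regulariser in the spirit of the highlight (the paper's exact loss and normalisation may differ; everything below is an illustrative assumption):

```python
# Illustrative class-wise decorrelation: penalise off-diagonal feature correlations per class.
import torch

def classwise_decorrelation(features, labels):
    """features: (N, D) embeddings; labels: (N,) integer class ids."""
    loss, classes = 0.0, labels.unique()
    for c in classes:
        f = features[labels == c]
        if f.shape[0] < 2:
            continue
        f = (f - f.mean(0)) / (f.std(0) + 1e-5)        # standardise each dimension
        corr = (f.T @ f) / (f.shape[0] - 1)            # (D, D) correlation matrix
        off_diag = corr - torch.diag(torch.diagonal(corr))
        loss = loss + (off_diag ** 2).mean()           # push correlations toward zero
    return loss / max(len(classes), 1)

print(classwise_decorrelation(torch.randn(32, 64), torch.randint(0, 4, (32,))))
```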
175 | RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in VRR. |
Jun Chen; Aniket Agarwal; Sherif Abdelkarim; Deyao Zhu; Mohamed Elhoseiny; | code |
176 | Personalized Image Aesthetics Assessment With Rich Attributes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To solve the dilemma, we conduct the most comprehensive subjective study of personalized image aesthetics to date and introduce a new Personalized image Aesthetics database with Rich Attributes (PARA), which consists of 31,220 images annotated by 438 subjects. |
Yuzhe Yang; Liwu Xu; Leida Li; Nan Qie; Yaqian Li; Peng Zhang; Yandong Guo; | code |
177 | Part-Based Pseudo Label Refinement for Unsupervised Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel Part-based Pseudo Label Refinement (PPLR) framework that reduces the label noise by employing the complementary relationship between global and part features. |
Yoonki Cho; Woo Jae Kim; Seunghoon Hong; Sung-Eui Yoon; | code |
178 | HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: So we propose a high-resolution dual-domain learning network (HDNet) for HSI reconstruction. |
Xiaowan Hu; Yuanhao Cai; Jing Lin; Haoqian Wang; Xin Yuan; Yulun Zhang; Radu Timofte; Luc Van Gool; | code |
179 | OW-DETR: Open-World Detection Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we introduce a novel end-to-end transformer-based framework, OW-DETR, for open-world object detection. |
Akshita Gupta; Sanath Narayan; K J Joseph; Salman Khan; Fahad Shahbaz Khan; Mubarak Shah; | code |
180 | Learning Deep Implicit Functions for 3D Shapes With Dynamic Code Clouds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the local codes are constrained at discrete and regular positions like grid points, which makes the code positions difficult to be optimized and limits their representation ability. To solve this problem, we propose to learn DIF with Dynamic Code Cloud, named DCC-DIF. |
Tianyang Li; Xin Wen; Yu-Shen Liu; Hua Su; Zhizhong Han; | code |
181 | Reversible Vision Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present Reversible Vision Transformers, a memory efficient architecture design for visual recognition. |
Karttikeya Mangalam; Haoqi Fan; Yanghao Li; Chao-Yuan Wu; Bo Xiong; Christoph Feichtenhofer; Jitendra Malik; | code |
182 | Amodal Panoptic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This ability of amodal perception forms the basis of our perceptual and cognitive understanding of our world. To enable robots to reason with this capability, we formulate and propose a novel task that we name amodal panoptic segmentation. |
Rohit Mohan; Abhinav Valada; | code |
183 | Correlation Verification for Image Retrieval Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose a novel image retrieval re-ranking network named Correlation Verification Networks (CVNet). |
Seongwon Lee; Hongje Seong; Suhyeon Lee; Euntai Kim; | code |
184 | Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: More importantly, existing approaches build upon the straightforward pose estimation loss, which unfortunately cannot constrain the network to fully leverage useful information from neighboring frames. To tackle these problems, we present a novel hierarchical alignment framework, which leverages coarse-to-fine deformations to progressively update a neighboring frame to align with the current frame at the feature level. |
Zhenguang Liu; Runyang Feng; Haoming Chen; Shuang Wu; Yixing Gao; Yunjun Gao; Xiang Wang; | code |
185 | Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we show a graph-based method that uses the self-supervised transformer features to discover an object from an image. |
Yangtao Wang; Xi Shen; Shell Xu Hu; Yuan Yuan; James L. Crowley; Dominique Vaufreydaz; | code |
186 | Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we design a novel Transformer-style HOI detector, i.e., Structure-aware Transformer over Interaction Proposals (STIP), for HOI detection. |
Yong Zhang; Yingwei Pan; Ting Yao; Rui Huang; Tao Mei; Chang-Wen Chen; | code |
187 | Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper probes intrinsic factors behind typical failure cases (e.g., spatial inconsistency and boundary confusion) produced by the existing state-of-the-art method in face parsing. To tackle these problems, we propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) for face parsing. |
Qingping Zheng; Jiankang Deng; Zheng Zhu; Ying Li; Stefanos Zafeiriou; | code |
188 | Glass: Geometric Latent Augmentation for Shape Spaces Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We investigate the problem of training generative models on very sparse collections of 3D models. |
Sanjeev Muralikrishnan; Siddhartha Chaudhuri; Noam Aigerman; Vladimir G. Kim; Matthew Fisher; Niloy J. Mitra; | code |
189 | DPICT: Deep Progressive Image Compression Using Trit-Planes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose the deep progressive image compression using trit-planes (DPICT) algorithm, which is the first learning-based codec supporting fine granular scalability (FGS). |
Jae-Han Lee; Seungmin Jeon; Kwang Pyo Choi; Youngo Park; Chang-Su Kim; | code |
190 | Text to Image Generation With Semantic-Spatial Aware GAN Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A close inspection of their generated images reveals a major limitation: even though the generated image holistically matches the description, individual image regions or parts of objects are often not recognizable or consistent with words in the sentence, e.g., "a white crown". To address this problem, we propose a novel Semantic-Spatial Aware GAN framework for synthesizing images from input text. |
Wentong Liao; Kai Hu; Michael Ying Yang; Bodo Rosenhahn; | code |
191 | Generalizable Cross-Modality Medical Image Segmentation Via Style Augmentation and Dual Normalization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This setting, namely generalizable cross-modality segmentation, has clear clinical potential but is much more challenging than other related settings, e.g., domain adaptation. To achieve this goal, we propose a novel dual-normalization model that leverages augmented source-similar and source-dissimilar images during generalizable segmentation. |
Ziqi Zhou; Lei Qi; Xin Yang; Dong Ni; Yinghuan Shi; | code |
192 | Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a novel method, detection prompt (DetPro), to learn continuous prompt representations for open-vocabulary object detection based on the pre-trained vision-language model. |
Yu Du; Fangyun Wei; Zihe Zhang; Miaojing Shi; Yue Gao; Guoqi Li; | code |
193 | Interactive Segmentation and Visualization for Tiny Objects in Multi-Megapixel Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce an interactive image segmentation and visualization framework for identifying, inspecting, and editing tiny objects (just a few pixels wide) in large multi-megapixel high-dynamic-range (HDR) images. |
Chengyuan Xu; Boning Dong; Noah Stier; Curtis McCully; D. Andrew Howell; Pradeep Sen; Tobias Höllerer; | code |
194 | Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we focus on exploiting the high-precision and non-differentiable physics simulator to incorporate dynamical constraints in motion capture. |
Buzhen Huang; Liang Pan; Yuan Yang; Jingyi Ju; Yangang Wang; | code |
195 | Surface Representation for Point Clouds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present RepSurf (representative surfaces), a novel representation of point clouds to explicitly depict the very local structure. |
Haoxi Ran; Jun Liu; Chengjie Wang; | code |
196 | Implicit Motion Handling for Video Camouflaged Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a new video camouflaged object detection (VCOD) framework that can exploit both short-term dynamics and long-term temporal consistency to detect camouflaged objects from video frames. |
Xuelian Cheng; Huan Xiong; Deng-Ping Fan; Yiran Zhong; Mehrtash Harandi; Tom Drummond; Zongyuan Ge; | code |
197 | DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present DeepLIIF (https://deepliif.org), the first free online platform for efficient and reproducible IHC scoring. |
Parmida Ghahremani; Joseph Marino; Ricardo Dodds; Saad Nadeem; | code |
198 | Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study an untouched problem in visible-infrared person re-identification (VI-ReID), namely, Twin Noisy Labels (TNL), which refers to noisy annotations and noisy correspondences. |
Mouxing Yang; Zhenyu Huang; Peng Hu; Taihao Li; Jiancheng Lv; Xi Peng; | code |
199 | Optical Flow Estimation for Spiking Camera Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, frame-based and event-based methods are not well suited to spike streams from the spiking camera due to the different data modalities. To this end, we present SCFlow, a tailored deep learning pipeline for estimating optical flow in high-speed scenes from spike streams. |
Liwen Hu; Rui Zhao; Ziluo Ding; Lei Ma; Boxin Shi; Ruiqin Xiong; Tiejun Huang; | code |
200 | GradViT: Gradient Inversion of Vision Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work we demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks. |
Ali Hatamizadeh; Hongxu Yin; Holger R. Roth; Wenqi Li; Jan Kautz; Daguang Xu; Pavlo Molchanov; | code |
201 | Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution Via Cycle-Projected Mutual Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To this end, we propose a one-stage based Cycle-projected Mutual learning network (CycMu-Net) for ST-VSR, which makes full use of spatial-temporal correlations via the mutual learning between S-VSR and T-VSR. |
Mengshun Hu; Kui Jiang; Liang Liao; Jing Xiao; Junjun Jiang; Zheng Wang; | code |
202 | Joint Global and Local Hierarchical Priors for Learned Image Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, CNNs have a limitation in modeling long-range dependencies due to their nature of local connectivity, which can be a significant bottleneck in image compression where reducing spatial redundancy is a key point. To overcome this issue, we propose a novel entropy model called Information Transformer (Informer) that exploits both global and local information in a content-dependent manner using an attention mechanism. |
Jun-Hyuk Kim; Byeongho Heo; Jong-Seok Lee; | code |
203 | Knowledge Distillation Via The Target-Aware Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This greatly undermines the underlying assumption of the one-to-one distillation approach. To this end, we propose a novel one-to-all spatial matching knowledge distillation approach. |
Sihao Lin; Hongwei Xie; Bing Wang; Kaicheng Yu; Xiaojun Chang; Xiaodan Liang; Gang Wang; | code |
204 | Subspace Adversarial Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To control the growth of the gradient, we propose a new AT method, Subspace Adversarial Training (Sub-AT), which constrains AT in a carefully extracted subspace. |
Tao Li; Yingwen Wu; Sizhe Chen; Kun Fang; Xiaolin Huang; | code |
205 | 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we substantially improve the generalization of 3D object detectors to out-of-domain data by deforming point clouds during training. |
Alexander Lehner; Stefano Gasperini; Alvaro Marcos-Ramiro; Michael Schmidt; Mohammad-Ali Nikouei Mahani; Nassir Navab; Benjamin Busam; Federico Tombari; | code |
206 | Image Segmentation Using Text and Image Prompts Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here we propose a system that can generate image segmentations based on arbitrary prompts at test time. |
Timo Lüddecke; Alexander Ecker; | code |
207 | AutoMine: An Unmanned Mine Dataset Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Moreover, the open-pit mine is one of their typical representatives. Therefore, in this paper we introduce the Autonomous driving dataset on the Mining scene (AutoMine) for positioning and perception tasks. |
Yuchen Li; Zixuan Li; Siyu Teng; Yu Zhang; Yuhang Zhou; Yuchang Zhu; Dongpu Cao; Bin Tian; Yunfeng Ai; Zhe Xuanyuan; Long Chen; | code |
208 | Background Activation Suppression for Weakly Supervised Object Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Background Activation Suppression (BAS) method. |
Pingyu Wu; Wei Zhai; Yang Cao; | code |
209 | Synthetic Generation of Face Videos With Plethysmograph Physiology Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a scalable, biophysical-learning-based method to generate physio-realistic synthetic rPPG videos given any reference image and target rPPG signal, and shows that it can further improve state-of-the-art physiological measurement and reduce bias among different groups. |
Zhen Wang; Yunhao Ba; Pradyumna Chari; Oyku Deniz Bozkurt; Gianna Brown; Parth Patwa; Niranjan Vaddi; Laleh Jalilian; Achuta Kadambi; | code |
210 | Hallucinated Neural Radiance Fields in The Wild Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing solutions adopt NeRF with a controllable appearance embedding to render novel views under various conditions, but they cannot render view-consistent images with an unseen appearance. To solve this problem, we present an end-to-end framework for constructing a hallucinated NeRF, dubbed as Ha-NeRF. |
Xingyu Chen; Qi Zhang; Xiaoyu Li; Yue Chen; Ying Feng; Xuan Wang; Jue Wang; | code |
211 | Global Tracking Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel transformer-based architecture for global multi-object tracking. |
Xingyi Zhou; Tianwei Yin; Vladlen Koltun; Philipp Krähenbühl; | code |
212 | Backdoor Attacks on Self-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Backdoor attacks have been studied extensively in supervised learning and to the best of our knowledge, we are the first to study them for self-supervised learning. |
Aniruddha Saha; Ajinkya Tejankar; Soroush Abbasi Koohpayegani; Hamed Pirsiavash; | code |
213 | GMFlow: Learning Optical Flow Via Global Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Specifically, we propose a GMFlow framework, which consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation. |
Haofei Xu; Jing Zhang; Jianfei Cai; Hamid Rezatofighi; Dacheng Tao; | code |
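To illustrate the "correlation and softmax" global matching step named in the highlight, a minimal sketch (shapes and scaling are illustrative; the actual framework also uses a Transformer for feature enhancement and self-attention for flow propagation):

```python
# Illustrative global matching: correlate all pixel pairs, softmax, take expected coords.
import torch

def global_match_flow(f1, f2):
    """f1, f2: (B, C, H, W) feature maps of two frames."""
    B, C, H, W = f1.shape
    f1 = f1.flatten(2).transpose(1, 2)                      # (B, HW, C)
    f2 = f2.flatten(2).transpose(1, 2)                      # (B, HW, C)
    corr = torch.matmul(f1, f2.transpose(1, 2)) / C ** 0.5  # (B, HW, HW) correlations
    prob = corr.softmax(dim=-1)                             # matching distribution
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).float().view(1, H * W, 2)
    matched = torch.matmul(prob, coords)                    # expected target coordinates
    return (matched - coords).transpose(1, 2).reshape(B, 2, H, W)  # flow field

flow = global_match_flow(torch.randn(1, 32, 24, 32), torch.randn(1, 32, 24, 32))
print(flow.shape)  # torch.Size([1, 2, 24, 32])
```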
214 | Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To fully utilize the rich connections between speech audio and human gestures, we propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation. |
Xian Liu; Qianyi Wu; Hang Zhou; Yinghao Xu; Rui Qian; Xinyi Lin; Xiaowei Zhou; Wayne Wu; Bo Dai; Bolei Zhou; | code |
215 | Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We tackle a rarely explored task named Insubstantial Object Detection (IOD), which aims to localize objects with the following characteristics: (1) amorphous shape with an indistinct boundary; (2) similarity to surroundings; (3) absence of color. Accordingly, it is far more challenging to distinguish insubstantial objects in a single static frame, and the collaborative representation of spatial and temporal information is crucial. |
Kailai Zhou; Yibo Wang; Tao Lv; Yunqian Li; Linsen Chen; Qiu Shen; Xun Cao; | code |
216 | Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our model aims to forecast multiple paths based on a historical trajectory by modeling multi-scale graph-based spatial transformers combined with a trajectory smoothing algorithm named "Memory Replay" utilizing a memory graph. |
Lihuan Li; Maurice Pagnucco; Yang Song; | code |
217 | Scanline Homographies for Rolling-Shutter Plane Absolute Pose Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we give a solution to the absolute pose problem free of motion assumptions. |
Fang Bai; Agniva Sengupta; Adrien Bartoli; | code |
218 | AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: They adopt a sub-optimal uniform sampling point allocation, limiting the expressiveness of the learned LUTs since the (tri-)linear interpolation between uniform sampling points in the LUT transform might fail to model local non-linearities of the color transform. Focusing on this problem, we present AdaInt (Adaptive Intervals Learning), a novel mechanism to achieve a more flexible sampling point allocation by adaptively learning the non-uniform sampling intervals in the 3D color space. |
Canqian Yang; Meiguang Jin; Xu Jia; Yi Xu; Ying Chen; | code |
219 | Recurrent Glimpse-Based Decoder for Detection With Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Alternative to existing studies that mainly develop advanced feature or embedding designs to tackle the training issue, we point out that the Region-of-Interest (RoI) based detection refinement can easily help mitigate the difficulty of training for DETR methods. Based on this, we introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper. |
Zhe Chen; Jing Zhang; Dacheng Tao; | code |
220 | SimMIM: A Simple Framework for Masked Image Modeling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents SimMIM, a simple framework for masked image modeling. |
Zhenda Xie; Zheng Zhang; Yue Cao; Yutong Lin; Jianmin Bao; Zhuliang Yao; Qi Dai; Han Hu; | code |
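A toy sketch of the masked-image-modeling recipe this describes: randomly mask image patches, encode the masked image, and regress the raw pixels of the masked patches with an L1 loss. The tiny convolutional "encoder", mask ratio, and prediction head below are stand-ins, not the paper's architecture:

```python
# Toy masked image modeling: mask patches, predict their raw pixels, L1 loss on masked patches.
import torch
import torch.nn as nn

patch, mask_ratio = 16, 0.6
encoder = nn.Conv2d(3, 128, kernel_size=patch, stride=patch)  # stand-in patch encoder
head = nn.Conv2d(128, 3 * patch * patch, kernel_size=1)       # per-patch pixel regressor

img = torch.randn(4, 3, 224, 224)
B, _, H, W = img.shape
gh, gw = H // patch, W // patch

mask = (torch.rand(B, 1, gh, gw) < mask_ratio).float()         # 1 = masked patch
pixel_mask = mask.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
masked_img = img * (1 - pixel_mask)                            # zero out masked patches

pred = head(encoder(masked_img))                               # (B, 3*p*p, gh, gw)
target = nn.functional.unfold(img, patch, stride=patch).view(B, -1, gh, gw)
loss = (torch.abs(pred - target) * mask).sum() / (mask.sum() * 3 * patch * patch + 1e-8)
print(loss.item())
```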
221 | Label Matching Semi-Supervised Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Despite the promising results, the label mismatch problem is not yet fully explored in the previous works, leading to severe confirmation bias during self-training. In this paper, we delve into this problem and propose a simple yet effective LabelMatch framework from two different yet complementary perspectives, i.e., distribution-level and instance-level. |
Binbin Chen; Weijie Chen; Shicai Yang; Yunyi Xuan; Jie Song; Di Xie; Shiliang Pu; Mingli Song; Yueting Zhuang; | code |
222 | RegionCLIP: Region-Based Language-Image Pretraining Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To mitigate this issue, we propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts. |
Yiwu Zhong; Jianwei Yang; Pengchuan Zhang; Chunyuan Li; Noel Codella; Liunian Harold Li; Luowei Zhou; Xiyang Dai; Lu Yuan; Yin Li; Jianfeng Gao; | code |
223 | Video Frame Interpolation Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing methods for video interpolation heavily rely on deep convolution neural networks, and thus suffer from their intrinsic limitations, such as content-agnostic kernel weights and restricted receptive field. To address these issues, we propose a Transformer-based video interpolation framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations. |
Zhihao Shi; Xiangyu Xu; Xiaohong Liu; Jun Chen; Ming-Hsuan Yang; | code |
224 | BCOT: A Markerless High-Precision 3D Object Tracking Benchmark Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a multi-view approach to estimate the accurate 3D poses of real moving objects, and then use binocular data to construct a new benchmark for monocular textureless 3D object tracking. |
Jiachen Li; Bin Wang; Shiqiang Zhu; Xin Cao; Fan Zhong; Wenxuan Chen; Te Li; Jason Gu; Xueying Qin; | code |
225 | Omni-DETR: Omni-Supervised Object Detection With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider the problem of omni-supervised object detection, which can use unlabeled, fully labeled and weakly labeled annotations, such as image tags, counts, points, etc., for object detection. |
Pei Wang; Zhaowei Cai; Hao Yang; Gurumurthy Swaminathan; Nuno Vasconcelos; Bernt Schiele; Stefano Soatto; | code |
226 | Transferable Sparse Adversarial Attack Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on sparse adversarial attack based on the l_0 norm constraint, which can succeed by only modifying a few pixels of an image. |
Ziwen He; Wei Wang; Jing Dong; Tieniu Tan; | code |
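As a generic illustration of an l_0-constrained perturbation (not the paper's transferable attack), the sketch below takes one gradient step but modifies only the k pixels with the largest gradient magnitude; the toy model, k, and eps are placeholders:

```python
# Illustrative l0-style attack: perturb only the top-k most salient pixels.
import torch

def sparse_perturb(model, x, y, k=100, eps=0.1):
    x = x.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    saliency = x.grad.abs().sum(dim=1, keepdim=True)     # (B, 1, H, W) per-pixel magnitude
    flat = saliency.flatten(1)
    kth = flat.topk(k, dim=1).values[:, -1:]              # k-th largest value per image
    mask = (flat >= kth).float().view_as(saliency)        # keep only the selected pixels
    return (x + eps * x.grad.sign() * mask).detach()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(2, 3, 32, 32)
x_adv = sparse_perturb(model, x, torch.tensor([1, 3]))
print((x_adv != x).flatten(1).sum(dim=1))                 # few entries change per image
```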
227 | CREAM: Weakly Supervised Object Localization Via Class RE-Activation Mapping Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we empirically prove that this problem is associated with the mixup of the activation values between less discriminative foreground regions and the background. To address it, we propose Class RE-Activation Mapping (CREAM), a novel clustering-based approach to boost the activation values of the integral object regions. |
Jilan Xu; Junlin Hou; Yuejie Zhang; Rui Feng; Rui-Wei Zhao; Tao Zhang; Xuequan Lu; Shang Gao; | code |
228 | VALHALLA: Visual Hallucination for Machine Translation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a visual hallucination framework, called VALHALLA, which requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation. |
Yi Li; Rameswar Panda; Yoon Kim; Chun-Fu (Richard) Chen; Rogerio S. Feris; David Cox; Nuno Vasconcelos; | code |
229 | HINT: Hierarchical Neuron Concept Explainer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study hierarchical concepts inspired by the hierarchical cognition process of human beings. |
Andong Wang; Wei-Ning Lee; Xiaojuan Qi; | code |
230 | Neural Face Identification in A 2D Wireframe Projection of A Manifold Object Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we approach the classical problem of face identification from a novel data-driven point of view. |
Kehan Wang; Jia Zheng; Zihan Zhou; | code |
231 | Nonuniform-to-Uniform Quantization: Towards Accurate Quantization Via Generalized Straight-Through Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that can maintain the strong representation ability of nonuniform methods while being hardware-friendly and efficient as the uniform quantization for model inference. |
Zechun Liu; Kwang-Ting Cheng; Dong Huang; Eric P. Xing; Zhiqiang Shen; | code |
232 | An Empirical Study of End-to-End Temporal Action Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present an empirical study of end-to-end temporal action detection. |
Xiaolong Liu; Song Bai; Xiang Bai; | code |
233 | Object Localization Under Single Coarse Point Supervision Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose a POL method using coarse point annotations, relaxing the supervision signals from accurate key points to freely spotted points. |
Xuehui Yu; Pengfei Chen; Di Wu; Najmul Hassan; Guorong Li; Junchi Yan; Humphrey Shi; Qixiang Ye; Zhenjun Han; | code |
234 | Unsupervised Learning of Accurate Siamese Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel unsupervised tracking framework, in which we can learn temporal correspondence both on the classification branch and regression branch. |
Qiuhong Shen; Lei Qiao; Jinyang Guo; Peixia Li; Xin Li; Bo Li; Weitao Feng; Weihao Gan; Wei Wu; Wanli Ouyang; | code |
235 | Non-Parametric Depth Distribution Modelling Based Depth Inference for Multi-View Stereo Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In contrast, we propose constructing the cost volume by non-parametric depth distribution modeling to handle pixels with unimodal and multi-modal distributions. |
Jiayu Yang; Jose M. Alvarez; Miaomiao Liu; | code |
236 | Equalized Focal Loss for Dense Long-Tailed Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, in the long-tailed scenario, this line of work has not been explored so far. In this paper, we investigate whether one-stage detectors can perform well in this case. |
Bo Li; Yongqiang Yao; Jingru Tan; Gang Zhang; Fengwei Yu; Jianwei Lu; Ye Luo; | code |
237 | DeepDPM: Deep Clustering With An Unknown Number of Clusters Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: When K is unknown, however, using model-selection criteria to choose its optimal value might become computationally expensive, especially in DL as the training process would have to be repeated numerous times. In this work, we bridge this gap by introducing an effective deep-clustering method that does not require knowing the value of K as it infers it during the learning. |
Meitar Ronen; Shahaf E. Finder; Oren Freifeld; | code |
238 | ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose ISDNet, a novel ultra-high resolution segmentation framework that integrates the shallow and deep networks in a new manner, which significantly accelerates the inference speed while achieving accurate segmentation. |
Shaohua Guo; Liang Liu; Zhenye Gan; Yabiao Wang; Wuhao Zhang; Chengjie Wang; Guannan Jiang; Wei Zhang; Ran Yi; Lizhuang Ma; Ke Xu; | code |
239 | Unsupervised Domain Adaptation for Nighttime Aerial Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work instead develops a novel unsupervised domain adaptation framework for nighttime aerial tracking (named UDAT). |
Junjie Ye; Changhong Fu; Guangze Zheng; Danda Pani Paudel; Guang Chen; | code |
240 | RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As face images contain abundant contextual information, we propose RestoreFormer, a method that explores fully-spatial attention to model contextual information and surpasses existing works that use local convolutions. |
Zhouxia Wang; Jiawei Zhang; Runjian Chen; Wenping Wang; Ping Luo; | code |
241 | Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel framework, Mask-guided Spectral-wise Transformer (MST), for HSI reconstruction. |
Yuanhao Cai; Jing Lin; Xiaowan Hu; Haoqian Wang; Xin Yuan; Yulun Zhang; Radu Timofte; Luc Van Gool; | code |
242 | A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel variational Bayesian formulation for diffeomorphic non-rigid registration of medical images, which learns in an unsupervised way a data-specific similarity metric. |
Daniel Grzech; Mohammad Farid Azampour; Ben Glocker; Julia Schnabel; Nassir Navab; Bernhard Kainz; Loïc Le Folgoc; | code |
243 | Not Just Selection, But Exploration: Online Class-Incremental Continual Learning Via Dual View Consistency Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel yet effective framework for online class-incremental continual learning, which considers not only the selection of stored samples, but also the full exploration of the data stream. |
Yanan Gu; Xu Yang; Kun Wei; Cheng Deng; | code |
244 | Coupling Vision and Proprioception for Navigation of Legged Robots Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We exploit the complementary strengths of vision and proprioception to develop a point-goal navigation system for legged robots, called VP-Nav. |
Zipeng Fu; Ashish Kumar; Ananye Agarwal; Haozhi Qi; Jitendra Malik; Deepak Pathak; | code |
245 | Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel optimization method based on a recurrent neural network to predict LiDAR scene flow in a weakly supervised manner. |
Guanting Dong; Yueyi Zhang; Hanlin Li; Xiaoyan Sun; Zhiwei Xiong; | code |
246 | EMOCA: Emotion Driven Monocular Face Capture and Animation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The result is facial geometries that do not match the emotional content of the input image. We address this with EMOCA (EMOtion Capture and Animation), by introducing a novel deep perceptual emotion consistency loss during training, which helps ensure that the reconstructed 3D expression matches the expression depicted in the input image. |
Radek Daněček; Michael J. Black; Timo Bolkart; | code |
247 | Quarantine: Sparsity Can Uncover The Trojan Attack Trigger for Free Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our crucial observation is that the Trojan features are significantly more stable to network pruning than benign features. Leveraging that, we propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork. |
Tianlong Chen; Zhenyu Zhang; Yihua Zhang; Shiyu Chang; Sijia Liu; Zhangyang Wang; | code |
248 | AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, existing approaches ignored the distribution difference between training and testing data, thereby inducing a large quantization error in inference. To address this issue, we propose a new quantization scheme, Alignment Quantization with ADMM-based Correlation Preservation (AlignQ), which exploits the cumulative distribution function (CDF) to align the data to be i.i.d. (independently and identically distributed) for quantization error minimization. |
Ting-An Chen; De-Nian Yang; Ming-Syan Chen; | code |
249 | Interactive Multi-Class Tiny-Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Such imagery typically contains objects from various categories, yet the multi-class interactive annotation setting for the detection task has thus far been unexplored. To address these needs, we propose a novel interactive annotation method for multiple instances of tiny objects from multiple classes, based on a few point-based user inputs. |
Chunggi Lee; Seonwook Park; Heon Song; Jeongun Ryu; Sanghoon Kim; Haejoon Kim; Sérgio Pereira; Donggeun Yoo; | code |
250 | Learning From Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to learn light field saliency from pixel-level noisy labels obtained from unsupervised, hand-crafted feature-based saliency methods. |
Mingtao Feng; Kendong Liu; Liang Zhang; Hongshan Yu; Yaonan Wang; Ajmal Mian; | code |
251 | Multi-View Depth Estimation By Fusing Single-View Depth Probability With Multi-View Geometry Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For such failure modes, single-view depth estimation methods are often more reliable. To this end, we propose MaGNet, a novel framework for fusing single-view depth probability with multi-view geometry, to improve the accuracy, robustness and efficiency of multi-view depth estimation. |
Gwangbin Bae; Ignas Budvytis; Roberto Cipolla; | code |
252 | Slimmable Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank, from which models of different capacities can be sampled to accommodate different accuracy-efficiency trade-offs. |
Rang Meng; Weijie Chen; Shicai Yang; Jie Song; Luojun Lin; Di Xie; Shiliang Pu; Xinchao Wang; Mingli Song; Yueting Zhuang; | code |
253 | High-Resolution Image Harmonization Via Collaborative Dual Transformations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a high-resolution image harmonization network with Collaborative Dual Transformation (CDTNet) to combine pixel-to-pixel transformation and RGB-to-RGB transformation coherently in an end-to-end network. |
Wenyan Cong; Xinhao Tao; Li Niu; Jing Liang; Xuesong Gao; Qihao Sun; Liqing Zhang; | code |
254 | MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation. |
Inkyu Shin; Yi-Hsuan Tsai; Bingbing Zhuang; Samuel Schulter; Buyu Liu; Sparsh Garg; In So Kweon; Kuk-Jin Yoon; | code |
255 | Self-Supervised Neural Articulated Shape and Appearance Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel approach for learning a representation of the geometry, appearance, and motion of a class of articulated objects given only a set of color images as input. |
Fangyin Wei; Rohan Chabra; Lingni Ma; Christoph Lassner; Michael Zollhöfer; Szymon Rusinkiewicz; Chris Sweeney; Richard Newcombe; Mira Slavcheva; | code |
256 | Topology Preserving Local Road Network Estimation From Single Onboard Camera Image Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper aims at extracting the local road network topology, directly in the bird’s-eye-view (BEV), all in a complex urban setting. |
Yigit Baran Can; Alexander Liniger; Danda Pani Paudel; Luc Van Gool; | code |
257 | Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A novel algorithm to detect road lanes in the eigenlane space is proposed in this paper. |
Dongkwon Jin; Wonhui Park; Seong-Gyun Jeong; Heeyeon Kwon; Chang-Su Kim; | code |
258 | SwinTextSpotter: Scene Text Spotting Via Better Synergy Between Text Detection and Text Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new end-to-end scene text spotting framework termed SwinTextSpotter. |
Mingxin Huang; Yuliang Liu; Zhenghao Peng; Chongyu Liu; Dahua Lin; Shenggao Zhu; Nicholas Yuan; Kai Ding; Lianwen Jin; | code |
259 | Deblur-NeRF: Neural Radiance Fields From Blurry Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, image blurriness caused by defocus or motion, which often occurs when capturing scenes in the wild, significantly degrades its reconstruction quality. To address this problem, we propose Deblur-NeRF, the first method that can recover a sharp NeRF from blurry input. |
Li Ma; Xiaoyu Li; Jing Liao; Qi Zhang; Xuan Wang; Jue Wang; Pedro V. Sander; | code |
260 | Whose Track Is It Anyway? Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This is typically caused by the propagation of errors from tracking to prediction, such as noisy tracks, fragments, and identity switches. To alleviate this propagation of errors, we propose a new prediction paradigm that uses detections and their affinity matrices across frames as inputs, removing the need for error-prone data association during tracking. |
Xinshuo Weng; Boris Ivanovic; Kris Kitani; Marco Pavone; | code |
261 | Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents Video K-Net, a simple, strong, and unified framework for fully end-to-end video panoptic segmentation. |
Xiangtai Li; Wenwei Zhang; Jiangmiao Pang; Kai Chen; Guangliang Cheng; Yunhai Tong; Chen Change Loy; | code |
262 | Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Instead, we consider rethinking solutions to data heterogeneity in FL with a focus on local learning generality rather than proximal restriction. |
Matias Mendieta; Taojiannan Yang; Pu Wang; Minwoo Lee; Zhengming Ding; Chen Chen; | code |
263 | Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the above issues, this paper proposes a model-based blind SISR method under the probabilistic framework, which elaborately models image degradation from the perspectives of noise and blur kernel. |
Zongsheng Yue; Qian Zhao; Jianwen Xie; Lei Zhang; Deyu Meng; Kwan-Yee K. Wong; | code |
264 | Faithful Extreme Rescaling Via Generative Prior Reciprocated Invertible Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a Generative prior ReciprocAted Invertible rescaling Network (GRAIN) for generating faithful high-resolution (HR) images from low-resolution (LR) invertible images with an extreme upscaling factor (64x). |
Zhixuan Zhong; Liangyu Chai; Yang Zhou; Bailin Deng; Jia Pan; Shengfeng He; | code |
265 | Proto2Proto: Can You Recognize The Car, The Way I Do? Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present Proto2Proto, a novel method to transfer interpretability of one prototypical part network to another via knowledge distillation. |
Monish Keswani; Sriranjani Ramakrishnan; Nishant Reddy; Vineeth N Balasubramanian; | code |
266 | TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We observe that these applications naturally exhibit the characteristics of large intra-image (spatial) variance and small cross-image variance. This observation motivates our efficient translation variant convolution (TVConv) for layout-aware visual processing. |
Jierun Chen; Tianlang He; Weipeng Zhuo; Li Ma; Sangtae Ha; S.-H. Gary Chan; | code |
267 | Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate a novel and practical task, termed cross-device SR, which strives to adapt a real-world SR model trained on paired images captured by one camera to low-resolution (LR) images captured by arbitrary target devices. |
Xiaoqian Xu; Pengxu Wei; Weikai Chen; Yang Liu; Mingzhi Mao; Liang Lin; Guanbin Li; | code |
268 | Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a large-scale study of imitating human demonstrations on tasks that require a virtual robot to search for objects in new environments – (1) ObjectGoal Navigation (e.g. ‘find & go to a chair’) and (2) Pick&Place (e.g. ‘find mug, pick mug, find counter, place mug on counter’). |
Ram Ramrakhya; Eric Undersander; Dhruv Batra; Abhishek Das; | code |
269 | Simple But Effective: CLIP Embeddings for Embodied AI Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We investigate the effectiveness of CLIP visual backbones for Embodied AI tasks. |
Apoorv Khandelwal; Luca Weihs; Roozbeh Mottaghi; Aniruddha Kembhavi; | code |
270 | NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Alternatively, a more graceful way is that global and local context can adaptively contribute per se to accommodate different visual data. To achieve this goal, we in this paper propose a novel ViT architecture, termed NomMer, which can dynamically Nominate the synergistic global-local context in vision transforMer. |
Hao Liu; Xinghua Jiang; Xin Li; Zhimin Bao; Deqiang Jiang; Bo Ren; | code |
271 | Collaborative Transformers for Grounded Situation Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Grounded situation recognition is the task of predicting the main activity, entities playing certain roles within the activity, and bounding-box groundings of the entities in the given image. To effectively deal with this challenging task, we introduce a novel approach where the two processes for activity classification and entity estimation are interactive and complementary. |
Junhyeong Cho; Youngseok Yoon; Suha Kwak; | code |
272 | CPPF: Towards Robust Category-Level 9D Pose Estimation in The Wild Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we tackle the problem of category-level 9D pose estimation in the wild, given a single RGB-D frame. |
Yang You; Ruoxi Shi; Weiming Wang; Cewu Lu; | code |
273 | Continual Test-Time Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The noisy pseudo-labels can further lead to error accumulation and catastrophic forgetting. To tackle these issues, we propose a continual test-time adaptation approach (CoTTA) which comprises two parts. |
Qin Wang; Olga Fink; Luc Van Gool; Dengxin Dai; | code |
274 | Dynamic MLP for Fine-Grained Image Classification By Leveraging Geographical and Temporal Information Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To fully explore the potential of multimodal information, we propose a dynamic MLP on top of the image representation, which interacts with multimodal features at a higher and broader dimension. |
Lingfeng Yang; Xiang Li; Renjie Song; Borui Zhao; Juntian Tao; Shihao Zhou; Jiajun Liang; Jian Yang; | code |
275 | MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose MuKEA to represent multimodal knowledge by an explicit triplet to correlate visual objects and fact answers with implicit relations. |
Yang Ding; Jing Yu; Bang Liu; Yue Hu; Mingxin Cui; Qi Wu; | code |
276 | Fair Contrastive Learning for Facial Attribute Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we for the first time analyze unfairness caused by supervised contrastive learning and propose a new Fair Supervised Contrastive Loss (FSCL) for fair visual representation learning. |
Sungho Park; Jewook Lee; Pilhyeon Lee; Sunhee Hwang; Dohyung Kim; Hyeran Byun; | code |
277 | Directional Self-Supervised Learning for Heavy Image Augmentations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a directional self-supervised learning paradigm (DSSL), which is compatible with significantly more augmentations. |
Yalong Bai; Yifan Yang; Wei Zhang; Tao Mei; | code |
278 | No-Reference Point Cloud Quality Assessment Via Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel no-reference quality assessment metric, the image transferred point cloud quality assessment (IT-PCQA), for 3D point clouds. |
Qi Yang; Yipeng Liu; Siheng Chen; Yiling Xu; Jun Sun; | code |
279 | Comprehending and Ordering Semantics for Image Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new recipe of Transformer-style structure, namely Comprehending and Ordering Semantics Networks (COS-Net), that novelly unifies an enriched semantic comprehending and a learnable semantic ordering processes into a single architecture. |
Yehao Li; Yingwei Pan; Ting Yao; Tao Mei; | code |
280 | A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce VCSL (Video Copy Segment Localization), a new comprehensive segment-level annotated video copy dataset. |
Sifeng He; Xudong Yang; Chen Jiang; Gang Liang; Wei Zhang; Tan Pan; Qing Wang; Furong Xu; Chunguang Li; JinXiong Liu; Hui Xu; Kaiming Huang; Yuan Cheng; Feng Qian; Xiaobo Zhang; Lei Yang; | code |
281 | Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the HMC problem in which objects are labeled at any level of the hierarchy. |
Jingzhou Chen; Peng Wang; Jian Liu; Yuntao Qian; | code |
282 | HeadNeRF: A Real-Time NeRF-Based Parametric Head Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose HeadNeRF, a novel NeRF-based parametric head model that integrates the neural radiance field to the parametric representation of the human head. |
Yang Hong; Bo Peng; Haiyao Xiao; Ligang Liu; Juyong Zhang; | code |
283 | Occlusion-Robust Face Alignment Using A Viewpoint-Invariant Hierarchical Network Architecture Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a new network architecture called GlomFace to model the facial hierarchies against various occlusions, which draws inspiration from the viewpoint-invariant hierarchy of facial structure. |
Congcong Zhu; Xintong Wan; Shaorong Xie; Xiaoqiang Li; Yinzheng Gu; | code |
284 | IDR: Self-Supervised Image Denoising Via Iterative Data Refinement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a practical unsupervised image denoising method to achieve state-of-the-art denoising performance. |
Yi Zhang; Dasong Li; Ka Lung Law; Xiaogang Wang; Hongwei Qin; Hongsheng Li; | code |
285 | MogFace: Towards A Deeper Appreciation on Face Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on resolving three aforementioned challenges that existing methods struggle to address, and present a novel face detector, termed MogFace. |
Yang Liu; Fei Wang; Jiankang Deng; Zhipeng Zhou; Baigui Sun; Hao Li; | code |
286 | Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, to address the aforementioned problem, we introduce Transformers, which naturally integrate global information, to generate more integral initial pseudo labels for end-to-end WSSS. |
Lixiang Ru; Yibing Zhan; Baosheng Yu; Bo Du; | code |
287 | CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the problem of jointly estimating the optical flow and scene flow from synchronized 2D and 3D data. |
Haisong Liu; Tao Lu; Yihui Xu; Jia Liu; Wenjie Li; Lijun Chen; | code |
288 | FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For example, the "Happy" expression with high intensity in Talk-Show is more discriminating than the same expression with low intensity in Official-Event. To fill this gap, we build a large-scale multi-scene dataset, coined as FERV39k. |
Yan Wang; Yixuan Sun; Yiwen Huang; Zhongying Liu; Shuyong Gao; Wei Zhang; Weifeng Ge; Wenqiang Zhang; | code |
289 | Learning To Detect Mobile Objects From LiDAR Scans Without Labels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes an alternative approach entirely based on unlabeled data, which can be collected cheaply and in abundance almost everywhere on earth. |
Yurong You; Katie Luo; Cheng Perng Phoo; Wei-Lun Chao; Wen Sun; Bharath Hariharan; Mark Campbell; Kilian Q. Weinberger; | code |
290 | WildNet: Learning Domain Generalized Semantic Segmentation From The Wild Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to diversify both the content and style of the source domain with the help of the wild. |
Suhyeon Lee; Hongje Seong; Seongwon Lee; Euntai Kim; | code |
291 | DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: DAIR-V2X comprises 71254 LiDAR frames and 71254 Camera frames, and all frames are captured from real scenes with 3D annotations. |
Haibao Yu; Yizhen Luo; Mao Shu; Yiyi Huo; Zebang Yang; Yifeng Shi; Zhenglong Guo; Hanyu Li; Xing Hu; Jirui Yuan; Zaiqing Nie; | code |
292 | Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To tackle the aforementioned problems, we propose the Point-to-Voxel Knowledge Distillation (PVD), which transfers the hidden knowledge from both point level and voxel level. |
Yuenan Hou; Xinge Zhu; Yuexin Ma; Chen Change Loy; Yikang Li; | code |
293 | Generating Diverse 3D Reconstructions From A Single Occluded Face Image Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Furthermore, while a plurality of 3D reconstructions is plausible in the occluded regions, existing approaches are limited to generating only a single solution. To address both of these challenges, we present Diverse3DFace, which is specifically designed to simultaneously generate a diverse and realistic set of 3D reconstructions from a single occluded face image. |
Rahul Dey; Vishnu Naresh Boddeti; | code |
294 | Stand-Alone Inter-Frame Attention in Video Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a new recipe of inter-frame attention block, namely Stand-alone Inter-Frame Attention (SIFA), that novelly delves into the deformation across frames to estimate local self-attention on each spatial location. |
Fuchen Long; Zhaofan Qiu; Yingwei Pan; Ting Yao; Jiebo Luo; Tao Mei; | code |
295 | Large-Scale Pre-Training for Person Re-Identification With Noisy Labels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper aims to address the problem of pre-training for person re-identification (Re-ID) with noisy labels. |
Dengpan Fu; Dongdong Chen; Hao Yang; Jianmin Bao; Lu Yuan; Lei Zhang; Houqiang Li; Fang Wen; Dong Chen; | code |
296 | Semantic Segmentation By Early Region Proxy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a novel and efficient modeling that starts from interpreting the image as a tessellation of learnable regions, each of which has flexible geometrics and carries homogeneous semantics. |
Yifan Zhang; Bo Pang; Cewu Lu; | code |
297 | LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To apply gesture recognition to long-distance interactive scenes such as meetings and smart homes, a large RGB-D video dataset LD-ConGR is established in this paper. |
Dan Liu; Libo Zhang; Yanjun Wu; | code |
298 | HVH: Learning A Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we address the aforementioned problems: 1) we use a novel, volumetric hair representation that is composed of thousands of primitives. |
Ziyan Wang; Giljoo Nam; Tuur Stuyck; Stephen Lombardi; Michael Zollhöfer; Jessica Hodgins; Christoph Lassner; | code |
299 | Rethinking Visual Geo-Localization for Large-Scale Applications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We find that current methods fail to scale to such large datasets, therefore we design a new highly scalable training technique, called CosPlace, which casts the training as a classification problem avoiding the expensive mining needed by the commonly used contrastive learning. |
Gabriele Berton; Carlo Masone; Barbara Caputo; | code |
300 | The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In view of them, we advocate a principle of diversity for training ViTs, by presenting corresponding regularizers that encourage representation diversity and coverage at each of those levels, enabling the capture of more discriminative information. |
Tianlong Chen; Zhenyu Zhang; Yu Cheng; Ahmed Awadallah; Zhangyang Wang; | code |
301 | ViM: Out-of-Distribution With Virtual-Logit Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: There are OOD samples that are easy to identify in the feature space while hard to distinguish in the logit space and vice versa. Motivated by this observation, we propose a novel OOD scoring method named Virtual-logit Matching (ViM), which combines the class-agnostic score from feature space and the In-Distribution (ID) class-dependent logits. |
Haoqi Wang; Zhizhong Li; Litong Feng; Wayne Zhang; | code |
302 | Class-Aware Contrastive Semi-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Moreover, the model’s judgment becomes noisier in real-world applications with extensive out-of-distribution data. To address this issue, we propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL), which is a drop-in helper to improve the pseudo-label quality and enhance the model’s robustness in the real-world setting. |
Fan Yang; Kai Wu; Shuyi Zhang; Guannan Jiang; Yong Liu; Feng Zheng; Wei Zhang; Chengjie Wang; Long Zeng; | code |
303 | Ditto: Building Digital Twins of Articulated Objects From Interaction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce Ditto to learn articulation model estimation and 3D geometry reconstruction of an articulated object through interactive perception. |
Zhenyu Jiang; Cheng-Chun Hsu; Yuke Zhu; | code |
304 | Adaptive Early-Learning Correction for Segmentation From Noisy Annotations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we study the learning dynamics of deep segmentation networks trained on inaccurately-annotated data. |
Sheng Liu; Kangning Liu; Weicheng Zhu; Yiqiu Shen; Carlos Fernandez-Granda; | code |
305 | Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing works, e.g., using the twilight as the intermediate target domain to perform the adaptation from daytime to nighttime, may fail to cope with the inherent difference between datasets caused by the camera equipment and the urban style. Faced with these two types of domain shifts, i.e., the illumination and the inherent difference of the datasets, we propose a novel domain adaptation framework via cross-domain correlation distillation, called CCDistill. |
Huan Gao; Jichang Guo; Guoli Wang; Qian Zhang; | code |
306 | RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing methods based on Convolutional Neural Networks (CNNs) achieve visually satisfying results but suffer from slow inference speed due to their heavy architectures. We propose to resolve this issue by using a spatial-temporal transformer that naturally incorporates the spatial and temporal super resolution modules into a single model. |
Zhicheng Geng; Luming Liang; Tianyu Ding; Ilya Zharkov; | code |
307 | Partial Class Activation Attention for Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Beyond the previous CAM generated from image-level classification, we present Partial CAM, which subdivides the task into region-level prediction and achieves better localization performance. |
Sun-Ao Liu; Hongtao Xie; Hai Xu; Yongdong Zhang; Qi Tian; | code |
308 | Multi-Scale Memory-Based Video Deblurring Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To achieve fine-grained deblurring, we designed a memory branch to memorize the blurry-sharp feature pairs in the memory bank, thus providing useful information for the blurry query input. |
Bo Ji; Angela Yao; | code |
309 | A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a scalable combinatorial algorithm for globally optimizing over the space of geometrically consistent mappings between 3D shapes. |
Paul Roetzer; Paul Swoboda; Daniel Cremers; Florian Bernard; | code |
310 | Geometric Structure Preserving Warp for Natural Image Stitching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, most of the existing methods ignore the large-scale layouts reflected by straight lines or curves, decreasing overall stitching quality. To address this issue, this work presents a structure-preserving stitching approach that produces images with natural visual effects and less distortion. |
Peng Du; Jifeng Ning; Jiguang Cui; Shaoli Huang; Xinchao Wang; Jiaxin Wang; | code |
311 | GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For the first time, we address the problem of generating full-body, hand and head motions of an avatar grasping an unknown object. |
Omid Taheri; Vasileios Choutas; Michael J. Black; Dimitrios Tzionas; | code |
312 | Conditional Prompt Learning for Vision-Language Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). |
Kaiyang Zhou; Jingkang Yang; Chen Change Loy; Ziwei Liu; | code |
313 | Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To do so, in this paper, we propose an efficient mini-batch sampling method, called graph sampling (GS), for large-scale deep metric learning. |
Shengcai Liao; Ling Shao; | code |
314 | Undoing The Damage of Label Shift for Cross-Domain Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we give an in-depth analysis and show that the damage of label shift can be overcome by aligning the data conditional distribution and correcting the posterior probability. |
Yahao Liu; Jinhong Deng; Jiale Tao; Tong Chu; Lixin Duan; Wen Li; | code |
315 | FisherMatch: Semi-Supervised Rotation Regression Via Entropy-Based Filtering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by the popular semi-supervised approach, FixMatch, we propose to leverage pseudo label filtering to facilitate the information flow from labeled data to unlabeled data in a teacher-student mutual learning framework. |
Yingda Yin; Yingcheng Cai; He Wang; Baoquan Chen; | code |
316 | Affine Medical Image Registration With Coarse-To-Fine Vision Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a fast and robust learning-based algorithm, Coarse-to-Fine Vision Transformer (C2FViT), for 3D affine medical image registration. |
Tony C. W. Mok; Albert C. S. Chung; | code |
317 | A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In addition, the higher resolution (e.g., 4K) of modern imaging devices results in larger displacement between frames. To address these challenges, we design a differentiable two-stage alignment scheme sequentially in patch and pixel level for effective JDD-B. |
Shi Guo; Xi Yang; Jianqi Ma; Gaofeng Ren; Lei Zhang; | code |
318 | Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a deformable prototypical part network (Deformable ProtoPNet), an interpretable image classifier that integrates the power of deep learning and the interpretability of case-based reasoning. |
Jon Donnelly; Alina Jade Barnett; Chaofan Chen; | code |
319 | Restormer: Efficient Transformer for High-Resolution Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose an efficient Transformer model by making several key designs in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions, while still remaining applicable to large images. |
Syed Waqas Zamir; Aditya Arora; Salman Khan; Munawar Hayat; Fahad Shahbaz Khan; Ming-Hsuan Yang; | code |
320 | IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we devise an efficient encoder-decoder based network, termed IFRNet, for fast intermediate frame synthesis. |
Lingtong Kong; Boyuan Jiang; Donghao Luo; Wenqing Chu; Xiaoming Huang; Ying Tai; Chengjie Wang; Jie Yang; | code |
321 | Large Loss Matters in Weakly Supervised Multi-Label Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: That is, the model first learns the representation of clean labels, and then starts memorizing noisy labels. Based on this finding, we propose novel methods for WSML which reject or correct the large-loss samples to prevent the model from memorizing noisy labels. |
Youngwook Kim; Jae Myung Kim; Zeynep Akata; Jungwoo Lee; | code |
322 | Neural Inertial Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes the inertial localization problem, the task of estimating the absolute location from a sequence of inertial sensor measurements. |
Sachini Herath; David Caruso; Chen Liu; Yufan Chen; Yasutaka Furukawa; | code |
323 | GraftNet: Towards Domain Generalized Stereo Matching With A Broad-Spectrum and Task-Oriented Feature Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to leverage the feature of a model trained on large-scale datasets to deal with the domain shift since it has seen various styles of images. |
Biyang Liu; Huimin Yu; Guodong Qi; | code |
324 | VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning, without requiring any human annotation. |
Wenjia Xu; Yongqin Xian; Jiuniu Wang; Bernt Schiele; Zeynep Akata; | code |
325 | Catching Both Gray and Black Swans: Open-Set Supervised Anomaly Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel approach that learns disentangled representations of abnormalities illustrated by seen anomalies, pseudo anomalies, and latent residual anomalies (i.e., samples that have unusual residuals compared to the normal data in a latent space), with the last two abnormalities designed to detect unseen anomalies. |
Choubo Ding; Guansong Pang; Chunhua Shen; | code |
326 | MLSLT: Towards Multilingual Sign Language Translation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, such models are inefficient in building multilingual sign language translation systems. To solve this problem, we introduce the multilingual sign language translation (MSLT) task. |
Aoxiong Yin; Zhou Zhao; Weike Jin; Meng Zhang; Xingshan Zeng; Xiaofei He; | code |
327 | Towards An End-to-End Framework for Flow-Guided Video Inpainting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an End-to-End framework for Flow-Guided Video Inpainting through three elaborately designed trainable modules, namely flow completion, feature propagation, and content hallucination. |
Zhen Li; Cheng-Ze Lu; Jianhua Qin; Chun-Le Guo; Ming-Ming Cheng; | code |
328 | Contrastive Test-Time Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel way to leverage self-supervised contrastive learning to facilitate target feature learning, along with an online pseudo labeling scheme with refinement that significantly denoises pseudo labels. |
Dian Chen; Dequan Wang; Trevor Darrell; Sayna Ebrahimi; | code |
329 | MotionAug: Augmentation With Physical Correction for Human Motion Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a motion data augmentation scheme incorporating motion synthesis encouraging diversity and motion correction imposing physical plausibility. |
Takahiro Maeda; Norimichi Ukita; | code |
330 | Modeling Indirect Illumination for Inverse Rendering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel approach to efficiently recovering spatially-varying indirect illumination. |
Yuanqing Zhang; Jiaming Sun; Xingyi He; Huan Fu; Rongfei Jia; Xiaowei Zhou; | code |
331 | TransWeather: Transformer-Based Restoration of Images Degraded By Adverse Weather Conditions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we focus on developing an efficient solution for the all adverse weather removal problem. |
Jeya Maria Jose Valanarasu; Rajeev Yasarla; Vishal M. Patel; | code |
332 | H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing methods usually focus on partial detection components for domain alignment. In contrast, this paper considers that all the detection components are important and proposes a Holistic and Hierarchical Feature Alignment (H^2FA) R-CNN. |
Yunqiu Xu; Yifan Sun; Zongxin Yang; Jiaxu Miao; Yi Yang; | code |
333 | P3Depth: Monocular Depth Estimation With A Piecewise Planarity Prior Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Based on knowledge about the high regularity of real 3D scenes, we propose a method that learns to selectively leverage information from coplanar pixels to improve the predicted depth. |
Vaishakh Patil; Christos Sakaridis; Alexander Liniger; Luc Van Gool; | code |
334 | GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we reveal and address the disadvantages of the conventional query-driven HOI detectors from the two aspects. |
Yue Liao; Aixi Zhang; Miao Lu; Yongliang Wang; Xiaobo Li; Si Liu; | code |
335 | Simple Multi-Dataset Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a simple method for training a unified detector on multiple large-scale datasets. |
Xingyi Zhou; Vladlen Koltun; Philipp Krähenbühl; | code |
336 | Proactive Image Manipulation Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: By contrast, we propose a proactive scheme for image manipulation detection. |
Vishal Asnani; Xi Yin; Tal Hassner; Sijia Liu; Xiaoming Liu; | code |
337 | StyTr2: Image Style Transfer With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Therefore, traditional neural style transfer methods face biased content representation. To address this critical issue, we take long-range dependencies of input images into account for image style transfer by proposing a transformer-based approach called StyTr^2. |
Yingying Deng; Fan Tang; Weiming Dong; Chongyang Ma; Xingjia Pan; Lei Wang; Changsheng Xu; | code |
338 | Global Matching With Overlapping Attention for Optical Flow Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, inspired by the traditional matching-optimization methods where matching is introduced to handle large displacements before energy-based optimizations, we introduce a simple but effective global matching step before the direct regression and develop a learning-based matching-optimization framework, namely GMFlowNet. |
Shiyu Zhao; Long Zhao; Zhixing Zhang; Enyu Zhou; Dimitris Metaxas; | code |
339 | Language As Queries for Referring Video Object Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a simple and unified framework built upon Transformer, termed ReferFormer. |
Jiannan Wu; Yi Jiang; Peize Sun; Zehuan Yuan; Ping Luo; | code |
340 | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection. |
Yanghao Li; Chao-Yuan Wu; Haoqi Fan; Karttikeya Mangalam; Bo Xiong; Jitendra Malik; Christoph Feichtenhofer; | code |
341 | Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Focusing on the relatively underexplored task of audio-visual zero-shot learning, we propose to learn multi-modal representations from audio-visual data using cross-modal attention and exploit textual label embeddings for transferring knowledge from seen classes to unseen classes. |
Otniel-Bogdan Mercea; Lukas Riesch; A. Sophia Koepke; Zeynep Akata; | code |
342 | Rethinking Efficient Lane Detection Via Curve Modeling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a novel parametric curve-based method for lane detection in RGB images. |
Zhengyang Feng; Shaohua Guo; Xin Tan; Ke Xu; Min Wang; Lizhuang Ma; | code |
343 | Self-Supervised Arbitrary-Scale Point Clouds Upsampling Via Implicit Neural Representation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel approach that achieves self-supervised and magnification-flexible point cloud upsampling simultaneously. |
Wenbo Zhao; Xianming Liu; Zhiwei Zhong; Junjun Jiang; Wei Gao; Ge Li; Xiangyang Ji; | code |
344 | Co-Advise: Cross Inductive Bias Distillation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Unlike previous works, where only heavy convolution-based teachers are provided, in this paper we delve into the influence of models' inductive biases in knowledge distillation (e.g., convolution and involution). |
Sucheng Ren; Zhengqi Gao; Tianyu Hua; Zihui Xue; Yonglong Tian; Shengfeng He; Hang Zhao; | code |
345 | AdaMixer: A Fast-Converging Query-Based Object Detector Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, this paradigm still suffers from slow convergence, limited performance, and design complexity of extra networks between backbone and decoder. In this paper, we find that the key to these issues is the adaptability of decoders for casting queries to varying objects. |
Ziteng Gao; Limin Wang; Bing Han; Sheng Guo; | code |
346 | DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In these settings, there is a limited number of WSI slides (bags), while the resolution of a single WSI is huge, which leads to a large number of patches (instances) cropped from each slide. To address this issue, we propose to virtually enlarge the number of bags by introducing the concept of pseudo-bags, on which a double-tier MIL framework is built to effectively use the intrinsic features. |
Hongrun Zhang; Yanda Meng; Yitian Zhao; Yihong Qiao; Xiaoyun Yang; Sarah E. Coupland; Yalin Zheng; | code |
347 | BEVT: BERT Pretraining of Video Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce BEVT which decouples video representation learning into spatial representation learning and temporal dynamics learning. |
Rui Wang; Dongdong Chen; Zuxuan Wu; Yinpeng Chen; Xiyang Dai; Mengchen Liu; Yu-Gang Jiang; Luowei Zhou; Lu Yuan; | code |
348 | Deep Generalized Unfolding Networks for Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Deep Generalized Unfolding Network (DGUNet) for image restoration. |
Chong Mou; Qian Wang; Jian Zhang; | code |
349 | VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel single-stage framework for online VIS built based on the grid structured feature representation. |
Su Ho Han; Sukjun Hwang; Seoung Wug Oh; Yeonchool Park; Hyunwoo Kim; Min-Jung Kim; Seon Joo Kim; | code |
350 | Deep Unlearning Via Randomized Conditionally Independent Hessians Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We use a variant of a new conditional independence coefficient, L-CODEC, to identify a subset of the model parameters with the most semantic overlap on an individual sample level. |
Ronak Mehta; Sourav Pal; Vikas Singh; Sathya N. Ravi; | code |
351 | Revisiting Skeleton-Based Action Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose PoseConv3D, a new approach to skeleton-based action recognition. |
Haodong Duan; Yue Zhao; Kai Chen; Dahua Lin; Bo Dai; | code |
352 | Stereo Depth From Events Cameras: Concentrate and Focus on The Future Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To alleviate the event missing or overriding issue, we propose to learn to concentrate on the dense events to produce a compact event representation with high details for depth estimation. |
Yeongwoo Nam; Mohammad Mostafavi; Kuk-Jin Yoon; Jonghyun Choi; | code |
353 | A Simple Data Mixing Prior for Improving Self-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Data mixing (e.g., Mixup, Cutmix, ResizeMix) is an essential component for advancing recognition models. In this paper, we focus on studying its effectiveness in the self-supervised setting. |
Sucheng Ren; Huiyu Wang; Zhengqi Gao; Shengfeng He; Alan Yuille; Yuyin Zhou; Cihang Xie; | code |
354 | Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we investigate an alternative strategy for pre-training, namely Knowledge Distillation as Efficient Pre-training (KDEP), aiming to efficiently transfer the learned feature representation from existing pre-trained models to new student models for future downstream tasks. |
Ruifei He; Shuyang Sun; Jihan Yang; Song Bai; Xiaojuan Qi; | code |
355 | BigDL 2.0: Seamless Scaling of AI Pipelines From Laptops to Distributed Cluster Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address this challenge, we have open sourced BigDL 2.0 at https://github.com/intel-analytics/BigDL/ under the Apache 2.0 license (combining the original BigDL [19] and Analytics Zoo [18] projects); using BigDL 2.0, users can simply build conventional Python notebooks on their laptops (with possible AutoML support), which can then be transparently accelerated on a single node (with up to 9.6x speedup in our experiments) and seamlessly scaled out to a large cluster (across several hundred servers in real-world use cases). |
Jason (Jinquan) Dai; Ding Ding; Dongjie Shi; Shengsheng Huang; Jiao Wang; Xin Qiu; Kai Huang; Guoqiong Song; Yang Wang; Qiyuan Gong; Jiaming Song; Shan Yu; Le Zheng; Yina Chen; Junwei Deng; Ge Song; | code |
356 | Attentive Fine-Grained Structured Sparsity for Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To further optimize the trade-off between the efficiency and the restoration accuracy, we propose a novel pruning method that determines the pruning ratio for N:M structured sparsity at each layer. |
Junghun Oh; Heewon Kim; Seungjun Nah; Cheeun Hong; Jonghyun Choi; Kyoung Mu Lee; | code |
357 | Learning Fair Classifiers With Partially Annotated Group Labels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider a more practical scenario, dubbed as Algorithmic Group Fairness with the Partially annotated Group labels (Fair-PG). |
Sangwon Jung; Sanghyuk Chun; Taesup Moon; | code |
358 | NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose NightLab, a novel nighttime segmentation framework that leverages multiple deep learning models imbued with night-aware features to yield State-of-The-Art (SoTA) performance on multiple night segmentation benchmarks. |
Xueqing Deng; Peng Wang; Xiaochen Lian; Shawn Newsam; | code |
359 | Constrained Few-Shot Class-Incremental Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To meet the above constraints, we propose C-FSCIL, which is architecturally composed of a frozen meta-learned feature extractor, a trainable fixed-size fully connected layer, and a rewritable dynamically growing memory that stores as many vectors as the number of encountered classes. |
Michael Hersche; Geethan Karunaratne; Giovanni Cherubini; Luca Benini; Abu Sebastian; Abbas Rahimi; | code |
360 | Threshold Matters in WSSS: Manipulating The Activation for The Robust and Accurate Segmentation Model Against Thresholds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Then, we show that this issue can be mitigated by satisfying two conditions: 1) reducing the imbalance in the foreground activation, and 2) increasing the gap between the foreground and the background activation. Based on these findings, we propose a novel activation manipulation network with a per-pixel classification loss and a label conditioning module. |
Minhyun Lee; Dongseob Kim; Hyunjung Shim; | code |
361 | TransMVSNet: Global Context-Aware Multi-View Stereo Network With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present TransMVSNet, based on our exploration of feature matching in multi-view stereo (MVS). |
Yikang Ding; Wentao Yuan; Qingtian Zhu; Haotian Zhang; Xiangyue Liu; Yuanjiang Wang; Xiao Liu; | code |
362 | DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we propose DPGEN, a network model designed to synthesize high-resolution natural images while satisfying differential privacy. |
Jia-Wei Chen; Chia-Mu Yu; Ching-Chia Kao; Tzai-Wei Pang; Chun-Shien Lu; | code |
363 | The Majority Can Help The Minority: Context-Rich Minority Oversampling for Long-Tailed Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel minority over-sampling method to augment diversified minority samples by leveraging the rich context of the majority classes as background images. |
Seulki Park; Youngkyu Hong; Byeongho Heo; Sangdoo Yun; Jin Young Choi; | code |
364 | IntentVizor: Towards Generic Query Guided Interactive Video Summarization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: First, the text query might not be enough to describe the exact and diverse needs of the user. Second, the user cannot edit once the summaries are produced, whereas we assume the user's needs are subtle and need to be adjusted interactively. To solve these two problems, we propose IntentVizor, an interactive video summarization framework guided by generic multi-modality queries. |
Guande Wu; Jianzhe Lin; Claudio T. Silva; | code |
365 | Shape-Invariant 3D Adversarial Point Clouds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel Point-Cloud Sensitivity Map to boost both the efficiency and imperceptibility of point perturbations. |
Qidong Huang; Xiaoyi Dong; Dongdong Chen; Hang Zhou; Weiming Zhang; Nenghai Yu; | code |
366 | Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we strive to liberate ViTs from pre-training by introducing CNNs’ inductive biases back to ViTs while preserving their network architectures for higher upper bound and setting up more suitable optimization objectives. |
Haofei Zhang; Jiarui Duan; Mengqi Xue; Jie Song; Li Sun; Mingli Song; | code |
367 | PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, one of the greatest challenges remains the creation of datasets with complete, unambiguous ground truth at scale. To address this, we develop a new, more comprehensive dataset for table extraction, called PubTables-1M. |
Brandon Smock; Rohith Pesala; Robin Abraham; | code |
368 | Meta-Attention for ViT-Backed Continual Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study ViT-backed continual learning to strive for higher performance riding on recent advances of ViTs. |
Mengqi Xue; Haofei Zhang; Jie Song; Mingli Song; | code |
369 | DST: Dynamic Substitute Training for Data-Free Black-Box Attack Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel dynamic substitute training attack method to encourage substitute model to learn better and faster from the target model. |
Wenxuan Wang; Xuelin Qian; Yanwei Fu; Xiangyang Xue; | code |
370 | Unified Contrastive Learning in Image-Text-Label Space Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we introduce a new formulation by combining the two data sources into a common image-text-label space. |
Jianwei Yang; Chunyuan Li; Pengchuan Zhang; Bin Xiao; Ce Liu; Lu Yuan; Jianfeng Gao; | code |
371 | Unsupervised Pre-Training for Temporal Action Localization Tasks Literature Review Related Patents Related Grants Related Orgs |