Paper Digest: ECCV 2022 Highlights & Code
To help the community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. These models power this website and are behind our services, including “search engine”, “summarization”, “question answering”, and “literature review”.
If you do not want to miss interesting academic papers, you are welcome to sign up for our daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to get updates on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: ECCV 2022 Highlights & Code
# | Paper | Author(s) | Code
---|---|---|---
1 | Learning Depth from Focus in The Wild. Highlight: In this work, we present a convolutional neural network-based depth estimation from single focal stacks. In addition, for the generalization of the proposed network, we develop a simulator to realistically reproduce the features of commercial cameras, such as changes in field of view, focal length and principal points. | Changyeon Won; Hae-Gon Jeon | Yes
2 | Learning-Based Point Cloud Registration for 6D Object Pose Estimation in The Real World. Highlight: In this work, we tackle the task of estimating the 6D pose of an object from point cloud data. | Zheng Dang; Lizhou Wang; Yu Guo; Mathieu Salzmann |
3 | An End-to-End Transformer Model for Crowd Localization. Highlight: In this paper, we propose an elegant, end-to-end Crowd Localization TRansformer named CLTR that solves the task in the regression-based paradigm. | Dingkang Liang; Wei Xu; Xiang Bai |
4 | Few-Shot Single-View 3D Reconstruction with Memory Prior Contrastive Network. Highlight: In this paper, we present a Memory Prior Contrastive Network (MPCN) that can store shape prior knowledge in a few-shot learning based 3D reconstruction framework. | Zhen Xing; Yijiang Chen; Zhixin Ling; Xiangdong Zhou; Yu Xiang |
5 | DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection. Highlight: Instance depth is coupled by visual depth clues and instance attribute clues, making it hard to be directly learned in the network. Therefore, we propose to reformulate the instance depth to the combination of the instance visual surface depth (visual depth) and the instance attribute depth (attribute depth). | Liang Peng; Xiaopei Wu; Zheng Yang; Haifeng Liu; Deng Cai | Yes
6 | Adaptive Co-Teaching for Unsupervised Monocular Depth Estimation. Highlight: Unsupervised depth estimation using photometric losses suffers from local minima and training instability. We address this issue by proposing an adaptive co-teaching framework to distill the learned knowledge from unsupervised teacher networks to a student network. | Weisong Ren; Lijun Wang; Yongri Piao; Miao Zhang; Huchuan Lu; Ting Liu |
7 | Fusing Local Similarities for Retrieval-Based 3D Orientation Estimation of Unseen Objects. Highlight: In this paper, we tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images. | Chen Zhao; Yinlin Hu; Mathieu Salzmann | Yes
8 | Lidar Point Cloud Guided Monocular 3D Object Detection. Highlight: We delve into this underlying mechanism and empirically find that, concerning label accuracy, the 3D location part of the label is preferred over the other parts. Motivated by this conclusion and considering the precise LiDAR 3D measurement, we propose a simple and effective framework, dubbed LiDAR point cloud guided monocular 3D object detection (LPCG). | Liang Peng; Fei Liu; Zhengxu Yu; Senbo Yan; Dan Deng; Zheng Yang; Haifeng Liu; Deng Cai | Yes
9 | Structural Causal 3D Reconstruction. Highlight: This paper considers the problem of unsupervised 3D object reconstruction from in-the-wild single-view images. | Weiyang Liu; Zhen Liu; Liam Paull; Adrian Weller; Bernhard Schölkopf |
10 | 3D Human Pose Estimation Using Möbius Graph Convolutional Networks. Highlight: However, a major limitation of GCNs is their inability to encode all the transformations between joints explicitly. To address this issue, we propose a novel spectral GCN using the Möbius transformation (MöbiusGCN). | Niloofar Azizi; Horst Possegger; Emanuele Rodolà; Horst Bischof |
11 | Learning to Train A Point Cloud Reconstruction Network Without Matching. Highlight: In this work, we propose a novel framework named PCLossNet which learns to train a point cloud reconstruction network without any matching. | Tianxin Huang; Xuemeng Yang; Jiangning Zhang; Jinhao Cui; Hao Zou; Jun Chen; Xiangrui Zhao; Yong Liu | Yes
12 | PanoFormer: Panorama Transformer for Indoor 360° Depth Estimation. Highlight: This paper proposes the panorama Transformer (named PanoFormer) to estimate depth in panorama images, with tangent patches from the spherical domain, learnable token flows, and panorama-specific metrics. | Zhijie Shen; Chunyu Lin; Kang Liao; Lang Nie; Zishuo Zheng; Yao Zhao |
13 | Self-supervised Human Mesh Recovery with Cross-Representation Alignment. Highlight: However, synthetic dense correspondence maps (i.e., IUV) have been little explored, since the domain gap between synthetic training data and real testing data is hard to address for 2D dense representations. To alleviate this domain gap on IUV, we propose cross-representation alignment utilizing the complementary information from the robust but sparse representation (2D keypoints). | Xuan Gong; Meng Zheng; Benjamin Planche; Srikrishna Karanam; Terrence Chen; David Doermann; Ziyan Wu |
14 | AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction. Highlight: In particular, we propose a joint learning framework that disentangles the pose and the shape. | Zerui Chen; Yana Hasson; Cordelia Schmid; Ivan Laptev | Yes
15 | A Reliable Online Method for Joint Estimation of Focal Length and Camera Rotation. Highlight: Linear perspective cues deriving from regularities of the built environment can be used to recalibrate both intrinsic and extrinsic camera parameters online, but these estimates can be unreliable due to irregularities in the scene, uncertainties in line segment estimation and background clutter. Here we address this challenge through four initiatives. | Yiming Qian; James H. Elder | Yes
16 | PS-NeRF: Neural Inverse Rendering for Multi-View Photometric Stereo. Highlight: In this paper, we present a neural inverse rendering method for MVPS based on implicit representation. | Wenqi Yang; Guanying Chen; Chaofeng Chen; Zhenfang Chen; Kwan-Yee K. Wong | Yes
17 | Share with Thy Neighbors: Single-View Reconstruction By Cross-Instance Consistency. Highlight: Our main contributions are two ways of leveraging cross-instance consistency: (i) progressive conditioning, a training strategy to gradually specialize the model from category to instances in a curriculum learning fashion, and (ii) neighbor reconstruction, a loss enforcing consistency between instances having similar shape or texture. | Tom Monnier; Matthew Fisher; Alexei A. Efros; Mathieu Aubry | Yes
18 | Towards Comprehensive Representation Enhancement in Semantics-Guided Self-Supervised Monocular Depth Estimation. Highlight: In this work, we propose an attention-based module to enhance task-specific features by addressing their uniqueness within instances. | Jingyuan Ma; Xiangyu Lei; Nan Liu; Xian Zhao; Shiliang Pu |
19 | AvatarCap: Animatable Avatar Conditioned Monocular Human Volumetric Capture. Highlight: To address the ill-posed problem caused by partial observations in monocular human volumetric capture, we present AvatarCap, a novel framework that introduces animatable avatars into the capture pipeline for high-fidelity reconstruction in both visible and invisible regions. | Zhe Li; Zerong Zheng; Hongwen Zhang; Chaonan Ji; Yebin Liu | Yes
20 | Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers. Highlight: In this paper, we propose a novel transformer encoder-decoder architecture for 3D human mesh reconstruction from a single image, called FastMETRO. | Junhyeong Cho; Kim Youwang; Tae-Hyun Oh | Yes
21 | GeoRefine: Self-Supervised Online Depth Refinement for Accurate Dense Mapping. Highlight: We present a robust and accurate depth refinement system, named GeoRefine, for geometrically-consistent dense mapping from monocular sequences. | Pan Ji; Qingan Yan; Yuxin Ma; Yi Xu |
22 | Multi-modal Masked Pre-training for Monocular Panoramic Depth Completion. Highlight: In this paper, we formulate a potentially valuable panoramic depth completion (PDC) task, as panoramic 3D cameras often produce 360° depth with missing data in complex scenes. | Zhiqiang Yan; Xiang Li; Kun Wang; Zhenyu Zhang; Jun Li; Jian Yang |
23 | GitNet: Geometric Prior-Based Transformation for Birds-Eye-View Segmentation. Highlight: We present a novel two-stage Geometry PrIor-based Transformation framework named GitNet, consisting of (i) the geometry-guided pre-alignment and (ii) ray-based transformer. | Shi Gong; Xiaoqing Ye; Xiao Tan; Jingdong Wang; Errui Ding; Yu Zhou; Xiang Bai |
24 | Learning Visibility for Robust Dense Human Body Estimation. Highlight: In this work, we learn dense human body estimation that is robust to partial observations. | Chun-Han Yao; Jimei Yang; Duygu Ceylan; Yi Zhou; Yang Zhou; Ming-Hsuan Yang | Yes
25 | Towards High-Fidelity Single-View Holistic Reconstruction of Indoor Scenes. Highlight: We present a new framework to reconstruct holistic 3D indoor scenes, including both room background and indoor objects, from single-view images. | Haolin Liu; Yujian Zheng; Guanying Chen; Shuguang Cui; Xiaoguang Han | Yes
26 | CompNVS: Novel View Synthesis with Scene Completion. Highlight: We introduce a scalable framework for novel view synthesis from RGB-D images with largely incomplete scene coverage. | Zuoyue Li; Tianxing Fan; Zhenqiang Li; Zhaopeng Cui; Yoichi Sato; Marc Pollefeys; Martin R. Oswald |
27 | SketchSampler: Sketch-Based 3D Reconstruction Via View-Dependent Depth Sampling. Highlight: Through analyzing the 3D-to-2D projection process, we notice that the density map that characterizes the distribution of 2D point clouds (i.e., the probability of points projected at each location of the projection plane) can be used as a proxy to facilitate the reconstruction process. | Chenjian Gao; Qian Yu; Lu Sheng; Yi-Zhe Song; Dong Xu | Yes
28 | LocalBins: Improving Depth Estimation By Learning Local Distributions. Highlight: We propose a novel architecture for depth estimation from a single image. | Shariq Farooq Bhat; Ibraheem Alhashim; Peter Wonka | Yes
29 | 2D GANs Meet Unsupervised Single-View 3D Reconstruction. Highlight: However, less attention has been devoted to 3D vision tasks. In light of this, we propose a novel image-conditioned neural implicit field, which can leverage 2D supervisions from GAN-generated multi-view images and perform the single-view reconstruction of generic objects. | Feng Liu; Xiaoming Liu |
30 | InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images. Highlight: We present a method for learning to generate unbounded flythrough videos of natural scenes starting from a single view. | Zhengqi Li; Qianqian Wang; Noah Snavely; Angjoo Kanazawa | Yes
31 | Semi-Supervised Single-View 3D Reconstruction Via Prototype Shape Priors. Highlight: In particular, we introduce an attention-guided prototype shape prior module for guiding realistic object reconstruction. | Zhen Xing; Hengduo Li; Zuxuan Wu; Yu-Gang Jiang | Yes
32 | Bilateral Normal Integration. Highlight: To model discontinuities, we introduce the assumption that the surface to be recovered is semi-smooth, i.e., the surface is one-sided differentiable (hence one-sided continuous) everywhere in the horizontal and vertical directions. | Xu Cao; Hiroaki Santo; Boxin Shi; Fumio Okura; Yasuyuki Matsushita |
33 | S²Contact: Graph-Based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning. Highlight: In this paper, we propose a novel semi-supervised framework that allows us to learn contact from monocular videos. | Tze Ho Elden Tse; Zhongqun Zhang; Kwang In Kim; Aleš Leonardis; Feng Zheng; Hyung Jin Chang |
34 | SC-wLS: Towards Interpretable Feed-Forward Camera Re-localization. Highlight: In order to have the best of both worlds, we propose a feed-forward method termed SC-wLS that exploits all scene coordinate estimates for weighted least squares pose regression. | Xin Wu; Hao Zhao; Shunkai Li; Yingdian Cao; Hongbin Zha | Yes
35 | FloatingFusion: Depth from ToF and Image-Stabilized Stereo Cameras. Highlight: Leveraging ToF depth estimates and a wide-angle RGB camera, we design an automatic calibration technique based on dense 2D/3D matching that can estimate camera pose, intrinsic, and distortion parameters of a stabilized main RGB sensor from a single snapshot. | Andreas Meuleman; Hakyeong Kim; James Tompkin; Min H. Kim |
36 | DELTAR: Depth Estimation from A Light-Weight ToF Sensor and RGB Image. Highlight: In this paper, we propose DELTAR, a novel method to empower light-weight ToF sensors with the capability of measuring high-resolution and accurate depth by cooperating with a color image. | Yijin Li; Xinyang Liu; Wenqi Dong; Han Zhou; Hujun Bao; Guofeng Zhang; Yinda Zhang; Zhaopeng Cui | Yes
37 | 3D Room Layout Estimation from A Cubemap of Panorama Image Via Deep Manhattan Hough Transform. Highlight: Significant geometric structures can be compactly described by global wireframes in the estimation of 3D room layout from a single panoramic image. Based on this observation, we present an alternative approach to estimate the walls in 3D space by modeling long-range geometric patterns in a learnable Hough Transform block. | Yining Zhao; Chao Wen; Zhou Xue; Yue Gao | Yes
38 | RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation. Highlight: However, their shape prior integration strategy boosts pose estimation indirectly, which leads to insufficient pose-sensitive feature extraction and slow inference speed. To tackle this problem, in this paper, we propose a novel geometry-guided Residual Object Bounding Box Projection network, RBP-Pose, that jointly predicts object pose and residual vectors describing the displacements from the shape-prior-indicated object surface projections on the bounding box towards real surface projections. | Ruida Zhang; Yan Di; Zhiqiang Lou; Fabian Manhardt; Federico Tombari; Xiangyang Ji | Yes
39 | Monocular 3D Object Reconstruction with GAN Inversion. Highlight: In this work, we present MeshInversion, a novel framework to improve the reconstruction by exploiting the generative prior of a 3D GAN pre-trained for 3D textured mesh synthesis. | Junzhe Zhang; Daxuan Ren; Zhongang Cai; Chai Kiat Yeo; Bo Dai; Chen Change Loy | Yes
40 | Map-Free Visual Relocalization: Metric Pose Relative to A Single Image. Highlight: In contrast, we propose Map-free Relocalization, i.e., using only one photo of a scene to enable instant, metric-scaled relocalization. Thus, we have constructed a new dataset of 655 small places of interest, such as sculptures, murals and fountains, collected worldwide. | Eduardo Arnold; Jamie Wynn; Sara Vicente; Guillermo Garcia-Hernando; Aron Monszpart; Victor Prisacariu; Daniyar Turmukhambetov; Eric Brachmann | Yes
41 | Self-Distilled Feature Aggregation for Self-Supervised Monocular Depth Estimation. Highlight: Most existing works in the literature aggregate multi-scale features for depth prediction via either straightforward concatenation or element-wise addition; however, such feature aggregation operations generally neglect the contextual consistency between multi-scale features. Addressing this problem, we propose the Self-Distilled Feature Aggregation (SDFA) module for simultaneously aggregating a pair of low-scale and high-scale features and maintaining their contextual consistency. | Zhengming Zhou; Qiulei Dong | Yes
42 | Planes Vs. Chairs: Category-Guided 3D Shape Learning Without Any 3D Cues. Highlight: We present a novel 3D shape reconstruction method which learns to predict an implicit 3D shape representation from a single RGB image. | Zixuan Huang; Stefan Stojanov; Anh Thai; Varun Jampani; James M. Rehg |
43 | MHR-Net: Multiple-Hypothesis Reconstruction of Non-rigid Shapes from 2D Views. Highlight: We propose MHR-Net, a novel method for recovering Non-Rigid Shapes from Motion (NRSfM). | Haitian Zeng; Xin Yu; Jiaxu Miao; Yi Yang | Yes
44 | Depth Map Decomposition for Monocular Depth Estimation. Highlight: We propose a novel algorithm for monocular depth estimation that decomposes a metric depth map into a normalized depth map and scale features. | Jinyoung Jun; Jae-Han Lee; Chul Lee; Chang-Su Kim | Yes
45 | Monitored Distillation for Positive Congruent Depth Completion. Highlight: We propose a method to infer a dense depth map from a single image, its calibration, and the associated sparse point cloud. | Tian Yu Liu; Parth Agrawal; Allison Chen; Byung-Woo Hong; Alex Wong | Yes
46 | Resolution-Free Point Cloud Sampling Network with Data Distillation. Highlight: In this work, we propose a novel resolution-free point cloud sampling network to directly sample the original point cloud to different resolutions, which is conducted by optimizing non-learning-based initial sampled points to better positions. | Tianxin Huang; Jiangning Zhang; Jun Chen; Yuang Liu; Yong Liu | Yes
47 | Organic Priors in Non-rigid Structure from Motion. Highlight: It is shown that such priors reside in the factorized matrices, and quite surprisingly, existing methods generally disregard them. The paper’s main contribution is to put forward a simple, methodical, and practical method that can effectively exploit such organic priors to solve NRSfM. | Suryansh Kumar; Luc Van Gool |
48 | Perspective Flow Aggregation for Data-Limited 6D Object Pose Estimation. Highlight: In this paper, we propose a method that can be trained solely on synthetic images, or optionally using a few additional real ones. | Yinlin Hu; Pascal Fua; Mathieu Salzmann | Yes
49 | DANBO: Disentangled Articulated Neural Body Representations Via Graph Neural Networks. Highlight: We introduce a three-stage method that induces two inductive biases to better disentangle pose-dependent deformation. | Shih-Yang Su; Timur Bagautdinov; Helge Rhodin | Yes
50 | CHORE: Contact, Human and Object REconstruction from A Single RGB Image. Highlight: In this paper, we introduce CHORE, a novel method that learns to jointly reconstruct the human and the object from a single RGB image. | Xianghui Xie; Bharat Lal Bhatnagar; Gerard Pons-Moll | Yes
51 | Learned Vertex Descent: A New Direction for 3D Human Model Fitting. Highlight: We propose a novel optimization-based paradigm for 3D human shape fitting on images. | Enric Corona; Gerard Pons-Moll; Guillem Alenyà; Francesc Moreno-Noguer |
52 | Self-Calibrating Photometric Stereo By Neural Inverse Rendering. Highlight: We propose a new method that jointly optimizes object shape, light directions, and light intensities, all under general surfaces and lights assumptions. | Junxuan Li; Hongdong Li | Yes
53 | 3D Clothed Human Reconstruction in The Wild. Highlight: However, such datasets contain simple human poses and less natural image appearances compared to those of real in-the-wild datasets, which makes generalization to in-the-wild images extremely challenging. To resolve this issue, in this work, we propose ClothWild, a 3D clothed human reconstruction framework that firstly addresses robustness on in-the-wild images. | Gyeongsik Moon; Hyeongjin Nam; Takaaki Shiratori; Kyoung Mu Lee | Yes
54 | Directed Ray Distance Functions for 3D Scene Reconstruction. Highlight: We present an approach for full 3D scene reconstruction from a single new image that can be trained on realistic non-watertight scans. | Nilesh Kulkarni; Justin Johnson; David F. Fouhey |
55 | Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image. Highlight: Recently, RGBD-based category-level 6D object pose estimation has achieved promising improvements in performance; however, the requirement of depth information prohibits broader applications. To relieve this problem, this paper proposes a novel approach named Object Level Depth reconstruction Network (OLD-Net), taking only RGB images as input for category-level 6D object pose estimation. | Zhaoxin Fan; Zhenbo Song; Jian Xu; Zhicheng Wang; Kejian Wu; Hongyan Liu; Jun He |
56 | Uncertainty Quantification in Depth Estimation Via Constrained Ordinal Regression. Highlight: This paper provides an uncertainty quantification method for supervised MDE models. | Dongting Hu; Liuhua Peng; Tingjin Chu; Xiaoxing Zhang; Yinian Mao; Howard Bondell; Mingming Gong |
57 | CostDCNet: Cost Volume Based Depth Completion for A Single RGB-D Image. Highlight: We propose a novel depth completion framework, CostDCNet, based on the cost volume-based depth estimation approach that has been successfully employed for multi-view stereo (MVS). | Jaewon Kam; Jungeon Kim; Soongjin Kim; Jaesik Park; Seungyong Lee |
58 | ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization. Highlight: We present ShAPO, a method for joint multi-object detection, 3D textured reconstruction, 6D object pose and size estimation. | Muhammad Zubair Irshad; Sergey Zakharov; Rareș Ambruș; Thomas Kollar; Zsolt Kira; Adrien Gaidon | Yes
59 | 3D Siamese Transformer Network for Single Object Tracking on Point Clouds. Highlight: In this paper, we explicitly use Transformer to form a 3D Siamese Transformer network for learning robust cross correlation between the template and the search area of point clouds. | Le Hui; Lingpeng Wang; Linghua Tang; Kaihao Lan; Jin Xie; Jian Yang | Yes
60 | Object Wake-Up: 3D Object Rigging from A Single Image. Highlight: It is a new problem that not only goes beyond image-based object reconstruction but also involves articulated animation of generic objects in 3D, which could give rise to numerous downstream augmented and virtual reality applications. In this paper, we propose an automated approach that tackles the entire process of reconstructing such generic 3D objects, rigging, and animation, all from single images. | Ji Yang; Xinxin Zuo; Sen Wang; Zhenbo Yu; Xingyu Li; Bingbing Ni; Minglun Gong; Li Cheng |
61 | IntegratedPIFu: Integrated Pixel Aligned Implicit Function for Single-View Human Reconstruction. Highlight: We propose IntegratedPIFu, a new pixel-aligned implicit model that builds on the foundation set by PIFuHD. | Kennard Yanting Chan; Guosheng Lin; Haiyu Zhao; Weisi Lin | Yes
62 | Realistic One-Shot Mesh-Based Head Avatars. Highlight: We present a system for the creation of realistic one-shot mesh-based (ROME) human head avatars. | Taras Khakhulin; Vanessa Sklyarova; Victor Lempitsky; Egor Zakharov |
63 | A Kendall Shape Space Approach to 3D Shape Estimation from 2D Landmarks. Highlight: Here, we present a new approach based on Kendall’s shape space to reconstruct 3D shapes from single monocular 2D images. | Martha Paskin; Daniel Baum; Mason N. Dean; Christoph von Tycowicz |
64 | Neural Light Field Estimation for Street Scenes with Differentiable Virtual Object Insertion. Highlight: In this work, we propose a neural approach that estimates the 5D HDR light field from a single image, and a differentiable object insertion formulation that enables end-to-end training with image-based losses that encourage realism. | Zian Wang; Wenzheng Chen; David Acuna; Jan Kautz; Sanja Fidler |
65 | Perspective Phase Angle Model for Polarimetric 3D Reconstruction. Highlight: In the case of a large field of view, however, this assumption does not hold and may result in significant reconstruction errors in methods that make this assumption. To address this problem, we present the perspective phase angle (PPA) model that is applicable to perspective cameras. | Guangcheng Chen; Li He; Yisheng Guan; Hong Zhang | Yes
66 | DeepShadow: Neural Shape from Shadow. Highlight: This paper presents ‘DeepShadow’, a one-shot method for recovering the depth map and surface normals from photometric stereo shadow maps. | Asaf Karnieli; Ohad Fried; Yacov Hel-Or | Yes
67 | Camera Auto-Calibration from The Steiner Conic of The Fundamental Matrix. Highlight: We thus propose a method to fully calibrate the camera. | Yu Liu; Hui Zhang |
68 | Super-Resolution 3D Human Shape from A Single Low-Resolution Image. Highlight: We propose a novel framework to reconstruct super-resolution human shape from a single low-resolution input image. | Marco Pesavento; Marco Volino; Adrian Hilton | Yes
69 | Minimal Neural Atlas: Parameterizing Complex Surfaces with Minimal Charts and Distortion. Highlight: In this work, we present Minimal Neural Atlas, a novel atlas-based explicit neural surface representation. | Weng Fei Low; Gim Hee Lee | Yes
70 | ExtrudeNet: Unsupervised Inverse Sketch-and-Extrude for Shape Parsing. Highlight: We present ExtrudeNet, an unsupervised end-to-end network for discovering sketch and extrude from point clouds. | Daxuan Ren; Jianmin Zheng; Jianfei Cai; Jiatong Li; Junzhe Zhang | Yes
71 | CATRE: Iterative Point Clouds Alignment for Category-Level Object Pose Refinement. Highlight: Specifically, we propose a novel disentangled architecture that is aware of the inherent distinctions between rotation and translation/size estimation. | Xingyu Liu; Gu Wang; Yi Li; Xiangyang Ji | Yes
72 | Optimization Over Disentangled Encoding: Unsupervised Cross-Domain Point Cloud Completion Via Occlusion Factor Manipulation. Highlight: In this paper, we disentangle partial scans into three (domain, shape, and occlusion) factors to handle the output gap in cross-domain completion. | Jingyu Gong; Fengqi Liu; Jiachen Xu; Min Wang; Xin Tan; Zhizhong Zhang; Ran Yi; Haichuan Song; Yuan Xie; Lizhuang Ma | Yes
73 | Unsupervised Learning of 3D Semantic Keypoints with Mutual Reconstruction. Highlight: From a novel mutual reconstruction perspective, we present an unsupervised method to generate consistent semantic keypoints from point clouds explicitly. | Haocheng Yuan; Chen Zhao; Shichao Fan; Jiaxi Jiang; Jiaqi Yang |
74 | MvDeCor: Multi-View Dense Correspondence Learning for Fine-Grained 3D Segmentation. Highlight: We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks. | Gopal Sharma; Kangxue Yin; Subhransu Maji; Evangelos Kalogerakis; Or Litany; Sanja Fidler | Yes
75 | SUPR: A Sparse Unified Part-Based Human Representation. Highlight: Consequently, we propose a new learning scheme that jointly trains a full-body model and specific part models using a federated dataset of full-body and body-part scans. | Ahmed A. A. Osman; Timo Bolkart; Dimitrios Tzionas; Michael J. Black | Yes
76 | Revisiting Point Cloud Simplification: A Learnable Feature Preserving Approach. Highlight: Traditional simplification techniques usually rely on solving a time-consuming optimization problem, and hence are impractical for large-scale datasets. In an attempt to alleviate this computational burden, we propose a fast point cloud simplification method by learning to sample salient points. | |
Rolandos Alexandros Potamias; Giorgos Bouritsas; Stefanos Zafeiriou; |
77 | Masked Autoencoders for Point Cloud Self-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As a promising scheme of self-supervised learning, masked autoencoding has significantly advanced natural language processing and computer vision. Inspired by this, we propose a neat scheme of masked autoencoders for point cloud self-supervised learning, addressing the challenges posed by point cloud’s properties, including leakage of location information and uneven information density. |
Yatian Pang; Wenxiao Wang; Francis E.H. Tay; Wei Liu; Yonghong Tian; Li Yuan; |
78 | Intrinsic Neural Fields: Learning Functions on Manifolds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The extrinsic embedding ignores known intrinsic manifold properties and is inflexible w.r.t. transfer of the learned function. To overcome these limitations, this work introduces intrinsic neural fields, a novel and versatile representation for neural fields on manifolds. |
Lukas Koestler; Daniel Grittner; Michael Moeller; Daniel Cremers; Zorah Lähner; |
79 | Skeleton-Free Pose Transfer for Stylized 3D Characters Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present the first method that automatically transfers poses between stylized 3D characters without skeletal rigging. |
Zhouyingcheng Liao; Jimei Yang; Jun Saito; Gerard Pons-Moll; Yang Zhou; |
80 | Masked Discrimination for Self-Supervised Learning on Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, mask-based pretraining has yet to show benefits for point cloud understanding, likely due to standard backbones like PointNet being unable to properly handle the training versus testing distribution mismatch introduced by masking during training. In this paper, we bridge this gap by proposing a discriminative mask pretraining Transformer framework, MaskPoint, for point clouds. |
Haotian Liu; Mu Cai; Yong Jae Lee; |
81 | FBNet: Feedback Network for Point Cloud Completion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a novel Feedback Network (FBNet) for point cloud completion, in which present features are efficiently refined by rerouting subsequent fine-grained ones. |
Xuejun Yan; Hongyu Yan; Jingjing Wang; Hang Du; Zhihong Wu; Di Xie; Shiliang Pu; Li Lu; |
82 | Meta-Sampler: Almost-Universal Yet Task-Oriented Sampling for Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose an almost-universal sampler, in our quest for a sampler that can learn to preserve the most useful points for a particular task, yet be inexpensive to adapt to different tasks, models or datasets. |
Ta-Ying Cheng; Qingyong Hu; Qian Xie; Niki Trigoni; Andrew Markham; |
83 | A Level Set Theory for Neural Implicit Evolution Under Explicit Flows Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: They effectively act as parametric level sets with the zero-level set defining the surface of interest. We present a framework that allows applying deformation operations defined for triangle meshes onto such implicit surfaces. |
Ishit Mehta; Manmohan Chandraker; Ravi Ramamoorthi; |
84 | Efficient Point Cloud Analysis Using Hilbert Curve Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this way, we propose the HilbertNet to maintain the locality advantage of voxel-based methods while significantly reducing the computational cost. |
Wanli Chen; Xinge Zhu; Guojin Chen; Bei Yu; |
85 | TOCH: Spatio-Temporal Object-to-Hand Correspondence for Motion Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present TOCH, a method for refining incorrect 3D hand-object interaction sequences using a data prior. |
Keyang Zhou; Bharat Lal Bhatnagar; Jan Eric Lenssen; Gerard Pons-Moll; |
86 | LaTeRF: Label and Text Driven Object Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene and known camera poses, a natural language description of the object, and a small number of point-labels of object and non-object points in the input images. |
Ashkan Mirzaei; Yash Kant; Jonathan Kelly; Igor Gilitschenski; |
87 | MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Recently, self-supervised pre-training has advanced Vision Transformers on various tasks w.r.t. different data modalities, e.g., image and 3D point cloud data. In this paper, we explore this learning paradigm for 3D mesh data analysis based on Transformers. |
Yaqian Liang; Shanshan Zhao; Baosheng Yu; Jing Zhang; Fazhi He; |
88 | Unsupervised Deep Multi-Shape Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel approach for deep multi-shape matching that ensures cycle-consistent multi-matchings while not depending on an explicit template shape. |
Dongliang Cao; Florian Bernard; |
89 | Texturify: Generating Textures on 3D Shape Surfaces Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We thus propose Texturify, a GAN-based method that leverages a 3D shape dataset of an object class and learns to reproduce the distribution of appearances observed in real images by generating high-quality textures. |
Yawar Siddiqui; Justus Thies; Fangchang Ma; Qi Shan; Matthias Nießner; Angela Dai; |
90 | Autoregressive 3D Shape Generation Via Canonical Mapping Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Yet, taming them to generate less structured and voluminous data formats such as high-resolution point clouds has seldom been explored, due to ambiguous sequentialization processes and an infeasible computational burden. In this paper, we aim to further exploit the power of transformers and employ them for the task of 3D point cloud generation. |
An-Chieh Cheng; Xueting Li; Sifei Liu; Min Sun; Ming-Hsuan Yang; |
91 | PointTree: Transformation-Robust Point Cloud Encoder with Relaxed K-D Trees Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Despite rapid progress, state-of-the-art encoders are restrictive to canonicalized point clouds, and have weaker than necessary performance when encountering geometric transformation distortions. To overcome this challenge, we propose PointTree, a general-purpose point cloud encoder that is robust to transformations based on relaxed K-D trees. |
Jun-Kun Chen; Yu-Xiong Wang; |
92 | UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose united implicit functions (UNIF), a part-based method for clothed human reconstruction and animation with raw scans and skeletons as the input. |
Shenhan Qian; Jiale Xu; Ziwei Liu; Liqian Ma; Shenghua Gao; |
93 | PRIF: Primary Ray-Based Implicit Function Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a new implicit shape representation called Primary Ray-based Implicit Function (PRIF). |
Brandon Y. Feng; Yinda Zhang; Danhang Tang; Ruofei Du; Amitabh Varshney; |
94 | Point Cloud Domain Adaptation Via Masked Local 3D Structure Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Masked Local Structure Prediction (MLSP) method to encode target data. |
Hanxue Liang; Hehe Fan; Zhiwen Fan; Yi Wang; Tianlong Chen; Yu Cheng; Zhangyang Wang; |
95 | CLIP-Actor: Text-Driven Recommendation and Stylization for Animating Human Meshes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose CLIP-Actor, a text-driven motion recommendation and neural mesh stylization system for human mesh animation. |
Kim Youwang; Kim Ji-Yeon; Tae-Hyun Oh; |
96 | PlaneFormers: From Sparse View Planes to 3D Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present an approach for the planar surface reconstruction of a scene from images with limited overlap. |
Samir Agarwala; Linyi Jin; Chris Rockwell; David F. Fouhey; |
97 | Learning Implicit Templates for Point-Based Clothed Human Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present FITE, a First-Implicit-Then-Explicit framework for modeling human avatars in clothing. |
Siyou Lin; Hongwen Zhang; Zerong Zheng; Ruizhi Shao; Yebin Liu; |
98 | Exploring The Devil in Graph Spectral Domain for 3D Point Cloud Attacks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead, we propose point cloud attacks from a new perspective, the Graph Spectral Domain Attack (GSDA), aiming to perturb transform coefficients in the graph spectral domain that correspond to varying certain geometric structures. |
Qianjiang Hu; Daizong Liu; Wei Hu; |
99 | Structure-Aware Editable Morphable Model for 3D Facial Detail Animation and Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper augments morphable models in representing facial details by learning a Structure-aware Editable Morphable Model (SEMM). |
Jingwang Ling; Zhibo Wang; Ming Lu; Quan Wang; Chen Qian; Feng Xu; |
100 | MoFaNeRF: Morphable Facial Neural Radiance Field Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a parametric model that maps free-view images into a vector space of coded facial shape, expression and appearance with a neural radiance field, namely Morphable Facial NeRF. |
Yiyu Zhuang; Hao Zhu; Xusen Sun; Xun Cao; |
101 | PointInst3D: Segmenting 3D Instances By Points Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In contrast, we propose a fully convolutional 3D point cloud instance segmentation method that works in a per-point prediction fashion. |
Tong He; Wei Yin; Chunhua Shen; Anton van den Hengel; |
102 | Cross-Modal 3D Shape Generation and Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a generic multi-modal generative model that couples the 2D modalities and implicit 3D representations through shared latent spaces. |
Zezhou Cheng; Menglei Chai; Jian Ren; Hsin-Ying Lee; Kyle Olszewski; Zeng Huang; Subhransu Maji; Sergey Tulyakov; |
103 | Latent Partition Implicit with Surface Codes for 3D Representation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Current solutions learn various primitives and blend the primitives directly in the spatial space, which still struggle to approximate the 3D shape accurately. To resolve this problem, we introduce a novel implicit representation to represent a single 3D shape as a set of parts in the latent space, towards both highly accurate and plausibly interpretable shape modeling. |
Chao Chen; Yu-Shen Liu; Zhizhong Han; |
104 | Implicit Field Supervision for Robust Non-rigid Shape Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce an approach based on an auto-decoder framework, that learns a continuous shape-wise deformation field over a fixed template. |
Ramana Sundararaman; Gautam Pai; Maks Ovsjanikov; |
105 | Learning Self-Prior for Mesh Denoising Using Dual Graph Convolutional Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This study proposes a deep-learning framework for mesh denoising from a single noisy input, where two graph convolutional networks are trained jointly to filter vertex positions and facet normals apart. |
Shota Hattori; Tatsuya Yatagawa; Yutaka Ohtake; Hiromasa Suzuki; |
106 | DiffConv: Analyzing Irregular Point Clouds with An Irregular View Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel graph convolution named Difference Graph Convolution (diffConv), which does not rely on a regular view. |
Manxi Lin; Aasa Feragen; |
107 | PD-Flow: A Point Cloud Denoising Framework with Normalizing Flows Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel deep learning-based denoising model, that incorporates normalizing flows and noise disentanglement techniques to achieve high denoising accuracy. |
Aihua Mao; Zihui Du; Yu-Hui Wen; Jun Xuan; Yong-Jin Liu; |
108 | SeedFormer: Patch Seeds Based Point Cloud Completion with Upsample Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel SeedFormer to improve the ability of detail preservation and recovery in point cloud completion. |
Haoran Zhou; Yun Cao; Wenqing Chu; Junwei Zhu; Tong Lu; Ying Tai; Chengjie Wang; |
109 | DeepMend: Learning Occupancy Functions to Represent Shape for Repair Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present DeepMend, a novel approach to reconstruct restorations to fractured shapes using learned occupancy functions. |
Nikolas Lamb; Sean Banerjee; Natasha Kholgade Banerjee; |
110 | A Repulsive Force Unit for Garment Collision Handling in Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Despite recent success, deep learning-based methods for predicting 3D garment deformation under body motion suffer from interpenetration problems between the garment and the body. To address this problem, we propose a novel collision handling neural network layer called Repulsive Force Unit (ReFU). |
Qingyang Tan; Yi Zhou; Tuanfeng Wang; Duygu Ceylan; Xin Sun; Dinesh Manocha; |
111 | Shape-Pose Disentanglement Using SE(3)-Equivariant Vector Neurons Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce an unsupervised technique for encoding point clouds into a canonical shape representation, by disentangling shape and pose. |
Oren Katzir; Dani Lischinski; Daniel Cohen-Or; |
112 | 3D Equivariant Graph Implicit Functions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In recent years, neural implicit representations have made remarkable progress in modeling of 3D shapes with arbitrary topology. In this work, we address two key limitations of such representations, in failing to capture local 3D geometric fine details, and to learn from and generalize to shapes with unseen 3D transformations. |
Yunlu Chen; Basura Fernando; Hakan Bilen; Matthias Nießner; Efstratios Gavves; |
113 | PatchRD: Detail-Preserving Shape Completion By Learning Patch Retrieval and Deformation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces a data-driven shape completion approach that focuses on completing geometric details of missing regions of 3D shapes. |
Bo Sun; Vladimir G. Kim; Noam Aigerman; Qixing Huang; Siddhartha Chaudhuri; |
114 | 3D Shape Sequence of Human Comparison and Classification Using Current and Varifolds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we address the task of comparing and classifying 3D shape sequences of humans. |
Emery Pierson; Mohamed Daoudi; Sylvain Arguillere; |
115 | Conditional-Flow NeRF: Accurate 3D Modelling with Reliable Uncertainty Quantification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This information is paramount in real applications such as medical diagnosis or autonomous driving where, to reduce potentially catastrophic failures, the confidence on the model outputs must be included into the decision-making process. In this context, we introduce Conditional-Flow NeRF (CF-NeRF), a novel probabilistic framework to incorporate uncertainty quantification into NeRF-based approaches. |
Jianxiong Shen; Antonio Agudo; Francesc Moreno-Noguer; Adria Ruiz; |
116 | Unsupervised Pose-Aware Part Decomposition for Man-Made Articulated Objects Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose PPD (unsupervised Pose-aware Part Decomposition) to address a novel setting that explicitly targets man-made articulated objects with mechanical joints, considering the part poses in part parsing. |
Yuki Kawana; Yusuke Mukuta; Tatsuya Harada; |
117 | MeshUDF: Fast and Differentiable Meshing of Unsigned Distance Field Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we extend the marching cubes algorithm to handle UDFs both quickly and accurately. |
Benoît Guillard; Federico Stella; Pascal Fua; |
118 | SPE-Net: Boosting Point Cloud Analysis Via Rotation Robustness Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel deep architecture tailored for 3D point cloud applications, named as SPE-Net. |
Zhaofan Qiu; Yehao Li; Yu Wang; Yingwei Pan; Ting Yao; Tao Mei; |
119 | The Shape Part Slot Machine: Contact-Based Reasoning for Generating 3D Shapes from Parts Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present the Shape Part Slot Machine, a new method for assembling novel 3D shapes from existing parts by performing contact-based reasoning. |
Kai Wang; Paul Guerrero; Vladimir G. Kim; Siddhartha Chaudhuri; Minhyuk Sung; Daniel Ritchie; |
120 | Spatiotemporal Self-Attention Modeling with Temporal Patch Shift for Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Temporal Patch Shift (TPS) method for efficient 3D self-attention modeling in transformers for video-based action recognition. |
Wangmeng Xiang; Chao Li; Biao Wang; Xihan Wei; Xian-Sheng Hua; Lei Zhang; |
121 | Proposal-Free Temporal Action Detection Via Global Segmentation Mask Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, for the first time, we propose a proposal-free Temporal Action detection model with Global Segmentation mask (TAGS). |
Sauradip Nag; Xiatian Zhu; Yi-Zhe Song; Tao Xiang; |
122 | Semi-Supervised Temporal Action Detection with Proposal-Free Masking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Due to their sequential localization (e.g., proposal generation) and classification design, they are prone to proposal error propagation. To overcome this limitation, in this work we propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT) with a parallel localization (mask generation) and classification architecture. |
Sauradip Nag; Xiatian Zhu; Yi-Zhe Song; Tao Xiang; |
123 | Zero-Shot Temporal Action Detection Via Vision-Language Prompting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, due to the sequential localization (e.g., proposal generation) and classification design, it is prone to localization error propagation. To overcome this problem, in this paper we propose a novel zero-Shot Temporal Action detection model via vision-LanguagE prompting (STALE). |
Sauradip Nag; Xiatian Zhu; Yi-Zhe Song; Tao Xiang; |
124 | CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This poses two major challenges: (1) the spatial domain shift between web images and video frames, and (2) the modality gap between image and video data. To address these challenges, we propose Cycle Domain Adaptation (CycDA), a cycle-based approach for unsupervised image-to-video domain adaptation. |
Wei Lin; Anna Kukleva; Kunyang Sun; Horst Possegger; Hilde Kuehne; Horst Bischof; |
125 | S2N: Suppression-Strengthen Network for Event-Based Recognition Under Variant Illuminations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the event degradation due to imaging under low illumination obscures the correlation between event signals and brings uncertainty into event representation. Targeting this issue, we present a novel suppression-strengthen network (S2N) to augment the event feature representation after suppressing the influence of degradation. |
Zengyu Wan; Yang Wang; Ganchao Tan; Yang Cao; Zheng-Jun Zha; |
126 | CMD: Self-Supervised 3D Action Representation Learning with Cross-Modal Mutual Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem. |
Yunyao Mao; Wengang Zhou; Zhenbo Lu; Jiajun Deng; Houqiang Li; |
127 | Expanding Language-Image Pretrained Models for General Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a simple yet effective approach that adapts the pretrained language-image models to video recognition directly, instead of pretraining a new model from scratch. |
Bolin Ni; Houwen Peng; Minghao Chen; Songyang Zhang; Gaofeng Meng; Jianlong Fu; Shiming Xiang; Haibin Ling; |
128 | Hunting Group Clues with Transformers for Social Group Activity Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a novel framework for social group activity recognition. |
Masato Tamura; Rahul Vishwakarma; Ravigopal Vennelakanti; |
129 | Contrastive Positive Mining for Unsupervised 3D Action Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, a Contrastive Positive Mining (CPM) framework is proposed for unsupervised skeleton 3D action representation learning. |
Haoyuan Zhang; Yonghong Hou; Wenjing Zhang; Wanqing Li; |
130 | Target-Absent Human Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a data-driven computational model that addresses the search-termination problem and predicts the scanpath of search fixations made by people searching for targets that do not appear in images. |
Zhibo Yang; Sounak Mondal; Seoyoung Ahn; Gregory Zelinsky; Minh Hoai; Dimitris Samaras; |
131 | Uncertainty-Based Spatial-Temporal Attention for Online Action Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an uncertainty-based spatial-temporal attention mechanism for online action detection. |
Hongji Guo; Zhou Ren; Yi Wu; Gang Hua; Qiang Ji; |
132 | Iwin: Human-Object Interaction Detection Via Transformer with Irregular Windows Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a new vision Transformer, named Iwin Transformer, which is specifically designed for human-object interaction (HOI) detection, a detailed scene understanding task involving a sequential process of human/object detection and interaction recognition. |
Danyang Tu; Xiongkuo Min; Huiyu Duan; Guodong Guo; Guangtao Zhai; Wei Shen; |
133 | Rethinking Zero-Shot Action Recognition: Learning from Latent Atomic Actions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: It enables humans to quickly understand an unseen action given a bunch of atomic actions learned from seen actions. Inspired by this, we propose Jigsaw Network (JigsawNet) which recognizes complex actions through unsupervisedly decomposing them into combinations of atomic actions and bridging group-to-group relationships between visual features and semantic representations. |
Yijun Qian; Lijun Yu; Wenhe Liu; Alexander G. Hauptmann; |
134 | Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we argue that comparing body-parts of multi-person simultaneously can afford us more useful and supplementary interactiveness cues. |
Xiaoqian Wu; Yong-Lu Li; Xinpeng Liu; Junyi Zhang; Yuzhe Wu; Cewu Lu; |
135 | Collaborating Domain-Shared and Target-Specific Feature Clustering for Cross-Domain 3D Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we consider the problem of cross-domain 3D action recognition in the open-set setting, which has been rarely explored before. |
Qinying Liu; Zilei Wang; |
136 | Is Appearance Free Action Recognition Possible? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our results show a notable decrease in performance for all architectures on AFD compared to RGB. We also conducted a complementary study with humans that shows their recognition accuracy on AFD and RGB is very similar and much better than the evaluated architectures on AFD. |
Filip Ilic; Thomas Pock; Richard P. Wildes; |
137 | Learning Spatial-Preserved Skeleton Representations for Few-Shot Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, existing skeleton-based spatial-temporal models tend to deteriorate the positional distinguishability of joints, which leads to fuzzy spatial matching and poor explainability. To address these issues, we propose a novel spatial matching strategy consisting of spatial disentanglement and spatial activation. |
Ning Ma; Hongyi Zhang; Xuhui Li; Sheng Zhou; Zhen Zhang; Jun Wen; Haifeng Li; Jingjun Gu; Jiajun Bu; |
138 | Dual-Evidential Learning for Weakly-Supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Despite great progress, existing methods suffer from severe action-background ambiguity, which mainly comes from background noise introduced by aggregation operations and large intra-action variations caused by the task gap between classification and localization. To address this issue, we propose a generalized evidential deep learning (EDL) framework for WS-TAL, called Dual-Evidential Learning for Uncertainty modeling (DELU), which extends the traditional paradigm of EDL to adapt to the weakly-supervised multi-label classification goal. |
Mengyuan Chen; Junyu Gao; Shicai Yang; Changsheng Xu; |
139 | Global-Local Motion Transformer for Unsupervised Skeleton-Based Action Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new transformer model for the task of unsupervised learning of skeleton motion sequences. |
Boeun Kim; Hyung Jin Chang; Jungho Kim; Jin Young Choi; |
140 | AdaFocusV3: On Unified Spatial-Temporal Dynamic Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper explores the unified formulation of spatial-temporal dynamic computation on top of the recently proposed AdaFocusV2 algorithm, contributing to an improved AdaFocusV3 framework. |
Yulin Wang; Yang Yue; Xinhong Xu; Ali Hassani; Victor Kulikov; Nikita Orlov; Shiji Song; Humphrey Shi; Gao Huang; |
141 | Panoramic Human Activity Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To obtain a more comprehensive activity understanding for a crowded scene, in this paper, we propose a new problem of panoramic human activity recognition (PAR), which aims to simultaneously achieve the recognition of individual actions, social group activities, and global activities. |
Ruize Han; Haomin Yan; Jiacheng Li; Songmiao Wang; Wei Feng; Song Wang; |
142 | Delving Into Details: Synopsis-to-Detail Networks for Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore the details in video recognition with the aim to improve the accuracy. |
Shuxian Liang; Xu Shen; Jianqiang Huang; Xian-Sheng Hua; |
143 | A Generalized & Robust Framework for Timestamp Supervision in Temporal Action Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel Expectation-Maximization (EM) based approach which leverages label uncertainty of unlabelled frames and is robust enough to accommodate possible annotation errors. |
Rahul Rahaman; Dipika Singhania; Alexandre Thiery; Angela Yao; |
144 | Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a hierarchical matching model to support comprehensive similarity measure at global, temporal and spatial levels via a zoom-in matching module. We further propose a mixed-supervised hierarchical contrastive learning (HCL) in training, which not only employs supervised contrastive learning to differentiate videos at different levels, but also utilizes cycle consistency as weak supervision to align discriminative temporal clips or spatial patches. |
Sipeng Zheng; Shizhe Chen; Qin Jin; |
145 | PrivHAR: Recognizing Human Actions from Privacy-Preserving Lens Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an optimizing framework to provide robust visual privacy protection along the human action recognition pipeline. |
Carlos Hinojosa; Miguel Marquez; Henry Arguello; Ehsan Adeli; Li Fei-Fei; Juan Carlos Niebles; |
146 | Scale-Aware Spatio-Temporal Relation Learning for Video Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a scale-aware weakly supervised learning approach to capture local and salient anomalous patterns from the background, using only coarse video-level labels as supervision. |
Guoqiu Li; Guanxiong Cai; Xingyu Zeng; Rui Zhao; |
147 | Compound Prototype Matching for Few-Shot Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel approach that first summarizes each video into compound prototypes consisting of a group of global prototypes and a group of focused prototypes, and then compares video similarity based on the prototypes. |
Yifei Huang; Lijin Yang; Yoichi Sato; |
148 | Continual 3D Convolutional Neural Networks for Real-Time Processing of Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Continual 3D Convolutional Neural Networks (Co3D CNNs), a new computational formulation of spatio-temporal 3D CNNs, in which videos are processed frame-by-frame rather than by clip. |
Lukas Hedegaard; Alexandros Iosifidis; |
149 | Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The goal of fine-grained action recognition is to successfully discriminate between action categories with subtle differences. To tackle this, we derive inspiration from the human visual system which contains specialized regions in the brain that are dedicated towards handling specific tasks. |
Tianjiao Li; Lin Geng Foo; Qiuhong Ke; Hossein Rahmani; Anran Wang; Jinghua Wang; Jun Liu; |
150 | Dynamic Local Aggregation Network with Adaptive Clusterer for Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To overcome these drawbacks, we introduce DLAN-AC, a Dynamic Local Aggregation Network with Adaptive Clusterer, for anomaly detection. |
Zhiwei Yang; Peng Wu; Jing Liu; Xiaotao Liu; |
151 | Action Quality Assessment with Temporal Parsing Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing state-of-the-art methods typically rely on the holistic video representations for score regression or ranking, which limits the generalization to capture fine-grained intra-class variation. To overcome the above limitation, we propose a temporal parsing transformer to decompose the holistic feature into temporal part-level representations. |
Yang Bai; Desen Zhou; Songyang Zhang; Jian Wang; Errui Ding; Yu Guan; Yang Long; Jingdong Wang; |
152 | Entry-Flipped Transformer for Inference and Prediction of Participant Behavior Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our key idea is to model the spatio-temporal relations among participants in a manner that is robust to error accumulation during frame-wise inference and prediction. |
Bo Hu; Tat-Jen Cham; |
153 | Pairwise Contrastive Learning Network for Action Quality Assessment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, it ignores the subtle and critical differences between videos. To address this problem, a new pairwise contrastive learning network (PCLN) is proposed to capture these differences and form an end-to-end AQA model with a basic regression network. |
Mingzhe Li; Hong-Bo Zhang; Qing Lei; Zongwen Fan; Jinghua Liu; Ji-Xiang Du; |
154 | Geometric Features Informed Multi-Person Human-Object Interaction Recognition in Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Considering that geometric features such as human pose and object position provide meaningful information to understand HOIs, we argue to combine the benefits of both visual and geometric features in HOI recognition, and propose a novel Two-level Geometric feature-informed Graph Convolutional Network (2G-GCN). To demonstrate the novelty and effectiveness of our method in challenging scenarios, we propose a new multi-person HOI dataset (MPHOI-72). |
Tanqiu Qiao; Qianhui Men; Frederick W. B. Li; Yoshiki Kubotani; Shigeo Morishima; Hubert P. H. Shum; |
155 | ActionFormer: Localizing Moments of Actions with Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we present ActionFormer–a simple yet powerful model to identify actions in time and recognize their categories in a single shot, without using action proposals or relying on pre-defined anchor windows. |
Chen-Lin Zhang; Jianxin Wu; Yin Li; |
156 | SocialVAE: Human Trajectory Prediction Using Timewise Latents Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose SocialVAE, a novel approach for human trajectory prediction. |
Pei Xu; Jean-Bernard Hayet; Ioannis Karamouzas; |
157 | Shape Matters: Deformable Patch Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Previous work always assumes patches to have fixed shapes, such as circles or rectangles, and it does not consider the shape of patches as a factor in patch attacks. To explore this issue, we propose a novel Deformable Patch Representation (DPR) that can harness the geometric structure of triangles to support the differentiable mapping between contour modeling and masks. |
Zhaoyu Chen; Bo Li; Shuang Wu; Jianghe Xu; Shouhong Ding; Wenqiang Zhang; |
158 | Frequency Domain Model Augmentation for Adversarial Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the observation that the transferability of adversarial examples can be improved by attacking diverse models simultaneously, model augmentation methods which simulate different models by using transformed images are proposed. |
Yuyang Long; Qilong Zhang; Boheng Zeng; Lianli Gao; Xianglong Liu; Jian Zhang; Jingkuan Song; |
159 | Prior-Guided Adversarial Initialization for Fast Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore the difference between the training processes of SAT and FAT and observe that the attack success rate of adversarial examples (AEs) of FAT gets worse gradually in the late training stage, resulting in overfitting. |
Xiaojun Jia; Yong Zhang; Xingxing Wei; Baoyuan Wu; Ke Ma; Jue Wang; Xiaochun Cao; |
160 | Enhanced Accuracy and Robustness Via Multi-Teacher Adversarial Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To improve the robust and clean accuracy of small models, we introduce the Multi-Teacher Adversarial Robustness Distillation (MTARD) to guide the adversarial training process of small models. |
Shiji Zhao; Jie Yu; Zhenlong Sun; Bo Zhang; Xingxing Wei; |
161 | LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose transferability from Large Geometric Vicinity (LGV), a new technique to increase the transferability of black-box adversarial attacks. |
Martin Gubri; Maxime Cordy; Mike Papadakis; Yves Le Traon; Koushik Sen; |
162 | A Large-Scale Multiple-Objective Method for Black-Box Attack Against Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Most existing attack methods aim to minimize the true positive rate, which often shows poor attack performance, as another sub-optimal bounding box may be detected around the attacked bounding box to be the new true positive one. To settle this challenge, we propose to minimize the true positive rate and maximize the false positive rate, which can encourage more false positive objects to block the generation of new true positive bounding boxes. |
Siyuan Liang; Longkang Li; Yanbo Fan; Xiaojun Jia; Jingzhi Li; Baoyuan Wu; Xiaochun Cao; |
163 | GradAuto: Energy-Oriented Attack on Dynamic Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we investigate the robustness of dynamic neural networks against energy-oriented attacks. |
Jianhong Pan; Qichen Zheng; Zhipeng Fan; Hossein Rahmani; Qiuhong Ke; Jun Liu; |
164 | A Spectral View of Randomized Smoothing Under Common Corruptions: Benchmarking and Improving Certified Robustness Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we explore a new problem setting to critically examine how the adversarial robustness guarantees change when state-of-the-art randomized smoothing-based certifications encounter common corruptions of the test data. |
Jiachen Sun; Akshay Mehra; Bhavya Kailkhura; Pin-Yu Chen; Dan Hendrycks; Jihun Hamm; Z. Morley Mao; |
165 | Improving Adversarial Robustness of 3D Point Cloud Classification Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we design two innovative methodologies to improve the adversarial robustness of 3D point cloud classification models. |
Guanlin Li; Guowen Xu; Han Qiu; Ruan He; Jiwei Li; Tianwei Zhang; |
166 | Learning Extremely Lightweight and Robust Model with Differentiable Constraints on Sparsity and Condition Number Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a framework for building extremely lightweight models, which combines tensor product with the differentiable constraints for reducing condition number and promoting sparsity. |
Xian Wei; Yangyu Xu; Yanhui Huang; Hairong Lv; Hai Lan; Mingsong Chen; Xuan Tang; |
167 | RIBAC: Towards Robust and Imperceptible Backdoor Attack Against Compact DNN Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to study and develop Robust and Imperceptible Backdoor Attack against Compact DNN models (RIBAC). |
Huy Phan; Cong Shi; Yi Xie; Tianfang Zhang; Zhuohang Li; Tianming Zhao; Jian Liu; Yan Wang; Yingying Chen; Bo Yuan; |
168 | Boosting Transferability of Targeted Adversarial Examples Via Hierarchical Generative Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we develop a simple yet effective framework to craft targeted transfer-based adversarial examples, applying a hierarchical generative network. |
Xiao Yang; Yinpeng Dong; Tianyu Pang; Hang Su; Jun Zhu; |
169 | Adaptive Image Transformations for Transfer-Based Adversarial Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel architecture, called Adaptive Image Transformation Learner (AITL), which incorporates different image transformation operations into a unified framework to further improve the transferability of adversarial examples. |
Zheng Yuan; Jie Zhang; Shiguang Shan; |
170 | Generative Multiplane Images: Making A 2D GAN 3D-Aware Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: What is really needed to make an existing 2D GAN 3D-aware? To answer this question, we modify a classical GAN, i.e., StyleGANv2, as little as possible. |
Xiaoming Zhao; Fangchang Ma; David Güera; Zhile Ren; Alexander G. Schwing; Alex Colburn; |
171 | AdvDO: Realistic Adversarial Attacks for Trajectory Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While many prior works aim to achieve higher prediction accuracy, few study the adversarial robustness of their methods. To bridge this gap, we propose to study the adversarial robustness of data-driven trajectory prediction systems. |
Yulong Cao; Chaowei Xiao; Anima Anandkumar; Danfei Xu; Marco Pavone; |
172 | Adversarial Contrastive Learning Via Asymmetric InfoNCE Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, this mechanism can be potentially flawed, since adversarial perturbations may cause instance-level identity confusion, which can impede CL performance by pulling together different instances with separate identities. To address this issue, we propose to treat adversarial samples unequally when contrasted to positive and negative samples, with an asymmetric InfoNCE objective (A-InfoNCE) that allows discriminating considerations of adversarial samples. |
Qiying Yu; Jieming Lou; Xianyuan Zhan; Qizhang Li; Wangmeng Zuo; Yang Liu; Jingjing Liu; |
173 | One Size Does NOT Fit All: Data-Adaptive Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we argue that, for the attackable examples, traditional adversarial training which utilizes a fixed size perturbation ball can create adversarial examples that deviate far away from the original class towards the target class. |
Shuo Yang; Chang Xu; |
174 | UniCR: Universally Approximated Certified Robustness Via Randomized Smoothing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, we propose the first universally approximated certified robustness (UniCR) framework, which can approximate the robustness certification of any input on any classifier against any ℓp perturbations with noise generated by any continuous probability distribution. |
Hanbin Hong; Binghui Wang; Yuan Hong; |
175 | Hardly Perceptible Trojan Attack Against Neural Networks with Bit Flips Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel attack, namely hardly perceptible Trojan attack (HPT). |
Jiawang Bai; Kuofeng Gao; Dihong Gong; Shu-Tao Xia; Zhifeng Li; Wei Liu; |
176 | Robust Network Architecture Search Via Feature Distortion Restraining Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Robust Network Architecture Search (RNAS) to obtain a robust network against adversarial attacks. |
Yaguan Qian; Shenghui Huang; Bin Wang; Xiang Ling; Xiaohui Guan; Zhaoquan Gu; Shaoning Zeng; Wujie Zhou; Haijiang Wang; |
177 | SecretGen: Privacy Recovery on Pre-trained Models Via Distribution Discrimination Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, it raises extensive concerns on whether these pre-trained models would leak privacy-sensitive information of their training data. Thus, in this work, we aim to answer the following questions: Can we effectively recover private information from these pre-trained models? |
Zhuowen Yuan; Fan Wu; Yunhui Long; Chaowei Xiao; Bo Li; |
178 | Triangle Attack: A Query-Efficient Decision-Based Adversarial Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we find that a benign sample, the current and the next adversarial examples can naturally construct a triangle in a subspace for any iterative attacks. |
Xiaosen Wang; Zeliang Zhang; Kangheng Tong; Dihong Gong; Kun He; Zhifeng Li; Wei Liu; |
179 | Data-Free Backdoor Removal Based on Channel Lipschitzness Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce a novel concept called Channel Lipschitz Constant (CLC), which is defined as the Lipschitz constant of the mapping from the input images to the output of each channel. |
Runkai Zheng; Rongjun Tang; Jianze Li; Li Liu; |
180 | Black-Box Dissector: Towards Erasing-Based Hard-Label Model Stealing Attack Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose a novel hard-label model stealing method termed black-box dissector, which consists of two erasing-based modules. |
Yixu Wang; Jie Li; Hong Liu; Yan Wang; Yongjian Wu; Feiyue Huang; Rongrong Ji; |
181 | Learning Energy-Based Models with Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We study a new approach to learning energy-based models (EBMs) based on adversarial training (AT). |
Xuwang Yin; Shiying Li; Gustavo K. Rohde; |
182 | Adversarial Label Poisoning Attack on Graph Neural Networks Via Label Propagation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, we propose a label poisoning attack framework for graph convolutional networks (GCNs), inspired by the equivalence between label propagation and decoupled GCNs that separate message passing from neural networks. |
Ganlin Liu; Xiaowei Huang; Xinping Yi; |
183 | Revisiting Outer Optimization in Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose an optimization method called ENGM which regularizes the contribution of each input example to the average mini-batch gradients. |
Ali Dabouei; Fariborz Taherkhani; Sobhan Soleymani; Nasser M. Nasrabadi; |
184 | Zero-Shot Attribute Attacks on Fine-Grained Recognition Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Such attacks, in particular, universal perturbations that are class-agnostic and ideally should generalize to unseen classes, however, cannot leverage or capture small distinctions among fine-grained classes. Therefore, we propose a compositional attribute-based framework for generating adversarial attacks on zero-shot fine-grained recognition models. |
Nasim Shafiee; Ehsan Elhamifar; |
185 | Towards Effective and Robust Neural Trojan Defenses Via Input Filtering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Most defense methods still make out-of-date assumptions about Trojan triggers and target classes, thus, can be easily circumvented by modern Trojan attacks. To deal with this problem, we propose two novel filtering defenses called Variational Input Filtering (VIF) and Adversarial Input Filtering (AIF) which leverage lossy data compression and adversarial learning respectively to effectively purify all potential Trojan triggers in the input at run time without making assumptions about the number of triggers/target classes or the input dependence property of triggers. |
Kien Do; Haripriya Harikumar; Hung Le; Dung Nguyen; Truyen Tran; Santu Rana; Dang Nguyen; Willy Susilo; Svetha Venkatesh; |
186 | Scaling Adversarial Training to Large Perturbation Bounds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we aim to achieve adversarial robustness within larger bounds, against perturbations that may be perceptible, but do not change human (or Oracle) prediction. |
Sravanti Addepalli; Samyak Jain; Gaurang Sriramanan; R. Venkatesh Babu; |
187 | Exploiting The Local Parabolic Landscapes of Adversarial Losses to Accelerate Black-Box Adversarial Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose to improve the query efficiency of black-box methods by exploiting the smoothness of the local loss landscape. |
Hoang Tran; Dan Lu; Guannan Zhang; |
188 | Generative Domain Adaptation for Face Anti-Spoofing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, insufficient supervision of unlabeled target domains and neglect of low-level feature alignment degrade the performances of existing methods. To address these issues, we propose a novel perspective of UDA FAS that directly fits the target data to the models, i.e., stylizes the target data to the source-domain style via image translation, and further feeds the stylized data into the well-trained source model for classification. |
Qianyu Zhou; Ke-Yue Zhang; Taiping Yao; Ran Yi; Kekai Sheng; Shouhong Ding; Lizhuang Ma; |
189 | MetaGait: Learning to Learn An Omni Sample Adaptive Representation for Gait Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, gait recognition still suffers from the conflicts between the limited binary visual clues of the silhouette and numerous covariates with diverse scales, which brings challenges to the model’s adaptiveness. In this paper, we address this conflict by developing a novel MetaGait that learns to learn an omni sample adaptive representation. |
Huanzhang Dou; Pengyi Zhang; Wei Su; Yunlong Yu; Xi Li; |
190 | GaitEdge: Beyond Plain End-to-End Gait Recognition for Better Practicality Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel end-to-end framework named GaitEdge which can effectively block gait-irrelevant information and release end-to-end training potential. |
Junhao Liang; Chao Fan; Saihui Hou; Chuanfu Shen; Yongzhen Huang; Shiqi Yu; |
191 | UIA-ViT: Unsupervised Inconsistency-Aware Method Based on Vision Transformer for Face Forgery Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Some existing methods generate large-scale synthesized data with location annotations, which is time-consuming. Others generate forgery location labels by subtracting paired real and fake images, yet such paired data is difficult to collect and the generated labels are usually discontinuous. To overcome these limitations, we propose a novel Unsupervised Inconsistency-Aware method based on Vision Transformer, called UIA-ViT. |
Wanyi Zhuang; Qi Chu; Zhentao Tan; Qiankun Liu; Haojie Yuan; Changtao Miao; Zixiang Luo; Nenghai Yu; |
192 | Effective Presentation Attack Detection Driven By Face Related Task Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unlike this specific PAD task, other face related tasks trained on huge amounts of real faces (e.g. face recognition and attribute editing) can be effectively adopted into different application scenarios. Inspired by this, we propose to swap the positions of PAD and face related tasks in a face system and apply the freely acquired prior knowledge from face related tasks to solve face PAD, so as to improve the generalization ability in detecting PAs. |
Wentian Zhang; Haozhe Liu; Feng Liu; Raghavendra Ramachandra; Christoph Busch; |
193 | PPT: Token-Pruned Pose Transformer for Monocular and Multi-View Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the token-Pruned Pose Transformer (PPT) for 2D human pose estimation, which can locate a rough human mask and performs self-attention only within selected tokens. |
Haoyu Ma; Zhe Wang; Yifei Chen; Deying Kong; Liangjian Chen; Xingwei Liu; Xiangyi Yan; Hao Tang; Xiaohui Xie; |
194 | AvatarPoser: Articulated Full-Body Pose Tracking from Sparse Motion Sensing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present AvatarPoser, the first learning-based method that predicts full-body poses in world coordinates using only motion input from the user’s head and hands. |
Jiaxi Jiang; Paul Streli; Huajian Qiu; Andreas Fender; Larissa Laich; Patrick Snape; Christian Holz; |
195 | P-STMO: Pre-trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (P-STMO) model for 2D-to-3D human pose estimation task. |
Wenkang Shan; Zhenhua Liu; Xinfeng Zhang; Shanshe Wang; Siwei Ma; Wen Gao; |
196 | D&D: Learning Human Dynamics from Dynamic Camera Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present D&D (Learning Human Dynamics from Dynamic Camera), which leverages the laws of physics to reconstruct 3D human motion from in-the-wild videos with a moving camera. |
Jiefeng Li; Siyuan Bian; Chao Xu; Gang Liu; Gang Yu; Cewu Lu; |
197 | Explicit Occlusion Reasoning for Multi-Person 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by the remarkable ability of humans to infer occluded joints from visible cues, we develop a method to explicitly model this process that significantly improves bottom-up multi-person human pose estimation with or without occlusions. |
Qihao Liu; Yi Zhang; Song Bai; Alan Yuille; |
198 | COUCH: Towards Controllable Human-Chair Interactions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing works on synthesizing human scene interaction focus on the high-level control of interacting with a particular object without considering fine-grained control of limb motion variations within one task. In this work, we drive this direction and study the problem of synthesizing scene interactions conditioned on a wide range of contact positions on the object. |
Xiaohan Zhang; Bharat Lal Bhatnagar; Sebastian Starke; Vladimir Guzov; Gerard Pons-Moll; |
199 | Identity-Aware Hand Mesh Estimation and Personalization from RGB Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an identity-aware hand mesh estimation model, which can incorporate the identity information represented by the intrinsic shape parameters of the subject. |
Deying Kong; Linguang Zhang; Liangjian Chen; Haoyu Ma; Xiangyi Yan; Shanlin Sun; Xingwei Liu; Kun Han; Xiaohui Xie; |
200 | C3P: Cross-Domain Pose Prior Propagation for Weakly Supervised 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose to transfer 2D HPE annotation information within the existing large-scale RGB datasets (e.g., MS COCO) to 3D task, using unlabelled RGB-point cloud sequence easy to acquire for linking 2D and 3D domains. |
Cunlin Wu; Yang Xiao; Boshen Zhang; Mingyang Zhang; Zhiguo Cao; Joey Tianyi Zhou; |
201 | Pose-NDF: Modeling Human Pose Manifolds with Neural Distance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present Pose-NDF, a continuous model for plausible human poses based on neural distance fields (NDFs). |
Garvita Tiwari; Dimitrije Antić; Jan Eric Lenssen; Nikolaos Sarafianos; Tony Tung; Gerard Pons-Moll; |
202 | CLIFF: Carrying Location Information in Full Frames Into Human Pose and Shape Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, cropping, their first step, discards the location information from the very beginning, which makes themselves unable to accurately predict the global rotation in the original camera coordinate system. To address this problem, we propose to Carry Location Information in Full Frames (CLIFF) into this task. |
Zhihao Li; Jianzhuang Liu; Zhensong Zhang; Songcen Xu; Youliang Yan; |
203 | DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a simple baseline framework for video-based 2D/3D human pose estimation that can achieve 10 times efficiency improvement over existing works without any performance degradation, named DeciWatch. |
Ailing Zeng; Xuan Ju; Lei Yang; Ruiyuan Gao; Xizhou Zhu; Bo Dai; Qiang Xu; |
204 | SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, for rarely seen or occluded actions, the estimated positions of multiple joints largely deviate from the ground truth values for a consecutive sequence of frames, rendering significant jitters on them. To tackle this problem, we propose to attach a dedicated temporal-only refinement network to existing pose estimators for jitter mitigation, named SmoothNet. |
Ailing Zeng; Lei Yang; Xuan Ju; Jiefeng Li; Jianyi Wang; Qiang Xu; |
205 | PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a simple yet effective data augmentation method, termed Pose Transformation (PoseTrans), to alleviate the aforementioned problems. |
Wentao Jiang; Sheng Jin; Wentao Liu; Chen Qian; Ping Luo; Si Liu; |
206 | Multi-Person 3D Pose and Shape Estimation Via Inverse Kinematics and Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To tackle the challenges, we propose a coarse-to-fine pipeline that benefits from 1) inverse kinematics from the occlusion-robust 3D skeleton estimation and 2) transformer-based relation-aware refinement techniques. |
Junuk Cha; Muhammad Saqlain; GeonU Kim; Mingyu Shin; Seungryul Baek; |
207 | Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a new prediction pattern, which introduces previously overlooked human poses, to implement the prediction task from the view of interpolation. |
Xiaoning Sun; Qiongjie Cui; Huaijiang Sun; Bin Li; Weiqing Li; Jianfeng Lu; |
208 | Structural Triangulation: A Closed-Form Solution to Constrained 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Structural Triangulation, a closed-form solution for optimal 3D human pose considering multi-view 2D pose estimations, calibrated camera parameters, and bone lengths. |
Zhuo Chen; Xu Zhao; Xiaoyue Wan; |
209 | Audio-Driven Stylized Gesture Generation with Flow-Based Model Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new end-to-end flow-based model, which can generate audio-driven gestures of arbitrary styles without the preprocessing procedure and style labels. |
Sheng Ye; Yu-Hui Wen; Yanan Sun; Ying He; Ziyang Zhang; Yaoyuan Wang; Weihua He; Yong-Jin Liu; |
210 | Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we develop a self-constrained prediction-verification network to characterize and learn the structural correlation between keypoints during training. |
Zhehan Kan; Shuoshuo Chen; Zeng Li; Zhihai He; |
211 | UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present UnrealEgo, a new large-scale naturalistic dataset for egocentric 3D human pose estimation. We next generate a large corpus of human motions. |
Hiroyasu Akada; Jian Wang; Soshi Shimada; Masaki Takahashi; Christian Theobalt; Vladislav Golyanik; |
212 | Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the first issue, we propose adaptive graph scattering, which leverages multiple trainable band-pass graph filters to decompose pose features into various graph spectrum bands to provide richer information, promoting more comprehensive feature extraction. To address the second issue, body parts are modeled separately to learn diverse dynamics, which enables finer feature extraction along the spatial dimensions. Integrating the above two designs, we propose a novel skeleton-parted graph scattering network (SPGSN). |
Maosen Li; Siheng Chen; Zijing Zhang; Lingxi Xie; Qi Tian; Ya Zhang; |
213 | Rethinking Keypoint Representations: Modeling Keypoints and Poses As Objects for Multi-Person Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated to find a more efficient solution, we propose to model individual keypoints and sets of spatially related keypoints (i.e., poses) as objects within a dense single-stage anchor-based detection framework. |
William McNally; Kanav Vats; Alexander Wong; John McPhee; |
214 | VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we perform a systematic evaluation of the existing methods and find that they get notably larger errors when tested on different cameras, human poses and appearance. To address the problem, we introduce VirtualPose, a two-stage learning framework to exploit the hidden free lunch specific to this task, i.e. generating an infinite number of poses and cameras for training models at no cost. |
Jiajun Su; Chunyu Wang; Xiaoxuan Ma; Wenjun Zeng; Yizhou Wang; |
215 | Poseur: Direct Human Pose Regression with Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a direct, regression-based approach to 2D human pose estimation from single images. |
Weian Mao; Yongtao Ge; Chunhua Shen; Zhi Tian; Xinlong Wang; Zhibin Wang; Anton van den Hengel; |
216 | SimCC: A Simple Coordinate Classification Perspective for Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the long-standing quantization error problem in the 2D heatmap-based methods leads to several well-known drawbacks: 1) the performance for low-resolution inputs is limited; 2) to improve the feature map resolution for higher localization precision, multiple costly upsampling layers are required; and 3) extra post-processing is adopted to reduce the quantization error. To address these issues, we aim to explore a brand-new scheme, called SimCC, which reformulates HPE as two classification tasks for horizontal and vertical coordinates. |
Yanjie Li; Sen Yang; Peidong Liu; Shoukui Zhang; Yunxiao Wang; Zhicheng Wang; Wankou Yang; Shu-Tao Xia; |
217 | Regularizing Vector Embedding in Bottom-Up Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We observe that the different dimensions of embeddings are highly linearly correlated. To address this issue, we impose an additional constraint on the embeddings during the training phase. |
Haixin Wang; Lu Zhou; Yingying Chen; Ming Tang; Jinqiao Wang; |
218 | A Visual Navigation Perspective for Category-Level Object Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, convergence and efficiency are two challenges of this inference procedure. In this paper, we take a deeper look at the inference of analysis-by-synthesis from the perspective of visual navigation, and investigate what is a good navigation policy for this specific task. |
Jiaxin Guo; Fangxun Zhong; Rong Xiong; Yun-Hui Liu; Yue Wang; Yiyi Liao; |
219 | Faster VoxelPose: Real-Time 3D Human Pose Estimation By Orthographic Projection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While the voxel-based methods have achieved promising results for multi-person 3D pose estimation from multiple cameras, they suffer from heavy computation burdens, especially for large scenes. We present Faster VoxelPose to address the challenge by re-projecting the feature volume to the three two-dimensional coordinate planes and estimating X, Y, Z coordinates from them separately. |
Hang Ye; Wentao Zhu; Chunyu Wang; Rujie Wu; Yizhou Wang; |
220 | Learning to Fit Morphable Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we build upon recent advances in learned optimization and propose an update rule inspired by the classic Levenberg-Marquardt algorithm. |
Vasileios Choutas; Federica Bogo; Jingjing Shen; Julien Valentin; |
221 | EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing datasets are limited in terms of either size, capture/annotation modalities, ground-truth quality, or interaction diversity. We fill this gap by proposing EgoBody, a novel large-scale dataset for human pose, shape and motion estimation from egocentric views, during interactions in complex 3D scenes. |
Siwei Zhang; Qianli Ma; Yan Zhang; Zhiyin Qian; Taein Kwon; Marc Pollefeys; Federica Bogo; Siyu Tang; |
222 | Grasp’D: Differentiable Contact-Rich Grasp Synthesis for Multi-Fingered Hands Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents Grasp’D, an approach to grasp synthesis by differentiable contact simulation that can work with both known models and visual inputs. |
Dylan Turpin; Liquan Wang; Eric Heiden; Yun-Chun Chen; Miles Macklin; Stavros Tsogkas; Sven Dickinson; Animesh Garg; |
223 | AutoAvatar: Autoregressive Neural Fields for Dynamic Avatar Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Neural fields such as implicit surfaces have recently enabled avatar modeling from raw scans without explicit temporal correspondences. In this work, we exploit autoregressive modeling to further extend this notion to capture dynamic effects, such as soft-tissue deformations. |
Ziqian Bai; Timur Bagautdinov; Javier Romero; Michael Zollhöfer; Ping Tan; Shunsuke Saito; |
224 | Deep Radial Embedding for Visual Sequence Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this study, we propose an objective function named RadialCTC that constrains sequence features on a hypersphere while retaining the iterative alignment mechanism of CTC. |
Yuecong Min; Peiqi Jiao; Yanan Li; Xiaotao Wang; Lei Lei; Xiujuan Chai; Xilin Chen; |
225 | SAGA: Stochastic Whole-Body Grasping with Contact Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, we propose a multi-task generative model to jointly learn static whole-body grasping poses and human-object contacts. |
Yan Wu; Jiahao Wang; Yan Zhang; Siwei Zhang; Otmar Hilliges; Fisher Yu; Siyu Tang; |
226 | Neural Capture of Animatable 3D Human from Monocular Video Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel paradigm of building an animatable 3D human representation from a monocular video input, such that it can be rendered in any unseen poses and views. |
Gusi Te; Xiu Li; Xiao Li; Jinglu Wang; Wei Hu; Yan Lu; |
227 | General Object Pose Transformation Network from Unpaired Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we address the novel problem of general object pose transformation from unpaired data. |
Yukun Su; Guosheng Lin; Ruizhou Sun; Qingyao Wu; |
228 | Compositional Human-Scene Interaction Synthesis with Semantic Control Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our goal is to synthesize humans interacting with a given 3D scene controlled by high-level semantic specifications as pairs of action categories and object instances, e.g., “sit on the chair”. |
Kaifeng Zhao; Shaofei Wang; Yan Zhang; Thabo Beeler; Siyu Tang; |
229 | PressureVision: Estimating Hand Pressure from A Single RGB Image Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We explore the possibility of using a conventional RGB camera to infer hand pressure, enabling machine perception of hand pressure from uninstrumented hands and surfaces. |
Patrick Grady; Chengcheng Tang; Samarth Brahmbhatt; Christopher D. Twigg; Chengde Wan; James Hays; Charles C. Kemp; |
230 | PoseScript: 3D Human Poses from Natural Language Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce the PoseScript dataset, which pairs a few thousand 3D human poses from AMASS with rich human-annotated descriptions of the body parts and their spatial relationships. |
Ginger Delmas; Philippe Weinzaepfel; Thomas Lucas; Francesc Moreno-Noguer; Grégory Rogez; |
231 | DProST: Dynamic Projective Spatial Transformer Network for 6D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, projective geometry in the camera space is not considered in those methods and causes performance degradation. In this regard, we propose a new pose estimation system based on a projective grid instead of object vertices. |
Jaewoo Park; Nam Ik Cho; |
232 | 3D Interacting Hand Pose Estimation By Hand De-Occlusion and Removal Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To tackle these two challenges, we propose a novel Hand De-occlusion and Removal (HDR) framework to perform hand de-occlusion and distractor removal. We also propose the first large-scale synthetic amodal hand dataset, termed Amodal InterHand Dataset (AIH), to facilitate model training and promote the development of the related research. |
Hao Meng; Sheng Jin; Wentao Liu; Chen Qian; Mengxiang Lin; Wanli Ouyang; Ping Luo; |
233 | Pose for Everything: Towards Category-Agnostic Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition. |
Lumin Xu; Sheng Jin; Wang Zeng; Wentao Liu; Chen Qian; Wanli Ouyang; Ping Luo; Xiaogang Wang; |
234 | PoseGPT: Quantization-Based 3D Human Motion Generation and Forecasting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, we generate motion conditioned on observations of arbitrary length, including none. To solve this generalized problem, we propose PoseGPT, an auto-regressive transformer-based approach which internally compresses human motion into quantized latent sequences. |
Thomas Lucas; Fabien Baradel; Philippe Weinzaepfel; Grégory Rogez; |
235 | DH-AUG: DH Forward Kinematics Model Driven Augmentation for 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Due to the lack of diversity of datasets, the generalization ability of the pose estimator is poor. To solve this problem, we propose a pose augmentation solution via DH forward kinematics model, which we call DH-AUG. |
Linzhi Huang; Jiahao Liang; Weihong Deng; |
236 | Estimating Spatially-Varying Lighting in Urban Scenes with Disentangled Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present an end-to-end network for spatially-varying outdoor lighting estimation in urban scenes given a single limited field-of-view LDR image and any assigned 2D pixel position. |
Jiajun Tang; Yongjie Zhu; Haoyu Wang; Jun Hoong Chan; Si Li; Boxin Shi; |
237 | Boosting Event Stream Super-Resolution with A Recurrent Neural Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing methods for event stream super-resolution (SR) either require high-quality and high-resolution frames or underperform for large factor SR. To address these problems, we propose a recurrent neural network for event SR without frames. |
Wenming Weng; Yueyi Zhang; Zhiwei Xiong; |
238 | Projective Parallel Single-Pixel Imaging to Overcome Global Illumination in 3D Structure Light Scanning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present projective parallel single-pixel imaging (pPSI), wherein the 4D LTCs are reduced to multiple projection functions to facilitate a highly efficient data capture process. |
Yuxi Li; Huijie Zhao; Hongzhi Jiang; Xudong Li; |
239 | Semantic-Sparse Colorization Network for Deep Exemplar-Based Colorization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Previous approaches have attempted to construct such a correspondence but are faced with two obstacles. First, using luminance channels for the calculation of correspondence is inaccurate. Second, the dense correspondence they built introduces wrong matching results and increases the computation burden. To address these two problems, we propose Semantic-Sparse Colorization Network (SSCN) to transfer both the global image style and detailed semantic-related colors to the gray-scale image in a coarse-to-fine manner. |
Yunpeng Bai; Chao Dong; Zenghao Chai; Andong Wang; Zhengzhuo Xu; Chun Yuan; |
240 | Practical and Scalable Desktop-Based High-Quality Facial Capture Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel desktop-based system for high-quality facial capture including geometry and facial appearance. We additionally present a novel set of binary illumination patterns for efficient acquisition of reflectance and photometric normals using our setup, with diffuse-specular separation. |
Alexandros Lattas; Yiming Lin; Jayanth Kannan; Ekin Ozturk; Luca Filipi; Giuseppe Claudio Guarnera; Gaurav Chawla; Abhijeet Ghosh; |
241 | FAST-VQA: Efficient End-to-End Video Quality Assessment with Fragment Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Grid Mini-patch Sampling (GMS), which allows consideration of local quality by sampling patches at their raw resolution and covers global quality with contextual relations via mini-patches sampled in uniform grids. |
Haoning Wu; Chaofeng Chen; Jingwen Hou; Liang Liao; Annan Wang; Wenxiu Sun; Qiong Yan; Weisi Lin; |
242 | Physically-Based Editing of Indoor Scene Lighting from A Single Image Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a method to edit complex indoor lighting from a single image with its predicted depth and light source segmentation masks. |
Zhengqin Li; Jia Shi; Sai Bi; Rui Zhu; Kalyan Sunkavalli; Miloš Hašan; Zexiang Xu; Ravi Ramamoorthi; Manmohan Chandraker; |
243 | LEDNet: Joint Low-Light Enhancement and Deblurring in The Dark Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Training an end-to-end network is also infeasible as no paired data is available to characterize the coexistence of low light and blurs. We address the problem by introducing a novel data synthesis pipeline that models realistic low-light blurring degradations, especially for blurs in saturated regions, e.g., light streaks, that often appear in the night images. |
Shangchen Zhou; Chongyi Li; Chen Change Loy; |
244 | MPIB: An MPI-Based Bokeh Rendering Framework for Realistic Partial Occlusion Effects Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on this analysis, we propose an MPI representation module combined with a background inpainting module to implement high-resolution scene representation. |
Juewen Peng; Jianming Zhang; Xianrui Luo; Hao Lu; Ke Xian; Zhiguo Cao; |
245 | Real-RawVSR: Real-World Raw Video Super-Resolution with A Benchmark Dataset Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Considering the superiority of raw image SR over sRGB image SR, we construct a real-world raw video SR (Real-RawVSR) dataset and propose a corresponding SR method. |
Huanjing Yue; Zhiming Zhang; Jingyu Yang; |
246 | Transform Your Smartphone Into A DSLR Camera: Learning The ISP in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a trainable Image Signal Processing (ISP) framework that produces DSLR quality images given RAW images captured by a smartphone. |
Ardhendu Shekhar Tripathi; Martin Danelljan; Samarth Shukla; Radu Timofte; Luc Van Gool; |
247 | Learning Deep Non-Blind Image Deconvolution Without Ground Truths Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes an unsupervised deep learning approach for NBID which avoids accessing GT images. |
Yuhui Quan; Zhuojie Chen; Huan Zheng; Hui Ji; |
248 | NEST: Neural Event Stack for Event-Based Image Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a novel event representation named Neural Event STack (NEST), which satisfies physical constraints and encodes comprehensive motion and temporal information sufficient for image enhancement. |
Minggui Teng; Chu Zhou; Hanyue Lou; Boxin Shi; |
249 | Editable Indoor Lighting Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a method for estimating lighting from a single perspective image of an indoor scene. |
Henrique Weber; Mathieu Garon; Jean-François Lalonde; |
250 | Fast Two-Step Blind Optical Aberration Correction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a two-step scheme to correct optical aberrations in a single raw or JPEG image, i.e., without any prior information on the camera or lens. |
Thomas Eboli; Jean-Michel Morel; Gabriele Facciolo; |
251 | Seeing Far in The Dark with Patterned Flash Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new flash technique, named “patterned flash”, for flash imaging at a long distance. |
Zhanghao Sun; Jian Wang; Yicheng Wu; Shree Nayar; |
252 | PseudoClick: Interactive Image Segmentation with Click Imitation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose PseudoClick, a generic framework that enables existing segmentation networks to propose candidate next clicks. |
Qin Liu; Meng Zheng; Benjamin Planche; Srikrishna Karanam; Terrence Chen; Marc Niethammer; Ziyan Wu; |
253 | CT$^2$: Colorization Transformer Via Color Tokens Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Automatic image colorization is an ill-posed problem with multi-modal uncertainty, and there remain two main challenges with previous methods: incorrect semantic colors and under-saturation. In this paper, we propose an end-to-end transformer-based model to overcome these challenges. |
Shuchen Weng; Jimeng Sun; Yu Li; Si Li; Boxin Shi; |
254 | Simple Baselines for Image Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simple baseline that exceeds the SOTA methods and is computationally efficient. |
Liangyu Chen; Xiaojie Chu; Xiangyu Zhang; Jian Sun; |
255 | Spike Transformer: Monocular Depth Estimation for Spiking Camera Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on the depth estimation task, which is challenging due to the natural properties of spike streams, such as irregularity, continuity, and spatial-temporal correlation, and has not been explored for the spiking camera. Furthermore, we build two spike-based depth datasets. |
Jiyuan Zhang; Lulu Tang; Zhaofei Yu; Jiwen Lu; Tiejun Huang; |
256 | Improving Image Restoration By Revisiting Global Information Aggregation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To reduce the inconsistency and improve test-time performance, we propose a simple method called Test-time Local Converter (TLC). |
Xiaojie Chu; Liangyu Chen; Chengpeng Chen; Xin Lu; |
257 | Data Association Between Event Streams and Intensity Frames Under Diverse Baselines Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a learning-based framework to associate event streams and intensity frames under diverse camera baselines, simultaneously benefiting camera pose estimation under large baselines and depth estimation under small baselines. |
Dehao Zhang; Qiankun Ding; Peiqi Duan; Chu Zhou; Boxin Shi; |
258 | D2HNet: Joint Denoising and Deblurring with Hierarchical Network for Robust Night Image Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To exploit the information from successive long- and short-exposure images, we propose a learning-based pipeline to fuse them. |
Yuzhi Zhao; Yongzhe Xu; Qiong Yan; Dingdong Yang; Xuehui Wang; Lai-Man Po; |
259 | Learning Graph Neural Networks for Image Style Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study a novel semi-parametric neural style transfer framework that alleviates the deficiency of both parametric and non-parametric stylization. |
Yongcheng Jing; Yining Mao; Yiding Yang; Yibing Zhan; Mingli Song; Xinchao Wang; Dacheng Tao; |
260 | DeepPS2: Revisiting Photometric Stereo Using Two Differently Illuminated Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we attempt to address an under-explored problem of photometric stereo using just two differently illuminated images, referred to as the PS2 problem. |
Ashish Tiwari; Shanmuganathan Raman; |
261 | Instance Contour Adjustment Via Structure-Driven CNN Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Because off-the-shelf image editing methods ignore these requirements, they are ill-suited to this task. Therefore, we propose a specialized two-stage method. |
Shuchen Weng; Yi Wei; Ming-Ching Chang; Boxin Shi; |
262 | Synthesizing Light Field Video from Monocular Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Hence, we propose a self-supervised learning-based algorithm for LF video reconstruction from monocular videos. |
Shrisudhan Govindarajan; Prasan Shedligeri; Sarah; Kaushik Mitra; |
263 | Human-Centric Image Cropping with Partition-Aware and Content-Preserving Features Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we consider a specific and practical application: human-centric image cropping, which focuses on the depiction of a person. |
Bo Zhang; Li Niu; Xing Zhao; Liqing Zhang; |
264 | DeMFI: Deep Joint Deblurring and Multi-Frame Interpolation with Flow-Guided Attentive Correlation and Recursive Boosting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel joint deblurring and multi-frame interpolation (DeMFI) framework in a two-stage manner, called DeMFI-Net, which converts blurry videos of lower-frame-rate to sharp videos at higher-frame-rate based on a flow-guided attentive-correlation-based feature bolstering (FAC-FB) module and recursive boosting (RB), in terms of multi-frame interpolation (MFI). |
Jihyong Oh; Munchurl Kim; |
265 | Neural Image Representations for Multi-Image Fusion and Layer Separation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a framework for aligning and fusing multiple images into a single view using neural image representations (NIRs), also known as implicit or coordinate-based neural representations. |
Seonghyeon Nam; Marcus A. Brubaker; Michael S. Brown; |
266 | Bringing Rolling Shutter Images Alive with Dual Reversed Distortion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, since RS distortion is coupled with other factors such as readout settings and the relative velocity of scene elements to the camera, models that only exploit the geometric correlation between temporally adjacent images suffer from poor generality in processing data with different readout settings and dynamic scenes with both camera motion and object motion. In this paper, instead of two consecutive frames, we propose to exploit a pair of images captured by dual RS cameras with reversed RS directions for this highly challenging task. |
Zhihang Zhong; Mingdeng Cao; Xiao Sun; Zhirong Wu; Zhongyi Zhou; Yinqiang Zheng; Stephen Lin; Imari Sato; |
267 | FILM: Frame Interpolation for Large Motion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a frame interpolation algorithm that synthesizes an engaging slow-motion video from near-duplicate photos which often exhibit large scene motion. |
Fitsum Reda; Janne Kontkanen; Eric Tabellion; Deqing Sun; Caroline Pantofaru; Brian Curless; |
268 | Video Interpolation By Event-Driven Anisotropic Adjustment of Optical Flow Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an end-to-end training method A^2OF for video frame interpolation with event-driven Anisotropic Adjustment of Optical Flows. |
Song Wu; Kaichao You; Weihua He; Chen Yang; Yang Tian; Yaoyuan Wang; Ziyang Zhang; Jianxing Liao; |
269 | EvAC3D: From Event-Based Apparent Contours to 3D Models Via Continuous Visual Hulls Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the problem of 3D reconstruction from event cameras, motivated by the advantages of event-based cameras in terms of low power and latency, as well as by the biological evidence that eyes in nature capture the same data and still perceive 3D shape well. |
Ziyun Wang; Kenneth Chaney; Kostas Daniilidis; |
270 | DCCF: Deep Comprehensible Color Filter Learning Framework for High-Resolution Image Harmonization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Deep Comprehensible Color Filter (DCCF) learning framework for high-resolution image harmonization. |
Ben Xue; Shenghui Ran; Quan Chen; Rongfei Jia; Binqiang Zhao; Xing Tang; |
271 | SelectionConv: Convolutional Neural Networks for Non-Rectilinear Image Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Such data are usually processed using networks and algorithms specialized for each type. In this work, we show that it may not always be necessary to use specialized neural networks to operate on such spaces. |
David Hart; Michael Whitney; Bryan Morse; |
272 | Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the model size and computational cost limit the ability of their models on edge devices and higher-resolution images. In this paper, we propose a spatial-separated curve rendering network (S2CRNet), a novel framework that proves, for the first time, that simple global editing can effectively address this task as well as the challenge of high-resolution image harmonization. |
Jingtang Liang; Xiaodong Cun; Chi-Man Pun; Jue Wang; |
273 | BigColor: Colorization Using A Generative Color Prior for Natural Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose BigColor, a novel colorization approach that provides vivid colorization for diverse in-the-wild images with complex structures. |
Geonung Kim; Kyoungkook Kang; Seongtae Kim; Hwayoon Lee; Sehoon Kim; Jonghyun Kim; Seung-Hwan Baek; Sunghyun Cho; |
274 | CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, to achieve high average bit-reduction with less accuracy loss, we propose a novel Content-Aware Dynamic Quantization (CADyQ) method for SR networks that allocates optimal bits to local regions and layers adaptively based on the local contents of an input image. |
Cheeun Hong; Sungyong Baik; Heewon Kim; Seungjun Nah; Kyoung Mu Lee; |
275 | Deep Semantic Statistics Matching (D2SM) Denoising Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce the Deep Semantic Statistics Matching (D2SM) Denoising Network. |
Kangfu Mei; Vishal M. Patel; Rui Huang; |
276 | 3D Scene Inference from Transient Histograms Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose low-cost and low-power imaging modalities that capture scene information from minimal time-resolved image sensors with as few as one pixel. |
Sacha Jungerman; Atul Ingle; Yin Li; Mohit Gupta; |
277 | Neural Space-Filling Curves Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present Neural Space-filling Curves (SFCs), a data-driven approach to infer a context-based scan order for a set of images. |
Hanyu Wang; Kamal Gupta; Larry Davis; Abhinav Shrivastava; |
278 | Exposure-Aware Dynamic Weighted Learning for Single-Shot HDR Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel single-shot high dynamic range (HDR) imaging algorithm based on exposure-aware dynamic weighted learning, which reconstructs an HDR image from a spatially varying exposure (SVE) raw image. |
An Gia Vien; Chul Lee; |
279 | Seeing Through A Black Box: Toward High-Quality Terahertz Imaging Via Subspace-and-Attention Guided Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address the problem, we propose a novel Subspace-and-Attention-guided Restoration Network (SARNet) that fuses multi-spectral features of a THz image for effective restoration. |
Weng-Tai Su; Yi-Chun Hung; Po-Jen Yu; Shang-Hua Yang; Chia-Wen Lin; |
280 | Tomography of Turbulence Strength Based on Scintillation Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As far as we know, this work is the first to propose reconstruction of a TS horizontal field, using passive optical scintillation measurements. |
Nir Shaul; Yoav Y. Schechner; |
281 | Realistic Blur Synthesis for Learning Image Deblurring Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we present RSBlur, a novel dataset with real blurred images and the corresponding sharp image sequences to enable a detailed analysis of the difference between real and synthetic blur. |
Jaesung Rim; Geonung Kim; Jungeon Kim; Junyong Lee; Seungyong Lee; Sunghyun Cho; |
282 | Learning Phase Mask for Privacy-Preserving Passive Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The key question we address is: Can cameras be enhanced with a scalable solution to preserve users’ privacy without degrading their machine intelligence capabilities? |
Zaid Tasneem; Giovanni Milione; Yi-Hsuan Tsai; Xiang Yu; Ashok Veeraraghavan; Manmohan Chandraker; Francesco Pittaluga; |
283 | LWGNet – Learned Wirtinger Gradients for Fourier Ptychographic Phase Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a hybrid model-driven residual network that combines the knowledge of the forward imaging system with a deep data-driven network. |
Atreyee Saha; Salman S. Khan; Sagar Sehrawat; Sanjana S. Prabhu; Shanti Bhattacharya; Kaushik Mitra; |
284 | PANDORA: Polarization-Aided Neural Decomposition of Radiance Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose PANDORA, a polarimetric inverse rendering approach based on implicit neural representations. |
Akshat Dave; Yongyi Zhao; Ashok Veeraraghavan; |
285 | HuMMan: Multi-modal 4D Human Dataset for Versatile Sensing and Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we contribute HuMMan, a large-scale multi-modal 4D human dataset with 1000 human subjects, 400k sequences and 60M frames. |
Zhongang Cai; Daxuan Ren; Ailing Zeng; Zhengyu Lin; Tao Yu; Wenjia Wang; Xiangyu Fan; Yang Gao; Yifan Yu; Liang Pan; Fangzhou Hong; Mingyuan Zhang; Chen Change Loy; Lei Yang; Ziwei Liu; |
286 | DVS-Voltmeter: Stochastic Process-Based Event Simulator for Dynamic Vision Sensors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an event simulator, dubbed DVS-Voltmeter, to enable high-performance deep networks for DVS applications. |
Songnan Lin; Ye Ma; Zhenhua Guo; Bihan Wen; |
287 | Benchmarking Omni-Vision Representation Through The Lens of Visual Realms Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Omni-Realm Benchmark (OmniBenchmark) that enables systematically measuring the generalization ability across a wide range of visual realms. |
Yuanhan Zhang; Zhenfei Yin; Jing Shao; Ziwei Liu; |
288 | BEAT: A Large-Scale Semantic and Emotional Multi-modal Dataset for Conversational Gestures Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on this observation, we propose a baseline model, Cascaded Motion Network (CaMN), which consists of above six modalities modeled in a cascaded architecture for gesture synthesis. |
Haiyang Liu; Zihao Zhu; Naoya Iwamoto; Yichen Peng; Zhengqing Li; You Zhou; Elif Bozkurt; Bo Zheng; |
289 | Neuromorphic Data Augmentation for Training Spiking Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This issue remains unexplored by previous academic works. In an effort to minimize this generalization gap, we propose Neuromorphic Data Augmentation (NDA), a family of geometric augmentations specifically designed for event-based datasets with the goal of significantly stabilizing the SNN training and reducing the generalization gap between training and test performance. |
Yuhang Li; Youngeun Kim; Hyoungseob Park; Tamar Geller; Priyadarshini Panda; |
290 | CelebV-HQ: A Large-Scale Video Facial Attributes Dataset Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a large-scale, high-quality, and diverse video dataset, named the High-Quality Celebrity Video Dataset (CelebV-HQ), with rich facial attribute annotations. |
Hao Zhu; Wayne Wu; Wentao Zhu; Liming Jiang; Siwei Tang; Li Zhang; Ziwei Liu; Chen Change Loy; |
291 | MovieCuts: A New Dataset and Benchmark for Cut Type Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces the Cut type recognition task, which requires modeling multi-modal information. |
Alejandro Pardo; Fabian Caba; Juan León Alcázar; Ali Thabet; Bernard Ghanem; |
292 | LaMAR: Benchmarking Localization and Mapping for Augmented Reality Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Furthermore, ground-truth (GT) accuracy is mostly insufficient to satisfy AR requirements. To close this gap, we introduce a new benchmark with a comprehensive capture and GT pipeline, which allows us to co-register realistic AR trajectories in diverse scenes and from heterogeneous devices at scale. |
Paul-Edouard Sarlin; Mihai Dusmanu; Johannes L. Schönberger; Pablo Speciale; Lukas Gruber; Viktor Larsson; Ondrej Miksik; Marc Pollefeys; |
293 | Unitail: Detecting, Reading, and Matching in Retail Scene Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To make full use of computer vision technology in stores, it is required to consider the actual needs that fit the characteristics of the retail scene. Pursuing this goal, we introduce the United Retail Datasets (Unitail), a large-scale benchmark of basic visual tasks on products that challenges algorithms for detecting, reading, and matching. |
Fangyi Chen; Han Zhang; Zaiwang Li; Jiachen Dou; Shentong Mo; Hao Chen; Yongxin Zhang; Uzair Ahmed; Chenchen Zhu; Marios Savvides; |
294 | Not Just Streaks: Towards Ground Truth for Single Image Deraining Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a large-scale dataset of real-world rainy and clean image pairs and a method to remove degradations, induced by rain streaks and rain accumulation, from the image. |
Yunhao Ba; Howard Zhang; Ethan Yang; Akira Suzuki; Arnold Pfahnl; Chethan Chinder Chandrappa; Celso M. de Melo; Suya You; Stefano Soatto; Alex Wong; Achuta Kadambi; |
295 | ECCV Caption: Correcting False Negatives By Collecting Machine-and-Human-Verified Image-Caption Associations for MS-COCO Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To correct the massive false negatives, we construct the Extended COCO Validation (ECCV) Caption dataset by supplying the missing associations with machine and human annotators. |
Sanghyuk Chun; Wonjae Kim; Song Park; Minsuk Chang; Seong Joon Oh; |
296 | MOTCOM: The Multi-Object Tracking Dataset Complexity Metric Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As a remedy, we present the novel MOT dataset complexity metric (MOTCOM), which is a combination of three sub-metrics inspired by key problems in MOT: occlusion, erratic motion, and visual similarity. |
Malte Pedersen; Joakim Bruslund Haurum; Patrick Dendorfer; Thomas B. Moeslund; |
297 | How to Synthesize A Large-Scale and Trainable Micro-Expression Dataset? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper does not contain technical novelty but introduces our key discoveries in a data generation protocol, a database and insights. |
Yuchi Liu; Zhongdao Wang; Tom Gedeon; Liang Zheng; |
298 | A Real World Dataset for Multi-View 3D Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a dataset of 371 3D models of everyday tabletop objects along with their 320,000 real world RGB and depth images. |
Rakesh Shrestha; Siqi Hu; Minghao Gou; Ziyuan Liu; Ping Tan; |
299 | REALY: Rethinking The Evaluation of 3D Face Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel evaluation approach with a new benchmark, REALY, which consists of 100 globally aligned face scans with accurate facial keypoints, high-quality region masks, and topology-consistent meshes. |
Zenghao Chai; Haoxian Zhang; Jing Ren; Di Kang; Zhengzhuo Xu; Xuefei Zhe; Chun Yuan; Linchao Bao; |
300 | Capturing, Reconstructing, and Simulating: The UrbanScene3D Dataset Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present UrbanScene3D, a large-scale data platform for research of urban scene perception and reconstruction. |
Liqiang Lin; Yilin Liu; Yue Hu; Xingguang Yan; Ke Xie; Hui Huang; |
301 | 3D CoMPaT: Composition of Materials on Parts of 3D Things Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present 3D CoMPaT, a richly annotated large-scale dataset of more than 7.19 million rendered compositions of Materials on Parts of 7262 unique 3D Models (990 compositions per model on average). |
Yuchen Li; Ujjwal Upadhyay; Habib Slim; Tezuesh Varshney; Ahmed Abdelreheem; Arpit Prajapati; Suhail Pothigara; Peter Wonka; Mohamed Elhoseiny; |
302 | PartImageNet: A Large, High-Quality Dataset of Parts Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This is partly due to the difficulty and high cost of annotating object parts so it has rarely been done except for humans (where there exists a big literature on part-based models). To help address this problem, we propose PartImageNet, a large, high-quality dataset with part segmentation annotations. |
Ju He; Shuo Yang; Shaokang Yang; Adam Kortylewski; Xiaoding Yuan; Jie-Neng Chen; Shuai Liu; Cheng Yang; Qihang Yu; Alan Yuille; |
303 | A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce A-OKVQA, a crowdsourced dataset composed of a diverse set of about 25K questions requiring a broad base of commonsense and world knowledge to answer. |
Dustin Schwenk; Apoorv Khandelwal; Christopher Clark; Kenneth Marino; Roozbeh Mottaghi; |
304 | OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce ROBIN, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context, and weather conditions, and enables benchmarking models for image classification, object detection, and 3D pose estimation. |
Bingchen Zhao; Shaozuo Yu; Wufei Ma; Mingxin Yu; Shenxiao Mei; Angtian Wang; Ju He; Alan Yuille; Adam Kortylewski; |
305 | Facial Depth and Normal Estimation Using Single Dual-Pixel Camera Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce a DP-oriented Depth/Normal estimation network that reconstructs the 3D facial geometry. In addition, to train the network, we collect DP facial data with more than 135K images for 101 persons captured with our multi-camera structured light systems. |
Minjun Kang; Jaesung Choe; Hyowon Ha; Hae-Gon Jeon; Sunghoon Im; In So Kweon; Kuk-Jin Yoon; |
306 | The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work introduces the Anatomy of Video Editing, a dataset, and benchmark, to foster research in AI-assisted video editing. |
Dawit Mureja Argaw; Fabian Caba; Joon-Young Lee; Markus Woodson; In So Kweon; |
307 | StyleBabel: Artistic Style Tagging and Captioning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present StyleBabel, a unique open access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks, collected via a novel participatory method from experts studying at specialist art and design schools. |
Dan Ruta; Andrew Gilbert; Pranav Aggarwal; Naveen Marri; Ajinkya Kale; Jo Briggs; Chris Speed; Hailin Jin; Baldo Faieta; Alex Filipkowski; Zhe Lin; John Collomosse; |
308 | PANDORA: A Panoramic Detection Dataset for Object with Orientation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a new bounding box representation, Rotated Bounding Field of View (RBFoV), for the panoramic image object detection task. Then, based on the RBFoV, we present a PANoramic Detection dataset for Object with oRientAtion (PANDORA). |
Hang Xu; Qiang Zhao; Yike Ma; Xiaodong Li; Peng Yuan; Bailan Feng; Chenggang Yan; Feng Dai; |
309 | FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Namely, we propose a hierarchical sketch decoder, which we leverage at a sketch-specific “pretext” task. We will release the dataset upon acceptance. |
Pinaki Nath Chowdhury; Aneeshan Sain; Ayan Kumar Bhunia; Tao Xiang; Yulia Gryaditskaya; Yi-Zhe Song; |
310 | Exploring Fine-Grained Audiovisual Categorization with The SSW60 Dataset Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a new benchmark dataset, Sapsucker Woods 60 (SSW60), for advancing research on audiovisual fine-grained categorization. |
Grant Van Horn; Rui Qian; Kimberly Wilber; Hartwig Adam; Oisin Mac Aodha; Serge Belongie; |
311 | The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for detecting, tracking, and counting fish in sonar videos. |
Justin Kay; Peter Kulits; Suzanne Stathatos; Siqi Deng; Erik Young; Sara Beery; Grant Van Horn; Pietro Perona; |
312 | A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To study VLN with unknown command feasibility, we introduce a new dataset Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app. |
Andrea Burns; Deniz Arsan; Sanjna Agrawal; Ranjitha Kumar; Kate Saenko; Bryan A. Plummer; |
313 | BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: These characteristics are found in all existing datasets for dance motion synthesis, and indeed recent methods can achieve good results. We introduce a new dataset aiming to challenge these common assumptions, compiling a set of dynamic dance sequences displaying complex human poses. |
Davide Moltisanti; Jinyi Wu; Bo Dai; Chen Change Loy; |
314 | Dress Code: High-Resolution Multi-Category Virtual Try-On Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This shortcoming arises from a main factor: current publicly available datasets for image-based virtual try-on do not account for this variety, thus limiting progress in the field. To address this deficiency, we introduce Dress Code, which contains images of multi-category clothes. |
Davide Morelli; Matteo Fincato; Marcella Cornia; Federico Landi; Fabio Cesari; Rita Cucchiara; |
315 | A Data-Centric Approach for Improving Ambiguous Labels with Combined Semi-Supervised Classification and Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Subjective annotations by annotators often lead to ambiguous labels in real-world datasets. We propose a data-centric approach to relabel such ambiguous labels instead of implementing the handling of this issue in a neural network. |
Lars Schmarje; Monty Santarossa; Simon-Martin Schröder; Claudius Zelenka; Rainer Kiko; Jenny Stracke; Nina Volkmann; Reinhard Koch; |
316 | ClearPose: Large-Scale Transparent Object Dataset and Benchmark Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we contribute a large-scale real-world RGB-Depth transparent object dataset named ClearPose to serve as a benchmark dataset for segmentation, scene-level depth completion, and object-centric pose estimation tasks. |
Xiaotong Chen; Huijie Zhang; Zeren Yu; Anthony Opipari; Odest Chadwicke Jenkins; |
317 | When Deep Classifiers Agree: Analyzing Correlations Between Learning Order and Image Statistics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: It has been hypothesized that neural networks converge not only to similar representations, but also exhibit a notion of empirical agreement on which data instances are learned first. Following in the latter works’ footsteps, we define a metric to quantify the relationship between such classification agreement over time, and posit that the agreement phenomenon can be mapped to core statistics of the investigated dataset. |
Iuliia Pliushch; Martin Mundt; Nicolas Lupp; Visvanathan Ramesh; |
318 | AnimeCeleb: Large-Scale Animation CelebHeads Dataset for Head Reenactment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel Animation CelebHeads dataset (AnimeCeleb) to address an animation head reenactment. |
Kangyeol Kim; Sunghyun Park; Jaeseong Lee; Sunghyo Chung; Junsoo Lee; Jaegul Choo; |
319 | MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we present a large-scale video-audio-text dataset MUGEN, collected using the open-sourced platform game CoinRun. |
Thomas Hayes; Songyang Zhang; Xi Yin; Guan Pang; Sasha Sheng; Harry Yang; Songwei Ge; Qiyuan Hu; Devi Parikh; |
320 | A Dense Material Segmentation Dataset for Indoor and Outdoor Scene Parsing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A key algorithm for understanding the world is material segmentation, which assigns a label (metal, glass, etc.) to each pixel. We find that a model trained on existing data underperforms in some settings and propose to address this with a large-scale dataset of 3.2 million dense segments on 44,560 indoor and outdoor images, which is 23x more segments than existing data. |
Paul Upchurch; Ransen Niu; |
321 | MimicME: A Large Scale Diverse 4D Database for Facial Expression Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This lack of large datasets hinders the exploitation of the great advances that DNNs can provide. In this paper, we overcome these limitations by introducing MimicMe, a novel large-scale database of dynamic high-resolution 3D faces. |
Athanasios Papaioannou; Baris Gecer; Shiyang Cheng; Grigorios G. Chrysos; Jiankang Deng; Eftychia Fotiadou; Christos Kampouris; Dimitrios Kollias; Stylianos Moschoglou; Kritaphat Songsri-In; Stylianos Ploumpis; George Trigeorgis; Panagiotis Tzirakis; Evangelos Ververas; Yuxiang Zhou; Allan Ponniah; Anastasios Roussos; Stefanos Zafeiriou; |
322 | Delving Into Universal Lesion Segmentation: Method, Dataset, and Benchmark Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Considering that it is easy to encode CT slices owing to the limited CT scenarios, we propose a Knowledge Embedding Module (KEM) to adapt the concept of dictionary learning for this task. |
Yu Qiu; Jing Xu; |
323 | Large Scale Real-World Multi-person Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a new large scale multi-person tracking dataset. |
Bing Shuai; Alessandro Bergamo; Uta Büchler; Andrew Berneshawi; Alyssa Boden; Joseph Tighe; |
324 | D2-TPred: Discontinuous Dependency for Trajectory Prediction Under Traffic Lights Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a trajectory prediction approach with respect to traffic lights, D2-TPred, which uses a spatial dynamic interaction graph (SDG) and a behavior dependency graph (BDG) to handle the problem of discontinuous dependency in the spatial-temporal space. |
Yuzhen Zhang; Wentong Wang; Weizhi Guo; Pei Lv; Mingliang Xu; Wei Chen; Dinesh Manocha; |
325 | The Missing Link: Finding Label Relations Across Datasets Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we explore the automatic discovery of visual-semantic relations between labels across datasets. |
Jasper Uijlings; Thomas Mensink; Vittorio Ferrari; |
326 | Learning Omnidirectional Flow in 360° Video Via Siamese Representation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To accommodate the omnidirectional nature, we present a novel Siamese representation Learning framework for Omnidirectional Flow (SLOF). |
Keshav Bhandari; Bin Duan; Gaowen Liu; Hugo Latapie; Ziliang Zong; Yan Yan; |
327 | VizWiz-FewShot: Locating Objects in Images Taken By People with Visual Impairments Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a few-shot localization dataset originating from photographers who authentically were trying to learn about the visual content in the images they took. |
Yu-Yun Tseng; Alexander Bell; Danna Gurari; |
328 | TRoVE: Transforming Road Scene Datasets Into Photorealistic Virtual Environments Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work proposes a synthetic data generation pipeline that utilizes existing datasets, like nuScenes, to address the difficulties and domain-gaps present in simulated datasets. |
Shubham Dokania; Anbumani Subramanian; Manmohan Chandraker; C.V. Jawahar; |
329 | Trapped in Texture Bias? A Large Scale Comparison of Deep Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this study, we aim to understand if certain design decisions such as framework, architecture or pre-training contribute to the semantic understanding of instance segmentation. |
Johannes Theodoridis; Jessica Hofmann; Johannes Maucher; Andreas Schilling; |
330 | Deformable Feature Aggregation for Dynamic Multi-modal 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose AutoAlignV2, a faster and stronger multi-modal 3D detection framework, built on top of AutoAlign. |
Zehui Chen; Zhenyu Li; Shiquan Zhang; Liangji Fang; Qinhong Jiang; Feng Zhao; |
331 | WeLSA: Learning to Predict 6D Pose from Weakly Labeled Data Using Shape Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a weakly-supervised approach for object pose estimation from RGB-D data using training sets composed of very few labeled images with pose annotations along with weakly-labeled images with ground truth segmentation masks without pose labels. |
Shishir Reddy Vutukur; Ivan Shugurov; Benjamin Busam; Andreas Hutter; Slobodan Ilic; |
332 | Graph R-CNN: Towards Accurate 3D Object Detection with Semantic-Decorated Local Graph Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose the patch search to quickly search points in a local region for each 3D proposal. |
Honghui Yang; Zili Liu; Xiaopei Wu; Wenxiao Wang; Wei Qian; Xiaofei He; Deng Cai; |
333 | MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a flexible and high-performance 3D detection framework, named MPPNet, for 3D temporal object detection with point cloud sequences. |
Xuesong Chen; Shaoshuai Shi; Benjin Zhu; Ka Chun Cheung; Hang Xu; Hongsheng Li; |
334 | Long-Tail Detection with Effective Class-Margins Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we provide a theoretical understanding of the long-tail detection problem. |
Jang Hyun Cho; Philipp Krähenbühl; |
335 | Semi-Supervised Monocular 3D Object Detection By Multi-View Consistency Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To alleviate the annotation effort, we propose MVC-MonoDet, the first semi-supervised training framework that improves Monocular 3D object detection by enforcing multi-view consistency. |
Qing Lian; Yanbo Xu; Weilong Yao; Yingcong Chen; Tong Zhang; |
336 | PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the issues, we perform a progressive way to introduce both temporal information and spatial information for an integrated enhancement. |
Han Wang; Jun Tang; Xiaodong Liu; Shanyan Guan; Rong Xie; Li Song; |
337 | BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images Via Spatiotemporal Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. |
Zhiqi Li; Wenhai Wang; Hongyang Li; Enze Xie; Chonghao Sima; Tong Lu; Yu Qiao; Jifeng Dai; |
338 | Category-Level 6D Object Pose and Size Estimation Using Self-Supervised Deep Prior Deformation Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the easy annotations in synthetic domains bring the downside effect of synthetic-to-real (Sim2Real) domain gap. In this work, we aim to address this issue in the task setting of Sim2Real, unsupervised domain adaptation for category-level 6D object pose and size estimation. |
Jiehong Lin; Zewei Wei; Changxing Ding; Kui Jia; |
339 | Dense Teacher: Dense Pseudo-Labels for Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose replacing the sparse pseudo-boxes with the dense prediction as a united and straightforward form of pseudo-label. |
Hongyu Zhou; Zheng Ge; Songtao Liu; Weixin Mao; Zeming Li; Haiyan Yu; Jian Sun; |
340 | Point-to-Box Network for Accurate Object Detection Via Single Point Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the performance gap between point supervised object detection (PSOD) and bounding box supervised detection remains large. In this paper, we attribute such a large performance gap to the failure of generating high-quality proposal bags which are crucial for multiple instance learning (MIL). |
Pengfei Chen; Xuehui Yu; Xumeng Han; Najmul Hassan; Kai Wang; Jiachen Li; Jian Zhao; Humphrey Shi; Zhenjun Han; Qixiang Ye; |
341 | Domain Adaptive Hand Keypoint and Pixel Localization in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose to utilize the divergence of two predictions to estimate the confidence of the target image for both tasks. |
Takehiko Ohkawa; Yu-Jhe Li; Qichen Fu; Ryosuke Furuta; Kris M. Kitani; Yoichi Sato; |
342 | Towards Data-Efficient Detection Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In other words, the detection transformers are generally data-hungry. To tackle this problem, we empirically analyze the factors that affect data efficiency, through a step-by-step transition from a data-efficient RCNN variant to the representative DETR. |
Wen Wang; Jing Zhang; Yang Cao; Yongliang Shen; Dacheng Tao; |
343 | Open-Vocabulary DETR with Conditional Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a novel open-vocabulary detector based on DETR—hence the name OV-DETR—which, once trained, can detect any object given its class name or an exemplar image. |
Yuhang Zang; Wei Li; Kaiyang Zhou; Chen Huang; Chen Change Loy; |
344 | Prediction-Guided Distillation for Dense Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we show that only a very small fraction of features within a ground-truth bounding box are responsible for a teacher’s high detection performance. |
Chenhongyi Yang; Mateusz Ochal; Amos Storkey; Elliot J. Crowley; |
345 | Multimodal Object Detection Via Probabilistic Ensembling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our key contribution is a probabilistic ensembling technique, ProbEn, a simple non-learned method that fuses together detections from multi-modalities. |
Yi-Ting Chen; Jinghao Shi; Zelin Ye; Christoph Mertz; Deva Ramanan; Shu Kong; |
346 | Exploiting Unlabeled Data with Vision and Language Models for Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection. |
Shiyu Zhao; Zhixing Zhang; Samuel Schulter; Long Zhao; Vijay Kumar B G; Anastasis Stathopoulos; Manmohan Chandraker; Dimitris N. Metaxas; |
347 | CPO: Change Robust Panorama to Point Cloud Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present CPO, a fast and robust algorithm that localizes a 2D panorama with respect to a 3D point cloud of a scene possibly containing changes. |
Junho Kim; Hojun Jang; Changwoon Choi; Young Min Kim; |
348 | INT: Towards Infinite-Frames 3D Detection with An Efficient Framework Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although increasing the number of frames might improve performance, previous multi-frame studies only used very limited frames to build their systems due to the dramatically increased computational and memory cost. To address these issues, we propose a novel on-stream training and prediction framework that, in theory, can employ an infinite number of frames while keeping the same amount of computation as a single-frame detector. |
Jianyun Xu; Zhenwei Miao; Da Zhang; Hongyu Pan; Kaixuan Liu; Peihan Hao; Jun Zhu; Zhengyang Sun; Hongmin Li; Xin Zhan; |
349 | End-to-End Weakly Supervised Object Detection with Sparse Proposal Evolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this study, we propose a sparse proposal evolution (SPE) approach, which advances WSOD from the two-stage pipeline with dense proposals to an end-to-end framework with sparse proposals. |
Mingxiang Liao; Fang Wan; Yuan Yao; Zhenjun Han; Jialing Zou; Yuze Wang; Bailan Feng; Peng Yuan; Qixiang Ye; |
350 | Calibration-Free Multi-View Crowd Counting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To extend and apply MVCC to more practical situations, in this paper we propose calibration-free multi-view crowd counting (CF-MVCC), which obtains the scene-level count directly from the density map predictions for each camera view without needing the camera calibrations in the test. |
Qi Zhang; Antoni B. Chan; |
351 | Unsupervised Domain Adaptation for Monocular 3D Object Detection Via Self-Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To mitigate the depth-shift, we introduce the geometry-aligned multi-scale training strategy to disentangle the camera parameters and guarantee the geometry consistency of domains. |
Zhenyu Li; Zehui Chen; Ang Li; Liangji Fang; Qinhong Jiang; Xianming Liu; Junjun Jiang; |
352 | SuperLine3D: Self-Supervised Line Segmentation and Description for LiDAR Point Cloud Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Poles and building edges are frequently observable objects on urban roads, conveying reliable hints for various computer vision tasks. To repetitively extract them as features and perform association between discrete LiDAR frames for registration, we propose the first learning-based feature segmentation and description model for 3D lines in LiDAR point cloud. |
Xiangrui Zhao; Sheng Yang; Tianxin Huang; Jun Chen; Teng Ma; Mingyang Li; Yong Liu; |
353 | Exploring Plain Vision Transformer Backbones for Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for object detection. |
Yanghao Li; Hanzi Mao; Ross Girshick; Kaiming He; |
354 | Adversarially-Aware Robust Object Detector Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we empirically explore the model training for adversarial robustness in object detection, which greatly attributes to the conflict between learning clean images and adversarial images. |
Ziyi Dong; Pengxu Wei; Liang Lin; |
355 | HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Conventional homogeneous KD (homo-KD) methods suffer from such a gap and are hard to directly obtain satisfactory performance for hetero-KD. In this paper, we propose the HEtero-Assists Distillation (HEAD) framework, leveraging heterogeneous detection heads as assistants to guide the optimization of the student detector to reduce this gap. |
Luting Wang; Xiaojie Li; Yue Liao; Zeren Jiang; Jianlong Wu; Fei Wang; Chen Qian; Si Liu; |
356 | You Should Look at All Objects Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, this paper first revisits FPN in the detection framework and reveals the nature of the success of FPN from the perspective of optimization. Then, we point out that the degraded performance of large-scale objects is due to the arising of improper back-propagation paths after integrating FPN. |
Zhenchao Jin; Dongdong Yu; Luchuan Song; Zehuan Yuan; Lequan Yu; |
357 | Detecting Twenty-Thousand Classes Using Image-Level Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Detic, which simply trains the classifiers of a detector on image classification data and thus expands the vocabulary of detectors to tens of thousands of concepts. |
Xingyi Zhou; Rohit Girdhar; Armand Joulin; Philipp Krähenbühl; Ishan Misra; |
358 | DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, surrogate objectives of correspondence learning in 3D space are a step away from the true ones of object pose estimation, making the learning suboptimal for the end task. In this paper, we address this shortcoming by introducing a new method of Deep Correspondence Learning Network for direct 6D object pose estimation, shortened as DCL-Net. |
Hongyang Li; Jiehong Lin; Kui Jia; |
359 | Monocular 3D Object Detection with Depth from Motion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by binocular methods for 3D object detection, we take advantage of the strong geometry structure provided by camera ego-motion for accurate object depth estimation and detection. |
Tai Wang; Jiangmiao Pang; Dahua Lin; |
360 | DISP6D: Disentangled Implicit Shape and Pose Learning for Scalable 6D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Building on a well-known auto-encoding framework to cope with object symmetry and the lack of labeled training data, we achieve scalability by disentangling the latent representation of auto-encoder into shape and pose sub-spaces. |
Yilin Wen; Xiangyu Li; Hao Pan; Lei Yang; Zheng Wang; Taku Komura; Wenping Wang; |
361 | Distilling Object Detectors with Global Knowledge Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, a novel prototype generation module (PGM) is proposed to find the common basis vectors, dubbed prototypes, in the two feature spaces. Then, a robust distilling module (RDM) is applied to construct the global knowledge based on the prototypes and filtrate noisy global and local knowledge by measuring the discrepancy of the representations in two feature spaces. |
Sanli Tang; Zhongyu Zhang; Zhanzhan Cheng; Jing Lu; Yunlu Xu; Yi Niu; Fan He; |
362 | Unifying Visual Perception By Dispersible Points Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a conceptually simple, flexible, and universal visual perception head for variant visual tasks, e.g., classification, object detection, instance segmentation and pose estimation, and different frameworks, such as one-stage or two-stage pipelines. |
Jianming Liang; Guanglu Song; Biao Leng; Yu Liu; |
363 | PseCo: Pseudo Labeling and Consistency Training for Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we delve into two key techniques in Semi-Supervised Object Detection (SSOD), namely pseudo labeling and consistency training. |
Gang Li; Xiang Li; Yujie Wang; Yichao Wu; Ding Liang; Shanshan Zhang; |
364 | Exploring Resolution and Degradation Clues As Self-Supervised Signal for Low Quality Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we propose a novel self-supervised framework to detect objects in degraded low-resolution images. |
Ziteng Cui; Yingying Zhu; Lin Gu; Guo-Jun Qi; Xiaoxiao Li; Renrui Zhang; Zenghui Zhang; Tatsuya Harada; |
365 | Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead, we introduce a coarse-to-fine optimization strategy that utilizes the rendering process to estimate a sparse set of 6D object proposals, which are subsequently refined with gradient-based optimization. |
Wufei Ma; Angtian Wang; Alan Yuille; Adam Kortylewski; |
366 | Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we mainly address the challenge of cross-modal weakly misalignment in aerial RGB-IR images. |
Maoxun Yuan; Yinyan Wang; Xingxing Wei; |
367 | RFLA: Gaussian Receptive Field Based Label Assignment for Tiny Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we point out that either box prior in the anchor-based detector or point prior in the anchor-free detector is sub-optimal for tiny objects. |
Chang Xu; Jinwang Wang; Wen Yang; Huai Yu; Lei Yu; Gui-Song Xia; |
368 | Rethinking IoU-Based Optimization for Single-Stage 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Rotation-Decoupled IoU (RDIoU) method that can mitigate the rotation-sensitivity issue, and produce more efficient optimization objectives compared with 3D IoU during the training stage. |
Hualian Sheng; Sijia Cai; Na Zhao; Bing Deng; Jianqiang Huang; Xian-Sheng Hua; Min-Jian Zhao; Gim Hee Lee; |
369 | TD-Road: Top-Down Road Network Extraction with Holistic Graph Construction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In contrast to the bottom-up graph-based approaches, which rely on orientation information, we propose a novel top-down approach to generate road network graphs with a holistic model, namely TD-Road. |
Yang He; Ravi Garg; Amber Roy Chowdhury; |
370 | Multi-faceted Distillation of Base-Novel Commonality for Few-Shot Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work we propose to learn three types of class-agnostic commonalities between base and novel classes explicitly: recognition-related semantic commonalities, localization-related semantic commonalities and distribution commonalities. |
Shuang Wu; Wenjie Pei; Dianwen Mei; Fanglin Chen; Jiandong Tian; Guangming Lu; |
371 | PointCLM: A Contrastive Learning-Based Framework for Multi-Instance Point Cloud Registration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose PointCLM, a contrastive learning-based framework for multi-instance point cloud registration. |
Mingzhi Yuan; Zhihao Li; Qiuye Jin; Xinrong Chen; Manning Wang; |
372 | Weakly Supervised Object Localization Via Transformer with Implicit Spatial Calibration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the long-range modeling in Transformer neglects the inherent spatial coherence of the object, and it usually diffuses the semantic-aware regions far from the object boundary, making localization results significantly larger or far smaller. To address such an issue, we introduce a simple yet effective Spatial Calibration Module (SCM) for accurate WSOL, incorporating semantic similarities of patch tokens and their spatial relationships into a unified diffusion model. |
Haotian Bai; Ruimao Zhang; Jiong Wang; Xiang Wan; |
373 | MTTrans: Cross-Domain Object Detection with Mean Teacher Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, it requires large-scale labeled data and suffers from domain shift, especially when no labeled data is available in the target domain. To solve this problem, we propose an end-to-end cross-domain detection Transformer based on the mean teacher framework, MTTrans, which can fully exploit unlabeled target domain data in object detection training and transfer knowledge between domains via pseudo labels. |
Jinze Yu; Jiaming Liu; Xiaobao Wei; Haoyi Zhou; Yohei Nakata; Denis Gudovskiy; Tomoyuki Okuno; Jianxin Li; Kurt Keutzer; Shanghang Zhang; |
374 | Multi-Domain Multi-Definition Landmark Localization for Small Datasets Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel method for multi image domain and multi-landmark definition learning for small dataset facial localization. |
David Ferman; Gaurav Bharaj; |
375 | DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Since the depth is the hardest to estimate for monocular detection, this paper proposes Depth EquiVarIAnt NeTwork (DEVIANT) built with existing scale equivariant steerable blocks. |
Abhinav Kumar; Garrick Brazil; Enrique Corona; Armin Parchami; Xiaoming Liu; |
376 | Label-Guided Auxiliary Training Improves 3D Object Detector Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Label-Guided auxiliary training method for 3D object detection (LG3D), which serves as an auxiliary network to enhance the feature learning of existing 3D object detectors. |
Yaomin Huang; Xinmei Liu; Yichen Zhu; Zhiyuan Xu; Chaomin Shen; Zhengping Che; Guixu Zhang; Yaxin Peng; Feifei Feng; Jian Tang; |
377 | PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations. |
Chengjian Feng; Yujie Zhong; Zequn Jie; Xiangxiang Chu; Haibing Ren; Xiaolin Wei; Weidi Xie; Lin Ma; |
378 | Densely Constrained Depth Estimator for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a method that utilizes dense projection constraints from edges of any direction. |
Yingyan Li; Yuntao Chen; Jiawei He; Zhaoxiang Zhang; |
379 | Polarimetric Pose Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper explores how complementary polarisation information, i.e. the orientation of light wave oscillations, influences the accuracy of pose predictions. |
Daoyi Gao; Yitong Li; Patrick Ruhkamp; Iuliia Skobleva; Magdalena Wysocki; HyunJun Jung; Pengyuan Wang; Arturo Guridi; Benjamin Busam; |
380 | DFNet: Enhance Absolute Pose Regression with Direct Feature Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a camera relocalization pipeline that combines absolute pose regression (APR) and direct feature matching. |
Shuai Chen; Xinghui Li; Zirui Wang; Victor Adrian Prisacariu; |
381 | Cornerformer: Purifying Instances for Corner-Based Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Accordingly, this paper presents an elegant framework named Cornerformer that is composed of two factors. |
Haoran Wei; Xin Chen; Lingxi Xie; Qi Tian; |
382 | PillarNet: Real-Time and High-Performance Pillar-Based 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, by examining the primary performance gap between pillar- and voxel-based detectors, we develop a real-time and high-performance pillar-based detector, dubbed PillarNet. |
Guangsheng Shi; Ruifeng Li; Chao Ma; |
383 | Robust Object Detection with Inaccurate Bounding Boxes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we aim to address the challenge of learning robust object detectors with inaccurate bounding boxes. |
Chengxin Liu; Kewei Wang; Hao Lu; Zhiguo Cao; Ziming Zhang; |
384 | Efficient Decoder-Free Object Detection with Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As a result, transformer-based object detection could not prevail in large-scale applications. To overcome these issues, we propose a novel decoder-free fully transformer-based (DFFT) object detector, achieving high efficiency in both training and inference stages for the first time. |
Peixian Chen; Mengdan Zhang; Yunhang Shen; Kekai Sheng; Yuting Gao; Xing Sun; Ke Li; Chunhua Shen; |
385 | Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the Cross-Modality Knowledge Distillation (CMKD) network for monocular 3D detection to efficiently and directly transfer the knowledge from LiDAR modality to image modality on both features and responses. |
Yu Hong; Hang Dai; Yong Ding; |
386 | ReAct: Temporal Action Detection with Relational Queries Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries, similar to DETR, which has shown great success in object detection. |
Dingfeng Shi; Yujie Zhong; Qiong Cao; Jing Zhang; Lin Ma; Jia Li; Dacheng Tao; |
387 | Towards Accurate Active Camera Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we tackle the problem of active camera localization, which controls the camera movements actively to achieve an accurate camera pose. |
Qihang Fang; Yingda Yin; Qingnan Fan; Fei Xia; Siyan Dong; Sheng Wang; Jue Wang; Leonidas J. Guibas; Baoquan Chen; |
388 | Camera Pose Auto-Encoders for Improving Pose Regression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce Camera Pose Auto-Encoders (PAEs), multilayer perceptrons that are trained via a Teacher-Student approach to encode camera poses using APRs as their teachers. |
Yoli Shavit; Yosi Keller; |
389 | Improving The Intra-Class Long-Tail in 3D Detection Via Rare Example Mining Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this study, we identify a new conceptual dimension – rareness – to mine new data for improving the long-tail performance of models. |
Chiyu Max Jiang; Mahyar Najibi; Charles R. Qi; Yin Zhou; Dragomir Anguelov; |
390 | Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Thus only the discriminative locations are activated when feeding pixel-level features into this classifier. To solve this issue, this paper elaborates a plug-and-play mechanism called BagCAMs to better project a well-trained classifier for the localization task without refining or re-training the baseline structure. |
Lei Zhu; Qian Chen; Lujia Jin; Yunfei You; Yanye Lu; |
391 | UC-OWOD: Unknown-Classified Open World Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel OWOD problem called Unknown-Classified Open World Object Detection (UC-OWOD). |
Zhiheng Wu; Yue Lu; Xingyu Chen; Zhengxing Wu; Liwen Kang; Junzhi Yu; |
392 | RayTran: 3D Pose Estimation and Shape Reconstruction of Multiple Objects from Videos with Ray-Traced Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a transformer-based neural network architecture for multi-object 3D reconstruction from RGB videos. |
Michał J. Tyszkiewicz; Kevis-Kokitsi Maninis; Stefan Popov; Vittorio Ferrari; |
393 | GTCaR: Graph Transformer for Camera Re-Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we propose a neural network approach with a graph Transformer backbone, namely GTCaR (Graph Transformer for Camera Re-localization), to address the multi-view camera re-localization problem. |
Xinyi Li; Haibin Ling; |
394 | 3D Object Detection with A Self-Supervised Lidar Scene Flow Backbone Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our main contribution leverages learned flow and motion representations and combines a self-supervised backbone with a supervised 3D detection head. |
Emeç Erçelik; Ekim Yurtsever; Mingyu Liu; Zhijie Yang; Hanzhen Zhang; Pınar Topçam; Maximilian Listl; Yılmaz Kaan Çaylı; Alois Knoll; |
395 | Open Vocabulary Object Detection with Pseudo Bounding-Box Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs. |
Mingfei Gao; Chen Xing; Juan Carlos Niebles; Junnan Li; Ran Xu; Wenhao Liu; Caiming Xiong; |
396 | Few-Shot Object Detection By Knowledge Distillation Using Bag-of-Visual-Words Representations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we design a novel knowledge distillation framework to guide the learning of the object detector and thereby restrain the overfitting in both the pre-training stage on base classes and fine-tuning stage on novel classes. |
Wenjie Pei; Shuang Wu; Dianwen Mei; Fanglin Chen; Jiandong Tian; Guangming Lu; |
397 | SALISA: Saliency-Based Input Sampling for Efficient Video Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection that allows for heavy down-sampling of unimportant background regions while preserving the fine-grained details of a high-resolution image. |
Babak Ehteshami Bejnordi; Amirhossein Habibian; Fatih Porikli; Amir Ghodrati; |
398 | ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an efficient structure named Efficient Correspondence Transformer (ECO-TR) by finding correspondences in a coarse-to-fine manner, which significantly improves the efficiency of the functional model. |
Dongli Tan; Jiang-Jiang Liu; Xingyu Chen; Chao Chen; Ruixin Zhang; Yunhang Shen; Shouhong Ding; Rongrong Ji; |
399 | Vote from The Center: 6 DoF Pose Estimation in RGB-D Images By Radial Keypoint Voting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel keypoint voting scheme based on intersecting spheres, that is more accurate than existing schemes and allows for fewer, more disperse keypoints. |
Yangzheng Wu; Mohsen Zand; Ali Etemad; Michael Greenspan; |
400 | Long-Tailed Instance Segmentation Using Gumbel Optimized Loss Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we identify that Sigmoid or Softmax functions used in deep detectors are a major reason for low performance and are suboptimal for long-tailed detection and segmentation. |
Konstantinos Panagiotis Alexandridis; Jiankang Deng; Anh Nguyen; Shan Luo; |
401 | DetMatch: Two Teachers Are Better Than One for Joint 2D and 3D Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Observing that the distinct characteristics of each sensor cause them to be biased towards detecting different objects, we propose DetMatch, a flexible framework for joint semi-supervised learning on 2D and 3D modalities. |
Jinhyung Park; Chenfeng Xu; Yiyang Zhou; Masayoshi Tomizuka; Wei Zhan; |
402 | ObjectBox: From Centers to Boxes for Anchor-Free Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present ObjectBox, a novel single-stage anchor-free and highly generalizable object detection approach. |
Mohsen Zand; Ali Etemad; Michael Greenspan; |
403 | Is Geometry Enough for Matching in Visual Localization? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to go beyond the well-established approach to vision-based localization that relies on visual descriptor matching between a query image and a 3D point cloud. |
Qunjie Zhou; Sérgio Agostinho; Aljoša Ošep; Laura Leal-Taixé; |
404 | SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose Sparse Window Transformer (SWFormer), a scalable and accurate model for 3D object detection, which can take full advantage of the sparsity of point clouds. |
Pei Sun; Mingxing Tan; Weiyue Wang; Chenxi Liu; Fei Xia; Zhaoqi Leng; Dragomir Anguelov; |
405 | PCR-CG: Point Cloud Registration Via Deep Explicit Color and Geometry Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce PCR-CG: a novel 3D point cloud registration module explicitly embedding the color signals into geometry representation. |
Yu Zhang; Junle Yu; Xiaolin Huang; Wenhui Zhou; Ji Hou; |
406 | GLAMD: Global and Local Attention Mask Distillation for Object Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To overcome such challenging issues, we propose a novel knowledge distillation method, GLAMD, distilling both global and local knowledge from the teacher. |
Younho Jang; Wheemyung Shin; Jinbeom Kim; Simon Woo; Sung-Ho Bae; |
407 | FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present FCAF3D — a first-in-class fully convolutional anchor-free indoor 3D object detection method. |
Danila Rukhovich; Anna Vorontsova; Anton Konushin; |
408 | Video Anomaly Detection By Solving Decoupled Spatio-Temporal Jigsaw Puzzles Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the recent advances in self-supervised learning, this paper addresses VAD by solving an intuitive yet challenging pretext task, i.e., spatio-temporal jigsaw puzzles, which is cast as a multi-label fine-grained classification problem. |
Guodong Wang; Yunhong Wang; Jie Qin; Dongming Zhang; Xiuguo Bao; Di Huang; |
409 | Class-Agnostic Object Detection with Multi-modal Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we advocate that existing methods lack a top-down supervision signal governed by human-understandable semantics. |
Muhammad Maaz; Hanoona Rasheed; Salman Khan; Fahad Shahbaz Khan; Rao Muhammad Anwer; Ming-Hsuan Yang; |
410 | Enhancing Multi-modal Features Using Local Self-Attention for 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose EMMF-Det to do multi-modal fusion leveraging range and camera images. |
Hao Li; Zehan Zhang; Xian Zhao; Yulong Wang; Yuxi Shen; Shiliang Pu; Hui Mao; |
411 | Object Detection As Probabilistic Set Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose to view object detection as a set prediction task where detectors predict the distribution over the set of objects. |
Georg Hess; Christoffer Petersson; Lennart Svensson; |
412 | Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose to model actions as the combinations of reusable atomic actions which are automatically discovered from data through self-supervised clustering, in order to capture the commonality and individuality of fine-grained actions. |
Zhi Li; Lu He; Huijuan Xu; |
413 | Neural Correspondence Field for Object Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image. |
Lin Huang; Tomas Hodan; Lingni Ma; Linguang Zhang; Luan Tran; Christopher D. Twigg; Po-Chen Wu; Junsong Yuan; Cem Keskin; Robert Wang; |
414 | On Label Granularity and Object Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we study the role of label granularity in WSOL. |
Elijah Cole; Kimberly Wilber; Grant Van Horn; Xuan Yang; Marco Fornoni; Pietro Perona; Serge Belongie; Andrew Howard; Oisin Mac Aodha; |
415 | OIMNet++: Prototypical Normalization and Localization-Aware Learning for Person Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce OIMNet++ that addresses the aforementioned limitations. |
Sanghoon Lee; Youngmin Oh; Donghyeon Baek; Junghyup Lee; Bumsub Ham; |
416 | Out-of-Distribution Identification: Let Detector Tell Which I Am Not Sure Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, the Feature structured OOD-IDentification (FOOD-ID) model is proposed to reduce the uncertainty of detection results by identifying the OOD instances. |
Ruoqi Li; Chongyang Zhang; Hao Zhou; Chao Shi; Yan Luo; |
417 | Learning with Free Object Segments for Long-Tailed Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore the possibility to increase the training examples without laborious data collection and annotation. |
Cheng Zhang; Tai-Yu Pan; Tianle Chen; Jike Zhong; Wenjin Fu; Wei-Lun Chao; |
418 | Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose methods for leveraging our autoregressive model to make high confidence predictions and meaningful uncertainty measures, achieving strong results on SUN-RGBD, Scannet, KITTI, and our new dataset. |
YuXuan Liu; Nikhil Mishra; Maximilian Sieb; Yide Shentu; Pieter Abbeel; Xi Chen; |
419 | 3D Random Occlusion and Multi-layer Projection for Deep Multi-Camera Pedestrian Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Using multi-view information fusion is a potential solution but has limited applications, due to the lack of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed to randomly generate 3D cylinder occlusions, on the ground plane, which are of the average size of pedestrians and projected to multiple views, to relieve the impact of overfitting in the training. |
Rui Qiu; Ming Xu; Yuyao Yan; Jeremy S. Smith; Xi Yang; |
420 | A Simple Single-Scale Vision Transformer for Object Detection and Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we comprehensively study three architecture design choices on ViT — spatial reduction, doubled channels, and multiscale features — and demonstrate that a vanilla ViT architecture can fulfill this goal without handcrafting multiscale features, maintaining the original ViT design philosophy. |
Wuyang Chen; Xianzhi Du; Fan Yang; Lucas Beyer; Xiaohua Zhai; Tsung-Yi Lin; Huizhong Chen; Jing Li; Xiaodan Song; Zhangyang Wang; Denny Zhou; |
421 | Simple Open-Vocabulary Object Detection with Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a strong recipe for transferring image-text models to open-vocabulary object detection. |
Matthias Minderer; Alexey Gritsenko; Austin Stone; Maxim Neumann; Dirk Weissenborn; Alexey Dosovitskiy; Aravindh Mahendran; Anurag Arnab; Mostafa Dehghani; Zhuoran Shen; Xiao Wang; Xiaohua Zhai; Thomas Kipf; Neil Houlsby; |
422 | A Simple Approach and Benchmark for 21,000-Category Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unlike previous efforts that usually transfer knowledge from base detectors to image classification data, we propose to rely more on a reverse information flow from a base image classifier to object detection data. |
Yutong Lin; Chen Li; Yue Cao; Zheng Zhang; Jianfeng Wang; Lijuan Wang; Zicheng Liu; Han Hu; |
423 | Knowledge Condensation Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Knowledge Condensation Distillation (KCD). |
Chenxin Li; Mingbao Lin; Zhiyuan Ding; Nie Lin; Yihong Zhuang; Yue Huang; Xinghao Ding; Liujuan Cao; |
424 | Reducing Information Loss for Spiking Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Meanwhile, quantizing the membrane potential to 0/1 spikes at the firing instants will inevitably introduce quantization error, thus bringing about information loss too. To address these problems, we propose a “Soft Reset” mechanism for supervised training-based SNNs, which drives the membrane potential to a dynamic reset potential according to its magnitude, and a Membrane Potential Rectifier (MPR) to reduce the quantization error by redistributing the membrane potential to a range close to the spikes. |
Yufei Guo; Yuanpei Chen; Liwen Zhang; YingLei Wang; Xiaode Liu; Xinyi Tong; Yuanyuan Ou; Xuhui Huang; Zhe Ma; |
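The “soft reset” that the highlight above builds on can be sketched in a few lines. The paper’s variant drives the potential to a *dynamic* reset value; the minimal NumPy version below shows the standard soft reset it refines, which subtracts the threshold on firing instead of zeroing the potential, so the sub-threshold residual is retained (function and parameter names are illustrative, not the paper’s code):

```python
import numpy as np

def lif_step(v, inp, threshold=1.0, leak=0.9):
    """One leaky integrate-and-fire step with a soft reset:
    on firing, the threshold is subtracted from the membrane
    potential rather than resetting it to zero, so residual
    (sub-threshold) information is not discarded."""
    v = leak * v + inp                       # leaky integration
    spike = (v >= threshold).astype(float)   # 0/1 spike per neuron
    v = v - spike * threshold                # soft reset keeps the remainder
    return v, spike
```

A hard reset would instead set firing neurons back to zero, discarding the overshoot that this variant carries into the next timestep.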
425 | Masked Generative Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper shows that teachers can also improve students’ representation power by guiding students’ feature recovery. From this point of view, we propose Masked Generative Distillation (MGD), which is simple: we mask random pixels of the student’s feature and force it to generate the teacher’s full feature through a simple block. |
Zhendong Yang; Zhe Li; Mingqi Shao; Dachuan Shi; Zehuan Yuan; Chun Yuan; |
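The masked-feature recovery that MGD’s highlight describes can be sketched as follows, assuming for simplicity that the “simple block” is a single linear map over channels (the paper uses a small convolutional block; all names and shapes here are illustrative):

```python
import numpy as np

def mgd_loss(student_feat, teacher_feat, gen_weight, mask_ratio=0.5, rng=None):
    """Sketch of Masked Generative Distillation: random positions of the
    student's feature map are zeroed, a generation block must reconstruct
    the teacher's *full* feature, and the MSE is the distillation loss.
    student_feat: (C, HW), teacher_feat: (C_t, HW), gen_weight: (C_t, C)."""
    if rng is None:
        rng = np.random.default_rng(0)
    c, hw = student_feat.shape
    keep = (rng.random(hw) > mask_ratio).astype(float)  # 1 = keep, 0 = masked
    generated = gen_weight @ (student_feat * keep)      # "simple block": linear map
    return float(np.mean((generated - teacher_feat) ** 2))
```

With `mask_ratio=0` and a generation block that already maps student to teacher features, the loss is zero; masking forces the block to hallucinate the dropped positions from context, which is the training signal MGD exploits.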
426 | Fine-Grained Data Distribution Alignment for Post-Training Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While post-training quantization is popular largely because it avoids accessing the original complete training dataset, its poor performance also stems from scarce images. To alleviate this limitation, in this paper we leverage the synthetic data introduced by zero-shot quantization with a calibration dataset and propose a fine-grained data distribution alignment (FDDA) method to boost the performance of post-training quantization. |
Yunshan Zhong; Mingbao Lin; Mengzhao Chen; Ke Li; Yunhang Shen; Fei Chao; Yongjian Wu; Rongrong Ji; |
427 | Learning with Recoverable Forgetting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we explore a novel learning scheme, termed Learning wIth Recoverable Forgetting (LIRF), that explicitly handles task- or sample-specific knowledge removal and recovery. |
Jingwen Ye; Yifang Fu; Jie Song; Xingyi Yang; Songhua Liu; Xin Jin; Mingli Song; Xinchao Wang; |
428 | Efficient One Pass Self-Distillation with Zipf’s Label Smoothing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes an efficient self-distillation method named Zipf’s Label Smoothing (Zipf’s LS), which uses the on-the-fly prediction of a network to generate soft supervision that conforms to Zipf distribution without using any contrastive samples or auxiliary parameters. |
Jiajun Liang; Linze Li; Zhaodong Bing; Borui Zhao; Yao Tang; Bo Lin; Haoqiang Fan; |
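One simple reading of the Zipf-shaped soft supervision described above can be sketched in NumPy: non-target classes, ranked by the network’s own on-the-fly logits, receive smoothing mass proportional to 1/rank^alpha. The function name and the `eps`/`alpha` parameters are illustrative assumptions, not the paper’s exact formulation:

```python
import numpy as np

def zipfs_label_smoothing(logits, target, eps=0.1, alpha=1.0):
    """Sketch: build a soft label where the target keeps 1 - eps of the
    mass and the remaining eps is spread over the other classes following
    a Zipf distribution over their rank in the network's own prediction."""
    n = len(logits)
    order = np.argsort(-logits)
    order = order[order != target]                      # non-target classes, ranked
    zipf = 1.0 / np.arange(1, n, dtype=float) ** alpha  # 1 / rank^alpha
    zipf /= zipf.sum()
    probs = np.zeros(n)
    probs[target] = 1.0 - eps                           # keep most mass on the label
    probs[order] = eps * zipf                           # spread the rest by Zipf rank
    return probs
```

Unlike uniform label smoothing, the smoothing mass here decays with the model’s own ranking, so classes the network currently finds plausible receive more of it, without contrastive samples or auxiliary parameters.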
429 | Prune Your Model Before Distill It Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel framework, “prune, then distill,” that first prunes the model to make it more transferable and then distills it to the student. |
Jinhyuk Park; Albert No; |
430 | Deep Partial Updating: Towards Communication Efficient Updating for On-Device Inference Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the weight-wise deep partial updating paradigm, which smartly selects a small subset of weights to update in each server-to-edge communication round, while achieving a similar performance compared to full updating. |
Zhongnan Qu; Cong Liu; Lothar Thiele; |
431 | Patch Similarity Aware Data-Free Quantization for Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers, to enable the generation of realistic samples based on the vision transformer’s unique properties for calibrating the quantization parameters. |
Zhikai Li; Liping Ma; Mengjuan Chen; Junrui Xiao; Qingyi Gu; |
432 | L3: Accelerator-Friendly Lossless Image Format for High-Resolution, High-Throughput DNN Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Thus, we propose L3, a custom lightweight, lossless image format for high-resolution, high-throughput DNN training. |
Jonghyun Bae; Woohyeon Baek; Tae Jun Ham; Jae W. Lee; |
433 | Streaming Multiscale Deep Equilibrium Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present StreamDEQ, a method that infers frame-wise representations on videos with minimal per-frame computation. |
Can Ufuk Ertenli; Emre Akbas; Ramazan Gokberk Cinbis; |
434 | Symmetry Regularization and Saturating Nonlinearity for Robust Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we perform extensive analyses to identify the sources of quantization error and present three insights to robustify the network against quantization: reduction of error propagation, range clamping for error minimization, and inherited robustness against quantization. |
Sein Park; Yeongsang Jang; Eunhyeok Park; |
435 | SP-Net: Slowly Progressing Dynamic Inference Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To alleviate the problems above, we propose a slowly progressing dynamic inference network to stabilize the optimization. |
Huanyu Wang; Wenhu Zhang; Shihao Su; Hui Wang; Zhenwei Miao; Xin Zhan; Xi Li; |
436 | Equivariance and Invariance Inductive Bias for Learning from Insufficient Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We are interested in learning robust models from insufficient data, without the need for any externally pre-trained checkpoints. |
Tan Wang; Qianru Sun; Sugiri Pranata; Karlekar Jayashree; Hanwang Zhang; |
437 | Mixed-Precision Neural Network Quantization Via Learned Layer-Wise Importance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we reveal that some unique learnable parameters in quantization, namely the scale factors in the quantizer, can serve as importance indicators of a layer, reflecting the contribution of that layer to the final accuracy at certain bit-widths. |
Chen Tang; Kai Ouyang; Zhi Wang; Yifei Zhu; Wen Ji; Yaowei Wang; Wenwu Zhu; |
438 | Event Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Such redundancy occurs at multiple levels of complexity, from low-level pixel values to textures and high-level semantics. We propose Event Neural Networks (EvNets), which leverage this redundancy to achieve considerable computation savings during video inference. |
Matthew Dutson; Yin Li; Mohit Gupta; |
439 | EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, pushing further along this under-studied direction, we introduce EdgeViTs, a new family of light-weight ViTs that, for the first time, enable attention-based vision models to compete with the best light-weight CNNs in the tradeoff between accuracy and on-device efficiency. |
Junting Pan; Adrian Bulat; Fuwen Tan; Xiatian Zhu; Lukasz Dudziak; Hongsheng Li; Georgios Tzimiropoulos; Brais Martinez; |
440 | PalQuant: Accelerating High-Precision Networks on Low-Precision Accelerators Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the PArallel Low-precision Quantization (PalQuant) method that approximates high-precision computations via learning parallel low-precision representations from scratch. |
Qinghao Hu; Gang Li; Qiman Wu; Jian Cheng; |
441 | Disentangled Differentiable Network Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel channel pruning method for compression and acceleration of Convolutional Neural Networks (CNNs). |
Shangqian Gao; Feihu Huang; Yanfu Zhang; Heng Huang; |
442 | IDa-Det: An Information Discrepancy-Aware Distillation for 1-Bit Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents an Information Discrepancy-aware strategy (IDa-Det) to distill 1-bit detectors that can effectively eliminate information discrepancies and significantly reduce the performance gap between a 1-bit detector and its real-valued counterpart. |
Sheng Xu; Yanjing Li; Bohan Zeng; Teli Ma; Baochang Zhang; Xianbin Cao; Peng Gao; Jinhu Lü; |
443 | Learning to Weight Samples for Dynamic Early-Exiting Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the early-exiting behavior during testing has been ignored, leading to a gap between training and testing. In this paper, we propose to bridge this gap by sample weighting. |
Yizeng Han; Yifan Pu; Zihang Lai; Chaofei Wang; Shiji Song; Junfeng Cao; Wenhui Huang; Chao Deng; Gao Huang; |
444 | AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we present a simple yet effective approach called AdaBin to adaptively obtain the optimal binary sets {b_1, b_2} (b_1, b_2 belong to R) of weights and activations for each layer instead of a fixed set (i.e., {-1, +1}). |
Zhijun Tu; Xinghao Chen; Pengju Ren; Yunhe Wang; |
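The adaptive binary set idea from AdaBin’s highlight can be illustrated as follows, assuming the set {b_1, b_2} is derived from simple weight statistics. Note this is only a sketch: the actual method *learns* the optimal set per layer rather than computing it from mean and standard deviation as done here:

```python
import numpy as np

def adabin_quantize(w):
    """Sketch: pick an adaptive binary set {b1, b2} from the weight
    statistics (center = mean, spread = std; the paper learns these per
    layer) and snap each weight to the nearer of the two values,
    instead of the fixed set {-1, +1}."""
    center, spread = w.mean(), w.std()
    b1, b2 = center - spread, center + spread
    return np.where(w < center, b1, b2)
```

The output still takes only two distinct values per layer, preserving the hardware benefits of binarization while letting the set track each layer’s weight distribution.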
445 | Adaptive Token Sampling for Efficient Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although the GFLOPs of a vision transformer can be decreased by reducing the number of tokens in the network, there is no setting that is optimal for all input images. In this work, we therefore introduce a differentiable parameter-free Adaptive Token Sampler (ATS) module, which can be plugged into any existing vision transformer architecture. |
Mohsen Fayyaz; Soroush Abbasi Koohpayegani; Farnoush Rezaei Jafari; Sunando Sengupta; Hamid Reza Vaezi Joze; Eric Sommerlade; Hamed Pirsiavash; Jürgen Gall; |
446 | Weight Fixing Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new method, which we call Weight Fixing Networks (WFN) that we design to realise four model outcome objectives: i) very few unique weights, ii) low-entropy weight encodings, iii) unique weight values which are amenable to energy-saving versions of hardware multiplication, and iv) lossless task-performance. |
Christopher Subia-Waud; Srinandan Dasmahapatra; |
447 | Self-Slimmed Vision Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To solve the issue, we propose a generic self-slimmed learning approach for vanilla ViTs, namely SiT. |
Zhuofan Zong; Kunchang Li; Guanglu Song; Yali Wang; Yu Qiao; Biao Leng; Yu Liu; |
448 | Switchable Online Knowledge Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Several crucial bottlenecks over the gap between them — e.g., why and when does a large gap harm performance, especially for the student? How to quantify the gap between teacher and student? — have received limited formal study. In this paper, we propose Switchable Online Knowledge Distillation (SwitOKD) to answer these questions. |
Biao Qian; Yang Wang; Hongzhi Yin; Richang Hong; Meng Wang; |
449 | ℓ∞-Robustness and Beyond: Unleashing Efficient Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, by leveraging the theory of coreset selection, we show how selecting a small subset of training data provides a general, more principled approach toward reducing the time complexity of robust training. |
Hadi M. Dolatabadi; Sarah Erfani; Christopher Leckie; |
450 | Multi-Granularity Pruning for Model Acceleration on Mobile Devices Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a unified framework for the Joint Channel pruning and Weight pruning, named JCW, which achieves an optimal pruning proportion between channel and weight pruning. |
Tianli Zhao; Xi Sheryl Zhang; Wentao Zhu; Jiaxing Wang; Sen Yang; Ji Liu; Jian Cheng; |
451 | Deep Ensemble Learning By Diverse Knowledge Distillation for Fine-Grained Object Classification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a knowledge distillation method for ensembles that optimizes the elements of knowledge distillation as hyperparameters. |
Naoki Okamoto; Tsubasa Hirakawa; Takayoshi Yamashita; Hironobu Fujiyoshi; |
452 | Helpful or Harmful: Inter-Task Association in Continual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel approach to differentiate helpful and harmful information for old tasks using a model search to learn a current task effectively. |
Hyundong Jin; Eunwoo Kim; |
453 | Towards Accurate Binary Neural Networks Via Modeling Contextual Dependencies Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, such simple bit operations lack the ability to model contextual dependencies, which is critical for learning discriminative deep representations in vision models. In this work, we tackle this issue by presenting new designs of binary neural modules that enable BNNs to learn effective contextual dependencies. |
Xingrun Xing; Yangguang Li; Wei Li; Wenrui Ding; Yalong Jiang; Yufeng Wang; Jing Shao; Chunlei Liu; Xianglong Liu; |
454 | SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we perform an empirical evaluation on methods for sharing parameters in isotropic networks (SPIN). |
Chien-Yu Lin; Anish Prabhu; Thomas Merth; Sachin Mehta; Anurag Ranjan; Maxwell Horton; Mohammad Rastegari; |
455 | Ensemble Knowledge Guided Sub-network Search and Fine-Tuning for Filter Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a novel sub-network search and fine-tuning method that is named Ensemble Knowledge Guidance (EKG). |
Seunghyun Lee; Byung Cheol Song; |
456 | Network Binarization Via Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To mitigate the information degradation caused by the binarization operation from FP to binary activations, we establish a novel contrastive learning framework while training BNNs through the lens of Mutual Information (MI) maximization. |
Yuzhang Shang; Dan Xu; Ziliang Zong; Liqiang Nie; Yan Yan; |
457 | Lipschitz Continuity Retained Binary Neural Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce the Lipschitz continuity, a well-defined functional property, as the rigorous criteria to define the model robustness for BNN. |
Yuzhang Shang; Dan Xu; Bin Duan; Ziliang Zong; Liqiang Nie; Yan Yan; |
458 | SPViT: Enabling Faster Vision Transformers Via Latency-Aware Soft Token Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Considering the computation complexity, the internal data pattern of ViTs, and edge device deployment, we propose a latency-aware soft token pruning framework, SPViT, which can be set up on vanilla Transformers of both flat and hierarchical structures, such as DeiTs and Swin Transformers (Swin). |
Zhenglun Kong; Peiyan Dong; Xiaolong Ma; Xin Meng; Wei Niu; Mengshu Sun; Xuan Shen; Geng Yuan; Bin Ren; Hao Tang; Minghai Qin; Yanzhi Wang; |
459 | Soft Masking for Cost-Constrained Channel Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Soft Masking for cost-constrained Channel Pruning (SMCP) to allow pruned channels to adaptively return to the network while simultaneously pruning towards a target cost constraint. |
Ryan Humble; Maying Shen; Jorge Albericio Latorre; Eric Darve; Jose Alvarez; |
460 | Non-uniform Step Size Quantization for Accurate Post-Training Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we propose a novel PTQ scheme to bridge the gap, with minimal impact on hardware cost. |
Sangyun Oh; Hyeonuk Sim; Jounghyun Kim; Jongeun Lee; |
461 | SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets Via Jointly Architecture Searching and Parameter Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we discover for the first time that both efficient DNNs and their lottery subnetworks (i.e., lottery tickets) can be directly identified from a supernet, which we term as SuperTickets, via a two-in-one training scheme with jointly architecture searching and parameter pruning. |
Haoran You; Baopu Li; Zhanyi Sun; Xu Ouyang; Yingyan Lin; |
462 | Meta-GF: Training Dynamic-Depth Neural Networks Harmoniously Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The interference reduces the performance of the models and negatively affects convergence speed. To address this problem, we investigate the gradient conflict of these multi-exit networks, and propose a novel meta-learning based training paradigm, namely Meta-GF (meta gradient fusion), to harmoniously train these exits. |
Yi Sun; Jian Li; Xin Xu; |
463 | Towards Ultra Low Latency Spiking Neural Networks for Vision and Sequential Tasks Using Temporal Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To optimize the accuracy-energy-latency trade-off, we propose a temporal pruning method which starts with an SNN of T timesteps, and reduces T every iteration of training, with threshold and leak as trainable parameters. |
Sayeed Shafayet Chowdhury; Nitin Rathi; Kaushik Roy; |
464 | Towards Accurate Network Quantization with Equivalent Smooth Regularizer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, they still suffer from accuracy degradation due to inappropriate gradients in the optimization phase, especially for low-bit precision network and low-level vision tasks. To alleviate this issue, this paper defines a family of equivalent smooth regularizers for neural network quantization, named as SQR, which represents the equivalent of actual quantization error. |
Kirill Solodskikh; Vladimir Chikin; Ruslan Aydarkhanov; Dehua Song; Irina Zhelavskaya; Jiansheng Wei; |
465 | Explicit Model Size Control and Relaxation Via Smooth Regularization for Mixed-Precision Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The main challenge of the mixed-precision approach is to define the bit-widths for each layer, while staying under memory and latency requirements. Motivated by this challenge, we introduce a novel technique for explicit complexity control of DNNs quantized to mixed-precision, which uses smooth optimization on the surface containing neural networks of constant size. |
Vladimir Chikin; Kirill Solodskikh; Irina Zhelavskaya; |
466 | BASQ: Branch-Wise Activation-Clipping Search Quantization for Sub-4-Bit Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Branch-wise Activation-clipping Search Quantization (BASQ), which is a novel quantization method for low-bit activation. |
Han-Byul Kim; Eunhyeok Park; Sungjoo Yoo; |
467 | You Already Have It: A Generator-Free Low-Precision DNN Training Framework Using Stochastic Rounding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose to employ the stochastic property of the DNN training process itself and directly extract random numbers from DNNs in a self-sufficient manner. |
Geng Yuan; Sung-En Chang; Qing Jin; Alec Lu; Yanyu Li; Yushu Wu; Zhenglun Kong; Yanyue Xie; Peiyan Dong; Minghai Qin; Xiaolong Ma; Xulong Tang; Zhenman Fang; Yanzhi Wang; |
468 | Real Spike: Learning Real-Valued Spikes for Spiking Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we argue that SNNs may not benefit from the weight-sharing mechanism, which can effectively reduce parameters and improve inference efficiency in DNNs, in some hardwares, and assume that an SNN with unshared convolution kernels could perform better. |
Yufei Guo; Liwen Zhang; Yuanpei Chen; Xinyi Tong; Xiaode Liu; YingLei Wang; Xuhui Huang; Zhe Ma; |
469 | FedLTN: Federated Learning for Sparse and Personalized Lottery Ticket Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose FedLTN, a novel approach motivated by the well-known Lottery Ticket Hypothesis to learn sparse and personalized lottery ticket networks (LTNs) for communication-efficient and personalized FL under non-identically and independently distributed (non-IID) data settings. |
Vaikkunth Mugunthan; Eric Lin; Vignesh Gokul; Christian Lau; Lalana Kagal; Steve Pieper; |
470 | Theoretical Understanding of The Information Flow on Continual Learning Performance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While different CL training regimes have been extensively studied empirically, insufficient attention has been paid to the underlying theory. In this paper, we establish a probabilistic framework to analyze information flow through layers in networks for sequential tasks and its impact on learning performance. |
Joshua Andle; Salimeh Yasaei Sekeh; |
471 | Exploring Lottery Ticket Hypothesis in Spiking Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the iterative searching process of LTH brings a huge training computational cost when combined with the multiple timesteps of SNNs. To alleviate such heavy searching cost, we propose Early-Time (ET) ticket where we find the important weight connectivity from a smaller number of timesteps. |
Youngeun Kim; Yuhang Li; Hyoungseob Park; Yeshwanth Venkatesha; Ruokai Yin; Priyadarshini Panda; |
472 | On The Angular Update and Hyperparameter Tuning of A Scale-Invariant Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We first find a common feature of good hyperparameter combinations on such a scale-invariant network, including learning rate, weight decay, number of data samples, and batch size. Then we observe that hyperparameter setups that lead to good performance show similar degrees of angular update during one epoch. |
Juseung Yun; Janghyeon Lee; Hyounguk Shon; Eojindl Yi; Seung Hwan Kim; Junmo Kim; |
473 | LANA: Latency Aware Network Acceleration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce latency-aware network acceleration (LANA), an approach that builds on neural architecture search techniques to accelerate neural networks. |
Pavlo Molchanov; Jimmy Hall; Hongxu Yin; Jan Kautz; Nicolo Fusi; Arash Vahdat; |
474 | RDO-Q: Extremely Fine-Grained Channel-Wise Quantization Via Rate-Distortion Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we address the problem of efficiently exploring the hyperparameter space of channel bit widths. |
Zhe Wang; Jie Lin; Xue Geng; Mohamed M. Sabry Aly; Vijay Chandrasekhar; |
475 | U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel hardware-aware NAS framework that does not only optimize for task accuracy and inference latency, but also for resource utilization. |
Ahmet Caner Yüzügüler; Nikolaos Dimitriadis; Pascal Frossard; |
476 | PTQ4ViT: Post-Training Quantization for Vision Transformers with Twin Uniform Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the twin uniform quantization method to reduce the quantization error on these activation values. |
Zhihang Yuan; Chenhao Xue; Yiqi Chen; Qiang Wu; Guangyu Sun; |
477 | Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Deep neural network quantization with adaptive bitwidths has gained increasing attention due to the ease of model deployment on various platforms with different resource budgets. In this paper, we propose a meta-learning approach to achieve this goal. |
Jiseok Youn; Jaehun Song; Hyung-Sin Kim; Saewoong Bahk; |
478 | Understanding The Dynamics of DNNs Using Graph Modularity Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we move a tiny step towards understanding the dynamics of feature representations over layers. |
Yao Lu; Wen Yang; Yunzhe Zhang; Zuohui Chen; Jinyin Chen; Qi Xuan; Zhen Wang; Xiaoniu Yang; |
479 | Latent Discriminant Deterministic Uncertainty Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most successful approaches are computationally intensive. In this work, we attempt to address these challenges in the context of autonomous driving perception tasks. |
Gianni Franchi; Xuanlong Yu; Andrei Bursuc; Emanuel Aldea; Severine Dubuisson; David Filliat; |
480 | Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a novel framework for computing visual counterfactual explanations based on two key ideas. |
Simon Vandenhende; Dhruv Mahajan; Filip Radenovic; Deepti Ghadiyaram; |
481 | HIVE: Evaluating The Human Interpretability of Visual Explanations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce HIVE (Human Interpretability of Visual Explanations), a novel human evaluation framework that assesses the utility of explanations to human users in AI-assisted decision making scenarios, and enables falsifiable hypothesis testing, cross-method comparison, and human-centered evaluation of visual interpretability methods. |
Sunnie S. Y. Kim; Nicole Meister; Vikram V. Ramaswamy; Ruth Fong; Olga Russakovsky; |
482 | BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Moreover, many of the high-performing deep learning models that are already trained and deployed are non-Bayesian in nature, and do not provide uncertainty estimates. To address these issues, we propose BayesCap that learns a Bayesian identity mapping for the frozen model, allowing uncertainty estimation. |
Uddeshya Upadhyay; Shyamgopal Karthik; Yanbei Chen; Massimiliano Mancini; Zeynep Akata; |
483 | SESS: Saliency Enhancing with Scaling and Sliding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel saliency enhancing approach called SESS (Saliency Enhancing with Scaling and Sliding). |
Osman Tursun; Simon Denman; Sridha Sridharan; Clinton Fookes; |
484 | No Token Left Behind: Explainability-Aided Image Classification and Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To mitigate it, we present a novel explainability-based approach, which adds a loss term to ensure that CLIP focuses on all relevant semantic parts of the input, in addition to employing the CLIP similarity loss used in previous works. |
Roni Paiss; Hila Chefer; Lior Wolf; |
485 | Interpretable Image Classification with Differentiable Prototypes Assignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address those shortcomings, we introduce ProtoPool, an interpretable prototype-based model with positive reasoning and three main novelties. |
Dawid Rymarczyk; Łukasz Struski; Michał Górszczak; Koryna Lewandowska; Jacek Tabor; Bartosz Zieliński; |
486 | Contributions of Shape, Texture, and Color in Visual Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We investigate the contributions of three important features of the human visual system (HVS) — shape, texture, and color — to object classification. |
Yunhao Ge; Yao Xiao; Zhi Xu; Xingrui Wang; Laurent Itti; |
487 | STEEX: Steering Counterfactual Explanations with Semantics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we address the problem of producing counterfactual explanations for high-quality images and complex scenes. |
Paul Jacob; Éloi Zablocki; Hédi Ben-Younes; Mickaël Chen; Patrick Pérez; Matthieu Cord; |
488 | Are Vision Transformers Robust to Patch Perturbations? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we study the robustness of ViT to patch-wise perturbations. |
Jindong Gu; Volker Tresp; Yao Qin; |
489 | A Dataset Generation Framework for Evaluating Megapixel Image Classifiers & Their Explanations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To investigate classification and explanation performance, we introduce a framework to (a) generate synthetic control images that reflect common properties of megapixel images and (b) evaluate average test-set correctness. |
Gautam Machiraju; Sylvia Plevritis; Parag Mallick; |
490 | Cartoon Explanations of Image Classifiers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present CartoonX (Cartoon Explanation), a novel model-agnostic explanation method tailored towards image classifiers and based on the rate-distortion explanation (RDE) framework. |
Stefan Kolek; Duc Anh Nguyen; Ron Levie; Joan Bruna; Gitta Kutyniok; |
491 | Shap-CAM: Visual Explanations for Convolutional Neural Networks Based on Shapley Value Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop a novel post-hoc visual explanation method called Shap-CAM based on class activation mapping. |
Quan Zheng; Ziwei Wang; Jie Zhou; Jiwen Lu; |
492 | Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a privacy-preserving face recognition method using differential privacy in the frequency domain. |
Jiazhen Ji; Huan Wang; Yuge Huang; Jiaxiang Wu; Xingkun Xu; Shouhong Ding; ShengChuan Zhang; Liujuan Cao; Rongrong Ji; |
493 | Contrast-Phys: Unsupervised Video-Based Remote Physiological Measurement Via Spatiotemporal Contrast Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an unsupervised rPPG measurement method that does not require ground truth signals for training. |
Zhaodong Sun; Xiaobai Li; |
494 | Source-Free Domain Adaptation with Contrastive Domain Alignment and Self-Supervised Exploration for Face Anti-Spoofing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel Source-free Domain Adaptation framework for Face Anti-Spoofing, namely SDA-FAS, that addresses the problems of source knowledge adaptation and target data exploration under the source-free setting. |
Yuchen Liu; Yabo Chen; Wenrui Dai; Mengran Gou; Chun-Ting Huang; Hongkai Xiong; |
495 | On Mitigating Hard Clusters for Face Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce two novel modules, Neighborhood-Diffusion-based Density (NDDe) and Transition-Probability-based Distance (TPDi), based on which we can simply apply the standard Density Peak Clustering algorithm with a uniform threshold. |
Yingjie Chen; Huasong Zhong; Chong Chen; Chen Shen; Jianqiang Huang; Tao Wang; Yun Liang; Qianru Sun; |
496 | OneFace: One Threshold for All Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we rethink the limitations of existing evaluation protocols for FR and propose to evaluate the performance of FR models from a new perspective. |
Jiaheng Liu; Zhipeng Yu; Haoyu Qin; Yichao Wu; Ding Liang; Gangming Zhao; Ke Xu; |
497 | Label2Label: A Language Modeling Framework for Multi-Attribute Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a simple yet generic framework named Label2Label to exploit the complex attribute correlations. |
Wanhua Li; Zhexuan Cao; Jianjiang Feng; Jie Zhou; Jiwen Lu; |
498 | AgeTransGAN for Facial Age Transformation with Rectified Performance Metrics Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose the AgeTransGAN for facial age transformation and the improvements to the metrics for performance evaluation. |
Gee-Sern Hsu; Rui-Cang Xie; Zhi-Ting Chen; Yu-Hong Lin; |
499 | Hierarchical Contrastive Inconsistency Learning for Deepfake Video Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel Hierarchical Contrastive Inconsistency Learning framework (HCIL) with a two-level contrastive paradigm. |
Zhihao Gu; Taiping Yao; Yang Chen; Shouhong Ding; Lizhuang Ma; |
500 | Rethinking Robust Representation Learning Under Fine-Grained Noisy Faces Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Different types of noisy faces can be generated by adjusting the values of N, K, and C. Based on this unified formulation, we found that the main barrier to noise-robust representation learning is the flexibility of the algorithm under different N, K, and C. For this potential problem, we constructively propose a new method, named Evolving Sub-centers Learning (ESL), to find optimal hyperplanes to accurately describe the latent space of massive noisy faces. |
Bingqi Ma; Guanglu Song; Boxiao Liu; Yu Liu; |
501 | Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an attention similarity knowledge distillation approach, which transfers attention maps obtained from a high resolution (HR) network as a teacher into an LR network as a student to boost LR recognition performance. |
Sungho Shin; Joosoon Lee; Junseok Lee; Yeonguk Yu; Kyoobin Lee; |
502 | Teaching with Soft Label Smoothing for Mitigating Noisy Labels in Facial Expressions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Recent studies have highlighted the problem of noisy labels in large scale in-the-wild facial expressions datasets due to the uncertainties caused by ambiguous facial expressions, low-quality facial images, and the subjectiveness of annotators. To solve the problem of noisy labels, we propose Soft Label Smoothing (SLS), which smooths out multiple high-confidence classes in the logits by assigning them a probability based on the corresponding confidence, and at the same time assigning a fixed low probability to the low-confidence classes. |
Tohar Lukov; Na Zhao; Gim Hee Lee; Ser-Nam Lim; |
503 | Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Dynamic Facial Radiance Fields (DFRF) for few-shot talking head synthesis, which can rapidly generalize to an unseen identity with few training data. |
Shuai Shen; Wanhua Li; Zheng Zhu; Yueqi Duan; Jie Zhou; Jiwen Lu; |
504 | CoupleFace: Relation Matters for Face Recognition Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we observe that mutual relation knowledge between samples is also important to improve the discriminative ability of the learned representation of the student model, and propose an effective face recognition distillation method called CoupleFace by additionally introducing the Mutual Relation Distillation (MRD) into existing distillation framework. |
Jiaheng Liu; Haoyu Qin; Yichao Wu; Jinyang Guo; Ding Liang; Ke Xu; |
505 | Controllable and Guided Face Synthesis for Unconstrained Face Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although significant advances have been made in face recognition (FR), FR in unconstrained environments remains challenging due to the domain gap between the semi-constrained training datasets and unconstrained testing scenarios. To address this problem, we propose a controllable face synthesis model (CFSM) that can mimic the distribution of target datasets in a style latent space. |
Feng Liu; Minchul Kim; Anil Jain; Xiaoming Liu; |
506 | Towards Robust Face Recognition with Comprehensive Search Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Previously, the research community tried to improve the performance of each single aspect but failed to present a unified solution on the joint search of the optimal designs for all three aspects. In this paper, we for the first time identify that these aspects are tightly coupled to each other. |
Manyuan Zhang; Guanglu Song; Yu Liu; Hongsheng Li; |
507 | Towards Unbiased Label Distribution Learning for Facial Pose Estimation Using Anisotropic Spherical Gaussian Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an Anisotropic Spherical Gaussian (ASG)-based LDL approach for facial pose estimation. |
Zhiwen Cao; Dongfang Liu; Qifan Wang; Yingjie Chen; |
508 | AU-Aware 3D Face Reconstruction Through Personalized AU-Specific Blendshape Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a multi-stage learning framework that recovers AU-interpretable 3D facial details by learning personalized AU-specific blendshapes from images. |
Chenyi Kuang; Zijun Cui; Jeffrey O. Kephart; Qiang Ji; |
509 | BézierPalm: A Free Lunch for Palmprint Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, by observing that palmar creases are the key information to deep-learning-based palmprint recognition, we propose to synthesize training data by manipulating palmar creases. |
Kai Zhao; Lei Shen; Yingyi Zhang; Chuhan Zhou; Tao Wang; Ruixin Zhang; Shouhong Ding; Wei Jia; Wei Shen; |
510 | Adaptive Transformers for Robust Few-Shot Cross-Domain Face Anti-Spoofing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present adaptive vision transformers (ViT) for robust cross-domain face anti-spoofing. |
Hsin-Ping Huang; Deqing Sun; Yaojie Liu; Wen-Sheng Chu; Taihong Xiao; Jinwei Yuan; Hartwig Adam; Ming-Hsuan Yang; |
511 | Face2Face$^\rho$: Real-Time High-Resolution One-Shot Face Reenactment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce Face2Face^ρ, the first Real-time High-resolution and One-shot (RHO, ρ) face reenactment framework. |
Kewei Yang; Kang Chen; Daoliang Guo; Song-Hai Zhang; Yuan-Chen Guo; Weidong Zhang; |
512 | Towards Racially Unbiased Skin Tone Estimation Via Scene Disambiguation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We find that current methods are biased towards light skin tones due to (1) strongly biased priors that prefer lighter pigmentation and (2) algorithmic solutions that disregard the light/albedo ambiguity. To address this, we propose a new evaluation dataset (FAIR) and an algorithm (TRUST) to improve albedo estimation and, hence, fairness. |
Haiwen Feng; Timo Bolkart; Joachim Tesch; Michael J. Black; Victoria Abrevaya; |
513 | BoundaryFace: A Mining Framework with Noise Label Self-Correction for Face Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, starting from the perspective of decision boundary, we propose a novel mining framework that focuses on the relationship between a sample’s ground truth class center and its nearest negative class center. |
Shijie Wu; Xun Gong; |
514 | Pre-training Strategies and Datasets for Facial Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our main two findings are: (1) Unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements for all facial tasks considered. (2) Many existing facial video datasets seem to have a large amount of redundancy. |
Adrian Bulat; Shiyang Cheng; Jing Yang; Andrew Garbett; Enrique Sanchez; Georgios Tzimiropoulos; |
515 | Look Both Ways: Self-Supervising Driver Gaze Estimation and Road Scene Saliency Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a new on-road driving dataset, called “Look Both Ways”, which contains synchronized video of both driver faces and the forward road scene, along with ground truth gaze data registered from eye tracking glasses worn by the drivers. |
Isaac Kasahara; Simon Stent; Hyun Soo Park; |
516 | MFIM: Megapixel Facial Identity Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel face-swapping framework called Megapixel Facial Identity Manipulation (MFIM). |
Sanghyeon Na; |
517 | 3D Face Reconstruction with Dense Landmarks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In answer, we present the first method that accurately predicts 10x as many landmarks as usual, covering the whole head, including the eyes and teeth. |
Erroll Wood; Tadas Baltrušaitis; Charlie Hewitt; Matthew Johnson; Jingjing Shen; Nikola Milosavljević; Daniel Wilde; Stephan Garbin; Toby Sharp; Ivan Stojiljković; Tom Cashman; Julien Valentin; |
518 | Emotion-Aware Multi-View Contrastive Learning for Facial Emotion Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a novel approach to generate features related to emotional expression through feature transformation and to use them for emotional representation learning. |
Daeha Kim; Byung Cheol Song; |
519 | Order Learning Using Partially Ordered Data Via Chainization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose the chainization algorithm for effective order learning when only partially ordered data are available. |
Seon-Ho Lee; Chang-Su Kim; |
520 | Unsupervised High-Fidelity Facial Texture Generation and Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel unified pipeline for both tasks, generation of texture with coupled geometry, and reconstruction of high-fidelity texture. |
Ron Slossberg; Ibrahim Jubran; Ron Kimmel; |
521 |