Paper Digest: ICCV 2019 Highlights
Download ICCV-2019-Paper-Digests.pdf – highlights of all 1,075 ICCV-2019 papers.
The International Conference on Computer Vision (ICCV) is one of the top computer vision conferences in the world. In 2019 it was held in Seoul, Korea. There were more than 4,300 paper submissions, of which 1,075 were accepted. More than 100 papers also published their code.
To help the community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to receive new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: ICCV 2019 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | FaceForensics++: Learning to Detect Manipulated Facial Images | Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Niessner | To standardize the evaluation of detection methods, we propose an automated benchmark for facial manipulation detection. |
2 | DeepVCP: An End-to-End Deep Neural Network for Point Cloud Registration | Weixin Lu, Guowei Wan, Yao Zhou, Xiangyu Fu, Pengfei Yuan, Shiyu Song | We present DeepVCP – a novel end-to-end learning-based 3D point cloud registration framework that achieves comparable registration accuracy to prior state-of-the-art geometric methods. |
3 | Shape Reconstruction Using Differentiable Projections and Deep Priors | Matheus Gadelha, Rui Wang, Subhransu Maji | We investigate the problem of reconstructing shapes from noisy and incomplete projections in the presence of viewpoint uncertainties. |
4 | Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization | Mans Larsson, Erik Stenborg, Carl Toft, Lars Hammarstrand, Torsten Sattler, Fredrik Kahl | In this paper, we propose a novel neural network, the Fine-Grained Segmentation Network (FGSN), that can be used to provide image segmentations with a larger number of labels and can be trained in a self-supervised fashion. |
5 | SANet: Scene Agnostic Network for Camera Localization | Luwei Yang, Ziqian Bai, Chengzhou Tang, Honghua Li, Yasutaka Furukawa, Ping Tan | This paper presents a scene agnostic neural architecture for camera localization, where model parameters and scenes are independent from each other. Despite recent advancements in learning-based methods, most approaches require training for each scene one by one, which is not applicable for online applications such as SLAM and robotic navigation, where a model must be built on-the-fly. |
6 | Total Denoising: Unsupervised Learning of 3D Point Cloud Cleaning | Pedro Hermosilla, Tobias Ritschel, Timo Ropinski | To overcome this, and to enable effective and unsupervised 3D point cloud denoising, we introduce a spatial prior term that steers convergence to the unique closest mode out of the many possible modes on the manifold. |
7 | Hierarchical Self-Attention Network for Action Localization in Videos | Rizard Renanda Adhi Pramono, Yie-Tarng Chen, Wen-Hsien Fang | This paper presents a novel Hierarchical Self-Attention Network (HISAN) to generate spatial-temporal tubes for action localization in videos. |
8 | Goal-Driven Sequential Data Abstraction | Umar Riaz Muhammad, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song | In this paper we study a general reinforcement learning based framework for learning to abstract sequential data in a goal-driven way. |
9 | Jointly Aligning Millions of Images With Deep Penalised Reconstruction Congealing | Roberto Annunziata, Christos Sagonas, Jacques Cali | To overcome these limitations, we propose an unsupervised joint alignment method leveraging a densely fused spatial transformer network to estimate the warping parameters for each image and a low-capacity auto-encoder whose reconstruction error is used as an auxiliary measure of joint alignment. |
10 | Drop to Adapt: Learning Discriminative Features for Unsupervised Domain Adaptation | Seungmin Lee, Dongwan Kim, Namil Kim, Seong-Gyun Jeong | We propose Drop to Adapt (DTA), which leverages adversarial dropout to learn strongly discriminative features by enforcing the cluster assumption. |
11 | NLNL: Negative Learning for Noisy Labels | Youngdong Kim, Junho Yim, Juseung Yun, Junmo Kim | To address this issue, we start with an indirect learning method called Negative Learning (NL), in which the CNNs are trained using a complementary label as in “input image does not belong to this complementary label.” |
12 | On the Design of Black-Box Adversarial Examples by Leveraging Gradient-Free Optimization and Operator Splitting Method | Pu Zhao, Sijia Liu, Pin-Yu Chen, Nghia Hoang, Kaidi Xu, Bhavya Kailkhura, Xue Lin | To push for further advances in this field, we introduce a general framework based on an operator splitting method, the alternating direction method of multipliers (ADMM) to devise efficient, robust black-box attacks that work with various distortion metrics and feedback settings without incurring high query complexity. |
13 | DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks | Sagnik Das, Ke Ma, Zhixin Shu, Dimitris Samaras, Roy Shilkrot | In this work, we propose DewarpNet, a deep-learning approach for document image unwarping from a single image. |
14 | Learning Robust Facial Landmark Detection via Hierarchical Structured Ensemble | Xu Zou, Sheng Zhong, Luxin Yan, Xiangyun Zhao, Jiahuan Zhou, Ying Wu | In this paper, we propose a novel Hierarchical Structured Landmark Ensemble (HSLE) model for learning robust facial landmark detection, by using it as the structural constraints. |
15 | Remote Heart Rate Measurement From Highly Compressed Facial Videos: An End-to-End Deep Learning Solution With Video Enhancement | Zitong Yu, Wei Peng, Xiaobai Li, Xiaopeng Hong, Guoying Zhao | Here we propose a two-stage, end-to-end method using hidden rPPG information enhancement and attention networks, which is the first attempt to counter video compression loss and recover rPPG signals from highly compressed videos. |
16 | Face-to-Parameter Translation for Game Character Auto-Creation | Tianyang Shi, Yi Yuan, Changjie Fan, Zhengxia Zou, Zhenwei Shi, Yong Liu | This paper proposes a method for automatically creating in-game characters of players according to an input face photo. |
17 | Visual Deprojection: Probabilistic Recovery of Collapsed Dimensions | Guha Balakrishnan, Adrian V. Dalca, Amy Zhao, John V. Guttag, Fredo Durand, William T. Freeman | We introduce visual deprojection: the task of recovering an image or video that has been collapsed along a dimension. |
18 | StructureFlow: Image Inpainting via Structure-Aware Appearance Flow | Yurui Ren, Xiaoming Yu, Ruonan Zhang, Thomas H. Li, Shan Liu, Ge Li | In order to solve this problem, in this paper, we propose a two-stage model which splits the inpainting task into two parts: structure reconstruction and texture generation. |
19 | Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization | Md Mahfuzur Rahman Siddiquee, Zongwei Zhou, Nima Tajbakhsh, Ruibin Feng, Michael B. Gotway, Yoshua Bengio, Jianming Liang | Therefore, we propose a new GAN, called Fixed-Point GAN, trained by (1) supervising same-domain translation through a conditional identity loss, and (2) regularizing cross-domain translation through revised adversarial, domain classification, and cycle consistency loss. |
20 | Generative Adversarial Training for Weakly Supervised Cloud Matting | Zhengxia Zou, Wenyuan Li, Tianyang Shi, Zhenwei Shi, Jieping Ye | We re-examine the cloud detection under a totally different point of view, i.e. to formulate it as a mixed energy separation process between foreground and background images, which can be equivalently implemented under an image matting paradigm with a clear physical significance. |
21 | PAMTRI: Pose-Aware Multi-Task Learning for Vehicle Re-Identification Using Highly Randomized Synthetic Data | Zheng Tang, Milind Naphade, Stan Birchfield, Jonathan Tremblay, William Hodge, Ratnesh Kumar, Shuo Wang, Xiaodong Yang | To address these challenges, we propose a Pose-Aware Multi-Task Re-Identification (PAMTRI) framework. |
22 | Generative Adversarial Networks for Extreme Learned Image Compression | Eirikur Agustsson, Michael Tschannen, Fabian Mentzer, Radu Timofte, Luc Van Gool | We present a learned image compression system based on GANs, operating at extremely low bitrates. |
23 | Instance-Guided Context Rendering for Cross-Domain Person Re-Identification | Yanbei Chen, Xiatian Zhu, Shaogang Gong | To tackle this limitation, we propose a novel Instance-Guided Context Rendering scheme, which transfers the source person identities into diverse target domain contexts to enable supervised re-id model learning in the unlabelled target domain. |
24 | What Else Can Fool Deep Learning? Addressing Color Constancy Errors on Deep Neural Network Performance | Mahmoud Afifi, Michael S. Brown | To address this problem, a novel augmentation method is proposed that can emulate accurate color constancy degradation. |
25 | Beyond Cartesian Representations for Local Descriptors | Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, Eduard Trulls | By contrast, we propose to extract the “support region” directly with a log-polar sampling scheme. |
26 | Distilling Knowledge From a Deep Pose Regressor Network | Muhamad Risqi U. Saputra, Pedro P. B. de Gusmao, Yasin Almalioglu, Andrew Markham, Niki Trigoni | This paper presents a novel method to distill knowledge from a deep pose regressor network for efficient Visual Odometry (VO). |
27 | Instance-Level Future Motion Estimation in a Single Image Based on Ordinal Regression | Kyung-Rae Kim, Whan Choi, Yeong Jun Koh, Seong-Gyun Jeong, Chang-Su Kim | A novel algorithm to estimate instance-level future motion in a single image is proposed in this paper. |
28 | Vision-Infused Deep Audio Inpainting | Hang Zhou, Ziwei Liu, Xudong Xu, Ping Luo, Xiaogang Wang | In this work, we consider a new task of visual information-infused audio inpainting, i.e., synthesizing missing audio segments that correspond to their accompanying videos. To facilitate a large-scale study, we collect a new multi-modality instrument-playing dataset called MUSIC-Extra-Solo (MUSICES) by enriching MUSIC dataset. |
29 | HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision | Zhen Dong, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer | Here, we introduce Hessian AWare Quantization (HAWQ), a novel second-order quantization method to address these problems. |
30 | Evaluating Robustness of Deep Image Super-Resolution Against Adversarial Attacks | Jun-Ho Choi, Huan Zhang, Jun-Hyuk Kim, Cho-Jui Hsieh, Jong-Seok Lee | This paper investigates the robustness of deep learning-based super-resolution methods against adversarial attacks, which can significantly deteriorate the super-resolved images without noticeable distortion in the attacked low-resolution images. |
31 | Overcoming Catastrophic Forgetting With Unlabeled Data in the Wild | Kibok Lee, Kimin Lee, Jinwoo Shin, Honglak Lee | To alleviate this effect, we propose to leverage a large stream of unlabeled data easily obtainable in the wild. |
32 | Symmetric Cross Entropy for Robust Learning With Noisy Labels | Yisen Wang, Xingjun Ma, Zaiyi Chen, Yuan Luo, Jinfeng Yi, James Bailey | Inspired by the symmetric KL-divergence, we propose the approach of Symmetric cross entropy Learning (SL), boosting CE symmetrically with a noise robust counterpart Reverse Cross Entropy (RCE). |
33 | Few-Shot Learning With Embedded Class Models and Shot-Free Meta Training | Avinash Ravichandran, Rahul Bhotika, Stefano Soatto | We propose a method for learning embeddings for few-shot learning that is suitable for use with any number of shots (shot-free). |
34 | Dual Directed Capsule Network for Very Low Resolution Image Recognition | Maneet Singh, Shruti Nagpal, Richa Singh, Mayank Vatsa | This research presents a novel Dual Directed Capsule Network model, termed as DirectCapsNet, for addressing VLR digit and face recognition. |
35 | Recognizing Part Attributes With Insufficient Data | Xiangyun Zhao, Yi Yang, Feng Zhou, Xiao Tan, Yuchen Yuan, Yingze Bao, Ying Wu | In order to solve the data insufficiency problem and get rid of dependence on the part annotation, we introduce a novel Concept Sharing Network (CSN) for part attribute recognition. |
36 | USIP: Unsupervised Stable Interest Point Detection From 3D Point Clouds | Jiaxin Li, Gim Hee Lee | In this paper, we propose the USIP detector: an Unsupervised Stable Interest Point detector that can detect highly repeatable and accurately localized keypoints from 3D point clouds under arbitrary transformations without the need for any ground truth training data. |
37 | Mixed High-Order Attention Network for Person Re-Identification | Binghui Chen, Weihong Deng, Jiani Hu | However, state-of-the-art works concentrate only on coarse or first-order attention design, e.g. spatial and channel attention, while rarely exploring higher-order attention mechanisms. We take a step towards addressing this problem. |
38 | Budget-Aware Adapters for Multi-Domain Learning | Rodrigo Berriel, Stephane Lathuillere, Moin Nabi, Tassilo Klein, Thiago Oliveira-Santos, Nicu Sebe, Elisa Ricci | To implement this idea we derive specialized deep models for each domain by adapting a pre-trained architecture but, differently from other methods, we propose a novel strategy to automatically adjust the computational complexity of the network. |
39 | Compact Trilinear Interaction for Visual Question Answering | Tuong Do, Thanh-Toan Do, Huy Tran, Erman Tjiputra, Quang D. Tran | Thus, to selectively utilize image, question and answer information, we propose a novel trilinear interaction model which simultaneously learns high level associations between these three inputs. |
40 | Towards Latent Attribute Discovery From Triplet Similarities | Ishan Nigam, Pavel Tokmakov, Deva Ramanan | We introduce Latent Similarity Networks (LSNs): a simple and effective technique to discover the underlying latent notions of similarity in data without any explicit attribute supervision. |
41 | GeoStyle: Discovering Fashion Trends and Events | Utkarsh Mall, Kevin Matzen, Bharath Hariharan, Noah Snavely, Kavita Bala | In this paper we address this need by providing an automatic framework that analyzes large corpora of street imagery to (a) discover and forecast long-term trends of various fashion attributes as well as automatically discovered styles, and (b) identify spatio-temporally localized events that affect what people wear. |
42 | Towards Adversarially Robust Object Detection | Haichao Zhang, Jianyu Wang | In this work, we take an initial attempt towards this direction. |
43 | Automatic and Robust Skull Registration Based on Discrete Uniformization | Junli Zhao, Xin Qi, Chengfeng Wen, Na Lei, Xianfeng Gu | In this work, we propose an automatic skull registration method based on the discrete uniformization theory, which can handle complicated topologies and is robust to low quality meshes. |
44 | Few-Shot Image Recognition With Knowledge Transfer | Zhimao Peng, Zechao Li, Junge Zhang, Yan Li, Guo-Jun Qi, Jinhui Tang | Inspired from this, we propose a novel Knowledge Transfer Network architecture (KTN) for few-shot image recognition. |
45 | Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings | Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen | In this paper, we propose to enrich the embedding by disentangling parts-of-speech (PoS) in the accompanying captions. |
46 | Vehicle Re-Identification in Aerial Imagery: Dataset and Approach | Peng Wang, Bingliang Jiao, Lu Yang, Yifei Yang, Shizhou Zhang, Wei Wei, Yanning Zhang | In this work, we construct a large-scale dataset for vehicle re-identification (ReID), which contains 137k images of 13k vehicle instances captured by UAV-mounted cameras. |
47 | Bridging the Domain Gap for Ground-to-Aerial Image Matching | Krishna Regmi, Mubarak Shah | We propose a novel method for solving this task by exploiting the generative powers of conditional GANs to synthesize an aerial representation of a ground-level panorama query and use it to minimize the domain gap between the two views. |
48 | A Robust Learning Approach to Domain Adaptive Object Detection | Mehran Khodabandeh, Arash Vahdat, Mani Ranjbar, William G. Macready | In this paper, we address the domain adaptation problem from the perspective of robust learning and show that the problem may be formulated as training with noisy labels. |
49 | Graph-Based Object Classification for Neuromorphic Vision Sensing | Yin Bi, Aaron Chadha, Alhabib Abbas, Eirina Bourtsoulatze, Yiannis Andreopoulos | To circumvent this mismatch between sensing and processing with CNNs, we propose a compact graph representation for NVS. |
50 | Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving | Jiwoong Choi, Dayoung Chun, Hyun Kim, Hyuk-Jae Lee | This paper proposes a method for improving the detection accuracy while supporting a real-time operation by modeling the bounding box (bbox) of YOLOv3, which is the most representative of one-stage detectors, with a Gaussian parameter and redesigning the loss function. |
51 | Sharpen Focus: Learning With Attention Separability and Consistency | Lezi Wang, Ziyan Wu, Srikrishna Karanam, Kuan-Chuan Peng, Rajat Vikram Singh, Bo Liu, Dimitris N. Metaxas | In this paper, we address this problem by means of a new framework that makes class-discriminative attention a principled part of the learning process. |
52 | Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition | Tianshui Chen, Muxin Xu, Xiaolu Hui, Hefeng Wu, Liang Lin | To address these issues, we propose a Semantic-Specific Graph Representation Learning (SSGRL) framework that consists of two crucial modules: 1) a semantic decoupling module that incorporates category semantics to guide learning semantic-specific representations and 2) a semantic interaction module that correlates these representations with a graph built on the statistical label co-occurrence and explores their interactions via a graph propagation mechanism. |
53 | DeceptionNet: Network-Driven Domain Randomization | Sergey Zakharov, Wadim Kehl, Slobodan Ilic | We present a novel approach to tackle domain adaptation between synthetic and real data. |
54 | Pose-Guided Feature Alignment for Occluded Person Re-Identification | Jiaxu Miao, Yu Wu, Ping Liu, Yuhang Ding, Yi Yang | In this paper, we introduce a novel method named Pose-Guided Feature Alignment (PGFA), exploiting pose landmarks to disentangle the useful information from the occlusion noise. Besides, we construct a large-scale dataset for the Occluded Person Re-ID problem, namely Occluded-DukeMTMC, which is by far the largest dataset for occluded person Re-ID. |
55 | Robust Person Re-Identification by Modelling Feature Uncertainty | Tianyuan Yu, Da Li, Yongxin Yang, Timothy M. Hospedales, Tao Xiang | In this paper, we propose a novel deep network termed DistributionNet for robust ReID. |
56 | Co-Segmentation Inspired Attention Networks for Video-Based Person Re-Identification | Arulkumar Subramaniam, Athira Nambiar, Anurag Mittal | In this work, we propose a novel Co-segmentation inspired video Re-ID deep architecture and formulate a Co-segmentation based Attention Module (COSAM) that activates a common set of salient features across multiple frames of a video via mutual consensus in an unsupervised manner. |
57 | A Delay Metric for Video Object Detection: What Average Precision Fails to Tell | Huizi Mao, Xiaodong Yang, William J. Dally | In this paper, we analyze the object detection from video and point out that mAP alone is not sufficient to capture the temporal nature of video object detection. |
58 | IL2M: Class Incremental Learning With Dual Memory | Eden Belouadah, Adrian Popescu | This paper presents a class incremental learning (IL) method which exploits fine tuning and a dual memory to reduce the negative effect of catastrophic forgetting in image recognition. |
59 | Asymmetric Non-Local Neural Networks for Semantic Segmentation | Zhen Zhu, Mengde Xu, Song Bai, Tengteng Huang, Xiang Bai | In this paper, we present an Asymmetric Non-local Neural Network for semantic segmentation, which has two prominent components: Asymmetric Pyramid Non-local Block (APNB) and Asymmetric Fusion Non-local Block (AFNB). |
60 | CCNet: Criss-Cross Attention for Semantic Segmentation | Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, Wenyu Liu | In this work, we propose a Criss-Cross Network (CCNet) for obtaining such contextual information in a more effective and efficient way. |
61 | Convex Shape Prior for Multi-Object Segmentation Using a Single Level Set Function | Shousheng Luo, Xue-Cheng Tai, Limei Huo, Yang Wang, Roland Glowinski | This paper proposes a method to incorporate a convex shape prior for multi-object segmentation using the level set method. |
62 | Feature Weighting and Boosting for Few-Shot Segmentation | Khoi Nguyen, Sinisa Todorovic | We make two contributions by: (1) Improving discriminativeness of features so their activations are high on the foreground and low elsewhere; and (2) Boosting inference with an ensemble of experts guided with the gradient of loss incurred when segmenting the support images in testing. |
63 | Surface Networks via General Covers | Niv Haim, Nimrod Segol, Heli Ben-Hamu, Haggai Maron, Yaron Lipman | This paper tackles the problem of sphere-type surface learning by developing a novel surface-to-image representation. |
64 | SSAP: Single-Shot Instance Segmentation With Affinity Pyramid | Naiyu Gao, Yanhu Shan, Yupei Wang, Xin Zhao, Yinan Yu, Ming Yang, Kaiqi Huang | To this end, this work proposes a single-shot proposal-free instance segmentation method that requires only one single pass for prediction. |
65 | Learning Propagation for Arbitrarily-Structured Data | Sifei Liu, Xueting Li, Varun Jampani, Shalini De Mello, Jan Kautz | In this paper, we propose to learn pairwise relations among data points in a global fashion to improve semantic segmentation with arbitrarily-structured data, through spatial generalized propagation networks (SGPN). |
66 | MultiSeg: Semantically Meaningful, Scale-Diverse Segmentations From Minimal User Input | Jun Hao Liew, Scott Cohen, Brian Price, Long Mai, Sim-Heng Ong, Jiashi Feng | Motivated by the observation that the object part, full object, and a collection of objects essentially differ in size, we propose a new concept called scale-diversity, which characterizes the spectrum of segmentations w.r.t. different scales. |
67 | Robust Motion Segmentation From Pairwise Matches | Federica Arrigoni, Tomas Pajdla | In this paper we consider the problem of motion segmentation, where only pairwise correspondences are assumed as input without prior knowledge about tracks. |
68 | InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting | Hao-Shu Fang, Jianhua Sun, Runzhong Wang, Minghao Gou, Yong-Lu Li, Cewu Lu | In this paper, we present a simple, efficient and effective method to augment the training set using the existing instance mask annotations. |
69 | Racial Faces in the Wild: Reducing Racial Bias by Information Maximization Adaptation Network | Mei Wang, Weihong Deng, Jiani Hu, Xunqiang Tao, Yaohai Huang | This unsupervised method simultaneously aligns global distribution to decrease race gap at domain-level, and learns the discriminative target representations at cluster level. |
70 | Uncertainty Modeling of Contextual-Connections Between Tracklets for Unconstrained Video-Based Face Recognition | Jingxiao Zheng, Ruichi Yu, Jun-Cheng Chen, Boyu Lu, Carlos D. Castillo, Rama Chellappa | In this paper, we propose the Uncertainty-Gated Graph (UGG), which conducts graph-based identity propagation between tracklets, which are represented by nodes in a graph. |
71 | Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading | Xingxuan Zhang, Feng Cheng, Shilin Wang | To well solve these drawbacks, we propose a Temporal Focal block to sufficiently describe short-range dependencies and a Spatio-Temporal Fusion Module (STFM) to maintain the local spatial information and to reduce the feature dimensions as well. |
72 | Occlusion-Aware Networks for 3D Human Pose Estimation in Video | Yu Cheng, Bo Yang, Bo Wang, Wending Yan, Robby T. Tan | To address this problem, we introduce an occlusion-aware deep-learning framework. |
73 | Context-Aware Feature and Label Fusion for Facial Action Unit Intensity Estimation With Partially Labeled Data | Yong Zhang, Haiyong Jiang, Baoyuan Wu, Yanbo Fan, Qiang Ji | In this paper, we propose a novel weakly supervised patch-based deep model on basis of two types of attention mechanisms for joint intensity estimation of multiple AUs. |
74 | Distill Knowledge From NRSfM for Weakly Supervised 3D Pose Learning | Chaoyang Wang, Chen Kong, Simon Lucey | We propose to learn a 3D pose estimator by distilling knowledge from Non-Rigid Structure from Motion (NRSfM). |
75 | MONET: Multiview Semi-Supervised Keypoint Detection via Epipolar Divergence | Yuan Yao, Yasamin Jafarian, Hyun Soo Park | This paper presents MONET—an end-to-end semi-supervised learning framework for a keypoint detector using multiview image streams. |
76 | Occlusion Robust Face Recognition Based on Mask Learning With Pairwise Differential Siamese Network | Lingxue Song, Dihong Gong, Zhifeng Li, Changsong Liu, Wei Liu | Inspired by the fact that human visual system explicitly ignores the occlusion and only focuses on the non-occluded facial areas, we propose a mask learning strategy to find and discard corrupted feature elements from recognition. |
77 | Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection | Xuanyi Dong, Yi Yang | In this paper, we study facial landmark detection from partially labeled facial images. |
78 | A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image | Fu Xiong, Boshen Zhang, Yang Xiao, Zhiguo Cao, Taidong Yu, Joey Tianyi Zhou, Junsong Yuan | For 3D hand and body pose estimation task in depth image, a novel anchor-based approach termed Anchor-to-Joint regression network (A2J) with the end-to-end learning ability is proposed. |
79 | TexturePose: Supervising Human Mesh Estimation With Texture Consistency | Georgios Pavlakos, Nikos Kolotouros, Kostas Daniilidis | In this work, we advocate that there are more cues we can leverage, which are available for free in natural images, i.e., without getting more annotations, or modifying the network architecture. |
80 | FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape From Single RGB Images | Christian Zimmermann, Duygu Ceylan, Jimei Yang, Bryan Russell, Max Argus, Thomas Brox | In this paper, we analyze cross-dataset generalization when training on existing datasets. As a consequence, we introduce the first large-scale, multi-view hand dataset that is accompanied by both 3D hand pose and shape annotations. |
81 | Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles | Nitin Saini, Eric Price, Rahul Tallamraju, Raffi Enficiaud, Roman Ludwig, Igor Martinovic, Aamir Ahmad, Michael J. Black | To make motion capture truly unconstrained, we describe the first fully autonomous outdoor capture system based on flying vehicles. |
82 | Toyota Smarthome: Real-World Activities of Daily Living | Srijan Das, Rui Dai, Michal Koperski, Luca Minciullo, Lorenzo Garattoni, Francois Bremond, Gianpiero Francesca | In this paper, we introduce a large real-world video dataset for activities of daily living: Toyota Smarthome. We release the dataset for research use. |
83 | Relation Parsing Neural Network for Human-Object Interaction Detection | Penghao Zhou, Mingmin Chi | In this paper, we propose a novel model, i.e., Relation Parsing Neural Network (RPNN), to detect human-object interactions. |
84 | DistInit: Learning Video Representations Without a Single Labeled Video | Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan | In this work we propose an alternative approach to learning video representations that requires no semantically labeled videos, and instead leverages the years of effort in collecting and labeling large and clean still-image datasets. |
85 | Zero-Shot Anticipation for Instructional Activities | Fadime Sener, Angela Yao | We address the problem of zero-shot anticipation by presenting a hierarchical model that generalizes instructional knowledge from large-scale text-corpora and transfers the knowledge to the visual domain. To demonstrate the anticipation capabilities of our model, we introduce the Tasty Videos dataset, a collection of 2511 recipes for zero-shot learning, recognition and anticipation. |
86 | Making the Invisible Visible: Action Recognition Through Walls and Occlusions | Tianhong Li, Lijie Fan, Mingmin Zhao, Yingcheng Liu, Dina Katabi | In this paper, we introduce a neural network model that can detect human actions through walls and occlusions, and in poor lighting conditions. |
87 | Recursive Visual Sound Separation Using Minus-Plus Net | Xudong Xu, Bo Dai, Dahua Lin | In this paper we propose a novel framework, referred to as MinusPlus Network (MP-Net), for the task of visual sound separation. |
88 | Unsupervised Video Interpolation Using Cycle Consistency | Fitsum A. Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro | Here, we propose unsupervised techniques to synthesize high frame rate videos directly from low frame rate videos using cycle consistency. |
89 | Deformable Surface Tracking by Graph Matching | Tao Wang, Haibin Ling, Congyan Lang, Songhe Feng, Xiaohui Hou | Specifically, we propose a graph-based approach that effectively explores the structure information of the surface to enhance tracking performance. |
90 | Deep Meta Learning for Real-Time Target-Aware Visual Tracking | Janghoon Choi, Junseok Kwon, Kyoung Mu Lee | In this paper, we propose a novel on-line visual tracking framework based on the Siamese matching network and meta-learner network, which run at real-time speeds. |
91 | Looking to Relations for Future Trajectory Forecast | Chiho Choi, Behzad Dariush | To this end, we propose a relation-aware framework for future trajectory forecast. |
92 | Anchor Diffusion for Unsupervised Video Object Segmentation | Zhao Yang, Qiang Wang, Luca Bertinetto, Weiming Hu, Song Bai, Philip H. S. Torr | Inspired by the non-local operators, we introduce a technique to establish dense correspondences between pixel embeddings of a reference “anchor” frame and the current one. |
93 | Tracking Without Bells and Whistles | Philipp Bergmann, Tim Meinhardt, Laura Leal-Taixe | We present a tracker (without bells and whistles) that accomplishes tracking without specifically targeting any of these tasks, in particular, we perform no training or optimization on tracking data. |
94 | Perspective-Guided Convolution Networks for Crowd Counting | Zhaoyi Yan, Yuchen Yuan, Wangmeng Zuo, Xiao Tan, Yezhen Wang, Shilei Wen, Errui Ding | In this paper, we propose a novel perspective-guided convolution (PGC) for convolutional neural network (CNN) based crowd counting (i.e. PGCNet), which aims to overcome the dramatic intra-scene scale variations of people due to the perspective effect. |
95 | End-to-End Wireframe Parsing | Yichao Zhou, Haozhi Qi, Yi Ma | We present a conceptually simple yet effective algorithm to detect wireframes in a given image. |
96 | Incremental Class Discovery for Semantic Segmentation With RGBD Sensing | Yoshikatsu Nakajima, Byeongkeun Kang, Hideo Saito, Kris Kitani | Towards a more open world approach, we propose a novel method that incrementally learns new classes for image segmentation. |
97 | SSF-DAN: Separated Semantic Feature Based Domain Adaptation Network for Semantic Segmentation | Liang Du, Jingang Tan, Hongye Yang, Jianfeng Feng, Xiangyang Xue, Qibao Zheng, Xiaoqing Ye, Xiaolin Zhang | In this work, we propose a Separated Semantic Feature based domain adaptation network, named SSF-DAN, for semantic segmentation. |
98 | SpaceNet MVOI: A Multi-View Overhead Imagery Dataset | Nicholas Weir, David Lindenbaum, Alexei Bastidas, Adam Van Etten, Sean McPherson, Jacob Shermeyer, Varun Kumar, Hanlin Tang | To address this problem, we present an open source Multi-View Overhead Imagery dataset, termed SpaceNet MVOI, with 27 unique looks from a broad range of viewing angles (-32.5 degrees to 54.0 degrees). |
99 | Multi-Level Bottom-Top and Top-Bottom Feature Fusion for Crowd Counting | Vishwanath A. Sindagi, Vishal M. Patel | Specifically, we present a network that involves: (i) a multi-level bottom-top and top-bottom fusion (MBTTBF) method to combine information from shallower to deeper layers and vice versa at multiple levels, (ii) scale complementary feature extraction blocks (SCFB) involving cross-scale residual functions to explicitly enable flow of complementary features from adjacent conv layers along the fusion paths. |
100 | Learning Lightweight Lane Detection CNNs by Self Attention Distillation | Yuenan Hou, Zheng Ma, Chunxiao Liu, Chen Change Loy | In this paper, we present a novel knowledge distillation approach, i.e., Self Attention Distillation (SAD), which allows a model to learn from itself and gains substantial improvement without any additional supervision or labels. |
101 | SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation | Daniel Gordon, Abhishek Kadian, Devi Parikh, Judy Hoffman, Dhruv Batra | We propose SplitNet, a method for decoupling visual perception and policy learning. |
102 | Cascaded Parallel Filtering for Memory-Efficient Image-Based Localization | Wentao Cheng, Weisi Lin, Kan Chen, Xinfeng Zhang | In this work, we propose a cascaded parallel filtering method that leverages the feature, visibility and geometry information to filter wrong matches under binary feature representation. |
103 | Pixel2Mesh++: Multi-View 3D Mesh Generation via Deformation | Chao Wen, Yinda Zhang, Zhuwen Li, Yanwei Fu | We study the problem of shape generation in 3D mesh representation from a few color images with known camera poses. |
104 | A Differential Volumetric Approach to Multi-View Photometric Stereo | Fotios Logothetis, Roberto Mecca, Roberto Cipolla | In this work, we present a volumetric approach to the multi-view photometric stereo problem. |
105 | Revisiting Radial Distortion Absolute Pose | Viktor Larsson, Torsten Sattler, Zuzana Kukelova, Marc Pollefeys | We present a general approach which can handle rational models of arbitrary degree for both distortion and undistortion. |
106 | Estimating the Fundamental Matrix Without Point Correspondences With Application to Transmission Imaging | Tobias Wurfl, Andre Aichert, Nicole Maass, Frank Dennerlein, Andreas Maier | We present a general method to estimate the fundamental matrix from a pair of images under perspective projection without the need for image point correspondences. |
107 | QUARCH: A New Quasi-Affine Reconstruction Stratum From Vague Relative Camera Orientation Knowledge | Devesh Adlakha, Adlane Habed, Fabio Morbidi, Cedric Demonceaux, Michel de Mathelin | We present a new quasi-affine reconstruction of a scene and its application to camera self-calibration. |
108 | Homography From Two Orientation- and Scale-Covariant Features | Daniel Barath, Zuzana Kukelova | This paper proposes a geometric interpretation of the angles and scales which the orientation- and scale-covariant feature detectors, e.g. SIFT, provide. |
109 | Hiding Video in Audio via Reversible Generative Models | Hyukryul Yang, Hao Ouyang, Vladlen Koltun, Qifeng Chen | We present a method for hiding video content inside audio files while preserving the perceptual fidelity of the cover audio. |
110 | GSLAM: A General SLAM Framework and Benchmark | Yong Zhao, Shibiao Xu, Shuhui Bu, Hongkai Jiang, Pengcheng Han | In this paper, we propose a novel SLAM platform named GSLAM, which not only provides evaluation functionality, but also supplies a useful toolkit for researchers to quickly develop their SLAM systems. |
111 | Elaborate Monocular Point and Line SLAM With Robust Initialization | Sang Jun Lee, Sung Soo Hwang | This paper presents a monocular indirect SLAM system which performs robust initialization and accurate localization. |
112 | Adaptive Density Map Generation for Crowd Counting | Jia Wan, Antoni Chan | To address this issue, we first show the impact of different density maps and that better ground-truth density maps can be obtained by refining the existing ones using a learned refinement network, which is jointly trained with the counter. Then, we propose an adaptive density map generator, which takes the annotation dot map as input, and learns a density map representation for a counter. |
113 | Attention-Aware Polarity Sensitive Embedding for Affective Image Retrieval | Xingxu Yao, Dongyu She, Sicheng Zhao, Jie Liang, Yu-Kun Lai, Jufeng Yang | To address the problem, this paper introduces an Attention-aware Polarity Sensitive Embedding (APSE) network to learn affective representations in an end-to-end manner. |
114 | Zero-Shot Emotion Recognition via Affective Structural Embedding | Chi Zhan, Dongyu She, Sicheng Zhao, Ming-Ming Cheng, Jufeng Yang | In this paper, we investigate zero-shot learning (ZSL) problem in the emotion recognition task, which tries to recognize the new unseen emotions. |
115 | FW-GAN: Flow-Navigated Warping GAN for Video Virtual Try-On | Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bowen Wu, Bing-Cheng Chen, Jian Yin | In this work, we propose Flow-navigated Warping Generative Adversarial Network (FW-GAN), a novel framework that learns to synthesize the video of virtual try-on based on a person image, the desired clothes image, and a series of target poses. |
116 | Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation | Arnab Ghosh, Richard Zhang, Puneet K. Dokania, Oliver Wang, Alexei A. Efros, Philip H. S. Torr, Eli Shechtman | In order to use a single model for a wide array of object classes, we introduce a gating-based approach for class conditioning, which allows us to generate distinct classes without feature mixing, from a single generator network. |
117 | Attention-Based Autism Spectrum Disorder Screening With Privileged Modality | Shi Chen, Qi Zhao | This paper presents a novel framework for automatic and quantitative screening of autism spectrum disorder (ASD). |
118 | Image Aesthetic Assessment Based on Pairwise Comparison A Unified Approach to Score Regression, Binary Classification, and Personalization | Jun-Tae Lee, Chang-Su Kim | We propose a unified approach to three tasks of aesthetic score regression, binary aesthetic classification, and personalized aesthetics. |
119 | Delving Into Robust Object Detection From Unmanned Aerial Vehicles: A Deep Nuisance Disentanglement Approach | Zhenyu Wu, Karthik Suresh, Priya Narayanan, Hongyu Xu, Heesung Kwon, Zhangyang Wang | We propose to utilize those free meta-data in conjunction with associated UAV images to learn domain-robust features via an adversarial training framework dubbed Nuisance Disentangled Feature Transform (NDFT), for the specific challenging problem of object detection in UAV images, achieving a substantial gain in robustness to those nuisances. |
120 | Bit-Flip Attack: Crushing Neural Network With Progressive Bit Search | Adnan Siraj Rakin, Zhezhi He, Deliang Fan | In this work, we are the first to propose a novel DNN weight attack methodology called Bit-Flip Attack (BFA) which can crush a neural network through maliciously flipping an extremely small number of bits within its weight storage memory system (i.e., DRAM). |
121 | Pushing the Frontiers of Unconstrained Crowd Counting: New Dataset and Benchmark Method | Vishwanath A. Sindagi, Rajeev Yasarla, Vishal M. Patel | In this work, we propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation. |
122 | Employing Deep Part-Object Relationships for Salient Object Detection | Yi Liu, Qiang Zhang, Dingwen Zhang, Jungong Han | To solve this problem, we dig into part-object relationships and take the unprecedented attempt to employ these relationships endowed by the Capsule Network (CapsNet) for salient object detection. |
123 | Self-Supervised Deep Depth Denoising | Vladimiros Sterzentsenko, Leonidas Saroglou, Anargyros Chatzitofis, Spyridon Thermos, Nikolaos Zioulis, Alexandros Doumanoglou, Dimitrios Zarpalas, Petros Daras | In this paper, we propose a fully convolutional deep autoencoder that learns to denoise depth maps despite the lack of ground truth data. |
124 | Cost-Aware Fine-Grained Recognition for IoTs Based on Sequential Fixations | Hanxiao Wang, Venkatesh Saligrama, Stan Sclaroff, Vitaly Ablavsky | We propose a novel deep reinforcement learning-based foveation model, DRIFT, that sequentially generates and recognizes mixed-acuity images. |
125 | Layout-Induced Video Representation for Recognizing Agent-in-Place Actions | Ruichi Yu, Hongcheng Wang, Ang Li, Jingxiao Zheng, Vlad I. Morariu, Larry S. Davis | We introduce a novel representation to model the geometry and topology of scene layouts so that a network can generalize from the layouts observed in the training scenes to unseen scenes in the test set. |
126 | Anomaly Detection in Video Sequence With Appearance-Motion Correspondence | Trong-Nguyen Nguyen, Jean Meunier | We propose a deep convolutional neural network (CNN) that addresses this problem by learning a correspondence between common object appearances (e.g. pedestrian, background, tree, etc.) and their associated motions. |
127 | Exploring Randomly Wired Neural Networks for Image Recognition | Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He | In this paper, we explore a more diverse set of connectivity patterns through the lens of randomly wired neural networks. |
128 | Progressive Differentiable Architecture Search: Bridging the Depth Gap Between Search and Evaluation | Xin Chen, Lingxi Xie, Jun Wu, Qi Tian | In this paper, we present an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. |
129 | Multinomial Distribution Learning for Effective Neural Architecture Search | Xiawu Zheng, Rongrong Ji, Lang Tang, Baochang Zhang, Jianzhuang Liu, Qi Tian | In this paper, we propose a Multinomial Distribution Learning for extremely effective NAS, which considers the search space as a joint multinomial distribution, i.e., the operation between two nodes is sampled from this distribution, and the optimal network structure is obtained by the operations with the most likely probability in this distribution. |
130 | Searching for MobileNetV3 | Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam | We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. |
131 | Data-Free Quantization Through Weight Equalization and Bias Correction | Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling | We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. |
132 | A Camera That CNNs: Towards Embedded Neural Networks on Pixel Processor Arrays | Laurie Bose, Jianing Chen, Stephen J. Carey, Piotr Dudek, Walterio Mayol-Cuevas | We present a convolutional neural network implementation for pixel processor array (PPA) sensors. |
133 | Knowledge Distillation via Route Constrained Optimization | Xiao Jin, Baoyun Peng, Yichao Wu, Yu Liu, Jiaheng Liu, Ding Liang, Junjie Yan, Xiaolin Hu | In this work, we consider the knowledge distillation from the perspective of curriculum learning by teacher’s routing. |
134 | Distillation-Based Training for Multi-Exit Architectures | Mary Phuong, Christoph H. Lampert | In this work, we propose a new training procedure for multi-exit architectures based on the principle of knowledge distillation. |
135 | Similarity-Preserving Knowledge Distillation | Frederick Tung, Greg Mori | In this paper, we propose a new form of knowledge distillation loss that is inspired by the observation that semantically similar inputs tend to elicit similar activation patterns in a trained network. |
136 | Many Task Learning With Task Routing | Gjorgji Strezoski, Nanne van Noord, Marcel Worring | In this paper, we introduce a method which applies a conditional feature-wise transformation over the convolutional activations that enables a model to successfully perform a large number of tasks. |
137 | Stochastic Filter Groups for Multi-Task CNNs: Learning Specialist and Generalist Convolution Kernels | Felix J.S. Bragman, Ryutaro Tanno, Sebastien Ourselin, Daniel C. Alexander, Jorge Cardoso | In this paper, we present a probabilistic approach to learning task-specific and shared representations in CNNs for multi-task learning. |
138 | Transferability and Hardness of Supervised Classification Tasks | Anh T. Tran, Cuong V. Nguyen, Tal Hassner | We propose a novel approach for estimating the difficulty and transferability of supervised classification tasks. |
139 | Moment Matching for Multi-Source Domain Adaptation | Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, Bo Wang | We make three major contributions towards addressing this problem. First, we collect and annotate by far the largest UDA dataset, called DomainNet, which contains six domains and about 0.6 million images distributed among 345 categories, addressing the gap in data availability for multi-source UDA research. Second, we propose a new deep learning approach, Moment Matching for Multi-Source Domain Adaptation (M3SDA), which aims to transfer knowledge learned from multiple labeled source domains to an unlabeled target domain by dynamically aligning moments of their feature distributions. |
140 | Unsupervised Domain Adaptation via Regularized Conditional Alignment | Safa Cicek, Stefano Soatto | We propose a method for unsupervised domain adaptation that trains a shared embedding to align the joint distributions of inputs (domain) and outputs (classes), making any classifier agnostic to the domain. |
141 | Larger Norm More Transferable: An Adaptive Feature Norm Approach for Unsupervised Domain Adaptation | Ruijia Xu, Guanbin Li, Jihan Yang, Liang Lin | In this paper, we empirically reveal that the erratic discrimination of the target domain mainly stems from its much smaller feature norms with respect to that of the source domain. |
142 | UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation | Jogendra Nath Kundu, Nishank Lakkakula, R. Venkatesh Babu | In this paper, we propose UM-Adapt – a unified framework to effectively perform unsupervised domain adaptation for spatially-structured prediction tasks, simultaneously maintaining a balanced performance across individual tasks in a multi-task setting. |
143 | Episodic Training for Domain Generalization | Da Li, Jianshu Zhang, Yongxin Yang, Cong Liu, Yi-Zhe Song, Timothy M. Hospedales | In this paper we build on this strong baseline by designing an episodic training procedure that trains a single deep network in a way that exposes it to the domain shift that characterises a novel domain at runtime. |
144 | Domain Adaptation for Structured Output via Discriminative Patch Representations | Yi-Hsuan Tsai, Kihyuk Sohn, Samuel Schulter, Manmohan Chandraker | We propose to learn discriminative feature representations of patches in the source domain by discovering multiple modes of patch-wise output distribution through the construction of a clustered space. |
145 | Semi-Supervised Learning by Augmented Distribution Alignment | Qin Wang, Wen Li, Luc Van Gool | In this work, we propose a simple yet effective semi-supervised learning approach called Augmented Distribution Alignment. |
146 | S4L: Self-Supervised Semi-Supervised Learning | Xiaohua Zhai, Avital Oliver, Alexander Kolesnikov, Lucas Beyer | Unifying these two approaches, we propose the framework of self-supervised semi-supervised learning (S4L) and use it to derive two novel semi-supervised image classification methods. |
147 | Privacy Preserving Image Queries for Camera Localization | Pablo Speciale, Johannes L. Schonberger, Sudipta N. Sinha, Marc Pollefeys | We propose to conceal the content of the query images from an adversary on the server or a man-in-the-middle intruder. |
148 | Calibration Wizard: A Guidance System for Camera Calibration Based on Modelling Geometric and Corner Uncertainty | Songyou Peng, Peter Sturm | We present a system — Calibration Wizard — that interactively guides a user towards taking optimal calibration images. |
149 | Gated2Depth: Real-Time Dense Lidar From Gated Images | Tobias Gruber, Frank Julca-Aguilar, Mario Bijelic, Felix Heide | We present an imaging framework which converts three images from a gated camera into high-resolution depth maps with depth accuracy comparable to pulsed lidar measurements. |
150 | X-Section: Cross-Section Prediction for Enhanced RGB-D Fusion | Andrea Nicastro, Ronald Clark, Stefan Leutenegger | Here, we propose X-Section, an RGB-D 3D reconstruction approach that leverages deep learning to make object-level predictions about thicknesses that can be readily integrated into a volumetric multi-view fusion process, where we propose an extension to the popular KinectFusion approach. |
151 | Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo | Stepan Tulyakov, Francois Fleuret, Martin Kiefel, Peter Gehler, Michael Hirsch | To address this problem we introduce a new module for event sequence embedding, for use in different applications. |
152 | Point-Based Multi-View Stereo Network | Rui Chen, Songfang Han, Jing Xu, Hao Su | We introduce Point-MVSNet, a novel point-based deep framework for multi-view stereo (MVS). |
153 | Discrete Laplace Operator Estimation for Dynamic 3D Reconstruction | Xiangyu Xu, Enrique Dunn | We present a general paradigm for dynamic 3D reconstruction from multiple independent and uncontrolled image sources having arbitrary temporal sampling density and distribution. |
154 | Deep Non-Rigid Structure From Motion | Chen Kong, Simon Lucey | In this paper we propose a novel deep neural network to recover camera poses and 3D points solely from an ensemble of 2D image coordinates. |
155 | Equivariant Multi-View Networks | Carlos Esteves, Yinshuang Xu, Christine Allen-Blanchette, Kostas Daniilidis | In this paper, we propose a group convolutional approach to multiple view aggregation where convolutions are performed over a discrete subgroup of the rotation group, enabling, thus, joint reasoning over all views in an equivariant (instead of invariant) fashion, up to the very last layer. |
156 | Interpolated Convolutional Networks for 3D Point Cloud Understanding | Jiageng Mao, Xiaogang Wang, Hongsheng Li | In this paper, we propose a novel Interpolated Convolution operation, InterpConv, to tackle the point cloud feature learning and understanding problem. |
157 | Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data | Mikaela Angelina Uy, Quang-Hieu Pham, Binh-Son Hua, Thanh Nguyen, Sai-Kit Yeung | To prove this, we introduce ScanObjectNN, a new real-world point cloud object dataset based on scanned indoor scene data. |
158 | PointCloud Saliency Maps | Tianhang Zheng, Changyou Chen, Junsong Yuan, Bo Li, Kui Ren | In this paper, we propose a novel way of characterizing critical points and segments to build point-cloud saliency maps. |
159 | ShellNet: Efficient Point Cloud Convolutional Neural Networks Using Concentric Shells Statistics | Zhiyuan Zhang, Binh-Son Hua, Sai-Kit Yeung | In this paper, we address these problems by proposing an efficient end-to-end permutation invariant convolution for point cloud deep learning. |
160 | Unsupervised Deep Learning for Structured Shape Matching | Jean-Michel Roufosse, Abhishek Sharma, Maks Ovsjanikov | We present a novel method for computing correspondences across 3D shapes using unsupervised learning. |
161 | Linearly Converging Quasi Branch and Bound Algorithms for Global Rigid Registration | Nadav Dym, Shahar Ziv Kovalsky | In this paper, we suggest a general framework to improve upon the BnB approach, which we name Quasi BnB. |
162 | Consensus Maximization Tree Search Revisited | Zhipeng Cai, Tat-Jun Chin, Vladlen Koltun | We make two key contributions towards improving A* tree search. First, we propose a new acceleration strategy that avoids redundant paths. Second, we show that the existing branch pruning technique also deteriorates quickly with the problem dimension. |
163 | Quasi-Globally Optimal and Efficient Vanishing Point Estimation in Manhattan World | Haoang Li, Ji Zhao, Jean-Charles Bazin, Wen Chen, Zhe Liu, Yun-Hui Liu | In Manhattan world, given several lines in a calibrated image, we aim at clustering them by three unknown-but-sought VPs. |
164 | An Efficient Solution to the Homography-Based Relative Pose Problem With a Common Reference Direction | Yaqing Ding, Jian Yang, Jean Ponce, Hui Kong | In this paper, we propose a novel approach to two-view minimal-case relative pose problems based on homography with a common reference direction. |
165 | A Quaternion-Based Certifiably Optimal Solution to the Wahba Problem With Outliers | Heng Yang, Luca Carlone | This work proposes the first polynomial-time certifiably optimal approach for solving the Wahba problem when a large number of vector observations are outliers. |
166 | PLMP – Point-Line Minimal Problems in Complete Multi-View Visibility | Timothy Duff, Kathlen Kohn, Anton Leykin, Tomas Pajdla | We present a complete classification of all minimal problems for generic arrangements of points and lines completely observed by calibrated perspective cameras. |
167 | Variational Few-Shot Learning | Jian Zhang, Chenglong Zhao, Bingbing Ni, Minghao Xu, Xiaokang Yang | We propose a variational Bayesian framework for enhancing few-shot learning performance. |
168 | Generative Adversarial Minority Oversampling | Sankha Subhra Mullick, Shounak Datta, Swagatam Das | We propose a three-player adversarial game between a convex generator, a multi-class classifier network, and a real/fake discriminator to perform oversampling in deep learning systems. |
169 | Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection | Dong Gong, Lingqiao Liu, Vuong Le, Budhaditya Saha, Moussa Reda Mansour, Svetha Venkatesh, Anton van den Hengel | To mitigate this drawback for autoencoder based anomaly detector, we propose to augment the autoencoder with a memory module and develop an improved autoencoder called memory-augmented autoencoder, i.e. MemAE. |
170 | Topological Map Extraction From Overhead Images | Zuoyue Li, Jan Dirk Wegner, Aurelien Lucchi | We propose a new approach, named PolyMapper, to circumvent the conventional pixel-wise segmentation of (aerial) images and predict objects in a vector representation directly. |
171 | Exploiting Temporal Consistency for Real-Time Video Depth Estimation | Haokui Zhang, Chunhua Shen, Ying Li, Yuanzhouhan Cao, Yu Liu, Youliang Yan | In this work, we focus on exploring temporal information from monocular videos for depth estimation. |
172 | The Sound of Motions | Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba | Inspired by the fact that humans are capable of interpreting sound sources from how objects move visually, we propose a novel system that explicitly captures such motion cues for the task of sound localization and separation. |
173 | SC-FEGAN: Face Editing Generative Adversarial Network With User’s Sketch and Color | Youngjoo Jo, Jongyoul Park | We present a novel image editing system that generates images as the user provides free-form masks, sketches and color as inputs. |
174 | Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style | Hongwei Ge, Zehang Yan, Kai Zhang, Mingde Zhao, Liang Sun | In this paper, we explore the utilization of a human-like cognitive style, i.e., building overall cognition for the image to be described and the sentence to be constructed, for enhancing computer image understanding. |
175 | Order-Aware Generative Modeling Using the 3D-Craft Dataset | Zhuoyuan Chen, Demi Guo, Tong Xiao, Saining Xie, Xinlei Chen, Haonan Yu, Jonathan Gray, Kavya Srinet, Haoqi Fan, Jerry Ma, Charles R. Qi, Shubham Tulsiani, Arthur Szlam, C. Lawrence Zitnick | In this paper, we study the problem of sequentially building houses in the game of Minecraft, and demonstrate that learning the ordering can make for more effective autoregressive models. We introduce a new dataset, HouseCraft, for this new task. |
176 | Crowd Counting With Deep Structured Scale Integration Network | Lingbo Liu, Zhilin Qiu, Guanbin Li, Shufan Liu, Wanli Ouyang, Liang Lin | In this paper, we propose a novel Deep Structured Scale Integration Network (DSSINet) for crowd counting, which addresses the scale variation of people by using structured feature representation learning and hierarchically structured loss function optimization. |
177 | Bidirectional One-Shot Unsupervised Domain Mapping | Tomer Cohen, Lior Wolf | The method we present is able to perform this mapping in both directions. |
178 | Evolving Space-Time Neural Architectures for Videos | AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo | We present a new method for finding video CNN architectures that more optimally capture rich spatio-temporal information in videos. |
179 | Universally Slimmable Networks and Improved Training Techniques | Jiahui Yu, Thomas S. Huang | In this work, we propose a systematic approach to train universally slimmable networks (US-Nets), extending slimmable networks to execute at arbitrary width, and generalizing to networks both with and without batch normalization layers. |
180 | AutoDispNet: Improving Disparity Estimation With AutoML | Tonmoy Saikia, Yassine Marrakchi, Arber Zela, Frank Hutter, Thomas Brox | In this work, we show how to use and extend existing AutoML techniques to efficiently optimize large-scale U-Net-like encoder-decoder architectures. |
181 | Deep Meta Functionals for Shape Representation | Gidi Littwin, Lior Wolf | We present a new method for 3D shape reconstruction from a single image, in which a deep neural network directly maps an image to a vector of network weights. |
182 | Differentiable Kernel Evolution | Yu Liu, Jihao Liu, Ailing Zeng, Xiaogang Wang | This paper proposes a differentiable kernel evolution (DKE) algorithm to find a better layer-operator for the convolutional neural network. |
183 | Batch Weight for Domain Adaptation With Mass Shift | Mikolaj Binkowski, Devon Hjelm, Aaron Courville | We propose a principled method of re-weighting training samples to correct for such mass shift between the transferred distributions, which we call batch weight. |
184 | SRM: A Style-Based Recalibration Module for Convolutional Neural Networks | HyunJae Lee, Hyo-Eun Kim, Hyeonseob Nam | In this paper, we aim to fully leverage the potential of styles to improve the performance of CNNs in general vision tasks. |
185 | Switchable Whitening for Deep Representation Learning | Xingang Pan, Xiaohang Zhan, Jianping Shi, Xiaoou Tang, Ping Luo | Unlike existing works that design normalization techniques for specific tasks, we propose Switchable Whitening (SW), which provides a general form unifying different whitening methods as well as standardization methods. |
186 | Adaptative Inference Cost With Convolutional Neural Mixture Models | Adria Ruiz, Jakob Verbeek | Within the proposed framework, we present different mechanisms to prune subsets of CNNs from the mixture, making it easy to adapt the computational cost required for inference. |
187 | On Network Design Spaces for Visual Recognition | Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollar | To help sustain this rate of progress, in this work we propose to reexamine the methodology for comparing network architectures. |
188 | Improved Techniques for Training Adaptive Deep Networks | Hao Li, Hong Zhang, Xiaojuan Qi, Ruigang Yang, Gao Huang | We present three techniques to improve its training efficacy from two aspects: 1) a Gradient Equilibrium algorithm to resolve the conflict of learning of different classifiers; 2) an Inline Subnetwork Collaboration approach and a One-for-all Knowledge Distillation algorithm to enhance the collaboration among classifiers. |
189 | Resource Constrained Neural Network Architecture Search: Will a Submodularity Assumption Help? | Yunyang Xiong, Ronak Mehta, Vikas Singh | Based on this observation, we adapt algorithms within discrete optimization to obtain heuristic schemes for neural network architecture search, where we have resource constraints on the architecture. |
190 | ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks | Xiaohan Ding, Yuchen Guo, Guiguang Ding, Jungong Han | We propose Asymmetric Convolution Block (ACB), an architecture-neutral structure as a CNN building block, which uses 1D asymmetric convolutions to strengthen the square convolution kernels. |
191 | A Comprehensive Overhaul of Feature Distillation | Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, Jin Young Choi | We investigate the design aspects of feature distillation methods achieving network compression and propose a novel feature distillation method in which the distillation loss is designed to make a synergy among various aspects: teacher transform, student transform, distillation feature position and distance function. |
192 | Transferable Semi-Supervised 3D Object Detection From RGB-D Data | Yew Siang Tang, Gim Hee Lee | To this end, we propose a transferable semi-supervised 3D object detection model that learns a 3D object detector network from training data with two disjoint sets of object classes – a set of strong classes with both 2D and 3D box labels, and another set of weak classes with only 2D box labels. |
193 | DPOD: 6D Pose Object Detector and Refiner | Sergey Zakharov, Ivan Shugurov, Slobodan Ilic | In this paper we present a novel deep learning method for 3D object detection and 6D pose estimation from RGB images. |
194 | STD: Sparse-to-Dense 3D Object Detector for Point Cloud | Zetong Yang, Yanan Sun, Shu Liu, Xiaoyong Shen, Jiaya Jia | We propose a two-stage 3D object detection framework, named sparse-to-dense 3D Object Detector (STD). |
195 | DUP-Net: Denoiser and Upsampler Network for 3D Adversarial Point Clouds Defense | Hang Zhou, Kejiang Chen, Weiming Zhang, Han Fang, Wenbo Zhou, Nenghai Yu | In this paper, statistical outlier removal (SOR) and a data-driven upsampling network are considered as denoiser and upsampler respectively. |
196 | Learning Rich Features at High-Speed for Single-Shot Object Detection | Tiancai Wang, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao | We introduce a single-stage detection framework that combines the advantages of both fine-tuning pretrained models and training from scratch. |
197 | Detecting Unseen Visual Relations Using Analogies | Julia Peyre, Ivan Laptev, Cordelia Schmid, Josef Sivic | The contributions of this work are three-fold. First, we learn a representation of visual relations that combines (i) individual embeddings for subject, object and predicate together with (ii) a visual phrase embedding that represents the relation triplet. Second, we learn how to transfer visual phrase embeddings from existing training triplets to unseen test triplets using analogies between relations that involve similar objects. |
198 | Disentangling Monocular 3D Object Detection | Andrea Simonelli, Samuel Rota Bulo, Lorenzo Porzi, Manuel Lopez-Antequera, Peter Kontschieder | In this paper we propose an approach for monocular 3D object detection from a single RGB image, which leverages a novel disentangling transformation for 2D and 3D detection losses and a novel, self-supervised confidence score for 3D bounding boxes. |
199 | STM: SpatioTemporal and Motion Encoding for Action Recognition | Boyuan Jiang, MengMeng Wang, Weihao Gan, Wei Wu, Junjie Yan | In this work, we aim to efficiently encode these two features in a unified 2D framework. |
200 | Dynamic Context Correspondence Network for Semantic Alignment | Shuaiyi Huang, Qiuyue Wang, Songyang Zhang, Shipeng Yan, Xuming He | In this paper, we aim to incorporate global semantic context in a flexible manner to overcome the limitations of prior work that relies on local semantic representations. |
201 | Fooling Network Interpretation in Image Classification | Akshayvarun Subramanya, Vipin Pillai, Hamed Pirsiavash | We show that it is possible to create adversarial patches which not only fool the prediction, but also change what we interpret regarding the cause of the prediction. |
202 | Unconstrained Foreground Object Search | Yinan Zhao, Brian Price, Scott Cohen, Danna Gurari | We instead propose a novel problem of unconstrained foreground object (UFO) search and introduce a solution that supports efficient search by encoding the background image in the same latent space as the candidate foreground objects. |
203 | Embodied Amodal Recognition: Learning to Move to Perceive Objects | Jianwei Yang, Zhile Ren, Mingze Xu, Xinlei Chen, David J. Crandall, Devi Parikh, Dhruv Batra | In this work, we introduce the task of Embodied Amodal Recognition (EAR): an agent is instantiated in a 3D environment close to an occluded target object, and is free to move in the environment to perform object classification, amodal object localization, and amodal object segmentation. |
204 | SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition | Kaiyu Yang, Olga Russakovsky, Jia Deng | We introduce SpatialSense, a dataset specializing in spatial relation recognition which captures a broad spectrum of such challenges, allowing for proper benchmarking of computer vision techniques. |
205 | TensorMask: A Foundation for Dense Object Segmentation | Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollar | In this work, we investigate the paradigm of dense sliding-window instance segmentation, which is surprisingly under-explored. |
206 | Integral Object Mining via Online Attention Accumulation | Peng-Tao Jiang, Qibin Hou, Yang Cao, Ming-Ming Cheng, Yunchao Wei, Hong-Kai Xiong | In order to accumulate the discovered different object parts, we propose an online attention accumulation (OAA) strategy which maintains a cumulative attention map for each target category in each training image so that the integral object regions can be gradually promoted as the training goes. |
207 | Accelerated Gravitational Point Set Alignment With Altered Physical Laws | Vladislav Golyanik, Christian Theobalt, Didier Stricker | This work describes Barnes-Hut Rigid Gravitational Approach (BH-RGA) — a new rigid point set registration method relying on principles of particle dynamics. |
208 | Domain Adaptation for Semantic Segmentation With Maximum Squares Loss | Minghao Chen, Hongyang Xue, Deng Cai | To balance the gradient of well-classified target samples, we propose the maximum squares loss. |
209 | Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization Without Accessing Target Domain Data | Xiangyu Yue, Yang Zhang, Sicheng Zhao, Alberto Sangiovanni-Vincentelli, Kurt Keutzer, Boqing Gong | To this end, we propose a new approach of domain randomization and pyramid consistency to learn a model with high generalizability. |
210 | Semi-Supervised Skin Detection by Network With Mutual Guidance | Yi He, Jiayuan Shi, Chuan Wang, Haibin Huang, Jiaming Liu, Guanbin Li, Risheng Liu, Jue Wang | We present a new data-driven method for robust skin detection from a single human portrait image. |
211 | ACE: Adapting to Changing Environments for Semantic Segmentation | Zuxuan Wu, Xin Wang, Joseph E. Gonzalez, Tom Goldstein, Larry S. Davis | We present ACE, a framework for semantic segmentation that dynamically adapts to changing environments over time. |
212 | Efficient Segmentation: Learning Downsampling Near Semantic Boundaries | Dmitrii Marin, Zijian He, Peter Vajda, Priyam Chatterjee, Sam Tsai, Fei Yang, Yuri Boykov | To address this problem, we propose a new content-adaptive downsampling technique that learns to favor sampling locations near semantic boundaries of target classes. |
213 | Recurrent U-Net for Resource-Constrained Segmentation | Wei Wang, Kaicheng Yu, Joachim Hugonot, Pascal Fua, Mathieu Salzmann | In this paper, we introduce a novel recurrent U-Net architecture that preserves the compactness of the original U-Net, while substantially increasing its performance to the point where it outperforms the state of the art on several benchmarks. We also introduce a large-scale dataset for hand segmentation. |
214 | Detecting the Unexpected via Image Resynthesis | Krzysztof Lis, Krishna Nakka, Pascal Fua, Mathieu Salzmann | In this paper, we tackle the more realistic scenario where unexpected objects of unknown classes can appear at test time. |
215 | Self-Supervised Monocular Depth Hints | Jamie Watson, Michael Firman, Gabriel J. Brostow, Daniyar Turmukhambetov | Here, we study the problem of ambiguous reprojections in depth-prediction from stereo-based self-supervision, and introduce Depth Hints to alleviate their effects. |
216 | 3D Scene Reconstruction With Multi-Layer Depth and Epipolar Transformers | Daeyun Shin, Zhile Ren, Erik B. Sudderth, Charless C. Fowlkes | To improve the accuracy of view-centered representations for complex scenes, we introduce a novel “Epipolar Feature Transformer” that transfers convolutional network features from an input view to other virtual camera viewpoints, and thus better covers the 3D scene geometry. |
217 | How Do Neural Networks See Depth in Single Images? | Tom van Dijk, Guido de Croon | In this work we take four previously published networks and investigate what depth cues they exploit. |
218 | On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos | Zhi Li, Xuan Wang, Fei Wang, Peilin Jiang | In this paper, we propose to exploit monocular videos to complement the training dataset for the single-image 3D human pose estimation tasks. |
219 | Canonical Surface Mapping via Geometric Cycle Consistency | Nilesh Kulkarni, Abhinav Gupta, Shubham Tulsiani | Our key insight is that the CSM task (pixel to 3D), when combined with 3D projection (3D to pixel), completes a cycle. |
220 | 3D-RelNet: Joint Object and Relational Network for 3D Prediction | Nilesh Kulkarni, Ishan Misra, Shubham Tulsiani, Abhinav Gupta | We propose an approach to predict the 3D shape and pose for the objects present in a scene. |
221 | GP2C: Geometric Projection Parameter Consensus for Joint 3D Pose and Focal Length Estimation in the Wild | Alexander Grabner, Peter M. Roth, Vincent Lepetit | We present a joint 3D pose and focal length estimation approach for object categories in the wild. |
222 | Moulding Humans: Non-Parametric 3D Human Shape Estimation From Single Images | Valentin Gabeur, Jean-Sebastien Franco, Xavier Martin, Cordelia Schmid, Gregory Rogez | In this paper, we tackle the problem of 3D human shape estimation from single RGB images. |
223 | 3DPeople: Modeling the Geometry of Dressed Humans | Albert Pumarola, Jordi Sanchez-Riera, Gary P. T. Choi, Alberto Sanfeliu, Francesc Moreno-Noguer | In this paper, we present an approach to model dressed humans and predict their geometry from single images. |
224 | Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop | Nikos Kolotouros, Georgios Pavlakos, Michael J. Black, Kostas Daniilidis | In this work, instead of investigating which approach is better, our key insight is that the two paradigms can form a strong collaboration. |
225 | Optimizing Network Structure for 3D Human Pose Estimation | Hai Ci, Chunyu Wang, Xiaoxuan Ma, Yizhou Wang | In this work, we propose a generic formulation where both GCN and Fully Connected Network (FCN) are its special cases. |
226 | Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks | Yujun Cai, Liuhao Ge, Jun Liu, Jianfei Cai, Tat-Jen Cham, Junsong Yuan, Nadia Magnenat Thalmann | Motivated by the effectiveness of incorporating spatial dependencies and temporal consistencies to alleviate these issues, we propose a novel graph-based method to tackle the problem of 3D human body and 3D hand pose estimation from a short sequence of 2D joint detections. |
227 | Resolving 3D Human Pose Ambiguities With 3D Scene Constraints | Mohamed Hassan, Vasileios Choutas, Dimitrios Tzionas, Michael J. Black | Our key contribution is to exploit static 3D scene structure to better estimate human pose from monocular images. |
228 | Tex2Shape: Detailed Full Human Body Geometry From a Single Image | Thiemo Alldieck, Gerard Pons-Moll, Christian Theobalt, Marcus Magnor | We present a simple yet effective method to infer detailed full human body shape from only a single photograph. |
229 | PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization | Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, Hao Li | Using PIFu, we propose an end-to-end deep learning method for digitizing highly detailed clothed humans that can infer both 3D surface and texture from a single image, and optionally, multiple input images. |
230 | DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction | Xiaoxing Zeng, Xiaojiang Peng, Yu Qiao | This paper proposes a deep Dense-Fine-Finer Network (DF2Net) to address this challenging problem. |
231 | Monocular 3D Human Pose Estimation by Generation and Ordinal Ranking | Saurabh Sharma, Pavan Teja Varigonda, Prashast Bindal, Abhishek Sharma, Arjun Jain | In this paper, we propose a Deep Conditional Variational Autoencoder based model that synthesizes diverse anatomically plausible 3D-pose samples conditioned on the estimated 2D-pose. |
232 | Aligning Latent Spaces for 3D Hand Pose Estimation | Linlin Yang, Shile Li, Dongheui Lee, Angela Yao | In this work, we propose to learn a joint latent representation that leverages other modalities as weak labels to boost the RGB-based hand pose estimator. |
233 | HEMlets Pose: Learning Part-Centric Heatmap Triplets for Accurate 3D Human Pose Estimation | Kun Zhou, Xiaoguang Han, Nianjuan Jiang, Kui Jia, Jiangbo Lu | This work attempts to address the uncertainty of lifting the detected 2D joints to the 3D space by introducing an intermediate state – Part-Centric Heatmap Triplets (HEMlets), which shortens the gap between the 2D observation and the 3D interpretation. |
234 | End-to-End Hand Mesh Recovery From a Monocular RGB Image | Xiong Zhang, Qiang Li, Hong Mo, Wenbo Zhang, Wen Zheng | In this paper, we present a HAnd Mesh Recovery (HAMR) framework to tackle the problem of reconstructing the full 3D mesh of a human hand from a single RGB image. |
235 | Robust Multi-Modality Multi-Object Tracking | Wenwei Zhang, Hui Zhou, Shuyang Sun, Zhe Wang, Jianping Shi, Chen Change Loy | In this study, we design a generic sensor-agnostic multi-modality MOT framework (mmMOT), where each modality (i.e., sensor) is capable of performing its role independently to preserve reliability, and can further improve its accuracy through a novel multi-modality fusion module. |
236 | The Trajectron: Probabilistic Multi-Agent Trajectory Modeling With Dynamic Spatiotemporal Graphs | Boris Ivanovic, Marco Pavone | Towards this end, we present the Trajectron, a graph-structured model that predicts many potential future trajectories of multiple agents simultaneously in both highly dynamic and multimodal scenarios (i.e. where the number of agents in the scene is time-varying and there are many possible highly-distinct futures for each agent). |
237 | ‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-Term Tracking | Bin Yan, Haojie Zhao, Dong Wang, Huchuan Lu, Xiaoyun Yang | In this work, we present a novel robust and real-time long-term tracking framework based on the proposed skimming and perusal modules. |
238 | TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection | Kyle Min, Jason J. Corso | The proposed approach assumes that the saliency map of any frame can be predicted by considering a limited number of past frames. The results of our extensive experiments on video saliency detection validate this assumption and demonstrate that our fully-convolutional model with temporal aggregation method is effective. |
239 | Attacking Optical Flow | Anurag Ranjan, Joel Janai, Andreas Geiger, Michael J. Black | In this paper, we extend adversarial patch attacks to optical flow networks and show that such attacks can compromise their performance. |
240 | Pro-Cam SSfM: Projector-Camera System for Structure and Spectral Reflectance From Motion | Chunyu Li, Yusuke Monno, Hironori Hidaka, Masatoshi Okutomi | In this paper, we propose a novel projector-camera system for practical and low-cost acquisition of a dense object 3D model with the spectral reflectance property. |
241 | Mop Moire Patterns Using MopNet | Bin He, Ce Wang, Boxin Shi, Ling-Yu Duan | In this paper, we propose a Moire pattern Removal Neural Network (MopNet) to solve this problem. |
242 | Kernel Modeling Super-Resolution on Real Low-Resolution Images | Ruofan Zhou, Sabine Susstrunk | To improve generalization and robustness of deep super-resolution CNNs on real photographs, we present a kernel modeling super-resolution network (KMSR) that incorporates blur-kernel modeling in the training. |
243 | Learning to Jointly Generate and Separate Reflections | Daiqian Ma, Renjie Wan, Boxin Shi, Alex C. Kot, Ling-Yu Duan | In this work, we propose to jointly generate and separate reflections within a weakly-supervised learning framework, aiming to model the reflection image formation more comprehensively with abundant unpaired supervision. In particular, we build an unpaired reflection dataset with 4,027 images, which facilitates the weakly-supervised learning of a reflection removal model. |
244 | Deep Multi-Model Fusion for Single-Image Dehazing | Zijun Deng, Lei Zhu, Xiaowei Hu, Chi-Wing Fu, Xuemiao Xu, Qing Zhang, Jing Qin, Pheng-Ann Heng | This paper presents a deep multi-model fusion network to attentively integrate multiple models to separate layers and boost the performance in single-image dehazing. |
245 | Deep Learning for Seeing Through Window With Raindrops | Yuhui Quan, Shijie Deng, Yixin Chen, Hui Ji | In the proposed CNN, we introduce a double attention mechanism that concurrently guides the CNN using shape-driven attention and channel re-calibration. |
246 | Mask-ShadowGAN: Learning to Remove Shadows From Unpaired Data | Xiaowei Hu, Yitong Jiang, Chi-Wing Fu, Pheng-Ann Heng | This paper presents a new method for shadow removal using unpaired data, enabling us to avoid tedious annotations and obtain more diverse training samples. |
247 | Spatio-Temporal Filter Adaptive Network for Video Deblurring | Shangchen Zhou, Jiawei Zhang, Jinshan Pan, Haozhe Xie, Wangmeng Zuo, Jimmy Ren | To overcome the limitation of separate optical flow estimation, we propose a Spatio-Temporal Filter Adaptive Network (STFAN) for the alignment and deblurring in a unified framework. |
248 | Learning Deep Priors for Image Dehazing | Yang Liu, Jinshan Pan, Jimmy Ren, Zhixun Su | We propose an effective iteration algorithm with deep CNNs to learn haze-relevant priors for image dehazing. |
249 | JPEG Artifacts Reduction via Deep Convolutional Sparse Coding | Xueyang Fu, Zheng-Jun Zha, Feng Wu, Xinghao Ding, John Paisley | To effectively reduce JPEG compression artifacts, we propose a deep convolutional sparse coding (DCSC) network architecture. |
250 | Self-Guided Network for Fast Image Denoising | Shuhang Gu, Yawei Li, Luc Van Gool, Radu Timofte | To tackle this problem, we propose a self-guided network (SGN), which adopts a top-down self-guidance architecture to better exploit image multi-scale information. |
251 | Non-Local Intrinsic Decomposition With Near-Infrared Priors | Ziang Cheng, Yinqiang Zheng, Shaodi You, Imari Sato | In this paper, we revisit intrinsic image decomposition with the aid of near-infrared (NIR) imagery. |
252 | VideoMem: Constructing, Analyzing, Predicting Short-Term and Long-Term Video Memorability | Romain Cohendet, Claire-Helene Demarty, Ngoc Q. K. Duong, Martin Engilberge | This paper focuses on understanding the intrinsic memorability of visual content. To address this challenge, we introduce a large-scale dataset (VideoMem) composed of 10,000 videos with memorability scores. |
253 | Rescan: Inductive Instance Segmentation for Indoor RGBD Scans | Maciej Halber, Yifei Shi, Kai Xu, Thomas Funkhouser | We propose an algorithm that analyzes these “rescans” to infer a temporal model of a scene with semantic instance information. |
254 | End-to-End CAD Model Retrieval and 9DoF Alignment in 3D Scans | Armen Avetisyan, Angela Dai, Matthias Niessner | We present a novel, end-to-end approach to align CAD models to a 3D scan of a scene, enabling transformation of a noisy, incomplete 3D scan to a compact CAD reconstruction with clean, complete object geometry. |
255 | Making History Matter: History-Advantage Sequence Training for Visual Dialog | Tianhao Yang, Zheng-Jun Zha, Hanwang Zhang | To this end, inspired by the actor-critic policy gradient in reinforcement learning, we propose a novel training paradigm called History Advantage Sequence Training (HAST). |
256 | Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization | Liu Liu, Hongdong Li, Yuchao Dai | We propose a novel representation learning method having higher location-discriminating power. |
257 | Scene Graph Prediction With Limited Labels | Vincent S. Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Re, Li Fei-Fei | In this paper, we introduce a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using few labeled examples. |
258 | Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded | Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh | In this work, we propose a generic approach called Human Importance-aware Network Tuning (HINT) that effectively leverages human demonstrations to improve visual grounding. |
259 | Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment | Samyak Datta, Karan Sikka, Anirban Roy, Karuna Ahuja, Devi Parikh, Ajay Divakaran | We propose a novel end-to-end model that uses caption-to-image retrieval as a downstream task to guide the process of phrase localization. |
260 | Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding | Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Dechao Meng, Qingming Huang | To address this problem, we propose a novel end-to-end adaptive reconstruction network (ARN). |
261 | Hierarchy Parsing for Image Captioning | Ting Yao, Yingwei Pan, Yehao Li, Tao Mei | In this paper, we introduce a new design to model a hierarchy from instance level (segmentation), region level (detection) to the whole image to delve into a thorough image understanding for captioning. |
262 | HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic | In this work, we propose instead to learn such embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations. |
263 | Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network | Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu | In this paper, we propose to guide the video caption generation with Part-of-Speech (POS) information, based on a gated fusion of multiple representations of input videos. |
264 | Multi-View Stereo by Temporal Nonparametric Fusion | Yuxin Hou, Juho Kannala, Arno Solin | We propose a novel idea for depth estimation from multi-view image-pose pairs, where the model has capability to leverage information from previous latent-space encodings of the scene. |
265 | Floor-SP: Inverse CAD for Floorplans by Sequential Room-Wise Shortest Path | Jiacheng Chen, Chen Liu, Jiaye Wu, Yasutaka Furukawa | This paper proposes a new approach for automated floorplan reconstruction from RGBD scans, a major milestone in indoor mapping research. |
266 | Polarimetric Relative Pose Estimation | Zhaopeng Cui, Viktor Larsson, Marc Pollefeys | In this paper we consider the problem of relative pose estimation from two images with per-pixel polarimetric information. |
267 | Closed-Form Optimal Two-View Triangulation Based on Angular Errors | Seong Hun Lee, Javier Civera | In this paper, we study closed-form optimal solutions to two-view triangulation with known internal calibration and pose. |
268 | Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images | Haozhe Xie, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Shengping Zhang | To solve these problems, we propose a novel framework for single-view and multi-view 3D reconstruction, named Pix2Vox. |
269 | Unsupervised Robust Disentangling of Latent Characteristics for Image Synthesis | Patrick Esser, Johannes Haux, Bjorn Ommer | We present a novel approach that learns disentangled representations of these characteristics and explains them individually. |
270 | SROBB: Targeted Perceptual Loss for Single Image Super-Resolution | Mohammad Saeed Rad, Behzad Bozorgtabar, Urs-Viktor Marti, Max Basler, Hazim Kemal Ekenel, Jean-Philippe Thiran | In this paper, we propose a novel method to benefit from perceptual loss in a more objective way. |
271 | An Internal Learning Approach to Video Inpainting | Haotian Zhang, Long Mai, Ning Xu, Zhaowen Wang, John Collomosse, Hailin Jin | We propose a novel video inpainting algorithm that simultaneously hallucinates missing appearance and motion (optical flow) information, building upon the recent ‘Deep Image Prior’ (DIP) that exploits convolutional network architectures to enforce plausible texture in static images. |
272 | Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement | Sai Bi, Kalyan Sunkavalli, Federico Perazzi, Eli Shechtman, Vladimir G. Kim, Ravi Ramamoorthi | We present a method to improve the visual realism of low-quality, synthetic images, e.g. OpenGL renderings. |
273 | Adversarial Defense via Learning to Generate Diverse Attacks | Yunseok Jang, Tianchen Zhao, Seunghoon Hong, Honglak Lee | In this work, we propose to utilize the generator to learn how to create adversarial examples. |
274 | Image Generation From Small Datasets via Batch Statistics Adaptation | Atsuhiro Noguchi, Tatsuya Harada | In this work, we propose a novel method focusing on the parameters for batch statistics, scale and shift, of the hidden layers in the generator. |
275 | Lifelong GAN: Continual Learning for Conditional Image Generation | Mengyao Zhai, Lei Chen, Frederick Tung, Jiawei He, Megha Nawhal, Greg Mori | In contrast to state-of-the-art memory replay based approaches which are limited to label-conditioned image generation tasks, a more generic framework for continual learning of generative models under different conditional image generation settings is proposed in this paper. |
276 | Bayesian Relational Memory for Semantic Visual Navigation | Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian | We introduce a new memory architecture, Bayesian Relational Memory (BRM), to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards. |
277 | Mono-SF: Multi-View Geometry Meets Single-View Depth for Monocular Scene Flow Estimation of Dynamic Traffic Scenes | Fabian Brickwedde, Steffen Abraham, Rudolf Mester | In this paper, we propose a novel monocular 3D scene flow estimation method, called Mono-SF. |
278 | Prior Guided Dropout for Robust Visual Localization in Dynamic Environments | Zhaoyang Huang, Yan Xu, Jianping Shi, Xiaowei Zhou, Hujun Bao, Guofeng Zhang | In this paper, we propose a framework which can be generally applied to existing CNN-based pose regressors to improve their robustness in dynamic environments. |
279 | Drive&Act: A Multi-Modal Dataset for Fine-Grained Driver Behavior Recognition in Autonomous Vehicles | Manuel Martin, Alina Roitberg, Monica Haurilet, Matthias Horne, Simon Reiss, Michael Voit, Rainer Stiefelhagen | We introduce the novel domain-specific Drive&Act benchmark for fine-grained categorization of driver behavior. Finally, we provide challenging benchmarks by adopting prominent methods for video- and body pose-based action recognition. |
280 | Depth Completion From Sparse LiDAR Data With Depth-Normal Constraints | Yan Xu, Xinge Zhu, Jianping Shi, Guofeng Zhang, Hujun Bao, Hongsheng Li | In this paper, to regularize the depth completion and improve the robustness against noise, we propose a unified CNN framework that 1) models the geometric constraints between depth and surface normal in a diffusion module and 2) predicts the confidence of sparse LiDAR measurements to mitigate the impact of noise. |
281 | PRECOG: PREdiction Conditioned on Goals in Visual Multi-Agent Settings | Nicholas Rhinehart, Rowan McAllister, Kris Kitani, Sergey Levine | Towards these capabilities, we present a probabilistic forecasting model of future interactions between a variable number of agents. |
282 | LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis | Zhe Liu, Shunbo Zhou, Chuanzhe Suo, Peng Yin, Wen Chen, Hesheng Wang, Haoang Li, Yun-Hui Liu | In this paper, we develop a novel deep neural network, named LPD-Net (Large-scale Place Description Network), which can extract discriminative and generalizable global descriptors from the raw 3D point cloud. |
283 | Local Supports Global: Deep Camera Relocalization With Sequence Enhancement | Fei Xue, Xin Wang, Zike Yan, Qiuyuan Wang, Junqiu Wang, Hongbin Zha | We propose to leverage the local information in an image sequence to support global camera relocalization. |
284 | Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry | Shunkai Li, Fei Xue, Xin Wang, Zike Yan, Hongbin Zha | We propose a self-supervised learning framework for visual odometry (VO) that incorporates correlation of consecutive frames and takes advantage of adversarial learning. |
285 | TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts | Ziyang Hong, Yvan Petillot, David Lane, Yishu Miao, Sen Wang | This paper proposes a novel visual place recognition algorithm, termed TextPlace, based on scene texts in the wild. |
286 | CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization | Mingyu Ding, Zhe Wang, Jiankai Sun, Jianping Shi, Ping Luo | To this end, here we present a coarse-to-fine retrieval-based deep learning framework, which includes three steps, i.e., image-based coarse retrieval, pose-based fine retrieval and precise relative pose regression. |
287 | Situational Fusion of Visual Representation for Visual Navigation | William B. Shen, Danfei Xu, Yuke Zhu, Leonidas J. Guibas, Li Fei-Fei, Silvio Savarese | We propose to train an agent to fuse a large set of visual representations that correspond to diverse visual perception abilities. |
288 | Learning Aberrance Repressed Correlation Filters for Real-Time UAV Tracking | Ziyuan Huang, Changhong Fu, Yiming Li, Fuling Lin, Peng Lu | Therefore, in this work, a novel approach to repress the aberrances happening during the detection process is proposed, i.e., aberrance repressed correlation filter (ARCF). |
289 | 6-DOF GraspNet: Variational Grasp Generation for Object Manipulation | Arsalan Mousavian, Clemens Eppner, Dieter Fox | In this work, we formulate the problem of grasp generation as sampling a set of grasps using a variational autoencoder and assess and refine the sampled grasps using a grasp evaluator model. |
290 | DAGMapper: Learning to Map by Discovering Lane Topology | Namdar Homayounfar, Wei-Chiu Ma, Justin Liang, Xinyu Wu, Jack Fan, Raquel Urtasun | In contrast, in this paper we focus on drawing the lane boundaries of complex highways with many lanes that contain topology changes due to forks and merges. |
291 | 3D-LaneNet: End-to-End 3D Multiple Lane Detection | Noa Garnett, Rafi Cohen, Tomer Pe’er, Roee Lahav, Dan Levi | We introduce a network that directly predicts the 3D layout of lanes in a road scene from a single image. |
292 | Sampling-Free Epistemic Uncertainty Estimation Using Approximated Variance Propagation | Janis Postels, Francesco Ferroni, Huseyin Coskun, Nassir Navab, Federico Tombari | We present a sampling-free approach for computing the epistemic uncertainty of a neural network. |
293 | Universal Adversarial Perturbation via Prior Driven Uncertainty Approximation | Hong Liu, Rongrong Ji, Jie Li, Baochang Zhang, Yue Gao, Yongjian Wu, Feiyue Huang | In this paper, we propose a new unsupervised universal adversarial perturbation method, termed as Prior Driven Uncertainty Approximation (PD-UA), to generate a robust UAP by fully exploiting the model uncertainty at each network layer. |
294 | Understanding Deep Networks via Extremal Perturbations and Smooth Masks | Ruth Fong, Mandela Patrick, Andrea Vedaldi | In this paper, we discuss some of the shortcomings of existing approaches to perturbation analysis and address them by introducing the concept of extremal perturbations, which are theoretically grounded and interpretable. |
295 | Unsupervised Pre-Training of Image Features on Non-Curated Data | Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin | To that effect, we propose a new unsupervised approach which leverages self-supervision and clustering to capture complementary statistics from large-scale data. |
296 | Learning Local Descriptors With a CDF-Based Dynamic Soft Margin | Linguang Zhang, Szymon Rusinkiewicz | In this work, we propose a simple yet effective method to overcome the above limitations. |
297 | Bayes-Factor-VAE: Hierarchical Bayesian Deep Auto-Encoder Models for Factor Disentanglement | Minyoung Kim, Yuting Wang, Pritish Sahu, Vladimir Pavlovic | We propose a family of novel hierarchical Bayesian deep auto-encoder models capable of identifying disentangled factors of variability in data. |
298 | Linearized Multi-Sampling for Differentiable Image Transformation | Wei Jiang, Weiwei Sun, Andrea Tagliasacchi, Eduard Trulls, Kwang Moo Yi | We propose a novel image sampling method for differentiable image transformation in deep neural networks. |
299 | AdaTransform: Adaptive Data Transformation | Zhiqiang Tang, Xi Peng, Tingfeng Li, Yizhe Zhu, Dimitris N. Metaxas | In this work, we propose adaptive data transformation to achieve the two goals. |
300 | CARAFE: Content-Aware ReAssembly of FEatures | Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin | In this work, we propose Content-Aware ReAssembly of FEatures (CARAFE), a universal, lightweight and highly effective operator to fulfill this goal. |
301 | AFD-Net: Aggregated Feature Difference Learning for Cross-Spectral Image Patch Matching | Dou Quan, Xuefeng Liang, Shuang Wang, Shaowei Wei, Yanfeng Li, Ning Huyan, Licheng Jiao | To tackle these problems, we propose an aggregated feature difference learning network (AFD-Net). |
302 | Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval | Shupeng Su, Zhisheng Zhong, Chao Zhang | In this paper, we study the unsupervised deep cross-modal hash coding and propose Deep Joint-Semantics Reconstructing Hashing (DJSRH), which has the following two main advantages. |
303 | Unsupervised Neural Quantization for Compressed-Domain Similarity Search | Stanislav Morozov, Artem Babenko | In more detail, we introduce a DNN architecture for the unsupervised compressed-domain retrieval, based on multi-codebook quantization. |
304 | Siamese Networks: The Tale of Two Manifolds | Soumava Kumar Roy, Mehrtash Harandi, Richard Nock, Richard Hartley | In this paper, we study Siamese networks from a new perspective and question the validity of their training procedure. |
305 | Learning Combinatorial Embedding Networks for Deep Graph Matching | Runzhong Wang, Junchi Yan, Xiaokang Yang | To this end, this paper devises an end-to-end differentiable deep network pipeline to learn the affinity for graph matching. |
306 | Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid | Zhanghui Kuang, Yiming Gao, Guanbin Li, Ping Luo, Yimin Chen, Liang Lin, Wayne Zhang | To address this issue, we propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery cloth by using both global and local representations in multiple scales. |
307 | Wavelet Domain Style Transfer for an Effective Perception-Distortion Tradeoff in Single Image Super-Resolution | Xin Deng, Ren Yang, Mai Xu, Pier Luigi Dragotti | In this paper, we propose a novel method based on wavelet domain style transfer (WDST), which achieves a better PD tradeoff than the GAN based methods. |
308 | Toward Real-World Single Image Super-Resolution: A New Benchmark and a New Model | Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, Lei Zhang | In this paper, we build a real-world super-resolution (RealSR) dataset where paired LR-HR images on the same scene are captured by adjusting the focal length of a digital camera. |
309 | RankSRGAN: Generative Adversarial Networks With Ranker for Image Super-Resolution | Wenlong Zhang, Yihao Liu, Chao Dong, Yu Qiao | To address the problem, we propose Super-Resolution Generative Adversarial Networks with Ranker (RankSRGAN) to optimize generator in the direction of perceptual metrics. |
310 | Progressive Fusion Video Super-Resolution Network via Exploiting Non-Local Spatio-Temporal Correlations | Peng Yi, Zhongyuan Wang, Kui Jiang, Junjun Jiang, Jiayi Ma | In this study, we propose a novel progressive fusion network for video SR, which is designed to make better use of spatio-temporal information and is proved to be more efficient and effective than the existing direct fusion, slow fusion or 3D convolution strategies. |
311 | Deep SR-ITM: Joint Learning of Super-Resolution and Inverse Tone-Mapping for 4K UHD HDR Applications | Soo Ye Kim, Jihyong Oh, Munchurl Kim | In this paper, we propose a joint super-resolution (SR) and inverse tone-mapping (ITM) framework, called Deep SR-ITM, which learns the direct mapping from LR SDR video to their HR HDR version. |
312 | Dynamic PET Image Reconstruction Using Nonnegative Matrix Factorization Incorporated With Deep Image Prior | Tatsuya Yokota, Kazuya Kawai, Muneyuki Sakata, Yuichi Kimura, Hidekata Hontani | We propose a method that reconstructs dynamic positron emission tomography (PET) images from given sinograms by using non-negative matrix factorization (NMF) incorporated with a deep image prior (DIP) for appropriately constraining the spatial patterns of resultant images. |
313 | DSIC: Deep Stereo Image Compression | Jerry Liu, Shenlong Wang, Raquel Urtasun | In this paper we tackle the problem of stereo image compression, and leverage the fact that the two images have overlapping fields of view to further compress the representations. |
314 | Variable Rate Deep Image Compression With a Conditional Autoencoder | Yoojin Choi, Mostafa El-Khamy, Jungwon Lee | In this paper, we propose a novel variable-rate learned image compression framework with a conditional autoencoder. |
315 | Real Image Denoising With Feature Attention | Saeed Anwar, Nick Barnes | To advance the practicability of the denoising algorithms, this paper proposes a novel single-stage blind real image denoising network (RIDNet) by employing a modular architecture. |
316 | Noise Flow: Noise Modeling With Conditional Normalizing Flows | Abdelrahman Abdelhamed, Marcus A. Brubaker, Michael S. Brown | This paper introduces Noise Flow, a powerful and accurate noise model based on recent normalizing flow architectures. |
317 | Bottleneck Potentials in Markov Random Fields | Ahmed Abbas, Paul Swoboda | To solve the ensuing inference problem, we propose high-quality relaxations and efficient algorithms for solving them. |
318 | Seeing Motion in the Dark | Chen Chen, Qifeng Chen, Minh N. Do, Vladlen Koltun | In this paper, we present deep processing of very dark raw videos: on the order of one lux of illuminance. To support this line of work, we collect a new dataset of raw low-light videos, in which high-resolution raw data is captured at video rate. |
319 | SENSE: A Shared Encoder Network for Scene-Flow Estimation | Huaizu Jiang, Deqing Sun, Varun Jampani, Zhaoyang Lv, Erik Learned-Miller, Jan Kautz | We introduce a compact network for holistic scene flow estimation, called SENSE, which shares common encoder features among four closely-related tasks: optical flow estimation, disparity estimation from stereo, occlusion estimation, and semantic segmentation. |
320 | Adversarial Feedback Loop | Firas Shama, Roey Mechrez, Alon Shoshan, Lihi Zelnik-Manor | In this paper we propose a novel method that makes an explicit use of the discriminator in test-time, in a feedback manner in order to improve the generator results. |
321 | Dynamic-Net: Tuning the Objective Without Re-Training for Synthesis Tasks | Alon Shoshan, Roey Mechrez, Lihi Zelnik-Manor | In this paper we present a first attempt at alleviating the need for re-training. |
322 | AutoGAN: Neural Architecture Search for Generative Adversarial Networks | Xinyu Gong, Shiyu Chang, Yifan Jiang, Zhangyang Wang | In this paper, we present the first preliminary study on introducing the NAS algorithm to generative adversarial networks (GANs), dubbed AutoGAN. |
323 | Co-Evolutionary Compression for Unpaired Image Translation | Han Shu, Yunhe Wang, Xu Jia, Kai Han, Hanting Chen, Chunjing Xu, Qi Tian, Chang Xu | To this end, we develop a novel co-evolutionary approach for reducing their memory usage and FLOPs simultaneously. |
324 | Self-Supervised Representation Learning From Multi-Domain Data | Zeyu Feng, Chang Xu, Dacheng Tao | We present an information-theoretically motivated constraint for self-supervised representation learning from multiple related domains. |
325 | Controlling Neural Networks via Energy Dissipation | Michael Moeller, Thomas Mollenhoff, Daniel Cremers | In this work we propose energy dissipating networks that iteratively compute a descent direction with respect to a given cost function or energy at the currently estimated reconstruction. |
326 | Indices Matter: Learning to Index for Deep Image Matting | Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu | By viewing the indices as a function of the feature map, we introduce the concept of ‘learning to index’, and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the pooling and upsampling operators, without extra training supervision. |
327 | LAP-Net: Level-Aware Progressive Network for Image Dehazing | Yunan Li, Qiguang Miao, Wanli Ouyang, Zhenxin Ma, Huijuan Fang, Chao Dong, Yining Quan | In this paper, we propose a level-aware progressive network (LAP-Net) for single image dehazing. |
328 | Attention Augmented Convolutional Networks | Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le | In this paper, we propose to augment convolutional networks with self-attention by concatenating convolutional feature maps with a set of feature maps produced via a novel relative self-attention mechanism. |
329 | MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning | Zechun Liu, Haoyuan Mu, Xiangyu Zhang, Zichao Guo, Xin Yang, Kwang-Ting Cheng, Jian Sun | In this paper, we propose a novel meta learning approach for automatic channel pruning of very deep neural networks. |
330 | Accelerate CNN via Recursive Bayesian Pruning | Yuefu Zhou, Ya Zhang, Yanfeng Wang, Qi Tian | To solve the problem, under the Bayesian framework, we here propose a layer-wise Recursive Bayesian Pruning method (RBP). |
331 | HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions | Duo Li, Aojun Zhou, Anbang Yao | In this paper, we present Harmonious Bottleneck on two Orthogonal dimensions (HBO), a novel architecture unit, specially tailored to boost the accuracy of extremely lightweight MobileNets at the level of less than 40 MFLOPs. |
332 | O2U-Net: A Simple Noisy Label Detection Approach for Deep Neural Networks | Jinchi Huang, Lie Qu, Rongfei Jia, Binqiang Zhao | This paper proposes a novel noisy label detection approach, named O2U-net, for deep neural networks without human annotations. |
333 | Continual Learning by Asymmetric Loss Approximation With Single-Side Overestimation | Dongmin Park, Seokil Hong, Bohyung Han, Kyoung Mu Lee | We propose a novel approach to continual learning by approximating a true loss function using an asymmetric quadratic function with one of its sides overestimated. |
334 | Label-PEnet: Sequential Label Propagation and Enhancement Networks for Weakly Supervised Instance Segmentation | Weifeng Ge, Sheng Guo, Weilin Huang, Matthew R. Scott | Unlike previous methods which are composed of multiple offline stages, we propose Sequential Label Propagation and Enhancement Networks (referred as Label-PEnet) that progressively transforms image-level labels to pixel-wise labels in a coarse-to-fine manner. |
335 | LIP: Local Importance-Based Pooling | Ziteng Gao, Limin Wang, Gangshan Wu | In this paper, we present a unified framework over the existing downsampling layers (e.g., average pooling, max pooling, and strided convolution) from a local importance view. |
336 | Global Feature Guided Local Pooling | Takumi Kobayashi | In this paper, we propose a flexible pooling method which adaptively tunes the pooling functionality based on input features without manually fixing it beforehand. |
337 | Conditional Coupled Generative Adversarial Networks for Zero-Shot Domain Adaptation | Jinghua Wang, Jianmin Jiang | In this paper, we tackle the challenging zero-shot domain adaptation (ZSDA) problem, where the target-domain data is non-available in the training stage. |
338 | Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks | Aamir Mustafa, Salman Khan, Munawar Hayat, Roland Goecke, Jianbing Shen, Ling Shao | To counter this, we propose to class-wise disentangle the intermediate feature representations of deep networks. |
339 | Hyperpixel Flow: Semantic Correspondence With Multi-Layer Neural Features | Juhong Min, Jongmin Lee, Jean Ponce, Minsu Cho | Taking advantage of the condensed features of hyperpixels, we develop an effective real-time matching algorithm based on Hough geometric voting. |
340 | Information Entropy Based Feature Pooling for Convolutional Neural Networks | Weitao Wan, Jiansheng Chen, Tianpeng Li, Yiqing Huang, Jingqi Tian, Cheng Yu, Youze Xue | Based on this idea, we propose the entropy-based feature weighting method for semantics-aware feature pooling which can be readily integrated into various CNN architectures for both training and inference. |
341 | Patchwork: A Patch-Wise Attention Network for Efficient Object Detection and Segmentation in Video Streams | Yuning Chai | In this paper, we explore the idea of hard attention aimed for latency-sensitive applications. |
342 | AttentionRNN: A Structured Spatial Attention Mechanism | Siddhesh Khandelwal, Leonid Sigal | In this paper we develop a novel structured spatial attention mechanism which is end-to-end trainable and can be integrated with any feed-forward convolutional neural network. |
343 | Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution | Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, Jiashi Feng | In this work, we propose to factorize the mixed feature maps by their frequencies, and design a novel Octave Convolution (OctConv) operation to store and process feature maps that vary spatially "slower" at a lower spatial resolution, reducing both memory and computation cost. |
344 | Domain Intersection and Domain Difference | Sagie Benaim, Michael Khaitov, Tomer Galanti, Lior Wolf | We present a method for recovering the shared content between two visual domains as well as the content that is unique to each domain. |
345 | Learned Video Compression | Oren Rippel, Sanjay Nair, Carissa Lew, Steve Branson, Alexander G. Anderson, Lubomir Bourdev | We present a new algorithm for video coding, learned end-to-end for the low-latency mode. |
346 | Local Relation Networks for Image Recognition | Han Hu, Zheng Zhang, Zhenda Xie, Stephen Lin | This paper presents a new image feature extractor, called the local relation layer, that adaptively determines aggregation weights based on the compositional relationship of local pixel pairs. |
347 | DiscoNet: Shapes Learning on Disconnected Manifolds for 3D Editing | Eloi Mehr, Ariane Jourdan, Nicolas Thome, Matthieu Cord, Vincent Guitteny | In this work, we present an intelligent and user-friendly 3D editing tool, where the edited model is constrained to lie onto a learned manifold of realistic shapes. |
348 | Deep Residual Learning in the JPEG Transform Domain | Max Ehrlich, Larry S. Davis | We introduce a general method of performing Residual Network inference and learning in the JPEG transform domain that allows the network to consume compressed images as input. |
349 | Approximated Bilinear Modules for Temporal Modeling | Xinqi Zhu, Chang Xu, Langwen Hui, Cewu Lu, Dacheng Tao | We consider two less-emphasized temporal properties of video: 1. Temporal cues are fine-grained; 2. Temporal modeling needs reasoning. To tackle both problems at once, we exploit approximated bilinear modules (ABMs) for temporal modeling. |
350 | Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation | Chengchao Shen, Mengqi Xue, Xinchao Wang, Jie Song, Li Sun, Mingli Song | In this paper, we study how to exploit such heterogeneous pre-trained networks, known as teachers, so as to train a customized student network that tackles a set of selective tasks defined by the user. |
351 | Data-Free Learning of Student Networks | Hanting Chen, Yunhe Wang, Chang Xu, Zhaohui Yang, Chuanjian Liu, Boxin Shi, Chunjing Xu, Chao Xu, Qi Tian | To this end, we propose a novel framework for training efficient deep neural networks by exploiting generative adversarial networks (GANs). |
352 | Deep Closest Point: Learning Representations for Point Cloud Registration | Yue Wang, Justin M. Solomon | To address local optima and other difficulties in the ICP pipeline, we propose a learning-based method, titled Deep Closest Point (DCP), inspired by recent techniques in computer vision and natural language processing. |
353 | Orientation-Aware Semantic Segmentation on Icosahedron Spheres | Chao Zhang, Stephan Liwicki, William Smith, Roberto Cipolla | In our work, we propose an orientation-aware CNN framework for the icosahedron mesh. |
354 | Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks | Zhaoyang Zhang, Jingyu Li, Wenqi Shao, Zhanglin Peng, Ruimao Zhang, Xiaogang Wang, Ping Luo | Toward addressing this issue, we present Groupable ConvNet (GroupNet) built by using a novel dynamic grouping convolution (DGConv) operation, which is able to learn the number of groups in an end-to-end manner. |
355 | HarDNet: A Low Memory Traffic Network | Ping Chao, Chao-Yang Kao, Yu-Shan Ruan, Chien-Hsiang Huang, Youn-Long Lin | We propose a Harmonic Densely Connected Network to achieve high efficiency in terms of both low MACs and memory traffic. |
356 | Dynamic Multi-Scale Filters for Semantic Segmentation | Junjun He, Zhongying Deng, Yu Qiao | To address these problems, this paper proposes a Dynamic Multi-scale Network (DMNet) to adaptively capture multi-scale contents for predicting pixel-level semantic labels. |
357 | Online Model Distillation for Efficient Video Inference | Ravi Teja Mullapudi, Steven Chen, Keyi Zhang, Deva Ramanan, Kayvon Fatahalian | In this paper, we employ the technique of model distillation (supervising a low-cost student model using the output of a high-cost teacher) to specialize accurate, low-cost semantic segmentation models to a target video stream. We also provide a new video dataset for evaluating the efficiency of inference over long running video streams. |
358 | Rethinking Zero-Shot Learning: A Conditional Visual Classification Perspective | Kai Li, Martin Renqiang Min, Yun Fu | With this reformulation, we develop algorithms targeting various ZSL settings. For the conventional setting, we propose to train a deep neural network that directly generates visual feature classifiers from the semantic attributes with an episode-based training scheme. For the generalized setting, we concatenate the learned highly discriminative classifiers for seen classes and the generated classifiers for unseen classes to classify visual features of all classes. For the transductive setting, we exploit unlabeled data to effectively calibrate the classifier generator using a novel learning-without-forgetting self-training mechanism and guide the process by a robust generalized cross-entropy loss. |
359 | Task-Driven Modular Networks for Zero-Shot Compositional Learning | Senthil Purushwalkam, Maximilian Nickel, Abhinav Gupta, Marc’Aurelio Ranzato | To alleviate this striking difference in efficiency, we propose a task-driven modular architecture for compositional reasoning and sample efficient learning. |
360 | Transductive Episodic-Wise Adaptive Metric for Few-Shot Learning | Limeng Qiao, Yemin Shi, Jia Li, Yaowei Wang, Tiejun Huang, Yonghong Tian | To this end, we propose a Transductive Episodic-wise Adaptive Metric (TEAM) framework for few-shot learning, by integrating the meta-learning paradigm with both deep metric learning and transductive inference. |
361 | Deep Multiple-Attribute-Perceived Network for Real-World Texture Recognition | Wei Zhai, Yang Cao, Jing Zhang, Zheng-Jun Zha | To address this problem, we propose a novel deep Multiple-Attribute-Perceived Network (MAP-Net) by progressively learning visual texture attributes in a mutually reinforced manner. |
362 | RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment | Guan’an Wang, Tianzhu Zhang, Jian Cheng, Si Liu, Yang Yang, Zengguang Hou | Different from existing methods, in this paper, we propose a novel and end-to-end Alignment Generative Adversarial Network (AlignGAN) for the RGB-IR Re-ID task. |
363 | EvalNorm: Estimating Batch Normalization Statistics for Evaluation | Saurabh Singh, Abhinav Shrivastava | In this paper we study this peculiar behavior of BN to gain a better understanding of the problem, and identify a cause. |
364 | Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification | Jianyuan Guo, Yuhui Yuan, Lang Huang, Chao Zhang, Jin-Ge Yao, Kai Han | In this paper, we address the missed contextual cues by exploiting both the accurate human parts and the coarse non-human parts. |
365 | Person Search by Text Attribute Query As Zero-Shot Learning | Qi Dong, Shaogang Gong, Xiatian Zhu | In this work, we present a deep learning method for attribute text description based person search without any query imagery. |
366 | Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval | Qing Liu, Lingxi Xie, Huiyu Wang, Alan L. Yuille | In this paper, we investigate this problem from the viewpoint of domain adaptation which we show is critical in improving feature embedding in the zero-shot scenario. |
367 | Active Learning for Deep Detection Neural Networks | Hamed H. Aghdam, Abel Gonzalez-Garcia, Joost van de Weijer, Antonio M. Lopez | In this paper, we propose a method to perform active learning of object detectors based on convolutional neural networks. |
368 | One-Shot Neural Architecture Search via Self-Evaluated Template Network | Xuanyi Dong, Yi Yang | In this paper, we propose a Self-Evaluated Template Network (SETN) to improve the quality of the architecture candidates for evaluation so that it is more likely to cover competitive candidates. |
369 | Batch DropBlock Network for Person Re-Identification and Beyond | Zuozhuo Dai, Mingqiang Chen, Xiaodong Gu, Siyu Zhu, Ping Tan | In this paper, we propose the Batch DropBlock (BDB) Network, which is a two-branch network composed of a conventional ResNet-50 as the global branch and a feature dropping branch. The global branch encodes the global salient representations. Meanwhile, the feature dropping branch consists of an attentive feature learning module called Batch DropBlock, which randomly drops the same region of all input feature maps in a batch to reinforce the attentive feature learning of local regions. The network then concatenates features from both branches and provides a more comprehensive and spatially distributed feature representation. |
370 | Omni-Scale Feature Learning for Person Re-Identification | Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, Tao Xiang | In this paper, a novel deep ReID CNN is designed, termed Omni-Scale Network (OSNet), for omni-scale feature learning. |
371 | Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation | Linfeng Zhang, Jiebo Song, Anni Gao, Jingwei Chen, Chenglong Bao, Kaisheng Ma | In this paper, we propose a general training framework named self distillation, which notably enhances the performance (accuracy) of convolutional neural networks through shrinking the size of the network rather than aggrandizing it. |
372 | Diversity With Cooperation: Ensemble Methods for Few-Shot Classification | Nikita Dvornik, Cordelia Schmid, Julien Mairal | In this paper, we go a step further and show that by addressing the fundamental high-variance issue of few-shot learning classifiers, it is possible to significantly outperform current meta-learning techniques. |
373 | Enhancing 2D Representation via Adjacent Views for 3D Shape Retrieval | Cheng Xu, Zhaoqun Li, Qiang Qiu, Biao Leng, Jingfei Jiang | In this paper, we propose a convolutional neural network based method, CenterNet, to enhance each individual 2D view using its neighboring ones. |
374 | Adversarial Fine-Grained Composition Learning for Unseen Attribute-Object Recognition | Kun Wei, Muli Yang, Hao Wang, Cheng Deng, Xianglong Liu | In this paper, we propose a novel adversarial fine-grained composition learning model for unseen attribute-object pair recognition. |
375 | Auto-ReID: Searching for a Part-Aware ConvNet for Person Re-Identification | Ruijie Quan, Xuanyi Dong, Yu Wu, Linchao Zhu, Yi Yang | To solve these problems, we propose a retrieval-based search algorithm over a specifically designed reID search space, named Auto-ReID. |
376 | Second-Order Non-Local Attention Networks for Person Re-Identification | Bryan (Ning) Xia, Yuan Gong, Yizhe Zhang, Christian Poellabauer | In this paper, we propose a novel attention mechanism to directly model long-range relationships via second-order feature statistics. |
377 | Fast Computation of Content-Sensitive Superpixels and Supervoxels Using Q-Distances | Zipeng Ye, Ran Yi, Minjing Yu, Yong-Jin Liu, Ying He | In this paper, we propose a much faster queue-based graph distance (called q-distance). |
378 | Progressive-X: Efficient, Anytime, Multi-Model Fitting Algorithm | Daniel Barath, Jiri Matas | The Progressive-X algorithm, Prog-X in short, is proposed for geometric multi-model fitting. |
379 | Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection | Yingyue Xu, Dan Xu, Xiaopeng Hong, Wanli Ouyang, Rongrong Ji, Min Xu, Guoying Zhao | In this paper, we add message-passing between features and predictions and propose a deep unified CRF saliency model. |
380 | Selectivity or Invariance: Boundary-Aware Salient Object Detection | Jinming Su, Jia Li, Yu Zhang, Changqun Xia, Yonghong Tian | To address this selectivity-invariance dilemma, we propose a novel boundary-aware network with successive dilation for image-based SOD. |
381 | Online Unsupervised Learning of the 3D Kinematic Structure of Arbitrary Rigid Bodies | Urbano Miguel Nunes, Yiannis Demiris | In contrast, we propose to tackle this problem in an online unsupervised fashion, by recursively maintaining the metric distance of the scene’s 3D structure, while achieving real-time performance. |
382 | Few-Shot Generalization for Single-Image 3D Reconstruction via Priors | Bram Wallace, Bharath Hariharan | To address this problem, we present a new model architecture that reframes single-view 3D reconstruction as learnt, category agnostic refinement of a provided, category-specific prior. |
383 | Digging Into Self-Supervised Monocular Depth Estimation | Clement Godard, Oisin Mac Aodha, Michael Firman, Gabriel J. Brostow | In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. |
384 | Learning Object-Specific Distance From a Monocular Image | Jing Zhu, Yi Fang | Observing that the traditional inverse perspective mapping algorithm performs poorly for objects far away from the camera or on the curved road, in this paper, we address the challenging distance estimation problem by developing the first end-to-end learning-based model to directly predict distances for given objects in the images. |
385 | Unsupervised 3D Reconstruction Networks | Geonho Cha, Minsik Lee, Songhwai Oh | In this paper, we propose 3D unsupervised reconstruction networks (3D-URN), which reconstruct the 3D structures of instances in a given object category from their 2D feature points under an orthographic camera model. |
386 | 3D Point Cloud Generative Adversarial Network Based on Tree Structured Graph Convolutions | Dong Wook Shu, Sung Woo Park, Junseok Kwon | In this paper, we propose a novel generative adversarial network (GAN) for 3D point clouds generation, which is called tree-GAN. |
387 | Visualization of Convolutional Neural Networks for Monocular Depth Estimation | Junjie Hu, Yan Zhang, Takayuki Okatani | To cope with a difficulty with optimization through a deep CNN, we propose to use another network to predict those relevant image pixels in a forward computation. |
388 | Co-Separating Sounds of Visual Objects | Ruohan Gao, Kristen Grauman | We introduce a co-separation training paradigm that permits learning object-level sounds from unlabeled multi-source videos. |
389 | BMN: Boundary-Matching Network for Temporal Action Proposal Generation | Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, Shilei Wen | Based on BM mechanism, we propose an effective, efficient and end-to-end proposal generation method, named Boundary-Matching Network (BMN), which generates proposals with precise temporal boundaries as well as reliable confidence scores simultaneously. |
390 | Weakly Supervised Temporal Action Localization Through Contrast Based Evaluation Networks | Ziyi Liu, Le Wang, Qilin Zhang, Zhanning Gao, Zhenxing Niu, Nanning Zheng, Gang Hua | To address this challenge, we propose the Contrast-based Localization EvaluAtioN Network (CleanNet) with our new action proposal evaluator, which provides pseudo-supervision by leveraging the temporal contrast in snippet-level action classification predictions. |
391 | Progressive Sparse Local Attention for Video Object Detection | Chaoxu Guo, Bin Fan, Jie Gu, Qian Zhang, Shiming Xiang, Veronique Prinet, Chunhong Pan | Instead of relying on optical flow, this paper proposes a novel module called Progressive Sparse Local Attention (PSLA), which establishes the spatial correspondence between features across frames in a local region with progressively sparser stride and uses the correspondence to propagate features. |
392 | Reasoning About Human-Object Interactions Through Dual Attention Networks | Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou | In this work we propose a Dual Attention Network model which reasons about human-object interactions. |
393 | DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation | Xiaohui Zeng, Renjie Liao, Li Gu, Yuwen Xiong, Sanja Fidler, Raquel Urtasun | In this paper, we propose the differentiable mask-matching network (DMM-Net) for solving the video object segmentation problem where the initial object masks are provided. |
394 | Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query | Hao Wang, Cheng Deng, Junchi Yan, Dacheng Tao | To address these issues, we propose an asymmetric cross-guided attention network for actor and action video segmentation from natural language query. |
395 | AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation | Huaijia Lin, Xiaojuan Qi, Jiaya Jia | In this paper, we propose AGSS-VOS to segment multiple objects in one feed-forward path via instance-agnostic and instance-specific modules. |
396 | Global-Local Temporal Representations for Video Person Re-Identification | Jianing Li, Jingdong Wang, Qi Tian, Wen Gao, Shiliang Zhang | This paper proposes the Global-Local Temporal Representation (GLTR) to exploit the multi-scale temporal cues in video sequences for video person Re-Identification (ReID). |
397 | AdvIT: Adversarial Frames Identifier Based on Temporal Consistency in Videos | Chaowei Xiao, Ruizhi Deng, Bo Li, Taesung Lee, Benjamin Edwards, Jinfeng Yi, Dawn Song, Mingyan Liu, Ian Molloy | In this paper, we propose AdvIT, an efficient and effective method to detect adversarial frames within videos against different types of attacks, based on the temporal consistency property of videos. |
398 | RANet: Ranking Attention Network for Fast Video Object Segmentation | Ziqin Wang, Jun Xu, Li Liu, Fan Zhu, Ling Shao | In this paper, we develop a real-time yet very accurate Ranking Attention Network (RANet) for VOS. |
399 | Spatial-Temporal Relation Networks for Multi-Object Tracking | Jiarui Xu, Yue Cao, Zheng Zhang, Han Hu | In this paper, we present a unified framework for similarity measurement based on spatial-temporal relation network which could simultaneously encode various cues and perform reasoning across both spatial and temporal domains. |
400 | Bridging the Gap Between Detection and Tracking: A Unified Approach | Lianghua Huang, Xin Zhao, Kaiqi Huang | In this paper, instead of redesigning a new tracking-by-detection algorithm, we aim to explore a general framework for building trackers directly upon almost any advanced object detector. |
401 | Learning the Model Update for Siamese Trackers | Lichao Zhang, Abel Gonzalez-Garcia, Joost van de Weijer, Martin Danelljan, Fahad Shahbaz Khan | Therefore, we propose to replace the handcrafted update function with a method which learns to update. |
402 | Fast-deepKCF Without Boundary Effect | Linyu Zheng, Ming Tang, Yingying Chen, Jinqiao Wang, Hanqing Lu | In order to achieve real-time tracking speed while maintaining high localization accuracy, in this paper, we propose a novel CF tracker, fdKCF*, which casts aside the popular acceleration tool, i.e., fast Fourier transform, employed by all existing CF trackers, and exploits the inherent high-overlap among real (i.e., noncyclic) and dense samples to efficiently construct the kernel matrix. |
403 | Program-Guided Image Manipulators | Jiayuan Mao, Xiuming Zhang, Yikai Li, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu | In this paper, we present the Program-Guided Image Manipulator (PG-IM), inducing neuro-symbolic program-like representations to represent and manipulate images. |
404 | Calibration of Axial Fisheye Cameras Through Generic Virtual Central Models | Pierre-Andre Brousseau, Sebastien Roy | This paper proposes a new calibration method for large field of view cameras. |
405 | Micro-Baseline Structured Light | Vishwanath Saragadam, Jian Wang, Mohit Gupta, Shree Nayar | We propose Micro-baseline Structured Light (MSL), a novel 3D imaging approach designed for small form-factor devices such as cell-phones and miniature robots. |
406 | λ-Net: Reconstruct Hyperspectral Images From a Snapshot Measurement | Xin Miao, Xin Yuan, Yunchen Pu, Vassilis Athitsos | We propose the λ-net, which reconstructs hyperspectral images (e.g., with 24 spectral channels) from a single-shot measurement. |
407 | Deep Depth From Aberration Map | Masako Kashiwagi, Nao Mishima, Tatsuo Kozakaya, Shinsaku Hiura | In this work, we propose a novel method which realizes a single-shot deep depth measurement based on physical depth cue using only an off-the-shelf camera and lens. |
408 | A Dataset of Multi-Illumination Images in the Wild | Lukas Murmann, Michael Gharbi, Miika Aittala, Fredo Durand | We introduce a new multi-illumination dataset of more than 1000 real scenes, each captured in high dynamic range and high resolution, under 25 lighting conditions. |
409 | Monocular Neural Image Based Rendering With Continuous View Control | Xu Chen, Jie Song, Otmar Hilliges | We propose a method to produce a continuous stream of novel views under fine-grained (e.g., 1 degree step-size) camera control at interactive rates. |
410 | Multi-View Image Fusion | Marc Comino Trinidad, Ricardo Martin Brualla, Florian Kainz, Janne Kontkanen | We present a novel cascaded feature extraction method that enables us to synergetically learn optical flow at different resolution levels. |
411 | Enhancing Low Light Videos by Exploring High Sensitivity Camera Noise | Wei Wang, Xin Chen, Cheng Yang, Xiang Li, Xuemei Hu, Tao Yue | In this paper, we explore the physical origins of the practical high sensitivity noise in digital cameras, model them mathematically, and propose to enhance the low light videos based on the noise model by using an LSTM-based neural network. |
412 | Deep Restoration of Vintage Photographs From Scanned Halftone Prints | Qifan Gao, Xiao Shu, Xiaolin Wu | In this research, we adopt a novel strategy of two-stage deep learning, in which the restoration task is divided into two stages: the removal of printing artifacts and the inverse of halftoning. |
413 | Context-Aware Image Matting for Simultaneous Foreground and Alpha Estimation | Qiqi Hou, Feng Liu | This paper presents a context-aware natural image matting method for simultaneous foreground and alpha matte estimation. |
414 | CFSNet: Toward a Controllable Feature Space for Image Restoration | Wei Wang, Ruiming Guo, Yapeng Tian, Wenming Yang | This motivates us to exquisitely design a unified interactive framework for general image restoration tasks. |
415 | Deep Blind Hyperspectral Image Fusion | Wu Wang, Weihong Zeng, Yue Huang, Xinghao Ding, John Paisley | We propose a method for the blind HIF problem based on deep learning, where the estimation of the observation model and the fusion process are optimized iteratively and alternately during super-resolution reconstruction. |
416 | Fully Convolutional Pixel Adaptive Image Denoiser | Sungmin Cha, Taesup Moon | We propose a new image denoising algorithm, dubbed as Fully Convolutional Adaptive Image DEnoiser (FC-AIDE), that can learn from an offline supervised training set with a fully convolutional neural network as well as adaptively fine-tune the supervised model for each given noisy image. |
417 | Coherent Semantic Attention for Image Inpainting | Hongyu Liu, Bin Jiang, Yi Xiao, Chao Yang | To handle this problem, we investigate human behavior in repairing pictures and propose a refined deep generative model-based approach with a novel coherent semantic attention (CSA) layer, which not only preserves contextual structure but also makes more effective predictions of missing parts by modeling the semantic relevance between the hole features. |
418 | Embedded Block Residual Network: A Recursive Restoration Model for Single-Image Super-Resolution | Yajun Qiu, Ruxin Wang, Dapeng Tao, Jun Cheng | In this paper, we believe that the lower-frequency and higher-frequency information in images have different levels of complexity and should be restored by models of different representational capacity. |
419 | Fast Image Restoration With Multi-Bin Trainable Linear Units | Shuhang Gu, Wen Li, Luc Van Gool, Radu Timofte | In this paper we propose a novel activation function, the multi-bin trainable linear unit (MTLU), for increasing the nonlinear modeling capacity together with lighter and shallower networks. |
420 | Counting With Focus for Free | Zenglin Shi, Pascal Mettes, Cees G. M. Snoek | This paper aims to count arbitrary objects in images. |
421 | SynDeMo: Synergistic Deep Feature Alignment for Joint Learning of Depth and Ego-Motion | Behzad Bozorgtabar, Mohammad Saeed Rad, Dwarikanath Mahapatra, Jean-Philippe Thiran | In this work, we demonstrate the benefit of using geometric information from synthetic images, coupled with scene depth information, to recover the scale in depth and ego-motion estimation from monocular videos. |
422 | Diverse Image Synthesis From Semantic Layouts via Conditional IMLE | Ke Li, Tianhao Zhang, Jitendra Malik | In this paper, we focus on the problem of generating images from semantic segmentation maps and present a simple new method that can generate an arbitrary number of images with diverse appearance for the same semantic layout. |
423 | Towards Bridging Semantic Gap to Improve Semantic Segmentation | Yanwei Pang, Yazhao Li, Jianbing Shen, Ling Shao | To solve this problem, we explore two strategies for robust feature fusion. |
424 | Generating Diverse and Descriptive Image Captions Using Visual Paraphrases | Lixin Liu, Jiajun Tang, Xiaojun Wan, Zongming Guo | In this paper, aiming to improve the diversity and descriptiveness of generated image captions, we propose a model utilizing visual paraphrases (different sentences describing the same image) in captioning datasets. |
425 | Learning to Collocate Neural Modules for Image Captioning | Xu Yang, Hanwang Zhang, Jianfei Cai | To render existing encoder-decoder image captioners such human-like reasoning, we propose a novel framework: learning to Collocate Neural Modules (CNM), to generate the “inner pattern” connecting visual encoder and language decoder. |
426 | Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning | Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing | To address this concern, we propose Seq-CVAE which learns a latent space for every word. |
427 | Why Does a Visual Question Have Different Answers? | Nilavra Bhattacharya, Qing Li, Danna Gurari | We propose a taxonomy of nine plausible reasons, and create two labelled datasets consisting of 45,000 visual questions indicating which reasons led to answer differences. We then propose a novel problem of predicting directly from a visual question which reasons will cause answer differences as well as a novel algorithm for this purpose. |
428 | G3raphGround: Graph-Based Language Grounding | Mohit Bajaj, Lanjun Wang, Leonid Sigal | In this paper we present an end-to-end framework for grounding of phrases in images. |
429 | Scene Text Visual Question Answering | Ali Furkan Biten, Ruben Tito, Andres Mafla, Lluis Gomez, Marcal Rusinol, Ernest Valveny, C.V. Jawahar, Dimosthenis Karatzas | In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process. |
430 | Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM | Lu Sheng, Dan Xu, Wanli Ouyang, Xiaogang Wang | In this paper we tackle the joint learning problem of keyframe detection and visual odometry towards monocular visual SLAM systems. |
431 | MVSCRF: Learning Multi-View Stereo With Conditional Random Fields | Youze Xue, Jiansheng Chen, Weitao Wan, Yiqing Huang, Cheng Yu, Tianpeng Li, Jiayu Bao | We present a deep-learning architecture for multi-view stereo with conditional random fields (MVSCRF). |
432 | Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses | Eric Brachmann, Carsten Rother | We present Neural-Guided RANSAC (NG-RANSAC), an extension to the classic RANSAC algorithm from robust optimization. |
433 | Efficient Learning on Point Clouds With Basis Point Sets | Sergey Prokudin, Christoph Lassner, Javier Romero | In this work we propose basis point sets as a highly efficient and fully general way to process point clouds with machine learning algorithms. |
434 | Cross View Fusion for 3D Human Pose Estimation | Haibo Qiu, Chunyu Wang, Jingdong Wang, Naiyan Wang, Wenjun Zeng | We present an approach to recover absolute 3D human poses from multi-view images by incorporating multi-view geometric priors in our model. |
435 | Shape-Aware Human Pose and Shape Reconstruction Using Multi-View Images | Junbang Liang, Ming C. Lin | We propose a scalable neural network framework to reconstruct the 3D mesh of a human body from multi-view images, in the subspace of the SMPL model. |
436 | Monocular Piecewise Depth Estimation in Dynamic Scenes by Exploiting Superpixel Relations | Yan Di, Henrique Morimitsu, Shan Gao, Xiangyang Ji | In this paper, we propose a novel and specially designed method for piecewise dense monocular depth estimation in dynamic scenes. |
437 | Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization | Hajime Taira, Ignacio Rocco, Jiri Sedlar, Masatoshi Okutomi, Josef Sivic, Tomas Pajdla, Torsten Sattler, Akihiko Torii | In this paper, we thus focus on pose verification. |
438 | DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch | Shivam Duggal, Shenlong Wang, Wei-Chiu Ma, Rui Hu, Raquel Urtasun | Our goal is to significantly speed up the runtime of current state-of-the-art stereo algorithms to enable real-time inference. |
439 | Convolutional Sequence Generation for Skeleton-Based Action Synthesis | Sijie Yan, Zhizhong Li, Yuanjun Xiong, Huahan Yan, Dahua Lin | In this work, we aim to generate long actions represented as sequences of skeletons. |
440 | Onion-Peel Networks for Deep Video Completion | Seoung Wug Oh, Sungho Lee, Joon-Young Lee, Seon Joo Kim | We propose the onion-peel networks for video completion. |
441 | Copy-and-Paste Networks for Deep Video Inpainting | Sungho Lee, Seoung Wug Oh, DaeYeun Won, Seon Joo Kim | We present a novel deep learning based algorithm for video inpainting. |
442 | Content and Style Disentanglement for Artistic Style Transfer | Dmytro Kotovenko, Artsiom Sanakoyeu, Sabine Lang, Bjorn Ommer | We present a novel approach which captures particularities of style and the variations within and separates style and content. |
443 | Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? | Rameen Abdal, Yipeng Qin, Peter Wonka | We propose an efficient algorithm to embed a given image into the latent space of StyleGAN. |
444 | Controllable Artistic Text Style Transfer via Shape-Matching GAN | Shuai Yang, Zhangyang Wang, Zhaowen Wang, Ning Xu, Jiaying Liu, Zongming Guo | In this paper, we present the first text style transfer network that allows for real-time control of the crucial stylistic degree of the glyph through an adjustable parameter. |
445 | Understanding Generalized Whitening and Coloring Transform for Universal Style Transfer | Tai-Yin Chiu | In this report, we generalize ZCA to the general form of WCT, provide an analytical performance analysis from the angle of neural style transfer, and show why ZCA is a good choice for style transfer among different WCTs and why some WCTs are not well applicable for style transfer. |
446 | Learning Implicit Generative Models by Matching Perceptual Features | Cicero Nogueira dos Santos, Youssef Mroueh, Inkit Padhi, Pierre Dognin | More specifically, we propose a new effective MM approach that learns implicit generative models by performing mean and covariance matching of features extracted from pretrained ConvNets. |
447 | Free-Form Image Inpainting With Gated Convolution | Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang | We present a generative image inpainting system to complete images with free-form mask and guidance. |
448 | FiNet: Compatible and Diverse Fashion Image Inpainting | Xintong Han, Zuxuan Wu, Weilin Huang, Matthew R. Scott, Larry S. Davis | In this paper, we propose to explicitly model visual compatibility through fashion image inpainting. |
449 | InGAN: Capturing and Retargeting the “DNA” of a Natural Image | Assaf Shocher, Shai Bagon, Phillip Isola, Michal Irani | In this paper we propose an “Internal GAN” (InGAN) — an image-specific GAN — which trains on a single input image and learns its internal distribution of patches. |
450 | Seeing What a GAN Cannot Generate | David Bau, Jun-Yan Zhu, Jonas Wulff, William Peebles, Hendrik Strobelt, Bolei Zhou, Antonio Torralba | In this work, we visualize mode collapse at both the distribution level and the instance level. |
451 | COCO-GAN: Generation by Parts via Conditional Coordinating | Chieh Hubert Lin, Chia-Che Chang, Yu-Sheng Chen, Da-Cheng Juan, Wei Wei, Hwann-Tzong Chen | Inspired by such behavior and the fact that machines also have computational constraints, we propose COnditional COordinate GAN (COCO-GAN) of which the generator generates images by parts based on their spatial coordinates as the condition. |
452 | Neural Turtle Graphics for Modeling City Road Layouts | Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, Antonio Torralba, Sanja Fidler | We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts. |
453 | Texture Fields: Learning Texture Representations in Function Space | Michael Oechsle, Lars Mescheder, Michael Niemeyer, Thilo Strauss, Andreas Geiger | In this paper, we propose Texture Fields, a novel texture representation which is based on regressing a continuous 3D function parameterized with a neural network. |
454 | PointFlow: 3D Point Cloud Generation With Continuous Normalizing Flows | Guandao Yang, Xun Huang, Zekun Hao, Ming-Yu Liu, Serge Belongie, Bharath Hariharan | This paper proposes a principled probabilistic framework to generate 3D point clouds by modeling them as a distribution of distributions. |
455 | Meta-Sim: Learning to Generate Synthetic Datasets | Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, Sanja Fidler | The goal of our work is to automatically synthesize labeled datasets that are relevant for a downstream task. |
456 | Specifying Object Attributes and Relations in Interactive Scene Generation | Oron Ashual, Lior Wolf | We introduce a method for the generation of images from an input scene graph. |
457 | SinGAN: Learning a Generative Model From a Single Natural Image | Tamar Rott Shaham, Tali Dekel, Tomer Michaeli | We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. |
458 | VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research | Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang | We present a new large-scale multilingual video description dataset, VATEX, which contains over 41,250 videos and 825,000 captions in both English and Chinese. |
459 | A Graph-Based Framework to Bridge Movies and Synopses | Yu Xiong, Qingqiu Huang, Lingfeng Guo, Hang Zhou, Bolei Zhou, Dahua Lin | To facilitate the efforts along this direction, we construct a dataset called Movie Synopses Associations (MSA) over 327 movies, which provides a synopsis for each movie, together with annotated associations between synopsis paragraphs and movie segments. On top of this dataset, we develop a framework to perform matching between movie segments and synopsis paragraphs. |
460 | From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason | Ajeet Kumar Singh, Anand Mishra, Shashank Shekhar, Anirban Chakraborty | In this work, we present a VQA model which can read scene texts and perform reasoning on a knowledge graph to arrive at an accurate answer. |
461 | Counterfactual Critic Multi-Agent Training for Scene Graph Generation | Long Chen, Hanwang Zhang, Jun Xiao, Xiangnan He, Shiliang Pu, Shih-Fu Chang | To this end, we propose a Counterfactual critic Multi-Agent Training (CMAT) approach. |
462 | Robust Change Captioning | Dong Huk Park, Trevor Darrell, Anna Rohrbach | We present a novel Dual Dynamic Attention Model (DUDA) to perform robust Change Captioning. To study the problem in depth, we collect a CLEVR-Change dataset, built off the CLEVR engine, with 5 types of scene changes. |
463 | Attention on Attention for Image Captioning | Lun Huang, Wenmin Wang, Jie Chen, Xiao-Yong Wei | In this paper, we propose an Attention on Attention (AoA) module, which extends the conventional attention mechanisms to determine the relevance between attention results and queries. |
464 | Dynamic Graph Attention for Referring Expression Comprehension | Sibei Yang, Guanbin Li, Yizhou Yu | In this paper, we explore the problem of referring expression comprehension from the perspective of language-driven visual reasoning, and propose a dynamic graph attention network to perform multi-step reasoning by modeling both the relationships among the objects in the image and the linguistic structure of the expression. |
465 | Visual Semantic Reasoning for Image-Text Matching | Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu | To address this issue, we propose a simple and interpretable reasoning model to generate visual representation that captures key objects and semantic concepts of a scene. |
466 | Phrase Localization Without Paired Training Examples | Josiah Wang, Lucia Specia | We postulate that such paired annotations are unnecessary, and propose the first method for the phrase localization problem where neither training procedure nor paired, task-specific data is required. |
467 | Learning to Assemble Neural Module Tree Networks for Visual Grounding | Daqing Liu, Hanwang Zhang, Feng Wu, Zheng-Jun Zha | In this paper, we propose to ground natural language in an intuitive, explainable, and composite fashion as it should be. |
468 | A Fast and Accurate One-Stage Approach to Visual Grounding | Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, Jiebo Luo | We propose a simple, fast, and accurate one-stage approach to visual grounding, inspired by the following insight. |
469 | Zero-Shot Grounding of Objects From Natural Language Queries | Arka Sadhu, Kan Chen, Ram Nevatia | We propose a new single-stage model called ZSGNet which combines the detector network and the grounding system and predicts classification scores and regression parameters. We also introduce new datasets, sub-sampled from Flickr30k Entities and Visual Genome, that enable evaluations for the four conditions. |
470 | Towards Unconstrained End-to-End Text Spotting | Siyang Qin, Alessandro Bissacco, Michalis Raptis, Yasuhisa Fujii, Ying Xiao | We propose an end-to-end trainable network that can simultaneously detect and recognize text of arbitrary shape, making substantial progress on the open problem of reading scene text of irregular shape. |
471 | What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis | Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee | This paper addresses this difficulty with three major contributions. First, we examine the inconsistencies of training and evaluation datasets, and the performance gap that results from these inconsistencies. Second, we introduce a unified four-stage STR framework that most existing STR models fit into. |
472 | Sparse and Imperceivable Adversarial Attacks | Francesco Croce, Matthias Hein | We propose a new black-box technique to craft adversarial examples aiming at minimizing l_0-distance to the original image. |
473 | Enhancing Adversarial Example Transferability With an Intermediate Level Attack | Qian Huang, Isay Katsman, Horace He, Zeqi Gu, Serge Belongie, Ser-Nam Lim | We introduce the Intermediate Level Attack (ILA), which attempts to fine-tune an existing adversarial example for greater black-box transferability by increasing its perturbation on a pre-specified layer of the source model, improving upon state-of-the-art methods. |
474 | Implicit Surface Representations As Layers in Neural Networks | Mateusz Michalkiewicz, Jhony K. Pontes, Dominic Jack, Mahsa Baktashmotlagh, Anders Eriksson | To overcome this limitation we propose a novel formulation that permits the use of implicit representations of curves and surfaces, of arbitrary topology, as individual layers in Neural Network architectures with end-to-end trainability. |
475 | A Tour of Convolutional Networks Guided by Linear Interpreters | Pablo Navarrete Michelini, Hanwen Liu, Yunhua Lu, Xingqun Jiang | We introduce a hooking layer, called a LinearScope, which allows us to run the network and the linear interpreter in parallel. |
476 | Small Steps and Giant Leaps: Minimal Newton Solvers for Deep Learning | Joao F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi | We propose a fast second-order method that can be used as a drop-in replacement for current deep learning solvers. |
477 | Semantic Adversarial Attacks: Parametric Transformations That Fool Deep Classifiers | Ameya Joshi, Amitangshu Mukherjee, Soumik Sarkar, Chinmay Hegde | In this paper, we consider a different setting: what happens if the adversary could only alter specific attributes of the input image? |
478 | Hilbert-Based Generative Defense for Adversarial Examples | Yang Bai, Yan Feng, Yisen Wang, Tao Dai, Shu-Tao Xia, Yong Jiang | Therefore, we propose a more advanced Hilbert curve scan order to model the pixel dependencies in this paper. |
479 | On the Efficacy of Knowledge Distillation | Jang Hyun Cho, Bharath Hariharan | In this paper, we present a thorough evaluation of the efficacy of knowledge distillation and its dependence on student and teacher architectures. |
480 | Sym-Parameterized Dynamic Inference for Mixed-Domain Image Translation | Simyung Chang, SeongUk Park, John Yang, Nojun Kwak | We propose a method to expand the concept of "multi-domain" from data to the loss area, and to combine the characteristics of each domain to create an image. |
481 | Better and Faster: Exponential Loss for Image Patch Matching | Shuang Wang, Yanfeng Li, Xuefeng Liang, Dou Quan, Bowu Yang, Shaowei Wei, Licheng Jiao | To assist the exponential losses, we introduce the hard positive sample mining to further enhance the effectiveness. |
482 | Physical Adversarial Textures That Fool Visual Object Tracking | Rey Reza Wiyatno, Anqi Xu | We present a method for creating inconspicuous-looking textures that, when displayed as posters in the physical world, cause visual object tracking systems to become confused. |
483 | Wasserstein GAN With Quadratic Transport Cost | Huidong Liu, Xianfeng Gu, Dimitris Samaras | In this paper, we propose WGAN-QC, a WGAN with quadratic transport cost. |
484 | Scalable Verified Training for Provably Robust Image Classification | Sven Gowal, Krishnamurthy (Dj) Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, Pushmeet Kohli | Through a comprehensive analysis, we show how a simple bounding technique, interval bound propagation (IBP), can be exploited to train large provably robust neural networks that beat the state-of-the-art in verified accuracy. |
485 | Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks | Ruihao Gong, Xianglong Liu, Shenghu Jiang, Tianxiang Li, Peng Hu, Jiazhen Lin, Fengwei Yu, Junjie Yan | To address this problem, in this paper we propose Differentiable Soft Quantization (DSQ) to bridge the gap between the full-precision and low-bit networks. |
486 | The LogBarrier Adversarial Attack: Making Effective Use of Decision Boundary Information | Chris Finlay, Aram-Alexandre Pooladian, Adam Oberman | We design a new untargeted attack, based on these best practices, using the well-regarded logarithmic barrier method. |
487 | Proximal Mean-Field for Neural Network Quantization | Thalaiyasingam Ajanthan, Puneet K. Dokania, Richard Hartley, Philip H. S. Torr | In this work, we cast NN quantization as a discrete labelling problem, and by examining relaxations, we design an efficient iterative optimization procedure that involves stochastic gradient descent followed by a projection. |
488 | Improving Adversarial Robustness via Guided Complement Entropy | Hao-Yun Chen, Jhao-Hong Liang, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan | In this paper, we propose a new training paradigm called Guided Complement Entropy (GCE) that is capable of achieving “adversarial defense for free,” which involves no additional procedures in the process of improving adversarial robustness. |
489 | A Geometry-Inspired Decision-Based Attack | Yujia Liu, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard | In this paper, we propose qFool, a novel decision-based attack algorithm that can generate adversarial examples using a small number of queries. |
490 | Universal Perturbation Attack Against Image Retrieval | Jie Li, Rongrong Ji, Hong Liu, Xiaopeng Hong, Yue Gao, Qi Tian | To this end, we propose a novel method to generate retrieval-against UAP to break the neighbourhood relationships of image features via degrading the corresponding ranking metric. |
491 | Bayesian Optimized 1-Bit CNNs | Jiaxin Gu, Junhe Zhao, Xiaolong Jiang, Baochang Zhang, Jianzhuang Liu, Guodong Guo, Rongrong Ji | In this paper, we propose a novel approach, called Bayesian optimized 1-bit CNNs (denoted as BONNs), taking the advantage of Bayesian learning, a well-established strategy for hard problems, to significantly improve the performance of extreme 1-bit CNNs. |
492 | Rethinking ImageNet Pre-Training | Kaiming He, Ross Girshick, Piotr Dollar | We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization. |
493 | Defending Against Universal Perturbations With Shared Adversarial Training | Chaithanya Kumar Mummadi, Thomas Brox, Jan Hendrik Metzen | In this work, we show that adversarial training is more effective in preventing universal perturbations, where the same perturbation needs to fool a classifier on many inputs. |
494 | Adaptive Activation Thresholding: Dynamic Routing Type Behavior for Interpretability in Convolutional Neural Networks | Yiyou Sun, Sathya N. Ravi, Vikas Singh | In this paper, we show how a simple modification of the SGD scheme can help provide dynamic/EM routing type behavior in convolutional neural networks. |
495 | XRAI: Better Attributions Through Regions | Andrei Kapishnikov, Tolga Bolukbasi, Fernanda Viegas, Michael Terry | In this paper, we 1) present a novel region-based attribution method, XRAI, that builds upon integrated gradients (Sundararajan et al. 2017), 2) introduce evaluation methods for empirically assessing the quality of image-based saliency maps (Performance Information Curves (PICs)), and 3) contribute an axiom-based sanity check for attribution methods. |
496 | Guessing Smart: Biased Sampling for Efficient Black-Box Adversarial Attacks | Thomas Brunner, Frederik Diehl, Michael Truong Le, Alois Knoll | We consider adversarial examples for image classification in the black-box decision-based setting. |
497 | Mask-Guided Attention Network for Occluded Pedestrian Detection | Yanwei Pang, Jin Xie, Muhammad Haris Khan, Rao Muhammad Anwer, Fahad Shahbaz Khan, Ling Shao | We propose an approach for occluded pedestrian detection with the following contributions. |
498 | Spectral Feature Transformation for Person Re-Identification | Chuanchen Luo, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang | To relieve the issue, we propose to formulate the whole data batch as a similarity graph. |
499 | Permutation-Invariant Feature Restructuring for Correlation-Aware Image Set-Based Recognition | Xiaofeng Liu, Zhenhua Guo, Site Li, Lingsheng Kong, Ping Jia, Jane You, B.V.K. Vijaya Kumar | We consider the problem of comparing the similarity of image sets with variable-quantity, quality and un-ordered heterogeneous images. |
500 | Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization | Chufeng Tang, Lu Sheng, Zhaoxiang Zhang, Xiaolin Hu | We propose a flexible Attribute Localization Module (ALM) to adaptively discover the most discriminative regions and learns the regional features for each attribute at multiple levels. |
501 | Correlation Congruence for Knowledge Distillation | Baoyun Peng, Xiao Jin, Jiaheng Liu, Dongsheng Li, Yichao Wu, Yu Liu, Shunfeng Zhou, Zhaoning Zhang | In this work, we propose a new framework named correlation congruence for knowledge distillation (CCKD), which transfers not only the instance-level information but also the correlation between instances. |
502 | Dynamic Curriculum Learning for Imbalanced Data Classification | Yiru Wang, Weihao Gan, Jie Yang, Wei Wu, Junjie Yan | To address this problem, we propose a unified framework called Dynamic Curriculum Learning (DCL) to adaptively adjust the sampling strategy and loss weight in each batch, which results in better ability of generalization and discrimination. |
503 | Video Face Clustering With Unknown Number of Clusters | Makarand Tapaswi, Marc T. Law, Sanja Fidler | To this end, we propose Ball Cluster Learning (BCL), a supervised approach to carve the embedding space into balls of equal size, one for each cluster. |
504 | Targeted Mismatch Adversarial Attack: Query With a Flower to Retrieve the Tower | Giorgos Tolias, Filip Radenovic, Ondrej Chum | We introduce the concept of targeted mismatch attack for deep learning based retrieval systems to generate an adversarial image to conceal the query image. |
505 | Fashion++: Minimal Edits for Outfit Improvement | Wei-Lin Hsiao, Isay Katsman, Chao-Yuan Wu, Devi Parikh, Kristen Grauman | We introduce Fashion++, an approach that proposes minimal adjustments to a full-body clothing outfit that will have maximal impact on its fashionability. |
506 | Semi-Supervised Pedestrian Instance Synthesis and Detection With Mutual Reinforcement | Si Wu, Sihao Lin, Wenhao Wu, Mohamed Azzam, Hau-San Wong | We propose a GAN-based scene-specific instance synthesis and classification model for semi-supervised pedestrian detection. |
507 | SILCO: Show a Few Images, Localize the Common Object | Tao Hu, Pascal Mettes, Jia-Hong Huang, Cees G. M. Snoek | In this work, we propose a new task along this research direction, which we call few-shot common-localization. |
508 | A Deep Step Pattern Representation for Multimodal Retinal Image Registration | Jimmy Addison Lee, Peng Liu, Jun Cheng, Huazhu Fu | This paper presents a novel feature-based method that is built upon a convolutional neural network (CNN) to learn the deep representation for multimodal retinal image registration. |
509 | Deep Graphical Feature Learning for the Feature Matching Problem | Zhen Zhang, Wee Sun Lee | In this paper, we address this problem by proposing a graph neural network model to transform coordinates of feature points into local features. |
510 | Minimum Delay Object Detection From Video | Dong Lao, Ganesh Sundaramoorthi | We consider the problem of detecting objects, as they come into view, from videos in an online fashion. |
511 | Learning With Average Precision: Training Image Retrieval With a Listwise Loss | Jerome Revaud, Jon Almazan, Rafael S. Rezende, Cesar Roberto de Souza | In this paper we propose instead to directly optimize the global mAP by leveraging recent advances in listwise loss formulations. |
512 | Learning to Find Common Objects Across Few Image Collections | Amirreza Shaban, Amir Rahimi, Shray Bansal, Stephen Gould, Byron Boots, Richard Hartley | Given a collection of bags where each bag is a set of images, our goal is to select one image from each bag such that the selected images are from the same object class. |
513 | Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection | Lu Zhang, Xiangyu Zhu, Xiangyu Chen, Xu Yang, Zhen Lei, Zhiyong Liu | In this paper, we propose a novel Aligned Region CNN (AR-CNN) to handle the weakly aligned multispectral data in an end-to-end way. |
514 | Deep Self-Learning From Noisy Labels | Jiangfan Han, Ping Luo, Xiaogang Wang | Unlike previous works constrained by many conditions, making them infeasible to real noisy cases, this work presents a novel deep self-learning framework to train a robust network on the real noisy datasets without extra supervision. |
515 | DSConv: Efficient Convolution Operator | Marcelo Gennari do Nascimento, Roger Fawcett, Victor Adrian Prisacariu | We introduce DSConv, a flexible quantized convolution operator that replaces single-precision operations with their far less expensive integer counterparts, while maintaining the probability distributions over both the kernel weights and the outputs. |
516 | Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once | Jiangfan Han, Xiaoyi Dong, Ruimao Zhang, Dongdong Chen, Weiming Zhang, Nenghai Yu, Ping Luo, Xiaogang Wang | In this paper, we propose the first Multi-target Adversarial Network (MAN), which can generate multi-target adversarial samples with a single model. |
517 | Explicit Shape Encoding for Real-Time Instance Segmentation | Wenqiang Xu, Haiyang Wang, Fubo Qi, Cewu Lu | In this paper, we propose a novel top-down instance segmentation framework based on explicit shape encoding, named ESE-Seg. |
518 | IMP: Instance Mask Projection for High Accuracy Semantic Segmentation of Things | Cheng-Yang Fu, Tamara L. Berg, Alexander C. Berg | In this work, we present a new operator, called Instance Mask Projection (IMP), which projects a predicted instance segmentation as a new feature for semantic segmentation. |
519 | Video Instance Segmentation | Linjie Yang, Yuchen Fan, Ning Xu | In this paper we present a new computer vision task, named video instance segmentation. To facilitate research on this new task, we propose a large-scale benchmark called YouTube-VIS, which consists of 2,883 high-resolution YouTube videos, a 40-category label set and 131k high-quality instance masks. |
520 | Attention Bridging Network for Knowledge Transfer | Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu | In this paper, we use knowledge from the source domain to guide the network’s response to categories shared with the target domain. |
521 | Self-Supervised Difference Detection for Weakly-Supervised Semantic Segmentation | Wataru Shimoda, Keiji Yanai | In this paper, to make the most of such mapping functions, we assume that the results of the mapping function include noise, and we improve the accuracy by removing noise. |
522 | SPGNet: Semantic Prediction Guidance for Scene Parsing | Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Honghui Shi | In this work, we propose a Semantic Prediction Guidance (SPG) module which learns to re-weight the local features through the guidance from pixel-wise semantic prediction. |
523 | Gated-SCNN: Gated Shape CNNs for Semantic Segmentation | Towaki Takikawa, David Acuna, Varun Jampani, Sanja Fidler | Here, we propose a new two-stream CNN architecture for semantic segmentation that explicitly wires shape information as a separate processing branch, i.e. shape stream, that processes information in parallel to the classical stream. |
524 | DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing | Yongcheng Liu, Bin Fan, Gaofeng Meng, Jiwen Lu, Shiming Xiang, Chunhong Pan | Here we propose DensePoint, a general architecture to learn densely contextual representation for point cloud processing. |
525 | AMP: Adaptive Masked Proxies for Few-Shot Segmentation | Mennatullah Siam, Boris N. Oreshkin, Martin Jagersand | We propose a novel adaptive masked proxies method that constructs the final segmentation layer weights from few labelled samples. |
526 | Universal Semi-Supervised Semantic Segmentation | Tarun Kalluri, Girish Varma, Manmohan Chandraker, C.V. Jawahar | In this paper, we pose the novel problem of universal semi-supervised semantic segmentation and propose a solution framework, to meet the dual needs of lower annotation and deployment costs. |
527 | Accelerate Learning of Deep Hashing With Gradient Attention | Long-Kai Huang, Jianda Chen, Sinno Jialin Pan | To address this issue, we propose a new deep hashing model integrated with a novel gradient attention mechanism. |
528 | SVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval | Qing-Yuan Jiang, Yi He, Gen Li, Jian Lin, Lei Li, Wu-Jun Li | In this paper, we introduce a large-scale short video dataset, called SVD, for the NDVR task. |
529 | Block Annotation: Better Image Annotation With Sub-Image Decomposition | Hubert Lin, Paul Upchurch, Kavita Bala | To recover the necessary global structure for applications such as characterizing spatial context and affordance relationships, we propose an effective method to inpaint block-annotated images with high-quality labels without additional human effort. |
530 | Probabilistic Deep Ordinal Regression Based on Gaussian Processes | Yanzhu Liu, Fan Wang, Adams Wai Kin Kong | This paper adapts traditional GP regression to the ordinal regression problem by using both conjugate and non-conjugate ordinal likelihoods. |
531 | Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations | Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, Vicente Ordonez | In this work, we present a framework to measure and mitigate intrinsic biases with respect to protected variables, such as gender, in visual recognition tasks. |
532 | Teacher Guided Architecture Search | Pouya Bashivan, Mark Tensen, James J. DiCarlo | As one step toward this goal, we use representational similarity analysis to evaluate the similarity of internal activations of candidate networks with those of a (fixed, high performing) teacher network. |
533 | FACSIMILE: Fast and Accurate Scans From an Image in Less Than a Second | David Smith, Matthew Loper, Xiaochen Hu, Paris Mavroidis, Javier Romero | We propose FACSIMILE (FAX), a method that estimates a detailed body from a single photo, lowering the bar for creating virtual representations of humans. |
534 | Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild | Yu Rong, Ziwei Liu, Cheng Li, Kaidi Cao, Chen Change Loy | In this work, we aim to perform a comprehensive study on cost and effectiveness trade-off between different annotations. |
535 | Human Mesh Recovery From Monocular Images via a Skeleton-Disentangled Representation | Yu Sun, Yun Ye, Wu Liu, Wenpeng Gao, Yili Fu, Tao Mei | We describe an end-to-end method for recovering 3D human body mesh from single images and monocular videos. |
536 | Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture From Images “In the Wild” | Silvia Zuffi, Angjoo Kanazawa, Tanya Berger-Wolf, Michael J. Black | We present the first method to perform automatic 3D pose, shape and texture capture of animals from images acquired in-the-wild. |
537 | Object-Driven Multi-Layer Scene Decomposition From a Single Image | Helisa Dhamo, Nassir Navab, Federico Tombari | We present a method that tackles the challenge of predicting color and depth behind the visible content of an image. |
538 | Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics | Michael Niemeyer, Lars Mescheder, Michael Oechsle, Andreas Geiger | In this work, we present Occupancy Flow, a novel spatio-temporal representation of time-varying 3D geometry with implicit correspondences. |
539 | Joint Monocular 3D Vehicle Detection and Tracking | Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krahenbuhl, Trevor Darrell, Fisher Yu | In this paper, we propose a novel online framework for 3D vehicle detection and tracking from monocular videos. |
540 | Fingerspelling Recognition in the Wild With Iterative Visual Attention | Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Diane Brentari, Greg Shakhnarovich, Karen Livescu | In this paper we focus on recognition of fingerspelling sequences in American Sign Language (ASL) videos collected in the wild, mainly from YouTube and Deaf social media. We also introduce a newly collected data set of crowdsourced annotations of fingerspelling in the wild, and show that performance can be further improved with this additional data set. |
541 | PointAE: Point Auto-Encoder for 3D Statistical Shape and Texture Modelling | Hang Dai, Ling Shao | In this paper, we propose a Point Auto-Encoder (PointAE) with skip-connection, attention blocks for 3D statistical shape modelling directly on 3D points. |
542 | Multi-Garment Net: Learning to Dress 3D People From Images | Bharat Lal Bhatnagar, Garvita Tiwari, Christian Theobalt, Gerard Pons-Moll | We present Multi-Garment Network (MGN), a method to predict body shape and clothing, layered on top of the SMPL model from a few frames (1-8) of a video. |
543 | Skeleton-Aware 3D Human Shape Reconstruction From Point Clouds | Haiyong Jiang, Jianfei Cai, Jianmin Zheng | Particularly, we develop an end-to-end framework, where we propose a graph aggregation module to augment PointNet++ by extracting better point features, an attention module to better map unordered point features into ordered skeleton joint features, and a skeleton graph module to extract better joint features for SMPL parameter regression. |
544 | AMASS: Archive of Motion Capture As Surface Shapes | Naureen Mahmood, Nima Ghorbani, Nikolaus F. Troje, Gerard Pons-Moll, Michael J. Black | To address this, we introduce AMASS, a large and varied database of human motion that unifies 15 different optical marker-based mocap datasets by representing them within a common framework and parameterization. |
545 | Person-in-WiFi: Fine-Grained Person Perception Using WiFi | Fei Wang, Sanping Zhou, Stanislav Panev, Jinsong Han, Dong Huang | In this paper, we take one step forward to show that fine-grained person perception is possible even with 1D sensors: WiFi antennas. |
546 | FAB: A Robust Facial Landmark Detection Framework for Motion-Blurred Videos | Keqiang Sun, Wayne Wu, Tinghao Liu, Shuo Yang, Quan Wang, Qiang Zhou, Zuochang Ye, Chen Qian | In this paper, we propose a framework named FAB that takes advantage of structure consistency in the temporal dimension for facial landmark detection in motion-blurred videos. |
547 | Attentional Feature-Pair Relation Networks for Accurate Face Recognition | Bong-Nam Kang, Yonghyun Kim, Bongjin Jun, Daijin Kim | In this paper, we propose a novel face recognition method, called Attentional Feature-pair Relation Network (AFRN), which represents the face by the relevant pairs of local appearance block features with their attention scores. |
548 | Action Recognition With Spatial-Temporal Discriminative Filter Banks | Brais Martinez, Davide Modolo, Yuanjun Xiong, Joseph Tighe | In this work we focus on how to improve the representation capacity of the network, but rather than altering the backbone, we focus on improving the last layers of the network, where changes have low impact in terms of computational cost. With the proposed approach, we obtain state-of-the-art performance on Kinetics-400 and Something-Something-V1, the two major large-scale action recognition benchmarks. |
549 | EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition | Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen | We focus on multi-modal fusion for egocentric action recognition, and propose a novel architecture for multi-modal temporal-binding, i.e. the combination of modalities within a range of temporal offsets. |
550 | Weakly-Supervised Action Localization With Background Modeling | Phuc Xuan Nguyen, Deva Ramanan, Charless C. Fowlkes | We describe a latent approach that learns to detect actions in long sequences given training videos with only whole-video class labels. |
551 | Grouped Spatial-Temporal Aggregation for Efficient Action Recognition | Chenxu Luo, Alan L. Yuille | In this paper, we propose a novel decomposition method that decomposes the feature channels into spatial and temporal groups in parallel. |
552 | Temporal Structure Mining for Weakly Supervised Action Detection | Tan Yu, Zhou Ren, Yuncheng Li, Enxu Yan, Ning Xu, Junsong Yuan | To alleviate this problem in WSAD, we propose the temporal structure mining (TSM) approach. |
553 | Temporal Recurrent Networks for Online Action Detection | Mingze Xu, Mingfei Gao, Yi-Ting Chen, Larry S. Davis, David J. Crandall | In this paper, we propose a novel framework, the Temporal Recurrent Network (TRN), to model greater temporal context of each frame by simultaneously performing online action detection and anticipation of the immediate future. |
554 | StartNet: Online Detection of Action Start in Untrimmed Videos | Mingfei Gao, Mingze Xu, Larry S. Davis, Richard Socher, Caiming Xiong | We propose StartNet to address Online Detection of Action Start (ODAS) where action starts and their associated categories are detected in untrimmed, streaming videos. |
555 | Video Classification With Channel-Separated Convolutional Networks | Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli | This paper studies the effects of different design choices in 3D group convolutional networks for video classification. |
556 | Predicting the Future: A Jointly Learnt Model for Action Anticipation | Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes | Inspired by human neurological structures for action anticipation, we present an action anticipation model that enables the prediction of plausible future actions by forecasting both the visual and temporal future. |
557 | Human-Aware Motion Deblurring | Ziyi Shen, Wenguan Wang, Xiankai Lu, Jianbing Shen, Haibin Ling, Tingfa Xu, Ling Shao | This paper proposes a human-aware deblurring model that disentangles the motion blur between foreground (FG) humans and background (BG). To further benefit the research towards Human-aware Image Deblurring, we introduce a large-scale dataset, named HIDE, which consists of 8,422 blurry and sharp image pairs with 65,784 densely annotated FG human bounding boxes. |
558 | Fast Video Object Segmentation via Dynamic Targeting Network | Lu Zhang, Zhe Lin, Jianming Zhang, Huchuan Lu, You He | We propose a new model for fast and accurate video object segmentation. |
559 | Solving Vision Problems via Filtering | Sean I. Young, Aous T. Naman, Bernd Girod, David Taubman | We propose a new, filtering approach for solving a large number of regularized inverse problems commonly found in computer vision. |
560 | GAN-Based Projector for Faster Recovery With Convergence Guarantees in Linear Inverse Problems | Ankit Raj, Yuqi Li, Yoram Bresler | Here, we propose a new method of deploying a GAN-based prior to solve linear inverse problems using projected gradient descent (PGD). |
561 | Scoot: A Perceptual Metric for Facial Sketches | Deng-Ping Fan, ShengChuan Zhang, Yu-Huan Wu, Yun Liu, Ming-Ming Cheng, Bo Ren, Paul L. Rosin, Rongrong Ji | In this paper, we design a perceptual metric, called Structure Co-Occurrence Texture (Scoot), which simultaneously considers the block-level spatial structure and co-occurrence texture statistics. |
562 | Learning Filter Basis for Convolutional Neural Network Compression | Yawei Li, Shuhang Gu, Luc Van Gool, Radu Timofte | Thus, in this paper, we try to reduce the number of parameters of CNNs by learning a basis of the filters in convolutional layers. |
563 | End-to-End Learning of Representations for Asynchronous Event-Based Data | Daniel Gehrig, Antonio Loquercio, Konstantinos G. Derpanis, Davide Scaramuzza | In this work, we introduce a general framework to convert event streams into grid-based representations by means of strictly differentiable operations. |
564 | ERL-Net: Entangled Representation Learning for Single Image De-Raining | Guoqing Wang, Changming Sun, Arcot Sowmya | In this paper, we hypothesize that there exists an inherent mapping between the low-quality embedding and a latent optimal one, with which the generator (decoder) can produce much better results. |
565 | Perceptual Deep Depth Super-Resolution | Oleg Voynov, Alexey Artemov, Vage Egiazarian, Alexander Notchenko, Gleb Bobrovskikh, Evgeny Burnaev, Denis Zorin | The main idea of our approach is to measure the quality of depth map upsampling using renderings of resulting 3D surfaces. |
566 | 3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera | Iro Armeni, Zhi-Yang He, JunYoung Gwak, Amir R. Zamir, Martin Fischer, Jitendra Malik, Silvio Savarese | To alleviate this we devise a semi-automatic framework that employs existing detection methods and enhances them using two main constraints: I. framing of query images sampled on panoramas to maximize the performance of 2D detectors, and II. |
567 | Floorplan-Jigsaw: Jointly Estimating Scene Layout and Aligning Partial Scans | Cheng Lin, Changjian Li, Wenping Wang | We present a novel approach to align partial 3D reconstructions which may not have substantial overlap. |
568 | Enforcing Geometric Constraints of Virtual Normal for Depth Prediction | Wei Yin, Yifan Liu, Chunhua Shen, Youliang Yan | In this work, we show the importance of the high-order 3D geometric constraints for depth prediction. |
569 | Deep Contextual Attention for Human-Object Interaction Detection | Tiancai Wang, Rao Muhammad Anwer, Muhammad Haris Khan, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Jorma Laaksonen | We propose a contextual attention framework for human-object interaction detection. |
570 | Learning Compositional Neural Information Fusion for Human Parsing | Wenguan Wang, Zhijie Zhang, Siyuan Qi, Jianbing Shen, Yanwei Pang, Ling Shao | This work proposes to combine neural networks with the compositional hierarchy of human bodies for efficient and complete human parsing. |
571 | Attentional Neural Fields for Crowd Counting | Anran Zhang, Lei Yue, Jiayi Shen, Fan Zhu, Xiantong Zhen, Xianbin Cao, Ling Shao | In this paper, we propose the Attentional Neural Field (ANF) for crowd counting via density estimation. |
572 | Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning | Lifeng Fan, Wenguan Wang, Siyuan Huang, Xinyu Tang, Song-Chun Zhu | Together with VACATION, we propose a spatio-temporal graph neural network to explicitly represent the diverse gaze interactions in the social scenes and to infer atomic-level gaze communication by message passing. |
573 | Controllable Attention for Structured Layered Video Decomposition | Jean-Baptiste Alayrac, Joao Carreira, Relja Arandjelovic, Andrew Zisserman | The objective of this paper is to be able to separate a video into its natural layers, and to control which of the separated layers to attend to. |
574 | GANalyze: Toward Visual Definitions of Cognitive Image Properties | Lore Goetschalckx, Alex Andonian, Aude Oliva, Phillip Isola | We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability. |
575 | Saliency-Guided Attention Network for Image-Sentence Matching | Zhong Ji, Haoran Wang, Jungong Han, Yanwei Pang | Unlike previous approaches that predominantly deploy symmetrical architecture to represent both modalities, we introduce a Saliency-guided Attention Network (SAN) that is characterized by building an asymmetrical link between vision and language to efficiently learn a fine-grained cross-modal correlation. |
576 | CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval | Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao | In this paper, we propose Cross-modal Adaptive Message Passing (CAMP), which adaptively controls the information flow for message passing across modalities. |
577 | ACMM: Aligned Cross-Modal Memory for Few-Shot Image and Sentence Matching | Yan Huang, Liang Wang | In this work, we study this challenging scenario as few-shot image and sentence matching, and accordingly propose an Aligned Cross-Modal Memory (ACMM) model to memorize the rarely appeared content. |
578 | Creativity Inspired Zero-Shot Learning | Mohamed Elhoseiny, Mohamed Elfeki | We introduce a learning signal inspired by creativity literature that explores the unseen space with hallucinated class-descriptions and encourages careful deviation of their visual feature generations from seen classes while allowing knowledge transfer from seen to unseen classes. |
579 | Generating Easy-to-Understand Referring Expressions for Target Identifications | Mikihiro Tanaka, Takayuki Itamochi, Kenichi Narioka, Ikuro Sato, Yoshitaka Ushiku, Tatsuya Harada | This paper addresses the generation of referring expressions that not only refer to objects correctly but also let humans find them quickly. To evaluate our system, we created a new referring expression dataset whose images were acquired from Grand Theft Auto V (GTA V), limiting targets to persons. |
580 | Language-Agnostic Visual-Semantic Embeddings | Jonatas Wehrmann, Douglas M. Souza, Mauricio A. Lopes, Rodrigo C. Barros | This paper proposes a framework for training language-invariant cross-modal retrieval models. |
581 | Adversarial Representation Learning for Text-to-Image Matching | Nikolaos Sarafianos, Xiang Xu, Ioannis A. Kakadiaris | With that in mind, we introduce TIMAM: a Text-Image Modality Adversarial Matching approach that learns modality-invariant feature representations using adversarial and cross-modal matching objectives. |
582 | Multi-Modality Latent Interaction Network for Visual Question Answering | Peng Gao, Haoxuan You, Zhanpeng Zhang, Xiaogang Wang, Hongsheng Li | In this paper, we proposed the Multi-modality Latent Interaction module (MLI) to tackle this problem. |
583 | Learning Two-View Correspondences and Geometry Using Order-Aware Network | Jiahui Zhang, Dawei Sun, Zixin Luo, Anbang Yao, Lei Zhou, Tianwei Shen, Yurong Chen, Long Quan, Hongen Liao | Given putative correspondences of feature points in two views, in this paper, we propose Order-Aware Network, which infers the probabilities of correspondences being inliers and regresses the relative pose encoded by the essential matrix. |
584 | Learning Meshes for Dense Visual SLAM | Michael Bloesch, Tristan Laidlow, Ronald Clark, Stefan Leutenegger, Andrew J. Davison | In the present paper, we use triangular meshes as both compact and dense geometry representation. |
585 | EM-Fusion: Dynamic Object-Level SLAM With Probabilistic Data Association | Michael Strecke, Jorg Stuckler | In this paper, we propose a novel approach to dynamic SLAM with dense object-level representations. |
586 | ClusterSLAM: A SLAM Backend for Simultaneous Rigid Body Clustering and Motion Estimation | Jiahui Huang, Sheng Yang, Zishuo Zhao, Yu-Kun Lai, Shi-Min Hu | In this paper, we exploit the consensus of 3D motions among the landmarks extracted from the same rigid body for clustering and estimating static and dynamic objects in a unified manner. |
587 | Efficient and Robust Registration on the 3D Special Euclidean Group | Uttaran Bhattacharya, Venu Madhav Govindu | We present a robust, fast and accurate method for registration of 3D scans. |
588 | Algebraic Characterization of Essential Matrices and Their Averaging in Multiview Settings | Yoni Kasten, Amnon Geifman, Meirav Galun, Ronen Basri | This paper presents a novel approach that solves simultaneously for both camera orientations and positions. |
589 | Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis | Wen Liu, Zhixin Piao, Jie Min, Wenhan Luo, Lin Ma, Shenghua Gao | In this paper, we propose to use a 3D body mesh recovery module to disentangle the pose and shape, which can not only model the joint location and rotation but also characterize the personalized body shape. In addition, we build a new dataset, namely Impersonator (iPER) dataset, for the evaluation of human motion imitation, appearance transfer, and novel view synthesis. |
590 | RelGAN: Multi-Domain Image-to-Image Translation via Relative Attributes | Po-Wei Wu, Yu-Jing Lin, Che-Han Chang, Edward Y. Chang, Shih-Wei Liao | To address these limitations, we propose RelGAN, a new method for multi-domain image-to-image translation. |
591 | Attribute-Driven Spontaneous Motion in Unpaired Image Translation | Ruizheng Wu, Xin Tao, Xiaodong Gu, Xiaoyong Shen, Jiaya Jia | In this paper, we propose the spontaneous motion estimation module, along with a refinement part, to learn attribute-driven deformation between source and target domains. |
592 | Everybody Dance Now | Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros | This paper presents a simple method for “do as I do” motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves. In addition, we release a first-of-its-kind open-source dataset of videos that can be legally used for training and motion transfer. |
593 | Multimodal Style Transfer via Graph Cuts | Yulun Zhang, Chen Fang, Yilin Wang, Zhaowen Wang, Zhe Lin, Yun Fu, Jimei Yang | In this paper, we introduce a more flexible and general universal style transfer technique: multimodal style transfer (MST). |
594 | A Closed-Form Solution to Universal Style Transfer | Ming Lu, Hao Zhao, Anbang Yao, Yurong Chen, Feng Xu, Li Zhang | In this paper, we first propose a novel interpretation by treating it as the optimal transport problem. Then, we demonstrate the relations of our formulation with former works like Adaptive Instance Normalization (AdaIN) and Whitening and Coloring Transform (WCT). Finally, we derive a closed-form solution named Optimal Style Transfer (OST) under our formulation by additionally considering the content loss of Gatys. |
595 | Progressive Reconstruction of Visual Structure for Image Inpainting | Jingyuan Li, Fengxiang He, Lefei Zhang, Bo Du, Dacheng Tao | To address this issue, this paper proposes a Progressive Reconstruction of Visual Structure (PRVS) network that progressively reconstructs the structures and the associated visual feature. |
596 | Variational Adversarial Active Learning | Samarth Sinha, Sayna Ebrahimi, Trevor Darrell | We describe a pool-based semi-supervised active learning algorithm that implicitly learns this sampling mechanism in an adversarial manner. |
597 | Confidence Regularized Self-Training | Yang Zou, Zhiding Yu, Xiaofeng Liu, B.V.K. Vijaya Kumar, Jinsong Wang | To address the problem, we propose a confidence regularized self-training (CRST) framework, formulated as regularized self-training. |
598 | Anchor Loss: Modulating Loss Scale Based on Prediction Difficulty | Serim Ryou, Seong-Gyun Jeong, Pietro Perona | In this work, we define the prediction difficulty as a relative property coming from the confidence score gap between positive and negative labels. |
599 | Local Aggregation for Unsupervised Learning of Visual Embeddings | Chengxu Zhuang, Alex Lin Zhai, Daniel Yamins | Here, we describe a method that trains an embedding function to maximize a metric of local aggregation, causing similar data instances to move together in the embedding space, while allowing dissimilar instances to separate. |
600 | PR Product: A Substitute for Inner Product in Neural Networks | Zhennan Wang, Wenbin Zou, Chen Xu | In this paper, we analyze the inner product of weight vector w and data vector x in neural networks from the perspective of vector orthogonal decomposition and prove that the direction gradient of w decreases as the angle between them approaches 0 or π. |
601 | CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features | Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Youngjoon Yoo | We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches. |
602 | Towards Interpretable Object Detection by Unfolding Latent Structures | Tianfu Wu, Xi Song | The proposed method focuses on weakly-supervised extractive rationale generation, that is, learning to unfold latent discriminative part configurations of object instances automatically and simultaneously in detection without using any supervision for part configurations. |
603 | Scaling Object Detection by Transferring Classification Weights | Jason Kuen, Federico Perazzi, Zhe Lin, Jianming Zhang, Yap-Peng Tan | In this paper, we propose a novel weight transfer network (WTN) to effectively and efficiently transfer knowledge from a classification network's weights to a detection network's weights, allowing detection of novel classes without box supervision. |
604 | Scale-Aware Trident Networks for Object Detection | Yanghao Li, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang | Based on the findings from the exploration experiments, we propose a novel Trident Network (TridentNet) aiming to generate scale-specific feature maps with a uniform representational power. |
605 | Object-Aware Instance Labeling for Weakly Supervised Object Detection | Satoshi Kosugi, Toshihiko Yamasaki, Kiyoharu Aizawa | Instead of simply labeling the top-scoring region and its highly overlapping regions as positive and others as negative, we propose more effective instance labeling methods as follows. |
606 | Generative Modeling for Small-Data Object Detection | Lanlan Liu, Michael Muelly, Jia Deng, Tomas Pfister, Li-Jia Li | In this work we explore this problem from a generative modeling perspective by learning to generate new images with associated bounding boxes, and using these for training an object detector. |
607 | Transductive Learning for Zero-Shot Object Detection | Shafin Rahman, Salman Khan, Nick Barnes | To the best of our knowledge, we are the first to propose a transductive zero-shot object detection approach that convincingly reduces the domain-shift and model-bias against unseen classes. |
608 | Self-Training and Adversarial Background Regularization for Unsupervised Domain Adaptive One-Stage Object Detection | Seunghyeon Kim, Jaehoon Choi, Taekyung Kim, Changick Kim | In this paper, we introduce a weak self-training (WST) method and adversarial background score regularization (BSR) for domain adaptive one-stage object detection. |
609 | Memory-Based Neighbourhood Embedding for Visual Recognition | Suichan Li, Dapeng Chen, Bin Liu, Nenghai Yu, Rui Zhao | In this paper, we propose Memory-based Neighbourhood Embedding (MNE) to enhance a general CNN feature by considering its neighbourhood. |
610 | Self-Similarity Grouping: A Simple Unsupervised Cross Domain Adaptation Approach for Person Re-Identification | Yang Fu, Yunchao Wei, Guanshuo Wang, Yuqian Zhou, Honghui Shi, Thomas S. Huang | In this work, we explore how to harness the similar natural characteristics existing in the samples from the target domain for learning to conduct person re-ID in an unsupervised manner. |
611 | Deep Reinforcement Active Learning for Human-in-the-Loop Person Re-Identification | Zimo Liu, Jingya Wang, Shaogang Gong, Huchuan Lu, Dacheng Tao | In this work, we propose an alternative reinforcement learning based human-in-the-loop model which releases the restriction of pre-labelling and keeps model upgrading with progressively collected data. |
612 | A Dual-Path Model With Adaptive Attention for Vehicle Re-Identification | Pirazh Khorramshahi, Amit Kumar, Neehar Peri, Sai Saketh Rambhatla, Jun-Cheng Chen, Rama Chellappa | In this paper, we present a novel dual-path adaptive attention model for vehicle re-identification (AAVER). |
613 | Bayesian Loss for Crowd Count Estimation With Point Supervision | Zhiheng Ma, Xing Wei, Xiaopeng Hong, Yihong Gong | On the contrary, we propose Bayesian loss, a novel loss function which constructs a density contribution probability model from the point annotations. |
614 | Learning Spatial Awareness to Improve Crowd Counting | Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander G. Hauptmann | In this paper, we present a novel architecture called SPatial Awareness Network (SPANet) to incorporate spatial context for crowd counting. |
615 | GradNet: Gradient-Guided Network for Visual Object Tracking | Peixia Li, Boyu Chen, Wanli Ouyang, Dong Wang, Xiaoyun Yang, Huchuan Lu | In this work, we propose a novel gradient-guided network to exploit the discriminative information in gradients and update the template in the siamese network through feed-forward and backward operations. |
616 | FAMNet: Joint Learning of Feature, Affinity and Multi-Dimensional Assignment for Online Multiple Object Tracking | Peng Chu, Haibin Ling | In this paper, we present an end-to-end model, named FAMNet, where Feature extraction, Affinity estimation and Multi-dimensional assignment are refined in a single network. |
617 | Learning Discriminative Model Prediction for Tracking | Goutam Bhat, Martin Danelljan, Luc Van Gool, Radu Timofte | We develop an end-to-end tracking architecture, capable of fully exploiting both target and background appearance information for target model prediction. |
618 | DynamoNet: Dynamic Action and Motion Network | Ali Diba, Vivek Sharma, Luc Van Gool, Rainer Stiefelhagen | In this paper, we are interested in self-supervised learning of the motion cues in videos using dynamic motion filters for a better motion representation, in particular to boost human action recognition. |
619 | SlowFast Networks for Video Recognition | Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He | We present SlowFast networks for video recognition. |
620 | Generative Multi-View Human Action Recognition | Lichen Wang, Zhengming Ding, Zhiqiang Tao, Yunyu Liu, Yun Fu | In this work, we propose a Generative Multi-View Action Recognition (GMVAR) framework to address the challenges above. |
621 | Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition | Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Shilei Wen | We intuitively formulate the frame sampling procedure as multiple parallel Markov decision processes, each of which aims at picking out a frame/clip by gradually adjusting an initial sampling. |
622 | SCSampler: Sampling Salient Clips From Video for Efficient Action Recognition | Bruno Korbar, Du Tran, Lorenzo Torresani | In this paper we introduce a lightweight “clip-sampling” model that can efficiently identify the most salient temporal clips within a long video. |
623 | Weakly Supervised Energy-Based Learning for Action Segmentation | Jun Li, Peng Lei, Sinisa Todorovic | Our key contribution is a new constrained discriminative forward loss (CDFL) that we use for training the HMM and GRU under weak supervision. |
624 | What Would You Expect? Anticipating Egocentric Actions With Rolling-Unrolling LSTMs and Modality Attention | Antonino Furnari, Giovanni Maria Farinella | We tackle the problem by proposing an architecture able to anticipate actions at multiple temporal scales, using two LSTMs to 1) summarize the past and 2) formulate predictions about the future. |
625 | PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction | Amir Rasouli, Iuliia Kotseruba, Toni Kunic, John K. Tsotsos | We propose models for estimating pedestrian crossing intention and predicting their future trajectory. To this end, we propose a novel large-scale dataset designed for pedestrian intention estimation (PIE). |
626 | STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction | Yingfan Huang, Huikun Bi, Zhaoxin Li, Tianlu Mao, Zhaoqi Wang | In this work, we propose a Spatial-Temporal Graph Attention network (STGAT), based on a sequence-to-sequence architecture to predict future trajectories of pedestrians. |
627 | Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection | Khoi-Nguyen C. Mac, Dhiraj Joshi, Raymond A. Yeh, Jinjun Xiong, Rogerio S. Feris, Minh N. Do | We propose a novel locally-consistent deformable convolution, which utilizes the change in receptive fields and enforces a local coherency constraint to capture motion information effectively. |
628 | Dual Attention Matching for Audio-Visual Event Localization | Yu Wu, Linchao Zhu, Yan Yan, Yi Yang | In this paper, we investigate the audio-visual event localization problem. |
629 | Uncertainty-Aware Audiovisual Activity Recognition Using Deep Bayesian Variational Inference | Mahesh Subedar, Ranganath Krishnan, Paulo Lopez Meyer, Omesh Tickoo, Jonathan Huang | Our contribution in this work is to propose an uncertainty aware multimodal Bayesian fusion framework for activity recognition. |
630 | Non-Local Recurrent Neural Memory for Supervised Sequence Modeling | Canmiao Fu, Wenjie Pei, Qiong Cao, Chaopeng Zhang, Yong Zhao, Xiaoyong Shen, Yu-Wing Tai | To tackle this limitation, we propose the Non-local Recurrent Neural Memory (NRNM) for supervised sequence modeling, which performs non-local operations to learn full-order interactions within a sliding temporal block and models the global interactions between blocks in a gated recurrent manner. |
631 | Temporal Attentive Alignment for Large-Scale Video Domain Adaptation | Min-Hung Chen, Zsolt Kira, Ghassan AlRegib, Jaekwon Yoo, Ruxin Chen, Jian Zheng | Finally, we propose Temporal Attentive Adversarial Adaptation Network (TA3N), which explicitly attends to the temporal dynamics using domain discrepancy for more effective domain alignment, achieving state-of-the-art performance on four video DA datasets. |
632 | Action Assessment by Joint Relation Graphs | Jia-Hui Pan, Jibin Gao, Wei-Shi Zheng | We present a new model to assess the performance of actions from videos, through graph-based joint relation modelling. |
633 | Unsupervised Procedure Learning via Joint Dynamic Summarization | Ehsan Elhamifar, Zwe Naing | Our goal is to produce a summary of the procedure key-steps and their ordering needed to perform a given task, as well as localization of the key-steps in videos. |
634 | ViSiL: Fine-Grained Spatio-Temporal Video Similarity Learning | Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris | In this paper we introduce ViSiL, a Video Similarity Learning architecture that considers fine-grained Spatio-Temporal relations between pairs of videos — such relations are typically lost in previous video retrieval approaches that embed the whole frame or even the whole video into a vector descriptor before the similarity estimation. |
635 | Unsupervised Learning of Landmarks by Descriptor Vector Exchange | James Thewlis, Samuel Albanie, Hakan Bilen, Andrea Vedaldi | In this paper, we develop a new perspective on the equivariance approach by noting that dense landmark detectors can be interpreted as local image descriptors equipped with invariance to intra-category variations. |
636 | Learning Compositional Representations for Few-Shot Recognition | Pavel Tokmakov, Yu-Xiong Wang, Martial Hebert | In this work, we make a step towards bridging this gap between human and machine learning by introducing a simple regularization technique that allows the learned representation to be decomposable into parts. |
637 | Spectral Regularization for Combating Mode Collapse in GANs | Kanglin Liu, Wenming Tang, Fei Zhou, Guoping Qiu | In this paper, we present spectral regularization for GANs (SR-GANs), a new and robust method for combating the mode collapse problem in GANs. |
638 | Scaling and Benchmarking Self-Supervised Visual Representation Learning | Priya Goyal, Dhruv Mahajan, Abhinav Gupta, Ishan Misra | In this work, we revisit this principle and scale two popular self-supervised approaches to 100 million images. We also introduce an extensive benchmark across 9 different datasets and tasks. |
639 | Learning an Effective Equivariant 3D Descriptor Without Supervision | Riccardo Spezialetti, Samuele Salti, Luigi Di Stefano | In this paper, we explore the benefits of taking a step back in the direction of end-to-end learning of 3D descriptors by disentangling the creation of a robust and distinctive rotation equivariant representation, which can be learned from unoriented input data, and the definition of a good canonical orientation, required only at test time to obtain an invariant descriptor. |
640 | KPConv: Flexible and Deformable Convolution for Point Clouds | Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, Francois Goulette, Leonidas J. Guibas | We present Kernel Point Convolution (KPConv), a new design of point convolution, i.e., one that operates on point clouds without any intermediate representation. |
641 | Neural Inter-Frame Compression for Video Coding | Abdelaziz Djelouah, Joaquim Campos, Simone Schaub-Meyer, Christopher Schroers | Therefore, in this work we present an inter-frame compression approach for neural video coding that can seamlessly build up on different existing neural image codecs. |
642 | Task2Vec: Task Embedding for Meta-Learning | Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless C. Fowlkes, Stefano Soatto, Pietro Perona | We introduce a method to generate vectorial representations of visual classification tasks which can be used to reason about the nature of those tasks and their relations. |
643 | Deep Clustering by Gaussian Mixture Variational Autoencoders With Graph Embedding | Linxiao Yang, Ngai-Man Cheung, Jiaying Li, Jun Fang | We propose DGG: Deep clustering via a Gaussian-mixture variational autoencoder (VAE) with Graph embedding. |
644 | SoftTriple Loss: Deep Metric Learning Without Triplet Sampling | Qi Qian, Lei Shang, Baigui Sun, Juhua Hu, Hao Li, Rong Jin | Therefore, we propose the SoftTriple loss to extend the SoftMax loss with multiple centers for each class. |
645 | A Weakly Supervised Fine Label Classifier Enhanced by Coarse Supervision | Fariborz Taherkhani, Hadi Kazemi, Ali Dabouei, Jeremy Dawson, Nasser M. Nasrabadi | We propose a new deep model that leverages coarse images to improve the classification performance of fine images within the coarse category. |
646 | Gaussian Affinity for Max-Margin Class Imbalanced Learning | Munawar Hayat, Salman Khan, Syed Waqas Zamir, Jianbing Shen, Ling Shao | Here, we introduce the first hybrid loss function that jointly performs classification and clustering in a single formulation. |
647 | AttPool: Towards Hierarchical Feature Representation in Graph Convolutional Networks via Attention Mechanism | Jingjia Huang, Zhangheng Li, Nannan Li, Shan Liu, Ge Li | Here, we propose AttPool, which is a novel graph pooling module based on attention mechanism, to remedy the problem. |
648 | Deep Metric Learning With Tuplet Margin Loss | Baosheng Yu, Dacheng Tao | In this paper, we propose a new deep metric learning loss function, tuplet margin loss, using randomly selected samples from each mini-batch. |
649 | Normalized Wasserstein for Mixture Distributions With Applications in Adversarial Learning and Domain Adaptation | Yogesh Balaji, Rama Chellappa, Soheil Feizi | In this work, we focus on mixture distributions that arise naturally in several application domains where the data contains different sub-populations. |
650 | Fast and Practical Neural Architecture Search | Jiequan Cui, Pengguang Chen, Ruiyu Li, Shu Liu, Xiaoyong Shen, Jiaya Jia | In this paper, we propose a fast and practical neural architecture search (FPNAS) framework for automatic network design. |
651 | Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learning | Jiwoong Park, Minsik Lee, Hyung Jin Chang, Kyuewang Lee, Jin Young Choi | We propose a symmetric graph convolutional autoencoder which produces a low-dimensional latent representation from a graph. |
652 | Deep Elastic Networks With Model Selection for Multi-Task Learning | Chanho Ahn, Eunwoo Kim, Songhwai Oh | In this work, we consider the problem of instance-wise dynamic network model selection for multi-task learning. |
653 | Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings | Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein | In this paper, we tackle this scattering problem with a distribution-aware regularization named HORDE. |
654 | Adversarial Learning With Margin-Based Triplet Embedding Regularization | Yaoyao Zhong, Weihong Deng | To address this problem, we propose to improve the local smoothness of the representation space, by integrating a margin-based triplet embedding regularization term into the classification objective, so that the obtained models learn to resist adversarial examples. |
655 | Simultaneous Multi-View Instance Detection With Learned Geometric Soft-Constraints | Ahmed Samy Nassar, Sebastien Lefevre, Jan Dirk Wegner | We propose to jointly learn multi-view geometry and warping between views of the same object instances for robust cross-view object detection. |
656 | CenterNet: Keypoint Triplets for Object Detection | Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian | This paper presents an efficient solution that explores the visual patterns within individual cropped regions with minimal costs. |
657 | Online Hyper-Parameter Learning for Auto-Augmentation Strategy | Chen Lin, Minghao Guo, Chuming Li, Xin Yuan, Wei Wu, Junjie Yan, Dahua Lin, Wanli Ouyang | In this paper, we propose Online Hyper-parameter Learning for Auto-Augmentation (OHL-Auto-Aug), an economical solution that learns the augmentation policy distribution along with network training. |
658 | DANet: Divergent Activation for Weakly Supervised Object Localization | Haolan Xue, Chang Liu, Fang Wan, Jianbin Jiao, Xiangyang Ji, Qixiang Ye | In this paper, we propose a divergent activation (DA) approach, and target at learning complementary and discriminative visual patterns for image classification and weakly supervised object localization from the perspective of discrepancy. |
659 | Selective Sparse Sampling for Fine-Grained Image Recognition | Yao Ding, Yanzhao Zhou, Yi Zhu, Qixiang Ye, Jianbin Jiao | In this paper, we propose a simple yet effective framework, called Selective Sparse Sampling, to capture diverse and fine-grained details. |
660 | Dynamic Anchor Feature Selection for Single-Shot Object Detection | Shuai Li, Lingxiao Yang, Jianqiang Huang, Xian-Sheng Hua, Lei Zhang | In this paper, we present a dynamic feature selection operation to select new pixels in a feature map for each refined anchor received from the ARM. |
661 | Incremental Learning Using Conditional Adversarial Networks | Ye Xiang, Ying Fu, Pan Ji, Hua Huang | In this paper, we propose a new incremental learning strategy based on conditional adversarial networks. |
662 | Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks | Jianyu Wang, Haichao Zhang | In this paper, we study fast training of adversarially robust models. |
663 | View Confusion Feature Learning for Person Re-Identification | Fangyi Liu, Lei Zhang | In this paper, we mainly focus on how to learn view-independent features by getting rid of view-specific information through a view confusion learning mechanism. |
664 | Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification | Hang Xu, Lewei Yao, Wei Zhang, Xiaodan Liang, Zhenguo Li | In this paper, we study NAS for object detection, a core computer vision task that classifies and localizes object instances in an image. |
665 | PARN: Position-Aware Relation Networks for Few-Shot Learning | Ziyang Wu, Yuwei Li, Lihua Guo, Kui Jia | To address this problem, we introduce a deformable feature extractor (DFE) to extract more efficient features, and design a dual correlation attention mechanism (DCA) to deal with its inherent local connectivity. |
666 | Multi-Adversarial Faster-RCNN for Unrestricted Object Detection | Zhenwei He, Lei Zhang | For alleviating the problem of domain dependency and cumbersome labeling, this paper proposes to detect objects in unrestricted environment by leveraging domain knowledge trained from an auxiliary source domain with sufficient labels. |
667 | Object Guided External Memory Network for Video Object Detection | Hanming Deng, Yang Hua, Tao Song, Zongpu Zhang, Zhengui Xue, Ruhui Ma, Neil Robertson, Haibing Guan | In this work, we propose the first object guided external memory network for online video object detection. |
668 | An Empirical Study of Spatial Attention Mechanisms in Deep Networks | Xizhou Zhu, Dazhi Cheng, Zheng Zhang, Stephen Lin, Jifeng Dai | Toward a better general understanding of attention mechanisms, we present an empirical study that ablates various spatial attention elements within a generalized attention formulation, encompassing the dominant Transformer attention as well as the prevalent deformable convolution and dynamic convolution modules. |
669 | Attribute Attention for Semantic Disambiguation in Zero-Shot Learning | Yang Liu, Jishun Guo, Deng Cai, Xiaofei He | Considering both low-level visual information and global class-level features that relate to this ambiguity, we propose a practical Latent Feature Guided Attribute Attention (LFGAA) framework to perform object-based attribute attention for semantic disambiguation. |
670 | CIIDefence: Defeating Adversarial Attacks by Fusing Class-Specific Image Inpainting and Image Denoising | Puneet Gupta, Esa Rahtu | This paper presents a novel approach for protecting deep neural networks from adversarial attacks, i.e., methods that add well-crafted imperceptible modifications to the original inputs such that they are incorrectly classified with high confidence. |
671 | ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices | Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun | In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. |
672 | Dual Student: Breaking the Limits of the Teacher in Semi-Supervised Learning | Zhanghan Ke, Daoye Wang, Qiong Yan, Jimmy Ren, Rynson W.H. Lau | In this work, we show that the coupled EMA teacher causes a performance bottleneck. |
673 | MVP Matching: A Maximum-Value Perfect Matching for Mining Hard Samples, With Application to Person Re-Identification | Han Sun, Zhiyuan Chen, Shiyang Yan, Lin Xu | In this paper, we propose a novel weighted complete bipartite graph based maximum-value perfect (MVP) matching for mining the hard samples from a batch of samples. |
674 | Adaptive Context Network for Scene Parsing | Jun Fu, Jing Liu, Yuhang Wang, Yong Li, Yongjun Bao, Jinhui Tang, Hanqing Lu | Based on this observation, we propose an Adaptive Context Network (ACNet) to capture the pixel-aware contexts by a competitive fusion of global context and local context according to different per-pixel demands. |
675 | Constructing Self-Motivated Pyramid Curriculums for Cross-Domain Semantic Segmentation: A Non-Adversarial Approach | Qing Lian, Fengmao Lv, Lixin Duan, Boqing Gong | We propose a new approach, called self-motivated pyramid curriculum domain adaptation (PyCDA), to facilitate the adaptation of semantic segmentation neural networks from synthetic source domains to real target domains. |
676 | SparseMask: Differentiable Connectivity Learning for Dense Image Prediction | Huikai Wu, Junge Zhang, Kaiqi Huang | In this paper, we aim at automatically searching an efficient network architecture for dense image prediction. |
677 | Significance-Aware Information Bottleneck for Domain Adaptive Semantic Segmentation | Yawei Luo, Ping Liu, Tao Guan, Junqing Yu, Yi Yang | In this work, we equip the adversarial network with a “significance-aware information bottleneck (SIB)”, to address the above problem. |
678 | Relational Attention Network for Crowd Counting | Anran Zhang, Jiayi Shen, Zehao Xiao, Fan Zhu, Xiantong Zhen, Xianbin Cao, Ling Shao | In order to address such an issue, we propose a Relational Attention Network (RANet) with a self-attention mechanism for capturing interdependence of pixels. |
679 | ACFNet: Attentional Class Feature Network for Semantic Segmentation | Fan Zhang, Yanqin Chen, Zhihang Li, Zhibin Hong, Jingtuo Liu, Feifei Ma, Junyu Han, Errui Ding | In this paper, we use two types of base networks to evaluate the effectiveness of ACFNet. |
680 | Frame-to-Frame Aggregation of Active Regions in Web Videos for Weakly Supervised Semantic Segmentation | Jungbeom Lee, Eunji Kim, Sungmin Lee, Jangho Lee, Sungroh Yoon | We propose a method of using videos automatically harvested from the web to identify a larger region of the target object by using temporal information, which is not present in the static image. |
681 | Boundary-Aware Feature Propagation for Scene Segmentation | Henghui Ding, Xudong Jiang, Ai Qun Liu, Nadia Magnenat Thalmann, Gang Wang | In this work, we address the challenging issue of scene segmentation. |
682 | Self-Ensembling With GAN-Based Data Augmentation for Domain Adaptation in Semantic Segmentation | Jaehoon Choi, Taekyung Kim, Changick Kim | In this paper, we introduce a self-ensembling technique, one of the successful methods for domain adaptation in classification. |
683 | Explaining the Ambiguity of Object Detection and 6D Pose From Visual Data | Fabian Manhardt, Diego Martin Arroyo, Christian Rupprecht, Benjamin Busam, Tolga Birdal, Nassir Navab, Federico Tombari | In this work we propose to explicitly deal with these ambiguities. |
684 | Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving | Xinzhu Ma, Zhihui Wang, Haojie Li, Pengbo Zhang, Wanli Ouyang, Xin Fan | In this paper, we propose a monocular 3D object detection framework in the domain of autonomous driving. |
685 | MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation | Lorenzo Bertoni, Sven Kreiss, Alexandre Alahi | We tackle the fundamentally ill-posed problem of 3D human localization from monocular RGB images. |
686 | Unsupervised High-Resolution Depth Learning From Videos With Dual Networks | Junsheng Zhou, Yuwang Wang, Kaihuai Qin, Wenjun Zeng | In order to fully explore the information contained in high-resolution data, we propose a simple yet effective dual networks architecture, which can directly take high-resolution images as input and generate high-resolution and high-accuracy depth map efficiently. |
687 | Bayesian Graph Convolution LSTM for Skeleton Based Action Recognition | Rui Zhao, Kang Wang, Hui Su, Qiang Ji | We propose a framework for recognizing human actions from skeleton data by modeling the underlying dynamic process that generates the motion pattern. |
688 | DeCaFA: Deep Convolutional Cascade for Face Alignment in the Wild | Arnaud Dapogny, Kevin Bailly, Matthieu Cord | In this paper, we introduce an end-to-end deep convolutional cascade (DeCaFA) architecture for face alignment. |
689 | Probabilistic Face Embeddings | Yichun Shi, Anil K. Jain | We propose Probabilistic Face Embeddings (PFEs), which represent each face image as a Gaussian distribution in the latent space. |
690 | Gaze360: Physically Unconstrained Gaze Estimation in the Wild | Petr Kellnhofer, Adria Recasens, Simon Stent, Wojciech Matusik, Antonio Torralba | In this work, we present Gaze360, a large-scale remote gaze-tracking dataset and method for robust 3D gaze estimation in unconstrained images. |
691 | Unsupervised Person Re-Identification by Camera-Aware Similarity Consistency Learning | Ancong Wu, Wei-Shi Zheng, Jian-Huang Lai | To alleviate the effect of cross-camera scene variation, we propose a Camera-Aware Similarity Consistency Loss to learn consistent pairwise similarity distributions for intra-camera matching and cross-camera matching. |
692 | Photo-Realistic Monocular Gaze Redirection Using Generative Adversarial Networks | Zhe He, Adrian Spurr, Xucong Zhang, Otmar Hilliges | In this work, we present a novel method to alleviate this problem by leveraging generative adversarial training to synthesize an eye image conditioned on a target gaze direction. |
693 | Dynamic Kernel Distillation for Efficient Pose Estimation in Videos | Xuecheng Nie, Yuncheng Li, Linjie Luo, Ning Zhang, Jiashi Feng | To address this issue, we propose a novel Dynamic Kernel Distillation (DKD) model to facilitate small networks for estimating human poses in videos, thus significantly lifting the efficiency. |
694 | Single-Stage Multi-Person Pose Machines | Xuecheng Nie, Jiashi Feng, Jianfeng Zhang, Shuicheng Yan | In this work, we present the first single-stage model, Single-stage multi-person Pose Machine (SPM), to simplify the pipeline and lift the efficiency for multi-person pose estimation. |
695 | SO-HandNet: Self-Organizing Network for 3D Hand Pose Estimation With Semi-Supervised Learning | Yujin Chen, Zhigang Tu, Liuhao Ge, Dejun Zhang, Ruizhi Chen, Junsong Yuan | Inspired by the point cloud autoencoder presented in self-organizing network (SO-Net), our proposed SO-HandNet aims at making use of the unannotated data to obtain accurate 3D hand pose estimation in a semi-supervised manner. |
696 | Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression | Xinyao Wang, Liefeng Bo, Li Fuxin | In this paper, we analyze the ideal loss function properties for heatmap regression in face alignment problems. |
697 | Single-Network Whole-Body Pose Estimation | Gines Hidalgo, Yaadhav Raaj, Haroon Idrees, Donglai Xiang, Hanbyul Joo, Tomas Simon, Yaser Sheikh | We present the first single-network approach for 2D whole-body pose estimation, which entails simultaneous localization of body, face, hands, and feet keypoints. |
698 | Face Alignment With Kernel Density Deep Neural Network | Lisha Chen, Hui Su, Qiang Ji | To model more general distributions, such as multi-modal or asymmetric distributions, we propose to develop a kernel density deep neural network. |
699 | Spatiotemporal Feature Residual Propagation for Action Prediction | He Zhao, Richard P. Wildes | In this study, we address this task by investigating how action patterns evolve over time in a spatial feature space. |
700 | Identity From Here, Pose From There: Self-Supervised Disentanglement and Generation of Objects Using Unlabeled Videos | Fanyi Xiao, Haotian Liu, Yong Jae Lee | We propose a novel approach that disentangles the identity and pose of objects for image generation. |
701 | Relation Distillation Networks for Video Object Detection | Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei | In this paper, we introduce a new design to capture the interactions across the objects in spatio-temporal context. |
702 | Video Compression With Rate-Distortion Autoencoders | Amirhossein Habibian, Ties van Rozendaal, Jakub M. Tomczak, Taco S. Cohen | In this paper we present a deep generative model for lossy video compression. |
703 | Non-Local ConvLSTM for Video Compression Artifact Reduction | Yi Xu, Longwen Gao, Kai Tian, Shuigeng Zhou, Huyang Sun | To remedy these shortcomings, in this paper we propose a novel end-to-end deep neural network called non-local ConvLSTM (NL-ConvLSTM in short) that exploits multiple consecutive frames. |
704 | Self-Supervised Moving Vehicle Tracking With Stereo Sound | Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba | In particular, we propose a framework that consists of a vision “teacher” network and a stereo-sound “student” network. |
705 | Self-Supervised Learning With Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera | Yuhua Chen, Cordelia Schmid, Cristian Sminchisescu | We present GLNet, a self-supervised framework for learning depth, optical flow, camera pose and intrinsic parameters from monocular video — addressing the difficulty of acquiring realistic ground-truth for such tasks. |
706 | Learning Temporal Action Proposals With Fewer Labels | Jingwei Ji, Kaidi Cao, Juan Carlos Niebles | In this work, we propose a semi-supervised learning algorithm specifically designed for training temporal action proposal networks. |
707 | TSM: Temporal Shift Module for Efficient Video Understanding | Ji Lin, Chuang Gan, Song Han | In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. |
708 | Graph Convolutional Networks for Temporal Action Localization | Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan | In this paper, we propose to exploit the proposal-proposal relations using Graph Convolutional Networks (GCNs). |
709 | Fast Object Detection in Compressed Video | Shiyao Wang, Hongchao Lu, Zhidong Deng | In this paper, we propose a fast object detection method by taking advantage of this with a novel Motion aided Memory Network (MMNet). |
710 | Predicting 3D Human Dynamics From Video | Jason Y. Zhang, Panna Felsen, Angjoo Kanazawa, Jitendra Malik | In this work, we present perhaps the first approach for predicting a future 3D mesh model sequence of a person from past video input. |
711 | Imitation Learning for Human Pose Prediction | Borui Wang, Ehsan Adeli, Hsu-kuang Chiu, De-An Huang, Juan Carlos Niebles | Inspired by the recent success of deep reinforcement learning methods, in this paper we propose a new reinforcement learning formulation for the problem of human pose prediction, and develop an imitation learning algorithm for predicting future poses under this formulation through a combination of behavioral cloning and generative adversarial imitation learning. |
712 | Human Motion Prediction via Spatio-Temporal Inpainting | Alejandro Hernandez, Jurgen Gall, Francesc Moreno-Noguer | We propose a Generative Adversarial Network (GAN) to forecast 3D human motion given a sequence of past 3D skeleton poses. |
713 | Structured Prediction Helps 3D Human Motion Modelling | Emre Aksan, Manuel Kaufmann, Otmar Hilliges | In this paper, we propose a novel approach that decomposes the prediction into individual joints by means of a structured prediction layer that explicitly models the joint dependencies. |
714 | Learning Shape Templates With Structured Implicit Functions | Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T. Freeman, Thomas Funkhouser | In this paper, we investigate learning a general shape template from data. |
715 | CompenNet++: End-to-End Full Projector Compensation | Bingyao Huang, Haibin Ling | In this paper, we propose the first end-to-end solution, named CompenNet++, to solve the two problems jointly. Moreover, we construct the first setup-independent full compensation benchmark to facilitate the study on this topic. |
716 | Deep Parametric Indoor Lighting Estimation | Marc-Andre Gardner, Yannick Hold-Geoffroy, Kalyan Sunkavalli, Christian Gagne, Jean-Francois Lalonde | We present a method to estimate lighting from a single image of an indoor scene. |
717 | FSGAN: Subject Agnostic Face Swapping and Reenactment | Yuval Nirkin, Yosi Keller, Tal Hassner | We present Face Swapping GAN (FSGAN) for face swapping and reenactment. |
718 | Deep Single-Image Portrait Relighting | Hao Zhou, Sunil Hadap, Kalyan Sunkavalli, David W. Jacobs | In this work, we apply a physically-based portrait relighting method to generate a large scale, high quality, “in the wild” portrait relighting dataset (DPR). |
719 | PU-GAN: A Point Cloud Upsampling Adversarial Network | Ruihui Li, Xianzhi Li, Chi-Wing Fu, Daniel Cohen-Or, Pheng-Ann Heng | This paper presents a new point cloud upsampling network called PU-GAN, which is formulated based on a generative adversarial network (GAN), to learn a rich variety of point distributions from the latent space and upsample points over patches on object surfaces. |
720 | Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation | Giorgos Bouritsas, Sergiy Bokhnyak, Stylianos Ploumpis, Michael Bronstein, Stefanos Zafeiriou | In this paper, we focus on 3D deformable shapes that share a common topological structure, such as human faces and bodies. |
721 | Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation | Yu Zeng, Yunzhi Zhuge, Huchuan Lu, Lihe Zhang | Here we propose a unified multi-task learning framework to jointly solve WSSS and SD using a single network, i.e. saliency and segmentation network (SSNet). |
722 | Towards High-Resolution Salient Object Detection | Yi Zeng, Pingping Zhang, Jianming Zhang, Zhe Lin, Huchuan Lu | This paper pushes forward high-resolution saliency detection, and contributes a new dataset, named High-Resolution Salient Object Detection (HRSOD) dataset. To the best of our knowledge, HRSOD is the first high-resolution saliency detection dataset. |
723 | Event-Based Motion Segmentation by Motion Compensation | Timo Stoffregen, Guillermo Gallego, Tom Drummond, Lindsay Kleeman, Davide Scaramuzza | We present the first per-event segmentation method for splitting a scene into independently moving objects. |
724 | Depth-Induced Multi-Scale Recurrent Attention Network for Saliency Detection | Yongri Piao, Wei Ji, Jingjing Li, Miao Zhang, Huchuan Lu | In this work, we propose a novel depth-induced multi-scale recurrent attention network for saliency detection. In addition, we create a large scale RGB-D dataset containing more complex scenarios, which can contribute to comprehensively evaluating saliency models. |
725 | Stacked Cross Refinement Network for Edge-Aware Salient Object Detection | Zhe Wu, Li Su, Qingming Huang | Motivated by the logical interrelations between binary segmentation and edge maps, we propose a novel Stacked Cross Refinement Network (SCRN) for salient object detection in this paper. |
726 | Motion Guided Attention for Video Salient Object Detection | Haofeng Li, Guanqi Chen, Guanbin Li, Yizhou Yu | In this paper, we develop a multi-task motion guided video salient object detection network, which learns to accomplish two sub-tasks using two sub-networks, one sub-network for salient object detection in still images and the other for motion saliency detection in optical flow images. |
727 | Semi-Supervised Video Salient Object Detection Using Pseudo-Labels | Pengxiang Yan, Guanbin Li, Yuan Xie, Zhen Li, Chuan Wang, Tianshui Chen, Liang Lin | In this paper, we address the semi-supervised video salient object detection task using pseudo-labels. |
728 | Joint Learning of Semantic Alignment and Object Landmark Detection | Sangryul Jeon, Dongbo Min, Seungryong Kim, Kwanghoon Sohn | In this paper, we present a joint learning approach for obtaining dense correspondences and discovering object landmarks from semantically similar images. |
729 | RainFlow: Optical Flow Under Rain Streaks and Rain Veiling Effect | Ruoteng Li, Robby T. Tan, Loong-Fah Cheong, Angelica I. Aviles-Rivero, Qingnan Fan, Carola-Bibiane Schonlieb | Concerning this, we propose a deep-learning based optical flow method designed to handle heavy rain. |
730 | GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing | Xiaohong Liu, Yongrui Ma, Zhihao Shi, Jun Chen | We propose an end-to-end trainable Convolutional Neural Network (CNN), named GridDehazeNet, for single image dehazing. |
731 | Learning to See Moving Objects in the Dark | Haiyang Jiang, Yinqiang Zheng | We propose a novel optical system to capture bright and dark videos of the exact same scenes, generating training and ground-truth pairs for an authentic low-light video dataset. |
732 | SegSort: Segmentation by Discriminative Sorting of Segments | Jyh-Jing Hwang, Stella X. Yu, Jianbo Shi, Maxwell D. Collins, Tien-Ju Yang, Xiao Zhang, Liang-Chieh Chen | This motivates us to propose an end-to-end pixel-wise metric learning approach that mimics this process. |
733 | What Synthesis Is Missing: Depth Adaptation Integrated With Weak Supervision for Indoor Scene Parsing | Keng-Chi Liu, Yi-Ting Shen, Jan P. Klopp, Liang-Gee Chen | The aim of this work is hence twofold: Exploit synthetic data where feasible and integrate weak supervision where necessary. |
734 | AdaptIS: Adaptive Instance Selection Network | Konstantin Sofiiuk, Olga Barinova, Anton Konushin | We present Adaptive Instance Selection network architecture for class-agnostic instance segmentation. |
735 | DADA: Depth-Aware Domain Adaptation in Semantic Segmentation | Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Perez | In this work, we aim at exploiting at best such a privileged information while training the UDA model. |
736 | Guided Curriculum Model Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation | Christos Sakaridis, Dengxin Dai, Luc Van Gool | Our central contributions are: 1) a curriculum framework to gradually adapt semantic segmentation models from day to night via labeled synthetic images and unlabeled real images, both for progressively darker times of day, which exploits cross-time-of-day correspondences for the real images to guide the inference of their labels; 2) a novel uncertainty-aware annotation and evaluation framework and metric for semantic segmentation, designed for adverse conditions and including image regions beyond human recognition capability in the evaluation in a principled fashion; 3) the Dark Zurich dataset, which comprises 2416 unlabeled nighttime and 2920 unlabeled twilight images with correspondences to their daytime counterparts plus a set of 151 nighttime images with fine pixel-level annotations created with our protocol, which serves as a first benchmark to perform our novel evaluation. |
737 | SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation | Yang Zhou, Zachary While, Evangelos Kalogerakis | In this paper we propose a neural message passing approach to augment an input 3D indoor scene with new objects matching their surroundings. |
738 | SkyScapes Fine-Grained Semantic Understanding of Aerial Scenes | Seyed Majid Azimi, Corentin Henry, Lars Sommer, Arne Schumann, Eleonora Vig | We therefore propose a novel multi-task model, which incorporates semantic edge detection and is better tuned for feature extraction from a wide range of scales. |
739 | Transferable Representation Learning in Vision-and-Language Navigation | Haoshuo Huang, Vihan Jain, Harsh Mehta, Alexander Ku, Gabriel Magalhaes, Jason Baldridge, Eugene Ie | Our approach adapts pre-trained vision and language representations to relevant in-domain tasks making them more effective for VLN. |
740 | Towards Unsupervised Image Captioning With Shared Multimodal Embeddings | Iro Laina, Christian Rupprecht, Nassir Navab | In this paper, we address image captioning by generating language descriptions of scenes without learning from annotated pairs of images and their captions. |
741 | ViCo: Word Embeddings From Visual Co-Occurrences | Tanmay Gupta, Alexander Schwing, Derek Hoiem | We propose to learn word embeddings from visual co-occurrences. |
742 | Seq-SG2SL: Inferring Semantic Layout From Scene Graph Through Sequence to Sequence Learning | Boren Li, Boyu Zhuang, Mingyang Li, Jian Gu | We present a conceptually simple, flexible and general framework using sequence to sequence (seq-to-seq) learning for this task. |
743 | U-CAM: Visual Explanation Using Uncertainty Based Class Activation Maps | Badri N. Patro, Mayank Lunayach, Shivansh Patel, Vinay P. Namboodiri | Towards this, we propose a method that obtains gradient-based certainty estimates that also provide visual attention maps. |
744 | See-Through-Text Grouping for Referring Image Segmentation | Ding-Jie Chen, Songhao Jia, Yi-Chen Lo, Hwann-Tzong Chen, Tyng-Luh Liu | Motivated by the conventional grouping techniques to image segmentation, we develop their DNN counterpart to tackle the referring variant. |
745 | VideoBERT: A Joint Model for Video and Language Representation Learning | Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid | Whereas most existing approaches learn low-level representations, we propose a joint visual-linguistic model to learn high-level features without any explicit supervision. |
746 | Language Features Matter: Effective Language Representations for Vision-Language Tasks | Andrea Burns, Reuben Tan, Kate Saenko, Stan Sclaroff, Bryan A. Plummer | We conclude that language features deserve more attention; this conclusion is informed by experiments comparing different word embeddings, language models, and embedding augmentation steps on five common VL tasks: image-sentence retrieval, image captioning, visual question answering, phrase grounding, and text-to-clip retrieval. |
747 | Semantic Stereo Matching With Pyramid Cost Volumes | Zhenyao Wu, Xinyi Wu, Xiaoping Zhang, Song Wang, Lili Ju | To further capture the details of disparity maps, in this paper, we propose a novel semantic stereo network named SSPCV-Net, which includes newly designed pyramid cost volumes for describing semantic and spatial information on multiple levels. |
748 | Spatial Correspondence With Generative Adversarial Network: Learning Depth From Monocular Videos | Zhenyao Wu, Xinyi Wu, Xiaoping Zhang, Song Wang, Lili Ju | In this paper, we present a novel SC-GAN network with end-to-end adversarial training for depth estimation from monocular videos without estimating the camera pose and pose change over time. |
749 | Learning Relationships for Multi-View 3D Object Recognition | Ze Yang, Liwei Wang | To tackle this problem, we propose a Relation Network to effectively connect corresponding regions from different viewpoints, thereby reinforcing the information of individual view images. |
750 | View N-Gram Network for 3D Object Retrieval | Xinwei He, Tengteng Huang, Song Bai, Xiang Bai | To address these issues, we propose an effective and efficient framework called View N-gram Network (VNN). |
751 | Expert Sample Consensus Applied to Camera Re-Localization | Eric Brachmann, Carsten Rother | In this work, we fit the 6D camera pose to a set of noisy correspondences between the 2D input image and a known 3D environment. |
752 | Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints From Limited Training Data | Yutong Bai, Qing Liu, Lingxi Xie, Weichao Qiu, Yan Zheng, Alan L. Yuille | In this paper, we present an approach which can learn from a small annotated dataset containing a limited range of viewpoints and generalize to detect semantic parts for a much larger range of viewpoints. |
753 | Dynamic Points Agglomeration for Hierarchical Point Sets Learning | Jinxian Liu, Bingbing Ni, Caiyuan Li, Jiancheng Yang, Qi Tian | To this end, we develop a novel hierarchical point sets learning architecture, with dynamic points agglomeration. |
754 | Attributing Fake Images to GANs: Learning and Analyzing GAN Fingerprints | Ning Yu, Larry S. Davis, Mario Fritz | We present the first study of learning GAN fingerprints towards image attribution and using them to classify an image as real or GAN-generated. |
755 | Dual Adversarial Inference for Text-to-Image Synthesis | Qicheng Lao, Mohammad Havaei, Ahmad Pesaranghader, Francis Dutil, Lisa Di Jorio, Thomas Fevens | In this paper, we aim to learn two variables that are disentangled in the latent space, representing content and style respectively. |
756 | View-LSTM: Novel-View Video Synthesis Through View Decomposition | Mohamed Ilyes Lakhal, Oswald Lanz, Andrea Cavallaro | We tackle the problem of synthesizing a video of multiple moving people as seen from a novel view, given only an input video and depth information or human poses of the novel view as prior. |
757 | HoloGAN: Unsupervised Learning of 3D Representations From Natural Images | Thu Nguyen-Phuoc, Chuan Li, Lucas Theis, Christian Richardt, Yong-Liang Yang | We propose a novel generative adversarial network (GAN) for the task of unsupervised learning of 3D representations from natural images. |
758 | Unpaired Image-to-Speech Synthesis With Multimodal Information Bottleneck | Shuang Ma, Daniel McDuff, Yale Song | We propose a multimodal information bottleneck approach that learns the correspondence between modalities from unpaired data (image and speech) by leveraging the shared modality (text). |
759 | Improved Conditional VRNNs for Video Prediction | Lluis Castrejon, Nicolas Ballas, Aaron Courville | In this work we argue that this is a sign of underfitting. |
760 | Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery | Xiaosheng Yan, Feigege Wang, Wenxi Liu, Yuanlong Yu, Shengfeng He, Jia Pan | In this paper, we propose a novel iterative multi-task framework to complete the segmentation mask of an occluded vehicle and recover the appearance of its invisible parts. To evaluate our method, we present a dataset, Occluded Vehicle dataset, containing synthetic and real-world occluded vehicle images. |
761 | Learning Single Camera Depth Estimation Using Dual-Pixels | Rahul Garg, Neal Wadhwa, Sameer Ansari, Jonathan T. Barron | To allow learning based methods to work well on dual-pixel imagery, we identify an inherent ambiguity in the depth estimated from dual-pixel cues, and develop an approach to estimate depth up to this ambiguity. |
762 | Domain-Adaptive Single-View 3D Reconstruction | Pedro O. Pinheiro, Negar Rostamzadeh, Sungjin Ahn | In this paper, we propose a framework to improve over these challenges using adversarial training. |
763 | Transformable Bottleneck Networks | Kyle Olszewski, Sergey Tulyakov, Oliver Woodford, Hao Li, Linjie Luo | We propose a novel approach to performing fine-grained 3D manipulation of image content via a convolutional neural network, which we call the Transformable Bottleneck Network (TBN). |
764 | RIO: 3D Object Instance Re-Localization in Changing Indoor Environments | Johanna Wald, Armen Avetisyan, Nassir Navab, Federico Tombari, Matthias Niessner | In this work, we introduce the task of 3D object instance re-localization (RIO): given one or multiple objects in an RGB-D scan, we want to estimate their corresponding 6DoF poses in another 3D scan of the same environment taken at a later point in time. |
765 | Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation | Kiru Park, Timothy Patten, Markus Vincze | To address these problems, we propose a novel pose estimation method, Pix2Pose, that predicts the 3D coordinates of each object pixel without textured models. |
766 | CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation | Zhigang Li, Gu Wang, Xiangyang Ji | In this work, we propose a novel 6-DoF pose estimation approach: Coordinates-based Disentangled Pose Network (CDPN), which disentangles the pose to predict rotation and translation separately to achieve highly accurate and robust pose estimation. |
767 | C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion | David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, Andrea Vedaldi | We propose C3DPO, a method for extracting 3D models of deformable objects from 2D keypoint annotations in unconstrained images. |
768 | Learning to Reconstruct 3D Manhattan Wireframes From a Single Image | Yichao Zhou, Haozhi Qi, Yuexiang Zhai, Qi Sun, Zhili Chen, Li-Yi Wei, Yi Ma | From a single view of an urban environment, we propose a method to effectively exploit the global structural regularities for obtaining a compact, accurate, and intuitive 3D wireframe representation. |
769 | Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning | Shichen Liu, Tianye Li, Weikai Chen, Hao Li | Unlike the state-of-the-art differentiable renderers, which only approximate the rendering gradient in the back propagation, we propose a truly differentiable rendering framework that is able to (1) directly render colorized mesh using differentiable functions and (2) back-propagate efficient supervision signals to mesh vertices and their attributes from various forms of image representations, including silhouette, shading and color images. |
770 | Learnable Triangulation of Human Pose | Karim Iskakov, Egor Burkov, Victor Lempitsky, Yury Malkov | We present two novel solutions for multi-view 3D human pose estimation based on new learnable triangulation methods that combine 3D information from multiple 2D views. |
771 | xR-EgoPose: Egocentric 3D Human Pose From an HMD Camera | Denis Tome, Patrick Peluse, Lourdes Agapito, Hernan Badino | We present a new solution to egocentric 3D body pose estimation from monocular images captured from a downward looking fish-eye camera installed on the rim of a head mounted virtual reality device. |
772 | DeepHuman: 3D Human Reconstruction From a Single Image | Zerong Zheng, Tao Yu, Yixuan Wei, Qionghai Dai, Yebin Liu | We propose DeepHuman, an image-guided volume-to-volume translation CNN for 3D human reconstruction from a single RGB image. |
773 | A Neural Network for Detailed Human Depth Estimation From a Single Image | Sicong Tang, Feitong Tan, Kelvin Cheng, Zhaoyang Li, Siyu Zhu, Ping Tan | This paper presents a neural network to estimate a detailed depth map of the foreground human in a single RGB image. |
774 | DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare | Yuanlu Xu, Song-Chun Zhu, Tony Tung | We present DenseRaC, a novel end-to-end framework for jointly estimating 3D human pose and body shape from a monocular RGB image. To boost learning, we further construct a large-scale synthetic dataset (MOCA) utilizing web-crawled Mocap sequences, 3D scans and animations. |
775 | Not All Parts Are Created Equal: 3D Pose Estimation by Modeling Bi-Directional Dependencies of Body Parts | Jue Wang, Shaoli Huang, Xinchao Wang, Dacheng Tao | In this paper, we propose a progressive approach that explicitly accounts for the distinct DOFs among the body parts. |
776 | Extreme View Synthesis | Inchang Choi, Orazio Gallo, Alejandro Troccoli, Min H. Kim, Jan Kautz | We present Extreme View Synthesis, a solution for novel view extrapolation that works even when the number of input images is small—as few as two. |
777 | View Independent Generative Adversarial Network for Novel View Synthesis | Xiaogang Xu, Ying-Cong Chen, Jiaya Jia | In this paper, we propose an encoder-decoder based generative adversarial network VI-GAN to tackle this problem. |
778 | Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion | Pingping Zhang, Wei Liu, Yinjie Lei, Huchuan Lu, Xiaoyun Yang | To address these issues, in this work we propose a novel deep learning framework, named Cascaded Context Pyramid Network (CCPNet), to jointly infer the occupancy and semantic labels of a volumetric 3D scene from a single depth image. |
779 | View-Consistent 4D Light Field Superpixel Segmentation | Numair Khan, Qian Zhang, Lucas Kasser, Henry Stone, Min H. Kim, James Tompkin | Our proposed approach combines an occlusion-aware angular segmentation in horizontal and vertical EPI spaces with an occlusion-aware clustering and propagation step across all views. |
780 | GLoSH: Global-Local Spherical Harmonics for Intrinsic Image Decomposition | Hao Zhou, Xiang Yu, David W. Jacobs | In this work, we propose a Global-Local Spherical Harmonics (GLoSH) lighting model to improve the lighting component, and jointly predict reflectance and surface normals. |
781 | Surface Normals and Shape From Water | Satoshi Murai, Meng-Yu Jennifer Kuo, Ryo Kawahara, Shohei Nobuhara, Ko Nishino | In this paper, we introduce a novel method for reconstructing surface normals and depth of dynamic objects in water. |
782 | Restoration of Non-Rigidly Distorted Underwater Images Using a Combination of Compressive Sensing and Local Polynomial Image Representations | Jerin Geo James, Pranay Agrawal, Ajit Rajwade | Motivated by this, we pose the task of restoration of such video sequences as a compressed sensing (CS) problem. |
783 | Learning Perspective Undistortion of Portraits | Yajie Zhao, Zeng Huang, Tianye Li, Weikai Chen, Chloe LeGendre, Xinglei Ren, Ari Shapiro, Hao Li | We present the first deep learning based approach to remove such artifacts from unconstrained portraits. Moreover, we also build the first perspective portrait database with a large diversity in identities, expression and poses. |
784 | Towards Photorealistic Reconstruction of Highly Multiplexed Lensless Images | Salman S. Khan, Adarsh V. R., Vivek Boominathan, Jasper Tan, Ashok Veeraraghavan, Kaushik Mitra | In this paper, we present a method to obtain image reconstructions from mask-based lensless measurements that are more photorealistic than those currently available in the literature. |
785 | Unconstrained Motion Deblurring for Dual-Lens Cameras | M. R. Mahesh Mohan, Sharath Girish, A. N. Rajagopalan | In this paper, we propose a generalized blur model that elegantly explains the intrinsically coupled image formation model for dual-lens set-ups, which are by far the most predominant in smartphones. |
786 | Stochastic Exposure Coding for Handling Multi-ToF-Camera Interference | Jongho Lee, Mohit Gupta | In this paper, we propose stochastic exposure coding (SEC), a novel approach for mitigating interference between multiple time-of-flight cameras. |
787 | Convolutional Approximations to the General Non-Line-of-Sight Imaging Operator | Byeongjoo Ahn, Akshat Dave, Ashok Veeraraghavan, Ioannis Gkioulekas, Aswin C. Sankaranarayanan | We introduce a computationally tractable framework for solving the ellipsoidal tomography problem. |
788 | Agile Depth Sensing Using Triangulation Light Curtains | Joseph R. Bartels, Jian Wang, William “Red” Whittaker, Srinivasa G. Narasimhan | In this paper, we present an approach and system to dynamically and adaptively sample the depths of a scene using the principle of triangulation light curtains. |
789 | Asynchronous Single-Photon 3D Imaging | Anant Gupta, Atul Ingle, Mohit Gupta | We propose asynchronous single-photon 3D imaging, a family of acquisition schemes to mitigate pileup during data acquisition itself. |
790 | Cross-Dataset Person Re-Identification via Unsupervised Pose Disentanglement and Adaptation | Yu-Jhe Li, Ci-Siang Lin, Yan-Bo Lin, Yu-Chiang Frank Wang | To achieve this goal, our proposed Pose Disentanglement and Adaptation Network (PDA-Net) aims at learning deep image representation with pose and domain information properly disentangled. |
791 | A Learned Representation for Scalable Vector Graphics | Raphael Gontijo Lopes, David Ha, Douglas Eck, Jonathon Shlens | In this work we attempt to model the drawing process of fonts by building sequential generative models of vector graphics. |
792 | ELF: Embedded Localisation of Features in Pre-Trained CNN | Assia Benbihi, Matthieu Geist, Cedric Pradalier | This paper introduces a novel feature detector based only on information embedded inside a CNN trained on standard tasks (e.g. classification). |
793 | Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking | Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler | We propose a new Group Feature Selection method for Discriminative Correlation Filters (GFS-DCF) based visual object tracking. |
794 | Sampling Wisely: Deep Image Embedding by Top-K Precision Optimization | Jing Lu, Chaofan Xu, Wei Zhang, Ling-Yu Duan, Tao Mei | In contrast to existing works, in this paper, we propose a novel deep image embedding algorithm with end-to-end optimization to top-k precision, the evaluation metric that is closely related to user experience. |
795 | On the Global Optima of Kernelized Adversarial Representation Learning | Bashir Sadeghi, Runyi Yu, Vishnu Boddeti | In this paper, we first study the “linear” form of this problem i.e., the setting where all the players are linear functions. We show that the resulting optimization problem is both non-convex and non-differentiable. We obtain an exact closed-form expression for its global optima through spectral learning and provide performance guarantees in terms of analytical bounds on the achievable utility and invariance. We then extend this solution and analysis to non-linear functions through kernel representation. |
796 | Addressing Model Vulnerability to Distributional Shifts Over Image Transformation Sets | Riccardo Volpi, Vittorio Murino | We formulate a combinatorial optimization problem that allows evaluating the regions in the image space where a given model is more vulnerable, in terms of image transformations applied to the input, and face it with standard search algorithms. |
797 | Attract or Distract: Exploit the Margin of Open Set | Qianyu Feng, Guoliang Kang, Hehe Fan, Yi Yang | In this paper, we exploit the semantic structure of open set data from two aspects: 1) Semantic Categorical Alignment, which aims to achieve good separability of target known classes by categorically aligning the centroid of target with the source. |
798 | MIC: Mining Interclass Characteristics for Improved Metric Learning | Karsten Roth, Biagio Brattoli, Bjorn Ommer | In contrast, we propose to explicitly learn the latent characteristics that are shared by and go across object classes. |
799 | Self-Supervised Representation Learning via Neighborhood-Relational Encoding | Mohammad Sabokrou, Mohammad Khalooei, Ehsan Adeli | In this paper, we propose a novel self-supervised representation learning by taking advantage of a neighborhood-relational encoding (NRE) among the training data. |
800 | AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation | Mohammad Tavakolian, Hamed R. Tavakoli, Abdenour Hadid | We propose an Adaptive Weighted Spatiotemporal Distillation (AWSD) technique for video representation by encoding the appearance and dynamics of the videos into a single RGB image map. |
801 | Bilinear Attention Networks for Person Retrieval | Pengfei Fang, Jieming Zhou, Soumava Kumar Roy, Lars Petersson, Mehrtash Harandi | We propose an Attention in Attention (AiA) mechanism to build inter-dependency among the second order local and global features with the intent to make better use of, or pay more attention to, such higher order statistical relationships. |
802 | Discriminative Feature Learning With Consistent Attention Regularization for Person Re-Identification | Sanping Zhou, Fei Wang, Zeyi Huang, Jinjun Wang | In this paper, we propose a simple yet effective feedforward attention network to address the two mentioned problems, in which a novel consistent attention regularizer and an improved triplet loss are designed to learn foreground attentive features for person Re-ID. |
803 | Semi-Supervised Domain Adaptation via Minimax Entropy | Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko | To address this semi-supervised domain adaptation (SSDA) setting, we propose a novel Minimax Entropy (MME) approach that adversarially optimizes an adaptive few-shot model. |
804 | Boosting Few-Shot Visual Learning With Self-Supervision | Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Perez, Matthieu Cord | In this work we exploit the complementarity of these two domains and propose an approach for improving few-shot learning through self-supervision. |
805 | FDA: Feature Disruptive Attack | Aditya Ganeshan, Vivek B.S., R. Venkatesh Babu | In this work we, (i) show the drawbacks of such attacks, (ii) propose two new evaluation metrics: Old Label New Rank (OLNR) and New Label Old Rank (NLOR) in order to quantify the extent of damage made by an attack, and (iii) propose a new attack FDA: Feature Disruptive attack, to address the drawbacks of existing attacks. |
806 | A Novel Unsupervised Camera-Aware Domain Adaptation Framework for Person Re-Identification | Lei Qi, Lei Wang, Jing Huo, Luping Zhou, Yinghuan Shi, Yang Gao | From the perspective of representation learning, this paper proposes a novel end-to-end deep domain adaptation framework to address them. |
807 | Recover and Identify: A Generative Dual Model for Cross-Resolution Person Re-Identification | Yu-Jhe Li, Yun-Chun Chen, Yen-Yu Lin, Xiaofei Du, Yu-Chiang Frank Wang | To overcome this problem, we propose a novel generative adversarial network to address cross-resolution person re-ID, allowing query images with varying resolutions. |
808 | Cross-View Policy Learning for Street Navigation | Ang Li, Huiyi Hu, Piotr Mirowski, Mehrdad Farajtabar | Since aerial images are easily and globally accessible, we propose instead to transfer a ground view policy, from training areas to unseen (target) parts of the city, by utilizing aerial view observations. |
809 | Learning Across Tasks and Domains | Pierluigi Zama Ramirez, Alessio Tonioni, Samuele Salti, Luigi Di Stefano | In this work, we introduce a novel adaptation framework that can operate across both task and domains. |
810 | EMPNet: Neural Localisation and Mapping Using Embedded Memory Points | Gil Avraham, Yan Zuo, Thanuja Dharmasiri, Tom Drummond | In this work we develop a memory module which contains rigidly aligned point-embeddings that represent a coherent scene structure acquired from an RGB-D sequence of observations. |
811 | AVT: Unsupervised Learning of Transformation Equivariant Representations by Autoencoding Variational Transformations | Guo-Jun Qi, Liheng Zhang, Chang Wen Chen, Qi Tian | To this end, we present a novel principled method by Autoencoding Variational Transformations (AVT), compared with the conventional approach to autoencoding data. |
812 | Composite Shape Modeling via Latent Space Factorization | Anastasia Dubrovina, Fei Xia, Panos Achlioptas, Mira Shalah, Raphael Groscot, Leonidas J. Guibas | We present a novel neural network architecture, termed Decomposer-Composer, for semantic structure-aware 3D shape modeling. |
813 | Deep Comprehensive Correlation Mining for Image Clustering | Jianlong Wu, Keyu Long, Fei Wang, Chen Qian, Cheng Li, Zhouchen Lin, Hongbin Zha | In this paper, we propose a novel clustering framework, named deep comprehensive correlation mining (DCCM), for exploring and taking full advantage of various kinds of correlations behind the unlabeled data from three aspects: 1) Instead of only using pair-wise information, pseudo-label supervision is proposed to investigate category information and learn discriminative features. |
814 | Unsupervised Multi-Task Feature Learning on Point Clouds | Kaveh Hassani, Mike Haley | We introduce an unsupervised multi-task model to jointly learn point and shape features on point clouds. |
815 | Reciprocal Multi-Layer Subspace Learning for Multi-View Clustering | Ruihuang Li, Changqing Zhang, Huazhu Fu, Xi Peng, Tianyi Zhou, Qinghua Hu | In this work, we present a novel Reciprocal Multi-layer Subspace Learning (RMSL) algorithm for multi-view clustering, which is composed of two main components: Hierarchical Self-Representative Layers (HSRL), and Backward Encoding Networks (BEN). |
816 | Geometric Disentanglement for Generative Latent Shape Models | Tristan Aumentado-Armstrong, Stavros Tsogkas, Allan Jepson, Sven Dickinson | In this paper, we propose an unsupervised approach to partitioning the latent space of a variational autoencoder for 3D point clouds in a natural way, using only geometric information, that builds upon prior work utilizing generative adversarial models of point sets. |
817 | GAN-Tree: An Incrementally Learned Hierarchical Generative Framework for Multi-Modal Data Distributions | Jogendra Nath Kundu, Maharshi Gor, Dakshit Agrawal, R. Venkatesh Babu | In contrast to such bottom-up approaches, we present GAN-Tree, which follows a hierarchical divisive strategy to address such discontinuous multi-modal data. |
818 | GODS: Generalized One-Class Discriminative Subspaces for Anomaly Detection | Jue Wang, Anoop Cherian | In this paper, we propose a novel objective for one-class learning. |
819 | Neighborhood Preserving Hashing for Scalable Video Retrieval | Shuyan Li, Zhixiang Chen, Jiwen Lu, Xiu Li, Jie Zhou | In this paper, we propose a Neighborhood Preserving Hashing (NPH) method for scalable video retrieval in an unsupervised manner. |
820 | Self-Training With Progressive Augmentation for Unsupervised Cross-Domain Person Re-Identification | Xinyu Zhang, Jiewei Cao, Chunhua Shen, Mingyu You | In this work, we develop a self-training method with progressive augmentation framework (PAST) to promote the model performance progressively on the target dataset. |
821 | SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects | Xue Yang, Jirui Yang, Junchi Yan, Yue Zhang, Tengfei Zhang, Zhi Guo, Xian Sun, Kun Fu | This paper presents a novel multi-category rotation detector for small, cluttered and rotated objects, namely SCRDet. |
822 | Cross-X Learning for Fine-Grained Visual Categorization | Wei Luo, Xitong Yang, Xianjie Mo, Yuheng Lu, Larry S. Davis, Jun Li, Jian Yang, Ser-Nam Lim | In this paper, we propose Cross-X learning, a simple yet effective approach that exploits the relationships between different images and between different network layers for robust multi-scale feature learning. |
823 | Maximum-Margin Hamming Hashing | Rong Kang, Yue Cao, Mingsheng Long, Jianmin Wang, Philip S. Yu | The main idea of this work is to directly embody the Hamming radius into the loss functions, leading to Maximum-Margin Hamming Hashing (MMHH), a new model specifically optimized for Hamming space retrieval. |
824 | Conservative Wasserstein Training for Pose Estimation | Xiaofeng Liu, Yang Zou, Tong Che, Peng Ding, Ping Jia, Jane You, B.V.K. Vijaya Kumar | We propose to incorporate inter-class correlations in a Wasserstein training framework by pre-defining (i.e., using arc length of a circle) or adaptively learning the ground metric. |
825 | Learning to Rank Proposals for Object Detection | Zhiyu Tan, Xuecheng Nie, Qi Qian, Nan Li, Hao Li | To address this issue, in this paper, we propose a novel Learning-to-Rank (LTR) model to produce the suppression rank via a learning procedure, thus facilitating the candidate generation and lifting the detection performance. |
826 | Vehicle Re-Identification With Viewpoint-Aware Metric Learning | Ruihang Chu, Yifan Sun, Yadong Li, Zheng Liu, Chi Zhang, Yichen Wei | Inspired by the behavior of humans in the recognition process, we propose a novel viewpoint-aware metric learning approach. |
827 | WSOD2: Learning Bottom-Up and Top-Down Objectness Distillation for Weakly-Supervised Object Detection | Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao, Lei Zhang | In this paper, we propose a novel WSOD framework with Objectness Distillation (i.e., WSOD2) by designing a tailored training mechanism for weakly-supervised object detection. |
828 | Localization of Deep Inpainting Using High-Pass Fully Convolutional Network | Haodong Li, Jiwu Huang | This paper presents a method to locate the regions manipulated by deep inpainting. |
829 | Clustered Object Detection in Aerial Images | Fan Yang, Heng Fan, Peng Chu, Erik Blasch, Haibin Ling | In this paper, we address both issues inspired by observing that these targets are often clustered. |
830 | Unsupervised Graph Association for Person Re-Identification | Jinlin Wu, Yang Yang, Hao Liu, Shengcai Liao, Zhen Lei, Stan Z. Li | In this paper, we propose an unsupervised graph association (UGA) framework to learn the underlying view-invariant representations from the video pedestrian tracklets. |
831 | Learning a Mixture of Granularity-Specific Experts for Fine-Grained Categorization | Lianbo Zhang, Shaoli Huang, Wei Liu, Dacheng Tao | We aim to divide the problem space of fine-grained recognition into some specific regions. |
832 | advPattern: Physical-World Attacks on Deep Person Re-Identification via Adversarially Transformable Patterns | Zhibo Wang, Siyan Zheng, Mengkai Song, Qian Wang, Alireza Rahimpour, Hairong Qi | We propose a novel attack algorithm, called advPattern, for generating adversarial patterns on clothes, which learns the variations of image pairs across cameras to pull closer the image features from the same camera, while pushing features from different cameras farther. |
833 | ABD-Net: Attentive but Diverse Person Re-Identification | Tianlong Chen, Shaojin Ding, Jingyi Xie, Ye Yuan, Wuyang Chen, Yang Yang, Zhou Ren, Zhangyang Wang | Specifically, we introduce a pair of complementary attention modules, focusing on channel aggregation and position awareness, respectively. |
834 | From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer | Haipeng Xiong, Hao Lu, Chengxin Liu, Liang Liu, Zhiguo Cao, Chunhua Shen | Inspired by this idea, we propose a simple but effective approach, Spatial Divide-and-Conquer Network (S-DCNet). |
835 | Towards Precise End-to-End Weakly Supervised Object Detection Network | Ke Yang, Dongsheng Li, Yong Dou | In this paper, we propose to jointly train the two phases in an end-to-end manner to tackle this problem. |
836 | Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting | Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai | In this paper, we propose a simple yet effective approach to tackle this problem. |
837 | Ground-to-Aerial Image Geo-Localization With a Hard Exemplar Reweighting Triplet Loss | Sudong Cai, Yulan Guo, Salman Khan, Jiwei Hu, Gongjian Wen | In this paper, we propose a novel in-batch reweighting triplet loss to emphasize the positive effect of hard exemplars during end-to-end training. |
838 | Learning to Discover Novel Visual Categories via Deep Transfer Clustering | Kai Han, Andrea Vedaldi, Andrew Zisserman | Our contributions are twofold. The first contribution is to extend Deep Embedded Clustering to a transfer learning setting; we also improve the algorithm by introducing a representation bottleneck, temporal ensembling, and consistency. The second contribution is a method to estimate the number of classes in the unlabelled data. |
839 | AM-LFS: AutoML for Loss Function Search | Chuming Li, Xin Yuan, Chen Lin, Minghao Guo, Wei Wu, Junjie Yan, Wanli Ouyang | In this paper, we propose AutoML for Loss Function Search (AM-LFS) which leverages REINFORCE to search loss functions during the training process. |
840 | Few-Shot Object Detection via Feature Reweighting | Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell | In this work we develop a few-shot object detector that can learn to detect novel objects from only a few annotated examples. |
841 | Objects365: A Large-Scale, High-Quality Dataset for Object Detection | Shuai Shao, Zeming Li, Tianyuan Zhang, Chao Peng, Gang Yu, Xiangyu Zhang, Jing Li, Jian Sun | In this paper, we introduce a new large-scale object detection dataset, Objects365, which has 365 object categories over 600K training images. |
842 | Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network | Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen | In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing. |
843 | Foreground-Aware Pyramid Reconstruction for Alignment-Free Occluded Person Re-Identification | Lingxiao He, Yinggang Wang, Wu Liu, He Zhao, Zhenan Sun, Jiashi Feng | This paper proposes a novel occlusion-robust and alignment-free model for occluded person ReID and extends its application to realistic and crowded scenarios. |
844 | Collect and Select: Semantic Alignment Metric Learning for Few-Shot Learning | Fusheng Hao, Fengxiang He, Jun Cheng, Lei Wang, Jianzhong Cao, Dacheng Tao | To address this issue, this paper proposes a Semantic Alignment Metric Learning (SAML) method for few-shot learning that aligns the semantically relevant dominant objects through a “collect-and-select” strategy. |
845 | Bayesian Adaptive Superpixel Segmentation | Roy Uziel, Meitar Ronen, Oren Freifeld | As a remedy, we propose a novel probabilistic model, self-coined Bayesian Adaptive Superpixel Segmentation (BASS), together with an efficient inference. |
846 | CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing | Kevin Duarte, Yogesh S. Rawat, Mubarak Shah | In this work we propose a capsule-based approach for semi-supervised video object segmentation. |
847 | BAE-NET: Branched Autoencoder for Shape Co-Segmentation | Zhiqin Chen, Kangxue Yin, Matthew Fisher, Siddhartha Chaudhuri, Hao Zhang | We treat shape co-segmentation as a representation learning problem and introduce BAE-NET, a branched autoencoder network, for the task. |
848 | VV-Net: Voxel VAE Net With Group Convolutions for Point Cloud Segmentation | Hsien-Yu Meng, Lin Gao, Yu-Kun Lai, Dinesh Manocha | We present a novel algorithm for point cloud segmentation. Our approach transforms unstructured point clouds into regular voxel grids, and further uses a kernel-based interpolated variational autoencoder (VAE) architecture to encode the local geometry within each voxel. |
849 | Group-Wise Deep Object Co-Segmentation With Co-Attention Recurrent Neural Network | Bo Li, Zhengxing Sun, Qian Li, Yunjie Wu, Anqi Hu | This paper proposes a novel end-to-end deep learning approach for group-wise object co-segmentation with a recurrent network architecture. |
850 | Human Attention in Image Captioning: Dataset and Analysis | Sen He, Hamed R. Tavakoli, Ali Borji, Nicolas Pugeault | In this work, we present a novel dataset consisting of eye movements and verbal descriptions recorded synchronously over images. |
851 | Variational Uncalibrated Photometric Stereo Under General Lighting | Bjoern Haefner, Zhenzhang Ye, Maolin Gao, Tao Wu, Yvain Queau, Daniel Cremers | To eliminate such restrictions, we propose an efficient principled variational approach to uncalibrated PS under general illumination. |
852 | SPLINE-Net: Sparse Photometric Stereo Through Lighting Interpolation and Normal Estimation Networks | Qian Zheng, Yiming Jia, Boxin Shi, Xudong Jiang, Ling-Yu Duan, Alex C. Kot | This paper solves the Sparse Photometric stereo through Lighting Interpolation and Normal Estimation using a generative Network (SPLINE-Net). |
853 | Hyperspectral Image Reconstruction Using Deep External and Internal Learning | Tao Zhang, Ying Fu, Lizhi Wang, Hua Huang | In this paper, we present an effective convolutional neural network (CNN) based method for coded HSI reconstruction, which learns the deep prior from the external dataset as well as the internal information of input coded image with spatial-spectral constraint. |
854 | Gravity as a Reference for Estimating a Person’s Height From Video | Didier Bieler, Semih Gunel, Pascal Fua, Helge Rhodin | We focus on motion cues and exploit gravity on earth as an omnipresent reference ‘object’ to translate acceleration, and subsequently height, measured in image-pixels to values in meters. |
855 | Shadow Removal via Shadow Image Decomposition | Hieu Le, Dimitris Samaras | We propose a novel deep learning method for shadow removal. Moreover, we create an augmented ISTD dataset based on an image decomposition system by modifying the shadow parameters to generate new synthetic shadow images. |
856 | OperatorNet: Recovering 3D Shapes From Difference Operators | Ruqi Huang, Marie-Julie Rakotosaona, Panos Achlioptas, Leonidas J. Guibas, Maks Ovsjanikov | This paper proposes a learning-based framework for reconstructing 3D shapes from functional operators, compactly encoded as small-sized matrices. |
857 | Neural Inverse Rendering of an Indoor Scene From a Single Image | Soumyadip Sengupta, Jinwei Gu, Kihwan Kim, Guilin Liu, David W. Jacobs, Jan Kautz | Our key contribution is the Residual Appearance Renderer (RAR), which can be trained to synthesize complex appearance effects (e.g., inter-reflection, cast shadows, near-field illumination, and realistic shading), which would be neglected otherwise. |
858 | ForkNet: Multi-Branch Volumetric Semantic Completion From a Single Depth Image | Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari | We propose a novel model for 3D semantic completion from a single depth image, based on a single encoder and three separate generators used to reconstruct different geometric and semantic representations of the original and completed scene, all sharing the same latent space. |
859 | Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments | Junsheng Zhou, Yuwang Wang, Kaihuai Qin, Wenjun Zeng | To overcome these problems, we propose a new optical-flow based training paradigm which reduces the difficulty of unsupervised learning by providing a clearer training target and handles the non-texture regions. |
860 | GraphX-Convolution for Point Cloud Deformation in 2D-to-3D Conversion | Anh-Duc Nguyen, Seonghwa Choi, Woojae Kim, Sanghoon Lee | In this paper, we present a novel deep method to reconstruct a point cloud of an object from a single still image. |
861 | FrameNet: Learning Local Canonical Frames of 3D Surfaces From a Single RGB Image | Jingwei Huang, Yichao Zhou, Thomas Funkhouser, Leonidas J. Guibas | In this work, we introduce the novel problem of identifying dense canonical 3D coordinate frames from a single RGB image. |
862 | Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense | Yixin Chen, Siyuan Huang, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu | We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic scene parsing and reconstruction—3D estimations of object bounding boxes, camera pose, and room layout, and (ii) 3D human pose estimation. |
863 | MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding | Quan Kong, Ziming Wu, Ziwei Deng, Martin Klinkigt, Bin Tong, Tomokazu Murakami | To address the disadvantage of vision-based modalities and push towards multi/cross modal action understanding, this paper introduces a new large-scale dataset recorded from 20 distinct subjects with seven different types of modalities: RGB videos, keypoints, acceleration, gyroscope, orientation, Wi-Fi and pressure signal. On the basis of our dataset, we propose a novel multi modality distillation model with attention mechanism to realize an adaptive knowledge transfer from sensor-based modalities to vision-based modalities. |
864 | HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization | Hang Zhao, Antonio Torralba, Lorenzo Torresani, Zhicheng Yan | This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos. |
865 | 3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization | Sanath Narayan, Hisham Cholakkal, Fahad Shahbaz Khan, Ling Shao | In this work, we propose a framework, called 3C-Net, which only requires video-level supervision (weak supervision) in the form of action category labels and the corresponding count. |
866 | Grounded Human-Object Interaction Hotspots From Video | Tushar Nagarajan, Christoph Feichtenhofer, Kristen Grauman | We propose an approach to learn human-object interaction “hotspots” directly from video. |
867 | Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition With CNNs | Lei Wang, Piotr Koniusz, Du Q. Huynh | In this paper, we revive the use of old-fashioned handcrafted video representations for action recognition and put new life into these techniques via a CNN-based hallucination step. |
868 | Learning to Paint With Model-Based Deep Reinforcement Learning | Zhewei Huang, Wen Heng, Shuchang Zhou | We show how to teach machines to paint like human painters, who can use a small number of strokes to create fantastic paintings. |
869 | Neural Re-Simulation for Generating Bounces in Single Images | Carlo Innamorati, Bryan Russell, Danny M. Kaufman, Niloy J. Mitra | We introduce a method to generate videos of dynamic virtual objects plausibly interacting via collisions with a still image’s environment. |
870 | Deep Appearance Maps | Maxim Maximov, Laura Leal-Taixe, Mario Fritz, Tobias Ritschel | We propose a deep representation of appearance, i.e. the relation of color, surface orientation, viewer position, material and illumination. |
871 | GarNet: A Two-Stream Network for Fast and Accurate 3D Cloth Draping | Erhan Gundogdu, Victor Constantin, Amrollah Seifoddini, Minh Dang, Mathieu Salzmann, Pascal Fua | Taking advantage of this, we propose a novel architecture to fit a 3D garment template to a 3D body. |
872 | Joint Embedding of 3D Scan and CAD Objects | Manuel Dahnert, Angela Dai, Leonidas J. Guibas, Matthias Niessner | We propose a novel approach to learn a joint embedding space between scan and CAD geometry, where semantically similar objects from both domains lie close together. |
873 | CompoNet: Learning to Generate the Unseen by Part Synthesis and Composition | Nadav Schor, Oren Katzir, Hao Zhang, Daniel Cohen-Or | In our work, we present CompoNet, a generative neural network for 2D or 3D shapes that is based on a part-based prior, where the key idea is for the network to synthesize shapes by varying both the shape parts and their compositions. |
874 | DDSL: Deep Differentiable Simplex Layer for Learning Geometric Signals | Chiyu “Max” Jiang, Dana Lansigan, Philip Marcus, Matthias Niessner | We present a complete theoretical framework for the process as well as an efficient backpropagation algorithm. |
875 | EGNet: Edge Guidance Network for Salient Object Detection | Jia-Xing Zhao, Jiang-Jiang Liu, Deng-Ping Fan, Yang Cao, Jufeng Yang, Ming-Ming Cheng | In this paper, to solve this problem, we focus on the complementarity between salient edge information and salient object information. |
876 | SID4VAM: A Benchmark Dataset With Synthetic Images for Visual Attention Modeling | David Berga, Xose R. Fdez-Vidal, Xavier Otazu, Xose M. Pardo | This study proposes a new way to evaluate saliency models in the forthcoming literature, accounting for synthetic images with uniquely low-level feature contexts, distinct from previous eye tracking image datasets. |
877 | Two-Stream Action Recognition-Oriented Video Super-Resolution | Haochen Zhang, Dong Liu, Zhiwei Xiong | Tailored for two-stream action recognition networks, we propose two video SR methods for the spatial and temporal streams respectively. |
878 | Where Is My Mirror? | Xin Yang, Haiyang Mei, Ke Xu, Xiaopeng Wei, Baocai Yin, Rynson W.H. Lau | In this paper, we present a novel method to segment mirrors from an input image. First, we construct a large-scale mirror dataset that contains mirror images with corresponding manually annotated masks. |
879 | Disentangled Image Matting | Shaofan Cai, Xiaoshuai Zhang, Haoqiang Fan, Haibin Huang, Jiangyu Liu, Jiaming Liu, Jiaying Liu, Jue Wang, Jian Sun | We propose AdaMatting, a new end-to-end matting framework that disentangles this problem into two sub-tasks: trimap adaptation and alpha estimation. |
880 | Guided Super-Resolution As Pixel-to-Pixel Transformation | Riccardo de Lutio, Stefano D’Aronco, Jan Dirk Wegner, Konrad Schindler | Here, we propose to turn that interpretation on its head and instead see it as a pixel-to-pixel mapping of the guide image to the domain of the source image. |
881 | Deep Learning for Light Field Saliency Detection | Tiantian Wang, Yongri Piao, Xiao Li, Lihe Zhang, Huchuan Lu | To address this, we introduce a new dataset to assist the subsequent research in 4D light field saliency detection. |
882 | Optimizing the F-Measure for Threshold-Free Salient Object Detection | Kai Zhao, Shanghua Gao, Wenguan Wang, Ming-Ming Cheng | In this paper, we investigate an interesting issue: can we consistently use the F-measure formulation in both training and evaluation for SOD? |
883 | Image Inpainting With Learnable Bidirectional Attention Maps | Chaohao Xie, Shaohui Liu, Chao Li, Ming-Ming Cheng, Wangmeng Zuo, Xiao Liu, Shilei Wen, Errui Ding | In this paper, we present a learnable attention map module for learning feature re-normalization and mask-updating in an end-to-end manner, which is effective in adapting to irregular holes and propagation of convolution layers. |
884 | Joint Demosaicking and Denoising by Fine-Tuning of Bursts of Raw Images | Thibaud Ehret, Axel Davy, Pablo Arias, Gabriele Facciolo | In this paper we present a method to learn demosaicking directly from mosaicked images, without requiring ground truth RGB data. |
885 | DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better | Orest Kupyn, Tetiana Martyniuk, Junru Wu, Zhangyang Wang | We present a new end-to-end generative adversarial network (GAN) for single image motion deblurring, named DeblurGAN-V2, which considerably boosts state-of-the-art deblurring performance while being much more flexible and efficient. |
886 | Reflective Decoding Network for Image Captioning | Lei Ke, Wenjie Pei, Ruiyu Li, Xiaoyong Shen, Yu-Wing Tai | Following the conventional encoder-decoder framework, we propose the Reflective Decoding Network (RDN) for image captioning, which enhances both the long-sequence dependency and position perception of words in a caption decoder. |
887 | Joint Optimization for Cooperative Image Captioning | Gilad Vered, Gal Oren, Yuval Atzmon, Gal Chechik | To address these challenges, we present an effective optimization technique based on partial-sampling from a multinomial distribution combined with straight-through gradient updates, which we name PSST for Partial-Sampling Straight-Through. |
888 | Watch, Listen and Tell: Multi-Modal Weakly Supervised Dense Event Captioning | Tanzila Rahman, Bicheng Xu, Leonid Sigal | In this paper, we present the evidence, that audio signals can carry surprising amount of information when it comes to high-level visual-lingual tasks. |
889 | Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning | Jingyi Hou, Xinxiao Wu, Wentian Zhao, Jiebo Luo, Yunde Jia | We propose a novel video captioning approach that takes into account both visual perception and syntax representation learning to generate accurate descriptions of videos. |
890 | Entangled Transformer for Image Captioning | Guang Li, Linchao Zhu, Ping Liu, Yi Yang | In this paper, we investigate a Transformer-based sequence modeling framework, built only with attention layers and feedforward layers. |
891 | Shapeglot: Learning Language for Shape Differentiation | Panos Achlioptas, Judy Fan, Robert Hawkins, Noah Goodman, Leonidas J. Guibas | In this work we explore how fine-grained differences between the shapes of common objects are expressed in language, grounded on 2D and/or 3D object representations. We first build a large scale, carefully controlled dataset of human utterances each of which refers to a 2D rendering of a 3D CAD model so as to distinguish it from a set of shape-wise similar alternatives. |
892 | nocaps: novel object captioning at scale | Harsh Agrawal, Karan Desai, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson | To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task. |
893 | Fully Convolutional Geometric Features | Christopher Choy, Jaesik Park, Vladlen Koltun | In this work, we present fully-convolutional geometric features, computed in a single pass by a 3D fully-convolutional network. |
894 | Learning Local RGB-to-CAD Correspondences for Object Pose Estimation | Georgios Georgakis, Srikrishna Karanam, Ziyan Wu, Jana Kosecka | In this paper, we solve this key problem of existing methods requiring expensive 3D pose annotations by proposing a new method that matches RGB images to CAD models for object pose estimation. |
895 | Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras | Ariel Gordon, Hanhan Li, Rico Jonschkowski, Anelia Angelova | We present a novel method for simultaneous learning of depth, egomotion, object motion, and camera intrinsics from monocular videos, using only consistency across neighboring video frames as supervision signal. |
896 | OmniMVS: End-to-End Learning for Omnidirectional Stereo Matching | Changhee Won, Jongbin Ryu, Jongwoo Lim | In this paper, we propose a novel end-to-end deep neural network model for omnidirectional depth estimation from a wide-baseline multi-view stereo setup. In addition, we present large-scale synthetic datasets for training and testing omnidirectional multi-view stereo algorithms. |
897 | On the Over-Smoothing Problem of CNN Based Disparity Estimation | Chuangrong Chen, Xiaozhi Chen, Hui Cheng | Based on this observation, we propose a single-modal weighted average operation on the probability distribution during inference, which can alleviate the problem effectively. |
898 | Disentangling Propagation and Generation for Video Prediction | Hang Gao, Huazhe Xu, Qi-Zhi Cai, Ruth Wang, Fisher Yu, Trevor Darrell | In this paper, we describe a computational model for high-fidelity video prediction which disentangles motion-specific propagation from motion-agnostic generation. |
899 | Guided Image-to-Image Translation With Bi-Directional Feature Transformation | Badour AlBahar, Jia-Bin Huang | To better utilize the constraints of the guidance image, we present a bi-directional feature transformation (bFT) scheme. |
900 | Towards Multi-Pose Guided Virtual Try-On Network | Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bochao Wang, Hanjiang Lai, Jia Zhu, Zhiting Hu, Jian Yin | This paper makes the first attempt towards a multi-pose guided virtual try-on system, which enables clothes to transfer onto a person with diverse poses. |
901 | Photorealistic Style Transfer via Wavelet Transforms | Jaejun Yoo, Youngjung Uh, Sanghyuk Chun, Byeongkyu Kang, Jung-Woo Ha | We introduce a theoretically sound correction to the network architecture that remarkably enhances photorealism and faithfully transfers the style. |
902 | Personalized Fashion Design | Cong Yu, Yang Hu, Yan Chen, Bing Zeng | In this work, we propose to automatically synthesize new items for recommendation. |
903 | Tag2Pix: Line Art Colorization Using Text Tag With SECat and Changing Loss | Hyunsu Kim, Ho Young Jhoo, Eunhyeok Park, Sungjoo Yoo | We propose a GAN approach to line art colorization, called Tag2Pix, which takes as input a grayscale line art and color tag information and produces a quality colored image. |
904 | Free-Form Video Inpainting With 3D Gated Convolution and Temporal PatchGAN | Ya-Liang Chang, Zhe Yu Liu, Kuan-Ying Lee, Winston Hsu | In this paper, we introduce a deep learning based free-form video inpainting model, with proposed 3D gated convolutions to tackle the uncertainty of free-form masks and a novel Temporal PatchGAN loss to enhance temporal consistency. |
905 | TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting | Wei Feng, Wenhao He, Fei Yin, Xu-Yao Zhang, Cheng-Lin Liu | In this paper, we propose a novel text spotting framework to detect and recognize text of arbitrary shapes in an end-to-end manner, using only word/line-level annotations for training. |
906 | Chinese Street View Text: Large-Scale Chinese Text Reading With Partially Supervised Learning | Yipeng Sun, Jiaming Liu, Wei Liu, Junyu Han, Errui Ding, Jingtuo Liu | To address this issue, we introduce a new large-scale text reading benchmark dataset named Chinese Street View Text (C-SVT) with 430,000 street view images, which is at least 14 times as large as the existing Chinese text reading benchmarks. |
907 | Deep Floor Plan Recognition Using a Multi-Task Network With Room-Boundary-Guided Attention | Zhiliang Zeng, Xianzhi Li, Ying Kin Yu, Chi-Wing Fu | This paper presents a new approach to recognize elements in floor plan layouts. |
908 | GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition | Fangneng Zhan, Chuhui Xue, Shijian Lu | This paper presents an innovative Geometry-Aware Domain Adaptation Network (GA-DAN) that is capable of modelling cross-domain shifts concurrently in both geometry space and appearance space and realistically converting images across domains with very different characteristics. |
909 | Large-Scale Tag-Based Font Retrieval With Generative Feature Learning | Tianlang Chen, Zhaowen Wang, Ning Xu, Hailin Jin, Jiebo Luo | In this paper, we address the problem of large-scale tag-based font retrieval which aims to bring semantics to the font selection process and enable people without expert knowledge to use fonts effectively. |
910 | Convolutional Character Networks | Linjie Xing, Zhi Tian, Weilin Huang, Matthew R. Scott | In this work, we propose convolutional character networks, referred to as CharNet, which is a one-stage model that can process two tasks simultaneously in one pass. |
911 | Geometry Normalization Networks for Accurate Scene Text Detection | Youjiang Xu, Jiaqi Duan, Zhanghui Kuang, Xiaoyu Yue, Hongbin Sun, Yue Guan, Wayne Zhang | In this work, we first conduct experiments to investigate the capacity of networks for learning geometry variances on detecting scene texts, and find that networks can handle only limited text geometry variances. Then, we put forward a novel Geometry Normalization Module (GNM) with multiple branches, each of which is composed of one Scale Normalization Unit and one Orientation Normalization Unit, to normalize each text instance to one desired canonical geometry range through at least one branch. |
912 | Symmetry-Constrained Rectification Network for Scene Text Recognition | Mingkun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai | To tackle this issue, we propose in this paper a Symmetry-constrained Rectification Network (ScRN) based on local attributes of text instances, such as center line, scale and orientation. |
913 | YOLACT: Real-Time Instance Segmentation | Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee | We present a simple, fully-convolutional model for real-time instance segmentation that achieves 29.8 mAP on MS COCO at 33.5 fps evaluated on a single Titan Xp, which is significantly faster than any previous competitive approach. |
914 | Expectation-Maximization Attention Networks for Semantic Segmentation | Xia Li, Zhisheng Zhong, Jianlong Wu, Yibo Yang, Zhouchen Lin, Hong Liu | In this paper, we formulate the attention mechanism into an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. |
915 | Multi-Class Part Parsing With Joint Boundary-Semantic Awareness | Yifan Zhao, Jia Li, Yu Zhang, Yonghong Tian | In this paper, we propose a joint parsing framework with boundary and semantic awareness to address this challenging problem. |
916 | Explaining Neural Networks Semantically and Quantitatively | Runjin Chen, Hao Chen, Jie Ren, Ge Huang, Quanshi Zhang | This paper presents a method to pursue a semantic and quantitative explanation for the knowledge encoded in a convolutional neural network (CNN). |
917 | PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment | Kaixin Wang, Jun Hao Liew, Yingtian Zou, Daquan Zhou, Jiashi Feng | In this paper, we tackle the challenging few-shot segmentation problem from a metric learning perspective and present PANet, a novel prototype alignment network to better utilize the information of the support set. |
918 | ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors | Weicheng Kuo, Anelia Angelova, Jitendra Malik, Tsung-Yi Lin | We introduce ShapeMask, which learns the intermediate concept of object shape to address the problem of generalization in instance segmentation to novel categories. |
919 | Sequence Level Semantics Aggregation for Video Object Detection | Haiping Wu, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang | In this work, we argue that aggregating features in the full-sequence level will lead to more discriminative and robust features for video object detection. |
920 | Video Object Segmentation Using Space-Time Memory Networks | Seoung Wug Oh, Joon-Young Lee, Ning Xu, Seon Joo Kim | We propose a novel solution for semi-supervised video object segmentation. |
921 | Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks | Wenguan Wang, Xiankai Lu, Jianbing Shen, David J. Crandall, Ling Shao | This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS). |
922 | MeteorNet: Deep Learning on Dynamic 3D Point Cloud Sequences | Xingyu Liu, Mengyuan Yan, Jeannette Bohg | We propose a novel neural network architecture called MeteorNet for learning representations for dynamic 3D point cloud sequences. |
923 | 3D Instance Segmentation via Multi-Task Metric Learning | Jean Lahoud, Bernard Ghanem, Marc Pollefeys, Martin R. Oswald | We propose a novel method for instance label segmentation of dense 3D voxel grids. |
924 | DeepGCNs: Can GCNs Go As Deep As CNNs? | Guohao Li, Matthias Muller, Ali Thabet, Bernard Ghanem | In this work, we present new ways to successfully train very deep GCNs. |
925 | Deep Hough Voting for 3D Object Detection in Point Clouds | Charles R. Qi, Or Litany, Kaiming He, Leonidas J. Guibas | In this work, we return to first principles to construct a 3D detection pipeline for point cloud data that is as generic as possible. |
926 | M3D-RPN: Monocular 3D Region Proposal Network for Object Detection | Garrick Brazil, Xiaoming Liu | We propose to reduce the gap by reformulating the monocular 3D detection problem as a standalone 3D region proposal network. |
927 | SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences | Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, Jurgen Gall | In this paper, we introduce a large dataset to propel research on laser-based semantic segmentation. |
928 | WoodScape: A Multi-Task, Multi-Camera Fisheye Dataset for Autonomous Driving | Senthil Yogamani, Ciaran Hughes, Jonathan Horgan, Ganesh Sistu, Padraig Varley, Derek O’Dea, Michal Uricar, Stefan Milz, Martin Simon, Karl Amende, Christian Witt, Hazem Rashed, Sumanth Chennupati, Sanjaya Nayak, Saquib Mansoor, Xavier Perrotton, Patrick Perez | We release the first extensive fisheye automotive dataset, WoodScape, named after Robert Wood, who invented the fisheye camera in 1906. |
929 | Scalable Place Recognition Under Appearance Change for Autonomous Driving | Anh-Dzung Doan, Yasir Latif, Tat-Jun Chin, Yu Liu, Thanh-Toan Do, Ian Reid | To this end, we propose a novel place recognition technique that can be efficiently retrained and compressed, such that the recognition of new queries can exploit all available data (including recent changes) without suffering from visible growth in computational cost. |
930 | Exploring the Limitations of Behavior Cloning for Autonomous Driving | Felipe Codevilla, Eder Santana, Antonio M. Lopez, Adrien Gaidon | In this paper, we propose a new benchmark to experimentally investigate the scalability and limitations of behavior cloning. |
931 | Habitat: A Platform for Embodied AI Research | Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra | We present Habitat, a platform for research in embodied artificial intelligence (AI). |
932 | Towards Interpretable Face Recognition | Bangjie Yin, Luan Tran, Haoxiang Li, Xiaohui Shen, Xiaoming Liu | In this work, focusing on a specific area of visual recognition, we report our efforts towards interpretable face recognition. |
933 | Co-Mining: Deep Face Recognition With Noisy Labels | Xiaobo Wang, Shuo Wang, Jun Wang, Hailin Shi, Tao Mei | To address this issue, this paper develops a novel co-mining strategy to effectively train on the datasets with noisy labels. |
934 | Few-Shot Adaptive Gaze Estimation | Seonwook Park, Shalini De Mello, Pavlo Molchanov, Umar Iqbal, Otmar Hilliges, Jan Kautz | We embrace these challenges and propose a novel framework for Few-shot Adaptive GaZE Estimation (Faze) for learning person-specific gaze networks with very few (<= 9) calibration samples. |
935 | Live Face De-Identification in Video | Oran Gafni, Lior Wolf, Yaniv Taigman | We propose a method for face de-identification that enables fully automatic video modification at high frame rates. |
936 | Face Video Deblurring Using 3D Facial Priors | Wenqi Ren, Jiaolong Yang, Senyou Deng, David Wipf, Xiaochun Cao, Xin Tong | In this paper we propose a novel face video deblurring network capitalizing on 3D facial priors. |
937 | Semi-Supervised Monocular 3D Face Reconstruction With End-to-End Shape-Preserved Domain Transfer | Jingtan Piao, Chen Qian, Hongsheng Li | To tackle this problem, we propose a semi-supervised monocular reconstruction method, which jointly optimizes a shape-preserved domain-transfer CycleGAN and a shape estimation network. |
938 | 3D Face Modeling From Diverse Raw Scan Data | Feng Liu, Luan Tran, Xiaoming Liu | To address these problems, this paper proposes an innovative framework to jointly learn a nonlinear face model from a diverse set of raw 3D scan databases and establish dense point-to-point correspondence among their scans. |
939 | A Decoupled 3D Facial Shape Model by Adversarial Training | Victoria Fernandez Abrevaya, Adnane Boukhayma, Stefanie Wuhrer, Edmond Boyer | In this work, we explore a new direction with Generative Adversarial Networks and show that they contribute to better face modeling performances, especially in decoupling natural factors, while also achieving more diverse samples. |
940 | Photo-Realistic Facial Details Synthesis From Single Image | Anpei Chen, Zhang Chen, Guli Zhang, Kenny Mitchell, Jingyi Yu | We present a single-image 3D face synthesis technique that can handle challenging facial expressions while recovering fine geometric details. |
941 | S2GAN: Share Aging Factors Across Ages and Share Aging Trends Among Individuals | Zhenliang He, Meina Kan, Shiguang Shan, Xilin Chen | Following this biological principle, in this work, we propose an effective and efficient method to simulate natural aging. |
942 | PuppetGAN: Cross-Domain Image Manipulation by Demonstration | Ben Usman, Nick Dufour, Kate Saenko, Chris Bregler | In this work we propose a model that can manipulate individual visual attributes of objects in a real scene using examples of how respective attribute manipulations affect the output of a simulation. |
943 | Few-Shot Adversarial Learning of Realistic Neural Talking Head Models | Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, Victor Lempitsky | However, in many practical scenarios, such personalized talking head models need to be learned from a few image views of a person, potentially even a single image. Here, we present a system with such few-shot capability. |
944 | Pose-Aware Multi-Level Feature Network for Human Object Interaction Detection | Bo Wan, Desen Zhou, Yongfei Liu, Rongjie Li, Xuming He | To address those challenges, we propose a multi-level relation detection strategy that utilizes human pose cues both to capture the global spatial configurations of relations and as an attention mechanism to dynamically zoom into relevant regions at the human part level. |
945 | TRB: A Novel Triplet Representation for Understanding 2D Human Body | Haodong Duan, Kwan-Yee Lin, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang | In this paper, we propose the Triplet Representation for Body (TRB) — a compact 2D human body representation, with skeleton keypoints capturing human pose information and contour keypoints containing human shape information. |
946 | Learning Trajectory Dependencies for Human Motion Prediction | Wei Mao, Miaomiao Liu, Mathieu Salzmann, Hongdong Li | In this paper, we propose a simple feed-forward deep network for motion prediction, which takes into account both temporal smoothness and spatial dependencies among human body joints. |
947 | Cross-Domain Adaptation for Animal Pose Estimation | Jinkun Cao, Hongyang Tang, Hao-Shu Fang, Xiaoyong Shen, Cewu Lu, Yu-Wing Tai | In this paper, we are interested in pose estimation of animals. |
948 | NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection | Jiyang Gao, Jiang Wang, Shengyang Dai, Li-Jia Li, Ram Nevatia | To reduce the dependence on expensive bounding box annotations, we propose a new semi-supervised object detection formulation, in which a few seed box level annotations and a large scale of image level annotations are used to train the detector. |
949 | Unsupervised Out-of-Distribution Detection by Maximum Classifier Discrepancy | Qing Yu, Kiyoharu Aizawa | In this work, we propose a two-head deep convolutional neural network (CNN) and maximize the discrepancy between the two classifiers to detect OOD inputs. |
950 | SBSGAN: Suppression of Inter-Domain Background Shift for Person Re-Identification | Yan Huang, Qiang Wu, JingSong Xu, Yi Zhong | In this paper, we formulate such problems as a background shift problem. |
951 | Enriched Feature Guided Refinement Network for Object Detection | Jing Nie, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao | We propose a single-stage detection framework that jointly tackles the problem of multi-scale object detection and class imbalance. |
952 | Deep Meta Metric Learning | Guangyi Chen, Tianren Zhang, Jiwen Lu, Jie Zhou | In this paper, we present a deep meta metric learning (DMML) approach for visual recognition. |
953 | Discriminative Feature Transformation for Occluded Pedestrian Detection | Chunluan Zhou, Ming Yang, Junsong Yuan | In this paper, we propose a discriminative feature transformation which enforces feature separability of pedestrian and non-pedestrian examples to handle occlusions for pedestrian detection. |
954 | Contextual Attention for Hand Detection in the Wild | Supreeth Narasimhaswamy, Zhengwei Wei, Yang Wang, Justin Zhang, Minh Hoai | We present Hand-CNN, a novel convolutional network architecture for detecting hand masks and predicting hand orientations in unconstrained images. We also introduce large-scale annotated hand datasets containing hands in unconstrained images for training and evaluation. |
955 | Meta R-CNN: Towards General Solver for Instance-Level Low-Shot Learning | Xiaopeng Yan, Ziliang Chen, Anni Xu, Xiaoxi Wang, Xiaodan Liang, Liang Lin | In this work, we present a flexible and general methodology to achieve these tasks. |
956 | Pyramid Graph Networks With Connection Attentions for Region-Based One-Shot Semantic Segmentation | Chi Zhang, Guosheng Lin, Fayao Liu, Jiushuang Guo, Qingyao Wu, Rui Yao | In this paper, we propose to model structured segmentation data with graphs and apply attentive graph reasoning to propagate label information from support data to query data. |
957 | Presence-Only Geographical Priors for Fine-Grained Image Classification | Oisin Mac Aodha, Elijah Cole, Pietro Perona | We propose an efficient spatio-temporal prior, that when conditioned on a geographical location and time, estimates the probability that a given object category occurs at that location. |
958 | POD: Practical Object Detection With Scale-Sensitive Network | Junran Peng, Ming Sun, Zhaoxiang Zhang, Tieniu Tan, Junjie Yan | For fast deployment, we propose a scale decomposition method that transfers the robust fractional scale into combinations of fixed integral scales for each convolution filter, which exploits dilated convolution. |
959 | Human Uncertainty Makes Classification More Robust | Joshua C. Peterson, Ruairidh M. Battleday, Thomas L. Griffiths, Olga Russakovsky | In this paper, we make progress on this problem by training with full label distributions that reflect human perceptual uncertainty. We first present a new benchmark dataset which we call CIFAR10H, containing a full distribution of human labels for each image of the CIFAR10 test set. |
960 | FCOS: Fully Convolutional One-Stage Object Detection | Zhi Tian, Chunhua Shen, Hao Chen, Tong He | We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogous to semantic segmentation. |
961 | Self-Critical Attention Learning for Person Re-Identification | Guangyi Chen, Chunze Lin, Liangliang Ren, Jiwen Lu, Jie Zhou | In this paper, we propose a self-critical attention learning method for person re-identification. |
962 | Temporal Knowledge Propagation for Image-to-Video Person Re-Identification | Xinqian Gu, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen | To solve this problem, we propose a novel Temporal Knowledge Propagation (TKP) method which propagates the temporal knowledge learned by the video representation network to the image representation network. |
963 | RepPoints: Point Set Representation for Object Detection | Ze Yang, Shaohui Liu, Han Hu, Liwei Wang, Stephen Lin | In this paper, we present RepPoints (representative points), a new finer representation of objects as a set of sample points useful for both localization and recognition. |
964 | SegEQA: Video Segmentation Based Visual Attention for Embodied Question Answering | Haonan Luo, Guosheng Lin, Zichuan Liu, Fayao Liu, Zhenmin Tang, Yazhou Yao | To tackle these problems, we propose a segmentation based visual attention mechanism for Embodied Question Answering. |
965 | No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques | Tanmay Gupta, Alexander Schwing, Derek Hoiem | We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches. |
966 | Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection | Keren Ye, Mingda Zhang, Adriana Kovashka, Wei Li, Danfeng Qin, Jesse Berent | Instead, we show how to squeeze the most information out of these captions by training a text-only classifier that generalizes beyond dataset boundaries. Our discovery provides an opportunity for learning detection models from noisy but more abundant and freely-available caption data. |
967 | No Fear of the Dark: Image Retrieval Under Varying Illumination Conditions | Tomas Jenicek, Ondrej Chum | We propose a learnable normalisation based on the U-Net architecture, which is trained on a combination of single-camera multi-exposure images and a newly constructed collection of similar views of landmarks during day and night. |
968 | Hierarchical Shot Detector | Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li | To solve the first problem, a novel reg-offset-cls (ROC) module is proposed. It contains three hierarchical steps: box regression, feature sampling location prediction, and classification of the regressed box with the features at the offset locations. To further solve the second problem, a hierarchical shot detector (HSD) is proposed, which stacks two ROC modules and one feature enhanced module. |
969 | Few-Shot Learning With Global Class Representations | Aoxue Li, Tiange Luo, Tao Xiang, Weiran Huang, Liwei Wang | In this paper, we propose to tackle the challenging few-shot learning (FSL) problem by learning global class representations using both base and novel class training samples. |
970 | Better to Follow, Follow to Be Better: Towards Precise Supervision of Feature Super-Resolution for Small Object Detection | Junhyug Noh, Wonho Bae, Wonhee Lee, Jinhwan Seo, Gunhee Kim | We propose a novel feature-level super-resolution approach that not only correctly addresses these two desiderata but also is integrable with any proposal-based detectors with feature pooling. |
971 | Weakly Supervised Object Detection With Segmentation Collaboration | Xiaoyan Li, Meina Kan, Shiguang Shan, Xilin Chen | To obtain a more accurate detector, in this work we propose a novel end-to-end weakly supervised detection approach, where a newly introduced generative adversarial segmentation module interacts with the conventional detection module in a collaborative loop. |
972 | AutoFocus: Efficient Multi-Scale Inference | Mahyar Najibi, Bharat Singh, Larry S. Davis | This paper describes AutoFocus, an efficient multi-scale inference algorithm for deep-learning based object detectors. |
973 | Leveraging Long-Range Temporal Relationships Between Proposals for Video Object Detection | Mykhailo Shvets, Wei Liu, Alexander C. Berg | In this paper, we present a light-weight modification to a single-frame detector that accounts for arbitrary long dependencies in a video. |
974 | Transferable Contrastive Network for Generalized Zero-Shot Learning | Huajie Jiang, Ruiping Wang, Shiguang Shan, Xilin Chen | To tackle this problem, we propose a novel Transferable Contrastive Network (TCN) that explicitly transfers knowledge from the source classes to the target classes. |
975 | Fast Point R-CNN | Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia | We present a unified, efficient and effective framework for point-cloud based 3D object detection. |
976 | Mesh R-CNN | Georgia Gkioxari, Jitendra Malik, Justin Johnson | We propose a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object. |
977 | Deep Supervised Hashing With Anchor Graph | Yudong Chen, Zhihui Lai, Yujuan Ding, Kaiyi Lin, Wai Keung Wong | To address these problems, this paper proposes an interesting regularized deep model to seamlessly integrate the advantages of deep hashing and efficient binary code learning by using the anchor graph. |
978 | Detecting 11K Classes: Large Scale Object Detection Without Fine-Grained Bounding Boxes | Hao Yang, Hao Wu, Hao Chen | In this paper, we propose a semi-supervised large scale fine-grained detection method, which only needs bounding box annotations of a smaller number of coarse-grained classes and image-level labels of large scale fine-grained classes, and can detect all classes at nearly fully-supervised accuracy. |
979 | Re-ID Driven Localization Refinement for Person Search | Chuchu Han, Jiacheng Ye, Yunshan Zhong, Xin Tan, Chi Zhang, Changxin Gao, Nong Sang | To alleviate this issue, we propose a re-ID driven localization refinement framework that provides refined detection boxes for person search. |
980 | Hierarchical Encoding of Sequential Data With Compact and Sub-Linear Storage Cost | Huu Le, Ming Xu, Tuan Hoang, Michael Milford | To address these limitations, in this paper we present a totally new hierarchical encoding approach that enables a sub-linear storage scale. |
981 | C-MIDN: Coupled Multiple Instance Detection Network With Segmentation Guidance for Weakly Supervised Object Detection | Yan Gao, Boxiao Liu, Nan Guo, Xiaochun Ye, Fang Wan, Haihang You, Dongrui Fan | In this paper, we propose a novel Coupled Multiple Instance Detection Network (C-MIDN) to address this problem. |
982 | Learning Feature-to-Feature Translator by Alternating Back-Propagation for Generative Zero-Shot Learning | Yizhe Zhu, Jianwen Xie, Bingchen Liu, Ahmed Elgammal | We conduct extensive comparisons with existing generative ZSL methods on five benchmarks, demonstrating the superiority of our method in not only ZSL performance but also convergence speed and computational cost. |
983 | Deep Constrained Dominant Sets for Person Re-Identification | Leulseged Tesfaye Alemu, Marcello Pelillo, Mubarak Shah | In this work, we propose an end-to-end constrained clustering scheme to tackle the person re-identification (re-id) problem. |
984 | Invariant Information Clustering for Unsupervised Image Classification and Segmentation | Xu Ji, Joao F. Henriques, Andrea Vedaldi | We present a novel clustering objective that learns a neural network classifier from scratch, given only unlabelled data samples. |
985 | Subspace Structure-Aware Spectral Clustering for Robust Subspace Clustering | Masataka Yamaguchi, Go Irie, Takahito Kawanishi, Kunio Kashino | In this paper, we propose a novel graph clustering framework for robust subspace clustering. |
986 | Order-Preserving Wasserstein Discriminant Analysis | Bing Su, Jiahuan Zhou, Ying Wu | This paper presents a linear method, namely Order-preserving Wasserstein Discriminant Analysis (OWDA), which learns the projection by maximizing the inter-class distance and minimizing the intra-class scatter. |
987 | LayoutVAE: Stochastic Scene Layout Generation From a Label Set | Akash Abdu Jyothi, Thibaut Durand, Jiawei He, Leonid Sigal, Greg Mori | We propose LayoutVAE, a variational autoencoder based framework for generating stochastic scene layouts. |
988 | Robust Variational Bayesian Point Set Registration | Jie Zhou, Xinke Ma, Li Liang, Yang Yang, Shijin Xu, Yuhe Liu, Sim-Heng Ong | In this work, we propose a hierarchical Bayesian network based point set registration method to handle missing correspondences and massive outliers. |
989 | Is an Affine Constraint Needed for Affine Subspace Clustering? | Chong You, Chun-Guang Li, Daniel P. Robinson, Rene Vidal | This paper shows, both theoretically and empirically, that when the dimension of the ambient space is high relative to the sum of the dimensions of the affine subspaces, the affine constraint has a negligible effect on clustering performance. |
990 | Meta-Learning to Detect Rare Objects | Yu-Xiong Wang, Deva Ramanan, Martial Hebert | Our key insight is to disentangle the learning of category-agnostic and category-specific components in a CNN based detection model. |
991 | New Convex Relaxations for MRF Inference With Unknown Graphs | Zhenhua Wang, Tong Liu, Qinfeng Shi, M. Pawan Kumar, Jianhua Zhang | We propose two novel relaxations for solving this problem. The first is a linear programming (LP) relaxation, which is provably tighter than the existing LP relaxation. The second is a non-convex quadratic programming (QP) relaxation, which admits an efficient concave-convex procedure (CCCP). |
992 | Cluster Alignment With a Teacher for Unsupervised Domain Adaptation | Zhijie Deng, Yucen Luo, Jun Zhu | In this paper, we propose Cluster Alignment with a Teacher (CAT) for unsupervised domain adaptation, which can effectively incorporate the discriminative clustering structures in both domains for better adaptation. |
993 | Analyzing the Variety Loss in the Context of Probabilistic Trajectory Prediction | Luca Anthony Thiede, Pratik Prabhanjan Brahma | In this work, we present a proof to show that the MoN loss does not lead to the ground truth probability density function, but approximately to its square root instead. |
994 | Deep Mesh Reconstruction From Single RGB Images via Topology Modification Networks | Junyi Pan, Xiaoguang Han, Weikai Chen, Jiapeng Tang, Kui Jia | In this paper, we present an end-to-end single-view mesh reconstruction framework that is able to generate high-quality meshes with complex topologies from a single genus-0 template mesh. |
995 | UprightNet: Geometry-Aware Camera Orientation Estimation From Single Images | Wenqi Xian, Zhengqi Li, Matthew Fisher, Jonathan Eisenmann, Eli Shechtman, Noah Snavely | We introduce UprightNet, a learning-based approach for estimating 2DoF camera orientation from a single RGB image of an indoor scene. |
996 | Escaping Plato’s Cave: 3D Shape From Adversarial Rendering | Philipp Henzler, Niloy J. Mitra, Tobias Ritschel | We introduce PlatonicGAN to discover the 3D structure of an object class from an unstructured collection of 2D images, i.e., where no relation between photos is known, except that they are showing instances of the same category. |
997 | Deep End-to-End Alignment and Refinement for Time-of-Flight RGB-D Module | Di Qiu, Jiahao Pang, Wenxiu Sun, Chengxi Yang | In this work, we propose a framework for joint alignment and refinement via deep learning. |
998 | GEOBIT: A Geodesic-Based Binary Descriptor Invariant to Non-Rigid Deformations for RGB-D Images | Erickson R. Nascimento, Guilherme Potje, Renato Martins, Felipe Cadar, Mario F. M. Campos, Ruzena Bajcsy | In this paper, we introduce a novel binary RGB-D descriptor invariant to isometric deformations. We also provide to the community a new dataset composed of annotated RGB-D images of different objects (shirts, cloths, paintings, bags), subjected to strong non-rigid deformations, to evaluate point correspondence algorithms. |
999 | CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark | Alan Lukezic, Ugur Kart, Jani Kapyla, Ahmed Durmush, Joni-Kristian Kamarainen, Jiri Matas, Matej Kristan | We propose a new color-and-depth general visual object tracking benchmark (CDTB). |
1000 | Learning Joint 2D-3D Representations for Depth Completion | Yun Chen, Bin Yang, Ming Liang, Raquel Urtasun | In this paper, we tackle the problem of depth completion from RGBD data. |
1001 | Make a Face: Towards Arbitrary High Fidelity Face Manipulation | Shengju Qian, Kwan-Yee Lin, Wayne Wu, Yangxiaokang Liu, Quan Wang, Fumin Shen, Chen Qian, Ran He | In this work, we propose Additive Focal Variational Auto-encoder (AF-VAE), a novel approach that can arbitrarily manipulate high-resolution face images using a simple yet effective model and only weak supervision of reconstruction and KL divergence losses. |
1002 | M2FPA: A Multi-Yaw Multi-Pitch High-Quality Dataset and Benchmark for Facial Pose Analysis | Peipei Li, Xiang Wu, Yibo Hu, Ran He, Zhenan Sun | In this paper, a new large-scale Multi-yaw Multi-pitch high-quality database is proposed for Facial Pose Analysis (M2FPA), including face frontalization, face rotation, facial pose estimation and pose-invariant face recognition. |
1003 | Fair Loss: Margin-Aware Reinforcement Learning for Deep Face Recognition | Bingyu Liu, Weihong Deng, Yaoyao Zhong, Mei Wang, Jiani Hu, Xunqiang Tao, Yaohai Huang | In this paper, we introduce a new margin-aware reinforcement learning based loss function, namely fair loss, in which each class will learn an appropriate adaptive margin by Deep Q-learning. |
1004 | Face De-Occlusion Using 3D Morphable Model and Generative Adversarial Network | Xiaowei Yuan, In Kyu Park | In this paper, a novel method is proposed to restore occluded face images based on the inverse use of a 3DMM and a generative adversarial network. |
1005 | Detecting Photoshopped Faces by Scripting Photoshop | Sheng-Yu Wang, Oliver Wang, Andrew Owens, Richard Zhang, Alexei A. Efros | We present a method for detecting one very popular Photoshop manipulation — image warping applied to human faces — using a model trained entirely using fake images that were automatically generated by scripting Photoshop itself. |
1006 | Ego-Pose Estimation and Forecasting As Real-Time PD Control | Ye Yuan, Kris Kitani | We propose the use of a proportional-derivative (PD) control based policy learned via reinforcement learning (RL) to estimate and forecast 3D human pose from egocentric videos. |
1007 | End-to-End Learning for Graph Decomposition | Jie Song, Bjoern Andres, Michael J. Black, Otmar Hilliges, Siyu Tang | In this paper, we study how to connect deep networks with graph decomposition into an end-to-end trainable framework. |
1008 | Laplace Landmark Localization | Joseph P. Robinson, Yuncheng Li, Ning Zhang, Yun Fu, Sergey Tulyakov | To address both issues, we propose an adversarial training framework that leverages unlabeled data to improve model performance. |
1009 | Through-Wall Human Mesh Recovery Using Radio Signals | Mingmin Zhao, Yingcheng Liu, Aniruddh Raghu, Tianhong Li, Hang Zhao, Antonio Torralba, Dina Katabi | This paper presents RF-Avatar, a neural network model that can estimate 3D meshes of the human body in the presence of occlusions, baggy clothes, and bad lighting conditions. |
1010 | Discriminatively Learned Convex Models for Set Based Face Recognition | Hakan Cevikalp, Golara Ghorban Dordinejad | In contrast to these methods, this paper introduces a novel method that searches for discriminative convex models that best fit an individual’s face images while remaining as far as possible from the images of other persons in the gallery. |
1011 | Camera Distance-Aware Top-Down Approach for 3D Multi-Person Pose Estimation From a Single RGB Image | Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee | In this work, we propose the first fully learning-based, camera distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image. |
1012 | Context-Aware Emotion Recognition Networks | Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, Kwanghoon Sohn | We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expression but also context information in a joint and boosting manner. |
1013 | Aggregation via Separation: Boosting Facial Landmark Detector With Semi-Supervised Style Translation | Shengju Qian, Keqiang Sun, Wayne Wu, Chen Qian, Jiaya Jia | In this paper, we investigate a new perspective of facial landmark detection and demonstrate it leads to further notable improvement. |
1014 | Deep Head Pose Estimation Using Synthetic Images and Partial Adversarial Domain Adaption for Continuous Label Spaces | Felix Kuhnke, Jorn Ostermann | More precisely, we adapt the predominant weighting approaches to continuous label spaces by applying a weighted resampling of the source domain during training. |
1015 | Flare in Interference-Based Hyperspectral Cameras | Eden Sassoon, Yoav Y. Schechner, Tali Treibitz | We present a theoretical image formation model for this effect, and quantify it through simulations and experiments. |
1016 | Computational Hyperspectral Imaging Based on Dimension-Discriminative Low-Rank Tensor Recovery | Shipeng Zhang, Lizhi Wang, Ying Fu, Xiaoming Zhong, Hua Huang | In this paper, we propose to make full use of the high-dimensional structure of the desired HSI to boost the reconstruction quality. |
1017 | Deep Optics for Monocular Depth Estimation and 3D Object Detection | Julie Chang, Gordon Wetzstein | Here we introduce the paradigm of deep optics, i.e. end-to-end design of optics and image processing, to the monocular depth estimation problem, using coded defocus blur as an additional depth cue to be decoded by a neural network. |
1018 | Physics-Based Rendering for Improving Robustness to Rain | Shirsendu Sukanta Halder, Jean-Francois Lalonde, Raoul de Charette | To improve the robustness to rain, we present a physically-based rain rendering pipeline for realistically inserting rain into clear weather images. |
1019 | ARGAN: Attentive Recurrent Generative Adversarial Network for Shadow Detection and Removal | Bin Ding, Chengjiang Long, Ling Zhang, Chunxia Xiao | In this paper, we propose an attentive recurrent generative adversarial network (ARGAN) to detect and remove shadows in an image. |
1020 | Deep Tensor ADMM-Net for Snapshot Compressive Imaging | Jiawei Ma, Xiao-Yang Liu, Zheng Shou, Xin Yuan | In this paper, we propose a deep tensor ADMM-Net for video SCI systems that provides high-quality decoding in seconds. |
1021 | Convex Relaxations for Consensus and Non-Minimal Problems in 3D Vision | Thomas Probst, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool | In this paper, we formulate a generic non-minimal solver using the existing tools of Polynomial Optimization Problems (POP) from computational algebraic geometry. |
1022 | Pareto Meets Huber: Efficiently Avoiding Poor Minima in Robust Estimation | Christopher Zach, Guillaume Bourmaud | In this paper, we propose a novel algorithm relying on multi-objective optimization which allows us to satisfy both properties. |
1023 | K-Best Transformation Synchronization | Yifan Sun, Jiacheng Zhuo, Arnav Mohan, Qixing Huang | In this paper, we introduce the problem of K-best transformation synchronization for the purpose of multiple scan matching. |
1024 | Parametric Majorization for Data-Driven Energy Minimization Methods | Jonas Geiping, Michael Moeller | In this work, we present a new strategy to optimize these bi-level problems. |
1025 | A Bayesian Optimization Framework for Neural Network Compression | Xingchen Ma, Amal Rannen Triki, Maxim Berman, Christos Sagonas, Jacques Cali, Matthew B. Blaschko | In this work, we develop a general Bayesian optimization framework for optimizing functions that are computed based on U-statistics. |
1026 | HiPPI: Higher-Order Projected Power Iterations for Scalable Multi-Matching | Florian Bernard, Johan Thunberg, Paul Swoboda, Christian Theobalt | We address these shortcomings by introducing a Higher-order Projected Power Iteration method, which is (i) efficient and scales to tens of thousands of points, (ii) straightforward to implement, (iii) able to incorporate geometric consistency, (iv) guarantees cycle-consistent multi-matchings, and (v) comes with theoretical convergence guarantees. |
1027 | Language-Conditioned Graph Networks for Relational Reasoning | Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko | In this paper, we take an alternate approach and build contextualized representations for objects in a visual scene to support relational reasoning. |
1028 | Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction | Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, Graham W. Taylor | In this work, we present a recurrent image generation model which takes into account both the generated output up to the current step as well as all past instructions for generation. |
1029 | Relation-Aware Graph Attention Network for Visual Question Answering | Linjie Li, Zhe Gan, Yu Cheng, Jingjing Liu | We propose a Relation-aware Graph Attention Network (ReGAT), which encodes each image into a graph and models multi-type inter-object relations via a graph attention mechanism, to learn question-adaptive relation representations. |
1030 | Unpaired Image Captioning via Scene Graph Alignments | Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang | In this paper, we present a scene graph-based approach for unpaired image captioning. |
1031 | Modeling Inter and Intra-Class Relations in the Triplet Loss for Zero-Shot Learning | Yannick Le Cacheux, Herve Le Borgne, Michel Crucianu | Our approach consists in taking into account both inter-class and intra-class relations, respectively by being more permissive with confusions between similar classes, and by penalizing visual samples which are atypical to their class. |
1032 | Occlusion-Shared and Feature-Separated Network for Occlusion Relationship Reasoning | Rui Lu, Feng Xue, Menghan Zhou, Anlong Ming, Yu Zhou | For the reasons above, we propose the Occlusion-shared and Feature-separated Network (OFNet). |
1033 | Compositional Video Prediction | Yufei Ye, Maneesh Singh, Abhinav Gupta, Shubham Tulsiani | We present an approach for pixel-level future prediction given an input image of a scene. |
1034 | Mixture-Kernel Graph Attention Network for Situation Recognition | Mohammed Suhail, Leonid Sigal | In this paper, we propose a novel mixture-kernel attention graph neural network (GNN) architecture designed to address these challenges. |
1035 | Learning Similarity Conditions Without Explicit Supervision | Reuben Tan, Mariya I. Vasileva, Kate Saenko, Bryan A. Plummer | To address this deficiency, we propose an approach that jointly learns representations for the different similarity conditions and their contributions as a latent variable without explicit supervision. |
1036 | Joint Prediction for Kinematic Trajectories in Vehicle-Pedestrian-Mixed Scenes | Huikun Bi, Zhong Fang, Tianlu Mao, Zhaoqi Wang, Zhigang Deng | In this paper, we tackle this problem using separate LSTMs for heterogeneous vehicles and pedestrians. |
1037 | Learning to Caption Images Through a Lifetime by Asking Questions | Tingke Shen, Amlan Kar, Sanja Fidler | Inspired by a student learning in a classroom, we present an agent that can continuously learn by posing natural language questions to humans. |
1038 | VrR-VG: Refocusing Visually-Relevant Relationships | Yuanzhi Liang, Yalong Bai, Wei Zhang, Xueming Qian, Li Zhu, Tao Mei | To encourage further development in visual relationships, we propose a novel method to mine more valuable relationships by automatically pruning visually-irrelevant relationships. We construct a new scene graph dataset named Visually-Relevant Relationships Dataset (VrR-VG) based on Visual Genome. |
1039 | TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo | Andrea Romanoni, Matteo Matteucci | Assuming the untextured areas to be piecewise planar, in this paper we generate novel PatchMatch hypotheses so as to expand reliable depth estimates into neighboring untextured regions. |
1040 | U4D: Unsupervised 4D Dynamic Scene Understanding | Armin Mustafa, Chris Russell, Adrian Hilton | We introduce the first approach to solve the challenging problem of unsupervised 4D visual scene understanding for complex dynamic scenes with multiple interacting people from multi-view video. |
1041 | Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation | Li Jiang, Hengshuang Zhao, Shu Liu, Xiaoyong Shen, Chi-Wing Fu, Jiaya Jia | To incorporate point features in the edge branch, we establish a hierarchical graph framework, where the graph is initialized from a coarse layer and gradually enriched along the point decoding process. |
1042 | Multi-Angle Point Cloud-VAE: Unsupervised Feature Learning for 3D Point Clouds From Multiple Angles by Joint Self-Reconstruction and Half-to-Half Prediction | Zhizhong Han, Xiyang Wang, Yu-Shen Liu, Matthias Zwicker | To resolve this issue, we propose MAP-VAE to enable the learning of global and local geometry by jointly leveraging global and local self-supervision. |
1043 | P-MVSNet: Learning Patch-Wise Matching Confidence Aggregation for Multi-View Stereo | Keyang Luo, Tao Guan, Lili Ju, Haipeng Huang, Yawei Luo | In this paper, we propose a new end-to-end deep learning network of P-MVSNet for multi-view stereo based on isotropic and anisotropic 3D convolutions. |
1044 | SME-Net: Sparse Motion Estimation for Parametric Video Prediction Through Reinforcement Learning | Yung-Han Ho, Chuan-Yuan Cho, Wen-Hsiao Peng, Guo-Lun Jin | This paper leverages a classic prediction technique, known as parametric overlapped block motion compensation (POBMC), in a reinforcement learning framework for video prediction. |
1045 | ClothFlow: A Flow-Based Model for Clothed Person Generation | Xintong Han, Xiaojun Hu, Weilin Huang, Matthew R. Scott | We present ClothFlow, an appearance-flow-based generative model to synthesize a clothed person for pose-guided person image generation and virtual try-on. |
1046 | LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup | Qiao Gu, Guanzhi Wang, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang | We propose a local adversarial disentangling network (LADN) for facial makeup and de-makeup. |
1047 | Point-to-Point Video Generation | Tsun-Hsuan Wang, Yen-Chi Cheng, Chieh Hubert Lin, Hwann-Tzong Chen, Min Sun | We introduce point-to-point video generation that controls the generation process with two control points: the targeted start- and end-frames. |
1048 | Semantics-Enhanced Adversarial Nets for Text-to-Image Synthesis | Hongchen Tan, Xiuping Liu, Xin Li, Yi Zhang, Baocai Yin | This paper presents a new model, Semantics-enhanced Generative Adversarial Network (SEGAN), for fine-grained text-to-image generation. |
1049 | VTNFP: An Image-Based Virtual Try-On Network With Body and Clothing Feature Preservation | Ruiyun Yu, Xiaoqi Wang, Xiaohui Xie | Here we present a new virtual try-on network, called VTNFP, to synthesize photo-realistic images given the images of a clothed person and a target clothing item. |
1050 | Boundless: Generative Adversarial Networks for Image Extension | Piotr Teterwak, Aaron Sarna, Dilip Krishnan, Aaron Maschinot, David Belanger, Ce Liu, William T. Freeman | We introduce semantic conditioning to the discriminator of a generative adversarial network (GAN), and achieve strong results on image extension with coherent semantics and visually pleasing colors and textures. |
1051 | Image Synthesis From Reconfigurable Layout and Style | Wei Sun, Tianfu Wu | In this paper, we present a layout- and style-based architecture for generative adversarial networks (termed LostGANs) that can be trained end-to-end to generate images from reconfigurable layout and style. |
1052 | Attribute Manipulation Generative Adversarial Networks for Fashion Images | Kenan E. Ak, Joo Hwee Lim, Jo Yew Tham, Ashraf A. Kassim | To address this and other limitations, we introduce Attribute Manipulation Generative Adversarial Networks (AMGAN) for fashion images. |
1053 | Few-Shot Unsupervised Image-to-Image Translation | Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, Jan Kautz | Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, we seek a few-shot, unsupervised image-to-image translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images. |
1054 | Very Long Natural Scenery Image Prediction by Outpainting | Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, Shuicheng Yan | To solve the two problems, we devise some innovative modules, named Skip Horizontal Connection and Recurrent Content Transfer, and integrate them into our designed encoder-decoder structure. |
1055 | Scaling Recurrent Models via Orthogonal Approximations in Tensor Trains | Ronak Mehta, Rudrasis Chakraborty, Yunyang Xiong, Vikas Singh | We describe the “orthogonal” tensor train, and demonstrate its ability to express a standard network layer both theoretically and empirically. |
1056 | A Deep Cybersickness Predictor Based on Brain Signal Analysis for Virtual Reality Contents | Jinwoo Kim, Woojae Kim, Heeseok Oh, Seongmin Lee, Sanghoon Lee | In this paper, we address the above question by developing an electroencephalography (EEG) driven VR cybersickness prediction model. |
1057 | Learning With Unsure Data for Medical Image Diagnosis | Botong Wu, Xinwei Sun, Lingjing Hu, Yizhou Wang | In this paper, we raise the “learning with unsure data” problem, formulate it as an ordinal regression, and propose a unified end-to-end learning framework, which also considers the aforementioned two issues: (i) incorporate cost-sensitive parameters to alleviate the data imbalance problem, and (ii) execute the conservative and aggressive strategies by introducing two parameters in the training procedure. |
1058 | Recursive Cascaded Networks for Unsupervised Medical Image Registration | Shengyu Zhao, Yue Dong, Eric I-Chao Chang, Yan Xu | We present recursive cascaded networks, a general architecture that enables learning deep cascades, for deformable image registration. |
1059 | DUAL-GLOW: Conditional Flow-Based Generative Model for Modality Transfer | Haoliang Sun, Ronak Mehta, Hao H. Zhou, Zhichun Huang, Sterling C. Johnson, Vivek Prabhakaran, Vikas Singh | We present experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset with 826 subjects, and obtain good performance in PET image synthesis, qualitatively and quantitatively better than recent works. |
1060 | Dilated Convolutional Neural Networks for Sequential Manifold-Valued Data | Xingjian Zhen, Rudrasis Chakraborty, Nicholas Vogt, Barbara B. Bendlin, Vikas Singh | Instead of a recurrent model which poses computational/technical issues, and inspired by recent results showing the viability of dilated convolutional models for sequence prediction, we develop a dilated convolutional neural network architecture for this task. |
1061 | Align, Attend and Locate: Chest X-Ray Diagnosis via Contrast Induced Attention Network With Limited Supervision | Jingyu Liu, Gangming Zhao, Yu Fei, Ming Zhang, Yizhou Wang, Yizhou Yu | In this paper, we propose a Contrast Induced Attention Network (CIA-Net), which exploits the highly structured property of chest X-ray images and localizes diseases via contrastive learning on the aligned positive and negative samples. |
1062 | Joint Acne Image Grading and Counting via Label Distribution Learning | Xiaoping Wu, Ni Wen, Jie Liang, Yu-Kun Lai, Dongyu She, Ming-Ming Cheng, Jufeng Yang | In this paper, we address the problem of acne image analysis via Label Distribution Learning (LDL) considering the ambiguous information among acne severity. In addition, we further build the ACNE04 dataset with annotations of acne severity and lesion number of each image for evaluation. |
1063 | An Alarm System for Segmentation Algorithm Based on Shape Model | Fengze Liu, Yingda Xia, Dong Yang, Alan L. Yuille, Daguang Xu | Motivated by this, in this paper, we learn a feature space using the shape information which is a strong prior shared among different datasets and robust to the appearance variation of input data. |
1064 | HistoSegNet: Semantic Segmentation of Histological Tissue Type in Whole Slide Images | Lyndon Chan, Mahdi S. Hosseini, Corwyn Rowsell, Konstantinos N. Plataniotis, Savvas Damaskinos | In this paper, we propose HistoSegNet, a method for semantic segmentation of histological tissue type (HTT). |
1065 | Prior-Aware Neural Network for Partially-Supervised Multi-Organ Segmentation | Yuyin Zhou, Zhe Li, Song Bai, Chong Wang, Xinlei Chen, Mei Han, Elliot Fishman, Alan L. Yuille | To address the background ambiguity in these partially-labeled datasets, we propose Prior-aware Neural Network (PaNN) via explicitly incorporating anatomical priors on abdominal organ sizes, guiding the training process with domain-specific knowledge. |
1066 | CAMEL: A Weakly Supervised Learning Framework for Histopathology Image Segmentation | Gang Xu, Zhigang Song, Zhuo Sun, Calvin Ku, Zhe Yang, Cancheng Liu, Shuhao Wang, Jianpeng Ma, Wei Xu | In this research, we propose CAMEL, a weakly supervised learning framework for histopathology image segmentation using only image-level labels. |
1067 | Conditional Recurrent Flow: Conditional Generation of Longitudinal Samples With Applications to Neuroimaging | Seong Jae Hwang, Zirui Tao, Won Hwa Kim, Vikas Singh | We develop a conditional generative model for longitudinal image datasets based on sequential invertible neural networks. |
1068 | Multi-Stage Pathological Image Classification Using Semantic Segmentation | Shusuke Takahama, Yusuke Kurose, Yusuke Mukuta, Hiroyuki Abe, Masashi Fukayama, Akihiko Yoshizawa, Masanobu Kitagawa, Tatsuya Harada | In this paper, we propose a new model structure combining the patch-based classification model and whole slide-scale segmentation model in order to improve the prediction performance of automatic pathological diagnosis. |
1069 | Semantic-Transferable Weakly-Supervised Endoscopic Lesions Segmentation | Jiahua Dong, Yang Cong, Gan Sun, Dongdong Hou | To better utilize these dependencies, we present a new semantic lesions representation transfer model for weakly-supervised endoscopic lesions segmentation, which can exploit useful knowledge from relevant fully-labeled diseases segmentation task to enhance the performance of target weakly-labeled lesions segmentation task. Finally, we build a new medical endoscopic dataset with 3659 images collected from more than 1100 volunteers. |
1070 | Unsupervised Microvascular Image Segmentation Using an Active Contours Mimicking Neural Network | Shir Gur, Lior Wolf, Lior Golgher, Pablo Blinder | We present a novel deep learning method for unsupervised segmentation of blood vessels. |
1071 | GLAMpoints: Greedily Learned Accurate Match Points | Prune Truong, Stefanos Apostolopoulos, Agata Mosinska, Samuel Stucky, Carlos Ciller, Sandro De Zanet | We introduce a novel CNN-based feature point detector – Greedily Learned Accurate Match Points (GLAMpoints) – learned in a semi-supervised manner. |
1072 | Adversarial Robustness vs. Model Compression, or Both? | Shaokai Ye, Kaidi Xu, Sijia Liu, Hao Cheng, Jan-Henrik Lambrechts, Huan Zhang, Aojun Zhou, Kaisheng Ma, Yanzhi Wang, Xue Lin | This paper proposes a framework of concurrent adversarial training and weight pruning that enables model compression while still preserving the adversarial robustness, and essentially tackles the dilemma of adversarial training. |
1073 | MONET: Multiview Semi-Supervised Keypoint Detection via Epipolar Divergence | Yuan Yao, Yasamin Jafarian, Hyun Soo Park | This paper presents MONET, an end-to-end semi-supervised learning framework for a keypoint detector using multiview image streams. |
1074 | Key.Net: Keypoint Detection by Handcrafted and Learned CNN Filters | Axel Barroso-Laguna, Edgar Riba, Daniel Ponsa, Krystian Mikolajczyk | We introduce a novel approach for keypoint detection task that combines handcrafted and learned CNN filters within a shallow multi-scale architecture. |
1075 | Miss Detection vs. False Alarm: Adversarial Learning for Small Object Segmentation in Infrared Images | Huan Wang, Luping Zhou, Lei Wang | In this paper, we propose a deep adversarial learning framework to improve this situation. |