Paper Digest: ICCV 2019 Highlights
Download ICCV-2019-Paper-Digests.pdf – highlights of all 1,075 ICCV-2019 papers.
The International Conference on Computer Vision (ICCV) is one of the top computer vision conferences in the world. In 2019 it was held in Seoul, Korea. There were more than 4,300 paper submissions, of which 1,075 were accepted. More than 100 papers also published their code.
To help the community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to receive new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: ICCV 2019 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | FaceForensics++: Learning to Detect Manipulated Facial Images | Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Niessner | To standardize the evaluation of detection methods, we propose an automated benchmark for facial manipulation detection. |
2 | DeepVCP: An End-to-End Deep Neural Network for Point Cloud Registration | Weixin Lu, Guowei Wan, Yao Zhou, Xiangyu Fu, Pengfei Yuan, Shiyu Song | We present DeepVCP – a novel end-to-end learning-based 3D point cloud registration framework that achieves comparable registration accuracy to prior state-of-the-art geometric methods. |
3 | Shape Reconstruction Using Differentiable Projections and Deep Priors | Matheus Gadelha, Rui Wang, Subhransu Maji | We investigate the problem of reconstructing shapes from noisy and incomplete projections in the presence of viewpoint uncertainties. |
4 | Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization | Mans Larsson, Erik Stenborg, Carl Toft, Lars Hammarstrand, Torsten Sattler, Fredrik Kahl | In this paper, we propose a novel neural network, the Fine-Grained Segmentation Network (FGSN), that can be used to provide image segmentations with a larger number of labels and can be trained in a self-supervised fashion. |
5 | SANet: Scene Agnostic Network for Camera Localization | Luwei Yang, Ziqian Bai, Chengzhou Tang, Honghua Li, Yasutaka Furukawa, Ping Tan | This paper presents a scene agnostic neural architecture for camera localization, where model parameters and scenes are independent from each other. Despite recent advancements in learning-based methods, most approaches require training for each scene one by one, which is not applicable for online applications such as SLAM and robotic navigation, where a model must be built on-the-fly. |
6 | Total Denoising: Unsupervised Learning of 3D Point Cloud Cleaning | Pedro Hermosilla, Tobias Ritschel, Timo Ropinski | To overcome this, and to enable effective and unsupervised 3D point cloud denoising, we introduce a spatial prior term that steers convergence to the unique closest mode out of the many possible modes on the manifold. |
7 | Hierarchical Self-Attention Network for Action Localization in Videos | Rizard Renanda Adhi Pramono, Yie-Tarng Chen, Wen-Hsien Fang | This paper presents a novel Hierarchical Self-Attention Network (HISAN) to generate spatial-temporal tubes for action localization in videos. |
8 | Goal-Driven Sequential Data Abstraction | Umar Riaz Muhammad, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song | In this paper we study a general reinforcement learning based framework for learning to abstract sequential data in a goal-driven way. |
9 | Jointly Aligning Millions of Images With Deep Penalised Reconstruction Congealing | Roberto Annunziata, Christos Sagonas, Jacques Cali | To overcome these limitations, we propose an unsupervised joint alignment method leveraging a densely fused spatial transformer network to estimate the warping parameters for each image and a low-capacity auto-encoder whose reconstruction error is used as an auxiliary measure of joint alignment. |
10 | Drop to Adapt: Learning Discriminative Features for Unsupervised Domain Adaptation | Seungmin Lee, Dongwan Kim, Namil Kim, Seong-Gyun Jeong | We propose Drop to Adapt (DTA), which leverages adversarial dropout to learn strongly discriminative features by enforcing the cluster assumption. |
11 | NLNL: Negative Learning for Noisy Labels | Youngdong Kim, Junho Yim, Juseung Yun, Junmo Kim | To address this issue, we start with an indirect learning method called Negative Learning (NL), in which the CNNs are trained using a complementary label as in “input image does not belong to this complementary label.” |
12 | On the Design of Black-Box Adversarial Examples by Leveraging Gradient-Free Optimization and Operator Splitting Method | Pu Zhao, Sijia Liu, Pin-Yu Chen, Nghia Hoang, Kaidi Xu, Bhavya Kailkhura, Xue Lin | To push for further advances in this field, we introduce a general framework based on an operator splitting method, the alternating direction method of multipliers (ADMM) to devise efficient, robust black-box attacks that work with various distortion metrics and feedback settings without incurring high query complexity. |
13 | DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks | Sagnik Das, Ke Ma, Zhixin Shu, Dimitris Samaras, Roy Shilkrot | In this work, we propose DewarpNet, a deep-learning approach for document image unwarping from a single image. |
14 | Learning Robust Facial Landmark Detection via Hierarchical Structured Ensemble | Xu Zou, Sheng Zhong, Luxin Yan, Xiangyun Zhao, Jiahuan Zhou, Ying Wu | In this paper, we propose a novel Hierarchical Structured Landmark Ensemble (HSLE) model for learning robust facial landmark detection, by using it as the structural constraints. |
15 | Remote Heart Rate Measurement From Highly Compressed Facial Videos: An End-to-End Deep Learning Solution With Video Enhancement | Zitong Yu, Wei Peng, Xiaobai Li, Xiaopeng Hong, Guoying Zhao | Here we propose a two-stage, end-to-end method using hidden rPPG information enhancement and attention networks, which is the first attempt to counter video compression loss and recover rPPG signals from highly compressed videos. |
16 | Face-to-Parameter Translation for Game Character Auto-Creation | Tianyang Shi, Yi Yuan, Changjie Fan, Zhengxia Zou, Zhenwei Shi, Yong Liu | This paper proposes a method for automatically creating in-game characters of players according to an input face photo. |
17 | Visual Deprojection: Probabilistic Recovery of Collapsed Dimensions | Guha Balakrishnan, Adrian V. Dalca, Amy Zhao, John V. Guttag, Fredo Durand, William T. Freeman | We introduce visual deprojection: the task of recovering an image or video that has been collapsed along a dimension. |
18 | StructureFlow: Image Inpainting via Structure-Aware Appearance Flow | Yurui Ren, Xiaoming Yu, Ruonan Zhang, Thomas H. Li, Shan Liu, Ge Li | In order to solve this problem, in this paper, we propose a two-stage model which splits the inpainting task into two parts: structure reconstruction and texture generation. |
19 | Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization | Md Mahfuzur Rahman Siddiquee, Zongwei Zhou, Nima Tajbakhsh, Ruibin Feng, Michael B. Gotway, Yoshua Bengio, Jianming Liang | Therefore, we propose a new GAN, called Fixed-Point GAN, trained by (1) supervising same-domain translation through a conditional identity loss, and (2) regularizing cross-domain translation through revised adversarial, domain classification, and cycle consistency loss. |
20 | Generative Adversarial Training for Weakly Supervised Cloud Matting | Zhengxia Zou, Wenyuan Li, Tianyang Shi, Zhenwei Shi, Jieping Ye | We re-examine the cloud detection under a totally different point of view, i.e. to formulate it as a mixed energy separation process between foreground and background images, which can be equivalently implemented under an image matting paradigm with a clear physical significance. |
21 | PAMTRI: Pose-Aware Multi-Task Learning for Vehicle Re-Identification Using Highly Randomized Synthetic Data | Zheng Tang, Milind Naphade, Stan Birchfield, Jonathan Tremblay, William Hodge, Ratnesh Kumar, Shuo Wang, Xiaodong Yang | To address these challenges, we propose a Pose-Aware Multi-Task Re-Identification (PAMTRI) framework. |
22 | Generative Adversarial Networks for Extreme Learned Image Compression | Eirikur Agustsson, Michael Tschannen, Fabian Mentzer, Radu Timofte, Luc Van Gool | We present a learned image compression system based on GANs, operating at extremely low bitrates. |
23 | Instance-Guided Context Rendering for Cross-Domain Person Re-Identification | Yanbei Chen, Xiatian Zhu, Shaogang Gong | To tackle this limitation, we propose a novel Instance-Guided Context Rendering scheme, which transfers the source person identities into diverse target domain contexts to enable supervised re-id model learning in the unlabelled target domain. |
24 | What Else Can Fool Deep Learning? Addressing Color Constancy Errors on Deep Neural Network Performance | Mahmoud Afifi, Michael S. Brown | To address this problem, a novel augmentation method is proposed that can emulate accurate color constancy degradation. |
25 | Beyond Cartesian Representations for Local Descriptors | Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, Eduard Trulls | By contrast, we propose to extract the “support region” directly with a log-polar sampling scheme. |
26 | Distilling Knowledge From a Deep Pose Regressor Network | Muhamad Risqi U. Saputra, Pedro P. B. de Gusmao, Yasin Almalioglu, Andrew Markham, Niki Trigoni | This paper presents a novel method to distill knowledge from a deep pose regressor network for efficient Visual Odometry (VO). |
27 | Instance-Level Future Motion Estimation in a Single Image Based on Ordinal Regression | Kyung-Rae Kim, Whan Choi, Yeong Jun Koh, Seong-Gyun Jeong, Chang-Su Kim | A novel algorithm to estimate instance-level future motion in a single image is proposed in this paper. |
28 | Vision-Infused Deep Audio Inpainting | Hang Zhou, Ziwei Liu, Xudong Xu, Ping Luo, Xiaogang Wang | In this work, we consider a new task of visual information-infused audio inpainting, i.e., synthesizing missing audio segments that correspond to their accompanying videos. To facilitate a large-scale study, we collect a new multi-modality instrument-playing dataset called MUSIC-Extra-Solo (MUSICES) by enriching MUSIC dataset. |
29 | HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision | Zhen Dong, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer | Here, we introduce Hessian AWare Quantization (HAWQ), a novel second-order quantization method to address these problems. |
30 | Evaluating Robustness of Deep Image Super-Resolution Against Adversarial Attacks | Jun-Ho Choi, Huan Zhang, Jun-Hyuk Kim, Cho-Jui Hsieh, Jong-Seok Lee | This paper investigates the robustness of deep learning-based super-resolution methods against adversarial attacks, which can significantly deteriorate the super-resolved images without noticeable distortion in the attacked low-resolution images. |
31 | Overcoming Catastrophic Forgetting With Unlabeled Data in the Wild | Kibok Lee, Kimin Lee, Jinwoo Shin, Honglak Lee | To alleviate this effect, we propose to leverage a large stream of unlabeled data easily obtainable in the wild. |
32 | Symmetric Cross Entropy for Robust Learning With Noisy Labels | Yisen Wang, Xingjun Ma, Zaiyi Chen, Yuan Luo, Jinfeng Yi, James Bailey | Inspired by the symmetric KL-divergence, we propose the approach of Symmetric cross entropy Learning (SL), boosting CE symmetrically with a noise robust counterpart Reverse Cross Entropy (RCE). |
33 | Few-Shot Learning With Embedded Class Models and Shot-Free Meta Training | Avinash Ravichandran, Rahul Bhotika, Stefano Soatto | We propose a method for learning embeddings for few-shot learning that is suitable for use with any number of shots (shot-free). |
34 | Dual Directed Capsule Network for Very Low Resolution Image Recognition | Maneet Singh, Shruti Nagpal, Richa Singh, Mayank Vatsa | This research presents a novel Dual Directed Capsule Network model, termed as DirectCapsNet, for addressing VLR digit and face recognition. |
35 | Recognizing Part Attributes With Insufficient Data | Xiangyun Zhao, Yi Yang, Feng Zhou, Xiao Tan, Yuchen Yuan, Yingze Bao, Ying Wu | In order to solve the data insufficiency problem and get rid of dependence on the part annotation, we introduce a novel Concept Sharing Network (CSN) for part attribute recognition. |
36 | USIP: Unsupervised Stable Interest Point Detection From 3D Point Clouds | Jiaxin Li, Gim Hee Lee | In this paper, we propose the USIP detector: an Unsupervised Stable Interest Point detector that can detect highly repeatable and accurately localized keypoints from 3D point clouds under arbitrary transformations without the need for any ground truth training data. |
37 | Mixed High-Order Attention Network for Person Re-Identification | Binghui Chen, Weihong Deng, Jiani Hu | However, state-of-the-art works concentrate only on coarse or first-order attention design, e.g. spatial and channel attention, while rarely exploring higher-order attention mechanisms. We take a step towards addressing this problem. |
38 | Budget-Aware Adapters for Multi-Domain Learning | Rodrigo Berriel, Stephane Lathuillere, Moin Nabi, Tassilo Klein, Thiago Oliveira-Santos, Nicu Sebe, Elisa Ricci | To implement this idea we derive specialized deep models for each domain by adapting a pre-trained architecture but, differently from other methods, we propose a novel strategy to automatically adjust the computational complexity of the network. |
39 | Compact Trilinear Interaction for Visual Question Answering | Tuong Do, Thanh-Toan Do, Huy Tran, Erman Tjiputra, Quang D. Tran | Thus, to selectively utilize image, question and answer information, we propose a novel trilinear interaction model which simultaneously learns high level associations between these three inputs. |
40 | Towards Latent Attribute Discovery From Triplet Similarities | Ishan Nigam, Pavel Tokmakov, Deva Ramanan | We introduce Latent Similarity Networks (LSNs): a simple and effective technique to discover the underlying latent notions of similarity in data without any explicit attribute supervision. |
41 | GeoStyle: Discovering Fashion Trends and Events | Utkarsh Mall, Kevin Matzen, Bharath Hariharan, Noah Snavely, Kavita Bala | In this paper we address this need by providing an automatic framework that analyzes large corpora of street imagery to (a) discover and forecast long-term trends of various fashion attributes as well as automatically discovered styles, and (b) identify spatio-temporally localized events that affect what people wear. |
42 | Towards Adversarially Robust Object Detection | Haichao Zhang, Jianyu Wang | In this work, we take an initial attempt towards this direction. |
43 | Automatic and Robust Skull Registration Based on Discrete Uniformization | Junli Zhao, Xin Qi, Chengfeng Wen, Na Lei, Xianfeng Gu | In this work, we propose an automatic skull registration method based on the discrete uniformization theory, which can handle complicated topologies and is robust to low quality meshes. |
44 | Few-Shot Image Recognition With Knowledge Transfer | Zhimao Peng, Zechao Li, Junge Zhang, Yan Li, Guo-Jun Qi, Jinhui Tang | Inspired from this, we propose a novel Knowledge Transfer Network architecture (KTN) for few-shot image recognition. |
45 | Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings | Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen | In this paper, we propose to enrich the embedding by disentangling parts-of-speech (PoS) in the accompanying captions. |
46 | Vehicle Re-Identification in Aerial Imagery: Dataset and Approach | Peng Wang, Bingliang Jiao, Lu Yang, Yifei Yang, Shizhou Zhang, Wei Wei, Yanning Zhang | In this work, we construct a large-scale dataset for vehicle re-identification (ReID), which contains 137k images of 13k vehicle instances captured by UAV-mounted cameras. |
47 | Bridging the Domain Gap for Ground-to-Aerial Image Matching | Krishna Regmi, Mubarak Shah | We propose a novel method for solving this task by exploiting the generative powers of conditional GANs to synthesize an aerial representation of a ground-level panorama query and use it to minimize the domain gap between the two views. |
48 | A Robust Learning Approach to Domain Adaptive Object Detection | Mehran Khodabandeh, Arash Vahdat, Mani Ranjbar, William G. Macready | In this paper, we address the domain adaptation problem from the perspective of robust learning and show that the problem may be formulated as training with noisy labels. |
49 | Graph-Based Object Classification for Neuromorphic Vision Sensing | Yin Bi, Aaron Chadha, Alhabib Abbas, Eirina Bourtsoulatze, Yiannis Andreopoulos | To circumvent this mismatch between sensing and processing with CNNs, we propose a compact graph representation for NVS. |
50 | Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving | Jiwoong Choi, Dayoung Chun, Hyun Kim, Hyuk-Jae Lee | This paper proposes a method for improving the detection accuracy while supporting a real-time operation by modeling the bounding box (bbox) of YOLOv3, which is the most representative of one-stage detectors, with a Gaussian parameter and redesigning the loss function. |
51 | Sharpen Focus: Learning With Attention Separability and Consistency | Lezi Wang, Ziyan Wu, Srikrishna Karanam, Kuan-Chuan Peng, Rajat Vikram Singh, Bo Liu, Dimitris N. Metaxas | In this paper, we address this problem by means of a new framework that makes class-discriminative attention a principled part of the learning process. |
52 | Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition | Tianshui Chen, Muxin Xu, Xiaolu Hui, Hefeng Wu, Liang Lin | To address these issues, we propose a Semantic-Specific Graph Representation Learning (SSGRL) framework that consists of two crucial modules: 1) a semantic decoupling module that incorporates category semantics to guide learning semantic-specific representations and 2) a semantic interaction module that correlates these representations with a graph built on the statistical label co-occurrence and explores their interactions via a graph propagation mechanism. |
53 | DeceptionNet: Network-Driven Domain Randomization | Sergey Zakharov, Wadim Kehl, Slobodan Ilic | We present a novel approach to tackle domain adaptation between synthetic and real data. |
54 | Pose-Guided Feature Alignment for Occluded Person Re-Identification | Jiaxu Miao, Yu Wu, Ping Liu, Yuhang Ding, Yi Yang | In this paper, we introduce a novel method named Pose-Guided Feature Alignment (PGFA), exploiting pose landmarks to disentangle the useful information from the occlusion noise. Besides, we construct a large-scale dataset for the Occluded Person Re-ID problem, namely Occluded-DukeMTMC, which is by far the largest dataset for occluded person Re-ID. |
55 | Robust Person Re-Identification by Modelling Feature Uncertainty | Tianyuan Yu, Da Li, Yongxin Yang, Timothy M. Hospedales, Tao Xiang | In this paper, we propose a novel deep network termed DistributionNet for robust ReID. |
56 | Co-Segmentation Inspired Attention Networks for Video-Based Person Re-Identification | Arulkumar Subramaniam, Athira Nambiar, Anurag Mittal | In this work, we propose a novel Co-segmentation inspired video Re-ID deep architecture and formulate a Co-segmentation based Attention Module (COSAM) that activates a common set of salient features across multiple frames of a video via mutual consensus in an unsupervised manner. |
57 | A Delay Metric for Video Object Detection: What Average Precision Fails to Tell | Huizi Mao, Xiaodong Yang, William J. Dally | In this paper, we analyze the object detection from video and point out that mAP alone is not sufficient to capture the temporal nature of video object detection. |
58 | IL2M: Class Incremental Learning With Dual Memory | Eden Belouadah, Adrian Popescu | This paper presents a class incremental learning (IL) method which exploits fine tuning and a dual memory to reduce the negative effect of catastrophic forgetting in image recognition. |
59 | Asymmetric Non-Local Neural Networks for Semantic Segmentation | Zhen Zhu, Mengde Xu, Song Bai, Tengteng Huang, Xiang Bai | In this paper, we present an Asymmetric Non-local Neural Network for semantic segmentation, which has two prominent components: Asymmetric Pyramid Non-local Block (APNB) and Asymmetric Fusion Non-local Block (AFNB). |
60 | CCNet: Criss-Cross Attention for Semantic Segmentation | Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, Wenyu Liu | In this work, we propose a Criss-Cross Network (CCNet) for obtaining such contextual information in a more effective and efficient way. |
61 | Convex Shape Prior for Multi-Object Segmentation Using a Single Level Set Function | Shousheng Luo, Xue-Cheng Tai, Limei Huo, Yang Wang, Roland Glowinski | This paper proposes a method to incorporate a convex shape prior for multi-object segmentation using the level set method. |
62 | Feature Weighting and Boosting for Few-Shot Segmentation | Khoi Nguyen, Sinisa Todorovic | We make two contributions by: (1) Improving discriminativeness of features so their activations are high on the foreground and low elsewhere; and (2) Boosting inference with an ensemble of experts guided with the gradient of loss incurred when segmenting the support images in testing. |
63 | Surface Networks via General Covers | Niv Haim, Nimrod Segol, Heli Ben-Hamu, Haggai Maron, Yaron Lipman | This paper tackles the problem of sphere-type surface learning by developing a novel surface-to-image representation. |
64 | SSAP: Single-Shot Instance Segmentation With Affinity Pyramid | Naiyu Gao, Yanhu Shan, Yupei Wang, Xin Zhao, Yinan Yu, Ming Yang, Kaiqi Huang | To this end, this work proposes a single-shot proposal-free instance segmentation method that requires only one single pass for prediction. |
65 | Learning Propagation for Arbitrarily-Structured Data | Sifei Liu, Xueting Li, Varun Jampani, Shalini De Mello, Jan Kautz | In this paper, we propose to learn pairwise relations among data points in a global fashion to improve semantic segmentation with arbitrarily-structured data, through spatial generalized propagation networks (SGPN). |
66 | MultiSeg: Semantically Meaningful, Scale-Diverse Segmentations From Minimal User Input | Jun Hao Liew, Scott Cohen, Brian Price, Long Mai, Sim-Heng Ong, Jiashi Feng | Motivated by the observation that the object part, full object, and a collection of objects essentially differ in size, we propose a new concept called scale-diversity, which characterizes the spectrum of segmentations w.r.t. different scales. |
67 | Robust Motion Segmentation From Pairwise Matches | Federica Arrigoni, Tomas Pajdla | In this paper we consider the problem of motion segmentation, where only pairwise correspondences are assumed as input without prior knowledge about tracks. |
68 | InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting | Hao-Shu Fang, Jianhua Sun, Runzhong Wang, Minghao Gou, Yong-Lu Li, Cewu Lu | In this paper, we present a simple, efficient and effective method to augment the training set using the existing instance mask annotations. |
69 | Racial Faces in the Wild: Reducing Racial Bias by Information Maximization Adaptation Network | Mei Wang, Weihong Deng, Jiani Hu, Xunqiang Tao, Yaohai Huang | This unsupervised method simultaneously aligns global distribution to decrease race gap at domain-level, and learns the discriminative target representations at cluster level. |
70 | Uncertainty Modeling of Contextual-Connections Between Tracklets for Unconstrained Video-Based Face Recognition | Jingxiao Zheng, Ruichi Yu, Jun-Cheng Chen, Boyu Lu, Carlos D. Castillo, Rama Chellappa | In this paper, we propose the Uncertainty-Gated Graph (UGG), which conducts graph-based identity propagation between tracklets, which are represented by nodes in a graph. |
71 | Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading | Xingxuan Zhang, Feng Cheng, Shilin Wang | To well solve these drawbacks, we propose a Temporal Focal block to sufficiently describe short-range dependencies and a Spatio-Temporal Fusion Module (STFM) to maintain the local spatial information and to reduce the feature dimensions as well. |
72 | Occlusion-Aware Networks for 3D Human Pose Estimation in Video | Yu Cheng, Bo Yang, Bo Wang, Wending Yan, Robby T. Tan | To address this problem, we introduce an occlusion-aware deep-learning framework. |
73 | Context-Aware Feature and Label Fusion for Facial Action Unit Intensity Estimation With Partially Labeled Data | Yong Zhang, Haiyong Jiang, Baoyuan Wu, Yanbo Fan, Qiang Ji | In this paper, we propose a novel weakly supervised patch-based deep model on basis of two types of attention mechanisms for joint intensity estimation of multiple AUs. |
74 | Distill Knowledge From NRSfM for Weakly Supervised 3D Pose Learning | Chaoyang Wang, Chen Kong, Simon Lucey | We propose to learn a 3D pose estimator by distilling knowledge from Non-Rigid Structure from Motion (NRSfM). |
75 | MONET: Multiview Semi-Supervised Keypoint Detection via Epipolar Divergence | Yuan Yao, Yasamin Jafarian, Hyun Soo Park | This paper presents MONET—an end-to-end semi-supervised learning framework for a keypoint detector using multiview image streams. |
76 | Occlusion Robust Face Recognition Based on Mask Learning With Pairwise Differential Siamese Network | Lingxue Song, Dihong Gong, Zhifeng Li, Changsong Liu, Wei Liu | Inspired by the fact that human visual system explicitly ignores the occlusion and only focuses on the non-occluded facial areas, we propose a mask learning strategy to find and discard corrupted feature elements from recognition. |
77 | Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection | Xuanyi Dong, Yi Yang | In this paper, we study facial landmark detection from partially labeled facial images. |
78 | A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image | Fu Xiong, Boshen Zhang, Yang Xiao, Zhiguo Cao, Taidong Yu, Joey Tianyi Zhou, Junsong Yuan | For 3D hand and body pose estimation task in depth image, a novel anchor-based approach termed Anchor-to-Joint regression network (A2J) with the end-to-end learning ability is proposed. |
79 | TexturePose: Supervising Human Mesh Estimation With Texture Consistency | Georgios Pavlakos, Nikos Kolotouros, Kostas Daniilidis | In this work, we advocate that there are more cues we can leverage, which are available for free in natural images, i.e., without getting more annotations, or modifying the network architecture. |
80 | FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape From Single RGB Images | Christian Zimmermann, Duygu Ceylan, Jimei Yang, Bryan Russell, Max Argus, Thomas Brox | In this paper, we analyze cross-dataset generalization when training on existing datasets. As a consequence, we introduce the first large-scale, multi-view hand dataset that is accompanied by both 3D hand pose and shape annotations. |
81 | Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles | Nitin Saini, Eric Price, Rahul Tallamraju, Raffi Enficiaud, Roman Ludwig, Igor Martinovic, Aamir Ahmad, Michael J. Black | To make motion capture truly unconstrained, we describe the first fully autonomous outdoor capture system based on flying vehicles. |
82 | Toyota Smarthome: Real-World Activities of Daily Living | Srijan Das, Rui Dai, Michal Koperski, Luca Minciullo, Lorenzo Garattoni, Francois Bremond, Gianpiero Francesca | In this paper, we introduce a large real-world video dataset for activities of daily living: Toyota Smarthome. We release the dataset for research use. |
83 | Relation Parsing Neural Network for Human-Object Interaction Detection | Penghao Zhou, Mingmin Chi | In this paper, we propose a novel model, i.e., Relation Parsing Neural Network (RPNN), to detect human-object interactions. |
84 | DistInit: Learning Video Representations Without a Single Labeled Video | Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan | In this work we propose an alternative approach to learning video representations that requires no semantically labeled videos, and instead leverages the years of effort in collecting and labeling large and clean still-image datasets. |
85 | Zero-Shot Anticipation for Instructional Activities | Fadime Sener, Angela Yao | We address the problem of zero-shot anticipation by presenting a hierarchical model that generalizes instructional knowledge from large-scale text-corpora and transfers the knowledge to the visual domain. To demonstrate the anticipation capabilities of our model, we introduce the Tasty Videos dataset, a collection of 2511 recipes for zero-shot learning, recognition and anticipation. |
86 | Making the Invisible Visible: Action Recognition Through Walls and Occlusions | Tianhong Li, Lijie Fan, Mingmin Zhao, Yingcheng Liu, Dina Katabi | In this paper, we introduce a neural network model that can detect human actions through walls and occlusions, and in poor lighting conditions. |
87 | Recursive Visual Sound Separation Using Minus-Plus Net | Xudong Xu, Bo Dai, Dahua Lin | In this paper we propose a novel framework, referred to as MinusPlus Network (MP-Net), for the task of visual sound separation. |
88 | Unsupervised Video Interpolation Using Cycle Consistency | Fitsum A. Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro | Here, we propose unsupervised techniques to synthesize high frame rate videos directly from low frame rate videos using cycle consistency. |
89 | Deformable Surface Tracking by Graph Matching | Tao Wang, Haibin Ling, Congyan Lang, Songhe Feng, Xiaohui Hou | Specifically, we propose a graph-based approach that effectively explores the structure information of the surface to enhance tracking performance. |
90 | Deep Meta Learning for Real-Time Target-Aware Visual Tracking | Janghoon Choi, Junseok Kwon, Kyoung Mu Lee | In this paper, we propose a novel on-line visual tracking framework based on the Siamese matching network and meta-learner network, which run at real-time speeds. |
91 | Looking to Relations for Future Trajectory Forecast | Chiho Choi, Behzad Dariush | To this end, we propose a relation-aware framework for future trajectory forecast. |
92 | Anchor Diffusion for Unsupervised Video Object Segmentation | Zhao Yang, Qiang Wang, Luca Bertinetto, Weiming Hu, Song Bai, Philip H. S. Torr | Inspired by the non-local operators, we introduce a technique to establish dense correspondences between pixel embeddings of a reference “anchor” frame and the current one. |
93 | Tracking Without Bells and Whistles | Philipp Bergmann, Tim Meinhardt, Laura Leal-Taixe | We present a tracker (without bells and whistles) that accomplishes tracking without specifically targeting any of these tasks, in particular, we perform no training or optimization on tracking data. |
94 | Perspective-Guided Convolution Networks for Crowd Counting | Zhaoyi Yan, Yuchen Yuan, Wangmeng Zuo, Xiao Tan, Yezhen Wang, Shilei Wen, Errui Ding | In this paper, we propose a novel perspective-guided convolution (PGC) for convolutional neural network (CNN) based crowd counting (i.e. PGCNet), which aims to overcome the dramatic intra-scene scale variations of people due to the perspective effect. |
95 | End-to-End Wireframe Parsing | Yichao Zhou, Haozhi Qi, Yi Ma | We present a conceptually simple yet effective algorithm to detect wireframes in a given image. |
96 | Incremental Class Discovery for Semantic Segmentation With RGBD Sensing | Yoshikatsu Nakajima, Byeongkeun Kang, Hideo Saito, Kris Kitani | Towards a more open world approach, we propose a novel method that incrementally learns new classes for image segmentation. |
97 | SSF-DAN: Separated Semantic Feature Based Domain Adaptation Network for Semantic Segmentation | Liang Du, Jingang Tan, Hongye Yang, Jianfeng Feng, Xiangyang Xue, Qibao Zheng, Xiaoqing Ye, Xiaolin Zhang | In this work, we propose a Separated Semantic Feature based domain adaptation network, named SSF-DAN, for semantic segmentation. |
98 | SpaceNet MVOI: A Multi-View Overhead Imagery Dataset | Nicholas Weir, David Lindenbaum, Alexei Bastidas, Adam Van Etten, Sean McPherson, Jacob Shermeyer, Varun Kumar, Hanlin Tang | To address this problem, we present an open source Multi-View Overhead Imagery dataset, termed SpaceNet MVOI, with 27 unique looks from a broad range of viewing angles (-32.5 degrees to 54.0 degrees). |
99 | Multi-Level Bottom-Top and Top-Bottom Feature Fusion for Crowd Counting | Vishwanath A. Sindagi, Vishal M. Patel | Specifically, we present a network that involves: (i) a multi-level bottom-top and top-bottom fusion (MBTTBF) method to combine information from shallower to deeper layers and vice versa at multiple levels, (ii) scale complementary feature extraction blocks (SCFB) involving cross-scale residual functions to explicitly enable flow of complementary features from adjacent conv layers along the fusion paths. |
100 | Learning Lightweight Lane Detection CNNs by Self Attention Distillation | Yuenan Hou, Zheng Ma, Chunxiao Liu, Chen Change Loy | In this paper, we present a novel knowledge distillation approach, i.e., Self Attention Distillation (SAD), which allows a model to learn from itself and gains substantial improvement without any additional supervision or labels. |
101 | SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation | Daniel Gordon, Abhishek Kadian, Devi Parikh, Judy Hoffman, Dhruv Batra | We propose SplitNet, a method for decoupling visual perception and policy learning. |
102 | Cascaded Parallel Filtering for Memory-Efficient Image-Based Localization | Wentao Cheng, Weisi Lin, Kan Chen, Xinfeng Zhang | In this work, we propose a cascaded parallel filtering method that leverages the feature, visibility and geometry information to filter wrong matches under binary feature representation. |
103 | Pixel2Mesh++: Multi-View 3D Mesh Generation via Deformation | Chao Wen, Yinda Zhang, Zhuwen Li, Yanwei Fu | We study the problem of shape generation in 3D mesh representation from a few color images with known camera poses. |
104 | A Differential Volumetric Approach to Multi-View Photometric Stereo | Fotios Logothetis, Roberto Mecca, Roberto Cipolla | In this work, we present a volumetric approach to the multi-view photometric stereo problem. |
105 | Revisiting Radial Distortion Absolute Pose | Viktor Larsson, Torsten Sattler, Zuzana Kukelova, Marc Pollefeys | We present a general approach which can handle rational models of arbitrary degree for both distortion and undistortion. |
106 | Estimating the Fundamental Matrix Without Point Correspondences With Application to Transmission Imaging | Tobias Wurfl, Andre Aichert, Nicole Maass, Frank Dennerlein, Andreas Maier | We present a general method to estimate the fundamental matrix from a pair of images under perspective projection without the need for image point correspondences. |
107 | QUARCH: A New Quasi-Affine Reconstruction Stratum From Vague Relative Camera Orientation Knowledge | Devesh Adlakha, Adlane Habed, Fabio Morbidi, Cedric Demonceaux, Michel de Mathelin | We present a new quasi-affine reconstruction of a scene and its application to camera self-calibration. |
108 | Homography From Two Orientation- and Scale-Covariant Features | Daniel Barath, Zuzana Kukelova | This paper proposes a geometric interpretation of the angles and scales which the orientation- and scale-covariant feature detectors, e.g. SIFT, provide. |
109 | Hiding Video in Audio via Reversible Generative Models | Hyukryul Yang, Hao Ouyang, Vladlen Koltun, Qifeng Chen | We present a method for hiding video content inside audio files while preserving the perceptual fidelity of the cover audio. |
110 | GSLAM: A General SLAM Framework and Benchmark | Yong Zhao, Shibiao Xu, Shuhui Bu, Hongkai Jiang, Pengcheng Han | In this paper, we propose a novel SLAM platform named GSLAM, which not only provides evaluation functionality, but also supplies a useful toolkit for researchers to quickly develop their SLAM systems. |
111 | Elaborate Monocular Point and Line SLAM With Robust Initialization | Sang Jun Lee, Sung Soo Hwang | This paper presents a monocular indirect SLAM system which performs robust initialization and accurate localization. |
112 | Adaptive Density Map Generation for Crowd Counting | Jia Wan, Antoni Chan | To address this issue, we first show the impact of different density maps and that better ground-truth density maps can be obtained by refining the existing ones using a learned refinement network, which is jointly trained with the counter. Then, we propose an adaptive density map generator, which takes the annotation dot map as input, and learns a density map representation for a counter. |
113 | Attention-Aware Polarity Sensitive Embedding for Affective Image Retrieval | Xingxu Yao, Dongyu She, Sicheng Zhao, Jie Liang, Yu-Kun Lai, Jufeng Yang | To address the problem, this paper introduces an Attention-aware Polarity Sensitive Embedding (APSE) network to learn affective representations in an end-to-end manner. |
114 | Zero-Shot Emotion Recognition via Affective Structural Embedding | Chi Zhan, Dongyu She, Sicheng Zhao, Ming-Ming Cheng, Jufeng Yang | In this paper, we investigate zero-shot learning (ZSL) problem in the emotion recognition task, which tries to recognize the new unseen emotions. |
115 | FW-GAN: Flow-Navigated Warping GAN for Video Virtual Try-On | Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bowen Wu, Bing-Cheng Chen, Jian Yin | In this work, we propose Flow-navigated Warping Generative Adversarial Network (FW-GAN), a novel framework that learns to synthesize the video of virtual try-on based on a person image, the desired clothes image, and a series of target poses. |
116 | Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation | Arnab Ghosh, Richard Zhang, Puneet K. Dokania, Oliver Wang, Alexei A. Efros, Philip H. S. Torr, Eli Shechtman | In order to use a single model for a wide array of object classes, we introduce a gating-based approach for class conditioning, which allows us to generate distinct classes without feature mixing, from a single generator network. |
117 | Attention-Based Autism Spectrum Disorder Screening With Privileged Modality | Shi Chen, Qi Zhao | This paper presents a novel framework for automatic and quantitative screening of autism spectrum disorder (ASD). |
118 | Image Aesthetic Assessment Based on Pairwise Comparison A Unified Approach to Score Regression, Binary Classification, and Personalization | Jun-Tae Lee, Chang-Su Kim | We propose a unified approach to three tasks of aesthetic score regression, binary aesthetic classification, and personalized aesthetics. |
119 | Delving Into Robust Object Detection From Unmanned Aerial Vehicles: A Deep Nuisance Disentanglement Approach | Zhenyu Wu, Karthik Suresh, Priya Narayanan, Hongyu Xu, Heesung Kwon, Zhangyang Wang | We propose to utilize those free meta-data in conjunction with associated UAV images to learn domain-robust features via an adversarial training framework dubbed Nuisance Disentangled Feature Transform (NDFT), for the specific challenging problem of object detection in UAV images, achieving a substantial gain in robustness to those nuisances. |
120 | Bit-Flip Attack: Crushing Neural Network With Progressive Bit Search | Adnan Siraj Rakin, Zhezhi He, Deliang Fan | In this work, we are the first to propose a novel DNN weight attack methodology called Bit-Flip Attack (BFA) which can crush a neural network through maliciously flipping an extremely small number of bits within its weight storage memory system (i.e., DRAM). |
121 | Pushing the Frontiers of Unconstrained Crowd Counting: New Dataset and Benchmark Method | Vishwanath A. Sindagi, Rajeev Yasarla, Vishal M. Patel | In this work, we propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation. |
122 | Employing Deep Part-Object Relationships for Salient Object Detection | Yi Liu, Qiang Zhang, Dingwen Zhang, Jungong Han | To solve this problem, we dig into part-object relationships and take the unprecedented attempt to employ these relationships endowed by the Capsule Network (CapsNet) for salient object detection. |
123 | Self-Supervised Deep Depth Denoising | Vladimiros Sterzentsenko, Leonidas Saroglou, Anargyros Chatzitofis, Spyridon Thermos, Nikolaos Zioulis, Alexandros Doumanoglou, Dimitrios Zarpalas, Petros Daras | In this paper, we propose a fully convolutional deep autoencoder that learns to denoise depth maps despite the lack of ground truth data. |
124 | Cost-Aware Fine-Grained Recognition for IoTs Based on Sequential Fixations | Hanxiao Wang, Venkatesh Saligrama, Stan Sclaroff, Vitaly Ablavsky | We propose a novel deep reinforcement learning-based foveation model, DRIFT, that sequentially generates and recognizes mixed-acuity images. |
125 | Layout-Induced Video Representation for Recognizing Agent-in-Place Actions | Ruichi Yu, Hongcheng Wang, Ang Li, Jingxiao Zheng, Vlad I. Morariu, Larry S. Davis | We introduce a novel representation to model the geometry and topology of scene layouts so that a network can generalize from the layouts observed in the training scenes to unseen scenes in the test set. |
126 | Anomaly Detection in Video Sequence With Appearance-Motion Correspondence | Trong-Nguyen Nguyen, Jean Meunier | We propose a deep convolutional neural network (CNN) that addresses this problem by learning a correspondence between common object appearances (e.g. pedestrian, background, tree, etc.) and their associated motions. |
127 | Exploring Randomly Wired Neural Networks for Image Recognition | Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He | In this paper, we explore a more diverse set of connectivity patterns through the lens of randomly wired neural networks. |
128 | Progressive Differentiable Architecture Search: Bridging the Depth Gap Between Search and Evaluation | Xin Chen, Lingxi Xie, Jun Wu, Qi Tian | In this paper, we present an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. |
129 | Multinomial Distribution Learning for Effective Neural Architecture Search | Xiawu Zheng, Rongrong Ji, Lang Tang, Baochang Zhang, Jianzhuang Liu, Qi Tian | In this paper, we propose a Multinomial Distribution Learning for extremely effective NAS, which considers the search space as a joint multinomial distribution, i.e., the operation between two nodes is sampled from this distribution, and the optimal network structure is obtained by the operations with the most likely probability in this distribution. |
130 | Searching for MobileNetV3 | Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam | We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. |
131 | Data-Free Quantization Through Weight Equalization and Bias Correction | Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling | We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. |
132 | A Camera That CNNs: Towards Embedded Neural Networks on Pixel Processor Arrays | Laurie Bose, Jianing Chen, Stephen J. Carey, Piotr Dudek, Walterio Mayol-Cuevas | We present a convolutional neural network implementation for pixel processor array (PPA) sensors. |
133 | Knowledge Distillation via Route Constrained Optimization | Xiao Jin, Baoyun Peng, Yichao Wu, Yu Liu, Jiaheng Liu, Ding Liang, Junjie Yan, Xiaolin Hu | In this work, we consider the knowledge distillation from the perspective of curriculum learning by teacher’s routing. |
134 | Distillation-Based Training for Multi-Exit Architectures | Mary Phuong, Christoph H. Lampert | In this work, we propose a new training procedure for multi-exit architectures based on the principle of knowledge distillation. |
135 | Similarity-Preserving Knowledge Distillation | Frederick Tung, Greg Mori | In this paper, we propose a new form of knowledge distillation loss that is inspired by the observation that semantically similar inputs tend to elicit similar activation patterns in a trained network. |
136 | Many Task Learning With Task Routing | Gjorgji Strezoski, Nanne van Noord, Marcel Worring | In this paper, we introduce a method which applies a conditional feature-wise transformation over the convolutional activations that enables a model to successfully perform a large number of tasks. |
137 | Stochastic Filter Groups for Multi-Task CNNs: Learning Specialist and Generalist Convolution Kernels | Felix J.S. Bragman, Ryutaro Tanno, Sebastien Ourselin, Daniel C. Alexander, Jorge Cardoso | In this paper, we present a probabilistic approach to learning task-specific and shared representations in CNNs for multi-task learning. |
138 | Transferability and Hardness of Supervised Classification Tasks | Anh T. Tran, Cuong V. Nguyen, Tal Hassner | We propose a novel approach for estimating the difficulty and transferability of supervised classification tasks. |
139 | Moment Matching for Multi-Source Domain Adaptation | Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, Bo Wang | We make three major contributions towards addressing this problem. First, we collect and annotate by far the largest UDA dataset, called DomainNet, which contains six domains and about 0.6 million images distributed among 345 categories, addressing the gap in data availability for multi-source UDA research. Second, we propose a new deep learning approach, Moment Matching for Multi-Source Domain Adaptation (M3SDA), which aims to transfer knowledge learned from multiple labeled source domains to an unlabeled target domain by dynamically aligning moments of their feature distributions. |
140 | Unsupervised Domain Adaptation via Regularized Conditional Alignment | Safa Cicek, Stefano Soatto | We propose a method for unsupervised domain adaptation that trains a shared embedding to align the joint distributions of inputs (domain) and outputs (classes), making any classifier agnostic to the domain. |
141 | Larger Norm More Transferable: An Adaptive Feature Norm Approach for Unsupervised Domain Adaptation | Ruijia Xu, Guanbin Li, Jihan Yang, Liang Lin | In this paper, we empirically reveal that the erratic discrimination of the target domain mainly stems from its much smaller feature norms with respect to that of the source domain. |
142 | UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation | Jogendra Nath Kundu, Nishank Lakkakula, R. Venkatesh Babu | In this paper, we propose UM-Adapt – a unified framework to effectively perform unsupervised domain adaptation for spatially-structured prediction tasks, simultaneously maintaining a balanced performance across individual tasks in a multi-task setting. |
143 | Episodic Training for Domain Generalization | Da Li, Jianshu Zhang, Yongxin Yang, Cong Liu, Yi-Zhe Song, Timothy M. Hospedales | In this paper we build on this strong baseline by designing an episodic training procedure that trains a single deep network in a way that exposes it to the domain shift that characterises a novel domain at runtime. |
144 | Domain Adaptation for Structured Output via Discriminative Patch Representations | Yi-Hsuan Tsai, Kihyuk Sohn, Samuel Schulter, Manmohan Chandraker | We propose to learn discriminative feature representations of patches in the source domain by discovering multiple modes of patch-wise output distribution through the construction of a clustered space. |
145 | Semi-Supervised Learning by Augmented Distribution Alignment | Qin Wang, Wen Li, Luc Van Gool | In this work, we propose a simple yet effective semi-supervised learning approach called Augmented Distribution Alignment. |
146 | S4L: Self-Supervised Semi-Supervised Learning | Xiaohua Zhai, Avital Oliver, Alexander Kolesnikov, Lucas Beyer | Unifying these two approaches, we propose the framework of self-supervised semi-supervised learning (S4L) and use it to derive two novel semi-supervised image classification methods. |
147 | Privacy Preserving Image Queries for Camera Localization | Pablo Speciale, Johannes L. Schonberger, Sudipta N. Sinha, Marc Pollefeys | We propose to conceal the content of the query images from an adversary on the server or a man-in-the-middle intruder. |
148 | Calibration Wizard: A Guidance System for Camera Calibration Based on Modelling Geometric and Corner Uncertainty | Songyou Peng, Peter Sturm | We present a system — Calibration Wizard — that interactively guides a user towards taking optimal calibration images. |
149 | Gated2Depth: Real-Time Dense Lidar From Gated Images | Tobias Gruber, Frank Julca-Aguilar, Mario Bijelic, Felix Heide | We present an imaging framework which converts three images from a gated camera into high-resolution depth maps with depth accuracy comparable to pulsed lidar measurements. |
150 | X-Section: Cross-Section Prediction for Enhanced RGB-D Fusion | Andrea Nicastro, Ronald Clark, Stefan Leutenegger | Here, we propose X-Section, an RGB-D 3D reconstruction approach that leverages deep learning to make object-level predictions about thicknesses that can be readily integrated into a volumetric multi-view fusion process, where we propose an extension to the popular KinectFusion approach. |
151 | Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo | Stepan Tulyakov, Francois Fleuret, Martin Kiefel, Peter Gehler, Michael Hirsch | To address this problem we introduce a new module for event sequence embedding, for use in different applications. |
152 | Point-Based Multi-View Stereo Network | Rui Chen, Songfang Han, Jing Xu, Hao Su | We introduce Point-MVSNet, a novel point-based deep framework for multi-view stereo (MVS). |
153 | Discrete Laplace Operator Estimation for Dynamic 3D Reconstruction | Xiangyu Xu, Enrique Dunn | We present a general paradigm for dynamic 3D reconstruction from multiple independent and uncontrolled image sources having arbitrary temporal sampling density and distribution. |
154 | Deep Non-Rigid Structure From Motion | Chen Kong, Simon Lucey | In this paper we propose a novel deep neural network to recover camera poses and 3D points solely from an ensemble of 2D image coordinates. |
155 | Equivariant Multi-View Networks | Carlos Esteves, Yinshuang Xu, Christine Allen-Blanchette, Kostas Daniilidis | In this paper, we propose a group convolutional approach to multiple view aggregation where convolutions are performed over a discrete subgroup of the rotation group, enabling, thus, joint reasoning over all views in an equivariant (instead of invariant) fashion, up to the very last layer. |
156 | Interpolated Convolutional Networks for 3D Point Cloud Understanding | Jiageng Mao, Xiaogang Wang, Hongsheng Li | In this paper, we propose a novel Interpolated Convolution operation, InterpConv, to tackle the point cloud feature learning and understanding problem. |
157 | Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data | Mikaela Angelina Uy, Quang-Hieu Pham, Binh-Son Hua, Thanh Nguyen, Sai-Kit Yeung | To prove this, we introduce ScanObjectNN, a new real-world point cloud object dataset based on scanned indoor scene data. |
158 | PointCloud Saliency Maps | Tianhang Zheng, Changyou Chen, Junsong Yuan, Bo Li, Kui Ren | In this paper, we propose a novel way of characterizing critical points and segments to build point-cloud saliency maps. |
159 | ShellNet: Efficient Point Cloud Convolutional Neural Networks Using Concentric Shells Statistics | Zhiyuan Zhang, Binh-Son Hua, Sai-Kit Yeung | In this paper, we address these problems by proposing an efficient end-to-end permutation invariant convolution for point cloud deep learning. |
160 | Unsupervised Deep Learning for Structured Shape Matching | Jean-Michel Roufosse, Abhishek Sharma, Maks Ovsjanikov | We present a novel method for computing correspondences across 3D shapes using unsupervised learning. |
161 | Linearly Converging Quasi Branch and Bound Algorithms for Global Rigid Registration | Nadav Dym, Shahar Ziv Kovalsky | In this paper, we suggest a general framework to improve upon the BnB approach, which we name Quasi BnB. |
162 | Consensus Maximization Tree Search Revisited | Zhipeng Cai, Tat-Jun Chin, Vladlen Koltun | We make two key contributions towards improving A* tree search. First, we propose a new acceleration strategy that avoids redundant paths. Second, we show that the existing branch pruning technique also deteriorates quickly with the problem dimension. |
163 | Quasi-Globally Optimal and Efficient Vanishing Point Estimation in Manhattan World | Haoang Li, Ji Zhao, Jean-Charles Bazin, Wen Chen, Zhe Liu, Yun-Hui Liu | In Manhattan world, given several lines in a calibrated image, we aim at clustering them by three unknown-but-sought VPs. |
164 | An Efficient Solution to the Homography-Based Relative Pose Problem With a Common Reference Direction | Yaqing Ding, Jian Yang, Jean Ponce, Hui Kong | In this paper, we propose a novel approach to two-view minimal-case relative pose problems based on homography with a common reference direction. |
165 | A Quaternion-Based Certifiably Optimal Solution to the Wahba Problem With Outliers | Heng Yang, Luca Carlone | This work proposes the first polynomial-time certifiably optimal approach for solving the Wahba problem when a large number of vector observations are outliers. |
166 | PLMP – Point-Line Minimal Problems in Complete Multi-View Visibility | Timothy Duff, Kathlen Kohn, Anton Leykin, Tomas Pajdla | We present a complete classification of all minimal problems for generic arrangements of points and lines completely observed by calibrated perspective cameras. |
167 | Variational Few-Shot Learning | Jian Zhang, Chenglong Zhao, Bingbing Ni, Minghao Xu, Xiaokang Yang | We propose a variational Bayesian framework for enhancing few-shot learning performance. |
168 | Generative Adversarial Minority Oversampling | Sankha Subhra Mullick, Shounak Datta, Swagatam Das | We propose a three-player adversarial game between a convex generator, a multi-class classifier network, and a real/fake discriminator to perform oversampling in deep learning systems. |
169 | Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection | Dong Gong, Lingqiao Liu, Vuong Le, Budhaditya Saha, Moussa Reda Mansour, Svetha Venkatesh, Anton van den Hengel | To mitigate this drawback for autoencoder based anomaly detector, we propose to augment the autoencoder with a memory module and develop an improved autoencoder called memory-augmented autoencoder, i.e. MemAE. |
170 | Topological Map Extraction From Overhead Images | Zuoyue Li, Jan Dirk Wegner, Aurelien Lucchi | We propose a new approach, named PolyMapper, to circumvent the conventional pixel-wise segmentation of (aerial) images and predict objects in a vector representation directly. |
171 | Exploiting Temporal Consistency for Real-Time Video Depth Estimation | Haokui Zhang, Chunhua Shen, Ying Li, Yuanzhouhan Cao, Yu Liu, Youliang Yan | In this work, we focus on exploring temporal information from monocular videos for depth estimation. |
172 | The Sound of Motions | Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba | Inspired by the fact that humans are capable of interpreting sound sources from how objects move visually, we propose a novel system that explicitly captures such motion cues for the task of sound localization and separation. |
173 | SC-FEGAN: Face Editing Generative Adversarial Network With User’s Sketch and Color | Youngjoo Jo, Jongyoul Park | We present a novel image editing system that generates images as the user provides free-form masks, sketches and color as inputs. |
174 | Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style | Hongwei Ge, Zehang Yan, Kai Zhang, Mingde Zhao, Liang Sun | In this paper, we explore the utilization of a human-like cognitive style, i.e., building overall cognition for the image to be described and the sentence to be constructed, for enhancing computer image understanding. |
175 | Order-Aware Generative Modeling Using the 3D-Craft Dataset | Zhuoyuan Chen, Demi Guo, Tong Xiao, Saining Xie, Xinlei Chen, Haonan Yu, Jonathan Gray, Kavya Srinet, Haoqi Fan, Jerry Ma, Charles R. Qi, Shubham Tulsiani, Arthur Szlam, C. Lawrence Zitnick | In this paper, we study the problem of sequentially building houses in the game of Minecraft, and demonstrate that learning the ordering can make for more effective autoregressive models. We introduce a new dataset, HouseCraft, for this new task. |
176 | Crowd Counting With Deep Structured Scale Integration Network | Lingbo Liu, Zhilin Qiu, Guanbin Li, Shufan Liu, Wanli Ouyang, Liang Lin | In this paper, we propose a novel Deep Structured Scale Integration Network (DSSINet) for crowd counting, which addresses the scale variation of people by using structured feature representation learning and hierarchically structured loss function optimization. |
177 | Bidirectional One-Shot Unsupervised Domain Mapping | Tomer Cohen, Lior Wolf | The method we present is able to perform this mapping in both directions. |
178 | Evolving Space-Time Neural Architectures for Videos | AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo | We present a new method for finding video CNN architectures that more optimally capture rich spatio-temporal information in videos. |
179 | Universally Slimmable Networks and Improved Training Techniques | Jiahui Yu, Thomas S. Huang | In this work, we propose a systematic approach to train universally slimmable networks (US-Nets), extending slimmable networks to execute at arbitrary width, and generalizing to networks both with and without batch normalization layers. |
180 | AutoDispNet: Improving Disparity Estimation With AutoML | Tonmoy Saikia, Yassine Marrakchi, Arber Zela, Frank Hutter, Thomas Brox | In this work, we show how to use and extend existing AutoML techniques to efficiently optimize large-scale U-Net-like encoder-decoder architectures. |
181 | Deep Meta Functionals for Shape Representation | Gidi Littwin, Lior Wolf | We present a new method for 3D shape reconstruction from a single image, in which a deep neural network directly maps an image to a vector of network weights. |
182 | Differentiable Kernel Evolution | Yu Liu, Jihao Liu, Ailing Zeng, Xiaogang Wang | This paper proposes a differentiable kernel evolution (DKE) algorithm to find a better layer-operator for the convolutional neural network. |
183 | Batch Weight for Domain Adaptation With Mass Shift | Mikolaj Binkowski, Devon Hjelm, Aaron Courville | We propose a principled method of re-weighting training samples to correct for such mass shift between the transferred distributions, which we call batch weight. |
184 | SRM: A Style-Based Recalibration Module for Convolutional Neural Networks | HyunJae Lee, Hyo-Eun Kim, Hyeonseob Nam | In this paper, we aim to fully leverage the potential of styles to improve the performance of CNNs in general vision tasks. |
185 | Switchable Whitening for Deep Representation Learning | Xingang Pan, Xiaohang Zhan, Jianping Shi, Xiaoou Tang, Ping Luo | Unlike existing works that design normalization techniques for specific tasks, we propose Switchable Whitening (SW), which provides a general form unifying different whitening methods as well as standardization methods. |
186 | Adaptative Inference Cost With Convolutional Neural Mixture Models | Adria Ruiz, Jakob Verbeek | Within the proposed framework, we present different mechanisms to prune subsets of CNNs from the mixture, making it easy to adapt the computational cost required for inference. |
187 | On Network Design Spaces for Visual Recognition | Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollar | To help sustain this rate of progress, in this work we propose to reexamine the methodology for comparing network architectures. |
188 | Improved Techniques for Training Adaptive Deep Networks | Hao Li, Hong Zhang, Xiaojuan Qi, Ruigang Yang, Gao Huang | We present three techniques to improve its training efficacy from two aspects: 1) a Gradient Equilibrium algorithm to resolve the conflict of learning of different classifiers; 2) an Inline Subnetwork Collaboration approach and a One-for-all Knowledge Distillation algorithm to enhance the collaboration among classifiers. |
189 | Resource Constrained Neural Network Architecture Search: Will a Submodularity Assumption Help? | Yunyang Xiong, Ronak Mehta, Vikas Singh | Based on this observation, we adapt algorithms within discrete optimization to obtain heuristic schemes for neural network architecture search, where we have resource constraints on the architecture. |
190 | ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks | Xiaohan Ding, Yuchen Guo, Guiguang Ding, Jungong Han | We propose Asymmetric Convolution Block (ACB), an architecture-neutral structure as a CNN building block, which uses 1D asymmetric convolutions to strengthen the square convolution kernels. |
191 | A Comprehensive Overhaul of Feature Distillation | Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, Jin Young Choi | We investigate the design aspects of feature distillation methods achieving network compression and propose a novel feature distillation method in which the distillation loss is designed to make a synergy among various aspects: teacher transform, student transform, distillation feature position and distance function. |
192 | Transferable Semi-Supervised 3D Object Detection From RGB-D Data | Yew Siang Tang, Gim Hee Lee | To this end, we propose a transferable semi-supervised 3D object detection model that learns a 3D object detector network from training data with two disjoint sets of object classes – a set of strong classes with both 2D and 3D box labels, and another set of weak classes with only 2D box labels. |
193 | DPOD: 6D Pose Object Detector and Refiner | Sergey Zakharov, Ivan Shugurov, Slobodan Ilic | In this paper we present a novel deep learning method for 3D object detection and 6D pose estimation from RGB images. |
194 | STD: Sparse-to-Dense 3D Object Detector for Point Cloud | Zetong Yang, Yanan Sun, Shu Liu, Xiaoyong Shen, Jiaya Jia | We propose a two-stage 3D object detection framework, named sparse-to-dense 3D Object Detector (STD). |
195 | DUP-Net: Denoiser and Upsampler Network for 3D Adversarial Point Clouds Defense | Hang Zhou, Kejiang Chen, Weiming Zhang, Han Fang, Wenbo Zhou, Nenghai Yu | In this paper, statistical outlier removal (SOR) and a data-driven upsampling network are considered as denoiser and upsampler respectively. |
196 | Learning Rich Features at High-Speed for Single-Shot Object Detection | Tiancai Wang, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao | We introduce a single-stage detection framework that combines the advantages of both fine-tuning pretrained models and training from scratch. |
197 | Detecting Unseen Visual Relations Using Analogies | Julia Peyre, Ivan Laptev, Cordelia Schmid, Josef Sivic | The contributions of this work are three-fold. First, we learn a representation of visual relations that combines (i) individual embeddings for subject, object and predicate together with (ii) a visual phrase embedding that represents the relation triplet. Second, we learn how to transfer visual phrase embeddings from existing training triplets to unseen test triplets using analogies between relations that involve similar objects. |
198 | Disentangling Monocular 3D Object Detection | Andrea Simonelli, Samuel Rota Bulo, Lorenzo Porzi, Manuel Lopez-Antequera, Peter Kontschieder | In this paper we propose an approach for monocular 3D object detection from a single RGB image, which leverages a novel disentangling transformation for 2D and 3D detection losses and a novel, self-supervised confidence score for 3D bounding boxes. |
199 | STM: SpatioTemporal and Motion Encoding for Action Recognition | Boyuan Jiang, MengMeng Wang, Weihao Gan, Wei Wu, Junjie Yan | In this work, we aim to efficiently encode these two features in a unified 2D framework. |
200 | Dynamic Context Correspondence Network for Semantic Alignment | Shuaiyi Huang, Qiuyue Wang, Songyang Zhang, Shipeng Yan, Xuming He | In this paper, we aim to incorporate global semantic context in a flexible manner to overcome the limitations of prior work that relies on local semantic representations. |
201 | Fooling Network Interpretation in Image Classification | Akshayvarun Subramanya, Vipin Pillai, Hamed Pirsiavash | We show that it is possible to create adversarial patches which not only fool the prediction, but also change what we interpret regarding the cause of the prediction. |
202 | Unconstrained Foreground Object Search | Yinan Zhao, Brian Price, Scott Cohen, Danna Gurari | We instead propose a novel problem of unconstrained foreground object (UFO) search and introduce a solution that supports efficient search by encoding the background image in the same latent space as the candidate foreground objects. |
203 | Embodied Amodal Recognition: Learning to Move to Perceive Objects | Jianwei Yang, Zhile Ren, Mingze Xu, Xinlei Chen, David J. Crandall, Devi Parikh, Dhruv Batra | In this work, we introduce the task of Embodied Amodal Recognition (EAR): an agent is instantiated in a 3D environment close to an occluded target object, and is free to move in the environment to perform object classification, amodal object localization, and amodal object segmentation. |
204 | SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition | Kaiyu Yang, Olga Russakovsky, Jia Deng | We introduce SpatialSense, a dataset specializing in spatial relation recognition which captures a broad spectrum of such challenges, allowing for proper benchmarking of computer vision techniques. |
205 | TensorMask: A Foundation for Dense Object Segmentation | Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollar | In this work, we investigate the paradigm of dense sliding-window instance segmentation, which is surprisingly under-explored. |
206 | Integral Object Mining via Online Attention Accumulation | Peng-Tao Jiang, Qibin Hou, Yang Cao, Ming-Ming Cheng, Yunchao Wei, Hong-Kai Xiong | In order to accumulate the discovered different object parts, we propose an online attention accumulation (OAA) strategy which maintains a cumulative attention map for each target category in each training image so that the integral object regions can be gradually promoted as the training goes. |
207 | Accelerated Gravitational Point Set Alignment With Altered Physical Laws | Vladislav Golyanik, Christian Theobalt, Didier Stricker | This work describes Barnes-Hut Rigid Gravitational Approach (BH-RGA) — a new rigid point set registration method relying on principles of particle dynamics. |
208 | Domain Adaptation for Semantic Segmentation With Maximum Squares Loss | Minghao Chen, Hongyang Xue, Deng Cai | To balance the gradient of well-classified target samples, we propose the maximum squares loss. |
209 | Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization Without Accessing Target Domain Data | Xiangyu Yue, Yang Zhang, Sicheng Zhao, Alberto Sangiovanni-Vincentelli, Kurt Keutzer, Boqing Gong | To this end, we propose a new approach of domain randomization and pyramid consistency to learn a model with high generalizability. |
210 | Semi-Supervised Skin Detection by Network With Mutual Guidance | Yi He, Jiayuan Shi, Chuan Wang, Haibin Huang, Jiaming Liu, Guanbin Li, Risheng Liu, Jue Wang | We present a new data-driven method for robust skin detection from a single human portrait image. |
211 | ACE: Adapting to Changing Environments for Semantic Segmentation | Zuxuan Wu, Xin Wang, Joseph E. Gonzalez, Tom Goldstein, Larry S. Davis | We present ACE, a framework for semantic segmentation that dynamically adapts to changing environments over time. |
212 | Efficient Segmentation: Learning Downsampling Near Semantic Boundaries | Dmitrii Marin, Zijian He, Peter Vajda, Priyam Chatterjee, Sam Tsai, Fei Yang, Yuri Boykov | To address this problem, we propose a new content-adaptive downsampling technique that learns to favor sampling locations near semantic boundaries of target classes. |
213 | Recurrent U-Net for Resource-Constrained Segmentation | Wei Wang, Kaicheng Yu, Joachim Hugonot, Pascal Fua, Mathieu Salzmann | In this paper, we introduce a novel recurrent U-Net architecture that preserves the compactness of the original U-Net, while substantially increasing its performance to the point where it outperforms the state of the art on several benchmarks. We also introduce a large-scale dataset for hand segmentation. |
214 | Detecting the Unexpected via Image Resynthesis | Krzysztof Lis, Krishna Nakka, Pascal Fua, Mathieu Salzmann | In this paper, we tackle the more realistic scenario where unexpected objects of unknown classes can appear at test time. |
215 | Self-Supervised Monocular Depth Hints | Jamie Watson, Michael Firman, Gabriel J. Brostow, Daniyar Turmukhambetov | Here, we study the problem of ambiguous reprojections in depth-prediction from stereo-based self-supervision, and introduce Depth Hints to alleviate their effects. |
216 | 3D Scene Reconstruction With Multi-Layer Depth and Epipolar Transformers | Daeyun Shin, Zhile Ren, Erik B. Sudderth, Charless C. Fowlkes | To improve the accuracy of view-centered representations for complex scenes, we introduce a novel “Epipolar Feature Transformer” that transfers convolutional network features from an input view to other virtual camera viewpoints, and thus better covers the 3D scene geometry. |
217 | How Do Neural Networks See Depth in Single Images? | Tom van Dijk, Guido de Croon | In this work we take four previously published networks and investigate what depth cues they exploit. |
218 | On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos | Zhi Li, Xuan Wang, Fei Wang, Peilin Jiang | In this paper, we propose to exploit monocular videos to complement the training dataset for the single-image 3D human pose estimation tasks. |
219 | Canonical Surface Mapping via Geometric Cycle Consistency | Nilesh Kulkarni, Abhinav Gupta, Shubham Tulsiani | Our key insight is that the CSM task (pixel to 3D), when combined with 3D projection (3D to pixel), completes a cycle. |
220 | 3D-RelNet: Joint Object and Relational Network for 3D Prediction | Nilesh Kulkarni, Ishan Misra, Shubham Tulsiani, Abhinav Gupta | We propose an approach to predict the 3D shape and pose for the objects present in a scene. |
221 | GP2C: Geometric Projection Parameter Consensus for Joint 3D Pose and Focal Length Estimation in the Wild | Alexander Grabner, Peter M. Roth, Vincent Lepetit | We present a joint 3D pose and focal length estimation approach for object categories in the wild. |
222 | Moulding Humans: Non-Parametric 3D Human Shape Estimation From Single Images | Valentin Gabeur, Jean-Sebastien Franco, Xavier Martin, Cordelia Schmid, Gregory Rogez | In this paper, we tackle the problem of 3D human shape estimation from single RGB images. |
223 | 3DPeople: Modeling the Geometry of Dressed Humans | Albert Pumarola, Jordi Sanchez-Riera, Gary P. T. Choi, Alberto Sanfeliu, Francesc Moreno-Noguer | In this paper, we present an approach to model dressed humans and predict their geometry from single images. |
224 | Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop | Nikos Kolotouros, Georgios Pavlakos, Michael J. Black, Kostas Daniilidis | In this work, instead of investigating which approach is better, our key insight is that the two paradigms can form a strong collaboration. |
225 | Optimizing Network Structure for 3D Human Pose Estimation | Hai Ci, Chunyu Wang, Xiaoxuan Ma, Yizhou Wang | In this work, we propose a generic formulation where both GCN and Fully Connected Network (FCN) are its special cases. |
226 | Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks | Yujun Cai, Liuhao Ge, Jun Liu, Jianfei Cai, Tat-Jen Cham, Junsong Yuan, Nadia Magnenat Thalmann | Motivated by the effectiveness of incorporating spatial dependencies and temporal consistencies to alleviate these issues, we propose a novel graph-based method to tackle the problem of 3D human body and 3D hand pose estimation from a short sequence of 2D joint detections. |
227 | Resolving 3D Human Pose Ambiguities With 3D Scene Constraints | Mohamed Hassan, Vasileios Choutas, Dimitrios Tzionas, Michael J. Black | Our key contribution is to exploit static 3D scene structure to better estimate human pose from monocular images. |
228 | Tex2Shape: Detailed Full Human Body Geometry From a Single Image | Thiemo Alldieck, Gerard Pons-Moll, Christian Theobalt, Marcus Magnor | We present a simple yet effective method to infer detailed full human body shape from only a single photograph. |
229 | PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization | Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, Hao Li | Using PIFu, we propose an end-to-end deep learning method for digitizing highly detailed clothed humans that can infer both 3D surface and texture from a single image, and optionally, multiple input images. |
230 | DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction | Xiaoxing Zeng, Xiaojiang Peng, Yu Qiao | This paper proposes a deep Dense-Fine-Finer Network (DF2Net) to address this challenging problem. |
231 | Monocular 3D Human Pose Estimation by Generation and Ordinal Ranking | Saurabh Sharma, Pavan Teja Varigonda, Prashast Bindal, Abhishek Sharma, Arjun Jain | In this paper, we propose a Deep Conditional Variational Autoencoder based model that synthesizes diverse anatomically plausible 3D-pose samples conditioned on the estimated 2D-pose. |
232 | Aligning Latent Spaces for 3D Hand Pose Estimation | Linlin Yang, Shile Li, Dongheui Lee, Angela Yao | In this work, we propose to learn a joint latent representation that leverages other modalities as weak labels to boost the RGB-based hand pose estimator. |
233 | HEMlets Pose: Learning Part-Centric Heatmap Triplets for Accurate 3D Human Pose Estimation | Kun Zhou, Xiaoguang Han, Nianjuan Jiang, Kui Jia, Jiangbo Lu | This work attempts to address the uncertainty of lifting the detected 2D joints to the 3D space by introducing an intermediate state – Part-Centric Heatmap Triplets (HEMlets), which shortens the gap between the 2D observation and the 3D interpretation. |
234 | End-to-End Hand Mesh Recovery From a Monocular RGB Image | Xiong Zhang, Qiang Li, Hong Mo, Wenbo Zhang, Wen Zheng | In this paper, we present a HAnd Mesh Recovery (HAMR) framework to tackle the problem of reconstructing the full 3D mesh of a human hand from a single RGB image. |
235 | Robust Multi-Modality Multi-Object Tracking | Wenwei Zhang, Hui Zhou, Shuyang Sun, Zhe Wang, Jianping Shi, Chen Change Loy | In this study, we design a generic sensor-agnostic multi-modality MOT framework (mmMOT), where each modality (i.e., sensor) is capable of performing its role independently to preserve reliability, and can further improve its accuracy through a novel multi-modality fusion module. |
236 | The Trajectron: Probabilistic Multi-Agent Trajectory Modeling With Dynamic Spatiotemporal Graphs | Boris Ivanovic, Marco Pavone | Towards this end, we present the Trajectron, a graph-structured model that predicts many potential future trajectories of multiple agents simultaneously in both highly dynamic and multimodal scenarios (i.e. where the number of agents in the scene is time-varying and there are many possible highly-distinct futures for each agent). |
237 | ‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-Term Tracking | Bin Yan, Haojie Zhao, Dong Wang, Huchuan Lu, Xiaoyun Yang | In this work, we present a novel robust and real-time long-term tracking framework based on the proposed skimming and perusal modules. |
238 | TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection | Kyle Min, Jason J. Corso | The proposed approach assumes that the saliency map of any frame can be predicted by considering a limited number of past frames. The results of our extensive experiments on video saliency detection validate this assumption and demonstrate that our fully-convolutional model with temporal aggregation method is effective. |
239 | Attacking Optical Flow | Anurag Ranjan, Joel Janai, Andreas Geiger, Michael J. Black | In this paper, we extend adversarial patch attacks to optical flow networks and show that such attacks can compromise their performance. |
240 | Pro-Cam SSfM: Projector-Camera System for Structure and Spectral Reflectance From Motion | Chunyu Li, Yusuke Monno, Hironori Hidaka, Masatoshi Okutomi | In this paper, we propose a novel projector-camera system for practical and low-cost acquisition of a dense object 3D model with the spectral reflectance property. |
241 | Mop Moire Patterns Using MopNet | Bin He, Ce Wang, Boxin Shi, Ling-Yu Duan | In this paper, we propose a Moire pattern Removal Neural Network (MopNet) to solve this problem. |
242 | Kernel Modeling Super-Resolution on Real Low-Resolution Images | Ruofan Zhou, Sabine Susstrunk | To improve generalization and robustness of deep super-resolution CNNs on real photographs, we present a kernel modeling super-resolution network (KMSR) that incorporates blur-kernel modeling in the training. |
243 | Learning to Jointly Generate and Separate Reflections | Daiqian Ma, Renjie Wan, Boxin Shi, Alex C. Kot, Ling-Yu Duan | In this work, we propose to jointly generate and separate reflections within a weakly-supervised learning framework, aiming to model the reflection image formation more comprehensively with abundant unpaired supervision. In particular, we build an unpaired reflection dataset with 4,027 images, which facilitates the weakly-supervised learning of a reflection removal model. |
244 | Deep Multi-Model Fusion for Single-Image Dehazing | Zijun Deng, Lei Zhu, Xiaowei Hu, Chi-Wing Fu, Xuemiao Xu, Qing Zhang, Jing Qin, Pheng-Ann Heng | This paper presents a deep multi-model fusion network to attentively integrate multiple models to separate layers and boost the performance in single-image dehazing. |
245 | Deep Learning for Seeing Through Window With Raindrops | Yuhui Quan, Shijie Deng, Yixin Chen, Hui Ji | In the proposed CNN, we introduce a double attention mechanism that concurrently guides the CNN using shape-driven attention and channel re-calibration. |
246 | Mask-ShadowGAN: Learning to Remove Shadows From Unpaired Data | Xiaowei Hu, Yitong Jiang, Chi-Wing Fu, Pheng-Ann Heng | This paper presents a new method for shadow removal using unpaired data, enabling us to avoid tedious annotations and obtain more diverse training samples. |
247 | Spatio-Temporal Filter Adaptive Network for Video Deblurring | Shangchen Zhou, Jiawei Zhang, Jinshan Pan, Haozhe Xie, Wangmeng Zuo, Jimmy Ren | To overcome the limitation of separate optical flow estimation, we propose a Spatio-Temporal Filter Adaptive Network (STFAN) for the alignment and deblurring in a unified framework. |
248 | Learning Deep Priors for Image Dehazing | Yang Liu, Jinshan Pan, Jimmy Ren, Zhixun Su | We propose an effective iteration algorithm with deep CNNs to learn haze-relevant priors for image dehazing. |
249 | JPEG Artifacts Reduction via Deep Convolutional Sparse Coding | Xueyang Fu, Zheng-Jun Zha, Feng Wu, Xinghao Ding, John Paisley | To effectively reduce JPEG compression artifacts, we propose a deep convolutional sparse coding (DCSC) network architecture. |
250 | Self-Guided Network for Fast Image Denoising | Shuhang Gu, Yawei Li, Luc Van Gool, Radu Timofte | To tackle this problem, we propose a self-guided network (SGN), which adopts a top-down self-guidance architecture to better exploit image multi-scale information. |
251 | Non-Local Intrinsic Decomposition With Near-Infrared Priors | Ziang Cheng, Yinqiang Zheng, Shaodi You, Imari Sato | In this paper, we revisit intrinsic image decomposition with the aid of near-infrared (NIR) imagery. |
252 | VideoMem: Constructing, Analyzing, Predicting Short-Term and Long-Term Video Memorability | Romain Cohendet, Claire-Helene Demarty, Ngoc Q. K. Duong, Martin Engilberge | This paper focuses on understanding the intrinsic memorability of visual content. To address this challenge, we introduce a large-scale dataset (VideoMem) composed of 10,000 videos with memorability scores. |
253 | Rescan: Inductive Instance Segmentation for Indoor RGBD Scans | Maciej Halber, Yifei Shi, Kai Xu, Thomas Funkhouser | We propose an algorithm that analyzes these “rescans” to infer a temporal model of a scene with semantic instance information. |
254 | End-to-End CAD Model Retrieval and 9DoF Alignment in 3D Scans | Armen Avetisyan, Angela Dai, Matthias Niessner | We present a novel, end-to-end approach to align CAD models to a 3D scan of a scene, enabling transformation of a noisy, incomplete 3D scan to a compact CAD reconstruction with clean, complete object geometry. |
255 | Making History Matter: History-Advantage Sequence Training for Visual Dialog | Tianhao Yang, Zheng-Jun Zha, Hanwang Zhang | To this end, inspired by the actor-critic policy gradient in reinforcement learning, we propose a novel training paradigm called History Advantage Sequence Training (HAST). |
256 | Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization | Liu Liu, Hongdong Li, Yuchao Dai | We propose a novel representation learning method having higher location-discriminating power. |
257 | Scene Graph Prediction With Limited Labels | Vincent S. Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Re, Li Fei-Fei | In this paper, we introduce a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using few labeled examples. |
258 | Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded | Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh | In this work, we propose a generic approach called Human Importance-aware Network Tuning (HINT) that effectively leverages human demonstrations to improve visual grounding. |
259 | Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment | Samyak Datta, Karan Sikka, Anirban Roy, Karuna Ahuja, Devi Parikh, Ajay Divakaran | We propose a novel end-to-end model that uses caption-to-image retrieval as a downstream task to guide the process of phrase localization. |
260 | Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding | Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Dechao Meng, Qingming Huang | To address this problem, we propose a novel end-to-end adaptive reconstruction network (ARN). |
261 | Hierarchy Parsing for Image Captioning | Ting Yao, Yingwei Pan, Yehao Li, Tao Mei | In this paper, we introduce a new design to model a hierarchy from instance level (segmentation), region level (detection) to the whole image to delve into a thorough image understanding for captioning. |
262 | HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic | In this work, we propose instead to learn such embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations. |
263 | Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network | Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu | In this paper, we propose to guide the video caption generation with Part-of-Speech (POS) information, based on a gated fusion of multiple representations of input videos. |
264 | Multi-View Stereo by Temporal Nonparametric Fusion | Yuxin Hou, Juho Kannala, Arno Solin | We propose a novel idea for depth estimation from multi-view image-pose pairs, where the model has capability to leverage information from previous latent-space encodings of the scene. |
265 | Floor-SP: Inverse CAD for Floorplans by Sequential Room-Wise Shortest Path | Jiacheng Chen, Chen Liu, Jiaye Wu, Yasutaka Furukawa | This paper proposes a new approach for automated floorplan reconstruction from RGBD scans, a major milestone in indoor mapping research. |
266 | Polarimetric Relative Pose Estimation | Zhaopeng Cui, Viktor Larsson, Marc Pollefeys | In this paper we consider the problem of relative pose estimation from two images with per-pixel polarimetric information. |
267 | Closed-Form Optimal Two-View Triangulation Based on Angular Errors | Seong Hun Lee, Javier Civera | In this paper, we study closed-form optimal solutions to two-view triangulation with known internal calibration and pose. |
268 | Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images | Haozhe Xie, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Shengping Zhang | To solve these problems, we propose a novel framework for single-view and multi-view 3D reconstruction, named Pix2Vox. |
269 | Unsupervised Robust Disentangling of Latent Characteristics for Image Synthesis | Patrick Esser, Johannes Haux, Bjorn Ommer | We present a novel approach that learns disentangled representations of these characteristics and explains them individually. |
270 | SROBB: Targeted Perceptual Loss for Single Image Super-Resolution | Mohammad Saeed Rad, Behzad Bozorgtabar, Urs-Viktor Marti, Max Basler, Hazim Kemal Ekenel, Jean-Philippe Thiran | In this paper, we propose a novel method to benefit from perceptual loss in a more objective way. |
271 | An Internal Learning Approach to Video Inpainting | Haotian Zhang, Long Mai, Ning Xu, Zhaowen Wang, John Collomosse, Hailin Jin | We propose a novel video inpainting algorithm that simultaneously hallucinates missing appearance and motion (optical flow) information, building upon the recent ‘Deep Image Prior’ (DIP) that exploits convolutional network architectures to enforce plausible texture in static images. |
272 | Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement | Sai Bi, Kalyan Sunkavalli, Federico Perazzi, Eli Shechtman, Vladimir G. Kim, Ravi Ramamoorthi | We present a method to improve the visual realism of low-quality, synthetic images, e.g. OpenGL renderings. |
273 | Adversarial Defense via Learning to Generate Diverse Attacks | Yunseok Jang, Tianchen Zhao, Seunghoon Hong, Honglak Lee | In this work, we propose to utilize the generator to learn how to create adversarial examples. |
274 | Image Generation From Small Datasets via Batch Statistics Adaptation | Atsuhiro Noguchi, Tatsuya Harada | In this work, we propose a novel method focusing on the parameters for batch statistics, scale and shift, of the hidden layers in the generator. |
275 | Lifelong GAN: Continual Learning for Conditional Image Generation | Mengyao Zhai, Lei Chen, Frederick Tung, Jiawei He, Megha Nawhal, Greg Mori | In contrast to state-of-the-art memory replay based approaches which are limited to label-conditioned image generation tasks, a more generic framework for continual learning of generative models under different conditional image generation settings is proposed in this paper. |
276 | Bayesian Relational Memory for Semantic Visual Navigation | Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian | We introduce a new memory architecture, Bayesian Relational Memory (BRM), to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards. |
277 | Mono-SF: Multi-View Geometry Meets Single-View Depth for Monocular Scene Flow Estimation of Dynamic Traffic Scenes | Fabian Brickwedde, Steffen Abraham, Rudolf Mester | In this paper, we propose a novel monocular 3D scene flow estimation method, called Mono-SF. |
278 | Prior Guided Dropout for Robust Visual Localization in Dynamic Environments | Zhaoyang Huang, Yan Xu, Jianping Shi, Xiaowei Zhou, Hujun Bao, Guofeng Zhang | In this paper, we propose a framework which can be generally applied to existing CNN-based pose regressors to improve their robustness in dynamic environments. |
279 | Drive&Act: A Multi-Modal Dataset for Fine-Grained Driver Behavior Recognition in Autonomous Vehicles | Manuel Martin, Alina Roitberg, Monica Haurilet, Matthias Horne, Simon Reiss, Michael Voit, Rainer Stiefelhagen | We introduce the novel domain-specific Drive&Act benchmark for fine-grained categorization of driver behavior. Finally, we provide challenging benchmarks by adopting prominent methods for video- and body pose-based action recognition. |
280 | Depth Completion From Sparse LiDAR Data With Depth-Normal Constraints | Yan Xu, Xinge Zhu, Jianping Shi, Guofeng Zhang, Hujun Bao, Hongsheng Li | In this paper, to regularize the depth completion and improve the robustness against noise, we propose a unified CNN framework that 1) models the geometric constraints between depth and surface normal in a diffusion module and 2) predicts the confidence of sparse LiDAR measurements to mitigate the impact of noise. |
281 | PRECOG: PREdiction Conditioned on Goals in Visual Multi-Agent Settings | Nicholas Rhinehart, Rowan McAllister, Kris Kitani, Sergey Levine | Towards these capabilities, we present a probabilistic forecasting model of future interactions between a variable number of agents. |
282 | LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis | Zhe Liu, Shunbo Zhou, Chuanzhe Suo, Peng Yin, Wen Chen, Hesheng Wang, Haoang Li, Yun-Hui Liu | In this paper, we develop a novel deep neural network, named LPD-Net (Large-scale Place Description Network), which can extract discriminative and generalizable global descriptors from the raw 3D point cloud. |
283 | Local Supports Global: Deep Camera Relocalization With Sequence Enhancement | Fei Xue, Xin Wang, Zike Yan, Qiuyuan Wang, Junqiu Wang, Hongbin Zha | We propose to leverage the local information in an image sequence to support global camera relocalization. |
284 | Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry | Shunkai Li, Fei Xue, Xin Wang, Zike Yan, Hongbin Zha | We propose a self-supervised learning framework for visual odometry (VO) that incorporates correlation of consecutive frames and takes advantage of adversarial learning. |
285 | TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts | Ziyang Hong, Yvan Petillot, David Lane, Yishu Miao, Sen Wang | This paper proposes a novel visual place recognition algorithm, termed TextPlace, based on scene texts in the wild. |
286 | CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization | Mingyu Ding, Zhe Wang, Jiankai Sun, Jianping Shi, Ping Luo | To this end, here we present a coarse-to-fine retrieval-based deep learning framework, which includes three steps, i.e., image-based coarse retrieval, pose-based fine retrieval and precise relative pose regression. |
287 | Situational Fusion of Visual Representation for Visual Navigation | William B. Shen, Danfei Xu, Yuke Zhu, Leonidas J. Guibas, Li Fei-Fei, Silvio Savarese | We propose to train an agent to fuse a large set of visual representations that correspond to diverse visual perception abilities. |
288 | Learning Aberrance Repressed Correlation Filters for Real-Time UAV Tracking | Ziyuan Huang, Changhong Fu, Yiming Li, Fuling Lin, Peng Lu | Therefore, in this work, a novel approach to repress the aberrances happening during the detection process is proposed, i.e., aberrance repressed correlation filter (ARCF). |
289 | 6-DOF GraspNet: Variational Grasp Generation for Object Manipulation | Arsalan Mousavian, Clemens Eppner, Dieter Fox | In this work, we formulate the problem of grasp generation as sampling a set of grasps using a variational autoencoder and assess and refine the sampled grasps using a grasp evaluator model. |
290 | DAGMapper: Learning to Map by Discovering Lane Topology | Namdar Homayounfar, Wei-Chiu Ma, Justin Liang, Xinyu Wu, Jack Fan, Raquel Urtasun | In contrast, in this paper we focus on drawing the lane boundaries of complex highways with many lanes that contain topology changes due to forks and merges. |
291 | 3D-LaneNet: End-to-End 3D Multiple Lane Detection | Noa Garnett, Rafi Cohen, Tomer Pe’er, Roee Lahav, Dan Levi | We introduce a network that directly predicts the 3D layout of lanes in a road scene from a single image. |
292 | Sampling-Free Epistemic Uncertainty Estimation Using Approximated Variance Propagation | Janis Postels, Francesco Ferroni, Huseyin Coskun, Nassir Navab, Federico Tombari | We present a sampling-free approach for computing the epistemic uncertainty of a neural network. |
293 | Universal Adversarial Perturbation via Prior Driven Uncertainty Approximation | Hong Liu, Rongrong Ji, Jie Li, Baochang Zhang, Yue Gao, Yongjian Wu, Feiyue Huang | In this paper, we propose a new unsupervised universal adversarial perturbation method, termed as Prior Driven Uncertainty Approximation (PD-UA), to generate a robust UAP by fully exploiting the model uncertainty at each network layer. |
294 | Understanding Deep Networks via Extremal Perturbations and Smooth Masks | Ruth Fong, Mandela Patrick, Andrea Vedaldi | In this paper, we discuss some of the shortcomings of existing approaches to perturbation analysis and address them by introducing the concept of extremal perturbations, which are theoretically grounded and interpretable. |
295 | Unsupervised Pre-Training of Image Features on Non-Curated Data | Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin | To that effect, we propose a new unsupervised approach which leverages self-supervision and clustering to capture complementary statistics from large-scale data. |
296 | Learning Local Descriptors With a CDF-Based Dynamic Soft Margin | Linguang Zhang, Szymon Rusinkiewicz | In this work, we propose a simple yet effective method to overcome the above limitations. |
297 | Bayes-Factor-VAE: Hierarchical Bayesian Deep Auto-Encoder Models for Factor Disentanglement | Minyoung Kim, Yuting Wang, Pritish Sahu, Vladimir Pavlovic | We propose a family of novel hierarchical Bayesian deep auto-encoder models capable of identifying disentangled factors of variability in data. |
298 | Linearized Multi-Sampling for Differentiable Image Transformation | Wei Jiang, Weiwei Sun, Andrea Tagliasacchi, Eduard Trulls, Kwang Moo Yi | We propose a novel image sampling method for differentiable image transformation in deep neural networks. |
299 | AdaTransform: Adaptive Data Transformation | Zhiqiang Tang, Xi Peng, Tingfeng Li, Yizhe Zhu, Dimitris N. Metaxas | In this work, we propose adaptive data transformation to achieve the two goals. |
300 | CARAFE: Content-Aware ReAssembly of FEatures | Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin | In this work, we propose Content-Aware ReAssembly of FEatures (CARAFE), a universal, lightweight and highly effective operator to fulfill this goal. |
301 | AFD-Net: Aggregated Feature Difference Learning for Cross-Spectral Image Patch Matching | Dou Quan, Xuefeng Liang, Shuang Wang, Shaowei Wei, Yanfeng Li, Ning Huyan, Licheng Jiao | To tackle these problems, we propose an aggregated feature difference learning network (AFD-Net). |
302 | Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval | Shupeng Su, Zhisheng Zhong, Chao Zhang | In this paper, we study the unsupervised deep cross-modal hash coding and propose Deep Joint-Semantics Reconstructing Hashing (DJSRH), which has the following two main advantages. |
303 | Unsupervised Neural Quantization for Compressed-Domain Similarity Search | Stanislav Morozov, Artem Babenko | In more detail, we introduce a DNN architecture for the unsupervised compressed-domain retrieval, based on multi-codebook quantization. |
304 | Siamese Networks: The Tale of Two Manifolds | Soumava Kumar Roy, Mehrtash Harandi, Richard Nock, Richard Hartley | In this paper, we study Siamese networks from a new perspective and question the validity of their training procedure. |
305 | Learning Combinatorial Embedding Networks for Deep Graph Matching | Runzhong Wang, Junchi Yan, Xiaokang Yang | To this end, this paper devises an end-to-end differentiable deep network pipeline to learn the affinity for graph matching. |
306 | Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid | Zhanghui Kuang, Yiming Gao, Guanbin Li, Ping Luo, Yimin Chen, Liang Lin, Wayne Zhang | To address this issue, we propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery cloth by using both global and local representations in multiple scales. |
307 | Wavelet Domain Style Transfer for an Effective Perception-Distortion Tradeoff in Single Image Super-Resolution | Xin Deng, Ren Yang, Mai Xu, Pier Luigi Dragotti | In this paper, we propose a novel method based on wavelet domain style transfer (WDST), which achieves a better PD tradeoff than the GAN based methods. |
308 | Toward Real-World Single Image Super-Resolution: A New Benchmark and a New Model | Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, Lei Zhang | In this paper, we build a real-world super-resolution (RealSR) dataset where paired LR-HR images on the same scene are captured by adjusting the focal length of a digital camera. |
309 | RankSRGAN: Generative Adversarial Networks With Ranker for Image Super-Resolution | Wenlong Zhang, Yihao Liu, Chao Dong, Yu Qiao | To address the problem, we propose Super-Resolution Generative Adversarial Networks with Ranker (RankSRGAN) to optimize generator in the direction of perceptual metrics. |
310 | Progressive Fusion Video Super-Resolution Network via Exploiting Non-Local Spatio-Temporal Correlations | Peng Yi, Zhongyuan Wang, Kui Jiang, Junjun Jiang, Jiayi Ma | In this study, we propose a novel progressive fusion network for video SR, which is designed to make better use of spatio-temporal information and is proved to be more efficient and effective than the existing direct fusion, slow fusion or 3D convolution strategies. |
311 | Deep SR-ITM: Joint Learning of Super-Resolution and Inverse Tone-Mapping for 4K UHD HDR Applications | Soo Ye Kim, Jihyong Oh, Munchurl Kim | In this paper, we propose a joint super-resolution (SR) and inverse tone-mapping (ITM) framework, called Deep SR-ITM, which learns the direct mapping from LR SDR video to their HR HDR version. |
312 | Dynamic PET Image Reconstruction Using Nonnegative Matrix Factorization Incorporated With Deep Image Prior | Tatsuya Yokota, Kazuya Kawai, Muneyuki Sakata, Yuichi Kimura, Hidekata Hontani | We propose a method that reconstructs dynamic positron emission tomography (PET) images from given sinograms by using non-negative matrix factorization (NMF) incorporated with a deep image prior (DIP) for appropriately constraining the spatial patterns of resultant images. |
313 | DSIC: Deep Stereo Image Compression | Jerry Liu, Shenlong Wang, Raquel Urtasun | In this paper we tackle the problem of stereo image compression, and leverage the fact that the two images have overlapping fields of view to further compress the representations. |
314 | Variable Rate Deep Image Compression With a Conditional Autoencoder | Yoojin Choi, Mostafa El-Khamy, Jungwon Lee | In this paper, we propose a novel variable-rate learned image compression framework with a conditional autoencoder. |
315 | Real Image Denoising With Feature Attention | Saeed Anwar, Nick Barnes | To advance the practicability of the denoising algorithms, this paper proposes a novel single-stage blind real image denoising network (RIDNet) by employing a modular architecture. |
316 | Noise Flow: Noise Modeling With Conditional Normalizing Flows | Abdelrahman Abdelhamed, Marcus A. Brubaker, Michael S. Brown | This paper introduces Noise Flow, a powerful and accurate noise model based on recent normalizing flow architectures. |
317 | Bottleneck Potentials in Markov Random Fields | Ahmed Abbas, Paul Swoboda | To solve the ensuing inference problem, we propose high-quality relaxations and efficient algorithms for solving them. |
318 | Seeing Motion in the Dark | Chen Chen, Qifeng Chen, Minh N. Do, Vladlen Koltun | In this paper, we present deep processing of very dark raw videos: on the order of one lux of illuminance. To support this line of work, we collect a new dataset of raw low-light videos, in which high-resolution raw data is captured at video rate. |
319 | SENSE: A Shared Encoder Network for Scene-Flow Estimation | Huaizu Jiang, Deqing Sun, Varun Jampani, Zhaoyang Lv, Erik Learned-Miller, Jan Kautz | We introduce a compact network for holistic scene flow estimation, called SENSE, which shares common encoder features among four closely-related tasks: optical flow estimation, disparity estimation from stereo, occlusion estimation, and semantic segmentation. |
320 | Adversarial Feedback Loop | Firas Shama, Roey Mechrez, Alon Shoshan, Lihi Zelnik-Manor | In this paper we propose a novel method that makes an explicit use of the discriminator in test-time, in a feedback manner in order to improve the generator results. |
321 | Dynamic-Net: Tuning the Objective Without Re-Training for Synthesis Tasks | Alon Shoshan, Roey Mechrez, Lihi Zelnik-Manor | In this paper we present a first attempt at alleviating the need for re-training. |
322 | AutoGAN: Neural Architecture Search for Generative Adversarial Networks | Xinyu Gong, Shiyu Chang, Yifan Jiang, Zhangyang Wang | In this paper, we present the first preliminary study on introducing the NAS algorithm to generative adversarial networks (GANs), dubbed AutoGAN. |
323 | Co-Evolutionary Compression for Unpaired Image Translation | Han Shu, Yunhe Wang, Xu Jia, Kai Han, Hanting Chen, Chunjing Xu, Qi Tian, Chang Xu | To this end, we develop a novel co-evolutionary approach for reducing their memory usage and FLOPs simultaneously. |
324 | Self-Supervised Representation Learning From Multi-Domain Data | Zeyu Feng, Chang Xu, Dacheng Tao | We present an information-theoretically motivated constraint for self-supervised representation learning from multiple related domains. |
325 | Controlling Neural Networks via Energy Dissipation | Michael Moeller, Thomas Mollenhoff, Daniel Cremers | In this work we propose energy dissipating networks that iteratively compute a descent direction with respect to a given cost function or energy at the currently estimated reconstruction. |
326 | Indices Matter: Learning to Index for Deep Image Matting | Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu | By viewing the indices as a function of the feature map, we introduce the concept of ‘learning to index’, and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the pooling and upsampling operators, without extra training supervision. |
327 | LAP-Net: Level-Aware Progressive Network for Image Dehazing | Yunan Li, Qiguang Miao, Wanli Ouyang, Zhenxin Ma, Huijuan Fang, Chao Dong, Yining Quan | In this paper, we propose a level-aware progressive network (LAP-Net) for single image dehazing. |
328 | Attention Augmented Convolutional Networks | Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le | In this paper, we propose to augment convolutional networks with self-attention by concatenating convolutional feature maps with a set of feature maps produced via a novel relative self-attention mechanism. |
329 | MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning | Zechun Liu, Haoyuan Mu, Xiangyu Zhang, Zichao Guo, Xin Yang, Kwang-Ting Cheng, Jian Sun | In this paper, we propose a novel meta learning approach for automatic channel pruning of very deep neural networks. |
330 | Accelerate CNN via Recursive Bayesian Pruning | Yuefu Zhou, Ya Zhang, Yanfeng Wang, Qi Tian | To solve the problem, under the Bayesian framework, we here propose a layer-wise Recursive Bayesian Pruning method (RBP). |
331 | HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions | Duo Li, Aojun Zhou, Anbang Yao | In this paper, we present Harmonious Bottleneck on two Orthogonal dimensions (HBO), a novel architecture unit, specially tailored to boost the accuracy of extremely lightweight MobileNets at the level of less than 40 MFLOPs. |
332 | O2U-Net: A Simple Noisy Label Detection Approach for Deep Neural Networks | Jinchi Huang, Lie Qu, Rongfei Jia, Binqiang Zhao | This paper proposes a novel noisy label detection approach, named O2U-net, for deep neural networks without human annotations. |
333 | Continual Learning by Asymmetric Loss Approximation With Single-Side Overestimation | Dongmin Park, Seokil Hong, Bohyung Han, Kyoung Mu Lee | We propose a novel approach to continual learning by approximating a true loss function using an asymmetric quadratic function with one of its sides overestimated. |
334 | Label-PEnet: Sequential Label Propagation and Enhancement Networks for Weakly Supervised Instance Segmentation | Weifeng Ge, Sheng Guo, Weilin Huang, Matthew R. Scott | Unlike previous methods which are composed of multiple offline stages, we propose Sequential Label Propagation and Enhancement Networks (referred as Label-PEnet) that progressively transforms image-level labels to pixel-wise labels in a coarse-to-fine manner. |
335 | LIP: Local Importance-Based Pooling | Ziteng Gao, Limin Wang, Gangshan Wu | In this paper, we present a unified framework over the existing downsampling layers (e.g., average pooling, max pooling, and strided convolution) from a local importance view. |
336 | Global Feature Guided Local Pooling | Takumi Kobayashi | In this paper, we propose a flexible pooling method which adaptively tunes the pooling functionality based on input features without manually fixing it beforehand. |
337 | Conditional Coupled Generative Adversarial Networks for Zero-Shot Domain Adaptation | Jinghua Wang, Jianmin Jiang | In this paper, we tackle the challenging zero-shot domain adaptation (ZSDA) problem, where the target-domain data is non-available in the training stage. |
338 | Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks | Aamir Mustafa, Salman Khan, Munawar Hayat, Roland Goecke, Jianbing Shen, Ling Shao | To counter this, we propose to class-wise disentangle the intermediate feature representations of deep networks. |
339 | Hyperpixel Flow: Semantic Correspondence With Multi-Layer Neural Features | Juhong Min, Jongmin Lee, Jean Ponce, Minsu Cho | Taking advantage of the condensed features of hyperpixels, we develop an effective real-time matching algorithm based on Hough geometric voting. |
340 | Information Entropy Based Feature Pooling for Convolutional Neural Networks | Weitao Wan, Jiansheng Chen, Tianpeng Li, Yiqing Huang, Jingqi Tian, Cheng Yu, Youze Xue | Based on this idea, we propose the entropy-based feature weighting method for semantics-aware feature pooling which can be readily integrated into various CNN architectures for both training and inference. |
341 | Patchwork: A Patch-Wise Attention Network for Efficient Object Detection and Segmentation in Video Streams | Yuning Chai | In this paper, we explore the idea of hard attention aimed for latency-sensitive applications. |
342 | AttentionRNN: A Structured Spatial Attention Mechanism | Siddhesh Khandelwal, Leonid Sigal | In this paper we develop a novel structured spatial attention mechanism which is end-to-end trainable and can be integrated with any feed-forward convolutional neural network. |
343 | Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution | Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, Jiashi Feng | In this work, we propose to factorize the mixed feature maps by their frequencies, and design a novel Octave Convolution (OctConv) operation to store and process feature maps that vary spatially "slower" at a lower spatial resolution, reducing both memory and computation cost. |
344 | Domain Intersection and Domain Difference | Sagie Benaim, Michael Khaitov, Tomer Galanti, Lior Wolf | We present a method for recovering the shared content between two visual domains as well as the content that is unique to each domain. |
345 | Learned Video Compression | Oren Rippel, Sanjay Nair, Carissa Lew, Steve Branson, Alexander G. Anderson, Lubomir Bourdev | We present a new algorithm for video coding, learned end-to-end for the low-latency mode. |
346 | Local Relation Networks for Image Recognition | Han Hu, Zheng Zhang, Zhenda Xie, Stephen Lin | This paper presents a new image feature extractor, called the local relation layer, that adaptively determines aggregation weights based on the compositional relationship of local pixel pairs. |
347 | DiscoNet: Shapes Learning on Disconnected Manifolds for 3D Editing | Eloi Mehr, Ariane Jourdan, Nicolas Thome, Matthieu Cord, Vincent Guitteny | In this work, we present an intelligent and user-friendly 3D editing tool, where the edited model is constrained to lie onto a learned manifold of realistic shapes. |
348 | Deep Residual Learning in the JPEG Transform Domain | Max Ehrlich, Larry S. Davis | We introduce a general method of performing Residual Network inference and learning in the JPEG transform domain that allows the network to consume compressed images as input. |
349 | Approximated Bilinear Modules for Temporal Modeling | Xinqi Zhu, Chang Xu, Langwen Hui, Cewu Lu, Dacheng Tao | We consider two less-emphasized temporal properties of video: 1. Temporal cues are fine-grained; 2. Temporal modeling needs reasoning. To tackle both problems at once, we exploit approximated bilinear modules (ABMs) for temporal modeling. |
350 | Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation | Chengchao Shen, Mengqi Xue, Xinchao Wang, Jie Song, Li Sun, Mingli Song | In this paper, we study how to exploit such heterogeneous pre-trained networks, known as teachers, so as to train a customized student network that tackles a set of selective tasks defined by the user. |
351 | Data-Free Learning of Student Networks | Hanting Chen, Yunhe Wang, Chang Xu, Zhaohui Yang, Chuanjian Liu, Boxin Shi, Chunjing Xu, Chao Xu, Qi Tian | To this end, we propose a novel framework for training efficient deep neural networks by exploiting generative adversarial networks (GANs). |
352 | Deep Closest Point: Learning Representations for Point Cloud Registration | Yue Wang, Justin M. Solomon | To address local optima and other difficulties in the ICP pipeline, we propose a learning-based method, titled Deep Closest Point (DCP), inspired by recent techniques in computer vision and natural language processing. |
353 | Orientation-Aware Semantic Segmentation on Icosahedron Spheres | Chao Zhang, Stephan Liwicki, William Smith, Roberto Cipolla | In our work, we propose an orientation-aware CNN framework for the icosahedron mesh. |
354 | Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks | Zhaoyang Zhang, Jingyu Li, Wenqi Shao, Zhanglin Peng, Ruimao Zhang, Xiaogang Wang, Ping Luo | Toward addressing this issue, we present Groupable ConvNet (GroupNet) built by using a novel dynamic grouping convolution (DGConv) operation, which is able to learn the number of groups in an end-to-end manner. |
355 | HarDNet: A Low Memory Traffic Network | Ping Chao, Chao-Yang Kao, Yu-Shan Ruan, Chien-Hsiang Huang, Youn-Long Lin | We propose a Harmonic Densely Connected Network to achieve high efficiency in terms of both low MACs and memory traffic. |
356 | Dynamic Multi-Scale Filters for Semantic Segmentation | Junjun He, Zhongying Deng, Yu Qiao | To address these problems, this paper proposes a Dynamic Multi-scale Network (DMNet) to adaptively capture multi-scale contents for predicting pixel-level semantic labels. |
357 | Online Model Distillation for Efficient Video Inference | Ravi Teja Mullapudi, Steven Chen, Keyi Zhang, Deva Ramanan, Kayvon Fatahalian | In this paper, we employ the technique of model distillation (supervising a low-cost student model using the output of a high-cost teacher) to specialize accurate, low-cost semantic segmentation models to a target video stream. We also provide a new video dataset for evaluating the efficiency of inference over long running video streams. |
358 | Rethinking Zero-Shot Learning: A Conditional Visual Classification Perspective | Kai Li, Martin Renqiang Min, Yun Fu | With this reformulation, we develop algorithms targeting various ZSL settings. For the conventional setting, we propose to train a deep neural network that directly generates visual feature classifiers from the semantic attributes with an episode-based training scheme. For the generalized setting, we concatenate the learned highly discriminative classifiers for seen classes and the generated classifiers for unseen classes to classify visual features of all classes. For the transductive setting, we exploit unlabeled data to effectively calibrate the classifier generator using a novel learning-without-forgetting self-training mechanism and guide the process by a robust generalized cross-entropy loss. |
359 | Task-Driven Modular Networks for Zero-Shot Compositional Learning | Senthil Purushwalkam, Maximilian Nickel, Abhinav Gupta, Marc’Aurelio Ranzato | To alleviate this striking difference in efficiency, we propose a task-driven modular architecture for compositional reasoning and sample efficient learning. |
360 | Transductive Episodic-Wise Adaptive Metric for Few-Shot Learning | Limeng Qiao, Yemin Shi, Jia Li, Yaowei Wang, Tiejun Huang, Yonghong Tian | To this end, we propose a Transductive Episodic-wise Adaptive Metric (TEAM) framework for few-shot learning, by integrating the meta-learning paradigm with both deep metric learning and transductive inference. |
361 | Deep Multiple-Attribute-Perceived Network for Real-World Texture Recognition | Wei Zhai, Yang Cao, Jing Zhang, Zheng-Jun Zha | To address this problem, we propose a novel deep Multiple-Attribute-Perceived Network (MAP-Net) by progressively learning visual texture attributes in a mutually reinforced manner. |
362 | RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment | Guan’an Wang, Tianzhu Zhang, Jian Cheng, Si Liu, Yang Yang, Zengguang Hou | Different from existing methods, in this paper, we propose a novel and end-to-end Alignment Generative Adversarial Network (AlignGAN) for the RGB-IR Re-ID task. |
363 | EvalNorm: Estimating Batch Normalization Statistics for Evaluation | Saurabh Singh, Abhinav Shrivastava | In this paper we study this peculiar behavior of BN to gain a better understanding of the problem, and identify a cause. |
364 | Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification | Jianyuan Guo, Yuhui Yuan, Lang Huang, Chao Zhang, Jin-Ge Yao, Kai Han | In this paper, we address the missed contextual cues by exploiting both the accurate human parts and the coarse non-human parts. |
365 | Person Search by Text Attribute Query As Zero-Shot Learning | Qi Dong, Shaogang Gong, Xiatian Zhu | In this work, we present a deep learning method for attribute text description based person search without any query imagery. |
366 | Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval | Qing Liu, Lingxi Xie, Huiyu Wang, Alan L. Yuille | In this paper, we investigate this problem from the viewpoint of domain adaptation which we show is critical in improving feature embedding in the zero-shot scenario. |
367 | Active Learning for Deep Detection Neural Networks | Hamed H. Aghdam, Abel Gonzalez-Garcia, Joost van de Weijer, Antonio M. Lopez | In this paper, we propose a method to perform active learning of object detectors based on convolutional neural networks. |
368 | One-Shot Neural Architecture Search via Self-Evaluated Template Network | Xuanyi Dong, Yi Yang | In this paper, we propose a Self-Evaluated Template Network (SETN) to improve the quality of the architecture candidates for evaluation so that it is more likely to cover competitive candidates. |
369 | Batch DropBlock Network for Person Re-Identification and Beyond | Zuozhuo Dai, Mingqiang Chen, Xiaodong Gu, Siyu Zhu, Ping Tan | In this paper, we propose the Batch DropBlock (BDB) Network, which is a two-branch network composed of a conventional ResNet-50 as the global branch and a feature dropping branch. The global branch encodes the global salient representations. Meanwhile, the feature dropping branch consists of an attentive feature learning module called Batch DropBlock, which randomly drops the same region of all input feature maps in a batch to reinforce the attentive feature learning of local regions. The network then concatenates features from both branches and provides a more comprehensive and spatially distributed feature representation. |
370 | Omni-Scale Feature Learning for Person Re-Identification | Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, Tao Xiang | In this paper, a novel deep ReID CNN is designed, termed Omni-Scale Network (OSNet), for omni-scale feature learning. |
371 | Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation | Linfeng Zhang, Jiebo Song, Anni Gao, Jingwei Chen, Chenglong Bao, Kaisheng Ma | In this paper, we propose a general training framework named self distillation, which notably enhances the performance (accuracy) of convolutional neural networks through shrinking the size of the network rather than aggrandizing it. |
372 | Diversity With Cooperation: Ensemble Methods for Few-Shot Classification | Nikita Dvornik, Cordelia Schmid, Julien Mairal | In this paper, we go a step further and show that by addressing the fundamental high-variance issue of few-shot learning classifiers, it is possible to significantly outperform current meta-learning techniques. |
373 | Enhancing 2D Representation via Adjacent Views for 3D Shape Retrieval | Cheng Xu, Zhaoqun Li, Qiang Qiu, Biao Leng, Jingfei Jiang | In this paper, we propose a convolutional neural network based method, CenterNet, to enhance each individual 2D view using its neighboring ones. |
374 | Adversarial Fine-Grained Composition Learning for Unseen Attribute-Object Recognition | Kun Wei, Muli Yang, Hao Wang, Cheng Deng, Xianglong Liu | In this paper, we propose a novel adversarial fine-grained composition learning model for unseen attribute-object pair recognition. |
375 | Auto-ReID: Searching for a Part-Aware ConvNet for Person Re-Identification | Ruijie Quan, Xuanyi Dong, Yu Wu, Linchao Zhu, Yi Yang | To solve these problems, we propose a retrieval-based search algorithm over a specifically designed reID search space, named Auto-ReID. |
376 | Second-Order Non-Local Attention Networks for Person Re-Identification | Bryan (Ning) Xia, Yuan Gong, Yizhe Zhang, Christian Poellabauer | In this paper, we propose a novel attention mechanism to directly model long-range relationships via second-order feature statistics. |
377 | Fast Computation of Content-Sensitive Superpixels and Supervoxels Using Q-Distances | Zipeng Ye, Ran Yi, Minjing Yu, Yong-Jin Liu, Ying He | In this paper, we propose a much faster queue-based graph distance (called q-distance). |
378 | Progressive-X: Efficient, Anytime, Multi-Model Fitting Algorithm | Daniel Barath, Jiri Matas | The Progressive-X algorithm, Prog-X in short, is proposed for geometric multi-model fitting. |
379 | Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection | Yingyue Xu, Dan Xu, Xiaopeng Hong, Wanli Ouyang, Rongrong Ji, Min Xu, Guoying Zhao | In this paper, we add message-passing between features and predictions and propose a deep unified CRF saliency model. |
380 | Selectivity or Invariance: Boundary-Aware Salient Object Detection | Jinming Su, Jia Li, Yu Zhang, Changqun Xia, Yonghong Tian | To address this selectivity-invariance dilemma, we propose a novel boundary-aware network with successive dilation for image-based SOD. |
381 | Online Unsupervised Learning of the 3D Kinematic Structure of Arbitrary Rigid Bodies | Urbano Miguel Nunes, Yiannis Demiris | In contrast, we propose to tackle this problem in an online unsupervised fashion, by recursively maintaining the metric distance of the scene’s 3D structure, while achieving real-time performance. |
382 | Few-Shot Generalization for Single-Image 3D Reconstruction via Priors | Bram Wallace, Bharath Hariharan | To address this problem, we present a new model architecture that reframes single-view 3D reconstruction as learnt, category agnostic refinement of a provided, category-specific prior. |
383 | Digging Into Self-Supervised Monocular Depth Estimation | Clement Godard, Oisin Mac Aodha, Michael Firman, Gabriel J. Brostow | In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. |
384 | Learning Object-Specific Distance From a Monocular Image | Jing Zhu, Yi Fang | Observing that the traditional inverse perspective mapping algorithm performs poorly for objects far away from the camera or on the curved road, in this paper, we address the challenging distance estimation problem by developing the first end-to-end learning-based model to directly predict distances for given objects in the images. |
385 | Unsupervised 3D Reconstruction Networks | Geonho Cha, Minsik Lee, Songhwai Oh | In this paper, we propose 3D unsupervised reconstruction networks (3D-URN), which reconstruct the 3D structures of instances in a given object category from their 2D feature points under an orthographic camera model. |
386 | 3D Point Cloud Generative Adversarial Network Based on Tree Structured Graph Convolutions | Dong Wook Shu, Sung Woo Park, Junseok Kwon | In this paper, we propose a novel generative adversarial network (GAN) for 3D point clouds generation, which is called tree-GAN. |
387 | Visualization of Convolutional Neural Networks for Monocular Depth Estimation | Junjie Hu, Yan Zhang, Takayuki Okatani | To cope with a difficulty with optimization through a deep CNN, we propose to use another network to predict those relevant image pixels in a forward computation. |
388 | Co-Separating Sounds of Visual Objects | Ruohan Gao, Kristen Grauman | We introduce a co-separation training paradigm that permits learning object-level sounds from unlabeled multi-source videos. |
389 | BMN: Boundary-Matching Network for Temporal Action Proposal Generation | Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, Shilei Wen | Based on BM mechanism, we propose an effective, efficient and end-to-end proposal generation method, named Boundary-Matching Network (BMN), which generates proposals with precise temporal boundaries as well as reliable confidence scores simultaneously. |
390 | Weakly Supervised Temporal Action Localization Through Contrast Based Evaluation Networks | Ziyi Liu, Le Wang, Qilin Zhang, Zhanning Gao, Zhenxing Niu, Nanning Zheng, Gang Hua | To address this challenge, we propose the Contrast-based Localization EvaluAtioN Network (CleanNet) with our new action proposal evaluator, which provides pseudo-supervision by leveraging the temporal contrast in snippet-level action classification predictions. |
391 | Progressive Sparse Local Attention for Video Object Detection | Chaoxu Guo, Bin Fan, Jie Gu, Qian Zhang, Shiming Xiang, Veronique Prinet, Chunhong Pan | Instead of relying on optical flow, this paper proposes a novel module called Progressive Sparse Local Attention (PSLA), which establishes the spatial correspondence between features across frames in a local region with progressively sparser stride and uses the correspondence to propagate features. |
392 | Reasoning About Human-Object Interactions Through Dual Attention Networks | Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou | In this work we propose a Dual Attention Network model which reasons about human-object interactions. |
393 | DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation | Xiaohui Zeng, Renjie Liao, Li Gu, Yuwen Xiong, Sanja Fidler, Raquel Urtasun | In this paper, we propose the differentiable mask-matching network (DMM-Net) for solving the video object segmentation problem where the initial object masks are provided. |
394 | Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query | Hao Wang, Cheng Deng, Junchi Yan, Dacheng Tao | To address these issues, we propose an asymmetric cross-guided attention network for actor and action video segmentation from natural language query. |
395 | AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation | Huaijia Lin, Xiaojuan Qi, Jiaya Jia | In this paper, we propose AGSS-VOS to segment multiple objects in one feed-forward path via instance-agnostic and instance-specific modules. |
396 | Global-Local Temporal Representations for Video Person Re-Identification | Jianing Li, Jingdong Wang, Qi Tian, Wen Gao, Shiliang Zhang | This paper proposes the Global-Local Temporal Representation (GLTR) to exploit the multi-scale temporal cues in video sequences for video person Re-Identification (ReID). |
397 | AdvIT: Adversarial Frames Identifier Based on Temporal Consistency in Videos | Chaowei Xiao, Ruizhi Deng, Bo Li, Taesung Lee, Benjamin Edwards, Jinfeng Yi, Dawn Song, Mingyan Liu, Ian Molloy | In this paper, we propose AdvIT, an efficient and effective method to detect adversarial frames within videos against different types of attacks, based on the temporal consistency property of videos. |
398 | RANet: Ranking Attention Network for Fast Video Object Segmentation | Ziqin Wang, Jun Xu, Li Liu, Fan Zhu, Ling Shao | In this paper, we develop a real-time yet very accurate Ranking Attention Network (RANet) for VOS. |
399 | Spatial-Temporal Relation Networks for Multi-Object Tracking | Jiarui Xu, Yue Cao, Zheng Zhang, Han Hu | In this paper, we present a unified framework for similarity measurement based on spatial-temporal relation network which could simultaneously encode various cues and perform reasoning across both spatial and temporal domains. |
400 | Bridging the Gap Between Detection and Tracking: A Unified Approach | Lianghua Huang, Xin Zhao, Kaiqi Huang | In this paper, instead of redesigning a new tracking-by-detection algorithm, we aim to explore a general framework for building trackers directly upon almost any advanced object detector. |
401 | Learning the Model Update for Siamese Trackers | Lichao Zhang, Abel Gonzalez-Garcia, Joost van de Weijer, Martin Danelljan, Fahad Shahbaz Khan | Therefore, we propose to replace the handcrafted update function with a method which learns to update. |
402 | Fast-deepKCF Without Boundary Effect | Linyu Zheng, Ming Tang, Yingying Chen, Jinqiao Wang, Hanqing Lu | In order to achieve real-time tracking speed while maintaining high localization accuracy, in this paper, we propose a novel CF tracker, fdKCF*, which casts aside the popular acceleration tool, i.e., fast Fourier transform, employed by all existing CF trackers, and exploits the inherent high-overlap among real (i.e., noncyclic) and dense samples to efficiently construct the kernel matrix. |
403 | Program-Guided Image Manipulators | Jiayuan Mao, Xiuming Zhang, Yikai Li, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu | In this paper, we present the Program-Guided Image Manipulator (PG-IM), inducing neuro-symbolic program-like representations to represent and manipulate images. |
404 | Calibration of Axial Fisheye Cameras Through Generic Virtual Central Models | Pierre-Andre Brousseau, Sebastien Roy | This paper proposes a new calibration method for large field of view cameras. |
405 | Micro-Baseline Structured Light | Vishwanath Saragadam, Jian Wang, Mohit Gupta, Shree Nayar | We propose Micro-baseline Structured Light (MSL), a novel 3D imaging approach designed for small form-factor devices such as cell-phones and miniature robots. |
406 | λ-Net: Reconstruct Hyperspectral Images From a Snapshot Measurement | Xin Miao, Xin Yuan, Yunchen Pu, Vassilis Athitsos | We propose the λ-net, which reconstructs hyperspectral images (e.g., with 24 spectral channels) from a single-shot measurement. |
407 | Deep Depth From Aberration Map | Masako Kashiwagi, Nao Mishima, Tatsuo Kozakaya, Shinsaku Hiura | In this work, we propose a novel method which realizes a single-shot deep depth measurement based on physical depth cue using only an off-the-shelf camera and lens. |
408 | A Dataset of Multi-Illumination Images in the Wild | Lukas Murmann, Michael Gharbi, Miika Aittala, Fredo Durand | We introduce a new multi-illumination dataset of more than 1000 real scenes, each captured in high dynamic range and high resolution, under 25 lighting conditions. |
409 | Monocular Neural Image Based Rendering With Continuous View Control | Xu Chen, Jie Song, Otmar Hilliges | We propose a method to produce a continuous stream of novel views under fine-grained (e.g., 1 degree step-size) camera control at interactive rates. |
410 | Multi-View Image Fusion | Marc Comino Trinidad, Ricardo Martin Brualla, Florian Kainz, Janne Kontkanen | We present a novel cascaded feature extraction method that enables us to synergetically learn optical flow at different resolution levels. |
411 | Enhancing Low Light Videos by Exploring High Sensitivity Camera Noise | Wei Wang, Xin Chen, Cheng Yang, Xiang Li, Xuemei Hu, Tao Yue | In this paper, we explore the physical origins of the practical high sensitivity noise in digital cameras, model them mathematically, and propose to enhance the low light videos based on the noise model by using an LSTM-based neural network. |
412 | Deep Restoration of Vintage Photographs From Scanned Halftone Prints | Qifan Gao, Xiao Shu, Xiaolin Wu | In this research, we adopt a novel strategy of two-stage deep learning, in which the restoration task is divided into two stages: the removal of printing artifacts and the inverse of halftoning. |
413 | Context-Aware Image Matting for Simultaneous Foreground and Alpha Estimation | Qiqi Hou, Feng Liu | This paper presents a context-aware natural image matting method for simultaneous foreground and alpha matte estimation. |
414 | CFSNet: Toward a Controllable Feature Space for Image Restoration | Wei Wang, Ruiming Guo, Yapeng Tian, Wenming Yang | This motivates us to exquisitely design a unified interactive framework for general image restoration tasks. |
415 | Deep Blind Hyperspectral Image Fusion | Wu Wang, Weihong Zeng, Yue Huang, Xinghao Ding, John Paisley | We propose a method for the blind HIF problem based on deep learning, where the estimation of the observation model and the fusion process are optimized iteratively and alternately during super-resolution reconstruction. |
416 | Fully Convolutional Pixel Adaptive Image Denoiser | Sungmin Cha, Taesup Moon | We propose a new image denoising algorithm, dubbed as Fully Convolutional Adaptive Image DEnoiser (FC-AIDE), that can learn from an offline supervised training set with a fully convolutional neural network as well as adaptively fine-tune the supervised model for each given noisy image. |
417 | Coherent Semantic Attention for Image Inpainting | Hongyu Liu, Bin Jiang, Yi Xiao, Chao Yang | To handle this problem, we investigate human behavior in repairing pictures and propose a refined deep generative model-based approach with a novel coherent semantic attention (CSA) layer, which not only preserves contextual structure but also makes more effective predictions of missing parts by modeling the semantic relevance between the hole features. |
418 | Embedded Block Residual Network: A Recursive Restoration Model for Single-Image Super-Resolution | Yajun Qiu, Ruxin Wang, Dapeng Tao, Jun Cheng | In this paper, we believe that the lower-frequency and higher-frequency information in images have different levels of complexity and should be restored by models of different representational capacity. |
419 | Fast Image Restoration With Multi-Bin Trainable Linear Units | Shuhang Gu, Wen Li, Luc Van Gool, Radu Timofte | In this paper we propose a novel activation function, the multi-bin trainable linear unit (MTLU), for increasing the nonlinear modeling capacity together with lighter and shallower networks. |
420 | Counting With Focus for Free | Zenglin Shi, Pascal Mettes, Cees G. M. Snoek | This paper aims to count arbitrary objects in images. |
421 | SynDeMo: Synergistic Deep Feature Alignment for Joint Learning of Depth and Ego-Motion | Behzad Bozorgtabar, Mohammad Saeed Rad, Dwarikanath Mahapatra, Jean-Philippe Thiran | In this work, we demonstrate the benefit of using geometric information from synthetic images, coupled with scene depth information, to recover the scale in depth and ego-motion estimation from monocular videos. |
422 | Diverse Image Synthesis From Semantic Layouts via Conditional IMLE | Ke Li, Tianhao Zhang, Jitendra Malik | In this paper, we focus on the problem of generating images from semantic segmentation maps and present a simple new method that can generate an arbitrary number of images with diverse appearance for the same semantic layout. |
423 | Towards Bridging Semantic Gap to Improve Semantic Segmentation | Yanwei Pang, Yazhao Li, Jianbing Shen, Ling Shao | To solve this problem, we explore two strategies for robust feature fusion. |
424 | Generating Diverse and Descriptive Image Captions Using Visual Paraphrases | Lixin Liu, Jiajun Tang, Xiaojun Wan, Zongming Guo | In this paper, aiming to improve the diversity and descriptiveness of generated image captions, we propose a model utilizing visual paraphrases (different sentences describing the same image) in captioning datasets. |
425 | Learning to Collocate Neural Modules for Image Captioning | Xu Yang, Hanwang Zhang, Jianfei Cai | To render existing encoder-decoder image captioners such human-like reasoning, we propose a novel framework: learning to Collocate Neural Modules (CNM), to generate the “inner pattern” connecting visual encoder and language decoder. |
426 | Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning | Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing | To address this concern, we propose Seq-CVAE which learns a latent space for every word. |
427 | Why Does a Visual Question Have Different Answers? | Nilavra Bhattacharya, Qing Li, Danna Gurari | We propose a taxonomy of nine plausible reasons, and create two labelled datasets consisting of 45,000 visual questions indicating which reasons led to answer differences. We then propose a novel problem of predicting directly from a visual question which reasons will cause answer differences as well as a novel algorithm for this purpose. |
428 | G3raphGround: Graph-Based Language Grounding | Mohit Bajaj, Lanjun Wang, Leonid Sigal | In this paper we present an end-to-end framework for grounding of phrases in images. |
429 | Scene Text Visual Question Answering | Ali Furkan Biten, Ruben Tito, Andres Mafla, Lluis Gomez, Marcal Rusinol, Ernest Valveny, C.V. Jawahar, Dimosthenis Karatzas | In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process. |
430 | Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM | Lu Sheng, Dan Xu, Wanli Ouyang, Xiaogang Wang | In this paper we tackle the joint learning problem of keyframe detection and visual odometry towards monocular visual SLAM systems. |
431 | MVSCRF: Learning Multi-View Stereo With Conditional Random Fields | Youze Xue, Jiansheng Chen, Weitao Wan, Yiqing Huang, Cheng Yu, Tianpeng Li, Jiayu Bao | We present a deep-learning architecture for multi-view stereo with conditional random fields (MVSCRF). |
432 | Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses | Eric Brachmann, Carsten Rother | We present Neural-Guided RANSAC (NG-RANSAC), an extension to the classic RANSAC algorithm from robust optimization. |
433 | Efficient Learning on Point Clouds With Basis Point Sets | Sergey Prokudin, Christoph Lassner, Javier Romero | In this work we propose basis point sets as a highly efficient and fully general way to process point clouds with machine learning algorithms. |
434 | Cross View Fusion for 3D Human Pose Estimation | Haibo Qiu, Chunyu Wang, Jingdong Wang, Naiyan Wang, Wenjun Zeng | We present an approach to recover absolute 3D human poses from multi-view images by incorporating multi-view geometric priors in our model. |
435 | Shape-Aware Human Pose and Shape Reconstruction Using Multi-View Images | Junbang Liang, Ming C. Lin | We propose a scalable neural network framework to reconstruct the 3D mesh of a human body from multi-view images, in the subspace of the SMPL model. |
436 | Monocular Piecewise Depth Estimation in Dynamic Scenes by Exploiting Superpixel Relations | Yan Di, Henrique Morimitsu, Shan Gao, Xiangyang Ji | In this paper, we propose a novel and specially designed method for piecewise dense monocular depth estimation in dynamic scenes. |
437 | Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization | Hajime Taira, Ignacio Rocco, Jiri Sedlar, Masatoshi Okutomi, Josef Sivic, Tomas Pajdla, Torsten Sattler, Akihiko Torii | In this paper, we thus focus on pose verification. |
438 | DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch | Shivam Duggal, Shenlong Wang, Wei-Chiu Ma, Rui Hu, Raquel Urtasun | Our goal is to significantly speed up the runtime of current state-of-the-art stereo algorithms to enable real-time inference. |
439 | Convolutional Sequence Generation for Skeleton-Based Action Synthesis | Sijie Yan, Zhizhong Li, Yuanjun Xiong, Huahan Yan, Dahua Lin | In this work, we aim to generate long actions represented as sequences of skeletons. |
440 | Onion-Peel Networks for Deep Video Completion | Seoung Wug Oh, Sungho Lee, Joon-Young Lee, Seon Joo Kim | We propose the onion-peel networks for video completion. |
441 | Copy-and-Paste Networks for Deep Video Inpainting | Sungho Lee, Seoung Wug Oh, DaeYeun Won, Seon Joo Kim | We present a novel deep learning based algorithm for video inpainting. |
442 | Content and Style Disentanglement for Artistic Style Transfer | Dmytro Kotovenko, Artsiom Sanakoyeu, Sabine Lang, Bjorn Ommer | We present a novel approach which captures particularities of style and the variations within and separates style and content. |
443 | Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? | Rameen Abdal, Yipeng Qin, Peter Wonka | We propose an efficient algorithm to embed a given image into the latent space of StyleGAN. |
444 | Controllable Artistic Text Style Transfer via Shape-Matching GAN | Shuai Yang, Zhangyang Wang, Zhaowen Wang, Ning Xu, Jiaying Liu, Zongming Guo | In this paper, we present the first text style transfer network that allows for real-time control of the crucial stylistic degree of the glyph through an adjustable parameter. |
445 | Understanding Generalized Whitening and Coloring Transform for Universal Style Transfer | Tai-Yin Chiu | In this report, we generalize ZCA to the general form of WCT, provide an analytical performance analysis from the angle of neural style transfer, and show why ZCA is a good choice for style transfer among different WCTs and why some WCTs are not well applicable for style transfer. |
446 | Learning Implicit Generative Models by Matching Perceptual Features | Cicero Nogueira dos Santos, Youssef Mroueh, Inkit Padhi, Pierre Dognin | More specifically, we propose a new effective MM approach that learns implicit generative models by performing mean and covariance matching of features extracted from pretrained ConvNets. |
447 | Free-Form Image Inpainting With Gated Convolution | Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang | We present a generative image inpainting system to complete images with free-form mask and guidance. |
448 | FiNet: Compatible and Diverse Fashion Image Inpainting | Xintong Han, Zuxuan Wu, Weilin Huang, Matthew R. Scott, Larry S. Davis | In this paper, we propose to explicitly model visual compatibility through fashion image inpainting. |
449 | InGAN: Capturing and Retargeting the “DNA” of a Natural Image | Assaf Shocher, Shai Bagon, Phillip Isola, Michal Irani | In this paper we propose an “Internal GAN” (InGAN) — an image-specific GAN — which trains on a single input image and learns its internal distribution of patches. |
450 | Seeing What a GAN Cannot Generate | David Bau, Jun-Yan Zhu, Jonas Wulff, William Peebles, Hendrik Strobelt, Bolei Zhou, Antonio Torralba | In this work, we visualize mode collapse at both the distribution level and the instance level. |
451 | COCO-GAN: Generation by Parts via Conditional Coordinating | Chieh Hubert Lin, Chia-Che Chang, Yu-Sheng Chen, Da-Cheng Juan, Wei Wei, Hwann-Tzong Chen | Inspired by such behavior and the fact that machines also have computational constraints, we propose COnditional COordinate GAN (COCO-GAN) of which the generator generates images by parts based on their spatial coordinates as the condition. |
452 | Neural Turtle Graphics for Modeling City Road Layouts | Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, Antonio Torralba, Sanja Fidler | We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts. |
453 | Texture Fields: Learning Texture Representations in Function Space | Michael Oechsle, Lars Mescheder, Michael Niemeyer, Thilo Strauss, Andreas Geiger | In this paper, we propose Texture Fields, a novel texture representation which is based on regressing a continuous 3D function parameterized with a neural network. |
454 | PointFlow: 3D Point Cloud Generation With Continuous Normalizing Flows | Guandao Yang, Xun Huang, Zekun Hao, Ming-Yu Liu, Serge Belongie, Bharath Hariharan | This paper proposes a principled probabilistic framework to generate 3D point clouds by modeling them as a distribution of distributions. |
455 | Meta-Sim: Learning to Generate Synthetic Datasets | Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, Sanja Fidler | The goal of our work is to automatically synthesize labeled datasets that are relevant for a downstream task. |
456 | Specifying Object Attributes and Relations in Interactive Scene Generation | Oron Ashual, Lior Wolf | We introduce a method for the generation of images from an input scene graph. |
457 | SinGAN: Learning a Generative Model From a Single Natural Image | Tamar Rott Shaham, Tali Dekel, Tomer Michaeli | We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. |
458 | VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research | Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang | We present a new large-scale multilingual video description dataset, VATEX, which contains over 41,250 videos and 825,000 captions in both English and Chinese. |
459 | A Graph-Based Framework to Bridge Movies and Synopses | Yu Xiong, Qingqiu Huang, Lingfeng Guo, Hang Zhou, Bolei Zhou, Dahua Lin | To facilitate the efforts along this direction, we construct a dataset called Movie Synopses Associations (MSA) over 327 movies, which provides a synopsis for each movie, together with annotated associations between synopsis paragraphs and movie segments. On top of this dataset, we develop a framework to perform matching between movie segments and synopsis paragraphs. |
460 | From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason | Ajeet Kumar Singh, Anand Mishra, Shashank Shekhar, Anirban Chakraborty | In this work, we present a VQA model which can read scene texts and perform reasoning on a knowledge graph to arrive at an accurate answer. |
461 | Counterfactual Critic Multi-Agent Training for Scene Graph Generation | Long Chen, Hanwang Zhang, Jun Xiao, Xiangnan He, Shiliang Pu, Shih-Fu Chang | To this end, we propose a Counterfactual critic Multi-Agent Training (CMAT) approach. |
462 | Robust Change Captioning | Dong Huk Park, Trevor Darrell, Anna Rohrbach | We present a novel Dual Dynamic Attention Model (DUDA) to perform robust Change Captioning. To study the problem in depth, we collect a CLEVR-Change dataset, built off the CLEVR engine, with 5 types of scene changes. |
463 | Attention on Attention for Image Captioning | Lun Huang, Wenmin Wang, Jie Chen, Xiao-Yong Wei | In this paper, we propose an Attention on Attention (AoA) module, which extends the conventional attention mechanisms to determine the relevance between attention results and queries. |
464 | Dynamic Graph Attention for Referring Expression Comprehension | Sibei Yang, Guanbin Li, Yizhou Yu | In this paper, we explore the problem of referring expression comprehension from the perspective of language-driven visual reasoning, and propose a dynamic graph attention network to perform multi-step reasoning by modeling both the relationships among the objects in the image and the linguistic structure of the expression. |
465 | Visual Semantic Reasoning for Image-Text Matching | Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu | To address this issue, we propose a simple and interpretable reasoning model to generate visual representation that captures key objects and semantic concepts of a scene. |
466 | Phrase Localization Without Paired Training Examples | Josiah Wang, Lucia Specia | We postulate that such paired annotations are unnecessary, and propose the first method for the phrase localization problem where neither training procedure nor paired, task-specific data is required. |
467 | Learning to Assemble Neural Module Tree Networks for Visual Grounding | Daqing Liu, Hanwang Zhang, Feng Wu, Zheng-Jun Zha | In this paper, we propose to ground natural language in an intuitive, explainable, and composite fashion as it should be. |
468 | A Fast and Accurate One-Stage Approach to Visual Grounding | Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, Jiebo Luo | We propose a simple, fast, and accurate one-stage approach to visual grounding, inspired by the following insight. |
469 | Zero-Shot Grounding of Objects From Natural Language Queries | Arka Sadhu, Kan Chen, Ram Nevatia | We propose a new single-stage model called ZSGNet which combines the detector network and the grounding system and predicts classification scores and regression parameters. We also introduce new datasets, sub-sampled from Flickr30k Entities and Visual Genome, that enable evaluations for the four conditions. |
470 | Towards Unconstrained End-to-End Text Spotting | Siyang Qin, Alessandro Bissacco, Michalis Raptis, Yasuhisa Fujii, Ying Xiao | We propose an end-to-end trainable network that can simultaneously detect and recognize text of arbitrary shape, making substantial progress on the open problem of reading scene text of irregular shape. |
471 | What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis | Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee | This paper addresses this difficulty with three major contributions. First, we examine the inconsistencies of training and evaluation datasets, and the performance gap that results from these inconsistencies. Second, we introduce a unified four-stage STR framework that most existing STR models fit into. |
472 | Sparse and Imperceivable Adversarial Attacks | Francesco Croce, Matthias Hein | We propose a new black-box technique to craft adversarial examples aiming at minimizing l_0-distance to the original image. |
473 | Enhancing Adversarial Example Transferability With an Intermediate Level Attack | Qian Huang, Isay Katsman, Horace He, Zeqi Gu, Serge Belongie, Ser-Nam Lim | We introduce the Intermediate Level Attack (ILA), which attempts to fine-tune an existing adversarial example for greater black-box transferability by increasing its perturbation on a pre-specified layer of the source model, improving upon state-of-the-art methods. |
474 | Implicit Surface Representations As Layers in Neural Networks | Mateusz Michalkiewicz, Jhony K. Pontes, Dominic Jack, Mahsa Baktashmotlagh, Anders Eriksson | To overcome this limitation we propose a novel formulation that permits the use of implicit representations of curves and surfaces, of arbitrary topology, as individual layers in Neural Network architectures with end-to-end trainability. |
475 | A Tour of Convolutional Networks Guided by Linear Interpreters | Pablo Navarrete Michelini, Hanwen Liu, Yunhua Lu, Xingqun Jiang | We introduce a hooking layer, called a LinearScope, which allows us to run the network and the linear interpreter in parallel. |
476 | Small Steps and Giant Leaps: Minimal Newton Solvers for Deep Learning | Joao F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi | We propose a fast second-order method that can be used as a drop-in replacement for current deep learning solvers. |
477 | Semantic Adversarial Attacks: Parametric Transformations That Fool Deep Classifiers | Ameya Joshi, Amitangshu Mukherjee, Soumik Sarkar, Chinmay Hegde | In this paper, we consider a different setting: what happens if the adversary could only alter specific attributes of the input image? |
478 | Hilbert-Based Generative Defense for Adversarial Examples | Yang Bai, Yan Feng, Yisen Wang, Tao Dai, Shu-Tao Xia, Yong Jiang | Therefore, we propose a more advanced Hilbert curve scan order to model the pixel dependencies in this paper. |
479 | On the Efficacy of Knowledge Distillation | Jang Hyun Cho, Bharath Hariharan | In this paper, we present a thorough evaluation of the efficacy of knowledge distillation and its dependence on student and teacher architectures. |
480 | Sym-Parameterized Dynamic Inference for Mixed-Domain Image Translation | Simyung Chang, SeongUk Park, John Yang, Nojun Kwak | We propose a method to expand the concept of "multi-domain" from data to the loss area, and to combine the characteristics of each domain to create an image. |
481 | Better and Faster: Exponential Loss for Image Patch Matching | Shuang Wang, Yanfeng Li, Xuefeng Liang, Dou Quan, Bowu Yang, Shaowei Wei, Licheng Jiao | To assist the exponential losses, we introduce the hard positive sample mining to further enhance the effectiveness. |
482 | Physical Adversarial Textures That Fool Visual Object Tracking | Rey Reza Wiyatno, Anqi Xu | We present a method for creating inconspicuous-looking textures that, when displayed as posters in the physical world, cause visual object tracking systems to become confused. |
483 | Wasserstein GAN With Quadratic Transport Cost | Huidong Liu, Xianfeng Gu, Dimitris Samaras | In this paper, we propose WGAN-QC, a WGAN with quadratic transport cost. |
484 | Scalable Verified Training for Provably Robust Image Classification | Sven Gowal, Krishnamurthy (Dj) Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, Pushmeet Kohli | Through a comprehensive analysis, we show how a simple bounding technique, interval bound propagation (IBP), can be exploited to train large provably robust neural networks that beat the state-of-the-art in verified accuracy. |
485 | Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks | Ruihao Gong, Xianglong Liu, Shenghu Jiang, Tianxiang Li, Peng Hu, Jiazhen Lin, Fengwei Yu, Junjie Yan | To address this problem, in this paper we propose Differentiable Soft Quantization (DSQ) to bridge the gap between the full-precision and low-bit networks. |
486 | The LogBarrier Adversarial Attack: Making Effective Use of Decision Boundary Information | Chris Finlay, Aram-Alexandre Pooladian, Adam Oberman | We design a new untargeted attack, based on these best practices, using the well-regarded logarithmic barrier method. |
487 | Proximal Mean-Field for Neural Network Quantization | Thalaiyasingam Ajanthan, Puneet K. Dokania, Richard Hartley, Philip H. S. Torr | In this work, we cast NN quantization as a discrete labelling problem, and by examining relaxations, we design an efficient iterative optimization procedure that involves stochastic gradient descent followed by a projection. |
488 | Improving Adversarial Robustness via Guided Complement Entropy | Hao-Yun Chen, Jhao-Hong Liang, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan | In this paper, we propose a new training paradigm called Guided Complement Entropy (GCE) that is capable of achieving “adversarial defense for free,” which involves no additional procedures in the process of improving adversarial robustness. |
489 | A Geometry-Inspired Decision-Based Attack | Yujia Liu, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard | In this paper, we propose qFool, a novel decision-based attack algorithm that can generate adversarial examples using a small number of queries. |
490 | Universal Perturbation Attack Against Image Retrieval | Jie Li, Rongrong Ji, Hong Liu, Xiaopeng Hong, Yue Gao, Qi Tian | To this end, we propose a novel method to generate retrieval-against UAP to break the neighbourhood relationships of image features via degrading the corresponding ranking metric. |
491 | Bayesian Optimized 1-Bit CNNs | Jiaxin Gu, Junhe Zhao, Xiaolong Jiang, Baochang Zhang, Jianzhuang Liu, Guodong Guo, Rongrong Ji | In this paper, we propose a novel approach, called Bayesian optimized 1-bit CNNs (denoted as BONNs), taking the advantage of Bayesian learning, a well-established strategy for hard problems, to significantly improve the performance of extreme 1-bit CNNs. |
492 | Rethinking ImageNet Pre-Training | Kaiming He, Ross Girshick, Piotr Dollar | We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization. |
493 | Defending Against Universal Perturbations With Shared Adversarial Training | Chaithanya Kumar Mummadi, Thomas Brox, Jan Hendrik Metzen | In this work, we show that adversarial training is more effective in preventing universal perturbations, where the same perturbation needs to fool a classifier on many inputs. |
494 | Adaptive Activation Thresholding: Dynamic Routing Type Behavior for Interpretability in Convolutional Neural Networks | Yiyou Sun, Sathya N. Ravi, Vikas Singh | In this paper, we show how a simple modification of the SGD scheme can help provide dynamic/EM routing type behavior in convolutional neural networks. |
495 | XRAI: Better Attributions Through Regions | Andrei Kapishnikov, Tolga Bolukbasi, Fernanda Viegas, Michael Terry | In this paper, we 1) present a novel region-based attribution method, XRAI, that builds upon integrated gradients (Sundararajan et al. 2017), 2) introduce evaluation methods for empirically assessing the quality of image-based saliency maps (Performance Information Curves (PICs)), and 3) contribute an axiom-based sanity check for attribution methods. |
496 | Guessing Smart: Biased Sampling for Efficient Black-Box Adversarial Attacks | Thomas Brunner, Frederik Diehl, Michael Truong Le, Alois Knoll | We consider adversarial examples for image classification in the black-box decision-based setting. |
497 | Mask-Guided Attention Network for Occluded Pedestrian Detection | Yanwei Pang, Jin Xie, Muhammad Haris Khan, Rao Muhammad Anwer, Fahad Shahbaz Khan, Ling Shao | We propose an approach for occluded pedestrian detection with the following contributions. |
498 | Spectral Feature Transformation for Person Re-Identification | Chuanchen Luo, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang | To relieve the issue, we propose to formulate the whole data batch as a similarity graph. |
499 | Permutation-Invariant Feature Restructuring for Correlation-Aware Image Set-Based Recognition | Xiaofeng Liu, Zhenhua Guo, Site Li, Lingsheng Kong, Ping Jia, Jane You, B.V.K. Vijaya Kumar | We consider the problem of comparing the similarity of image sets with variable-quantity, quality and un-ordered heterogeneous images. |
500 | Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization | Chufeng Tang, Lu Sheng, Zhaoxiang Zhang, Xiaolin Hu | We propose a flexible Attribute Localization Module (ALM) to adaptively discover the most discriminative regions and learns the regional features for each attribute at multiple levels. |
501 | Correlation Congruence for Knowledge Distillation | Baoyun Peng, Xiao Jin, Jiaheng Liu, Dongsheng Li, Yichao Wu, Yu Liu, Shunfeng Zhou, Zhaoning Zhang | In this work, we propose a new framework named correlation congruence for knowledge distillation (CCKD), which transfers not only the instance-level information but also the correlation between instances. |
502 | Dynamic Curriculum Learning for Imbalanced Data Classification | Yiru Wang, Weihao Gan, Jie Yang, Wei Wu, Junjie Yan | To address this problem, we propose a unified framework called Dynamic Curriculum Learning (DCL) to adaptively adjust the sampling strategy and loss weight in each batch, which results in better ability of generalization and discrimination. |
503 | Video Face Clustering With Unknown Number of Clusters | Makarand Tapaswi, Marc T. Law, Sanja Fidler | To this end, we propose Ball Cluster Learning (BCL), a supervised approach to carve the embedding space into balls of equal size, one for each cluster. |
504 | Targeted Mismatch Adversarial Attack: Query With a Flower to Retrieve the Tower | Giorgos Tolias, Filip Radenovic, Ondrej Chum | We introduce the concept of targeted mismatch attack for deep learning based retrieval systems to generate an adversarial image to conceal the query image. |
505 | Fashion++: Minimal Edits for Outfit Improvement | Wei-Lin Hsiao, Isay Katsman, Chao-Yuan Wu, Devi Parikh, Kristen Grauman | We introduce Fashion++, an approach that proposes minimal adjustments to a full-body clothing outfit that will have maximal impact on its fashionability. |
506 | Semi-Supervised Pedestrian Instance Synthesis and Detection With Mutual Reinforcement | Si Wu, Sihao Lin, Wenhao Wu, Mohamed Azzam, Hau-San Wong | We propose a GAN-based scene-specific instance synthesis and classification model for semi-supervised pedestrian detection. |
507 | SILCO: Show a Few Images, Localize the Common Object | Tao Hu, Pascal Mettes, Jia-Hong Huang, Cees G. M. Snoek | In this work, we propose a new task along this research direction, which we call few-shot common-localization. |
508 | A Deep Step Pattern Representation for Multimodal Retinal Image Registration | Jimmy Addison Lee, Peng Liu, Jun Cheng, Huazhu Fu | This paper presents a novel feature-based method that is built upon a convolutional neural network (CNN) to learn the deep representation for multimodal retinal image registration. |
509 | Deep Graphical Feature Learning for the Feature Matching Problem | Zhen Zhang, Wee Sun Lee | In this paper, we address this problem by proposing a graph neural network model to transform coordinates of feature points into local features. |
510 | Minimum Delay Object Detection From Video | Dong Lao, Ganesh Sundaramoorthi | We consider the problem of detecting objects, as they come into view, from videos in an online fashion. |
511 | Learning With Average Precision: Training Image Retrieval With a Listwise Loss | Jerome Revaud, Jon Almazan, Rafael S. Rezende, Cesar Roberto de Souza | In this paper we propose instead to directly optimize the global mAP by leveraging recent advances in listwise loss formulations. |
512 | Learning to Find Common Objects Across Few Image Collections | Amirreza Shaban, Amir Rahimi, Shray Bansal, Stephen Gould, Byron Boots, Richard Hartley | Given a collection of bags where each bag is a set of images, our goal is to select one image from each bag such that the selected images are from the same object class. |
513 | Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection | Lu Zhang, Xiangyu Zhu, Xiangyu Chen, Xu Yang, Zhen Lei, Zhiyong Liu | In this paper, we propose a novel Aligned Region CNN (AR-CNN) to handle the weakly aligned multispectral data in an end-to-end way. |
514 | Deep Self-Learning From Noisy Labels | Jiangfan Han, Ping Luo, Xiaogang Wang | Unlike previous works constrained by many conditions, making them infeasible to real noisy cases, this work presents a novel deep self-learning framework to train a robust network on the real noisy datasets without extra supervision. |
515 | DSConv: Efficient Convolution Operator | Marcelo Gennari do Nascimento, Roger Fawcett, Victor Adrian Prisacariu | We introduce DSConv, a flexible quantized convolution operator that replaces single-precision operations with their far less expensive integer counterparts, while maintaining the probability distributions over both the kernel weights and the outputs. |
516 | Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once | Jiangfan Han, Xiaoyi Dong, Ruimao Zhang, Dongdong Chen, Weiming Zhang, Nenghai Yu, Ping Luo, Xiaogang Wang | In this paper, we propose the first Multi-target Adversarial Network (MAN), which can generate multi-target adversarial samples with a single model. |
517 | Explicit Shape Encoding for Real-Time Instance Segmentation | Wenqiang Xu, Haiyang Wang, Fubo Qi, Cewu Lu | In this paper, we propose a novel top-down instance segmentation framework based on explicit shape encoding, named ESE-Seg. |
518 | IMP: Instance Mask Projection for High Accuracy Semantic Segmentation of Things | Cheng-Yang Fu, Tamara L. Berg, Alexander C. Berg | In this work, we present a new operator, called Instance Mask Projection (IMP), which projects a predicted instance segmentation as a new feature for semantic segmentation. |
519 | Video Instance Segmentation | Linjie Yang, Yuchen Fan, Ning Xu | In this paper we present a new computer vision task, named video instance segmentation. To facilitate research on this new task, we propose a large-scale benchmark called YouTube-VIS, which consists of 2,883 high-resolution YouTube videos, a 40-category label set and 131k high-quality instance masks. |
520 | Attention Bridging Network for Knowledge Transfer | Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu | In this paper, we use knowledge from the source domain to guide the network’s response to categories shared with the target domain. |
521 | Self-Supervised Difference Detection for Weakly-Supervised Semantic Segmentation | Wataru Shimoda, Keiji Yanai | In this paper, to make the most of such mapping functions, we assume that the results of the mapping function include noise, and we improve the accuracy by removing noise. |
522 | SPGNet: Semantic Prediction Guidance for Scene Parsing | Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Honghui Shi | In this work, we propose a Semantic Prediction Guidance (SPG) module which learns to re-weight the local features through the guidance from pixel-wise semantic prediction. |
523 | Gated-SCNN: Gated Shape CNNs for Semantic Segmentation | Towaki Takikawa, David Acuna, Varun Jampani, Sanja Fidler | Here, we propose a new two-stream CNN architecture for semantic segmentation that explicitly wires shape information as a separate processing branch, i.e. shape stream, that processes information in parallel to the classical stream. |
524 | DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing | Yongcheng Liu, Bin Fan, Gaofeng Meng, Jiwen Lu, Shiming Xiang, Chunhong Pan | Here we propose DensePoint, a general architecture to learn densely contextual representation for point cloud processing. |
525 | AMP: Adaptive Masked Proxies for Few-Shot Segmentation | Mennatullah Siam, Boris N. Oreshkin, Martin Jagersand | We propose a novel adaptive masked proxies method that constructs the final segmentation layer weights from few labelled samples. |
526 | Universal Semi-Supervised Semantic Segmentation | Tarun Kalluri, Girish Varma, Manmohan Chandraker, C.V. Jawahar | In this paper, we pose the novel problem of universal semi-supervised semantic segmentation and propose a solution framework, to meet the dual needs of lower annotation and deployment costs. |
527 | Accelerate Learning of Deep Hashing With Gradient Attention | Long-Kai Huang, Jianda Chen, Sinno Jialin Pan | To address this issue, we propose a new deep hashing model integrated with a novel gradient attention mechanism. |
528 | SVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval | Qing-Yuan Jiang, Yi He, Gen Li, Jian Lin, Lei Li, Wu-Jun Li | In this paper, we introduce a large-scale short video dataset, called SVD, for the NDVR task. |
529 | Block Annotation: Better Image Annotation With Sub-Image Decomposition | Hubert Lin, Paul Upchurch, Kavita Bala | To recover the necessary global structure for applications such as characterizing spatial context and affordance relationships, we propose an effective method to inpaint block-annotated images with high-quality labels without additional human effort. |
530 | Probabilistic Deep Ordinal Regression Based on Gaussian Processes | Yanzhu Liu, Fan Wang, Adams Wai Kin Kong | This paper adapts traditional GP regression to the ordinal regression problem by using both conjugate and non-conjugate ordinal likelihoods. |
531 | Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations | Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, Vicente Ordonez | In this work, we present a framework to measure and mitigate intrinsic biases with respect to protected variables, such as gender, in visual recognition tasks. |
532 | Teacher Guided Architecture Search | Pouya Bashivan, Mark Tensen, James J. DiCarlo | As one step toward this goal, we use representational similarity analysis to evaluate the similarity of internal activations of candidate networks with those of a (fixed, high performing) teacher network. |
533 | FACSIMILE: Fast and Accurate Scans From an Image in Less Than a Second | David Smith, Matthew Loper, Xiaochen Hu, Paris Mavroidis, Javier Romero | We propose FACSIMILE (FAX), a method that estimates a detailed body from a single photo, lowering the bar for creating virtual representations of humans. |
534 | Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild | Yu Rong, Ziwei Liu, Cheng Li, Kaidi Cao, Chen Change Loy | In this work, we aim to perform a comprehensive study on cost and effectiveness trade-off between different annotations. |
535 | Human Mesh Recovery From Monocular Images via a Skeleton-Disentangled Representation | Yu Sun, Yun Ye, Wu Liu, Wenpeng Gao, Yili Fu, Tao Mei | We describe an end-to-end method for recovering 3D human body mesh from single images and monocular videos. |
536 | Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture From Images “In the Wild” | Silvia Zuffi, Angjoo Kanazawa, Tanya Berger-Wolf, Michael J. Black | We present the first method to perform automatic 3D pose, shape and texture capture of animals from images acquired in-the-wild. |
537 | Object-Driven Multi-Layer Scene Decomposition From a Single Image | Helisa Dhamo, Nassir Navab, Federico Tombari | We present a method that tackles the challenge of predicting color and depth behind the visible content of an image. |
538 | Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics | Michael Niemeyer, Lars Mescheder, Michael Oechsle, Andreas Geiger | In this work, we present Occupancy Flow, a novel spatio-temporal representation of time-varying 3D geometry with implicit correspondences. |
539 | Joint Monocular 3D Vehicle Detection and Tracking | Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krahenbuhl, Trevor Darrell, Fisher Yu | In this paper, we propose a novel online framework for 3D vehicle detection and tracking from monocular videos. |
540 | Fingerspelling Recognition in the Wild With Iterative Visual Attention | Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Diane Brentari, Greg Shakhnarovich, Karen Livescu | In this paper we focus on recognition of fingerspelling sequences in American Sign Language (ASL) videos collected in the wild, mainly from YouTube and Deaf social media. We also introduce a newly collected data set of crowdsourced annotations of fingerspelling in the wild, and show that performance can be further improved with this additional data set. |
541 | PointAE: Point Auto-Encoder for 3D Statistical Shape and Texture Modelling | Hang Dai, Ling Shao | In this paper, we propose a Point Auto-Encoder (PointAE) with skip-connection, attention blocks for 3D statistical shape modelling directly on 3D points. |
542 | Multi-Garment Net: Learning to Dress 3D People From Images | Bharat Lal Bhatnagar, Garvita Tiwari, Christian Theobalt, Gerard Pons-Moll | We present Multi-Garment Network (MGN), a method to predict body shape and clothing, layered on top of the SMPL model from a few frames (1-8) of a video. |
543 | Skeleton-Aware 3D Human Shape Reconstruction From Point Clouds | Haiyong Jiang, Jianfei Cai, Jianmin Zheng | Particularly, we develop an end-to-end framework, where we propose a graph aggregation module to augment PointNet++ by extracting better point features, an attention module to better map unordered point features into ordered skeleton joint features, and a skeleton graph module to extract better joint features for SMPL parameter regression. |
544 | AMASS: Archive of Motion Capture As Surface Shapes | Naureen Mahmood, Nima Ghorbani, Nikolaus F. Troje, Gerard Pons-Moll, Michael J. Black | To address this, we introduce AMASS, a large and varied database of human motion that unifies 15 different optical marker-based mocap datasets by representing them within a common framework and parameterization. |
545 | Person-in-WiFi: Fine-Grained Person Perception Using WiFi | Fei Wang, Sanping Zhou, Stanislav Panev, Jinsong Han, Dong Huang | In this paper, we take one step forward to show that fine-grained person perception is possible even with 1D sensors: WiFi antennas. |
546 | FAB: A Robust Facial Landmark Detection Framework for Motion-Blurred Videos | Keqiang Sun, Wayne Wu, Tinghao Liu, Shuo Yang, Quan Wang, Qiang Zhou, Zuochang Ye, Chen Qian | In this paper, we propose a framework named FAB that takes advantage of structure consistency in the temporal dimension for facial landmark detection in motion-blurred videos. |
547 | Attentional Feature-Pair Relation Networks for Accurate Face Recognition | Bong-Nam Kang, Yonghyun Kim, Bongjin Jun, Daijin Kim | In this paper, we propose a novel face recognition method, called Attentional Feature-pair Relation Network (AFRN), which represents the face by the relevant pairs of local appearance block features with their attention scores. |
548 | Action Recognition With Spatial-Temporal Discriminative Filter Banks | Brais Martinez, Davide Modolo, Yuanjun Xiong, Joseph Tighe | In this work we focus on how to improve the representation capacity of the network, but rather than altering the backbone, we focus on improving the last layers of the network, where changes have low impact in terms of computational cost. With the proposed approach, we obtain state-of-the-art performance on Kinetics-400 and Something-Something-V1, the two major large-scale action recognition benchmarks. |
549 | EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition | Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen | We focus on multi-modal fusion for egocentric action recognition, and propose a novel architecture for multi-modal temporal-binding, i.e. the combination of modalities within a range of temporal offsets. |
550 | Weakly-Supervised Action Localization With Background Modeling | Phuc Xuan Nguyen, Deva Ramanan, Charless C. Fowlkes | We describe a latent approach that learns to detect actions in long sequences given training videos with only whole-video class labels. |
551 | Grouped Spatial-Temporal Aggregation for Efficient Action Recognition | Chenxu Luo, Alan L. Yuille | In this paper, we propose a novel decomposition method that decomposes the feature channels into spatial and temporal groups in parallel. |
552 | Temporal Structure Mining for Weakly Supervised Action Detection | Tan Yu, Zhou Ren, Yuncheng Li, Enxu Yan, Ning Xu, Junsong Yuan | To alleviate this problem in WSAD, we propose the temporal structure mining (TSM) approach. |
553 | Temporal Recurrent Networks for Online Action Detection | Mingze Xu, Mingfei Gao, Yi-Ting Chen, Larry S. Davis, David J. Crandall | In this paper, we propose a novel framework, the Temporal Recurrent Network (TRN), to model greater temporal context of each frame by simultaneously performing online action detection and anticipation of the immediate future. |
554 | StartNet: Online Detection of Action Start in Untrimmed Videos | Mingfei Gao, Mingze Xu, Larry S. Davis, Richard Socher, Caiming Xiong | We propose StartNet to address Online Detection of Action Start (ODAS) where action starts and their associated categories are detected in untrimmed, streaming videos. |
555 | Video Classification With Channel-Separated Convolutional Networks | Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli | This paper studies the effects of different design choices in 3D group convolutional networks for video classification. |
556 | Predicting the Future: A Jointly Learnt Model for Action Anticipation | Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes | Inspired by human neurological structures for action anticipation, we present an action anticipation model that enables the prediction of plausible future actions by forecasting both the visual and temporal future. |
557 | Human-Aware Motion Deblurring | Ziyi Shen, Wenguan Wang, Xiankai Lu, Jianbing Shen, Haibin Ling, Tingfa Xu, Ling Shao | This paper proposes a human-aware deblurring model that disentangles the motion blur between foreground (FG) humans and background (BG). To further benefit the research towards Human-aware Image Deblurring, we introduce a large-scale dataset, named HIDE, which consists of 8,422 blurry and sharp image pairs with 65,784 densely annotated FG human bounding boxes. |
558 | Fast Video Object Segmentation via Dynamic Targeting Network | Lu Zhang, Zhe Lin, Jianming Zhang, Huchuan Lu, You He | We propose a new model for fast and accurate video object segmentation. |
559 | Solving Vision Problems via Filtering | Sean I. Young, Aous T. Naman, Bernd Girod, David Taubman | We propose a new, filtering approach for solving a large number of regularized inverse problems commonly found in computer vision. |
560 | GAN-Based Projector for Faster Recovery With Convergence Guarantees in Linear Inverse Problems | Ankit Raj, Yuqi Li, Yoram Bresler | Here, we propose a new method of deploying a GAN-based prior to solve linear inverse problems using projected gradient descent (PGD). |
561 | Scoot: A Perceptual Metric for Facial Sketches | Deng-Ping Fan, ShengChuan Zhang, Yu-Huan Wu, Yun Liu, Ming-Ming Cheng, Bo Ren, Paul L. Rosin, Rongrong Ji | In this paper, we design a perceptual metric, called Structure Co-Occurrence Texture (Scoot), which simultaneously considers the block-level spatial structure and co-occurrence texture statistics. |
562 | Learning Filter Basis for Convolutional Neural Network Compression | Yawei Li, Shuhang Gu, Luc Van Gool, Radu Timofte | Thus, in this paper, we try to reduce the number of parameters of CNNs by learning a basis of the filters in convolutional layers. |
563 | End-to-End Learning of Representations for Asynchronous Event-Based Data | Daniel Gehrig, Antonio Loquercio, Konstantinos G. Derpanis, Davide Scaramuzza | In this work, we introduce a general framework to convert event streams into grid-based representations by means of strictly differentiable operations. |
564 | ERL-Net: Entangled Representation Learning for Single Image De-Raining | Guoqing Wang, Changming Sun, Arcot Sowmya | In this paper, we hypothesize that there exists an inherent mapping between the low-quality embedding and a latent optimal one, with which the generator (decoder) can produce much better results. |
565 | Perceptual Deep Depth Super-Resolution | Oleg Voynov, Alexey Artemov, Vage Egiazarian, Alexander Notchenko, Gleb Bobrovskikh, Evgeny Burnaev, Denis Zorin | The main idea of our approach is to measure the quality of depth map upsampling using renderings of resulting 3D surfaces. |
566 | 3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera | Iro Armeni, Zhi-Yang He, JunYoung Gwak, Amir R. Zamir, Martin Fischer, Jitendra Malik, Silvio Savarese | To alleviate this we devise a semi-automatic framework that employs existing detection methods and enhances them using two main constraints: I. framing of query images sampled on panoramas to maximize the performance of 2D detectors, and II. |
567 | Floorplan-Jigsaw: Jointly Estimating Scene Layout and Aligning Partial Scans | Cheng Lin, Changjian Li, Wenping Wang | We present a novel approach to align partial 3D reconstructions which may not have substantial overlap. |
568 | Enforcing Geometric Constraints of Virtual Normal for Depth Prediction | Wei Yin, Yifan Liu, Chunhua Shen, Youliang Yan | In this work, we show the importance of the high-order 3D geometric constraints for depth prediction. |
569 | Deep Contextual Attention for Human-Object Interaction Detection | Tiancai Wang, Rao Muhammad Anwer, Muhammad Haris Khan, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Jorma Laaksonen | We propose a contextual attention framework for human-object interaction detection. |
570 | Learning Compositional Neural Information Fusion for Human Parsing | Wenguan Wang, Zhijie Zhang, Siyuan Qi, Jianbing Shen, Yanwei Pang, Ling Shao | This work proposes to combine neural networks with the compositional hierarchy of human bodies for efficient and complete human parsing. |
571 | Attentional Neural Fields for Crowd Counting | Anran Zhang, Lei Yue, Jiayi Shen, Fan Zhu, Xiantong Zhen, Xianbin Cao, Ling Shao | In this paper, we propose the Attentional Neural Field (ANF) for crowd counting via density estimation. |
572 | Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning | Lifeng Fan, Wenguan Wang, Siyuan Huang, Xinyu Tang, Song-Chun Zhu | Together with VACATION, we propose a spatio-temporal graph neural network to explicitly represent the diverse gaze interactions in the social scenes and to infer atomic-level gaze communication by message passing. |
573 | Controllable Attention for Structured Layered Video Decomposition | Jean-Baptiste Alayrac, Joao Carreira, Relja Arandjelovic, Andrew Zisserman | The objective of this paper is to be able to separate a video into its natural layers, and to control which of the separated layers to attend to. |
574 | GANalyze: Toward Visual Definitions of Cognitive Image Properties | Lore Goetschalckx, Alex Andonian, Aude Oliva, Phillip Isola | We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability. |
575 | Saliency-Guided Attention Network for Image-Sentence Matching | Zhong Ji, Haoran Wang, Jungong Han, Yanwei Pang | Unlike previous approaches that predominantly deploy symmetrical architecture to represent both modalities, we introduce a Saliency-guided Attention Network (SAN) that is characterized by building an asymmetrical link between vision and language to efficiently learn a fine-grained cross-modal correlation. |
576 | CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval | Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao | In this paper, we propose Cross-modal Adaptive Message Passing (CAMP), which adaptively controls the information flow for message passing across modalities. |
577 | ACMM: Aligned Cross-Modal Memory for Few-Shot Image and Sentence Matching | Yan Huang, Liang Wang | In this work, we study this challenging scenario as few-shot image and sentence matching, and accordingly propose an Aligned Cross-Modal Memory (ACMM) model to memorize the rarely appeared content. |
578 | Creativity Inspired Zero-Shot Learning | Mohamed Elhoseiny, Mohamed Elfeki | We introduce a learning signal inspired by creativity literature that explores the unseen space with hallucinated class-descriptions and encourages careful deviation of their visual feature generations from seen classes while allowing knowledge transfer from seen to unseen classes. |
579 | Generating Easy-to-Understand Referring Expressions for Target Identifications | Mikihiro Tanaka, Takayuki Itamochi, Kenichi Narioka, Ikuro Sato, Yoshitaka Ushiku, Tatsuya Harada | This paper addresses the generation of referring expressions that not only refer to objects correctly but also let humans find them quickly. To evaluate our system, we created a new referring expression dataset whose images were acquired from Grand Theft Auto V (GTA V), limiting targets to persons. |
580 | Language-Agnostic Visual-Semantic Embeddings | Jonatas Wehrmann, Douglas M. Souza, Mauricio A. Lopes, Rodrigo C. Barros | This paper proposes a framework for training language-invariant cross-modal retrieval models. |
581 | Adversarial Representation Learning for Text-to-Image Matching | Nikolaos Sarafianos, Xiang Xu, Ioannis A. Kakadiaris | With that in mind, we introduce TIMAM: a Text-Image Modality Adversarial Matching approach that learns modality-invariant feature representations using adversarial and cross-modal matching objectives. |
582 | Multi-Modality Latent Interaction Network for Visual Question Answering | Peng Gao, Haoxuan You, Zhanpeng Zhang, Xiaogang Wang, Hongsheng Li | In this paper, we proposed the Multi-modality Latent Interaction module (MLI) to tackle this problem. |
583 | Learning Two-View Correspondences and Geometry Using Order-Aware Network | Jiahui Zhang, Dawei Sun, Zixin Luo, Anbang Yao, Lei Zhou, Tianwei Shen, Yurong Chen, Long Quan, Hongen Liao | Given putative correspondences of feature points in two views, in this paper, we propose Order-Aware Network, which infers the probabilities of correspondences being inliers and regresses the relative pose encoded by the essential matrix. |
584 | Learning Meshes for Dense Visual SLAM | Michael Bloesch, Tristan Laidlow, Ronald Clark, Stefan Leutenegger, Andrew J. Davison | In the present paper, we use triangular meshes as both compact and dense geometry representation. |
585 | EM-Fusion: Dynamic Object-Level SLAM With Probabilistic Data Association | Michael Strecke, Jorg Stuckler | In this paper, we propose a novel approach to dynamic SLAM with dense object-level representations. |
586 | ClusterSLAM: A SLAM Backend for Simultaneous Rigid Body Clustering and Motion Estimation | Jiahui Huang, Sheng Yang, Zishuo Zhao, Yu-Kun Lai, Shi-Min Hu | In this paper, we exploit the consensus of 3D motions among the landmarks extracted from the same rigid body for clustering and estimating static and dynamic objects in a unified manner. |
587 | Efficient and Robust Registration on the 3D Special Euclidean Group | Uttaran Bhattacharya, Venu Madhav Govindu | We present a robust, fast and accurate method for registration of 3D scans. |
588 | Algebraic Characterization of Essential Matrices and Their Averaging in Multiview Settings | Yoni Kasten, Amnon Geifman, Meirav Galun, Ronen Basri | This paper presents a novel approach that solves simultaneously for both camera orientations and positions. |
589 | Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis | Wen Liu, Zhixin Piao, Jie Min, Wenhan Luo, Lin Ma, Shenghua Gao | In this paper, we propose to use a 3D body mesh recovery module to disentangle the pose and shape, which can not only model the joint location and rotation but also characterize the personalized body shape. In addition, we build a new dataset, namely Impersonator (iPER) dataset, for the evaluation of human motion imitation, appearance transfer, and novel view synthesis. |
590 | RelGAN: Multi-Domain Image-to-Image Translation via Relative Attributes | Po-Wei Wu, Yu-Jing Lin, Che-Han Chang, Edward Y. Chang, Shih-Wei Liao | To address these limitations, we propose RelGAN, a new method for multi-domain image-to-image translation. |
591 | Attribute-Driven Spontaneous Motion in Unpaired Image Translation | Ruizheng Wu, Xin Tao, Xiaodong Gu, Xiaoyong Shen, Jiaya Jia | In this paper, we propose the spontaneous motion estimation module, along with a refinement part, to learn attribute-driven deformation between source and target domains. |
592 | Everybody Dance Now | Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros | This paper presents a simple method for “do as I do” motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves. In addition, we release a first-of-its-kind open-source dataset of videos that can be legally used for training and motion transfer. |
593 | Multimodal Style Transfer via Graph Cuts | Yulun Zhang, Chen Fang, Yilin Wang, Zhaowen Wang, Zhe Lin, Yun Fu, Jimei Yang | In this paper, we introduce a more flexible and general universal style transfer technique: multimodal style transfer (MST). |
594 | A Closed-Form Solution to Universal Style Transfer | Ming Lu, Hao Zhao, Anbang Yao, Yurong Chen, Feng Xu, Li Zhang | In this paper, we first propose a novel interpretation by treating it as the optimal transport problem. Then, we demonstrate the relations of our formulation with former works like Adaptive Instance Normalization (AdaIN) and Whitening and Coloring Transform (WCT). Finally, we derive a closed-form solution named Optimal Style Transfer (OST) under our formulation by additionally considering the content loss of Gatys. |
595 | Progressive Reconstruction of Visual Structure for Image Inpainting | Jingyuan Li, Fengxiang He, Lefei Zhang, Bo Du, Dacheng Tao | To address this issue, this paper proposes a Progressive Reconstruction of Visual Structure (PRVS) network that progressively reconstructs the structures and the associated visual feature. |
596 | Variational Adversarial Active Learning | Samarth Sinha, Sayna Ebrahimi, Trevor Darrell | We describe a pool-based semi-supervised active learning algorithm that implicitly learns this sampling mechanism in an adversarial manner. |
597 | Confidence Regularized Self-Training | Yang Zou, Zhiding Yu, Xiaofeng Liu, B.V.K. Vijaya Kumar, Jinsong Wang | To address the problem, we propose a confidence regularized self-training (CRST) framework, formulated as regularized self-training. |
598 | Anchor Loss: Modulating Loss Scale Based on Prediction Difficulty | Serim Ryou, Seong-Gyun Jeong, Pietro Perona | In this work, we define the prediction difficulty as a relative property coming from the confidence score gap between positive and negative labels. |
599 | Local Aggregation for Unsupervised Learning of Visual Embeddings | Chengxu Zhuang, Alex Lin Zhai, Daniel Yamins | Here, we describe a method that trains an embedding function to maximize a metric of local aggregation, causing similar data instances to move together in the embedding space, while allowing dissimilar instances to separate. |
600 | PR Product: A Substitute for Inner Product in Neural Networks | Zhennan Wang, Wenbin Zou, Chen Xu | In this paper, we analyze the inner product of weight vector w and data vector x in neural networks from the perspective of vector orthogonal decomposition and prove that the direction gradient of w decreases as the angle between them approaches 0 or π. |
601 | CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features | Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Youngjoon Yoo | We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches. |
602 | Towards Interpretable Object Detection by Unfolding Latent Structures | Tianfu Wu, Xi Song | The proposed method focuses on weakly-supervised extractive rationale generation, that is, learning to unfold latent discriminative part configurations of object instances automatically and simultaneously in detection without using any supervision for part configurations. |
603 | Scaling Object Detection by Transferring Classification Weights | Jason Kuen, Federico Perazzi, Zhe Lin, Jianming Zhang, Yap-Peng Tan | In this paper, we propose a novel weight transfer network (WTN) to effectively and efficiently transfer knowledge from a classification network's weights to a detection network's weights, allowing detection of novel classes without box supervision. |
604 | Scale-Aware Trident Networks for Object Detection | Yanghao Li, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang | Based on the findings from the exploration experiments, we propose a novel Trident Network (TridentNet) aiming to generate scale-specific feature maps with a uniform representational power. |
605 | Object-Aware Instance Labeling for Weakly Supervised Object Detection | Satoshi Kosugi, Toshihiko Yamasaki, Kiyoharu Aizawa | Instead of simply labeling the top-scoring region and its highly overlapping regions as positive and others as negative, we propose more effective instance labeling methods as follows. |
606 | Generative Modeling for Small-Data Object Detection | Lanlan Liu, Michael Muelly, Jia Deng, Tomas Pfister, Li-Jia Li | In this work we explore this problem from a generative modeling perspective by learning to generate new images with associated bounding boxes, and using these for training an object detector. |
607 | Transductive Learning for Zero-Shot Object Detection | Shafin Rahman, Salman Khan, Nick Barnes | To the best of our knowledge, we are the first to propose a transductive zero-shot object detection approach that convincingly reduces the domain-shift and model-bias against unseen classes. |
608 | Self-Training and Adversarial Background Regularization for Unsupervised Domain Adaptive One-Stage Object Detection | Seunghyeon Kim, Jaehoon Choi, Taekyung Kim, Changick Kim | In this paper, we introduce a weak self-training (WST) method and adversarial background score regularization (BSR) for domain adaptive one-stage object detection. |
609 | Memory-Based Neighbourhood Embedding for Visual Recognition | Suichan Li, Dapeng Chen, Bin Liu, Nenghai Yu, Rui Zhao | In this paper, we propose Memory-based Neighbourhood Embedding (MNE) to enhance a general CNN feature by considering its neighbourhood. |
610 | Self-Similarity Grouping: A Simple Unsupervised Cross Domain Adaptation Approach for Person Re-Identification | Yang Fu, Yunchao Wei, Guanshuo Wang, Yuqian Zhou, Honghui Shi, Thomas S. Huang | In this work, we explore how to harness the similar natural characteristics existing in the samples from the target domain for learning to conduct person re-ID in an unsupervised manner. |
611 | Deep Reinforcement Active Learning for Human-in-the-Loop Person Re-Identification | Zimo Liu, Jingya Wang, Shaogang Gong, Huchuan Lu, Dacheng Tao | In this work, we propose an alternative reinforcement learning based human-in-the-loop model which releases the restriction of pre-labelling and keeps model upgrading with progressively collected data. |
612 | A Dual-Path Model With Adaptive Attention for Vehicle Re-Identification | Pirazh Khorramshahi, Amit Kumar, Neehar Peri, Sai Saketh Rambhatla, Jun-Cheng Chen, Rama Chellappa | In this paper, we present a novel dual-path adaptive attention model for vehicle re-identification (AAVER). |
613 | Bayesian Loss for Crowd Count Estimation With Point Supervision | Zhiheng Ma, Xing Wei, Xiaopeng Hong, Yihong Gong | On the contrary, we propose Bayesian loss, a novel loss function which constructs a density contribution probability model from the point annotations. |
614 | Learning Spatial Awareness to Improve Crowd Counting | Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander G. Hauptmann | In this paper, we present a novel architecture called SPatial Awareness Network (SPANet) to incorporate spatial context for crowd counting. |
615 | GradNet: Gradient-Guided Network for Visual Object Tracking | Peixia Li, Boyu Chen, Wanli Ouyang, Dong Wang, Xiaoyun Yang, Huchuan Lu | In this work, we propose a novel gradient-guided network to exploit the discriminative information in gradients and update the template in the siamese network through feed-forward and backward operations. |
616 | FAMNet: Joint Learning of Feature, Affinity and Multi-Dimensional Assignment for Online Multiple Object Tracking | Peng Chu, Haibin Ling | In this paper, we present an end-to-end model, named FAMNet, where Feature extraction, Affinity estimation and Multi-dimensional assignment are refined in a single network. |
617 | Learning Discriminative Model Prediction for Tracking | Goutam Bhat, Martin Danelljan, Luc Van Gool, Radu Timofte | We develop an end-to-end tracking architecture, capable of fully exploiting both target and background appearance information for target model prediction. |
618 | DynamoNet: Dynamic Action and Motion Network | Ali Diba, Vivek Sharma, Luc Van Gool, Rainer Stiefelhagen | In this paper, we are interested in self-supervised learning of the motion cues in videos using dynamic motion filters for a better motion representation, in particular to boost human action recognition. |
619 | SlowFast Networks for Video Recognition | Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He | We present SlowFast networks for video recognition. |
620 | Generative Multi-View Human Action Recognition | Lichen Wang, Zhengming Ding, Zhiqiang Tao, Yunyu Liu, Yun Fu | In this work, we propose a Generative Multi-View Action Recognition (GMVAR) framework to address the challenges above. |
621 | Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition | Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Shilei Wen | We intuitively formulate the frame sampling procedure as multiple parallel Markov decision processes, each of which aims at picking out a frame/clip by gradually adjusting an initial sampling. |
622 | SCSampler: Sampling Salient Clips From Video for Efficient Action Recognition | Bruno Korbar, Du Tran, Lorenzo Torresani | In this paper we introduce a lightweight “clip-sampling” model that can efficiently identify the most salient temporal clips within a long video. |
623 | Weakly Supervised Energy-Based Learning for Action Segmentation | Jun Li, Peng Lei, Sinisa Todorovic | Our key contribution is a new constrained discriminative forward loss (CDFL) that we use for training the HMM and GRU under weak supervision. |
624 | What Would You Expect? Anticipating Egocentric Actions With Rolling-Unrolling LSTMs and Modality Attention | Antonino Furnari, Giovanni Maria Farinella | We tackle the problem by proposing an architecture able to anticipate actions at multiple temporal scales, using two LSTMs to 1) summarize the past and 2) formulate predictions about the future. |
625 | PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction | Amir Rasouli, Iuliia Kotseruba, Toni Kunic, John K. Tsotsos | We propose models for estimating pedestrian crossing intention and predicting their future trajectory. To this end, we propose a novel large-scale dataset designed for pedestrian intention estimation (PIE). |
626 | STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction | Yingfan Huang, Huikun Bi, Zhaoxin Li, Tianlu Mao, Zhaoqi Wang | In this work, we propose a Spatial-Temporal Graph Attention network (STGAT), based on a sequence-to-sequence architecture to predict future trajectories of pedestrians. |
627 | Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection | Khoi-Nguyen C. Mac, Dhiraj Joshi, Raymond A. Yeh, Jinjun Xiong, Rogerio S. Feris, Minh N. Do | We propose a novel locally-consistent deformable convolution, which utilizes the change in receptive fields and enforces a local coherency constraint to capture motion information effectively. |
628 | Dual Attention Matching for Audio-Visual Event Localization | Yu Wu, Linchao Zhu, Yan Yan, Yi Yang | In this paper, we investigate the audio-visual event localization problem. |
629 | Uncertainty-Aware Audiovisual Activity Recognition Using Deep Bayesian Variational Inference | Mahesh Subedar, Ranganath Krishnan, Paulo Lopez Meyer, Omesh Tickoo, Jonathan Huang | Our contribution in this work is to propose an uncertainty aware multimodal Bayesian fusion framework for activity recognition. |
630 | Non-Local Recurrent Neural Memory for Supervised Sequence Modeling | Canmiao Fu, Wenjie Pei, Qiong Cao, Chaopeng Zhang, Yong Zhao, Xiaoyong Shen, Yu-Wing Tai | To tackle this limitation, we propose the Non-local Recurrent Neural Memory (NRNM) for supervised sequence modeling, which performs non-local operations to learn full-order interactions within a sliding temporal block and models the global interactions between blocks in a gated recurrent manner. |
631 | Temporal Attentive Alignment for Large-Scale Video Domain Adaptation | Min-Hung Chen, Zsolt Kira, Ghassan AlRegib, Jaekwon Yoo, Ruxin Chen, Jian Zheng | Finally, we propose Temporal Attentive Adversarial Adaptation Network (TA3N), which explicitly attends to the temporal dynamics using domain discrepancy for more effective domain alignment, achieving state-of-the-art performance on four video DA datasets. |
632 | Action Assessment by Joint Relation Graphs | Jia-Hui Pan, Jibin Gao, Wei-Shi Zheng | We present a new model to assess the performance of actions from videos, through graph-based joint relation modelling. |
633 | Unsupervised Procedure Learning via Joint Dynamic Summarization | Ehsan Elhamifar, Zwe Naing | Our goal is to produce a summary of the procedure key-steps and their ordering needed to perform a given task, as well as localization of the key-steps in videos. |
634 | ViSiL: Fine-Grained Spatio-Temporal Video Similarity Learning | Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris | In this paper we introduce ViSiL, a Video Similarity Learning architecture that considers fine-grained Spatio-Temporal relations between pairs of videos — such relations are typically lost in previous video retrieval approaches that embed the whole frame or even the whole video into a vector descriptor before the similarity estimation. |
635 | Unsupervised Learning of Landmarks by Descriptor Vector Exchange | James Thewlis, Samuel Albanie, Hakan Bilen, Andrea Vedaldi | In this paper, we develop a new perspective on the equivariance approach by noting that dense landmark detectors can be interpreted as local image descriptors equipped with invariance to intra-category variations. |
636 | Learning Compositional Representations for Few-Shot Recognition | Pavel Tokmakov, Yu-Xiong Wang, Martial Hebert | In this work, we make a step towards bridging this gap between human and machine learning by introducing a simple regularization technique that allows the learned representation to be decomposable into parts. |
637 | Spectral Regularization for Combating Mode Collapse in GANs | Kanglin Liu, Wenming Tang, Fei Zhou, Guoping Qiu | In this paper, we present spectral regularization for GANs (SR-GANs), a new and robust method for combating the mode collapse problem in GANs. |
638 | Scaling and Benchmarking Self-Supervised Visual Representation Learning | Priya Goyal, Dhruv Mahajan, Abhinav Gupta, Ishan Misra | In this work, we revisit this principle and scale two popular self-supervised approaches to 100 million images. We also introduce an extensive benchmark across 9 different datasets and tasks. |
639 | Learning an Effective Equivariant 3D Descriptor Without Supervision | Riccardo Spezialetti, Samuele Salti, Luigi Di Stefano | In this paper, we explore the benefits of taking a step back in the direction of end-to-end learning of 3D descriptors by disentangling the creation of a robust and distinctive rotation equivariant representation, which can be learned from unoriented input data, and the definition of a good canonical orientation, required only at test time to obtain an invariant descriptor. |
640 | KPConv: Flexible and Deformable Convolution for Point Clouds | Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, Francois Goulette, Leonidas J. Guibas | We present Kernel Point Convolution (KPConv), a new design of point convolution, i.e., one that operates on point clouds without any intermediate representation. |
641 | Neural Inter-Frame Compression for Video Coding | Abdelaziz Djelouah, Joaquim Campos, Simone Schaub-Meyer, Christopher Schroers | Therefore, in this work we present an inter-frame compression approach for neural video coding that can seamlessly build up on different existing neural image codecs. |
642 | Task2Vec: Task Embedding for Meta-Learning | Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless C. Fowlkes, Stefano Soatto, Pietro Perona | We introduce a method to generate vectorial representations of visual classification tasks which can be used to reason about the nature of those tasks and their relations. |
643 | Deep Clustering by Gaussian Mixture Variational Autoencoders With Graph Embedding | Linxiao Yang, Ngai-Man Cheung, Jiaying Li, Jun Fang | We propose DGG: Deep clustering via a Gaussian-mixture variational autoencoder (VAE) with Graph embedding. |
644 | SoftTriple Loss: Deep Metric Learning Without Triplet Sampling | Qi Qian, Lei Shang, Baigui Sun, Juhua Hu, Hao Li, Rong Jin | Therefore, we propose the SoftTriple loss to extend the SoftMax loss with multiple centers for each class. |
645 | A Weakly Supervised Fine Label Classifier Enhanced by Coarse Supervision | Fariborz Taherkhani, Hadi Kazemi, Ali Dabouei, Jeremy Dawson, Nasser M. Nasrabadi | We propose a new deep model that leverages coarse images to improve the classification performance of fine images within the coarse category. |
646 | Gaussian Affinity for Max-Margin Class Imbalanced Learning | Munawar Hayat, Salman Khan, Syed Waqas Zamir, Jianbing Shen, Ling Shao | Here, we introduce the first hybrid loss function that jointly performs classification and clustering in a single formulation. |
647 | AttPool: Towards Hierarchical Feature Representation in Graph Convolutional Networks via Attention Mechanism | Jingjia Huang, Zhangheng Li, Nannan Li, Shan Liu, Ge Li | Here, we propose AttPool, which is a novel graph pooling module based on attention mechanism, to remedy the problem. |
648 | Deep Metric Learning With Tuplet Margin Loss | Baosheng Yu, Dacheng Tao | In this paper, we propose a new deep metric learning loss function, tuplet margin loss, using randomly selected samples from each mini-batch. |
649 | Normalized Wasserstein for Mixture Distributions With Applications in Adversarial Learning and Domain Adaptation | Yogesh Balaji, Rama Chellappa, Soheil Feizi | In this work, we focus on mixture distributions that arise naturally in several application domains where the data contains different sub-populations. |
650 | Fast and Practical Neural Architecture Search | Jiequan Cui, Pengguang Chen, Ruiyu Li, Shu Liu, Xiaoyong Shen, Jiaya Jia | In this paper, we propose a fast and practical neural architecture search (FPNAS) framework for automatic network design. |
651 | Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learning | Jiwoong Park, Minsik Lee, Hyung Jin Chang, Kyuewang Lee, Jin Young Choi | We propose a symmetric graph convolutional autoencoder which produces a low-dimensional latent representation from a graph. |
652 | Deep Elastic Networks With Model Selection for Multi-Task Learning | Chanho Ahn, Eunwoo Kim, Songhwai Oh | In this work, we consider the problem of instance-wise dynamic network model selection for multi-task learning. |
653 | Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings | Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein | In this paper, we tackle this scattering problem with a distribution-aware regularization named HORDE. |
654 | Adversarial Learning With Margin-Based Triplet Embedding Regularization | Yaoyao Zhong, Weihong Deng | To address this problem, we propose to improve the local smoothness of the representation space, by integrating a margin-based triplet embedding regularization term into the classification objective, so that the obtained models learn to resist adversarial examples. |
655 | Simultaneous Multi-View Instance Detection With Learned Geometric Soft-Constraints | Ahmed Samy Nassar, Sebastien Lefevre, Jan Dirk Wegner | We propose to jointly learn multi-view geometry and warping between views of the same object instances for robust cross-view object detection. |
656 | CenterNet: Keypoint Triplets for Object Detection | Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian | This paper presents an efficient solution that explores the visual patterns within individual cropped regions with minimal costs. |
657 | Online Hyper-Parameter Learning for Auto-Augmentation Strategy | Chen Lin, Minghao Guo, Chuming Li, Xin Yuan, Wei Wu, Junjie Yan, Dahua Lin, Wanli Ouyang | In this paper, we propose Online Hyper-parameter Learning for Auto-Augmentation (OHL-Auto-Aug), an economical solution that learns the augmentation policy distribution along with network training. |
658 | DANet: Divergent Activation for Weakly Supervised Object Localization | Haolan Xue, Chang Liu, Fang Wan, Jianbin Jiao, Xiangyang Ji, Qixiang Ye | In this paper, we propose a divergent activation (DA) approach, and target at learning complementary and discriminative visual patterns for image classification and weakly supervised object localization from the perspective of discrepancy. |
659 | Selective Sparse Sampling for Fine-Grained Image Recognition | Yao Ding, Yanzhao Zhou, Yi Zhu, Qixiang Ye, Jianbin Jiao | In this paper, we propose a simple yet effective framework, called Selective Sparse Sampling, to capture diverse and fine-grained details. |
660 | Dynamic Anchor Feature Selection for Single-Shot Object Detection | Shuai Li, Lingxiao Yang, Jianqiang Huang, Xian-Sheng Hua, Lei Zhang | In this paper, we present a dynamic feature selection operation to select new pixels in a feature map for each refined anchor received from the ARM. |
661 | Incremental Learning Using Conditional Adversarial Networks | Ye Xiang, Ying Fu, Pan Ji, Hua Huang | In this paper, we propose a new incremental learning strategy based on conditional adversarial networks. |
662 | Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks | Jianyu Wang, Haichao Zhang | In this paper, we study fast training of adversarially robust models. |
663 | View Confusion Feature Learning for Person Re-Identification | Fangyi Liu, Lei Zhang | In this paper, we mainly focus on how to learn view-independent features by getting rid of view-specific information through a view confusion learning mechanism. |
664 | Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification | Hang Xu, Lewei Yao, Wei Zhang, Xiaodan Liang, Zhenguo Li | In this paper, we study NAS for object detection, a core computer vision task that classifies and localizes object instances in an image. |
665 | PARN: Position-Aware Relation Networks for Few-Shot Learning | Ziyang Wu, Yuwei Li, Lihua Guo, Kui Jia | To address this problem, we introduce a deformable feature extractor (DFE) to extract more efficient features, and design a dual correlation attention mechanism (DCA) to deal with its inherent local connectivity. |
666 | Multi-Adversarial Faster-RCNN for Unrestricted Object Detection | Zhenwei He, Lei Zhang | For alleviating the problem of domain dependency and cumbersome labeling, this paper proposes to detect objects in unrestricted environment by leveraging domain knowledge trained from an auxiliary source domain with sufficient labels. |
667 | Object Guided External Memory Network for Video Object Detection | Hanming Deng, Yang Hua, Tao Song, Zongpu Zhang, Zhengui Xue, Ruhui Ma, Neil Robertson, Haibing Guan | In this work, we propose the first object guided external memory network for online video object detection. |
668 | An Empirical Study of Spatial Attention Mechanisms in Deep Networks | Xizhou Zhu, Dazhi Cheng, Zheng Zhang, Stephen Lin, Jifeng Dai | Toward a better general understanding of attention mechanisms, we present an empirical study that ablates various spatial attention elements within a generalized attention formulation, encompassing the dominant Transformer attention as well as the prevalent deformable convolution and dynamic convolution modules. |
669 | Attribute Attention for Semantic Disambiguation in Zero-Shot Learning | Yang Liu, Jishun Guo, Deng Cai, Xiaofei He | Considering both low-level visual information and global class-level features that relate to this ambiguity, we propose a practical Latent Feature Guided Attribute Attention (LFGAA) framework to perform object-based attribute attention for semantic disambiguation. |
670 | CIIDefence: Defeating Adversarial Attacks by Fusing Class-Specific Image Inpainting and Image Denoising | Puneet Gupta, Esa Rahtu | This paper presents a novel approach for protecting deep neural networks from adversarial attacks, i.e., methods that add well-crafted imperceptible modifications to the original inputs such that they are incorrectly classified with high confidence. |
671 | ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices | Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun | In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. |
672 | Dual Student: Breaking the Limits of the Teacher in Semi-Supervised Learning | Zhanghan Ke, Daoye Wang, Qiong Yan, Jimmy Ren, Rynson W.H. Lau | In this work, we show that the coupled EMA teacher causes a performance bottleneck. |
673 | MVP Matching: A Maximum-Value Perfect Matching for Mining Hard Samples, With Application to Person Re-Identification | Han Sun, Zhiyuan Chen, Shiyang Yan, Lin Xu | In this paper, we propose a novel weighted complete bipartite graph based maximum-value perfect (MVP) matching for mining the hard samples from a batch of samples. |
674 | Adaptive Context Network for Scene Parsing | Jun Fu, Jing Liu, Yuhang Wang, Yong Li, Yongjun Bao, Jinhui Tang, Hanqing Lu | Based on this observation, we propose an Adaptive Context Network (ACNet) to capture the pixel-aware contexts by a competitive fusion of global context and local context according to different per-pixel demands. |
675 | Constructing Self-Motivated Pyramid Curriculums for Cross-Domain Semantic Segmentation: A Non-Adversarial Approach | Qing Lian, Fengmao Lv, Lixin Duan, Boqing Gong | We propose a new approach, called self-motivated pyramid curriculum domain adaptation (PyCDA), to facilitate the adaptation of semantic segmentation neural networks from synthetic source domains to real target domains. |
676 | SparseMask: Differentiable Connectivity Learning for Dense Image Prediction | Huikai Wu, Junge Zhang, Kaiqi Huang | In this paper, we aim at automatically searching an efficient network architecture for dense image prediction. |
677 | Significance-Aware Information Bottleneck for Domain Adaptive Semantic Segmentation | Yawei Luo, Ping Liu, Tao Guan, Junqing Yu, Yi Yang | In this work, we equip the adversarial network with a “significance-aware information bottleneck (SIB)”, to address the above problem. |
678 | Relational Attention Network for Crowd Counting | Anran Zhang, Jiayi Shen, Zehao Xiao, Fan Zhu, Xiantong Zhen, Xianbin Cao, Ling Shao | In order to address such an issue, we propose a Relational Attention Network (RANet) with a self-attention mechanism for capturing interdependence of pixels. |
679 | ACFNet: Attentional Class Feature Network for Semantic Segmentation | Fan Zhang, Yanqin Chen, Zhihang Li, Zhibin Hong, Jingtuo Liu, Feifei Ma, Junyu Han, Errui Ding | In this paper, we use two types of base networks to evaluate the effectiveness of ACFNet. |
680 | Frame-to-Frame Aggregation of Active Regions in Web Videos for Weakly Supervised Semantic Segmentation | Jungbeom Lee, Eunji Kim, Sungmin Lee, Jangho Lee, Sungroh Yoon | We propose a method of using videos automatically harvested from the web to identify a larger region of the target object by using temporal information, which is not present in the static image. |
681 | Boundary-Aware Feature Propagation for Scene Segmentation | Henghui Ding, Xudong Jiang, Ai Qun Liu, Nadia Magnenat Thalmann, Gang Wang | In this work, we address the challenging issue of scene segmentation. |
682 | Self-Ensembling With GAN-Based Data Augmentation for Domain Adaptation in Semantic Segmentation | Jaehoon Choi, Taekyung Kim, Changick Kim | In this paper, we introduce a self-ensembling technique, one of the successful methods for domain adaptation in classification. |
683 | Explaining the Ambiguity of Object Detection and 6D Pose From Visual Data | Fabian Manhardt, Diego Martin Arroyo, Christian Rupprecht, Benjamin Busam, Tolga Birdal, Nassir Navab, Federico Tombari | In this work we propose to explicitly deal with these ambiguities. |
684 | Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving | Xinzhu Ma, Zhihui Wang, Haojie Li, Pengbo Zhang, Wanli Ouyang, Xin Fan | In this paper, we propose a monocular 3D object detection framework in the domain of autonomous driving. |
685 | MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation | Lorenzo Bertoni, Sven Kreiss, Alexandre Alahi | We tackle the fundamentally ill-posed problem of 3D human localization from monocular RGB images. |
686 | Unsupervised High-Resolution Depth Learning From Videos With Dual Networks | Junsheng Zhou, Yuwang Wang, Kaihuai Qin, Wenjun Zeng | In order to fully explore the information contained in high-resolution data, we propose a simple yet effective dual networks architecture, which can directly take high-resolution images as input and generate high-resolution and high-accuracy depth map efficiently. |
687 | Bayesian Graph Convolution LSTM for Skeleton Based Action Recognition | Rui Zhao, Kang Wang, Hui Su, Qiang Ji | We propose a framework for recognizing human actions from skeleton data by modeling the underlying dynamic process that generates the motion pattern. |
688 | DeCaFA: Deep Convolutional Cascade for Face Alignment in the Wild | Arnaud Dapogny, Kevin Bailly, Matthieu Cord | In this paper, we introduce an end-to-end deep convolutional cascade (DeCaFA) architecture for face alignment. |
689 | Probabilistic Face Embeddings | Yichun Shi, Anil K. Jain | We propose Probabilistic Face Embeddings (PFEs), which represent each face image as a Gaussian distribution in the latent space. |
690 | Gaze360: Physically Unconstrained Gaze Estimation in the Wild | Petr Kellnhofer, Adria Recasens, Simon Stent, Wojciech Matusik, Antonio Torralba | In this work, we present Gaze360, a large-scale remote gaze-tracking dataset and method for robust 3D gaze estimation in unconstrained images. |
691 | Unsupervised Person Re-Identification by Camera-Aware Similarity Consistency Learning | Ancong Wu, Wei-Shi Zheng, Jian-Huang Lai | To alleviate the effect of cross-camera scene variation, we propose a Camera-Aware Similarity Consistency Loss to learn consistent pairwise similarity distributions for intra-camera matching and cross-camera matching. |
692 | Photo-Realistic Monocular Gaze Redirection Using Generative Adversarial Networks | Zhe He, Adrian Spurr, Xucong Zhang, Otmar Hilliges | In this work, we present a novel method to alleviate this problem by leveraging generative adversarial training to synthesize an eye image conditioned on a target gaze direction. |
693 | Dynamic Kernel Distillation for Efficient Pose Estimation in Videos | Xuecheng Nie, Yuncheng Li, Linjie Luo, Ning Zhang, Jiashi Feng | To address this issue, we propose a novel Dynamic Kernel Distillation (DKD) model to facilitate small networks for estimating human poses in videos, thus significantly lifting the efficiency. |
694 | Single-Stage Multi-Person Pose Machines | Xuecheng Nie, Jiashi Feng, Jianfeng Zhang, Shuicheng Yan | In this work, we present the first single-stage model, Single-stage multi-person Pose Machine (SPM), to simplify the pipeline and lift the efficiency for multi-person pose estimation. |
695 | SO-HandNet: Self-Organizing Network for 3D Hand Pose Estimation With Semi-Supervised Learning | Yujin Chen, Zhigang Tu, Liuhao Ge, Dejun Zhang, Ruizhi Chen, Junsong Yuan | Inspired by the point cloud autoencoder presented in self-organizing network (SO-Net), our proposed SO-HandNet aims at making use of the unannotated data to obtain accurate 3D hand pose estimation in a semi-supervised manner. |
696 | Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression | Xinyao Wang, Liefeng Bo, Li Fuxin | In this paper, we analyze the ideal loss function properties for heatmap regression in face alignment problems. |
697 | Single-Network Whole-Body Pose Estimation | Gines Hidalgo, Yaadhav Raaj, Haroon Idrees, Donglai Xiang, Hanbyul Joo, Tomas Simon, Yaser Sheikh | We present the first single-network approach for 2D whole-body pose estimation, which entails simultaneous localization of body, face, hands, and feet keypoints. |
698 | Face Alignment With Kernel Density Deep Neural Network | Lisha Chen, Hui Su, Qiang Ji | To model more general distributions, such as multi-modal or asymmetric distributions, we propose to develop a kernel density deep neural network. |
699 | Spatiotemporal Feature Residual Propagation for Action Prediction | He Zhao, Richard P. Wildes | In this study, we address this task by investigating how action patterns evolve over time in a spatial feature space. |
700 | Identity From Here, Pose From There: Self-Supervised Disentanglement and Generation of Objects Using Unlabeled Videos | Fanyi Xiao, Haotian Liu, Yong Jae Lee | We propose a novel approach that disentangles the identity and pose of objects for image generation. |
701 | Relation Distillation Networks for Video Object Detection | Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei | In this paper, we introduce a new design to capture the interactions across the objects in spatio-temporal context. |
702 | Video Compression With Rate-Distortion Autoencoders | Amirhossein Habibian, Ties van Rozendaal, Jakub M. Tomczak, Taco S. Cohen | In this paper we present a deep generative model for lossy video compression. |
703 | Non-Local ConvLSTM for Video Compression Artifact Reduction | Yi Xu, Longwen Gao, Kai Tian, Shuigeng Zhou, Huyang Sun | To remedy these shortcomings, in this paper we propose a novel end-to-end deep neural network called non-local ConvLSTM (NL-ConvLSTM in short) that exploits multiple consecutive frames. |
704 | Self-Supervised Moving Vehicle Tracking With Stereo Sound | Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba | In particular, we propose a framework that consists of a vision “teacher” network and a stereo-sound “student” network. |
705 | Self-Supervised Learning With Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera | Yuhua Chen, Cordelia Schmid, Cristian Sminchisescu | We present GLNet, a self-supervised framework for learning depth, optical flow, camera pose and intrinsic parameters from monocular video — addressing the difficulty of acquiring realistic ground-truth for such tasks. |
706 | Learning Temporal Action Proposals With Fewer Labels | Jingwei Ji, Kaidi Cao, Juan Carlos Niebles | In this work, we propose a semi-supervised learning algorithm specifically designed for training temporal action proposal networks. |
707 | TSM: Temporal Shift Module for Efficient Video Understanding | Ji Lin, Chuang Gan, Song Han | In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. |
708 | Graph Convolutional Networks for Temporal Action Localization | Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan | In this paper, we propose to exploit the proposal-proposal relations using Graph Convolutional Networks (GCNs). |
709 | Fast Object Detection in Compressed Video | Shiyao Wang, Hongchao Lu, Zhidong Deng | In this paper, we propose a fast object detection method by taking advantage of this with a novel Motion aided Memory Network (MMNet). |
710 | Predicting 3D Human Dynamics From Video | Jason Y. Zhang, Panna Felsen, Angjoo Kanazawa, Jitendra Malik | In this work, we present perhaps the first approach for predicting a future 3D mesh model sequence of a person from past video input. |
711 | Imitation Learning for Human Pose Prediction | Borui Wang, Ehsan Adeli, Hsu-kuang Chiu, De-An Huang, Juan Carlos Niebles | Inspired by the recent success of deep reinforcement learning methods, in this paper we propose a new reinforcement learning formulation for the problem of human pose prediction, and develop an imitation learning algorithm for predicting future poses under this formulation through a combination of behavioral cloning and generative adversarial imitation learning. |
712 | Human Motion Prediction via Spatio-Temporal Inpainting | Alejandro Hernandez, Jurgen Gall, Francesc Moreno-Noguer | We propose a Generative Adversarial Network (GAN) to forecast 3D human motion given a sequence of past 3D skeleton poses. |
713 | Structured Prediction Helps 3D Human Motion Modelling | Emre Aksan, Manuel Kaufmann, Otmar Hilliges | In this paper, we propose a novel approach that decomposes the prediction into individual joints by means of a structured prediction layer that explicitly models the joint dependencies. |
714 | Learning Shape Templates With Structured Implicit Functions | Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T. Freeman, Thomas Funkhouser | In this paper, we investigate learning a general shape template from data. |
715 | CompenNet++: End-to-End Full Projector Compensation | Bingyao Huang, Haibin Ling | In this paper, we propose the first end-to-end solution, named CompenNet++, to solve the two problems jointly. Moreover, we construct the first setup-independent full compensation benchmark to facilitate the study on this topic. |
716 | Deep Parametric Indoor Lighting Estimation | Marc-Andre Gardner, Yannick Hold-Geoffroy, Kalyan Sunkavalli, Christian Gagne, Jean-Francois Lalonde | We present a method to estimate lighting from a single image of an indoor scene. |
717 | FSGAN: Subject Agnostic Face Swapping and Reenactment | Yuval Nirkin, Yosi Keller, Tal Hassner | We present Face Swapping GAN (FSGAN) for face swapping and reenactment. |
718 | Deep Single-Image Portrait Relighting | Hao Zhou, Sunil Hadap, Kalyan Sunkavalli, David W. Jacobs | In this work, we apply a physically-based portrait relighting method to generate a large scale, high quality, “in the wild” portrait relighting dataset (DPR). |
719 | PU-GAN: A Point Cloud Upsampling Adversarial Network | Ruihui Li, Xianzhi Li, Chi-Wing Fu, Daniel Cohen-Or, Pheng-Ann Heng | This paper presents a new point cloud upsampling network called PU-GAN, which is formulated based on a generative adversarial network (GAN), to learn a rich variety of point distributions from the latent space and upsample points over patches on object surfaces. |
720 | Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation | Giorgos Bouritsas, Sergiy Bokhnyak, Stylianos Ploumpis, Michael Bronstein, Stefanos Zafeiriou | In this paper, we focus on 3D deformable shapes that share a common topological structure, such as human faces and bodies. |
721 | Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation | Yu Zeng, Yunzhi Zhuge, Huchuan Lu, Lihe Zhang | Here we propose a unified multi-task learning framework to jointly solve WSSS and SD using a single network, i.e. saliency and segmentation network (SSNet). |
722 | Towards High-Resolution Salient Object Detection | Yi Zeng, Pingping Zhang, Jianming Zhang, Zhe Lin, Huchuan Lu | This paper pushes forward high-resolution saliency detection, and contributes a new dataset, named High-Resolution Salient Object Detection (HRSOD) dataset. To the best of our knowledge, HRSOD is the first high-resolution saliency detection dataset. |
723 | Event-Based Motion Segmentation by Motion Compensation | Timo Stoffregen, Guillermo Gallego, Tom Drummond, Lindsay Kleeman, Davide Scaramuzza | We present the first per-event segmentation method for splitting a scene into independently moving objects. |
724 | Depth-Induced Multi-Scale Recurrent Attention Network for Saliency Detection | Yongri Piao, Wei Ji, Jingjing Li, Miao Zhang, Huchuan Lu | In this work, we propose a novel depth-induced multi-scale recurrent attention network for saliency detection. In addition, we create a large scale RGB-D dataset containing more complex scenarios, which can contribute to comprehensively evaluating saliency models. |
725 | Stacked Cross Refinement Network for Edge-Aware Salient Object Detection | Zhe Wu, Li Su, Qingming Huang | Motivated by the logical interrelations between binary segmentation and edge maps, we propose a novel Stacked Cross Refinement Network (SCRN) for salient object detection in this paper. |
726 | Motion Guided Attention for Video Salient Object Detection | Haofeng Li, Guanqi Chen, Guanbin Li, Yizhou Yu | In this paper, we develop a multi-task motion guided video salient object detection network, which learns to accomplish two sub-tasks using two sub-networks, one sub-network for salient object detection in still images and the other for motion saliency detection in optical flow images. |
727 | Semi-Supervised Video Salient Object Detection Using Pseudo-Labels | Pengxiang Yan, Guanbin Li, Yuan Xie, Zhen Li, Chuan Wang, Tianshui Chen, Liang Lin | In this paper, we address the semi-supervised video salient object detection task using pseudo-labels. |
728 | Joint Learning of Semantic Alignment and Object Landmark Detection | Sangryul Jeon, Dongbo Min, Seungryong Kim, Kwanghoon Sohn | In this paper, we present a joint learning approach for obtaining dense correspondences and discovering object landmarks from semantically similar images. |
729 | RainFlow: Optical Flow Under Rain Streaks and Rain Veiling Effect | Ruoteng Li, Robby T. Tan, Loong-Fah Cheong, Angelica I. Aviles-Rivero, Qingnan Fan, Carola-Bibiane Schonlieb | Concerning this, we propose a deep-learning based optical flow method designed to handle heavy rain. |
730 | GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing | Xiaohong Liu, Yongrui Ma, Zhihao Shi, Jun Chen | We propose an end-to-end trainable Convolutional Neural Network (CNN), named GridDehazeNet, for single image dehazing. |
731 | Learning to See Moving Objects in the Dark | Haiyang Jiang, Yinqiang Zheng | We propose a novel optical system to capture bright and dark videos of the exact same scenes, generating training and ground-truth pairs for an authentic low-light video dataset. |
732 | SegSort: Segmentation by Discriminative Sorting of Segments | Jyh-Jing Hwang, Stella X. Yu, Jianbo Shi, Maxwell D. Collins, Tien-Ju Yang, Xiao Zhang, Liang-Chieh Chen | This motivates us to propose an end-to-end pixel-wise metric learning approach that mimics this process. |
733 | What Synthesis Is Missing: Depth Adaptation Integrated With Weak Supervision for Indoor Scene Parsing | Keng-Chi Liu, Yi-Ting Shen, Jan P. Klopp, Liang-Gee Chen | The aim of this work is hence twofold: Exploit synthetic data where feasible and integrate weak supervision where necessary. |
734 | AdaptIS: Adaptive Instance Selection Network | Konstantin Sofiiuk, Olga Barinova, Anton Konushin | We present Adaptive Instance Selection network architecture for class-agnostic instance segmentation. |
735 | DADA: Depth-Aware Domain Adaptation in Semantic Segmentation | Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Perez | In this work, we aim at exploiting at best such a privileged information while training the UDA model. |
736 | Guided Curriculum Model Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation | Christos Sakaridis, Dengxin Dai, Luc Van Gool | Our central contributions are: 1) a curriculum framework to gradually adapt semantic segmentation models from day to night via labeled synthetic images and unlabeled real images, both for progressively darker times of day, which exploits cross-time-of-day correspondences for the real images to guide the inference of their labels; 2) a novel uncertainty-aware annotation and evaluation framework and metric for semantic segmentation, designed for adverse conditions and including image regions beyond human recognition capability in the evaluation in a principled fashion; 3) the Dark Zurich dataset, which comprises 2416 unlabeled nighttime and 2920 unlabeled twilight images with correspondences to their daytime counterparts plus a set of 151 nighttime images with fine pixel-level annotations created with our protocol, which serves as a first benchmark to perform our novel evaluation. |
737 | SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation | Yang Zhou, Zachary While, Evangelos Kalogerakis | In this paper we propose a neural message passing approach to augment an input 3D indoor scene with new objects matching their surroundings. |
738 | SkyScapes Fine-Grained Semantic Understanding of Aerial Scenes | Seyed Majid Azimi, Corentin Henry, Lars Sommer, Arne Schumann, Eleonora Vig | We therefore propose a novel multi-task model, which incorporates semantic edge detection and is better tuned for feature extraction from a wide range of scales. |
739 | Transferable Representation Learning in Vision-and-Language Navigation | Haoshuo Huang, Vihan Jain, Harsh Mehta, Alexander Ku, Gabriel Magalhaes, Jason Baldridge, Eugene Ie | Our approach adapts pre-trained vision and language representations to relevant in-domain tasks making them more effective for VLN. |
740 | Towards Unsupervised Image Captioning With Shared Multimodal Embeddings | Iro Laina, Christian Rupprecht, Nassir Navab | In this paper, we address image captioning by generating language descriptions of scenes without learning from annotated pairs of images and their captions. |
741 | ViCo: Word Embeddings From Visual Co-Occurrences | Tanmay Gupta, Alexander Schwing, Derek Hoiem | We propose to learn word embeddings from visual co-occurrences. |
742 | Seq-SG2SL: Inferring Semantic Layout From Scene Graph Through Sequence to Sequence Learning | Boren Li, Boyu Zhuang, Mingyang Li, Jian Gu | We present a conceptually simple, flexible and general framework using sequence to sequence (seq-to-seq) learning for this task. |
743 | U-CAM: Visual Explanation Using Uncertainty Based Class Activation Maps | Badri N. Patro, Mayank Lunayach, Shivansh Patel, Vinay P. Namboodiri | Towards this, we propose a method that obtains gradient-based certainty estimates that also provide visual attention maps. |
744 | See-Through-Text Grouping for Referring Image Segmentation | Ding-Jie Chen, Songhao Jia, Yi-Chen Lo, Hwann-Tzong Chen, Tyng-Luh Liu | Motivated by the conventional grouping techniques to image segmentation, we develop their DNN counterpart to tackle the referring variant. |
745 | VideoBERT: A Joint Model for Video and Language Representation Learning | Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid | Whereas most existing approaches learn low-level representations, we propose a joint visual-linguistic model to learn high-level features without any explicit supervision. |
746 | Language Features Matter: Effective Language Representations for Vision-Language Tasks | Andrea Burns, Reuben Tan, Kate Saenko, Stan Sclaroff, Bryan A. Plummer | We conclude that language features deserve more attention; this conclusion is informed by experiments comparing different word embeddings, language models, and embedding augmentation steps on five common VL tasks: image-sentence retrieval, image captioning, visual question answering, phrase grounding, and text-to-clip retrieval. |
747 | Semantic Stereo Matching With Pyramid Cost Volumes | Zhenyao Wu, Xinyi Wu, Xiaoping Zhang, Song Wang, Lili Ju | To further capture the details of disparity maps, in this paper, we propose a novel semantic stereo network named SSPCV-Net, which includes newly designed pyramid cost volumes for describing semantic and spatial information on multiple levels. |
748 | Spatial Correspondence With Generative Adversarial Network: Learning Depth From Monocular Videos | Zhenyao Wu, Xinyi Wu, Xiaoping Zhang, Song Wang, Lili Ju | In this paper, we present a novel SC-GAN network with end-to-end adversarial training for depth estimation from monocular videos without estimating the camera pose and pose change over time. |
749 | Learning Relationships for Multi-View 3D Object Recognition | Ze Yang, Liwei Wang | To tackle this problem, we propose a Relation Network to effectively connect corresponding regions from different viewpoints, thereby reinforcing the information of individual view images. |
750 | View N-Gram Network for 3D Object Retrieval | Xinwei He, Tengteng Huang, Song Bai, Xiang Bai | To address these issues, we propose an effective and efficient framework called View N-gram Network (VNN). |
751 | Expert Sample Consensus Applied to Camera Re-Localization | Eric Brachmann, Carsten Rother | In this work, we fit the 6D camera pose to a set of noisy correspondences between the 2D input image and a known 3D environment. |
752 | Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints From Limited Training Data | Yutong Bai, Qing Liu, Lingxi Xie, Weichao Qiu, Yan Zheng, Alan L. Yuille | In this paper, we present an approach which can learn from a small annotated dataset containing a limited range of viewpoints and generalize to detect semantic parts for a much larger range of viewpoints. |
753 | Dynamic Points Agglomeration for Hierarchical Point Sets Learning | Jinxian Liu, Bingbing Ni, Caiyuan Li, Jiancheng Yang, Qi Tian | To this end, we develop a novel hierarchical point sets learning architecture, with dynamic points agglomeration. |
754 | Attributing Fake Images to GANs: Learning and Analyzing GAN Fingerprints | Ning Yu, Larry S. Davis, Mario Fritz | We present the first study of learning GAN fingerprints towards image attribution and using them to classify an image as real or GAN-generated. |
755 | Dual Adversarial Inference for Text-to-Image Synthesis | Qicheng Lao, Mohammad Havaei, Ahmad Pesaranghader, Francis Dutil, Lisa Di Jorio, Thomas Fevens | In this paper, we aim to learn two variables that are disentangled in the latent space, representing content and style respectively. |
756 | View-LSTM: Novel-View Video Synthesis Through View Decomposition | Mohamed Ilyes Lakhal, Oswald Lanz, Andrea Cavallaro | We tackle the problem of synthesizing a video of multiple moving people as seen from a novel view, given only an input video and depth information or human poses of the novel view as prior. |
757 | HoloGAN: Unsupervised Learning of 3D Representations From Natural Images | Thu Nguyen-Phuoc, Chuan Li, Lucas Theis, Christian Richardt, Yong-Liang Yang | We propose a novel generative adversarial network (GAN) for the task of unsupervised learning of 3D representations from natural images. |
758 | Unpaired Image-to-Speech Synthesis With Multimodal Information Bottleneck | Shuang Ma, Daniel McDuff, Yale Song | We propose a multimodal information bottleneck approach that learns the correspondence between modalities from unpaired data (image and speech) by leveraging the shared modality (text). |
759 | Improved Conditional VRNNs for Video Prediction | Lluis Castrejon, Nicolas Ballas, Aaron Courville | In this work we argue that this is a sign of underfitting. |
760 | Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery | Xiaosheng Yan, Feigege Wang, Wenxi Liu, Yuanlong Yu, Shengfeng He, Jia Pan | In this paper, we propose a novel iterative multi-task framework to complete the segmentation mask of an occluded vehicle and recover the appearance of its invisible parts. To evaluate our method, we present a dataset, Occluded Vehicle dataset, containing synthetic and real-world occluded vehicle images. |
761 | Learning Single Camera Depth Estimation Using Dual-Pixels | Rahul Garg, Neal Wadhwa, Sameer Ansari, Jonathan T. Barron | To allow learning based methods to work well on dual-pixel imagery, we identify an inherent ambiguity in the depth estimated from dual-pixel cues, and develop an approach to estimate depth up to this ambiguity. |
762 | Domain-Adaptive Single-View 3D Reconstruction | Pedro O. Pinheiro, Negar Rostamzadeh, Sungjin Ahn | In this paper, we propose a framework to improve over these challenges using adversarial training. |
763 | Transformable Bottleneck Networks | Kyle Olszewski, Sergey Tulyakov, Oliver Woodford, Hao Li, Linjie Luo | We propose a novel approach to performing fine-grained 3D manipulation of image content via a convolutional neural network, which we call the Transformable Bottleneck Network (TBN). |
764 | RIO: 3D Object Instance Re-Localization in Changing Indoor Environments | Johanna Wald, Armen Avetisyan, Nassir Navab, Federico Tombari, Matthias Niessner | In this work, we introduce the task of 3D object instance re-localization (RIO): given one or multiple objects in an RGB-D scan, we want to estimate their corresponding 6DoF poses in another 3D scan of the same environment taken at a later point in time. |
765 | Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation | Kiru Park, Timothy Patten, Markus Vincze | To address these problems, we propose a novel pose estimation method, Pix2Pose, that predicts the 3D coordinates of each object pixel without textured models. |
766 | CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation | Zhigang Li, Gu Wang, Xiangyang Ji | In this work, we propose a novel 6-DoF pose estimation approach: Coordinates-based Disentangled Pose Network (CDPN), which disentangles the pose to predict rotation and translation separately to achieve highly accurate and robust pose estimation. |
767 | C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion | David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, Andrea Vedaldi | We propose C3DPO, a method for extracting 3D models of deformable objects from 2D keypoint annotations in unconstrained images. |
768 | Learning to Reconstruct 3D Manhattan Wireframes From a Single Image | Yichao Zhou, Haozhi Qi, Yuexiang Zhai, Qi Sun, Zhili Chen, Li-Yi Wei, Yi Ma | From a single view of an urban environment, we propose a method to effectively exploit the global structural regularities for obtaining a compact, accurate, and intuitive 3D wireframe representation. |
769 | Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning | Shichen Liu, Tianye Li, Weikai Chen, Hao Li | Unlike the state-of-the-art differentiable renderers, which only approximate the rendering gradient in the back propagation, we propose a truly differentiable rendering framework that is able to (1) directly render colorized mesh using differentiable functions and (2) back-propagate efficient supervision signals to mesh vertices and their attributes from various forms of image representations, including silhouette, shading and color images. |
770 | Learnable Triangulation of Human Pose | Karim Iskakov, Egor Burkov, Victor Lempitsky, Yury Malkov | We present two novel solutions for multi-view 3D human pose estimation based on new learnable triangulation methods that combine 3D information from multiple 2D views. |
771 | xR-EgoPose: Egocentric 3D Human Pose From an HMD Camera | Denis Tome, Patrick Peluse, Lourdes Agapito, Hernan Badino | We present a new solution to egocentric 3D body pose estimation from monocular images captured from a downward looking fish-eye camera installed on the rim of a head mounted virtual reality device. |
772 | DeepHuman: 3D Human Reconstruction From a Single Image | Zerong Zheng, Tao Yu, Yixuan Wei, Qionghai Dai, Yebin Liu | We propose DeepHuman, an image-guided volume-to-volume translation CNN for 3D human reconstruction from a single RGB image. |
773 | A Neural Network for Detailed Human Depth Estimation From a Single Image | Sicong Tang, Feitong Tan, Kelvin Cheng, Zhaoyang Li, Siyu Zhu, Ping Tan | This paper presents a neural network to estimate a detailed depth map of the foreground human in a single RGB image. |
774 | DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare | Yuanlu Xu, Song-Chun Zhu, Tony Tung | We present DenseRaC, a novel end-to-end framework for jointly estimating 3D human pose and body shape from a monocular RGB image. To boost learning, we further construct a large-scale synthetic dataset (MOCA) utilizing web-crawled Mocap sequences, 3D scans and animations. |
775 | Not All Parts Are Created Equal: 3D Pose Estimation by Modeling Bi-Directional Dependencies of Body Parts | Jue Wang, Shaoli Huang, Xinchao Wang, Dacheng Tao | In this paper, we propose a progressive approach that explicitly accounts for the distinct DOFs among the body parts. |
776 | Extreme View Synthesis | Inchang Choi, Orazio Gallo, Alejandro Troccoli, Min H. Kim, Jan Kautz | We present Extreme View Synthesis, a solution for novel view extrapolation that works even when the number of input images is small—as few as two. |
777 | View Independent Generative Adversarial Network for Novel View Synthesis | Xiaogang Xu, Ying-Cong Chen, Jiaya Jia | In this paper, we propose an encoder-decoder based generative adversarial network VI-GAN to tackle this problem. |
778 | Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion | Pingping Zhang, Wei Liu, Yinjie Lei, Huchuan Lu, Xiaoyun Yang | To address these issues, in this work we propose a novel deep learning framework, named Cascaded Context Pyramid Network (CCPNet), to jointly infer the occupancy and semantic labels of a volumetric 3D scene from a single depth image. |
779 | View-Consistent 4D Light Field Superpixel Segmentation | Numair Khan, Qian Zhang, Lucas Kasser, Henry Stone, Min H. Kim, James Tompkin | Our proposed approach combines an occlusion-aware angular segmentation in horizontal and vertical EPI spaces with an occlusion-aware clustering and propagation step across all views. |
780 | GLoSH: Global-Local Spherical Harmonics for Intrinsic Image Decomposition | Hao Zhou, Xiang Yu, David W. Jacobs | In this work, we propose a Global-Local Spherical Harmonics (GLoSH) lighting model to improve the lighting component, and jointly predict reflectance and surface normals. |
781 | Surface Normals and Shape From Water | Satoshi Murai, Meng-Yu Jennifer Kuo, Ryo Kawahara, Shohei Nobuhara, Ko Nishino | In this paper, we introduce a novel method for reconstructing surface normals and depth of dynamic objects in water. |
782 | Restoration of Non-Rigidly Distorted Underwater Images Using a Combination of Compressive Sensing and Local Polynomial Image Representations | Jerin Geo James, Pranay Agrawal, Ajit Rajwade | Motivated by this, we pose the task of restoration of such video sequences as a compressed sensing (CS) problem. |
783 | Learning Perspective Undistortion of Portraits | Yajie Zhao, Zeng Huang, Tianye Li, Weikai Chen, Chloe LeGendre, Xinglei Ren, Ari Shapiro, Hao Li | We present the first deep learning based approach to remove such artifacts from unconstrained portraits. Moreover, we also build the first perspective portrait database with a large diversity in identities, expression and poses. |
784 | Towards Photorealistic Reconstruction of Highly Multiplexed Lensless Images | Salman S. Khan, Adarsh V. R., Vivek Boominathan, Jasper Tan, Ashok Veeraraghavan, Kaushik Mitra | In this paper, we present a method to obtain image reconstructions from mask-based lensless measurements that are more photorealistic than those currently available in the literature. |
785 | Unconstrained Motion Deblurring for Dual-Lens Cameras | M. R. Mahesh Mohan, Sharath Girish, A. N. Rajagopalan | In this paper, we propose a generalized blur model that elegantly explains the intrinsically coupled image formation model for dual-lens set-ups, which are by far the most predominant in smartphones. |
786 | Stochastic Exposure Coding for Handling Multi-ToF-Camera Interference | Jongho Lee, Mohit Gupta | In this paper, we propose stochastic exposure coding (SEC), a novel approach for mitigating interference between multiple time-of-flight cameras. |
787 | Convolutional Approximations to the General Non-Line-of-Sight Imaging Operator | Byeongjoo Ahn, Akshat Dave, Ashok Veeraraghavan, Ioannis Gkioulekas, Aswin C. Sankaranarayanan | We introduce a computationally tractable framework for solving the ellipsoidal tomography problem. |
788 | Agile Depth Sensing Using Triangulation Light Curtains | Joseph R. Bartels, Jian Wang, William “Red” Whittaker, Srinivasa G. Narasimhan | In this paper, we present an approach and system to dynamically and adaptively sample the depths of a scene using the principle of triangulation light curtains. |
789 | Asynchronous Single-Photon 3D Imaging | Anant Gupta, Atul Ingle, Mohit Gupta | We propose asynchronous single-photon 3D imaging, a family of acquisition schemes to mitigate pileup during data acquisition itself. |
790 | Cross-Dataset Person Re-Identification via Unsupervised Pose Disentanglement and Adaptation | Yu-Jhe Li, Ci-Siang Lin, Yan-Bo Lin, Yu-Chiang Frank Wang | To achieve this goal, our proposed Pose Disentanglement and Adaptation Network (PDA-Net) aims at learning deep image representation with pose and domain information properly disentangled. |
791 | A Learned Representation for Scalable Vector Graphics | Raphael Gontijo Lopes, David Ha, Douglas Eck, Jonathon Shlens | In this work we attempt to model the drawing process of fonts by building sequential generative models of vector graphics. |
792 | ELF: Embedded Localisation of Features in Pre-Trained CNN | Assia Benbihi, Matthieu Geist, Cedric Pradalier | This paper introduces a novel feature detector based only on information embedded inside a CNN trained on standard tasks (e.g. classification). |
793 | Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking | Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler | We propose a new Group Feature Selection method for Discriminative Correlation Filters (GFS-DCF) based visual object tracking. |
794 | Sampling Wisely: Deep Image Embedding by Top-K Precision Optimization | Jing Lu, Chaofan Xu, Wei Zhang, Ling-Yu Duan, Tao Mei | In contrast to existing works, in this paper, we propose a novel deep image embedding algorithm with end-to-end optimization to top-k precision, the evaluation metric that is closely related to user experience. |
795 | On the Global Optima of Kernelized Adversarial Representation Learning | Bashir Sadeghi, Runyi Yu, Vishnu Boddeti | In this paper, we first study the “linear” form of this problem i.e., the setting where all the players are linear functions. We show that the resulting optimization problem is both non-convex and non-differentiable. We obtain an exact closed-form expression for its global optima through spectral learning and provide performance guarantees in terms of analytical bounds on the achievable utility and invariance. We then extend this solution and analysis to non-linear functions through kernel representation. |
796 | Addressing Model Vulnerability to Distributional Shifts Over Image Transformation Sets | Riccardo Volpi, Vittorio Murino | We formulate a combinatorial optimization problem that allows evaluating the regions in the image space where a given model is more vulnerable, in terms of image transformations applied to the input, and face it with standard search algorithms. |
797 | Attract or Distract: Exploit the Margin of Open Set | Qianyu Feng, Guoliang Kang, Hehe Fan, Yi Yang | In this paper, we exploit the semantic structure of open set data from two aspects: 1) Semantic Categorical Alignment, which aims to achieve good separability of target known classes by categorically aligning the centroid of target with the source. |
798 | MIC: Mining Interclass Characteristics for Improved Metric Learning | Karsten Roth, Biagio Brattoli, Bjorn Ommer | In contrast, we propose to explicitly learn the latent characteristics that are shared by and go across object classes. |
799 | Self-Supervised Representation Learning via Neighborhood-Relational Encoding | Mohammad Sabokrou, Mohammad Khalooei, Ehsan Adeli | In this paper, we propose a novel self-supervised representation learning by taking advantage of a neighborhood-relational encoding (NRE) among the training data. |
800 | AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation | Mohammad Tavakolian, Hamed R. Tavakoli, Abdenour Hadid | We propose an Adaptive Weighted Spatiotemporal Distillation (AWSD) technique for video representation by encoding the appearance and dynamics of the videos into a single RGB image map. |
801 | Bilinear Attention Networks for Person Retrieval | Pengfei Fang, Jieming Zhou, Soumava Kumar Roy, Lars Petersson, Mehrtash Harandi | We propose an Attention in Attention (AiA) mechanism to build inter-dependency among the second order local and global features with the intent to make better use of, or pay more attention to, such higher order statistical relationships. |
802 | Discriminative Feature Learning With Consistent Attention Regularization for Person Re-Identification | Sanping Zhou, Fei Wang, Zeyi Huang, Jinjun Wang | In this paper, we propose a simple yet effective feedforward attention network to address the two mentioned problems, in which a novel consistent attention regularizer and an improved triplet loss are designed to learn foreground attentive features for person Re-ID. |
803 | Semi-Supervised Domain Adaptation via Minimax Entropy | Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko | To address this semi-supervised domain adaptation (SSDA) setting, we propose a novel Minimax Entropy (MME) approach that adversarially optimizes an adaptive few-shot model. |
804 | Boosting Few-Shot Visual Learning With Self-Supervision | Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Perez, Matthieu Cord | In this work we exploit the complementarity of these two domains and propose an approach for improving few-shot learning through self-supervision. |
805 | FDA: Feature Disruptive Attack | Aditya Ganeshan, Vivek B.S., R. Venkatesh Babu | In this work we, (i) show the drawbacks of such attacks, (ii) propose two new evaluation metrics: Old Label New Rank (OLNR) and New Label Old Rank (NLOR) in order to quantify the extent of damage made by an attack, and (iii) propose a new attack FDA: Feature Disruptive attack, to address the drawbacks of existing attacks. |
806 | A Novel Unsupervised Camera-Aware Domain Adaptation Framework for Person Re-Identification | Lei Qi, Lei Wang, Jing Huo, Luping Zhou, Yinghuan Shi, Yang Gao | From the perspective of representation learning, this paper proposes a novel end-to-end deep domain adaptation framework to address them. |
807 | Recover and Identify: A Generative Dual Model for Cross-Resolution Person Re-Identification | Yu-Jhe Li, Yun-Chun Chen, Yen-Yu Lin, Xiaofei Du, Yu-Chiang Frank Wang | To overcome this problem, we propose a novel generative adversarial network to address cross-resolution person re-ID, allowing query images with varying resolutions. |
808 | Cross-View Policy Learning for Street Navigation | Ang Li, Huiyi Hu, Piotr Mirowski, Mehrdad Farajtabar | Since aerial images are easily and globally accessible, we propose instead to transfer a ground view policy, from training areas to unseen (target) parts of the city, by utilizing aerial view observations. |
809 | Learning Across Tasks and Domains | Pierluigi Zama Ramirez, Alessio Tonioni, Samuele Salti, Luigi Di Stefano | In this work, we introduce a novel adaptation framework that can operate across both task and domains. |
810 | EMPNet: Neural Localisation and Mapping Using Embedded Memory Points | Gil Avraham, Yan Zuo, Thanuja Dharmasiri, Tom Drummond | In this work we develop a memory module which contains rigidly aligned point-embeddings that represent a coherent scene structure acquired from an RGB-D sequence of observations. |
811 | AVT: Unsupervised Learning of Transformation Equivariant Representations by Autoencoding Variational Transformations | Guo-Jun Qi, Liheng Zhang, Chang Wen Chen, Qi Tian | To this end, we present a novel principled method by Autoencoding Variational Transformations (AVT), compared with the conventional approach to autoencoding data. |
812 | Composite Shape Modeling via Latent Space Factorization | Anastasia Dubrovina, Fei Xia, Panos Achlioptas, Mira Shalah, Raphael Groscot, Leonidas J. Guibas | We present a novel neural network architecture, termed Decomposer-Composer, for semantic structure-aware 3D shape modeling. |
813 | Deep Comprehensive Correlation Mining for Image Clustering | Jianlong Wu, Keyu Long, Fei Wang, Chen Qian, Cheng Li, Zhouchen Lin, Hongbin Zha | In this paper, we propose a novel clustering framework, named deep comprehensive correlation mining (DCCM), for exploring and taking full advantage of various kinds of correlations behind the unlabeled data from three aspects: 1) Instead of only using pair-wise information, pseudo-label supervision is proposed to investigate category information and learn discriminative features. |
814 | Unsupervised Multi-Task Feature Learning on Point Clouds | Kaveh Hassani, Mike Haley | We introduce an unsupervised multi-task model to jointly learn point and shape features on point clouds. |
815 | Reciprocal Multi-Layer Subspace Learning for Multi-View Clustering | Ruihuang Li, Changqing Zhang, Huazhu Fu, Xi Peng, Tianyi Zhou, Qinghua Hu | In this work, we present a novel Reciprocal Multi-layer Subspace Learning (RMSL) algorithm for multi-view clustering, which is composed of two main components: Hierarchical Self-Representative Layers (HSRL), and Backward Encoding Networks (BEN). |
816 | Geometric Disentanglement for Generative Latent Shape Models | Tristan Aumentado-Armstrong, Stavros Tsogkas, Allan Jepson, Sven Dickinson | In this paper, we propose an unsupervised approach to partitioning the latent space of a variational autoencoder for 3D point clouds in a natural way, using only geometric information, that builds upon prior work utilizing generative adversarial models of point sets. |
817 | GAN-Tree: An Incrementally Learned Hierarchical Generative Framework for Multi-Modal Data Distributions | Jogendra Nath Kundu, Maharshi Gor, Dakshit Agrawal, R. Venkatesh Babu | In contrast to such bottom-up approaches, we present GAN-Tree, which follows a hierarchical divisive strategy to address such discontinuous multi-modal data. |
818 | GODS: Generalized One-Class Discriminative Subspaces for Anomaly Detection | Jue Wang, Anoop Cherian | In this paper, we propose a novel objective for one-class learning. |
819 | Neighborhood Preserving Hashing for Scalable Video Retrieval | Shuyan Li, Zhixiang Chen, Jiwen Lu, Xiu Li, Jie Zhou | In this paper, we propose a Neighborhood Preserving Hashing (NPH) method for scalable video retrieval in an unsupervised manner. |
820 | Self-Training With Progressive Augmentation for Unsupervised Cross-Domain Person Re-Identification | Xinyu Zhang, Jiewei Cao, Chunhua Shen, Mingyu You | In this work, we develop a self-training method with progressive augmentation framework (PAST) to promote the model performance progressively on the target dataset. |
821 | SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects | Xue Yang, Jirui Yang, Junchi Yan, Yue Zhang, Tengfei Zhang, Zhi Guo, Xian Sun, Kun Fu | This paper presents a novel multi-category rotation detector for small, cluttered and rotated objects, namely SCRDet. |
822 | Cross-X Learning for Fine-Grained Visual Categorization | Wei Luo, Xitong Yang, Xianjie Mo, Yuheng Lu, Larry S. Davis, Jun Li, Jian Yang, Ser-Nam Lim | In this paper, we propose Cross-X learning, a simple yet effective approach that exploits the relationships between different images and between different network layers for robust multi-scale feature learning. |
823 | Maximum-Margin Hamming Hashing | Rong Kang, Yue Cao, Mingsheng Long, Jianmin Wang, Philip S. Yu | The main idea of this work is to directly embody the Hamming radius into the loss functions, leading to Maximum-Margin Hamming Hashing (MMHH), a new model specifically optimized for Hamming space retrieval. |
824 | Conservative Wasserstein Training for Pose Estimation | Xiaofeng Liu, Yang Zou, Tong Che, Peng Ding, Ping Jia, Jane You, B.V.K. Vijaya Kumar | We propose to incorporate inter-class correlations in a Wasserstein training framework by pre-defining (i.e., using arc length of a circle) or adaptively learning the ground metric. |
825 | Learning to Rank Proposals for Object Detection | Zhiyu Tan, Xuecheng Nie, Qi Qian, Nan Li, Hao Li | To address this issue, in this paper, we propose a novel Learning-to-Rank (LTR) model to produce the suppression rank via a learning procedure, thus facilitating the candidate generation and lifting the detection performance. |
826 | Vehicle Re-Identification With Viewpoint-Aware Metric Learning | Ruihang Chu, Yifan Sun, Yadong Li, Zheng Liu, Chi Zhang, Yichen Wei | Inspired by the behavior of humans in the recognition process, we propose a novel viewpoint-aware metric learning approach. |
827 | WSOD2: Learning Bottom-Up and Top-Down Objectness Distillation for Weakly-Supervised Object Detection | Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao, Lei Zhang | In this paper, we propose a novel WSOD framework with Objectness Distillation (i.e., WSOD2) by designing a tailored training mechanism for weakly-supervised object detection. |
828 | Localization of Deep Inpainting Using High-Pass Fully Convolutional Network | Haodong Li, Jiwu Huang | This paper presents a method to locate the regions manipulated by deep inpainting. |
829 | Clustered Object Detection in Aerial Images | Fan Yang, Heng Fan, Peng Chu, Erik Blasch, Haibin Ling | In this paper, we address both issues inspired by observing that these targets are often clustered. |
830 | Unsupervised Graph Association for Person Re-Identification | Jinlin Wu, Yang Yang, Hao Liu, Shengcai Liao, Zhen Lei, Stan Z. Li | In this paper, we propose an unsupervised graph association (UGA) framework to learn the underlying view-invariant representations from the video pedestrian tracklets. |
831 | Learning a Mixture of Granularity-Specific Experts for Fine-Grained Categorization | Lianbo Zhang, Shaoli Huang, Wei Liu, Dacheng Tao | We aim to divide the problem space of fine-grained recognition into some specific regions. |
832 | advPattern: Physical-World Attacks on Deep Person Re-Identification via Adversarially Transformable Patterns | Zhibo Wang, Siyan Zheng, Mengkai Song, Qian Wang, Alireza Rahimpour, Hairong Qi | We propose a novel attack algorithm, called advPattern, for generating adversarial patterns on clothes, which learns the variations of image pairs across cameras to pull closer the image features from the same camera, while pushing features from different cameras farther. |
833 | ABD-Net: Attentive but Diverse Person Re-Identification | Tianlong Chen, Shaojin Ding, Jingyi Xie, Ye Yuan, Wuyang Chen, Yang Yang, Zhou Ren, Zhangyang Wang | Specifically, we introduce a pair of complementary attention modules, focusing on channel aggregation and position awareness, respectively. |
834 | From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer | Haipeng Xiong, Hao Lu, Chengxin Liu, Liang Liu, Zhiguo Cao, Chunhua Shen | Inspired by this idea, we propose a simple but effective approach, Spatial Divide-and-Conquer Network (S-DCNet). |
835 | Towards Precise End-to-End Weakly Supervised Object Detection Network | Ke Yang, Dongsheng Li, Yong Dou | In this paper, we propose to jointly train the two phases in an end-to-end manner to tackle this problem. |
836 | Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting | Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai | In this paper, we propose a simple yet effective approach to tackle this problem. |
837 | Ground-to-Aerial Image Geo-Localization With a Hard Exemplar Reweighting Triplet Loss | Sudong Cai, Yulan Guo, Salman Khan, Jiwei Hu, Gongjian Wen | In this paper, we propose a novel in-batch reweighting triplet loss to emphasize the positive effect of hard exemplars during end-to-end training. |
838 | Learning to Discover Novel Visual Categories via Deep Transfer Clustering | Kai Han, Andrea Vedaldi, Andrew Zisserman | Our contributions are twofold. The first contribution is to extend Deep Embedded Clustering to a transfer learning setting; we also improve the algorithm by introducing a representation bottleneck, temporal ensembling, and consistency. The second contribution is a method to estimate the number of classes in the unlabelled data. |
839 | AM-LFS: AutoML for Loss Function Search | Chuming Li, Xin Yuan, Chen Lin, Minghao Guo, Wei Wu, Junjie Yan, Wanli Ouyang | In this paper, we propose AutoML for Loss Function Search (AM-LFS) which leverages REINFORCE to search loss functions during the training process. |
840 | Few-Shot Object Detection via Feature Reweighting | Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell | In this work we develop a few-shot object detector that can learn to detect novel objects from only a few annotated examples. |
841 | Objects365: A Large-Scale, High-Quality Dataset for Object Detection | Shuai Shao, Zeming Li, Tianyuan Zhang, Chao Peng, Gang Yu, Xiangyu Zhang, Jing Li, Jian Sun | In this paper, we introduce a new large-scale object detection dataset, Objects365, which has 365 object categories over 600K training images. |
842 | Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network | Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen | In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing. |
843 | Foreground-Aware Pyramid Reconstruction for Alignment-Free Occluded Person Re-Identification | Lingxiao He, Yinggang Wang, Wu Liu, He Zhao, Zhenan Sun, Jiashi Feng | This paper proposes a novel occlusion-robust and alignment-free model for occluded person ReID and extends its application to realistic and crowded scenarios. |
844 | Collect and Select: Semantic Alignment Metric Learning for Few-Shot Learning | Fusheng Hao, Fengxiang He, Jun Cheng, Lei Wang, Jianzhong Cao, Dacheng Tao | To address this issue, this paper proposes a Semantic Alignment Metric Learning (SAML) method for few-shot learning that aligns the semantically relevant dominant objects through a “collect-and-select” strategy. |
845 | Bayesian Adaptive Superpixel Segmentation | Roy Uziel, Meitar Ronen, Oren Freifeld | As a remedy, we propose a novel probabilistic model, self-coined Bayesian Adaptive Superpixel Segmentation (BASS), together with an efficient inference. |
846 | CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing | Kevin Duarte, Yogesh S. Rawat, Mubarak Shah | In this work we propose a capsule-based approach for semi-supervised video object segmentation. |
847 | BAE-NET: Branched Autoencoder for Shape Co-Segmentation | Zhiqin Chen, Kangxue Yin, Matthew Fisher, Siddhartha Chaudhuri, Hao Zhang | We treat shape co-segmentation as a representation learning problem and introduce BAE-NET, a branched autoencoder network, for the task. |
848 | VV-Net: Voxel VAE Net With Group Convolutions for Point Cloud Segmentation | Hsien-Yu Meng, Lin Gao, Yu-Kun Lai, Dinesh Manocha | We present a novel algorithm for point cloud segmentation. Our approach transforms unstructured point clouds into regular voxel grids, and further uses a kernel-based interpolated variational autoencoder (VAE) architecture to encode the local geometry within each voxel. |
849 | Group-Wise Deep Object Co-Segmentation With Co-Attention Recurrent Neural Network | Bo Li, Zhengxing Sun, Qian Li, Yunjie Wu, Anqi Hu | This paper proposes a novel end-to-end deep learning approach for group-wise object co-segmentation with a recurrent network architecture. |
850 | Human Attention in Image Captioning: Dataset and Analysis | Sen He, Hamed R. Tavakoli, Ali Borji, Nicolas Pugeault | In this work, we present a novel dataset consisting of eye movements and verbal descriptions recorded synchronously over images. |
851 | Variational Uncalibrated Photometric Stereo Under General Lighting | Bjoern Haefner, Zhenzhang Ye, Maolin Gao, Tao Wu, Yvain Queau, Daniel Cremers | To eliminate such restrictions, we propose an efficient principled variational approach to uncalibrated PS under general illumination. |
852 | SPLINE-Net: Sparse Photometric Stereo Through Lighting Interpolation and Normal Estimation Networks | Qian Zheng, Yiming Jia, Boxin Shi, Xudong Jiang, Ling-Yu Duan, Alex C. Kot | This paper solves the Sparse Photometric stereo through Lighting Interpolation and Normal Estimation using a generative Network (SPLINE-Net). |
853 | Hyperspectral Image Reconstruction Using Deep External and Internal Learning | Tao Zhang, Ying Fu, Lizhi Wang, Hua Huang | In this paper, we present an effective convolutional neural network (CNN) based method for coded HSI reconstruction, which learns the deep prior from the external dataset as well as the internal information of input coded image with spatial-spectral constraint. |
854 | Gravity as a Reference for Estimating a Person’s Height From Video | Didier Bieler, Semih Gunel, Pascal Fua, Helge Rhodin | We focus on motion cues and exploit gravity on earth as an omnipresent reference ‘object’ to translate acceleration, and subsequently height, measured in image-pixels to values in meters. |
855 | Shadow Removal via Shadow Image Decomposition | Hieu Le, Dimitris Samaras | We propose a novel deep learning method for shadow removal. Moreover, we create an augmented ISTD dataset based on an image decomposition system by modifying the shadow parameters to generate new synthetic shadow images. |
856 | OperatorNet: Recovering 3D Shapes From Difference Operators | Ruqi Huang, Marie-Julie Rakotosaona, Panos Achlioptas, Leonidas J. Guibas, Maks Ovsjanikov | This paper proposes a learning-based framework for reconstructing 3D shapes from functional operators, compactly encoded as small-sized matrices. |
857 | Neural Inverse Rendering of an Indoor Scene From a Single Image | Soumyadip Sengupta, Jinwei Gu, Kihwan Kim, Guilin Liu, David W. Jacobs, Jan Kautz | Our key contribution is the Residual Appearance Renderer (RAR), which can be trained to synthesize complex appearance effects (e.g., inter-reflection, cast shadows, near-field illumination, and realistic shading), which would be neglected otherwise. |
858 | ForkNet: Multi-Branch Volumetric Semantic Completion From a Single Depth Image | Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari | We propose a novel model for 3D semantic completion from a single depth image, based on a single encoder and three separate generators used to reconstruct different geometric and semantic representations of the original and completed scene, all sharing the same latent space. |
859 | Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments | Junsheng Zhou, Yuwang Wang, Kaihuai Qin, Wenjun Zeng | To overcome these problems, we propose a new optical-flow based training paradigm which reduces the difficulty of unsupervised learning by providing a clearer training target and handles the non-texture regions. |
860 | GraphX-Convolution for Point Cloud Deformation in 2D-to-3D Conversion | Anh-Duc Nguyen, Seonghwa Choi, Woojae Kim, Sanghoon Lee | In this paper, we present a novel deep method to reconstruct a point cloud of an object from a single still image. |
861 | FrameNet: Learning Local Canonical Frames of 3D Surfaces From a Single RGB Image | Jingwei Huang, Yichao Zhou, Thomas Funkhouser, Leonidas J. Guibas | In this work, we introduce the novel problem of identifying dense canonical 3D coordinate frames from a single RGB image. |
862 | Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense | Yixin Chen, Siyuan Huang, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu | We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic scene parsing and reconstruction—3D estimations of object bounding boxes, camera pose, and room layout, and (ii) 3D human pose estimation. |
863 | MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding | Quan Kong, Ziming Wu, Ziwei Deng, Martin Klinkigt, Bin Tong, Tomokazu Murakami | To address the disadvantage of vision-based modalities and push towards multi/cross modal action understanding, this paper introduces a new large-scale dataset recorded from 20 distinct subjects with seven different types of modalities: RGB videos, keypoints, acceleration, gyroscope, orientation, Wi-Fi and pressure signal. On the basis of our dataset, we propose a novel multi modality distillation model with attention mechanism to realize an adaptive knowledge transfer from sensor-based modalities to vision-based modalities. |
864 | HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization | Hang Zhao, Antonio Torralba, Lorenzo Torresani, Zhicheng Yan | This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos. |
865 | 3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization | Sanath Narayan, Hisham Cholakkal, Fahad Shahbaz Khan, Ling Shao | In this work, we propose a framework, called 3C-Net, which only requires video-level supervision (weak supervision) in the form of action category labels and the corresponding count. |
866 | Grounded Human-Object Interaction Hotspots From Video | Tushar Nagarajan, Christoph Feichtenhofer, Kristen Grauman | We propose an approach to learn human-object interaction “hotspots” directly from video. |
867 | Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition With CNNs | Lei Wang, Piotr Koniusz, Du Q. Huynh | In this paper, we revive the use of old-fashioned handcrafted video representations for action recognition and put new life into these techniques via a CNN-based hallucination step. |
868 | Learning to Paint With Model-Based Deep Reinforcement Learning | Zhewei Huang, Wen Heng, Shuchang Zhou | We show how to teach machines to paint like human painters, who can use a small number of strokes to create fantastic paintings. |
869 | Neural Re-Simulation for Generating Bounces in Single Images | Carlo Innamorati, Bryan Russell, Danny M. Kaufman, Niloy J. Mitra | We introduce a method to generate videos of dynamic virtual objects plausibly interacting via collisions with a still image’s environment. |
870 | Deep Appearance Maps | Maxim Maximov, Laura Leal-Taixe, Mario Fritz, Tobias Ritschel | We propose a deep representation of appearance, i.e. the relation of color, surface orientation, viewer position, material and illumination. |
871 | GarNet: A Two-Stream Network for Fast and Accurate 3D Cloth Draping | Erhan Gundogdu, Victor Constantin, Amrollah Seifoddini, Minh Dang, Mathieu Salzmann, Pascal Fua | Taking advantage of this, we propose a novel architecture to fit a 3D garment template to a 3D body. |
872 | Joint Embedding of 3D Scan and CAD Objects | Manuel Dahnert, Angela Dai, Leonidas J. Guibas, Matthias Niessner | We propose a novel approach to learn a joint embedding space between scan and CAD geometry, where semantically similar objects from both domains lie close together. |
873 | CompoNet: Learning to Generate the Unseen by Part Synthesis and Composition | Nadav Schor, Oren Katzir, Hao Zhang, Daniel Cohen-Or | In our work, we present CompoNet, a generative neural network for 2D or 3D shapes that is based on a part-based prior, where the key idea is for the network to synthesize shapes by varying both the shape parts and their compositions. |
874 | DDSL: Deep Differentiable Simplex Layer for Learning Geometric Signals | Chiyu “Max” Jiang, Dana Lansigan, Philip Marcus, Matthias Niessner | We present a complete theoretical framework for the process as well as an efficient backpropagation algorithm. |
875 | EGNet: Edge Guidance Network for Salient Object Detection | Jia-Xing Zhao, Jiang-Jiang Liu, Deng-Ping Fan, Yang Cao, Jufeng Yang, Ming-Ming Cheng | In this paper, to solve this problem, we focus on the complementarity between salient edge information and salient object information. |
876 | SID4VAM: A Benchmark Dataset With Synthetic Images for Visual Attention Modeling | David Berga, Xose R. Fdez-Vidal, Xavier Otazu, Xose M. Pardo | This study proposes a new way to evaluate saliency models in the forthcoming literature, accounting for synthetic images with uniquely low-level feature contexts, distinct from previous eye tracking image datasets. |
877 | Two-Stream Action Recognition-Oriented Video Super-Resolution | Haochen Zhang, Dong Liu, Zhiwei Xiong | Tailored for two-stream action recognition networks, we propose two video SR methods for the spatial and temporal streams respectively. |
878 | Where Is My Mirror? | Xin Yang, Haiyang Mei, Ke Xu, Xiaopeng Wei, Baocai Yin, Rynson W.H. Lau | In this paper, we present a novel method to segment mirrors from an input image. First, we construct a large-scale mirror dataset that contains mirror images with corresponding manually annotated masks. |
879 | Disentangled Image Matting | Shaofan Cai, Xiaoshuai Zhang, Haoqiang Fan, Haibin Huang, Jiangyu Liu, Jiaming Liu, Jiaying Liu, Jue Wang, Jian Sun | We propose AdaMatting, a new end-to-end matting framework that disentangles this problem into two sub-tasks: trimap adaptation and alpha estimation. |
880 | Guided Super-Resolution As Pixel-to-Pixel Transformation | Riccardo de Lutio, Stefano D’Aronco, Jan Dirk Wegner, Konrad Schindler | Here, we propose to turn that interpretation on its head and instead see it as a pixel-to-pixel mapping of the guide image to the domain of the source image. |
881 | Deep Learning for Light Field Saliency Detection | Tiantian Wang, Yongri Piao, Xiao Li, Lihe Zhang, Huchuan Lu | To address this, we introduce a new dataset to assist the subsequent research in 4D light field saliency detection. |
882 | Optimizing the F-Measure for Threshold-Free Salient Object Detection | Kai Zhao, Shanghua Gao, Wenguan Wang, Ming-Ming Cheng | In this paper, we investigate an interesting issue: can we consistently use the F-measure formulation in both training and evaluation for SOD? |
883 | Image Inpainting With Learnable Bidirectional Attention Maps | Chaohao Xie, Shaohui Liu, Chao Li, Ming-Ming Cheng, Wangmeng Zuo, Xiao Liu, Shilei Wen, Errui Ding | In this paper, we present a learnable attention map module for learning feature re-normalization and mask-updating in an end-to-end manner, which is effective in adapting to irregular holes and propagation of convolution layers. |
884 | Joint Demosaicking and Denoising by Fine-Tuning of Bursts of Raw Images | Thibaud Ehret, Axel Davy, Pablo Arias, Gabriele Facciolo | In this paper we present a method to learn demosaicking directly from mosaicked images, without requiring ground truth RGB data. |
885 | DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better | Orest Kupyn, Tetiana Martyniuk, Junru Wu, Zhangyang Wang | We present a new end-to-end generative adversarial network (GAN) for single image motion deblurring, named DeblurGAN-V2, which considerably boosts state-of-the-art deblurring performance while being much more flexible and efficient. |
886 | Reflective Decoding Network for Image Captioning | Lei Ke, Wenjie Pei, Ruiyu Li, Xiaoyong Shen, Yu-Wing Tai | Following the conventional encoder-decoder framework, we propose the Reflective Decoding Network (RDN) for image captioning, which enhances both the long-sequence dependency and position perception of words in a caption decoder. |
887 | Joint Optimization for Cooperative Image Captioning | Gilad Vered, Gal Oren, Yuval Atzmon, Gal Chechik | To address these challenges, we present an effective optimization technique based on partial-sampling from a multinomial distribution combined with straight-through gradient updates, which we name PSST for Partial-Sampling Straight-Through. |
888 | Watch, Listen and Tell: Multi-Modal Weakly Supervised Dense Event Captioning | Tanzila Rahman, Bicheng Xu, Leonid Sigal | In this paper, we present the evidence, that audio signals can carry surprising amount of information when it comes to high-level visual-lingual tasks. |
889 | Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning | Jingyi Hou, Xinxiao Wu, Wentian Zhao, Jiebo Luo, Yunde Jia | We propose a novel video captioning approach that takes into account both visual perception and syntax representation learning to generate accurate descriptions of videos. |
890 | Entangled Transformer for Image Captioning | Guang Li, Linchao Zhu, Ping Liu, Yi Yang | In this paper, we investigate a Transformer-based sequence modeling framework, built only with attention layers and feedforward layers. |
891 | Shapeglot: Learning Language for Shape Differentiation | Panos Achlioptas, Judy Fan, Robert Hawkins, Noah Goodman, Leonidas J. Guibas | In this work we explore how fine-grained differences between the shapes of common objects are expressed in language, grounded on 2D and/or 3D object representations. We first build a large scale, carefully controlled dataset of human utterances each of which refers to a 2D rendering of a 3D CAD model so as to distinguish it from a set of shape-wise similar alternatives. |
892 | nocaps: novel object captioning at scale | Harsh Agrawal, Karan Desai, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson | To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task. |
893 | Fully Convolutional Geometric Features | Christopher Choy, Jaesik Park, Vladlen Koltun | In this work, we present fully-convolutional geometric features, computed in a single pass by a 3D fully-convolutional network. |
894 | Learning Local RGB-to-CAD Correspondences for Object Pose Estimation | Georgios Georgakis, Srikrishna Karanam, Ziyan Wu, Jana Kosecka | In this paper, we solve this key problem of existing methods requiring expensive 3D pose annotations by proposing a new method that matches RGB images to CAD models for object pose estimation. |
895 | Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras | Ariel Gordon, Hanhan Li, Rico Jonschkowski, Anelia Angelova | We present a novel method for simultaneous learning of depth, egomotion, object motion, and camera intrinsics from monocular videos, using only consistency across neighboring video frames as supervision signal. |
896 | OmniMVS: End-to-End Learning for Omnidirectional Stereo Matching | Changhee Won, Jongbin Ryu, Jongwoo Lim | In this paper, we propose a novel end-to-end deep neural network model for omnidirectional depth estimation from a wide-baseline multi-view stereo setup. In addition, we present large-scale synthetic datasets for training and testing omnidirectional multi-view stereo algorithms. |
897 | On the Over-Smoothing Problem of CNN Based Disparity Estimation | Chuangrong Chen, Xiaozhi Chen, Hui Cheng | Based on this observation, we propose a single-modal weighted average operation on the probability distribution during inference, which can alleviate the problem effectively. |
898 | Disentangling Propagation and Generation for Video Prediction | Hang Gao, Huazhe Xu, Qi-Zhi Cai, Ruth Wang, Fisher Yu, Trevor Darrell | In this paper, we describe a computational model for high-fidelity video prediction which disentangles motion-specific propagation from motion-agnostic generation. |
899 | Guided Image-to-Image Translation With Bi-Directional Feature Transformation | Badour AlBahar, Jia-Bin Huang | To better utilize the constraints of the guidance image, we present a bi-directional feature transformation (bFT) scheme. |
900 | Towards Multi-Pose Guided Virtual Try-On Network | Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bochao Wang, Hanjiang Lai, Jia Zhu, Zhiting Hu, Jian Yin | This paper makes the first attempt towards a multi-pose guided virtual try-on system, which enables clothes to transfer onto a person with diverse poses. |
901 | Photorealistic Style Transfer via Wavelet Transforms | Jaejun Yoo, Youngjung Uh, Sanghyuk Chun, Byeongkyu Kang, Jung-Woo Ha | We introduce a theoretically sound correction to the network architecture that remarkably enhances photorealism and faithfully transfers the style. |
902 | Personalized Fashion Design | Cong Yu, Yang Hu, Yan Chen, Bing Zeng | In this work, we propose to automatically synthesize new items for recommendation. |
903 | Tag2Pix: Line Art Colorization Using Text Tag With SECat and Changing Loss | Hyunsu Kim, Ho Young Jhoo, Eunhyeok Park, Sungjoo Yoo | We propose a GAN approach to line art colorization, called Tag2Pix, which takes as input a grayscale line art and color tag information and produces a quality colored image. |
904 | Free-Form Video Inpainting With 3D Gated Convolution and Temporal PatchGAN | Ya-Liang Chang, Zhe Yu Liu, Kuan-Ying Lee, Winston Hsu | In this paper, we introduce a deep learning based free-form video inpainting model, with proposed 3D gated convolutions to tackle the uncertainty of free-form masks and a novel Temporal PatchGAN loss to enhance temporal consistency. |
905 | TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting | Wei Feng, Wenhao He, Fei Yin, Xu-Yao Zhang, Cheng-Lin Liu | In this paper, we propose a novel text spotting framework to detect and recognize text of arbitrary shapes in an end-to-end manner, using only word/line-level annotations for training. |
906 | Chinese Street View Text: Large-Scale Chinese Text Reading With Partially Supervised Learning | Yipeng Sun, Jiaming Liu, Wei Liu, Junyu Han, Errui Ding, Jingtuo Liu | To address this issue, we introduce a new large-scale text reading benchmark dataset named Chinese Street View Text (C-SVT) with 430,000 street view images, which is at least 14 times as large as the existing Chinese text reading benchmarks. |
907 | Deep Floor Plan Recognition Using a Multi-Task Network With Room-Boundary-Guided Attention | Zhiliang Zeng, Xianzhi Li, Ying Kin Yu, Chi-Wing Fu | This paper presents a new approach to recognize elements in floor plan layouts. |
908 | GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition | Fangneng Zhan, Chuhui Xue, Shijian Lu | This paper presents an innovative Geometry-Aware Domain Adaptation Network (GA-DAN) that is capable of modelling cross-domain shifts concurrently in both geometry space and appearance space and realistically converting images across domains with very different characteristics. |
909 | Large-Scale Tag-Based Font Retrieval With Generative Feature Learning | Tianlang Chen, Zhaowen Wang, Ning Xu, Hailin Jin, Jiebo Luo | In this paper, we address the problem of large-scale tag-based font retrieval which aims to bring semantics to the font selection process and enable people without expert knowledge to use fonts effectively. |
910 | Convolutional Character Networks | Linjie Xing, Zhi Tian, Weilin Huang, Matthew R. Scott | In this work, we propose convolutional character networks, referred to as CharNet, which is a one-stage model that can process two tasks simultaneously in one pass. |
911 | Geometry Normalization Networks for Accurate Scene Text Detection | Youjiang Xu, Jiaqi Duan, Zhanghui Kuang, Xiaoyu Yue, Hongbin Sun, Yue Guan, Wayne Zhang | In this work, we first conduct experiments to investigate the capacity of networks for learning geometry variances on detecting scene texts, and find that networks can handle only limited text geometry variances. Then, we put forward a novel Geometry Normalization Module (GNM) with multiple branches, each of which is composed of one Scale Normalization Unit and one Orientation Normalization Unit, to normalize each text instance to one desired canonical geometry range through at least one branch. |
912 | Symmetry-Constrained Rectification Network for Scene Text Recognition | Mingkun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai | To tackle this issue, we propose in this paper a Symmetry-constrained Rectification Network (ScRN) based on local attributes of text instances, such as center line, scale and orientation. |
913 | YOLACT: Real-Time Instance Segmentation | Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee | We present a simple, fully-convolutional model for real-time instance segmentation that achieves 29.8 mAP on MS COCO at 33.5 fps evaluated on a single Titan Xp, which is significantly faster than any previous competitive approach. |
914 | Expectation-Maximization Attention Networks for Semantic Segmentation | Xia Li, Zhisheng Zhong, Jianlong Wu, Yibo Yang, Zhouchen Lin, Hong Liu | In this paper, we formulate the attention mechanism into an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. |
915 | Multi-Class Part Parsing With Joint Boundary-Semantic Awareness | Yifan Zhao, Jia Li, Yu Zhang, Yonghong Tian | In this paper, we propose a joint parsing framework with boundary and semantic awareness to address this challenging problem. |
916 | Explaining Neural Networks Semantically and Quantitatively | Runjin Chen, Hao Chen, Jie Ren, Ge Huang, Quanshi Zhang | This paper presents a method to pursue a semantic and quantitative explanation for the knowledge encoded in a convolutional neural network (CNN). |
917 | PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment | Kaixin Wang, Jun Hao Liew, Yingtian Zou, Daquan Zhou, Jiashi Feng | In this paper, we tackle the challenging few-shot segmentation problem from a metric learning perspective and present PANet, a novel prototype alignment network to better utilize the information of the support set. |
918 | ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors | Weicheng Kuo, Anelia Angelova, Jitendra Malik, Tsung-Yi Lin | We introduce ShapeMask, which learns the intermediate concept of object shape to address the problem of generalization in instance segmentation to novel categories. |
919 | Sequence Level Semantics Aggregation for Video Object Detection | Haiping Wu, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang | In this work, we argue that aggregating features in the full-sequence level will lead to more discriminative and robust features for video object detection. |
920 | Video Object Segmentation Using Space-Time Memory Networks | Seoung Wug Oh, Joon-Young Lee, Ning Xu, Seon Joo Kim | We propose a novel solution for semi-supervised video object segmentation. |
921 | Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks | Wenguan Wang, Xiankai Lu, Jianbing Shen, David J. Crandall, Ling Shao | This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS). |
922 | MeteorNet: Deep Learning on Dynamic 3D Point Cloud Sequences | Xingyu Liu, Mengyuan Yan, Jeannette Bohg | We propose a novel neural network architecture called MeteorNet for learning representations for dynamic 3D point cloud sequences. |
923 | 3D Instance Segmentation via Multi-Task Metric Learning | Jean Lahoud, Bernard Ghanem, Marc Pollefeys, Martin R. Oswald | We propose a novel method for instance label segmentation of dense 3D voxel grids. |
924 | DeepGCNs: Can GCNs Go As Deep As CNNs? | Guohao Li, Matthias Muller, Ali Thabet, Bernard Ghanem | In this work, we present new ways to successfully train very deep GCNs. |
925 | Deep Hough Voting for 3D Object Detection in Point Clouds | Charles R. Qi, Or Litany, Kaiming He, Leonidas J. Guibas | In this work, we return to first principles to construct a 3D detection pipeline for point cloud data that is as generic as possible. |
926 | M3D-RPN: Monocular 3D Region Proposal Network for Object Detection | Garrick Brazil, Xiaoming Liu | We propose to reduce the gap by reformulating the monocular 3D detection problem as a standalone 3D region proposal network. |
927 | SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences | Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, Jurgen Gall | In this paper, we introduce a large dataset to propel research on laser-based semantic segmentation. |
928 | WoodScape: A Multi-Task, Multi-Camera Fisheye Dataset for Autonomous Driving | Senthil Yogamani, Ciaran Hughes, Jonathan Horgan, Ganesh Sistu, Padraig Varley, Derek O’Dea, Michal Uricar, Stefan Milz, Martin Simon, Karl Amende, Christian Witt, Hazem Rashed, Sumanth Chennupati, Sanjaya Nayak, Saquib Mansoor, Xavier Perrotton, Patrick Perez | We release the first extensive fisheye automotive dataset, WoodScape, named after Robert Wood, who invented the fisheye camera in 1906. |
929 | Scalable Place Recognition Under Appearance Change for Autonomous Driving | Anh-Dzung Doan, Yasir Latif, Tat-Jun Chin, Yu Liu, Thanh-Toan Do, Ian Reid | To this end, we propose a novel place recognition technique that can be efficiently retrained and compressed, such that the recognition of new queries can exploit all available data (including recent changes) without suffering from visible growth in computational cost. |
930 | Exploring the Limitations of Behavior Cloning for Autonomous Driving | Felipe Codevilla, Eder Santana, Antonio M. Lopez, Adrien Gaidon | In this paper, we propose a new benchmark to experimentally investigate the scalability and limitations of behavior cloning. |
931 | Habitat: A Platform for Embodied AI Research | Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra | We present Habitat, a platform for research in embodied artificial intelligence (AI). |
932 | Towards Interpretable Face Recognition | Bangjie Yin, Luan Tran, Haoxiang Li, Xiaohui Shen, Xiaoming Liu | In this work, focusing on a specific area of visual recognition, we report our efforts towards interpretable face recognition. |
933 | Co-Mining: Deep Face Recognition With Noisy Labels | Xiaobo Wang, Shuo Wang, Jun Wang, Hailin Shi, Tao Mei | To address this issue, this paper develops a novel co-mining strategy to effectively train on the datasets with noisy labels. |
934 | Few-Shot Adaptive Gaze Estimation | Seonwook Park, Shalini De Mello, Pavlo Molchanov, Umar Iqbal, Otmar Hilliges, Jan Kautz | We embrace these challenges and propose a novel framework for Few-shot Adaptive GaZE Estimation (Faze) for learning person-specific gaze networks with very few (<= 9) calibration samples. |
935 | Live Face De-Identification in Video | Oran Gafni, Lior Wolf, Yaniv Taigman | We propose a method for face de-identification that enables fully automatic video modification at high frame rates. |
936 | Face Video Deblurring Using 3D Facial Priors | Wenqi Ren, Jiaolong Yang, Senyou Deng, David Wipf, Xiaochun Cao, Xin Tong | In this paper we propose a novel face video deblurring network capitalizing on 3D facial priors. |
937 | Semi-Supervised Monocular 3D Face Reconstruction With End-to-End Shape-Preserved Domain Transfer | Jingtan Piao, Chen Qian, Hongsheng Li | To tackle this problem, we propose a semi-supervised monocular reconstruction method, which jointly optimizes a shape-preserved domain-transfer CycleGAN and a shape estimation network. |
938 | 3D Face Modeling From Diverse Raw Scan Data | Feng Liu, Luan Tran, Xiaoming Liu | To address these problems, this paper proposes an innovative framework to jointly learn a nonlinear face model from a diverse set of raw 3D scan databases and establish dense point-to-point correspondence among their scans. |
939 | A Decoupled 3D Facial Shape Model by Adversarial Training | Victoria Fernandez Abrevaya, Adnane Boukhayma, Stefanie Wuhrer, Edmond Boyer | In this work, we explore a new direction with Generative Adversarial Networks and show that they contribute to better face modeling performances, especially in decoupling natural factors, while also achieving more diverse samples. |
940 | Photo-Realistic Facial Details Synthesis From Single Image | Anpei Chen, Zhang Chen, Guli Zhang, Kenny Mitchell, Jingyi Yu | We present a single-image 3D face synthesis technique that can handle challenging facial expressions while recovering fine geometric details. |
941 | S2GAN: Share Aging Factors Across Ages and Share Aging Trends Among Individuals | Zhenliang He, Meina Kan, Shiguang Shan, Xilin Chen | Following this biological principle, in this work, we propose an effective and efficient method to simulate natural aging. |
942 | PuppetGAN: Cross-Domain Image Manipulation by Demonstration | Ben Usman, Nick Dufour, Kate Saenko, Chris Bregler | In this work we propose a model that can manipulate individual visual attributes of objects in a real scene using examples of how respective attribute manipulations affect the output of a simulation. |
943 | Few-Shot Adversarial Learning of Realistic Neural Talking Head Models | Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, Victor Lempitsky | However, in many practical scenarios, such personalized talking head models need to be learned from a few image views of a person, potentially even a single image. Here, we present a system with such few-shot capability. |
944 | Pose-Aware Multi-Level Feature Network for Human Object Interaction Detection | Bo Wan, Desen Zhou, Yongfei Liu, Rongjie Li, Xuming He | To address those challenges, we propose a multi-level relation detection strategy that utilizes human pose cues both to capture the global spatial configurations of relations and as an attention mechanism to dynamically zoom into relevant regions at the human part level. |
945 | TRB: A Novel Triplet Representation for Understanding 2D Human Body | Haodong Duan, Kwan-Yee Lin, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang | In this paper, we propose the Triplet Representation for Body (TRB) — a compact 2D human body representation, with skeleton keypoints capturing human pose information and contour keypoints containing human shape information. |
946 | Learning Trajectory Dependencies for Human Motion Prediction | Wei Mao, Miaomiao Liu, Mathieu Salzmann, Hongdong Li | In this paper, we propose a simple feed-forward deep network for motion prediction, which takes into account both temporal smoothness and spatial dependencies among human body joints. |
947 | Cross-Domain Adaptation for Animal Pose Estimation | Jinkun Cao, Hongyang Tang, Hao-Shu Fang, Xiaoyong Shen, Cewu Lu, Yu-Wing Tai | In this paper, we are interested in pose estimation of animals. |
948 | NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection | Jiyang Gao, Jiang Wang, Shengyang Dai, Li-Jia Li, Ram Nevatia | To reduce the dependence on expensive bounding box annotations, we propose a new semi-supervised object detection formulation, in which a few seed box level annotations and a large scale of image level annotations are used to train the detector. |
949 | Unsupervised Out-of-Distribution Detection by Maximum Classifier Discrepancy | Qing Yu, Kiyoharu Aizawa | In this work, we propose a two-head deep convolutional neural network (CNN) and maximize the discrepancy between the two classifiers to detect OOD inputs. |
950 | SBSGAN: Suppression of Inter-Domain Background Shift for Person Re-Identification | Yan Huang, Qiang Wu, JingSong Xu, Yi Zhong | In this paper, we formulate such problems as a background shift problem. |
951 | Enriched Feature Guided Refinement Network for Object Detection | Jing Nie, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao | We propose a single-stage detection framework that jointly tackles the problem of multi-scale object detection and class imbalance. |
952 | Deep Meta Metric Learning | Guangyi Chen, Tianren Zhang, Jiwen Lu, Jie Zhou | In this paper, we present a deep meta metric learning (DMML) approach for visual recognition. |
953 | Discriminative Feature Transformation for Occluded Pedestrian Detection | Chunluan Zhou, Ming Yang, Junsong Yuan | In this paper, we propose a discriminative feature transformation which enforces feature separability of pedestrian and non-pedestrian examples to handle occlusions for pedestrian detection. |
954 | Contextual Attention for Hand Detection in the Wild | Supreeth Narasimhaswamy, Zhengwei Wei, Yang Wang, Justin Zhang, Minh Hoai | We present Hand-CNN, a novel convolutional network architecture for detecting hand masks and predicting hand orientations in unconstrained images. We also introduce large-scale annotated hand datasets containing hands in unconstrained images for training and evaluation. |
955 | Meta R-CNN: Towards General Solver for Instance-Level Low-Shot Learning | Xiaopeng Yan, Ziliang Chen, Anni Xu, Xiaoxi Wang, Xiaodan Liang, Liang Lin | In this work, we present a flexible and general methodology to achieve these tasks. |
956 | Pyramid Graph Networks With Connection Attentions for Region-Based One-Shot Semantic Segmentation | Chi Zhang, Guosheng Lin, Fayao Liu, Jiushuang Guo, Qingyao Wu, Rui Yao | In this paper, we propose to model structured segmentation data with graphs and apply attentive graph reasoning to propagate label information from support data to query data. |
957 | Presence-Only Geographical Priors for Fine-Grained Image Classification | Oisin Mac Aodha, Elijah Cole, Pietro Perona | We propose an efficient spatio-temporal prior, that when conditioned on a geographical location and time, estimates the probability that a given object category occurs at that location. |
958 | POD: Practical Object Detection With Scale-Sensitive Network | Junran Peng, Ming Sun, Zhaoxiang Zhang, Tieniu Tan, Junjie Yan | For fast deployment, we propose a scale decomposition method that transfers the robust fractional scale into combinations of fixed integral scales for each convolution filter, which exploits dilated convolution. |
959 | Human Uncertainty Makes Classification More Robust | Joshua C. Peterson, Ruairidh M. Battleday, Thomas L. Griffiths, Olga Russakovsky | In this paper, we make progress on this problem by training with full label distributions that reflect human perceptual uncertainty. We first present a new benchmark dataset which we call CIFAR10H, containing a full distribution of human labels for each image of the CIFAR10 test set. |
960 | FCOS: Fully Convolutional One-Stage Object Detection | Zhi Tian, Chunhua Shen, Hao Chen, Tong He | We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogous to semantic segmentation. |
961 | Self-Critical Attention Learning for Person Re-Identification | Guangyi Chen, Chunze Lin, Liangliang Ren, Jiwen Lu, Jie Zhou | In this paper, we propose a self-critical attention learning method for person re-identification. |
962 | Temporal Knowledge Propagation for Image-to-Video Person Re-Identification | Xinqian Gu, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen | To solve this problem, we propose a novel Temporal Knowledge Propagation (TKP) method which propagates the temporal knowledge learned by the video representation network to the image representation network. |
963 | RepPoints: Point Set Representation for Object Detection | Ze Yang, Shaohui Liu, Han Hu, Liwei Wang, Stephen Lin | In this paper, we present RepPoints (representative points), a new finer representation of objects as a set of sample points useful for both localization and recognition. |
964 | SegEQA: Video Segmentation Based Visual Attention for Embodied Question Answering | Haonan Luo, Guosheng Lin, Zichuan Liu, Fayao Liu, Zhenmin Tang, Yazhou Yao | To tackle these problems, we propose a segmentation based visual attention mechanism for Embodied Question Answering. |
965 | No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques | Tanmay Gupta, Alexander Schwing, Derek Hoiem | We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches. |
966 | Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection | Keren Ye, Mingda Zhang, Adriana Kovashka, Wei Li, Danfeng Qin, Jesse Berent | Instead, we show how to squeeze the most information out of these captions by training a text-only classifier that generalizes beyond dataset boundaries. Our discovery provides an opportunity for learning detection models from noisy but more abundant and freely-available caption data. |
967 | No Fear of the Dark: Image Retrieval Under Varying Illumination Conditions | Tomas Jenicek, Ondrej Chum | We propose a learnable normalisation based on the U-Net architecture, which is trained on a combination of single-camera multi-exposure images and a newly constructed collection of similar views of landmarks during day and night. |
968 | Hierarchical Shot Detector | Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li | To solve the first problem, a novel reg-offset-cls (ROC) module is proposed. It contains three hierarchical steps: box regression, feature sampling location prediction, and classification of the regressed box with the features at the offset locations. To further solve the second problem, a hierarchical shot detector (HSD) is proposed, which stacks two ROC modules and one feature enhanced module. |
969 | Few-Shot Learning With Global Class Representations | Aoxue Li, Tiange Luo, Tao Xiang, Weiran Huang, Liwei Wang | In this paper, we propose to tackle the challenging few-shot learning (FSL) problem by learning global class representations using both base and novel class training samples. |
970 | Better to Follow, Follow to Be Better: Towards Precise Supervision of Feature Super-Resolution for Small Object Detection | Junhyug Noh, Wonho Bae, Wonhee Lee, Jinhwan Seo, Gunhee Kim | We propose a novel feature-level super-resolution approach that not only correctly addresses these two desiderata but also is integrable with any proposal-based detectors with feature pooling. |
971 | Weakly Supervised Object Detection With Segmentation Collaboration | Xiaoyan Li, Meina Kan, Shiguang Shan, Xilin Chen | To obtain a more accurate detector, in this work we propose a novel end-to-end weakly supervised detection approach, where a newly introduced generative adversarial segmentation module interacts with the conventional detection module in a collaborative loop. |
972 | AutoFocus: Efficient Multi-Scale Inference | Mahyar Najibi, Bharat Singh, Larry S. Davis | This paper describes AutoFocus, an efficient multi-scale inference algorithm for deep-learning based object detectors. |
973 | Leveraging Long-Range Temporal Relationships Between Proposals for Video Object Detection | Mykhailo Shvets, Wei Liu, Alexander C. Berg | In this paper, we present a light-weight modification to a single-frame detector that accounts for arbitrary long dependencies in a video. |
974 | Transferable Contrastive Network for Generalized Zero-Shot Learning | Huajie Jiang, Ruiping Wang, Shiguang Shan, Xilin Chen | To tackle this problem, we propose a novel Transferable Contrastive Network (TCN) that explicitly transfers knowledge from the source classes to the target classes. |
975 | Fast Point R-CNN | Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia | We present a unified, efficient and effective framework for point-cloud based 3D object detection. |
976 | Mesh R-CNN | Georgia Gkioxari, Jitendra Malik, Justin Johnson | We propose a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object. |
977 | Deep Supervised Hashing With Anchor Graph | Yudong Chen, Zhihui Lai, Yujuan Ding, Kaiyi Lin, Wai Keung Wong | To address these problems, this paper proposes an interesting regularized deep model to seamlessly integrate the advantages of deep hashing and efficient binary code learning by using the anchor graph. |
978 | Detecting 11K Classes: Large Scale Object Detection Without Fine-Grained Bounding Boxes | Hao Yang, Hao Wu, Hao Chen | In this paper, we propose a semi-supervised large scale fine-grained detection method, which only needs bounding box annotations of a smaller number of coarse-grained classes and image-level labels of large scale fine-grained classes, and can detect all classes at nearly fully-supervised accuracy. |
979 | Re-ID Driven Localization Refinement for Person Search | Chuchu Han, Jiacheng Ye, Yunshan Zhong, Xin Tan, Chi Zhang, Changxin Gao, Nong Sang | To alleviate this issue, we propose a re-ID driven localization refinement framework that provides refined detection boxes for person search. |
980 | Hierarchical Encoding of Sequential Data With Compact and Sub-Linear Storage Cost | Huu Le, Ming Xu, Tuan Hoang, Michael Milford | To address these limitations, in this paper we present a totally new hierarchical encoding approach that enables a sub-linear storage scale. |
981 | C-MIDN: Coupled Multiple Instance Detection Network With Segmentation Guidance for Weakly Supervised Object Detection | Yan Gao, Boxiao Liu, Nan Guo, Xiaochun Ye, Fang Wan, Haihang You, Dongrui Fan | In this paper, we propose a novel Coupled Multiple Instance Detection Network (C-MIDN) to address this problem. |
982 | Learning Feature-to-Feature Translator by Alternating Back-Propagation for Generative Zero-Shot Learning | Yizhe Zhu, Jianwen Xie, Bingchen Liu, Ahmed Elgammal | We conduct extensive comparisons with existing generative ZSL methods on five benchmarks, demonstrating the superiority of our method in not only ZSL performance but also convergence speed and computational cost. |
983 | Deep Constrained Dominant Sets for Person Re-Identification | Leulseged Tesfaye Alemu, Marcello Pelillo, Mubarak Shah | In this work, we propose an end-to-end constrained clustering scheme to tackle the person re-identification (re-id) problem. |
984 | Invariant Information Clustering for Unsupervised Image Classification and Segmentation | Xu Ji, Joao F. Henriques, Andrea Vedaldi | We present a novel clustering objective that learns a neural network classifier from scratch, given only unlabelled data samples. |
985 | Subspace Structure-Aware Spectral Clustering for Robust Subspace Clustering | Masataka Yamaguchi, Go Irie, Takahito Kawanishi, Kunio Kashino | In this paper, we propose a novel graph clustering framework for robust subspace clustering. |
986 | Order-Preserving Wasserstein Discriminant Analysis | Bing Su, Jiahuan Zhou, Ying Wu | This paper presents a linear method, namely Order-preserving Wasserstein Discriminant Analysis (OWDA), which learns the projection by maximizing the inter-class distance and minimizing the intra-class scatter. |
987 | LayoutVAE: Stochastic Scene Layout Generation From a Label Set | Akash Abdu Jyothi, Thibaut Durand, Jiawei He, Leonid Sigal, Greg Mori | We propose LayoutVAE, a variational autoencoder based framework for generating stochastic scene layouts. |
988 | Robust Variational Bayesian Point Set Registration | Jie Zhou, Xinke Ma, Li Liang, Yang Yang, Shijin Xu, Yuhe Liu, Sim-Heng Ong | In this work, we propose a hierarchical Bayesian network based point set registration method to handle missing correspondences and massive outliers. |
989 | Is an Affine Constraint Needed for Affine Subspace Clustering? | Chong You, Chun-Guang Li, Daniel P. Robinson, Rene Vidal | This paper shows, both theoretically and empirically, that when the dimension of the ambient space is high relative to the sum of the dimensions of the affine subspaces, the affine constraint has a negligible effect on clustering performance. |
990 | Meta-Learning to Detect Rare Objects | Yu-Xiong Wang, Deva Ramanan, Martial Hebert | Our key insight is to disentangle the learning of category-agnostic and category-specific components in a CNN based detection model. |
991 | New Convex Relaxations for MRF Inference With Unknown Graphs | Zhenhua Wang, Tong Liu, Qinfeng Shi, M. Pawan Kumar, Jianhua Zhang | We propose two novel relaxations for solving this problem. The first is a linear programming (LP) relaxation, which is provably tighter than the existing LP relaxation. The second is a non-convex quadratic programming (QP) relaxation, which admits an efficient concave-convex procedure (CCCP). |
992 | Cluster Alignment With a Teacher for Unsupervised Domain Adaptation | Zhijie Deng, Yucen Luo, Jun Zhu | In this paper, we propose Cluster Alignment with a Teacher (CAT) for unsupervised domain adaptation, which can effectively incorporate the discriminative clustering structures in both domains for better adaptation. |
993 | Analyzing the Variety Loss in the Context of Probabilistic Trajectory Prediction | Luca Anthony Thiede, Pratik Prabhanjan Brahma | In this work, we present a proof to show that the MoN loss does not lead to the ground truth probability density function, but approximately to its square root instead. |
994 | Deep Mesh Reconstruction From Single RGB Images via Topology Modification Networks | Junyi Pan, Xiaoguang Han, Weikai Chen, Jiapeng Tang, Kui Jia | In this paper, we present an end-to-end single-view mesh reconstruction framework that is able to generate high-quality meshes with complex topologies from a single genus-0 template mesh. |
995 | UprightNet: Geometry-Aware Camera Orientation Estimation From Single Images | Wenqi Xian, Zhengqi Li, Matthew Fisher, Jonathan Eisenmann, Eli Shechtman, Noah Snavely | We introduce UprightNet, a learning-based approach for estimating 2DoF camera orientation from a single RGB image of an indoor scene. |
996 | Escaping Plato’s Cave: 3D Shape From Adversarial Rendering | Philipp Henzler, Niloy J. Mitra, Tobias Ritschel | We introduce PlatonicGAN to discover the 3D structure of an object class from an unstructured collection of 2D images, i.e., where no relation between photos is known, except that they are showing instances of the same category. |
997 | Deep End-to-End Alignment and Refinement for Time-of-Flight RGB-D Module | Di Qiu, Jiahao Pang, Wenxiu Sun, Chengxi Yang | In this work, we propose a framework for joint alignment and refinement via deep learning. |
998 | GEOBIT: A Geodesic-Based Binary Descriptor Invariant to Non-Rigid Deformations for RGB-D Images | Erickson R. Nascimento, Guilherme Potje, Renato Martins, Felipe Cadar, Mario F. M. Campos, Ruzena Bajcsy | In this paper, we introduce a novel binary RGB-D descriptor invariant to isometric deformations. We also provide to the community a new dataset composed of annotated RGB-D images of different objects (shirts, cloths, paintings, bags), subjected to strong non-rigid deformations, to evaluate point correspondence algorithms. |
999 | CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark | Alan Lukezic, Ugur Kart, Jani Kapyla, Ahmed Durmush, Joni-Kristian Kamarainen, Jiri Matas, Matej Kristan | We propose a new color-and-depth general visual object tracking benchmark (CDTB). |
1000 | Learning Joint 2D-3D Representations for Depth Completion | Yun Chen, Bin Yang, Ming Liang, Raquel Urtasun | In this paper, we tackle the problem of depth completion from RGBD data. |
1001 | Make a Face: Towards Arbitrary High Fidelity Face Manipulation | Shengju Qian, Kwan-Yee Lin, Wayne Wu, Yangxiaokang Liu, Quan Wang, Fumin Shen, Chen Qian, Ran He | In this work, we propose Additive Focal Variational Auto-encoder (AF-VAE), a novel approach that can arbitrarily manipulate high-resolution face images using a simple yet effective model and only weak supervision of reconstruction and KL divergence losses. |
1002 | M2FPA: A Multi-Yaw Multi-Pitch High-Quality Dataset and Benchmark for Facial Pose Analysis | Peipei Li, Xiang Wu, Yibo Hu, Ran He, Zhenan Sun | In this paper, a new large-scale Multi-yaw Multi-pitch high-quality database is proposed for Facial Pose Analysis (M2FPA), including face frontalization, face rotation, facial pose estimation and pose-invariant face recognition. |
1003 | Fair Loss: Margin-Aware Reinforcement Learning for Deep Face Recognition | Bingyu Liu, Weihong Deng, Yaoyao Zhong, Mei Wang, Jiani Hu, Xunqiang Tao, Yaohai Huang | In this paper, we introduce a new margin-aware reinforcement learning based loss function, namely fair loss, in which each class will learn an appropriate adaptive margin by Deep Q-learning. |
1004 | Face De-Occlusion Using 3D Morphable Model and Generative Adversarial Network | Xiaowei Yuan, In Kyu Park | In this paper, a novel method is proposed to restore occluded face images based on the inverse use of a 3DMM and a generative adversarial network. |
1005 | Detecting Photoshopped Faces by Scripting Photoshop | Sheng-Yu Wang, Oliver Wang, Andrew Owens, Richard Zhang, Alexei A. Efros | We present a method for detecting one very popular Photoshop manipulation — image warping applied to human faces — using a model trained entirely using fake images that were automatically generated by scripting Photoshop itself. |
1006 | Ego-Pose Estimation and Forecasting As Real-Time PD Control | Ye Yuan, Kris Kitani | We propose the use of a proportional-derivative (PD) control based policy learned via reinforcement learning (RL) to estimate and forecast 3D human pose from egocentric videos. |
1007 | End-to-End Learning for Graph Decomposition | Jie Song, Bjoern Andres, Michael J. Black, Otmar Hilliges, Siyu Tang | In this paper, we study how to connect deep networks with graph decomposition into an end-to-end trainable framework. |
1008 | Laplace Landmark Localization | Joseph P. Robinson, Yuncheng Li, Ning Zhang, Yun Fu, Sergey Tulyakov | To address both issues, we propose an adversarial training framework that leverages unlabeled data to improve model performance. |
1009 | Through-Wall Human Mesh Recovery Using Radio Signals | Mingmin Zhao, Yingcheng Liu, Aniruddh Raghu, Tianhong Li, Hang Zhao, Antonio Torralba, Dina Katabi | This paper presents RF-Avatar, a neural network model that can estimate 3D meshes of the human body in the presence of occlusions, baggy clothes, and bad lighting conditions. |
1010 | Discriminatively Learned Convex Models for Set Based Face Recognition | Hakan Cevikalp, Golara Ghorban Dordinejad | In contrast to these methods, this paper introduces a novel method that searches for discriminative convex models that best fit an individual’s face images while remaining as far as possible from the images of other persons in the gallery. |
1011 | Camera Distance-Aware Top-Down Approach for 3D Multi-Person Pose Estimation From a Single RGB Image | Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee | In this work, we propose the first fully learning-based, camera distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image. |
1012 | Context-Aware Emotion Recognition Networks | Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, Kwanghoon Sohn | We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expression but also context information in a joint and boosting manner. |
1013 | Aggregation via Separation: Boosting Facial Landmark Detector With Semi-Supervised Style Translation | Shengju Qian, Keqiang Sun, Wayne Wu, Chen Qian, Jiaya Jia | In this paper, we investigate a new perspective of facial landmark detection and demonstrate it leads to further notable improvement. |
1014 | Deep Head Pose Estimation Using Synthetic Images and Partial Adversarial Domain Adaption for Continuous Label Spaces | Felix Kuhnke, Jorn Ostermann | More precisely, we adapt the predominant weighting approaches to continuous label spaces by applying a weighted resampling of the source domain during training. |
1015 | Flare in Interference-Based Hyperspectral Cameras | Eden Sassoon, Yoav Y. Schechner, Tali Treibitz | We present a theoretical image formation model for this effect, and quantify it through simulations and experiments. |
1016 | Computational Hyperspectral Imaging Based on Dimension-Discriminative Low-Rank Tensor Recovery | Shipeng Zhang, Lizhi Wang, Ying Fu, Xiaoming Zhong, Hua Huang | In this paper, we propose to make full use of the high-dimensional structure of the desired HSI to boost the reconstruction quality. |
1017 | Deep Optics for Monocular Depth Estimation and 3D Object Detection | Julie Chang, Gordon Wetzstein | Here we introduce the paradigm of deep optics, i.e. end-to-end design of optics and image processing, to the monocular depth estimation problem, using coded defocus blur as an additional depth cue to be decoded by a neural network. |
1018 | Physics-Based Rendering for Improving Robustness to Rain | Shirsendu Sukanta Halder, Jean-Francois Lalonde, Raoul de Charette | To improve the robustness to rain, we present a physically-based rain rendering pipeline for realistically inserting rain into clear weather images. |
1019 | ARGAN: Attentive Recurrent Generative Adversarial Network for Shadow Detection and Removal | Bin Ding, Chengjiang Long, Ling Zhang, Chunxia Xiao | In this paper, we propose an attentive recurrent generative adversarial network (ARGAN) to detect and remove shadows in an image. |
1020 | Deep Tensor ADMM-Net for Snapshot Compressive Imaging | Jiawei Ma, Xiao-Yang Liu, Zheng Shou, Xin Yuan | In this paper, we propose a deep tensor ADMM-Net for video SCI systems that provides high-quality decoding in seconds. |
1021 | Convex Relaxations for Consensus and Non-Minimal Problems in 3D Vision | Thomas Probst, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool | In this paper, we formulate a generic non-minimal solver using the existing tools of Polynomial Optimization Problems (POP) from computational algebraic geometry. |
1022 | Pareto Meets Huber: Efficiently Avoiding Poor Minima in Robust Estimation | Christopher Zach, Guillaume Bourmaud | In this paper, we propose a novel algorithm relying on multi-objective optimization which allows us to satisfy both properties. |
1023 | K-Best Transformation Synchronization | Yifan Sun, Jiacheng Zhuo, Arnav Mohan, Qixing Huang | In this paper, we introduce the problem of K-best transformation synchronization for the purpose of multiple scan matching. |
1024 | Parametric Majorization for Data-Driven Energy Minimization Methods | Jonas Geiping, Michael Moeller | In this work, we present a new strategy to optimize these bi-level problems. |
1025 | A Bayesian Optimization Framework for Neural Network Compression | Xingchen Ma, Amal Rannen Triki, Maxim Berman, Christos Sagonas, Jacques Cali, Matthew B. Blaschko | In this work, we develop a general Bayesian optimization framework for optimizing functions that are computed based on U-statistics. |
1026 | HiPPI: Higher-Order Projected Power Iterations for Scalable Multi-Matching | Florian Bernard, Johan Thunberg, Paul Swoboda, Christian Theobalt | We address these shortcomings by introducing a Higher-order Projected Power Iteration method, which is (i) efficient and scales to tens of thousands of points, (ii) straightforward to implement, (iii) able to incorporate geometric consistency, (iv) guarantees cycle-consistent multi-matchings, and (v) comes with theoretical convergence guarantees. |
1027 | Language-Conditioned Graph Networks for Relational Reasoning | Ronghang Hu, Anna Rohrbach, Trevor Darrell, Kate Saenko | In this paper, we take an alternate approach and build contextualized representations for objects in a visual scene to support relational reasoning. |
1028 | Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction | Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, Graham W. Taylor | In this work, we present a recurrent image generation model which takes into account both the generated output up to the current step as well as all past instructions for generation. |
1029 | Relation-Aware Graph Attention Network for Visual Question Answering | Linjie Li, Zhe Gan, Yu Cheng, Jingjing Liu | We propose a Relation-aware Graph Attention Network (ReGAT), which encodes each image into a graph and models multi-type inter-object relations via a graph attention mechanism, to learn question-adaptive relation representations. |
1030 | Unpaired Image Captioning via Scene Graph Alignments | Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang | In this paper, we present a scene graph-based approach for unpaired image captioning. |
1031 | Modeling Inter and Intra-Class Relations in the Triplet Loss for Zero-Shot Learning | Yannick Le Cacheux, Herve Le Borgne, Michel Crucianu | Our approach consists in taking into account both inter-class and intra-class relations, respectively by being more permissive with confusions between similar classes, and by penalizing visual samples which are atypical to their class. |
1032 | Occlusion-Shared and Feature-Separated Network for Occlusion Relationship Reasoning | Rui Lu, Feng Xue, Menghan Zhou, Anlong Ming, Yu Zhou | For the reasons above, we propose the Occlusion-shared and Feature-separated Network (OFNet). |
1033 | Compositional Video Prediction | Yufei Ye, Maneesh Singh, Abhinav Gupta, Shubham Tulsiani | We present an approach for pixel-level future prediction given an input image of a scene. |
1034 | Mixture-Kernel Graph Attention Network for Situation Recognition | Mohammed Suhail, Leonid Sigal | In this paper, we propose a novel mixture-kernel attention graph neural network (GNN) architecture designed to address these challenges. |
1035 | Learning Similarity Conditions Without Explicit Supervision | Reuben Tan, Mariya I. Vasileva, Kate Saenko, Bryan A. Plummer | To address this deficiency, we propose an approach that jointly learns representations for the different similarity conditions and their contributions as a latent variable without explicit supervision. |
1036 | Joint Prediction for Kinematic Trajectories in Vehicle-Pedestrian-Mixed Scenes | Huikun Bi, Zhong Fang, Tianlu Mao, Zhaoqi Wang, Zhigang Deng | In this paper, we tackle this problem using separate LSTMs for heterogeneous vehicles and pedestrians. |
1037 | Learning to Caption Images Through a Lifetime by Asking Questions | Tingke Shen, Amlan Kar, Sanja Fidler | Inspired by a student learning in a classroom, we present an agent that can continuously learn by posing natural language questions to humans. |
1038 | VrR-VG: Refocusing Visually-Relevant Relationships | Yuanzhi Liang, Yalong Bai, Wei Zhang, Xueming Qian, Li Zhu, Tao Mei | To encourage further development in visual relationships, we propose a novel method to mine more valuable relationships by automatically pruning visually-irrelevant relationships. We construct a new scene graph dataset named Visually-Relevant Relationships Dataset (VrR-VG) based on Visual Genome. |
1039 | TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo | Andrea Romanoni, Matteo Matteucci | Assuming the untextured areas to be piecewise planar, in this paper we generate novel PatchMatch hypotheses so as to expand reliable depth estimates into neighboring untextured regions. |
1040 | U4D: Unsupervised 4D Dynamic Scene Understanding | Armin Mustafa, Chris Russell, Adrian Hilton | We introduce the first approach to solve the challenging problem of unsupervised 4D visual scene understanding for complex dynamic scenes with multiple interacting people from multi-view video. |
1041 | Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation | Li Jiang, Hengshuang Zhao, Shu Liu, Xiaoyong Shen, Chi-Wing Fu, Jiaya Jia | To incorporate point features in the edge branch, we establish a hierarchical graph framework, where the graph is initialized from a coarse layer and gradually enriched along the point decoding process. |
1042 | Multi-Angle Point Cloud-VAE: Unsupervised Feature Learning for 3D Point Clouds From Multiple Angles by Joint Self-Reconstruction and Half-to-Half Prediction | Zhizhong Han, Xiyang Wang, Yu-Shen Liu, Matthias Zwicker | To resolve this issue, we propose MAP-VAE to enable the learning of global and local geometry by jointly leveraging global and local self-supervision. |
1043 | P-MVSNet: Learning Patch-Wise Matching Confidence Aggregation for Multi-View Stereo | Keyang Luo, Tao Guan, Lili Ju, Haipeng Huang, Yawei Luo | In this paper, we propose a new end-to-end deep learning network of P-MVSNet for multi-view stereo based on isotropic and anisotropic 3D convolutions. |
1044 | SME-Net: Sparse Motion Estimation for Parametric Video Prediction Through Reinforcement Learning | Yung-Han Ho, Chuan-Yuan Cho, Wen-Hsiao Peng, Guo-Lun Jin | This paper leverages a classic prediction technique, known as parametric overlapped block motion compensation (POBMC), in a reinforcement learning framework for video prediction. |
1045 | ClothFlow: A Flow-Based Model for Clothed Person Generation | Xintong Han, Xiaojun Hu, Weilin Huang, Matthew R. Scott | We present ClothFlow, an appearance-flow-based generative model to synthesize a clothed person for pose-guided person image generation and virtual try-on. |
1046 | LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup | Qiao Gu, Guanzhi Wang, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang | We propose a local adversarial disentangling network (LADN) for facial makeup and de-makeup. |
1047 | Point-to-Point Video Generation | Tsun-Hsuan Wang, Yen-Chi Cheng, Chieh Hubert Lin, Hwann-Tzong Chen, Min Sun | We introduce point-to-point video generation that controls the generation process with two control points: the targeted start- and end-frames. |
1048 | Semantics-Enhanced Adversarial Nets for Text-to-Image Synthesis | Hongchen Tan, Xiuping Liu, Xin Li, Yi Zhang, Baocai Yin | This paper presents a new model, Semantics-enhanced Generative Adversarial Network (SEGAN), for fine-grained text-to-image generation. |
1049 | VTNFP: An Image-Based Virtual Try-On Network With Body and Clothing Feature Preservation | Ruiyun Yu, Xiaoqi Wang, Xiaohui Xie | Here we present a new virtual try-on network, called VTNFP, to synthesize photo-realistic images given the images of a clothed person and a target clothing item. |
1050 | Boundless: Generative Adversarial Networks for Image Extension | Piotr Teterwak, Aaron Sarna, Dilip Krishnan, Aaron Maschinot, David Belanger, Ce Liu, William T. Freeman | We introduce semantic conditioning to the discriminator of a generative adversarial network (GAN), and achieve strong results on image extension with coherent semantics and visually pleasing colors and textures. |
1051 | Image Synthesis From Reconfigurable Layout and Style | Wei Sun, Tianfu Wu | In this paper, we present a layout- and style-based architecture for generative adversarial networks (termed LostGANs) that can be trained end-to-end to generate images from reconfigurable layout and style. |
1052 | Attribute Manipulation Generative Adversarial Networks for Fashion Images | Kenan E. Ak, Joo Hwee Lim, Jo Yew Tham, Ashraf A. Kassim | To address this and other limitations, we introduce Attribute Manipulation Generative Adversarial Networks (AMGAN) for fashion images. |
1053 | Few-Shot Unsupervised Image-to-Image Translation | Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, Jan Kautz | Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, we seek a few-shot, unsupervised image-to-image translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images. |
1054 | Very Long Natural Scenery Image Prediction by Outpainting | Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, Shuicheng Yan | To solve the two problems, we devise some innovative modules, named Skip Horizontal Connection and Recurrent Content Transfer, and integrate them into our designed encoder-decoder structure. |
1055 | Scaling Recurrent Models via Orthogonal Approximations in Tensor Trains | Ronak Mehta, Rudrasis Chakraborty, Yunyang Xiong, Vikas Singh | We describe the “orthogonal” tensor train, and demonstrate its ability to express a standard network layer both theoretically and empirically. |
1056 | A Deep Cybersickness Predictor Based on Brain Signal Analysis for Virtual Reality Contents | Jinwoo Kim, Woojae Kim, Heeseok Oh, Seongmin Lee, Sanghoon Lee | In this paper, we address the above question by developing an electroencephalography (EEG) driven VR cybersickness prediction model. |
1057 | Learning With Unsure Data for Medical Image Diagnosis | Botong Wu, Xinwei Sun, Lingjing Hu, Yizhou Wang | In this paper, we raise the “learning with unsure data” problem, formulate it as an ordinal regression, and propose a unified end-to-end learning framework, which also considers the aforementioned two issues: (i) incorporate cost-sensitive parameters to alleviate the data imbalance problem, and (ii) execute the conservative and aggressive strategies by introducing two parameters in the training procedure. |
1058 | Recursive Cascaded Networks for Unsupervised Medical Image Registration | Shengyu Zhao, Yue Dong, Eric I-Chao Chang, Yan Xu | We present recursive cascaded networks, a general architecture that enables learning deep cascades, for deformable image registration. |
1059 | DUAL-GLOW: Conditional Flow-Based Generative Model for Modality Transfer | Haoliang Sun, Ronak Mehta, Hao H. Zhou, Zhichun Huang, Sterling C. Johnson, Vivek Prabhakaran, Vikas Singh | We present experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset with 826 subjects, and obtain good performance in PET image synthesis, qualitatively and quantitatively better than recent works. |
1060 | Dilated Convolutional Neural Networks for Sequential Manifold-Valued Data | Xingjian Zhen, Rudrasis Chakraborty, Nicholas Vogt, Barbara B. Bendlin, Vikas Singh | Instead of a recurrent model which poses computational/technical issues, and inspired by recent results showing the viability of dilated convolutional models for sequence prediction, we develop a dilated convolutional neural network architecture for this task. |
1061 | Align, Attend and Locate: Chest X-Ray Diagnosis via Contrast Induced Attention Network With Limited Supervision | Jingyu Liu, Gangming Zhao, Yu Fei, Ming Zhang, Yizhou Wang, Yizhou Yu | In this paper, we propose a Contrast Induced Attention Network (CIA-Net), which exploits the highly structured property of chest X-ray images and localizes diseases via contrastive learning on the aligned positive and negative samples. |
1062 | Joint Acne Image Grading and Counting via Label Distribution Learning | Xiaoping Wu, Ni Wen, Jie Liang, Yu-Kun Lai, Dongyu She, Ming-Ming Cheng, Jufeng Yang | In this paper, we address the problem of acne image analysis via Label Distribution Learning (LDL) considering the ambiguous information among acne severity. In addition, we further build the ACNE04 dataset with annotations of acne severity and lesion number of each image for evaluation. |
1063 | An Alarm System for Segmentation Algorithm Based on Shape Model | Fengze Liu, Yingda Xia, Dong Yang, Alan L. Yuille, Daguang Xu | Motivated by this, in this paper, we learn a feature space using the shape information which is a strong prior shared among different datasets and robust to the appearance variation of input data. |
1064 | HistoSegNet: Semantic Segmentation of Histological Tissue Type in Whole Slide Images | Lyndon Chan, Mahdi S. Hosseini, Corwyn Rowsell, Konstantinos N. Plataniotis, Savvas Damaskinos | In this paper, we propose HistoSegNet, a method for semantic segmentation of histological tissue type (HTT). |
1065 | Prior-Aware Neural Network for Partially-Supervised Multi-Organ Segmentation | Yuyin Zhou, Zhe Li, Song Bai, Chong Wang, Xinlei Chen, Mei Han, Elliot Fishman, Alan L. Yuille | To address the background ambiguity in these partially-labeled datasets, we propose Prior-aware Neural Network (PaNN) via explicitly incorporating anatomical priors on abdominal organ sizes, guiding the training process with domain-specific knowledge. |
1066 | CAMEL: A Weakly Supervised Learning Framework for Histopathology Image Segmentation | Gang Xu, Zhigang Song, Zhuo Sun, Calvin Ku, Zhe Yang, Cancheng Liu, Shuhao Wang, Jianpeng Ma, Wei Xu | In this research, we propose CAMEL, a weakly supervised learning framework for histopathology image segmentation using only image-level labels. |
1067 | Conditional Recurrent Flow: Conditional Generation of Longitudinal Samples With Applications to Neuroimaging | Seong Jae Hwang, Zirui Tao, Won Hwa Kim, Vikas Singh | We develop a conditional generative model for longitudinal image datasets based on sequential invertible neural networks. |
1068 | Multi-Stage Pathological Image Classification Using Semantic Segmentation | Shusuke Takahama, Yusuke Kurose, Yusuke Mukuta, Hiroyuki Abe, Masashi Fukayama, Akihiko Yoshizawa, Masanobu Kitagawa, Tatsuya Harada | In this paper, we propose a new model structure combining the patch-based classification model and whole slide-scale segmentation model in order to improve the prediction performance of automatic pathological diagnosis. |
1069 | Semantic-Transferable Weakly-Supervised Endoscopic Lesions Segmentation | Jiahua Dong, Yang Cong, Gan Sun, Dongdong Hou | To better utilize these dependencies, we present a new semantic lesions representation transfer model for weakly-supervised endoscopic lesions segmentation, which can exploit useful knowledge from relevant fully-labeled diseases segmentation task to enhance the performance of target weakly-labeled lesions segmentation task. Finally, we build a new medical endoscopic dataset with 3659 images collected from more than 1100 volunteers. |
1070 | Unsupervised Microvascular Image Segmentation Using an Active Contours Mimicking Neural Network | Shir Gur, Lior Wolf, Lior Golgher, Pablo Blinder | We present a novel deep learning method for unsupervised segmentation of blood vessels. |
1071 | GLAMpoints: Greedily Learned Accurate Match Points | Prune Truong, Stefanos Apostolopoulos, Agata Mosinska, Samuel Stucky, Carlos Ciller, Sandro De Zanet | We introduce a novel CNN-based feature point detector – Greedily Learned Accurate Match Points (GLAMpoints) – learned in a semi-supervised manner. |
1072 | Adversarial Robustness vs. Model Compression, or Both? | Shaokai Ye, Kaidi Xu, Sijia Liu, Hao Cheng, Jan-Henrik Lambrechts, Huan Zhang, Aojun Zhou, Kaisheng Ma, Yanzhi Wang, Xue Lin | This paper proposes a framework of concurrent adversarial training and weight pruning that enables model compression while still preserving the adversarial robustness, and essentially tackles the dilemma of adversarial training. |
1073 | MONET: Multiview Semi-Supervised Keypoint Detection via Epipolar Divergence | Yuan Yao, Yasamin Jafarian, Hyun Soo Park | This paper presents MONET, an end-to-end semi-supervised learning framework for a keypoint detector using multiview image streams. |
1074 | Key.Net: Keypoint Detection by Handcrafted and Learned CNN Filters | Axel Barroso-Laguna, Edgar Riba, Daniel Ponsa, Krystian Mikolajczyk | We introduce a novel approach for keypoint detection task that combines handcrafted and learned CNN filters within a shallow multi-scale architecture. |
1075 | Miss Detection vs. False Alarm: Adversarial Learning for Small Object Segmentation in Infrared Images | Huan Wang, Luping Zhou, Lei Wang | In this paper, we propose a deep adversarial learning framework to improve this situation. |