Most Influential ArXiv (Computer Vision and Pattern Recognition) Papers (2026-04 Version)
The Computer Vision and Pattern Recognition field on arXiv covers image processing, computer vision, pattern recognition, and scene understanding. It roughly corresponds to ACM Subject Classes I.2.10, I.4, and I.5. The Paper Digest Team analyzes all papers published in this field in past years and presents up to 30 of the most influential papers for each year. This ranking list is constructed automatically from citations in both research papers and granted patents, and it is updated frequently to reflect the most recent changes. To find the latest version of this list or the most influential papers from other conferences and journals, please visit the Best Paper Digest page. Note: the most influential papers may or may not include papers that won best paper awards. (Version: 2026-04).
As a pioneer in the field since 2018, Paper Digest has curated thousands of such lists, drawing on years of accumulated data across decades of conferences and research topics. To ensure users never miss a breakthrough, our daily digest service sifts through tens of thousands of new papers, clinical trials, news articles, and community posts every day, delivering only what matters most to your specific interests. Beyond discovery, Paper Digest offers built-in research tools to help users read articles, write articles, get answers, conduct literature reviews, and generate research reports more efficiently.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Most Influential ArXiv (Computer Vision and Pattern Recognition) Papers (2026-04 Version)
| Year | Rank | Paper | Author(s) |
|---|---|---|---|
| 2025 | 1 | Qwen2.5-VL Technical Report (IF: 8). Highlight: We introduce Qwen2.5-VL, the latest flagship model of Qwen vision-language series, which demonstrates significant advancements in both foundational capabilities and innovative functionalities. | SHUAI BAI et al. |
| 2025 | 2 | Wan: Open and Advanced Large-Scale Video Generative Models (IF: 7). Highlight: Openness: We open-source the entire series of Wan, including source code and all models, with the goal of fostering the growth of the video generation community. | TEAM WAN et al. |
| 2025 | 3 | YOLOv12: Attention-Centric Real-Time Object Detectors (IF: 7). Highlight: This paper proposes an attention-centric YOLO framework, namely YOLOv12, that matches the speed of previous CNN-based ones while harnessing the performance benefits of attention mechanisms. | Yunjie Tian; Qixiang Ye; David Doermann |
| 2025 | 4 | InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models (IF: 7). Highlight: We introduce InternVL3, a significant advancement in the InternVL series featuring a native multimodal pre-training paradigm. | JINGUO ZHU et al. |
| 2025 | 5 | VGGT: Visual Geometry Grounded Transformer (IF: 7). Highlight: We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views. | JIANYUAN WANG et al. |
| 2025 | 6 | Model Adaptation: Unsupervised Domain Adaptation Without Source Data (IF: 7). Highlight: In this paper, we investigate a challenging unsupervised domain adaptation setting: unsupervised model adaptation. | Rui Li; Qianfen Jiao; Wenming Cao; Hau-San Wong; Si Wu |
| 2025 | 7 | SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features (IF: 6). Highlight: We introduce SigLIP 2, a family of new multilingual vision-language encoders that build on the success of the original SigLIP. | MICHAEL TSCHANNEN et al. |
| 2025 | 8 | Cosmos World Foundation Model Platform for Physical AI (IF: 6). Highlight: In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. | NIKET AGARWAL et al. |
| 2025 | 9 | Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models (IF: 6). Highlight: However, direct training with RL struggles to activate complex reasoning capabilities such as questioning and reflection in MLLMs, due to the absence of substantial high-quality multimodal reasoning data. To address this issue, we propose the reasoning MLLM, Vision-R1, to improve multimodal reasoning capability. | WENXUAN HUANG et al. |
| 2025 | 10 | Visual-RFT: Visual Reinforcement Fine-Tuning (IF: 6). Highlight: This work introduces Visual Reinforcement Fine-Tuning (Visual-RFT), which further extends the application areas of RFT on visual tasks. | ZIYU LIU et al. |
| 2025 | 11 | Emerging Properties in Unified Multimodal Pretraining (IF: 6). Highlight: In this work, we introduce BAGEL, an open-source foundational model that natively supports multimodal understanding and generation. | CHAORUI DENG et al. |
| 2025 | 12 | VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model (IF: 6). Highlight: Furthermore, we conduct comprehensive ablation studies that uncover a series of noteworthy insights, including the presence of reward hacking in object detection, the emergence of the OD aha moment, the impact of training data quality, and the scaling behavior of RL across different model sizes. Through these analyses, we aim to deepen the understanding of how reinforcement learning enhances the capabilities of vision-language models, and we hope our findings and open-source contributions will support continued progress in the vision-language RL community. | HAOZHAN SHEN et al. |
| 2025 | 13 | InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency (IF: 6). Highlight: We introduce InternVL 3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. | WEIYUN WANG et al. |
| 2025 | 14 | VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding (IF: 5). Highlight: In this paper, we propose VideoLLaMA3, a more advanced multimodal foundation model for image and video understanding. | BOQIANG ZHANG et al. |
| 2025 | 15 | Qwen-Image Technical Report (IF: 5). Highlight: We present Qwen-Image, an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. | CHENFEI WU et al. |
| 2025 | 16 | R1-Onevision: Advancing Generalized Multimodal Reasoning Through Cross-Modal Formalization (IF: 5). Highlight: In this paper, we introduce R1-Onevision, a multimodal reasoning model designed to bridge the gap between visual perception and deep reasoning. | YI YANG et al. |
| 2025 | 17 | Video-R1: Reinforcing Video Reasoning in MLLMs (IF: 5). Highlight: Inspired by DeepSeek-R1’s success in eliciting reasoning abilities through rule-based reinforcement learning (RL), we introduce Video-R1 as the first attempt to systematically explore the R1 paradigm for incentivizing video reasoning within multimodal large language models (MLLMs). | KAITUO FENG et al. |
| 2025 | 18 | Continuous 3D Perception Model with Persistent State (IF: 5). Highlight: We present a unified framework capable of solving a broad range of 3D tasks. | Qianqian Wang; Yifei Zhang; Aleksander Holynski; Alexei A. Efros; Angjoo Kanazawa |
| 2025 | 19 | CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models (IF: 5). Highlight: As a result, existing VLAs lack temporal planning or reasoning capabilities. In this paper, we introduce a method that incorporates explicit visual chain-of-thought (CoT) reasoning into vision-language-action models (VLAs) by predicting future image frames autoregressively as visual goals before generating a short action sequence to achieve these goals. | QINGQING ZHAO et al. |
| 2025 | 20 | Flow-GRPO: Training Flow Matching Models Via Online RL (IF: 5). Highlight: We propose Flow-GRPO, the first method integrating online reinforcement learning (RL) into flow matching models. | JIE LIU et al. |
| 2025 | 21 | MambaHSI: Spatial-Spectral Mamba for Hyperspectral Image Classification (IF: 4). Highlight: However, representing the HSI is challenging for the Mamba due to the requirement for an integrated spatial and spectral understanding. To remedy these drawbacks, we propose a novel HSI classification model based on a Mamba model, named MambaHSI, which can simultaneously model long-range interaction of the whole image and integrate spatial and spectral information in an adaptive manner. | Yapeng Li; Yong Luo; Lefei Zhang; Zengmao Wang; Bo Du |
| 2025 | 22 | BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset (IF: 4). Highlight: Motivated by the strong potential of autoregressive and diffusion models for high-quality generation and scalability, we conduct a comprehensive study of their use in unified multimodal settings, with emphasis on image representations, modeling objectives, and training strategies. Grounded in these investigations, we introduce a novel approach that employs a diffusion transformer to generate semantically rich CLIP image features, in contrast to conventional VAE-based representations. | JIUHAI CHEN et al. |
| 2025 | 23 | Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation (IF: 4). Highlight: We present Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. | ZIBO ZHAO et al. |
| 2025 | 24 | Step1X-Edit: A Practical Framework for General Image Editing (IF: 4). Highlight: Thus, in this paper, we aim to release a state-of-the-art image editing model, called Step1X-Edit, which can provide comparable performance against the closed-source models like GPT-4o and Gemini2 Flash. | SHIYU LIU et al. |
| 2025 | 25 | Seed1.5-VL Technical Report (IF: 4). Highlight: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. | DONG GUO et al. |
| 2025 | 26 | OmniGen2: Exploration to Advanced Multimodal Generation (IF: 4). Highlight: In this work, we introduce OmniGen2, a versatile and open-source generative model designed to provide a unified solution for diverse generation tasks, including text-to-image, image editing, and in-context generation. | CHENYUAN WU et al. |
| 2025 | 27 | Reconstruction Vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models (IF: 4). Highlight: We argue that this dilemma stems from the inherent difficulty in learning unconstrained high-dimensional latent spaces. To address this, we propose aligning the latent space with pre-trained vision foundation models when training the visual tokenizers. | Jingfeng Yao; Bin Yang; Xinggang Wang |
| 2025 | 28 | Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass (IF: 4). Highlight: In this work, we propose Fast 3D Reconstruction (Fast3R), a novel multi-view generalization to DUSt3R that achieves efficient and scalable 3D reconstruction by processing many views in parallel. | JIANING YANG et al. |
| 2025 | 29 | VACE: All-in-One Video Creation and Editing (IF: 4). Highlight: We introduce VACE, which enables users to perform Video tasks within an All-in-one framework for Creation and Editing. | ZEYINZI JIANG et al. |
| 2024 | 1 | YOLOv11: An Overview of The Key Architectural Enhancements (IF: 8). Highlight: This study presents an architectural analysis of YOLOv11, the latest iteration in the YOLO (You Only Look Once) series of object detection models. | Rahima Khanam; Muhammad Hussain |
| 2024 | 2 | YOLOv10: Real-Time End-to-End Object Detection (IF: 8). Highlight: In this work, we aim to further advance the performance-efficiency boundary of YOLOs from both the post-processing and model architecture. | AO WANG et al. |
| 2024 | 3 | Qwen2-VL: Enhancing Vision-Language Model’s Perception of The World at Any Resolution (IF: 8). Highlight: We present the Qwen2-VL Series, an advanced upgrade of the previous Qwen-VL models that redefines the conventional predetermined-resolution approach in visual processing. | PENG WANG et al. |
| 2024 | 4 | YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information (IF: 8). Highlight: We proposed the concept of programmable gradient information (PGI) to cope with the various changes required by deep networks to achieve multiple objectives. | Chien-Yao Wang; I-Hau Yeh; Hong-Yuan Mark Liao |
| 2024 | 5 | Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (IF: 8). Highlight: Despite its better theoretical properties and conceptual simplicity, it is not yet decisively established as standard practice. In this work, we improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales. | PATRICK ESSER et al. |
| 2024 | 6 | SAM 2: Segment Anything in Images and Videos (IF: 8). Highlight: We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. | NIKHILA RAVI et al. |
| 2024 | 7 | LLaVA-OneVision: Easy Visual Task Transfer (IF: 8). Highlight: We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. | BO LI et al. |
| 2024 | 8 | VMamba: Visual State Space Model (IF: 8). Highlight: In this paper, we adapt Mamba, a state-space language model, into VMamba, a vision backbone with linear time complexity. | YUE LIU et al. |
| 2024 | 9 | Depth Anything: Unleashing The Power of Large-Scale Unlabeled Data (IF: 8). Highlight: This work presents Depth Anything, a highly practical solution for robust monocular depth estimation. | LIHE YANG et al. |
| 2024 | 10 | Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model (IF: 8). Highlight: In this paper, we show that the reliance on self-attention for visual representation learning is not necessary and propose a new generic vision backbone with bidirectional Mamba blocks (Vim), which marks the image sequences with position embeddings and compresses the visual representation with bidirectional state space models. | LIANGHUI ZHU et al. |
| 2024 | 11 | CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer (IF: 8). Highlight: We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos aligned with text prompt, with a frame rate of 16 fps and resolution of 768 * 1360 pixels. | ZHUOYI YANG et al. |
| 2024 | 12 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling (IF: 8). Highlight: In this work, we delve into the relationship between model scaling and performance, systematically exploring the performance trends in vision encoders, language models, dataset sizes, and test-time configurations. | ZHE CHEN et al. |
| 2024 | 13 | Depth Anything V2 (IF: 8). Highlight: Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. | LIHE YANG et al. |
| 2024 | 14 | How Far Are We to GPT-4V? Closing The Gap to Commercial Multimodal Models with Open-Source Suites (IF: 8). Highlight: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. | ZHE CHEN et al. |
| 2024 | 15 | 2D Gaussian Splatting for Geometrically Accurate Radiance Fields (IF: 7). Highlight: We present 2D Gaussian Splatting (2DGS), a novel approach to model and reconstruct geometrically accurate radiance fields from multi-view images. | Binbin Huang; Zehao Yu; Anpei Chen; Andreas Geiger; Shenghua Gao |
| 2024 | 16 | MiniCPM-V: A GPT-4V Level MLLM on Your Phone (IF: 7). Highlight: In this work, we present MiniCPM-V, a series of efficient MLLMs deployable on end-side devices. | YUAN YAO et al. |
| 2024 | 17 | Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis (IF: 7). Highlight: In this paper, we introduce Video-MME, the first-ever full-spectrum, Multi-Modal Evaluation benchmark of MLLMs in Video analysis. | CHAOYOU FU et al. |
| 2024 | 18 | Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks (IF: 7). Highlight: We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM). | TIANHE REN et al. |
| 2024 | 19 | HunyuanVideo: A Systematic Framework For Large Video Generative Models (IF: 7). Highlight: In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. | WEIJIE KONG et al. |
| 2024 | 20 | WorldSimBench: Towards Video Generation Models As World Simulators (IF: 7). Highlight: In this work, we classify the functionalities of predictive models into a hierarchy and take the first step in evaluating World Simulators by proposing a dual evaluation framework called WorldSimBench. | YIRAN QIN et al. |
| 2024 | 21 | Visual Autoregressive Modeling: Scalable Image Generation Via Next-Scale Prediction (IF: 7). Highlight: We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine next-scale prediction or next-resolution prediction, diverging from the standard raster-scan next-token prediction. | Keyu Tian; Yi Jiang; Zehuan Yuan; Bingyue Peng; Liwei Wang |
| 2024 | 22 | YOLO-World: Real-Time Open-Vocabulary Object Detection (IF: 7). Highlight: However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. | TIANHENG CHENG et al. |
| 2024 | 23 | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs (IF: 7). Highlight: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. | SHENGBANG TONG et al. |
| 2024 | 24 | LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation (IF: 7). Highlight: In this paper, we introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. | JIAXIANG TANG et al. |
| 2024 | 25 | Are We on The Right Way for Evaluating Large Vision-Language Models? (IF: 7). Highlight: To this end, we present MMStar, an elite vision-indispensable multi-modal benchmark comprising 1,500 samples meticulously selected by humans. | LIN CHEN et al. |
| 2024 | 26 | Eyes Wide Shut? Exploring The Visual Shortcomings of Multimodal LLMs (IF: 7). Highlight: We further evaluate various CLIP-based vision-and-language models and found a notable correlation between visual patterns that challenge CLIP models and those problematic for multimodal LLMs. As an initial effort to address these issues, we propose a Mixture of Features (MoF) approach, demonstrating that integrating vision self-supervised learning features with MLLMs can significantly enhance their visual grounding capabilities. | SHENGBANG TONG et al. |
| 2024 | 27 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs (IF: 7). Highlight: In this paper, we present the VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video and audio-oriented tasks. | ZESEN CHENG et al. |
| 2024 | 28 | Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation (IF: 7). Highlight: We introduce LlamaGen, a new family of image generation models that apply original “next-token prediction” paradigm of large language models to visual generation domain. | PEIZE SUN et al. |
| 2024 | 29 | SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities (IF: 7). Highlight: To this end, we present a system to facilitate this approach. | BOYUAN CHEN et al. |
| 2024 | 30 | MambaIR: A Simple Baseline for Image Restoration with State-Space Model (IF: 7). Highlight: In this work, we introduce a simple but effective baseline, named MambaIR, which introduces both local enhancement and channel attention to improve the vanilla Mamba. | HANG GUO et al. |
| 2023 | 1 | Segment Anything (IF: 8). Highlight: We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. | ALEXANDER KIRILLOV et al. |
| 2023 | 2 | Visual Instruction Tuning (IF: 8). Highlight: In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. | Haotian Liu; Chunyuan Li; Qingyang Wu; Yong Jae Lee |
| 2023 | 3 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (IF: 8). Highlight: This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models. | Junnan Li; Dongxu Li; Silvio Savarese; Steven Hoi |
| 2023 | 4 | DINOv2: Learning Robust Visual Features Without Supervision (IF: 8). Highlight: In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. | MAXIME OQUAB et al. |
| 2023 | 5 | Adding Conditional Control to Text-to-Image Diffusion Models (IF: 8). Highlight: We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. | Lvmin Zhang; Anyi Rao; Maneesh Agrawala |
| 2023 | 6 | Improved Baselines with Visual Instruction Tuning (IF: 8). Highlight: In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. | Haotian Liu; Chunyuan Li; Yuheng Li; Yong Jae Lee |
| 2023 | 7 | SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (IF: 8). Highlight: We present SDXL, a latent diffusion model for text-to-image synthesis. | DUSTIN PODELL et al. |
| 2023 | 8 | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection (IF: 8). Highlight: In this paper, we present an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. | SHILONG LIU et al. |
| 2023 | 9 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning (IF: 8). Highlight: In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models. | WENLIANG DAI et al. |
| 2023 | 10 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models (IF: 8). Highlight: We believe that the enhanced multi-modal generation capabilities of GPT-4 stem from the utilization of sophisticated large language models (LLM). To examine this phenomenon, we present MiniGPT-4, which aligns a frozen visual encoder with a frozen advanced LLM, Vicuna, using one projection layer. | Deyao Zhu; Jun Chen; Xiaoqian Shen; Xiang Li; Mohamed Elhoseiny |
| 2023 | 11 | DETRs Beat YOLOs on Real-time Object Detection (IF: 8). Highlight: Nevertheless, the high computational cost limits their practicality and hinders them from fully exploiting the advantage of excluding NMS. In this paper, we propose the Real-Time DEtection TRansformer (RT-DETR), the first real-time end-to-end object detector to our best knowledge that addresses the above dilemma. | YIAN ZHAO et al. |
| 2023 | 12 | Sigmoid Loss for Language Image Pre-Training (IF: 8). Highlight: We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP). | Xiaohua Zhai; Basil Mustafa; Alexander Kolesnikov; Lucas Beyer |
| 2023 | 13 | A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS (IF: 8). Highlight: We present a comprehensive analysis of YOLO’s evolution, examining the innovations and contributions in each iteration from the original YOLO up to YOLOv8, YOLO-NAS, and YOLO with Transformers. | Juan Terven; Diana Cordova-Esparza |
| 2023 | 14 | Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (IF: 8). Highlight: In this paper, we identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning. | ANDREAS BLATTMANN et al. |
| 2023 | 15 | MMBench: Is Your Multi-modal Model An All-around Player? IF:8 Highlight: Meanwhile, subjective benchmarks, such as OwlEval, offer comprehensive evaluations of a model’s abilities by incorporating human labor, which is not scalable and may display significant bias. In response to these challenges, we propose MMBench, a bilingual benchmark for assessing the multi-modal capabilities of VLMs. |
YUAN LIU et al. |
| 2023 | 16 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond IF:8 Highlight: In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images. |
JINZE BAI et al. |
| 2023 | 17 | Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks IF:8 Highlight: We hence propose a novel partial convolution (PConv) that extracts spatial features more efficiently, by cutting down redundant computation and memory access simultaneously. |
JIERUN CHEN et al. |
| 2023 | 18 | Zero-1-to-3: Zero-shot One Image to 3D Object IF:8 Highlight: We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image. |
RUOSHI LIU et al. |
| 2023 | 19 | Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models IF:8 Highlight: Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. |
ANDREAS BLATTMANN et al. |
| 2023 | 20 | T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models IF:8 Highlight: However, relying solely on text prompts cannot fully take advantage of the knowledge learned by the model, especially when flexible and accurate controlling (e.g., color and structure) is needed. In this paper, we aim to “dig out” the capabilities that T2I models have implicitly learned, and then explicitly use them to control the generation more granularly. |
CHONG MOU et al. |
| 2023 | 21 | LLaVA-Med: Training A Large Language-and-Vision Assistant for Biomedicine in One Day IF:8 Highlight: In this paper, we propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images. |
CHUNYUAN LI et al. |
| 2023 | 22 | ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders IF:8 Highlight: In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. |
SANGHYUN WOO et al. |
| 2023 | 23 | ImageBind: One Embedding Space To Bind Them All IF:8 Highlight: We present ImageBind, an approach to learn a joint embedding across six different modalities – images, text, audio, depth, thermal, and IMU data. |
ROHIT GIRDHAR et al. |
| 2023 | 24 | AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models Without Specific Tuning IF:8 Highlight: In this paper, we present AnimateDiff, a practical framework for animating personalized T2I models without requiring model-specific tuning. |
YUWEI GUO et al. |
| 2023 | 25 | IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models IF:8 Highlight: In this paper, we present IP-Adapter, an effective and lightweight adapter to achieve image prompt capability for the pretrained text-to-image diffusion models. |
Hu Ye; Jun Zhang; Sibo Liu; Xiao Han; Wei Yang; |
| 2023 | 26 | Evaluating Object Hallucination in Large Vision-Language Models IF:8 Highlight: To investigate it, this work presents the first systematic study on object hallucination of LVLMs. |
YIFAN LI et al. |
| 2023 | 27 | MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models IF:8 Highlight: However, it is difficult for these case studies to fully reflect the performance of MLLM, lacking a comprehensive evaluation. In this paper, we fill in this blank, presenting the first comprehensive MLLM Evaluation benchmark MME. It measures both perception and cognition abilities on a total of 14 subtasks. |
CHAOYOU FU et al. |
| 2023 | 28 | Video-LLaVA: Learning United Visual Representation By Alignment Before Projection IF:8 Highlight: In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM. |
BIN LIN et al. |
| 2023 | 29 | MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts IF:8 Highlight: Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability in mathematical reasoning in visual contexts has not been systematically studied. To bridge this gap, we present MathVista, a benchmark designed to combine challenges from diverse mathematical and visual tasks. |
PAN LU et al. |
| 2023 | 30 | Efficient Multi-Scale Attention Module with Cross-Spatial Learning IF:8 Highlight: In this paper, a novel efficient multi-scale attention (EMA) module is proposed. |
DALIANG OUYANG et al. |
| 2022 | 1 | YOLOv7: Trainable Bag-of-freebies Sets New State-of-the-art for Real-time Object Detectors IF:8 Abstract: YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object … |
Chien-Yao Wang; Alexey Bochkovskiy; Hong-Yuan Mark Liao; |
| 2022 | 2 | Hierarchical Text-Conditional Image Generation with CLIP Latents IF:9 Highlight: Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. |
Aditya Ramesh; Prafulla Dhariwal; Alex Nichol; Casey Chu; Mark Chen; |
| 2022 | 3 | Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding IF:9 Highlight: We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. |
CHITWAN SAHARIA et al. |
| 2022 | 4 | A ConvNet for The 2020s IF:8 Highlight: In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. |
ZHUANG LIU et al. |
| 2022 | 5 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation IF:8 Highlight: In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. |
Junnan Li; Dongxu Li; Caiming Xiong; Steven Hoi; |
| 2022 | 6 | Instant Neural Graphics Primitives with A Multiresolution Hash Encoding IF:8 Highlight: Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations: a small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. |
Thomas Müller; Alex Evans; Christoph Schied; Alexander Keller; |
| 2022 | 7 | Flamingo: A Visual Language Model for Few-Shot Learning IF:8 Highlight: Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. |
JEAN-BAPTISTE ALAYRAC et al. |
| 2022 | 8 | LAION-5B: An Open Large-scale Dataset for Training Next Generation Image-text Models IF:8 Highlight: Until now, no datasets of this size have been made openly available for the broader research community. To address this problem and democratize research on large-scale multi-modal models, we present LAION-5B – a dataset consisting of 5.85 billion CLIP-filtered image-text pairs, of which 2.32B contain English language. |
CHRISTOPH SCHUHMANN et al. |
| 2022 | 9 | Scalable Diffusion Models with Transformers IF:8 Highlight: We train latent diffusion models of images, replacing the commonly-used U-Net backbone with a transformer that operates on latent patches. |
William Peebles; Saining Xie; |
| 2022 | 10 | DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation IF:8 Highlight: In this work, we present a new approach for personalization of text-to-image diffusion models. |
NATANIEL RUIZ et al. |
| 2022 | 11 | DreamFusion: Text-to-3D Using 2D Diffusion IF:8 Highlight: Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. |
Ben Poole; Ajay Jain; Jonathan T. Barron; Ben Mildenhall; |
| 2022 | 12 | Elucidating The Design Space of Diffusion-Based Generative Models IF:8 Highlight: We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. |
Tero Karras; Miika Aittala; Timo Aila; Samuli Laine; |
| 2022 | 13 | YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications IF:8 Highlight: The YOLO community has prospered overwhelmingly to enrich its use in a multitude of hardware platforms and abundant scenarios. In this technical report, we strive to push its limits to the next level, stepping forward with an unwavering mindset for industry application. |
CHUYI LI et al. |
| 2022 | 14 | InstructPix2Pix: Learning to Follow Image Editing Instructions IF:8 Highlight: We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. |
Tim Brooks; Aleksander Holynski; Alexei A. Efros; |
| 2022 | 15 | An Image Is Worth One Word: Personalizing Text-to-Image Generation Using Textual Inversion IF:8 Highlight: In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. |
RINON GAL et al. |
| 2022 | 16 | Prompt-to-Prompt Image Editing with Cross Attention Control IF:8 Highlight: In this paper, we pursue an intuitive prompt-to-prompt editing framework, where the edits are controlled by text only. |
AMIR HERTZ et al. |
| 2022 | 17 | Visual Prompt Tuning IF:8 Highlight: This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. |
MENGLIN JIA et al. |
| 2022 | 18 | DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection IF:8 Highlight: We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art end-to-end object detector. |
HAO ZHANG et al. |
| 2022 | 19 | Video Diffusion Models IF:8 Highlight: To generate long and higher resolution videos we introduce a new conditional sampling technique for spatial and temporal video extension that performs better than previously proposed methods. |
JONATHAN HO et al. |
| 2022 | 20 | Conditional Prompt Learning for Vision-Language Models IF:8 Highlight: To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). |
Kaiyang Zhou; Jingkang Yang; Chen Change Loy; Ziwei Liu; |
| 2022 | 21 | RePaint: Inpainting Using Denoising Diffusion Probabilistic Models IF:8 Highlight: In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. |
ANDREAS LUGMAYR et al. |
| 2022 | 22 | Imagen Video: High Definition Video Generation with Diffusion Models IF:8 Highlight: We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. |
JONATHAN HO et al. |
| 2022 | 23 | Diffusion Models in Vision: A Survey IF:8 Highlight: In this survey, we provide a comprehensive review of articles on denoising diffusion models applied in vision, comprising both theoretical and practical contributions in the field. |
Florinel-Alin Croitoru; Vlad Hondru; Radu Tudor Ionescu; Mubarak Shah; |
| 2022 | 24 | Make-A-Video: Text-to-Video Generation Without Text-Video Data IF:8 Highlight: We propose Make-A-Video, an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). |
URIEL SINGER et al. |
| 2022 | 25 | BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images Via Spatiotemporal Transformers IF:8 Highlight: In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. |
ZHIQI LI et al. |
| 2022 | 26 | VideoMAE: Masked Autoencoders Are Data-Efficient Learners for Self-Supervised Video Pre-Training IF:8 Highlight: In this paper, we show that video masked autoencoders (VideoMAE) are data-efficient learners for self-supervised video pre-training (SSVP). |
Zhan Tong; Yibing Song; Jue Wang; Limin Wang; |
| 2022 | 27 | CoCa: Contrastive Captioners Are Image-Text Foundation Models IF:8 Highlight: This paper presents Contrastive Captioner (CoCa), a minimalist design to pretrain an image-text encoder-decoder foundation model jointly with contrastive loss and captioning loss, thereby subsuming model capabilities from contrastive approaches like CLIP and generative methods like SimVLM. |
JIAHUI YU et al. |
| 2022 | 28 | TensoRF: Tensorial Radiance Fields IF:8 Highlight: We present TensoRF, a novel approach to model and reconstruct radiance fields. |
Anpei Chen; Zexiang Xu; Andreas Geiger; Jingyi Yu; Hao Su; |
| 2022 | 29 | Magic3D: High-Resolution Text-to-3D Content Creation IF:8 Highlight: However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image space supervision on NeRF, leading to low-quality 3D models with a long processing time. In this paper, we address these limitations by utilizing a two-stage optimization framework. |
CHEN-HSUAN LIN et al. |
| 2022 | 30 | Objaverse: A Universe of Annotated 3D Objects IF:8 Highlight: Despite considerable interest and potential applications in 3D vision, datasets of high-fidelity 3D models continue to be mid-sized with limited diversity of object categories. Addressing this gap, we present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive captions, tags, and animations. |
MATT DEITKE et al. |
| 2021 | 1 | Learning Transferable Visual Models From Natural Language Supervision IF:9 Highlight: We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. |
ALEC RADFORD et al. |
| 2021 | 2 | Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows IF:9 Highlight: This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. |
ZE LIU et al. |
| 2021 | 3 | High-Resolution Image Synthesis with Latent Diffusion Models IF:9 Highlight: By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. |
Robin Rombach; Andreas Blattmann; Dominik Lorenz; Patrick Esser; Björn Ommer; |
| 2021 | 4 | Masked Autoencoders Are Scalable Vision Learners IF:8 Highlight: This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. |
KAIMING HE et al. |
| 2021 | 5 | Emerging Properties in Self-Supervised Vision Transformers IF:9 Highlight: In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). |
MATHILDE CARON et al. |
| 2021 | 6 | SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers IF:8 Highlight: We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders. |
ENZE XIE et al. |
| 2021 | 7 | Zero-Shot Text-to-Image Generation IF:9 Highlight: We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. |
ADITYA RAMESH et al. |
| 2021 | 8 | YOLOX: Exceeding YOLO Series in 2021 IF:8 Highlight: In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector, YOLOX. |
Zheng Ge; Songtao Liu; Feng Wang; Zeming Li; Jian Sun; |
| 2021 | 9 | TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation IF:8 Highlight: In this paper, we propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation. |
JIENENG CHEN et al. |
| 2021 | 10 | Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision IF:8 Highlight: In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps in the Conceptual Captions dataset. |
CHAO JIA et al. |
| 2021 | 11 | Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions IF:9 Highlight: Unlike the recently-proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks. |
WENHAI WANG et al. |
| 2021 | 12 | GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models IF:9 Highlight: We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. |
ALEX NICHOL et al. |
| 2021 | 13 | Coordinate Attention for Efficient Mobile Network Design IF:8 Highlight: In this paper, we propose a novel attention mechanism for mobile networks by embedding positional information into channel attention, which we call coordinate attention. |
Qibin Hou; Daquan Zhou; Jiashi Feng; |
| 2021 | 14 | EfficientNetV2: Smaller Models and Faster Training IF:8 Highlight: This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models. |
Mingxing Tan; Quoc V. Le; |
| 2021 | 15 | BEiT: BERT Pre-Training of Image Transformers IF:8 Highlight: We introduce a self-supervised vision representation model BEiT, which stands for Bidirectional Encoder representation from Image Transformers. |
Hangbo Bao; Li Dong; Songhao Piao; Furu Wei; |
| 2021 | 16 | Learning to Prompt for Vision-Language Models IF:8 Highlight: Different from the traditional representation learning that is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to a downstream task via prompting, i.e., classification weights are synthesized from natural language describing classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming: one needs to spend a significant amount of time on words tuning since a slight change in wording could have a huge impact on performance. |
Kaiyang Zhou; Jingkang Yang; Chen Change Loy; Ziwei Liu; |
| 2021 | 17 | Masked-attention Mask Transformer for Universal Image Segmentation IF:8 Highlight: We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). |
Bowen Cheng; Ishan Misra; Alexander G. Schwing; Alexander Kirillov; Rohit Girdhar; |
| 2021 | 18 | MLP-Mixer: An All-MLP Architecture for Vision IF:8 Highlight: In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them are necessary. |
ILYA TOLSTIKHIN et al. |
| 2021 | 19 | Restormer: Efficient Transformer for High-Resolution Image Restoration IF:8 Highlight: In this work, we propose an efficient Transformer model by making several key designs in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions, while still remaining applicable to large images. |
SYED WAQAS ZAMIR et al. |
| 2021 | 20 | Transformers in Vision: A Survey IF:8 Highlight: This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline. |
SALMAN KHAN et al. |
| 2021 | 21 | Barlow Twins: Self-Supervised Learning Via Redundancy Reduction IF:8 Highlight: We propose an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. |
Jure Zbontar; Li Jing; Ishan Misra; Yann LeCun; Stéphane Deny; |
| 2021 | 22 | ViViT: A Video Vision Transformer IF:8 Highlight: We present pure-transformer based models for video classification, drawing upon the recent success of such models in image classification. |
ANURAG ARNAB et al. |
| 2021 | 23 | Is Space-Time Attention All You Need for Video Understanding? IF:8 Highlight: We present a convolution-free approach to video classification built exclusively on self-attention over space and time. |
Gedas Bertasius; Heng Wang; Lorenzo Torresani; |
| 2021 | 24 | Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields IF:8 Highlight: Compared to NeRF, mip-NeRF reduces average error rates by 17% on the dataset presented with NeRF and by 60% on a challenging multiscale variant of that dataset that we present. |
JONATHAN T. BARRON et al. |
| 2021 | 25 | Align Before Fuse: Vision and Language Representation Learning with Momentum Distillation IF:8 Highlight: In this paper, we introduce a contrastive loss to ALign the image and text representations BEfore Fusing (ALBEF) them through cross-modal attention, which enables more grounded vision and language representation learning. |
JUNNAN LI et al. |
| 2021 | 26 | Swin Transformer V2: Scaling Up Capacity and Resolution IF:8 Highlight: This paper aims to explore large-scale models in computer vision. |
ZE LIU et al. |
| 2021 | 27 | Vision Transformers for Dense Prediction IF:8 Highlight: We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks. |
René Ranftl; Alexey Bochkovskiy; Vladlen Koltun; |
| 2021 | 28 | Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet IF:8 Highlight: To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation to progressively structurize the image to tokens by recursively aggregating neighboring Tokens into one Token (Tokens-to-Token), such that local structure represented by surrounding tokens can be modeled and tokens length can be reduced; 2) an efficient backbone with a deep-narrow structure for vision transformer motivated by CNN architecture design after empirical study. |
LI YUAN et al. |
| 2021 | 29 | CLIPScore: A Reference-free Evaluation Metric for Image Captioning IF:8 Highlight: In this paper, we report the surprising empirical finding that CLIP (Radford et al., 2021), a cross-modal model pretrained on 400M image+caption pairs from the web, can be used for robust automatic evaluation of image captioning without the need for references. |
Jack Hessel; Ari Holtzman; Maxwell Forbes; Ronan Le Bras; Yejin Choi; |
| 2021 | 30 | CvT: Introducing Convolutions to Vision Transformers IF:8 Highlight: We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs. | HAIPING WU et al. |
| 2020 | 1 | An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale IF:9 Highlight: We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. | ALEXEY DOSOVITSKIY et al. |
| 2020 | 2 | End-to-End Object Detection With Transformers IF:9 Highlight: We present a new method that views object detection as a direct set prediction problem. | NICOLAS CARION et al. |
| 2020 | 3 | YOLOv4: Optimal Speed And Accuracy Of Object Detection IF:9 Highlight: We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. | Alexey Bochkovskiy; Chien-Yao Wang; Hong-Yuan Mark Liao; |
| 2020 | 4 | Training Data-efficient Image Transformers & Distillation Through Attention IF:9 Highlight: In this work, we produce a competitive convolution-free transformer by training on Imagenet only. | HUGO TOUVRON et al. |
| 2020 | 5 | Deformable DETR: Deformable Transformers for End-to-End Object Detection IF:9 Highlight: To mitigate these issues, we proposed Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference. | XIZHOU ZHU et al. |
| 2020 | 6 | Exploring Simple Siamese Representation Learning IF:8 Highlight: In this paper, we report surprising empirical results that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. | Xinlei Chen; Kaiming He; |
| 2020 | 7 | Unsupervised Learning of Visual Features By Contrasting Cluster Assignments IF:8 Highlight: In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons. | MATHILDE CARON et al. |
| 2020 | 8 | Taming Transformers for High-Resolution Image Synthesis IF:8 Highlight: In particular, we present the first results on semantically-guided synthesis of megapixel images with transformers and obtain the state of the art among autoregressive models on class-conditional ImageNet. | Patrick Esser; Robin Rombach; Björn Ommer; |
| 2020 | 9 | Improved Baselines With Momentum Contrastive Learning IF:8 Highlight: In this note, we verify the effectiveness of two of SimCLR’s design improvements by implementing them in the MoCo framework. | Xinlei Chen; Haoqi Fan; Ross Girshick; Kaiming He; |
| 2020 | 10 | Image Segmentation Using Deep Learning: A Survey IF:9 Highlight: In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. | SHERVIN MINAEE et al. |
| 2020 | 11 | Rethinking Semantic Segmentation from A Sequence-to-Sequence Perspective with Transformers IF:8 Highlight: In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. | SIXIAO ZHENG et al. |
| 2020 | 12 | RAFT: Recurrent All-Pairs Field Transforms For Optical Flow IF:8 Highlight: We introduce Recurrent All-Pairs Field Transforms (RAFT), a new deep network architecture for optical flow. | Zachary Teed; Jia Deng; |
| 2020 | 13 | Implicit Neural Representations With Periodic Activation Functions IF:9 Highlight: We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or Sirens, are ideally suited for representing complex natural signals and their derivatives. | Vincent Sitzmann; Julien N. P. Martel; Alexander W. Bergman; David B. Lindell; Gordon Wetzstein; |
| 2020 | 14 | Fourier Features Let Networks Learn High Frequency Functions In Low Dimensional Domains IF:9 Highlight: We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities. | MATTHEW TANCIK et al. |
| 2020 | 15 | Point Transformer IF:8 Highlight: In this work, we present Point Transformer, a deep neural network that operates directly on unordered and unstructured point sets. | Nico Engel; Vasileios Belagiannis; Klaus Dietmayer; |
| 2020 | 16 | Shortcut Learning in Deep Neural Networks IF:8 Highlight: In this perspective we seek to distill how many of deep learning’s problems can be seen as different symptoms of the same underlying problem: shortcut learning. | ROBERT GEIRHOS et al. |
| 2020 | 17 | Face2Face: Real-time Face Capture And Reenactment Of RGB Videos IF:9 Highlight: We present Face2Face, a novel approach for real-time facial reenactment of a monocular target video sequence (e.g., Youtube video). | Justus Thies; Michael Zollhöfer; Marc Stamminger; Christian Theobalt; Matthias Nießner; |
| 2020 | 18 | Oscar: Object-Semantics Aligned Pre-training For Vision-Language Tasks IF:8 Abstract: Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks. While existing methods simply … | XIUJUN LI et al. |
| 2020 | 19 | Training Generative Adversarial Networks With Limited Data IF:8 Highlight: We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. | TERO KARRAS et al. |
| 2020 | 20 | The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization IF:8 Highlight: We introduce four new real-world distribution shift datasets consisting of changes in image style, image blurriness, geographic location, camera operation, and more. | DAN HENDRYCKS et al. |
| 2020 | 21 | Pre-Trained Image Processing Transformer IF:8 Highlight: In this paper, we study the low-level computer vision task (e.g., denoising, super-resolution and deraining) and develop a new pre-trained model, namely, image processing transformer (IPT). | HANTING CHEN et al. |
| 2020 | 22 | PixelNeRF: Neural Radiance Fields from One or Few Images IF:8 Highlight: We propose pixelNeRF, a learning framework that predicts a continuous neural scene representation conditioned on one or few input images. | Alex Yu; Vickie Ye; Matthew Tancik; Angjoo Kanazawa; |
| 2020 | 23 | Center-based 3D Object Detection and Tracking IF:8 Highlight: In this paper, we instead propose to represent, detect, and track 3D objects as points. | Tianwei Yin; Xingyi Zhou; Philipp Krähenbühl; |
| 2020 | 24 | Designing Network Design Spaces IF:8 Highlight: In this work, we present a new network design paradigm. | Ilija Radosavovic; Raj Prateek Kosaraju; Ross Girshick; Kaiming He; Piotr Dollár; |
| 2020 | 25 | Deep Learning for Person Re-identification: A Survey and Outlook IF:8 Highlight: The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. | MANG YE et al. |
| 2020 | 26 | PCT: Point Cloud Transformer IF:8 Highlight: This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. | MENG-HAO GUO et al. |
| 2020 | 27 | Zero-Reference Deep Curve Estimation For Low-Light Image Enhancement IF:8 Highlight: The paper presents a novel method, Zero-Reference Deep Curve Estimation (Zero-DCE), which formulates light enhancement as a task of image-specific curve estimation with a deep network. | CHUNLE GUO et al. |
| 2020 | 28 | D-NeRF: Neural Radiance Fields For Dynamic Scenes IF:8 Highlight: In this paper we introduce D-NeRF, a method that extends neural radiance fields to a dynamic domain, allowing to reconstruct and render novel images of objects under rigid and non-rigid motions from a single camera moving around the scene. | Albert Pumarola; Enric Corona; Gerard Pons-Moll; Francesc Moreno-Noguer; |
| 2020 | 29 | NeRF in The Wild: Neural Radiance Fields for Unconstrained Photo Collections IF:9 Highlight: We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs. | RICARDO MARTIN-BRUALLA et al. |
| 2020 | 30 | ResNeSt: Split-Attention Networks IF:8 Highlight: We present a simple and modular Split-Attention block that enables attention across feature-map groups. | HANG ZHANG et al. |
| 2019 | 1 | Momentum Contrast For Unsupervised Visual Representation Learning IF:9 Highlight: We present Momentum Contrast (MoCo) for unsupervised visual representation learning. | Kaiming He; Haoqi Fan; Yuxin Wu; Saining Xie; Ross Girshick; |
| 2019 | 2 | Searching For MobileNetV3 IF:9 Highlight: We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. | ANDREW HOWARD et al. |
| 2019 | 3 | EfficientDet: Scalable And Efficient Object Detection IF:9 Highlight: In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. | Mingxing Tan; Ruoming Pang; Quoc V. Le; |
| 2019 | 4 | FCOS: Fully Convolutional One-Stage Object Detection IF:9 Highlight: We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogue to semantic segmentation. | Zhi Tian; Chunhua Shen; Hao Chen; Tong He; |
| 2019 | 5 | CutMix: Regularization Strategy To Train Strong Classifiers With Localizable Features IF:9 Highlight: We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches. | SANGDOO YUN et al. |
| 2019 | 6 | ECA-Net: Efficient Channel Attention For Deep Convolutional Neural Networks IF:8 Highlight: To overcome the paradox of performance and complexity trade-off, this paper proposes an Efficient Channel Attention (ECA) module, which only involves a handful of parameters while bringing clear performance gain. | QILONG WANG et al. |
| 2019 | 7 | Generalized Intersection Over Union: A Metric And A Loss For Bounding Box Regression IF:9 Highlight: In this paper, we address the weaknesses of IoU by introducing a generalized version as both a new loss and a new metric. | HAMID REZATOFIGHI et al. |
| 2019 | 8 | Deep High-Resolution Representation Learning For Human Pose Estimation IF:9 Highlight: In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. | Ke Sun; Bin Xiao; Dong Liu; Jingdong Wang; |
| 2019 | 9 | Distance-IoU Loss: Faster And Better Learning For Bounding Box Regression IF:8 Highlight: In this paper, we propose a Distance-IoU (DIoU) loss by incorporating the normalized distance between the predicted box and the target box, which converges much faster in training than IoU and GIoU losses. | ZHAOHUI ZHENG et al. |
| 2019 | 10 | Deep High-Resolution Representation Learning For Visual Recognition IF:9 Highlight: We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. | JINGDONG WANG et al. |
| 2019 | 11 | DeepSDF: Learning Continuous Signed Distance Functions For Shape Representation IF:9 Highlight: In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high quality shape representation, interpolation and completion from partial and noisy 3D input data. | Jeong Joon Park; Peter Florence; Julian Straub; Richard Newcombe; Steven Lovegrove; |
| 2019 | 12 | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations For Vision-and-Language Tasks IF:9 Highlight: We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. | Jiasen Lu; Dhruv Batra; Devi Parikh; Stefan Lee; |
| 2019 | 13 | RandAugment: Practical Automated Data Augmentation With A Reduced Search Space IF:9 Highlight: In this work, we remove both of these obstacles. | Ekin D. Cubuk; Barret Zoph; Jonathon Shlens; Quoc V. Le; |
| 2019 | 14 | CSPNet: A New Backbone That Can Enhance Learning Capability Of CNN IF:8 Highlight: In this paper, we propose Cross Stage Partial Network (CSPNet) to mitigate the problem that previous works require heavy inference computations from the network architecture perspective. | CHIEN-YAO WANG et al. |
| 2019 | 15 | Scalability In Perception For Autonomous Driving: Waymo Open Dataset IF:9 Highlight: In an effort to help align the research community’s contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. | PEI SUN et al. |
| 2019 | 16 | GhostNet: More Features From Cheap Operations IF:8 Highlight: This paper proposes a novel Ghost module to generate more feature maps from cheap operations. | KAI HAN et al. |
| 2019 | 17 | Objects As Points IF:9 Highlight: In this paper, we take a different approach. | Xingyi Zhou; Dequan Wang; Philipp Krähenbühl; |
| 2019 | 18 | MMDetection: Open MMLab Detection Toolbox And Benchmark IF:8 Highlight: In this paper, we introduce the various features of this toolbox. | KAI CHEN et al. |
| 2019 | 19 | CenterNet: Keypoint Triplets For Object Detection IF:9 Highlight: This paper presents an efficient solution which explores the visual patterns within each cropped region with minimal costs. | KAIWEN DUAN et al. |
| 2019 | 20 | CheXpert: A Large Chest Radiograph Dataset With Uncertainty Labels And Expert Comparison IF:8 Highlight: We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models. | JEREMY IRVIN et al. |
| 2019 | 21 | Object Detection in 20 Years: A Survey IF:8 Highlight: A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. | Zhengxia Zou; Keyan Chen; Zhenwei Shi; Yuhong Guo; Jieping Ye; |
| 2019 | 22 | KPConv: Flexible And Deformable Convolution For Point Clouds IF:8 Highlight: We present Kernel Point Convolution (KPConv), a new design of point convolution, i.e. that operates on point clouds without any intermediate representation. | HUGUES THOMAS et al. |
| 2019 | 23 | Semantic Image Synthesis With Spatially-Adaptive Normalization IF:9 Highlight: We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. | Taesung Park; Ming-Yu Liu; Ting-Chun Wang; Jun-Yan Zhu; |
| 2019 | 24 | Res2Net: A New Multi-scale Backbone Architecture IF:8 Highlight: In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. | SHANG-HUA GAO et al. |
| 2019 | 25 | Class-Balanced Loss Based On Effective Number Of Samples IF:9 Highlight: In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. | Yin Cui; Menglin Jia; Tsung-Yi Lin; Yang Song; Serge Belongie; |
| 2019 | 26 | FaceForensics++: Learning To Detect Manipulated Facial Images IF:8 Highlight: To standardize the evaluation of detection methods, we propose an automated benchmark for facial manipulation detection. | ANDREAS RÖSSLER et al. |
| 2019 | 27 | Contrastive Multiview Coding IF:9 Highlight: We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. | Yonglong Tian; Dilip Krishnan; Phillip Isola; |
| 2019 | 28 | A Survey Of The Recent Architectures Of Deep Convolutional Neural Networks IF:8 Highlight: This survey thus focuses on the intrinsic taxonomy present in the recently reported deep CNN architectures and, consequently, classifies the recent innovations in CNN architectures into seven different categories. | Asifullah Khan; Anabia Sohail; Umme Zahoora; Aqsa Saeed Qureshi; |
| 2019 | 29 | Selective Kernel Networks IF:8 Highlight: We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. | Xiang Li; Wenhai Wang; Xiaolin Hu; Jian Yang; |
| 2019 | 30 | UNITER: UNiversal Image-TExt Representation Learning IF:8 Highlight: In this paper, we introduce UNITER, a UNiversal Image-TExt Representation, learned through large-scale pre-training over four image-text datasets (COCO, Visual Genome, Conceptual Captions, and SBU Captions), which can power heterogeneous downstream V+L tasks with joint multimodal embeddings. | YEN-CHUN CHEN et al. |
| 2018 | 1 | YOLOv3: An Incremental Improvement IF:9 Highlight: We present some updates to YOLO! | Joseph Redmon; Ali Farhadi; |
| 2018 | 2 | MobileNetV2: Inverted Residuals And Linear Bottlenecks IF:9 Highlight: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. | Mark Sandler; Andrew Howard; Menglong Zhu; Andrey Zhmoginov; Liang-Chieh Chen; |
| 2018 | 3 | CBAM: Convolutional Block Attention Module IF:9 Highlight: We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. | Sanghyun Woo; Jongchan Park; Joon-Young Lee; In So Kweon; |
| 2018 | 4 | The Unreasonable Effectiveness Of Deep Features As A Perceptual Metric IF:9 Highlight: To answer these questions, we introduce a new dataset of human perceptual similarity judgments. | Richard Zhang; Phillip Isola; Alexei A. Efros; Eli Shechtman; Oliver Wang; |
| 2018 | 5 | Encoder-Decoder With Atrous Separable Convolution For Semantic Image Segmentation IF:9 Highlight: In this work, we propose to combine the advantages from both methods. | Liang-Chieh Chen; Yukun Zhu; George Papandreou; Florian Schroff; Hartwig Adam; |
| 2018 | 6 | UNet++: A Nested U-Net Architecture For Medical Image Segmentation IF:8 Highlight: In this paper, we present UNet++, a new, more powerful architecture for medical image segmentation. | Zongwei Zhou; Md Mahfuzur Rahman Siddiquee; Nima Tajbakhsh; Jianming Liang; |
| 2018 | 7 | ArcFace: Additive Angular Margin Loss For Deep Face Recognition IF:9 Highlight: In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. We release all refined training data, training codes, pre-trained models and training logs, which will help reproduce the results in this paper. | Jiankang Deng; Jia Guo; Niannan Xue; Stefanos Zafeiriou; |
| 2018 | 8 | Path Aggregation Network For Instance Segmentation IF:9 Highlight: In this paper, we propose Path Aggregation Network (PANet) aiming at boosting information flow in proposal-based instance segmentation framework. | Shu Liu; Lu Qi; Haifang Qin; Jianping Shi; Jiaya Jia; |
| 2018 | 9 | Dynamic Graph CNN For Learning On Point Clouds IF:9 Highlight: To this end, we propose a new neural network module dubbed EdgeConv suitable for CNN-based high-level tasks on point clouds including classification and segmentation. | YUE WANG et al. |
| 2018 | 10 | Attention U-Net: Learning Where To Look For The Pancreas IF:9 Highlight: We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. | OZAN OKTAY et al. |
| 2018 | 11 | ShuffleNet V2: Practical Guidelines For Efficient CNN Architecture Design IF:9 Highlight: Thus, this work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs. | Ningning Ma; Xiangyu Zhang; Hai-Tao Zheng; Jian Sun; |
| 2018 | 12 | Dual Attention Network For Scene Segmentation IF:9 Highlight: In this paper, we address the scene segmentation task by capturing rich contextual dependencies based on the self-attention mechanism. | JUN FU et al. |
| 2018 | 13 | OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields IF:9 Highlight: In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. | Zhe Cao; Gines Hidalgo; Tomas Simon; Shih-En Wei; Yaser Sheikh; |
| 2018 | 14 | Image Super-Resolution Using Very Deep Residual Channel Attention Networks IF:9 Highlight: To solve these problems, we propose the very deep residual channel attention networks (RCAN). | YULUN ZHANG et al. |
| 2018 | 15 | Spatial Temporal Graph Convolutional Networks For Skeleton-Based Action Recognition IF:9 Highlight: In this work, we propose a novel model of dynamic skeletons called Spatial-Temporal Graph Convolutional Networks (ST-GCN), which moves beyond the limitations of previous methods by automatically learning both the spatial and temporal patterns from data. | Sijie Yan; Yuanjun Xiong; Dahua Lin; |
| 2018 | 16 | Object Detection With Deep Learning: A Review IF:9 Highlight: In this paper, we provide a review on deep learning based object detection frameworks. | Zhong-Qiu Zhao; Peng Zheng; Shou-tao Xu; Xindong Wu; |
| 2018 | 17 | ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks IF:9 Highlight: In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. | XINTAO WANG et al. |
| 2018 | 18 | Group Normalization IF:9 Highlight: In this paper, we present Group Normalization (GN) as a simple alternative to BN. | Yuxin Wu; Kaiming He; |
| 2018 | 19 | CornerNet: Detecting Objects As Paired Keypoints IF:9 Highlight: We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolution neural network. | Hei Law; Jia Deng; |
| 2018 | 20 | SlowFast Networks For Video Recognition IF:9 Highlight: We present SlowFast networks for video recognition. | Christoph Feichtenhofer; Haoqi Fan; Jitendra Malik; Kaiming He; |
| 2018 | 21 | Residual Dense Network For Image Super-Resolution IF:9 Highlight: In this paper, we propose a novel residual dense network (RDN) to address this problem in image SR. | Yulun Zhang; Yapeng Tian; Yu Kong; Bineng Zhong; Yun Fu; |
| 2018 | 22 | Unsupervised Representation Learning By Predicting Image Rotations IF:9 Highlight: In our work we propose to learn image features by training ConvNets to recognize the 2d rotation that is applied to the image that it gets as input. | Spyros Gidaris; Praveer Singh; Nikos Komodakis; |
| 2018 | 23 | MnasNet: Platform-Aware Neural Architecture Search For Mobile IF:9 Highlight: In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporate model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. | Mingxing Tan et al. |
| 2018 | 24 | Occupancy Networks: Learning 3D Reconstruction In Function Space IF:9 Highlight: In this paper, we propose Occupancy Networks, a new representation for learning-based 3D reconstruction methods. | Lars Mescheder; Michael Oechsle; Michael Niemeyer; Sebastian Nowozin; Andreas Geiger |
| 2018 | 25 | The HAM10000 Dataset, A Large Collection Of Multi-source Dermatoscopic Images Of Common Pigmented Skin Lesions IF:8 Highlight: We tackle this problem by releasing the HAM10000 (Human Against Machine with 10000 training images) dataset. | Philipp Tschandl; Cliff Rosendahl; Harald Kittler |
| 2018 | 26 | ImageNet-trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness IF:8 Highlight: We show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies. | Robert Geirhos et al. |
| 2018 | 27 | CCNet: Criss-Cross Attention For Semantic Segmentation IF:9 Highlight: We propose a Criss-Cross Network (CCNet) for obtaining full-image contextual information in a very effective and efficient way. | Zilong Huang et al. |
| 2018 | 28 | CosFace: Large Margin Cosine Loss For Deep Face Recognition IF:9 Highlight: In this paper, we propose a novel loss function, namely large margin cosine loss (LMCL), to realize this idea from a different perspective. | Hao Wang et al. |
| 2018 | 29 | PointRCNN: 3D Object Proposal Generation And Detection From Point Cloud IF:9 Highlight: In this paper, we propose PointRCNN for 3D object detection from raw point cloud. | Shaoshuai Shi; Xiaogang Wang; Hongsheng Li |
| 2018 | 30 | Multimodal Unsupervised Image-to-Image Translation IF:9 Highlight: To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. | Xun Huang; Ming-Yu Liu; Serge Belongie; Jan Kautz |
| 2017 | 1 | Squeeze-and-Excitation Networks IF:9 Highlight: In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the Squeeze-and-Excitation (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. | Jie Hu; Li Shen; Samuel Albanie; Gang Sun; Enhua Wu |
| 2017 | 2 | Mask R-CNN IF:10 Highlight: We present a conceptually simple, flexible, and general framework for object instance segmentation. | Kaiming He; Georgia Gkioxari; Piotr Dollár; Ross Girshick |
| 2017 | 3 | Focal Loss For Dense Object Detection IF:10 Highlight: In this paper, we investigate why this is the case. | Tsung-Yi Lin; Priya Goyal; Ross Girshick; Kaiming He; Piotr Dollár |
| 2017 | 4 | MobileNets: Efficient Convolutional Neural Networks For Mobile Vision Applications IF:10 Highlight: We present a class of efficient models called MobileNets for mobile and embedded vision applications. | Andrew G. Howard et al. |
| 2017 | 5 | PointNet++: Deep Hierarchical Feature Learning On Point Sets In A Metric Space IF:10 Highlight: In this work, we introduce a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. | Charles R. Qi; Li Yi; Hao Su; Leonidas J. Guibas |
| 2017 | 6 | A Survey On Deep Learning In Medical Image Analysis IF:9 Highlight: This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year. | Geert Litjens et al. |
| 2017 | 7 | Non-local Neural Networks IF:9 Highlight: In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. | Xiaolong Wang; Ross Girshick; Abhinav Gupta; Kaiming He |
| 2017 | 8 | Rethinking Atrous Convolution For Semantic Image Segmentation IF:9 Highlight: In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter’s field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. | Liang-Chieh Chen; George Papandreou; Florian Schroff; Hartwig Adam |
| 2017 | 9 | Quo Vadis, Action Recognition? A New Model And The Kinetics Dataset IF:9 Highlight: We provide an analysis on how current architectures fare on the task of action classification on this dataset and how much performance improves on the smaller benchmark datasets after pre-training on Kinetics. | Joao Carreira; Andrew Zisserman |
| 2017 | 10 | ShuffleNet: An Extremely Efficient Convolutional Neural Network For Mobile Devices IF:9 Highlight: We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). | Xiangyu Zhang; Xinyu Zhou; Mengxiao Lin; Jian Sun |
| 2017 | 11 | Enhanced Deep Residual Networks For Single Image Super-Resolution IF:9 Highlight: In this paper, we develop an enhanced deep super-resolution network (EDSR) with performance exceeding those of current state-of-the-art SR methods. | Bee Lim; Sanghyun Son; Heewon Kim; Seungjun Nah; Kyoung Mu Lee |
| 2017 | 12 | Learning Transferable Architectures For Scalable Image Recognition IF:9 Highlight: In this paper, we study a method to learn the model architectures directly on the dataset of interest. | Barret Zoph; Vijay Vasudevan; Jonathon Shlens; Quoc V. Le |
| 2017 | 13 | Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks IF:10 Highlight: We present an approach for learning to translate an image from a source domain $X$ to a target domain $Y$ in the absence of paired examples. | Jun-Yan Zhu; Taesung Park; Phillip Isola; Alexei A. Efros |
| 2017 | 14 | Cascade R-CNN: Delving Into High Quality Object Detection IF:9 Abstract: In object detection, an intersection over union (IoU) threshold is required to define positives and negatives. An object detector, trained with low IoU threshold, e.g. 0.5, … | Zhaowei Cai; Nuno Vasconcelos |
| 2017 | 15 | Adversarial Discriminative Domain Adaptation IF:9 Highlight: They also can improve recognition despite the presence of domain shift or dataset bias: several adversarial approaches to unsupervised domain adaptation have recently been introduced, which reduce the difference between the training and test domain distributions and thus improve generalization performance. | Eric Tzeng; Judy Hoffman; Kate Saenko; Trevor Darrell |
| 2017 | 16 | Arbitrary Style Transfer In Real-time With Adaptive Instance Normalization IF:9 Highlight: In this paper, we present a simple yet effective approach that for the first time enables arbitrary style transfer in real-time. | Xun Huang; Serge Belongie |
| 2017 | 17 | ScanNet: Richly-annotated 3D Reconstructions Of Indoor Scenes IF:9 Highlight: To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations. | Angela Dai et al. |
| 2017 | 18 | Dynamic Routing Between Capsules IF:9 Highlight: We use the length of the activity vector to represent the probability that the entity exists and its orientation to represent the instantiation parameters. | Sara Sabour; Nicholas Frosst; Geoffrey E Hinton |
| 2017 | 19 | Billion-scale Similarity Search With GPUs IF:9 Highlight: We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art. | Jeff Johnson; Matthijs Douze; Hervé Jégou |
| 2017 | 20 | Bottom-Up And Top-Down Attention For Image Captioning And Visual Question Answering IF:9 Highlight: In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. | Peter Anderson et al. |
| 2017 | 21 | Learning To Compare: Relation Network For Few-Shot Learning IF:9 Highlight: We present a conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only few examples from each. | Flood Sung et al. |
| 2017 | 22 | Learning Important Features Through Propagating Activation Differences IF:9 Highlight: Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. | Avanti Shrikumar; Peyton Greenside; Anshul Kundaje |
| 2017 | 23 | VoxelNet: End-to-End Learning For Point Cloud Based 3D Object Detection IF:9 Highlight: In this work, we remove the need of manual feature engineering for 3D point clouds and propose VoxelNet, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single stage, end-to-end trainable deep network. | Yin Zhou; Oncel Tuzel |
| 2017 | 24 | High-Resolution Image Synthesis And Semantic Manipulation With Conditional GANs IF:9 Highlight: We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). | Ting-Chun Wang et al. |
| 2017 | 25 | Simple Online And Realtime Tracking With A Deep Association Metric IF:9 Highlight: In this paper, we integrate appearance information to improve the performance of SORT. | Nicolai Wojke; Alex Bewley; Dietrich Paulus |
| 2017 | 26 | The Kinetics Human Action Video Dataset IF:9 Highlight: We describe the DeepMind Kinetics human action video dataset. | Will Kay et al. |
| 2017 | 27 | Improved Regularization Of Convolutional Neural Networks With Cutout IF:9 Highlight: In this paper, we show that the simple regularization technique of randomly masking out square regions of input during training, which we call cutout, can be used to improve the robustness and overall performance of convolutional neural networks. | Terrance DeVries; Graham W. Taylor |
| 2017 | 28 | Random Erasing Data Augmentation IF:9 Highlight: In this paper, we introduce Random Erasing, a new data augmentation method for training the convolutional neural network (CNN). | Zhun Zhong; Liang Zheng; Guoliang Kang; Shaozi Li; Yi Yang |
| 2017 | 29 | Accurate, Large Minibatch SGD: Training ImageNet In 1 Hour IF:9 Highlight: In this paper, we empirically show that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization. | Priya Goyal et al. |
| 2017 | 30 | StarGAN: Unified Generative Adversarial Networks For Multi-Domain Image-to-Image Translation IF:9 Highlight: To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. | Yunjey Choi et al. |
| 2016 | 1 | Densely Connected Convolutional Networks IF:10 Highlight: In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. | Gao Huang; Zhuang Liu; Laurens van der Maaten; Kilian Q. Weinberger |
| 2016 | 2 | Feature Pyramid Networks For Object Detection IF:10 Highlight: In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. | Tsung-Yi Lin et al. |
| 2016 | 3 | Grad-CAM: Visual Explanations From Deep Networks Via Gradient-based Localization IF:9 Highlight: We propose a technique for producing visual explanations for decisions from a large class of CNN-based models, making them more transparent. | Ramprasaath R. Selvaraju et al. |
| 2016 | 4 | Image-to-Image Translation With Conditional Adversarial Networks IF:10 Highlight: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. | Phillip Isola; Jun-Yan Zhu; Tinghui Zhou; Alexei A. Efros |
| 2016 | 5 | DeepLab: Semantic Image Segmentation With Deep Convolutional Nets, Atrous Convolution, And Fully Connected CRFs IF:10 Highlight: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. | Liang-Chieh Chen; George Papandreou; Iasonas Kokkinos; Kevin Murphy; Alan L. Yuille |
| 2016 | 6 | YOLO9000: Better, Faster, Stronger IF:10 Highlight: We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. | Joseph Redmon; Ali Farhadi |
| 2016 | 7 | Xception: Deep Learning With Depthwise Separable Convolutions IF:9 Highlight: We present an interpretation of Inception modules in convolutional neural networks as being an intermediate step in-between regular convolution and the depthwise separable convolution operation (a depthwise convolution followed by a pointwise convolution). | François Chollet |
| 2016 | 8 | PointNet: Deep Learning On Point Sets For 3D Classification And Segmentation IF:10 Highlight: In this paper, we design a novel type of neural network that directly consumes point clouds and well respects the permutation invariance of points in the input. | Charles R. Qi; Hao Su; Kaichun Mo; Leonidas J. Guibas |
| 2016 | 9 | Inception-v4, Inception-ResNet And The Impact Of Residual Connections On Learning IF:10 Highlight: With an ensemble of three residual and one Inception-v4, we achieve 3.08 percent top-5 error on the test set of the ImageNet classification (CLS) challenge. | Christian Szegedy; Sergey Ioffe; Vincent Vanhoucke; Alex Alemi |
| 2016 | 10 | Pyramid Scene Parsing Network IF:10 Highlight: In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). | Hengshuang Zhao; Jianping Shi; Xiaojuan Qi; Xiaogang Wang; Jiaya Jia |
| 2016 | 11 | The Cityscapes Dataset For Semantic Urban Scene Understanding IF:10 Highlight: To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. | Marius Cordts et al. |
| 2016 | 12 | Photo-Realistic Single Image Super-Resolution Using A Generative Adversarial Network IF:10 Highlight: In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). | Christian Ledig et al. |
| 2016 | 13 | Aggregated Residual Transformations For Deep Neural Networks IF:9 Highlight: We present a simple, highly modularized network architecture for image classification. | Saining Xie; Ross Girshick; Piotr Dollár; Zhuowen Tu; Kaiming He |
| 2016 | 14 | Perceptual Losses For Real-Time Style Transfer And Super-Resolution IF:10 Highlight: We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. | Justin Johnson; Alexandre Alahi; Li Fei-Fei |
| 2016 | 15 | Identity Mappings In Deep Residual Networks IF:10 Highlight: In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. | Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun |
| 2016 | 16 | V-Net: Fully Convolutional Neural Networks For Volumetric Medical Image Segmentation IF:9 Highlight: In this work we propose an approach to 3D image segmentation based on a volumetric, fully convolutional, neural network. | Fausto Milletari; Nassir Navab; Seyed-Ahmad Ahmadi |
| 2016 | 17 | Wide Residual Networks IF:9 Highlight: To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. | Sergey Zagoruyko; Nikos Komodakis |
| 2016 | 18 | Beyond A Gaussian Denoiser: Residual Learning Of Deep CNN For Image Denoising IF:9 Highlight: In this paper, we take one step forward by investigating the construction of feed-forward denoising convolutional neural networks (DnCNNs) to embrace the progress in very deep architecture, learning algorithm, and regularization method into image denoising. | Kai Zhang; Wangmeng Zuo; Yunjin Chen; Deyu Meng; Lei Zhang |
| 2016 | 19 | 3D U-Net: Learning Dense Volumetric Segmentation From Sparse Annotation IF:9 Highlight: This paper introduces a network for volumetric segmentation that learns from sparsely annotated volumetric images. | Özgün Çiçek; Ahmed Abdulkadir; Soeren S. Lienkamp; Thomas Brox; Olaf Ronneberger |
| 2016 | 20 | Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields IF:10 Highlight: We present an approach to efficiently detect the 2D pose of multiple people in an image. | Zhe Cao; Tomas Simon; Shih-En Wei; Yaser Sheikh |
| 2016 | 21 | Adversarial Examples In The Physical World IF:9 Highlight: Up to now, all previous work have assumed a threat model in which the adversary can feed data directly into the machine learning classifier. | Alexey Kurakin; Ian Goodfellow; Samy Bengio |
| 2016 | 22 | R-FCN: Object Detection Via Region-based Fully Convolutional Networks IF:9 Highlight: We present region-based, fully convolutional networks for accurate and efficient object detection. | Jifeng Dai; Yi Li; Kaiming He; Jian Sun |
| 2016 | 23 | Real-Time Single Image And Video Super-Resolution Using An Efficient Sub-Pixel Convolutional Neural Network IF:9 Highlight: In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. | Wenzhe Shi et al. |
| 2016 | 24 | Context Encoders: Feature Learning By Inpainting IF:9 Highlight: We present an unsupervised visual feature learning algorithm driven by context-based pixel prediction. | Deepak Pathak; Philipp Krahenbuhl; Jeff Donahue; Trevor Darrell; Alexei A. Efros |
| 2016 | 25 | Stacked Hourglass Networks For Human Pose Estimation IF:10 Highlight: This work introduces a novel convolutional network architecture for the task of human pose estimation. | Alejandro Newell; Kaiyu Yang; Jia Deng |
| 2016 | 26 | Learning Without Forgetting IF:9 Highlight: We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. | Zhizhong Li; Derek Hoiem |
| 2016 | 27 | Deep Convolutional Neural Networks For Computer-Aided Detection: CNN Architectures, Dataset Characteristics And Transfer Learning IF:9 Highlight: In this paper, we exploit three important, but previously understudied factors of employing deep convolutional neural networks to computer-aided detection problems. | Hoo-Chang Shin et al. |
| 2016 | 28 | Least Squares Generative Adversarial Networks IF:9 Highlight: To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs) which adopt the least squares loss function for the discriminator. | Xudong Mao et al. |
| 2016 | 29 | XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks IF:10 Highlight: We propose two efficient approximations to standard convolutional neural networks: Binary-Weight-Networks and XNOR-Networks. | Mohammad Rastegari; Vicente Ordonez; Joseph Redmon; Ali Farhadi |
| 2015 | 1 | Deep Residual Learning For Image Recognition IF:10 Highlight: We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. | Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun |
| 2015 | 2 | U-Net: Convolutional Networks For Biomedical Image Segmentation IF:10 Highlight: In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. | Olaf Ronneberger; Philipp Fischer; Thomas Brox |
| 2015 | 3 | Faster R-CNN: Towards Real-Time Object Detection With Region Proposal Networks IF:10 Highlight: In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. | Shaoqing Ren; Kaiming He; Ross Girshick; Jian Sun |
| 2015 | 4 | You Only Look Once: Unified, Real-Time Object Detection IF:10 Highlight: We present YOLO, a new approach to object detection. | Joseph Redmon; Santosh Divvala; Ross Girshick; Ali Farhadi |
| 2015 | 5 | SSD: Single Shot MultiBox Detector IF:10 Highlight: We present a method for detecting objects in images using a single deep neural network. | Wei Liu et al. |
| 2015 | 6 | Rethinking The Inception Architecture For Computer Vision IF:10 Highlight: Here we explore ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. | Christian Szegedy; Vincent Vanhoucke; Sergey Ioffe; Jonathon Shlens; Zbigniew Wojna |
| 2015 | 7 | Fast R-CNN IF:10 Highlight: This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. | Ross Girshick |
| 2015 | 8 | Delving Deep Into Rectifiers: Surpassing Human-Level Performance On ImageNet Classification IF:10 Highlight: In this work, we study rectifier neural networks for image classification from two aspects. | Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun |
| 2015 | 9 | SegNet: A Deep Convolutional Encoder-Decoder Architecture For Image Segmentation IF:10 Highlight: We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. | Vijay Badrinarayanan; Alex Kendall; Roberto Cipolla |
| 2015 | 10 | FaceNet: A Unified Embedding For Face Recognition And Clustering IF:10 Highlight: In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. | Florian Schroff; Dmitry Kalenichenko; James Philbin |
| 2015 | 11 | Learning Deep Features For Discriminative Localization IF:10 Highlight: In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels. | Bolei Zhou; Aditya Khosla; Agata Lapedriza; Aude Oliva; Antonio Torralba |
| 2015 | 12 | Multi-Scale Context Aggregation By Dilated Convolutions IF:10 Highlight: In this work, we develop a new convolutional network module that is specifically designed for dense prediction. | Fisher Yu; Vladlen Koltun |
| 2015 | 13 | Convolutional LSTM Network: A Machine Learning Approach For Precipitation Nowcasting IF:9 Highlight: In this paper, we formulate precipitation nowcasting as a spatiotemporal sequence forecasting problem in which both the input and the prediction target are spatiotemporal sequences. | Xingjian Shi et al. |
| 2015 | 14 | Spatial Transformer Networks IF:10 Highlight: In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. | Max Jaderberg; Karen Simonyan; Andrew Zisserman; Koray Kavukcuoglu |
| 2015 | 15 | Accurate Image Super-Resolution Using Very Deep Convolutional Networks IF:9 Highlight: We present a highly accurate single-image super-resolution (SR) method. |
Jiwon Kim; Jung Kwon Lee; Kyoung Mu Lee; |
| 2015 | 16 | Recent Advances In Convolutional Neural Networks IF:9 Highlight: In this paper, we provide a broad survey of the recent advances in convolutional neural networks. |
Jiuxiang Gu et al. |
| 2015 | 17 | FlowNet: Learning Optical Flow With Convolutional Networks IF:9 Highlight: In this paper we construct appropriate CNNs which are capable of solving the optical flow estimation problem as a supervised learning task. |
Philipp Fischer et al. |
| 2015 | 18 | Learning Deconvolution Network For Semantic Segmentation IF:10 Highlight: We propose a novel semantic segmentation algorithm by learning a deconvolution network. |
Hyeonwoo Noh; Seunghoon Hong; Bohyung Han; |
| 2015 | 19 | Holistically-Nested Edge Detection IF:9 Highlight: We develop a new edge detection algorithm that tackles two important issues in this long-standing vision problem: (1) holistic image training and prediction; and (2) multi-scale and multi-level feature learning. |
Saining Xie; Zhuowen Tu; |
| 2015 | 20 | Multi-view Convolutional Neural Networks For 3D Shape Recognition IF:9 Highlight: We address this question in the context of learning to recognize 3D shapes from a collection of their rendered views on 2D images. |
Hang Su; Subhransu Maji; Evangelos Kalogerakis; Erik Learned-Miller; |
| 2015 | 21 | A Neural Algorithm Of Artistic Style IF:9 Highlight: Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. |
Leon A. Gatys; Alexander S. Ecker; Matthias Bethge; |
| 2015 | 22 | NetVLAD: CNN Architecture For Weakly Supervised Place Recognition IF:9 Highlight: We tackle the problem of large scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. |
Relja Arandjelović; Petr Gronat; Akihiko Torii; Tomas Pajdla; Josef Sivic; |
| 2015 | 23 | Brain Tumor Segmentation With Deep Neural Networks IF:9 Highlight: In this paper, we present a fully automatic brain tumor segmentation method based on Deep Neural Networks (DNNs). |
Mohammad Havaei et al. |
| 2015 | 24 | Unsupervised Visual Representation Learning By Context Prediction IF:9 Highlight: This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation. |
Carl Doersch; Abhinav Gupta; Alexei A. Efros; |
| 2015 | 25 | A Large Dataset To Train Convolutional Networks For Disparity, Optical Flow, And Scene Flow Estimation IF:9 Highlight: To this end, we propose three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks. |
Nikolaus Mayer et al. |
| 2015 | 26 | Cyclical Learning Rates For Training Neural Networks IF:9 Highlight: This paper describes a new method for setting the learning rate, named cyclical learning rates, which practically eliminates the need to experimentally find the best values and schedule for the global learning rates. |
Leslie N. Smith; |
| 2015 | 27 | An End-to-End Trainable Neural Network For Image-based Sequence Recognition And Its Application To Scene Text Recognition IF:9 Highlight: In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. |
Baoguang Shi; Xiang Bai; Cong Yao; |
| 2015 | 28 | Microsoft COCO Captions: Data Collection And Evaluation Server IF:9 Highlight: In this paper we describe the Microsoft COCO Caption dataset and evaluation server. |
Xinlei Chen et al. |
| 2015 | 29 | Deeply-Recursive Convolutional Network For Image Super-Resolution IF:9 Highlight: We propose an image super-resolution method (SR) using a deeply-recursive convolutional network (DRCN). |
Jiwon Kim; Jung Kwon Lee; Kyoung Mu Lee; |
| 2015 | 30 | Image-based Recommendations On Styles And Substitutes IF:9 Highlight: We seek here to model this human sense of the relationships between objects based on their appearance. |
Julian McAuley; Christopher Targett; Qinfeng Shi; Anton van den Hengel; |
| 2014 | 1 | Very Deep Convolutional Networks For Large-Scale Image Recognition IF:10 Highlight: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. |
Karen Simonyan; Andrew Zisserman; |
| 2014 | 2 | Microsoft COCO: Common Objects In Context IF:10 Highlight: We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. |
Tsung-Yi Lin et al. |
| 2014 | 3 | Going Deeper With Convolutions IF:10 Highlight: We propose a deep convolutional neural network architecture codenamed Inception, which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). |
Christian Szegedy et al. |
| 2014 | 4 | ImageNet Large Scale Visual Recognition Challenge IF:10 Highlight: We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements. |
Olga Russakovsky et al. |
| 2014 | 5 | Fully Convolutional Networks For Semantic Segmentation IF:10 Highlight: Our key insight is to build fully convolutional networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. |
Jonathan Long; Evan Shelhamer; Trevor Darrell; |
| 2014 | 6 | Caffe: Convolutional Architecture For Fast Feature Embedding IF:10 Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. … |
Yangqing Jia et al. |
| 2014 | 7 | Spatial Pyramid Pooling In Deep Convolutional Networks For Visual Recognition IF:9 Highlight: In this work, we equip the networks with another pooling strategy, spatial pyramid pooling, to eliminate the above requirement. |
Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun; |
| 2014 | 8 | Deep Learning Face Attributes In The Wild IF:9 Highlight: We propose a novel deep learning framework for attribute prediction in the wild. |
Ziwei Liu; Ping Luo; Xiaogang Wang; Xiaoou Tang; |
| 2014 | 9 | Learning Spatiotemporal Features With 3D Convolutional Networks IF:10 Highlight: We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset. |
Du Tran; Lubomir Bourdev; Rob Fergus; Lorenzo Torresani; Manohar Paluri; |
| 2014 | 10 | Image Super-Resolution Using Deep Convolutional Networks IF:10 Highlight: We propose a deep learning method for single image super-resolution (SR). |
Chao Dong; Chen Change Loy; Kaiming He; Xiaoou Tang; |
| 2014 | 11 | Two-Stream Convolutional Networks For Action Recognition In Videos IF:10 Highlight: We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. |
Karen Simonyan; Andrew Zisserman; |
| 2014 | 12 | High-Speed Tracking With Kernelized Correlation Filters IF:8 Highlight: Based on this simple observation, we propose an analytic model for datasets of thousands of translated patches. |
João F. Henriques; Rui Caseiro; Pedro Martins; Jorge Batista; |
| 2014 | 13 | Show And Tell: A Neural Image Caption Generator IF:10 Highlight: In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. |
Oriol Vinyals; Alexander Toshev; Samy Bengio; Dumitru Erhan; |
| 2014 | 14 | Long-term Recurrent Convolutional Networks For Visual Recognition And Description IF:10 Highlight: We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. |
Jeff Donahue et al. |
| 2014 | 15 | 3D ShapeNets: A Deep Representation For Volumetric Shapes IF:9 Highlight: To this end, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network. |
Zhirong Wu et al. |
| 2014 | 16 | Deep Visual-Semantic Alignments For Generating Image Descriptions IF:10 Highlight: We present a model that generates natural language descriptions of images and their regions. |
Andrej Karpathy; Li Fei-Fei; |
| 2014 | 17 | Semantic Image Segmentation With Deep Convolutional Nets And Fully Connected CRFs IF:9 Highlight: We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). |
Liang-Chieh Chen; George Papandreou; Iasonas Kokkinos; Kevin Murphy; Alan L. Yuille; |
| 2014 | 18 | CIDEr: Consensus-based Image Description Evaluation IF:9 Highlight: We propose a novel paradigm for evaluating image descriptions that uses human consensus. |
Ramakrishna Vedantam; C. Lawrence Zitnick; Devi Parikh; |
| 2014 | 19 | CNN Features Off-the-shelf: An Astounding Baseline For Recognition IF:9 Highlight: We use features extracted from the OverFeat network as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine grained recognition, attribute detection and image retrieval applied to a diverse set of datasets. |
Ali Sharif Razavian; Hossein Azizpour; Josephine Sullivan; Stefan Carlsson; |
| 2014 | 20 | Depth Map Prediction From A Single Image Using A Multi-Scale Deep Network IF:10 Highlight: In this paper, we present a new method that addresses this task by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. |
David Eigen; Christian Puhrsch; Rob Fergus; |
| 2014 | 21 | Return Of The Devil In The Details: Delving Deep Into Convolutional Nets IF:9 Highlight: Source code and models to reproduce the experiments in the paper are made publicly available. |
Ken Chatfield; Karen Simonyan; Andrea Vedaldi; Andrew Zisserman; |
| 2014 | 22 | Deep Neural Networks Are Easily Fooled: High Confidence Predictions For Unrecognizable Images IF:9 Highlight: A recent study revealed that changing an image (e.g. of a lion) in a way imperceptible to humans can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion as a library). |
Anh Nguyen; Jason Yosinski; Jeff Clune; |
| 2014 | 23 | Predicting Depth, Surface Normals And Semantic Labels With A Common Multi-Scale Convolutional Architecture IF:9 Highlight: In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling. |
David Eigen; Rob Fergus; |
| 2014 | 24 | Deep Domain Confusion: Maximizing For Domain Invariance IF:9 Highlight: We propose a new CNN architecture which introduces an adaptation layer and an additional domain confusion loss, to learn a representation that is both semantically meaningful and domain invariant. |
Eric Tzeng; Judy Hoffman; Ning Zhang; Kate Saenko; Trevor Darrell; |
| 2014 | 25 | Deep Learning Face Representation By Joint Identification-Verification IF:9 Highlight: In this paper, we show that it can be well solved with deep learning and using both face identification and verification signals as supervision. |
Yi Sun; Xiaogang Wang; Xiaoou Tang; |
| 2014 | 26 | Learning Face Representation From Scratch IF:9 Highlight: To solve this problem, this paper proposes a semi-automatic way to collect face images from the Internet and builds a large scale dataset containing about 10,000 subjects and 500,000 images, called CASIAWebFace. |
Dong Yi; Zhen Lei; Shengcai Liao; Stan Z. Li; |
| 2014 | 27 | Person Re-identification By Local Maximal Occurrence Representation And Metric Learning IF:9 Highlight: In this paper, we propose an effective feature representation called Local Maximal Occurrence (LOMO), and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA). |
Shengcai Liao; Yang Hu; Xiangyu Zhu; Stan Z. Li; |
| 2014 | 28 | Understanding Deep Image Representations By Inverting Them IF:9 Highlight: In this paper we conduct a direct analysis of the visual information contained in representations by asking the following question: given an encoding of an image, to which extent is it possible to reconstruct the image itself? |
Aravindh Mahendran; Andrea Vedaldi; |
| 2014 | 29 | Exploiting Linear Structure Within Convolutional Networks For Efficient Evaluation IF:9 Highlight: We present techniques for speeding up the test-time evaluation of large convolutional networks, designed for object recognition tasks. |
Emily Denton; Wojciech Zaremba; Joan Bruna; Yann LeCun; Rob Fergus; |
| 2014 | 30 | Learning Rich Features From RGB-D Images For Object Detection And Segmentation IF:9 Highlight: In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features. |
Saurabh Gupta; Ross Girshick; Pablo Arbeláez; Jitendra Malik; |
| 2013 | 1 | Rich Feature Hierarchies For Accurate Object Detection And Semantic Segmentation IF:10 Highlight: In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%. |
Ross Girshick; Jeff Donahue; Trevor Darrell; Jitendra Malik; |
| 2013 | 2 | Visualizing And Understanding Convolutional Networks IF:10 Highlight: We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. |
Matthew D Zeiler; Rob Fergus; |
| 2013 | 3 | Intriguing Properties Of Neural Networks IF:10 Highlight: In this paper we report two such properties. |
Christian Szegedy et al. |
| 2013 | 4 | Deep Inside Convolutional Networks: Visualising Image Classification Models And Saliency Maps IF:9 Highlight: We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image. |
Karen Simonyan; Andrea Vedaldi; Andrew Zisserman; |
| 2013 | 5 | OverFeat: Integrated Recognition, Localization And Detection Using Convolutional Networks IF:9 Highlight: We present an integrated framework for using Convolutional Networks for classification, localization and detection. |
Pierre Sermanet et al. |
| 2013 | 6 | DeCAF: A Deep Convolutional Activation Feature For Generic Visual Recognition IF:9 Highlight: We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. |
Jeff Donahue et al. |
| 2013 | 7 | Describing Textures In The Wild IF:9 Highlight: Aiming at supporting this analytical dimension in image understanding, we address the challenging problem of describing textures with semantic attributes. |
Mircea Cimpoi; Subhransu Maji; Iasonas Kokkinos; Sammy Mohamed; Andrea Vedaldi; |
| 2013 | 8 | DeepPose: Human Pose Estimation Via Deep Neural Networks IF:9 Highlight: We propose a method for human pose estimation based on Deep Neural Networks (DNNs). |
Alexander Toshev; Christian Szegedy; |
| 2013 | 9 | Fine-Grained Visual Classification Of Aircraft IF:9 Highlight: This paper introduces FGVC-Aircraft, a new dataset containing 10,000 images of aircraft spanning 100 aircraft models, organised in a three-level hierarchy. |
Subhransu Maji; Esa Rahtu; Juho Kannala; Matthew Blaschko; Andrea Vedaldi; |
| 2013 | 10 | Gradient Magnitude Similarity Deviation: A Highly Efficient Perceptual Image Quality Index IF:9 Highlight: We present a new effective and efficient IQA model, called gradient magnitude similarity deviation (GMSD). |
Wufeng Xue; Lei Zhang; Xuanqin Mou; Alan C. Bovik; |
| 2013 | 11 | Zero-Shot Learning Through Cross-Modal Transfer IF:9 Highlight: This work introduces a model that can recognize objects in images even if no training data is available for the objects. |
Richard Socher et al. |
| 2013 | 12 | Scalable Object Detection Using Deep Neural Networks IF:9 Highlight: In this work, we propose a saliency-inspired neural network model for detection, which predicts a set of class-agnostic bounding boxes along with a single score for each box, corresponding to its likelihood of containing any object of interest. |
Dumitru Erhan; Christian Szegedy; Alexander Toshev; Dragomir Anguelov; |
| 2013 | 13 | Image Segmentation In Video Sequences: A Probabilistic Approach IF:9 Highlight: The basic idea of this paper is that we can classify each pixel using a model of how that pixel looks when it is part of different classes. |
Nir Friedman; Stuart Russell; |
| 2013 | 14 | Medical Image Fusion: A Survey Of The State Of The Art IF:8 Highlight: This review article provides a factual listing of methods and summarizes the broad scientific challenges faced in the field of medical image fusion. |
A. P. James; B. V. Dasarathy; |
| 2013 | 15 | A Survey Of Appearance Models In Visual Object Tracking IF:8 Highlight: In this survey, we first decompose the problem of appearance modeling into two different processing stages: visual representation and statistical modeling. |
Xi Li et al. |
| 2013 | 16 | Multi-digit Number Recognition From Street View Imagery Using Deep Convolutional Neural Networks IF:7 Highlight: In this paper, we address an equally hard sub-problem in this domain viz. recognizing arbitrary multi-digit numbers from Street View imagery. |
Ian J. Goodfellow; Yaroslav Bulatov; Julian Ibarz; Sacha Arnoud; Vinay Shet; |
| 2013 | 17 | SEEDS: Superpixels Extracted Via Energy-Driven Sampling IF:7 Highlight: We introduce a new approach based on a simple hill-climbing optimization. |
Michael Van den Bergh; Xavier Boix; Gemma Roig; Luc Van Gool; |
| 2013 | 18 | Advances In Hyperspectral Image Classification: Earth Monitoring With Statistical Learning Methods IF:7 Highlight: New methods have been presented to account for the spatial homogeneity of images, to include user’s interaction via active learning, to take advantage of the manifold structure with semisupervised learning, to extract and encode invariances, or to adapt classifiers and image representations to unseen yet similar scenes. |
Gustavo Camps-Valls; Devis Tuia; Lorenzo Bruzzone; Jón Atli Benediktsson; |
| 2013 | 19 | Fast Training Of Convolutional Networks Through FFTs IF:8 Highlight: In this work, we present a simple algorithm which accelerates training and inference by a significant factor, and can yield improvements of over an order of magnitude compared to existing state-of-the-art implementations. |
Michael Mathieu; Mikael Henaff; Yann LeCun; |
| 2013 | 20 | Rotational Projection Statistics For 3D Local Surface Description And Object Recognition IF:7 Highlight: This paper presents a novel method named Rotational Projection Statistics (RoPS). |
Yulan Guo; Ferdous Sohel; Mohammed Bennamoun; Min Lu; Jianwei Wan; |
| 2013 | 21 | Dropout Improves Recurrent Neural Networks For Handwriting Recognition IF:7 Highlight: We show that their performance can be greatly improved using dropout – a recently proposed regularization method for deep architectures. |
Vu Pham; Théodore Bluche; Christopher Kermorvant; Jérôme Louradour; |
| 2013 | 22 | PANDA: Pose Aligned Networks For Deep Attribute Modeling IF:7 Highlight: We propose a method for inferring human attributes (such as gender, hair style, clothes style, expression, action) from images of people under large variation of viewpoint, pose, appearance, articulation and occlusion. |
Ning Zhang; Manohar Paluri; Marc’Aurelio Ranzato; Trevor Darrell; Lubomir Bourdev; |
| 2013 | 23 | Indoor Semantic Segmentation Using Depth Information IF:6 Highlight: This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. |
Camille Couprie; Clément Farabet; Laurent Najman; Yann LeCun; |
| 2013 | 24 | Recognizing Image Style IF:6 Highlight: We describe an approach to predicting style of images, and perform a thorough evaluation of different image features for these tasks. |
Sergey Karayev et al. |
| 2013 | 25 | Deep Convolutional Ranking For Multilabel Image Annotation IF:6 Highlight: In this work, we propose to leverage the advantage of such features and analyze key components that lead to better performances. |
Yunchao Gong; Yangqing Jia; Thomas Leung; Alexander Toshev; Sergey Ioffe; |
| 2013 | 26 | Some Improvements On Deep Convolutional Neural Network Based Image Classification IF:6 Highlight: We investigate multiple techniques to improve upon the current state of the art deep convolutional neural network based image classification pipeline. |
Andrew G. Howard; |
| 2013 | 27 | Coded Aperture Compressive Temporal Imaging IF:7 Highlight: We present experimental results for reconstruction at 148 frames per coded snapshot. |
Patrick Llull et al. |
| 2013 | 28 | Fast Image Scanning With Deep Max-Pooling Convolutional Neural Networks IF:6 Highlight: We show how dynamic programming can speed up the process by orders of magnitude, even when max-pooling layers are present. |
Alessandro Giusti; Dan C. Cireşan; Jonathan Masci; Luca M. Gambardella; Jürgen Schmidhuber; |
| 2013 | 29 | Patch-based Probabilistic Image Quality Assessment For Face Selection And Improved Video-based Face Recognition IF:6 Highlight: We propose an efficient patch-based face image quality assessment algorithm which quantifies the similarity of a face image to a probabilistic face model, representing an ideal face. |
Yongkang Wong; Shaokang Chen; Sandra Mau; Conrad Sanderson; Brian C. Lovell; |
| 2013 | 30 | Shadow Detection: A Survey And Comparative Evaluation Of Recent Methods IF:6 Highlight: The survey covers methods published during the last decade, and places them in a feature-based taxonomy comprised of four categories: chromacity, physical, geometry and textures. |
Andres Sanin; Conrad Sanderson; Brian C. Lovell; |
| 2012 | 1 | UCF101: A Dataset Of 101 Human Actions Classes From Videos In The Wild IF:9 Highlight: We introduce UCF101 which is currently the largest dataset of human actions. |
Khurram Soomro; Amir Roshan Zamir; Mubarak Shah; |
| 2012 | 2 | Multi-column Deep Neural Networks For Image Classification IF:9 Abstract: Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our … |
Dan Cireşan; Ueli Meier; Juergen Schmidhuber; |
| 2012 | 3 | Efficient Inference In Fully Connected CRFs With Gaussian Edge Potentials IF:9 Highlight: In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image. |
Philipp Krähenbühl; Vladlen Koltun; |
| 2012 | 4 | Sparse Subspace Clustering: Algorithm, Theory, And Applications IF:9 Highlight: In this paper, we propose and study an algorithm, called Sparse Subspace Clustering (SSC), to cluster data points that lie in a union of low-dimensional subspaces. |
Ehsan Elhamifar; Rene Vidal; |
| 2012 | 5 | Invariant Scattering Convolution Networks IF:9 Highlight: The mathematical analysis of wavelet scattering networks explains important properties of deep convolution networks for classification. |
Joan Bruna; Stéphane Mallat; |
| 2012 | 6 | Generalized Principal Component Analysis (GPCA) IF:9 Highlight: This paper presents an algebro-geometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points. |
Rene Vidal; Yi Ma; Shankar Sastry; |
| 2012 | 7 | An Evaluation Of Popular Copy-Move Forgery Detection Approaches IF:7 Highlight: In this paper, we aim to answer which copy-move forgery detection algorithms and processing steps (e.g., matching, filtering, outlier detection, affine transformation estimation) perform best in various postprocessing scenarios. | Vincent Christlein; Christian Riess; Johannes Jordan; Corinna Riess; Elli Angelopoulou |
| 2012 | 8 | Pedestrian Detection With Unsupervised Multi-Stage Feature Learning IF:8 Highlight: Adding to the list of successful applications of deep learning methods to vision, we report state-of-the-art and competitive results on all major pedestrian datasets with a convolutional network model. | Pierre Sermanet; Koray Kavukcuoglu; Soumith Chintala; Yann LeCun |
| 2012 | 9 | Unsupervised Discovery Of Mid-Level Discriminative Patches IF:7 Highlight: The goal of this paper is to discover a set of discriminative patches which can serve as a fully unsupervised mid-level visual representation. | Saurabh Singh; Abhinav Gupta; Alexei A. Efros |
| 2012 | 10 | A Multi-View Embedding Space For Modeling Internet Images, Tags, And Their Semantics IF:7 Highlight: We present two ways to train the three-view embedding: supervised, with the third view coming from ground-truth labels or search keywords; and unsupervised, with semantic themes automatically obtained by clustering the tags. | Yunchao Gong; Qifa Ke; Michael Isard; Svetlana Lazebnik |
| 2012 | 11 | Convolutional Neural Networks Applied To House Numbers Digit Classification IF:7 Highlight: We classify digits of real-world house numbers using convolutional neural networks (ConvNets). | Pierre Sermanet; Soumith Chintala; Yann LeCun |
| 2012 | 12 | Face Expression Recognition And Analysis: The State Of The Art IF:6 Highlight: The paper presents a time-line view of the advances made in this field, the applications of automatic face expression recognizers, the characteristics of an ideal system, the databases that have been used and the advances made in terms of their standardization and a detailed summary of the state of the art. | Vinay Bettadapura |
| 2012 | 13 | Poisson Noise Reduction With Non-local PCA IF:6 Highlight: This paper introduces a novel denoising algorithm for photon-limited images which combines elements of dictionary learning and sparse patch-based representations of images. | Joseph Salmon; Zachary Harmany; Charles-Alban Deledalle; Rebecca Willett |
| 2012 | 14 | A New Local Adaptive Thresholding Technique In Binarization IF:5 Highlight: This paper describes a locally adaptive thresholding technique that removes background by using local mean and mean deviation. | T. Romen Singh; Sudipta Roy; O. Imocha Singh; Tejmani Sinam; Kh. Manglem Singh |
| 2012 | 15 | Regularized Robust Coding For Face Recognition IF:5 Highlight: In this paper, we propose a new face coding model, namely regularized robust coding (RRC), which could robustly regress a given signal with regularized regression coefficients. | Meng Yang; Lei Zhang; Jian Yang; David Zhang |
| 2012 | 16 | Constructing The L2-Graph For Robust Subspace Learning And Subspace Clustering IF:5 Highlight: In this paper, we present a novel method to eliminate the effects of the errors from the projection space (representation) rather than from the input space. | Xi Peng; Zhiding Yu; Huajin Tang; Zhang Yi |
| 2012 | 17 | Stable Image Reconstruction Using Total Variation Minimization IF:5 Highlight: This article presents near-optimal guarantees for accurate and robust image recovery from under-sampled noisy measurements using total variation minimization. | Deanna Needell; Rachel Ward |
| 2012 | 18 | Mahotas: Open Source Software For Scriptable Computer Vision IF:5 Highlight: The interface is in Python, a dynamic programming language, which is very appropriate for fast development, but the algorithms are implemented in C++ and are tuned for speed. | Luis Pedro Coelho |
| 2012 | 19 | Collaborative Representation Based Classification For Face Recognition IF:5 Highlight: In this paper we discuss how SRC works, and show that the collaborative representation mechanism used in SRC is much more crucial to its success of face classification. | Lei Zhang; Meng Yang; Xiangchu Feng; Yi Ma; David Zhang |
| 2012 | 20 | SVD Based Image Processing Applications: State Of The Art, Contributions And Research Challenges IF:5 Highlight: The aim of this paper is to provide a better understanding of the SVD in image processing and identify important various applications and open research directions in this increasingly important area; SVD based image processing in the future research. | Rowayda A. Sadek |
| 2012 | 21 | Scene Parsing With Multiscale Feature Learning, Purity Trees, And Optimal Covers IF:5 Highlight: The scene parsing method proposed here starts by computing a tree of segments from a graph of pixel dissimilarities. | Clément Farabet; Camille Couprie; Laurent Najman; Yann LeCun |
| 2012 | 22 | Multimodal Similarity-preserving Hashing IF:4 Highlight: We introduce an efficient computational framework for hashing data belonging to multiple modalities into a single representation space where they become mutually comparable. | Jonathan Masci; Michael M. Bronstein; Alexander A. Bronstein; Jürgen Schmidhuber |
| 2012 | 23 | Image Labeling On A Network: Using Social-Network Metadata For Image Classification IF:4 Highlight: Since these types of data are inherently relational, we propose a model that explicitly accounts for the interdependencies between images sharing common properties. | Julian McAuley; Jure Leskovec |
| 2012 | 24 | Real-time Image-based 6-DOF Localization In Large-Scale Environments IF:5 Highlight: We present a real-time approach for image-based localization within large scenes that have been reconstructed offline using structure from motion (SfM). | Hyon Lim; Sudipta Sinha; Michael Cohen; Matt Uyttendaele |
| 2012 | 25 | Difference Of Normals As A Multi-Scale Operator In Unorganized Point Clouds IF:4 Highlight: The Difference of Normals (DoN) provides a computationally efficient, multi-scale approach to processing large unorganized 3D point clouds. | Yani Ioannou; Babak Taati; Robin Harrap; Michael Greenspan |
| 2012 | 26 | Stable And Robust Sampling Strategies For Compressive Imaging IF:4 Highlight: In this paper we turn to a more refined notion of coherence, the so-called local coherence, measuring for each sensing vector separately how correlated it is to the sparsity basis. | Felix Krahmer; Rachel Ward |
| 2012 | 27 | Kernel Principal Component Analysis And Its Applications In Face Recognition And Active Shape Models IF:4 Abstract: Principal component analysis (PCA) is a popular tool for linear dimensionality reduction and feature extraction. Kernel PCA is the nonlinear form of PCA, which better exploits the … | Quan Wang |
| 2012 | 28 | Image Processing Using Smooth Ordering Of Its Patches IF:4 Highlight: We propose an image processing scheme based on reordering of its patches. | Idan Ram; Michael Elad; Israel Cohen |
| 2012 | 29 | Graph Degree Linkage: Agglomerative Clustering On A Directed Graph IF:5 Highlight: This paper proposes a simple but effective graph-based agglomerative algorithm, for clustering high-dimensional data. | Wei Zhang; Xiaogang Wang; Deli Zhao; Xiaoou Tang |
| 2012 | 30 | Intra-Retinal Layer Segmentation Of 3D Optical Coherence Tomography Using Coarse Grained Diffusion Map IF:5 Highlight: In this paper, we introduce a fast segmentation method based on a new variant of spectral graph theory named diffusion maps. | Raheleh Kafieh; Hossein Rabbani; Michael D. Abramoff; Milan Sonka |
| 2011 | 1 | Moving Object Detection By Detecting Contiguous Outliers In The Low-Rank Representation IF:7 Highlight: In this paper, we show that above challenges can be addressed in a unified framework named DEtecting Contiguous Outliers in the LOw-rank Representation (DECOLOR). | Xiaowei Zhou; Can Yang; Weichuan Yu |
| 2011 | 2 | 3D Terrestrial Lidar Data Classification Of Complex Natural Scenes Using A Multi-scale Dimensionality Criterion: Applications In Geomorphology IF:7 Highlight: We present the technique and illustrate its efficiency in separating riparian vegetation from ground and classifying a mountain stream as vegetation, rock, gravel or water surface. | Nicolas Brodu; Dimitri Lague |
| 2011 | 3 | Local Naive Bayes Nearest Neighbor For Image Classification IF:5 Highlight: We present Local Naive Bayes Nearest Neighbor, an improvement to the NBNN image classification algorithm that increases classification accuracy and improves its ability to scale to large numbers of object classes. | Sancho McCann; David G. Lowe |
| 2011 | 4 | Compressive Imaging Using Approximate Message Passing And A Markov-Tree Prior IF:5 Highlight: We propose a novel algorithm for compressive imaging that exploits both the sparsity and persistence across scales found in the 2D wavelet transform coefficients of natural images. | Subhojit Som; Philip Schniter |
| 2011 | 5 | Introduction To The Bag Of Features Paradigm For Image Classification And Retrieval IF:5 Highlight: This paper presents an introduction to BoF image representations, describes critical design choices, and surveys the BoF literature. | Stephen O’Hara; Bruce A. Draper |
| 2011 | 6 | SHREC 2011: Robust Feature Detection And Description Benchmark IF:4 Highlight: The present paper is a report of the SHREC’11 robust feature detection and description benchmark results. | E. Boyer et al. |
| 2011 | 7 | Minutiae Extraction From Fingerprint Images – A Review IF:4 Highlight: This paper presents a review of a large number of techniques present in the literature for extracting fingerprint minutiae. | Roli Bansal; Priti Sehgal; Punam Bedi |
| 2011 | 8 | Continuous Multiclass Labeling Approaches And Algorithms IF:4 Highlight: The generic framework ensures existence of minimizers and covers a wide range of relaxations of the originally combinatorial problem. | Jan Lellmann; Christoph Schnörr |
| 2011 | 9 | A Supervised Clustering Approach For FMRI-based Inference Of Brain States IF:4 Highlight: We propose a method that combines signals from many brain regions observed in functional Magnetic Resonance Imaging (fMRI) to predict the subject’s behavior during a scanning session. | Vincent Michel et al. |
| 2011 | 10 | A Panorama On Multiscale Geometric Representations, Intertwining Spatial, Directional And Frequency Selectivity IF:4 Highlight: This paper presents a panorama of the aforementioned literature on decompositions in multiscale, multi-orientation bases or dictionaries. | Laurent Jacques; Laurent Duval; Caroline Chaux; Gabriel Peyré |
| 2011 | 11 | Prostate Biopsy Tracking With Deformation Estimation IF:4 Highlight: In this paper, a volume-swept 3D US based tracking system for fast and accurate estimation of prostate tissue motion is proposed. | Michael Baumann; Pierre Mozer; Vincent Daanen; Jocelyne Troccaz |
| 2011 | 12 | The IHS Transformations Based Image Fusion IF:4 Highlight: Therefore, the main purpose of this work is to explore different IHS transformation techniques and experiment it as IHS based image fusion. | Firouz Abdullah Al-Wassai; N. V. Kalyankar; Ali A. Al-Zuky |
| 2011 | 13 | A Review Of Research On Devnagari Character Recognition IF:4 Highlight: This article is intended to serve as a guide and update for the readers, working in the Devnagari Optical Character Recognition (DOCR) area. | V J Dongre; V H Mankar |
| 2011 | 14 | An Axis-Based Representation For Recognition IF:4 Highlight: This paper presents a new axis-based shape representation scheme along with a matching framework to address the problem of generic shape recognition. | Cagri Aslan; Sibel Tari |
| 2011 | 15 | Real Time Face Recognition Using Adaboost Improved Fast PCA Algorithm IF:4 Highlight: This paper presents an automated system for human face recognition in a real time background world for a large homemade dataset of persons face. | K. Susheel Kumar; Vijay Bhaskar Semwal; R C Tripathi |
| 2011 | 16 | On The Cohomology Of 3D Digital Images IF:3 Highlight: We propose a method for computing the cohomology ring of three-dimensional (3D) digital binary-valued pictures. | Rocio Gonzalez-Diaz; Pedro Real |
| 2011 | 17 | Statistical Compressed Sensing Of Gaussian Mixture Models IF:3 Highlight: A novel framework of compressed sensing, namely statistical compressed sensing (SCS), that aims at efficiently sampling a collection of signals that follow a statistical distribution, and achieving accurate reconstruction on average, is introduced. | Guoshen Yu; Guillermo Sapiro |
| 2011 | 18 | Disconnected Skeleton: Shape At Its Absolute Scale IF:3 Highlight: We present a new skeletal representation along with a matching framework to address the deformable shape recognition problem. | C. Aslan; A. Erdem; E. Erdem; S. Tari |
| 2011 | 19 | Positive Semidefinite Metric Learning Using Boosting-like Algorithms IF:3 Highlight: In this work, we propose a boosting-based technique, termed BoostMetric, for learning a quadratic Mahalanobis distance metric. | Chunhua Shen; Junae Kim; Lei Wang; Anton van den Hengel |
| 2011 | 20 | Design Of An Optical Character Recognition System For Camera-based Handheld Devices IF:3 Highlight: This paper presents a complete Optical Character Recognition (OCR) system for camera captured image/graphics embedded textual documents for handheld devices. | Ayatullah Faruk Mollah; Nabamita Majumder; Subhadip Basu; Mita Nasipuri |
| 2011 | 21 | Salient Local 3D Features For 3D Shape Retrieval IF:3 Highlight: In this paper we describe a new formulation for the 3D salient local features based on the voxel grid inspired by the Scale Invariant Feature Transform (SIFT). | Afzal Godil; Asim Imdad Wagan |
| 2011 | 22 | A Multiple Component Matching Framework For Person Re-Identification IF:4 Highlight: Building on these similarities, we propose a Multiple Component Matching (MCM) framework for the person re-identification problem, which is inspired by Multiple Component Learning, a framework recently proposed for object detection. | Riccardo Satta; Giorgio Fumera; Fabio Roli; Marco Cristani; Vittorio Murino |
| 2011 | 23 | Fingerprint Recognition Using Standardized Fingerprint Model IF:4 Highlight: This paper discusses on the standardized fingerprint model which is used to synthesize the template of fingerprints. | Le Hoang Thai; Ha Nhat Tam |
| 2011 | 24 | A Linear Framework For Region-based Image Segmentation And Inpainting Involving Curvature Penalization IF:3 Highlight: We present the first method to handle curvature regularity in region-based image segmentation and inpainting that is independent of initialization. | Thomas Schoenemann; Fredrik Kahl; Simon Masnou; Daniel Cremers |
| 2011 | 25 | A Multi-feature Tracking Algorithm Enabling Adaptation To Context Variations IF:3 Highlight: We propose in this paper a tracking algorithm which is able to adapt itself to different scene contexts. | Duc Phu Chau; François Bremond; Monique Thonnat |
| 2011 | 26 | Convex Approaches To Model Wavelet Sparsity Patterns IF:3 Highlight: In this paper, we propose new modeling approaches based on group-sparsity penalties that leads to convex optimizations that can be solved exactly and efficiently. | Nikhil S Rao; Robert D. Nowak; Stephen J. Wright; Nick G. Kingsbury |
| 2011 | 27 | Anti-sparse Coding For Approximate Nearest Neighbor Search IF:4 Abstract: This paper proposes a binarization scheme for vectors of high dimension based on the recent concept of anti-sparse coding, and shows its excellent performance for approximate … | Hervé Jégou; Teddy Furon; Jean-Jacques Fuchs |
| 2011 | 28 | Leveraging Billions Of Faces To Overcome Performance Barriers In Unconstrained Face Recognition IF:4 Highlight: We employ the face recognition technology developed in house at face.com to a well accepted benchmark and show that without any tuning we are able to considerably surpass state of the art results. | Yaniv Taigman; Lior Wolf |
| 2011 | 29 | A Comparative Experiment Of Several Shape Methods In Recognizing Plants IF:3 Highlight: In this research, a comparative experiment of 4 methods to identify plants using shape features was accomplished. | A. Kadir; L. E. Nugroho; A. Susanto; P. I. Santosa |
| 2011 | 30 | Steps Towards A Theory Of Visual Information: Active Perception, Signal-to-Symbol Conversion And The Interplay Between Sensing And Control IF:3 Highlight: This manuscript describes the elements of a theory of information tailored to control and decision tasks and specifically to visual data. | Stefano Soatto |
| 2010 | 1 | Image Deblurring And Super-resolution By Adaptive Sparse Domain Selection And Adaptive Regularization IF:8 Highlight: Considering that the contents can vary significantly across different images or different patches in a single image, we propose to learn various sets of bases from a pre-collected dataset of example image patches, and then for a given patch to be processed, one set of bases are adaptively selected to characterize the local sparse domain. | Weisheng Dong; Lei Zhang; Guangming Shi; Xiaolin Wu |
| 2010 | 2 | Survey Of Nearest Neighbor Techniques IF:7 Highlight: In this paper, we present the survey of such techniques. | Nitin Bhatia |
| 2010 | 3 | Solving Inverse Problems With Piecewise Linear Estimators: From Gaussian Mixture Models To Structured Sparsity IF:7 Highlight: A general framework for solving image inverse problems is introduced in this paper. | Guoshen Yu; Guillermo Sapiro; Stéphane Mallat |
| 2010 | 4 | Lesion Border Detection In Dermoscopy Images IF:6 Highlight: In this article, we present a systematic overview of the recent border detection methods in the literature paying particular attention to computational issues and evaluation aspects. | M. Emre Celebi; Hitoshi Iyatomi; Gerald Schaefer; William V. Stoecker |
| 2010 | 5 | A Comprehensive Review Of Image Enhancement Techniques IF:6 Highlight: The paper focuses on spatial domain techniques for image enhancement, with particular reference to point processing methods and histogram processing. | Raman Maini; Himanshu Aggarwal |
| 2010 | 6 | TILT: Transform Invariant Low-rank Textures IF:5 Highlight: In this paper, we show how to efficiently and effectively extract a class of low-rank textures in a 3D scene from 2D images despite significant corruptions and warping. | Zhengdong Zhang; Arvind Ganesh; Xiao Liang; Yi Ma |
| 2010 | 7 | Image Segmentation By Using Threshold Techniques IF:6 Highlight: This paper attempts to undertake the study of segmentation image techniques by using five threshold methods as Mean method, P-tile method, Histogram Dependent Technique (HDT), Edge Maximization Technique (EMT) and visual Technique and they are compared with one another so as to choose the best technique for threshold segmentation techniques image. | Salem Saleh Al-amri; N. V. Kalyankar; S. D. Khamitkar |
| 2010 | 8 | Fast L1-Minimization Algorithms For Robust Face Recognition IF:5 Highlight: In this paper, our study addresses the speed and scalability of its algorithms. | Allen Y. Yang; Zihan Zhou; Arvind Ganesh; S. Shankar Sastry; Yi Ma |
| 2010 | 9 | Fast Inference In Sparse Coding Algorithms With Applications To Object Recognition IF:5 Highlight: In this work we propose a simple and efficient algorithm to learn basis functions. | Koray Kavukcuoglu; Marc’Aurelio Ranzato; Yann LeCun |
| 2010 | 10 | Segmentation Of Natural Images By Texture And Boundary Compression IF:4 Highlight: We present a novel algorithm for segmentation of natural images that harnesses the principle of minimum description length (MDL). | Hossein Mobahi; Shankar R. Rao; Allen Y. Yang; Shankar S. Sastry; Yi Ma |
| 2010 | 11 | Automatic Image Segmentation By Dynamic Region Merging IF:5 Highlight: This paper addresses the automatic image segmentation problem in a region merging style. | Bo Peng; Lei Zhang; David Zhang |
| 2010 | 12 | Hybrid Linear Modeling Via Local Best-fit Flats IF:5 Highlight: We present a simple and fast geometric method for modeling data by a union of affine subspaces. | Teng Zhang; Arthur Szlam; Yi Wang; Gilad Lerman |
| 2010 | 13 | Feature Level Fusion Of Face And Fingerprint Biometrics IF:4 Highlight: The aim of this paper is to study the fusion at feature extraction level for face and fingerprint biometrics. | Ajita Rattani; Dakshina Ranjan Kisku; Manuele Bicego; Massimo Tistarelli |
| 2010 | 14 | Automatic Detection Of Blue-White Veil And Related Structures In Dermoscopy Images IF:4 Highlight: In this article, we present a machine learning approach to the detection of blue-white veil and related structures in dermoscopy images. | M. Emre Celebi et al. |
| 2010 | 15 | Classification With Scattering Operators IF:4 Abstract: A scattering vector is a local descriptor including multiscale and multi-direction co-occurrence information. It is computed with a cascade of wavelet decompositions and complex … | Joan Bruna; Stéphane Mallat |
| 2010 | 16 | Combining Multiple Feature Extraction Techniques For Handwritten Devnagari Character Recognition IF:4 Highlight: In this paper we present an OCR for Handwritten Devnagari Characters. | Sandhya Arora; Debotosh Bhattacharjee; Mita Nasipuri; Dipak Kumar Basu; Mahantapas Kundu |
| 2010 | 17 | Performance Comparison Of SVM And ANN For Handwritten Devnagari Character Recognition IF:4 Highlight: In this paper, we discuss the characteristics of the some classification methods that have been successfully applied to handwritten Devnagari character recognition and results of SVM and ANNs classification method, applied on Handwritten Devnagari characters. | Sandhya Arora et al. |
| 2010 | 18 | A Comparative Study Of Removal Noise From Remote Sensing Image IF:4 Highlight: This paper attempts to undertake the study of three types of noise such as Salt and Pepper (SPN), Random variation Impulse Noise (RVIN), Speckle (SPKN). | Salem Saleh Al-amri; N. V. Kalyankar; S. D. Khamitkar |
| 2010 | 19 | The Projected GSURE For Automatic Parameter Tuning In Iterative Shrinkage Methods IF:4 Highlight: In this work we focus on optimally selecting such parameters in iterative shrinkage methods for image deblurring and image zooming. | Raja Giryes; Michael Elad; Yonina C Eldar |
| 2010 | 20 | Face Identification By SIFT-based Complete Graph Topology IF:4 Highlight: This paper presents a new face identification system based on Graph Matching Technique on SIFT features extracted from face images. | Dakshina Ranjan Kisku; Ajita Rattani; Enrico Grosso; Massimo Tistarelli |
| 2010 | 21 | Nonlinear Vector Filtering For Impulsive Noise Removal From Color Images IF:4 Highlight: In this paper, a comprehensive survey of 48 filters for impulsive noise removal from color images is presented. | M. Emre Celebi; Hassan A. Kingravi; Y. Alp Aslandogan |
| 2010 | 22 | Hybrid Medical Image Classification Using Association Rule Mining With Decision Tree Algorithm IF:4 Highlight: The two image mining approaches with a hybrid manner have been proposed in this paper. | P. Rajendran; M. Madheswaran |
| 2010 | 23 | Generalized Tree-Based Wavelet Transform IF:3 Highlight: In this paper we propose a new wavelet transform applicable to functions defined on graphs, high dimensional data and networks. | Idan Ram; Michael Elad; Israel Cohen |
| 2010 | 24 | Handwritten Bangla Basic And Compound Character Recognition Using MLP And SVM Classifier IF:3 Highlight: A novel approach for recognition of handwritten compound Bangla characters, along with the Basic characters of Bangla alphabet, is presented here. | Nibaran Das et al. |
| 2010 | 25 | An Explicit Nonlinear Mapping For Manifold Learning IF:3 Highlight: In this paper, an explicit nonlinear mapping is proposed for manifold learning, based on the assumption that there exists a polynomial mapping between the high-dimensional data samples and their low-dimensional representations. | Hong Qiao; Peng Zhang; Di Wang; Bo Zhang |
| 2010 | 26 | Real-Time Implementation Of Order-Statistics Based Directional Filters IF:3 Highlight: In this paper, we introduce two methods to speed up these filters. | M. Emre Celebi |
| 2010 | 27 | A Family Of Statistical Symmetric Divergences Based On Jensen’s Inequality IF:3 Highlight: We introduce a novel parametric family of symmetric information-theoretic distances based on Jensen’s inequality for a convex functional generator. | Frank Nielsen |
| 2010 | 28 | Active Testing For Face Detection And Localization IF:3 Highlight: We provide a novel search technique, which uses a hierarchical model and a mutual information gain heuristic to efficiently prune the search space when localizing faces in images. | Raphael Sznitman; Bruno Jedynak |
| 2010 | 29 | Real-time Robust Principal Components’ Pursuit IF:4 Highlight: In the recent work of Candes et al, the problem of recovering low rank matrix corrupted by i.i.d. sparse outliers is studied and a very elegant solution, principal component pursuit, is proposed. | Chenlu Qiu; Namrata Vaswani |
| 2010 | 30 | Scalable Large-Margin Mahalanobis Distance Metric Learning IF:4 Highlight: In this work, we propose a fast and scalable algorithm to learn a Mahalanobis distance metric. | Chunhua Shen; Junae Kim; Lei Wang |