Depth Estimation Using Monocular and Stereo Cues

Ashutosh Saxena; Jamie Schulte; Andrew Y. Ng

Venue: International Joint Conference on Artificial Intelligence (IJCAI) 2007
Recognition: Most Influential IJCAI 2007 Paper (Rank No. 14)
Edition: 2026-03
Impact factor: 6
Certificate ID: af1b1c61d3e25005

Abstract

Depth estimation in computer vision and robotics is most commonly done via stereo vision (stereopsis), in which images from two cameras are used to triangulate and estimate distances. However, there are also numerous monocular visual cues, such as texture variations and gradients, defocus, and color/haze, that have so far been little exploited in such systems. Some of these cues apply even in regions without texture, where stereo would work poorly. In this paper, we apply a Markov Random Field (MRF) learning algorithm to capture some of these monocular cues, and incorporate them into a stereo system. We show that by adding monocular cues to stereo (triangulation) cues, we obtain significantly more accurate depth estimates than is possible using either monocular or stereo cues alone. This holds true for a large variety of environments, including both indoor environments and unstructured outdoor environments containing trees/forests, buildings, etc. Our approach is general, and applies to incorporating monocular cues together with any off-the-shelf stereo system.
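
As a rough illustration of the two kinds of cue described in the abstract, the sketch below triangulates depth from stereo disparity and then blends it with a separate monocular depth estimate. It is not the paper's MRF model: the inverse-variance weighting is a simple stand-in chosen for illustration, and the function names, the variance parameters, and the sample values are all assumptions rather than anything taken from the paper.

```python
import numpy as np

def stereo_depth(disparity_px, focal_px, baseline_m):
    """Triangulated depth Z = f * B / d (standard rectified-stereo relation).

    Pixels with non-positive disparity (e.g. textureless regions where
    matching failed) are returned as NaN.
    """
    d = np.asarray(disparity_px, dtype=float)
    depth = np.full(d.shape, np.nan)
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

def fuse_depths(stereo_m, mono_m, stereo_var, mono_var):
    """Inverse-variance weighted blend of stereo and monocular depths.

    Hypothetical stand-in for a learned model: where stereo is missing
    (NaN), the monocular estimate is used on its own.
    """
    stereo_m = np.asarray(stereo_m, dtype=float)
    mono_m = np.asarray(mono_m, dtype=float)
    w_s = 1.0 / stereo_var
    w_m = 1.0 / mono_var
    fused = (w_s * stereo_m + w_m * mono_m) / (w_s + w_m)
    return np.where(np.isnan(stereo_m), mono_m, fused)

# Illustrative values only: the middle pixel has no valid disparity.
z_stereo = stereo_depth([8.0, 0.0, 4.0], focal_px=800.0, baseline_m=0.1)
z_mono = np.array([12.0, 15.0, 20.0])
z_fused = fuse_depths(z_stereo, z_mono, stereo_var=1.0, mono_var=4.0)
```

In this toy setup the fused estimate stays close to stereo where a disparity is available and falls back to the monocular value where it is not, which mirrors the abstract's point that monocular cues help in regions where stereo works poorly.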
