Cover Trees For Nearest Neighbor
Abstract
We present a tree data structure for fast nearest neighbor operations in general <i>n</i>-point metric spaces, where <i>n</i> is the number of points in the data set. The data structure requires <i>O</i>(<i>n</i>) space <i>regardless</i> of the metric's structure, yet maintains all performance properties of a navigating net (Krauthgamer & Lee, 2004b). If the point set has a bounded expansion constant <i>c</i>, a measure of intrinsic dimensionality defined by Karger & Ruhl (2002), the cover tree data structure can be constructed in <i>O</i>(<i>c</i><sup>6</sup><i>n</i> log <i>n</i>) time. Furthermore, nearest neighbor queries require time only logarithmic in <i>n</i>, specifically <i>O</i>(<i>c</i><sup>12</sup> log <i>n</i>) time. Our experimental results show speedups over brute-force search ranging from one to several orders of magnitude on natural machine learning datasets.
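To make the expansion constant concrete, the following is a minimal brute-force sketch, not part of the cover tree algorithm itself. It assumes the standard definition from Karger & Ruhl (2002): <i>c</i> is the smallest value, at least 2, such that every closed ball satisfies |B(p, 2r)| ≤ <i>c</i> |B(p, r)| for all points p in the set and all r > 0. The function name `expansion_constant` and its parameters are hypothetical, chosen only for illustration.

```python
import math
from bisect import bisect_right


def expansion_constant(points, dist=math.dist):
    """Brute-force estimate of the expansion constant of a finite point set.

    Assumes closed balls B(p, r) = {q : dist(p, q) <= r}. Illustrative only:
    it runs in O(n^2 log n) time, far slower than any cover tree operation.
    """
    c = 2.0  # the definition assumed here takes c to be at least 2
    for p in points:
        # Sorted distances from p to every point (including p itself, at 0).
        dists = sorted(dist(p, q) for q in points)
        for d in dists:
            if d == 0:
                continue
            # The ratio |B(p, 2r)| / |B(p, r)| can only increase when 2r
            # reaches some distance d, so checking r = d / 2 is sufficient.
            r = d / 2
            small = bisect_right(dists, r)      # |B(p, r)|, always >= 1
            big = bisect_right(dists, 2 * r)    # |B(p, 2r)|
            c = max(c, big / small)
    return c
```

For example, on the four evenly spaced points `[(0,), (1,), (2,), (3,)]` this returns 3.0: around the point at 1, the ball of radius 0.5 contains one point while the ball of radius 1 contains three. Small values of <i>c</i> indicate low intrinsic dimensionality, which is the regime where the stated <i>O</i>(<i>c</i><sup>12</sup> log <i>n</i>) query bound is most useful.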