Fast Approximation Of Matrix Coherence And Statistical Leverage
Abstract
The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities have been of interest in recently-popular problems such as matrix completion and Nystrom-based low-rank matrix approximation; in large-scale statistical data analysis applications more generally; and since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n � d matrix A, with n >> d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs in O(nd log n) time, as opposed to the O(nd2) time required by the naive algorithm that involves computing an orthogonal basis for the range of A. This resolves an open question from (Drineas et al., 2006b) and (Mohri & Talwalkar, 2011); and our result leads to immediate improvements in coreset-based L2-regression, the estimation of the coherence of a matrix, and several related low-rank matrix problems. Interestingly, to achieve our result we judiciously apply random projections on both sides of A.