Full Regularization Path For Sparse Principal Component Analysis
Abstract
Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a particular linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. We formulate a new semidefinite relaxation of this problem and derive a greedy algorithm that computes a <i>full set</i> of good solutions for all numbers of nonzero coefficients, with complexity <i>O</i>(<i>n</i><sup>3</sup>), where <i>n</i> is the number of variables. We then use the same relaxation to derive sufficient conditions for global optimality of a solution, which can be tested in <i>O</i>(<i>n</i><sup>3</sup>). We show on toy examples and biological data that our algorithm provides globally optimal solutions in many cases.
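To make the idea of a full path of solutions concrete, the following is a minimal sketch of a naive forward greedy search over supports, assuming only NumPy. It is not the algorithm derived from the semidefinite relaxation: the function name `greedy_sparse_pca_path` is hypothetical, and each step recomputes a full eigenvalue decomposition for every candidate variable, so the cost is well above the <i>O</i>(<i>n</i><sup>3</sup>) stated in the abstract. It only illustrates what "one solution per number of nonzero coefficients" looks like.

```python
import numpy as np


def greedy_sparse_pca_path(sigma):
    """Naive forward greedy search (illustrative assumption, not the paper's method).

    Returns a list of (support, explained_variance) pairs, one for each
    cardinality k = 1, ..., n, where explained_variance is the largest
    eigenvalue of the covariance submatrix restricted to the support.
    """
    n = sigma.shape[0]
    # Cardinality 1: the best single variable is the one with the largest variance.
    first = int(np.argmax(np.diag(sigma)))
    support = [first]
    path = [(list(support), float(sigma[first, first]))]
    remaining = set(range(n)) - {first}

    while remaining:
        best_var, best_idx = -np.inf, None
        for j in sorted(remaining):
            idx = support + [j]
            # Variance explained by the best unit vector supported on idx.
            lam = np.linalg.eigvalsh(sigma[np.ix_(idx, idx)])[-1]
            if lam > best_var:
                best_var, best_idx = lam, j
        support.append(best_idx)
        remaining.remove(best_idx)
        path.append((list(support), float(best_var)))
    return path
```

Because each support contains the previous one, the explained variance along the path is nondecreasing, and the last entry recovers the leading eigenvalue of the full covariance matrix. The sketch does not check whether any of these supports is globally optimal.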