Query Preserving Graph Compression
Abstract
It is common to find graphs with millions of nodes and billions of edges in, <i>e.g.</i>, social networks. Queries on such graphs are often prohibitively expensive. This motivates us to propose <i>query preserving graph compression</i>, which compresses graphs <i>relative to</i> a class Λ of queries of the users' choice. We compute a small graph <i>G</i><sub><i>r</i></sub> from a graph <i>G</i> such that (a) for <i>any</i> query <i>Q</i> ∈ Λ, <i>Q</i>(<i>G</i>) = <i>Q'</i>(<i>G</i><sub><i>r</i></sub>), where <i>Q'</i> ∈ Λ can be efficiently computed from <i>Q</i>; and (b) any algorithm for computing <i>Q</i>(<i>G</i>) can be <i>directly</i> applied to evaluating <i>Q'</i> on <i>G</i><sub><i>r</i></sub> <i>as is</i>. That is, while we cannot lower the complexity of evaluating graph queries, we reduce data graphs while preserving the answers to <i>all</i> the queries in Λ. To verify the effectiveness of this approach, (1) we develop compression strategies for two classes of queries: reachability queries and graph pattern queries via (bounded) simulation. We show that graphs can be efficiently compressed via a reachability equivalence relation and graph bisimulation, respectively, while preserving query answers. (2) We provide techniques for maintaining the compressed graph <i>G</i><sub><i>r</i></sub> in response to changes Δ<i>G</i> to the original graph <i>G</i>. We show that the incremental maintenance problems are <i>unbounded</i> for the two classes of queries, <i>i.e.</i>, their costs are not a function of the size of Δ<i>G</i> and of the changes in <i>G</i><sub><i>r</i></sub>. Nevertheless, we develop incremental algorithms that depend only on Δ<i>G</i> and <i>G</i><sub><i>r</i></sub>, <i>independent of</i> <i>G</i>, <i>i.e.</i>, we do not have to decompress <i>G</i><sub><i>r</i></sub> to propagate the changes.
(3) Using real-life data, we experimentally verify that our compression techniques reduce graphs by 95% on average for reachability and by 57% for graph pattern matching, and that our incremental maintenance algorithms are efficient.
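
To make the idea of compression via a reachability equivalence relation concrete, the following is a minimal sketch, not the algorithm of this paper. It assumes, as an illustration only, that two nodes are equivalent when they have the same set of ancestors and the same set of descendants (both taken over non-empty paths), and it builds the quotient graph with one node per equivalence class. The names reachable, compress_for_reachability and reaches are hypothetical, and the naive per-node traversal is chosen for clarity rather than efficiency.

```python
from collections import defaultdict


def reachable(adj, start):
    """Return the nodes reachable from start via a non-empty path."""
    seen = set()
    stack = list(adj.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, ()))
    return frozenset(seen)


def compress_for_reachability(nodes, edges):
    """Merge nodes with identical ancestor and descendant sets.

    Assumption (illustrative): this is the equivalence relation used.
    Returns (node_to_class, compressed_edges). Self-loops are kept so
    that a class lying on a cycle can still reach itself.
    """
    fwd, bwd = defaultdict(set), defaultdict(set)
    for u, v in edges:
        fwd[u].add(v)
        bwd[v].add(u)

    class_ids = {}      # (ancestors, descendants) -> class id
    node_to_class = {}  # original node -> class id
    for n in nodes:
        key = (reachable(bwd, n), reachable(fwd, n))
        node_to_class[n] = class_ids.setdefault(key, len(class_ids))

    compressed_edges = {(node_to_class[u], node_to_class[v]) for u, v in edges}
    return node_to_class, compressed_edges


def reaches(edges, src, dst):
    """Reachability query: is there a non-empty path from src to dst?"""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    return dst in reachable(adj, src)


if __name__ == "__main__":
    nodes = ["r", "a", "b", "c", "d"]
    edges = [("r", "a"), ("r", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "c")]
    cls, small = compress_for_reachability(nodes, edges)
    # The query (u, v) on G is rewritten to (cls[u], cls[v]) on the small graph.
    print(reaches(edges, "a", "d"), reaches(small, cls["a"], cls["d"]))
```

In this sketch the rewritten query <i>Q'</i> simply replaces each node by its class, and the same reaches function is run unchanged on the smaller graph, mirroring conditions (a) and (b) above.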