PAPER DIGEST
Most Influential SIGMOD 2004 Paper · 2026-03 edition

Graph Indexing: A Frequent Structure-based Approach

Xifeng Yan; Philip S. Yu; Jiawei Han

Venue: ACM SIGMOD Conference (SIGMOD) 2004
Recognition: Most Influential SIGMOD 2004 Paper (Rank No. 2)
Edition: 2026-03
Impact factor: 7
Certificate ID: a671d9e10ba41163

Abstract

Graphs have become increasingly important in modelling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a <i>graph query</i>, it is desirable to retrieve graphs quickly from a large database via <i>graph-based indices.</i> In this paper, we investigate the issues of indexing graphs and propose a novel solution by applying a graph mining technique. Different from the existing <i>path-based methods</i>, our approach, called <i>gIndex</i>, makes use of <i>frequent substructures</i> as the basic indexing feature. Frequent substructures are ideal candidates since they explore the intrinsic characteristics of the data and are relatively stable to database updates. To reduce the size of the index structure, two techniques, <i>size-increasing support constraint</i> and <i>discriminative fragments</i>, are introduced. Our performance study shows that gIndex has a 10 times smaller index size, but achieves 3-10 times better performance in comparison with a typical path-based method, <i>GraphGrep.</i> The gIndex approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. Furthermore, the concepts developed here can be applied to indexing sequences, trees, and other complicated structures as well.
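
The abstract names the two index-reduction techniques but does not define them, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that (a) the support threshold grows with fragment size, here through a hypothetical linear function `psi`; (b) a fragment counts as discriminative when it prunes noticeably more graphs than the intersection of its already indexed subfragments, controlled by a hypothetical ratio `gamma_min`; and (c) a query is answered by intersecting the graph-id lists of the indexed fragments it contains, after which the surviving candidates would still need an exact subgraph check. Fragments are represented by made-up string labels so that no subgraph isomorphism routine is needed here.

```python
from typing import Dict, Iterable, List, Set


def psi(size: int, max_size: int = 10, max_support: float = 0.1) -> float:
    """Hypothetical size-increasing support threshold (fraction of the database).

    Small fragments get a low threshold, large fragments a higher one.
    The linear shape and the default values are assumptions for illustration.
    """
    return max_support * min(size, max_size) / max_size


def is_discriminative(frag_ids: Set[int],
                      subfragment_ids: List[Set[int]],
                      all_ids: Set[int],
                      gamma_min: float) -> bool:
    """Assumed test: keep a fragment only if it filters much better than
    the indexed subfragments it contains already do together."""
    if not frag_ids:
        return False
    covered = set(all_ids)
    for ids in subfragment_ids:
        covered &= ids  # graphs that contain every indexed subfragment
    return len(covered) / len(frag_ids) >= gamma_min


def candidates(index: Dict[str, Set[int]],
               query_fragments: Iterable[str],
               all_ids: Set[int]) -> Set[int]:
    """Filtering step: intersect the id lists of indexed fragments found in
    the query. Fragments missing from the index are simply skipped."""
    result = set(all_ids)
    for frag in query_fragments:
        if frag in index:
            result &= index[frag]
    return result


# Toy index: fragment label -> ids of database graphs containing it.
toy_index = {"C-C": {1, 2, 3}, "C-O": {2, 3}, "C-N": {3, 4}}
toy_candidates = candidates(toy_index, ["C-C", "C-O", "X-Y"], {1, 2, 3, 4})
```

In this toy setup only the graphs left in `toy_candidates` would be passed to a verification step, which is where a real system spends most of its time; a smaller candidate set is what the abstract's performance comparison is about.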
