PAPER DIGEST
Most Influential WWW 2015 Paper · 2026-03 edition

Path Sampling: A Fast And Provable Method For Estimating 4-Vertex Subgraph Counts

Madhav Jha; C. Seshadhri; Ali Pinar

Venue: ACM Web Conference (WWW) 2015
Recognition: Most Influential WWW 2015 Paper (Rank No. 15)
Edition: 2026-03
Impact factor: 4
Certificate ID: 006cdb54f0987196

Abstract

Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex patterns is highly challenging, and there are few practical results known that can scale to massive sizes. Indeed, even a highly tuned enumeration code takes more than a day on a graph with millions of edges. Most previous work that runs on truly massive graphs employs clusters and massive parallelization. We provide a sampling algorithm that provably and accurately approximates the frequencies of all 4-vertex pattern subgraphs. Our algorithm is based on a novel technique of 3-path sampling and a special pruning scheme to decrease the variance in estimates. We provide theoretical proofs for the accuracy of our algorithm, and give formal bounds for the error and confidence of our estimates. We perform a detailed empirical study and show that our algorithm provides estimates within 1% relative error for all subpatterns (over a large class of test graphs), while being orders of magnitude faster than enumeration and other sampling-based algorithms. Our algorithm takes less than a minute (on a single commodity machine) to process an Orkut social network with 300 million edges.
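
To make the idea of 3-path sampling concrete, here is a minimal Python sketch of one way such an estimator can be set up. It is an illustration under stated assumptions, not the authors' code: it omits the pruning scheme the abstract mentions, it does not cover the 3-star (which contains no spanning 3-path and would need separate handling), and the names `estimate_4vertex_counts` and `PATHS_IN_PATTERN` are hypothetical. The sketch assumes an undirected simple graph given as a dict of neighbour sets with comparable vertex labels. It picks a centre edge (u, v) with probability proportional to (deg(u) - 1)(deg(v) - 1), extends it by one random neighbour on each side, and classifies the subgraph induced on the four sampled vertices. Each pattern's hit rate is then rescaled by the total weight and by the number of spanning 3-paths that pattern contains.

```python
import random

# Sorted degree sequence of each connected 4-vertex pattern that contains a
# spanning 3-path, mapped to (pattern name, number of spanning 3-paths in it).
PATHS_IN_PATTERN = {
    (1, 1, 2, 2): ("3-path", 1),
    (1, 2, 2, 3): ("tailed-triangle", 2),
    (2, 2, 2, 2): ("4-cycle", 4),
    (2, 2, 3, 3): ("chordal-4-cycle", 6),
    (3, 3, 3, 3): ("4-clique", 12),
}


def estimate_4vertex_counts(adj, k, rng=random):
    """Estimate induced 4-vertex pattern counts from k sampled 3-paths.

    adj: dict mapping each vertex to the set of its neighbours
         (undirected, no self-loops; labels must be comparable).
    """
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    # Weight of edge (u, v): number of ways to extend it on both sides.
    weights = [(len(adj[u]) - 1) * (len(adj[v]) - 1) for u, v in edges]
    total = sum(weights)
    counts = {name: 0 for name, _ in PATHS_IN_PATTERN.values()}
    if total == 0 or k <= 0:
        return {name: 0.0 for name in counts}

    for u, v in rng.choices(edges, weights=weights, k=k):
        a = rng.choice([x for x in adj[u] if x != v])
        b = rng.choice([x for x in adj[v] if x != u])
        if a == b:
            continue  # the sample closed into a triangle, not a 3-path
        nodes = (a, u, v, b)
        # Degrees inside the subgraph induced on the four sampled vertices.
        induced = sorted(sum(1 for y in nodes if y in adj[x]) for x in nodes)
        name, _ = PATHS_IN_PATTERN[tuple(induced)]
        counts[name] += 1

    # Each 3-path is drawn with probability 1 / total, and a pattern with
    # p spanning 3-paths is hit through any of them, hence the division by p.
    return {name: counts[name] / k * total / paths
            for name, paths in PATHS_IN_PATTERN.values()}
```

As a sanity check, a graph that is a single 4-cycle, `{0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}`, has total weight 4 and every sample induces the cycle itself, so the 4-cycle estimate is exactly 1.0 for any `k`. On larger graphs the estimates are random and tighten as `k` grows.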
