PAPER DIGEST
Most Influential SIGMOD 2002 Paper · 2026-03 edition

Processing Complex Aggregate Queries Over Data Streams

Alin Dobra; Minos Garofalakis; Johannes Gehrke; Rajeev Rastogi

Venue
ACM SIGMOD Conference (SIGMOD) 2002
Recognition
Most Influential SIGMOD 2002 Paper (Rank No. 9)
Edition
2026-03
Impact factor
6
Certificate ID
c388a40cf46c5301

Abstract

Recent years have witnessed an increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only limited memory. Providing (perhaps approximate) answers to queries over such continuous data streams is a crucial requirement for many application environments; examples include large telecom and IP network installations where performance data from different parts of the network needs to be continuously collected and analyzed. In this paper, we consider the problem of approximately answering general aggregate SQL queries over continuous data streams with limited memory. Our method relies on randomizing techniques that compute small "sketch" summaries of the streams that can then be used to provide approximate answers to aggregate queries with provable guarantees on the approximation error. We also demonstrate how existing statistical information on the base data (e.g., histograms) can be used in the proposed framework to improve the quality of the approximation provided by our algorithms. The key idea is to intelligently partition the domain of the underlying attribute(s) and, thus, decompose the sketching problem in a way that provably tightens our guarantees. Results of our experimental study with real-life as well as synthetic data streams indicate that sketches provide significantly more accurate answers compared to histograms for aggregate queries. This is especially true when our domain partitioning methods are employed to further boost the accuracy of the final estimates.
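
Illustrative sketch (not taken from the paper)

The abstract does not spell out how a "sketch" summary is built, so the following is only a minimal illustration of the general idea, written in the style of the well-known "tug-of-war" randomized sketch and applied to the simplest case: estimating the COUNT of an equi-join between two streams. Every name here (SignHash, Sketch, estimate_join_size, group_size) is hypothetical, and the cubic-polynomial sign hash is an assumed stand-in for a four-wise independent random sign family. It should not be read as the authors' algorithm, which covers more general aggregate queries and adds domain partitioning.

```python
import random
from statistics import median

PRIME = (1 << 61) - 1  # Mersenne prime used as the hash field (an assumption)


class SignHash:
    """Hypothetical +1/-1 hash: random cubic polynomial mod PRIME, low bit as sign."""

    def __init__(self, rng):
        self.coeffs = [rng.randrange(1, PRIME) for _ in range(4)]

    def __call__(self, value):
        acc = 0
        for c in self.coeffs:  # Horner evaluation of the polynomial
            acc = (acc * value + c) % PRIME
        return 1 if acc & 1 else -1


class Sketch:
    """One counter per sign hash; each counter is sum(frequency * sign(value))."""

    def __init__(self, hashes):
        self.hashes = hashes
        self.counters = [0] * len(hashes)

    def update(self, value, weight=1):
        # weight=+1 models an insertion, weight=-1 a deletion of one tuple.
        for i, h in enumerate(self.hashes):
            self.counters[i] += weight * h(value)


def estimate_join_size(a, b, group_size):
    """Median of group means of counter products (both sketches share hashes)."""
    products = [x * y for x, y in zip(a.counters, b.counters)]
    means = [
        sum(products[i:i + group_size]) / group_size
        for i in range(0, len(products), group_size)
    ]
    return median(means)


if __name__ == "__main__":
    rng = random.Random(7)
    hashes = [SignHash(rng) for _ in range(12)]  # 3 groups of 4 counters
    r, s = Sketch(hashes), Sketch(hashes)
    for _ in range(5):
        r.update(42)  # stream R: join-attribute value 42 seen 5 times
    for _ in range(3):
        s.update(42)  # stream S: join-attribute value 42 seen 3 times
    print(estimate_join_size(r, s, group_size=4))  # exact here: 5 * 3 matches
```

Both sketches must be built from the same hash functions, and memory use depends only on the number of counters, not on the stream length. With a single distinct value the estimate is exact; with many distinct values it is a random estimate whose spread shrinks as counters are added. Splitting the attribute domain and sketching each part separately, as the abstract describes, is one way to reduce that spread further.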
