PAPER DIGEST
Most Influential SIGMOD 2012 Paper · 2026-03 edition

Can We Beat The Prefix Filtering?: An Adaptive Framework For Similarity Join And Search

Jiannan Wang; Guoliang Li; Jianhua Feng

Venue: ACM SIGMOD Conference (SIGMOD) 2012
Recognition: Most Influential SIGMOD 2012 Paper (Rank No. 13)
Edition: 2026-03
Impact factor: 5
Certificate ID: 27ddf097ad6ad7c0

Abstract

As two important operations in data cleaning, similarity join and similarity search have attracted much attention recently. Existing methods for similarity join usually adopt a prefix-filtering-based framework: they select a prefix of each object and prune object pairs whose prefixes do not overlap. We observe that prefix length has a significant effect on performance, and that prefix filtering does not always achieve high performance. To address this problem, we propose an adaptive framework for similarity join. We propose a cost model to judiciously select an appropriate prefix for each object, and we devise effective indexes to select prefixes efficiently. We also extend our method to support similarity search. Experimental results show that our framework beats the prefix-filtering-based framework and achieves high efficiency.
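
To make the baseline in the abstract concrete, the following is a minimal sketch of fixed-length prefix filtering for a self-join, not the paper's adaptive method. It assumes objects are token sets, Jaccard similarity as the measure, a global token order of rarest first, and the commonly used prefix length |x| - ceil(t * |x|) + 1 for threshold t. None of these choices are stated in the abstract, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0


def prefix_length(size, threshold):
    """Assumed standard Jaccard prefix length: size - ceil(t * size) + 1.

    The small epsilon guards against floating-point products such as
    0.6 * 5 landing just above the exact integer.
    """
    return size - math.ceil(threshold * size - 1e-9) + 1


def prefix_filter_join(records, threshold):
    """Self-join: return (sorted result pairs, number of candidate pairs)."""
    # Global token order (an assumption): rarest tokens first, ties by token.
    freq = defaultdict(int)
    for rec in records:
        for tok in set(rec):
            freq[tok] += 1
    ordered = [sorted(set(rec), key=lambda t: (freq[t], t)) for rec in records]

    index = defaultdict(list)  # token -> ids of records with it in their prefix
    candidates = set()
    for rid, toks in enumerate(ordered):
        for tok in toks[:prefix_length(len(toks), threshold)]:
            # Records whose prefixes share a token become candidate pairs;
            # pairs with disjoint prefixes are never generated (pruned).
            for other in index[tok]:
                candidates.add((other, rid))
            index[tok].append(rid)

    # Verification step: compute the exact similarity for surviving pairs.
    results = sorted(
        (i, j) for i, j in candidates
        if jaccard(ordered[i], ordered[j]) >= threshold
    )
    return results, len(candidates)


if __name__ == "__main__":
    data = [list("abcde"), list("abcdf"), list("xyz")]
    print(prefix_filter_join(data, 0.6))
```

The sketch gives every object the single prefix length fixed by the threshold. The abstract's point is that this choice is not always the most efficient, which is why the paper selects a prefix per object with a cost model; that selection is not reproduced here.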
