PAPER DIGEST
Most Influential SIGMOD 2003 Paper · 2026-03 edition

Winnowing: Local Algorithms For Document Fingerprinting

Saul Schleimer; Daniel S. Wilkerson; Alex Aiken

Venue: ACM SIGMOD Conference (SIGMOD) 2003
Recognition: Most Influential SIGMOD 2003 Paper (Rank No. 1)
Edition: 2026-03
Impact factor: 10
Certificate ID: 18682f75adfb5f8b

Abstract

Digital content is for copying: quotation, revision, plagiarism, and file sharing all create copies. Document fingerprinting is concerned with accurately identifying copying, including small partial copies, within large sets of documents. We introduce the class of local document fingerprinting algorithms, which seems to capture an essential property of any fingerprinting technique guaranteed to detect copies. We prove a novel lower bound on the performance of any local algorithm. We also develop winnowing, an efficient local fingerprinting algorithm, and show that winnowing's performance is within 33% of the lower bound. Finally, we also give experimental results on Web data, and report experience with MOSS, a widely-used plagiarism detection service.
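
The abstract names winnowing but does not spell out its steps. As a rough illustration only, the Python sketch below follows the form in which winnowing is commonly described: hash every k-character substring, slide a window of w consecutive hashes over the sequence, and keep the minimum hash of each window, taking the rightmost one on ties. The function names, the choice of CRC32 as the hash, and the sample hash sequence are assumptions made for this example and are not taken from the abstract.

```python
import zlib


def kgram_hashes(text, k):
    """Hash every k-byte substring of text.

    CRC32 is an illustrative choice of hash, not one prescribed by the source.
    """
    data = text.encode("utf-8")
    return [zlib.crc32(data[i:i + k]) for i in range(len(data) - k + 1)]


def winnow(hashes, w):
    """Select fingerprints from a hash sequence using windows of size w.

    Each window contributes its minimum hash; on ties the rightmost
    occurrence is taken. A (position, hash) pair is recorded only once,
    even if it is the minimum of several overlapping windows.
    """
    selected = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Index of the rightmost occurrence of the minimum inside the window.
        offset = w - 1 - window[::-1].index(smallest)
        selected.add((start + offset, smallest))
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical hash sequence, chosen only to show the selection rule.
    sample = [77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98]
    print(winnow(sample, 4))
    # prints [(3, 17), (6, 17), (8, 8), (11, 39), (15, 17)]
```

The selection is local in the sense that each choice depends only on the contents of one window, so any shared run of text long enough to fill a window produces at least one shared fingerprint in both documents.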
