PAPER DIGEST
Most Influential WWW 2002 Paper · 2026-03 edition

Evaluating Strategies for Similarity Search on the Web

Taher H. Haveliwala; Aristides Gionis; Dan Klein; Piotr Indyk

Venue
ACM Web Conference (WWW) 2002
Recognition
Most Influential WWW 2002 Paper (Rank No. 13)
Edition
2026-03
Impact factor
6
Certificate ID
276ebebe3337cdfa

Abstract

Finding pages on the Web that are similar to a query page (Related Pages) is an important component of modern search engines. A variety of strategies have been proposed for answering Related Pages queries, but comparative evaluation by user studies is expensive, especially when large strategy spaces must be searched (e.g., when tuning parameters). We present a technique for automatically evaluating strategies using Web hierarchies, such as Open Directory, in place of user feedback. We apply this evaluation methodology to a mix of document representation strategies, including the use of text, anchor-text, and links. We discuss the relative advantages and disadvantages of the various approaches examined. Finally, we describe how to efficiently construct a similarity index out of our chosen strategies, and provide sample results from our index.
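
The central idea is to use an existing Web hierarchy as a stand-in for user judgments. The sketch below shows what that can look like. It makes two assumptions, and neither is taken from the abstract. First, two pages are treated as more related the more leading categories their hierarchy paths share. Second, a strategy is scored by the fraction of candidate pairs it ranks in the same order as the hierarchy, which is a Kendall's-tau-style agreement. The paper's exact measure may differ. The names `shared_depth` and `agreement`, the toy category paths, and the word-set Jaccard strategy are all hypothetical illustrations.

```python
from itertools import combinations


def shared_depth(path_a, path_b):
    """Count leading hierarchy categories two pages share (assumed ground-truth similarity)."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth


def agreement(query, candidates, paths, strategy_sim):
    """Fraction of candidate pairs the strategy orders the same way as the hierarchy.

    Pairs the hierarchy cannot distinguish are skipped; pairs the strategy
    ties (but the hierarchy does not) are counted as disagreements.
    """
    concordant = total = 0
    for u, v in combinations(candidates, 2):
        h = shared_depth(paths[query], paths[u]) - shared_depth(paths[query], paths[v])
        if h == 0:
            continue  # hierarchy gives no preference for this pair
        s = strategy_sim(query, u) - strategy_sim(query, v)
        total += 1
        if h * s > 0:  # both orderings prefer the same page
            concordant += 1
    return concordant / total if total else 0.0


# Hypothetical toy data: category paths and page vocabularies.
paths = {
    "q": ("Computers", "Programming", "Python"),
    "a": ("Computers", "Programming", "Python"),
    "b": ("Computers", "Hardware"),
    "c": ("Arts", "Music"),
}
words = {
    "q": {"python", "tutorial", "code"},
    "a": {"python", "code", "examples"},
    "b": {"cpu", "code"},
    "c": {"music", "guitar"},
}


def jaccard(x, y):
    """A simple text-based strategy: Jaccard overlap of the two pages' word sets."""
    return len(words[x] & words[y]) / len(words[x] | words[y])


print(agreement("q", ["a", "b", "c"], paths, jaccard))  # prints 1.0
```

Because the score needs only category paths and a similarity function, many strategies or parameter settings can be compared automatically. No new user study is required for each variant.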
