Constant Interaction-Time Scatter/Gather Browsing of Very Large Document Collections

Douglass R. Cutting; David R. Karger; Jan O. Pedersen

Venue: ACM SIGIR Conference (SIGIR) 1993
Recognition: Most Influential SIGIR 1993 Paper (Rank No. 7)
Edition: 2026-03
Impact factor: 5

Abstract

The Scatter/Gather document browsing method uses fast document clustering to produce table-of-contents-like outlines of large document collections. Previous work [1] developed linear-time document clustering algorithms to establish the feasibility of this method over moderately large collections. However, even linear-time algorithms are too slow to support interactive browsing of very large collections such as Tipster, the DARPA standard text retrieval evaluation collection. We present a scheme that supports constant interaction-time Scatter/Gather of arbitrarily large collections after near-linear time preprocessing. This involves the construction of a cluster hierarchy. A modification of Scatter/Gather employing this scheme and an example of its use over the Tipster collection are presented.
