PAPER DIGEST
Most Influential SIGCOMM 2003 Paper · 2026-03 edition

Peer-to-peer Information Retrieval Using Self-organizing Semantic Overlay Networks

Chunqiang Tang; Zhichen Xu; Sandhya Dwarkadas

Venue: ACM SIGCOMM Conference (SIGCOMM) 2003
Recognition: Most Influential SIGCOMM 2003 Paper (Rank No. 9)
Edition: 2026-03
Impact factor: 7
Certificate ID: 5db710471a74a64a

Abstract

Content-based full-text search is a challenging problem in Peer-to-Peer (P2P) systems. Traditional approaches have either been centralized or used flooding to ensure accuracy of the results returned. In this paper, we present pSearch, a decentralized non-flooding P2P information retrieval system. pSearch distributes document indices through the P2P network based on document semantics generated by Latent Semantic Indexing (LSI). The search cost (in terms of different nodes searched and data transmitted) for a given query is thereby reduced, since the indices of semantically related documents are likely to be co-located in the network. We also describe techniques that help distribute the indices more evenly across the nodes, and further reduce the number of nodes accessed using appropriate index distribution as well as using index samples and recently processed queries to guide the search. Experiments show that pSearch can achieve performance comparable to centralized information retrieval systems by searching only a small number of nodes. For a system with 128,000 nodes and 528,543 documents (from news, magazines, etc.), pSearch searches only 19 nodes and transmits only 95.5 KB of data during the search, while the top 15 documents returned by pSearch and LSI have a 91.7% intersection.
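
The co-location idea and the top-k intersection measure can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not say how semantic vectors are mapped to nodes, so the grid quantization in `zone_of`, the two-dimensional vectors, the document names, and every function name below are assumptions made purely for illustration. Documents are assumed to already have semantic vectors (in pSearch these would come from LSI).

```python
import math

def normalize(vec):
    """Scale a non-zero vector to unit length so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity between two semantic vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def zone_of(vec, cells_per_axis=4):
    """Quantize a semantic vector to a grid zone.

    Assumption: each zone stands in for one node of the overlay, so
    vectors pointing in similar directions land on the same node.
    """
    return tuple(
        # Unit components lie in [-1, 1]; shift to [0, 1], then bucket.
        min(int((x + 1) / 2 * cells_per_axis), cells_per_axis - 1)
        for x in normalize(vec)
    )

def build_index(doc_vectors):
    """Place each document's index entry on the node owning its zone."""
    index = {}
    for doc_id, vec in doc_vectors.items():
        index.setdefault(zone_of(vec), []).append(doc_id)
    return index

def rank(doc_ids, doc_vectors, query_vec):
    """Order the given documents by similarity to the query, best first."""
    return sorted(doc_ids, key=lambda d: cosine(doc_vectors[d], query_vec),
                  reverse=True)

def search(index, doc_vectors, query_vec):
    """Visit only the node owning the query's zone (no flooding)."""
    return rank(index.get(zone_of(query_vec), []), doc_vectors, query_vec)

def top_k_overlap(a, b, k):
    """Fraction of the top-k results shared by two rankings."""
    return len(set(a[:k]) & set(b[:k])) / k

# Hypothetical 2-D semantic vectors; real LSI vectors have many more dimensions.
docs = {
    "news-1": [0.9, 0.1],
    "news-2": [0.8, 0.2],
    "recipe-1": [0.1, 0.9],
}
index = build_index(docs)
query = [0.9, 0.12]

local_hits = search(index, docs, query)      # one node visited
global_hits = rank(list(docs), docs, query)  # centralized baseline
overlap = top_k_overlap(local_hits, global_hits, k=2)
```

In this toy data the two news documents share a zone with the query, so visiting a single node returns the same top 2 as ranking every document centrally, and `overlap` is 1.0. The 91.7% figure in the abstract is the same kind of measure, taken at k = 15 against centralized LSI.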
