PAPER DIGEST
Most Influential SIGIR 2008 Paper · 2026-03 edition

Selecting Good Expansion Terms For Pseudo-relevance Feedback

Guihong Cao; Jian-Yun Nie; Jianfeng Gao; Stephen Robertson

Venue: ACM SIGIR Conference (SIGIR) 2008
Recognition: Most Influential SIGIR 2008 Paper (Rank No. 3)
Edition: 2026-03
Impact factor: 6
Certificate ID: 0525c85000f85229

Abstract

Pseudo-relevance feedback assumes that the most frequent terms in the pseudo-feedback documents are useful for retrieval. In this study, we re-examine this assumption and show that it does not hold in reality: many expansion terms identified by traditional approaches are in fact unrelated to the query and harmful to retrieval. We also show that good expansion terms cannot be distinguished from bad ones merely by their distributions in the feedback documents and in the whole collection. We then propose to integrate a term classification process to predict the usefulness of expansion terms. Multiple additional features can be integrated in this process. Our experiments on three TREC collections show that retrieval effectiveness can be much improved when term classification is used. In addition, we demonstrate that good terms should be identified directly according to their possible impact on retrieval effectiveness, i.e. using supervised learning, instead of unsupervised learning.
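
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three features, the logistic-regression classifier, and the `evaluate` callback (a hypothetical function returning a retrieval-effectiveness score for a list of query terms) are all assumptions made for illustration, since the abstract does not specify them. Candidate terms are drawn by frequency as in traditional pseudo-relevance feedback, each is labeled good or bad by whether adding it to the query improves effectiveness, and a classifier is trained on those labels.

```python
import math
from collections import Counter


def candidate_terms(feedback_docs, query_terms, k=10):
    """Traditional PRF step: the k most frequent non-query terms."""
    counts = Counter(
        t for doc in feedback_docs for t in doc if t not in query_terms
    )
    return [term for term, _ in counts.most_common(k)]


def term_features(term, feedback_docs):
    """Illustrative features (assumed, not from the paper):
    bias, log term frequency, fraction of feedback docs containing the term."""
    tf = sum(doc.count(term) for doc in feedback_docs)
    df = sum(1 for doc in feedback_docs if term in doc)
    return [1.0, math.log(1 + tf), df / len(feedback_docs)]


def label_terms(candidates, query_terms, evaluate):
    """Label a term 1 (good) if adding it to the query raises the score
    returned by `evaluate`, a hypothetical effectiveness callback; else 0."""
    base = evaluate(list(query_terms))
    return [
        1 if evaluate(list(query_terms) + [t]) > base else 0
        for t in candidates
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Plain stochastic gradient ascent on the logistic log-likelihood.
    Logistic regression is a stand-in classifier chosen for brevity."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            w = [wj + lr * (yi - p) * xj for wj, xj in zip(w, xi)]
    return w


def predict_good(w, x):
    """Estimated probability that a term with features x is a good expansion term."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
```

In use, one would call `candidate_terms` on the top-ranked documents for each training query, label the candidates with `label_terms`, fit `train_logistic` on the pooled features, and then keep only those candidates for a new query whose `predict_good` score exceeds a chosen threshold.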
