PAPER DIGEST
Most Influential SIGIR 2001 Paper · 2026-03 edition

A Study Of Smoothing Methods For Language Models Applied To Ad Hoc Information Retrieval

Chengxiang Zhai; John Lafferty

Venue: ACM SIGIR Conference (SIGIR) 2001
Recognition: Most Influential SIGIR 2001 Paper (Rank No. 1)
Edition: 2026-03
Impact factor: 9
Certificate ID: 454562d790203426

Abstract

Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition. The basic idea of these approaches is to estimate a language model for each document, and then rank documents by the likelihood of the query according to the estimated language model. A core problem in language model estimation is smoothing, which adjusts the maximum likelihood estimator so as to correct the inaccuracy due to data sparseness. In this paper, we study the problem of language model smoothing and its influence on retrieval performance. We examine the sensitivity of retrieval performance to the smoothing parameters and compare several popular smoothing methods on different test collections.
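
The ranking idea described in the abstract can be illustrated with a minimal sketch. It assumes a unigram language model per document and uses linear interpolation with a collection model (commonly called Jelinek-Mercer smoothing) as one example of a smoothing method; the abstract itself does not name the methods compared. The function names, the `lam` parameter, and its default value of 0.5 are hypothetical choices made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, coll_counts, coll_len, lam=0.5):
    """Log-likelihood of the query under a smoothed unigram model of doc.

    Assumed smoothing (illustrative): linear interpolation between the
    document's maximum likelihood estimate and the collection model,
    p(w|d) = (1 - lam) * p_ml(w|d) + lam * p(w|C).
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p_ml = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        p = (1.0 - lam) * p_ml + lam * p_coll
        if p == 0.0:
            # Without smoothing mass, an unseen query term zeroes the likelihood.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Rank documents (dict of id -> token list) by query likelihood, best first."""
    coll_counts = Counter(t for tokens in docs.values() for t in tokens)
    coll_len = sum(coll_counts.values())
    scored = [
        (doc_id, query_log_likelihood(query, tokens, coll_counts, coll_len, lam))
        for doc_id, tokens in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Setting `lam=0` in this sketch reduces the estimate to the unsmoothed maximum likelihood model, so any document missing a query term receives a likelihood of zero. That is the data-sparseness problem smoothing is meant to correct.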
