PAPER DIGEST
Most Influential SIGIR 2004 Paper · 2026-03 edition

A Formal Study Of Information Retrieval Heuristics

Hui Fang; Tao Tao; ChengXiang Zhai

Venue: ACM SIGIR Conference (SIGIR) 2004
Recognition: Most Influential SIGIR 2004 Paper (Rank No. 8)
Edition: 2026-03
Impact factor: 6
Certificate ID: 11279ea370e211cc

Abstract

Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. One basic research question is thus what exactly are these "necessary" heuristics that seem to cause good retrieval performance. In this paper, we present a formal study of retrieval heuristics. We formally define a set of basic desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these retrieval functions satisfies all the constraints unconditionally. Empirical results show that when a constraint is not satisfied, it often indicates non-optimality of the method, and when a constraint is satisfied only for a certain range of parameter values, its performance tends to be poor when the parameter is out of the range. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies these constraints. Thus the proposed constraints provide a good explanation of many empirical observations and make it possible to evaluate any existing or new retrieval formula analytically.
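
The abstract does not spell out the constraints themselves, so the following is only an illustrative sketch of what checking one constraint against a retrieval function might look like. It assumes a term-frequency style constraint: for a single-term query and two documents of equal length, the document with more occurrences of the term should score strictly higher. The scoring functions, parameter values, and all names below are hypothetical and are not the formulas analyzed in the paper. The check is numerical over a small grid, whereas the paper evaluates formulas analytically.

```python
import math


def toy_tfidf_score(tf, doc_len, avg_doc_len, df, num_docs):
    """Hypothetical single-term score: sublinear TF, length normalization, IDF."""
    if tf == 0:
        return 0.0
    tf_part = 1.0 + math.log(tf)
    length_norm = 0.5 + 0.5 * doc_len / avg_doc_len
    idf = math.log((num_docs + 1) / df)
    return tf_part / length_norm * idf


def capped_score(tf, doc_len, avg_doc_len, df, num_docs):
    """Deliberately flawed scorer: TF is capped at 3, so extra matches stop helping."""
    return min(tf, 3) * math.log((num_docs + 1) / df)


def satisfies_tf_constraint(score, max_tf=20, doc_len=100):
    """Return True if, at a fixed document length, each extra occurrence of the
    query term strictly increases the score (checked for tf = 0..max_tf)."""
    # Assumed collection statistics, fixed so that only tf varies.
    avg_doc_len, df, num_docs = 100.0, 10, 1000
    scores = [score(tf, doc_len, avg_doc_len, df, num_docs)
              for tf in range(max_tf + 1)]
    return all(lower < higher for lower, higher in zip(scores, scores[1:]))


print(satisfies_tf_constraint(toy_tfidf_score))
print(satisfies_tf_constraint(capped_score))
```

A grid check like this can only surface violations at the sampled points; it cannot establish that a formula satisfies a constraint for all inputs or parameter values, which is the kind of claim the analytical approach described in the abstract is meant to support.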
