Similarity Measures For Tracking Information Flow

Donald Metzler; Yaniv Bernstein; W. Bruce Croft; Alistair Moffat; Justin Zobel

Venue: ACM Conference on Information and Knowledge Management (CIKM) 2005
Recognition: Most Influential CIKM 2005 Paper (Rank No. 14)
Edition: 2026-03
Impact factor: 5

Abstract

Text similarity spans a spectrum, with broad topical similarity near one extreme and document identity at the other. Intermediate levels of similarity -- resulting from summarization, paraphrasing, copying, and stronger forms of topical relevance -- are useful for applications such as information flow analysis and question-answering tasks. In this paper, we explore mechanisms for measuring such intermediate kinds of similarity, focusing on the task of identifying where a particular piece of information originated. We consider both sentence-to-sentence and document-to-document comparison, and have incorporated these algorithms into RECAP, a prototype information flow analysis tool. Our experimental results with RECAP indicate that new mechanisms such as those we propose are likely to be more appropriate than existing methods for identifying the intermediate forms of similarity.
