PAPER DIGEST
Most Influential SIGMOD 2005 Paper · 2026-03 edition

Reference Reconciliation In Complex Information Spaces

Xin Dong; Alon Halevy; Jayant Madhavan

Venue
ACM SIGMOD Conference (SIGMOD) 2005
Recognition
Most Influential SIGMOD 2005 Paper (Rank No. 4)
Edition
2026-03
Impact factor
7
Certificate ID
b75a1791d31880ba

Abstract

Reference reconciliation is the problem of identifying when different references (i.e., sets of attribute values) in a dataset correspond to the same real-world entity. Most previous work assumed that references belong to a single class with a fair number of attributes (e.g., research publications). We consider complex information spaces: our references belong to multiple related classes, and each reference may have very few attribute values. A prime example of such a space is Personal Information Management, where the goal is to provide a coherent view of all the information on one's desktop.

Our reconciliation algorithm has three principal features. First, we exploit the associations between references to design new methods for reference comparison. Second, we propagate information between reconciliation decisions to accumulate positive and negative evidence. Third, we gradually enrich references by merging attribute values. Our experiments show that (1) we considerably improve precision and recall over standard methods on a diverse set of personal information datasets, and (2) there are advantages to using our algorithm even on a standard citation dataset benchmark.
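To make the three features concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' algorithm. The toy references r1 to r5, their associations, the name comparator `name_sim`, the merge threshold (0.9) and the association bonus (0.15) are all hypothetical choices for illustration. Comparison uses association evidence (a bonus when two clusters are linked to a common reference), each merge enriches the surviving cluster with the union of its members' attribute values, and each merge re-queues the pairs involving the merged cluster so that earlier decisions can be revisited. A plain work queue stands in for whatever propagation structure the paper uses, and only positive evidence is propagated here.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy data (not from the paper): sparse person references,
# each with very few attribute values, plus associations between them
# (e.g., two people appearing as correspondents in the same email).
REFS = {
    "r1": {"name": set(), "email": {"xin@example.edu"}},
    "r2": {"name": {"Xin Dong"}, "email": set()},
    "r3": {"name": {"X. Dong"}, "email": {"xin@example.edu"}},
    "r4": {"name": {"Alon Halevy"}, "email": set()},
    "r5": {"name": {"Xuan Dong"}, "email": set()},
}
ASSOC = {("r2", "r4"), ("r3", "r4")}


def name_sim(a, b):
    """Assumed comparator: 1.0 for equal names, 0.8 if one first name is a matching initial."""
    ta = a.lower().replace(".", "").split()
    tb = b.lower().replace(".", "").split()
    if ta[-1] != tb[-1]:  # last names must agree
        return 0.0
    fa, fb = ta[0], tb[0]
    if fa == fb:
        return 1.0
    if (len(fa) == 1 or len(fb) == 1) and fa[0] == fb[0]:
        return 0.8
    return 0.0


def reconcile(refs, assoc, threshold=0.9, assoc_bonus=0.15):
    parent = {r: r for r in refs}  # union-find over references
    attrs = {r: {k: set(v) for k, v in a.items()} for r, a in refs.items()}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    def neighbors(root):
        """Clusters associated with any reference in this cluster."""
        out = set()
        for a, b in assoc:
            if find(a) == root:
                out.add(find(b))
            if find(b) == root:
                out.add(find(a))
        return out

    def score(x, y):
        ax, ay = attrs[x], attrs[y]
        sims = []
        if ax["name"] and ay["name"]:
            sims.append(max(name_sim(m, n) for m in ax["name"] for n in ay["name"]))
        if ax["email"] and ay["email"]:
            sims.append(1.0 if ax["email"] & ay["email"] else 0.0)
        if not sims:
            return 0.0  # nothing comparable (yet)
        # Association evidence: both clusters linked to a common third cluster.
        shared = (neighbors(x) & neighbors(y)) - {x, y}
        return min(1.0, max(sims) + (assoc_bonus if shared else 0.0))

    queue = deque(combinations(sorted(refs), 2))
    while queue:
        x, y = (find(r) for r in queue.popleft())
        if x == y or score(x, y) < threshold:
            continue
        parent[y] = x
        for k in attrs[x]:
            attrs[x][k] |= attrs[y][k]  # enrichment: merge attribute values
        # Propagation: revisit every pair involving the newly merged cluster.
        others = sorted({find(r) for r in refs} - {x})
        queue.extend((x, o) for o in others)

    clusters = {}
    for r in refs:
        clusters.setdefault(find(r), set()).add(r)
    return sorted(sorted(c) for c in clusters.values())


print(reconcile(REFS, ASSOC))  # prints [['r1', 'r2', 'r3'], ['r4'], ['r5']]
```

Here r1 (email only) and r2 (name only) share no attribute, so their first comparison scores 0. They end up together because r1 absorbs the name "X. Dong" from r3 through the shared email, and the 0.8 name match between r2 and that enriched cluster is lifted over the threshold by their common association with r4. Calling `reconcile(REFS, ASSOC, assoc_bonus=0.0)` leaves r2 in its own cluster, while r5 ("Xuan Dong") stays separate in both runs.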
