iMAP: Discovering Complex Semantic Matches Between Database Schemas

Robin Dhamankar; Yoonkyong Lee; AnHai Doan; Alon Halevy; Pedro Domingos

Venue: ACM SIGMOD Conference (SIGMOD) 2004
Recognition: Most Influential SIGMOD 2004 Paper (Rank No. 6)
Edition: 2026-03
Impact factor: 6
Certificate ID: 61b16a22510b7a07

Abstract

Creating semantic matches between disparate data sources is fundamental to numerous data-sharing efforts. Manually creating matches is extremely tedious and error-prone, so many recent works have focused on automating the matching process. To date, however, virtually all of these works deal only with one-to-one (1-1) matches, such as address = location. They do not consider the important class of more complex matches, such as address = concat(city, state) and room-price = room-rate * (1 + tax-rate). We describe the iMAP system, which semi-automatically discovers both 1-1 and complex matches. iMAP reformulates schema matching as a search in an often very large or even infinite match space. To search effectively, it employs a set of searchers, each of which discovers a specific type of complex match. To further improve matching accuracy, iMAP exploits a variety of domain knowledge, including past complex matches, domain integrity constraints, and overlap data. Finally, iMAP introduces a novel feature that generates explanations of predicted matches, providing insight into the matching process and suggesting actions that help the user converge on the correct matches quickly. We apply iMAP to several real-world domains to match relational tables, and show that it discovers both 1-1 and complex matches with high accuracy.
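To make "search in a very large match space" concrete, below is a minimal sketch of one searcher for concatenation matches such as address = concat(city, state). It assumes the source table is given as a dict of column lists and that a sample of target values is available. Each candidate is scored by simple value overlap, and the candidates are pruned with beam search. The function names (`concat_values`, `overlap_score`, `concat_searcher`), the overlap score, and the beam parameters are hypothetical illustrations, not iMAP's actual implementation or scoring.

```python
def concat_values(source, cols, sep=" "):
    """Row-wise concatenation of the chosen source columns (hypothetical helper)."""
    n_rows = len(source[cols[0]])
    return [sep.join(str(source[c][i]) for c in cols) for i in range(n_rows)]


def overlap_score(candidate_values, target_values):
    """Hypothetical score: fraction of generated values that appear in the target sample."""
    if not candidate_values:
        return 0.0
    target = set(target_values)
    hits = sum(1 for v in candidate_values if v in target)
    return hits / len(candidate_values)


def concat_searcher(source, target_values, beam_width=3, max_len=3, sep=" "):
    """Beam search over ordered column concatenations; returns (columns, score)."""
    if not source:
        return None
    # Start the beam with every single-column candidate (these are plain 1-1 matches).
    beam = [((c,), overlap_score(concat_values(source, (c,), sep), target_values))
            for c in source]
    beam.sort(key=lambda cand: cand[1], reverse=True)
    beam = beam[:beam_width]
    best = beam[0]
    for _ in range(max_len - 1):
        expanded = []
        for cols, _score in beam:
            for c in source:
                if c in cols:
                    continue  # use each source column at most once
                new_cols = cols + (c,)
                values = concat_values(source, new_cols, sep)
                expanded.append((new_cols, overlap_score(values, target_values)))
        if not expanded:
            break
        expanded.sort(key=lambda cand: cand[1], reverse=True)
        beam = expanded[:beam_width]  # keep only the most promising partial matches
        if beam[0][1] <= best[1]:
            break  # no improvement at this length: stop searching
        best = beam[0]
    return best


# Toy example data (invented for illustration only).
source = {
    "city": ["Seattle", "Urbana"],
    "state": ["WA", "IL"],
    "zip": ["98195", "61801"],
}
target_address = ["Seattle WA", "Urbana IL"]

match = concat_searcher(source, target_address)
print(match)  # → (('city', 'state'), 1.0)
```

Because the number of ordered column combinations grows quickly with `max_len`, keeping only the best `beam_width` partial candidates at each step is what keeps an otherwise huge space tractable. The same skeleton could host other candidate generators, such as arithmetic expressions over numeric columns, with a different scoring function.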
