iMAP: Discovering Complex Semantic Matches Between Database Schemas
Abstract
Creating semantic matches between disparate data sources is fundamental to numerous data-sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence, many recent works have focused on automating the matching process. To date, however, virtually all of these works deal only with one-to-one (1-1) matches, such as <b>address</b> = <b>location</b>. They do not consider the important class of more complex matches, such as <b>address</b> = concat(<b>city</b>, <b>state</b>) and <b>room-price</b> = <b>room-rate</b> * (1 + <b>tax-rate</b>). We describe the <b>iMAP</b> system, which semi-automatically discovers both 1-1 and complex matches. <b>iMAP</b> reformulates schema matching as a <i>search</i> in a match space that is often very large or even infinite. To search effectively, it employs a set of searchers, each discovering specific types of complex matches. To further improve matching accuracy, <b>iMAP</b> exploits a variety of domain knowledge, including past complex matches, domain integrity constraints, and overlap data. Finally, <b>iMAP</b> introduces a novel feature that generates explanations of predicted matches, to provide insight into the matching process and suggest actions for converging quickly on the correct matches. We apply <b>iMAP</b> to several real-world domains to match relational tables, and show that it discovers both 1-1 and complex matches with high accuracy.
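To make "schema matching as search" concrete, the following is a minimal Python sketch of one hypothetical searcher for concatenation matches such as address = concat(city, state). It enumerates ordered combinations of up to `max_cols` source columns, builds row-wise concatenations joined by `sep`, and ranks each candidate by Jaccard overlap with the target column's values. The function names, the separator, and the scoring function are illustrative assumptions, not iMAP's own searcher or scoring method; the overlap score in particular assumes the source and target tables share some data instances (overlap data).

```python
from itertools import permutations


def candidate_concats(source, max_cols=2, sep=" "):
    """Enumerate concat candidates over ordered tuples of 1..max_cols source columns.

    `source` maps column names to equal-length lists of string values.
    Hypothetical helper for illustration only.
    """
    names = list(source)
    for k in range(1, max_cols + 1):
        for combo in permutations(names, k):
            # Row-wise concatenation of the chosen columns, in the chosen order.
            values = [sep.join(row) for row in zip(*(source[c] for c in combo))]
            yield combo, values


def overlap_score(candidate_values, target_values):
    """Jaccard overlap between value sets (a stand-in scoring function)."""
    a, b = set(candidate_values), set(target_values)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_concat_match(source, target_values, max_cols=2, sep=" "):
    """Return (score, column_tuple) for the highest-scoring concat candidate."""
    scored = [
        (overlap_score(vals, target_values), combo)
        for combo, vals in candidate_concats(source, max_cols, sep)
    ]
    # Highest score first; ties broken by fewer columns, then by column names.
    scored.sort(key=lambda t: (-t[0], len(t[1]), t[1]))
    return scored[0]


if __name__ == "__main__":
    source = {
        "city": ["Seattle", "Urbana"],
        "state": ["WA", "IL"],
        "zip": ["98105", "61801"],
    }
    target_address = ["Seattle WA", "Urbana IL"]
    print(best_concat_match(source, target_address))  # (1.0, ('city', 'state'))
```

Because the number of candidates grows combinatorially with `max_cols`, and becomes unbounded once arbitrary formulas such as room-rate * (1 + tax-rate) are allowed, a practical searcher would need to prune the space rather than enumerate it exhaustively as this sketch does.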