Tutorial: Uncertain Schema Matching: The Power of Not Knowing

Avigdor Gal - Technion



Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources. Schema matching is one of the basic operations required by the process of data and schema integration, and thus has a great effect on its outcomes, whether these involve targeted content delivery, view integration, database integration, query rewriting over heterogeneous sources, duplicate data elimination, or automatic streamlining of workflow activities that involve heterogeneous data sources.

Although schema matching research has been ongoing for over 25 years, only recently has the realization emerged that schema matchers are inherently uncertain. Since 2003, work on uncertainty in schema matching has picked up, along with research on uncertainty in other areas of data management.

This tutorial presents various aspects of uncertainty in schema matching within a single unified framework. We introduce basic formulations of uncertainty and provide several alternative representations of schema matching uncertainty. We then cover two common methods that have been proposed to deal with uncertainty in schema matching, namely ensembles and top-K matchings, and analyze them in this context. We conclude with a set of real-world applications, in particular the use of uncertain schema matching in NisB, a European project aimed at harnessing an evolving Wisdom of the Network to dynamically connect businesses so that they can attain common business goals.
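To make the two methods named above a little more concrete, here is a minimal Python sketch built on stated assumptions. The attribute names and similarity scores are made-up toy values. The ensemble is assumed to be a plain cell-by-cell average of two matchers' similarity matrices. Top-K matchings are computed by brute-force enumeration of one-to-one matchings, ranked by total similarity. This illustrates the general ideas only and is not the formulation or the algorithms presented in the tutorial.

```python
from itertools import permutations

# Hypothetical toy example: rows are attributes of schema S, columns are
# attributes of schema T. Both schemas have the same number of attributes,
# so every one-to-one matching is a permutation of the column indices.
S = ["name", "phone", "addr"]
T = ["fullName", "telephone", "address"]

# Made-up similarity matrices produced by two hypothetical matchers.
matcher_a = [
    [0.9, 0.1, 0.2],
    [0.1, 0.7, 0.3],
    [0.2, 0.2, 0.6],
]
matcher_b = [
    [0.7, 0.2, 0.1],
    [0.3, 0.8, 0.1],
    [0.1, 0.4, 0.5],
]


def ensemble(*matrices):
    """Combine matchers by averaging their similarity scores cell by cell
    (an assumed, unweighted aggregation)."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(cols)]
            for i in range(rows)]


def top_k_matchings(sim, k):
    """Return the k one-to-one matchings with the highest total similarity,
    as (score, permutation) pairs where permutation[i] is the column matched
    to row i. Brute force over all n! permutations: toy sizes only."""
    n = len(sim)
    scored = []
    for perm in permutations(range(n)):
        score = sum(sim[i][perm[i]] for i in range(n))
        scored.append((score, perm))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


combined = ensemble(matcher_a, matcher_b)
for score, perm in top_k_matchings(combined, k=3):
    pairs = [(S[i], T[j]) for i, j in enumerate(perm)]
    print(round(score, 2), pairs)
```

Keeping several high-scoring matchings instead of only the best one is what lets a system reason about alternative interpretations rather than commit to a single, possibly wrong, matching. Because brute force enumerates n! matchings, it only works for tiny schemas; larger settings would call for a K-best assignment algorithm (for example, Murty's algorithm), and an ensemble may weight its matchers differently rather than averaging them equally.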




Avigdor Gal - Technion

Avigdor Gal is an Associate Professor at the Technion - Israel Institute of Technology. He has published more than 95 papers in journals (e.g., Journal of the ACM (JACM), ACM Transactions on Database Systems (TODS), IEEE Transactions on Knowledge and Data Engineering (TKDE), ACM Transactions on Internet Technology (TOIT), and the VLDB Journal), books (Schema Matching and Mapping), and conferences (CIKM, ICDE, ER, CoopIS, BPM) on the topics of data integration, temporal databases, information systems architectures, and active databases. Avigdor Gal is the author of the book Uncertain Schema Matching, part of the Synthesis Lectures on Data Management series (March 2011).