Managing Inconsistencies in Data Exchange
- Author(s): Halpert, Richard Lee
- Advisor(s): Kolaitis, Phokion G;
- ten Cate, Balder
- et al.
Data exchange is the problem of transforming data structured under one schema, called the source schema, into data structured under another schema, called the target schema, in such a way that pre-specified constraints on the two schemas are satisfied. Queries may then be posed against the target schema, where the traditional semantics of data exchange, called certain answers, give those answers that hold on all acceptable choices of target instance (that is, all "solutions"). In a data exchange setting with target constraints, it is often the case that a given source instance has no solutions. Intuitively, this happens when data sources contain inconsistent or conflicting information that is exposed by the target constraints. In such cases, the traditional semantics of target queries trivialize.
In this work, we explore a new framework, called exchange-repairs, that gives meaningful answers in such cases. Informally, an exchange-repair of a source instance is a pair of instances: a source instance that differs minimally from the original, but has a solution, along with one such solution. Exchange-repairs give rise to a natural notion of exchange-repair certain (XR-certain) answers; they are the answers that hold on every exchange-repair.
We first explore the problem of computing the XR-certain answers to conjunctive queries. We show limited circumstances under which the XR-certain query answers can be expressed as the consistent query answers (in the sense of standard database repairs) on the same instance, but using another query and set of constraints. We also give a translation to disjunctive logic programming (DLP) that is applicable for a much larger class of XR-certain query answering problems. Next, we study how to effectively perform XR-certain query answering for practical data exchange settings. We evaluate the use of modern, sophisticated DLP solvers, and develop a principled approach to optimizing their use for XR-certain query answering. Then, we investigate approximations of XR-certain semantics, and draw close parallels to those proposed for the problem of inconsistency-tolerant ontology based data access (OBDA). However, we find that the approximations known to be tractable for constraint languages common in OBDA research prove to have computational complexity no better than XR-certain semantics for the types of constraints typically studied in data exchange.