Dixit, Akhil Anand

Answering Queries Over Inconsistent Databases Using SAT Solvers

2021

Dixit, Akhil Anand
Advisor(s): Kolaitis, Phokion G

Abstract

An inconsistent database is a database that violates one or more integrity constraints, such as key constraints and functional dependencies. Consistent Query Answering (CQA) is a rigorous and principled approach to the semantics of queries posed against inconsistent databases. The consistent answers to a query on an inconsistent database are the intersection of the answers to the query on every repair, i.e., on every consistent database that differs from the given inconsistent one in a minimal way. Computing the consistent answers of a fixed conjunctive query on a given inconsistent database can be a coNP-hard problem, even though every fixed conjunctive query is efficiently computable on a given consistent database.

The notion of consistent answers is extended to the notion of range consistent answers for queries with aggregation operators with or without the grouping constructs. The range consistent answers to an aggregation query is the interval [glb, lub] of the greatest lower bound and the least upper bound such that the answer to the query on every repair falls within the range. Computing the range consistent answers to a fixed aggregation query can be an NP-hard problem. In fact, we show that computing the range consistent answers to an aggregation query can be NP-hard even if the consistent answers to its underlying conjunctive query (i.e., the query without aggregation operators) are first-order rewritable, thus, computable in polynomial-time.

Despite several attempts towards building practical systems for consistent query answering, no comprehensive and scalable system for consistent query answering exists at present; this state of affairs has impeded the broader adoption of the framework of repairs and consistent answers as a principled alternative to data cleaning.

We designed, implemented, and evaluated CAvSAT, the first SAT-based system for consistent query answering. CAvSAT leverages a set of natural reductions from the problem of computing the consistent answers to variants of the Boolean Satisfiability problem. The system is capable of handling unions of conjunctive queries and arbitrary denial constraints, which include functional dependencies as a special case. Moreover, it is also the first system capable of computing the range consistent answers of general aggregation queries with the COUNT(A), COUNT(*), and SUM(A) operators, and with or without grouping constructs. We report results from experiments evaluating CAvSAT on both synthetic and real-world databases. We carry out an extensive set of experiments on both synthetic and real-world databases that demonstrate the usefulness and the scalability of CAvSAT.

Main Content

For improved accessibility of PDF content, download the file to your device.

UC Santa Cruz

Answering Queries Over Inconsistent Databases Using SAT Solvers