On Indexing Multi-Valued Fields in AsterixDB
- Author(s): Galvizo, Glenn Justo
- Advisor(s): Carey, Michael J
- et al.
Secondary indexes in database systems are traditionally built under the assumption that one data record maps to one indexed value. Nowadays, single data records often hold collections of values that users want to access efficiently in an ad-hoc manner. Database users are thus torn between changing their data model to support such indexes and living with subpar query execution times. Multi-valued indexes aim to give users the best of both worlds: (i) to keep a more natural data model of records with collections of values, and (ii) to reap the benefits of a secondary index. This thesis details the steps taken to realize multi-valued indexes in AsterixDB, a big data management system with a structured query language operating over collections of documents. An unambiguous, clean, and concise syntax is first developed for specifying such indexes. Data flows for bulk-loading a collection of records and for maintaining an index correctly under concurrent access are then illustrated. Query plans that take advantage of multi-valued indexes are discussed next, covering joins involving arrays or multisets, predicates with existential quantification, and predicates with universal quantification (the last of which is not supported by any other system to date). We finally conclude with experiments demonstrating the efficacy of these indexes.
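To make the idea of a multi-valued index concrete, the following is a minimal in-memory sketch in Python. It is not AsterixDB's implementation or syntax: the function names, the `id` and `tags` fields, and the sample records are all hypothetical and chosen only for illustration. The sketch assumes that one record contributes one index entry per element of its collection, that an existential predicate can be answered by an index lookup alone, and that a universal predicate uses the index to find candidates and then verifies each candidate against the full record.

```python
from collections import defaultdict


def build_multi_valued_index(records, pk_field, array_field):
    """Map each value found in a record's collection to the primary keys holding it."""
    index = defaultdict(set)
    for rec in records:
        # One record yields one index entry per element of its collection.
        for value in rec.get(array_field, []):
            index[value].add(rec[pk_field])
    return index


def some_equals(index, value):
    """Existential quantification: records where SOME element equals `value`."""
    return set(index.get(value, set()))


def every_equals(index, records_by_pk, array_field, value):
    """Universal quantification over non-empty collections.

    The index supplies candidates (records with at least one matching
    element); each candidate is then verified against its full collection.
    Records with empty collections have no index entries, so this sketch
    never returns them.
    """
    candidates = index.get(value, set())
    return {
        pk
        for pk in candidates
        if all(v == value for v in records_by_pk[pk][array_field])
    }


# Hypothetical sample data: each record holds a collection of tags.
records = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": ["a"]},
    {"id": 3, "tags": []},
    {"id": 4, "tags": ["a", "a"]},
]
index = build_multi_valued_index(records, "id", "tags")
records_by_pk = {r["id"]: r for r in records}

print(some_equals(index, "a"))
print(every_equals(index, records_by_pk, "tags", "a"))
```

In this sketch the records keep their natural nested shape, while lookups avoid scanning every record. How a real system treats empty collections under universal quantification is a design decision that the sketch sidesteps by excluding them.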