Quill: Efficient, Transferable, and Rich Analytics at Scale
Skip to main content
Open Access Publications from the University of California

UC Riverside

UC Riverside Previously Published Works bannerUC Riverside

Quill: Efficient, Transferable, and Rich Analytics at Scale

Published Web Location

No data is associated with this publication.

This paper introduces Quill (stands for a quadrillion tuples per day ), a library and distributed platform for relational and temporal analytics over large datasets in the cloud. Quill exposes a new abstraction for parallel datasets and computation, called ShardedStreamable . This abstraction provides the ability to express efficient distributed physical query plans that are transferable, i.e., movable from offline to real-time and vice versa. ShardedStreamable decouples incremental query logic specification, a small but rich set of data movement operations, and keying; this allows Quill to express a broad space of plans with complex querying functionality, while leveraging existing temporal libraries such as Trill. Quill's layered architecture provides a careful separation of responsibilities with independently useful components, while retaining high performance. We built Quill for the cloud, with a master-less design where a language-integrated client library directly communicates and coordinates with cloud workers using off-the-shelf distributed cloud components such as queues. Experiments on up to 400 cloud machines, and on datasets up to 1TB, find Quill to incur low overheads and outperform SparkSQL by up to orders-of-magnitude for temporal and 6× for relational queries, while supporting a rich space of transferable, programmable, and expressive distributed physical query plans.

Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.

Item not freely available? Link broken?
Report a problem accessing this item