Over the last decade, distributed storage systems have proliferated. These systems handle various types of data (e.g., text, photos, documents, files) in a variety of settings (e.g., enterprise, cloud, mobile). Typically, each system's design is optimized for a specific workload and setting. Yet, when designing new applications, developers often face the onerous task of choosing one of these existing solutions to store their workload's data. This choice can be difficult for two reasons: (1) existing systems may not offer all of the features the new application requires, and the development effort needed to extend an existing system's implementation can be prohibitive; and (2) choosing a system design that cost-effectively meets a workload's performance goals is non-trivial, and an incorrect choice can significantly inflate the cost of achieving the desired performance.
In this dissertation, I design, build, and evaluate two distributed storage systems whose flexible implementations improve ease of use and reduce development overhead when implementing data-centric applications.
First, I present Simba, a cloud-based data synchronization service that reduces the development complexity of mobile apps. It improves upon existing systems, which sync either tabular data or objects, but not both, and do so with eventual consistency. Simba is built around a novel data abstraction that unifies the tabular and object data models and allows apps to choose from a set of distributed consistency schemes. Mobile apps written to this abstraction can sync data with the cloud and with other mobile devices with little developer effort, while benefiting from end-to-end data consistency.
Second, I present Lego, a modular emulator of distributed object stores. Lego helps developers select the system design that can most cost-effectively satisfy a workload's performance goals. It enables developers to construct, deploy, and evaluate the performance of candidate object store designs in the target setting. Lego's modular architecture enables rapid prototyping, allowing any object store design to be instantiated as a collection of modules. Because different object store designs often have much in common, this modularity enables code reuse across system implementations, significantly reducing development effort.