Operating systems are designed around two driving forces: the capabilities of hardware and the demands of software. Yet traditional (read: old) operating systems and programming models have inertia, so interfaces for new hardware tend to follow the designs of existing interfaces. As a result, programmers are limited in their ability to express the important parts of their programs by the layers of compatibility and overhead thrust upon them, even as they persistently demand higher throughput and lower latency. Operating system abstractions must evolve. Relying on decades-old abstractions and incremental change will relegate the novel hardware of the last decade to access via interfaces designed for tape and spinning rust.
We stand before an opportunity to study how a confluence of trends may shift programming models away from a traditional, process-centric viewpoint toward a data-centric one, in which data is the primary citizen of the system. This opportunity arises from hardware trends that directly affect how we view the data access path and the responsibilities of the operating system and the kernel. The increasing speed of interconnect technologies draws computing nodes closer together in latency space, increasing the efficiency and usability of shared memory. Persistence, traditionally trapped low in the memory hierarchy, is leaking upward as access speeds for persistent devices increase (sidenote: this includes new technologies like byte-addressable non-volatile memory DIMMs, but also improvements in SSD performance and interfaces). As a result, the overhead of the traditional kernel-driven data access path begins to dominate the cost of accessing persistent data. Finally, the increasing heterogeneity and disaggregation of compute and memory devices calls for greater data and compute mobility, as software continues to require scalability, distribution, and raw speed.
This dissertation presents a new data-centric operating system and programming model designed around the trends above. The data-centric approach reframes the goals of the operating system and lets us re-imagine classic systems programming techniques as a model that facilitates data sharing instead of hindering it. Classical systems programming models and techniques tend to involve significant complexity and overhead in dealing with data persistence and sharing, such as expensive coarse-grained persistence operations, rigid RPC data models, and serialization. In contrast, our data-centric approach gets the kernel out of the data access path and makes data mobile through invariant references, whose meaning does not change with address space or machine. Virtual memory references, by comparison, have meaning only inside a single address space. Because references remain valid wherever the data goes, the need for serialization disappears.
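To make the contrast between invariant references and virtual addresses concrete, the following is a toy model, not Twizzler's actual mechanism. It assumes a hypothetical global object store (`OBJECTS`) reachable from every address space, and hypothetical `InvariantRef` and `AddressSpace` types. An invariant reference names data by (object ID, offset), and each address space translates it to a local virtual address through its own placement of objects. The same reference reaches the same byte everywhere, while a raw virtual address copied from one address space names unrelated data in another.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical global object store: object ID -> contents. Stands in for
# shared or persistent memory reachable from any address space or machine.
OBJECTS: Dict[int, bytes] = {}


@dataclass(frozen=True)
class InvariantRef:
    """Hypothetical invariant reference: (object ID, offset into object)."""
    obj_id: int
    offset: int


class AddressSpace:
    """Toy address space that maps each object at its own base address."""

    def __init__(self, placements: Dict[int, int]):
        # placements: object ID -> base virtual address in this address space
        self.placements = dict(placements)

    def to_virtual(self, ref: InvariantRef) -> int:
        # Translate an invariant reference into a local virtual address.
        return self.placements[ref.obj_id] + ref.offset

    def read_virtual(self, vaddr: int) -> Optional[int]:
        # Read the byte at a virtual address, or None if nothing is mapped there.
        for obj_id, base in self.placements.items():
            data = OBJECTS[obj_id]
            if base <= vaddr < base + len(data):
                return data[vaddr - base]
        return None

    def read_ref(self, ref: InvariantRef) -> Optional[int]:
        # Dereference an invariant reference via this space's own mapping.
        return self.read_virtual(self.to_virtual(ref))


OBJECTS[7] = b"hello"
OBJECTS[9] = b"world!"

# Two address spaces place the same objects at different base addresses.
a = AddressSpace({7: 0x1000, 9: 0x2000})
b = AddressSpace({7: 0x5000, 9: 0x1000})

ref = InvariantRef(obj_id=7, offset=1)
print(chr(a.read_ref(ref)), chr(b.read_ref(ref)))  # e e

# A raw virtual address from `a` means something else in `b`.
raw = a.to_virtual(ref)
print(hex(raw), chr(b.read_virtual(raw)))  # 0x1001 o
```

Because the reference, not the address, is what gets stored and shared, data containing such references can move between address spaces without being rewritten, which is the property that lets the data-centric model avoid serialization.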
We will cover the motivation and hardware trends that led to our design, define a design space based on those trends, and finally discuss Twizzler, a point in that design space that exemplifies the ideals we describe. We evaluate Twizzler by building several new pieces of software for it, using these as case studies that demonstrate system behavior and the efficacy and usability of our programming models. These, along with larger ported applications, demonstrate the performance of Twizzler and its programming model, often showing performance improvements that stem simply from removing layers of software.