The Journal of Systems Research (JSys) is a diamond open-access journal covering all areas of computer systems research.
Volume 1, Issue 1, 2021
In the past few years, FaaS has gained significant popularity and become a go-to choice for deploying cloud applications and microservices. With its unique 'pay as you go' pricing model and key performance benefits over other cloud services, FaaS offers an easy and intuitive programming model for building cloud applications. In this model, a developer focuses on writing application code, while infrastructure management is left to the cloud provider, who is responsible for the underlying resources, security, isolation, and scaling of the application. Recently, a number of commercial and open-source FaaS platforms have emerged, offering a wide range of features to application developers. In this paper, we first present measurement studies demystifying various features and performance characteristics of commercial and open-source FaaS platforms that can help developers deploy and configure their serverless applications. Second, we discuss the distinct performance and cost benefits of FaaS and interesting use cases that leverage the performance, cost, or both aspects of FaaS. Lastly, we discuss challenges a developer may face while developing or deploying a serverless application. We also discuss state-of-the-art solutions and open problems.
MultiPaxos and Raft are the two most popular and widely deployed state machine replication protocols. There is a more sophisticated family of generalized multi-leader state machine replication protocols, such as EPaxos, Caesar, and Atlas, that offer better performance, but these protocols are extremely complicated and hard to understand. Due to their complexity, they have seen little to no industry adoption, and academically there has been a lack of clarity in analyzing, comparing, and extending them. This paper is a tutorial on generalized multi-leader protocols. We explain why the protocols work the way they do, what they have in common, where they differ, which parts of the protocols are straightforward, which are more subtle than they appear, and so on. In doing so, we present four new generalized multi-leader protocols, identify key insights into existing protocols, and taxonomize the space.
State machine replication protocols, like MultiPaxos and Raft, are at the heart of numerous distributed systems. To tolerate machine failures, these protocols must replace failed machines with new machines, a process known as reconfiguration. Reconfiguration has become increasingly important over time as the need for frequent reconfiguration has grown. Despite this, reconfiguration has largely been neglected in the literature. In this paper, we present Matchmaker Paxos and Matchmaker MultiPaxos, a reconfigurable consensus protocol and a reconfigurable state machine replication protocol, respectively. Our protocols can perform a reconfiguration with little to no impact on the latency or throughput of command processing; they can perform a reconfiguration in a few milliseconds; and they provide a framework that can be generalized to other replication protocols in a way that previous reconfiguration techniques cannot. We provide proofs of correctness for the protocols and optimizations, and present empirical results from an open-source implementation showing that throughput and latency do not change significantly during a reconfiguration.
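The abstract's central idea of delegating reconfiguration to dedicated nodes that track which configurations exist can be sketched roughly as follows. This is a hypothetical simplification for intuition only, not the paper's actual interface: the class name, message handler, and round-based bookkeeping are assumptions.

```python
class Matchmaker:
    """A node that records the configuration adopted by each leader round
    and tells a new leader which older configurations it must contact."""

    def __init__(self):
        self.configs = {}    # round number -> configuration (set of replica ids)
        self.promised = -1   # highest round this matchmaker has accepted

    def handle_match(self, rnd, config):
        """A leader of round `rnd` registers its configuration.

        Returns every configuration from earlier rounds (which the leader
        must intersect with during recovery), or None if the leader is
        stale and must retry with a higher round."""
        if rnd <= self.promised:
            return None
        self.promised = rnd
        self.configs[rnd] = config
        return {r: c for r, c in self.configs.items() if r < rnd}
```

For example, a leader of round 2 registering its configuration would learn the round-1 configuration from the reply, while a belated round-1 leader arriving afterwards would be rejected; this ordering is what lets a reconfiguration proceed without pausing command processing.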
When designing their performance evaluations, networking researchers often encounter questions such as: How long should a run be? How many runs to perform? How to account for the variability across multiple runs? What statistical methods should be used to analyze the data? Despite their best intentions, researchers often answer these questions differently, thus impairing the replicability of their evaluations and the confidence in their results.
In this paper, we propose a concrete methodology for the design and analysis of performance evaluations. Our approach hierarchically partitions the performance evaluation into three timescales, following the principle of separation of concerns. The idea is to understand, for each timescale, the temporal characteristics of variability sources, and then to apply rigorous statistical methods to derive performance results with quantifiable confidence in spite of the inherent variability. We implement this methodology in a software framework called TriScale. For each performance metric, TriScale computes a variability score that estimates, with a given confidence, how similar the results would be if the evaluation were replicated; in other words, TriScale quantifies the replicability of evaluations. We showcase the practicality and usefulness of TriScale on four different case studies demonstrating that TriScale helps to generalize and strengthen published results.
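The kind of "result with quantifiable confidence" described above can be illustrated with a small distribution-free computation: given the metric values from several runs, standard order statistics bound a chosen percentile at a chosen confidence level with no assumption about the underlying distribution. The sketch below is a hypothetical simplification of that idea, not TriScale's actual code.

```python
from math import comb

def coverage(n, p, l, u):
    """P(x_(l) <= p-th quantile <= x_(u)) over n sorted samples,
    using 1-indexed order statistics and the binomial distribution."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(l, u))

def percentile_ci(samples, p=0.5, confidence=0.95):
    """Distribution-free confidence interval for the p-th percentile."""
    x = sorted(samples)
    n = len(x)
    # Start at the central order statistic and widen until coverage is met.
    l = u = max(1, min(n, round(p * n)))
    while coverage(n, p, l, u) < confidence:
        if l == 1 and u == n:
            raise ValueError("too few samples for the requested confidence")
        if l > 1:
            l -= 1
        if u < n:
            u += 1
    return x[l - 1], x[u - 1]
```

With ten runs, for instance, the median is bounded at 95% confidence only by nearly the full range of observed values, which makes concrete why the number of runs must be chosen from the target confidence rather than by habit.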
Improving the standards of replicability in networking is a complex challenge. This paper is an important contribution to this endeavor; it provides networking researchers with a rational and concrete experimental methodology rooted in sound statistical foundations, and it is the first of its kind.