Scalable Causal Message Logging for Wide-Area Networks
eScholarship
Open Access Publications from the University of California


Abstract

Wide-area systems are gaining popularity as an infrastructure for running scientific applications. From a fault-tolerance perspective, these environments are challenging because of their scale and inherent variability. Causal message logging protocols have properties that make them well suited to such environments: they spread fault-tolerance information throughout the system, providing high availability, and this information can also be used to replicate objects that would otherwise be inaccessible because of network partitions. However, current causal message logging protocols do not scale to thousands or millions of processes. We describe the Hierarchical Causal Logging Protocol (HCML), which uses a hierarchy of shared logging sites, or proxies, to reduce space requirements exponentially. These proxies also act as caches for fault-tolerance information and reduce the overall message overhead of causal message logging protocols by as much as 50%. In addition, HCML exploits differences in bandwidth between communicating processes by piggybacking more fault-tolerance information over high-bandwidth links, which improves overall message latency by as much as 97%.
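HCML builds on the basic mechanism of causal message logging: each process records a determinant for every nondeterministic receive event and piggybacks determinants that are not yet stable onto its outgoing messages, so that recovery information spreads through the system. The sketch below illustrates only that flat baseline, not HCML itself, under stated assumptions. The names `Determinant`, `Process`, `unstable`, and the failure bound `f` are hypothetical. A determinant is treated as stable once at least f+1 processes are known to hold it. There are no proxies, caches, or bandwidth-aware policies.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Determinant:
    # Records one nondeterministic event: message number `ssn` from `sender`
    # was delivered as the `rsn`-th receive event at `receiver`.
    sender: int
    ssn: int
    receiver: int
    rsn: int


@dataclass
class Process:
    pid: int
    f: int  # hypothetical: number of concurrent failures to tolerate
    # Each known determinant maps to the set of pids this process believes hold it.
    log: dict[Determinant, set[int]] = field(default_factory=dict)
    send_seq: int = 0
    recv_seq: int = 0

    def unstable(self) -> list[Determinant]:
        # A determinant is stable once at least f+1 processes hold it;
        # until then it must keep being piggybacked.
        return [d for d, holders in self.log.items() if len(holders) <= self.f]

    def send(self, dest: int, payload):
        self.send_seq += 1
        # Piggyback every unstable determinant with a copy of its holder set.
        piggyback = {d: set(self.log[d]) for d in self.unstable()}
        return (self.pid, dest, self.send_seq, payload, piggyback)

    def receive(self, msg):
        sender, dest, ssn, payload, piggyback = msg
        assert dest == self.pid, "message delivered to the wrong process"
        # Merge piggybacked determinants; this process now also holds them.
        for d, holders in piggyback.items():
            known = self.log.setdefault(d, set())
            known |= holders
            known.add(self.pid)
        # Log a determinant for this delivery event.
        self.recv_seq += 1
        self.log[Determinant(sender, ssn, self.pid, self.recv_seq)] = {self.pid}
        return payload


# Usage: with f = 1, a determinant created at p0 becomes stable
# once it has been piggybacked to one other process.
p0, p1, p2 = Process(0, 1), Process(1, 1), Process(2, 1)
p0.receive(p1.send(0, "a"))
p2.receive(p0.send(2, "b"))
```

Because every process may end up holding determinants that originate anywhere in the system, storage in this flat scheme grows with the number of processes. According to the abstract, HCML addresses this by having shared proxies take over that storage role; the sketch does not model proxies.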

Pre-2018 CSE ID: CS2000-0651
