Toward Scalable, Reliable and Efficient Big Data Publish Subscribe Systems
eScholarship
Open Access Publications from the University of California

UC Irvine

UC Irvine Electronic Theses and Dissertations

Abstract

Societal-scale notification systems have transformed how people request and receive information today; traffic notifications, extreme weather alerts (NOAA alerts), social media feeds (Twitter), and public health systems (COVID exposure alerts) are examples. At the heart of several such systems are publish-subscribe architectures, in which users proactively subscribe to events of interest and receive notification messages when such events occur. Distributed publish-subscribe paradigms leverage distributed broker networks already in place to match and route these events to interested subscribers, which increases the scale and scope of information dissemination in such systems. The rise of big data platforms and cloud computing technologies plays an important role in transforming messaging platforms into societal-scale notification systems. In this thesis, we propose and design emerging Big Data Publish Subscribe (BDPS) systems - scalable hierarchical architectures for the next generation of enriched and customized notification systems. BDPS systems combine: i) the advantages of popular Big Data Management Systems (BDMS), namely scalable storage, efficient query processing, and massive data ingestion from heterogeneous publishers and sources, which together generate a vast number of enriched and customized notifications; and ii) distributed publish-subscribe broker networks for scalable delivery of such notifications to interested end-users. We explore three challenging problems concerning the scalability, resilience, and efficiency of BDPS architectures under dynamic conditions. First, we investigate the problem of potentially skewed load distributions among brokers, which arise from the dynamic nature of the systems. We develop a multistage adaptive load balancing framework for handling these dynamically skewed load distributions, which degrade the performance of the systems and their ability to efficiently disseminate notifications to subscribers.
Next, we address the issue of fault tolerance in the broker network. We propose REAPS (REliable Active Publish Subscribe) - a primary-backup fault tolerance framework that can handle different classes of broker failures, including randomized failures and geographically correlated failures (e.g., in a natural disaster). REAPS exploits subscription similarity among brokers and applies quasi-active state replication to keep overhead low while supporting fast recovery and delivery guarantees for notification services. Finally, to maximize user benefit and fairness among users, we develop techniques for prioritizing and scheduling the delivery of important notifications to end-users at the brokers when the systems experience high workloads and techniques such as load balancing are not applicable, e.g., in situations where large volumes of notifications must be disseminated in a short time. Overall, we explore and develop services for creating scalable, reliable, and efficient BDPS systems that can ingest petabytes of data and generate millions of enriched and customized notifications that reach millions of users in milliseconds.
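
As a rough illustration of two ideas named in the abstract (matching events to subscribers at a broker, and delivering the most important notifications first when capacity is limited), the following is a minimal sketch in Python. It is not the thesis's design: the `Broker` class, its method names, the integer `priority` field, and the `budget` parameter are all hypothetical, and topic-based matching with a simple priority queue is assumed purely for illustration.

```python
import heapq
from collections import defaultdict
from itertools import count


class Broker:
    """Toy in-memory broker (hypothetical): topic matching plus priority-ordered delivery."""

    def __init__(self):
        self.subscriptions = defaultdict(set)  # topic -> set of subscriber ids
        self.queue = []                        # min-heap of pending deliveries
        self._seq = count()                    # tie-breaker: FIFO order within one priority

    def subscribe(self, subscriber, topic):
        self.subscriptions[topic].add(subscriber)

    def publish(self, topic, message, priority=0):
        # Match the event against subscriptions; enqueue one delivery per subscriber.
        for subscriber in sorted(self.subscriptions.get(topic, ())):
            # heapq is a min-heap, so negate priority to pop important items first.
            heapq.heappush(self.queue, (-priority, next(self._seq), subscriber, message))

    def deliver(self, budget):
        """Deliver at most `budget` pending notifications, highest priority first."""
        delivered = []
        while self.queue and len(delivered) < budget:
            _, _, subscriber, message = heapq.heappop(self.queue)
            delivered.append((subscriber, message))
        return delivered


broker = Broker()
broker.subscribe("alice", "weather")
broker.subscribe("bob", "weather")
broker.subscribe("bob", "traffic")
broker.publish("traffic", "road closed", priority=1)
broker.publish("weather", "tornado warning", priority=5)
print(broker.deliver(budget=2))  # only two deliveries fit in this round
```

With a budget of two, the higher-priority weather alert is delivered to both of its subscribers before the earlier traffic notice, which stays queued for a later round. A real system would also need to account for fairness among users, which this sketch does not attempt.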
