Scholarly Works (6 results)

Thesis
Peer Reviewed

From Controlled Data-Center Environments to Open Distributed Environments: Scalable, Efficient, and Robust Systems with Extended Functionality

UC Santa Barbara Electronic Theses and Dissertations (2019)

The past two decades have witnessed several paradigm shifts in computing environments, starting with cloud computing, which offers on-demand allocation of storage, network, compute, and memory resources, as well as other services, in a pay-as-you-go billing model, and ending with the rise of permissionless blockchain technology, a decentralized computing paradigm with lower trust assumptions and a limitless number of participants. Unlike in the cloud, where all the computing resources are owned by some trusted cloud provider, permissionless blockchains allow computing resources owned by possibly malicious parties to join and leave their network without obtaining permission from some centralized trusted authority. Still, in the presence of malicious parties, permissionless blockchain networks can perform general computations and make progress. Cloud computing is powered by geographically distributed data-centers controlled and managed by trusted cloud service providers and promises theoretically infinite computing resources. On the other hand, permissionless blockchains are powered by open networks of geographically distributed computing nodes owned by entities that are not necessarily known or trusted. This paradigm shift requires a reconsideration of distributed data management protocols and distributed system designs that assume low latency across system components, inelastic computing resources, or fully trusted computing resources.

In this dissertation, we propose new system designs and optimizations that address scalability and efficiency of distributed data management systems in cloud environments. We also propose several protocols and new programming paradigms to extend the functionality and enhance the robustness of permissionless blockchains. The work presented spans global-scale transaction processing, large-scale stream processing, atomic transaction processing across permissionless blockchains, and extending the functionality and the use-cases of permissionless blockchains. In all these directions, the focus is on rethinking system and protocol designs to account for novel cloud and permissionless blockchain assumptions. For global-scale transaction processing, we propose GPlacer, a placement optimization framework that decides replica placement of fully and partially geo-replicated databases. For large-scale stream processing, we propose Cache-on-Track (CoT), an adaptive and elastic client-side cache that addresses server-side load-imbalances that occur in large-scale distributed storage layers. In permissionless blockchain transaction processing, we propose AC3WN, the first correct cross-chain commitment protocol that guarantees atomicity of cross-chain transactions. Also, we propose TXSC, a transactional smart contract programming framework. TXSC provides smart contract developers with transaction primitives. These primitives allow developers to write smart contracts without the need to reason about the anomalies that can arise due to concurrent smart contract function executions. In addition, we propose a forward-looking architecture that unifies both permissioned and permissionless blockchains and exploits the running infrastructure of permissionless blockchains to build global asset management systems.

Thesis
Peer Reviewed

Global-Scale Data Management with Strong Consistency Guarantees

UC Santa Barbara Electronic Theses and Dissertations (2017)

Global-scale data management (GSDM) empowers systems by providing higher levels of fault-tolerance, read availability, and efficiency in utilizing cloud resources. This has led to the emergence of global-scale data management and event processing. However, the Wide-Area Network (WAN) latency separating datacenters is orders of magnitude larger than typical network latencies, and this requires a reevaluation of many of the traditional design trade-offs of data management systems. Therefore, data management problems must be revisited to account for the new design space.

In this dissertation, we propose theoretical foundations to understand the limits imposed by WAN latency on GSDM, and propose practical systems and protocols to minimize the overhead caused by WAN latency. The presented work spans global-scale transaction processing, communication, analytics, and machine learning. In all these directions, the focus is on the trade-off between consistency and latency, where we ask the question: what is the best performance (often latency) we can achieve without compromising the consistency and integrity of data? For transaction processing, we propose a lower-bound formulation for transaction latency that is imposed by the WAN latency. Also, we propose a new paradigm for transaction processing (proactive coordination) that inspired our two proposed protocols, Message Futures and Helios, which can achieve the lower-bound latency. We also propose a communication framework, called Chariots, to scale multi-datacenter communication. Chariots is carefully designed to allow scaling communication while providing a consistent view of the communicated information. Finally, we explore challenges in global-scale analytics and machine learning. Specifically, we propose Ogre, a scalable system for global-scale heterogeneous transactional and analytics workloads. Also, we propose COP, a system designed to speed up machine learning on globally generated data.

Thesis
Peer Reviewed

Data Management Solutions for Tackling Big Data Variety

UC Santa Barbara Electronic Theses and Dissertations (2018)

Variety is one of the three defining characteristics of Big Data, the others being Volume and Velocity. There are several aspects of this data variety: diversity in data formats (text, video, audio) and structure (relational, graph, etc.), variety in access methodologies (OLTP, OLAP), and distribution heterogeneity within the workloads (read-heavy, high contention). Data management solutions for modern-day applications need to tackle this variety.

This dissertation provides an understanding of the challenges associated with the different elements of variety, and proposes several solutions for efficiently handling its various aspects. First, the dissertation studies the challenges related to variety in data structure and access methodologies, and the resultant heterogeneity at the data infrastructure level. Applications now employ several data-processing engines with different underlying representations, such as row, column, and graph, to process their data. We propose Janus, which introduces a novel data-movement pipeline that enables the use of different representations to support both high throughput of transactions and diverse analytics, while still ensuring consistent real-time analytics in a scale-out setting. Janus partitions the data at different representations, and allows distributed transactions and diverse partitioning strategies at the representations. Then, we propose Typhon and Cerberus, which define and enforce consistency semantics for application data spread across representations. Second, this dissertation proposes solutions for handling distribution heterogeneity within the workloads. Workloads can have skewed distributions in terms of operation type, data access, or temporal variation. We propose strongly-consistent quorum reads for Raft-like consensus protocols, which can be utilized to scale read-heavy workloads. For supporting high contention transaction workloads, we integrate an existing dynamic timestamp allocation based concurrency control mechanism in a distributed OLTP setting, and analyze its performance. Third, we study IoT applications, which have to deal with both physical heterogeneity of the sensors and diverse data-processing demands. We propose a multi-representation based architecture catering to IoT applications, and also present the initial design of M-stream, a computation framework for enabling integration and monitoring of uncertain data from multiple sensors. Through analysis, illustrative examples and extensive evaluation of the proposed protocols, this dissertation demonstrates that the proposed solutions can be employed for efficiently handling the different aspects of variety of data-intensive applications.

Thesis
Peer Reviewed

Large-Scale Data Management using Permissioned Blockchains

UC Santa Barbara Electronic Theses and Dissertations (2020)

The unique features of blockchain, such as transparency, provenance, and authenticity, are used by many large-scale data management systems to deploy a wide range of distributed applications including supply chain management, healthcare, and crowdsourcing in a permissioned setting. Unlike permissionless settings, e.g., Bitcoin, where the network is public and anyone can participate without a specific identity, a permissioned blockchain consists of a set of known, identified nodes that might not fully trust each other. While the characteristics of permissioned blockchains are appealing to a wide range of large-scale data management systems, these systems have to deal with five important challenges: confidentiality, verifiability, performance, scalability, and fault tolerance. Confidentiality of data is required in many collaborative large-scale data management applications where collaboration between enterprises, e.g., cross-enterprise transactions, should be visible to all enterprises; however, the internal data of each enterprise, e.g., internal transactions, might be confidential. Besides confidentiality, in many multi-enterprise systems, e.g., crowdworking environments, participants need to verify transactions that are initiated by other enterprises to ensure some predefined global constraints on the entire system. Thus, the system needs to support verifiability while preserving the confidentiality of transactions. Verifiability will gain in importance as crowdworking applications increase in popularity, and the need for regulation will arise. Large-scale data management applications also require high performance in terms of throughput and latency. Scalability is one of the main obstacles to business adoption of blockchain systems. To support a large-scale data management application, a blockchain system should be able to scale efficiently by adding more resources to the system. Finally, large-scale data management systems must provide fault tolerance.
Fault-tolerant protocols are the main building block of large-scale data management systems. However, in spite of years of intensive research, existing fault-tolerant protocols do not adequately address hybrid environments consisting of trusted and untrusted servers, which are widely used by enterprises. In this dissertation, we propose several techniques and develop different systems to address all five main challenges of large-scale data management using permissioned blockchains. We have developed systems called CAPER, SEPAR, ParBlockchain, SharPer, and SeeMoRe to deal with the confidentiality, verifiability, performance, scalability, and fault tolerance requirements of large-scale data management, respectively.

Thesis
Peer Reviewed

Enhancing the performance, fault tolerance, and security of distributed data management systems

UC Santa Barbara Electronic Theses and Dissertations (2022)

Individuals and enterprises produce over 2.5 exabytes (1 exabyte = 10^18 bytes) of data every day. Much of this data, including sensitive and private information, is stored with and managed by third parties, such as Amazon Web Services or Google Cloud. These companies can lose millions to billions of dollars in sales if their data access latencies increase by only a few hundred milliseconds. Achieving data fault tolerance, a necessary primitive of database systems, while maintaining low access latency is particularly challenging. Hence, reducing data access latency to improve performance and guaranteeing data fault tolerance received the highest priority while designing cloud data management systems. But the ever-growing number and sophistication of cyber attacks on the cloud, coupled with increasing legal requirements for data privacy and security (e.g., GDPR or HIPAA), have forced cloud providers to re-evaluate their priorities. However, there exists a fundamental trade-off between security and efficiency in data management systems.

This dissertation discusses designing and evaluating data management protocols that strike a balance between efficiency, fault tolerance, and security in both trusted and untrusted environments. Before being able to solve security challenges in database systems, we first delve into traditional cloud settings, which assume trust, to understand existing system designs. Existing cloud databases replicate their data to provide fault tolerance and shard (or partition) the data and store the shards on multiple servers to provide scalability. In trusted environments, we propose two solutions: G-PAC, an atomic commitment protocol that commits transactions accessing data that is both sharded and replicated, and Samya, a data system that maintains aggregate data and supports high contention write-intensive workloads. As the next step towards building secure data systems, to better understand the interplay between multiple security guarantees and performance, we study various blockchain systems, an ideal example where untrusted geo-distributed entities manage critical data.

Equipped with blockchain techniques that protect data, we build three solutions that focus on data Confidentiality, Integrity, and Availability, more popularly known as the CIA triad, which forms the pillars of secure systems. For confidentiality, this dissertation proposes ORTOA: a protocol that allows users to read or write data onto an untrusted external server without revealing the type of operation in a single round, whereas all existing solutions to hide the type of operation require two rounds of communication. For integrity, this dissertation presents Fides: a transactional database system that guarantees data integrity and provides verifiable ACID guarantees. In this work, we also propose TFCommit, the first distributed transaction commitment protocol that tolerates up to n − 1 maliciously failing servers (out of n servers) without using expensive data replication. And for availability, we propose QuORAM: the first fully fault-tolerant Oblivious RAM datastore that guarantees data privacy by hiding access patterns of users along with the contents of data.

Article
Peer Reviewed

Preserving Location Privacy in Geosocial Applications

UC Santa Barbara Previously Published Works (2014)

Using geosocial applications, such as FourSquare, millions of people interact with their surroundings through their friends and their recommendations. Without adequate privacy protection, however, these systems can be easily misused, for example, to track users or target them for home invasion. In this paper, we introduce LocX, a novel alternative that provides significantly improved location privacy without adding uncertainty into query results or relying on strong assumptions about server security. Our key insight is to apply secure user-specific, distance-preserving coordinate transformations to all location data shared with the server. The friends of a user share this user's secrets so they can apply the same transformation. This allows all location queries to be evaluated correctly by the server, but our privacy mechanisms guarantee that servers are unable to see or infer the actual location data from the transformed data or from the data access. We show that LocX provides privacy even against a powerful adversary model, and we use prototype measurements to show that it provides privacy with very little performance overhead, making it suitable for today's mobile devices. © 2014 IEEE.
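
The idea of a distance-preserving coordinate transformation can be illustrated with a minimal sketch. This is not LocX's actual scheme: it assumes planar coordinates and a hypothetical per-user secret made of a rotation angle and a shift, and the names `Secret`, `transform`, and `inverse` are invented for illustration. A rotation followed by a translation leaves Euclidean distances unchanged, so a server holding only transformed points can still answer distance-based queries among points transformed with the same secret.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Secret:
    """Hypothetical per-user secret: rotation angle (radians) and a shift."""
    theta: float
    dx: float
    dy: float


def transform(secret, point):
    """Rotate the point by theta, then shift it. Distances are preserved."""
    x, y = point
    c, s = math.cos(secret.theta), math.sin(secret.theta)
    return (c * x - s * y + secret.dx, s * x + c * y + secret.dy)


def inverse(secret, point):
    """Undo the shift, then rotate by -theta to recover the original point."""
    x, y = point[0] - secret.dx, point[1] - secret.dy
    c, s = math.cos(secret.theta), math.sin(secret.theta)
    return (c * x + s * y, -s * x + c * y)


def distance(a, b):
    """Euclidean distance between two planar points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])
```

Anyone who holds the same `Secret` (in the abstract's terms, a friend of the user) can call `inverse` on points returned by the server, while the server itself works only with transformed coordinates.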