eScholarship
Open Access Publications from the University of California

UC Davis

UC Davis Electronic Theses and Dissertations

Applied Machine Learning for Resource Provisioning of Data-Intensive Applications on Scale-Out Platforms and Its Security Challenges

Abstract

Processing data-intensive applications is a challenging and time-consuming task that often requires massive infrastructure to ensure fast data analysis. The cloud is one of the most powerful scale-out infrastructures for performing massive computation (e.g., big data analytics), and it eliminates the need to maintain expensive high-end computing resources on the user side. The performance and cost of such infrastructure depend on the overall server configuration, including the processor, memory, network, and storage. In addition to the cost of owning or maintaining the hardware, heterogeneity in server configurations further expands the selection space, which can keep the search for a suitable configuration from converging. The challenge is further exacerbated by the dependency of the application's performance on the underlying hardware.

Despite increasing interest in resource provisioning, little work has been done on developing accurate and practical models that proactively predict the performance of data-intensive applications for a given server configuration and provision an optimal configuration online. The key challenges for current solutions are uncertainty in predictions, the cost of training, generalizability from benchmark datasets to real-world system datasets, and model interpretability.

In this dissertation, through a comprehensive real-system empirical analysis of performance, we address these challenges by introducing a proactive machine-learning-based methodology for resource provisioning. We first characterize diverse types of data-intensive workloads across different types of server architectures. This characterization helps to accurately capture the applications' behavior and to train a model that predicts their performance. We then build a set of cross-platform performance models for the applications. Based on the resulting predictive model, we use optimization techniques to identify close-to-optimal configurations that reach the performance goal.
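The predict-then-optimize flow described above can be illustrated with a minimal sketch. It is not the dissertation's method: the runtime model (runtime ~ a + b / cores), the measurements, the price list, and every name (`Config`, `fit_runtime_model`, `hourly_cost`, `cheapest_config`) are hypothetical and chosen only to show how a fitted performance model might feed a configuration search.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    """A hypothetical server configuration."""
    cores: int
    memory_gb: int


def fit_runtime_model(samples):
    """Least-squares fit of runtime ~ a + b / cores.

    This toy model is an assumption for illustration; it ignores memory,
    network, and storage, which a real performance model would include.
    """
    xs = [1.0 / cfg.cores for cfg, _ in samples]
    ys = [runtime for _, runtime in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda cfg: a + b / cfg.cores


def hourly_cost(cfg):
    """Hypothetical price list: cost grows with cores and memory."""
    return 0.05 * cfg.cores + 0.01 * cfg.memory_gb


def cheapest_config(predict, candidates, runtime_goal_s):
    """Return the cheapest candidate predicted to meet the goal, or None."""
    feasible = [c for c in candidates if predict(c) <= runtime_goal_s]
    return min(feasible, key=hourly_cost) if feasible else None


# Made-up measurements: (configuration, observed runtime in seconds).
samples = [
    (Config(2, 8), 520.0),
    (Config(4, 8), 270.0),
    (Config(8, 16), 145.0),
]
predict = fit_runtime_model(samples)

# Exhaustive search over a small, hypothetical candidate space.
candidates = [Config(c, m) for c, m in product([2, 4, 8, 16], [8, 16, 32])]
best = cheapest_config(predict, candidates, runtime_goal_s=200.0)
print(best)
```

Exhaustive enumeration is used here only because the candidate space is tiny. With a larger, heterogeneous selection space, the search step would be replaced by an actual optimization technique, as the abstract indicates.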

On the other hand, recent literature has shown that machine-learning-based models bring new security challenges, such as adversarial machine learning attacks. In this dissertation, we investigate what adversarial machine learning could target in the cloud domain and how real the risk of this new threat is. To the best of our knowledge, we are the first group to look into this area of research, as we found no reports of adversarial attacks on the resource provisioning systems (RPS) of the cloud. Our investigation shows that adversarial machine learning can be used to co-locate the adversary's virtual machines (VMs) with the victim VM in order to attack its performance. Moreover, we show that the attacker can fool the RPS and thereby evade the detection and migration it performs.
