Open Access Publications from the University of California

UC Riverside Electronic Theses and Dissertations

Adapting Data Representations for Optimizing Data-Intensive Applications


The need for efficiently managing Big Data has grown due to the large sizes of data sets encountered by real-world Big Data applications. For example, large-scale graph analytics involves processing graphs with billions of nodes or edges. Because of the compute- and data-intensive nature of data analytics, programmers face multiple challenges in trying to achieve efficiency. This dissertation presents two novel approaches for handling data to scale up the performance of Big Data applications.

The first approach supports multiple in-memory physical representations (data structures) for data and dynamically selects, or switches to, the best representation for the given data/workload characteristics. For example, in a graph processing application where the graph evolves over time, our approach switches to the best data structure as the characteristics of the graph (e.g., graph density) change over time. Similarly, in a key-value store, when the relative frequencies of different types of queries change over time, we switch to a data structure more efficient for that query load. Our programming and compiler support produces adaptive applications that automatically switch data structures on the fly with little overhead and without the developer worrying about safety. Our evaluation shows that our adaptive applications enjoy average speedups of 1.6x for graph applications and an average throughput increase of 1.4x for a key-value store, compared to their non-adaptive versions.
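As an illustrative sketch only (not the dissertation's actual system or API), the idea of density-driven representation switching can be pictured with a hypothetical `AdaptiveGraph` container that starts with an adjacency list (good for sparse graphs) and migrates to an adjacency matrix once edge density crosses a threshold:

```python
class AdaptiveGraph:
    """Sketch of dynamic representation switching for an evolving graph.

    Starts with an adjacency list (compact for sparse graphs) and switches
    to an adjacency matrix (O(1) edge lookups for dense graphs) when the
    edge density exceeds a threshold. A real system would amortize the
    migration cost and guarantee safety during the switch.
    """
    DENSITY_THRESHOLD = 0.5  # illustrative value, not from the dissertation

    def __init__(self, num_nodes):
        self.n = num_nodes
        self.rep = "list"
        self.adj = {v: set() for v in range(num_nodes)}
        self.edges = 0

    def density(self):
        # Fraction of possible directed edges that are present.
        max_edges = self.n * (self.n - 1)
        return self.edges / max_edges if max_edges else 0.0

    def add_edge(self, u, v):
        if self.rep == "list":
            if v not in self.adj[u]:
                self.adj[u].add(v)
                self.edges += 1
        else:
            if not self.adj[u][v]:
                self.adj[u][v] = True
                self.edges += 1
        self._maybe_switch()

    def has_edge(self, u, v):
        if self.rep == "list":
            return v in self.adj[u]
        return self.adj[u][v]

    def _maybe_switch(self):
        # Migrate to the dense representation once the graph grows dense.
        if self.rep == "list" and self.density() > self.DENSITY_THRESHOLD:
            matrix = [[False] * self.n for _ in range(self.n)]
            for u, nbrs in self.adj.items():
                for v in nbrs:
                    matrix[u][v] = True
            self.adj, self.rep = matrix, "matrix"
```

The compiler support described in the abstract handles what this sketch glosses over: choosing switch points from observed workload characteristics and performing the migration without breaking concurrent accesses.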

The second approach employs data transformations to create alternate data representations that accelerate shared-memory and out-of-core applications. A large input graph is transformed into smaller graphs using a sequence of data transformations. Execution time reductions are achieved using a processing model that effectively runs the original iterative algorithm in two phases: first, using the reduced input graph to reduce execution time; and second, using the original input graph along with the results from the first phase to compute precise results. In the context of out-of-core applications, this model reduces I/O cost by creating a smaller graph that can be held in memory during the first phase. For parallel graph applications, our approach yields speedups of up to 2.14x and 1.81x for in-memory and out-of-core processing respectively, compared to applications running on untransformed data.
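The two-phase model can be sketched with a simple shortest-paths example (an assumption for illustration; the dissertation's transformations and algorithms are more general). If the reduced graph is obtained by dropping edges, its distances are valid upper bounds on the true distances, so phase two can warm-start from them and still converge to exact results:

```python
import math

def relax_until_fixed(edges, dist):
    # Bellman-Ford-style relaxation: repeat until no distance improves.
    changed = True
    while changed:
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
    return dist

def two_phase_sssp(full_edges, reduced_edges, num_nodes, source):
    # Phase 1: solve on the smaller, transformed graph (cheap; for
    # out-of-core workloads this graph fits in memory, cutting I/O).
    dist = [math.inf] * num_nodes
    dist[source] = 0
    dist = relax_until_fixed(reduced_edges, dist)
    # Phase 2: refine on the full graph. Because the reduced graph only
    # drops edges, phase-1 distances are upper bounds, and the
    # warm-started relaxation converges to the precise answer.
    return relax_until_fixed(full_edges, dist)
```

The payoff is that phase two starts close to the fixed point, so most iterations (and, out of core, most disk reads of the full graph) are avoided relative to running the algorithm on the original graph from scratch.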
