Skip to main content
eScholarship
Open Access Publications from the University of California

Optimizing Interactive Development of Data-Intensive Applications.

  • Author(s): Interlandi, M
  • Tetali, SD
  • Gulzar, MA
  • Noor, J
  • Condie, T
  • Kim, M
  • Millstein, T
  • et al.
Abstract

Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad-hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of these queries used to derive the final program. Yet, each successive execution of a slightly modified query is performed anew, which can significantly increase the development cycle. Vega is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage Vega to significantly reduce the amount of time it takes to re-execute a modified Spark program, reducing the overall time to market for their Big Data applications.

Many UC-authored scholarly publications are freely available on this site because of the UC Academic Senate's Open Access Policy. Let us know how this access is important for you.

Main Content
Current View