In recent years, scientific exploration has become more reliant upon computers. Certain scientific frontiers, such as the study of fluid dynamics, are difficult to study empirically, because measurement devices interfere with what they are measuring. By using computers to simulate known low-level physical interactions, scientists can reproduce higher-level phenomena more readily than they can through physical experiments, and at a much lower cost. With the growth of high-performance computing, computer simulations can generate enormous amounts of data.
Analyzing large-scale scientific data is expensive, both in computational power and time and in programming effort. In some fields, such as Computational Fluid Dynamics (CFD), scientists are still searching for mathematical models to explain observations. Such models are part of the scientific inquiry and so are continually under development. As a consequence, data analysis in these fields is largely ad hoc: scientists ask many different queries of the data, and these queries are not known ahead of time. Current libraries do a poor job of optimizing the execution of such ad hoc queries.
In this thesis we introduce Saaz, a C++ library for analyzing turbulent flow. While Saaz makes analysis codes easier to write, maintain, and share, it significantly harms performance. Compared to plain C++ (C++ with no user-defined abstractions), Saaz code can run up to 97 times slower. To address this problem we present Tettnang, a source-to-source translator that uses semantic knowledge of the Saaz library to track the data schemas being used. Saaz library calls are then rewritten at the call site to remove the abstraction overheads of the library. After being optimized by Tettnang, ad hoc queries written in Saaz perform comparably to the plain C++ implementation. Of the queries we tested, the slowest was 18% slower than the plain C++ implementation, while the fastest was 16% faster.
Our success with Tettnang demonstrates the power and effectiveness of custom translation in creating an Embedded Domain-Specific Language (EDSL) from an application library. The EDSL technique allows powerful transformations that are not available through existing techniques such as Expression Templates, or existing library annotation systems such as Broadway.