Cross-System Runtime Prediction of Parallel Applications on Multi-Core Processors
Predicting the performance of parallel applications is useful in several domains of software operation. In the commercial world, it is often useful to anticipate how an application will perform on a customer’s machine while placing minimal burden on the user. In the same spirit, it is in the best interest of a user of computational software to operate it as efficiently as possible. In supercomputing and distributed computing, anticipating the performance of an application on a set of compute nodes allows one to select a better set of nodes to execute on. In a large-scale shared computing environment, where parallel computational jobs are assigned resources and scheduled for execution, doing so well can improve overall throughput by decreasing contention. In all cases, being able to anticipate the ideal degree of parallelism to invoke during execution (and to have reasonable expectations for what can be achieved) leads to better use of all resources involved. For any of this to be possible, one or more good models are required that can not only capture an application’s performance on one machine but also predict its behavior on another.
Here, we present a large family of performance models composed of discrete parts, all combinatorial variations on Amdahl’s Law. We establish a protocol for thoroughly benchmarking the application on a known system, and a second protocol for collecting meaningful machine architecture and performance information for the known and target machines. With the resulting high-quality models and a single execution of the application on the target system, we are able to closely predict its parallel behavior. We propose that computational applications in need of this kind of treatment are sufficiently sophisticated and, especially in the case of commercial applications, are most likely black boxes. We therefore avoid any static analysis of the applications and rely expressly on the parallel runtimes of individual executions. The protocols and methods can be implemented by any skilled developer on conceivably any parallel platform, without the need for specialized APIs, hardware diagnostic support, or any manner of reverse engineering of the applications of interest.
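To make the overall idea concrete, the following is a minimal sketch that uses only the textbook form of Amdahl's Law, T(n) = T(1) * (s + (1 - s) / n), and not the family of model variations presented here. It fits a serial fraction s to benchmark runtimes from a known system and then combines it with a single one-thread runtime measured on the target system. The function names, the least-squares fit, and the assumption that the serial fraction carries over unchanged between machines are all illustrative simplifications, not the protocol described above.

```python
def amdahl_runtime(t1, serial_fraction, n):
    """Textbook Amdahl's Law: runtime on n threads given one-thread runtime t1."""
    return t1 * (serial_fraction + (1.0 - serial_fraction) / n)


def fit_serial_fraction(threads, runtimes):
    """Least-squares fit of the serial fraction s from benchmark data.

    Rearranging T(n)/T(1) = s + (1 - s)/n gives
        T(n)/T(1) - 1/n = s * (1 - 1/n),
    a line through the origin with slope s.
    Hypothetical helper: requires a run at n = 1 and at least one run at n > 1.
    """
    t1 = runtimes[threads.index(1)]
    num = 0.0
    den = 0.0
    for n, t in zip(threads, runtimes):
        x = 1.0 - 1.0 / n
        y = t / t1 - 1.0 / n
        num += x * y
        den += x * x
    if den == 0.0:
        raise ValueError("need at least one measurement with n > 1")
    return num / den


def predict_on_target(threads, runtimes, target_t1, target_n):
    """Predict the target-system runtime from known-system benchmarks and a
    single one-thread execution on the target (assumes s is machine-independent)."""
    s = fit_serial_fraction(threads, runtimes)
    return amdahl_runtime(target_t1, s, target_n)
```

For example, `predict_on_target([1, 2, 4, 8], known_runtimes, target_t1, 4)` would estimate the four-thread runtime on the target machine. A real cross-system model would also need to account for architectural differences between the two machines, which this sketch ignores.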