Exploiting iteration-level parallelism in declarative programs
- Author(s): Roy, John M.A.
- et al.
In order to achieve viable parallel processing three basic criteria must be met: (1) the system must provide a programming environment which hides the details of parallel processing from the programmer; (2) the system must execute efficiently on the given hardware; and (3) the system must be economically attractive.
The first criterion can be met by providing the programmer with an implicit rather than explicit programming paradigm. In this way ali of the synchronization and distribution are handled automatically. To meet the second criterion, the system must perform synchronization and distribution in such a way that the available computing resources are used to their utmost. And to meet the third criterion, the system must not require esoteric or expensive hardware to achieve efficient utilization.
This dissertation reports on the Process-Oriented Dataflow System (PODS), which meets all of the above criteria. PODS uses a hybrid von Neumann-Dataflow model of computation supported by an automatic partitioning and distribution scheme. The new partitioning and distribution algorithm is presented along with the underlying principles. Four new mechanisms for distribution are presented: (1) a distributed array allocation operator for data distribution; (2) a distributed L operator for code distribution; (3) a range filter for restriction index ranges for different PEs; and (4) a specialized apply operator for functional parallelism.
Simulations show that PODS balances communication overhead with distributed processing to achieve efficient parallel execution on distributed memory multiprocessors. This is partially due to a new software array caching scheme, called remote caching, which greatly reduces the amount of remote memory reads. PODS is designed to use off-the-shelf components, with no specialized hardware. In this way a real PODS machine can be built quickly and cost effectively. The system is currently being retargeted to the Intel iPSC/2 so that it can be run on commercially available equipment.