This paper investigates the idle time associated with a parallel
computation, that is, the time that processors are idle because they are either
waiting for data from other processors or waiting to synchronize with other
processors. We study doubly-nested loops corresponding to parallelogram- or
trapezoidal-shaped iteration spaces that have been parallelized by the
well-known tiling transformation. We introduce the notion of rise r, which
relates the shape of the iteration space to that of the tiles. For
parallelogram-shaped iteration spaces, we show that when r -2, the idle time
is linear in P, the number of processors, but when r -1, it is quadratic in P.
In the context of hierarchical tiling, where multiple levels of tiling are
used, a good choice of rise can lead to less idle time and better performance.
While idle time is not the only cost that should be considered in evaluating a
tiling strategy, current architectural trends (of deeper memory hierarchies and
multiple levels of parallelism) suggest it has increasing importance.
Pre-2018 CSE ID: CS1999-0617