Speedup of banded linear recurrences in the presence of resource constraints
An m-th order linear recurrence system of N equations computes Xi =Ci+ L:!~f-m aijXj for 1 ::; i ::; N. Linear recurrences have a role of central importance in computer design, numerical analysis, program analysis, image processing and vision. However, programs containing banded linear recurrences are difficult to parallelize due to loop-carried dependences. In this paper, we first present a family of schedules, called the exact schedules, for parallel evaluation of low order ( m ::; 2) banded linear recurrences with an execution time (2m2 + 3m)N/(p + (m(m + 1)(2m + 1))/(2(2 + llog mj))) for 0 < m::; 2 , N > (p + 5)(2p + 3)/6 and number of processors p > m. We show that the exact schedules achieve the strict time lower bound under matrix multiplication model. Next, we derive another family of schedules, called the pipelined schedules, with better program-space efficiency and with an execution time of pipeline startup time+ (2m2 + 3m)N /(p + (2m + 1)/2) for m = 1, and pipeline startup time+ (6m + 2)N /(p + (2m + 1) for m > 1, m < p ::; 4m + 1, and pipeline startup time+(2m2 +3m)N/(p+(m-1)(2m+1)) form> l,p > 4m+l. This is the first parallel algorithm that achieves this time bound, which improved the fastest prevoiusly published algorithm by a factor 2:'. (p+ 2m2 - m-1)/(p+ m + 1/2) form> 1. We illustrate the technique by parallelizing loops containing linear recurrences and demonstrate the available speedup on a VLIW architecture with experimental results obtained using our pipelined schedules.