Ultra fine-grain percolation scheduling
Author(s):
- Kolson, David J.
- Dutt, Nikil
- Nicolau, Alexandru
- et al.
Percolation Based Synthesis (PBS) was previously proposed as a new approach to scheduling in High-Level Synthesis and demonstrated favorable results. PBS adapts well-founded compiler techniques and has several desirable properties (e.g., flexibility, completeness, and optimality). However, the PBS scheduler operates at the granularity of functional units. Schedules generated at this level do not necessarily achieve the best obtainable performance, because they do not exploit the lower-level aspects of the functional units. Scheduling at a lower level, and using the information available there, can produce more efficient and effective schedules. We term this lower-level granularity the ultra fine-grain level; its resources correspond to the hardware components that the control unit sequences during each clock (sub-)cycle. The schedules produced with this technique are therefore suitable for control unit construction. Scheduling at this level allows the ultra fine-grain code scheduler to apply parallelization techniques in the presence of low-level constraints and to utilize the underlying architecture in novel ways. In this paper we discuss an enhancement of PBS that allows the ultra fine-grain level to be exploited. Although one might expect the finer granularity to increase code size beyond manageability, our experiments show that on the standard benchmark set this was not a prohibitive factor. Indeed, the increase in the run-time of the code scheduler is within acceptable limits given the resulting increase in parallelism. Further, scheduling ultra fine-grain code could potentially increase the number of simultaneous register file (RF) accesses, an undesirable side-effect because building register files with a large number of ports is not currently practical. However, our experiments indicate that for the benchmarks studied the maximum number of simultaneous RF accesses can actually decrease.
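To make the register-file concern concrete, here is a toy Python model. It is not the PBS algorithm and not the paper's cost model. It assumes, hypothetically, that each operation reads its source registers and writes one destination register, that every register is written at most once (so only true data dependences matter), and that operations are listed in a valid program order. The "coarse" model places each operation ASAP as one atomic step and counts all of its reads and its write in that step. The "fine" model splits each operation into a read sub-step, `exec_substeps` execute sub-steps, and a write sub-step. Both use plain ASAP placement rather than percolation's transformations. The names `Op`, `coarse_events`, `fine_events`, `peak_rf_accesses`, and `OPS` are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical operation: reads `srcs` from the RF, writes `dst` back."""
    name: str
    srcs: tuple[str, ...]
    dst: str


def coarse_events(ops):
    """Functional-unit granularity: one atomic step per op, placed ASAP.
    All source reads and the destination write are counted in that step."""
    ready, events = {}, []  # ready: register -> first step its value is usable
    for op in ops:  # assumes ops are in a valid program (topological) order
        t = max((ready.get(r, 0) for r in op.srcs), default=0)
        events.append((t, len(op.srcs) + 1))
        ready[op.dst] = t + 1
    return events


def fine_events(ops, exec_substeps=1):
    """Ultra fine-grain toy: read sub-step, execute sub-step(s), write sub-step.
    A consumer may read a register only in the sub-step after it is written."""
    ready, events = {}, []
    for op in ops:
        t_read = max((ready.get(r, 0) for r in op.srcs), default=0)
        t_write = t_read + 1 + exec_substeps
        events.append((t_read, len(op.srcs)))  # RF reads
        events.append((t_write, 1))             # RF write
        ready[op.dst] = t_write + 1
    return events


def peak_rf_accesses(events):
    """Maximum number of RF accesses that fall in the same (sub-)step."""
    per_step = defaultdict(int)
    for step, n in events:
        per_step[step] += n
    return max(per_step.values(), default=0)


OPS = [
    Op("add1", ("b", "c"), "a"),
    Op("add2", ("e", "f"), "d"),
    Op("mul", ("a", "d"), "g"),
]

print("coarse peak:", peak_rf_accesses(coarse_events(OPS)))  # coarse peak: 6
print("fine peak:", peak_rf_accesses(fine_events(OPS)))      # fine peak: 4
```

For the three operations in `OPS`, the coarse model counts 6 RF accesses in its first step (four reads plus two writes). The fine model places the reads and writes in different sub-steps and peaks at 4. This shows, in miniature, one way that separating reads from writes can lower the peak. It does not reproduce the paper's experiments.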