A partitioning approach for GPU accelerated level-based on-chip variation static timing analysis
- Author(s): Zhang, Michael Longqiang
- et al.
Technology and design trends have made timing analysis the bottleneck of electronic design automation (EDA) tools. Efficient and accurate timing analysis is a challenge that the EDA industry must overcome in order to move forward. The LLC-OCV model leverages physical location, path level, and cell type information to further increase timing accuracy. This model increases data complexity because delays must be maintained for each unique path level. We parallelize this computation for co-processing on a CUDA-enabled GPU. We introduce a novel divide-and-conquer partitioning approach for computing the per-level delay data used in the level-based aspect of LLC-OCV. Partitioning the circuit graph halves the inherently serial topological traversal of the graph, at the price of a costly but more parallel merge step that combines the solutions of the two partitions. A massively parallel GPU-based approach allows us to absorb the cost of merging by performing it in parallel. Our experimental results on the ISCAS '85 benchmark demonstrate that our parallel algorithm scales with timing graph size more efficiently than the serial algorithm. The results also show that our partitioning approach allows us to more fully utilize the massively parallel computational resources of the GPU. Our experiments on artificial test cases demonstrate that the parallel algorithms outperform the serial algorithm on large non-linear graph structures. We also find that LOCV timing analysis is a memory-bound computation. We expect our algorithm to perform better on the newer Fermi architecture because of its new cached memory architecture.
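The partition-and-merge idea described in the abstract can be illustrated with a small CPU-side sketch. This is not the CUDA implementation from the work itself; it is a minimal Python illustration under stated assumptions: the timing graph is a DAG given in topological order, a "level" is taken to be the number of edges on a path, and the merge step is assumed to be a max-plus combination of the per-level tables of the two partitions. All function and variable names (`per_level_delays`, `partitioned_per_level_delays`, `mid`) are hypothetical.

```python
from collections import defaultdict

NEG_INF = float("-inf")


def per_level_delays(edges, order, sources):
    """Serial pass: for each node, the worst delay for every path level.

    edges   -- {(u, v): delay}, every u and v must appear in `order`
    order   -- nodes in topological order
    sources -- nodes where paths start (level 0, delay 0)
    """
    table = {n: {} for n in order}
    for s in sources:
        table[s][0] = 0.0
    fanout = defaultdict(list)
    for (u, v), d in edges.items():
        fanout[u].append((v, d))
    for u in order:  # inherently serial topological traversal
        for v, d in fanout[u]:
            for level, t in table[u].items():
                cand = t + d
                if cand > table[v].get(level + 1, NEG_INF):
                    table[v][level + 1] = cand
    return table


def partitioned_per_level_delays(edges, order, sources, mid):
    """Assumed divide-and-conquer variant: split the order at `mid`, then merge."""
    first, second = order[:mid], order[mid:]
    in_first = set(first)
    left_edges = {e: d for e, d in edges.items() if e[0] in in_first}
    right_edges = {e: d for e, d in edges.items() if e[0] not in in_first}

    # Delays from the sources up to (and including) the first node in `second`.
    left = per_level_delays(left_edges, order, sources)
    result = {n: dict(left[n]) for n in first}
    result.update({n: {} for n in second})

    # Boundary nodes: nodes of the second half where a path can enter it.
    boundary = [n for n in second if left[n]]
    for b in boundary:  # each iteration is independent, hence parallelizable
        right = per_level_delays(right_edges, second, [b])
        for v in second:
            # Merge step: max-plus combination of the two per-level tables.
            for i, t_in in left[b].items():
                for j, t_out in right[v].items():
                    cand = t_in + t_out
                    if cand > result[v].get(i + j, NEG_INF):
                        result[v][i + j] = cand
    return result


# Small illustrative graph (made-up delays).
edges = {("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "d"): 3.0,
         ("c", "d"): 1.0, ("b", "e"): 1.0, ("d", "e"): 2.0}
order = ["a", "b", "c", "d", "e"]
serial = per_level_delays(edges, order, ["a"])
split = partitioned_per_level_delays(edges, order, ["a"], mid=2)
```

In this sketch the serial traversal is shortened to the two halves of the order, while the extra work moves into the merge loops, whose iterations do not depend on one another. That independence is the property a GPU implementation would presumably exploit.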