Electronic design automation (EDA) is an essential part of the integrated circuit (IC) industry and has been evolving together with design and fabrication technologies. This evolution is reflected in both the algorithmic and the software implementation perspectives. Innovative algorithms deliver accurate and reliable results in shorter computation time, and thus save human effort and R\&D cost. Well-designed software utilizes hardware resources efficiently and maximizes computing performance.
However, EDA now faces many complex challenges in current VLSI technology. As integration scales into the sub-90~nm regime, IC performance is becoming less predictable. Variations arising from the manufacturing process ultimately lead to parametric yield loss. To cope with this yield loss, efficient algorithms are required to accurately predict circuit performance at the design stage. Unfortunately, given the high complexity of VLSI systems, software tools built on traditional algorithms and implementations suffer from the ``curse of dimensionality''. For instance, the Monte Carlo (MC) method, the most trustworthy way to capture the statistical behavior of a design, becomes inefficient because a large number of samples is needed to analyze the variations of the circuit response accurately. Moreover, many verification tasks require transient or frequency-domain simulation of the full-chip design in order to guarantee an optimized product. A typical power grid contains over a billion nodes, and traditional transient simulation takes several days to compute its time-domain response. There is therefore a strong trend toward parallel computing on platforms such as multi-threaded CPUs and general-purpose GPUs.
The objective of this thesis is to study the difficult issues in the aforementioned simulations and to present our algorithm and software solutions, which reduce the computation cost without sacrificing the accuracy of the results. We also exploit modern multi-core and many-core architectures, such as multi-core CPUs and general-purpose GPUs, to accelerate our simulations. The computational independence of MC samples is exploited, and we have developed a GPU-parallel Monte Carlo analysis based on a symbolic technique. Our parallel MC analysis of circuit transfer functions has been verified against statistics extracted from classical MC, and the proposed method is shown to be effective. We also study an optimization-based method that derives performance bounds as a non-Monte Carlo alternative; the resulting bounds are accurate without being over-conservative.
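As an illustration only, and not the implementation used in this thesis, the following CUDA sketch shows how the computational independence of MC samples maps onto the GPU: each thread evaluates one sample of a transfer-function magnitude under Gaussian parameter variation. A first-order RC low-pass response stands in for the symbolically derived transfer functions of our method, and the kernel and parameter names (\texttt{mc\_transfer}, \texttt{sigma}, etc.) are hypothetical.
\begin{verbatim}
// Sketch only: one GPU thread per Monte Carlo sample of |H(jw)|.
// The closed-form RC low-pass H(s) = 1/(1 + sRC) is an assumed stand-in
// for a symbolically derived transfer function.
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

__global__ void mc_transfer(float *mag, int nSamples, float omega,
                            float r0, float c0, float sigma, unsigned seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nSamples) return;

    // Independent RNG stream per sample: no inter-thread communication.
    curandState st;
    curand_init(seed, i, 0, &st);

    // Perturb R and C with relative Gaussian process variation.
    float R = r0 * (1.0f + sigma * curand_normal(&st));
    float C = c0 * (1.0f + sigma * curand_normal(&st));

    // |H(jw)| = 1 / sqrt(1 + (wRC)^2) for the assumed RC low-pass.
    float wrc = omega * R * C;
    mag[i] = rsqrtf(1.0f + wrc * wrc);
}

int main()
{
    const int n = 1 << 20;                      // one million MC samples
    float *d_mag, *h_mag = new float[n];
    cudaMalloc(&d_mag, n * sizeof(float));

    mc_transfer<<<(n + 255) / 256, 256>>>(d_mag, n, 6.2831853f * 1e6f,
                                          1e3f, 1e-9f, 0.1f, 1234u);
    cudaMemcpy(h_mag, d_mag, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Host-side reduction for brevity: sample mean of |H(jw)|.
    double mean = 0.0;
    for (int i = 0; i < n; ++i) mean += h_mag[i];
    printf("mean |H| = %f\n", mean / n);

    cudaFree(d_mag); delete[] h_mag;
    return 0;
}
\end{verbatim}
Because each sample is evaluated by its own thread with its own random-number stream, no synchronization is needed until the final statistical reduction, which is exactly the property that makes MC analysis well suited to the GPU.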
To accelerate the linear algebra operations in equation-solving tasks, we apply the GPU to the fine-grained tasks inside the GMRES solver and attain significant speedup over the traditional CPU method. The management of different levels of GPU resources, such as thread organization and memory assignment, is discussed so that the data-parallel strength of the GPU is fully exploited and its weaknesses, such as high memory latency, are hidden by its superb data throughput. All algorithms and implementations are demonstrated with representative numerical experiments and thorough comparisons among different methods and platforms.
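To make the fine-grained GPU tasks inside GMRES concrete, the sketch below shows a simple CUDA sparse matrix-vector product in CSR format, the operation a GMRES iteration launches most often; the kernel name and the one-thread-per-row mapping are illustrative assumptions rather than the exact scheme used in our solver.
\begin{verbatim}
// Sketch only: CSR sparse matrix-vector product y = A*x, one thread per row.
// In a full GMRES solver, similar kernels would also cover the AXPY and
// dot-product steps of the Arnoldi orthogonalization.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void spmv_csr(int nRows, const int *rowPtr, const int *colIdx,
                         const double *val, const double *x, double *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nRows) return;

    double sum = 0.0;
    for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
        sum += val[j] * x[colIdx[j]];   // irregular gather; memory-bound
    y[row] = sum;
}

int main()
{
    // Toy 3x3 matrix [[4,1,0],[1,4,1],[0,1,4]] in CSR form.
    int    h_rowPtr[] = {0, 2, 5, 7};
    int    h_colIdx[] = {0, 1, 0, 1, 2, 1, 2};
    double h_val[]    = {4, 1, 1, 4, 1, 1, 4};
    double h_x[]      = {1, 1, 1};
    double h_y[3];

    int *d_rowPtr, *d_colIdx; double *d_val, *d_x, *d_y;
    cudaMalloc(&d_rowPtr, sizeof(h_rowPtr));
    cudaMalloc(&d_colIdx, sizeof(h_colIdx));
    cudaMalloc(&d_val,    sizeof(h_val));
    cudaMalloc(&d_x,      sizeof(h_x));
    cudaMalloc(&d_y,      sizeof(h_y));
    cudaMemcpy(d_rowPtr, h_rowPtr, sizeof(h_rowPtr), cudaMemcpyHostToDevice);
    cudaMemcpy(d_colIdx, h_colIdx, sizeof(h_colIdx), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val,    h_val,    sizeof(h_val),    cudaMemcpyHostToDevice);
    cudaMemcpy(d_x,      h_x,      sizeof(h_x),      cudaMemcpyHostToDevice);

    spmv_csr<<<1, 32>>>(3, d_rowPtr, d_colIdx, d_val, d_x, d_y);
    cudaMemcpy(h_y, d_y, sizeof(h_y), cudaMemcpyDeviceToHost);
    printf("y = [%g, %g, %g]\n", h_y[0], h_y[1], h_y[2]);  // expect [5, 6, 5]

    cudaFree(d_rowPtr); cudaFree(d_colIdx);
    cudaFree(d_val); cudaFree(d_x); cudaFree(d_y);
    return 0;
}
\end{verbatim}
Such memory-bound kernels are where thread organization and memory assignment matter most: throughput depends on how well the irregular accesses to \texttt{x} are served by the GPU memory hierarchy.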