eScholarship
Open Access Publications from the University of California

Resilient 3D Network-on-Chip Design and Analysis

  • Author(s): Yaghini, Pooria M.
  • Advisor(s): Bagherzadeh, Nader
  • et al.
Creative Commons Attribution 4.0 International Public License
Abstract

Like every other major change in computer architecture, exascale computing, targeted for 2020, requires dramatic and unanticipated shifts in perspective. The biggest challenge facing this trend is to design an exascale system with a hundredfold reduction in the estimated power cost of more than $2.5B per year for a system designed with current technology. It has been reported that a large portion of total power is consumed by communication through the interconnection network. Communication between the computational components of System-on-Chip (SoC) designs can account for more than 25 percent of the energy dissipation of the whole system. Network-on-Chip (NoC) is recognized by many researchers as the best communication infrastructure for many-core systems. To lower communication power, researchers have proposed designing thinned and stacked 3D ICs. 3D ICs, fabricated using Through-Silicon Vias (TSVs), offer higher bandwidths, smaller form factors, shorter wire lengths, lower power, and better performance than traditional 2D ICs. The combination of 3D structures and NoC is the most promising approach for meeting the projected performance and power requirements of exascale systems.

Besides the extremely constrained power budget, achieving an acceptable level of resiliency for 1,000,000 cores in an exascale system is a crucial challenge. Communication reliability plays a key role because of the huge amount of data movement in these systems.

In this dissertation, the focus is to identify, characterize, and mitigate the reliability threats of TSV-based 3D communication structures, specifically threats introduced by TSV-to-TSV coupling faults.

In the first step, the identification of reliability threats, the potential physical faults of a baseline TSV-based 3D NoC architecture are classified by targeting two-dimensional (2D) NoC components and their inter-die connections. Subsequently, TSV issues, thermal concerns, and Single Event Effects (SEEs) are investigated and categorized in order to propose evaluation metrics for inspecting the resiliency of 3D NoC designs.

Then, in the second step, following an overview of the common TSV issues, a framework is proposed for quantifying 3D NoC reliability using formal methods. TSV issues are modeled as a time-invariant failure probability, and a reliability criterion for TSV-based NoC is defined. The relationship between NoC reliability and TSV failure is quantified. For the first time, the reliability criterion is reduced to a tractable closed-form expression that requires a single Monte Carlo simulation.
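
The idea of tying a reliability figure to a fixed per-TSV failure probability can be illustrated with a toy sketch. This is not the dissertation's criterion or framework: it assumes a single vertical link that works only if all of its TSVs work, and that TSVs fail independently. The names `estimate_link_reliability`, `closed_form_link_reliability`, `num_tsvs`, and `p_fail` are hypothetical. Under those assumptions, a Monte Carlo estimate can be checked against the closed form (1 - p_fail)^num_tsvs.

```python
import random


def link_survives(num_tsvs, p_fail, rng):
    """One trial: the link works only if every TSV works (toy assumption)."""
    return all(rng.random() >= p_fail for _ in range(num_tsvs))


def estimate_link_reliability(num_tsvs, p_fail, trials=20000, seed=0):
    """Monte Carlo estimate of the probability that the whole link works."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    working = sum(link_survives(num_tsvs, p_fail, rng) for _ in range(trials))
    return working / trials


def closed_form_link_reliability(num_tsvs, p_fail):
    """Closed form under the same assumption of independent TSV failures."""
    return (1.0 - p_fail) ** num_tsvs


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the dissertation.
    print(estimate_link_reliability(32, 0.001))
    print(closed_form_link_reliability(32, 0.001))
```

In this toy setting the two numbers should agree up to sampling noise; a real NoC criterion would also have to account for topology, routing, and any spare TSVs.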

In the third step, a system-level TSV coupling fault model is proposed, which models the capacitive coupling effect, considering thermal impact, at circuit-level accuracy. This model can be plugged into any system-level or RTL-level TSV-based 3D-IC data-oriented simulator. Once the TSV coupling effects have been analyzed and recorded at circuit level, they are applied to the TSVs dynamically in system-level simulations at runtime through precise monitoring and calculation. The proposed fault model is potentially useful for evaluating the reliability of 3D many-core applications in which TSV coupling may lead to failure.

With the TSV coupling fault modeling framework in place, multiple coding approaches are proposed to prevent coupling faults on TSV links. In these approaches, the coupling fault effect is addressed by diagnosing the hazardous current flow direction patterns of the TSV bus and encoding the data bits to avoid those patterns at runtime. Different coding schemes are devised to address both types of TSV coupling, inductive and capacitive. These approaches are designed to be low-overhead, fast, and highly efficient. Simulations are performed with both random and realistic benchmarks, including PARSEC, to demonstrate the efficacy of the devised approaches. All of these approaches are also implemented at the hardware level to obtain a realistic estimate of the imposed overheads at the logic level. Experimental results show that these approaches significantly improve communication reliability over TSV links, with no extra TSVs and negligible information redundancy or hardware logic overhead.
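
The general idea of encoding data to avoid hazardous switching patterns can be sketched with a much simpler stand-in technique. The code below is not one of the dissertation's coding schemes: it is a plain invert-or-not code on a one-dimensional bus, where a "hazard" is assumed to be two adjacent lines switching in opposite directions. It also spends one extra flag line, which the dissertation's schemes reportedly avoid. All function names are hypothetical.

```python
def transitions(prev, nxt, width):
    """Per-line transition: +1 rising, -1 falling, 0 unchanged."""
    return [((nxt >> i) & 1) - ((prev >> i) & 1) for i in range(width)]


def opposing_pairs(prev, nxt, width):
    """Count adjacent line pairs that switch in opposite directions."""
    t = transitions(prev, nxt, width)
    return sum(1 for a, b in zip(t, t[1:]) if a * b == -1)


def encode(prev_sent, data, width):
    """Send data or its bitwise inverse, whichever causes fewer hazards.

    Returns (word_on_bus, invert_flag). The flag line itself is ignored
    when counting hazards, which is a simplification of this sketch.
    """
    mask = (1 << width) - 1
    inverted = ~data & mask
    if opposing_pairs(prev_sent, inverted, width) < opposing_pairs(prev_sent, data, width):
        return inverted, 1
    return data, 0


def decode(word, flag, width):
    """Undo the optional inversion at the receiver."""
    mask = (1 << width) - 1
    return (~word & mask) if flag else word


if __name__ == "__main__":
    word, flag = encode(0b0101, 0b1010, 4)
    print(bin(word), flag, bin(decode(word, flag, 4)))
```

In the usage example, sending 0b1010 after 0b0101 would make every adjacent pair switch in opposite directions, so the encoder sends the inverted word with the flag set and the receiver restores the original data.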

Overall, this work provides a rich set of TSV coupling-avoidance techniques, along with an accurate and fast simulation framework for TSV coupling fault modeling, for the efficient and effective design of reliable 3D communication architectures. It helps DFT designers design robust TSV links more easily.
