# UC San Diego

**Technical Reports** 

# Title

Charge-Matching Tail Approximation in a Piece-wise Linear-and-Exponential Function

# Permalink

https://escholarship.org/uc/item/1md7d77z

# Author

Liu, Bao

# Publication Date

2005-10-12

Peer reviewed

# Charge-Matching Tail Approximation in a Piece-wise Linear-and-Exponential Function

Bao Liu

University of California, San Diego Computer Science and Engineering Department 9500 Gilman Dr., La Jolla, CA 92093 bliu@cs.ucsd.edu

Abstract: A gate output signal transition in DSM designs is approximated by a ramp function followed by an exponential attenuation tail. We observe that the "effective capacitance" equals the gate output charge delivered during the ramp signal transition time at the gate output, assuming a unit supply voltage. We propose a tail approximation scheme which matches the "remaining capacitance," such that the total gate output charge equals the total load capacitance under a unit supply voltage. We model a driving point signal transition as a piece-wise linear-and-exponential (PWLE) function in accordance with the saturation and the linear behaviors of a driving transistor. We present tail approximation and interconnect delay calculation formulas for PWLE functions. Our experimental results on industry design test cases show an average of 3.8% (8.6%) and a maximum of 15.4% (16.7%) accuracy improvement over existing ramp-based driving point signal transition approximation schemes in interconnect delay (transition time) calculation.

## I. INTRODUCTION

Gate-level static timing analysis (STA) achieves orders-of-magnitude speedups over full-waveform time-domain simulation (e.g., SPICE). A typical STA implementation calculates gate delays and output transition times from input transition times and output load capacitances using table lookup. To account for the downstream resistive shielding effect in the deep sub-micron (DSM) domain, the lumped total load capacitance is replaced with an *effective capacitance* [2, 12, 15].

Traditional delay calculators fit a signal transition with a ramp function. Ramps are good fits for signal transitions in traditional designs, which have negligible interconnect resistances and lumped interconnect capacitances. *With a single load capacitance and a constant gate output current, the driving point signal transition is a ramp.* In DSM designs with larger interconnect resistances, faster signal transition times, and lower transistor threshold voltages, gates are more likely to operate with a linear output resistance, which results in a more significant exponential attenuation at the driving point of an interconnect (Figure 3). Effective capacitances can therefore under-estimate signal transition times and interconnect delays. A *tail approximation* procedure has been proposed to adjust the ramp fit [12].
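For reference, the ramp in the italicized statement follows from integrating a constant current into a lumped capacitance. The symbols $I_0$ (constant output current) and $C_L$ (lumped load capacitance) are introduced here only for illustration:

$$V_d(t) = \frac{1}{C_L} \int_0^t I_0 \, d\tau = \frac{I_0}{C_L} t, \qquad T_r = \frac{C_L V_{dd}}{I_0},$$

so the driving point voltage rises linearly until it reaches $V_{dd}$ at $t = T_r$.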

Ramps are inherently ill-suited to capturing the exponential attenuation effect. A signal transition at the driving point of an RC network cannot be accurately approximated by any signal transition which drives a single capacitance (Figure 3). Differences in transition time threshold voltage definitions among industry libraries can also result in different transition times and calculated delays.

A gate output resistance following a ramp excitation has been proposed to capture the exponential attenuation effect [3, 14]. Such methods suffer from accuracy problems, since transistor output resistances vary in magnitude during signal transitions. Linearizing a transistor I-V curve would be too time-consuming for timing analysis [4].

We propose to model the signal transition at a driving gate output as a piece-wise linear-and-exponential (PWLE) function, in accordance with the saturation and the linear behaviors of a driving transistor during a signal transition. Parameters of a PWLE function are extracted such that the total charge is preserved. We also present interconnect delay calculation formulas for PWLE functions. Our experimental results show an average of 3.8% (8.6%) and a maximum of 15.4% (16.7%) accuracy improvement in interconnect delay (transition time) on industry design test cases.

We review effective capacitance computation and suggest matching "remaining capacitances" for tail approximation in Section II. We present a PWLE function which models a gate output voltage in Section III. We derive formulas for charge-matching-based tail approximation in Section IV, and formulas for interconnect delay calculation in Section V. We present our experimental results in Section VI and conclude in Section VII.

## II. CHARGE MATCHING

Fig. 1. A transistor output current $I_d$ drives a distributed RC interconnect load which is simplified in a $\Pi$ model.

Fig. 2. Time domain waveform of a transistor output current $I_d$ which drives a distributed RC network (e.g., in a $\Pi$ model as shown in Figure 1) with a driving point signal transition from $t = 0$ to $T_r$. Integration of the output current to infinite time gives the total load capacitance. Integration of the output current from $t = 0$ to $T_r$ gives the effective capacitance. The rectangle area gives the near end capacitance.

An effective capacitance is computed by matching the average gate output current up to the gate output transition time [2]. We observe that this is equivalent to matching the driving transistor output charge (i.e., the integral of the driving transistor output current) up to the completion of the ramp signal transition at the driving transistor output. Thereafter, the driving transistor acts as a passive resistor while the RC network continues to charge; the total transistor output charge (i.e., the integral of the output current to infinite time) equals the total load capacitance. With a single load capacitance, the effective and total load capacitances are identical. Note that the transistor output current with a distributed RC network load increases exponentially, while it stays constant with a single load capacitance and a ramp driving voltage. This may be partially responsible for delay mismatches in effective-capacitance-based gate delay calculation (Figures 1 and 2).
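As a small illustration of this charge view, the following Python sketch evaluates the effective and remaining capacitances for a $\Pi$ load under an ideal ramp at the driving point. The closed form, the function names, and the component values are assumptions made for this example; they are not taken from the test cases in this report, and a real driving point waveform is not an ideal ramp.

```python
import math

def pi_model_charges(c_near, r, c_far, t_r, v_dd=1.0):
    """Charge driven into a Pi load (c_near, r, c_far) by an ideal ramp 0 -> v_dd over t_r.

    Returns (charge delivered during 0..t_r, charge delivered as t -> infinity).
    """
    tau = r * c_far
    # Far-end voltage at t = t_r: single-pole RC response to a ramp input.
    v_far = (v_dd / t_r) * (t_r - tau * (1.0 - math.exp(-t_r / tau)))
    q_ramp = c_near * v_dd + c_far * v_far
    q_total = (c_near + c_far) * v_dd
    return q_ramp, q_total

def effective_capacitance(c_near, r, c_far, t_r):
    """With a unit supply voltage, the charge up to t_r equals the effective capacitance."""
    q_ramp, _ = pi_model_charges(c_near, r, c_far, t_r, v_dd=1.0)
    return q_ramp

# Hypothetical values: 50 fF near end, 500 ohm, 180 fF far end, 100 ps ramp.
c_eff = effective_capacitance(50e-15, 500.0, 180e-15, 100e-12)
c_remaining = (50e-15 + 180e-15) - c_eff   # the "remaining capacitance"
```

Under these assumptions the driving current is $(V_{dd}/T_r)(C_{near} + C_{far}(1 - e^{-t/\tau}))$ with $\tau = R C_{far}$, so it starts at the near end value and rises toward its asymptote, which is the shape sketched in Figure 2. As the resistance shrinks, `c_eff` approaches the total capacitance and the remaining capacitance vanishes.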

We propose a tail approximation scheme which matches the "remaining capacitance," i.e.,  $C_{total} - C_{eff}$ . Our method (Algorithm 1) is based on a piece-wise linear-and-exponential (PWLE) driving voltage function as follows.

## III. TRANSISTOR OUTPUT VOLTAGE IN A PWLE FUNCTION

We propose a piece-wise linear-and-exponential function in modeling gate output voltage as follows (for a rising signal transition).

$$\frac{V_d(t)}{V_{dd}} = \begin{cases} 0 & t < 0\\ t/T_r & 0 < t < T_t\\ 1 - \kappa e^{\rho(t - T_t)} & T_t < t, \end{cases}$$
(1)

where the exponential function with rate $\rho < 0$ (i.e., a time constant of $-1/\rho$) starts at time $T_t$ and voltage $V_d(T_t) = (1 - \kappa)V_{dd}$. $T_t$ is the time at which the driving transistor switches from saturation to linear behavior with $V_{ds} \leq V_{gs} - V_{th}$. Assuming a fast input transition, e.g., $V_{gs}(T_t) = V_{dd}$, we have $V_{ds} = V_{dd} - V_{th}$, or $V_d(T_t) = V_{th}$. Hence,

$$\kappa = 1 - \frac{V_{th}}{V_{dd}}$$

$$T_t = \frac{V_{th}}{V_{dd}}T_r.$$
(2)
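A minimal Python sketch of Equations 1 and 2 follows. The function names are illustrative, `rho` is taken to be the (negative) exponent coefficient that appears in Equation 1, and the value used for it below is only a placeholder.

```python
import math

def pwle_params(v_th, v_dd, t_r):
    """Equation 2: tail amplitude kappa and switching time T_t."""
    kappa = 1.0 - v_th / v_dd
    t_t = (v_th / v_dd) * t_r
    return kappa, t_t

def pwle(t, t_r, kappa, t_t, rho):
    """Equation 1: normalized V_d(t)/V_dd for a rising transition.

    rho is the exponent coefficient and must be negative for a decaying tail.
    """
    if t < 0.0:
        return 0.0
    if t < t_t:
        return t / t_r
    return 1.0 - kappa * math.exp(rho * (t - t_t))

# Threshold and supply voltages of the experiment section; t_r and rho are hypothetical.
kappa, t_t = pwle_params(v_th=0.2, v_dd=1.0, t_r=100e-12)
v_at_switch = pwle(t_t, 100e-12, kappa, t_t, rho=-2.0e10)
```

With $V_{th} = 0.2V$ and $V_{dd} = 1.0V$, $\kappa = 0.8$ and $T_t = 0.2\,T_r$, so the linear piece covers only the first 20% of the swing and the two pieces meet at $V_{th}/V_{dd}$.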

**Algorithm 1:** Delay Calculation with PWLE Functions

**Input:** distributed interconnect, timing library

**Output:** gate and interconnect delays

1. Interconnect model order reduction
2. Compute $C_{eff}$, gate delay, and gate output transition $T_r$ by matching $C_{eff} = \int_0^{T_r} I_d dt$
3. Derive $\kappa$ and $T_t$ from $V_{th}$
4. Compute $R_d C_{eff}$ by fitting gate output transition
5. Compute $C_s = \int_0^{T_t} I_d dt$ from driving point impedance
6. Compute $\rho$ with Equation 5
7. Calculate interconnect delay with Equation 13

## IV. MATCHING REMAINING CAPACITANCE

Output charge of a transistor from t = 0 to  $T_t$  is given by

$$Q_s = \int_0^{T_t} I_d dt$$

where the transistor output current  $I_d$  is obtained from the signal transition and the interconnect impedance at the driving point.

Output charge of a transistor from $T_t$ to $t = \infty$ is given by

$$Q_r = \int_{T_t}^{\infty} I_d dt = \int_{T_t}^{\infty} \frac{V_{dd} - V_d}{R_d} dt$$
(3)

where $R_d$ is the transistor output resistance and $V_{dd} - V_d$ is the voltage across it. From the PWLE function (Equation 1), with $\rho < 0$,

$$\int_{T_t}^{\infty} (V_{dd} - V_d) dt = V_{dd} \int_{T_t}^{\infty} \kappa e^{\rho(t-T_t)} dt = -\frac{\kappa V_{dd}}{\rho}$$

In matching the total driving transistor output charge with the total load capacitance,

$$Q_{total} = Q_s + Q_r = C_{total} V_{dd} \tag{4}$$

and writing $Q_s = C_s V_{dd}$, we have

$$\rho = -\frac{\kappa}{R_d (C_{total} - C_s)} = -\left(\kappa \frac{C_{eff}}{C_{total} - C_s}\right) \frac{1}{R_d C_{eff}}$$
(5)

where $R_d C_{eff}$ is obtained by fitting an exponential signal transition to the ramp signal transition at the gate output, e.g., at the upper and the lower transition threshold voltages.
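The charge balance can be checked numerically. The Python sketch below assumes that $\rho$ is the negative exponent coefficient of Equation 1, that the tail current for a rising transition is $(V_{dd} - V_d)/R_d$, and that $C_s = Q_s / V_{dd}$; the function names and the numeric values are hypothetical.

```python
import math

def tail_rate(kappa, r_d, c_total, c_s):
    """Exponent coefficient rho (< 0) whose tail charge equals (c_total - c_s) * v_dd."""
    return -kappa / (r_d * (c_total - c_s))

def tail_charge_numeric(kappa, r_d, rho, v_dd=1.0, n=20000):
    """Midpoint-rule integral of (v_dd - V_d)/r_d over the tail, measured from T_t."""
    t_end = 40.0 / -rho              # 40 time constants; the remainder is negligible
    dt = t_end / n
    total = 0.0
    for i in range(n):
        t_mid = (i + 0.5) * dt
        total += v_dd * kappa * math.exp(rho * t_mid) / r_d * dt
    return total

# Hypothetical values: kappa = 0.8, R_d = 200 ohm, C_total = 230 fF, C_s = 30 fF.
rho = tail_rate(0.8, 200.0, 230e-15, 30e-15)
q_r = tail_charge_numeric(0.8, 200.0, rho)   # expected to be close to 200 fF * 1 V
```

For these values the tail has a time constant of $-1/\rho = 50$ ps, and the integrated tail charge matches the remaining capacitance $C_{total} - C_s$ at a unit supply voltage.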

## V. INTERCONNECT DELAY CALCULATION

A distributed R(L)C interconnect can be represented by the following transfer function, obtained with model reduction techniques [5, 6, 9, 11]:

$$H(s) = \sum_{i} \frac{k_i}{s - p_i} \tag{6}$$

in Laplace domain or

$$H(t) = \sum_{i} k_i e^{p_i t} \tag{7}$$

in time domain, where  $k_i$  and  $p_i$  are residues and poles, respectively. A time domain output function  $V_o(t)$  is obtained by applying convolution to an input function  $V_i(t)$  and an interconnect transfer function H(t):

$$V_o(t) = \int_0^t V_i(t-\tau)H(\tau)d\tau.$$
 (8)

For a step input function

$$\frac{V_i(t)}{V_{dd}} = \begin{cases} 0 & t < 0\\ 1 & t > 0, \end{cases}$$
(9)

output signal function is given by

$$\frac{V_o(t)}{V_{dd}} = \sum_i \frac{k_i}{p_i} e^{p_i t} + 1.$$
 (10)

For a ramp input function

$$\frac{V_i(t)}{V_{dd}} = \begin{cases} 0 & t < 0 \\ t/T_r & 0 < t < T_r \\ 1 & T_r < t \end{cases}$$
(11)

with transition time  $T_r$ , output signal function is given by

$$\frac{V_o(t)}{V_{dd}} = \begin{cases} \frac{t}{T_r} + \sum_i \frac{1}{T_r} \frac{k_i}{p_i^2} (e^{p_i t} - 1) & t < T_r \\ 1 + \sum_i \frac{1}{T_r} \frac{k_i}{p_i^2} (e^{p_i t} - e^{p_i (t - T_r)}) & T_r \le t. \end{cases}$$
(12)
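Equations 10 and 12 translate directly into code once the poles and residues are known. The sketch below is in Python and uses a hypothetical two-pole transfer function with unit DC gain (so that $-\sum_i k_i / p_i = 1$, as Equation 10 requires); the helper names and pole values are assumptions for illustration.

```python
import math

def two_pole_residues(p1, p2):
    """Residues of H(s) = p1*p2 / ((s - p1)(s - p2)), which has unit DC gain."""
    k1 = p1 * p2 / (p1 - p2)
    k2 = p1 * p2 / (p2 - p1)
    return [k1, k2]

def step_response(t, poles, residues):
    """Equation 10: normalized output voltage for a unit step input."""
    if t < 0.0:
        return 0.0
    return 1.0 + sum(k / p * math.exp(p * t) for p, k in zip(poles, residues))

def ramp_response(t, t_r, poles, residues):
    """Equation 12: normalized output voltage for a saturated ramp of duration t_r."""
    if t < 0.0:
        return 0.0
    if t < t_r:
        s = sum(k / p**2 * (math.exp(p * t) - 1.0) for p, k in zip(poles, residues))
        return t / t_r + s / t_r
    s = sum(k / p**2 * (math.exp(p * t) - math.exp(p * (t - t_r)))
            for p, k in zip(poles, residues))
    return 1.0 + s / t_r

poles = [-1.0e10, -5.0e10]             # hypothetical poles, in 1/s
residues = two_pole_residues(*poles)
v_far = ramp_response(150e-12, 100e-12, poles, residues)
```

Two quick sanity checks on such an implementation are that the two branches of `ramp_response` agree at `t = t_r`, and that a very short ramp reproduces `step_response`.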

Similarly, for our proposed PWLE voltage function, output function is given by:

$$\frac{V_o(t)}{V_{dd}} = \begin{cases} \frac{t}{T_r} + \sum_i \frac{1}{T_r} \frac{k_i}{p_i^2} (e^{p_i t} - 1) & t < T_t \\ 1 + \sum_i \left( \left(\kappa \frac{k_i}{p_i} - \frac{1}{T_r} \frac{k_i}{p_i^2}\right) e^{p_i (t - T_t)} + \frac{1}{T_r} \frac{k_i}{p_i^2} e^{p_i t} + \frac{\kappa k_i}{p_i - \rho} \left(e^{\rho(t - T_t)} - e^{p_i (t - T_t)}\right) \right) & T_t \le t. \end{cases}$$
(13)
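One way to gain confidence in Equation 13 is to compare it against a direct numerical evaluation of the convolution in Equation 8. The Python sketch below does this for a hypothetical two-pole interconnect and hypothetical PWLE parameters; it treats $\rho$ as the negative exponent coefficient of Equation 1 and assumes $\rho \neq p_i$ for every pole.

```python
import math

def pwle_input(t, t_r, t_t, kappa, rho):
    """Equation 1, normalized to V_dd."""
    if t < 0.0:
        return 0.0
    if t < t_t:
        return t / t_r
    return 1.0 - kappa * math.exp(rho * (t - t_t))

def pwle_response(t, t_r, t_t, kappa, rho, poles, residues):
    """Equation 13: normalized output voltage for a PWLE input (requires rho != p_i)."""
    if t < 0.0:
        return 0.0
    if t < t_t:
        s = sum(k / p**2 * (math.exp(p * t) - 1.0) for p, k in zip(poles, residues))
        return t / t_r + s / t_r
    dt = t - t_t
    total = 1.0
    for p, k in zip(poles, residues):
        total += (kappa * k / p - k / (t_r * p**2)) * math.exp(p * dt)
        total += k / (t_r * p**2) * math.exp(p * t)
        total += kappa * k / (p - rho) * (math.exp(rho * dt) - math.exp(p * dt))
    return total

def convolve_numeric(t, v_in, poles, residues, n=20000):
    """Midpoint-rule evaluation of the convolution integral in Equation 8."""
    d_tau = t / n
    acc = 0.0
    for i in range(n):
        tau = (i + 0.5) * d_tau
        h = sum(k * math.exp(p * tau) for p, k in zip(poles, residues))
        acc += v_in(t - tau) * h * d_tau
    return acc

# Hypothetical two-pole interconnect with unit DC gain, and hypothetical PWLE parameters.
poles = [-1.0e10, -5.0e10]
residues = [1.25e10, -1.25e10]
t_r, t_t, kappa, rho = 100e-12, 20e-12, 0.8, -2.0e10

closed = pwle_response(150e-12, t_r, t_t, kappa, rho, poles, residues)
numeric = convolve_numeric(150e-12, lambda x: pwle_input(x, t_r, t_t, kappa, rho),
                           poles, residues)
```

In this setup `closed` and `numeric` should agree to within the integration error, and the two branches of `pwle_response` should meet at `t = t_t` because $1 - \kappa = T_t / T_r$.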

Consider a ramp input with an extremely slow transition time (i.e., $t < T_r$ and $e^{p_i t} \ll 1$), where Equation 12 becomes

$$\frac{V_o(t)}{V_{dd}} = \frac{t}{T_r} - \sum_i \frac{k_i}{p_i^2} \frac{1}{T_r}.$$
 (14)



Fig. 3. An actual driving point signal transition with a "tail" of exponential attenuation differs from the signal transitions derived from the effective and the total interconnect capacitances, respectively.

Setting $V_o(t)/V_{dd} = 0.5$ and subtracting the time $T_r/2$ at which the input ramp crosses 50% gives the following interconnect delay

$$t_d = \sum_i \frac{k_i}{p_i^2},\tag{15}$$

which equals the first moment  $m_1$  or the Elmore delay[7]:

$$t_d^{Elmore} = m_1 = \int_0^\infty t H(t) dt = \sum_i \frac{k_i}{p_i^2}.$$
 (16)

This shows that the 50% RC interconnect delay equals the Elmore delay in the limit of an infinitely slow input transition.<sup>1</sup>
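The limit can also be observed numerically. The following Python sketch finds the 50% delay of the ramp response in Equation 12 by bisection and compares it with $\sum_i k_i / p_i^2$; the two-pole system, the helper names, and the ramp durations are hypothetical, and the bisection assumes a monotonic response.

```python
import math

def ramp_response(t, t_r, poles, residues):
    """Equation 12: normalized output voltage for a saturated ramp of duration t_r."""
    if t < 0.0:
        return 0.0
    if t < t_r:
        s = sum(k / p**2 * (math.exp(p * t) - 1.0) for p, k in zip(poles, residues))
        return t / t_r + s / t_r
    s = sum(k / p**2 * (math.exp(p * t) - math.exp(p * (t - t_r)))
            for p, k in zip(poles, residues))
    return 1.0 + s / t_r

def delay_50(t_r, poles, residues):
    """50%-to-50% delay: output crossing time (by bisection) minus input crossing time t_r/2."""
    lo = 0.0
    hi = t_r + 50.0 / min(abs(p) for p in poles)   # response has settled well before hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ramp_response(mid, t_r, poles, residues) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - 0.5 * t_r

def elmore_delay(poles, residues):
    """Equation 16: first moment of the impulse response."""
    return sum(k / p**2 for p, k in zip(poles, residues))

# Hypothetical two-pole interconnect with unit DC gain.
poles = [-1.0e10, -5.0e10]
residues = [1.25e10, -1.25e10]

slow = delay_50(100e-9, poles, residues)   # ramp far slower than the interconnect
fast = delay_50(1e-12, poles, residues)    # nearly a step input
elmore = elmore_delay(poles, residues)
```

For this example the slow-ramp delay converges to the Elmore value, while the near-step delay is smaller, which is consistent with the Elmore delay acting as an upper bound for RC trees [7].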

## VI. EXPERIMENT

We implemented our non-ramp interconnect delay calculation scheme in C++ and ran our experiments on several industry design test cases, which include an interconnect driven by an 8X low threshold ($V_{th} = 0.2V$, $V_{dd} = 1.0V$) driver in a $0.13\mu m$ design. The interconnect, which consists of 120 resistors and 121 capacitors with a total capacitance of 0.23pF (and no coupling capacitance), is scaled by a factor k = 1, 2, 4, 8, or 16 to represent long interconnects or interconnects in future process technologies.

Table I compares SPICE simulation and STA results on signal transitions and propagation delays at the near end (or driving point) and the far end of a load interconnect with four approximated driving point signal transitions: (1) an effective-capacitance-derived ramp transition from [12]; (2) an effective-capacitance-derived ramp transition with tail approximation adjustment [12]; (3) an effective-capacitance-derived ramp transition from [2]; and (4) our proposed charge-matching PWLE-based signal transition. We make the following observations.

<sup>&</sup>lt;sup>1</sup>Another proof is given in [7].

**Observation 1** All existing signal transition approximation schemes could under-estimate interconnect delays and (far end) transition times.

**Observation 2** All existing signal transition approximation schemes suffer increasing accuracy loss with increasing resistive and capacitive loads.

**Observation 3** Interconnect delay is increasingly sensitive to transition time with a fast input transition (e.g., from a strong driver<sup>2</sup>) and a large capacitive load (e.g., of a long interconnect).

**Observation 4** *Our non-ramp signal transition approximation scheme achieves an average of* 3.8%(8.6%) *and maximum of* 15.4%(16.7%) *accuracy improvement over existing approaches in interconnect delay (far end transition time) calculation.*<sup>3</sup>

## VII. CONCLUSION

More significant interconnect resistances, faster signal transition times, and lower transistor threshold voltages in DSM designs cause signal transitions to deviate from ramp functions. Tail approximation becomes more critical to achieving accurate interconnect delay calculation. We observe that the "effective capacitance" equals the driving gate output charge during the gate output ramp transition time, and propose a tail approximation method which matches the "remaining capacitance," such that the total driving gate output charge equals the total load capacitance with a unit supply voltage. Our method is based on a piece-wise linear-and-exponential (PWLE) function which captures the driving point signal transition in accordance with the saturation and the linear behaviors of the driving transistor. Our experimental results show an average of 3.8% (8.6%) and a maximum of 15.4% (16.7%) accuracy improvement in interconnect delay (transition time) calculation.

DSM technology requires improved accuracy in delay calculation (e.g., for static timing analysis). Capturing signal transitions with waveforms other than ramp functions becomes increasingly important to achieving accurate delay calculation in DSM designs [1, 8]. We demonstrate improved STA accuracy by combining a non-ramp function with the current ramp-based transistor characterization scheme. Our ongoing efforts address crosstalk and supply voltage drop effects.

## REFERENCES

- [1] C. S. Amin, F. Dartu and Y. I. Ismail, "Weibull Based Analytical Waveform Model," in *Proc. International Conference on Computer-Aided Design*, 2003, pp. 161-168.

- [2] F. Dartu, N. Menezes, J. Qian and L. Pileggi, "A Gate-Delay Model for High-Speed CMOS Circuits," in *Proc. Design Automation Conference*, 1994, pp. 576-579.
- [3] F. Dartu and L. Pileggi, "Calculating Worst-Case Gate Delays Due to Dominant Capacitance Coupling," in *Proc. Design Automation Conference*, 1997, pp. 46-51.
- [4] F. Dartu and L. Pileggi, "TETA: Transistor-Level Engine for Timing Analysis," in *Proc. Design Automation Conference*, 1998, pp.
- [5] L. M. Elfadel and D. D. Ling, "Block Rational Arnoldi Algorithm for Multipoint Passive Model-Order Reduction of Multiport RLC Networks," in *Proc. International Conference on Computer-Aided Design*, 1997, pp. 66-71.
- [6] P. Feldmann and R. W. Freund, "Efficient Linear Circuit Analysis by Pade Approximation via the Lanczos Process," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, 14(5), 1995, pp. 639-649.
- [7] R. Gupta, B. Krauter, B. Tutuianu, J. Willis and L. T. Pileggi, "The Elmore Delay as a Bound for RC Trees with Generalized Input Signals," in *Proc. Design Automation Conference*, 1995, pp. 364-369.
- [8] M. Hashimoto, Y. Yamada and H. Onodera, "Capturing Crosstalk-Induced Waveform for Accurate Static Timing Analysis," *Proc. International Symposium on Physical Design*, 2003, pp. 18-23.
- [9] A. Odabasioglu, M. Celik and L. T. Pileggi, "PRIMA: Passive Reduced-Order Interconnect Macromodeling Algorithm," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, 17(8), 1998, pp. 645-654.
- [10] P. Penfield, J. Rubinstein and M. A. Horowitz, "Signal Delay in RC Tree Networks," *IEEE Trans. Computer-Aided Design*, 2(3), 1983, pp. 202-211.
- [11] L. T. Pillage and R. A. Rohrer, "Asymptotic waveform evaluation for timing analysis," *IEEE Trans. on Computer-Aided Design*, 9, 1990, pp. 352-366.
- [12] J. Qian, S. Pullela and L. Pillage, "Modeling the "Effective Capacitance" for the RC Interconnect of CMOS Gates," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, 13(12), 1994, pp. 1526-1535.
- [13] T. Sakurai and R. Newton, "Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas," *IEEE Trans. on Solid State Circuits*, 25(2), 1990, pp. 584-594.
- [14] B. N. Sheehan, "Osculating Thevenin Model for Predicting Delay and Slew of Capacitively Characterized Cells," in *Proc. Design Automation Conference*, 2002, pp. 866-869.
- [15] B. N. Sheehan, "Library Compatible C<sub>eff</sub> for Gate-Level Timing," in *Proc. Design, Automation and Test in Europe Conference and Exhibition*, 2002, pp. 826-830.

 $<sup>^{2}</sup>$ We also run our experiments on a 1X buffer (besides a 8X buffer) and observe identical results for all transition time approximation schemes.

<sup>&</sup>lt;sup>3</sup>With increasingly large driving transistors, we observe improved accuracy on gate delay calculation, and an exaggerated near end signal transistion time which gives good accuracy on interconnect delay and far end signal transition times.

# TABLE I

GATE DELAYS, NEAR END TRANSITION TIMES, WIRE DELAYS, AND FAR END TRANSITION TIMES WITH FOUR APPROXIMATED DRIVING POINT SIGNAL TRANSITIONS: (1) CEFF: AN EFFECTIVE-CAPACITANCE-DERIVED RAMP TRANSITION FROM [12]; (2) CEFF + TAIL: AN EFFECTIVE-CAPACITANCE-DERIVED RAMP TRANSITION WITH TAIL APPROXIMATION ADJUSTMENT [12]; (3) CDEFF: AN EFFECTIVE-CAPACITANCE-DERIVED RAMP TRANSITION FROM [2]; AND (4) PWLE: OUR PROPOSED CHARGE-MATCHING PWLE-BASED SIGNAL TRANSITION. ALL VALUES ARE NORMALIZED TO SPICE SIMULATION RESULTS FOR AN INTERCONNECT OF 0.23pF TOTAL CAPACITANCE WHICH IS SCALED BY k = 1, 2, 4, 8, OR 16, AND AN 8X LOW THRESHOLD DRIVER IN A $0.13\mu m$ DESIGN.

| k  | Scheme      | Rising: gate delay | Rising: near trans | Rising: wire delay | Rising: far trans | Falling: gate delay | Falling: near trans | Falling: wire delay | Falling: far trans |
|----|-------------|--------------------|--------------------|--------------------|-------------------|---------------------|---------------------|---------------------|--------------------|
| 1  | Ceff        | 1.068              | 0.829              | 0.951              | 0.816             | 1.069               | 0.814               | 0.959               | 0.771              |
| 1  | Ceff + TAIL | 1.068              | 0.941              | 0.953              | 0.922             | 1.069               | 1.025               | 0.976               | 0.941              |
| 1  | Cdeff       | 0.995              | 0.749              | 0.949              | 0.740             | 0.992               | 0.715               | 0.946               | 0.697              |
| 1  | PWLE        | 1.068              | 1.098              | 0.994              | 1.096             | 1.069               | 1.196               | 1.057               | 1.169              |
| 2  | Ceff        | 1.196              | 0.697              | 0.861              | 0.668             | 1.148               | 0.654               | 0.826               | 0.717              |
| 2  | Ceff + TAIL | 1.196              | 0.943              | 0.884              | 0.855             | 1.148               | 1.247               | 0.922               | 0.897              |
| 2  | Cdeff       | 1.001              | 0.556              | 0.837              | 0.574             | 1.011               | 0.519               | 0.806               | 0.690              |
| 2  | PWLE        | 0.933              | 1.063              | 0.962              | 1.043             | 1.017               | 1.497               | 1.047               | 1.169              |
| 4  | Ceff        | 1.454              | 0.490              | 0.731              | 0.628             | 1.153               | 0.616               | 0.857               | 0.832              |
| 4  | Ceff + TAIL | 1.454              | 1.006              | 0.821              | 0.788             | 1.153               | 2.455               | 0.913               | 0.901              |
| 4  | Cdeff       | 1.087              | 0.342              | 0.710              | 0.602             | 1.083               | 0.551               | 0.856               | 0.832              |
| 4  | PWLE        | 1.120              | 1.178              | 0.931              | 1.011             | 1.022               | 2.959               | 1.073               | 1.078              |
| 8  | Ceff        | 1.455              | 0.380              | 0.793              | 0.763             | 1.150               | 0.665               | 0.920               | 0.916              |
| 8  | Ceff + TAIL | 1.455              | 1.645              | 0.842              | 0.820             | 1.150               | 5.270               | 0.937               | 0.926              |
| 8  | Cdeff       | 1.307              | 0.334              | 0.793              | 0.763             | 1.323               | 0.840               | 0.920               | 0.916              |
| 8  | PWLE        | 1.123              | 1.921              | 0.983              | 0.967             | 1.019               | 6.380               | 1.058               | 0.997              |
| 16 | Ceff        | 1.435              | 0.432              | 0.880              | 0.873             | 1.134               | 0.757               | 0.959               | 0.958              |
| 16 | Ceff + TAIL | 1.435              | 3.232              | 0.892              | 0.879             | 1.134               | 7.893               | 0.961               | 0.959              |
| 16 | Cdeff       | 1.897              | 0.599              | 0.881              | 0.873             | 1.756               | 1.489               | 0.959               | 0.958              |
| 16 | PWLE        | 1.106              | 4.307              | 1.002              | 0.941             | 1.000               | 12.963              | 1.028               | 0.976              |