# UC San Diego

**Technical Reports** 

### Title

Maximum Instantaneous Power Estimation by Subgraph Coloring

### Permalink

https://escholarship.org/uc/item/5ng7w4s3

## Author

Liu, Bao

# Publication Date 2005-10-12

Peer reviewed

## **Maximum Instantaneous Power Estimation by Subgraph Coloring**

Bao Liu UCSD CSE Dept. 9500 Gilman Dr. La Jolla, CA 92093

#### Abstract

Finding maximum instantaneous power consumption helps estimation of worst-case supply voltage drop and lifetime of a VLSI system. We observe signal correlations represented by logic inversions, and present a maximum subgraph two-coloring formulation for the problem. We show the problem is NP-hard, and the complexity comes from the presence of *reconvergent fanouts* besides inputs signal correlations. We further develop an efficient algorithm for general VLSI designs of practical sizes. Our experimental results on the ISCAS'89 benchmark show our subgraph coloring algorithm achieves 4.31% less standard deviation of maximum instantaneous power estimate than the greedy transition assignment algorithm in [12]; and a 2.82% standard deviation of maximum instantaneous power estimate with 2-3 orders of runtime speedups compared with pattern-dependent simulation techniques.

#### **1** Introduction

Continuous VLSI process technology advancement integrates millions of gates on a single chip and introduces increasing power consumption[8]. Average power consumption causes chip heating effect and is responsible for battery life; worst case instantaneous power consumption relates to supply voltage drop and affects lifetime of devices and reliability of a VLSI system[9]. Estimation techniques are needed for both average and worst case instantaneous power consumption.

Several approaches have been proposed on maximum instantaneous power (MIP, i.e., worst case power consumption in a clock cycle) estimation. Applying the theory of extreme order statistics and maximum likelihood estimation reduces the number of input patterns for MIP simulation with given accuracy and confidence[13]. Calculating all possible signal transitions in a clock cycle helps in building a *maximum envelope currents* waveform for each net and providing a MIP upper bound of a netlist[6]. Integer linear programming finds MIP of a small simple gate netlist; MIP of a larger design can be bounded by sum of MIP estimates of each partition[5]. Relaxing integer variables enables continuous optimization and provides a MIP upper bound[11]. A weighted maximumsatisfiability formulation enables a branch-and-bound method in solving the problem[2]. A more scalable greedy algorithm assigns signal transitions to nets in decreasing order of net capacitance, with each assignment followed by a justification procedure, which includes backtracing and implication of typical ATPG (Automatic Test Pattern Generation) algorithms[12]. A genetic algorithm assigns signal transitions to nets in decreasing order of net capacitance times maximum possible signal toggling number of the net in a clock cycle[4].

We observe signal correlations introduced by possible logic inversions across each cell instance, and formulate the problem as maximum subgraph two-coloring. We show that the problem is NP-hard, and the complexity comes from the presence of reconvergent fanouts besides input signal correlations. We further develop an efficient algorithm for general designs of practical sizes. Our experimental results on the ISCAS'89 benchmark show our subgraph coloring algorithm achieves 4.31% less standard deviation of maximum instantaneous power estimate than the greedy transition assignment algorithm in [12]; and a 2.82% standard deviation of maximum instantaneous power estimate with 2-3 orders of runtime speedups compared with pattern-dependent simulation techniques.

The rest of this paper is organized as follows. We expand our analysis from a simple gate to general netlists in section 2. Our experimental results on the ISCAS'89 benchmark are presented in section 3 before we conclude in section 4.



Figure 1: A simple gate consumes maximum instantaneous power when all inputs and outputs take simultaneous transitions.

#### 2 Analysis, Formulation and Algorithm

#### 2.1 Signal Transition Based Power Consumption

Power consumption of a CMOS design consists of four parts: (1) dynamic power which charges capacitive loads, (2) glitch power from additional transitions of a signal before settling to a steady state (which is sensitive to input signal propagation delays), (3) short-circuit power from instantaneous short circuit between the power and ground lines during a signal transition (which grows with a slow signal transition), and (4) static leakage power.

$$P = P_{dynamic} + P_{glitch} + P_{shortckt} + P_{static}$$
  
$$= \sum_{i} (0.5C_{i}V_{dd}^{2}N_{i} + P_{g/t}N_{toggle}N_{i} + P_{s/t}N_{i}) + P_{static}$$
  
$$= \sum_{i} \alpha_{i}N_{i} + P_{static}$$
(1)

with capacitance  $C_i$ , transition number  $N_i$ , additional toggling number  $N_{toggle}$  for each transition, glitch power for each toggling  $P_{g/t}$ , and short circuit power for each transition  $P_{s/t}$  of a net *i*, assuming all transitions swing between supply voltage  $V_{dd}$  and ground voltage 0. Finding maximum  $\alpha$  weighted signal transitions gives maximum instantaneous power consumption.

#### 2.2 Logic Inversion Based Maximum Number of Signal Transitions

Our analysis on finding maximum weighted signal transitions starts with a few observations.

**Observation 1** A simple (AND, OR, NOT) gate consumes maximum instantaneous power when all inputs and output of the gate take simultaneous transitions, i.e., nets with logic inversion take transitions in the opposite direction with nets without logic inversion (Figure 1).

**Observation 2** A simple gate netlist of tree structure consumes maximum instantaneous power when all nets take simultaneous transitions, without the presence of input signal correlations (Figure 2).

We represent a gate level netlist by a *net connectivity graph*  $G_{nc} = (V, E, I)$ , where V is the set of nets,  $(u, v) \in E$  if nets u and v are input and output of a gate, respectively, and signal transitions across (u, v) are either inverted (i(u, v) = 1) or non-inverted  $(i(u, v) = 0, i(u, v) \in I)$ . A simple gate netlist of tree structure with all nets taking simultaneous transitions can be represented as a two-colorable graph (Figure 2).

#### 2.3 **Reconvergent Fanouts**

**Observation 3** For general structure netlists, the presence of reconvergent fanouts introduces cycles of odd number of logic inversions, and prevents all nets from taking simultaneous transitions (Figure 3).

At least one net in a cycle of odd number of logic inversions cannot take transition. Setting a net stable may imply other nets stable: setting a gate input to a controlling value<sup>1</sup> brings the gate output to the controlled value of

 $<sup>^{1}</sup>$ A controlling value independently determines the output of a gate, e.g., 0 is the controlling value of a NAND gate, 1 is the controlling value of a NOR gate.



Figure 2: A simple gate netlist of tree structure which consumes maximum instantaneous power when all nets take simultaneous transitions without the presence of input signal correlations (left); and its net connectivity graph  $G_{nc}$  (right), where solid edges connect two nets with logic inversion i(u, v) = 1; dotted edges connect two nets without logic inversion i(u, v) = 0; white vertices are nets with rising signal transitions; and dark vertices are nets with falling signal transitions.

| net | set to | implied stable nets | stable net cap | weight |  |
|-----|--------|---------------------|----------------|--------|--|
| b   | 0      | a b                 | 2              | 2      |  |
|     | 1      | bdehi               | 6              |        |  |
| с   | 0      | abcfgij             | 8              | 1      |  |
|     | 1      | с                   | 1              |        |  |
| e   | 0      | a b e               | 3              | 3      |  |
|     | 1      | e h i               | 4              |        |  |
| f   | 0      | c f                 | 2              | 2      |  |
|     | 1      | fij                 | 4              |        |  |

Table 1: Implied stable nets, total capacitance of implied stable nets, and weights of nets in Figure 3. All net capacitances equal to their fanouts (i.e., all nets have unit capacitance expect net *i* which has a capacitance of 2).

the gate;<sup>2</sup> setting a gate output to a non-controlled value of the gate implies the gate inputs of the non-controlling value of the gate.<sup>3</sup> We define the *weight* w(v) of a net v as the minimum sum of capacitance of the nets which are implied stable when net v is stable (for example, Table 1 shows the weights of the nets in Figure 3).

**Observation 4** A cycle of odd number of logic inversions consumes maximum instantaneous power when all nets take simultaneous transitions except the net with the minimum weight in the cycle.

The problem is more complicated in the presence of multiple interweaved cycles. We propose a higher level of abstraction in solving the problem.

#### 2.4 Fanout Net Connectivity Graph

We abstract a net connectivity graph  $G_{nc}$  into a fanout net connectivity graph  $G_{fnc} = (V, E, I, W)$ , where V is the set of *fanout nets* (nets with multiple fanouts), an edge  $(u, v) \in E$  with  $i(u, v) \in I$  and  $w(u, v) \in W$  in  $G_{fnc}$  if there is a path path(u, v) in  $G_{nc}$ , logic inversion across edge  $(u, v) \in G_{fnc}$  equals that across  $path(u, v) \in G_{nc}$ , i.e.,  $i(u, v) = XOR\{i(p,q) | (p,q) \in path(u, v), (p,q) \in E \in G_{nc}\}$ , and weight of edge  $(u, v) \in G_{fnc}$  equals the minimum net weight on  $path(u, v) \in G_{nc}$ , i.e.,  $w(u, v) = MIN\{w(p) | p \in path(u, v), p \in V \in G_{nc}\}$  (for example, the net connectivity graph  $G_{nc}$  in Figure 3 (right) is abstracted as the fanout net connectivity graph  $G_{fnc}$  in Figure 4). We formulate the maximum power estimation problem as maximum subgraph two-coloring:

we formulate the maximum power estimation problem as maximum subgraph two-coloring:

 $<sup>^{2}</sup>$ A controlled value is the gate output value when a controlling value is applied to a gate input, e.g., 1 is the controlled value of a NAND gate, 0 is the controlled value of a NOR gate.

<sup>&</sup>lt;sup>3</sup>Setting a gate output to a controlled value doesn't imply that the gate inputs are stable: the gate inputs can take simultaneous transitions in opposite directions.



Figure 3: Nets in a cycle of odd number of logic inversions cannot take simutaneous transitions (left); its net connectivity graph  $G_{nc}$  cannot be properly colored due to the presence of a cycle of odd number of logic inversions (right).



Figure 4: Fanout net connectivity graph  $G_{fnc}$  of the netlist in Figure 3.

**Problem 1 (Maximum Subgraph Two-Coloring)** Given a graph  $G_{fnc} = (V, E, I, W)$ , find a two-coloring of a subgraph with maximum sum of weights  $f : V' \to \{0, 1\}$ , where  $V' \subset V$ , and

$$f(u) = f(v) \quad if(u,v) \in E, i(u,v) = 0$$
  

$$f(u) \neq f(v) \quad if(u,v) \in E, i(u,v) = 1$$
(2)



**Proof.** Maximum subgraph two-coloring can be reduced to the following bipartite subgraph problem:

**Problem 2 (Bipartite Subgraph)** Instance: Graph G = (V, E), positive integer  $K \le |E|$ , Question: Is there a subset E' in E with  $|E'| \ge K$  such that G' = (V, E') is bipartite?

which is NP-complete[3].

We develop an efficient MIP estimation algorithm (Algorithm 1), which constructs a maximum spanning tree on a fanout net connectivity graph  $G_{fnc}$ , and sets stable the nets with minimum weights on each path in  $G_{nc}$  which corresponds to a non-tree edge in  $G_{fnc}$ . The runtime is dominated by maximum spanning tree construction, which is  $O(N \log N)$ , where N is the number of fanout nets.

| Algorithm 1: MIP estimation algorithm                                                                                                                                                                                                                                                                                                                   |  |  |  |  |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|--|
| <b>Input:</b> A netlist represented by a net connectivity graph $G_{nc} = \{V, E, I, W\}$ with nets V, gate connections E, logic inversions I and net capacitive weights W<br><b>Output:</b> A two-colorable subgraph with maximum sum of weights                                                                                                       |  |  |  |  |
| <ol> <li>Construct fanout net connectivity graph G<sub>fnc</sub></li> <li>Construct maximum spanning tree on G<sub>fnc</sub></li> <li>For each non-tree edge e in G<sub>fnc</sub><br/>Finding corresponding path p in G<sub>nc</sub><br/>Set stable the net with minimum weight on path p</li> <li>Output subgraph of transition-taking nets</li> </ol> |  |  |  |  |

| ckt   | $ G_{nc} $ | $ G_{fnc} $ | $ G_{fnc} /G_{nc} $ | $ E_{fnc}  -  V_{fnc} $ |
|-------|------------|-------------|---------------------|-------------------------|
| s208  | 124        | 28          | 0.226               | 46                      |
| s298  | 136        | 34          | 0.250               | 67                      |
| s344  | 184        | 40          | 0.217               | 57                      |
| s349  | 185        | 41          | 0.222               | 82                      |
| s382  | 182        | 14          | 0.077               | 10                      |
| s386  | 172        | 26          | 0.151               | 23                      |
| s526  | 217        | 54          | 0.249               | 129                     |
| s820  | 312        | 39          | 0.125               | 73                      |
| s832  | 310        | 39          | 0.126               | 61                      |
| s1196 | 561        | 151         | 0.269               | 469                     |
| s1238 | 540        | 165         | 0.306               | 618                     |
| s1488 | 667        | 76          | 0.114               | 175                     |
| s1494 | 661        | 76          | 0.115               | 209                     |

Table 2: Topological structures of ISCAS'89 benchmark netlists.

#### 2.5 Application to General Netlists

General netlists can be decomposed into simple gate netlists, or, logic inversions can be abstracted at a higher hierarchy level with certain assertions (e.g., of enabling signals in multiplexers or encoder/decoders), and maximum instantaneous power can be estimated for each assertion case. We abstract possible logic inversions across sequential elements, e.g., flip-flops and latches. We include input signal correlations and propagate constant signals before constructing a fanout net connectivity graph  $G_{nc}$ .

#### **3** Experiments

We implement our algorithm in C and run our experiments on the ISCAS'89 benchmark on a Sun Sparc 20 workstation. Table 2 gives topological structures of the ISCAS'89 benchmark netlists in our experiments. An average of 20.0% nets are fanout nets, which shows an average of five-time reduction in problem size by abstracting  $G_{fnc}$ from  $G_{nc}$ . The numbers of cycles of odd number of logic inversions, shown in the last column of Table 1, provide an indication of complexity for each test case.

We compare with simulation result of an input pattern dependent maximum power estimator in [12], which is within 5% accurate with 95% confidence through a Monte Carlo technique. As in [12], we assume per-transition toggling power consumption  $P_{g/t}N_{toggle}$  and per-transition short-circuit power consumption  $P_{s/t}$  are proportional to net capacitance  $C_i$ , and net capacitance  $C_i$  is proportional to net fanout number. Maximum instantaneous power is given in terms of total fanout number of the transition-taking nets.

Table 3 shows our subgraph coloring algorithm achieves 2-3 orders of speedups and a 2.82% standard deviation of MIP estimate compared with input-dependent simulation. A maximum speedup of 5190 times is achieved with a 97.4% accuracy for the s820 design; for the s1238 and s1494 designs, our algorithm achieves even better results (larger MIP) than the pattern-dependent simulator with 0.2% and 0.7% CPU time, respectively.

The greedy transition assignment algorithm in [12] has a 7.13% standard deviation of MIP estimate. Compared with the greedy transition assignment algorithm, subgraph coloring achieves 4.31% less standard deviation and an average of 9.16 times runtime speedup<sup>4</sup>.

#### 4 Conclusion

Maximum instantaneous power is associated with weighted maximum number of signal transitions. We observe signal correlations represented by logic inversions in a VLSI netlist, and formulate the problem as maximum subgraph two-coloring. We show the problem is NP-hard, which complexity comes from the presence of reconvergent fanouts besides input signal correlations. We develop an efficient algorithm for general designs of practical sizes, which runs in  $O(N \log N)$  time, where N is the number of fanout nets. Resultant signal transitions provide an estimate on maximum instantaneous power, e.g., through an input-dependent simulator. Our experimental results on the ISCAS'89 benchmark show our subgraph coloring algorithm achieves 4.31% less standard deviation of

 $<sup>^{4}</sup>$ Runtime of the greedy algorithm is reported on a *HP*715/50 workstation.

|       | Subgraph Coloring |            |         | Greedy[12] |            | Simulation |     |            |       |
|-------|-------------------|------------|---------|------------|------------|------------|-----|------------|-------|
| ckt   | MIP               | CPU (sec.) | err (%) | MIP        | CPU (sec.) | err (%)    | MIP | CPU (sec.) | #sim  |
| s208  | 138               | 0.04       | 0.0     | 135        | 0.24       | -2.2       | 138 | 51.8       | 13590 |
| s298  | 189               | 0.05       | -3.1    | 201        | 0.61       | 3.1        | 195 | 30.7       | 13410 |
| s344  | 187               | 0.10       | 0.0     | 185        | 0.39       | -1.1       | 187 | 70.7       | 14790 |
| s349  | 188               | 0.07       | 0.0     | 180        | 1.42       | -4.3       | 188 | 68.6       | 14790 |
| s382  | 226               | 0.07       | 0.0     | 266        | 1.96       | 17.7       | 226 | 42.8       | 14790 |
| s386  | 248               | 0.08       | -7.8    | 268        | 0.33       | -0.4       | 269 | 77.6       | 12630 |
| s526  | 296               | 0.15       | -0.3    | 298        | 0.74       | 0.3        | 297 | 72.4       | 14790 |
| s820  | 608               | 0.03       | -2.6    | 612        | 4.66       | -1.9       | 624 | 155.7      | 14580 |
| s832  | 625               | 0.04       | -1.3    | 619        | 6.83       | -2.2       | 633 | 169.6      | 14580 |
| s1196 | 605               | 0.70       | -4.6    | 594        | 10.6       | -6.3       | 634 | 321.6      | 16380 |
| s1238 | 654               | 0.65       | 0.8     | 610        | 1.63       | -6.0       | 649 | 311.0      | 16380 |
| s1488 | 908               | 0.90       | -1.6    | 784        | 4.33       | -15.1      | 923 | 187.0      | 12810 |
| s1494 | 932               | 1.25       | 0.6     | 938        | 3.46       | 1.3        | 926 | 188.9      | 12810 |

Table 3: Comparison of MIP estimation by subgraph coloring, greedy transition assignment[12] and an inputdependent simulation on ISCAS'89 sequential benchmark test cases.

maximum instantaneous power estimate than the greedy transition assignment algorithm in [12]; and a 2.82% standard deviation of maximum instantaneous power estimate with 2-3 orders of runtime speedups compared with pattern-dependent simulation techniques.

#### References

- K. Banerjee, A. Mehrotra, A. Sangiovanni-Vincentelli and C. Hu, "On Thermal Effects in Deep Sub-Micron VLSI Interconects," in *Proc. Design Automation Conference*, 1999, pp. 885-891.
- [2] S. Devadas, K.Keutzer and J. White, "Estimation of Power Dissipation in CMOS Combinational Circuits Using Boolean Function Manipulation," *IEEE Trans. on Computer-Aided Design*, 11(3), 1992, pp. 373-383.
- [3] M. R.Garey and D. S. Johnson, *Computers and Intractability: a Guide to the Theory of NP-completeness*, W. H. Freeman Publication, San Francisco, 1979.
- [4] M. S. Hsiao, "Peak Power Estimation Using Genetic Spot Optimization for Large VLSI Circuits", in Proc. Design, Automation and Test in Europe, 1999, pp. 175-179.
- [5] Y.-M. Jiang and K.-T. Cheng, "Exact and Approximate Estimation for Maximum Instantaneous Current of CMOS Circuits", in *Proc. Design, Automation and Test in Europe*, 1998, pp. 698-702.
- [6] H. Kriplani, F. N. Najm and I. N. Hajj, "Pattern Independent Maximum Current Estimation in Power and Ground Buses of CMOS VLSI Circuits: Algorithms, Signal Correlations, and Their Resolution," *IEEE Trans.* On Computer-Aided Design of Integrated Circuits and Systems, 14(8), 1995, pp. 998-1012.
- [7] A. Macii, E. Macii and M. Poncino, "Reducing Peak Power Consumption of Combinational Test Sets," in *Proc. Asilomar Conference on Signals, Systems and Computers*, 2, 1998, pp. 1042-1046.
- [8] F. N. Najm, "Estimating Power Dissipation in VLSI Circuits," *IEEE Circuits and Devices Magazine*, 10(4), 1994, pp.11-19.
- [9] F. N. Najm, "A Survey of Power Estimation Techniques in VLSI Circuits," *IEEE Trans. On VLSI Systems*, 2(4), 1994, pp. 446-455.
- [10] S. R. Vemuru and N. Scheinberg, "Short-Circuit Power Dissipation Estimation for CMOS Logic Gates," *IEEE Trans. on Circuits and Systems-1: Fundamental Theory and Applications*, 41(11), 1994, pp. 762-765.
- [11] C.-Y. Wang and K. Roy, "COSMOS: A Continuous Optimization Approach for Maximum Power Estimation of CMOS Circuits", in *Proc. International Conference of Computer-Aided Design*, 1997, pp. 52-55.
- [12] C.-Y. Wang, K. Roy and T.-L. Chou, "Maximum Power Estimation for Sequential Circuits Using a Test Generation Based Technique," in *Proc. IEEE Custom Integrated Circuits Conference*, 1996, pp. 229-232.
- [13] Q. Wu, Q. Qiu and M. Pedram. "Estimation of Peak Power Dissipation in VLSI Circuits Using the Limiting Distributions of Extreme Order Statistics," *IEEE Trans. on Computer-Aided Design*, 20(8), 2001, pp. 942-956.