# UC San Diego

**Technical Reports** 

### Title

A Multiple Level Network Approach for Clock Skew Minimization with Process Variations

Permalink https://escholarship.org/uc/item/2v29675w

Authors Mori, Makoto; Chen, Hongyu; Yao, Bo; et al.

Publication Date 2003-07-28

Peer reviewed

### A Multiple Level Network Approach for Clock Skew Minimization with Process Variations

Makoto Mori, Hongyu Chen, Bo Yao, and Chung-Kuan Cheng

{mouri, hchen, byao, kuan}@cs.ucsd.edu

Department of Computer Science and Engineering, University of California, San Diego 9500 Gilman Dr., La Jolla, CA 92093

#### ABSTRACT

In this paper, we investigate the effect of a multilevel network on clock skew. We first define a simplified RC circuit model of a hybrid clock mesh/tree structure. The skew reduction contributed by a shunt segment of the mesh is derived analytically from the simplified model. The result indicates that the skew decreases exponentially with $R_1/R$, where $R_1$ is the driving resistance of a leaf node in the clock tree and $R$ is the resistance of a mesh segment. Based on our analysis, we propose a hybrid multi-level mesh and tree structure for global clock distribution. A simple optimization scheme is adopted to optimize the routing resource distribution of the multi-level mesh. Experimental results show that by adding a mesh to the bottom-level leaves of an H-tree, the clock skew can be reduced from 29.2 ps to 12.4 ps, and a multi-level network with the same total routing area reduces the clock skew by a further 30%, to 8.7 ps. We also discuss the inductive effect of the mesh in Section 7. When the clock frequency is less than 4 GHz, our RC model remains valid for clock meshes with grounded shielding or differential signals.

#### 1. Introduction

Clock distribution network design has been a great challenge in state-of-the-art high-performance chip design. With tens or even hundreds of millions of transistors integrated, distributing the clock signal to local areas all over the chip with near-zero skew becomes a very hard problem. Moreover, as the clock frequency climbs into the gigahertz range and the interconnect delay dominates in deep sub-micron technology, the portion of the clock skew introduced by process variations in wire width and clock buffer transistor length can no longer be ignored. A robust clock distribution network that is less sensitive to process variations is desired.

In real designs, the clock distribution network can be partitioned into two parts: the global clock network and the local networks. The global clock network distributes the clock signal from the clock source in the center of the chip to local regions. It usually has a symmetric structure. Local distribution networks deliver clock signals to numerous registers in a local area. Their structure is often non-symmetric because the locations of registers are not necessarily regular. In this paper, we focus on the global clock network.

Much work has been done in the past two decades to find the best structure for global clock distribution. Tree-based structures are widely used, because a tree is easy to tune and simulate and can achieve low clock skew and power consumption [1].

For coping with process variations, however, a mesh structure is better than a tree structure, since the mesh has more local connections that can smooth out local delay variations and yield a lower clock skew.

A recent trend is to use a hybrid structure of a mesh and symmetric trees for the global clock network. For example, the Intel Pentium<sup>®</sup> 4 microprocessor adopts three spines on the bottom level and binary distribution trees on the top level to deliver the global clock [9]. The bottom-level spines can be regarded as a simple mesh structure. Restle et al. proposed a hybrid clock network with two levels. The top level is an H-tree, and the bottom level is a uniform mesh that connects all the leaves of the top-level H-tree. This hybrid clock network structure has been successfully applied to six designs [1], including the latest Power4 microprocessors [9][14]. Measurements from real chips showed that this two-level structure achieves low clock skew under process variations.



Figure 1: A Multilevel Global and Local Clock Distribution Network

In [17], the authors proposed a two level clock network which makes a departure from the popular "mesh at the lowest level" structure. In their design, the top level is a zero-skew mesh that delivers clock to four quarters of the chip from a single source and the bottom level contains four zero-skew trees. Their method inspires us to ask at which level of the tree the shortcut mesh works most efficiently in terms of skew reduction.

The mixture of the tree structure and the mesh structure complicates the simulation problem. The simple Elmore delay model that fits the tree structure no longer applies to the mixed structures. A new model is needed to calculate the delay and to provide guidelines for the design of hybrid networks. In this paper, we try to address this problem. To keep the presentation easy to follow, we focus on a hybrid clock network similar to the one proposed by [10], which consists of a symmetric H-tree and a mesh connecting all the bottom-level leaves. Our method can also be applied to other hybrid tree and mesh structures, such as the binary tree plus spine structure [9].

Based on our study on the skew reduction effect of mesh, we propose a multilevel network for global clock distribution. Figure 1 shows a schematic example of our multilevel network. The dotted line represents the meshes which connect together all the nodes at the same level in the original H-tree. In order to reduce the inductance of the shunt segments, we may use grounded shielding or differential pairs for mesh connections.

Our contributions in this paper include the following:

1. We use a simplified RC circuit model to study the skew reduction effect of adding shunt connections between two leaf nodes of a clock tree. Based on this model, we derive an analytical skew approximation, which fits the SPICE simulation very well. Further analysis and simulation suggest that the RC model remains valid for differential clock nets running at frequencies below 4 GHz.

2. We extend our skew approximation formula to the mesh network. We obtain the parameters of the skew expression from the SPICE simulation results by least-squares linear regression. This analysis can be used to guide the design of hybrid clock networks.

3. We propose a mixed multi-level mesh and tree structure for global clock distribution networks.

4. We propose a method to optimize the routing resource distribution of a mixed multi-level mesh and tree clock network. The optimized multi-level mesh/tree network produces 30% skew reduction over the single-level mesh and tree network and is more robust in the presence of voltage fluctuation.

The rest of the paper is organized as follows. In Section 2, we formulate the hybrid multi-level mesh/tree optimization problem. In Section 3, we propose a simplified circuit model for hybrid mesh/tree networks and derive the analytical skew expression. SPICE simulation results show that our skew formula is very accurate for a single branch. In Section 4, we extend the skew formula to uniform meshes. We then introduce our multi-level mesh optimization scheme in Section 5 and present the experimental results in Section 6. In Section 7, we discuss the inductive effect, and we conclude the paper in Section 8.

### 2. Problem Formulation

#### 2.1 Process Variations Model

Semiconductor manufacturing variations occur when process parameters deviate from their ideal, as-designed values. Process variations have always been a key concern for manufacturability, process control, and circuit design. With rapid technology scaling, the importance of the impact of variations on the circuit design is increasing further.

Variation can be categorized into temporal and spatial sources [3]. Temporal sources are time-varying and change depending on circuit activity. Spatial effects depend on physical factors, impact the geometry of a structure, and can lead to undesirable effects such as yield loss. In this paper, we mainly consider the variations in the geometrical parameters of interconnect and devices in a clock distribution network.

Conventional circuit techniques typically represent the interconnect and device parameter variations as random variables. However, recent studies [11] have shown that strong spatial pattern dependencies exist, especially for interconnect variations in the chemical mechanical polishing (CMP) process. Therefore, the total variation can be separated into systematic and random components. [11] shows that considering the systematic variations is the key to reducing design uncertainty and maximizing circuit performance.

In this paper, we adopt a simple linear variation model to represent the systematic spatial variations in wire widths and transistor lengths. For any location $(x, y)$ on the chip, the actual geometrical parameter is $d = d_0 + k_x x + k_y y$, where $d_0$ is the nominal parameter and $k_x$, $k_y$ are the horizontal and vertical variation coefficients, respectively. Without loss of generality, we assume that the origin of the coordinate system $(0,0)$ is located at the center of the chip, and that $k_x$, $k_y$ are positive numbers. We set the maximum variations across the chip to be $\pm 10\%$ of the ideal value [15].
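The linear model can be illustrated with a minimal Python sketch. It assumes, beyond what the text states, that $k_x = k_y$ and that the $\pm 10\%$ extremes are reached at the upper-right and lower-left corners of a square chip; the function name `linear_variation` and the 160 nm nominal width are hypothetical choices for illustration.

```python
def linear_variation(d0, x, y, half_size, max_dev=0.10):
    """Return d = d0 + kx*x + ky*y for a point (x, y) measured from the chip center.

    Assumption (not stated in the text): kx == ky, and the +/- max_dev extremes
    are reached at the upper-right and lower-left corners of a square chip.
    """
    k = max_dev * d0 / (2.0 * half_size)  # kx = ky = k
    return d0 + k * x + k * y


# Hypothetical example: 160 nm nominal wire width on a 24 mm x 24 mm chip.
half = 12.0  # mm, half of the chip edge
center = linear_variation(160.0, 0.0, 0.0, half)
upper_right = linear_variation(160.0, half, half, half)
lower_left = linear_variation(160.0, -half, -half, half)
upper_left = linear_variation(160.0, -half, half, half)
```

Under these assumptions the center and the two off-diagonal corners keep the nominal value, while the diagonal corners carry the two extremes.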

#### 2.2 Optimum Balanced Clock Tree Augmentation Problem

We illustrate a symmetric clock tree and mesh network in Figure 2. Each leaf node of the H-tree drives a sink capacitor contributed by the local clock tree, and a uniform mesh connects all of the leaf nodes together.



Figure 2: An H-tree with Bottom-level Mesh

With process variations [16], clock skew appears between nodes at the same level. Given the layout information and the process variations model of a symmetric zero-skew clock tree (either an H-tree or a binary tree), we can obtain the clock skew between the nodes at the same level using Monte Carlo simulation [18] or variational circuit analysis [10]. We denote by $T_i$ the worst clock skew between any two nodes at level $i$ in the H-tree.

Adding shunt segments between nodes at the same level in a tree is a common way to reduce the skew and is widely accepted in industrial practice. For example, in the symmetric H-tree shown in Figure 2, all the leaf nodes on level 3 are connected by an 8 by 8 mesh, which is drawn with dotted lines in Figure 2. We can also connect the four level-one nodes a, b, c, d by a 2 by 2 mesh.

When the wire width of the mesh is large enough, the nodes at the same level are almost short-circuited, and the skew between them approaches zero. On the other hand, using wires that are too wide wastes routing resources and hence degrades routability. In addition, wide wires in the mesh can increase the clock slew because they increase the load capacitance of the clock buffers. Consequently, the design of clock distribution networks must respect a total routing area budget.

For the same amount of routing resources, adding them to meshes at different levels may have a different impact on the clock skew. In this paper, we are interested in the optimal way to distribute the routing resources among the meshes at different levels such that the minimum skew is achieved at the leaf nodes for a given routing area budget.

We formulate this problem as the following optimum balanced clock tree augmentation problem.

#### **Optimum Balanced Clock Tree Augmentation Problem:**

- Given: An $n$-level symmetric clock tree (wire width of the segments in each level, buffer locations, and buffer sizes), and the clock skew between nodes at the same level introduced by process variations
- Input: The total routing area budget for all the meshes
- Output: The optimum wire width $w_i$ of the shortcut connections at level $i$, for $i = 1 \dots n$, such that the clock skew is minimized
- Topology constraints: A uniform mesh/spine for the shortcut connections in each level

### 3. Skew-Shunt Resistance Relations in a Simplified Circuit Model

We use the simplified circuit model shown in Figure 3 to study the skew-shunt resistance relation in a hybrid tree and mesh structure. The model contains two nodes of a clock tree, $s_1$ and $s_2$. Each of them drives a load capacitance $C_1$, which is the sum of the sink capacitance and the wire capacitance. $R_1$ consists of the resistance of the clock buffer and of the final segment of the H-tree. A mesh segment with resistance $R$ is added between the two tree branches to reduce the skew between node $n_1$ and node $n_2$. We assume that $V_{s1}$ and $V_{s2}$ are step functions and that $V_{s2}$ lags $V_{s1}$ by time $T$. Many factors can contribute to this timing difference $T$ between nodes $s_1$ and $s_2$, for example, the skew due to the upstream network, variations of $R_1$ and $C_1$, and supply voltage variations at the clock buffers. For simplicity of modeling, we lump all these effects into the timing difference $T$ between the input step functions. We assume that $T$ is given.



Figure 3: simplified circuit

#### 3.1 Skew Function

We define $V_1$ and $V_2$ as the voltages of nodes $n_1$ and $n_2$, respectively, and obtain the following equation from the circuit in Figure 3.

$$\begin{bmatrix} \frac{dV_1}{dt} \\ \frac{dV_2}{dt} \end{bmatrix} = \begin{bmatrix} -\frac{1}{RC_1} - \frac{1}{R_1C_1} & \frac{1}{RC_1} \\ \frac{1}{RC_1} & -\frac{1}{RC_1} - \frac{1}{R_1C_1} \end{bmatrix} \begin{bmatrix} V_1 \\ V_2 \end{bmatrix} + \begin{bmatrix} \frac{1}{R_1C_1} & 0 \\ 0 & \frac{1}{R_1C_1} \end{bmatrix} \begin{bmatrix} V_{s1} \\ V_{s2} \end{bmatrix} \tag{1}$$

We derive a simple skew function from equation (1):

$$\Delta T = T \exp(-2\ln 2\frac{R_1}{R}) \tag{2}$$

The derivation is shown in the appendix.

Surprisingly, in the skew model (2), the skew $\Delta T$ is determined only by $R_1/R$, the ratio between the driving resistance and the shunt resistance, and is independent of the load capacitance. This simple relation enables us to easily estimate the skew on a hybrid tree and mesh clock network. We verify this relation by SPICE simulation in the following subsection.

#### **3.2 Validation of Skew Formula with SPICE Simulation**

We use SPICE to simulate the circuit in Figure 3 over a range of parameters. We vary $R_1$ from 300 $\Omega$ to 1200 $\Omega$, $R$ from 100 $\Omega$ to 1000 $\Omega$<sup>1</sup>, and $C_1$ from 10 fF to 200 fF.

First, we show the relation between the skew and $R_1$, $C_1$, and $R$ when $T$ is 5 ps. Figure 4 indicates the effect of $R_1$ and $C_1$. The skew decreases exponentially with $R_1/R$, and $C_1$ barely affects the skew in the range of interest.



Figure 5: Skew~R1/R Relation in Simplified Hybrid Tree/Mesh Circuit Model

Figure 5 presents the skew versus $R_1/R$ relation. We also plot the curve of our skew model $\Delta T = T \exp(-2\ln 2 \cdot R_1/R)$ in the same figure. Note that the skew is plotted on a log scale, hence the curve of our skew function is a straight line. All the simulated points lie close to this line, and for different values of $C_1$ the results deviate very little from it. This result justifies our skew formula (2), in which the skew is determined by the value of $R_1/R$ and is proportional to the exponential of $-R_1/R$.
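The same comparison can be sketched without SPICE by evaluating the exact step response of the two-node circuit of equation (1) and locating the 50% crossings numerically. The Python sketch below is illustrative only; it is not the simulation setup used in the experiments. The function names and the parameter choice ($R_1 = 300\ \Omega$, $R = 400\ \Omega$, $T = 5$ ps) are assumptions picked from within the ranges quoted above.

```python
import math


def node_voltages(t, R1, R, C1, T):
    """Unit-step response of nodes n1 and n2; source s2 switches T after s1."""
    tau = R1 * C1
    alpha = 1.0 + 2.0 * R1 / R
    if t <= T:
        s = 1.0 - math.exp(-t / tau)                    # V1 + V2
        d = (1.0 - math.exp(-alpha * t / tau)) / alpha  # V1 - V2
    else:
        s_T = 1.0 - math.exp(-T / tau)
        d_T = (1.0 - math.exp(-alpha * T / tau)) / alpha
        s = 2.0 - (2.0 - s_T) * math.exp(-(t - T) / tau)
        d = d_T * math.exp(-alpha * (t - T) / tau)
    return 0.5 * (s + d), 0.5 * (s - d)


def crossing_time(node, R1, R, C1, T, level=0.5):
    """Bisection for the time at which node 0 (n1) or 1 (n2) reaches `level`."""
    lo, hi = 0.0, 50.0 * R1 * C1 + T
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if node_voltages(mid, R1, R, C1, T)[node] < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulated_skew(R1, R, C1, T):
    """Difference between the 50% crossing times of n2 and n1."""
    return crossing_time(1, R1, R, C1, T) - crossing_time(0, R1, R, C1, T)


def formula_skew(R1, R, T):
    """Skew predicted by formula (2)."""
    return T * math.exp(-2.0 * math.log(2.0) * R1 / R)


R1, R, T = 300.0, 400.0, 5e-12
for C1 in (100e-15, 200e-15):
    print(C1, simulated_skew(R1, R, C1, T), formula_skew(R1, R, T))
```

Both capacitance values keep $T$ well below $R_1C_1$, which is the regime assumed in the derivation of formula (2).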

#### 4. Clock Skew on a Mesh

We extend the skew model derived from the simplified circuit, where only one shunt segment exists, to describe the skew on a mixed H-tree and mesh network. From formula (2), we conjecture that the skew-shunt resistance relation on a full mesh follows a similar exponential trend. Then, for any N by N mesh, the skew expression has the form

$$\Delta T = T \exp(-kR_1 / R)$$

where $k$ is a constant related to N, the number of columns and rows in the mesh, $\Delta T$ is the resulting skew with the mesh, and $T$ is the initial skew without the mesh.

<sup>1</sup> 1200 Ω and 300 Ω are typical driving resistances of a minimum-size buffer and a 4-times-wide buffer, respectively. 400 Ω is the typical resistance of a 1 mm wire with minimum wire width in 70 nm technology.

We use SPICE simulation to justify our conjecture and obtain the $k$ values for different meshes by curve fitting.

We build the circuit model shown in Figure 6, which represents the connection between an H-tree with $N^2$ leaves and an N by N mesh.

Process variations cause the skew. In this paper, we consider the variations of transistor length and wire width, which are important for circuit analysis [10][15] and depend strongly on intra-chip location [12][16]. For the worst-case skew analysis, we assume that the lower left corner of the chip has the smallest delay and the upper right corner has the largest delay. Therefore, our variation model varies transistor length and wire width linearly with location from lower left to upper right. The center, the upper left corner, and the lower right corner have the median value. The voltage sources $V_{Sij}$ generate step functions. As a result of process variations, sources closer to the lower left switch earlier and sources closer to the upper right switch later. Adjacent voltage sources differ in timing by $T/(2(N-1))$, so that $V_{SNN}$ lags $V_{S11}$ by $T$.
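The source timing pattern can be written down directly. The following Python sketch is an illustration under the stated rule that each step to the right or upward adds $T/(2(N-1))$; the function name `source_delays` is hypothetical, and the indices run from 0 rather than 1.

```python
def source_delays(N, T):
    """Switching delay of each source relative to V_S11.

    delays[i][j] corresponds to V_S(i+1)(j+1), counted from the lower-left
    corner. Each step to the right or upward adds T / (2 * (N - 1)), so the
    upper-right source V_SNN lags the lower-left source V_S11 by T.
    """
    step = T / (2.0 * (N - 1))
    return [[(i + j) * step for j in range(N)] for i in range(N)]


# Example: 4 by 4 mesh with T = 11.75 ps (the 4 by 4 entry of Table 1).
delays = source_delays(4, 11.75)
```

The two off-diagonal corners receive a delay of $T/2$, consistent with the median value assigned to the upper left and lower right corners.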

We synthesize a 4-level H-tree in 70 nm technology by the method proposed in [5], which optimizes the wire widths and the buffer sizes and locations. Then, we apply 10% variations to the transistor length and wire width in the H-tree. Figures 7 to 10 show the global skew simulation results for the mesh at each level when we change $R$, which corresponds to the wire width of the mesh, from the minimum width to 5 times the minimum width. $R_1$ and $C_1$ are fixed because the H-tree is given. In addition, $T$ is known because we can obtain it from the H-tree simulation before adding the mesh. Table 1 shows these parameters.



Figure 6: circuit model for mesh

We run our experiments for 2 by 2, 4 by 4, 8 by 8, and 16 by 16 meshes. Figures 7 to 10 display the experimental results. All of these plots show very good linearity of the $\Delta T \sim R_1/R$ relation on the log-scale graph.

We calculate $k$ using least-squares linear regression to fit a line to the points in each figure. $k$ is the magnitude of the slope of $\ln \Delta T$ versus $R_1/R$ in Figures 7 to 10. Table 2 presents the $k$ factors for the mesh of each level.
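The fitting step can be sketched as an ordinary least-squares fit of $\ln \Delta T$ against $R_1/R$. In the Python sketch below the data points are synthetic, generated from the model itself rather than taken from the SPICE results, so the example only demonstrates the procedure; the function name `fit_k` is hypothetical.

```python
import math


def fit_k(ratios, skews):
    """Least-squares line through the points (R1/R, ln skew).

    Returns (k, T_fit), where k is the negated slope and T_fit is the
    exponential of the intercept, i.e. the fitted skew at R1/R = 0.
    """
    n = len(ratios)
    ys = [math.log(s) for s in skews]
    mean_x = sum(ratios) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in ratios)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ratios, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic data generated from the model itself (NOT the paper's SPICE data).
true_k, true_T = 0.107, 21.06
ratios = [2.0, 4.0, 6.0, 8.0]
skews = [true_T * math.exp(-true_k * r) for r in ratios]
k_fit, T_fit = fit_k(ratios, skews)
```

With noise-free synthetic input the fit recovers the generating constants; with measured data the residuals indicate how well the exponential trend holds.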



Figure 7: Skew ~ R1/R relation for a 2 by 2 mesh



Figure 8: Skew ~ R1/R relation for a 4 by 4 mesh



Figure 9: Skew ~ R1/R relation for an 8 by 8 mesh



Figure 10: Skew ~ R1/R relation for a 16 by 16 mesh

Table 1: Parameters for different meshes

| mesh size | $T_i$ (ps) | $R_1$ ($\Omega$) | $C_1$ (fF) | $R$ min ($\Omega$) | $R$ max ($\Omega$) |
|-----------|------------|------------------|------------|--------------------|--------------------|
| 2 by 2    | 6.26       | 183.8            | 141.66     | 339.16             | 1288.8             |
| 4 by 4    | 11.75      | 367.5            | 69.63      | 169.58             | 644.4              |
| 8 by 8    | 21.06      | 735.0            | 34.22      | 84.79              | 322.2              |
| 16 by 16  | 29.22      | 1470.0           | 20.00      | 42.39              | 161.1              |

Table 2: k values for different meshes

| mesh size | 2 by 2 | 4 by 4 | 8 by 8 | 16 by 16 |
|-----------|--------|--------|--------|----------|
| k         | 1.167  | 0.363  | 0.107  | 0.030    |
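As a worked use of Tables 1 and 2, the fitted model $\Delta T = T \exp(-kR_1/R)$ can be evaluated at the two ends of the simulated resistance range. The Python sketch below only re-evaluates the fitted formula with the tabulated constants; it is not a substitute for the SPICE results, and the names `MESHES` and `mesh_skew` are hypothetical.

```python
import math

# Per mesh: (T in ps, R1 in ohms, R min in ohms, R max in ohms, k).
# T, R1, R min and R max are copied from Table 1, k from Table 2.
MESHES = {
    "2 by 2": (6.26, 183.8, 339.16, 1288.8, 1.167),
    "4 by 4": (11.75, 367.5, 169.58, 644.4, 0.363),
    "8 by 8": (21.06, 735.0, 84.79, 322.2, 0.107),
    "16 by 16": (29.22, 1470.0, 42.39, 161.1, 0.030),
}


def mesh_skew(T, R1, R, k):
    """Skew predicted by the fitted model, in the same unit as T."""
    return T * math.exp(-k * R1 / R)


# For each mesh: (skew at R max, i.e. narrowest wire; skew at R min, i.e. widest wire).
predicted = {
    name: (mesh_skew(T, R1, r_max, k), mesh_skew(T, R1, r_min, k))
    for name, (T, R1, r_min, r_max, k) in MESHES.items()
}

for name, (narrow, wide) in predicted.items():
    print(f"{name}: {narrow:.2f} ps (R max) -> {wide:.2f} ps (R min)")
```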

#### 5. Optimization of Multi-level Mesh and Tree Structure

For a uniform mesh, the resistance $R$ of a wire segment is inversely proportional to its width $w$. Writing $R = r/w$, where $r$ denotes the segment resistance at unit width, we rewrite the skew expression $\Delta T = T \exp(-kR_1/R)$ as

$\Delta T = T \exp(-kR_1w/r) = T \exp(-k'w)$, where $k$ is the constant determined by the number of columns and rows of the mesh, and $k' = kR_1/r$ is also a constant for a given $R_1$ and mesh.

We can formulate the Optimum Balanced Clock Tree Augmentation Problem as the following nonlinear programming problem MLMOP (Multi-Level Mesh Optimization Problem):

$$\text{Min: } \Delta T = \Big(\cdots\big((T_1 e^{-k_1 w_1} + T_2)\, e^{-k_2 w_2} + T_3\big)\, e^{-k_3 w_3} + \cdots + T_n\Big)\, e^{-k_n w_n} \tag{5.1}$$

$$\text{s.t.: } \sum_{i=1}^{n} l_i w_i = A \tag{5.2}$$

where the constant $A$ is the total routing area budget for all the meshes, $l_i$ is the total wire length of the level-$i$ mesh, $k_i$ is the constant $k'$ of the level-$i$ mesh, and $T_i$ is the initial skew between level $i-1$ nodes and level $i$ nodes on the H-tree without a mesh. $A$, $l_i$, and $T_i$ are given constants. The wire widths $w_i$ of the level-$i$ meshes are the variables.

In the above nonlinear program, the cost function (5.1) is the skew at the bottom-level leaves, which is to be minimized, and the constraint (5.2) is the total routing area constraint. By solving this nonlinear program for any given total routing area, we find the minimum skew that a multi-level mesh can achieve and the best way to assign routing resources to the meshes at different levels.

Expanding (5.1) gives $\Delta T = \sum_{i=1}^{n} T_i \exp(-\sum_{j=i}^{n} k_j w_j)$, a sum with nonnegative coefficients of exponentials of linear functions of the $w_j$, so (5.1) is a convex function. Because the constraint (5.2) also defines a convex set, we have the following theorem about the nonlinear program MLMOP.

**Theorem:** The local optimal solution of the nonlinear program MLMOP is also the global optimum.

Because of the convexity of the skew function (5.1), many optimization techniques (e.g., gradient methods and line search methods [2]) can be used to find the best $\{w_i\}$ assignment such that the skew is minimized subject to the total routing area constraint. In our experiments, we use the line search algorithm provided in the optimization toolkit of Matlab to solve this nonlinear program.
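To make the formulation concrete, the following Python sketch solves a small MLMOP instance with projected gradient descent on the areas $a_i = l_i w_i$. This is a swapped-in technique for illustration, not the Matlab line search used in our experiments. The skews `T` and constants `k` are hypothetical values, the widths are assumed to be nonnegative, and all function names are assumptions of this sketch; only the mesh wire lengths are taken from the experimental setup.

```python
import math


def skew(w, T, k):
    """Nested skew model (5.1): widths w, initial skews T, constants k."""
    total = 0.0
    for T_i, k_i, w_i in zip(T, k, w):
        total = (total + T_i) * math.exp(-k_i * w_i)
    return total


def grad_w(w, T, k):
    """Analytic gradient of (5.1) with respect to each width w_m."""
    partial, total = [], 0.0
    for T_i, k_i, w_i in zip(T, k, w):
        total = (total + T_i) * math.exp(-k_i * w_i)
        partial.append(total)             # skew remaining after level i
    grad, suffix = [0.0] * len(w), 1.0
    for m in range(len(w) - 1, -1, -1):
        grad[m] = -k[m] * partial[m] * suffix
        suffix *= math.exp(-k[m] * w[m])  # suffix now covers levels m..n
    return grad


def project_to_simplex(v, total):
    """Euclidean projection onto {a >= 0, sum(a) == total}."""
    running, theta = 0.0, 0.0
    for j, u_j in enumerate(sorted(v, reverse=True), start=1):
        running += u_j
        candidate = (running - total) / j
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def optimize_widths(T, k, l, A, iters=5000):
    """Projected gradient descent on the areas a_i = l_i * w_i."""
    def widths(a):
        return [a_i / l_i for a_i, l_i in zip(a, l)]

    a = [0.0] * (len(l) - 1) + [A]        # start: all area in the bottom mesh
    best, step = skew(widths(a), T, k), A
    for _ in range(iters):
        g = [g_i / l_i for g_i, l_i in zip(grad_w(widths(a), T, k), l)]
        while step > 1e-9:
            trial = project_to_simplex(
                [a_i - step * g_i for a_i, g_i in zip(a, g)], A)
            value = skew(widths(trial), T, k)
            if value < best:              # accept only improving steps
                a, best = trial, value
                step = min(2.0 * step, 1e6 * A)
                break
            step *= 0.5
        else:
            break                         # no improving step left
    return widths(a), best


# Hypothetical inputs for a 4-level network (illustration only).
T = [6.0, 6.0, 9.0, 8.0]         # ps, assumed per-level initial skews
k = [0.17, 0.21, 0.24, 0.27]     # assumed k' per unit of normalized width
l = [48.0, 144.0, 336.0, 720.0]  # mm, total mesh wire lengths (Section 6)
A = 5.0 * l[-1]                  # budget: 5x the bottom mesh at unit width

w_opt, skew_opt = optimize_widths(T, k, l, A)
skew_single = skew([0.0, 0.0, 0.0, A / l[-1]], T, k)
```

Because the problem is convex, a descent method of this kind cannot be trapped in a spurious local minimum; the sketch starts from the single-level design and only accepts improving steps, so its result is never worse than that baseline.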

#### 6. Experimental results

We apply our method to a hybrid clock network in 70 nm technology. In our experimental setting, the chip size is 24 mm x 24 mm and the 4-level symmetric H-tree is synthesized by the method described in [5]. The first-level mesh connects the $4^1$ nodes of the first level of the H-tree; the length of a segment is 12 mm, so the total wire length is 48 mm. For the 2nd-level mesh, the number of nodes is $4^2$, the length of each segment is 6 mm, and the total wire length is 144 mm. The segment length and total wire length of the 3rd-level mesh are 3 mm and 336 mm, respectively, and those of the 4th (bottom) level mesh are 1.5 mm and 720 mm.
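The total wire lengths quoted above follow from counting segments: an $N$ by $N$ uniform mesh has $N$ rows and $N$ columns of $N-1$ segments each. A short Python sketch of this arithmetic, with the hypothetical helper name `mesh_total_length` and the segment length taken as the chip edge divided by $N$:

```python
def mesh_total_length(n, segment_mm):
    """Total wire length of an n-by-n uniform mesh.

    The mesh has n rows and n columns, each made of (n - 1) segments.
    """
    return 2 * n * (n - 1) * segment_mm


chip_mm = 24.0
lengths = {}
for level in range(1, 5):
    n = 2 ** level           # level-i mesh is 2^i by 2^i (4^i nodes)
    segment = chip_mm / n    # 12, 6, 3, 1.5 mm
    lengths[level] = mesh_total_length(n, segment)
```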

Table 3 presents the optimized wire widths for the mesh at each level. The total wire resource is normalized to that of the bottom-level mesh with minimum wire width (160 nm). The result suggests that we should put wire resources into the higher (top) level meshes until those levels saturate, and only then into the lower (bottom) level meshes.

Table 3: Optimized Resource Distribution (optimized wire width of each level mesh)

| total area | 1st level | 2nd level | 3rd level | 4th level |
|------------|-----------|-----------|-----------|-----------|
| 0.25       | 1.23      | 0.98      | 0.00      | 0.00      |
| 0.40       | 1.23      | 1.85      | 0.00      | 0.00      |
| 1.00       | 1.23      | 1.88      | 1.27      | 0.00      |
| 3.00       | 1.23      | 1.88      | 2.56      | 1.40      |
| 5.00       | 1.23      | 1.89      | 2.56      | 3.40      |

Table 4 compares the optimized multi-level mesh with a single-level mesh by SPICE simulation. For the single-level mesh, we put all resources into the bottom-level mesh, as is done in industry. We add the optimized 4-level mesh and the single-level mesh, in turn, to the original H-tree and run SPICE simulations to measure the global skew. The more wire resource is used for the mesh, the more the optimized mesh reduces skew compared with the single-level mesh. With only ×0.25 resource, the improvement from optimization is 6.8%, whereas with ×5.0 resource, the optimized mesh achieves a 30% reduction in skew.

Figure 11 and Figure 12 demonstrate the effect of the mesh on clock skew. In these two figures, the crossing points are the sink nodes at the bottom level of the H-tree, the *x*- and *y*-axes indicate the position on the chip, and the *z*-axis is the delay of the sink nodes. Figure 11 shows the delay map for an H-tree without a mesh, and Figure 12 shows the case of a multi-level network. The worst local skew and the global skew in Figure 11 are 5.9 ps and 29.2 ps, respectively. By adopting a multilevel network, these values decrease to 3.1 ps and 19.8 ps, respectively.



Figure 11: Delay of Clock Terminals in an H-tree w/o Mesh



Figure 12: Delay of Clock Terminals in a Multilevel Network

| total area | single-level mesh skew (s) | multi-level mesh skew (s) | multi/single |
|------------|----------------------------|---------------------------|--------------|
| 0.00       | 2.92E-11                   | 2.92E-11                  | 100.0%       |
| 0.25       | 2.79E-11                   | 2.60E-11                  | 93.2%        |
| 0.40       | 2.71E-11                   | 2.45E-11                  | 90.4%        |
| 1.00       | 2.42E-11                   | 1.98E-11                  | 81.8%        |
| 3.00       | 1.70E-11                   | 1.24E-11                  | 73.2%        |
| 5.00       | 1.24E-11                   | 8.72E-12                  | 70.5%        |

Table 4: Multi-level mesh vs. single-level mesh

In addition, we confirm the robustness of our multi-level mesh against voltage fluctuation. In our experiments, we perturb the supply voltage of each clock buffer randomly by 10%. For each pair of multi-level mesh and single-level mesh with the same total routing area, we run 10 simulations with different random seeds. Table 5 shows the average and the worst skew of these 10 cases. Note that in these experiments, in order to focus on the effect of voltage fluctuation, we ignore the process variations. For the multi-level mesh with total area 5, the average and worst clock skews are 1.16 ps and 2.02 ps, respectively, which are more than 60% lower than those produced by a single-level mesh.

**Table 5: Skew with Voltage Fluctuation**

| total area | multi-level mesh, average (s) | multi-level mesh, worst (s) | single-level mesh, average (s) | single-level mesh, worst (s) |
|------------|-------------------------------|-----------------------------|--------------------------------|------------------------------|
| 1.00       | 8.38E-12                      | 1.14E-11                    | 8.26E-12                       | 1.43E-11                     |
| 2.00       | 2.71E-12                      | 4.42E-12                    | 6.18E-12                       | 1.11E-11                     |
| 3.00       | 1.89E-12                      | 3.33E-12                    | 4.83E-12                       | 8.73E-12                     |
| 4.00       | 1.45E-12                      | 2.48E-12                    | 3.88E-12                       | 6.96E-12                     |
| 5.00       | 1.16E-12                      | 2.02E-12                    | 3.18E-12                       | 5.64E-12                     |
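The relative skew reduction at each area budget can be read off Table 5 with a few lines of arithmetic. The Python sketch below copies the table entries; the names `TABLE5` and `reduction` are hypothetical.

```python
# total area -> (multi avg, multi worst, single avg, single worst), in seconds
TABLE5 = {
    1.00: (8.38e-12, 1.14e-11, 8.26e-12, 1.43e-11),
    2.00: (2.71e-12, 4.42e-12, 6.18e-12, 1.11e-11),
    3.00: (1.89e-12, 3.33e-12, 4.83e-12, 8.73e-12),
    4.00: (1.45e-12, 2.48e-12, 3.88e-12, 6.96e-12),
    5.00: (1.16e-12, 2.02e-12, 3.18e-12, 5.64e-12),
}


def reduction(multi, single):
    """Fractional skew reduction of the multi-level mesh over the single-level mesh."""
    return 1.0 - multi / single


reductions = {
    area: (reduction(m_avg, s_avg), reduction(m_worst, s_worst))
    for area, (m_avg, m_worst, s_avg, s_worst) in TABLE5.items()
}

for area, (avg_red, worst_red) in sorted(reductions.items()):
    print(f"area {area:.2f}: average {avg_red:.1%}, worst {worst_red:.1%}")
```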

#### 7. Discussions on Inductive Effect

In the preceding analysis and experiments, we ignored the inductive effect of the interconnect. As the clock frequency keeps climbing, the effect of inductance becomes more and more important. However, many techniques can be used to control the parasitic inductance of clock interconnect, such as grounded shielding and differential signaling.

In [5], a set of rules has been developed to decide under which conditions the inductive effect can be ignored. According to [5], the error between the RC and RLC representations will not exceed 15% for a single wire if $C_L \gg C$, $R/Z_0 > 2$, and $R_1 > nZ_0$, where $n$ is between 0.5 and 1, $C_L$ is the loading at the far end of the line, $C$ is the wire capacitance, and $Z_0$ is the impedance caused by the inductance.

On the top level of our proposed multiple-level mesh, the load capacitance $C_L$ is 149.4 fF, which is much larger than the wire capacitance $C$ of 14.3 fF. For a pair of 1.2 cm copper differential wires with minimal wire spacing on metal layer 10, the inductance is 2.7 nH [8]. At a frequency of 4 GHz, with a clock slew of 50 ps, the impedance caused by the inductance is 339 $\Omega$, which is much smaller than the wire resistance of 5130 $\Omega$ and also smaller than the driving resistance of 367 $\Omega$.
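The three conditions can be checked mechanically with the values quoted above. In the Python sketch below, the function name `rc_model_ok` is hypothetical, and the factor of 10 used to interpret "much larger" is an assumption of this sketch, not a threshold given in [5].

```python
def rc_model_ok(C_L, C, R, R1, Z0, n=1.0, much_larger=10.0):
    """Check the three rules under which the RC model is taken as adequate.

    `much_larger` is an assumed numeric reading of "C_L >> C"; n = 1.0 is the
    strictest value in the quoted range 0.5 to 1.
    """
    load_dominates = C_L >= much_larger * C
    wire_is_resistive = R / Z0 > 2.0
    driver_is_resistive = R1 > n * Z0
    return load_dominates and wire_is_resistive and driver_is_resistive


# Top-level mesh values quoted in the text.
top_level_ok = rc_model_ok(C_L=149.4e-15, C=14.3e-15, R=5130.0, R1=367.0, Z0=339.0)
```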

We conduct a SPICE simulation for a multiple level clock network using differential meshes. We compare the maximum skew of RC circuit and RLC circuit. At the frequency of 4 GHz, the error on maximum skew between RC and RLC circuit is less than 1%.

#### 8. Conclusion and Future Directions

We demonstrate the effect of the mesh network on the clock skew. From the analysis of the simplified circuit, the skew decreases exponentially with $R_1/R$. This analytical relation can be used to guide the design of hybrid mesh/tree clock networks. We propose to use a hybrid multi-level mesh/tree structure to reduce the clock skew. By solving a very simple nonlinear program, we obtain the optimum resource distribution among the meshes at different levels.

Our experiments show that by adding a 16 by 16 mesh at the bottom-level leaves of an H-tree, the clock skew can be reduced from 29.2 ps to 12.4 ps, and the optimized hybrid multi-level mesh and tree structure produces a clock skew of 8.72 ps, which is 30% less than that of the single-level mesh. The experiments also demonstrate that the optimized hybrid multi-level mesh and tree structure is much more robust than the single-level mesh and tree structure in the presence of voltage variations.

Some interesting future research directions include:

- Theoretical analysis of clock signal propagation on a uniform mesh
- Clock skew calculation using RLC model
- The use of non-uniform mesh to further reduce the clock skew

#### 9. References

- [1] F.E. Anderson, et al, The Core Clock System on the Next Generation Itanium<sup>™</sup> Microprocessor, ISSCC 2002 Session 8.5.
- [2] M.S. Bazaraa, H.D. Sherali, and C.M. Shetty, Nonlinear Programming: Theory and Algorithms, 2nd ed. New York: Wiley, 1997
- [3] D. Boning and S. Nassif, Models of Process Variations in Device and Interconnect, in Design of High Performance Microprocessor Circuits, Editors: A. Chandrakasan, W. Bowhill, F. Fox, IEEE Press, 2000
- [4] P.J. Camporese, et al, X-Y Grid Tree Tuning Method, U.S. Patent No. 6205571 B1, Mar. 20, 2001.
- [5] C.K. Cheng, J. Lillis, et al, Interconnect Analysis and Synthesis, 2000, Wiley Interscience.
- [6] M.P. Desai, R. Cvijetic, J. Jensen, Sizing of clock distribution networks for high performance CPU chips, Proceedings of Design Automation Conference, p.389-394, June 03-07, 1996
- [7] D. Harris, S. Naffziger, Statistical Clock Skew Modeling With Data Delay Variations, IEEE trans. on VLSI SYSTEMS, Vol.9, No. 6, Dec 2001, pp. 888-898
- [8] M. Kamon, M. J. Tsuk, and J. K. White, FastHenry: A multipole-accelerated 3-D inductance extraction program, IEEE Trans. on Microwave Theory and Techniques, 42(9):1750-8, September 1994.
- [9] N.A. Kurd, et al, A Multigigahertz Clocking Scheme for the Pentium® 4 Microprocessor, IEEE Journal of Solid-State Circuits, Vol. 36, No. 11, Nov. 2001 pp. 1647-53.
- [10] Y. Liu, S.R. Nassif, L.T. Pileggi, and A.J. Strojwas, Impact of Interconnect Variations on the Clock Skew of a Gigahertz Microprocessor, in Proc. of DAC 2000, pp. 168-171

- [11] V. Mehrotra, Modeling the Effects of Systematic Process Variation on Circuit Performance, Ph.D. Thesis, MIT, May, 2001
- [12] M. Orshansky, L. Milor, P. Chen, K. Keutzer and C. Hu, Impact of Spatial Intrachip Gate Length Variability on the Performance of High-Speed Digital Circuit, IEEE trans. on CAD, p.544-553, vol. 21, No. 5, May 2002
- [13] P.J. Restle, et al, A Clock Distribution Network for Microprocessors, IEEE Journal of Solid-State Circuits, Vol. 36, No. 5, May 2001 pp. 792-99.
- [14] P.J. Restle, et al, The Clock Distribution of the Power4 Microprocessor, ISSCC 2002, Session 8.4.
- [15] B.E. Stine, D.S. Boning, J.E. Chung, D.J. Ciplickas, J.K. Kibarian, Simulating the Impact of Pattern-Dependent Poly-CD Variation on Circuit Performance, IEEE Trans on semiconductor manufacturing, vol. 11, No. 4, Nov. 1998, pp. 552-556
- [16] S. Sauter, D. Cousinard, R. Thewes, D. Schmitt-Landsiedel, W. Weber, Clock skew determination from parameter variations at chip and wafer level, IWSM 1999, 4th International Workshop on Statistical Metrology, pp. 7-9
- [17] H. Su, S.S. Sapatnekar, Hybrid Structured Clock Network Construction, ICCAD 2001, pp. 333-336
- [18] P. Zarkesh-Ha, T. Mule, and J. Meindl, Characterization and Modeling of Clock skew with Process Variations, in Proc. CICC, pp. 441-444, 1999

#### **Appendix: Derivation of Skew Function**

First, we obtain closed-form expressions for $V_1$ and $V_2$ by solving the differential equation (1). Without loss of generality, we set the amplitude of the steps $V_{s1}$ and $V_{s2}$ to 1.

For $t \leq T$:

$$\begin{cases} V_1 = -\frac{1}{2}\left( e^{-\frac{t}{R_1C_1}} + \frac{1}{1+2\frac{R_1}{R}}\, e^{-\frac{1+2\frac{R_1}{R}}{R_1C_1}t} - 1 - \frac{1}{1+2\frac{R_1}{R}} \right) \\ V_2 = -\frac{1}{2}\left( e^{-\frac{t}{R_1C_1}} - \frac{1}{1+2\frac{R_1}{R}}\, e^{-\frac{1+2\frac{R_1}{R}}{R_1C_1}t} - 1 + \frac{1}{1+2\frac{R_1}{R}} \right) \end{cases} \tag{3}$$

For $t > T$:

$$\begin{cases} V_1 = K_1 e^{-\frac{t}{R_1C_1}} + K_2 e^{-\frac{1+2\frac{R_1}{R}}{R_1C_1}t} + 1 \\ V_2 = K_1 e^{-\frac{t}{R_1C_1}} - K_2 e^{-\frac{1+2\frac{R_1}{R}}{R_1C_1}t} + 1 \end{cases} \tag{4}$$

where

$$\begin{cases} K_1 = -\frac{1}{2}\left(e^{\frac{T}{R_1C_1}} + 1\right) \\ K_2 = \frac{1}{2(1+2\frac{R_1}{R})}\left(e^{\frac{1+2\frac{R_1}{R}}{R_1C_1}T} - 1\right) \end{cases}$$

According to equation (4), for $t > T$, both $V_1$ and $V_2$ share the term $K_1 e^{-\frac{t}{R_1C_1}}$, while the term $K_2 e^{-\frac{1+2\frac{R_1}{R}}{R_1C_1}t}$, which enters with opposite signs, causes the clock skew.

We define $t_1$ and $t_2$ to be the arrival times at node $n_1$ and node $n_2$, respectively. In other words, $V_1(t_1) = V_2(t_2) = 0.5$. The clock skew is $\Delta T = t_2 - t_1$.

We assume that the initial clock skew $T$ is much smaller than the clock delay $\ln 2 \cdot R_1C_1$. This assumption is reasonable for most symmetric clock trees with typical design parameters. Based on this assumption, we have $t_1 \approx t_2 \approx \ln 2 \cdot R_1C_1$.

We compute the rate of increase of $V_2$ and the voltage difference between $V_1$ and $V_2$ at time $t_1$. Dividing the voltage difference by the rate gives the additional time $V_2$ needs to reach 0.5. We compute the skew $\Delta T$ using the following approximation:

$$\begin{aligned} \Delta T &\approx \frac{V_1(t_1) - V_2(t_1)}{\dot{V}_2(t_1)} \\ &\approx \frac{V_1(\ln 2 \cdot R_1C_1) - V_2(\ln 2 \cdot R_1C_1)}{\dot{V}_2(\ln 2 \cdot R_1C_1)} \\ &= \frac{2K_2 \exp\left(-\ln 2\,(1+2\frac{R_1}{R})\right)}{-\frac{K_1}{R_1C_1}\exp(-\ln 2) + K_2\,\frac{1+2\frac{R_1}{R}}{R_1C_1}\exp\left(-\ln 2\,(1+2\frac{R_1}{R})\right)} \\ &= \frac{K_2 \exp(-2\ln 2\frac{R_1}{R})}{\frac{1}{2}\left(\frac{-K_1}{R_1C_1} + K_2\,\frac{1+2\frac{R_1}{R}}{R_1C_1}\exp(-2\ln 2\frac{R_1}{R})\right)} \end{aligned} \tag{5}$$

Because $T \ll R_1C_1$, we have $T/(R_1C_1) \ll 1$.

When $x \ll 1$, we can use the first-order Taylor expansion $e^x \approx 1+x$ to approximate the exponential function $e^x$. We use this approximation to simplify the expressions of $K_1$ and $K_2$:

$$K_1 = -\frac{1}{2}\left(e^{\frac{T}{R_1C_1}} + 1\right) \approx -\frac{1}{2}\left(2 + \frac{T}{R_1C_1}\right) \tag{6}$$

$$K_2 = \frac{1}{2(1+2\frac{R_1}{R})}\left(e^{\frac{1+2\frac{R_1}{R}}{R_1C_1}T} - 1\right) \approx \frac{1}{2(1+2\frac{R_1}{R})}\left(1 + \frac{1+2\frac{R_1}{R}}{R_1C_1}T - 1\right) = \frac{T}{2R_1C_1} \tag{7}$$

Substituting (6) and (7) into (5) and omitting the small terms in the denominator that contain $\frac{T}{R_1C_1}$, we get the following skew expression:

$$\Delta T \approx \frac{\frac{T}{2R_1C_1} \exp(-2 \ln 2 \frac{R_1}{R})}{\frac{1}{2R_1C_1}} = T \exp(-2 \ln 2 \frac{R_1}{R}) \tag{8}$$