### UNIVERSITY OF CALIFORNIA

Los Angeles

Simple Universal Parallel Interface (SuperCHIPS) Protocol for High Performance Heterogeneous System Integration

A thesis submitted in partial satisfaction of the requirements for the degree Master of Science in Electrical Engineering

by

SivaChandra Jangam

2017

© Copyright by SivaChandra Jangam 2017

#### ABSTRACT OF THE THESIS

# Simple Universal Parallel Interface (SuperCHIPS) Protocol for High Performance Heterogeneous System Integration

by

SivaChandra Jangam

Master of Science in Electrical Engineering

University of California, Los Angeles, 2017

Professor Subramanian Srikantes Iyer, Chair

This thesis presents the Simple Universal Parallel intERface (SuperCHIPS) protocol for high interconnect density heterogeneous system integration. This is enabled by fine pitch interconnects and dielet assembly at close proximity on an interconnect fabric. Dramatic improvements in bandwidth, latency, and power are achieved through this integration scheme, where small dielets (1-25 mm<sup>2</sup>) are attached to a Silicon Interconnect Fabric (Si-IF) at fine interconnect pitch (2-10 $\mu$m) and short inter-dielet spacing (50-500 $\mu$m) using solderless metal-to-metal thermal compression bonding (TCB). Simulated models indicate that links in the Si-IF with short wire lengths (<500 $\mu$m) have excellent signal transfer characteristics, with low channel loss (<2 dB) and cross-talk (<-15 dB), achieving data rates >10 Gbps per link. Further, the maximum current density for a given current is 30x lower in copper interconnects compared to conventional solder bumps. With fine interconnect pitches (<10 $\mu$m), this scheme can achieve a >5-30x improvement in data bandwidth and a >50x reduction in power compared to PCB-style integration. This scheme of system integration using a dielet based assembly method provides a significant reduction in design and validation cost. Test vehicles were fabricated and an experimental demonstration of the integration scheme is presented.

The thesis of SivaChandra Jangam is approved.

Sudhakar Pamarti

Puneet Gupta

Subramanian Srikantes Iyer, Committee Chair

University of California, Los Angeles

2017

To my parents

# TABLE OF CONTENTS

- 1 Introduction
  - 1.1 System performance
    - 1.1.1 System-on-Chip
    - 1.1.2 Interposer
  - 1.2 Heterogeneous Integration
  - 1.3 Contribution of this Work
  - 1.4 Organization of this Thesis
- 2 Simple Universal Parallel intERface for Chips (SuperCHIPS) Protocol
  - 2.1 SuperCHIPS protocol
  - 2.2 Silicon Interconnect Fabric (Si-IF)
    - 2.2.1 Thermomechanical Properties
    - 2.2.2 Electrical Properties
  - 2.3 Thermal Compression Bonding
- 3 Si-IF Interconnect Modelling
  - 3.1 Si-IF vs PCB links
  - 3.2 Si-IF link model
    - 3.2.1 Digital signal transfer
    - 3.2.2 Analog signal transfer
    - 3.2.3 Cross-talk
  - 3.3 Signal Integrity Analysis
  - 3.4 Copper Pillar Interconnect Model
- 4 Benefits of SuperCHIPS Protocol
  - 4.1 Latency
  - 4.2 Power
  - 4.3 Bandwidth
  - 4.4 SuperCHIPS vs Conventional Package
- 5 Experimental Demonstration and Results
  - 5.1 Test Vehicle Design
    - 5.1.1 Fabrication process flow for Si-IF
  - 5.2 Experimental Demonstration
    - 5.2.1 Thermal Compression Bonding
    - 5.2.2 Alignment Accuracy
    - 5.2.3 Inter-Dielet Spacing
  - 5.3 Continuity Results
- 6 Conclusion
- References

# LIST OF FIGURES

| Figure | Caption |
|--------|---------|
| 2.1 | Schematic of SuperCHIPS integration with dielets mounted on Si-IF |
| 2.2 | Test Si-IF with dielets assembled at inter-dielet spacing of 100 $\mu$m |
| 2.3 | Schematic of TCB process experimental setup |
| 3.1 | Transmission Line model |
| 3.2 | Lumped RC circuit model for Si-IF link |
| 3.3 | Structure of the model used to simulate link characteristics |
| 3.4 | Different Wire configurations studied |
| 3.5 | Insertion Loss for 2 $\mu$m interconnect pitch |
| 3.6 | Insertion Loss for 10 $\mu$m interconnect pitch |
| 3.7 | Insertion Loss at different characteristics impedance and length |
| 3.8 | NEXT for signals without shared ground |
| 3.9 | NEXT for signals with shared ground |
| 3.10 | FEXT for signals without shared ground |
| 3.11 | FEXT for signals with shared ground |
| 3.12 | Schematic of circuit used for signal integrity analysis |
| 3.13 | Eye-diagram of 2 $\mu$m pitch interconnect at 10 GHz input frequency |
| 3.14 | Eye-diagram of 10 $\mu$m pitch interconnect at 10 GHz input frequency |
| 3.15 | Copper Pillar Interconnect Model |
| 3.16 | Current distribution in 5 $\mu$m diameter copper pillar interconnect |
| 3.17 | C4 bump Interconnect Model |
| 3.18 | Current distribution in 50 $\mu$m diameter C4 bump in log scale |
| 3.19 | Plot of current crowding ratio variation across interconnect diameter |
| 3.20 | Area required for 30A current supply vs interconnect pitch |
| 4.1 | Comparison of Latencies across different technologies |
| 4.2 | Comparison of Energy per bit across different technologies |
| 4.3 | Comparison of Bandwidth/mm across different technologies |
| 5.1 | Micrograph of the test Si-IF with Cu-pillars ($\Phi = 5\ \mu$m, Pitch = 10 $\mu$m). Test pad size is 50 x 50 $\mu$m |
| 5.2 | Micrograph of a fabricated test dielet (2 x 2 mm) and 10 $\mu$m interconnect pitch |
| 5.3 | Fabrication process flow for Si-IF |
| 5.4 | Dielet after shearing off the test Si-IF. An overlay of $\pm 1\ \mu$m in x-direction, $\pm 0.5\ \mu$m in y-direction. Rotation around z-axis is $0.0003^{\circ}$ |
| 5.5 | Micrograph of the dielets assembled with 100 $\mu$m inter-dielet spacing |
| 5.6 | IV-characteristics of a daisy chain including interconnects, pads and fanout wires |

# LIST OF TABLES

| Table | Caption |
|-------|---------|
| 3.1 | Simulated Model Dimensions |
| 3.2 | RLGC Extraction |
| 4.1 | Si-IF vs Conventional Package Comparison |
| 5.1 | Thermal Compression Bonding Parameters |

#### Acknowledgments

I would like to express my sincere gratitude to my advisor Prof. Subramanian Iyer for his support, guidance, and useful discussions throughout this project. I will always be grateful to him for motivating and inspiring me to pursue research. I would like to thank Prof. Sudhakar Pamarti for being on my thesis committee and for his guidance in the development of models for simulations. I would also like to thank Prof. Puneet Gupta for being on my thesis committee and for his advice on the evaluation of performance benefits. I thank Dr. Adeel Bajwa for his support and help in the fabrication of test vehicles and the experimental demonstration. I also thank Saptadeep Pal for helpful discussions and aid in simulations and performance comparisons. I thank all my colleagues from CHIPS for their help. I thank the Guru Krupa Foundation for their generous fellowship that supported me financially and greatly helped in pursuing my research. I thank DARPA and members of the UCLA CHIPS consortium for their support in this work. The Defense Advanced Research Projects Agency (DARPA) through ONR grant N00014-16-1-263 and the UCLA CHIPS Consortium supported this work. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

Sections of chapter 1 - 4, 6 were adapted from S. Jangam, S. Pal, A. Bajwa, S. Pamarti, P. Gupta, S. S. Iyer, "Latency, Bandwidth and Power Benefits of the SuperCHIPS Integration Scheme," 2017 IEEE 67th Electronic Components and Technology Conference (ECTC), 2017, in process of publication.

Sections of chapter 2, 5 were adapted from A. A. Bajwa, S. Jangam, S. Pal, N. Marathe, M. Goorsky, T. Fukushima, S. S. Iyer, "Heterogeneous Integration at Fine Pitch ( $\leq 10 \ \mu$ m) using Thermal Compression Bonding," 2017 IEEE 67th Electronic Components and Technology Conference (ECTC), 2017, in process of publication.

# CHAPTER 1

# Introduction

### 1.1 System performance

High performance systems demand large communication bandwidth, which is challenging to achieve with existing integration technologies. Higher interconnect density and fine interconnect pitch are key enablers of high data bandwidth in systems. Mainstream system integration technologies use solder based interconnects like Ball Grid Array (BGA) or Controlled Collapse Chip Connection (C4) bumps to assemble packaged dies on Printed Circuit Board (PCB) substrates. Typical BGA pitch is 400 $\mu$m and typical C4 bump pitch is 100 $\mu$m, while on-chip interconnects have a pitch of $\leq 2\ \mu$m. Solder extrusion, warpage of the substrate, and similar effects limit the scaling of solder ball dimensions. This constrains the number of I/O connections for dielets, which is a major bottleneck for achievable data bandwidth. To accommodate the demand for more I/O and power links, the size of the package is increasing, leading to a higher package to silicon die area ratio (2x - 10x). This ratio restricts the minimum inter-dielet spacing between individual dies that can be achieved on the substrate. Consequently, traces between separately packaged dies run from a few to several centimeters, leading to increased channel loss and communication latency in PCBs.
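To make the effect of pitch on I/O count concrete, the following minimal sketch compares the number of connections per unit area at the pitches quoted above. It assumes a fully populated square-grid area array (one connection per pitch squared), which is an idealization introduced here for illustration and not a layout rule from this work; the function name `connections_per_mm2` is likewise hypothetical.

```python
# Illustrative only: assumes a fully populated square grid,
# i.e. one connection per (pitch x pitch) cell. Real arrays
# reserve area for keep-outs, power, and ground.
PITCH_UM = {
    "BGA": 400.0,
    "C4 bump": 100.0,
    "fine pitch, 10 um": 10.0,
    "fine pitch, 2 um": 2.0,
}


def connections_per_mm2(pitch_um: float) -> float:
    """Connections per mm^2 for a square grid at the given pitch (micrometres)."""
    per_mm = 1000.0 / pitch_um  # connections along 1 mm
    return per_mm ** 2


for name, pitch in PITCH_UM.items():
    print(f"{name:>18}: {connections_per_mm2(pitch):>10.2f} connections per mm^2")
```

Under this assumption, moving from a 100 $\mu$m C4 pitch to a 10 $\mu$m pitch raises the areal connection density by a factor of 100, since density scales with the inverse square of the pitch.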

To increase the bandwidth through such links, serialization and deserialization circuits, commonly known as SerDes, are implemented. These circuits have complex transmitters and receivers with equalizers to ensure signal integrity over such long data links. They occupy a substantial portion of the real estate on the die and consume significant power, which can be as high as 30% of the total chip power. The energy per bit of SerDes based high speed links can be >23 pJ/b [1]. Today's systems demand on-chip like fine pitch interconnect densities of $\leq 10\ \mu$m to meet the ever-growing bandwidth requirements. Several technologies have been proposed in the recent past with the goal of achieving high performance systems. System-on-Chip (SoC) and Interposer technologies are some of the solutions to increase the data bandwidth.

#### 1.1.1 System-on-Chip

The System-on-Chip (SoC) approach offers a solution to integration problems by designing and fabricating an entire system with different IP blocks on a single silicon die. The availability of fine pitch wiring on a single chip provides the opportunity to achieve high bandwidth. Further, due to short inter-block spacings, the channel loss and latency are low, resulting in energy efficient data transfer. However, the design complexity and the time to design such systems are very high. SoCs require a large die size to accommodate all the functional blocks, which is a major concern for manufacturing yield, and the approach is not scalable to large systems.

Wafer scale integration (WSI) is another approach to integrate large systems on a single wafer to reduce interconnect energy and latency [2]. This helps in realizing better performance and reduced cost of packaging. Despite significant efforts, wafer scale chip integration has not been practically realized. Low yield of manufacturing a massive chip, interconnect reliability, performance variability due to variation across wafer and so on are the major challenges.

#### 1.1.2 Interposer

Interposer technology has been presented as a high interconnect density redistribution layer (RDL) between dielets [3, 4, 5, 6, 7, 8]. Several substrates have been proposed, including silicon, glass, and organic interposers. The interconnect pitch in silicon interposer technology is about 40 $\mu$m. However, today's bandwidth requirements demand an even finer interconnect pitch of $\leq 10\ \mu$m. The size of the interposer is also limited, restricting the scalability. The interposer is finally connected to an organic board using solder interconnects. This requires Through Silicon Vias (TSV) in the interposer. The cost of the TSVs also restricts the widespread applicability of the technology.

### 1.2 Heterogeneous Integration

Large and complex systems require the incorporation of different technologies like CMOS, III-V, MEMS and so on for the integration of logic circuits, power devices, sensors, and others. SoCs are inherently made in one technology, which is not always optimal for system level integration [9]. The authors of [9] discuss that high-performance interconnect fabrics could be used to integrate the processor and the L3 cache tightly, incurring minimum latency while providing desirable bandwidth. Thus, a dielet based assembly approach that can integrate dies manufactured in different technologies is beneficial. A heterogeneous integration scheme that is agnostic to the dielet fabrication technology provides opportunities for the development of novel energy efficient high-performance system architectures. Several past works [10, 11, 12] have focused on integrating components from different technologies onto a chip; however, it remains a very costly and tedious challenge.

### 1.3 Contribution of this Work

This research focuses on the development of a novel integration technology and protocol for system integration that achieves higher data bandwidth and energy efficient data transfer, and that incorporates heterogeneity. The main objective was to scale the packaging technology to provide SoC-like interconnections for high performance and efficiency. In this work, the Silicon Interconnect Fabric (Si-IF) technology was developed with silicon as the substrate. A direct metal-metal Thermal Compression Bonding (TCB) process was developed to replace the solder interconnects. A novel heterogeneous integration protocol called the Simple Universal Parallel intERface for CHIPS (SuperCHIPS) is proposed. The links in Si-IF were modelled and simulated using 3D EM solvers to study the transfer characteristics and signal integrity. The current carrying capacity of the interconnects was investigated with Finite Element Models (FEM). The latency, bandwidth, and power benefits of the SuperCHIPS protocol compared to the existing integration technologies were estimated. Fine interconnect pitch (10 $\mu$m) and small inter-dielet spacing ($\leq 100\ \mu$m) were experimentally demonstrated, and continuity of the daisy chains with interconnects was shown.

### 1.4 Organization of this Thesis

This thesis is organized as follows: Chapter 2 introduces the SuperCHIPS protocol and describes its key technological enablers, i.e. the Si-IF technology and the TCB process. The modelling and performance analysis of the Si-IF links and the copper pillar interconnects are presented in Chapter 3. Chapter 4 illustrates the benefits of the SuperCHIPS integration protocol and compares it with traditional integration technologies. The experimental demonstration of fine pitch interconnects and the results of electrical continuity tests are presented in Chapter 5. The conclusion of this thesis is given in Chapter 6.

# CHAPTER 2

# Simple Universal Parallel intERface for Chips (SuperCHIPS) Protocol

### 2.1 SuperCHIPS protocol

The SuperCHIPS integration protocol is aimed at providing a scalable platform for integrating systems. It leverages the benefits of fine pitch interconnect technology to achieve system performance similar to that of SoCs. It incorporates the concepts of IP-reuse and heterogeneous architecture for optimal system performance and ease of design.

In the SuperCHIPS protocol, small unpackaged dielets (1-25 mm<sup>2</sup> area) are assembled on the substrate at close proximity ($\leq 100\ \mu$m) and interconnected at SoC-like wiring pitches. Die-level packaging is eliminated to achieve low inter-dielet spacing, and the packaging is done at system level. This corresponds to data links between dielets that are only 50-500 $\mu$m long. These lengths are significantly shorter than those in PCBs, which can be several centimeters long. As a result, the channel loss and link latency overheads are greatly reduced, allowing simple inverters to be used as drivers. Thus, the need for complex transmitter and receiver circuitry is eliminated, which significantly reduces the power consumption and the real estate for I/O circuitry. The interconnects between the dielets and the substrate are at fine pitches (2-10 $\mu$m), providing more data links than the existing technologies mentioned earlier. This is achieved by eliminating solder and using direct metal-to-metal TCB [13] between metal pillars on the substrate and metal pads on the dielets. Solder extrusion and intermetallic formation at the interface are eliminated by this process. With the availability of a large number of data links, each link can be operated at a lower frequency while still achieving a higher bandwidth per mm of the edge length of the die. Thus, the need for serialization and deserialization of data is eliminated by parallelizing data transfer. The high interconnect density and short data links provide the opportunity to use simple inverter drivers instead of SerDes circuits, leading to a 20x - 80x improvement in energy efficiency. The schematic of the SuperCHIPS integration on the substrate is shown in fig 2.1.



Figure 2.1: Schematic of SuperCHIPS integration with dielets mounted on Si-IF
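The trade-off between link count and per-link rate described above can be sketched with a short calculation. This is a rough illustration under stated assumptions, not the bandwidth model used in this thesis: the function `edge_bandwidth_gbps_per_mm`, the number of wire rows, the fraction of wires carrying signals (as opposed to ground), and the per-link data rates are all hypothetical parameters chosen for the example.

```python
def edge_bandwidth_gbps_per_mm(pitch_um: float,
                               link_rate_gbps: float,
                               rows: int = 1,
                               signal_fraction: float = 1.0) -> float:
    """Aggregate bandwidth per mm of die edge for parallel links.

    pitch_um        : wire pitch along the die edge, in micrometres
    link_rate_gbps  : data rate of each individual link (assumed value)
    rows            : number of wiring rows escaping the edge (assumed value)
    signal_fraction : share of wires that carry data rather than ground
                      (assumed value, e.g. 0.5 for alternating signal/ground)
    """
    links_per_mm = (1000.0 / pitch_um) * rows * signal_fraction
    return links_per_mm * link_rate_gbps


# Many slow parallel links at the fine pitches discussed in the text.
for pitch in (10.0, 2.0):
    bw = edge_bandwidth_gbps_per_mm(pitch, link_rate_gbps=1.0, signal_fraction=0.5)
    print(f"pitch {pitch:>4} um, 1 Gbps per link, half the wires as ground: "
          f"{bw:.0f} Gbps per mm")
```

The point of the sketch is only that the aggregate bandwidth scales with the inverse of the pitch, so a large number of low-rate links can match or exceed a few serialized high-rate lanes without the SerDes overhead.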

Some of the key technological enablers for the SuperCHIPS protocol are listed below:

- Development of a superior substrate that allows for fine interconnect and trace pitches. The Si-IF substrate was developed to address this problem.
- Development of a metal-metal TCB process with low bonding pressure and temperature in ambient atmosphere for fine pitch interconnects. Fabrication and bonding techniques were developed to overcome this challenge.
- Development of advanced placement techniques for assembly of dielets at close proximity of ≤100 µm. State of the art tools and alignment techniques were used to achieve this objective.

### 2.2 Silicon Interconnect Fabric (Si-IF)

To realize the fine interconnect pitch and high interconnect density, the Si-IF substrate was developed to replace organic substrates. Si-IF is a rigid silicon (Si) based substrate with a silicon dioxide (SiO<sub>2</sub>) dielectric layer. The SiO<sub>2</sub> layer acts as the interconnection platform for the entire system. The wiring in the dielectric layer is compatible with the mature Back End of the Line (BEOL) technology used in CMOS fabrication. The traces are dual damascene processed copper. The top metal layer of the Si-IF is terminated with copper pillars that can be TCB bonded to copper pads on the dielets. A test Si-IF structure with dielets assembled at an inter-dielet spacing of 100 $\mu$m is shown in fig 2.2.



Figure 2.2: Test Si-IF with dielets assembled at inter-dielet spacing of 100  $\mu$ m

#### 2.2.1 Thermomechanical Properties

Silicon was chosen as the substrate for its superior thermomechanical properties compared to organic substrates.

- Si-IF is rigid and mechanically robust compared to organic substrates.
- Silicon is a good thermal conductor, with a thermal conductivity of 149 W m<sup>-1</sup> K<sup>-1</sup>. This is about 600x higher than that of typical organic substrates like FR4, which has a thermal conductivity of 0.25 W m<sup>-1</sup> K<sup>-1</sup> [14]. This ensures good heat dissipation.
- Since most of the dielets are made in Si technology, the coefficient of thermal expansion (CTE) mismatch between the dielets and the substrate is significantly reduced compared to that of organic substrates, which have a 4x higher CTE.
- Low CTE mismatch ensures few Chip-Package-Interaction (CPI) related failures.
- By using direct metal-metal TCB, the need for complex Under Bump Metallurgy (UBM) is eliminated, reducing fabrication overhead and cost.

#### 2.2.2 Electrical Properties

Silicon substrate allows opportunities for fabrication of fine traces like those on chip with superior electrical performance.

- BEOL fabrication techniques allow for fine trace dimensions of ≤1 μm and trace pitch of ≤1.5 μm. Compared to minimum dimensions of traces in PCB which are in the order of ≈100 μm, Si-IF traces are 100x smaller.
- The Si-IF can have up to four levels of wiring which should be sufficient for integration of large systems. This is however not a fundamental limit and Si-IF can be extended to have more wiring levels.
- With the elimination of solder and intermetallics in the interconnects, the current carrying capacity is 2 orders of magnitude higher in the case of direct Cu-Cu bonded interconnects.
- Fine copper pillar dimensions also have lower current crowding ratio that is 30x lower compared to C4 solder bumps. This significantly reduces electromigration related failures.
- The use of SiO<sub>2</sub> as the dielectric layer helps to reduce cross-talk between links and improves signal integrity.

The superior thermomechanical and electrical properties of the Si-IF substrate are essential to realize the SuperCHIPS integration protocol.

### 2.3 Thermal Compression Bonding

Direct metal-metal bonding technology is a key enabler for fine pitch interconnects, as mentioned earlier, because it eliminates the bridging and voids found in solder based interconnects. In this work, a TCB process for direct copper to copper (Cu-Cu) bonding is investigated. In the TCB process, pressure and temperature are applied between the dielet and the Si-IF substrate for bonding. A schematic of the TCB experimental setup is shown in fig 2.3.



Figure 2.3: Schematic of TCB process experimental setup

The two major conditions required for a successful TCB are listed below:

- The metal surfaces have to be atomically flat.
- The metal surfaces have to be pristine i.e. the surfaces must be free of native oxides and contaminations.

Atomically flat surfaces can be achieved by a good Chemical Mechanical Polishing (CMP) process. Pristine surfaces are, however, hard to achieve and depend on the metals used. Ideally, direct Cu-Cu TCB is preferred for the best electrical performance. However, Cu has a high rate of oxidation [15, 16, 17] and hence bonding in ambient atmosphere is difficult. In-situ formic acid treatment is a viable option to realize direct Cu-Cu TCB. In this work, a thin layer of nickel-gold (Ni-Au) was used as a protection layer and Au-Au TCB was performed as an interim solution to achieve fine pitch interconnects. However, the modelling of the interconnects was done assuming that perfect Cu-Cu bonding is realizable, and the properties should not deviate too much from those of the interim solution. With precise control of pressure and temperature, the surface asperities can be overcome to achieve a good metal-metal bond. In a dielet based approach, the dielet must be precisely aligned and assembled on the substrate, which is also a major challenge for fine pitch interconnects and is addressed in this work.

# CHAPTER 3

# Si-IF Interconnect Modelling

### 3.1 Si-IF vs PCB links

The SuperCHIPS protocol relies on short wire lengths for low loss and energy efficient signal transfer. These short wire lengths cannot be realized in conventional PCB based designs because die-level packaging limits the minimum inter-dielet spacing. Hence, links on PCBs are typically several centimeters long. In the SuperCHIPS protocol, due to the small inter-dielet spacing of $\leq 100\ \mu$m, the lengths of the links in Si-IF are typically within 50 $\mu$m to 500 $\mu$m. Hence the links in Si-IF are 10-100x shorter than those in PCBs.

Due to the long wire lengths in PCBs, the parasitic inductances of such links are high, along with the parasitic capacitances. The inductance becomes significant, and a transmission line model needs to be used, for wires longer than 1/10th of the wavelength ($\lambda$) of the propagating EM wave. For a 1 GHz signal, the inductance becomes significant for 1 cm ($\lambda/10$) Cu wires. PCB wires are typically longer than this value, and hence the links behave as transmission lines and are modelled as shown in fig 3.1. For 100 $\mu$m ($\lambda/10$) wires of Cu in SiO<sub>2</sub>, the inductance becomes significant only at 100 GHz. Therefore, links in Si-IF show RC like behavior with negligible parasitic inductances. Hence a simple lumped RC model can be used to model the links in Si-IF, as shown in fig 3.2.
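The $\lambda/10$ rule of thumb above can be checked with a short calculation. The sketch below is illustrative only: it assumes a TEM-like wave whose velocity is set by the relative permittivity of the dielectric alone, and the permittivity values (about 4.4 for FR4 and about 3.9 for SiO<sub>2</sub>) are typical textbook figures assumed here, not values taken from this thesis. The function names are hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

# Assumed typical relative permittivities (not taken from this thesis).
EPS_R_FR4 = 4.4
EPS_R_SIO2 = 3.9


def critical_length_m(freq_hz: float, eps_r: float) -> float:
    """Length equal to one tenth of the wavelength in the dielectric."""
    wavelength = C0 / (freq_hz * math.sqrt(eps_r))
    return wavelength / 10.0


def is_electrically_short(length_m: float, freq_hz: float, eps_r: float) -> bool:
    """True if the wire is shorter than lambda/10, so a lumped model is reasonable."""
    return length_m < critical_length_m(freq_hz, eps_r)


# PCB trace at 1 GHz: the threshold is on the order of a centimetre.
print(f"FR4,  1 GHz:   {critical_length_m(1e9, EPS_R_FR4) * 1e2:.2f} cm")
# Si-IF wire at 100 GHz: the threshold is on the order of 100 micrometres.
print(f"SiO2, 100 GHz: {critical_length_m(100e9, EPS_R_SIO2) * 1e6:.0f} um")
# A 100 um Si-IF link at 10 GHz is well inside the lumped regime.
print(is_electrically_short(100e-6, 10e9, EPS_R_SIO2))
```

With these assumed permittivities the thresholds come out at roughly 1.4 cm and 150 $\mu$m, which agrees with the order-of-magnitude figures quoted in the text.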

Figure 3.1: Transmission Line model

Figure 3.2: Lumped RC circuit model for Si-IF link

Signal transfer in a transmission line requires impedance matching to reduce reflections at both the transmitter and receiver ends. Reflections due to impedance mismatch lead to inter-symbol interference (ISI) between consecutive data bits. Complex equalizers [1] are used to reduce the ISI, which adds overhead in driver power and area. In Si-IF, since the links behave like lumped RC circuits instead of transmission lines, the reflections due to impedance mismatch are eliminated. Therefore, there is no longer a need for equalization, which simplifies driver design and reduces power and area. Also, since PCB links require equalization and serialization techniques, the data must be transferred synchronously. With the RC behavior of Si-IF links, the data can also be transferred asynchronously.

### 3.2 Si-IF link model

3-D models of Si-IF links were designed and simulated in Electromagnetic (EM) solvers like ANSYS HFSS to study the signal transfer characteristics. For the models, direct Cu-Cu bonding with no additional metal layers and no intermetallics at the interface was assumed for ease of simulation. Also, perfect bonding at the interface with no voids was assumed, which allowed the bulk properties of Cu to be applied to interconnects across the interface. The dielets were placed in close proximity (<100 $\mu$m), so that wire lengths of 100 - 500 $\mu$m are realizable. The simulated Si-IF structure is shown in fig 3.3. The bottom substrate is Si with a SiO<sub>2</sub> dielectric layer. For ease of simulation, a single Cu metal layer for data links inside the dielectric layer is analyzed. However, for a real system, four or more wiring levels are possible and the characteristics should not deviate too much from those of the simulated structure. The top layer of the Si-IF is terminated with Cu pillars that protrude out of the surface. The top dielets also consist of Si and a SiO<sub>2</sub> dielectric layer. The top layer of the dielets is terminated with Cu pad openings that are flip-chip TCB bonded to the Cu pillars.



Figure 3.3: Structure of the model used to simulate link characteristics

Different models with varying lengths, pitches, and configurations were designed to analyze insertion loss and cross-talk of the links. The dimensions of the layers used in simulations is shown in Table 3.1. The Si layer thickness is lower than expected for ease of simulation. The Cu pads in the top dielet act as the terminal for the EM wave excitation. Different terminal configurations were investigated for insertion loss and cross-talk estimation. In this work, the results of three configurations are presented namely

1. Ground-Signal-Ground (GSG)
2. Ground-Signal-Signal-Ground (GSSG)
3. Ground-Signal-Signal-Signal-Signal-Ground (GSSSSG)



Figure 3.4: Different Wire configurations studied

The configurations are shown in fig 3.4. Traditional PCBs use ground planes as a return path for the signal. However, in Si-IF the grounds (return paths) are also wires like the signal wires. Both the signal (forward path) and ground (return path) wires have the same dimensions, as listed in Table 3.1. For the boundary condition of the simulations, the bottom of the Si substrate is assumed to be grounded. The silicon in the Si-IF substrate was assumed to be p-type with a doping concentration of $10^{15}$ cm<sup>-3</sup>.

| Component                      | Thickness ($\mu$m) | Width ($\mu$m) | Pitch ($\mu$m) |
|--------------------------------|--------------------|----------------|----------------|
| Copper Pillar                  | 5                  | 1, 5           | 2, 10          |
| Copper Data Link               | 1                  | 1              | 1.5, 2, 10     |
| Si Substrate                   | 50                 | 50             | N.A.           |
| SiO<sub>2</sub> dielectric layer | 20               | 50             | N.A.           |
| Air Gap                        | 2                  | 50             | N.A.           |

Table 3.1: Simulated Model Dimensions

#### 3.2.1 Digital signal transfer

As mentioned earlier, the resistance and capacitance are the only significant contributors to the frequency response of the Si-IF links. Fig 3.5 shows the insertion loss of the links with pillar diameter and wire width of 1  $\mu$ m, and with pitch of 2  $\mu$ m for different wire lengths. Fig 3.6 shows the insertion loss of the links with wire width of 1  $\mu$ m, pillar diameter of 5  $\mu$ m and pitch of 10  $\mu$ m. The plots indicate that the 100  $\mu$ m and 500  $\mu$ m wires behave as a lumped RC circuit for most of the frequencies of interest (0.1 - 100 GHz), while the 1 mm wires deviate from RC behavior at high frequencies due to larger inductances. The Si-IF data links show excellent signal transfer with insertion loss of less than -2 dB for 500  $\mu$ m wires even for frequencies up to 100 GHz. The insertion losses are significantly lower compared to other technologies [6, 7, 8, 18, 19]. The excellent characteristics signify the importance of short wire lengths for signal transfer. The low loss eliminates the need for complex transmitter or receiver circuits. Simple tapered buffers can be used as drivers in such a system.



Figure 3.5: Insertion Loss for 2  $\mu$ m interconnect pitch



Figure 3.6: Insertion Loss for 10  $\mu$ m interconnect pitch

#### 3.2.2 Analog signal transfer

For analog RF signal frequencies (10 GHz - 1 THz), the GSG configuration of links was analyzed. At very high frequencies (>50 GHz), depending on the wire length, the wires may behave like transmission lines due to self-inductance. The characteristic impedance is not clearly defined for short Si-IF links and is a valid concept only for longer links. To study the link behavior, coplanar GSG link traces were designed for long-wire characteristic impedances of 50 $\Omega$ and 100 $\Omega$. The wire width was calculated to be 6 $\mu$m, and the wire spacing was 3 $\mu$m and 7 $\mu$m respectively [20]. The insertion loss for different terminations and wire lengths is shown in fig 3.7. The simulated insertion loss is less than -3 dB even for THz signals for the 100 $\mu$m wire. Therefore, high-frequency signals of up to a few THz can be transferred with low loss using short channels on Si-IF. The termination of 100 $\Omega$ may also be realized using Si-IF.



Figure 3.7: Insertion Loss at different characteristic impedances and lengths

#### 3.2.3 Cross-talk

The cross-talk between Si-IF links is predominantly due to the capacitive coupling of the coplanar traces. In this work, the objective was to estimate the cross-talk between parallel links in the same layer. For cross-talk analysis, a staggered pillar array arrangement was modelled, as shown in the GSSSSG configuration (fig 3.4c), with different pitches. The width of the wire is 1 $\mu$m. The near-end cross-talk (NEXT) between links with non-shared grounds is shown in fig 3.8, and the NEXT between links with shared grounds is shown in fig 3.9. The far-end cross-talk (FEXT) between links with non-shared grounds is shown in fig 3.10, and the FEXT between links with shared grounds is shown in fig 3.11.

The simulations show that the NEXT and FEXT between links with non-shared ground are less than -20 dB for all pitches even at very high frequencies. The worst-case NEXT between links with shared ground is less than -15 dB at 10 GHz and -5 dB at 100 GHz. The worst-case FEXT is less than -20 dB at 10 GHz and less than -12.5 dB at 100 GHz. The NEXT and FEXT between links with shared ground are higher due to the ground bounce effect. These values are lower than the typical acceptable cross-talk of -12 dB. The low cross-talk can be attributed to the good dielectric insulation properties of SiO<sub>2</sub> and the short channel lengths in Si-IF. The simulated cross-talk is similar to the cross-talk between on-chip wires in top metal layers [21, 22], which is expected in this technology.

Figure 3.8: NEXT for signals without shared ground

Figure 3.9: NEXT for signals with shared ground

Figure 3.10: FEXT for signals without shared ground

Figure 3.11: FEXT for signals with shared ground
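To give a feel for the cross-talk levels quoted in this section, the short sketch below converts a level in dB to a linear amplitude ratio. It assumes the cross-talk figures are S-parameter magnitudes, i.e. 20·log10 of a voltage ratio; the function name is illustrative and not part of the simulation flow used in this work.

```python
def db_to_voltage_ratio(level_db: float) -> float:
    """Convert a magnitude in dB (20*log10 convention) to a linear voltage ratio."""
    return 10.0 ** (level_db / 20.0)


# Cross-talk levels mentioned in the text, in dB.
levels_db = [-20.0, -15.0, -12.5, -12.0, -5.0]

for level in levels_db:
    ratio = db_to_voltage_ratio(level)
    print(f"{level:6.1f} dB -> {100.0 * ratio:5.1f}% of the aggressor amplitude")
```

Under this convention, -20 dB corresponds to a coupled amplitude of 10% of the aggressor signal, and the -12 dB acceptance level to roughly 25%.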

### 3.3 Signal Integrity Analysis

The simulated model of the links was meshed and exported to the ANSYS Q3D Extractor software to extract the lumped RLGC parameters. To make a meaningful estimate of the parasitics, a fixed wire width of 1 $\mu$m and interconnect pitches of 2 $\mu$m and 10 $\mu$m were assumed. The pillar diameters were 1 $\mu$m and 5 $\mu$m respectively. The extracted parameters are shown in Table 3.2. As expected from the transfer characteristics, the extracted parasitic inductances are much lower than those of PCB links, which supports the assumption of using just R and C to model the links. Also, as shown in Table 3.2, the parasitic capacitance and resistance of Si-IF links are much lower than those discussed in [23] for interposers. Since the pad openings in this technology are much smaller than those in traditional packages, the parasitic pad capacitances are also low.

| Interconnect Pitch ($\mu$m) | Wire length ($\mu$m) | R @ 1 GHz ($\Omega$) | L (nH) | C (fF) |
|-----------------------------|----------------------|----------------------|--------|--------|
| 2                           | 100                  | 2.09                 | 0.10   | 17.3   |
| 2                           | 500                  | 9.33                 | 0.68   | 79.23  |
| 10                          | 100                  | 1.89                 | 0.10   | 8.54   |
| 10                          | 500                  | 8.85                 | 0.54   | 34.10  |

Table 3.2: RLGC Extraction
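As a rough illustration of how small the wire parasitics in Table 3.2 are, the sketch below computes the wire-only time constant $\tau = RC$ and the corresponding first-order RC corner frequency for each row. This is a back-of-the-envelope check, not the extraction flow itself: it ignores the driver resistance, the ESD capacitance and the inductance, all of which matter in the full link.

```python
import math

# (pitch_um, length_um, R_ohm, L_nH, C_fF), copied from Table 3.2.
TABLE_3_2 = [
    (2, 100, 2.09, 0.10, 17.3),
    (2, 500, 9.33, 0.68, 79.23),
    (10, 100, 1.89, 0.10, 8.54),
    (10, 500, 8.85, 0.54, 34.10),
]


def wire_time_constant_ps(r_ohm: float, c_ff: float) -> float:
    """Wire-only RC time constant in picoseconds (ohm * fF = fs)."""
    return r_ohm * c_ff * 1e-3


def rc_corner_ghz(r_ohm: float, c_ff: float) -> float:
    """First-order corner frequency 1 / (2*pi*R*C) in GHz."""
    tau_s = r_ohm * c_ff * 1e-15
    return 1.0 / (2.0 * math.pi * tau_s) / 1e9


for pitch, length, r, _l, c in TABLE_3_2:
    print(
        f"pitch {pitch:2d} um, length {length:3d} um: "
        f"tau = {wire_time_constant_ps(r, c):.4f} ps, "
        f"corner = {rc_corner_ghz(r, c):.0f} GHz"
    )
```

Even for the 500 $\mu$m wires the wire-only time constant stays below 1 ps, which is why the driver and the ESD load, rather than the wire, set the link speed in the analysis that follows.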

The extracted parameters were exported into a circuit simulator for eye diagram simulations. The lumped equivalent circuit model used to simulate an end-to-end link with transceivers is shown in fig 3.12. A commercial 45 nm technology library was used to design the transceivers, and HSPICE L-2016.06 was used for circuit-level simulations. The driver is a tapered buffer whose last stage has an NMOS width of 1 $\mu$m. This ensures good signal slew while providing the required drive strength. The receiver circuit is a simple buffer.

The pads on the dielets must be protected from ESD events that might damage the built-in devices. Typically, ESD protection capacitors are used for this purpose. Due to the low contact area per interconnect and the dielet handling in this technology, the ESD protection capacitance needed for dielets is expected to be lower than that in traditional packages. A total ESD protection capacitance of 50 fF was assumed in the simulations, applied as an additional load at both the transmitter output and the receiver input terminals.

Two different pitches were simulated: a 2 $\mu$m pitch channel with a wire length of 100 $\mu$m, and a 10 $\mu$m pitch channel with a wire length of 500 $\mu$m. The eye diagrams at the output of the receiver buffer are shown at an operating frequency of 10 GHz. The rise and fall times were assumed to be 10% of the Unit Interval (UI), and the duty cycle distortion was assumed to be 10% of UI. The eye diagrams are shown in fig 3.13 and 3.14. The eye-opening height is 997 mV for both the 2 $\mu$m and 10 $\mu$m pitch channels. The eye-opening width is 68.4 ps and 59.81 ps for the 2 $\mu$m and 10 $\mu$m pitch channels, respectively. The figures show a clear eye opening for both channels, illustrating that data rates of >10 Gbps can easily be achieved on Si-IF with short wire lengths and fine interconnect pitch.
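The eye widths reported above can be put in perspective by expressing them as a fraction of the unit interval. The sketch below assumes that the 10 GHz operating frequency corresponds to a 10 Gbps data rate, i.e. a UI of 100 ps; that mapping is an assumption made here for illustration, and the helper names are hypothetical.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Bit period in picoseconds for a given data rate in Gbps."""
    return 1000.0 / data_rate_gbps


def eye_width_ui(eye_width_ps: float, data_rate_gbps: float) -> float:
    """Eye width expressed as a fraction of the unit interval."""
    return eye_width_ps / unit_interval_ps(data_rate_gbps)


DATA_RATE_GBPS = 10.0  # assumed: 10 GHz operating frequency taken as 10 Gbps

# Eye widths quoted in the text for the two simulated channels.
for label, width_ps in [("2 um pitch, 100 um wire", 68.4),
                        ("10 um pitch, 500 um wire", 59.81)]:
    print(f"{label}: {eye_width_ui(width_ps, DATA_RATE_GBPS):.3f} UI")
```

Under that assumption, the two channels keep roughly 0.68 UI and 0.60 UI of horizontal eye opening.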



Figure 3.12: Schematic of circuit used for signal integrity analysis



Figure 3.13: Eye-diagram of 2  $\mu$ m pitch interconnect at 10 GHz input frequency



Figure 3.14: Eye-diagram of 10  $\mu$ m pitch interconnect at 10 GHz input frequency

### 3.4 Copper Pillar Interconnect Model

To study the current carrying capacity of the Cu pillar interconnects in Si-IF, an ANSYS Maxwell 3D model was designed and simulated. The model consists of a section of Si-IF with Si and SiO<sub>2</sub> layers, terminated with Cu pillars and Cu pads at the bottom. The pillar diameter was varied from 1 $\mu$m to 10 $\mu$m, and the pad width was varied accordingly such that there is a 1 $\mu$m overlay tolerance on either side of the pillar. The thickness of the pillar is 5 $\mu$m and that of the pad is 1 $\mu$m. The other end of the pillar is bonded to a Cu pad on the dielet, which is also a Si die with a SiO<sub>2</sub> layer. The air gap between the Si-IF and the dielet is 2 $\mu$m. The schematic of the copper pillar interconnect is shown in fig 3.15. The Cu-Cu bond is assumed to be perfect, with no voids or intermetallics, for ease of simulation. A uniform current excitation of 100 mA is applied at one end of the pad, and the current distribution across the pillar is analyzed. The current distribution in the 5 $\mu$m pillar is shown in fig 3.16. From the current distribution, it is evident that the current crowding decreases as the pillar diameter decreases.

To compare the current crowding effect with traditional C4 bumps, a C4 bump model was also simulated. The model consists of top dielet with aluminum (Al) pad opening and Cu UBM layer with tin (Sn) solder [24]. The thickness of Al is 1  $\mu$ m and Cu UBM is 3  $\mu$ m.



Figure 3.15: Copper Pillar Interconnect Model



Figure 3.16: Current distribution in 5  $\mu$ m diameter copper pillar interconnect

The solder ball diameter is varied from 50 $\mu$m to 150 $\mu$m. Between the copper and the solder, there is a 1 $\mu$m thick layer of Cu<sub>6</sub>Sn<sub>5</sub> intermetallic. The bottom substrate consists of a Cu trace terminated with Ni. The thickness of the Cu trace is 35 $\mu$m, which is the typical thickness of traces in organic substrates, and the Ni thickness is 4 $\mu$m. The solder forms a NiSn intermetallic of thickness 1 $\mu$m. The model is shown in fig 3.17, and the current distribution in the 50 $\mu$m C4 bump is shown in fig 3.18. The current distribution shows the maximum current density at the Al pad opening, which is expected, and the current crowding ratio is 60, which is 12x higher than that of the 1 $\mu$m pillar. $\mu$-bumps with copper pillars and solder caps were also studied with appropriate structures not shown in this thesis.

Figure 3.17: C4 bump Interconnect Model



Figure 3.18: Current distribution in 50  $\mu$ m diameter C4 bump in log scale

The current crowding ratio is defined as the ratio between the maximum current density in the interconnect and the average current density across the interconnect. The comparison of current crowding effects is shown in fig 3.19. The plot shows that the current crowding ratio decreases as the interconnect diameter decreases. The current crowding ratio in direct Cu-Cu bonded pillars is 12x - 30x lower than that in solder-based interconnects [25, 26, 27, 28]. The maximum current density before electromigration in Cu is $5.7 \times 10^6$ A/cm<sup>2</sup>, which is two orders of magnitude higher than the $3.2 \times 10^4$ A/cm<sup>2</sup> of solder bumps [29]. This illustrates the higher current carrying capability of Cu pillars over solder-based interconnects. The area required to supply 30 A of current for different interconnect pitches is shown in fig 3.20. The interconnect pitch is assumed to be roughly twice the pillar diameter. The plot shows that, using fine pitch interconnects, the same supply current can be delivered with a smaller area overhead, with a very low IR drop of less than 0.75 mV per interconnect at maximum current density.
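The definitions above can be turned into a simple first-order estimate of how many interconnects, and how much area, a 30 A supply needs. The sketch below is an illustrative calculation under stated assumptions, not the method used to generate fig 3.20: the peak current density is capped at the electromigration limit quoted in the text, the allowable average density is that limit divided by the crowding ratio, and the pitch is taken as twice the diameter. The function names and the default crowding ratio of 1 are hypothetical.

```python
import math

J_MAX_CU = 5.7e6      # A/cm^2, Cu limit quoted in the text
J_MAX_SOLDER = 3.2e4  # A/cm^2, solder limit quoted in the text


def crowding_ratio(j_max: float, j_avg: float) -> float:
    """Current crowding ratio: peak density over average density."""
    return j_max / j_avg


def cross_section_cm2(diameter_um: float) -> float:
    """Circular cross-section area of an interconnect in cm^2."""
    return math.pi * (diameter_um * 1e-4) ** 2 / 4.0


def interconnects_needed(total_current_a: float, diameter_um: float,
                         j_limit: float, ratio: float = 1.0) -> int:
    """Interconnect count so that the peak density stays below j_limit."""
    allowed_avg_density = j_limit / ratio
    current_each = allowed_avg_density * cross_section_cm2(diameter_um)
    return math.ceil(total_current_a / current_each)


def footprint_mm2(count: int, diameter_um: float) -> float:
    """Array area in mm^2, assuming pitch = 2 x diameter on a square grid."""
    pitch_mm = 2.0 * diameter_um * 1e-3
    return count * pitch_mm ** 2


n_cu = interconnects_needed(30.0, 5.0, J_MAX_CU)
n_solder = interconnects_needed(30.0, 50.0, J_MAX_SOLDER)
print(f"5 um Cu pillars:  {n_cu} pillars, {footprint_mm2(n_cu, 5.0):.4f} mm^2")
print(f"50 um solder:     {n_solder} bumps, {footprint_mm2(n_solder, 50.0):.3f} mm^2")
```

Passing a measured crowding ratio as `ratio` tightens the estimate, since a higher ratio lowers the average current each interconnect may carry.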



Figure 3.19: Plot of current crowding ratio variation across interconnect diameter



Figure 3.20: Area required for 30A current supply vs interconnect pitch

# CHAPTER 4

# **Benefits of SuperCHIPS Protocol**

In this chapter, the latency, bandwidth, and power benefits of the SuperCHIPS integration protocol are estimated and compared with conventional PCB-based integration technology, interposer technology, and SoC systems. Analyses were done for 2 $\mu$m and 10 $\mu$m interconnect pitch channels, with the pillar diameter being half the pitch. The wire width of the channels was 1 $\mu$m. SuperCHIPS provides a protocol based on fine pitch integration of systems, where the inter-dielet spacing is $\approx 10$ - 20x smaller than in conventional packaged systems on PCB. As shown earlier, the small inter-dielet spacing provides superior data transfer characteristics due to the availability of short channels, which is a key enabler for high bandwidth. The fine pitch interconnects provide $\approx 15$ - 80x more I/O pins than BGA interconnects and $\approx 2$ - 10x more than Cu $\mu$-bumps [30]. This ensures that the bandwidth requirement per link is correspondingly lower than that in traditional technologies.

### 4.1 Latency

For a first-order estimate of latency in Si-IF data links, the Elmore delay [31] is calculated from the last stage of the driver to the receiver input. Driver resistance values for the 32 nm technology node were used for the calculations. The last-stage driver was assumed to be 3.2 $\mu$m wide. The driver resistance and gate capacitance values are given in [32]. The 2 $\mu$m interconnect pitch links were assumed to have a wire length of 100 $\mu$m, while the 10 $\mu$m interconnect pitch links were assumed to have a wire length of 500 $\mu$m. The latency was estimated for both cases, with and without ESD protection capacitors at the terminals of the transmitter and receiver. The ESD capacitance was assumed to be 50 fF at each terminal. The overall latency in SuperCHIPS with ESD, depending on the technology and driver design, ranges from 50 - 100 ps and is dominated by the external ESD capacitance. Without ESD, the latencies can go as low as 30 - 40 ps. The comparison of latencies is presented in fig 4.1. The latencies without ESD are close to the SoC latencies. Even with ESD, the latencies are 13x lower than those of other integration technologies.
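The Elmore estimate described above can be sketched as follows. The link is modelled as a driver resistance feeding an ESD capacitor, a wire represented by a single $\pi$ section, and a second ESD capacitor in parallel with the receiver gate. The wire values come from Table 3.2; the driver resistance (300 $\Omega$) and receiver gate capacitance (5 fF) are placeholders chosen only for illustration and are not the values from [32], so the printed delays are not expected to reproduce the numbers in this chapter.

```python
def elmore_delay_ps(r_drv_ohm: float, r_wire_ohm: float, c_wire_ff: float,
                    c_rx_ff: float, c_esd_ff: float = 0.0) -> float:
    """Elmore delay (ps) of driver -> ESD cap -> pi-model wire -> ESD cap + receiver.

    The wire capacitance is split equally between the near and far nodes.
    """
    c_near = c_esd_ff + c_wire_ff / 2.0            # node at the driver output
    c_far = c_wire_ff / 2.0 + c_esd_ff + c_rx_ff   # node at the receiver input
    # The driver resistance charges both nodes; the wire resistance only the far one.
    tau_fs = r_drv_ohm * (c_near + c_far) + r_wire_ohm * c_far  # ohm * fF = fs
    return tau_fs * 1e-3


R_DRV = 300.0  # ohm, hypothetical driver resistance (placeholder)
C_RX = 5.0     # fF, hypothetical receiver gate capacitance (placeholder)

# 2 um pitch, 100 um wire from Table 3.2: R = 2.09 ohm, C = 17.3 fF.
no_esd = elmore_delay_ps(R_DRV, 2.09, 17.3, C_RX)
with_esd = elmore_delay_ps(R_DRV, 2.09, 17.3, C_RX, c_esd_ff=50.0)
print(f"without ESD: {no_esd:.2f} ps, with ESD: {with_esd:.2f} ps")
```

With these placeholder values the two 50 fF ESD capacitors account for most of the delay, which mirrors the observation that the latency is dominated by the ESD capacitance. The Elmore value is the first moment of the impulse response; a 50% step delay is often approximated as 0.69 times this figure.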



Figure 4.1: Comparison of Latencies across different technologies

### 4.2 Power

The energy per bit is calculated as $CV^2$ plus the switching energy of the transceiver. The energy per bit is significantly lower (<0.3 pJ/bit) using SuperCHIPS compared to traditional systems. As mentioned earlier, this is mainly attributed to the reduced driver complexity. The elimination of power-hungry SerDes transceivers reduces the I/O power by almost 80x. Also, the short channel lengths correspond to low channel losses and hence give higher energy efficiency for signal transfer. The comparison of energy per bit with other technologies is shown in fig 4.2.
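The $CV^2$ term can be evaluated directly from the extracted capacitances. The sketch below assumes a 1 V supply (suggested by the 997 mV eye height, but an assumption nonetheless) and 50 fF of ESD capacitance at each of the two terminals; the transceiver switching energy is left as an optional input because its value is not stated in the text.

```python
def energy_per_bit_pj(c_wire_ff: float, vdd_v: float,
                      c_esd_each_ff: float = 50.0,
                      transceiver_pj: float = 0.0) -> float:
    """Energy per bit in pJ: C*V^2 for wire plus two ESD caps, plus transceiver energy."""
    c_total_ff = c_wire_ff + 2.0 * c_esd_each_ff
    cv2_pj = c_total_ff * 1e-15 * vdd_v ** 2 * 1e12  # F * V^2 = J, converted to pJ
    return cv2_pj + transceiver_pj


VDD = 1.0  # V, assumed supply voltage

# Wire capacitances from Table 3.2.
print(f"2 um pitch, 100 um wire:  {energy_per_bit_pj(17.3, VDD):.4f} pJ/bit")
print(f"10 um pitch, 500 um wire: {energy_per_bit_pj(34.10, VDD):.4f} pJ/bit")
```

Under these assumptions the capacitive term alone is about 0.12 to 0.13 pJ/bit, which leaves room for the transceiver switching energy within the <0.3 pJ/bit figure quoted above.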



Figure 4.2: Comparison of Energy per bit across different technologies

### 4.3 Bandwidth

From the extracted parasitics, the achievable maximum data-rate was calculated for a settling time of six time constants ($\tau = RC$). Again, $\tau$ is dominated by the ESD protection capacitance. With ESD capacitance, the data-rate per link can range from 4 - 10 Gbps. Without ESD, the data-rate per link can go as high as 20 Gbps. The data-rate per link in SuperCHIPS is expected to be lower than that in SerDes links since there is no serialization of data bits. However, the bandwidth per mm of die edge is higher due to the increased density of interconnections. For bandwidth estimations, the pin configuration assumed was two rows of staggered pins, with half of them being signal and the rest being ground. The predicted bandwidth is shown in fig 4.3. As shown in the figure, the maximum bandwidth per mm of die edge without ESD approaches SoC bandwidth values. With ESD, the bandwidth per mm of die edge is still 30x higher than that of traditional technologies. The bandwidth per link can be increased further by using stronger drivers at the cost of a higher energy per bit.
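The bandwidth estimate described above combines two simple steps: a per-link data rate from the settling criterion, and a count of signal links per mm of die edge from the pin arrangement. The sketch below assumes that the bit period equals six time constants and that each of the two pin rows places one pin per pitch along the edge; both are interpretations of the text, and the function names are illustrative.

```python
def max_data_rate_gbps(tau_ps: float, n_tau: float = 6.0) -> float:
    """Data rate in Gbps if one bit period equals n_tau time constants."""
    return 1000.0 / (n_tau * tau_ps)


def signal_links_per_mm(pitch_um: float, rows: int = 2,
                        signal_fraction: float = 0.5) -> float:
    """Signal links per mm of die edge for staggered rows of pins."""
    pins_per_row = 1000.0 / pitch_um
    return rows * pins_per_row * signal_fraction


def bandwidth_per_mm_gbps(pitch_um: float, data_rate_gbps: float) -> float:
    """Aggregate bandwidth per mm of die edge in Gbps/mm."""
    return signal_links_per_mm(pitch_um) * data_rate_gbps


print(f"tau = 10 ps -> {max_data_rate_gbps(10.0):.2f} Gbps per link")
print(f"2 um pitch:  {signal_links_per_mm(2.0):.0f} signal links per mm")
print(f"2 um pitch at 20.5 Gbps: {bandwidth_per_mm_gbps(2.0, 20.5):.0f} Gbps/mm")
```

With a 2 $\mu$m pitch this gives 500 signal links per mm, and at 20.5 Gbps per link the result is 10,250 Gbps/mm, which agrees with the no-ESD figure listed for that pitch in Table 4.1.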



Figure 4.3: Comparison of Bandwidth/mm across different technologies

### 4.4 SuperCHIPS vs Conventional Package

Table 4.1 presents the overall comparison of key parameters such as the latency, energy per bit and bandwidth of SuperCHIPS integration protocol with the existing mainstream integration technologies.

| Interconnect pitch/<br>protocol | 2 μm<br>on Si-IF<br>SuperCHIPS | 10 μm<br>on Si-IF<br>SuperCHIPS | 50 μm<br>on Si Interposer<br>DDR3 | 400 μm<br>on FR4 PCB/<br>SerDes |
|---|---|---|---|---|
| Dielet Size (mm<sup>2</sup>) | 1-25 | 10-100 | 25-600 | 25-625 |
| No of signal links | 1,000-5,000 | 600-2,000 | 100-1,000 | 100-500 |
| Inter-die distance ($\mu$m) | <100 | <500 | <5,000 | 10,000 |
| Link Latency (ps) | 5.5<sup>a</sup><br>8.7<sup>b</sup> | 24.3<sup>a</sup><br>27.3<sup>b</sup> | N.A. | N.A. |
| Overall Latency (ps) | 37<sup>a</sup><br>40.22<sup>b</sup> | 55.75<sup>a</sup><br>58.8<sup>b</sup> | 300 [23] | ≈1000 |
| Max data-rate/link (Gbps) | 20.5<sup>a</sup><br>13<sup>b</sup> | 4.76<sup>a</sup><br>4.21<sup>b</sup> | 1.6 [33] | 40 [1] |
| Energy per bit (pJ/b) | <0.3<sup>b</sup> | <0.4<sup>b</sup> | 9.48 [33] | 23.2 [1] |
| Max Bandwidth per mm (Gbps/mm) | 10,250<sup>a</sup> 2,380<sup>a</sup> 300<sup>b</sup> 1,300<sup>b</sup> 421<sup>b</sup> | | 32 | 100 |
| Total I/O power (W) | 2.82-14.28 | 2.13-6.74 | 6-15 | 46-230 |

a: Without ESD Capacitance

b: With ESD Capacitance

Table 4.1: Si-IF vs Conventional Package Comparison

# CHAPTER 5

# **Experimental Demonstration and Results**

### 5.1 Test Vehicle Design

As mentioned earlier, the key enablers for the SuperCHIPS integration protocol are the availability of fine pitch interconnects ($\leq 10 \ \mu$m pitch) and small inter-dielet spacing ($\leq 100 \ \mu$m). In this work, both Si-IF and dielet test vehicles were designed to demonstrate the fine pitch interconnects and small inter-dielet spacing. The Si-IF was fabricated using BEOL processing, with two levels of the conventional Cu damascene process. The wire pitches were 2 - 10 $\mu$m. The Si-IF is terminated with Cu pillars of 5 $\mu$m diameter and 10 $\mu$m pitch. The dielets are similarly prepared but contain only one level of metal, i.e. the Cu pad terminations. The Cu pillars and pads were capped with electroless nickel immersion gold (ENIG) plated Ni-Au to prevent oxidation of the Cu terminations. The wiring in the Si-IF, the pillars, and the Cu pads on the dielets form a daisy chain.

For demonstration of fine interconnect pitch, the electrical continuity of daisy chains was measured using fan-out wires on the Si-IF. The dielet size is 4 mm x 4 mm and the test vehicle has a total of 640,000 connections. Fig 5.1 shows the micrographs of the fabricated Si-IF. The micrograph of the dielet is depicted in fig 5.2. For demonstration of small inter-dielet spacing, a Si-IF accommodating 2 mm x 2 mm dielets at 100 $\mu$m inter-dielet spacing was designed without fan-out wires, as shown in fig 2.2.

#### 5.1.1 Fabrication process flow for Si-IF

The fabrication process steps of the Si-IF are indicated in fig 5.3. The fabrication process for dielets is similar but stops at step 5. The fabrication process is like the BEOL Cu damascene process. However, the last step in Si-IF fabrication is the recess of the SiO<sub>2</sub> dielectric layer by 1 - 2 $\mu$m in order to expose the pillars for TCB.

Figure 5.1: Micrograph of the test Si-IF with Cu-pillars ($\Phi$ = 5 $\mu$m, Pitch = 10 $\mu$m). Test pad size is 50 x 50 $\mu$m

Figure 5.2: Micrograph of a fabricated test dielet (2 x 2 mm) with 10 $\mu$m interconnect pitch



Figure 5.3: Fabrication process flow for Si-IF

### 5.2 Experimental Demonstration

#### 5.2.1 Thermal Compression Bonding

The TCB of the test sites was carried out using an optimized dielet-to-wafer bonder from K&S. The dielet is optically aligned to the Si-IF with pattern recognition techniques. The key parameters of TCB used in the experiments are shown in Table 5.1. The dielet temperature and substrate temperature were chosen such that the interface temperature is $\approx 250^{\circ}$C. The dielets were initially temporarily bonded (tacking) for 20 s. After the placement of all the dielets on the Si-IF, the Si-IF is annealed for a longer period. This is a batch process and can be done at elevated temperatures or in vacuum. In this work, however, the batch anneal is done at the bonding temperature of $\approx 250^{\circ}$C in ambient atmosphere.

| Process Parameters       | Value  |
|--------------------------|--------|
| Bond head temperature    | 350°C  |
| Bottom chuck temperature | 120°C  |
| Bonding pressure         | 64 MPa |
| Tacking time             | 20 s   |
| Annealing time           | 8 min  |
| Chamber environment      | Air    |

Table 5.1: Thermal Compression Bonding Parameters

#### 5.2.2 Alignment Accuracy

Alignment accuracy of dielet to substrate is a key requirement for the bonding of fine pitch interconnects. An alignment overlay accuracy of $\pm 1 \,\mu$m was achieved under the bonding conditions listed in Table 5.1. For optimization of the alignment process, Si-IF with only metal pillars and dielets with metal pads were fabricated with evaporated Titanium/Gold (Ti/Au) (50/200 nm) using a lift-off process. These dielets were tacked to the Si-IF for 20 s and then sheared to observe the alignment accuracy. The sheared dielet showing the alignment accuracy is shown in fig 5.4.

#### 5.2.3 Inter-Dielet Spacing

The achieved inter-dielet spacing of 100 $\mu$m between the 2 mm x 2 mm dielets is shown in fig 5.5. A total of 112 dielets are mounted on a quarter of a 4-inch Si-IF, as shown in fig 2.2. The inter-dielet spacing can be reduced further to 30 - 50 $\mu$m. Roughness of the dielet edges due to dicing is one of the major limitations to achieving finer inter-dielet spacings. This can be addressed by advanced dicing techniques such as plasma dicing.



Figure 5.4: Dielet after shearing off the test Si-IF. The overlay is $\pm 1 \ \mu m$ in the x-direction and $\pm 0.5 \ \mu m$ in the y-direction. The rotation around the z-axis is $0.0003^{\circ}$



Figure 5.5: Micrograph of the dielets assembled with 100  $\mu$ m inter-dielet spacing

### 5.3 Continuity Results

Continuity of the daisy chains was measured for the 4 mm x 4 mm die mounted on Si-IF with the fan-out wires. Electrical continuity was demonstrated for 197 daisy chains out of 200 chains tested. Each daisy chain comprises 400 interconnects. In the chains that failed, further inspection showed that only a few of the 400 interconnections had failed. In the worst case, assuming all 400 interconnects in each failed daisy chain failed, the continuity yield is 98.5%. In the best case, assuming only a single interconnect among the 400 failed in each failed chain, the yield is >99.99%. The measured average contact resistance per interconnect (pillar-to-pad) was found to be 42 mΩ as shown in fig 5.6. This corresponds to a specific contact resistance of  $0.82 \ \Omega-\mu\text{m}^2$  [34].
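The yield bounds and the specific contact resistance quoted above follow from simple arithmetic, reproduced in the sketch below. The specific contact resistance is computed here as the contact resistance multiplied by the circular cross-section of a 5 $\mu$m diameter pillar; taking the full pillar cross-section as the contact area is an assumption made for this check.

```python
import math


def continuity_yield_percent(chains_tested: int, chains_passed: int,
                             per_chain: int, failures_per_bad_chain: int) -> float:
    """Interconnect-level yield (%) for an assumed failure count per failed chain."""
    total = chains_tested * per_chain
    failed = (chains_tested - chains_passed) * failures_per_bad_chain
    return 100.0 * (total - failed) / total


def specific_contact_resistance(r_contact_ohm: float, diameter_um: float) -> float:
    """Specific contact resistance in ohm-um^2 for a circular contact."""
    return r_contact_ohm * math.pi * diameter_um ** 2 / 4.0


worst = continuity_yield_percent(200, 197, 400, failures_per_bad_chain=400)
best = continuity_yield_percent(200, 197, 400, failures_per_bad_chain=1)
rho_c = specific_contact_resistance(0.042, 5.0)

print(f"worst-case yield: {worst:.2f}%")
print(f"best-case yield:  {best:.4f}%")
print(f"specific contact resistance: {rho_c:.2f} ohm-um^2")
```

The worst case treats every interconnect in the three failed chains as open, while the best case charges each failed chain with a single open interconnect out of the 80,000 tested.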



Figure 5.6: IV-characteristics of a daisy chain including interconnects, pads and fanout wires

# CHAPTER 6

# Conclusion

In this work, a Simple Universal Parallel intERface integration protocol for chips (SuperCHIPS) was introduced for high performance heterogeneous systems. SuperCHIPS promises SoC-like performance and flexibility for system-level heterogeneous integration. The close inter-dielet assembly and fine pitch interconnects on Si-IF are key enablers for this protocol. Simulations show the channel loss to be less than -2 dB and the cross-talk less than -15 dB. Thus, with simple transceiver drivers, an energy per bit of <0.3 pJ/b can be achieved even for data rates >10 Gbps per link for short links ($\leq 100 \ \mu$m) in Si-IF. The latency of a Si-IF link is estimated to be around 50 - 100 ps, dominated by the ESD capacitance. The bandwidth in SuperCHIPS systems can reach several terabits per second with a massive number of parallel links instead of the traditional SerDes techniques, at a total I/O power <2.5 W.

Analyses show that the SuperCHIPS protocol can result in a 50x improvement in interconnect energy efficiency (pJ/bit), a 13x reduction in latency, and a 5x - 30x increase in bandwidth/mm compared to PCB-based systems. In comparison with SoCs, the latency is only 2x higher and the energy per bit only 5x higher in SuperCHIPS. However, for large SoCs, these numbers become comparable. Thus, the fine pitch dielet-based integration scheme with the SuperCHIPS protocol finds a sweet spot between conventional integration technologies and SoC-based designs. This technology provides the flexibility to decompose an SoC into sets of constituent components, where each set is implemented on a different dielet, by presenting a solution to tightly reintegrate them. Because different sub-components may be optimized differently, dielet-based assembly provides opportunities to optimize the overall power, performance, reliability, and cost of a system.

#### References

- [1] R. Navid et al., "A 40 Gb/s Serial Link Transceiver in 28 nm CMOS Technology," in IEEE Journal of Solid-State Circuits, vol. 50, no. 4, pp. 814-827, April 2015.
- [2] R. Dettmer, "Brighter prospects for wafer-scale integration," Electronics and Power, pp. 283-288, April 1986.
- [3] H. Lee, et al., "Design and signal integrity analysis of high bandwidth memory (HBM) interposer in 2.5D terabyte/s bandwidth graphics module," 2015 IEEE 24th Electrical Performance of Electronic Packaging and Systems (EPEPS), San Jose, CA, 2015, pp. 145-148.
- [4] K. Cho et al., "Design optimization of high bandwidth memory (HBM) interposer considering signal integrity," 2015 IEEE Electrical Design of Advanced Packaging and Systems Symposium (EDAPS), Seoul, 2015, pp. 15-18.
- [5] K. Cho, H. Lee, J. Kim, "Signal and power integrity design of 2.5D HBM (High bandwidth memory module) on SI interposer", Pan Pacific Microelectronics Symposium (Pan Pacific), Jan. 2016, pp. 1- 5.
- [6] B. Sawyer et al., "Design and Demonstration of 2.5D Glass Interposers as a Superior Alternative to Silicon Interposers for 28 Gbps Signal Transmission," 2016 IEEE 66th Electronic Components and Technology Conference (ECTC), Las Vegas, NV, 2016, pp. 972-977.
- [7] B. Sawyer, B. C. Chou, S. Gandhi, J. Mateosky, V. Sundaram and R. Tummala, "Modeling, design, and demonstration of 2.5D glass interposers for 16-channel 28 Gbps signaling applications," 2015 IEEE 65th Electronic Components and Technology Conference (ECTC), San Diego, CA, 2015, pp. 2188-2192.
- [8] Y. Kim, J. Cho, K. Kim, V. Sundaram, R. Tummala and J. Kim, "Signal and power integrity analysis in 2.5D integrated circuits (ICs) with glass, silicon and organic interposer," 2015 IEEE 65th Electronic Components and Technology Conference (ECTC), San Diego, CA, 2015, pp. 738-743.
- [9] S. S. Iyer, "Heterogeneous Integration for Performance and Scaling," in IEEE Transactions on CPMT, vol. 6, no. 7, pp. 973-982.
- [10] Z. Wang, M. Pantouvaki, G. Morthier, C. Merckling, J. van Campenhout, D. van Thourhout, and G. Roelkens, "Heterogeneous Integration of InP devices on silicon," in 2016 Compound Semiconductor Week (CSW), June 2016.
- [11] O. Moutanabbir and U. Gösele, "Heterogeneous integration of compound semiconductors," in Annual Review of Materials Research, pp. 469-500, 2010.
- [12] A. Gutierrez-Aitken, P. Chang-Chien, D. Scott, K. Hennig, E. Kaneshiro, P. Nam, N. Cohen, D. Ching, K. Thai, B. Oyama, J. Zhou, C. Geiger, B. Poust, M. Parlee, R. Sandhu, W. Phan, A. Oki, and R. Kagiwada, "Advanced Heterogeneous Integration of InP HBT and CMOS Si Technologies," in 2010 IEEE Compound Semiconductor Integrated Circuit Symposium (CSICS), pp. 1-4, Oct 2010.
- [13] J. Fan and C. Tan (2012), "Low temperature wafer-level metal thermo-compression bonding technology for 3-d integration" in Metallurgy- Advances in Materials and Process, Dr. Yogiraj Pardhi (Ed.), InTech.
- [14] K. Azar and J. E. Graebner, "Experimental determination of thermal conductivity of printed wiring boards," Twelfth Annual IEEE Semiconductor Thermal Measurement and Management Symposium. Proceedings, Austin, TX, 1996, pp. 169-182.
- [15] A. S. White and L. H. Germer, "The rate of oxidation of copper at room temperature," R. B. Mears presiding. pp. 305-319, 1942.
- [16] J. C. Yang, B. Kolasa, J. M. Gibson, and M. Yeadon, "Self-limiting oxidation of copper," Appl. Phys. Lett., vol. 73, no. 19, pp. 2841-2843, 1998.
- [17] Y. Zhu, K. Mimura, and M. Isshiki, "Oxidation mechanism of copper at 623-1073 K," Mater. Trans., vol. 43, no. 9, pp. 2173-2176, 2002.
- [18] G. Kumar, T. Bandyopadhyay, V. Sukumaran, V. Sundaram, S. K. Lim and R. Tummala, "Ultra-high I/O density glass/silicon interposers for high bandwidth smart mobile applications," 2011 IEEE 61st Electronic Components and Technology Conference (ECTC), Lake Buena Vista, FL, 2011, pp. 217-223.
- [19] R. Weerasekera, J. R. Cubillo and G. Katti, "Analysis of signal integrity (SI) robustness in through-silicon interposer (TSI) interconnects," 2012 IEEE 14th Electronics Packaging Technology Conference (EPTC), Singapore, 2012, pp. 397-398.
- [20] G. Ghione and C. Naldi, "Analytical formulas for coplanar lines in hybrid and monolithic MICs," in Electronics Letters, vol. 20, no. 4, pp. 179-181, February 16 1984.
- [21] A. Deutsch et al., "When are transmission-line effects important for on-chip interconnections," 1997 Proceedings 47th Electronic Components and Technology Conference, San Jose, CA, 1997, pp. 704-712.
- [22] S. Delmas Bendhia, F. Caignet and E. Sicard, "On chip crosstalk characterization on deep submicron buses," Proceedings of the 2000 Third IEEE International Caracas Conference on Devices, Circuits and Systems (Cat. No.00TH8474), Cancun, 2000, pp. C70/1-C70/5.
- [23] H. Kalargaris and V. F. Pavlidis, "Interconnect design tradeoffs for silicon and glass interposers," 2014 IEEE 12th International New Circuits and Systems Conference (NEWCAS), Trois-Rivieres, QC, 2014, pp. 77-80.

- [24] M. K. Md Arshad, U. Hashim, and Muzamir Isa, "Under bump metallurgy (UBM) - A technology review for flip chip packaging," International Journal of Mechanical and Materials Engineering (IJMME), Vol. 2 (2007), No. 1, 48-54.
- [25] Y. W. Chang et al., "Influence of trace geometry on the current crowding effect in ultrafine pitch MicroBump," 2010 5th International Microsystems Packaging Assembly and Circuits Technology Conference, Taipei, 2010, pp. 1-4.
- [26] Y. W. Chang et al., "Analysis of bump resistance and electrical distribution of ultrafine-pitch microbumps," 2010 5th International Microsystems Packaging Assembly and Circuits Technology Conference, Taipei, 2010, pp. 1-4.
- [27] Y. Wang, K. H. Lu, J. Im and P. S. Ho, "Reliability of Cu pillar bumps for flip-chip packages with ultra low-k dielectrics," 2010 Proceedings 60th Electronic Components and Technology Conference (ECTC), Las Vegas, NV, USA, 2010, pp. 1404-1410.
- [28] Y. W. Chang and Chih Chen, "Design of Al pad geometry for reducing current crowding effect in flip-chip solder joint using finite-element analysis," 2010 11th International Thermal, Mechanical & Multi-Physics Simulation, and Experiments in Microelectronics and Microsystems (EuroSimE), Bordeaux, 2010, pp. 1-4.
- [29] C. C. Wei et al., "Comparison of the electromigration behaviors between micro-bumps and C4 solder bumps," 2011 IEEE 61st Electronic Components and Technology Conference (ECTC), Lake Buena Vista, FL, 2011, pp. 706-710.
- [30] L. J. Bum, J. A. J. Li and D. R. M. Woo, "Process development of multi-die stacking using 20 um pitch micro bumps on large scale dies," 2014 IEEE 16th Electronics Packaging Technology Conference (EPTC), Singapore, 2014, pp. 318-321.
- [31] W. C. Elmore, "The Transient Analysis of Damped Linear Networks with Particular Regard to Wideband Amplifiers," Journal of Applied Physics, vol. 19(1), 1948.
- [32] S. X. Shian and D. Z. Pan, "Wire sizing with scattering effect for nanoscale interconnection," Asia and South Pacific Conference on Design Automation, 2006, Yokohama, 2006, 6 pp.
- [33] M. A. Karim, P. D. Franzon and A. Kumar, "Power comparison of 2D, 3D and 2.5D interconnect solutions and power optimization of interposer interconnects," 2013 IEEE 63rd Electronic Components and Technology Conference, Las Vegas, NV, 2013, pp. 860-866.
- [34] L. Di Cioccio, P. Gueguen, R. Taibi, T. Signamarcheix, L. Bally, L. Vandroux, M. Zussy, S. Verrun, J. Dechamp, P. Leduc, M. Assous, D. Bouchu, F. De Crecy, L. L. Chapelon, and L. Clavelier, "An innovative die to wafer 3D integration scheme: Die to wafer oxide or copper direct bonding with planarised oxide inter-die filling," in 2009 IEEE International Conference on 3D System Integration, 3DIC 2009, 2009, pp. 7-10.