# UC Davis Electronic Theses and Dissertations

### Title

Dynamic Voltage and Frequency Scaling Controller and Circuits Using Multiple Back Bias Voltages

### Permalink

https://escholarship.org/uc/item/0zr4j4qh

### Author

Cui, Jin

### Publication Date

2023

Peer reviewed|Thesis/dissertation

Dynamic Voltage and Frequency Scaling Controller and Circuits Using Multiple Back Bias Voltages

#### By

#### JIN CUI

#### THESIS

Submitted in partial satisfaction of the requirements for the degree of

#### MASTER OF SCIENCE

in

Electrical and Computer Engineering

in the

#### OFFICE OF GRADUATE STUDIES

of the

#### UNIVERSITY OF CALIFORNIA

DAVIS

Approved:

Bevan Baas, Chair

Venkatesh Akella

Rajeevan Amirtharajah

Committee in Charge

2023

To my parents, J. Wang and Z. Cui.

# Abstract

Power and thermal limits have become increasingly significant for integrated circuits as the scale of integration keeps growing. Ultra-Thin Body and Buried Oxide (UTBB) Fully Depleted Silicon-on-Insulator (FD-SOI) is a technology aimed at improving device performance and power efficiency at the same time. A thin buried oxide (BOX) layer is introduced not only to lower the leakage currents, but also to enable a strong back biasing (BB) voltage that is adjustable through front-side contacts. As a result, the threshold voltage is tunable to achieve high performance across a wide range of supply voltages. A 28 nm UTBB FD-SOI Low Threshold Voltage (LVT) technology from STMicroelectronics provides transistors that operate normally across a wide supply voltage range.

It is a common practice that digital circuits are throttled according to their real-time workload to conserve power and reduce heat generation. This is achieved by introducing a dynamic voltage and frequency scaling (DVFS) circuit which optimizes the supply voltage and clock frequency automatically.

In this thesis, the 28 nm UTBB FD-SOI technology is characterized through transistor-level circuit simulations. A DVFS controller design is proposed that supports two supply voltages and two back-bias voltages and targets the aforementioned technology to optimize circuit performance and reduce power consumption. Power gates are used to switch between voltages and shut down unused components. The DVFS controller suggests clock frequencies and voltages dynamically based on workload to maximize power efficiency without significantly sacrificing performance. Additionally, the controller's output is manually configurable to accommodate user control. In a simulation conducted on inverter chains, BB provides as much as a 17% reduction in propagation delay versus no BB at 1.0 V nominal supply voltage, and a maximum 56% reduction at 0.5 V. The DVFS design contributes to an average of 20.5% and a maximum of 56.3% reduction in total energy consumed in the simulated applications versus no DVFS while maintaining 96% of the throughput.

# Acknowledgements

My journey towards the Master's degree has been challenging but rewarding. It has been a long time since I stepped into this world, and along the way I have had the pleasure of meeting many helpful, friendly individuals.

I would like to gratefully thank my major advisor, Professor Bevan Baas, for his firm support throughout my years of study. He not only gave me valuable feedback on my work, but also offered great encouragement, without which the completion of my study would not have been possible.

I would like to thank Professor Venkatesh Akella and Professor Rajeevan Amirtharajah for serving on my thesis committee. Their valuable suggestions helped me shape this thesis.

I would like to thank Professor Kent Wilken. I was privileged to work as a teaching assistant under his guidance. It was a great chance to sharpen my communication skills and deepen my knowledge in respective fields.

Thanks to the fellow members of VCL I worked with, in no particular order: Jon Pimentel, Brent Bohnenstiehl, Satyabrata Sarangi, Timothy Andreas, Mark Hildebrand, Forough Mahmoudabadi, Filipe Borges, Christi Tain, Arthur Hlaing, Zhangfan Zhao, Sharmila Kulkarni, Yikai Mao, Haotian Wu, Wai Cheong Tsoi, Christian Lum, Benjamin Moore, Ziyuan Dong, Aidan Callahan, Thomas Abbott, Santhosh Sammeta, Sagar Sajeev, Dinesh Nagulapati, Michael Wang, Yuxuan Huo, Yechengnuo Zhang, Basheer Ammar, and Derek Li. I was able to learn new things from these amazing people every day.

Finally, I would like to thank my family, who encouraged me to pursue whatever I have in mind, and shared their endless care with me from thousands of miles away.

Special thanks to STMicroelectronics for providing the process design kit, and Dr. Philippe Flatresse who provided vital help in setting up the project.

# Contents

- Abstract
- Acknowledgements
- List of Figures
- List of Tables
- 1 Introduction
  - 1.1 Motivation
  - 1.2 Background
  - 1.3 Design Goals
  - 1.4 Thesis Organization
- 2 UTBB FD-SOI Technology
  - 2.1 Overview
  - 2.2 Physical Structure
  - 2.3 Back Biasing
- 3 Characterizing the UTBB FD-SOI Process
  - 3.1 Workflow
  - 3.2 Cadence Spectre and MDL
  - 3.3 Logic Chain Simulations
- 4 Scaling Approaches
  - 4.1 Voltage Scaling
    - 4.1.1 Previous Work on Voltage Scaling
    - 4.1.2 Voltage Scaling Architecture
    - 4.1.3 Power Gates
    - 4.1.4 Supply Voltage Switching
    - 4.1.5 Bias Voltage Switching
  - 4.2 Voltage and Frequency Selection
- 5 Implementation
  - 5.1 Voltage Switchers
  - 5.2 DVFS Controller
- 6 Simulation Results
  - 6.1 DVFS Controller Functionality
  - 6.2 DVFS Effectiveness on 28 nm UTBB FD-SOI
- 7 Conclusion
  - 7.1 Summary
  - 7.2 Future Work
- Glossary
- Bibliography

# List of Figures

| Figure | Caption |
|--------|---------|
| 2.1 | Proposed FD-SOI process scaling roadmap down to 10 nm [1] |
| 2.2 | Cross section of UTBB FD-SOI with BB [1] |
| 2.3 | Perspective illustration of planar UTBB FD-SOI and 3D FinFET devices [1] |
| 2.4 | Body factor ($\gamma$) of UTBB FD-SOI versus comparable bulk transistor [2] |
| 2.5 | Back biasing concept [3] |
| 2.6 | Difference in GP doping between NMOS devices [4] |
| 2.7 | RVT and LVT bias polarity [5] |
| 3.1 | The MDL flow [6] |
| 3.2 | FO4 inverter chain with multiple stages |
| 3.3 | Propagation delay of a single inverter with variable BB and $V_T$ options |
| 3.4 | Propagation delay of a single inverter with variable BB and PB options |
| 3.5 | Normalized static power consumption of LVT and RVT devices with different $V_{dd}$, PB = 0 |
| 3.6 | Energy Delay Product of (a) LVT and (b) RVT devices with different BB voltages |
| 3.7 | Normalized propagation delay of the two LVT gate chains versus reference LVT inverter chain |
| 4.1 | Workload varies among cores and over time [7] |
| 4.2 | Dynamic voltage scaling with multiple supply voltage and back bias rails grouped into buses |
| 4.3 | Histogram of core clock frequency required for LDPC decoding on KiloCore, a 1000-core processor array [8] |
| 4.4 | Inverter chain to demonstrate voltage droop on the grid |
| 4.5 | Simulation of voltage droop with various power gate sizes and decoupling capacitors |
| 4.6 | Three voltage settings with their corresponding operable frequency ranges and voltage switching points |
| 5.1 | Supply switcher circuit in detail |
| 5.2 | Back bias switcher circuit in detail |
| 5.3 | Multi-level power gate module |
| 5.4 | Passive logic level converter |
| 5.5 | Variable length delay chain |
| 5.6 | Supply switcher circuit timing diagram |
| 5.7 | BB switcher circuit timing diagram when switching to and from zero back bias |
| 5.8 | BB switcher circuit timing diagram when switching between two back bias voltages |
| 5.9 | Top level block diagram of the DVFS controller |
| 5.10 | Variable length ring oscillator |
| 5.11 | Workload sensing module |
| 5.12 | Voltage switch delay counter |
| 6.1 | Simulated waveform of a $V_{dd}$ switching process |
| 6.2 | Simulated waveform of switching between two non-zero $V_{bb}$ values |
| 6.3 | Simulated waveform of switching to zero $V_{bb}$ |
| 6.4 | Simulated waveform of switching from zero $V_{bb}$ to a non-zero value |
| 6.5 | Simulated waveform of logic level converter from 1 V to -1 V |
| 6.6 | Core clock frequencies of i7-8700K processor while opening a webpage |
| 6.7 | Relative energy consumption of i7-8700K processor with DVFS |
| 6.8 | Relative energy consumption of equivalent 28 nm UTBB FD-SOI processor with DVFS |
| 6.9 | Normalized energy time product of processors with DVFS |

# List of Tables

| Table | Caption |
|-------|---------|
| 5.1 | Configuration for the voltage switchers |
| 5.2 | Maximum Clock Frequencies for Different Voltage Settings |
| 5.3 | LUT contents |
| 5.4 | DVFS configuration registers |
| 6.1 | Maximum clock frequency, execution time, and average power usage of i7-8700K running LZMA2 file compression algorithm |

# Chapter 1

# Introduction

### 1.1 Motivation

The development of consumer electronics, especially smartphones and smart wearable devices, has been heavily driven in the past few years by improvements in integrated circuit (IC) performance. Applications such as speech recognition, computational photography, and mobile gaming require powerful system-on-chip (SoC) designs with various hardware accelerators. However, the increase in circuit performance and scale comes at the cost of power consumption [9]. The slow development of chemical batteries and the physical nature of compact devices limit the power a processor can draw and the heat it can dissipate. It is vital that energy efficiency be taken into account by digital circuit designers.

### 1.2 Background

There have been many energy-saving techniques developed over the past few decades. Clock gating is one of the common approaches that reduces the power usage of a processor [10]. Temporarily disabling the clock inputs of unused circuits reduces the dynamic power consumption. Additionally, the load on the clock tree is also reduced [11].

To achieve higher efficiency without sacrificing peak performance, adjustable clock frequencies and supply voltages were introduced [12]. The dynamic power of a logic circuit is given by:

$$P_{dynamic} = \alpha C_L V_{dd}^2 f \tag{1.1}$$

where $\alpha$ is the probability of switching, $C_L$ is the circuit capacitive load, $V_{dd}$ is the supply voltage, and $f$ is the clock frequency. Dynamic power is therefore proportional to the operating frequency and the square of the supply voltage, so lowering both leads to a significant reduction in dynamic power consumption. The greatest savings are possible when the clock speed is turned down during light workloads, since a lower frequency also permits a lower supply voltage. However, the supply voltage limits the maximum achievable clock frequency, as demonstrated below [13]:

$$t_{pd} \propto \frac{V_{dd}}{V_{dd} - V_T} \tag{1.2}$$

where $t_{pd}$ is the propagation delay, and $V_T$ is the threshold voltage. The clock period at the selected frequency must be long enough to accommodate the circuit delay. Ideally, once a target frequency is selected based on the workload, the circuit should always run at the minimum supply voltage that still meets this constraint. The on-the-fly adjustment of voltage and frequency is known as dynamic voltage and frequency scaling (DVFS).
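As a rough illustration of how Equations 1.1 and 1.2 interact, the Python sketch below picks the lowest supply voltage from a small set of candidates whose relative delay still fits a target clock period, and then compares the resulting dynamic power against the nominal operating point. The 0.35 V threshold voltage, the candidate voltages, and all function names are assumptions chosen for illustration; they are not values from the ST library or from the proposed controller.

```python
# Illustrative sketch only: V_T, the candidate voltages, and all names
# below are assumed values, not data from the 28 nm UTBB FD-SOI library.

V_T = 0.35          # assumed threshold voltage (V)
V_NOMINAL = 1.0     # nominal supply used as the reference point (V)


def delay_shape(vdd, vt):
    """Right-hand side of Eq. 1.2, up to a constant factor."""
    if vdd <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return vdd / (vdd - vt)


def relative_delay(vdd, vt=V_T, vref=V_NOMINAL):
    """Propagation delay normalized to the delay at vref."""
    return delay_shape(vdd, vt) / delay_shape(vref, vt)


def relative_dynamic_power(vdd, freq_ratio, vref=V_NOMINAL):
    """Dynamic power from Eq. 1.1, normalized to (vref, full frequency).

    alpha and C_L cancel because the same circuit is compared with itself.
    """
    return (vdd / vref) ** 2 * freq_ratio


def lowest_supply_for(freq_ratio, candidates):
    """Return the lowest candidate V_dd whose delay fits the clock period."""
    for vdd in sorted(candidates):
        # Running at freq_ratio of f_max stretches the period by 1/freq_ratio.
        if relative_delay(vdd) <= 1.0 / freq_ratio:
            return vdd
    raise ValueError("no candidate supply meets the target frequency")


if __name__ == "__main__":
    candidates = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for freq_ratio in (1.0, 0.75, 0.5):
        vdd = lowest_supply_for(freq_ratio, candidates)
        power = relative_dynamic_power(vdd, freq_ratio)
        print(f"f = {freq_ratio:.2f} f_max -> V_dd = {vdd:.1f} V, "
              f"P_dyn = {power:.2f} of nominal")
```

Under these assumed numbers, halving the frequency allows the supply to drop to 0.6 V, so the quadratic voltage term contributes more to the savings than the frequency term alone.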

Threshold voltage is another parameter for performance tuning, according to the relation above. Lowering $V_T$ results in a lower delay, thus increasing the circuit's maximum allowed frequency at a given voltage [14]. This comes at the cost of increased leakage power [15]. An adjustable $V_T$ provides another degree of freedom for performance tuning, and body biasing is a common way to realize it. Applying a bias voltage between the body and source terminals of a transistor shifts its $V_T$. This phenomenon is known as the body effect [16]:

$$\gamma \approx \frac{\Delta V_T}{\Delta V_{bs}} \tag{1.3}$$

where  $V_{bs}$  is the body-source voltage and  $\gamma$  is the body factor which is determined by physical characteristics. A forward bias voltage (i.e. negative  $V_{bs}$ ) lowers the  $V_T$  and results in higher performance and higher leakage current, while a reverse bias voltage has the opposite effect [13].
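To make the relation in Equation 1.3 concrete, the following Python sketch estimates the threshold shift for a given bias magnitude. The body factor of 85 mV/V and the 0.40 V unbiased threshold are placeholder numbers assumed for illustration, and the helper name `biased_threshold` is hypothetical. The sketch takes a bias magnitude and a direction flag rather than a signed $V_{bs}$, so it does not depend on a particular sign convention.

```python
# Illustrative sketch only: GAMMA and VT0 are assumed placeholder values.

GAMMA = 0.085   # assumed body factor, volts of V_T shift per volt of bias
VT0 = 0.40      # assumed threshold voltage with no bias applied (V)


def biased_threshold(bias_magnitude, forward=True, gamma=GAMMA, vt0=VT0):
    """Linear estimate of V_T from Eq. 1.3 for a bias of the given magnitude.

    Forward bias lowers V_T (faster, leakier); reverse bias raises it.
    """
    if bias_magnitude < 0:
        raise ValueError("pass the bias magnitude and use the forward flag")
    shift = gamma * bias_magnitude
    return vt0 - shift if forward else vt0 + shift


if __name__ == "__main__":
    for bias in (0.0, 0.5, 1.0):
        fwd = biased_threshold(bias, forward=True)
        rev = biased_threshold(bias, forward=False)
        print(f"|bias| = {bias:.1f} V: V_T(forward) = {fwd:.4f} V, "
              f"V_T(reverse) = {rev:.4f} V")
```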

Besides dynamic power, static power consumption should also be reduced to further improve efficiency. As technology nodes evolve, leakage power becomes a significant part of the total power consumption [17] [18]. Transistors not in use should not be powered, in order to minimize leakage and conserve power. This technique is known as power gating [19] [20] and is easily applicable to a DVFS design with little extra hardware.

While any digital circuit can benefit from power gating and DVFS, the continuing parallelization of processor designs has made these approaches increasingly beneficial. As seen in the analysis above, increasing the clock frequency significantly affects power density, since a higher $V_{dd}$ is also required to reduce the switching time. Parallel computing architectures have become a dominant trend in processor design in the last decade. By embedding more processor cores on the same chip, multi-threaded computing tasks benefit greatly from parallel computing. Naturally, not all applications fully utilize every processor core all the time. When a core has a low workload, downscaling the voltage and frequency is a simple yet effective way to lower its power consumption.

### 1.3 Design Goals

DVFS is a technique to reduce the power and energy demands of ICs while maintaining the ability to achieve high performance when needed. Different transistor technologies behave differently, since materials, physical structure, and process all unavoidably change the electrostatic behavior of a device. A DVFS design must take full account of the characteristics of the technology it targets. Usually, DVFS design is not an isolated job, but a significant part of the entire IC design process. The suitable frequencies, voltages, and even device sizes are all important choices to be made to satisfy user demands. Application-specific requirements must also be taken into consideration when designing the DVFS mechanism for a certain IC. Applications that have a short active window but mainly stay idle may benefit from a low-leakage, low-performance technology, while applications that constantly require high throughput may need fast but less efficient devices.

In this thesis, the DVFS design is targeted at the latter type of applications, and thus performance has a higher priority than power consumption.

### 1.4 Thesis Organization

The rest of this thesis is organized as follows:

Chapter 2 briefly introduces the 28 nm Ultra-Thin Body and Buried Oxide (UTBB) Fully Depleted Silicon-on-Insulator (FD-SOI) technology.

Chapter 3 includes the simulation data acquired from characterizing the UTBB FD-SOI process with logic gate chains.

Chapter 4 introduces the voltage and frequency scaling mechanisms in the proposed design.

Chapter 5 shows the DVFS controller's implementation details.

Chapter 6 presents the simulated results of the effectiveness of DVFS based on UTBB FD-SOI.

Chapter 7 concludes the thesis and provides suggestions for future work on this topic.

# Chapter 2

# UTBB FD-SOI Technology

This chapter introduces the process being used, the 28 nm UTBB FD-SOI technology from STMicroelectronics (ST).

### 2.1 Overview

In the past, the improvement of integrated circuits' performance relied heavily on the shrinking of device size, as described by the Dennard scaling law [21]. However, the size of semiconductor devices cannot decrease indefinitely. The benefit of scaling diminishes as the technology nodes approach 7 nm and beyond [22] [23]. As semiconductor technology advances, the flaws of conventional planar transistors become an increasing concern in the industry [24]. A series of physical effects known as short channel effects (SCE) became an essential part of device physics in nano-scale devices; they lead to serious performance degradation and were predicted to eventually end technology scaling [25]. Meanwhile, the reduced physical dimensions of transistors result in greater leakage currents and lead to undesirably high power consumption and power density. The growing demand for performance requires new solutions. Over the years, new physical designs have been proposed to overcome short channel effects, some with sophisticated 3D structures, such as FinFET [26]. One solution that stands out is UTBB FD-SOI, which became a mainstream technology at ST, deployed at 28 nm and beyond [27] [28]. UTBB FD-SOI delivers high performance and efficiency over a very wide voltage range (as low as 0.32 V in published work [29]) and is considered suitable for many applications.

Figure 2.1: Proposed FD-SOI process scaling roadmap down to 10 nm [1]

### 2.2 Physical Structure

Traditional transistor fabrication involves applying several steps to a silicon substrate to create certain metal-oxide-semiconductor (MOS) structures. Devices built by this process are often referred to as bulk transistors. Silicon-on-insulator (SOI) uses a slightly different substrate. A buried oxide (BOX) insulator lies under an ultra-thin layer of silicon film where the transistors are fabricated, as shown in Figure 2.2. The transistors follow the traditional planar design and serve as a natural evolution of the traditional planar bulk technology. Despite the relatively high fabrication cost of SOI substrates [30], the process requirement is considerably simpler than that of FinFET (see Figure 2.3), leading to an overall process cost saving of about 10% [1].



Figure 2.2: Cross section of UTBB FD-SOI with BB [1]



Figure 2.3: Perspective illustration of planar UTBB FD-SOI and 3D FinFET devices [1]

The thin BOX layer provides several benefits over traditional bulk transistors. The insulator layer channels the flow of electrons between terminals, resulting in lower leakage currents. Also, the source and drain parasitic capacitance is reduced thanks to the smaller cross-sectional area [3]. More importantly, the BOX layer drastically lowers substrate leakage, allowing aggressive back biasing (BB) to be applied through the BOX. According to ST, over 200% stronger body effect is achieved compared to bulk technology (see Figure 2.4) with a much wider BB voltage range.



Figure 2.4: Body factor  $(\gamma)$  of UTBB FD-SOI versus comparable bulk transistor [2]

Additionally, ST offers four gate length options (often referred to as Poly Bias, or PB, by ST) from 24 nm to 40 nm [31] to further accommodate various design goals. A shorter gate length provides better performance since electrons travel a shorter distance [32].

### 2.3 Back Biasing

As discussed previously in Chapter 1, BB utilizes the body effect to modify the threshold voltage ($V_T$) of transistors. Back bias, by definition, is a bias voltage applied to the back of the BOX, so that $V_{bs} \neq 0$, effectively forming a second gate structure. The bias voltages applied to PMOS and NMOS are referred to as $V_{bp}$ and $V_{bn}$ respectively. The voltage relative to the source determines the bias polarity and strength. Figure 2.5 demonstrates the concept of back biasing and how it affects $V_T$. Forward back biasing (FBB) is aimed at maximizing performance by lowering $V_T$. Reverse back biasing (RBB) minimizes the leakage, and is more suitable for ultra-low-power applications that idle frequently.



Figure 2.5: Back biasing concept [3]

Two transistor flavors are provided by ST's UTBB FD-SOI technology: Low  $V_T$  (LVT) and Regular  $V_T$  (RVT). The key difference is the doped substrate (ground plane, GP, illustrated in Fig. 2.2) beneath the BOX layer. An RVT cell utilizes conventional wells where the diffusion type is similar to bulk transistors. PMOS sits above n-doped GP and NMOS sits above p-doped GP. Contrarily, an LVT cell is built on flip wells where PMOS sits on p-GP and NMOS sits on n-GP [3]. LVT has a focus on FBB and RVT is designed for RBB. Body effect is sometimes referred to as back-gate effect [16]. Figure 2.6 demonstrates the difference between the two transistor types. Note that the figure is from early stage development of the technology, and a different naming convention is used. HVT in this figure is equivalent to RVT in the discussion above.



Figure 2.6: Difference in GP doping between NMOS devices [4]

For RVT devices with a conventional well structure, applying BB is straightforward. Two additional voltage sources are required to establish a positive $V_{bs}$ over $V_{dd}$ and a negative $V_{bs}$ below GND. However, LVT devices with a flip well structure apply the bias in a slightly different manner: both $V_{bp}$ and $V_{bn}$ are tied to GND. The flip well structure effectively forms a p-n diode. To prevent the back substrate from turning into a forward-biased diode, the p-GP must be kept at a lower voltage than the n-GP. Fortunately, the FBB of LVT devices reduces the voltage on the p-GP and increases the voltage on the n-GP, so lowering the p-GP voltage to GND when $V_{bp}=0$ is an easily viable solution that simplifies the circuit design process. The differences between the devices are illustrated in Figure 2.7. Single n-well (SNW) and single p-well (SPW) are two other proposed schemes that provide more $V_T$ options.

To simplify the discussions, all BB voltages  $(V_{bb})$  mentioned in the following chapters are absolute values. This applies to FBB for LVT devices and RBB for RVT devices.



Figure 2.7: RVT and LVT bias polarity [5]

# Chapter 3

# Characterizing the UTBB FD-SOI Process

This chapter focuses on the characterization of the 28 nm UTBB FD-SOI technology using the Cadence Spectre Circuit Simulator platform, which proved to be highly compatible with the library provided by ST throughout the simulations. The library is classified as confidential, so not all information can be disclosed in great detail.

### 3.1 Workflow

Test circuits for this purpose are not complex, so the netlists are written manually. Then, a sweep of different supply voltages $V_{dd}$ and BB voltages $V_{bb}$ is performed via Spectre Measurement Description Language (MDL), a scripting language designed specifically to simplify simulation setup and control. An MDL script template was written as the basic testbench, and a Python script was used to switch between the different device types and generate the MDL scripts necessary for the simulations performed.
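The sweep grid itself is straightforward to enumerate. The sketch below shows one way a Python helper might build the set of ($V_{dd}$, $V_{bb}$) points and render a parameter block for each. The voltage ranges, the step sizes, and the function names are assumptions, since the actual script and sweep values are not listed in this chapter. The rendered text uses only the Spectre `parameters` statement; MDL syntax is deliberately not reproduced here.

```python
# Illustrative sketch only: the sweep ranges, steps, and helper names are
# assumptions; the thesis does not list the exact values or its script.
from itertools import product


def voltage_steps(start_mv, stop_mv, step_mv):
    """Inclusive list of voltages in volts, built from integer millivolts
    to avoid floating-point drift in the sweep points."""
    return [mv / 1000 for mv in range(start_mv, stop_mv + 1, step_mv)]


def sweep_points(vdd_values, vbb_values):
    """Every (V_dd, V_bb) combination, with V_dd varying slowest."""
    return list(product(vdd_values, vbb_values))


def parameter_block(vdd, vbb):
    """Render Spectre `parameters` statements for one sweep point."""
    return f"parameters VDD={vdd:g}\nparameters VBB={vbb:g}\n"


if __name__ == "__main__":
    vdd_values = voltage_steps(500, 1100, 100)   # 0.5 V to 1.1 V
    vbb_values = voltage_steps(0, 1000, 500)     # 0 V, 0.5 V, 1.0 V
    points = sweep_points(vdd_values, vbb_values)
    print(f"{len(points)} simulation points")
    print(parameter_block(*points[0]), end="")
```

Generating the points from integer millivolt steps keeps values such as 0.7 V exact in the rendered text, which matters when the output is later matched against result files by name.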



Figure 3.1: The MDL flow [6]

### 3.2 Cadence Spectre and MDL

The circuit simulation software used throughout this research project is Spectre, a SPICE-based simulator with fast simulation speed and strong customizability. Two files are required to conduct a simulation: a library file that models the behavior of standard cells and devices, and a netlist file that describes the circuit components and connectivity. Spectre netlists have a different syntax from standard SPICE netlists, but a SPICE Reader tool is included in the suite to convert freely between the two languages. The library is provided by ST as *corners.scs*, which includes all the transistor variations available in this technology. Additionally, a measurement file in the form of an MDL script that controls the inputs can be added for more complicated simulation tasks that require multiple runs with variable input values. Besides the standard outputs saved to a *.raw* file by the Spectre simulator, MDL includes powerful built-in mathematical functions that are capable of preliminary data processing. MDL thus takes over the simulation inputs and derives indirect results from the outputs; these results are saved to a separate *.measure* file for further analysis, while the raw outputs of the simulator are preserved.

The structure of a netlist file required by Spectre is demonstrated in the sample code below.

```
// NETLIST SAMPLE UTBB FD-SOI INVERTER LVT
// lvt.scs

simulator lang=spectre
include "corners.scs"

// Parameters
parameters VDD=1.1
parameters VBB=0.5
parameters glen=24n

// Inverter Subcircuit
subckt Inv (out vin vdd gnd vbp vbn)
    M1 (out vin vdd vbp) lvtpfet l=glen w=300n
    M2 (out vin gnd vbn) lvtnfet l=glen w=210n
ends

// Circuit Instances (three FO4 stages: 1, 4, and 16 inverters)
Xinvl0 (out0 vin vdd 0 vbp vbn) Inv m=1
Xinvl1 (out1 out0 vdd 0 vbp vbn) Inv m=4
Xinvl2 (out2 out1 vdd 0 vbp vbn) Inv m=16

// Input Sources
Vdd (vdd 0) vsource dc=VDD
Vin (vin 0) vsource type=pwl wave=[0 0 1n VDD 10n VDD 10.01n 0 20n 0]
Vbp (0 vbp) vsource dc=VBB
Vbn (vbn 0) vsource dc=VBB
```

The sample file describes a circuit of twenty-one inverters connected into a chain of three stages with a fan-out of 4 (FO4). The beginning of the file specifies the netlist syntax to use (lang=spectre) and imports the standard cell library. Parameters are variables that can be modified in a separate MDL script. Inverter subcircuits are created for convenient re-use by defining the input and output (I/O) nodes, and how the standard cells are connected internally. In this case, the inverter consists of two transistor components, *lvtpfet* and *lvtnfet*, from the standard cell library. BB voltages for PMOS and NMOS are defined as *vbp* and *vbn* respectively, and are connected to their terminals as specified by the standard cell library. Next, three inverter instances are created by calling the *Inv* subcircuit and passing the I/O nodes to them. The multiplier *m* at the end of each instance call is equivalent to connecting *m* copies of the same instance in parallel in the simulated circuit. The inverter chain input signal *vin* is a piecewise linear (PWL) source whose output follows the points listed in the *wave* array. The *wave* array's format is [*time0 value0 time1 value1 ...*].
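To make the *wave* format concrete, the following Python sketch evaluates a PWL source at an arbitrary time by linear interpolation between the listed points. The function name `pwl_value` is hypothetical, and holding the first and last values outside the listed time range is an assumption made for illustration rather than documented Spectre behavior.

```python
def pwl_value(wave, t):
    """Evaluate a PWL waveform given as [t0, v0, t1, v1, ...] at time t."""
    # Pair up the flat list into (time, value) points.
    points = list(zip(wave[0::2], wave[1::2]))
    if t <= points[0][0]:
        return points[0][1]  # assumption: hold the first value before t0
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            # Linear interpolation inside the segment [t0, t1].
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]  # assumption: hold the last value after the end


VDD = 1.1
# Same points as the Vin source in the sample netlist, in seconds and volts.
wave = [0, 0, 1e-9, VDD, 10e-9, VDD, 10.01e-9, 0, 20e-9, 0]

print(pwl_value(wave, 0.5e-9))  # halfway up the rising edge
print(pwl_value(wave, 5e-9))    # input held high
print(pwl_value(wave, 15e-9))   # input held low
```

With the sample array, the input rises over the first nanosecond, stays at VDD until 10 ns, and falls within 10 ps.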

After constructing the netlist, an MDL script can be created to alter the parameters and collect the data from the simulation results. The result is exported to a text file with all the inputs and outputs specified in the MDL script (defaults to a *.measure* file unless otherwise specified with the *print* function). Below is a sample MDL script that measures the average static power of an inverter in the sample netlist.

```
// MDL SAMPLE INVERTER POWER LVT
// lvt.mdl

alias measurement lvt {
    run tran(stop = 20n)
    real Pstat=avg(trim(sig=Xinvl1:pwr, from=5n, to=6n))/4
    export real Pstat_nW=Pstat*(1e9)
}

foreach glen from {24n, 40n} {
    foreach temp from {20, 40, 60} {
        foreach VDD from swp(start=0.4, stop=1.1, step=0.1) {
            foreach VBB from swp(start=-0.3, stop=1.3, step=0.1) {
                run lvt
            }
        }
    }
}
```

A measurement alias is a procedure that executes a simulation and performs calculations on the data gathered in the process. The *run* command initiates a Spectre analysis; in this case, a transient analysis. Next, a variable of real number type (*real*) named *Pstat* is declared and defined as the result of an expression. The expression consists of several built-in functions, including *trim*, which returns the specified portion of a signal, and *avg*, which returns the average value of its input. The power of instance *Xinvl1* is divided by 4 because that instance represents four parallel inverters (m=4). Results are reported in SI units in scientific notation by default, so for readability the value is converted to nanowatts before being saved to the default file with *export*.
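The calculation performed by the measurement alias can be mirrored outside MDL. The Python sketch below trims a sampled power waveform to a time window, averages it, divides by the number of parallel inverters, and converts to nanowatts. The function names are hypothetical, and the use of a time-weighted (trapezoidal) average is an assumption about how *avg* treats non-uniform time steps.

```python
def time_average(times, values, start, stop):
    """Time-weighted (trapezoidal) average of the samples inside [start, stop]."""
    # Keep only the samples that fall inside the window (the "trim" step).
    window = [(t, v) for t, v in zip(times, values) if start <= t <= stop]
    area = 0.0
    for (t0, v0), (t1, v1) in zip(window, window[1:]):
        area += 0.5 * (v0 + v1) * (t1 - t0)
    return area / (window[-1][0] - window[0][0])


def static_power_nw(times, power, start=5e-9, stop=6e-9, copies=4):
    """Average static power per inverter in nW, like Pstat_nW in lvt.mdl."""
    return time_average(times, power, start, stop) / copies * 1e9


# Made-up samples for illustration only: 8 nW total inside the window.
times = [4e-9, 5e-9, 5.5e-9, 6e-9, 7e-9]
power = [1e-6, 8e-9, 8e-9, 8e-9, 1e-6]
print(static_power_nw(times, power))
```

The window from 5 ns to 6 ns lies well after the input edge at 1 ns, so only the steady-state power contributes to the average.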

To launch the measurement alias, the input variables must be passed to it. The loop statement in MDL is *foreach*, which requires an *array*-type argument. In the sample, the sweep function (*swp*) generates an arithmetic sequence automatically by sweeping across the given range at the given step.

To execute the simulation, run the *spectremdl -tab lvt.mdl* command. The *-tab* argument outputs the measurement results in a tabular format, which is helpful for data analysis.
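As a rough illustration of how a Python script could generate one MDL script per device type from a template, consider the sketch below. The template text, the function names, the instance name default, and the reuse of the LVT sweep ranges for every device type are all assumptions; the script actually used for this thesis is not reproduced here.

```python
from string import Template

# Hypothetical MDL template; $device, $instance and the sweep limits are filled in.
MDL_TEMPLATE = Template("""\
// MDL script generated for the $device inverter chain
alias measurement $device {
    run tran(stop = 20n)
    real Pstat=avg(trim(sig=$instance:pwr, from=5n, to=6n))/4
    export real Pstat_nW=Pstat*(1e9)
}

foreach VDD from swp(start=$vdd_start, stop=$vdd_stop, step=0.1) {
    foreach VBB from swp(start=$vbb_start, stop=$vbb_stop, step=0.1) {
        run $device
    }
}
""")


def render_mdl(device, instance="Xinvl1", vdd=(0.4, 1.1), vbb=(-0.3, 1.3)):
    """Return the MDL script text for one device type."""
    return MDL_TEMPLATE.substitute(
        device=device, instance=instance,
        vdd_start=vdd[0], vdd_stop=vdd[1],
        vbb_start=vbb[0], vbb_stop=vbb[1],
    )


def generate_scripts(devices):
    """Map each device type to a file name and its rendered MDL script."""
    return {f"{device}.mdl": render_mdl(device) for device in devices}


scripts = generate_scripts(["lvt", "rvt"])
print(sorted(scripts))
```

Each entry of the returned dictionary can then be written to disk and passed to *spectremdl*.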

### 3.3 Logic Chain Simulations

In order to accurately capture the performance and power characteristics of UTBB FD-SOI devices, several gate chain circuits are generated and measured. By comparing the data from multiple simulation runs, it is possible to establish a good estimate of the technology's behavior in various designs. This serves as a statistical foundation for the design of a DVFS controller. Unless otherwise specified, the temperature in all simulations is set to 50 °C.



Figure 3.2: FO4 inverter chain with multiple stages

Figure 3.2 demonstrates a multi-stage FO4 inverter chain. Each stage has four times the number of inverters compared to its previous stage. A five-stage FO4 inverter chain is simulated for this thesis as a reference point for further designs and simulations. The third stage is measured to achieve a closer estimation of the behavior of devices in an actual data path. The output of the last stage is tied to a 0.1 pF capacitor. The simulation is performed on both LVT and RVT devices with all available PB options.

The results are as follows. Data points in Figure 3.3 reflect the performance of an inverter with the minimum allowed Poly Bias (PB), 24 nm, in the 28 nm UTBB FD-SOI technology. The results are normalized to LVT at 1.1 V and no BB. When $V_{dd} = 0.5$ V, the normalized propagation delay of an RVT inverter is between 10 and 26.2. For LVT inverters, increasing the back bias results in a decreased propagation delay ($t_p$), which corresponds to an improvement in performance. The performance gain becomes more significant when the supply voltage (labeled *VDD* in the figure) is low. At 1.1 V, the performance gain is merely 17%, while at 0.5 V, $t_p$ is approximately halved. For RVT devices, increasing the BB strength has a negative impact on performance. The performance loss varies from 27% to 162% depending on $V_{dd}$.



Figure 3.3: Propagation delay of a single inverter with variable BB and $V_T$ options

Next, the effect of PB on performance is evaluated. During the simulations represented by Figure 3.4, $V_{dd}$ is set to 0.9 V. In ST's notation, PBx means the gate length is x nm larger than the minimum allowed value of 24 nm. At a fixed $V_{dd} = 0.9$ V, increasing the PB results in a decrease in performance for both LVT and RVT devices. At zero BB, increasing the gate length of an LVT device by 16 nm results in a 78.6% increase in $t_p$, while at 1.3 V BB, the difference is down to 72.9%. The effect of PB therefore becomes slightly less significant when BB is applied. RVT devices behave in the opposite manner: increasing an RVT device's PB amplifies the effect of BB and results in an even larger performance deficit.



Figure 3.4: Propagation delay of a single inverter with variable BB and PB options

The efficiency of UTBB FD-SOI devices can be evaluated using the following two metrics: static power consumption and energy-delay product (EDP). Figure 3.5 shows the normalized static power consumption of LVT versus RVT devices. The RVT devices have lower static power consumption, as promised by ST. RBB also drastically reduces the static power consumption, by as much as $10\times$.



Figure 3.5: Normalized static power consumption of LVT and RVT devices with different  $V_{dd}$ , PB = 0

A transistor consumes more power while switching than while idle. Thus, the switching energy can be found by integrating the total power consumed by the device during a power surge (corresponding to the switching event) minus the idle power. A threshold of 105% of the steady-state power is used in this research to trigger the integration. Figure 3.6 shows the EDP of LVT devices at different $V_{dd}$ and $V_{bb}$. Increasing the FBB plays a vital role in reducing the EDP. It is worth noting that the optimum point shifts gradually towards lower $V_{dd}$ as $V_{bb}$ increases; thus, for less intense workloads, reducing $V_{dd}$ improves the overall performance-energy balance. The opposite effect can be observed on RVT devices. Due to the slow switching time of RVT devices, especially with added RBB, their EDP is significantly worse than that of LVT devices of the same size.
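The thresholded integration described above can be sketched in Python as follows. This is a minimal illustration, assuming the power waveform is available as sampled points and that a trapezoid segment is counted only when both of its endpoints exceed the 105% threshold; the function names and the sample data are hypothetical and not taken from the thesis.

```python
def switching_energy(times, power, idle_power, threshold=1.05):
    """Integrate (power - idle_power) over samples above threshold * idle_power."""
    energy = 0.0
    samples = list(zip(times, power))
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Count a segment only when both endpoints are above the trigger level.
        if p0 > threshold * idle_power and p1 > threshold * idle_power:
            energy += (0.5 * (p0 + p1) - idle_power) * (t1 - t0)
    return energy


def energy_delay_product(energy, delay):
    """EDP is the product of switching energy and propagation delay."""
    return energy * delay


# Made-up rectangular power surge on top of an idle level of 1.0 (arbitrary units).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
power = [1.0, 3.0, 3.0, 3.0, 1.0]
energy = switching_energy(times, power, idle_power=1.0)
print(energy, energy_delay_product(energy, delay=0.5))
```

A lower EDP indicates a better balance between energy per switching event and speed, which is how the operating points in Figure 3.6 are compared.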



Figure 3.6: Energy Delay Product of (a) LVT and (b) RVT devices with different BB voltages

As discussed in Chapter 1, the design goal of the DVFS circuit in this thesis is to prioritize performance over energy efficiency; thus, LVT devices are the obvious choice for their low switching EDP and low $t_p$.

Next, two chains, each consisting of 45 stages of NAND2, NOR2, NAND3, NOR3 gates and inverters, are created as a comparison to the reference inverter chain, to examine the scalability of the previous simulation results and to serve as a closer resemblance of real-world data paths. Chain 1 consists of NAND2 gates, NOR2 gates, and inverters, while Chain 2 consists of NAND3 gates, NOR3 gates, and inverters. The middle section of 5 stages is measured. The results can be seen in Figure 3.7. The simulation results are largely comparable between the three chains and mostly fall within 1% of each other. The maximum difference in the relative propagation delay is 3%, observed at 1.1 V $V_{dd}$ and 1.2 V $V_{bb}$. The close agreement between the chains indicates that the simulation data in this chapter can be scaled to larger circuits while still accurately predicting performance.



Figure 3.7: Normalized propagation delay of the two LVT gate chains versus reference LVT inverter chain

## Chapter 4

# Scaling Approaches

### 4.1 Voltage Scaling

Adding dynamic back biasing to an IC raises additional challenges in DVFS design. The most obvious one is the extra hardware required to host adjustable back bias voltages. This leads to higher complexity in the physical design process and must be considered when developing the DVFS circuit. On the plus side, the principles behind dynamic supply voltage scaling and dynamic back bias voltage scaling are largely similar: both offer multiple voltages and a voltage transition mechanism. It is therefore possible to examine the existing voltage scaling methods for supply voltages and adapt the best strategies and practices to back biasing.

#### 4.1.1 Previous Work on Voltage Scaling

The most common supply voltage scaling architecture utilizes an adjustable external voltage regulator, usually soldered to the printed circuit board near the chip it serves [33] [34]. This approach is most suitable for smaller chips with simpler architectures, and little additional optimization is required during the chip design process. A single voltage supply is required for the chip, with little to no added hardware within the chip to support voltage scaling besides the DVFS control logic. However, as chip complexity grows, the drawbacks of this approach become obvious. Modern processors usually consist of multiple processing units, or cores. Since the voltage supply is global in this approach, voltage cannot be individually supplied to match a core's performance needs. This reduces the power saving potential when a workload utilizes multiple processor cores in an unbalanced manner. As demonstrated in Figure 4.1, a many-core system with high parallelism often has varying performance requirements for each core over time [7] [35]. Another drawback is that communicating with an off-chip DC-DC converter is time and energy consuming. The same advantages and disadvantages apply to the scaling of back bias voltages as well.



Figure 4.1: Workload varies among cores and over time [7]

An improved architecture assigns each processor core to a separate architectural domain and provides the voltages individually within the same package [36]. Sometimes, this approach is applied in addition to the previously mentioned approach [37]. The external circuit supplies a slightly higher voltage globally, and second-stage fully-integrated voltage regulators (FIVRs) that further reduce the voltage to the target level are built on-chip [38] to reduce the complexity of surrounding circuits and improve scaling speed and efficiency [39] [40]. This approach offers coarse-grain voltage scaling capable of scaling each core's voltage individually in fine steps, and is widely used in today's consumer-grade central processing unit (CPU) products from Intel, AMD, etc. [41]. Since the FIVRs are built on the same silicon as the processor (fully-integrated), their size and efficiency become a problem for many applications. The passive elements required for these FIVRs take up a large amount of valuable silicon area [42] and raise the fabrication cost. The FIVRs handle large currents on chip, accumulating a large amount of heat within the package and imposing stricter thermal limits on the chip. The situation becomes worse when two separate BB voltages are introduced: each domain now requires three separate sets of FIVRs. This approach has poor scalability and is not practical for highly parallel systems such as KiloCore, a high-performance many-core processor array with 1000 small MIMD cores [43].

#### 4.1.2 Voltage Scaling Architecture

A combination of the two previous ideas emerged as an effort to join the best of both worlds. While each core has to have access to variable voltages, the power supply circuit can still be massively simplified by limiting the dynamic voltages to a small subset consisting of only a few values [44]. These voltages are generated by voltage regulators either off-chip to reduce cooling system pressure, or on-chip to reduce package pin count. This approach offers maximum flexibility, as a large number of voltage domains can be supported, and it can be easily scaled up or down to fit chips with different architectures. This approach can be found in several many-core chip designs, including AsAP, a 167-core processor platform [45] [46].

By adding additional power rails for multiple BB voltages, this design can easily be converted into a BB-ready DVFS controller. Figure 4.2 demonstrates its DVFS architecture for a single processor core.



Figure 4.2: Dynamic voltage scaling with multiple supply voltage and back bias rails grouped into buses

The supply voltages are provided externally, and the core is connected to only one $V_{dd}$ rail at any given moment via a rail switching mechanism controlled by the DVFS controller. The DVFS controller would operate on a separate voltage rail and remain constantly powered to maintain normal operations during the voltage switching process.

This architecture has its drawbacks. A three-$V_{dd}$, three-$V_{bb}$ design would require ten unique voltage rails, a number that results in unacceptable area overhead and complicates the back-end physical design flow [47]. Therefore, the number of voltage rails must be limited. Fortunately, due to the BOX in the substrate, the back current of a transistor is as low as the picoamp level, so the $V_{bb}$ rails do not have to occupy areas as large as the $V_{dd}$ and GND rails. A total of two $V_{bb}$ pairs (four rails) is chosen for this thesis to reduce physical design complexity. Additionally, as discussed in §2.3, the FBB zero point for LVT devices is GND for both $V_{bp}$ and $V_{bn}$ to avoid forward biasing the substrates. As a result, GND can be treated as the third available $V_{bb}$ option, providing zero back bias, by adding a large pull-down resistor to the BB grid.
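The rail count can be checked with a short calculation. The breakdown used here is an assumption, since the text does not itemize it: one rail per $V_{dd}$ level, two rails ($V_{bp}$ and $V_{bn}$) per $V_{bb}$ level, and one shared GND rail, with the separate rail for the DVFS controller left out. The function name `rail_count` is hypothetical.

```python
def rail_count(n_vdd, n_vbb_pairs, ground_rails=1):
    """Number of global rails under the assumed breakdown described above."""
    # Each back bias level needs one rail for PMOS wells and one for NMOS wells.
    return n_vdd + 2 * n_vbb_pairs + ground_rails


print(rail_count(3, 3))  # three-Vdd, three-Vbb design
print(rail_count(2, 2))  # two Vdd rails and two Vbb pairs, as chosen in this chapter
```

Under this assumption, the three-$V_{dd}$, three-$V_{bb}$ case reproduces the ten rails quoted above, and reusing GND as the zero back bias option avoids adding a third $V_{bb}$ pair.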

A total of two $V_{dd}$ rails is suggested in this thesis to cover the entire clock frequency range. The reason is that intermediate clock frequency requirements are not common in real-world applications. Figure 4.3 is a histogram that shows the distribution of core frequency requirements for a Low-Density Parity Check (LDPC) decoder application with no throughput loss on a 1000-core processor [8].



Figure 4.3: Histogram of core clock frequency required (horizontal axis: frequency required, % of Fmax) for LDPC decoding on KiloCore, a 1000-core processor array [8]

The majority of cores require either above 75%, or below 35% of maximum frequency. Thus, optimization of DVFS algorithms should focus on the lower and higher ends of the frequency range.

#### 4.1.3 Power Gates

A processor core as an independent voltage domain does not necessarily need to be exposed to the power rails directly, since all components in the domain share the same $V_{dd}$ and $V_{bb}$. The DVFS controller handles the voltage (and frequency) selection and switches the core's internal supply grid as a whole to the selected power rails. MOS transistors are a natural choice for connecting a circuit to, or disconnecting it from, a power rail, exactly as they do in inverters. A technique called power gating uses this method to reduce power consumption by disconnecting circuits from the power supply. By attaching a PMOS transistor between the core's $V_{dd}$ grid and each global $V_{dd}$ rail, switching between voltages can be conveniently broken down into two steps: switching off the old rail and switching on the new rail. A similar structure can be applied to the $V_{bp}$ rail, while the $V_{bn}$ rail requires NMOS transistors. In the following discussion, these transistors are referred to as power gates.

However, this mechanism creates multiple problems. The switching process is not seamless, so the core must be halted before the switch and woken up afterwards. Using a single transistor for the voltage switch of a large-scale circuit results in an unbalanced voltage distribution across the core voltage grid due to IR drop [48]. The transistor size also matters: only a very large total transistor width can provide enough drive current. Consequently, distributing multiple small transistors as power gates is the only viable solution.

The size and number of these power gates are an important design trade-off. The total drive current of a power gate is proportional to its total width. The core is a transient load, meaning that it periodically draws a large current and pulls down the core grid voltage. A significant voltage droop on the core grid is expected when the power gates are too small. The $V_{dd}$ droop reduces the switching speed of transistors and has a negative impact on the performance of the core. Figure 4.4 shows an FO4 LVT inverter chain with the three stages in the middle powered by PMOS transistors. Five additional FO1 inverter stages are added to the beginning and the end of the chain to simulate input from an actual circuit. Figure 4.5 shows the simulation result at a nominal operating voltage of 1.0 V and a $V_{bb}$ of 0.6 V, demonstrating the relation between voltage droop, total width of power gates, and transient load decoupling capacitors. The transistor total width in the plot is relative to the size of the measured stage of the inverter chain.

For body bias voltages, due to the very low back current measured throughout, 5% of the total $V_{dd}$ power gate size for each rail is deemed ample. Simulation confirms this, showing a performance loss of less than 1%.



Figure 4.4: Inverter chain to demonstrate voltage droop on the grid



Figure 4.5: Simulation of voltage droop with various power gate sizes and decoupling capacitors

#### 4.1.4 Supply Voltage Switching

The voltage switching mechanism is described in the next two subsections. The DVFS controller constantly suggests voltages and frequencies based on current workload. For  $V_{dd}$ , the switching process can be briefly broken down into the following steps:

1. When the DVFS controller initiates a switch request, the core is stalled.
2. Next, the power gates currently in use are shut off.
3. The new gates are switched on.
4. The core returns to normal operation when the new voltage is ready.

Step 1 prevents the core from behaving unpredictably during core voltage switching. The core is responsible for returning a *stalled* signal so that the voltage switching process can proceed. Step 2 involves shutting the current power gates off. This is achieved by an internal signal, *power\_off*, that overrides the control signals to the power gates (power gates are active-low PMOS transistors). Once the off signal reaches the power gates, an *all\_off* signal is sent back to the switcher, and the new gates can be switched on by deactivating *power\_off*. Finally, the stall is lifted and the core resumes normal operation.
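The four-step handshake can be summarized with a small behavioral model. The Python sketch below is only an illustration of the sequence described above, not the circuit implemented in this thesis; the class name, the state names, and the one-transition-per-call timing are assumptions.

```python
class VddSwitcher:
    """Behavioral model of the four-step supply switch (illustrative only)."""

    def __init__(self, rail="low"):
        self.rail = rail        # rail currently connected: "high" or "low"
        self.target = rail
        self.state = "IDLE"
        self.stall = 0          # stall request sent to the core
        self.power_off = 0      # forces every power gate off when set

    def request(self, target):
        """Step 1: a new Vdd suggestion stalls the core."""
        if target != self.rail and self.state == "IDLE":
            self.target = target
            self.state = "STALLING"
            self.stall = 1

    def step(self, stalled, all_off):
        """Advance by at most one transition, given the acknowledgment inputs."""
        if self.state == "STALLING" and stalled:
            self.power_off = 1              # step 2: shut the current gates off
            self.state = "GATES_OFF"
        elif self.state == "GATES_OFF" and all_off:
            self.rail = self.target         # step 3: enable the new gates
            self.power_off = 0
            self.state = "RESUMING"
        elif self.state == "RESUMING":
            self.stall = 0                  # step 4: release the core
            self.state = "IDLE"

    def gate_controls(self):
        """Active-low PMOS gate controls as (vddhi, vddlow); 1 means gate off."""
        if self.power_off:
            return (1, 1)
        return (0, 1) if self.rail == "high" else (1, 0)


sw = VddSwitcher("low")
sw.request("high")
sw.step(stalled=1, all_off=0)   # core has stalled, gates are forced off
sw.step(stalled=1, all_off=1)   # gates report off, high rail is enabled
sw.step(stalled=1, all_off=0)   # stall is lifted
print(sw.rail, sw.stall, sw.gate_controls())
```

In this model the two rails are never enabled at the same time, because the new rail is selected only after *all\_off* has been observed.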

The I/O signals of the voltage switching module are (taking $V_{dd}$ as an example):

```
// supply voltage switcher

module vddcfg (
    vdd_in,
    stalled,
    all_off,
    power_off,
    vddhi,
    vddlow,
    stall
);

input  [1:0] vdd_in;              // suggested Vdd from the DVFS controller
input        stalled, all_off;    // acknowledgments from the core and power gates
output       power_off, vddhi, vddlow;
output       stall;               // stall request to the core

// switching logic not shown
endmodule
```

where *vdd\_in* is the suggested $V_{dd}$ from the DVFS controller encoded in two bits, *vddhi*/*vddlow* are signals enabling their corresponding high/low $V_{dd}$ rails, and *stall* is the switcher's stall signal to the core.

#### 4.1.5 Bias Voltage Switching

The back bias voltage switcher is largely similar to the supply voltage switcher, with a few signals defined differently. These signals include *vbb\_in* for the suggested BB level from the DVFS controller, and the corresponding output signals that control the PMOS and NMOS bias power gates. When switching between BB rails, a stall is not required, since the processor core remains in normal operation as long as the supply voltage is maintained. However, the clock frequency must be reduced to a safe level to ensure the core logic functions normally. This is achieved by adding another pair of signals, *slowed* and *slow*, that force the oscillator to reduce the clock frequency.

The I/O signals for the voltage switching module are:

```
// back bias voltage switcher

module vbbcfg (
    vbb_in,
    slowed,
    bb_zero,
    bb_off,
    vbbphi,
    vbbplow,
    vbbpn,
    vbbnhi,
    vbbnlow,
    vbbnn,
    slow
);

input  [2:0] vbb_in;              // suggested BB level from the DVFS controller
input        slowed, bb_zero;
output       bb_off, vbbphi, vbbplow, vbbpn, vbbnhi, vbbnlow, vbbnn;
output       slow;                // request to reduce the clock frequency

// switching logic not shown
endmodule
```

### 4.2 Voltage and Frequency Selection

The frequency scaling is very straightforward and consists of only two steps.

1. A utilization rate estimation is produced based on the feedback data from the processor core.
2. A frequency is suggested based on the estimation.

Obtaining information about a processor core's utilization rate is not the focus of this thesis. The proposed DVFS controller expects this information from the processor core. A frequency controller converts the utilization rate to desired frequency and voltages, and passes these data to the oscillator and the voltage switchers. The frequency controller contains a lookup table (LUT) which maps core utilization rate to a series of frequency points supported by a configurable ring oscillator.

Since there are six unique $V_{dd}/V_{bb}$ voltage setting combinations, the full frequency range supported by the design is divided into six non-overlapping segments, each corresponding to a voltage setting. The five separation points between the segments are the voltage switching points. To prevent undesired behaviors caused by inappropriate frequency targets, the maximum allowed clock frequency of each possible $V_{dd}/V_{bb}$ voltage setting is established via circuit simulations. The six voltage settings are sorted by their maximum allowed frequency, and a higher voltage setting corresponds to a higher maximum allowed frequency. A voltage switching point between two voltage settings must not exceed the maximum operable frequency of the lower voltage setting. The DVFS controller switches the core to a higher voltage setting if a frequency suggestion is higher than the upper voltage switching point of the current voltage setting, and to a lower voltage setting if a frequency suggestion is lower than its lower voltage switching point. Voltage switching points are adjustable to provide flexibility and portability to future designs.



Figure 4.6: Three voltage settings with their corresponding operable frequency ranges and voltage switching points.

Figure 4.6 is a demonstration of the concept with three voltage settings and two voltage switching points separating the full frequency range into three segments. When the circuit operates at a clock frequency below the voltage switching point, the DVFS controller switches the core to a lower voltage setting to reduce power consumption. When a heavy workload requires higher throughput, the DVFS controller suggests an increased frequency. If the new frequency is higher than the voltage switching point, the DVFS controller switches the core to the high voltage setting to prevent violation of the maximum allowed frequency.

## Chapter 5

# Implementation

### 5.1 Voltage Switchers

There are two voltage switcher circuits in the design: the supply switcher and the back bias switcher. The supply switcher circuit is presented in Figure 5.1, and the back bias switcher in Figure 5.2. During normal operation, the DVFS controller suggests $V_{dd}$ and $V_{bb}$. The voltages can be overridden manually with the configurable registers $vdd\_conf$ and $vbb\_conf$ respectively. Table 5.1 contains the possible configurations for these registers.

Figure 5.3 shows a conceptual design of a PMOS power gate module. This design uses 64 PMOS transistors as power gates for each $V_{dd}$ supply voltage. The power gates are connected to a multi-level buffer chain to provide switching speed adjustment. This is beneficial when the processor scale is large, since a slow switch between the voltage rails reduces the IR drop on the voltage grids. A structurally similar design can be used for BB voltages. Since BB requires both positive and negative bias voltages, a total of four power gate modules are required, two of which consist of PMOS transistors working at positive voltages and two of which consist of NMOS transistors working at negative voltages. A module with NMOS power gates requires a logic level converter to work properly. Figure 5.4 shows one such converter. The component values are manually picked to offset the input signal (0–1 V) to the desired level (-1–0 V). Register $pg\_delay\_conf$ controls the switching delay of the modules. The upper six bits are dedicated to the $V_{bb}$ switching delay, and the lower six bits are for the $V_{dd}$ switching delay.

Figure 5.5 illustrates the design of a variable length delay buffer chain. Each supply switcher utilizes multiple delay chains to avoid incorrect signal timing. Each delay chain is individually configurable to provide the most suitable delay length for different operating conditions. Figure 5.6 is the signal timing diagram for supply voltage switching.

The signal timing for the back bias switching circuit when switching to and from zero back bias is demonstrated in Figure 5.7. Figure 5.8 is the timing diagram for back bias switching when switching between high and low back bias voltages. When switching both  $V_{bb}$  and  $V_{dd}$  at the same time, the processor core is stalled.

| Register Name     | Bit Range | Value  | Configuration                                    |
|-------------------|-----------|--------|--------------------------------------------------|
| $vdd\_conf$       | [3:0]     | 01xx   | normal mode                                      |
|                   |           | 11xx   | overwrite controller decisions                   |
|                   |           | 1110   | switch to low $V_{dd}$                           |
|                   |           | 1101   | switch to high $V_{dd}$                          |
|                   |           | 1111   | switch off from $V_{dd}$                         |
|                   |           | 1100   | switch on both $V_{dd}$ rails - HAZARD           |
|                   |           | x0xx   | bypass switching, force apply voltage            |
|                   |           | x010   | force apply low $V_{dd}$                         |
|                   |           | x001   | force apply high $V_{dd}$                        |
|                   |           | x011   | force remove $V_{dd}$                            |
|                   |           | x000   | force apply both $V_{dd}$ rails - HAZARD         |
| $vbb\_conf$       | [4:0]     | 01xxx  | normal mode                                      |
|                   |           | 11xxx  | overwrite controller decisions                   |
|                   |           | 11011  | switch to high $V_{bb}$                          |
|                   |           | 11101  | switch to low $V_{bb}$                           |
|                   |           | 11110  | switch to zero $V_{bb}$                          |
|                   |           | 1100x  | switch on both $V_{bb}$ rails - HAZARD           |
|                   |           | x0xxx  | bypass switching, force apply voltage            |
|                   |           | x001x  | force apply high $V_{bb}$                        |
|                   |           | x010x  | force apply low $V_{bb}$                         |
|                   |           | x011x  | force zero $V_{bb}$                              |
|                   |           | x000x  | force apply both $V_{bb}$ rails - HAZARD         |
| $pg\_delay\_conf$ | [11:6]    | xxxxxx | power gate delay settings for $V_{bb}$           |
|                   |           |        | setting each bit doubles power gate switch delay |
| $pg\_delay\_conf$ | [5:0]     | xxxxxx | power gate delay settings for $V_{dd}$           |
|                   |           |        | setting each bit doubles power gate switch delay |

Table 5.1: Configuration for the voltage switchers
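As an aid to reading the $vdd\_conf$ rows of Table 5.1, the Python sketch below decodes a 4-bit register value into a mode and an action. It is a reading of the table only: the function name, the returned labels, and the order in which the bit patterns are checked are assumptions, not part of the hardware design.

```python
def decode_vdd_conf(value):
    """Decode a 4-bit vdd_conf value following the patterns in Table 5.1."""
    bit3 = (value >> 3) & 1
    bit2 = (value >> 2) & 1
    low_bits = value & 0b11
    # The two low bits select the rail; a 0 enables the corresponding rail.
    actions = {
        0b10: "low Vdd",
        0b01: "high Vdd",
        0b11: "off",
        0b00: "both rails (HAZARD)",
    }
    if bit2 == 0:
        # x0xx: bypass the switching sequence and force the voltage.
        return ("force", actions[low_bits])
    if bit3 == 1:
        # 11xx: override the controller's decision, using the switching sequence.
        return ("override", actions[low_bits])
    # 01xx: normal mode, the DVFS controller decides.
    return ("normal", None)


for value in (0b0100, 0b1110, 0b1101, 0b0011, 0b1100):
    print(format(value, "04b"), decode_vdd_conf(value))
```

The two HAZARD patterns are reported rather than rejected here; a practical tool would likely refuse to write them.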



Figure 5.1: Supply switcher circuit in detail



Figure 5.2: Back bias switcher circuit in detail



Figure 5.3: Multi-level power gate module



Figure 5.4: Passive logic level converter



Figure 5.5: Variable length delay chain



Figure 5.6: Supply switcher circuit timing diagram



Figure 5.7: BB switcher circuit timing diagram when switching to and from zero back bias



Figure 5.8: BB switcher circuit timing diagram when switching between two back bias voltages

### 5.2 DVFS Controller

The top level block diagram of the DVFS controller is shown in Figure 5.9. A single bit *workload* signal that indicates whether the processor core is idling is expected. Additionally, a register  $clk\_source$  selects the clock signal for the DVFS circuit.



Figure 5.9: Top level block diagram of the DVFS controller

A conceptual ring oscillator design is proposed in Figure 5.10. The oscillator consists of 5 stages. Stage N of the oscillator consists of $2^N$ ordinary inverters and two tri-state inverters. The tri-state inverters in stage N are controlled by bit N - 1 of the osc\_conf register. When osc\_conf[N-1] is set, the ordinary inverters in the N-th stage are bypassed, shortening the ring. As a result, the total number of inverters in the ring varies from 5 to 67, offering 32 unique frequency settings. Additional fixed delay is added in the form of an even number of inverters to offset the oscillator output frequency to the desired range. Assuming a target maximum frequency of 4 GHz, the oscillator design needs 14 additional inverters according to Spectre simulation.
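The range of ring lengths can be verified by enumerating all osc\_conf values. The Python sketch below assumes that exactly one of the two tri-state inverters of each stage is in the signal path at any time (one on the bypass path, one in series with the ordinary inverters), which is an interpretation consistent with the quoted range of 5 to 67; the function name is hypothetical.

```python
def ring_length(osc_conf, stages=5, extra=0):
    """Number of inverters in the ring for a given osc_conf value."""
    length = extra
    for n in range(1, stages + 1):
        bypassed = (osc_conf >> (n - 1)) & 1
        length += 1              # one tri-state inverter is always in the path
        if not bypassed:
            length += 2 ** n     # ordinary inverters of stage n
    return length


lengths = [ring_length(conf) for conf in range(32)]
print(min(lengths), max(lengths), len(set(lengths)))

# With the 14 additional fixed inverters mentioned above.
print(ring_length(0b11111, extra=14), ring_length(0b00000, extra=14))
```

Every length produced this way is odd, which a ring oscillator requires, and adding an even number of fixed inverters preserves that property.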

The workload information is linearly mapped to a frequency target with an LUT. Minimum workload corresponds to minimum clock frequency and maximum workload corresponds to maximum clock frequency. The LUT also contains the voltage settings for each possible clock frequency. Table 5.2 lists the frequency ranges for each voltage setting used in this design. As explained in Chapter 4, the frequency ranges are manually decided so that the lowest possible voltage setting is always used. The maximum allowed frequencies are rounded up to the next 0.1 GHz with an additional 0.1 GHz safety margin.



Figure 5.10: Variable length ring oscillator

| Maximum Frequency (GHz) | Supply Voltage (V) | Back Bias Voltage (V) |
|-------------------------|--------------------|-----------------------|
| 1.1                     | 0.6                | 0                     |
| 1.4                     | 0.6                | 0.6                   |
| 1.9                     | 0.6                | 1.2                   |
| 3.3                     | 1.0                | 0                     |
| 3.6                     | 1.0                | 0.6                   |
| 4.0                     | 1.0                | 1.2                   |

Table 5.2: Maximum Clock Frequencies for Different Voltage Settings
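As an illustration of how Table 5.2 is used, the following Python sketch picks the first row whose maximum frequency covers a target frequency. The function name `select_voltages` is hypothetical, and treating each maximum frequency as an inclusive upper bound is an assumption; the actual controller stores this mapping in the LUT rather than computing it.

```python
from typing import Tuple

# (maximum frequency in GHz, supply voltage in V, back-bias voltage in V),
# copied from Table 5.2 in ascending order of maximum frequency.
VOLTAGE_TABLE = [
    (1.1, 0.6, 0.0),
    (1.4, 0.6, 0.6),
    (1.9, 0.6, 1.2),
    (3.3, 1.0, 0.0),
    (3.6, 1.0, 0.6),
    (4.0, 1.0, 1.2),
]


def select_voltages(target_ghz: float) -> Tuple[float, float]:
    """Return (Vdd, Vbb) of the first setting that supports target_ghz."""
    for f_max, vdd, vbb in VOLTAGE_TABLE:
        if target_ghz <= f_max:  # assumption: the bound is inclusive
            return vdd, vbb
    raise ValueError("target frequency exceeds the 4.0 GHz maximum")


print(select_voltages(1.0))  # (0.6, 0.0)
print(select_voltages(2.5))  # (1.0, 0.0)
print(select_voltages(4.0))  # (1.0, 1.2)
```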

Figure 5.11 shows the workload sensing module. The module contains the LUT mentioned above, which stores the voltage configurations for all clock frequencies supported by this design. A counter that changes according to *workload* indexes the LUT to estimate the optimal clock frequency for the processor core. The content of the LUT is shown in Table 5.3. Register  $clk\_offset$  shifts the voltage switching points by applying an offset to the LUT's address.
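A behavioral sketch of this module in Python may help to visualize the indexing. Several details are assumptions and are not taken from the design: the counter is modeled as a saturating up/down counter that increments while the core is busy and decrements while it is idle, $clk\_offset$ is added to the counter value, and the resulting address is clamped to the LUT range. The class name `WorkloadSensor` is hypothetical, and only the first ten $v\_suggest$ entries of Table 5.3 are used.

```python
class WorkloadSensor:
    """Saturating up/down counter that indexes a v_suggest LUT (sketch)."""

    def __init__(self, lut, clk_offset=0):
        self.lut = list(lut)
        self.clk_offset = clk_offset
        self.count = 0

    def step(self, busy: bool) -> str:
        # Assumption: count up while busy, count down while idle, saturating.
        if busy:
            self.count = min(self.count + 1, len(self.lut) - 1)
        else:
            self.count = max(self.count - 1, 0)
        # Assumption: the offset is added to the address, then clamped.
        addr = min(max(self.count + self.clk_offset, 0), len(self.lut) - 1)
        return self.lut[addr]


# v_suggest values for conf_index 000000 to 001001, from Table 5.3.
LUT_PREFIX = ["10110"] * 4 + ["10101"] * 3 + ["10011"] * 3

sensor = WorkloadSensor(LUT_PREFIX)
print([sensor.step(True) for _ in range(5)])
# ['10110', '10110', '10110', '10101', '10101']
```

With a positive offset in this model, the same counter value reaches the next voltage setting earlier; for example, `WorkloadSensor(LUT_PREFIX, clk_offset=3).step(True)` returns `'10101'` on the first busy cycle.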

The switching interval counters are shown in Figure 5.12. This module limits how often voltage switching occurs. Register  $sw\_counter$  sets the number of clock cycles the DVFS controller must wait between two switching requests.
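The effect of $sw\_counter$ can be sketched with a simple cycle-based model in Python. This is only an illustration under stated assumptions: a granted request reloads a down-counter with $sw\_counter$, and requests that arrive while the counter is non-zero are ignored rather than queued. The class name `SwitchIntervalGate` is hypothetical.

```python
class SwitchIntervalGate:
    """Grant a switching request only after sw_counter idle cycles (sketch)."""

    def __init__(self, sw_counter: int):
        self.sw_counter = sw_counter
        self.remaining = 0  # cycles left before the next request is accepted

    def tick(self, request: bool) -> bool:
        """Advance one clock cycle; return True if the request is granted."""
        if self.remaining > 0:
            # Assumption: requests during the wait period are dropped.
            self.remaining -= 1
            return False
        if request:
            self.remaining = self.sw_counter
            return True
        return False


gate = SwitchIntervalGate(sw_counter=3)
print([gate.tick(True) for _ in range(9)])
# [True, False, False, False, True, False, False, False, True]
```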

Table 5.4 contains all the configuration registers included in the design.



Figure 5.11: Workload sensing module

| $conf\_index$ | $v\_suggest$ | Voltage Settings (V) |
|---------------|--------------|----------------------|
| 000000        | 10110        |                      |
| 000001        | 10110        |                      |
| 000010        | 10110        | $V_{dd} = 0.6$       |
| 000011        | 10110        | $V_{bb} = 0.0$       |
| 000100        | 10101        |                      |
| 000101        | 10101        | $V_{dd} = 0.6$       |
| 000110        | 10101        | $V_{bb} = 0.6$       |
| 000111        | 10011        |                      |
| 001000        | 10011        |                      |
| 001001        | 10011        |                      |
                                                                                                                                                                                                                                                                                                                                                                                      | 001010        | 10011        | $V_{dd} = 0.6$       |
| $ \begin{array}{ c c c c c c c c c c c c c c c c c c c$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                                                                      | 001011        | 10011        | $V_{bb} = 1.2$       |
| $ \begin{array}{ c c c c c c c c c c c c c c c c c c c$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                                                                      | 001100        | 01110        |                      |
| $ \begin{array}{c ccccccccccccccccccccccccccccccccccc$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                                      | 001101        | 01110        |                      |
| $ \begin{array}{ c c c c c c c c c c c c c c c c c c c$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                                                                      | 001110        | 01110        |                      |
| $ \begin{array}{c ccccccccccccccccccccccccccccccccccc$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                                      | 001111        | 01110        |                      |
| $ \begin{array}{ c c c c c c c c c c c c c c c c c c c$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                                                                      | 010000        | 01110        |                      |
| $ \begin{array}{ c c c c c c c c c c c c c c c c c c c$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                                                                      | 010001        | 01110        |                      |
| $ \begin{array}{c ccccccccccccccccccccccccccccccccccc$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                                      | 010010        | 01110        |                      |
| $ \begin{array}{c ccccccccccccccccccccccccccccccccccc$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                                      | 010011        | 01110        |                      |
| $ \begin{array}{c ccccccccccccccccccccccccccccccccccc$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                                      | 010100        | 01110        |                      |
| $ \begin{array}{c c c c c c c c c c c c c c c c c c c $                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                                                                      | 010101        | 01110        |                      |
|  | 010110 | 01110 |  |
|  | 010111 | 01110 |  |
|  | 011000 | 01110 | $V_{dd} = 1.0$ |
|  | 011001 | 01110 | $V_{bb} = 0.0$ |
|  | 011010 | 01101 |  |
|  | 011011 | 01101 | $V_{dd} = 1.0$ |
|  | 011100 | 01101 | $V_{bb} = 0.6$ |
|  | 011101 | 01011 |  |
|  | 011110 | 01011 |  |
|  | 011111 | 01011 |  |
|  | 100000 | 01011 |  |
|  | 100001 | 01011 |  |
|  | 100010 | 01011 | $V_{dd} = 1.0$ |
|  | 100011 | 01011 | $V_{bb} = 1.2$ |

Table 5.3: LUT contents



Figure 5.12: Voltage switch delay counter

| Register Name | Bit Range | Value  | Configuration                                           |
|---------------|-----------|--------|---------------------------------------------------------|
| clk_source    | [1:0]     | 00     | use DVFS clock                                          |
|               |           | 01     | use processor clock                                     |
|               |           | 1x     | disable clock                                           |
| osc_conf      | [4:0]     | XXXXX  | oscillator frequency setting                            |
|               |           | 00000  | minimum oscillator frequency                            |
|               |           | 11111  | maximum oscillator frequency                            |
| clk_offset    | [1:0]     | XX     | voltage switching point offset                          |
|               |           | 00     | decrease all voltage switching frequencies by two steps |
|               |           | 01     | decrease all voltage switching frequencies by one step  |
|               |           | 10     | default voltage switching points                        |
|               |           | 11     | increase all voltage switching frequencies by one step  |
| sw_counter    | [5:0]     | XXXXXX | voltage switching interval in number of clock cycles    |

Table 5.4: DVFS configuration registers
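The encodings in Table 5.4 can be illustrated with a small software model. The following is a minimal sketch only: the function and dictionary names are hypothetical, each register is treated as a separate integer (how the fields are packed in hardware is not assumed), and only the bit widths and value meanings are taken from the table.

```python
# Field meanings copied from Table 5.4. All names here are hypothetical and
# only illustrate how the documented encodings could be interpreted.
CLK_SOURCE = {
    0b00: "use DVFS clock",
    0b01: "use processor clock",
    0b10: "disable clock",
    0b11: "disable clock",  # "1x": the low bit is a don't-care
}

# clk_offset shifts all voltage switching frequencies by a number of steps.
CLK_OFFSET_STEPS = {0b00: -2, 0b01: -1, 0b10: 0, 0b11: +1}


def decode_config(clk_source: int, osc_conf: int,
                  clk_offset: int, sw_counter: int) -> dict:
    """Check each value against its bit width and return its documented meaning."""
    if not 0 <= clk_source < 4:
        raise ValueError("clk_source is 2 bits wide")
    if not 0 <= osc_conf < 32:
        raise ValueError("osc_conf is 5 bits wide")
    if not 0 <= clk_offset < 4:
        raise ValueError("clk_offset is 2 bits wide")
    if not 0 <= sw_counter < 64:
        raise ValueError("sw_counter is 6 bits wide")
    return {
        "clock": CLK_SOURCE[clk_source],
        "osc_setting": osc_conf,  # 0 = minimum, 31 = maximum oscillator frequency
        "switch_point_offset_steps": CLK_OFFSET_STEPS[clk_offset],
        "switch_interval_cycles": sw_counter,
    }
```

For example, `decode_config(0b00, 0b11111, 0b10, 0b001000)` describes a configuration that uses the DVFS clock at the maximum oscillator setting, with default switching points and a switching interval of eight clock cycles.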

### Chapter 6

## Simulation Results

#### 6.1 DVFS Controller Functionality

The DVFS controller design is implemented in Verilog, and its functionality is verified using the Icarus Verilog compiler. Figure 6.1 demonstrates how the output signals change in response to the specified input signals. The delays are set manually for convenient demonstration of the signals; in actual hardware implementations, they must be set to appropriate values to avoid timing conflicts. As shown in the figure, the power gates on the low $V_{dd}$ rail are enabled initially ($vdd\_in = 10$). The output signals to the power gates, $vdd\_out$, are first switched off ($vdd\_out = 11$) and then switched back on in the new voltage configuration.
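The off-then-on ordering described above can be sketched as a cycle-by-cycle software model. This is an illustration under stated assumptions, not the Verilog implementation: the function name is hypothetical, the code `01` for the other $V_{dd}$ rail is assumed (the text only gives `10` for the low rail and `11` for all gates off), and `off_cycles` is a hypothetical stand-in for the switching interval.

```python
from typing import Iterator

ALL_OFF = 0b11  # both power-gate controls off, as in the text (vdd_out = 11)


def vdd_switch_sequence(current: int, target: int, off_cycles: int) -> Iterator[int]:
    """Yield vdd_out cycle by cycle for a break-before-make rail change.

    current and target are 2-bit power-gate codes such as 0b10 (low Vdd rail on).
    off_cycles is a hypothetical number of cycles spent with all gates off.
    """
    if current == target:
        yield current  # nothing to switch
        return
    yield current  # old rail still connected
    for _ in range(off_cycles):
        yield ALL_OFF  # disconnect both rails first
    yield target  # then connect the new rail


# Example: low rail (10) to the other rail (assumed code 01), 3 idle cycles.
print([format(v, "02b") for v in vdd_switch_sequence(0b10, 0b01, 3)])
# prints ['10', '11', '11', '11', '01']
```

Passing through the all-off state means that the two rails are never connected to the load at the same time in this model.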



Figure 6.1: Simulated waveform of a  $V_{dd}$  switching process

The $V_{bb}$ rail functions similarly, as demonstrated in Figures 6.2–6.4. The outputs match the expected timing constraints illustrated in Figures 5.7 and 5.8.



Figure 6.2: Simulated waveform of switching between two non-zero  $V_{bb}$  values



Figure 6.3: Simulated waveform of switching to zero  $V_{bb}$ 



Figure 6.4: Simulated waveform of switching from zero  $V_{bb}$  to a non-zero value

Figure 6.5 demonstrates the output of the logic level converter in Figure 5.4.



Figure 6.5: Simulated waveform of logic level converter from 1 V to -1 V

#### 6.2 DVFS Effectiveness on 28 nm UTBB FD-SOI

The effectiveness of a DVFS mechanism can be estimated by comparing the energy consumed when executing the same task with and without DVFS. An Intel Core i7-8700K desktop processor is chosen as the reference for the comparisons in this chapter. The i7-8700K is a hexa-core processor with individually configurable core clock frequencies and DVFS. The base clock frequency of the processor is 3.7 GHz for all cores, but to make the results comparable to this work, the clock frequencies are manually capped at 4.0 GHz, the same maximum frequency as the proposed design. To emulate a non-DVFS processor, the processor is fixed at 4.0 GHz and 1.1 V. Multiple applications are tested on this processor, and the processor activity is captured using Intel VTune Profiler, a multi-platform performance tuning tool that can collect various system metrics. The processor metrics are sampled at a fixed 10 ms step to reduce the performance overhead introduced by data collection.

To demonstrate how implementing DVFS on 28 nm UTBB FD-SOI technology impacts processor energy efficiency, a circuit similar to the one shown in Figure 4.4 is simulated while the input is toggled at various frequencies. Voltages are set according to Table 5.2 and the energy consumption is recorded with a 10 ms sampling interval. A Python script scales the simulation data according to the recorded clock frequencies of the i7-8700K processor to calculate the estimated total energy consumed during an application.
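The scaling step can be sketched as follows. This is a minimal illustration and not the script used in this work: the function names are hypothetical, the simulated operating points are invented placeholder values in arbitrary units, and linear interpolation between operating points is an assumption. Only the 10 ms sampling interval comes from the text.

```python
from bisect import bisect_left
from typing import Sequence

SAMPLE_INTERVAL_S = 0.010  # 10 ms sampling step, as stated in the text

# Placeholder simulation results: (clock frequency in GHz, average power in
# arbitrary units). These numbers are invented for illustration only.
SIM_POINTS = [(1.0, 1.0), (2.0, 2.5), (3.0, 5.0), (4.0, 9.0)]


def power_at(freq_ghz: float) -> float:
    """Linearly interpolate the simulated power at a recorded frequency."""
    freqs = [f for f, _ in SIM_POINTS]
    powers = [p for _, p in SIM_POINTS]
    if freq_ghz <= freqs[0]:
        return powers[0]  # clamp below the lowest simulated point
    if freq_ghz >= freqs[-1]:
        return powers[-1]  # clamp above the highest simulated point
    i = bisect_left(freqs, freq_ghz)
    f0, f1 = freqs[i - 1], freqs[i]
    p0, p1 = powers[i - 1], powers[i]
    return p0 + (p1 - p0) * (freq_ghz - f0) / (f1 - f0)


def total_energy(freq_trace_ghz: Sequence[float]) -> float:
    """Sum power x sampling interval over every sample of the recorded trace."""
    return sum(power_at(f) * SAMPLE_INTERVAL_S for f in freq_trace_ghz)
```

Calling `total_energy` on a recorded frequency trace and on a trace held at the maximum frequency gives the two totals whose ratio is the relative energy consumption.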

Five typical everyday applications are selected to estimate the energy-saving potential of DVFS:

- Webpage rendering with a total of 21 webpages.
- 10 GiB file copying.
- Hardware-accelerated H.264 video playback.
- FLAC audio encoding.
- LZMA2 file compression.

The physical core frequencies and total power usage of the i7-8700K while rendering webpages are shown in Figure 6.6. Opening a webpage creates a short burst of workload that pulls the processor core frequencies up. The core clock frequencies are exported to the Python script to estimate the energy consumption of the same processor implemented with 28 nm UTBB FD-SOI technology.



Figure 6.6: Core clock frequencies of i7-8700K processor while opening a webpage

With DVFS enabled, Intel VTune Profiler reports a total energy consumption of 6.29 kJ for the webpage rendering test. The same test is then repeated with the supply voltage fixed at 1.1 V and the clock frequency fixed at 4 GHz, for which VTune reports 7.05 kJ. Thus, the DVFS mechanism on the i7-8700K processor results in a 10.8% reduction in energy consumption. The execution time is 135.9 s without DVFS and 141.0 s with DVFS, which corresponds to a 3.6% loss in throughput.
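The two percentages follow directly from the measured values. The short check below uses only the numbers reported in this paragraph; the variable and function names are illustrative, and throughput is assumed to be inversely proportional to execution time for a fixed task.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Fractional reduction of value relative to baseline."""
    return (baseline - value) / baseline


energy_no_dvfs_kj, energy_dvfs_kj = 7.05, 6.29
time_no_dvfs_s, time_dvfs_s = 135.9, 141.0

energy_saving = relative_reduction(energy_no_dvfs_kj, energy_dvfs_kj)
# For a fixed task, throughput is proportional to 1 / execution time.
throughput_loss = 1.0 - time_no_dvfs_s / time_dvfs_s

print(f"energy saving:   {energy_saving:.1%}")    # prints 10.8%
print(f"throughput loss: {throughput_loss:.1%}")  # prints 3.6%
```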

Throughput is measured by the execution time of each application. The relationship between core clock frequency, execution time, and average power consumption is shown in Table 6.1. The exception is H.264 video playback: the test video is always played at the required frame rate, so its execution time remains approximately the same.

| Core Clock Frequency (GHz) | Execution Time (s) | Average Power Usage (W) |
|----------------------------|--------------------|-------------------------|
| 4.8                        | 45.7               | 93.2                    |
| 4.0                        | 55.3               | 56.3                    |
| 3.0                        | 74.8               | 29.6                    |
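
As a rough reading of the table, the energy per run can be estimated as average power multiplied by execution time. The snippet below is a sketch that uses only the three rows above; treating the reported average power as constant over the whole run is an assumption.

```python
# (core clock in GHz, execution time in s, average power in W) from the table
ROWS = [(4.8, 45.7, 93.2), (4.0, 55.3, 56.3), (3.0, 74.8, 29.6)]

for freq_ghz, time_s, power_w in ROWS:
    energy_j = time_s * power_w  # E = P_avg * t
    print(f"{freq_ghz:.1f} GHz: {energy_j / 1000:.2f} kJ")
# prints:
# 4.8 GHz: 4.26 kJ
# 4.0 GHz: 3.11 kJ
# 3.0 GHz: 2.21 kJ
```

Under this estimate, dropping from 4.8 GHz to 3.0 GHz lengthens the run by roughly 64% while cutting the energy per run by roughly 48%.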



Other applications are also tested, and the energy consumption data are shown in Figure 6.7 and Figure 6.8. The results are average values over five runs. On the Intel i7-8700K, applying DVFS results in an average energy saving of 15.7% across four applications. The calculated results show that applying DVFS to 28 nm UTBB FD-SOI technology reduces energy consumption by 20.5% on average.

On intensive tasks such as LZMA2 compression, a processor core can rarely drop its frequency without significantly lowering the throughput. The benefit of DVFS in these applications is minimal, with less than a 1% reduction in energy consumption observed in the tests. On the other hand, H.264 decoding is mostly done by hardware video accelerators in modern graphics processing units (GPUs) [49], so the workload on the CPU is very low. In the tests, the minimum processor core frequency recorded by VTune is as low as 1.0 GHz. Processors can achieve large energy savings from DVFS in such circumstances: the lowered clock frequency and core supply voltage contribute to a 46.4% reduction in core energy consumption on the tested i7-8700K processor.

With 28 nm UTBB FD-SOI technology, the supply voltage of a processor is adjustable over a wide range. According to Spectre simulation results, the processor can run at one-third of its maximum allowed clock frequency at 0.6 V without inducing errors. The relatively low supply voltage makes more significant energy savings possible. The same H.264 video playback task demonstrates the energy-efficiency potential of the technology, as only 43.7% of the energy is needed compared to no-DVFS runs.




Figure 6.7: Relative energy consumption of i7-8700K processor with DVFS



Figure 6.8: Relative energy consumption of equivalent 28 nm UTBB FD-SOI processor with DVFS

Figure 6.9 compares the Energy  $\times$  Time Product when executing the test applications (H.264 video playback is a real-time application and is excluded from this comparison). Numbers are normalized to 4 GHz without DVFS. Energy  $\times$  Time Product covers both energy consumption and throughput of a processor.
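As a worked illustration, assume the product is formed from the total energy $E$ and the execution time $T$ of a run, and take the webpage rendering measurements on the i7-8700K reported earlier in this section (6.29 kJ and 141.0 s with DVFS; 7.05 kJ and 135.9 s at a fixed 4 GHz). The symbols below are introduced here for illustration only:

$$
\mathrm{ETP}_{\mathrm{norm}}
  = \frac{E_{\mathrm{DVFS}} \, T_{\mathrm{DVFS}}}{E_{\mathrm{4\,GHz}} \, T_{\mathrm{4\,GHz}}}
  = \frac{6.29\ \mathrm{kJ} \times 141.0\ \mathrm{s}}{7.05\ \mathrm{kJ} \times 135.9\ \mathrm{s}}
  \approx 0.93
$$

A value below 1 means that the energy saved outweighs the added execution time. This is a single-application value and is not one of the averages reported in this chapter.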




Figure 6.9: Normalized energy time product of processors with DVFS

The execution times of the four applications imply a performance overhead as high as 3.7% caused by the i7-8700K's DVFS mechanism. On average, 28 nm UTBB FD-SOI can reduce the Energy × Time Product by 8.8% compared to a no-DVFS design. In comparison, the i7-8700K has an average reduction of 5.2% in normalized Energy × Time Product.

### Chapter 7

# Conclusion

#### 7.1 Summary

This thesis introduces back biasing into the design of a DVFS circuit to provide further flexibility in performance and power tuning. The controller decides the optimal clock frequency based on workload information, and dictates the selection of $V_{dd}$ and $V_{bb}$. BB adjustment has a substantial effect on both transistor switching speed and leakage power, enabling finer DVFS adjustments. Traditional approaches would require a high-resolution voltage supply scheme to optimize the power consumption at different clock frequencies, while this thesis presents a simpler design with several fixed voltages to achieve a similar effect. Adding BB support is possible with a small power and area overhead, which makes it a suitable solution for many low-power applications, such as wearable devices and emergency equipment.

#### 7.2 Future Work

Due to time and resource limitations, results in this thesis are scaled to 28 nm UTBB FD-SOI to avoid the complex physical design and verification process. Further design and tests on actual silicon would help evaluate the power-saving effectiveness of this design. This project paves the way towards many further research opportunities. Some possible improvements to the design that have yet to be made are:

- The thesis demonstrates the design of a DVFS controller based on a 28 nm technology. The effect of BB scaling in more advanced processes remains unknown.
- The rapid development of machine learning opens up possibilities to design smart DVFS controllers that observe workload patterns and make predictive decisions.
- A global DVFS supervisor with workload balancing on a multi-core processor can take power management into consideration when assigning tasks to its cores to reduce the chip's overall power consumption.
- Introducing a global voltage scaling controller that varies the voltages on the $V_{dd}$ rail can further improve the effectiveness of DVFS [11]. A global $V_{bb}$ selector that controls the $V_{bb}$ input voltages may lead to more flexibility in fine-tuning the performance and energy efficiency of multi-core processors.

## Glossary

- **BB** Back Biasing. Biasing voltage applied to the "back" side of FD-SOI transistors.
- **BOX** Buried OXide. The insulator layer of SOI substrates.
- **DVFS** Dynamic Voltage and Frequency Scaling. Runtime adjustments of clock frequency and supply voltage.
- **EDP** Energy-Delay Product. The product of energy spent on a switching event and its duration. Serves as a metric of trade-off between speed and efficiency.
- **FBB** Forward BB. Positive back biasing voltage that improves transistor switching speed.
- **FD-SOI** Fully Depleted SOI. Transistors with thin non-doped channels.
- **FIVR** Fully-Integrated Voltage Regulators. Voltage regulators that are implemented on-chip to improve efficiency and voltage transition speed.
- **GND** Ground. The 0 V voltage reference point that the circuit is connected to.
- **GP** Ground Plane. The doped substrate under the BOX layer of a transistor.
- **IC** Integrated Circuit. Circuit built on a piece of semiconductor material.
- **LUT** Lookup Table. An indexed array used to replace expensive computation.
- **LVT** Low Threshold Voltage. A low threshold voltage variant of transistors and standard cells that provides maximum performance.

- **MDL** Measurement Description Language. A scripting language developed by Cadence aimed at improving circuit simulation productivity.
- **MOS** Metal-Oxide-Semiconductor. The most common IC transistor structure.
- **PB** Poly Bias. An alternative name for the gate length options offered by UTBB FD-SOI technology.
- **RBB** Reverse BB. Negative back biasing voltage that sacrifices speed for lower power consumption.
- **RVT** Regular Threshold Voltage. A relatively high threshold voltage variant of transistors and standard cells that provides better efficiency.
- **SCE** Short Channel Effects. A combination of physical effects that negatively impact transistor characteristics.
- **SI** Système International. The International System of Units which is widely used in scientific and industrial environments.
- **SoC** System-on-Chip. IC with multiple function units built on the same chip.
- **SOI** Silicon-on-Insulator. Silicon substrate with an insulator layer under the surface to improve electrostatic characteristics.
- **UTBB** Ultra-Thin Body and BOX. A process technology from STMicroelectronics that combines thin transistor channel and BOX.

# Bibliography

- [1] Philippe Magarshack, Philippe Flatresse, and Giorgio Cesana. UTBB FD-SOI: A process/design symbiosis for breakthrough energy-efficiency. In 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 952–957, 2013.
- [2] F. Arnaud, N. Planes, O. Weber, V. Barral, S. Haendler, P. Flatresse, and F. Nyer. Switching energy efficiency optimization for advanced CPU thanks to UTBB technology. In 2012 International Electron Devices Meeting, pages 3.2.1–3.2.4, 2012.
- [3] Bertrand Pelloux-Prayer, Milovan Blagojević, Olivier Thomas, Amara Amara, Andrei Vladimirescu, Borivoje Nikolić, Giorgio Cesana, and Philippe Flatresse. Planar fully depleted SOI technology: The convergence of high performance and low power towards multimedia mobile applications. In 2012 IEEE Faible Tension Faible Consommation, pages 1–4, 2012.
- [4] C Fenouillet-Beranger, O Thomas, P Perreau, J.-P Noel, A Bajolet, S Haendler, L Tosti, S Barnola, R Beneyton, C Perrot, C de Buttet, F Abbate, F Baron, B Pernet, Y Campidelli, L Pinzelli, P Gouraud, M Cassé, C Borowiak, O Weber, F Andrieu, S Denorme, F Boeuf, O Faynot, T Skotnicki, K K Bourdelle, B Y Nguyen, and F Boedt. Efficient multi-VT FDSOI technology with UTBOX for low power circuit design. In 2010 Symposium on VLSI Technology, pages 65–66, 2010.
- [5] Ramiro Taco, Itamar Levi, Alex Fish, and Marco Lanuzza. Exploring back biasing opportunities in 28nm UTBB FD-SOI technology for subthreshold digital design. In 2014 IEEE 28th Convention of Electrical & Electronics Engineers in Israel (IEEEI), pages 1–4, 2014.

- [6] Cadence Design Systems. Spectre Circuit Simulator Measurement Description Language User Guide and Reference, 2018.
- [7] Bin Liu, Brent Bohnenstiehl, and Bevan M. Baas. Scalable hardware-based power management for many-core systems. In 2014 48th Asilomar Conference on Signals, Systems and Computers, pages 1834–1838, 2014.
- [8] Bevan Baas, Brent Bohnenstiehl, and Jin Cui. Exploration of fine-grain body bias control in many-core processor arrays. In 2018 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), pages 1–3, 2018.
- [9] M. Horowitz, E. Alon, D. Patil, S. Naffziger, Rajesh Kumar, and K. Bernstein. Scaling, power, and the future of CMOS. In *IEEE International Electron Devices Meeting*, 2005. *IEDM Technical Digest.*, pages 7 pp.–15, 2005.
- [10] Qing Wu, M. Pedram, and Xunwei Wu. Clock-gating and its application to low power design of sequential circuits. *IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications*, 47(3):415–420, 2000.
- [11] Bin Liu. Energy-Efficient Computing with Fine-Grained Many-Core Systems. PhD thesis, University of California, Davis, Davis, CA, USA, September 2016. http://vcl.ece.ucdavis.edu/pubs/theses/2016-1/.
- [12] Marcus T Schmitz and Bashir M Al-Hashimi. Energy minimisation for processor cores using variable supply voltages. 2000.
- [13] T. Chen and S. Naffziger. Comparison of adaptive body bias (ABB) and adaptive supply voltage (ASV) for improving delay and leakage under the presence of process variation. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 11(5):888–899, 2003.
- [14] Yoni Aizik and Avinoam Kolodny. Exploration of energy-delay tradeoffs in digital circuit design. In 2008 IEEE 25th Convention of Electrical and Electronics Engineers in Israel, pages 001–005, 2008.

- [15] Shekhar Borkar. Low power design challenges for the decade. Proceedings of the ASP-DAC 2001. Asia and South Pacific Design Automation Conference 2001 (Cat. No.01EX455), pages 293–296, 2001.
- [16] Jean-Philippe Noel, Olivier Thomas, Marie-Anne Jaud, Olivier Weber, Thierry Poiroux, Claire Fenouillet-Beranger, Pierrette Rivallin, Pascal Scheiblin, François Andrieu, Maud Vinet, Olivier Rozeau, Frédéric Boeuf, Olivier Faynot, and Amara Amara. Multi-$V_T$ UTBB FDSOI device architectures for low-power CMOS circuit. *IEEE Transactions on Electron Devices*, 58(8):2473–2482, 2011.
- [17] George Sery, Shekhar Borkar, and Vivek De. Life is CMOS: why chase the life after? In Proceedings of the 39th Annual Design Automation Conference, pages 78–83, 2002.
- [18] Milind Gautam and Shyam Akashe. Transistor gating: Reduction of leakage current and power in full subtractor circuit. In 2013 3rd IEEE International Advance Computing Conference (IACC), pages 1514–1518, 2013.
- [19] R.W. Brodersen, A. Chandrakasan, and S. Sheng. Design techniques for portable systems. In 1993 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pages 168–169, 1993.
- [20] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada. 1-V power supply high-speed digital circuit technology with multithreshold-voltage CMOS. *IEEE Journal of Solid-State Circuits*, 30(8):847–854, 1995.
- [21] R.H. Dennard, F.H. Gaensslen, Hwa-Nien Yu, V.L. Rideout, E. Bassous, and A.R. LeBlanc. Design of ion-implanted MOSFET's with very small physical dimensions. *IEEE Journal of Solid-State Circuits*, 9(5):256–268, 1974.
- [22] Aaron Stillmaker, Zhibin Xiao, and Bevan Baas. Toward more accurate scaling estimates of CMOS circuits from 180 nm to 22 nm. Technical Report ECE-VCL-2011-4, VLSI Computation Lab, ECE Department, University of California, Davis, December 2011. http://www.ece.ucdavis.edu/cerl/techreports/2011-4/.

- [23] A. Stillmaker and B. Baas. Scaling equations for the accurate prediction of CMOS device performance from 180 nm to 7 nm. Integration, the VLSI Journal, 58:74–81, 2017. http://vcl.ece.ucdavis.edu/pubs/2017.02.VLSIintegration.TechScale/.
- [24] A. Chin and S.R. McAlister. The power of functional scaling: beyond the power consumption challenge and the scaling roadmap. *IEEE Circuits and Devices Magazine*, 21(1):27–35, 2005.
- [25] J.A. Hutchby, G.I. Bourianoff, V.V. Zhirnov, and J.E. Brewer. Extending the road beyond CMOS. *IEEE Circuits and Devices Magazine*, 18(2):28–41, 2002.
- [26] Anterpreet Gill, Charu Madhu, and Pardeep Kaur. Investigation of short channel effects in Bulk MOSFET and SOI FinFET at 20nm node technology. In 2015 Annual IEEE India Conference (INDICON), pages 1–4, 2015.
- [27] N. Planes, O. Weber, V. Barral, S. Haendler, D. Noblet, D. Croain, M. Bocat, P.-O. Sassoulas, X. Federspiel, A. Cros, A. Bajolet, E. Richard, B. Dumont, P. Perreau, D. Petit, D. Golanski, C. Fenouillet-Béranger, N. Guillot, M. Rafik, V. Huard, S. Puget, X. Montagner, M.-A. Jaud, O. Rozeau, O. Saxod, F. Wacquant, F. Monsieur, D. Barge, L. Pinzelli, M. Mellier, F. Boeuf, F. Arnaud, and M. Haond. 28nm FDSOI technology platform for high-speed low-voltage digital applications. In 2012 Symposium on VLSI Technology (VLSIT), pages 133–134, 2012.
- [28] Bertrand Pelloux-Prayer, Milovan Blagojević, Sebastien Haendler, Alexandre Valentian, Amara Amara, and Philippe Flatresse. Performance analysis of multi-VT design solutions in 28nm UTBB FD-SOI technology. In 2013 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), pages 1–2, 2013.
- [29] Davide Rossi, Antonio Pullini, Igor Loi, Michael Gautschi, Frank Kagan Gurkaynak, Adam Teman, Jeremy Constantin, Andreas Burg, Ivan Miro-Panades, Edith Beigné, Fabien Clermidy, Fady Abouzeid, Philippe Flatresse, and Luca Benini. 193 MOPS/mW @ 162 MOPS, 0.32V to 1.15V voltage range multi-core accelerator for energy efficient parallel and sequential digital processing. In 2016 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS XIX), pages 1–3, 2016.

- [30] A. Chaudhry and M.J. Kumar. Controlling short-channel effects in deep-submicron SOI MOSFETs for improved reliability: a review. *IEEE Transactions on Device and Materials Reliability*, 4(1):99–109, 2004.
- [31] Ali Mohsen, Adnan Harb, Nathalie Deltimple, and Abraham Serhane. 28-nm UTBB FD-SOI vs. 22-nm Tri-Gate FinFET review: A designer guide—part I. volume 08, page 93–110. Scientific Research Publishing, Inc., 2017.
- [32] X. Federspiel, D. Angot, M. Rafik, F. Cacho, A. Bajolet, N. Planes, D. Roy, M. Haond, and F. Arnaud. 28nm node bulk vs FDSOI reliability comparison. In 2012 IEEE International Reliability Physics Symposium (IRPS), pages 3B.1.1–3B.1.4, 2012.
- [33] T.D. Burd, T.A. Pering, A.J. Stratakos, and R.W. Brodersen. A dynamic voltage scaled microprocessor system. *IEEE Journal of Solid-State Circuits*, 35(11):1571–1580, 2000.
- [34] L.S. Nielsen, C. Niessen, J. Sparso, and K. van Berkel. Low-power operation using self-timed circuits and adaptive scaling of the supply voltage. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 2(4):391–397, 1994.
- [35] Bin Liu, B. Bohnenstiehl, and Bevan M. Baas. Scalable hardware-based power management for many-core systems. In *IEEE Asilomar Conference on Signals, Systems and Computers* (ACSSC), Nov. 2014.
- [36] Bin Liu, Mohammad H. Foroozannejad, Soheil Ghiasi, and Bevan M. Baas. Optimizing power of many-core systems by exploiting dynamic voltage, frequency and core scaling. In *IEEE International Midwest Symposium on Circuits and Systems (MWSCAS)*, Aug. 2015.
- [37] Noah Sturcken, Michele Petracca, Steven Warren, Paolo Mantovani, Luca P. Carloni, Angel V. Peterchev, and Kenneth L. Shepard. A switched-inductor integrated voltage regulator with nonlinear feedback and network-on-chip load in 45 nm SOI. *IEEE Journal of Solid-State Circuits*, 47(8):1935–1945, 2012.
- [38] Per Hammarlund, Alberto J. Martinez, Atiq A. Bajwa, David L. Hill, Erik Hallnor, Hong Jiang, Martin Dixon, Michael Derr, Mikal Hunsaker, Rajesh Kumar, Randy B. Osborne, Ravi Rajwar, Ronak Singhal, Reynold D'Sa, Robert Chappell, Shiv Kaushik, Srinivas Chennupaty, Stephan Jourdan, Steve Gunther, Tom Piazza, and Ted Burton. Haswell: The fourth-generation Intel Core processor. *IEEE Micro*, 34(2):6–20, 2014.

- [39] Waclaw Godycki, Christopher Torng, Ivan Bukreyev, Alyssa Apsel, and Christopher Batten. Enabling realistic fine-grain voltage scaling with reconfigurable power distribution networks. In 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pages 381–393, 2014.
- [40] Xin Zhan, Jianhao Chen, Edgar Sánchez-Sinencio, and Peng Li. Power management for multicore processors via heterogeneous voltage regulation and machine learning enabled adaptation. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 27(11):2641–2654, 2019.
- [41] Bilge Acun, Kavitha Chandrasekar, and Laxmikant V. Kale. Fine-grained energy efficiency using per-core DVFS with an adaptive runtime system. In 2019 Tenth International Green and Sustainable Computing Conference (IGSC), pages 1–8, 2019.
- [42] Edward A. Burton, Gerhard Schrom, Fabrice Paillet, Jonathan Douglas, William J. Lambert, Kaladhar Radhakrishnan, and Michael J. Hill. FIVR — fully integrated voltage regulators on 4th generation Intel Core SoCs. In 2014 IEEE Applied Power Electronics Conference and Exposition - APEC 2014, pages 432–439, 2014.
- [43] Brent Bohnenstiehl, Aaron Stillmaker, Jon Pimentel, Timothy Andreas, Bin Liu, Anh Tran, Emmanuel Adeagbo, and Bevan Baas. A 5.8 pJ/Op 115 billion Ops/sec, to 1.78 trillion Ops/sec 32 nm 1000-processor array. In 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits), pages 1–2, 2016.
- [44] Wayne H. Cheng and Bevan M. Baas. Dynamic voltage and frequency scaling circuits with two supply voltages. In *IEEE International Symposium on Circuits and Systems (ISCAS)*, pages 1236–1239, May 2008.
- [45] Dean N. Truong, Wayne H. Cheng, Tinoosh Mohsenin, Zhiyi Yu, Anthony T. Jacobson, Gouri Landge, Michael J. Meeuwsen, Christine Watnik, Anh T. Tran, Zhibin Xiao, Eric W. Work, Jeremy W. Webb, Paul V. Mejia, and Bevan M. Baas. A 167-processor computational platform in 65 nm CMOS. *IEEE Journal of Solid-State Circuits*, 44(4):1130–1144, 2009.

- [46] D. Truong, W. Cheng, T. Mohsenin, Zhiyi Yu, T. Jacobson, G. Landge, M. Meeuwsen, C. Watnik, P. Mejia, Anh Tran, J. Webb, E. Work, Zhibin Xiao, and B. Baas. A 167-processor 65 nm computational platform with per-processor dynamic supply voltage and dynamic clock frequency scaling. In VLSI Circuits, 2008 IEEE Symposium on, June 2008.
- [47] Christi Tain. Back-end physical design flow for 28 nm FDSOI with body-bias. Master's thesis, University of California, Davis, CA, USA, September 2019. http://vcl.ece.ucdavis.edu/pubs/theses/2019-2.ctain/.
- [48] Dohyeon Lee, Heecheol Hwang, Hyunteck Oh, and Yongchan James Ban. Mitigating IR-drop with design technology co-optimization for sub-nanometer node technology. In 2021 18th International SoC Design Conference (ISOCC), pages 1–2, 2021.
- [49] Huifang Deng, Chunhui Deng, and Jingjing Li. GPU-based real-time decoding technique for high-definition videos. In 2012 Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pages 186–190, 2012.