## **UC San Diego Electronic Theses and Dissertations**

### **Title**

Energy-Efficient Circuits for IoT Systems

**Permalink** <https://escholarship.org/uc/item/4hn019wv>

**Author** Wang, Xiaoyang

**Publication Date** 2021

Peer reviewed|Thesis/dissertation

### UNIVERSITY OF CALIFORNIA SAN DIEGO

### Energy-Efficient Circuits for IoT Systems

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy

in

Electrical Engineering (Electronic Circuits and Systems)

by

Xiaoyang Wang

Committee in charge:

Professor Patrick P. Mercier, Chair

Professor Gert Cauwenberghs

Professor Drew A. Hall

Professor Tzu-Chien Hsueh

Professor Dean M. Tullsen

Copyright Xiaoyang Wang, 2021. All rights reserved.

<span id="page-3-0"></span>The dissertation of Xiaoyang Wang is approved, and it is acceptable in quality and form for publication on microfilm and electronically.

University of California San Diego

2021

### DEDICATION

<span id="page-4-0"></span>To my parents and sister.



<span id="page-5-0"></span>




#### ACKNOWLEDGEMENTS

<span id="page-13-0"></span>When I look back on a past experience, there is always some background color in the memory, and the memory of life in San Diego has a bright background color. At first I thought it was because of the nice weather and sunshine, but then I realized that it was because of the people I met there. Over the past years at UCSD, I have received a lot of help and encouragement from many people. I would like to give my greatest thanks to my Ph.D. advisor, Professor Patrick Mercier, for giving me the opportunity to do research on state-of-the-art topics and for his guidance through the entire journey. Professor Mercier is very smart, knowledgeable and supportive, and always does his utmost to support his students. It is with his invaluable guidance and excellent insight that a design can evolve from some initial ideas into a practical and good chip. I learned a great deal, not only technical knowledge but also how to think about and solve a problem, which I think is the most valuable thing I have gained from my Ph.D. study. I would also like to thank all the professors on my committee: Professor Gert Cauwenberghs, Professor Drew Hall, Professor Tzu-Chien Hsueh and Professor Dean Tullsen, for taking their precious time to serve as my committee members and for their valuable input and feedback.

This journey also could not have been finished without the help of my colleagues in the EEMS group. I really enjoyed and learned a lot from the discussions with you. I would like to thank Hui Wang, Dhon Lee, Cooper Levy, Jiwoong Park, Somayeh Imani, Julian Warchall, Po-Han Peter Wang, Bao Lam, Jiannan Jason Huang, Abdullah Abdulslam, Hossein Rahmanian Kooshkaki and Nader Fathy for their help and support.

The Ph.D. life would never have been so colorful without the friends who shared the joy and sorrow with me. I would like to thank Changtian Wang, Yaocheng Wang, Da Yin, Xiahan Zhou, Haowei Jiang, Jiapeng Zhang, Yincong Li and Jiayi Dong. It is so nice to have had you during this journey, and I think I don't need to say much since you already know how much this friendship means to us; it is our friendship that makes San Diego a unique place for me.

Last but definitely not least, I would like to thank my family: my mother Huifeng Xiao, my father Chengye Wang and my sister Xiaoping Wang. You have always supported my decisions, from when I went to a university in a different city to when I went to a university in a different country and continent. To my mom, the stories that you told me when I was a child helped shape who I am and how I think about life and the world, and I think that is the seed that finally helped me go to a different country to see a bigger world.

The material in this dissertation is based on the following papers, which are either published or in preparation for publication.

Chapter 2 is based on and is mostly a reprint of materials from Xiaoyang Wang and Patrick P. Mercier, "A Charge-Pump-based Digital LDO Employing an AC-Coupled High-Z Feedback Loop Towards a sub-4fs FoM and a 105,000x Stable Dynamic Current Range," in IEEE Custom Integrated Circuits Conference, Apr. 2019, and Xiaoyang Wang and Patrick P. Mercier, "A Dynamically High-Impedance Charge-Pump-Based LDO With Digital-LDO-Like Properties Achieving a Sub-4-fs FoM," in IEEE Journal of Solid-State Circuits, Mar. 2020. The dissertation author was the primary investigator and author of these papers.

Chapter 3, in part, is based on materials from Xiaoyang Wang and Patrick P. Mercier, "An 11.1nJ-Start-up 16/20MHz Crystal Oscillator with Multi-Path Feedforward Negative Resistance Boosting and Optional Dynamic Pulse Width Injection," in IEEE Custom Integrated Circuits Conference, Mar. 2020. The dissertation author was the primary investigator and author of this paper.

Chapter 4 is based on and is mostly a reprint of the materials from "A 5.5nW Battery-Powered Wireless Ion Sensing System," in Proc. IEEE European Solid-State Circuits Conference (ESSCIRC), Sep. 2017, by Hui Wang, Xiaoyang Wang, Jiwoong Park, Abbas Barfidokht, Joseph Wang, and Patrick Mercier, and "A Battery-Powered Wireless Ion Sensing System Consuming 5.5 nW of Average Power," IEEE Journal of Solid-State Circuits (JSSC), Apr. 2018, by Hui Wang, Xiaoyang Wang, Abbas Barfidokht, Jiwoong Park, Joseph Wang, and Patrick Mercier. The dissertation author designed the analog front-end, wireless transmitter and DC-DC converter in the system and was the primary investigator and author of these papers. The co-authors have approved the use of the material for this dissertation.

Xiaoyang Wang

Santa Clara, CA

June 2021


#### <span id="page-17-0"></span>ABSTRACT OF THE DISSERTATION

#### Energy-Efficient Circuits for IoT Systems

by

Xiaoyang Wang

Doctor of Philosophy in Electrical Engineering (Electronic Circuits and Systems)

University of California San Diego, 2021

Professor Patrick P. Mercier, Chair

The Internet of Things (IoT) has greatly improved our understanding and control of the world. By deploying sensors in the environment, different types of environmental parameters can be sensed, and with proper processing of the data, for example with artificial intelligence, we can observe and control systems both at the top level and in detail. However, IoT applications also pose many challenges for integrated circuit design, especially in power efficiency, speed and size. Solving these problems is the key to turning ideas into real, practical designs and to improving the user experience.

In this dissertation, the circuit and system design of IoT sensors is presented and discussed. The power management unit (PMU) is an important block in an IoT sensing system and determines the maximum performance of the circuit. For the many applications in which the energy source is limited and the battery is hard to replace, the efficiency of the PMU also significantly affects the system lifetime. A charge-pump-based LDO with fast response time, high power efficiency and high dynamic range is proposed to address the trade-off between speed, power consumption and stability. The event-driven mechanism and AC-coupled high-Z feedback loop enable fast detection and response with low power. The charge-pump architecture with a single power transistor helps the LDO achieve a high dynamic range and low ripple over the entire load range. In addition to the power management unit, the clock generation circuit also significantly affects the system performance and power efficiency. Several techniques are proposed to achieve an energy-efficient fast start-up of crystal oscillators (XOs). With the multi-path feedforward negative resistance boosting and dynamic pulse-width injection techniques, the start-up time of XOs at different frequencies can be greatly reduced without a precise injection oscillator. Last, a battery-powered ion-sensing platform is presented and discussed.

## <span id="page-19-0"></span>Chapter 1

## Introduction

## <span id="page-19-1"></span>1.1 Motivation for IoT

Information and silicon technology have significantly changed human life and will continue to fuel the next evolution. The three industrial revolutions have greatly improved people's living standards and changed society. In the first industrial revolution, water and steam power were used to mechanize production. In the second, electric power was used to create mass production, and the third used electronics and information technology to automate production. Although there is not yet a consensus on which technologies will bring about the next technological revolution, artificial intelligence (AI) and the Internet of Things (IoT) are considered quite promising [\[1\]](#page-118-1). As one of the most important approaches to AI, machine learning needs large amounts of data to train accurate models, and IoT sensor platforms are a very good source of such data. IoT sensor nodes deployed at different places can provide data to help train the AI and, in turn, the AI can process these data and dynamically control the IoT devices according to the environment to achieve system-level optimization.

One example of an IoT and AI application is a smart watering system. In the US, more than half of outdoor water is used for watering lawns and gardens. Nationwide, landscape irrigation is estimated to account for nearly one-third of all residential water use, totaling nearly 9 billion gallons per day, and as much as 50 percent of the water used for irrigation is wasted due to inefficient irrigation methods and systems [\[2\]](#page-118-2). In conventional automatic irrigation, the lawn is watered on a fixed schedule regardless of the weather, season and other environmental changes, which results in inefficient watering and wasted water. With an IoT sensing system, environmental parameters such as temperature, light and humidity are sensed, and the AI then determines how much water is needed based on past and current conditions and can even predict future needs, which improves the efficiency of the irrigation system.

<span id="page-20-0"></span>

Figure 1.1: IoT applications of smart home and wearable devices.

Other examples of IoT applications include wearable health monitoring systems, smart homes and city traffic regulation. With increasing healthcare costs and a limited supply of physicians, changes must be made to the healthcare system. Compared to costly and not easily accessible medical devices, consumer electronics are ubiquitous and inexpensive. One compelling solution is to alleviate some of the burden on the healthcare system by equipping the general population with tools to track and monitor their health. Using wearable devices together with a wireless network, health conditions can be monitored in real time and the data sent to health centers and physicians. Professional advice can then be given, and people can learn about their health conditions and act accordingly before a condition becomes serious. In a smart home system, housing environment parameters such as temperature, light and humidity are monitored in real time and can be accessed and controlled remotely, as shown in Fig. [1.1](#page-20-0) [\[3\]](#page-118-3). By evaluating the weather and recent human activities, the AI manager can dynamically adjust the air conditioning to maintain a constant room temperature while saving power. The information can also be used to determine how often the grass should be watered and when the cleaning robot should be activated.

An IoT sensing network forms the digital nervous system of the world: the data from different sensors at different locations give us a high-level picture. With the real-time data it provides and the help of AI, feedback control can be faster and more accurate, and system-level optimization can be performed.

### <span id="page-21-0"></span>1.2 General IoT Sensing Node Architecture

Fig. [1.2](#page-22-0) shows a general architecture of an IoT sensing node. It consists of an analog front-end (AFE), an ADC, a digital signal processing (DSP) unit, a transmitter (TX), power management circuits and a clock generation block. Environmental signals (temperature, humidity, light, etc.), electrical signals (ECG, EEG, ECoG, etc.) and chemical signals (glucose, ion concentration, etc.) are first sensed by the sensor and sent to the AFE. The AFE has two functions. First, it acts as a buffer that provides a high input impedance and a low output impedance. Some electrodes can have an impedance as high as several MΩ; if such an electrode is connected to an AFE or ADC with a low input impedance, the signal amplitude seen by that stage will be very small. The gates of the input transistors of the AFE usually have a very high impedance and therefore present a light load to the electrode and extract most of the signal. Second, the AFE provides gain so that the sensed signal is amplified to the full quantization range of the ADC in the next stage; this also reduces the input-referred noise and improves the overall SNR. After being amplified by the AFE, the analog signal is quantized by a low-power ADC. SAR ADCs [\[4\]](#page-118-4) [\[5\]](#page-118-5) are usually used in IoT sensing nodes due to their low power consumption. VCO-based and $\Delta\Sigma$ ADCs are also promising choices due to their ability to eliminate the need for an AFE [\[6\]](#page-118-6) [\[7\]](#page-118-7). The quantized digital signal is then processed by a DSP unit and sent by a transmitter to nearby local base stations, where energy is more abundant, for more complicated processing or storage.
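The loading effect described above can be illustrated with a simple resistive voltage divider. The sketch below is only an illustration: the function name and the impedance values are assumptions chosen for this example and are not taken from the designs discussed in this dissertation.

```python
def sensed_fraction(z_electrode_ohm: float, z_input_ohm: float) -> float:
    """Return the fraction of the source voltage that reaches the AFE input.

    The electrode and the AFE input are modeled as a purely resistive
    voltage divider, which is a simplification of a real electrode interface.
    """
    return z_input_ohm / (z_electrode_ohm + z_input_ohm)


# Hypothetical values chosen only for illustration.
Z_ELECTRODE = 5e6  # electrode impedance of a few megaohms
low_z = sensed_fraction(Z_ELECTRODE, 100e3)  # low-impedance input stage
high_z = sensed_fraction(Z_ELECTRODE, 1e9)   # gate-like, high-impedance input

print(f"100 kOhm input sees {low_z:.1%} of the signal")
print(f"1 GOhm input sees {high_z:.1%} of the signal")
```

With these assumed values, the low-impedance input sees only a few percent of the source signal, while the high-impedance input sees nearly all of it.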

<span id="page-22-0"></span>

Figure 1.2: General architecture of an IoT sensor node.

Besides the above signal path, a power management unit and a clock generation unit are also needed in the sensor node to keep the system working properly. Since wireless sensor nodes are often powered by a battery or an energy-harvesting cell [\[8\]](#page-118-8), the supply voltage may not be stable, and regulators or proper timing control circuits are needed. For example, the battery output voltage droops after the battery has been in use for some time; in this case a low-dropout regulator (LDO) is needed to keep the supply voltage of the circuit stable. For the biofuel cell in [\[8\]](#page-118-8), due to its large internal resistance and limited power density, a large output voltage drop occurs when the TX is turned on and draws a large current. Therefore, proper timing control is essential to ensure that the system works properly.
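The supply droop caused by a large internal source resistance can be estimated with a first-order model in which the terminal voltage equals the open-circuit voltage minus the resistive drop. The sketch below assumes this simple model; the voltage, resistance and current values are hypothetical and do not describe the cell in [8].

```python
def terminal_voltage(v_open_circuit: float, r_internal_ohm: float, i_load_a: float) -> float:
    """Terminal voltage of a source modeled as an ideal voltage plus a series resistor."""
    return v_open_circuit - i_load_a * r_internal_ohm


# Hypothetical harvesting cell: 0.5 V open circuit, 1 kOhm internal resistance.
V_OC = 0.5
R_INT = 1e3

v_sleep = terminal_voltage(V_OC, R_INT, 1e-6)    # 1 uA sleep current
v_tx_on = terminal_voltage(V_OC, R_INT, 200e-6)  # 200 uA while the TX is active

print(f"Sleep:  {v_sleep:.3f} V")
print(f"TX on:  {v_tx_on:.3f} V")
```

Under these assumptions the supply barely moves in sleep mode but loses a large fraction of its voltage when the TX is on, which is why the timing of high-current events matters.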

For the clock generation unit, a ring oscillator, relaxation oscillator or crystal oscillator (XO) can be used depending on the application [\[9\]](#page-118-9) [\[10\]](#page-119-0). The ring oscillator has the advantages of a simple architecture, the ability to work at a low supply voltage, process portability, a wide tuning range and multi-phase outputs. However, its frequency is sensitive to PVT variation and its phase noise performance is poor. With temperature compensation techniques, such as resistors with opposite temperature coefficients and temperature-compensated current or voltage sources [\[11\]](#page-119-1), the frequency can be made more stable, but it may still not satisfy applications with stringent requirements, such as wireless communication. In this case, a crystal oscillator is needed. Due to the very high Q (over 100,000) of the crystal, the XO has very low phase noise and high frequency stability. However, it also consumes more power, and its start-up time is long: it can exceed 1 ms if no fast start-up technique is used.

## <span id="page-23-0"></span>1.3 IoT Circuits Requirements

Due to the application requirements, the circuits in wireless IoT sensor nodes are usually subject to size and power constraints. Therefore, their design focus, architecture and working mechanism differ from those of general-purpose circuits.

The size of IoT sensors is usually small for the following reasons: (1) they must be deployable in different locations and environments; (2) their effect on the surrounding environment should be minimized; and (3) cost must be kept low because of the large number of sensor nodes. For example, in a capsule endoscope system the capsule should be small enough to pass through the digestive tract without causing the patient discomfort. In ECoG monitoring, there are very strict requirements on the size and heating of each sensor, and a smaller size permits the deployment of more sensors and a higher resolution.

Due to the size constraint, the energy available from the battery of a wireless sensor node is limited. The circuits should therefore be designed to be low-power and energy-efficient, and the timing and workflow of the system can be adjusted to reduce the average power. For example, many emerging sensing applications measure quantities that do not change rapidly with time, such as temperature [\[12\]](#page-119-2), air quality and human body ion concentration. The sensing system can therefore have a low sampling rate and can be aggressively duty-cycled into sleep mode. Power-hungry blocks such as the TX are activated only when data need to be transferred. Since the majority of the circuits in the system are power-gated during sleep mode, the average power can be greatly reduced [\[13\]](#page-119-3) [\[14\]](#page-119-4). For example, by aggressively duty-cycling the TX and using other techniques to reduce the leakage current in sleep mode, the energy per bit in [\[13\]](#page-119-3) can be as small as 38 pJ/bit even though the active power of the TX is $191 \mu W$.
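As a quick sanity check on the two figures quoted above, the active data rate they imply can be estimated by dividing active power by energy per bit. This is a back-of-the-envelope sketch that assumes the energy per bit is dominated by the active power of the TX and ignores start-up and leakage overheads; the function name is illustrative.

```python
def implied_data_rate_bps(active_power_w: float, energy_per_bit_j: float) -> float:
    """Data rate at which a given active power yields a given energy per bit."""
    return active_power_w / energy_per_bit_j


# Figures quoted in the text for the TX in [13].
ACTIVE_POWER_W = 191e-6
ENERGY_PER_BIT_J = 38e-12

rate_bps = implied_data_rate_bps(ACTIVE_POWER_W, ENERGY_PER_BIT_J)
print(f"Implied active data rate: {rate_bps / 1e6:.2f} Mb/s")
```

Under this assumption the two numbers correspond to an active data rate of roughly 5 Mb/s, which shows how a TX that transmits quickly and then sleeps can reach a low energy per bit despite a comparatively high active power.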

The architecture of IoT circuits is usually specially designed to be low-power and able to work with the low supply voltage provided by an energy-harvesting cell such as a biofuel cell. For example, instead of TX architectures that support complicated modulation schemes [\[15\]](#page-119-5), a power-oscillator-based TX architecture is usually used in wireless IoT sensors. It has the advantages of a simple architecture that can work with a low supply voltage, low standby power and inherent impedance matching with the antenna. In an IoT system, we should consider not only the active power of the circuit but also the sleep-mode leakage power. The average power consumption can be greatly reduced by aggressively duty-cycling the active circuits; in this case, the average power consumption of the system is determined not primarily by the active power but by a combination of active power and sleep-mode leakage power, and the sleep-mode leakage power can contribute a large portion of the average. Therefore, circuit architectures with low leakage power should be used. In the power-oscillator-based TX architecture, power gating can reduce the leakage power by up to 4,000x [\[13\]](#page-119-3). However, inserting a power-gating transistor introduces a series resistance into the power path and affects the circuit performance in the active mode, especially when the active current is large, as in the TX. To minimize this effect, a transistor type with maximal $R_{off}/R_{on}$ should be used as the power-gating switch.
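The interplay between active power, leakage power and duty cycle can be made concrete with a short calculation. The sketch below assumes the simplest two-state model (fully active or fully asleep). The duty cycle and leakage values are hypothetical; only the $191 \mu W$ active power and the 4,000x gating ratio are taken from the text.

```python
def average_power(active_w: float, sleep_w: float, duty_cycle: float) -> float:
    """Average power of a block that is active for a fraction `duty_cycle` of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_w + (1.0 - duty_cycle) * sleep_w


def leakage_share(active_w: float, sleep_w: float, duty_cycle: float) -> float:
    """Fraction of the average power that is spent in sleep mode."""
    total = average_power(active_w, sleep_w, duty_cycle)
    return (1.0 - duty_cycle) * sleep_w / total


ACTIVE_W = 191e-6           # active TX power quoted in the text
DUTY = 1e-6                 # hypothetical duty cycle
GATED_LEAK_W = 1e-9         # hypothetical leakage with power gating
UNGATED_LEAK_W = 4000 * GATED_LEAK_W  # same block without the 4,000x reduction

for label, leak in (("gated", GATED_LEAK_W), ("ungated", UNGATED_LEAK_W)):
    p_avg = average_power(ACTIVE_W, leak, DUTY)
    share = leakage_share(ACTIVE_W, leak, DUTY)
    print(f"{label:8s} average = {p_avg:.3e} W, leakage share = {share:.1%}")
```

With these assumed numbers, leakage dominates the average power in both cases, which illustrates why sleep-mode leakage rather than active power becomes the design target at very low duty cycles.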

Because of the low sampling rates involved, many low-frequency oscillator architectures have also been proposed for IoT applications [\[16\]](#page-119-6) [\[9\]](#page-118-9). Crystal oscillators are widely used as high-precision frequency references due to their insensitivity to supply and temperature variations. However, they are costly and hard to integrate on chip. Constant-current relaxation oscillators and RC oscillators are widely used as fully integrated oscillator architectures. Their oscillation period is based on an RC time constant or a capacitor charging time. For Hz-range frequencies, large resistors and capacitors are needed, requiring either a total capacitance on the order of 10 nF together with a resistance of hundreds of MΩ, or a total resistance on the order of 10 GΩ together with an on-chip-realizable capacitance on the order of tens of pF. To make the RC network integrable and reduce the on-chip area, gate-leakage transistors are utilized as ultra-low-current sources in [\[9\]](#page-118-9), and the intrinsic relaxation-like operation of the oscillator proposed there ensures a highly accurate frequency.
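The component values quoted above can be checked by computing the corresponding RC time constants. The sketch below assumes that the oscillation period is on the order of one RC time constant, which ignores the architecture-dependent scaling factor of a real oscillator; the specific values are picked from within the quoted ranges for illustration.

```python
def rc_time_constant_s(resistance_ohm: float, capacitance_f: float) -> float:
    """Time constant of a first-order RC network, in seconds."""
    return resistance_ohm * capacitance_f


# Option 1: large off-chip-scale capacitor with hundreds of megaohms.
tau_large_c = rc_time_constant_s(200e6, 10e-9)

# Option 2: on-chip-realizable capacitor with a resistance on the order of 10 GOhm.
tau_large_r = rc_time_constant_s(10e9, 50e-12)

print(f"200 MOhm x 10 nF -> tau = {tau_large_c:.2f} s")
print(f"10 GOhm x 50 pF  -> tau = {tau_large_r:.2f} s")
```

Both combinations give time constants on the order of a second, consistent with Hz-range operation, and make clear why neither passive is easy to integrate at this scale.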

## <span id="page-25-0"></span>1.4 Dissertation Contributions

This dissertation discusses the key building blocks in an IoT system, including power management, clock generation and low-power transmitter circuits. To ensure proper functionality, the supply voltage of every block needs to be stable and clean, and sometimes the quality of the supply voltage can be the bottleneck of the performance. In Chapter 2, a charge-pump-based LDO with an AC-coupled high-Z feedback loop is proposed, which improves the overall speed (response time and settling time), dynamic range and ripple amplitude over the entire load range. A crystal oscillator provides a clean reference clock for the IoT communication system, but it has a long start-up time due to its high Q factor. The start-up time determines the maximum data rate, and the start-up energy occupies a large portion of the overall energy consumption. In Chapter 3, multi-path feedforward negative resistance boosting and dynamic pulse-width injection techniques are proposed to reduce both the start-up time and the start-up energy of the crystal oscillator. In Chapter 4, a battery-powered wireless ion sensing platform featuring complete sensing-to-transmission functionality is presented. Finally, Chapter 5 concludes the thesis and discusses future research directions. The ultimate goal is to build a wireless sensing system with carefully optimized building blocks from sensing to wireless data transfer, with a stable and energy-efficient power supply and reference clock.

## <span id="page-26-0"></span>Chapter 2

## A Dynamically-High-Impedance Charge-Pump-Based LDO with Digital-LDO-Like Properties

## <span id="page-26-1"></span>2.1 Introduction

Scaled CMOS system-on-chips (SoCs) are trending toward having many functional cores, where each core has its own power domain so that it can run at an optimal energy-performance trade-off point. Since it is difficult to integrate high-power-density switching DC-DC converters directly into the SoC fabric, most solutions rely on one or more external power management ICs (PMICs) to bring the supply down to a scaled-CMOS-friendly voltage (e.g., $\leq 1$ V), after which multiple on-chip linear low-dropout regulators (LDOs) individually scale down and regulate the voltage of each core according to dynamic application demands, as shown in Fig. [2.1.](#page-27-0)

Conventionally, LDOs are designed in an analog manner, where an error amplifier is used in a compensated feedback loop to regulate the output voltage through a single power transistor. However, such analog LDOs have difficulty operating well at low voltages due to limited transistor overdrive. Additionally, stabilizing analog feedback loops while achieving high performance can take a significant amount of time and effort, leading to long re-design times when specifications or process technologies change.

<span id="page-27-0"></span>

Figure 2.1: Hierarchical power supply architecture in a digital SoC.

For these reasons, there has been significant recent interest in digital LDOs, which replace the analog amplifier with one or more comparators that digitally control an array of power transistors [\[17](#page-119-7)[–20\]](#page-120-0). Since there are no analog amplifiers, low-voltage operation can be more easily achieved, and the mostly-digital nature can enable more rapid process portability.

Despite these advantages, digital LDOs tend to have worse performance in terms of response time, settling time, ripple, regulation range, and power supply rejection (PSR) than a well-designed analog LDO [\[21](#page-120-1)[–23\]](#page-120-2). Fundamentally, digital LDOs that rely on a clock for operation cannot, in the worst case, respond to sudden full-scale load changes more quickly than within a single clock cycle. In practice, shift-register-based N-bit digital LDOs require many clock cycles [\[17\]](#page-119-7), while N-bit binary-search digital LDOs require up to $N$ clock cycles [\[24\]](#page-120-3), both of which may be too slow for the increasingly stringent demands of modern digital loads. While increasing the clock frequency can improve the response time, it directly leads to higher quiescent power and, without careful compensation, can result in stability issues. Changing from clocked to continuous-time comparators can help digital LDOs respond more quickly while retaining the favorable digital LDO properties of low-voltage operation and easier process portability [\[25\]](#page-120-4) [\[26\]](#page-120-5), yet such designs typically require energy-expensive multi-bit quantizers and have non-negligible delay through complex control logic.
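The difference in settling behavior between the two clocked control schemes can be illustrated with a simplified behavioral model. The sketch below is not taken from [17] or [24]: it assumes that the shift-register controller changes the power-transistor code by one unit per clock cycle, and that the binary-search controller resolves one bit per cycle starting from the MSB. The target code stands in for the code at which the output would reach its reference.

```python
def shift_register_cycles(current_code: int, target_code: int) -> int:
    """Cycles needed when the control code can move by only one unit per clock."""
    return abs(target_code - current_code)


def binary_search_trace(target_code: int, n_bits: int) -> list[int]:
    """Codes visited by an MSB-first binary search, one resolved bit per clock."""
    code = 0
    trace = []
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if trial <= target_code:  # stands in for the comparator decision
            code = trial
        trace.append(code)
    return trace


N_BITS = 8
TARGET = 200  # hypothetical code required after a large load step

sr_cycles = shift_register_cycles(0, TARGET)
bs_trace = binary_search_trace(TARGET, N_BITS)

print(f"Shift register: {sr_cycles} cycles")
print(f"Binary search:  {len(bs_trace)} cycles, codes visited {bs_trace}")
```

In this model the shift-register scheme needs a number of cycles equal to the code distance, while the binary search always finishes in `N_BITS` cycles. Either way the response remains quantized to whole clock periods.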

To further improve the response time of digital LDOs, recent work has suggested using analog circuits to "assist" the digital circuits [\[27\]](#page-120-6) [\[28\]](#page-120-7). Such approaches retain the benefits of digital LDOs, yet offer direct performance advantages in terms of response time. In general, they operate by coupling the output voltage to the gate of the power transistors through a high-pass $RC$ network, which enables the provision of nearly instantaneous compensation current during load transients. However, the compensation effect is seriously degraded when the load current is small in [\[27\]](#page-120-6), while [\[28\]](#page-120-7) cannot respond to voltage overshoot during load transients and has a limited input/output voltage range. In addition, such approaches do not yet address ripple and regulation range.

To help improve the ripple, regulation range, and PSR of digital LDOs, other recent solutions have suggested combining digital LDOs with analog LDOs operating in parallel to create hybrid LDOs that inherit the performance benefits of both approaches [\[29–](#page-120-8)[33\]](#page-121-0). However, such solutions may not be appropriate in the applications in which digital LDOs are advantageous: applications that operate at low input voltages (since analog amplifiers are still needed), or in applications that require rapid process portability (since the analog feedback loops can be difficult to stabilize without a large design time/effort). Thus, while such approaches may yield excellent performance across numerous specifications, especially in regards to PSR (which is not normally addressed in digital LDOs and in fact may be quite poor - though this is often acceptable for digital loads), their comparison points should really be to hybrid or analog LDOs, not digital LDOs, in which case the utility of the hybrid approach is less clear.

This work presents the design of an LDO that mostly operates in an analog manner, yet is specifically designed to retain the advantages of digital LDOs: namely, low-voltage operation and easy process portability, all with favorable response time, quiescent current, ripple, and dynamic range [\[34\]](#page-121-1) [\[35\]](#page-121-2). To enable low-voltage and process-scalable operation, the design, shown in Fig. [2.2,](#page-29-1) forgoes the use of an amplifier and instead biases the voltage of a single power transistor via a charge pump (CP), which is controlled by two dead-zone comparators. A direct AC-coupled high-impedance (ACHZ) feedback loop is further used to dynamically improve response time and help stabilize the system, while a small-current charge pump is then used to improve regulation accuracy in the design. It should be noted that the proposed design does not improve PSR in any way over conventional digital LDOs.

<span id="page-29-1"></span>

Figure 2.2: Proposed charge-pump-based LDO with ACHZ loop.

## <span id="page-29-0"></span>2.2 Architecture and Working Principle

The architecture of the proposed LDO is shown in Fig. [2.2.](#page-29-1) In contrast to conventional digital LDOs that utilize arrays of PMOS power transistors, the proposed design utilizes a single PMOS power transistor, $M_1$, driven by a pair of charge pumps, which in turn are driven by a pair of time-interleaved dynamic-inverter-based continuous-time comparators that set the upper and lower bounds ($V_{refH}$ and $V_{refL}$) of a regulation dead zone. Capacitor $C_C$ is connected across the power transistor $M_1$ to form the ACHZ loop. In addition to the pair of continuous-time comparators that set the regulation dead zone, an auxiliary clocked comparator is used to compare the output voltage with $V_{ref}$, usually set to be in the middle of the dead zone, to detect whether the output voltage is above or below the desired reference voltage and improve regulation accuracy through an auxiliary 1-bit fine-tuning charge pump.

The working principle of the LDO is as follows. When $V_{out}$ lies within the dead zone during steady state, the main charge pumps are disabled (ignore the fine-tuning charge pump for now), and their output, $V_G$, is high-impedance. Thus, the residual charge stored on $C_C$ and the parasitic capacitance, $C_G$, determines the power transistor's gate voltage, and thus the current supplied by the LDO. The ACHZ loop is formed by directly AC coupling $V_{out}$ to $V_G$ via capacitor $C_C$. Since this node is high-impedance in this state (when the charge pumps are off), any droop experienced at $V_{out}$ during a load transient will directly couple to $V_G$ with a coupling efficiency set by $C_C/(C_C+C_G)$. This serves to directly lower the gate voltage of $M_1$, thereby providing near-instantaneous compensation current through the power transistor (i.e., $I_{MOS}$), which helps to significantly shorten the response time, as illustrated by the red section of the curves in Fig. [2.3.](#page-31-0)
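The coupling efficiency $C_C/(C_C+C_G)$ can be evaluated numerically to see how much of an output droop reaches the gate. The sketch below assumes an ideal capacitive divider with the gate node fully floating; the capacitance and droop values are hypothetical and are not the values used in the fabricated design.

```python
def coupling_efficiency(c_c_farad: float, c_g_farad: float) -> float:
    """Fraction of a change at V_out that appears at V_G while the gate is high-impedance."""
    return c_c_farad / (c_c_farad + c_g_farad)


def gate_voltage_step(delta_v_out: float, c_c_farad: float, c_g_farad: float) -> float:
    """Change in V_G caused by a change in V_out through the capacitive divider."""
    return delta_v_out * coupling_efficiency(c_c_farad, c_g_farad)


# Hypothetical values for illustration only.
C_C = 4e-12        # coupling capacitor
C_G = 1e-12        # parasitic gate capacitance
DROOP = -50e-3     # 50 mV output droop during a load step

eta = coupling_efficiency(C_C, C_G)
delta_v_g = gate_voltage_step(DROOP, C_C, C_G)

print(f"Coupling efficiency: {eta:.2f}")
print(f"Gate voltage change: {delta_v_g * 1e3:.1f} mV")
```

A larger $C_C$ relative to $C_G$ pushes the efficiency toward one, so more of the droop is converted into gate drive for the PMOS power transistor.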

Though it helps to significantly improve the response time, the ACHZ loop may not be able to provide all of the compensation current necessary to return the LDO's output all the way back to the middle of the dead zone under all circumstances. This is where the charge pumps come in. When $V_{out}$ falls below $V_{refL}$, as also illustrated in Fig. [2.3,](#page-31-0) the lower continuous-time comparator is triggered, which turns on $M_{CPN}$ for continuous-time integration. This further discharges $V_G$, thereby further increasing the current through the power transistor, $I_{MOS}$, to help $V_{out}$ settle back to within the dead zone. Once $V_{out}$ recovers to $V_{refL}$, the lower-bound comparator's output flips again, which turns off transistor $M_{CPN}$ and thus shuts down the main charge pump. Interestingly, during this phase $V_{out}$ will settle in the dead zone with the help of the ACHZ loop; this is discussed in detail later.
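The dead-zone decision that drives the main charge pump can be summarized as a three-way comparison. The sketch below is a behavioral abstraction, not the circuit implementation. The state names are illustrative, and the action taken above $V_{refH}$ is assumed to mirror the lower-bound behavior described in the text.

```python
from enum import Enum


class PumpState(Enum):
    HIGH_Z = "main pump off, V_G floating"
    DISCHARGE_GATE = "M_CPN on, V_G pulled down, I_MOS increases"
    CHARGE_GATE = "V_G pulled up, I_MOS decreases (assumed mirror case)"


def main_pump_state(v_out: float, v_ref_l: float, v_ref_h: float) -> PumpState:
    """Select the main charge-pump action from the position of V_out relative to the dead zone."""
    if v_ref_l >= v_ref_h:
        raise ValueError("v_ref_l must be below v_ref_h")
    if v_out < v_ref_l:
        return PumpState.DISCHARGE_GATE
    if v_out > v_ref_h:
        return PumpState.CHARGE_GATE
    return PumpState.HIGH_Z


# Hypothetical dead zone of +/- 10 mV around 0.8 V.
V_REF_L, V_REF_H = 0.79, 0.81
for v in (0.75, 0.80, 0.85):
    print(f"V_out = {v:.2f} V -> {main_pump_state(v, V_REF_L, V_REF_H).name}")
```

Because the pump is inactive inside the dead zone, the gate node stays high-impedance there, which is the condition the ACHZ loop relies on.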

<span id="page-31-0"></span>

Figure 2.3: Transient response waveform of the proposed LDO.

Ideally, the ACHZ loop should provide instantaneous compensation current when load transients occur, and the charge pump path should begin operating as soon as $V_{out}$ drops out of the dead zone. Thus, ideally, both paths should operate together, with some possible time overlap. However, when the edge time of the load step is shorter than the propagation delay of the charge pump loop, it is possible that the ACHZ loop will provide most of the compensation current to reduce output droop, while the charge-pump path is mainly responsible for voltage recovery and settling. On the other hand, if the edge time is relatively long, then both paths contribute current to reduce the voltage drop.

Since the gain of the power transistor changes rather dramatically once the device enters the subthreshold region, loop stability can be affected at high values of  $V_G$ . To compensate for this, a subthreshold detection block is used that, upon detection of

<span id="page-32-1"></span>

Figure 2.4: (a) Conventional shift-register digital LDO architecture (b) shift-register digital LDO detection delay.

a subthreshold gate voltage, disconnects the large-current charge pump from the CP path and uses only the small-current charge pump. The stability of this approach will be discussed in more detail later.

To improve the accuracy of $V_{out}$, an auxiliary small-size charge pump path is used for fine tuning. After $V_{out}$ settles back and re-enters the dead zone, the main charge pump is turned off and the auxiliary 1-bit fine-tuning charge pump path is activated. A clocked comparator compares $V_{out}$ with $V_{ref}$, and the result is used to regulate $V_{out}$ by 1 LSB per cycle toward $V_{ref}$. Once $V_{out}$ crosses $V_{ref}$, this auxiliary fine-tuning charge pump path is turned off to avoid limit cycling. If small perturbations in the output voltage are present, the fine-tuning charge pump can be left on so that the dead zone detector will not be frequently triggered. Due to the small-size transistors in the fine-tuning charge pump, the impedance at node G is larger than $4 M\Omega$. Together with the low-latency event-driven charge-pump path, this keeps the load transient voltage droop difference below 3 mV compared to the non-continuous mode.
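The fine-tuning behavior described above can be sketched as a small behavioral model. This is only an illustration, not the implemented circuit: the function name `fine_tune`, the use of integer microvolts, and the LSB step size in the usage note are assumptions, since the text does not give the LSB voltage.

```python
def fine_tune(v_out_uv: int, v_ref_uv: int, lsb_uv: int, max_cycles: int = 1000) -> list[int]:
    """Behavioral sketch of the 1-bit fine-tuning path (all voltages in microvolts).

    Each clock cycle the clocked comparator decides whether v_out is below v_ref,
    and v_out is moved by one LSB toward v_ref. The path turns off as soon as the
    comparator decision flips, i.e. once v_out has crossed v_ref.
    """
    if lsb_uv <= 0:
        raise ValueError("lsb_uv must be positive")
    trajectory = [v_out_uv]
    first_decision = v_out_uv < v_ref_uv  # True means "step up"
    for _ in range(max_cycles):
        decision = trajectory[-1] < v_ref_uv
        if decision != first_decision:
            break  # v_ref has been crossed: stop to avoid limit cycling
        step = lsb_uv if decision else -lsb_uv
        trajectory.append(trajectory[-1] + step)
    return trajectory


if __name__ == "__main__":
    # Hypothetical numbers: start 10 mV below a 500 mV reference with a 3 mV LSB.
    print(fine_tune(490_000, 500_000, 3_000))
```

In this sketch the final value always ends within one LSB of the reference, which is the reason the path can then be switched off without the output toggling around $V_{ref}$.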

## <span id="page-32-0"></span>2.3 Performance Analysis

This section will describe the speed, ripple, and dynamic range performance of the proposed LDO, and contrast it to prior-art digital LDO designs.

<span id="page-33-1"></span>

Figure 2.5: Analog-assisted loop when only the LSB power transistor is on.

### <span id="page-33-0"></span>2.3.1 Speed

Both the detection and response speed of conventional shift-register (SR) digital LDOs are limited by the clock frequency. For the representative conventional SR-based digital LDO shown in Fig. [2.4\(](#page-32-1)a), the comparator can only perform the comparison at the edges of the clock and thus, if a load transient happens right after the edge of the clock as illustrated in Fig. [2.4\(](#page-32-1)b), the digital LDO requires essentially an entire clock cycle to detect the load transient, and then another cycle to respond. After detection, at least several clock cycles are required in conventional SR-based digital LDOs to settle back, as illustrated in Fig. [2.3.](#page-31-0) Increasing the clock frequency would increase the detection and response speed; however, this directly trades off with increased quiescent power consumption. Moreover, increased frequency may also degrade the phase margin of the system, potentially rendering it unstable as described briefly in Section [2.4,](#page-39-0) and by the analysis in [\[24\]](#page-120-3).
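As a rough illustration of the clock-limited latency described above, the following sketch computes how long a clocked comparator takes to see a load step and when the first correction can be applied. The function name and the 10 MHz clock in the usage note are hypothetical; the model simply assumes sampling at edges $t = k/f_{clk}$ and one further cycle before the shift register responds.

```python
import math


def sr_ldo_latency(t_event: float, f_clk: float) -> tuple[float, float]:
    """Return (detection_delay, response_delay) in seconds for a clocked comparator.

    The comparator samples only at clock edges t = k / f_clk. An event is seen at
    the first edge strictly after it, and the shift register responds one clock
    cycle after detection.
    """
    if f_clk <= 0:
        raise ValueError("f_clk must be positive")
    period = 1.0 / f_clk
    next_edge = (math.floor(t_event / period) + 1) * period
    detection_delay = next_edge - t_event
    response_delay = detection_delay + period
    return detection_delay, response_delay


if __name__ == "__main__":
    # Hypothetical case: load step 1 ns after a clock edge, 10 MHz clock.
    detect, respond = sr_ldo_latency(1e-9, 10e6)
    print(f"detected after {detect * 1e9:.0f} ns, first response after {respond * 1e9:.0f} ns")
```

For an event just after an edge, the detection delay approaches one full period and the response delay approaches two, which is the worst case described in the text.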

On the other hand, the proposed ACHZ loop responds nearly instantaneously to sudden load current steps, and thus the proposed LDO can respond in less than a clock cycle, importantly without any increase in clock frequency or quiescent power.

<span id="page-34-0"></span>

Figure 2.6: Bode plot of ACHZ and AA loop.

Since the ACHZ loop may not provide all necessary compensation current, the charge pump path is also designed to respond in less than a single clock cycle thanks to the continuous-time comparators (which are designed for low quiescent power as described in Section [2.5\)](#page-47-0), the outputs of which are used to perform a fast continuous-time integration, as illustrated in Fig. [2.3.](#page-31-0)

The proposed ACHZ loop is thematically similar to the analog-assisted (AA) loop in [\[27\]](#page-120-6), which can also provide nearly instantaneous compensation current when there is an output voltage droop, coupled through the RC feedback network shown in Fig. [2.5.](#page-33-1) However, the amount of compensation current in the AA technique is seriously degraded when the load current is small. For example, in Fig. [2.5](#page-33-1) when the load current is small and thus only a single LSB power transistor is on, the coupling only affects the LSB, and thus only supplies a small amount of compensation current.

There is also a difficult trade-off between the value of the resistance in the feedback loop, $R_{AA}$, the coupling efficiency, and settling speed. To have a high coupling efficiency, a large time constant in the high-pass RC network is preferred, which means a large value of $R_{AA}$ and $C_{AA}$. For a given time constant, a large $R_{AA}$ and a small $C_{AA}$ are used to save silicon area [\[27\]](#page-120-6). However, a large $R_{AA}$ would affect the normal turn-on time of the power transistor since it is in the path between the drivers and ground. To improve upon the speed-area trade-off, the AA loop was modified to a NAND-based high-pass analog path (NAP) with an NMOS power transistor in [\[28\]](#page-120-7). Using an NMOS power transistor together with a voltage doubler to boost the gate-drive voltage can

achieve a fast response speed due to the inherent $V_{GS}$-$I_D$ relationship. However, it has the drawback of a limited input or output voltage range. In [\[28\]](#page-120-7) and [\[36\]](#page-121-3), the voltage doubler directly boosts the input voltage, so the maximum input voltage of the LDO can only be half of the maximum supply voltage of the process, and the input/output range is only 150/200 mV in [\[28\]](#page-120-7). In [\[37\]](#page-121-4), the voltage doubler boosts an internally generated voltage, which permits the input voltage to be the normal power supply voltage of the process. However, analog clamp and buffer blocks are needed, which are not suitable in digital LDO applications. Besides, the LDO output voltage is still limited to $V_{DD/PUMP} - V_{TH}$ and the dropout voltage can potentially be large. This problem becomes even worse in advanced processes since the threshold voltage doesn't decrease as much as the supply voltage. The settling speed is also limited by the clock frequency of the charge pump.

On the other hand, the proposed ACHZ loop does not suffer from such tradeoffs. Specifically, there is no large high-pass resistance in the normal settling path, and thus there is no  $RC$ -based trade-off. Additionally, since the output voltage droop is directly coupled to the gate of the sole power transistor, it can provide full compensation capabilities at all current levels, including the important case of a low (e.g., sleep-mode) current. Moreover, due to the high impedance at  $V_G$ , the  $V_{out}$ -to- $V_G$  coupling efficiency is set by  $C_C/(C_C+C_G)$ , where  $C_G$  is the parasitic capacitance at the gate of the power transistor. During this  $V_{out}$ -to- $V_G$  coupling process,  $C_L$  doesn't affect the coupling efficiency. Therefore, only a small  $C_C$  of 40 pF is required to achieve over 90% coupling efficiency, even for a 105 mA-capable PMOS.
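The coupling-efficiency expression $C_C/(C_C+C_G)$ can be checked numerically. The sketch below is only illustrative: the function names are hypothetical, and the value of $C_G$ is not given in the text, so the code instead solves for the largest $C_G$ that still meets a 90% target with the stated $C_C$ of 40 pF.

```python
def coupling_efficiency(c_c: float, c_g: float) -> float:
    """V_out-to-V_G coupling efficiency of the capacitive divider, C_C / (C_C + C_G)."""
    return c_c / (c_c + c_g)


def max_gate_parasitic(c_c: float, eta_target: float) -> float:
    """Largest gate parasitic C_G (farads) that still achieves eta_target for a given C_C."""
    if not 0.0 < eta_target < 1.0:
        raise ValueError("eta_target must be between 0 and 1")
    return c_c * (1.0 / eta_target - 1.0)


if __name__ == "__main__":
    c_c = 40e-12  # coupling capacitor from the text
    c_g_max = max_gate_parasitic(c_c, 0.90)
    print(f"C_G must stay below {c_g_max * 1e12:.2f} pF for 90% coupling")
```

With $C_C$ = 40 pF, a 90% target corresponds to a gate parasitic of at most about 4.4 pF, which shows why keeping $C_G$ small matters as much as the size of $C_C$ itself.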

Importantly, the proposed ACHZ loop can, even with the same (high) load current, provide more compensation than the AA loop due to inherent loop stability advantages. In the AA loop, there are three poles and one zero, as shown in Fig. [2.6.](#page-34-0) To ensure the system is stable, the loop gain $A_V = g_m \times R_{out}$ is set to be <1 [\[38\]](#page-121-5), which means a limited compensation current. The proposed ACHZ loop has only two poles, and the pole located at $V_G$ ($p_0$), which is close to the origin due to the high-impedance node, is cancelled by the zero introduced by $C_C$. Thus, the ACHZ loop has only one effective
<span id="page-36-0"></span>

Figure 2.7: Open-loop instantaneous compensation current simulation results of ACHZ and AA loops.

<span id="page-36-1"></span>

Figure 2.8: Relationship between load resistance, power transistor  $g_m$ , and  $V_{step}$ .

pole, and thus the ACHZ loop on its own is inherently stable (the stability of the overall multi-loop system will be discussed in Section [2.4](#page-39-0)). This means the loop gain can be set to >1 to obtain a larger $g_m$, improving compensation current by 3.7$\times$ over an AA loop for $I_{load,initial}$ = 5 mA, as shown in Fig. [2.7.](#page-36-0) Due to the high-impedance node, the compensation current can also last for a longer period of time, at least until the charge pump kicks in (which is not shown in this example).

When  $V_{out}$  falls out of the dead zone and the charge pump starts to drive  $V_G$ down (green segment of  $V_G$  in Fig. [2.3\)](#page-31-0) to increase  $I_{MOS}$ , the falling  $V_G$  is coupled to  $V_{out}$  through  $C_C$  and may affect  $V_{out}$ . The coupling factor is:

$$
\frac{\Delta V_{out}}{\Delta V_G} = \frac{R_L \parallel \frac{1}{sC_L}}{\frac{1}{sC_C} + \left(R_L \parallel \frac{1}{sC_L}\right)},\tag{2.1}
$$

without considering the charging current from the power transistor. A large $C_L$ can thus attenuate the effect of $\Delta V_G$. However, this coupled voltage is very small and can be neglected due to the complementary relationship between $\Delta V_G$ and $R_L$: when $\Delta I_L$ and $\Delta V_G$ are large, $R_L$ is small, and vice versa. Thus, when $I_{load}$ jumps, for example to the maximum load current of 105 mA, $\Delta V_G$ has the largest amplitude. In this case, $V_G$ falls by 70 mV in 19 ns in simulation, and $R_L$ is 500 mV / 105 mA = 4.76 $\Omega$. From simulation, even with zero load capacitance, the voltage coupled to the output is about 700 $\mu$V. When $\Delta I_{load}$ is small, the output impedance is larger, but the amplitude of $\Delta V_G$ also becomes smaller, and the coupled voltage is small. From simulation, the output voltage decrease coupled from the $V_G$ drop is less than 2 mV over the entire load range. Moreover, with the charging of $I_{MOS}$ or an additional load capacitance $C_L$, its effect can be neglected.
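The maximum-load example above can be cross-checked with a first-order estimate. The sketch below assumes zero load capacitance, so the coupling factor reduces to a high-pass response $sR_LC_C/(1+sR_LC_C)$, and it assumes that $V_G$ falls as a linear ramp, which is a simplification of the simulated waveform. The function name is hypothetical, and $C_C$ = 40 pF is the value stated for this design.

```python
import math


def coupled_voltage_ramp(delta_vg: float, t_ramp: float, r_load: float, c_c: float) -> float:
    """Magnitude of the voltage coupled onto V_out at the end of a linear V_G ramp.

    Assumes C_L = 0, so V_G couples to V_out through a first-order high-pass
    network with time constant tau = R_L * C_C. For a ramp of slope
    delta_vg / t_ramp, the output is slope * tau * (1 - exp(-t / tau)).
    """
    tau = r_load * c_c
    slope = delta_vg / t_ramp
    return slope * tau * (1.0 - math.exp(-t_ramp / tau))


if __name__ == "__main__":
    r_load = 0.5 / 0.105  # 500 mV output at 105 mA load, about 4.76 ohm
    v = coupled_voltage_ramp(delta_vg=70e-3, t_ramp=19e-9, r_load=r_load, c_c=40e-12)
    print(f"coupled voltage is roughly {v * 1e6:.0f} uV")
```

Under these assumptions the estimate comes out at roughly 0.7 mV, in line with the simulated value quoted in the text; the time constant $R_LC_C$ is far shorter than the 19 ns ramp, so the result is essentially $R_L C_C \, \Delta V_G / t_{ramp}$.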

Interestingly, the proposed LDO can potentially be even faster than an analog LDO designed with the same quiescent current, since high-power multi-stage amplifiers are usually used in analog LDOs to achieve a high loop gain. To achieve high speed, the last stage, which drives the large power transistor, requires a large static bias current to improve slewing during large load transients. In the proposed LDO, by contrast, the power transistor is driven by a charge pump that is off most of the time, so a large static bias current is not needed.

#### 2.3.2 Output Ripple and Voltage Tuning Ability

Due to their inherent switching nature, baseline digital LDOs have ripples at their outputs, even at steady state. Unfortunately, this ripple amplitude can increase significantly when the load current is small, since the ratio of the LSB transistor's resistance (which is fixed) to that of the effective load resistance gets worse at low currents [\[24\]](#page-120-0).

Fortunately, the output ripple and tuning ability trade-off is inherently mitigated in the proposed LDO. Since $g_m$ of the power transistor is proportional to the load current, the decreased $g_m$ compensates for the increased $R_{load}$ at small load current, and generates a small and stable step voltage. Similarly, the increased $g_m$ compensates for the decreased $R_{load}$ at large load current, thereby maintaining a good voltage tuning ability in this case. Figure [2.8](#page-36-1) illustrates this intrinsic compensation, demonstrating that the step voltage varies between 6 and 12 mV, or a 7 mV variation, across the entire $100,000 \times$ load current range with zero load capacitance according to simulations. With an additional load capacitance, $C_L$, this ripple amplitude can be further reduced.

#### 2.3.3 Dynamic Range

For an SR-based digital LDO, the load current dynamic range (DR) is given by:

$$
DR = \frac{I_{max}}{I_{min}} = \frac{N \times I_{unit}}{I_{unit}} = N,\tag{2.2}
$$

which is determined by the number of power transistors, $N$. Increasing the number of power transistors can increase the dynamic range, but at the cost of higher power consumption and a larger area for the power transistor drivers. Using a binary search control can mitigate this issue [\[24\]](#page-120-0), but the MSB-first switching may potentially generate large output glitches.

Fortunately, the proposed LDO can achieve a large dynamic range without significant power or area overhead. Specifically, the charge-pump-based LDO generates the maximum current when the gate voltage of the power transistor is pulled down to zero, i.e.,:

<span id="page-39-0"></span>
$$
I_{max} = \frac{1}{2} \times \mu_P C_{ox} \frac{W}{L} (V_{DD} - |V_{TH}|)^2,\tag{2.3}
$$

where  $\mu_P$  is the transistor's mobility and  $C_{ox}$  is the gate oxide capacitance per unit area. The minimum current that the LDO can provide is set by the transistor's cutoff leakage current, that is when the gate voltage of the power transistor is  $V_{DD}$ :

<span id="page-39-1"></span>
$$
I_{min} = \mu_P C_{ox} \frac{W}{L} (n-1) \phi_t^2 e^{-V_{TH}/(n\phi_t)},\tag{2.4}
$$

where *n* is the subthreshold coefficient, and  $\phi_t$  is the thermal voltage. With Eqn. [\(2.3\)](#page-39-0) and Eqn. [\(2.4\)](#page-39-1), the dynamic range of the charge pump LDO can be obtained as

$$
DR_{CPLDO} = \frac{I_{max}}{I_{min}} = \frac{\frac{1}{2}(V_{DD} - |V_{TH}|)^2}{(n-1) \times \phi_t^2 e^{-V_{TH}/(n\phi_t)}}.\tag{2.5}
$$

This dynamic range can be as large as  $2 \times 10^6$  according to calculations and simulations.
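To give a feel for how Eqn. (2.5) scales, the sketch below evaluates it directly. All numerical parameters in the usage note (supply, threshold magnitude, subthreshold coefficient, thermal voltage) are illustrative assumptions rather than values from this design, and the code uses the magnitude of $V_{TH}$ in the exponent, which is an interpretation of the equation for a PMOS device.

```python
import math


def dynamic_range(v_dd: float, v_th: float, n: float, phi_t: float = 0.0259) -> float:
    """Evaluate Eqn. (2.5): ratio of saturation current at V_G = 0 to cutoff leakage.

    The common factor mu_P * C_ox * W / L cancels, so only voltages remain.
    v_th is treated as a magnitude (assumption for a PMOS power transistor).
    """
    if n <= 1.0:
        raise ValueError("subthreshold coefficient n must be greater than 1")
    vth = abs(v_th)
    i_max_norm = 0.5 * (v_dd - vth) ** 2
    i_min_norm = (n - 1.0) * phi_t ** 2 * math.exp(-vth / (n * phi_t))
    return i_max_norm / i_min_norm


if __name__ == "__main__":
    # Hypothetical device parameters, chosen only to illustrate the order of magnitude.
    dr = dynamic_range(v_dd=1.0, v_th=0.35, n=1.5)
    print(f"DR is on the order of 10^{math.log10(dr):.1f}")
```

Because the leakage term depends exponentially on $V_{TH}/(n\phi_t)$, the result is very sensitive to the assumed threshold voltage and subthreshold coefficient, so this should be read as an order-of-magnitude check only.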

## 2.4 Stability Analysis

Stability analysis is critical to all LDO designs. As discussed in Section [2.3.1](#page-33-0), the ACHZ loop itself (i.e., when ignoring the contributions of the charge pump path) is inherently stable. Unfortunately, analysis beyond this loop is complicated by the inherently non-linear nature of the full LDO.

To help better intuitively understand the stability of the LDO, this section will first look at the transient operation of the proposed LDO in three cases, and qualitatively discuss how the overflow current can potentially cause stability issues, which can be resolved by inclusion of the ACHZ loop. A non-rigorous stability criterion of the system is then ascertained from this discussion. Then, a concise piece-wise-linear time-domain analysis method is used to quantitatively analyze the non-rigorous stability criterion of the system. This analysis helps the designer model and understand the trade-off and relationships among circuit parameters and the charge pump current, overflow current,

<span id="page-40-0"></span>

Figure 2.9: Load transient waveforms comparison w/ and w/o ACHZ loop.

and the time required for  $V_{out}$  to go back to the dead zone  $(t_1$  to  $t_2)$ .

#### 2.4.1 Qualitative Analysis of Transient Operation

In conventional shift-register based digital LDOs, a faster clock permits a shorter response and settling time. However, when $f_{clk}$ is much larger than the effective frequency of the load's pole, $f_L$, the shift register can potentially accumulate more zeros or ones than necessary and turn on or off more transistors than desired in a short period of time, resulting in an oscillatory or unstable response, as described in [\[24\]](#page-120-0).

The same sort of stability issue could, if not compensated for, occur if the charging or discharging speed in the charge pump of the proposed LDO is too fast. Figure [2.9](#page-40-0) will be used to qualitatively illustrate this for three different cases.

#### Slow charge pump without the ACHZ loop

After a sudden load step in example curve 1, a slow charge pump is activated after a brief delay through the dead zone and continuous-time comparator. This serves to decrease $V_G$, which increases the amount of current provided by the power PMOS, $I_{MOS}$. Once $I_{MOS} = I_{load}$, then $V_{out}$ would ideally stop decreasing and stall at its current value. If this value of $V_{out}$ is outside of the dead zone, then the charge pump will remain on, and provide a small amount of overshoot current until $V_{out}$ returns to the dead zone.

#### Fast charge pump without the ACHZ loop

The previous example was found to be stable, at least qualitatively. To improve response and settling time, the charge pump current (i.e., its speed) can be increased. However, as illustrated by example curve 2, stability issues can arise if the charge pump is made too strong. In this example, the high current available from the charge pump will help to more rapidly pull down $V_G$, rendering a slightly faster response time and a significantly reduced time for $V_{out}$ to settle back into the dead zone. However, the overflow current at the time $V_{out}$ enters the dead zone will be large. At this point, the charge pump turns off, and thus $V_G$ remains largely the same since it is a high-impedance floating node in this state, thereby keeping this large overflow current at approximately the same level as before. This can cause $V_{out}$ to rapidly increase, possibly even shooting outside of the upper bound of the dead zone, which will trigger the upper-bound detecting comparator to start charging $V_G$ quickly, which may then compensate too much, such that $V_{out}$ shoots outside the bottom of the dead zone, and so on, rendering the system unstable. Because of this, the charge pump current without the ACHZ loop cannot be designed to be too large, thereby resulting in a direct speed-stability trade-off.

#### Fast charge pump with the ACHZ loop

Fortunately, inclusion of the ACHZ loop can help facilitate an increased charge pump current without compromising stability, breaking this trade-off. Example curve 3 qualitatively illustrates this. In this example, the ACHZ loop provides a nearly instantaneous compensation current before the continuous-time comparator can react, already improving the response time. Once the comparator does react (and after its propagation delay), the charge pump is activated. This means that the gate of the power transistor is no longer floating, and the coupling facilitated by the ACHZ loop is temporarily attenuated by the low-impedance charge pump's termination. Thus, at this time, the charge pump provides some overshoot current, until $V_{out}$ enters the dead zone. At this point the charge pump is shut off, thereby making $V_G$ high impedance again, and re-activating the ACHZ loop. During this state, the increasing $V_{out}$ (due to the overflow current) is coupled to $V_G$, which serves to increase $V_G$, thereby naturally suppressing the overflow current to help $V_{out}$ settle within the dead zone. As a result, a much faster charge pump can be employed than without ACHZ, which helps to reduce settling time by 56% according to simulations.

Note, however, that the $g_m$ of power transistor $M_1$ becomes very small when it enters the subthreshold region, and thus the overflow current suppression loop is less effective in this case. To combat this, a subthreshold detection circuit is employed, in which a comparator compares the gate voltage with the threshold voltage of the power transistor. When it detects that the power transistor has entered the subthreshold region, it disables the large-current charge pump, which helps to extend the stable operation range down to $1\mu$A, for an effective 6.6-bit resolution improvement compared to operation without this technique.

<span id="page-43-0"></span>

Figure 2.10: Proposed LDO piece-wise-linear settling waveform during load transient.

#### 2.4.2 Quantitative Piece-Wise-Linear Time-Domain Analysis

Normally, a digital LDO driven by a fixed clock can be linearized, such that a small signal model can be constructed and its stability can be analyzed via Bode diagrams [\[39\]](#page-121-0) [\[40\]](#page-121-1). However, event-driven multi-loop LDOs don't have constant sampling rates and in fact have several working states, and thus linearized small signal models are not accurate [\[25\]](#page-120-1). Here, a time-domain stability analysis method is instead derived in order to give quantitative insight into the stability of the proposed LDO. Compared to the linearized small signal model, the time-domain analysis method considers the initial and end conditions of each phase and gives a more accurate result (though it is by no means a rigorous stability proof).

<span id="page-44-0"></span>

Figure 2.11: Simulated relationship among charge pump current, overflow current, and settling time.

From the discussion in the previous section, we know that if the output voltage can settle within the dead zone, the system is stable; otherwise, it may result in an oscillatory output. The criterion for whether $V_{out}$ can settle within the dead zone is whether the overflow current can be suppressed by $I_{sup}$, which is generated by coupling the rising $V_{out}$ in the dead zone to the gate of the power transistor through the ACHZ loop. In this subsection, expressions for the overflow current $I_{OF}$ and the suppression current $I_{sup}$ are derived. Then, using this criterion and comparing $I_{OF}$ and $I_{sup}$, we can check if the system is stable.

Consider the example shown in Fig. [2.10,](#page-43-0) where a sudden load transient occurs at $t_0$. Due to the fast edge rate of the resulting output droop, the ACHZ loop responds before the charge pump path and provides most of the compensation current between $t_0$ and $t_1$, as given by:

$$
I_L - I_{ini} = \eta \times G_M \Delta V_{out},\tag{2.6}
$$

where  $\eta$  is the  $V_{out}$ -to- $V_G$  coupling efficiency and  $G_M$  is the average transconductance

<span id="page-45-0"></span>

Figure 2.12: Time-interleaved inverter-based continuous-time comparator (a) schematic and (b) timing diagram.

of the power transistor. After $t_1$, the charge pump path starts working and $V_{out}$ settles back, entering the dead zone at $t_2$ per the following equation:

$$
\bar{I}_{t1t2} \times (t_2 - t_1) = C_L \times V_{settle},\tag{2.7}
$$

where $\bar{I}_{t1t2}$ is the average charging current at the output from $t_1$ to $t_2$, and

$$
V_{settle} = \Delta V_{out} - \frac{1}{2} V_{DZ}.\tag{2.8}
$$

The increasing current from the power transistor is provided by discharging  $V_G$ :

$$
I_{CP} \times (t_2 - t_1) = C_C \times \Delta V_{G, t1t2},\tag{2.9}
$$

where  $I_{CP}$  is the charge pump current. At the time the output voltage enters the dead zone, the overflow current is:

$$
I_{OF} = I_{t2} - I_L = G_M \times \Delta V_{G, t1t2},\tag{2.10}
$$

while the maximum overflow current the ACHZ loop can suppress is:

$$
I_{sup} = G_M \times V_{DZ}.\tag{2.11}
$$

With the above equations, the values of $I_{OF}$ and $I_{sup}$ can be calculated. If $I_{sup}$ is larger than $I_{OF}$, the ACHZ loop can suppress the overflow current and the output voltage can settle within the dead zone.

A MATLAB model is built, and the relationship among the charge pump current, the overflow current, and the time required for $V_{out}$ to go back to the dead zone ($t_1$ to $t_2$), denoted $t_B$, is plotted in Fig. [2.11.](#page-44-0) With a dead zone of 40 mV, $I_{sup}$ is calculated to be 32 mA. From the figure, it can be observed that with a larger charge pump current, the settling time can be reduced. However, the overflow current is also increased, which may degrade the system stability. When the charge pump current is larger than 1.5 mA, the improvement in $t_B$ is very limited while the overflow current is still increasing. Therefore, a charge pump current of 1.2 mA is selected, corresponding to a 14 mA overflow current, which is smaller than $I_{sup}$ = 32 mA, to achieve a fast settling speed while leaving some safety margin.
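The stability criterion can be written as a few lines of code. This is not the MATLAB model mentioned above, only a minimal sketch of Eqns. (2.10) and (2.11) with hypothetical function names. The transconductance of 0.8 S used in the usage note is not stated in the text; it is inferred here from $I_{sup}$ = 32 mA and $V_{DZ}$ = 40 mV and should be treated as an assumption.

```python
def overflow_current(g_m: float, delta_vg: float) -> float:
    """Eqn. (2.10): overflow current when V_out re-enters the dead zone."""
    return g_m * delta_vg


def suppression_current(g_m: float, v_dz: float) -> float:
    """Eqn. (2.11): maximum overflow current the ACHZ loop can suppress."""
    return g_m * v_dz


def settles_in_dead_zone(i_of: float, i_sup: float) -> bool:
    """Non-rigorous criterion: the output settles if I_sup exceeds I_OF."""
    return i_sup > i_of


if __name__ == "__main__":
    g_m = 0.032 / 0.040  # assumed: inferred from I_sup = 32 mA at V_DZ = 40 mV
    i_sup = suppression_current(g_m, 40e-3)
    i_of = 14e-3  # overflow current quoted for the selected 1.2 mA charge pump
    print(settles_in_dead_zone(i_of, i_sup), f"margin = {i_sup / i_of:.2f}x")
    print(f"implied gate swing = {i_of / g_m * 1e3:.1f} mV")
```

Since both currents scale with the same $G_M$ in these two equations, the criterion is equivalent to requiring that the gate voltage swing $\Delta V_{G,t1t2}$ stay below the dead zone width $V_{DZ}$; with the quoted numbers the margin is a little over 2$\times$.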

<span id="page-47-0"></span>

Figure 2.13: Single dynamic-inverter-based comparator operation: (a) phase 1, (b) phase 2.

## 2.5 Circuit Implementation

The overall block diagram of the proposed LDO was shown earlier in Fig. [2.2](#page-29-0) and was already generally described throughout the chapter. This section will focus on the implementation details of a few key circuits: the comparators, the ACHZ loop, and the power transistor.

#### 2.5.1 Time-Interleaved Continuous-Time Comparator

The proposed architecture utilizes two continuous-time comparators to establish the upper and lower bounds of the dead zone, set by $V_{refH}$ and $V_{refL}$, respectively. The schematic of these two comparators is shown in Fig. [2.12\(](#page-45-0)a). Their design is based on the design presented in [\[41\]](#page-122-0), with one key addition to address an important issue.

The baseline design in [\[41\]](#page-122-0) is an inverter-based design that has the advantages of resilience to PVT variation, process scalability, and low offset voltage. The design operates in two phases, illustrated in Fig. [2.13.](#page-47-0) During phase 1 (the reset phase), the input and output of the first inverter are connected together, and the reference voltage is sampled onto capacitor $C_I$. In phase 2 (the active phase), the input voltage is connected to the bottom plate of capacitor $C_I$. Due to this double sampling feature, the inverter-based comparator has a very small offset voltage. However, it has to be reset during operation to refresh the charge stored on the sampling capacitor, which interrupts the detection. If a load transient happens in the reset phase, it cannot be detected and an error code will be produced [\[25\]](#page-120-1).

To address this issue, a time-interleaved architecture is proposed in this design, as depicted in Fig. [2.12\(](#page-45-0)a). By time-interleaving two of these comparators, continuous-time operation is enabled throughout the reset phase. As shown in Fig. [2.12\(](#page-45-0)b), when COMP\_A is active, COMP\_B is power gated by $M_{PG}$ to save power. At the end of phase A, COMP\_B is reset prior to the next activation to refresh the charge on the sampling capacitor, $C_I$. Then, at the beginning of the next phase, COMP\_B is activated and COMP\_A is powered down to save power. In this way, the reset time slot is always hidden behind the activation phase, and the comparator is able to work continuously. Since only one comparator is activated at a time, and since the leakage current of the off-comparator is only 440 pA, the overall power of the time-interleaved comparator is almost the same as that of the original inverter-based comparator.
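The interleaving schedule described above can be sketched as a simple timing model. The function name, state labels, and the phase and reset durations in the usage note are hypothetical; the model only captures the ordering stated in the text (the idle comparator is power gated, then reset at the end of the phase, then activated).

```python
def comparator_states(t: float, phase_len: float, reset_len: float) -> dict[str, str]:
    """State of COMP_A and COMP_B at time t >= 0 for the interleaved schedule.

    Phases of length phase_len alternate A, B, A, ... The idle comparator is
    power gated, except during the last reset_len of each phase, when it is
    reset so that it is ready to take over at the next phase boundary.
    """
    if not 0.0 < reset_len < phase_len:
        raise ValueError("reset_len must be positive and shorter than phase_len")
    phase_index = int(t // phase_len)
    time_in_phase = t - phase_index * phase_len
    if phase_index % 2 == 0:
        active, idle = "COMP_A", "COMP_B"
    else:
        active, idle = "COMP_B", "COMP_A"
    idle_state = "reset" if time_in_phase >= phase_len - reset_len else "power_gated"
    return {active: "active", idle: idle_state}


if __name__ == "__main__":
    # Arbitrary time units: 10-unit phases with a 2-unit reset window.
    for t in (1.0, 9.0, 11.0, 19.0):
        print(t, comparator_states(t, phase_len=10.0, reset_len=2.0))
```

At every instant exactly one comparator is active, which is the property that lets the pair behave as a single continuous-time comparator while each individual comparator still receives its periodic reset.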

The outputs of the two inverter-based comparators are then combined and used to control the charge pump. When COMP\_B (COMP\_A) is power gated, its output is high-impedance and may have an incorrect value. To eliminate its possible effects on the output, $act_B$ ($act_A$) is set to ground so that the output of COMP\_B (COMP\_A) is blocked, and only the output of COMP\_A (COMP\_B) is effective.

The $V_{refH}$ and $V_{refL}$ are generated off-chip to provide tuning flexibility during measurement. They can also be generated on chip using bandgap references with low-power buffers to drive the 1 pF sampling capacitors in the dynamic-inverter-based continuous-time comparators. $V_{ref}$ is generated on-chip using $V_{refH}$ and $V_{refL}$ with a resistor ladder, since it is connected to the gate of the clocked comparator and does not need to drive a heavy load.

<span id="page-49-0"></span>

Figure 2.14: $V_G$ routing (a) normal routing (b) sandwich routing methods.

#### 2.5.2 ACHZ Loop

As mentioned in Section [2.2,](#page-29-1) the $V_{out}$-to-$V_G$ coupling efficiency is determined by $\eta = C_C/(C_C + C_G)$. A high coupling efficiency is desirable, as it can help to suppress the output voltage droop during load transients. Therefore, a small $C_G$ and a large $C_C$ are desired. However, the value of $C_C$ should not be too large in order to save silicon area. Using a high metal layer to route $V_G$ can reduce the $V_G$-to-ground parasitic capacitance $C_G$ as shown in Fig. [2.14\(](#page-49-0)a), but unfortunately this shows only a minor improvement. Instead, a sandwich-based routing method is used in this design to minimize $C_G$ and

<span id="page-50-0"></span>

Figure 2.15: Micrograph of the fabricated LDO.

maximize $C_C$. As shown in Fig. [2.14\(](#page-49-0)b), along the route of signal $V_G$ (metal layer $M_K$), two metal layers $M_{K-1}$ and $M_{K+1}$, which are connected to $V_{out}$, are placed below and above it. The two metal layers are connected using $M_{K-1}$ to $M_{K+1}$ vias so that the $V_G$ routing is totally surrounded by $V_{out}$. Therefore, all of the $V_G$-to-ground parasitic capacitance on the routing wire is transformed into $V_G$-to-$V_{out}$ coupling capacitance $C_C$, which increases the coupling efficiency.

#### 2.5.3 Power Transistor

The parasitic resistance at the source and drain of the power transistor is also critical to the performance of an LDO that supports large load currents [\[42\]](#page-122-1) [\[43\]](#page-122-2). In this design, the maximum load current is over 100 mA. This means even a 1 $\Omega$ parasitic resistance would result in a static voltage drop of over 100 mV, which degrades the dynamic range. To minimize the parasitic resistance, the power transistor is split into 1680 multipliers, each of which has 20 fingers whose width is less than 10 $\mu$m. In this manner, the parasitic resistance is reduced by a factor of $2.8 \times 10^6$ compared to using a single power transistor.

<span id="page-51-0"></span>

Figure 2.16: Measured transient results: response of the proposed LDO to a periodic load change with  $C_L=0$  (a) and  $C_L=10 \mu$ F (c); output voltage ripple with I<sub>L</sub>=100 mA and I<sub>L</sub>=500  $\mu$ A (b).

## 2.6 Measurement Results

The proposed LDO is fabricated in a 65 nm process with an active area of 0.04 mm<sup>2</sup> including all capacitance. The chip micrograph is shown in Fig. [2.15.](#page-50-0) The total employed capacitance is 42 pF (40 pF for $C_C$ and 2 pF for the comparators). Thanks to the high-impedance node at $V_G$ and sandwich routing, the 40 pF $C_C$ can provide a coupling efficiency of over 90%. Since most of the parasitic capacitance from routing has been transformed to the coupling capacitor $C_C$, the power transistor's intrinsic gate-to-ground parasitic capacitance contributes most of $C_G$. Therefore, the coupling efficiency can be improved if a smaller power transistor is used. The 40 pF capacitance of $C_C$ occupies about 53% of the total effective active area. If the area budget for the LDO

<span id="page-52-0"></span>

Figure 2.17: (a). Illustration of the LDO bond wire model (b). Simulated on-chip supply (input of LDO) voltage  $V_{in}$  droop during large load transient current

in the system is tight, a smaller coupling capacitor  $C_C$  can be used at the cost of lower coupling efficiency. A smaller sampling capacitor can also be used in the comparator with a shorter reset time interval to refresh the charge on the capacitor, at the cost of a higher power.

The measured transient response for $\Delta I_{load}$ = 100 mA with a 10 ns edge rate (i.e., 10 mA/ns, which is the fastest edge rate amongst previously reported low-FoM and high-current digital LDOs as shown in Table [2.1\)](#page-58-0) and zero load capacitance is shown in Fig. [2.16,](#page-51-0) demonstrating 6.9 ns and 65 ns response and settling times, respectively, with $V_{drop}$ = 88 mV for a FoM of 1.8 fs. To achieve faster settling during a low-to-high load transient, the value of the discharging current is set to be larger than the charging current in the charge pump. In this case, the output voltage may exit the dead zone as it settles back after the initial voltage droop. However, since the charging current is set to a value which ensures an overdamped settling, the output voltage will then settle within the dead zone when it is pulled down, as shown in the lower left figure in Fig. [2.16\(](#page-51-0)a). Thanks to the ACHZ and fast CP loops, the LDO can respond even before the end of the current transient, rendering in this case a response time that is faster than the edge rate.

<span id="page-53-0"></span>

Figure 2.18: Measured current efficiency at a 0.6 V input voltage for a 0.5 V output voltage.

Since the edge rate of the load transient can directly affect the response time and thus the FoM, the normalized edge rate of each design is listed in the table for comparison. To illustrate this effect, the proposed LDO is measured at different edge rates. To characterize the worst-case FoM and push the LDO to edge rates beyond what has been reported in the literature, especially for high-current digital LDOs, a  $\Delta I_{load}=100 \text{ mA}$ step was also tested with a 1 ns edge, which is  $10\times$  faster than the fastest other edge rate in the table. Naturally, the measured FoM of this design degrades with faster edge rates, yet it remains below 4 fs in all cases, which is still a state-of-the-art result despite the extreme edge rate.

It should be noted that other, non-LDO-based effects begin to come into play when fast edge rates are tested. As shown in Fig. [2.17.](#page-52-0) (a), the parasitic inductance of the bond wire and the on-chip decoupling capacitor can potentially resonate during large transient events. For example, at the 100 mA/ns edge rate, a large *input* droop is also

<span id="page-54-0"></span>

Figure 2.19: Measurement results of (a) Load regulation and (b) line regulation.

observed in simulations, as shown in Fig. [2.17.](#page-52-0) (b). This 82 mV droop is due to the finite package and bond-wire parasitic inductance, which limits the current flowing from the input voltage source, while the parasitic resistance causes a static 10 mV voltage drop when the output reaches the steady state. This input droop has nothing to do with the LDO design itself, but it does degrade the measured FoM. Going from a bond-wire design to a flip-chip design could, for example, significantly ameliorate this situation.
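The interaction between the supply-path parasitics and the on-chip decoupling capacitor can be illustrated with a very small lumped model. The sketch below is not the simulation behind Fig. 2.17; it assumes a series R-L supply path feeding a single decoupling capacitor, with a load current that ramps linearly. The inductance `l_bond` and capacitance `c_decap` are hypothetical placeholders, and `r_par` = 0.1 Ω is merely inferred from the 10 mV static drop under the assumption of a 100 mA load.

```python
def simulate_input_droop(l_bond=1e-9, r_par=0.1, c_decap=1e-9,
                         i_step=0.1, t_edge=1e-9, v_src=0.6,
                         t_end=500e-9, dt=10e-12):
    """Series R-L supply path feeding an on-chip decoupling capacitor.

    All component values are illustrative assumptions, not extracted
    parasitics. Returns (peak_droop, final_drop) of the on-chip input
    voltage, both in volts.
    """
    i_l = 0.0       # current through the bond wire
    v_in = v_src    # on-chip input voltage across the decoupling capacitor
    v_min = v_src
    for n in range(int(round(t_end / dt))):
        t = n * dt
        # Load current ramps linearly from 0 to i_step during t_edge.
        i_load = i_step * min(t / t_edge, 1.0)
        # Semi-implicit Euler: update inductor current, then capacitor voltage.
        i_l += dt * (v_src - v_in - r_par * i_l) / l_bond
        v_in += dt * (i_l - i_load) / c_decap
        v_min = min(v_min, v_in)
    return v_src - v_min, v_src - v_in


peak, final = simulate_input_droop()
print(f"peak input droop  : {peak * 1e3:.1f} mV")
print(f"static input drop : {final * 1e3:.1f} mV")
```

In this toy model the static drop is set only by `r_par`, while the transient droop depends on the L-C resonance, which is why a lower-inductance package would be expected to help.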

The load step testing in Fig. [2.16\(](#page-51-0)a) demonstrated that the proposed LDO can, with zero attached load decoupling capacitance, successfully regulate across representative large load changes with rapid response and settling times, all without oscillatory behavior. Figure [2.16\(](#page-51-0)c) then repeats this test with a load decoupling capacitance of 10  $\mu$F attached. Here, it can be seen that the proposed LDO can again successfully regulate with rapid response/settling time without oscillatory behavior. It should be noted that this would not be the case for a baseline digital LDO designed for a fast response - if the clock speed,  $f_s$, of such a design is much larger than the effective pole frequency of the load,  $f_L$, then  $V_{out}$ changes much more slowly than the decisions of the controller, which would rapidly accumulate more zeros/ones in the barrel shifter, making the power transistor current much larger/smaller than the load current even if the load voltage has settled to  $V_{ref}$, resulting in an oscillatory response as described in [\[24\]](#page-120-0). From the perspective of a z-domain model [\[39\]](#page-121-0), a faster clock pushes the pole closer to the unit circle, thereby reducing the phase margin and degrading system stability. Fortunately, the ACHZ loop in the proposed LDO helps to stabilize its operation even for a fast (high-current) charge pump, regardless of the load capacitance, as evidenced by the results in Fig. [2.16\(](#page-51-0)c).
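The statement that a faster clock pushes the pole toward the unit circle can be illustrated with the standard mapping $z = e^{sT_s}$ of a continuous-time pole. The sketch below is only this generic mapping applied to a load pole at $f_L$, not the full model of [39]; the function name and the chosen ratios are illustrative.

```python
import math


def load_pole_z(f_load_pole_hz, f_clk_hz):
    """Map a continuous-time pole s = -2*pi*f_L to z = exp(s / f_s)."""
    return math.exp(-2.0 * math.pi * f_load_pole_hz / f_clk_hz)


# Only the ratio f_s / f_L matters, so f_L is normalized to 1 Hz here.
for ratio in (10, 100, 1000):
    z_p = load_pole_z(1.0, float(ratio))
    print(f"f_s/f_L = {ratio:5d}: z-plane pole at {z_p:.4f}")
```

As the ratio $f_s/f_L$ grows, the mapped pole approaches $z = 1$, which is the trend the text associates with reduced phase margin in a clocked digital LDO.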

Thanks to the subthreshold detection and overflow suppression techniques, the LDO is measured to stably operate at load currents from  $1 \mu A$  to 105 mA for a dynamic range of  $105,000 \times$, which is the largest amongst the prior art in Table [2.1.](#page-58-0) The dynamic range is limited by two factors. Due to the large input  $IR$  drop at large load current, the gate-source voltage is reduced, which degrades the maximum load current that the LDO can provide. The minimum load current is limited either by the leakage of the power transistor or by the leakage of the load circuit. To provide a  $>100$  mA load current at high edge rates in the on-chip load test structure, LVT transistors are used. The leakage current of these load transistors is measured to be  $1 \mu A$, which is thus the lowest current at which the implemented LDO can operate. If a better load could be designed (or the edge rate specifications could be relaxed), it is possible that the LDO could be measured to achieve an even higher dynamic range.

The current efficiency over this entire dynamic range is shown in Fig. [2.18,](#page-53-0) where a current efficiency >90% is achieved over a 2,100 $\times$  range from 50  $\mu$ A to 105 mA for DC loading conditions (noting that efficiency depends on the dynamics of the system and may get worse if the load current constantly changes by large amounts, throwing the output voltage frequently outside of the deadzone). Since the quiescent current of the LDO is independent of the DC load current, a high current-efficiency of 99.995% is achieved when the load current is large.
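The efficiency figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual definition $\eta = I_{load}/(I_{load}+I_Q)$ with a load-independent quiescent current $I_Q$, and assumes that the 99.995% peak corresponds to the 105 mA load; neither assumption is stated explicitly in this passage.

```python
def quiescent_from_efficiency(i_load, efficiency):
    """Quiescent current implied by eta = I_load / (I_load + I_Q)."""
    return i_load * (1.0 - efficiency) / efficiency


def current_efficiency(i_load, i_q):
    """Current efficiency for a load-independent quiescent current."""
    return i_load / (i_load + i_q)


# Assumption: the 99.995% figure is taken at the 105 mA maximum load.
i_q = quiescent_from_efficiency(105e-3, 0.99995)
print(f"implied quiescent current : {i_q * 1e6:.2f} uA")
print(f"efficiency at 50 uA       : {current_efficiency(50e-6, i_q) * 100:.1f} %")
print(f"full dynamic range        : {105e-3 / 1e-6:,.0f}x")
print(f">90% efficiency range     : {105e-3 / 50e-6:,.0f}x")
```

Under these assumptions the implied quiescent current is a few microamperes, and the efficiency at 50 $\mu$A lands just above 90%, consistent with the quoted 2,100$\times$ high-efficiency range.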

Thanks to the fine-tuning capabilities and  $g_m$ -adjusting weighted-CP-based design, ripple is measured to be <10 mV at both  $I_{load}$ = 100 mA and 500  $\mu$A in Fig. [2.16\(](#page-51-0)b), and <15 mV over the entire load range. The stable ripple amplitudes at different load currents verify the analysis presented in Section [2.3.](#page-32-0)B. It should be noted that the ripple amplitude is determined by the fine-tuning charge pump size, and it can be further reduced by using a smaller fine-tuning charge pump.

Measured load and line regulation results in Fig. [2.19](#page-54-0) demonstrate 0.09 mV/mA and 6 mV/V worst-case regulation, respectively. The good load and line regulation performance is mainly due to the high DC open loop gain of the charge pump.

### 2.7 Summary

An event-driven charge-pump-based LDO with an ACHZ loop is presented in this chapter. Thanks to the ACHZ loop and low-latency event-driven charge pump path, the LDO can respond in less than a clock cycle and achieves 6.9 ns and 65 ns response and settling times, respectively, with  $V_{drop}=88$  mV for an FoM of 1.8 fs. With the help of the overflow current suppression, subthreshold detection and dynamic  $g_m$-adjusting, the LDO achieves a  $105,000 \times$  load range (1  $\mu$A to 105 mA). A <15 mV stable ripple amplitude is achieved over the entire load range.

The text of Chapter 2, in part, is based on materials from Xiaoyang Wang and Patrick P. Mercier, "A Charge-Pump-based Digital LDO Employing an AC-Coupled High-Z Feedback Loop Towards a sub-4fs FoM and a 105,000x Stable Dynamic Current Range," in IEEE Custom Integrated Circuits Conference, Apr. 2019, and Xiaoyang Wang and Patrick P. Mercier, "A Dynamically High-Impedance Charge-Pump-Based LDO With Digital-LDO-Like Properties Achieving a Sub-4-fs FoM," in IEEE Journal of Solid-State Circuits, March 2020. The dissertation author was the primary investigator and author of these papers.

<span id="page-58-0"></span>



\* For large current DLDOs, the edge rate is mainly limited by the input voltage droop due to parasitics during large load transients. The input voltage droop in lower-current DLDOs is typically less affected, and thus they can be tested at a higher edge rate.

# Chapter 3

# A Fast Start-Up Crystal Oscillator with Multi-Path Feedforward Negative Resistance Boosting and Optional Dynamic Pulse Width Injection

# 3.1 Introduction

#### 3.1.1 Motivation

Many emerging Internet of Things (IoT) devices do not have constant wireless communication needs, and thus it is desirable to shut down all unnecessary blocks, including crystal oscillators (XOs), to save energy. However, the naturally high Q of a crystal resonator ( $>100,000$ ) means that baseline oscillator designs take a long time to start up (0.6-3.2ms) [\[44\]](#page-122-3). This poses two challenges: 1) the power consumption of the XO is integrated over this long time period, resulting in a large start-up energy; and 2) the long start-up time limits the ability to rapidly duty-cycle under relatively high throughputs. However, duty-cycling as an energy-saving technique is most effective when the duty ratio is  $\ll 1$  (i.e., low average throughputs), and thus to unlock the best possible energy savings from duty-cycling in most practical use cases, it is desirable to minimize XO start-up energy at application-appropriate start-up times.

Much work in the area of duty-cycled XOs is primarily focused on reducing the XO start-up time; however, depending on the complexity of the employed technique,

<span id="page-60-0"></span>

Figure 3.1: XO startup (a). no fast startup techniques applied (b). fast startup with high energy (c). energy-efficient fast startup

this does not always yield a commensurate improvement in start-up energy; moreover, in some cases significant calibration is required, or the performance of the XO during post-start-up operation can be compromised by the start-up technique. Thus, the primary goal of this work is to minimize start-up energy by minimizing the product of start-up time and instantaneous power in a robust manner, all while not compromising the power or performance during post-start-up operation, as shown in Fig. [3.1.](#page-60-0)

<span id="page-61-0"></span>

Figure 3.2: Crystal oscillator architecture and different configurations of the  $g_m$  stage.

#### 3.1.2 Prior Fast Start-Up Techniques

The general architecture of a Pierce XO is shown in Fig. [3.2,](#page-61-0) where two knobs are available to reduce  $t_{startup}$ : the initial motional current amplitude,  $i_M(0)$ , and negative resistance,  $|R_N|$ . Increasing  $i_M(0)$  via injection at  $f_{xtal}$  is usually the most impactful way to reduce start-up time [\[45\]](#page-122-4) and a constant frequency injection oscillator is usually used to perform the injection; however, getting the injection frequency precise across PVT is difficult, and thus often results in extra energy or a start-up performance degradation.
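The role of the two knobs can be made explicit with the commonly used exponential-envelope approximation of crystal start-up. This is a textbook-style sketch rather than an equation taken from this chapter; the motional inductance $L_m$, motional resistance $R_m$, and steady-state motional current $i_{M,ss}$ are symbols introduced here for illustration.

$$
i_M(t) \approx i_M(0)\,\exp\!\left(\frac{|R_N| - R_m}{2L_m}\,t\right)
\quad\Longrightarrow\quad
t_{startup} \approx \frac{2L_m}{|R_N| - R_m}\,\ln\!\left(\frac{i_{M,ss}}{i_M(0)}\right)
$$

Under this approximation, a larger $|R_N|$ shortens the time constant of the envelope growth, while a larger $i_M(0)$ reduces the number of time constants needed to reach steady state.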

To alleviate the injection requirement, a dither injection is proposed in [\[44\]](#page-122-3). The idea is to make the injection oscillator dither between two frequency points to

<span id="page-62-0"></span>

Figure 3.3: The spectrum illustration of different injection methods.

<span id="page-62-1"></span>

Figure 3.4: Negative resistance boosting of single-stage and multi-stage  $g_m$ .

spread energy into a wider frequency range. It is effective but still places relatively high requirements on the injection oscillator. To further relax the injection oscillator requirement and ensure that it is insensitive to PVT variation, chirping injection [\[46\]](#page-122-5) is used. In chirping, a periodic voltage is applied to the VCO so that its output frequency sweeps over a large frequency range, ensuring that the XO resonant frequency will always be covered in this range. But since the energy is spread over a wide frequency range, the injection efficiency is low, as shown in Fig. [3.3.](#page-62-0) Other techniques such as a precisely-timed energy injection [\[45\]](#page-122-4) or injection via a two-step process [\[47\]](#page-122-6) are also proposed to increase the injection efficiency while reducing the PVT

<span id="page-63-0"></span>

Figure 3.5: Architecture of the proposed fast start-up crystal oscillator.

requirements, and have demonstrated desirable outcomes in one or more of start-up time, start-up energy, steady-state performance, and calibration costs (though not in all four at once). For example, [\[44\]](#page-122-3) achieved a start-up time of  $64\mu s$, yet at a power of  $393\mu W$  (start-up energy not measured); [\[45\]](#page-122-4) achieved a very fast start-up time of  $2.2\mu s$  and low start-up energy of 13.3nJ, yet with low output voltage (<200mV) and degraded phase noise; while [\[47\]](#page-122-6) achieved a good start-up time of  $19\mu s$, yet with a moderate start-up energy of 34.9nJ. If a precisely timed, temperature-stable injection source is available, both start-up time and energy can be improved to  $2.2\mu s$  and 13.3nJ, respectively, as in [\[45\]](#page-122-4). However, this was only demonstrated for a low steady-state amplitude, which may degrade phase noise, while stabilization to a larger amplitude would require a longer injection time, and thus an even more PVT-tolerant oscillator.

On the other hand, boosting  $|R_N|$  can be less sensitive to PVT and easier to implement. A single-$g_m$ stage is the most commonly used configuration because of its simplicity and low phase noise. The  $R_N$  equation in Fig. [3.2](#page-61-0) reveals two poles in its denominator, resulting in an  $R_N$  amplitude that decreases with frequency and a small negative resistance, as shown in Fig. [3.4.](#page-62-1) To boost the negative resistance, a 3-stage configuration can be used, as shown in Fig. [3.2.](#page-61-0) Here, a two-stage amplifier is placed before the transconductance stage to boost the equivalent  $g_m$  to  $A_{v1} A_{v2} g_m$  with a low power overhead. The negative resistance is boosted to 2 k$\Omega$ in this case. However, the two poles still dominate the frequency response. To further boost the negative resistance, a feedforward path is introduced. This path introduces a zero in the frequency response and drastically improves  $R_N$  around the oscillation frequency, but it could only achieve a start-up time  $>100\mu s$  for the frequency to settle to within  $\pm 20$ ppm, even with an inductive 3-stage  $g_m$  or single-stage capacitive feedforward technique [\[46,](#page-122-5) [48\]](#page-122-7).

This chapter presents a fast start-up XO that reduces both start-up time and energy via an elegant and effective multi-capacitor feedforward  $|R_N|$  boosting technique that adds additional zeros into the frequency response of  $|R_N|$  in a temperature-robust manner. To further improve start-up speed, yet with a more energy/cost-favorable imprecise on-chip oscillator, an optional dynamic pulse width (DPW) injection is also proposed. The overall system architecture is shown in Fig. [3.5.](#page-63-0) Both of the proposed techniques are robust to temperature and supply variation, and achieve a ∼11x improvement in start-up time compared to a conventional 3-stage-only XO while adding minimal hardware overhead.

## 3.2 Proposed Fast Start-up Techniques

#### 3.2.1 Multi-Capacitor Feedforward Negative Resistance Boosting

The equation modeling the negative resistance of a conventional single-stage Pierce XO,  $|R_N|$ , is shown in Fig. [3.2.](#page-61-0)A, revealing two poles in its denominator, which results in a decreasing  $|R_N|$  amplitude with frequency. As described in [\[48\]](#page-122-7), placing two amplifiers in front of the last  $g_m$  stage (Fig. [3.2.](#page-61-0)B) can boost the equivalent  $G_M$  to  $A_{V1}A_{V2}g_m$ . The power overhead is only 1.8x since the first two amplifiers do not need to drive large loads. However, the two poles still dominate the frequency response.
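The frequency dependence described above can be sketched numerically with the widely used simplified Pierce expression $|R_N| \approx g_m/(\omega^2 C_1 C_2)$, which neglects the crystal shunt capacitance and any amplifier poles. This is a stand-in for the equation in Fig. 3.2, which is not reproduced here; the load capacitors `c1`, `c2`, the transconductance, and the stage gains are hypothetical values.

```python
import math


def pierce_rn_magnitude(g_m, c1, c2, f_hz):
    """Simplified single-stage Pierce |R_N| = g_m / (w^2 * C1 * C2)."""
    w = 2.0 * math.pi * f_hz
    return g_m / (w * w * c1 * c2)


def boosted_rn_magnitude(g_m, a_v1, a_v2, c1, c2, f_hz):
    """Idealized 3-stage case: equivalent G_M = A_V1 * A_V2 * g_m.

    Amplifier bandwidth is ignored, so this is an upper bound.
    """
    return pierce_rn_magnitude(a_v1 * a_v2 * g_m, c1, c2, f_hz)


# Hypothetical values chosen only to show the trends.
G_M, C1, C2 = 1e-3, 10e-12, 10e-12
for f in (16e6, 20e6, 32e6):
    single = pierce_rn_magnitude(G_M, C1, C2, f)
    boosted = boosted_rn_magnitude(G_M, 3.0, 3.0, C1, C2, f)
    print(f"{f / 1e6:4.0f} MHz: single {single:7.1f} ohm, 3-stage {boosted:8.1f} ohm")
```

The $1/\omega^2$ roll-off is the behavior the feedforward zeros are intended to counteract: gain stages alone scale $|R_N|$ up but leave the slope unchanged in this idealized picture.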

Adding a feedforward capacitor,  $C_F$ , across the first gain stage (Fig. [3.2.](#page-61-0)C) introduces a zero in the frequency response which, at the right frequency, can help boost

<span id="page-65-0"></span>

**Figure 3.6:** Comparison of  $|R_N|$  boosting performances for (a) different  $g_m$  stage configurations (b) 3-stage  $g_m$  with single feedforward and multiple feedforward paths.

 $|R_N|$  by as much as 8x, as shown in Fig. [3.6\(](#page-65-0)a). However, measurement results in [\[48\]](#page-122-7) reveal that with this technique a start-up time of  $>100\mu s$  is still needed to settle to within  $\pm 20$ ppm. Moreover, since the peak amplitude and center frequency of  $|R_N|$ are both related to  $C_F$ , there is a trade-off between the maximum achievable  $|R_N|$  and

<span id="page-66-0"></span>

Figure 3.7: Relationship between the injection time and lobe width in the frequency domain.

the frequency at which this occurs. As shown by the red curves in Fig. [3.6\(](#page-65-0)b), with a larger  $C_{F1}$, the peak  $|R_N|$  amplitude increases; however, it can only be achieved at a lower frequency, introducing a  $|R_N|_{max}$-frequency trade-off. For some applications, however, PLLs prefer a higher-frequency crystal oscillator to reduce the frequency multiplication factor N and thereby the output phase noise. Therefore, it is important to achieve a wide frequency range while maintaining a fast start-up time.

To simultaneously break this trade-off and further boost  $|R_N|$ , an additional feedforward capacitor,  $C_{F2}$ , is added in this work (Fig. [3.2.](#page-61-0)D), which introduces multiple zeros into the frequency response to boost  $|R_N|$  by 245x (Fig. [3.6\(](#page-65-0)a)). Importantly,  $C_{F1}$ and  $C_{F2}$  can be independently tuned towards a stable, yet reconfigurable  $|R_N|$  boost across a much wider range of XO frequencies than single-capacitor solutions (green curves in Fig. [3.6\(](#page-65-0)b)), enabling a solution that is much more easily amenable to different crystal frequencies.

#### 3.2.2 Dynamic Pulse Width Injection

Constant frequency injection is the most effective method to reduce start-up time, and the injection oscillator is usually implemented using a ring or relaxation oscillator. However, due to the high  $Q$  of the crystal resonator, the injection frequency needs to be very close to the XO frequency ( $\lt \pm 0.5\%$ ) to be effective [\[44\]](#page-122-3). This can

<span id="page-67-0"></span>

Figure 3.8: (a) Temperature variation results in a frequency drift of the injection oscillator and ineffective injection; (b) a shorter injection time generates a wider lobe width which ensures sufficient energy exists at  $f_{xtal}$ .

<span id="page-67-1"></span>

Figure 3.9: Working principle of the dynamic pulse width (DPW) injection technique.

be fulfilled by calibrating the frequency of the on-chip injection oscillator at a nominal temperature. However, the frequency of the injection oscillator will drift when environmental temperature changes. As a result, the energy injected into the XTAL resonator will be seriously decreased, as shown in Fig. [3.8\(](#page-67-0)a).

According to the Fourier transform, the lobe width in the frequency domain is inversely proportional to the injection time, as shown in Fig. [3.7.](#page-66-0) Interestingly, reducing the injection time widens the main lobe of the injection signal's spectrum so that it covers a wider bandwidth, which results in more energy being injected into the XTAL resonator when the injection frequency is not precise, as shown in Fig. [3.8\(](#page-67-0)b). In this design, a dynamic pulse width injection approach is proposed to exploit this feature to compensate for the temperature-dependent frequency drift of the injection source, and thereby

<span id="page-68-0"></span>

Figure 3.10: Circuit implementation of the 3-stage multi-feedforward path XO core.

<span id="page-68-1"></span>

Figure 3.11: Circuit implementation of the dynamic pulse width injection oscillator.

enable inclusion of a lower-power, less-precise source.
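The relation between injection time and spectral coverage can be sketched with the spectrum of an ideal rectangular-gated tone. The code below evaluates the spectral envelope (constant factors dropped) at an offset $\Delta f$ from the injection frequency; the 100 kHz offset is an illustrative choice corresponding to 0.5% of a 20 MHz crystal, and the rectangular gate is an idealization of the real injection waveform.

```python
import math


def burst_spectrum_envelope(delta_f_hz, t_inj_s):
    """Envelope T*|sinc(delta_f*T)| of a rectangular-gated tone.

    delta_f_hz is the offset between injection and crystal frequency.
    Constant scale factors are dropped.
    """
    if delta_f_hz == 0.0:
        return t_inj_s
    return abs(math.sin(math.pi * delta_f_hz * t_inj_s)) / (math.pi * delta_f_hz)


def main_lobe_width_hz(t_inj_s):
    """Null-to-null main-lobe width of a rectangular burst of length T."""
    return 2.0 / t_inj_s


DELTA_F = 100e3  # illustrative frequency error of the injection oscillator
for t_inj in (2.5e-6, 5e-6, 10e-6):
    env = burst_spectrum_envelope(DELTA_F, t_inj)
    print(f"T_inj = {t_inj * 1e6:4.1f} us: lobe width {main_lobe_width_hz(t_inj) / 1e3:6.0f} kHz, "
          f"envelope at offset {env:.3e}")
```

For a fixed frequency error, the envelope at the crystal frequency peaks at $T_{inj} = 1/(2\Delta f)$ and falls to a null at $T_{inj} = 1/\Delta f$, so in this idealized model injecting for longer does not help once the oscillator has drifted.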

The technique works as follows. At nominal temperature, the frequency of the injection oscillator is calibrated to be near the XO's frequency. By setting a proper injection time and lobe width, the XTAL can be sufficiently energized even if there is a large mismatch between the injection and resonator frequencies. When the temperature changes and the frequency of the injection oscillator deviates, the injection time is reduced accordingly, thereby widening the lobe width of the injection signal to ensure that sufficient energy exists at  $f_{xtal}$. Here, proportional to absolute temperature (PTAT) and complementary to absolute temperature (CTAT) pulse width generators are used to control the injection time. When the temperature is lower than the nominal temperature,

the PTAT pulse width generator controls the injection time, and when the temperature is higher than the nominal temperature, the CTAT generator takes over, as shown in Fig. [3.9.](#page-67-1) By controlling the slopes of the PTAT and CTAT pulse generators, the effects of frequency drifting can be compensated by a two-segment curve, even if the frequency-temperature characteristic of the injection oscillator is nonlinear.

# 3.3 Circuit Implementation

The overall XO architecture with the optional dynamic pulse width (DPW) injection circuit is shown in Fig. [3.5.](#page-63-0) A simple on-chip LDO is implemented to provide a 0.8V power supply to the DPW injection circuits for supply variation robustness.

The XO core circuit is shown in Fig. [3.10.](#page-68-0) A 3-stage AC-coupled inverter chain biased by a constant- $g_m$ circuit is used instead of self-biased inverters to minimize  $|R_N|$ variation over temperature. Compared to constant- $g_m$ biasing, self-biased inverters are sensitive to temperature variation and can degrade  $|R_N|$  to less than 500 $\Omega$ according to simulation. With the constant- $g_m$ biasing,  $|R_N| > 60\,\text{k}\Omega$ is ensured across the entire $-20\,^{\circ}$C to $100\,^{\circ}$C temperature range.  $C_{F1}$  and  $C_{F2}$  are implemented using two 6-bit MIM capacitors totaling 2pF, and are switched off to save 21% in power after start-up.

The implementation of the DPW injection circuit is shown in Fig. [3.11.](#page-68-1) When reset is set from low to high,  $V_{pulse}$  is pulled up to  $V_{DD}$  and enables the injection oscillator, and two current sources with opposing temperature coefficients start to charge capacitors  $C_{CG}$. When  $V_{CG,1}$ ( $V_{CG,2}$ ) reaches  $V_{ref}$, the output of the continuous-time comparator flips and sets  $V_1$ ( $V_2$ ) to GND, which pulls  $V_{pulse}$  down and ends the injection, as shown in Fig. [3.11.](#page-68-1) At nominal temperature, the charge rates are equal and generate the maximum pulse width, but away from nominal, one or the other will become larger, producing a narrower pulse. By setting the temperature coefficients of  $I_{PTAT}$ and  $I_{CTAT}$, the two pulse width slopes in Fig. [3.9](#page-67-1) can be made different, which can compensate for nonlinear frequency characteristics of the injection oscillator.
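The pulse-width behavior described above can be sketched with a first-order model in which each capacitor charges linearly and the first branch to reach $V_{ref}$ ends the pulse. Every numeric value below (capacitance, reference voltage, nominal current, and the linear temperature coefficients) is a hypothetical placeholder; only the structure, two opposing-slope currents and a first-to-trip termination, follows the text.

```python
def dpw_pulse_width(temp_c, c_cg=1e-12, v_ref=0.4, i_nom=1e-6,
                    t_nom_c=25.0, tc_ptat=0.004, tc_ctat=-0.003):
    """Pulse width ended by whichever capacitor reaches V_ref first.

    All default values are illustrative assumptions. Currents use a
    linear temperature model around the nominal temperature t_nom_c.
    """
    d_t = temp_c - t_nom_c
    i_ptat = i_nom * (1.0 + tc_ptat * d_t)   # grows with temperature
    i_ctat = i_nom * (1.0 + tc_ctat * d_t)   # shrinks with temperature
    # Linear ramp: time to reach V_ref is C * V_ref / I for each branch.
    t_branch_ptat_current = c_cg * v_ref / i_ptat
    t_branch_ctat_current = c_cg * v_ref / i_ctat
    # The faster branch (larger current) trips first and ends the pulse.
    # Below nominal that is the CTAT-current branch, whose width rises
    # with temperature; above nominal it is the PTAT-current branch.
    return min(t_branch_ptat_current, t_branch_ctat_current)


for temp in (-20.0, 25.0, 100.0):
    print(f"{temp:6.1f} C: pulse width {dpw_pulse_width(temp) * 1e9:.1f} ns")
```

Choosing different magnitudes for `tc_ptat` and `tc_ctat` gives different slopes on the two sides of the nominal temperature, which is the degree of freedom the text uses for a two-segment fit to the oscillator drift.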

The injection oscillator is implemented using an on-chip three-stage RC-loaded

<span id="page-70-0"></span>

Figure 3.12: Measured XO start-up time for different  $g_m$  stage configurations.

ring oscillator. Resistor  $R_D$  in each stage is designed to be much larger than the inverter's on-resistance, and is implemented as a series combination of PTAT and CTAT resistors. A 6-bit MIM capacitor totaling 776fF is used to achieve a tuning range from 12.8MHz to 21.7MHz with a minimum step size of 180kHz.

## 3.4 Measurement Results and Conclusions

The proposed XO is implemented in 65nm CMOS and tested with 20MHz and 16MHz crystals. Fig. [3.12](#page-70-0) shows the measured frequency settling performance of the 20MHz XO with zero feedforward, single-capacitor feedforward, multi-capacitor feedforward, and multi-capacitor feedforward with DPW injection. The start-up time is defined as the time for the frequency to settle to within  $\pm 20$ ppm of the target frequency, which is well within the requirement of many IoT standards such



Figure 3.13: Measured XO start-up time for multi-path feedforward w/ and w/o DPW injection.

as BLE ( $\pm$ 41ppm). Multi-capacitor feedforward alone requires only 40 $\mu$ s to settle within  $\pm 20$ ppm of the target frequency – an 8.5x improvement compared to 3-stage  $g_m$  only. Importantly, the start-up energy of the two-capacitor feedforward approach is only 10.2nJ. Adding the optional DPW injection feature helps to further improve the start-up time to  $30\mu s$, with, due to the low-power imprecise injection source, an increase in start-up energy of only 0.9nJ.
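The quoted 20MHz figures can be tied together with simple arithmetic. The sketch below derives quantities that are not stated directly in the text, namely the 3-stage baseline start-up time implied by the 8.5x ratio and the average power over the start-up interval; these are back-of-the-envelope values under the assumption that the quoted numbers are exact, not additional measurements.

```python
T_MULTI_FF_US = 40.0      # multi-capacitor feedforward only
T_WITH_DPW_US = 30.0      # with optional DPW injection
E_MULTI_FF_NJ = 10.2      # start-up energy, feedforward only
E_DPW_EXTRA_NJ = 0.9      # additional energy due to DPW injection
SPEEDUP_VS_3STAGE = 8.5   # quoted improvement over 3-stage g_m only

# Baseline implied by the quoted ratio (derived, not measured here).
t_baseline_us = T_MULTI_FF_US * SPEEDUP_VS_3STAGE
speedup_with_dpw = t_baseline_us / T_WITH_DPW_US
e_total_nj = E_MULTI_FF_NJ + E_DPW_EXTRA_NJ

# nJ / us = mW, so multiply by 1000 to report microwatts.
p_avg_ff_uw = E_MULTI_FF_NJ / T_MULTI_FF_US * 1e3
p_avg_dpw_uw = e_total_nj / T_WITH_DPW_US * 1e3

print(f"implied 3-stage baseline : {t_baseline_us:.0f} us")
print(f"speed-up with DPW        : {speedup_with_dpw:.1f}x")
print(f"total start-up energy    : {e_total_nj:.1f} nJ")
print(f"avg start-up power       : {p_avg_ff_uw:.0f} uW (FF), {p_avg_dpw_uw:.0f} uW (FF+DPW)")
```

The total of 11.1 nJ and the roughly 11x speed-up match the headline numbers quoted elsewhere in the chapter, and the derived average powers show that DPW trades a somewhat higher instantaneous power for a shorter start-up.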

To validate the performance of the proposed multi-path feedforward negative resistance boosting technique over PVT variation, 10 chips were measured at 20MHz in Fig. [3.14\(](#page-72-0)a) from $-20\,^{\circ}$C to $100\,^{\circ}$C, demonstrating that start-up time was kept below  $62\mu s$, and was on average better than  $40\mu s$  at room temperature. Likewise, start-up time was shown to vary by only  $7\mu s$  when the supply voltage changes from 0.95V to 1.2V. The ring oscillator frequency and injection pulse width are also measured and shown in Fig. [3.14\(](#page-72-0)b) and (c), demonstrating that the XO frequency is always located within and near the center of the main lobe of the injection spectrum. The phase noise is measured to be -146.6dBc/Hz at a 10kHz offset, which is adequate for BLE and IEEE 802.15.4 standards. A table of comparisons is shown in Table I, and a die photo is shown in Fig. [3.15.](#page-72-1) The design achieves fairly competitive start-up time and, more importantly,


Figure 3.14: Measurement result of 20MHz XO: (a) Start-up time measurement results with only proposed negative resistance boosting technique of multiple chips. (b) Injection oscillator frequency variation vs temperature. (c). Injection time vs temperature



Figure 3.15: Micrograph of the fabricated fast start-up XO chip.

achieves a state-of-the-art start-up energy, all in a robust manner.

# 3.5 Summary

This chapter presents a fast start-up crystal oscillator (XO) that reduces both start-up time and energy via an elegant and effective multi-path feedforward  $|R_N|$  boosting technique. To further improve start-up speed, yet with a more energy/cost-favorable imprecise on-chip ring oscillator, an optional dynamic pulse width (DPW) injection is also proposed. The proposed fast start-up technique is implemented in a 65nm process and works with 20MHz and 16MHz crystals, achieving start-up times of  $30\mu s$  and  $34\mu s$  while consuming 11.1nJ and 13.2nJ, respectively. Multiple chips are measured over temperature and supply voltage to verify the robustness of the employed techniques.

The text of Chapter 3, in part, is based on materials from Xiaoyang Wang and Patrick P. Mercier, "An 11.1nJ-Start-up 16/20MHz Crystal Oscillator with Multi-Path Feedforward Negative Resistance Boosting and Optional Dynamic Pulse Width Injection" in IEEE Custom Integrated Circuits Conference, Mar. 2020. The dissertation author was the primary investigator and author of this paper.





# Chapter 4

# A Battery-Powered Wireless Ion Sensing System

### 4.1 Introduction

Advances in sensors and wireless technologies have enabled new and exciting classes of wearable devices for applications in precision athletics, health, and wellness. However, growth in the wearables market has been slower than many expected, in part due to challenges related to device size, battery life, and sensing capabilities. For example, many current wearables with relatively sophisticated capabilities are larger than desired, in part due to the requirement of a large battery, which is necessary to have acceptable battery life given the relatively high power consumption of underlying circuits. In addition, many wearable devices currently measure only a small handful of physical or electrophysiological parameters such as pressure [\[49\]](#page-122-0), motion [\[50\]](#page-122-1), temperature [\[51\]](#page-122-2), electrocardiography (ECG) [\[52\]](#page-123-0), or electroencephalography (EEG) [\[53\]](#page-123-1). While such parameters might be useful for general well-being or for very specific medical use-cases, more sophisticated sensing functionality is desired to make information derived from wearables more actionable and/or impactful across a wide range of applications.

Measurement of physiochemical parameters, for example ion homeostasis in sweat, blood, saliva, or tears, can potentially provide valuable additional insight into a user's overall health status. For example, measurement of sodium together with heart rate may enable real-time assessment of the risk of congestive heart failure, while monitoring of sodium and calcium may enable diagnosis or monitoring of syndrome of inappropriate anti-diuretic hormone secretion (SIADH) or hyponatremia. While previous work has demonstrated that real-time sensing of ion concentration is possible on the body via low-cost patches, temporary tattoos, and other form factors [\[54](#page-123-2)[–56\]](#page-123-3), such sensors were not integrated with small, ultra-low-power sensing instrumentation and/or wireless communication functionality.

<span id="page-76-0"></span>

Figure 4.1: Block diagram of the wireless ion sensing system.

This chapter presents the design of a sensing system that integrates ion-selective electrodes with ultra-low-power sensor instrumentation, a wireless transmitter, and power management circuits [\[57\]](#page-123-4). A block diagram of the system is shown in Fig. [4.1.](#page-76-0) All circuit blocks, described in detail in Section 4.2, are carefully designed and optimized to consume nW power levels (or lower) in order to ensure ultra-long operation under battery power, or in future implementations via small energy harvesters (e.g., [\[58,](#page-123-5) [59\]](#page-123-6)). Measurement results presented in Section [3.4](#page-70-0) reveal the nW-level power consumption with acceptable system-level performance, including in-vitro results of sodium ion sensing.

### 4.2 Wireless Ion Sensing Platform

The proposed ion sensing system, shown in Fig. [4.1,](#page-76-0) comprises an ion selective electrode (ISE) that interfaces to a high-impedance potentiometric amplifier. A co-fabricated reference electrode (RE) is driven by a reference voltage generator, and is used to set the solution potential. The output of the potentiometric amplifier is digitized by a 10 S/s reference-free charge sharing analog-to-digital converter (ADC). Digital samples are then serialized, shaped, and then sent to a 2.4 GHz RF transmitter. The overall wireless sensing system is powered from a 1.8 V supply, for example from a small on-board battery, which is divided by a 3:1 switched-capacitor DC-DC converter to generate a 0.6 V supply voltage,  $V_{DD}$ , used by the majority of the circuits on-chip. A second supply voltage,  $V_{DDH} = 1.2$  V, is generated by an on-chip charge pump for the purposes of increasing the  $I_{ON}/I_{OFF}$  ratio of critical transistors via boosted gate driving or super cut-off gating. A start-up circuit is employed to ensure the DC-DC converter is clocked during cold-start before the switched-capacitor output is stabilized. Power gating, implemented with thick-gate transistors, is utilized to minimize the leakage current of the critical blocks during the off-state. A Serial Peripheral Interface (SPI) bus is implemented to perform benchtop calibration to characterize each block in Fig. [4.1.](#page-76-0) Design considerations and implementation details of each block are presented in the following sub-sections.
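As background for the potentiometric front end, the expected electrode sensitivity can be estimated from the Nernst equation, which is standard electrochemistry rather than a result of this chapter. The sketch below computes the ideal slope in mV per decade of ion activity; real ISEs typically deviate from this ideal, and the function names are introduced here for illustration.

```python
import math

GAS_CONSTANT = 8.314462618      # J / (mol K)
FARADAY_CONSTANT = 96485.33212  # C / mol


def nernst_slope_mv_per_decade(temp_c, charge=1):
    """Ideal Nernstian slope (mV per decade of activity) for ion charge z."""
    temp_k = temp_c + 273.15
    return 1e3 * GAS_CONSTANT * temp_k * math.log(10.0) / (charge * FARADAY_CONSTANT)


def ideal_potential_change_mv(conc_ratio, temp_c=25.0, charge=1):
    """Ideal change in electrode potential for a given concentration ratio."""
    return nernst_slope_mv_per_decade(temp_c, charge) * math.log10(conc_ratio)


print(f"Na+ slope at 25 C : {nernst_slope_mv_per_decade(25.0):.2f} mV/decade")
print(f"10x concentration : {ideal_potential_change_mv(10.0):.2f} mV")
```

For a monovalent ion such as Na$^+$ the ideal slope is on the order of tens of millivolts per decade, which gives a sense of the signal range the amplifier and ADC need to resolve.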

#### 4.2.1 Fabrication of ISEs and Implementation of the Potentiometric Front End

The ISE and RE were fabricated utilizing screen-printing technology by employing an MPM-SPM semi-automatic screen printer (Speedline Technologies, Franklin, MA). Figure [4.2\(](#page-78-0)a) outlines the overall fabrication procedure. A sequence of silver/silver chloride (Ag/AgCl) and graphite layers followed by an insulator layer was printed on a polyethylene terephthalate (PET) substrate, followed by a curing step in a

<span id="page-78-0"></span>

Figure 4.2: Fabrication procedure of the ISE (a) and photo of a fabricated ISE (b).

convection oven after the printing step of each layer. Specifically, the Ag/AgCl ink was cured at 85 °C for 10 minutes, the carbon ink at 80 °C for 10 minutes, and the insulator layer at 90 °C for 15 minutes.

The ISE was then modified with a membrane consisting of 1 mg of sodium ionophore X, 0.55 mg of sodium tetrakis (3,5-bis[trifluoromethyl]phenyl) borate (Na-TFPB), 33 mg of polyvinyl chloride (PVC), and 65.45 mg of bis (2-ethylhexyl) sebacate (DOS), all dissolved in 660 $\mu$L of nitrogen-purged tetrahydrofuran (THF). Up to 4 $\mu$L of the sodium ion ($Na^+$) selective membrane cocktail was then drop-cast on the carbon indicator electrode and left overnight to dry under ambient conditions.

The reference membrane contains electrolytes and forms a nanoporous structure that allows the exchange of electrolytes with the solution, providing a stable potential that is insensitive to changes in the ion concentration over a large concentration range. It was prepared by dissolving 78.1 mg of polyvinyl butyral resin BUTVAR B-98 (PVB) and 50 mg of NaCl in 1 mL of methanol. The reference electrode was then modified with 3 $\mu$L of the PVB membrane and left to dry overnight alongside the ISE under ambient conditions.

As shown in Fig. [4.2\(](#page-78-0)b), the fabricated ISE consists of a pseudo reference electrode, driven by the on-chip reference voltage generator, and a working carbon electrode, which interfaces with the on-chip potentiometric amplifier. A blue insulator was screen

<span id="page-79-0"></span>

Figure 4.3: (a) Equivalent circuit model of the ISE. (b) Schematic of the potentiometric amplifier. (c) Schematic of the reference voltage generator.

printed over the surface of the electrode pattern to confine the electrode and contact areas and prevent contamination leakage. The electrodes are disposable and an equivalent circuit model [\[60\]](#page-123-7) is shown in Fig. [4.3\(](#page-79-0)a), where  $R_M$  is the membrane resistance,  $C_{DL}$  is the double-layer capacitance,  $Z_1$  is the Warburg diffusional element, and  $Z_2$  represents mobile cation and anion transport through a hydrated film.

To handle the ∼0.2–0.5 V input range from the ∼GΩ ISE (dominated by $R_M$ in Fig. [4.3\(](#page-79-0)a)) under the constraints of a 0.6 V supply voltage, a two-stage differential-to-single-ended amplifier with at most 3 stacked transistors is employed with a 0.3 V output swing, a 50 dB gain, and a 57 Hz unity-gain bandwidth, as shown in Fig. [4.3\(](#page-79-0)b), and configured in unity gain feedback to operate as an impedance buffer. Simulation results show that the potentiometric amplifier achieves $\sim$T$\Omega$ input impedance, sufficiently large for interfacing with the $\sim$G$\Omega$ ISE, and an output noise of 40 $\mu$V. A MOS-bipolar pseudoresistor-based ladder with a tuning step of 10 mV is implemented to generate a reference voltage, with a simulated 380 $\mu$V variation across 0 to 100 °C and a standard deviation of 4.8 mV due to process variation. This voltage is then buffered by a two-stage amplifier, which directly drives the reference electrode (Fig. [4.1\)](#page-76-0).

<span id="page-80-0"></span>

Figure 4.4: (a) Schematic of the reference-free charge-sharing ADC with offset attenuation and (b) charge-sharing during sample/hold and bit conversion.

# 4.2.2 Implementation of the ADC and Digital Processing Unit

The output of the potentiometric amplifier interfaces directly to a fully-integrated SAR ADC with all necessary peripheral circuitry included. Though previous works have demonstrated SAR ADC [\[61\]](#page-123-8) structures that achieve power efficient operation, most such solutions require external blocks such as reference generators which can consume even more power than the ADCs [\[62\]](#page-123-9). Thus, a reference-free charge-sharing SAR ADC architecture [\[63\]](#page-124-0) is utilized for digitization in the proposed ion sensing system, where energy from the signal itself is bottom-plate sampled and then charge-shared across the capacitive DAC during bit cycling. The overall schematic of the SAR ADC is shown in Fig. [4.4\(](#page-80-0)a). Here, the input signals,  $V_N$  and  $V_P$ , which can be represented by  $-v_{sig} + V_{CM}$  and  $+v_{sig} + V_{CM}$ , respectively, are first sampled onto both the sample and hold capacitors,  $C_{SH}$ , and the binary-weighted DAC capacitors,  $C_1$  to  $C_N$ . During bit conversion, as shown in Fig. [4.4\(](#page-80-0)b), the differential signal,  $2v_{sig}$ , is mapped to a charge signal,  $Q_{sig} = 2v_{sig}C_{SH}$ . In the meantime,  $V_{CM}$  is extracted and converted to reference charge,  $(Q_N = 2V_{CM}C_N)$ . The binary weighted  $Q_N$  are then successively connected to  $V_P$  and  $V_N$  to approximate  $Q_{sig}$  until the residual charge between  $V_P$  and  $V_N$  converges to zero.
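The charge-domain search described above can be illustrated with a small behavioral sketch. It is an idealized model only: the capacitor sizing, the common-mode level, and the function name `charge_sharing_sar` are assumptions made for illustration, and comparator noise, offset, and parasitics are ignored.

```python
def charge_sharing_sar(v_sig, v_cm=0.3, c_sh=1.0e-12, n_bits=10):
    """Idealized charge-domain SAR search; all parameter values are assumed."""
    # Signal charge from the differential input 2*v_sig sampled on C_SH.
    q_res = 2.0 * v_sig * c_sh
    # Assumed binary-weighted DAC sizing: C_i = C_SH / 2**(i + 1), which
    # makes the full-scale input range |v_sig| <= v_cm.
    c_dac = [c_sh / 2 ** (i + 1) for i in range(n_bits)]
    bits = []
    for c_i in c_dac:
        q_ref = 2.0 * v_cm * c_i          # reference charge Q = 2*V_CM*C_i
        bit = 1 if q_res >= 0.0 else 0    # ideal comparator decision
        bits.append(bit)
        # Connect the reference charge with the polarity that shrinks the residue.
        q_res += -q_ref if bit else q_ref
    q_lsb = 2.0 * v_cm * c_dac[-1]        # smallest reference charge
    return bits, q_res, q_lsb


bits, q_res, q_lsb = charge_sharing_sar(v_sig=0.1)
print(bits, abs(q_res) <= q_lsb)
```

In this simplified picture the residual charge after the last bit is bounded by the smallest reference charge, which is the sense in which the residue "converges to zero" as more bits are resolved.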

<span id="page-81-0"></span>

Figure 4.5: (a) Schematic of the energy-efficient two-stage comparator. (b) Differential kickback current to the comparator input without cascode stage. (c) Differential kick-back current to the comparator input with cascode stage.

The comparator was implemented with an energy-efficient dynamic two-stage topology [\[64\]](#page-124-1), and is shown in Fig. [4.5\(](#page-81-0)a). The first stage (indicated by the dashed box in Fig. [4.5\(](#page-81-0)a)) amplifies the differential input signals from $V_{\{INP,INN\}}$ to $V_{\{FP,FN\}}$, while the second stage consists of both a simple voltage amplification stage and a positive feedback loop to achieve rail-to-rail outputs, $V_{\{OUTP,OUTN\}}$. The input referred noise, $\sigma_V$, of the comparator, which is dominated by the input pair of the first stage, is given by:

<span id="page-81-1"></span>
$$
\sigma_V^2 = \frac{8kT}{C_P} \frac{\phi_t}{V_{threshold}},\tag{4.1}
$$

where  $C_P$  is the parasitic capacitance at the output nodes of the first stage,  $\phi_t$  is the thermal voltage, and  $V_{threshold}$  is the threshold voltage at which the first stage stops and the second stage takes over. To achieve a 10-bit resolution at 0.6 V,  $C_P$  was designed to be larger than 40 fF as indicated by [\(4.1\)](#page-81-1), and tunable for comparator offset calibration.
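As a rough numerical check of the 40 fF design point, the sketch below evaluates (4.1). The right-hand side of (4.1) has units of V², so it is read here as a noise power and the RMS value is obtained by taking the square root. The values of `v_threshold` (0.3 V), the temperature (300 K), and the 0.6 V full-scale range used for the LSB are assumptions, since the text does not state them.

```python
import math

K_B = 1.380649e-23           # Boltzmann constant, J/K
Q_E = 1.602176634e-19        # elementary charge, C


def comparator_noise_rms(c_p, v_threshold, temp_k=300.0):
    """RMS input-referred noise from (4.1), read as a variance in V^2."""
    phi_t = K_B * temp_k / Q_E                       # thermal voltage, V
    variance = 8.0 * K_B * temp_k / c_p * phi_t / v_threshold
    return math.sqrt(variance)


# C_P = 40 fF is from the text; v_threshold = 0.3 V is an assumed value.
sigma = comparator_noise_rms(c_p=40e-15, v_threshold=0.3)
lsb = 0.6 / 2 ** 10          # assumed 0.6 V full scale, 10 bits
print(f"sigma = {sigma * 1e6:.0f} uV, LSB = {lsb * 1e6:.0f} uV")
```

Under these assumptions the noise comes out at roughly 270 µV, which is below one LSB; a larger $C_P$ lowers it further, in line with the lower bound on $C_P$ stated above.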

Kick-back noise can also be very significant in dynamic comparators. Conventionally, the magnitude of the kick-back noise can be minimized by employing a large capacitor at the comparator input, $C_{cmp,in}$, which effectively creates a low input impedance. However, in the proposed structure, $C_{cmp,in}$ consists of $C_{att}$ in series with $C_{DAC}$. With a gate-drain capacitance $C_{gd} > 5$ fF in the input stage of the comparator, significant kick-back noise (hundreds of $\mu$V) can be observed at the input of the comparator and thus must be carefully considered in this design. As shown in Fig. [4.5\(](#page-81-0)b), when the clock signal is high (i.e., when the first stage is in voltage amplification phase), voltages that approximate ramps, $V_{\{rampN, rampP\}}$, are generated at the output nodes of the first stage, which can be given by:

$$
V_{\{rampN, rampP\}} = \frac{I_{CMP}}{2C_P}t,\tag{4.2}
$$

where  $I_{CMP}$  is the DC operating current of the first stage of the comparator.  $V_{\{rampN, rampP\}}$ , in return, introduce a kick-back current,  $I_{gate}$ , via the drain-to-gate capacitor,  $C_{gd}$ , of the input transistor pair, which can be computed as:

$$
I_{gate} = \frac{I_{CMP}}{2C_P} C_{gd}.\tag{4.3}
$$

During each comparison cycle, the first stage operates for a period of $T_{int}$ before the second stage begins to work. $T_{int}$ is given by:

$$
T_{int} = \frac{2C_P}{I_{CMP}} V_{threshold}.\tag{4.4}
$$

As a result, the total charge variation introduced by the kick-back current during one comparison cycle can be calculated by:

$$
Q_{var} = V_{threshold} C_{gd}.\tag{4.5}
$$
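The chain from (4.2) to (4.5) can be checked numerically: multiplying the kick-back current by the integration time cancels both $I_{CMP}$ and $C_P$. In the sketch below, $C_P$ = 40 fF and $C_{gd}$ = 5 fF follow the bounds quoted in the text, while `i_cmp` and `v_threshold` are hypothetical values chosen only to exercise the formulas.

```python
import math


def kickback(i_cmp, c_p, c_gd, v_threshold):
    """Evaluate (4.2)-(4.5) for one comparison cycle."""
    ramp_slope = i_cmp / (2.0 * c_p)             # dV/dt of the ramp in (4.2), V/s
    i_gate = ramp_slope * c_gd                   # kick-back current, (4.3)
    t_int = 2.0 * c_p / i_cmp * v_threshold      # first-stage integration time, (4.4)
    q_var = i_gate * t_int                       # total kick-back charge
    return i_gate, t_int, q_var


# i_cmp and v_threshold are assumed values, not taken from the design.
i_gate, t_int, q_var = kickback(i_cmp=1e-6, c_p=40e-15, c_gd=5e-15, v_threshold=0.3)
print(math.isclose(q_var, 0.3 * 5e-15))          # Q_var = V_threshold * C_gd, (4.5)
```

Because the result depends only on $V_{threshold}$ and $C_{gd}$, changing the comparator bias current trades speed for power without changing the kick-back charge in this first-order model.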

Though, to the first order,  $Q_{var}$  can be canceled out since it shows on both inputs as a common mode signal, it can effectively introduce an offset due to the mismatch in the

<span id="page-83-0"></span>

<span id="page-83-1"></span>Figure 4.6: Comparator offset in charge-sharing SAR ADC without $C_{att}$ (a) and with $C_{att}$ (b).



Figure 4.7: Schematic of the pulse shaper logic with tunable delay.

input pair as well as in the DAC. More concerning, however, is that signal-dependent charge variation can be observed during the comparison, which would introduce nonlinearities. To address kick-back related issues, in the proposed ADC a cascode pair was implemented to isolate the output node from the inputs of the first stage, thus minimizing the signal-dependent charge variation [\[65\]](#page-124-2), as illustrated in Fig. [4.5\(](#page-81-0)c). Simulation results reveal a greater than  $3\times$  reduction in signal-dependent charge variation by reducing the differential voltage amplitudes at the drain of the input pair.

Since the operation of the charge-sharing ADC is based on the redistribution of the signal charge, $Q_{SIG}$, sampled during the sample and hold phase plus an error charge, $Q_{ERR}$, during the bit conversion phase, nonlinearities will be introduced if $Q_{ERR}$ changes as bit conversion proceeds. In particular, comparator offsets, due to transistor-matching issues or unequal kick-back, can lead to time-varying $Q_{ERR}$ and thus nonlinearities in charge-sharing SAR ADCs [\[66\]](#page-124-3) [\[67\]](#page-124-4). The proposed design employs, in addition to capacitive comparator offset calibration and cascode kick-back reduction transistors, an offset-attenuation capacitor, $C_{att}$, to isolate the comparator input from the DAC and the sample and hold capacitor. Fig. [4.6\(](#page-83-0)a) shows a simplified block diagram of the charge-sharing ADC and the input offset voltage of the comparator ($V_{OS}$) without

an attenuation capacitor. Here, offset charge,  $Q_{OS}$ , can be calculated by:

<span id="page-84-0"></span>
$$
Q_{OS} = V_{OS}(C_{SH} + C_{DAC}).\tag{4.6}
$$

As shown in [\(4.6\)](#page-84-0), while $V_{OS}$ is manifested as a fixed voltage, the charge domain offset, $Q_{OS}$, changes during bit conversion as $C_{DAC}$ alters, thereby introducing signal-dependent offset and deteriorating linearity. The offset-attenuation capacitor, $C_{att}$, employed in this work, on the other hand, isolates $V_{OS}$ from $C_{DAC}$. As shown in Fig. [4.6\(](#page-83-0)b), the offset charge at $V_{\{P,N\}}$ becomes:

<span id="page-84-1"></span>
$$
Q_{OS,att} = V_{OS} \frac{C_{att}(C_{SH} + C_{DAC})}{C_{att} + C_{SH} + C_{DAC}}.\tag{4.7}
$$

For fast prototyping,  $C_{att}$  is implemented with a 3 pF Metal-Insulator-Metal (MIM) capacitor, while the total capacitance of the  $C_{SH}$  and  $C_{DAC}$  ( $C_{unit}$  = 44 fF) is 45 pF, though techniques such as placing capacitors in series or customized metal capacitors can render a smaller  $C_{unit}$ . Therefore,  $C_{SH} + C_{DAC} \gg C_{att}$  and [\(4.7\)](#page-84-1) becomes:

<span id="page-84-2"></span>
$$
Q_{OS,att} \approx V_{OS}C_{att}.\tag{4.8}
$$

The offset charge observed at $V_{\{P,N\}}$ thus becomes constant during bit conversion, as indicated by [\(4.8\)](#page-84-2), thereby minimizing the nonlinearity introduced by the input offset voltage of the comparator. On the other hand, the decision making of the comparator is performed at the input of the comparator in the voltage domain, $V_{\{INP,INN\}}$. As shown in Fig. [4.6\(](#page-83-0)b), $C_{att}$ can also divide $V_{\{P,N\}}$ via $C_{CMP}$, the input capacitance of the comparator. However, since $C_{att} \gg C_{CMP}$ ($C_{CMP} = \sim 10$ fF), the input voltages at the comparator inputs $V_{\{INP,INN\}} \approx V_{\{P,N\}}$, indicating that $C_{att}$ has negligible impact on the dynamic input signal to the comparator. Simulation results reveal a DNL of +2.1/-1 LSB with a 2 mV comparator offset without attenuation capacitors (Fig. [4.6\(](#page-83-0)a)), while a DNL of $+0.7/-0.7$ LSB is achieved by employing the attenuation capacitor, $C_{att}$, indicating an over $2 \times$ linearity improvement, in good accordance with the above analysis. The

switches in the DAC are designed with minimum size to reduce charge injection, yet to increase on-conductance and minimize non-linearities, they are activated by the charge pump supply,  $V_{DDH}$  (Fig. [4.1\)](#page-76-0), for a 3× improvement in  $R_{off}/R_{on}$ .
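The effect of $C_{att}$ in (4.6) and (4.7) can be made concrete with a short calculation. The 2 mV offset, the 3 pF attenuation capacitor, and the 45 pF total of $C_{SH}$ and $C_{DAC}$ are taken from the text; the equal split of that total between $C_{SH}$ and $C_{DAC}$ is an assumption made purely for illustration.

```python
def offset_charge(v_os, c_sh, c_dac_connected, c_att=None):
    """Offset charge seen at V_P/V_N: (4.6) without C_att, (4.7) with it."""
    c_total = c_sh + c_dac_connected
    if c_att is None:
        return v_os * c_total                              # (4.6)
    return v_os * c_att * c_total / (c_att + c_total)      # (4.7)


V_OS = 2e-3          # comparator offset, V (from the text)
C_ATT = 3e-12        # attenuation capacitor, F (from the text)
C_SH = 22.5e-12      # assumed: half of the 45 pF total
C_DAC_MAX = 22.5e-12 # assumed: the other half

# Ratio of offset charge at the end of conversion (all DAC capacitors
# connected) to the start (only C_SH connected).
spread_without = offset_charge(V_OS, C_SH, C_DAC_MAX) / offset_charge(V_OS, C_SH, 0.0)
spread_with = (offset_charge(V_OS, C_SH, C_DAC_MAX, C_ATT)
               / offset_charge(V_OS, C_SH, 0.0, C_ATT))
print(f"without C_att: x{spread_without:.2f}, with C_att: x{spread_with:.3f}")
```

With this assumed split, the offset charge doubles over the conversion without $C_{att}$ but changes by only a few percent with it, which is the behavior that (4.8) approximates as constant.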

To further minimize power at low sampling rates, the ADC is primarily implemented with long-length and high- $V_t$  transistors, and is asynchronously controlled. Specifically, despite requiring a sampling rate of only 10 S/s, the ADC runs instantaneously during bit cycling at a clock rate of 1 MHz. After bit cycling is complete the ADC is clock-gated and placed into a low-power sleep state until the next 10 Hz sample clock edge. The 1 MHz clock is generated by the asynchronous unit as the SAR logic ripples through the 10 controlling slices [\[68\]](#page-124-5). The 10 Hz sampling clock is generated by a low power capacitive-discharging on-chip oscillator ('osc1' in Fig. [4.1\)](#page-76-0) [\[69\]](#page-124-6), whose power consumption is 140 pW when operating at 10 Hz.

Since the transmitter is active during a logic '1', and as described below the TX active power dominates the system power budget, the 10-bit ADC output is serialized and passed through pulse-shaping logic to reduce the pulse width for a logic '1'. The delay cell in the conventional pulse shaper shown in Fig. [4.7](#page-83-1) is made tunable to minimize the active time of the TX, thus saving TX power overhead.

#### 4.2.3 Implementation of the 2.4 GHz RF Transmitter

The 2.4 GHz TX utilizes a direct-RF power oscillator architecture, shown in Fig. [4.8,](#page-86-0) using an on-board 2.8 mm diameter loop antenna as both a radiative and resonant element. Such direct-RF power oscillator structures provide inherent impedance matching to loop antennas and can be readily gated down to very low leakage power levels [\[13\]](#page-119-0).

<span id="page-86-0"></span>

Figure 4.8: Schematics of direct-RF power oscillator transmitter.

In this design, a center tap in the loop antenna is connected to $V_{DD}$, which provides power to the negative resistance generator. Conventionally, negative resistance is achieved via a cross-coupled pair with a tail current source, whose current can be controlled via a binary-weighted current-mirror approach. Current control is useful to control oscillation amplitude, and therefore the amount of radiated power. However, operation of a current mirror transistor requires $V_{DS} > V_{sat,sub-Vt} \approx 100$ mV in subthreshold to maintain operation in saturation. This $V_{DS}$ requirement degrades the gate-to-source voltage headroom of the cross-coupled transistors. When more tail current transistors are turned on to increase current, the gate-to-source voltage of the cross-coupled transistors needs to be increased accordingly, which is difficult under a fixed supply. At a low supply voltage (e.g., 0.6 V in this design), the increased gate-to-source voltage squeezes $V_{DS}$ of the tail current transistor and therefore degrades the effects of current tuning due to channel length modulation.

In the proposed design, the tail current sources in [\[70\]](#page-124-7) are replaced with three triode-mode switches, each of which controls a pair of binary-weighted cross-coupled devices. By turning the switch transistors on and off, cross-coupled devices of different weights are activated, thus controlling the value of current injected into the LC tank. Since the switch transistors are completely turned on and off, $V_{DS,M0,i}$ is near zero when a switch is on. Therefore, the gate-to-source voltage of the cross-coupled pair is maximized, eliminating the conflict between $V_{DS,M0,i}$ and $V_{GS,M1(\text{or}\ 2),i}$ when increasing the current. The current tuning range of the TX is thereby increased by 41%, as shown in Fig. [4.9.](#page-87-0)

<span id="page-87-0"></span>

Figure 4.9: Simulated current tuning ability of the proposed (triode-mode) and conventional (saturation-mode) tail devices in an RF power oscillator.

Increasing the number of the cross-coupled pairs will provide better control ability and higher resolution, but the leakage current will also increase proportionally. As the TX is deeply duty-cycled, the standby power matters here. Since three pairs can fulfill the requirements in this application, the number of controlling bits is set to be three to achieve a good trade-off between tuning ability and standby power.

While the TX in this design is on-off keying (OOK) modulated by switching a fixed number of tail switches completely on and off, the improved linearity of the triode-mode switches may be useful in future designs that employ amplitude-modulated signals. The triode-mode switches also decrease the transistor size and parasitic capacitance by 84% by maximizing the gate-source voltage $V_{GS}$ of the cross-coupled devices. For the same radiation frequency, the reduced parasitic capacitance permits a larger antenna (0.3 mm larger in diameter) and therefore increases the radiation efficiency (by 16.6% from simulation) and transmitter output power. To further minimize the parasitic capacitance introduced by the cross-coupled devices, $M_{1,i}$ and $M_{2,i}$ are implemented with low-$V_t$ devices so that they can conduct the same current at a $20 \times$ smaller size than high-$V_t$ devices. On the other hand, $M_{0,i}$ does not have a size limitation, and is thus implemented with high-$V_t$ transistors, which, when sized up for the equivalent on-conduction of a thin-oxide low-$V_t$ transistor, achieve $100\times$ lower leakage current in the off mode.

The center frequency of the TX is controlled by the value of inductance and capacitance. The capacitor is implemented using a 5-bit binary-weighted array of digitally-activated MIM capacitors, totaling 590 fF, and the inductor is implemented using a single-turn circular loop of copper on the board, both of which are fairly temperature insensitive. The parasitic capacitance of the transistors may vary by a small amount with temperature, but this variation is negligible compared to the 590 fF MIM capacitor. The relatively stable environment temperature of the application (wearable devices) of this design further ensures the frequency stability. To reduce the on-resistance of the digital switches and minimize the impact on the quality factor of the antenna, level shifters operating from $V_{DDH}$ are used to drive the differential switches that are connected to the capacitors.

The TX is deeply duty-cycled, and activated once every 100 ms, transmitting at an instantaneous data rate of 4 Mbps. Between transmissions, the TX is set to an ultralow-power sleep state by gating the tail transistors and power gating the control signals and level shifters, the latter of which reduces leakage power by  $4 \times$ .

#### 4.2.4 On-board Antenna Design

The power oscillator's loop antenna was implemented as a single-turn circular loop of 1 oz (i.e., 35  $\mu$ m thick) copper on an FR-4 substrate. In many cases, antennas for small portable electronic devices are electrically small, and their radiation efficiency increases with the physical size of the antenna [\[71\]](#page-124-8). Generally, the largest antenna permissible under application-driven size constraints is chosen. In the present application, the antenna should be made to be no larger than the size of a ∼3−5 mm coin cell battery. In addition,  $\eta_{rad}$  of electrically small antennas increases with frequency, and thus the antenna should support a self-resonant frequency as high as possible, though in close proximity to an Industrial, Scientific, and Medical (ISM) band (e.g., 2.4 GHz).

<span id="page-89-0"></span>

Figure 4.10: Antenna optimization: (a)  $\eta_{rad}$  and the required capacitance for resonance of a 2.8 mm diameter TX antenna with two 4 mm bond wires, and (b) required resonant capacitance for different size of antennas connected with minimum-estimated parasitic inductance (6.3 nH from two 4 mm bond wires).

Figure [4.10a](#page-89-0) illustrates simulated $\eta_{rad}$ of a circular coil with a trace width of 0.4 mm and a diameter of 2.8 mm, when this antenna is connected to the power oscillator via two 4 mm bonding wires and the parasitics of a $9 \times 9$ mm<sup>2</sup> QFN package. Here, it can be seen that operating at higher frequencies offers improved $\eta_{rad}$. However, the power oscillator requires the antenna to look inductive, so the carrier frequency cannot be chosen beyond the self-resonant frequency (8 GHz in Fig. [4.10a\)](#page-89-0). In addition, the parasitic capacitance of the bonding pads and electrostatic discharge diodes restricts the maximum resonant frequency, since it sets the minimum resonant-tuning capacitance. Based on the layout-extracted parasitic capacitance (150 fF), a maximum

<span id="page-90-0"></span>

Figure 4.11: nW power switched-capacitor DC-DC converter (a) architecture and gate driver and (b) connection during phase one  $(\phi_1 = \phi'_1 = 1, \ \phi_2 = \phi'_2 = 0)$  and phase two  $(\phi_1 = \phi'_1 = 0$  and  $\phi_2 = \phi'_2 = 1$ ).

resonant frequency of 3.8 GHz is achieved, as shown in Fig. [4.10a,](#page-89-0) which offers sufficient margin to safely operate in the 2.4 GHz ISM band. If the size of the antenna were increased to afford increased  $\eta_{rad}$ , the resulting required resonant capacitance would decrease, as shown in Fig. [4.10b.](#page-89-0) This helps set a bound on the maximum tolerable tuning capacitance for the present size, and a guideline for how much tuning capacitance would be needed if future designs were to use a slightly larger antenna size.
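The relationship between tuning capacitance and resonant frequency can be sketched with a lumped LC model, $f = 1/(2\pi\sqrt{LC})$. This is a simplification of the simulated antenna behavior: the effective inductance below is not stated in the text but is back-calculated from the quoted 3.8 GHz maximum at 150 fF, and should be treated as an assumption.

```python
import math


def resonant_frequency(l_henry, c_farad):
    """Resonant frequency of an ideal lumped LC tank, in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def inductance_for(f_hz, c_farad):
    """Inductance that resonates with c_farad at f_hz."""
    return 1.0 / ((2.0 * math.pi * f_hz) ** 2 * c_farad)


def capacitance_for(f_hz, l_henry):
    """Capacitance that resonates with l_henry at f_hz."""
    return 1.0 / ((2.0 * math.pi * f_hz) ** 2 * l_henry)


C_PARASITIC = 150e-15                              # layout-extracted value from the text
l_eff = inductance_for(3.8e9, C_PARASITIC)         # assumed effective inductance
c_at_2g4 = capacitance_for(2.4e9, l_eff)           # total capacitance needed at 2.4 GHz
c_tuning = c_at_2g4 - C_PARASITIC                  # part supplied by the tuning array
print(f"L_eff = {l_eff * 1e9:.1f} nH, C(2.4 GHz) = {c_at_2g4 * 1e15:.0f} fF")
```

Under this lumped approximation, roughly 376 fF of total capacitance is needed at 2.4 GHz, so the tuning array must add a little over 200 fF on top of the parasitics, which is comfortably within the 590 fF array described earlier.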

Note that while an on-chip antenna could have provided a fully integrated solution [\[72\]](#page-124-9), on-chip antennas tend to suffer from: 1) low radiation efficiency, $\eta_{rad}$, due to the limited dimension, thus usually requiring operation at high frequencies (e.g., $> 10$ GHz) which is not suitable for low-power applications; 2) larger capacitors to tune an on-chip antenna which will occupy a large core area and make it difficult to control the resonance at a fine step. On the other hand, in the proposed application the form-factor of the overall system is determined by the source device, for example, a battery. Therefore, an on-board antenna can achieve higher $\eta_{rad}$ and provides more design flexibility, and is thus employed here.

#### 4.2.5 Implementation of the Power Management Unit

The overall system is powered by a 1.8 V battery, which is converted to 0.6 V via a 3:1 switched-capacitor DC-DC converter. Since the switching frequency is low (10 Hz) and load power is only a few nW, careful consideration must be taken to minimize leakage power. Amongst possible switched-capacitor DC-DC converter topologies, the Dickson topology can achieve $6.3\times$ and $1.8\times$ lower leakage power than the Ladder and Fibonacci topologies when configured for a similar ratio and with the same on resistance. In addition, the Dickson topology has low short-circuit current and good slow-switching limit (SSL) performance metrics [\[73\]](#page-124-10), and is thus chosen for this design. The implemented converter is shown in Fig. [4.11](#page-90-0) along with its gate driver and states during its two phases of operation. The employed power switches are implemented using thin-oxide standard-$V_t$ transistors, which for the same on-resistance, offer $19\times$ lower leakage than low-$V_t$ devices. Off-chip ceramic capacitors (each 1 $\mu$F and $1 \times 0.5$ mm<sup>2</sup>) are employed to support the large instantaneous current draws from the TX, while also enabling a low SSL impedance during continuous operation. To ensure proper operation, the main ESD supply voltage is connected to the battery voltage. In addition, multiple diodes are stacked to prevent breakdown and reduce the leakage and turn-on current.

At steady state, the three flying capacitors divide the supply voltage into several voltage domains. To reduce leakage power and the risk of breakdown, the circuit is driven by cascode level-shifters, which are powered from the local power capacitor connected to the relevant switch in each voltage domain. For example, the terminals across capacitor  $C_1$  provide the power rails for the driver to switch transistors  $M_{N3}$  and  $M_{P3}$ . To do so, clock signals  $\phi_2$  and  $\phi_1$ , which are referenced between GND and 0.6 V, drive NMOS transistors  $M_{1,1}$  and  $M_{1,2}$  in the gate driver, respectively, which generate two pull-down signals connected to the sources of  $M_{2,1}$  and  $M_{2,2}$ , which toggle the latch formed by a pair of cross-coupled inverters to either the on or off position. The output of the latch is then buffered and then used to drive  $M_{N3}$  and  $M_{P3}$  (signal  $\phi'_1$ ). Similarly, NMOS transistors  $M_{3,1}$  and  $M_{3,2}$  in the gate driver reproduce these signals to drive the level-shifter above them, whose voltage is generated via the rails of capacitor  $C_2$ , and used to drive  $M_{P4}$  (signal  $\phi'_2$ ). Since  $C_1$  and  $C_2$  provide a two times larger voltage to the driver than  $C_{out}$ , transistors  $M_{N3}$  and  $M_{P3-5}$  are sized accordingly smaller to reduce

<span id="page-92-0"></span>

Figure 4.12: (a) Power dissipation introduced by the delay of  $\phi_1$  and  $\phi_2$  (b) Circuit to generate the aligned  $\phi_1$  and  $\phi_2$ .

parasitic capacitance and leakage power.

During switching, delay between the two differential clock signals can introduce unnecessary short-circuit power dissipation. For example, if $\phi_2$ switches from "1" to "0" earlier than $\phi_1$ at the end of phase 2 (when $\phi_1 = 0$ and $\phi_2 = 1$), as illustrated in Fig. [4.12\(](#page-92-0)a), $M_{P1}$ would turn on and cascade $C_1$ and $C_{out}$, generating a voltage of $3V_{OUT}$ at the top plate of $C_1$. Since $M_{P3}$ would still be on in this situation, the voltage difference between the top plate of $C_1$ and $C_3$ generates a short-circuit current. Simulations of a baseline design without short-circuit current optimizations reveal that the short-circuit power can be larger than the rest of the DC-DC converter's power. To minimize short-circuit current, the circuit in Fig. [4.12\(](#page-92-0)b) is used to generate aligned differential clock signals to drive the switched-capacitor circuit. An even number of fast inverters is used in path 1 and an odd number of slow inverters is used in path 2, keeping the delay of the two paths the same while generating the differential clock signals. In this manner, quiescent power is reduced by at least 21%.

### 4.3 Measurement Results

The wireless ion sensing chip was implemented in a 1 mm $\times$ 1.2 mm 65 nm CMOS chip, and was packaged and soldered to an FR-4 substrate PCB with a 2.8 mm

<span id="page-93-0"></span>

<span id="page-93-1"></span>Figure 4.13: Measured 128-time averaged 1024-bin FFT plot of the charge-sharing ADC.

Table 4.1: Power Breakdown of the Ion Sensing System (Default in pW).

| AFE | Buffer | ADC | Timer | TX | DC-DC Eff. | Total |
|-----|--------|-----|-------|----|------------|-------|
| 406 | 15 $\times$ 2 | 780 | 140 $\times$ 2 | 2.4 nW | 70.5% | 5.5 nW |

on-board loop antenna. For testing and debugging purposes, a  $9\times 9$  mm<sup>2</sup> QFN package was employed. In a future design iteration, many of the chip pads could be left unconnected, and a chip-on-board bonding strategy would significantly reduce the occupied chip footprint. Small diameter batteries (e.g., down to 4.8 mm) could also be used to provide power in a very small form factor. During testing, the DC-DC converter provided a 0.6 V supply to all load circuits.

#### 4.3.1 Benchtop Measurements

Measurement results show the potentiometric amplifier consumed 406 pW when operating from a 0.6 V supply. This includes the power required to drive the capacitors in the SAR ADC. At 10 S/s, the reference-free SAR ADC was measured to consume 780 pW, including the power of the 4.2 pW charge pump (Fig. [4.1\)](#page-76-0), which, to the best of our knowledge, is the lowest power 10-bit 10 S/s ADC with all peripheral circuits integrated. The measured input referred noise of the comparator was 350 $\mu$V. Since different spectrum tests were required to characterize the performance of the ADC in different

<span id="page-94-0"></span>

Figure 4.14: Measured FoM of the ADC when operating at different sampling frequency (a) and when operating from different supply voltages at 10 S/s (b), achieving an FoM better than 379 fJ/conv-step.

environments while the ADC was operating at very low frequency (down to a few Hz), a 1024-point FFT was calculated to accelerate the measurement. Since the frequency band of interest was only 0 to 5 Hz, a 1024-bin FFT provided a frequency resolution better than 0.005 Hz, which is sufficient for benchmarking the ADC ENOB. To achieve an accurate noise floor measurement, as shown in Fig. [4.13,](#page-93-0) a 128-times averaged 1024-point FFT with Hanning windowing was performed, and an ENOB of 8.3 bits was measured (ENOB degradation was mainly introduced by comparator noise and DAC parasitics), for an energy efficiency of 244 fJ/conv-step. At such a low sampling frequency, measured efficiency was dominated by leakage power. Figure [4.14\(](#page-94-0)a) shows the measured figure-of-merit (FoM) at different sampling frequencies. At 1 kS/s, the ADC power was 2.4 nW and the measured ENOB was 8.31 bits, resulting in an efficiency of 7.6 fJ/conv-step, further illustrating the leakage dominance at low sampling rates. When operating from 0.6 to 0.8 V with a sampling rate of 10 S/s, the measured FoM varied from 244 fJ/conv-step at 0.6 V to 1381.2 fJ/conv-step at 0.8 V without power gating, as shown in Fig. [4.14\(](#page-94-0)b). Power gating, which could be enabled when $V_{DD} > 0.7$ V, improved the FoM significantly (e.g., 4.3$\times$ at 0.8 V) by reducing

<span id="page-95-0"></span>

Figure 4.15: Measured DNL (a) and INL (b) of the reference-free charge-sharing ADC.

leakage in the sleep mode, and ensured an FoM better than 379 fJ/conv-step across the supply ranges from 0.6 to 0.8 V. The measured DNL and INL were -0.36/0.58 LSB and -0.77/0.84 LSB, respectively, as shown in Fig. [4.15.](#page-95-0)
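The quoted efficiency figures can be reproduced approximately from the reported power, ENOB, and sampling rate. The sketch below assumes the FoM is the commonly used Walden definition, $P / (2^{ENOB} f_s)$, which is consistent with the fJ/conv-step units but is not spelled out in the text.

```python
def walden_fom(power_w, enob_bits, fs_hz):
    """Walden figure of merit in joules per conversion step (assumed definition)."""
    return power_w / (2.0 ** enob_bits * fs_hz)


fom_10sps = walden_fom(780e-12, 8.3, 10.0)      # 780 pW, 8.3 bits, 10 S/s
fom_1ksps = walden_fom(2.4e-9, 8.31, 1e3)       # 2.4 nW, 8.31 bits, 1 kS/s
print(f"{fom_10sps * 1e15:.0f} fJ/conv-step at 10 S/s")
print(f"{fom_1ksps * 1e15:.1f} fJ/conv-step at 1 kS/s")
```

The 10 S/s result lands within a few fJ of the reported 244 fJ/conv-step; the small difference is consistent with the ENOB being rounded to one decimal place in the text. The roughly 30× gap between the two operating points reflects the fixed leakage power being spread over 100× more conversions at 1 kS/s.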

At 1 m distance, the TX radiated -64.6 dBm of power as measured by a $\lambda$/4 whip antenna. Fig. [4.16\(](#page-96-0)a) shows the output spectrum measured at $\sim$10 cm with a $\lambda$/4 whip antenna when operating with 4 Mbps OOK modulation. When transiently powered by $C_{out}$, a 1 $\mu$F $1\times0.5$ mm<sup>2</sup> capacitor, the TX consumed 154.5 $\mu$W of instantaneous power, corresponding to a bias current of 257 $\mu$A, set by the default power control code "010". The start-up time of the TX was measured to be $<$ 52 ns, as shown in Fig. [4.16\(](#page-96-0)b). The measured sleep-mode power of the TX was 500 pW, thus achieving an average power of 2.4 nW after duty-cycling to 100 bps (i.e., 10 S/s).
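The 2.4 nW average follows from simple duty-cycle arithmetic. The sketch below assumes that half of the transmitted bits are logic '1' and that each '1' keeps the TX on for one full 250 ns bit period at 4 Mbps; neither assumption is stated in the text, and the pulse shaper may shorten the on-time further.

```python
def tx_average_power(p_active_w, p_sleep_w, avg_bit_rate, burst_bit_rate, ones_fraction):
    """Average OOK transmitter power for a deeply duty-cycled link."""
    # Fraction of time the oscillator is on: only '1' bits radiate.
    duty = avg_bit_rate * ones_fraction / burst_bit_rate
    return duty * p_active_w + (1.0 - duty) * p_sleep_w


P_ACTIVE = 154.5e-6      # measured instantaneous power, W
P_SLEEP = 500e-12        # measured sleep power, W
V_DD = 0.6               # supply voltage, V

p_avg = tx_average_power(P_ACTIVE, P_SLEEP, avg_bit_rate=100.0,
                         burst_bit_rate=4e6, ones_fraction=0.5)  # 0.5 is assumed
i_bias = P_ACTIVE / V_DD
print(f"average = {p_avg * 1e9:.2f} nW, bias = {i_bias * 1e6:.1f} uA")
```

Under these assumptions the active contribution is about 1.9 nW and the sleep contribution about 0.5 nW, so the sleep-mode leakage is a meaningful share of the TX budget at this duty cycle.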

Clocked by the reference-free relaxation oscillator, the switched-capacitor DC-DC converter operated between 2 and 10 Hz. The measured efficiency of the DC-DC converter at different frequencies with 10 nA load current is shown in Fig. [4.17\(](#page-97-0)a), while Fig. [4.17\(](#page-97-0)b) shows measured efficiency with different load current when operating at 10 Hz. As shown in Fig. [4.17\(](#page-97-0)b), the DC-DC converter achieved a peak efficiency of 96.8% when operating with a load current of 100 nA.

<span id="page-96-0"></span>

Figure 4.16: (a) Spectrum measured using a λ/4 whip antenna placed ∼10 cm from the on-board loop antenna with 4 Mbps OOK modulation. (b) Measured TX start-up time.

Table [4.1](#page-93-1) shows the power breakdown of the whole ion sensing system. Together, all of the load circuits in the wireless ion sensing system consumed 3.9 nW. At this load, the DC-DC converter achieved an efficiency of 70.5%, for a total system power consumption of 5.5 nW.
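The system total follows from the block powers and the converter efficiency. The sketch below sums the per-block figures listed in the measurement summary (Fig. 4.20), counting the 140 pW oscillator twice since the system uses two timing oscillators, and divides by the measured 70.5% efficiency. The dictionary keys are illustrative labels only.

```python
# Per-block load powers in pW, taken from the measurement summary.
load_pw = {
    "potentiometric amplifier": 406,
    "ADC at 10 S/s": 780,
    "TX average at 100 bps": 2400,
    "two oscillators at 140 pW each": 2 * 140,
}

DCDC_EFFICIENCY = 0.705  # measured at this load level

total_load_pw = sum(load_pw.values())
system_power_nw = total_load_pw / DCDC_EFFICIENCY / 1000.0

print(f"load: {total_load_pw / 1000.0:.1f} nW")   # prints load: 3.9 nW
print(f"system: {system_power_nw:.1f} nW")         # prints system: 5.5 nW
```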

#### 4.3.2 In-vitro Tests

The ion sensing system was measured in-vitro with 0.1-100 mM NaCl concentrations with the SPI configured to the default tuning set. The Ag/AgCl reference electrode, consisting of a polymeric PVB membrane, was biased by the ladder-based reference generator at 500 mV to provide a stable, mid-supply solution potential for the ISE output recording (Fig. [4.18\)](#page-98-0). The ADC was configured as a pseudo-differential structure: the input  $V_N$  in Fig. [4.4](#page-80-0) was connected to the output of the potentiometric amplifier, while  $V_P$  was biased through a voltage buffer (Fig. [4.2\(](#page-78-0)b)) at mid supply, resulting in

<span id="page-97-0"></span>

Figure 4.17: Measured efficiency of the DC-DC converter versus clock frequency (a) and load current (b).

a 0.7 LSB DNL degradation. The charge-sharing ADC, when operating at 10 S/s, required an input switching power of less than 16 pW ( $\sim$ 2% of the total ADC power), which was well within the sourcing capability of the potentiometric amplifier and thus required no extra power in the signal driver. In addition, an on-board ceramic capacitor was placed after the potentiometric amplifier to minimize the large instantaneous current spikes that might occur during sampling. The in-vitro test was performed by first using pure water as the background, giving ∼510 mV output as shown in Fig. [4.18.](#page-98-0) Samples with different NaCl concentrations were then dropped onto the electrode and recorded for approximately 40 seconds before additional solution was added. Note that variation of the reference voltage appears as a common-mode voltage and thus can be rejected. On the other hand, systematic error affects the reference voltage for digitization in the ADC and therefore effectively introduces an ADC offset error. This offset does not matter in the proposed application since, as shown in Fig. [4.18,](#page-98-0) it only shifts the y-axis and can be calibrated out during normal system operation. The in-vitro measurements shown in Fig. [4.18](#page-98-0) exhibit a linear, near-Nernstian response with a response slope of 71 mV/log10[Na<sup>+</sup>], as better shown in

<span id="page-98-0"></span>

Figure 4.18: In-vitro ion concentration measurements with a Na<sup>+</sup>-selective electrode.

Fig. [4.19,](#page-99-0) thereby indicating the ability of the proposed system to accurately detect and wirelessly transmit ion concentration with only 5.5 nW of power. Note that the proposed ion sensing system can be adapted to measure other ions, such as potassium and chloride, by using different ionophores. A table summarizing system performance is shown in Fig. [4.20,](#page-99-1) and the die and PCB photos are shown in Fig. [4.21.](#page-100-0)
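For reference, the ideal Nernstian slope against which the measured 71 mV/decade can be compared is ln(10)·RT/(zF). The sketch below evaluates it for a monovalent ion such as Na<sup>+</sup>. The 25 °C temperature is an assumption, since the test temperature is not stated here.

```python
import math

R_GAS = 8.314462618      # molar gas constant [J/(mol K)]
FARADAY = 96485.33212    # Faraday constant [C/mol]


def nernst_slope_mv_per_decade(temp_k: float, charge: int = 1) -> float:
    """Ideal electrode slope in mV per decade of ion activity."""
    return 1000.0 * math.log(10.0) * R_GAS * temp_k / (charge * FARADAY)


# Assumed room temperature of 25 degC (298.15 K); z = +1 for sodium.
ideal_slope = nernst_slope_mv_per_decade(298.15)
print(f"{ideal_slope:.2f} mV/decade")  # prints 59.16 mV/decade
```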

### 4.4 Summary

An ultra-low-power battery-connected wireless ion sensing system has been presented in this chapter. The platform comprises ISEs, a potentiometric amplifier, a reference-free charge-sharing SAR ADC with offset attenuation, a digital processing unit including a serializer and a pulse shaper, a 3:1 Dickson switched-capacitor DC-DC converter, a direct-RF power oscillator employing triode-mode switches, and low-power relaxation oscillators for clock generation. Measurements reveal a total system power consumption of 5.5 nW, resulting in the lowest power wireless ion sensing system to date. In-vitro testing shows a near-Nernstian response to varying Na<sup>+</sup> concentrations, indicating the ability of the proposed system to accurately detect and wirelessly transmit

<span id="page-99-0"></span>

<span id="page-99-1"></span>Figure 4.19: In-vitro measurements demonstrate that the wireless sensing platform achieves a linear response to the ion concentration.

| Parameter        | Value                                        | Block      | Power consumption [pW]                                         |
|------------------|----------------------------------------------|------------|----------------------------------------------------------------|
| Technology       | 65 nm CMOS                                   | Pot. Amp.  | 406                                                            |
| Chip area        | 1 mm × 1.2 mm                                | ADC        | 780 @ 10 S/s                                                   |
| TX frequency     | 2.37 GHz                                     | TX         | 500 (standby)<br>154 µW (active)<br>2.4 nW (average @ 100 bps) |
| TX output power  | -64.6 dBm @ 1 m                              | Oscillator | 140 @ 10 Hz                                                    |
| TX start-up time | 52 ns                                        |            |                                                                |
| ADC ENOB         | 8.3 bits                                     |            |                                                                |
| ADC FoM          | 244 fJ/conv @ 10 S/s<br>7.7 fJ/conv @ 1 kS/s |            |                                                                |
| SC DC-DC eff.    | 96.8% @ 100 nA<br>70.5% @ 6.5 nA             |            |                                                                |

Figure 4.20: Tables summarizing chip measurement results.

ion concentration at near-zero power levels.

The text of Chapter 4 is based on and is mostly a reprint of the materials from "A 5.5nW Battery-Powered Wireless Ion Sensing System," in Proc. IEEE European Solid-State Circuits Conference (ESSCIRC), Sep. 2017 by Hui Wang, Xiaoyang Wang, Jiwoong Park, Abbas Barfidokht, Joseph Wang, and Patrick Mercier, and "A Battery-Powered Wireless Ion Sensing System Consuming 5.5 nW of Average Power," IEEE Journal of Solid-State Circuits (JSSC), Apr. 2018 by Hui Wang, Xiaoyang Wang, Abbas Barfidokht, Jiwoong Park, Joseph Wang, and Patrick Mercier. The dissertation author designed the analog front-end, wireless transmitter and DC-DC converter in the system, was the primary investigator and author of this paper, and co-authors have approved the

<span id="page-100-0"></span>

Figure 4.21: Die photo of the proposed ion sensing chip (a) and PCB photo (b).

use of the material for this dissertation.

# Chapter 5

# Summary

In this thesis, the breakthroughs made in low-dropout regulators, crystal oscillators and wireless sensor platforms have been described. Different architectures and techniques have been proposed and discussed that make wireless sensing nodes more energy-efficient and faster to respond. The following is a summary of the key points and results of the chapters in this dissertation.

Chapter 2 presents an event-driven charge-pump-based low-dropout (LDO) regulator with an AC-coupled high-Z (ACHZ) feedback loop. By using the ACHZ loop and continuous-time dead zone detection, the proposed LDO responds in less than a clock cycle during load transients, achieving response and settling times of 6.9 ns and 65 ns, respectively, all at a 4.9  $\mu$A quiescent current for a sub-4 fs FoM. The output ripple is measured to have a stable amplitude and is <15 mV over the LDO's 105,000× stable load range (1  $\mu$A to 105 mA). In addition to all these features, the proposed LDO also retains the advantages of normal digital LDOs: process portability and the ability to operate at a low supply voltage.

Chapter 3 presents a fast start-up crystal oscillator (XO) that reduces both start-up time and energy via an effective multi-path feedforward  $|R_N|$  boosting technique. To further improve start-up speed, yet with a more energy/cost-favorable imprecise on-chip ring oscillator, an optional dynamic pulse width (DPW) injection is also proposed. The proposed fast start-up technique is implemented in a 65 nm process and works with 20 MHz and 16 MHz crystals, achieving start-up times of 30 µs and 34 µs while consuming 11.1 nJ and 13.2 nJ, respectively. Multiple chips are measured over temperature and supply voltage to verify the robustness of the employed techniques.

Chapter 4 presents a battery-powered wireless ion sensing platform featuring complete sensing-to-transmission functionality. A 1 mm  $\times$  1.2 mm chip fabricated in 65 nm includes a 406 pW potentiometric analog front end, a 780 pW 10-bit SAR ADC, a 2.4 GHz power-oscillator-based wireless transmitter that consumes an average of 2.4 nW at a 10 sample/sec transmission rate, two timing generation oscillators that each consume 140 pW, and a 3:1 switched-capacitor DC-DC converter with 485 pW of quiescent power that achieves efficiencies of 96.8% and 70.5% at 60 nW and 3.9 nW loads, respectively. The chip connects to a screen-printed ion selective electrode (ISE) responsive to sodium ions, and in-vitro testing across a NaCl solution concentration range of 0.1-100 mM exhibited a linear near-Nernstian response with a slope of 71 mV/log10[Na<sup>+</sup>]. When all blocks are operating, the system consumes an average of 5.5 nW.

# Appendix A

# A Switch-Capacitor Fast Start-Up TX

### A.1 Motivation

Wireless sensing systems have countless potential applications. However, due to size and application-environment limitations, wireless sensor nodes usually have a small battery and a very limited power budget. Therefore, the circuits should be designed to be low-power and energy-efficient. Many emerging sensing applications measure quantities that do not change rapidly with time, such as temperature, air quality and human body ion concentration. In such applications, the sensing system can use a low sampling rate and be aggressively duty-cycled into sleep mode to save power. Power-hungry blocks such as the TX are activated only when data needs to be transferred. Since the majority of the circuits in the system are power gated during sleep mode, the average power can be greatly reduced. In this case, the average power consumption of the system is not determined primarily by the active power, but rather by a combination of active power and sleep-mode power, with a large portion coming from sleep-mode static or leakage power. Thus, minimizing the sleep-mode power is the key to reducing the average power of wireless sensing nodes with ultra-low data rates.

The power-oscillator-based transmitter [\[13\]](#page-119-0) is a widely used architecture in IoT applications that is specifically optimized for standby power in the picowatt regime. It has the advantages of low complexity, inherent impedance matching and the ability to work with a low supply voltage. The general architecture of the power-oscillator-based transmitter is shown in Fig. [A.1.](#page-104-0) It consists of three parts: the cross-coupled transistors

<span id="page-104-0"></span>

Figure A.1: General architecture of the power-oscillator-based transmitter.

to provide a negative resistance to sustain oscillation, a capacitor array to perform frequency tuning, and a center-tap loop antenna. OOK modulation is usually used for communication because of its low power. The power oscillator can be viewed as two amplifiers in cascade that form a positive feedback loop, and Fig. [A.2](#page-105-0) illustrates how the oscillation starts. Due to the high Q of the LC tank, the power oscillator usually needs a long time to start up, which limits the maximum data rate and power efficiency. To reduce the start-up time by speeding up the growth of the oscillation amplitude, one straightforward way is to increase the transconductance  $g_m$  of the cross-coupled transistors, which requires a higher bias current and burns more power. In this section, a switch-capacitor fast start-up TX architecture is proposed, which achieves immediate start-up without increasing the bias current.

<span id="page-105-0"></span>

Figure A.2: PO positive feedback and oscillation illustration.

<span id="page-105-1"></span>

Figure A.3: Proposed fast start-up PO-based TX architecture.

# A.2 Architecture and Working Flow

Fig. [A.3](#page-105-1) shows the proposed fast start-up TX architecture. Instead of connecting the bottom plates of all the capacitors to a fixed supply rail ($GND$ or  $V_{DD}$), the bottom plates of the capacitors on one side are connected to  $GND$ and those on the other side are connected to  $V_{DD}$. Instead of burning more current and power to boost  $g_m$ and reduce the start-up time, the capacitors are switched to generate an initial voltage difference at the output of the power oscillator to help the start-up. The working flow is shown in Fig. [A.4.](#page-106-0) In the steady state before start-up, the tail-current transistor  $M_0$  is turned off, the bottom plate of capacitor  $C_1$  is connected to  $V_{DD}$  and the bottom plate of

<span id="page-106-0"></span>

Figure A.4: Different working phases of fast start-up TX.

the capacitor  $C_2$  on the other side is connected to  $GND$. The voltage across  $C_1$  is zero and that across  $C_2$  is  $V_{DD}$. When the TX is turned on, the bottom plate of  $C_1$ is switched from  $V_{DD}$  to  $GND$  and the bottom plate of  $C_2$  is switched from  $GND$  to  $V_{DD}$. Since the current in the inductor cannot change instantaneously, the voltage on the top plate of  $C_1$  is pulled down to 0 and the top-plate voltage of  $C_2$  is boosted to  $2V_{DD}$. Therefore, an initial voltage difference of  $2V_{DD}$  is generated at the power oscillator output. Then, the tail-current transistor  $M_0$  is turned on, and the power oscillator starts oscillating immediately. When the transmission ends, the tail-current transistor is turned off and the TX enters standby mode. At the beginning of the next start-up, the bottom plates of the capacitors on the left and right sides flip again, and the differential output voltage of the power oscillator is boosted again, achieving another fast start-up, as shown in Fig. [A.4](#page-106-0) (c) and (d).
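The bottom-plate switching step can be sketched with a simple charge-conservation model: immediately after the switch, the top plate moves by the bottom-plate step scaled by the capacitive divider. The parasitic top-plate capacitance `c_par` is an assumption added for illustration (the description above corresponds to `c_par = 0`), and the supply is normalized to 1 V.

```python
import math


def top_plate_after_switch(v_top: float, v_bottom_before: float,
                           v_bottom_after: float, c_main: float,
                           c_par: float = 0.0) -> float:
    """Top-plate voltage just after the bottom plate steps.

    Assumes the inductor current has not yet changed, so charge on the
    top-plate node is conserved. c_par is a hypothetical parasitic to ground.
    """
    coupling = c_main / (c_main + c_par)
    return v_top + coupling * (v_bottom_after - v_bottom_before)


VDD = 1.0        # normalized supply (assumed)
C_MAIN = 1e-12   # hypothetical tank capacitor value [F]

# Before start-up both top plates sit at VDD (0 V across C1, VDD across C2).
v_c1 = top_plate_after_switch(VDD, VDD, 0.0, C_MAIN)   # bottom: VDD -> GND
v_c2 = top_plate_after_switch(VDD, 0.0, VDD, C_MAIN)   # bottom: GND -> VDD
print(v_c1, v_c2, v_c2 - v_c1)

# With a parasitic of 10% of C_MAIN the initial difference is slightly smaller.
v_diff_par = (top_plate_after_switch(VDD, 0.0, VDD, C_MAIN, 0.1 * C_MAIN)
              - top_plate_after_switch(VDD, VDD, 0.0, C_MAIN, 0.1 * C_MAIN))
```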

<span id="page-107-0"></span>

Figure A.5: TX start-up: (a) normal start-up, (b) with the proposed fast start-up technique, (c) zoomed-in fast start-up waveform.

Fig. [A.5](#page-107-0) shows the simulated start-up waveforms of the TX with and without the proposed fast start-up technique. With the same steady-state amplitude, which means the same bias current, the normal TX needs at least 285 ns to start up, while the TX with the proposed technique starts up immediately.
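A first-order way to see why the initial voltage matters is the linear small-signal model of an LC oscillator, in which the envelope grows as exp(t/τ) with τ = 2Q / (ω₀(g − 1)), where g is the ratio of negative conductance to tank loss. The sketch below uses that textbook model, not the simulated circuit. The tank Q, loop-gain ratio and amplitudes are hypothetical values chosen only for illustration, so the resulting times are not expected to match the simulated 285 ns.

```python
import math


def envelope_time_constant(f0_hz: float, q_tank: float, loop_gain: float) -> float:
    """Envelope growth time constant of a linear LC oscillator model.

    loop_gain is the ratio of negative conductance to tank loss (> 1 to start).
    """
    if loop_gain <= 1.0:
        raise ValueError("loop gain must exceed 1 for the oscillation to grow")
    return 2.0 * q_tank / (2.0 * math.pi * f0_hz * (loop_gain - 1.0))


def startup_time(v_initial: float, v_final: float, tau_s: float) -> float:
    """Time for the envelope to grow exponentially from v_initial to v_final."""
    if v_initial >= v_final:
        return 0.0
    return tau_s * math.log(v_final / v_initial)


# Hypothetical values: 2.4 GHz carrier, tank Q of 50, loop gain of 1.5.
tau = envelope_time_constant(2.4e9, 50.0, 1.5)
t_from_noise = startup_time(1e-6, 0.3, tau)   # starts from a microvolt-level disturbance
t_preset = startup_time(0.3, 0.3, tau)        # starts from a preset initial voltage
print(f"tau = {tau * 1e9:.1f} ns, from noise: {t_from_noise * 1e9:.0f} ns, preset: {t_preset} s")
```

In this model, raising the loop gain shortens τ at the cost of bias current, whereas presetting the initial amplitude removes the logarithmic growth term without changing the bias.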

# A.3 PO-based TX Power Efficiency Analysis

Using the proposed fast start-up technique can effectively reduce the start-up time, increase the data rate and save power at the system level. It is also important to analyze and design the PO so that it is itself power efficient. In this section, the relationships and trade-offs between antenna size, radiation efficiency, PO power efficiency and frequency are discussed, with the ultimate goal of providing a guide for designers on how to maximize the TX power efficiency.

In Fig. [A.6,](#page-109-0)  $I_{SS}$  is the bias current and  $R_P$  is the equivalent parallel resistance of the LC tank. The output power of the PO is:

$$
P_{out} = \eta_{rad} \times P_{ant}, \tag{A.1}
$$

where  $\eta_{rad}$  is the radiation efficiency of the antenna and  $P_{ant}$  is the power dissipated in the antenna:

$$
P_{ant} = 2 \times \frac{V_{out}^2}{R_P} = 2 \times \frac{\left(\frac{1}{\sqrt{2}} \frac{2}{\pi} I_{SS} R_P\right)^2}{R_P} = \left(\frac{2I_{SS}}{\pi}\right)^2 R_P. \tag{A.2}
$$

<span id="page-108-1"></span>Therefore, the power efficiency of the TX is:

$$
\eta_{total} = \frac{P_{out}}{P_{tot}} = \frac{\eta_{rad}}{I_{SS}V_{DD}} \times \left(\frac{2I_{SS}}{\pi}\right)^2 R_P = \left(\frac{2}{\pi}\right)^2 \frac{\eta_{rad}}{V_{DD}} I_{SS} R_P. \tag{A.3}
$$

The maximum oscillation swing is limited by the power supply voltage  $V_{DD}$, that is:

$$
\frac{2}{\pi}I_{SS}R_P \leqslant V_{DD},\tag{A.4}
$$

which means that:

<span id="page-108-0"></span>
$$
I_{SS}R_P \leq \frac{\pi}{2}V_{DD} \;\to\; (I_{SS}R_P)_{max} = \frac{\pi}{2}V_{DD}. \tag{A.5}
$$

Substituting [A.5](#page-108-0) into [A.3,](#page-108-1) the maximum power efficiency of the TX is:

$$
\eta_{total,max} = \frac{2}{\pi} \times \eta_{rad},\tag{A.6}
$$

<span id="page-109-0"></span>

Figure A.6: PO-based TX.

<span id="page-109-1"></span>

Figure A.7: Small loop antenna model.

which means that the maximum power efficiency of the TX is determined by the radiation efficiency of the antenna.
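The bound can be checked numerically. The sketch below evaluates the efficiency expression of Eq. (A.3) and confirms that, at the swing limit of Eq. (A.4), it reduces to (2/π)·η_rad. The supply, tank resistance and radiation efficiency are hypothetical values, and the function name is illustrative.

```python
import math


def po_efficiency(i_ss: float, r_p: float, v_dd: float, eta_rad: float) -> float:
    """Total TX efficiency per Eq. (A.3); valid while the swing stays within V_DD."""
    swing = (2.0 / math.pi) * i_ss * r_p
    if swing > v_dd * (1.0 + 1e-12):  # small tolerance for floating-point rounding
        raise ValueError("swing exceeds V_DD; Eq. (A.3) no longer applies")
    return (2.0 / math.pi) ** 2 * eta_rad * i_ss * r_p / v_dd


# Hypothetical operating point: 1 V supply, 1 kOhm tank resistance, 50% radiation efficiency.
V_DD, R_P, ETA_RAD = 1.0, 1000.0, 0.5
i_ss_max = math.pi * V_DD / (2.0 * R_P)          # bias current at the swing limit
eta_max = po_efficiency(i_ss_max, R_P, V_DD, ETA_RAD)
eta_half = po_efficiency(0.5 * i_ss_max, R_P, V_DD, ETA_RAD)
print(eta_max, (2.0 / math.pi) * ETA_RAD)
```

Operating below the swing limit (for example at half the bias current) gives proportionally lower efficiency in this model, which is why the swing is pushed toward the supply limit.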

The loop antenna can be modeled as an inductor in series with a resistor  $R_{ant}$, with a parasitic capacitance in parallel, as shown in Fig. [A.7.](#page-109-1) The antenna's series resistance  $R_{ant}$  is made of two parts: the radiation resistance  $R_{rad}$  and the loss resistance  $R_{loss}$.  $R_{loss}$  models the energy dissipated as heat, and  $R_{rad}$  models the part that is converted to useful electromagnetic radiation. The radiation efficiency of the antenna is expressed as:

$$
\eta_{rad} = \frac{R_{rad}}{R_{rad} + R_{loss}},\tag{A.7}
$$

where the radiation resistance can be approximated by

<span id="page-110-0"></span>
$$
R_{rad} = \sqrt{\frac{\mu}{\varepsilon}} \cdot \frac{8\pi^3}{3} \left(\frac{A}{\lambda^2}\right)^2. \tag{A.8}
$$

Here  $\mu$  is the permeability of the surrounding environment,  $\varepsilon$  is the permittivity, A is the antenna's area and  $\lambda$  is the operational wavelength. The loss resistance  $R_{loss}$  can be calculated as:

<span id="page-110-1"></span>
$$
R_{loss} = l \times R_{loss,pu},\tag{A.9}
$$

where *l* is the length of the antenna conductor and  $R_{loss,pu}$  is the per-unit-length resistance. Considering the skin effect,  $R_{loss,pu}$  can be expressed as:

$$
R_{loss,pu} = \frac{1}{\sigma d_W \delta_s},\tag{A.10}
$$

where  $\sigma$  is the conductivity of the metal, which equals  $5.96 \times 10^7$ S/m for copper and  $3.77 \times 10^7$ S/m for aluminium.  $d_W$ is the distance around the perimeter of the wire; for a PCB trace,  $d_W$ can be approximated as the width of the metal trace.  $\delta_s$ is the skin depth and is given by

$$
\delta_s = \sqrt{\frac{1}{\pi f \mu \sigma}}.\tag{A.11}
$$

A Matlab model is built, and the simulated relationship between the antenna radius and the radiation and power efficiencies is shown in Fig. [A.8.](#page-111-0) From the simulation results, we can observe that the radiation efficiency and total power efficiency first increase with the antenna radius, but start to saturate when the antenna radius is larger than 30 mm. From Equ. [A.8](#page-110-0) and Equ. [A.9](#page-110-1) we can observe that  $R_{rad}$  is proportional to  $A^2$, that is, proportional to  $r^4$, where r is the radius of the antenna, while  $R_{loss}$ is proportional to  $l$ and therefore to r. So when the antenna is small,  $R_{rad}$  increases much faster than  $R_{loss}$  and the radiation efficiency increases with the antenna radius. When the antenna is very large,  $R_{rad}$ dominates and the radiation efficiency starts to saturate.
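The same trend can be reproduced with a few lines of Python. This is a minimal sketch, not the Matlab model itself. It uses the small-loop radiation resistance with the free-space wave impedance $\sqrt{\mu_0/\varepsilon_0}$ (about 377 Ω) as the prefactor, takes the conductor length as the loop circumference $2\pi r$, and assumes a copper trace. The 2.4 GHz frequency and 1 mm trace width are hypothetical, and the small-loop formula is only a good approximation while the circumference is well below a wavelength.

```python
import math

MU_0 = 4e-7 * math.pi      # free-space permeability [H/m]
EPS_0 = 8.8541878128e-12   # free-space permittivity [F/m]
C_LIGHT = 299_792_458.0    # speed of light [m/s]
SIGMA_CU = 5.96e7          # copper conductivity [S/m]


def radiation_resistance(radius_m: float, freq_hz: float) -> float:
    """Small-loop radiation resistance, proportional to (A / lambda^2)^2."""
    area = math.pi * radius_m ** 2
    wavelength = C_LIGHT / freq_hz
    wave_impedance = math.sqrt(MU_0 / EPS_0)
    return wave_impedance * (8.0 * math.pi ** 3 / 3.0) * (area / wavelength ** 2) ** 2


def skin_depth(freq_hz: float, sigma: float = SIGMA_CU) -> float:
    """Skin depth in metres for a non-magnetic conductor."""
    return math.sqrt(1.0 / (math.pi * freq_hz * MU_0 * sigma))


def loss_resistance(radius_m: float, freq_hz: float, trace_width_m: float,
                    sigma: float = SIGMA_CU) -> float:
    """Ohmic loss: conductor length times per-unit-length skin-effect resistance."""
    length = 2.0 * math.pi * radius_m  # assumed: one-turn circular loop
    return length / (sigma * trace_width_m * skin_depth(freq_hz, sigma))


def radiation_efficiency(radius_m: float, freq_hz: float, trace_width_m: float) -> float:
    r_rad = radiation_resistance(radius_m, freq_hz)
    r_loss = loss_resistance(radius_m, freq_hz, trace_width_m)
    return r_rad / (r_rad + r_loss)


FREQ_HZ = 2.4e9          # hypothetical operating frequency
TRACE_WIDTH_M = 1e-3     # hypothetical PCB trace width
radii_mm = [2, 5, 10, 20]
efficiencies = [radiation_efficiency(r * 1e-3, FREQ_HZ, TRACE_WIDTH_M) for r in radii_mm]
for r_mm, eta in zip(radii_mm, efficiencies):
    print(f"r = {r_mm:2d} mm: eta_rad = {eta:.3f}")
```

Because the radiation resistance scales with $r^4$ and the loss resistance with $r$, the efficiency rises steeply for small loops and flattens toward 1 once the radiation resistance dominates.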

<span id="page-111-0"></span>

Figure A.8: Antenna radiation efficiency vs antenna radius.

Besides the saturation limitation, another factor that limits the maximum antenna size is the parasitic capacitance. Due to the parasitic capacitance, there is a maximum equivalent inductance for the antenna at a given frequency. Since the antenna equivalent inductance is proportional to the antenna radius, the maximum radius is also limited. The relationship between the maximum antenna radius and maximum total power efficiency is simulated with a parasitic capacitance of 100fF and shown in Fig. [A.9.](#page-112-0)
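One way to quantify the inductance limit is to assume that, in the worst case, the parasitic capacitance alone resonates with the antenna inductance at the operating frequency, so that $L_{max} = 1/((2\pi f)^2 C_{par})$. The sketch below applies that assumption to the 100 fF parasitic mentioned above. The list of frequencies is hypothetical, and translating $L_{max}$ into a radius would additionally require a loop-inductance model that is not specified here.

```python
import math


def max_inductance(freq_hz: float, c_par_f: float) -> float:
    """Largest inductance that still resonates at freq_hz with only c_par_f."""
    return 1.0 / ((2.0 * math.pi * freq_hz) ** 2 * c_par_f)


C_PAR = 100e-15  # parasitic capacitance from the text [F]
for f_hz in (433e6, 915e6, 2.4e9):  # hypothetical operating frequencies
    print(f"{f_hz / 1e9:.3f} GHz: L_max = {max_inductance(f_hz, C_PAR) * 1e9:.1f} nH")

l_max_2g4 = max_inductance(2.4e9, C_PAR)
```

The limit falls with the square of the frequency, so the largest usable antenna shrinks quickly as the carrier frequency increases.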

The above analysis shows that the maximum power efficiency of the PO is proportional to the antenna radiation efficiency. When the antenna is small, increasing its size increases the radiation efficiency, but the radiation and maximum power efficiencies start to saturate when the antenna size reaches a limit. In addition, the parasitic capacitance and operation frequency also limit the maximum antenna size, which sets a trade-off between the maximum power efficiency, antenna size and operation frequency.

## A.4 Pulse-Drive PO-based TX

In this section, we first discuss what contributes to the total power consumption of a PO-based TX. Based on the power breakdown, we then discuss how to reduce each

<span id="page-112-0"></span>

Figure A.9: Maximum antenna radius and power efficiency vs frequency.

part so as to reduce the total power consumption. Finally, a pulse-drive power oscillator architecture is proposed to save the power dissipated in the cross-coupled transistors and increase the total power efficiency.

For the PO-based TX, the power consumption can be divided into three parts:  $P_{res,ind}$,  $P_{res,cap}$ and  $P_{cross}$, as shown in Fig. [A.10.](#page-113-0)  $P_{res,ind}$ is the power dissipated in the parasitic resistance of the inductor/antenna,  $P_{res,cap}$ is the power dissipated in the on-resistance of the switches that connect the capacitors to ground, and  $P_{cross}$ is the power dissipated in the cross-coupled transistors. Since the radiated power is directly proportional to  $P_{res,ind}$, with the same antenna (radiation efficiency), a higher radiated power needs a higher  $P_{res,ind}$, and increasing the antenna radiation efficiency lowers the  $P_{res,ind}$ requirement.

To reduce  $P_{res,cap}$, a smaller switch on-resistance is desired. Since the switch that connects the bottom plate of the capacitor is usually implemented using MOS transistors, increasing the size of the transistor reduces the on-resistance of the switch, but also introduces a larger parasitic capacitance when the switch is turned off, which degrades the tuning range of the TX. A differential switch is usually used to reduce the AC on-resistance by half while introducing only one additional transistor, as shown in Fig. [A.1.](#page-104-0) To effectively reduce the on-resistance of the switch without introducing additional parasitic capacitance, a voltage-boosting driver is sometimes used to drive the switch, which can reduce the on-resistance by 94% [\[8\]](#page-118-0).

<span id="page-113-0"></span>

Figure A.10: PO-based TX power dissipation illustration.

The third part,  $P_{cross}$, arises when the transistors  $M_1$ and  $M_2$ are turned on and conduct current from  $V_{out}$ to ground. With large input swings, we can assume that  $M_1$ and  $M_2$ experience complete switching, injecting nearly square current waveforms into the LC tank from the tail-current source, and the power is determined by the drain voltage amplitude of the cross-coupled devices. To minimize  $P_{cross}$ and increase the power efficiency, a pulse-drive power oscillator is proposed. The idea is that, instead of using a cross-coupled pair where the two transistors drive each other, a voltage pulse is generated to drive the gates of the two transistors so that they are only turned on when the drain voltages of  $M_1$ and  $M_2$ are low, which significantly reduces the power dissipated in the two transistors, as shown in Fig. [A.11.](#page-114-0)

Fig. [A.12](#page-115-0) shows how the gate-driven pulse is generated. The output signal on one side of the power oscillator is first passed through a level shifter to change its common-mode voltage to half  $V_{DD}$. The level shifter is implemented using an AC-coupled inverter. Then a normal inverter transforms the output sine wave into a square wave. Finally, the square wave goes through a pulse generator, which changes its duty

<span id="page-114-0"></span>

Figure A.11: Pulse drive PO illustration.

cycle, and the resulting pulse drives the gate of the transistor on the other side.

A simple quantitative analysis can show that the proposed pulse-drive power oscillator can effectively reduce the power consumption and increase the power efficiency. For a normal power oscillator with a tail current of  $I_{SS}$ , the power consumption is:

$$
P_{normal} = V_{DD} \times I_{SS}.\tag{A.12}
$$

The output signal on one side is:

$$
V_{out} = \frac{2}{\pi} I_{SS} R_P \cdot \cos(\omega_o t), \tag{A.13}
$$

and the output power is:

$$
P_{out,normal} = \eta \times 2 \frac{V_{out,rms}^2}{R_P} = \eta \times \frac{1}{R_P} \times \left(\frac{2}{\pi} I_{SS} R_P\right)^2. \tag{A.14}
$$

Suppose the duty cycle of the gate-driven signal is  $D$  and the tail current is  $I_{SS, pulse}$ . The

<span id="page-115-0"></span>

Figure A.12: PO gate-drive pulse generation.

Fourier series coefficients of the periodic pulse current  $I_{SS, pulse}$ are:

$$
a_n = 2 \frac{I_{SS, pulse}}{n\pi} \sin(n\pi D). \tag{A.15}
$$

Therefore, taking the fundamental component ($n = 1$), the single-ended output signal is

$$
V_{out, pulse} = \frac{2}{\pi} I_{SS, pulse} R_P \sin(\pi D) \cdot \cos(\omega_o t), \tag{A.16}
$$

and the output power is:

$$
P_{out, pulse} = \eta \times 2 \frac{V_{out, pulse, rms}^2}{R_P} = \eta \times \frac{1}{R_P} \times \left(\frac{2}{\pi} I_{SS, pulse} R_P \sin(\pi D)\right)^2. \tag{A.17}
$$

For the proposed TX to have the same output power as the normal TX, we need:

$$
P_{out,normal} = P_{out, pulse} \tag{A.18}
$$

$$
\eta \times \frac{1}{R_P} \times \left(\frac{2}{\pi} I_{SS} R_P\right)^2 = \eta \times \frac{1}{R_P} \times \left(\frac{2}{\pi} I_{SS, pulse} R_P \sin(\pi D)\right)^2, \tag{A.19}
$$

which gives:

$$
I_{SS, pulse} = \frac{I_{SS}}{\sin(\pi D)}.\tag{A.20}
$$

Therefore, with the same output power, the total power consumption of the proposed pulse-drive TX is:

$$
P_{pulse} = 2D \cdot V_{DD} I_{SS, pulse} = \frac{2D}{\sin(\pi D)} V_{DD} I_{SS}. \tag{A.21}
$$

With the above equations, we can compare the power consumption of the proposed TX with that of the normal TX when they have the same output power. The power consumption ratio ( $P_{pulse}/P_{normal}$ ) of the proposed pulse-drive TX versus the duty cycle is shown in Fig. [A.13.](#page-117-0) The figure shows that the proposed TX saves more power at a lower duty cycle. There is, however, a minimum pulse width required to sustain the oscillation of the PO, and this is only a simplified model for estimating the power saving of the proposed method. To verify the effectiveness of the proposed method, a schematic of the circuit is implemented, and the simulated waveform is shown in Fig. [A.14.](#page-117-1) The simulation results show that with about a 30% duty cycle, the proposed TX saves 40% of the total power consumption when it has the same output power as the normal PO.
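The ratio predicted by the simplified model, $P_{pulse}/P_{normal} = 2D/\sin(\pi D)$, is easy to tabulate. The sketch below evaluates only this closed-form expression. It does not model the minimum pulse width or any circuit-level effects, so it should not be expected to match the schematic simulation exactly. The function name and the list of duty cycles are illustrative.

```python
import math


def pulse_power_ratio(duty: float) -> float:
    """P_pulse / P_normal at equal output power, from the simplified model."""
    if not 0.0 < duty <= 0.5:
        raise ValueError("duty cycle must be in (0, 0.5]")
    return 2.0 * duty / math.sin(math.pi * duty)


for d in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"D = {d:.1f}: ratio = {pulse_power_ratio(d):.3f}")

ratio_half = pulse_power_ratio(0.5)    # each device conducts half the period
ratio_small = pulse_power_ratio(1e-6)  # approaches 2/pi for very narrow pulses
```

At a 50% duty cycle the ratio is 1, and it decreases monotonically toward $2/\pi$ as the pulse narrows, which is the saving trend described above.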

## A.5 Summary

In this section, we first discussed the background and motivation of the power-oscillator-based transmitter, and proposed a switch-capacitor fast start-up power oscillator architecture to reduce the start-up time. Then, the trade-off between the maximum power efficiency, antenna size and operation frequency was discussed. The analysis shows that the maximum power efficiency of the PO is proportional to the antenna radiation efficiency. When the antenna is small, increasing its size increases the radiation efficiency, but the radiation and maximum power efficiencies start to saturate when the antenna size reaches a limit. In addition, the parasitic capacitance and operation frequency also limit the maximum antenna size. Finally, a pulse-drive power oscillator was proposed, which uses a pulse to drive the transistor gates and only turns them on when the output voltage amplitude is low, significantly reducing the total power and increasing the power efficiency.

<span id="page-117-0"></span>

<span id="page-117-1"></span>Figure A.13: Power consumption ratio  $(P_{pulse}/P_{normal})$  of the proposed TX vs duty cycle of the gate-driven pulse when they have the same radiation power.

Figure A.14: Simulated gate-driven pulse and output waveform.

## References

- [1] S.-C. Park, "The Fourth Industrial Revolution and implications for innovative cluster policies," in *AI Soc*, Dec. 2018.
- [2] United States Environmental Protection Agency. (2017) Outdoor water use in the United States. [Online]. Available: [https://19january2017snapshot.epa.gov/www3/watersense/pubs/outdoor.html](https://19january2017snapshot.epa.gov/www3/watersense/pubs/outdoor.html)
- [3] Designed using resources from freepik.com. [Online]. Available: [https://Freepik.com](https://Freepik.com)
- [4] X. Wang, H. Huang, and Q. Li, "Design Consideration of Ultra-Low Voltage Self-Calibrated SAR ADC," in *IEEE Transactions on Circuits and Systems–II*, Apr. 2015.
- [5] X. Wang, X. Zhou, and Q. Li, "An Energy-Efficient High Speed Segmented Prequantize and Bypass DAC for SAR ADCs," in *IEEE Int. Midwest Symposium on Circuits and Systems (MWSCAS)*, Aug 2014.
- [6] P. M. J. Huang, "A 112-dB SFDR 89-dB SNDR VCO-based Sensor Front-end Enabled by Background-Calibrated Differential Pulse Code Modulation," in *IEEE Journal of Solid-State Circuits*, Apr. 2021.
- [7] A. Yeknami, X. Wang, S. Imani, A. Nikoofard, I. Jeerapan, J. Wang, and P. Mercier, "A 0.3V Biofuel-Cell-Powered Glucose/Lactate Biosensing System Employing a 180nW 64dB SNR Passive ∆Σ ADC and a 920MHz Wireless Transmitter," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2018.
- <span id="page-118-0"></span>[8] A. Yeknami, X. Wang, I. Jeerapan, S. Imani, A. Nikoofard, J. Wang, and P. Mercier, "A 0.3V Biofuel-Cell-Powered Glucose/Lactate Biosensing System," in *IEEE Journal of Solid-State Circuits*, Nov. 2018.
- [9] H. Wang and P. Mercier, "A 51 pW Reference-Free Capacitive-Discharging Oscillator Architecture Operating at 2.8 Hz," in *IEEE Custom Integrated Circuits Conference (CICC)*, Sept. 2015.
- [10] X. Wang and P. Mercier, "A 11.1nJ-Start-up 16/20MHz Crystal Oscillator with Multi-Path Feedforward Negative Resistance Boosting and Optional Dynamic Pulse Width Injection," in *IEEE Custom Integrated Circuits Conference (CICC)*, Mar. 2020.
- [11] Y. Lin, D. Sylvester, and D. Blaauw, "A 150pW program-and-hold timer for ultralow-power sensor platforms," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2009.
- [12] X. Wang, P.-H. Wang, Y. Cao, and P. Mercier, "A 0.6V 75nW All-CMOS Temperature Sensor with 1.67 m°C/mV Supply Sensitivity," in *IEEE Transactions on Circuits and Systems I: Regular Papers*, Sep. 2017.
- [13] P. Mercier, S. Bandyopadhyay, A. Lysaght, K. Stankovic, and A. Chandrakasan, "A Sub-nW 2.4 GHz Transmitter for Low Data-Rate Sensing Applications," in *IEEE Journal of Solid-State Circuits*, July 2014.
- [14] H. Wang, X. Wang, J. Park, A. Barfidokht, J. Wang, and P. P. Mercier, "A 5.5 nW battery-powered wireless ion sensing system," in *IEEE European Solid State Circuits Conference (ESSCIRC)*, 2017.
- [15] J. Li and J. Gu, "An 8.5-11 GHz CMOS Transmitter with >19 dBm OP1dB and 24% Efficiency," in *IEEE Custom Integrated Circuits Conference (CICC)*, Mar. 2019.
- [16] Y. Lin, D. Sylvester, and D. Blaauw, "A sub-pW timer using gate leakage for ultra low-power sub-Hz monitoring systems," in *IEEE Custom Integrated Circuits Conference (CICC)*, 2007.
- [17] Y. Okuma, K. Ishida, Y. Ryu, X. Zhang, P.-H. Chen, K. Watanabe, M. Takamiya, and T. Sakurai, "0.5-V input digital LDO with 98.7% current efficiency and 2.7 µA quiescent current in 65nm CMOS," in *IEEE Custom Integrated Circuits Conference (CICC)*, Sept. 2010.
- [18] M. Onouchi, K. Otsuga, Y. Igarashi, T. Ikeya, S. Morita, K. Ishibashi, and K. Yanagisawa, "A 1.39-V Input Fast-Transient-Response Digital LDO Composed of Low-Voltage MOS Transistors in 40-nm CMOS Process," in *IEEE Asian Solid-State Circuits Conference (A-SSCC)*, Nov. 2011.
- [19] S. B. Nasir, S. Gangopadhyay, and A. Raychowdhury, "A 0.13 µm Fully Digital Low-Dropout Regulator with Adaptive Control and Reduced Dynamic Stability for Ultra-Wide Dynamic Range," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2015.
- [20] F. Yang and P. K. T. Mok, "A Nanosecond-Transient Fine-Grained Digital LDO With Multi-Step Switching Scheme and Asynchronous Adaptive Pipeline Control," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 9, pp. 2463–2474, September 2017.
- [21] G. A. Rincon-Mora and P. E. Allen, "A Low-Voltage, Low Quiescent Current, Low Drop-Out Regulator," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 1, pp. 36–44, January 1998.
- [22] C.-J. Park, M. Onabajo, and J. Silva-Martinez, "External Capacitor-Less Low Drop-Out Regulator With 25 dB Superior Power Supply Rejection in the 0.4–4 MHz Range," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 2, pp. 486–501, February 2014.
- [23] W. Xu, P. Upadhyaya, X. Wang, R. Tsang, and L. Lin, "A 1A LDO Regulator Driven by a 0.0013mm<sup>2</sup> Class-D Controller," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2017.
- [24] L. G. Salem, J. Warchall, and P. P. Mercier, "A Successive Approximation Recursive Digital Low-Dropout Voltage Regulator With PD Compensation and Sub-LSB Duty Control," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 1, pp. 35–49, January 2018.
- [25] D. Kim and M. Seok, "A Fully Integrated Digital Low-Dropout Regulator Based on Event-Driven Explicit Time-Coding Architecture," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 11, pp. 3071–3080, November 2017.
- [26] D. Kim, J. Kim, H. Ham, and M. Seok, "A 0.5V-$V_{IN}$ 1.44mA-Class Event-Driven Digital LDO with a Fully Integrated 100pF Output Capacitor," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2017.
- [27] M. Huang, Y. Lu, S.-P. U, and R. P. Martins, "An Output-Capacitor-Free Analog-Assisted Digital Low-Dropout Regulator with Tri-Loop Control," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2017.
- [28] X. Ma, Y. Lu, R. P. Martins, and Q. Li, "A 0.4V 430nA Quiescent Current NMOS Digital LDO with NAND-Based Analog-Assisted Loop in 28nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2018.
- [29] Y. Lu, F. Yang, F. Chen, and P. K. T. Mok, "A 500mA Analog-Assisted Digital-LDO-Based On-Chip Distributed Power Delivery Grid with Cooperative Regulation and IR-Drop Reduction in 65nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2018.
- [30] Y. Zhang, H. Song, R. Zhou, W. Rhee, I. Shim, and Z. Wang, "A Capacitor-Less Ripple-Less Hybrid LDO With Exponential Ratio Array and 4000x Load Current Range," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 66, no. 1, pp. 36–40, January 2019.
- [31] M. Huang and Y. Lu, "An Analog-Proportional Digital-Integral Multi-Loop Digital LDO with Fast Response, Improved PSR and Zero Minimum Load Current," in *IEEE Custom Integrated Circuits Conference (CICC)*, Apr. 2019.
- [32] X. Liu, H. K. Krishnamurthy, T. Na, S. Weng, K. Z. Ahmed, K. Ravichandran, J. Tschanz, and V. De, "A Modular Hybrid LDO with Fast Load-Transient Response and Programmable PSRR in 14nm CMOS Featuring Dynamic Clamp Tuning and Time-Constant Compensation," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2019.
- [33] S. B. Nasir, S. Sen, and A. Raychowdhury, "Switched-Mode-Control Based Hybrid LDO for Fine-Grain Power Management of Digital Load Circuits," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 2, pp. 569–581, February 2018.
- [34] X. Wang and P. P. Mercier, "A Charge-Pump-based Digital LDO Employing an AC-Coupled High-Z Feedback Loop Towards a sub-4fs FoM and a 105,000x Stable Dynamic Current Range," in *IEEE Custom Integrated Circuits Conference (CICC)*, Apr. 2019.
- [35] X. Wang and P. Mercier, "A Dynamically High-Impedance Charge-Pump-Based LDO With Digital-LDO-Like Properties Achieving a Sub-4-fs FoM," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 3, pp. 719–730, March 2020.
- [36] G. Bontempo, T. Signorelli, and F. Pulvirenti, "Low Supply Voltage, Low Quiescent Current, ULDO Linear Regulator," in *Proc. IEEE Int. Conf. Electronics, Circuits and Systems*, 2001, pp. 409–412.
- [37] G. W. Besten and B. Nauta, "Embedded 5V-to-3.3V Voltage Regulator for Supplying Digital IC's in 3.3V CMOS Technology," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 7, pp. 956–962, July 1998.
- [38] M. Huang, Y. Lu, S.-P. U, and R. P. Martins, "An Analog-Assisted Tri-Loop Digital Low-Dropout Regulator," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 1, pp. 20–34, January 2018.
- [39] S. Gangopadhyay, Y. Lee, S. B. Nasir, and A. Raychowdhury, "Modeling and Analysis of Digital Linear Dropout Regulators with Adaptive Control for High Efficiency under Wide Dynamic Range Digital Loads," in *Proc. Design, Autom. Test Eur. Conf. Exhibit. (DATE)*, Mar. 2014.
- [40] S. Gangopadhyay, D. Somasekhar, J. W. Tschanz, and A. Raychowdhury, "A 32 nm Embedded, Fully-Digital, Phase-Locked Low Dropout Regulator for Fine Grained Power Management in Digital Circuits," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 11, pp. 2684–2693, November 2014.
- [41] M. Kurchuk, C. Weltin-Wu, D. Morche, and Y. Tsividis, "Event-Driven GHz-Range Continuous-Time Digital Signal Processor With Activity-Dependent Power Dissipation," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 9, pp. 2164–2173, September 2012.
- [42] W.-J. Tsou, W.-H. Yang, J.-H. Lin, H. Chen, K.-H. Chen, C.-L. Wey, Y.-H. Lin, S.-R. Lin, and T.-Y. Tsai, "Digital Low-Dropout Regulator with Anti PVT-Variation Technique for Dynamic Voltage Scaling and Adaptive Voltage Scaling Multicore Processor," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2017.
- [43] S. Kundu, M. Liu, R. Wong, S.-J. Wen, and C. H. Kim, "A Fully Integrated 40pF Output Capacitor Beat-Frequency-Quantizer-Based Digital LDO with Built-In Adaptive Sampling and Active Voltage Positioning," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2018.
- [44] D. Griffith, J. Murdock, and P. T. Røine, "A 24MHz Crystal Oscillator with Robust Fast Start-Up Using Dithered Injection," in *ISSCC*, Feb. 2016.
- [45] H. Esmaeelzadeh and S. Pamarti, "A precisely-timed energy injection technique achieving 58/10/2 µs start-up in 1.84/10/50 MHz crystal oscillators," in *CICC*, May 2017.
- [46] K. Lei, P.-I. Mak, M.-K. Law, and R. P. Martins, "A Regulation-Free Sub-0.5V 16/24MHz Crystal Oscillator for Energy-Harvesting BLE Radios with 14.2nJ Startup Energy and 31.8µW Steady-State Power," in *ISSCC*, Feb. 2018.
- [47] K. M. Megawer, N. Pal, A. Elkholy, M. G. Ahmed, A. Khashaba, D. Griffith, and P. K. Hanumolu, "A 54MHz Crystal Oscillator with 30x Start-Up Time Reduction Using 2-Step Injection in 65nm CMOS," in *ISSCC*, Feb. 2019.
- [48] M. Miyahara, Y. Endo, K. Okada, and A. Matsuzawa, "A 64 µs Start-Up 26/40MHz Crystal Oscillator with Negative Resistance Boosting Technique Using Reconfigurable Multi-Stage Amplifier," in *VLSI Symposium*, 2018.
- [49] S. W. Park, P. S. Das, A. Chhetry, and J. Y. Park, "A Flexible Capacitive Pressure Sensor for Wearable Respiration Monitoring System," *IEEE Sensors Journal*, vol. 17, no. 20, pp. 6558–6564, Oct. 2017.
- [50] Y. Jiao, C. W. Young, S. Yang, S. Oren, H. Ceylan, S. Kim, K. Gopalakrishnan, P. C. Taylor, and L. Dong, "Wearable Graphene Sensors With Microfluidic Liquid Metal Wiring for Structural Health Monitoring and Human Body Motion Sensing," *IEEE Sensors Journal*, vol. 16, no. 22, pp. 7870–7875, Nov. 2016.
- [51] H. Wang and P. P. Mercier, "Near-Zero-Power Temperature Sensing via Tunneling Currents Through Complementary Metal-Oxide-Semiconductor Transistors," *Scientific Reports*, vol. 7, no. 4427, June 2017.
- [52] X. Zhang and Y. Lian, "A 300-mV 220-nW Event-Driven ADC With Real-Time QRS Detection for Wearable ECG Sensors," *IEEE Transactions on Biomedical Circuits and Systems*, vol. 8, no. 6, pp. 834–843, Dec. 2014.
- [53] J. P. Carmo, N. S. Dias, H. R. Silva, P. M. Mendes, C. Couto, and J. H. Correia, "A 2.4-GHz Low-Power/Low-Voltage Wireless Plug-and-Play Module for EEG Applications," *IEEE Sensors Journal*, vol. 7, no. 11, pp. 1524–1531, Nov. 2007.
- [54] A. J. Bandodkar, D. Molinnus, O. Mirza, T. Guinovart, J. R. Windmiller, G. Valdés-Ramírez, F. J. Andrade, M. J. Schöning, and J. Wang, "Epidermal Tattoo Potentiometric Sodium Sensors with Wireless Signal Transduction for Continuous Non-Invasive Sweat Monitoring," *Biosensors and Bioelectronics*, vol. 54, pp. 603–609, Apr. 2014.
- [55] A. J. Bandodkar, W. Jia, and J. Wang, "Tattoo-Based Wearable Electrochemical Devices: A Review," *Electroanalysis*, vol. 27, no. 3, pp. 562–572, Mar. 2015.
- [56] J. Kim, T. N. Cho, G. Valdés-Ramírez, and J. Wang, "A Wearable Fingernail Chemical Sensing Platform: pH Sensing at Your Fingertips," *Talanta*, vol. 150, pp. 622–628, Apr. 2016.
- [57] H. Wang, X. Wang, J. Park, A. Barfidokht, J. Wang, and P. P. Mercier, "A Battery-Powered Wireless Ion Sensing System Consuming 5.5 nW of Average Power," in *IEEE Journal of Solid-State Circuits*, July 2017.
- [58] P. P. Mercier, A. C. Lysaght, S. Bandyopadhyay, A. P. Chandrakasan, and K. M. Stankovic, "Energy Extraction from the Biologic Battery in the Inner Ear," *Nature Biotechnology*, vol. 30, no. 12, pp. 1240–1243, Dec. 2012.
- [59] W. Jia, X. Wang, S. Imani, A. J. Bandodkar, J. Ramírez, P. P. Mercier, J. Wang, M. D. Lima, M. E. Kozlov, R. H. Baughman, S. J. Kim, S. W. Kim, R. Chowdhury, M. Ying, L. Z. Xu, M. Li, H. J. Chung, H. Keum, M. McCormick, P. Liu, Y. W. Zhang, F. G. Omenetto, Y. G. Huang, T. Coleman, and J. A. Rogers, "Wearable Textile Biofuel Cells for Powering Electronics," *J. Mater. Chem. A*, vol. 2, no. 43, pp. 18184–18189, Sep. 2014.
- [60] J. Sandifer and R. Buck, "Impedance Characteristics of Ion Selective Glass Electrodes," *Journal of Electroanalytical Chemistry and Interfacial Electrochemistry*, vol. 56, no. 3, pp. 385–398, Nov 1974.
- [61] X. Wang and Q. Li, "A High-Speed Energy-Efficient Segmented Prequantize and Bypass DAC for SAR ADCs," in *IEEE Transactions on Circuits and Systems–II*, Aug. 2015.
- [62] M. Liu, A. van Roermund, and P. Harpe, "A 10b 20MS/s SAR ADC with a Low-Power and Area-Efficient DAC-Compensated Reference," in *ESSCIRC 2017 - 43rd IEEE European Solid State Circuits Conference*, Sept 2017, pp. 231–234.
- [63] J. Craninckx and G. van der Plas, "A 65fJ/Conversion-Step 0-to-50MS/s 0-to-0.7mW 9b Charge-Sharing SAR ADC in 90nm Digital CMOS," in *2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers*, Feb. 2007, pp. 246–600.
- [64] M. van Elzakker, E. van Tuijl, P. Geraedts, D. Schinkel, E. A. M. Klumperink, and B. Nauta, "A 10-bit Charge-Redistribution ADC Consuming 1.9 µW at 1 MS/s," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 5, pp. 1007–1015, May 2010.
- [65] K.-H. Chang and C.-C. Hsieh, "A 12b 10MS/s 18.9fJ/conversion-step sub-radix-2 SAR ADC," in *2016 International Symposium on VLSI Design, Automation and Test (VLSI-DAT)*, Apr 2016, pp. 1–4.
- [66] B. Malki, T. Yamamoto, B. Verbruggen, P. Wambacq, and J. Craninckx, "A 70dB DR 10b 0-to-80MS/s Current-Integrating SAR ADC with Adaptive Dynamic Range," in *2012 IEEE International Solid-State Circuits Conference*, Feb. 2012, pp. 470–472.
- [67] X. Wang and Q. Li, "A 10-bit 150MS/s SAR ADC with Parallel Segmented DAC in 65nm CMOS," in *IEEE Int. Symposium on Circuits and Systems (ISCAS)*, June 2014.
- [68] P. J. A. Harpe, C. Zhou, Y. Bi, N. P. van der Meijs, X. Wang, K. Philips, G. Dolmans, and H. de Groot, "A 26 µW 8 bit 10 MS/s Asynchronous SAR ADC for Low Energy Radios," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 7, pp. 1585–1595, Jul. 2011.
- [69] H. Wang and P. P. Mercier, "A Reference-Free Capacitive-Discharging Oscillator Architecture Consuming 44.4 pW/75.6 nW at 2.8 Hz/6.4 kHz," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 6, pp. 1423–1435, June 2016.
- [70] P. P. Mercier, S. Bandyopadhyay, A. C. Lysaght, K. M. Stankovic, and A. P. Chandrakasan, "A Sub-nW 2.4 GHz Transmitter for Low Data-Rate Sensing Applications," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 7, pp. 1463–1474, Jul. 2014.
- [71] H. A. Wheeler, "Fundamental Limitations of Small Antennas," *Proceedings of the IRE*, vol. 35, no. 12, pp. 1479–1484, Dec. 1947.
- [72] S. Ha, A. Akinin, J. Park, C. Kim, H. Wang, C. Maier, P. P. Mercier, and G. Cauwenberghs, "Silicon-Integrated High-Density Electrocortical Interfaces," *Proceedings of the IEEE*, vol. 105, no. 1, pp. 11–33, Jan 2017.
- [73] M. D. Seeman and S. R. Sanders, "Analysis and Optimization of Switched-Capacitor DC-DC Converters," *IEEE Transactions on Power Electronics*, vol. 23, no. 2, pp. 841–851, Mar. 2008.