# **UCLA Electronic Theses and Dissertations**

## **Title**

Efficient Track-and-Hold Techniques for High Speed Time-interleaved ADCs

**Permalink** <https://escholarship.org/uc/item/6cq7d53m>

**Author** Wang, Xiao

**Publication Date** 2018

Peer reviewed|Thesis/dissertation

## UNIVERSITY OF CALIFORNIA

Los Angeles

Efficient Track-and-Hold Techniques for High Speed Time-interleaved ADCs

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical and Computer Engineering

by

Xiao Wang

2018

## © Copyright by

Xiao Wang

2018

#### ABSTRACT OF THE DISSERTATION

#### Efficient Track-and-Hold Techniques for High Speed Time-interleaved ADCs

by

Xiao Wang

Doctor of Philosophy in Electrical and Computer Engineering

University of California, Los Angeles, 2018

Professor Mau-Chung Frank Chang, Chair

Time-interleaving (TI) can relax the power-speed tradeoffs of analog-to-digital converters (ADCs) and reduce their metastability error rate, but it is not free. Track-and-hold (T&H) nonlinearity, noise and power are the main limitations of high-speed, high-resolution, low-power ADCs. This dissertation introduces two efficient T&H design techniques that improve the performance of TI-ADCs without sophisticated calibration. Two fabricated chips, an 8b 2-GS/s ADC and an 8b 8.8-GS/s ADC, provide silicon verification of the proposed methods.

Two prototype ICs were designed during this work. First, a two-way time-interleaved pipelined ADC architecture was built upon a new concept of virtual-ground sampling, featuring merged front-end T/H, residue generation, input termination, and buffering. This architecture is investigated to alleviate the front-end performance tradeoff among THD, bandwidth, and sample rate (interleaving factor). A 2-GS/s 8b ADC using the new architecture was designed and fabricated in 28-nm CMOS, achieving 43-dB SNDR and 55-dB SFDR up to the Nyquist frequency.

Second, a complementary dual-loop-assisted track-and-hold buffer is introduced to achieve both high linearity and bandwidth with low power. The prototype, an 8.8-GS/s 16-way time-interleaved asynchronous SAR ADC fabricated in 28-nm CMOS technology, also employs a two-level 2×8 master-slave hierarchical interleaved architecture. It achieves 38.4-dB SNDR and 50-dB SFDR with a Nyquist input at an 8.8-GS/s sampling rate and consumes 83.4 mW, resulting in a 140 fJ/conv.-step Walden FOM including the buffers.

The dissertation of Xiao Wang is approved.

Ya-Hong Xie

Katsushi Arisaka

Danijela Cabric

Mau-Chung Frank Chang, Committee Chair

University of California, Los Angeles

2018

## **DEDICATION**

*To my lovely mother*

# **LIST OF FIGURES**







Fig 4.7 Simulated THD of front-end S/Hs with proposed and conventional SF buffers with

Fig 4.8 Voltage gain of front-end S/Hs with proposed and conventional SF buffers with similar

Fig 4.9 High speed clock generation with timing skew adjustment cell and 8-phase non-overlap

Fig 4.17 Measured SNDR/SFDR versus sampling rate for $f_{in} = 141.2$ MHz


## **ACKNOWLEDGEMENTS**

I am grateful to the many people who supported and encouraged me during the graduate study leading to this thesis: professors, colleagues, alumni, friends and my parents.

I would like to thank my advisor, Professor M. C. Frank Chang, for his patience, encouragement, guidance and support. I had a hard time at the beginning of my Ph.D. research because I had worked in the Radio-Frequency Integrated Circuits (RFIC) field during my Master's study. Professor Chang helped select a challenging and worthwhile research topic and guided me throughout this endeavor. He helped me build knowledge in this new field gradually, provided me with very useful ideas whenever I came across a technical problem, and gave me great opportunities to tape out in advanced process technologies. Hardworking, passionate and creative as he is, he sets an excellent example for me in my future career. I am also grateful to the other members of my committee, Ya-Hong Xie, Katsushi Arisaka and Danijela Cabric.

I would also like to thank my friends, colleagues and alumni at UCLA. I feel privileged to have worked with these stimulating researchers, especially Yilei Li, Yuan Du, Jieqiong Du, Yan Zhao, Wei-Han Cho, and Chien-Heng Wong. I would also like to thank Dr. Hui Pan, our alumnus at Broadcom, and Chi-Hang Chan at the University of Macau for their invaluable advice on this work and on the preparation of this manuscript. Janet Lin assisted greatly in ordering electronic components for testing and in polishing the grammar of my manuscript.

I was blessed to have these friends around me. Long Kong, Hao Xu, Dihang Yang, Wenlong Jiang, Kejian Shi, Hsin-Hao Chiang and Suzi Chao helped me a lot throughout my study and life at UCLA. I had a wonderful time together with them.

Most importantly, I would like to give my greatest gratitude to my family and my girlfriend. They give me their endless love and constant moral support. They are always with me at hard times and give me confidence to overcome all obstacles. I cannot adequately express the love and gratitude I feel for them.

## **VITA**

2011 B.E., Electrical Engineering, University of Electronic Science and Technology of China

2011 R&D Intern, Texas Instruments, Shenzhen, China

2012 Analog Circuits Design Intern, Qualcomm Technologies, Inc, San Diego, USA

2013 M.S., Electrical and Computer Engineering, University of California, Santa Barbara, USA

2014 Analog Circuits Design Intern, OmniVision, Santa Clara, USA

2015 Mixed-Signal Circuits Design Intern, Broadcom Limited, Irvine, USA

2017 Mixed-Signal Circuits Design Intern, Qualcomm Technologies, Inc, San Diego, USA

## **PUBLICATIONS**

**X. Shawn Wang**, et al., "An 8.8-GS/s 8b Time-Interleaved SAR ADC with 50-dB SFDR using Complementary Dual-Loop-Assisted Buffers in 28-nm CMOS," *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, June 2018. **2018 RFIC Best Student Paper Award Finalist**.

C-H Wong, Y Li, J Du, **Xiao Wang**, Mau-Chung Frank Chang, "0.75 V 2.6 GHz digital bang–bang PLL with dynamic double-tail phase detector and supply-noise-tolerant gm-controlled DCO," *IET Electronics Letters*, vol. 54, no. 4, pp. 198-200, Jan. 2018.

Yuan Du, Li Du, Xuefeng Gu, **Xiao Wang**, Mau-Chung Frank Chang, "A Memristive Neural Network Computing Engine using CMOS-Compatible Charge-Trap-Transistor (CTT)," *arXiv preprint arXiv:1709.06614*, Sep. 2017.

**X. Shawn Wang**, et al., "A 2-GS/s 8-bit ADC Featuring Virtual-Ground Sampling Interleaved Architecture in 28-nm CMOS," *IEEE Transactions on Circuits and Systems-II (TCAS-II)*, Sep. 2017.

**X. Shawn Wang**, et al., "Concurrent Design Analysis of High-Linearity SP10T Switch With 8.5 kV ESD Protection," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 49, no. 9, pp. 1927-1941, Sep. 2014.

**X. Shawn Wang** and C. Patrick Yue, "A Dual-band SP6T T/R Switch in SOI CMOS with 37 dBm P−0.1dB for GSM/W-CDMA Handsets," *IEEE Trans. Microwave Theory and Tech (TMTT)*, vol. 62, no. 4, pp. 861-870, Apr. 2014.

**X. Shawn Wang**, et al., "A smartphone SP10T T/R switch in 180-nm SOI CMOS with 8 kV+ ESD protection by co-design," in *IEEE Custom Integr. Circuits Conf (CICC)*, Sep. 2013.

## **CHAPTER 1**

## **Introduction**

#### **1.1. Motivation**

Pulsed-wave radar is one of the modern millimeter-wave radar systems. It can detect objects with higher range resolution than continuous-wave radar systems [1], [2], as shown in Fig. 1.1. ADC-based backplane receivers and coherent fiber-optical receivers are promising technologies for next-generation wireline communication systems [3], [4], as shown in Fig. 1.2. For both technologies, a high-speed analog-to-digital converter (ADC) operating at multiple GS/s is a key enabler.

Time-interleaved (TI) analog-to-digital converters (ADCs) have been gaining popularity in the past few years, not only because time-interleaving relaxes the power-speed tradeoffs of ADCs and reduces their metastability error rate, but also because newer CMOS technologies no longer provide significant speed advantages. Furthermore, successive approximation register (SAR) ADCs benefit greatly from their simple analog configuration and excellent power and area efficiency as CMOS technology scales down. However, the benefit brought by time-interleaving is not free. In addition to requiring extra area and more complex clocking and routing networks, massively interleaved structures not only increase the input load, which limits the signal bandwidth, but also incur various issues including offset, gain and timing mismatches. Consequently, complex calibrations [5-8] have been required, increasing system design complexity, routing, clocking, and total chip area. To achieve sufficient bandwidth, track-and-hold (T&H) buffers are usually employed before and/or after the sampling switches in TI ADCs. Conventional T&H buffers [9-11] face severe design challenges in bandwidth, slewing speed, linearity and noise.



**Fig 1.1, Millimeter-wave pulsed radar system [2]**



**Fig 1.2, ADC-based data link transceivers [3]**

Fig. 1.3 shows a plot of the Walden figure-of-merit ( $FOM_W = P/(2^{ENOB} \times f_S)$ ) vs. conversion speed for SAR ADCs using the data collected in [12]. It is evident that recent designs cover a wide performance spectrum (conversion speeds from several GS/s to several tens of GS/s) and achieve very good power-efficiency at low to moderate conversion speeds (< 300 MS/s). This can be attributed to the improvement in CMOS process technology, new ADC architecture developments and circuit level optimizations.

**Fig 1.3, FOM<sub>W</sub> vs. conversion speed showing ADC performance [12]**
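As a quick numerical check of the Walden figure of merit, the sketch below evaluates $FOM_W$ for the 8.8-GS/s prototype using the power, SNDR and sample rate quoted in the abstract. The conversion from SNDR to ENOB uses the standard full-scale sine relation $ENOB = (SNDR - 1.76)/6.02$, which is an assumption here since the text does not state how ENOB was obtained; the function names are illustrative only.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR in dB (standard full-scale sine relation)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w: float, sndr_db: float, fs_hz: float) -> float:
    """Walden figure of merit, FOM_W = P / (2**ENOB * f_S), in joules per conversion step."""
    return power_w / (2.0 ** enob_from_sndr(sndr_db) * fs_hz)


# Values quoted in the abstract for the 8.8-GS/s prototype.
fom_w = walden_fom(power_w=83.4e-3, sndr_db=38.4, fs_hz=8.8e9)
print(f"ENOB  = {enob_from_sndr(38.4):.2f} bits")
print(f"FOM_W = {fom_w * 1e15:.1f} fJ/conv.-step")
```

Under these assumptions the result lands close to the 140 fJ/conv.-step reported for the prototype.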

In this dissertation, I report two efficient T&H front-end architectures for high-speed time-interleaved ADCs. First, I propose a time-interleaved two-step ADC architecture built upon a new concept of virtual-ground sampling and using a merged front-end T/H with residue generation, input termination and buffering, aiming to push the effective number of bits (ENOB) of time-interleaved ADCs from the current 5-6b range to the 7-9b level without degrading conversion rates. Thereafter, I introduce an 8.8-GS/s 16-way time-interleaved asynchronous SAR ADC that employs a two-level 2×8 master-slave hierarchical interleaved architecture. A complementary dual-loop-assisted buffer is proposed to achieve both high linearity and bandwidth with low power.

#### **1.2. Dissertation Organization**

The remainder of this thesis is organized as follows. Chapter 2 introduces the basics and design challenges of time-interleaved ADCs and the track-and-hold front end. Chapter 3 explains the new concept of virtual-ground sampling and presents a time-interleaved two-step ADC architecture built upon it, using a merged front-end T/H with residue generation, input termination and buffering; the modeling, design and measurement results are also shown in that chapter. Chapter 4 describes a complementary dual-loop-assisted buffer for high-speed time-interleaved SAR ADCs. The design, analysis and measurements of an 8.8-GS/s, 50-dB SFDR prototype ADC in a 28-nm CMOS process are presented in that chapter. Chapter 5 concludes the dissertation.

## **CHAPTER 2**

## **Time-Interleaved ADCs and Track & Hold Front End**

### **2.1. Basics of Time-Interleaved ADC**

Fig. 2.1 shows an implementation of a time-interleaved (TI) ADC consisting of several channels, each with a track-and-hold (T&H) section and a sub-ADC [13]. The sample-rate of an interleaved ADC is N times the sample-rate of a sub-ADC, with N the number of channels. The main benefit of the time-interleaved architecture is that the overall sample-rate can be very high while the sub-ADCs only need a moderate sample-rate, enabling high power efficiency. A corresponding timing diagram is shown in Fig. 2.2. At each falling edge of the master-clock (MCLK), one of the T&Hs goes from track-mode to hold-mode and takes a sample of the input signal. For a master-clock with sample-rate $f_S$ and a number of channels N, each T&H and sub-ADC has a sample-rate of $f_S/N$. Making a time-interleaved ADC involves more than just placing a few non-interleaved ADCs in parallel, since the requirements for a non-interleaved T&H and ADC differ from those of a time-interleaved T&H and sub-ADC: aspects like offset, gain error and absolute timing, which are usually not an issue for a general-purpose non-interleaved T&H and ADC, are important for a time-interleaved architecture, as will be explained in this chapter. Moreover, the Nyquist frequency of a time-interleaved architecture is N times higher than that of a non-interleaved ADC running at the sub-ADC sample-rate, so the T&Hs should have a much higher bandwidth and should be able to sample signals with an N times higher frequency.



**Fig 2.1, Time-interleaved ADC architecture**



**Fig 2.2, Timing diagram of a time-interleaved ADC**
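The round-robin operation described above can be sketched in a few lines: sample k is taken by channel k mod N, each channel runs at $f_S/N$, and the channel outputs are multiplexed back into one full-rate stream. This is a behavioural illustration only; the helper names are hypothetical, and the sample rate and interleaving factor are taken from the second prototype described in the abstract.

```python
def split_into_channels(samples, n_channels):
    """Round-robin assignment: sample k is taken by channel k mod N."""
    return [samples[c::n_channels] for c in range(n_channels)]


def interleave(channel_outputs):
    """Multiplex equal-length per-channel outputs back into one full-rate stream."""
    n_channels = len(channel_outputs)
    per_channel = len(channel_outputs[0])
    return [channel_outputs[k % n_channels][k // n_channels]
            for k in range(n_channels * per_channel)]


FS_HZ = 8.8e9      # overall sample rate of the second prototype
N_CHANNELS = 16    # interleaving factor of the second prototype
sub_adc_rate_hz = FS_HZ / N_CHANNELS  # each T&H and sub-ADC runs at f_S / N

stream = list(range(32))  # stand-in for consecutive sample indices
channels = split_into_channels(stream, N_CHANNELS)
restored = interleave(channels)

print(f"sub-ADC rate = {sub_adc_rate_hz / 1e6:.0f} MS/s")
print("channel 1 takes samples", channels[1])
```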

### **2.2. Mismatches of Time-Interleaved ADCs**

Because of random mismatch and process, voltage and temperature variations, each sub-ADC may have different characteristics. These mismatches can seriously limit the ADC linearity.

The effects of offset and gain mismatches have been studied extensively [14-16]. Consider a two-way interleaved ADC. In the absence of an input signal, the two sub-ADCs digitize their own DC offsets, as shown in Fig. 2.3. Consequently, the multiplexed output toggles between the two offsets and presents a pattern at $f_S/2$, so the digital output no longer agrees with the analog input.



**Fig 2.3, Offset mismatch of TI-ADC**
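The toggling pattern can be reproduced with a small behavioural sketch: two sub-ADCs with different offsets digitize a zero input, and a direct DFT of the multiplexed output shows energy only at DC and at $f_S/2$. The offset values and the record length are hypothetical, chosen only for illustration.

```python
import cmath
import math


def dft_magnitude(x):
    """Normalized DFT magnitude |X[k]|/n for bins 0..n/2 (direct O(n^2) DFT)."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        mags.append(abs(acc) / n)
    return mags


# Hypothetical offsets of sub-ADC1 and sub-ADC2 (in LSB); the input is zero.
offsets = [0.5, -0.3]
n_samples = 64
output = [offsets[m % 2] for m in range(n_samples)]

spectrum = dft_magnitude(output)
print(f"DC bin    : {spectrum[0]:.3f}")               # mean of the two offsets
print(f"f_S/2 bin : {spectrum[n_samples // 2]:.3f}")  # half the offset difference
```

In this two-way case the DC bin equals the average offset and the $f_S/2$ bin equals half the offset difference, independent of any input signal.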

Gain mismatch can be understood intuitively with a similar example: a two-way interleaved ADC with a DC input, shown in Fig. 2.4. The output is an amplitude-modulated version of the input. Unlike the offset mismatch case, the spurious tone generated by gain mismatch is input-frequency dependent.



**Fig 2.4, Gain mismatch of TI-ADC**
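As a brief worked illustration (not taken from the original text), consider a two-way interleaved ADC whose channel gains are assumed to be $G(1+\varepsilon/2)$ and $G(1-\varepsilon/2)$, driven by a sinusoid of amplitude $A$ and frequency $f_{IN}$ and sampled at $f_S$. The symbols $G$, $\varepsilon$ and $A$ are introduced only for this sketch. Using $(-1)^n \cos(\omega n) = \cos((\pi - \omega) n)$ for integer $n$:

$$
\begin{aligned}
y[n] &= G\left[1 + \tfrac{\varepsilon}{2}(-1)^n\right] A\cos\left(2\pi \tfrac{f_{IN}}{f_S} n\right) \\
     &= GA\cos\left(2\pi \tfrac{f_{IN}}{f_S} n\right) + \tfrac{\varepsilon}{2}\,GA\cos\left(2\pi \tfrac{f_S/2 - f_{IN}}{f_S} n\right)
\end{aligned}
$$

Under these assumptions the spur appears at $f_S/2 - f_{IN}$, so its location moves with the input frequency, while its amplitude relative to the signal, $\varepsilon/2$, does not. For a DC input ($f_{IN} = 0$) it reduces to a tone at $f_S/2$.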

The performance degradation caused by offset and gain mismatch does not depend on the signal frequency. For bandwidth mismatch, however, the degradation does depend on the signal frequency. The ADC front-end can be modeled as a low-pass filter (or any other transfer function), and each sub-ADC's front-end may have a different transfer function. Fortunately, the transfer function can be decomposed into a magnitude response and a phase response, which can be treated as gain mismatch and skew mismatch, respectively. Note that the equivalent gain and skew mismatches in this case are frequency dependent.



**Fig 2.5, Bandwidth mismatch of TI-ADC.**

In the ideal case the sampling points of the sub-ADCs are evenly spaced. In a real implementation, however, mismatches always exist and cause the sampling points to shift. A two-way interleaved ADC example with skew mismatch is shown in Fig. 2.6. The black solid line shows the sampling point of sub-ADC1 and the black dotted line shows the sampling point of sub-ADC2 in the skew-free case. Because of skew, the sampling point of sub-ADC2 is delayed by ∆t, as represented by the red solid line. Hence the sampled value of sub-ADC2 is in error by ∆t × dV/dt, assuming the skew magnitude is small. Because dV/dt scales with the input frequency, the skew error amplitude is proportional to the input frequency (and the error power to its square), which makes this error significant for high-frequency inputs.



**Fig 2.6, Timing mismatch of TI-ADC.**
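The small-skew approximation ∆t × dV/dt can be checked numerically. The sketch below compares the exact sampling error of a sine wave with the first-order estimate at a zero crossing, where the slope is largest. The 100-fs skew, unit amplitude and the two input frequencies are hypothetical values picked for illustration.

```python
import math


def sample_sine(amplitude, f_in_hz, t_s):
    """Value of the input sinusoid at time t_s."""
    return amplitude * math.sin(2 * math.pi * f_in_hz * t_s)


def skew_error_exact(amplitude, f_in_hz, t_s, delta_t_s):
    """Exact error when the sample is taken delta_t_s late."""
    return (sample_sine(amplitude, f_in_hz, t_s + delta_t_s)
            - sample_sine(amplitude, f_in_hz, t_s))


def skew_error_first_order(amplitude, f_in_hz, t_s, delta_t_s):
    """First-order estimate: delta_t times the signal slope dV/dt."""
    slope = 2 * math.pi * f_in_hz * amplitude * math.cos(2 * math.pi * f_in_hz * t_s)
    return delta_t_s * slope


AMPLITUDE = 1.0     # hypothetical full-scale amplitude (V)
DELTA_T = 100e-15   # hypothetical skew of 100 fs
T_SAMPLE = 0.0      # zero crossing: worst-case slope

errors = {}
for f_in in (1.0e9, 4.4e9):
    exact = skew_error_exact(AMPLITUDE, f_in, T_SAMPLE, DELTA_T)
    approx = skew_error_first_order(AMPLITUDE, f_in, T_SAMPLE, DELTA_T)
    errors[f_in] = (exact, approx)
    print(f"f_in = {f_in / 1e9:.1f} GHz: exact = {exact:.3e} V, first-order = {approx:.3e} V")
```

For skews this small the two values agree closely, and the error grows linearly with the input frequency.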

Offset mismatch causes distortion tones at multiples of $f_S/N$, while mismatch in gain or timing results in tones at $k \cdot f_S/N \pm f_{IN}$ for integer $k$, with $f_{IN}$ the frequency of the input signal. In Fig. 2.7 the spectrum of a reconstructed sinusoid is shown for N = 8 and band-limited to $f_S/2 + f_{IN}$. The upper part shows the case where only offset mismatch is present and the lower part shows the effect of gain or timing mismatch. The amplitude of the spurious tones depends on the offset/gain distribution of the channels and on the number of channels: for a larger number of channels, the error energy is divided between more tones, so the amplitude per tone decreases. Bandwidth mismatch can be split into the resulting gain and timing mismatches [13].



**Fig 2.7, Spectrum of a reconstructed sinusoid for a time-interleaved ADC with 8 channels and mismatch in (a) offset and (b) gain or timing**
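The spur locations stated above can be enumerated with a short sketch. Frequencies are kept as integers in MHz so that aliasing into the first Nyquist zone is exact; the sample rate and input frequency below are hypothetical, while N = 8 matches the example of Fig. 2.7.

```python
def fold_to_first_nyquist(f, fs):
    """Alias a frequency into the first Nyquist zone [0, fs/2]."""
    f = f % fs
    return fs - f if 2 * f > fs else f


def offset_spurs(fs, n_channels):
    """Offset-mismatch tones at k * fs / N, k = 1..N-1, after folding."""
    return sorted({fold_to_first_nyquist(k * fs // n_channels, fs)
                   for k in range(1, n_channels)})


def gain_timing_spurs(fs, n_channels, f_in):
    """Gain/timing-mismatch tones at k * fs / N +/- f_in, k = 1..N-1, after folding."""
    return sorted({fold_to_first_nyquist(k * fs // n_channels + sign * f_in, fs)
                   for k in range(1, n_channels)
                   for sign in (1, -1)})


FS_MHZ = 8000    # hypothetical overall sample rate
N_CHANNELS = 8   # as in the example of Fig. 2.7
F_IN_MHZ = 300   # hypothetical input frequency

print("offset spurs (MHz):     ", offset_spurs(FS_MHZ, N_CHANNELS))
print("gain/timing spurs (MHz):", gain_timing_spurs(FS_MHZ, N_CHANNELS, F_IN_MHZ))
```

With these numbers the offset tones fall on fixed frequencies, whereas the gain and timing tones shift whenever the input frequency changes.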

### **2.3. Time-interleaved Track and Hold Architectures**

In this section three time-interleaved T&H architectures are discussed: the normal TI architecture without a front-end sampler, the TI architecture with a front-end sampler and the architecture with a hierarchical front-end sampler. Optional improvements and limitations are discussed for all architectures in relation to bandwidth and accuracy. The section ends with a comparison of the architectures.

#### 2.3.1. TI architecture without a front-end sampler

The most straightforward configuration of a TI T&H is shown in Fig. 2.1, where each sub-ADC has its own T&H circuit [17, 18], with corresponding timing diagram shown in Fig. 2.8. At each falling edge of the master-clock, one of the T&Hs goes from track-mode to hold-mode and takes a sample of the input signal. This signal is then converted to the digital domain by the ADC in the same channel.

An important consideration is the input capacitance of the time-interleaved T&H. For high-speed input signals, transmission lines with on-chip $50\Omega$ termination are often used to mitigate reflections. The resistance at the input node is therefore $25\Omega$: $50\Omega$ of the on-chip termination in parallel with $50\Omega$ of the external source. With this fixed input resistance, the input capacitance determines the bandwidth. If the resulting bandwidth is not large enough, an input buffer can be used to increase the bandwidth. The input capacitance of the T&H determines the power consumption of this buffer, and for a very large capacitive load it can be infeasible to drive it with sufficient bandwidth. Also, due to the high demands on this buffer (the combination of a high speed and a large capacitive load), it requires a lot of power. When the timing diagram of Fig. 2.8 is used, N/2 sample capacitors are connected to the input at each moment in time. The input capacitance can be decreased by reducing the track-time. In Fig. 2.9 the timing diagram is shown for a track-time of one period of the master clock. In this case only one sample capacitor is connected to the input at a time, lowering the input capacitance and enabling a higher number of channels for a given bandwidth. A short track-time implies a long hold-time, which is advantageous in most ADC architectures, as the ADC has more time to do the conversion.



**Fig 2.9, Timing diagram of TI ADC with track-time of one period**
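To make the bandwidth argument concrete, the sketch below treats the input node as a single-pole RC network with the $25\Omega$ source resistance and compares the two timing schemes: N/2 sample capacitors connected at once versus only one. The per-channel capacitance and the interleaving factor are hypothetical, and wiring and switch parasitics are ignored.

```python
import math


def input_bandwidth_hz(r_ohm, c_farad):
    """-3 dB bandwidth of a single-pole RC input network."""
    return 1.0 / (2 * math.pi * r_ohm * c_farad)


R_IN = 25.0          # 50-ohm termination in parallel with the 50-ohm source
C_SAMPLE = 100e-15   # hypothetical sample capacitance per channel (F)
N_CHANNELS = 16      # hypothetical interleaving factor

# Track-time of half the channel period: N/2 sample capacitors load the input.
bw_half_duty = input_bandwidth_hz(R_IN, (N_CHANNELS // 2) * C_SAMPLE)
# Track-time of one master-clock period: a single sample capacitor loads the input.
bw_one_period = input_bandwidth_hz(R_IN, C_SAMPLE)

print(f"N/2 capacitors connected: {bw_half_duty / 1e9:.1f} GHz")
print(f"one capacitor connected : {bw_one_period / 1e9:.1f} GHz")
```

In this simplified model the input bandwidth improves by the factor N/2 when the track-time is shortened to one clock period.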

#### 2.3.2. TI architecture with a front-end sampler

To avoid timing misalignment between channels, a front-end sampler (FRS) [19] can be added to the conventional architecture, as shown in Fig. 2.10. The essence of this architecture is that the front-end sampler determines all sampling moments, avoiding timing mismatch. To reduce the input capacitance and to increase the conversion-time available for the ADC, the track-time can be reduced to one clock period, in the same way as in the conventional architecture without a front-end sampler. The resulting timing diagram is shown in Fig. 2.10.



**Fig 2.10, Timing diagram of TI ADC with the track-time of one period and front-end sampler**

Without a front-end sampler, the track-time can be made one or more periods. A disadvantage of the architecture with a front-end sampler is that the track-time is limited to about half a clock-cycle of the master clock, because the front-end sampler has to operate at the full sample-rate. The track-time can be slightly increased by using a clock with a duty-cycle larger than 50%, but it can never reach a full clock period, as the sample-switch in the channel has to be opened while the front-end switch is still open. Ensuring that the clocks are non-overlapping at high sample rates takes a significant part of the sample-period.

#### 2.3.3. TI architecture with hierarchical front-end sampler

The main disadvantage of a frontend sampler is the decrease in bandwidth, due to the large capacitance of the wires and switches after the sampler. This capacitance can be decreased by using additional switches placed between the frontend sampler (FRS) and the T&H switches in the channels. The hierarchical sampler is shown in Fig. 2.11.



**Fig 2.11, TI Architecture with hierarchical front-end sampler**

The corresponding timing diagram is shown in Fig. 2.12 and the operation is as follows. Suppose switches FRS, SA1 and SB1 are conducting, such that the first channel is in track mode. Then, FRS opens first and determines the sample moment. Next, SB1 opens and fixes the charge on the sample capacitor. After this, FRS and SB2 close and the second channel is in track-mode. After a sample period, FRS opens again, followed by SB2, and so on. When SB4 is opened (after FRS is opened), SA1 is also opened and SA2 is closed, such that the next four channels can take samples of the input signal. The A-switches can be opened after the B-switches are opened. Since the charge on the sample capacitor is then already fixed, charge injection of the A-switches does not degrade the performance.



**Fig 2.12, Timing diagram of architecture with hierarchical front-end sampler**

The advantage of this architecture is that timing misalignment is avoided, and the bandwidth is larger than without using additional switches. A disadvantage of this architecture is that the (in this example) four quarters of the circuit will have bandwidth mismatch due to spread in the A switches (w.r.t.  $R_{ON}$  and  $C_{parasitic}$ ) and the capacitance of the wire and the B switches. This can limit the performance or require bandwidth calibration.

### **2.4. Track and Hold Buffer**

In a TI ADC, multiple sub-ADCs operate in parallel, resulting in an N times higher sample-rate, with N the number of channels. If the ratio of the maximum input frequency to the sample-rate (e.g. 1/2 for Nyquist operation) is kept constant, the T&Hs need to operate with N times higher input frequencies than if used non-interleaved.

To achieve good linearity, closed-loop configurations (shown in Fig. 2.13(a)) using feedback are commonly used in T&Hs for medium signal frequencies [20, 21]. However, for high input signal frequency, the gain-bandwidth product (GBW) is limited due to return factor less than one and imperfections of virtual ground, resulting in reduced linearity at higher input frequencies.

Open-loop configurations offer a higher bandwidth at the cost of accuracy and linearity. A bandwidth of multi-GHz is easily achievable with a configuration with a source-follower buffer, shown in Fig. 2.13(b). This configuration suffers from distortion introduced by sampling switch, nonlinear capacitance and the buffer.



**Fig 2.13, T&H configuration with (a) closed-loop buffer and (b) open-loop buffer**

#### 2.4.1. Buffer Distortion

Differential implementation of the T&H can reduce even-order harmonics by a large amount. The actual reduction depends on the matching of the two halves of the circuit. For odd-order harmonics, consider the source-follower buffer of Fig. 2.13(b). The bulk of the PMOS transistor is tied to its source to mitigate the non-linear body effect. Assuming an ideal current source, the small-signal transfer function is given by:

$$
V_{out} = V_{in} \frac{1}{1 + \frac{1}{g_m (V_{in}) \times r_{out} (V_{in})}} \tag{2-1}
$$

with $g_m$ the transconductance of the transistor and $r_{out}$ the output resistance of the transistor. Both $g_m$ and $r_{out}$ are functions of the drain-source voltage ($V_{DS}$) due to channel length modulation, and as $V_{DS}$ depends on the input voltage, they are functions of the input voltage. So, when the input voltage varies, the transfer function varies, and the output signal becomes distorted.

In modern sub-micron CMOS processes, the non-linearity of the output resistance is the dominant source of distortion in the configuration of Fig. 2.13(b). To get a high bandwidth, the length of the transistor must be small, so the absolute value of the output resistance is small. If the intrinsic gain ( $g_m \times r_{out}$ ) is small and nonlinear, the output signal is significantly distorted.

The best way to increase the linearity is to decrease the variation of  $V_{DS}$ . An example of a circuit where this is implemented is the cascode source follower [22], shown in Fig. 2.14(a). A disadvantage of this implementation is the increased input capacitance. Moreover, the upper transistor needs to have a much smaller threshold voltage than the lower transistor to keep the lower transistor in saturation. This can be accomplished by scaling the transistors, which can be disadvantageous for other circuit aspects like speed or it can be accomplished by using a process option such as the low-VT option [23], which requires additional process steps.



**Fig 2.14, Buffer implementation with (a) cascode transistor and (b) additional bootstrapped transistor**

Another example of a unity-gain buffer is shown in Fig. 2.14(b). It is in fact a P-type source follower (SF) with an additional N-type SF that bootstraps the drain of the PMOS transistor so that its drain-source voltage stays nearly constant. The second SF decreases the variation in $V_{DS}$ of the PMOS transistor, such that its effective output resistance is increased and the gain and linearity of the buffer improve, as explained in the following paragraphs. The second SF transistor needs a short channel length to achieve a large SF bandwidth, and its bulk is connected to ground, as required by most standard CMOS processes. Due to its small output resistance and the body effect, the voltage gain of the second SF is only around 0.9. The signal swing across the source-drain of the first SF transistor is therefore only $0.1V_{OUT}$ instead of $V_{OUT}$. Consequently, 10 times less current flows in the output resistance, and its effective resistance is increased by the same factor. The $g_m$ of the first SF transistor is unchanged, so the intrinsic gain ( $g_m \times r_{out}$ ) is increased by a factor of 10 as well, and the voltage gain of the buffer is closer to 1.

For the linearity, suppose the output resistance is described by the following equation:

$$
r_{out} = a + bV_{DS} + cV_{DS}^2 + dV_{DS}^3 \tag{2-2}
$$

Compared to a conventional SF, $V_{DS}$ is 10 times smaller (−20 dB), so the second-order distortion component $cV_{DS}^2$ is reduced by 40 dB (100 times) and the third-order distortion component $dV_{DS}^3$ is reduced by 60 dB (1000 times).
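The gain and distortion arguments for the bootstrapped source follower can be put into numbers with a short sketch. It evaluates the gain expression of Eq. (2-1) for a conventional and a bootstrapped SF and converts the reduced $V_{DS}$ swing into dB for each distortion order. The transconductance and output resistance are hypothetical values; the gain of 0.9 for the second SF is the figure given in the text.

```python
import math


def sf_gain(gm_siemens, r_out_ohm):
    """Small-signal source-follower gain from Eq. (2-1): 1 / (1 + 1/(gm * r_out))."""
    return 1.0 / (1.0 + 1.0 / (gm_siemens * r_out_ohm))


def distortion_reduction_db(vds_reduction, order):
    """Reduction in dB of an order-th power term when V_DS shrinks by vds_reduction."""
    return 20 * math.log10(vds_reduction ** order)


GM = 10e-3           # hypothetical transconductance (S)
R_OUT = 500.0        # hypothetical output resistance (ohm), intrinsic gain of 5
A_SECOND_SF = 0.9    # gain of the second SF, as stated in the text

# Only (1 - 0.9) of the output swing appears across V_DS: a factor of about 10.
vds_reduction = 1.0 / (1.0 - A_SECOND_SF)

gain_conventional = sf_gain(GM, R_OUT)
gain_bootstrapped = sf_gain(GM, R_OUT * vds_reduction)

print(f"conventional SF gain : {gain_conventional:.3f}")
print(f"bootstrapped SF gain : {gain_bootstrapped:.3f}")
print(f"2nd-order term reduced by {distortion_reduction_db(vds_reduction, 2):.0f} dB")
print(f"3rd-order term reduced by {distortion_reduction_db(vds_reduction, 3):.0f} dB")
```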
#### 2.4.2. Input Capacitance

It is important that the input capacitance of the T&H buffer is low and linear, to avoid distortion at the input of the buffer for high-frequency input signals. The buffer with the additional bootstrapped transistor (Fig. 2.14(b)) has less non-linear input capacitance than a conventional or cascode source-follower, as described in the following. In the conventional source-follower, the gate-source capacitance is effectively lowered thanks to the Miller effect:

$$
C_{eff} = (1 - A_V) \times C_{real} \tag{2-3}
$$

with $C_{eff}$ the effective capacitance when looking into the gate, $A_V$ the voltage gain between the gate and the source and $C_{real}$ the real gate-source capacitance. For a source-follower, the gain $A_V$ is close to 1 and the effective capacitance is only a small fraction of the real capacitance. This is true for both the gate-source and the gate-bulk capacitance, assuming the bulk is connected to the source. What remains is the gate-drain capacitance, and this is the dominant input capacitance for both the conventional and the cascode source-follower. In the buffer with the additional bootstrapped transistor in Fig. 2.14(b), the drain terminal of the input transistor also tracks the input signal. The gate-drain capacitance is therefore mitigated as well, resulting in a very small input capacitance.
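A small numerical sketch of Eq. (2-3) illustrates why the gate-drain capacitance dominates in a conventional source follower and why bootstrapping the drain helps. All capacitance values and the gate-to-drain tracking gain are hypothetical; only the form $C_{eff} = (1 - A_V) \times C_{real}$ comes from the text.

```python
def effective_capacitance(c_real, a_v):
    """Eq. (2-3): capacitance seen at the gate when the far plate tracks with gain a_v."""
    return (1.0 - a_v) * c_real


C_GS = 20e-15        # hypothetical gate-source (plus gate-bulk) capacitance (F)
C_GD = 5e-15         # hypothetical gate-drain capacitance (F)
A_V_SOURCE = 0.9     # hypothetical gate-to-source gain of the follower
A_V_DRAIN = 0.8      # hypothetical gate-to-drain gain with the bootstrapped drain

# Conventional SF: the drain is an AC ground, so C_GD is seen in full.
c_in_conventional = (effective_capacitance(C_GS, A_V_SOURCE)
                     + effective_capacitance(C_GD, 0.0))
# Bootstrapped SF: the drain follows the input, so C_GD is reduced as well.
c_in_bootstrapped = (effective_capacitance(C_GS, A_V_SOURCE)
                     + effective_capacitance(C_GD, A_V_DRAIN))

print(f"conventional input capacitance: {c_in_conventional * 1e15:.1f} fF")
print(f"bootstrapped input capacitance: {c_in_bootstrapped * 1e15:.1f} fF")
```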

#### 2.4.3. Distortion at High Frequencies with a Capacitive Load

If a buffer, implemented as a switched source follower, is loaded with a capacitance (e.g. a sub-ADC), the current through the input transistor of the buffer varies when the capacitance is charged or discharged; see Fig. 2.15 with switch S2 closed. If the bias current is not constant, the gate-source voltage $V_{GS}$ of the input transistor is not constant and the output will be distorted.



**Fig 2.15, Hierarchical T&H configuration with buffer and additional switch**

For a non-interleaved (NI) T&H and a buffer with first-order settling behavior, the bandwidth requirement for the buffer with respect to settling is:

$$
BW_{NI,settle} = \frac{(n+1) \times \ln(2) \times 2 \times f_S}{2\pi} \tag{2-4}
$$

with n the resolution in bits, $f_S$ the sample-rate and assuming half the sample period for settling. With n = 10, the resulting bandwidth requirement is $BW_{NI,settle} > 4.9 f_{Nyquist}$. An input buffer with this bandwidth tracks input signals at the Nyquist frequency closely.

For a TI T&H the bandwidth requirement for settling is relaxed by the interleaving factor (number of channels). The bandwidth requirement for a TI T&H is:

$$
BW_{INT,settle} = \frac{(n+1) \times \ln(2) \times 2 \times f_S}{2\pi \times N} \tag{2-5}
$$

with N the interleaving factor and again assuming half the sample-period for settling. For the example of $n = 10$ and an interleaving factor of 16, the bandwidth requirement is $BW_{INT,settle} > 0.3 f_{Nyquist}$. If a buffer with minimal bandwidth for settling is used to save power, the buffer output no longer tracks input signals at the Nyquist frequency; instead, a large attenuation and phase-shift are present, and the problem shown in Fig. 2.16(a) arises. During tracking, the buffer output $V_{BUF}$ cannot follow the input signal $V_{T/H}$, and at the sample moment ($t_{sample}$) the output signal $V_{BUF}$ is not yet fully settled. After the sample moment, the buffer output $V_{BUF}$ slowly settles to its final value. During this settling, charge redistribution between (1) the non-linear parasitic capacitance $C_P$ between the input and output of the buffer and (2) the sample capacitor $C_s$ causes distortion of the voltage on the sample capacitor $V_{T/H}$ and of the buffer output $V_{BUF}$, as indicated in the figure.
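The two settling requirements of Eqs. (2-4) and (2-5) can be checked numerically. The factor $(n+1)\ln 2$ corresponds to a first-order response settling to within half an LSB, which is the usual reading of these expressions and is stated here as an assumption. The sketch normalizes the sample rate to 1, so the results are expressed directly relative to the Nyquist frequency; the function name is illustrative.

```python
import math


def settling_bandwidth(n_bits, fs, n_channels=1):
    """Bandwidth needed for first-order settling within half a sample period,
    relaxed by the interleaving factor (Eq. (2-4) for N = 1, Eq. (2-5) otherwise)."""
    return (n_bits + 1) * math.log(2) * 2 * fs / (2 * math.pi * n_channels)


FS = 1.0             # normalized overall sample rate
F_NYQUIST = FS / 2

bw_ni_over_nyquist = settling_bandwidth(10, FS) / F_NYQUIST
bw_ti_over_nyquist = settling_bandwidth(10, FS, n_channels=16) / F_NYQUIST

print(f"non-interleaved, n = 10: BW > {bw_ni_over_nyquist:.2f} x f_Nyquist")
print(f"16-way interleaved     : BW > {bw_ti_over_nyquist:.2f} x f_Nyquist")
```

The values agree with the roughly 4.9 and 0.3 times the Nyquist frequency quoted in the text.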

To avoid distortion, the buffer bandwidth could be increased, but this increases its power consumption significantly. Moreover, up-scaling of the buffer is limited, as this also increases the nonlinear input capacitance of the buffer, which requires more drive-power and introduces distortion at the input of the buffer. Up-scaling is therefore always a compromise between the required bandwidth on one side, and linearity, power and available drive on the other side.



**Fig 2.16, Hierarchical sampling with a buffer having a limited bandwidth (a) without S2 in Fig. 2.15, (b) with S2 in Fig. 2.15**

To overcome this compromise, switch S2 is introduced between the buffer output and the input capacitance of the ADC as shown in Fig. 2.15 [10]. In track-mode this switch is open and the load capacitance of the buffer is small. Hence the buffer bandwidth is high and output  $V_{\text{BUF}}$ can now follow the input  $V_{TH}$  closely, as shown in Fig. 2.16(b). In this case, the distortion due to charge redistribution is mitigated, without decreasing the linearity or increasing the power consumption.

When the ADC is connected at $t = t_{switch}$, the buffer output will first make a step to the value of the previous sample, still present on the ADC input capacitance. Then the buffer will charge the ADC load to the new sample value. Charge redistribution after $t = t_{switch}$ causes a signal-dependent step in $V_{T/H}$, marked by S. This seems to cause distortion; however, as $V_{BUF}$ settles to its final value, the process of charge redistribution is reversed and $V_{T/H}$ returns to its initial, undistorted value. This is thanks to charge conservation at the capacitor plates connected to the input node of the amplifier.

In conclusion, in an interleaving architecture the settling time can be relatively long. If the buffer has a large capacitive load, its bandwidth can be reduced to save power. However, this causes distortion. By disconnecting the load during tracking, the distortion is avoided while the buffer bandwidth remains reduced and power is saved. In the next two chapters, two efficient T&H architectures with buffers and supporting techniques are introduced and demonstrated with prototype ADCs.

# **CHAPTER 3**

# **A 2-GS/s 8-bit ADC Featuring Virtual-Ground Sampling Interleaved Architecture in 28-nm CMOS**

This chapter will propose a time-interleaved pipelined ADC architecture built upon a new concept of virtual-ground sampling, featuring merged front-end track-and-hold (T/H), residue generation, input termination, and buffering [24].

#### **3.1. Introduction**

High-speed time-interleaved (TI) ADCs have recently enabled data transmissions up to 100 Gb/s using PAM4 or DP-QPSK modulations to overcome channel bandwidth (BW) limitations [25-27]. To support higher-order modulations for data rates beyond tens of Gb/s, the ADC effective-number-of-bits (ENOB) needs to be pushed to a higher resolution range than the current low-resolution level of 5 to 6 bits. This poses a significant design challenge because the simple track-and-hold (T/H) front-end circuits commonly used for high-speed operation [26-30] become the bottleneck in linearity or total harmonic distortion (THD), especially as the supply voltage scales down. The ENOB is limited by the nonlinearity of the open-loop source follower (SF) buffer, the T/H switch charge injection, the nonlinear switch parasitics, and so on. To enhance the linearity, a closed-loop configuration is explored where the TI T/H switches are placed at the virtual grounds of an array of op-amp based inverting buffers [31]. This ADC front-end configuration can be made power efficient by reusing the T/H buffer for residue generation in the hold phase. Because the reconstruction DAC reference and the signal go through the same signal path for each time-interleaved channel, there is no gain mismatch or calibration problem among different channels. In addition, the linearity requirement of the buffer is relaxed by the coarse quantization preceding the residue generation.

#### **3.2. Virtual-Ground Sampling**

The hierarchical T/Hs typically consist of a source follower (open-loop) buffer or voltage follower (closed-loop) buffer (shown in Fig. 3.1) and a MOS sampling switch at the input or output of the buffer. The linearity is limited by the full-scale voltage swing experienced by the sampling switch and buffer at both the input and output nodes.

Other than raising the voltage headroom using switch gate bootstrapping and/or a high-voltage supply at the risk of over-voltage stress, an alternative is to develop a T/H front-end based on virtual-ground sampling as shown in Fig. 3.2, where the sampling switch is placed at the input virtual-ground node of a closed-loop buffer. This sampling architecture offers four benefits: (1) Since only the buffer output sees the full-scale voltage swing, which can be easily accommodated with a high-swing op-amp output stage, the linearity is significantly improved. (2) The loop gain is expected to render better linearity for the buffer and T/H switch without the need for switch gate bootstrapping, which causes reliability concerns, input feedthrough, clock feedthrough, excessive switch parasitics ($C_{BS}$) and transient noise. (3) Ideally, the virtual-ground switch has minimal overhead compared with the conventional bootstrapped counterpart. (4) The parallel input resistors ($R_{in}$), which can function as input impedance termination without adding extra thermal noise, isolate the switch kickback noise and nonlinear parasitics from each other and the feedthrough from the input, thus alleviating the tradeoff of total harmonic distortion (THD) and BW against the sampling rate ($f_S$) or the interleaving factor N. The limitations of this sampling architecture are the op-amp headroom and its gain-bandwidth product (GBW), which affect the T/H bandwidth of one channel, while the architecture relaxes the bandwidth limitation of the overall N-way TI T/H front-end.



**Fig 3.1, Conventional T/H front-end with closed-loop voltage follower buffer**



**Fig 3.2, Virtual ground sampling based T/H front-end**

### **3.3. New ADC Architecture**

#### 3.3.1. T/H front-end BW, THD and Power Consumption

The analog input bandwidth of an interleaver depends on the size of the sampling capacitor, which is usually sized not much larger than required by the kT/C noise for the target SNDR. Furthermore, the analog input bandwidth depends on the input network. For high-speed ADCs, the input is often terminated with a 50  $\Omega$  resistor to reduce signal reflections. The ADC sees a 25  $\Omega$  resistance from the termination resistor and the input source resistance. A lower resistance is beneficial to obtain a higher bandwidth, and could be achieved, e.g., by introducing a buffer with a low output impedance or using a resistive divider as termination resistor. The former has an inherent bandwidth limit due to the buffer transistors, and the latter reduces the signal swing at the ADC. For a given input resistance and capacitor size, the remaining degree of freedom is the architecture of the interleaver, specifically, the number of switches in each stage and their sizing.

The BW and THD analysis and simulations are conducted on the first stage of a typical hierarchical T/H interleaver [25], [32-33] and the proposed closed-loop counterpart, captured in Fig. 3.3 as case (a) and (b), respectively. A standard 50-Ω source ($R_S$) instead of a T/H buffer is assumed to directly drive each interleaver with a 50-Ω input termination ($R_{term}$). Output buffers driving the following hierarchical stages are included as part of the comparison. Based on the simplified equivalent RC models in Fig. 3.3, the track-mode transfer functions $H_a(s)$ and $H_b(s)$ from the source to the interleaver output for case (a) and (b), respectively, are derived as follows:

$$
H_a(s) = \frac{1/2}{(1 + sR_{2a}C_{2a})(1 + sR_{1a}C_{1a}) + sR_{1a}C_{2a}}
$$
(3-1)

$$
H_b(s) \approx \frac{-1}{1 + s(R_i + R_{1b} + R_{2b})(C_i + C_{1b} + C_{2b})} \times \frac{1}{(1 + \frac{s}{2\pi\beta \times GBW})}
$$
(3-2)

where it is assumed that the SF unity-gain buffer bandwidth is much higher than that of the preceding T/H stage for case (a); $(C_i, C_{1b}) \ll C_{2b}$, $C_f = C_{in} = C_{2b}$, $R_f = R_i + R_{1b} + R_{2b}$, and feedback factor $\beta = 0.5$ for case (b); and $R_{on} = 1/g_m$, $C_g = 2C_{J_o}$ for both cases (a) and (b).



**Fig. 3.3. Time-interleaved T/H front-ends (a) the typical case using bootstrapped switches and output source follower buffer, (b) the proposed case using virtual-ground sampling combined with closed-loop buffer**

**Fig. 3.4. Estimated BW versus interleaving factor N with different $N_C$, based on equation (3-3), Op-amp GBW = 30 GHz**

**Fig. 3.5. Simulated BW and THD for two cases with same T/H switch size in 28-nm CMOS and same load $C_L$, with $N_C = 2$, Op-amp GBW = 30 GHz**

For a first order BW comparison between case (a) and (b), *Ha(s)* and *Hb(s)* are simplified to a one (dominant) pole expression, and  $R_{on}C_g$  products in the expressions are related to the transition frequency  $f_T = g_m/2\pi C_g$  as follows:

$$
H_a(s) \approx \frac{1/2}{1+s\left(25+\frac{50}{N_C}\right)\Omega\left[N\left(20fF+\frac{0.8mS}{p_a f_T}\right)+N_C\times64fF\right]}\tag{3-3}
$$

$$
H_b(s) \approx \frac{-1}{1 + s\,100\,\Omega \left[ N \frac{0.8\,\text{mS}}{p_b f_T} + N_C \times 64\,\text{fF} \right]}\tag{3-4}
$$

where p is a correction factor of $f_T$ taking the parasitic resistors and capacitors into account [25], and is found by calculating the ratio of layout parasitics from the wiring of a sampling switch to that without wiring. It is assumed that $p_b \approx 2p_a \approx 0.9$, since case (b) does not have feedthrough issues and avoids the BW reduction caused by adding differential switches with cross-coupled transistors whose gates are connected to ground for compensation [25]. Also, it is assumed that $C_{BS} = 20$ fF for case (a), and $R_{on} = 50\ \Omega$, $f_T = 300$ GHz, $g_m/pf_T \ll C_{in} = C_L = 64$ fF. The BW of the closed-loop buffer in case (b) is assumed to be much higher than that of the preceding T/H stage, which is true when $N > 4$. The model estimation in Fig. 3.4 and the simulation results in Fig. 3.5 show that, as N increases, the proposed T/H front-end exceeds the conventional one in BW and THD performance.
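As a rough numerical illustration of the single-pole models (3-3) and (3-4), the sketch below evaluates the dominant-pole bandwidth of both cases with the parameter values quoted above. It reads the denominator term of (3-4) as $p_b f_T$, takes $p_a = 0.45$ from $p_b \approx 2p_a \approx 0.9$, and ignores the op-amp pole of case (b); the function and constant names are illustrative only.

```python
import math

F_T = 300e9              # transition frequency, Hz
P_A, P_B = 0.45, 0.9     # correction factors, from p_b ~ 2*p_a ~ 0.9
C_BS = 20e-15            # bootstrap switch parasitic of case (a), F
C_L = 64e-15             # load capacitance, F
GM_TERM = 0.8e-3         # the 0.8 mS numerator appearing in (3-3) and (3-4)

def bw_conventional(n, n_c=2):
    """Dominant-pole bandwidth (Hz) of case (a), per (3-3)."""
    r = 25.0 + 50.0 / n_c
    c = n * (C_BS + GM_TERM / (P_A * F_T)) + n_c * C_L
    return 1.0 / (2.0 * math.pi * r * c)

def bw_virtual_ground(n, n_c=2):
    """Dominant-pole bandwidth (Hz) of case (b), per (3-4), op-amp pole ignored."""
    r = 100.0
    c = n * GM_TERM / (P_B * F_T) + n_c * C_L
    return 1.0 / (2.0 * math.pi * r * c)

for n in (2, 4, 8, 16, 32):
    print(f"N={n:2d}: (a) {bw_conventional(n) / 1e9:5.1f} GHz, "
          f"(b) {bw_virtual_ground(n) / 1e9:5.1f} GHz")
```

Under these assumed numbers and $N_C = 2$, case (a) is faster at small N, while case (b) overtakes it between N = 6 and N = 7 because its per-channel capacitance term grows much more slowly with N.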

In terms of power consumption, it is expected that in case (b) the op-amp in a typical two-stage topology burns about twice as much static current as a single-stage SF given similar $C_L$ and $g_m$. However, the net power consumption could be comparable or even less for the following reasons: first, the proposed T/H dissipates much lower dynamic power without using gate-bootstrapped switches; second, power reduction is possible by reusing the closed-loop buffer for residue generation as mentioned above; finally, the two-stage op-amp output swing is 2X the SF output of case (a) that suffers voltage division by the input impedance termination.

#### 3.3.2. Top Level Architecture

Fig. 3.6 shows the top-level architecture for the 2-GS/s 8b two-step pipelined TI ADC using the new virtual-ground sampling technique. The prototype ADC architecture consists of two TI channels in a 2b/7b partition with 1b over-range to accommodate the comparator offsets in the coarse ADC and the timing/BW mismatch between the coarse ADC and the main T/H circuit. The coarse flash ADC exhibits good power efficiency at 2b resolution compared to a SAR ADC, which consumes more power during the settling of the most-significant bits (MSBs). After quantization, the leading 2b binary code directly switches the R-2R ladder of the 2b R-DAC. The DAC output is injected into the buffer's virtual ground instead of directly into the T/H capacitor $C_{SA}$. The T/H buffer sharing has advantages of distortion cancellation and gain matching that will be discussed in detail later. The 7-bit fine quantization is implemented with two SAR ADCs in a ping-pong operation. Since comparator offset calibration is used for each SAR ADC, no residue amplification is needed for each residue generation stage succeeding the T/H buffer.



**Fig. 3.6. ADC architecture featuring virtual-ground sampling**

To verify the function, 2X interleaving is chosen. Higher orders of interleaving can benefit more from this structure. First, the bandwidth and linearity are much better for the virtual-ground sampling structure than for the conventional TI counterpart according to (3-3) and (3-4), which is verified through simulations in Fig. 3.5. Second, sharing the T/H buffer with reference generation can reduce gain mismatches among the interleaved ADC lanes, because the input signal and the reference pass through the same buffer. On the other hand, a small N helps reduce the timing skew mismatch [34]. Assuming the skew has a flat probability density function (PDF), the resolution of the timing mismatch in units of seconds can be calculated as:

$$
Mismatch\ Resolution \leq 2\sqrt{\frac{N}{N-1}\frac{2}{3\omega_0^2}\frac{1}{2^{2(ENOB)}}}
$$
(3-5)

where $\omega_0$ is the input angular frequency. In this design, the required timing mismatch resolution is less than 2.7 ps. By carefully re-timing the clock that controls a 2-GHz clock sequence (shown in Fig. 3.7), similar to a custom-designed divider in [35], the problem of timing errors is substantially mitigated. Symmetrical layout of both the clock and the input signal can minimize systematic mismatch, thus avoiding timing error calibration in the case of the two-way T/H as in this design. Monte Carlo simulation performed on this new T/H structure shows a standard deviation of ~270 fs, well within the resolution requirement.
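The bound in (3-5) is easy to evaluate numerically. The sketch below is only illustrative: the input frequency and ENOB behind the quoted 2.7 ps are not stated here, so a 1-GHz (Nyquist) input and an ENOB of 7 are assumed, and the function name is hypothetical.

```python
import math

def timing_mismatch_resolution(n_channels, f_in, enob):
    """Upper bound on the timing mismatch resolution (seconds), per (3-5)."""
    omega0 = 2.0 * math.pi * f_in
    return 2.0 * math.sqrt(
        n_channels / (n_channels - 1)
        * 2.0 / (3.0 * omega0 ** 2)
        / 2.0 ** (2 * enob)
    )

# Assumed operating point: two channels, 1-GHz input, ENOB = 7
bound = timing_mismatch_resolution(n_channels=2, f_in=1e9, enob=7)
print(f"{bound * 1e12:.2f} ps")
```

With these assumed inputs the bound comes out at a few picoseconds, the same order as the requirement quoted above and roughly ten times the ~270 fs standard deviation from the Monte Carlo simulation.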

#### 3.3.3. Operation Sequences

Fig. 3.7 illustrates the timing sequence of the time-interleaving and pipelining operations for the ADC in this work. A full-rate 2-GHz clock is divided by 2 to generate clocks $\Phi_{A1}$ and $\Phi_{B1}$ with 250-ps pulses at a 25% duty cycle. Within the 1-ns conversion cycle dedicated to each channel (e.g., channel A), the first 250 ps is devoted to the tracking mode. As $\Phi_{A1}$ turns on the virtual-ground switch, the input signal is buffered to $C_{SA}$. As $\Phi_{A1e}$ turns off the reset switch of the succeeding OTA around 15 ps earlier than $\Phi_{A1}$ (D1 in Fig. 3.7), the buffered signal is sampled on $C_{SA}$. Upon the bottom-plate sampling, $\Phi_{A1e}$ turns off the input virtual-ground switch, the 2b coarse sub-ADC starts digitization at the rising edge of $\Phi_{A1\_ADC}$, the feedback path is opened by turning off $\Phi_{fA}$, and the virtual ground XA is reset. After a 150-200 ps programmable delay (D2 in Fig. 3.7), the feedback is turned on by $\Phi_{fA}$ before $\Phi_{A1\_DAC}$ to avoid output clipping. In 500 ps, when the coarse binary output bits are ready, $\Phi_{A1\_DAC}$ turns on the R-DAC output switch to inject the DAC current into the virtual ground while the residue voltage $V_{resA}$ is generated at the OTA's output. At the same time, the residue is sampled and settled by one of the two SAR sub-ADCs before $\Phi_{A1\_DAC}$ turns off. The DAC settling clocks ($\Phi_{1\_DAC}$) are divided by 2 to generate the sampling clocks of the asynchronous SAR ADCs ($\Phi_{2,1}$ and $\Phi_{2,2}$) with a 25% duty cycle. At this moment, the front-end T/H starts repeating the input tracking while the SAR ADC starts the quantization, which takes 1.5 ns to resolve the fine 7b output. Channel B starts tracking the input signal at the moment of channel A's residue generation.

**Fig. 3.7. Timing sequence (D1=15ps, D2=150-200ps, S=0.5ns and C=1.5ns)**
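The timing numbers in this sequence can be cross-checked with a few lines of arithmetic. The sketch below assumes that the window left between tracking and residue generation is what the coarse quantization and the delay D2 must fit into; that reading, and all names, are illustrative rather than taken from the design.

```python
F_CLK = 2e9              # full-rate clock, Hz
N_CHANNELS = 2           # two-way interleaving
TRACK_DUTY = 0.25        # duty cycle of the channel clocks
D2_MAX = 200e-12         # upper end of the programmable delay D2, s
RESIDUE = 500e-12        # residue/DAC settling time (S in Fig. 3.7), s
SAR_CONVERT = 1.5e-9     # fine SAR conversion time (C in Fig. 3.7), s
SARS_PER_CHANNEL = 2     # ping-pong fine SAR ADCs per channel

def timing_budget():
    """Return the main intervals (in seconds) of one channel's cycle."""
    channel_period = N_CHANNELS / F_CLK
    track = TRACK_DUTY * channel_period
    # Assumed: whatever is left before residue generation hosts the
    # coarse quantization, the virtual-ground reset and the delay D2.
    coarse_window = channel_period - track - RESIDUE
    sar_period = SARS_PER_CHANNEL * channel_period
    return {
        "channel_period": channel_period,
        "track": track,
        "coarse_window": coarse_window,
        "sar_period": sar_period,
        "sar_slack": sar_period - RESIDUE - SAR_CONVERT,
    }

budget = timing_budget()
for name, value in budget.items():
    print(f"{name:15s} {value * 1e12:7.1f} ps")
```

The numbers close: 250 ps of tracking plus a 250-ps coarse window plus 500 ps of residue generation fill the 1-ns channel cycle, and 0.5 ns of sampling plus 1.5 ns of conversion fill the 2-ns period of each ping-pong SAR ADC.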

#### 3.3.4. Distortion Cancellation

The new virtual-ground sampling based front-end is inherently better in linearity due to the closed-loop configuration. The linearity is further enhanced by sharing the closed-loop amplifier for both input and DAC buffering as shown in Fig. 3.6. Since the DAC output is close to the input sample, the distortion error is cancelled substantially when the difference is taken during the residue generation, as intuitively described in Fig. 3.8(a). The cancellation is more effective with a smaller residue or larger coarse resolution, and if the buffer experiences similar nonlinearity processes in the input tracking and the DAC buffering phases. Ideally, in the extreme case when the input signal is equal to the DAC output, the distortion is completely cancelled, as shown in Fig. 3.8(b). However, in the tracking phase, the buffer sees a continuous-time signal, while in the residue generation phase, it settles to the DAC output voltage. Such different operation processes introduce nonlinearity mismatch. To minimize this distortion mismatch between the two phases, the closed-loop buffer 3-dB bandwidth is set to be multiple times ($\sim 4\times$) the input bandwidth, and 500 ps ($2\times$ the tracking time) is assigned for the residue and R-2R DAC to settle completely. Fig. 3.9 shows the simulated distortion of the designed T/H buffer with R-2R DAC vs. input frequency, with ideal coarse ADCs and fine SAR ADCs. It demonstrates that, with the above design exercise, the harmonic distortions within the Nyquist input range can be kept low.
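The cancellation argument can be illustrated with a deliberately simplified model. The sketch below assumes a static, memoryless buffer with a weak cubic term and an ideal coarse quantizer; it ignores the dynamic mismatch between the tracking and settling phases discussed above, and the coefficient `a3` and all function names are hypothetical.

```python
def buffer(v, a3=-0.05):
    """Assumed static buffer model: unity gain plus weak cubic compression."""
    return v + a3 * v ** 3

def coarse_dac(v, bits=2, full_scale=1.0):
    """Ideal coarse ADC followed by an ideal DAC (mid-point of each bin)."""
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    idx = min(levels - 1, max(0, int((v + full_scale) // step)))
    return -full_scale + (idx + 0.5) * step

def residue_error(v, shared=True, bits=2):
    """Deviation of the generated residue from the ideal v - v_dac."""
    v_dac = coarse_dac(v, bits)
    ideal = v - v_dac
    if shared:
        actual = buffer(v) - buffer(v_dac)   # input and DAC see the same buffer
    else:
        actual = buffer(v) - v_dac           # only the input is buffered
    return actual - ideal

for v in (0.75, 0.9):
    print(v, residue_error(v, shared=False), residue_error(v, shared=True))
```

In this toy model the shared-buffer error scales with the difference of the cubes of the input and the DAC level, so it vanishes when the two coincide and shrinks as the coarse resolution increases, in line with the two cases of Fig. 3.8.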



**Fig. 3.8. Distortion Analysis with (a) general case when $V_{res} > 0$, (b) extreme case when $V_{res} = 0$**

**Fig. 3.9. Simulated 3rd and 5th harmonic distortions versus input frequency**

#### **3.4. Circuit Building Blocks**

#### 3.4.1. T/H Buffer

Fig. 3.10 depicts the fully-differential Op-Amp in folded-cascode topology used in this design, which also utilizes the T/H buffer sharing technique to relax the Op-Amp DC gain requirement. Compared to a pseudo-differential counterpart, a fully-differential topology provides better PSRR and CMRR performance, which can significantly mitigate the impact of noise injected from the supply and the common-mode generator. A small Miller capacitor and resistor for M3 provide the amplifier with sufficient phase margin. The open-loop Op-Amp achieves 48-dB DC gain and 7.9-GHz GBW in the 28-nm CMOS. The PMOS input transistors can accommodate an input common-mode (virtual ground) voltage as low as 0.3 V to ensure enough voltage headroom for the virtual-ground switches. In addition, the current sources in combination with a DAC serve to calibrate out the offsets of the input differential pair. The total input-referred noise voltage is around 440 µVrms, i.e., ~58 dB SNR.
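The relation between the quoted noise voltage and SNR can be sketched as follows. The full-scale swing is not stated in this section, so a 1-Vpp differential sine wave is assumed purely for illustration; the function name is hypothetical.

```python
import math

def snr_db(v_noise_rms, v_pp):
    """SNR (dB) of a full-scale sine of peak-to-peak amplitude v_pp."""
    v_signal_rms = v_pp / (2.0 * math.sqrt(2.0))
    return 20.0 * math.log10(v_signal_rms / v_noise_rms)

# Assumed 1-Vpp differential full scale (not given in the text)
snr = snr_db(v_noise_rms=440e-6, v_pp=1.0)
print(f"{snr:.1f} dB")
```

Under that assumed swing, 440 µVrms of input-referred noise corresponds to about 58 dB, consistent with the figure quoted above.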



**Fig. 3.10. A fully differential Op-Amp T/H buffer topology used in this design, with offset calibration**

#### 3.4.2. Coarse ADC and R-DAC

Fig. 3.11 depicts the coarse 2b flash ADC, which includes a 2-kΩ reference resistor string, three comparators, a sampling capacitor of 16 fF at the input of each comparator, and a thermometer-to-binary encoder. The comparators of the coarse ADC use strong-arm based dynamic structures without pre-amplifiers, as shown in Fig. 3.12. Both differential reset switches and pull-up switches are employed to increase the reset speed. In the adopted 28-nm technology, the time constant $\tau_{cp}$ is designed to be ~7 ps, i.e., the regeneration time $\tau_{cp\_reg}$ is almost $20\tau_{cp}$ to resolve the first 2b flash ADC, and the probability of error is lowered to $<10^{-12}$. The use of small transistors minimizes the parasitic effects, kickback noise and power consumption, but contributes to larger input-referred offsets. By adjusting the capacitive load at the drain nodes of the input differential pair [36], the comparator offsets are effectively corrected. After the coarse quantization, the flash decision is converted to binary code and stored in the latches whose outputs are gated by R-DAC Enable. As shown in Fig. 3.13, the 2b binary outputs directly control the R-2R DAC; all the DAC switches are connected to either the virtual ground or a common-mode voltage at the same voltage level as the virtual ground to maintain the R/2R ratio with the switch on-state resistance included. This also helps match the gains between the input tracking and residue generation phases. Ultra-low-threshold-voltage (µLVT) transistors are used in this design to minimize the impact of switch on-resistance.



**Fig. 3.11. 2b coarse flash ADC**



**Fig. 3.12. Coarse ADC dynamic comparator schematics**



**Fig. 3.13. R-2R DAC**

#### 3.4.3. Fine SAR ADC and Comparator

Fig. 3.14 shows the simplified single-ended diagram of the fully differential sub-ranging SAR ADC used. The fine ADC employs top-plate sampling to avoid signal attenuation at comparator input, thus minimizing the degradation of SNR referred to SAR ADC input. In addition, top-plate sampling saves the settling time for the fine DAC references because the comparator can start comparison immediately after the residue sampling phase. This feature not only reduces the conversion time, but also avoids high voltage requirements at the comparator input [35]. The SAR ADC uses external references with on-chip decoupling capacitors. With programmable attenuation capacitors ( $C_{\text{att}}$  in Fig. 3.14), the reference voltage is adjusted to align with the full swing of the residue voltage. The comparator (Fig. 3.15) uses a double-tail latch topology [37] with an integrator (M1P/M1N) followed by three parallel differential pairs (M2aP/M2aN, M2bP/M2bN, M3bP/M3bN) and a regenerative latch to accommodate the 1V low supply voltage. The latch reset differential pairs help to minimize the regeneration time by minimizing the device capacitances. In addition, cross-coupled PMOS-switches (M3P/M3N) are used at the output node to boost the regeneration and avoid the glitches.



**Fig. 3.14. 7b fine SAR ADC**



**Fig. 3.15. SAR comparator and glitch filter**

#### **3.5. Measurements and Discussions**

The 2GS/s 8b ADC, as a proof-of-concept prototype for the new high-speed TI ADC architecture utilizing the virtual-ground sampling concept, was designed and fabricated in foundry 28-nm CMOS. Fig. 3.16 shows a die photo for the fabricated ADC having an active area of  $210\mu$ m  $\times$  380 $\mu$ m. A 200-pF on-chip decoupling capacitor of  $\sim$ 0.12 mm<sup>2</sup> is used for the ADC reference supplies that are provided externally. The Op-Amps are powered at 1.4V, while the rest of the ADC uses 1.0V supplies. The measured total power consumption is 25.7 mW where the analog power (including T/H buffers, residue generators, and R-2R DACs) is 21.1 mW and the digital power (including clock generator, digital logic and comparators) is 4.6 mW. The power consumption of T/H buffers is 12.6 mW, nearly half of the whole ADC power consumption. Fig. 3.17 shows the measured DNL and INL. Fig. 3.18 is the measured ADC output FFT spectrum showing a 7.29b ENOB achieved for input signal of 62MHz at 2GS/s and Fig. 3.19 is the measured ADC output FFT spectrum showing a 7b ENOB achieved for input signals up to 900MHz at 2GS/s, respectively. No channel mismatch spectral signatures are observed above the noise level, though the signal image around *fS*/2 apparently grows with the signal frequency, indicating a timing mismatch between the two channels. The SNDR and SFDR are dominated by HD3, especially at high frequencies. As suggested by simulation shown in Fig. 3.9, HD3 and HD5 are effectively suppressed through the closed-loop configuration. The measured HD3 is limited by imperfections of the virtual ground of T/H buffer and the input path virtual-ground switch that can be alleviated by enlarging the switch sizes or further lowering the virtual ground voltage, which is being implemented in our ongoing new design. The HD2 may be attributed to the phase imbalance of the balun and asymmetry of the PCB traces and bonding wires. 
The simulated SNR is ~48 dB at low input frequency and ~46.5 dB at high input frequency, implying that the SAR comparators do not contribute much noise. Fig. 3.20 and Fig. 3.21 show the measured SNDR and SFDR versus input frequency at 2 GS/s and versus sampling frequency, respectively, indicating an SNDR of ~42.9 dB and an SFDR of ~54.88 dB. These results demonstrate that the linearity improvement by the new virtual-ground sampling based ADC architecture is effective across the entire Nyquist band. Table 3.1 summarizes the key measured specs for the prototype ADC, showing a favorable comparison over the relevant 8-9b ADCs recently reported.



**Fig. 3.16. 2GS/s 8b ADC Die Photo** 



**Fig. 3.17. ADC (a) DNL and (b) INL** 



**Fig. 3.18. ADC output spectrum measured at $f_S$ = 2 GS/s and $f_{in}$ = 62 MHz**

**Fig. 3.19. ADC output spectrum measured at $f_S$ = 2 GS/s and $f_{in}$ = 900 MHz**



**Fig. 3.20. Measured SNDR and SFDR versus input frequency for the ADC at 2 GS/s sampling rate**

**Fig. 3.21. Measured SNDR and SFDR versus sampling frequency for the ADC with input signal of 62 MHz**



\*Includes Buffer Power

**Table 3.1. Performance comparison of relevant ADCs reported**

# **CHAPTER 4**

# **An 8.8-GS/s 8b Time-Interleaved SAR ADC with 50-dB SFDR Using Complementary Dual-Loop-Assisted Buffers in 28-nm CMOS**

This chapter will propose an 8.8 GS/s 16-way time-interleaved asynchronous SAR ADC fabricated in 28-nm CMOS technology. A two-level 2×8 master-slave hierarchical interleaved architecture is employed. A complementary dual-loop-assisted buffer is proposed to achieve both high linearity and bandwidth with low power [40].

#### **4.1. Introduction**

With technology scaling, high-speed (multi-GS/s) analog-to-digital converters (ADCs) followed by digital signal processing (DSP) offer benefits to both modern wireline data receivers and next-generation RF wireless receivers. Highly time-interleaved (TI) SAR ADCs can boost the conversion rate up to ~8 GS/s and the resolution up to 8 bits [26] [41-42] with improved power efficiency and simpler topology over their TI pipelined counterparts [43] when implemented in deep-scaled CMOS. However, in addition to extra area and more complex clocking and routing networks, massive interleaving structures not only increase the input load, which limits the signal bandwidth, but also incur various issues including offset, gain and timing mismatches. Consequently, complex calibrations [43] are required, further increasing system design complexity, routing, clocking, and total chip area. To achieve sufficient bandwidth, track-and-hold (T&H) buffers are usually employed before and/or after the sampling switches in TI ADCs. The source follower (SF) architecture is widely used due to its good driving capability with low output impedance. However, its linearity suffers from the input gate- and drain-source dependent nonlinearity with constant biasing, and the bandwidth and settling speed degrade due to large loading parasitic capacitances. Several techniques have been adopted to improve the bandwidth or linearity, such as bootstrapped SF [10], cascade SF and class-AB operation [41]. For example, [41] achieved a measured SFDR of 41 dB at Nyquist input by applying the above techniques. However, its gain degrades significantly with the cascade stage and the signal swing is limited by the large number of stacked transistors at the output.

To address these problems, a design of an 8b 8.8-GS/s two-level TI master-slave SAR architecture is reported with three features. First, SF buffers with a complementary dual-loop-assisted (CDLA) technique are introduced to improve the linearity, bandwidth, and settling speed. Second, it is 16-way TI in a $2\times 8$ hierarchical structure, which increases the SAR sub-ADC clocking period and relaxes the requirement of settling-related linearity as well as the comparator metastability burden. Third, the T&H employs a master-slave topology to reduce the number of critical timing instants and the clock skew calibration complexity. Each of the two master T&Hs followed by the customized buffers is shared between 8 SAR sub-ADCs to increase the bandwidth and reduce front-end loading and related dynamic nonlinearities. The measured results show that this architecture consumes 83.4 mW including all buffers and achieves SFDR better than 50 dB up to the Nyquist frequency with a 140-fJ/conv.-step Walden FOM.
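The quoted figure of merit follows from the standard Walden definition, FOM = P / (2^ENOB × f_S), with ENOB = (SNDR − 1.76) / 6.02. The sketch below uses the 38.4-dB Nyquist SNDR reported in the abstract together with the power and sample rate above; the function names are illustrative.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (standard conversion)."""
    return (sndr_db - 1.76) / 6.02

def walden_fom(power_w, f_s, sndr_db):
    """Walden figure of merit in joules per conversion step."""
    return power_w / (2.0 ** enob_from_sndr(sndr_db) * f_s)

fom = walden_fom(power_w=83.4e-3, f_s=8.8e9, sndr_db=38.4)
print(f"ENOB = {enob_from_sndr(38.4):.2f} b, FOM = {fom * 1e15:.1f} fJ/conv.-step")
```

This evaluates to an ENOB of about 6.1 bits and a FOM just under 140 fJ/conv.-step, in agreement with the value stated above.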



**Fig. 4.1. (a) Two-step master-slave architecture [25] and (b) Tradeoff simulation between settling and input bandwidth**

### **4.2. ADC Architecture and Front-end**

#### 4.2.1. 2×8 Time-Interleaved Architecture Derivation

For a single-channel SAR ADC, a lower speed leads to a larger interleaving number and thus increases the loading for the master T&H stage. On the other hand, a higher speed causes large power consumption while providing no benefit in reducing the loading of the front-end buffers. Therefore, a master-slave hierarchical TI ADC architecture is selected in this design, as described in Chapter 2 and shown in Fig. 4.1(a). Through comprehensive consideration and PVT simulations of the energy efficiency of the T&H buffer and SAR sub-ADCs with the target 8b resolution, a 550-MHz clock is specified in this 28-nm prototype design, resulting in a $16\times$ TI architecture for the targeted $>8$-GS/s sampling speed. Fig. 4.1(b) shows the tradeoff of input bandwidth versus the settling time of the T&H buffer and sampling switch for various numbers of master TI channels. A single-level $16\times$ TI T&H limits the input bandwidth due to the parasitic capacitances of massive routing and switch transistors. On the other hand, a $1\times16$ two-step TI architecture suffers from a stringent settling requirement, i.e., a ~10 ps time constant ($\tau$) of the T&H buffer and switch for the targeted resolution. Additionally, a large interleaving number in the first-stage T&H leads to more critical time instants and sophisticated time-skew calibrations, which incurs bandwidth mismatches and occupies large chip area. Therefore, a 2×8 two-level TI master-slave architecture is adopted in this design.
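The interleaving factor and the order of magnitude of the settling requirement can be sketched numerically. The estimate below borrows the Chapter 2 assumption that half of the master clock period is available for settling to (n+1)-bit accuracy; the settling window actually used in this design may differ, and the function names are illustrative.

```python
import math

def interleave_factor(f_s, f_sub_adc):
    """Number of sub-ADCs needed to reach the aggregate sample rate."""
    return math.ceil(f_s / f_sub_adc)

def max_time_constant(f_s, n_master, n_bits, settle_fraction=0.5):
    """Largest T&H time constant (s) that still settles to (n_bits + 1) bits.

    Assumes settle_fraction of one master period, n_master / f_s, is
    available for settling.
    """
    t_settle = settle_fraction * n_master / f_s
    return t_settle / ((n_bits + 1) * math.log(2))

print(interleave_factor(8.8e9, 550e6))
for n_master in (1, 2):
    tau = max_time_constant(8.8e9, n_master, n_bits=8)
    print(f"{n_master} master channel(s): tau <= {tau * 1e12:.1f} ps")
```

Under this assumption a single full-rate master needs a time constant of about 9 ps, close to the ~10 ps quoted above, while two master channels relax it to roughly 18 ps.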

#### 4.2.2. Top Level Architecture View and Operation Sequences

Fig. 4.2 shows the block diagram of the proposed $2\times 8$ hierarchical TI SAR ADC, in which two interleaved channels are implemented for the first stage, where the input signal is buffered by b0 and then sampled and held using two-phase 4.4-GHz master sampling clocks generated from the CMOS divider. The sampled signal is then delivered to the buffer b1/b2 followed by the second-stage sub-ADCs. The second stage is an eight-way time interleaver, where each of the signals passively sampled by the 4.4-GHz clock is further sampled and held using eight-phase sampling pulses and converted into digital codes by eight instances of 550-MHz SAR sub-ADCs. The digitized data stream is then retimed with a common clock and delivered to the decimator for measurement purposes. As this design targets 8b resolution, a clock bootstrapping circuit is avoided, which further reduces the input parasitic capacitance and timing skew and saves power. By setting a proper size and overdrive voltage range for the sampling switches driven by $\Phi_A$, the achievable sampling bandwidth of this design is higher than 15 GHz, ~3.4× the Nyquist input frequency, to compensate for the bandwidth mismatch among the 16-way TI SAR ADCs. The buffer b0 driving the master T&H provides a more constant input impedance for better matching with the 50-Ω PCB traces and reduces signal-dependent kickback. The buffers b1 and b2 are used to isolate the T&H switch from the large parasitic capacitance of the sub-ADCs to increase bandwidth. In addition, timing skew only exists between the two master T&Hs. By minimizing systematic mismatches with symmetrical layout, only device mismatch-related timing skew matters, and the calibration requirement can be significantly relaxed.

The timing diagram in Fig. 4.3 shows the relationship between the master and slave T&Hs. A full-rate 8.8-GHz clock is divided by 2 to generate master clocks  $\Phi_{A1}$  and  $\Phi_{A2}$ . Each of the sampling pulses has a duty cycle of approximately 40% (~90ps) that ensures no overlap between  $\Phi_{A1}$  and  $\Phi_{A2}$  to avoid charge sharing. After the master T&H enters into its hold mode, one of the eight slave T&Hs tracks the sampled signal followed by b1/b2, running at 550 MS/s. Each  $\Phi_B$  clock has 6.25% duty cycle (~113 ps) to minimize the loading of the master T&H because b1/b2 can only support one SAR ADC at each instant, and this gives ~1.7ns for the sub-ADC. The master clock  $\Phi_A$  resumes its tracking mode after the slave clock  $\Phi_B$  turns off to avoid the case that the previously sampled data is mixed with the current data which significantly increases signal distortion. In addition,  $\Phi_A$  turns off before the SAR sub-ADC's sampling clock  $\Phi_B$  turns on to avoid charge-redistribution caused by the non-linear parasitic capacitance between the input and output of the buffers b1/b2.
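The pulse widths quoted in this timing description follow directly from the clock rates and duty cycles. The short check below uses only numbers stated in the text; the variable names are illustrative.

```python
F_S = 8.8e9                      # aggregate sample rate, Hz
N_MASTER, N_TOTAL = 2, 16        # master channels and total sub-ADCs

master_period = N_MASTER / F_S           # period of each 4.4-GHz master clock
phi_a_on = 0.40 * master_period          # ~40% duty cycle of Phi_A
slave_period = N_TOTAL / F_S             # period of each 550-MHz slave clock
phi_b_on = 0.0625 * slave_period         # 6.25% duty cycle of Phi_B
conversion_time = slave_period - phi_b_on

print(f"Phi_A period  {master_period * 1e12:7.1f} ps, on-time {phi_a_on * 1e12:6.1f} ps")
print(f"Phi_B period  {slave_period * 1e12:7.1f} ps, on-time {phi_b_on * 1e12:6.1f} ps")
print(f"time left per sub-ADC conversion: {conversion_time * 1e9:.2f} ns")
```

The results are about 91 ps for the master pulse, about 114 ps for the slave pulse (one full-rate clock period), and about 1.7 ns left for each sub-ADC conversion, matching the values given above.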



**Fig. 4.2. Proposed 2×8 master-slave TI ADC block diagram**



**Fig. 4.3. ADC timing sequence**

#### 4.2.3. Master-slave Track-and-Hold with Proposed T&H Buffers

The conventional pseudo-differential PMOS source follower (SF), shown in Fig. 4.4, is a well-known and widely used voltage buffer in open-loop track-and-hold amplifiers (THAs). It effectively drives the following pre-amp stage thanks to its highly linear characteristic and level-shifting property. The drawbacks of an SF are gain deterioration (a gain of about 0.8 to 0.9) and output swing compression due to short-channel effects. These drawbacks in turn result in a small ADC full scale and excessive power dissipation by the converter to achieve the desired accuracy. Methods to mitigate the output swing compression include the use of cascode bias currents and long-channel devices. However, these demand a large headroom and incur a speed penalty, respectively.

When the SF has a large step input, its settling speed may be limited by the slew rate, which is proportional to $I_D/C_L$. To speed up the SF, the DC power consumption therefore has to be scaled up linearly. Several buffer designs [10], [41], [44] have been presented to overcome the aforementioned design issues. Some use a bootstrapped stage to fix the drain-source voltage of the current-source transistor and thereby remove the distortion, while others dynamically adjust the bias voltage by sensing and feeding back the AC current that flows in the SF. These approaches improve the buffer linearity, but they complicate the circuit structure and cause headroom and settling problems.
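As a rough illustration of why a slew-limited SF needs its bias current scaled linearly with speed, the sketch below evaluates the slew rate $I_D/C_L$ and the resulting slewing time for a voltage step. The load capacitance, step size, and bias currents are hypothetical values chosen only for illustration; they are not taken from the design.

```python
def slew_rate(i_d, c_l):
    """Slew rate in V/s for a bias current i_d (A) charging a load c_l (F)."""
    return i_d / c_l


def slew_time(delta_v, i_d, c_l):
    """Time in seconds needed to slew across a step of delta_v volts."""
    return delta_v / slew_rate(i_d, c_l)


C_L = 100e-15      # hypothetical load capacitance, F
DELTA_V = 0.3      # hypothetical step size, V

for i_d in (1e-3, 2e-3, 4e-3):   # hypothetical bias currents, A
    t = slew_time(DELTA_V, i_d, C_L)
    print(f"I_D = {i_d * 1e3:.0f} mA -> slewing time = {t * 1e12:.1f} ps")
```

Doubling the bias current halves the slewing time, which is the linear power-speed scaling that the current-feedback approaches discussed here try to avoid.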

Fig. 4.5 depicts the T&H architecture together with the buffer schematics. The basic structure of each buffer is an SF, with a PMOS input for b0 and an NMOS input for b1/b2 to accommodate the common-mode levels. To improve linearity, bandwidth, and settling speed, a complementary dual-loop-assisted (CDLA) technique is introduced and applied to b0 and b1/b2. For example, the buffer b0 uses a current-feedback loop that senses the AC current (i1) flowing through the SF transistors (M1/M2) and dynamically adjusts the bias current (i2). Consequently, the current required by the SF transistors is significantly reduced, while the buffer can still provide sufficient current $(i1+i2)$ to the sampling capacitors, especially at high frequencies. Fig. 4.6 compares the simulated AC output current of the proposed buffers with dual-loop current feedback against that of a conventional SF with constant biasing under the same power consumption. The current feedback provides more output current and thus significantly widens the bandwidth of the buffer. The current-feedback loop not only alleviates the distortion caused by the variation of $V_{GS}$ and $V_{DS}$ of M1, but also improves the settling speed by increasing the slew rate, because the bias current can be entirely directed to the drain of the diode-connected transistors M4.

In addition to CDLA, other techniques are adopted to further improve the SF's linearity, bandwidth, and gain. First, an additional SF (M2) is employed to bootstrap the drain-source voltage of M1 [10]. The drain of M1 tracks the input signal with the assistance of M2, which reduces by ~80% the effective nonlinear $C_{gd}$ that is usually the dominant input capacitance of a conventional SF. Second, triple-well ultra-low-threshold-voltage (µLVT) transistors are selected for M1 and M2, with their bodies tied to the SFs' outputs to eliminate the body effect, further improving the SFs' linearity. Similar techniques are applied to b1/b2 for high bandwidth and linearity as well. In addition, a low-threshold-voltage (LVT) transistor M5 is used in b0 to create a negative $g_{ds}$, which improves the gain linearity by ~3 dB with push-pull operation.

The master switch employs a PMOS transistor clocked at 4.4 GHz with a cross-coupled pair to cancel the signal feedthrough. The input parasitic capacitances ($C_{F1}$ and $C_{F2}$) of b1/b2 are sufficient to meet the sampling kT/C noise requirement. The settling time requirement can be easily satisfied with the proposed buffers and proper switch sizing in the 28-nm CMOS technology.
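To give a feeling for why parasitic capacitance alone can satisfy the kT/C requirement at 8b resolution, the sketch below estimates the per-side capacitance at which the differential sampling noise ($2kT/C$) equals the quantization noise power ($\mathrm{LSB}^2/12$). It assumes a 600-mVPP differential full scale (the input swing quoted for the simulations), a temperature of 300 K, and a simple equal-noise criterion. These are illustrative assumptions; the actual values of $C_{F1}$ and $C_{F2}$ are not stated in the text.

```python
BOLTZMANN = 1.380649e-23   # J/K


def quantization_noise_power(v_fs_pp, n_bits):
    """Quantization noise power (V^2) of an ideal n_bits converter."""
    lsb = v_fs_pp / 2 ** n_bits
    return lsb ** 2 / 12


def min_sampling_cap(v_fs_pp, n_bits, temp_k=300.0):
    """Per-side capacitance (F) at which the differential sampling noise
    2kT/C equals the quantization noise power (assumed criterion)."""
    return 2 * BOLTZMANN * temp_k / quantization_noise_power(v_fs_pp, n_bits)


c_min = min_sampling_cap(v_fs_pp=0.6, n_bits=8)
print(f"C for kT/C noise equal to 8b quantization noise: {c_min * 1e15:.1f} fF")
```

Under these assumptions the result is a few tens of femtofarads at most, which is the scale at which buffer input parasitics can plausibly serve as the sampling capacitors.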

To illustrate the efficiency of the proposed buffer and sampling front-end, Fig. 4.7 depicts the simulated THD of the buffer b0 (node A) and of the overall front-end T&H (node B) as a function of the input frequency at a conversion rate of 4.4 GS/s. The THD remains below $-65$ dB for b0 and $-52$ dB for the overall T&H for input frequencies $\leq 5.5$ GHz, which is ~10 dB lower than that of conventional SF buffers consuming the same power. The simulations in Fig. 4.8 show that the voltage gain of the proposed T&H is $\ge -0.5$ dB for input frequencies up to 7 GHz and that the bandwidth is larger than 15 GHz with an input swing of 600 mVPP. The total currents of b0 and b1/b2 are ~8 mA and ~12 mA, respectively.



**Fig. 4.4. Conventional Source Follower Buffer**



**Fig. 4.5. Proposed Complementary Dual-Loop Assisted Buffer**



**Fig. 4.6. Simulated output AC current of front-end S/Hs with proposed and conventional SF buffers with similar power**

**Fig. 4.7. Simulated THD of front-end S/Hs with proposed and conventional SF buffers with similar power**



**Fig. 4.8. Voltage gain of front-end S/Hs with proposed and conventional SF buffers with similar power**

## **4.3. Circuit Building Blocks**

#### 4.3.1. High Speed Clock Generation

To avoid phase mismatch between the cable and the PCB traces, a single-ended 8.8-GHz clock is generated externally and terminated near the center of the chip. It is then converted to a differential signal and divided into two evenly spaced 4.4-GHz clocks ($\Phi_{A1}/\Phi_{A2}$), as shown in Fig. 4.9. For a two-way TI master T&H such as the one in this design, a symmetrical layout can remove the systematic mismatch. External foreground time-skew calibration is employed to reduce the random timing error between the clock paths of the two master T&Hs. Delay lines with digitally controlled MOS capacitor banks are employed in the two-phase distribution network to calibrate the timing skew between the two critical sampling phases. Assuming that the skew has a flat probability density function (PDF), the required resolution of the timing-mismatch calibration can be calculated following [45]; it is approximately 250 *fs* to meet 7b ENOB, and the calibration range is $\pm 3$ ps to cover the 3-sigma requirement.
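The order of magnitude of the skew requirement can be checked with the widely used timing-error-limited SNR approximation, $\mathrm{SNR} = -20\log_{10}(2\pi f_{in}\sigma_t)$. The sketch below applies it at the Nyquist input frequency for a 7b ENOB target. This is a simplified estimate and not necessarily the exact expression of [45], which also accounts for the number of channels and the skew distribution. The count of calibration steps assumes that the 250-fs resolution spans the full $\pm 3$ ps range; the actual word length of the capacitor banks is not stated in the text.

```python
import math


def snr_db_from_enob(enob):
    """SNR (dB) corresponding to a given effective number of bits."""
    return 6.02 * enob + 1.76


def max_rms_skew(f_in, enob):
    """RMS timing error (s) at which the timing-limited SNR equals the
    SNR of the target ENOB, using SNR = -20*log10(2*pi*f_in*sigma_t)."""
    snr_db = snr_db_from_enob(enob)
    return 10 ** (-snr_db / 20) / (2 * math.pi * f_in)


sigma_t = max_rms_skew(f_in=4.4e9, enob=7)
print(f"allowed RMS skew for 7b ENOB at 4.4 GHz: {sigma_t * 1e15:.0f} fs")

# Illustrative sizing of the delay-line control word (integer femtoseconds).
CAL_RANGE_FS = 6000    # +/-3 ps calibration range
CAL_STEP_FS = 250      # calibration resolution quoted in the text
n_steps = CAL_RANGE_FS // CAL_STEP_FS
cal_bits = math.ceil(math.log2(n_steps + 1))
print(f"{n_steps} steps -> at least {cal_bits} control bits")
```

The estimate lands in the low hundreds of femtoseconds, the same order as the ~250 fs resolution adopted in the design.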

To stabilize the delay values under temperature and power supply variations, constant-current inverters are used to drive the variable capacitance. Offset and gain errors are calibrated using conventional foreground calibration with DC input references. The eight-phase clocks are generated locally by dividing the master clock down to 550 MHz with a ring counter; each ring-counter output is combined with the master $\Phi_A$ in a NOR gate to generate the non-overlapping pulses $(\Phi_B)$ shown in Fig. 4.3.



**Fig. 4.9. High speed clock generation with timing skew adjustment cell and 8-phase non-overlap clock divider**

The SAR sub-ADC uses an asynchronous architecture (Fig. 4.10), which achieves better power/speed performance than its synchronous counterparts and does not require multiple phase-matched SAR sub-ADC clocks to be distributed. A 550-MHz sampling rate is adopted for the clock of each unit SAR sub-ADC in this design. To further improve efficiency, the SAR ADC uses two CDACs (split-capacitor DACs). In addition, a sub-radix technique [26] is used to provide over-range protection against capacitor mismatch and insufficient settling at the expense of one more conversion cycle, as shown in Fig. 4.10.
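The benefit of sub-radix redundancy can be illustrated with a small behavioral model. The sketch below runs a bipolar successive-approximation search with two weight sets: a binary set that needs eight cycles, and a sub-radix set that spends one extra cycle. A wrong comparator decision is injected at the first step to mimic incomplete DAC settling. The weight values are hypothetical and chosen only so that every weight is no larger than the sum of the remaining weights plus one; the actual capacitor ratios of the sub-ADC are not given in the text.

```python
def sar_convert(x, weights, flip_step=None):
    """Bipolar SAR search: returns sum(d_i * w_i) with d_i in {-1, +1}.

    If flip_step is given, the comparator decision at that step is inverted
    to mimic an error caused by, e.g., incomplete DAC settling.
    """
    residue = x
    code = 0
    for step, w in enumerate(weights):
        decision = 1 if residue >= 0 else -1
        if step == flip_step:
            decision = -decision          # inject a wrong decision
        residue -= decision * w
        code += decision * w
    return code


def redundancy(weights, step):
    """Largest |residue| at `step` for which a wrong decision can still be
    corrected by the remaining steps (to within one LSB)."""
    return sum(weights[step + 1:]) + 1 - weights[step]


BINARY = [128, 64, 32, 16, 8, 4, 2, 1]           # 8 cycles, radix 2
SUB_RADIX = [120, 64, 34, 18, 10, 5, 2, 1, 1]    # 9 cycles, hypothetical weights

x = 10   # input in LSBs, close to the first decision threshold
print("MSB redundancy (binary)   :", redundancy(BINARY, 0))
print("MSB redundancy (sub-radix):", redundancy(SUB_RADIX, 0))
print("binary, wrong MSB decision   :", sar_convert(x, BINARY, flip_step=0))
print("sub-radix, wrong MSB decision:", sar_convert(x, SUB_RADIX, flip_step=0))
```

With the binary weights, the wrong first decision cannot be undone and the final code is far from the input. With the sub-radix weights, the later steps absorb the error as long as the input lies within the redundancy range of that step, which is the over-range protection bought with the additional conversion cycle.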



**Fig. 4.10. Split-cap asynchronous SAR sub-ADC with redundancy**

## **4.4. Layout, Measurements and Discussions**

#### 4.4.1. Layout

For a high-speed ADC to achieve optimum performance, layout techniques are critical. The layout strategies are presented in the order of the design flow, which supports fast design iterations.

The core layout design is shown in Fig. 4.11. The signal inputs and clocks are provided from the bottom and top respectively; analog bias voltages or currents come from the left, while high speed digital data goes out on the right to eliminate coupling effects. The two TI channels are mirrored with respect to the input signal wires. The high speed clock generator and T/H front-end are located at the chip center. The conversion blocks are situated sequentially with the signal processing flow, in order to shorten their routing distance, and hence, delay. Clock generation and output buffers are placed away from the analog cores to suppress the substrate noise coupling.

The design guidelines are summarized below.

For Analog Matching,

• Make the matching elements have the same orientation.

• Place the matching elements close together, using "common-centroid" or "interdigitated" placement.

• Add dummy elements to improve symmetry, as shown in Fig. 4.12.

• Flip the signal polarity of consecutive differential pairs to average the offsets due to gradient.

• Avoid routing metal over a matching pair.

• Put matching poly resistors and metal capacitors on an N-well to suppress the substrate coupling.

• If metal layers cannot meet the density rule, insert dummies manually with the same geometric placement.



**Fig. 4.11. Core Layout Overview**



**Fig. 4.12. Dummies in grey to improve symmetry**

For Bias Distribution, distributing bias in the current domain is better than in the voltage domain, since voltages drop along the ground/supply lines. The idea is to route the reference current to the vicinity of the amplifier and perform the current-mirror operation locally. However, routing all the bias currents with low-resistance wires costs a large area, which complicates the routing of critical signals. Hence, grouping certain pre-amps and biasing them locally in the voltage domain is a compromise.

For Clock Distribution,

• Run clocks and critical signals perpendicularly to eliminate crosstalk.

• Use multiple metal layers with via arrays to route clocks in order to reduce their series resistance.

• Use a tree structure to balance the delay of each tap, as shown in Fig. 4.13.



**Fig. 4.13. A clock distribution**

For Guard Rings, in a mixed-signal environment, thousands of digital gates may introduce disturbances in the substrate voltage, especially during digital logic transitions. These disturbances would further corrupt the amplifiers' biasing and, hence, their precision. To minimize the effect of substrate noise, the following methods can be applied:

• Use differential signaling.

• Distribute digital signals and blocks in complementary form, thereby reducing the net current variation.

• Shorten the bond wires connected to the substrate.

• Increase the space between the sensitive analog blocks and digital blocks. However, this remedy may not be effective, if the substrate operates as a low impedance plane, in which voltage ripples can distribute uniformly regardless of the distance from the noise source.

• Guard rings can be employed to isolate the sensitive circuits from a noisy section. However, how to lay out and tie the guard rings is controversial.

#### 4.4.2. Measurement and Discussions

The 8.8-GS/s 8b ADC, a prototype of the new 2×8 high-speed master-slave TI ADC architecture, was designed and fabricated in a TSMC 28-nm 1P7M CMOS process. Fig. 4.14 shows the die photo of the fabricated ADC, which has a core area of $400 \mu m \times 320 \mu m$. A ~300-pF on-chip decoupling capacitor occupying $\sim 0.18$ mm<sup>2</sup> is used for the ADC reference supplies, which are provided externally. The buffer b0 is powered at 1.5 V to facilitate measurement and integration with the analog front-end of a receiver such as a wireline data link, while the rest of the ADC uses 1.0-V supplies.

The measured total power consumption is 85.4 mW, of which the input buffer b0 consumes 15.1 mW, the analog circuits on the 1-V supply (including the T/H buffers b1/b2 and the SAR sub-ADCs) consume 60.7 mW, and the high-speed clock generator consumes 9.6 mW. Fig. 4.15 shows the measured ADC output FFT spectrum for a 141.2-MHz input at 8.8 GS/s, where a 6.5b ENOB is achieved, while Fig. 4.16 shows the measured spectrum for a 4.26-GHz input at 8.8 GS/s, where a 6b ENOB is achieved. No channel-mismatch spectral signatures are observed above the noise level, though the signal image around *fs*/2 apparently grows with the signal frequency. The SFDR is dominated by HD2 at high frequencies, which may be attributed to the phase imbalance of the balun and the asymmetry of the PCB traces and bonding wires. Fig. 4.17 and Fig. 4.18 show the measured SNDR and SFDR versus sampling rate for an input frequency of 141.2 MHz and versus input frequency at an 8.8-GS/s sampling rate, respectively, indicating a peak SNDR of 41.14 dB and a peak SFDR of 52.35 dB. Table 4.1 summarizes the key measured performance parameters of the prototype ADC, showing a favorable comparison with recently reported 6-8b >4GS/s ADCs.
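The relationships between the reported SNDR, ENOB, power, and figure of merit can be checked with the standard definitions $\mathrm{ENOB} = (\mathrm{SNDR} - 1.76)/6.02$ and Walden $\mathrm{FOM} = P/(2^{\mathrm{ENOB}} f_s)$. The sketch below applies them to the numbers quoted in this chapter and in the abstract. The function names are illustrative, and the image-frequency helper assumes the simplest two-way interleaving model, in which the mismatch image of a tone at $f_{in}$ appears at $f_s/2 - f_{in}$.

```python
def enob(sndr_db):
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w, sndr_db, f_s):
    """Walden figure of merit in J/conversion-step."""
    return power_w / (2 ** enob(sndr_db) * f_s)


def two_way_image(f_in, f_s):
    """Mismatch image frequency for two-way interleaving (assumed model)."""
    return abs(f_s / 2 - f_in)


F_S = 8.8e9
power_breakdown = {            # measured values quoted in the text, W
    "input buffer b0": 15.1e-3,
    "analog (b1/b2 + SAR sub-ADCs)": 60.7e-3,
    "clock generator": 9.6e-3,
}
total_power = sum(power_breakdown.values())

print(f"total power          : {total_power * 1e3:.1f} mW")
print(f"ENOB at peak SNDR    : {enob(41.14):.2f} b")
print(f"ENOB at Nyquist SNDR : {enob(38.4):.2f} b")
print(f"FOM with 85.4 mW     : {walden_fom(total_power, 38.4, F_S) * 1e15:.0f} fJ/conv.-step")
print(f"FOM with 83.4 mW     : {walden_fom(83.4e-3, 38.4, F_S) * 1e15:.0f} fJ/conv.-step")
print(f"image of 141.2 MHz   : {two_way_image(141.2e6, F_S) / 1e9:.3f} GHz")
```

With the 38.4-dB Nyquist SNDR, the 83.4-mW total quoted in the abstract yields roughly 140 fJ/conv.-step, while the 85.4-mW sum of the breakdown above yields roughly 143 fJ/conv.-step.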



**Fig. 4.14. A die photo for the designed 8.8-GS/s 8b ADC** 



**Fig. 4.15. Measured ADC spectrum at 8.8 GS/s with** *fin* **= 141.2 MHz**



**Fig. 4.16. Measured ADC spectrum at 8.8 GS/s with** *fin* **= 4.258 GHz**



**Fig. 4.17. Measured SNDR/SFDR versus sampling rate for** *fin* **= 141.2 MHz**



**Fig. 4.18. Measured SNDR/SFDR versus input frequency for**  $f_s = 8.8$  **GHz** 



**\*includes input buffers** 

**Table 4.1. Performance comparison of reported 6-8b >4GS/s ADCs**

## **4.5. Performance Summary**

The results are summarized in Table 4.2. Compared with state-of-the-art high-speed, medium-resolution converters [11, 42, 43], as shown in Table 4.1, this master-slave ADC achieves the best FOM and the most compact area while remaining cost-effective to manufacture.



**Table 4.2. Performance Summary of the prototype ADC**

# **CHAPTER 5**

## **Conclusion**

This dissertation focused on high-speed time-interleaved A/D converters designed in advanced CMOS processes for pulse radar system and next generation ADC-based data link receiver applications. Specifically, this research focused on solving the design challenges of track-and-hold front-end and proposed two efficient techniques and successfully demonstrated them in two prototype ADCs.

First, we reviewed the time-interleaved track-and-hold architecture. Mismatches between channels, such as differences in offset, gain, and timing, degrade the performance. Three T&H architectures were discussed: one with a front-end sampler, one without, and one with a hierarchical sampler that uses additional switches to decrease the capacitance after the front-end sampler. We then introduced the T/H buffer and its typical architectures, advantages, and limitations.

Second, we report a time-interleaved two-step ADC architecture built upon a new concept of virtual-ground sampling and using merged front-end T/H with residue generation, input termination and buffering, aimed to push ENOB of time-interleaved ADCs from a current 5~6b range to 7~9b level without degrading conversion rates. The new ADC architecture is validated using a 2-GS/s 8b ADC fabricated in foundry 28-nm CMOS, achieving 43-dB SNDR and 55 dB SFDR up to Nyquist frequency.

Finally, we presented the design and implementation of a two-level TI master-slave SAR ADC prototype with CDLA T&H buffers for high bandwidth and linearity. This prototype is validated using an 8.8-GS/s 8b ADC fabricated in a foundry 28-nm CMOS process, achieving 38.4-dB SNDR and 50-dB SFDR up to the Nyquist frequency. The measured results demonstrate that this architecture achieves superior power and area efficiency compared with recently reported state-of-the-art designs.

#### **REFERENCES**

- [1] T. Kishigami *et al.*, "Advanced millimeter-wave radar system using coded pulse compression and adaptive array for pedestrian detection," in *Proc. IEEE Radar Conf.*, Apr. 2013, pp. 1–6.
- [2] J. Sato, K. Takinami, and K. Takahashi, "A 2-GS/s 8-bit Time-Interleaved SAR ADC for Millimeter-Wave Pulsed Radar Baseband SoC," *IEEE J. Solid-State Circuits*, vol.52, no.10, pp. 2712-2720, Oct. 2017.
- [3] S. Palermo, "CMOS ADC-based receivers for high-speed electrical and optical links," in *IEEE Communications Magazine,* vol. 54, no. 10, Oct. 2016, pp. 168–175.
- [4] I. Dedic, "56 GS/s ADC enabling 100 GbE," in *Proc. Opt. Fiber Commun./Nat. Fiber Opt. Eng. Conf. (OFC/NFOEC),* Mar. 2010, pp. 1–3.
- [5] H. Wei, P. Zhang, B. D. Sahoo, and B. Razavi, "An 8-bit 4-GS/s 120-mW CMOS ADC," in *Proc. IEEE Custom Integr. Circuits Conf.*, pp. 1–4, Sep. 2013.
- [6] D. Stepanovic, and B. Nikolic, "A 2.8 GS/s 44.6 mW Time-Interleaved ADC Achieving 50.9 dB SNDR and 3 dB Effective Resolution Bandwidth of 1.5 GHz in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol.48, no.4, pp. 971-982, Apr. 2013.
- [7] N. Dortz *et al.*, "A 1.62GS/s Time-Interleaved SAR ADC with Digital Background Mismatch Calibration Achieving Interleaving Spurs Below 70dBFS," *ISSCC Dig. Tech. Papers*, pp. 386-388, Feb. 2014.
- [8] S. Jamal, D. Fu, N.-J. Chang, P. Hurst, and S. Lewis, "A 10-b 120-Msample/s time-interleaved analog-to-digital converter with digital background calibration," *IEEE J. Solid-State Circuits*, vol. 37, pp. 1618-1627, Dec. 2002.
- [9] B. Razavi, "Design of Sample-and-Hold Amplifiers for High-Speed Data Converters," (Invited) Proc. *IEEE CICC*, pp. 59-66, May 1997.
- [10]S. M. Louwsma, E. J. M. van Tuijl, M. Vertregt, and B. Nauta, "A time-interleaved track & hold in 0.13µm CMOS sub-sampling a 4-GHz signal with 43-dB SNDR," in *IEEE CICC*, Sep. 2007, pp. 329–332.
- [11]M. Q. Le, et al., "A background calibrated 28GS/s 8b interleaved SAR ADC in 28nm CMOS," IEEE CICC, Apr. 2017.
- [12] Boris Murmann, "ADC Performance Survey 1997-2017," [Online]. Available: <http://www.stanford.edu/~murmann/adcsurvey.html>
- [13]Simon Louwsma, Ed van Tuijl, Bram Nauta. "Time-interleaved Analog-to-Digital Converters", Springer, 2011.
- [14] B. Razavi, "Design considerations for interleaved ADCs," *IEEE J. Solid-State Circuits*, vol. 48, no. 8, pp. 1806–1817, Aug. 2013.
- [15]Benwei Xu, CALIBRATION TECHNIQUES FOR HIGH SPEED TIME-INTERLEAVED SAR ADC. PhD dissertation, The University of Texas at Dallas, 2017.
- [16] N. Kurosawa *et al.*, "Explicit analysis of channel mismatch effects in time-interleaved ADC systems," *IEEE Trans. Circuits Syst. I,* vol. 48, no. 3, pp. 261–271, Mar. 2001.
- [17]W. C. Black, D. A. Hodges, Time interleaved converter arrays. *IEEE J. Solid-State Circuits*, vol. 15, no. 6, pp. 1022–1029, 1980.
- [18]K. Poulton, J.J. Corcoran, T. Hornak, A 1-GHz 6-bit ADC System. *IEEE J. Solid-State Circuits*, vol. 22, no. 6, pp. 962–970, 1987.
- [19] S. K. Gupta, M. A. Inerfield, J. Wang, A 1-GS/s 11-bit ADC with 55-dB SNDR, 250-mW power realized by a high bandwidth scalable time-interleaved architecture. *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2650–2657, 2006.
- [20] H. Pan, M. Segami, M. Choi, J. Cao, A.A. Abidi, A 3.3-V 12-b 50-MS/s A/D converter in 0.6-μm CMOS with over 80-dB SFDR. *IEEE J. Solid-State Circuits*, vol. 35, no. 12, pp. 1769–1780, Dec 2000.
- [21]W. Yang, D. Kelly, L. Mehr, M.T. Sayuk, L. Singer, A 3-V 340-mW 14-b 75-Msample/s CMOS ADC with 85-dB SFDR at Nyquist input. *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1931–1936, 2001.
- [22]K. Hadidi, A. Khoei, A highly linear cascode-driver CMOS source-follower buffer, in *IEEE Intl. Conf. on Electronics, Circuits and Systems*, pp. 1243–1246, 1996.
- [23]C.-C. Hsu, F.-C. Huang, C.-Y. Shih, C.C. Huang, Y.-H. Lin, C.-C. Lee, B. Razavi, An 11 b 800 MS/s time-interleaved ADC with digital background calibration, in *ISSCC Dig. Tech. Papers*, pp. 464–465, Feb 2007.
- [24]X. S. Wang et al., "A 2-GS/s 8-bit ADC featuring virtual-ground sampling interleaved architecture in 28-nm CMOS," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, to be published, doi: 10.1109/TCSII.2017.2758323.
- [25]L. Kull et al., "Implementation of Low-Power 6-8b 30-90 GS/s Time-Interleaved ADCs with Optimized Input Bandwidth in 32 nm CMOS," IEEE *J. Solid-State Circuits*, vol. 51, no. 3, Mar. 2016, pp. 636–648.
- [26]J. Cao et al., "A Transmitter and Receiver for 100Gb/s Coherent Networks with Integrated 4×64GS/s 8b ADCs and DACs in 20nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 484-485, Feb. 2017.
- [27]D. Cui et al., "A 320mW 32Gb/s 8b ADC-Based PAM-4 Analog Front-End With Programmable Gain Control and Analog Peaking in 28nm CMOS," *ISSCC Dig. Tech. Papers*, Feb. 2016, pp. 58–59.
- [28]M. Kramer, E. Janssen, K. Doris and B.Murmann, "A 14-Bit 30-MS/s 38-mW SAR ADC Using Noise Filter Gear Shifting" *IEEE Trans. Circuits Syst. II: Exp. Briefs*, vol. 64, no. 2, pp. 116–120, Feb. 2017.
- [29] T. Miki, T. Ozeki and J. Naka, "A 2GS/s 8b time-interleaved SAR ADC for millimeter-wave pulsed radar baseband SoC," in *Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC),* Toyama, Japan, Nov. 2016, pp. 5–8.
- [30]S. Kundu, et al., "A 1.2 V 2.64 GS/s 8bit 39 mW Skew-Tolerant Time-interleaved SAR ADC in 40 nm Digital LP CMOS for 60 GHz WLAN," *Proc. CICC Dig. Tech. Papers,* Sep. 2014, pp. 1–4.
- [31]H. Pan and K. Abdelhalim, "Distributed virtual ground switching for SAR and pipeline ADCs," *US Patent No.: 9,356,593 B2*, May 31, 2016.
- [32] K. Doris, E. Janssen, C. Nani, A. Zanikopoulos, and G. Weide, "A 480mW 2.6 GS/s 10b Time-Interleaved ADC With 48.5 dB SNDR up to Nyquist in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, Dec. 2011, pp. 2821–2833.
- [33]H. Pan and I. Fujimori, "Hierarchical Parallel Pipelined Operation of Analog and Digital Circuits," *US Patent No.7,012,559 B1*, Mar 14, 2006.
- [34] N. Kurosawa, H. Kobayashi, K. Maruyama, H. Sugawara, and K. Kobayashi, "Explicit analysis of channel mismatch effects in time-interleaved ADC systems," *IEEE Trans. Circuits Syst. I: Fund. Theory Appl*., vol. 48, no. 3, pp. 261–271, Mar. 2001.
- [35]M. Inerfield et al., "An 11.5-ENOB 100-MS/s 8mW dual-reference SAR ADC in 28nm CMOS," in *IEEE Symp. VLSI Circuits*, June 2014.
- [36]A. Abidi and H. Xu, "Understanding the regenerative comparator circuit," Proc. CICC Dig. Tech. Papers, Sep. 2014, pp. 1–8.
- [37]D. Schinkel, et al., "A Double-Tail Latch Type Voltage Sense Amplifier with 18ps Setup+Hold Time," *IEEE ISSCC Dig. Tech. Papers*, pp. 314-315, Feb. 2007.
- [38]J. Pernillo and M. P. Flynn, "A 9b 2GS/s 45mW 2X-interleaved ADC," *in Proc. IEEE ESSCIRC*, Sep. 2013, pp. 125–128.
- [39] Y.-C. Lien, "A 4.5-mW 8-b 750-MS/s 2-b/step asynchronous subranged SAR ADC in 28 nm CMOS technology," in *Proc. IEEE Symp. VLSI Circuits*, Honolulu, HI, USA, Jun. 2012, pp. 88–89.
- [41]M. Q. Le, et al., "A background calibrated 28GS/s 8b interleaved SAR ADC in 28nm CMOS," *IEEE CICC*, Apr. 2017.
- [42]L. Kull, et al., "A 35mW 8b 8.8GS/s SAR ADC with Low-Power Capacitive Reference Buffers in 32nm Digital SOI CMOS," *IEEE Symp. VLSI Circuits*, Jun 2013.
- [43]H. Wei, P. Zhang, B. D. Sahoo, and B. Razavi, "An 8-bit 4-GS/s 120-mW CMOS ADC," in *Proc. IEEE Custom Integr. Circuits Conf*., pp. 1–4, Sep. 2013.
- [44]Hairong Yu, A 1V 2.5GS/S 8-BIT SELF-CALIBRATED FLASH ADC IN 90NM GP CMOS. PhD dissertation, The University of California, Los Angeles, 2008.
- [45] M. El-Chammas and B. Murmann, "General analysis on the impact of phase-skew in time-interleaved ADCs," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 56, no. 5, pp. 902–910, May 2009.
- [46]B. P. Ginsburg and A. P. Chandrakasan "An energy-efficient charge recycling approach for a SAR converter with capacitive DAC," *Proc. IEEE Int. Symp. Circuits and Systems*, vol. 1, pp. 184 -187, 2005.