## Title

Low Jitter Techniques for High-Speed Phase-Locked Loops

## Permalink

https://escholarship.org/uc/item/4t69s8z1

## Author

Zhao, Yu

## Publication Date

2022
Peer reviewed|Thesis/dissertation

# UNIVERSITY OF CALIFORNIA 

Los Angeles

Low Jitter Techniques for High-Speed Phase-Locked Loops

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical and Computer Engineering

by

Yu Zhao

© Copyright by
Yu Zhao
2022

ABSTRACT OF THE DISSERTATION

Low Jitter Techniques for High-Speed Phase-Locked Loops

by

Yu Zhao

Doctor of Philosophy in Electrical and Computer Engineering

University of California, Los Angeles, 2022

Professor Behzad Razavi, Chair

The problem of clock generation with low jitter becomes much more challenging as wireline transceivers are designed for higher data rates, e.g., $224 \mathrm{~Gb} / \mathrm{s}$. This dissertation addresses the clock generation problem and proposes both integer- $N$ and fractional- $N$ phase-locked loop architectures that achieve low jitter with low power consumption.

This dissertation consists of two parts. We first introduce an integer-$N$ PLL that incorporates two new techniques. A double-sampling architecture samples both the rising and falling edges of the reference clock, which improves the in-band phase noise by 3 dB. Also, a robust retiming technique is presented to reduce the phase noise of the frequency divider. Fabricated in 28-nm CMOS technology, the $19-\mathrm{GHz}$ prototype achieves an rms jitter of 20.3 fs from 10 kHz to 100 MHz with a spur of -66 dBc, all at a power of 12 mW.

Next, we propose a $56-\mathrm{GHz}$ fractional-$N$ PLL targeting $224-\mathrm{Gb} / \mathrm{s}$ PAM4 transmitters. The PLL employs a novel current-mode FIR filter to avoid phase and frequency detectors (PFDs) and charge pumps and to suppress the DSM quantization noise with negligible noise folding. To provide a compact solution suited to multi-lane systems, the PLL also incorporates an inductorless divide-by-8 circuit that draws 3.1 mW. Fabricated in 28-nm CMOS technology, the PLL exhibits an rms jitter of 110 fs, consumes 23 mW, and occupies an active area of $0.1 \mathrm{~mm}^{2}$.

The dissertation of Yu Zhao is approved.
Chee Wei Wong
Danijela Cabric
Gregory J. Pottie
Behzad Razavi, Committee Chair

## University of California, Los Angeles

2022

To my parents

## TABLE OF CONTENTS

1 Introduction
1.1 Motivation
1.2 Thesis Organization
2 Background
2.1 Basic Phase-Locked Loops
2.2 PLL Jitter Optimization
2.2.1 Reference Phase Noise
2.2.2 Reference Buffer Phase Noise
2.2.3 Phase/Frequency Detector Phase Noise
2.2.4 Charge Pump Noise
3 A 19-GHz Integer-$N$ PLL with 20.3-fs Jitter
3.1 Double-Sampling PD
3.2 Reference Phase Noise Reduction
3.3 PD Transfer Function and Phase Noise
3.4 Effect of Duty Cycle Error
3.5 Duty Cycle Detection and Correction
3.6 VCO and $\div 2$ Stage
3.7 Multimodulus Divider
3.8 Nonoverlapping Clock Generator
3.9 Experimental Results
4 A 56-GHz 23-mW Fractional-$N$ PLL with 110-fs Jitter
4.1 Design Challenges
4.2 Linearity analysis of the Phase-Domain FIR filter
4.2.1 Resistor-based FIR filter
4.2.2 Switched-current FIR filter
4.3 Proposed PLL architecture
4.3.1 Divide ratio
4.3.2 Switched-current FIR/PD Implementation
4.3.3 Phase Detection
4.3.4 PD Gain Limit
4.3.5 Cascode current source
4.3.6 Binary Delay Line
4.3.7 VCO and $\div 8$ circuit
4.4 Experimental Results
5 Conclusion

## LIST OF FIGURES

1.1 A 224G PAM4 wireline transmitter.
1.2 (a) An ADC clocked by a PLL, and (b) tolerable jitter for a $56-\mathrm{GHz}$ ADC for SNR penalties of 1, 2, and 3 dB.
2.1 (a) Basic PLL architecture, (b) output profile due to reference phase noise, and (c) output profile due to VCO.
2.2 Integrated jitter of PLL as a function of reference frequency and reference phase noise.
2.3 Optimum integrated jitter of PLL as a function of reference frequency and reference phase noise for a VCO phase noise of $-113 \mathrm{dBc} / \mathrm{Hz}$ at $1-\mathrm{MHz}$ offset.
2.4 $S R_{\text {out }} / S R_{\text {in }}$ ratio vs reference frequency.
2.5 (a) CMOS inverter input/output waveforms during sharp transitions, and (b) noise window of NMOS device in RBUF with a sinusoidal input.
2.6 Phase noise of reference buffer at different input frequencies.
2.7 (a) Optimized TSPC PFD, and (b) phase noise of TSPC PFD.
3.1 Proposed PLL architecture.
3.2 (a) Single-sampling PD, and (b) its time-domain waveforms.
3.3 (a) Double-sampling PD, (b) its time-domain waveforms, and (c) its simulated phase noise.
3.4 Double-sampling PD detecting (a) rising edge of $V_{R E F}$, or (b) falling edge of $V_{R E F}$.
3.5 Simulated phase noise of RBUF in a noiseless PLL.
3.6 Double-sampling PD response to RBUF supply noise.
3.7 Effect of RBUF flicker noise.
3.8 $V_{R E F}$ duty cycle error.
3.9 $V_{R E F}$ duty cycle correction circuit.
3.10 (a) VCO implementations, and (b) its simulated phase noise.
3.11 (a) $\mathrm{C}^{2} \mathrm{MOS} \div 2$ circuit, and (b) its simulated phase noise at an input frequency of 20 GHz.
3.12 (a) Modular divider, (b) multimodulus divider with one flipflop as retimer, (c) timing diagram, and (d) multimodulus divider with two flipflops as retimers.
3.13 (a) Proposed multimodulus divider with 3 flipflops as retimers, and (b) its simulated phase noise spectrum.
3.14 (a) Nonoverlapping clock generator, (b) nonoverlapping clock waveform.
3.15 Measured PLL output spectrum.
3.16 Measured PLL phase noise.
3.17 Die photograph.
3.19 Measured spur level due to the RBUF supply disturbance.
3.18 Measured phase noise of the $250-\mathrm{MHz}$ crystal oscillator.
3.20 (a) Measured spur levels, and (b) PLL output jitter against $V_{R E F}$ DCE.
3.21 Measured transient response of $\mathrm{V}_{\mathrm{PD}, \mathrm{CM}}$.
3.22 Measured PLL frequency transient response
3.23 Comparison to the state-of-the-art low-jitter PLLs.
4.1 A general 56-GHz Fractional-$N$ PLL.
4.2 (a) Resistor-based 2-tap FIR summer, (b) input and response of the resistor-based FIR summer.
4.3 (a) A two-tap Switched-current FIR filter, (b) input and response of the switched-current FIR summer.
4.4 A two-tap switched-current FIR filter with finite output resistance.
4.5 Proposed PLL architecture.
4.6 (a) Normalized PLL closed-loop response, and (b) PLL output $\Delta \Sigma$ phase noise spectrum with $\div 4$ and $\div 8$ circuit.
4.7 (a) Implementation of the 22-tap FIR/PD, and (b) waveforms of the FIR control signal.
4.8 Monte-Carlo results showing variation of $\Delta \Sigma$ Jitter.
4.9 (a) $\Delta \Sigma$ phase error probability distribution without FIR filtering, and (b) with FIR filtering.
4.10 Switched-current FIR operation in the integer-$N$ mode.
4.11 (a) I-V characteristic of a current unit with and without cascode, and (b) simulated $\Delta \Sigma$ phase noise spectrum of switched-current FIR with and without cascode.
4.12 (a) FIR delay output without binary delay, and (b) FIR delay output without binary delay.
4.13 TSPC flip-flop.
4.14 (a) VCO implementations, and (b) its simulated phase noise.
4.15 (a) $\div 2$ circuit with feedforward, and (b) its simulated frequency range.
4.16 Die photograph.
4.17 Measured PLL output spectrum.
4.18 Measured PLL fractional spur levels.

## LIST OF TABLES

3.1 Performance summary and comparison to prior art
4.1 Performance summary and comparison to prior art

## ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my advisor, Professor Behzad Razavi, for advising my Ph.D. research. It is my great honor to be his Ph.D. student, and it has been a pleasure working with him. Professor Razavi is hard-working, creative, patient, and organized. I had a hard time at the beginning of my Ph.D. research due to the transition from the role of an engineer to that of a researcher. He showed me how to be a researcher and provided me with very useful ideas whenever I came across a technical problem. He sets an excellent example for me in my future career.

I would like to thank all the members of our group with whom I overlapped, namely, Dr. Long Kong, Dr. Atharav, Dr. Yikun Chang, Dr. S. Hossein Razavi, Dr. Mehrdad Babamir, Onur Memioglu, and Matias Jara. I would like to express my special thanks to Dr. Atharav for his suggestions on research and the tape-out procedure. I am thankful to Yikun for sharing her research experience. I am thankful to Long Kong for encouraging me to start the Ph.D. research. I am grateful to Hossein for sharing his slides on HFSS simulation. I am also thankful to Hossein, Mehrdad, Onur, and Matias for always being available to share the knowledge and thoughts that helped my research work. It was a wonderful experience being part of a research group with a very friendly environment.

I would like to thank my friends at UCLA. I would like to thank Dr. Weiyu Leng for sharing his valuable experience of integrated circuit design, layout design and printed circuit board design with me. I would like to thank Dr. Kejian Shi and Dr. Yan Zhang for technical discussions.

I would like to thank Professor Greg Pottie, Professor Danijela Cabric, and Professor Chee Wei Wong for serving on my committee and for taking their valuable time to review the thesis and suggest improvements to the manuscript. I am grateful to Prof. Sudhakar Pamarti for the inspiring discussions on the phase noise analysis of crystal oscillators. I am thankful to all the professors in the circuit area who provide the best courses on the analysis and design of integrated circuits.

I gratefully acknowledge the TSMC University Shuttle Program for chip fabrication. This research was supported by Realtek Semiconductor.

Finally, I would like to express my deepest gratitude to my parents. I owe every success in my life to them. Their constant love and support give me the strength to overcome all challenges and difficulties.

## VITA

2009-2013 B.S. (Electrical Engineering), Shanghai Jiao Tong University, Shanghai, China.

2013-2015 M.S. (Electrical Engineering), University of California, Los Angeles, USA.

2015-2018 Analog/RFIC Design Engineer, Ubilinx Technologies Inc, San Jose, USA.

2019 Ph.D. Candidate (Electrical and Computer Engineering), University of California, Los Angeles, USA.

## PUBLICATIONS

Y. Zhao and B. Razavi, "A 19-GHz PLL with 20.3-fs Jitter," IEEE Symposium on VLSI Circuits, pp. 1-2, Jun 2021.
Y. Zhao, O. Memioglu and B. Razavi, "A 56-GHz 23-mW Fractional- $N$ PLL with 110-fs Jitter," accepted by International Solid-State Circuits Conference, Feb. 2022.

## CHAPTER 1

## Introduction

### 1.1 Motivation

The demand for higher data rates in wireline systems has been steadily increasing with the dramatic rise of data transport over the Internet. It has been predicted that data traffic grows by $25 \%$ per year, possibly reaching 20 zettabytes ($20 \times 10^{21}$ bytes) in 2025 [1]. Such a demand poses several challenges to low-jitter clock generation in wireline transmitter/receiver design.

On the transmitter side, PAM4 wireline transmitters operating at $224 \mathrm{~Gb} / \mathrm{s}$ can employ a 56-GHz phase-locked loop (PLL) for multiplexing, as shown in Figure 1.1. Such an environment poses three constraints on the design. First, the PLL rms jitter must be no more than a few percent of the symbol period, 8.93 ps, dictating values around $100 \mathrm{fs}_{r m s}$. Second, the PLL should preferably provide fractional-$N$ operation so as to accommodate different crystal frequencies. Third, in a multi-lane system, it is desirable to avoid distributing a $56-\mathrm{GHz}$ clock over long interconnects. Hence the need for a low-power, compact PLL that can be used within each lane.

Figure 1.1: A 224G PAM4 wireline transmitter.

PAM4 receivers can employ an analog-to-digital converter (ADC) to enable more complex and flexible digital signal processing (DSP) for equalization and symbol detection compared to analog receivers [2]. The ADC-based wireline receiver poses challenging requirements on the PLLs in terms of speed, power consumption, and jitter. Observed in both wireless and wireline systems, this trend arises primarily because of the need for higher data rates. For example, a 112-Gb/s PAM4 wireline receiver employing a 7-bit $56-\mathrm{GHz}$ ADC incurs 3 dB of signal-to-noise ratio penalty at the Nyquist rate if the clock jitter exceeds $36 \mathrm{fs}_{r m s}$, as plotted in Figure 1.2. While, in practice, the ADC is realized as a number of time-interleaved channels running at lower clock frequencies, this jitter bound still governs the generation of the clocks. Moreover, 12-bit ADCs designed for direct RF sampling [3] face similar jitter constraints as they approach a rate of 20 GHz.
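The 36-fs figure quoted above can be cross-checked with a short sketch. It assumes the textbook relations for a full-scale sinusoidal input, namely a jitter-limited SNR of $-20 \log_{10}(2 \pi f_{in} \sigma_{j})$ and an ideal quantization SNR of $6.02 b+1.76$ dB; the dissertation may rely on a more detailed model, and the function names are illustrative only.

```python
import math

def jitter_limited_snr_db(f_in_hz, sigma_s):
    """SNR (dB) of a full-scale sinusoid at f_in sampled with rms clock jitter sigma."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_s)

def quantization_snr_db(bits):
    """Ideal quantization SNR (dB) of a full-scale sinusoid (textbook assumption)."""
    return 6.02 * bits + 1.76

def snr_penalty_db(f_in_hz, sigma_s, bits):
    """Loss in total SNR when jitter noise is added to quantization noise."""
    q_db = quantization_snr_db(bits)
    j_db = jitter_limited_snr_db(f_in_hz, sigma_s)
    total_db = -10.0 * math.log10(10.0 ** (-q_db / 10.0) + 10.0 ** (-j_db / 10.0))
    return q_db - total_db

# Nyquist input (28 GHz) of a 7-bit converter sampling at 56 GHz, 36 fs rms jitter
penalty = snr_penalty_db(28e9, 36e-15, 7)
print(f"SNR penalty: {penalty:.2f} dB")
```

Under these assumptions the jitter noise and the quantization noise are nearly equal at 36 fs, which is why the penalty comes out close to 3 dB.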

Recent work has demonstrated jitter values below $100 \mathrm{fs}_{r m s}$ at frequencies ranging from 7 GHz to 31 GHz [4-14]. Some of these examples incorporate subsampling; extensive work on subsampling PLLs has been reported [15-19].

### 1.2 Thesis Organization

This dissertation consists of 5 chapters. Chapter 2 reviews the fundamentals of PLLs including bandwidth, noise transfer function and optimal PLL output jitter.

Chapter 3 presents a double-sampling integer-$N$ PLL that samples both the rising and falling edges of the reference clock, which improves the in-band phase noise by 3 dB. A robust retiming technique is used to reduce the phase noise of the frequency divider.

Figure 1.2: (a) An ADC clocked by a PLL, and (b) tolerable jitter for a $56-\mathrm{GHz}$ ADC for SNR penalties of 1, 2, and 3 dB.

Chapter 4 proposes a $56-\mathrm{GHz}$ fractional-$N$ PLL targeting 224-Gb/s PAM4 transmitters. The PLL employs a current-mode FIR filter to avoid phase and frequency detectors (PFDs) and charge pumps and to suppress the DSM quantization noise with negligible noise folding. To provide a compact solution suited to multi-lane systems, the PLL also incorporates an inductorless divide-by-8 circuit that draws 3.1 mW.

Chapter 5 summarizes the dissertation.

## CHAPTER 2

## Background

This chapter provides the background for the integer- $N$ PLL design and presents the optimization of the loop in terms of the reference and oscillator phase noise.

### 2.1 Basic Phase-Locked Loops

A phase-locked loop (PLL) is a feedback system that generates an output signal whose phase is regulated with respect to that of a reference signal. As shown in Figure 2.1(a), a general PLL consists of a phase detector (PD), a loop filter, a voltage-controlled oscillator (VCO), and a feedback divider. The phase detector compares the phase of the divider output to that of the reference clock, whose frequency is $f_{\text {REF }}$, and converts the phase difference to a voltage or current signal. The PD output contains periodic pulses at the reference frequency, which disturb the VCO control voltage [20]. To resolve this issue, a low-pass filter is placed between the PD and the VCO to suppress the high-frequency component of the PD output. A VCO is an oscillator whose oscillation frequency is controlled by its voltage input. A CMOS VCO can be implemented using a ring topology or an LC resonant circuit. The frequency divider takes the output of the VCO and generates an output signal of a frequency equal to $f_{\mathrm{VCO}} / N$, where $f_{\mathrm{VCO}}$ is the VCO oscillation frequency and $N$ is an integer. With the help of the divider, the PLL can generate an output signal whose frequency is $N$ times the reference frequency. This function is also known as "frequency multiplication".

### 2.2 PLL Jitter Optimization

As we seek jitter values in the range of a few tens of femtoseconds, the contribution of all noise sources becomes significant. We first quantify these contributions and then decide which ones can be avoided. Given our target jitter of $20 \mathrm{fs}_{r m s}$ and the numerous contributors in a typical design, we also explore the possibility of jitter values around a few femtoseconds for some of the functions.

### 2.2.1 Reference Phase Noise

The phase noise of crystal oscillators has become increasingly critical as sub-100-fs jitter values have been targeted. We predict that PLL bandwidths must fall well below the $f_{R E F} / 10$ rule of thumb if both the reference and the voltage-controlled oscillator (VCO) contributions are to be minimized. We neglect the effect of flicker noise for now.

Figure 2.1: (a) Basic PLL architecture, (b) output profile due to reference phase noise, and (c) output profile due to VCO.

Consider the generic type-II PLL shown in Figure 2.1(a), noting that the reference phase noise, $S_{R E F}$, experiences a low-pass response as it travels to the output. The loop bandwidth is likely to be far narrower than $f_{R E F} / 10$, and hence the damping factor, $\zeta$, to be greater than 2, allowing us to assume that the zero and the first pole of the transfer function, $H(s)$, coincide. We thus have

$$
\begin{equation*}
H(s) \approx \frac{N}{1+\frac{s}{\omega_{0}}} \tag{2.1}
\end{equation*}
$$

where $\omega_{0}=2 \pi f_{0}$ denotes the second pole frequency and the loop bandwidth. The reference phase noise emerges at the output as

$$
\begin{equation*}
S_{o u t 1}=\frac{N^{2} S_{R E F}}{1+\frac{\omega^{2}}{\omega_{0}^{2}}} \tag{2.2}
\end{equation*}
$$

yielding a total area of $A_{R E F}=\pi f_{0} N^{2} S_{R E F}$ from $f=-\infty$ to $f=+\infty$.

To appreciate the reference phase noise's significance, let us assume a noiseless PLL and express the output jitter as

$$
\begin{align*}
\sigma_{j} & =\frac{\sqrt{A_{R E F}}}{2 \pi} \frac{T_{R E F}}{N} \\
& =\sqrt{\frac{f_{0}}{4 \pi} \frac{S_{R E F}}{f_{R E F}^{2}}} \tag{2.3}
\end{align*}
$$

where $T_{R E F}=1 / f_{R E F}$. If $f_{0}$ is near its practical upper bound of $0.1 f_{R E F}$, we can plot $\sigma_{j}$ as a function of $S_{R E F}$ and $f_{R E F}$ (Figure 2.2). The resulting envelope indicates that, for $\sigma_{j}=20 \mathrm{fs}_{r m s}$, one can select $200 \mathrm{MHz} \leqslant f_{R E F} \leqslant 500 \mathrm{MHz}$ and $-175 \mathrm{dBc} / \mathrm{Hz} \leqslant S_{R E F} \leqslant-170 \mathrm{dBc} / \mathrm{Hz}$. The situation becomes more difficult if the VCO phase noise is included.
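As a rough numerical check of Eq. (2.3), the following sketch evaluates the reference-limited jitter at the corners of the range quoted above, with $f_{0}=0.1 f_{REF}$. It assumes that the dBc/Hz value of $S_{REF}$ is converted to a linear ratio and inserted directly into Eq. (2.3); the function and parameter names are illustrative.

```python
import math

def dbc_to_linear(dbc_per_hz):
    """Convert a phase noise level in dBc/Hz to a linear ratio per Hz."""
    return 10.0 ** (dbc_per_hz / 10.0)

def reference_limited_jitter(f_ref_hz, s_ref_dbc_hz, bw_fraction=0.1):
    """Eq. (2.3): rms jitter from reference phase noise alone, with f0 = bw_fraction * f_REF."""
    f0 = bw_fraction * f_ref_hz
    s_ref = dbc_to_linear(s_ref_dbc_hz)
    return math.sqrt(f0 * s_ref / (4.0 * math.pi)) / f_ref_hz

for f_ref in (200e6, 250e6, 500e6):
    for s_ref_db in (-175.0, -170.0):
        sigma = reference_limited_jitter(f_ref, s_ref_db)
        print(f"f_REF = {f_ref / 1e6:.0f} MHz, S_REF = {s_ref_db:.0f} dBc/Hz "
              f"-> {sigma * 1e15:.1f} fs")
```

Because $f_{0}$ is tied to $f_{REF}$, the jitter in this sketch scales as $1 / \sqrt{f_{REF}}$, so a higher reference frequency tolerates a noisier reference for the same jitter.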

The VCO phase noise contribution, $S_{\text {out } 2}$, can be approximated as shown in Figure 2.1(c), with a plateau up to $\pm f_{1}$ and an $\alpha / f^{2}$ roll-off beyond this offset. The total area under this profile is equal to $A_{V C O} \approx 4 S_{1} f_{1}$, where $S_{1} \approx \alpha / f_{1}^{2}$. For $\zeta>2$, we have $f_{1} \approx f_{0}$. We must now minimize $S_{t o t}=A_{R E F}+A_{V C O}$ as a function of the loop bandwidth, $f_{0}$. The optimum bandwidth is given by

$$
\begin{equation*}
f_{0, o p t}=\sqrt{\frac{4 \alpha}{\pi N^{2} S_{R E F}}}, \tag{2.4}
\end{equation*}
$$



and the minimum integrated phase noise by

$$
\begin{equation*}
S_{t o t, \text { min }}=4 \sqrt{\alpha \pi N^{2} S_{R E F}} . \tag{2.5}
\end{equation*}
$$

Figure 2.2: Integrated jitter of PLL as a function of reference frequency and reference phase noise.

This optimum leads to two attributes. First, the reference and VCO contributions become approximately equal. Second, the plateaus in the output spectra of Figs. 2.1(b) and (c) roughly coincide. This is seen by recognizing that, for $f \ll f_{0}$, $S_{\text {out } 1}=N^{2} S_{R E F}$ and $S_{\text {out } 2} \approx \alpha / f_{0, \text { opt }}^{2} \approx(\pi / 4) N^{2} S_{R E F}$. It can also be shown that the tails of $S_{\text {out } 1}$ and $S_{\text {out } 2}$ coincide in a similar manner. In other words, the optimum attempts to shape the reference profile so that it resembles that of the VCO.

To obtain the total jitter, we write

$$
\begin{align*}
\sigma_{j} & =\frac{\sqrt{S_{t o t, m i n}}}{2 \pi} \frac{T_{R E F}}{N} \\
& =\sqrt[4]{\frac{\alpha S_{R E F}}{\pi^{3} N^{2}}} \frac{1}{f_{R E F}} \tag{2.6}
\end{align*}
$$

Figure 2.3: Optimum integrated jitter of PLL as a function of reference frequency and reference phase noise for a VCO phase noise of $-113 \mathrm{dBc} / \mathrm{Hz}$ at $1-\mathrm{MHz}$ offset.

We repeat the plot of Figure 2.2 for Eq. (2.6), assuming that the VCO is so designed as to provide a phase noise of $\alpha / f^{2}=-113 \mathrm{dBc} / \mathrm{Hz}$ at $1-\mathrm{MHz}$ offset, as is the case in our prototype (Figure 2.3). To obtain a jitter of $20 \mathrm{fs}_{r m s}$, we can still choose $200 \mathrm{MHz} \leqslant f_{R E F} \leqslant 500 \mathrm{MHz}$ and $-175 \mathrm{dBc} / \mathrm{Hz} \leqslant S_{R E F} \leqslant-170 \mathrm{dBc} / \mathrm{Hz}$. With $f_{R E F}=250 \mathrm{MHz}$ and $S_{R E F}=-170 \mathrm{dBc} / \mathrm{Hz}$, we must have a loop bandwidth of 10 MHz. Note that these results apply to subsampling PLLs as well.
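The optimum described by Eqs. (2.4) to (2.6) can be evaluated with a brief sketch. The divide ratio $N=76$ is an assumption made here (19 GHz divided by 250 MHz) and is not stated in this passage; the dBc/Hz values are again converted to linear ratios and used directly, and all names are illustrative.

```python
import math

def optimum_loop(alpha, n_div, s_ref, f_ref_hz):
    """Return (f0_opt, S_tot_min, sigma_j) from Eqs. (2.4), (2.5), and (2.6)."""
    f0_opt = math.sqrt(4.0 * alpha / (math.pi * n_div ** 2 * s_ref))
    s_tot_min = 4.0 * math.sqrt(alpha * math.pi * n_div ** 2 * s_ref)
    sigma_j = (alpha * s_ref / (math.pi ** 3 * n_div ** 2)) ** 0.25 / f_ref_hz
    return f0_opt, s_tot_min, sigma_j

# VCO: alpha / f^2 = -113 dBc/Hz at a 1-MHz offset
alpha = 10.0 ** (-113.0 / 10.0) * (1e6) ** 2
s_ref = 10.0 ** (-170.0 / 10.0)   # reference phase noise, -170 dBc/Hz
f_ref = 250e6
n_div = 76                        # assumed divide ratio: 19 GHz / 250 MHz

f0_opt, s_tot_min, sigma_j = optimum_loop(alpha, n_div, s_ref, f_ref)
print(f"f0_opt  = {f0_opt / 1e6:.1f} MHz")
print(f"sigma_j = {sigma_j * 1e15:.1f} fs")
```

With these assumed inputs the optimum bandwidth lands near the 10 MHz quoted above, and the jitter obtained from Eq. (2.6) agrees with $\sqrt{S_{tot,min}} / (2 \pi N f_{REF})$, as expected from the derivation.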

### 2.2.2 Reference Buffer Phase Noise

Stand-alone low-noise crystal oscillators typically provide a nearly-sinusoidal output. For example, Crystek's CRBSCS-01-250, used in our measurements, exhibits harmonics that are at least 20 dB below the fundamental. The sampling phase detector can directly sample this sinusoidal waveform [21, 22], but the noise contribution from the sampler and the following Gm stage will be large due to the low phase detector gain. This waveform must be sharpened by an on-chip inverter before reaching the PLL, thereby suffering from additional phase noise. The resulting phase noise adds to that of the crystal oscillator and must be included in the bandwidth optimization described above. The principal issues here are that, owing to the slow input transitions, (1) the inverter transistors inject noise over a long time window, and (2) both devices produce noise on each output edge.

For a sinusoidal input, the output slew rate $\left(S R_{\text {out }}\right)$ strongly depends on the input slew rate $\left(S R_{i n}\right)$. As an approximation, we can say that the two differ by a factor equal to the inverter's small-signal voltage gain, $A_{v}$. At sufficiently high frequencies, however, the output slew rate is also limited by the output current and the load capacitance. We thus expect the general behavior depicted in Figure 2.4. For the reference buffer (RBUF) design in our work, we note that $S R_{\text {out }} / S R_{\text {in }} \approx 9$ at 250 MHz.

Figure 2.4: $S R_{\text {out }} / S R_{\text {in }}$ ratio vs reference frequency.

The phase noise of an inverter due to the transistors' white noise is derived in [23] for an input with a period of $T_{i n}$ and expressed as

$$
\begin{equation*}
S_{\phi}(f)=\frac{\pi^{2}}{r_{\text {edge }}^{2} C_{L}^{2}} \frac{\Delta T}{T_{i n}}\left[S_{I, N}(f)+S_{I, P}(f)\right] \tag{2.7}
\end{equation*}
$$

where $r_{\text {edge }}$ is the output slew rate (also denoted by $S R_{\text {out }}$ in this paper), $C_{L}$ the load capacitance, $\Delta T$ the noise window shown in Figure 2.5(a), and $S_{I, N}(f)$ and $S_{I, P}(f)$ the noise current spectra of the NMOS and PMOS devices, respectively. ${ }^{1}$ This result is derived for relatively fast input edges, and assumes that only the NMOS device corrupts the falling edge and only the PMOS device, the rising edge.

Equation (2.7) can be extended to the case of a sinusoidal input as follows. We consider the input and output waveforms shown in Figure 2.5(b), noting that the NMOS transistor enters saturation at $t_{1}$. We also assume $t_{1}$ to be the starting point of the PMOS noise window because the noise injected onto $C_{L}$ before $t_{1}$ is discharged by the triode NMOS device. This point is verified by transient simulations in Cadence's Spectre. Another simplifying assumption is that the noise injected by the transistors after $t_{\text {mid }}$ is unimportant to the output phase noise [23]. We conclude that, for both transistors, the noise window, $\Delta T$, is from $t_{1}$ to $t_{m i d}$, which is approximately half of the rise time. The overall output phase noise then emerges as:

[^0]

Figure 2.5: (a) CMOS inverter input/output waveforms during sharp transitions, and (b) noise window of NMOS device in RBUF with a sinusoidal input.

$$
\begin{equation*}
S_{\phi}(f)=\frac{2 \pi^{2}}{r_{\text {edge }}^{2} C_{L}^{2}} \frac{\Delta T}{T_{i n}}\left[S_{I, N}(f)+S_{I, P}(f)\right] \tag{2.8}
\end{equation*}
$$

where we assume equal output rise and fall times and hence the same $\Delta T$ for the two edges. The factor of 2 accounts for the phase corruption on each edge due to both devices.

The dependence of the RBUF phase noise upon the input frequency is of interest but is made more complex by the behavior depicted in Figure 2.4. In this particular design, the buffer's phase noise decreases by about 1.4 dB if $f_{R E F}$ rises from 40 MHz to 80 MHz . This is because $S R_{\text {out }}$ in Figure 2.4 increases by only a factor of 1.4 and $\Delta T$ decreases by a factor of 1.4 in Eq. (2.8).

With $T_{\text {in }}$ halved, the right hand side of Eq. (2.8) drops by a factor of $(1.4)^{3} / 2 \equiv 1.4 \mathrm{~dB}$. Plotted in Figure 2.6 are the simulated phase noise profiles of our buffer for $f_{R E F}=40 \mathrm{MHz}, 80 \mathrm{MHz}, 160$ MHz and 250 MHz . The key point here is that the buffer's integrated jitter falls as $f_{\text {REF }}$ rises. In Figure 2.6, the corresponding rms jitter values are equal to $79.2 \mathrm{fs}, 33.7 \mathrm{fs}, 14.3 \mathrm{fs}$ and 8.5 fs .


Figure 2.6: Phase noise of reference buffer at different input frequencies.

Besides using higher reference frequencies, the noise-power trade-off of RBUF can also be exploited to reduce its jitter contribution. If the inverter's output capacitance is much greater than the input capacitance of the next stage, every doubling of the transistor widths lowers the phase noise by 3 dB . This can be seen from Eq. (2.8), where $S_{I, N}(f), S_{I, P}(f)$, and $C_{L}$ are doubled while other quantities remain unchanged. In this work, the NMOS and PMOS aspect ratios are 1120 $\mu \mathrm{m} / 400 \mathrm{~nm}$ and $1600 \mu \mathrm{~m} / 400 \mathrm{~nm}$, respectively, leading to a power consumption of 1.3 mW at 250 MHz and the phase noise profile shown in Figure 2.6. With such large dimensions, the buffer still contributes significant jitter, underscoring the future challenges that we will face as we seek smaller jitter values.
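Both scaling arguments above, the roughly 1.4-dB improvement from 40 MHz to 80 MHz and the 3 dB per doubling of the transistor widths, follow from the proportionality in Eq. (2.8). A minimal sketch, assuming every quantity not listed as an argument stays fixed (the function name and argument names are illustrative):

```python
import math

def rbuf_phase_noise_change_db(sr_out_ratio=1.0, delta_t_ratio=1.0, t_in_ratio=1.0,
                               current_noise_ratio=1.0, c_load_ratio=1.0):
    """Change in S_phi (dB) from Eq. (2.8).

    Uses S_phi proportional to S_I * dT / (r_edge^2 * C_L^2 * T_in);
    each argument is the new value divided by the old value.
    """
    ratio = (current_noise_ratio * delta_t_ratio) / (
        sr_out_ratio ** 2 * c_load_ratio ** 2 * t_in_ratio)
    return 10.0 * math.log10(ratio)

# f_REF from 40 MHz to 80 MHz: SR_out up by 1.4, noise window down by 1.4, T_in halved
freq_doubling = rbuf_phase_noise_change_db(sr_out_ratio=1.4,
                                           delta_t_ratio=1.0 / 1.4,
                                           t_in_ratio=0.5)

# Transistor widths doubled: noise current spectra and C_L both double
width_doubling = rbuf_phase_noise_change_db(current_noise_ratio=2.0, c_load_ratio=2.0)

print(f"frequency doubling: {freq_doubling:.2f} dB")
print(f"width doubling:     {width_doubling:.2f} dB")
```

The width-doubling case holds only under the stated condition that the inverter's own output capacitance dominates the load.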

The last issue related to RBUF is its supply sensitivity, $K_{D D}$. Typically fed from an on-chip low-dropout (LDO) regulator, RBUF converts the LDO noise to phase noise. For the inverter design described above, $K_{D D}=1.2 \mathrm{rad} / \mathrm{V}$. To maintain the supply-induced phase noise about 10 dB below the profile shown in Figure 2.6, the LDO noise spectrum must be less than $0.5 \mathrm{nV} / \sqrt{\mathrm{Hz}}$, an extremely stringent constraint. For example, an LDO op amp employing a differential pair with ideal exponential transistors would require a tail current of at least 3.4 mA to achieve this noise level. As explained in Section 3.2, our proposed phase detector relaxes this issue by orders of magnitude.

### 2.2.3 Phase/Frequency Detector Phase Noise

The phase noise of phase/frequency detectors (PFDs) has been analyzed in [23], with the conclusion that true single-phase clocking (TSPC) implementations are advantageous. Figure 2.7(a) depicts an example optimized according to [23] and Figure 2.7(b) plots the circuit's simulated phase noise at 250 MHz. Consuming $60 \mu \mathrm{~W}$, the PFD generates an rms jitter of 9.4 fs. For this value to fall below, for example, 5 fs, one would need to multiply the transistor widths by a factor of 3.5. ${ }^{2}$ The PFD therefore does not appear to be a serious bottleneck.
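The factor of 3.5 is consistent with a simple scaling argument, sketched here under the assumption that the PFD's phase noise power falls inversely with the transistor width $W$ (the same 3 dB per doubling noted for the reference buffer), so that the rms jitter scales as $1 / \sqrt{W}$:

$$
\frac{W_{\text {new }}}{W_{\text {old }}}=\left(\frac{\sigma_{\text {old }}}{\sigma_{\text {new }}}\right)^{2}=\left(\frac{9.4 \mathrm{~fs}}{5 \mathrm{~fs}}\right)^{2} \approx 3.5
$$

If power consumption also scales in proportion to width, this corresponds to roughly $3.5 \times 60 \mu \mathrm{W} \approx 210 \mu \mathrm{W}$.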

### 2.2.4 Charge Pump Noise

The thermal and flicker noise of the up and down current sources in a charge pump (CP) corrupt the current delivered to the loop filter, equivalently generating phase noise. It can be shown that the CP thermal noise referred to the PFD input leads to

$$
\begin{equation*}
S_{C P}(f)=8 \pi^{2} \frac{T_{C P}}{T_{R E F}} \frac{\overline{I_{n}^{2}}}{I_{P}^{2}} \tag{2.9}
\end{equation*}
$$

where $T_{C P}$ denotes the minimum PFD output pulse width, $\overline{I_{n}^{2}}$ the thermal noise spectrum of each current source, and $I_{P}$ the nominal CP current. Neglecting the CP flicker noise and considering typical values for the parameters in Eq. (2.9), we can readily appreciate the difficulties. Suppose we wish the CP contribution in a PLL bandwidth of 10 MHz to be less than 5 fs. From Section 2.2.1, we have

$$
\begin{equation*}
\frac{\sqrt{\pi f_{0} S_{C P}}}{2 \pi} T_{R E F}<5 \mathrm{fs} \tag{2.10}
\end{equation*}
$$

[^1]

It follows that $S_{C P}$ must not exceed $-177 \mathrm{dBc} / \mathrm{Hz}$ if $T_{R E F}=4 \mathrm{~ns}$. Returning to Eq. (2.9) and assuming (1) $\overline{I_{n}^{2}}=2 k T \gamma g_{m}=2 k T \gamma\left(2 I_{P}\right) /\left|V_{G S}-V_{T H}\right|$, (2) $\left|V_{G S}-V_{T H}\right|=200 \mathrm{mV}$, and (3) $T_{C P}=50 \mathrm{ps}$, we obtain $I_{P}=110 \mathrm{~mA}$.
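The bound on $S_{CP}$ follows from solving Eq. (2.10) for the noise level. A minimal sketch, assuming the inequality is taken at its limit (the function name is illustrative):

```python
import math

def max_cp_noise_dbc(sigma_max_s, f0_hz, t_ref_s):
    """Largest S_CP (dBc/Hz) that satisfies Eq. (2.10) for a given jitter budget."""
    s_cp = (2.0 * math.pi * sigma_max_s / t_ref_s) ** 2 / (math.pi * f0_hz)
    return 10.0 * math.log10(s_cp)

# 5-fs budget, 10-MHz loop bandwidth, 4-ns reference period
bound = max_cp_noise_dbc(5e-15, 10e6, 4e-9)
print(f"S_CP must stay below {bound:.1f} dBc/Hz")
```

The bound tightens by 6 dB for every halving of the jitter budget and relaxes by 3 dB for every halving of the loop bandwidth.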

The foregoing observations suggest that CPs prove ill-suited to low-jitter PLLs.


Figure 2.7: (a) Optimized TSPC PFD, and (b) phase noise of TSPC PFD.

## CHAPTER 3

## A 19-GHz Integer- $N$ PLL with 20.3-fs Jitter

In this chapter, we propose a low-jitter PLL architecture and analyze the phase noise of each block.

The proposed PLL architecture is shown in Figure 3.1. It consists of a reference buffer, a double-sampling PD (DSPD), a transconductor, a loop filter, a VCO followed by a $\div 2$ stage, and a multimodulus "self-retimed" divider that controls the PD through a nonoverlap generator. We wish to make negligible the jitter arising from the PD, the Gm stage, and the divider. If successful, such an endeavor allows us to apply the optimization described in Section 2.2.1.


Figure 3.1: Proposed PLL architecture.

### 3.1 Double-Sampling PD

The PD proposed here plays a central role in the PLL's performance. Before describing this topology, we consider the (single) master-slave sampling PD introduced in [24,25] and shown in Figure 3.2(a). The circuit adjusts the PLL feedback signal, $\phi_{1}$, such that the sampled value of $V_{R E F}$ $\left(V_{a}\right)$ becomes equal to the control voltage necessary for the VCO (Figure 3.2(b)). Next, $\phi_{2}$ and $C_{2}$ resample this level, creating minimal perturbation on $V_{\text {cont }}$. ${ }^{1}$

Figure 3.2: (a) Single-sampling PD, and (b) its time-domain waveforms.

Owing to the high slew rate of $V_{R E F}$, the master-slave sampling PD exhibits a high gain, thereby minimizing the noise contributed by the switched capacitors and any other components preceding the VCO. If the slew rate of $V_{R E F}$ in Figure 3.2(b) is denoted by $S R_{R E F}$, this PD's gain emerges as

$$
\begin{equation*}
K_{P D}=\frac{S R_{R E F}}{2 \pi \cdot f_{R E F}} \tag{3.1}
\end{equation*}
$$

[^2]

We now turn to the proposed double-sampling PD shown in Figure 3.3(a). Assuming for now that $V_{R E F}$ has a $50 \%$ duty cycle, we note that $C_{1}$ and $C_{2}$ sample $V_{a}$ and $V_{b}$, respectively, such that $V_{a}-V_{b}$ translates to the necessary control voltage for the VCO. The double-sampling action not only provides higher gain than single sampling but also offers new benefits. We elaborate on these points below.

Double sampling increases the PD gain by a factor of 2. This is seen by noting that, in Figure 3.3(b), a phase displacement of $\Delta t$ in $\phi_{1}$ shifts both A and B to the right or to the left, changing $V_{a}$ and $V_{b}$ in opposite directions. Thus,

$$
\begin{equation*}
K_{P D}=\frac{S R_{R E F}}{\pi \cdot f_{R E F}} . \tag{3.2}
\end{equation*}
$$



Figure 3.3: (a) Double-sampling PD, (b) its time-domain waveforms, and (c) its simulated phase noise.

As a result, the kT/C noise components associated with the four switches in Figure 3.3(a) are divided by another factor of 4 when referred to the PD input (Section 3.3), providing a $3-\mathrm{dB}$ reduction in PD's phase noise. For $C_{1}=C_{2}=100 \mathrm{fF}$ and $C_{3}=C_{4}=40 \mathrm{fF}$, simulations yield the phase noise profiles shown in Figure 3.3(c) at 250 MHz . The integrated jitter drops from 2.9 fs to 2.1 fs .

### 3.2 Reference Phase Noise Reduction

The most remarkable advantage of double sampling arises from its ability to reduce the jitter contributed by the crystal oscillator and the reference buffer. We present this property for three sources of phase noise, namely, thermal noise, supply noise, and flicker noise.

Illustrated in Figure 3.4(a), this PD attribute can be understood by assuming that the rising edge of $V_{R E F}$ is displaced by a random amount, $\Delta t_{1}$. Consequently, the sampled voltage inherited by $V_{3}$ in Figure 3.3(a) changes by

$$
\begin{equation*}
\Delta V_{3}=\Delta t_{1} \cdot S R_{R E F} \tag{3.3}
\end{equation*}
$$

if charge sharing between $C_{1}$ and $C_{3}$ is neglected.
Similarly, a displacement of $\Delta t_{2}$ in the falling edge translates to a change of

$$
\begin{equation*}
\Delta V_{4}=\Delta t_{2} \cdot S R_{R E F} \tag{3.4}
\end{equation*}
$$

in $V_{4}$. These random changes are combined by the differential-to-single-ended converter shown in Figure 3.3(a). If $V_{R E F}$ carries white phase noise and hence $\Delta t_{1}$ and $\Delta t_{2}$ are uncorrelated, the differential output noise of the double-sampling PD is given by

$$
\begin{equation*}
\overline{V_{n, o u t}^{2}}=S R_{R E F}^{2} \cdot\left(\sigma_{\Delta t_{1}}^{2}+\sigma_{\Delta t_{2}}^{2}\right) \tag{3.5}
\end{equation*}
$$

where $\sigma_{\Delta t_{1}}$ and $\sigma_{\Delta t_{2}}$ denote the rms jitter of $V_{R E F}$ on the rising and falling transitions, respectively. Divided by $K_{P D}^{2}$, this noise is referred to the PD input as

$$
\begin{equation*}
\phi_{n, i n, r m s}^{2}=\frac{\pi^{2}}{T_{R E F}^{2}}\left(\sigma_{\Delta t_{1}}^{2}+\sigma_{\Delta t_{2}}^{2}\right) . \tag{3.6}
\end{equation*}
$$

To appreciate the significance of this result, we convert $\phi_{n, i n, r m s}$ to jitter:

$$
\begin{equation*}
\overline{\sigma_{j}^{2}}=\frac{\sigma_{\Delta t_{1}}^{2}+\sigma_{\Delta t_{2}}^{2}}{4} \tag{3.7}
\end{equation*}
$$

That is, double sampling in essence averages the jitter of the PD input rising and falling edges, providing a 3-dB reduction. This property applies to the jitter of both the crystal oscillator and the reference buffer.

Figure 3.4: Double-sampling PD detecting (a) rising edge of $V_{R E F}$, or (b) falling edge of $V_{R E F}$.
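As a quick numerical check of Eq. (3.7), the following Python sketch draws uncorrelated Gaussian displacements for the rising and falling edges and compares the jitter of a single edge with that of their average. The 10-fs per-edge jitter, the sample count, and the function names are illustrative assumptions rather than values from this work.

```python
import math
import random


def input_referred_jitter(sigma_rise, sigma_fall):
    """RMS input-referred jitter from Eq. (3.7) for uncorrelated edge jitter."""
    return math.sqrt((sigma_rise ** 2 + sigma_fall ** 2) / 4.0)


def monte_carlo_jitter(sigma_rise, sigma_fall, n=200_000, seed=1):
    """Estimate the rms of (dt1 + dt2) / 2 for independent Gaussian dt1, dt2."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        dt1 = rng.gauss(0.0, sigma_rise)
        dt2 = rng.gauss(0.0, sigma_fall)
        acc += ((dt1 + dt2) / 2.0) ** 2
    return math.sqrt(acc / n)


SIGMA_EDGE = 10e-15  # assumed per-edge rms jitter in seconds (illustrative)

analytic = input_referred_jitter(SIGMA_EDGE, SIGMA_EDGE)
simulated = monte_carlo_jitter(SIGMA_EDGE, SIGMA_EDGE)
reduction_db = 20.0 * math.log10(SIGMA_EDGE / analytic)

print(f"analytic jitter : {analytic * 1e15:.2f} fs")
print(f"simulated jitter: {simulated * 1e15:.2f} fs")
print(f"reduction       : {reduction_db:.2f} dB")
```

For equal and uncorrelated edge jitter, the averaged value is lower than the single-edge value by a factor of $\sqrt{2}$, i.e., about 3 dB. Correlated components, such as the flicker noise discussed later in this section, are not captured by this sketch.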
Plotted in Figure 3.5 are the simulated phase noise profiles at the output of a noiseless PLL employing our RBUF design and with single-sampling and double-sampling PDs. The PLL bandwidth is about 10 MHz and the feedback divide ratio is unity. We note that the phase noise of RBUF is lowered by 3 dB around $1-\mathrm{MHz}$ offset in the latter case.

Figure 3.5: Simulated phase noise of RBUF in a noiseless PLL.

At low offsets, double sampling reduces the phase noise by even greater factors, e.g., by 7 dB at 100 kHz ; we explain this phenomenon below. The integrated jitter falls from $8.6 \mathrm{fs}_{r m s}$ to 5.8 $\mathrm{fs}_{r m s}$.

The proposed PD also lowers the effect of RBUF supply noise dramatically. Unlike noise sources within an inverter, the supply noise modulates the output duty cycle, and the double-sampling PD converts this effect to a common-mode perturbation. To illustrate this point, we begin with the RBUF output waveform, $V_{R E F}$, shown in Figure 3.6 and recognize that a static supply change of $+\Delta V_{D D}$ raises the slew rates while keeping the transition times fairly constant. As a result, the duty cycle increases. We observe that the values sampled by $\phi_{1}$ on the rising and falling edges shift up together, introducing a common-mode change of $\Delta V_{3}=\Delta V_{4}$ in $V_{3}$ and $V_{4}$. Most of this perturbation is rejected by the Gm stage. Verified experimentally (Section ??), this property greatly eases the LDO output noise requirement.

Figure 3.6: Double-sampling PD response to RBUF supply noise.
If the supply noise frequency is high enough to cause substantial change from one $V_{R E F}$ edge to the next, then the PD suppresses the result to a lesser extent. But such noise components can be filtered by means of moderately-sized capacitors attached to the LDO output.

The common-mode effect described above also explains the large RBUF phase noise suppression observed at low offsets in Figure 3.5. Recall from Section 2.2.2 that both transistors in the buffer inject noise on the output rising and falling edges. For example, the flicker noise current of $M_{1}$ in Figure 3.7 injects excess positive charge on the rising transition of $\mathrm{V}_{\mathrm{REF}}$, thus shifting it upward. Another packet of positive charge is also deposited on $C_{L}$ on the falling edge, shifting this transition upward as well. The falling transition is delayed by approximately the same amount because this noise changes negligibly in a time interval of $T_{1} \approx T_{R E F} / 2$. That is, the noise components injected by $M_{1}$ on two consecutive edges are strongly correlated. As a result, in a manner
similar to that in Figure 3.6, the flicker noise of $M_{1}$ and $M_{2}$ translates to a CM error in $V_{3}$ and $V_{4}$ and is thus suppressed.

### 3.3 PD Transfer Function and Phase Noise

The single-sampling circuit of Figure 3.2(a) can be approximately modeled by the following transfer function [24]:

$$
\begin{align*}
H_{P D}(j \omega)= & \frac{S R_{R E F}}{2 \pi \cdot f_{R E F}} \cdot \frac{1}{1+\frac{C_{2}}{C_{1} f_{R E F}} j \omega} \times \\
& \times e^{-j \omega T_{R E F} / 2} \frac{\sin \left(\omega T_{R E F} / 2\right)}{\omega T_{R E F} / 2} \tag{3.8}
\end{align*}
$$

For the double-sampling counterpart, the gain rises by a factor of 2 but the remaining terms are unchanged. With a gain of $S R_{R E F} /\left(\pi f_{R E F}\right)=39.5 \mathrm{~V} / \mathrm{rad}, f_{R E F}=250 \mathrm{MHz}, C_{1}=100 \mathrm{fF}$, and $C_{2}$ $=40 \mathrm{fF}$, the PD magnitude and phase responses are relatively flat across the bandwidth of 10 MHz chosen in this design. That is, the PD behavior negligibly affects the PLL dynamics.
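To make the flatness claim concrete, the sketch below evaluates Eq. (3.8) with the double-sampling gain and the component values quoted above, and reports the magnitude droop and phase lag at the 10-MHz loop bandwidth. It also backs out the reference slew rate implied by Eq. (3.2). The function and variable names are illustrative assumptions.

```python
import cmath
import math


def pd_transfer(f_offset, k_pd, f_ref, c1, c2):
    """Evaluate Eq. (3.8) at an offset frequency in Hz; k_pd is the DC gain in V/rad."""
    w = 2.0 * math.pi * f_offset
    x = w / (2.0 * f_ref)  # omega * T_REF / 2
    pole = 1.0 / (1.0 + 1j * w * c2 / (c1 * f_ref))
    sinc = math.sin(x) / x if x != 0.0 else 1.0
    return k_pd * pole * cmath.exp(-1j * x) * sinc


K_PD = 39.5    # V/rad, double-sampling gain quoted in the text
F_REF = 250e6  # Hz
C1 = 100e-15   # F, master capacitor
C2 = 40e-15    # F, slave capacitor
F_BW = 10e6    # Hz, loop bandwidth chosen in this design

h_bw = pd_transfer(F_BW, K_PD, F_REF, C1, C2)
droop_db = 20.0 * math.log10(abs(h_bw) / K_PD)
phase_deg = math.degrees(cmath.phase(h_bw))
slew_rate = K_PD * math.pi * F_REF  # from K_PD = SR_REF / (pi * f_REF)

print(f"magnitude droop at 10 MHz: {droop_db:.3f} dB")
print(f"phase at 10 MHz          : {phase_deg:.1f} deg")
print(f"implied slew rate        : {slew_rate / 1e9:.1f} V/ns")
```

With these inputs the magnitude droop at 10 MHz stays below 0.1 dB and the phase lag is roughly 13 degrees, and the implied slew rate is on the order of 31 V/ns.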

The PD phase noise, $\phi_{n, P D}$, arises primarily from the samplers' $\mathrm{kT} / \mathrm{C}$ noise. If $C_{1}=C_{2}$ and $C_{3}=C_{4}$ in Figure 3.3(a), the noise voltage deposited on $C_{1}$ is equal to $\sqrt{k T / C_{1}}$, corresponding to a charge amount of $\sqrt{k T C_{1}}$. This charge is next shared with $C_{3}$, yielding a voltage of $\sqrt{k T C_{1}} /\left(C_{1}+C_{2}\right)$. The square of this value is added to the $\mathrm{kT} / \mathrm{C}$ noise associated with the slave sampler, and the final result is multiplied by 2 for the differential output:

$$
\begin{equation*}
V_{n, o u t, r m s}^{2}=2\left[\frac{k T C_{1}}{\left(C_{1}+C_{2}\right)^{2}}+\frac{k T}{C_{2}}\right] . \tag{3.9}
\end{equation*}
$$



Figure 3.7: Effect of RBUF flicker noise.

We must now divide this quantity by the square of the PD gain to obtain the equivalent phase noise. This gain, $S R_{R E F} /\left(\pi f_{R E F}\right)$, can be approximated as follows. When the voltage on $C_{1}$ is around $V_{D D} / 2$, the current available for charging it is given by $\left(V_{D D}-V_{D D} / 2\right) /\left(R_{B U F}+R_{s w}\right)$, where $R_{B U F}$ and $R_{s w}$ denote the buffer output resistance and the switch resistance, respectively. Thus,

$$
\begin{equation*}
S R_{R E F} \approx \frac{V_{D D}}{2\left(R_{B U F}+R_{s w}\right) C_{1}} . \tag{3.10}
\end{equation*}
$$

From Eqs. (3.9), (3.10) and (3.2), we compute the PD's jitter as

$$
\begin{align*}
\phi_{i n, P D, r m s}^{2} & =\frac{V_{n, o u t, r m s}^{2} \cdot T_{R E F}^{2}}{(2 \pi)^{2} K_{P D}^{2}} \\
& =\frac{2 k T}{V_{D D}^{2}}\left(R_{B U F}+R_{s w}\right)^{2} C_{1}^{2}\left[\frac{C_{1}}{\left(C_{1}+C_{2}\right)^{2}}+\frac{1}{C_{2}}\right] . \tag{3.11}
\end{align*}
$$

Note, however, that this jitter "power" resides in a frequency range of $-f_{R E F} / 2$ to $+f_{R E F} / 2$.


Figure 3.8: $V_{R E F}$ duty cycle error.

We must therefore divide $\phi_{i n, P D, r m s}^{2}$ by $f_{R E F}$, subject the spectrum to the PLL transfer function, and integrate the result.
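The procedure above can be sketched numerically as follows. The capacitor values, the 39.5-V/rad gain, the 250-MHz reference, and the 10-MHz bandwidth are taken from the text, with the 40-fF value used for the slave capacitor as in Eq. (3.9). The 300-K temperature and the brick-wall approximation of the PLL transfer function are assumptions made only for this illustration, as are the function names.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def pd_output_noise_v2(c1, c2, temp_k=300.0):
    """Differential kT/C noise power at the PD output, Eq. (3.9)."""
    kt = K_BOLTZMANN * temp_k
    return 2.0 * (kt * c1 / (c1 + c2) ** 2 + kt / c2)


def pd_total_jitter(v2_out, slew_rate):
    """Input-referred rms jitter: sqrt(V^2) / K_PD * T_REF / (2*pi) = sqrt(V^2) / (2*SR)."""
    return math.sqrt(v2_out) / (2.0 * slew_rate)


def in_band_jitter(total_jitter, f_ref, bandwidth):
    """Spread the jitter power uniformly over -f_ref/2..+f_ref/2 and keep
    only the part inside an assumed brick-wall loop response of +/- bandwidth."""
    return total_jitter * math.sqrt(2.0 * bandwidth / f_ref)


F_REF = 250e6
SLEW_RATE = 39.5 * math.pi * F_REF  # V/s, implied by K_PD = SR_REF / (pi * f_REF)

v2_out = pd_output_noise_v2(c1=100e-15, c2=40e-15)  # assumed T = 300 K
sigma_total = pd_total_jitter(v2_out, SLEW_RATE)
sigma_in_band = in_band_jitter(sigma_total, F_REF, bandwidth=10e6)

print(f"PD output noise   : {math.sqrt(v2_out) * 1e6:.0f} uV rms")
print(f"total PD jitter   : {sigma_total * 1e15:.2f} fs")
print(f"in-band PD jitter : {sigma_in_band * 1e15:.2f} fs")
```

Under these assumptions the in-band result is about 2.3 fs. A real loop response is not a brick wall, so this figure should be read only as an order-of-magnitude estimate.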

### 3.4 Effect of Duty Cycle Error

The PD operation described in Section 3.1 tacitly assumes a duty cycle of $50 \%$ for the reference. Crystal oscillators, on the other hand, can suffer from some duty cycle error (DCE). We wish to determine how DCE affects the performance.

Consider the reference buffer waveforms shown in Figure 3.8(a), where the solid plot represents a duty cycle of $50 \%$ and the dashed plot a greater value. We observe two phenomena. (1) Samples A and B assume a higher common-mode level as the duty cycle increases. That is, for a sufficiently large DCE, the CM level approaches $V_{D D}$ or zero, an issue resolved by designing the Gm stage in Figure 3.1 so as to accommodate rail-to-rail inputs. (2) Either $A^{\prime}$ or $B^{\prime}$ in Figure 3.8 can land near $V_{D D}$, carrying little phase information and converting the circuit to a single-sampling PD. To avoid this difficulty, the input duty cycle can be adjusted such that the CM level of $V_{3}$ and $V_{4}$ in Figure 3.3(a) remains near $V_{D D} / 2$.

Figure 3.9: $V_{R E F}$ duty cycle correction circuit.

### 3.5 Duty Cycle Detection and Correction

The task of duty cycle correction (DCC) has been widely studied [10, 26], achieving errors less than $0.004 \%$ [10]. An important advantage of the proposed double-sampling PD is the simplicity that it affords for duty cycle detection. As explained above, the optimum duty cycle ensures that the CM level of $V_{3}$ and $V_{4}$ in Figure 3.8, i.e., $\left(V_{3}+V_{4}\right) / 2$, is around $V_{D D} / 2$. Thus, $\left(V_{3}+V_{4}\right) / 2-V_{D D} / 2$ serves as the duty cycle error.

Figure 3.9 shows the duty cycle correction loop. On-chip unity-gain buffers sense $V_{3}$ and $V_{4}$ and resistors $R_{1}$ and $R_{2}$ provide their CM level at node N . For test and characterization flexibility, an off-chip op-amp compares the result with $V_{D D} / 2$ and adjusts the bias input of the reference
buffer. An external input port allows a perturbation to be applied to the loop so that its response can be studied (Section 3.9).

### 3.6 VCO and $\div 2$ Stage

Shown in Figure 3.10(a), the VCO employs a complementary LC topology with inductive tail resonance at the second harmonic. ${ }^{2}$ Due to the lack of ultra-thick-metal layers, the $93-\mathrm{pH}$ inductor is realized as two metal-8 and metal-9 octagons in parallel. The quality factor of the tank is around 14.5, yielding the simulated phase noise profile shown in Figure 3.10(b) for a power consumption of 7.2 mW .

Figure 3.10: (a) VCO implementation, and (b) its simulated phase noise.

The $\div 2$ stage following the VCO in Figure 3.1 is realized using complementary CMOS ( $\mathrm{C}^{2} \mathrm{MOS}$ ) logic and shown in Figure 3.11(a). Drawing 1.4 mW , the circuit exhibits the simulated output phase noise plotted in Figure 3.11(b), which translates to a jitter of about 2.6 fs.

(a)

(b)

Figure 3.11: (a) $\mathrm{C}^{2} \mathrm{MOS} \div 2$ circuit, and (b) its simulated phase noise at an input frequency of 20 GHz.

### 3.7 Multimodulus Divider

Multimodulus dividers generally produce a great deal of phase noise because of the large number of asynchronous stages that they incorporate. It is possible to insert at the divider output a retiming flipflop (FF) driven by the VCO so as to remove the divider's phase noise [27]. This
method, however, is prone to failure with process, supply voltage, and temperature (PVT) variations.

To elaborate on this point, we begin with the "modular" divider shown in Figure 3.12(a) [28], where $L_{j}$ denotes a latch. For ease of illustration, we draw a 4-stage example as shown in Figure 3.12(b), follow it with a $\div 2$ circuit (necessary for our PLL), and retime its output by means of $\mathrm{FF}_{0}$. We denote the delay of dual-modulus stage $j$ by $\Delta t_{j}$. Constructing the circuit's waveforms as in Figure 3.12(c), we observe that $\mathrm{FF}_{0}$ avoids metastability if the total delay from $C K_{\text {in }}$ to $C K_{5}$ does not exceed one period of $C K_{i n}$. More specifically, this path introduces the CK-to-Q delay of four $\div 2 / 3$ cells and one $\div 2$ stage. To this total, we must add the setup time of $\mathrm{FF}_{0}$, arriving at the following bound:

$$
\begin{equation*}
\Delta t_{1}+\Delta t_{2}+\cdots+\Delta t_{5}+t_{\text {setup }, F F_{0}}<100 \mathrm{ps} . \tag{3.12}
\end{equation*}
$$

Otherwise, the falling edges of $C K_{i n}$ and $C K_{5}$ can coincide and make $\mathrm{FF}_{0}$ metastable, a condition that prohibits the system from locking.

Unfortunately, the condition expressed by Eq. (3.12) is difficult to meet even in the typical-typical corner of the process. Simulations of the extracted layout suggest a total delay of about 110 ps in this corner.


Figure 3.12: (a) Modular divider, (b) multimodulus divider with one flipflop as retimer, (c) timing diagram, and (d) multimodulus divider with two flipflops as retimers.

To alleviate this issue, we recognize that $C K_{1}$ in Figure 3.12(b) is also available as a retiming command. We then interpose between the $\div 2$ stage and $\mathrm{FF}_{0}$ another flipflop and drive it by $C K_{1}$ [Figure 3.12(d)]. Here, $\mathrm{FF}_{1}$ avoids metastability if the total delay from $C K_{1}$ to $C K_{5}$ is less than one period of $C K_{1}$ :

$$
\begin{equation*}
\Delta t_{2}+\cdots+\Delta t_{5}+t_{\text {setup }, F F_{1}}<200 \mathrm{ps} \tag{3.13}
\end{equation*}
$$

For $\mathrm{FF}_{0}$, on the other hand, the delay from $C K_{\text {in }}$ to $C K_{1}$ to $C K_{6}$ plus the setup time of $\mathrm{FF}_{0}$ must remain less than 100 ps :

$$
\begin{equation*}
\Delta t_{1}+\Delta t_{F F_{1}}+t_{\text {setup }, F F_{0}}<100 \mathrm{ps} . \tag{3.14}
\end{equation*}
$$

Of the two conditions prescribed by Eq. (3.13) and Eq. (3.14), the former proves more stringent as the extracted layout in the slow-slow high-temperature corner yields a value of 120 ps for its left-hand side. To improve the robustness of the circuit, we add one more flipflop as shown in Figure 3.13(a) obtaining

$$
\begin{align*}
& \Delta t_{3}+\Delta t_{4}+\Delta t_{5}+t_{\text {setup }, F F_{2}}<400 \mathrm{ps} \\
& \Delta t_{2}+\Delta t_{F F_{2}}+t_{\text {setup }, F F_{1}}<200 \mathrm{ps} \\
& \Delta t_{1}+\Delta t_{F F_{1}}+t_{\text {setup }, F F_{0}}<100 \mathrm{ps} . \tag{3.15}
\end{align*}
$$

The proposed divider in Figure 3.13(a) merits two remarks. First, the output, $\phi_{1}$, carries only the phase noise of $C K_{\text {in }}$ and $\mathrm{FF}_{0}$. Second, this method guarantees that the excess delay around the critical loop is no more than the delay of one divider cell and one flipflop.
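The timing conditions of Eqs. (3.12) through (3.15) follow one pattern, which the sketch below encodes for an arbitrary number of retiming flipflops. All individual delays are hypothetical placeholders (the stage delays are merely chosen to add up to the 110-ps total quoted for the typical-typical corner), and the clock period is assumed to double after each divider cell, as in the 100-ps, 200-ps, and 400-ps bounds above.

```python
def retimer_checks(stage_delays_ps, ff_clk_to_q_ps, ff_setup_ps,
                   n_retimers, t_in_ps=100.0):
    """Return (flipflop, left-hand side, bound) for each timing condition.

    stage_delays_ps[k] is the delay of stage k+1 (Delta t_1 ... Delta t_5).
    FF_k is clocked by CK_k (CK_0 = CK_in), whose period is assumed to be
    t_in_ps * 2**k.
    """
    checks = []
    last = n_retimers - 1
    # The last retimer absorbs the delay of the remaining asynchronous chain.
    lhs = sum(stage_delays_ps[last:]) + ff_setup_ps
    checks.append((f"FF{last}", lhs, t_in_ps * 2 ** last))
    # Each earlier retimer sees one divider cell plus the following flipflop.
    for k in range(last - 1, -1, -1):
        lhs = stage_delays_ps[k] + ff_clk_to_q_ps + ff_setup_ps
        checks.append((f"FF{k}", lhs, t_in_ps * 2 ** k))
    return checks


def all_met(checks):
    """True if every left-hand side is strictly below its bound."""
    return all(lhs < bound for _, lhs, bound in checks)


# Hypothetical delays in picoseconds, for illustration only.
STAGE_DELAYS_PS = [20.0, 22.0, 22.0, 24.0, 22.0]  # sums to 110 ps
FF_CLK_TO_Q_PS = 25.0
FF_SETUP_PS = 10.0

for n in (1, 2, 3):
    checks = retimer_checks(STAGE_DELAYS_PS, FF_CLK_TO_Q_PS, FF_SETUP_PS, n)
    status = "met" if all_met(checks) else "violated"
    print(f"{n} retimer(s): {status}")
    for name, lhs, bound in checks:
        print(f"  {name}: {lhs:.0f} ps < {bound:.0f} ps ? {lhs < bound}")
```

With these placeholder numbers, a single retimer violates its bound while two or three retimers satisfy theirs. The additional flipflop mainly enlarges the margin of the most stringent condition.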

Plotted in Figure 3.13(b) are the divider output phase noise profiles before and after retiming flipflops are added, suggesting a $16-\mathrm{dB}$ reduction. The integrated jitter falls from 19 fs to $3 \mathrm{fs} .^{3}$ Drawing 1.8 mW at 10 GHz (mostly in the input clock buffer), the circuit provides a divide ratio from 32 to 62 .

The multimodulus divider blocks are realized by TSPC and CMOS circuits. Specifically, the first two $\div 2 / 3$ stages, $\mathrm{FF}_{0}$, and $\mathrm{FF}_{1}$ employ the former type, and the slower blocks the latter.


Figure 3.13: (a) Proposed multimodulus divider with 3 flipflops as retimers, and (b) its simulated phase noise spectrum.

### 3.8 Nonoverlapping Clock Generator

In order to minimize the ripple on the control voltage, the PD of Figure 3.3(a) must avoid transparency between the master and slave samplers, requiring nonoverlapping clock phases. The challenge here is that conventional topologies, such as those based on cross-coupled gates, generate significant jitter. We must therefore avoid passing $\phi_{1}$ through additional stages and yet generate $\phi_{2}$. This is accomplished as shown in Figure 3.14(a), where latches $L_{1}-L_{3}$ and delay stage $\Delta T$ produce a signal $\phi_{0}$ at 500 MHz , with a delay of $\Delta T$ with respect to $\phi_{1}$. From the $\phi_{2}$ and $\overline{\phi_{2}}$ waveforms shown in Figure 3.14(b), we observe a nonoverlap time of $\Delta T$, about 50 ps in this work. We should note that $\phi_{0}$ and $\phi_{2}$ inherit the phase noise of the delay stage, but the master samplers in Figure 3.3(a) rely on only $\phi_{1}$ and $\overline{\phi_{1}}$. Since $\phi_{2}$ and $\overline{\phi_{2}}$ only transfer charge to the slave capacitors, their phase noise is not critical.

Figure 3.14: (a) Nonoverlapping clock generator, and (b) nonoverlapping clock waveforms.

### 3.9 Experimental Results

The proposed PLL has been fabricated in $28-\mathrm{nm}$ CMOS technology. Figure 3.17 shows a photograph of the die, where the active area measures approximately $320 \mu \mathrm{~m} \times 310 \mu \mathrm{~m}$. The prototype consumes 12 mW : 7.2 mW in the VCO, 1.4 mW in the $\div 2$ stage, 1.8 mW in the multimodulus divider, and 1.3 mW in the reference buffer. ${ }^{4}$ The power supply voltage of the reference buffer is 1.2 V and the rest of the PLL is supplied at 1 V . The loop is locked with a divide ratio of 80 and an output frequency of 20 GHz . The VCO has a gain of $120 \mathrm{MHz} / \mathrm{V}$ and a total tuning range of 450 MHz , allowing synthesis of only 20 GHz with a $250-\mathrm{MHz}$ reference. This range somewhat relaxes the oscillator power-jitter trade-off and should be borne in mind in the comparison with the prior art (see below). The PD can be configured to operate as a single-sampling or a double-sampling circuit.


Figure 3.15: Measured PLL output spectrum.

The $250-\mathrm{MHz}$ reference frequency is provided by Crystek's CRBSCS-01-250 crystal oscillator. Its phase noise is plotted in Figure 3.18, exhibiting a value of $-171.5 \mathrm{dBc} / \mathrm{Hz}$ at $1-\mathrm{MHz}$ offset.

For ease of measurement, the output of the $\div 2$ circuit, $^{2}$ Div $_{\mathrm{a}}$, in Figure 3.1 is used for the characterization. Figure 3.15 shows the measured spectrum, indicating a reference spur level of -72 dBc , which translates to -66 dBc at the VCO output.

Figure 3.16 plots the measured phase noise at the output of the $\div 2$ circuit for single sampling and double sampling. The profile exhibits a plateau of about $-133 \mathrm{dBc} / \mathrm{Hz}$ up to $10-\mathrm{MHz}$ offset and falls to $-156 \mathrm{dBc} / \mathrm{Hz}$ at $100-\mathrm{MHz}$ offset; the phase noise at the VCO output is 6 dB higher. We observe that double sampling lowers the profile by 2.5 dB from 10 kHz to 1 MHz and 1.5 dB from 1 MHz to 3 MHz . Since the VCO contribution remains the same, ${ }^{5}$ the overall phase noise declines by less than 3 dB . The free-running VCO flicker noise corner is around 800 kHz , contributing negligible jitter after the loop is closed.

Figure 3.16: Measured PLL phase noise.

The jitter integrated from 10 kHz to 100 MHz is equal to 20.25 fs . According to simulations, the crystal oscillator contributes 10 fs , the reference buffer 6.2 fs , and the VCO 15 fs .
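If the three simulated contributions are assumed to be uncorrelated, they add in power. The short sketch below forms their root-sum-square and the residual with respect to the measured total. Combining a measured total with simulated components is only a rough consistency check, and the variable names are illustrative.

```python
import math

# Simulated contributions quoted in the text, in femtoseconds (rms).
contributions_fs = {
    "crystal oscillator": 10.0,
    "reference buffer": 6.2,
    "VCO": 15.0,
}
MEASURED_TOTAL_FS = 20.25  # measured, 10 kHz to 100 MHz

# Uncorrelated sources add in power (assumption for this sketch).
rss_fs = math.sqrt(sum(v ** 2 for v in contributions_fs.values()))
residual_fs = math.sqrt(MEASURED_TOTAL_FS ** 2 - rss_fs ** 2)

print(f"root-sum-square of listed sources: {rss_fs:.2f} fs")
print(f"residual to the measured total   : {residual_fs:.2f} fs")
```

The listed sources account for about 19 fs, which leaves a residual of roughly 7 fs for the remaining blocks and for any discrepancy between simulation and measurement.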

As explained in Section 3.1, the reference buffer supply rejection becomes critical unless the LDO feeding it provides an extremely low output noise voltage. With double sampling, on the other hand, this issue is greatly relaxed. This point is verified as follows. The supply voltage of the buffer is modulated by a sinusoid having a peak amplitude of 140 mV and a variable frequency. The corresponding spurs at the PLL output are then studied for single sampling and double sampling. Figure 3.19 plots the measured spur levels as a function of the sinusoid's frequency, revealing an improvement of at least 20 dB .


Figure 3.17: Die photograph.


Figure 3.19: Measured spur level due to the RBUF supply disturbance.

The duty cycle and its correction circuit have been characterized by several tests. Since direct, accurate measurement of the duty cycle is difficult, we first disable the loop in Figure 3.9 and measure the PLL output reference spur levels and phase noise for different values of the PD output CM level, $\mathrm{V}_{\mathrm{PD}, \mathrm{CM}}$. This is accomplished by changing the input bias voltage of RBUF. We also find the relationship between $\mathrm{V}_{\mathrm{PD}, \mathrm{CM}}$ and the duty cycle error (DCE) by simulations. We can then plot the spur levels and the integrated jitter as a function of DCE. The results are depicted in Figure 3.20. We should remark that the minima occur for $V_{P D, C M} \approx V_{D D} / 2$. Next, we enable the correction loop and apply an external step as illustrated in Figure 3.9. Shown in Figure 3.21, the transient response of $\mathrm{V}_{\mathrm{PD}, \mathrm{CM}}$ reveals that this voltage jumps by 250 mV but returns to 530 mV $\left(\approx V_{D D} / 2\right)$.

Figure 3.18: Measured phase noise of the $250-\mathrm{MHz}$ crystal oscillator.

Figure 3.20: (a) Measured spur levels, and (b) PLL output jitter against $V_{R E F}$ DCE.

Figure 3.21: Measured transient response of $\mathrm{V}_{\mathrm{PD}, \mathrm{CM}}$.

In order to study the robustness of the loop, we apply to the VCO supply voltage an external square wave having a peak-to-peak amplitude of 300 mV . The Agilent E5052A signal analyzer captures the frequency transient. ${ }^{6}$ Plotted in Figure 3.22 is the result, indicating that the loop relocks.


Figure 3.22: Measured PLL frequency transient response.

Table 3.1 presents the measured performance of our prototype and compares it to that of other PLLs (Figure 3.23) that have achieved sub-60-fs jitter values. The jitter is reduced by more than a factor of 2 and the FoM is improved by 4.1 dB .

${ }^{6}$ Due to this equipment's limitations, we precede it with an external $\div 2$ stage.

Table 3.1: Performance summary and comparison to prior art

|  | Zhang <br> ISSCC 2019 | Gong <br> RFIC 2020 | Mercandelli <br> ISSCC 2020 | Turker <br> ISSCC 2018 | This <br> Work |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Architecture | Sub-sampling <br> PLL | Charge <br> Sampling <br> PLL | Single- <br> Sampling <br> PLL | Charge-pump <br> based PLL | Double <br> Sampling <br> PLL |
| Ref. Freq.(MHz) | 200 | 100 | 500 | 500 | 250 |
| Freq. Range (GHz) | $12 \sim 16$ | $9.8 \sim 12.2$ | $11.9 \sim 14.1$ | $7.4 \sim 14$ | 19 |
| RMS Jitter (fs) <br> Integ. range (MHz) | 56.4 <br> $(0.001 \sim 100)$ | 50.5 <br> $(0.001 \sim 100)$ | $51.7^{2}$ <br> $(0.001 \sim 100)$ | 53.6 <br> $(0.01 \sim 10)$ | 20.3 <br> $(0.01 \sim 100)$ |
| Ref. Spur (dBc) | -64.6 | -65.7 | -73.5 | -75.5 | -66 |
| Power (mW) | 7.2 | 5 | 18 | 45 | 12 |
| Area (mm $\left.{ }^{2}\right)$ | 0.234 | 0.13 | 0.16 | 0.45 | 0.06 |
| Tech. (nm) | 40 | 40 | 28 | 16 | 28 |
| FoM ${ }^{1}(\mathrm{~dB})$ | -256.4 | -258.9 | -253.2 | -248.9 | -263 |
| Crystal Osc. <br> Power (mW) | $\mathrm{N} / \mathrm{A}$ | $150^{3}$ | $175^{4}$ | $\mathrm{~N} / \mathrm{A}$ | 170 |

1: FoM $=10 \log _{10}\left[\left(\frac{\text { Jitter }}{1 \mathrm{~s}}\right)^{2}\left(\frac{\text { Power }}{1 \mathrm{~mW}}\right)\right] \quad$ 2: Integer-N Jitter
3: From datasheet of Taitien VLCU-type series
4: From private communication with author and datasheet of Crystek CCSO-914X-500
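The FoM definition in note 1 can be evaluated directly from the jitter and power entries of Table 3.1, as in the sketch below. The dictionary layout and function name are illustrative; the numbers are those listed in the table.

```python
import math


def fom_db(jitter_s, power_mw):
    """FoM = 10*log10[(jitter / 1 s)^2 * (power / 1 mW)], per note 1 of Table 3.1."""
    return 10.0 * math.log10(jitter_s ** 2 * power_mw)


# (rms jitter in fs, power in mW) as listed in Table 3.1
designs = {
    "Zhang, ISSCC 2019": (56.4, 7.2),
    "Gong, RFIC 2020": (50.5, 5.0),
    "Mercandelli, ISSCC 2020": (51.7, 18.0),
    "Turker, ISSCC 2018": (53.6, 45.0),
    "This work": (20.3, 12.0),
}

foms = {name: fom_db(j_fs * 1e-15, p_mw) for name, (j_fs, p_mw) in designs.items()}
best_prior = min(v for name, v in foms.items() if name != "This work")
improvement_db = best_prior - foms["This work"]

for name, value in foms.items():
    print(f"{name:<26s} {value:7.1f} dB")
print(f"improvement over best prior FoM: {improvement_db:.1f} dB")
```

The computed values agree with the FoM row of the table to within rounding, and the difference between this work and the best prior entry corresponds to the 4.1-dB improvement stated above.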


Figure 3.23: Comparison to the state-of-the-art low-jitter PLLs.

As explained in Chapter 2, the reference phase noise and frequency play a significant role in the performance of PLLs. For this reason, the crystal oscillator power consumption also becomes
problematic. According to our measurements, Crystek's CRBSCS-01-250 draws about 170 mW . Shown in Table 3.1 are the crystal oscillator power consumptions.

## CHAPTER 4

## A 56-GHz 23-mW Fractional- $N$ PLL with 110-fs Jitter

This chapter proposes a $56-\mathrm{GHz}$ fractional- $N$ PLL that achieves an integrated jitter of 110 fs . This is accomplished through the use of a phase-domain FIR filter that filters out the $\Delta \Sigma$ noise. The power consumption of this PLL is 23 mW .

### 4.1 Design Challenges

Fractional- $N$ synthesis generating a clock at the frequency of 56 GHz (Figure 4.1) faces two main challenges. The first challenge is that the speed of the multimodulus divider (MMD) is limited to below 20 GHz , which motivates us to insert a chain of divide-by-2 stages after the voltage-controlled oscillator (VCO). The second challenge is that the optimization of the PLL bandwidth faces a trade-off between the VCO phase noise and the $\Delta \Sigma$ quantization noise. A wideband PLL suppresses the VCO phase noise, but the peaking of the $\Delta \Sigma$ noise works against this premise. Therefore, the bandwidth-noise trade-off motivates us to reduce the $\Delta \Sigma$ noise by additional techniques.

Figure 4.1: A general 56-GHz fractional- $N$ PLL.

Two general approaches to $\Delta \Sigma$ noise have been reported in the prior art. The first approach incorporates a digital-to-time converter (DTC) to cancel the quantization error of the DSM for fractional- $N$ synthesis $[8,10,16,26,29-32]$. This method requires a calibration loop to adjust the gain of the DTC and assumes that the DTC is linear enough to negligibly fold down the high-pass-shaped $\Delta \Sigma$ noise. Various noise cancellation techniques have been proposed to achieve better performance, but they require stringent matching or complex calibration.

The second approach is to filter the $\Delta \Sigma$ noise before it reaches the VCO. In [33] a simple FIR filter is inserted between the phase detector (PD) and the frequency divider of the PLL to generate delayed copies of the divider output. The filtering operation occurs when a resistor-based network combines the copies into a single feedback signal with much less jitter. The following MSSF samples the output of the resistor network and controls the frequency of the VCO. The FIR-filtering method has two advantages. (1) The mismatch of the resistors (or other types of combination elements) only alters the frequency response of the FIR filter but does not introduce folding of the high-frequency $\Delta \Sigma$ noise, which eases the matching requirement. (2) The flip-flops in the FIR filter are clocked by the VCO output, generating delayed copies of the divider output without affecting the loop stability of the PLL. This FIR-filtering method affords a loop BW of around $f_{R E F} / 4$.

In this chapter, we introduce a $56-\mathrm{GHz}$ fractional- $N$ PLL that incorporates the above-mentioned FIR-filter-based method to suppress the $\Delta \Sigma$ noise. The FIR filter consists of 22 taps and a switched-current combination circuit with better linearity and less noise folding than those of the resistor-based network. The FIR filter provides $12-\mathrm{dB} \Delta \Sigma$ noise rejection at $10-\mathrm{MHz}$ offset and offers a BW of 3 MHz .

### 4.2 Linearity analysis of the Phase-Domain FIR filter

In this section, we analyze the linearity of the FIR filter in the phase-domain.

### 4.2.1 Resistor-based FIR filter

In [33], XOR gates convert the phase error to a voltage signal and a resistor network performs the FIR filtering on the PD output. The coefficients of the FIR filter are determined by the resistance values. According to our analysis, despite the use of linear resistors, capacitors, and switches, this implementation suffers from a nonlinearity in the phase domain. We use a simple example to illustrate this point.

(a)

(b)

Figure 4.2: (a) Resistor-based 2-tap FIR summer, (b) input and response of the resistor-based FIR summer.

Figure 4.2(a) shows a simplified 2-tap resistor-based FIR filter. $V_{\mathrm{F}}$ represents the MMD output and $V_{\mathrm{F} \Delta}$ the delayed copy. As shown in Figure 4.2(b), the phase jump $\Delta t_{a}$ is delayed by $\mathrm{T}_{\text {REF }}$ and combined with the next phase jump $\Delta t_{b}$. Initially, $C_{1}$ is charged to $V_{D D}$. From $t_{1}$ to $t_{2}$, one XOR PD discharges $C_{1}$ through $R_{2}$ and the other charges $C_{1}$ through $R_{1}$. The output resistance of the XORs is neglected to simplify the analysis. After $t_{2}$, both XOR PDs discharge $C_{1}$ and the output voltage, $V_{\text {out }}$, is sampled at $t_{s}$. With the help of superposition, the sampled output voltage is given by

$$
\begin{equation*}
V_{s}\left(t_{s}\right)=V_{D D}\left(\frac{R_{2}}{R_{1}+R_{2}} \cdot e^{-\frac{t_{s}-\Delta t_{a}}{\tau}}+\frac{R_{1}}{R_{1}+R_{2}} \cdot e^{-\frac{t_{s}-\Delta t_{b}}{\tau}}\right), \tag{4.1}
\end{equation*}
$$

where $\tau=R_{1} R_{2} C_{1} /\left(R_{1}+R_{2}\right)$ is the time constant of the circuit. As shown in Eq. (4.1), the expression of $V_{s}$ contains a linear combination of two exponential terms. If $\Delta t_{a}$ and $\Delta t_{b}$ represent the phase noise introduced by the $\Delta \Sigma$ modulator, the exponential action introduces nonlinearity and applies before the FIR action. The nonlinear distortion arises for two reasons. First, the charge delivered to the load capacitance, $C_{1}$, is not linear with respect to each phase error because the current flowing through the resistors changes with the output voltage, $V_{\text {out }}$. Second, the branches are never "tristated". For example, when $R_{2}$ discharges $C_{1}$ from $t_{1}$ to $t_{2}$, it is desirable to disconnect $R_{1}$ from $C_{1}$ to isolate $V_{\mathrm{F}}$ from charging $C_{1}$. Here we also give the expression for an N-tap resistor-based FIR filter:

$$
\begin{equation*}
V_{s}\left(t_{s}\right)=V_{D D} \sum_{k=1}^{N}\left(\frac{R_{\|}}{R_{k}} \cdot e^{-\frac{t_{s}-\Delta t_{k}}{\tau}}\right) \tag{4.2}
\end{equation*}
$$

where $R_{\|}=\frac{1}{\sum_{k=1}^{N} \frac{1}{R_{k}}}$ is the equivalent output resistance of the resistor-based FIR filter.
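The nonlinearity of Eq. (4.1) can be seen numerically by applying equal and opposite phase jumps to one tap and checking whether the sampled voltage moves symmetrically. In the sketch below, the resistor and capacitor values, the sampling instant, and the 50-ps jump are hypothetical and chosen only for illustration.

```python
import math


def resistor_fir_sample(dt_a, dt_b, t_s, r1, r2, c1, vdd=1.0):
    """Sampled output of the 2-tap resistor-based FIR summer, Eq. (4.1)."""
    tau = r1 * r2 * c1 / (r1 + r2)
    weight_a = r2 / (r1 + r2)
    weight_b = r1 / (r1 + r2)
    return vdd * (weight_a * math.exp(-(t_s - dt_a) / tau)
                  + weight_b * math.exp(-(t_s - dt_b) / tau))


# Hypothetical component values and timing, for illustration only.
R1 = R2 = 10e3   # ohms
C1 = 100e-15     # farads, so tau = 0.5 ns
T_S = 1e-9       # sampling instant in seconds
DELTA = 50e-12   # magnitude of the phase jump in seconds

v_nominal = resistor_fir_sample(0.0, 0.0, T_S, R1, R2, C1)
v_plus = resistor_fir_sample(+DELTA, 0.0, T_S, R1, R2, C1)
v_minus = resistor_fir_sample(-DELTA, 0.0, T_S, R1, R2, C1)

# A linear summer would give exactly zero here.
asymmetry = (v_plus + v_minus) / 2.0 - v_nominal

print(f"nominal sample : {v_nominal * 1e3:.3f} mV")
print(f"+delta change  : {(v_plus - v_nominal) * 1e3:.3f} mV")
print(f"-delta change  : {(v_minus - v_nominal) * 1e3:.3f} mV")
print(f"asymmetry      : {asymmetry * 1e6:.1f} uV")
```

The nonzero asymmetry reflects the even-order distortion produced by the exponential terms, which is the mechanism that allows high-frequency $\Delta \Sigma$ noise to fold down.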

### 4.2.2 Switched-current FIR filter

The linearity analysis of the resistor-based FIR filter in Sec 4.2.1 leads us to the idea of a switched-current FIR filter that achieves better linearity.

(a)

(b)

Figure 4.3: (a) A two-tap switched-current FIR filter, (b) input and response of the switched-current FIR summer.

Figure 4.3(a) shows a simplified 2-tap switched-current FIR filter. In this topology, $V_{\mathrm{F}}$ and $V_{\mathrm{F} \Delta}$ control two current sources. $C_{1}$ begins with a zero initial condition. As illustrated in Figure 4.3(b), at $t=t_{1}, V_{\mathrm{F} \Delta}$ turns on $I_{2}$ and $V_{\text {out }}$ increases with a slope of $I_{2} / C_{1}$. At $t=t_{2}, V_{\mathrm{F}}$ turns on $I_{1}$ and $V_{\text {out }}$ increases with a slope of $\left(I_{1}+I_{2}\right) / C_{1}$ because both $I_{1}$ and $I_{2}$ charge $C_{1}$. We derive the expression of $V_{\text {out }}$ after $t_{2}$ :

$$
\begin{equation*}
V_{\mathrm{out}}(t)=-\frac{I_{1}}{C_{1}} \cdot \Delta t_{b}-\frac{I_{2}}{C_{1}} \cdot \Delta t_{a}+\frac{I_{1}+I_{2}}{C_{1}} \cdot t \tag{4.3}
\end{equation*}
$$

If we view $\Delta t_{b}$ as $x(t)$ and $\Delta t_{a}$ as $x\left(t-T_{\text {REF }}\right)$, we observe that $V_{\text {out }}$ provides a two-tap FIR response, with normalized filter coefficients $\alpha_{1}=-I_{1} /\left(I_{1}+I_{2}\right)$ and $\alpha_{2}=-I_{2} /\left(I_{1}+I_{2}\right)$. In order to perform phase comparison with the reference, we sample $V_{\text {out }}$ by $V_{\mathrm{REF}}$ at $t=t_{s}$. The sampled voltage, $V_{s}$, is given by

$$
\begin{equation*}
V_{s}=\frac{I_{1}+I_{2}}{C_{1}}\left[\alpha_{1} \cdot x(t)+\alpha_{2} \cdot x\left(t-T_{\mathrm{REF}}\right)+t_{s}\right] . \tag{4.4}
\end{equation*}
$$

Thus, $V_{s}$ contains an integrated value from $t=0$ to $t_{s}$ representing the reference phase minus a linear combination of two terms involving $\Delta t_{a}$ and $\Delta t_{b}$. From Eq. (4.4), we conclude that the switched-current FIR exhibits no nonlinearity with ideal current sources. This is because, in this case, the current sources are tristated, so that $C_{1}$ stores the charge that represents the phase difference between the reference clock and each feedback clock. We write the expression of $V_{s}\left(t_{s}\right)$ in the case of an $N$-tap FIR filter:

$$
\begin{equation*}
V_{s}\left(t_{s}\right)=\frac{\sum_{k=1}^{N} I_{k}}{C_{1}} \sum_{k=1}^{N} \alpha_{k} \cdot\left(t_{s}-t_{k}\right), \tag{4.5}
\end{equation*}
$$

where $\alpha_{k}=\frac{I_{k}}{\sum_{j=1}^{N} I_{j}}$ is the normalized filter coefficient.
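
The linearity expressed by Eq. (4.5) can be checked numerically. The following is a minimal sketch, assuming ideal current sources with infinite output resistance; the function name and all numerical values are illustrative and are not taken from the design.

```python
def sampled_voltage(currents, turn_on_times, t_s, c1):
    """Ideal switched-current FIR output sampled at t_s, per Eq. (4.5).

    Tap k contributes I_k * (t_s - t_k) / C1 once it has turned on.
    """
    if any(t_k > t_s for t_k in turn_on_times):
        raise ValueError("every tap must turn on before the sampling instant")
    return sum(i_k * (t_s - t_k) for i_k, t_k in zip(currents, turn_on_times)) / c1


# Hypothetical values: two taps and a 1-pF load.
currents = [1e-3, 2e-3]        # I_1, I_2 in amperes
edges = [100e-12, 150e-12]     # turn-on instants t_1, t_2 in seconds
t_s, c1 = 400e-12, 1e-12

v_nominal = sampled_voltage(currents, edges, t_s, c1)
# Delaying tap 2 by 10 ps lowers V_s by I_2 * 10 ps / C1, independent of V_out.
v_shifted = sampled_voltage(currents, [edges[0], edges[1] + 10e-12], t_s, c1)
delta = v_nominal - v_shifted
```

In this idealized model, the change in the sampled voltage depends only on the tap current and the edge displacement, which is the linear behavior that Eq. (4.5) predicts.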
The foregoing analysis is based on the assumption of an ideal current source. In practice, however, MOSFETs have a finite output impedance due to channel-length modulation. In the case of the FIR filter, the current delivered by the current sources is no longer constant, but a function of the FIR filter output voltage ( $V_{\text {out }}$ ). Let us again start with the simple case of a two-tap switched-current FIR filter, with the output resistances associated with $I_{1}$ and $I_{2}$, denoted by $R_{1}$ and $R_{2}$, included (Figure 4.4). We note that the output resistance of a current branch is inversely proportional to its output current. Therefore, we have $I_{1} \cdot R_{1}=I_{2} \cdot R_{2}=\left(I_{1}+I_{2}\right) \cdot R_{\|}$, where $R_{\|}=R_{1} R_{2} /\left(R_{1}+R_{2}\right)$. We redo the calculation of $V_{\text {out }}$ for the two-tap switched-current FIR filter with the input signal shown in Figure 4.3(b). Let us assume that $C_{1}$ begins with a zero initial condition. At $t_{1}$, $V_{\mathrm{F} \Delta}$ turns on $I_{2}$, so both $I_{2}$ and $R_{2}$ charge $C_{1}$. The voltage of $C_{1}$ is given by

$$
\begin{equation*}
V_{\text {out }}(t)=\left(V_{D D}+I_{2} R_{2}\right)\left(1-e^{-\frac{\left(t-\Delta t_{a}\right)}{R_{2} C_{1}}}\right), \quad\left(\text { for } t_{1}<t<t_{2}\right) \tag{4.6}
\end{equation*}
$$

At $t_{2}$, $V_{\mathrm{F}}$ turns on $I_{1}$, and both current sources, i.e., $I_{1}$ and $I_{2}$, and both resistors, i.e., $R_{1}$ and $R_{2}$, charge $C_{1}$. The expression of $V_{\text {out }}$ is given by

$$
\begin{equation*}
V_{\text {out }}(t)=\left(V_{D D}+I_{1} R_{1}\right)\left(1-e^{\frac{\frac{R_{\|}}{R_{2}} \Delta t_{a}+\frac{R_{\|}}{R_{1}} \Delta t_{b}-t}{R_{\|} C_{1}}}\right), \quad\left(\text { for } t_{2}<t\right) \tag{4.7}
\end{equation*}
$$

By comparing Eq. (4.7) with Eq. (4.2), we note that one advantage of the proposed switched-current FIR filter over the resistor-based FIR is that the nonlinearity from the exponential action applies after the FIR filtering action.
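
To make this point explicit, the exponent of Eq. (4.7) can be rewritten with the relation $I_{1} R_{1}=I_{2} R_{2}=\left(I_{1}+I_{2}\right) R_{\|}$ stated earlier, which implies $R_{\|} / R_{k}=I_{k} /\left(I_{1}+I_{2}\right)$. The following restatement is only a sketch that follows from these two relations:

$$
\begin{equation*}
\frac{R_{\|}}{R_{2}} \Delta t_{a}+\frac{R_{\|}}{R_{1}} \Delta t_{b}=\frac{I_{2} \Delta t_{a}+I_{1} \Delta t_{b}}{I_{1}+I_{2}}
\end{equation*}
$$

The timing errors are thus first combined linearly, with weights equal in magnitude to the normalized coefficients of the ideal two-tap case, and only this weighted sum passes through the exponential.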


Figure 4.4: A two-tap switched-current FIR filter with finite output resistance.

### 4.3 Proposed PLL architecture

The proposed PLL architecture is shown in Figure 4.5. A switched-current FIR circuit acts as both a quantization noise filter and a phase detector, and is followed by a sampler, a Gm stage, a loop filter and a VCO. The Gm stage provides a gain of 30 dB at DC, relaxing the voltage compliance at the FIR filter output. The feedback path consists of a low-power, compact divide-by-8 circuit and an MMD driven by a 1-1-1 MASH $\Delta \Sigma$ modulator. Despite the limited speed of the 28-nm CMOS devices, the PLL employs only one inductor (in the VCO) so as to occupy a small footprint. Here we need to answer two questions. First, how many taps do we need for the FIR filter? Second, do we need a $\div 8$ or a $\div 4$ circuit between the VCO and the MMD?


Figure 4.5: Proposed PLL architecture.

### 4.3.1 Divide ratio

As mentioned in Section 4.1, a divider needs to be inserted between the VCO and the MMD because of the limited speed of the latter. Here we compare two cases: a $56-\mathrm{GHz}$ VCO followed by a divide-by-4 and then by an MMD running at 14 GHz versus the same VCO followed by a divide-by-8 and then by an MMD running at 7 GHz. Without an FIR filter, the phase noise at the PLL output introduced by the MASH 1-1-1 DSM [34] is

$$
\begin{equation*}
S_{\Phi_{\Delta \Sigma}}=\frac{N^{2}}{12 f_{R E F}} \cdot|G(f)|^{2}\left((2 \pi)^{2}\left(2 \sin \left(\pi \frac{f}{f_{R E F}}\right)\right)^{4}\right) \tag{4.8}
\end{equation*}
$$

where $N$ is the divide ratio of the divider in front of the MMD and $G(f)$ is the normalized closed-loop response of the PLL, which has a low-pass profile and a DC gain of one and is plotted in Figure 4.6(a). We note that in the second case, the phase noise referred to the PLL output increases by 6 dB and the integrated jitter doubles compared to the first case. In other words, the higher the frequency of the MMD input clock, the less phase noise the DSM introduces at the PLL output. We assume the same BW in this comparison.
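
The 6-dB penalty follows directly from the $N^{2}$ factor in Eq. (4.8). The following minimal sketch evaluates Eq. (4.8) for both divide ratios; the single-pole model of $|G(f)|$ and the 1-MHz offset are assumptions made only for illustration, since the ratio does not depend on them.

```python
import math


def dsm_phase_noise(f, n_div, f_ref, g_mag):
    """MASH 1-1-1 phase noise at the PLL output, Eq. (4.8), in rad^2/Hz.

    n_div is the ratio of the divider in front of the MMD and g_mag is
    |G(f)|, the magnitude of the normalized closed-loop response.
    """
    shaping = (2.0 * math.sin(math.pi * f / f_ref)) ** 4
    return n_div**2 / (12.0 * f_ref) * g_mag**2 * (2.0 * math.pi) ** 2 * shaping


def first_order_lowpass(f, bw):
    """Stand-in for |G(f)|: a single-pole response (an assumption)."""
    return 1.0 / math.sqrt(1.0 + (f / bw) ** 2)


F_REF = 250e6    # reference frequency of the prototype
BW = 3e6         # loop bandwidth quoted in this section
f_offset = 1e6   # arbitrary offset frequency for the comparison

g = first_order_lowpass(f_offset, BW)
s_div4 = dsm_phase_noise(f_offset, 4, F_REF, g)
s_div8 = dsm_phase_noise(f_offset, 8, F_REF, g)

penalty_db = 10.0 * math.log10(s_div8 / s_div4)   # about 6.02 dB
jitter_ratio = math.sqrt(s_div8 / s_div4)         # 2.0
```

Because the ratio is the same at every offset frequency, the integrated jitter scales by the same factor of two, provided the bandwidth is unchanged.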

Now we include the FIR filter in both cases. In the first case, an 8-tap Chebyshev FIR filter is added to the output of the MMD. The integrated jitter from the $\Delta \Sigma$ noise is $31 \mathrm{fs}_{r m s}$. In the second case, the length of the FIR filter needs to be increased to 22 taps to achieve the same amount of jitter. For a PLL with a BW of 3 MHz, the DSM phase noise at the PLL output is shown in Figure 4.6(b).

Figure 4.6: (a) Normalized PLL closed-loop response and, (b) PLL output $\Delta \Sigma$ phase noise spectrum with $\div 4$ and $\div 8$ circuit.

The power consumption of the clock buffer driving the flipflops in the FIR merits attention. The number of flipflops in the FIR is given by $f_{\text {div }} / f_{R E F} \cdot\left(N_{\text {taps }}-1\right)$, where $f_{\text {div }}$ is the frequency of the divider output that clocks the flipflops and $N_{\text {taps }}$ is the number of FIR taps. The power of the clock buffer can be estimated as $C_{C K} \cdot V_{D D}^{2} \cdot f_{\text {div }}^{2} / f_{R E F} \cdot\left(N_{\text {taps }}-1\right)$, where $C_{C K}$ is the capacitance of the clock input node of each flipflop and $V_{D D}$ is the supply voltage of the clock buffer.

Here is the trade-off. If we use a $\div 8$ circuit, the FIR must be longer and hence presents more capacitance in its clock path; conversely, with a $\div 4$ circuit, the FIR can be shorter but its clock frequency and the number of flipflops in each tap both double. The power consumption of the flipflop clock path is given by $f_{\text {div }}^{2} / f_{\mathrm{REF}} \cdot C_{\mathrm{CK}} V_{\mathrm{DD}}^{2}\left(N_{\text {tap }}-1\right)$, where $C_{C K}$ is the capacitance of the clock input of a flipflop in the FIR delay element and $N_{\text {tap }}$ is the number of FIR taps. Based on the transistor-level implementation of the FIR, we obtain the following power numbers for the two cases: 8 mW for $\div 4$ and 6 mW for $\div 8$. The $\div 8$ option is therefore preferable.
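
The trade-off can be reproduced to first order from the clock-path power formula. The sketch below is illustrative only: it uses the 1.4-fF clock capacitance per flipflop reported in Section 4.3.6 and the 1-V supply reported in Section 4.4, and the function name is hypothetical.

```python
def fir_clock_power(f_div, f_ref, n_taps, c_ck, v_dd):
    """Clock-path power: f_div^2 / f_ref * C_CK * V_DD^2 * (N_tap - 1)."""
    flipflops = f_div / f_ref * (n_taps - 1)
    return flipflops * c_ck * v_dd**2 * f_div


F_VCO = 56e9
F_REF = 250e6
C_CK = 1.4e-15   # clock input capacitance per flipflop (Section 4.3.6)
V_DD = 1.0       # supply voltage (Section 4.4)

p_div4 = fir_clock_power(F_VCO / 4, F_REF, n_taps=8, c_ck=C_CK, v_dd=V_DD)
p_div8 = fir_clock_power(F_VCO / 8, F_REF, n_taps=22, c_ck=C_CK, v_dd=V_DD)
print(f"div-by-4: {p_div4 * 1e3:.2f} mW, div-by-8: {p_div8 * 1e3:.2f} mW")
```

These first-order estimates, roughly 7.7 mW and 5.8 mW, are close to the 8 mW and 6 mW obtained from the transistor-level implementation.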

The MASH $1-1-1$ modulator in Figure 4.5 employs a word length of 20 bits for a frequency resolution of 2 kHz at 56 GHz .
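
As a quick check of the quoted resolution, the sketch below assumes that one LSB of the modulator input changes the average MMD divide ratio by $2^{-20}$ and that the VCO frequency is the $\div 8$ ratio times the MMD ratio times the 250-MHz reference. The function name is hypothetical.

```python
def frequency_resolution(f_ref, prescaler, word_length):
    """Output frequency step for a fractional word of `word_length` bits."""
    return prescaler * f_ref / 2**word_length


step_hz = frequency_resolution(f_ref=250e6, prescaler=8, word_length=20)
print(f"{step_hz:.0f} Hz")  # 1907 Hz, i.e. about 2 kHz
```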

### 4.3.2 Switched-current FIR/PD Implementation

The complete FIR filter and phase detector is depicted in Figure 4.7(a). The core consists of 22 cascode current sources, with integer weighting factors $k_{1}$ to $k_{22}$ chosen to create a Chebyshev response having zeros at 11 MHz and its harmonics. In this work, the minimum $k$ factor is 3 and the maximum is 10. The coefficients of the FIR filter are given by $h_{0}=h_{21}=0.1$, $h_{1}=h_{20}=0.03, \ldots, h_{10}=h_{11}=0.05$.[^7] The proposed topology incorporates 21 delay elements and 22 NAND gates to apply FIR filtering to the phase difference between the reference and the MMD output. Among the available windows to implement the FIR filter, the Kaiser and the Chebyshev windows provide the minimal $\Delta \Sigma$ jitter. Since the actual filter coefficients are realized as the ratio of an integer number of unit current sources to the total number of unit current sources, the quantization error alters the filter coefficients and hence the response. We find that the minimum coefficient of the Kaiser window is smaller (e.g., $h_{1}=0.015$ for $\beta=2.9$ ) than that of the Chebyshev window, so its response is more sensitive to coefficient quantization error. Thus we choose the Chebyshev filter in this design.

Figure 4.7: (a) Implementation of the 22-tap FIR/PD and, (b) waveforms of the FIR control signal.

The cascode switched-current cell employs a timing scheme (Figure 4.7(b)) that halves the power consumption and yet achieves high linearity. Initially, both $M_{3}$ and $M_{4}$ are off. Next, at the rising edge of $f_{\text {REF }}$, $M_{3}$ turns on, bringing $V_{\mathrm{A}}$ down to its desired value. Finally, at the rising edge of $\phi_{k}$, $M_{4}$ turns on and $M_{3}$ turns off, allowing $C_{1}$ to charge.

In a typical analog layout, PMOS mismatches can be readily maintained below about $10 \%$. We thus perform Monte-Carlo simulations to determine the variation of the $\Delta \Sigma$ jitter. Figure 4.8 shows the variation of the $\Delta \Sigma$ jitter at the PLL output. The FIR filter response is not sensitive to the current mismatches because the filter coefficient is determined by the ratio of the current in a tap over the total current in the FIR.


Figure 4.8: Monte-Carlo results showing variation of $\Delta \Sigma$ Jitter.

Another merit of the proposed FIR is that the probability density function of the phase error is narrowed from $\pm 2 \mathrm{~T}_{\text {div }}$ (Figure 4.9(a)) at the MMD output to $\pm 0.3 \mathrm{~T}_{\text {div }}$ (Figure 4.9(b)) equivalently at the FIR output.
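
The narrowing can be illustrated with a behavioral model. The following minimal sketch simulates a 20-bit MASH 1-1-1 modulator, accumulates its output into a phase error expressed in units of $T_{\text {div }}$, and applies a 22-tap filter. As an assumption made for brevity, the filter uses uniform weights rather than the Chebyshev weights of this design, and the fractional word is arbitrary, so the resulting spread is not expected to match the $\pm 0.3 T_{\text {div }}$ figure.

```python
def mash111_phase_error(fcw_int, bits, n_samples):
    """Accumulated MASH 1-1-1 phase error in units of T_div.

    fcw_int / 2**bits is the fractional word. Integer arithmetic is used
    for the accumulation so that no rounding error builds up.
    """
    modulus = 1 << bits
    acc1 = acc2 = acc3 = 0
    c2_prev = c3_prev = c3_prev2 = 0
    total = 0
    errors = []
    for n in range(n_samples):
        acc1 += fcw_int
        c1, acc1 = divmod(acc1, modulus)
        acc2 += acc1
        c2, acc2 = divmod(acc2, modulus)
        acc3 += acc2
        c3, acc3 = divmod(acc3, modulus)
        # Noise-cancelling combination of the three carry outputs.
        y = c1 + (c2 - c2_prev) + (c3 - 2 * c3_prev + c3_prev2)
        c2_prev, c3_prev2, c3_prev = c2, c3_prev, c3
        total += y
        errors.append((total * modulus - (n + 1) * fcw_int) / modulus)
    return errors


def moving_average(x, n_taps):
    """Uniform n-tap FIR (a stand-in for the Chebyshev weights)."""
    return [sum(x[i - n_taps + 1 : i + 1]) / n_taps for i in range(n_taps - 1, len(x))]


raw = mash111_phase_error(fcw_int=8389, bits=20, n_samples=4000)  # FCW near 0.008
filtered = moving_average(raw, n_taps=22)
raw_peak = max(abs(e) for e in raw)
filtered_peak = max(abs(e) for e in filtered)
```

In this model the raw phase error stays within $\pm 2 T_{\text {div }}$, while the filtered error occupies a much narrower range; the exact spread depends on the tap weights.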

### 4.3.3 Phase Detection

We analyze the phase noise of the switched-current FIR filter/PD in the integer-$N$ operation of the PLL for simplicity. In the integer-$N$ mode, the output of each delay element, $\phi_{j}$, is aligned with the MMD output, $V_{\mathrm{F}}$, because the delay of each tap is equal to $T_{\text {REF }}$. Therefore, all of the current branches are turned on at the rising transition of $V_{\mathrm{F}}$. The current sources charge $C_{1}$ until the falling edge of $V_{\text {REF }}$ arrives. The sampler samples $V_{\text {out }}$ at the falling edge of $V_{\text {REF }}$. As a result, the sampled voltage $V_{S}$ is proportional to the phase difference between $V_{\text {REF }}$ and $V_{\mathrm{F}}$. The slope of $V_{\text {out }}$ is given by $I_{t o t} / C_{1}$, and a phase deviation of $2 \pi f_{R E F} \Delta t$ translates to a sampled voltage of $\Delta t I_{t o t} / C_{1}$. So the phase detection gain is

Figure 4.9: (a) $\Delta \Sigma$ phase error probability distribution without FIR filtering and, (b) with FIR filtering.

$$
\begin{align*}
K_{P D} & =\frac{S R_{V_{s}}}{2 \pi f_{R E F}} \\
& =\frac{I_{t o t}}{2 \pi f_{R E F} C_{1}} . \tag{4.9}
\end{align*}
$$

The current sources deposit noise on $C_{1}$ during the ramp-up time of $V_{\text {out }}$, denoted by $\Delta t$, and this noise voltage is sampled by means of $V_{\text {REF }}$. We note that the sampled noise voltage $V_{n, s}$ translates to the output phase noise of the PLL. The noise current $i_{n}(t)$ is integrated from 0 to $\Delta t$ and sampled at the end of the integration interval. Similar to the phase noise analysis of an inverter [23], the spectrum of the sampled noise voltage is written as

$$
\begin{equation*}
S_{V_{n}}(f)=\sum_{m=-\infty}^{m=+\infty} \frac{1}{C_{1}^{2}} \Delta t^{2} \frac{\sin ^{2}\left(\pi\left(f-m f_{R E F}\right) \Delta t\right)}{\left(\pi\left(f-m f_{R E F}\right) \Delta t\right)^{2}} S_{I n}\left(f-m f_{R E F}\right), \tag{4.10}
\end{equation*}
$$

where $S_{\text {In }}(f)$ is the total spectrum of the noise current of the FIR filter and $f_{R E F}=1 / T_{\text {REF }}$ is the reference frequency. Referred to the input of the FIR by dividing $S_{V n}(f)$ by $K_{P D}^{2}$, the phase noise of the FIR filter is

$$
\begin{equation*}
S_{\phi}(f)=\frac{4 \pi^{2} S_{V n}(f) f_{R E F}^{2} C_{1}^{2}}{I_{\text {tot }}^{2}} . \tag{4.11}
\end{equation*}
$$

If $S_{I n}(f)$ is white, the phase noise is a sampled and shaped white noise with a spectrum given by

$$
\begin{equation*}
S_{\phi}(f)=4 \pi^{2} \frac{\Delta t}{T_{\mathrm{REF}}} \frac{S_{I n}(f)}{I_{\text {tot }}^{2}} . \tag{4.12}
\end{equation*}
$$

Since $S_{I n}(f)=4 k T \gamma I_{t o t} /\left(V_{G S}-\left|V_{T H}\right|\right)$, every doubling of the current improves the phase noise by 3 dB. For the flicker noise current, if the noise folding effect is neglected, the phase noise spectrum is

$$
\begin{equation*}
S_{\phi}(f)=4 \pi^{2} \frac{\Delta t^{2}}{T_{\mathrm{REF}}^{2}} \frac{S_{1 / f}(f)}{I_{t o t}^{2}} \tag{4.13}
\end{equation*}
$$
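
Equations (4.9) and (4.12) can be evaluated with a short script. The following is a minimal sketch; the excess noise factor, temperature, overdrive voltage, charging time, and current values are hypothetical and chosen only to illustrate the 3-dB improvement per doubling of $I_{t o t}$.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def pd_gain(i_tot, f_ref, c1):
    """Phase-detector gain of Eq. (4.9) in V/rad."""
    return i_tot / (2.0 * math.pi * f_ref * c1)


def thermal_current_noise(i_tot, v_ov, gamma=1.0, temp=300.0):
    """S_In = 4kT*gamma*I_tot / (V_GS - |V_TH|); gamma and temp are assumed."""
    return 4.0 * K_BOLTZMANN * temp * gamma * i_tot / v_ov


def white_phase_noise(dt, t_ref, s_in, i_tot):
    """Sampled white-noise contribution of Eq. (4.12) in rad^2/Hz."""
    return 4.0 * math.pi**2 * (dt / t_ref) * s_in / i_tot**2


# Hypothetical operating point.
T_REF = 1.0 / 250e6
DT = 300e-12
V_OV = 0.2
I_TOT = 5e-3

s_phi_1x = white_phase_noise(DT, T_REF, thermal_current_noise(I_TOT, V_OV), I_TOT)
s_phi_2x = white_phase_noise(DT, T_REF, thermal_current_noise(2 * I_TOT, V_OV), 2 * I_TOT)
improvement_db = 10.0 * math.log10(s_phi_1x / s_phi_2x)  # about 3.01 dB
```

The improvement arises because the noise current spectrum grows linearly with $I_{t o t}$ while the denominator of Eq. (4.12) grows quadratically.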

One could reduce $\Delta t$, and thus improve $S_{\phi}(f)$, by increasing the charging slope of $V_{\text {out }}$. However, the $\Delta \Sigma$ phase error imposes a lower limit on $\Delta t$. When the PLL operates in the fractional-$N$ mode, the instantaneous phase of $V_{\mathrm{F}}$ is modulated and the rising edge arrives earlier or later than it does in the integer-$N$ mode. For a 1-1-1 MASH $\Delta \Sigma$ modulator, the phase error lies within $\left[-2 T_{\text {div }},+2 T_{\text {div }}\right]$. The worst case thus occurs when the rising edge of $V_{\mathrm{F}}$ arrives $2 T_{\text {div }}$ later than it does in the integer-$N$ mode. The minimum $\Delta t$ should guarantee that the rising edge of $V_{\mathrm{F}}$ still arrives earlier than the falling edge of $V_{\mathrm{REF}}$.

### 4.3.4 PD Gain Limit

There are two factors limiting the slope of $V_{\text {out }}$ and hence the PD gain. First, the actual current source has limited voltage headroom for the devices to operate in the saturation region. For the PMOS devices used in the current source, the voltage of $C_{1}$ should be lower than $V_{D D}-\left(V_{G S}-\left|V_{T H P}\right|\right)$. Second, the charging time is limited by the spread of the $\Delta \Sigma$ phase error. We explain this point in detail and start with the integer-$N$ mode operation. In Figure 4.10, $t_{L}$ represents the locking point of the PLL in the integer-$N$ mode. At $t=t_{L}$, all the current branches are turned on and $V_{\text {out }}$ increases at a rate of $I_{t o t} / C_{1}$. At $t=t_{S}$, all current branches are turned off. The time interval from $t_{L}$ to $t_{S}$ is called the "charging time". In the fractional-$N$ mode, the feedback clock, $\mathrm{V}_{\mathrm{F}}$, and its delayed copy, $\mathrm{V}_{\mathrm{F} \Delta}$, are not aligned due to $\Delta \Sigma$ modulation. But we can imagine that $\mathrm{V}_{\mathrm{F}}$ and $\mathrm{V}_{\mathrm{F} \Delta}$ move around the locking point, $t_{L}$, when the PLL is locked. From the distribution of the $\Delta \Sigma$ phase error plotted in Figure 4.9(a), we can expect the maximum phase deviation of $\mathrm{V}_{\mathrm{F}}$ (or $\mathrm{V}_{\mathrm{F} \Delta}$ ) with respect to $t_{L}$ to be $\pm 2 T_{\text {div }}$. Therefore, the charging time should be at least $2 T_{\text {div }}$ to make sure that all the current branches contribute to the charging of $C_{1}$ for proper FIR operation. From the analysis above, the voltage headroom and the charging time requirement together impose an upper limit on the slope and hence the PD gain.

Figure 4.10: Switched-current FIR operation in the integer- $N$ mode.
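
The two constraints can be combined into a first-order bound on the PD gain. The sketch below follows the argument above and ignores the additional ramp time that occurs when a feedback edge arrives early; the 0.6-V headroom is a hypothetical value, and the function name is illustrative.

```python
import math


def max_pd_gain(v_headroom, t_div, f_ref, window_in_t_div=2.0):
    """Upper bound on K_PD set by headroom and the minimum charging time."""
    t_charge_min = window_in_t_div * t_div      # charging time of at least 2 T_div
    max_slope = v_headroom / t_charge_min       # V/s, equals I_tot / C1
    return max_slope / (2.0 * math.pi * f_ref)  # V/rad, from Eq. (4.9)


T_DIV = 1.0 / 7e9   # period of the divide-by-8 output
k_pd_max = max_pd_gain(v_headroom=0.6, t_div=T_DIV, f_ref=250e6)
```

With these assumed numbers the bound is about 1.3 V/rad; a larger headroom or a narrower phase-error distribution would raise it.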

### 4.3.5 Cascode current source

As discussed in Section 4.2.2, the output resistance of the current sources degrades the linearity of the switched-current FIR filter. Figure 4.11(a) plots the simulated I-V characteristics of two PMOS current sources, each delivering about 6.5 mA to its output node. The output resistance of a PMOS current source drops from $3.3 \mathrm{k} \Omega$ to $500 \Omega$ (calculated at $V_{\text {out }}=0.5 \mathrm{~V}$ ) if the cascode device is removed. Figure 4.11(b) plots the simulated $\Delta \Sigma$ phase noise spectrum of an FIR filter with and without cascode devices. The cascode current source reduces the $\Delta \Sigma$ noise floor by 14 dB at 100 kHz offset and 10 dB at 1 MHz offset.


Figure 4.11: (a) I-V characteristic of a current unit with and without cascode and, (b) simulated $\Delta \Sigma$ phase noise spectrum of switched-current FIR with and without cascode.

### 4.3.6 Binary Delay Line

In the delay element of the FIR filter, we can place a chain of 28 TSPC flipflops in each delay stage clocked by the $\div 8$ circuit, providing discrete values equal to integer multiples of $T_{\text {div }}$.

With this clocking method, $\phi_{1}$ to $\phi_{22}$ all carry the feedback information for the PLL to lock [33]. However, this approach leads to phase error accumulation. As illustrated in Figure 4.12(a), the delay from $\phi_{1}$ to $\phi_{2}$ is equal to $\frac{N}{N+\alpha} \cdot T_{\text {REF }}$ rather than $T_{\mathrm{REF}}$, where $\alpha$ is the frequency command word (FCW). This negative phase error accumulates, creating a large error by the time we reach $\phi_{22}$. As analyzed in Section 4.3.3, the phase detection gain is inversely proportional to the charging time of the FIR filter, the lower bound of which is limited by the distribution of the $\Delta \Sigma$ phase error. The phase error accumulation further increases the charging time required for proper FIR operation and hence reduces the phase detection gain. As a result, the phase noise contribution from the FIR and the following Gm stage increases.

To resolve this issue, the delay elements assume a binary value of either $T_{1}=28 T_{\text {div }}$ or $T_{2}=29 T_{\text {div }}$ so as to create a tight bound for this error. Programmed individually in conjunction with $\alpha$, the delay of Stage $j$ is set according to the following rule: if the accumulated error from Stage 1 to Stage $j$ is less than $T_{\text {div }}$, then $T_{1}=28 T_{\text {div }}$ is selected; otherwise, $T_{2}=29 T_{\text {div }}$ is used. The accumulated error is predicted by $(j-1) \alpha T_{\text {div }}$. As shown by the waveforms in Figure 4.12(b), the delay from $\phi_{2}$ to $\phi_{3}$ is compensated by one more $T_{\text {div }}$. In this way, the last FIR phase, $\phi_{22}$, experiences a difference of only about $T_{\text {div }}$ with respect to the others.
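
One way to picture the selection rule is as an accumulator with carry. The following sketch is an interpretation, not the programmed logic of the chip: it assumes $0 \leq \alpha<1$, that each 28-flipflop stage falls short of $T_{\text {REF }}$ by $\alpha T_{\text {div }}$, and that one extra $T_{\text {div }}$ is inserted whenever the accumulated shortfall crosses an integer number of $T_{\text {div }}$. The function name is hypothetical.

```python
import math


def binary_delay_schedule(alpha, n_stages=21, base=28):
    """Per-stage delays (in units of T_div) and the residual error after each stage.

    A stage uses base + 1 whenever the accumulated shortfall, j * alpha,
    crosses an integer number of T_div; otherwise it uses base.
    """
    delays, residuals = [], []
    for j in range(1, n_stages + 1):
        extra = math.floor(j * alpha) - math.floor((j - 1) * alpha)
        delays.append(base + extra)
        residuals.append(j * alpha - math.floor(j * alpha))
    return delays, residuals


delays, residuals = binary_delay_schedule(alpha=0.25)
# With alpha = 0.25, every fourth stage uses 29 T_div and the
# residual error after any stage stays below one T_div.
```

Under these assumptions the residual error never exceeds $T_{\text {div }}$, which matches the tight bound described above.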


Figure 4.12: (a) FIR delay output without binary delay, and (b) FIR delay output with binary delay.

The flipflops in the FIR delay elements employ a true single-phase clock (TSPC) structure (Figure 4.13). The extracted total capacitance of the clock input is 1.4 fF for a single flipflop. In this design, we employ 609 flipflops. The clock path is driven by the $\div 8$ output at 7 GHz and consumes a power of 6 mW .


Figure 4.13: TSPC flip-flop.

### 4.3.7 VCO and $\div 8$ circuit $^{2}$

As shown in Figure 4.14(a), the VCO employs a complementary LC topology. Due to the lack of ultra-thick metal layers, the $45-\mathrm{pH}$ inductor is realized as two octagons, in metal 8 and metal 9, connected in parallel. The quality factor of the tank is around 10, yielding the simulated phase noise profile shown in Figure 4.14(b) for a power consumption of 7.2 mW.

The $\div 8$ circuit between the VCO and the MMD can potentially consume high power and a large area if it employs inductors $[35,36]$. We use a $\div 8$ topology that significantly reduces both [37]. As shown in Figure 4.15(a), the $\div 2$ stage is based on two dynamic latches and a third inverter in the feedback path for proper toggling. The performance is dramatically improved by introducing a feedforward path from $A$ to $B$ so that the signal arrives at the latter before $S_{2}$ turns on. Proper scaling of this path with respect to the main inverters allows the upper end of the lock range to be extended, with some limitation on the lower end. As plotted in Figure 4.15(b), the feedforward path raises the divider's maximum speed from 55 GHz to 68 GHz while imposing a lower end of 43 GHz. This $\div 2$ stage draws 1.8 mW at 56 GHz.

Figure 4.14: (a) VCO implementation, and (b) its simulated phase noise.

[^8]

Figure 4.15: (a) $\div 2$ circuit with feedforward, and (b) its simulated frequency range.

### 4.4 Experimental Results

The proposed PLL has been fabricated in $28-\mathrm{nm}$ CMOS technology. Figure 4.16 shows a photograph of the die, where the active area measures approximately $540 \mu \mathrm{~m} \times 290 \mu \mathrm{~m}$. The prototype is supplied at 1 V and consumes 23 mW: 11 mW in the FIR filter, 7 mW in the VCO, 3.1 mW in the $\div 8$ stage, 1 mW in the multi-modulus divider, 0.5 mW in the reference buffer, and 0.4 mW in the $\Delta \Sigma$ modulator. The external $250-\mathrm{MHz}$ reference is provided by a low-noise crystal oscillator from Crystek Corporation.

For ease of measurement, the output of the $\div 8$ circuit following the VCO is used for testing. Figure 4.17 shows the measured $\div 8$ output spectrum. The fractional spur at 2 MHz offset has a level of -66 dBc, which translates to -48 dBc at the VCO output and hence 16 fs of rms deterministic jitter. Fortunately, with receive clock and data recovery (CDR) bandwidths of tens of megahertz [38, 39] or above 100 megahertz [25, 40], such low-frequency spurs are rejected. Figure 4.18 plots the fractional spur levels as the FCW varies from 0.004 to 0.06 and the offset frequency varies from 1 MHz to 15 MHz.
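
The conversion from the measured spur level to deterministic jitter can be sketched as follows. The sketch assumes narrowband sinusoidal phase modulation, for which a spur pair at a relative level $L$ corresponds to an rms phase deviation of $\sqrt{2} \cdot 10^{L / 20}$ rad; the function names are illustrative.

```python
import math


def spur_at_vco(spur_dbc, divide_ratio):
    """Refer a spur measured after a divider back to the VCO output."""
    return spur_dbc + 20.0 * math.log10(divide_ratio)


def spur_rms_jitter(spur_dbc, f_carrier):
    """RMS jitter of a sinusoidal phase modulation with a spur pair at spur_dbc."""
    phase_rms = math.sqrt(2.0) * 10.0 ** (spur_dbc / 20.0)  # rad
    return phase_rms / (2.0 * math.pi * f_carrier)


vco_spur = spur_at_vco(-66.0, 8)        # about -47.9 dBc
jitter = spur_rms_jitter(-48.0, 56e9)   # about 16 fs
```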

Table 4.1 presents the measured performance of our prototype and compares it to that of other 60 GHz and 30 GHz fractional- $N$ PLLs. With a power consumption of 23 mW and a jitter of 110 fs, we observe a nearly twofold reduction in jitter, an 8.3 dB improvement in the FoM, and a more than threefold reduction in area.


Figure 4.16: Die photograph.


Figure 4.17: Measured PLL output spectrum.


Figure 4.18: Measured PLL fractional spur levels.

Table 4.1: Performance summary and comparison to prior art

|  | Wu <br> ISSCC 2013 | Grimaldi <br> ISSCC 2014 | Hussein <br> ISSCC 2017 | Zong <br> JSSC 2019 | This <br> Work |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Freq. Range (GHz) | $56.4 \sim 63.4$ | $50.2 \sim 66.5$ | $30.6 \sim 34.2$ | $57.5 \sim 67.2$ | $52.3 \sim 56.8$ |
| RMS Jitter (fs) | 522.9 | 223 | 197.6 | 213 | 110 |
| Integ. range (MHz) | $(0.01 \sim 10)$ | $(0.001 \sim 40)$ | $(0.001 \sim 10)$ | $(0.01 \sim 30)$ | $(0.01 \sim 40)$ |
| Frac. Spur (dBc) | N/A | -68 | -42.2 | -38 | -48 |
| Ref. Spur (dBc) | -74 | N/A | N/A | -65 | -50 |
| Ref. Freq. (MHz) | 100 | 100 | 100 | 100 | 250 |
| Tech. (nm) | 65 | 65 | 65 | 28 | 28 |
| Power (mW) | 40 | 46 | 35 | 31 | 23 |
| Area (mm ${ }^{2}$ ) | 0.48 | 0.45 | 0.55 | 0.38 | 0.1 |
| FoM $^{1}(\mathrm{~dB})$ | -229.6 | -236.4 | -238.6 | -237.2 | -245.5 |

1: FoM $=10 \log _{10}\left[\left(\frac{\text { Jitter }}{1 \mathrm{~s}}\right)^{2}\left(\frac{\text { Power }}{1 \mathrm{~mW}}\right)\right]$
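
The figure of merit defined above can be evaluated directly. The following minimal sketch applies the definition to the jitter and power of this work; the function name is illustrative.

```python
import math


def jitter_power_fom(jitter_s, power_w):
    """FoM = 10*log10[(jitter / 1 s)^2 * (power / 1 mW)] in dB."""
    return 10.0 * math.log10(jitter_s**2 * (power_w / 1e-3))


fom_this_work = jitter_power_fom(110e-15, 23e-3)
print(f"{fom_this_work:.2f} dB")
```

The result agrees with the table entry for this work to within rounding.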

## CHAPTER 5

## Conclusion

In this dissertation, we demonstrate design techniques for both integer- $N$ and fractional- $N$ high-speed PLLs for wireline and wireless applications. The crystal oscillator, the reference buffer, and the VCO become the main contributors for low-jitter integer- $N$ PLL design. We introduce a new phase detector and a self-retimed frequency divider that ease the trade-offs in PLLs. We also propose a novel current-mode FIR filter to avoid phase and frequency detectors (PFDs) and charge pumps and to suppress the DSM quantization noise with negligible noise folding for low-jitter fractional- $N$ PLLs. To provide a compact solution suited to multi-lane systems, the PLL also incorporates a low-power, inductorless $\div 8$ circuit.

## References

[1] T. Ali et al., "A 460mW 112Gb/s DSP-Based Transceiver with 38dB Loss Compensation for Next-Generation Data Centers in 7nm FinFET Technology," in ISSCC Dig. Tech. Papers Slide Supplements, 2020, pp. 118-120.
[2] S. Palermo et al., "CMOS ADC-based receivers for high-speed electrical and optical links," IEEE Communications Magazine, vol. 54, no. 10, pp. 168-175, 2016.
[3] A. M. A. Ali et al., "A 12b 18GS/s RF Sampling ADC with an Integrated Wideband Track-and-Hold Amplifier and Background Calibration," in IEEE ISSCC Dig. Tech. Papers, 2020, pp. 250-252.
[4] J. Gong, F. Sebastiano, E. Charbon, and M. Babaie, "A 10-to-12 GHz 5 mW Charge-Sampling PLL Achieving 50 fsec RMS Jitter, -258.9 dB FOM and -65 dBc Reference Spur," in IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2020, pp. 15-18.
[5] M. Mercandelli et al., "17.5 A 12.5GHz Fractional-N Type-I Sampling PLL Achieving 58fs Integrated Jitter," in IEEE ISSCC Dig. Tech. Papers, 2020, pp. 274-276.
[6] D. Turker et al., "A 7.4-to-14GHz PLL with 54fsrms jitter in 16nm FinFET for integrated RF-data-converter SoCs," in IEEE ISSCC Dig. Tech. Papers, 2018, pp. 378-380.
[7] Z. Zhang, G. Zhu, and C. P. Yue, "A 0.65V 12-to-16GHz Sub-Sampling PLL with 56.4fsrms Integrated Jitter and -256.4dB FoM," in IEEE ISSCC Dig. Tech. Papers, 2019, pp. 488-490.
[8] A. Santiccioli et al., "A $66 \mathrm{fs}_{r m s}$ Jitter 12.8-to-15.2GHz Fractional-N Bang-Bang PLL with Digital Frequency-Error Recovery for Fast Locking," in IEEE ISSCC Dig. Tech. Papers, 2020, pp. 268-270.
[9] Z. Yang et al., "A 25.4-to-29.5GHz 10.2mW Isolated Sub-Sampling PLL Achieving 252.9dB Jitter-Power FoM and -63dBc Reference Spur," in IEEE ISSCC Dig. Tech. Papers, 2019, pp. 270-272.
[10] W. Wu et al., "A $28-\mathrm{nm} 75-\mathrm{fs}_{\mathrm{rms}}$ Analog Fractional- $N$ Sampling PLL With a Highly Linear DTC Incorporating Background DTC Gain Calibration and Reference Clock Duty Cycle Correction," IEEE J. Solid-State Circuits, vol. 54, no. 5, pp. 1254-1265, 2019.
[11] E. Thaller et al., "A K-Band 12.1-to-16.6GHz Subsampling ADPLL with $47.3 \mathrm{fs}_{\mathrm{rms}}$ Jitter Based on a Stochastic Flash TDC and Coupled Dual-Core DCO in 16nm FinFET CMOS," in IEEE ISSCC Dig. Tech. Papers, vol. 64, 2021, pp. 451-453.
[12] Y. Hu, X. Chen, T. Siriburanon, J. Du, V. Govindaraj, A. Zhu, and R. B. Staszewski, "A Charge-Sharing Locking Technique With a General Phase Noise Theory of Injection Locking," IEEE J. Solid-State Circuits, pp. 1-1, 2021.
[13] J. Kim et al., "A $76 \mathrm{fs}_{\mathrm{rms}}$ Jitter and -40 dBc Integrated-Phase-Noise 28 -to- 31 GHz Frequency Synthesizer Based on Digital Sub-Sampling PLL Using Optimally Spaced Voltage Comparators and Background Loop-Gain Optimization," in IEEE ISSCC Dig. Tech. Papers, 2019, pp. 258-260.
[14] Y. Lim et al., "A $170 \mathrm{MHz}-$ Lock-In-Range and $-253 \mathrm{~dB}-\mathrm{FoM}_{\mathrm{jitter}} 12-\mathrm{to}-14.5 \mathrm{GHz}$ Subsampling PLL with a $150 \mu \mathrm{~W}$ Frequency-Disturbance-Correcting Loop Using a Low-Power Unevenly Spaced Edge Generator," in IEEE ISSCC Dig. Tech. Papers, 2020, pp. 280-282.
[15] X. Gao, E. Klumperink, G. Socci, M. Bohsali, and B. Nauta, "A 2.2 GHz sub-sampling PLL with $0.16 \mathrm{ps}_{r m s}$ jitter and $-125 \mathrm{dBc} / \mathrm{Hz}$ in-band phase noise at $700 \mu \mathrm{~W}$ loop-components power," in Symposium on VLSI Circuits Dig. of Tech. Papers, 2010, pp. 139-140.
[16] K. Raczkowski, N. Markulic, B. Hershberg, and J. Craninckx, "A 9.2-12.7 GHz Wideband Fractional-N Subsampling PLL in 28 nm CMOS With 280 fs RMS Jitter," IEEE J. Solid-State Circuits, vol. 50, no. 5, pp. 1203-1213, 2015.
[17] A. Sharkia, S. Mirabbasi, and S. Shekhar, "A Type-I Sub-Sampling PLL With a $100 \times 100$ $\mu \mathrm{m}^{2}$ Footprint and -255-dB FOM," IEEE J. Solid-State Circuits, vol. 53, no. 12, pp. 3553-3564, 2018.
[18] D.-G. Lee and P. P. Mercier, "A Sub-mW 2.4-GHz Active-Mixer-Adopted Sub-Sampling PLL Achieving an FoM of -256 dB," IEEE J. Solid-State Circuits, vol. 55, no. 6, pp. 1542-1552, 2020.
[19] X. Gao, E. A. M. Klumperink, M. Bohsali, and B. Nauta, "A Low Noise Sub-Sampling PLL in Which Divider Noise is Eliminated and PD/CP Noise is Not Multiplied by $N^{2}$," IEEE J. Solid-State Circuits, vol. 44, no. 12, pp. 3253-3263, 2009.
[20] B. Razavi, Design of Analog CMOS Integrated Circuits, 2nd ed. New York, NY, USA: McGraw-Hill, 2017.
[21] J. Sharma and H. Krishnaswamy, "A 2.4-GHz Reference-Sampling Phase-Locked Loop That Simultaneously Achieves Low-Noise and Low-Spur Performance," IEEE J. Solid-State Circuits, vol. 54, no. 5, pp. 1407-1424, 2019.
[22] J. Du et al., "A 24-31 GHz Reference Oversampling ADPLL Achieving FoM$_{\mathrm{jitter-N}}$ of -269.3 dB," in Symposium on VLSI Circuits Dig. of Tech. Papers, 2021, pp. 1-2.
[23] A. Homayoun and B. Razavi, "Analysis of Phase Noise in Phase/Frequency Detectors," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 60, no. 3, pp. 529-539, 2013.
[24] L. Kong and B. Razavi, "A 2.4 GHz 4 mW Integer-N Inductorless RF Synthesizer," IEEE J. Solid-State Circuits, vol. 51, no. 3, pp. 626-635, 2016.
[25] L. Kong, Y. Chang, and B. Razavi, "An Inductorless 20-Gb/s CDR With High Jitter Tolerance," IEEE J. Solid-State Circuits, vol. 54, no. 10, pp. 2857-2866, 2019.
[26] X. Gao et al., "A 28nm CMOS Digital Fractional- $N$ PLL with -245.5dB FOM and a Frequency Tripler For 802.11abgn/ac Radio," in IEEE ISSCC Dig. Tech. Papers, 2015, pp. 1-3.
[27] L. Romano et al., "Low jitter design of a $0.35 \mu \mathrm{~m}$-CMOS frequency divider operating up to 3GHz," in Proceedings of the 28th European Solid-State Circuits Conference, 2002, pp. 611-614.
[28] C. Vaucher et al., "A Family of Low-Power Truly Modular Programmable Dividers in Standard 0.35-$\mu \mathrm{m}$ CMOS Technology," IEEE J. Solid-State Circuits, vol. 35, no. 7, pp. 1039-1045, 2000.
[29] H. Liu et al., "A Sub-mW Fractional- $N$ ADPLL With FOM of -246 dB for IoT Applications," IEEE J. Solid-State Circuits, vol. 53, no. 12, pp. 3540-3552, 2018.
[30] N. Pavlovic and J. Bergervoet, "A 5.3GHz Digital-to-Time-Converter-Based Fractional- $N$ All-Digital PLL," in IEEE ISSCC Dig. Tech. Papers, 2011, pp. 54-56.
[31] A. Elkholy, T. Anand, W.-S. Choi, A. Elshazly, and P. K. Hanumolu, "A 3.7 mW Low-Noise Wide-Bandwidth 4.5 GHz Digital Fractional-N PLL Using Time Amplifier-Based TDC," IEEE J. Solid-State Circuits, vol. 50, no. 4, pp. 867-881, 2015.
[32] W. Wu et al., "A 14-nm Ultra-Low Jitter Fractional- $N$ PLL Using a DTC Range Reduction Technique and a Reconfigurable Dual-Core VCO," IEEE J. Solid-State Circuits, vol. 56, no. 12, pp. 3756-3767, 2021.
[33] L. Kong and B. Razavi, "A 2.4-GHz RF Fractional- $N$ Synthesizer With BW= $0.25 f_{R E F}$," IEEE J. Solid-State Circuits, vol. 53, no. 6, pp. 1707-1718, 2018.
[34] M. Perrott, M. Trott, and C. Sodini, "A Modeling Approach for $\Sigma-\Delta$ Fractional- $N$ Frequency Synthesizers Allowing Straightforward Noise Analysis," IEEE J. Solid-State Circuits, vol. 37, no. 8, pp. 1028-1038, 2002.
[35] H. Razavi and B. Razavi, "A 27-73 GHz Injection-Locked Frequency Divider," in 2021 Symposium on VLSI Circuits, 2021, pp. 1-2.
[36] A. Tomkins et al., "A Zero-IF 60 GHz 65 nm CMOS Transceiver With Direct BPSK Modulation Demonstrating up to $6 \mathrm{~Gb} / \mathrm{s}$ Data Rates Over a 2 m Wireless Link," IEEE J. Solid-State Circuits, vol. 44, no. 8, pp. 2085-2099, 2009.
[37] O. Memioglu, "Low Power THz CMOS Receiver with On-Chip LO Generation," Ph.D. dissertation, Electrical and Computer Engineering Department, University of California, Los Angeles, Los Angeles, CA, USA, 2021.
[38] A. Atharav and B. Razavi, "A 56-Gb/s 50-mW NRZ Receiver in 28-nm CMOS," IEEE J. Solid-State Circuits, vol. 57, no. 1, pp. 54-67, 2022.
[39] X. Zhao, Y. Chen, P.-I. Mak, and R. P. Martins, "A 0.0285mm² 0.68pJ/bit Single-Loop Full-Rate Bang-Bang CDR without Reference and Separate Frequency Detector Achieving an 8.2(Gb/s)/ $\mu \mathrm{s}$ Acquisition Speed of PAM-4 data in 28 nm CMOS," in IEEE Custom Integrated Circuits Conference (CICC), 2020, pp. 1-4.
[40] G. Hou and B. Razavi, "A 56-Gb/s 8-mW PAM4 CDR/DMUX with High Jitter Tolerance," in Symposium on VLSI Circuits Dig. of Tech. Papers, 2021, pp. 1-2.


[^0]:    ${ }^{1}$ These spectra are measured with $\left|\mathrm{V}_{\mathrm{GS}}\right|=\mathrm{V}_{\mathrm{DD}}$ and $\left|\mathrm{V}_{\mathrm{DS}}\right|=\mathrm{V}_{\mathrm{DD}} / 2$.

[^1]:    ${ }^{2}$ Every doubling of the transistor widths in the PFD reduces the jitter by a factor of $\sqrt{2}$.

[^2]:    ${ }^{1}$ The PD can directly sample the reference sinusoid (without a buffer) [21, 22], but the much lower PD gain makes the noise of the subsequent stages more significant.

[^3]:    ${ }^{2}$ The resonance occurs with the tail parasitics and is not tunable.

[^4]:    ${ }^{3}$ Simulations confirm our intuition that the phase noise is the same as in the case of using a single retiming flipflop.

[^5]:    ${ }^{4}$ The DC current from the RBUF supply is 1.08 mA .

[^6]:    ${ }^{5}$ The value of the Gm following the PD is adjusted for single and double sampling so as to keep the loop bandwidth constant.

[^7]:    ${ }^{1}$ The coefficients of the FIR filter $=\frac{k_{n}}{\sum_{m=1}^{22} k_{m}}$.

[^8]:    ${ }^{2}$ The $\div 2$ circuit is proposed and designed by Onur Memioglu. The rest of the PLL is designed by the author of this dissertation.

