# UCLA Electronic Theses and Dissertations

**Title** DESIGN TECHNIQUES FOR HIGH SPEED POWER EFFICIENT DATA INTERFACES

**Permalink** https://escholarship.org/uc/item/4vk870jf

**Author** HU, BOYU

**Publication Date** 2016

Peer reviewed|Thesis/dissertation

# UNIVERSITY OF CALIFORNIA

Los Angeles

Design Techniques For High Speed Power Efficient Data Interfaces

A dissertation submitted in partial satisfaction of the

requirements for the degree Doctor of Philosophy

in Electrical Engineering

by

Boyu Hu

2016

© Copyright by

Boyu Hu

2016

## **ABSTRACT OF THE DISSERTATION**

Design Techniques For High Speed Power Efficient Data Interfaces

by

Boyu Hu

Doctor of Philosophy in Electrical Engineering

University of California, Los Angeles, 2016

Professor Mau-Chung Frank Chang, Chair

The explosive growth of data traffic over both wireless and wireline channels places critical requirements on the design of high-speed, power-efficient data interfaces. A data interface bridges the physical world, which is continuous in both amplitude and time, and the digital computing/control core, which operates on quantized measurements in discrete steps. Its performance metrics therefore directly reflect how well information is preserved when it is conveyed from one domain to the other.

The first part of the dissertation discusses how to effectively map the recently emerged Compressive-Sensing theory onto real-world acquisition hardware, providing an alternative way to potentially improve the power and speed performance of data interfaces that fit such a domain-sparse-signal-oriented processing scheme. Two dual-mode ADC experimental prototypes are presented: one is based on a self-timed pipeline SAR-BS architecture, while the other is a hybrid of a voltage-domain SAR-ADC and a time-domain locally-readjusted 2D-Vernier TDC. Both support general-purpose Nyquist-Sampling as well as Compressive-Sensing for certain spectrally sparse signals.

The second part explores various low-resolution, ultra-high-speed DAC implementations for voltage-mode multi-level signaling wireline transmitters with equalization. A Capacitor-DAC-based approach is proposed for its inherent advantages: it provides linear binary-weighted voltage-domain summing, delivers high swing into capacitive loads, is power efficient, and is free of the eye distortion introduced by driver on-resistance. This architecture is further extended into a pre-distortion-enabled one with independent eye-opening control, for cases where the transmitter serves as a baseband modulator in carrier-based communication, to effectively correct the non-linear transfer curve of the mixer stage within the RF transmitter. In addition, a C2C-DAC-based architecture is investigated to embed equalization into this pre-distortion-enabled transmitter. To cope with the swing reduction caused by the buffer stage that the Capacitor-DAC-based approach needs in applications requiring channel impedance matching, an R2R-DAC-based transmitter architecture is investigated, which provides a compact recursive structure and the flexibility to be extended to multi-tap equalization.

The dissertation of Boyu Hu is approved.

William Kaiser

Wentai Liu

Danijela Cabric

Mau-Chung Frank Chang, Committee Chair

University of California, Los Angeles

2016

## **TABLE OF CONTENTS**

- LIST OF FIGURES
- LIST OF TABLES
- ACKNOWLEDGEMENT
- VITA
- PUBLICATIONS
- Chapter 1 Introduction
  - 1.1 Motivation
  - 1.2 Dissertation Organization
- Chapter 2 Dual-Mode Nyquist-Sampling / Compressive-Sensing Enabled Data Converters
  - 2.1 An 8-Bit Compressive-Sensing ADC With 4GS/s Equivalent Speed Utilizing Self-Timed Pipeline SAR-Binary-Search
    - 2.1.1 Hardware mapping from theory to implementation
    - 2.1.2 System-Level Simulation
    - 2.1.3 Random Matrix Design
    - 2.1.4 Architecture Design
    - 2.1.5 Circuit Implementation
    - 2.1.6 Signal Recovery
    - 2.1.7 Experiment Results
  - 2.2 A 9-Bit Time-Digital-Converter-Assisted Compressive-Sensing ADC With 4GS/s Equivalent Speed
    - 2.2.1 Time-Domain Signal Processing
    - 2.2.2 System-Level Architecture
    - 2.2.3 Circuit Implementation
    - 2.2.4 Experiment Results
- Chapter 3 Design Techniques for Multi-Level Wireline Transmitters
  - 3.1 A Capacitive-DAC-Based Technique For Pre-emphasis-Enabled Multi-level Transmitters
    - 3.1.1 CML-Based Approach
    - 3.1.2 Direct-Coupling Approach
    - 3.1.3 Resistor-DAC-Based Approach
    - 3.1.4 Capacitor-DAC-Based Approach
    - 3.1.5 Circuit Implementation
    - 3.1.6 Experiment Results
  - 3.2 Transmitter Design With Pre-distortion and C2C-Based Architecture
    - 3.2.1 Pre-distortion-enabled transmitter
    - 3.2.2 Circuit Implementation
    - 3.2.3 C2C-DAC based architecture with equalization
  - 3.3 A 34Gbps Voltage-Mode PAM-4 Wireline Transmitter With 2-Tap Feed-Forward-Equalization Utilizing R2R-Based Architecture
    - 3.3.1 Revisited Resistor-DAC-Based Approach
    - 3.3.2 R2R-DAC-Based Approach
    - 3.3.3 Circuit Implementation
    - 3.3.4 Experiment Results
- Chapter 4 Conclusion
- REFERENCE

## **LIST OF FIGURES**

- Figure 1.1 Data interface design trade offs
- Figure 1.2 (a) Nyquist-Sampling framework (b) Compressive-Sensing framework
- Figure 1.3 Potential applications of CS-enabled acquisition system
- Figure 2.1 (a) Spectral sparse signal (b) Nyquist-sampling in time-domain (c) Math equation describing Nyquist-sampling procedure
- Figure 2.2 (a) Spectral sparse signal (b) Random-sampling in time-domain (c) Math equation describing compressive-sensing procedure
- Figure 2.3 (a) Random-sampling in time-domain (b) Mapping from math equation to hardware (c) Math equation describing compressive-sensing procedure
- Figure 2.4 System-level simulated results of a spectral sparse signal in time/frequency-domain and its recovered version in time/frequency-domain utilizing CS-enabled acquisition framework
- Figure 2.5 Simulated recovered SNDR performance with different number of input tones and compression ratios
- Figure 2.6 Simulated recovery detection rate performance with different number of input tones and compression ratios
- Figure 2.7 Simulated recovered SNDR performance with different input SNDR and 6-bit/9-bit quantization
- Figure 2.8 Simulated recovered SNDR performance with different input SNDR and different compression ratios
- Figure 2.9 Random-matrix
- Figure 2.10 Sampling points on time-axis
- Figure 2.11 System diagram of the dual-mode SAR-BS ADC at chip-level
- Figure 2.12 Proposed self-timed pipeline SAR-BS ADC architecture
- Figure 2.13 Proposed self-timed pipeline scheme
- Figure 2.14 Conventional two-stage pipeline SAR-ADC and proposed pipeline SAR-BS ADC
- Figure 2.15 Waveform of 1st-stage ADC Cap-DAC and 2nd-stage ADC input in conventional and proposed architectures
- Figure 2.16 Circuit implementation of the comparator in SAR-ADC
- Figure 2.17 Direct feed-through control logic
- Figure 2.18 Co-work of the direct feed-through logic with split capacitor-DAC
- Figure 2.19 Passive-charge-sharing with open-loop-amplifier
- Figure 2.20 (a) Comparator and its related digital logics, timing and calibration circuits (b) ready generator for the whole BS-ADC (c) 15-to-4 priority encoder
- Figure 2.21 Circuit implementation of the comparator in BS-ADC
- Figure 2.22 Reference-voltage-fitting calibration
- Figure 2.23 Concept flow of signal recovery
- Figure 2.24 State-transfer-diagram of OMP-based recovery
- Figure 2.25 Time and frequency-domain waveforms of a simulated recovery procedure
- Figure 2.26 Die micrograph of the dual-mode pipeline-SAR-BS ADC
- Figure 2.27 Measured DNL and INL performance before and after calibration
- Figure 2.28 Measured SNDR and SFDR performances as the input frequency changes in NS-mode and CS-mode with post-processing and recovery
- Figure 2.29 Measured output spectrum of an input signal of 0.73MHz in NS-mode with a decimation of 64
- Figure 2.30 Recovered output spectrum of an input signal of 1941MHz in CS-mode
- Figure 2.31 Recovered output spectrum of a 3% spectral sparse signal
- Figure 2.32 Recovered output spectrum of a spectral sparse two-band AM signal
- Figure 2.33 Ideal and demodulated two baseband signals
- Figure 2.34 Different information representation forms in time-domain
- Figure 2.35 Corresponding building blocks for voltage-domain signal processing and time-domain signal processing
- Figure 2.36 (a) System-level architecture of dual-mode hybrid SAR-TDC ADC (b) its timing-diagram
- Figure 2.37 Comparison between static amplifier and voltage-time-converter
- Figure 2.38 Circuit implementation of the voltage-time-converter
- Figure 2.39 Quantization principles of the Flash-TDC, Vernier-TDC and 2D-Vernier-TDC
- Figure 2.40 Locally-Readjusted 2D-Vernier-TDC
- Figure 2.41 (a) Circuit implementation of the time-arbiter (b) time-adjuster
- Figure 2.42 (a) Replica-based coarse V-T mapping (b) 1-on-1 fine V-T mapping
- Figure 2.43 Die micrograph of the dual-mode hybrid SAR-TDC ADC
- Figure 2.44 Measured output spectrum of an input signal at 0.73MHz with calibration
- Figure 2.45 Measured SNDR and SFDR performances as the input frequency changes in NS-mode and CS-mode with calibration, post-processing and recovery
- Figure 3.1 Simulated frequency response of a 20-inch FR-4 trace and its time-domain response of a 50ps width pulse
- Figure 3.2 Simulated two-tap FFE pulses with different coefficients at TX side before and after passing through the 20-inch FR-4 trace
- Figure 3.3 Simulated frequency-domain response of a two-tap FIR filter with different coefficients
- Figure 3.4 Simulated channel response with a combine of FR-4 trace and Tx-side FFE with different coefficients
- Figure 3.5 Simulated 40Gbps PAM-4 signal eye-diagram after passing through a 20-inch FR-4 trace without equalization
- Figure 3.6 Simulated 40Gbps PAM-4 signal eye-diagram after passing through a 20-inch FR-4 trace with post-tap equalization
- Figure 3.7 CML-based PAM-4 transmitter front-end
- Figure 3.8 Direct-coupling-based PAM-4 transmitter front-end
- Figure 3.9 (a) Direct-coupling-based PAM-4 driver (b) the dependency of transistor's on-resistance on output voltage
- Figure 3.10 (a) voltage variation for output middle-levels (b) the sensitivity of output voltage on the size of driver's transistor
- Figure 3.11 The PAM-4 output voltage waveform and its corresponding current waveform with direct-coupling-based approach
- Figure 3.12 Resistor-DAC-based PAM-4 driver
- Figure 3.13 (a) Eye-diagram of Resistor-DAC-based approach at 30Gb/s with R=100 Ω (b) with R=400 Ω
- Figure 3.14 (a) Capacitor-DAC-based PAM-4 driver (b) its equivalent model (c) conceptual voltage and current transition waveform
- Figure 3.15 (a) Eye-diagram of Capacitor-DAC-based approach at 30Gb/s with no pre-emphasis (b) with 25% pre-emphasis
- Figure 3.16 The PAM-4 output voltage waveform and its corresponding current waveform with Capacitor-DAC-based approach
- Figure 3.17 A 2-tap pre-emphasis PAM-4 front-end with Capacitor-DAC-based approach
- Figure 3.18 Capacitor-DAC-based PAM-4 transmitter architecture
- Figure 3.19 Die micrograph of the Capacitor-DAC-based PAM-4 transmitter
- Figure 3.20 (a) Measured eye-diagram at 20Gb/s (b) Measured eye-diagram at 25Gb/s
- Figure 3.21 (a) RF transmitter with PAM-4 baseband modulator (b) non-linear input-output curve of RF transmitter (c) distorted and ideal PAM-4 eye-diagram
- Figure 3.22 (a) Mapping baseband input voltages on the non-linear transfer-curve (b) pre-distorted PAM-4 eye-diagram
- Figure 3.23 Binary-to-thermal encoder
- Figure 3.24 Pre-distortion-enabled Capacitor-DAC-based PAM-4 transmitter front-end
- Figure 3.25 (a) Evenly distributed PAM-4 eye-diagram (b) with bottom eye stretched (c) with middle-eye stretched (d) with top-eye stretched
- Figure 3.26 C2C-DAC architecture
- Figure 3.27 Pre-distortion-enabled C2C-based PAM-4 transmitter front-end
- Figure 3.28 Top-level architecture of pre-distortion-enabled C2C-based PAM-4 transmitter
- Figure 3.29 (a) Eye-diagram of the proposed C2C-based PAM-4 transmitter after passing through an 8-inch FR-4 trace with no equalization (b) with equalization
- Figure 3.30 (a) Resistor-DAC-based PAM-4 transmitter front-end without equalization (b) with two-tap equalization
- Figure 3.31 (a) Proposed R2R-DAC-based 2-tap FFE enabled PAM-4 transmitter front-end (b) extend the architecture to N-tap equalization
- Figure 3.32 (a) R-C model for speed analysis (b) for power analysis (c) for linearity analysis
- Figure 3.33 (a) non-linear eye-diagram with stretched middle-eye (b) non-linear eye-diagram with compressed middle-eye (c) RLM as driver's on-resistance varies (d) RLM as the ratio between on-resistance and 2R varies
- Figure 3.34 Regulate on-resistance through replica-biasing
- Figure 3.35 R2R-DAC-Based Transmitter Architecture
- Figure 3.36 Die micrograph of the R2R-DAC-Based PAM-4 transmitter
- Figure 3.37 Measured eye-diagram at 20Gb/s without equalization
- Figure 3.38 (a) Measured eye-diagram at 34Gb/s without equalization (b) with post-tap equalization

## **LIST OF TABLES**

- Table 2.1 Summary of Performance Comparison of the Pipeline-SAR-BS ADC With State-of-Art
- Table 2.2 Summary of Performance Comparison of the Hybrid-ADC With State-of-Art
- Table 3.1 Summary of Performance Comparison of Capacitor-DAC-Based PAM-4 Transmitter With State-of-Art
- Table 3.2 Summary of Performance Comparison for R2R-DAC-Based PAM-4 Transmitter With State-of-Art

## **ACKNOWLEDGEMENT**

People often say that "getting a Ph.D. is a lonely journey". I feel blessed to have been guided by great mentors and warmed by encouraging peers all along the way, so that I could become who I am today. Now, standing at the end of my five years of academic life at UCLA, I would say that the list of names to whom I owe my deepest gratitude can never be too long.

I would like to take this opportunity to thank my advisor, Professor Mau-Chung Frank Chang. I sincerely appreciate his guidance over these past years and will always keep his encouragement of "never limit yourself" in my mind. Without his support, I could hardly have had the chance to widely broaden my circuit and system design knowledge and skill set into various different fields within my graduate career. I would also like to thank Professor William Kaiser, Professor Wentai Liu and Professor Danijela Cabric for taking their time to serve on my committee.

I would like to thank Dr. Xicheng Jiang at Broadcom for his patient and insightful guidance during my internship and for later serving as the mentor for my Broadcom Fellowship project. I would also like to thank Dr. Young-Kai Chen at Bell Labs for giving me the opportunity to work on the high-speed wireline project, and for sharing with me his extensive knowledge and experience in both circuit design and optical communications.

I have learned a lot from my colleagues in the UCLA EE Department. Dr. Zuow-Zun Chen was always there to help and kindly answered my technical questions over these years. Dr. Fengbo Ren introduced me to many important concepts and techniques in algorithms and DSP-related VLSI design. Dr. Yuan Du and Rod Kim shared with me their knowledge of many aspects of high-speed wireline design and testing. I enjoyed the time spent discussing exciting new ideas with all of them.

Finally, I must thank my parents for their endless love, support and encouragement in the past years. I sincerely appreciate all they have done for me. Without them, I could have never walked so far and made my childhood dream come true.

## **VITA**

| 2004-2008 | B.S., Electronics and Information Engineering, Chu-Kochen Honors College, Zhejiang University, China |
|-----------|----------------------------------------------------------------------------------------------------------|
| 2008-2011 | M.S., Circuit and System, Zhejiang University, China                                                     |
| 2011-2016 | Ph.D. Candidate, University of California, Los Angeles                                                   |

## **PUBLICATIONS**

1. **B. Hu**, Y. Du, R. Huang, J. Lee, Y.-K. Chen and M.-C. F. Chang, "A Capacitor-DAC-Based Technique For Pre-Emphasis-Enabled Multi-Level Transmitters," IEEE Trans. Circuits Syst. II, Exp. Briefs, accepted.

2. **B. Hu**, Y. Du, R. Huang, J. Lee, Y.-K. Chen and M.-C. F. Chang, "A 34Gbps Voltage-Mode PAM-4 Wireline Transmitter With 2-Tap Feed-Forward-Equalization Utilizing R2R-Based Architecture," IEEE Trans. VLSI Systems, submitted.

3. **B. Hu**, F. Ren, Z.-Z. Chen, X. Jiang and M.-C. F. Chang, "An 8-Bit Compressive-Sensing ADC With 4GS/s Equivalent Speed Utilizing Self-Timed Pipeline SAR-Binary-Search," IEEE Trans. Circuits Syst. II, Exp. Briefs, Vol. 63, No. 10, Oct. 2016.

4. **B. Hu**, F. Ren, Z.-Z. Chen, X. Jiang and M.-C. F. Chang, "9-bit time-digital-converter-assisted compressive-sensing analogue-digital-converter with 4GS/s equivalent speed," IEE Electronics Letters, Vol. 52, No. 6, Mar. 2016.

5. Z.-Z. Chen, Y. Li, Y.-C. Kuan, **B. Hu**, C.-H. Wong and M.-C. F. Chang, "Digital PLL for phase noise cancellation in ring oscillator based I/Q receivers," 2016 IEEE Symp. VLSI.

# **Chapter 1 Introduction**

## **1.1 Motivation**



Figure 1.1 Data interface design trade offs

Data interfaces serve as the bridge between how we perceive the world and how machines do.

A typical data interface is composed of data converters, with or without pre-conditioning circuits. As data rates grow exponentially, data interfaces are very often the performance bottleneck of the whole system. Because of the non-idealities of the physical world, there are persistent trade-offs among achievable speed, dynamic range and power consumption.

Note that in certain cases we are not actually exploiting the full capability of such data interfaces when we complain that we have hit the limits of the technology. This observation rests on the fact that such data interfaces almost always employ Nyquist-Sampling, a safe yet conservative scheme which assumes that we have no prior knowledge of the data besides its instantaneous bandwidth. Data is acquired, conditioned and transformed at its raw volume.

We can do better than that under certain conditions. Many natural signals have a more concise, or so-called sparser, representation on a certain basis, meaning that only a small number of coefficients contain most of the information. As shown in Fig.1.1, for a signal in its sparse domain, if its coefficients are rearranged by magnitude, a significant decay appears once the index exceeds a relatively small number.
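The decay of sorted coefficients can be illustrated with a small numerical sketch. The signal below is hypothetical (three tones on exact FFT bins, with bin positions and amplitudes chosen only for illustration), and the names `sorted_spectrum_energy` and `k_needed` are not from the dissertation.

```python
import numpy as np

def sorted_spectrum_energy(n=1024, tone_bins=(37, 201, 410), amps=(1.0, 0.6, 0.3)):
    """Return FFT magnitudes sorted in descending order, together with the
    cumulative fraction of signal energy captured by the largest ones.

    The three-tone test signal is an assumption made for illustration only."""
    t = np.arange(n)
    x = sum(a * np.cos(2 * np.pi * k * t / n) for k, a in zip(tone_bins, amps))
    mags_sorted = np.sort(np.abs(np.fft.fft(x)))[::-1]
    energy = np.cumsum(mags_sorted ** 2) / np.sum(mags_sorted ** 2)
    return mags_sorted, energy

mags_sorted, energy = sorted_spectrum_energy()
# Smallest number of sorted coefficients holding at least 99.9% of the energy.
k_needed = int(np.searchsorted(energy, 0.999) + 1)
print(k_needed, "of", mags_sorted.size, "coefficients carry 99.9% of the energy")
```

Each real tone contributes a conjugate pair of Fourier coefficients, so in this toy case six of the 1024 sorted coefficients carry essentially all of the energy, and the remainder sit at numerical noise level.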

Take frequency-domain signal acquisition as a concrete example. Suppose we know that a narrow-band, frequency-sparse signal occupies only 1% of a 2GHz bandwidth, but may be located anywhere within it. Does this additional prior knowledge of 1% occupancy bring additional benefits for data interface design compared with the minimum knowledge of the 2GHz bandwidth?

The answer is yes. With this prior knowledge of spectral sparsity, it is possible to compress the data at the acquisition stage, which effectively reduces the downstream data burden, relaxes the circuit design effort and shifts the computation complexity from power/area sensitive places to non-sensitive places within the system.
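As a rough worked version of this example, the Nyquist rate implied by the bandwidth alone can be set against the spectrum that is actually occupied. This is a sketch: the symbols $B$, $\rho$, $f_{\mathrm{NS}}$ and $B_{\mathrm{occ}}$ are introduced here for illustration, and the band is assumed to run from DC to 2GHz.

```latex
\begin{aligned}
f_{\mathrm{NS}} &= 2B = 2 \times 2\,\mathrm{GHz} = 4\,\mathrm{GS/s},\\
B_{\mathrm{occ}} &= \rho B = 0.01 \times 2\,\mathrm{GHz} = 20\,\mathrm{MHz}.
\end{aligned}
```

A Nyquist-rate converter therefore has to produce 4GS/s of raw data even though the information-bearing content spans only 20MHz. How much of this gap can be exploited in practice depends on the acquisition and recovery scheme and is not implied by this arithmetic alone.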


How can we achieve this? Given such prior knowledge, we may turn to an alternative acquisition framework utilizing compressive-sensing [1-5].




Figure 1.2 (a) Nyquist-Sampling framework (b) Compressive-Sensing framework

General purpose signal acquisition uses Dirac samplers on a uniform time grid and follows Nyquist sampling theory, as shown in Fig.1.2 (a): the sampling rate should be at least twice the maximum frequency present in the signal. The output data is processed with a Fast-Fourier-Transform decoder at its raw data rate.

Compressive Sensing theory suggests an alternative data acquisition framework that can indirectly access the signal information in its sparse domain at a sub-Nyquist rate, as shown in Fig.1.2 (b). By incorporating randomness into the sampling process, sparse signal information can be encoded into far fewer samples, and the original signal can be robustly recovered in the digital domain by applying sparsity as prior knowledge, given the random-matrix. In addition, recent research from the signal processing community shows promising and powerful techniques for solving problems such as detection, classification and filtering directly in the compressed data domain with low-complexity computations [6]. Such a trend makes the CS-framework a potentially power- and hardware-efficient complement to its NS-framework counterpart for domain-specific sparse signal processing.



Figure 1.3 Potential applications of CS-enabled acquisition system

One kind of sparse signal is the spectrally sparse signal, which has a sparse coefficient representation on the Fourier basis. Such signals are widely encountered in signal processing.

Compressive sensing may find future applications [7-10] for such signals, for example acquiring spectrally sparse signals located anywhere within a wide frequency range, localization/detection of wireless transmitters for intelligence communication, power spectral density estimation and so on, as shown in Fig.1.3.

Though promising in theory, only a limited number of hardware implementations of CS-enabled data interfaces have been reported. These mainly focus on demonstrating relatively low-speed conversion with improved power efficiency [11-12], on wideband analog receiver front-ends with no embedded quantization [13], or on sampling the signal at the Nyquist rate and implementing compressive-sensing in the digital domain [14]. The combination of CS theory with high-speed power-efficient data converter architectures in mainstream deep-sub-micron CMOS technology has rarely been explored.

In the first part of this dissertation, we demonstrate our approach to filling the gap between the theory and the hardware implementation of CS-equipped devices from a hardware designer's perspective. We start from an architecture-level mapping from the abstract CS-based acquisition framework model to concrete ADCs. Then techniques at both the ADC architecture level and the circuit level are proposed and employed, leading to two high-speed power-efficient ADC prototypes that support both NS- and CS-based operation for certain spectral sparse signals.



Figure 1.4 (a) Electrical link with multi-level signaling (b) Optical link with multi-level signaling

On the other hand, although the aforementioned applications, where a specific sparse domain can be exploited for signal processing at a reduced data rate, are promising, there are many applications where the data flow does not show a statistically stable sparse characteristic, or where the rate is so high that the overhead introduced by the computation complexity outweighs the potential benefits of the CS-based framework, such as general purpose ultra-high-speed I/O interfaces. In such cases, it is a natural choice to remain within the general purpose NS-based framework and focus on how to improve the data conversion/transfer efficiency with novel architecture/circuit solutions.

High data-rate electrical and optical interconnects are in high demand to meet the exponentially increasing information capacity requirements of high-performance computing, chip-to-chip communication and data-center related applications. Electrical and optical links employing NRZ signaling at >20Gb/s have been reported [15-19]. Nevertheless, the parasitic capacitance introduced by chip pads and bonding/packaging limits the maximum symbol rate that can be delivered from the silicon chip to the electrical/optical channels with manageable ISI levels. In addition, connectors, vias and PCB traces in electronic backplanes, as well as fiber optical modulators, introduce severe loss and distortion at high speed/frequency operation. In many cases, complex equalization with bulky on-chip inductors is needed for signal compensation, raising serious issues in terms of energy and cost-effectiveness.

Multi-level modulation schemes with higher spectral efficiency, such as PAM-4 [20-22] or even PAM-10 [23], have attracted increasing attention recently for both electrical and optical link designs due to their relaxed circuit bandwidth and equalization requirements as well as their tolerance to clock jitter, as shown in Fig.1.4. Nonetheless, increasing the number of signal levels from 2 to 4 in the PAM-4 case inevitably leads to an SNR reduction of 9.5dB, which calls for high-swing driver techniques. Most of the previously reported PAM-4 transmitters were based on Current-Mode-Logic (CML) and can hardly be applied to driving devices with capacitive loading and high voltage swing requirements, such as Microring modulators or Mach-Zehnder modulators in optical communications, as shown in Fig.1.4 (b). On the other hand, the voltage-mode transmitter may be better suited for high-swing applications. Most previously reported equalization schemes for voltage-mode transmitters focus on NRZ signaling [24-26]. Voltage-mode PAM-4 transmitters have been reported, yet with no equalization implemented [27-28]. It is noted that multi-level signaling in the voltage domain with equalization is much more challenging than that in the current domain, due to the lack of an effective direct linear voltage combining technique.
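The 9.5dB figure can be checked with a line of arithmetic. The sketch below assumes equally spaced levels, the same peak-to-peak swing as NRZ and unchanged noise, so that each eye shrinks to 1/(M-1) of the NRZ eye height; the function name is illustrative only.

```python
import math

def pam_snr_penalty_db(levels: int) -> float:
    """SNR penalty of PAM-`levels` relative to NRZ at the same peak-to-peak swing.

    Assumes equally spaced levels and unchanged noise, so each of the
    (levels - 1) eyes has 1/(levels - 1) of the NRZ eye height.
    """
    if levels < 2:
        raise ValueError("need at least two levels")
    return 20.0 * math.log10(levels - 1)

print(round(pam_snr_penalty_db(4), 2))   # PAM-4 versus NRZ
```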

In the second part of this dissertation, we explore novel architectures suitable for high-speed power-efficient multi-level wireline transmitter design with embedded equalization functionality. We first analyze several existing multi-level transmitter architectures, namely the CML-based approach, the Direct-coupling-based approach and the Resistor-DAC-based approach, in terms of their speed, power efficiency, linearity and implementation complexity. Then the proposed Capacitive-DAC-based approach is introduced with detailed analysis and a silicon prototype. This architecture is further extended into a pre-distortion-enabled one which can adjust the opening of each of the 3 eyes. In addition, a C2C-based architecture is introduced to embed the equalization functionality into the pre-distortion-enabled one. For dedicated high-swing applications that require impedance matching, a compact R2R-based architecture is investigated and its concept is proven with a silicon prototype.

### **1.2 Dissertation Organization**

The dissertation is composed of four chapters.

Chapter two presents two ADC silicon prototypes, both of which support Nyquist-Sampling as well as Compressive-Sensing for certain spectral sparse signals.

Section 2.1 introduces the first prototype. This is a single-channel 8-Bit ADC with a self-timed pipelined clocking scheme, which supports both regular sampling and quantization in Nyquist-Sampling mode and randomness-embedded irregular sampling and quantization in Compressive-Sensing mode. At the architecture level, a 5-Bit comparator-interleaved Successive-Approximation-Register (SAR) ADC is combined with a 4-Bit Binary-Search (BS) ADC, with 1-Bit overlap for over-range correction. This architecture provides 1-Bit/cycle power-efficient two-step conversion and is able to absorb the inter-stage error into the 2nd stage. In addition, a passive-charge-sharing scheme together with an open-loop amplifier is adopted for effective pipeline cycle acceleration, which in turn leads to extended equivalent acquisition bandwidth. This approach results in an 8-Bit ADC with 0.5GS/s speed in Nyquist-Sampling mode with a Figure-of-Merit (FoM) of 239 fJ/CS and 4GS/s equivalent speed in Compressive-Sensing mode with a FoM of 71 fJ/CS.

Section 2.2 presents the second prototype, which is a 9-Bit dual-mode ADC design. Compared with the previous approach, this architecture combines a voltage-domain 5-Bit comparator-interleaved SAR ADC with a time-domain 5-Bit locally re-adjusted folding two-dimensional (2D) Vernier Time-Digital-Converter (TDC), with 1-Bit redundancy. The voltage-domain residue amplifier is replaced by a cross-domain Voltage-Time-Converter (VTC). This highly digital-intensive architecture leads to a FoM of 101 fJ/CS with an equivalent speed of 4GS/s.

Chapter three demonstrates three multi-level wireline transmitter designs targeting high-speed power-efficient I/O applications.

Section 3.1 first analyzes the drawbacks of the conventional Current-Mode-Logic (CML)-based approach, the Direct-Coupling-based approach and the Resistor-DAC-based approach, and then proposes the first novel architecture, a Capacitive-DAC-based approach. This technique can be directly applied to capacitive-loading applications that require high swing, such as driving optical modulators, and can also be utilized for electrical I/O transmitter design with a cascaded driver for impedance matching. A PAM-4 transmitter silicon prototype with 2-tap Feed-Forward-Equalization (FFE) is demonstrated as a proof of concept. It achieves a 25Gbps data rate and an energy efficiency of 2mW/Gbps.

Building on this, section 3.2 analyzes a transmitter-side pre-distortion scheme derived from the proposed Capacitive-DAC-based approach, which is critical to multi-level optical modulator driver design or baseband modulator design for carrier-based communication systems. The architecture is further extended into a C2C-DAC-based one with equalization.

Finally, an R2R-based transmitter architecture for voltage-mode multi-level high-swing electrical I/O applications is analyzed in section 3.3. Compared with the conventional Resistor-DAC-based approach, the R2R structure simplifies the high-speed digital decoding logic and effectively reduces the total number of unit cells within the driver stage, thus leading to a high-speed power-efficient design with compact implementation. It achieves a 34Gbps data rate and an energy efficiency of 2.7mW/Gbps.

Chapter four summarizes the dissertation with a conclusion.

# Chapter 2 Dual-Mode Nyquist-Sampling / Compressive-Sensing Enabled Data Converters

# 2.1 An 8-Bit Compressive-Sensing ADC With 4GS/s Equivalent Speed Utilizing Self-Timed Pipeline SAR-Binary-Search

### **2.1.1 Hardware mapping from theory to implementation**



Figure 2.1 (a) Spectral sparse signal (b) Nyquist-sampling in time-domain (c) Math equation describing Nyquist-sampling procedure

The correspondence between the time and frequency domains in the NS-framework is straightforward. For a sparse spectrum on an n-bin discrete-Fourier-basis represented by a vector  $f_{nx1}$ , as shown in Fig.2.1(a), only k bins contain significant coefficients, where k<<n. In Fig.2.1(b), n measurements  $x_{nx1}$  in time domain would be needed to get the k coefficients. The frequency coefficients can be retrieved by multiplying  $x_{nx1}$  with the inverse of the DFT basis matrix  $\psi_{nxn}$ :

$$f_{n\times 1} = \psi^{-1}_{n\times n} \times x_{n\times 1} \tag{2.1}$$

Figure 2.2 (a) Spectral sparse signal (b) Random-sampling in time-domain (c) Math equation describing compressive-sensing procedure

In the CS-framework, only m measurements of the signal, with m much smaller than n, are sufficient to recover the original k coefficients, as shown in Fig.2.2 (b), provided that the sampling process adopts a known random-matrix  $A_{mxn}$  and m satisfies:

$$m > k \times \log_2 n \tag{2.2}$$

It is shown in Fig.2.2 (c) that combining  $A_{mxn}$  and the orthogonal n-bin DFT basis  $\psi_{nxn}$  leads to the sampling matrix in frequency-domain as  $\phi_{mxn}$ . This encodes  $f_1...f_k$  into the compressed measurement as  $y_{mx1}$  in CS-operation instead of  $x_{nx1}$  in general NS-operation. This acquisition procedure can be expressed as:

$$y_{m\times 1} = A_{m\times n} \times \psi_{n\times n} \times f_{n\times 1} \tag{2.3}$$
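Equation (2.3) can be exercised numerically on a toy problem. The sketch below is only an illustration: it assumes a unitary inverse-DFT convention for  $\psi_{nxn}$ , uses a plain random subset of time indices for  $A_{mxn}$  rather than any particular hardware-friendly structure, and all sizes and function names are hypothetical.

```python
import cmath
import math
import random

def fourier_basis(n):
    """n x n matrix psi with x = psi * f (unitary inverse-DFT convention, an assumption)."""
    scale = 1.0 / math.sqrt(n)
    return [[scale * cmath.exp(2j * math.pi * t * b / n) for b in range(n)]
            for t in range(n)]

def selection_matrix(sample_indices, n):
    """m x n matrix A; row i holds a single 1 at the time index taken by measurement i."""
    return [[1 if t == idx else 0 for t in range(n)] for idx in sample_indices]

def matvec(mat, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * v for a, v in zip(row, vec)) for row in mat]

n, m = 16, 6                 # toy sizes, far smaller than a real acquisition
f = [0j] * n
f[3] = 1.0 + 0j              # two significant coefficients (k = 2)
f[11] = 0.5j

rng = random.Random(0)
picked = sorted(rng.sample(range(n), m))   # hypothetical random sampling instants

psi = fourier_basis(n)
A = selection_matrix(picked, n)
x = matvec(psi, f)           # Nyquist-rate samples, x = psi f
y = matvec(A, x)             # compressed measurements, y = A psi f
```

Because each row of this  $A_{mxn}$  contains a single "1", the measurement vector y is simply the subset of Nyquist-rate samples x taken at the picked instants.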

Any attempt to solve equation (2.3) directly is ill-posed, since the dimension of the unknown  $f_{nx1}$  is higher than that of the measurement  $y_{mx1}$ . However, we have the prior knowledge that the signal is sparse in the frequency domain. This means that most of the coefficients in vector  $f_{nx1}$  are non-significant and can thus be ignored with minimal effect on the recovered signal.

Thus, instead of the linear-projection-based calculation used in the Nyquist-Sampling framework, estimates of only the significant coefficients  $f_1...f_k$  (supposing the coefficients are arranged according to their magnitude) are calculated in the digital domain by solving non-linear convex optimization problems. Several effective recovery algorithms have been proposed and proven in theory by the signal processing community [29-32].

One straightforward way to understand the sampling procedure is depicted in Fig.2.2 (b). The procedure can be considered as first sampling the input signal uniformly with "virtual" samplers at time intervals of a minimum time grid  $\Delta t_{min}$ , and then picking the physical samplers from them according to the random-matrix  $A_{mxn}$ .  $\Delta t_{min}$  corresponds to the pre-defined physical time interval between any two neighboring elements in the same row of  $A_{mxn}$ . The equivalent CS acquisition bandwidth is given by the inverse of  $\Delta t_{min}$ , while the compression rate, which relates to the tolerable input signal sparsity level, is defined as the average physical sampling period over  $\Delta t_{min}$ .



Figure 2.3 (a) Random-sampling in time-domain (b) Mapping from math equation to hardware (c) Math equation describing compressive-sensing procedure

Hardware mapping from the math model needs to accomplish several tasks, as shown in Fig.2.3. First, the two-dimensional random-matrix must be embedded into the one-dimensional time axis so that a single ADC can operate on it. Second, the ADC logic must be designed to follow both the regular and the randomness-embedded irregular operation. Third, high-speed power-efficient quantization must be achieved through architecture and circuit level design. Fourth, the compressed data must be recovered/processed in the digital domain if CS-mode is enabled. These issues are discussed in the following sections.

### 2.1.2 System-Level Simulation



Figure 2.4 System-level simulated results of a spectral sparse signal in time/frequency-domain and its recovered version in time/frequency-domain utilizing CS-enabled acquisition framework

To model the acquisition system in physical world, additional noise should be brought into consideration, and equation (2.3) should be modified as:

$$y_{m\times 1} = A_{m\times n} \times \psi_{n\times n} \times f_{n\times 1} + n_{m\times 1} \tag{2.4}$$

where  $n_{mx1}$  models the additional noise during the measurement.

A system-level model resembling our targeted design is built in Matlab to verify the acquisition and recovery performance. The maximum physical sampling speed is set to 0.5GS/s and the equivalent acquisition bandwidth is set to 2GHz in CS-mode. The input signal consists of ten frequency tones distributed across the 2GHz bandwidth on 1024 DFT bins, which corresponds to a sparsity level of around 1%. Gaussian white noise is added to the signal to model the measurement noise term in (2.4) while keeping the Signal-to-Noise-Ratio (SNR) > 60dB. Fig.2.4 shows the simulated original signal and the recovered signal in both the frequency domain and the time domain. The recovered signals resemble their original counterparts with high accuracy in terms of both the location/magnitude in the frequency domain and the continuous waveform shape in the time domain.



Figure 2.5 Simulated recovered SNDR performance with different number of input tones and compression ratios

To further investigate the recovery performance under different compression ratios as the sparsity level of the input signal varies, the following experiment is conducted. Under each compression ratio of 2/4/6/8, the number of input tones, in other words the input signal sparsity level, is gradually increased. The signal is sent to the system model for acquisition and recovery, and the recovered Signal-to-Noise-and-Distortion Ratio (SNDR) is calculated. The recovered SNDR is defined as the ratio between the total power on the input signal bins and that on all the other bins of the recovered frequency spectrum. As can be seen from Fig.2.5, a higher compression ratio naturally shrinks the tolerable signal sparsity level. As the number of input frequency bins increases, the recovered SNDR degrades gracefully.



Figure 2.6 Simulated recovery detection rate performance with different number of input tones and compression ratios

Detection rate is another benchmark for evaluating the recovery performance. It is defined as the ratio of the number of correctly detected frequency bins to the total number of frequency bins in the input signal. This experiment is also conducted under different compression ratios while varying the input signal sparsity level. Each calculation is repeated for ten runs and the average value is taken as the detection rate. Fig.2.6 shows the same trend as Fig.2.5: for a higher compression ratio, the calculated detection rate starts to drop earlier and with a sharper slope as the number of input tones increases.
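The two benchmarks defined above, recovered SNDR and detection rate, can be written down compactly. The following is a minimal sketch with illustrative function names and a toy spectrum; how a bin is declared "detected" (for example by thresholding) is not specified here and is left to the caller.

```python
import math

def recovered_sndr_db(spectrum, signal_bins):
    """Power on the input-signal bins over power on all other bins, in dB."""
    signal_bins = set(signal_bins)
    p_signal = sum(abs(c) ** 2 for i, c in enumerate(spectrum) if i in signal_bins)
    p_other = sum(abs(c) ** 2 for i, c in enumerate(spectrum) if i not in signal_bins)
    if p_other == 0:
        return math.inf
    return 10.0 * math.log10(p_signal / p_other)

def detection_rate(true_bins, detected_bins):
    """Fraction of the input-signal bins that appear among the detected bins."""
    true_bins = set(true_bins)
    return len(true_bins & set(detected_bins)) / len(true_bins)

# Toy recovered spectrum: tones on bins 1 and 4, small residue elsewhere.
spectrum = [0.0, 10.0, 0.1, 0.0, 10.0, 0.1]
print(round(recovered_sndr_db(spectrum, [1, 4]), 2))
print(detection_rate([1, 4], [1, 5]))
```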

These two experiments indicate that both the compression ratio and the sparsity level of the input signal play a critical role in the signal recovery performance. While the compression ratio can be adjusted at the system level, the sparsity level of the input signal is mainly decided by the surrounding RF environment and thus may not be fully under control. For this reason, if the input signal is very densely populated in the frequency domain, the system needs to switch back to the general purpose Nyquist-Sampling framework with a reduced acquisition bandwidth.



Figure 2.7 Simulated recovered SNDR performance with different input SNDR and 6-bit/9-bit quantization

Since an ADC performs both sampling and quantization, the other major non-ideality added during the acquisition procedure is the quantization noise. After quantization, the quantized measurement in digital-domain can be represented as:

$$y_{q,\,m\times 1} = Q(A_{m\times n} \times \psi_{n\times n} \times f_{n\times 1} + n_{m\times 1}) \tag{2.5}$$

Here Q(.) models the quantization effect put on the noise-added measurement.

For an N-bit quantizer with a full-scale voltage range of  $V_{FS}$ , the quantization noise power can be calculated as:

$$\Delta_q^2 = \frac{1}{12} \left( \frac{V_{FS}}{2^N} \right)^2 \tag{2.6}$$

And the Signal-Quantization Noise-Ratio (SQNR) can be expressed as:

$$SQNR = \frac{\frac{1}{2} \left(\frac{V_{FS}}{2}\right)^2}{\frac{1}{12} \left(\frac{V_{FS}}{2^N}\right)^2} \tag{2.7}$$

Expressing both sides of equation (2.7) in decibels, it can be re-written as:

$$SQNR\,[\text{dB}] = 6.02 \times N + 1.76 \tag{2.8}$$
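As a quick numerical check, the ratio in equation (2.7) can be evaluated directly and compared with the approximation in equation (2.8). The sketch below is illustrative only; the function names are not taken from the original Matlab model.

```python
import math

def sqnr_db_exact(n_bits: int) -> float:
    """Equation (2.7) in dB: full-scale sine power over quantization noise power."""
    v_fs = 1.0                                   # cancels in the ratio
    signal_power = 0.5 * (v_fs / 2.0) ** 2
    noise_power = (v_fs / 2 ** n_bits) ** 2 / 12.0
    return 10.0 * math.log10(signal_power / noise_power)

def sqnr_db_rule(n_bits: int) -> float:
    """Equation (2.8): the 6.02 N + 1.76 dB approximation."""
    return 6.02 * n_bits + 1.76

for n_bits in (6, 9):                            # the two resolutions of Fig.2.7
    print(n_bits, round(sqnr_db_exact(n_bits), 2), round(sqnr_db_rule(n_bits), 2))
```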

To verify the combined effect of quantization noise and additional white measurement noise on the recovery performance, an ideal quantizer model is added to the previous CS acquisition system model in Matlab. For quantizer resolutions of 6-bit and 9-bit, the SNDR of the input signal is swept from low to high and the recovered SNDR is calculated. When the input SNDR is in its lower range, the recovered SNDR is linearly proportional to the input SNDR, indicating that the noise is dominated by the additional white noise. As the input signal SNDR increases further, the quantization noise starts to dominate. The recovered SNDR gradually saturates and approaches the theoretical SQNR upper limit of each of the two quantizers.



Figure 2.8 Simulated recovered SNDR performance with different input SNDR and different compression ratios

The last system-level simulation verifies the impact of the compression ratio on the recovered SNDR in the presence of both measurement noise and quantization noise. As shown in Fig.2.8, the recovered SNDR is calculated as the input SNDR increases from low to high under different compression ratios. It is noted that as long as the input signal sparsity level stays well within the bound given by equation (2.2), the recovered SNDR is only marginally affected by variation of the compression ratio.
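The bound of equation (2.2) can be evaluated for the ten-tone, 1024-bin example used earlier in this section. The sketch below assumes that the compression ratio equals n/m, which follows from defining it as the average physical sampling period over  $\Delta t_{min}$ ; the function name is illustrative.

```python
import math

def min_measurements(k: int, n: int) -> float:
    """Right-hand side of equation (2.2): m must exceed k * log2(n)."""
    return k * math.log2(n)

n_bins, n_tones = 1024, 10             # values used in the system-level model
bound = min_measurements(n_tones, n_bins)

for ratio in (2, 4, 6, 8):             # compression ratios swept in the simulations
    m = n_bins / ratio                 # assumes compression ratio = n / m
    print(ratio, round(m), m > bound)
```

Under these assumptions, all four compression ratios leave more measurements than the bound requires for ten tones.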

### 2.1.3 Random Matrix Design

As stated in 2.1.1, one of the most critical requirements for successful measurement compression is to embed randomness into the sampling procedure. For a vector dimension reduction from n to m, a random matrix with dimension m by n is necessary. However, two issues remain challenging when mapping this random matrix into a corresponding hardware implementation. First, since there are m rows, m parallel ADCs would need to work simultaneously if "1", which indicates sampling, could appear at any position within a row. Second, the physical time interval between any two neighboring elements within the same row is  $\Delta t_{min}$ , which is the inverse of the equivalent acquisition bandwidth. This means that the minimum sampling time window would be  $\Delta t_{min}$  if "1" appears in two consecutive elements.

Figure 2.9 Random-matrix

To avoid these two issues, the random matrix design adopts an approach similar to [11,33]. Each row of  $A_{mxn}$  is designed to contain only one sampling time-window. For row k, the time window is composed of a fixed part of  $8\Delta t_{min}$  from  $p_k$  to  $p_k+7$  and a subsequent variable part of random length  $q \times \Delta t_{min}$ , where q is uniformly distributed between 0 and 7. "1" indicates a physical sampling initiated by a single-pulse trigger while "0" indicates no physical action. The consecutive sampling time points can be expressed as:

$$t_{k+1} = t_k + (8+q) \times \Delta t_{\min} \tag{2.9}$$

The starting point  $p_{k+1}$  of the next row begins right after row k's sampling point. Since there is no overlap between any of the sampling time-windows, the two-dimensional  $A_{mxn}$  can be flattened onto the one-dimensional time axis. In this way, a single ADC works in a time-division-multiple-access-like scheme and thus avoids parallel operation of an ADC bank.



Figure 2.10 Sampling points on time-axis

Fig.2.10 shows the generated consecutive sampling time points within a fraction of the time window. They follow equation (2.9) with a minimum time interval of  $8\Delta t_{min}$ .
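The construction of equation (2.9) and the resulting one-sample-per-row matrix can be mimicked in software. The sketch below is an illustration under stated assumptions: Python's pseudo-random generator stands in for the on-chip randomness source, time is counted in units of  $\Delta t_{min}$  starting from zero, and the function names are hypothetical.

```python
import random

FIXED_SLOTS = 8      # fixed part of each window, in units of dt_min
MAX_Q = 7            # variable part q is drawn from 0..7

def sampling_slots(num_samples, rng):
    """Return sampling instants (in units of dt_min) following equation (2.9)."""
    slots, t = [], 0
    for _ in range(num_samples):
        t += FIXED_SLOTS + rng.randint(0, MAX_Q)   # t_{k+1} = t_k + (8 + q)
        slots.append(t)
    return slots

def selection_row(slot, n):
    """One row of A: a single 1 at the sampled grid position, 0 elsewhere."""
    return [1 if col == slot else 0 for col in range(n)]

rng = random.Random(1)                 # software stand-in for the on-chip randomness
slots = sampling_slots(5, rng)
n = slots[-1] + 1                      # grid long enough to hold the last sample
A = [selection_row(s, n) for s in slots]
gaps = [b - a for a, b in zip([0] + slots, slots)]
```

Every entry of `gaps` lies between 8 and 15, and no column of `A` holds more than one "1", which is what allows a single ADC to serve all rows in turn.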

### 2.1.4 Architecture Design

Fig.2.11 shows the system diagram at chip level. The whole acquisition system consists of the Dual-mode ADC, a UART controller, a memory block and a random-matrix clock generator.

Following the design scheme discussed in 2.1.3, the randomness-embedded sampling and quantization time point sequence is provided by a random-matrix clock generator. Similar to the architecture in [33], it is composed of two parts: a pulse generator with variable time interval and a randomness generator. The pulse generator takes the form of a variable-length looped shift-register chain, implemented in True-Single-Phase-Clock (TSPC) logic for high-speed power-efficient operation. The looped shift-register chain is composed of a fixed part of 8 registers and a variable part of 7 registers. Each generated pulse, after propagating through the fixed part, may further pass through any number of registers between 0 and 7 in the variable part before being fed back directly to the first register of the fixed part through an 8:1 Mux. The 3-bit select of the Mux, which controls the length of the register loop, is updated for each trigger by a 3-bit output of a 16-bit Fibonacci Linear-Feedback-Shift-Register (LFSR). Running at a peak speed of 500MHz, the LFSR and its control logic are directly synthesized from standard CMOS gates.
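A behavioral sketch of the randomness generator is given below. The feedback polynomial and the choice of which three bits drive the Mux select are not stated in the text, so both are assumptions made purely for illustration: a commonly used maximal-length 16-bit polynomial (taps 16, 14, 13, 11) and the three least-significant bits.

```python
def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR by one shift.

    Taps 16, 14, 13, 11 (x^16 + x^14 + x^13 + x^11 + 1) are an assumption;
    the text does not state the polynomial used on chip.
    """
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def next_interval(state: int):
    """Return (new_state, loop_length) for one trigger of the pulse generator."""
    state = lfsr16_step(state)
    q = state & 0b111            # assumed choice: three LSBs drive the 8:1 Mux select
    return state, 8 + q          # 8 fixed registers plus q variable ones

state = 0xACE1                   # any non-zero seed works
lengths = []
for _ in range(10):
    state, length = next_interval(state)
    lengths.append(length)
```

With a single shift per trigger, successive 3-bit values share bits and are therefore correlated; the bit selection used on chip is not described, so this model should not be read as a statement about the statistics of the actual design.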



Figure 2.11 System diagram of the dual-mode SAR-BS ADC at chip-level

Targeting 4GS/s equivalent speed in CS-mode,  $\Delta t_{min}$  should be set to the inverse of that speed, i.e. 250ps. Since the time interval between any two consecutive operation points is from 8 to 15  $\Delta t_{min}$ , the physical operation time for a single measurement collection varies from 2ns to 3.75ns. Switching the system from CS-mode to NS-mode is implemented by removing the randomness from the sampling time points and triggering the ADC with only the fixed part of  $8\Delta t_{min}$ , which leads to a general purpose ADC with 0.5GS/s sampling speed.

For flexibility, the signal recovery is implemented off-chip. From equation (2.3) it is noted that, besides the measurement itself, the random matrix is also needed for recovery. In this design, an on-chip memory block serves as a high-speed buffer to cache both the ADC's output and the random sampling time intervals q. The read/write/reset operation of the memory block is controlled by the on-chip UART controller. At a later stage, the recorded time intervals are used to rebuild the random matrix, which together with the recorded measurements provides the critical information for recovery in CS-mode.



Figure 2.12 Proposed self-timed pipeline SAR-BS ADC architecture

As stated before, this work focuses on mapping the CS theory to a CS-equipped data converter from a circuit and system designer's perspective. A power-efficient high-speed architecture that can follow both regular and randomness-embedded irregular sampling and quantization is essential for such hardware mapping.

Fig.2.12 shows the building blocks of the ADC core at system level. In this work, several new techniques at both architecture and circuit level are exploited to meet these requirements.

First, unlike the conventional multi-phase driven pipeline scheme, the proposed pipeline operation is fully self-propagated. Each operation is pulse-triggered by the completion of the previous event, which makes it fully compatible with both sampling and quantization schemes.

Second, at the architecture level, a 5-bit comparator-interleaved SAR-ADC is combined with a 4-bit BS-ADC, with 1-bit overlap for over-range correction. This architecture provides 1-bit/cycle power-efficient two-step conversion and is able to absorb the inter-stage error into the 2nd stage.

Third, a passive-charge-sharing scheme is adopted for effective pipeline cycle acceleration.

Fourth, instead of conventional closed-loop inter-stage residue amplification, an open-loop amplifier is exploited here for power-efficient residue amplification.

The concept of self-timed pipeline logics at system level and its timing diagram are shown in Fig.2.13.

Each pipeline cycle is initiated by the corresponding pulse from the random-matrix clock generator. The pulse first serves as the sampling signal of the front-end Capacitor-DAC. After that, it triggers the ADC digital logic chains. In the 1st-stage SAR-ADC, the two comparators work in a time-interleaved way to relax the critical timing requirement of the decision/reset procedure for each comparison [34]. In this way, the reset time of one comparator can overlap with the decision time of its counterpart, which effectively shortens the loop delay of each conversion cycle. After each trigger, each comparator automatically resets itself and sends out a ready signal once the comparison decision has been made. This ready signal is sent to the control logic, which generates the trigger signal for the next comparison and also advances the Finite-State-Machine (FSM) of the ADC to the next state. The latch bank successively holds the comparison result of each cycle and flips the split Cap-DAC accordingly. After 5 cycles, the control logic holds the operation of the SAR-ADC.

Figure 2.13 Proposed self-timed pipeline scheme

A binary-search ADC is a hybrid architecture evolved from both the SAR and the Flash ADC [35]. It keeps the same number of comparators as a Flash ADC while maintaining the 1bit/cycle characteristic of a SAR ADC. Keeping energy efficiency almost the same, a binary-search ADC trades off conversion speed against hardware cost compared with a pure SAR or Flash architecture. In the 2nd-stage BS-ADC, a delayed version of the random-matrix clock generator's pulse first resets and then sets the conversion clock of the 1st layer of the binary-search tree. The clock propagates through the 4-layer binary-search tree in a ripple-like form. In each layer, only one comparator is triggered. The selection of the next triggered comparator is based on the comparison result of the current cycle. For instance, for a 4-bit BS-ADC, the correct comparator triggering path for a voltage digitized as "13" is "8-12-14-13". The ready signals from the 4 triggered comparators are combined. The combined signal serves two purposes: first, it resets the Passive-Charge-Sharing (PCS) capacitor in front of the open-loop amplifier; second, it informs the control logic that the BS-ADC has finished its conversion. Once it has received the ready signals from both the SAR-ADC and the BS-ADC, the control logic sets up the PCS capacitor sample signal and resets the split Cap-DAC, awaiting the trigger of the next pipeline cycle.
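The comparator selection described above can be sketched with an idealized behavioral model. It assumes noiseless comparators and an input expressed in LSBs; the function name is illustrative and the model says nothing about the ripple timing of the actual circuit.

```python
def binary_search_path(v_in: float, n_bits: int = 4):
    """Return (thresholds visited, output code) for an ideal binary-search ADC.

    v_in is expressed in LSBs (0 <= v_in < 2**n_bits). One comparator fires per
    layer, and its decision selects the comparator of the next layer.
    """
    threshold = 1 << (n_bits - 1)       # first layer compares against mid-scale
    step = threshold >> 1
    path, code = [], 0
    for _ in range(n_bits):
        path.append(threshold)
        bit = 1 if v_in >= threshold else 0
        code = (code << 1) | bit
        threshold += step if bit else -step
        step >>= 1
    return path, code

path, code = binary_search_path(13.5)
print(path, code)   # [8, 12, 14, 13] 13
```

An input between 13 and 14 LSB triggers the comparators with thresholds 8, 12, 14 and 13 in turn, matching the "8-12-14-13" path quoted in the text.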

A comparison between the conventional two-stage pipeline SAR-ADC and the proposed architecture is depicted in Fig.2.14.

In the conventional approach, bottom-plate sampling is adopted. After the sampling switch is disconnected, the bottom plates of the sampling capacitors are first connected to ground, and only then does the first comparison start, using the voltage at the other plate of the sampling capacitors. In the proposed architecture, top-plate sampling is utilized and the first comparison starts directly on the sampling plate after the sampling switch is disconnected, thus speeding up the conversion.



Figure 2.14 Conventional two-stage pipeline SAR-ADC and proposed pipeline SAR-BS ADC

In the conventional approach, after coarse quantization within the first stage, the voltage residue on the Capacitor-DAC is amplified by the inter-stage residue amplifier. The settled voltage is sampled by the 2nd-stage SAR-ADC for fine quantization. During the whole amplification procedure, the Capacitor-DAC of the 1st-stage ADC is unavailable for the next pipeline cycle until the residue amplifier has fully settled. For simplicity, supposing that the closed-loop residue amplifier resembles a first-order system, the settling behavior can be described as:

$$V(t) = V_{final} (1 - e^{-\frac{t}{\tau}}) \tag{2.10}$$

where  $V_{final}$  is the final settled voltage at the output of the residue amplifier and  $\tau$  is the time constant of the closed-loop system:

$$\tau = \frac{C_{Ltot}}{\beta G_m} \tag{2.11}$$

where  $G_m$  is the trans-conductance of the residue amplifier,  $C_{Ltot}$  is the total loading capacitance at the output of the amplifier as:

$$C_{Ltot} = \frac{C_F C_{DAC}}{C_F + C_{DAC}} + C_L \tag{2.12}$$

where  $C_F$  is the feedback capacitor across the amplifier;  $C_{DAC}$  is the total capacitance of the Capacitor-DAC;  $C_L$  includes both the parasitic capacitance at the output and the sampling capacitance of the 2nd-stage ADC.

 $\beta$  represents the feedback factor, which is defined as:

$$\beta = \frac{C_F}{C_F + C_{DAC}} \tag{2.13}$$

From equation (2.10), it can be seen that settling to within a certain precision of the final value requires a corresponding time period. For instance, around  $7\tau$  is needed for a settling accuracy of 0.1%. This sets the lower bound on the time needed for inter-stage residue transfer. According to equation (2.11), for the same capacitor values and feedback factor, increasing the settling speed directly increases the power consumption.

From the above, it is clear that the settling time occupies the conversion period of both stages and severely lengthens the total pipeline cycle. Burning more power is necessary for higher conversion speed.

In the proposed approach, after coarse quantization, the Capacitor-DAC shares its residue passively with the Passive-Charge-Sharing capacitor and is then free for the next pipeline cycle. Unlike the closed-loop residue-amplifier-based approach, the residue transfer time here is decided by the Resistor-Capacitor time constant:

$$\tau = R_{switch} \frac{C_{DAC} C_{sample}}{C_{DAC} + C_{sample}} \tag{2.14}$$

where  $R_{switch}$  is the on-resistance of the charge-sharing switch and  $C_{sample}$  is the charge-sharing capacitor of the 2nd-stage. Comparing with equation (2.11), it can be seen that the time needed for inter-stage residue transfer is significantly shortened with proper sizing of the switch and charge-sharing capacitor and without any need of significantly pushing the power consumption up.



Figure 2.15 Waveform of 1st-stage ADC Cap-DAC and 2nd-stage ADC input in conventional and proposed architectures

Besides the previously mentioned speed-up from passive-charge-sharing, the pipeline cycle is further optimized at the architecture level. Fig.2.15 shows the waveforms at the inputs of the two sub-ADCs for both the conventional approach and the proposed architecture. Unlike the conventional architecture with a SAR-ADC in both stages, the 2<sup>nd</sup> stage of the proposed architecture is a BS-ADC. As discussed before, a BS-ADC trades hardware cost for speed while maintaining the power efficiency. Compared with a SAR-ADC, three factors improve the conversion speed of the BS-ADC as the 2nd stage in this design. First, there is no need for DAC switching after each comparison cycle, since the reference voltage of each comparator is provided by the reference resistor ladder. Second, the selected comparator can be triggered directly without waiting for the reset procedure from the previous cycle. Third, the sampling procedure is eliminated for the 2nd stage since it is merged with the passive-charge-sharing procedure. Since the pipeline cycle is decided by the maximum conversion time of the two stages, a balance between them is critical. In this design, the settling period of the open-loop amplifier is placed in the pipeline cycle of the second stage while the passive-charge-sharing is in the first stage. With proper design, the total time of open-loop-amplifier settling plus 4-bit BS conversion can be made comparable to that of 5-bit SAR conversion plus passive-charge-sharing, which makes the pipeline cycle of the proposed two-stage architecture well balanced and shorter than that of the conventional approach.





Figure 2.16 Circuit implementation of the comparator in SAR-ADC

The circuit implementations of several critical building blocks are discussed in this section.

The two comparators in the SAR-ADC are shown in Fig.2.16. Each comparator consists of two parts: a pre-amplifier and a strong-arm latch.

The pre-amplifier has dual differential inputs and a resistor load. One input differential pair is for the input signal while the other is for offset cancellation. The input of the calibration differential pair is driven by a tunable differential voltage. During the calibration phase, the signal input differential pair is shorted together while the tunable differential voltage sweeps from negative maximum to positive maximum. It is frozen when the latch output flips from low to high. In this way, the offset voltages of both the pre-amplifier and the strong-arm latch are cancelled. The strong-arm latch further amplifies the voltage difference at the pre-amplifier's output through positive-feedback-based regeneration and holds the regenerated value with an R-S latch.

To further accelerate the conversion for each cycle, a direct feed-through scheme is adopted in this design and is explained in Fig.2.17.

For a conventional self-timed SAR-ADC, one complete conversion cycle includes the following logic steps: comparison; ready signal generation; SAR status propagation and its related signal generation in the Finite-State-Machine; data storage in the flip-flops; comparator reset; DAC switching. Among them, the reset time has been eliminated by utilizing the interleaved two-comparator architecture. However, the total time of the remaining logic steps is still considerably long and thus severely degrades the maximum conversion speed. In this design, the remaining logic steps are further divided into two parallel parts. Instead of switching the DAC according to the output of the data storage flip-flops, the drivers of the DAC are controlled by a pair of CMOS latch banks, the inputs of which connect directly to the differential output of the Strong-Arm latch. The Capacitor-DAC architecture adopted in this design is the Split-Capacitor type. Fig.2.18 explains the working principle of the direct feed-through control together with the Split-Capacitor DAC.

Figure 2.17 Direct feed-through control logic

In state n, a pair of split capacitors  $C_n$  needs to be switched according to the comparison result of this same cycle. The bottom plates of  $C_n$  are directly driven by the buffers that are cascaded with the latches. The inputs of the latches connect to the outputs of the Strong-Arm-Latch. The initial outputs of the latches are set high. At the beginning of state n, the latches become transparent, while the differential outputs of the Strong-Arm-Latch have been reset to high. This generates the reference voltages "1" and "0" at the bottom plates of the Split-Capacitor pair, respectively, and leads to "0.5" at the common top plate if only this one Split-Capacitor pair is considered.



Figure 2.18 Co-work of the direct feed-through logic with split capacitor-DAC

The Strong-Arm-Latch begins to make its decision and is fully regenerated after a certain delay. One of the outputs remains "1" while the other flips down to "0". This change feeds directly through to the drivers of the bottom plates and thus switches the DAC accordingly. The common top plate will go either up to "1" or down to "0" from the previous centre point of "0.5". At the end of state n, the latches become opaque and the switching of the DAC for the current cycle is fixed.



Figure 2.19 Passive-charge-sharing with open-loop-amplifier

On the other side, after comparison, the fully regenerated output of the Strong-Arm-Latch is detected by the ready detector, which generates the ready signal that advances the SAR Finite-State-Machine to the next state. The output of the Strong-Arm-Latch is then sampled by the D-Flip-Flop bank and sent out after 5 cycles.

From the discussion above, it is seen that by cutting the long logic steps into two parallel running ones, the conversion cycle time can be effectively shortened with acceptable hardware cost.

The Passive-Charge-Sharing sampler together with the open-loop amplifier is shown in Fig.2.19. A top-plate sampler is adopted here for high speed and moderate resolution. The open-loop amplifier is a simple two-stage amplifier with resistor loads, targeting low gain but high speed. This amplifier serves two purposes: first, it amplifies the residue voltage sampled on the Passive-Charge-Sharing sampler; second, it protects this floating voltage against the kick-back noise from the comparators of the 2nd-stage BS-ADC. The offset of the amplifier is calibrated by an additional differential pair. Also, the common-mode of the output stage needs to be aligned with that of the comparators of the BS-ADC. These two calibrations are combined by utilizing an analog MUX and a clocked auxiliary comparator. For offset calibration, the analog MUX selects the differential output of the amplifier, and the input voltage of the calibration differential pair is tuned until the auxiliary comparator flips. For common-mode calibration, the common-mode of the amplifier, detected by a resistor divider, is compared with the reference common-mode. The tail-current source of the amplifier's 2nd stage is a current DAC and is digitally tuned in unit steps until the auxiliary comparator's output changes polarity.

Building blocks of the 2nd-stage BS-ADC are shown in Fig.2.20. A BS-ADC can be considered as an unrolled SAR-ADC, so there are 15 comparators in total for 4-bit quantization. Each comparator has a dual differential input pair, one of which is directly connected to the output of the open-loop amplifier while the other is connected to the reference voltages. Because of the calibration requirement, which is discussed later, the reference voltage for each comparator is provided by the resistor ladder and selected through a 16:1 analog MUX.



Figure 2.20 (a) Comparator and its related digital logics, timing and calibration circuits (b) ready generator for the whole BS-ADC (c) 15-to-4 priority encoder

According to the conversion cycles, the 15 comparators can be divided into 4 groups, which are "8", "4,12", "2,6,10,14" and "1,3,5,7,9,11,13,15", corresponding to the 4 conversion cycles, respectively. In each conversion cycle, one comparator is selected from that cycle's group based on the comparison result of the previous cycle. In this way, 4 comparators are triggered over the whole 4-bit conversion. The ready signal is generated by detecting the comparator's differential output with an XOR gate. The trigger clock for the next cycle is generated directly from the differential output of the comparator. Each comparator drives two comparators in the group of the next cycle. If the comparison result is positive, the upper-side comparator of the following group is triggered; otherwise, the lower-side one is triggered. In calibration mode, however, each comparator needs to be triggered individually, without the clock coming from its previous-cycle group. To handle this case, the clock signal of each comparator goes through a 2:1 MUX that decides whether it is triggered by the clock from its previous cycle or by the calibration clock generator.

After 4 conversion cycles, the ready signals from the last group (comparators "1,3,...,15") are combined and passed through a pulse generator to provide the ready signal for the whole conversion.

The raw outputs of the 15 comparators need to be encoded before being combined with SAR-ADC's output. They are sampled by the ready signal for the whole conversion and then sent to a 15-to-4 priority encoder to generate the output.
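The comparator selection and encoding described above can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions that are not taken from the design itself: ideal thresholds at $k/16$ of a hypothetical full scale `vref`, untriggered comparators reading 0, and a priority encoder that returns the index of the highest comparator whose output is 1.

```python
def bs_adc_convert(vin, vref=1.0, bits=4):
    """Behavioral binary-search conversion.

    Comparator k (k = 1 .. 2**bits - 1) is assumed to have the ideal
    threshold k * vref / 2**bits. Returns (code, fired, raw): the output
    code, the comparators triggered in each cycle, and the raw output
    vector in which untriggered comparators are assumed to read 0.
    """
    levels = 1 << bits
    idx = levels // 2        # first cycle: the mid-scale comparator ("8")
    step = levels // 4       # distance to the two candidates of the next group
    code, fired = 0, []
    raw = [0] * (levels - 1)
    for _ in range(bits):
        fired.append(idx)
        decision = vin > idx * vref / levels
        raw[idx - 1] = int(decision)
        code = (code << 1) | int(decision)
        # A positive result triggers the upper comparator of the next
        # group, a negative result the lower one.
        idx = idx + step if decision else idx - step
        step //= 2
    return code, fired, raw

def priority_encode(raw):
    """Index of the highest comparator whose output is 1 (0 if none);
    raw[k - 1] holds the output of comparator k."""
    for k in range(len(raw), 0, -1):
        if raw[k - 1]:
            return k
    return 0

code, fired, raw = bs_adc_convert(0.7)
print(code, fired, priority_encode(raw))
```

In this model exactly one comparator from each of the four groups fires per conversion, and the priority-encoded raw outputs agree with the code assembled bit by bit from the decisions.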



Figure 2.21 Circuit implementation of the comparator in BS-ADC

The circuit implementation of the comparators in the BS-ADC is shown in Fig.2.21. It is a dual-differential-input Strong-Arm-Latch without a pre-amplifier. As discussed before, the inter-stage open-loop amplifier serves as the common pre-amplifier for all 15 comparators.



Figure 2.22 Reference-voltage-fitting calibration

The calibration feasibility of the proposed architecture compared with a conventional two-stage SAR-ADC is shown in Fig.2.22. Besides the calibrations of the SAR-ADC's comparators and the open-loop amplifier, the inter-stage gain error and non-linearity introduced by passive-charge-sharing and open-loop amplification are mitigated by inter-stage reference fitting. For each of the 15 comparators in the BS-ADC, the reference voltage can be adjusted by +2/-2 LSB around its original value in steps of <sup>1</sup>/<sub>4</sub> LSB. During calibration, the calibration capacitor in the Cap-DAC and a 4-bit resistor DAC are used together to generate 15 calibration voltages from -7LSB to +7LSB, one for each comparator of the BS-ADC. The correct reference voltage on the non-linear transfer curve for each comparator is searched and fixed during this mapping procedure. Since the calibration voltages are generated on the signal path and experience the same passive-charge-sharing and open-loop amplification as a normal input signal, the signal-path imperfections, as well as the comparator offsets of the BS-ADC, are pre-distorted in the analog domain through this calibration procedure [36].

## 2.1.6 Signal Recovery



Figure 2.23 Concept flow of signal recovery

The concept flow of signal recovery in CS-mode at system level is shown in Fig.2.23. As discussed before, although randomness is embedded in the sampling, this randomness must be provided as critical information to the recovery process. The reconstruction of the random matrix  $A_{m \times n}$  can be considered the reverse of the generation of the sampling time points. The number of unit minimum time intervals between consecutive random sampling positions is recorded in the on-chip memory block together with the corresponding quantized measurement of the signal amplitude. In the recovery process, the sampling position in each row of  $A_{m \times n}$  is calculated from the recorded numbers. This reconstructed matrix  $A_{m \times n}$  is multiplied by the inverse DFT basis matrix  $\psi_{n \times n}^{-1}$  to get the random-sampling matrix  $\phi_{m \times n}$ .  $\phi_{m \times n}$  and the time-domain measurements y are sent to the following steps for recovery.
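As an illustration of this reconstruction step, the sketch below rebuilds the sampling positions from the recorded interval counts and forms the random-sampling matrix. It uses NumPy; the function names, and the convention that the first recorded interval is counted from index 0 of the frame, are assumptions made for the example rather than details of the prototype.

```python
import numpy as np

def sampling_positions(gaps):
    """Rebuild sample indices from the recorded numbers of unit time
    intervals between samples. Assumption: the first gap is counted
    from index 0 of the frame."""
    return np.cumsum(np.asarray(gaps, dtype=int))

def random_sampling_matrix(gaps, n):
    """Return (A, Phi): A is the m x n row-selection matrix and
    Phi = A @ Psi^{-1}, where Psi is the n-point DFT matrix."""
    pos = sampling_positions(gaps)
    if pos[-1] >= n:
        raise ValueError("sampling positions exceed the frame length n")
    A = np.zeros((len(pos), n))
    A[np.arange(len(pos)), pos] = 1.0   # one sampled time instant per row
    k = np.arange(n)
    # Inverse DFT matrix, using the same sign and scaling as numpy.fft.ifft.
    psi_inv = np.exp(2j * np.pi * np.outer(k, k) / n) / n
    return A, A @ psi_inv

A, Phi = random_sampling_matrix([2, 3, 1, 5], n=16)
print(sampling_positions([2, 3, 1, 5]), Phi.shape)
```

With this construction, multiplying `Phi` by a vector of Fourier coefficients returns the time-domain samples at the randomly selected instants, which is the relation that the recovery step inverts.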

The target of recovery is to find the optimal signal estimate given the random matrix and the measurements. One effective recovery method is the Orthogonal Matching Pursuit (OMP) algorithm [31]. This algorithm incorporates least-square (LS) steps to compute an estimate of the original signal. The concept flow of the algorithm can be briefly described as follows:

- 1. Initialize the residue  $r_0$=y and the active index set as empty, and set the iteration counter i=1.
- 2. Find the index  $\lambda_i$  by selecting the column with maximum correlation with the residue  $r_{i-1}$ .
- 3. Augment the index set and the matrix of chosen columns.
- 4. Solve the least square problem for a new estimation f<sub>i</sub> of the original signal f.
- 5. Compute the new residue.
- 6. Increment counter i until all significant coefficients have been found.
- 7. Retrieve the final estimate f.
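The seven steps above can be sketched in a few lines of NumPy. This is a generic textbook-style version, not the hardware-oriented scheme of [37]: it calls a library least-squares solver in every iteration, assumes the columns of the sampling matrix have equal norm, and takes the number of coefficients to find (`n_coeffs`, a hypothetical parameter) as the stopping rule.

```python
import numpy as np

def omp(Phi, y, n_coeffs, tol=1e-9):
    """Orthogonal Matching Pursuit: estimate a sparse f with y ~= Phi @ f.
    Assumes n_coeffs >= 1 and equal-norm columns in Phi."""
    y = np.asarray(y, dtype=complex)
    residue = y.copy()                    # step 1: r_0 = y
    active = []                           # step 1: empty active index set
    f = np.zeros(Phi.shape[1], dtype=complex)
    for _ in range(n_coeffs):
        # Step 2: column with maximum correlation with the residue.
        corr = np.abs(Phi.conj().T @ residue)
        if active:
            corr[active] = 0.0            # never pick the same column twice
        active.append(int(np.argmax(corr)))          # step 3
        # Step 4: least-squares fit using the chosen columns only.
        coeffs, *_ = np.linalg.lstsq(Phi[:, active], y, rcond=None)
        # Step 5: new residue, orthogonal to all chosen columns.
        residue = y - Phi[:, active] @ coeffs
        if np.linalg.norm(residue) < tol:             # step 6
            break
    f[active] = coeffs                    # step 7: final estimate
    return f, active
```

A call such as `f_hat, idx = omp(Phi, y, n_coeffs=3)` returns the estimated coefficient vector and the selected column indices in the order they were picked.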



Figure 2.24 State-transfer-diagram of OMP-based recovery

A more detailed explanation of the basic steps, taking the OMP algorithm as an example, is shown in Fig.2.24, which follows the scheme proposed in [37]. Based on the data-flow dependency, the computation can be divided into 7 major states as shown. All the states serve to compute the analytical solution of the least-square problem.

At S\_corr, the column  $a_{\lambda i}$  in the sampling matrix  $\phi$  having the maximum correlation with the current residue  $r_i$  is found and added into the active set  $\phi_t$ .

As shown in the equations from the 2nd line in Fig.2.24, calculating the analytical solution of the least-square problem involves inverting a matrix. Computationally friendly matrix inversion algorithms require the matrix to be positive definite. For this reason, as shown in the 3rd line in Fig.2.24, both sides are multiplied by the transpose of  $\phi_t$ , making  $\phi_{Dt}$  a positive definite matrix.

At S\_ $\phi$ , since  $\phi_t$  is updated with the newly added column,  $\phi_{Dt}$  is also updated accordingly. The additional matrix elements, which are all related to  $a_{\lambda i}$ , are computed, respectively.

For matrix inversion,  $\phi_{Dt}$  needs to be decomposed into the product of three parts: the lower triangular matrix  $L_t$ ; the diagonal matrix  $D_t$ ; and the upper triangular matrix  $L_t^T$ . Instead of redoing the decomposition each time  $\phi_{Dt}$  is updated, which is computationally very expensive, these three matrices are updated each time based on their values from the previous iteration. The state S\_L\_D performs the computations needed to update these matrices.

With the updated matrices  $L_t$ ,  $D_t$  and  $L_t^T$ , the task of inverting  $\phi_{Dt}$  is transformed into inverting  $L_t$ ,  $D_t$  and  $L_t^T$  separately and calculating the product of the three inverses. The state S\_Inv is responsible for all the related computation. Owing to their specific shapes, inverting these three decomposed matrices is computationally much cheaper than directly inverting  $\phi_{Dt}$ . Once  $\phi^{-1}_{Dt}$  is available, the update  $d_i$  of the signal estimate in the current iteration can be calculated in state S\_d by multiplying it with the transposed matrix  $\phi^{T}_{t}$  and the residue  $r_{i-1}$ . The residue  $r_i$  for the current iteration is updated in state S\_r by subtracting the product of  $\phi_t$  and  $d_i$ , which removes the contribution of the newly added part of the signal estimate from the residue and ensures that it is orthogonal to the active set  $\phi_t$ .
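One standard way to update such a decomposition when a single column is appended is sketched below in NumPy. It is meant only to illustrate the idea behind the states S\_L\_D and S\_Inv; the exact update equations of Fig.2.24 and [37] may differ. The code uses the conjugate transpose so that it also covers complex sampling matrices, and all function names are illustrative.

```python
import numpy as np

def ldl_append(L, d, Phi_t, a):
    """Given G = Phi_t^H Phi_t = L diag(d) L^H, return the factors of the
    Gram matrix obtained after appending column `a` to Phi_t."""
    k = L.shape[0]
    g = Phi_t.conj().T @ a              # cross terms with the existing columns
    c = np.vdot(a, a).real              # new diagonal entry of the Gram matrix
    # New bottom row of L follows from L D l = g (no refactorization needed).
    l = np.linalg.solve(L, g) / d if k else np.zeros(0, dtype=complex)
    d_new = c - np.sum(d * np.abs(l) ** 2)
    L_new = np.eye(k + 1, dtype=complex)
    L_new[:k, :k] = L
    L_new[k, :k] = l.conj()
    return L_new, np.append(d, d_new)

def gram_inverse(L, d):
    """Inverse of L diag(d) L^H as the product of the three factor inverses."""
    L_inv = np.linalg.inv(L)            # inverse of a unit lower-triangular matrix
    return L_inv.conj().T @ np.diag(1.0 / d) @ L_inv
```

Starting from `L = np.eye(0, dtype=complex)` and `d = np.zeros(0)`, calling `ldl_append` once per selected column keeps the factors current throughout the iterations.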

At the end of the current iteration, the signal estimation  $f_t$  is updated. If all the major coefficients have been found,  $f_t$  will be retrieved as the final result; otherwise the iteration will continue.



Figure 2.25 Time and frequency-domain waveforms of a simulated recovery procedure

Simulated results of the signal recovery process discussed above, for a spectrally sparse signal composed of 3 complex Fourier coefficients, are shown in Fig.2.25. As the iterations proceed, the three complex coefficients are computed one by one until the time-domain signal fully resembles the original one.

## 2.1.7 Experiment Results



Figure 2.26 Die micrograph of the dual-mode pipeline-SAR-BS ADC

The prototype is implemented in a standard 65nm CMOS technology. The output signals are post-processed and recovered.

The die micrograph of the proposed dual-mode ADC is shown in Fig.2.26. It takes an active core area of  $260\,\mu\text{m} \times 600\,\mu\text{m}$ .



Figure 2.27 Measured DNL and INL performance before and after calibration

The DNL and INL performances are tested in the Nyquist-Sampling-mode, with the converter operating as a general-purpose ADC, and are shown in Fig.2.27. After calibration, the DNL is between -0.6 and +1.5 LSB and the INL is between -0.8 and +1.4 LSB.



Figure 2.28 Measured SNDR and SFDR performances as the input frequency changes in NS-mode and CS-mode with post-processing and recovery

Fig.2.28 shows the SNDR and the SFDR performance in both NS-mode and CS-Mode across their corresponding frequency bands after post-processing and recovery. The SNDR is 40.2dB at low frequency and 37.1dB near Nyquist frequency in NS-mode. The SNDR is 36.2dB near the equivalent acquisition bandwidth in CS-mode.

The output spectrum of the ADC in its NS-mode with 0.73MHz input signal is shown in Fig.2.29. The SNDR is 40.2dB and the SFDR is 47.72dB.

The post-processed and recovered output spectrum of the ADC in its CS-mode with 1941MHz input signal is shown in Fig.2.30. It shows a recovered SNDR of 36.2dB and SFDR of 47.3dB.



Figure 2.29 Measured output spectrum of an input signal of 0.73MHz in NS-mode with a decimation of 64



Figure 2.30 Recovered output spectrum of an input signal of 1941MHz in CS-mode



Figure 2.31 Recovered output spectrum of a 3% spectral sparse signal



Figure 2.32 Recovered output spectrum of a spectral sparse two-band AM signal

The capture of multi-tone input signals is demonstrated in Fig.2.31. For a 3%-sparsity input signal composed of 30 randomly located active frequency bins out of 1024 bins across a 2GHz frequency band, the output signal is captured and recovered. It can be seen that each recovered active frequency bin closely matches its counterpart in the original signal.

The acquisition of modulated signal with multiple carrier frequencies located aside each other is shown in Fig.2.32.

The input signal is a two-carrier spectral sparse AM signal. The two carrier frequencies are located at 404MHz and 1186MHz respectively. Each baseband signal is represented by 6 coefficients randomly located on DFT-basis bins within 30MHz.



Figure 2.33 Ideal and demodulated two baseband signals

The ideal and the demodulated baseband signals are shown in Fig.2.33. The time-domain recovery SNDR, which is defined here as the ratio of the normalized ideal signal amplitude over the normalized maximum deviation of the recovered signal from the ideal signal, is 37dB and 36dB for the two baseband signals, respectively.
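A minimal sketch of this time-domain metric is given below. Normalizing both waveforms by the peak of the ideal signal is an assumption made for the example; the exact normalization used for the measured figures is not spelled out here.

```python
import math

def time_domain_sndr_db(ideal, recovered):
    """Ratio, in dB, of the normalized ideal amplitude to the normalized
    maximum deviation of the recovered signal from the ideal one."""
    peak = max(abs(v) for v in ideal)
    # Assumption: both waveforms are normalized by the ideal signal's peak,
    # so the normalized ideal amplitude is 1.
    max_dev = max(abs(r - i) / peak for r, i in zip(recovered, ideal))
    if max_dev == 0.0:
        return math.inf          # perfect recovery
    return 20.0 * math.log10(1.0 / max_dev)

ideal = [0.0, 1.0, 0.0, -1.0]
recovered = [0.01, 0.99, 0.0, -1.0]
print(f"{time_domain_sndr_db(ideal, recovered):.1f} dB")
```

Under this definition, a maximum deviation of about 1.4% of the peak amplitude corresponds to roughly 37 dB.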

The performance of the prototype and its comparison with state-of-the-art NS and CS-ADCs are summarized in Table 2.1. A  $40 \times$  wider acquisition bandwidth is achieved compared with its prior-art CS counterpart. It also shows significant power and area reduction compared with its NS-ADC counterparts by exploiting CS for certain spectral sparse signals.

|  | This work |  | [11] |  | [38] | [39] | [40] |
|---|---|---|---|---|---|---|---|
| Channel Number | 1-way |  | 1-way |  | 4-way TI | 1-way | 2-way TI |
| Sample-Mode | Nyquist | CS | Nyquist | CS | Nyquist | Nyquist | Nyquist |
| Resolution (bit) | 8 |  | 10 |  | 8 | 6 | 4 |
| Sample-rate (GS/s) | 0.5 | 4\* | 0.0095 | 0.1\* | 4 | 4.1 | 4 |
| Power (mW) | 10\*\* | 15\*\*\* | 0.55\*\* | 0.63\*\*\* | 76 | 76 | 20 |
| SNDR (dB) (Low/Nyquist) | 40.2/37.1 | -/36.2 | 57.6/49.9 | -/55.9 | 31.2 | 31.2 | 24.1 |
| FoM (fJ/conv.-step) | 239 | 71 | 92 | 12 | 219 | 625 | 378 |
| DNL (LSB) | -0.6/+1.5 |  | -1/+0.5 |  | -0.75 | -0.48/+0.49 | -0.18/+0.18 |
| INL (LSB) | -0.8/+1.4 |  | -1/+1 |  | -1.5 | -0.74/+0.74 | -0.11/+0.11 |
| Active-Area (mm<sup>2</sup>) | 0.16 |  | 0.15 |  | 1.35 | 0.38 | 0.15 |
| Architecture | Pipeline-SAR-BS |  | SAR |  | Pipeline | Flash | Flash |
| Technology (nm) | 65 |  | 90 |  | 65 | 90 | 65 |

\* Equivalent sampling speed; \*\* Core consumption; \*\*\* With random-matrix generator

Table 2.1 Summary of Performance Comparison of the Pipeline-SAR-BS ADC With State-of-Art
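The FoM entries for this work are consistent with the widely used Walden figure of merit, $P / (2^{ENOB} f_s)$ with $ENOB = (SNDR - 1.76)/6.02$. That the table uses this definition is an assumption; the sketch below only checks that it matches the listed power, sample rate and SNDR values.

```python
def walden_fom_fj(power_mw, fs_gsps, sndr_db):
    """Walden figure of merit in fJ per conversion step.
    Assumption: FoM = P / (2**ENOB * fs), ENOB = (SNDR - 1.76) / 6.02."""
    enob = (sndr_db - 1.76) / 6.02
    energy_j = (power_mw * 1e-3) / (2.0 ** enob * fs_gsps * 1e9)
    return energy_j * 1e15

# Values taken from the "This work" columns of Table 2.1.
print(f"Nyquist mode: {walden_fom_fj(10.0, 0.5, 40.2):.0f} fJ/conv.-step")
print(f"CS mode:      {walden_fom_fj(15.0, 4.0, 36.2):.0f} fJ/conv.-step")
```

With the low-frequency SNDR in Nyquist mode and the equivalent sampling speed in CS mode, the two results come out at about 239 and 71 fJ per conversion step.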

# 2.2 A 9-Bit Time-Digital-Converter-Assisted Compressive-Sensing ADC With 4GS/s Equivalent Speed



## 2.2.1 Time-Domain Signal Processing

Figure 2.34 Different information representation forms in time-domain

This section introduces an alternative approach for high-speed dual-mode ADC design.

Information can be represented not only as a voltage waveform but also as a time waveform. As shown in Fig.2.34, information can be encoded in several different styles on the time axis. Pulse-Width-Modulation (PWM) encodes the information into the width of the pulse. In each cycle, the rising edge occurs at a fixed time interval while the time of the falling edge is decided by the encoded information. A more efficient version of PWM is Double-Pulse-Width-Modulation (DPWM), which doubles the data rate by encoding information into the timing of both the rising edge and the falling edge. Other time-domain encoding styles include Pulse-Position-Modulation (PPM), which represents the encoded information by the distance on the time axis between consecutive pulses, and Pulse-Frequency-Modulation (PFM), which embeds the information into the frequency variations of the pulses.



Figure 2.35 Corresponding building blocks for voltage-domain signal processing and time-domain signal processing

Devices that bridge the time-domain waveforms and their digital representations are essential. As shown in Fig.2.35, the Analog-Digital-Converter (ADC) and Digital-Analog-Converter (DAC) are the key interface components in voltage-domain measurement, while their counterparts, the Time-Digital-Converter (TDC) and Digital-Time-Converter (DTC), quantize the time axis.

A one-to-one mapping of the building blocks can be made from the voltage-domain to the time-domain in terms of their functionalities.

In voltage-domain, a DAC provides reference voltages corresponding to the digital codes. One very simple implementation is to divide the power rail equally into many sections with a resistor ladder and switch the output to the specific section according to the digital code. In time-domain, the function of the resistor ladder is provided by an inverter chain. Each inverter is one unit section, like a resistor in voltage-domain. The variable-length output pulse is generated by a D-flip-flop. For each output, the input pulse first sets the D-flip-flop and then passes through the inverter chain. A MUX selects the output of one section of the inverter chain according to the digital code, playing the same role as the switches in the DAC. The selected output resets the D-flip-flop. In this way, the digital code is represented in the time-domain as a pulse of the corresponding length.

An ADC quantizes voltage-domain signals and provides their equivalent digital representations. One simple implementation of a Flash-type ADC includes a resistor-ladder-based DAC and  $2^{n}$ -1 comparators, where n is the resolution. The comparators compare the input signal with their reference voltages and give out the results in the digital domain. Similarly, a Flash-type TDC incorporates the inverter-chain-based DTC discussed above to provide  $2^{n}$ -1 time references. D-flip-flops serve as the time-domain comparators. In each comparison cycle,  $2^{n}$ -1 gradually delayed copies of the input pulse trigger the D-flip-flops, which sample the original non-delayed version. The width of the input pulse is thus quantized with a unit step of one inverter delay.
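The DTC and Flash-type TDC described above can be captured in a short behavioral model. The sketch assumes ideal, identical unit delays expressed as integer picoseconds, and resolves the boundary case in which a delayed edge coincides with the falling edge of the pulse as a "1"; both choices, and the function names, are illustrative only.

```python
def dtc_pulse_width(code, t_unit_ps, bits):
    """Inverter-chain DTC: the MUX taps the chain after `code` unit delays,
    so the D-flip-flop stays set for code * t_unit_ps picoseconds."""
    if not 0 <= code < (1 << bits):
        raise ValueError("code out of range")
    return code * t_unit_ps

def flash_tdc(width_ps, t_unit_ps, bits):
    """Flash TDC: tap k delays the rising edge by k unit delays, and its
    flip-flop samples the original pulse, reading 1 while it is still high.
    Returns (code, thermometer)."""
    taps = range(1, 1 << bits)                    # 2**bits - 1 time references
    thermometer = [int(k * t_unit_ps <= width_ps) for k in taps]
    return sum(thermometer), thermometer

width = dtc_pulse_width(code=9, t_unit_ps=10, bits=4)
print(width, flash_tdc(width, t_unit_ps=10, bits=4)[0])
```

Feeding the DTC output back into the TDC with the same unit delay returns the original code, which mirrors the DAC and ADC relationship in the voltage-domain.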

Although DAC and DTC, as well as ADC and TDC, share many similarities, several unique characteristics of time-domain signal processing compared with its voltage-domain counterpart are worth noting:

Firstly, the voltage-domain headroom is limited by the supply voltage, which keeps shrinking. As the technology node goes from 0.35um to 28nm deep sub-micron, the supply voltage drops from 3.3V down to 0.9V, which leads to an SNR degradation of more than 20dB supposing the same noise level. However, as discussed before, time-domain signal processing quantizes the width of the pulse (in the PWM case) instead of the amplitude. In other words, the supply voltage can in theory be extremely low as long as the two states "0" and "1" are detectable. Since there is no fundamental limitation on the length of a pulse, the time-domain headroom can be arbitrarily high at the cost of a longer conversion time. This potentially provides a higher dynamic range in exchange for a lower conversion speed.

Secondly, deep sub-micron transistors provide a very poor intrinsic gain of only around 10, which is defined as:

$$A_{v} = g_{m}r_{o} \tag{2.15}$$

where  $g_m$  is the trans-conductance and  $r_o$  is the output impedance. This puts a heavy burden on voltage-domain amplifier design. For instance, a 0.1% static error requires the amplifier to have at least 60dB DC gain. With an intrinsic gain of 10, three cascaded stages are needed if a cascode structure is not available because of limited headroom. The bandwidth of such a multi-stage amplifier is limited and the frequency compensation becomes very complicated.

On the other hand, the delay of an inverter built with such transistors, which is decided by the on-resistance and the parasitic capacitance, can be less than 10ps, providing fine resolution in time-domain.

In addition, the basic components in time-domain signal processing are digital-intensive circuits such as inverters, MUXes and D-flip-flops. The power consumption of such circuits scales dynamically with the operation frequency.

These characteristics make it attractive to incorporate time-domain signal processing into the dual-mode ADC design.

## 2.2.2 System-Level Architecture



Figure 2.36 (a) System-level architecture of dual-mode hybrid SAR-TDC ADC (b) its timing-diagram

Although time-domain signal processing has its own unique advantages, one issue that needs to be considered is its interface with the voltage-domain input signal. Voltage-to-Time conversion needs to be provided at the boundary of the two domains. Besides, the location of the boundary should be chosen carefully so that advantages can be taken from both voltage-domain and time-domain. The system-level architecture of the hybrid voltage-time-domain dual-mode ADC is shown in Fig.2.36 (a). Voltage-domain and time-domain take care of the coarse quantization and the fine quantization, respectively. This consideration is based on the fact that time-domain quantization can potentially provide fine quantization levels, but the voltage-time conversion is not well suited to a wide input amplitude range because of the inherent non-linearity of the input differential pair of the Voltage-Time-Converter (VTC). The voltage-domain front-end sub-converter is a 5-Bit time-interleaved SAR-ADC as employed in section 2.1, and the back-end sub-converter has 1-bit redundancy for over-range error correction. The voltage-domain residue amplifier is replaced with a cross-domain VTC. The random-matrix clock generator and other auxiliary building blocks are inherited from the previous design.

The timing diagram is shown in Fig.2.36 (b). Similar to the previous approach, the clocking system of the hybrid ADC is designed to be fully self-propagated so that it can better accommodate the randomness-embedded sampling and quantization. At the arrival of the non-uniform trigger, the Capacitor-DAC samples the input signal, after which the first conversion cycle of the front-end sub-ADC is initiated. The two interleaved ADCs trigger each other with their comparison ready signals. These ready signals are received by the SAR control logic, which advances the internal Finite-State-Machine and also latches the comparison results to flip the Capacitor-DAC in each cycle. After 5 conversion cycles, the passive sampler in front of the VTC samples the residue on the Capacitor-DAC. The Capacitor-DAC is then reset by the SAR logic and waits for the next pipeline cycle. At the VTC, the sampled voltage residue is converted into the time-domain as the delay difference between the rising edges of the differential outputs of the VTC. The completion of the voltage-to-time conversion resets the passive sampler, and the VTC waits for the next pipeline cycle. The two rising edges from the VTC automatically trigger the conversion of the following TDC, the back-end fine quantizer. When complete, the TDC resets itself and is ready for the next pipeline cycle.





Figure 2.37 Comparison between static amplifier and voltage-time-converter

In the previous approach, as shown in Fig.2.37, an open-loop amplifier with static bias is employed for residue amplification. Its power consumption does not scale with the operating frequency. The output voltage can be written as:

$$V_{out}(t) = V_{in}g_m R_L (1 - e^{-\frac{t}{\tau}})$$
(2.16)

where  $\tau$  is the time constant set by the loading resistor  $R_L$  and the parasitic capacitance C at the output node. The slope of the output voltage is:

$$\frac{\partial}{\partial t}V_{out}(t) = \frac{1}{\tau}V_{in}g_m R_L e^{-\frac{t}{\tau}}$$
(2.17)

In this approach, a voltage-time-converter is used. It charges the loading capacitors with currents whose difference is proportional to the input. The outputs are connected to a pair of pseudo-differential zero-crossing detectors. When an output crosses the voltage threshold, the corresponding detector output flips from low to high. When both detector outputs have gone high, the self-timed control logic disconnects the current source and resets the loading capacitors. In this way, the sampled voltage residue is converted into the time difference between the rising edges of the zero-crossing detector outputs. The output voltage can be expressed as:

$$V_{out}(t) = \frac{V_{in}g_m t}{C_L}$$
(2.18)

where  $g_m$  is the trans-conductance of the input differential-pair and  $C_L$  is the loading capacitor. The slope of the output can be derived as:

$$\frac{\partial}{\partial t}V_{out}(t) = \frac{V_{in}g_m}{C_L}$$
(2.19)

The converted time difference is:

$$t_{2} - t_{1} = \frac{V_{threshold}C_{L}}{g_{m}} \frac{\Delta V_{in}}{V_{incm}^{2} - (\Delta V_{in}/2)^{2}}$$
(2.20)

where  $V_{threshold}$  is the voltage threshold set for the crossing point of the charging process, as shown in Fig.2.37.  $V_{incm}$  is the common-mode component and  $\Delta V_{in}$  is the differential component in the input signal. If  $\Delta V_{in} \ll V_{incm}$ , equation (2.20) can be simplified as:

$$\Delta t = \frac{V_{threshold} C_L}{g_m V_{incm}^2} \Delta V_{in}$$
(2.21)
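As a quick numerical check of how far the linear approximation (2.21) departs from the exact expression (2.20), the following Python sketch evaluates both. The operating point (`V_TH`, `C_L`, `GM`, `V_CM`) and the function names are illustrative assumptions, not values taken from the prototype.

```python
def vtc_dt_exact(dv_in, v_incm, v_th, c_l, gm):
    """Rising-edge time difference per Eq. (2.20), in seconds."""
    return (v_th * c_l / gm) * dv_in / (v_incm**2 - (dv_in / 2.0) ** 2)


def vtc_dt_linear(dv_in, v_incm, v_th, c_l, gm):
    """Small-signal approximation per Eq. (2.21), in seconds."""
    return v_th * c_l / (gm * v_incm**2) * dv_in


def linearity_error(dv_in, v_incm, v_th, c_l, gm):
    """Relative error of Eq. (2.21) with respect to Eq. (2.20)."""
    exact = vtc_dt_exact(dv_in, v_incm, v_th, c_l, gm)
    linear = vtc_dt_linear(dv_in, v_incm, v_th, c_l, gm)
    return (exact - linear) / exact


# Hypothetical operating point, chosen only for illustration.
V_TH = 0.5      # threshold voltage (V)
C_L = 50e-15    # loading capacitor (F)
GM = 1e-3       # transconductance (A/V)
V_CM = 0.6      # input common-mode voltage (V)

if __name__ == "__main__":
    for dv in (0.02, 0.1, 0.4):
        err = linearity_error(dv, V_CM, V_TH, C_L, GM)
        print(f"dV_in = {dv:.2f} V -> relative error {100 * err:.3f} %")
```

Algebraically, the relative error equals $(\Delta V_{in}/2)^2 / V_{incm}^2$, so it falls quadratically as the VTC input range shrinks.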

In this design, the VTC is located after the 5-Bit first-stage sub-ADC and processes only the residue voltage left after the coarse quantization. This arrangement effectively shrinks the input voltage range of the VTC and thus suppresses the non-linearity in the converted time-domain signal. In addition, the calibration discussed in the following sections further alleviates this non-linearity.

It is noted that a cross-domain gain can be defined as:

$$gain_{cross} = \frac{V_{threshold}C_L}{g_m V_{incm}^2}$$
(2.22)

By carefully choosing the values of  $C_L$  and  $g_m$ , the voltage-domain residue can be mapped into a range of time differences that suits the following TDC in terms of both conversion speed and quantization resolution.

There are several differences between inter-stage residue transfer with a voltage-domain amplifier and with a voltage-time-converter. First, the VTC is on only during the charging period, whereas the voltage-domain amplifier is on during the whole conversion period. Second, the power consumption of the VTC scales dynamically with its operating frequency, whereas that of the voltage-domain amplifier stays constant. Third, the output settling behavior differs. For the voltage-domain amplifier, settling involves both the loading resistor and the capacitor, and the settling slows down as the output approaches its final voltage, as equation (2.17) shows. The VTC works more like an integrator: it drives only the loading capacitors and its output ramps at the constant rate of equation (2.19), which offers potential advantages in speed and power efficiency. Finally, co-designing the VTC and the TDC gives additional flexibility in the fine quantization, because the cross-domain gain can be used to trade off conversion speed against the resolution required of the quantizer.



Figure 2.38 Circuit implementation of the voltage-time-converter

The implementation of the VTC is shown in Fig.2.38. A low-power low-gain buffer is inserted between the top-plate passive-sampler and the VTC. Since the sampled residue voltage is floating during the voltage-to-time conversion, the main purpose of the buffer is to protect the sampled charge from the kickback noise of the VTC.

For each pipeline cycle, the VTC is initiated by the non-uniform trigger and the following operations are fully self-timed.



Figure 2.39 Quantization principles of the Flash-TDC, Vernier-TDC and 2D-Vernier-TDC

As with ADCs, different TDC architectures have different characteristics, as shown in Fig.2.39.

The Flash-type TDC [41] is the natural candidate for high-speed conversion. As discussed before, a general flash TDC takes the absolute delay of one delay element as its quantization unit and makes its decision based on the digital outputs of  $2^{n}-1$  time-arbiters (the counterparts of comparators in the voltage domain) for n-Bit resolution. A typical unit delay element is composed of two inverters so as to provide a non-inverting delayed output. This limits the minimum achievable time-domain resolution to the delay of two inverters in the given technology node.

A Vernier TDC [42] breaks this limitation by using the relative delay difference between two different delay elements instead of the absolute delay of a single delay element. For instance, suppose the minimum achievable delay of a unit delay element is  $4\Delta t$  in a certain technology node. In a Vernier TDC, two delay lines are built: a fast-line composed of elements with a delay of  $4\Delta t$  and a slow-line composed of elements with a delay of  $5\Delta t$ . For quantization, each delay element on the slow-line is paired with one delay element on the fast-line, and the outputs of the two are connected to one time-arbiter. The time difference between two pulses is measured by sending them into the fast-line and the slow-line, respectively, and collecting the digital outputs of the time-arbiters. In this way, the quantization resolution is improved from  $4\Delta t$  to  $1\Delta t$ . However, the total conversion time is set by the time both pulses take to pass through the delay lines. In addition, the total number of delay elements grows as  $2^{n}$  with the number of quantization bits n. Both the conversion speed and the power/area efficiency will be low if a relatively large number of quantization bits is required.
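The Vernier principle can be illustrated with a short behavioral sketch in Python. It assumes ideal delay elements, that the leading edge is sent down the slow-line and the lagging edge down the fast-line, and that each arbiter outputs 1 while the lagging edge has not yet been caught. The function names and the 15-stage example are hypothetical, and all times are in units of $\Delta t$.

```python
def vernier_tdc(t_in, n_stages, t_fast=4.0, t_slow=5.0):
    """Thermometer-code count for an input time difference t_in.

    The leading edge travels down the slow-line and the lagging edge,
    late by t_in, travels down the fast-line. Arbiter k outputs 1 while
    the lagging edge is still behind after k stages.
    """
    code = 0
    for k in range(1, n_stages + 1):
        lead_arrival = k * t_slow          # leading edge on the slow-line
        lag_arrival = t_in + k * t_fast    # lagging edge on the fast-line
        if lag_arrival > lead_arrival:
            code += 1
    return code


def conversion_time(t_in, n_stages, t_fast=4.0, t_slow=5.0):
    """Time until both edges have left their delay lines."""
    return max(n_stages * t_slow, t_in + n_stages * t_fast)


if __name__ == "__main__":
    for t in (0.5, 3.5, 9.2):
        print(t, vernier_tdc(t, 15), conversion_time(t, 15))
```

The code counts in steps of $t_{slow} - t_{fast} = 1\Delta t$, while the conversion time is governed by the absolute delay of the full line, which is the speed penalty described above.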

A 2-Dimensional Vernier TDC [43] alleviates these issues by pairing each delay element on the fast-line with multiple elements on the slow-line, which significantly reduces the total number of delay elements needed and boosts the conversion speed. For instance, each delay element on the fast-line can be paired with 2 delay elements on the slow-line:  $F_1/S_1$ -> $\Delta t$ ,  $F_1/S_2$ ->  $6\Delta t$ ;  $F_2/S_2$ -> $2\Delta t$ ,  $F_2/S_3$ ->  $7\Delta t$ ; ...; $F_5/S_5$ ->  $5\Delta t$ ,  $F_5/S_6$ ->  $10\Delta t$ . This reduces the total number of delay elements by half. However, this architecture has some potential issues. First, the quantization uses both relative and absolute time differences. A relatively large transition error can occur wherever two consecutive quantization points use different numbers of elements on the fast-line and the slow-line.
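The pairing rule in this example can be tabulated with a few lines of Python. The sketch assumes ideal delays of $4\Delta t$ and $5\Delta t$ and pairs fast-line node $F_i$ with slow-line nodes $S_i$ and $S_{i+1}$; the helper names are hypothetical.

```python
def mesh_level(i_fast, j_slow, t_fast=4, t_slow=5):
    """Time difference (in units of delta-t) resolved by the arbiter
    that compares slow-line node S_j with fast-line node F_i."""
    return j_slow * t_slow - i_fast * t_fast


def two_d_pairs(n_fast, t_fast=4, t_slow=5):
    """Map each quantization level to its (fast index, slow index) pair."""
    pairs = {}
    for i in range(1, n_fast + 1):
        for j in (i, i + 1):   # each fast node is paired with two slow nodes
            pairs[mesh_level(i, j, t_fast, t_slow)] = (i, j)
    return pairs


if __name__ == "__main__":
    for level, (i, j) in sorted(two_d_pairs(5).items()):
        print(f"{level:2d} delta-t  <-  F{i}/S{j}")
```

In this sketch, 5 fast-line and 6 slow-line elements cover the levels $1\Delta t$ to $10\Delta t$, whereas a one-dimensional Vernier line with one arbiter per element pair would need 10 elements on each line.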

Suppose the delays of the elements on the slow-line and the fast-line follow Gaussian distributions with mean and standard deviation  $t_{slow} / \sigma_{slow}$  and  $t_{fast} / \sigma_{fast}$ , respectively. In the case shown in Fig.2.39, for  $1\Delta t$ - $5\Delta t$ , the expression is:

$$k\Delta t = \sum_{i=1}^{k} (t_{slow_{i}} - t_{fast_{i}})$$
(2.23)

where  $t_{slow_i}$  is the absolute delay of the i<sup>th</sup> element on the slow-line and  $t_{fast_i}$  is that on the fast-line. The accumulated variation from the ideal case at the k<sup>th</sup> delay element is:

$$\sigma_k = \sqrt{k(\sigma_{slow}^2 + \sigma_{fast}^2)}$$

The worst-case time difference variation from the previous step within 1-Sigma range is:

$$t_{worst} = \sigma_{slow} + \sigma_{fast} \tag{2.24}$$

For  $6\Delta t$ - $9\Delta t$ , the expression is:

$$k\Delta t = \sum_{i=1}^{k-4} t_{slow_i} - \sum_{i=1}^{k-5} t_{fast_i}$$
(2.25)

The accumulated variation is:

$$\sigma_k = \sqrt{(k-4)\sigma_{slow}^2 + (k-5)\sigma_{fast}^2}$$
(2.26)

And the worst-case time difference variation is the same as equation (2.24).

However, at the transition, which is from  $5\Delta t$  to  $6\Delta t$ , the time step can be derived as:

$$6\Delta t - 5\Delta t = \left(\sum_{i=1}^{2} t_{slow_{i}} - t_{fast_{1}}\right) - \left(\sum_{i=1}^{5} t_{slow_{i}} - \sum_{i=1}^{5} t_{fast_{i}}\right)$$

$$= t_{fast_{2}} - \sum_{i=3}^{5} \left(t_{slow_{i}} - t_{fast_{i}}\right)$$
(2.27)

and the worst-case time difference variation can be as large as:

$$t_{worst tran} = 4\sigma_{fast} + 3\sigma_{slow}$$
(2.28)

which may severely degrade the Differential-Non-Linearity (DNL) performance of the whole TDC.
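The difference between a regular step and the $5\Delta t$ to $6\Delta t$ transition can be checked numerically. The Python sketch below evaluates the linear worst-case sums of equations (2.24) and (2.28) and, as an additional assumption not made in the text, the root-sum-square spread for statistically independent elements, which it cross-checks with a small Monte Carlo run. The mismatch value of $0.1\Delta t$ per element and all function names are hypothetical.

```python
import math
import random


def worst_regular(sig_slow, sig_fast):
    """Linear worst-case step error inside a segment, Eq. (2.24)."""
    return sig_slow + sig_fast


def worst_transition(sig_slow, sig_fast):
    """Linear worst-case step error at the 5 -> 6 transition, Eq. (2.28)."""
    return 4 * sig_fast + 3 * sig_slow


def rss_transition(sig_slow, sig_fast):
    """RMS spread of the transition step, assuming independent elements:
    four fast-line and three slow-line delays remain after cancellation."""
    return math.sqrt(4 * sig_fast**2 + 3 * sig_slow**2)


def simulate_transition(sig_slow, sig_fast, trials, seed):
    """Monte Carlo mean and standard deviation of (6 - 5) delta-t."""
    rng = random.Random(seed)
    steps = []
    for _ in range(trials):
        slow = [rng.gauss(5.0, sig_slow) for _ in range(5)]
        fast = [rng.gauss(4.0, sig_fast) for _ in range(5)]
        t5 = sum(slow) - sum(fast)          # level 5: five slow, five fast
        t6 = slow[0] + slow[1] - fast[0]    # level 6: two slow, one fast
        steps.append(t6 - t5)
    mean = sum(steps) / trials
    var = sum((s - mean) ** 2 for s in steps) / (trials - 1)
    return mean, math.sqrt(var)


if __name__ == "__main__":
    print(worst_regular(0.1, 0.1), worst_transition(0.1, 0.1))
    print(rss_transition(0.1, 0.1), simulate_transition(0.1, 0.1, 20000, seed=1))
```

Under either measure the transition step is several times more sensitive to element mismatch than a regular step.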

In addition, this 2D implementation unavoidably leads to non-equivalent arbiter numbers at different delay-element's output nodes.

For instance, in a 4-Bit 2D Vernier TDC, the number of time-arbiters connected to each of the delay elements on the two delay lines can be listed as:

Fast-line: F<sub>1</sub>-F<sub>7</sub>: 3

Slow-line: S<sub>1</sub>:1; S<sub>2</sub>:2; S<sub>3</sub>-S<sub>5</sub>:3;

The unevenly distributed capacitive loading shifts the time grid points away from their original positions on the 2D time mesh and introduces a systematic quantization error.



Figure 2.40 Locally-Readjusted 2D-Vernier-TDC

The Locally-Readjusted Folding 2-Dimensional Vernier TDC is shown in Fig.2.40. The fast-line and slow-line are composed of delay elements with delays of  $4\Delta t$  and  $5\Delta t$ , respectively. A 1-Bit time-folder converts a negative time difference between the inputs  $t_1$  and  $t_2$  into a positive one and thus halves the number of time-arbiters needed. The time-folder is composed of a MUX and a time-arbiter. If the decision result of the arbiter is high, the paths of  $t_1$  and  $t_2$  are kept as they are; if the result is low, the paths of  $t_1$  and  $t_2$  are swapped so that the rising-edge time difference of the two pulses is always positive. To mitigate the issues discussed above, a pair of time-adjusters is embedded inside each time-arbiter, providing one more degree of freedom on the time mesh. Through calibration, the shifted grid points can be brought back to their original positions.



Figure 2.41 (a) Circuit implementation of the time-arbiter (b) time-adjuster

The time-arbiter and time-adjuster are shown in Fig.2.41(a) and (b), respectively. The time-arbiter consists of a pseudo-differential input pair, a pair of back-to-back cross-coupled inverters and reset transistors. Compared with a D-flip-flop-based arbiter, this architecture ensures symmetry between the two signal input branches to the output and has higher sensitivity to the input time difference.

The voltage offset of the arbiter can be referred to the input and translated into the time domain as:

$$t_{offset} = \frac{V_{offset}}{s_{rise}}$$

where  $V_{offset}$  is the input-referred voltage offset of the arbiter,  $t_{offset}$  is the equivalent time offset and  $s_{rise}$  is the slope of the rising edge of the pulses. As shown in Fig.2.41 (a), as in the voltage domain, the converted time offset may change the decision result for the two input pulses if their time difference is within the time-offset range of the arbiter. This offset can be calibrated together with the errors in the time mesh.

The time-adjuster plays an important role in the TDC. As discussed before, the errors on the time grid as well as the offset inside each time-arbiter need to be calibrated by the time-adjuster. It is a current-starved inverter followed by a regular inverter. Adjusting the rising-edge slope of the current-starved inverter shifts the switching instant of the following inverter accordingly, which provides a tunable delay for the input pulse.



Figure 2.42 (a) Replica-based coarse V-T mapping (b) 1-on-1 fine V-T mapping

Calibration is necessary in this hybrid converter architecture for two main reasons. First, a scheme is needed that maps the quantization levels in the voltage domain to those in the time domain; second, the non-idealities within the voltage-to-time conversion and the TDC need to be corrected.

For such considerations, a 2-fold coarse-fine mapping between voltage and time domain is implemented in an analog way, as shown in Fig.2.42(a)-(b).

First, a coarse voltage-time mapping is set up by calibrating the absolute delays of the replica unit elements of the fast-line and slow-line to  $4\Delta t$  and  $5\Delta t$ , respectively. An input pulse pair with a time difference of  $4\Delta t$ , which corresponds to 4 LSB in the voltage domain, is provided to the replica unit element of the fast-line. The digital control code is searched until the corresponding time-arbiter changes its output. The replica unit element of the slow-line is calibrated in the same way. After calibration, the resulting digital control codes are applied to the fast-line and slow-line.
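The code search in this coarse step can be sketched behaviorally as follows. The linear delay-versus-code model, the linear search order, the tuning range and the function names are all assumptions made for illustration; the text only states that the code is searched until the arbiter output changes.

```python
def replica_delay(code, t_min, t_step):
    """Hypothetical tunable replica element: delay grows linearly with code."""
    return t_min + code * t_step


def calibrate_replica(target, t_min, t_step, n_codes):
    """Return the first control code at which the arbiter output flips.

    The arbiter is modeled as an ideal comparison between the replica
    delay and the reference time difference `target` (units of delta-t).
    """
    for code in range(n_codes):
        arbiter_out = replica_delay(code, t_min, t_step) >= target
        if arbiter_out:
            return code
    raise ValueError("target delay is outside the tuning range")


if __name__ == "__main__":
    fast_code = calibrate_replica(4.0, t_min=3.0, t_step=0.125, n_codes=32)
    slow_code = calibrate_replica(5.0, t_min=3.0, t_step=0.125, n_codes=32)
    print(fast_code, slow_code)
```

The codes found on the replicas would then be applied to every element of the corresponding delay line, as described above.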

In the second step, each time-arbiter is locally readjusted by a 1-on-1 fine mapping from the voltage domain to the time domain so as to bring the quantization points to their proper positions. As shown in the previous analysis, the coarse voltage-time mapping builds the backbone of the time-domain quantization, but additional calibration is needed to alleviate the various non-idealities and further improve the performance. In the fine mapping, each time grid point on the 2D time mesh is calibrated. For instance, a time difference of  $\Delta t$  is provided for the first time-arbiter, and the time-adjuster embedded within that time-arbiter is tuned, which completes the calibration of the first one. Then a time difference of  $2\Delta t$  is provided for the second one, and so on, until all 15 time-arbiters have been calibrated.

The procedure is self-calibrating. All of the time-domain calibration signals are generated by reconfiguring the Capacitor-DAC of the front-end SAR-ADC and passing its output through the VTC.

In addition, the output codes of the SAR-ADC and TDC are adjusted in the digital domain to alleviate the signal offset and the asymmetry of the time-folding procedure.


### **2.2.4 Experiment Results**



Figure 2.43 Die micrograph of the dual-mode hybrid SAR-TDC ADC

The prototype is implemented in a standard 65nm CMOS technology. The chip is bare-die bonded on the PCB for testing. Fig.2.43 shows the die micrograph. The active core occupies an area of 270 $\mu$m × 720 $\mu$m.

The calibrated output spectrum of the hybrid ADC in its NS-mode is shown in Fig.2.44. With an input signal at 0.73MHz, the SNDR and SFDR are 41.0dB and 45.1dB, respectively.



Figure 2.44 Measured output spectrum of an input signal at 0.73MHz with calibration



Figure 2.45 Measured SNDR and SFDR performances as the input frequency changes in NS-mode and CS-mode with calibration, post-processing and recovery

The calibrated SNDR and SFDR performance of the ADC operating in both NS-mode and CS-mode across the frequency band after post-processing and recovery is shown in Fig.2.45. It achieves 41.0dB SNDR at low frequency in NS-mode and 34.2dB SNDR near 2GHz in CS-mode.

|                               | This work         | [11]  | [39]  | [40]  |
|-------------------------------|-------------------|-------|-------|-------|
| Sample-Mode                   | CS                | CS    | NS    | NS    |
| Resolution                    | 9                 | 10    | 6     | 4     |
| Sample-rate (GS/s)            | 4\*               | 0.1\* | 4.1   | 4     |
| Power (mW)                    | 17                | 0.63  | 76    | 20    |
| SNDR (dB)                     | 34.2              | 55.9  | 31.2  | 24.1  |
| FoM (fJ/conv.-step)           | 101               | 12    | 625   | 378   |
| Active-Area (mm<sup>2</sup>)  | 0.194             | 0.15  | 0.38  | 0.15  |
| Architecture                  | Pipelined-SAR-TDC | SAR   | Flash | Flash |
| Technology (nm)               | 65                | 90    | 90    | 65    |

\* Equivalent sampling speed

Table 2.2 Summary of Performance Comparison of the Hybrid-ADC with State-of-Art

Table 2.2 summarizes the performance of the prototype and compares it with state-of-the-art CS and NS ADCs. With the proposed hybrid architecture, the prototype achieves a 40× increase in acquisition bandwidth over the prior-art CS ADC [11]. It also shows significant energy-efficiency advantages over its NS counterparts [39], [40] when acquiring signals that are sparse in the frequency domain.
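The FoM row of Table 2.2 can be cross-checked with a short Python sketch. The text does not state the FoM formula; the sketch assumes the widely used Walden definition, $FoM = P / (2^{ENOB} \cdot f_s)$ with $ENOB = (SNDR - 1.76)/6.02$, which appears to reproduce the tabulated values to within rounding. The function names are hypothetical.

```python
def enob(sndr_db):
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w, sndr_db, fs_hz):
    """Walden figure of merit in fJ per conversion step (assumed definition)."""
    return power_w / (2 ** enob(sndr_db) * fs_hz) * 1e15


# (power in W, SNDR in dB, sample rate in S/s) taken from Table 2.2
ENTRIES = {
    "This work": (17e-3, 34.2, 4e9),
    "[11]": (0.63e-3, 55.9, 0.1e9),
    "[39]": (76e-3, 31.2, 4.1e9),
    "[40]": (20e-3, 24.1, 4e9),
}

if __name__ == "__main__":
    for name, (p, sndr, fs) in ENTRIES.items():
        print(f"{name:10s} {walden_fom_fj(p, sndr, fs):7.1f} fJ/conv.-step")
```

For the CS-mode entries the equivalent sampling speed marked with an asterisk in the table is used as $f_s$.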

## Chapter 3 Design Techniques for Multi-Level Wireline Transmitters

# **3.1 A Capacitive-DAC-Based Technique For Pre-emphasis-Enabled Multi-level Transmitters**

The transmitter side of a wireline link can largely be treated as a low-resolution, high-speed DAC design. Depending on the application, the output may require 50 ohm impedance matching, as in electrical I/O interconnects; current-driving capability, as in optical I/O interconnects with a VCSEL-Modulator; or high-swing capacitive-driving capability, as in optical I/O interconnects with a Ring-Resonator-Modulator.

Unlike wireless communication, in which complex modulation schemes are often adopted because of the limited allocated frequency band, wireline communication generally uses no or only simple baseband modulation owing to its very broadband nature.

The dominant modulation scheme in wireline communication is Non-Return-to-Zero (NRZ), in which the digital baseband signal is represented as simple binary "0" and "1" when sent into the channel. This leads to a simplified implementation at the transmitter side and the receiver side as a 1-bit DAC and a 1-bit ADC, respectively. In addition, two-state binary signaling places low demands on linearity at both sides of the TRX, which makes a saturated driver/amplifier a proper choice in terms of speed, energy efficiency and implementation complexity. However, as discussed in chapter 1, the physical limitations imposed by vias, PCB traces, etc., as well as the difficulties of clock distribution, severely limit the maximum achievable symbol rate. Multi-level signaling, such as PAM-4, instead of two-level signaling, is considered a proper candidate to meet the exponentially growing demand for I/O data-rate.

PAM-4 signaling can provide twice the data-rate of its NRZ counterpart at the cost of a 9.5dB reduction in eye-opening amplitude, which requires either a higher output swing at the TX side or higher sensitivity at the RX side. In addition, both the driver and the amplifier in the TRX need to be linear so as to preserve the shape of the multi-level signal. All of these require additional implementation effort for a PAM-4 transceiver compared with an NRZ transceiver.
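The 9.5dB figure follows from the level spacing, assuming that the PAM-4 and NRZ signals share the same peak-to-peak swing (an assumption made here for the estimate). Four equally spaced levels split that swing into three eyes, so each eye has one third of the NRZ eye amplitude:

$$20\log_{10}\left(\frac{1}{3}\right) \approx -9.54\ \mathrm{dB}$$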

Feed-forward-equalization makes the situation even more complicated.

Channel loss is one of the most critical issues that lead to Inter-Symbol-Interference (ISI) and thus degraded Bit-Error-Rate (BER) performance. The simulated S-parameter of a typical 20-inch FR-4 trace is shown in Fig.3.1(a). It can be seen that  $S_{21}$  decays monotonically across the frequency band, with up to 17dB loss at 10GHz. The corresponding time-domain response to a pulse with 50ps width is shown in Fig.3.1(b). A long tail spanning several Unit-Intervals (UIs) is observed, which severely disturbs the decisions on the following symbols.



Figure 3.1 Simulated frequency response of a 20-inch FR-4 trace and its time-domain response of a 50ps width pulse

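One of the most effective methods to alleviate the ISI issue at the TX side is to employ Feed-Forward-Equalization (FFE), or in other words, pre-emphasis. The signal with k-tap FFE can be expressed as:

$$y[n] = \sum_{i=0}^{k-1} \alpha_i x[n-i]$$
(3.1)

where  $x[n]$  is the transmitted symbol sequence and  $y[n]$  is the equalized output.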



Figure 3.2 Simulated two-tap FFE pulses with different coefficients at TX side before and after passing through the 20-inch FR-4 trace

The coefficients  $\alpha_0...\alpha_{k-1}$  are adjusted so as to cancel the tail that the previous symbols x[n-1], x[n-2]...x[n-k+1] introduce at x[n]. The simulated two-tap FFE pulse waveforms at the TX side before and after passing through the 20-inch FR-4 trace are shown in Fig.3.2 (a) and (b), respectively. As the coefficient of the post-tap  $\alpha_1$  is varied from 0% to 50% of the main tap  $\alpha_0$ , the interference is effectively alleviated.
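A k-tap FFE forms each output sample as a weighted sum of the current symbol and the k-1 previous symbols. The Python sketch below applies such a filter to an isolated pulse. The negative sign of the post-tap (the usual choice for de-emphasis) and the 25% ratio are assumptions for illustration; the text only gives the magnitude range of the post-tap. The function name is hypothetical.

```python
def apply_ffe(symbols, taps):
    """Weighted sum of the current and previous symbols.

    taps[0] is the main tap and taps[i] weights the symbol sent i unit
    intervals earlier. Symbols before the start of the list count as 0.
    """
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for i, tap in enumerate(taps):
            if n - i >= 0:
                acc += tap * symbols[n - i]
        out.append(acc)
    return out


if __name__ == "__main__":
    pulse = [0, 1, 0, 0]
    # Main tap 1.0, post-tap at 25% of the main tap with opposite sign.
    print(apply_ffe(pulse, [1.0, -0.25]))
```

The negative sample that follows the pulse is what counteracts the positive tail that the channel adds to the next unit interval.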



Figure 3.3 Simulated frequency-domain response of a two-tap FIR filter with different coefficients



Figure 3.4 Simulated channel response with a combine of FR-4 trace and TX-side FFE with different coefficients

Applying the z-transform to both sides of equation (3.1), it can be re-written as:

$$H(z) = \sum_{i=0}^{k-1} \alpha_i z^{-i}$$
(3.2)

And the frequency-domain mapping of equation (3.2) for a 2-tap case is:

$$H(f) = \alpha_0 + \alpha_1 e^{-j2\pi f T_s}$$
(3.3)

The frequency response of equation (3.3) is shown in Fig.3.3. It can be seen that the FFE boosts the gain at high frequency to compensate for the channel loss. The gain of the transfer function (3.3) peaks at the Nyquist frequency, which is half the symbol rate, and drops to its minimum at both DC and the symbol rate. The shape of the transfer function is controlled by the amplitude ratio between the post-tap and the main-tap. As the ratio increases, the gain boost increases, but so does the suppression as the frequency approaches DC or the symbol rate.
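The behavior of equation (3.3) at DC, at the Nyquist frequency and at the symbol rate can be checked with a few lines of Python. The sketch assumes a post-tap of opposite sign to the main tap, which is what produces a high-frequency boost, and a symbol period of 50ps; both values and the function names are illustrative assumptions.

```python
import cmath
import math


def ffe_gain(f, ts, a0, a1):
    """Magnitude of H(f) = a0 + a1 * exp(-j*2*pi*f*Ts), Eq. (3.3)."""
    return abs(a0 + a1 * cmath.exp(-2j * math.pi * f * ts))


def boost_db(a0, a1, ts):
    """Gain at the Nyquist frequency relative to DC, in dB."""
    nyquist = 1.0 / (2.0 * ts)
    return 20.0 * math.log10(ffe_gain(nyquist, ts, a0, a1) / ffe_gain(0.0, ts, a0, a1))


TS = 50e-12  # assumed symbol period (s)

if __name__ == "__main__":
    for ratio in (0.0, 0.25, 0.5):
        print(f"post-tap ratio {ratio:.2f}: boost {boost_db(1.0, -ratio, TS):.2f} dB")
```

With a main tap of 1 and a post-tap of -0.25, the gain is 0.75 at DC and at the symbol rate and 1.25 at the Nyquist frequency, which matches the shape described above.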

Combining the channel characteristic with the FFE transfer function, the simulated frequency response of the whole system is shown in Fig.3.4. It can be seen that by properly selecting the tap-coefficients, the channel response can be effectively flattened with the help of FFE introduced gain-boosting.



Figure 3.5 Simulated 40Gbps PAM-4 signal eye-diagram after passing through a 20-inch FR-4 trace without equalization

The simulated time-domain eye-diagram of a 40Gbps PAM-4 signal passing through the 20-inch FR-4 PCB trace is shown in Fig.3.5. Without equalization, the eye is fully closed. With post-tap equalization, all three eyes open with sufficient margin both vertically and horizontally, as shown in Fig.3.6.

From the previous discussion, it is clear that adjustable equalization is necessary at the TX side to combat the transmission degradation introduced by both the channel and the circuit. Because multi-level signaling with amplitude-adjustable multi-tap equalization is required, novel techniques are needed for high-speed power-efficient design. Several design techniques are discussed in the following sections to meet this demand.



Figure 3.6 Simulated 40Gbps PAM-4 signal eye-diagram after passing through a 20-inch FR-4 trace with post-tap equalization



#### **3.1.1 CML-Based Approach**

Figure 3.7 CML-based PAM-4 transmitter front-end

A CML-based PAM-4 transmitter front-end is shown in Fig.3.7. It is composed of two switching differential pairs with bias currents weighted at a ratio of 2. A cascode current source is employed to provide enhanced output impedance, while the high-swing bias structure alleviates the headroom issue. The output voltage can be written as:

$$v_o = \frac{1}{2} \times (msb \times 2I + lsb \times I) \times R_o \tag{3.4}$$

where 2I and I are the currents steered into the loading resistor  $R_o$ , which is chosen to match the channel characteristic impedance. To keep both the switching pairs and the current sources in saturation, the voltage at the drain of the switching pair should be no lower than  $2v_{dsat\_cs} + v_{dsat\_sw}$ , which limits the output voltage swing to:

$$v_{swing} = 2 \times (vdd - 2 \times v_{dsat\_cs} - v_{dsat\_sw})$$
(3.5)

Equation (3.5) sets the upper bound for (3.4). In addition, the input swing should be large enough to fully switch the current to either side of the differential pair, but not so large that it drives the differential pair and the current source into the triode region. For this purpose, a dedicated servo-loop and a high-speed amplitude scaler/level-shifter [22] would be needed to tailor both the amplitude and the common-mode voltage of the CMOS logic to the CML input requirements, which adds a significant implementation burden.

#### 3.1.2 Direct-Coupling Approach



Figure 3.8 Direct-coupling-based PAM-4 transmitter front-end

A voltage-mode transmitter is a better candidate in terms of output swing. With an inverter-based structure, as shown in Fig.3.8, the output can provide even rail-to-rail swing.



Figure 3.9 (a) Direct-coupling-based PAM-4 driver (b) the dependency of transistor's on-resistance on output voltage

One straightforward implementation of a voltage-mode PAM-4 transmitter is shown in Fig.3.9 (a). Two inverters are driven by the two data bits, respectively, while their outputs are tied together. The output for the center two levels can be written as:

$$\alpha_{1} \times v_{dd} = \frac{R_{n}(\alpha_{1} \times v_{dd}) / w_{n1}}{R_{n}(\alpha_{1} \times v_{dd}) / w_{n1} + R_{p}(\alpha_{1} \times v_{dd}) / w_{p2}} \times v_{dd}$$

$$\alpha_{2} \times v_{dd} = \frac{R_{n}(\alpha_{2} \times v_{dd}) / w_{n2}}{R_{n}(\alpha_{2} \times v_{dd}) / w_{n2} + R_{p}(\alpha_{2} \times v_{dd}) / w_{p1}} \times v_{dd}$$
(3.6)

where  $R_n(v)$  and  $R_p(v)$  are the on-resistances of unit-size (1um/0.6um) transistors with a drain voltage of v, and  $w_p$  and  $w_n$  are the normalized transistor sizes. As shown in Fig.3.9 (b),  $R_n(v)$  and  $R_p(v)$  are non-linear functions of v, which implies that merely scaling the sizes of the two inverters while keeping their P/NMOS ratio intact will not necessarily open the 3 eyes evenly. Instead, each transistor size ( $w_{p1,2}$ ,  $w_{n1,2}$ ) must be used as a tuning parameter to set the 4 levels appropriately while satisfying the fan-out requirement in terms of parasitic loading capacitance. However, such conventional approaches leave no room for pre-emphasis. Adding one pre-emphasis tap to the PAM-4 transmitter increases the number of output voltage levels from 4 to 16. Since an individual tuning parameter is required for each voltage level because of circuit/channel non-linearity, adding another pair of inverters for the delayed PAM-4 tap provides only 4 additional parameters and thus cannot meet the design requirement.

Equation (3.7) shows the sensitivity of the output voltage to the sizing of the PMOS and NMOS transistors, respectively. Fig.3.10 (b) shows that an NMOS size variation of ±12.5% leads to a ±20% variation in the output voltage. This relatively high sensitivity imposes stringent requirements on the precise tuning of active device sizes.

$$\frac{\partial v}{\partial w_p} = \frac{\left(R_n(v) \times R_p(v)\right) / \left(w_n \times w_p^2\right)}{\left(R_n(v) / w_n + R_p(v) / w_p\right)^2} \times v_{dd}$$

$$\frac{\partial v}{\partial w_n} = \frac{-2R_n(v)^2 / w_n^3 - R_n(v) \times R_p(v) / w_n^2}{\left(R_n(v) / w_n + R_p(v) / w_p\right)^2} \times v_{dd}$$
(3.7)



Figure 3.10 (a) voltage variation for output middle-levels (b) the sensitivity of output voltage on the size of driver's transistor



Figure 3.11 The PAM-4 output voltage waveform and its corresponding current waveform with direct-coupling-based approach

Another issue encountered by the direct-coupling approach is its power consumption, which can be written as:

$$P = 0.25 \times \left(\frac{v_{dd}^{2}}{R_{n}(\alpha_{1} \times v_{dd}) / w_{n1} + R_{p}(\alpha_{1} \times v_{dd}) / w_{p2}} + \frac{v_{dd}^{2}}{R_{n}(\alpha_{2} \times v_{dd}) / w_{n2} + R_{p}(\alpha_{2} \times v_{dd}) / w_{p1}}\right) + \frac{f \times v_{dd}^{2} \times C_{par}}{2}$$
(3.8)

Unlike NRZ signaling, in which only one of the P/NMOS devices in the driver is "On" at a time, both PMOS and NMOS are "On" in PAM-4 signaling when the signal is "01" or "10", leading to static current flow during the entire unit symbol period, as shown in Fig.3.11. This contributes the first term in (3.8), which dominates if a small on-resistance is chosen for high-speed switching. Applications with high supply voltages, such as optical interconnects, make the situation even worse.

#### 3.1.3 Resistor-DAC-Based Approach



Figure 3.12 Resistor-DAC-based PAM-4 driver

Fig.3.12 shows the Resistor-DAC-Based approach for PAM-4 transmitter design, of which the two middle voltage levels can be expressed as:

$$\alpha_{1} \times v_{dd} = \frac{R_{n1}(v_{ds\_n1}) + 2R}{R_{p2}(v_{ds\_p2}) + R + R_{n1}(v_{ds\_n1}) + 2R} \times v_{dd}$$

$$\alpha_{2} \times v_{dd} = \frac{R_{n2}(v_{ds\_n2}) + R}{R_{p1}(v_{ds\_p1}) + 2R + R_{n2}(v_{ds\_n2}) + R} \times v_{dd}$$
(3.9)

Compared with the Direct-Coupling-Based approach, this approach uses passive resistors for the 2-bit weighting instead of relying on the non-linear on-resistance. However, as shown in equation (3.9), the on-resistances are still in series with the passive resistors, and the output voltage remains sensitive to them:

$$\frac{\partial v}{\partial R_{n1}(v_{ds\_n1})} = \frac{R_{p2}(v_{ds\_p2}) + R}{\left(R_{p2}(v_{ds\_p2}) + 3R\right)^2} \times v_{dd} \propto \frac{1}{R}$$
(3.10)

$$\frac{\partial v}{\partial R_{p2}(v_{ds\_p2})} = \frac{R_{n1}(v_{ds\_n1}) + 2R}{(R_{p2}(v_{ds\_p2}) + 3R)^2} \times v_{dd} \propto \frac{1}{R}$$
(3.11)

Notice that the sensitivity to the non-linear part is inversely proportional to the value of the passive resistor. Fig.3.13(a) shows the simulated PAM-4 eye diagram at 30Gb/s with a size of 25.6um (12.8um) / 0.6um and 57.6um (28.8um) / 0.6um for N/PMOS of the MSB and LSB drivers, respectively; a resistor value of 200 ohm /100 ohm; and parasitic loading capacitance of 80 fF. Differences in vertical eye opening up to 50% are observed between the central eye and the upper/lower eye due to the relatively large on-resistance compared with that of passive resistors. Increasing resistor values to 800 ohm/ 400 ohm may reduce the vertical eye opening difference to 13% at the expense of closing the eye both vertically and horizontally, as shown in Fig.3.13(b).
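The trend can be reproduced from equation (3.9) with a simplified Python sketch. It treats every on-resistance as a fixed 50 ohm, which is a hypothetical value and a deliberate simplification, since in the actual driver the on-resistances depend on the drain-source voltage. The function names are assumptions as well.

```python
def rdac_levels(r_n1, r_p2, r_n2, r_p1, r):
    """Normalized PAM-4 levels (v / vdd); the middle two follow Eq. (3.9)."""
    alpha1 = (r_n1 + 2 * r) / (r_p2 + r + r_n1 + 2 * r)
    alpha2 = (r_n2 + r) / (r_p1 + 2 * r + r_n2 + r)
    return [0.0, alpha2, alpha1, 1.0]


def eye_openings(levels):
    """Vertical distance between adjacent levels (lower, central, upper eye)."""
    return [hi - lo for lo, hi in zip(levels, levels[1:])]


def eye_mismatch(levels):
    """Relative difference between the smallest and the largest eye."""
    eyes = eye_openings(levels)
    return 1.0 - min(eyes) / max(eyes)


R_ON = 50.0  # hypothetical constant on-resistance (ohm)

if __name__ == "__main__":
    for r in (100.0, 400.0):
        levels = rdac_levels(R_ON, R_ON, R_ON, R_ON, r)
        print(f"R = {r:.0f} ohm: eyes {eye_openings(levels)}, "
              f"mismatch {100 * eye_mismatch(levels):.1f} %")
```

In this simplified model the central eye is the smallest, and raising R narrows the gap between the eyes, in line with the trend reported from simulation; the speed penalty of the larger RC time constant is not captured by this static calculation.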



Figure 3.13 (a) Eye-diagram of Resistor-DAC-based approach at 30Gb/s with R=100  $\Omega$  (b) with R=400  $\Omega$ 

The power consumption of the Resistor-DAC-Based approach is:

$$P = 0.25 \times \left(\frac{v_{dd}^2}{R_{p2}(v_{ds\_p2}) + R_{n1}(v_{ds\_n1}) + 3R} + \frac{v_{dd}^2}{R_{p1}(v_{ds\_p1}) + R_{n2}(v_{ds\_n2}) + 3R}\right) + \frac{f \times v_{dd}^2 \times C_{par}}{2}$$
(3.12)

The simultaneous conduction of P/NMOS still exists but is greatly reduced due to addition of passive resistors in series in comparison with the Direct-Coupling approach.

#### **3.1.4 Capacitor-DAC-Based Approach**



Figure 3.14 (a) Capacitor-DAC-based PAM-4 driver (b) its equivalent model (c) conceptual voltage and current transition waveform

Among the major DAC types, the Current-DAC in CML-based transmitters and the Resistor-DAC in voltage-mode transmitters have both been explored, while the Capacitor-DAC-based approach has been left unexplored. As shown in Fig.3.14 (a), a PAM-4 driver with 2-tap pre-emphasis contains inverters for the main-tap and the post-tap, respectively. Inside each tap, the inverters driven by the MSB and LSB signals are coupled to the output through capacitors with a scaling factor of 2. The pre-emphasis level is set by the capacitor scaling factor between the main-tap and the post-tap. The output can be AC-biased and directly drive the capacitive loading [16] in optical interconnect applications, or be cascaded with buffers to meet the impedance matching requirement in electrical interconnect applications. The output voltage can be written as:

$$v = \frac{msb \times 2C_1 + lsb \times C_1 + msb_p \times 2C_2 + lsb_p \times C_2}{3C_1 + 3C_2} \times \frac{3C_1 + 3C_2}{3C_1 + 3C_2 + C_{par}} \times v_{dd}$$
(3.13)

There are several interesting aspects worth noting from equation (3.13). First, the impact of each digital bit on the output voltage is linearly weighted and combined according to its driving capacitor size, which makes a multi-level signaling transmitter with pre-emphasis easy to implement. Second, the non-linearity that the on-resistance introduces at the inverter output is eliminated from the output voltage, so the output levels are determined solely by the matching of passive capacitors. Third, the output voltage swing can be regulated simply by selecting the ratio between the DAC capacitance and the loading capacitance. Fig.3.15 (a) shows a simulated PAM-4 eye-diagram at 30Gb/s with DAC capacitances of 240fF and 120fF, respectively, and a loading capacitor of 80fF, with the same inverter sizes as in Section 3.1.3. Fig.3.15 (b) demonstrates a post-tap added with 25% pre-emphasis, leading to 16 separate voltage levels in the eye-diagram. The output levels achieve faster settling and higher inter-level linearity than those of the Resistor-DAC-based approach with the same driver sizes and loading capacitances.
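As a quick numerical check of equation (3.13), the following Python sketch enumerates the output levels of the 2-tap driver. The function name `cap_dac_output`, the reading of the quoted DAC capacitances as $C_1$ = 120fF (so that the MSB capacitor is 240fF), the use of the 80fF load as $C_{par}$, and the choice $C_2 = C_1/4$ for 25% pre-emphasis are illustrative assumptions rather than values confirmed by the design.

```python
from itertools import product

def cap_dac_output(msb, lsb, msb_p, lsb_p, c1, c2, c_par, vdd=1.0):
    """Output voltage per equation (3.13).

    The (3*c1 + 3*c2) factors of the two fractions cancel, so only the
    weighted bit sum over the total capacitance remains.
    """
    weighted = msb * 2 * c1 + lsb * c1 + msb_p * 2 * c2 + lsb_p * c2
    return weighted / (3 * c1 + 3 * c2 + c_par) * vdd

# Assumed values (not confirmed by the source): C1 = 120 fF, C_par = 80 fF.
C1, C_PAR = 120e-15, 80e-15

# Main-tap only (C2 = 0): four PAM-4 levels, in ascending order.
main_only = [cap_dac_output(m, l, 0, 0, C1, 0.0, C_PAR)
             for m, l in product((0, 1), repeat=2)]

# Post-tap sized at 25% of the main-tap: all 16 bit combinations.
levels_16 = sorted({round(cap_dac_output(m, l, mp, lp, C1, C1 / 4, C_PAR), 12)
                    for m, l, mp, lp in product((0, 1), repeat=4)})

print(main_only)
print(len(levels_16))
```

Under these assumptions the four main-tap levels are evenly spaced, and the 25% post-tap yields 16 distinct levels, in line with the simulated eye-diagram described above.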



Figure 3.15(a) Eye-diagram of Capacitor-DAC-based approach at 30Gb/s with no pre-emphasis (b) with 25% pre-emphasis

The power consumption of the Capacitor-DAC-Based approach can be calculated as:

$$P = \frac{f \times v_{dd}^2}{2} \times \frac{(3C_1 + 3C_2) \times C_{par}}{3C_1 + 3C_2 + C_{par}}$$
(3.14)

It is noted that the static currents which existed in previous approaches are eliminated in this approach. Fig.3.14 (b-c) shows a 2-tap transmitter transitioning from '00'-'00' to '00'-'11'. The charges associated with the Capacitor-DAC and the loading capacitor are redistributed through the on-resistance to sustain the voltage. Fig.3.16 shows simulated current waveforms conveyed by both MSB and LSB drivers as the PAM-4 waveform varies. Since only dynamic current flows, and only within a fraction of the unit interval (UI), the consumed energy of the Capacitor-DAC-based approach is expected to be substantially lower than that of previous approaches.
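Equation (3.14) can be evaluated directly, as in the minimal sketch below. Equation (3.14) does not define $f$; it is taken here to be the symbol rate, which is an assumption, as are the function name `cap_dac_power` and the example values (15GBaud for 30Gb/s PAM-4, 1.2V supply, $C_1$ = 120fF, no post-tap, $C_{par}$ = 80fF).

```python
def cap_dac_power(f, vdd, c1, c2, c_par):
    """Dynamic power per equation (3.14)."""
    c_dac = 3 * c1 + 3 * c2
    # Series combination of the DAC capacitance and the parasitic/load capacitance.
    c_series = c_dac * c_par / (c_dac + c_par)
    return 0.5 * f * vdd ** 2 * c_series

# Assumed example values, for illustration only.
p = cap_dac_power(f=15e9, vdd=1.2, c1=120e-15, c2=0.0, c_par=80e-15)
print(f"{p * 1e3:.3f} mW")
```

With these assumed numbers the expression evaluates to about 0.71mW for the capacitor network alone; the pre-drivers and clocking are not covered by (3.14).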



Figure 3.16 The PAM-4 output voltage waveform and its corresponding current waveform with Capacitor-DAC-based approach

# **3.1.5 Circuit Implementation**

Fig.3.17 shows the driver design of the PAM-4 transmitter. As a proof of concept of the proposed Capacitor-DAC-based approach, a standard 1.2V supply is applied with no special circuit design for high voltage drivers. The Capacitor-DAC consists of three parts: the main-tap, the post-tap, and the scaling capacitor. The main-tap contains 64 unit capacitors, each with a size of 2fF, driven by a 64x driver, and 32 unit capacitors driven by a 32x driver. Both the MSB and LSB of the post-tap are composed of a group of 4 binary-weighted capacitors and corresponding drivers, and each unit driver can be switched on/off separately for coefficient adjustment. It provides up to 50% level pre-emphasis with a unit tuning step of 3.125%. For optical I/O applications with capacitive loading, the Capacitor-DAC's output can be directly connected to the pad; for electrical I/O applications, a scaling capacitor and a cascaded buffer are added. The scaling capacitor tailors the full-swing output into a proper fraction to fit the input range of the cascaded buffer. With proper design, either a class-A source-follower with tunable biasing current or a class-AB self-biased inverter with segmented unit cells can serve as a linear buffer that fulfills the impedance matching requirement while supporting multi-level signaling.



Figure 3.17 A 2-tap pre-emphasis PAM-4 front-end with Capacitor-DAC-based approach



Figure 3.18 Capacitor-DAC-based PAM-4 transmitter architecture

The architecture of the transmitter is shown in Fig.3.18. The baseband signal is generated by a 32-way PRBS generator and MUXed up by a 32:4 serializer. The clock generation unit is composed of a cascaded TSPC-DFF-based divide-by-2 chain and corresponding re-timers. The 4-way PAM-4 signals are delayed by 1 tap and sent into the 8:4 half-rate MUX to generate the full-rate data stream. This data stream then passes through the sign-selector, which provides pre-emphasis tap sign selection. The single-ended signals are then converted into a pseudo-differential version that drives the Capacitor-DAC. The whole transmitter design is purely based on voltage-mode CMOS and TSPC logic, without any CML or peaking inductors involved.

# **3.1.6 Experiment Results**



Figure 3.19 Die micrograph of the Capacitor-DAC-based PAM-4 transmitter

A prototype PAM-4 transmitter is designed and fabricated in standard 65nm CMOS technology and assembled on board for testing.

Fig.3.19 shows the die micrograph. The active core circuit takes up $150\,\mu\text{m} \times 550\,\mu\text{m}$ of silicon area.



Figure 3.20 (a) Measured eye-diagram at 20Gb/s (b) Measured eye-diagram at 25Gb/s

Fig.3.20 (a) shows a PAM-4 eye-diagram at 20Gb/s with 3 fully opened eyes larger than 38mV vertically and 0.6 UI horizontally. At 25Gb/s data rate as shown in Fig.3.20 (b), the vertical opening remains above 25mV and horizontal opening around 0.4UI.

|                               | This work | [22]  | [28]  | [20]   | [21]  |
|-------------------------------|-----------|-------|-------|--------|-------|
| Architecture                  | VM        | CML   | VM    | CML    | CML   |
| TX FFE                        | 2-taps    | DAC   | No    | 4-taps | No    |
| Data-Rate(Gb/s)               | 25        | 36    | 40    | 25     | 20    |
| mW/Gbps                       | 2.0       | 2.3   | 4.2   | 4.1    | 7.5   |
| Active-Area(mm <sup>2</sup> ) | 0.083     | 0.05  | 0.028 | 0.052  | 0.19  |
| Modulation                    | PAM-4     | PAM-4 | PAM-4 | PAM-4  | PAM-4 |
| Technology(nm)                | 65        | 28    | 14    | 90     | 90    |

Table 3.1 Summary of Performance Comparison of Capacitor-DAC-Based PAM-4 Transmitter with State-of-Art

Table 3.1 summarizes the prototype performance and benchmarks it against other state-of-the-art PAM-4 transmitters. With the proposed technique, the silicon prototype is demonstrated at data rates up to 25Gb/s with an energy efficiency of 2mW/Gbps.

# **3.2 Transmitter Design With Pre-distortion and C2C-Based Architecture**

# 3.2.1 Pre-distortion-enabled transmitter



Figure 3.21 (a) RF transmitter with PAM-4 baseband modulator (b) non-linear input-output curve of RF transmitter (c) distorted and ideal PAM-4 eye-diagram

Besides general purpose I/O applications, the multi-level transmitter can also serve as the baseband modulator for carrier-based communication systems, either wireline or wireless.

A general wireline system occupies a signal bandwidth starting from DC. However, it is noted that this is not always the optimum choice in terms of speed and power efficiency. The claim is based on the observation that the wireline channel response shows sharp notches in certain frequency bands due to the existence of vias or other slots. Complicated equalization schemes at both the transmitter and receiver sides would be necessary to re-open the ISI-distorted eye. Instead, the Radio-Frequency-Interconnect (RFI), or so-called Multi-Tone-Signaling, technique [44] aims to alleviate the issue by modulating the baseband signal up to certain carrier frequencies so as to avoid those severely distorted frequency bands. In other words, this technique slices the whole frequency band into pieces and sets up communications only within the suitable ones.

Another application of the RFI technique is selecting a dedicated frequency window for a specific waveguide material, such as hollow plastic fiber [45].

Besides, terahertz wireless communication has gained more and more attention owing to the potentially large communication bandwidth available. A high-speed baseband modulator is necessary for such applications.

For the applications mentioned above, non-coherent communication is preferable to its coherent counterpart. The reason is that carrier synchronization becomes extremely difficult as the carrier frequency goes up. The cost in terms of power consumption, chip area and implementation complexity would be very high, if not altogether prohibitive. Instead, a non-coherent scheme based on amplitude modulation and de-modulation provides a much simplified system architecture at both the transmitter and receiver sides, thus leading to practical and robust solutions. Due to the ever growing data rate requirement, a multi-level amplitude modulator is in high demand.

The conceptual architecture of amplitude-modulated non-coherent transmitter is shown in Fig.3.21 (a). The signal is generated by modulating the output of a baseband PAM-4 transmitter on to the RF band with a carrier oscillator. The mixing procedure of the baseband and Local-Oscillator (LO) signals is provided by a mixer.

It is noted that the conversion gain of the mixer from baseband to the output is not an ideal linear curve, which is a general result of the non-linear gain of the MOSFET at different input voltage levels, as shown in Fig.3.21 (b). This only marginally affects the system if an NRZ coding scheme is adopted in baseband, since such two-point modulation is inherently linear. However, eye distortion is observed when an advanced multi-level modulation scheme such as PAM-4 is introduced. The reason is that the output swing of the baseband PAM-4 transmitter needs to be high in order to provide sufficient conversion gain. In this situation, the four evenly distributed PAM-4 output levels v1-v4 fully experience the non-linear region of the transfer curve and result in three unevenly distributed eye-openings, as shown in Fig.3.21 (c). The minimum SNR requirement at the receiver side is bounded by the worst-case eye-opening among all the output levels. The SNR expressions at the receiver side for evenly distributed eye-openings and for distorted eye-openings can be written as:

$$SNR_{rx\_normal} = \frac{\frac{1}{\sqrt{2}} \frac{FS}{2^{n} - 1}}{\sqrt{\overline{N}_{ISI}^{2} + \overline{N}_{Circuit}^{2}}}$$
(3.15)

$$SNR_{rx\_distorted} = \frac{\frac{1}{\sqrt{2}} \min(V_{i})}{\sqrt{\overline{N}_{ISI}^{2} + \overline{N}_{Circuit}^{2}}}$$
(3.16)

where FS is the full amplitude of the PAM-4 transmitter output; n is the number of output bits; $\overline{N}_{ISI}$ is the RMS value of the ISI-induced noise; $\overline{N}_{Circuit}$ is the additional electronic noise from both transmitter and receiver; V<sub>i</sub> is the amplitude of the i<sup>th</sup> eye-opening.

Assuming the same full amplitude, the distorted multi-level eye-openings lead to a shrunk SNR at the transmitter output. This, together with all the other non-idealities introduced by the channel and the circuitry itself, will severely degrade the BER performance at the receiver side.
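The SNR penalty implied by equations (3.15) and (3.16) can be illustrated with a short Python sketch. The function names, the reading that the square root in the denominator covers both noise terms, and the example amplitudes and noise values are assumptions chosen only for illustration.

```python
import math

def snr_normal(fs, n_bits, n_isi, n_circuit):
    """Equation (3.15): evenly distributed eye-openings of FS / (2^n - 1)."""
    eye = fs / (2 ** n_bits - 1)
    return (eye / math.sqrt(2)) / math.sqrt(n_isi ** 2 + n_circuit ** 2)

def snr_distorted(eye_openings, n_isi, n_circuit):
    """Equation (3.16): bounded by the smallest eye-opening."""
    return (min(eye_openings) / math.sqrt(2)) / math.sqrt(n_isi ** 2 + n_circuit ** 2)

# Hypothetical numbers: 0.9 V full amplitude, PAM-4 (n = 2), 10 mV of each noise.
even = snr_normal(0.9, 2, 0.01, 0.01)
# Same full amplitude, but the middle eye compressed to 0.2 V.
distorted = snr_distorted([0.35, 0.2, 0.35], 0.01, 0.01)

# Amplitude-ratio SNR, so the penalty in dB uses 20*log10.
penalty_db = 20 * math.log10(distorted / even)
print(f"{penalty_db:.2f} dB")
```

In this example the compressed middle eye costs about 3.5dB relative to the evenly distributed case, although the full amplitude is unchanged.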



Figure 3.22 (a) Mapping baseband input voltages on the non-linear transfer-curve (b) predistorted PAM-4 eye-diagram

One effective method to alleviate this issue is pre-distortion. The transfer curve of the whole transmitter system can be modeled as:

$$vo(i) = f_{mixer}(f_{pam-4}(i))$$
 (3.17)

where  $f_{pam-4}(.)$  is the transfer function from digital input to baseband PAM-4 transmitter output;  $f_{mixer}(.)$  is the transfer function from PAM-4 transmitter output to mixer output.

As discussed before, the non-linearity in the mapping from the digital input i to $vo(i)$ is introduced by $f_{mixer}(.)$, while $f_{pam-4}(.)$ is kept as a linear mapping. If $f_{pam-4}(.)$ is deliberately designed to invert the mapping of $f_{mixer}(.)$, the eye-openings at the mixer output can be evenly distributed. This approach can be expressed as:

$$f_{mixer}^{-1}(vo_{target}(i)) = f_{pam-4\_pre-distort}(i)$$
(3.18)

where $vo_{target}(i)$ is the targeted linear eye-opening level at digital input i; $f^{-1}_{mixer}(.)$ is the inverse transfer function of the mixer; $f_{pam-4\_pre-distort}(.)$ is the deliberately designed transfer function with pre-distortion.

The conceptual explanation is also shown in Fig.3.22. Instead of outputting four evenly distributed voltage levels v1-v4, the PAM-4 transmitter now outputs v1, v2', v3', v4, where v2' and v3' are the two pre-distorted voltage levels that inversely fit the non-linear transfer curve of the mixer. The middle-compressed eye-openings, as shown in Fig.3.22 (b), become evenly distributed after passing through the following mixer stage.
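The inverse mapping of equation (3.18) can be sketched numerically. The S-shaped curve `f_mixer` below is a hypothetical stand-in for the measured mixer transfer curve, and the bisection helper `invert_monotonic` and the normalized input range are likewise illustrative assumptions; only the procedure (choose evenly spaced output targets, then invert the curve) follows the text.

```python
import math

def f_mixer(v):
    """Hypothetical monotonic, S-shaped mixer transfer curve (illustration only)."""
    return 1.0 / (1.0 + math.exp(-6.0 * (v - 0.5)))

def invert_monotonic(f, target, lo, hi, tol=1e-12):
    """Solve f(v) = target by bisection for an increasing f on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

V_MIN, V_MAX = 0.0, 1.0  # normalized baseband swing (assumed)

# Evenly spaced targets at the mixer output, keeping the end levels v1 and v4.
out_lo, out_hi = f_mixer(V_MIN), f_mixer(V_MAX)
targets = [out_lo + k * (out_hi - out_lo) / 3 for k in range(4)]

# Pre-distorted baseband levels v1, v2', v3', v4 per equation (3.18).
predistorted = [invert_monotonic(f_mixer, t, V_MIN, V_MAX) for t in targets]
print([round(v, 3) for v in predistorted])
```

For this assumed curve the pre-distorted levels have a compressed middle eye at baseband, which the mixer then expands into three equal eye-openings.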

## **3.2.2 Circuit Implementation**



Figure 3.23 Binary-to-thermal encoder

As the baseband modulator, the output of the PAM-4 transmitter connects to the gate of the mixer and the loading is mainly the parasitic capacitance at the gate of the input transistor. This is suitable for the application of the proposed Capacitor-DAC-Based multi-level transmitter architecture.

In the previous approach, the weighting of the MSB and LSB of the PAM-4 digital signal is represented by capacitors with a value ratio of 2:1. However, merely adjusting the ratio between MSB and LSB can hardly fulfill the requirement of individually pre-distorting each of the eye-openings. Instead, a 2-3 encoding should be performed so that each of the 3 eye-openings of the PAM-4 transmitter can be adjusted.

The circuit implementation and the truth table of the 2-3 encoder are shown in Fig.3.23. It can be seen that each of the three outputs out1, out2 and out3 represents one of the 3 eye-openings. By adjusting their corresponding capacitor values in the Capacitor-DAC, the amplitude of each eye-opening can thus be controlled.
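A minimal Python sketch of such a 2-3 encoder is given below. The truth table of Fig.3.23 is not reproduced in the text, so the standard binary-to-thermometer mapping is assumed here, and the assignment of out1, out2 and out3 to the lower, middle and upper eye is likewise an assumption.

```python
def encode_2_to_3(msb, lsb):
    """Binary-to-thermometer encoding of one PAM-4 symbol (assumed mapping)."""
    out1 = msb | lsb  # high for symbol levels 1, 2, 3 (assumed: lower eye)
    out2 = msb        # high for symbol levels 2, 3    (assumed: middle eye)
    out3 = msb & lsb  # high for symbol level 3        (assumed: upper eye)
    return out1, out2, out3

truth_table = {(m, l): encode_2_to_3(m, l) for m in (0, 1) for l in (0, 1)}
for bits, outs in truth_table.items():
    print(bits, "->", outs)
```

With this mapping the number of asserted outputs equals the symbol level, so each output switches exactly one eye-opening and its capacitor weight can be tuned independently.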



Figure 3.24 Pre-distortion-enabled Capacitor-DAC-based PAM-4 transmitter front-end

The circuit implementation of the Capacitor-DAC-based transmitter with pre-distortion is shown in Fig.3.24. The Capacitor-DAC is divided into three equal sub-sections, corresponding to the upper, middle and lower eyes of the PAM-4 transmitter output, respectively. Each sub-section is composed of 5-bit binary-weighted capacitors and their correspondingly sized inverter-based drivers for tuning purposes. Each inverter-based driver is connected to a 3-to-1 MUX, whose 3 inputs are the three encoded digital bits representing the PAM-4 signal. With this implementation scheme, any capacitor in the Capacitor-DAC can be connected to any of the three digital bits, which provides maximum flexibility in terms of pre-distortion while keeping the total output amplitude constant.



Figure 3.25 (a) Evenly distributed PAM-4 eye-diagram (b) with bottom eye stretched (c) with middle-eye stretched (d) with top-eye stretched

The simulated output eye-diagrams of the proposed pre-distortion-enabled PAM-4 transmitter at 20Gbps are shown in Fig.3.25. Fig.3.25 (a) shows 3 evenly distributed eye-openings with a capacitor ratio among the three sub-sections of 31:31:31. Setting the capacitor ratios from lower eye to upper eye to 47:23:23 leads to a stretched lower eye, as shown in Fig.3.25 (b). Similarly, settings of 23:47:23 and 23:23:47 lead to a stretched middle eye and a stretched upper eye, as shown in Fig.3.25 (c) and (d), respectively.
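The relation between the capacitor ratios and the eye-openings can be sketched as follows. The sketch assumes ideal charge sharing, ignores the load capacitance (which would scale all eyes equally), and normalizes the swing to 1; the function names are hypothetical.

```python
def pam4_levels(weights, vdd=1.0):
    """Four output levels for (lower, middle, upper) unit-capacitor counts."""
    total = sum(weights)
    levels = [0.0]
    for w in weights:
        # Each thermometer bit adds its share of the total capacitance.
        levels.append(levels[-1] + w / total * vdd)
    return levels

def eye_openings(weights, vdd=1.0):
    """Vertical spacing between adjacent levels."""
    lv = pam4_levels(weights, vdd)
    return [b - a for a, b in zip(lv, lv[1:])]

for ratio in [(31, 31, 31), (47, 23, 23), (23, 47, 23), (23, 23, 47)]:
    print(ratio, [round(e, 3) for e in eye_openings(ratio)])
```

All four settings use 93 unit capacitors in total, so the full output amplitude stays constant while individual eyes are stretched or compressed.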

#### **3.2.3 C2C-DAC based architecture with equalization**



Figure 3.26 C2C-DAC architecture

The combining of pre-distortion capability with the Capacitor-DAC-Based transmitter has been successfully demonstrated in the previous section. One following challenging task is to merge the FFE into the existing architecture.

In Section 3.1, the post-tap is combined with the main-tap through direct capacitor coupling, and the coefficient is adjusted by tuning the binary-weighted capacitor values. In Section 3.2.2, the three sub-sections are also directly capacitor coupled, and the amplitude of each eye-opening is adjusted by selecting the digital input bit connected to each binary-weighted capacitor. When these two techniques are merged directly, the total number of binary-weighted capacitors becomes considerably large. For instance, adding one post-tap FFE with 4-bit tuning to the existing pre-distortion-enabled transmitter with 5-bit eye-opening level adjustment leads to a directly coupled 9-bit Capacitor-DAC. This leads to a capacitor ratio of 512 between the MSB and LSB, which severely complicates the driver sizing and the routing of the signal lines, not to mention the exponentially increased total capacitance within the DAC.

Fortunately, it is not necessary to limit the Capacitor-DAC architecture to a directly binary-weighted one. Instead, the C2C DAC architecture can be introduced. Owing to its simple but elegant recursive characteristic, the implementation of the transmitter can be much more compact, faster and more power-efficient.

A typical C2C DAC is shown in Fig.3.26. For an n-bit approach, each bit is composed of a unit capacitor C driven by a unit driver. Between each pair of neighboring bits, a capacitor with a value of 2C bridges the unit capacitors C of these two bits.

The output voltage of the C2C DAC can be written as:

$$vo = \sum_{i=0}^{n-1} \frac{D[i]}{2^{n-i}} \times vdd$$
(3.19)

One specific characteristic of the C2C structure is that the equivalent capacitance at the output node is constant no matter how many bits are employed. A simple proof follows. Starting at the leftmost node of the DAC, the total capacitance of the grounded unit C in parallel with the unit C for D[0] is 2C. This 2C is in series with the bridging 2C, so the total equivalent capacitance is C. This equivalent capacitance is again in parallel with the unit C for D[1]. The calculation repeats up to the rightmost node, which is the output.
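The recursion in this proof can be checked with exact rational arithmetic, as in the Python sketch below. The function names are hypothetical, the unit capacitor is normalized to 1, and each driven unit capacitor is treated as AC-grounded at its driver, as in the proof.

```python
from fractions import Fraction

def series(a, b):
    """Series combination of two capacitances."""
    return a * b / (a + b)

def c2c_node_capacitances(n_bits, c=Fraction(1)):
    """Capacitance at each bit node, looking left and including that bit's unit C."""
    looking_left = c  # grounded terminating unit C at the leftmost node
    result = []
    for _ in range(n_bits):
        node = looking_left + c             # in parallel with the bit's unit C
        result.append(node)
        looking_left = series(node, 2 * c)  # through the 2C bridge to the next node
    return result

print(c2c_node_capacitances(4))
```

Every node evaluates to exactly 2C regardless of the number of bits, which is the invariance stated above.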



Figure 3.27 Pre-distortion-enabled C2C-based PAM-4 transmitter front-end

Replacing the unit capacitor 'C' in Fig.3.26 with the binary-weighted Capacitor-DAC in Fig.3.24 leads to the pre-distortion-enabled PAM-4 transmitter with one-tap FFE, as shown in Fig.3.27. As discussed before, 5-bit tuning of the 3 eye-opening amplitudes leads to a binary-weighted sub Capacitor-DAC of 93 unit capacitors. Four such sub Capacitor-DACs are recursively cascaded with inter-stage bridging capacitors of 186 unit capacitors, leading to a main-tap and a 3-bit tunable post-tap.

Compared with the directly binary-weighted approach, the C2C-DAC-based approach has the following advantages.

First, it effectively reduces the maximum-to-minimum capacitor and driver size ratio from 512 to 16, which simplifies the corresponding design and routing issues.

Second, the total capacitance at the output node is only 186 unit capacitors and does not change even with additional tuning bits of the post-tap, which leads to a significant reduction of the total capacitance, assuming the same unit capacitor value as in the directly binary-weighted approach.

In addition, the highly structured architecture leads to compact layout design.



Figure 3.28 Top-level architecture of pre-distortion-enabled C2C-based PAM-4 transmitter

The transmitter architecture is shown in Fig.3.28. The half-rate digital signals msb[1:0] and lsb[1:0] are delayed by 1 UI to generate both the half-rate main-tap and post-tap signals. The half-rate signals pass through an 8:4 serializer. After being muxed up, the two full-rate tap signals enter the 2-3 encoder, where they are split for pre-distorting the 3 eyes. In the front-end DAC design, there are 3 bits of tuning for the post-tap. The tuning is performed by selecting either the main-tap or the post-tap as the input signal for all the capacitors within that tuning bit. After tap selection, the sign of the post-tap is selected according to the equalization requirement. The signals are then converted into a differential version that drives the front-end C2C Capacitor-DAC.



Figure 3.29 (a) Eye-diagram of the proposed C2C-based PAM-4 transmitter after passing through an 8-inch FR-4 trace with no equalization (b) with equalization

To verify the effectiveness of the FFE, the output of the Capacitor-DAC is further scaled passively and cascaded with a source-follower-based buffer before entering the channel. Fig.3.29 (a) shows the simulated PAM-4 eye-diagram at 20Gbps after passing through an 8-inch FR-4 PCB trace. It can be seen that the eyes are partially closed both vertically and horizontally due to the ISI-induced noise. With the post-tap enabled, the PAM-4 eye-openings are significantly improved, which demonstrates the feasibility of the proposed pre-distortion- and FFE-enabled transmitter with the C2C-DAC-based architecture.

# **3.3 A 34Gbps Voltage-Mode PAM-4 Wireline Transmitter With 2-Tap Feed-Forward-Equalization Utilizing R2R-Based Architecture**

## **3.3.1 Revisited Resistor-DAC-Based Approach**



Figure 3.30 (a) Resistor-DAC-based PAM-4 transmitter front-end w/o equalization (b) w/i two-tap equalization

In Section 3.1.3, PAM-4 transmitter based on Resistor-DAC has been discussed in terms of its trade-off between speed and linearity. This section will provide analysis more from a view of implementation complexity.

Fig.3.30 (a) shows a voltage-mode PAM-4 transmitter front-end with the Resistor-DAC-based approach and no equalization tap. Adding tunable equalization taps brings significant design complexity to such an architecture. Fig.3.30 (b) shows the transmitter front-end with 2-tap FFE. Each unit slice is composed of the following components: 2 unit inverters and one unit resistor for the msb; 1 unit inverter and two unit resistors for the lsb; and two muxes for coefficient adjustment. For 3-bit post-tap tunability, 8 such slices are necessary. In addition, supposing the post-tap covers at most 50% of the amplitude of the main-tap msb, the main-tap needs to be composed of 24 unit slices. In total, these lead to 96 unit resistors, 96 unit drivers and 64 unit muxes for the transmitter front-end. This huge number of unit components brings several issues in terms of both circuit design and layout. First, the loading capacitance of the pre-driver is unavoidably large, since there are 24 muxes connected to the main-tap pre-driver and 8 connected to the post-tap pre-driver. Second, the layout of the front-end occupies a considerable area, which makes the routing of the critical full-data-rate signal wires relatively long. Additional repeaters along the wires are needed to maintain the driving capability. In addition, the long wires connecting each slice inside the front-end bring additional parasitic resistance, which may change the contribution ratio of each slice to the output voltage.
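The component totals quoted above follow directly from the per-slice composition, as the short Python sketch below shows. The function name and the dictionary keys are hypothetical; the per-slice counts and the slice numbers are those given in the text.

```python
def resistor_dac_unit_counts(main_slices, post_slices):
    """Unit-component totals for the sliced Resistor-DAC front-end."""
    slices = main_slices + post_slices
    inverters_per_slice = 2 + 1  # 2 for the msb, 1 for the lsb
    resistors_per_slice = 1 + 2  # 1 for the msb, 2 for the lsb
    muxes_per_slice = 2          # coefficient adjustment
    return {
        "resistors": slices * resistors_per_slice,
        "drivers": slices * inverters_per_slice,
        "muxes": slices * muxes_per_slice,
    }

# 24 main-tap slices and 2**3 = 8 post-tap slices for 3-bit tunability.
counts = resistor_dac_unit_counts(24, 2 ** 3)
print(counts)
```

The 32 slices give 96 unit resistors, 96 unit drivers and 64 unit muxes, and each extra bit of post-tap tunability doubles the post-tap slice count in this scheme.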

#### **3.3.2 R2R-DAC-Based Approach**

In Section 3.2, the C2C-DAC-based approach is proposed to simplify the transmitter front-end architecture. Although the Capacitor-DAC-based approach has numerous advantages as discussed before, it unavoidably needs 50 ohm output buffers when applied to electrical I/O applications, and such a buffer may shrink the available output voltage swing. At the cost of higher power consumption, the Resistor-based approach offers the possibility of a high-swing output with direct 50 ohm impedance matching. As the dual counterpart of the C2C-DAC, the R2R-DAC-based approach is considered here as an alternative and more compact implementation of voltage-mode multi-level transmitters that utilizes passive resistors for direct impedance matching.



Figure 3.31 (a) Proposed R2R-DAC-based 2-tap FFE enabled PAM-4 transmitter front-end (b) extend the architecture to N-tap equalization

As shown in Fig.3.31 (a), for a 2-bit case, instead of connecting the msb and lsb drivers to resistors with a 2:1 ratio and then tying them all together at the single output node, both the msb and the lsb drivers are now connected to resistors of the same value 2R, and the other sides of the two resistors are bridged by an inter-stage resistor with a value of R. It is noted that the structure can be recursively extended beyond 2 bits by cascading such R/2R sections. Due to its well-known recursive characteristic, the equivalent input resistance remains 2R no matter how many sections are cascaded, which is convenient for signal channel matching. The output voltage for an n-bit cascade can be written as:

$$v_o = \frac{1}{2} \sum_{i=0}^{n-1} \frac{1}{2^{n-i}} \times d[i]$$
(3.20)

To construct a voltage-mode PAM-4 transmitter front-end with functionality similar to that in Section 3.3.1, an 8-bit R2R DAC architecture can be employed by assigning d[7] and d[6] to the msb and lsb of the main-tap, and d[5]-d[0] to the post-tap with 3-bit tunability. Compared with the previous approach, only 26 unit resistors, 8 unit drivers and 8 muxes are needed, which effectively simplifies the transmitter front-end structure and thus significantly relieves the design burden in terms of both circuit and layout.
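Equation (3.20) and the main-tap bit assignment can be illustrated with a brief Python sketch. The function name is hypothetical, the result is normalized to the supply, and only the main-tap bits d[7] and d[6] are exercised, since the exact mapping of the post-tap signals onto d[5]-d[0] is not spelled out here.

```python
def r2r_output(d):
    """Normalized output per equation (3.20); d[0] is the LSB, d[n-1] the MSB."""
    n = len(d)
    return 0.5 * sum(bit / 2 ** (n - i) for i, bit in enumerate(d))

# 8-bit DAC with the post-tap bits d[5]..d[0] held at 0.
main_tap_levels = []
for msb in (0, 1):
    for lsb in (0, 1):
        d = [0] * 6 + [lsb, msb]  # d[6] = lsb, d[7] = msb
        main_tap_levels.append(r2r_output(d))

print(main_tap_levels)
```

The four main-tap levels come out evenly spaced at 0, 1/8, 1/4 and 3/8 of the supply, consistent with the leading factor of 1/2 in (3.20).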

The architecture can be further expanded into a multi-tap FFE-enabled one by embedding the sub-Resistor-DAC into the R2R architecture. As shown in Fig.3.31 (b), for N additional taps, each 2R part in Fig.3.31 (a) can be split evenly into 2N sub-sections, N of which are composed of a resistor with a value of $3R \times N$ while the other N are composed of a resistor with a value of $6R \times N$. Each sub-section has a driver and a dedicated mux for coefficient adjustment. The output voltage can be expressed as:

$$v_o = \frac{1}{2} \sum_{i=1}^{m} \frac{1}{2^i} \sum_{k=0}^{n-1} \frac{1}{n} (msb[k] + \frac{1}{2} \times lsb[k]) \times tap\_sel[i][k]$$
(3.21)

where m is the number of bits for coefficient adjustment, n is the number of taps, tap\_sel[i][k] decides whether the i-th bit of tap k is connected to the signal or to ground, and msb[k] and lsb[k] represent the 2-bit input signal of the k-th tap.
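That the split sub-sections still present 2R to the ladder can be verified with exact arithmetic, as in the Python sketch below. The helper names are hypothetical and R is normalized to 1.

```python
from fractions import Fraction

def parallel(values):
    """Parallel combination of a list of resistances."""
    return 1 / sum(1 / Fraction(v) for v in values)

def split_2r_branch(n_taps, r=Fraction(1)):
    """Sub-section resistors replacing one 2R branch: N of 3RN and N of 6RN."""
    msb_parts = [3 * r * n_taps] * n_taps
    lsb_parts = [6 * r * n_taps] * n_taps
    return msb_parts + lsb_parts

for n_taps in range(1, 6):
    print(n_taps, parallel(split_2r_branch(n_taps)))
```

The N resistors of 3RN combine to 3R and the N resistors of 6RN combine to 6R, and 3R in parallel with 6R is 2R, so the recursive property of the ladder is preserved while the msb and lsb paths keep their 2:1 weighting.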

Fig.3.32 (a) shows the R-C circuit model of the R2R front-end for a 2-bit case. $C_{p0}$ and $C_{p2}$ model the parasitic capacitance at each driver's output as well as at one side of the resistor; $C_{p1}$ and $C_{p3}$ model the parasitic capacitance at the junction nodes of the cascaded R-2R sections; and $C_L$ models the total parasitic capacitance at the output node. By employing zero-value-time-constant analysis, the bandwidth can be estimated as:

$$\omega_{-3dB} = \frac{1}{\sum_{i=0}^{n-1} \tau_i} = \frac{1}{\sum_{i=0}^{n-1} R_{eqi} \times C_i} \approx \frac{1}{R \times C_L}$$
(3.22)

where $R_{eqi}$ is the equivalent resistance at node i and $\tau_i = R_{eqi} \times C_i$ is the corresponding time constant. It is noted that while the equivalent resistance at each node is approximately R due to the recursive structure of the R2R front-end, the parasitic capacitance at the output node is orders of magnitude larger than that at all the other nodes due to the combined effect of the pad, bonding and channel. Besides FFE, circuit techniques such as T-coil peaking [28] are compatible with this structure and can be explored to further extend the bandwidth.



Figure 3.32 (a) R-C model for speed analysis (b) for power analysis (c) for linearity analysis

As in Fig.3.32 (b), both static current consumption and dynamic current consumption exist in the PAM-4 transmitter front-end. One branch of the static current flows through the loading resistor, while the other flows through the R2R network to build up the corresponding voltage at each junction node. The dynamic part is mainly composed of the charging/discharging of node parasitic capacitance when its voltage changes. The power consumption for the 2-bit case can be estimated as:

$$P \approx \frac{1}{4} \left( \frac{29 \times vdd}{64 \times R} + \frac{20 \times vdd}{64 \times R} + \frac{21 \times vdd}{64 \times R} \right) \times vdd + C_L \left( \frac{vdd}{2} \right)^2 \times f$$
(3.23)



Figure 3.33 (a) non-linear eye-diagram with stretched middle-eye (b) non-linear eye-diagram with compressed middle-eye (c) RLM as driver's on-resistance varies (d) RLM as the ratio between on-resistance and 2R varies

Another issue worth noting is the non-linearity that the on-resistance introduces to the output eye-diagram. As shown in Fig.3.32 (c), the on-resistance of the driver is in series with the 2R part and unavoidably affects the output voltage. Supposing the P/NMOS of the driver are properly sized so that both the pull-up and pull-down on-resistances are $R_p$, the output can be written as:

$$v_{o} = \frac{msb \times (12R^{2} + 3R_{p}R) + lsb \times (6R^{2} + 3R_{p}R)}{32R^{2} + 15R_{p}R + R_{p}^{2}} \times vdd$$
(3.24)

And the sensitivity of the output voltage in terms of  $R_p$  can be derived as:

$$\frac{\partial v_o}{\partial R_p} = \frac{msb \times (-24R^2R_p - 84R^3 - 3RR_p^2) + lsb \times (-12R^2R_p + 6R^3 - 3RR_p^2)}{(32R^2 + 15R_pR + R_p^2)^2} \times vdd$$
(3.25)
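Equation (3.25) can be cross-checked against a central finite difference of equation (3.24), as in the Python sketch below. The function names, the normalization of vdd to 1 and the test point (R = 25, $R_p$ = 10) are assumptions made for illustration.

```python
def v_out(msb, lsb, r, rp, vdd=1.0):
    """Output voltage per equation (3.24)."""
    num = msb * (12 * r**2 + 3 * rp * r) + lsb * (6 * r**2 + 3 * rp * r)
    den = 32 * r**2 + 15 * rp * r + rp**2
    return num / den * vdd

def dv_drp(msb, lsb, r, rp, vdd=1.0):
    """Sensitivity to the on-resistance per equation (3.25)."""
    num = (msb * (-24 * r**2 * rp - 84 * r**3 - 3 * r * rp**2)
           + lsb * (-12 * r**2 * rp + 6 * r**3 - 3 * r * rp**2))
    den = (32 * r**2 + 15 * rp * r + rp**2) ** 2
    return num / den * vdd

def central_difference(msb, lsb, r, rp, h=1e-4):
    """Numerical derivative of (3.24) with respect to rp."""
    return (v_out(msb, lsb, r, rp + h) - v_out(msb, lsb, r, rp - h)) / (2 * h)

for msb in (0, 1):
    for lsb in (0, 1):
        print(msb, lsb, dv_drp(msb, lsb, 25.0, 10.0),
              central_difference(msb, lsb, 25.0, 10.0))
```

The analytic and numerical values agree for all four input codes at this test point, which supports the term-by-term form of (3.25).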

Suppose the resistor in series with $R_p$ is $R_c$. It can be seen from equations (3.24) and (3.25) that a mismatch of Rp+Rc from 2R will lead to a distorted eye-diagram at the output node. Fig.3.33 (a)-(b) shows the simulated eye-diagrams based on R-C circuit models with R=25$\Omega$, Rc=40$\Omega$, and Rp set to 0$\Omega$ and 20$\Omega$, respectively. It demonstrates that if the total resistance is larger than 2R, the middle eye is compressed, while it is stretched in the opposite case. One metric for quantifying the non-linearity is the ratio-of-level-mismatch (RLM) [46], which is defined as:

$$RLM = 3 \times \frac{v_{\min}}{v_{pk-pk}}$$
(3.26)

where $v_{min}$ is the smallest eye-opening of the three eyes while $v_{pk-pk}$ is the sum of their amplitudes. Targeting 50$\Omega$ impedance matching, Fig.3.33 (c) shows the simulated RLM when varying R<sub>p</sub> by +/-100% around its pre-set ideal value of 10$\Omega$ while keeping Rc=40$\Omega$. As expected, RLM reaches its ideal value when R<sub>c</sub>+R<sub>p</sub>=2R and decreases monotonically as R<sub>p</sub> deviates from its ideal value. Fig.3.33 (d) demonstrates a monotonic decrease of the RLM when keeping the same 50% deviation ratio of R<sub>c</sub> from its pre-set ideal value but changing the relative ratio of R<sub>c</sub> to 2R from low to high. These results indicate that the sizing of the driver and the resistor should be purposely co-designed in order to improve the RLM. A larger driver size helps to alleviate the eye distortion caused by on-resistance variation.
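The RLM of equation (3.26) can be evaluated on the four levels given by equation (3.24), as in the Python sketch below. The argument `r_excess` stands for the series resistance in excess of 2R, which is how $R_p$ enters (3.24); equating it with $R_p + R_c - 2R$ is a reading of the text rather than a stated result, and only non-negative values are exercised. The function names are hypothetical.

```python
def rlm(levels):
    """Ratio-of-level-mismatch per equation (3.26) for four PAM-4 levels."""
    lv = sorted(levels)
    eyes = [b - a for a, b in zip(lv, lv[1:])]
    return 3 * min(eyes) / (lv[-1] - lv[0])

def r2r_levels(r, r_excess, vdd=1.0):
    """PAM-4 output levels per equation (3.24) for codes 00, 01, 10, 11."""
    den = 32 * r**2 + 15 * r_excess * r + r_excess**2

    def v(msb, lsb):
        num = (msb * (12 * r**2 + 3 * r_excess * r)
               + lsb * (6 * r**2 + 3 * r_excess * r))
        return num / den * vdd

    return [v(0, 0), v(0, 1), v(1, 0), v(1, 1)]

for r_excess in (0.0, 5.0, 10.0, 20.0):
    print(r_excess, round(rlm(r2r_levels(25.0, r_excess)), 4))
```

With R = 25 the RLM equals 1 at zero excess resistance and falls monotonically as the excess grows, because only the middle eye lacks the $3R_pR$ term in (3.24) and is therefore compressed.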

An alternative method to mitigate the effect of on-resistance variation on output linearity is to employ feedback. As shown in Fig.3.34, an inverter with cascaded PMOS and NMOS transistors can serve as the driver. The gate voltages of its top and bottom transistors are provided by the replica regulation block. Two cascaded NMOS transistors with the same size as those in the driver are loaded by a resistor with value $R_{inv}$. The output node is connected to an op-amp, whose other terminal is connected to half the vdd as a reference voltage. Through negative feedback, the gate bias voltage $v_{bn}$ is generated so as to ensure that the on-resistance of the cascaded NMOS transistors equals $R_{inv}$. $v_{bp}$ is generated in a similar way by employing PMOS-style replica biasing. By choosing the value of $R_c$ as $2R-R_{inv}$, the non-linearity introduced by the driver on-resistance can be mitigated, and this compensation tracks PVT variations. One side effect of this technique is that such cascading unavoidably degrades the speed of the driver, so a larger device size is needed to compensate for it.



Figure 3.34 Regulate on-resistance through replica-biasing

# 3.3.3 Circuit Implementation



Figure 3.35 R2R-DAC-Based Transmitter Architecture

Fig.3.35 shows the transmitter architecture. An on-chip double-edge-triggered PRBS generator provides 32 channels of baseband data, which are MUXed up by a two-clock-phase 32:4 serializer. A cascade of 4 TSPC-DFF-based divide-by-2 stages and corresponding re-timers provides clocking for the PRBS generator and the serializer. To implement the 2-tap FFE, both the original and the 1-tap delayed PAM-4 signals pass through the front-end 8:4 half-rate MUX to generate the full-rate data stream. The last digital block, a single-to-differential converter, provides the complementary differential signals that drive the R2R-DAC. As discussed before, the 2 msbs of the 8-bit DAC are assigned to the main-tap, while the remaining bits are for the post-tap with 3-bit coefficient tunability. The driver and resistor network are carefully co-designed to meet the channel matching requirement and to maximize the RLM performance. Based on the proposed R2R architecture, the transmitter is digital-intensive with CMOS and TSPC dynamic logic only, and there is no need for any level-shifting or voltage-scaling between different logic boundaries.

## **3.3.4 Experiment Results**



Figure 3.36 Die micrograph of the R2R-DAC-Based PAM-4 transmitter

To verify the proposed architecture, a PAM-4 transmitter is designed and fabricated in standard 65nm CMOS technology and tested with the bare die assembled on board.

Fig.3.36 shows the die micrograph. The core digital part takes up an active area of $170\,\mu\text{m} \times 350\,\mu\text{m}$ while the silicon area consumed by the R2R-DAC is $50\,\mu\text{m} \times 340\,\mu\text{m}$.



Figure 3.37 Measured eye-diagram at 20Gb/s w/o equalization

Fig.3.37 shows a single-ended PAM-4 eye-diagram at 20Gb/s with no equalization. The three eyes are fully open, with a total peak-to-peak differential voltage swing of around 760mV and a horizontal sampling window of 0.6UI.



Figure 3.38 (a) Measured eye-diagram at 34Gb/s w/o equalization (b) with post-tap equalization

At a data rate of 34Gb/s, as shown in Fig.3.38 (a), the eyes are fully closed due to severe ISI. Fig.3.38 (b) demonstrates that with post-tap equalization the three eyes are re-opened, each with a vertical opening of around 80mV and a horizontal opening of around 0.34UI.
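For reference, the horizontal openings quoted in unit intervals can be converted to absolute time, since PAM-4 carries two bits per symbol. The helper names below are hypothetical; the inputs are the measured values quoted above.

```python
def unit_interval_ps(bit_rate_gbps, bits_per_symbol=2):
    """Symbol period in picoseconds; PAM-4 carries 2 bits per symbol."""
    baud_rate_gbd = bit_rate_gbps / bits_per_symbol
    return 1000.0 / baud_rate_gbd


def eye_width_ps(bit_rate_gbps, width_ui):
    """Convert a horizontal eye opening from UI to picoseconds."""
    return width_ui * unit_interval_ps(bit_rate_gbps)


if __name__ == "__main__":
    for rate, width in [(20, 0.6), (34, 0.34)]:
        print(f"{rate} Gb/s: UI = {unit_interval_ps(rate):.1f} ps, "
              f"eye width = {eye_width_ps(rate, width):.1f} ps")
```

Under this conversion the 0.6UI window at 20Gb/s corresponds to about 60 ps, and the 0.34UI opening at 34Gb/s to about 20 ps.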

Table 3.2 summarizes the prototype performance and benchmarks it against other state-of-the-art PAM-4 transmitters. Although implemented in 65nm CMOS technology, the prototype shows power/speed performance competitive with designs in much more advanced technology nodes.

|                               | This work | [22]  | [28]  | [20]   | [21]  |
|-------------------------------|-----------|-------|-------|--------|-------|
| Architecture                  | VM        | CML   | VM    | CML    | CML   |
| TX FFE                        | 2-taps    | DAC   | No    | 4-taps | No    |
| Data-Rate(Gb/s)               | 34        | 36    | 40    | 25     | 20    |
| mW/Gbps                       | 2.7       | 2.3   | 4.2   | 4.1    | 7.5   |
| Active-Area(mm<sup>2</sup>)   | 0.077     | 0.05  | 0.028 | 0.052  | 0.19  |
| Modulation                    | PAM-4     | PAM-4 | PAM-4 | PAM-4  | PAM-4 |
| Technology(nm)                | 65        | 28    | 14    | 90     | 90    |

Table 3.2 Summary of Performance Comparison for R2R-DAC-Based PAM-4 Transmitter With State-of-Art
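As a quick cross-check of Table 3.2, the total power implied by each entry is the product of its data rate and its energy efficiency, and the active area of this work can be compared with the two block dimensions given for Fig.3.36. The implied power values are derived here and are not reported in the table; treating the listed active area as the sum of the digital core and the R2R-DAC is likewise an assumption.

```python
# (data rate in Gb/s, efficiency in mW/Gbps), copied from Table 3.2.
TABLE_3_2 = {
    "This work": (34, 2.7),
    "[22]": (36, 2.3),
    "[28]": (40, 4.2),
    "[20]": (25, 4.1),
    "[21]": (20, 7.5),
}


def implied_power_mw(rate_gbps, mw_per_gbps):
    """Total power implied by the data rate and the energy efficiency."""
    return rate_gbps * mw_per_gbps


def area_mm2(width_um, height_um):
    """Rectangle area in mm^2 from dimensions in micrometres."""
    return width_um * height_um * 1e-6


POWER_MW = {name: implied_power_mw(*row) for name, row in TABLE_3_2.items()}
# Assumption: active area = digital core + R2R-DAC, dimensions from Fig.3.36.
CORE_AREA_MM2 = area_mm2(170, 350) + area_mm2(50, 340)

if __name__ == "__main__":
    for name, power in POWER_MW.items():
        print(f"{name:>9}: {power:6.1f} mW")
    print(f"digital core + R2R-DAC area: {CORE_AREA_MM2:.4f} mm^2")
```

For this work the implied power is about 91.8 mW, and the two blocks sum to about 0.0765 mm², consistent with the 0.077 mm² listed.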

# **Chapter 4 Conclusion**

This dissertation explores several novel design techniques for high-speed power-efficient data interfaces, involving data converters that convey information both from the analog to the digital domain and from the digital to the analog domain.

In the first part, the design of dual-mode ADCs supporting both Nyquist-Sampling and Compressive-Sensing is discussed. As a newly developed yet powerful signal processing framework, compressive-sensing theory and its related hardware implementations have attracted extensive attention from both the signal processing and the circuit design communities. By exploiting signal sparsity through randomness embedded into the sampling procedure, compressive-sensing can effectively reduce the total number of measurements while still robustly recovering the original signal with the associated DSP techniques. This characteristic lends itself to a broad range of potential applications, such as communication and biomedical systems. Rather than being a direct replacement for the general-purpose Nyquist-Sampling framework, the Compressive-Sensing framework is expected to serve as an alternative, application-oriented tool in the signal processing toolkit. Instead of developing the theory further, this work focuses on ADC architectures and circuit design techniques that effectively map the theory into real-world implementations from a hardware designer's perspective. For this purpose, two experimental hardware prototypes are designed.

The first one is an 8-bit dual-mode ADC with 0.5GS/s Nyquist-Sampling speed and 4GS/s equivalent Compressive-Sensing speed for certain spectrally sparse signals. A fully self-timed pipeline scheme is specifically designed to accommodate the randomness-embedded sampling and quantization procedure. A passive-charge-sharing technique together with an open-loop amplifier is adopted to improve the inter-stage residue transfer speed and power efficiency compared with the conventional closed-loop counterpart. Also, a hybrid two-stage SAR-BS architecture is proposed to accelerate the pipeline cycle and to absorb the inter-stage gain error and non-linearity through the corresponding calibrations.

An alternative approach is demonstrated with the second prototype, which investigates a hybrid architecture that combines signal processing in both the voltage domain and the time domain. The pipelined two-stage ADC is composed of a SAR-ADC front-end and a TDC back-end. Unlike the static open-loop amplifier employed in the first prototype, a cross-domain voltage-time converter based on dynamic logic is adopted for inter-stage residue transfer, so that its power consumption scales with the operating frequency. A locally-readjusted scheme with embedded time-adjusters is employed to alleviate the timing offset issues present in the 2D-Vernier TDC architecture. A two-fold coarse-fine mapping scheme together with post digital calibration is utilized to align the references between the voltage and time domains and to boost the conversion performance. This approach provides a highly digital-intensive, energy-efficient, high-speed ADC candidate for both CS operation on certain spectrally sparse signals and general-purpose data conversion applications.
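The Vernier principle underlying the TDC back-end can be illustrated with a minimal behavioural sketch. It models a plain one-dimensional Vernier delay line, not the 2D locally-readjusted architecture of the prototype, and the delay values, stage count and function name are illustrative assumptions.

```python
def vernier_tdc(interval_ps, slow_ps=25, fast_ps=20, stages=64):
    """Quantize a start-to-stop interval with a 1D Vernier delay line.

    The start edge travels through slow stages and the later stop edge
    through fast stages; the output code is the first stage at which the
    stop edge has caught up. The resolution is slow_ps - fast_ps.
    """
    for k in range(stages + 1):
        start_arrival = k * slow_ps
        stop_arrival = interval_ps + k * fast_ps
        if stop_arrival <= start_arrival:
            return k
    return stages  # interval exceeds the range of the line: saturate


if __name__ == "__main__":
    for t in (0, 5, 12, 100):
        print(f"{t:3d} ps -> code {vernier_tdc(t)}")
```

The achievable resolution is the difference of the two stage delays (5 ps under the assumed values) rather than a single gate delay, which is the property that makes Vernier structures attractive for fine time quantization.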

In the second part, several design techniques for voltage-mode high-speed multi-level wireline transmitters equipped with equalization are discussed. Because the non-idealities introduced by pads, packaging parasitics, vias and PCB traces lead to considerable voltage amplitude loss and severe ISI in high-speed wireline communications, complicated equalization schemes with bulky inductors are needed at both the transmitter and the receiver side to compensate for the performance degradation. This unavoidably results in a significant increase in power and area consumption. Instead of pushing the symbol rate up to the technology limit, an alternative approach is to extend NRZ signaling to multi-level signaling. A multi-level transmitter can be regarded as a low-resolution but ultra-high-speed DAC. Various DAC architectures are therefore considered in this part, and their pros and cons for this design target are discussed.

The first technique discussed is a Capacitor-DAC-based approach for PAM-4 transmitter design with 2-tap equalization. Compared with the previous CML-based and Resistor-DAC-based approaches, the Capacitor-DAC-based approach offers the following advantages: first, the multi-level voltage is linearly combined at the output in a direct binary-weighted form, which simplifies the front-end design; second, the non-linearity introduced by the inverter-based driver is eliminated from the settled output voltage; third, it can potentially provide a high swing for directly driving capacitive loads such as a Micro-Ring-Modulator in optical fiber applications; in addition, the drivers conduct for only a fraction of the symbol period and no dc current flows within the Capacitor-DAC after settling, which potentially provides power-efficient operation. To verify the concept, a PAM-4 transmitter with 2-tap FFE based on a Capacitor-DAC front-end is implemented in 65nm CMOS technology. It achieves 25Gbps speed and a power efficiency of 2mW/Gbps.
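The binary-weighted combination in the Capacitor-DAC can be sketched as ideal charge redistribution between a 2C branch and a C branch. The function and parameter names are hypothetical, the drivers are assumed to switch rail to rail, and the split of the array between the main tap and the post-tap is not modelled.

```python
def cap_dac_level(msb, lsb, vdd=1.0, c_unit=1.0, c_load=0.0):
    """Settled output of an ideal 2-bit binary-weighted capacitor DAC.

    The MSB drives a 2C branch and the LSB a C branch; c_load models any
    capacitive load, which attenuates the swing without changing linearity.
    """
    c_msb, c_lsb = 2.0 * c_unit, c_unit
    return vdd * (msb * c_msb + lsb * c_lsb) / (c_msb + c_lsb + c_load)


if __name__ == "__main__":
    for code in range(4):
        msb, lsb = code >> 1, code & 1
        print(f"{msb}{lsb} -> {cap_dac_level(msb, lsb):.3f} V")
```

The four settled levels are equally spaced and depend only on capacitor ratios, which is why the driver on-resistance affects the settling time but not the final level.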

Besides directly transmitting signals, the PAM-4 transmitter can also serve as a baseband modulator for carrier-based communication systems. For such applications, one critical issue is the non-linear input-output transfer curve of the mixer stage within the RF transmitter signal chain. To mitigate this issue, a pre-distortion scheme is proposed based on the Capacitor-DAC-based PAM-4 transmitter architecture. By decoding the 2-bit PAM-4 signal and splitting the Capacitor-DAC, the opening of each of the three eyes can be adjusted while keeping the peak-to-peak voltage amplitude the same. In addition, a C2C-DAC-based architecture is further explored to embed the equalization function into the pre-distortion-enabled transmitter front-end and thus cope with the ISI that arises as the symbol rate increases. It is noted that this pre-distortion scheme and the C2C-DAC-based equalization can also be applied to general-purpose I/O applications, since unevenly distributed openings of the three eyes also occur after the signal passes through non-ideal physical channels.
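The idea of redistributing the three eye openings at a fixed peak-to-peak amplitude can be shown with a short numerical sketch. It only moves the two inner levels symmetrically; how the split Capacitor-DAC realizes the shift is not modelled, and the parameter `delta` and both function names are hypothetical.

```python
def pam4_levels(v_pp, delta=0.0):
    """Four PAM-4 levels with fixed peak-to-peak amplitude v_pp.

    The inner levels are moved apart by delta each (delta > 0) or moved
    towards each other (delta < 0); the outer levels never move.
    """
    step = v_pp / 3.0
    return [0.0, step - delta, 2.0 * step + delta, v_pp]


def eye_heights(levels):
    """Nominal vertical opening of the lower, middle and upper eye."""
    return [hi - lo for lo, hi in zip(levels, levels[1:])]


if __name__ == "__main__":
    print(eye_heights(pam4_levels(0.9)))               # evenly spaced
    print(eye_heights(pam4_levels(0.9, delta=-0.05)))  # outer eyes enlarged
```

A negative `delta` enlarges the outer eyes at the expense of the middle one, which is the direction needed to pre-compensate a stage that compresses large amplitudes; the sum of the three openings always equals the peak-to-peak amplitude.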

Dedicated output buffers following the Capacitor-DAC, such as a pair of class-A source followers, are required if the output channel needs 50 ohm impedance matching instead of capacitive loading. In such cases, the output swing may be limited by the buffer stage unless a dedicated high-voltage supply is provided. For this reason, an R2R-DAC-based transmitter architecture is investigated. With passive-resistor-based impedance matching, the output can deliver a high voltage swing proportional to the supply rail of the drivers. The number of front-end high-speed logic gates is significantly reduced compared with the previously reported Resistor-DAC-based approach owing to its inherently binary-weighted digital-to-analog representation, leading to a very compact design. In addition, the recursive R-2R resistor network makes the design highly regular, relieving the design burden on both schematic and layout. The eye distortion introduced by the parasitic resistance of the driver is also discussed, together with design guidelines and circuit techniques to reduce it. A proof-of-concept PAM-4 transmitter prototype with 2-tap FFE is designed in 65nm CMOS technology, achieving 34Gbps speed and a power efficiency of 2.7mW/Gbps.
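The binary weighting and the constant output resistance of an ideal R-2R ladder can be verified by collapsing the network stage by stage into its Thevenin equivalent. The sketch assumes a textbook ladder terminated by 2R to ground at the LSB end, with each 2R leg (driver on-resistance plus series resistor) driven rail to rail; the function name and default values are illustrative.

```python
def r2r_thevenin(bits_lsb_first, vdd=1.0, r=50.0):
    """Thevenin equivalent (open-circuit voltage, resistance) at the MSB node."""
    v_th, r_th = 0.0, 2.0 * r  # 2R termination to ground at the LSB end
    for i, bit in enumerate(bits_lsb_first):
        if i > 0:
            r_th += r  # series R between adjacent ladder nodes
        v_leg, r_leg = bit * vdd, 2.0 * r  # one 2R leg: driver plus resistor
        v_th = (v_th * r_leg + v_leg * r_th) / (r_th + r_leg)
        r_th = r_th * r_leg / (r_th + r_leg)
    return v_th, r_th


def loaded_output(bits_lsb_first, r_load=50.0, vdd=1.0, r=50.0):
    """Output voltage when the ladder drives a resistive load."""
    v_th, r_th = r2r_thevenin(bits_lsb_first, vdd, r)
    return v_th * r_load / (r_th + r_load)


if __name__ == "__main__":
    code_bits = [1, 0, 1, 1, 0, 0, 1, 0]  # LSB first, i.e. code 77 of 256
    print(r2r_thevenin(code_bits))        # close to (77/256 * vdd, 50 ohm)
    print(loaded_output(code_bits))       # halved by the matched 50 ohm load
```

Because the output resistance equals R for every code, the ladder provides the impedance match itself, and the loaded swing is a fixed fraction of the supply rail, in line with the swing argument above.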

## REFERENCE

[1] Candes, Emmanuel J., Justin Romberg, and Terence Tao. "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information." *IEEE Trans. Inf. Theory*, vol.52, no.2, pp. 489-509, Feb. 2006.

[2] Candes, Emmanuel J. and Terence Tao, "Near optimal signal recovery from random projections: universal encoding strategies." *IEEE Trans. Inf. Theory*, vol.52, no.12, pp. 5406-5425, Dec. 2006.

[3] D.Donoho "Compressed sensing", *IEEE Trans. Inf. Theory*, vol.52, no.4, pp. 1289-1306, April 2006.

[4] Candes, Emmanuel J. and Terence Tao "Decoding by linear programming", *IEEE Trans. Inf. Theory*, vol.51, no.12, pp. 4203-4215, Dec. 2005.

[5] E. J. Candès, M. B. Wakin, "An introduction to compressive sampling," *IEEE Signal Processing Magazine*, vol.25, no.2, pp. 21-30, March 2008.

[6] M. A. Davenport, P. T. Boufounos, M. B. Wakin, and R. G. Baraniuk, "Signal processing with compressive measurements," *IEEE J. Sel. Topics Signal Process.*, vol. 4, no. 2, pp. 445–460, Apr. 2010.

[7] Tropp, Joel, et al. "Beyond Nyquist: Efficient sampling of sparse bandlimited signals.", *IEEE Trans. Inf. Theory*, vol.56, no.1, pp. 520-544, Jan. 2010.

[8] Tian,Zhi, and Georgios B.Giannakis, "Compressed sensing for cognitive radios." 2007 *International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, vol.4, pp.1357-1360, April. 2007

[9] Wang, Yue, Tian, Zhi and Chunyan Feng, "A two-step compressed spectrum sensing scheme for wideband cognitive radios." *2010 IEEE Globecom*, pp.1-5, Dec. 2010

[10] Sun, Hongjian, et al. "Wideband spectrum sensing for cognitive radio networks: a survey." *Wireless Communications, IEEE* 2013, vol.20, pp.74-81, April 2013

[11] M. Trakimas, et al., "A compressed sensing analog-to-information converter with edge-triggered SAR ADC core." *IEEE Trans. Circuits Sys. I, Reg. Papers*, vol.60, no.5, pp.1135-1148, May. 2013.

[12] P. Yenduri, et al., "A low power compressive sampling time-based ADC." *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, vol.2, no.3, pp.502-515, Sept. 2012.

[13] Chen, Xi, et al ,"A sub-Nyquist rate sampling receiver exploiting compressive sensing" *IEEE Trans. Circuits Sys. I, Reg. Papers*, vol.58, no.3, pp.507-520, Mar. 2011.

[14] Chen, F., et al ,"Design and Analysis of a Hardware-Efficient Compressed Sensing Architecture for Data Compression in Wireless Sensors" *IEEE J. Solid-State Circuits*, vol.47, no.3, pp.744-756, Mar. 2012.

[15] M. S. Chen, and C. K. K. Yang "A 50 - 64 Gb/s serializing transmitter with a 4-tap, LC-ladder-filter-based FFE in 65 nm CMOS technology." *IEEE J. Solid-State Circuits*, vol. 50, no.8, pp. 1903-1916, Aug. 2015.

[16] H. Li, et al., "A 25Gb/s 4.4 V-swing AC-coupled Si-photonic microring transmitter with 2-tap asymmetric FFE and dynamic thermal tuning in 65nm CMOS." *IEEE ISSCC Dig. Tech. Papers*, Feb. 2015, pp. 410-411.

[17] Y. H. Chien, K. L. Fu and S. I. Liu "A 3-25 Gb/s 4-channel receiver with noise-canceling TIA and power scalable LA." *IEEE Trans. Circuits Syst.II, Exp. Briefs*, vol. 61, no.11, pp. 845-849, Nov. 2014.

[18] S. Y. Kao and S. I. Liu "A 20Gbps transmitter with adaptive pre-emphasis in 65nm CMOS technology." *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 57, no.5, pp. 319-323, May. 2010.

[19] T. Kim, S. Jang, S. Kim, S. Chu, J. Park and D. Jeong. "A Four-Channel 32-Gb/s Transceiver With Current-Recycling Output Driver and On-Chip AC Coupling in 65-nm CMOS Process." *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 61, no.5, pp. 304-308, May. 2014.

[20] C. Menolfi, et al., "A 25Gb/s PAM4 transmitter in 90nm CMOS SOI." *IEEE ISSCC Dig. Tech. Papers*, Feb. 2005, pp. 72-73.

[21] J. Lee, M. S. Chen and H. D. Wang, "Design and comparison of three 20-Gb/s backplane transceivers for duobinary, PAM4, and NRZ data." *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 2120-2133, Sep. 2008.

[22] A. Nazemi, et al., "A 36Gb/s PAM4 transmitter using an 8b 18GS/S DAC in 28nm CMOS." *IEEE ISSCC Dig. Tech. Papers*, Feb. 2015, pp. 58-59.

[23] B. Song, K. Kim, J. Lee, and J. Burm. "A 0.18 um CMOS 10-Gb/s Dual-Mode 10-PAM Serial Link Transceiver." *IEEE Trans. Circuits Syst.I, Reg. Papers*, vol. 60, no.2, pp. 457-468, Feb. 2013.

[24] Song, Young-Hoon, and Samuel Palermo. "A 6-Gbit/s hybrid voltage-mode transmitter with current-mode equalization in 90-nm CMOS." *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 59, no.8, pp. 491-495, Aug. 2012.

[25] Song, Young-Hoon, et al. "An 8–16 Gb/s, 0.65–1.05 pJ/b, voltage-mode transmitter with analog impedance modulation equalization and sub-3 ns power-state transitioning." *IEEE J. Solid-State Circuits*, Vol.49, pp. 2631-2643, Nov.2014.

[26] Lu, Yue, et al. "Design and analysis of energy-efficient reconfigurable pre-emphasis voltage-mode transmitters." *IEEE J. Solid-State Circuits,* Vol. 48, No.8, pp 1898-1909, Aug.2013

[27] B. Song, K. Kim, J. Lee, J. Chung, Y. Choi and J. Burm. "A 13.5-mW 10-Gb/s 4-PAM Serial Link Transmitter in 0.13-um CMOS Technology." *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 61, no.9, pp. 646-650, Aug. 2014.

[28] Kim, J., Balankutty, A., Elshazly, A., Huang, Y. Y., Song, H., Yu, K., and O'Mahony, F, "A 16-to-40Gb/s quarter-rate NRZ/PAM4 dual-mode transmitter in 14nm CMOS." *IEEE ISSCC Dig. Tech. Papers*, Feb. 2015, pp. 60-61.

[29] Tropp, Joel A., Anna C. Gilbert, and Martin J. Strauss. "Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit." *Signal Processing*, Vol.86, No.3, pp.572-588. Mar. 2006

[30] Tropp, Joel A., Anna C. Gilbert, and Martin J. Strauss. "Algorithms for simultaneous sparse approximation. Part II: Convex optimization." *Signal Processing*, Vol.86, No.3, pp.589-602. Mar. 2006

[31] Tropp, Joel A., and Anna C. Gilbert., "Signal recovery from random measurements via orthogonal matching pursuit." *IEEE Trans. Inf. Theory*, vol. 53, no.12, pp. 4655-4666, Dec. 2007.

[32] Ji, Shihao, Ya Xue, and Lawrence Carin. "Bayesian compressive sensing." *IEEE Trans. Signal Processing*, Vol.56, No.6, pp. 2346-2356, June 2008

[33] Bellasi, David E., et al., "VLSI design of a monolithic compressive sensing wideband analog-to-information converter." *IEEE J. Emerging and Selected Topics in Circuits and System.*, vol. 3, no.4, pp. 552-565, Apr. 2013.

[34] Kull, Lukas, et al. ,"A 3.1 mW 8b 1.2 GS/s single-channel asynchronous SAR ADC with alternate comparators for enhanced speed in 32 nm digital SOI CMOS." *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3049-3058, Dec. 2013.

[35] Lin, Ying-Zu, et al. "An asynchronous binary-search ADC architecture with a reduced comparator count." *Circuits and Systems I: Regular Papers, IEEE Transactions on* Vol.57, No.8, pp.1829-1837, Aug. 2010

[36] Hashemi, S., et al., "A 7.1-mW 1-GS/s ADC with 48-dB SNDR at Nyquist Rate." *IEEE Custom Integrated Circuits Conference*, pp.1-4, Sept. 2013.

[37] Ren, F., W. Xu, and D. Markovic. "Scalable and parameterised VLSI architecture for efficient sparse approximation in FPGAs and SoCs." *Electronics Letters*, Vol.49, No.23, pp.1440-1441, Dec. 2013

[38] Wei, Hegong, et al., "An 8-Bit 4-GS/s 120-mW CMOS ADC." *IEEE Custom Integrated Circuits Conference*, pp.1-4, Sept. 2013

[39] Kim, Jong-In, et al. "A 6-b 4.1-GS/s flash ADC with time-domain latch interpolation in 90nm CMOS." *IEEE J. Solid-State Circuits*, vol. 48, no.6, pp. 1429-1441, June. 2013.

[40] Yousry Ramy, et al., "An architecture-reconfigurable 3b-to-7b 4GS/s-to-1.5GS/s ADC using subtractor interleaving." *2013 IEEE Asian Solid-State Circuits Conf.*, pp. 285-288, Nov. 2013.

[41] Roberts, Gordon W., and Mohammad Ali-Bakhshian. "A brief introduction to time-to-digital and digital-to-time converters." *IEEE Trans. Circuits and Systems II: Express Briefs*, Vol.57, No.3, pp. 153-157, Mar. 2010

[42] Dudek, Piotr, Stanislaw Szczepanski, and John V. Hatfield. "A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line." *IEEE J. Solid-State Circuits*, Vol. 35, No.2, pp. 240-247, Feb. 2000

[43] Liscidini, A., Vercesi, L., and Castello, R., "Time to digital converter based on a 2-dimensions Vernier architecture." *2009 IEEE Custom Integrated Circuits Conf.*, San Jose, USA, September 2009, pp. 45–48

[44] Tam, Sai-Wang, et al. "A simultaneous tri-band on-chip RF-Interconnect for future network-on-chip." *2009 Symposium on VLSI Circuits*. IEEE, 2009.

[45] Y. Kim, L. Nan, J. Cong, and M.-C. Frank Chang, "Hollow Plastic Cable and mm-Wave CMOS Transceiver Enabled High-Speed and Energy-Efficient Data Link for Short Distance Communications", *IEEE Microwave and Wireless Components Letters (MWCL)*, Vol.23. No.12, pp.674-686, Dec. 2013

[46] Bassi, M., Radice, F., Bruccoleri, M., Erba, S., and Mazzanti, A., "A 45Gb/s PAM-4 transmitter delivering 1.3 Vppd output swing with 1V supply in 28nm CMOS FDSOI." *IEEE ISSCC Dig. Tech. Papers*, Feb. 2016, pp. 66-67.