## **UCLA Electronic Theses and Dissertations**

### **Title**

Design of Energy-Efficient Single-Ended Frequency-Division Multiplexing Wireline Transceivers

**Permalink** <https://escholarship.org/uc/item/8tf9q4x4>

**Author** Du, Jieqiong

**Publication Date** 2019

Peer reviewed|Thesis/dissertation

### UNIVERSITY OF CALIFORNIA

Los Angeles

Design of Energy-Efficient Single-Ended Frequency-Division Multiplexing Wireline Transceivers

A dissertation submitted in partial satisfaction

of the requirements for the degree

Doctor of Philosophy in Electrical Engineering

by

Jieqiong Du

© Copyright by

Jieqiong Du

2019

#### ABSTRACT OF THE DISSERTATION

Design of Energy-Efficient Single-Ended Frequency-Division Multiplexing Wireline Transceivers

By

Jieqiong Du

Doctor of Philosophy in Electrical Engineering

University of California, Los Angeles, 2019

Professor Mau-Chung Frank Chang, Chair

The demand for aggregate I/O bandwidth is increasing rapidly, while the number of available I/Os is limited by packaging constraints and cost. Compared with differential signaling, single-ended signaling improves the pin-efficiency and aggregate I/O bandwidth by doubling the data rate per pin and is widely used for inter-processor and processor-memory applications. However, as the data rate increases, inter-symbol interference due to channel non-ideality becomes a major issue for signal integrity. Equalizers are usually required for data recovery in conventional NRZ links, which limits the energy efficiency of the wireline transceivers as the data rate scales. Additionally, crosstalk between parallel transmission lines further degrades the signal integrity. In this dissertation, frequency-division multiplexing links are studied for single-ended wireline applications to resolve these issues energy-efficiently.

The first part of the dissertation introduces a dual-lane tri-band single-ended transceiver using PAM-2 modulation. The proposed transceiver effectively alleviates inter-symbol interference and far-end crosstalk between the two transmission lines. The transceiver mitigates inter-symbol interference by enabling a smaller symbol rate and self-equalizing the channel loss variation through the down-conversion process. The transceiver reduces crosstalk by exploiting the characteristics of the channel response and taking advantage of the frequency and phase orthogonality of coherent PAM modulation. The dual-lane transceiver achieves an aggregate data rate of 24Gb/s with an energy efficiency of 1.17pJ/bit over two coupled transmission lines.

The second part of the dissertation describes a high-throughput low-power 16-QAM single-lane single-ended wireline transceiver. By reducing the symbol rate, most of the building blocks such as the MUX/DeMUX run at only 1/4 of the total data rate, and therefore the energy efficiency of these blocks is improved. The self-equalization effect during the down-conversion process reduces the inter-symbol interference. By combining two QPSK modulators to realize 16-QAM modulation, the TX reduces the linearity requirement for most building blocks. The receiver adopts a single-to-differential low-noise amplifier with DC feedback and without an external reference. The single-ended transceiver achieves a maximum interface data transfer speed of 32Gb/s/pin while consuming only 28mW.

The dissertation of Jieqiong Du is approved.

Wentai Liu

Gregory J. Pottie

Gregory P. Carman

Mau-Chung Frank Chang, Committee Chair

University of California, Los Angeles

2019

To my families…

### **ACKNOWLEDGEMENTS**

I would like to express my appreciation to my advisor, Professor M. C. Frank Chang, who has been supportive and patient with me during my graduate study at UCLA. I am truly grateful for the rich research opportunities and resources he has provided for me to explore new things. I am also thankful for his guidance and patience as he taught me how to conduct research, write papers and prepare presentations. He is one of the most creative and passionate people I have met, and he serves as an excellent role model for my future career.

I would also like to sincerely thank Professor Wentai Liu, Professor Gregory J. Pottie and Professor Gregory P. Carman for their valuable discussion and their effort to serve on my committee.

I would like to thank my lab mates in HSEL, from whom I have learned a lot, especially Jia Zhou, Shawn Wang, Chien-Heng Wong, Yuan Du, Yilei Li, Wei-Han Cho, Yan Zhang, Rulin Huang, Boyu Hu and Li Du. Their knowledge, encouragement and assistance (especially during tapeout periods) have been extremely helpful to me.

Last and most importantly, I would like to thank my families for their love, support and encouragement.

### **VITA**

2012 B.S. in Microelectronics, Shanghai Jiao Tong University, China

2014 M.S. in Electrical Engineering, University of California, Los Angeles, USA

2014 Intern, Teradyne Inc, Los Angeles, USA

2015 Ph.D. Candidate in Electrical Engineering, University of California, Los Angeles, USA

2017 Intern, TSVLink Corp., Santa Clara, USA

### **PUBLICATIONS**

**Jieqiong Du**, Chien-Heng Wong, Yo-Hao Tu, Wei-Han Cho, Yilei Li, Yuan Du, Po-Tsang Huang, Sheau-Jiung Lee and Mau-Chung Frank Chang. 2019. A 7.5-mW 10-Gb/s 16-QAM Wireline Transceiver with Carrier Synchronization and Threshold Calibration for Mobile Inter-chip Communications in 16-nm FinFET. In *International Symposium on Networks-on-Chip (NOCS'19), October 17-18, 2019, New York, NY, USA.* ACM, New York, NY, USA, 8 pages.

**J. Du** *et al*., "A Compact Single-Ended Dual-band Receiver with Crosstalk and ISI Reductions for High-density I/O Interfaces," *2019 IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, Boston, MA, USA, 2019, pp. 231-234.

W. Qiao, **J. Du**, Z. Fang, M. Lo, M. F. Chang and J. Cong, "High-Throughput Lossless Compression on Tightly Coupled CPU-FPGA Platforms," *2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)*, Boulder, CO, 2018, pp. 37-44.

X. S. Wang, C.H. Chan, **J. Du**, C.H. Wong, Y. Li, Y. Du, Y.C. Kuan, B. Hu, M.C.F. Chang, "An 8.8-GS/s 8b Time-Interleaved SAR ADC with 50-dB SFDR Using Complementary Dual-Loop-Assisted Buffers in 28nm CMOS," *2018 IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, Philadelphia, PA, 2018, pp. 88-91.

X. S. Wang, X. Jin, **J. Du**, Y. Li, Y. Du, C. H. Wong, Y. C. Kuan, C. H. Chan, M. C. F. Chang, "A 2-GS/s 8-bit ADC Featuring Virtual-Ground Sampling Interleaved Architecture in 28-nm CMOS," in *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. PP, no. 99, pp. 1-1, Sept 2017

C.-H. Wong, Y. Li, **J. Du**, X.S. Wang, M.-C.F. Chang, "A 0.75V 2.6GHz Digital Bang-Bang PLL with Dynamic Double-Tail Phase Detector and Supply-Noise-Tolerant gm-Controlled DCO", *Electronics Letters*, 2 pp. Dec 2017

Y. Du; W. H. Cho; P. T. Huang; Y. Li; C. H. Wong; **J. Du**; Y. Kim; B. Hu; L. Du; C. Liu; S. J. Lee; M. C. F. Chang, "A 16-Gb/s 14.7-mW Tri-Band Cognitive Serial Link Transmitter With Forwarded Clock to Enable PAM-16/256-QAM and Channel Response Detection," in *IEEE Journal of Solid-State Circuits*, vol.PP, no.99, pp.1-12, April 2017

Y. Du, W-H Cho, Y. Li, C-H Wong, **J. Du**, P-T Huang\*, Y. Kim, Z-Z Chen, S. J. Lee, M-C Frank Chang, "A 16Gb/s 14.7mW TriBand Cognitive Serial Link Transmitter with Forwarded Clock to Enable PAM16/256-QAM and Channel Response Detection in 28 nm CMOS", *IEEE Symposium on VLSI Circuits (VLSI 2016),* June 2016

W-H. Cho, Y. Li, Y. Du, C-H. Wong, **J. Du**, P-T. Huang, S.J. Lee, H-N. Chen, C-P. Jou, F-L. Hsueh, M-C. F. Chang, "A 38mW 40Gb/s 4-Lane Tri-Band PAM-4/16-QAM Transceiver in 28nm CMOS for High-Speed Memory Interface", *IEEE International Solid-State Circuits Conference (ISSCC 2016)*, Feb. 2016, pp. 184-1

Y. Li, W. H. Cho, Y. Du, **J. Du**, P. T. Huang, S. J. Lee, M. C. F. Chang, "Carrier synchronisation for multiband RF interconnect (MRFI) to facilitate chip-to-chip wireline communication", *Electronics Letters*, vol. 54, issue 7, pp. 535-537, Feb. 2016

### **CHAPTER 1**

### **Introduction**

#### **1.1 Motivation**

Over the past decade, the inter-chip data transfer bandwidth has grown dramatically. The scaling of CMOS technology has substantially expanded the digital processing power of microprocessors and subsequently led to an exponential growth in I/O data movement. As shown in Fig 1.1, the demand for aggregate input/output (I/O) bandwidth has expanded at a rate of two to three times every two years to facilitate applications such as gaming, graphics, and machine learning. To meet this growing demand, the I/O pin count per component has been increased, leading to an increasing amount of area and power spent on I/O circuitry. With the growing number of I/O pins per component, the power consumed by each I/O link becomes critical, since it is desired that the I/O circuits consume only a small fraction of the total chip power. On the other hand, the increase in I/O pins per component is usually limited by package constraints and cost. As a result, the per-pin data rate has doubled every four years for various I/O standards to meet the total bandwidth requirement. Fig 1.2 shows the per-pin data rate development for a variety of I/O standards within a decade [1]. I/O design has become one of the major factors that impact the performance and efficiency of computing systems. Therefore, wireline transceivers that deliver high throughput while maintaining sub-pJ/bit energy efficiency are in high demand.

While differential signaling is widely used in wireline communications, single-ended I/Os still dominate low-cost, high-bandwidth inter-processor and processor-to-memory interfaces because they provide better pin-efficiency and aggregate I/O bandwidth by doubling the data rate per pin. Most high-speed single-ended links adopt baseband signaling such as non-return-to-zero (NRZ) signaling, but it becomes more and more challenging to scale up the throughput without sacrificing energy efficiency due to signal integrity issues such as inter-symbol interference and crosstalk. The motivation of this dissertation is to introduce frequency-division multiplexing techniques into single-ended wireline communication and exploit them for better signal integrity and higher throughput while maintaining good energy efficiency.



**Fig 1.1 Trend with I/O pad count, data rate, aggregate I/O BW and CPU clock rate.** 



**Fig 1.2 Per-pin data rate vs. year for a variety of I/O standards. [1]** 

#### **1.2 Overview of Baseband Wireline Links**

#### **1.2.1 Conventional NRZ Signaling**

Currently, NRZ signaling is widely adopted for wireline communication. In NRZ signaling, the signal switches between high and low levels to represent logical '1's and '0's. A digital binary sequence is transferred between chips by mapping it to, or de-mapping it from, a sequence of high/low-level symbols. Each symbol has a fixed duration and carries only one bit of information. Fig 1.3 illustrates a binary sequence, the corresponding time-domain waveform for NRZ signaling, and the power spectral density for NRZ signaling. As can be seen, the power spectral density has the shape of a squared sinc function with nulls at $k/T_s$, where $T_s$ is the symbol period of the NRZ signal and $k=1,2,3,\ldots$ Most of the energy is concentrated within the main lobe from DC to $1/T_s$.
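The shape of the NRZ spectrum described above can be checked numerically. The following is a minimal sketch, assuming random polar NRZ data with rectangular pulses, for which the power spectral density is proportional to $\mathrm{sinc}^2(fT_s)$; the 100 ps symbol period and all function names are illustrative choices, not values from this dissertation.

```python
import math

T_S = 100e-12  # hypothetical symbol period (10 Gb/s NRZ), for illustration only


def nrz_psd(f_hz, t_s=T_S, amplitude=1.0):
    """PSD of random polar NRZ data (levels +/-amplitude, rectangular pulses)."""
    x = f_hz * t_s
    if x == 0.0:
        return amplitude ** 2 * t_s
    return amplitude ** 2 * t_s * (math.sin(math.pi * x) / (math.pi * x)) ** 2


def main_lobe_fraction(t_s=T_S, n_lobes=50, steps_per_lobe=1000):
    """Share of the one-sided power that lies between DC and 1/t_s.

    Uses midpoint-rule integration and truncates the spectrum after n_lobes lobes.
    """
    df = 1.0 / (t_s * steps_per_lobe)

    def integrate(lobes):
        return sum(nrz_psd((i + 0.5) * df, t_s) * df
                   for i in range(lobes * steps_per_lobe))

    return integrate(1) / integrate(n_lobes)


# The PSD is numerically zero at k / T_s and the main lobe holds most of the power.
for k in (1, 2, 3):
    print(f"PSD at {k}/T_s relative to DC: {nrz_psd(k / T_S) / nrz_psd(0.0):.1e}")
print(f"main-lobe power fraction: {main_lobe_fraction():.3f}")
```

Under these assumptions, about 90% of the power falls inside the main lobe, which is why the channel response up to $1/T_s$ dominates the received signal quality.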



**Fig 1.3 The transient waveform and the power spectral density of NRZ signaling.** 

#### **1.2.2 Contemporary Equalization Solutions**

As the data rate scales, NRZ signaling suffers from increasing inter-symbol interference due to the bandwidth limitation and reflections of the wireline channels. Fig 1.4 shows the simulated insertion loss of an FR-4 printed circuit board (PCB) microstrip line and the power spectral density of the NRZ signal at the transmitter output and receiver input. As can be seen, since the channel does not attenuate the signal evenly over frequency, the signal spectrum is distorted at the receiving end. The distortion within the main lobe, especially below $0.5/T_s$ (the Nyquist frequency), results in inter-symbol interference, which is a major issue for signal integrity in high-speed wireline communications. The time-domain waveforms of a square pulse at the transmitter and receiver sides of the channel are also illustrated in Fig 1.4. As can be seen, the received pulse extends over multiple symbol periods. In addition to the frequency-dependent loss of the wireline channel, reflections arise when there are impedance discontinuities along the channel, such as via holes, open stubs or poor impedance matching at the transceiver front-end. Impedance discontinuities introduce deep notches or ripples in the channel response, again leading to signal distortion and signal integrity degradation.



**Fig 1.4 Pulse response and power spectral density at transmitter output and receiver input.**

While discontinuities can be reduced by board-level techniques and by employing termination circuits at the front-end, equalization techniques are usually required to compensate for channel loss and reduce inter-symbol interference in high-speed wireline transceivers. Feed-forward equalizers (FFEs), continuous-time linear equalizers (CTLEs) and decision feedback equalizers (DFEs) are the most widely adopted equalization techniques in wireline transceivers. These techniques are usually combined to optimize the performance of the transceiver. The complexity of the equalization is set by the targeted data rate and channel bandwidth. An example of the conventional high-speed NRZ transceiver architecture is shown in Fig. 1.5. In addition to equalizers, serializers and deserializers are required to convert between low-speed parallel data streams and the high-speed NRZ signal. State-of-the-art single-ended NRZ transceivers of up to 25Gb/s/pin have been demonstrated in [2] with equalization techniques, while differential NRZ transceivers of up to 56Gb/s have been achieved in [3],[4]. However, the complexity of the equalizers, such as the number of taps for FFEs/DFEs, as well as the bandwidth requirement of the transceiver increases with the data rate, and so does the power consumption, which makes it challenging to achieve good energy efficiency for high-speed NRZ transceivers.

Other than inter-symbol interference, crosstalk noise due to electromagnetic coupling is another issue for signal integrity, especially for long parallel single-ended transmission lines. Crosstalk introduces jitter and reduces the eye opening for TDM baseband signaling. Board-level techniques are conventionally used to reduce it. Employing proper spacing is the simplest solution, while other solutions shield the signal channels with via-stitched guard traces or serpentine guard lines [5],[6],[7]. Circuit-level solutions have also been investigated [8],[9],[10]. For example, analog IIR filters have been employed in [8] to deal with far-end crosstalk. Additionally, a decision-feedback-based crosstalk canceller has been used in [9]. Fig 1.6 illustrates a simplified solution for crosstalk cancellation for NRZ signaling. These crosstalk cancellation blocks, however, can be power hungry, consuming more than 1pJ/bit.



**Fig 1.6 An illustration of crosstalk cancellation.**

#### **1.3 Organization of Thesis**

The rest of the dissertation is organized as follows. Chapter 2 introduces the concept of frequency-division multiplexing (FDM) signaling, state-of-the-art wireline links based on FDM and the advantages of FDM links regarding equalization. Chapter 3 describes a dual-lane FDM single-ended transceiver that deals with crosstalk and inter-symbol interference efficiently by exploiting the characteristics of the channel response as well as the FDM technique. Chapter 4 describes a high-speed low-power single-ended 16-QAM transceiver for memory interfaces. Chapter 5 concludes the dissertation.

### **CHAPTER 2**

### **Frequency-Division Multiplexing Links**

#### **2.1 Introduction of Frequency-Division Multiplexing Links**

It has long been established in communication theory that multi-band/FDM systems have the potential to achieve superior performance against channel non-ideality. Currently, the most widely used FDM technique in wireline communication is discrete multi-tone (DMT, also known as OFDM) for digital subscriber lines. However, it is challenging to apply such techniques to high-speed links with multi-gigabit-per-second data rates [11]. Recently, progress has been reported on frequency-division multiplexing (FDM) links for wireline communications [12][13][14]. In these links, only a small number of frequency bands are used, but each band has a relatively high bandwidth, from hundreds of MHz to several GHz. These links have shown promising characteristics in dealing with channel imperfections and achieving better energy efficiency than conventional baseband links.



**Fig 2.1 A conceptual FDM link.** 

Fig. 2.1 shows a conceptual FDM link. Instead of multiplexing low-speed data streams in the time domain, FDM links use orthogonal frequency bands to transfer low-speed data simultaneously. For this reason, the symbol period can be longer than in conventional NRZ links. With a small sub-band bandwidth, FDM links experience less in-band gain variation than conventional baseband links and thus less signal distortion and inter-symbol interference. Additionally, FDM offers a unique opportunity to program the signal spectrum in accordance with the channel characteristic and to transfer data over frequencies where the channel response is relatively flat, avoiding wasting energy in channel notches [14]. Furthermore, in FDM links, frequency and phase orthogonality can be exploited to handle crosstalk between adjacent parallel transmission lines [15]. Therefore, it is worthwhile to investigate FDM-based high-speed single-ended links, where inter-symbol interference and crosstalk have been the major signal integrity issues in advancing the throughput while maintaining good energy efficiency.

#### **2.2 State-of-the-art Frequency-Division Multiplexing Wireline Transceivers**

Early wireline transceivers using FDM were demonstrated in [16],[17] using non-coherent on-off keying modulation. In [17], data were transferred through the baseband and a radio-frequency (RF) band at 18GHz simultaneously, as Fig. 2.2 shows. It achieved 8Gb/s/pin over a 5-cm FR-4 PCB channel at 4pJ/bit. However, the design is bulky and does not scale with process due to the use of inductors. Additionally, the use of a mm-wave carrier frequency for non-coherent modulation makes it challenging to maintain energy efficiency as the data rate scales further.



**Fig 2.2 On-board RF interconnect transceiver in 2012 ISSCC using on-off keying.** 



**Fig 2.3 On-board 5-band QPSK RF interconnect transceiver in 2015 CICC.** 

Later, in 2015, a QPSK-based RF interconnect was realized, as Fig 2.3 shows [13]. It employs five frequency bands from 1.6GHz to 5.2GHz with a symbol rate of 400 Mbaud. It achieves a 4Gb/s aggregate data rate per differential pair (2Gb/s/pin) over a 2-inch FR-4 channel. However, due to the large number of carriers, RF clock generation has been a challenge for the energy efficiency of the overall system.

An advanced version using 16-QAM, shown in Fig 2.4, was demonstrated at ISSCC 2016, transferring data through the baseband and two RF bands at 3GHz and 6GHz [14]. An aggregate data rate of 10Gb/s per differential pair (5Gb/s/pin) over a 2-inch FR-4 channel was achieved at 0.9pJ/bit. It also demonstrated operation over a multi-drop channel, where the signal spectrum is designed to match the channel characteristic and to bypass notch frequencies using FDM.



**Fig 2.4 3-band 16-QAM RF interconnect transceiver for multi-drop bus in 2016 ISSCC.** 



**Fig 2.5 Cognitive FDM TX in VLSI 2016.** 

A transmitter was then proposed to cognitively adapt the signal spectrum to different channel conditions [18]. It learns the channel frequency response by detecting the received single-tone power at the receiver side for different frequencies. The transmitter with the proposed channel learning mechanism is shown in Fig 2.5. With a 3-band spectrum, the transmitter achieves 16Gb/s per differential pair at 0.9pJ/bit by supporting up to 256-QAM modulation. However, the receiver is implemented with discrete components in this work.

The most recent work on FDM links realized an ADC-based discrete multitone receiver data path, where OFDM demodulation is performed in the digital domain with the help of a high-speed ADC [19]. As shown in Fig 2.6, it achieved 56Gb/s over differential traces using 14nm FinFET technology. However, its energy efficiency (2.875pJ/bit) does not show improvement over conventional baseband links due to the use of high-speed ADCs and DSPs.



**Fig 2.6 ADC-based discrete multitone receiver data-path in ISSCC 2019.** 

Overall, we can see a substantial reduction in equalization complexity compared with conventional baseband links as a result of the reduced symbol rate. Also, by shaping the signal spectrum according to the channel frequency response, the channel capacity can be fully exploited. Compared with baseband links, FDM links show promise for achieving good energy efficiency by reducing equalization complexity. Prior FDM links, especially coherent-modulation-based FDM links, are mostly implemented with differential signaling, resulting in low pin-efficiency and data rate per pin. It is of great interest to apply FDM to high-speed single-ended applications. In Chapters 3 and 4, coherent-modulation-based frequency-division multiplexing links are proposed and implemented for single-ended applications.

#### **2.3 Equalization for Coherent RF-modulated Signal**

#### **2.3.1 Equalization for Double-Sideband Signaling**

It was previously found in [13] and [14] that double-sideband signaling such as PAM signaling provides a self-equalization effect through the process of RF-carrier down-conversion for channels with monotonic attenuation over frequency. The idea is illustrated in Fig 2.7. Double-sideband signals are generated by modulating the RF carrier with the baseband signal. After modulation, the signal spectrum contains duplicated information in its upper and lower sidebands. Assuming a channel with a linear-loss frequency response, the upper sideband experiences more loss while the lower sideband is less attenuated. When the upper and lower sidebands are combined at the receiver baseband, the recovered signal spectrum is evenly attenuated. In this case, the equivalent baseband channel model for the RF band has a flat response, so the RF signal is only attenuated but not distorted and therefore does not suffer from inter-symbol interference.
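The self-equalization argument can be written out in a few lines. The following is a sketch under simplifying assumptions that are not stated in the source: the channel phase is taken to be removed by carrier synchronization so that $H(f)$ can be treated as a real gain, the mixer gain is normalized, and $A_0$, $k$ and $f_c$ are illustrative symbols for the gain at the carrier, the loss slope and the carrier frequency. With a baseband spectrum $M(f)$, coherent down-conversion followed by low-pass filtering gives

$$
Y(f) = \frac{1}{2}\left[ H(f_c + f) + H(f_c - f) \right] M(f) = H_{eq}(f)\, M(f)
$$

For a linear-loss channel, $H(f_c \pm f) = A_0 \mp k f$ within the band, so

$$
H_{eq}(f) = \frac{1}{2}\left[ (A_0 - k f) + (A_0 + k f) \right] = A_0
$$

which does not depend on $f$: the extra loss seen by the upper sideband is cancelled by the smaller loss seen by the lower sideband.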



**Fig 2.7 Simple explanation of self-equalization [13].** 



**Fig 2.8 a) Channel response and signal spectrum at TX output. b) baseband-equivalent channel model before/after LPF.** 

However, the assumption of a linear-loss channel response is only valid for narrowband applications. For wideband applications where the signal bandwidth is comparable with the RF carrier frequency, it rarely holds. For example, for an FR-4 microstrip line, the channel attenuation is approximately linear in dB versus frequency. On the other hand, it is interesting to find that the baseband equivalent channel model for an RF-modulated signal over a monotonically lossy transmission line has a peaking response [20]. Fig 2.8 shows an example of transmitting a wideband RF-modulated signal through a lossy FR-4 microstrip line. Fig 2.8 a) shows the channel response and signal spectrum. The signal has a baud rate of 8 GBaud and is modulated onto an 8 GHz carrier. The corresponding baseband equivalent channel loss is shown in blue in Fig 2.8 b). As can be seen, the channel loss variation is about 20dB across the signal bandwidth. The baseband equivalent model, however, shows about 6dB of peaking. In order to equalize the channel and remove the peaking, a low-pass filter with the frequency response shown by the grey line is applied, so that the final channel bandwidth is around 6 GHz. The bandwidth of the low-pass filter can be varied to achieve the desired bandwidth. This reveals an interesting fact: the equalization of a double-sideband signal through a monotonically lossy channel can be achieved by applying a low-pass filter with proper bandwidth at baseband after down-conversion. A system-level simulation is performed with a microstrip line channel model that has a monotonic 20dB loss variation within the signal bandwidth (similar to the case in Fig 2.8). To see the equalization effect without interference from high-frequency components after down-conversion, the carrier frequency is set to 16 GHz for this simulation. The demodulated eye diagram at the receiver low-pass filter is shown in Fig. 2.9. We can see that a low-pass filter with a narrower bandwidth is preferred due to the peaking effect after down-conversion. In order to maximize the performance, the response of the low-pass filter should be designed in accordance with the targeted channel.
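The origin of the peaking can be illustrated with a small numerical sketch. It assumes an idealized channel whose loss in dB is exactly proportional to frequency, ignores the channel phase (taken to be removed by carrier synchronization), and models the baseband equivalent gain as the mean of the upper- and lower-sideband gains. The slope of 2.5 dB/GHz, the 16 GHz carrier and the function names are hypothetical; this is not the simulated channel of Fig 2.8, so the numbers are only qualitatively comparable.

```python
import math

# Hypothetical channel: loss in dB grows linearly with frequency (illustrative values).
SLOPE_DB_PER_GHZ = 2.5   # 20 dB of variation across carrier +/- 4 GHz
CARRIER_GHZ = 16.0


def channel_gain(f_ghz, slope_db_per_ghz=SLOPE_DB_PER_GHZ):
    """Amplitude gain of a channel whose loss in dB is proportional to frequency."""
    return 10.0 ** (-slope_db_per_ghz * f_ghz / 20.0)


def baseband_equivalent_db(offset_ghz, carrier_ghz=CARRIER_GHZ):
    """Mean of the upper- and lower-sideband gains, relative to the gain at the carrier."""
    upper = channel_gain(carrier_ghz + offset_ghz)
    lower = channel_gain(carrier_ghz - offset_ghz)
    return 20.0 * math.log10(0.5 * (upper + lower) / channel_gain(carrier_ghz))


# The relative gain rises with the baseband frequency offset, i.e. the response peaks.
for offset in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"offset {offset:.0f} GHz: {baseband_equivalent_db(offset):+.2f} dB")
```

Because the lower sideband gains more (in linear amplitude) than the upper sideband loses, their mean exceeds the gain at the carrier, and the excess grows with the frequency offset. A baseband low-pass filter with a suitably chosen bandwidth can then flatten the overall response.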



**Fig 2.9 Simulated eye diagram at RX LPF with different LPF BW.** 

#### **2.3.2 Equalization for Single-Sideband Signaling**



**Fig 2.10 Illustration of I/Q interference of quadrature modulation due to in-band channel loss variations. [14]** 

Compared with double-sideband signaling, single-sideband signaling such as QPSK/QAM modulation shows better spectrum utilization. However, it is more affected by frequency-dependent channel loss, as illustrated in Fig. 2.10 [14]. Under the ideal linear-loss channel condition, we can see that the in-phase terms are self-equalized. However, the quadrature terms leave some residue, which results in I/Q interference. Fig. 2.11 shows the baseband equivalent model for a non-linear-loss channel falling at a rate of -20dB/dec [14]. We can see that the quadrature-term residue grows at a faster rate than the in-phase term, which means that I/Q interference worsens as the channel loss variation increases. Since the quadrature term has zero power at DC but peaks at higher frequencies, the interference is a relatively high-frequency component. Fig 2.11 also shows the eye diagram of the I/Q interference and the degraded eye diagram due to I/Q interference. The I/Q interference is centered at the symbol transition edges and introduces jitter.
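The quadrature residue can be made explicit with a short calculation. This is a sketch under assumptions not taken from the source: $H(f)$ is treated as a real gain around the carrier $f_c$ (channel phase removed by carrier synchronization), mixer gains are normalized, and $A_0$ and $k$ are illustrative symbols for the gain at the carrier and the loss slope. A symbol stream with spectrum $M_I(f)$ sent on the in-phase carrier appears at the two receiver outputs as

$$
Y_I(f) = \frac{1}{2}\left[ H(f_c + f) + H(f_c - f) \right] M_I(f), \qquad
|Y_Q(f)| = \frac{1}{2}\left| H(f_c + f) - H(f_c - f) \right| \, |M_I(f)|
$$

For a linear-loss channel with $H(f_c \pm f) = A_0 \mp k f$, the in-phase path is flat, with gain $A_0$, while the leakage into the quadrature output has magnitude $k|f|\,|M_I(f)|$. It vanishes at DC and grows with frequency, which is consistent with the high-frequency character of the interference described above.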



**Fig 2.11 a) Example channel response, b) effective baseband I/Q transfer function [14], c) eye diagram of the interference, d) the degraded eye diagram due to I/Q interference.** 

### **CHAPTER 3**

# **A Dual-lane Tri-band Single-Ended Transceiver using Coherent PAM Modulation**

This chapter studies double-sideband-signaling-based frequency-division multiplexing transceivers to enable high-speed data transfer with reduced inter-symbol interference and far-end crosstalk (FEXT). Equalization with double-sideband signaling is described in Chapter 2. As the number of I/Os increases and the distance between lanes shrinks, crosstalk, in addition to inter-symbol interference, becomes a severe issue for signal integrity, especially with single-ended transceivers. In this chapter, the properties of PAM-based multi-band signaling are exploited to enable low-power high-speed data transfer over two closely spaced lossy FR-4 microstrip lines by reducing FEXT and inter-symbol interference. A dual-lane transceiver is implemented in the TSMC 28nm HPC process and tested over 4-inch FR-4 coupled microstrip lines on a PCB. It achieves an aggregate data rate of 24Gb/s.

#### **3.1 Channel Overview**

Microstrip lines are widely adopted for high-speed inter-chip communication on PCBs because of their low cost. However, the FEXT between long parallel microstrip lines becomes a critical signal integrity issue. When a pulse is applied at one end of the aggressor line, a FEXT voltage appears at the other end of the victim line, as Fig. 3.1 shows. For two parallel microstrip lines such as those shown in Fig 3.2, the FEXT voltage in the time domain can be approximated by [21]:

$$
V_{fext}(t) = \frac{1}{2} \left( \frac{C_m}{C_T} - \frac{L_m}{L_S} \right) \cdot \mathrm{TD} \cdot \frac{dV_a(t - \mathrm{TD})}{dt} \tag{3.1}
$$

where $C_m$ is the mutual capacitance, $C_T$ is the sum of the self and mutual capacitance, $L_m$ and $L_S$ are the mutual and self inductance, respectively, TD is the propagation time through the transmission line and $V_a(t)$ is the applied voltage at the aggressor line. Since $\frac{L_m}{L_S} > \frac{C_m}{C_T}$ in microstrip lines, FEXT is dominated by inductive coupling. The FEXT voltage is proportional to the slope of the signal transition. As the signal pulse gets narrower and the transitions get faster, this far-end crosstalk voltage gets larger and becomes comparable with the direct-coupled pulse response. FEXT thus introduces jitter and reduces the eye opening for TDM baseband signaling, as illustrated in Fig. 3.1.
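The dependence of FEXT on edge rate in Eq. (3.1) can be illustrated with a few lines of code. This is a sketch only: the coupling ratios, propagation delay, swing and rise times below are hypothetical values chosen for illustration, and the aggressor edge is idealized as a linear ramp so that $dV_a/dt$ is simply the swing divided by the rise time.

```python
def fext_peak_voltage(cm_over_ct, lm_over_ls, td_s, swing_v, rise_time_s):
    """Eq. (3.1) evaluated on a linear edge, where dV_a/dt = swing / rise time."""
    slew_v_per_s = swing_v / rise_time_s
    return 0.5 * (cm_over_ct - lm_over_ls) * td_s * slew_v_per_s


# Hypothetical values, for illustration only (not taken from the dissertation).
CM_OVER_CT = 0.05   # capacitive coupling ratio C_m / C_T
LM_OVER_LS = 0.08   # inductive coupling ratio L_m / L_S (larger, as in microstrip)
TD_S = 600e-12      # propagation delay through the line
SWING_V = 0.4       # aggressor voltage swing

fast = fext_peak_voltage(CM_OVER_CT, LM_OVER_LS, TD_S, SWING_V, rise_time_s=30e-12)
slow = fext_peak_voltage(CM_OVER_CT, LM_OVER_LS, TD_S, SWING_V, rise_time_s=90e-12)
print(f"fast edge: {fast * 1e3:.1f} mV, slow edge: {slow * 1e3:.1f} mV")
```

With inductive coupling dominating, the result is negative, meaning the FEXT pulse has the opposite polarity to the aggressor edge. Tripling the rise time reduces its magnitude by a factor of three.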



**Fig 3.1 Illustration of the FEXT voltage and its effect to eye diagram of NRZ signaling.** 

FEXT can also be evaluated in the frequency domain [5]. Fig 3.2 shows the simulation setup and the S-parameters simulated in HFSS for two 4-inch microstrip lines with 6-mil width and 6-mil spacing. Both lines have characteristic impedances of approximately 50 Ohms. As can be seen, $S_{41}$ (FEXT) is a function of frequency and has a band-pass shape. Additionally, $S_{41}$ and $S_{21}$ show a phase difference of about 90 degrees over most frequencies. It is also interesting to note that the FEXT ($S_{41}$) has a larger magnitude than the direct coupling ($S_{21}$) at high frequencies. This behavior is explained in [5], and a quasi-periodic variation with respect to frequency is expected over a wider frequency range. These magnitude and phase characteristics are exploited in the proposed tri-band transceiver, as explained in the next section.



**Fig 3.2 a) Illustration of channel arrangement, b) HFSS simulation setup, c) simulated channel response in magnitude d) phase difference between the direct-coupled term and FEXT term over frequency.**

#### **3.2 System Overview and Analysis**

A PAM-based frequency-division multiplexing architecture, shown in Fig 3.3, is proposed to enable high-speed data transfer over two closely spaced FR-4 microstrip lines. The proposed architecture consists of three frequency bands using PAM-2 with a symbol rate of 4 GBaud: one at baseband, one at 8 GHz and one at 16 GHz. Three data bitstreams are transmitted simultaneously over the different frequency bands. The targeted channel frequency response, including the insertion loss and far-end crosstalk, the designed signal power spectral density and the resulting FEXT power spectral density from the adjacent channel are illustrated in Fig. 3.4.
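The relation between band count, symbol rate, modulation order and aggregate data rate can be sketched as follows. The helper `link_rate_gbps` is a hypothetical name introduced for illustration; the band plans are the ones quoted in this chapter and in Chapter 2.

```python
import math


def link_rate_gbps(bands, lanes=1):
    """Sum symbol rate * log2(constellation points) over all bands, times the lane count.

    `bands` is a list of (symbol_rate_gbaud, constellation_points) tuples.
    """
    per_lane = sum(baud * math.log2(points) for baud, points in bands)
    return per_lane * lanes


# Tri-band PAM-2 at 4 GBaud per band on two lanes (this chapter).
tri_band_pam2 = [(4.0, 2)] * 3
print(f"dual-lane tri-band PAM-2: {link_rate_gbps(tri_band_pam2, lanes=2):.1f} Gb/s")

# Five-band QPSK at 400 Mbaud per band, the link of [13] described in Chapter 2.
five_band_qpsk = [(0.4, 4)] * 5
print(f"five-band QPSK: {link_rate_gbps(five_band_qpsk):.1f} Gb/s")
```

Each lane carries 12 Gb/s at a symbol rate of only 4 GBaud, one third of the per-lane data rate, and the two lanes together give the 24Gb/s aggregate rate.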



**Fig 3.3 System architecture of the dual-lane tri-band PAM-2 transceiver.** 



**Fig 3.4 Power spectral density of the proposed tri-band signaling at the RX of the dual-lane front-end and insertion loss and FEXT of the targeted dual-lane channel.**

The proposed tri-band architecture can effectively minimize inter-symbol interference. By employing the tri-band architecture, the symbol rate is reduced to only one third of the total data rate. The self-equalization effect explained in Chapter 2, together with the design of the low-pass filters for the two RF bands, further improves the signal integrity. Each band is expected to experience less than 3dB of in-band loss variation, and thus the inter-symbol interference is minimized.

 The proposed tri-band architecture is also designed to effectively reduce crosstalk noise between the two coupled microstrip lines. Orthogonality in frequency and phase are both exploited to suppress crosstalk for the coherent PAM modulations. In addition to that, for frequency bands where the FEXT overpowers direct coupling signal, it is possible to transfer data through FEXT as explained in the following paragraphs.

For the baseband signal, the crosstalk noise from the adjacent baseband is mitigated by the reduced baseband bandwidth, owing to the band-pass characteristic of FEXT. The signal bandwidth is reduced by a factor of three compared with conventional single-band NRZ signaling. Therefore, the signal transition time can be three times longer than in conventional NRZ signaling, which results in less FEXT noise. In addition, FEXT coming from the RF bands is suppressed by more than 30 dB by the low-pass filter/integrator at the receiver due to frequency orthogonality. The simulated pulse response and FEXT noise for this scenario are shown in Fig 3.5.
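The link between transition time and FEXT amplitude can be illustrated with a minimal sketch. It assumes the derivative-like coupling model that this chapter adopts in Eq. (3.2), ignores the channel response itself, and uses a linear edge; the coupling coefficient `BETA_S` and the two rise times are hypothetical values chosen only to show the 3x ratio.

```python
import math

def ramp_edge(t, rise_time):
    """Unit-swing linear edge that starts at t = 0 and settles at 1."""
    return min(max(t / rise_time, 0.0), 1.0)

def peak_fext(rise_time, beta, dt=1e-13, span=1e-9):
    """Peak of |-beta * dx/dt| for the ramp, using a forward difference."""
    peak = 0.0
    for i in range(int(span / dt)):
        t = i * dt
        slope = (ramp_edge(t + dt, rise_time) - ramp_edge(t, rise_time)) / dt
        peak = max(peak, abs(-beta * slope))
    return peak

BETA_S = 5e-12  # hypothetical coupling coefficient, in seconds

fast_edge = peak_fext(rise_time=40e-12, beta=BETA_S)    # hypothetical NRZ edge
slow_edge = peak_fext(rise_time=120e-12, beta=BETA_S)   # edge three times slower
reduction_db = 20 * math.log10(fast_edge / slow_edge)

print(fast_edge, slow_edge, reduction_db)
```

Under these assumptions the peak FEXT scales inversely with the transition time, so a three-times-longer edge lowers the peak by a factor of three (about 9.5 dB).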



**Fig 3.5 Pulse response and crosstalk analysis for transmitted pulse at baseband.** 

 FEXT from the adjacent lane's RF band I (the RF band at 8 GHz) can also be effectively filtered, in this case by using phase orthogonality. Even though the FEXT power from the adjacent RF band I is not reduced and is even comparable with the power of the wanted signal, the crosstalk can be filtered out efficiently because the crosstalk and the signal are designed to be orthogonal to each other in phase. For channels dominated by inductive coupling, the adjacent-channel FEXT transfer function can be approximated as the time derivative of the channel response $H(\omega)$ [8]:

$$
FEXT(\omega) = -j\omega\beta H(\omega) \tag{3.2}
$$

Therefore, the crosstalk from the adjacent channel's RF band is nearly 90<sup>º</sup> out of phase with the wanted signal. At the receiver, the RF I carrier phase is calibrated using a digitally controlled delay line so that the crosstalk is minimized. A small deviation from the 90<sup>º</sup> phase shift does not severely degrade the performance, since the signal energy changes little near the optimal phase. Fig 3.6 illustrates the pulse response and the transient crosstalk noise for RF band I when the TX and RX phases are synchronized.
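The phase-orthogonality argument can be sketched numerically. The example below is an idealized model, not the implemented circuit: it assumes sinusoidal carriers, a crosstalk term shifted by exactly 90 degrees as suggested by Eq. (3.2), and a receiver that multiplies by its carrier and averages over one symbol (two carrier cycles for an 8 GHz carrier at 4 Gbaud). The function name and the 10-degree phase error are illustrative.

```python
import math

def correlate(amplitude, phase_rad, lo_phase_rad, cycles=2, samples_per_cycle=64):
    """Average of amplitude*cos(wt + phase) * cos(wt + lo_phase) over whole cycles."""
    n = cycles * samples_per_cycle
    acc = 0.0
    for i in range(n):
        wt = 2 * math.pi * i / samples_per_cycle
        acc += amplitude * math.cos(wt + phase_rad) * math.cos(wt + lo_phase_rad)
    return acc / n

QUADRATURE = math.pi / 2  # crosstalk phase relative to the wanted signal

# Receiver carrier aligned with the wanted signal.
signal_aligned = correlate(1.0, 0.0, 0.0)
xtalk_aligned = correlate(1.0, QUADRATURE, 0.0)

# Receiver carrier off by 10 degrees (hypothetical calibration error).
err = math.radians(10)
signal_off = correlate(1.0, 0.0, err)
xtalk_off = correlate(1.0, QUADRATURE, err)

print(signal_aligned, xtalk_aligned, signal_off, xtalk_off)
```

In this model the wanted output scales with the cosine of the phase error while the crosstalk output scales with its sine, so the signal level is nearly flat around the optimum and the crosstalk vanishes at exact alignment.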

For data transmission over RF band II (the RF band at 16 GHz) of this dual-lane channel, a cross-transfer scheme is proposed, as shown in Fig 3.7. The direct coupling (insertion loss) of the channel has a frequency notch within the 16 GHz RF band. The FEXT, on the other hand, exhibits a flat response within RF band II and overpowers the direct-coupling response. In order to maximize the signal-to-noise ratio and the signal-to-interference ratio, it is best to cross-decode the data between the two lanes, as Fig 3.7 shows. That is, the FEXT response is used to transfer the wanted signal, and the recovered RF band II data are exchanged between the lanes at the receiver after decoding. Since the FEXT overpowers the direct-coupling response by $>20$ dB at the center frequency, the interference from the adjacent RF band II over the dual-lane channel is insignificant.
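The cross-transfer decision can be summarized as a small sketch: for a given band, use whichever path (direct or FEXT) is stronger, and swap the decoded outputs of the two lanes when the FEXT path is used. The function names and the channel gains in the example are hypothetical; only the idea of a margin above 20 dB comes from the text.

```python
def choose_path(direct_db, fext_db):
    """Return the stronger path and its margin over the weaker one, in dB."""
    if fext_db > direct_db:
        return "cross", fext_db - direct_db
    return "direct", direct_db - fext_db

def route(decoded_at_rx_a, decoded_at_rx_b, path):
    """Map decoded data back to the lane it was sent on.

    With the cross path, receiver B decodes what transmitter A sent (and
    vice versa), so the two decoded streams are exchanged.
    """
    if path == "cross":
        return decoded_at_rx_b, decoded_at_rx_a
    return decoded_at_rx_a, decoded_at_rx_b

# Hypothetical gains at 16 GHz: a notch on the direct path, flat FEXT.
path, margin_db = choose_path(direct_db=-35.0, fext_db=-12.0)

# Transmitter A sent [1, 0, 1]; transmitter B sent [0, 0, 1].
lane_a_out, lane_b_out = route([0, 0, 1], [1, 0, 1], path)

print(path, margin_db, lane_a_out, lane_b_out)
```

The margin returned by `choose_path` plays the role of the signal-to-interference ratio for that band, since the weaker path carries the unwanted data of the other lane.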



**Fig 3.6 Pulse response and crosstalk analysis for a transmitted pulse at RF I band (8 GHz).**



**Fig 3.7 Pulse response for a transmitted pulse at RF II band (16 GHz).** 

#### **3.3 Circuit Design of The Proposed Dual-lane Tri-band Single-ended Transceiver**



#### **3.3.1 Transmitter Design**

**Fig 3.8 The transmitter architecture, time domain waveform and the output spectrum of the transmitter.**

The dual-lane transceiver consists of two identical tri-band transmitters, shown in Fig 3.8. Both transmitters are modulated with the same RF carriers and triggered by the same data clock. Each tri-band transmitter consists of three parallel branches: the baseband branch, RF band I with a center frequency of 8 GHz, and RF band II with a center frequency of 16 GHz. The symbol rate of each of the three bands is 4 Gbaud. The baseband branch consists of a PAM-2 modulator, while RF bands I and II each have a PAM-2 RF modulator. High-pass filters follow the RF modulators to remove the residual baseband component generated by circuit non-idealities such as non-linearity and carrier duty-cycle distortion. The three branches are then summed at the output node to generate the tri-band signal. Fig 3.8 illustrates the transmitter architecture, the time-domain waveform and the simulated output spectrum of the transmitter. For testing purposes, three uncorrelated PRBS generators provide pseudo-random data to the modulators. The PRBS generators of the two lanes are triggered by the data clock with the same phase.

In order to suppress in-band interference from adjacent frequency bands, a pulse-shaping technique is used. The goal of the pulse shaping is to increase the power in the main lobe and reduce that of the side lobes. Pulse shaping of a baseband PAM-2 signal can be implemented by summing copies of the PAM-2 signal with different weights and different delays, as Fig 3.9 shows. It can be proved that the power spectral density of the shaped PAM-2 signal has the form [21]

$$
S_x(f) = \frac{|P(f)|^2}{T_b} \tag{3.3}
$$

where $T_b$ is the symbol period and $P(f)$ is the Fourier transform of the shaped pulse. If $w_k$ ($k = 1,2,3 \ldots$) denotes the weights and $T_k$ the delays, the shaped pulse in the time domain $P(t)$ can then be expressed as:

$$
P(t) = \sum_{k=1,2,3...} w_k \cdot rect(t - T_k) \tag{3.4}
$$

$$
rect(t) = \begin{cases} 1, & \text{for } 0 \le t \le T_b \\ 0, & \text{otherwise.} \end{cases} \tag{3.5}
$$

The number of delay taps k, the delays $T_k$ and the weights $w_k$ are chosen together to achieve the desired signal spectrum. In order to suppress the side-lobe energy, the delays $T_k$ are fractions of the symbol period $T_b$. The delays can be implemented either with a flip-flop-based delay line triggered by different clock phases or with analog delay cells such as buffer chains. In order to reduce energy consumption, CMOS buffer chains are used here. A more detailed implementation of the PAM-2 modulator used in the baseband branch is shown in Fig 3.11. The summing function is performed at the output stage of the transmitter by a current-controlled low-swing output driver, which relaxes the linearity requirements of the pre-drivers. The output power of the transmitter is tunable by programming the current of the output driver. Control logic is included to put the modulator into a high-impedance state. The simulated output spectrum of the baseband modulator is shown in Fig 3.10. With the pulse-shaping function, the adjacent side lobe is about 25 dB lower in magnitude than the main lobe.
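The effect of the weighted, delayed summation on the spectrum can be evaluated directly from Eqs. (3.3) to (3.5). The sketch below uses the closed-form Fourier transform of the rectangular pulse and compares the first side lobe with and without shaping. The three taps (weights 0.25, 0.5, 0.25 at delays 0, $T_b/4$, $T_b/2$) are an assumption made for illustration; the text does not state the tap values used on the chip.

```python
import cmath
import math

def shaped_pulse_spectrum(f, tb, weights, delays):
    """P(f) for P(t) = sum_k w_k * rect(t - T_k), with rect = 1 on [0, Tb]."""
    x = f * tb
    sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    rect_ft = tb * sinc * cmath.exp(-1j * math.pi * x)  # transform of rect on [0, Tb]
    taps = sum(w * cmath.exp(-2j * math.pi * f * d) for w, d in zip(weights, delays))
    return rect_ft * taps

def psd(f, tb, weights, delays):
    """Eq. (3.3): S_x(f) = |P(f)|^2 / Tb."""
    return abs(shaped_pulse_spectrum(f, tb, weights, delays)) ** 2 / tb

def first_sidelobe_db(tb, weights, delays, steps=2000):
    """Peak PSD between 1/Tb and 2/Tb relative to the PSD at DC, in dB."""
    dc = psd(0.0, tb, weights, delays)
    peak = max(psd((1 + i / steps) / tb, tb, weights, delays) for i in range(steps + 1))
    return 10 * math.log10(peak / dc)

TB = 250e-12  # symbol period at 4 Gbaud

unshaped_db = first_sidelobe_db(TB, [1.0], [0.0])
# Hypothetical taps: not the values used in the fabricated transmitter.
shaped_db = first_sidelobe_db(TB, [0.25, 0.5, 0.25], [0.0, TB / 4, TB / 2])

print(unshaped_db, shaped_db)
```

An unshaped rectangular pulse has its first side lobe about 13 dB below the main lobe; with the illustrative taps above the side lobe is pushed noticeably lower, at the cost of a pulse that extends beyond one symbol period.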



**Fig 3.9 Pulse shaping illustration for baseband PAM-2 signal.** 



**Fig 3.10 Simulated output spectrum for baseband signal with and without pulse shaping.** 



**Fig 3.11 PAM-2 modulator for baseband signal.** 

For the RF band signals, the conventional implementation of pulse shaping is shown in Fig 3.12. It requires a pulse-shaping DSP and a high-speed multi-bit digital-to-analog converter (DAC). However, this topology is not energy-efficient in a single-ended implementation because it requires high-speed multi-bit DACs as well as linear pre-drivers and output drivers. A revised implementation that removes the multi-bit DACs and relaxes the linearity requirement is illustrated in Fig 3.13. The resulting circuit implementation is shown in Fig. 3.15. As can be seen, most building blocks in the RF PAM-2 modulator have output levels of either high or low and thus can be implemented with CMOS logic. The frequency up-conversion is realized with a MUX in which the one-bit data selects either the 0-degree or the 180-degree carrier clock and passes it to the output. Unlike in the baseband PAM-2 modulator, a capacitor is placed between the output node of the current-controlled driver and the termination circuits to perform a high-pass filtering function. The simulated output spectrum of the RF PAM-2 modulator is shown in Fig 3.16.
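The MUX-based up-conversion can be modeled behaviorally to show why two-level CMOS logic is sufficient. The sketch below assumes an ideal square-wave carrier with two carrier cycles per symbol (as for RF band I, 8 GHz at 4 Gbaud) and assumes that a data bit of 1 selects the 0-degree clock; the function names and the bit-to-phase mapping are illustrative, and pulse shaping is omitted.

```python
def carrier_clock(sample_index, samples_per_cycle, phase_deg):
    """Square-wave carrier (1 = high, 0 = low) with a 0 or 180 degree phase."""
    half = samples_per_cycle // 2
    level = 1 if (sample_index % samples_per_cycle) < half else 0
    return level if phase_deg == 0 else 1 - level

def mux_modulate(bits, cycles_per_symbol=2, samples_per_cycle=8):
    """Each data bit selects the 0-degree or the 180-degree carrier clock."""
    samples_per_symbol = cycles_per_symbol * samples_per_cycle
    out = []
    for k, bit in enumerate(bits):
        phase = 0 if bit else 180  # assumed mapping: 1 -> 0 deg, 0 -> 180 deg
        for n in range(samples_per_symbol):
            out.append(carrier_clock(k * samples_per_symbol + n, samples_per_cycle, phase))
    return out

def to_bipolar(levels):
    """Map logic levels {0, 1} to {-1, +1}."""
    return [2 * v - 1 for v in levels]

bits = [1, 0, 1, 1]
wave = to_bipolar(mux_modulate(bits))
carrier = to_bipolar([carrier_clock(i, 8, 0) for i in range(len(wave))])

# Selecting between the two clock phases equals multiplying a +/-1 symbol
# by the carrier, i.e. PAM-2 up-conversion, using only two output levels.
equivalent = [(2 * bits[i // 16] - 1) * carrier[i] for i in range(len(wave))]
print(wave == equivalent, sum(wave))
```

Because every symbol contains whole carrier cycles, the modulated waveform in this model has zero mean, so the series capacitor at the output can act as a high-pass filter without removing signal content.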



**Fig 3.12 Conventional implementation of pulse shaping in an RF-modulated system.** 



**Fig 3.13 Revised implementation of pulse shaping for the RF bands.** 



**Fig 3.14 CML-to-CMOS converter.** 



**Fig 3.15 PAM-2 RF modulator.** 



**Fig 3.16 Simulated output spectrum for an RF signal with and without pulse shaping.** 

#### **3.3.2 Receiver Design**





As in the transmitter, the dual-lane receiver has two identical receiver front-ends, shown in Fig 3.17. Each receiver front-end has three signal branches. Each of the RF branches consists of an ac-coupling capacitor, a current-mode passive mixer, a transimpedance amplifier (TIA), a 5<sup>th</sup>-order low-pass filter, a comparator, and decoders, while the baseband branch has similar building blocks except for the ac-coupling capacitor and the mixer.

The receiver adopts a passive-mixer-first architecture for better efficiency and linearity. The input termination of the receiver is provided by the parallel combination of the baseband branch and the RF branches. Fig. 3.18 shows the passive mixer and the first stage of the TIA. The mixer serves three purposes: 1) it naturally acts as a single-ended-to-differential converter, which improves PSRR; 2) it translates the impedance characteristic of the baseband TIA to the RF bands and therefore reduces the required bandwidth of the TIA; and 3) it down-converts the RF signal to baseband. The simulated input impedance versus frequency of each branch and of their parallel combination is shown in Fig. 3.19. Along the baseband path, the input impedance equals the input impedance of the TIA, which is band-pass shaped. Along the RF branches, the input impedance equals the up-converted and scaled input impedance of the TIA in series with the switch resistance of the mixers, which is band-stop shaped. The bandwidth of the input impedance is designed so that most of the baseband signal current flows into the baseband branch and the RF signal currents flow into the RF branches. The impedance at the RX input is designed to be ~50 Ohms over the frequency range of interest.

Following the TIA, a 5<sup>th</sup>-order linear-phase low-pass filter removes out-of-band interference from the adjacent frequency bands as well as crosstalk from the adjacent channel. The 5<sup>th</sup>-order low-pass filter consists of three stages, as Fig. 3.20 shows. The first stage is a first-order amplifier in which source degeneration is used to improve the linearity. The other two stages are second-order bi-quads using flipped-source-follower-based cells [23]. The flipped-source-follower enables low-power and low-voltage operation. The low-pass filter provides about 35 dB of attenuation at 8 GHz. Continuous-time comparators then amplify the low-pass filter output to full scale, and their thresholds are tuned by a low-speed digital-to-analog converter to cancel mismatch.



**Fig 3.18 RX Front-End: mixer and transimpedance amplifier.** 



**Fig 3.19 Simulated RX input impedance, input impedance of baseband branch, and RF branches.** 



**Fig 3.20 A 5th order low-pass filter.** 



**Fig 3.21 A continuous-time comparator.** 

#### **3.4 Measurement**

A test chip of the proposed dual-lane tri-band transceiver is implemented in a 28nm CMOS HPC process. The testing environment is shown in Fig. 3.22. The transceiver is tested by wire bonding it to two closely spaced 4-inch FR-4 PCB channels with 6-mil width and 4-mil spacing. A common external RF clock source at 16 GHz is applied to both the TX and the RX. A separate data clock at 4 GHz is applied to the TX to trigger the PRBS pattern generation. A UART interface is integrated on-chip to program the test chip. The transceiver is characterized by measuring the RX eye diagrams after the continuous-time comparators. The measured eye diagrams of all frequency bands of the dual-lane transceiver are shown in Fig 3.23. With 4 Gb/s in each band, the dual-lane transceiver achieves an aggregate rate of 24 Gb/s with a BER $\leq 10^{-12}$ over the dual-lane 4-inch FR-4 channel. The prototype chip is shown in Fig 3.24, with a die size of 0.018mm<sup>2</sup>. It consumes 14.1mW from a 1V supply, including 6mW for the TX and 8.1mW for the RX. Fig 3.25 is a photo of the tested PCB channel.
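The reported energy efficiency follows from the measured power and data rate. The short check below assumes that the 14.1 mW figure refers to one lane running at 12 Gb/s (three bands at 4 Gb/s); this per-lane reading is an inference from the numbers, since it is the combination that reproduces the 1.175 pJ/bit listed in Table 1, and it is not stated explicitly in the text.

```python
TX_POWER_MW = 6.0
RX_POWER_MW = 8.1
BANDS = 3
RATE_PER_BAND_GBPS = 4.0

# Assumption: the power figures are per lane, so the per-lane rate is used.
rate_per_lane_gbps = BANDS * RATE_PER_BAND_GBPS
total_power_mw = TX_POWER_MW + RX_POWER_MW

# mW divided by Gb/s gives pJ/bit directly.
energy_pj_per_bit = total_power_mw / rate_per_lane_gbps

print(total_power_mw, rate_per_lane_gbps, energy_pj_per_bit)
```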



**Fig 3.22 Measurement platform.** 



**Fig 3.23 Measured eye diagram.** 



**Fig 3.24 Die photo.** 

| Reference                       | VLSI'12             | VLSI'15      | ISSCC'16     | **This Work** |
|---------------------------------|---------------------|--------------|--------------|---------------|
| **Technology (nm)**             | 65                  | 40           | 40           | 28            |
| **XTC type**                    | CTXC                | Multi-tone   | Multi-band   | Tri-band      |
| **I/O type**                    | Single-ended        | Differential | Differential | Single-ended  |
| **Channel**                     | 6-inch              | MDB-12-inch  | 2-inch       | 4-inch        |
| **Data rate (Gb/s/pin)**        | 12                  | 4.5          | 5            | 12            |
| **Area (mm<sup>2</sup>/lane)**  | 0.0036<sup>a</sup>  | 0.013        | 0.01         | 0.018         |
| **Energy efficiency (pJ/bit)**  | 1.78<sup>a</sup>    | 1            | 0.9          | 1.175         |

**Table 1 Comparison with prior art.** 



**Fig 3.25 Photo of tested PCB FR-4 channel.** 

# **CHAPTER 4**

# **A 32Gb/s/pin 16-QAM Single-Ended Transceiver for High-Speed Memory Interface**

 This chapter presents a 16-QAM transceiver that enables low-power, high-speed data transfer by improving spectral utilization and reducing inter-symbol interference. Using a single RF band with a symbol rate of 8 Gbaud and a carrier frequency of 8 GHz, the transceiver, implemented in TSMC 28nm HPC technology, achieves a data rate of 32Gb/s/pin while consuming only 28mW.

#### **4.1 Introduction**

 As discussed in chapter 1, the demand for high-bandwidth memory interfaces has increased rapidly as a result of the continuous scaling of memory capacity and processor performance. As the data rate of the interfaces increases, inter-symbol interference (ISI) resulting from the limited channel bandwidth degrades signal integrity, requiring complex equalizers to compensate for the channel loss. This also reduces the energy efficiency of the I/O circuits, because more complex signal equalization is required as the data rate becomes even higher [24]. To mitigate the tradeoff between data rate and energy efficiency, multi-level frequency-division multiplexing transceivers were introduced to enable multi-bit data transmission with a longer symbol period and reduced ISI [14]. However, previous work did not provide good spectrum utilization for channels with monotonic attenuation. Additionally, differential signaling was used in previous work, resulting in low pin efficiency and a low data rate per pin. In order to provide high throughput per pin and improve spectrum utilization, a 16-QAM single-ended transceiver is proposed here to leverage the advantages of the frequency-division multiplexing technique for wireline communication. By transmitting four bits of data simultaneously, the demonstrated CMOS transceiver reduces the symbol rate to only ¼ of the total data rate and greatly relaxes the equalization requirement. Additionally, it allows most of the building blocks, such as the MUX/DeMUX, to run at ¼ of the total data rate, which improves the energy efficiency of these blocks. It achieves better spectrum utilization for channels with monotonic attenuation than previous multi-band works by removing guard bands. By using a 16-QAM scheme, it also reduces the inter-symbol interference through the self-equalization effect discussed in chapter 2. It achieves a maximum data rate of 32Gb/s/pin while consuming only 28mW.

#### **4.2 System Architecture**

The block diagram of the proposed 16-QAM transceiver is shown in Fig 4.1. In the transmitter, four parallel data streams, each running at 8 Gb/s, are generated by an on-chip PRBS generator and multiple serializers (SER). These data are then modulated into 16-QAM symbols via the I/Q carriers at 8 GHz and transmitted through the inter-chip transmission-line channel. The receiver is implemented with a direct-conversion architecture. A low-noise amplifier amplifies the received signal and converts it from single-ended to differential, followed by I/Q mixers that down-convert the in-phase and quadrature signals to baseband. Two low-pass filters subsequently recover the four-level baseband symbols at 8 Gbaud. In this work, the low-pass filter outputs are buffered out to an oscilloscope for testing purposes. Fig 4.2 shows the spectrum at the transmitter output.



**Fig 4.1 System architecture of the proposed single-ended 16-QAM transceiver.** 



**Fig 4.2 Proposed 16-QAM signal spectrum.** 

#### **4.3 Circuit Design of the Proposed 16-QAM Transceiver**

#### **4.3.1 Transmitter Design**

 The architecture of the transmitter is shown in Fig. 4.3. A PRBS generator generates thirty-two parallel data streams at 1 Gb/s, and eight 4:1 serializers convert this low-speed data into eight streams at 4 Gb/s. Four MUXs further serialize the 4 Gb/s data into four parallel high-speed streams at 8Gb/s. A modulator converts the 8Gb/s data streams into 16-QAM symbols carrying 32 Gb/s. The 16-QAM modulator consists of two QPSK modulators with an amplitude ratio of 2. As shown in Fig. 4.5, the 16-QAM constellation is decomposed into two QPSK constellations. One constellation, shown in gray with an amplitude of 2, encompasses the other constellation of amplitude 1 over the four I-Q quadrants. Each QPSK modulator consists of a 4:1 MUX in which the 2-bit input data selects one of the four clock phases. The encoding scheme of the MUX is shown in Fig. 4.4. Since the QPSK modulator has an output level of either high or low, its linearity requirement is low, which is preferred for single-ended implementations. To simplify the circuit and reduce power consumption, the QPSK modulator is implemented with CMOS logic. The outputs of the two QPSK modulators are then combined at the final output stage of the TX. An array of resistors is connected in parallel at the output stage so that the combined impedance of the TX driver is 50 Ohms for transmission-line termination. The DC level of the TX output signal is around one half of VDD. The output signal has four levels and a peak-to-peak amplitude of around 350 mV. The transmitted signal is inherently DC-balanced because the signal is up-converted to RF, as shown in the TX output spectrum in Fig. 4.2.
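The decomposition of 16-QAM into two QPSK constellations with an amplitude ratio of 2 can be verified with a short sketch. The bit-to-phase mapping in `QPSK_POINTS` is a hypothetical Gray mapping chosen for illustration; the actual encoding of the MUX is the one given in Fig. 4.4.

```python
import itertools

# Hypothetical 2-bit to QPSK mapping (one point per quadrant).
QPSK_POINTS = {
    (0, 0): 1 + 1j,
    (0, 1): -1 + 1j,
    (1, 1): -1 - 1j,
    (1, 0): 1 - 1j,
}

def qam16_symbol(b3, b2, b1, b0):
    """Sum of a large QPSK (amplitude 2) and a small QPSK (amplitude 1)."""
    return 2 * QPSK_POINTS[(b3, b2)] + QPSK_POINTS[(b1, b0)]

constellation = {
    qam16_symbol(*bits) for bits in itertools.product((0, 1), repeat=4)
}

# Reference square grid of a 16-QAM constellation.
square_grid = {
    complex(i, q) for i, q in itertools.product((-3, -1, 1, 3), repeat=2)
}

print(len(constellation), constellation == square_grid)
```

The large QPSK term selects the quadrant and the small term selects the point within it, so each of the 16 bit patterns lands on a distinct point of the regular 4 x 4 grid while each modulator only ever produces two-level outputs.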



**Fig 4.3 Transmitter architecture.** 



**Fig 4.4 16-QAM modulator.** 



**Fig 4.5 16-QAM constellation illustration for the transmitter.** 

#### **4.3.2 Receiver Design**

In the proposed RX shown in Fig. 4.6, a single-to-differential low-noise amplifier provides 12 dB of gain and converts the single-ended input signal to differential. Since the received signal is inherently DC-balanced, a low-bandwidth DC feedback amplifier is used to force the positive and negative outputs to the same DC level without the need for an external reference. With DC feedback, the low-noise amplifier has a band-pass transfer function. The DC feedback amplifier is designed as a low-bandwidth subthreshold amplifier, and a capacitor of 0.5pF is applied at the feedback point. The simulated passband of the low-noise amplifier extends from 2MHz to 16GHz.

After the single-ended input signal is converted to differential, two double-balanced I/Q mixers down-convert the information to baseband, and fourth-order low-pass filters, each built from two super-source-follower-based bi-quads, recover the baseband symbols. The bandwidth of the low-pass filters is designed to be around 6 GHz and can be calibrated by tuning their bias currents. The recovered symbols are then buffered out to the oscilloscope for characterization.

The receiver uses an external 16 GHz clock source. A CML-based divide-by-2 circuit generates both the in-phase and quadrature-phase carriers at 8 GHz. The phases of the receiver carriers are calibrated in the foreground for synchronization. The phase adjustments are achieved via phase interpolation between the in-phase and quadrature-phase carriers generated by the divider. The carrier phases for the I and Q mixers are adjusted separately so that I/Q interference is minimized in each signal path.
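The reason the carrier phases need calibration can be seen in an idealized model of the direct-conversion receiver. The sketch below assumes sinusoidal carriers, one carrier cycle per symbol (8 GHz carrier at 8 Gbaud), and an averaging operation standing in for the low-pass filter; the function name and the separate phase-error arguments for the I and Q mixers are illustrative.

```python
import math

def downconvert(i_sym, q_sym, i_phase_err_rad=0.0, q_phase_err_rad=0.0, samples=64):
    """Mix I*cos(wt) - Q*sin(wt) with the I and Q carriers and average one cycle."""
    i_acc = 0.0
    q_acc = 0.0
    for n in range(samples):
        wt = 2 * math.pi * n / samples
        rx = i_sym * math.cos(wt) - q_sym * math.sin(wt)
        i_acc += rx * 2 * math.cos(wt + i_phase_err_rad)
        q_acc += rx * -2 * math.sin(wt + q_phase_err_rad)
    return i_acc / samples, q_acc / samples

# Perfectly aligned carriers recover the transmitted levels.
ideal_i, ideal_q = downconvert(3, -1)

# A 0.1 rad error on the I carrier only: Q leaks into the I output.
skew_i, skew_q = downconvert(3, -1, i_phase_err_rad=0.1)

print(ideal_i, ideal_q, skew_i, skew_q)
```

In this model a phase error on one mixer scales its wanted component by the cosine of the error and adds the other component scaled by the sine, which is the I/Q interference that the separate phase adjustment is meant to minimize.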



**Fig 4.6 RX architecture.** 



**Fig 4.7 Single-to-differential LNA and I/Q mixers.** 



**Fig 4.8 Low-pass filters.** 



**Fig 4.9 Bi-quad used in low-pass filters** 

#### **4.4 Measurement**

A test chip of the proposed 16-QAM transceiver is implemented in a 28nm CMOS HPC process. The testing environment is shown in Fig 4.10. The transceiver is tested by wire bonding it to a 4-inch FR-4 PCB trace and to a 12-inch FR-4 PCB trace, respectively. A common external RF clock source is applied to both the TX and the RX. A separate data clock at ¼ of the RF clock frequency, sharing the same reference clock as the RF clock source, is applied to the TX to trigger the PRBS pattern generation. The clock rates are adjustable to support different data rates. A UART interface is integrated on-chip to program the test chip. The transceiver is characterized by measuring the RX eye diagrams after the low-pass filters. Both the in-phase and quadrature-phase eye diagrams are measured. Fig 4.11, 4.12 and 4.13 show the measured eye diagrams at different data rates. The transceiver achieves 32 Gb/s/pin over the 4-inch FR-4 trace and 22 Gb/s/pin over the 12-inch FR-4 trace. Even though the amplitude of the eye diagram is reduced by the ~6dB signal loss of the output buffer used for measurement, the BER is estimated to be less than $10^{-8}$ based on the measured eye diagram and the calculated SNR. The prototype chip is shown in Fig 4.15, with a die size of 0.018mm<sup>2</sup> (TX: 110um x 40um; RX: 110um x 70um; clocking: 140um x 40um). It consumes 28mW from a 1V supply, including 11.5mW for the TX, 9mW for the RX, and 7.5mW for the clock dividers, phase interpolators and buffers. Fig 4.14 shows the power breakdown of the building blocks in the TRX. Table 2 shows that this work achieves a data rate per pin about 6x higher than those of prior frequency-division transceivers with comparable energy efficiency. Compared to prior art using NRZ/multi-level single-ended signaling, this work also demonstrates a 20% higher data rate over a longer channel with better efficiency.
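The power and area figures quoted above can be cross-checked with simple arithmetic. The energy per bit computed below is derived here from the reported 28 mW and 32 Gb/s and is not a number quoted in this section; the dictionary keys are illustrative labels.

```python
POWER_MW = {"TX": 11.5, "RX": 9.0, "clocking": 7.5}
AREA_UM2 = {"TX": 110 * 40, "RX": 110 * 70, "clocking": 140 * 40}
DATA_RATE_GBPS = 32.0

total_power_mw = sum(POWER_MW.values())
power_share = {name: p / total_power_mw for name, p in POWER_MW.items()}

# mW divided by Gb/s gives pJ/bit directly (derived value, see lead-in).
energy_pj_per_bit = total_power_mw / DATA_RATE_GBPS

# 1 mm^2 = 1e6 um^2.
total_area_mm2 = sum(AREA_UM2.values()) / 1e6

print(total_power_mw, energy_pj_per_bit, round(total_area_mm2, 4), power_share)
```

The block powers add up to the reported 28 mW, and the three block areas add up to about 0.0177 mm<sup>2</sup>, consistent with the quoted die size of 0.018mm<sup>2</sup>.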



**Fig 4.10 Testing environment.** 



**Fig 4.11 Measured eye diagram at 22Gb/s/pin over 12-inch FR-4 microstrip line.** 



**Fig 4.12 Measured eye diagram at 30 Gb/s/pin over 4-inch FR-4 microstrip line.** 



**Fig 4.13 Measured eye diagram at 32 Gb/s/pin over 4-inch FR-4 microstrip line.** 



**Fig 4.14 TRX power breakdown.**



**Fig 4.15 Die photo.** 



**Table 2 Comparison with prior art.**

# **CHAPTER 5**

# **Conclusion and Future Work**

 This research studies the characteristics of frequency-division multiplexing (FDM) links and applies the FDM technique to single-ended wireline links to reduce inter-symbol interference and far-end crosstalk and to improve energy efficiency.

A dual-lane single-ended transceiver is described in chapter 3, in which each lane carries one NRZ baseband signal and two coherent RF bands using PAM-2 signaling. By applying double-sideband signaling, the inter-symbol interference is reduced, and so is the equalization complexity. By exploiting the characteristics of the channel frequency response together with frequency and phase orthogonality, the dual-lane transceiver also deals effectively with crosstalk between the channels, enabling low-BER data transmission in frequency bands where channel loss and crosstalk are high. Energy-efficient pulse-shaping functions are implemented to reduce inter-band interference, and the receiver adopts a passive-mixer-first architecture to improve linearity and a flipped-source-follower-based low-pass filter to improve energy efficiency. The dual-lane transceiver achieves an aggregate rate of 24Gb/s with an energy efficiency of 1.175pJ/bit.

For future work, the dual-lane transceiver can be extended to multi-lane applications by adopting the same concept of frequency and phase orthogonality. Given that the symbol rate is low, more signal-processing techniques can be exploited to further improve signal integrity or to tackle more challenging channel conditions with little penalty.

A 16-QAM single-ended transceiver running at 32Gb/s/pin is described in chapter 4. By transmitting four bits of data simultaneously, the demonstrated CMOS transceiver reduces the symbol rate to only ¼ of the total data rate. It allows most of the building blocks, such as the MUX/DeMUX, to run at one fourth of the total data rate, which improves the energy efficiency of these blocks. It achieves better spectrum utilization for channels with monotonic attenuation than previous multi-band works by removing guard bands. Using a 16-QAM scheme, it also reduces the inter-symbol interference through the self-equalization effect. The TX achieves 16-QAM modulation by combining two QPSK modulators, thus relaxing the linearity requirement of most building blocks. Because the signal is inherently DC-balanced, the receiver adopts a single-to-differential low-noise amplifier with DC feedback, without the need for an external reference. The single-ended transceiver achieves a maximum data rate of 32Gb/s/pin while consuming only 28mW.

To further improve the performance of the 16-QAM transceiver, simple equalization such as a CTLE can be implemented to improve the signal integrity and increase the data rate.

#### **REFERENCES**

[1] D. C. Daly, L. C. Fujino and K. C. Smith, "Through the Looking Glass - The 2018 Edition: Trends in Solid-State Circuits from the 65th ISSCC," in *IEEE Solid-State Circuits Magazine*, vol. 10, no. 1, pp. 30-46, winter 2018.

[2] J. W. Poulton *et al*., "A 1.17-pJ/b, 25-Gb/s/pin Ground-Referenced Single-Ended Serial Link for Off- and On-Package Communication Using a Process- and Temperature-Adaptive Voltage Regulator," in *IEEE Journal of Solid-State Circuits*, vol. 54, no. 1, pp. 43-54, Jan. 2019.

[3] M. Erett *et al*., "A 126mW 56Gb/s NRZ wireline transceiver for synchronous short-reach applications in 16nm FinFET," *2018 IEEE International Solid - State Circuits Conference - (ISSCC)*, San Francisco, CA, 2018, pp. 274-276.

[4] T. Shibasaki *et al*., "3.5 A 56Gb/s NRZ-electrical 247mW/lane serial-link transceiver in 28nm CMOS," *2016 IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, 2016, pp. 64-65.

[5] F. D. Mbairi, W. P. Siebert and H. Hesselbom, "High-Frequency Transmission Lines Crosstalk Reduction Using Spacing Rules," in *IEEE Transactions on Components and Packaging Technologies*, vol. 31, no. 3, pp. 601-610, Sept. 2008.

[6] K. Lee, H. Lee, H. Jung, J. Sim and H. Park, "A Serpentine Guard Trace to Reduce the Far-End Crosstalk Voltage and the Crosstalk Induced Timing Jitter of Parallel Microstrip Lines," in *IEEE Transactions on Advanced Packaging*, vol. 31, no. 4, pp. 809-817, Nov. 2008.

[7] Li Zhi, Wang Qiang and Shi Changsheng, "Application of guard traces with vias in the RF PCB layout," *2002 3rd International Symposium on Electromagnetic Compatibility*, Beijing, China, 2002, pp. 771-774.

[8] T. Oh and R. Harjani, "A 12-Gb/s Multichannel I/O Using MIMO Crosstalk Cancellation and Signal Reutilization in 65-nm CMOS," in *IEEE Journal of Solid-State Circuits*, vol. 48, no. 6, pp. 1383-1397, June 2013.

[9] C. Aprile *et al*., "An Eight-Lane 7-Gb/s/pin Source Synchronous Single-Ended RX With Equalization and Far-End Crosstalk Cancellation for Backplane Channels," in *IEEE Journal of Solid-State Circuits*, vol. 53, no. 3, pp. 861-872, March 2018.

[10] T. Oh and R. Harjani, "A 6-Gb/s MIMO Crosstalk Cancellation Scheme for High-Speed I/Os," in *IEEE Journal of Solid-State Circuits*, vol. 46, no. 8, pp. 1843-1856, Aug. 2011.

[11] A. Amirkhany *et al*., "Practical limits of multi-tone signaling over high-speed backplane electrical links," *Proc. IEEE Int. Conf. Communications*, 2007, pp. 2693-2698.

[12] K. Gharibdoust *et al*., "A 7.5mW 7.5Gb/s Mixed NRZ/Multi-Tone Serial-Data Transceiver for Multi-Drop Memory Interfaces in 40nm CMOS," *ISSCC Dig. Tech. Papers*, Feb. 2015, pp. 180-181.

[13] W.-H. Cho *et al*., "A 5.4-mW 4-Gb/s 5-Band QPSK Transceiver for Frequency-Division Multiplexing Memory Interface," *CICC Dig. Tech. Papers*, Sept. 2015.

[14] W. Cho *et al*., "10.2 A 38mW 40Gb/s 4-lane tri-band PAM-4 / 16-QAM transceiver in 28nm CMOS for high-speed Memory interface," *2016 IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, 2016, pp. 184-185.

[15] J. Du *et al*., "A Compact Single-Ended Dual-band Receiver with Crosstalk and ISI Reductions for High-density I/O Interfaces," *2019 IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, Boston, MA, USA, 2019, pp. 231-234.

[16] S. Tam, E. Socher, A. Wong and M. F. Chang, "A simultaneous tri-band on-chip RFinterconnect for future network-on-chip," *2009 Symposium on VLSI Circuits*, Kyoto, Japan, 2009, pp. 90-91.

[17] Y. Kim *et al*., "An 8Gb/s/pin 4pJ/b/pin Single-T-Line dual (base+RF) band simultaneous bidirectional mobile memory I/O interface with inter-channel interference suppression," *2012 IEEE International Solid-State Circuits Conference*, San Francisco, CA, 2012, pp. 50-52.

[18] Y. Du *et al*., "A 16-Gb/s 14.7-mW Tri-Band Cognitive Serial Link Transmitter With Forwarded Clock to Enable PAM-16/256-QAM and Channel Response Detection," in *IEEE Journal of Solid-State Circuits*, vol. 52, no. 4, pp. 1111-1122, April 2017.

[19] G. Kim *et al*., "30.2 A 161mW 56Gb/s ADC-Based Discrete Multitone Wireline Receiver Data-Path in 14nm FinFET," *2019 IEEE International Solid- State Circuits Conference - (ISSCC)*, San Francisco, CA, USA, 2019, pp. 476-478.

[20] Y. Kim *et al*., "Impulse response analysis of carrier-modulated multiband RF-interconnect (MRFI)," in *Analog Integr. Circuits Signal Process.*, vol. 93, no. 3, pp. 395-413, 2017.

[21] Young-Soo Sohn, Jeong-Cheol Lee, Hong-June Park and Soo-In Cho, "Empirical equations on electrical parameters of coupled microstrip lines for crosstalk estimation in printed circuit board," in *IEEE Transactions on Advanced Packaging*, vol. 24, no. 4, pp. 521-527, Nov. 2001.

[22] H. Taub and D. L. Schilling, *Principles of Communication Systems*, Second Edition, McGraw-Hill, 1986.

[23] M. De Matteis and A. Baschirotto, "A Biquadratic Cell Based on the Flipped-Source-Follower Circuit," in *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 64, no. 8, pp. 867-871, Aug. 2017.

[24] J. Song, H. Lee, J. Kim, S. Hwang and C. Kim, "17.6 1V 10Gb/s/pin single-ended transceiver with controllable active-inductor-based driver and adaptively calibrated cascade-DFE for post-LPDDR4 interfaces," *2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers*, San Francisco, CA, 2015, pp. 1-3.

[25] J. M. Wilson *et al*., "A 1.17pJ/b 25Gb/s/pin ground-referenced single-ended serial link for off- and on-package communication in 16nm CMOS using a process- and temperature-adaptive voltage regulator," *2018 IEEE International Solid - State Circuits Conference - (ISSCC)*, San Francisco, CA, 2018, pp. 276-278.

[26] H. Park, J. Song, Y. Lee, J. Sim, J. Choi and C. Kim, "23.3 A 3-bit/2UI 27Gb/s PAM-3 Single-Ended Transceiver Using One-Tap DFE for Next-Generation Memory Interface," *2019 IEEE International Solid- State Circuits Conference - (ISSCC)*, San Francisco, CA, USA, 2019, pp. 382-384.