### UNIVERSITY OF CALIFORNIA

Los Angeles

# Low Noise RF CMOS Circuits and Systems for Wireless Communications

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering

by

Hao Wu

© Copyright by Hao Wu 2014

### ABSTRACT OF THE DISSERTATION

### Low Noise RF CMOS Circuits and Systems for Wireless Communications

by

#### Hao Wu

Doctor of Philosophy in Electrical Engineering

University of California, Los Angeles, 2014

Professor Mau-Chung Frank Chang, Chair

Accompanying and enabling the explosion of information technology in recent decades, RF CMOS design has grown into a mature discipline and a multi-billion-dollar industry, and CMOS radio transceivers can now be found in almost every consumer electronic device. To enable the next generation of RF CMOS applications, advances in both system and circuit techniques are needed. This work presents several such advances, specifically in the context of data communication applications and low noise circuit techniques.

First, a new wideband receiver architecture suitable for wireless communication is proposed and analyzed. The architecture uses both phase and thermal noise cancellation to significantly relax the trade-off between VCO phase noise and LOGEN power consumption, as well as the trade-offs among noise, out-of-band linearity, and wide input bandwidth.

Next, a current-mode mm-wave receiver architecture is described. The mm-wave receiver relies on current-mode operation and novel techniques in passive devices to achieve wide RF bandwidth, low noise, and high out-of-channel linearity. It considers adjacent/alternative channel blocking scenarios for mm-wave receivers for the first time.

Finally, a high-speed on-chip RF-Interconnect with a quarter-wavelength directional coupler for bi-directional communication and multi-drop arbitration is presented.
The proposed system introduces an emerging on-chip interconnect solution with superior latency and power efficiency. It also brings tremendous flexibility and re-configurability, which proves to be extremely beneficial to the next generation large scale Network-on-Chip (NoC) system.

The dissertation of Hao Wu is approved.

- Glenn Reinman
- Sudhakar Pamarti
- Tatsuo Itoh
- Hooman Darabi
- Mau-Chung Frank Chang, Committee Chair

University of California, Los Angeles

2014

Dedicated to my parents

"Our destiny offers not the cup of despair, but the chalice of opportunity."

### TABLE OF CONTENTS

- 1 Thesis Overview
- 2 A Highly-Linear Wideband Receiver with Phase and Thermal Noise Cancellation
  - 2.1 Introduction
  - 2.2 Prior-Arts
    - 2.2.1 High-Purity VCO Design
    - 2.2.2 Reciprocal Mixing Cancellation
    - 2.2.3 Frequency Translational Noise Cancellation
  - 2.3 Phase Noise Cancellation with Modulated Blocker
  - 2.4 Auxiliary LO Generation
    - 2.4.1 Frequency/Phase Multiply by Two and Noise at $\Delta f_b$
    - 2.4.2 Injection-Locked Oscillator – Phase Tracking Filter
    - 2.4.3 Injection-Locked Oscillator – Phase Tracking Error
    - 2.4.4 Injection-Locked Oscillator – Phase Noise
    - 2.4.5 Injection-Locked Oscillator vs. Phase Locked Loop (PLL)
  - 2.5 Proposed Receiver with Phase and Thermal Noise Cancellation
  - 2.6 Out-of-band Linearity – Blocker Tolerance
    - 2.6.1 Mixer-first Main Path
    - 2.6.2 Auxiliary Path
  - 2.7 Noise Analysis
    - 2.7.1 Receiver In-band Noise
    - 2.7.2 Folded Noise
    - 2.7.3 Phase Noise in N-path Filters
  - 2.8 Circuit Design
    - 2.8.1 Receiver Topology
    - 2.8.2 Auxiliary LO Generation & 8/8-Phase Mixer
    - 2.8.3 RF Multiphase Clock Generation
    - 2.8.4 Baseband TIAs
  - 2.9 Measurement Results
    - 2.9.1 Noise Figure
    - 2.9.2 Input Matching
    - 2.9.3 RF LO Generation
    - 2.9.4 Blocker Noise Figure – CW Blocker
    - 2.9.5 Blocker Noise Figure – Modulated Blocker
    - 2.9.6 Phase Noise Cancellation with Multiple Blockers
    - 2.9.7 Comparison with Prior Arts
  - 2.10 Digitally Assisted Phase Noise Cancellation
    - 2.10.1 Digital Calibration of ILRO's Phase Tracking Error
    - 2.10.2 Digitally Assisted Phase Noise Cancellation without ILRO
  - 2.11 Conclusion
- 3 A Wideband, Low-Noise Current-Mode mm-Wave Receiver
  - 3.1 Introduction
  - 3.2 Receiver Architecture
    - 3.2.1 FSRCS G<sub>m</sub> Front-end
    - 3.2.2 FSRCS Load Effect and $C_{gd}$ Neutralization
    - 3.2.3 FSRCS Front-end's Stability
    - 3.2.4 FSRCS's Layout Consideration
    - 3.2.5 Passive Mixer and TIAs
    - 3.2.6 60GHz LO Generation & Control ASIC
  - 3.3 Measurement Results
    - 3.3.1 60GHz QVCO
    - 3.3.2 Gain and Noise Figure
    - 3.3.3 Linearity: In-band
    - 3.3.4 Linearity: Out-of-band
    - 3.3.5 I/Q Mismatch and LO Leakage
    - 3.3.6 Comparison with Prior Arts
  - 3.4 Conclusion
- 4 A High-Speed Bi-Directional RF-Interconnect with Multi-Drop Arbitration
  - 4.1 Introduction
  - 4.2 RF-Interconnect with Multi-Drop Arbitration Architecture
    - 4.2.1 RF-I System with Multi-Drop Arbitration Capability
    - 4.2.2 Channel and On-Chip Directional Coupler
    - 4.2.3 60GHz ASK RF Transceiver
  - 4.3 Measurement Results
    - 4.3.1 60GHz VCO
    - 4.3.2 Multi-cast Functionality
    - 4.3.3 Multi-cast with Destructive Reading
    - 4.3.4 Comparison with Prior Arts
  - 4.4 Stream Arbitration – Arbitration Multi-Drop RF-I in NoC
    - 4.4.1 Stream Arbitration: Scheme
    - 4.4.2 Stream Arbitration: Example
    - 4.4.3 Stream Arbitration in RF-I
    - 4.4.4 Curl Transmission Line for Stream Circulation
    - 4.4.5 Time Division Modulation Multicast for Stream Augmentation
    - 4.4.6 Power and Area Estimation
    - 4.4.7 Results and Discussion
  - 4.5 Conclusion
- References

## LIST OF FIGURES

- 2.1 Receiver desensitization due to reciprocal mixing in a wideband receiver
- 2.2 Receiver desensitization due to gain compression in a wideband receiver
- 2.3 FOM of published LC and ring VCOs in recent years
- 2.4 Phase noise and its replica, and their effect in reciprocal mixing
- 2.5 A limiter-based phase noise cancelling approach
- 2.6 Frequency translational noise cancelling
- 2.7 Spur cancellation with an arbitrarily modulated blocker
- 2.8 Phase noise cancellation with modulated blocker
- 2.9 Rectifier circuit examples
- 2.10 Freq/phase multiplied by two and noise at $\Delta f_b$
- 2.11 Simulated PM filtering response of injection-locking
- 2.12 Spectrum expansion in (a) EDGE and (b) WCDMA [70]-[71]
- 2.13 An oscillator locked to an FM signal
- 2.14 Phase error in injection-locked oscillator under FM signal
- 2.15 Phase noise cancellation's system simulation for injection locking's PM distortion
- 2.16 Phase noise cancellation vs. BPF bandwidth
- 2.17 Injection locked oscillator's phase noise and its effect on receiver
- 2.18 Theoretical best oscillator effective FOM
- 2.19 PLL as phase tracking filter
- 2.20 The proposed receiver with phase and thermal noise cancellations
- 2.21 Mixer-first receiver and its equivalent model
- 2.22 Mixer-first receiver – decomposed into LTI and LTV sections
- 2.23 Baseband's equivalent LTI model around *m*-th harmonic
- 2.24 Simplified LTI model, and the baseband TIA
- 2.25 Simulated gain compression and blocker NF with varying $C_F$ and $C_L$
- 2.26 Phase and thermal noise cancelling auxiliary path impedance - linearity
- 2.27 Auxiliary path baseband impedances
- 2.28 Receiver's in-band noise sources
- 2.29 Phase and thermal noise cancelling auxiliary path impedance - noise
- 2.30 Auxiliary path baseband impedances
- 2.31 Folded noise caused by phase noise cancellation
- 2.32 Phase noise in *N*-path filters
- 2.33 Graphic demonstration of phase noise in N-path filters
- 2.34 Phase and thermal noise cancelling receiver model
- 2.35 Phase and thermal noise cancelling receiver folded noise model
- 2.36 The complete phase and thermal noise cancelling receiver
- 2.37 Schematic for auxiliary LO generation
- 2.38 8/8-phase RM image passive mixer
- 2.39 RF and non-overlapping clock generations
- 2.40 TIA Op-Amp schematic
- 2.41 Die-micrograph of the phase and thermal noise cancelling receiver
- 2.42 Small signal noise figure measurement
- 2.43 Measured S<sub>11</sub>
- 2.44 Measured RF LO phase noise at 2GHz
- 2.45 Measured CW blocker NF
- 2.46 Phase noise cancellation with AM/PM blockers
- 2.47 Measured WCDMA blocker NF
- 2.48 Blockers on opposite side-bands
- 2.49 Blockers on the same side-band
- 2.50 Phase noise cancellation with two LTE blockers
- 2.51 Measured phase noise cancellation with two CW tones
- 2.52 System for digitally calibrating ILRO's phase tracking error
- 2.53 Digitally assisted phase noise cancellation without ILRO
- 2.54 Digitally assisted phase noise cancellation
- 3.1 (a) A traditional multi-stage voltage-mode 60GHz LNA [48]-[49]; (b) A 60GHz receiving front-end frequency response and blocker scenario
- 3.2 Proposed current-mode broadband mm-wave receiver with integrated VCO
- 3.3 (a) Series resonance tank and its passive amplification; (b) the series resonance tank in a common source stage
- 3.4 Frequency-staggered Series Resonance Common Source (FSRCS) and its frequency response
- 3.5 Schematic of low noise G<sub>m</sub> Front-end with FSRCS
- 3.6 (a) Cascode with inter-stage inductors; (b) single-ended equivalent model
- 3.7 (a) $C_{gd}$ creates coupling between tanks; (b) single-ended equivalent model
- 3.8 FSRCS with neutralization capacitors
- 3.9 Simulated (a) frequency response (b) input matching of G<sub>m</sub> front-end with neutralization capacitors
- 3.10 FSRCS front-end's stability factors
- 3.11 FSRCS's layout
- 3.12 (a) mixers (b) TIA and its Op-Amp (CMFB of second stage not shown) (c) QVCO schematic (LO buffer not shown)
- 3.13 Equivalent model for analyzing mixer and TIA's noise
- 3.14 Simulated TIA performance: (a) input referred noise voltage; (b) input impedance; (c) AC response
- 3.15 Die photograph of the proposed (a) QVCO; (b) current-mode receiver showing key blocks
- 3.16 Measured QVCO phase noise
- 3.17 Measured receiver conversion gain and input matching
- 3.18 Measured and simulated receiver noise figure
- 3.19 Measured receiver gain compression (P<sub>1dB</sub>)
- 3.20 Measured receiver gain compression (P<sub>1dB</sub>) in the presence of blocker
- 3.21 Measured receiver I/Q output waveforms ($V_{pI} = 36.1 \text{ mV}$, $V_{pQ} = 35.5 \text{ mV}$, $\Delta t = 2.454 \text{ ns}$, f = 100 MHz)
- 4.1 Original FDD RF-I concept [59]-[61] and its application in large NoCs [60]-[62]
- 4.2 4-drop RF-I with Arbitration Capability (Drop A multicasts)
- 4.3 On-chip directional coupler and its simulated performance
- 4.4 60GHz ASK transmitter
- 4.5 60GHz ASK receiver
- 4.6 Testing environment of the system
- 4.7 5Gbps RF-I with multi-drop and arbitration die-photo
- 4.8 Transmitter output frequency and calibrated output power
- 4.9 Eye diagrams when drop A multi-casts to drops B, C, D
- 4.10 Eye diagrams when drop B multi-casts to drops C, D, A
- 4.11 Measured BER vs. data rate at different drops
- 4.12 Eye diagrams when drop A transmits and only B receives
- 4.13 The substream augmented by each node as the stream passed by
- 4.14 An example of the stream arbitration scheme
- 4.15 The curl transmission line
- 4.16 An example of time division modulation multicast for stream augmentation with priority rotation

## LIST OF TABLES

- 2.1 Comparison with other blocker-tolerant or phase noise cancelling receivers
- 3.1 Comparison with other mm-wave receivers
- 4.1 Comparison with other CMOS on-chip interconnects
- 4.2 Power Parameters of Point-to-Point RF Transceiver in 32nm Technology
- 4.3 Power Parameters of Arbitration RF Transceiver in 32nm Technology

### ACKNOWLEDGEMENT

This dissertation ends my graduate studying journey at UCLA over the last 5 fantastic years. It would not have been made possible without the help of many people.
Firstly, I cannot be more grateful to my advisor, Prof. Mau-Chung Frank Chang, for admitting me into his research lab back when I was still an undergraduate student from Zhejiang University knowing nothing about CMOS and integrated circuit design, leading me into the fascinating integrated circuit design field, and most importantly, inspiring me to be a better engineer and researcher by exhibiting it in himself.

I am also grateful to Dr. Hooman Darabi, who essentially served as a second adviser of mine. The receiver architecture described in this dissertation was developed under his daily guidance at Broadcom Corporation. I also want to express gratitude to Ning-Yi Wang and David Murphy, who have given me tremendous advice during my Ph.D. study.

Many others have also had a direct influence on the work in this dissertation. I am particularly thankful to Mohyee Mikhemar, Ahmad Mirzaei, Adrian Tang, Yuan Du, Lan Nan, Sai-Wang Tam, Rod Kim, Prof. Jason Cong, Prof. Glenn Reinman, Tim LaRocca, Bryan Wu, and Yen-Cheng Kuan.

It is impossible for me to express my gratitude and thanks to my parents. I am and will always be indebted to them as they made me who I am today, and taught me to always be a good person.

Finally, I would like to thank my fiancée, Xingyun (Lizzy) Liao, for her love and encouragement along the way. Although we are almost always thousands of miles apart, her support has been beside me all the time.

Go Bruins.

## VITA

| Year | Degree / Position |
|-------------|---------------------------------------------------------------|
| 2009 | B.Sc., Information Science and Electronic Engineering (Honor), Zhejiang University, Hangzhou, China |
| 2011 | M.Sc., Electrical Engineering, University of California, Los Angeles |
| 2009 – 2014 | Graduate Student Researcher, University of California, Los Angeles |
| 2013 – 2014 | Graduate Student Internship, Broadcom Corporation, Irvine, California |

### **PUBLICATIONS**

- **Hao Wu**, M. Mikhemar, D. Murphy, H. Darabi, and M.-C. F. Chang, "A highly-linear inductor-less wideband receiver with phase and thermal noise cancellation", in *IEEE International Solid-State Circuits Conference (ISSCC 2015)*, Feb. 2015 (Accepted)
- **Hao Wu**, N. Wang, Y. Du, Y. Kuan, F. Hsiao, S. Lee, M. Tsai, C. Jou, and M.-C. F. Chang, "A current-mode mm-Wave direct-conversion receiver with 7.5 GHz bandwidth, 3.8 dB minimum Noise-Figure and +1dBm P<sub>1dB,out</sub> linearity for high data rate communications", in *IEEE Radio Frequency Integrated Circuits (RFIC 2013)*, pp. 89-92, Jun. 2013
- **Hao Wu**, L. Nan, S.-W. Tam, H.-H. Hsieh, C. Jou, G. Reinman, J. Cong, and M.-C. F. Chang, "A 60GHz on-chip RF-interconnect with $\lambda/4$ coupler for 5Gbps bi-directional communication and multi-drop arbitration", in *IEEE Custom Integrated Circuits Conference (CICC 2012)*, pp. 1-4, Sep. 2012
- D. Murphy, H. Darabi, and **Hao Wu**, "A VCO with implicit common-mode resonance," in *IEEE International Solid-State Circuits Conference (ISSCC 2015)*, Feb. 2015 (Accepted)
- A. Tang, **Hao Wu**, and M.-C. F. Chang, "A 245 GHz, 2.6 mW/pixel Antenna-less CMOS Imager with 0.7 fW/√Hz NEP and 3.5m Backscattered Range", in *IEEE Asian Solid-State Circuit Conference (A-SSCC 2012)*, Nov. 2012
- N.Y. Wang, **Hao Wu**, J.Y.-C. Liu, J. Lu, H.H. Hsieh, P.Y. Wu, C. Jou, and M.-C. F. Chang, "A 60dB gain and 4dB noise figure CMOS V-band receiver based on two-dimensional passive Gm-enhancement", in *IEEE Radio Frequency Integrated Circuits (RFIC 2011)*, pp. 1-4, Jun. 2011
- C. Xiao, M.-C. F. Chang, J. Cong, M. Gill, Z. Huang, C. Liu, G. Reinman, and **Hao Wu**, "Stream arbitration: Towards efficient bandwidth utilization for emerging on-chip interconnects", *ACM Transactions on Architecture and Code Optimization (TACO)*, vol. 9, no. 4, pps. 60, Jan. 2013
- A. Tang, F. Hsiao, D. Murphy, I. Ku, J. Liu, S. D'Souza, N. Wang, **Hao Wu**, Y. Wang, M. Tang, G. Virbila, M. Pham, D. Yang, Q. Gu, Y. Wu, Y. Kuan, C. Chien, and M.-C. F. Chang, "A Low Overhead Self-Healing Embedded System for Ensuring High Performance Yield and Long-Term Sustainability of a 60GHz 4Gbps Radio-on-a-Chip", in *IEEE International Solid-State Circuits Conference (ISSCC 2012)*, pp. 316-318, Feb. 2012

### **CHAPTER 1**

### **Thesis Overview**

This dissertation covers topics on low noise RF CMOS circuits and systems for wireless communications, and consists of three distinct parts. Chapter 2 introduces and analyzes a highly linear wideband receiver with phase and thermal noise cancellation. Chapter 3 presents a wideband current-mode mm-wave receiver for high data rate wireless communications, while Chapter 4 shows a 60GHz on-chip RF-Interconnect with multi-drop arbitration capability. A more detailed overview of each chapter is given below. Unlike in most dissertations, the conclusions are drawn at the end of each chapter, rather than at the end of the thesis, since each chapter presents a self-contained contribution.

### Chapter 2: A Highly Linear Wideband Receiver with Phase and Thermal Noise Cancellation

As there is no off-chip RF filtering available in a true Software-Defined Radio (SDR), SDR receivers typically suffer from two fundamental issues when subject to large out-of-band blockers: gain compression, and reciprocal mixing (RM) caused by phase noise. In this work, we propose a new architecture with both thermal and phase noise cancellation to tackle these challenges. The resulting design achieves sub-2dB small signal NF and tolerates 0dBm blockers, yet incorporates no inductors, even in the RF VCO. Its NF is lower than 13.5dB under a 0dBm continuous-wave (CW) or -10dBm WCDMA blocker, resulting in an equivalent oscillator FOM of 181.5dB using a low-cost yet noisy ring oscillator.

### Chapter 3: A Wideband, Low-Noise Current-Mode mm-Wave Receiver

A current-mode 60GHz direct-conversion receiver breaking trade-offs among bandwidth, NF and linearity is designed and realized in 65nm CMOS.
The 60GHz receiver employs a novel Frequency-staggered Series Resonance Common Source (*FSRCS*) stage to extend RF bandwidth with superior noise performance. The receiver's current-mode operation offers excellent out-of-band blocker tolerance and linearity. With on-chip quadrature LO generation, the fabricated receiver simultaneously achieves a minimum noise figure of 3.8dB, an RF bandwidth of 7.5GHz, an output P1dB of 1dBm, a maximum conversion gain of 36dB, and an IRR of -35dB. The receiver is capable of tolerating an out-of-channel blocker of up to -9dBm at a 3.5GHz offset. It occupies a silicon area of 1.3mm<sup>2</sup> and draws 25.5mA from a 1V supply.

### Chapter 4: A High-Speed Bi-Directional RF-Interconnect with Multi-Drop Arbitration

A 5Gbps bi-directional *RF-Interconnect* (RF-I) with multi-drop and arbitration functions is designed and realized in 65nm CMOS. The baseband data are modulated in RF-I onto a 60GHz carrier in ASK format. An on-chip differential transmission line (TL) is used as the communication channel, which reduces the latency to 9ps/mm, bounded only by the speed-of-light limit. We insert $\lambda/4$ directional couplers to implement multiple drops without signal reflection. We also use MOS switches along the signal path to reconfigure/arbitrate communication priority among the drops. This design consists of four TX/RX drops along a 5.5mm TL ring, supports destructive reading with fixed priority, and can reconfigure any drop as the transmitter. The tested data rate of the RF-I is 5Gbps with a BER lower than $10^{-12}$. The power consumptions of the TX and RX are 10mW and 5mW, respectively, corresponding to an energy efficiency of 1.67pJ/b and 0.303pJ/b/mm.

### **CHAPTER 2**

# A Highly-Linear Wideband Receiver with Phase and Thermal Noise Cancellation

### 2.1 Introduction

Narrowband receiver front-ends invariably make use of external and/or internal RF filtering to prevent large out-of-band signals from corrupting the wanted signal.
The external RF filtering is typically realized with Surface Acoustic Wave (SAW) filters, and the internal RF filtering is realized with on-chip resonance tanks (inductors or transformers). Since these resonance-based RF filters are almost always fixed in frequency, multiple front-ends are required to cover the large number of frequency bands serviced by a modern wireless device. The alternative is a single wideband receiver that is tunable over the entire spectrum of interest, but since such a receiver must work without RF filtering, it is easily desensitized by large unwanted signals. This inability to handle interferers has prevented wideband designs from being adopted in commercial products. If this issue could be overcome, a wideband approach would have some distinct advantages, including lower cost, lower pin count, simplified package design, a reduced number of off-chip components, and faster design times. As well as simplifying conventional multi-band receiver designs, a highly-linear wideband receiver is fundamental to the flexible, universal radio platform known as Software-Defined-Radio (SDR) [1]-[4].

The desensitization of a wideband receiver by large out-of-band blockers happens through two fundamental mechanisms: reciprocal mixing and gain compression. They are explained below.

Reciprocal mixing is a mechanism that occurs during the receiver's down-conversion. Ideally, the receiver's down-conversion mixer multiplies the RF input with its LO, shifting the RF spectrum down to around DC for baseband signal processing. The ideal LO is a continuous wave, which is a single tone in the frequency domain after Fourier transformation. In practice, however, the LO is accompanied by a certain amount of noise. This noise is usually referred to as "phase noise", as it effectively modulates the LO's phase in a random fashion. In the frequency domain, the phase noise appears as skirts around the carrier.
Since there is no RF filtering available in the wideband design, any blocker present will be down-converted along with the wanted signal. When the blocker mixes with the LO phase noise, it can deposit a significant amount of noise in the received band (Fig. 2.1). This noise is linearly proportional to the blocker power and to the phase noise at the blocker offset. Thus, for a wideband receiver to maintain the same noise figure as an equivalent narrowband receiver, its LO phase noise must be reduced by one dB for every dB of filtering attenuation that is removed at the blocker frequency. The blocker NF is given by

$$NF_{blocker} = P_b[dBm] + 174[Hz/dBm] + \mathcal{L}\{\Delta\omega\}[dBc/Hz]$$ (2.1)

Figure 2.1: Receiver desensitization due to reciprocal mixing in a wideband receiver

The second challenge in a wideband receiver design is gain compression. An out-of-band blocker can be as large as 0dBm (1mW), a level commonly required of out-of-band blockers in GSM and 2G-3G-4G cellular standards, and this poses a serious challenge to the circuit design. 0dBm on a 50ohm resistor corresponds to a peak-to-peak voltage of about 600mV. This is a significant voltage swing for a CMOS chip operating from a standard 1V supply. Even a very low voltage gain of 6dB would already clip the output at the 1V supply. In a traditional RF receiver, about 15dB of gain is likely needed in the LNA to achieve a 2dB NF. In this case, the LNA's output would experience a swing of 3.5V, clearly too large for a modern CMOS process to handle (Fig. 2.2).

Figure 2.2: Receiver desensitization due to gain compression in a wideband receiver

As explained, reciprocal mixing and gain compression are the two major issues in a wideband receiver design. Overcoming these two challenges has been the focus of both industry and academia for some years, in different forms and in different directions.
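
As a rough numerical check of Eq. (2.1) and of the voltage-swing argument above, the following minimal sketch evaluates both. The function names and the example phase-noise value of -160dBc/Hz are illustrative assumptions, not figures from the measured design; the blocker is assumed to be a sine wave across an ideal 50ohm load.

```python
import math

THERMAL_FLOOR_DBM_HZ = -174.0  # thermal noise density near room temperature


def blocker_nf_db(blocker_dbm, phase_noise_dbc_hz):
    """Blocker NF from Eq. (2.1): reciprocal-mixing noise relative to the thermal floor."""
    return blocker_dbm - THERMAL_FLOOR_DBM_HZ + phase_noise_dbc_hz


def required_phase_noise_dbc_hz(target_nf_db, blocker_dbm):
    """Eq. (2.1) solved for the LO phase noise that meets a target blocker NF."""
    return target_nf_db - blocker_dbm + THERMAL_FLOOR_DBM_HZ


def vpp_from_dbm(power_dbm, r_ohm=50.0):
    """Peak-to-peak voltage of a sine wave delivering power_dbm into r_ohm."""
    p_watt = 1e-3 * 10 ** (power_dbm / 10)
    v_rms = math.sqrt(p_watt * r_ohm)
    return 2 * math.sqrt(2) * v_rms


def vpp_after_gain(vpp, gain_db):
    """Peak-to-peak swing after an ideal (non-clipping) voltage gain in dB."""
    return vpp * 10 ** (gain_db / 20)


# Hypothetical example: 0 dBm blocker, -160 dBc/Hz phase noise at the blocker offset.
print(f"blocker NF           : {blocker_nf_db(0.0, -160.0):.1f} dB")
print(f"PN for 13.5 dB NF    : {required_phase_noise_dbc_hz(13.5, 0.0):.1f} dBc/Hz")

vpp_in = vpp_from_dbm(0.0)
print(f"0 dBm into 50 ohm    : {vpp_in:.2f} Vpp")
print(f"after 6 dB of gain   : {vpp_after_gain(vpp_in, 6.0):.2f} Vpp")
print(f"after 15 dB of gain  : {vpp_after_gain(vpp_in, 15.0):.2f} Vpp")
```

Under these assumptions the swing is roughly 0.63V peak-to-peak at the input, exceeds 1V after 6dB of gain, and reaches roughly 3.5V after 15dB, in line with the figures quoted in the text.
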
Reciprocal mixing is usually alleviated by improving the phase noise of the LO generation (LOGEN) system. To achieve an acceptable blocker NF, as defined in (2.1), inductors are typically used, and the resulting power consumption is typically high. Moreover, to cover a wide bandwidth, e.g. 0.1-3GHz, more than one *LC* VCO is likely required. Overcoming gain compression, on the other hand, has been studied in different forms. They include noise cancelling receivers [5]-[6], "SAW-less" receivers [44], wideband receivers [9]-[10], software defined radio receivers [11]-[12], and so on [13]-[14].

In summary, what is needed is a highly-linear wideband *inductor-less* receiver that can tolerate large out-of-band blockers with minimum LOGEN power consumption, without relying on SAW pre-filters, and without sacrificing small signal noise performance. Within this context, we describe a new receiver architecture that tolerates large out-of-band blockers, maintains superior noise performance, and greatly relaxes the requirements on the LOGEN system. The design explores frequency translational noise cancellation, and exploits the PM nature of phase noise to cancel its reciprocal mixing. As a result, the wideband receiver is completely *inductor-less*, yet achieves performance competitive with state-of-the-art receivers.

The next section first reviews relevant prior art before the general phase noise cancellation technique is introduced in Sec. 2.3. Section 2.4 discusses the generation of the phase noise cancellation auxiliary LO. Afterwards, the proposed receiver architecture is introduced in Sec. 2.5. Section 2.6 discusses the design of key building blocks in the auxiliary LO generation and auxiliary path baseband. Section 2.7 presents the noise analysis of the receiver. Section 2.8 discusses the circuit implementation of the prototype.
Section 2.9 presents measurement results relating to the design and Section 2.10 presents a proposed system to digitally calibrate phase tracking error, before conclusions are drawn in Sec. 2.11.

### 2.2 Prior-Arts

Given the discussion in the previous section, it is clear that overcoming reciprocal mixing calls for an LOGEN system with minimum phase noise. Likewise, various blocker-tolerant receivers have been proposed to overcome gain compression, and they need to be examined.

### 2.2.1 High-Purity VCO Design

An LOGEN system is comprised of source generation blocks (a VCO inside a frequency synthesizer) and LO buffering/driving blocks. While low noise LO buffering/driving blocks have been studied and can be realized at an acceptable level of power consumption, VCO phase noise typically dominates for blockers located at a typical blocker offset (tens of MHz). The VCO phase noise directly affects the blocker NF in a linear fashion, and the trade-off between phase noise and power consumption in a VCO is fundamentally limited. Despite intensive research, oscillator performance metrics have not improved a great deal over the past fifteen years. Indeed, the best Figure-of-Merit (FOM) reported so far was published back in 2001 [15] with an extra inductor, and was only recently approached in [16] without the extra inductor. In general, the VCO's FOM can be written as follows.

$$FoM = \frac{\left(\frac{\omega_c}{\Delta\omega}\right)^2}{\mathcal{L}\{\Delta\omega\}P_{DC[mW]}} = \frac{2\eta Q^2}{kTF} 10^{-3}$$ (2.2)

In the equation, $\eta = P_{TANK[mW]}/P_{DC[mW]}$ is the power efficiency, the power dissipated in the tank is $P_{TANK[mW]} = A_c^2/(2R_p)$, the oscillation amplitude is $A_c$, and the equivalent tank loss is given by $R_p$. The minimum noise factor is $F_{min} = 1 + \gamma$, where $\gamma$ is the transistor noise coefficient (2/3 in long-channel CMOS). As all these parameters are physically limited, the LC VCO has been predominantly used because of its much higher Q compared to a ring VCO.
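
To make Eq. (2.2) more concrete, the sketch below expresses the FOM in dB in two ways: from a spot phase-noise measurement (left-hand side) and from the physical bound $2\eta Q^2/(kTF)\cdot 10^{-3}$ (right-hand side). The function names and all numerical inputs (a tank Q of 10, ideal efficiency $\eta = 1$, 300K, and the example oscillator) are assumptions chosen for illustration, not values from this work.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23


def fom_db(carrier_hz, offset_hz, phase_noise_dbc_hz, power_mw):
    """Left-hand side of Eq. (2.2) in dB, from a measured phase-noise point."""
    return (20 * math.log10(carrier_hz / offset_hz)
            - phase_noise_dbc_hz
            - 10 * math.log10(power_mw))


def fom_bound_db(q, eta=1.0, gamma=2.0 / 3.0, temp_k=300.0):
    """Right-hand side of Eq. (2.2) in dB, with F set to F_min = 1 + gamma."""
    f_min = 1.0 + gamma
    linear = 2.0 * eta * q ** 2 / (BOLTZMANN_J_PER_K * temp_k * f_min) * 1e-3
    return 10 * math.log10(linear)


# Hypothetical oscillator: 2 GHz carrier, -150 dBc/Hz at 10 MHz offset, 10 mW.
print(f"example FOM      : {fom_db(2e9, 10e6, -150.0, 10.0):.1f} dB")

# Bound for an assumed tank Q of 10 versus an assumed effective Q of 1.
print(f"bound at Q = 10  : {fom_bound_db(10.0):.1f} dB")
print(f"bound at Q = 1   : {fom_bound_db(1.0):.1f} dB")
```

Because the bound scales with $Q^2$, each tenfold reduction in Q costs 20dB of achievable FOM in this model, which is the quantitative reason the text gives for the dominance of LC VCOs.
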
As originally explained by Leeson [17], a VCO's far-out phase noise has a slope of 20dB/dec due to the VCO's inherent filtering effect. This holds for both LC and ring VCOs: the LC VCO's filtering is accomplished by a second-order tank, while the ring VCO's filtering is built into the feedback, which is also second order. Thus, a higher Q implies heavier filtering and lower phase noise at a given offset. Compared with the LC VCO's tank Q ( $Q = \omega_c L/R_s$ ), a ring oscillator's effective Q is much lower. As derived in [18], to achieve the same phase noise, the current consumptions of a 3-stage ring oscillator and an LC oscillator are related by:

$$I_{RO} \approx 50Q^2 I_{LC}$$ (2.3)

This gap is reflected in the FOMs of VCOs published in recent years, where ring oscillators consistently show a worse FOM than their *LC* counterparts (Fig. 2.3). It is the dominant reason why using ring oscillators in applications such as cellular wireless receivers has been impractical.

Figure 2.3: FOM of published *LC* and ring VCOs in recent years

While *LC* VCOs have proven indispensable in RF receivers that comply with cellular standards, they also have several serious drawbacks compared with ring VCOs. Firstly, achieving a high *Q* in the inductors requires an ultra thick metal layer, which in turn requires a special, more expensive process. Secondly, while transistors and even capacitors<sup>1</sup> have benefited from process scaling following Moore's law [19], an *LC* VCO hardly scales, because inductor density is physically unrelated to CMOS process scaling. Last but not least, in a wideband receiver, multiple *LC* VCOs are required to cover the entire RF band, due to the VCO's limited tuning range. For example, [13]-[14] employed 2 VCOs to cover the entire 0.4-6GHz RF bandwidth.

<sup>1</sup> It has been observed that on-chip Metal-Over-Metal CAPacitor (MOMCAP) density increases as the process scales, at a rate similar to transistor scaling.
For these reasons, it is desirable to bridge the performance gap between ring and *LC* VCOs, so that *LC* VCOs can be replaced by ring VCOs.

### 2.2.2 Reciprocal Mixing Cancellation

Figure 2.4: Phase noise and its replica, and their effect in reciprocal mixing

To relax the stringent requirement on the entire LOGEN system and potentially lower its cost and power consumption, one recent work [20]-[21] has proposed a scheme to cancel the reciprocal mixing caused by phase noise. The basic concept is that, since any spur or phase noise from the LO has passed through hard-limiting blocks (the oscillator itself, buffers, dividers, etc.) before driving the mixer, it is by nature a phase modulation (PM) product, and is inherently symmetrical around the main carrier. As shown in Fig. 2.4, the LO phase noise at $f_{LO}$ - $\Delta f$ has its replica at $f_{LO}$ + $\Delta f$ . As the strong blocker is down-converted along with the weak desired signal, the phase noise at $f_{LO}$ - $\Delta f$ mixes with the blocker and deposits noise in the received band (DC), and the replica at $f_{LO}$ + $\Delta f$ mixes with the blocker and appears at $2\Delta f$ . This replica at $2\Delta f$ can be used to cancel the RM noise in the received band (DC).

Figure 2.5: A limiter-based phase noise cancelling approach

Based on this principle, [20]-[21] also implemented a prototype based on limiters (Fig. 2.5). After down-conversion, the blocker and image pass through a limiter. Through the limiter's strong $3^{rd}$ order non-linearity, a portion of the image at $2\Delta f$ is down-converted to DC. After proper scaling, it cancels the phase noise around DC from the signal path. While this approach is low power and proven to work with a moderate (up to -10dBm) continuous-wave (CW) blocker, it does not work when the blocker is modulated. In this case, the replica will not be properly down-converted, and the limiter also generates unwanted products around DC (e.g.
blocker's PM expansion, 2<sup>nd</sup> order non-linearity, etc.). Furthermore, as the authors themselves acknowledged, their work did not address the challenge of gain compression. The strongest blocker it could tolerate is around -10dBm, which is not adequate to meet the cellular standards.

### 2.2.3 Frequency Translational Noise Cancellation

In recent years, considerable research effort [11]-[14], [22]-[24] has been devoted to high-linearity receiver designs, in large part by exploiting the superior linearity of current-driven passive mixers. In almost every case, however, the linearity and wideband operation come at the expense of noise figure. One recent work [9]-[10] is an exception: it addressed this issue and broke the trade-offs between linearity, bandwidth and noise by implementing two down-conversion paths and performing frequency translational noise cancellation. It is shown in Fig. 2.6.

Figure 2.6: Frequency translational noise cancelling

In the receiver, a passive mixer immediately down-converts the RF current to baseband. A TIA then converts any current in the receive-band to voltage. The voltage measurement is provided by an auxiliary path, where an RF trans-conductance ( $G_m$ ) cell converts the RF node voltage to a current, which is then down-converted by another passive mixer. Another TIA then converts any in-band current to voltage. If high-gain baseband operational amplifiers are employed, the input terminals of both TIAs appear as virtual grounds. Additionally, if large switches are used in the passive mixers, the impedance looking into the RF terminal of each mixer is small and, therefore, no RF voltage gain is experienced. At baseband, heavy low-pass filtering is applied to reject any strong out-of-band blockers. In this arrangement, the auxiliary $G_m$ cell will ultimately determine the large-signal linearity of the system.
The main path provides real resistive wideband matching, and its noise (matching resistor, mixer, and baseband TIA) is cancelled by the auxiliary path. The receiver is capable of tolerating a 0-dBm blocker, and maintains a sub-2dB small-signal noise figure across the band. However linear the receiver may be, its filtering still relies on the switching of the passive mixers, so it remains susceptible to phase noise. Its LOGEN design therefore becomes critical, and could be very power hungry.

### 2.3 Phase Noise Cancellation with Modulated Blocker

As mentioned, in cellular and other wireless connectivity standards (WiFi, Bluetooth, etc.), the weak desired signal could be accompanied by strong out-of-band blockers. These blockers have various origins: a blocker could be a signal from some other standard whose spectrum happens to sit very close to the receiver's working spectrum, a signal from the same standard at a nearby channel, or the signal from the receiver's own transmitter in a full-duplex FDD system. As a result, the blocker could be either continuous-wave (CW) or modulated. Typically, the CW blockers defined in the cellular standards are stronger than the modulated ones. For example, the strongest out-of-band CW blocker defined is 0dBm, whereas the modulated blocker usually experiences attenuation from a duplexer or circulator before reaching the receiver's input port. The typical output power from a handset's transmitter is about +30dBm. After 40~55dB attenuation, the blocker is about -25 ~ -10dBm at the receiver's input. Both the CW and modulated blockers could mix with the LO's phase noise and corrupt the signal through reciprocal mixing. While phase noise cancellation in the presence of a CW blocker is straightforward (Figs. 2.4 and 2.5), the reciprocal mixing of a modulated blocker with phase noise requires a more elaborate cancellation approach.
Firstly, as mentioned, the phase noise from receiver's LOGEN is always in nature PM because of VCO and buffer's amplitude limiting property. The LO with phase noise can be written as: $$V_{LO}(t) = A_{LO}e^{-j(\omega_{LO}t + \theta_{LO} + \varphi_n(t))}$$ (2.4) where $A_{LO}$ is a constant amplitude, $\omega_{LO}$ is the LO frequency, $\theta_{LO}$ is a static phase offset, and $\varphi_n(t)$ is the instantaneous phase noise in time domain. The minus sign in the exponent indicates its down-conversion function. Since phase noise or spur is always much weaker compared to the main carrier, narrow-band FM approximation can be applied to the LO expression (2.4). It can be re-written as: $$V_{LO}(t) = A_{LO} e^{-j \cdot (\omega_{LO} t + \theta_{LO})} e^{-j \cdot \varphi_n(t)} \approx A_{LO} e^{-j(\omega_{LO} t + \theta_{LO})} (1 - j\varphi_n(t))$$ (2.5) Without losing generality, the phase noise can be modeled as a set of infinite uncorrelated spurs with the same PSD. Therefore, the analysis with a single spur at $\Delta\omega_s$ with amplitude $\varphi_s$ can be easily extended to a phase noise with arbitrary power spectral density. So for a spur, $\varphi_n(t) = \frac{\varphi_s}{2} \left( e^{j(\Delta\omega_s t + \theta_s)} + e^{-j(\Delta\omega_s t + \theta_s)} \right)$ . Substituting it in (2.5), we can get: $$V_{LO}(t) = A_{LO} \left[ e^{-j(\omega_{LO}t + \theta_{LO})} - \frac{\varphi_s}{2} e^{-j((\omega_{LO} + \Delta\omega_s)t + \theta_{LO} + \theta_s - \pi/2)} - \frac{\varphi_s}{2} e^{-j((\omega_{LO} - \Delta\omega_s)t + \theta_{LO} - \theta_s - \pi/2)} \right]$$ (2.6) The first term in (2.6) is the clean carrier without phase noise, whereas the second term is an AM signal with the phase noise modulating a 90° phase-shifted version of the carrier. In the receiver, this LO mixes with both desired signal and blocker. 
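The narrow-band approximation behind (2.5) and (2.6) can be checked numerically. The sketch below is only an illustration under assumed values (a spur amplitude $\varphi_s = 0.1$ rad, chosen arbitrarily): it computes the Fourier coefficients of $e^{-j\varphi_s\cos x}$ over one spur period and compares the two first-order sidebands with the $-j\varphi_s/2$ predicted by (2.5). The helper name `pm_coefficient` is hypothetical.

```python
import cmath
import math


def pm_coefficient(phi_s, k, n=1024):
    """Fourier coefficient of exp(-j*phi_s*cos(x)) at exp(j*k*x), one period."""
    acc = 0j
    for i in range(n):
        x = 2.0 * math.pi * i / n
        acc += cmath.exp(-1j * phi_s * math.cos(x)) * cmath.exp(-1j * k * x)
    return acc / n


phi_s = 0.1  # assumed spur amplitude in radians (illustrative only)

carrier = pm_coefficient(phi_s, 0)
upper = pm_coefficient(phi_s, +1)
lower = pm_coefficient(phi_s, -1)

# Narrow-band prediction from (2.5): each sideband is -j*phi_s/2 of the carrier.
predicted = -0.5j * phi_s

print("carrier       :", carrier)
print("upper sideband:", upper)
print("lower sideband:", lower)
print("prediction    :", predicted)
print("sideband level: %.1f dBc" % (20.0 * math.log10(abs(upper) / abs(carrier))))
```

The two sidebands come out equal and in quadrature with the carrier, which is the symmetry that the cancellation scheme relies on; the residual difference from the prediction is of third order in $\varphi_s$.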
The RF input can be expressed as:

$$V_{RF}(t) = A_d(t)e^{j(\omega_d t + \theta_d(t))} + A_b(t)e^{j(\omega_b t + \theta_b(t))}$$ (2.7)

where subscripts b and d indicate the blocker and the desired signal, respectively. Both of them have amplitude modulation A(t) and phase modulation $\theta(t)$ . After mixing (2.6) with (2.7), four components are generated:

$$V_{IF} = V_{IF,d} + V_{IF,b} + V_{RM,d} + V_{RM,b}$$ (2.8)

where

$$V_{IF,d} = A_d(t)e^{j(\Delta\omega_d t + \theta_d(t) - \theta_{LO})}$$ (2.9)

$$V_{IF,b} = A_b(t)e^{j(\Delta\omega_b t + \theta_b(t) - \theta_{LO})}$$ (2.10)

$$V_{RM,d} = \frac{\varphi_s A_d(t)}{2} \left[ e^{j((\Delta \omega_d - \Delta \omega_s)t + \theta_d(t) - \theta_{LO} - \theta_s + \pi/2)} + e^{j((\Delta \omega_d + \Delta \omega_s)t + \theta_d(t) - \theta_{LO} + \theta_s + \pi/2)} \right]$$ (2.11)

$$V_{RM,b} = \frac{\varphi_s A_b(t)}{2} \left[ e^{j((\Delta \omega_b - \Delta \omega_s)t + \theta_b(t) - \theta_{LO} - \theta_s + \pi/2)} + e^{j((\Delta \omega_b + \Delta \omega_s)t + \theta_b(t) - \theta_{LO} + \theta_s + \pi/2)} \right]$$ (2.12)

and the desired signal and blocker offset frequencies are given by $\Delta \omega_d = \omega_d - \omega_{LO}$ and $\Delta \omega_b = \omega_b - \omega_{LO}$ . The terms $V_{IF,d}$ and $V_{IF,b}$ represent the down-converted desired signal and blocker. The desired signal mixing term $V_{RM,d}$ is typically much weaker than the desired signal (2.9), and is only important if a very high SNR is required for demodulation, which usually only matters for very high order modulation schemes (16/64/256QAM, etc.). We therefore neglect $V_{RM,d}$ in this work. The reciprocal mixing product of the spur and the strong blocker is expressed in (2.12). If $\Delta\omega_b - \Delta\omega_s$ is close to zero, the 1<sup>st</sup> term in (2.12) is in the receive-band, and is indistinguishable from the desired signal in (2.8).
So it is desirable to cancel the 1<sup>st</sup> term of (2.12) with its 2<sup>nd</sup> term, the replica, as they are correlated but at two different frequencies. Also, the replica and the in-band noise share the same amplitude modulation, $\varphi_s A_b(t)/4$ , therefore the replica needs to be multiplied by an auxiliary LO carrier<sup>2</sup> $V_{aux,LO}(t) = e^{j(\omega_{aux}t + \theta_{aux}(t))}$ , such that:

$$e^{j\left((\Delta\omega_b + \Delta\omega_s)t + \theta_b(t) - \theta_{LO} + \theta_s + \pi/2\right)} \cdot V_{aux,LO}(t) = e^{j\left((\Delta\omega_b - \Delta\omega_s)t + \theta_b(t) - \theta_{LO} - \theta_s + \pi/2\right)}$$ (2.13)

Solving (2.13) for $V_{aux,LO}(t)$ seems to be trivial. However, when dealing with complex frequency domain signals, it has to be kept in mind that two complex exponentials are equal in the real time domain if their exponents are exact opposites, i.e. $Re\{e^{jf(t)}\}=Re\{e^{-jf(t)}\}$ . Therefore, there are two solutions to (2.13):

1) $$\omega_{aux} = -2\Delta\omega_s$$ , $\theta_{aux}(t) = -2\theta_s$

2) $$\omega_{aux} = -2\Delta\omega_b$$ , $\theta_{aux}(t) = -2\theta_b(t) + 2\theta_{LO}$

Both solutions are mathematically valid, but the first solution requires knowledge of the spur frequency, which is impractical if the analysis is extended to phase noise. The second one, however, requires knowledge of the blocker offset frequency and modulation, which are deterministic for phase noise cancellation. Thus, to cancel the phase noise reciprocal mixing, the ideal carrier to multiply its replica is

$$V_{aux,LO}(t) = e^{-j(2\Delta\omega_b t + 2\theta_b(t) - 2\theta_{LO})}$$ (2.14)

It is also interesting to note the physical effect of this $V_{aux,LO}$ on the phase noise replica (the $2^{nd}$ term in (2.12)). After multiplying the replica with (2.14), it is shifted in frequency: <sup>2</sup> It will be called "auxiliary LO", or "Aux.
LO", throughout the text to differentiate it from the main RF down-conversion LO.

$$V_{aux}(t) = -\frac{\varphi_s A_b(t)}{2} e^{-j((\Delta \omega_b - \Delta \omega_s)t + \theta_b(t) - \theta_{LO} - \theta_s + \pi/2)}$$ (2.15)

Clearly, the carrier frequency of $V_{aux}(t)$ is moved by $2\Delta\omega_b$ . The replica and $V_{aux}(t)$ can then both be expressed with their complex baseband equivalents:

$$V_{rp,BB}(t) = \frac{\varphi_s A_b(t)}{2} e^{j\theta_b(t)}$$ (2.16)

$$V_{aux,BB}(t) = \frac{\varphi_s A_b(t)}{2} e^{-j\theta_b(t)}$$ (2.17)

In both equations above, the static phase offsets are ignored. The complex baseband equivalents show that the AM of the two spectra remains the same, while the PM is inverted. In the frequency domain, this is equivalent to *flipping* the blocker's complex spectrum. The effect is shown in Fig. 2.7. This underlying physics is very important for understanding our proposed phase noise cancellation scheme. Later on, this principle will be revisited, and it will be shown that various alternative but more efficient approaches to phase noise cancellation can be derived from this understanding.

Figure 2.7: Spur cancellation with an arbitrarily modulated blocker

### 2.4 Auxiliary LO Generation

The ideal auxiliary LO in (2.14) to cancel phase noise is a function of the down-converted blocker. Thus, in principle, we can use the information in the strong blocker to generate this auxiliary LO, as the blocker is available to us after down-conversion. The system diagram is shown in Fig. 2.8.

Figure 2.8: Phase noise cancellation with modulated blocker

After RF down-conversion, the desired signal is around DC, the blocker resides around $\Delta\omega_b$ or $\Delta f_b$ , and the phase noise replica is located at around $2\Delta f_b$ . To avoid any interference, a high-pass filter (HPF) can be used to reject the signal and admit the blocker and phase noise replica. The down-converted blocker is expressed in (2.10).
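Since the ideal auxiliary LO (2.14) is built from the frequency and phase of the down-converted blocker (2.10), the cancellation condition can be checked numerically. The following sketch is a minimal illustration only: the blocker phase modulation, the offsets and the static phases are all assumed values, and amplitudes are normalized to one. It multiplies the replica (the 2<sup>nd</sup> term of (2.12)) by (2.14) and confirms that the result has a real part equal and opposite to that of the in-band term, consistent with the minus sign in (2.15).

```python
import cmath
import math

# All numerical values below are assumptions chosen for illustration.
DW_B = 2.0 * math.pi * 20.0e6   # blocker offset (rad/s)
DW_S = 2.0 * math.pi * 19.9e6   # spur offset (rad/s), close to the blocker offset
TH_LO = 0.4                     # static LO phase (rad)
TH_S = 1.1                      # static spur phase (rad)


def theta_b(t):
    """Hypothetical blocker phase modulation (two slow tones)."""
    return 0.8 * math.sin(2.0 * math.pi * 0.3e6 * t) + 0.3 * math.cos(2.0 * math.pi * 1.1e6 * t)


def in_band(t):
    """1st term of (2.12), unit amplitude."""
    return cmath.exp(1j * ((DW_B - DW_S) * t + theta_b(t) - TH_LO - TH_S + math.pi / 2))


def replica(t):
    """2nd term of (2.12), unit amplitude."""
    return cmath.exp(1j * ((DW_B + DW_S) * t + theta_b(t) - TH_LO + TH_S + math.pi / 2))


def aux_lo(t):
    """Ideal auxiliary LO of (2.14)."""
    return cmath.exp(-1j * (2.0 * DW_B * t + 2.0 * theta_b(t) - 2.0 * TH_LO))


times = [n * 7.3e-9 for n in range(1000)]

# Adding the shifted replica to the in-band term should leave no real part.
worst_residual = max(abs((in_band(t) + replica(t) * aux_lo(t)).real) for t in times)

# Without the auxiliary LO the two terms sit at different frequencies and do not cancel.
worst_uncorrected = max(abs((in_band(t) + replica(t)).real) for t in times)

print("worst residual with Aux. LO   :", worst_residual)
print("worst residual without Aux. LO:", worst_uncorrected)
```

The check holds for any choice of `theta_b`, which reflects the point made above: the auxiliary LO only needs the blocker's own phase, not the spur frequency.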
So to generate $V_{aux,LO}(t)$ , intuitively, the blocker's AM needs to be rejected, and its frequency/phase needs to be multiplied by two. This section discusses the issues in its realization and the implementation.

#### 2.4.1 Frequency/Phase Multiply by Two and Noise at $\Delta f_b$

A signal's frequency/phase can be multiplied by two in a variety of ways. To minimize power consumption and for simplicity, we use full-wave rectifiers to realize this function. The full-wave rectifier essentially generates a strong second-order non-linearity product. Its implementation is very convenient in differential circuits, as their common-mode nodes always experience a certain degree of second-order non-linearity. The circuits in Fig. 2.9 are two simple examples. A strong 2<sup>nd</sup> order harmonic can be generated and maximized if the transistors are properly biased.

Figure 2.9: Rectifier circuit examples

Figure 2.10: Freq/phase multiplied by two and noise at $\Delta f_b$

After rectification, the blocker's phase and frequency are multiplied by two. However, the rectifier also generates unwanted noise which degrades the receiver's performance, as illustrated in Fig. 2.10. After RF down-conversion, the blocker and phase noise replica are admitted by the HPF. They are used to generate the auxiliary LO, and also pass through the mixer that shifts the replica in-band. Since the blocker passes through the mixer, any noise at $\Delta f_b$ in the auxiliary LO will again mix with this blocker, and create noise around DC. There are two sources of noise at $\Delta f_b$ . The first source is the phase noise replica at $2\Delta f_b$ that goes into the squarer. The second-order nonlinearity will mix this noise with the blocker, and shift the noise to $\Delta f_b$ . The other source is the pre-amplifier's flicker/DC noise inside the squarer block. This low-frequency noise will also be shifted to $\Delta f_b$ by the second-order nonlinearity. Therefore, it is necessary to filter this noise.
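The frequency/phase doubling performed by a full-wave rectifier can be illustrated with an idealized model. The sketch below assumes a perfect absolute-value characteristic (not the transistor-level circuits of Fig. 2.9) and an arbitrary input phase; it extracts the second-harmonic Fourier coefficient of $|\cos(x + \theta)|$ and compares it with the textbook series value $\frac{2}{3\pi}e^{j2\theta}$. The function name is hypothetical.

```python
import cmath
import math


def second_harmonic(theta, n=4096):
    """Fourier coefficient of |cos(x + theta)| at exp(j*2*x), by a Riemann sum."""
    acc = 0j
    for i in range(n):
        x = 2.0 * math.pi * i / n
        acc += abs(math.cos(x + theta)) * cmath.exp(-2j * x)
    return acc / n


theta = 0.35  # assumed input phase in radians (illustrative only)
c2 = second_harmonic(theta)

print("second-harmonic magnitude:", abs(c2))
print("second-harmonic phase    :", cmath.phase(c2))
print("twice the input phase    :", 2.0 * theta)
```

The output phase equals twice the input phase, so a slowly varying $\theta_b(t)$ at the rectifier input appears as $2\theta_b(t)$ on the second harmonic, which is what (2.14) requires.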
#### 2.4.2 Injection-Locked Oscillator - Phase Tracking Filter

To filter the noise, an injection-locked oscillator is used in this work as a phase tracking filter. Injection-locking has been used extensively in modern CMOS IC chips, mostly for frequency generation applications, e.g. inside PLLs, CDRs, etc. In [25], it is shown that in a synchronized (injection-locked) oscillator, the phase noise within the locking range will be suppressed to that of the injection signal, while that outside the locking range will be dominated by the oscillator itself. This property has been used to reduce the noise in high-performance PLLs [26]-[27]. A property that was investigated decades ago but has since been largely abandoned is the ability of injection-locking to track the injection's close-in PM and reject its far-out PM when the injection is not a clean reference tone, as it would be in a PLL. For example, [28] used injection-locking as a phase tracking filter in an FM demodulator system. The most widely used model for injection-locking was derived by Adler in 1946, and was later re-printed in 1973 [29]. For a single tuned regenerative circuit (oscillator) under the influence of an external CW signal, assuming the injection signal is $I_{in}(t) = I_{inj}e^{j(\omega_{inj}t + \theta(t))}$ and the oscillation output is $I_{out}(t) = I_{osc}e^{j(\omega_{inj}t + \alpha(t))}$ , they have the relationship:

$$\frac{d\alpha(t)}{dt} = -\frac{I_{inj}}{I_{osc}} \frac{\omega_0}{2Q} \sin(\alpha(t) - \theta(t)) + (\omega_0 - \omega_{inj})$$ (2.18)

where $\omega_0$ is the free-running frequency of the oscillator, and Q is the quality factor of the tank or its equivalent in a ring oscillator. This relationship is valid when the injection current is small, i.e. $I_{inj} \ll I_{osc}$ . To understand its filtering property, we first study the case when the injection is a sine-wave with weak frequency modulation.
The injection can be written as

$$I_{in}(t) = I_{inj}e^{j(\omega_{inj}t + \varphi_m\cos\omega_m t)}$$ $$\approx I_{inj}e^{j\omega_{inj}t}\left[1 + \frac{\varphi_m}{2}e^{j(\omega_m t - \pi/2)} + \frac{\varphi_m}{2}e^{j(-\omega_m t - \pi/2)}\right]$$ (2.19)

where $\varphi_m \ll 1$ and $\omega_m$ is the phase modulation rate. Assuming the output is $I_{out}(t) = I_{osc}e^{j\left(\omega_{inj}t + \alpha(t)\right)}$ and applying Adler's equation:

$$\frac{d\alpha(t)}{dt} = -\frac{I_{inj}}{I_{osc}} \frac{\omega_0}{2Q} \sin(\alpha(t) - \varphi_m \cos \omega_m t) + (\omega_0 - \omega_{inj})$$ (2.20)

It is also assumed that $\omega_0 = \omega_{inj}$ , so the system is symmetrical and the analysis can be simplified. Under this condition, the main carrier falls right onto the oscillator's free-running frequency, and will not pull the oscillator but only increase the effective oscillation strength. Therefore, in this case, Adler's equation can be simplified as

$$\frac{d\alpha(t)}{dt} = \frac{\varphi_m I_{inj}}{I_{osc}} \frac{\omega_0}{4Q} \left( \sin(\omega_m t - \pi/2 - \alpha(t)) + \sin(-\omega_m t - \pi/2 - \alpha(t)) \right)$$ (2.21)

As shown in [30], this differential equation can be solved for the power of each frequency component. The oscillation center is expected to be at $\omega_0$ , and is accompanied by an infinite number of side-bands at $\omega_0 \pm k\omega_m$ . The first pair of side-bands (k=1) accounts for more than 80% of the total energy, so we can neglect the higher order side-bands. The first-order side-bands have a power of

$$P(\omega_0 \pm \omega_m) \approx \frac{4\left(\frac{\varphi_m I_{inj}}{I_{osc}} \frac{\omega_0}{2Q}\right)^2 \left(C + \frac{1}{C}\right)^2}{\omega_m^2 + \left(\frac{\varphi_m I_{inj}}{I_{osc}} \frac{\omega_0}{2Q}\right)^2 \pi^2 \left(C + \frac{1}{C}\right)^2}$$ (2.22)

where C is an integration constant, and has an empirical value of $\pm(\sqrt{2}\pm 1)$ . This shows that the envelope of the sideband power is Lorentzian in $\omega_m$ .
So a far-out $\omega_m$ sideband injection is attenuated (filtered), while a closer-in $\omega_m$ sideband is retained, if it falls in-band. The PM filter's pass bandwidth is roughly:

Figure 2.11: Simulated PM filtering response of injection-locking

Figure 2.11 shows the simulated PM filtering of injection-locking. The response of the weak PM sidebands is plotted, as obtained from Spectre-RF's pac analysis.

#### 2.4.3 Injection-Locked Oscillator - Phase Tracking Error

While injection locking is useful for filtering the far-out PM components, specifically the noise at $\Delta f_b$ in our case, it also introduces a certain distortion into the tracked phase within its locking range. Firstly, we examine the likely spectral density of the injection. Ignoring the weak PM noise, if the blocker is a CW, the injection is obviously also a down-converted CW; however, if the blocker is modulated, the injection would be $e^{-j(2\Delta\omega_b t + 2\theta_b(t) - 2\theta_{LO})}$ . It is the PM of a modulated signal. While the signal itself might be a well-defined band-limited signal, its PM could be much wider. This is known as PM spectrum expansion. Figure 2.12 shows its effect in EDGE and WCDMA signals. Their PM rolls off very slowly. Therefore, the limited tracking bandwidth of injection locking potentially introduces distortion.

Figure 2.12: Spectrum expansion in (a) EDGE and (b) WCDMA [70]-[71]

Besides filtering the PM, injection locking also distorts the PM within its locking range. To analyze this effect, (2.18) needs to be solved.
We define $\theta_e(t) = \theta(t) - \alpha(t)$ , and $A = \frac{I_{inj}}{I_{osc}} \frac{\omega_0}{2Q}$ , so (2.18) can be re-written as:

$$\frac{d\theta_e(t)}{dt} = \frac{d\theta(t)}{dt} - A\sin\theta_e(t) - (\omega_0 - \omega_{inj})$$ (2.24)

Its solution can only be found by an iteration procedure [31]:

$$\sin \theta_e(t) = \frac{1}{A} \left\{ \theta' \left( t - \frac{1}{A} \right) + \frac{\theta'''(t)}{2A^2} - \frac{5\theta''''(t)}{6A^3} + \dots - \frac{\left[ \theta'^3(t) \right]'}{6A^3} + \dots \right\}$$ (2.25)

When A is large, the series in (2.25) converges rapidly and only the first term is significant. Assuming $\theta_e(t)$ to be small:

$$\theta_e(t) \approx \frac{1}{A} \theta' \left( t - \frac{1}{A} \right)$$ (2.26)

Applying the Laplace transform to the exponent, and assuming s/A to be small (which is true if the PM is within the locking range):

$$\alpha(s) = \mathcal{L}[\theta(t) - \theta_e(t)] = \theta(s) - \frac{s}{A} \cdot \theta(s) e^{-\frac{s}{A}} \approx \theta(s) e^{-\frac{s}{A}}$$ $$\therefore \alpha(t) \approx \theta\left(t - \frac{1}{A}\right)$$ (2.27)

Therefore, the output of the injection-locked oscillator is

$$I_{out}(t) = I_{osc} e^{j(\omega_{inj}t + \alpha(t))} \approx I_{osc} e^{j(\omega_{inj}t + \theta(t - \tau))}$$ (2.28)

where $\tau=1/A$ . (2.28) states that, if the majority of the injection's PM is within the injection-locked oscillator's locking range, this PM will be tracked by the oscillator, but will be effectively delayed. The delay is inversely proportional to A, which is defined as $A=\frac{I_{inj}}{I_{osc}}\frac{\omega_0}{2Q}$ . If a ring oscillator is used instead of an LC one, the Q in the expressions above can be replaced by its counterpart in a ring oscillator.
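The constant-delay result in (2.28) can be checked by integrating Adler's equation (2.18) directly. The sketch below is a minimal illustration in normalized units, under assumed values ($A = 1$, $\omega_0 = \omega_{inj}$, a sinusoidal PM with $\varphi_m = 0.5$ rad at $\omega_m = 0.05A$); it uses a hand-written fourth-order Runge-Kutta step and then estimates the lag of $\alpha(t)$ relative to $\theta(t)$ by projecting onto $\sin\omega_m t$ and $\cos\omega_m t$. The function names are hypothetical.

```python
import math


def simulate_adler(a_lock, theta, t_end, dt):
    """Integrate d(alpha)/dt = -A*sin(alpha - theta(t)) with RK4 (omega_0 = omega_inj)."""
    def rate(t, alpha):
        return -a_lock * math.sin(alpha - theta(t))

    steps = int(round(t_end / dt))
    t, alpha = 0.0, 0.0
    ts, alphas = [t], [alpha]
    for _ in range(steps):
        k1 = rate(t, alpha)
        k2 = rate(t + dt / 2.0, alpha + dt * k1 / 2.0)
        k3 = rate(t + dt / 2.0, alpha + dt * k2 / 2.0)
        k4 = rate(t + dt, alpha + dt * k3)
        alpha += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
        ts.append(t)
        alphas.append(alpha)
    return ts, alphas


def estimate_delay(ts, alphas, w_m, t_start):
    """Fit alpha ~ K*sin(w_m*(t - tau)) for t >= t_start and return tau."""
    s_proj = c_proj = 0.0
    for t, alpha in zip(ts, alphas):
        if t >= t_start:
            s_proj += alpha * math.sin(w_m * t)
            c_proj += alpha * math.cos(w_m * t)
    lag = math.atan2(-c_proj, s_proj)  # K*sin(x - lag) = K*cos(lag)*sin(x) - K*sin(lag)*cos(x)
    return lag / w_m


A = 1.0        # assumed locking coefficient (normalized)
W_M = 0.05     # assumed PM rate, well inside the locking range
PHI_M = 0.5    # assumed PM depth in radians
PERIOD = 2.0 * math.pi / W_M

ts, alphas = simulate_adler(A, lambda t: PHI_M * math.sin(W_M * t), 3.0 * PERIOD, 0.02)
delay = estimate_delay(ts, alphas, W_M, PERIOD)

print("estimated delay * A:", delay * A)
```

For PM rates well below $A$ the estimated delay stays close to $1/A$, in line with (2.28); repeating the run with a larger `W_M` shows the model degrading as the modulation approaches the edge of the locking range.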
Without going into details, it can be proven that for an injection-locked ring oscillator,

$$A = \frac{I_{inj}}{I_{osc}} \frac{(RC\omega_0)^2 + 1}{RC} = \frac{I_{inj}}{I_{osc}} \cdot \omega_0 \cdot \frac{\tan^2(\pi/N) + 1}{\tan(\pi/N)}$$ (2.29)

where R and C are the values of the resistor and capacitor in the ring oscillator's delay cell, and N is the number of stages in the ring oscillator. So in the circuit design, a stronger injection results in a smaller error, as does a larger number of delay stages in the ring oscillator, assuming a constant oscillation frequency is maintained.

Figure 2.13: An oscillator locked to an FM signal

Figure 2.14: Phase error in injection-locked oscillator under FM signal

This delay-type PM distortion can also be verified through simulation. Again, a simple FM signal is injected into the oscillator, as shown in Fig. 2.13. Because of the distortion, the output is still an FM signal, but with the two sidebands experiencing a certain phase shift relative to the main carrier. Figure 2.14 plots the phase error against swept modulation rate for various injection strengths. It shows that as the modulation rate increases, the phase error ( $\theta_{error}$ ) also increases linearly, which is consistent with the constant delay model in (2.28). A stronger injection also results in a smaller $\theta_{error}$ , validating (2.29). Since the PM distortion of injection-locking can be modeled as a constant delay, or linear phase shift, within its locking range, and since it filters PM outside its locking range, a system simulation is set up to evaluate the distortion and give guidance to the actual circuit design. Figure 2.15 shows the system diagram used in the simulation.

Figure 2.15: Phase noise cancellation's system simulation for injection locking's PM distortion

A 1.4MHz LTE blocker is applied at a 7.5MHz offset. The blocker bandwidth and offset are both scaled down to save simulation resources and time. The spectra at the critical nodes are also shown.
The blocker's PM component is ideally extracted and modulates an oscillator at $2\Delta f_b$ . This output is exactly the ideal Aux. LO to down-convert the phase noise image. If this ideal Aux. LO is used to drive the phase noise cancelling mixer (the BPF shown in Fig. 2.15 is shorted), an ideal phase noise cancellation can be achieved (right bottom plot of Fig. 2.15). The high noise floor due to phase noise reciprocal mixing can be significantly lowered. Afterwards, instead of using this ideal LO, a 2<sup>nd</sup>-order Butterworth IIR band-pass filter is applied to model the PM distortion from the injection locking. Similar to injection locking, the Butterworth BPF also has a linear in-band phase distortion (constant group delay) with the 2<sup>nd</sup>-order band-pass filtering response. And more importantly, they both have a smaller phase distortion with a wider pass/locking bandwidth. Although there is one major difference that the Butterworth filtering introduces varied amplitude, we believe the model is accurate enough to model injection locking's PM distortion, and its impact on the phase noise cancellation. With this filter, the spectrum plot on the middle right still shows certain error, but a significant amount of cancellation can still be achieved. To get more insight, the pass bandwidth of the BPF is swept, and the amount of phase noise cancellation is plotted in Fig. 2.16. Figure 2.16: Phase noise cancellation vs. BPF bandwidth This simulation is very useful, because it provides guidance for us to choose the PM filtering bandwidth (locking range) that is sufficient to achieve certain phase noise cancellation. From a desired cancellation level, a circuit designer can work out the injection locking bandwidth, and design the injection strength accordingly. ### 2.4.4 Injection-Locked Oscillator - Phase Noise As mentioned in previous sections, the injection locking tracks PM inside its locking range and filters PM at far-out. 
Instead, its own phase noise dominates at far-out (Fig. 2.17). The phase noise mixes with blocker, and appears in the received band after phase noise cancellation. Figure 2.17: Injection locked oscillator's phase noise and its effect on receiver To fairly compare our proposed phase noise cancellation scheme with a stand-alone VCO, an effective oscillator FOM can be defined with receiver's blocker NF. Substituting (2.1) into (2.2), FOM can be re-defined as $$FoM = \frac{\left(\frac{\omega_c}{\Delta\omega}\right)^2}{\mathcal{L}\{\Delta\omega\}P_{DC[mW]}} = \frac{\left(\frac{\omega_{RF}}{\Delta\omega_b}\right)^2}{(P_b - 174)P_{DC[mW]}}$$ (2.30) Assuming RF VCO's phase noise can be perfectly cancelled, this phase noise from injection locked oscillator sets receiver's new noise floor. However, unlike the RF VCO, this oscillator oscillates at $2\Delta f_b$ , a much lower frequency. In fact, since this frequency of interest, $\Delta f_b$ , is always half of the oscillation frequency, the oscillator's power budget is now fixed given the desired blocker tolerance. For example, assuming a typical ring oscillator with a FOM of 168dB is injection-locked, its required power consumption is 6mW to tolerate a 0dBm blocker (-170dBc/Hz at $\Delta f_b$ offset), regardless of its RF location. Therefore, the oscillator power is no longer a function of the signal's RF frequency and blocker offset, whereas in the RF VCO, its FOM is relatively constant given the topology, so its power consumption is linearly dependent on the ratio of RF frequency and blocker offset. $$FoM_{new} = FoM_{ILRO} + 20 \log(\omega_{RF}/2\Delta\omega_b)$$ (2.31) Since now the power consumption is fixed, the LOGEN's new effective FOM, as defined in (2.30) and (2.31), can be improved dramatically. The resulting FOM is plotted in Fig. 2.18. As the RF frequency increases, and blocker gets closer in, a higher effective FOM can be achieved. 
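The power figure quoted above and the effective FOM in (2.31) can be reproduced with a few lines of arithmetic. The sketch below takes the 168 dB FOM and the -170 dBc/Hz requirement from the text; the 2 GHz RF frequency and 20 MHz blocker offset used for the effective FOM are assumed values for illustration, and the function names are hypothetical.

```python
import math


def required_power_mw(fom_db, f_osc_hz, offset_hz, pn_dbc_hz):
    """Solve the dB form of the FOM definition (2.2) for the DC power in mW."""
    return 10.0 ** ((20.0 * math.log10(f_osc_hz / offset_hz) - pn_dbc_hz - fom_db) / 10.0)


def effective_fom_db(fom_ilro_db, f_rf_hz, blocker_offset_hz):
    """Effective LOGEN FOM according to (2.31)."""
    return fom_ilro_db + 20.0 * math.log10(f_rf_hz / (2.0 * blocker_offset_hz))


FOM_ILRO_DB = 168.0      # ring oscillator FOM quoted in the text
PN_TARGET = -170.0       # dBc/Hz at the blocker offset, for a 0 dBm blocker
DF_B = 20e6              # assumed blocker offset (Hz); the power result does not depend on it

# The injection-locked oscillator runs at 2*DF_B and is evaluated at an offset of DF_B.
p_mw = required_power_mw(FOM_ILRO_DB, 2.0 * DF_B, DF_B, PN_TARGET)

# Assumed RF frequency of 2 GHz for the effective FOM.
fom_new = effective_fom_db(FOM_ILRO_DB, 2e9, DF_B)

print(f"required oscillator power: {p_mw:.2f} mW")
print(f"effective FOM at 2 GHz RF, 20 MHz offset: {fom_new:.1f} dB")
```

Because the ratio of oscillation frequency to offset is fixed at two, the required power is the same for any `DF_B`, which is the decoupling described above.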
This is because we have effectively de-coupled the VCO power and phase noise budget from its oscillation frequency and offset. It has to be noted that the FOM plotted in Fig. 2.18 is only the theoretical best that we can achieve. In reality, the phase noise cancellation is finite, and the pre-amplification for injection locking also consumes about 5mW of power, which is not included in this plot.

Figure 2.18: Theoretical best oscillator effective FOM

Figure 2.19: PLL as phase tracking filter

#### 2.4.5 Injection-Locked Oscillator vs. Phase Locked Loop (PLL)

Besides injection locking, a phase locked loop is also capable of tracking close-in PM and filtering far-out PM. The filtering is realized in the PLL's loop filter. Figure 2.19 shows two potential PLL-based Aux. LO generation systems. In the first system, the down-converted blocker is amplified and then fed into the PLL as the reference. The PLL's division ratio of 2 makes the VCO's output equal to the reference with its frequency/phase multiplied by two. While this approach looks feasible at first sight, it suffers from a serious issue, namely the reference spur. The reference spur appears at $\Delta f_b$ and would mix with the blocker. As this spur is high, it could potentially corrupt the receiver. The second approach is very similar to our proposed approach based on injection locking (Fig. 2.10). The down-converted blocker is amplified and squared before it is fed into the PLL as the reference. Compared to injection locking, this approach is likely more power hungry, as the frequency/phase detector and charge pump are needed. Also, a PLL's bandwidth is likely limited to ensure its stability. Considering its low free-running frequency (100~200MHz), the loop bandwidth is likely to be limited to under 1MHz. In addition, a PLL is inherently at least 2<sup>nd</sup> order, as the VCO is itself an integrator. Such a narrow loop bandwidth is not able to cover the expanded PM bandwidth of a modulated signal (Fig.
2.12); together with the high-order filtering, the PLL is likely to introduce even more distortion to the tracked PM than an injection-locked oscillator.

### 2.5 Proposed Receiver with Phase and Thermal Noise Cancellations

With the Aux. LO generation addressed, Fig. 2.20 shows the proposed receiver topology with phase and thermal noise cancellations [45].

Figure 2.20: The proposed receiver with phase and thermal noise cancellations

In the main path, a passive mixer immediately down-converts the RF current to baseband. A TIA then converts any current in the receive-band to voltage, while the blocker current is used to generate the Aux. LO, using a rectifier and injection locking. An auxiliary path provides the RF voltage measurement, where an RF trans-conductance converts the RF node voltage to a current, which is then down-converted by another passive mixer. Afterwards, the current splits into two paths: the noise cancelling auxiliary (NC Aux.) path, and the phase noise cancelling auxiliary (PNC Aux.) path. The NC Aux. path has a TIA to convert the current to voltage in the receive-band. It cancels the noise from the mixer-first main path. The PNC Aux. path down-converts the phase noise image current, and converts it to voltage for cancellation. As outlined in the remainder of this chapter, this receiver can cancel the reciprocal mixing caused by VCO phase noise and a blocker, achieve low noise, and tolerate a strong out-of-band blocker. Similar to [9]-[10], the thermal noise from the main path can be cancelled if

$$R_{MAIN} = G_m R_{NC,AUX} R_S$$ (2.31)

And the VCO phase noise can be cancelled if

$$R_{MAIN} = \frac{G_m R_{PNC,AUX} R_S}{2}$$ (2.32)

$$V_{LO,aux} = \cos(2\Delta\omega_b t + 2\theta_b(t) - 2\theta_{LO})$$ (2.33)

If (2.31), (2.32) and (2.33) are all met in Fig. 2.20, the phase and thermal noise can both be cancelled. The factor of 2 is needed in (2.32), because both main path's and NC Aux. path's outputs contain correlated phase noise.
While thermal noise is cancelled by adding these outputs, the phase noise doubles; therefore the PNC Aux. path needs a gain of 2 to cancel it.

If high-gain baseband operational amplifiers are employed, the input terminals of all TIAs appear as virtual grounds. If large switches are used in the passive mixers, the impedance looking into the RF terminal of each mixer is small and, therefore, no RF voltage gain is experienced. In addition to their excellent linearity and low flicker noise [32]-[33], passive mixers can handle large down-conversion currents and, therefore, the auxiliary transconductance cell will ultimately limit the achievable large-signal linearity of the system.

### 2.6 Out-of-band Linearity – Blocker Tolerance

Overcoming gain compression caused by out-of-band blockers is essential in any wideband receiver design. In a practical application, strong blockers are usually at a frequency offset much larger than the receive-channel bandwidth. We examine the out-of-band linearity of the receiver's main path and auxiliary path in turn.

#### 2.6.1 Mixer-first Main Path

Figure 2.21: Mixer-first receiver and its equivalent model

The mixer-first architecture was first applied to wideband receiver designs in [22]-[24]. That work created an RF band-pass impedance by up-converting the baseband low-pass impedance with the passive mixers (Fig. 2.21), and used this impedance for antenna impedance matching.
The clock switching the passive mixers is

$$Sw_i(t) = Sw_0 \left( t - \frac{iT_{LO}}{M} \right) \tag{2.34}$$

The Fourier coefficients of the periodic waveform are

$$sw_i[k] = \frac{1}{M}\operatorname{sinc}\left(\frac{k\pi}{M}\right) \tag{2.35}$$

If the time constant of the baseband impedance $Z_{BB}(\omega)$ is much larger than the clock period, then within the receive-band the impedance loading the antenna is

$$Z_{in}(\omega) = R_{sw} + \frac{1}{M} \operatorname{sinc}^{2} \left(\frac{\pi}{M}\right) Z_{BB}(\omega + \omega_{LO}) \tag{2.36}$$

where $R_{sw}$ is the switch resistance, $M$ is the number of clock phases, $\omega_{LO}$ is the clock frequency, and $Z_{BB}$ is the baseband impedance. Equation (2.36) shows that the baseband impedance is frequency translated to $\omega_{LO}$. For example, a low-pass impedance in $Z_{BB}$ results in a high-Q band-pass impedance at $\omega_{LO}$. This is also known as N-path filtering.

Note that the impedance expressed in (2.36) only takes the component around the fundamental into consideration. In reality, the low-pass impedance is also up-converted to the harmonics of the LO ($m\omega_{LO}$), and a driving current around $\omega_{LO}$ could result in voltages not only around $\omega_{LO}$, but also at its harmonics.

Applying *N*-path filtering directly after the antenna brings superior linearity to the receiver. The mixer itself is highly linear provided that sharp clock pulses can be generated, which is fairly easy to achieve with modern sub-micrometer CMOS technology. And by creating the high-*Q* band-pass filter, the blocker swing at the mixer's output (IF port) can be heavily attenuated, so the TIA does not experience any gain compression. In [9]-[10], [22]-[24], large filtering capacitors (~50pF) are placed at the TIA input to suppress the blocker swing. [22]-[24] also showed that with a larger input shunt filtering capacitor, the receiver's out-of-band linearity improves, and a closer-in blocker can be tolerated.
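Returning briefly to (2.35) and (2.36), the following minimal sketch evaluates the clock harmonic coefficients and the fundamental scaling factor $\frac{1}{M}\operatorname{sinc}^2(\pi/M)$ for a few values of $M$. It is only an illustration under stated assumptions: the function names, the first-order $R_{BB} \parallel C_{BB}$ baseband load and all component values are hypothetical, the unnormalized definition $\operatorname{sinc}(x) = \sin(x)/x$ is assumed, the per-phase delay factor of $sw_i[k]$ is omitted, and the baseband impedance is evaluated at the offset from the LO.

```python
import math

def sinc(x: float) -> float:
    """Unnormalized sinc, sin(x)/x, with sinc(0) = 1 (assumed convention)."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def sw_coeff(k: int, M: int) -> float:
    """k-th Fourier coefficient of the phase-0 clock per (2.35); delay phase omitted."""
    return sinc(k * math.pi / M) / M

def fundamental_scale(M: int) -> float:
    """Factor (1/M) * sinc^2(pi/M) that scales Z_BB in (2.36)."""
    return sinc(math.pi / M) ** 2 / M

def z_in(delta_f: float, M: int, r_sw: float, r_bb: float, c_bb: float) -> complex:
    """Input impedance at an offset delta_f from the LO, following (2.36).

    The baseband load is a hypothetical parallel R_BB-C_BB network.
    """
    z_bb = 1.0 / (1.0 / r_bb + 1j * 2.0 * math.pi * delta_f * c_bb)
    return r_sw + fundamental_scale(M) * z_bb

if __name__ == "__main__":
    for M in (4, 8, 16):
        print(f"M={M:2d}  scale={fundamental_scale(M):.4f}")
    # Hypothetical values: 5-ohm switches, 400-ohm / 50-pF baseband load, M = 8.
    for df in (0.0, 1e6, 10e6, 100e6):
        z = z_in(df, M=8, r_sw=5.0, r_bb=400.0, c_bb=50e-12)
        print(f"offset={df:9.0f} Hz  |Z_in|={abs(z):6.2f} ohm")
```

With a low-pass baseband load, the printed magnitude falls from its in-band value towards $R_{sw}$ as the offset grows, which is the band-pass behavior described above.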
In our work, however, the explicit capacitor at the TIA's input is avoided for two reasons: 1) a moderate blocker swing is needed for Aux. LO generation (Fig. 2.20); 2) the receiver's noise degrades if heavy filtering is introduced at this node, as will be explained in Section 2.7.3. Next, we show that by optimizing the switch sizes and the filtering capacitors at the TIA's feedback and output, the mixer-first path can achieve comparable out-of-band linearity with much smaller capacitors, and a smaller die area as a result.

In the mixer-first receiver, there are 3 nodes that could potentially be saturated by a strong blocker: the RF input, the baseband TIA's input, and the TIA's output. In Fig. 2.21, if the antenna is modeled as an ideal voltage source with an impedance of $Z_S$, the voltage source has an amplitude of

$$V_{in}(\omega) = 2\sqrt{Re(Z_s)P(\omega)} \tag{2.37}$$

where $P(\omega)$ is the power at the specific frequency, e.g. that of the blocker or of the desired signal. To derive the voltages at the different nodes in Fig. 2.20, a linear time-variant (LTV) approach needs to be adopted because of the passive mixer's time-variant sampling property. In [34], the system is modeled as in Fig. 2.22, and the voltage at the virtual node $V_P$ is expressed as

$$V_P(m\omega_{LO} + \Delta\omega) = I_{N_{FOLD}}(m\omega_{LO} + \Delta\omega)Z_{P_{FOLD}}(m\omega_{LO} + \Delta\omega) \tag{2.38}$$

$$I_{N_{FOLD}}(m\omega_{LO} + \Delta\omega) = \sum_{g=-\infty}^{\infty} \frac{sw[m - gM]}{sw[m]} I_N((m - gM)\omega_{LO} + \Delta\omega) \tag{2.39}$$

$$Z_{P_{FOLD}}(m\omega_{LO} + \Delta\omega) = \{M|sw[m]|^{2}Z_{BB}(\Delta\omega)\} \parallel \prod_{g=-\infty}^{\infty} \left\{ \frac{|sw[m]|^{2}}{|sw[m-gM]|^{2}} Z_{N} ((m-gM)\omega_{LO} + \Delta\omega) \right\} \tag{2.40}$$

The operator $\prod$ here is defined as $\prod Z_i = \cdots \parallel Z_0 \parallel Z_1 \parallel Z_2 \parallel \cdots = \left(\sum Z_i^{-1}\right)^{-1}$, which is the parallel combination of the impedances $Z_i$, or equivalently the harmonic mean of $Z_i$ without the multiplication by the number of terms.
It is not the product operator commonly seen in other mathematical literature.

In equations (2.38)-(2.40), $I_N$ and $Z_N$ are the Norton equivalent current and impedance of the antenna, matching network, and passive mixer turn-on resistance. They are shown in Fig. 2.22. $I_N$ is related to the blocker power $P_b$ and can be expressed as

$$I_N(\omega_b) = \frac{2\sqrt{Re(Z_s)P_b}}{Z_N(\omega_b)} = \frac{2\sqrt{Re(Z_s)P_b}}{Z_s(\omega_b) + R_{sw}} \tag{2.41}$$

Figure 2.22: Mixer-first receiver – decomposed into LTI and LTV sections

So, with the knowledge of $V_P$, the voltage at node $V_{RF}$ can be found:

$$\begin{aligned}
I_{MX}(\omega) &= I_N(\omega) - \frac{V_P(\omega)}{Z_N(\omega)} \\
V_{RF}(\omega) &= V_P(\omega) + R_{SW}I_{MX}(\omega) = V_P(\omega) \frac{Z_S(\omega)}{Z_S(\omega) + R_{SW}} + R_{SW}I_N(\omega)
\end{aligned} \tag{2.42}$$

While $V_{RF}$ is straightforward to derive in (2.42), the voltage at the mixer's baseband output $V_{BBi}$ is more complicated. In the time domain, the current flowing into each baseband path is

$$i_{BBi}(t) = i_{MX}(t) \cdot Sw_i(t) \tag{2.43}$$

Applying the Fourier transform to (2.43), and substituting (2.42) and (2.38) into the result,

$$\begin{aligned}
I_{BBi}(\Delta\omega) &= \sum_{k=-\infty}^{\infty} sw_i[k]I_{MX}(k\omega_{LO} + \Delta\omega) \\
&= \sum_{k=-\infty}^{\infty} sw_i[k] \left\{ I_N(k\omega_{LO} + \Delta\omega) - \frac{I_{N_{FOLD}}(k\omega_{LO} + \Delta\omega)Z_{P_{FOLD}}(k\omega_{LO} + \Delta\omega)}{Z_N(k\omega_{LO} + \Delta\omega)} \right\}
\end{aligned} \tag{2.44}$$

The current $I_{N_{FOLD}}$ and impedance $Z_{P_{FOLD}}$ are defined in (2.39) and (2.40). This expression implies that the fundamental as well as the harmonics of $I_N$ are down-converted to baseband, and in turn generate the baseband voltages. Interestingly, if we focus on the RF current around the m-th harmonic, unlike in an LTI system, the voltage induced by this current at baseband depends not only on the RF impedance around the m-th harmonic ( $Z_N(m\omega_{LO} + \Delta\omega)$ ), but also on the impedance around all other harmonics.
This is because the voltage at the baseband input is sampled by the mixer again, and so also defines the voltage $V_P$. Substituting (2.39) into (2.44),

$$\begin{aligned}
I_{BBi}(\Delta\omega)
&= \sum_{k=-\infty}^{\infty} sw_{i}[k] \left\{ I_{N}(k\omega_{LO} + \Delta\omega) - \frac{\sum_{g=-\infty}^{\infty} \frac{sw[k+gM]}{sw[k]} I_{N}((k+gM)\omega_{LO} + \Delta\omega) Z_{P_{FOLD}}(k\omega_{LO} + \Delta\omega)}{Z_{N}(k\omega_{LO} + \Delta\omega)} \right\} \\
&= \sum_{k=-\infty}^{\infty} sw_{i}[k] I_{N}(k\omega_{LO} + \Delta\omega)
- \sum_{k=-\infty}^{\infty} sw_{i}[k] \frac{\sum_{g=-\infty}^{\infty} \frac{sw[k+gM]}{sw[k]} I_{N}((k+gM)\omega_{LO} + \Delta\omega) Z_{P_{FOLD}}(k\omega_{LO} + \Delta\omega)}{Z_{N}(k\omega_{LO} + \Delta\omega)} \\
&= \sum_{k=-\infty}^{\infty} sw_{i}[k] I_{N}(k\omega_{LO} + \Delta\omega)
- \sum_{k=-\infty}^{\infty} sw_{i}[k] I_{N}(k\omega_{LO} + \Delta\omega) \sum_{g=-\infty}^{\infty} \frac{Z_{P_{FOLD}}((k-gM)\omega_{LO} + \Delta\omega)}{Z_{N}((k-gM)\omega_{LO} + \Delta\omega)} \\
&= \sum_{k=-\infty}^{\infty} sw_{i}[k] I_{N}(k\omega_{LO} + \Delta\omega) \left(1 - \sum_{g=-\infty}^{\infty} \frac{Z_{P_{FOLD}}((k+gM)\omega_{LO} + \Delta\omega)}{Z_{N}((k+gM)\omega_{LO} + \Delta\omega)}\right)
\end{aligned} \tag{2.45}$$

In deriving (2.45), the summation is rearranged so that the RF current terms around the same harmonic are grouped together.
Substituting (2.40) into (2.45), we obtain

$$\begin{aligned}
I_{BBi}(\Delta\omega)
&= \sum_{k=-\infty}^{\infty} sw_{i}[k]I_{N}(k\omega_{LO} + \Delta\omega) \left(1 - \sum_{g=-\infty}^{\infty} \frac{Z_{P_{FOLD}}((k+gM)\omega_{LO} + \Delta\omega)}{Z_{N}((k+gM)\omega_{LO} + \Delta\omega)}\right) \\
&= \sum_{k=-\infty}^{\infty} sw_{i}[k]I_{N}(k\omega_{LO} + \Delta\omega) \left(1-\sum_{g=-\infty}^{\infty}|sw[k+gM]|^2\frac{MZ_{BB}(\Delta\omega)\parallel\prod_{h=-\infty}^{\infty}\left\{\frac{Z_{N}\big((k+gM-hM)\omega_{LO}+\Delta\omega\big)}{|sw[k+gM-hM]|^2}\right\}}{Z_{N}\big((k+gM)\omega_{LO}+\Delta\omega\big)}\right) \\
&= \sum_{k=-\infty}^{\infty} sw_{i}[k]I_{N}(k\omega_{LO} + \Delta\omega) \left( 1 - \sum_{g=-\infty}^{\infty} \frac{M|sw[k+gM]|^2}{Z_N((k+gM)\omega_{LO} + \Delta\omega)} \left\{ Z_{BB}(\Delta\omega) \parallel \prod_{h=-\infty}^{\infty} \frac{Z_N((k+hM)\omega_{LO} + \Delta\omega)}{M|sw[k+hM]|^2} \right\} \right)
\end{aligned} \tag{2.46}$$

Inspecting this expression, each element in the summation can be seen as a current dividing among a bank of parallel impedances. If we focus on the RF current around the *m*-th harmonic,

$$I_{BBi,m}(\Delta\omega) = sw_i[m]I_N(m\omega_{LO} + \Delta\omega) \times \left(1 - \sum_{g = -\infty}^{\infty} Z_{N,BB}[m,g]^{-1} \left\{ Z_{BB}(\Delta \omega) \parallel \prod_{h = -\infty}^{\infty} Z_{N,BB}[m,h] \right\} \right) \tag{2.47}$$

where

$$Z_{N,BB}[m,h] = \frac{Z_N((m+hM)\omega_{LO} + \Delta\omega)}{M|sw[m+hM]|^2} \tag{2.48}$$

Given (2.47), an LTI model can be made by mapping the RF current around the *m*-th harmonic to baseband. The model is shown in Fig. 2.23. This model also verifies the conclusion from [34]: as the number of mixer phases increases, the folding effects become less pronounced.
When *M* becomes very large, folding effects can be ignored:

$$\lim_{M \to \infty} I_{BBi,m}(\Delta \omega) = \frac{1}{M} I_N(m\omega_{LO} + \Delta \omega) \frac{Z_{BB}(\Delta \omega) \parallel M \cdot Z_N(m\omega_{LO} + \Delta \omega)}{Z_{BB}(\Delta \omega)} \tag{2.49}$$

Figure 2.23: Baseband's equivalent LTI model around the m-th harmonic. The model is driven by $I_{NFOLD} = sw_i[m]I_N(m\omega_{LO} + \Delta\omega)$ and loaded by $Z_{BB,i}(\omega)$ together with an infinite number of impedance folding terms $Z_{NFOLDh} = \frac{Z_N((m+hM)\omega_{LO} + \Delta\omega)}{M|sw[m+hM]|^2}$.

Therefore, when M is large, the baseband TIA can be modeled with the simple Norton equivalent circuit shown in Fig. 2.24. Under this assumption, all the impedances folded from other harmonics are much larger than the one folded from the LO's fundamental, $Z_{N,BB}[m,0]$ or $Z_{NFOLD0}$, so in Fig. 2.24 only this dominant impedance is shown and the other impedances are neglected.

Figure 2.24: Simplified LTI model, and the baseband TIA

Next, we analyze the baseband TIA circuit. The TIA is based on an operational transconductance amplifier (OTA) with resistive feedback (Fig. 2.24). To avoid gain compression, filtering capacitors can be placed at the OTA's input, feedback and output. The OTA's transconductance is $G_m$. Writing KCL at the output and input nodes of the circuit:

$$(V_{in} - V_{out}) \left( sC_F + \frac{1}{R_F} \right) = V_{out} \cdot sC_L + G_m \cdot V_{in}$$

$$I_{in} = (V_{in} - V_{out}) \left( sC_F + \frac{1}{R_F} \right) + V_{in} \cdot sC_{in}$$

Solving the equations above, we obtain the input impedance and the gain of the TIA:

$$Z_{BB}(s) = \left\{ \frac{s(C_F + C_L) + \frac{1}{R_F}}{sC_L + G_m} \cdot \frac{1}{sC_F + \frac{1}{R_F}} \right\} / / \frac{1}{sC_{in}} \tag{2.50}$$

$$V_{out}(s) = \frac{sC_F + \frac{1}{R_F} - G_m}{\left(s(C_F + C_L) + \frac{1}{R_F}\right) \frac{1}{Z_{NFOLD}} + (sC_L + G_m) \left(sC_F + \frac{1}{R_F}\right)} I_{NFOLD} \tag{2.51}$$

As expected, the low-frequency in-band impedance is $1/G_m$.
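The relations (2.50) and (2.51) can be checked numerically. The sketch below is a minimal illustration rather than the design flow used in this work: the function names and all component values are hypothetical, chosen only to exercise the formulas. It confirms that the input impedance tends to $1/G_m$ at low frequency, and it also evaluates the large-$C_F$ blocker-offset form of the input impedance, $\frac{C_F + C_L}{C_F} \cdot \frac{1}{j\Delta\omega_b C_L + G_m}$, which follows from (2.50) when $C_{in}$ is absent and $sC_F \gg 1/R_F$.

```python
import math

def z_bb(f, gm, r_f, c_f, c_l, c_in=0.0):
    """TIA input impedance per (2.50); c_in = 0 removes the input shunt capacitor."""
    s = 1j * 2.0 * math.pi * f
    y_f = s * c_f + 1.0 / r_f                      # feedback admittance
    z_core = (s * (c_f + c_l) + 1.0 / r_f) / ((s * c_l + gm) * y_f)
    # Parallel combination with 1/(s*C_in), done with admittances so f = 0 is safe.
    return 1.0 / (1.0 / z_core + s * c_in)

def transimpedance(f, gm, r_f, c_f, c_l, z_nfold):
    """V_out / I_NFOLD per (2.51), i.e. without an explicit C_in."""
    s = 1j * 2.0 * math.pi * f
    y_f = s * c_f + 1.0 / r_f
    den = (s * (c_f + c_l) + 1.0 / r_f) / z_nfold + (s * c_l + gm) * y_f
    return (y_f - gm) / den

def z_bb_blocker(f, gm, c_f, c_l):
    """Blocker-offset approximation of (2.50) for sC_F >> 1/R_F and no C_in."""
    s = 1j * 2.0 * math.pi * f
    return (c_f + c_l) / c_f / (s * c_l + gm)

if __name__ == "__main__":
    # Hypothetical values: Gm = 20 mS, R_F = 10 kohm, C_F = C_L = 20 pF.
    gm, r_f, c_f, c_l = 20e-3, 10e3, 20e-12, 20e-12
    print("Z_BB(DC)      =", abs(z_bb(0.0, gm, r_f, c_f, c_l)), "ohm (1/Gm =", 1.0 / gm, ")")
    print("Z_BB(100 MHz) =", abs(z_bb(100e6, gm, r_f, c_f, c_l)), "ohm")
    print("approximation =", abs(z_bb_blocker(100e6, gm, c_f, c_l)), "ohm")
    print("DC gain       =", transimpedance(0.0, gm, r_f, c_f, c_l, 1.0 / gm).real, "ohm")
```

Sweeping `c_f` and `c_l` in this sketch shows that both capacitors have to be large for the impedance at the blocker offset to drop, since at high offsets the expression tends to the series combination of the two capacitors.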
The in-band impedance $1/G_m$ is up-converted by the passive mixer and matches the antenna. As mentioned in [9]-[10], [22]-[24], $C_{in}$ dominates the impedance at the blocker frequency to filter the blocker swing at the TIA's input and output. In this work, an explicit $C_{in}$ is omitted to prevent filtering at the RF node. Without $C_{in}$, and assuming that $C_F$ dominates over $R_F$ and that the front-end is well matched ( $Z_{NFOLD} \approx 1/G_m$ ), (2.50) and (2.51) at the blocker offset can be written as:

$$Z_{BB}(j\Delta\omega_b) = \frac{C_F + C_L}{C_F} \cdot \frac{1}{j\Delta\omega_b C_L + G_m} \tag{2.52}$$

$$V_{out}(j\Delta\omega_b) \approx \frac{j\Delta\omega_b C_F - G_m}{j\Delta\omega_b (2C_F + C_L)G_m - \Delta\omega_b^2 C_L C_F} I_{NFOLD} \tag{2.53}$$

From (2.52), it can be seen that with large $C_L$ and $C_F$, the baseband impedance can also be made small, which attenuates the blocker swing at the TIA's input. The expression for the TIA's output (2.53) is a bit more complex, but it yields the same guideline: increasing $C_L$ and $C_F$ also suppresses the output blocker swing. Figure 2.25 shows the simulated blocker gain and noise figure of the mixer-first path (Fig. 2.21).

Figure 2.25: Simulated gain compression and blocker NF with varying $C_F$ and $C_L$

In the simulation, the output channel bandwidth is maintained at >1MHz. It can be seen that, without $C_{in}$ and with modest $C_F$ and $C_L$, the mixer-first path's gain compression and noise figure degradation with a 0dBm blocker are sufficiently low. Furthermore, the gain compression can be compensated by the back-end, and most of the noise figure increase can be cancelled by the thermal noise cancellation path when it is turned on. After cancellation, the overall noise figure with a 0dBm blocker is less than 5dB.

#### 2.6.2 Auxiliary Path

Unlike the mixer-first main path, the auxiliary path employs a low-noise trans-conductance at the front-end to suppress the noise from the mixers and baseband circuitry.
Its trans-conductance is typically large to achieve low noise on its own; therefore it is important to minimize its load impedance at the blocker frequency. The auxiliary path of the receiver (Fig. 2.20) is re-drawn in Fig. 2.26.

Figure 2.26: Phase and thermal noise cancelling auxiliary path impedance - linearity

Assuming the main path's noise can be perfectly cancelled, the auxiliary RF $G_m$ is the dominant source of the receiver's noise [9]-[10]. Since the input-referred noise of a CMOS $G_m$ is $\overline{V_n}^2 = 4kT \gamma/G_m$, its trans-conductance can be chosen from the desired receiver NF. In our design, it is sized to have 150mS of trans-conductance.

To achieve the blocker tolerance, 2-step N-path filtering is implemented. As shown in Fig. 2.26, the phase noise cancellation mixer (MX<sub>PNAUX</sub>) is a passive mixer driven by the auxiliary LO at around $2\Delta f_b$. At the phase noise cancellation TIA's input, large shunt capacitors, $C_{p\_pnc}$, are inserted to filter the blocker. Due to the reciprocity of the passive mixers (MX<sub>AUX</sub> and MX<sub>PNAUX</sub>), the phase noise cancellation auxiliary TIA's input impedance is up-converted twice: first to $2\Delta f_b$ by MX<sub>PNAUX</sub>, and then to RF by MX<sub>AUX</sub>. This effect is also illustrated in Fig. 2.26. Therefore, the PNC TIA's input impedance at $-\Delta f_b$, which is dominated by $C_{p\_pnc}$, appears at $\Delta f_b$ after the first up-conversion, and then appears at $f_{RF} + \Delta f_b = f_b$, where it loads the RF Gm's output. Equivalently, the blocker current generated by the RF Gm flows through MX<sub>AUX</sub> and MX<sub>PNAUX</sub>, and is finally filtered by $C_{p\_pnc}$.
The RF G<sub>m</sub>'s load impedance can be written as

$$Z_{RF}(\omega_b) = R_{sw,AUX} + \frac{1}{M} \operatorname{sinc}^2\left(\frac{\pi}{M}\right) \left(\frac{1}{j\Delta\omega_b C_{s\_pnc}} + R_{sw,PNAUX} + \operatorname{sinc}^2\left(\frac{\pi}{M}\right) Z_{TIA,PNC}(-\Delta\omega_b)\right) \tag{2.54}$$

By sizing the mixer switches large, this load impedance is kept at about $10\Omega$ at the blocker frequency, so the RF Gm does not compress in the presence of a 0dBm blocker.

Figure 2.27: Auxiliary path baseband impedances

Note that the capacitor $C_{s\_pnc}$ sets a high-pass corner of the up-converted impedance; it therefore has to be large so that the blocker current is passed to the filtering capacitor $C_{p\_pnc}$. As shown in later sections, this high-pass capacitor is important for achieving low noise.

### 2.7 Noise Analysis

This section presents the analysis of the receiver's noise performance. Three different types of noise are relevant in this design. The first type is the in-band thermal noise, which is the noise commonly considered in receiver design. The second type is the thermal noise folded from the image frequency through the frequency down-conversion in the phase noise cancellation. The third type is the noise originating from the RF LO's phase noise. Besides reciprocal mixing, the phase noise that appears at the RF node is also of interest in our multi-path architecture. It can be shown that the main path design largely determines the phase noise and its replica image at the auxiliary output, and in turn affects the system's performance.

#### 2.7.1 Receiver In-band Noise

The receiver in-band noise is defined as the noise at the same frequency as the desired signal. For example, in a direct conversion system where the signal is around $f_{LO}$, the antenna noise at $f_{LO}$ is called in-band noise; after down-conversion, the TIA's noise around DC is also called in-band noise, because the signal is now around DC.
Figure 2.28 shows all the in-band noise sources in the receiver. Similar to [9]-[10], because of the noise cancelling auxiliary path, all noise sources from the main path are nulled after cancellation. On the other hand, the noise sources from the auxiliary paths can be suppressed through careful design.

Figure 2.28: Receiver's in-band noise sources

Figure 2.29: Phase and thermal noise cancelling auxiliary path impedance - noise

Similar to Fig. 2.26, Fig. 2.29 shows the auxiliary path impedances, but with emphasis on the noise. In a current-mode circuit, to suppress the noise of a later stage, it is important to maximize the driving impedance at each node (a classic example is the cascode circuit). As explained in the last section, the passive mixer $MX_{PNAUX}$ up-converts the phase noise cancelling TIA's impedance to $2\Delta f_b$. This impedance must be much smaller than the impedance at $2\Delta f_b$ at node P. A large capacitor at the noise cancelling TIA's output creates a zero in its input impedance [20]-[21], [37]. At $2\Delta f_b$, the impedance is close to $R_{NC,AUX}$. In reality, the parasitic capacitance at the TIA's input sets the upper limit of this large impedance. For the noise cancelling TIA, its DC noise is in-band with the signal. At DC, the TIA is driven by the down-converted Gm output impedance, and loaded with the input impedance of the phase noise cancellation path, which is dominated by $C_{s\_pnc}$. At the RF $G_m$'s output, cross-coupled inverters are implemented to create a negative resistance and boost its output impedance [35].

Figure 2.30: Auxiliary path baseband impedances

As shown in Fig. 2.30, at $2\Delta f_b$ the phase noise cancelling TIA is driven by a large impedance that is 20 times larger than its own input impedance, which is large enough to suppress the TIA's noise.
#### 2.7.2 Folded Noise

Figure 2.31: Folded noise caused by phase noise cancellation

Besides the in-band noise, there is another noise source caused by the frequency down-conversion in phase noise cancellation. When the phase noise replica is shifted from $2\Delta f_b$ to DC, the antenna noise at $f_{LO}+2\Delta f_b$ is also shifted to the receive-band along with the replica. This effect is illustrated in Fig. 2.31. In fact, not only is the antenna noise at $f_{LO}+2\Delta f_b$ down-converted in-band; the noise from the main path and the RF Gm cell at $f_{LO}+2\Delta f_b$ is also folded and raises the noise floor. If we assume the main path can be modeled as a pure matching resistor, the receiver's noise figure is

$$NF_{FOLD} = \left(2 + 1 + 2 \frac{\overline{v_{Gm}^2}}{\overline{v_{Rs}^2}}\right) \frac{1}{\operatorname{sinc}^2(\pi/M)} \tag{2.55}$$

Therefore, if the RF $G_m$ is noiseless and M is large, the receiver's NF rises from 0 dB to 4.77 dB after phase noise cancellation. In reality, the NF of the RF $G_m$ is about 1.7 dB, and after phase noise cancellation the receiver's NF rises to 6.02 dB. Note that in the bracket in (2.55), the first term, 2, comes from the antenna's noise at $f_{LO}$ and $f_{LO}+2\Delta f_b$, and the second term, 1, comes from the main path's noise at $f_{LO}+2\Delta f_b$. The main path's noise at $f_{LO}$ is cancelled through noise cancellation. However, if the phase noise cancellation is implemented in a conventional receiver based on a single LNA or LNTA, there is no purely resistive matching element, and the folded NF can be reduced to

$$NF_{FOLD} = \left(2 + 2\frac{\overline{v_{Gm}^2}}{\overline{v_{Rs}^2}}\right) \frac{1}{\operatorname{sinc}^2(\pi/M)} \tag{2.56}$$

This is the fundamental limit on the minimum NF of a phase noise cancelling receiver.

One may ask: since the main path is not a pure resistor, why can't we take advantage of this and reduce the folded noise? This argument sounds legitimate and seems extremely attractive.
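Before turning to that argument, the figures quoted for (2.55) and (2.56) can be reproduced with a short numerical sketch. This is only an illustration under stated assumptions: the function name is hypothetical, the denominator of (2.56) is taken to be the same $\operatorname{sinc}^2(\pi/M)$ factor as in (2.55), the large-$M$ limit is represented by `M=None`, and the noise factor of the RF $G_m$ alone is assumed to be $1 + \overline{v_{Gm}^2}/\overline{v_{Rs}^2}$.

```python
import math

def sinc(x: float) -> float:
    """Unnormalized sinc, sin(x)/x, with sinc(0) = 1 (assumed convention)."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def nf_fold_db(gm_to_rs_ratio: float, M=None, resistive_main_path: bool = True) -> float:
    """Folded noise figure in dB.

    gm_to_rs_ratio      : ratio of the Gm noise power to the source noise power
    M                   : number of clock phases; None means the large-M limit
    resistive_main_path : True evaluates (2.55), False evaluates (2.56)
    """
    bracket = (3.0 if resistive_main_path else 2.0) + 2.0 * gm_to_rs_ratio
    scale = 1.0 if M is None else sinc(math.pi / M) ** 2
    return 10.0 * math.log10(bracket / scale)

if __name__ == "__main__":
    print("noiseless Gm, large M, (2.55):", round(nf_fold_db(0.0), 2), "dB")
    # A ratio of 0.5 corresponds to a stand-alone Gm NF of 10*log10(1.5) dB
    # under the assumption stated above.
    print("ratio 0.5,    large M, (2.55):", round(nf_fold_db(0.5), 2), "dB")
    print("noiseless Gm, large M, (2.56):", round(nf_fold_db(0.0, resistive_main_path=False), 2), "dB")
    print("ratio 0.5,    M = 8,   (2.55):", round(nf_fold_db(0.5, M=8), 2), "dB")
```

Under these assumptions, a noise ratio of 0.5 gives the 6.02 dB quoted above, while a stand-alone $G_m$ NF of exactly 1.7 dB would give a slightly lower value.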
In fact, because the main path is based on an N-path filter, the mixers can be sized large and aggressive filtering can be designed at baseband, so the noise at the image frequency $f_{LO}+2\Delta f_b$ can be heavily attenuated. However, as will be discussed next, this seemingly helpful approach to filtering the image noise inevitably increases the phase noise, which in fact makes the receiver's NF worse. It is therefore necessary to examine phase noise in an N-path filter and its implications for the phase noise cancelling receiver.

#### 2.7.3 Phase Noise in N-path Filters

Phase noise in N-path filters has been studied before [36]-[37]. Here we revisit its impact.

Figure 2.32: Phase noise in *N*-path filters

Figure 2.32 shows the *N*-path filter and its ideal switching waveforms $Sw_0$, $Sw_1$, ..., $Sw_{M-1}$. When the LO is accompanied by phase noise, the time-domain clock waveforms exhibit jitter. If we focus on the first arm of the *N*-path filter, its noisy LO is represented by $Sw_{0,r}(t)$. Due to the phase noise, its rising and falling edges are skewed with respect to the noiseless clock, $Sw_0(t)$. The error pulse train is $Sw_{0,e}(t) = Sw_{0,r}(t) - Sw_0(t)$. The input of the *N*-path filter is $I_{RF}(t)$. The resulting RF voltage contributed by this arm is

$$V_{RF,0}(t) = R_{SW}I_{RF}(t) + Sw_{0,r}(t)\{Z_{BB}(t) * [I_{RF}(t)Sw_{0,r}(t)]\} \tag{2.57}$$

Since the error clock $Sw_{0,e}(t)$ is composed of narrow pulses, (2.57) simplifies to

$$\begin{aligned}
V_{RF,0}(t) \approx{} & R_{SW}I_{RF}(t) + Sw_0(t)\{Z_{BB}(t) * [I_{RF}(t)Sw_0(t)]\} \\
& + Sw_{0,e}(t)\{Z_{BB}(t) * [I_{RF}(t)Sw_0(t)]\} \\
& + Sw_0(t)\{Z_{BB}(t) * [I_{RF}(t)Sw_{0,e}(t)]\}
\end{aligned} \tag{2.58}$$

where the error term with $Sw_{0,e}^{2}(t)$ is second-order and is ignored. The first two terms in (2.58) form the desired band-pass response, whereas the last two terms are errors that describe the reciprocal folding of the blocker into the desired signal frequency due to the phase noise.
If we focus on the phase noise at $f_b$ and the strong blocker at $f_b$, the physical meaning of the two error terms can be explained as follows. Both terms are caused by the passive mixer's reciprocity. The term $E_1(t) = Sw_{0,e}(t)\{Z_{BB}(t) * [I_{RF}(t)Sw_0(t)]\}$ can be viewed as an error in which the jitter modulates the turn-on period over which the baseband voltage appears at the RF side. The second term $E_2(t) = Sw_0(t)\{Z_{BB}(t) * [I_{RF}(t)Sw_{0,e}(t)]\}$ is the error in which the jitter modulates the turn-on period over which the RF current flows into the baseband impedance. Intuitively, if $Z_{BB}(t)$ has a low-pass response that heavily attenuates the component at $\Delta f_b$, the second term $E_2(t)$ would dominate over $E_1(t)$, and phase noise would appear at $f_{LO}$ at the RF node.

Next, we examine the general scenario in which $Z_{BB}(t)$ is an arbitrary impedance. Applying the Fourier transform to (2.57):

$$V_{RF,0}(\omega) = R_{SW}I_{RF}(\omega) + Sw_{0,r}(\omega) * \left\{ Z_{BB}(\omega) \cdot \left[ I_{RF}(\omega) * Sw_{0,r}(\omega) \right] \right\} \tag{2.59}$$

where $Z_{BB}(\omega)$ is the baseband impedance in the frequency domain, and $Sw_{0,r}(\omega)$ is the clock with phase noise. The convolution and multiplication in the second term of (2.59) can be depicted graphically (Fig. 2.33).

Figure 2.33: Graphic demonstration of phase noise in *N*-path filters

In Fig. 2.33, the higher-order harmonics of the clock pulse are neglected, so only the fundamental components around $f_{LO}$ are considered. For the first convolution, the LO's negative frequency components are considered because the output of interest is the product of the down-conversion (RF current being down-converted); for the second convolution, the LO's positive frequency components are considered because the output of interest is the product of the up-conversion (baseband voltage being up-converted). Two scenarios of the baseband impedance are illustrated in Fig. 2.33.
First, if $Z_{BB}(\omega)$ has a wide bandwidth ( $>\Delta f_b$ ), the resulting RF node voltage does not have any band-pass filter response, which means the blocker is not attenuated. More importantly, in this case, the phase noise does *not* appear at the RF node. This is because the phase noise sidebands around the carrier are symmetrical and therefore cancel each other after the second convolution. It is as if the RF node were loaded with a wideband resistor.

Second, if $Z_{BB}(\omega)$ has a much narrower bandwidth ( $\ll\Delta f_b$ ), the desired narrow-band band-pass filter response is realized. However, the phase noise appears at the RF node on top of the desired signal. This is due to the heavy attenuation of the blocker at baseband, and is consistent with the observation in [36].

Assuming the baseband impedance is a simple RC network (e.g. equations (2.50) and (2.52)), it can be expressed as

$$Z_{BB}(\omega) = \frac{1}{j\omega C_{BB} + 1/R_{BB}} \tag{2.60}$$

If the phase noise of the LO is modeled as a spur around the blocker offset, it can be expressed as in (2.6). The LO and the blocker RF current can be written as:

$$\begin{aligned}
V_{LO}(t) &= A_{LO} \left[ e^{-j(\omega_{LO}t + \theta_{LO})} - \frac{\varphi_s}{2} e^{-j((\omega_{LO} + \Delta\omega_s)t + \theta_{LO} + \theta_s - \pi/2)} - \frac{\varphi_s}{2} e^{-j((\omega_{LO} - \Delta\omega_s)t + \theta_{LO} - \theta_s - \pi/2)} \right] \\
I_{rf}(t) &= I_b e^{j\omega_b t}
\end{aligned} \tag{2.61}$$

As shown in Fig. 2.33, the RF node voltage has three components, around $f_{LO}$, $f_{LO} + \Delta f_b$ ( $f_b$ ), and $f_{LO} + 2\Delta f_b$.
Assuming M is large, it can be expressed as:

$$\begin{aligned}
V_{rf}(t) ={} & \frac{\varphi_s}{2} \frac{\left(Z_{BB}(0) - Z_{BB}(\Delta\omega_b)\right)}{M} I_b e^{j(\omega_{LO} + \Delta\omega_b - \Delta\omega_s)t} \\
& + \left(R_{sw} + \frac{Z_{BB}(\Delta\omega_b)}{M}\right) I_b e^{j(\omega_{LO} + \Delta\omega_b)t} \\
& + \frac{\varphi_s}{2} \frac{\left(Z_{BB}(\Delta\omega_b) - Z_{BB}(2\Delta\omega_b)\right)}{M} I_b e^{j(\omega_{LO} + \Delta\omega_b + \Delta\omega_s)t}
\end{aligned} \tag{2.62}$$

With the RF node voltage in the presence of LO phase noise known, we can analyze the entire receiver's noise performance, since the auxiliary path senses this voltage alone (Fig. 2.34).

Figure 2.34: Phase and thermal noise cancelling receiver model

First, we examine the in-band phase noise reciprocal mixing (RM) product at the receiver's output. The output phase noise consists of the noise from the 2 paths' outputs. From Fig. 2.33, it can be seen that the main path's output phase noise is

$$V_{RM,main}(t) = \frac{\varphi_s}{2} I_b e^{j(\Delta \omega_b - \Delta \omega_s)t} Z_{main} \tag{2.63}$$

where $Z_{main}$ is the main path's trans-impedance. On the other hand, the auxiliary path's output phase noise consists of two components: the phase noise at the RF node due to the main path's reciprocity, and the mixing of the blocker with the LO phase noise in the auxiliary path.

$$\begin{aligned}
V_{RM,aux}(t) &= \left\{ \frac{\varphi_s}{2} \frac{\left( Z_{BB}(0) - Z_{BB}(\Delta \omega_b) \right)}{M} I_b e^{j(\Delta \omega_b - \Delta \omega_s)t} + \frac{\varphi_s}{2} \left( R_{sw} + \frac{Z_{BB}(\Delta \omega_b)}{M} \right) I_b e^{j(\Delta \omega_b - \Delta \omega_s)t} \right\} G_m Z_{tn,aux} \\
&= \frac{\varphi_s}{2} \left( R_{sw} + \frac{Z_{BB}(0)}{M} \right) I_b e^{j(\Delta \omega_b - \Delta \omega_s)t} G_m Z_{tn,aux}
\end{aligned} \tag{2.64}$$

In the equation above, $G_m$ and $Z_{tn,aux}$ are the auxiliary path's RF trans-conductance and the thermal noise cancelling path's baseband trans-impedance, respectively.
When the receiver is well matched, $R_{sw} + \frac{Z_{BB}(0)}{M} = R_s$. Thus, (2.64) shows that, regardless of the main path's baseband impedance, the auxiliary path's output phase noise is constant, provided that the receiver is well matched to the antenna. When the thermal noise from the main path is fully cancelled by the auxiliary path, the following condition is met [9]-[10]:

$$R_s G_m Z_{tn,aux} = Z_{main} \tag{2.65}$$

Therefore, the phase noise RM products from the 2 paths are equal and add in-phase at the receiver's final output, regardless of whether the main path employs *N*-path filtering.

Next, we examine the phase noise image replica used to cancel this phase noise, which is also extracted in the auxiliary path. After the auxiliary path's down-conversion, the replica consists of two components: the replica from the RF node voltage and the one generated by the down-conversion. It can be expressed as:

$$\begin{aligned}
V_{RM,im}(t) &= \left\{ \frac{\varphi_s}{2} \frac{\left( Z_{BB}(\Delta \omega_b) - Z_{BB}(2\Delta \omega_b) \right)}{M} I_b e^{j(\Delta \omega_b + \Delta \omega_s)t} + \frac{\varphi_s}{2} \left( R_{sw} + \frac{Z_{BB}(\Delta \omega_b)}{M} \right) I_b e^{j(\Delta \omega_b + \Delta \omega_s)t} \right\} G_m Z_{pn,aux} \\
&= \frac{\varphi_s}{2} \left( R_{sw} + \frac{2Z_{BB}(\Delta \omega_b) - Z_{BB}(2\Delta \omega_b)}{M} \right) I_b e^{j(\Delta \omega_b + \Delta \omega_s)t} G_m Z_{pn,aux}
\end{aligned} \tag{2.66}$$

Therefore, to perfectly cancel the phase noise, $V_{RM,im}(t) = V_{RM,main}(t) + V_{RM,aux}(t)$, and using (2.65) we can derive the trans-impedance of the phase noise cancelling path:

$$Z_{pn,aux} = \frac{2\left(R_{sw} + \frac{Z_{BB}(0)}{M}\right)}{R_{sw} + \frac{2Z_{BB}(\Delta\omega_b) - Z_{BB}(2\Delta\omega_b)}{M}} Z_{tn,aux} \tag{2.67}$$

Inspecting (2.67), it follows that if the main path has a high-Q band-pass response $(R_{sw} \to 0, Z_{BB}(\Delta \omega_b) \ll Z_{BB}(0))$, the phase noise cancelling gain needs to be increased dramatically compared to the case in which
no aggressive filtering is implemented at the RF node. Although the in-band phase noise can always be cancelled by meeting (2.67), the increased gain also amplifies the folded noise at $f_{LO}+2\Delta f_b$ from the main path, the antenna and the RF $G_m$ cell.

With the gain of each path determined, we can conduct the noise analysis of the entire receiver, with a focus on the folded noise that was partly discussed in Section 2.7.2.

Figure 2.35: Phase and thermal noise cancelling receiver folded noise model

Fig. 2.35 shows the noise sources of the antenna, the switch resistance and the baseband impedance. Examining the noise at the image frequency $f_{LO}+2\Delta f_b$, and again assuming M is large, these sources appear at the RF node as:

$$\overline{V_{n,a,rf}^{2}(\omega_{LO} + 2\Delta\omega_{b})} = \overline{V_{n,a}^{2}(\omega_{LO} + 2\Delta\omega_{b})} \left(\frac{R_{sw} + \frac{Z_{BB}(2\Delta\omega_{b})}{M}}{R_{s} + R_{sw} + \frac{Z_{BB}(2\Delta\omega_{b})}{M}}\right)^{2} \tag{2.68}$$

$$\overline{V_{n,sw,rf}^2(\omega_{LO} + 2\Delta\omega_b)} = \overline{V_{n,sw}^2(\omega_{LO} + 2\Delta\omega_b)} \left(\frac{R_s}{R_s + R_{sw} + \frac{Z_{BB}(2\Delta\omega_b)}{M}}\right)^2 \tag{2.69}$$

$$\overline{V_{n,bb,rf}^{2}(\omega_{LO} + 2\Delta\omega_{b})} = M\overline{I_{n,bb}^{2}(2\Delta\omega_{b})} \left\{ \left[ (R_{s} + R_{sw}) / / \frac{Z_{BB}(2\Delta\omega_{b})}{M} \right] \frac{R_{s}}{R_{s} + R_{sw}} \right\}^{2} \tag{2.70}$$

where $\overline{V_{n,a}^2(\omega_{LO}+2\Delta\omega_b)}=4kTR_s$, $\overline{V_{n,sw}^2(\omega_{LO}+2\Delta\omega_b)}=4kTR_{sw}$, and $\overline{I_{n,bb}^2(2\Delta\omega_b)}=4kT/R_{BB}$. These noise voltages at the RF node are sensed by the phase noise cancelling path.
They appear at the receiver's output as:

$$\begin{aligned} \overline{V_{n,fold}^{2}} &= \left(\overline{V_{n,a,rf}^{2}(\omega_{LO} + 2\Delta\omega_{b})} + \overline{V_{n,sw,rf}^{2}(\omega_{LO} + 2\Delta\omega_{b})} + \overline{V_{n,bb,rf}^{2}(\omega_{LO} + 2\Delta\omega_{b})}\right) G_{m}^{2} Z_{pn,aux}^{2} \\ &= \Bigg\{ 4kTR_{s} \left(\frac{R_{sw} + \frac{Z_{BB}(2\Delta\omega_{b})}{M}}{R_{s} + R_{sw} + \frac{Z_{BB}(2\Delta\omega_{b})}{M}}\right)^{2} + 4kTR_{sw} \left(\frac{R_{s}}{R_{s} + R_{sw} + \frac{Z_{BB}(2\Delta\omega_{b})}{M}}\right)^{2} \\ &\qquad + M\frac{4kT}{R_{BB}} \left\{\left[(R_{s} + R_{sw}) \parallel \frac{Z_{BB}(2\Delta\omega_{b})}{M}\right] \frac{R_{s}}{R_{s} + R_{sw}}\right\}^{2} \Bigg\} \\ &\quad \times G_{m}^{2} \left\{\frac{2\left(R_{sw} + \frac{Z_{BB}(0)}{M}\right)}{R_{sw} + \frac{2Z_{BB}(\Delta\omega_{b}) - Z_{BB}(2\Delta\omega_{b})}{M}} Z_{tn,aux}\right\}^{2} \end{aligned} \tag{2.71}$$

Consider the case in which $Z_{BB}(\omega)$ has a very small bandwidth, so that at $\Delta\omega_b$ and $2\Delta\omega_b$ the impedance is dominated by the capacitor $C_{BB}$ (2.60). Furthermore, we assume that the switches are not impractically large. The following condition can then be applied to (2.71):

$$|Z_{BB}(\Delta\omega_b)| \approx \left|\frac{1}{j\Delta\omega_bC_{BB}}\right| \ll MR_{sw}, R_{BB}$$

Equation (2.71) can then be simplified as

$$\begin{aligned} \overline{V_{n,fold}^{2}} &\approx 16kT \left( R_{s} + \frac{R_{s}^{2}}{R_{sw}} + \frac{1}{R_{BB}} \frac{1}{(j2\Delta\omega_{b}C_{BB})^{2}} \frac{R_{s}^{2}}{R_{sw}^{2}} \right) \frac{R_{s}^{2}}{(R_{s} + R_{sw})^{2}} G_{m}^{2} Z_{tn,aux}^{2} \\ &\approx 16kT \left( R_{s} + \frac{R_{s}^{2}}{R_{sw}} + Z_{BB} (2\Delta\omega_{b}) \frac{BW_{BB}}{j2\Delta\omega_{b}} \frac{R_{s}^{2}}{R_{sw}^{2}} \right) \frac{R_{s}^{2}}{(R_{s} + R_{sw})^{2}} G_{m}^{2} Z_{tn,aux}^{2} \end{aligned} \tag{2.72}$$

where $BW_{BB}$ is the bandwidth of the baseband RC network. Because $BW_{BB} \ll 2\Delta\omega_b$, the first two terms in the bracket dominate over the third term and set the noise floor.
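The dependence of the dominant terms of (2.72) on $R_{sw}$ can be checked numerically. The following is a minimal sketch, not part of the original analysis: it keeps only the first two bracket terms of (2.72), normalizes out the common factor $16kT\,G_m^2 Z_{tn,aux}^2$, and uses an assumed $R_s = 50\Omega$ with illustrative values of $R_{sw}$. The function name is hypothetical.

```python
# Sketch: dominant folded-noise terms of (2.72), normalized to
# 16*k*T*(G_m*Z_tn_aux)^2 so that only the resistor dependence remains.
# R_s = 50 ohm and the swept R_sw values are illustrative assumptions.

def folded_noise_factor(r_sw, r_s=50.0):
    """Return (R_s + R_s^2/R_sw) * R_s^2 / (R_s + R_sw)^2 in ohms."""
    bracket = r_s + r_s ** 2 / r_sw  # first two bracket terms of (2.72)
    return bracket * r_s ** 2 / (r_s + r_sw) ** 2


if __name__ == "__main__":
    for r_sw in (5.0, 10.0, 20.0, 40.0):
        factor = folded_noise_factor(r_sw)
        print(f"R_sw = {r_sw:4.1f} ohm -> factor = {factor:7.1f} ohm")
```

Algebraically the factor reduces to $R_s^3 / \left(R_{sw}(R_s + R_{sw})\right)$, which falls monotonically as $R_{sw}$ grows. In the receiver, $R_{sw}$ is bounded above by the matching condition $R_{sw} + Z_{BB}(0)/M = R_s$.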
Equation (2.72) clearly shows that, when the in-band phase noise is perfectly cancelled, the folded noise at the receiver's output is minimized when the switch resistance is maximized.

Note that the noise analysis in (2.68)-(2.72) does not include the image noise from the RF $G_m$ cell. Unlike the noise from the main path and the antenna, the RF $G_m$'s folded noise does not interact with the main path's filtering, and depends only on the trans-impedance gain of the phase noise cancelling auxiliary path, $Z_{pn,aux}$. As $Z_{pn,aux}$ increases dramatically with a high-Q band-pass filter in the main path, the RF $G_m$'s folded noise increases accordingly.

This addresses the intuition that a "large" switch and a high-Q band-pass filter would filter the image noise at $f_{LO}+2\Delta f_b$ and could lead to a smaller NF than (2.26). Equation (2.72) proves this intuition wrong. The fundamental reason is that the total phase noise remains constant regardless of the filtering. Although the image noise voltage at the RF node is attenuated by the high-Q filter, the blocker swing at the RF node is attenuated as well, which leaves a smaller phase noise replica available for cancellation and in turn requires a greater gain in the phase noise cancelling path. This gain overwhelms the attenuation, so the filtering is undesirable. This is why no significant filtering is placed in the main path's baseband. It has been shown that the linearity requirement can still be met without the filtering (see Section 2.6.1).

### 2.8 Circuit Design

#### 2.8.1 Receiver Topology

The complete schematic of the proposed phase and thermal noise cancelling receiver, which was fabricated in 28nm CMOS, is shown in Fig. 2.36. The receiver is implemented in a differential fashion.
The series resistance of the main path passive mixer switches ($\approx 40\Omega$ single-ended) and the up-converted input impedance of the main-path TIAs provide a $50\Omega$ input impedance. These relatively small switches (large $R_{sw}$) are chosen to minimize the folded noise (see Section 2.7.3). In the auxiliary path, the passive mixer switches are sized large so that the out-of-band impedance is small ($\approx 15\Omega$), which ensures the RF $G_m$'s linearity (see Section 2.6.2).

Figure 2.36: The complete phase and thermal noise cancelling receiver

The auxiliary class-AB trans-conductance is sized to give $G_m$ = 150mS and uses minimum length devices (28nm). Cross-coupled inverters are inserted at its output to boost the output impedance and suppress the baseband noise (see Section 2.7.1). Both the RF $G_m$ and the negative resistance are programmable through digital control bits. The class-AB $G_m$ is self-biased with large resistors ($\approx 100\text{k}\Omega$). Through the self-biasing, its common-mode voltage is extracted and used to bias the entire receiver close to the mid-rail voltage. This ensures that the DC voltages across the mixers are equal, so no DC current flows through the mixer switches.

The outputs of the TIAs are weighted and summed with 24 (3 paths, 8 cells per path) separate 9-bit programmable $G_m$ cells. These cells can provide an arbitrary magnitude and phase shift between all 3 paths.

In summary, the entire receiver consists of capacitors, resistors, switches and inverters, with the exception of the auxiliary LO generation circuits.

#### 2.8.2 Auxiliary LO Generation & 8/8-Phase Mixer

The auxiliary LO generation has been shown to be important in the phase noise cancellation (see Section 2.4). After the high-pass capacitor, TIAs buffer the blocker and provide enough swing to drive the second-order generator. Due to the non-linearity of rectification, its product is also rich in other harmonics.
A simple harmonic re-combination follows to boost the $2^{nd}$ order harmonic and suppress the $3^{rd}$ and $5^{th}$ order ones. Afterwards, a 16-stage injection-locked ring oscillator is used to filter the auxiliary LO. Both the injection strength and the ring oscillator's delay cell are fully programmable by digital control bits, including the delay cell's R and load C. The circuit is shown in Fig. 2.37. Finally, AND gates are used to generate non-overlapping clocks for the reciprocal mixing image passive mixer, which is discussed next.

Figure 2.37: Schematic for auxiliary LO generation

To down-convert the phase noise replica around $2\Delta f_b$, an 8/8-phase mixer is implemented. The 8/8-phase mixer creates a complex input impedance [38]-[39], and rejects the phase noise skirts at the $3^{rd}$ and $5^{th}$ harmonics [20]-[21] (Fig. 2.38). The complex input impedance ensures that only the components at $f_{LO}+2\Delta f_b$ are admitted by the phase noise cancelling path, while the noise or interferers at $f_{LO}-2\Delta f_b$ are rejected. If the blocker is on the lower side of the desired signal (i.e. $f_{LO}-2\Delta f_b$), the 8/8-phase mixer needs to be re-arranged so that the lower side-band is admitted and the upper side-band is rejected. This mixer is also sized large to ensure the linearity of the auxiliary path (see Section 2.6.2).

Figure 2.38: 8/8-phase RM image passive mixer

#### 2.8.3 RF Multiphase Clock Generation

To truly prove the concept of phase noise cancelling and the idea of the receiver being *inductor-less*, a ring oscillator is integrated on-chip to generate the RF clock for the receiver. Like the rest of the receiver, the ring oscillator consists of digitally programmable inverter stages and capacitors. It employs 3 stages of minimum-channel-length inverters to oscillate at $4f_{LO}$.
Besides the extremely small area, the other advantage of the ring oscillator is its extremely wide tuning range, which is 3.6GHz to 12GHz in our prototype. This property is particularly desirable in a wideband receiver, as *LC*-based VCOs usually have a much narrower tuning range, and multiple VCOs are likely to be needed [13]-[14]<sup>3</sup>. The ring oscillator's Figure-of-Merit is about 162dB, about 25dB lower than that of a well-designed *LC*-VCO.

The ring oscillator is phase locked to a 52MHz reference clock by an off-chip frequency synthesizer (ADF4151 from Analog Devices). In a product, the synthesizer can also be integrated. In our prototype, the VCO is the dominant source of the far-out phase noise, so an external synthesizer is used for simplicity.

To facilitate testing and characterization, the internal ring oscillator can also be bypassed by an external signal source. This allows us to use an LO source with a higher phase noise than the internal ring oscillator, so that the phase noise dominates the receiver noise floor. Hence, the receiver's phase noise rejection can be fully characterized.

Figure 2.39: RF and non-overlapping clock generations

<sup>3</sup> Efforts have also been made to address this issue, with the notable examples of [40]-[41].

With the $4f_{LO}$ source, a shift-register (Johnson) frequency divider is employed [9]-[10]. Compared to the AND-gate based non-overlapping clock generator, the shift register's multiphase outputs have correlated noise. The correlated phase noise can be cancelled by the phase noise cancellation, whereas the uncorrelated noise cannot.

#### 2.8.4 Baseband TIAs

The baseband TIAs are built around inverter-based Op-Amps. The feedback networks have been described in Sections 2.6 and 2.7 for different noise and linearity reasons. The inverter-based differential amplifiers are similar to the ones used in [9]-[10] (Fig. 2.40). Thick-oxide long-channel devices are used so that the flicker noise is minimized.
Because the transistors are biased in the sub-threshold region under the core devices' supply voltage (1V), their $g_m/I_d$ ratio is high and the channel thermal noise is minimized as well.

Figure 2.40: TIA Op-Amp schematic

### 2.9 Measurement Results

The die micrograph is shown in Fig. 2.41. The design occupies an active area of 1.85mm<sup>2</sup> after excluding excess re-configurable capacitors which are not used in the measurement (actual chip area: 2.7mm<sup>2</sup>). The entire receiver, including the LO generation, does *not* have on-chip inductors.

Figure 2.41: Die-micrograph of the phase and thermal noise cancelling receiver

#### 2.9.1 Noise Figure

The small signal noise figure is swept across the RF frequency. In this measurement, the phase noise cancellation is turned off because no phase noise is mixed in-band without a blocker. The receiver works in the same way as in [9]-[10]. A similar sub-2.5dB small signal noise figure is observed in this prototype across the bandwidth of 200-2800MHz (Fig. 2.42).

Figure 2.42: Small signal noise figure measurement

#### 2.9.2 Input Matching

To reduce the Bill-of-Materials (BoM) on the PCB, no external matching networks are used. The $S_{11}$ is measured with the receiver fully on. It is about -10dB across the receiver's passband, with a worst case of -9.3dB (Fig. 2.43).

Figure 2.43: Measured $S_{11}$

#### 2.9.3 RF LO Generation

The phase noise of the RF ring oscillator is measured (Fig. 2.44). Its free-running frequency is about 8GHz. The phase noise is measured after the divide-by-4 at $f_{LO}$. At the blocker offset (80MHz), the far-out phase noise is dominated by the VCO at -141.4dBc/Hz, with the VCO consuming 6mW.

Figure 2.44: Measured RF LO phase noise at 2GHz

#### 2.9.4 Blocker Noise Figure – CW Blocker

A CW blocker is first injected along with the small wanted signal. To fully characterize the receiver's performance, three measurements are conducted (Fig. 2.45).
First, the internal ring oscillator is bypassed, and the receiver is driven by a clean external LO with minimum phase noise. The receiver's 0-dBm blocker noise figure is about 6dB. Afterwards, the internal ring oscillator is switched on. With the -141.4dBc/Hz phase noise, the 0-dBm blocker noise figure is 32.4dB, consistent with equation (2.1). Finally, the phase noise cancellation circuitry is turned on to cancel the ring oscillator's phase noise. In the presence of the 0-dBm CW blocker, the noise figure is reduced to 13.5dB, a 19dB NF reduction. Therefore, with a low-power ring oscillator, the receiver meets the PCS (1950MHz) 3GPP specification.

Figure 2.45: Measured CW blocker NF

As expected, when the phase noise cancellation is turned on, the receiver's small signal noise figure increases to about 6dB. This is caused by the noise folding (see Section 2.7.2). However, when the blocker power is low, the phase noise cancellation path can be turned off, so that a 2-dB noise figure can be achieved (Fig. 2.42).

#### 2.9.5 Blocker Noise Figure – Modulated Blocker

Figure 2.46: Phase noise cancellation with AM/PM blockers

First, the NF with AM/PM blockers is measured. To fully characterize the phase noise cancellation, the receiver is driven by a noisy external LO source, so that the phase noise dominates the receiver's output noise. Due to equipment limitations, the LO is tuned to 1.5GHz and its phase noise at 80MHz offset is -132dBc/Hz. As shown in Fig. 2.46, in the presence of AM blockers, the reciprocal mixing caused by phase noise can be cancelled even with a large modulation index. The ~1dB drop-off is likely caused by the AM/PM distortion in the auxiliary LO generation. When PM blockers are applied, the phase noise cancellation degrades much more quickly, due to the PM distortion in the injection-locked ring oscillator.
As the modulation rate and modulation index are increased to 3MHz and 1.7rad, respectively, a significant phase noise cancellation of about 12dB is still achieved.

Figure 2.47: Measured WCDMA blocker NF

Finally, a 5MHz bandwidth WCDMA blocker is applied to the receiver input. When the WCDMA blocker is present, its AM component is down-converted by the receiver's second-order non-linearity and raises the noise floor. Since IIP2 improvement is not the focus of this work, and IIP2 is relatively easy to calibrate in hardware [43], its effect is calibrated out in our measurement by subtracting the noise measured when no phase noise is present. At -7dBm blocker power, the receiver's noise figure is reduced by 12dB to about 15dB (Fig. 2.47). Note that a modulated blocker almost always comes from the receiver's own transmitter or a transmitter within the same handset, and the required blocker power tolerance is much lower than for CW blockers. The modulated blocker power is at most -10dBm in most cases.

#### 2.9.6 Phase Noise Cancellation with Multiple Blockers

So far, we have considered and measured the phase noise cancelling receiver with a single strong blocker. However, in a real hostile wireless environment, the receiver could operate under multiple blockers at the same time. It is important that the proposed phase noise cancellation can cope with this situation as well. This case is studied in this sub-section.

Figure 2.48: Blockers on opposite side-bands

First, we consider the case in which the blockers are on opposite side-bands, i.e. one on the upper side-band and the other on the lower side-band. In this case, the 8/8-phase phase noise cancellation mixer only admits the product from the desired side-band and rejects the other, because of its complex frequency response. Therefore, two parallel phase noise cancellation paths can be implemented (Fig. 2.48). As long as the blockers are not exactly on the mirrored frequencies, i.e.
$f_0\pm\Delta f_b$, the complex mixer's rejection is sufficient to maintain the receiver's performance.

Figure 2.49: Blockers on the same side-band

When the multiple blockers are on the same side-band of the signal (Fig. 2.49), the proposed system treats them as one large blocker with a more complicated, but unified, PM. This allows the receiver to work with multiple blockers without modification, provided that the auxiliary LO generation has sufficient bandwidth, as the new "large" blocker has a wider bandwidth. Figure 2.50 shows the system simulation with two LTE blockers. It can be seen that the phase noise can still be perfectly cancelled.

Figure 2.50: Phase noise cancellation with two LTE blockers

This is also verified through measurement. Two CW blockers with equal power are injected into the receiver, and their spacing is swept. The injection-locked oscillator is tuned to the middle of the two tones, and can be properly locked. Figure 2.51 shows the measured phase noise rejection versus the swept spacing. A rejection of 10dB can be maintained up to 30MHz spacing.

Figure 2.51: Measured phase noise cancellation with two CW tones

#### 2.9.7 Comparison with Prior Arts

As shown in the CW blocker NF measurement, the receiver's NF is reduced from 32.4dB to 13.5dB. After adding the auxiliary LO's power consumption, 11mW, to the VCO power, 6mW, the effective FOM improves from 163.1dB to 181.5dB, which is comparable to that of an *LC* counterpart.

Table 2.1 compares the proposed phase and thermal noise cancelling receiver with other recently published SDR receivers. Our design achieves a 2dB small-signal noise figure, and cancels the phase noise reciprocal mixing regardless of the blocker's modulation.
Compared with the SDR proposed in [13]-[14], which is the only other published SDR that has on-chip LO generation (*LC* VCOs), our approach achieves better small signal and blocker noise figures, and consumes less power in LO generation without using any on-chip inductors. Furthermore, as the entire receiver consists of only transistors, resistors and capacitors, it is fully scalable with process, whereas an *LC* VCO-based design is not.

| | Fabiano et al.<br>ISSCC'13 [44] | Borremans et al.<br>JSSC'11 [13]-[14] | Murphy et al.<br>JSSC'12 [9]-[10] | Mikhemar et al.<br>JSSC'13 [20]-[21] | This Work [45] |
|---|---|---|---|---|---|
| Topology Description | No Cancellation<br>External LO | No Cancellation<br>Integrated LO (LC) | Thermal Noise Cancellation<br>External LO | Phase Noise Cancellation<br>External LO | Thermal and Phase Noise Cancellation<br>Integrated LO (Ring) |
| CMOS Technology | 40nm | 40nm | 40nm | 40nm | 28nm |
| RX Frequency [MHz] | 1800-2400 | 400-6000 | 80-2700 | 2000-3000 | 200-3000 |
| RF Input | Single-Ended | Differential | Single-Ended | Differential | Differential |
| Gain [dB] | 45.5 | 70 | 72 | N/A | 60 |
| NF @ 2GHz [dB] | 3.8 | 3.2 | 1.9 | 5 | 1.8 |
| 0dBm OB-Blocker NF [dB]<br>(w/o phase noise) | 7.9<br>(Δf=20MHz) | 15<br>(Δf=20MHz) | 4.1<br>(Δf=80MHz) | N/A | 5.5<br>(Δf=80MHz) |
| 0dBm OB-Blocker NF [dB]<br>(-141dBc/Hz @ blocker offset) | ≈32<sup>\*</sup> (estimated) | | | 25<sup>\*</sup> (PNC) | 13.5/32 (PNC on/off) |
| RF Power [mW]<br>BB Power [mW] | 16.2<sup>\*</sup><br>7.2<sup>\*</sup> | 55 | 12<br>20 | 20<br>30 | 12<br>23 |
| LOGEN Power [mW] | N/A | 30-40 | N/A | N/A | 5-15 |
| PN Cancellation Power [mW] | 0 | | | 12 | 11 |
| LOGEN FOM Improvement [dB] | 0 | | | 18 | 18.4 |
| Supply Voltage [V] | 1.2/1.8 | 1.1/2.5 | 1.3 | 1.5 | 1.0 |
| OB-P1dB [dBm] | -1 | -8 | -2 | N/A | -2.5 |
| OB-IIP3 [dBm] | +18 | +10 | +13.5 | N/A | ≈13.5 |
| OB-IIP2 [dBm] | +64 | +70<sup>\*\*</sup> | +54 | N/A | ≈50 |
| Active Area [mm<sup>2</sup>] | 0.84<sup>†</sup> | 2 | 1.2<sup>†</sup> | 1.4<sup>†</sup> | 1.85<sup>††</sup> |

<sup>\*</sup> Estimated and/or interpreted from plots, figures and/or reported numbers
<sup>\*\*</sup> With calibration
<sup>†</sup> Off-chip LOGEN
<sup>††</sup> Excluding excess re-configurable capacitors which are not used in the measurement. The actual chip area is 2.7mm<sup>2</sup>.

Table 2.1: Comparison with other blocker-tolerant or phase noise cancelling receivers

### 2.10 Digitally Assisted Phase Noise Cancellation

#### 2.10.1 Digital Calibration of ILRO's Phase Tracking Error

As mentioned in Section 2.4.3, and verified through measurement, the ILRO introduces a phase tracking error to the auxiliary LO, which degrades the phase noise cancellation with a modulated blocker. The distortion is analytically derived and expressed in (2.28) and (2.29), which are repeated below.

$$I_{out}(t) = I_{osc} e^{j(\omega_{inj}t + \alpha(t))} \approx I_{osc} e^{j(\omega_{inj}t + \theta(t - \tau))} \tag{2.28}$$

$$\frac{1}{\tau} = A = \frac{I_{inj}}{I_{osc}} \frac{(RC\omega_0)^2 + 1}{RC} = \frac{I_{inj}}{I_{osc}} \cdot \omega_0 \cdot \frac{\tan^2(\pi/N) + 1}{\tan(\pi/N)} \tag{2.29}$$

The distortion can be simplified as a constant delay in the injection signal's PM, which is the auxiliary LO's PM in this receiver. The delay is a function of the injection strength, the oscillation strength, the free-running frequency, and the component values (R and C) of the ring oscillator's delay cell. All these parameters are within our design space and can be pre-determined. In other words, the delay $\tau$ can be obtained and the distortion can be calibrated in the digital domain.
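As a rough illustration of how $\tau$ could be pre-computed from design parameters, the sketch below evaluates both forms of (2.29). The function names, the injection ratio of 0.2, and the 160MHz free-running frequency (taken here as $2\Delta f_b$ for an 80MHz blocker offset) are assumptions made for the example; only the formula and the 16-stage count come from the text.

```python
import math


def tracking_delay(f0_hz, n_stages, inj_ratio):
    """tau from the N-stage form of (2.29).

    1/tau = (I_inj/I_osc) * w0 * (tan^2(pi/N) + 1) / tan(pi/N)
    """
    w0 = 2.0 * math.pi * f0_hz
    t = math.tan(math.pi / n_stages)
    return 1.0 / (inj_ratio * w0 * (t * t + 1.0) / t)


def tracking_delay_rc(r_ohm, c_farad, f0_hz, inj_ratio):
    """tau from the delay-cell form of (2.29).

    1/tau = (I_inj/I_osc) * ((R*C*w0)^2 + 1) / (R*C)
    """
    w0 = 2.0 * math.pi * f0_hz
    rc = r_ohm * c_farad
    return 1.0 / (inj_ratio * ((rc * w0) ** 2 + 1.0) / rc)


if __name__ == "__main__":
    # Assumed example values: 16-stage ILRO, 160 MHz, I_inj/I_osc = 0.2.
    tau = tracking_delay(160e6, 16, 0.2)
    print(f"tau = {tau * 1e9:.3f} ns")
```

The two forms agree when the delay cell satisfies $RC\omega_0 = \tan(\pi/N)$, which is the relation implied by the equality in (2.29). Doubling the injection ratio halves $\tau$.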
Substituting the distorted auxiliary LO (2.28) into equation (2.13), the real signals at the outputs of the main path and the phase noise cancellation path are:

$$V_{RM,Main} = A_b(t) \sin[(\Delta \omega_b - \Delta \omega_s)t + \theta_b(t)] \tag{2.73}$$

$$\begin{aligned} V_{RM,Aux} &= A_b(t) \sin[(\Delta \omega_b - \Delta \omega_s)t + \theta_b(t) + 2(\theta_b(t - \tau) - \theta_b(t))] \\ &\approx A_b(t) \sin[(\Delta \omega_b - \Delta \omega_s)t + \theta_b(t)] \\ &\quad + 2A_b(t) \cos[(\Delta \omega_b - \Delta \omega_s)t + \theta_b(t)] \cdot [\theta_b(t - \tau) - \theta_b(t)] \end{aligned} \tag{2.74}$$

In (2.74), the approximation is valid when $\tau$ is very small. Comparing the RM product at the main path's output (2.73) with the replica (2.74), the cancellation error is the second term of (2.74). The error is the product of a 90°-shifted RM product and a differentiated blocker PM component. The system to cancel this error is shown in Fig. 2.52.

Figure 2.52: System for digitally calibrating ILRO's phase tracking error

Note that calibrating the phase tracking error requires the blocker's PM. Fortunately, this information is available most of the time, as the modulated blocker often comes from the receiver's own transmitter (FDD system) or a transmitter within the same handset (co-existence). Therefore, extracting the blocker's PM in the digital domain is fairly trivial, and all the calibration can be conducted in the digital domain after ADC sampling.

#### 2.10.2 Digitally Assisted Phase Noise Cancellation without ILRO

The phase noise cancellation system discussed so far has been based on the assumption that the receiver has no knowledge of the blocker other than what enters the chip at its RF input. The auxiliary LO generation shown in Fig. 2.37 extracts the LO from the received blocker. However, the injection locking introduces a certain phase distortion during the extraction, which degrades the phase noise cancellation when the blocker is modulated.
On the other hand, modulated blockers almost always originate from the receiver's own transmitter (FDD system) or another transmitter within the same handset (co-existence). Therefore, the receiver does have access to the blocker's information, e.g. its PM and offset frequency. We can take advantage of this information and design an alternative system without injection locking, as shown in Fig. 2.53.

Figure 2.53: Digitally assisted phase noise cancellation without ILRO

In this system, no injection locking is required. Instead, the phase noise replica is down-converted by a CW tone at $2\Delta f_b$. The PM of the ideal auxiliary LO is applied in the digital domain. In this arrangement, the bandwidth of the LPF, $BW_{LPF}$, has to satisfy:

$$BW_{LPF} > 2(BW_{blocker} + BW_{signal}) \tag{2.75}$$

where $BW_{blocker}$ and $BW_{signal}$ are the bandwidths of the blocker and the signal, respectively. The resulting phase noise cancellation performance is plotted in Fig. 2.54.

Figure 2.54: Digitally assisted phase noise cancellation

### 2.11 Conclusion

A new wideband, highly-linear phase and thermal noise cancelling receiver is reported. It cancels the reciprocal mixing caused by a blocker and LO phase noise, relaxes the phase noise/LOGEN power trade-off inherent in wideband receivers, and tolerates strong blockers (up to 0dBm) without compromising the NF. The fabricated prototype, suitable for SDR, integrates an on-chip ring oscillator as the LO source and achieves sub-14dB NF under a 0dBm CW blocker and a -10dBm 5MHz WCDMA blocker. It is the first *inductor-less* receiver reported with state-of-the-art receiver performance.

# **CHAPTER 3**

# A Wideband, Low-Noise Current-Mode mm-Wave Receiver

### 3.1 Introduction

Figure 3.1: (a) A traditional multi-stage voltage-mode 60GHz LNA [48]-[49]; (b) A 60GHz receiving front-end frequency response and blocker scenario.
Multi-Gb/s wireless communications that leverage the vastly available mm-wave spectrum (60GHz and above) have drawn increasing attention in recent years, yet receiver performance has suffered from limited CMOS device performance, especially when wide bandwidth and high sensitivity/linearity are simultaneously needed for high-order modulated Gb/s digital communications. Recently reported 60GHz receivers either achieved a low noise figure (NF) but lacked bandwidth coverage (<5% fractional bandwidth) [46], or covered a wide bandwidth (>10% fractional bandwidth) but achieved only a moderate NF (>5.5dB) [47]-[49]. Most reported works relied heavily on the cumulative voltage gain of multiple low noise amplifier (LNA) stages, which greatly limits the receiver's linearity and bandwidth. In addition, an out-of-channel but in-band blocker may cause issues for broadband reception, because it can desensitize the front-end and corrupt the overall receiver performance.

In this work, we present a non-traditional current-mode mm-wave receiver architecture to address the aforementioned issues [50]. The prototype for 60GHz applications is realized in 65nm CMOS. However, the same architecture is generally applicable to broadband receiver designs at any other mm-wave frequency. The prototype is a direct-conversion receiver featuring a trans-conductance ($G_m$) front-end with a Frequency-staggered Series Resonance Common Source (*FSRCS*) stage to overcome the bandwidth, noise, and linearity trade-offs that have plagued conventional receiver design. The prototype receiver is also integrated with an on-chip 60GHz quadrature voltage-controlled oscillator (QVCO) as the local oscillator (LO) for ease of testing. The resulting prototype was optimized to achieve NF = 3.8dB, bandwidth = 7.5GHz, and P1dBout = +1dBm, and concurrently demonstrates an effective out-of-channel blocker tolerance level (marked at the 1-dB gain compression point) of up to -9dBm at an offset frequency 3.5GHz away.
In this chapter, Section 3.2 presents the proposed receiver architecture and the details of each sub-block. Sections 3.3 and 3.4 present the receiver prototype's measured performance and summarize the overall work.

### 3.2 Receiver Architecture

Figure 3.2 shows the receiver architecture, comprising a low noise $G_m$ front-end tuned to 60GHz, I/Q passive mixers, and baseband trans-impedance amplifiers (TIAs). The $G_m$ cell provides input matching and voltage-to-current conversion. The current flows into the mixers and is then converted back to voltage by the baseband TIAs. The TIA employs a wideband Op-Amp with resistive feedback and has a low input impedance, which suppresses the voltage gain at the TIA's input. Since the voltage gain is held at a minimum level through the entire receiver chain until the TIA output, the linearity is much improved, with a higher tolerance to out-of-band blockers [9]-[10]. Passive mixers are also employed to achieve better linearity. The front-end $G_m$ cell determines the RF bandwidth and noise performance of the receiver.

Figure 3.2: Proposed current-mode broadband mm-wave receiver with integrated VCO

#### 3.2.1 FSRCS $G_m$ Front-end

Figure 3.3: (a) Series resonance tank and its passive amplification; (b) the series resonance tank in a common source stage

As shown in Fig. 3.3, a series resonance tank has the well-known property of providing passive voltage gain around resonance, and has been used before in receiver front-ends to suppress noise at RF frequencies [51]. Its usefulness at mm-wave frequencies has also been demonstrated by our group in the past [46]. However, its limited bandwidth has made it unsuitable for wideband mm-wave designs. For example, if an on-chip transformer tuned to 60GHz has a Q of 15, the series resonance tank provides a passive gain of > 20dB, which helps to suppress the noise from the active devices. However, its 3-dB bandwidth is limited to about 4GHz.
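The numbers in this example follow from the textbook relations for an ideal series RLC tank, in which the voltage across the capacitor at resonance is $Q$ times the source voltage and the 3-dB bandwidth is $f_0/Q$. The short sketch below simply evaluates those two relations. The function names are illustrative, and an ideal tank with no extra loading is assumed.

```python
import math


def series_tank_gain_db(q):
    """Passive voltage gain at resonance of an ideal series RLC tank, in dB."""
    return 20.0 * math.log10(q)


def series_tank_bandwidth_hz(f0_hz, q):
    """3-dB bandwidth of the resonance, f0 / Q."""
    return f0_hz / q


if __name__ == "__main__":
    f0, q = 60e9, 15.0  # values quoted in the text
    print(f"passive gain  = {series_tank_gain_db(q):.1f} dB")
    print(f"3-dB bandwidth = {series_tank_bandwidth_hz(f0, q) / 1e9:.1f} GHz")
```

With Q = 15 the gain evaluates to roughly 23.5dB and the bandwidth to 4GHz, in line with the figures quoted above.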
Figure 3.4: Frequency-staggered Series Resonance Common Source (*FSRCS*) and its frequency response

To tackle this bandwidth limitation while still benefiting from the passive voltage gain, we propose a Frequency-staggered Series Resonance Common Source (*FSRCS*) structure that accomplishes both objectives simultaneously. As exemplified in Fig. 3.4, a parallel path can be added to provide a staggered resonance. The two paths' currents are then summed at the transistors' drain nodes, so that the overall $G_m$ response achieves a wider bandwidth. Note that the simulation in Fig. 3.4 assumes an ideal zero load impedance. As will be shown later, the actual load in the mm-wave front-end can affect this $G_m$'s frequency response significantly.

Although the 50-$\Omega$ source resistance may lower the Q of the *FSRCS* transformers, their effective loaded Q can still be kept high because the source resistance is effectively transformed up by the transformers' 1:3 turns ratio. The second resonance in *FSRCS* contributes only insignificant noise ($\sim$0.1dB) based on simulations, due to each resonator's off-peak filtering effect. When the front-end is matched to $50\Omega$, its associated noise is the power loss of the transformers, plus the input devices' noise suppressed by the transformers' passive voltage gain. Across the bandwidth, although the signal may experience some filtering from one of the LC tanks, the voltage gain is still sufficiently large to suppress the active devices' noise, resulting in a small noise degradation ($\sim$0.1dB). The output mm-wave signals are then combined at the differential drain nodes in the current domain, with an extended 7GHz bandwidth.
To further reduce the noise, minimize the LO-RF feed-through and improve the front-end stability, M5/M6 cascode devices with differential inductors (between M1/M3 and M5 and between M2/M4 and M6) and push/pull capacitors (from the M5 source to the M6 gate and from the M6 source to the M5 gate) are inserted [52]. The current-mode front-end can therefore be built using a single *FSRCS* stage, in contrast to conventional multi-stage LNAs. Compared with a multi-stage LNA with staggered resonance, it also allows the receiver to reduce the explicit voltage gain in the front-end, and improves the out-of-band linearity significantly. Figure 3.5 shows the proposed schematic of the low noise $G_m$ front-end. It does not yet include the neutralization capacitors, which are discussed next.

Figure 3.5: Schematic of low noise $G_m$ Front-end with FSRCS

#### 3.2.2 FSRCS Load Effect and $C_{gd}$ Neutralization

While *FSRCS* can be used to simultaneously extend the RF bandwidth and achieve low noise, its finite load and the active devices' parasitics must be fully analyzed to ensure proper performance. For this purpose, we first assess its load, a $\pi$-network with common-gate cascode devices.

Figure 3.6: (a) Cascode with inter-stage inductors; (b) single-ended equivalent model

As suggested by [52], the inter-stage inductor $L_s$ (shown in Fig. 3.6) may resonate with the parasitic capacitors of the common-source (C-S) and common-gate (C-G) devices, and consequently suppress the noise of the cascode devices. With $L_s$, looking into the cascode node, $Z_{SX}$ is derived as

$$Z_{SX}(s) = \frac{s^2 L_s C_1 + 1}{s(C_1 + C_2 - \omega^2 L_s C_1 C_2)} \tag{3.1}$$

where the *FSRCS* output impedance is assumed to be infinite and $L_s$ is ideal. When $Z_{SX}(s)$ is large, which implies that the denominator of Eq. (3.1) approaches zero, i.e. $L_s$ resonates with the series combination of $C_1$ and $C_2$ ($\omega^2 L_s C_1 C_2 = C_1 + C_2$), the C-G devices' noise currents are forced to circulate through the devices themselves and do not appear at the output.
However, while the series inductor effectively improves the noise performance of a cascode amplifier, its effect on the load $(Z_{LX})$ seen by the preceding trans-conductance stage has not been fully assessed in the past. Without proper design, we find that it reduces the FSRCS' bandwidth (Fig. 3.9(a)) by degenerating the intended staggered resonances into a single major resonance. The underlying physics can be understood through a careful analysis of $Z_{LX}$ and of the FSRCS' frequency response under the influence of $Z_{LX}$. With the equivalent circuit model depicted in Fig. 3.6(b), $Z_{LX}$ can be derived as

$$\begin{aligned} Z_{LX}(s) &= \left(\frac{1}{sC_2 + g_{m,cg}} + sL_s\right) || \frac{1}{sC_1} \\ &= \frac{R_{cg} + sL_s + s^2 R_{cg} C_2 L_s}{1 + sR_{cg} C_1 + sR_{cg} C_2 + s^2 C_1 L_s + s^3 R_{cg} C_1 C_2 L_s} \end{aligned}$$ (3.2)

At the resonance that nulls the denominator of Eq. (3.1), $s^2 L_s C_1 C_2 = -(C_1 + C_2)$. Substituting this condition into Eq. (3.2) cancels the $R_{cg}$ terms in the denominator and simplifies $Z_{LX}(s)$ to

$$Z_{LX}(s) = \frac{R_{cg} + sL_s + s^2R_{cg}C_2L_s}{1 + s^2C_1L_s} = \frac{sL_s - R_{cg}C_2/C_1}{-C_1/C_2} = R_{cg}\left(\frac{C_2}{C_1}\right)^2 - sL_s\frac{C_2}{C_1}$$ (3.3)

In the equations above, $R_{cg} = 1/g_{m,cg}$, and $g_{m,cg}$ represents the trans-conductance of the C-G device. This result shows that $Z_{LX}(s)$ consists of a resistive term, $R_{cg}$ multiplied by $(C_2/C_1)^2$, in series with a *capacitive* term, the reactance of the inter-stage $L_s$ multiplied by $-(C_2/C_1)$. Since $C_2$ (the source parasitic capacitance of the C-G device) is typically larger than $C_1$ (the drain parasitic capacitance of the C-S device), both the $R_{cg}$-related resistive term and the $L_s$-related reactive term are transformed to larger values.
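As a quick numerical sanity check of the simplification from Eq. (3.2) to Eq. (3.3), the sketch below evaluates both expressions at the resonance frequency. The element values are assumptions chosen only to place the resonance near 60GHz; they are not taken from the design, and the function names are hypothetical.

```python
import math


def z_lx_exact(s, r_cg, l_s, c1, c2):
    """Eq. (3.2): (1/(s*C2 + g_m,cg) + s*L_s) in parallel with 1/(s*C1)."""
    series_branch = 1.0 / (s * c2 + 1.0 / r_cg) + s * l_s
    shunt_c1 = 1.0 / (s * c1)
    return series_branch * shunt_c1 / (series_branch + shunt_c1)


def z_lx_at_resonance(s, r_cg, l_s, c1, c2):
    """Eq. (3.3): closed form that holds at the L_s resonance only."""
    ratio = c2 / c1
    return r_cg * ratio ** 2 - s * l_s * ratio


# Assumed, illustrative element values (not taken from the design).
C1, C2 = 40e-15, 60e-15   # farads
L_S = 290e-12             # henries
R_CG = 25.0               # ohms, 1/g_m of the common-gate device

# Resonance condition from Eq. (3.1): C1 + C2 - w^2 * L_s * C1 * C2 = 0.
W0 = math.sqrt((C1 + C2) / (L_S * C1 * C2))
S0 = 1j * W0

exact = z_lx_exact(S0, R_CG, L_S, C1, C2)
closed_form = z_lx_at_resonance(S0, R_CG, L_S, C1, C2)
print(f"resonance at {W0 / (2 * math.pi) / 1e9:.1f} GHz")
print(f"Eq. (3.2): {exact:.3f} ohm")
print(f"Eq. (3.3): {closed_form:.3f} ohm")
```

With $C_2 > C_1$, the real part equals $R_{cg}(C_2/C_1)^2$ and the imaginary part is negative, which is the enlarged resistive-plus-capacitive load described above.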
Figure 3.7: (a) $C_{gd}$ creates coupling between tanks; (b) single-ended equivalent model

This impedance has two effects: first, it creates an enlarged voltage swing, which is undesirable for current-mode operation and may degrade the receiver's linearity; second, it inevitably couples with the gate-drain capacitance ($C_{gd}$) of FSRCS' C-S devices (Fig. 3.7(a)) and degenerates the two resonance peaks into a single one, as illustrated in Fig. 3.9(a). Writing KCL and KVL for the equivalent circuit, the drain voltage $V_d$ and the gate voltages $V_{g1}$, $V_{g2}$ are related by

$$V_d = \left[ g_m (V_{g1} + V_{g2}) + s C_{gd} (V_{g1} + V_{g2} - 2V_d) \right] Z_{load}$$ (3.4)

$$\frac{V_{in} - V_{g1}}{R_s + sL_1} = sC_{gs1}V_{g1} + sC_{gd}(V_{g1} - V_d)$$ (3.5)

$$\frac{V_{in} - V_{g2}}{R_s + sL_2} = sC_{gs2}V_{g2} + sC_{gd}(V_{g2} - V_d)$$ (3.6)

where $Z_{load}$ is the same as the previously derived $Z_{LX}$ in Eq. (3.3). By solving equations (3.4)-(3.6), we obtain the *FSRCS*' overall trans-conductance $G_m$ as

$$G_m(s) = \frac{I_{load}}{V_{in}}(s) = \frac{(g_m + sC_{gd})(H_1(s) + H_2(s))}{1 - [(g_m + sC_{gd})(\alpha_1 - \alpha_1H_1(s) + \alpha_2 - \alpha_2H_2(s)) - 2sC_{gd}]Z_{load}}$$ (3.7)

where the functions and coefficients are defined as

$$H_1(s) = \frac{1}{s(C_{gd} + C_{gs1})(R_s + sL_1) + 1}$$ (3.8)

$$H_2(s) = \frac{1}{s(C_{gd} + C_{gs2})(R_s + sL_2) + 1}$$ (3.9)

$$\alpha_1 = \frac{C_{gd}}{C_{gd} + C_{gs1}}; \quad \alpha_2 = \frac{C_{gd}}{C_{gd} + C_{gs2}}$$ (3.10)

In this model, the gate-drain capacitances and trans-conductances of the two paths are assumed to be equal for simplicity. The functions $H_1(s)$ and $H_2(s)$ are the staggered band-pass responses of the two series resonance tanks. In Eq. (3.7), the numerator contains the sum of the two band-pass responses, which is the desired response from *FSRCS*. However, through the $Z_{load}$-dependent term in the denominator, the presence of $C_{gd}$ may cause $G_m$'s frequency response to degenerate into an undesired single resonance.
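To make the role of $C_{gd}$ and $Z_{load}$ in Eq. (3.7) concrete, the following sketch evaluates $G_m$ over frequency using only Eqs. (3.7)-(3.10). It follows the sign convention of Eq. (3.4) as written. All numerical values, the helper `tank_inductance`, and the function names are illustrative assumptions rather than design values.

```python
import math


def tank_inductance(f0_hz, c_total):
    """Inductance that series-resonates with c_total at f0_hz (assumed helper)."""
    return 1.0 / ((2.0 * math.pi * f0_hz) ** 2 * c_total)


def fsrcs_gm(freq_hz, gm, c_gd, c_gs1, c_gs2, l1, l2, r_s, z_load):
    """Overall trans-conductance G_m(s) of Eq. (3.7) at one frequency."""
    s = 2j * math.pi * freq_hz
    h1 = 1.0 / (s * (c_gd + c_gs1) * (r_s + s * l1) + 1.0)   # Eq. (3.8)
    h2 = 1.0 / (s * (c_gd + c_gs2) * (r_s + s * l2) + 1.0)   # Eq. (3.9)
    a1 = c_gd / (c_gd + c_gs1)                               # Eq. (3.10)
    a2 = c_gd / (c_gd + c_gs2)
    numerator = (gm + s * c_gd) * (h1 + h2)
    feedback = (gm + s * c_gd) * (a1 - a1 * h1 + a2 - a2 * h2) - 2 * s * c_gd
    return numerator / (1.0 - feedback * z_load)


# Assumed, illustrative values (not taken from the design).
GM = 40e-3                      # siemens
C_GS = 30e-15                   # farads, same for both paths
C_GD = 10e-15                   # farads
R_S = 20.0                      # ohms, effective series source resistance
L1 = tank_inductance(57e9, C_GS + C_GD)
L2 = tank_inductance(63e9, C_GS + C_GD)
Z_LOAD = 50.0 - 30.0j           # resistive plus capacitive, as in Eq. (3.3)

for f_ghz in range(50, 71, 2):
    f = f_ghz * 1e9
    ideal = fsrcs_gm(f, GM, C_GD, C_GS, C_GS, L1, L2, R_S, 0.0)
    loaded = fsrcs_gm(f, GM, C_GD, C_GS, C_GS, L1, L2, R_S, Z_LOAD)
    print(f"{f_ghz} GHz  |Gm| ideal = {abs(ideal) * 1e3:7.2f} mS"
          f"  loaded = {abs(loaded) * 1e3:7.2f} mS")
```

Setting `c_gd` to zero makes the $Z_{load}$-dependent term in the denominator vanish, so the response reduces to $g_m(H_1 + H_2)$ for any load; this is the behaviour that neutralization aims to restore.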
Figure 3.8: FSRCS with neutralization capacitors To overcome this degeneration, neutralization capacitors $C_n$ [53] are inserted, as shown in Fig. 3.8. The neutralization capacitors are known for their effectiveness in cancelling the effect of $C_{gd}$ , consequently improving the isolation between gate and drain, and improving devices' $G_{max}$ . The extra advantages it brings to our design include eliminating the effect that $C_{gd}$ causes to the FSRCS' frequency response. The neutralization capacitors $C_n$ also improve the stability and matching of the entire front-end (Fig. 3.9(b)) because the feedback through $C_{gd}$ is a major cause of front-end's instability. Figure 3.9 shows the simulated *FSRCS* Gm front-end's frequency response with input matched with and without the neutralization capacitors. Figure 3.9: Simulated (a) frequency response (b) input matching of Gm front-end with neutralization capacitors # 3.2.3 FSRCS Front-end's Stability During the design, the *FSRCS* front-end's stability is ensured in simulation by simulating the stability factors $K_f$ and Mu (Fig. 3.10). The simulation shows that the front-end is unconditionally stable across all frequencies. Figure 3.10: FSRCS front-end's stability factors #### 3.2.4 FSRCS' Layout Considerations Since *FSRCS* requires two tanks in parallel, its layout must be carefully designed in the planar CMOS process. It is implemented with G-S-G signal pads, as shown in Fig. 3.11. The arrangement is also package-friendly with co-planar waveguides. Both coils of the transformers are implemented with the ultra-thick metal layer (thickness 3.4 µm) to minimize the loss, which directly contributes to noise figure. The simulated loss of the transformers is <1dB. Figure 3.11: FSRCS's layout Each of the transformer tanks in the *FSRCS* has a size of 75 $\mu$ m by 75 $\mu$ m. While it occupies more area than a single transformer, a single *FSRCS* stage avoids multiple inductor or transformers in a multi-stage LNA. 
The overall area is still smaller compared with prior-arts, evidenced in Table 3.1. The transformers' coupling coefficients, k, are about 0.8. The non-unity k results in finite power loss and non-ideal mutual inductances. Design for manufacturing (DFM) is an important concern in advanced CMOS technology nodes to increase the foundry yield. Therefore, dummy metal filling is required to prevent any non-uniformity that would negatively impact the *Chemical Mechanical Polishing (CMP)* process. In this design, all the on-chip passive components (inductors and transformers) are manually filled with dummy metal to meet the foundry's design rule. Its impact is simulated with a 3-D EM field simulator (HFSS), and the performance degradation is accounted for in the design. #### 3.2.5 Passive Mixer and TIAs The low noise Gm front-end's output current is fed into passive I/Q mixers through a 2:1 transformer, which offers passive current gain. Current driven passive mixer is used to achieve high linearity in frequency down-conversion, lower 1/f noise and reciprocal impedance upconversion. Afterwards, I/Q TIAs transfer the down-converted signal current back into voltage. Each of these TIAs employs wideband two-stage differential Op-Amps with resistive feedback. The TIA has low input impedance ( $\approx 60\Omega$ at the highest gain setting). Such low impedance is consequently up-converted back to the front-end via the passive mixer to draw the signal current from the Gm front-end and lowers the Q of Gm front-end's output transformer. The latter feature enhances receiver's bandwidth in contrast to traditional high-Q load in voltage mode amplifiers. The mixer and TIA's noise contribution is analyzed in Fig. 3.13 [34]. The Gm front-end is modeled as Norton equivalent source with an output resistance $R_0$ . Op-Amp's input equivalent noise source is $V_{n,op}$ . 
Assuming a finite Op-Amp gain A, the output noise due to $V_{n,op}$ is $$V_{n,Op,out} = \frac{1}{\frac{1}{A} + \frac{R_o + R_m}{R_o + R_m + R_f}} V_{n,Op}$$ (3.11) Figure 3.12: (a) mixers (b) TIA and its Op-Amp (CMFB of second stage not shown) (c) QVCO schematic (LO buffer not shown) Figure 3.13: Equivalent model for analyzing mixer and TIA's noise Therefore, $V_{n,op}$ appears at the output as voltage without significant gain, whereas signal is converted to voltage with large gain. So when referred to the input, TIA's noise is greatly suppressed by signal gain. The TIA's first stage is biased close to transistor's sub-threshold to maximize its $g_m/I_d$ ratio. Its second stage is configured with feed-forward and Miller compensation to ensure adequate phase margin with wideband operation. The feed-forward path creates a LHP zero, improving phase margin [54]. Figure 3.14: Simulated TIA performance: (a) input referred noise voltage; (b) input impedance; (c) AC response The TIA's out-of-band filtering also determines the out-of-band linearity of the receiver. To avoid gain suppression at the TIA's input, an input shunt capacitor is placed to lower the out-of-band impedance to $<10\Omega$ at >2GHz offset (Fig. 3.14). Since the implemented prototype adopted a simple first-order filtering at the input, there exists a fundamental tradeoff between baseband bandwidth and tolerable blocker offset frequency. The baseband bandwidth is set to be 500MHz in the prototype to ensure that blockers situated at offset frequencies of 3.5GHz and above will receive adequate attenuation. A higher order filter response can be adopted at the TIA as well to tolerate a more close-in blocker with wider baseband channel bandwidth. #### 3.2.6 60 GHz LO Generation & Control ASIC The passive mixer is driven by an on-chip 60GHz *LC*-QVCO with cross-coupled NMOS differential pair shown in Fig. 3.12(c). The LO frequency is tuned by MOS varactors. 
The I/Q LO distribution is routed as symmetrical as possible to minimize I/Q mismatch. While it has been shown that overlapping LO clocks could introduce imperfections to the mixer down-conversion and degrades noise performance with sharp LO clocking waveforms [55], this phenomenon is less an issue at mm-wave frequencies. This is because at mm-wave frequency, the LO waveforms are more sinusoidal in nature. The imperfections can be further alleviated with adjustable mixer gate bias to minimize the overlapping period (when multiple mixer switches are simultaneously ON). LO buffers are inserted between QVCO and mixer. Both receiver and QVCO are digitally controlled by on-chip digital-to-analog converters (DACs). These DACs set the bias voltage for receiver stages and frequency tuning and current control for the QVCO [56]. Each DAC uses an 8-bit R2R architecture with small area. They are driven from an external PC via an on-chip serial-to-parallel interface, a universal asynchronous receiver/transmitter (USART). # 3.3 Measurement Results Two test-chips are fabricated in TSMC 65-nm CMOS technology. The first chip is to characterize the stand-alone on-chip QVCO, and the second chip is the proposed 60GHz current-mode prototype receiver (Fig. 3.15). (a) Figure 3.15: Die photograph of the proposed (a) QVCO; (b) current-mode receiver showing key blocks. ## 3.3.1 60 GHz QVCO The QVCO test-chip is measured to characterize its frequency tuning range and phase noise performance. The QVCO drives two open drain buffers that are probed directly using GSSG probes. It provides 15% tuning range from 54.8GHz to 63.8GHz, covering the entire 60GHz bandwidth. The phase noise is measured as -91dBc/Hz and -113.5dBc/Hz at 1 MHz and 10MHz offsets, respectively (Fig. 3.16), with a free running frequency of 61GHz. The phase noise performance is competitive with other state-of-the-art mm-wave VCOs. 
With a power consumption of 24mW, the QVCO achieves a *Figure-of-Merit (FOM)* of -175.4dBc/Hz (evaluated at the 10MHz offset), where

$$FOM = \mathcal{L}(\Delta\omega) + 10\log_{10}(Power[mW]) + 20\log_{10}(\Delta\omega/\omega)$$ (3.12)

Figure 3.16: Measured QVCO phase noise.

## 3.3.2 Gain and Noise Figure

With the on-chip QVCO properly tuned, the peak conversion gain of the receiver is 36dB. The 3-dB RF bandwidth is 7.5GHz, from 55GHz to 62.5GHz. Over most of the 3-dB bandwidth, the input return loss ($S_{11}$) is below -10dB (Fig. 3.17).

Figure 3.17: Measured receiver conversion gain and input matching.

Figure 3.18: Measured and simulated receiver noise figure.

Next, the NF of the receiver is measured using a V-band noise source and the Y-factor method. The minimum measured NF is 3.8dB after de-embedding the input cable and probe losses. Good agreement is achieved between the measured and simulated results. The NF remains below 7dB across the entire 3-dB bandwidth (Fig. 3.18).

## 3.3.3 Linearity: In-band

Gain compression is tested both in-band and out-of-band to characterize the receiver's linearity. At the high gain setting (36dB), the input $P_{1dB}$ is -35dBm; at the low gain setting (17dB), the input $P_{1dB}$ is -18dBm. The receiver's output non-linearity is also critical for higher order modulation schemes, so that the EVM requirement can be met without substantially backing off from the ADC's full scale. The tested output $P_{1dB}$ is about +1dBm (Fig. 3.19). The receiver gain can be varied by adjusting the TIA's feedback resistors.

Figure 3.19: Measured receiver gain compression (P<sub>1dB</sub>).

#### 3.3.4 Linearity: Out-of-band

The receiver's blocker performance is also measured. In this measurement, two V-band signal sources are used to generate a weak in-band signal and a strong out-of-band signal. The desired signal is at 57GHz with a power of -54dBm. The on-chip QVCO is tuned to 57.1GHz, so the down-converted signal is at 100MHz.
The strong blocker resides at 60.5GHz, creating a down-converted tone at 3.5GHz at the output. The two inputs are combined with a V-band magic-T. Because two signals are at different frequency, the combiner's phase response is not a major concern. By increasing the blocker power, the signal gain starts to compress and reach the 1dB compression point with a -9dBm blocker power. However, the gain compresses much faster than simulations, probably due to the inaccuracy of the transistor models at 60GHz. Figure 3.20: Measured receiver gain compression (P<sub>1dB</sub>) in the presence of blocker. #### 3.3.5 I/Q Mismatch and LO Leakage The receiver's I/Q mismatch is measured by a continuous wave input. The outputs from I and Q paths are shown in Fig. 3.21. The I/Q amplitude and phase mismatches are 0.15dB and 1.62 degrees, respectively. The measured LO to RF feed-through is -59dBm. Figure 3.21: Measured receiver I/Q output waveforms. ( $V_{pI} = 36.1 \text{mV}$ , $V_{pQ} = 35.5 \text{mV}$ , $\Delta t = 2.454 \text{ns}$ , f = 100 MHz) #### 3.3.6 Comparison with Prior Arts Table 3.1 compares the proposed mm-wave receiver with other recently published 60GHz receivers. Compared with other recently published receivers with comparable 7GHz bandwidth [47]-[49],[57], our prototype receiver achieves the best-in-class NF = 3.8dB, $P_{1dB,in}$ = -18dBm and $P_{1dB,out}$ = 1dBm. It also uniquely tolerates an out-of-channel blocker up to -9dBm at 3.5GHz away, which has never been reported in V-band receivers for the past. The power and silicon area are also very competitive with prior arts. Finally, the proposed current-mode architecture is equally applicable to other wideband receivers at mm-Wave frequencies. ## 3.4 Conclusion A novel current-mode mm-wave receiver with FSRCS Gm front-end is demonstrated with: 1) relaxed bandwidth/noise/linearity trade-offs inherited in traditional mm-wave receivers and 2) enhanced blocker tolerance without compromising receiver's performance. 
The realized 60GHz prototype is integrated with a high purity QVCO, and achieves state-of-the-art performance in bandwidth, NF, linearity and blocker tolerance.

| | Gain<br>(dB) | Min. NF<br>(dB) | P<sub>1dB,in</sub><br>(dBm) | P<sub>1dB,out</sub><br>(dBm) | RF BW<br>(GHz) | P<sub>1dB,blocker</sub><br>(dBm)* | VCO Phase Noise<br>(dBc/Hz)** | CMOS<br>Technology | Rx Power<br>(mW) | Area<br>(mm²) |
|----------------|------|-----|-------|------------|-----|-------|-------|------|------------|---------|
| [46] | 60 | 3.9 | -33 | -5 | 1.5 | N./A. | Ext. | 65nm | 20***/34 | 1.1**** |
| [47] | 30 | 5.5 | -31 | -1 | 10 | N./A. | -79 | 40nm | 35 | N./A. |
| [48] | 35.5 | 5.6 | -21 | -3.5 | 13 | N./A. | -90 | 65nm | 40 | 2.4 |
| [67] | 17.3 | 6.8 | N./A. | N./A. | <4 | N./A. | -85 | 65nm | 81.5 | 8 |
| [68] | 41 | 7.6 | -29 | -1 | <5 | N./A. | N./A. | 65nm | 74 | N./A. |
| [57] | 62 | 7.9 | N./A. | -12.6~-8.8 | 8 | N./A. | Ext. | 65nm | 12.4 | 0.6**** |
| This work [50] | 36 | 3.8 | -18 | +1 | 7.5 | -9 | -91 | 65nm | 25***/53.5 | 1.3 |

Table 3.1: Comparison with other mm-wave receivers.

# **CHAPTER 4**

# A High-Speed Bi-Directional RF-Interconnect with Multi-Drop Arbitration

## 4.1 Introduction

On-chip interconnects, especially those used in future chip multi-processors and Network-on-Chip (NoC), have been projected to be the limiting factor in terms of bandwidth, power and latency. RF modulated and transmission-line-based interconnects (RF-I) have been demonstrated as superior in latency, scalability, re-configurability, bandwidth and power efficiency [59]-[61]. According to [60], RF-I global links combined with local RC wires provide the inter-core network with either a 1.7X performance gain under the same power or 5X power savings under the same performance.
Meanwhile, the large number of peer-to-peer on-chip interconnects in NoC becomes another major issue, which inevitably leads to large power and area consumption for connecting cores. Therefore, the next generation of NoC architecture demands the benefit of having on-chip high-speed interconnect with multi-drop and arbitration capabilities as data buses [60],[62]. Hence an RF-I solution can be extremely beneficial by itself and can be further enhanced by adding the multi-drop and arbitration capabilities. The arbitration capability is especially critical so that all computing cores can share a common communication channel efficiently with fairness and without collision. Previously published multi-drop works are either too power-hungry (optical link involves power-consuming O/E and E/O conversions) [62], or has long latency and lacks arbitration capability because of reflections introduced by multiple drops at the baseband and unpredictable receiving priorities [63]. Figure 4.1: Original FDD RF-I concept [59]-[61] and its application in large NoCs [60]-[62]. In this work, we propose a novel multi-drop RF-I with arbitration capability based on $\lambda/4$ directional couplers. This architecture offers superior scalability and re-configurability, while retaining RF-I's benefits of high data rate, low latency and low energy per bit. In this Chapter, Section 4.2 will present the proposed RF-I architecture, on-chip $\lambda/4$ directional coupler and TX/RX circuit blocks; and Section 4.3 presents measurement results; Section 4.4 shows its application in a Network-on-Chip application; and Section 4.5 summarizes the overall work. ## 4.2 RF-Interconnect with Multi-Drop Arbitration Architecture Taking into account the aforementioned considerations, we designed and implemented a high-speed RF-I with 60GHz carrier and multi-drop arbitration capability in 65nm 1P6M GP CMOS with a 3.4µm thick top metal. 
The carrier frequency of 60GHz is carefully chosen: a lower frequency results in a larger coupler size that is incompatible with the core size of future NoCs, and in a larger bandwidth-to-carrier ratio that potentially causes dispersion and requires power hungry equalization techniques; a higher frequency causes higher TL loss and higher VCO power consumption, which degrades the link's power efficiency.

#### 4.2.1 RF-I System with Multi-Drop Arbitration Capability

In this design, the RF-I with multi-drop and arbitration consists of four drops, each of which can act as a transmitter (multi-caster) or a receiver. Figure 4.2 shows the system block diagram. At each drop, a $\lambda/4$ directional coupler and a high-speed (5Gbps) ASK transceiver are employed. The $\lambda/4$ directional coupler is known for its directivity, isolation and matching properties at specific ports. The isolation and matching suppress the reflection issues in a multi-drop system, whereas the directivity sets a pre-defined priority. Digitally controlled switches are also implemented between adjacent drops, both for impedance matching of arbitrary multi-cast and for destructive reading arbitration. The switch has 1.5dB loss when it is on and 40dB isolation when it is off. With transmission-gate-based switches, the signal on the main channel can be terminated to $Z_0$ (100 $\Omega$ differentially) or passed to subsequent drops [60],[62]. In this way the RF-I is implemented with a scalable daisy-chain arbitration scheme. Figure 4.2 shows the scenario in which drop A multi-casts to drops B, C, and D. The signal flow is highlighted in green. Inactive components (drop A's RX and the other drops' TX) are powered off to save power.

Figure 4.2: 4-drop RF-I with Arbitration Capability (Drop A multicasts).
With the switches, the system can be reconfigured in the following ways according to the global link condition in the NoC: 1) an arbitrary drop can be reconfigured as the transmitter, with a fixed set of priorities for the receivers, by simply opening the switch before the transmitting node and closing the other ones; 2) drops with lower priorities can be excluded through destructive reading by opening the switch before them; 3) the physical link can be divided into two separate sections, so that A transmits to B while C communicates with D (the number of available sections increases when more nodes are attached to this RF-I bus). The two links will not interfere with each other since the isolation of the switch is high enough. This architecture brings superior flexibility and scalability, which potentially saves considerable bandwidth, latency and power on the NoC interconnects [60]. Although 4 drops are implemented in this design due to silicon area constraints, more drops can be added as long as the TX's output power is large enough for the last drop to detect the data. For real world applications, the number could be 8 or 16, whichever is more convenient for implementing an NoC.

#### 4.2.2 Channel and On-Chip Directional Coupler

The main communication channel is made of differential transmission lines with a total length of 5.5mm between the farthest nodes under the multicast scenario. One challenge of a high-speed transmission-line-based multi-drop interconnect is the signal reflections from the various drops along the channel. In our design, for example, the loss on the TL is 1.2dB/mm, and the distance between two adjacent drops is 1.5mm. The round-trip path and switch loss is therefore only 6.6dB, which means the reflection at each drop has to be low enough to achieve the SNR of 18dB required by ASK [61]. An on-chip directional coupler is introduced to solve this problem. We take advantage of the short wavelength of the mm-wave carrier, so that the on-chip coupler can be realized in a small size.
The coupled lines (AP layer) are placed on top of the main channels (Metal-6 layer), costing no extra silicon area and providing tight coupling. Additionally, due to the short wavelength at 60GHz, the quarter-wavelength coupler is compact in size. Because of the directivity of the couplers, the signals propagate in one direction along the channel. This also results in the priority ordering of the drops.

Figure 4.3: On-chip directional coupler and its simulated performance.

In Fig. 4.3, a 3.7dB coupling loss is achieved at 60GHz, and it remains almost flat within a 10GHz bandwidth. The wide bandwidth is able to support the desired ultra-high speed communication. The main channel reflection and the isolation between the transmitter and receiver on the same drop are both below -26dB. As a result, the reflection-caused ISI is below -32.6dB (reflection plus round-trip path loss) per drop, so it is no longer the limiting factor in achieving the required SNR. This solves the reflection issues that have plagued traditional multi-drop interconnect designs.

#### 4.2.3 60GHz ASK RF Transceiver

To minimize the energy consumption and design complexity, a simple ASK transceiver is proposed. In contrast to other modulation schemes such as BPSK, ASK can be demodulated non-coherently and does not require any carrier synchronization. This simplifies the transceiver architecture and reduces power consumption by avoiding any phase or frequency locking circuits. The carrier frequency is tuned to 60GHz to enable the small coupler size.

Figure 4.4: 60GHz ASK transmitter.

As shown in Fig. 4.4, the transmitter consists of a VCO and a simple single-stage, push-pull driver with an embedded ASK modulator. The VCO is a free-running, cross-coupled differential pair with a resonant tank. Because ASK is demodulated non-coherently, the link can tolerate an output frequency drift of up to several GHz.
As long as the frequency is within the coupling band of the coupler and receiving band of the receiver, the signal can be demodulated. The transmitter VCO's output is inductively coupled to the differential push-pull driver through an on-chip transformer with a 1-to-1 ratio. The input data bits are directly fed to the center tap of the transformer's secondary coil. The digital data bits modulate the on/off state of the driver by modulating the bias (common mode) of the input transistors. By combining the modulator and driver into one single stage, this design avoids an additional mixer to perform ASK modulation, and therefore achieves high efficiency and simplicity. At the end, the modulated ASK signal is inductively fed into the directional coupler. Figure 4.5: 60GHz ASK receiver. The receiver (Fig. 4.5) consists of a single-stage low noise amplifier and an envelope detector. The output power from the transmitter is around 3dBm, and the path loss from the transmitter to the farthest receiver (worst case) is 27dB. Therefore, the received power at the input of the LNA is about -24dBm. The differential common source LNA with inductive degeneration has an 18dB gain at 60GHz, 9GHz bandwidth and 6dB noise figure. The cascode configuration offers better stability. From link budget analysis, the farthest distance this architecture can support is more than 12mm with 8 drops, assuming adjacent drops are 1.5mm apart. After the LNA, the amplified signal is then fed into a differential mutual mixer to detect the envelope. From simulation, this detector is able to recover a 60GHz ASK signal's envelope up to 6Gbps, and the input sensitivity is about 10mV<sub>pk-pk</sub>. Because the mixer's devices work at sub-threshold, its power consumption is very low. Entire RX consumes power of 5mW. Figure 4.6: Testing environment of the system. # 4.3 Measurement Results Figure 4.7: 5Gbps RF-I with multi-drop and arbitration die-photo. 
The RF-I with 60GHz carrier and multi-drop arbitration capabilities is fabricated in the TSMC 65nm 1P6M process; the die photo is shown in Fig. 4.7. The transmitter and receiver occupy core areas of 0.0048mm<sup>2</sup> and 0.034mm<sup>2</sup>, respectively. The link can operate properly with conventional digital logic circuits placed directly under its passive structure, which gives better area utilization [61]. Besides the entire RF-I, a stand-alone transmitter test-chip is measured to characterize the frequency tuning of the VCO and the output power of the driver. Figure 4.6 shows the testing environment of the system.

# 4.3.1 60GHz VCO

Figure 4.8: Transmitter output frequency and calibrated output power.

Figure 4.8 shows the measured TX output frequency and output power at different tuning voltages. In the real RF-I application, this frequency does not have to change with time. For the driver, the output power is the main metric of interest because it determines the link budget. The output power is reasonably flat within the band of interest. Its ripple is within 3dB across the VCO's tuning range; therefore, the driver's bandwidth is large enough for this application. The designed driver is able to deliver more than 3dBm across the band.

## **4.3.2** Multi-cast Functionality

Figure 4.9: Eye diagrams when drop A multi-casts to drops B, C, D.

For the complete RF-I with multi-drop arbitration, eye diagrams and BER at 5Gbps are measured for all drops in various configurations to demonstrate the multi-drop and arbitration capability. Figure 4.9 shows the eye diagrams when drop A transmits and drops B, C, D receive. This demonstrates the multi-drop capability. Fig. 4.11 shows that the BER is lower than $10^{-12}$. Link analysis and BER also imply that the link bandwidth is limited by the speed of the circuitry rather than by the signal's SNR. Therefore, more drops can be added into this multi-drop system without degrading the link bandwidth.
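The link analysis referred to above can be reproduced from the figures quoted in Section 4.2. The sketch below is a back-of-the-envelope calculation only: it assumes that the round trip consists of two passes over one 1.5mm TL segment and one switch, which is an interpretation that reproduces the quoted 6.6dB and 32.6dB values rather than a statement of how they were originally derived. All names are hypothetical.

```python
# Values quoted in Section 4.2 (all in dB, dBm or mm).
TL_LOSS_DB_PER_MM = 1.2        # transmission-line loss
DROP_SPACING_MM = 1.5          # distance between adjacent drops
SWITCH_LOSS_DB = 1.5           # on-state switch loss
COUPLER_REFLECTION_DB = 26.0   # reflection suppression of the coupler
REQUIRED_SNR_DB = 18.0         # SNR required by ASK
TX_POWER_DBM = 3.0             # transmitter output power
WORST_CASE_PATH_LOSS_DB = 27.0 # TX to the farthest receiver


def round_trip_loss_db(spacing_mm, tl_loss_db_per_mm, switch_loss_db):
    """Loss of a reflection that travels to the next drop and back.

    Assumption: each direction crosses one TL segment and one switch.
    """
    one_way = spacing_mm * tl_loss_db_per_mm + switch_loss_db
    return 2.0 * one_way


round_trip = round_trip_loss_db(DROP_SPACING_MM, TL_LOSS_DB_PER_MM,
                                SWITCH_LOSS_DB)
isi_suppression = COUPLER_REFLECTION_DB + round_trip
snr_margin = isi_suppression - REQUIRED_SNR_DB
rx_power_dbm = TX_POWER_DBM - WORST_CASE_PATH_LOSS_DB

print(f"round-trip path and switch loss: {round_trip:.1f} dB")
print(f"reflection-caused ISI suppression: {isi_suppression:.1f} dB")
print(f"margin over required SNR: {snr_margin:.1f} dB")
print(f"worst-case received power: {rx_power_dbm:.1f} dBm")
```

Under these assumptions the reflection-caused ISI sits well below the 18dB SNR requirement, which is consistent with the observation that reflections are not the limiting factor of the link.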
Figure 4.10: Eye diagrams when drop B multi-casts to drops C, D, A.

Figure 4.11: Measured BER vs. data rate at different drops.

Figure 4.10 shows the eye diagrams when drop B transmits and drops C, D and A receive. This demonstrates that an arbitrary drop can multi-cast. In fact, it is expected that all drops will perform exactly the same because of the structural symmetry.

## 4.3.3 Multi-cast with Destructive Reading

Finally, destructive reading is demonstrated in the scenario in which drop A transmits to B (Fig. 4.12) and drops C, D are blind, as the switch before C is open. It also implies that the system can operate as multiple non-interfering links (A->B, C->D). This function is not measured due to testing equipment constraints.

Figure 4.12: Eye diagrams when drop A transmits and only B receives.

## 4.3.4 Comparison with Prior Arts

Table 4.1 compares the proposed RF-Interconnect with other recently published CMOS on-chip interconnects.

| | This work [67] | [63] | [65] |
|------------------|----------------|----------|----------|
| Category | RF | Baseband | Baseband |
| Multi-drop | Yes | Yes | No |
| Arbitration Cap. | Yes | No | No |
| Data rate (Gbps) | 5 | 8 | 5 |
| Power (pJ/b/mm) | 0.24* | 0.18 | 0.21 |
| Latency (ps/mm) | 9 | 30 | 50 |

Table 4.1: Comparison with other CMOS on-chip interconnects.

# 4.4 Stream Arbitration – Arbitration Multi-drop RF-I in NoC

With the arbitration RF-I built in hardware, its impact on future large NoCs is investigated [66].

#### 4.4.1 Stream Arbitration: Scheme

In this section we describe stream arbitration from an algorithmic and architectural point of view. We partition the aggregate bandwidth provided by the RF-I or waveguide into several logical communication channels. One of them is used for arbitration, which is called the *arbitration channel*. The remaining channels are used for PE-memory data requests and responses, which are called *data channels*.
Each RF node has one transmitter and receiver pair to access both the arbitration channel and data channels. Active sources (nodes that want to send flits) compete for the data channels in the arbitration channel to talk to their desired destination nodes. Arbitration is done for each flit that is transmitted. The key component of our approach is the arbitration stream that travels across the arbitration channel. Conceptually, the arbitration stream starts at a single node, which is called the stream origin. The arbitration stream starts out logically empty and will travel in a unidirectional manner across all the nodes on the chip, this is called *Trip* 1. In this trip, when the stream passes each node, the node logically augments a number of bits (referred to as *substream*) in the arbitration stream to specify whether or not this node *is attempting to send to another node*, and whether or not this node is *capable of receiving packets*. It is important to note that these two pieces of information (desire to send and availability to receive) do not require any parsing of the stream – they only rely on information known a priori at the node. So there is no dependence where the stream must be read first and then modified – such a dependence would impair the arbitration latency by bringing slower logic on the critical path of the stream propagation. To ensure this decoupling and that nodes need only modify the stream without first reading it, each node has a specified region of bits that make up that node's substream, and substreams are disjoint within the arbitration stream. Collectively, these disjoint substreams will represent each node's interest in sending over a data channel and availability to receive from a data channel. The layout of a single substream element is shown in Fig. 4.13. A node that wants to send a flit and is therefore contending for data channels is referred to as a *source node*. 
The destination ID is the label of the node to which a source node intends to send a flit (referred to as a *destination node*). The flow control bit indicates whether or not there is sufficient buffer space in this node to accept a flit. N is the number of RF nodes.

Figure 4.13: The substream augmented by each node as the stream passes by.

After the arbitration stream passes the last RF node in Trip 1, it circulates over all nodes a second time, which we refer to as *Trip 2*. In this trip, when the stream passes each node, the node receives the arbitration stream but does not modify it. The purpose of Trip 2 is to parse the stream in order to check:

- *Ability to Send*: If this node is attempting to send a flit, information from the stream is used to determine whether this node can acquire a data channel, and if so, the data channel ID.
- *Receive Channel*: Determine whether this node will be receiving a flit, and if so, compute from the stream the ID of the data channel on which this data will arrive.

```
Algorithm 1. Stream Arbitration
Input:  Stream: flowControl[1..N], interested[1..N], destination[1..N],
        where N is the number of RF nodes; the total number of channels M;
        this node's ID node_id.
Output: Transmitting_channel_ID, Receiving_channel_ID.

Transmitting_channel_ID = INVALID;
Receiving_channel_ID = INVALID;
channel_ID = 0;
for i = 1 ... N do
    if (interested[i] and (not flowControl[destination[i]])) then
        flowControl[destination[i]] = TRUE;
        channel_ID++;
        if (destination[i] == node_id) then
            Receiving_channel_ID = channel_ID;
        end
        if (i == node_id) then
            Transmitting_channel_ID = channel_ID;
        end
        if (channel_ID == M-1) then
            break;
        end
    end
end
```

The algorithm is simple and straightforward. It parses the stream in the order in which the bits were augmented. A source node can acquire a data channel to a destination node if:

- The flow control bit of the desired destination node is zero.
- There is no upstream node already sending to the desired destination node. In this context, "upstream" means an earlier node in the unidirectional flow of the stream.
- There are still available data channels.

After the arbitration stream is parsed, the transmitter of a source node that has successfully acquired a data channel is tuned to send on this channel, and the receiver of the intended destination node is tuned to listen to the same data channel. A node can be a source and a destination simultaneously, using different channels.

After Trip 2, the sources that successfully acquired data channels begin to use these data channels to communicate with their corresponding destinations. A channel is used for a single flit and is then surrendered. This does not incur a performance penalty because arbitration can be initiated every cycle, so a pair of nodes is allowed to communicate as long as the source continues to win arbitration. This requires pipelining the stream arbitration and the data transfer. The latency of the two-trip arbitration and the pipelining of arbitration and data transfer in the physical design are detailed in Section 4.4.4.

It can be seen that upstream nodes always have higher priority in the arbitration than downstream nodes (nodes encountered later in the unidirectional flow of the stream). In order to introduce fairness into stream arbitration, we use a rotating prioritization scheme, where each node is gradually reduced in priority each cycle until it reaches the lowest priority. Each cycle, the lowest priority node from the previous cycle becomes the highest priority node. This prevents the lowest priority nodes from being starved during periods of high system load. Moreover, the priority of each node is reduced gradually, which lowers the likelihood of a burst data transfer dropping suddenly from the highest priority in one arbitration cycle to the lowest priority in the next.
We found that this method allows the flits of a multi-flit message to arrive at the destination consecutively, without a large deviation in transmission latency. From an architectural point of view, this gradual priority reduction is achieved by rotating the stream origin in the reverse direction of the stream traversal in the transmission line. However, the scheme detailed in Section 4.4.5 supports this without physically rotating the stream origin.

## 4.4.2 Stream Arbitration: Example

To illustrate our approach, this section presents a single example arbitration attempt (Fig. 4.14). This example consists of only 4 nodes participating in arbitration, one arbitration channel and one data channel, and assumes the stream origin is at node A. Node A is attempting to send to node C, while nodes B and D are attempting to send to node A. Node C has no flits waiting for transmission, but has a fully occupied buffer and cannot receive any flits.

Figure 4.14: An example of the stream arbitration scheme.

In Trip 1, each node augments the stream with its substream as described in Section 4.4.1 and as shown in Fig. 4.14(a) (the stream vectors follow the direction of the arrow in the figure). Nodes A, B and D are source nodes in this arbitration cycle. They set the interested bit to "1" and write their respective destination node IDs when the stream travels by. Node C has no data to transmit, and its receiver buffer is unavailable for new messages in this arbitration. So node C only sets the flow control bit to "1" and leaves the interested bit at "0" when the stream travels by. Therefore the substreams modulated by nodes A, B, C and D are "0110", "0100", "1000" and "0100", respectively.

In Trip 2, as shown in Fig. 4.14(b), each node receives the full arbitration stream and executes the algorithm presented in Section 4.4.1.
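The Trip 1 encoding and the Trip 2 parsing of this example can be sketched in Python. This is only an illustrative sketch, not the hardware implementation: the function names `encode_substream` and `arbitrate` are hypothetical, the bit order (flow control, interested, destination ID) is inferred from the example substrings above, node IDs A to D are assumed to map to 0 to 3, and `None` stands in for INVALID.

```python
from typing import List, Optional, Tuple


def encode_substream(flow_control: bool, interested: bool, dest: int, id_bits: int) -> str:
    """Pack one node's substream: flow-control bit, interested bit, destination ID.

    The field order is an assumption inferred from the example strings.
    """
    return f"{int(flow_control)}{int(interested)}{dest:0{id_bits}b}"


def arbitrate(substreams: List[str], num_channels: int,
              node_id: int) -> Tuple[Optional[int], Optional[int]]:
    """Parse the full stream at one node (Trip 2), following Algorithm 1.

    num_channels counts the arbitration channel plus the data channels (M).
    Returns (transmitting_channel_id, receiving_channel_id); None means INVALID.
    """
    flow = [s[0] == "1" for s in substreams]
    interested = [s[1] == "1" for s in substreams]
    dest = [int(s[2:], 2) for s in substreams]

    tx = rx = None
    channel = 0
    for i in range(len(substreams)):          # stream order = priority order
        if interested[i] and not flow[dest[i]]:
            flow[dest[i]] = True               # destination is now taken
            channel += 1
            if dest[i] == node_id:
                rx = channel
            if i == node_id:
                tx = channel
            if channel == num_channels - 1:    # all data channels granted
                break
    return tx, rx


A, B, C, D = range(4)
stream = [
    encode_substream(False, True, C, 2),   # A wants to send to C
    encode_substream(False, True, A, 2),   # B wants to send to A
    encode_substream(True, False, 0, 2),   # C: buffer full, nothing to send
    encode_substream(False, True, A, 2),   # D wants to send to A
]
results = {name: arbitrate(stream, num_channels=2, node_id=n)
           for n, name in enumerate("ABCD")}
```

With one arbitration channel and one data channel (`num_channels=2`), `results` gives node B the transmit channel and node A the receive channel, while C and D obtain nothing.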
Node A is the first node to modulate the stream and has the highest priority to acquire the data channel, but it cannot send because its destination, node C, has declared that it does not have available buffer space. Thus, node A loses the arbitration and does not receive a data channel. The next node in priority order, node B, wins the data channel in this arbitration since its destination, node A, has an available buffer. Node C does nothing, since its receiving buffer is full and it is not an active node. From parsing the stream, node D knows that the upstream node B gets the data channel and that there are no more channels left, so node D also loses this arbitration. After the arbitration, the winner, node B, will transmit its message through the data channel, and the others will retry in the next arbitration.

## 4.4.3 Stream Arbitration in RF-I

The key advantage of RF-I over traditional interconnects is its multi-cast capability with on-chip directional couplers (see Section 4.2.2). Impedance-matched directional couplers eliminate the signal reflection that has inhibited multi-cast on traditional interconnects. Although Fig. 4.2 shows an exemplary RF-Interconnect multi-cast link with only one band, more RF channels can be added to this multi-cast link in a similar fashion, using multi-band directional couplers, when a higher aggregate data rate is desired for the arbitration channel. Since the required signal power scales with the number of drops along a multi-cast link, more power is required to transmit a signal on a multi-cast link than on a point-to-point link [61],[67]. Such effects are taken into consideration in our power estimation shown in Section 4.4.6.

## 4.4.4 Curl Transmission Line for Stream Circulation

A conventional RF-I transmission line is unidirectional and acyclic, i.e., the starting point and the end point of the transmission line are two different points.
This prohibits stream circulation in the arbitration channel. To enable the two-trip stream arbitration, we propose a curled transmission line. Figure 4.15(a) shows the curled transmission line for the arbitration channel and the normal RF-I transmission line for the data channel. The curl starts from the stream origin on the outer loop and ends at the last RF node of the trip on the inner loop. The outer loop of the arbitration channel is Trip 1, for transmitting only, while the inner loop is Trip 2, for receiving only. The transmitters to the arbitration channel (TX-A) of all the nodes are attached to the outer loop, while the receivers (RX-A) are attached to the inner loop. There is also a frequency-tunable transceiver pair (TX-D and RX-D) at each node, which is attached to the data channels. Although Fig. 4.15(a) presents a rectangular-style transmission line for better illustration, in the real physical design all the transmission lines should go through each node, as shown in Fig. 4.15(b).

The reflection and discontinuity effects of sharp 90-degree turns in the curl transmission lines can be mitigated by careful design for impedance matching. For example, our hardware prototype used diagonal routing at each corner of the channel to eliminate sharp turns (Fig. 4.7). As a result, the turns did not impact the interconnect performance. Depending on the CMOS fabrication technology, rounded turns can also be implemented to better avoid the reflection caused by sharp turns.

In our evaluated system, which is a 1cm<sup>2</sup> chip with 16 PE clusters (each cluster has one RF node), the total distance for one trip in the arbitration channel, or the longest distance in the data channel, is 6cm. At the speed of light in silicon the propagation delay is 8ps/mm, thus each trip of the arbitration can be finished in 480ps. Our evaluated system has a working frequency of 2GHz.
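The trip latency can be compared directly with the clock period. In the worked equation below, $T_{\text{trip}}$ and $T_{\text{clk}}$ are shorthand introduced here for the per-trip latency and the clock period; the numbers are those quoted above.

$$
T_{\text{trip}} = 60\,\text{mm} \times 8\,\text{ps/mm} = 480\,\text{ps}
\;<\;
T_{\text{clk}} = \frac{1}{2\,\text{GHz}} = 500\,\text{ps}
$$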
Therefore, each trip takes only one cycle, and any flit transfer on the data channel can reach its destination in one cycle.

Figure 4.15: The curl transmission line.

At any particular cycle x, the TX-As of the nodes are augmenting their substreams for the arbitration initiated during cycle x; the RX-As of the nodes are receiving the entire stream for the arbitration initiated during cycle x-1; the local stream parsing unit is parsing the stream for the arbitration initiated during cycle x-2; and the RX-Ds and TX-Ds of the winning sources of the arbitration initiated during cycle x-3 are using data channels to transfer data. In this way, stream arbitration can be initiated at every cycle.

## 4.4.5 Time Division Modulation Multicast For Stream Augmentation

We propose a time division modulation multicast (TDMM) approach on top of the multi-band nature of RF-I in the arbitration channel to achieve stream augmentation with priority rotation. The latency T for one stream trip can be divided into N slots, where N is the number of RF nodes, and the length of each slot is $\lambda = T/N$. The length of each substream is denoted as $\delta$. Let d(v) denote the number of RF node hops from the stream origin to node v. Let p(v) denote the priority of node v in a particular arbitration, in which "p=0" means the highest priority. Then, the slot for a node to modulate its substream onto the arbitration channel is $(p\delta + d\lambda)$. The latency of each node can be obtained from its receiver's local sampling clock, which is available in large-scale NoCs.

An example of TDMM is shown in Fig. 4.16. In this example, there are 4 RF nodes. Node A is the stream origin where the curl starts. Assume the current highest priority is rotated to node C and the priority order is in the reverse of the stream travel direction. Then we have: node A: d=0, p=2; node B: d=1, p=1; node C: d=2, p=0; node D: d=3, p=3.
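The slot formula can be evaluated for these four nodes with a few lines of Python. This is a sketch under stated assumptions: the function name `tdmm_slot` is hypothetical, and the numeric values of $\delta$ (10ps) and $\lambda$ (120ps) are placeholders chosen only to make the arithmetic concrete; they are not taken from the design.

```python
def tdmm_slot(p: int, d: int, delta: int, lam: int) -> int:
    """Time offset at which a node modulates its substream: p*delta + d*lambda."""
    return p * delta + d * lam


DELTA_PS = 10    # hypothetical substream length (delta), in ps
LAMBDA_PS = 120  # hypothetical slot length (lambda = T/N), in ps

# (d, p) per node, exactly as listed in the example
nodes = {"A": (0, 2), "B": (1, 1), "C": (2, 0), "D": (3, 3)}

slots = {name: tdmm_slot(p, d, DELTA_PS, LAMBDA_PS)
         for name, (d, p) in nodes.items()}
```

Each entry of `slots` is the value of $p\delta + d\lambda$ for that node under the placeholder $\delta$ and $\lambda$.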
Therefore, nodes A, B, C, and D will modulate their substreams at slots $2\delta$, $\delta + \lambda$, $2\lambda$, and $3\delta + 3\lambda$, respectively. These modulated substreams will finally form a stream that is in the priority order of their nodes when the stream makes the second pass to be read. The proposed TDMM approach can support any arbitrary priority assignment using the formula described above, provided all nodes have unique priorities. Here we adopt the gradual priority reduction scheme discussed in Section 4.4.1.

Figure 4.16: An example of time division modulation multicast for stream augmentation with priority rotation: (a) t=0 and $t=\lambda$: no substream is modulated. (b) $t=2\lambda$: nodes C, B, A modulate their substreams simultaneously. (c) $t=3\lambda$: substreams C, B, A reach nodes D, C, B, respectively. (d) $t=4\lambda$: substreams C, B, A reach nodes A (inner loop), D, C, respectively. Substream C is received by node A. (e) $t=5\lambda$: substreams C, B, A reach nodes B (inner loop), A (inner loop), D, respectively. Substreams C and B are received by nodes B and A, respectively. (f) $t=6\lambda$: node D modulates its substream. Substreams C, B, A reach nodes C (inner loop), B (inner loop), A (inner loop), and are received by them, respectively.

Each node locally keeps a small counter to record its current priority p. Initially, each node v is assigned a priority p(v) = d(v). After each arbitration, it increases its p(v) by 1. When p(v) reaches N, where N is the number of RF nodes, indicating that the node was at the lowest priority, the node resets p(v) to 0, which is the highest priority in the next arbitration.

## 4.4.6 Power and Area Estimation

For power estimation, the predicted power parameters are different between the arbitration channel and data channels.
Although data channels are broadcast links, the arbitration strategy allows them to be treated as point-to-point links in any data communication cycle. On the other hand, due to the large signal attenuation of the multicast link in the arbitration channel, increased power is needed to meet the signal-to-noise ratio (SNR) requirement for the desired bit-error-rate (BER) of $10^{-12}$.

The power and area model values of the point-to-point RF-Interconnect transceivers used in this work, implemented in 32nm CMOS technology, are shown in Table 4.2. The TX and RX power consumptions are predicted (scaled) from our implementation of a multi-band RF-I. In scaling from the 90nm to the 32nm CMOS process, the average power consumption per transceiver channel is assumed to stay constant at about 6mW. The logic behind the assumption is that although RF circuits at higher carrier frequencies require more power, this additional power is compensated by the power saved at the lower carrier frequencies due to the higher f<sub>T</sub> transistors available with scaling. In addition to the increased number of channels, the modulation speed of each carrier would also increase, allowing a higher data rate per channel. As a result, the data rate per channel per wire is predicted as 8Gbps, which results in a power efficiency of 0.75pJ/b. A behavioral model simulation shows that 15GHz channel spacing is sufficient to carry 8Gb/s data with a low BER. Therefore it is projected that 12 carriers can be sent simultaneously on each wire (transmission line) given the 350GHz f<sub>T</sub> of 32nm technology, which indicates a 96Gbps aggregate data rate on each wire. The active area and passive area are also predicted from our 90nm multi-band RF-I prototype.
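The two derived figures in this paragraph follow from simple arithmetic, sketched below in Python. The variable names are illustrative only; the input numbers are those stated above.

```python
power_per_channel_mw = 6.0   # assumed constant power per transceiver channel
data_rate_gbps = 8.0         # predicted data rate per channel per wire
num_carriers = 12            # projected number of simultaneous carriers per wire

# 1 mW / 1 Gbps = 1e-3 J/s / 1e9 b/s = 1e-12 J/b = 1 pJ/b
energy_pj_per_bit = power_per_channel_mw / data_rate_gbps

# Aggregate data rate on one wire (transmission line)
aggregate_gbps = num_carriers * data_rate_gbps
```

Here `energy_pj_per_bit` evaluates to 0.75 and `aggregate_gbps` to 96, matching the 0.75pJ/b and 96Gbps quoted in the text.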
|             | Power (mW) | Power Efficiency (pJ/b) | Active Area        | Passive Area        |
|-------------|------------|-------------------------|--------------------|---------------------|
| TX Mixer    | 1          |                         | 5um x 5um          | 0                   |
| TX PA       | 2.5        |                         | 10um x 10um        | 50um x 50um         |
| Total TX    | 3.5        | 0.44                    | 125um<sup>2</sup>  | 2500um<sup>2</sup>  |
| RX Mixer    | 0.5        |                         | 10um x 10um        | 50um x 50um         |
| RX Baseband | 2          |                         | 20um x 20um        | 0                   |
| Total RX    | 2.5        | 0.31                    | 500um<sup>2</sup>  | 2500um<sup>2</sup>  |

Table 4.2: Power Parameters of Point-to-Point RF Transceiver in 32nm Technology.

|             | Power (mW) | Power Efficiency (pJ/b) | Active Area        | Passive Area        |
|-------------|------------|-------------------------|--------------------|---------------------|
| TX Mixer    | 1          |                         | 5um x 5um          | 0                   |
| TX PA       | 5          |                         | 15um x 15um        | 50um x 50um         |
| Total TX    | 6          | 0.6                     | 250um<sup>2</sup>  | 2500um<sup>2</sup>  |
| RX Mixer    | 3.5        |                         | 20um x 20um        | 50um x 50um         |
| RX Baseband | 2          |                         | 20um x 20um        | 0                   |
| Total RX    | 5.5        | 0.55                    | 800um<sup>2</sup>  | 2500um<sup>2</sup>  |

Table 4.3: Power Parameters of Arbitration RF Transceiver in 32nm Technology.

Table 4.3 shows our power and area model of the arbitration RF-Interconnect transceivers in 32nm CMOS technology. The power consumption is estimated by scaling our implementation of a multi-cast RF-I in 65nm CMOS technology, in a fashion similar to the scaling of the point-to-point RF transceivers. The power efficiency is predicted to be 1.15pJ/b. The number is higher than that of the point-to-point RF transceivers mainly because of the larger channel loss of multi-cast data links. The data rate per channel per wire is predicted as 8Gbps. Therefore 12 carriers provide an aggregate bandwidth of 96Gbps on each wire for the arbitration channels. The active device area is also larger than that of the point-to-point link, by less than 2x, because of the higher gain required for arbitration links (larger devices are implemented).
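As a cross-check of the area and efficiency statements, the totals can be recomputed from the per-block entries of Tables 4.2 and 4.3. The sketch below uses hypothetical variable names, takes the table values as given, and reads the "less than 2x" statement as applying to the combined TX plus RX active area (an interpretation, not something stated in the text).

```python
# (width_um, height_um) of the active devices listed in Tables 4.2 and 4.3
p2p_active = {"TX": [(5, 5), (10, 10)], "RX": [(10, 10), (20, 20)]}
arb_active = {"TX": [(5, 5), (15, 15)], "RX": [(20, 20), (20, 20)]}


def total_area(blocks):
    """Sum of width * height over a list of rectangular blocks, in um^2."""
    return sum(w * h for w, h in blocks)


p2p_total = sum(total_area(b) for b in p2p_active.values())
arb_total = sum(total_area(b) for b in arb_active.values())
area_ratio = arb_total / p2p_total

# Combined TX + RX power efficiency of the arbitration transceiver (pJ/b)
arb_efficiency = round(0.6 + 0.55, 2)
```

Under that reading, `area_ratio` stays below 2 and `arb_efficiency` reproduces the 1.15pJ/b quoted above.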
Because the power overhead is essentially the charging and discharging of the bias transistor's gate capacitance, we configure the RF transmitters and receivers with simple logic gates that control their bias stage, so that the RF transmitters and receivers can be turned off to save power when they are not in use. For example, a 50fF gate capacitance of the receiver bias transistor implies about 25fJ of energy consumption for turning it on and off, which is substantially smaller than the energy for demodulating one bit from the arbitration channel (>1pJ/b). The speed of this power switching depends on the driving strength of the controlling logic gates. In 32nm technology, the switching time is expected to be well below 0.05ns.

## 4.4.7 Results and Discussions

With the algorithm and RF-I modeled in Sections 4.4.1 to 4.4.6, two top-level systems, hierarchical stream arbitration (HStream) and flat stream arbitration (FStream), are proposed in [66]. These top-level systems and their detailed evaluations are not presented in this thesis, since they are not the focus of the work. In conclusion, the proposed stream arbitration achieves upwards of 40% reduction in average flit transmission latency, which is significant in large NoC designs, while making effective use of scarce network resources. Additionally, stream arbitration scales well with the number of communicating nodes, and accommodates both low and high traffic solutions without degradation. For more details, readers can refer to reference [66].

# 4.5 Conclusion

A novel bi-directional multi-drop RF-Interconnect with arbitration is reported. It achieves 5Gbps with a 60GHz carrier and compact quarter-wavelength directional couplers to avoid reflection in the multi-drop system. This is the first CMOS on-chip interconnect demonstrated with full multi-drop and arbitration capabilities. With the prototype as a hardware guideline, its application in a future large NoC is also investigated.
Based on the prototype, time division modulation multicast is proposed for stream augmentation, and a power and area model is developed. This could lead to more advanced NoC systems that have lower flit transmission latency and make effective use of network resources.

## REFERENCES

- [1] J. Mitola, "The software radio architecture," *IEEE Commun. Mag.*, vol. 33, no. 5, pp. 26-38, May 1995.
- [2] R. Bagheri, A. Mirzaei, S. Chehrazi, M. Heidari, M. Lee, M. Mikhemar, W. Tang, and A. Abidi, "An 800MHz to 5GHz software-defined radio receiver in 90nm CMOS," in *IEEE ISSCC Dig. 2006*, Feb. 2006, pp. 1932-1941.
- [3] R. Bagheri, A. Mirzaei, S. Chehrazi, M. Heidari, M. Lee, M. Mikhemar, W. Tang, and A. Abidi, "An 800MHz-6GHz software-defined wireless receiver in 90nm CMOS," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2860-2876, Dec. 2006.
- [4] A. Abidi, "The path to the software-defined radio receiver," *IEEE J. Solid-State Circuits*, vol. 42, no. 5, pp. 954-966, May 2007.
- [5] F. Bruccoleri, E. Klumperink, and B. Nauta, "Noise cancelling in wideband CMOS LNAs," in *IEEE ISSCC Dig. 2002*, Feb. 2002, pp. 406-407.
- [6] ——, "Wide-band CMOS low-noise amplifier exploiting thermal noise canceling," *IEEE J. Solid-State Circuits*, vol. 39, no. 2, pp. 275-282, Feb. 2004.
- [7] A. Abidi, "The path to the software-defined radio receiver," *IEEE J. Solid-State Circuits*, vol. 42, no. 5, pp. 954-966, May 2007.
- [8] A. Abidi, "The path to the software-defined radio receiver," *IEEE J. Solid-State Circuits*, vol. 42, no. 5, pp. 954-966, May 2007.
- [9] D. Murphy, A. Hafez, A. Mirzaei, M. Mikhemar, H. Darabi, M.-C. F. Chang, and A. Abidi, "A blocker-tolerant wideband noise-cancelling receiver with a 2dB noise figure," in *IEEE ISSCC Dig. 2012*, Feb. 2012, pp. 74-75.
- [10] D. Murphy, H. Darabi, A. Abidi, A. Hafez, A. Mirzaei, M. Mikhemar, and M.-C. F. Chang, "A blocker-tolerant, noise-cancelling receiver suitable for wideband wireless applications," *IEEE J.
Solid-State Circuits*, vol. 47, no. 12, pp. 2943-2963, Dec. 2012.
- [11] Z. Ru, E. Klumperink, G. Wienk, and B. Nauta, "A software-defined radio receiver architecture robust to out-of-band interference," in *IEEE ISSCC Dig. 2009*, Feb. 2009, pp. 230-231.
- [12] Z. Ru, N. Moseley, E. Klumperink, and B. Nauta, "Digitally enhanced software-defined radio receiver robust to out-of-band interference," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3359-3375, Dec. 2009.
- [13] J. Borremans, G. Mandal, V. Giannini, T. Sano, M. Ingels, B. Verbruggen, and J. Craninckx, "A 40nm CMOS highly linear 0.4-6GHz receiver resilient to 0dBm out-of-band blockers," in *IEEE ISSCC Dig. 2011*, Feb. 2011, pp. 62-64.
- [14] J. Borremans, G. Mandal, V. Giannini, B. Debaillie, M. Ingels, T. Sano, B. Verbruggen, and J. Craninckx, "A 40nm CMOS 0.4-6GHz receiver resilient to out-of-band blockers," *IEEE J. Solid-State Circuits*, vol. 46, no. 7, pp. 1659-1671, Jul. 2011.
- [15] E. Hegazi, H. Sjoland, and A. Abidi, "A filtering technique to lower *LC* oscillator phase noise," *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1921-1930, Dec. 2001.
- [16] D. Murphy, H. Darabi, and Hao Wu, "A VCO with implicit common-mode resonance," in *IEEE ISSCC Dig. 2015*, Feb. 2015 [Accepted].
- [17] D. B. Leeson, "A simple model of feedback oscillator noise spectrum," *Proc. IEEE*, vol. 54, pp. 329-330, 1966.
- [18] A. Abidi, "Phase noise and jitter in CMOS ring oscillator," *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1803-1816, Aug. 2006.
- [19] G. Moore, "Cramming more components onto integrated circuits," *Electronics*, vol. 38, no. 8, Apr. 1965.
- [20] M. Mikhemar, D. Murphy, A. Mirzaei, and H. Darabi, "A phase-noise and spur filtering technique using reciprocal-mixing cancellation," in *IEEE ISSCC Dig. 2013*, Feb. 2013, pp. 86-88.
- [21] ——, "A cancellation technique for reciprocal-mixing caused by phase noise and spurs," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3080-3089, Dec.
2013.
- [22] C. Andrews and A. Molnar, "A passive-mixer-first receiver with baseband-controlled RF impedance matching, <6dB NF, and >27dBm wideband IIP3," in *IEEE ISSCC Dig. 2010*, Feb. 2010, pp. 46-47.
- [23] ——, "Implication of passive mixer transparency for impedance matching and noise figure in passive mixer-first receivers," *IEEE Trans. Circuits Syst. I*, vol. 57, no. 12, pp. 3092-3103, Dec. 2010.
- [24] ——, "A passive-mixer-first receiver with digitally-controlled and widely tunable RF interface," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2696-2708, Dec. 2010.
- [25] K. Kurokawa, "Noise in synchronized oscillators," *IEEE Trans. Microw. Theory Tech.*, vol. MTT-16, no. 4, pp. 234-240, Apr. 1968.
- [26] J. Lee and H. Wang, "Study of subharmonically injection-locked PLLs," *IEEE J. Solid-State Circuits*, vol. 44, no. 5, pp. 1539-1553, May 2009.
- [27] J. Chien, P. Upadhyaya, H. Jung, S. Chen, W. Fang, A. Niknejad, J. Savoy, and K. Chang, "A pulse-position-modulation phase-noise-reduction technique for a 2-16GHz injection-locked ring oscillator in 20nm CMOS," in *IEEE ISSCC Dig. 2014*, Feb. 2014, pp. 52-53.
- [28] T. Osborne and C. Elmendorf, "Injection-locked avalanche diode oscillator FM receiver," *Proc. IEEE*, pp. 214-215, Feb. 1969.
- [29] R. Adler, "A study of locking phenomena in oscillators," *Proc. IEEE*, vol. 61, no. 10, pp. 1380-1385, Oct. 1973.
- [30] A. Mirzaei and A. Abidi, "The spectrum of a noisy free-running oscillator explained by random frequency pulling," *IEEE Trans. Circuits Syst. I*, vol. 57, no. 3, pp. 642-653, Mar. 2010.
- [31] C. Ruthroff, "Injection-locked-oscillator FM receiver analysis," *The Bell System Technical Journal*, pp. 1653-1661, Oct. 1968.
- [32] H. Darabi and A. Abidi, "Noise in RF-CMOS mixers: a simple physical model," *IEEE J. Solid-State Circuits*, vol. 35, no. 1, pp. 15-25, Jan. 2000.
- [33] S. Zhou and M.-C. F.
Chang, "A CMOS passive mixer with low flicker noise for low-power direct-conversion receiver," *IEEE J. Solid-State Circuits*, vol. 40, no. 5, pp. 1084-1093, May 2005.
- [34] D. Murphy, A. Mirzaei, H. Darabi, M.-C. F. Chang, and A. Abidi, "An LTV analysis of the frequency translational noise-cancelling receiver," *IEEE Trans. Circuits Syst. I*, vol. 61, no. 1, pp. 266-279, Jan. 2014.
- [35] D. Murphy, H. Darabi, and H. Xu, "A noise-cancelling receiver with enhanced resilience to harmonic blockers," in *IEEE ISSCC Dig. 2014*, Feb. 2014, pp. 68-69.
- [36] A. Mirzaei and H. Darabi, "Analysis of imperfections on performance of 4-phase passive-mixer-based high-Q bandpass filters in SAW-less receivers," *IEEE Trans. Circuits Syst. I*, vol. 58, no. 5, pp. 879-892, May 2011.
- [37] A. Mirzaei, M. Mikhemar, D. Murphy, and H. Darabi, "A 2dB NF receiver with 10mA battery current suitable for coexistence applications," *IEEE J. Solid-State Circuits*, vol. 49, no. 4, pp. 972-983, Apr. 2014.
- [38] A. Mirzaei, H. Darabi, and D. Murphy, "A low-power process-scalable superheterodyne receiver with integrated high-Q filters," in *IEEE ISSCC Dig. 2011*, Feb. 2011, pp. 60-61.
- [39] ——, "A low-power process-scalable super-heterodyne receiver with integrated high-Q filters," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 2920-2932, Dec. 2011.
- [40] J. Lu, N. Wang, and M.-C. F. Chang, "A single-LC-tank 5-10GHz quadrature local oscillator for cognitive radio applications," in *Proc. RFIC Symp. 2011*, Jun. 2011, pp. 1-4.
- [41] ——, "A compact and low power 5-10GHz quadrature local oscillator for cognitive radio applications," *IEEE J. Solid-State Circuits*, vol. 47, no. 5, pp. 1130-1140, May 2012.
- [42] J. Borremans, B. Liempd, E. Martens, S. Cha, and J. Craninckx, "A 0.9V low-power 0.4-6GHz linear SDR receiver in 28nm CMOS," in *Proc. Symp. VLSI Circuits 2013*, June 2013, pp. 146-147.
- [43] B. Liempd, J. Borremans, E. Martens, S. Cha, H. Suys, B. Verbruggen, and J.
Craninckx, "A 0.9V 0.4-6GHz harmonic recombination SDR receiver in 28nm CMOS with HR3/HR5 and IIP2 calibration," *IEEE J. Solid-State Circuits*, vol. 49, no. 8, pp. 1815-1826, Aug. 2014.
- [44] I. Fabiano, M. Sosio, A. Liscidini, and R. Castello, "SAW-less analog front-end receivers for TDD and FDD," in *IEEE ISSCC Dig. 2013*, Feb. 2013, pp. 83-84.
- [45] Hao Wu, M. Mikhemar, D. Murphy, H. Darabi, and M.-C. F. Chang, "A highly-linear inductor-less wideband receiver with phase and thermal noise cancellation," in *IEEE ISSCC Dig. 2015*, Feb. 2015 [Accepted].
- [46] N. Wang, H. Wu, J.-C. Liu, J. Lu, H.-H. Hsieh, P.-Y. Wu, C. Jou, and M.-C. F. Chang, "A 60dB gain and 4dB noise figure CMOS V-band receiver based on two-dimensional passive Gm-enhancement," in *Proc. IEEE RFIC Symp. 2011*, Jun. 2011, pp. 1-4.
- [47] V. Vidojkovic, G. Mangraviti, K. Khalaf, V. Szortyka, K. Vaesen, W. V. Thillo, B. Parvais, M. Libois, S. Thijs, J. R. Long, C. Soens, and P. Wambacq, "A Low-Power 57-to-66GHz Transceiver in 40nm LP CMOS with -17dB EVM at 7Gb/s," in *IEEE ISSCC Dig. 2012*, Feb. 2012, pp. 268-269.
- [48] F. Vecchi, S. Bozzola, M. Pozzoni, D. Guermandi, E. Temporiti, M. Repossi, U. Decanis, A. Mazzanti, and F. Svelto, "A wideband mm-wave CMOS receiver for Gb/s communications employing inter-stage coupled resonators," in *IEEE ISSCC Dig. 2010*, Feb. 2010, pp. 220-221.
- [49] F. Vecchi, S. Bozzola, E. Temporiti, D. Guermandi, M. Pozzoni, M. Repossi, M. Cusmai, U. Decanis, A. Mazzanti, and F. Svelto, "A wideband receiver for multi-Gbit/s communications in 65nm CMOS," *IEEE J. Solid-State Circuits*, vol. 46, no. 3, pp. 551-561, Mar. 2011.
- [50] Hao Wu, N. Wang, Y. Du, Y. Kuan, F. Hsiao, S. Lee, M. Tsai, C. Jou, and M.-C. F. Chang, "A Current-Mode mm-Wave Direct-Conversion Receiver with 7.5 GHz Bandwidth, 3.8 dB Minimum Noise-Figure and +1dBm P<sub>1dB,out</sub> linearity for high data rate communications," in *Proc. IEEE RFIC Symp. 2013*, Jun. 2013, pp. 89-92.
- [51] Z. Ru, E. A.
M. Klumperink, C. Saavedra, and B. Nauta, "A 300-800 MHz Tunable Filter and Linearized LNA Applied in a Low-Noise Harmonic-Rejection RF-Sampling Receiver," *IEEE J. Solid-State Circuits*, vol. 45, no. 5, pp. 967-978, May 2010.
- [52] B.-J. Huang, K.-Y. Lin, and H. Wang, "Millimeter-wave low power and miniature CMOS multicascode low-noise amplifiers with noise reduction topology," *IEEE Trans. Microw. Theory Techn.*, vol. 57, no. 12, pp. 3049-3059, Dec. 2009.
- [53] W. L. Chan and J. R. Long, "A 58-65GHz neutralized CMOS power amplifier with PAE above 10% at 1-V supply," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 554-564, Mar. 2010.
- [54] B. K. Thandri and J. Silva-Martinez, "A robust feedforward compensation scheme for multistage operational transconductance amplifiers with no Miller capacitors," *IEEE J. Solid-State Circuits*, vol. 38, no. 2, pp. 237-243, Feb. 2003.
- [55] A. Mirzaei, H. Darabi, J. C. Leete, and Y. Chang, "Analysis and optimization of direct-conversion receivers with 25% duty-cycle current driven passive mixers," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 9, pp. 2353-2366, Sep. 2010.
- [56] A. Tang, D. Murphy, F. Hsiao, G. Virbila, Y.-H. Wang, H. Wu, Y. Kim, and M.-C. F. Chang, "A *D*-band CMOS transmitter with IF-envelop feed-forward pre-distortion and injection-locked frequency-tripling synthesizer," *IEEE Trans. Microw. Theory Techn.*, vol. 60, no. 12, pp. 4129-4137, Dec. 2012.
- [57] D. Cai, Y. Shang, H. Yu, and J. Ren, "Design of ultra-low-power 60-GHz direct-conversion receivers in 65-nm CMOS," *IEEE Trans. Microw. Theory Techn.*, vol. 61, no. 9, pp. 3360-3372, Sep. 2013.
- [58] Hao Wu, "Low Noise RF CMOS Circuits and Systems for Wireless Communications," Ph.D. dissertation, University of California, Los Angeles, [in press].
- [59] M. F. Chang, J. Cong, A. Kaplan, M. Naik, G. Reinman, E. Socher, and S.-W. Tam, "CMP network-on-chip overlaid with multi-band RF-Interconnect," in *Proc. IEEE HPCA Symp. 2008*, Feb.
2008, pp. 191-202.
- [60] M. F. Chang, J. Cong, A. Kaplan, C. Liu, M. Naik, J. Premkumar, G. Reinman, E. Socher, and S.-W. Tam, "Power reduction of CMP communication networks via RF-Interconnect," in *Proc. IEEE/ACM Sym. Micro. 2008*, Nov. 2008, pp. 376-387.
- [61] S.-W. Tam, E. Socher, A. Wong, and M. F. Chang, "A simultaneous tri-band on-chip RF-Interconnect for future Network-on-Chip," in *Proc. Symp. VLSI Circuits 2009*, June 2009, pp. 90-91.
- [62] Y. Pan, P. Kumar, J. Kim, G. Memik, Y. Zhang, and A. Choudhary, "Firefly: Illuminating future Network-on-Chip with Nanophotonics," in *Proc. IEEE ACM ISCA 2009*, Jun. 2009, pp. 429-440.
- [63] H. Ito, M. Kimura, K. Miyashita, T. Ishii, K. Okada, and K. Masu, "A bidirectional- and multi-drop-transmission-line interconnect for multipoint-to-multipoint on-chip communications," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 1020-1029, Apr. 2008.
- [64] J. Postman and P. Chiang, "A survey addressing on-chip interconnect: energy and reliability considerations," *ISRN Electronics*, vol. 2012, pp. 1-9, 2012.
- [65] D. Schinkel, E. Mensink, E. Klumperink, E. Tuijl, and B. Nauta, "Low-power, high-speed transceivers for Network-on-Chip Communication," *IEEE Trans. VLSI Systems*, vol. 17, no. 1, pp. 12-21, Jan. 2009.
- [66] C. Xiao, M.-C. F. Chang, J. Cong, M. Gill, Z. Huang, C. Liu, G. Reinman, and Hao Wu, "Stream arbitration: Towards efficient bandwidth utilization for emerging on-chip interconnects," *ACM Transactions on Architecture and Code Optimization (TACO)*, vol. 9, no. 4, pp. 60, Jan. 2013.
- [67] Hao Wu, L. Nan, S.-W. Tam, H. Hsieh, C. Jou, G. Reinman, J. Cong, and M.-C. F. Chang, "A 60GHz on-chip RF-interconnect with $\lambda/4$ coupler for 5Gbps bi-directional communication and multi-drop arbitration," in *Proc. IEEE CICC 2012*, Sep. 2012, pp. 1-4.
- [68] K. Odaka, K. Matsushita, K. Bunsen, R. Murakami, A. Musa, T. Sato, H. Asada, N. Takayama, N. Li, S. Ito, W. Chaivipas, R. Minami, and A.
Matsuzawa, "A 60GHz 16QAM/8PSK/QPSK/BPSK direct-conversion transceiver for IEEE 802.15.3c," in *IEEE ISSCC Dig. 2011*, Feb. 2011, pp. 160-161.
- [69] A. Siligaris, O. Richard, B. Martineau, C. Mounet, F. Chaix, R. Ferragut, C. Dehos, J. Lanteri, L. Dussopt, S. D. Yamamoto, R. Pilard, P. Busson, A. Cathelin, D. Belot, and P. Vincent, "A 65nm CMOS fully integrated transceiver module for 60GHz Wireless HD applications," in *IEEE ISSCC Dig. 2011*, Feb. 2011, pp. 162-163.
- [70] M. Youssef, A. Zolfaghari, H. Darabi, and A. Abidi, "A low-power wideband polar transmitter for 3G applications," in *IEEE ISSCC Dig. 2011*, Feb. 2011, pp. 378-379.
- [71] M. Youssef, A. Zolfaghari, B. Mohammadi, H. Darabi, and A. Abidi, "A low-power GSM/EDGE/WCDMA polar transmitter in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 3061-3074, Dec. 2011.
- [72] A. Tang, D. Murphy, F. Hsiao, Q. Gu, Z. Xu, G. Virbila, Y.-H. Wang, Hao Wu, L. Nan, Y.-C. Wu, and M.-C. F. Chang, "A CMOS 135-140GHz 0.4dBm EIRP transmitter with 5.1dB P<sub>1dB</sub> extension using IF envelope feed-forward gain compensation," in *IEEE IMS Dig.* 2012, Jun. 2012, pp. 1-3.
- [73] A. Tang, F. Hsiao, D. Murphy, I.-N. Ku, J. Liu, S. D'Souza, N.-Y. Wang, Hao Wu, Y.-H. Wang, M. Tang, G. Virbila, M. Pham, D. Yang, Q. Gu, Y.-C. Wu, Y.-C. Kuan, C. Chien, and M.-C. F. Chang, "A low overhead self-healing embedded system for ensuring high performance yield and long-term sustainability of a 60GHz 4Gbps Radio-on-a-Chip," in *IEEE ISSCC Dig. 2012*, Feb. 2012, pp. 316-318.
- [74] Y. Kim, S.-W. Tam, G.-S. Byun, Hao Wu, L. Nan, G. Reinman, J. Cong, and M.-C. F. Chang, "Analysis of noncoherent ASK modulation-based RF-Interconnect for memory interface," *IEEE J. on Emerging and Selected Topics in Circuits Syst.*, vol. 2, no. 2, pp. 200-209, Jun. 2012.
- [75] T. LaRocca, Y.-C. Wu, R. Snyder, J. Patel, K. Thai, C. Wong, Y. Yang, L. Gilreath, M. Watanabe, Hao Wu, and M.-C. F.
Chang, "A 45GHz CMOS transmitter SoC with digitally-assisted power amplifiers for 64QAM efficiency improvement," in *Proc. IEEE RFIC Symp. 2013*, Jun. 2013, pp. 359-362.
- [76] A. Tang, G. Virbila, Hao Wu, and M.-C. F. Chang, "A 155GHz 220mW synthesizer-free phase based radar system in 65nm CMOS technology," in *IEEE IMS Dig. 2013*, Jun. 2013, pp. 1-3.