Design Techniques for Energy-Efficient, Low Latency High Speed Wireline Links
As data and computing systems grow larger, with more elements composing a single system, streamlined computation and data communication place an ever-increasing demand on the throughput of high-speed SerDes. Industry standards have responded to this trend by increasing the data rate of chip I/Os, demanding a doubling of per-pin data rate roughly every four years while the power budget remains the same. This implies that the energy efficiency of these links must improve even as they handle the harsher equalization environments seen at higher frequencies. To address the challenge of per-pin bandwidth, this thesis first presents various receive-side equalization techniques used in a 60 Gb/s non-return-to-zero (NRZ) link. In particular, a double data-rate (DDR) architecture uses current integration in several front-end equalization circuits, including the continuous-time linear equalizer (CTLE), feed-forward equalizer (FFE), and decision-feedback equalizer (DFE). This architecture is demonstrated in a receiver front end that achieves 60 Gb/s operation with >0.2 UI timing margin at a bit error rate (BER) of 1e-9 while consuming 173 mW. The same architecture was used within a complete NRZ transceiver with adaptive equalization, achieving 60 Gb/s with >0.3 UI opening at a BER of 1e-12 while consuming 288 mW and occupying 2.48 mm².
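The scaling argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the thesis itself: the helper names `energy_per_bit_pj` and `required_energy_per_bit_pj` are hypothetical, and the calculation assumes the reported power figures apply at the full 60 Gb/s rate and that the power budget stays exactly constant while the data rate doubles every four years.

```python
def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """Energy per bit in pJ/bit.

    mW / (Gb/s) = 1e-3 W / 1e9 bit/s = 1e-12 J/bit, i.e. pJ/bit.
    """
    return power_mw / data_rate_gbps


def required_energy_per_bit_pj(start_pj: float, years: float,
                               doubling_period_years: float = 4.0) -> float:
    """Energy per bit needed to hold power constant while the per-pin
    data rate doubles every `doubling_period_years` (assumed trend)."""
    return start_pj / 2.0 ** (years / doubling_period_years)


# Figures quoted in the abstract, evaluated at 60 Gb/s.
rx_frontend_pj = energy_per_bit_pj(173.0, 60.0)   # receiver front end
transceiver_pj = energy_per_bit_pj(288.0, 60.0)   # complete transceiver

# Under a fixed power budget, one doubling period halves the allowed pJ/bit.
next_generation_pj = required_energy_per_bit_pj(transceiver_pj, years=4.0)

print(f"receiver front end: {rx_frontend_pj:.2f} pJ/bit")
print(f"transceiver:        {transceiver_pj:.2f} pJ/bit")
print(f"target after one doubling: {next_generation_pj:.2f} pJ/bit")
```

Under these assumptions, the receiver front end works out to roughly 2.9 pJ/bit and the full transceiver to 4.8 pJ/bit, and each doubling of data rate at constant power halves the energy available per bit.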
Furthermore, supporting this throughput in a distributed system with a ubiquitous communication standard calls for links that can quickly turn on and off and operate efficiently in low-utilization modes while retaining the capability for maximum throughput. This thesis then analyzes the requirements that motivate the architectural and circuit-level decisions for a burst-mode, energy-proportional wireline link. To achieve energy proportionality, a 2-tap switched-capacitor transmitter with FFE is presented that allows for a fully dynamic architecture operating at a nominal data rate of 20 Gb/s while maintaining energy efficiency during both high and low link utilization. Additionally, a rapid-on/off voltage-controlled LC oscillator uses resonant clocking to save power by directly driving the data-path capacitive loads while also improving overall latency, and a phase interpolator with a phase-adjustable clock divider allows for the lowest achievable latency design for a 64:1 1-latch serializer implementation. The transmitter was taped out in TSMC’s 28 nm GP process and achieves 1.2 ns startup time and 0.72-0.62 pJ/bit at 1-20 Gb/s while occupying 0.19 mm².
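Energy proportionality means that link power tracks throughput, so energy per bit stays nearly flat across utilization. The sketch below illustrates this with the figures quoted above. It assumes, as the ordering in the text suggests but does not state outright, that 0.72 pJ/bit corresponds to 1 Gb/s and 0.62 pJ/bit to 20 Gb/s; the function name `link_power_mw` is hypothetical.

```python
def link_power_mw(energy_pj_per_bit: float, data_rate_gbps: float) -> float:
    """Average link power in mW.

    pJ/bit * Gb/s = 1e-12 J/bit * 1e9 bit/s = 1e-3 W, i.e. mW.
    """
    return energy_pj_per_bit * data_rate_gbps


# Assumed pairing of the reported endpoints (see lead-in).
low_rate_gbps, low_pj = 1.0, 0.72
high_rate_gbps, high_pj = 20.0, 0.62

low_power_mw = link_power_mw(low_pj, low_rate_gbps)
high_power_mw = link_power_mw(high_pj, high_rate_gbps)

throughput_ratio = high_rate_gbps / low_rate_gbps
power_ratio = high_power_mw / low_power_mw

print(f"power at {low_rate_gbps:g} Gb/s:  {low_power_mw:.2f} mW")
print(f"power at {high_rate_gbps:g} Gb/s: {high_power_mw:.2f} mW")
print(f"throughput scales {throughput_ratio:.0f}x, power scales {power_ratio:.1f}x")
```

With this pairing, power rises from under 1 mW to about 12.4 mW as throughput rises 20x, so power scales almost linearly with delivered bandwidth, which is the behavior an energy-proportional link aims for.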