Power consumption and delivery have emerged as major challenges in modern SoC design. As chip designs grow more complex and architectures more aggressive, the demand for efficient power delivery mechanisms is increasing.
Designing efficient voltage regulators becomes harder because current rises as power increases and supply voltage scales down. Off-chip voltage regulators can be made more efficient because they face fewer restrictions on capacitor size, but they must account for the parasitics on the pads. Process variation further exacerbates the problem, because the design must account for the worst case. As technology scales towards the nanoscale and multiple on-chip voltage domains become prevalent, voltage stacking offers an alternative in Power Delivery Network (PDN) design that alleviates the inefficiencies of conventional power delivery.
The first part of this dissertation explores the existing types of level shifters suitable for voltage-stacked logic, their optimal sizing, and the effect of PVT variation on their delay and energy consumption.
In the second part of the dissertation, instead of inserting an SRAM into a single voltage domain, as is commonly done, the SRAM logic itself is divided into multiple domains. The symmetric logic of the SRAM is leveraged for stacking: it is divided into two logic domains, and the supply voltage Vdd is doubled. The supply voltage 2Vdd distributes evenly between the stacks, and the current demand decreases by up to half. Hence, the same amount of power is delivered, but with half the current.
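The arithmetic behind this claim can be sketched as follows. This is a minimal illustration that assumes an ideal, perfectly balanced stack; the function name `supply_current` and the numeric values are hypothetical and are not taken from the dissertation.

```python
def supply_current(power_w, vdd_v, stacked_domains=1):
    """Current drawn from the supply for a given delivered power.

    Assumes (for illustration only) that the stacked domains are perfectly
    balanced, so the supply sits at stacked_domains * vdd_v and every
    domain sees exactly vdd_v.
    """
    return power_w / (stacked_domains * vdd_v)


# Hypothetical operating point: 1 W delivered, Vdd = 0.5 V.
flat_current = supply_current(1.0, 0.5)        # conventional: supply at Vdd
stacked_current = supply_current(1.0, 0.5, 2)  # two stacked domains: supply at 2*Vdd

print(f"flat: {flat_current} A, stacked: {stacked_current} A")
```

In this idealized case the stacked configuration draws half the current for the same delivered power. Any imbalance between the two domains reduces the saving, which is why the text states "up to half".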
The third part of this dissertation builds upon the idea of a floating voltage level in a voltage-stacked system and on the observation that, in the presence of process variation, slower transistors have higher impedance. This chapter offers a GPU stacking method based on voltage stacking to manage the effects of process variation and improve power delivery simultaneously. The evaluation conducted in this dissertation considers Near Threshold Computing (NTC), because the effects of process variation are more severe in this scenario; however, the technique can be applied without NTC. GPU Stacking brings the chip distribution closer to the nominal case, i.e., no process variation, and is shown to be better than simply using multiple clock domains, which is the current state of the art.
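One way to picture why a floating intermediate node can counteract variation is a simple series voltage divider. The sketch below rests on assumptions of my own for illustration: each stacked domain is modeled as a single lumped impedance, and the function name `stack_voltages` and all values are hypothetical. It is not the evaluation model used in the dissertation.

```python
def stack_voltages(v_supply, impedances):
    """Voltage across each series-stacked domain (ideal divider model).

    Assumption for illustration: each domain behaves as one lumped
    impedance, and the node between the domains is left floating.
    """
    total = sum(impedances)
    return [v_supply * z / total for z in impedances]


# Hypothetical values: the slower domain has the higher impedance (1.2 vs 0.8).
slow_v, fast_v = stack_voltages(1.0, [1.2, 0.8])
print(f"slow domain: {slow_v:.2f} V, fast domain: {fast_v:.2f} V")
```

Under this simplified model the higher-impedance (slower) domain settles at the larger share of the supply voltage, which suggests how stacking could compensate for variation without explicit per-domain regulation.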
The final contribution of the dissertation looks at SRAM design, specifically at reducing voltage and timing margins. I propose a timing-speculative SRAM that extends the existing Replica Bit-Line (RBL) technique to detect read timing failures. The SRAM decode logic is also extended to protect against incorrect write operations. The Replica-based Timing Speculative SRAM (RTS) is evaluated as an energy- and area-efficient design alternative to prior techniques such as Razor-enabled SRAMs.