UC San Diego
Accurate temperature sensing and efficient dynamic thermal management in MPSoCs
- Author(s): Sharifi, Shervin
- et al.
Constantly increasing performance demands, more aggressive technology scaling, and higher transistor integration capacity result in continuously increasing power density and temperature in multi-processor System-on-Chip (MPSoC) devices. Dynamic thermal management (DTM) techniques try to avoid thermal violations by enabling the chip to control its temperature at runtime. This requires accurate runtime temperature information, which is typically obtained from on-die thermal sensors. Sensor accuracy can be significantly affected by factors such as sensor degradation and failure, limitations on the number of sensors and their placement, and dynamic changes in hotspot locations. To improve the accuracy of temperature sensing, which directly affects the efficiency of DTM, two techniques are proposed. Accurate direct temperature sensing is a design-time technique for optimum allocation and placement of on-chip thermal sensors. It targets the inaccuracies due to sensor placement and can reduce the number of thermal sensors by 16% on average. Accurate indirect temperature sensing is a runtime technique which targets the sources of inaccuracy that cannot be addressed at design time. Based on inaccurate readings from a few noisy sensors, this method accurately estimates the temperature at any location on the die. It also reduces the mean absolute error and the standard deviation of the errors by up to an order of magnitude. DTM efficiency can be improved by predicting changes in temperature and proactively controlling them, which reduces DTM's response time and performance overhead. We propose a temperature prediction technique called Tempo to accurately evaluate the thermal impact of DTM actions. Compared to previous temperature prediction techniques, Tempo can reduce the maximum prediction error by up to an order of magnitude.
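To make the idea of estimating temperature from noisy sensor readings concrete, the following is a minimal sketch of a scalar Kalman filter applied to a single sensor. The abstract does not name the estimator used for indirect sensing, so the choice of filter, the random-walk temperature model, and all names and parameter values (`kalman_estimate`, `process_var`, `sensor_var`, the synthetic temperature trace) are assumptions made purely for illustration. The sketch covers only noise filtering at one sensor location, not estimation at arbitrary points on the die.

```python
import random


def kalman_estimate(readings, process_var=0.01, sensor_var=4.0):
    """Scalar Kalman filter with a random-walk temperature model.

    process_var and sensor_var are hypothetical tuning values.
    """
    estimate = readings[0]      # start from the first noisy reading
    error_var = sensor_var      # uncertainty of that initial estimate
    estimates = [estimate]
    for z in readings[1:]:
        # Predict: temperature is assumed to drift slowly between samples.
        error_var += process_var
        # Update: blend the prediction with the new noisy reading.
        gain = error_var / (error_var + sensor_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        estimates.append(estimate)
    return estimates


def mean_abs_error(values, truth):
    return sum(abs(v - t) for v, t in zip(values, truth)) / len(truth)


def demo(seed=0, n=500):
    """Compare raw and filtered error on a synthetic, slowly rising trace."""
    rng = random.Random(seed)
    truth = [60.0 + 0.01 * k for k in range(n)]          # degrees C, synthetic
    readings = [t + rng.gauss(0.0, 2.0) for t in truth]  # noisy sensor
    estimates = kalman_estimate(readings)
    return mean_abs_error(readings, truth), mean_abs_error(estimates, truth)
```

Calling `demo()` returns the mean absolute error of the raw readings and of the filtered estimates on the synthetic trace, which gives a rough feel for how filtering can reduce sensing error under these assumptions.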
Heterogeneous MPSoCs, which integrate various types of cores, are at a particular disadvantage from a thermal perspective because of the inherent imbalance in their power density distribution. We present PROMETHEUS, a thermal management framework which systematically performs proactive temperature-aware scheduling for heterogeneous (and homogeneous) MPSoCs. The PROMETHEUS framework provides two alternative temperature-aware scheduling techniques: TempoMP, which uses online optimization for optimal power state assignment to the cores, and TemPrompt, a more scalable technique which is based on a heuristic and has a lower overhead.
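As a rough illustration of proactive, prediction-driven power state assignment, the sketch below predicts each core's next temperature with a first-order RC thermal model and greedily picks the highest power state whose predicted temperature stays within a limit. This is not the TempoMP or TemPrompt algorithm, whose details the abstract does not give. The model, the function names (`predict_temp`, `pick_power_states`), and all parameter values are hypothetical, and thermal coupling between neighboring cores is ignored.

```python
def predict_temp(temp, power, ambient=45.0, r_th=2.0, tau=0.5, dt=0.1):
    """One forward-Euler step of a first-order RC thermal model.

    Assumed model: dT/dt = (ambient + r_th * power - T) / tau.
    All default parameter values are hypothetical.
    """
    steady_state = ambient + r_th * power
    return temp + (dt / tau) * (steady_state - temp)


def pick_power_states(temps, power_states, limit):
    """Greedy per-core choice of power state based on predicted temperature.

    For each core, select the highest power state whose predicted
    temperature does not exceed `limit`; if none qualifies, fall back
    to the lowest power state.
    """
    states = sorted(power_states)
    chosen = []
    for temp in temps:
        safe = [p for p in states if predict_temp(temp, p) <= limit]
        chosen.append(safe[-1] if safe else states[0])
    return chosen
```

For example, with core temperatures of 60.0, 79.5, and 90.0 degrees, power states of 5, 10, and 20 W, and a limit of 80 degrees, the cool core keeps the highest state, the warm core is stepped down before it crosses the limit, and the already overheated core falls back to the lowest state.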