Presentation
Chip package level thermal integrity analysis of high-power data center chips for hot spot detection
Description
In advanced-node stacked chips, rising power density pushes the chip toward its thermal operating limits, creating a need to identify thermal hotspots and design appropriate cooling solutions. In this presentation, we demonstrate chip-package-level thermal analysis of a large machine learning accelerator chip, modeling the thermal gradient across the design for multiple distinct vectors.
The design consists of compute dies and high-bandwidth memory chips stacked on an organic interposer. A tile-based power map, generated separately for each vector, includes leakage power as a function of temperature along with switching and internal power. Including this temperature dependence improved accuracy by approximately 3.5°C. Thermal material properties of the on-die routing layers, along with package- and system-level cooling details, were incorporated.
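Because leakage power rises with temperature and temperature rises with power, the two have to be solved together. The sketch below illustrates that coupling for a single tile using a fixed-point iteration. It is not the flow used in this work: the exponential leakage model, the lumped thermal resistance, and every name and number (`solve_tile_temperature`, `leak_coeff_per_c`, the example values) are assumptions chosen for illustration.

```python
import math


def solve_tile_temperature(p_dynamic_w, p_leak_ref_w, r_th_c_per_w,
                           t_ambient_c=25.0, t_ref_c=25.0,
                           leak_coeff_per_c=0.02, tol_c=1e-6, max_iter=200):
    """Iterate temperature and leakage for one tile until they agree.

    Assumed models (hypothetical, for illustration only):
      leakage:  P_leak(T) = p_leak_ref_w * exp(leak_coeff_per_c * (T - t_ref_c))
      thermal:  T = t_ambient_c + r_th_c_per_w * (p_dynamic_w + P_leak(T))
    p_dynamic_w stands for switching plus internal power.
    """
    t = t_ambient_c
    for _ in range(max_iter):
        p_leak = p_leak_ref_w * math.exp(leak_coeff_per_c * (t - t_ref_c))
        t_new = t_ambient_c + r_th_c_per_w * (p_dynamic_w + p_leak)
        if abs(t_new - t) < tol_c:
            return t_new, p_leak
        t = t_new
    raise RuntimeError("no convergence; the assumed models may be in thermal runaway")


# Illustrative values only, not taken from the presented design.
t_coupled, p_leak = solve_tile_temperature(
    p_dynamic_w=2.0, p_leak_ref_w=0.5, r_th_c_per_w=10.0)

# Same tile with leakage frozen at its reference value, for comparison.
t_fixed = 25.0 + 10.0 * (2.0 + 0.5)
```

Under these assumptions the coupled solution comes out hotter than the fixed-leakage estimate, which is the kind of gap a temperature-aware power map is meant to close.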
Nearly 96% of the power was estimated to be dissipated through the heat sink system, compared to just 4% through the package. The heat map and thermal gradient at each individual on-die routing layer were extracted. We identified the hot spots specific to each vector; placing thermal sensors accurately at these locations enabled us to capture on-die temperature, avoid overheating, and minimize thermal failures by providing appropriate feedback to trigger mitigation techniques.
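One simple way to turn per-vector heat maps into candidate sensor locations is to take the hottest tile or tiles of each map and merge them. The sketch below assumes each heat map is a small 2D grid of tile temperatures; the function names, the vector names, and the temperature values are hypothetical and are not data from this work.

```python
def find_hotspots(temp_map_c, top_n=1):
    """Return the top_n hottest tiles of one heat map as (row, col, temp_c)."""
    tiles = [(t, r, c)
             for r, row in enumerate(temp_map_c)
             for c, t in enumerate(row)]
    tiles.sort(reverse=True)  # hottest first
    return [(r, c, t) for t, r, c in tiles[:top_n]]


def sensor_sites(maps_by_vector, top_n=1):
    """Merge the hottest tiles of every vector into one sorted list of (row, col)."""
    sites = set()
    for temp_map in maps_by_vector.values():
        for r, c, _ in find_hotspots(temp_map, top_n):
            sites.add((r, c))
    return sorted(sites)


# Hypothetical 2x2 tile maps in degrees C, one per vector.
maps = {
    "vector_a": [[61.0, 72.5], [65.0, 63.0]],
    "vector_b": [[60.0, 64.0], [78.0, 66.0]],
}

sites = sensor_sites(maps)
```

In this toy example the two vectors peak in different tiles, so each contributes its own candidate site, mirroring the point that hot spots are specific to each vector.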
Event Type
Engineering Poster
Networking
Time
Monday, June 23, 5:00pm - 6:00pm PDT
Location
Engineering Posters, Level 2 Exhibit Hall


