Presentation

Powering AI infrastructure with innovations in reliable in-system/on-die memory design and characterization
Description
Driven by the frantic pace of large language model adoption, AI is dramatically redefining data center infrastructure requirements. For instance, Meta's Llama 3 needed 16,000 Nvidia Hopper GPUs and 70 days to train its 405 billion parameters on 15.6 trillion tokens. Such massive workloads demand not only optimally fast interconnects and innovations in power delivery, but also materially superior design and reliability from compute's eternal execution twin: memory, both on and off chip. Memory requirements for high-performance compute are an increasing challenge, as new versions of LLMs (e.g., Llama 3.1) push per-model-instance memory requirements to nearly a terabyte (854 GB, to be precise), which in turn means that subsequent generations of GPUs (like the H200) and other general-purpose SoCs will have to support significantly larger on-die memory clusters. Designing, characterizing, and delivering reliable memory banks within acceptable time-to-market windows thus becomes a key competency that deserves increasing focus.

Our design team's track record of delivering multiple generations of silicon-proven memory IP in the most advanced process nodes, across multiple foundries and customers, enables us to be a significant part of that focus. Our design experience is enhanced by addressing the advanced testability demands of modern multi-die SoCs, including a proven history of designing on-chip memory diagnostics (enabling real-time fault monitoring without the need for external testers) and supporting pattern diagnosis, debug, flexible repair hierarchies, and quality-of-results optimization: all techniques for the non-intrusive fault tolerance and optimal system performance essential for AI/ML applications.
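To make the diagnostics idea concrete, here is a minimal software sketch of a March C- test, one standard pattern family that on-chip MBIST engines run to detect and localize memory faults without an external tester. Everything in it (the FaultyMemory model, the injected fault, the function names) is hypothetical and for illustration only, not the production IP described above.

```python
# Illustrative sketch: a software model of a March C- memory test, the kind of
# pattern an on-chip MBIST engine applies for real-time fault monitoring.

class FaultyMemory:
    """Hypothetical 1-bit-wide memory model with an optional stuck-at fault."""
    def __init__(self, size, stuck_at_addr=None, stuck_value=0):
        self.cells = [0] * size
        self.stuck_at_addr = stuck_at_addr
        self.stuck_value = stuck_value

    def write(self, addr, value):
        self.cells[addr] = value
        if addr == self.stuck_at_addr:      # fault injection: cell ignores writes
            self.cells[addr] = self.stuck_value

    def read(self, addr):
        return self.cells[addr]

def march_c_minus(mem, size):
    """Run the six March C- elements; return (address, element) failures."""
    failures = []
    up, down = range(size), range(size - 1, -1, -1)
    # Element 1: ascending write 0
    for a in up:
        mem.write(a, 0)
    # Elements 2-5: (read expected value, write its complement) per address order
    for order, expect, new, name in [(up, 0, 1, "r0,w1 up"),
                                     (up, 1, 0, "r1,w0 up"),
                                     (down, 0, 1, "r0,w1 down"),
                                     (down, 1, 0, "r1,w0 down")]:
        for a in order:
            if mem.read(a) != expect:
                failures.append((a, name))
            mem.write(a, new)
    # Element 6: final ascending read 0
    for a in up:
        if mem.read(a) != 0:
            failures.append((a, "r0 final"))
    return failures

if __name__ == "__main__":
    mem = FaultyMemory(size=64, stuck_at_addr=17, stuck_value=1)  # stuck-at-1 cell
    print(march_c_minus(mem, 64))  # reports address 17 across three elements
```

A failing address reported this way is exactly what feeds a repair hierarchy, which can then remap the faulty cell to a redundant row or column without external intervention.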
Double-click on our expertise and you will find Infosys employing robust AI/ML-powered algorithms in the memory design and characterization process. Using these algorithms, we identify critical paths within large memory instances and efficiently predict PPA metrics across process, voltage, temperature, and aging corners. These techniques dramatically reduce memory design times by eliminating the need to run actual simulations across all corners. Any design or feature change requires re-characterization and model retraining only over the corresponding leaf cell(s), while a compiler range change requires no further adjustment. Such innovation, using AI to power the development of future AI platforms, is one of many reasons why Infosys is a dependable partner in delivering the core silicon elements of AI infrastructure.
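As a hedged sketch of how such corner prediction can work (a generic illustration, not Infosys's actual algorithms), the snippet below trains an off-the-shelf regression model on a subset of simulated PVT/aging corners and then predicts access time at an unsimulated corner. The feature set, data ranges, and stand-in timing formula are all invented for demonstration.

```python
# Illustrative sketch: ML-predicted PPA across PVT/aging corners, replacing
# exhaustive simulation with inference from a small simulated training set.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical training set: each row is one simulated corner of a memory
# instance: [supply voltage (V), temperature (C), process index (-1 slow to
# +1 fast), aging (years)]; the target is simulated access time (ns).
X_train = rng.uniform([0.65, -40, -1, 0], [0.9, 125, 1, 10], size=(500, 4))
volt, temp, proc, age = X_train.T
# Invented analytic stand-in for SPICE characterization results.
y_train = 0.8 / volt + 0.002 * temp - 0.1 * proc + 0.01 * age

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Predict an unsimulated corner: 0.72 V, 110 C, slow process, 5-year aging.
corner = np.array([[0.72, 110, -0.8, 5.0]])
print(f"Predicted access time: {model.predict(corner)[0]:.3f} ns")
```

The practical payoff is that once the model is trained on a modest set of simulated corners, each additional corner costs a prediction rather than a full simulation; and, per the description above, a design or feature change requires re-simulating and retraining only the affected leaf cells.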
Event Type
Exhibitor Forum
Time
Monday, June 23, 5:00pm - 5:30pm PDT
Location
Exhibitor Forum, Level 1 Exhibit Hall