Session
Everything About LLM and Transformer Accelerators
Description: This session provides an in-depth exploration of the latest advances in accelerators designed for large language models (LLMs) and transformers. Attendees will gain insight into the intersection of hardware and AI, focusing on innovations that improve both computational efficiency and memory bandwidth through various quantization and prediction schemes. More specifically, the session covers speculative execution and prediction of QKV computations; quantization schemes, including block floating point and the microscaling (MX) format; and how to sparsify models and exploit that sparsity. It also covers diffusion model acceleration and the intersection with compute-in-memory architectures.
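As a rough illustration of the shared-exponent idea behind the block floating point and MX-style formats named above, the sketch below quantizes each block of values with a single shared exponent and low-bit mantissas. This is a minimal illustrative sketch only, not taken from any of the session's papers; the function name `bfp_quantize` and its parameters are hypothetical.

```python
import numpy as np

def bfp_quantize(x, block_size=32, mantissa_bits=8):
    """Block floating point quantize-dequantize round trip:
    each block of `block_size` values shares one exponent, and
    mantissas are rounded to `mantissa_bits`-bit signed integers.
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    # Shared exponent per block, chosen so the largest magnitude fits.
    max_mag = np.abs(blocks).max(axis=1, keepdims=True)
    exp = np.where(max_mag > 0, np.ceil(np.log2(max_mag + 1e-38)), 0)

    # Scale so mantissas land in [-2^(m-1), 2^(m-1)-1], then round and clip.
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    mant = np.clip(np.round(blocks / scale),
                   -(2 ** (mantissa_bits - 1)),
                   2 ** (mantissa_bits - 1) - 1)

    # Dequantize with the same shared scale to reconstruct the block.
    return (mant * scale).reshape(-1)[:len(x)]

# Example: quantize a weight vector with MX-like blocks of 32 elements.
w = np.random.randn(128)
w_q = bfp_quantize(w, block_size=32, mantissa_bits=8)
print("max abs error:", np.abs(w - w_q).max())
```

Because only one exponent is stored per block, shrinking the mantissa width directly trades accuracy for memory bandwidth, which is the core lever these hardware formats exploit.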
Event Type: Research Manuscript
Time: Tuesday, June 24, 3:30pm - 5:30pm PDT
Location: 3000, Level 3
Topic: AI
Track: AI3: AI/ML Architecture Design