Research Manuscript: Everything About LLM and Transformer Accelerators
Description
This session provides an in-depth exploration of the latest advances in accelerators designed for large language models (LLMs) and transformers. Attendees will gain insight into the intersection of hardware and AI, focusing on innovations that improve both computational efficiency and memory bandwidth through various quantization and prediction schemes. In particular, the session covers speculation and prediction on QKV computations, quantization schemes including block floating point and the microscaling format, and how to sparsify models and exploit that sparsity. It also covers diffusion-model acceleration and its intersection with compute-in-memory architectures.
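As background for the quantization topics above, the sketch below illustrates the core idea of block floating point, which also underlies the microscaling format: a block of values shares a single power-of-two scale while each element keeps only a short integer mantissa. This is a minimal illustrative example, not code from any of the session's papers; the function name and parameters (block_size, mantissa_bits) are assumptions chosen for clarity.

```python
# Minimal block floating point (BFP) quantization sketch.
# Each block of `block_size` values shares one power-of-two scale;
# elements are stored as small signed integer mantissas.
import numpy as np

def bfp_quantize(x, block_size=32, mantissa_bits=8):
    """Quantize-dequantize a 1-D array block-by-block with a shared scale."""
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    out = np.empty_like(blocks)
    qmax = 2 ** (mantissa_bits - 1) - 1          # e.g. 127 for 8-bit mantissas
    for i, blk in enumerate(blocks):
        max_abs = np.max(np.abs(blk))
        if max_abs == 0.0:
            out[i] = 0.0
            continue
        # Shared exponent: smallest power of two whose scale covers the block's max.
        shared_exp = np.ceil(np.log2(max_abs / qmax))
        scale = 2.0 ** shared_exp
        # Per-element integer mantissas relative to the shared scale.
        mant = np.clip(np.round(blk / scale), -qmax, qmax)
        out[i] = mant * scale                    # dequantized values
    return out.reshape(-1)[: len(x)]

# Example: error stays small when values within a block have similar magnitudes.
w = np.random.randn(64)
print(np.max(np.abs(w - bfp_quantize(w))))
```

The microscaling (MX) format follows the same pattern with small fixed block sizes and low-bit element encodings; several of the presentations below build accelerators around such shared-scale representations.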
Event Type
Research Manuscript
Time
Tuesday, June 24, 3:30pm - 5:30pm PDT
Location
3000, Level 3
Topics
AI
Tracks
AI3: AI/ML Architecture Design
Presentations
3:30pm - 3:45pm PDT
3D-TokSIM: Stacking 3D Memory with Token-Stationary Compute-in-Memory for Speculative LLM Inference
3:45pm - 4:00pm PDT
A Memory-Efficient LLM Accelerator with Q-K Correlation Prediction using Cluster-Based Associative Array for Selective KV Accessing
4:00pm - 4:15pm PDT
Precon: A Precision-Convertible Architecture for Accelerating Quantized Deep Learning Models across Various Domains Including LLMs
4:15pm - 4:30pm PDT
RADiT: Redundancy-Aware Diffusion Transformer Acceleration Leveraging Timestep Similarity
4:30pm - 4:45pm PDT
SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity
4:45pm - 5:00pm PDT
XShift: FPGA-efficient Binarized LLM with Joint Quantization and Sparsification
5:00pm - 5:15pm PDT
BBAL: A Bidirectional Block Floating Point-Based Quantization Accelerator for Large Language Models
5:15pm - 5:30pm PDT
An Algorithm-Hardware Co-design Based on Revised Microscaling Format Quantization for Accelerating Large Language Models