BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260402T024533Z
LOCATION:3001\, Level 3
DTSTART;TZID=America/Los_Angeles:20250623T133000
DTEND;TZID=America/Los_Angeles:20250623T134500
UID:dac_DAC 2025_sess108_RESEARCH201@linklings.com
SUMMARY:SSFT: Algorithm and Hardware Co-design for Structured Sparse Fine-
 Tuning of Large Language Models
DESCRIPTION:Miao Yu and Trevor E. Carlson (National University of Singapor
 e)\n\nA significant number of users depend on Large Language Models (LLMs)
  for downstream tasks, but training LLMs from scratch remains prohibitivel
 y expensive. Sparse fine-tuning (SFT) has emerged as an effective strategy
  to reduce both the time and memory requirements of fine-tuning LLMs, achi
 eving accuracy on par with fully fine-tuned models. Although SFT has the p
 otential to achieve superior performance by minimizing computational requi
 rements, SFT on GPUs often underperforms compared to dense algorithms like
  LoRA due to sparse data accesses that modern GPUs cannot efficiently hand
 le. To address these issues, we propose Structured Sparse Fine-Tuning (SS
 FT). It comprises a novel algorithm, SSFT-Alg, which introduces predictabl
 e sparsity patterns to reduce memory access overhead and enhance regularit
 y in the SFT process. To support SSFT-Alg, we propose an accelerator, SS
 FT-Hw, whose innovative sparsity-aware design avoids the overhead of spa
 rsity operations on GPUs and improves latency and energy efficiency. Exp
 eriments with relevant models and benchmarks dem
 onstrate that SSFT achieves comparable accuracy to state-of-the-art models
  on BERT, LLaMA 2 7B, and LLaMA 2 13B. Moreover, SSFT-Hw outperforms both 
 GPUs and state-of-the-art sparsity-aware transformer accelerators in t
 hroughput by 51.0× and 1.32×, respectively, while improving energy efficien
 cy by 19.0× and 1.48×.\n\nTopics: AI\n\nTracks: AI3: AI/ML Architecture De
 sign\n\nSession Chairs: Abdelrahman Hosny (Apple Inc.) and Marina Neseem (
 Nvidia)\n\n
END:VEVENT
END:VCALENDAR
