BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260402T024534Z
LOCATION:3001\, Level 3
DTSTART;TZID=America/Los_Angeles:20250625T110000
DTEND;TZID=America/Los_Angeles:20250625T111500
UID:dac_DAC 2025_sess114_RESEARCH417@linklings.com
SUMMARY:DataMaestro: A Versatile and Efficient Data Streaming Engine Bring
 ing Decoupled Memory Access To Dataflow Accelerators
DESCRIPTION:Xiaoling Yi, Yunhao Deng, Ryan Antonio, and Fanchen Kong (KU L
 euven); Guilherme Paim (INESC-ID); and Marian Verhelst (KU Leuven)\n\nDeep
  Neural Networks (DNNs) have achieved remarkable success across various in
 telligent tasks but encounter performance and energy challenges in inferen
 ce execution due to data movement bottlenecks. We introduce DataMaestro, a
  versatile and efficient data streaming unit that brings the decoupled acc
 ess/execute architecture to DNN dataflow accelerators to address this issu
 e. DataMaestro supports flexible and programmable access patterns to acco
 mmodate diverse workload types and dataflows, incorporates fine-grained pr
 efetch and addressing mode switching to mitigate bank conflicts, and enabl
 es customizable on-the-fly data manipulation to reduce memory footprints a
 nd access counts. We integrate five DataMaestros with a Tensor Core-like G
 eMM accelerator and a Quantization accelerator into a RISC-V host system f
 or evaluation. The FPGA prototype and VLSI synthesis results demonstrate t
 hat DataMaestro helps the GeMM core achieve nearly 100% utilization, which
  is 1.05-21.39× better than state-of-the-art solutions, while minimizing a
 rea and energy consumption to merely 6.43% and 15.06% of the total system.
 \n\nTopics: AI\n\nTracks: AI4: AI/ML System and Platform Design\n\nSession
  Chairs: Xiaoxuan Yang (University of Virginia, Stanford University) and S
 hihao Song (Nvidia)
END:VEVENT
END:VCALENDAR
