BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260402T024533Z
LOCATION:3001\, Level 3
DTSTART;TZID=America/Los_Angeles:20250624T170000
DTEND;TZID=America/Los_Angeles:20250624T171500
UID:dac_DAC 2025_sess121_RESEARCH881@linklings.com
SUMMARY:NDFT: Accelerating Density Functional Theory Calculations via Hard
 ware/Software Co-Design on Near-Data Computing System
DESCRIPTION:Buxin Tu, Qingcai Jiang, Xiaoyu Hao, Junshi Chen, and Hong An 
 (University of Science and Technology of China)\n\nLinear-response time-de
 pendent Density Functional Theory (LR-TDDFT) is a widely used method for a
 ccurately predicting the excited-state properties of physical systems.\nPr
 evious works have attempted to accelerate LR-TDDFT using heterogeneous sys
 tems such as GPUs, FPGAs, and the Sunway architecture.\nHowever, a major d
 rawback of these approaches is the constant data movement between host mem
 ory and the memory of the heterogeneous systems, which results in substant
 ial data movement overhead.\nMoreover, these works focus primarily on
  optimizing the compute-intensive portions of LR-TDDFT, even though the
  calculation steps are fundamentally memory-bound.\nTo address these
  challenges, we propose NDFT, a Near-Data Density Functional Theory
  framework.\nSpecifically, we
  design a novel task partitioning and scheduling mechanism to offload each
  part of LR-TDDFT to the most suitable computing units within a CPU-NDP sy
 stem.\nAdditionally, we implement a hardware/software co-optimization of a
  critical kernel in LR-TDDFT to further enhance performance on the CPU-NDP
  system.\nOur results show that NDFT achieves performance improvements of 
 5.2x and 2.5x over CPU and GPU baselines, respectively, on a large physica
 l system.\n\nTopics: Design\n\nTracks: DES2B: In-memory and Near-memory Co
 mputing Architectures, Applications and Systems\n\nSession Chairs: Arman R
 oohi (University of Illinois, Chicago) and Abhronil Sengupta (Pennsylvania
  State University)\n\n
END:VEVENT
END:VCALENDAR
