BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260402T024533Z
LOCATION:3001\, Level 3
DTSTART;TZID=America/Los_Angeles:20250625T113000
DTEND;TZID=America/Los_Angeles:20250625T114500
UID:dac_DAC 2025_sess114_RESEARCH429@linklings.com
SUMMARY:PaSK: Cold Start Mitigation for Inference with Proactive and Selec
 tive Kernel Loading on GPUs
DESCRIPTION:Xuanteng Huang, Jiangsu Du, Nong Xiao, and Xianwei Zhang (Sun 
 Yat-sen University)\n\nToday, DNN inference is widely adopted, with numero
 us inference services being spawned from scratch across instances in scena
 rios such as spot serving, serverless scaling and edge computing, where fr
 equent start-stops are required. In this work, we first examine the infere
 nce workflow and uncover the origins of cold start when invoking a DNN mod
 el. Specifically, DNN execution is blocked by the kernel loading proce
 ss, in which the DL primitive library (e.g., cuDNN and MIOpen) prepares th
 e code objects that execute on the GPU. To tackle this, we propose PAS
 K, a kernel loading a
 nd reusing middleware to mitigate the widespread cold start issue. Unlike 
 the reactive kernel scheduling policy used by existing frameworks, PASK ad
 opts a proactive strategy to interleave code loading, kernel issuing and G
 PU computation to achieve higher hardware utilization. To further reduce t
 he loading overhead, PASK recycles already-loaded kernels to carry o
 ut each DNN operator, rather than introducing new kernels for every la
 yer. Meanwh
 ile, PASK categorically organizes the cached kernels to efficiently find t
 he applicable kernel for reuse and thus minimize incurred runtime overhead
 . We implement and evaluate PASK atop an open-source DNN inference engi
 ne and primitive library on off-the-shelf GPUs. Experiments demonstrate th
 at PASK alleviates the cold start overhead of popular DNN models with a 5.
 62x speedup on average.\n\nTopics: AI\n\nTracks: AI4: AI/ML System an
 d Platform Design\n\nSession Chairs: Xiaoxuan Yang (University of Virginia
 , Stanford University) and Shihao Song (Nvidia)\n\n
END:VEVENT
END:VCALENDAR
