BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260402T024534Z
LOCATION:3001\, Level 3
DTSTART;TZID=America/Los_Angeles:20250625T114500
DTEND;TZID=America/Los_Angeles:20250625T120000
UID:dac_DAC 2025_sess114_RESEARCH1434@linklings.com
SUMMARY:FLAG: An FPGA-Based System for Low-Latency GNN Inference Service U
 sing Vector Quantization
DESCRIPTION:Yunki Han, Taehwan Kim, Jiwan Kim, Seohye Ha, and Lee-Sup Kim 
 (Korea Advanced Institute of Science and Technology (KAIST))\n\nEnabling r
 eal-time GNN inference services requires low end-to-end latency to meet se
 rvice level agreements. However, intensive preparation steps and the neigh
 borhood explosion problem pose significant challenges to efficient GNN inf
 erence serving. In this paper, we propose FLAG, an FPGA-based GNN inferenc
 e serving system using vector quantization. To reduce preparation overhead
 , we introduce offline preprocessing to precompute and compress hidden emb
 eddings for serving. A dedicated FPGA accelerator leverages the precompute
 d data to enable lightweight aggregation. As a result, FLAG achieves avera
 ge speedups of 154×, 176×, and 333× on three GNN models compared to the ba
 seline system.\n\nTopics: AI\n\nTracks: AI4: AI/ML System and Platform Des
 ign\n\nSession Chairs: Xiaoxuan Yang (University of Virginia, Stanford Uni
 versity) and Shihao Song (Nvidia)\n\n
END:VEVENT
END:VCALENDAR
