BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260402T024534Z
LOCATION:3002\, Level 3
DTSTART;TZID=America/Los_Angeles:20250624T163000
DTEND;TZID=America/Los_Angeles:20250624T164500
UID:dac_DAC 2025_sess117_RESEARCH008@linklings.com
SUMMARY:ALLMod: Exploring Area-Efficiency of LUT-based Large Number Modu
 lar Reduction via Hybrid Workloads
DESCRIPTION:Fangxin Liu, Haomin Li, and Zongwu Wang (Shanghai Jiao Tong Un
 iversity); Bo Zhang, Mingzhe Zhang, and Shoumeng Yan (Ant Group); and Li J
 iang (Shanghai Jiao Tong University)\n\nModular arithmetic, particularly m
 odular reduction, is widely used in cryptographic applications such as hom
 omorphic encryption (HE) and zero-knowledge proofs (ZKP). High-bit-width o
 perations are crucial for enhancing security; however, they are computatio
 nally intensive due to the large number of modular operations required. Th
 e lookup-table-based (LUT-based) approach, a "space-for-time" technique,
  reduces computational load by segmenting the input number into smaller bi
 t groups, pre-computing modular reduction results for each segment, and st
 oring these results in LUTs. While effective, this method incurs significa
 nt hardware overhead due to extensive LUT usage.\nIn this paper, we introd
 uce ALLMod, a novel approach that improves the area efficiency of LUT-base
 d large-number modular reduction by employing hybrid workloads. Inspired b
 y the iterative method, ALLMod splits the bit groups into two distinct wor
 kloads, achieving lower area costs without compromising throughput. We fir
 st develop a template to facilitate workload splitting and ensure balanced
  distribution. Then, we conduct design space exploration to evaluate the o
 ptimal timing for fusing workload results, enabling us to identify the mos
 t efficient design under specific constraints. Extensive evaluations show 
 that ALLMod achieves up to 1.65× and 3× improvements in area efficienc
 y over conventional LUT-based methods for bit-widths of 128 and 8,192, r
 espectively.\n\nTopics: Design\n\nTracks: DES1: SoC, Heterogen
 eous, and Reconfigurable Architectures\n\nSession Chairs: Tianhao Cai (Bei
 hang University) and Dirk Stroobandt (Ghent University)\n\n
END:VEVENT
END:VCALENDAR
