Presentation
SplitSync: Bank Group-Level Split-Synchronization for High-Performance DRAM PIM
Description
Processing in Memory (PIM) architectures enhance memory bandwidth by utilizing bank-level parallelism, typically implemented with a SIMD structure where all banks operate simultaneously under a single command.
However, this synchronous approach requires the activation of all banks before computation, leading to activation times that exceed computation times, limiting performance gain.
Recently, asynchronous execution PIM has been proposed as an alternative, allowing banks to operate asynchronously and overlap activation with processing to hide the row activation overhead. While effective at reducing row activation overhead, the independent operation requires large shared accumulators for each bank group, increasing area overhead.
To address these issues, we propose bank group (BG)-level split-synchronization DRAM PIM, in which bank groups operate asynchronously with respect to one another to hide row activation overhead, while the banks within each bank group operate synchronously to eliminate the need for shared accumulators. Evaluation results show that the proposed design improves average throughput by 1.70x over conventional PIM and by 1.06x over asynchronous execution PIM.
Furthermore, the area overhead per processing unit (PU) increases by only 1.5% compared to conventional PIM and is significantly lower than that of asynchronous execution PIM.
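The overlap idea described above can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not the evaluation methodology of the presentation: the function names, the shared-command-bus scheduling rule, and every timing value (`t_rrd`, `t_act`, `t_comp`, in arbitrary cycles) are hypothetical and chosen for illustration. The model assumes row activations are issued one bank at a time with a fixed spacing, that a group can start computing only after its last bank's row is open, and that compute does not contend for the command bus.

```python
def sync_all_banks_time(rounds, banks, t_rrd, t_act, t_comp):
    """All banks form one SIMD unit: activate every bank, then compute."""
    # Activations are issued t_rrd apart; the last row opens t_act later.
    act_all = (banks - 1) * t_rrd + t_act
    return rounds * (act_all + t_comp)


def split_sync_time(rounds, banks, groups, t_rrd, t_act, t_comp):
    """Bank groups run independently; banks inside a group stay in lockstep."""
    per_group = banks // groups
    bus_free = 0                    # next cycle the shared command bus is idle
    group_free = [0] * groups       # cycle at which each group finishes computing
    for _ in range(rounds):
        for g in range(groups):
            # A group re-activates only after its own compute is done,
            # but its activation may overlap with other groups' compute.
            start = max(bus_free, group_free[g])
            last_issue = start + (per_group - 1) * t_rrd
            bus_free = last_issue + t_rrd
            group_free[g] = last_issue + t_act + t_comp
    return max(group_free)


if __name__ == "__main__":
    # Hypothetical parameters (arbitrary cycles), not taken from the source.
    params = dict(rounds=10, banks=16, t_rrd=4, t_act=30, t_comp=32)
    t_sync = sync_all_banks_time(**params)
    t_split = split_sync_time(groups=4, **params)
    print(f"all-bank synchronous: {t_sync} cycles")
    print(f"bank-group split sync: {t_split} cycles")
```

With a single group the split model reduces to the all-bank synchronous case, which is a quick sanity check of the sketch. The ratio it produces depends entirely on the assumed parameters and should not be compared with the throughput figures reported above.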
Event Type
Research Manuscript
Time
Wednesday, June 25, 4:00pm - 4:15pm PDT
Location
3001, Level 3
Design
DES2B: In-memory and Near-memory Computing Architectures, Applications and Systems