SeIM: In-Memory Acceleration for Approximate Nearest Neighbor Search
Description
Approximate nearest neighbor search (ANNS) is crucial in many applications for finding semantically similar matches to user queries. Especially with the development of large language models (LLMs), ANNS is becoming increasingly important to the retrieval-augmented generation (RAG) technique. An in-depth analysis of ANNS reveals that its diverse operations, from extensive memory accesses to intensive sorting, are the key performance bottlenecks, imposing significant strain on both the memory system and computing resources.
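To make the two cost components concrete, the following minimal Python sketch of a brute-force top-k scan illustrates the generic ANNS cost structure the abstract points to; it is not SeIM's method, and all names in it are hypothetical. Every candidate vector must be streamed from memory (memory-bound), while a sorted top-k structure must be maintained per query (compute-bound).

```python
# Hypothetical brute-force k-nearest-neighbor scan: the distance loop streams the
# whole database through memory, and the heap maintains the sorted top-k results.
import heapq
import numpy as np

def top_k_search(query: np.ndarray, database: np.ndarray, k: int = 10):
    """Return the indices of the k database vectors closest to `query` (L2 distance)."""
    heap = []  # max-heap via negated distances, holding the current best k candidates
    for idx, vec in enumerate(database):
        # Memory-bound part: each candidate vector is read from DRAM.
        dist = float(np.sum((vec - query) ** 2))
        # Compute-bound part: selection/sorting to keep only the k best.
        if len(heap) < k:
            heapq.heappush(heap, (-dist, idx))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, idx))
    return [idx for _, idx in sorted(heap, reverse=True)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = rng.standard_normal((100_000, 128)).astype(np.float32)
    q = rng.standard_normal(128).astype(np.float32)
    print(top_k_search(q, db, k=5))
```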

Based on these observations, we present SeIM, a hierarchical in-memory architecture to accelerate ANNS. SeIM is designed to accommodate the diverse operational characteristics of ANNS. Specifically, SeIM offloads highly parallel memory-bound operations to the memory bank level and introduces a unified execution model to reuse hardware units, requiring only lightweight modifications to the standard DRAM architecture. Additionally, SeIM places compute-bound sorting operations, which require cross-unit data access, at the memory controller level and employs an adaptive transmission filtering technique to reduce unnecessary data transfers and processing during sorting. Our evaluation shows that SeIM achieves 268×, 22×, and 5× higher throughput; 306×, 59×, and 4× lower latency; and 3081×, 287×, and 2× higher power efficiency than state-of-the-art CPU-, GPU-, and ASIC-based ANNS solutions, respectively.
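As a rough software analogue of the adaptive transmission filtering idea described above (the class, threshold policy, and bank partitioning below are assumptions for illustration, not SeIM's actual hardware protocol), bank-level workers can skip forwarding any candidate whose distance already exceeds the current k-th-best result held by the controller-level sorter:

```python
# Illustrative sketch: per-bank scans compute distances locally and only "transmit"
# candidates that can still improve the global top-k kept by a central sorter.
import heapq
import numpy as np

class FilteredTopK:
    def __init__(self, k: int):
        self.k = k
        self.heap = []                     # max-heap of (-dist, idx): current global top-k
        self.threshold = float("inf")      # bound broadcast back to banks as the filter

    def offer(self, dist: float, idx: int) -> bool:
        """Accept a transmitted candidate; returns True if it entered the top-k."""
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (-dist, idx))
        elif dist < self.threshold:
            heapq.heapreplace(self.heap, (-dist, idx))
        else:
            return False
        if len(self.heap) == self.k:
            self.threshold = -self.heap[0][0]  # tighten the filter adaptively
        return True

def bank_scan(query, bank_vectors, start_idx, sorter: FilteredTopK):
    """Bank-level distance scan; candidates that cannot qualify are never transmitted."""
    for offset, vec in enumerate(bank_vectors):
        dist = float(np.sum((vec - query) ** 2))
        if dist < sorter.threshold:        # filter before "transferring" the candidate
            sorter.offer(dist, start_idx + offset)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    banks = rng.standard_normal((8, 1000, 64)).astype(np.float32)  # 8 illustrative "banks"
    q = rng.standard_normal(64).astype(np.float32)
    sorter = FilteredTopK(k=10)
    for b, bank in enumerate(banks):
        bank_scan(q, bank, b * 1000, sorter)
    print(sorted((-d, i) for d, i in sorter.heap))  # results in ascending distance order
```

The tighter the broadcast threshold becomes, the fewer candidates cross the bank-to-controller boundary, which is the kind of transfer and processing reduction that transmission filtering targets.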
Event Type
Research Manuscript
Time
Wednesday, June 25, 5:15pm - 5:30pm PDT
Location
3001, Level 3
Topics
Design
Tracks
DES2B: In-memory and Near-memory Computing Architectures, Applications and Systems