Presentation
Me-MPK: Accelerating Krylov Subspace Solvers via Memory-efficient Matrix-Power Kernel
DescriptionThis paper focuses on optimizing the Matrix-Power Kernel (MPK), which relies on a series of Sparse Matrix-Vector multiplications (SpMVs) using the same sparse matrix. MPK is a crucial component of Krylov subspace methods for solving large sparse linear systems in various fields, including circuit simulations. MPK offers a potential for matrix reuse in cache, which can accelerate memory-bound sparse solvers. Additionally, many sparse matrices encountered in applications are symmetric, allowing us to reduce the memory footprint for SpMVs by half. However, reusing the matrix introduces data dependencies between subsequent SpMVs, and symmetric SpMVs can result in data conflicts during shared-memory parallelization. Previous research has often focused on either matrix reuse or symmetry, failing to leverage both aspects effectively.
This paper proposes a unified, memory-efficient approach called Me-MPK that takes advantage of both cache reuse and matrix symmetry for MPK on shared-memory multi-core systems. We first introduce a unified dependency graph for a sparse matrix, which represents all potential data dependencies and conflicts. Next, we perform architecture-aware recursive partitioning on this graph to create subgraphs and formulate a separating subgraph that decouples all dependencies and conflicts among the subgraphs. These independent subgraphs are then scheduled for parallel execution of SpMV or symmetric SpMV in a specified order to optimize cache reuse. We apply Me-MPK in two s-Step Krylov subspace solvers, and our evaluations show that Me-MPK significantly outperforms the current state-of-the-art solutions, delivering an average speedup of up to 2.00X and 1.86X on X86 and ARM CPUs, respectively. As a result, we achieve overall speedup in the sparse solvers of up to 1.65X and 1.58X.
This paper proposes a unified, memory-efficient approach called Me-MPK that takes advantage of both cache reuse and matrix symmetry for MPK on shared-memory multi-core systems. We first introduce a unified dependency graph for a sparse matrix, which represents all potential data dependencies and conflicts. Next, we perform architecture-aware recursive partitioning on this graph to create subgraphs and formulate a separating subgraph that decouples all dependencies and conflicts among the subgraphs. These independent subgraphs are then scheduled for parallel execution of SpMV or symmetric SpMV in a specified order to optimize cache reuse. We apply Me-MPK in two s-Step Krylov subspace solvers, and our evaluations show that Me-MPK significantly outperforms the current state-of-the-art solutions, delivering an average speedup of up to 2.00X and 1.86X on X86 and ARM CPUs, respectively. As a result, we achieve overall speedup in the sparse solvers of up to 1.65X and 1.58X.
Event Type
Research Manuscript
TimeWednesday, June 251:45pm - 2:00pm PDT
Location3003, Level 3
EDA
EDA6: Analog CAD, Simulation, Verification and Test