CitedEvidence
User Settings
Open AccessArticle

Sparse Matrix-Vector Multiplication on GPU

Arash Ashari-2014-01-01-OhioLink ETD Center (Ohio Library and Information Network)

TL;DRAbstract

Sparse Matrix-Vector multiplication (SpMV) is one of the key operations in linear algebra. Overcoming thread divergence, load imbalance and un-coalesced and indirect memory access due to sparsity and irregularity are challenges to optimizing SpMV on GPUs. This dissertation develops solutions that address these challenges effectively. The first part of this dissertation focuses on a new blocked row-column (BRC) storage format with a two-dimensional blocking mechanism. It reduces thread divergence by reordering and blocking rows of the input matrix with nearly equal number of non-zero elements onto the same execution units (i.e., warps). BRC improves load balance by partitioning rows into blocks with a constant number of non-zeros such that different warps perform the same amount of work. We also present an approach to optimize BRC performance by judicious selection of block size based on sparsity characteristics of the matrix. Themost commonly used format for a sparsematrix is CSR (Comp

Chat with Paper

AI Agents for this Paper

Sparse Matrix-Vector multiplication (SpMV) is one of the key operations in linear algebra. Overcoming thread divergence, load imbalance and un-coalesced and indirect memory access due to sparsity and irregularity are challenges to optimizing SpMV on GPUs. This dissertation develops solutions that address these challenges effectively. The first part of this dissertation focuses on a new blocked row-column (BRC) storage format with a two-dimensional blocking mechanism. It reduces thread divergence by reordering and blocking rows of the input matrix with nearly equal number of non-zero elements onto the same execution units (i.e., warps). BRC improves load balance by partitioning rows into blocks with a constant number of non-zeros such that different warps perform the same amount of work. We also present an approach to optimize BRC performance by judicious selection of block size based on sparsity characteristics of the matrix. Themost commonly used format for a sparsematrix is CSR (Comp

Keywords

Computer scienceParallel computingThread (computing)RowSparse matrixLinear algebraMultiplication (music)Overhead (engineering)

Chat

Click to start Chat