A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves

Weifeng Liu; A. Li; J. Hogg; IS Duff; Brian Vinter

doi:10.1007/978-3-319-43659-3_45

eScience

39 citations (Scopus)

Abstract

The sparse triangular solve kernel, SpTRSV, is an important building block for a number of numerical linear algebra routines. Parallelizing SpTRSV on today’s manycore platforms, such as GPUs, is not an easy task, since computing a component of the solution may depend on previously computed components, which enforces a degree of sequential processing. As a consequence, most existing work introduces a preprocessing stage that partitions the components into level sets or colour sets, so that the components within a set are independent and can be processed simultaneously during the subsequent solution stage. However, this class of methods requires a long preprocessing time as well as significant runtime synchronization overhead between the sets. To address this, we propose a novel approach for SpTRSV in which the ordering between components is naturally enforced within the solution stage. In this way, the preprocessing cost can be greatly reduced, and the synchronizations between sets are completely eliminated. A comparison with the state-of-the-art library supplied by the GPU vendor, using 11 sparse matrices on the latest GPU device, shows that our approach obtains an average speedup of 2.3 times in single precision and 2.14 times in double precision. The maximum speedups are 5.95 and 3.65 times, respectively. In addition, our method is an order of magnitude faster than existing methods in the preprocessing stage.
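The abstract contrasts two ways of respecting the dependencies in a lower-triangular solve L x = b: partitioning the rows into level sets ahead of time, or enforcing the ordering inside the solution stage itself. The following is a minimal serial Python sketch of both ideas, written under stated assumptions: the column-wise storage `cols`, the functions `level_sets` and `solve_with_counters`, and the per-row `in_degree`/`left_sum` counters are hypothetical names chosen for illustration. The counter scheme is one plausible way to enforce ordering without barriers between sets; it is not the authors' GPU kernel.

```python
# Minimal sketch (hypothetical names and layout, not the authors' GPU kernel).
# L is lower triangular, stored by columns: cols[j] lists (i, value) pairs
# with i >= j, and the diagonal entry (j, L[j][j]) must be present.


def level_sets(cols, n):
    """Level-set preprocessing: group rows that can be solved at the same time."""
    level = [0] * n
    for j in range(n):  # level[j] is final here, since only columns k < j can raise it
        for i, _ in cols[j]:
            if i > j:
                level[i] = max(level[i], level[j] + 1)
    sets = [[] for _ in range(max(level) + 1)] if n else []
    for i in range(n):
        sets[level[i]].append(i)
    return sets


def solve_with_counters(cols, b):
    """Solve L x = b; per-row dependency counters replace barriers between sets."""
    n = len(b)
    in_degree = [0] * n  # number of off-diagonal entries in each row
    for j in range(n):
        for i, _ in cols[j]:
            if i > j:
                in_degree[i] += 1
    left_sum = [0.0] * n  # running sum of L[i][k] * x[k] over already solved k < i
    x = [0.0] * n
    ready = [i for i in range(n) if in_degree[i] == 0]
    while ready:
        j = ready.pop()  # any ready row may go next; no global level order is imposed
        diag = next(v for i, v in cols[j] if i == j)
        x[j] = (b[j] - left_sum[j]) / diag
        for i, v in cols[j]:
            if i > j:
                left_sum[i] += v * x[j]
                in_degree[i] -= 1
                if in_degree[i] == 0:  # every dependency of row i is now solved
                    ready.append(i)
    return x


# L = [[2, 0, 0, 0], [0, 1, 0, 0], [1, 1, 1, 0], [0, 0, 2, 4]], exact x = [1, 2, 3, 4]
cols = [[(0, 2.0), (2, 1.0)], [(1, 1.0), (2, 1.0)], [(2, 1.0), (3, 2.0)], [(3, 4.0)]]
print(level_sets(cols, 4))  # [[0, 1], [2], [3]]
print(solve_with_counters(cols, [2.0, 2.0, 6.0, 22.0]))  # [1.0, 2.0, 3.0, 4.0]
```

Popping rows in last-in-first-out order solves row 1 before row 0 in this example, yet the counters still make row 2 wait for both of them, so no barrier between level sets is needed. A GPU version might instead give each component its own thread or warp that waits on its counter; this serial sketch does not attempt to model that.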

Original language	English
Book series	Lecture Notes in Computer Science
Volume	9833
Pages (from-to)	617-630
ISSN	0302-9743
DOI	https://doi.org/10.1007/978-3-319-43659-3_45
Status	Published - 24 Aug 2016

Access to the document

10.1007/978-3-319-43659-3_45

chp_10.1007_978-3-319-43659-3_45 (publisher's published version, 749 KB)

https://link.springer.com/content/pdf/10.1007/978-3-319-43659-3_45.pdf

Citation formats

@inproceedings{9027ea2465314491b11d634b231e057f,

title = "A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves",

abstract = "The sparse triangular solve kernel, SpTRSV, is an important building block for a number of numerical linear algebra routines. Parallelizing SpTRSV on today{\textquoteright}s manycore platforms, such as GPUs, is not an easy task, since computing a component of the solution may depend on previously computed components, which enforces a degree of sequential processing. As a consequence, most existing work introduces a preprocessing stage that partitions the components into level sets or colour sets, so that the components within a set are independent and can be processed simultaneously during the subsequent solution stage. However, this class of methods requires a long preprocessing time as well as significant runtime synchronization overhead between the sets. To address this, we propose a novel approach for SpTRSV in which the ordering between components is naturally enforced within the solution stage. In this way, the preprocessing cost can be greatly reduced, and the synchronizations between sets are completely eliminated. A comparison with the state-of-the-art library supplied by the GPU vendor, using 11 sparse matrices on the latest GPU device, shows that our approach obtains an average speedup of 2.3 times in single precision and 2.14 times in double precision. The maximum speedups are 5.95 and 3.65 times, respectively. In addition, our method is an order of magnitude faster than existing methods in the preprocessing stage.",

author = "Weifeng Liu and A. Li and J. Hogg and IS Duff and Brian Vinter",

year = "2016",

month = aug,

day = "24",

doi = "10.1007/978-3-319-43659-3_45",

language = "English",

volume = "9833",

pages = "617--630",

series = "Lecture Notes in Computer Science",

issn = "0302-9743",

publisher = "Springer",

}

TY - GEN

T1 - A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves

AU - Liu, Weifeng

AU - Li, A.

AU - Hogg, J.

AU - Duff, IS

AU - Vinter, Brian

PY - 2016/8/24

Y1 - 2016/8/24

N2 - The sparse triangular solve kernel, SpTRSV, is an important building block for a number of numerical linear algebra routines. Parallelizing SpTRSV on today’s manycore platforms, such as GPUs, is not an easy task, since computing a component of the solution may depend on previously computed components, which enforces a degree of sequential processing. As a consequence, most existing work introduces a preprocessing stage that partitions the components into level sets or colour sets, so that the components within a set are independent and can be processed simultaneously during the subsequent solution stage. However, this class of methods requires a long preprocessing time as well as significant runtime synchronization overhead between the sets. To address this, we propose a novel approach for SpTRSV in which the ordering between components is naturally enforced within the solution stage. In this way, the preprocessing cost can be greatly reduced, and the synchronizations between sets are completely eliminated. A comparison with the state-of-the-art library supplied by the GPU vendor, using 11 sparse matrices on the latest GPU device, shows that our approach obtains an average speedup of 2.3 times in single precision and 2.14 times in double precision. The maximum speedups are 5.95 and 3.65 times, respectively. In addition, our method is an order of magnitude faster than existing methods in the preprocessing stage.

AB - The sparse triangular solve kernel, SpTRSV, is an important building block for a number of numerical linear algebra routines. Parallelizing SpTRSV on today’s manycore platforms, such as GPUs, is not an easy task, since computing a component of the solution may depend on previously computed components, which enforces a degree of sequential processing. As a consequence, most existing work introduces a preprocessing stage that partitions the components into level sets or colour sets, so that the components within a set are independent and can be processed simultaneously during the subsequent solution stage. However, this class of methods requires a long preprocessing time as well as significant runtime synchronization overhead between the sets. To address this, we propose a novel approach for SpTRSV in which the ordering between components is naturally enforced within the solution stage. In this way, the preprocessing cost can be greatly reduced, and the synchronizations between sets are completely eliminated. A comparison with the state-of-the-art library supplied by the GPU vendor, using 11 sparse matrices on the latest GPU device, shows that our approach obtains an average speedup of 2.3 times in single precision and 2.14 times in double precision. The maximum speedups are 5.95 and 3.65 times, respectively. In addition, our method is an order of magnitude faster than existing methods in the preprocessing stage.

U2 - 10.1007/978-3-319-43659-3_45

DO - 10.1007/978-3-319-43659-3_45

M3 - Conference article

SN - 0302-9743

VL - 9833

SP - 617

EP - 630

JO - Lecture notes in computer science

JF - Lecture notes in computer science

ER -
