Strategies for regular segmented reductions on GPU

Rasmus Wriedt Larsen; Troels Henriksen

doi:10.1145/3122948.3122952

Department of Computer Science (Datalogisk Institut)

Abstract

We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.
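The abstract does not spell out the three strategies. The Python sketch below therefore pins down only what a *regular* segmented reduction computes: every segment has the same length, and the associative operator `op` has the neutral element `ne`. It contrasts that with a generic scan-based formulation: flag segment starts, run a segmented inclusive scan, and keep the last element of each segment. This is a textbook construction, not necessarily the previous Futhark implementation. Finally, it shows a hypothetical runtime dispatcher. The function names, the strategy labels and the thresholds in `choose_strategy` are assumptions for illustration. They are not Futhark's API or the paper's actual heuristics.

```python
import operator
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def _check_shape(flat: Sequence[T], num_segments: int, segment_size: int) -> None:
    # In a *regular* segmented reduction every segment has the same length,
    # so the flat input must hold exactly num_segments * segment_size items.
    if num_segments < 0 or segment_size < 0:
        raise ValueError("segment counts and sizes must be non-negative")
    if len(flat) != num_segments * segment_size:
        raise ValueError("flat array does not match the segment shape")


def segmented_reduce(op: Callable[[T, T], T], ne: T, flat: Sequence[T],
                     num_segments: int, segment_size: int) -> List[T]:
    """Reference semantics: reduce each contiguous segment with `op`.

    `op` is assumed associative with neutral element `ne`; associativity is
    what allows a parallel implementation to regroup the work.
    """
    _check_shape(flat, num_segments, segment_size)
    result: List[T] = []
    for s in range(num_segments):
        acc = ne
        for x in flat[s * segment_size:(s + 1) * segment_size]:
            acc = op(acc, x)
        result.append(acc)
    return result


def segmented_reduce_via_scan(op: Callable[[T, T], T], ne: T, flat: Sequence[T],
                              num_segments: int, segment_size: int) -> List[T]:
    """Generic scan-based formulation, written sequentially here.

    A segmented inclusive scan is followed by picking the last scanned
    element of each segment; on a GPU the scan itself would run in parallel.
    """
    _check_shape(flat, num_segments, segment_size)
    if segment_size == 0:
        return [ne] * num_segments  # empty segments reduce to the neutral element
    scanned: List[T] = []
    acc = ne
    for i, x in enumerate(flat):
        if i % segment_size == 0:  # implicit flag: a new segment starts here
            acc = ne
        acc = op(acc, x)
        scanned.append(acc)
    # The scan materialises a full-size intermediate array even though
    # only num_segments values are needed at the end.
    return [scanned[(s + 1) * segment_size - 1] for s in range(num_segments)]


def choose_strategy(num_segments: int, segment_size: int,
                    group_size: int, num_groups: int) -> str:
    """Hypothetical runtime dispatch; labels and thresholds are illustrative."""
    if segment_size <= group_size // 2:
        return "small-segments"     # pack several segments into one workgroup
    if num_segments >= num_groups:
        return "group-per-segment"  # enough segments to give every workgroup one
    return "large-segments"         # several workgroups cooperate on one segment


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]  # two segments of length 3
    print(segmented_reduce(operator.add, 0, data, 2, 3))           # [6, 15]
    print(segmented_reduce_via_scan(operator.add, 0, data, 2, 3))  # [6, 15]
    print(choose_strategy(1000, 10, 256, 64))                      # small-segments
```

Both reference functions return the same result for any associative `op`. The scan-based version writes one intermediate value per input element rather than one per segment. That extra memory traffic is one plausible reason a dedicated segmented reduction can outperform a scan-based one.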

Original language	English
Title of host publication	Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing
Number of pages	11
Publisher	Association for Computing Machinery
Publication date	2017
Pages	42-52
ISBN (Electronic)	978-1-4503-5181-2
DOI	https://doi.org/10.1145/3122948.3122952
Status	Published - 2017
Event	6th ACM SIGPLAN International Workshop on Functional High-Performance Computing - Oxford, United Kingdom. Duration: 7 Sep 2017 → 7 Sep 2017. Conference number: 6

Workshop

Workshop	6th ACM SIGPLAN International Workshop on Functional High-Performance Computing
Number	6
Country/Territory	United Kingdom
City	Oxford
Period	07/09/2017 → 07/09/2017

Citation formats

Larsen, RW & Henriksen, T 2017, Strategies for regular segmented reductions on GPU. in Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing. Association for Computing Machinery, pp. 42-52, 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing, Oxford, United Kingdom, 07/09/2017. https://doi.org/10.1145/3122948.3122952

@inproceedings{9b2c9d9bff73443dbcb7f9fc7a4a253c,

title = "Strategies for regular segmented reductions on GPU",

abstract = "We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.",

keywords = "Functional programming, GPGPU, Parallelism",

author = "Larsen, {Rasmus Wriedt} and Troels Henriksen",

year = "2017",

doi = "10.1145/3122948.3122952",

language = "English",

pages = "42--52",

booktitle = "Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing",

publisher = "Association for Computing Machinery",

note = "6th ACM SIGPLAN International Workshop on Functional High-Performance Computing ; Conference date: 07-09-2017 Through 07-09-2017",

}

TY - GEN

T1 - Strategies for regular segmented reductions on GPU

AU - Larsen, Rasmus Wriedt

AU - Henriksen, Troels

N1 - Conference code: 6

PY - 2017

Y1 - 2017

N2 - We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.

AB - We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.

KW - Functional programming

KW - GPGPU

KW - Parallelism

UR - http://www.scopus.com/inward/record.url?scp=85030990504&partnerID=8YFLogxK

U2 - 10.1145/3122948.3122952

DO - 10.1145/3122948.3122952

M3 - Article in proceedings

AN - SCOPUS:85030990504

SP - 42

EP - 52

BT - Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing

PB - Association for Computing Machinery

T2 - 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing

Y2 - 7 September 2017 through 7 September 2017

ER -
