Strategies for regular segmented reductions on GPU

Rasmus Wriedt Larsen; Troels Henriksen

doi:10.1145/3122948.3122952

Department of Computer Science

Abstract

We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.
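For readers unfamiliar with the term, the following is a minimal sequential sketch of what a regular segmented reduction computes: a flat array is split into segments of equal length and each segment is reduced independently. It is not the paper's GPU implementation and does not model the three strategies or the runtime selection; the function names `segmented_reduce` and `segmented_reduce_via_scan` are hypothetical and chosen only for illustration. The second function mimics, under that same assumption, the general idea of a scan-based formulation, where each segment is scanned and only its last element is kept.

```python
from functools import reduce
from itertools import accumulate
import operator


def _check_regular(flat, segment_size):
    # "Regular" means every segment has the same length.
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if len(flat) % segment_size != 0:
        raise ValueError("len(flat) must be a multiple of segment_size")


def segmented_reduce(op, neutral, flat, segment_size):
    """Reduce each consecutive run of segment_size elements with op."""
    _check_regular(flat, segment_size)
    return [
        reduce(op, flat[start:start + segment_size], neutral)
        for start in range(0, len(flat), segment_size)
    ]


def segmented_reduce_via_scan(op, flat, segment_size):
    """Same result, computed from an inclusive scan of each segment."""
    _check_regular(flat, segment_size)
    results = []
    for start in range(0, len(flat), segment_size):
        scanned = list(accumulate(flat[start:start + segment_size], op))
        results.append(scanned[-1])  # last scan element is the segment's reduction
    return results


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    print(segmented_reduce(operator.add, 0, data, 3))
    print(segmented_reduce_via_scan(operator.add, data, 3))
```

The scan variant performs strictly more work per segment than the direct reduction, which gives some intuition for why a dedicated reduction can outperform a scan-based one, although the actual GPU cost model is more involved than this sequential sketch suggests.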

Original language	English
Title of host publication	Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing
Number of pages	11
Publisher	Association for Computing Machinery
Publication date	2017
Pages	42-52
ISBN (Electronic)	978-1-4503-5181-2
DOIs	https://doi.org/10.1145/3122948.3122952
Publication status	Published - 2017
Event	6th ACM SIGPLAN International Workshop on Functional High-Performance Computing, Oxford, United Kingdom, 7 Sept 2017 (conference number 6)

Workshop

Workshop	6th ACM SIGPLAN International Workshop on Functional High-Performance Computing
Number	6
Country/Territory	United Kingdom
City	Oxford
Period	07/09/2017 → 07/09/2017

Keywords

Functional programming
GPGPU
Parallelism

Access to Document

10.1145/3122948.3122952

Cite this

Larsen, RW & Henriksen, T 2017, Strategies for regular segmented reductions on GPU. in Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing. Association for Computing Machinery, pp. 42-52, 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing, Oxford, United Kingdom, 07/09/2017. https://doi.org/10.1145/3122948.3122952

@inproceedings{9b2c9d9bff73443dbcb7f9fc7a4a253c,

title = "Strategies for regular segmented reductions on GPU",

abstract = "We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.",

keywords = "Functional programming, GPGPU, Parallelism",

author = "Larsen, {Rasmus Wriedt} and Henriksen, Troels",

year = "2017",

doi = "10.1145/3122948.3122952",

language = "English",

pages = "42--52",

booktitle = "Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing",

publisher = "Association for Computing Machinery",

note = "6th ACM SIGPLAN International Workshop on Functional High-Performance Computing ; Conference date: 07-09-2017 Through 07-09-2017",

}

TY - GEN

T1 - Strategies for regular segmented reductions on GPU

AU - Larsen, Rasmus Wriedt

AU - Henriksen, Troels

N1 - Conference code: 6

PY - 2017

Y1 - 2017

N2 - We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.

AB - We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.

KW - Functional programming

KW - GPGPU

KW - Parallelism

UR - http://www.scopus.com/inward/record.url?scp=85030990504&partnerID=8YFLogxK

U2 - 10.1145/3122948.3122952

DO - 10.1145/3122948.3122952

M3 - Article in proceedings

AN - SCOPUS:85030990504

SP - 42

EP - 52

BT - Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing

PB - Association for Computing Machinery

T2 - 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing

Y2 - 7 September 2017 through 7 September 2017

ER -
