Transparent GPU Execution of NumPy Applications

Troels Blum; Mads Ruben Burgdorff Kristensen; Brian Vinter

eScience

7 citations (Scopus)

Abstract

In this work, we present a back-end for the Python library NumPy that utilizes the GPU seamlessly. We use dynamic code generation to generate kernels, and data is moved transparently to and from the GPU. For the integration into NumPy, we use the Bohrium runtime system. Bohrium hooks into NumPy through the implicit data parallelization of array operations; this approach requires no annotations or other code modifications. The key motivation for our GPU computation back-end is to transform high-level Python/NumPy applications into low-level GPU-executable kernels, with the goal of obtaining high performance, high productivity, and high portability (HP3). We provide a performance study of the GPU back-end that includes four well-known benchmark applications, Black-Scholes, Successive Over-relaxation, Shallow Water, and N-body, implemented in pure Python/NumPy. We demonstrate an impressive 834 times speedup for the Black-Scholes application, and an average speedup of 124 times across the four benchmarks.
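To make "pure Python/NumPy" concrete, the following is a minimal sketch of a Black-Scholes pricer written entirely as whole-array operations, which is the kind of code the abstract describes as needing no annotations. It is not the authors' benchmark code: the function names `cnd` and `black_scholes`, the input values, and the choice of a polynomial approximation for the normal CDF are assumptions made for illustration, and the sketch runs on plain NumPy without any GPU back-end.

```python
import numpy as np


def cnd(x):
    """Approximate the standard normal CDF using only array operations.

    Uses the classic five-term polynomial approximation (Abramowitz and
    Stegun 26.2.17); chosen here as an assumption so that no loops or
    external libraries are needed.
    """
    a = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)
    ax = np.abs(x)
    k = 1.0 / (1.0 + 0.2316419 * ax)
    # Horner evaluation of a1*k + a2*k^2 + ... + a5*k^5
    poly = k * (a[0] + k * (a[1] + k * (a[2] + k * (a[3] + k * a[4]))))
    w = 1.0 - np.exp(-ax * ax / 2.0) / np.sqrt(2.0 * np.pi) * poly
    # Mirror the result for negative arguments: Phi(-x) = 1 - Phi(x)
    return np.where(x < 0, 1.0 - w, w)


def black_scholes(s, k, t, r, sigma):
    """Return (call, put) prices for European options, element-wise over s."""
    d1 = (np.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * np.sqrt(t))
    d2 = d1 - sigma * np.sqrt(t)
    discount = np.exp(-r * t)
    call = s * cnd(d1) - k * discount * cnd(d2)
    put = k * discount * cnd(-d2) - s * cnd(-d1)
    return call, put


# Hypothetical inputs: 1001 spot prices from 50 to 150, strike 100,
# one year to expiry, 5% risk-free rate, 20% volatility.
spot = np.linspace(50.0, 150.0, 1001)
call, put = black_scholes(spot, 100.0, 1.0, 0.05, 0.2)
```

Every statement in `black_scholes` operates on a whole array at once, so each one is an implicitly data-parallel operation of the kind a runtime could turn into a kernel without changes to the source.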

Original language	English
Title	Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW), 2014 IEEE 28th International
Publication date	27 Nov 2014
Status	Published - 27 Nov 2014

Citation formats

@inbook{fd3a1733676f4caca3e1081c79590f35,
title = "Transparent GPU Execution of NumPy Applications",
abstract = "In this work, we present a back-end for the Python library NumPy that utilizes the GPU seamlessly. We use dynamic code generation to generate kernels, and data is moved transparently to and from the GPU. For the integration into NumPy, we use the Bohrium runtime system. Bohrium hooks into NumPy through the implicit data parallelization of array operations; this approach requires no annotations or other code modifications. The key motivation for our GPU computation back-end is to transform high-level Python/NumPy applications into low-level GPU-executable kernels, with the goal of obtaining high performance, high productivity, and high portability (HP3). We provide a performance study of the GPU back-end that includes four well-known benchmark applications, Black-Scholes, Successive Over-relaxation, Shallow Water, and N-body, implemented in pure Python/NumPy. We demonstrate an impressive 834 times speedup for the Black-Scholes application, and an average speedup of 124 times across the four benchmarks.",
author = "Troels Blum and Kristensen, {Mads Ruben Burgdorff} and Brian Vinter",
year = "2014",
month = nov,
day = "27",
language = "English",
booktitle = "Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW), 2014 IEEE 28th International",
}

TY - CHAP
T1 - Transparent GPU Execution of NumPy Applications
AU - Blum, Troels
AU - Kristensen, Mads Ruben Burgdorff
AU - Vinter, Brian
PY - 2014/11/27
Y1 - 2014/11/27
AB - In this work, we present a back-end for the Python library NumPy that utilizes the GPU seamlessly. We use dynamic code generation to generate kernels, and data is moved transparently to and from the GPU. For the integration into NumPy, we use the Bohrium runtime system. Bohrium hooks into NumPy through the implicit data parallelization of array operations; this approach requires no annotations or other code modifications. The key motivation for our GPU computation back-end is to transform high-level Python/NumPy applications into low-level GPU-executable kernels, with the goal of obtaining high performance, high productivity, and high portability (HP3). We provide a performance study of the GPU back-end that includes four well-known benchmark applications, Black-Scholes, Successive Over-relaxation, Shallow Water, and N-body, implemented in pure Python/NumPy. We demonstrate an impressive 834 times speedup for the Black-Scholes application, and an average speedup of 124 times across the four benchmarks.
M3 - Book chapter
BT - Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW), 2014 IEEE 28th International
ER -
