Financial software on GPUs: between Haskell and Fortran

Cosmin Eugen Oancea; Christian Andreetta; Jost Berthold; Alain Frisch; Fritz Henglein

doi:10.1145/2364474.2364484

Department of Computer Science

10 Citations (Scopus)

Abstract

This paper presents a real-world pricing kernel for financial derivatives and evaluates the language and compiler tool chain that would allow expressive, hardware-neutral algorithm implementation and efficient execution on graphics processing units (GPUs). The language issues refer to preserving algorithmic invariants, e.g., inherent parallelism made explicit by map-reduce-scan functional combinators. Efficient execution is achieved by manually applying a series of generally applicable compiler transformations that allows the generated OpenCL code to yield speedups as high as 70x and 540x on a commodity mobile and desktop GPU, respectively. Apart from the concrete speedups attained, our contributions are twofold: First, from a language perspective, we illustrate that even state-of-the-art auto-parallelization techniques are incapable of discovering all the requisite data parallelism when rendering the functional code in Fortran-style imperative array processing form. Second, from a performance perspective, we study which compiler transformations are necessary to map the high-level functional code to hand-optimized OpenCL code for GPU execution. We discover a rich optimization space with nontrivial trade-offs and cost models. Memory reuse in map-reduce patterns, strength reduction, branch divergence optimization, and memory access coalescing exhibit significant impact individually. When combined, they enable essentially full utilization of all GPU cores. Functional programming has played a crucial double role in our case study: capturing the naturally data-parallel structure of the pricing algorithm in a transparent, reusable and entirely hardware-independent fashion; and supporting the correctness of the subsequent compiler transformations to a hardware-oriented target language by a rich class of universally valid equational properties.
Given the observed difficulty of automatically parallelizing imperative sequential code and the inherent labor of porting hardware-oriented and -optimized programs, our case study suggests that functional programming technology can facilitate high-level expression of leading-edge performant portable high-performance systems for massively parallel hardware architectures.
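
The abstract's phrase "inherent parallelism made explicit by map-reduce-scan functional combinators" can be illustrated with a small sketch. This is not the paper's pricing kernel: the function names, the toy payoff, and the input numbers below are all hypothetical, and Python stands in for the functional language only to show the shape of the three combinators.

```python
from functools import reduce
from itertools import accumulate
from operator import add


def path_from_increments(start, increments):
    """scan: a running sum turns per-step increments into a price path."""
    return list(accumulate(increments, add, initial=start))


def payoff(path, strike):
    """Toy payoff on the final price (hypothetical, for illustration only)."""
    return max(path[-1] - strike, 0.0)


def mean_payoff(start, strike, all_increments):
    # map: each path is computed independently of the others, so this
    # step carries no cross-iteration dependence and is data-parallel.
    payoffs = map(
        lambda inc: payoff(path_from_increments(start, inc), strike),
        all_increments,
    )
    # reduce: a sum, associative up to floating-point rounding, which is
    # what lets a compiler evaluate it as a parallel tree reduction.
    total = reduce(add, payoffs, 0.0)
    return total / len(all_increments)


# Three hypothetical paths, each given as a list of price increments.
increments = [[1.0, 2.0, -0.5], [-1.0, -1.0, 0.5], [0.5, 0.5, 0.5]]
result = mean_payoff(100.0, 101.0, increments)
```

Written this way, the independence of the paths is visible in the program structure itself; an equivalent imperative loop nest would leave that independence for a dependence analysis to rediscover.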

Original language	English
Title of host publication	FHPC’12: Proceedings of the 1st ACM SIGPLAN Workshop on Functional High Performance Computing
Number of pages	12
Publisher	Association for Computing Machinery
Publication date	2012
Pages	61-72
ISBN (Print)	978-1-4503-1577-7
DOIs	https://doi.org/10.1145/2364474.2364484
Publication status	Published - 2012
Event	1st ACM SIGPLAN Workshop on Functional High-Performance Computing - København, Denmark; Duration: 15 Sept 2012 → 15 Sept 2012; Conference number: 1

Conference

Conference	1st ACM SIGPLAN Workshop on Functional High-Performance Computing
Number	1
Country/Territory	Denmark
City	København
Period	15/09/2012 → 15/09/2012

Keywords

autoparallelization, functional language, memory coalescing, strength reduction, tiling

Access to Document

10.1145/2364474.2364484

Cite this

Financial software on GPUs: between Haskell and Fortran. / Oancea, Cosmin Eugen; Andreetta, Christian; Berthold, Jost et al.
FHPC’12: Proceedings of the 1st ACM SIGPLAN Workshop on Functional High Performance Computing. Association for Computing Machinery, 2012. p. 61-72.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Oancea, CE, Andreetta, C, Berthold, J, Frisch, A & Henglein, F 2012, Financial software on GPUs: between Haskell and Fortran. in FHPC’12: Proceedings of the 1st ACM SIGPLAN Workshop on Functional High Performance Computing. Association for Computing Machinery, pp. 61-72, 1st ACM SIGPLAN Workshop on Functional High-Performance Computing, København, Denmark, 15/09/2012. https://doi.org/10.1145/2364474.2364484

@inproceedings{a779ea5dffcc43cc8e108815b523f315,

title = "Financial software on GPUs: between Haskell and Fortran",

keywords = "autoparallelization, functional language, memory coalescing, strength reduction, tiling",

author = "Oancea, {Cosmin Eugen} and Christian Andreetta and Jost Berthold and Alain Frisch and Fritz Henglein",

year = "2012",

doi = "10.1145/2364474.2364484",

language = "English",

isbn = "978-1-4503-1577-7",

pages = "61--72",

booktitle = "FHPC{\textquoteright}12",

publisher = "Association for Computing Machinery",

note = "1st ACM SIGPLAN Workshop on Functional High-Performance Computing, FHPC '12 ; Conference date: 15-09-2012 Through 15-09-2012",

}

TY - GEN

T1 - Financial software on GPUs

T2 - 1st ACM SIGPLAN Workshop on Functional High-Performance Computing

AU - Oancea, Cosmin Eugen

AU - Andreetta, Christian

AU - Berthold, Jost

AU - Frisch, Alain

AU - Henglein, Fritz

N1 - Conference code: 1

PY - 2012

Y1 - 2012

KW - autoparallelization, functional language, memory coalescing, strength reduction, tiling

U2 - 10.1145/2364474.2364484

DO - 10.1145/2364474.2364484

M3 - Article in proceedings

SN - 978-1-4503-1577-7

SP - 61

EP - 72

BT - FHPC’12

PB - Association for Computing Machinery

Y2 - 15 September 2012 through 15 September 2012

ER -
