FinPar: a parallel financial benchmark

Christian Andreetta; Vivien Begot; Jost Berthold; Martin Elsman; Fritz Henglein; Troels Henriksen; Maj-Britt Nordfang; Cosmin Eugen Oancea

doi:10.1145/2898354

9 Citations (Scopus)

Abstract

Commodity many-core hardware is now mainstream, but parallel programming models are still lagging behind in efficiently utilizing the application parallelism. There are (at least) two principal reasons for this. First, real-world programs often take the form of a deeply nested composition of parallel operators, but mapping the available parallelism to the hardware requires a set of transformations that are tedious to do by hand and beyond the capability of the common user. Second, the best optimization strategy, such as what to parallelize and what to efficiently sequentialize, is often sensitive to the input dataset and therefore requires multiple code versions that are optimized differently, which also raises maintainability problems.

This article presents three array-based applications from the financial domain that are suitable for GPGPU execution. Common benchmark-design practice has been to provide the same code for the sequential and parallel versions that are optimized for only one class of datasets. In comparison, we document (1) all available parallelism via nested map-reduce functional combinators, in a simple Haskell implementation that closely resembles the original code structure, (2) the invariants and code transformations that govern the main trade-offs of a data-sensitive optimization space, and (3) report target CPU and multiversion GPGPU code together with an evaluation that demonstrates optimization trade-offs and other difficulties. We believe that this work provides useful insight into the language constructs and compiler infrastructure capable of expressing and optimizing such applications, and we report in-progress work in this direction.

Original language	English
Article number	18
Journal	ACM Transactions on Architecture and Code Optimization (TACO)
Volume	13
Issue number	2
Pages (from-to)	1
Number of pages	27
ISSN	1544-3566
DOI	https://doi.org/10.1145/2898354
Status	Published - Jun 2016

Access to document

10.1145/2898354

http://dl.acm.org/citation.cfm?id=2898354

Citation formats

@article{14f9a6eba936452392f2b7b3ffcd2ced,

title = "{FinPar}: a parallel financial benchmark",

abstract = "Commodity many-core hardware is now mainstream, but parallel programming models are still lagging behind in efficiently utilizing the application parallelism. There are (at least) two principal reasons for this. First, real-world programs often take the form of a deeply nested composition of parallel operators, but mapping the available parallelism to the hardware requires a set of transformations that are tedious to do by hand and beyond the capability of the common user. Second, the best optimization strategy, such as what to parallelize and what to efficiently sequentialize, is often sensitive to the input dataset and therefore requires multiple code versions that are optimized differently, which also raises maintainability problems. This article presents three array-based applications from the financial domain that are suitable for GPGPU execution. Common benchmark-design practice has been to provide the same code for the sequential and parallel versions that are optimized for only one class of datasets. In comparison, we document (1) all available parallelism via nested map-reduce functional combinators, in a simple Haskell implementation that closely resembles the original code structure, (2) the invariants and code transformations that govern the main trade-offs of a data-sensitive optimization space, and (3) report target CPU and multiversion GPGPU code together with an evaluation that demonstrates optimization trade-offs and other difficulties. We believe that this work provides useful insight into the language constructs and compiler infrastructure capable of expressing and optimizing such applications, and we report in-progress work in this direction.",

author = "Christian Andreetta and Vivien Begot and Jost Berthold and Martin Elsman and Fritz Henglein and Troels Henriksen and Maj-Britt Nordfang and Oancea, {Cosmin Eugen}",

year = "2016",

month = jun,

doi = "10.1145/2898354",

language = "English",

volume = "13",

pages = "1",

journal = "ACM Transactions on Architecture and Code Optimization (TACO)",

issn = "1544-3566",

publisher = "ACM",

number = "2",

}

TY - JOUR

T1 - FinPar: a parallel financial benchmark

AU - Andreetta, Christian

AU - Begot, Vivien

AU - Berthold, Jost

AU - Elsman, Martin

AU - Henglein, Fritz

AU - Henriksen, Troels

AU - Nordfang, Maj-Britt

AU - Oancea, Cosmin Eugen

PY - 2016/6

Y1 - 2016/6

N2 - Commodity many-core hardware is now mainstream, but parallel programming models are still lagging behind in efficiently utilizing the application parallelism. There are (at least) two principal reasons for this. First, real-world programs often take the form of a deeply nested composition of parallel operators, but mapping the available parallelism to the hardware requires a set of transformations that are tedious to do by hand and beyond the capability of the common user. Second, the best optimization strategy, such as what to parallelize and what to efficiently sequentialize, is often sensitive to the input dataset and therefore requires multiple code versions that are optimized differently, which also raises maintainability problems. This article presents three array-based applications from the financial domain that are suitable for GPGPU execution. Common benchmark-design practice has been to provide the same code for the sequential and parallel versions that are optimized for only one class of datasets. In comparison, we document (1) all available parallelism via nested map-reduce functional combinators, in a simple Haskell implementation that closely resembles the original code structure, (2) the invariants and code transformations that govern the main trade-offs of a data-sensitive optimization space, and (3) report target CPU and multiversion GPGPU code together with an evaluation that demonstrates optimization trade-offs and other difficulties. We believe that this work provides useful insight into the language constructs and compiler infrastructure capable of expressing and optimizing such applications, and we report in-progress work in this direction.

AB - Commodity many-core hardware is now mainstream, but parallel programming models are still lagging behind in efficiently utilizing the application parallelism. There are (at least) two principal reasons for this. First, real-world programs often take the form of a deeply nested composition of parallel operators, but mapping the available parallelism to the hardware requires a set of transformations that are tedious to do by hand and beyond the capability of the common user. Second, the best optimization strategy, such as what to parallelize and what to efficiently sequentialize, is often sensitive to the input dataset and therefore requires multiple code versions that are optimized differently, which also raises maintainability problems. This article presents three array-based applications from the financial domain that are suitable for GPGPU execution. Common benchmark-design practice has been to provide the same code for the sequential and parallel versions that are optimized for only one class of datasets. In comparison, we document (1) all available parallelism via nested map-reduce functional combinators, in a simple Haskell implementation that closely resembles the original code structure, (2) the invariants and code transformations that govern the main trade-offs of a data-sensitive optimization space, and (3) report target CPU and multiversion GPGPU code together with an evaluation that demonstrates optimization trade-offs and other difficulties. We believe that this work provides useful insight into the language constructs and compiler infrastructure capable of expressing and optimizing such applications, and we report in-progress work in this direction.

U2 - 10.1145/2898354

DO - 10.1145/2898354

M3 - Journal article

SN - 1544-3566

VL - 13

SP - 1

JO - ACM Transactions on Architecture and Code Optimization (TACO)

JF - ACM Transactions on Architecture and Code Optimization (TACO)

IS - 2

M1 - 18

ER -
