A T2 graph-reduction approach to fusion

Troels Henriksen; Cosmin Eugen Oancea

doi:10.1145/2502323.2502328

Department of Computer Science

14 Citations (Scopus)

Abstract

Fusion is one of the most important code transformations as it has the potential to substantially optimize both the memory hierarchy time overhead and, sometimes asymptotically, the space requirement.

In functional languages, fusion is naturally and relatively easily derived as a producer-consumer relation between program constructs that expose a richer, higher-order algebra of program invariants, such as the map-reduce list homomorphisms.
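As a minimal sketch of this producer-consumer view (written in Python rather than the paper's L0, with hypothetical function names), a map followed by a reduce can be rewritten as one pass that never materializes the intermediate list:

```python
from functools import reduce

def sum_of_squares_unfused(xs):
    # Producer: a map that materializes an intermediate list.
    ys = [x * x for x in xs]
    # Consumer: a reduce over that intermediate list.
    return reduce(lambda acc, y: acc + y, ys, 0)

def sum_of_squares_fused(xs):
    # Fused form: the map's function is composed into the reduce operator,
    # so the input is traversed once and no intermediate list is built.
    return reduce(lambda acc, x: acc + x * x, xs, 0)
```

Both functions compute the same value. The fused version keeps only the accumulator live instead of a list as long as the input, which is the kind of space saving the abstract refers to.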

In imperative languages, fusing producer-consumer loops requires dependency analysis on arrays applied at loop-nest level. Such analysis, however, has often been labeled as "heroic effort" and, if at all, is supported only in its simplest and most conservative form in industrial compilers.

Related implementations in the functional context typically apply fusion only when the to-be-fused producer is used exactly once, i.e., in the consumer. This guarantees that the transformation is conservative: the resulting program does not duplicate computation.

We show that the above restriction is more conservative than needed, and present a structural-analysis technique, inspired from the T1--T2 transformation for reducible data flow, that enables fusion even in some cases when the producer is used in different consumers and without duplicating computation.
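The situation described here can be illustrated with a small Python sketch. It is not the paper's T1--T2-inspired algorithm and does not use L0 syntax; the function names and the call counter are assumptions made only to show what fusing a producer into two consumers without duplicating computation looks like.

```python
calls = {"n": 0}  # counts how often the producer's function runs

def expensive(x):
    # Hypothetical stand-in for the producer's per-element work.
    calls["n"] += 1
    return x + 1

def unfused(xs):
    # One producer, two consumers; the intermediate list ys is materialized.
    ys = [expensive(x) for x in xs]
    return [y * 2 for y in ys], [y + 3 for y in ys]

def fused_by_inlining(xs):
    # Inlining the producer into each consumer separately removes ys,
    # but the producer's work is now done twice per element.
    return [expensive(x) * 2 for x in xs], [expensive(x) + 3 for x in xs]

def fused_together(xs):
    # Fusing the producer with both consumers into one traversal removes ys
    # and still evaluates the producer once per element.
    doubled, shifted = [], []
    for x in xs:
        y = expensive(x)
        doubled.append(y * 2)
        shifted.append(y + 3)
    return doubled, shifted
```

All three functions return the same pair of lists. Only `fused_by_inlining` increases the number of `expensive` calls, which is the duplication that the exactly-once restriction is meant to rule out.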

We report an implementation of the fusion algorithm for a functional-core language, named L0, which is intended to support nested parallelism across regular multi-dimensional arrays. We succinctly describe L0's semantics and the compiler infrastructure on which the fusion transformation relies, and present compiler-generated statistics related to fusion on a set of six benchmarks.

Original language	English
Title of host publication	Proceedings of the 2nd ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC'13)
Number of pages	12
Publisher	Association for Computing Machinery
Publication date	2013
Pages	47-58
ISBN (Print)	978-1-4503-2381-9
DOIs	https://doi.org/10.1145/2502323.2502328
Publication status	Published - 2013
Event	ACM SIGPLAN Workshop on Functional High-Performance Computing, Boston, United States. Duration: 25 Sept 2013 → 27 Sept 2013. Conference number: 2

Conference

Conference	ACM SIGPLAN workshop on Functional high-performance computing
Number	2
Country/Territory	United States
City	Boston
Period	25/09/2013 → 27/09/2013

Cite this

@inproceedings{1eb807ab0e8c4264bffc6c9fd649fe6a,

title = "A T2 graph-reduction approach to fusion",

abstract = "Fusion is one of the most important code transformations as it has the potential to substantially optimize both the memory hierarchy time overhead and, sometimes asymptotically, the space requirement. In functional languages, fusion is naturally and relatively easily derived as a producer-consumer relation between program constructs that expose a richer, higher-order algebra of program invariants, such as the map-reduce list homomorphisms. In imperative languages, fusing producer-consumer loops requires dependency analysis on arrays applied at loop-nest level. Such analysis, however, has often been labeled as {"}heroic effort{"} and, if at all, is supported only in its simplest and most conservative form in industrial compilers. Related implementations in the functional context typically apply fusion only when the to-be-fused producer is used exactly once, i.e., in the consumer. This guarantees that the transformation is conservative: the resulting program does not duplicate computation. We show that the above restriction is more conservative than needed, and present a structural-analysis technique, inspired from the T1--T2 transformation for reducible data flow, that enables fusion even in some cases when the producer is used in different consumers and without duplicating computation. We report an implementation of the fusion algorithm for a functional-core language, named L0, which is intended to support nested parallelism across regular multi-dimensional arrays. We succinctly describe L0's semantics and the compiler infrastructure on which the fusion transformation relies, and present compiler-generated statistics related to fusion on a set of six benchmarks.",

keywords = "autoparallelization, functional language, fusion",

author = "Henriksen, Troels and Oancea, Cosmin Eugen",

year = "2013",

doi = "10.1145/2502323.2502328",

language = "English",

isbn = "978-1-4503-2381-9",

pages = "47--58",

booktitle = "Proceedings of the 2nd ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC'13)",

publisher = "Association for Computing Machinery",

note = "ACM SIGPLAN workshop on Functional high-performance computing ; Conference date: 25-09-2013 Through 27-09-2013",

}

TY - GEN

T1 - A T2 graph-reduction approach to fusion

AU - Henriksen, Troels

AU - Oancea, Cosmin Eugen

N1 - Conference code: 2

PY - 2013

Y1 - 2013

N2 - Fusion is one of the most important code transformations as it has the potential to substantially optimize both the memory hierarchy time overhead and, sometimes asymptotically, the space requirement. In functional languages, fusion is naturally and relatively easily derived as a producer-consumer relation between program constructs that expose a richer, higher-order algebra of program invariants, such as the map-reduce list homomorphisms. In imperative languages, fusing producer-consumer loops requires dependency analysis on arrays applied at loop-nest level. Such analysis, however, has often been labeled as "heroic effort" and, if at all, is supported only in its simplest and most conservative form in industrial compilers. Related implementations in the functional context typically apply fusion only when the to-be-fused producer is used exactly once, i.e., in the consumer. This guarantees that the transformation is conservative: the resulting program does not duplicate computation. We show that the above restriction is more conservative than needed, and present a structural-analysis technique, inspired from the T1--T2 transformation for reducible data flow, that enables fusion even in some cases when the producer is used in different consumers and without duplicating computation. We report an implementation of the fusion algorithm for a functional-core language, named L0, which is intended to support nested parallelism across regular multi-dimensional arrays. We succinctly describe L0's semantics and the compiler infrastructure on which the fusion transformation relies, and present compiler-generated statistics related to fusion on a set of six benchmarks.

KW - autoparallelization

KW - functional language

KW - fusion

U2 - 10.1145/2502323.2502328

DO - 10.1145/2502323.2502328

M3 - Article in proceedings

SN - 978-1-4503-2381-9

SP - 47

EP - 58

BT - Proceedings of the 2nd ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC'13)

PB - Association for Computing Machinery

T2 - ACM SIGPLAN workshop on Functional high-performance computing

Y2 - 25 September 2013 through 27 September 2013

ER -
