Abstract
Fusion is one of the most important code transformations as it has the potential to substantially optimize both the memory hierarchy time overhead and, sometimes asymptotically, the space requirement.
In functional languages, fusion is naturally and relatively easily derived as a producer-consumer relation between program constructs that expose a richer, higher-order algebra of program invariants, such as the map-reduce list homomorphisms.
In imperative languages, fusing producer-consumer loops requires dependency analysis on arrays applied at loop-nest level. Such analysis, however, has often been labeled as "heroic effort" and, if at all, is supported only in its simplest and most conservative form in industrial compilers.
Related implementations in the functional context typically apply fusion only when the to-be-fused producer is used exactly once, i.e., in the consumer. This guarantees that the transformation is conservative: the resulting program does not duplicate computation.
We show that the above restriction is more conservative than needed, and present a structural-analysis technique, inspired from the T1--T2 transformation for reducible data flow, that enables fusion even in some cases when the producer is used in different consumers and without duplicating computation.
We report an implementation of the fusion algorithm for a functional-core language, named L0, which is intended to support nested parallelism across regular multi-dimensional arrays. We succinctly describe L0's semantics and the compiler infrastructure on which the fusion transformation relies, and present compiler-generated statistics related to fusion on a set of six benchmarks.
In functional languages, fusion is naturally and relatively easily derived as a producer-consumer relation between program constructs that expose a richer, higher-order algebra of program invariants, such as the map-reduce list homomorphisms.
In imperative languages, fusing producer-consumer loops requires dependency analysis on arrays applied at loop-nest level. Such analysis, however, has often been labeled as "heroic effort" and, if at all, is supported only in its simplest and most conservative form in industrial compilers.
Related implementations in the functional context typically apply fusion only when the to-be-fused producer is used exactly once, i.e., in the consumer. This guarantees that the transformation is conservative: the resulting program does not duplicate computation.
We show that the above restriction is more conservative than needed, and present a structural-analysis technique, inspired from the T1--T2 transformation for reducible data flow, that enables fusion even in some cases when the producer is used in different consumers and without duplicating computation.
We report an implementation of the fusion algorithm for a functional-core language, named L0, which is intended to support nested parallelism across regular multi-dimensional arrays. We succinctly describe L0's semantics and the compiler infrastructure on which the fusion transformation relies, and present compiler-generated statistics related to fusion on a set of six benchmarks.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2nd ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC'13) |
Number of pages | 12 |
Publisher | Association for Computing Machinery |
Publication date | 2013 |
Pages | 47-58 |
ISBN (Print) | 978-1-4503-2381-9 |
DOIs | |
Publication status | Published - 2013 |
Event | ACM SIGPLAN workshop on Functional high-performance computing - Boston, United States Duration: 25 Sept 2013 → 27 Sept 2013 Conference number: 2 |
Conference
Conference | ACM SIGPLAN workshop on Functional high-performance computing |
---|---|
Number | 2 |
Country/Territory | United States |
City | Boston |
Period | 25/09/2013 → 27/09/2013 |