Optimizing relational algebra operations using generic equivalence discriminators and lazy products

Fritz Henglein

doi:10.1145/1706356.1706372

Optimizing relational algebra operations using generic equivalence discriminators and lazy products

Fritz Henglein

Department of Computer Science

4 Citations (Scopus)

Abstract

We show how to efficiently evaluate generic map-filter-product queries, generalizations of select-project-join (SPJ) queries in re- lational algebra, based on a combination of two novel techniques: generic discrimination-based joins and lazy (formal) products.
Discrimination-based joins are based on the notion of (equiv- alence) discriminator. A discriminator partitions a list of values according to a user-specified equivalence relation on keys the val- ues are associated with. Equivalence relations can be specified in an expressive embedded language for denoting equivalence rela- tions. We show that discriminators can be constructed generically (by structural recursion on equivalence expressions), purely func- tionally, and efficiently (worst-case linear time). The array-based basic multiset discrimination algorithm of Cai and Paige (1995) provides a base discriminator that is both asymptotically and prac- tically efficient. In contrast to hashing, discrimination is fully ab- stract (only depends on which equivalences hold on its inputs), and in contrast to comparison-based sorting, it does not require an or- dering relation on its inputs. In particular, it is applicable to ref- erences (pointers). Furthermore, it has better asymptotic computa- tional complexity than both sorting and hashing.
We represent cross-products and unions lazily (symbolically) as formal products of the argument sets (relations). This allows the selection operation to recognize on the fly whenever it is applied to a cross-product and invoke an efficient equijoin implementation. In particular, queries can still be formulated naively, using filter, map and product without an explicit join operation, yet garner the advantages of efficient join-algorithms during evaluation.
The techniques subsume many of the optimization techniques based on relational algebra equalities, without need for a query preprocessing phase. They require no indexes and behave purely functionally. They can be considered a form of symbolic execution of set expressions that automate and encapsulate dynamic program transformation of such expressions and lead to asymptotic perfor- mance improvements over naive execution in many cases.

Original language	English
Title of host publication	Proceedings of the 2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation
Number of pages	10
Publisher	Association for Computing Machinery
Publication date	2010
Pages	73-82
ISBN (Print)	978-1-60558-727-1
DOIs	https://doi.org/10.1145/1706356.1706372
Publication status	Published - 2010
Event	2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation - Madrid, Spain Duration: 18 Jan 2010 → 19 Jan 2010

Conference

Conference	2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation
Country/Territory	Spain
City	Madrid
Period	18/01/2010 → 19/01/2010

Access to Document

10.1145/1706356.1706372

Cite this

Optimizing relational algebra operations using generic equivalence discriminators and lazy products. / Henglein, Fritz.
Proceedings of the 2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation. Association for Computing Machinery, 2010. p. 73-82.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Henglein, F 2010, Optimizing relational algebra operations using generic equivalence discriminators and lazy products. in Proceedings of the 2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation. Association for Computing Machinery, pp. 73-82, 2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, Madrid, Spain, 18/01/2010. https://doi.org/10.1145/1706356.1706372

@inproceedings{529b008b9d11425f81ea854bfdda6f66,

title = "Optimizing relational algebra operations using generic equivalence discriminators and lazy products",

abstract = "We show how to efficiently evaluate generic map-filter-product queries, generalizations of select-project-join (SPJ) queries in re- lational algebra, based on a combination of two novel techniques: generic discrimination-based joins and lazy (formal) products.Discrimination-based joins are based on the notion of (equiv- alence) discriminator. A discriminator partitions a list of values according to a user-specified equivalence relation on keys the val- ues are associated with. Equivalence relations can be specified in an expressive embedded language for denoting equivalence rela- tions. We show that discriminators can be constructed generically (by structural recursion on equivalence expressions), purely func- tionally, and efficiently (worst-case linear time). The array-based basic multiset discrimination algorithm of Cai and Paige (1995) provides a base discriminator that is both asymptotically and prac- tically efficient. In contrast to hashing, discrimination is fully ab- stract (only depends on which equivalences hold on its inputs), and in contrast to comparison-based sorting, it does not require an or- dering relation on its inputs. In particular, it is applicable to ref- erences (pointers). Furthermore, it has better asymptotic computa- tional complexity than both sorting and hashing.We represent cross-products and unions lazily (symbolically) as formal products of the argument sets (relations). This allows the selection operation to recognize on the fly whenever it is applied to a cross-product and invoke an efficient equijoin implementation. In particular, queries can still be formulated naively, using filter, map and product without an explicit join operation, yet garner the advantages of efficient join-algorithms during evaluation.The techniques subsume many of the optimization techniques based on relational algebra equalities, without need for a query preprocessing phase. They require no indexes and behave purely functionally. They can be considered a form of symbolic execution of set expressions that automate and encapsulate dynamic program transformation of such expressions and lead to asymptotic perfor- mance improvements over naive execution in many cases.",

author = "Fritz Henglein",

year = "2010",

doi = "10.1145/1706356.1706372",

language = "English",

isbn = "978-1-60558-727-1",

pages = "73--82",

booktitle = "Proceedings of the 2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation",

publisher = "Association for Computing Machinery",

note = "2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation ; Conference date: 18-01-2010 Through 19-01-2010",

}

TY - GEN

T1 - Optimizing relational algebra operations using generic equivalence discriminators and lazy products

AU - Henglein, Fritz

PY - 2010

Y1 - 2010

N2 - We show how to efficiently evaluate generic map-filter-product queries, generalizations of select-project-join (SPJ) queries in re- lational algebra, based on a combination of two novel techniques: generic discrimination-based joins and lazy (formal) products.Discrimination-based joins are based on the notion of (equiv- alence) discriminator. A discriminator partitions a list of values according to a user-specified equivalence relation on keys the val- ues are associated with. Equivalence relations can be specified in an expressive embedded language for denoting equivalence rela- tions. We show that discriminators can be constructed generically (by structural recursion on equivalence expressions), purely func- tionally, and efficiently (worst-case linear time). The array-based basic multiset discrimination algorithm of Cai and Paige (1995) provides a base discriminator that is both asymptotically and prac- tically efficient. In contrast to hashing, discrimination is fully ab- stract (only depends on which equivalences hold on its inputs), and in contrast to comparison-based sorting, it does not require an or- dering relation on its inputs. In particular, it is applicable to ref- erences (pointers). Furthermore, it has better asymptotic computa- tional complexity than both sorting and hashing.We represent cross-products and unions lazily (symbolically) as formal products of the argument sets (relations). This allows the selection operation to recognize on the fly whenever it is applied to a cross-product and invoke an efficient equijoin implementation. In particular, queries can still be formulated naively, using filter, map and product without an explicit join operation, yet garner the advantages of efficient join-algorithms during evaluation.The techniques subsume many of the optimization techniques based on relational algebra equalities, without need for a query preprocessing phase. They require no indexes and behave purely functionally. They can be considered a form of symbolic execution of set expressions that automate and encapsulate dynamic program transformation of such expressions and lead to asymptotic perfor- mance improvements over naive execution in many cases.

AB - We show how to efficiently evaluate generic map-filter-product queries, generalizations of select-project-join (SPJ) queries in re- lational algebra, based on a combination of two novel techniques: generic discrimination-based joins and lazy (formal) products.Discrimination-based joins are based on the notion of (equiv- alence) discriminator. A discriminator partitions a list of values according to a user-specified equivalence relation on keys the val- ues are associated with. Equivalence relations can be specified in an expressive embedded language for denoting equivalence rela- tions. We show that discriminators can be constructed generically (by structural recursion on equivalence expressions), purely func- tionally, and efficiently (worst-case linear time). The array-based basic multiset discrimination algorithm of Cai and Paige (1995) provides a base discriminator that is both asymptotically and prac- tically efficient. In contrast to hashing, discrimination is fully ab- stract (only depends on which equivalences hold on its inputs), and in contrast to comparison-based sorting, it does not require an or- dering relation on its inputs. In particular, it is applicable to ref- erences (pointers). Furthermore, it has better asymptotic computa- tional complexity than both sorting and hashing.We represent cross-products and unions lazily (symbolically) as formal products of the argument sets (relations). This allows the selection operation to recognize on the fly whenever it is applied to a cross-product and invoke an efficient equijoin implementation. In particular, queries can still be formulated naively, using filter, map and product without an explicit join operation, yet garner the advantages of efficient join-algorithms during evaluation.The techniques subsume many of the optimization techniques based on relational algebra equalities, without need for a query preprocessing phase. They require no indexes and behave purely functionally. They can be considered a form of symbolic execution of set expressions that automate and encapsulate dynamic program transformation of such expressions and lead to asymptotic perfor- mance improvements over naive execution in many cases.

U2 - 10.1145/1706356.1706372

DO - 10.1145/1706356.1706372

M3 - Article in proceedings

SN - 978-1-60558-727-1

SP - 73

EP - 82

BT - Proceedings of the 2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation

PB - Association for Computing Machinery

T2 - 2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation

Y2 - 18 January 2010 through 19 January 2010

ER -

Optimizing relational algebra operations using generic equivalence discriminators and lazy products

Abstract

Conference

Access to Document

Fingerprint

Cite this