Abstract
This paper presents GMP, a library for generic, SQL-style programming with multisets. It generalizes the querying core of SQL in a number of ways: multisets may contain elements of arbitrary first-order data types, including references (pointers), recursive data types and nested multisets; it contains an expressive embedded domain-specific language for specifying user-definable equivalence and ordering relations, extending the built-in equality and inequality predicates; it admits mapping arbitrary functions over multisets, not just projections; it supports user-defined predicates in selections; and it allows user-defined aggregation functions.
Most significantly, it avoids many cases of asymptotically inefficient nested iteration through Cartesian products that occur in a straightforward stream-based implementation of multisets. It accomplishes this by employing two novel techniques: symbolic (term) representations of multisets, specifically for Cartesian products, for facilitating dynamic symbolic computation, which intersperses algebraic simplification steps with conventional data processing; and discrimination-based joins, a generic technique for computing equijoins based on equivalence discriminators, as an alternative to hash-based and sort-merge joins.
Full source code for GMP in Haskell is included for experimentation. GMP is based on generic top-down discrimination, the code for which is not included. We provide illustrative examples whose performance indicates that GMP, even without requisite algorithm and data structure engineering, is a realistic alternative to SQL even for SQL-expressible queries.
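The two techniques named above can be illustrated with a small sketch. It is written in Python rather than GMP's Haskell and does not reproduce GMP's API: every name in it (`disc_nat`, `disc_map`, `disc_join`, `Product`, `select_equal`, `naive_join`) is hypothetical. It assumes keys that are small non-negative integers, so that a bucket table can serve as the equivalence discriminator, and it assumes that the factors of a symbolic product are plain lists.

```python
# Illustrative sketch only; all names are hypothetical and not part of GMP.

def disc_nat(bound):
    """Discriminator for integer keys in range(bound).

    Takes (key, value) pairs and returns the values grouped by key,
    using a bucket table instead of hashing or comparison sorting.
    """
    def disc(pairs):
        buckets = [[] for _ in range(bound)]
        for key, value in pairs:
            buckets[key].append(value)
        return [bucket for bucket in buckets if bucket]
    return disc


def disc_map(f, disc):
    """Discriminator for the user-defined equivalence induced by f:
    two keys are equivalent when disc treats their images under f as equivalent."""
    def mapped(pairs):
        return disc([(f(key), value) for key, value in pairs])
    return mapped


def disc_join(disc, left_key, right_key, left, right):
    """Equijoin: tag each element with its side, discriminate all elements
    together, then pair lefts with rights inside each equivalence class."""
    tagged = [(left_key(a), (0, a)) for a in left]
    tagged += [(right_key(b), (1, b)) for b in right]
    result = []
    for group in disc(tagged):
        lefts = [v for tag, v in group if tag == 0]
        rights = [v for tag, v in group if tag == 1]
        result.extend((a, b) for a in lefts for b in rights)
    return result


class Product:
    """Symbolic Cartesian product: stores its factors, enumerates nothing."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def count(self):
        # Algebraic simplification: |A x B| = |A| * |B|, without iteration.
        return len(self.left) * len(self.right)


def select_equal(disc, left_key, right_key, product):
    """Selection with an equality predicate over a symbolic product,
    rewritten into a join instead of filtering the enumerated product."""
    return disc_join(disc, left_key, right_key, product.left, product.right)


def naive_join(left_key, right_key, left, right):
    """Reference implementation: nested iteration over the full product."""
    return [(a, b) for a in left for b in right if left_key(a) == right_key(b)]


employees = [("ann", 2), ("bob", 1), ("cy", 2)]
departments = [(1, "ops"), (2, "dev"), (3, "hr")]
pairs = Product(employees, departments)

joined = select_equal(disc_nat(4), lambda e: e[1], lambda d: d[0], pairs)

# User-defined equivalence: integers are equivalent when they share parity.
same_parity = disc_map(lambda n: n % 2, disc_nat(2))
parity_joined = disc_join(same_parity, lambda n: n, lambda n: n, [1, 2, 3], [4, 5])
```

In this sketch `disc_join` makes one pass over both inputs plus the bucket table and then does work proportional to the size of the output, whereas `naive_join` always visits every pair of the product. The two produce the same multiset of pairs, possibly in a different order.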
Original language | English |
---|---|
Journal | Higher-Order and Symbolic Computation |
Volume | 23 |
Issue number | 3 |
Pages (from-to) | 337-370 |
Number of pages | 34 |
ISSN | 1388-3690 |
Publication status | Published - Sept 2010 |