On optimizing relational self-joins

Yu Cao, Yongluan Zhou, Chee Yong Chan, Kian-Lee Tan

1 Citation (Scopus)

Abstract

Self-join, which joins a relation with itself, is a prevalent operation in relational database systems. Despite its wide applicability, there has been little attention devoted to improving its performance. In this paper, we present SCALE (Sort for Clustered Access with Lazy Evaluation), an efficient self-join algorithm, which takes advantage of the fact that both inputs of a self-join operation are instances of the same relation. SCALE first sorts the relation on one join attribute, say R. A. In this way, for every value of the other join attribute, say R. B, its matching R. A tuples are essentially clustered. As SCALE scans the sorted relation, each tuple is joined with its matching tuples co-existing in memory. For tuples where full-range clustered accesses to their matching tuples are not possible, they are buffered and the unfinished part of join processing deferred. Such lazy evaluation minimizes the need for "random" access to the matching tuples. SCALE further optimizes the memory allocation for clustered access and lazy evaluation to keep the processing cost minimal. Our analytical study shows that SCALE degenerates gracefully to a Sort-Merge Join in the worst case. We have also implemented SCALE in PostgreSQL, and results of our extensive experimental study show that it outperforms both Sort-Merge Join and Hybrid Hash Join by a wide margin in (almost) all cases.

Original languageEnglish
Title of host publicationProceedings of the 15th International Conference on Extending Database Technology
Number of pages12
Place of PublicationNew York
PublisherAssociation for Computing Machinery
Publication date2012
Pages120-131
ISBN (Electronic)978-1-4503-0790-1
DOIs
Publication statusPublished - 2012
Externally publishedYes
Event15th International Conference on Extending Database Technology - Berlin, Germany
Duration: 27 Mar 201230 Mar 2012
Conference number: 15

Conference

Conference15th International Conference on Extending Database Technology
Number15
Country/TerritoryGermany
CityBerlin
Period27/03/201230/03/2012

Fingerprint

Dive into the research topics of 'On optimizing relational self-joins'. Together they form a unique fingerprint.

Cite this