Parallel SPARQL query optimization

Buwen Wu; Yongluan Zhou; Hai Jin; Amol Deshpande

doi:10.1109/ICDE.2017.110

Abstract

Existing parallel SPARQL query optimizers assume hash-based data partitioning and adopt plan enumeration algorithms with unnecessarily high complexity. As a result, they cannot easily accommodate other partitioning methods and consider only an unnecessarily limited plan space. To address these problems, we first define a generic RDF data partitioning model that captures the common structure of various state-of-the-art RDF data partitioning methods. We then propose a query plan enumeration algorithm that not only has optimal efficiency but also accommodates different data partitioning methods. Furthermore, based on a solid analysis of the complexity of the plan enumeration algorithm, we propose two new heuristic methods that consider a much larger plan space than existing methods while still confining the search space of the algorithm. An autonomous approach is proposed to choose between the two methods by considering the structure and size of a complex SPARQL query. We conduct extensive experiments using synthetic datasets and a real-world dataset, which show the superiority of our algorithms over existing ones.

Original language	English
Title of host publication	Proceedings of the 33rd IEEE International Conference on Data Engineering (ICDE)
Number of pages	12
Publisher	IEEE Press
Publication date	16 May 2017
Pages	547-558
ISBN (Print)	978-1-5090-6544-8
ISBN (Electronic)	978-1-5090-6543-1
DOI	https://doi.org/10.1109/ICDE.2017.110
Status	Published - 16 May 2017
Externally published	Yes
Event	33rd IEEE International Conference on Data Engineering - San Diego, USA. Duration: 19 Apr 2017 → 22 Apr 2017. Conference number: 33

Conference

Conference	33rd IEEE International Conference on Data Engineering
Number	33
Country/Territory	USA
City	San Diego
Period	19/04/2017 → 22/04/2017

Access to document

10.1109/ICDE.2017.110

ICDE2017

Citation formats

@inproceedings{b8fd5c39b34b4d90b549ba362ca12f09,
  title = "Parallel {SPARQL} query optimization",
  abstract = "Existing parallel SPARQL query optimizers assume hash-based data partitioning and adopt plan enumeration algorithms with unnecessarily high complexity. As a result, they cannot easily accommodate other partitioning methods and consider only an unnecessarily limited plan space. To address these problems, we first define a generic RDF data partitioning model that captures the common structure of various state-of-the-art RDF data partitioning methods. We then propose a query plan enumeration algorithm that not only has optimal efficiency but also accommodates different data partitioning methods. Furthermore, based on a solid analysis of the complexity of the plan enumeration algorithm, we propose two new heuristic methods that consider a much larger plan space than existing methods while still confining the search space of the algorithm. An autonomous approach is proposed to choose between the two methods by considering the structure and size of a complex SPARQL query. We conduct extensive experiments using synthetic datasets and a real-world dataset, which show the superiority of our algorithms over existing ones.",
  author = "Buwen Wu and Yongluan Zhou and Hai Jin and Amol Deshpande",
  year = "2017",
  month = may,
  day = "16",
  doi = "10.1109/ICDE.2017.110",
  language = "English",
  isbn = "978-1-5090-6544-8",
  pages = "547--558",
  booktitle = "Proceedings of the 33rd IEEE International Conference on Data Engineering (ICDE)",
  publisher = "IEEE Press",
  note = "33rd IEEE International Conference on Data Engineering, ICDE 2017; Conference date: 19-04-2017 through 22-04-2017",
}

TY  - GEN
T1  - Parallel SPARQL query optimization
AU  - Wu, Buwen
AU  - Zhou, Yongluan
AU  - Jin, Hai
AU  - Deshpande, Amol
N1  - Conference code: 33
PY  - 2017/5/16
Y1  - 2017/5/16
N2  - Existing parallel SPARQL query optimizers assume hash-based data partitioning and adopt plan enumeration algorithms with unnecessarily high complexity. As a result, they cannot easily accommodate other partitioning methods and consider only an unnecessarily limited plan space. To address these problems, we first define a generic RDF data partitioning model that captures the common structure of various state-of-the-art RDF data partitioning methods. We then propose a query plan enumeration algorithm that not only has optimal efficiency but also accommodates different data partitioning methods. Furthermore, based on a solid analysis of the complexity of the plan enumeration algorithm, we propose two new heuristic methods that consider a much larger plan space than existing methods while still confining the search space of the algorithm. An autonomous approach is proposed to choose between the two methods by considering the structure and size of a complex SPARQL query. We conduct extensive experiments using synthetic datasets and a real-world dataset, which show the superiority of our algorithms over existing ones.
AB  - Existing parallel SPARQL query optimizers assume hash-based data partitioning and adopt plan enumeration algorithms with unnecessarily high complexity. As a result, they cannot easily accommodate other partitioning methods and consider only an unnecessarily limited plan space. To address these problems, we first define a generic RDF data partitioning model that captures the common structure of various state-of-the-art RDF data partitioning methods. We then propose a query plan enumeration algorithm that not only has optimal efficiency but also accommodates different data partitioning methods. Furthermore, based on a solid analysis of the complexity of the plan enumeration algorithm, we propose two new heuristic methods that consider a much larger plan space than existing methods while still confining the search space of the algorithm. An autonomous approach is proposed to choose between the two methods by considering the structure and size of a complex SPARQL query. We conduct extensive experiments using synthetic datasets and a real-world dataset, which show the superiority of our algorithms over existing ones.
U2  - 10.1109/ICDE.2017.110
DO  - 10.1109/ICDE.2017.110
M3  - Article in proceedings
SN  - 978-1-5090-6544-8
SP  - 547
EP  - 558
BT  - Proceedings of the 33rd IEEE International Conference on Data Engineering (ICDE)
PB  - IEEE Press
T2  - 33rd IEEE International Conference on Data Engineering
Y2  - 19 April 2017 through 22 April 2017
ER  - 
