SemStore: a semantic-preserving distributed RDF triple store

Buwen Wu; Yongluan Zhou; Pingpeng Yuan; Hai Jin; Ling Liu

doi:10.1145/2661829.2661876

28 Citations (Scopus)

Abstract

The flexibility of the RDF data model has attracted an increasing number of organizations to store their data in an RDF format. With the rapid growth of RDF datasets, we envision that it is inevitable to deploy a cluster of computing nodes to process large-scale RDF data in order to deliver desirable query performance. In this paper, we address the challenging problems of data partitioning and query optimization in a scale-out RDF engine. We identify that existing approaches only focus on using fine-grained structural information for data partitioning, and hence fail to localize many types of complex queries. We then propose a radically different approach, where a coarse-grained structure, namely Rooted Sub-Graph (RSG), is used as the partition unit. By doing so, we can capture structural information at a much greater scale and hence are able to localize many complex queries. We also propose a k-means partitioning algorithm for allocating the RSGs onto the computing nodes as well as a query optimization strategy to minimize the inter-node communication during query processing. An extensive experimental study using benchmark datasets and a real dataset shows that our engine, SemStore, outperforms existing systems by orders of magnitude in terms of query response time.

Original language	English
Title	Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
Number of pages	10
Publisher	Association for Computing Machinery
Publication date	3 Nov 2014
Pages	509-518
ISBN (Electronic)	978-1-4503-2598-1
DOI	https://doi.org/10.1145/2661829.2661876
Status	Published - 3 Nov 2014
Published externally	Yes
Event	23rd ACM International Conference on Information and Knowledge Management - Shanghai, China. Duration: 3 Nov 2014 → 7 Nov 2014

Conference

Conference	23rd ACM International Conference on Information and Knowledge Management
Country/Territory	China
City	Shanghai
Period	03/11/2014 → 07/11/2014

Access to document

10.1145/2661829.2661876

Citation formats

SemStore: a semantic-preserving distributed RDF triple store. / Wu, Buwen; Zhou, Yongluan; Yuan, Pingpeng et al.
Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, 2014. pp. 509-518.

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-review

Wu, B, Zhou, Y, Yuan, P, Jin, H & Liu, L 2014, SemStore: a semantic-preserving distributed RDF triple store. in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, pp. 509-518, 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China, 03/11/2014. https://doi.org/10.1145/2661829.2661876

@inproceedings{940b3d9df43641ad9cc8cb59de4d2c43,

title = "SemStore: a semantic-preserving distributed RDF triple store",

abstract = "The flexibility of the RDF data model has attracted an increasing number of organizations to store their data in an RDF format. With the rapid growth of RDF datasets, we envision that it is inevitable to deploy a cluster of computing nodes to process large-scale RDF data in order to deliver desirable query performance. In this paper, we address the challenging problems of data partitioning and query optimization in a scale-out RDF engine. We identify that existing approaches only focus on using fine-grained structural information for data partitioning, and hence fail to localize many types of complex queries. We then propose a radically different approach, where a coarse-grained structure, namely Rooted Sub-Graph (RSG), is used as the partition unit. By doing so, we can capture structural information at a much greater scale and hence are able to localize many complex queries. We also propose a k-means partitioning algorithm for allocating the RSGs onto the computing nodes as well as a query optimization strategy to minimize the inter-node communication during query processing. An extensive experimental study using benchmark datasets and real dataset shows that our engine, SemStore, outperforms existing systems by orders of magnitudes in terms of query response time.",

author = "Buwen Wu and Yongluan Zhou and Pingpeng Yuan and Hai Jin and Ling Liu",

year = "2014",

month = nov,

day = "3",

doi = "10.1145/2661829.2661876",

language = "English",

pages = "509--518",

booktitle = "Proceedings of the 23rd ACM International Conference on Information and Knowledge Management",

publisher = "Association for Computing Machinery",

note = "23rd ACM International Conference on Information and Knowledge Management, CIKM; Conference date: 03-11-2014 through 07-11-2014",

}

TY - GEN

T1 - SemStore

T2 - 23rd ACM International Conference on Information and Knowledge Management

AU - Wu, Buwen

AU - Zhou, Yongluan

AU - Yuan, Pingpeng

AU - Jin, Hai

AU - Liu, Ling

PY - 2014/11/3

Y1 - 2014/11/3

N2 - The flexibility of the RDF data model has attracted an increasing number of organizations to store their data in an RDF format. With the rapid growth of RDF datasets, we envision that it is inevitable to deploy a cluster of computing nodes to process large-scale RDF data in order to deliver desirable query performance. In this paper, we address the challenging problems of data partitioning and query optimization in a scale-out RDF engine. We identify that existing approaches only focus on using fine-grained structural information for data partitioning, and hence fail to localize many types of complex queries. We then propose a radically different approach, where a coarse-grained structure, namely Rooted Sub-Graph (RSG), is used as the partition unit. By doing so, we can capture structural information at a much greater scale and hence are able to localize many complex queries. We also propose a k-means partitioning algorithm for allocating the RSGs onto the computing nodes as well as a query optimization strategy to minimize the inter-node communication during query processing. An extensive experimental study using benchmark datasets and real dataset shows that our engine, SemStore, outperforms existing systems by orders of magnitudes in terms of query response time.

AB - The flexibility of the RDF data model has attracted an increasing number of organizations to store their data in an RDF format. With the rapid growth of RDF datasets, we envision that it is inevitable to deploy a cluster of computing nodes to process large-scale RDF data in order to deliver desirable query performance. In this paper, we address the challenging problems of data partitioning and query optimization in a scale-out RDF engine. We identify that existing approaches only focus on using fine-grained structural information for data partitioning, and hence fail to localize many types of complex queries. We then propose a radically different approach, where a coarse-grained structure, namely Rooted Sub-Graph (RSG), is used as the partition unit. By doing so, we can capture structural information at a much greater scale and hence are able to localize many complex queries. We also propose a k-means partitioning algorithm for allocating the RSGs onto the computing nodes as well as a query optimization strategy to minimize the inter-node communication during query processing. An extensive experimental study using benchmark datasets and real dataset shows that our engine, SemStore, outperforms existing systems by orders of magnitudes in terms of query response time.

U2 - 10.1145/2661829.2661876

DO - 10.1145/2661829.2661876

M3 - Article in proceedings

SP - 509

EP - 518

BT - Proceedings of the 23rd ACM International Conference on Information and Knowledge Management

PB - Association for Computing Machinery

Y2 - 3 November 2014 through 7 November 2014

ER -
