Abstract
The flexibility of the RDF data model has attracted an increasing number of organizations to store their data in RDF format. With the rapid growth of RDF datasets, we expect that deploying a cluster of computing nodes will become necessary to process large-scale RDF data with acceptable query performance. In this paper, we address the challenging problems of data partitioning and query optimization in a scale-out RDF engine. We observe that existing approaches use only fine-grained structural information for data partitioning and hence fail to localize many types of complex queries. We then propose a radically different approach, in which a coarse-grained structure, the Rooted Sub-Graph (RSG), is used as the partition unit. This lets us capture structural information at a much larger scale and thereby localize many complex queries. We also propose a k-means partitioning algorithm for allocating the RSGs to the computing nodes, as well as a query optimization strategy that minimizes inter-node communication during query processing. An extensive experimental study using benchmark datasets and a real dataset shows that our engine, SemStore, outperforms existing systems by orders of magnitude in query response time.
Original language | English |
---|---|
Title | Proceedings of the 23rd ACM International Conference on Information and Knowledge Management |
Number of pages | 10 |
Publisher | Association for Computing Machinery |
Publication date | 3 Nov 2014 |
Pages | 509-518 |
ISBN (Electronic) | 978-1-4503-2598-1 |
DOI | |
Status | Published - 3 Nov 2014 |
Published externally | Yes |
Event | 23rd ACM International Conference on Information and Knowledge Management - Shanghai, China. Duration: 3 Nov 2014 → 7 Nov 2014 |
Conference
Conference | 23rd ACM International Conference on Information and Knowledge Management |
---|---|
Country/Territory | China |
City | Shanghai |
Period | 03/11/2014 → 07/11/2014 |