Abstract
The flexibility of the RDF data model has attracted an increasing number of organizations to store their data in RDF format. With the rapid growth of RDF datasets, we envision that deploying a cluster of computing nodes to process large-scale RDF data will become inevitable if desirable query performance is to be delivered. In this paper, we address the challenging problems of data partitioning and query optimization in a scale-out RDF engine. We observe that existing approaches focus only on fine-grained structural information for data partitioning, and hence fail to localize many types of complex queries. We then propose a radically different approach, in which a coarse-grained structure, namely the Rooted Sub-Graph (RSG), is used as the partition unit. By doing so, we capture structural information at a much larger scale and are therefore able to localize many complex queries. We also propose a k-means partitioning algorithm for allocating the RSGs to the computing nodes, as well as a query optimization strategy that minimizes inter-node communication during query processing. An extensive experimental study using benchmark datasets and a real dataset shows that our engine, SemStore, outperforms existing systems by orders of magnitude in terms of query response time.
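The abstract names two ideas, RSGs as the partition unit and a k-means algorithm for placing them on nodes, without defining either precisely. The following is only a minimal Python sketch under stated assumptions, not SemStore's actual method: an RSG is taken to be every triple reachable from a root vertex along subject-to-object edges, roots are assumed to be vertices with no incoming edge, and the k-means step uses Jaccard overlap of vertex sets, with a cluster's "centroid" being the union of its members' vertices. The names `rooted_subgraph`, `vertex_set`, `jaccard` and `kmeans_partition` are hypothetical.

```python
from collections import defaultdict


def rooted_subgraph(triples, root):
    """Hypothetical RSG: all triples reachable from `root` along subject -> object edges."""
    out_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((s, p, o))
    seen, stack, rsg = {root}, [root], set()
    while stack:  # iterative DFS so deep graphs do not hit the recursion limit
        v = stack.pop()
        for triple in out_edges[v]:
            rsg.add(triple)
            obj = triple[2]
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return rsg


def vertex_set(rsg):
    """Subjects and objects touched by an RSG."""
    return {s for s, _, _ in rsg} | {o for _, _, o in rsg}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def kmeans_partition(rsgs, k, max_iters=10):
    """Assign each RSG to one of k nodes, k-means style (assumed similarity: Jaccard).

    RSGs that share vertices tend to land on the same node, which is the
    intuition behind localizing joins; load balancing is ignored here.
    """
    vsets = [vertex_set(r) for r in rsgs]
    if k > len(vsets):
        raise ValueError("need at least k RSGs to seed k centroids")
    centroids = [set(v) for v in vsets[:k]]  # seed with the first k RSGs
    assignment = None
    for _ in range(max_iters):
        new_assignment = [
            # highest Jaccard overlap wins; ties go to the lowest node id
            max(range(k), key=lambda c: (jaccard(v, centroids[c]), -c))
            for v in vsets
        ]
        if new_assignment == assignment:
            break  # converged: no RSG changed node
        assignment = new_assignment
        for c in range(k):
            members = [vsets[i] for i, a in enumerate(assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = set().union(*members)
    return assignment


triples = [("a", "p", "b"), ("b", "q", "c"), ("d", "p", "c"), ("e", "p", "f")]
roots = {s for s, _, _ in triples} - {o for _, _, o in triples}  # no incoming edge
rsgs = [rooted_subgraph(triples, r) for r in sorted(roots)]
print(kmeans_partition(rsgs, k=2))  # [0, 1, 0]
```

Here the RSGs rooted at `a` and `e` share node 0 and the one rooted at `d` goes to node 1. A real engine would also have to balance load across nodes and handle vertices shared between RSGs, which this sketch does not attempt.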
Original language | English |
---|---|
Title of host publication | Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management |
Number of pages | 10 |
Publisher | Association for Computing Machinery |
Publication date | 3 Nov 2014 |
Pages | 509-518 |
ISBN (Electronic) | 978-1-4503-2598-1 |
Publication status | Published - 3 Nov 2014 |
Externally published | Yes |
Event | 23rd ACM International Conference on Conference on Information and Knowledge Management - Shanghai, China. Duration: 3 Nov 2014 → 7 Nov 2014 |
Conference
Conference | 23rd ACM International Conference on Conference on Information and Knowledge Management |
---|---|
Country/Territory | China |
City | Shanghai |
Period | 03/11/2014 → 07/11/2014 |