DIAERESIS: Knowledge Graph Partitioning for Efficient Query Answering
Troullinou, Georgia; Stefanidis, Kostas; Plexousakis, Dimitris; Kondylakis, Haridimos (2024-05)
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202407307801
Description
Peer reviewed
Abstract
<p>The rapid growth of linked data demands effective and efficient methods for storage, management, and querying. Apache Spark is one of the most widely used engines for big data processing, and a growing number of systems adopt it for efficient query answering. Existing approaches that use Spark to query RDF data rely on partitioning techniques to reduce the amount of data that must be accessed and thereby improve efficiency. However, simplistic data partitioning methods fail to minimize data access during query answering and therefore do not effectively improve query efficiency. In this demonstration, we present DIAERESIS, a novel platform that exploits a summary-based partitioning strategy to significantly reduce data access and, as a result, improve query-answering efficiency. DIAERESIS first identifies the top-k most important schema nodes, which serve as centroids, and assigns each remaining schema node to the centroid it depends on most. It then allocates the corresponding instance nodes to the schema nodes they are instantiated under, creating vertical sub-partitions and indexes. Conference participants can actively explore the impact of our partitioning methodology on data distribution and replication, on the data accessed for query answering, and on query-answering efficiency. Further, we contrast our approach with the partitioning approaches adopted by state-of-the-art systems in the domain, providing a deep understanding of the challenges in the area.</p>
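The two partitioning steps described in the abstract can be illustrated with a minimal sketch. This is not the DIAERESIS implementation: the abstract names neither the importance measure nor the dependence measure, so the sketch assumes node degree as a stand-in for importance and hop distance in the schema graph as a stand-in for dependence. The function name `partition_schema` and its parameters are hypothetical, and vertical sub-partitioning and indexing are omitted.

```python
from collections import deque


def partition_schema(schema_edges, instance_types, k):
    """Toy summary-based partitioning (illustrative only).

    schema_edges:   iterable of (schema_node, schema_node) pairs
    instance_types: dict mapping an instance to the schema node it instantiates
    k:              number of centroids to select
    Returns (centroids, owner, partitions).
    """
    # Build an undirected adjacency map over the schema nodes.
    adj = {}
    for a, b in schema_edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # ASSUMPTION: importance is approximated by node degree.
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))
    centroids = ranked[:k]

    # ASSUMPTION: "dependence" is approximated by hop distance.
    # A multi-source BFS gives each schema node to the nearest centroid.
    owner = {c: c for c in centroids}
    queue = deque(centroids)
    while queue:
        node = queue.popleft()
        for neighbour in sorted(adj[node]):
            if neighbour not in owner:
                owner[neighbour] = owner[node]
                queue.append(neighbour)

    # Each instance follows the schema node it is instantiated under.
    # Instances whose schema node was never reached are left unassigned.
    partitions = {c: [] for c in centroids}
    for instance, schema_node in instance_types.items():
        if schema_node in owner:
            partitions[owner[schema_node]].append(instance)
    return centroids, owner, partitions


edges = [
    ("Person", "Paper"), ("Person", "Org"), ("Person", "Address"),
    ("Paper", "Venue"), ("Paper", "Topic"), ("Paper", "Review"),
]
types = {"alice": "Person", "p1": "Paper", "acm": "Org", "vldb": "Venue"}
centroids, owner, partitions = partition_schema(edges, types, k=2)
```

In this toy schema, `Paper` and `Person` have the highest degree and become the two centroids; every other schema node, and every instance under it, lands in the partition of the centroid it is closest to.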
Collections
- TUNICRIS publications [20143]