gremlinator

Gremlinator documentation

This document provides reference documentation for the SPARQL-Gremlin transpiler, also known as Gremlinator, a compiler that transforms SPARQL queries into Gremlin traversals. It is based on the Apache Jena SPARQL processor ARQ, which provides access to the syntax tree of a SPARQL query.

The current version of SPARQL-Gremlin only uses a subset of the features provided by Apache Jena. The examples below show each implemented feature.

Table of contents

  1. Introduction
    1. Goal
    2. Supported Queries
    3. Limitations
  2. Usage
    1. Console Application
    2. Gremlin Shell Plugin
      1. Prefixes
  3. Examples
  4. Future work
  5. Acknowledgements

Introduction

This is an ongoing effort to enable automatic execution of SPARQL queries over graph systems via the Gremlin query language. This is achieved by converting SPARQL queries into Gremlin pattern-matching traversals.

Goal

The goal of this work is to bridge the query interoperability gap between the two famous, yet fairly disconnected, graph communities: Semantic Web (which relies on the RDF data model) and Graph database (which relies on Property graphs).

The Gremlinator work is a sub-task of a bigger goal: LITMUS, an open extensible framework for benchmarking diverse data management solutions. Further information can be obtained from the following resources:

  1. Proposal - ESWC 2017 Ph.D. Symposium
  2. Publication - Semantics 2017 R&D paper (best research & innovation paper award)
  3. First working prototype - Docker

The foundational research work on Gremlinator can be found in the Gremlinator full paper. In this paper, we present and discuss the graph query language semantics of SPARQL and Gremlin, and a formal mapping between SPARQL graph patterns and Gremlin traversals. Furthermore, we point the interested reader to the following resources for a better understanding:

  1. Gremlinator demonstration - (Public Demo Mirror 1) and (Public Demo Mirror 2)
  2. A short video tutorial on how to use the demonstration - Video tutorial

Supported Queries

Gremlinator is an ongoing effort that aims to cover the entire SPARQL 1.1 query spectrum. At present, it supports translation of the SPARQL 1.0 specification, in particular SELECT queries. The supported SPARQL query types are:

  • Union
  • Optional
  • Order-By
  • Group-By
  • STAR-shaped or neighbourhood queries
  • Query modifiers, such as:
    • Filter with restrictions
    • Count
    • LIMIT
    • OFFSET

Limitations (*Fixed)

The current implementation of Gremlinator (i.e. SPARQL-Gremlin) does not support the following:

  • SPARQL queries with variables in the predicate position are not currently covered, with an exception of the following case:
SELECT ?x WHERE { ?x ?y ?z . }
  • A SPARQL UNION query with unbalanced patterns: a Gremlin union traversal can only be generated if the input SPARQL query has the same number of patterns on both sides of the UNION operator. For instance, the following SPARQL query cannot be mapped using Gremlinator, since the union is executed between different numbers of graph patterns (two patterns on one side, one pattern on the other).
SELECT * WHERE {
  {?person e:created ?software .
  ?person v:name "daniel" .}
  UNION
  {?software v:lang "java" .}
}
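
The balance condition above amounts to comparing the number of triple patterns on each side of UNION. A minimal sketch, assuming each side is already available as a list of (subject, predicate, object) tuples; `is_balanced_union` is a hypothetical helper for illustration and not part of Gremlinator:

```python
# Hypothetical helper: not part of Gremlinator's API.
# Each UNION branch is a list of (subject, predicate, object) triple patterns.


def is_balanced_union(left, right):
    """Return True when both UNION branches hold the same number of triple patterns."""
    return len(left) == len(right)


# The unsupported query above: two patterns on the left, one on the right.
left = [
    ("?person", "e:created", "?software"),
    ("?person", "v:name", '"daniel"'),
]
right = [("?software", "v:lang", '"java"')]

print(is_balanced_union(left, right))      # False
print(is_balanced_union(left[:1], right))  # True
```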

Usage

Console Application

The project contains a console application that can be used to compile SPARQL queries and evaluate the resulting Gremlin traversals. For usage examples, simply run ${PROJECT_HOME}/bin/sparql-gremlin.sh.

Gremlin Shell Plugin

To use SPARQL-Gremlin as a Gremlin shell plugin, run the following commands (be sure sparql-gremlin-xyz.jar is in the classpath):

gremlin> :install com.datastax sparql-gremlin 0.1
==>Loaded: [com.datastax, sparql-gremlin, 0.1]
gremlin> :plugin use datastax.sparql
==>datastax.sparql activated

Once the plugin is installed and activated, establish a remote connection to execute SPARQL queries:

gremlin> :remote connect datastax.sparql graph
==>SPARQL[graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]]
gremlin> :> SELECT ?name ?age WHERE { ?person v:name ?name . ?person v:age ?age }
==>[name:marko, age:29]
==>[name:vadas, age:27]
==>[name:josh, age:32]
==>[name:peter, age:35]

Note that sparql-gremlin 0.1 is a legacy plugin and will be replaced by the updated sparql-gremlin 0.2 plugin once it has been successfully tested. sparql-gremlin 0.1 supports only basic graph patterns, not the other SPARQL features described in this documentation.

Prefixes

SPARQL-Gremlin supports the following prefixes to traverse the graph:

Prefix       Purpose
v:<label>    label-access traversal
e:<label>    out-edge traversal
p:<name>     property traversal
v:<name>     property-value traversal

Note that element IDs and labels are treated like normal properties, hence they can be accessed using the same pattern:

SELECT ?name ?id ?label WHERE { ?element v:name ?name . ?element v:id ?id . ?element v:label ?label }
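
To make the prefix table more concrete, the sketch below splits a prefixed predicate and names a Gremlin step it loosely corresponds to. The step choices (`out`, `properties`, `values`, `id`, `label`) are assumptions made for illustration, and `describe_predicate` is a hypothetical helper, not Gremlinator's actual code:

```python
# Illustrative only: the step strings are assumptions, not Gremlinator output.


def describe_predicate(predicate):
    """Map a prefixed predicate such as 'e:created' to an approximate Gremlin step."""
    prefix, sep, name = predicate.partition(":")
    if not sep or not name:
        raise ValueError(f"expected '<prefix>:<name>', got {predicate!r}")
    if prefix == "e":
        return f"out('{name}')"          # out-edge traversal
    if prefix == "p":
        return f"properties('{name}')"   # property traversal
    if prefix == "v":
        # IDs and labels are written with the same pattern as ordinary values.
        if name in ("id", "label"):
            return f"{name}()"
        return f"values('{name}')"       # property-value traversal
    raise ValueError(f"unsupported prefix {prefix!r}")


for p in ("e:created", "p:location", "v:name", "v:id", "v:label"):
    print(p, "->", describe_predicate(p))
```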

Examples

In this section, we present comprehensive examples of SPARQL queries that are currently supported by Gremlinator.

Select All

Select all vertices in the graph.

SELECT * WHERE { }

Match Constant Values

Select all vertices with the label person.

SELECT * WHERE {
  ?person v:label "person" .
}

Select Specific Elements

Select the values of the properties name and age for each person vertex.

SELECT ?name ?age
WHERE {
  ?person v:label "person" .
  ?person v:name ?name .
  ?person v:age ?age .
}

Pattern Matching

Select only those persons who created a project.

SELECT ?name ?age
WHERE {
  ?person v:label "person" .
  ?person v:name ?name .
  ?person v:age ?age .
  ?person e:created ?project .
}
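
What such a basic graph pattern computes can be sketched with a naive matcher that extends variable bindings one triple pattern at a time. This is only an illustration of the pattern-matching semantics, not how Gremlinator or Gremlin evaluates the query. The names and ages follow the console output shown earlier; the vertex IDs and `created` edges are assumptions modelled on the TinkerPop "modern" toy graph:

```python
# Toy triples; vertex IDs and 'created' edges are assumptions for illustration.
TRIPLES = [
    ("v1", "v:label", "person"), ("v1", "v:name", "marko"), ("v1", "v:age", 29),
    ("v2", "v:label", "person"), ("v2", "v:name", "vadas"), ("v2", "v:age", 27),
    ("v4", "v:label", "person"), ("v4", "v:name", "josh"), ("v4", "v:age", 32),
    ("v6", "v:label", "person"), ("v6", "v:name", "peter"), ("v6", "v:age", 35),
    ("v1", "e:created", "v3"), ("v4", "e:created", "v5"),
    ("v4", "e:created", "v3"), ("v6", "e:created", "v3"),
]


def unify(term, value, binding):
    """Bind a ?variable or compare a constant; return False on mismatch."""
    if isinstance(term, str) and term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def match(patterns, triples):
    """Return every variable binding that satisfies all triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)  # copy so failed attempts leave no trace
                if all(unify(t, v, candidate) for t, v in zip(pattern, triple)):
                    extended.append(candidate)
        solutions = extended
    return solutions


query = [
    ("?person", "v:label", "person"),
    ("?person", "v:name", "?name"),
    ("?person", "v:age", "?age"),
    ("?person", "e:created", "?project"),
]
rows = [(b["?name"], b["?age"]) for b in match(query, TRIPLES)]
print(rows)  # [('marko', 29), ('josh', 32), ('josh', 32), ('peter', 35)]
```

In this toy data josh appears twice because he created two projects, and vadas is dropped because no `e:created` pattern matches.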

Filtering

Select only those persons who created a project and are older than 30.

SELECT ?name ?age
WHERE {
  ?person v:label "person" .
  ?person v:name ?name .
  ?person v:age ?age .
  ?person e:created ?project .
    FILTER (?age > 30)
}

Deduplication

Select the distinct names of the created projects.

SELECT DISTINCT ?name
WHERE {
  ?person v:label "person" .
  ?person e:created ?project .
  ?project v:name ?name .
}

Multiple Filters

Select the distinct names of all Java projects created by persons older than 30.

SELECT DISTINCT ?name
WHERE {
  ?person v:label "person" .
  ?person v:age ?age .
  ?person e:created ?project .
  ?project v:name ?name .
  ?project v:lang ?lang .
    FILTER (?age > 30 && ?lang = "java")
}

Pattern Filter(s)

A different way to select all persons who created a project.

SELECT ?name
WHERE {
  ?person v:label "person" .
  ?person v:name ?name .
    FILTER EXISTS { ?person e:created ?project }
}

Select all persons who did not create a project.

SELECT ?name
WHERE {
  ?person v:label "person" .
  ?person v:name ?name .
    FILTER NOT EXISTS { ?person e:created ?project }
}
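
FILTER EXISTS and FILTER NOT EXISTS only test whether the inner pattern has a match; they do not add bindings or multiply rows. A minimal sketch of that semi-join and anti-join behaviour, using toy data in which the `created` edges are assumptions modelled on the TinkerPop "modern" graph:

```python
# Toy data; the 'created' edges are assumptions for illustration.
NAMES = {"v1": "marko", "v2": "vadas", "v4": "josh", "v6": "peter"}
CREATED = {("v1", "v3"), ("v4", "v5"), ("v4", "v3"), ("v6", "v3")}


def has_created(person_id):
    """True if at least one 'created' edge starts at the given person."""
    return any(src == person_id for src, _ in CREATED)


# FILTER EXISTS: keep a person once, however many projects they created.
creators = sorted(name for pid, name in NAMES.items() if has_created(pid))

# FILTER NOT EXISTS: keep only persons without any 'created' edge.
non_creators = sorted(name for pid, name in NAMES.items() if not has_created(pid))

print(creators)      # ['josh', 'marko', 'peter']
print(non_creators)  # ['vadas']
```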

Meta-Property Access

Access the meta-properties of a graph element. A meta-property can be perceived as a reified statement in an RDF graph.

SELECT ?name ?startTime
WHERE {
  ?person v:name "daniel" .
  ?person p:location ?location .
  ?location v:value ?name .
  ?location v:startTime ?startTime
}

Union

Select all persons who have developed software in Java, using UNION.

SELECT * WHERE {
  {?person e:created ?software .}
  UNION
  {?software v:lang "java" .}
}

Optional

Return the persons who have created software in Java and, optionally, also in Python.

SELECT ?person WHERE {
  ?person v:label "person" .
  ?person e:created ?software .
  ?software v:lang "java" .
  OPTIONAL {?software v:lang "python" . }
}
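
OPTIONAL behaves like a left join: a solution of the required part is kept even when the optional part finds no match, in which case the optional variables stay unbound. The sketch below illustrates this with a simpler pattern than the query above (persons, optionally with a project they created). `left_join` and the toy data are hypothetical and only meant to show the semantics:

```python
# Hypothetical helper and toy data, for illustration only.


def left_join(required, optional_for):
    """Extend each required binding with optional matches, or keep it as-is."""
    rows = []
    for binding in required:
        matches = optional_for(binding)
        if matches:
            rows.extend({**binding, **extra} for extra in matches)
        else:
            rows.append(dict(binding))  # optional variables remain unbound
    return rows


persons = [{"?person": "marko"}, {"?person": "vadas"}]
created = {"marko": ["lop"]}  # vadas created nothing in this toy data


def optional_software(binding):
    """Bindings for the optional pattern, given a required binding."""
    return [{"?software": s} for s in created.get(binding["?person"], [])]


result = left_join(persons, optional_software)
print(result)
# [{'?person': 'marko', '?software': 'lop'}, {'?person': 'vadas'}]
```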

Order By

Select all vertices with the label person and order them by their age.

SELECT * WHERE {
  ?person v:label "person" .
  ?person v:age ?age .
} ORDER BY (?age)

Group By

Select all vertices with the label person and group them by their age.

SELECT * WHERE {
  ?person v:label "person" .
  ?person v:age ?age .
} GROUP BY (?age)

Mixed/complex/aggregation-based queries

Count the number of projects which have been created by persons under the age of 30 and group them by age. Return only the top two.

SELECT COUNT(?project) WHERE {
  ?person v:label "person" .
  ?person v:age ?age . FILTER (?age < 30)
  ?person e:created ?project .
} GROUP BY (?age) LIMIT 2
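
The combination of FILTER, GROUP BY, COUNT and LIMIT can be sketched as a filter, a per-group count and a slice. The names and ages follow the console output shown earlier; the `created` edges and the helper `count_projects_by_age` are assumptions for illustration. Unlike the query, the sketch also returns the grouping key so the counts are easier to read:

```python
from collections import Counter

# Toy data; the 'created' edges are assumptions for illustration.
AGES = {"marko": 29, "vadas": 27, "josh": 32, "peter": 35}
CREATED = [("marko", "lop"), ("josh", "ripple"), ("josh", "lop"), ("peter", "lop")]


def count_projects_by_age(ages, created, max_age, limit):
    """Count created projects per creator age, for creators younger than max_age."""
    counts = Counter(ages[person] for person, _ in created if ages[person] < max_age)
    # Counter keeps insertion order; LIMIT simply keeps the first groups.
    return list(counts.items())[:limit]


print(count_projects_by_age(AGES, CREATED, max_age=30, limit=2))  # [(29, 1)]
print(count_projects_by_age(AGES, CREATED, max_age=40, limit=2))  # [(29, 1), (32, 2)]
```

Since the query has no ORDER BY, LIMIT 2 keeps whichever two groups come first rather than the two largest counts.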

STAR-shaped queries

STAR-shaped queries are queries that follow a star-shaped execution plan. In terms of graph traversals, they can be perceived as path queries or neighbourhood queries, for instance, retrieving all the information about a specific person or software.

SELECT ?age ?software ?name ?location ?startTime WHERE {
  ?person v:name "daniel" .
  ?person v:age ?age .
  ?person e:created ?software .
  ?person p:location ?location .
  ?location v:value ?name .
  ?location v:startTime ?startTime
}

Future work

As future work we plan to:

  1. cover all cases of SPARQL queries with variables in predicate position
  2. cover SPARQL 1.1 specification (such as Property Paths)

Acknowledgements

The authors of this project are supported by funding received from the European Union’s Horizon 2020 Research and Innovation program under the Marie Skłodowska Curie Grant Agreement No. 642795 (WDAqua ITN).

We would like to express our gratitude to Mr. Daniel Kupitz, who laid the early foundation of the work that follows. Lastly, we would also like to thank Mr. Stephen Mallette and Dr. Marko Rodriguez for their valuable input and their efforts in enabling the integration of Gremlinator into the Apache TinkerPop framework. Many thanks for getting us started, three cheers :)