Skip to content

Latest commit

 

History

History
324 lines (212 loc) · 12.8 KB

README.md

File metadata and controls

324 lines (212 loc) · 12.8 KB

Sparqlify SPARQL->SQL rewriter

Build Status

Introduction

Sparqlify is a scalable SPARQL-SQL rewriter whose development began in April 2011 in the course of the LinkedGeoData project.

This system's features/traits are:

  • Support of the 'Sparqlification Mapping Language' (SML), an intuitive language for expressing RDB-RDF mappings with only very little syntactic noise.
  • Scalability: Sparqlify does not evaluate expressions in memory. All SPARQL filters end up in the corresponding SQL statement, giving the underlying RDBMS has maximum control over query planning.
  • A powerful rewriting engine that analyzes filter expressions in order to eleminate self joins and joins with unsatisfiable conditions.
  • Initial support for spatial datatypes and predicates.
  • A subset of the SPARQL 1.0 query language plus sub queries are supported.
  • Tested with PostgreSQL/Postgis and H2. Support for further databases is planned.
  • CSV support
  • R2RML will be supported soon

Functions

SPARQL-to-SQL function mappings are specified in the file functions.xml.

Standard SPARQL functions
SPARQL function SQL Definition
boolean strstarts(string, string) strpos($1$, $2$) = 1
TODO
Spatial Function Extensions
SPARQL function SQL Definition
TODO

Supported SPARQL language features

  • Join, LeftJoin (i.e. Optional), Union, Sub queries
  • Filter predicates: comparison: (<=, <, =, >, >=) logical: (!, &&; ||) arithmetic: (+, -) spatial: st_intersects, geomFromText; other: regex, lang, langMatches
  • Aggregate functions: Count(*)
  • Order By is pushed into the SQL

Debian packages

Sparqlify Debian packages can be obtained by following means:

Public repositories

After setting up any of the repositories below, you can install sparqlify with apt using

  • apt: `sudo apt-get install sparqlify-cli

Linked Data Stack (this is what you want)

Sparqlify is distributed at the Linked Data Stack, which offers many great tools done by various contributors of the Semantic Web community.

  • The repository is available in the flavors nightly, testing and stable here.
# !!! Replace stable with nightly or testing as needed !!!

# Download the repository package
wget http://stack.linkeddata.org/ldstable-repository.deb

# Install the repository package
sudo dpkg -i ldstable-repository.deb

# Update the repository database
sudo apt-get update

Bleeding Edge (Not recommended for production)

For the latest development version (built on every commit) perform the following steps

Import the public key with

wget -qO - http://cstadler.aksw.org/repos/apt/conf/packages.precise.gpg.key  | sudo apt-key add -

Add the repository

echo 'deb http://cstadler.aksw.org/repos/apt precise main contrib non-free' | sudo tee -a /etc/apt/sources.list.d/cstadler.aksw.org.list

Note that this also works with distros other than "precise" (ubuntu 12.04) such as ubuntu 14.04 or 16.04.

Building

Building the repository creates the JAR files providing the sparqlify-* tool suite.

One of the plugins requires the xjc command (for compiling an XML schema to Java classes) which is no longer part of the jdk. The following package provides it:

sudo apt install jaxb

Debian package

Building debian packages from this repo relies on the Debian Maven Plugin plugin, which requires a debian-compatible environment. If such an environment is present, the rest is simple:

# Install all shell scripts necessary for creating deb packages
sudo apt-get install devscripts

# Execute the follwing from the `<repository-root>/sparqlify-core` folder:
mvn clean install deb:package

# Upon sucessful completion, the debian package is located under `<repository-root>/sparqlify-core/target`
# Install using `dpkg`
sudo dpkg -i sparqlify_<version>.deb

# Uninstall using dpkg or apt:
sudo dpkg -r sparqlify
sudo apt-get remove sparqlify

Assembly based

Another way to build the project is run the following commands at <repository-root>

mvn clean install

cd sparqlify-cli
mvn assembly:assembly

This will generate a single stand-alone jar containing all necessary dependencies. Afterwards, the shell scripts under sparqlify-core/bin should work.

Tool suite

If Sparqlify was installed from the debian package, the following commands are available system-wide:

  • sparqlify: This is the main executable for running individual SPARQL queries, creating dumps and starting a stand-alone server.
  • sparqlify-csv: This tool can create RDF dumps from CSV file based on SML view definitions.
  • sparqlify-platform: A stand-alone server component integrating additional projects.

These tools write their output (such as RDF data in the N-TRIPLES format) to STDOUT. Log output goes to STDERR.

sparqlify

Usage: sparqlify [options]

Options are:

  • Setup

    • -m SML view definition file
  • Database Connectivity Settings

    • -h Hostname of the database (e.g. localhost or localhost:5432)
    • -d Database name
    • -u User name
    • -p Password
    • -j JDBC URI (mutually exclusive with both -h and -d)
  • Quality of Service

    • -n Maximum result set size
    • -t Maximum query execution time in seconds (excluding rewriting time)
  • Stand-alone Server Configuration

    • -P Server port [default: 7531]
  • Run-Once (these options prevent the server from being started and are mutually exclusive with the server configuration)

    • -D Create an N-TRIPLES RDF dump on STDOUT
    • -Q [SPARQL query] Runs a SPARQL query against the configured database and view definitions

Example

The following command will start the Sparqlify HTTP server on the default port.

sparqlify -h localhost -u postgres -p secret -d mydb -m mydb-mappings.sml -n 1000 -t 30

Agents can now access the SPARQL endpoint at http://localhost:7531/sparql

sparqlify-csv

Usage: sparqlify-csv [options]

  • Setup

    • -m SML view definition file
    • -f Input data file
    • -v View name (can be omitted if the view definition file only contains a single view)
  • CSV Parser Settings

    • -d CSV field delimiter (default is '"')
    • -e CSV field escape delimiter (escapes the field delimiter) (default is '')
    • -s CSV field separator (default is ',')
    • -h Use first row as headers. This option allows one to reference columns by name additionally to its index.

sparqlify-platform (Deprecated; about to be superseded by sparqlify-web-admin)

The Sparqlify Platform (under /sparqlify-platform) bundles Sparqlify with the Linked Data wrapper Pubby and the SPARQL Web interface Snorql.

Usage: sparqlify-platform config-dir [port]

  • config-dir Path to the configuration directory, e.g. <repository-root/sparqlify-platform/config/example>
  • port Port on which to run the platform, default 7531.

For building, at the root of the project (outside of the sparqlify-* directories), run mvn compile to build all modules. Afterwards, lauch the platform using:

cd sparqlify-platform/bin
./sparqlify-platform <path-to-config> <port>

Assuming the platform runs under http://localhost:7531, you can access the following services relative to this base url:

  • /sparql is Sparqlify's SPARQL endpoint
  • /snorql shows the SNORQL web frontend
  • /pubby is the entry point to the Linked Data interface

Configuration

The configDirectory argument is mandatory and must contain a sub-directory for the context-path (i.e. sparqlify-platform) in turn contains the files:

  • platform.properties This file contains configuration parameters that can be adjusted, such as the database connection.
  • views.sparqlify The set of Sparqlify view definition to use.

I recommend to first create a copy of the files in /sparqlify-platform/config/example under a different location, then adjust the parameters and finally launch the platform with -DconfigDirectory=... set appropriately.

The platform applies autoconfiguration to Pubby and Snorql:

  • Snorql: Namespaces are those of the views.sparqlify file.
  • Pubby: The host name of all resources generated in the Sparqlify views is replaced with the URL of the platform (currently still needs to be configured via platform.properties)

Additionally you probably want to make the URIs nice by e.g. configuring an apache reverse proxy:

Enable the apache proxy_http module:

sudo a2enmod proxy_http

Then in your /etc/apache2/sites-available/default add lines such as

ProxyRequest Off
ProxyPass /resource http://localhost:7531/pubby/bizer/bsbm/v01/ retry=1
ProxyPassReverse /resource http://localhost:7531/pubby/bizer/bsbm/v01/

These entries will enable requests to http://localhost/resource/... rather than http//localhost:7531/pubby/bizer/bsbm/v01/.

The retry=1 means, that apache only waits 1 seconds before retrying again when it encounters an error (e.g. HTTP code 500) from the proxied resource.

IMPORTANT: ProxyRequests are off by default; DO NOT ENABLE THEM UNLESS YOU KNOW WHAT YOU ARE DOING. Simply enabling them potentially allows anyone to use your computer as a proxy.

SML Mapping Syntax:

A Sparqlification Mapping Language (SML) configuration is essentially a set of CREATE VIEW statements, somewhat similar to the CREATE VIEW statement from SQL. Probably the easiest way to learn to syntax is to look at the following resources:

Two more examples are from

Additionally, for convenience, prefixes can be declared, which are valid throughout the config file. As comments, you can use //, /* */, and #.

For a first impression, here is a quick example:

/* This is a comment
 * /* You can even nest them! */
 */
// Prefixes are valid throughout the file
Prefix dbp:<http://dbpedia.org/ontology/>
Prefix ex:<http://ex.org/>

Create View myFirstView As
    Construct {
        ?s a dbp:Person .
        ?s ex:workPage ?w .
    }
With
    ?s = uri('http://mydomain.org/person', ?id) // Define ?s to be an URI generated from the concatenation of a prefix with mytable's id-column.
    ?w = uri(?work_page) // ?w is assigned the URIs in the column 'work_page' of 'mytable'
Constrain
    ?w prefix "http://my-organization.org/user/" // Constraints can be used for optimization, e.g. to prune unsatisfiable join conditions
From
    mytable; // If you want to use an SQL query, the query (without trailing semicolon) must be enclosed in double square brackets: [[SELECT id, work_page FROM mytable]]

Notes for sparqlify-csv

For sparqlify-csv view definition syntax is almost the same as above; the differences being:

  • Instead of Create View viewname As Construct start your views with CREATE VIEW TEMPLATE viewname As Construct
  • There is no FROM and CONSTRAINT clause

Colums can be referenced either by name (see the -h option) or by index (1-based).

Example

// Assume a CSV file with the following columns (osm stands for OpenStreetMap)
(city\_name, country\_name, osm\_entity\_type, osm\_id, longitude, latitude)

Prefix fn:<http://aksw.org/sparqlify/> //Needed for urlEncode and urlDecode.
Prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
Prefix owl:<http://www.w3.org/2002/07/owl#>
Prefix xsd:<http://www.w3.org/2001/XMLSchema#>
Prefix geo:<http://www.w3.org/2003/01/geo/wgs84_pos#>

Create View Template geocode As
  Construct {
    ?cityUri
      owl:sameAs ?lgdUri .

    ?lgdUri
      rdfs:label ?cityLabel ;
      geo:long ?long ;
      geo:lat ?lat .
  }
  With
    ?cityUri = uri(concat("http://fp7-pp.publicdata.eu/resource/city/", fn:urlEncode(?2), "-", fn:urlEncode(?1)))
    ?cityLabel = plainLiteral(?1)
    ?lgdUri = uri(concat("http://linkedgeodata.org/triplify/", ?4, ?5))
    ?long = typedLiteral(?6, xsd:float)
    ?lat = typedLiteral(?7, xsd:float)