
europeana/sparql-updater


SPARQL updater

Software to automatically fill a Virtuoso database with Europeana datasets from the Europeana FTP server and to update these datasets regularly.

Steps to build and start a Virtuoso Docker image

  1. Check whether the Virtuoso buffer settings in Dockerfile_virtuoso are appropriate for the server this will run on (see also the Virtuoso performance tuning tutorial).
  2. Run docker build -t europeana/sparql-virtuoso -f Dockerfile_virtuoso . (the final dot sets the build context to the current directory). This creates a Docker image for Virtuoso with the relevant settings.
  3. Check that the settings in virtuoso/docker-compose-virtuoso.yml are correct. A proper RAM limit should be set, and the database path should point to an existing Virtuoso database; otherwise a new, empty database is created on start-up.
  4. Start the container (docker compose -f docker-compose-virtuoso.yml up). The Virtuoso GUI will then be available at http://localhost:8890/
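When scripting the startup, it can help to wait until Virtuoso actually answers on port 8890 before doing anything else. The following is a minimal sketch using the JDK's java.net.http client; it is not part of this repository, and the class name VirtuosoReadiness, the timeouts and the polling interval are illustrative assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper (not in this repository): polls a URL until the server answers or time runs out.
class VirtuosoReadiness {

    /** Returns true as soon as the URL answers with a status below 500, false once the timeout expires. */
    static boolean waitUntilUp(String url, Duration timeout, Duration pollInterval) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                HttpResponse<Void> response =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                if (response.statusCode() < 500) {
                    return true; // the HTTP server inside the container is answering
                }
            } catch (IOException e) {
                // Connection refused or timed out: Virtuoso is most likely still starting up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up waiting
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        boolean up = waitUntilUp("http://localhost:8890/", Duration.ofMinutes(2), Duration.ofSeconds(5));
        System.out.println(up ? "Virtuoso is up" : "Virtuoso did not respond within 2 minutes");
    }
}
```

Treating any status below 500 as "up" is a deliberate simplification: it only confirms that Virtuoso's HTTP listener is running, not that a particular dataset has finished loading.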

Steps to build and start a Virtuoso Docker image including the sparql-updater

  1. Make a copy of /src/main/resources/sparql-updater.properties and name it sparql-updater.user.properties. Configuration options defined in this file override those defined in sparql-updater.properties. Check that all values are correct.
  2. Run mvn clean package to create /target/sparql-updater.jar. This jar contains the code that automatically loads datasets from the Europeana FTP server and writes them to Virtuoso.
  3. Run docker build -t europeana/sparql-virtuoso-updater -f Dockerfile_updater . (the final dot is the build context). This creates a Docker image containing both Virtuoso and the built sparql-updater.jar. The jar includes the sparql-updater.user.properties file, so don't push this image to Docker Hub!
  4. Check that the settings in virtuoso/docker-compose-updater.yml are correct, then use the file to start the container (docker compose -f docker-compose-updater.yml up).
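The override rule in step 1 behaves like layered java.util.Properties: keys defined in sparql-updater.user.properties win, and anything the user file does not define falls back to sparql-updater.properties. The sketch below illustrates that rule with the JDK's Properties defaults mechanism. It is an assumption about the behaviour, not the project's actual loading code; the class name LayeredConfig and the example keys are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical illustration of the override rule: user values win, missing keys fall back.
class LayeredConfig {

    /** Loads the defaults, then loads the user overrides into a Properties object backed by them. */
    static Properties load(Reader defaults, Reader userOverrides) throws IOException {
        Properties base = new Properties();
        base.load(defaults);
        // getProperty() on 'merged' checks its own entries first and then falls back to 'base'
        Properties merged = new Properties(base);
        if (userOverrides != null) {
            merged.load(userOverrides);
        }
        return merged;
    }

    /** Reads both files from the classpath (e.g. inside the built jar); the user file is optional. */
    static Properties loadFromClasspath() throws IOException {
        ClassLoader cl = LayeredConfig.class.getClassLoader();
        try (InputStream d = cl.getResourceAsStream("sparql-updater.properties");
             InputStream u = cl.getResourceAsStream("sparql-updater.user.properties")) {
            if (d == null) {
                throw new IOException("sparql-updater.properties not found on the classpath");
            }
            // ISO-8859-1 matches the encoding Properties.load(InputStream) assumes
            Reader userReader = (u == null) ? null : new InputStreamReader(u, StandardCharsets.ISO_8859_1);
            return load(new InputStreamReader(d, StandardCharsets.ISO_8859_1), userReader);
        }
    }

    public static void main(String[] args) throws IOException {
        Properties cfg = load(new StringReader("example.key=default\nother.key=kept\n"),
                              new StringReader("example.key=overridden\n"));
        System.out.println(cfg.getProperty("example.key")); // overridden
        System.out.println(cfg.getProperty("other.key"));   // kept
    }
}
```

With this approach, values must be read through getProperty(): the inherited Hashtable.get() does not consult the defaults table.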

Some things to be aware of:

  • Loading all Europeana datasets into Virtuoso requires around 150 GB of disk space!
  • For local testing we use a hard-coded password (see the DBA_PASSWORD variable in docker-compose-updater.yml and docker-compose-virtuoso.yml). For production, change the credentials in these .yml files and in the updater's sparql-updater.user.properties file.
  • On startup, a folder named /database is automatically created relative to the startup location. It contains the Virtuoso database files as well as a folder named tmp-ingest, which holds files downloaded from the FTP server and files generated by the sparql-updater for ingestion. These files are deleted automatically once they are no longer needed.
  • You can check which datasets are loaded using this SPARQL query: SELECT DISTINCT ?g WHERE { GRAPH ?g {?s a ?o} }
  • You can use the DELETE_VIRTUOSO_DB=true environment variable to clear the Virtuoso database on startup.
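To run the dataset query from code rather than from the Virtuoso web interface, the SPARQL 1.1 Protocol allows sending it as a URL-encoded query parameter in an HTTP GET request. The sketch below assumes the endpoint is Virtuoso's usual http://localhost:8890/sparql and asks for JSON results via the Accept header; the class name LoadedGraphs is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client that lists loaded datasets (named graphs) via a SPARQL Protocol GET request.
class LoadedGraphs {

    static final String QUERY = "SELECT DISTINCT ?g WHERE { GRAPH ?g {?s a ?o} }";

    /** Builds the GET request; the endpoint URL is an assumption (Virtuoso's usual /sparql path). */
    static HttpRequest buildRequest(String endpoint, String query) {
        // URLEncoder produces form encoding ('+' for spaces); switch to %20 for a plain query string.
        // Literal '+' characters were already encoded as %2B, so only spaces are affected.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8).replace("+", "%20");
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildRequest("http://localhost:8890/sparql", QUERY),
                HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body()); // JSON result set with one ?g binding per graph
    }
}
```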

If you make (configuration) changes to the sparql-updater, don't forget to:

  1. Rebuild the jar
  2. Rebuild the Docker image
  3. Recreate the container

About

Software to automatically update the Virtuoso DB
