Software to automatically fill a Virtuoso DB with Europeana datasets from the Europeana FTP server and update the datasets regularly.
- Check whether the Virtuoso buffer settings in `Dockerfile_virtuoso` are appropriate for the server this will run on (see also the Virtuoso performance tuning tutorial).
- Run `docker build -t europeana/sparql-virtuoso -f Dockerfile_virtuoso .` This creates a Docker image for Virtuoso with the relevant settings.
- Check that the settings in `virtuoso/docker-compose-virtuoso.yml` are correct. A proper RAM limit should be set, and the database path should point to an existing Virtuoso database (otherwise a new, empty database is created on startup).
- Start the container with `docker compose -f docker-compose-virtuoso.yml up`. The Virtuoso GUI will be available at http://localhost:8890/
- Make a copy of `/src/main/resources/sparql-updater.properties` and name it `sparql-updater.user.properties`. Configuration options defined in this file override those defined in `sparql-updater.properties`. Check that all values are correct.
- Run `mvn clean package` to create `/target/sparql-updater.jar`. This jar contains the code that automatically loads datasets from the Europeana FTP server and writes them to Virtuoso.
- Run `docker build -t europeana/sparql-virtuoso-updater -f Dockerfile_updater .` This creates a Docker image containing both Virtuoso and the built `sparql-updater.jar`. The jar includes your `sparql-updater.user.properties` file, so don't push this image to Docker Hub!
- Check that the settings in `virtuoso/docker-compose-updater.yml` are correct, then use the file to start the container (`docker compose -f docker-compose-updater.yml up`).
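The override rule for `sparql-updater.user.properties` can be pictured with plain `java.util.Properties`. This is a minimal sketch, not the updater's actual code: it assumes both files are read from the classpath (the jar bundles the user file, as noted above) and that user values simply replace defaults key by key. The class `LayeredConfig` and any property keys you try it with are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper illustrating "user properties override default properties".
class LayeredConfig {

    /** Copies the defaults, then lets every key present in the user file replace its default. */
    static Properties merge(Properties defaults, Properties user) {
        Properties merged = new Properties();
        merged.putAll(defaults);
        merged.putAll(user); // applied last, so user values win on key clashes
        return merged;
    }

    /** Reads a properties file from the classpath; returns empty Properties if it is absent. */
    static Properties fromClasspath(String name) throws IOException {
        Properties props = new Properties();
        try (InputStream in = LayeredConfig.class.getClassLoader().getResourceAsStream(name)) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }

    /** Parses properties from a string (handy for trying out the merge rule). */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties defaults = fromClasspath("sparql-updater.properties");
        Properties user = fromClasspath("sparql-updater.user.properties");
        merge(defaults, user).list(System.out); // print the effective configuration
    }
}
```

Keys missing from the user file keep their default value, which is why the user file only needs the settings you actually want to change.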
Some things to be aware of:
- Loading all Europeana datasets in Virtuoso will require around 150GB of disk space!
- For local testing purposes we use a hard-coded password (see the `DBA_PASSWORD` variable in the `docker-compose-updater.yml` and `docker-compose-virtuoso.yml` files). For production, change the credentials in these .yml files and in the updater's `sparql-updater.user.properties` file.
- After startup a folder named `/database` is automatically created relative to the startup location. This folder contains the Virtuoso database files, as well as a folder named `tmp-ingest` that stores files downloaded from the FTP server and files generated by the sparql-updater for ingestion. These files are automatically deleted when they are no longer needed.
- You can check which datasets are loaded using this SPARQL query: `SELECT DISTINCT ?g WHERE { GRAPH ?g {?s a ?o} }`
- You can set the `DELETE_VIRTUOSO_DB=true` environment variable to clear the Virtuoso database on startup.
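The dataset check can also be run from code instead of the GUI. The sketch below sends the query above using Java 11's `java.net.http.HttpClient`. It assumes Virtuoso's SPARQL endpoint is reachable at `http://localhost:8890/sparql` (same port as the GUI) and accepts a GET request with a URL-encoded `query` parameter, as the SPARQL 1.1 Protocol describes; `ListLoadedGraphs` is a hypothetical helper, not part of the sparql-updater.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: lists the named graphs (loaded datasets) in Virtuoso.
class ListLoadedGraphs {

    // Assumption: default Virtuoso SPARQL endpoint on the port exposed by docker-compose.
    static final String ENDPOINT = "http://localhost:8890/sparql";

    static final String QUERY = "SELECT DISTINCT ?g WHERE { GRAPH ?g {?s a ?o} }";

    /** Builds a SPARQL-protocol GET URI with the query form-URL-encoded. */
    static URI buildQueryUri(String endpoint, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded);
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(buildQueryUri(ENDPOINT, QUERY))
                .header("Accept", "application/sparql-results+json") // ask for JSON results
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body()); // one binding of ?g per loaded graph
    }
}
```

Encoding the query with `URLEncoder` keeps characters such as `?`, `{` and spaces from breaking the URI.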
If you make (configuration) changes to the sparql-updater, don't forget to:
- Rebuild the jar
- Rebuild the Docker image
- Recreate the container