# mex-extractors

ETL pipelines for the RKI Metadata Exchange.
## project

The Metadata Exchange (MEx) project is committed to improving the retrieval of RKI research data and projects. How? By focusing on metadata: instead of providing the actual research data directly, the MEx metadata catalog captures descriptive information about research data and activities. On this basis, we want to make the data FAIR[^1] so that it can be shared with others.
Via MEx, metadata will be made findable, accessible and shareable, as well as available for further research. The goal is to get an overview of what research data is available, understand its context, and know what needs to be considered for subsequent use.
RKI cooperated with D4L data4life gGmbH for a pilot phase where the vision of a FAIR metadata catalog was explored and concepts and prototypes were developed. The partnership has ended with the successful conclusion of the pilot phase.
After an internal launch, the metadata will also be made publicly available, enabling external researchers as well as the interested (professional) public to find research data from the RKI.
For further details, please consult our project page.
**Contact**\
For more information, please feel free to email us at [email protected].

**Publisher**\
Robert Koch-Institut\
Nordufer 20\
13353 Berlin\
Germany
## package

The `mex-extractors` package implements a variety of ETL pipelines to extract metadata from primary data sources using a range of different technologies and protocols. Then, we transform the metadata into a standardized format using models provided by `mex-common`. The last step in this process is to load the harmonized metadata into a sink (file output, API upload, etc.).
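To make the extract, transform, load split concrete, here is a minimal sketch of that flow. It is illustrative only: the helper names are hypothetical, and the real pipelines use the typed models and sinks provided by `mex-common`.

```python
# Illustrative ETL sketch; function names are hypothetical and do not
# mirror the actual mex-extractors or mex-common APIs.
from collections.abc import Iterable


def extract_records() -> Iterable[dict]:
    """Extract raw metadata from a primary source (file, API, database)."""
    yield {"title": " Example study ", "contact": "[email protected]"}


def transform_record(raw: dict) -> dict:
    """Transform a raw record into a standardized shape."""
    return {"title": raw["title"].strip(), "contactEmail": raw["contact"]}


def load_records(records: Iterable[dict]) -> None:
    """Load harmonized records into a sink (stand-in: print to stdout)."""
    for record in records:
        print(record)


if __name__ == "__main__":
    load_records(transform_record(raw) for raw in extract_records())
```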
## license

This package is licensed under the MIT license. All other software components of the MEx project are open-sourced under the same license as well.
## development

### installation

- install python 3.11 on your system
- on unix, run `make install`
- on windows, run `.\mex.bat install`
### linting and testing

- run all linters with `make lint` or `.\mex.bat lint`
- run unit and integration tests with `make test` or `.\mex.bat test`
- run just the unit tests with `make unit` or `.\mex.bat unit`
### updating dependencies

- update boilerplate files with `cruft update`
- update global requirements in `requirements.txt` manually
- update git hooks with `pre-commit autoupdate`
- update package dependencies using `pdm update-all`
- update github actions in `.github/workflows/*.yml` manually
### creating a release

- run `pdm release RULE` to release a new version, where `RULE` determines which part of the version to update and is one of `major`, `minor`, `patch` (illustrated below)
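As a quick illustration of what those rules mean, the sketch below mirrors the general semantic-versioning convention; it does not show the internals of the `pdm release` command.

```python
# Illustration of semantic-versioning bump rules; this mirrors the general
# semver convention, not the internals of `pdm release`.
def bump(version: str, rule: str) -> str:
    major, minor, patch = map(int, version.split("."))
    if rule == "major":
        return f"{major + 1}.0.0"
    if rule == "minor":
        return f"{major}.{minor + 1}.0"
    if rule == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown rule: {rule}")


assert bump("1.2.3", "major") == "2.0.0"
assert bump("1.2.3", "minor") == "1.3.0"
assert bump("1.2.3", "patch") == "1.2.4"
```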
### container workflow

- build the image with `make image`
- run directly using docker with `make run`
- start with docker compose via `make start`
## commands

- run `pdm run {command} --help` to print instructions
- run `pdm run {command} --debug` for interactive debugging (see the sketch below)
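For readers unfamiliar with the pattern, here is a hypothetical sketch of how a command-line entrypoint can support such a debug flag. Only the `--debug` flag name is taken from the commands above; the implementation is illustrative and not taken from this repository.

```python
# Hypothetical entrypoint sketch; only the --debug flag name comes from
# the commands above, the rest is illustrative.
import argparse
import pdb
import sys


def run_pipeline() -> None:
    raise RuntimeError("example failure to demonstrate --debug")


def main() -> int:
    parser = argparse.ArgumentParser(description="Run one extractor pipeline.")
    parser.add_argument("--debug", action="store_true", help="drop into pdb on errors")
    args = parser.parse_args()
    try:
        run_pipeline()  # stand-in for the actual extractor logic
    except Exception:
        if args.debug:
            pdb.post_mortem()  # interactive debugging at the point of failure
            return 1
        raise
    return 0


if __name__ == "__main__":
    sys.exit(main())
```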
### dagster

- run `pdm run dagster dev` to launch a local dagster UI (a minimal job sketch follows)
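As rough orientation, a dagster pipeline wires individual steps (`op`s) into a `job`. The example below is a self-contained sketch under that assumption and does not reproduce any pipeline from this repository; a job like this is what the local dagster UI visualizes and executes.

```python
# Minimal dagster sketch: three ops wired into one job. The step bodies
# are placeholders, not actual mex-extractors logic.
from dagster import job, op


@op
def extract() -> list[dict]:
    return [{"title": "example study"}]


@op
def transform(raw: list[dict]) -> list[dict]:
    return [{"title": item["title"].title()} for item in raw]


@op
def load(records: list[dict]) -> None:
    for record in records:
        print(record)


@job
def example_pipeline():
    load(transform(extract()))
```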
### extractor pipelines

- `pdm run all-extractors` executes all extractors (execute only in a local or dev environment)
- `pdm run artificial` creates deterministic artificial sample data (execute only in a local or dev environment)
- `pdm run biospecimen` extracts sources from the Biospecimen Excel files
- `pdm run blueant` extracts sources from the Blue Ant project management software
- `pdm run confluence-vvt` extracts sources from the VVT confluence page
- `pdm run consent-mailer` sends emails to collect publishing consents
- `pdm run contact-point` extracts default contact points
- `pdm run datscha-web` extracts sources from the datscha web app
- `pdm run endnote` extracts sources from EndNote XML files
- `pdm run ff-projects` extracts sources from the FF Projects Excel file
- `pdm run ifsg` extracts sources from the IFSG database
- `pdm run international-projects` extracts sources from the international projects Excel file
- `pdm run grippeweb` extracts grippeweb metadata from the grippeweb database
- `pdm run odk` extracts ODK survey data from Excel files
- `pdm run open-data` extracts Open Data sources from the Zenodo API
- `pdm run seq-repo` extracts sources from the seq-repo JSON file
- `pdm run sumo` extracts sumo data from xlsx files
- `pdm run synopse` extracts synopse data from report-server exports
- `pdm run voxco` extracts voxco data from voxco JSON files
- `pdm run publisher` gets merged items from the backend and publishes them to the configured sink (see the sketch below)
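To give a sense of what the publisher step does, the sketch below fetches items from a backend endpoint and writes them to a file sink as line-delimited JSON. The endpoint path, response shape, and file format are assumptions for illustration, not the actual mex-backend API.

```python
# Hypothetical publisher sketch: the endpoint path and response shape are
# assumptions, not the actual mex-backend API.
import json
import urllib.request


def fetch_merged_items(base_url: str) -> list[dict]:
    """Fetch merged items from a (hypothetical) backend endpoint."""
    with urllib.request.urlopen(f"{base_url}/merged-items") as response:
        return json.load(response)["items"]


def publish_to_file_sink(items: list[dict], path: str) -> None:
    """Write items to a file sink as newline-delimited JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        for item in items:
            handle.write(json.dumps(item) + "\n")


if __name__ == "__main__":
    publish_to_file_sink(fetch_merged_items("http://localhost:8080"), "merged.ndjson")
```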
[^1]: FAIR refers to the so-called FAIR data principles – guidelines to make data Findable, Accessible, Interoperable and Reusable.