Description
As part of developing a provenance recording solution for the REF (Climate-REF/climate-ref#26 and Climate-REF/climate-ref#162), @aspinuso suggested that we write down a provenance template describing how ESMValCore records provenance, to facilitate discussion (see here for an introduction and here and here for IPCC examples). A general introduction to ESMValCore provenance is available here.
Here is a first attempt at a provenance template in PROV-N format:
document
  prefix var <http://openprovenance.org/var#>
  prefix attribute <https://www.esmvaltool.org/attribute>
  prefix preprocessor <https://www.esmvaltool.org/preprocessor>

  activity(var:diagnosticTask, -, -)
  activity(var:preprocessingTask, -, -)
  activity(var:software, -, -)

  agent(
    var:diagnosticAuthor,
    [
      attribute:email='var:emailDiagnosticAuthor',
      attribute:orcid='var:orcidDiagnosticAuthor',
      attribute:github='var:githubDiagnosticAuthor',
      attribute:institute='var:instituteDiagnosticAuthor'
    ]
  )
  agent(
    var:recipeAuthor,
    [
      attribute:email='var:emailRecipeAuthor',
      attribute:orcid='var:orcidRecipeAuthor',
      attribute:github='var:githubRecipeAuthor',
      attribute:institute='var:instituteRecipeAuthor'
    ]
  )
  agent(var:project)

  entity(
    var:inputFile,
    [
      attribute:Conventions='var:inputFileConventions',
      attribute:branch_time='var:inputFileBranchTime',
      attribute:cmor_version='var:inputFileCMORVersion',
      attribute:model_id='var:inputFileModelId',
      ...
    ]
  )
  entity(
    var:preprocessedFile,
    [
      attribute:Conventions='var:inputFileConventions',
      attribute:branch_time='var:inputFileBranchTime',
      attribute:cmor_version='var:inputFileCMORVersion',
      attribute:model_id='var:inputFileModelId',
      ...
      preprocessor:regrid='var:regridPreprocessorSettings',
      preprocessor:convert_units='var:convertUnitsPreprocessorSettings',
      ...
    ]
  )
  entity(
    var:resultFile,
    [
      attribute:caption='var:resultCaption',
      attribute:domains='var:resultDomains',
      attribute:realm='var:resultRealm',
      attribute:references='var:resultReferences',
      ...
    ]
  )
  entity(
    var:recipe,
    [
      attribute:description='var:recipeDescription',
      attribute:references='var:recipeReferences'
    ]
  )

  wasDerivedFrom(var:preprocessedFile, var:inputFile, var:preprocessingTask, -, -)
  wasDerivedFrom(var:resultFile, var:preprocessedFile, var:diagnosticTask, -, -)
  wasAttributedTo(var:recipe, var:recipeAuthor)
  wasAttributedTo(var:recipe, var:project)
  wasAttributedTo(var:resultFile, var:recipeAuthor)
  wasAttributedTo(var:resultFile, var:diagnosticAuthor)
  wasStartedBy(var:preprocessingTask, var:recipe, var:software, -)
  wasStartedBy(var:diagnosticTask, var:recipe, var:software, -)
endDocument
Note that this describes the current implementation, which may not be optimal.
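To make the role of the `var:` placeholders concrete, here is a minimal sketch of how a fragment of the template above could be instantiated with bindings. It relies on plain string substitution, which is a simplification and not the expansion algorithm used by actual PROV-template tooling; the `expand` helper, the `ex` prefix, and the example bindings are all hypothetical.

```python
import re

# A small fragment of the template above, kept short for illustration.
TEMPLATE = """\
entity(var:preprocessedFile)
entity(var:inputFile)
activity(var:preprocessingTask, -, -)
wasDerivedFrom(var:preprocessedFile, var:inputFile, var:preprocessingTask, -, -)
"""

# Hypothetical bindings: each template variable maps to a concrete identifier.
BINDINGS = {
    "preprocessedFile": "ex:preprocessed_tas",
    "inputFile": "ex:tas_input",
    "preprocessingTask": "ex:preprocess_tas",
}

# Matches placeholders such as "var:inputFile" and captures the variable name.
VAR_PATTERN = re.compile(r"\bvar:(\w+)")


def expand(template: str, bindings: dict) -> str:
    """Replace every ``var:name`` placeholder with its bound identifier.

    Raises KeyError if the template uses a variable that has no binding.
    """
    return VAR_PATTERN.sub(lambda match: bindings[match.group(1)], template)


expanded = expand(TEMPLATE, BINDINGS)
print(expanded)
```

A real expansion would likely also need to handle cases this sketch ignores, such as several values bound to the same variable.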
The items in the `attribute` namespace are pretty much free-form. For datasets like CMIP5, CMIP6, obs4MIPs, and CORDEX, controlled vocabularies prescribe the required global attributes, but for other data this is not the case, and even where a controlled vocabulary is prescribed, experience has shown that data often does not comply with it. The `attribute`s of the `resultFile` will always contain certain items, but users may add more by specifying them in the recipe under the diagnostic script. To make linked data 'work', using a proper namespace as suggested in #649 (review) would be nice, but this is challenging because of the variation in available attributes.
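As an illustration of why the variation in attributes is awkward, the following sketch checks a file's global attributes against a required set and then records whatever attributes are present under the `attribute` prefix. This is only a sketch under stated assumptions: `REQUIRED_ATTRIBUTES` is a hypothetical, deliberately tiny stand-in for a controlled vocabulary, and the helper names and example values are made up for this example rather than taken from ESMValCore.

```python
# Hypothetical, deliberately tiny stand-in for a controlled vocabulary.
# Real vocabularies prescribe many more global attributes than this.
REQUIRED_ATTRIBUTES = {"Conventions", "institution_id", "source_id"}


def missing_attributes(file_attributes, required=REQUIRED_ATTRIBUTES):
    """Return the sorted names of required attributes absent from a file."""
    return sorted(required - set(file_attributes))


def to_prov_attributes(file_attributes, prefix="attribute"):
    """Map free-form global attributes to ``prefix:name`` keys.

    Values are converted to strings so that any attribute can be recorded,
    whether or not it belongs to a controlled vocabulary.
    """
    return {
        f"{prefix}:{name}": str(value)
        for name, value in file_attributes.items()
    }


# Made-up global attributes, as they might be read from an input file.
attrs = {"Conventions": "CF-1.7", "source_id": "ExampleModel", "branch_time": 0.0}

print(missing_attributes(attrs))
print(to_prov_attributes(attrs))
```

Because every attribute is accepted as-is, non-compliant files can still be recorded, at the cost of attribute names that do not map onto a shared vocabulary.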
More complicated templates than the above are possible. For example, when using a multi-model preprocessor function like `multi_model_statistics`, there will be an extra intermediate `preprocessedFile` in the provenance record, and when using the `ancestors` feature, `resultFile`s may be derived from other `resultFile`s as well as from `preprocessedFile`s.
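The longer derivation chains described here can be pictured as a small graph. The sketch below models the derivation relations as a plain dictionary and collects everything a given result was derived from; all file names are hypothetical, and this is not how ESMValCore stores provenance internally.

```python
# Hypothetical derivation graph: each file maps to the files it was derived from.
# "multi_model_mean" stands for the extra intermediate preprocessedFile produced
# by a multi-model preprocessor function; "result_b" lists "result_a" as an
# ancestor in addition to a preprocessed file.
DERIVED_FROM = {
    "preproc_model1": ["input_model1"],
    "preproc_model2": ["input_model2"],
    "multi_model_mean": ["preproc_model1", "preproc_model2"],
    "result_a": ["multi_model_mean"],
    "result_b": ["result_a", "preproc_model1"],
}


def all_ancestors(name, derived_from):
    """Return every file that `name` was directly or indirectly derived from."""
    seen = set()
    stack = list(derived_from.get(name, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited via another derivation path
        seen.add(current)
        stack.extend(derived_from.get(current, []))
    return seen


print(sorted(all_ancestors("result_b", DERIVED_FROM)))
```

A single template with one `inputFile`, one `preprocessedFile`, and one `resultFile` cannot express such a chain directly, which is why these cases need a more elaborate template.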