provenance template

As part of developing a provenance recording solution for the REF (see https://github.com/Climate-REF/climate-ref/issues/26 and https://github.com/Climate-REF/climate-ref/issues/162), @aspinuso suggested that, to facilitate discussion, we write down a [provenance template](https://openprovenance.org/prov-template-2014-06-07/) describing how ESMValCore records provenance. See [this quick start](https://lucmoreau.wordpress.com/2017/03/30/prov-template-a-quick-start/) for an introduction to templates, and [this template](https://github.com/swirrl-api/ipccfiguresgenprov-f41fb836/blob/master/IPCCFigure.template.provn) and [this one](https://github.com/edsml-mh1123/ar7-wg1-fod-ch99-fig99-Python/blob/main/IPCCFigure3_11.provn) for IPCC examples. A general introduction to ESMValCore provenance is available [in the documentation](https://docs.esmvaltool.org/en/latest/community/diagnostic.html#recording-provenance).

Here is a first attempt at a provenance template in [PROV-N](https://www.w3.org/TR/2013/REC-prov-n-20130430/) format:
```
document
  prefix var <http://openprovenance.org/var#>
  prefix attribute <https://www.esmvaltool.org/attribute>
  prefix preprocessor <https://www.esmvaltool.org/preprocessor>

  activity(var:diagnosticTask, -, -)
  activity(var:preprocessingTask, -, -)
  activity(var:software, -, -)
  agent(
    var:diagnosticAuthor,
    [
      attribute:email='var:emailDiagnosticAuthor',
      attribute:orcid='var:orcidDiagnosticAuthor',
      attribute:github='var:githubDiagnosticAuthor',
      attribute:institute='var:instituteDiagnosticAuthor'
    ]
  )
  agent(
    var:recipeAuthor,
    [
      attribute:email='var:emailRecipeAuthor',
      attribute:orcid='var:orcidRecipeAuthor',
      attribute:github='var:githubRecipeAuthor',
      attribute:institute='var:instituteRecipeAuthor'
    ]
  )
  agent(var:project)
  entity(
    var:inputFile,
    [
      attribute:Conventions='var:inputFileConventions',
      attribute:branch_time='var:inputFileBranchTime',
      attribute:cmor_version='var:inputFileCMORVersion',
      attribute:model_id='var:inputFileModelId',
      ...
    ]
  )
  entity(
    var:preprocessedFile,
    [
      attribute:Conventions='var:inputFileConventions',
      attribute:branch_time='var:inputFileBranchTime',
      attribute:cmor_version='var:inputFileCMORVersion',
      attribute:model_id='var:inputFileModelId',
      ...
      preprocessor:regrid='var:regridPreprocessorSettings',
      preprocessor:convert_units='var:convertUnitsPreprocessorSettings',
      ...
    ]
  )
  entity(
    var:resultFile,
    [
      attribute:caption='var:resultCaption',
      attribute:domains='var:resultDomains',
      attribute:realm='var:resultRealm',
      attribute:references='var:resultReferences',
      ...
    ]
  )
  entity(
    var:recipe,
    [
      attribute:description='var:recipeDescription',
      attribute:references='var:recipeReferences'
    ]
  )
  wasDerivedFrom(var:preprocessedFile, var:inputFile, var:preprocessingTask, -, -)
  wasDerivedFrom(var:resultFile, var:preprocessedFile, var:diagnosticTask, -, -)
  wasAttributedTo(var:recipe, var:recipeAuthor)
  wasAttributedTo(var:recipe, var:project)
  wasAttributedTo(var:resultFile, var:recipeAuthor)
  wasAttributedTo(var:resultFile, var:diagnosticAuthor)
  wasStartedBy(var:preprocessingTask, var:recipe, var:software, -)
  wasStartedBy(var:diagnosticTask, var:recipe, var:software, -)
endDocument
```
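To make the role of the `var:` placeholders concrete, here is a minimal Python sketch of how a template could be turned into a provenance record by substituting bindings. It is a plain text substitution written only for illustration: it is not the expansion algorithm defined by PROV-Template and not how ESMValCore produces its records. The template fragment, the `ex:` identifiers, and all bound values are hypothetical.

```python
import re

# Small fragment using the same `var:` placeholders as the template above.
TEMPLATE = """\
entity(var:recipe, [attribute:description='var:recipeDescription'])
agent(var:recipeAuthor, [attribute:orcid='var:orcidRecipeAuthor'])
wasAttributedTo(var:recipe, var:recipeAuthor)
"""

# Hypothetical bindings; the identifiers and values are made up.
BINDINGS = {
    "recipe": "ex:recipe_example",
    "recipeDescription": '"An example recipe"',
    "recipeAuthor": "ex:author_jane_doe",
    "orcidRecipeAuthor": '"0000-0000-0000-0000"',
}

# Matches quoted attribute-value variables ('var:name') and bare ones (var:name).
VAR_PATTERN = re.compile(r"'var:(\w+)'|var:(\w+)")


def expand(template: str, bindings: dict[str, str]) -> str:
    """Replace every `var:` placeholder with its bound value.

    Unbound variables are left untouched so that they are easy to spot.
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        return bindings.get(name, match.group(0))

    return VAR_PATTERN.sub(substitute, template)


expanded = expand(TEMPLATE, BINDINGS)
print(expanded)
```

A real expansion also has to handle variables bound to several values (for example many `inputFile`s for one `preprocessingTask`), which this sketch deliberately ignores.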

Note that this describes the current implementation, which may not be optimal.

The items in the `attribute` namespace are largely free-form. For projects such as CMIP5, CMIP6, obs4MIPs, and CORDEX, controlled vocabularies prescribe the required global attributes, but for [other data](https://docs.esmvaltool.org/en/latest/input.html#observations) this is not the case. Even where a controlled vocabulary is prescribed, experience has shown that data often does not comply with it. The `attribute`s of the `resultFile` will always contain certain items, but users may add more by specifying them in the [recipe under the diagnostic script](https://docs.esmvaltool.org/projects/ESMValCore/en/latest/recipe/overview.html#passing-arguments-to-a-diagnostic-script). To make linked data 'work', it would be nice to use a proper namespace, as suggested in https://github.com/ESMValGroup/ESMValTool/pull/649#pullrequestreview-167942468, but this is challenging because the available attributes vary so much.
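As a small illustration of why this variation is awkward, the Python sketch below compares the global attributes of a file against a fixed set of expected names. The attribute names are the ones listed for `inputFile` in the template; treating them as the required set is an assumption made only for this example, as are the file attributes and the `check_attributes` helper.

```python
# Attribute names taken from the template; treating them as "required"
# is an assumption for illustration, not a real controlled vocabulary.
REQUIRED = {"Conventions", "branch_time", "cmor_version", "model_id"}


def check_attributes(attributes, required=REQUIRED):
    """Return (missing, additional) attribute names, each sorted."""
    present = set(attributes)
    return sorted(required - present), sorted(present - required)


# Hypothetical global attributes of a file that does not comply.
file_attributes = {
    "Conventions": "CF-1.7",
    "model_id": "example-model",
    "history": "created for illustration",
}

missing, additional = check_attributes(file_attributes)
print("missing:", missing)        # ['branch_time', 'cmor_version']
print("additional:", additional)  # ['history']
```

Any fixed namespace would have to cope with both lists being non-empty for a large share of real input files.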

More complicated templates than the above are possible. For example, when using a multi-model preprocessor function like [`multi_model_statistics`](https://docs.esmvaltool.org/projects/ESMValCore/en/latest/recipe/preprocessor.html#multi-model-statistics), there will be an extra intermediate `preprocessedFile` in the provenance record, and when using the [`ancestors`](https://docs.esmvaltool.org/projects/ESMValCore/en/latest/recipe/overview.html#ancestor-tasks) feature, `resultFile`s may be derived from other `resultFile`s as well as from `preprocessedFile`s.
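The longer derivation chains described here can be sketched as a small graph in Python. All file names are hypothetical, and the exact shape of the chain (which file derives from which) is an assumption chosen to show one multi-model step and one use of `ancestors`; it is not taken from an actual ESMValCore provenance record.

```python
# Each key is a file; each value lists the files it was derived from.
DERIVED_FROM = {
    "preprocessed_model_a.nc": ["input_model_a.nc"],
    "preprocessed_model_b.nc": ["input_model_b.nc"],
    # Extra preprocessed file introduced by a multi-model step.
    "multi_model_mean.nc": ["preprocessed_model_a.nc", "preprocessed_model_b.nc"],
    "result_first.nc": ["multi_model_mean.nc"],
    # With `ancestors`, a result can derive from another result
    # as well as from a preprocessed file.
    "result_second.nc": ["result_first.nc", "preprocessed_model_a.nc"],
}


def trace_inputs(name, derived_from):
    """Return the set of original input files that `name` derives from."""
    parents = derived_from.get(name)
    if not parents:
        # Nothing recorded upstream: this is an original input file.
        return {name}
    inputs = set()
    for parent in parents:
        inputs |= trace_inputs(parent, derived_from)
    return inputs


origins = sorted(trace_inputs("result_second.nc", DERIVED_FROM))
print(origins)  # ['input_model_a.nc', 'input_model_b.nc']
```

A single flat template with one `inputFile`, one `preprocessedFile`, and one `resultFile` cannot express such a chain without repeating or nesting the `wasDerivedFrom` relations.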

See also #29 and ESMValGroup/ESMValTool#649 for previous discussion.

provenance template #4118
