Description
Given the rising need for IPT (Integrity, Provenance, Trust) in OGC APIs and their workflow processing, the provenance capabilities of CWL should be leveraged to address it. This would add metadata references within the CWL Application Packages themselves, enabling better open science and IPT tracking of workflow executions.
To Do
- GET /jobs/{jobId}/run to return the PROV-JSON produced by cwltool --provenance (edit: won't do)
- GET /jobs/{jobId}/prov as alternate endpoint
- consider additional PROV endpoints vs. what cwlprov offers: https://gitlab.ogc.org/ogc/T20-GDC/-/wikis/GDC-Provenance-demonstration-GeoLabs#usage
  - GET /jobs/{jobId}/prov (contents of the various metadata/provenance/primary.cwlprov.{ext} files)
  - GET /jobs/{jobId}/prov/info (as cwlprov info, or metadata/manifest.json)
  - GET /jobs/{jobId}/prov/who (as cwlprov who)
  - GET /jobs/{jobId}/prov/inputs (as cwlprov inputs)
  - GET /jobs/{jobId}/prov/inputs/{id} (as cwlprov inputs [<run-id>])
  - GET /jobs/{jobId}/prov/outputs (as cwlprov outputs)
  - GET /jobs/{jobId}/prov/outputs/{id} (as cwlprov outputs [<run-id>])
  - GET /jobs/{jobId}/prov/run (as cwlprov run --inputs --outputs --labels --duration --steps; use all flags to get all available metadata)
  - GET /jobs/{jobId}/prov/run/{id} (as cwlprov run [<run-id>])
- Alternate PROV-XML/RDF/etc. representations if the Accept header requests it (all variants should already be generated by cwltool as various manifest representations)
- When generating cwltool --provenance results, avoid duplicating results already found in WPS-outputs to save space (use their URI for cross-reference).
  - if a File is saved only as its {"class": "File", "path": "..."} definition, this could be allowed to avoid extra code managing the references
  - assume that strings are sufficiently small (as per cwltool's own content limit)
  - if
- Embed in the PROV contents any additional metadata/links pointing at the specific job and process that was executed
- Cross-walk with the requirements of #716 (Support POST /jobs for various workflow implementations)
- Include ORCID and other relevant PROV metadata #783 (see cwltool --orcid --enable-user-provenance --enable-host-provenance)
- update the CLI to provide a provenance operation
- ensure provenance-related requirements are added to conformance
- ensure provenance links are returned in the job status response
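The /jobs/{jobId}/prov/... endpoints proposed in the To Do list above map one-to-one onto cwlprov sub-commands. A minimal sketch of that mapping, assuming cwlprov's --directory option selects the job's research object folder (verify against cwlprov --help); the names CWLPROV_COMMANDS, RUN_ID_PATHS and cwlprov_argv are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical mapping of the proposed /jobs/{jobId}/prov/{path} endpoints to cwlprov
# sub-commands; "run" uses all the flags listed in the To Do item.
CWLPROV_COMMANDS: Dict[str, List[str]] = {
    "info": ["info"],
    "who": ["who"],
    "inputs": ["inputs"],
    "outputs": ["outputs"],
    "run": ["run", "--inputs", "--outputs", "--labels", "--duration", "--steps"],
}

# Only these endpoints accept a nested run identifier (/prov/{path}/{id}).
RUN_ID_PATHS = ("inputs", "outputs", "run")


def cwlprov_argv(ro_dir: str, path: str, run_id: Optional[str] = None) -> List[str]:
    """Build the cwlprov command line matching /jobs/{jobId}/prov/{path}[/{run_id}].

    Assumes cwlprov's '--directory' option points at the job's research object folder.
    """
    if path not in CWLPROV_COMMANDS:
        raise ValueError(f"unsupported provenance path: {path}")
    base = ["cwlprov", "--directory", ro_dir]
    if run_id is None:
        return base + CWLPROV_COMMANDS[path]
    if path not in RUN_ID_PATHS:
        raise ValueError(f"'{path}' does not accept a run ID")
    # e.g. /prov/inputs/{id} -> 'cwlprov inputs <run-id>'
    return base + [path, run_id]
```

The resulting list could be passed to subprocess.run(argv, capture_output=True, text=True, check=True) and its stdout returned as the response body; building the argument list separately keeps the routing logic testable without cwlprov installed.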
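For the alternate PROV-XML/RDF/etc. representations selected by the Accept header, the server only needs to pick which primary.cwlprov.{ext} file to return. A minimal sketch, assuming cwltool writes the json, jsonld, xml, provn, ttl and nt variants, and assuming the media types listed below (taken as an approximation of the W3C PROV notes); PROV_MEDIA_TYPES, DEFAULT_EXT and select_prov_extension are hypothetical names.

```python
from typing import List, Optional, Tuple

# Assumed media types for the files written by cwltool under metadata/provenance/,
# keyed to the {ext} of primary.cwlprov.{ext}.
PROV_MEDIA_TYPES = {
    "application/json": "json",           # PROV-JSON
    "application/ld+json": "jsonld",      # PROV-O as JSON-LD
    "application/provenance+xml": "xml",  # PROV-XML
    "application/xml": "xml",
    "text/provenance-notation": "provn",  # PROV-N
    "text/turtle": "ttl",                 # PROV-O as Turtle
    "application/n-triples": "nt",        # PROV-O as N-Triples
}
DEFAULT_EXT = "json"  # PROV-JSON when the client accepts anything


def select_prov_extension(accept: Optional[str]) -> Optional[str]:
    """Pick the primary.cwlprov.{ext} best matching an Accept header; None means 406."""
    if not accept or not accept.strip():
        return DEFAULT_EXT
    ranked: List[Tuple[float, int, str]] = []
    for index, part in enumerate(accept.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        quality = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by order of appearance.
            ranked.append((-quality, index, media_type))
    for _, _, media_type in sorted(ranked):
        if media_type in ("*/*", "application/*"):
            return DEFAULT_EXT
        if media_type in PROV_MEDIA_TYPES:
            return PROV_MEDIA_TYPES[media_type]
    return None
```

The selected extension then resolves to a file such as os.path.join(ro_dir, "metadata", "provenance", f"primary.cwlprov.{ext}"), so no conversion is needed at request time.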
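The deduplication item above suggests keeping only a File definition that points at the already published WPS output instead of copying its contents into the provenance folder. A minimal sketch under that assumption; reference_wps_output and its wps_output_dir / wps_output_url parameters are hypothetical stand-ins for the server's output directory and the URL under which it is served.

```python
import os
from typing import Any
from urllib.parse import quote


def reference_wps_output(value: Any, wps_output_dir: str, wps_output_url: str) -> Any:
    """Swap a CWL File located under the WPS outputs directory for a URL reference.

    'wps_output_dir' and 'wps_output_url' are hypothetical settings standing for the local
    directory where job outputs are stored and the public URL under which it is served.
    Values that are not File objects (e.g. small strings) are returned unchanged.
    """
    if not isinstance(value, dict) or value.get("class") != "File":
        return value
    path = value.get("path")
    if not path:
        return value
    root = os.path.abspath(wps_output_dir)
    target = os.path.abspath(path)
    # commonpath (not startswith) avoids matching sibling folders such as '/outputs-old'.
    if os.path.commonpath([root, target]) != root:
        return value  # not a published output: keep the original definition
    relative = os.path.relpath(target, root).replace(os.sep, "/")
    return {"class": "File", "location": f"{wps_output_url.rstrip('/')}/{quote(relative)}"}
```

Files outside the output directory keep their original definition, so only data that is already reachable through WPS-outputs is replaced by a cross-reference.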
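Embedding links to the executed job and process in the PROV contents could be done by post-processing the PROV-JSON document. A minimal sketch, assuming hypothetical "job" and "process" prefixes rooted at the server's /jobs/ and /processes/ paths, and modelling the job as an activity whose plan is the process description (no agent is recorded here); embed_job_links is a hypothetical helper, not part of cwltool.

```python
import copy
from typing import Any, Dict


def embed_job_links(prov_doc: Dict[str, Any], base_url: str, job_id: str, process_id: str) -> Dict[str, Any]:
    """Return a copy of a PROV-JSON document linked to the OGC API job and process.

    The 'job' and 'process' prefixes and the labels are hypothetical choices; existing
    prefixes with the same names would be overwritten.
    """
    doc = copy.deepcopy(prov_doc)
    base = base_url.rstrip("/")
    prefixes = doc.setdefault("prefix", {})
    prefixes["job"] = f"{base}/jobs/"
    prefixes["process"] = f"{base}/processes/"
    job_qname = f"job:{job_id}"
    process_qname = f"process:{process_id}"
    # The job execution is an activity; the process description acts as its plan.
    doc.setdefault("activity", {})[job_qname] = {"prov:label": f"Job {job_id}"}
    doc.setdefault("entity", {})[process_qname] = {"prov:label": f"Process {process_id}"}
    associations = doc.setdefault("wasAssociatedWith", {})
    # Unnamed relations are keyed by blank-node identifiers in PROV-JSON.
    number = len(associations) + 1
    while f"_:job-association-{number}" in associations:
        number += 1
    associations[f"_:job-association-{number}"] = {"prov:activity": job_qname, "prov:plan": process_qname}
    return doc
```

Working on a deep copy leaves the document produced by cwltool untouched, so the original primary.cwlprov.json can still be served as-is.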
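For the ORCID item, the cwltool invocation only needs the extra provenance flags. A minimal sketch that builds the command line with cwltool's --provenance, --orcid, --full-name, --enable-user-provenance and --enable-host-provenance options, and validates the ORCID iD checksum (ISO 7064 MOD 11-2) before running; orcid_is_valid and cwltool_provenance_argv are hypothetical helper names.

```python
from typing import List, Optional


def orcid_is_valid(orcid: str) -> bool:
    """Verify an ORCID iD checksum (ISO 7064 MOD 11-2), accepting a bare iD or an orcid.org URL."""
    digits = orcid.rstrip("/").rsplit("/", 1)[-1].replace("-", "")
    if len(digits) != 16 or any(char not in "0123456789" for char in digits[:15]):
        return False
    total = 0
    for char in digits[:15]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15].upper() == expected


def cwltool_provenance_argv(
    ro_dir: str,
    workflow: str,
    job_file: str,
    orcid: Optional[str] = None,
    full_name: Optional[str] = None,
    user_and_host: bool = True,
) -> List[str]:
    """Build a cwltool command that records provenance (and optionally the user's ORCID)."""
    argv = ["cwltool", "--provenance", ro_dir]
    if orcid:
        if not orcid_is_valid(orcid):
            raise ValueError(f"invalid ORCID iD: {orcid}")
        argv += ["--orcid", orcid]
    if full_name:
        argv += ["--full-name", full_name]
    if user_and_host:
        argv += ["--enable-user-provenance", "--enable-host-provenance"]
    return argv + [workflow, job_file]
```

Rejecting a malformed ORCID before execution avoids producing a research object whose author attribution is silently wrong.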
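The CLI provenance operation from the To Do list could simply translate its arguments into the proposed /jobs/{jobId}/prov/... URLs. A minimal argparse sketch; the program name "prov-client", the positional url and job_id arguments and the --path, --run-id and --accept options are all hypothetical, not an existing CLI.

```python
import argparse

PROV_PATHS = ("info", "who", "inputs", "outputs", "run")
RUN_ID_PATHS = ("inputs", "outputs", "run")


def make_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI exposing a 'provenance' operation."""
    parser = argparse.ArgumentParser(prog="prov-client")
    operations = parser.add_subparsers(dest="operation", required=True)
    prov = operations.add_parser("provenance", help="Retrieve the provenance of a job.")
    prov.add_argument("url", help="Server base URL.")
    prov.add_argument("job_id", help="Job identifier.")
    prov.add_argument("--path", choices=PROV_PATHS, default=None,
                      help="PROV sub-path; omit for the full PROV document.")
    prov.add_argument("--run-id", default=None, help="Nested run ID for inputs, outputs or run.")
    prov.add_argument("--accept", default="application/json", help="Requested PROV media type.")
    return parser


def provenance_url(args: argparse.Namespace) -> str:
    """Map parsed CLI arguments onto /jobs/{jobId}/prov[/{path}[/{id}]]."""
    if args.run_id and args.path not in RUN_ID_PATHS:
        raise ValueError("--run-id only applies to --path inputs, outputs or run")
    url = f"{args.url.rstrip('/')}/jobs/{args.job_id}/prov"
    if args.path:
        url += f"/{args.path}"
    if args.run_id:
        url += f"/{args.run_id}"
    return url
```

The --accept value would be sent as the Accept header of the GET request, so the CLI reuses the server-side content negotiation instead of converting formats itself.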
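Returning provenance links in the job status response could look like the following sketch. It assumes the W3C PROV-AQ relation http://www.w3.org/ns/prov#has_provenance as the link "rel" (an OGC-specific relation could be chosen instead) and one link per PROV media type on the same /prov URL; PROV_REL, PROV_MEDIA_TYPES and add_provenance_links are hypothetical names.

```python
from typing import Any, Dict

# Assumed link relation from W3C PROV-AQ; an OGC-specific relation could be used instead.
PROV_REL = "http://www.w3.org/ns/prov#has_provenance"
# Assumed media types of the PROV representations the /prov endpoint could negotiate.
PROV_MEDIA_TYPES = (
    "application/json",
    "application/ld+json",
    "application/provenance+xml",
    "text/provenance-notation",
    "text/turtle",
    "application/n-triples",
)


def add_provenance_links(status: Dict[str, Any], job_url: str) -> Dict[str, Any]:
    """Append one provenance link per PROV representation to a job status body (in place)."""
    prov_url = f"{job_url.rstrip('/')}/prov"
    links = status.setdefault("links", [])
    for media_type in PROV_MEDIA_TYPES:
        links.append({
            "rel": PROV_REL,
            "href": prov_url,
            "type": media_type,
            "title": f"Job provenance ({media_type})",
        })
    return status
```

Listing each representation with its "type" lets clients discover the available PROV formats from the status response alone.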
References
- https://github.com/common-workflow-language/cwlprov (see cwltool --provenance)
- example run with the OSPD algae workflow: https://gitlab.ogc.org/ogc/T20-GDC/-/wikis/GDC-Provenance-demonstration-GeoLabs
- consider new options to avoid data duplication of I/O in the provenance folder: "Prov data input output" common-workflow-language/cwltool#1989
- PROV: https://www.w3.org/TR/prov-overview/
- PROV-LINKS: https://www.w3.org/TR/2013/NOTE-prov-links-20130430/
- PROV-JSON: https://www.w3.org/submissions/prov-json/
- PROV-RDF: https://www.w3.org/TR/2013/REC-prov-o-20130430/
- PROV-N: https://www.w3.org/TR/2013/REC-prov-n-20130430/
- PROV-XML: https://www.w3.org/TR/2013/NOTE-prov-xml-20130430/