Support CWL Prov with cwltool for OGC API - Processes IPT #673

Closed
@fmigneault

Description

Given the rising need for IPT (Integrity, Provenance, Trust) across OGC APIs and their workflow processing, the provenance capabilities of CWL should be leveraged to meet it. Doing so would embed metadata references within the CWL Application Packages themselves, enabling better open-science and IPT workflow tracking.

To Do

  • GET /jobs/{jobId}/run to return the PROV-JSON produced by cwltool --provenance
    (edit: won't do)

  • GET /jobs/{jobId}/prov as alternate endpoint

  • consider additional PROV endpoints vs what cwlprov offers
    https://gitlab.ogc.org/ogc/T20-GDC/-/wikis/GDC-Provenance-demonstration-GeoLabs#usage

    • GET /jobs/{jobId}/prov (contents of various metadata/provenance/primary.cwlprov.{ext})
    • GET /jobs/{jobId}/prov/info (as cwlprov info or metadata/manifest.json)
    • GET /jobs/{jobId}/prov/who (as cwlprov who)
    • GET /jobs/{jobId}/prov/inputs (as cwlprov inputs)
    • GET /jobs/{jobId}/prov/inputs/{id} (as cwlprov inputs [<run-id>])
    • GET /jobs/{jobId}/prov/outputs (as cwlprov outputs)
    • GET /jobs/{jobId}/prov/outputs/{id} (as cwlprov outputs [<run-id>])
    • GET /jobs/{jobId}/prov/run (as cwlprov run --inputs --outputs --labels --duration --steps)
      (use all flags to get all available metadata)
    • GET /jobs/{jobId}/prov/run/{id} (as cwlprov run [<run-id>])
  • Return alternate PROV-XML/RDF/etc. representations if the Accept header requests them
    (all variants should already be generated by cwltool as various manifest representations)

  • When generating cwltool --provenance results, avoid duplicating results already found in WPS-outputs to save space (use their URI for cross-reference).

    • if a File is saved only as its {"class":"File", "path": "..."} definition, this could be allowed, since it avoids extra code to manage the references
    • assume that strings are sufficiently small (as per cwltool's own content limit)
  • Any additional metadata/links pointing at the specific job and process executed that should be embedded in the PROV contents

  • Cross-walk with the requirements of "Support POST /jobs for various workflow implementations" (#716)

  • Include ORCID and other relevant PROV metadata #783
    (see cwltool --orcid --enable-user-provenance --enable-host-provenance)

  • update CLI to provide a provenance operation

  • ensure provenance-related requirements are added to conformance

  • ensure provenance links are returned in job status response

  • add more W3C PROV details about process I/O #780
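
The proposed /jobs/{jobId}/prov sub-paths and the Accept-based representations listed above could be resolved roughly as follows. This is only a sketch: the function names, the media-type table, and the choice of PROV-JSON as the default are assumptions for illustration, not an existing API. The cwlprov subcommands and flags are the ones named in the list.

```python
from typing import List

# Assumed mapping of Accept media types to the extension of
# metadata/provenance/primary.cwlprov.{ext} (illustrative, not normative).
PROV_MEDIA_TYPES = {
    "application/json": "json",
    "application/ld+json": "jsonld",
    "text/turtle": "ttl",
    "application/n-triples": "nt",
    "text/provenance-notation": "provn",
    "application/xml": "xml",
}

# Sub-paths of /jobs/{jobId}/prov mapped to the cwlprov arguments listed in the issue.
PROV_COMMANDS = {
    "info": ["info"],
    "who": ["who"],
    "inputs": ["inputs"],
    "outputs": ["outputs"],
    "run": ["run", "--inputs", "--outputs", "--labels", "--duration", "--steps"],
}


def prov_file_for_accept(accept: str, default: str = "json") -> str:
    """Pick the provenance file for an Accept header (q-values are ignored)."""
    for item in accept.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in PROV_MEDIA_TYPES:
            ext = PROV_MEDIA_TYPES[media_type]
            return f"metadata/provenance/primary.cwlprov.{ext}"
    return f"metadata/provenance/primary.cwlprov.{default}"


def cwlprov_args(sub_path: str) -> List[str]:
    """Translate 'inputs', 'run', 'outputs/<id>', etc. into cwlprov arguments."""
    parts = [part for part in sub_path.strip("/").split("/") if part]
    if not parts or parts[0] not in PROV_COMMANDS:
        raise ValueError(f"unsupported provenance path: {sub_path!r}")
    if len(parts) == 1:
        return ["cwlprov"] + PROV_COMMANDS[parts[0]]
    # Only inputs, outputs and run accept a trailing identifier in the list above.
    if len(parts) == 2 and parts[0] in ("inputs", "outputs", "run"):
        return ["cwlprov", parts[0], parts[1]]
    raise ValueError(f"unsupported provenance path: {sub_path!r}")
```

For example, `cwlprov_args("inputs/abc")` yields `["cwlprov", "inputs", "abc"]`, which a handler could pass to a subprocess call executed against the job's provenance directory.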
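
The item above about not duplicating results already stored in WPS-outputs could be approached by rewriting File entries into URI references before they are handed to the provenance step. The sketch below is illustrative only: `reference_outputs`, `output_dir` and `output_url` are hypothetical names, and using the CWL `location` field for the URI (instead of `path`) is an assumption.

```python
from typing import Any


def reference_outputs(value: Any, output_dir: str, output_url: str) -> Any:
    """Replace File entries stored under output_dir with a URI cross-reference.

    output_dir is the (hypothetical) local WPS-outputs directory and
    output_url the (hypothetical) base URL from which it is served.
    """
    if isinstance(value, list):
        return [reference_outputs(item, output_dir, output_url) for item in value]
    if isinstance(value, dict):
        path = value.get("path")
        if value.get("class") == "File" and isinstance(path, str):
            root = output_dir.rstrip("/") + "/"
            if path.startswith(root):
                relative = path[len(root):]
                # Keep only the minimal File definition pointing at the existing copy.
                return {"class": "File", "location": output_url.rstrip("/") + "/" + relative}
        return {key: reference_outputs(item, output_dir, output_url) for key, item in value.items()}
    # Strings, numbers and other literals are kept inline (assumed small enough).
    return value
```

Files located outside the WPS-outputs directory are left untouched, so they would still be copied into the provenance results as usual.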

References

Implementation

Labels

  • feature/CWL (Issue related to CWL support)
  • feature/job/provenance (Issue related to W3C PROV metadata applied to a Job.)
  • feature/provenance (Issue related to any provenance metadata functionality.)
  • process/OAP-Part4: Jobs (OGC API - Processes - Part 4: Job Management)
  • process/workflow (Related to a Workflow process.)
  • project/OGC-GDC (Developments related to OGC GeoDataCube)
  • project/OGC-IPT (Developments related to OGC Integrity, Provenance, and Trust)
  • triage/feature (New requested feature.)
