Extensions and Provenance for Derived Data

Josh Mandel edited this page Aug 24, 2018 · 3 revisions

Background

Shared thoughts on how to represent resources produced by automated processes, such as NLP pipelines run over free-text notes (to extract, say, conditions), OCR of a faxed note, or ML algorithms that derive risk assessments.

Techniques in use today

For each technique, include: the source organization; the use case for which the extension, tag, or Provenance resource is employed; and the implementation details (e.g., the extension or tag URI used, and a description of which resources or elements can be annotated). If possible, please also include an example of its use.

Provenance to show that one resource was derived from another

Source: Ciitizen.

Use case: Show that one resource was derived computationally from another, without directly impacting the contents of either resource.

Implementation: When we create a derived resource, we also create a Provenance instance targeting the derived resource, with a pointer back to our own software (which performed the derivation) and the source resource(s) from which the derivation was performed.

Example: To show that our FHIR extraction application used DocumentReference/a as an input to derive DocumentReference/b as an output:

{
  "active": true,
  "recorded": "2018-08-22T14:03:20.090Z",
  "resourceType": "Provenance",
  "agent": [
    {
      "whoUri": "http://api.ciitizen.com/fhir/extract-service",
      "role": [
        {
          "coding": [
            {
              "system": "http://dicom.nema.org/resources/ontology/DCM",
              "code": "110150",
              "display": "Application"
            }
          ]
        }
      ]
    }
  ],
  "target": [
    {
      "reference": "DocumentReference/b"
    }
  ],
  "entity": [
    {
      "role": "derivation",
      "whatReference": {
        "reference": "DocumentReference/a"
      }
    }
  ]
}
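
The pattern above could be generated programmatically. What follows is a minimal sketch in Python using plain dicts rather than any FHIR library. The function names `build_derivation_provenance` and `derivation_sources` are hypothetical and not part of Ciitizen's software. It assumes FHIR STU3 element names (`whoUri`, `whatReference`) and emits `agent` and `agent.role` as lists, which is how STU3 defines their cardinality.

```python
from datetime import datetime, timezone

# DICOM role code for a software application, copied from the example above.
APPLICATION_ROLE = {
    "system": "http://dicom.nema.org/resources/ontology/DCM",
    "code": "110150",
    "display": "Application",
}


def build_derivation_provenance(target_ref, source_refs, agent_uri, recorded=None):
    """Return a Provenance dict saying target_ref was derived from source_refs.

    Hypothetical helper; assumes STU3 element names.
    """
    if not source_refs:
        raise ValueError("at least one source reference is required")
    if recorded is None:
        # Default to the current UTC time with millisecond precision.
        recorded = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "resourceType": "Provenance",
        "recorded": recorded,
        # The derived (output) resource.
        "target": [{"reference": target_ref}],
        # The software that performed the derivation.
        "agent": [
            {"whoUri": agent_uri, "role": [{"coding": [dict(APPLICATION_ROLE)]}]}
        ],
        # One entity per source (input) resource.
        "entity": [
            {"role": "derivation", "whatReference": {"reference": ref}}
            for ref in source_refs
        ],
    }


def derivation_sources(provenances, target_ref):
    """List the source references that target_ref was derived from."""
    sources = []
    for prov in provenances:
        targets = [t.get("reference") for t in prov.get("target", [])]
        if target_ref not in targets:
            continue
        for entity in prov.get("entity", []):
            if entity.get("role") == "derivation":
                ref = entity.get("whatReference", {}).get("reference")
                if ref is not None:
                    sources.append(ref)
    return sources
```

Calling `build_derivation_provenance("DocumentReference/b", ["DocumentReference/a"], "http://api.ciitizen.com/fhir/extract-service")` yields a resource with the same target, agent URI, and entity as the example, and `derivation_sources` walks a list of such resources back from a derived resource to its inputs. Neither resource is modified, which matches the use case of leaving the contents of both untouched.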

Silly example tag for resources that result from "Spontaneous Generation"

Source: Josh's imagination.

Use case: This is a fake tag used for resources that occasionally appear, fully-formed, out of the aether that fills the interstices of our data ingestion pipelines.

Implementation: When we find resources like this, we tag them with {"system": "https://fhir.example.org", "code": "spontaneously-generated"}.

Example: We found this dust mite:

{
  "resourceType": "Substance",
  "id": "f201",
  "meta": {
    "tag": [
      {
        "system": "https://fhir.example.org",
        "code": "spontaneously-generated"
      }
    ]
  },
  "code": {
    "coding": [
      {
        "system": "http://snomed.info/sct",
        "code": "406466009",
        "display": "House dust allergen"
      }
    ]
  }
}
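
Applying and checking a tag like this one is mostly a matter of editing `meta.tag`. The sketch below shows one way to do that in Python over resources held as plain dicts. The helper names `has_tag` and `add_tag` are hypothetical, and matching tags on `system` plus `code` only is an assumption made for this illustration.

```python
# The (fake) tag from the example above.
SPONTANEOUS_TAG = {
    "system": "https://fhir.example.org",
    "code": "spontaneously-generated",
}


def has_tag(resource, tag):
    """Return True if resource.meta.tag contains a tag with the same system and code."""
    tags = resource.get("meta", {}).get("tag", [])
    return any(
        t.get("system") == tag["system"] and t.get("code") == tag["code"]
        for t in tags
    )


def add_tag(resource, tag):
    """Add tag to resource.meta.tag in place, unless it is already present."""
    if not has_tag(resource, tag):
        # Create meta and meta.tag on demand, then append a copy of the tag.
        resource.setdefault("meta", {}).setdefault("tag", []).append(dict(tag))
    return resource


def filter_tagged(resources, tag):
    """Return only the resources that carry the given tag."""
    return [r for r in resources if has_tag(r, tag)]
```

Because the tag lives in `meta`, adding it leaves the clinical content of the resource (here, the `code` of the Substance) unchanged, and calling `add_tag` twice does not create a duplicate entry.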