Adding detail to RetrievalSource provenance #1624
Open
Exploring some modeling that would support capturing a couple of additional retrieval source provenance details on a per-edge basis. To discuss on upcoming MUTT/DINGO call:

- added an `ingest_source` permissible value - to help capture which source the data was actually ingested from (and made the `RetrievalSource.resource_role` slot multivalued - to allow indicating that a particular primary or aggregator source was also the 'ingest_source')
- also tested an alternative pattern to capture this info - that defines a separate slot to capture the ingest_source - `ingest_source: boolean`
- added an `ingest_files` slot to `RetrievalSource` - for use in the `RetrievalSource` object for the ingest_source, to report file(s) from which the data used to create the edge were retrieved. This provides more complete provenance, and supports various downstream activities: . . . If not at the edge level in the data, perhaps making it standard to put this info in the RIG for each 'EdgeType' object?
- `ResourceRoleEnum` values - which I think we should keep even if we don't adopt the other proposals above
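
To make the two patterns easier to compare on the call, here is a rough Python sketch of how each might look as plain data objects. It is only an illustration, not generated from the model: the class names `RetrievalSourceMultiRole` and `RetrievalSourceFlagged`, the `resource_id` field, the existing enum values listed, the `infores:example-aggregator` identifier, and the file name are all assumptions or placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ResourceRole(Enum):
    # Existing values are assumed here for illustration only.
    PRIMARY_KNOWLEDGE_SOURCE = "primary_knowledge_source"
    AGGREGATOR_KNOWLEDGE_SOURCE = "aggregator_knowledge_source"
    # Proposed new permissible value from this PR.
    INGEST_SOURCE = "ingest_source"


@dataclass
class RetrievalSourceMultiRole:
    """Pattern 1 (hypothetical class name): resource_role is multivalued,
    so one source can be both an aggregator and the ingest source."""
    resource_id: str
    resource_role: List[ResourceRole]
    ingest_files: List[str] = field(default_factory=list)

    def is_ingest_source(self) -> bool:
        return ResourceRole.INGEST_SOURCE in self.resource_role


@dataclass
class RetrievalSourceFlagged:
    """Pattern 2 (hypothetical class name): resource_role stays single-valued
    and a separate boolean slot marks the ingest source."""
    resource_id: str
    resource_role: ResourceRole
    ingest_source: bool = False
    ingest_files: List[str] = field(default_factory=list)

    def is_ingest_source(self) -> bool:
        return self.ingest_source


# Placeholder identifier and file name, not real ingest details.
multi = RetrievalSourceMultiRole(
    resource_id="infores:example-aggregator",
    resource_role=[ResourceRole.AGGREGATOR_KNOWLEDGE_SOURCE, ResourceRole.INGEST_SOURCE],
    ingest_files=["example_edges.tsv"],
)
flagged = RetrievalSourceFlagged(
    resource_id="infores:example-aggregator",
    resource_role=ResourceRole.AGGREGATOR_KNOWLEDGE_SOURCE,
    ingest_source=True,
    ingest_files=["example_edges.tsv"],
)
```

Both objects carry the same information. The difference is whether consumers detect the ingest source by checking membership in a role list or by reading a dedicated flag, which is probably the main trade-off to weigh.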