The question of adopting controlled (and thus validatable, suggestible in UIs, etc.) dictionaries for various metadata fields is coming up with increasing frequency, and extra effort is being invested in developing ad-hoc annotations of existing BIDS datasets (e.g. https://neurobagel.org/ by @jbpoline and his group).
The point is not only to restrict the set of values (already possible and used), but also to have a clear alignment with external (larger) ontologies, making it easier to crosswalk metadata from BIDS datasets (attn @lzehl).
Here I would like to collect a list of metadata fields, either already in BIDS or proposed in BEPs, that would immediately benefit from such standardization:
- species (came up recently in the scope of BEP-038: Atlases #1714): we could limit it to the species found in major archives such as OpenNeuro and DANDI, which would likely cover 99% of cases
- sex (surprise: it is not just M/F for other species!)
- age reference point (Formalize participants' age to clarify the reference point #1634)
- handedness
- subject group (ref: neurobagel)
  - tricky here since it could be an experimental group (e.g. read vs write) and not necessarily a special population/disease; so we might want to formalize some `group_type` and then use a different value type for `group` based on `group_type`?
- medication (BEP010 Medication field #319)
- ... point/suggest ...
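A minimal Python sketch of what validating such fields could look like, assuming each `participants.tsv` row arrives as a dict of strings. The species list with its NCBI Taxonomy CURIEs, and the `group_type` values (`experimental`, `clinical`) with their allowed `group` values, are hypothetical placeholders rather than an agreed vocabulary; the point is only that `group_type` selects which value set `group` is checked against.

```python
"""Hypothetical sketch: validating participants.tsv columns against controlled vocabularies."""
from typing import Dict, List, Set

# Hypothetical species vocabulary: stored value -> NCBI Taxonomy CURIE
# (the ontology alignment discussed above).
SPECIES: Dict[str, str] = {
    "Homo sapiens": "NCBITaxon:9606",
    "Mus musculus": "NCBITaxon:10090",
    "Rattus norvegicus": "NCBITaxon:10116",
    "Macaca mulatta": "NCBITaxon:9544",
}

# Hypothetical group_type -> allowed `group` values, so an experimental
# group does not have to masquerade as a population/disease label.
GROUP_VALUES: Dict[str, Set[str]] = {
    "experimental": {"read", "write"},
    "clinical": {"control", "patient"},
}


def validate_row(row: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in one participants row."""
    errors: List[str] = []
    species = row.get("species")
    if species is not None and species not in SPECIES:
        errors.append(f"unknown species {species!r}")
    group_type = row.get("group_type")
    group = row.get("group")
    if group is not None:
        # The value set for `group` is chosen by `group_type`.
        allowed = GROUP_VALUES.get(group_type)
        if allowed is None:
            errors.append(f"unknown group_type {group_type!r}")
        elif group not in allowed:
            errors.append(f"group {group!r} not allowed for group_type {group_type!r}")
    return errors
```

Keeping the stored value as the key and the ontology CURIE as the value lets one table serve both validation and crosswalking.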
An interesting aspect was brought up by @bendichter while chatting with @leej3 and their teams: some metadata fields are interdependent, e.g. the valid values for sex and age might themselves depend on species! So while envisioning this, we might want to provision for that one way or another.
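One way to provision for such inter-dependencies is to make the vocabulary of one field a function of another. The sketch below picks the allowed `sex` values based on `species` and falls back to a default set otherwise. The vocabularies are illustrative assumptions (only the hermaphrodite/male split for C. elegans reflects actual biology), and the same lookup pattern could apply to species-specific age reference points.

```python
"""Hypothetical sketch: a field whose allowed values depend on another field."""
from typing import Dict, FrozenSet, Optional

# Hypothetical default sex vocabulary, used when no species-specific one exists.
DEFAULT_SEX: FrozenSet[str] = frozenset({"male", "female", "other"})

# Species-specific overrides (illustrative).
SEX_BY_SPECIES: Dict[str, FrozenSet[str]] = {
    # C. elegans populations consist of hermaphrodites and males.
    "Caenorhabditis elegans": frozenset({"hermaphrodite", "male"}),
}


def allowed_sex_values(species: Optional[str]) -> FrozenSet[str]:
    """Return the sex vocabulary for `species`, or the default if none is defined."""
    return SEX_BY_SPECIES.get(species, DEFAULT_SEX)


def check_sex(species: Optional[str], sex: str) -> bool:
    """True if `sex` is a valid value given `species`."""
    return sex in allowed_sex_values(species)
```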
I think all of the above metadata fields already come as "columns", so we could literally add enums for them. Moreover, we already have some, e.g.:
```yaml
sample_type:
  name: sample_type
  display_name: Sample type
  description: |
    Biosample type defined by
    [ENCODE Biosample Type](https://www.encodeproject.org/profiles/biosample_type).
  type: string
  enum:
    - $ref: objects.enums.cell_line.value
    - $ref: objects.enums.in_vitro_differentiated_cells.value
    - $ref: objects.enums.primary_cell.value
    - $ref: objects.enums.cell_free_sample.value
    - $ref: objects.enums.cloning_host.value
    - $ref: objects.enums.tissue.value
    - $ref: objects.enums.whole_organisms.value
    - $ref: objects.enums.organoid.value
    - $ref: objects.enums.technical_sample.value
```

So we would just need to:
- formalize references to specific ontology source(s) beyond the free text in `description`
- extend the list of such enums to some/all of the above metadata fields, starting with the obvious ones (e.g. species)
- promote establishing linkage to ontologies at the level of BEP guidelines
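The first and last points could look roughly like this: enum entries that carry an ontology term next to their `value`, so that a column's `$ref` list can be resolved into a value-to-term mapping for crosswalking. This is a simplified, hypothetical sketch. The `ontology_term` key, the `homo_sapiens`/`mus_musculus` enum names, and the naive `$ref` resolution are illustrative assumptions, not how the current schema or its tooling work.

```python
"""Hypothetical sketch: enum entries that carry an ontology term, plus a crosswalk."""
from typing import Dict, List

# Modeled loosely on `objects.enums` entries (value/display_name);
# `ontology_term` is a hypothetical addition, not part of the current schema.
ENUMS: Dict[str, Dict[str, str]] = {
    "homo_sapiens": {
        "value": "Homo sapiens",
        "display_name": "Human",
        "ontology_term": "NCBITaxon:9606",
    },
    "mus_musculus": {
        "value": "Mus musculus",
        "display_name": "House mouse",
        "ontology_term": "NCBITaxon:10090",
    },
}

# A column definition referencing enums the same way `sample_type` does.
SPECIES_COLUMN = {
    "name": "species",
    "type": "string",
    "enum": [
        {"$ref": "objects.enums.homo_sapiens.value"},
        {"$ref": "objects.enums.mus_musculus.value"},
    ],
}


def resolve_enum(column: dict) -> Dict[str, str]:
    """Map each allowed value of `column` to its ontology term via `$ref` lookups."""
    mapping: Dict[str, str] = {}
    for item in column["enum"]:
        # Naive resolution: "objects.enums.<key>.value" -> ENUMS[<key>]
        key = item["$ref"].split(".")[2]
        entry = ENUMS[key]
        mapping[entry["value"]] = entry["ontology_term"]
    return mapping


def crosswalk(values: List[str], column: dict) -> List[str]:
    """Translate dataset values into ontology CURIEs; raises KeyError on uncontrolled values."""
    mapping = resolve_enum(column)
    return [mapping[v] for v in values]
```

Because the term lives on the enum entry itself, every column that `$ref`s it inherits the ontology linkage without repeating it in free-text `description`s.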