Formalize the concept of [{leading entities}_]{entity_plural}.{tsv,json} file(s) #2283

@yarikoptic

This is a precursor to

and is mentioned throughout the existing issues and PRs, e.g.

and would be required to facilitate

but I failed to find a dedicated issue.

Current situation/issue

We have an expanding list of already defined .tsv files summarized below (including `_scans.tsv` which is not per-entity per se but close in spirit):

Table 1: Entity-level TSV files ({entity-plural}.tsv pattern)

| File Pattern | Entity | Location | Column 1 | Column 2 | Column 3 | Column 4 | Issues etc |
|---|---|---|---|---|---|---|---|
| `participants.tsv` | subject (`sub-<label>`) | Dataset root | `participant_id` | `species` (R) | `age` (R) | `sex` (R) | bids-2-devel#14 |
| `samples.tsv` | sample (`sample-<label>`) | Dataset root | `sample_id` | `participant_id` | `sample_type` | `pathology` (R) | Composite index (`sample_id` + `participant_id`) |
| `sub-<label>_sessions.tsv` | session (`ses-<label>`) | Subject folder | `session_id` | `acq_time` (O) | `pathology` (R) | `HED` (O) | |
| `phenotype/<name>.tsv` | subject (per assessment) | `phenotype/` | `participant_id` | `HED` (O) | | | |
| `[sub-<label>_][ses-<label>_]descriptions.tsv` | description (`desc-<label>`) | Derivatives (root/sub/ses) | `desc_id` | `description` | | | #2281 |
| `sub-<label>[_ses-<label>]_scans.tsv` | scan (data files) | Subject/session folder | `filename` | `acq_time` (O) | `HED` (O) | | |

Notes:

  • (R) = Recommended, (O) = Optional, unmarked = Required

These are not to be "conflated" (at the moment at least) with data-type-specific files such as _electrodes.tsv in iEEG.

Table 2: Internal construct TSV files (non-entity)

These are somewhat related but are already inconsistent, as they

  • use name, not some id
  • to avoid a composite index, like the one we need for bep032, use a composite of some {location}{index} within the name column
Not the primary target for this issue:

| File Pattern | Describes | Column 1 | Col 1 Example | Column 2 | Column 3 | Column 4 | Issues etc |
|---|---|---|---|---|---|---|---|
| `*_channels.tsv` (EEG) | Recording channels | `name` | VEOG | `type` | `units` | `description` (O) | |
| `*_channels.tsv` (MEG) | Recording channels | `name` | VEOG | `type` | `units` | `description` (O) | |
| `*_channels.tsv` (EMG) | Recording channels | `name` | | `type` | `units` | `description` (O) | |
| `*_channels.tsv` (iEEG) | Recording channels | `name` | LT01 | `type` | `units` | `low_cutoff` | |
| `*_channels.tsv` (NIRS) | Recording channels | `name` | S1-D1 | `type` | `source` | `detector` | |
| `*_channels.tsv` (Motion) | Recording channels | `name` | t1_acc_x | `component` | `type` | `tracked_point` | |
| `*_electrodes.tsv` (EEG) | Electrode positions | `name` | Cz | `x` | `y` | `z` | |
| `*_electrodes.tsv` (iEEG) | Electrode positions | `name` | LT01 | `x` | `y` | `z` | |
| `*_electrodes.tsv` (EMG) | Electrode positions | `name` | | `x` | `y` | `z` (O) | |
| `*_optodes.tsv` (NIRS) | Optode positions | `name` | A1 | `type` | `x` | `y` | |
| `*_events.tsv` | Events/stimuli | `onset` | 1.2 | `duration` | `trial_type` (O) | `response_time` (O) | |
| `*_beh.tsv` | Behavioral data | `trial_type` (O) | congruent | `response_time` (O) | `HED` (O) | `stim_file` (O) | |

Notes:

  • (R) = Recommended, (O) = Optional, unmarked = Required

We also lack guidance in

  • BEP Guidelines, on how to construct such files in general: that the leading column should be {entity}_id, and that they could be added for pretty much any entity at the appropriate level in the hierarchy
    • TODO: file a clarification PR there
  • BIDS Common principles, on when to expect such files: ATM they exist for only some entities, to succinctly provide metadata specific to each {entity}_id (e.g. instead of duplicating it in individual data .json files)
    • TODO: file a clarification PR against common principles to describe such .tsv files and their purpose generally in the Tabular files section.
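To make the proposed convention concrete, here is a minimal sketch of what a check for "leading column is {entity}_id" could look like. It is not part of any existing validator: the `LEADING_COLUMNS` mapping and the `check_entity_tsv` helper are hypothetical names, and the column names and value prefixes are simply copied from Table 1 above.

```python
import csv
import io

# Hypothetical mapping derived from Table 1 of this issue:
# file suffix -> (expected leading column, expected value prefix).
LEADING_COLUMNS = {
    "participants": ("participant_id", "sub-"),
    "samples": ("sample_id", "sample-"),
    "sessions": ("session_id", "ses-"),
    "descriptions": ("desc_id", "desc-"),
}


def check_entity_tsv(suffix, text):
    """Return a list of problems found in an entity-level TSV given as text."""
    column, prefix = LEADING_COLUMNS[suffix]
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    if header[0] != column:
        problems.append(f"leading column is {header[0]!r}, expected {column!r}")
    # Data rows start on line 2 (blank lines are ignored in this sketch).
    for lineno, row in enumerate(rows[1:], start=2):
        if not row[0].startswith(prefix):
            problems.append(f"row {lineno}: {row[0]!r} lacks prefix {prefix!r}")
    return problems
```

For example, `check_entity_tsv("sessions", "session_id\tacq_time\nses-01\tn/a\n")` returns an empty list, while a file whose first column is `name` would be reported.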

As a result, BEPs now

  • come up with composite indexes (like the one we already have for samples.tsv)
  • introduce new ad-hoc filenames/approaches (see below on atlas-<label>_description.json)
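To illustrate why samples.tsv needs a composite index, the following sketch checks key uniqueness for a chosen set of columns. The `duplicated_keys` helper and the sample rows are made up for illustration; only the column names come from Table 1.

```python
import csv
import io
from collections import Counter


def duplicated_keys(text, key_columns):
    """Return the key tuples that occur more than once in a TSV given as text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    counts = Counter(tuple(row[c] for c in key_columns) for row in reader)
    return sorted(key for key, n in counts.items() if n > 1)


# Illustrative content only: the same sample label is reused across participants.
SAMPLES = (
    "sample_id\tparticipant_id\tsample_type\n"
    "sample-01\tsub-01\ttissue\n"
    "sample-01\tsub-02\ttissue\n"
    "sample-02\tsub-01\ttissue\n"
)

# sample_id alone is not unique, but (sample_id, participant_id) is.
single = duplicated_keys(SAMPLES, ["sample_id"])
composite = duplicated_keys(SAMPLES, ["sample_id", "participant_id"])
```

With a single `{entity}_id` leading column, the first check would be enough; with a composite index, every consumer has to know which columns together form the key.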

Alternative/complementary solutions proposed

{entity}_description.json (atlas-<label>_description.json) in BEP-038 Atlases

I guess it was largely motivated by the fact that .tsv is "flat", so embedding nested structures, e.g. an "Authors" list, is tricky and not .tsv-friendly. But overall I think we might indeed want to formalize some format like

  • json lines .jsonl - https://jsonlines.org/
  • ... some other format, like a .dict.json, which would be a simple JSON object keyed on what is ATM the index of the .tsv, or a .list.json, which would be close to the .jsonl above

without a proliferation of the _description suffix, and which could be used interchangeably with any .tsv? IMHO this is worth a dedicated issue/discussion.
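For discussion, here is a rough sketch of how the same records could be serialized in the three shapes mentioned above. None of this is specified anywhere yet: the `atlas_id` column, the record contents, and the helper names are hypothetical, and `.dict.json`/`.list.json` are just the working names from this issue.

```python
import json

# Hypothetical records with a nested "Authors" list, which a flat .tsv cannot hold directly.
RECORDS = [
    {"atlas_id": "atlas-A", "Authors": ["First Author", "Second Author"]},
    {"atlas_id": "atlas-B", "Authors": ["Third Author"]},
]


def to_jsonl(records):
    """One JSON object per line (JSON Lines)."""
    return "\n".join(json.dumps(r) for r in records) + "\n"


def from_jsonl(text):
    """Parse JSON Lines back into a list of records."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def to_dict_json(records, index):
    """A single JSON object keyed by the index column (the ".dict.json" idea)."""
    keyed = {r[index]: {k: v for k, v in r.items() if k != index} for r in records}
    return json.dumps(keyed, indent=2)


def to_list_json(records):
    """A single JSON array of records (the ".list.json" idea)."""
    return json.dumps(records, indent=2)
```

The `.jsonl` and `.list.json` shapes keep the row order and the index as an ordinary field, whereas the `.dict.json` shape makes lookups by `{entity}_id` direct but requires the index values to be unique.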
