-
Notifications
You must be signed in to change notification settings - Fork 190
BEP-038: Atlases #1714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
BEP-038: Atlases #1714
Conversation
I made a first draft of the atlas BEP based on the Google doc
Added atlas content
All the content of the BEP038 Google document has been ported to the atlas.md file. Additionally atlas and a definition of what of means was added as an entity. There are still smaller things to fix as some links, references and the tables still need to be inserted.
Fixed details such as linking to entities and suffixes as well as clarified some of the example text. The tables need still to be fixed.
[BEP038] Style and similar cleanups
|
@erdalkaraca @tsalo this has passed the review period now, can this be checked and merged? thx |
|
I'm working on a review. Please hold on. |
|
I also have not finished my review/proposed changes. I am unlikely to get to it before the meeting next week as I am traveling for another meeting this week. |
|
This BEP currently needs:
I'm converting this to draft to reflect this status. I'm going to temporarily restrict comments, pending synchronous discussion. |
Co-authored-by: Mark Mikkelsen <[email protected]>
Co-authored-by: Mark Mikkelsen <[email protected]>
This reverts commit 0511805.
Co-authored-by: Mark Mikkelsen <[email protected]>
Co-authored-by: Mark Mikkelsen <[email protected]>
Co-authored-by: Mark Mikkelsen <[email protected]>
Co-authored-by: Mark Mikkelsen <[email protected]>
yarikoptic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think it is largely great work, needs some clarifications, but IMHO a larger problem is that introduces a new principled file type (atlas-<label>_description.json) without considering/adopting otherwise existing already pattern in BIDS (atlases.tsv).
At large IMHO that violated existing BEP Guidelines recommendation on atomic changes: https://bids.neuroimaging.io//extensions/guidelines.html#facilitate-atomic-changes and not mentioning there also tpl and cohort for others to become aware in https://bids.neuroimaging.io//extensions/guidelines.html#proposed-entities .
I think this inconsistently does not necessarily should stop this BEP from being merged, but I think it would be great to talk through on either we would like to either
- introduce that as some kind of a generic principle to have such
_description.jsonfiles - or refactor/migrate later into some more common
atlases.tsvoratlases.jsonl(see #2283)
| The selected `<label>` in the `atlas-<label>_description.json` file is RECOMMENDED | ||
| for the [`atlas-<label>` entity](../appendices/entities.md#atlas) | ||
| in downstream derivatives from this particular atlas (see previous section | ||
| [Derivatives from atlases](imaging.md#derivatives-from-atlases)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this introduces a new principle to BIDS of {entity-singular}_description.json in favor of current
{entities_plural}.tsv(atlases.tsvhere) to be used for such purposes in case of multiple values for the entity being present in the dataset (overview + proposal: Formalize the concept of[{leading entities}_]{entity_plural}.{tsv,json}file(s) #2283)- a singular
dataset_description.jsonto describe metadata purpose of the dataset, potentially withDatasetTypebeing set to "atlas", and paired with aboveatlases.tsvto provide per atlas specific metadata .
Moreover, given above and that leading folder is tpl- I feel that templates.tsv would have been most appropriate and expected at the top level to provide description of templates here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be okay with having the JSON metadata inside the dataset description, but it's going to be an ugly dictionary or JSONs that otherwise are outside (with BIDS traditionally being not very friendly with nested dictionaries, they are suggested or mandated at places, but typically small dictionaries with rather controlled keys)
Considering that atlases may have different license terms etc., this BEP proposes the _description.json suffix/ext, which on the other hand doesn't seem like such a large modification.
If following the atomicity principle we should first try to get _description.json more generally approved as a separate PR, that's fine by me but feels bureaucratic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces a new file, not a new principle, in that this is not generalized elsewhere. Further, the existing solutions were deemed insufficient. "DatasetType": "atlas" was rejected after the previous round of review of BEP38, when it was determined that atlases could be encoded as derivative datasets.
atlas-<label>_description.json describes an atlas used in a dataset, while dataset_description.json describes the dataset itself. I would think of it as analogous to incorporating LICENSE files of projects that have been bundled, rather than a row in a descriptions.tsv table.
Moving these contents into either an atlases.tsv or dataset_description.json is a significant deviation from the BEP outside the review period. I'm rejecting this as out of scope. At this point, if you feel there is a fatal flaw in the BEP that justifies further postponement, please ask the steering council to make that call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces a new file, not a new principle
nope, since there is a varying part in that filename, so there could be how @oesteban said "A lot (big lot)" of those files in a dataset, not a single new file. They look like "bids files" to be parsed, but not really fall into any category AFAIK (besides, as I mentioned, look more like something applicable to "inheritance principle" since there are files with atlas- entities within hierarchy).
FWIW, overall, feel free and welcome to merge since I do not want to stop progress on this BEP! but I will keep making the case that this introduces yet another inconsistency we will need to address one way or another. At least in itself it is consistent so could be a subject to auto-migration (#1984 ).
I might also cook up a PR on top which would bring consistency back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed a dedicated
so we could have a more targetted discussion there to weigh pros/cons for introducing of this new top-level only suffix _description.
|
|
||
| ### Atlas identification and metadata | ||
|
|
||
| The `atlas-<label>_description.json` file provides metadata to uniquely identify, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was it considered at all to just make "one atlas per dataset" and thus introduce DatasetType "atlas" and thus use dataset_description.json to provide all the gory details (with huge overlap which what current dataset_description.json has)???
then, when necessary, those could be composed into some BIDS super-dataset across atlases if so desired similar to
- what was contemplated in https://bids.neuroimaging.io/bep035 : https://bids.neuroimaging.io/bep035
- inline with potential assembly of BIDS dataset from smaller subj/session ones Allow composition of a BIDS dataset (dataset level) from smaller (subj or subj/ses) level bids-2-devel#59
but the question is even - how common is to bring up multiple atlases here into a single dataset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was it considered at all to just make "one atlas per dataset" and thus introduce
DatasetType"atlas" and thus usedataset_description.jsonto provide all the gory details (with huge overlap which what current dataset_description.json has)???
This has taken probably 80% of the time for this BEP (meaning, this is not proposed this way lightly).
how common is to bring up multiple atlases here into a single dataset?
A lot (big lot).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replying to @oesteban 's comment in the main thread (can't be collapsed/cleaned there):
At large IMHO that violated existing BEP Guidelines recommendation on atomic changes: https://bids.neuroimaging.io//extensions/guidelines.html#facilitate-atomic-changes
Creating a new DatasetType seems way less atomic than creating new entities to me,
I proposed above defining a new DatasetType (if needed to tell it apart from other derivative datasets, but may be not even needed) as an alternative not to a new entity (or entities, here - atlas, tpl, cohort which I did not voice against) but to a new principled file type {entity}-<label>_description.json, which is not following any prior known convention AFAIK to be defined on top level. I would not discuss "precedence cases" of other BEPs here ATM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is that, for much I like the atomicity principle it is not applicable as it stands at the moment. Every community member has a slightly different understanding of BIDS and what is atomic and what's not.
Are you arguing against {entity}-<label>_description.json or indicating that, for the atomicity principle it should be introduced in another BEP as a more general rule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you arguing against {entity}-_description.json or indicating that, for the atomicity principle it should be introduced in another BEP as a more general rule?
yes and yes. As if not considered more generally, it should not be introduced and rather already existing approaches should be used even if they are neither formalized fully yet (#2283).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DatasetType"atlas"
This was proposed in 2023 and abandoned in 2024.
Are you arguing against
{entity}-<label>_description.jsonor indicating that, for the atomicity principle it should be introduced in another BEP as a more general rule?yes and yes. As if not considered more generally, it should not be introduced and rather already existing approaches should be used even if they are neither formalized fully yet (#2283).
Not sure what else to say, but we've had several years of back and forth on this, and finally a review period that produced no dissenting votes. I'm not sure what we should have done, procedurally, but it is out of scope for a PR review to propose such changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what else to say, but we've had several years of back and forth on this
ah, having atlases.tsv was discussed and rejected? (I wonder if "lists" formatting for authors etc was the reason)
I'm not sure what we should have done, procedurally, but it is out of scope for a PR review to propose such changes.
as we already have for entities, IMHO there should have been a proposal/discussion to introduce a principally new file type {entity}-<label>_description.json file with atlas-<label>_description.json as an example (pointing to this BEP), in favor of already present in BIDS {entity_plural}.tsv files.
NB IMHO it is actually a relatively small "serialization" aspect here as all those could be easily expressed as a more conventional atlases.tsv (and missing ATM templates.tsv and spaces.tsv) and we all could remain friends -- stay consistent at a minor question of formatting of those few list entries in the .tsv, while discussing potentially a better fit format to replace .tsv in such cases.
Creating a new DatasetType seems way less atomic than creating new entities to me, and also potentially by the principle you linked to (it would fall under BIDS principles). Creating new datatype folders seems way less atomic than creating new entities (to me again, from what I understand from the guidelines), and yet most BEPs happily add new datatypes with no objection. With its current implementation (a paragraph that points to a PR), the atomicity principle is largely in the eyes of the beholder (unfortunately, because I think it is a fundamental perspective we should keep in mind and try to enforce in some way). This is to say, this BEP currently tries to make changes as atomic as possible (saving the cohort discussion for which I have no more time, if it drops, I will not say a thing). |
Co-authored-by: Yaroslav Halchenko <[email protected]>
Co-authored-by: Yaroslav Halchenko <[email protected]>
Concrete suggestions addressed. Other conversations out of scope for post-BEP review.
effigies
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a full read through, and AFAIC this is ready for merge. One last comment for consistency.
| Species: | ||
| name: Species | ||
| display_name: Species | ||
| description: | | ||
| The species from which the atlas was derived. For example, "human", "macaque", "rat", or "mouse". | ||
| type: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment: Should this be unified with objects.columns.species?
bids-specification/src/schema/objects/columns.yaml
Lines 604 to 617 in 57d7711
| species: | |
| name: species | |
| display_name: Species | |
| description: | | |
| The `species` column SHOULD be a binomial species name from the | |
| [NCBI Taxonomy](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi) | |
| (for example, `homo sapiens`, `mus musculus`, `rattus norvegicus`). | |
| For backwards compatibility, if `species` is absent, the participant is assumed to be | |
| `homo sapiens`. | |
| definition: { | |
| "Description": | |
| "Latin binomial species name from the NCBI Taxonomy (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi).", | |
| "Format": "string", | |
| } |
We could even use:
| Species: | |
| name: Species | |
| display_name: Species | |
| description: | | |
| The species from which the atlas was derived. For example, "human", "macaque", "rat", or "mouse". | |
| type: string | |
| Species: | |
| $ref: objects.columns.species | |
| name: Species | |
| type: string | |
| definition: null |
This change makes BEP038's atlas metadata approach consistent with existing BIDS conventions for entity-specific TSV files (like participants.tsv, sessions.tsv, descriptions.tsv). Changes: - Replace atlas-<label>_description.json with atlases.tsv - Add templates.tsv for template metadata - Add tpl-<label>_cohorts.tsv for cohort metadata within templates - Support tpl-<label>_atlases.tsv for template-level atlas metadata The atlases.tsv includes columns derived from the original JSON fields: - atlas_id, name, license (REQUIRED) - description, authors (RECOMMENDED) - curators, funding, references_and_links, species, sample_size, derived_from, level_type, rrid (OPTIONAL) Array fields (authors, curators, funding, references_and_links) use JSON-encoded strings in TSV cells. Schema changes: - Added column definitions for all atlas/template/cohort TSV columns - Added suffix definitions for atlases, templates, cohorts - Added tabular_data rules for column validation - Added file rules supporting both root-level and template-level naming - Removed old atlas_description JSON-based rules and associations This addresses the inconsistency noted in: - Issue #2285: atlas-<label>_description.json pattern introduced by BEP038 - Issue #2283: Proposal to formalize {entities_plural}.tsv pattern This is a follow-up to PR #1714 (BEP038: Templates and atlases). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace atlas-<label>_description.json with atlases.tsv for atlas metadata, consistent with existing entity-specific TSV files (participants.tsv, sessions.tsv). Key changes: - atlases.tsv: placed at derivative dataset root or tpl-<label>/ directory - Metadata fields converted to snake_case column names - Array fields (authors, curators, funding, references_and_links) use JSON encoding Schema changes: - Add atlas_id and all atlas-specific columns to columns.yaml - Add atlases suffix to suffixes.yaml - Add Atlases rule to common_derivatives.yaml - Add atlases file rule to tables.yaml - Remove old atlas_description association, context, and metadata rules - Delete atlas.yaml check and file rules Documentation: - Update atlas.md with atlases.tsv section and updated examples - Add JSON-encoded array note - Replace all atlas-<label>_description.json references with atlases.tsv Closes: #2285 Related to: #2283 Related to: #1714 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
note: sorry, I made some wrong push, reverted right away. those changes are now in dedicated PR(s) |
Replace atlas-<label>_description.json with atlases.tsv for atlas metadata, consistent with existing entity-specific TSV files (participants.tsv, sessions.tsv). Key changes: - atlases.tsv: placed at derivative dataset root or tpl-<label>/ directory - Metadata fields converted to snake_case column names - Array fields (authors, curators, funding, references_and_links) use JSON encoding Schema changes: - Add atlas_id and all atlas-specific columns to columns.yaml - Add atlases suffix to suffixes.yaml - Add Atlases rule to common_derivatives.yaml - Add atlases file rule to tables.yaml - Remove old atlas_description association, context, and metadata rules - Delete atlas.yaml check and file rules Documentation: - Update atlas.md with atlases.tsv section and updated examples - Add JSON-encoded array note - Replace all atlas-<label>_description.json references with atlases.tsv Closes: #2285 Related to: #2283 Related to: #1714 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace atlas-<label>_description.json with atlases.tsv for atlas metadata, consistent with existing entity-specific TSV files (participants.tsv, sessions.tsv). Key changes: - atlases.tsv: placed at derivative dataset root or tpl-<label>/ directory - Metadata fields converted to snake_case column names - Array fields (authors, curators, funding, references_and_links) use JSON encoding Schema changes: - Add atlas_id and all atlas-specific columns to columns.yaml - Add atlases suffix to suffixes.yaml - Add Atlases rule to common_derivatives.yaml - Add atlases file rule to tables.yaml - Remove old atlas_description association, context, and metadata rules - Delete atlas.yaml check and file rules Documentation: - Update atlas.md with atlases.tsv section and updated examples - Add JSON-encoded array note - Replace all atlas-<label>_description.json references with atlases.tsv Closes: #2285 Related to: #2283 Related to: #1714 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace atlas-<label>_description.json with atlases.tsv for atlas metadata, consistent with existing entity-specific TSV files (participants.tsv, sessions.tsv). Key changes: - atlases.tsv: placed at derivative dataset root or tpl-<label>/ directory - Metadata fields converted to snake_case column names - Array fields (authors, curators, funding, references_and_links) use JSON encoding Schema changes: - Add atlas_id and all atlas-specific columns to columns.yaml - Add atlases suffix to suffixes.yaml - Add Atlases rule to common_derivatives.yaml - Add atlases file rule to tables.yaml - Remove old atlas_description association, context, and metadata rules - Delete atlas.yaml check and file rules Documentation: - Update atlas.md with atlases.tsv section and updated examples - Add JSON-encoded array note - Replace all atlas-<label>_description.json references with atlases.tsv Closes: #2285 Related to: #2283 Related to: #1714 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
This PR proposes the inclusion of BEP-038 into the BIDS specification.
The BEP is hosted on its own branch in the main repository, and BEP leads have been given write permissions to the branch. If you are a BEP lead and having issues writing to the branch, please let a maintainer know and we can make sure you have access.
At present this is a starting point, and I anticipate significant rewriting and reorganization before final merge. There is extensive discussion in #1281 that needs to be turned into concrete proposals. I would suggest keeping the discussion on this PR minimal so that newcomers can follow the thread. When opening PRs against this branch, please include a reference to this PR (
#1714) so that it shows up in the thread.Rendered draft
https://bids-specification--1714.org.readthedocs.build/en/1714/derivatives/atlas.html
Related discussions
Associated PRs