-
Notifications
You must be signed in to change notification settings - Fork 190
[ENH] Deprecate 89+ string for default age column, increase expressiveness of column definitions in sidecar files #2162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Deprecate 89+ string for default age column, increase expressiveness of column definitions in sidecar files #2162
Conversation
|
@effigies Mostly LGTM as for what we had discussed - I will take a second look at src/schema/objects/metadata.yaml later to ensure I understand Were you planning to submit a separate pre-PR to fix the mypy issues on |
Done in #2163. |
205c898 to
095cfdf
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2162 +/- ##
=======================================
Coverage 82.71% 82.71%
=======================================
Files 20 20
Lines 1608 1608
=======================================
Hits 1330 1330
Misses 278 278 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
yarikoptic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more discussion points/ideas
| "Description": "Subject age in postnatal years", | ||
| "Format": "number", | ||
| "Units": "year", | ||
| "Maximum": 89, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a wonderful motivation for me to extract the bst migrate-datasets from https://github.com/bids-standard/bids-specification/pull/1775/files#diff-24b8a97543a1dc43190b8d5b070df26e1b88023f2b468b4483255af049788788R39 , likely to call upgrade-datasets instead. And add migration for this so while going to newer version we could get participants.tsv fixed up automagically?
FTR -- there is a one bids-examples with 89+
❯ git grep 89\+ **/*.tsv
genetics_ukbb/participants.tsv:sub-05 89+ M control 623589 yes b002 SNV/CNV hg19/hg38 WES/SNP_intensity
genetics_ukbb/participants.tsv:sub-06 89+ F patient 458712 no b003 SNV hg19 WES
genetics_ukbb/participants.tsv:sub-07 89+ M patient 985632 yes b004 SNV hg19 WES
genetics_ukbb/participants.tsv:sub-13 89+ M patient 225587 no b001 SNV/CNV hg19/hg38 WES/SNP_intensity
| def test_format_consistency(schema_obj): | ||
| """Test that the "Format" field is consistent with objects.formats.""" | ||
| assert set(schema_obj.objects.metadata.Format.enum) == schema_obj.objects.formats.keys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think any additional testing would be worth it (e.g., example files asserting their column values to be of a particular type, and ensuring validation errors if not that particular non-string type, etc.) or is the assumption that the current testing suite would cover that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's out-of-scope for bidsschematools, but will definitely be in-scope for the validator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want to start a sister PR over on the validator to ensure this works as expected or simply proceed and patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is great to see for specifications to converge! You should come over to the North more often!
Co-authored-by: Cody Baker <[email protected]>
7060607 to
1162a5b
Compare
…oth string/number and arrays Cherry-pick: gh-2162
This is a minimal change to column JSON definitions to permit them to be as expressive as JSON schema for defining TSV columns. This adds:
Format, which covers bothtypeandformat, given that all TSV values are strings until cast by a parser.MinimumandMaximum, which are used for some values. Here I permit them also to be strings, on the grounds that someone might want a minimum/maximum datetime.This also deprecates the string "89+" for ages above 88, instead setting a
Maximumof89. It is expected that a validator implementing this will carve out an exception foragefor 89+. The alternative would be something along the lines of:Adding the equivalent of
anyOfto JSON sidecar column descriptions would be excessive complexity. Looking at the existing uses ofanyOf, two are degenerate (anyOf: {type: string}) and one allowed strings or numbers, which is equivalent totype: stringfor TSV values.At this point, this PR does not attempt to adopt a single convention for schema-defined and sidecar-defined columns. I think we probably should, and posted some options in #2048 (comment).
Closes #1791
Closes #2045
Closes #2048