handle wes/wgs inheritance edge case #4440
base: dev
Commenting for posterity that I'm not 100% sure this is the final logic we want. It would allow us to return variants that pass quality but fail inheritance in one sample type and fail quality but pass inheritance in the other, meaning there's no sample type that clearly passes both inheritance and quality. However, I think we may actually want to return these: the logic gets confusing, and I can't be sure they would not be helpful. I think being overly permissive here is better. If the analysts see a lot of cases where the returned variants are ultimately not helpful and should be filtered out, we can always go back later and make the criteria stricter, so we should leave this as is for now.
This makes sense. We may want to consider quality and inheritance passing together and handle it differently (the `&` seems like it could be too simple), but it's not clear to me what the logic/change would be. I agree that trying this out and getting feedback from analysts before we do that is the way to go.
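To make the edge case discussed above concrete, here is a minimal sketch in plain Python (not Hail). It assumes the current behavior amounts to checking each criterion across sample types and then combining the results, and it contrasts that with a stricter rule where a single sample type must pass both. The class and function names are hypothetical and do not come from the PR.

```python
from dataclasses import dataclass


@dataclass
class SampleTypeResult:
    """Hypothetical per-sample-type outcome for one variant in one family."""
    passes_quality: bool
    passes_inheritance: bool


def passes_permissive(wes: SampleTypeResult, wgs: SampleTypeResult) -> bool:
    # Each criterion may be satisfied by a different sample type.
    any_quality = wes.passes_quality or wgs.passes_quality
    any_inheritance = wes.passes_inheritance or wgs.passes_inheritance
    return any_quality and any_inheritance


def passes_strict(wes: SampleTypeResult, wgs: SampleTypeResult) -> bool:
    # At least one sample type must pass both criteria on its own.
    return any(r.passes_quality and r.passes_inheritance for r in (wes, wgs))


# The edge case: quality passes only in WES, inheritance passes only in WGS.
wes = SampleTypeResult(passes_quality=True, passes_inheritance=False)
wgs = SampleTypeResult(passes_quality=False, passes_inheritance=True)

print(passes_permissive(wes, wgs))  # True: the variant is returned
print(passes_strict(wes, wgs))      # False: no sample type passes both
```

If analysts report that these variants are not useful, switching to something like the strict rule would be the tightening mentioned above.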
String comparisons in Hail are substantially slower than integer comparisons. `family_entries_field` and `failed_family_sample_field` for a given sample type should be the same length and in the same order, so you should be able to rewrite this more performantly with index-based access instead of comparing sample IDs.
I think this won't work since `family_entries_field` and `failed_family_sample_field` for a given sample type are not necessarily the same length. `ht[sample_type.other_sample_type.failed_family_sample_field][other_sample_type_family_idx]` is a list of sample IDs of variable length (it only contains failed sample IDs), so the `[i]` access won't work. I originally had `failed_family_sample_field` as an array of failing indices, but switched it to an array of failing sample IDs because that made it easier for me to compare the same samples across different sample types.

Maybe a better way to structure `failed_family_sample_field` is as an array of booleans that matches the shape of `family_entries_field`, where the value at each sample index is true if the sample failed and false if it passed. It's just more complicated to compare samples individually that way: I'd need to reintroduce the map of families to samples to sample types to indices inside their respective family_entries field.
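For illustration, the two shapes described above can be compared in plain Python (not Hail). This is only a sketch: the `sampleId` key and the example IDs are assumptions, and `to_failed_mask` is a hypothetical helper rather than code from the PR.

```python
# Entries for one family, in a fixed sample order.
family_entries = [{"sampleId": "S1"}, {"sampleId": "S2"}, {"sampleId": "S3"}]

# Current shape: only the failing sample IDs, so the length varies per family
# and position i does not correspond to family_entries[i].
failed_sample_ids = ["S2"]


def to_failed_mask(entries, failed_ids):
    """Build a boolean list aligned with entries: True where the sample failed."""
    failed = set(failed_ids)
    return [entry["sampleId"] in failed for entry in entries]


failed_mask = to_failed_mask(family_entries, failed_sample_ids)
print(len(failed_sample_ids) == len(family_entries))  # False: shapes differ
print(failed_mask)  # [False, True, False]
```

With the mask, index `i` refers to the same sample in both arrays, which is what makes index-based access possible.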
If `failed_family_sample_field` is a list of booleans, then you can easily get the list of passing sample IDs from the main family_entries field without maintaining an entry map, and your logic for using `other_sample_type_pass_samples` would not need to change.
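A rough sketch of that idea in plain Python, assuming the boolean list is aligned with the family entries and that each entry carries a `sampleId` key (both assumptions; the real Hail expression would look different):

```python
def passing_sample_ids(entries, failed_mask):
    """Return IDs of samples whose aligned mask value is False (not failed)."""
    if len(entries) != len(failed_mask):
        raise ValueError("entries and failed_mask must be the same length")
    return [
        entry["sampleId"]
        for entry, failed in zip(entries, failed_mask)
        if not failed
    ]


# Hypothetical entries and mask for the other sample type in the same family.
other_entries = [{"sampleId": "S1"}, {"sampleId": "S2"}, {"sampleId": "S3"}]
other_failed_mask = [False, True, False]

other_sample_type_pass_samples = passing_sample_ids(other_entries, other_failed_mask)
print(other_sample_type_pass_samples)  # ['S1', 'S3']
```

No map from samples to indices is needed, because the position in the mask already identifies the sample.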
OK, I refactored `passes_inheritance_field` to be a list of booleans corresponding to the samples in family entries ^ and used this more performant code to get a list of passing sample IDs.
I still think a comment here explaining what's being tested is helpful. Maybe something like "Variant 2 fails inheritance when parental data is present".
So this is testing the case where it's not returned when there's no valid WGS parental data, but can we also test that it IS returned when we include the parental data for WGS and it's overridden?
I think this case is covered by an above test -
The search criteria change between these 2 test cases. We really need back-to-back tests showing that the exact same search returns different results when the paternal data is present/absent. Instead of running a dominant and a recessive search each filtered to variant 1 and variant 2, it would probably be better to not use an interval filter at all and test that the dominant and recessive searches return the expected combo of variants with and without paternal data.
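As a sketch of the test shape suggested here, the following uses a toy inheritance filter in plain Python. Everything in it is made up for illustration: the genotypes, the variant names, `toy_search`, and the simplified dominant/recessive rules are assumptions and not the project's search code or fixtures. The point is only the structure: the same search, with no interval filter, run back to back with and without parental data.

```python
# Toy genotypes: number of alt alleles per sample. All values are invented.
VARIANTS = {
    "variant1": {"proband": 1, "father": 0, "mother": 0},
    "variant2": {"proband": 1, "father": 1, "mother": 0},
    "variant3": {"proband": 2, "father": 1, "mother": 1},
    "variant4": {"proband": 2, "father": 2, "mother": 1},
}
AFFECTED = {"proband"}


def toy_search(mode, include_parents):
    """Return names of variants passing a simplified inheritance filter."""
    min_alt = {"dominant": 1, "recessive": 2}[mode]
    results = []
    for name, genotypes in VARIANTS.items():
        # Dropping parents models a search where parental data is absent.
        samples = genotypes if include_parents else {"proband": genotypes["proband"]}
        affected_ok = all(n >= min_alt for s, n in samples.items() if s in AFFECTED)
        unaffected_ok = all(n < min_alt for s, n in samples.items() if s not in AFFECTED)
        if affected_ok and unaffected_ok:
            results.append(name)
    return results


def test_same_search_with_and_without_parents():
    # Identical search criteria; only the presence of parental data changes.
    assert toy_search("dominant", include_parents=True) == ["variant1"]
    assert toy_search("dominant", include_parents=False) == [
        "variant1", "variant2", "variant3", "variant4",
    ]
    assert toy_search("recessive", include_parents=True) == ["variant3"]
    assert toy_search("recessive", include_parents=False) == ["variant3", "variant4"]


test_same_search_with_and_without_parents()
```

Asserting the full set of returned variants for each mode makes it obvious which variants are gained or lost when the parental data changes.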
I did this^!
this line is unneeded