Context
On Slack, @huddlej flagged sequences with unusual collection dates, where date == date_submitted. We should exclude these sequences from the builds because this is a clear metadata issue.
Possible solutions
Add (date != date_submitted) to all of the filter queries across all configs
Add a new filter rule in the main workflow to exclude these sequences for all builds
Add a new filter rule in the upload workflow to exclude these sequences in our S3 files
Add specific sequences to outliers.txt (e.g. 8209b35)
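For reference, the check behind the first three options could be sketched roughly like this. This is only an illustration on made-up sample rows, not the workflow's actual filter rule: the `date` and `date_submitted` columns come from the issue above, while the `strain` column name, the sample records, and the `split_suspect_dates` helper are assumptions.

```python
import csv
import io


def split_suspect_dates(rows):
    """Separate records whose collection date equals their submission date."""
    kept, flagged = [], []
    for row in rows:
        date = row.get("date", "")
        # Only flag a record when the date is present and identical to date_submitted.
        if date and date == row.get("date_submitted", ""):
            flagged.append(row)
        else:
            kept.append(row)
    return kept, flagged


# Hypothetical metadata rows, for illustration only.
SAMPLE_TSV = (
    "strain\tdate\tdate_submitted\n"
    "sample_A\t2021-03-01\t2021-03-15\n"
    "sample_B\t2021-04-02\t2021-04-02\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
kept, flagged = split_suspect_dates(rows)
print("flagged:", [r["strain"] for r in flagged])
```

Returning the flagged records instead of dropping them silently means the same check could feed either a filter step or a list of candidates for outliers.txt.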
I'm a little worried about excluding these types of records algorithmically without any notification to us. Ideally, we want to catch these data issues, alert the data provider so they can fix the records, and update our records to use the correct metadata. Another approach might be to make a QC report that runs weekly (on new data only?) with checks for this kind of issue plus Nextclade QC statuses, failed alignments, etc.
If data providers can't or won't update their records, the outlier file approach seems reasonable.
> Another approach might be to make a QC report that runs weekly (on new data only?) with checks for this kind of issue plus Nextclade QC statuses, failed alignments, etc.
Ah that would be nice! Seems like something we can add to the upload + Nextclade workflows.
> If data providers can't or won't update their records, the outlier file approach seems reasonable.
With the outlier file approach, I feel like we never go back to check if the "outliers" have been fixed. I guess if we implement the QC report, it can flag sequences that have been fixed and can be removed from the outlier files.
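A rough sketch of what the date part of such a QC report might compute, assuming records are dictionaries with `strain`, `date`, and `date_submitted` keys and that the outlier names are already loaded into a set. The `qc_report` function, the report keys, and the sample data are all hypothetical. Outlier entries can be listed for reasons other than dates, so "possibly_fixed" only marks candidates for manual review.

```python
def qc_report(records, outlier_names):
    """Summarize suspect dates and outlier entries that may no longer be needed."""
    # Records whose collection date is present and equal to the submission date.
    suspect = {
        r["strain"]
        for r in records
        if r.get("date") and r["date"] == r.get("date_submitted")
    }
    present = {r["strain"] for r in records}
    return {
        # Suspect records that nobody has listed as outliers yet.
        "new_suspects": sorted(suspect - outlier_names),
        # Listed outliers that are still in the data but no longer trip this check.
        "possibly_fixed": sorted((outlier_names & present) - suspect),
    }


# Hypothetical records and outlier list, for illustration only.
records = [
    {"strain": "sample_A", "date": "2021-03-01", "date_submitted": "2021-03-15"},
    {"strain": "sample_B", "date": "2021-04-02", "date_submitted": "2021-04-02"},
    {"strain": "sample_C", "date": "2021-05-10", "date_submitted": "2021-05-20"},
]
report = qc_report(records, outlier_names={"sample_C"})
print(report)
```

The "new_suspects" list is what would go to the data provider, and "possibly_fixed" is the list to check before removing anything from the outlier files.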