Description
Context
Currently, date ranges can only be expressed as ambiguous/incomplete dates: the format YYYY-MM-DD with some digits replaced by the wildcard character X¹. This only allows a range to be bounded by the start and end of a year or month, not a more precise range within or across year/month boundaries.
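To make the limitation concrete, here is a minimal sketch of how a wildcard date maps to a range. The helper name ambiguous_date_bounds is hypothetical and this is not Augur's AmbiguousDate implementation; it only handles whole-field XX wildcards for month and day.

```python
import calendar
import datetime as dt


def ambiguous_date_bounds(value):
    """Return (min, max) dates for values like 2021-10-XX or 2021-XX-XX.

    Hypothetical helper for illustration only; it does not handle
    partially masked fields such as 202X-XX-XX.
    """
    year_s, month_s, day_s = value.split("-")
    year = int(year_s)
    if month_s == "XX":
        # Unknown month: the range spans the whole year.
        return dt.date(year, 1, 1), dt.date(year, 12, 31)
    month = int(month_s)
    if day_s == "XX":
        # Unknown day: the range spans the whole month.
        last_day = calendar.monthrange(year, month)[1]
        return dt.date(year, month, 1), dt.date(year, month, last_day)
    exact = dt.date(year, month, int(day_s))
    return exact, exact
```

Every range this produces starts and ends on a month or year boundary, so a range such as one calendar week cannot be written this way.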
Scope
Anywhere that currently accepts ambiguous/incomplete dates (i.e. uses AmbiguousDate):
- augur filter
- augur frequencies
- augur refine
Potential solutions
- ⛔️ Allow scalar boundary values under new columns e.g. --min-date-column/--max-date-column
- ✅ Allow a new range format for values under the existing date column²
  1. ⛔️ Allow the TreeTime format of [min:max] where min and max are numeric dates
  2. ⛔️ Allow the ISO format of min/max where min and max support various formats that, when taken together, resolve to a date interval.
  3. ⛔️ Merge (1) and (2) allowing a format of [min/max] where min and max are any scalar date accepted by Augur (ISO or numeric)
  4. ✅ Allow a subset of the ISO format from (2), namely YYYY-MM-DD/YYYY-MM-DD.
¹ this could be better documented: #882
² there is a feature request to make the date column name customizable: #1443
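As a rough illustration of the accepted option (the YYYY-MM-DD/YYYY-MM-DD subset), parsing could look like the sketch below. The function name parse_date_range and the choice to reject reversed ranges are assumptions, not Augur's actual behavior.

```python
import datetime as dt
import re

# Strictly two zero-padded ISO dates separated by a slash.
RANGE_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2})/(\d{4}-\d{2}-\d{2})")


def parse_date_range(value):
    """Parse 'YYYY-MM-DD/YYYY-MM-DD' into a (start, end) tuple of dates.

    Hypothetical helper: raises ValueError for anything outside this
    subset, including ranges whose start is after their end.
    """
    match = RANGE_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a YYYY-MM-DD/YYYY-MM-DD range: {value!r}")
    start = dt.date.fromisoformat(match.group(1))
    end = dt.date.fromisoformat(match.group(2))
    if start > end:
        raise ValueError(f"range start is after range end: {value!r}")
    return start, end
```

Restricting the input to two full dates keeps the parser small; the broader ISO interval forms in option (2) would need considerably more handling.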
original issue description
Context
To ensure patient privacy, Denmark bins SARS-CoV-2 collection dates to the Monday of the week the actual collection date lies in.
Currently, we seem to be unable to specify such ambiguity within augur refine even though treetime supports arbitrarily constrained date ranges.
The only workaround right now I can think of is to make the date ambiguous, but that throws away information while also not working in situations where the week crosses a month boundary.
The issue has become particularly noticeable when making BA.2.86 trees, where Denmark has provided ~30% of global sequences so far. To remove bias, I could add 3 or 4 days to the dates, but it would be nice if refine just accepted ranges as such.
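For the Danish case, the true collection date lies somewhere between the reported Monday and the following Sunday. A minimal sketch of deriving that range, written in the YYYY-MM-DD/YYYY-MM-DD form listed under potential solutions (the function name week_range_from_monday is hypothetical):

```python
import datetime as dt


def week_range_from_monday(binned_date):
    """Turn a Monday-binned date string into a Monday..Sunday range string.

    Hypothetical helper; assumes the input is the Monday of the
    collection week, as described for the Danish data.
    """
    monday = dt.date.fromisoformat(binned_date)
    if monday.weekday() != 0:  # Monday is 0 in datetime's convention
        raise ValueError(f"{binned_date} is not a Monday")
    sunday = monday + dt.timedelta(days=6)
    return f"{monday.isoformat()}/{sunday.isoformat()}"


# A week that crosses a month boundary, which 2023-08-XX cannot express.
print(week_range_from_monday("2023-08-28"))
```

This keeps the full week of uncertainty without shifting the date or widening it to a whole month.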
Description
Make refine support arbitrary min/max input dates.
Possible solution
The simplest way to implement this would be to accept the min/max date format that's natively supported by treetime: [min:max] as in [2022.3452:2022.3649] (opening bracket, min date as year float, colon, max date as year float, closing bracket)
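A sketch of producing that bracketed string from calendar dates follows. The year-fraction conversion here is a simple day-of-year approximation and may differ slightly from TreeTime's own conversion; numeric_date and treetime_range are illustrative names defined in this snippet, not imports from TreeTime.

```python
import datetime as dt


def numeric_date(date):
    """Approximate a calendar date as a year float (e.g. 2022.4986).

    Assumption: fraction = days elapsed since Jan 1 / days in that year.
    """
    year_start = dt.date(date.year, 1, 1)
    next_year_start = dt.date(date.year + 1, 1, 1)
    days_in_year = (next_year_start - year_start).days
    return date.year + (date - year_start).days / days_in_year


def treetime_range(start, end):
    """Format two dates as '[min:max]' with four decimal places."""
    return f"[{numeric_date(start):.4f}:{numeric_date(end):.4f}]"
```

Four decimal places gives a resolution of well under a day, which is enough to distinguish consecutive dates.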
A neater solution might be to allow min and max dates to be specified through two columns: --min-date and --max-date.
This change could potentially be made across all date handling: it would be more general than our current x'ing strategy, e.g. 2021-10-XX for ambiguous date of month. That'd be a lot of work though.