Allow precise date ranges #1304

@corneliusroemer

Context

Currently, date ranges can only be expressed as ambiguous/incomplete dates in the format YYYY-MM-DD with some digits replaced by the wildcard character X¹. This only allows a range to be bounded by the start/end of a year or month, not a more precise range within or across year/month boundaries.
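
To illustrate the limitation, here is a minimal sketch of how a wildcard date resolves to a range. The function name `ambiguous_date_range` is hypothetical and this is not Augur's `AmbiguousDate` implementation; it only handles a fully unknown month and/or day, whereas the real parser may accept more patterns.

```python
import calendar
import datetime
import re


def ambiguous_date_range(value):
    """Return (earliest, latest) dates matching YYYY-MM-DD with XX wildcards.

    Hypothetical helper for illustration; supports only YYYY-MM-XX and
    YYYY-XX-XX style ambiguity.
    """
    match = re.fullmatch(r"(\d{4})-(\d{2}|XX)-(\d{2}|XX)", value)
    if match is None:
        raise ValueError(f"unsupported ambiguous date: {value!r}")
    year_s, month_s, day_s = match.groups()
    if month_s == "XX" and day_s != "XX":
        raise ValueError("a day cannot be given without a month")

    year = int(year_s)
    # An unknown month spans the whole year.
    min_month, max_month = (1, 12) if month_s == "XX" else (int(month_s),) * 2
    if day_s == "XX":
        # An unknown day spans from the 1st to the last day of the month.
        min_day = 1
        max_day = calendar.monthrange(year, max_month)[1]
    else:
        min_day = max_day = int(day_s)
    return (
        datetime.date(year, min_month, min_day),
        datetime.date(year, max_month, max_day),
    )


print(ambiguous_date_range("2021-10-XX"))
```

The bounds always land on month or year edges, so a week such as 2023-09-25 to 2023-10-01 cannot be written this way.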

Scope

Anywhere that currently accepts ambiguous/incomplete dates (i.e. uses AmbiguousDate):

  • augur filter
  • augur frequencies
  • augur refine

Potential solutions

  1. ⛔️ Allow scalar boundary values under new columns e.g. --min-date-column / --max-date-column
  2. ✅ Allow a new range format for values under the existing date column²
    1. ⛔️ Allow the TreeTime format of [min:max] where min and max are numeric dates
    2. ⛔️ Allow the ISO format of min/max where min and max support various formats that, when taken together, resolve to a date interval.
    3. ⛔️ Merge (1) and (2) allowing a format of [min/max] where min and max are any scalar date accepted by Augur (ISO or numeric)
    4. ✅ Allow a subset of the ISO format from (2), namely YYYY-MM-DD/YYYY-MM-DD.

¹ this could be better documented: #882
² there is a feature request to make the date column name customizable: #1443
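
As a rough sketch of what accepting the chosen format (2.4) could involve, the following parses a `YYYY-MM-DD/YYYY-MM-DD` value into a pair of dates. The function name `parse_date_interval` is hypothetical, and the rule that the start must not come after the end is an assumption; none of this is Augur's actual implementation.

```python
import datetime
import re

# Exactly four digits, two digits, two digits, separated by hyphens.
EXACT_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_date_interval(value):
    """Parse 'YYYY-MM-DD/YYYY-MM-DD' into (start, end) dates.

    Hypothetical helper for illustration only.
    """
    start_s, sep, end_s = value.partition("/")
    if not sep:
        raise ValueError(f"missing '/' separator: {value!r}")
    for part in (start_s, end_s):
        if EXACT_DATE.fullmatch(part) is None:
            raise ValueError(f"not an exact YYYY-MM-DD date: {part!r}")
    # fromisoformat also rejects impossible dates such as 2023-02-30.
    start = datetime.date.fromisoformat(start_s)
    end = datetime.date.fromisoformat(end_s)
    if start > end:
        # Assumption: an interval must not run backwards.
        raise ValueError(f"start is after end: {value!r}")
    return start, end


start, end = parse_date_interval("2023-09-25/2023-10-01")
print(start, end)
```

Restricting both sides to exact dates keeps the parser small compared with the full ISO interval syntax rejected in (2.2).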


original issue description

Context

To ensure patient privacy, Denmark bins SARS-CoV-2 collection dates to the Monday of the week the actual collection date lies in.

Currently, we seem to be unable to specify such ambiguity within augur refine even though treetime supports arbitrarily constrained date ranges.

The only workaround I can think of right now is to make the date ambiguous, but that throws away information and also doesn't work when the week crosses a month boundary.

The issue has become particularly noticeable when making BA.2.86 trees, where Denmark has provided ~30% of global sequences so far. To remove bias, I could add 3 or 4 days to the dates, but it would be nice if refine just accepted ranges as such.
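
For illustration, a Monday-binned date could be expanded into the week it stands for and written as a `YYYY-MM-DD/YYYY-MM-DD` range. This sketch assumes the week runs Monday through Sunday; the function name `week_interval_from_monday` is hypothetical.

```python
import datetime


def week_interval_from_monday(binned):
    """Expand a Monday-binned date into a 'start/end' range for that week.

    Hypothetical helper; assumes weeks run Monday through Sunday.
    """
    monday = datetime.date.fromisoformat(binned)
    if monday.weekday() != 0:  # weekday() returns 0 for Monday
        raise ValueError(f"{binned} is not a Monday")
    sunday = monday + datetime.timedelta(days=6)
    return f"{monday.isoformat()}/{sunday.isoformat()}"


# A week that crosses a month boundary, which 2023-09-XX cannot represent.
print(week_interval_from_monday("2023-09-25"))  # 2023-09-25/2023-10-01
```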

Description

Make refine support arbitrary min/max input dates.

Possible solution

The simplest way to implement this would be to accept the min/max date format that's natively supported by treetime: [min:max] as in [2022.3452:2022.3649] (opening bracket, min date as year float, colon, max date as year float, closing bracket)
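
A minimal sketch of producing that bracketed string from two calendar dates follows. The year-float convention used here (days elapsed divided by days in the year) is an assumption and may differ slightly from TreeTime's own numeric date conversion; `year_float` and `treetime_range` are hypothetical names.

```python
import datetime


def year_float(d):
    """Convert a date to a fractional year.

    Assumption: fraction = days elapsed since 1 January / days in that year.
    TreeTime's own conversion may differ in the last decimals.
    """
    start = datetime.date(d.year, 1, 1)
    next_start = datetime.date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (next_start - start).days


def treetime_range(start, end):
    """Format two dates as '[min:max]' with year floats to four decimals."""
    if start > end:
        raise ValueError("start must not be after end")
    return f"[{year_float(start):.4f}:{year_float(end):.4f}]"


print(treetime_range(datetime.date(2023, 9, 25), datetime.date(2023, 10, 1)))
```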

A neater solution might be to allow min and max dates to be specified through two columns: --min-date and --max-date.

This change could potentially be made across all date handling: it would be more general than our current x'ing strategy, e.g. 2021-10-XX for ambiguous date of month. That'd be a lot of work though.

Metadata

Labels

  • enhancement: New feature or request
  • priority: high: To be resolved before other issues
