XXXX-XX-XX passes any date filter #1894

@victorlin

Scope

augur filter and augur subsample

Current Behavior

XXXX-XX-XX passes any --min-date or --max-date filter because it is treated as an ambiguous date and evaluates to the range [0001-01-01, present], which overlaps any bound.
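
To illustrate why such a wide range slips through, here is a minimal sketch. It assumes a date range is kept whenever it overlaps the requested bounds; `passes_date_filter` and `fully_masked` are hypothetical names for illustration, not Augur's actual code.

```python
import datetime

def passes_date_filter(date_range, min_date=None, max_date=None):
    """Hypothetical overlap check: keep a (earliest, latest) range if it
    could contain a date satisfying the given bounds."""
    earliest, latest = date_range
    # Dropped only if the whole range lies before --min-date ...
    if min_date is not None and latest < min_date:
        return False
    # ... or the whole range lies after --max-date.
    if max_date is not None and earliest > max_date:
        return False
    return True

# XXXX-XX-XX resolves to [0001-01-01, present], per the issue description.
fully_masked = (datetime.date(1, 1, 1), datetime.date.today())
bound = datetime.date(2025, 1, 1)

print(passes_date_filter(fully_masked, min_date=bound))
print(passes_date_filter(fully_masked, max_date=bound))
```

Under this assumption, a range that starts at year 1 and ends today can never fall entirely outside a bound, so both calls report the record as passing.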

Expected behavior

XXXX-XX-XX should behave as if the value were an empty string, i.e. the record should be dropped by any date filter.

How to reproduce

Cram test:

Create a metadata TSV file with two date values that should be functionally equivalent.

  $ cat >metadata.tsv <<~~
  > strain	date
  > SEQ_1	
  > SEQ_2	XXXX-XX-XX
  > ~~

BUG: SEQ_2 passes for --min-date and --max-date.

  $ ${AUGUR} filter \
  >  --metadata metadata.tsv \
  >  --min-date 2025 \
  >  --output-strains filtered_strains.txt 2>/dev/null
  $ cat filtered_strains.txt
  SEQ_2

  $ ${AUGUR} filter \
  >  --metadata metadata.tsv \
  >  --max-date 2025 \
  >  --output-strains filtered_strains.txt 2>/dev/null
  $ cat filtered_strains.txt
  SEQ_2

Possible solution

Write a specific regular expression for this special value and handle it before handling the ambiguous date pattern.

--- a/augur/dates/__init__.py
+++ b/augur/dates/__init__.py
@@ -193,10 +193,19 @@ Matches an Augur-style ambiguous date with 'XX' used to mask unknown parts of th
 Note that this can support any date format, not just YYYY-MM-DD.
 """
 
+RE_AUGUR_MISSING_DATE = re.compile(r'^XXXX-XX-XX$')
+"""
+Matches an Augur-style ambiguous date with all parts masked.
+This only supports YYYY-MM-DD format.
+"""
+
 @cache
 def get_numerical_date_from_value(value, fmt, min_max_year=None) -> Union[float, Tuple[float, float], None]:
     value = str(value)
 
+    if RE_AUGUR_MISSING_DATE.match(value):
+        return None
+
     # 1. Check if value is an exact date in the specified format (fmt).
 
     try:

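The ordering matters: the fully masked value has to be checked before the general ambiguous-date pattern, which would otherwise match it too. A standalone sketch of that ordering follows. The `classify_date` helper and the simplified `RE_EXACT` and `RE_AMBIGUOUS` patterns are hypothetical and only cover YYYY-MM-DD; they are not the patterns used in augur/dates/__init__.py.

```python
import re

RE_MISSING = re.compile(r'^XXXX-XX-XX$')
# Simplified, hypothetical stand-ins for the real patterns.
RE_EXACT = re.compile(r'^\d{4}-\d{2}-\d{2}$')
RE_AMBIGUOUS = re.compile(r'^[\dX]{4}-[\dX]{2}-[\dX]{2}$')

def classify_date(value):
    """Classify a date string; the fully masked value is handled first."""
    value = str(value)
    # Empty and fully masked values carry no date information at all.
    if value == '' or RE_MISSING.match(value):
        return 'missing'
    if RE_EXACT.match(value):
        return 'exact'
    # Only partially masked values reach the ambiguous-date branch.
    if RE_AMBIGUOUS.match(value):
        return 'ambiguous'
    return 'other'

for example in ['', 'XXXX-XX-XX', '2025-XX-XX', '2025-01-15']:
    print(repr(example), classify_date(example))
```

With the check in this position, partially masked values such as 2025-XX-XX still take the ambiguous-date path, and only XXXX-XX-XX is treated like an empty string.
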
Additional context

Noticed in nextstrain/ebola@b27d1ba

Metadata

Labels: bug