Labels: bug (Something isn't working)
Description
Scope
augur filter and augur subsample
Current Behavior
A date value of XXXX-XX-XX passes any --min-date or --max-date filter because it is resolved to the range [0001-01-01, present], which overlaps every possible threshold.
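To illustrate why a fully masked date slips through both filters, here is a minimal standalone sketch. It does not use Augur's code: `resolve_bounds`, `passes_min_date`, and `passes_max_date` are hypothetical helpers, and the overlap rule (keep a record if any date in its range could satisfy the threshold) is an assumption that matches the behavior reported here.

```python
import calendar
import datetime


def resolve_bounds(value, today):
    """Resolve a YYYY-MM-DD value with fully masked parts to (earliest, latest).

    Hypothetical helper: each part is assumed to be either all digits or all X.
    """
    year, month, day = value.split("-")
    lo_year = 1 if year == "XXXX" else int(year)
    hi_year = today.year if year == "XXXX" else int(year)
    lo_month = 1 if month == "XX" else int(month)
    hi_month = 12 if month == "XX" else int(month)
    lo_day = 1 if day == "XX" else int(day)
    # A masked day can be as late as the last day of the latest month.
    hi_day = calendar.monthrange(hi_year, hi_month)[1] if day == "XX" else int(day)
    earliest = datetime.date(lo_year, lo_month, lo_day)
    # Dates in the future are not possible, so clamp the upper bound to today.
    latest = min(datetime.date(hi_year, hi_month, hi_day), today)
    return earliest, latest


def passes_min_date(bounds, min_date):
    # Assumed rule: keep if the latest possible date is on or after min_date.
    return bounds[1] >= min_date


def passes_max_date(bounds, max_date):
    # Assumed rule: keep if the earliest possible date is on or before max_date.
    return bounds[0] <= max_date


today = datetime.date(2025, 6, 1)  # fixed "present" for a reproducible example
bounds = resolve_bounds("XXXX-XX-XX", today)
threshold = datetime.date(2025, 1, 1)
print(bounds, passes_min_date(bounds, threshold), passes_max_date(bounds, threshold))
```

Under these assumptions the range spans from 0001-01-01 to the present, so no threshold can exclude it.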
Expected behavior
XXXX-XX-XX should be treated the same as an empty string, i.e. the record should be dropped by any date filter.
How to reproduce
Cram test:
Create metadata TSV file with two date values that should be functionally equivalent.
$ cat >metadata.tsv <<~~
> strain date
> SEQ_1
> SEQ_2 XXXX-XX-XX
> ~~
BUG: SEQ_2 passes for --min-date and --max-date.
$ ${AUGUR} filter \
> --metadata metadata.tsv \
> --min-date 2025 \
> --output-strains filtered_strains.txt 2>/dev/null
$ cat filtered_strains.txt
SEQ_2
$ ${AUGUR} filter \
> --metadata metadata.tsv \
> --max-date 2025 \
> --output-strains filtered_strains.txt 2>/dev/null
$ cat filtered_strains.txt
SEQ_2
Possible solution
Write a specific regular expression for this special value and handle it before handling the ambiguous date pattern.
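The order of the checks matters because a general ambiguous-date pattern would also match the fully masked value. The sketch below is standalone and only illustrates that ordering: `RE_AMBIGUOUS` is a stand-in pattern assumed for the example (not necessarily Augur's actual regular expression), and `classify` is a hypothetical helper.

```python
import re

# Stand-in for a general "contains masked parts" pattern (assumption).
RE_AMBIGUOUS = re.compile(r".*XX.*")
# Specific pattern for the fully masked value, as proposed in this issue.
RE_MISSING = re.compile(r"^XXXX-XX-XX$")


def classify(value):
    """Classify a date string, checking the specific pattern first."""
    value = str(value)
    if RE_MISSING.match(value):
        return "missing"
    # Only reached when the value is not fully masked.
    if RE_AMBIGUOUS.match(value):
        return "ambiguous"
    return "other"


for example in ["XXXX-XX-XX", "2025-XX-XX", "2025-01-01", ""]:
    print(repr(example), classify(example))
```

If the two checks were swapped, XXXX-XX-XX would be classified as ambiguous and resolved to a range, which is the behavior described in this report.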
--- a/augur/dates/__init__.py
+++ b/augur/dates/__init__.py
@@ -193,10 +193,19 @@ Matches an Augur-style ambiguous date with 'XX' used to mask unknown parts of th
Note that this can support any date format, not just YYYY-MM-DD.
"""
+RE_AUGUR_MISSING_DATE = re.compile(r'^XXXX-XX-XX$')
+"""
+Matches an Augur-style ambiguous date with all parts masked, i.e. a missing date.
+This only supports the YYYY-MM-DD format.
+"""
+
 @cache
 def get_numerical_date_from_value(value, fmt, min_max_year=None) -> Union[float, Tuple[float, float], None]:
     value = str(value)
+    if RE_AUGUR_MISSING_DATE.match(value):
+        return None
+
     # 1. Check if value is an exact date in the specified format (fmt).
     try:
Additional context
Noticed in nextstrain/ebola@b27d1ba