Merge pull request #1653 from nextstrain/format-dates-docs
Update docs for `curate format-dates`
joverlee521 authored Nov 6, 2024
2 parents efabf11 + c7e8a31 commit 29188d4
Showing 3 changed files with 20 additions and 8 deletions.
2 changes: 2 additions & 0 deletions CHANGES.md
@@ -10,8 +10,10 @@
### Bug Fixes

* index: Previously specifying a directory that does not exist in the path to `--output` would result in an incorrect error stating that the input file does not exist. It now shows the correct path responsible for the error. [#1644][] (@victorlin)
* curate format-dates: Update help docs and improve failure messages to show use of `--expected-date-formats`. [#1653][] (@joverlee521)

[#1644]: https://github.com/nextstrain/augur/issues/1644
[#1653]: https://github.com/nextstrain/augur/pull/1653
[#1656]: https://github.com/nextstrain/augur/pull/1656

## 26.0.0 (17 September 2024)
23 changes: 15 additions & 8 deletions augur/curate/format_dates.py
@@ -1,6 +1,9 @@
"""
Format date fields to ISO 8601 dates (YYYY-MM-DD), where incomplete dates
are masked with 'XX' (e.g. 2023 -> 2023-XX-XX).
Format date fields to ISO 8601 dates (YYYY-MM-DD).
If the provided ``--expected-date-formats`` represent incomplete dates then
the incomplete dates are masked with 'XX'. For example, providing
``%Y`` will allow year only dates to be formatted as ``2023-XX-XX``.
"""
import re
from datetime import datetime
@@ -30,14 +33,14 @@ def register_parser(parent_subparsers):
required = parser.add_argument_group(title="REQUIRED")
required.add_argument("--date-fields", nargs="+", action="extend",
help="List of date field names in the record that need to be standardized.")
required.add_argument("--expected-date-formats", nargs="+", action="extend",

optional = parser.add_argument_group(title="OPTIONAL")
optional.add_argument("--expected-date-formats", nargs="+", action="extend",
default=DEFAULT_EXPECTED_DATE_FORMATS,
help="Expected date formats that are currently in the provided date fields, " +
"defined by standard format codes as listed at " +
"https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes. " +
"If a date string matches multiple formats, it will be parsed as the first matched format in the provided order.")

optional = parser.add_argument_group(title="OPTIONAL")
optional.add_argument("--failure-reporting",
type=DataErrorMethod.argtype,
choices=list(DataErrorMethod),
@@ -181,6 +184,10 @@ def format_date(date_string, expected_formats):
def run(args, records):
failures = []
failure_reporting = args.failure_reporting
failure_suggestion = (
f"\nCurrent expected date formats are {args.expected_date_formats!r}. " +
"This can be updated with --expected-date-formats."
)
for index, record in enumerate(records):
record = record.copy()
record_id = index
@@ -203,7 +210,7 @@ def run(args, records):

failure_message = f"Unable to format date string {date_string!r} in field {field!r} of record {record_id!r}."
if failure_reporting is DataErrorMethod.ERROR_FIRST:
raise AugurError(failure_message)
raise AugurError(failure_message + failure_suggestion)

if failure_reporting is DataErrorMethod.WARN:
print_err(f"WARNING: {failure_message}")
@@ -221,10 +228,10 @@ def run(args, records):
'\n'.join(map(repr, failures))
)
if failure_reporting is DataErrorMethod.ERROR_ALL:
raise AugurError(failure_message)
raise AugurError(failure_message + failure_suggestion)

elif failure_reporting is DataErrorMethod.WARN:
print_err(f"WARNING: {failure_message}")
print_err(f"WARNING: {failure_message}" + failure_suggestion)

else:
raise ValueError(f"Encountered unhandled failure reporting method: {failure_reporting!r}")
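
The updated docstring and `--expected-date-formats` help text describe two behaviors: formats are tried in the order given, and components missing from the matched format are masked with `XX`. The sketch below illustrates that idea with the standard library only. It is not augur's `format_date` implementation; the helper name `mask_incomplete_date` and its directive checks are assumptions made for illustration.

```python
from datetime import datetime


def mask_incomplete_date(date_string, expected_formats):
    """Hypothetical helper (not augur's code): return an ISO 8601 date
    string, masking components that the matched format does not provide."""
    for fmt in expected_formats:
        try:
            parsed = datetime.strptime(date_string, fmt)
        except ValueError:
            # This format does not match; try the next one in order.
            continue
        # Only trust components whose directive appears in the format.
        year = f"{parsed.year:04d}" if "%Y" in fmt else "XXXX"
        month = f"{parsed.month:02d}" if "%m" in fmt else "XX"
        day = f"{parsed.day:02d}" if "%d" in fmt else "XX"
        return f"{year}-{month}-{day}"
    # No expected format matched the date string.
    return None


print(mask_incomplete_date("2023", ["%Y"]))
print(mask_incomplete_date("01/02/2023", ["%d/%m/%Y", "%m/%d/%Y"]))
print(mask_incomplete_date("2020-01", ["%Y", "%Y-%m-%dT%H:%M:%SZ"]))
```

Under these assumptions the first call yields `2023-XX-XX`, the second is parsed by the first matching format (day first), and the third matches nothing, which is the situation the new failure suggestion in `run` is meant to help with.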
3 changes: 3 additions & 0 deletions tests/functional/curate/cram/format-dates/failure-reporting.t
@@ -16,6 +16,7 @@ This is expected to fail with an error, so redirecting stdout since we don't car
> --date-fields "date" "collectionDate" "releaseDate" "updateDate" \
> --expected-date-formats "%Y" "%Y-%m-%dT%H:%M:%SZ" 1> /dev/null
ERROR: Unable to format date string '2020-01' in field 'collectionDate' of record 0.
Current expected date formats are ['%Y-%m-%d', '%Y-%m-XX', '%Y-XX-XX', 'XXXX-XX-XX', '%Y', '%Y-%m-%dT%H:%M:%SZ']. This can be updated with --expected-date-formats.
[2]

Test output with unmatched expected date formats with `ERROR_ALL` failure reporting.
@@ -29,6 +30,7 @@ This is expected to fail with an error, so redirecting stdout since we don't car
ERROR: Unable to format dates for the following (record, field, date string):
(0, 'collectionDate', '2020-01')
(0, 'releaseDate', '2020-01')
Current expected date formats are ['%Y-%m-%d', '%Y-%m-XX', '%Y-XX-XX', 'XXXX-XX-XX', '%Y', '%Y-%m-%dT%H:%M:%SZ']. This can be updated with --expected-date-formats.
[2]

Test output with unmatched expected date formats while warning on failures.
@@ -44,6 +46,7 @@ This is expected to print warnings for failures and return the masked date strin
WARNING: Unable to format dates for the following (record, field, date string):
(0, 'collectionDate', '2020-01')
(0, 'releaseDate', '2020-01')
Current expected date formats are ['%Y-%m-%d', '%Y-%m-XX', '%Y-XX-XX', 'XXXX-XX-XX', '%Y', '%Y-%m-%dT%H:%M:%SZ']. This can be updated with --expected-date-formats.
{"record": 1, "date": "2020-XX-XX", "collectionDate": "XXXX-XX-XX", "releaseDate": "XXXX-XX-XX", "updateDate": "2020-07-18"}
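
The expected-formats list printed in the messages above contains more entries than the two passed on the command line. That is consistent with how the standard library's `argparse` handles `action="extend"` together with a `default` list: user-supplied values are appended to a copy of the default. The sketch below reproduces this with plain `argparse` (Python 3.8+). The `DEFAULTS` list is an assumption inferred from the first four entries of the printed list, not copied from augur's source.

```python
import argparse

# Assumed stand-in for DEFAULT_EXPECTED_DATE_FORMATS, inferred from the
# test output above rather than taken from augur's source.
DEFAULTS = ["%Y-%m-%d", "%Y-%m-XX", "%Y-XX-XX", "XXXX-XX-XX"]

parser = argparse.ArgumentParser()
parser.add_argument(
    "--expected-date-formats",
    nargs="+",
    action="extend",
    default=DEFAULTS,
)

# User-provided formats are appended after the defaults.
args = parser.parse_args(
    ["--expected-date-formats", "%Y", "%Y-%m-%dT%H:%M:%SZ"]
)
print(args.expected_date_formats)
```

Because formats are tried in order, a date string that matches one of the defaults would be parsed by that default before any user-supplied format is considered.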

Test output with unmatched expected date formats while silencing failures.