Commit 4b5ad2a

Merge pull request #1565 from nextstrain/curate-internal-quotes
Fix curate internal quotes take 2
2 parents 2890452 + 3f94f3e commit 4b5ad2a

4 files changed: +13 -7 lines changed

CHANGES.md

Lines changed: 7 additions & 0 deletions
@@ -2,6 +2,13 @@
 
 ## __NEXT__
 
+### Features
+
+- curate: change output metadata to [RFC 4180 CSV-like TSVs][] to match the TSV format output by other Augur subcommands and the Nextstrain ecosystem as discussed in [#1566][]. [#1565][] (@joverlee521)
+
+[#1565]: https://github.com/nextstrain/augur/pull/1565
+[#1566]: https://github.com/nextstrain/augur/issues/1566
+[RFC 4180 CSV-like TSVs]: https://datatracker.ietf.org/doc/html/rfc4180
 
 ## 26.1.0 (12 November 2024)
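
As a rough illustration of what "RFC 4180 CSV-like" quoting means for a tab-delimited file, the sketch below uses Python's standard `csv` module with its default (minimal) quoting. The helper name `tsv_line` and the sample values are made up for illustration and are not part of Augur.

```python
import csv
import io


def tsv_line(fields):
    """Serialize one row as a TSV line using csv's default minimal quoting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    return buffer.getvalue()


# Plain values are written unchanged.
print(repr(tsv_line(["SEQ_T", "USA"])))
# A value containing a double quote is wrapped in quotes and its quotes are doubled.
print(repr(tsv_line(["SEQ_A", '"USA"'])))
# A value containing the delimiter (a tab) is wrapped in quotes as well.
print(repr(tsv_line(["SEQ_B", "North\tAmerica"])))
```

Only fields that need protection are quoted, so metadata without quotes or embedded tabs should look the same as before.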

augur/io/metadata.py

Lines changed: 2 additions & 3 deletions
@@ -273,7 +273,8 @@ def visible_worksheet(s: calamine.SheetMetadata) -> bool:
     # change in a future Python version.
     raise InvalidDelimiter from error
 
-metadata_reader = csv.DictReader(handle, dialect=dialect)
+# Only use the dialect delimiter and keep all other default format params
+metadata_reader = csv.DictReader(handle, delimiter=dialect.delimiter)
 
 columns, records = metadata_reader.fieldnames, iter(metadata_reader)
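
To see why passing only the delimiter can matter, here is a minimal sketch in which a detected dialect carries a format parameter that differs from the `csv` defaults. The class `HypotheticalSniffedDialect` is an assumption standing in for whatever dialect object the real code obtains; its values are illustrative and not taken from Augur.

```python
import csv
import io


class HypotheticalSniffedDialect(csv.Dialect):
    """Stand-in for a detected dialect; every value here is an assumption."""
    delimiter = "\t"
    quotechar = '"'
    doublequote = False  # differs from the csv default (True)
    skipinitialspace = False
    lineterminator = "\r\n"
    quoting = csv.QUOTE_MINIMAL


TSV = (
    "strain\tsubmitting_lab\n"
    'sequence_A\t"SRC VB ""Vector"", Molecular Biology of Genomes"\n'
)
EXPECTED = 'SRC VB "Vector", Molecular Biology of Genomes'


def read_lab(**reader_kwargs):
    """Return the submitting_lab value of the first record."""
    reader = csv.DictReader(io.StringIO(TSV), **reader_kwargs)
    return next(reader)["submitting_lab"]


dialect = HypotheticalSniffedDialect
with_full_dialect = read_lab(dialect=dialect)
with_delimiter_only = read_lab(delimiter=dialect.delimiter)

# Delimiter only: the default quoting rules decode the doubled quotes.
print(with_delimiter_only == EXPECTED)
# Full dialect: doublequote=False changes how the doubled quotes are parsed.
print(with_full_dialect == EXPECTED)
```

With the delimiter alone, every other format parameter falls back to the module defaults, which are the same defaults the writer uses once its quoting overrides are removed.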

@@ -549,8 +550,6 @@ def write_records_to_tsv(records, output_file):
     extrasaction='ignore',
     delimiter='\t',
     lineterminator='\n',
-    quoting=csv.QUOTE_NONE,
-    quotechar=None,
 )
 tsv_writer.writeheader()
 tsv_writer.writerow(first_record)
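
The effect of dropping the two writer parameters can be sketched with a standalone `csv.DictWriter`. The helper `to_tsv` and the input records are assumptions chosen to mirror the expected strings in `tests/io/test_metadata.py`; this is not Augur's `write_records_to_tsv`.

```python
import csv
import io

# Assumed input: one country value carries literal double quotes.
RECORDS = [
    {"strain": "SEQ_A", "country": '"USA"', "date": "2020-10-01"},
    {"strain": "SEQ_T", "country": "USA", "date": "2020-10-02"},
]


def to_tsv(records, **format_params):
    """Write records to a TSV string, forwarding extra csv format parameters."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        list(records[0]),
        extrasaction="ignore",
        delimiter="\t",
        lineterminator="\n",
        **format_params,
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Quoting disabled: the literal quotes are written as-is.
before = to_tsv(RECORDS, quoting=csv.QUOTE_NONE, quotechar=None)
# csv defaults: the field is wrapped in quotes and its quotes are doubled.
after = to_tsv(RECORDS)

print(before)
print(after)
```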

tests/functional/curate/cram/metadata-output-with-internal-quotes.t

Lines changed: 3 additions & 3 deletions
@@ -12,18 +12,18 @@ Create NDJSON with internal quotes
   > ~~
 
 Test passthru with output to TSV.
-This should not add any quotes around the field with internal quotes.
+This should add double quotes around the internal quotes to match CSV-like quoting.
 
   $ cat records.ndjson \
   > | ${AUGUR} curate passthru \
   > --output-metadata output-metadata.tsv
 
   $ cat output-metadata.tsv
   strain\tsubmitting_lab (esc)
-  sequence_A\tSRC VB "Vector", Molecular Biology of Genomes (esc)
+  sequence_A\t"SRC VB ""Vector"", Molecular Biology of Genomes" (esc)
 
 Run the output TSV through augur curate passthru again.
-The new output should still be identical to the first output.
+The new output should still be identical to the first output because it is already double quoted.
 
   $ ${AUGUR} curate passthru \
   > --metadata output-metadata.tsv \
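
The round trip this test exercises can be approximated outside of Augur with the standard `csv` module. The `passthru` function below is a hypothetical stand-in for `augur curate passthru`, and the second input (a field that starts with a quote) is an extra example not taken from the test.

```python
import csv
import io


def passthru(tsv_text, **writer_params):
    """Read a TSV string with default quoting rules and write it back out."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, reader.fieldnames, delimiter="\t", lineterminator="\n", **writer_params
    )
    writer.writeheader()
    writer.writerows(reader)
    return buffer.getvalue()


quoted = (
    "strain\tsubmitting_lab\n"
    'sequence_A\t"SRC VB ""Vector"", Molecular Biology of Genomes"\n'
)
unquoted = 'strain\tcountry\nSEQ_A\t"USA"\n'

# Default quoting on both sides: a second pass reproduces the input exactly.
print(passthru(quoted) == quoted)
# Writer quoting disabled: the reader strips the unescaped quotes, so the
# rewritten file no longer matches its input.
print(passthru(unquoted, quoting=csv.QUOTE_NONE, quotechar=None) == unquoted)
```

When the reader and writer agree on quoting rules, repeated passes are stable, which is the property the second half of the test checks.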

tests/io/test_metadata.py

Lines changed: 1 addition & 1 deletion
@@ -458,7 +458,7 @@ def output_records():
 def expected_output_tsv():
     return (
         "strain\tcountry\tdate\n"
-        'SEQ_A\t"USA"\t2020-10-01\n'
+        'SEQ_A\t"""USA"""\t2020-10-01\n'
         "SEQ_T\tUSA\t2020-10-02\n"
     )
