Draft
2 changes: 1 addition & 1 deletion augur/filter/_run.py
@@ -402,7 +402,7 @@ def run(args):
strains_file = None
if args.output_strains:
strains_file = args.output_strains
-elif args.output_sequences:
+elif args.output_sequences or args.output_metadata:
strains_file = NamedTemporaryFile(delete=False).name

if strains_file is not None:
2 changes: 1 addition & 1 deletion augur/filter/io.py
@@ -108,7 +108,7 @@ def write_output_metadata(input_filename: str, id_column: str, output_filename:

command = f"""
{augur()} read-file {shquote(input_filename)} |
-{tsv_join} -H --filter-file {ids_file} --key-fields {id_column} |
+{tsv_join} -H --filter-file <(printf "%s\n" {shquote(id_column)}; cat {shquote(ids_file)}) --key-fields {shquote(id_column)} |
{augur()} write-file {shquote(output_filename)}
"""
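
For readers unfamiliar with this step, here is a rough Python sketch of what the filtering stage in the command above appears to do: keep the header line plus every row whose ID column value is in the list of IDs, passing each line through untouched. This is an illustration only, not augur's or tsv-join's implementation; `filter_rows_by_id` is a hypothetical name, and splitting on raw tabs (with no quote handling) is an assumption made to mirror a plain TSV tool.

```python
def filter_rows_by_id(lines, ids, id_column):
    """Keep the header plus every row whose id_column value is in ids.

    Hypothetical helper for illustration. Lines are split on raw tabs and
    returned unmodified, so any quote characters pass through as-is.
    """
    header, *rows = lines
    key_index = header.split("\t").index(id_column)
    wanted = set(ids)
    return [header] + [row for row in rows if row.split("\t")[key_index] in wanted]


if __name__ == "__main__":
    metadata = ['strain\t"col1"', "SEQ_1\ta", "SEQ_2\tb"]
    for line in filter_rows_by_id(metadata, ["SEQ_1"], "strain"):
        print(line)
```

Because the lines are never parsed as CSV, the quoted header comes out exactly as it went in, which is consistent with the updated test expectations in this PR.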

15 changes: 3 additions & 12 deletions tests/functional/filter/cram/filter-output-metadata-header.t
@@ -2,12 +2,7 @@ Setup

$ source "$TESTDIR"/_setup.sh

-Since Pandas's read_csv() and to_csv() are used with a double-quote character as
-the default quotechar, any column names with that character may be altered.
-
-Quoted columns containing the tab delimiter are left unchanged.
-
-# FIXME: tsv-join has different behavior here. Test both?
+Quoting is unchanged regardless of placement.

$ cat >metadata.tsv <<~~
> strain "col 1"
@@ -21,8 +16,6 @@ Quoted columns containing the tab delimiter are left unchanged.
$ head -n 1 filtered_metadata.tsv
strain "col 1"

-Quoted columns without the tab delimiter are stripped of the quotes.
-
$ cat >metadata.tsv <<~~
> strain "col1"
> SEQ_1 a
@@ -33,9 +26,7 @@ Quoted columns without the tab delimiter are stripped of the quotes.
> --output-metadata filtered_metadata.tsv 2>/dev/null

$ head -n 1 filtered_metadata.tsv
-strain col1
-
-Any other columns with quotes are quoted, and pre-existing quotes are escsaped by doubling up.
+strain "col1"

$ cat >metadata.tsv <<~~
> strain col"1 col2"
@@ -47,4 +38,4 @@ Any other columns with quotes are quoted, and pre-existing quotes are escsaped b
> --output-metadata filtered_metadata.tsv 2>/dev/null

$ head -n 1 filtered_metadata.tsv
-strain "col""1" "col2"""
+strain col"1 col2"
Thanks for linking to this draft PR in yesterday's lab meeting!

Seeing this change reminded me that it was implemented before the discussions around consistent TSV formats in #1566. I think we'd want to keep the consistent CSV-like quoting here. I'm not sure that wrapping the tsv-utils calls with csv2tsv and csvtk fix-quotes is the correct move, though, as I suspect they would slow things down.
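
To make "CSV-like quoting" concrete, here is a minimal sketch using Python's standard `csv` module as a stand-in for the double-quote handling that the removed test comments attribute to Pandas's `read_csv()` and `to_csv()`. It is not augur's code; `roundtrip_header` is a hypothetical helper, and the choice of `QUOTE_MINIMAL` is an assumption.

```python
import csv
import io


def roundtrip_header(line: str) -> str:
    """Parse one TSV header line CSV-style, then write it back out.

    Hypothetical helper for illustration: a double quote is treated as the
    quote character on both read and write, with minimal quoting on output.
    """
    fields = next(csv.reader([line], delimiter="\t", quotechar='"'))
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter="\t",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerow(fields)
    return out.getvalue().rstrip("\n")


if __name__ == "__main__":
    # Quotes around a plain column name are dropped on the round trip.
    print(roundtrip_header('strain\t"col1"'))
    # Stray quotes are escaped by doubling, and the field is quoted.
    print(roundtrip_header('strain\tcol"1\tcol2"'))
```

Under these assumptions the round trip reproduces the header lines that the tests expected before this change, whereas a plain tab-splitting tool leaves the header as written.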
