Commit 52c8b55

filter: Keep index as a column for filtering
This allows column-based filters to use the id column.

Co-authored-by: Cornelius Roemer <[email protected]>
1 parent 5bb4b6a commit 52c8b55

2 files changed

+33
-0
lines changed

augur/filter/_run.py

Lines changed: 1 addition & 0 deletions
@@ -200,6 +200,7 @@ def run(args):
         delimiters=[metadata_object.delimiter],
         columns=useful_metadata_columns,
         id_columns=[metadata_object.id_column],
+        keep_id_as_column=True,
         chunk_size=args.metadata_chunk_size,
         dtype="string",
     )
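
The effect of keeping the id as a column can be illustrated with plain pandas. This is only a sketch under assumptions: it does not call augur, the toy table copies the values from this commit's test, and `set_index(..., drop=False)` stands in for whatever `keep_id_as_column=True` does internally, which this diff does not show.

```python
import pandas as pd

# Toy metadata mirroring the commit's test data; not read through augur.
metadata = pd.DataFrame(
    {
        "strain": ["strain1", "strain2", "strain3", "strain4", "strain4"],
        "accession": ["acc1", "acc2", "acc3", "acc4", "acc5"],
        "clade": ["A", "A", "B", "B", "C"],
    },
    dtype="string",
)

# Default pandas behaviour: the id column moves into the index
# and is no longer available as a regular column.
index_only = metadata.set_index("accession")

# drop=False keeps the id values as a column in addition to the index
# (an assumed analogue of keep_id_as_column=True).
index_and_column = metadata.set_index("accession", drop=False)

print("accession" in index_only.columns)        # False
print("accession" in index_and_column.columns)  # True
```

With only the index available, column lookups such as `index_only["accession"]` raise a `KeyError`, which is the kind of failure a column-based filter on the id would hit.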
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+Setup
+
+  $ source "$TESTDIR"/_setup.sh
+
+The id column can be used in --query, --exclude-where, and --include-where.
+
+  $ cat > metadata.tsv <<~~
+  > strain	accession	clade
+  > strain1	acc1	A
+  > strain2	acc2	A
+  > strain3	acc3	B
+  > strain4	acc4	B
+  > strain4	acc5	C
+  > ~~
+
+  $ ${AUGUR} filter \
+  >   --metadata metadata.tsv \
+  >   --metadata-id-columns accession \
+  >   --query "accession != 'acc5'" \
+  >   --exclude-where accession=acc2 clade=B \
+  >   --include-where accession=acc3 \
+  >   --output-strains filtered.txt
+  3 strains were dropped during filtering
+  	1 was dropped because of 'accession=acc2'
+  	2 were dropped because of 'clade=B'
+  	1 was filtered out by the query: "accession != 'acc5'"
+  	1 was added back because of 'accession=acc3'
+  2 strains passed all filters
+
+  $ sort filtered.txt
+  acc1
+  acc3
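
The expected output of the test can be cross-checked with a small pandas sketch. It is not augur's implementation: the way the three filters are combined here (query and exclusions first, then forced inclusion) is an assumption inferred from the report above, and all variable names are hypothetical.

```python
import pandas as pd

# Same rows as metadata.tsv in the test, with the id kept as a column.
metadata = pd.DataFrame(
    {
        "strain": ["strain1", "strain2", "strain3", "strain4", "strain4"],
        "accession": ["acc1", "acc2", "acc3", "acc4", "acc5"],
        "clade": ["A", "A", "B", "B", "C"],
    }
).set_index("accession", drop=False)

# --query "accession != 'acc5'": rows that satisfy the query expression.
query_hits = metadata.query("accession != 'acc5'", engine="python").index
passes_query = metadata.index.isin(query_hits)

# --exclude-where accession=acc2 clade=B: rows matching either condition.
excluded = (metadata["accession"] == "acc2") | (metadata["clade"] == "B")

# --include-where accession=acc3: rows forced back in regardless of other filters.
included = metadata["accession"] == "acc3"

# Assumed combination: pass the query and the exclusions, or be force-included.
keep = (passes_query & ~excluded) | included
passed = sorted(metadata.loc[keep, "accession"])
print(passed)  # ['acc1', 'acc3']
```

Every step looks up `accession` as a column, so the sketch only works because the id was kept alongside the index.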
