Commit fca09c1

Merge pull request #1797: filter: Update notes around --sequence-index
2 parents 7c5fa84 + 33a6ef2 commit fca09c1

7 files changed: 10 additions, 19 deletions

CHANGES.md

Lines changed: 5 additions & 0 deletions
@@ -2,6 +2,11 @@

 ## __NEXT__

+### Bug fixes
+
+* filter: Removed the note that appeared in output when running with `--sequences` and without `--sequence-index`. The help text of both options has been updated to clarify the relationship between the two. [#1797][] (@victorlin)
+
+[#1797]: https://github.com/nextstrain/augur/pull/1797

 ## 30.0.0 (15 April 2025)

augur/filter/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -18,8 +18,8 @@ def register_arguments(parser):
     """
     input_group = parser.add_argument_group("inputs", "metadata and sequences to be filtered")
     input_group.add_argument('--metadata', required=True, metavar="FILE", help="sequence metadata")
-    input_group.add_argument('--sequences', '-s', help="sequences in FASTA or VCF format")
-    input_group.add_argument('--sequence-index', help="sequence composition report generated by augur index. If not provided, an index will be created on the fly.")
+    input_group.add_argument('--sequences', '-s', help="sequences in FASTA or VCF format. For large inputs, consider using --sequence-index in addition to this option.")
+    input_group.add_argument('--sequence-index', help="sequence composition report generated by augur index. If not provided, an index will be created on the fly. This should be generated from the same file as --sequences.")
     input_group.add_argument('--metadata-chunk-size', type=int, default=100000, help="maximum number of metadata records to read into memory at a time. Increasing this number can speed up filtering at the cost of more memory used.")
     input_group.add_argument('--metadata-id-columns', default=DEFAULT_ID_COLUMNS, nargs="+", action=ExtendOverwriteDefault, help="names of possible metadata columns containing identifier information, ordered by priority. Only one ID column will be inferred.")
     input_group.add_argument('--metadata-delimiters', default=DEFAULT_DELIMITERS, nargs="+", action=ExtendOverwriteDefault, help="delimiters to accept when reading a metadata file. Only one delimiter will be inferred.")
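
The relationship the new help text describes can be sketched with a stand-alone parser. This is an illustration only: `build_parser` and `needs_on_the_fly_index` are hypothetical names, the `filter-sketch` program name is made up, and the condition is a simplification, since the diff does not show the full logic that sets `build_sequence_index` in `_run.py`.

```python
import argparse


def build_parser():
    # Hypothetical stand-alone parser; in Augur these options are registered
    # on the filter subcommand's parser inside register_arguments().
    parser = argparse.ArgumentParser(prog="filter-sketch")
    input_group = parser.add_argument_group("inputs", "metadata and sequences to be filtered")
    input_group.add_argument("--metadata", required=True, metavar="FILE", help="sequence metadata")
    input_group.add_argument(
        "--sequences", "-s",
        help="sequences in FASTA or VCF format. For large inputs, consider "
             "using --sequence-index in addition to this option.",
    )
    input_group.add_argument(
        "--sequence-index",
        help="sequence composition report generated by augur index. If not "
             "provided, an index will be created on the fly. This should be "
             "generated from the same file as --sequences.",
    )
    return parser


def needs_on_the_fly_index(args):
    # Assumed simplification: an index is built only when sequences are
    # given and no precomputed index accompanies them.
    return args.sequences is not None and args.sequence_index is None
```

With this sketch, passing `--sequences` alone yields `True`, while adding `--sequence-index` or omitting `--sequences` yields `False`.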

augur/filter/_run.py

Lines changed: 3 additions & 9 deletions
@@ -46,18 +46,12 @@ def run(args):
         build_sequence_index = True

     if build_sequence_index:
-        # Generate the sequence index on the fly, for backwards compatibility
-        # with older workflows that don't generate the index ahead of time.
-        # Create a temporary index using a random filename to avoid collisions
-        # between multiple filter commands.
+        # Generate the sequence index on the fly for workflows that don't do
+        # this separately. Create a temporary index using a random filename to
+        # avoid collisions between multiple filter commands.
         with NamedTemporaryFile(delete=False) as sequence_index_file:
             sequence_index_path = sequence_index_file.name

-        print_err(
-            "Note: You did not provide a sequence index, so Augur will generate one.",
-            "You can generate your own index ahead of time with `augur index` and pass it with `augur filter --sequence-index`."
-        )
-
         if is_vcf:
             index_vcf(args.sequences, sequence_index_path)
         else:
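
The temporary-index pattern in this hunk can be tried in isolation with the sketch below. It reserves a uniquely named file with `NamedTemporaryFile(delete=False)`, closes it, and then lets an indexer write to that path. `write_toy_index` and `index_on_the_fly` are hypothetical stand-ins rather than Augur functions, the name-and-length index format is invented for illustration, and the cleanup in `finally` is an assumption, since the diff does not show how the real code removes the file.

```python
import os
from tempfile import NamedTemporaryFile


def write_toy_index(fasta_path, index_path):
    """Hypothetical indexer: writes one 'name<TAB>length' row per FASTA record."""
    lengths = {}
    name = None
    with open(fasta_path) as fasta:
        for line in fasta:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].strip()
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    with open(index_path, "w") as index:
        for strain, length in lengths.items():
            index.write(f"{strain}\t{length}\n")


def index_on_the_fly(fasta_path):
    # Reserve a randomly named file so concurrent runs do not collide, then
    # leave the with-block so the file is closed and can be reopened by path.
    with NamedTemporaryFile(delete=False) as index_file:
        index_path = index_file.name
    try:
        write_toy_index(fasta_path, index_path)
        with open(index_path) as index:
            return index.read()
    finally:
        # With delete=False the caller is responsible for removing the file.
        os.unlink(index_path)
```

Because `delete=False` is used, the path stays valid after the `with` block exits, which is what lets a separate function write to it.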

tests/functional/filter/cram/filter-deprecated-options.t

Lines changed: 0 additions & 2 deletions
@@ -26,7 +26,6 @@ Create files
   > --sequences sequences.fasta \
   > --output filtered.fasta
   WARNING: --output is deprecated. Use --output-sequences instead.
-  Note: You did not provide a sequence index, so Augur will generate one. You can generate your own index ahead of time with `augur index` and pass it with `augur filter --sequence-index`.
   1 strain was dropped during filtering
   1 had no metadata
   2 strains passed all filters
@@ -44,7 +43,6 @@ Create files
   > --sequences sequences.fasta \
   > -o filtered.fasta
   WARNING: -o is deprecated. Use --output-sequences instead.
-  Note: You did not provide a sequence index, so Augur will generate one. You can generate your own index ahead of time with `augur index` and pass it with `augur filter --sequence-index`.
   1 strain was dropped during filtering
   1 had no metadata
   2 strains passed all filters

tests/functional/filter/cram/filter-duplicates-error.t

Lines changed: 0 additions & 2 deletions
@@ -67,7 +67,6 @@ Error on duplicates in sequences.
   > --metadata metadata.tsv \
   > --sequences sequences.fasta \
   > --output-sequences sequences-filtered.fasta
-  Note: You did not provide a sequence index, so Augur will generate one. You can generate your own index ahead of time with `augur index` and pass it with `augur filter --sequence-index`.
   ERROR: The following strains are duplicated in 'sequences.fasta':
   a
   c
@@ -79,7 +78,6 @@ Error even if the corresponding output is not used.
   > --metadata metadata.tsv \
   > --sequences sequences.fasta \
   > --output-strains filtered.txt
-  Note: You did not provide a sequence index, so Augur will generate one. You can generate your own index ahead of time with `augur index` and pass it with `augur filter --sequence-index`.
   ERROR: The following strains are duplicated in 'sequences.fasta':
   a
   c

tests/functional/filter/cram/filter-mismatched-sequences-error.t

Lines changed: 0 additions & 3 deletions
@@ -12,7 +12,6 @@ This should produce no results because the intersection of metadata and sequence
   > --min-length 4 \
   > --max-date 2020-01-30 \
   > --output-strains filtered_strains.txt > /dev/null
-  Note: You did not provide a sequence index, so Augur will generate one. You can generate your own index ahead of time with `augur index` and pass it with `augur filter --sequence-index`.
   13 strains were dropped during filtering
   1 had no metadata
   12 had no sequence data
@@ -29,7 +28,6 @@ Repeat with sequence and strain outputs. We should get the same results.
   > --max-date 2020-01-30 \
   > --output-strains filtered_strains.txt \
   > --output-sequences filtered.fasta > /dev/null
-  Note: You did not provide a sequence index, so Augur will generate one. You can generate your own index ahead of time with `augur index` and pass it with `augur filter --sequence-index`.
   13 strains were dropped during filtering
   1 had no metadata
   12 had no sequence data
@@ -48,7 +46,6 @@ Since we expect metadata to be filtered by presence of strains in input sequence
   > --sequences dummy.fasta \
   > --metadata "$TESTDIR/../data/metadata.tsv" \
   > --output-strains filtered_strains.txt > /dev/null
-  Note: You did not provide a sequence index, so Augur will generate one. You can generate your own index ahead of time with `augur index` and pass it with `augur filter --sequence-index`.
   13 strains were dropped during filtering
   1 had no metadata
   12 had no sequence data

tests/functional/filter/cram/filter-sequences-vcf.t

Lines changed: 0 additions & 1 deletion
@@ -10,7 +10,6 @@ Filter TB strains from VCF and save as a list of filtered strains.
   > --min-date 2012 \
   > --output-sequences filtered.vcf \
   > --output-strains filtered_strains.txt > /dev/null
-  Note: You did not provide a sequence index, so Augur will generate one. You can generate your own index ahead of time with `augur index` and pass it with `augur filter --sequence-index`.
   162 strains were dropped during filtering
   155 had no sequence data
   7 were dropped because they were earlier than 2012.0 or missing a date
