Commit 80c06c1

Merge pull request #1160 from nextstrain/nextclade-v3
Migrate to Nextclade v3
2 parents 66aa9be + f269e69 commit 80c06c1

10 files changed: 23 additions and 196 deletions

defaults/parameters.yaml

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ files:
 mutational_fitness_distance_map: "defaults/mutational_fitness_distance_map.json"
 sites_to_mask: "defaults/sites_ignored_for_tree_topology.txt"
 
-# Define genes to translate during alignment by nextalign.
+# Define genes to translate during alignment by nextclade.
 genes: ["ORF1a", "ORF1b", "S", "ORF3a", "E", "M", "ORF6", "ORF7a", "ORF7b", "ORF8", "N", "ORF9b"]
 
 # Filter settings

docs/src/reference/change_log.md

Lines changed: 4 additions & 0 deletions
@@ -5,6 +5,10 @@ We also use this change log to document new features that maintain backward comp
 
 ## New features since last version update
 
+## v14 (23 October 2024)
+
+- 23 October 2024: Update workflow to use Nextclade v3. This includes the removal of unused mutation summary script and rules that expected Nextclade v2 outputs. Dropping the mutation summary rules removed the need for the full alignment rule `align` to produce the insertions and translations outputs, so they have been removed. The `build_align` rule no longer produces a separate `insertions.tsv` since insertions are now included in the `nextclade_qc.tsv`. [PR 1160](https://github.com/nextstrain/ncov/pull/1160)
+
 - 2 October 2024: Include a new parameter for `clade_recency` under `colors`. This parameter is used to define which clades should receive a color from the standard rainbow palette. A value of `6M` will cause clades with strains in the tree sampled within the last 6 months to be colored and earlier strains to not receive a color (and be colored in a palette of grays by Auspice). This `clade_recency` parameter is used in `builds.yaml` in `nextstrain_profiles` to color clades according for the `1m`, `2m`, `6m` and `all-time` timepoints. If `clade_recency` is not supplied then all clades will be colored. [PR 1132](https://github.com/nextstrain/ncov/pull/1132)
 
 - 30 September 2024: Use population-based weighted sampling for `nextstrain_profiles`. This requires a minimum Augur version of 25.3.0. PRs [1106](https://github.com/nextstrain/ncov/pull/1106), [1150](https://github.com/nextstrain/ncov/pull/1150), [1151](https://github.com/nextstrain/ncov/pull/1151)
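With the separate `insertions.tsv` gone, anything downstream that read insertions now has to pull them from `nextclade_qc.tsv`. Below is a minimal Python sketch, assuming the Nextclade v3 TSV identifies sequences in a `seqName` column and stores insertions in an `insertions` column as comma-separated `position:bases` entries (verify both against your own output). `parse_insertions` and `read_insertions` are hypothetical helpers, not part of the workflow.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def parse_insertions(field: str) -> List[Tuple[int, str]]:
    """Parse an assumed "position:bases,position:bases" field; empty means none."""
    insertions = []
    for entry in field.split(","):
        entry = entry.strip()
        if not entry:
            continue
        position, bases = entry.split(":", 1)
        insertions.append((int(position), bases))
    return insertions


def read_insertions(handle: TextIO) -> Dict[str, List[Tuple[int, str]]]:
    """Map each sequence name to its insertions, read from a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["seqName"]: parse_insertions(row.get("insertions") or "")
        for row in reader
    }


# Tiny in-memory table standing in for results/{build_name}/nextclade_qc.tsv.
example = "seqName\tclade\tinsertions\nA\t24A\t22204:GAGCCAGAA\nB\t24A\t\n"
print(read_insertions(io.StringIO(example)))
# {'A': [(22204, 'GAGCCAGAA')], 'B': []}
```

Positions are returned exactly as reported in the table; the sketch makes no claim about whether they are 0- or 1-based.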

docs/src/reference/workflow-config-file.rst

Lines changed: 5 additions & 5 deletions
@@ -431,7 +431,7 @@ genes
 -----
 
 - type: array
-- description: A list of genes for which ``nextalign`` should generate amino acid sequences during the alignment process. Gene names must match the names provided in the gene map from the ``annotation`` parameter.
+- description: A list of genes for which ``nextclade`` should generate amino acid sequences during the alignment process. Gene names must match the names provided in the gene map from the ``annotation`` parameter.
 - default: ``["ORF1a", "ORF1b", "S", "ORF3a", "M", "N"]``
 - used in rules: ``align``, ``build_align``, ``translate``, ``mutational_fitness``
 
@@ -513,17 +513,17 @@ alignment_reference
 ~~~~~~~~~~~~~~~~~~~
 
 - type: string
-- description: Path to a FASTA-formatted sequence to use for alignment with ``nextalign``
+- description: Path to a FASTA-formatted sequence to use for alignment with ``nextclade``
 - default: ``defaults/reference_seq.fasta``
-- used in rules: ``align``, ``proximity_score`` (subsampling), ``build_align``, ``build_mutation_summary``
+- used in rules: ``align``, ``proximity_score`` (subsampling)
 
 annotation
 ~~~~~~~~~~
 
 - type: string
-- description: Path to a GFF-formatted annotation of gene coordinates (e.g., a “gene map”) for use by ``nextalign`` and mutation summaries.
+- description: Path to a GFF-formatted annotation of gene coordinates (e.g., a “gene map”) for use by ``nextclade`` for codon-aware alignment.
 - default: ``defaults/annotation.gff``
-- used in rules: ``align``, ``build_align``, ``build_mutation_summary``
+- used in rules: ``align``
 
 outgroup
 ~~~~~~~~

nextstrain_profiles/nextstrain-gisaid-21L/builds.yaml

Lines changed: 0 additions & 1 deletion
@@ -18,7 +18,6 @@ slack_token: ~
 slack_channel: "#ncov-gisaid-updates"
 
 genes: ["ORF1a", "ORF1b", "S", "ORF3a", "E", "M", "ORF6", "ORF7a", "ORF7b", "ORF8", "N", "ORF9b"]
-use_nextalign: true
 include_hcov19_prefix: True
 
 files:

nextstrain_profiles/nextstrain-gisaid/builds.yaml

Lines changed: 0 additions & 1 deletion
@@ -17,7 +17,6 @@ slack_token: ~
 slack_channel: "#ncov-gisaid-updates"
 
 genes: ["ORF1a", "ORF1b", "S", "ORF3a", "E", "M", "ORF6", "ORF7a", "ORF7b", "ORF8", "N", "ORF9b"]
-use_nextalign: true
 include_hcov19_prefix: True
 
 files:

nextstrain_profiles/nextstrain-open/builds.yaml

Lines changed: 0 additions & 1 deletion
@@ -17,7 +17,6 @@ slack_token: ~
 slack_channel: "#ncov-genbank-updates"
 
 genes: ["ORF1a", "ORF1b", "S", "ORF3a", "E", "M", "ORF6", "ORF7a", "ORF7b", "ORF8", "N", "ORF9b"]
-use_nextalign: true
 include_hcov19_prefix: False
 
 files:

scripts/mutation_summary.py

Lines changed: 0 additions & 106 deletions
This file was deleted.

workflow/envs/nextstrain.yaml

Lines changed: 1 addition & 2 deletions
@@ -7,8 +7,7 @@ dependencies:
 - augur=22.4.0
 - epiweeks=2.1.2
 - iqtree=2.2.0.3
-- nextalign=2.14.0
-- nextclade=2.14.0
+- nextclade=3.9.0
 - pangolin=3.1.20
 - pangolearn=2022.01.20
 - python>=3.8*
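The environment now pins `nextclade=3.9.0` and drops `nextalign`, and the `prepare_nextclade` rule calls `nextclade --version` instead of `nextclade2 --version`. A check along these lines could fail fast when an older Nextclade is first on `PATH`. It is a sketch that assumes the version output contains a `MAJOR.MINOR.PATCH` number (for example `nextclade 3.9.0`); the function names and the "major version 3 or newer" policy are illustrative and not part of the workflow.

```python
import re
import shutil
import subprocess
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Return the first MAJOR.MINOR.PATCH triple found in version output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def require_nextclade_v3(executable: str = "nextclade") -> Tuple[int, int, int]:
    """Fail fast if the executable is missing or reports a major version below 3."""
    path = shutil.which(executable)
    if path is None:
        raise RuntimeError(f"{executable!r} not found on PATH")
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    # Search both streams in case the version is printed to stderr.
    version = parse_version(result.stdout + result.stderr)
    if version[0] < 3:
        raise RuntimeError(f"Nextclade v3 or newer required, found {version}")
    return version
```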

workflow/snakemake_rules/export_for_nextstrain.smk

Lines changed: 0 additions & 32 deletions
@@ -93,38 +93,6 @@ rule export_all_regions:
 """
 
 
-rule mutation_summary:
-message: "Summarizing {input.alignment}"
-input:
-alignment = rules.align.output.alignment,
-insertions = rules.align.output.insertions,
-translations = rules.align.output.translations,
-reference = config["files"]["alignment_reference"],
-genemap = config["files"]["annotation"]
-output:
-mutation_summary = "results/mutation_summary_{origin}.tsv.xz"
-log:
-"logs/mutation_summary_{origin}.txt"
-benchmark:
-"benchmarks/mutation_summary_{origin}.txt"
-params:
-outdir = "results/translations",
-basename = "seqs_{origin}",
-genes=config["genes"],
-conda: config["conda_environment"]
-shell:
-r"""
-python3 scripts/mutation_summary.py \
---alignment {input.alignment} \
---insertions {input.insertions} \
---directory {params.outdir} \
---basename {params.basename} \
---reference {input.reference} \
---genes {params.genes:q} \
---genemap {input.genemap} \
---output {output.mutation_summary} 2>&1 | tee {log}
-"""
-
 #
 # Rule for generating a per-build auspice config
 #

workflow/snakemake_rules/main_workflow.smk

Lines changed: 12 additions & 47 deletions
@@ -65,15 +65,13 @@ rule align:
 """
 input:
 sequences = lambda wildcards: _get_path_for_input("sequences", wildcards.origin),
-genemap = config["files"]["annotation"],
+# Annotation used for codon-aware nucleotide alignment
+# <https://docs.nextstrain.org/projects/nextclade/en/stable/user/algorithm/01-sequence-alignment.html>
+annotation = config["files"]["annotation"],
 reference = config["files"]["alignment_reference"]
 output:
 alignment = "results/aligned_{origin}.fasta.xz",
-insertions = "results/insertions_{origin}.tsv",
-translations = expand("results/translations/seqs_{{origin}}.gene.{gene}.fasta.xz", gene=config.get('genes', ['S']))
 params:
-output_translations = lambda w: f"results/translations/seqs_{w.origin}.gene.{{gene}}.fasta",
-output_translations_toxz = "results/translations/seqs_{origin}.gene.*.fasta",
 strain_prefixes=config["strip_strain_prefixes"],
 # Strip the compression suffix for the intermediate output from the aligner.
 uncompressed_alignment=lambda wildcards, output: Path(output.alignment).with_suffix(""),
@@ -92,15 +90,12 @@ rule align:
 --sequences {input.sequences} \
 --strip-prefixes {params.strain_prefixes:q} \
 --output /dev/stdout 2> {params.sanitize_log} \
-| nextalign run \
+| nextclade run \
 --jobs={threads} \
---reference {input.reference} \
---genemap {input.genemap} \
---output-translations {params.output_translations} \
---output-fasta {params.uncompressed_alignment} \
---output-insertions {output.insertions} > {log} 2>&1;
+--input-ref {input.reference} \
+--input-annotation {input.annotation} \
+--output-fasta {params.uncompressed_alignment} > {log} 2>&1;
 xz -2 -T {threads} {params.uncompressed_alignment};
-xz -2 -T {threads} {params.output_translations_toxz}
 """
 
 def _get_subsampling_settings(wildcards):
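In the `align` rule the sanitized sequences are still piped into the aligner, but `nextalign run --reference ... --genemap ...` becomes `nextclade run --input-ref ... --input-annotation ...`, and the translation and insertion outputs are dropped. The Python sketch below only assembles the argument list for that call so the flag mapping is explicit; `nextclade_align_args` is a hypothetical helper, and it uses only the flags that appear in the new rule.

```python
from typing import List


def nextclade_align_args(
    reference: str, annotation: str, output_fasta: str, jobs: int = 1
) -> List[str]:
    """Build argv for the Nextclade v3 call in the `align` rule (hypothetical helper).

    Sequences are expected on stdin, as in the rule's shell pipeline.
    """
    return [
        "nextclade", "run",
        f"--jobs={jobs}",
        "--input-ref", reference,          # was: nextalign --reference
        "--input-annotation", annotation,  # was: nextalign --genemap
        "--output-fasta", output_fasta,    # uncompressed; the rule runs xz afterwards
    ]


print(nextclade_align_args(
    "defaults/reference_seq.fasta", "defaults/annotation.gff",
    "results/aligned_example.fasta", jobs=4,
))
```

To actually run it, the list could be passed to `subprocess.run(..., stdin=...)` with the sanitized sequences as standard input, mirroring the shell pipe; the output path above is made up for illustration.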
@@ -466,8 +461,8 @@ rule prepare_nextclade:
 conda: config["conda_environment"]
 shell:
 r"""
-nextclade2 --version
-nextclade2 dataset get --name {params.name} --output-zip {output.nextclade_dataset}
+nextclade --version
+nextclade dataset get --name {params.name} --output-zip {output.nextclade_dataset}
 """
 
 rule build_align:
@@ -481,14 +476,13 @@ rule build_align:
 nextclade_dataset = "data/sars-cov-2-nextclade-defaults.zip",
 output:
 alignment = "results/{build_name}/aligned.fasta",
-insertions = "results/{build_name}/insertions.tsv",
 nextclade_qc = 'results/{build_name}/nextclade_qc.tsv',
 translations = expand("results/{{build_name}}/translations/aligned.gene.{gene}.fasta", gene=config.get('genes', ['S']))
 params:
 outdir = "results/{build_name}/translations",
 strain_prefixes=config["strip_strain_prefixes"],
 sanitize_log="logs/sanitize_sequences_before_nextclade_{build_name}.txt",
-output_translations = lambda w: f"results/{w.build_name}/translations/aligned.gene.{{gene}}.fasta"
+output_translations = lambda w: f"results/{w.build_name}/translations/aligned.gene.{{cds}}.fasta"
 log:
 "logs/align_{build_name}.txt"
 benchmark:
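The `output_translations` template switches its placeholder from `{gene}` to `{cds}`, which is the name Nextclade v3 substitutes when writing one translation FASTA per coding sequence. Inside the Snakemake `params` lambda the doubled braces in `{{cds}}` keep the placeholder literal after the f-string fills in `build_name`. This plain-Python sketch reproduces that two-stage expansion; `str.format` stands in for Nextclade's own substitution, and the build name `europe` is made up.

```python
from typing import List


def translations_template(build_name: str) -> str:
    # Same f-string shape as the rule's params lambda: {build_name} is filled in
    # here, while {{cds}} collapses to a literal "{cds}" left for Nextclade.
    return f"results/{build_name}/translations/aligned.gene.{{cds}}.fasta"


def expected_translation_paths(build_name: str, cds_names: List[str]) -> List[str]:
    # str.format stands in for Nextclade's substitution of {cds}.
    template = translations_template(build_name)
    return [template.format(cds=name) for name in cds_names]


print(translations_template("europe"))
# results/europe/translations/aligned.gene.{cds}.fasta
print(expected_translation_paths("europe", ["S", "N"]))
# ['results/europe/translations/aligned.gene.S.fasta', 'results/europe/translations/aligned.gene.N.fasta']
```

Because the rule's declared `translations` outputs are still expanded over `config["genes"]` with a `{gene}` wildcard, the files Nextclade writes only satisfy Snakemake if the dataset's CDS names match those gene names; the sketch simply assumes they do.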
@@ -503,13 +497,12 @@ rule build_align:
 --sequences {input.sequences} \
 --strip-prefixes {params.strain_prefixes:q} \
 --output /dev/stdout 2> {params.sanitize_log} \
-| nextclade2 run \
+| nextclade run \
 --jobs {threads} \
 --input-dataset {input.nextclade_dataset} \
 --output-tsv {output.nextclade_qc} \
 --output-fasta {output.alignment} \
---output-translations {params.output_translations} \
---output-insertions {output.insertions} 2>&1 | tee {log}
+--output-translations {params.output_translations} 2>&1 | tee {log}
 """
 
 rule join_metadata_and_nextclade_qc:
@@ -936,34 +929,6 @@ rule translate:
 --output {output.node_data} 2>&1 | tee {log}
 """
 
-rule build_mutation_summary:
-message: "Summarizing {input.alignment}"
-input:
-alignment = rules.build_align.output.alignment,
-insertions = rules.build_align.output.insertions,
-translations = rules.build_align.output.translations,
-reference = config["files"]["alignment_reference"],
-genemap = config["files"]["annotation"]
-output:
-mutation_summary = "results/{build_name}/mutation_summary.tsv"
-log:
-"logs/mutation_summary_{build_name}.txt"
-params:
-outdir = "results/{build_name}/translations",
-basename = "aligned"
-conda: config["conda_environment"]
-shell:
-r"""
-python3 scripts/mutation_summary.py \
---alignment {input.alignment} \
---insertions {input.insertions} \
---directory {params.outdir} \
---basename {params.basename} \
---reference {input.reference} \
---genemap {input.genemap} \
---output {output.mutation_summary} 2>&1 | tee {log}
-"""
-
 rule distances:
 input:
 tree = rules.refine.output.tree,
