From 99cf25813c8f6119da4cd9019531cc30fe119d32 Mon Sep 17 00:00:00 2001
From: Jover Lee
Date: Mon, 24 Nov 2025 16:43:29 -0800
Subject: [PATCH 1/2] phylogenetic: Add docs for multiple input configs

Resolves https://github.com/nextstrain/zika/issues/94
---
 phylogenetic/README.md | 49 +++++++++++++++++++++++++++++++-----------
 1 file changed, 37 insertions(+), 12 deletions(-)

diff --git a/phylogenetic/README.md b/phylogenetic/README.md
index 70489e6..f737f34 100644
--- a/phylogenetic/README.md
+++ b/phylogenetic/README.md
@@ -64,24 +64,48 @@ The workflow is contained in [Snakefile](Snakefile) with included [rules](rules)
 Each rule specifies its file inputs and output and pulls its parameters from the config.
 There is little redirection and each rule should be able to be reasoned with on its own.
 
-### Using GenBank data
-
-This build starts by pulling preprocessed sequence and metadata files from:
+### Default input data
+
+The default build starts from the public Nextstrain data that have been preprocessed
+and cleaned from NCBI GenBank and the USVI data from https://github.com/blab/zika-usvi.
+
+```yaml
+inputs:
+  - name: ncbi
+    metadata: "s3://nextstrain-data/files/workflows/zika/metadata.tsv.zst"
+    sequences: "s3://nextstrain-data/files/workflows/zika/sequences.fasta.zst"
+  - name: usvi
+    metadata: "s3://nextstrain-data/files/workflows/zika/metadata_usvi.tsv.zst"
+    sequences: "s3://nextstrain-data/files/workflows/zika/sequences_usvi.fasta.zst"
+```
 
-* https://data.nextstrain.org/files/workflows/zika/sequences.fasta.zst
-* https://data.nextstrain.org/files/workflows/zika/metadata.tsv.zst
+The NCBI GenBank data are preprocessed using the [ingest/](../ingest/)
+workflow and are automatically updated when new data are available.
+The USVI records were pulled from https://github.com/blab/zika-usvi/
+with [additional processing steps to remove duplicates][]. These are static
+unless we make changes to the expected metadata columns.
 
-The above datasets have been preprocessed and cleaned from GenBank using the
-[ingest/](../ingest/) workflow and are updated at regular intervals.
+### Adding your own data
 
-### Using USVI data
+If you want to add your own data to the default input, specify your inputs with
+the `additional_inputs` config parameter.
 
-This build also merges in USVI data from:
+```yaml
+additional_inputs:
+  - name: private
+    metadata: data/metadata.tsv
+    sequences: data/sequences.fasta
+```
 
-* https://data.nextstrain.org/files/workflows/zika/sequences_usvi.fasta.zst
-* https://data.nextstrain.org/files/workflows/zika/metadata_usvi.tsv.zst
+If you want to run the builds _without_ the default data and only use your own
+data, you can do so by specifying the `inputs` parameter.
 
-The above dataset was pulled from https://github.com/blab/zika-usvi/ with [additional processing steps to remove duplicates](https://github.com/nextstrain/zika/blob/f8a6423a7f6b6f1b30b6496d8433b99eff0d54ff/phylogenetic/data/README.md).
+```yaml
+inputs:
+  - name: private
+    metadata: data/metadata.tsv
+    sequences: data/sequences.fasta
+```
 
 ### Using example data
 
@@ -109,3 +133,4 @@ nextstrain build \
 [auspice]: https://docs.nextstrain.org/projects/auspice/en/stable/index.html
 [Installing Nextstrain guide]: https://docs.nextstrain.org/en/latest/install.html
 [Running a Pathogen Workflow guide]: https://docs.nextstrain.org/en/latest/tutorials/running-a-workflow.html
+[additional processing steps to remove duplicates]: https://github.com/nextstrain/zika/blob/f8a6423a7f6b6f1b30b6496d8433b99eff0d54ff/phylogenetic/data/README.md

From e3351ae675a10d69bc499c84d547b1c7a5d3316e Mon Sep 17 00:00:00 2001
From: Jover Lee
Date: Mon, 24 Nov 2025 17:03:02 -0800
Subject: [PATCH 2/2] phylogenetic: Add note on `nextstrain run` and inputs

`nextstrain run` does not support input files from _within_ the repo
because the `path_or_url` function does not resolve paths (yet).

See
---
 phylogenetic/README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/phylogenetic/README.md b/phylogenetic/README.md
index f737f34..923abfe 100644
--- a/phylogenetic/README.md
+++ b/phylogenetic/README.md
@@ -114,6 +114,9 @@ repository by running:
 
     nextstrain build . --configfile build-configs/ci/config.yaml
 
+Note: this only works with `nextstrain build`. Within-repo input files are _not_
+supported by `nextstrain run`.
+
 ### Deploying build
 
 To run the workflow and automatically deploy the build to nextstrain.org,