(WIP) phylogenetic: Support `nextstrain run` #327

joverlee521 · 2025-07-09T23:30:42Z

Description of proposed changes

Workflow updates to support nextstrain run. Since there are multiple builds, I've created a separate entrypoint/Snakefile per build.

This allows users to use nextstrain run without know the internal config file locations:

nextstrain run mpox phylogenetic/clade-i <analysis-dir>

For users who still want to use nextstrain build, the command is now

nextstrain build phylogenetic -s clade-i/Snakefile

Discussion

Currently a draft PR mainly for discussion of the changes needed to support nextstrain run.

I want to mainly gather thoughts on the 4 extra directories/Snakefiles as separate workflow entrypoints. I think the long term goal is for nextstrain run to be smarter about picking workflows to run, e.g. imagine defining the workflows within nextstrain-pathogen.yaml.

Related issue(s)

Part of #314

Checklist

Checks pass
Update changelog

Workflow updates to support `nextstrain run`. Since there are multiple builds, I've created a separate entrypoint/Snakefile per build. This allows users to use `nextstrain run` without know the internal config file locations: ``` nextstrain run mpox phylogenetic/clade-i <analysis-dir> ``` For users who still want to use `nextstrain build`, the command is now ``` nextstrain build phylogenetic -s clade-i/Snakefile ```

for more information, see https://pre-commit.ci

jameshadfield

I want to mainly gather thoughts on the 4 extra directories/Snakefiles as separate workflow entrypoints. I think the long term goal is for nextstrain run to be smarter about picking workflows to run, e.g. imagine defining the workflows within nextstrain-pathogen.yaml.

Regarding the 4 extra Snakefiles I have two thoughts. Firstly, it's great that each repo has the power to do its own thing, and that it can expose ~any Snakefile (with the necessary path wrangling etc) as an interface to our runner and leverage the power that brings (cloud compute, runtimes etc). Secondly, I imagine you want to decide on a pattern that people can just copy-paste for multi-build workflows and it seems like this approach is in direct competition with using wildcards to parameterise builds. (The long-term plans for nextstrain-pathogen.yaml are opaque to me.)

jameshadfield · 2025-07-10T02:02:41Z

phylogenetic/clade-iib/Snakefile

If we're going to rename hmpxv1 to clade-iib (which seems fine 🤷 I don't have much context here) then we should go the whole way and rename everything - defaults/hmpxv1 etc. (Perhaps you plan to do this -- it's a draft PR after all -- if so this can just serve as a reminder)

Users in office hours and myself have often been confused by the mapping between the config files and the named builds, so I thought the workflow should at least match the build name.

If there's not push back from others here, I'll rename everything.

jameshadfield · 2025-07-10T02:10:54Z

phylogenetic/clade-iib/Snakefile

@@ -0,0 +1,14 @@
+configfile: os.path.join(workflow.basedir, "../defaults/hmpxv1/config.yaml")


I think it's cleaner and easier to understand if we co-locate the config.yaml with the Snakefile, i.e.

$ tree phylogenetic/clade-iib clade-iib ├── Snakefile └── config.yaml

This raises the question of where to store clade-iib specific config files: phylogenetic/defaults/clade-iib or phylogenetic/clade-iib/defaults (or some variant thereof). If we look at some values in the config YAML (swapping hmpxv1 for clade-iib for clarity) we have things like:

color_scheme: "color_schemes.tsv" auspice_config: "clade-iib/auspice_config.json"

which isn't very ergonomic from an external users point of view. I think something like the following is cleaner:

color_scheme: "color_schemes.tsv" auspice_config: "auspice_config.json"

$ tree phylogenetic ├── defaults │ └── color_schemes.tsv ├── clade-iib │ ├── Snakefile │ ├── config.yaml │ ├── defaults # maybe use a defaults subdir? │ │ └── auspice_config.json │ └── auspice_config.json # or maybe just do this?

Where you have a path resolution of (1) analysis dir, (2) clade-iib, (3) phylogenetic.

Yeah, totally see your point about the config not being ergonomic for external users here. On the other hand, as a maintainer, it's no longer clear from

color_scheme: "color_schemes.tsv" auspice_config: "auspice_config.json"

which files need to be edited to change the config...I am also wary of the splintering of config files, which is making me reconsider the separate Snakefile approach. See proposal in nextstrain/cli#454

+1 for exploring ways to get different configfile "entrypoints" into nextstrain run beyond creating multiple Snakefiles. I originally did just that for avian-flu, but my solution wasn't as nice as your proposed use of nextstrain-pathogen.yaml which might have been why it didn't get any traction.

Even with the --configfile approach we'll still have subtleties for analysis directory usage such as "color_schemes.tsv" vs "clade-iib/auspice_config.json". I guess the solution here is better per-pathogen docs.

jameshadfield · 2025-07-10T02:20:00Z

phylogenetic/rules/config.smk

+include: "../../shared/vendored/snakemake/config.smk"
+
+
+def phylo_resolve_config_path(path: str) -> Callable[[Wildcards], str]:
+    """
+    Wrapper around the shared `resolve_config_path` to force the default directory
+    to be `phylogenetic/defaults`. This is necessary because the entry point for
+    each build is nested within phylogenetic (e.g. phylogenetic/clade-i/Snakefile).
+    """
+    PHYLO_DEFAULTS_DIR = os.path.normpath(os.path.join(workflow.current_basedir, "../defaults"))
+    # Strip the `defaults/` prefix to be backwards compatible with older configs
+    # This is necessary in this wrapper because we are providing a custom defaults dir
+    # which skips the handling of the defaults/ prefix within resolve_config_path.
+    path = path.removeprefix("defaults/")
+    return resolve_config_path(path, PHYLO_DEFAULTS_DIR)
+
+


I still wish we got rid of all this magic around defaults/ and the path in the config.yaml was the path used, relative to an ordered set of "base directories" which are easily documented. I understand people don't want external users to have to make a "defaults" directory, but if we used "config" instead I think there'd be less objection.

Secondly I still think phylo_resolve_config_path is overly verbose and leaves the door open for unintentionally using resolve_config_path (which is still in the namespace). I continue to think the following is preferable:

include: "../../shared/vendored/snakemake/config.smk" _shared_resolve_config_path = resolve_config_path def resolve_config_path(path: str) -> Callable[[Wildcards], str]: return _shared_resolve_config_path(...)

Yeah, this is only necessary because of the nested Snakefiles. If we keep the main phylogenetic/Snakefile then we could just directly use the shared resolve_config_path.

joverlee521 · 2025-07-10T16:38:23Z

phylogenetic/rules/main.smk

Ah, moving the main Snakefile breaks the pathogen-repo-ci because it expects the phylogenetic/Snakefile...

Yeah, this is why I used pathogen-repo-ci.yaml@v0 for avian-flu

joverlee521 and others added 2 commits July 9, 2025 15:46

[pre-commit.ci] auto fixes from pre-commit.com hooks

2016c1f

for more information, see https://pre-commit.ci

jameshadfield reviewed Jul 10, 2025

View reviewed changes

joverlee521 commented Jul 10, 2025

View reviewed changes

joverlee521 mentioned this pull request Jul 10, 2025

nextstrain run: Define available workflows in the nextstrain-pathogen.yaml nextstrain/cli#454

Closed

Base automatically changed from nextstrain-run-ingest to master July 18, 2025 16:59

joverlee521 mentioned this pull request Jul 23, 2025

phylogenetic: Support nextstrain run (alternative) #330

Draft

2 tasks

joverlee521 mentioned this pull request Aug 14, 2025

Workflows as programs nextstrain/public#1

Open

35 tasks

tsibley mentioned this pull request Sep 9, 2025

Register workflow files in nextstrain-pathogen.yaml nextstrain/cli#462

Draft

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

(WIP) phylogenetic: Support `nextstrain run` #327

(WIP) phylogenetic: Support `nextstrain run` #327

Uh oh!

joverlee521 commented Jul 9, 2025

Uh oh!

jameshadfield left a comment

Uh oh!

jameshadfield Jul 10, 2025

Uh oh!

joverlee521 Jul 10, 2025

Uh oh!

jameshadfield Jul 10, 2025

Uh oh!

joverlee521 Jul 10, 2025

Uh oh!

jameshadfield Jul 15, 2025

Uh oh!

jameshadfield Jul 10, 2025

Uh oh!

joverlee521 Jul 10, 2025

Uh oh!

joverlee521 Jul 10, 2025

Uh oh!

jameshadfield Jul 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		@@ -0,0 +1,14 @@
		configfile: os.path.join(workflow.basedir, "../defaults/hmpxv1/config.yaml")

(WIP) phylogenetic: Support nextstrain run #327

Are you sure you want to change the base?

(WIP) phylogenetic: Support nextstrain run #327

Uh oh!

Conversation

joverlee521 commented Jul 9, 2025

Description of proposed changes

Discussion

Related issue(s)

Checklist

Uh oh!

jameshadfield left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

(WIP) phylogenetic: Support `nextstrain run` #327

(WIP) phylogenetic: Support `nextstrain run` #327