[shared] enforce declared dependencies #91

jameshadfield · 2025-07-16T00:42:10Z

Implements snakemake code which reads a repo's nextstrain-pathogen.yaml, parses declared python/cli dependencies, and checks the current runtime to ensure dependencies are installed and match the declared version specifiers.

~~This code is not compatible with Snakemake's per-rule --conda environments, which I know we've used in the past but I believe no longer do.~~ This code doesn't aim to check dependencies in a conda environment if we use Snakemake's per-rule --conda environments or if we do things like conda activate X in a rule's shell block.

I don't intend to merge this PR into Zika, but I think it's an ideal place to test it out. Once people have reviewed the code I'll add it to nextstrain/shared and then pathogens can vendor it as they wish.

shared/vendored/snakemake/versioning.smk

genehack · 2025-07-16T18:59:45Z

This code is not compatible with Snakemake's per-rule --conda environments, which I know we've used in the past but I believe no longer do.

@kimandrews aren't you using those per-rule Conda envs in the TB build currently?

joverlee521

Overall +1 for dependency checks. I wish this could be squirreled away in the Nextstrain CLI, but I get the desire to run workflows with just snakemake.

nextstrain-pathogen.yaml

shared/vendored/snakemake/versioning.smk

joverlee521 · 2025-07-17T23:32:36Z

shared/vendored/snakemake/versioning.smk

+    its deserialized contents.
+
+    Taken from <https://github.com/nextstrain/cli/blob/4dbac262b22a3db9c48267e23f713ad56251ffd0/nextstrain/cli/pathogens.py#L843C1-L858C24>
+    with modifications. (Note: pathogen repos don't need the nextstrain CLI to be installed and thus we can't import the code.)


Thanks for the note here 😄 I was initially confused why this was getting implemented as part of the Snakemake workflow instead of the Nextstrain CLI which has all of the code for parsing nextstrain-pathogen.yaml. Will have to think on whether we will eventually also duplicate schema validation for nextstrain-pathogen.yaml.

After implementing this, my takeaway is that using the nextstrain-pathogen.yaml comes with a lot of baggage. Whatever I choose now will almost certainly be replaced with some other syntax when we implement workflows-as-programs, and then we have a backwards-compatibility debt to pay. We also have the conceptual conflict that the CLI is the tool which uses this file, but I want a solution that doesn't depend on the CLI. The very existence of the file implies a specific directory structure which is independent of my desire to just check some dependencies!

Alternatively I can define the dependencies elsewhere (and still use the code implemented here but just read the definitions from elsewhere). The could be (i) in the config YAMLs, (ii) hardcoded in the snakemake files or (iii) in a new file nextstrain-dependencies.yaml. Pros and cons. (i) and (ii) allow for per-workflow dependencies, which could be very nice in cetain situations but is prone to lists getting out of sync. I'm thinking about letting the snakefile decide by supplying them to check_pathogen_required_versions(dependencies, *, fatal=True); an example usage might be check_pathogen_required_versions(config['dependencies'], fatal=True). Any opinions welcomed.

After implementing this, my takeaway is that using the nextstrain-pathogen.yaml comes with a lot of baggage. Whatever I choose now will almost certainly be replaced with some other syntax when we implement workflows-as-programs, and then we have a backwards-compatibility debt to pay. We also have the conceptual conflict that the CLI is the tool which uses this file, but I want a solution that doesn't depend on the CLI.

To me, the takeaway from the above is "don't try to solve this problem right now".

To me, the takeaway from the above is "don't try to solve this problem right now".

👆 yeah, same.

Why do you want a solution that doesn't depend on Nextstrain CLI?

I agree we definitely want to keep workflows run-able independent of Nextstrain CLI, but it seems fine for dependency checking to happen at the Nextstrain CLI level and not the workflow level. I don't think we need to support all the niceties we want for workflows at the lowest possible level (the workflow), esp. if they're easier or better implemented at a higher level (Nextstrain CLI).

joverlee521 · 2025-07-17T23:39:49Z

shared/vendored/snakemake/versioning.smk

+            if name=='snakemake':
+                # importlib reports 0.0.0, so follow the approach of Snakemake's own min_version function
+                version = Version(snakemake_version)


Huh, I can't replicate the 0.0.0 version...

$ nextstrain shell . $ python Python 3.10.18 (main, Jul 1 2025, 10:55:01) [GCC 12.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from importlib.metadata import version as importlib_version >>> from packaging.version import Version >>> Version(importlib_version("snakemake")) <Version('7.32.3')>

From my conda environment:

$ snakemake --version 9.7.1 $ python Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:54:21) [Clang 16.0.6 ] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> from importlib.metadata import version as importlib_version >>> from packaging.version import Version >>> Version(importlib_version("snakemake")) <Version('0.0.0')>

Ah, it's a conda specific thing. I can replicate within nextstrain shell --conda .

(Then snakemake should work as a cli dep, which brings me back to my question of do we need to separate py vs cli)

Then snakemake should work as a cli dep

yeah - snakemake was previously (also) specified as a CLI dep to show just this :)

do we need to separate py vs cli

If we want the simplicity of not differentiating between these then we have to special-case the python version checking of snakemake or special-case a returned version of '0.0.0'. I've done the former.

joverlee521 · 2025-07-17T23:55:12Z

nextstrain-pathogen.yaml

+  py:
+    nextstrain-augur: ">=30,!=31.3"
+    boto3: ">=1.38"
+    snakemake: ">=9,<10"
+  cli:
+    augur: '>30'
+    nextclade: '>=3.15'
+    this-program-doesnt-exist: '1'
+    nextstrain: '==11'
+    snakemake: "<=8"
+    aws: "==3"


Does this need to differentiate between py and cli dependencies? I'm already confused by the nextstrain-augur vs augur dependencies listed...

Could the version check just try importlib_version first and if that fails, then try subprocess.run?

I'm already confused by the nextstrain-augur vs augur dependencies listed...

I've better documented the list in nextstrain-pathogen.yaml - the intention was for them to be a testing set rather than how an actual pathogen repo's dependencies would look like.

Could the version check just try importlib_version first and if that fails, then try subprocess.run?

Sure. That trades off explicitness for simplicity. Maybe a win? I'm not sure, but I've updated the code to use this approach for us to test out.

tsibley · 2025-09-05T21:24:00Z

shared/vendored/snakemake/versioning.smk

+def pathogen_yaml(*, subdir_max=3):
+    _searched_paths = []
+    for i in range(0, subdir_max):
+        p = path.normpath(path.join(workflow.basedir, *['..']*i, "nextstrain-pathogen.yaml"))
+        _searched_paths.append(p)
+        if path.isfile(p):


FWIW, this is a more limited search than what Nextstrain CLI does, so it's prone to cause different behaviour if the search depth (height) is exceeded.

tsibley · 2025-09-05T21:29:56Z

shared/vendored/snakemake/versioning.smk

+        try:
+            version = Version(m.group(0))
+        except InvalidVersion: # <https://packaging.pypa.io/en/stable/version.html#packaging.version.InvalidVersion>
+            self.unexpected_errors.append(f"CLI program {name!r} reported a version of {m.group(0)} which we were unable to parse")


This will fail for versions which don't conform to Python's idea of a version, of course. Non-Python CLI tools seem more likely to hit that. FWIW, Nextstrain CLI has code for parsing any version string in a way that leaves the version at least comparable to like versions.

jameshadfield requested review from joverlee521 and victorlin July 16, 2025 00:42

jameshadfield commented Jul 16, 2025

View reviewed changes

shared/vendored/snakemake/versioning.smk Outdated Show resolved Hide resolved

genehack reviewed Jul 16, 2025

View reviewed changes

shared/vendored/snakemake/versioning.smk Show resolved Hide resolved

joverlee521 reviewed Jul 17, 2025

View reviewed changes

jameshadfield force-pushed the james/dependency-version-checks branch 2 times, most recently from 662c3c8 to 1397182 Compare July 21, 2025 00:39

WIP - shared smk to enforce declared deps

2b9cdf1

jameshadfield force-pushed the james/dependency-version-checks branch from 1397182 to 2b9cdf1 Compare July 21, 2025 00:44

tsibley reviewed Sep 5, 2025

View reviewed changes

[shared] enforce declared dependencies #91

Are you sure you want to change the base?

[shared] enforce declared dependencies #91

Conversation

jameshadfield commented Jul 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

genehack commented Jul 16, 2025

Uh oh!

joverlee521 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jameshadfield Jul 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jameshadfield Jul 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

jameshadfield commented Jul 16, 2025 •

edited

Loading

jameshadfield Jul 21, 2025 •

edited

Loading

jameshadfield Jul 21, 2025 •

edited

Loading