Skip to content

Conversation

@jameshadfield
Copy link
Member

@jameshadfield jameshadfield commented Jul 16, 2025

Slack context

Implements snakemake code which reads a repo's nextstrain-pathogen.yaml, parses declared python/cli dependencies, and checks the current runtime to ensure dependencies are installed and match the declared version specifiers.

This code is not compatible with Snakemake's per-rule --conda environments, which I know we've used in the past but I believe no longer do. This code doesn't aim to check dependencies in a conda environment if we use Snakemake's per-rule --conda environments or if we do things like conda activate X in a rule's shell block.

I don't intend to merge this PR into Zika, but I think it's an ideal place to test it out. Once people have reviewed the code I'll add it to nextstrain/shared and then pathogens can vendor it as they wish.

@genehack
Copy link
Contributor

This code is not compatible with Snakemake's per-rule --conda environments, which I know we've used in the past but I believe no longer do.

@kimandrews aren't you using those per-rule Conda envs in the TB build currently?

Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall +1 for dependency checks. I wish this could be squirreled away in the Nextstrain CLI, but I get the desire to run workflows with just snakemake.

its deserialized contents.
Taken from <https://github.com/nextstrain/cli/blob/4dbac262b22a3db9c48267e23f713ad56251ffd0/nextstrain/cli/pathogens.py#L843C1-L858C24>
with modifications. (Note: pathogen repos don't need the nextstrain CLI to be installed and thus we can't import the code.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the note here 😄 I was initially confused why this was getting implemented as part of the Snakemake workflow instead of the Nextstrain CLI which has all of the code for parsing nextstrain-pathogen.yaml. Will have to think on whether we will eventually also duplicate schema validation for nextstrain-pathogen.yaml.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After implementing this, my takeaway is that using the nextstrain-pathogen.yaml comes with a lot of baggage. Whatever I choose now will almost certainly be replaced with some other syntax when we implement workflows-as-programs, and then we have a backwards-compatibility debt to pay. We also have the conceptual conflict that the CLI is the tool which uses this file, but I want a solution that doesn't depend on the CLI. The very existence of the file implies a specific directory structure which is independent of my desire to just check some dependencies!

Alternatively I can define the dependencies elsewhere (and still use the code implemented here but just read the definitions from elsewhere). The could be (i) in the config YAMLs, (ii) hardcoded in the snakemake files or (iii) in a new file nextstrain-dependencies.yaml. Pros and cons. (i) and (ii) allow for per-workflow dependencies, which could be very nice in cetain situations but is prone to lists getting out of sync. I'm thinking about letting the snakefile decide by supplying them to check_pathogen_required_versions(dependencies, *, fatal=True); an example usage might be check_pathogen_required_versions(config['dependencies'], fatal=True). Any opinions welcomed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After implementing this, my takeaway is that using the nextstrain-pathogen.yaml comes with a lot of baggage. Whatever I choose now will almost certainly be replaced with some other syntax when we implement workflows-as-programs, and then we have a backwards-compatibility debt to pay. We also have the conceptual conflict that the CLI is the tool which uses this file, but I want a solution that doesn't depend on the CLI.

To me, the takeaway from the above is "don't try to solve this problem right now".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, the takeaway from the above is "don't try to solve this problem right now".

👆 yeah, same.

Why do you want a solution that doesn't depend on Nextstrain CLI?

I agree we definitely want to keep workflows run-able independent of Nextstrain CLI, but it seems fine for dependency checking to happen at the Nextstrain CLI level and not the workflow level. I don't think we need to support all the niceties we want for workflows at the lowest possible level (the workflow), esp. if they're easier or better implemented at a higher level (Nextstrain CLI).

Comment on lines 52 to 82
if name=='snakemake':
# importlib reports 0.0.0, so follow the approach of Snakemake's own min_version function
version = Version(snakemake_version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I can't replicate the 0.0.0 version...

$ nextstrain shell .
$ python 
Python 3.10.18 (main, Jul  1 2025, 10:55:01) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from importlib.metadata import version as importlib_version
>>> from packaging.version import Version
>>> Version(importlib_version("snakemake"))
<Version('7.32.3')>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my conda environment:

$ snakemake --version
9.7.1
$ python
Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:54:21) [Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from importlib.metadata import version as importlib_version
>>> from packaging.version import Version
>>> Version(importlib_version("snakemake"))
<Version('0.0.0')>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's a conda specific thing. I can replicate within nextstrain shell --conda .

(Then snakemake should work as a cli dep, which brings me back to my question of do we need to separate py vs cli)

Copy link
Member Author

@jameshadfield jameshadfield Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then snakemake should work as a cli dep

yeah - snakemake was previously (also) specified as a CLI dep to show just this :)

do we need to separate py vs cli

If we want the simplicity of not differentiating between these then we have to special-case the python version checking of snakemake or special-case a returned version of '0.0.0'. I've done the former.

Comment on lines 19 to 29
py:
nextstrain-augur: ">=30,!=31.3"
boto3: ">=1.38"
snakemake: ">=9,<10"
cli:
augur: '>30'
nextclade: '>=3.15'
this-program-doesnt-exist: '1'
nextstrain: '==11'
snakemake: "<=8"
aws: "==3"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to differentiate between py and cli dependencies? I'm already confused by the nextstrain-augur vs augur dependencies listed...

Could the version check just try importlib_version first and if that fails, then try subprocess.run?

Copy link
Member Author

@jameshadfield jameshadfield Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm already confused by the nextstrain-augur vs augur dependencies listed...

I've better documented the list in nextstrain-pathogen.yaml - the intention was for them to be a testing set rather than how an actual pathogen repo's dependencies would look like.

Could the version check just try importlib_version first and if that fails, then try subprocess.run?

Sure. That trades off explicitness for simplicity. Maybe a win? I'm not sure, but I've updated the code to use this approach for us to test out.

@jameshadfield jameshadfield force-pushed the james/dependency-version-checks branch 2 times, most recently from 662c3c8 to 1397182 Compare July 21, 2025 00:39
@jameshadfield jameshadfield force-pushed the james/dependency-version-checks branch from 1397182 to 2b9cdf1 Compare July 21, 2025 00:44
Comment on lines +149 to +154
def pathogen_yaml(*, subdir_max=3):
_searched_paths = []
for i in range(0, subdir_max):
p = path.normpath(path.join(workflow.basedir, *['..']*i, "nextstrain-pathogen.yaml"))
_searched_paths.append(p)
if path.isfile(p):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this is a more limited search than what Nextstrain CLI does, so it's prone to cause different behaviour if the search depth (height) is exceeded.

Comment on lines +121 to +124
try:
version = Version(m.group(0))
except InvalidVersion: # <https://packaging.pypa.io/en/stable/version.html#packaging.version.InvalidVersion>
self.unexpected_errors.append(f"CLI program {name!r} reported a version of {m.group(0)} which we were unable to parse")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail for versions which don't conform to Python's idea of a version, of course. Non-Python CLI tools seem more likely to hit that. FWIW, Nextstrain CLI has code for parsing any version string in a way that leaves the version at least comparable to like versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants