-
Notifications
You must be signed in to change notification settings - Fork 11
[shared] enforce declared dependencies #91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@kimandrews aren't you using those per-rule Conda envs in the TB build currently? |
joverlee521
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall +1 for dependency checks. I wish this could be squirreled away in the Nextstrain CLI, but I get the desire to run workflows with just snakemake.
| its deserialized contents. | ||
| Taken from <https://github.com/nextstrain/cli/blob/4dbac262b22a3db9c48267e23f713ad56251ffd0/nextstrain/cli/pathogens.py#L843C1-L858C24> | ||
| with modifications. (Note: pathogen repos don't need the nextstrain CLI to be installed and thus we can't import the code.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the note here 😄 I was initially confused why this was getting implemented as part of the Snakemake workflow instead of the Nextstrain CLI which has all of the code for parsing nextstrain-pathogen.yaml. Will have to think on whether we will eventually also duplicate schema validation for nextstrain-pathogen.yaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After implementing this, my takeaway is that using the nextstrain-pathogen.yaml comes with a lot of baggage. Whatever I choose now will almost certainly be replaced with some other syntax when we implement workflows-as-programs, and then we have a backwards-compatibility debt to pay. We also have the conceptual conflict that the CLI is the tool which uses this file, but I want a solution that doesn't depend on the CLI. The very existence of the file implies a specific directory structure which is independent of my desire to just check some dependencies!
Alternatively I can define the dependencies elsewhere (and still use the code implemented here but just read the definitions from elsewhere). The could be (i) in the config YAMLs, (ii) hardcoded in the snakemake files or (iii) in a new file nextstrain-dependencies.yaml. Pros and cons. (i) and (ii) allow for per-workflow dependencies, which could be very nice in cetain situations but is prone to lists getting out of sync. I'm thinking about letting the snakefile decide by supplying them to check_pathogen_required_versions(dependencies, *, fatal=True); an example usage might be check_pathogen_required_versions(config['dependencies'], fatal=True). Any opinions welcomed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After implementing this, my takeaway is that using the
nextstrain-pathogen.yamlcomes with a lot of baggage. Whatever I choose now will almost certainly be replaced with some other syntax when we implement workflows-as-programs, and then we have a backwards-compatibility debt to pay. We also have the conceptual conflict that the CLI is the tool which uses this file, but I want a solution that doesn't depend on the CLI.
To me, the takeaway from the above is "don't try to solve this problem right now".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, the takeaway from the above is "don't try to solve this problem right now".
👆 yeah, same.
Why do you want a solution that doesn't depend on Nextstrain CLI?
I agree we definitely want to keep workflows run-able independent of Nextstrain CLI, but it seems fine for dependency checking to happen at the Nextstrain CLI level and not the workflow level. I don't think we need to support all the niceties we want for workflows at the lowest possible level (the workflow), esp. if they're easier or better implemented at a higher level (Nextstrain CLI).
| if name=='snakemake': | ||
| # importlib reports 0.0.0, so follow the approach of Snakemake's own min_version function | ||
| version = Version(snakemake_version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, I can't replicate the 0.0.0 version...
$ nextstrain shell .
$ python
Python 3.10.18 (main, Jul 1 2025, 10:55:01) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from importlib.metadata import version as importlib_version
>>> from packaging.version import Version
>>> Version(importlib_version("snakemake"))
<Version('7.32.3')>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my conda environment:
$ snakemake --version
9.7.1
$ python
Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:54:21) [Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from importlib.metadata import version as importlib_version
>>> from packaging.version import Version
>>> Version(importlib_version("snakemake"))
<Version('0.0.0')>There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it's a conda specific thing. I can replicate within nextstrain shell --conda .
(Then snakemake should work as a cli dep, which brings me back to my question of do we need to separate py vs cli)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then snakemake should work as a cli dep
yeah - snakemake was previously (also) specified as a CLI dep to show just this :)
do we need to separate py vs cli
If we want the simplicity of not differentiating between these then we have to special-case the python version checking of snakemake or special-case a returned version of '0.0.0'. I've done the former.
nextstrain-pathogen.yaml
Outdated
| py: | ||
| nextstrain-augur: ">=30,!=31.3" | ||
| boto3: ">=1.38" | ||
| snakemake: ">=9,<10" | ||
| cli: | ||
| augur: '>30' | ||
| nextclade: '>=3.15' | ||
| this-program-doesnt-exist: '1' | ||
| nextstrain: '==11' | ||
| snakemake: "<=8" | ||
| aws: "==3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to differentiate between py and cli dependencies? I'm already confused by the nextstrain-augur vs augur dependencies listed...
Could the version check just try importlib_version first and if that fails, then try subprocess.run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm already confused by the nextstrain-augur vs augur dependencies listed...
I've better documented the list in nextstrain-pathogen.yaml - the intention was for them to be a testing set rather than how an actual pathogen repo's dependencies would look like.
Could the version check just try importlib_version first and if that fails, then try subprocess.run?
Sure. That trades off explicitness for simplicity. Maybe a win? I'm not sure, but I've updated the code to use this approach for us to test out.
662c3c8 to
1397182
Compare
1397182 to
2b9cdf1
Compare
| def pathogen_yaml(*, subdir_max=3): | ||
| _searched_paths = [] | ||
| for i in range(0, subdir_max): | ||
| p = path.normpath(path.join(workflow.basedir, *['..']*i, "nextstrain-pathogen.yaml")) | ||
| _searched_paths.append(p) | ||
| if path.isfile(p): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this is a more limited search than what Nextstrain CLI does, so it's prone to cause different behaviour if the search depth (height) is exceeded.
| try: | ||
| version = Version(m.group(0)) | ||
| except InvalidVersion: # <https://packaging.pypa.io/en/stable/version.html#packaging.version.InvalidVersion> | ||
| self.unexpected_errors.append(f"CLI program {name!r} reported a version of {m.group(0)} which we were unable to parse") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fail for versions which don't conform to Python's idea of a version, of course. Non-Python CLI tools seem more likely to hit that. FWIW, Nextstrain CLI has code for parsing any version string in a way that leaves the version at least comparable to like versions.
Slack context
Implements snakemake code which reads a repo's
nextstrain-pathogen.yaml, parses declared python/cli dependencies, and checks the current runtime to ensure dependencies are installed and match the declared version specifiers.This code is not compatible with Snakemake's per-ruleThis code doesn't aim to check dependencies in a conda environment if we use Snakemake's per-rule--condaenvironments, which I know we've used in the past but I believe no longer do.--condaenvironments or if we do things likeconda activate Xin a rule's shell block.I don't intend to merge this PR into Zika, but I think it's an ideal place to test it out. Once people have reviewed the code I'll add it to nextstrain/shared and then pathogens can vendor it as they wish.