
conda environments recreated on every run with Seqera Platform (-with-tower) #6140


Open
aringeri opened this issue May 30, 2025 · 9 comments · May be fixed by #6166

@aringeri

Bug report

Expected behavior and actual behavior

When the conda directive of a process points to an environment YAML file, the environment gets re-created in the cache directory on every run when using Seqera Platform.
We would expect Nextflow to re-use the existing conda environment in the cache directory.

This change in behaviour is due to the conda cache changes here: #5489

Steps to reproduce the problem

  1. Create a Nextflow pipeline with the following process and environment file:
process foo {
  conda '/some/path/my-env.yaml'

  script:
  """
  your_command --here
  """
}
# /some/path/my-env.yaml
name: my-env
channels:
  - conda-forge
  - bioconda
dependencies:
  - star=2.5.4a
  - bwa=0.7.15
  2. Import this pipeline into Seqera Platform: https://docs.seqera.io/platform-cloud/getting-started/quickstart-demo/add-pipelines
  3. Configure the pipeline's `nextflow.config` to (see the sketch after this list):
    • enable Conda: conda.enabled = true
    • set the cache directory: conda.cacheDir = /some/path/on/the/system
  4. Launch the pipeline via Seqera (in my case I am using a SLURM compute environment)
    • Seqera will:
      • set NXF_ASSETS to a location in the work directory <basepath>/work/.nextflow/pipelines/<pipeline-id>
      • launch Nextflow on the server with nextflow run <pipeline> ... -with-tower <seqera-url>
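
A minimal nextflow.config for step 3 might look like this (sketch only; the cache path is a placeholder):

// nextflow.config -- minimal sketch for step 3
conda {
  enabled  = true
  cacheDir = '/some/path/on/the/system'
}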

Program output

Monitor the execution with Seqera Platform using this URL: <seqera-url>
Creating env using conda: /home/user/<username>/nextflow/work/.nextflow/pipelines/f1f16f92/my-pipeline/envs/environment.yml [cache /home/user/<username>/nextflow/conda_cache/env-fe50773488a5e0e5-535eefbee5734a58afed514fbe57635c]

Environment

  • Nextflow version: 24.10.5
  • Java version: 17.0.14
  • Operating system: Linux
  • Bash version: 5.1.8

Additional context

My troubleshooting has led me to understand that Seqera Platform generates a new directory for Nextflow assets on every pipeline launch. The new directory is placed under work/.nextflow/pipelines and gets a different name for each launch.
In my case:

  • /home/user/<username>/nextflow/work/.nextflow/pipelines/f1f16f92/ - Run 1
  • /home/user/<username>/nextflow/work/.nextflow/pipelines/ee5ee5c/ - Run 2

This is problematic because the CondaCache class computes the cached environment directory based on a hash of the full path of the environment.yml file:

else if( isYamlFilePath(condaEnv) ) {
    try {
        final path = condaEnv as Path
        content = path.text
        name = 'env-' + sipHash(path)

  • /home/user/<username>/nextflow/work/.nextflow/pipelines/f1f16f92/my-pipeline/envs/environment.yml becomes env-fe50773488a5e0e5-535eefbee5734a58afed514fbe57635c

Essentially there is no way for two pipeline runs in Seqera to use a cached conda environment, because the environment directory will never have the same name.
I see this as an incompatibility between the way Seqera Platform launches pipelines and the way Nextflow caches conda environments, so I'm not 100% sure where the fix should lie.
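
To illustrate the effect (a simplified sketch, not the actual CondaCache code; md5() stands in for sipHash):

// Simplified sketch -- hashing the full path yields a different cache key per
// Seqera launch even though the environment file content is identical.
def run1 = '/work/.nextflow/pipelines/f1f16f92/my-pipeline/envs/environment.yml'
def run2 = '/work/.nextflow/pipelines/ee5ee5c/my-pipeline/envs/environment.yml'

def keyFor = { String path -> 'env-' + path.md5() }   // stand-in for sipHash(path)

assert keyFor(run1) != keyFor(run2)   // different cache dirs -> the env is rebuilt every run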

A current workaround may be to specify the conda requirements as a string (instead of a file):

process foo {
  conda 'star=2.5.4a bwa=0.7.15'
  ...
}

An alternative fix (within Nextflow) may be to make the generation of the conda environment hash more configurable (i.e. only hash the path relative to the project root).

@bentsherman
Member

@pditommaso why is the environment name based on the file name? I would argue it should be based on the file contents
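
A hedged sketch of that idea (md5() as a stand-in digest, not the actual CondaCache implementation):

// Content-based key: identical environment files map to the same cache directory
// no matter where Seqera stages them. Illustration only, not the real code.
import java.nio.file.Path

def keyFor = { Path envFile -> 'env-' + envFile.text.md5() }

// two staged copies of the same environment.yml (different paths, same content)
// would now yield the same 'env-<digest>' name, so the cached env is re-used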

@pditommaso
Member

I don't have a precise memory of it. If I'm not wrong, @jorgee worked on this recently.

@jorgee
Contributor

jorgee commented May 30, 2025

I remember it: there was a concurrency problem when a run was building the same conda environment several times (#5485). I will review what I did; it could be related to this issue.

@jorgee
Contributor

jorgee commented Jun 2, 2025

I have reviewed the change that introduced this issue. Before #5489, the conda cache map used the conda environment file (URL, path or string) as key and <cache_path>/<name>_<content_hash> as value. Two files in different paths but with the same name and content had different keys in the map pointing to the same value, which generated the OverlappingFileLockException. This is why we added the hash of the path. Reviewing it again, I have seen that the original problem is the key of the map, which was also changed to <cache_path>/<name>_<content_hash> in #5489. I think we can just go back to the content hash.
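
Schematically (placeholder paths and values only, not Nextflow's internal structures):

// Pre-#5489 situation as described above -- schematic sketch, placeholder values.
// Map key: the environment file reference; map value: the resolved cache prefix.
def cache = [:]
cache['/runA/envs/environment.yml'] = '/conda_cache/my-env_<content_hash>'
cache['/runB/envs/environment.yml'] = '/conda_cache/my-env_<content_hash>'

// Two distinct keys point at the same prefix, so two builds of the same environment
// can run concurrently and collide on its lock (OverlappingFileLockException).
assert cache.size() == 2 && cache.values().toSet().size() == 1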

@bentsherman
Member

@jorgee can you make a PR? Then we'll see if we can avoid the race condition.

@jorgee jorgee self-assigned this Jun 5, 2025
@jorgee jorgee linked a pull request Jun 5, 2025 that will close this issue
@jorgee
Contributor

jorgee commented Jun 5, 2025

I have created PR #6166, which works in both cases: it caches the environment when using the cacheDir, and it does not raise the OverlappingFileLockException.

@jorgee
Contributor

jorgee commented Jun 5, 2025

Testing the PR, I have seen another issue when I run two workflows at the same time that create the same conda environment. The workflow that starts first works well.

 N E X T F L O W   ~  version 25.05.0-edge

Launching `main.nf` [nasty_maxwell] DSL2 - revision: 142b9c09f9

[-        ] greet  -
executor >  local (2)
[3e/db184f] greet (1)  | 1 of 1 ✔
[cb/3c53db] greet2 (1) | 1 of 1 ✔
Creating env using conda: /home/jorgee/nextflow_tests/issue-6140/conda_2/env.yaml [cache /home/jorgee/nextflow_tests/shared_conda_cache/env-dd83a192cf0030e98953ef467708ac6c]
hello2 world

hello world

However, the workflow that started a bit later detects this and waits. But once the other has finished, it tries to create the environment again and fails.

N E X T F L O W   ~  version 25.05.0-edge

Launching `main.nf` [magical_becquerel] DSL2 - revision: 142b9c09f9

[-        ] greet  -
[-        ] greet2 -
Another Nextflow instance is creating the conda environment /home/jorgee/nextflow_tests/issue-4164/conda_2/env.yaml -- please wait till it completes
Creating env using conda: /home/jorgee/nextflow_tests/issue-4164/conda_2/env.yaml [cache /home/jorgee/nextflow_tests/shared_conda_cache/env-dd83a192cf0030e98953ef467708ac6c]
ERROR ~ Error executing process > 'greet (1)'

Caused by:
 Failed to create Conda environment
   command: conda env create --prefix /home/jorgee/nextflow_tests/shared_conda_cache/env-dd83a192cf0030e98953ef467708ac6c --file /home/jorgee/nextflow_tests/issue-4164/conda_2/env.yaml
   status : 1
   message:
     CondaValueError: prefix already exists: /home/jorgee/nextflow_tests/shared_conda_cache/env-dd83a192cf0030e98953ef467708ac6c

I think Nextflow shouldn't try to create the environment again.

@bentsherman
Member

From our discussion today, we can avoid this race condition with the same approach we use for task directories -- increment the conda env hash until you find a directory that doesn't exist yet. Or if you can verify that the env already exists and isn't corrupted, re-use it.
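
A hedged sketch of the second option (the helper name and signature are hypothetical, not Nextflow's API):

// Hypothetical helper illustrating "re-use the env if it already exists and is valid".
import java.nio.file.Files
import java.nio.file.Path

Path getOrCreateCondaPrefix(Path prefixDir, Closure buildEnv) {
    if( Files.isDirectory(prefixDir) ) {
        // another instance finished building it while we were waiting -- re-use it
        return prefixDir
    }
    buildEnv.call(prefixDir)    // otherwise build the environment ourselves
    return prefixDir
}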

@jorgee
Contributor

jorgee commented Jun 6, 2025

I have reviewed the code and it is simpler than that. If there is an error, the environment directory is removed to avoid corrupted environments, so I just needed to re-check the conda environment directory, and that solved the issue.
