POC: Global Cache#6100

Draft
jorgee wants to merge 1 commit into master from global-cache-playground

@jorgee
Contributor

@jorgee jorgee commented May 19, 2025

This PR is a proof of concept to validate whether a global cache can share task caching between executions. It forces the same cloud cache, work directory and session ID for a given global cache. With it, Nextflow can detect when the same task is executed by different workflows without using the -resume option.

The -with-globalcache=<cloud_path> option overrides the cloud cache path with <cloud_path>/.nf-global-cache and the work directory with <cloud_path>/work, and creates a session ID by hashing the global cache path. This option implies -resume.
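
A minimal sketch of how a single path could determine all three settings. The helper name global_cache_layout is hypothetical, and using a name-based UUID (uuid5, which hashes its input with SHA-1) for the session ID is an assumption; the PR only says the session ID is created by hashing the global cache path.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalCacheLayout:
    cache_path: str
    work_dir: str
    session_id: uuid.UUID


def global_cache_layout(cloud_path: str) -> GlobalCacheLayout:
    """Derive cache path, work dir and session ID from one global cache path.

    Hypothetical helper: uuid5 over the normalized path stands in for
    whatever hash the PR actually uses.
    """
    base = cloud_path.rstrip("/")  # "s3://b/gc" and "s3://b/gc/" map to the same cache
    return GlobalCacheLayout(
        cache_path=f"{base}/.nf-global-cache",
        work_dir=f"{base}/work",
        # Same path -> same session ID, so every run shares one cache session.
        session_id=uuid.uuid5(uuid.NAMESPACE_URL, base),
    )


if __name__ == "__main__":
    print(global_cache_layout("s3://my-bucket/global-cache"))
```

Because the session ID is a pure function of the path, any two runs pointed at the same location end up in the same cache session without coordinating.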

I tested concurrent execution of the same task in two different workflows: the second run detects that the task is already running and generates a new hash. When a third workflow with the same task is executed after that task has completed, it detects that the task is cached and does not run it again.
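
The detect-and-rehash behaviour can be sketched roughly as follows, assuming a local POSIX filesystem where os.mkdir is atomic. The function names, the md5-based rehash with an attempt counter, and treating a .exitcode file as a cache hit are illustrative assumptions; Nextflow's real check also consults the exit status and the cache database.

```python
import hashlib
import os
import tempfile


def task_dir_for(work_dir: str, task_hash: str) -> str:
    # Split like a Nextflow work dir: first two hex chars, then the rest.
    return os.path.join(work_dir, task_hash[:2], task_hash[2:])


def rehash(task_hash: str, attempt: int) -> str:
    # Hypothetical: mix an attempt counter into the original hash.
    return hashlib.md5(f"{task_hash}:{attempt}".encode()).hexdigest()


def resolve_task(work_dir: str, task_hash: str, max_tries: int = 10):
    """Return (hash, task_dir, cached) for a task in a shared work dir."""
    h = task_hash
    for attempt in range(1, max_tries + 1):
        path = task_dir_for(work_dir, h)
        # A completed task leaves a .exitcode file: treat it as a cache hit
        # (a real check would also verify the exit status).
        if os.path.exists(os.path.join(path, ".exitcode")):
            return h, path, True
        os.makedirs(os.path.dirname(path), exist_ok=True)
        try:
            # Atomic on a local POSIX filesystem: only one run can create it.
            os.mkdir(path)
            return h, path, False
        except FileExistsError:
            # Another run owns this directory: try a new hash.
            h = rehash(task_hash, attempt)
    raise RuntimeError(f"no free task directory after {max_tries} tries")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as wd:
        first = resolve_task(wd, "ab" * 16)
        second = resolve_task(wd, "ab" * 16)  # same task, dir still in use
        print(first[2], second[2], first[0] != second[0])
```

The sketch relies on directory creation being an atomic claim. On object storage such as S3 there is no atomic mkdir, so two runs can both believe they own the directory.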

@bentsherman
Member

@jorgee since there is renewed interest in the global cache and you might be working on this soon, I recommend implementing a TaskHasherV1 / TaskHasherV2 and adding a setting (e.g. an environment variable) to select which task hash to use. Then TaskHasherV2 simply needs to exclude things like the session ID, process name, etc.
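
A rough sketch of the two-hasher idea, in Python rather than Nextflow's Groovy. TaskSpec, the SHA-256 digest, and the NXF_TASK_HASHER variable name are all hypothetical; the comment only suggests "e.g. environment var", and Nextflow's real task hash covers more inputs than shown here.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Mapping, Sequence, Tuple


@dataclass(frozen=True)
class TaskSpec:
    session_id: str
    process_name: str
    script: str
    inputs: Tuple[str, ...]  # fingerprints of the task inputs


def _digest(parts: Sequence[str]) -> str:
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")  # separator so ("ab", "c") != ("a", "bc")
    return h.hexdigest()


class TaskHasherV1:
    """Run-scoped hash: includes session ID and process name."""

    def compute(self, task: TaskSpec) -> str:
        return _digest([task.session_id, task.process_name, task.script, *task.inputs])


class TaskHasherV2:
    """Global hash: only what determines the task's output."""

    def compute(self, task: TaskSpec) -> str:
        return _digest([task.script, *task.inputs])


def select_hasher(env: Mapping[str, str] = os.environ):
    # Hypothetical setting name; V1 stays the default.
    return TaskHasherV2() if env.get("NXF_TASK_HASHER") == "v2" else TaskHasherV1()
```

With V2, the same script and inputs run under a different session or process name produce the same hash, which is exactly what a cross-run cache needs. Keeping V1 as the default leaves existing -resume behaviour untouched.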

@bentsherman bentsherman force-pushed the global-cache-playground branch from 1f07b36 to c9cea71 January 30, 2026 22:26
@bentsherman bentsherman force-pushed the global-cache-playground branch from c9cea71 to dc6ae45 January 30, 2026 23:42
@bentsherman
Member

Updates:

  • use NXF_GLOBALCACHE_PATH environment var instead of CLI option
  • set session ID to all zeros
  • remove session ID and process name from task hash when global caching is enabled
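
The updated behaviour might be modelled like this. GlobalCacheSettings and global_cache_settings are hypothetical names, and keeping the .nf-global-cache and work sub-paths from the original -with-globalcache description is an assumption, since the update list does not restate them.

```python
import uuid
from dataclasses import dataclass
from typing import Mapping, Optional

# All-zero session ID shared by every run that uses the global cache.
ZERO_SESSION_ID = uuid.UUID(int=0)


@dataclass(frozen=True)
class GlobalCacheSettings:
    cache_path: str
    work_dir: str
    session_id: uuid.UUID
    hash_includes_session_and_process: bool


def global_cache_settings(env: Mapping[str, str]) -> Optional[GlobalCacheSettings]:
    path = env.get("NXF_GLOBALCACHE_PATH")
    if not path:
        return None  # variable unset: keep normal per-run behaviour
    base = path.rstrip("/")
    return GlobalCacheSettings(
        cache_path=f"{base}/.nf-global-cache",  # assumed: same layout as the original option
        work_dir=f"{base}/work",
        session_id=ZERO_SESSION_ID,
        hash_includes_session_and_process=False,
    )
```

A fixed all-zero ID removes any dependence on the path string: every run sees the same session no matter how the location is spelled.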

@bentsherman
Member

Extra note: set process.cache = 'deep' alongside this to enable global content-based caching.
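
In Nextflow, the default ('standard') cache mode fingerprints input files by path, size and last-modified timestamp, while 'deep' hashes file content. The sketch below mimics that difference in plain Python (the function names and SHA-256 are illustrative, not Nextflow's hasher). It shows why deep mode matters here: identical data staged at different paths only matches in deep mode.

```python
import hashlib
import os
import tempfile


def standard_fingerprint(path: str) -> str:
    # Metadata only: absolute path, size and modification time.
    st = os.stat(path)
    key = f"{os.path.abspath(path)}|{st.st_size}|{st.st_mtime_ns}"
    return hashlib.sha256(key.encode()).hexdigest()


def deep_fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    # Content only: the path and timestamps do not affect the result.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        paths = []
        for d in (a, b):
            p = os.path.join(d, "reads.fq")
            with open(p, "w") as f:
                f.write("@r1\nACGT\n+\nIIII\n")
            paths.append(p)
        print(standard_fingerprint(paths[0]) == standard_fingerprint(paths[1]))  # False
        print(deep_fingerprint(paths[0]) == deep_fingerprint(paths[1]))  # True
```

The trade-off is cost: deep mode reads every input file in full, which can be slow for large inputs.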

@jorgee jorgee mentioned this pull request Feb 3, 2026
@bentsherman
Copy link
Member

Note about concurrency

There is a slight chance that two Nextflow runs try to run the same task at the exact same time, in which case they will use the same task directory and clobber each other.

We don't know how likely it is, but at a high enough scale I suspect it could happen often enough to be very annoying. So this POC is only recommended for testing at small scale; otherwise, be prepared for weird concurrency errors.

@pditommaso pditommaso force-pushed the master branch 2 times, most recently from d9fa5cd to d752bc2 February 28, 2026 13:10