Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Singularity OCI mode #4440

Merged
merged 16 commits into from
Nov 6, 2023
Merged

Add support for Singularity OCI mode #4440

merged 16 commits into from
Nov 6, 2023

Conversation

pditommaso
Copy link
Member

@pditommaso pditommaso commented Oct 25, 2023

This PR adds support for the OCI-mode provided by Singularity 4. More details at this link.

Guidelines for testing

  1. Install Singularity 4 and runc (see here for details)

  2. enable the singularity OCI mode in the nextflow config

    singularity.enabled = true
    singularity.oci = true
    
  3. run a test pipeline

    nextflow run hello
    

@netlify
Copy link

netlify bot commented Oct 25, 2023

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit dbbc8e6
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/6549112d5a7a3300086a2314

@pditommaso pditommaso linked an issue Oct 25, 2023 that may be closed by this pull request
@pditommaso
Copy link
Member Author

@emi80 you may want to give a try to this branch

@emi80
Copy link
Contributor

emi80 commented Oct 25, 2023

Hey @pditommaso, thanks a lot! I see you eventually had to do it by yourself, as usual... You are too fast!

I will have a try at this tomorrow and let you know!

Signed-off-by: Paolo Di Tommaso <[email protected]>
@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

Hey @pditommaso,

tryed to run as from your guideline and got the following error on an Ubuntu 22.04 machine and singularity-ce version 4.0.1-jammy:

Command error:
  FATAL:   While initializing: while applying crun cgroup workaround: couldn't create cgroup manager: rootless cgroups require a D-Bus session - check that XDG_RUNTIME_DIR and DBUS_SESSION_BUS_ADDRESS are set

It seems like exporting those two env variables in the command makes it work.

@pditommaso
Copy link
Member Author

oohh, should not those vars be provided by the host environment?

@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

Yes, but the Singularity command in .command.run is prepended with env - and I think that removes the vars from the environment...

@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

also, I am not sure how much this is reproducible on other host systems

@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

If I run the command:

set +u; env - XDG_RUNTIME_DIR="$XDG_RUNTIME_DIR" DBUS_SESSION_BUS_ADDRESS="$DBUS_SESSION_BUS_ADDRESS" PATH="$PATH" ${TMP:+SINGULARITYENV_TMP="$TMP"} ${TMPDIR:+SINGULARITYENV_TMPDIR="$TMPDIR"} ${NXF_TASK_WORKDIR:+SINGULARITYENV_NXF_TASK_WORKDIR="$NXF_TASK_WORKDIR"} singularity exec --no-home --pid --oci -B /data/epalumbo/test-nf/oci/work/45/55fcd590ebc3c1dc5d7c9d891bf748 docker://quay.io/nextflow/bash /bin/bash -ue /data/epalumbo/test-nf/oci/work/45/55fcd590ebc3c1dc5d7c9d891bf748/.command.sh

it pulls the image, but then gives me the following error:

❯ bash .command.run
INFO:    Using cached OCI-SIF image
INFO:    --oci runtime uses a PID namespace by default, pid flag is redundant.
chdir: No such file or directory

@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

Removing --no-home or adding -B $HOME works

@pditommaso
Copy link
Member Author

I think that removes the vars from the environment

ok, then adding those to the singularity.envWhitelist should solve it

https://www.nextflow.io/docs/latest/config.html#scope-singularity

Removing --no-home or adding -B $HOME works

Independently of the above variables?

@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

ok, then adding those to the singularity.envWhitelist should solve it

This adds ${XDG_RUNTIME_DIR:+SINGULARITYENV_XDG_RUNTIME_DIR="$XDG_RUNTIME_DIR"} ${DBUS_SESSION_BUS_ADDRESS:+SINGULARITYENV_DBUS_SESSION_BUS_ADDRESS="$DBUS_SESSION_BUS_ADDRESS"} but it does not work. Still getting:

Command error:
  FATAL:   While initializing: while applying crun cgroup workaround: couldn't create cgroup manager: rootless cgroups require a D-Bus session - check that XDG_RUNTIME_DIR and DBUS_SESSION_BUS_ADDRESS are set

That is because it prepends SINGULARITYENV_ to the variable names.

Independently of the above variables?

No, XDG_RUNTIME_DIR and DBUS_SESSION_BUS_ADDRESS must be set

@pditommaso
Copy link
Member Author

Ok, those variables can added explicitly, but are those names portable?

Also, I'm still confused by the -B $HOME binding. Why is that necessary?

@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

As far as I understand those variables are standard and are needed to use cgroups v2.

Also, I'm still confused by the -B $HOME binding. Why is that necessary?

Maybe this is something related to the pipeline I am running: nextflow-io/hello.

@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

I can confirm that with nextflow-io/rnaseq-nf the -B $HOME binding is not needed.

@pditommaso
Copy link
Member Author

Ok, i'll just add the support for XDG_RUNTIME_DIR and DBUS_SESSION_BUS_ADDRESS env vars then

@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

Thanks!

It seems that rnaseq-nf works because the bind -B /home/epalumbo/.nextflow/assets/nextflow-io/rnaseq-nf is added to the Singularity command in .command.run while this is not done with hello.

Indeed, it seems to be an issue with Singularity and OCI:

❯ singularity exec --no-home docker://ubuntu echo Hello
INFO:    Using cached SIF image
Hello

❯ singularity exec --no-home --oci docker://ubuntu echo Hello
INFO:    Using cached OCI-SIF image
2023-10-26T14:12:46.000537361Z: chdir: No such file or directory

@pditommaso
Copy link
Member Author

You can use NXF_SINGULARITY_HOME_MOUNT=true to enable it

@pditommaso
Copy link
Member Author

Ok, i'll just add the support for XDG_RUNTIME_DIR and DBUS_SESSION_BUS_ADDRESS env vars then

But HOW?! 😆

@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

I opened sylabs/singularity#2289. Let's see what they tell me...

@pditommaso
Copy link
Member Author

Where did you added XDG_RUNTIME_DIR and DBUS_SESSION_BUS_ADDRESS vars to test it?

@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

It's a problem with the low-level runtime that Singularity uses in oci mode.

See sylabs/singularity#2289 (comment).

@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

Where did you added XDG_RUNTIME_DIR and DBUS_SESSION_BUS_ADDRESS vars to test it?

I put them before PATH in the singularity command in .command.run:

set +u; env - XDG_RUNTIME_DIR="$XDG_RUNTIME_DIR" DBUS_SESSION_BUS_ADDRESS="$DBUS_SESSION_BUS_ADDRESS" PATH="$PATH ..." 

@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

It looks like setting --cwd in the SIngularity command would fix the --no-home issue and would be more portable as it would work with different OCI runtimes.

See sylabs/singularity#2289 (comment)

@pditommaso
Copy link
Member Author

If it's a singularity problem, I'm inclined to ignore it, and only add the variables

@emi80
Copy link
Contributor

emi80 commented Oct 26, 2023

Indeed, it's actually an OS problem so you can safely ignore it 👍

Signed-off-by: Paolo Di Tommaso <[email protected]>
@pditommaso
Copy link
Member Author

-1 for latest changes (apart of course for compact -> compat renamin).

I agree to default --no-compact, but considering this is a new feature can be useful to keep control over it.

SingularityCache should be disabled when OCI mode is enabled

@marcodelapierre
Copy link
Member

I agree to default --no-compat, but considering this is a new feature can be useful to keep control over it.

then we should put a clear warning on the docs, --oci --compat may result in unexpected behaviours and pipeline failure.

@marcodelapierre
Copy link
Member

Otherwise OK with what you say - see also Sylabs Dave's comments around OCI mode in Singularity - I did not have a clear enough picture of the scope for OCI mode.

We should aim to have --oci to work without --no-compat .

I will work on it, and I think we're on track to finalise this on Monday.

@pditommaso
Copy link
Member Author

Ok, now I've realised that --oci implies --compat by default (i had understood the opposite)

What's the side effect of using the same default when OCI is enabled on our side?

Signed-off-by: Paolo Di Tommaso <[email protected]>
@pditommaso
Copy link
Member Author

@marcodelapierre moved your changes to #4483.

@pditommaso
Copy link
Member Author

I'm testing this branch with classic nf-core/rnaseq + Singularity OCI mode enabled (using defaults) + Wave and it looks working fine.

Instead, some problems are reported when enabling the Fusion support. I've opened an issue specific for it separately

@marcodelapierre
Copy link
Member

I am about to do a commit that should fix some issues, not sure about yours, hold on

@marcodelapierre
Copy link
Member

marcodelapierre commented Nov 6, 2023

So @pditommaso @emi80 I confirm that Dave helped me build a clearer and correct picture of OCI mode in Singularity 4 in this Github issue: the key aspect is that by definition, OCI mode aims to have Singularity behave as close as possible to Docker.

I only have one outstanding little fix to propose, and will then perform some extra tests.

@pditommaso
Copy link
Member Author

Ok, please make a separate PR.

@marcodelapierre
Copy link
Member

marcodelapierre commented Nov 6, 2023

Here we go @pditommaso : #4484

This fixes issues experienced with certain pipelines (eg blast-example), where the changed behaviour of the work directory (OCI compliant) causes the pipeline to run in the wrong directory and hence crash.

I am wondering whether this will also fix the Fusion issues .. although I don't know their nature at the moment.

@marcodelapierre
Copy link
Member

Ok, let's go ahead with this. Can you please make this test by adding in your nextflow config this line to make sure it's solving the problem?

@pditommaso just saw this bit of thread right now. It is the usual issue with conda/mamba base images not being designed for Singularity. We have gone through this before, and now I actually have 2 PRs opened with Conda and Mamba devs to try and fix these issues (for Mamba: mamba-org/micromamba-docker#381).

Ultimately, it is the very same issue with the rnaseq-nf container: https://github.com/nextflow-io/rnaseq-nf/blob/master/docker/Dockerfile#L13

The same workaround can be applied inside Wave when building the container for the conda strategy .

@pditommaso
Copy link
Member Author

Indeed, the same patch has been applied to the Wave build for Docker seqeralabs/libseqera@348cf29

@marcodelapierre
Copy link
Member

.... and #4484 also ready for review

Copy link
Member

@marcodelapierre marcodelapierre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • One change on compat option, file modules/nextflow/src/main/groovy/nextflow/container/SingularityBuilder.groovy

  • Test files may need updating as a consequence: modules/nextflow/src/test/groovy/nextflow/container/SingularityBuilderTest.groovy , modules/nextflow/src/test/groovy/nextflow/executor/BashWrapperBuilderTest.groovy

Note I would NOT document the compat option, should only be for internal debug use

@marcodelapierre
Copy link
Member

after chat: Let's remove the compat option completely

Signed-off-by: Paolo Di Tommaso <[email protected]>
@pditommaso
Copy link
Member Author

Done

@pditommaso
Copy link
Member Author

For sake of consistency of this PR, I've added also the fix for the singularity cd using $PWD here, see dbbc8e6.

Then, we'll merge #4484 over master to propagate the use of TASK_WORK_DIR

@pditommaso
Copy link
Member Author

ok, let's merge this

@pditommaso pditommaso merged commit f5362a7 into master Nov 6, 2023
19 checks passed
@pditommaso pditommaso deleted the singularity-oci branch November 6, 2023 17:50
@marcodelapierre marcodelapierre self-assigned this Nov 7, 2023
@marcodelapierre marcodelapierre added this to the 23.11.0-edge milestone Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for Singularity OCI-mode
5 participants