flux GPUs #1554

Merged
merged 1 commit into from
Jun 20, 2025

bgruening (Member)

@sanjaysrikakulam this seems to work for me.

We could also pass $_CONDOR_AssignedGPUs to --gpus, but the quoting for that seems to be very tricky.

docker/docs#11010
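For context on the quoting problem: Docker parses the --gpus value as comma-separated fields, so its documentation wraps a multi-device list in an extra pair of double quotes, e.g. --gpus '"device=0,2"'. Below is a minimal Python sketch of building that argument when the GPU list is already known; the helper name gpus_flag is hypothetical. It does not solve the case discussed here, because $_CONDOR_AssignedGPUs is only known on the execute node and has to be expanded by the shell there. The sketch also assumes the list holds plain device indices, whereas HTCondor may report device names instead, depending on version and configuration.

```python
import shlex


def gpus_flag(assigned: str) -> str:
    """Hypothetical helper: build a docker --gpus flag for a known comma-separated device list."""
    devices = ",".join(d.strip() for d in assigned.split(",") if d.strip())
    if not devices:
        raise ValueError("no GPU devices given")
    # --gpus is parsed as CSV, so a multi-device list must itself be wrapped in double quotes
    value = f'"device={devices}"'
    # shlex.quote adds single quotes so the shell passes the inner double quotes through to Docker
    return f"--gpus {shlex.quote(value)}"


print(gpus_flag("0,2"))  # --gpus '"device=0,2"'
```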

sanjaysrikakulam (Member) left a comment

I tried to dig a bit and did not come up with a cleaner HTCondor solution for this. Moving forward, for multi-GPU destinations with Docker-containerized tools, we should explicitly add this ENV to ensure each tool sticks to its assigned GPU, rather than all tools/jobs using GPU index 0.
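Why the ENV matters: CUDA numbers the devices a process can see starting from 0, and CUDA_VISIBLE_DEVICES selects which physical GPUs are visible and in what order. Without it, every container started with --gpus all sees every GPU, so tools that simply use device 0 all land on the same physical card. The following is a simplified Python model of that mapping. It handles integer indices only (the real variable also accepts GPU UUIDs), and the function name is hypothetical.

```python
from typing import Dict, Optional


def visible_gpu_map(cuda_visible_devices: Optional[str], total_gpus: int) -> Dict[int, int]:
    """Simplified model: map the logical device index a CUDA program sees to the physical GPU index."""
    if cuda_visible_devices is None:
        # Variable unset: every GPU on the node is visible, in physical order
        physical = list(range(total_gpus))
    else:
        # Only the listed GPUs are visible, renumbered from 0 in the listed order
        physical = [int(x) for x in cuda_visible_devices.split(",") if x.strip()]
    return dict(enumerate(physical))


# Two jobs on a 4-GPU node, both asking for "device 0"
print(visible_gpu_map(None, 4)[0])  # 0
print(visible_gpu_map("2", 4)[0])   # 2
```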

sanjaysrikakulam merged commit 94dc58a into master on Jun 20, 2025
4 checks passed
sanjaysrikakulam deleted the bgruening-patch-1 branch on June 20, 2025, 19:41

bgruening (Member, Author)

I guess there is no harm in adding it to the default Docker destination?

sanjaysrikakulam (Member)

> I guess there is no harm in adding it to the default Docker destination?

I am unsure if TPV merges the docker_run_extra_arguments and singularity_run_extra_arguments when they are specified in multiple places (destinations, tools, tool_defaults, users, roles, etc.).

sanjaysrikakulam (Member) commented Jun 23, 2025

I had a thought over the weekend on how we could add this to the destinations. We could try something like the following:

rules:
  - id: has_gpus_all_flag
    if: "'--gpus all' in (entity.params.get('docker_run_extra_arguments') or '')"
    params:
      docker_run_extra_arguments: "{{ entity.params.get('docker_run_extra_arguments') or '' }} --env CUDA_VISIBLE_DEVICES=$_CONDOR_AssignedGPUs"

The above is a generic solution; if we have a GPU-dedicated destination, then we don't have to check for --gpus all explicitly. This needs to be tested, though.
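To reason about the rule before trying it in TPV, here is a plain-Python model of the intended behaviour. The CUDA_VISIBLE_DEVICES flag is appended to the existing docker_run_extra_arguments only when that value contains --gpus all. This model does not use TPV's API or its templating. The function name and the extra guard against appending the flag twice are assumptions added for illustration. The $_CONDOR_AssignedGPUs reference stays a literal string because it is meant to be expanded by the shell on the execute node.

```python
from typing import Any, Dict

# Literal string: the shell on the execute node expands $_CONDOR_AssignedGPUs at job runtime
ENV_FLAG = "--env CUDA_VISIBLE_DEVICES=$_CONDOR_AssignedGPUs"


def apply_gpu_rule(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical model of the has_gpus_all_flag rule; returns a new params dict."""
    existing = params.get("docker_run_extra_arguments") or ""
    # Mirrors the rule's `if:` condition
    if "--gpus all" not in existing:
        return dict(params)
    # Assumed extra guard: do not append the flag a second time if it is already present
    if "CUDA_VISIBLE_DEVICES=" in existing:
        return dict(params)
    updated = dict(params)
    updated["docker_run_extra_arguments"] = f"{existing} {ENV_FLAG}"
    return updated


print(apply_gpu_rule({"docker_run_extra_arguments": "--gpus all"}))
```

Applying the model to its own output leaves the output unchanged. That is one property worth checking when the real rule is tested, given the open question of how TPV combines these arguments across destinations, tools, users, and roles.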
