Fix issue 390 support different machine types on gcp #451
Conversation
dask_cloudprovider/gcp/instances.py
Outdated
if ngpus is not None:
    self.scheduler_options["ngpus"] = 0
    self.scheduler_options["gpu_type"] = None
    self.scheduler_options["gpu_instance"] = False
@gmiasnychenko should we always set the scheduler GPU settings, regardless of the number of GPUs?
Also, please leave a comment noting that we don't run tasks on the scheduler, so we don't need a GPU there.
As for setting the GPU settings, I believe the answer is yes. All the settings go into self.options, which is the base for the later self.scheduler_options and self.worker_options. If we don't override the scheduler GPU settings, they keep the values set above, and we end up with the same configuration for both the scheduler and the workers.
I can move the override outside the if statement, if that's what you mean. It provides more clarity, but it should be functionally the same.
I agree with providing more documentation. I will add it to the ngpus and gpu_type argument descriptions.
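For concreteness, a minimal sketch of what moving that override outside the if statement might look like. The attribute names (self.options, self.scheduler_options, self.worker_options) come from this discussion; the placement and surrounding code are assumptions about the implementation, not the actual patch:

# Fragment from inside __init__: both option dicts start from the shared
# self.options, per the discussion above.
self.scheduler_options = {**self.options}
self.worker_options = {**self.options}

# Assumption at this point in the thread: no tasks run on the scheduler, so it
# is kept CPU-only by unconditionally overriding its GPU settings, regardless
# of the worker GPU configuration.
self.scheduler_options["ngpus"] = 0
self.scheduler_options["gpu_type"] = None
self.scheduler_options["gpu_instance"] = False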
@@ -603,7 +613,14 @@ def __init__(
            bootstrap if bootstrap is not None else self.config.get("bootstrap")
        )
        self.machine_type = machine_type or self.config.get("machine_type")
        self.gpu_instance = "gpu" in self.machine_type or bool(ngpus)
        if machine_type is None:
@gmiasnychenko it would be great if we could check that machine_type is set XOR scheduler_machine_type/worker_machine_type; otherwise, we should throw an error. It should be a backward-compatible check.
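A hedged sketch of such a check, assuming the constructor parameters discussed in this thread (machine_type, scheduler_machine_type, worker_machine_type); the placement and error wording are illustrative only:

# Inside __init__: reject mixing the single machine_type with the per-role
# options, while leaving the existing machine_type-only path untouched.
if machine_type is not None and (
    scheduler_machine_type is not None or worker_machine_type is not None
):
    raise ValueError(
        "Specify either machine_type or "
        "scheduler_machine_type/worker_machine_type, not both."
    )

Because the error is raised only when both styles are mixed, existing configurations that set only machine_type keep working, which keeps the check backward compatible.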
- added info on GPU logic in docs
- adjusted scheduler GPU logic
- fixed the machine type checker
Thanks for this, seems like a great improvement!
dask_cloudprovider/gcp/instances.py
Outdated
@@ -445,10 +453,11 @@ class GCPCluster(VMCluster):
    extra_bootstrap: list[str] (optional)
        Extra commands to be run during the bootstrap phase.
    ngpus: int (optional)
        The number of GPUs to attach to the instance.
        The number of GPUs to attach to the worker instance. No work is expected to be done on the scheduler, so no GPUs are attached to it.
This isn't true. Due to the way that Dask uses pickle to move things around, there are cases where the scheduler might deserialize a meta object, which may try to allocate a small amount of GPU memory. It's always recommended to have a small GPU available on the scheduler.
https://docs.rapids.ai/deployment/stable/guides/scheduler-gpu-requirements/
Thank you for the feedback!
While it makes sense to have a GPU on the scheduler to avoid these issues, I think it would be beneficial to allow some flexibility in the configuration. Some users might want different GPU configurations (e.g., a smaller/cheaper GPU on the scheduler vs. more powerful ones on workers), or in some cases might want to explicitly disable scheduler GPUs for cost reasons despite the potential pickle issues.
I've updated the PR to support both approaches:
- Unified configuration (existing behavior): ngpus and gpu_type apply to both the scheduler and the workers
- Separate configuration (new): scheduler_ngpus/scheduler_gpu_type and worker_ngpus/worker_gpu_type for fine-grained control
The default behavior remains the same (the same GPU configuration for both), but users now have the flexibility to choose different configurations when needed. I've also updated the documentation to mention the scheduler GPU requirements you referenced.
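A usage sketch, assuming the separate options above end up as GCPCluster constructor arguments with those names; the GPU types, counts, and worker count are illustrative only:

from dask_cloudprovider.gcp import GCPCluster

# Illustrative only: a small, cheap GPU on the scheduler (enough for occasional
# deserialization of GPU-backed objects) and larger GPUs on the workers.
cluster = GCPCluster(
    scheduler_ngpus=1,
    scheduler_gpu_type="nvidia-tesla-t4",
    worker_ngpus=2,
    worker_gpu_type="nvidia-tesla-v100",
    n_workers=2,
)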
Yup totally agree with all of that!
- Maintain existing GPU logic
- Add ability to specify different GPU configurations for workers and scheduler
One small nitpick.
- Updating the documentation to remove outdated GPU references
As per #390, there is a feature request to allow choosing different machine types on GCP. Here I tried to implement that, and to have only the workers use GPUs.
I used #369 as a reference.
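A usage sketch for the machine-type part of the feature, assuming scheduler_machine_type and worker_machine_type are exposed as GCPCluster constructor arguments as discussed in the review; the machine type values and worker count are illustrative only:

from dask_cloudprovider.gcp import GCPCluster

# Illustrative only: a lightweight scheduler VM and larger worker VMs.
cluster = GCPCluster(
    scheduler_machine_type="n1-standard-4",
    worker_machine_type="n1-standard-16",
    n_workers=4,
)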