Search before asking
- I have searched the issues and found no similar feature request.
Description
When using GPU sharing on Aliyun Kubernetes, the GPU resource key is aliyun.com/gpu-mem rather than a key ending in "gpu":
workerGroupSpecs:
  resources:
    limits:
      aliyun.com/gpu-mem: "1"
      cpu: "1"
      memory: 2Gi
    requests:
      aliyun.com/gpu-mem: "1"
      cpu: "1"
      memory: 2Gi
With this configuration, the autoscaler does not scale up when a task requests a GPU resource. It reports:
(autoscaler +3m13s) Error: No available node types can fulfill resource request {'GPU': 1.0, 'CPU': 1.0}. Add suitable node types to this cluster to resolve this issue.
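A possible explanation, offered as an assumption rather than confirmed behavior: if the autoscaler derives a worker group's Ray `GPU` count by looking for Kubernetes resource keys that end in `gpu`, then `aliyun.com/gpu-mem` would not match and the group would be advertised with zero GPUs. The sketch below illustrates that hypothesis only. `infer_num_gpus` is a hypothetical helper, not KubeRay or Ray code, and the `num-gpus` override it models assumes the worker group's `rayStartParams` can carry that value.

```python
from typing import Dict, Optional


def infer_num_gpus(
    k8s_resources: Dict[str, Dict[str, str]],
    ray_start_params: Optional[Dict[str, str]] = None,
) -> int:
    """Hypothetical helper: guess the Ray GPU count for a worker group.

    Assumption: an explicit "num-gpus" start parameter wins; otherwise the
    first Kubernetes resource key ending in "gpu" is used.
    """
    ray_start_params = ray_start_params or {}
    if "num-gpus" in ray_start_params:
        return int(ray_start_params["num-gpus"])
    for section in ("limits", "requests"):
        for key, quantity in k8s_resources.get(section, {}).items():
            if key.endswith("gpu"):
                return int(quantity)
    return 0


# Resources copied from the worker group in this issue.
worker_resources = {
    "limits": {"aliyun.com/gpu-mem": "1", "cpu": "1", "memory": "2Gi"},
    "requests": {"aliyun.com/gpu-mem": "1", "cpu": "1", "memory": "2Gi"},
}

# "aliyun.com/gpu-mem" does not end in "gpu", so no GPU is inferred.
print(infer_num_gpus(worker_resources))
# With an explicit override, the group would advertise one GPU.
print(infer_num_gpus(worker_resources, {"num-gpus": "1"}))
```

If this assumption holds, the error above would follow directly: the task asks for `{'GPU': 1.0}` while no node type advertises any GPU.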
Code to reproduce:
import ray
import time

ray.init()

@ray.remote(num_gpus=1)
def gpu_task():
    import torch
    x = torch.rand(10000, 10000).cuda()
    y = torch.mm(x, x)
    return y.sum().item()

future = gpu_task.remote()
result = ray.get(future)
print("Result:", result)
ray.shutdown()
Use case
No response
Related issues
none
Are you willing to submit a PR?
- Yes I am willing to submit a PR!