I have been trying for a few days now and I can't get k3s to use the GPU. I am basically just following the documentation of the NVIDIA gpu-operator:

1. I changed these two env vars in `nvidia-gpu-operator.yaml` to point to the bundled containerd:

   ```yaml
   toolkit:
     env:
     - name: CONTAINERD_CONFIG
       value: /var/lib/rancher/k3s/agent/etc/containerd/config.toml
     - name: CONTAINERD_SOCKET
       value: /run/k3s/containerd/containerd.sock
   ```

2. Install the gpu-operator:

   ```sh
   helm install --wait nvidia-gpu-operator \
     -n gpu-operator --create-namespace \
     nvidia/gpu-operator \
     -f nvidia-gpu-operator.yaml
   ```

3. Wait for everything to finish installing (this is running on a fresh Ubuntu 20.04 cloud image).
4. Some things I checked:
5. Create a pod (deployment in this case for easier iterations):

   ```yaml
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: sample
     labels:
       app: sample
   spec:
     replicas: 1
     strategy:
       type: Recreate
     selector:
       matchLabels:
         app: sample
     template:
       metadata:
         labels:
           app: sample
       spec:
         containers:
         - name: sample
           image: nvcr.io/nvidia/cuda:11.6.2-base-ubuntu20.04
           command: [nvidia-smi]
           resources:
             limits:
               nvidia.com/gpu: 1
   ```

6. No nvidia-smi and no nvidia-container-runtime. Now I get this error message:
This also happens when instead using ... So basically it's not using the nvidia runtime. I've been at this for quite a while now and I cannot figure out what I missed.
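For reference, a few checks that can help confirm whether the operator actually wired the nvidia runtime into a k3s node. This is only a sketch, assuming the default `gpu-operator` namespace and the k3s paths used above:

```sh
# All gpu-operator pods (toolkit, device-plugin, validators, ...) should be Running or Completed
kubectl get pods -n gpu-operator

# The gpu-operator normally registers a RuntimeClass (typically named "nvidia")
kubectl get runtimeclass

# The toolkit should have added an nvidia runtime entry to k3s's bundled containerd config
grep -A3 nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml

# The node should advertise the GPU as an allocatable resource
kubectl describe nodes | grep -i 'nvidia.com/gpu'
```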
Wow and of course seconds after writing this, something I tested works:
works.
Now I just need to figure out why I have to specify this manually and why none of the examples in the NVIDIA docs include it.
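The exact snippet that made it work isn't shown above. On k3s, the piece that usually has to be added by hand is a `runtimeClassName` pointing at the RuntimeClass the gpu-operator creates (named `nvidia` by default), since k3s's bundled containerd does not use the nvidia runtime as its default. A minimal sketch of the pod template, assuming that is the change that was tested:

```yaml
    # in the Deployment's pod template (spec.template.spec) from step 5
    spec:
      runtimeClassName: nvidia   # assumption: the RuntimeClass created by the gpu-operator
      containers:
      - name: sample
        image: nvcr.io/nvidia/cuda:11.6.2-base-ubuntu20.04
        command: [nvidia-smi]
        resources:
          limits:
            nvidia.com/gpu: 1
```

If the per-pod field is undesirable, the toolkit also accepts a `CONTAINERD_SET_AS_DEFAULT` env var (alongside `CONTAINERD_CONFIG` and `CONTAINERD_SOCKET`) that makes the nvidia runtime containerd's default, so plain pod specs pick it up without a `runtimeClassName`.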