[BUG] GPU Acceleration not functioning with CUDA #3138
Comments
Oh also the logs from the pod when the model was loaded:
So I finally got it working. The documentation for CUDA is incomplete and outdated. Given that DJL 0.28.0 is now being used, the documentation about CUDA 11.6 and PyTorch 1.13.1 is incorrect. The versions should instead come from the supported list from here: I was able to use PyTorch 2.2.2 and CUDA 12.1 to finally get it launched. There's also quite a lot missing from the docs for building a working Docker image. Here's what I have that finally worked:
@justicel glad that it's working, can you close this issue if it's resolved?
What is the bug?
Attempting to follow the sparse instructions for GPU acceleration using CUDA, here: https://opensearch.org/docs/latest/ml-commons-plugin/gpu-acceleration/
The instructions are mostly written for Neuron, but they suggest that CUDA can definitely be used as well. I built a custom Docker image using the attached Dockerfile. It contains OpenSearch, CUDA and PyTorch.
However, on attempting to load a pre-built model it never shows as utilizing the GPU (via nvidia-smi output):
The 'no running processes' message definitely means nothing was loaded into the GPU.
I'm uncertain whether something is simply missing or whether this is expected.
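For anyone who wants to script the same check instead of eyeballing the `nvidia-smi` table, here is a minimal sketch. It assumes `nvidia-smi` is on the PATH inside the pod and uses its `--query-compute-apps` CSV output. The function names `parse_compute_apps` and `gpu_compute_apps` are made up for this example and are not part of OpenSearch or ML Commons.

```python
import csv
import io
import subprocess


def parse_compute_apps(csv_text):
    """Parse `pid, process_name, used_memory` CSV rows into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pid, name, mem = (f.strip() for f in fields[:3])
        rows.append({"pid": pid, "process_name": name, "used_memory": mem})
    return rows


def gpu_compute_apps():
    """Ask nvidia-smi which compute processes currently hold GPU memory."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi is missing a driver or fails
    )
    return parse_compute_apps(result.stdout)
```

Calling `gpu_compute_apps()` after deploying a model should return at least one entry if something actually landed on the GPU. An empty list corresponds to the 'no running processes' case described above.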
How can one reproduce the bug?
Steps to reproduce the behavior:
ml type nodes
What is the expected behavior?
The model should be loaded and installed in GPU memory.
What is your host/environment?
Dockerfile:
Additional note: Hopefully newer versions of PyTorch and CUDA are actually supported?
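To see which PyTorch and CUDA versions an image actually ends up with, a small check like the following can be run inside the container. This is only a sketch: it inspects the Python PyTorch install (if there is one) and says nothing about whether the native libraries ML Commons loads match it. `report_torch_cuda` is a hypothetical helper name.

```python
def report_torch_cuda():
    """Return the PyTorch version, its CUDA build version, and GPU visibility."""
    try:
        import torch
    except ImportError:
        # No Python PyTorch in this environment; nothing more to report.
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was compiled against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are visible to PyTorch.
        "cuda_available": torch.cuda.is_available(),
    }
```

If `cuda_available` comes back `False` while `built_for_cuda` is set, the usual suspects are a driver/toolkit mismatch or the container not being started with GPU access.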