Team,
I recently had the opportunity to work with this project and was able to get everything up and running successfully using the $ make install command.
Overall, the setup process went smoothly, but I did have to make a few adjustments due to environment-specific and hardware-related constraints.
1. Container Image Pull Issues
Initially, I encountered issues pulling container images due to authentication restrictions. To resolve this, I followed this Red Hat documentation article on registry authentication: https://access.redhat.com/RegistryAuthentication
I ran the following commands to create and link a pull secret:
$ oc create secret generic <pull_secret_name> \
--from-file=.dockerconfigjson=<path/to/.docker/config.json> \
--type=kubernetes.io/dockerconfigjson
$ oc secrets link default <pull_secret_name> --for=pull
I executed these commands during the $ make install process, specifically at the point where the script prompts for the Hugging Face token. This allowed the rest of the setup to proceed using the pull secret by default.
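For anyone who wants to double-check this step, a quick sanity check (assuming the secret was created in the same namespace the installer deploys into, which may differ in your setup) is to confirm the secret exists and is listed under the default service account's image pull secrets:
$ oc get secret <pull_secret_name>                          # the secret should exist in the current namespace
$ oc describe sa default | grep -A 2 "Image pull secrets"   # the linked secret should appear here after the link step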
2. Precision Compatibility with GPU
Once the containers were successfully pulled, I ran into a precision-related error because the GPU in my machine (a Tesla T4) does not support BF16:
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0.
Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly
setting the `dtype` flag in CLI, for example: --dtype=half.
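As a quick way to tell up front whether a GPU will hit this, you can query its compute capability; BF16 needs compute capability 8.0 or higher, and the compute_cap query field is available on reasonably recent NVIDIA drivers:
$ nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader   # e.g. "Tesla T4, 7.5" -- anything below 8.0 needs --dtype=half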
To resolve this, I changed the model precision to half by modifying the following files:
deploy/helm/rag-ui/values.yaml
safety-model:
  extraArgs:
    - --dtype
    - "half"
    - --model
    - meta-llama/Llama-Guard-3-1B
deploy/helm/rag-ui/charts/llama-serve/values.yaml
args:
  - --enable-auto-tool-choice
  - --chat-template
  - /app/tool_chat_template_llama3.2_json.jinja
  - --tool-call-parser
  - llama3_json
  - --port
  - "8000"
  - --dtype
  - "half"
  - '--max-model-len'
  - '8192'
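If you end up editing these values after the initial install rather than before it, re-running the install target (or upgrading the release by hand) should roll the new arguments out. A minimal sketch, assuming the chart was installed as a Helm release named rag-ui in a namespace called llama-stack (both names are assumptions on my part, so adjust them to your environment):
$ helm upgrade rag-ui deploy/helm/rag-ui -n llama-stack   # re-render the chart with the updated values (release name and namespace are assumptions)
$ oc get pods -n llama-stack -w                           # watch the serving pods restart with the new --dtype half argument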
Summary
Everything worked well after these changes. I’m sharing this here in case others run into similar issues or if the team wants to consider incorporating these workarounds into the documentation for broader compatibility.
Thanks!
~ Limitless