Obstacles seen with this project #16

@real-limitless

Description

Team,

I recently had the opportunity to work with this project and got everything up and running successfully with the $ make install command.

Overall, the setup process went smoothly, but I did have to make a few adjustments due to environment-specific and hardware-related constraints.

1. Container Image Pull Issues

Initially, I encountered issues pulling container images due to authentication restrictions. To resolve this, I followed this Red Hat documentation article: https://access.redhat.com/RegistryAuthentication

I ran the following commands to create and link a pull secret:

$ oc create secret generic <pull_secret_name> \
    --from-file=.dockerconfigjson=<path/to/.docker/config.json> \
    --type=kubernetes.io/dockerconfigjson

$ oc secrets link default <pull_secret_name> --for=pull

I executed these commands during the $ make install process, specifically at the point where the script prompted for my Hugging Face token. This allowed the project setup to proceed using the pull secret by default.
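
For anyone who wants to double-check that step, the secret and its link to the default service account can be verified with standard oc commands (use whatever name you substituted for <pull_secret_name>):

$ oc get secret <pull_secret_name>

$ oc describe serviceaccount default

The secret should appear under "Image pull secrets" in the service account description.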

2. Precision Compatibility with GPU

Once the containers were successfully pulled, I ran into a precision-related error because my GPU (a Tesla T4) does not support BF16:

ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0.
Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly
setting the `dtype` flag in CLI, for example: --dtype=half.
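
If you're unsure whether your GPU is affected, the compute capability can be checked from Python, assuming PyTorch is available in your environment:

$ python -c "import torch; print(torch.cuda.get_device_capability())"

A Tesla T4 prints (7, 5); anything below (8, 0) needs --dtype=half.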

To resolve this, I changed the model precision to half by modifying the following files:

  • deploy/helm/rag-ui/values.yaml

safety-model:
  extraArgs:
    - --dtype
    - "half"
    - --model
    - meta-llama/Llama-Guard-3-1B

  • deploy/helm/rag-ui/charts/llama-serve/values.yaml

args:
  - --enable-auto-tool-choice
  - --chat-template
  - /app/tool_chat_template_llama3.2_json.jinja
  - --tool-call-parser
  - llama3_json
  - --port
  - "8000"
  - --dtype
  - "half"
  - --max-model-len
  - "8192"
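
To roll the dtype change out to an existing install, one option (a minimal sketch, not something I tested beyond my own setup) is a helm upgrade against the modified chart. The release name below is hypothetical; check $ helm list for yours:

$ helm upgrade rag-ui deploy/helm/rag-ui

Re-running $ make install should also pick up the new values if the Makefile drives the helm deployment.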

Summary

Everything worked well after these changes. I’m sharing this here in case others run into similar issues or if the team wants to consider incorporating these workarounds into the documentation for broader compatibility.

Thanks!

~ Limitless
