
Commit 1406283

Add step-by-step README instructions for users
1 parent 0443651 commit 1406283

File tree: 1 file changed (+30 -2 lines)


README.md

Lines changed: 30 additions & 2 deletions

@@ -86,7 +86,7 @@ The kickstart supports two modes of deployments

- [Hugging Face Token](https://huggingface.co/settings/tokens)
- Access to [Meta Llama](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/) model.
- Access to [Meta Llama Guard](https://huggingface.co/meta-llama/Llama-Guard-3-8B/) model.
-- Some of the example scripts use `jq` a JSON parsing utility which you can acquire via `brew install jq`
+- Some of the example scripts use `jq`, a JSON parsing utility, which you can install via `brew install jq` on macOS or with your preferred package manager on Linux.

### Supported Models

@@ -180,7 +180,35 @@ model: llama-3-2-3b-instruct
model: llama-guard-3-8b (shield)
```

-6. Install via make

# Deploying RAG Blueprint Step by Step

## Step 1: Deploy LLM Services

```bash
make install-llm-service NAMESPACE=llama-stack-rag LLM=llama-3-2-3b-instruct SAFETY=llama-guard-3-8b
```

This command deploys the LLM services that power the RAG application. Here's what happens when you run it (a rough manual equivalent is sketched after this list):

1. The command creates the namespace `llama-stack-rag` if it doesn't already exist.
2. It creates the required secrets, including the Hugging Face token secret.
3. It deploys the `llm-service` Helm chart with specific model configurations:
   - Enables the `llama-3-2-3b-instruct` model as the main LLM
   - Enables the `llama-guard-3-8b` model as the safety filter
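
For orientation, here is a minimal sketch of what those steps might look like done by hand. It is an illustration only: the actual secret name, chart location, and values keys are defined by the repo's Makefile and the `llm-service` chart, and the ones below are assumptions.

```bash
# Hypothetical manual equivalent of the make target above.
# Secret name, chart path, and values keys are illustrative assumptions.
oc new-project llama-stack-rag 2>/dev/null || oc project llama-stack-rag

# The chart needs a Hugging Face token; the secret name here is assumed.
oc create secret generic huggingface-secret \
  --from-literal=HF_TOKEN="${HF_TOKEN}" \
  -n llama-stack-rag

# Enable the two models via chart values (keys are illustrative).
helm upgrade --install llm-service ./helm/llm-service \
  -n llama-stack-rag \
  --set models.llama-3-2-3b-instruct.enabled=true \
  --set models.llama-guard-3-8b.enabled=true
```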

Each enabled model triggers the creation of two resources, which you can verify with the commands after this list:

- A Persistent Volume Claim (PVC) to store the model files
- A KServe InferenceService resource that defines how to run the model
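
To confirm these resources were created, list them in the namespace (`inferenceservice` is the KServe resource kind; `isvc` works as a short name):

```bash
# List the PVCs and InferenceServices for the enabled models
oc get pvc -n llama-stack-rag
oc get inferenceservice -n llama-stack-rag
```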

The actual model pods are created by the KServe operator in OpenShift AI, which processes the InferenceService resources. Each model runs in its own vLLM instance, a high-performance inference engine for LLMs. You can watch the pods come up as sketched below.
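
A quick way to watch that happen (pod names vary by deployment, so the log command uses a placeholder):

```bash
# Watch pods appear as KServe reconciles the InferenceServices
oc get pods -n llama-stack-rag -w

# Tail a predictor pod's logs to see vLLM start up;
# substitute a real pod name from the listing above
oc logs -f <predictor-pod-name> -n llama-stack-rag
```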

Note that if you don't see pods being created after running this command, ensure that (diagnostic sketches follow this list):

- Your OpenShift cluster has the OpenShift AI operator installed and properly configured
- You have sufficient GPU resources available (if using GPU versions of the models)
- The Hugging Face token you provided has access to the requested models
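
A few diagnostic sketches for those checks. The OpenShift AI namespaces and the GPU resource name below are common defaults, not guarantees for your cluster:

```bash
# Is the OpenShift AI operator running? (default namespaces assumed)
oc get pods -n redhat-ods-operator
oc get pods -n redhat-ods-applications

# Do any nodes advertise allocatable GPUs? (assumes the NVIDIA resource name)
oc get nodes -o json | jq '.items[] | {name: .metadata.name, gpus: .status.allocatable["nvidia.com/gpu"]}'

# Check events on the InferenceServices for scheduling or image-pull errors
oc describe inferenceservice -n llama-stack-rag
```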

# Deploying RAG Blueprint All at once

Use the taint key from above as the `LLM_TOLERATION` and `SAFETY_TOLERATION`; a hypothetical invocation is sketched below.
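
A hypothetical combined invocation, assuming the all-at-once target is named `install` and using `nvidia.com/gpu` as an example taint key (substitute your repo's actual target and your cluster's taint key):

```bash
# Assumed target name and taint key; adjust to your Makefile and cluster
make install NAMESPACE=llama-stack-rag \
  LLM=llama-3-2-3b-instruct \
  SAFETY=llama-guard-3-8b \
  LLM_TOLERATION="nvidia.com/gpu" \
  SAFETY_TOLERATION="nvidia.com/gpu"
```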
