
Commit e02d739

ADD step by step README instructions for users
Signed-off-by: acmenezes <[email protected]>
1 parent 0443651

3 files changed: 83 additions, 4 deletions

README.md

Lines changed: 51 additions & 2 deletions
@@ -86,7 +86,7 @@ The kickstart supports two modes of deployments

- [Hugging Face Token](https://huggingface.co/settings/tokens)
- Access to [Meta Llama](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/) model.
- Access to [Meta Llama Guard](https://huggingface.co/meta-llama/Llama-Guard-3-8B/) model.
- Some of the example scripts use `jq`, a JSON parsing utility, which you can install via `brew install jq` on macOS or with your favorite package manager on Linux. (This bullet previously mentioned only `brew install jq`.)

### Supported Models
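For the Linux case, the exact command depends on your distribution; as a rough sketch (assuming Fedora/RHEL or Debian/Ubuntu):

```bash
# Fedora / RHEL
sudo dnf install -y jq

# Debian / Ubuntu
sudo apt-get install -y jq
```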

@@ -180,7 +180,56 @@

```
model: llama-3-2-3b-instruct
model: llama-guard-3-8b (shield)
```

(The previous step "6. Install via make" is replaced by the two sections below.)

# Deploying RAG Blueprint Step by Step

## Step 1: Deploy LLM Services

```bash
make install-llm-service NAMESPACE=llama-stack-rag LLM=llama-3-2-3b-instruct SAFETY=llama-guard-3-8b
```
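If you want to confirm the model services are up before continuing, one possible check (assuming the `oc` CLI is logged in to the cluster) is to watch the KServe InferenceServices and their pods become ready; vLLM startup can take several minutes while model weights are downloaded:

```bash
# InferenceServices created for the enabled models
oc get inferenceservice -n llama-stack-rag

# Model-serving pods coming up
oc get pods -n llama-stack-rag
```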
## Step 2: Install the MCP servers

```bash
make install-mcp-servers NAMESPACE=llama-stack-rag
```

## Step 3: Deploy the main RAG UI components

```bash
make install-llama-stack
```
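This target runs `helm upgrade --install` for the llama-stack chart (see the `install-llama-stack` target in the Makefile diff below), so one way to confirm the deployment, assuming the same namespace as the previous steps, is:

```bash
# The release name shown depends on the Makefile's RELEASE_NAME variable
helm list -n llama-stack-rag
```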
## Step 4: Set up PGVector database

```bash
make pg-vector NAMESPACE=llama-stack-rag
```
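The all-at-once `install-rag` target waits for the pgvector database to be ready before installing the rest of the stack (see the Makefile diff below); when deploying step by step you may want a similar check, assuming the pod name contains "pgvector":

```bash
# Wait until the pgvector pod reports Ready before continuing
oc get pods -n llama-stack-rag | grep -i pgvector
```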
## Step 5: Create the MinIO bucket

```bash
make create-minio-bucket NAMESPACE=llama-stack-rag
```

## Step 6: Configure the ingestion pipeline

```bash
make configure-pipeline-server NAMESPACE=llama-stack-rag
```

## Step 7: Create the ingestion pipeline

```bash
make create-ingestion-pipeline NAMESPACE=llama-stack-rag
```

## Final step: Verify the deployment by listing the resources created

```bash
make status NAMESPACE=llama-stack-rag
```

# Deploying RAG Blueprint All at Once

Use the taint key from above as the `LLM_TOLERATION` and `SAFETY_TOLERATION`
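These are passed as make variables; a minimal sketch, reusing the Step 1 command and a placeholder for the cluster-specific taint key (the exact value depends on your GPU node taints):

```bash
make install-llm-service NAMESPACE=llama-stack-rag \
  LLM=llama-3-2-3b-instruct SAFETY=llama-guard-3-8b \
  LLM_TOLERATION="<your-taint-key>" SAFETY_TOLERATION="<your-taint-key>"
```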

deploy/helm/Makefile

Lines changed: 14 additions & 2 deletions
@@ -171,19 +171,31 @@ create-minio-bucket:

install-rag: namespace secrets install-mcp-servers
	@$(eval HELM_ARGS := $(call helm_llama_stack_args))

	@$(MAKE) pg-vector
	@echo "Waiting for pgvector database to be ready..."
	@$(MAKE) wait

	@echo "Deploying Helm chart $(CHART_PATH) as release $(RELEASE_NAME) in namespace $(NAMESPACE)..."
	helm upgrade --install $(RELEASE_NAME) $(CHART_PATH) -n $(NAMESPACE) $(HELM_ARGS) $(EXTRA_HELM_ARGS)

	@$(MAKE) create-minio-bucket
	@$(MAKE) status
	@$(MAKE) configure-pipeline-server
	@$(MAKE) create-ingestion-pipeline

	@echo "Waiting for deployment to be ready..."
	@$(MAKE) wait

(The `@$(MAKE) pg-vector` call, previously run after the `helm upgrade` step, now runs with a wait before the chart is installed.)

.PHONY: install-llama-stack
install-llama-stack:
	@$(eval HELM_ARGS := $(call helm_llama_stack_args))

	@echo "Deploying Helm chart $(CHART_PATH) as release $(RELEASE_NAME) in namespace $(NAMESPACE)..."
	helm upgrade --install $(RELEASE_NAME) $(CHART_PATH) -n $(NAMESPACE) $(HELM_ARGS) $(EXTRA_HELM_ARGS)

	@echo "Waiting for deployment to be ready..."
	@$(MAKE) wait

install-%: install-llm-service-% install-rag
	@echo "Installed from target install-$*"
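The new `install-llama-stack` target only deploys the llama-stack Helm chart and waits for it, so it can also be invoked on its own; extra chart values pass through `EXTRA_HELM_ARGS` (the `--set` key below is purely illustrative):

```bash
make install-llama-stack NAMESPACE=llama-stack-rag \
  EXTRA_HELM_ARGS="--set someValue=example"
```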

docs/detailed-step-by-step.md

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
This command deploys the LLM services that power the RAG application. Here's what happens when you run it:

1. The command creates the namespace `llama-stack-rag` if it doesn't already exist
2. It creates the required secrets, including the Hugging Face token secret
3. It deploys the `llm-service` Helm chart with specific model configurations:
   - Enables the `llama-3-2-3b-instruct` model as the main LLM
   - Enables the `llama-guard-3-8b` model as the safety filter

Each enabled model triggers the creation of:
- A Persistent Volume Claim (PVC) to store the model files
- A KServe InferenceService resource that defines how to run the model

The actual model pods are created by the KServe operator in OpenShift AI, which processes the InferenceService resources. Each model runs in its own vLLM instance, a high-performance inference engine for LLMs.

Note that if you don't see pods being created after running this command, run through the quick checks sketched after this list and ensure that:
- Your OpenShift cluster has the OpenShift AI operator installed and properly configured
- You have sufficient GPU resources available (if using GPU versions of the models)
- The Hugging Face token provided has access to the requested models
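A few commands that may help with those checks, assuming the `oc` CLI and the `llama-stack-rag` namespace used throughout the README (resource names will vary with your configuration):

```bash
# Recent events often reveal scheduling or image-pull problems
oc get events -n llama-stack-rag --sort-by=.lastTimestamp

# Inspect InferenceServices that are not becoming ready
oc describe inferenceservice -n llama-stack-rag

# Confirm the Hugging Face token secret was created
oc get secrets -n llama-stack-rag
```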
