- [Hugging Face Token](https://huggingface.co/settings/tokens)
- Access to [Meta Llama](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/) model.
- Access to [Meta Llama Guard](https://huggingface.co/meta-llama/Llama-Guard-3-8B/) model.
- Some of the example scripts use `jq`, a JSON parsing utility, which you can install via `brew install jq` on macOS or via your favorite package manager on Linux (a quick check is sketched below).
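
For example, a quick way to confirm `jq` is available before running the example scripts (the `dnf`/`apt-get` package name `jq` is an assumption about your distribution; adjust for your package manager):

```bash
# Verify jq is on the PATH; install it if missing.
if ! command -v jq >/dev/null 2>&1; then
  sudo dnf install -y jq        # Fedora / RHEL
  # sudo apt-get install -y jq  # Debian / Ubuntu
fi

# Sanity check: extract a field from a small JSON document.
echo '{"model": "llama-3-2-3b-instruct"}' | jq -r '.model'
```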
### Supported Models

```
model: llama-3-2-3b-instruct
model: llama-guard-3-8b (shield)
```

# Deploying RAG Blueprint Step by Step
## Step 1: Deploy LLM Services
```bash
make install-llm-service NAMESPACE=llama-stack-rag LLM=llama-3-2-3b-instruct SAFETY=llama-guard-3-8b
```
This command deploys the LLM services that power the RAG application. Here's what happens when you run it:
1. The command creates the namespace `llama-stack-rag` if it doesn't already exist
2. It creates required secrets, including the Hugging Face token secret
3. It deploys the `llm-service` Helm chart with specific model configurations:
   - Enables the `llama-3-2-3b-instruct` model as the main LLM
   - Enables the `llama-guard-3-8b` model as the safety filter
Each enabled model triggers the creation of:
- A Persistent Volume Claim (PVC) to store the model files
- A KServe InferenceService resource that defines how to run the model
The actual model pods are created by the KServe operator in OpenShift AI, which processes the InferenceService resources. Each model runs in its own vLLM instance, which is a high-performance inference engine for LLMs.
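
If you want to watch the rollout, the commands below are one way to inspect what the chart created (a minimal sketch: it assumes the `llama-stack-rag` namespace from the install command above and standard `oc`/KServe resource names):

```bash
NAMESPACE=llama-stack-rag

# InferenceService resources created for each enabled model (KServe custom resources).
oc get inferenceservice -n "$NAMESPACE"

# PVCs that hold the downloaded model files.
oc get pvc -n "$NAMESPACE"

# vLLM model pods; pulling images and downloading weights can take several minutes.
oc get pods -n "$NAMESPACE" -w
```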
Note that if you don't see pods being created after running this command, ensure the following (a few commands for checking these conditions are sketched after the list):
- Your OpenShift cluster has the OpenShift AI operator installed and properly configured
- You have sufficient GPU resources available (if using GPU versions of the models)
- The Hugging Face token provided has access to the requested models
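
A few commands that can help check those conditions (a sketch only; the operator CSV name, GPU resource name, and secret layout are assumptions and may differ in your cluster and chart version):

```bash
# OpenShift AI operator: look for its ClusterServiceVersion (names vary by version/channel).
oc get csv -A | grep -i rhods

# GPU capacity: nodes should report an allocatable GPU resource (assumes NVIDIA's nvidia.com/gpu).
oc describe nodes | grep -i 'nvidia.com/gpu'

# Hugging Face token: the secret created during install should exist in the target namespace
# (the exact secret name is chart-dependent).
oc get secrets -n llama-stack-rag

# Recent events often explain a stuck InferenceService or a pending pod.
oc get events -n llama-stack-rag --sort-by=.lastTimestamp | tail -n 20
```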
# Deploying RAG Blueprint All at once
Use the taint key from above as the `LLM_TOLERATION` and `SAFETY_TOLERATION`