
Commit e02d739

ADD step by step README instructions for users
Signed-off-by: acmenezes <[email protected]>
1 parent 0443651

3 files changed: 83 additions, 4 deletions

README.md

Lines changed: 51 additions & 2 deletions
@@ -86,7 +86,7 @@ The kickstart supports two modes of deployments

- [Hugging Face Token](https://huggingface.co/settings/tokens)
- Access to [Meta Llama](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/) model.
- Access to [Meta Llama Guard](https://huggingface.co/meta-llama/Llama-Guard-3-8B/) model.
- Some of the example scripts use `jq`, a JSON parsing utility, which you can install via `brew install jq` on macOS or with your favorite package manager on Linux. (This bullet previously mentioned only `brew install jq`.)

### Supported Models
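For the Linux case, the exact command depends on your distribution; as a rough sketch (assuming Fedora/RHEL or Debian/Ubuntu):

```bash
# Fedora / RHEL
sudo dnf install -y jq

# Debian / Ubuntu
sudo apt-get install -y jq
```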

@@ -180,7 +180,56 @@

```
model: llama-3-2-3b-instruct
model: llama-guard-3-8b (shield)
```

(The previous step "6. Install via make" is replaced by the two sections below.)

# Deploying RAG Blueprint Step by Step

## Step 1: Deploy LLM Services

```bash
make install-llm-service NAMESPACE=llama-stack-rag LLM=llama-3-2-3b-instruct SAFETY=llama-guard-3-8b
```
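If you want to confirm the model services are up before continuing, one possible check (assuming the `oc` CLI is logged in to the cluster) is to watch the KServe InferenceServices and their pods become ready; vLLM startup can take several minutes while model weights are downloaded:

```bash
# InferenceServices created for the enabled models
oc get inferenceservice -n llama-stack-rag

# Model-serving pods coming up
oc get pods -n llama-stack-rag
```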
## Step 2: Install the MCP servers

```bash
make install-mcp-servers NAMESPACE=llama-stack-rag
```

## Step 3: Deploy the main RAG UI components

```bash
make install-llama-stack
```
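This target runs `helm upgrade --install` for the llama-stack chart (see the `install-llama-stack` target in the Makefile diff below), so one way to confirm the deployment, assuming the same namespace as the previous steps, is:

```bash
# The release name shown depends on the Makefile's RELEASE_NAME variable
helm list -n llama-stack-rag
```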
## Step 4: Set up PGVector database

```bash
make pg-vector NAMESPACE=llama-stack-rag
```
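The all-at-once `install-rag` target waits for the pgvector database to be ready before installing the rest of the stack (see the Makefile diff below); when deploying step by step you may want a similar check, assuming the pod name contains "pgvector":

```bash
# Wait until the pgvector pod reports Ready before continuing
oc get pods -n llama-stack-rag | grep -i pgvector
```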
## Step 5: Create the MinIO bucket

```bash
make create-minio-bucket NAMESPACE=llama-stack-rag
```

## Step 6: Configure the ingestion pipeline

```bash
make configure-pipeline-server NAMESPACE=llama-stack-rag
```

## Step 7: Create the ingestion pipeline

```bash
make create-ingestion-pipeline NAMESPACE=llama-stack-rag
```

## Final step: Verify the deployment by listing the resources created

```bash
make status NAMESPACE=llama-stack-rag
```

# Deploying RAG Blueprint All at Once

Use the taint key from above as the `LLM_TOLERATION` and `SAFETY_TOLERATION`
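These are passed as make variables; a minimal sketch, reusing the Step 1 command and a placeholder for the cluster-specific taint key (the exact value depends on your GPU node taints):

```bash
make install-llm-service NAMESPACE=llama-stack-rag \
  LLM=llama-3-2-3b-instruct SAFETY=llama-guard-3-8b \
  LLM_TOLERATION="<your-taint-key>" SAFETY_TOLERATION="<your-taint-key>"
```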

deploy/helm/Makefile

Lines changed: 14 additions & 2 deletions
@@ -171,19 +171,31 @@ create-minio-bucket:

install-rag: namespace secrets install-mcp-servers
	@$(eval HELM_ARGS := $(call helm_llama_stack_args))

	@$(MAKE) pg-vector
	@echo "Waiting for pgvector database to be ready..."
	@$(MAKE) wait

	@echo "Deploying Helm chart $(CHART_PATH) as release $(RELEASE_NAME) in namespace $(NAMESPACE)..."
	helm upgrade --install $(RELEASE_NAME) $(CHART_PATH) -n $(NAMESPACE) $(HELM_ARGS) $(EXTRA_HELM_ARGS)

	@$(MAKE) create-minio-bucket
	@$(MAKE) status
	@$(MAKE) configure-pipeline-server
	@$(MAKE) create-ingestion-pipeline

	@echo "Waiting for deployment to be ready..."
	@$(MAKE) wait

(The `@$(MAKE) pg-vector` call, previously run after the `helm upgrade` step, now runs with a wait before the chart is installed.)

.PHONY: install-llama-stack
install-llama-stack:
	@$(eval HELM_ARGS := $(call helm_llama_stack_args))

	@echo "Deploying Helm chart $(CHART_PATH) as release $(RELEASE_NAME) in namespace $(NAMESPACE)..."
	helm upgrade --install $(RELEASE_NAME) $(CHART_PATH) -n $(NAMESPACE) $(HELM_ARGS) $(EXTRA_HELM_ARGS)

	@echo "Waiting for deployment to be ready..."
	@$(MAKE) wait

install-%: install-llm-service-% install-rag
	@echo "Installed from target install-$*"
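The new `install-llama-stack` target only deploys the llama-stack Helm chart and waits for it, so it can also be invoked on its own; extra chart values pass through `EXTRA_HELM_ARGS` (the `--set` key below is purely illustrative):

```bash
make install-llama-stack NAMESPACE=llama-stack-rag \
  EXTRA_HELM_ARGS="--set someValue=example"
```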

docs/detailed-step-by-step.md

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
This command deploys the LLM services that power the RAG application. Here's what happens when you run it:

1. The command creates the namespace `llama-stack-rag` if it doesn't already exist
2. It creates the required secrets, including the Hugging Face token secret
3. It deploys the `llm-service` Helm chart with specific model configurations:
   - Enables the `llama-3-2-3b-instruct` model as the main LLM
   - Enables the `llama-guard-3-8b` model as the safety filter

Each enabled model triggers the creation of:
- A Persistent Volume Claim (PVC) to store the model files
- A KServe InferenceService resource that defines how to run the model

The actual model pods are created by the KServe operator in OpenShift AI, which processes the InferenceService resources. Each model runs in its own vLLM instance, a high-performance inference engine for LLMs.

Note that if you don't see pods being created after running this command, run through the quick checks sketched after this list and ensure that:
- Your OpenShift cluster has the OpenShift AI operator installed and properly configured
- You have sufficient GPU resources available (if using GPU versions of the models)
- The Hugging Face token provided has access to the requested models
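A few commands that may help with those checks, assuming the `oc` CLI and the `llama-stack-rag` namespace used throughout the README (resource names will vary with your configuration):

```bash
# Recent events often reveal scheduling or image-pull problems
oc get events -n llama-stack-rag --sort-by=.lastTimestamp

# Inspect InferenceServices that are not becoming ready
oc describe inferenceservice -n llama-stack-rag

# Confirm the Hugging Face token secret was created
oc get secrets -n llama-stack-rag
```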
