This README provides guidance for using the Kagenti project with Llama Stack.

The model can be served either in-cluster or external to the cluster. At minimum, the scenario described here requires OpenShift AI with Llama Stack capabilities enabled.
Ensure the RHOAI operator is installed through OperatorHub.
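If it is not already installed, the operator can also be subscribed from the CLI. The following is a minimal sketch; the channel, namespace, and operator name are assumptions and should be verified against your cluster's OperatorHub catalog:

```yaml
# Sketch of an OperatorHub subscription for RHOAI
# (assumption: verify channel and operator name for your cluster)
apiVersion: v1
kind: Namespace
metadata:
  name: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  channel: stable            # assumption: confirm the available channel
  name: rhods-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```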
An example model deployment is located at `kubernetes/llama3.2-3b/`.
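Assuming the folder contains plain YAML manifests, it can be applied directly; adjust the target namespace to match your environment:

```shell
# Apply the example model deployment
# (assumption: plain manifests; the namespace may be set in the files themselves)
oc apply -f kubernetes/llama3.2-3b/
```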
By default, OpenShift AI does not initialize Llama Stack. When deploying the default DataScienceCluster, modify the YAML to enable the `llamastackoperator` component:

```yaml
llamastackoperator:
  managementState: Managed
```

The operator will become ready once Llama Stack is initialized.
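The same change can be made from the CLI; a sketch, assuming the DataScienceCluster resource is named `default-dsc` (check the actual name with `oc get datasciencecluster`):

```shell
# Enable the Llama Stack operator on an existing DataScienceCluster
# (assumption: the resource is named "default-dsc")
oc patch datasciencecluster default-dsc --type merge \
  -p '{"spec":{"components":{"llamastackoperator":{"managementState":"Managed"}}}}'
```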
To validate that Llama Stack is ready and available, check for LlamaStackDistribution resources and the operator pod:

```shell
$ oc get llsd -n default
No resources found in default namespace.

$ oc get po -n redhat-ods-applications | grep llama
llama-stack-k8s-operator-controller-manager-64554d8c6f-6hkp5   1/1     Running   0          4h33m
```

An example Llama Stack deployment exists in `kubernetes/llama-stack-dist/`. Modify `VLLM_URL` and `INFERENCE_MODEL` to match your environment before deploying:
```yaml
env:
  - name: INFERENCE_MODEL
    value: "llama32-3b"
  - name: VLLM_URL
    value: "https://llama32-3b.serving.svc.cluster.local/v1"
```
The Llama Stack endpoint can be tested with a port-forward and a curl request. In one terminal, forward the service port:

```shell
kubectl port-forward -n serving svc/lsd-llama32-3b-service 8321:8321
```

Then, in another terminal, send a chat completion request:

```shell
curl -X POST http://localhost:8321/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama32-3b",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
],
"max_tokens": 100,
"stream": false
}'
```

Kagenti is a Kubernetes-based control plane for AI agents. It provides a framework-neutral, scalable, and secure platform for deploying and orchestrating AI agents.
Prerequisites:

- Kagenti installed on the cluster (see the Kagenti installation guide)
- LlamaStack endpoint deployed and accessible (see sections above)
Once Kagenti is installed, access the UI:
- URL: `https://kagenti-ui-kagenti-system.apps.<cluster-domain>/`
- Credentials: check with your cluster administrator (default: `temp-admin` / auto-generated password)
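If the exact URL is not known, the hostname can be read from the UI's OpenShift route; a sketch, assuming the route is named `kagenti-ui` in the `kagenti-system` namespace:

```shell
# Print the Kagenti UI hostname (assumption: route name "kagenti-ui")
oc get route kagenti-ui -n kagenti-system -o jsonpath='{.spec.host}'
```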
Kagenti supports deploying agents that use any OpenAI-compatible LLM endpoint, including LlamaStack.
For a complete, tested deployment with MCP tools, use the `kubernetes/kagenti-llamastack-poc` folder:
```shell
cd kubernetes/kagenti-llamastack-poc/scripts
chmod +x *.sh
./01-setup.sh            # Setup namespace & permissions
./02-deploy-agent.sh     # Build and deploy agent
./03-deploy-tools.sh     # Deploy MCP tools (weather, calculator)
./04-patch-mcp-urls.sh   # Connect tools to agent
./05-test.sh             # Test everything
```

See `kubernetes/kagenti-llamastack-poc/README.md` for full documentation, including workarounds and troubleshooting.
- Access the Kagenti UI: `https://kagenti-ui-kagenti-system.apps.llama.octo-emerging.redhataicoe.com/`
- Navigate to "Import New Agent"
- Configure the agent:
  - Namespace: `kagenti-system` (or create a new one)
  - Deployment Method: Build from source
  - Repository URL: `https://github.com/kagenti/agent-examples`
  - Subfolder: `a2a/generic_agent`
  - Protocol: A2A
- Add environment variables:

  | Variable | Value |
  |---|---|
  | `LLM_MODEL` | `vllm-inference/llama32-3b` |
  | `LLM_API_BASE` | `http://lsd-llama32-3b-service.serving.svc.cluster.local:8321/v1` |
  | `LLM_API_KEY` | `dummy` |
  | `MCP_TRANSPORT` | `streamable_http` |

- Click "Build New Agent" and wait for the deployment to complete.
For manual deployment, see the YAML manifests in `kubernetes/kagenti-llamastack-poc/agent/`.
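For reference, the environment block of the agent container would look roughly like the following. This is a sketch assembled from the values above, not the exact manifest from the repository:

```yaml
# Illustrative env block for the agent container
# (assumption: see the repo manifests for the authoritative version)
env:
  - name: LLM_MODEL
    value: "vllm-inference/llama32-3b"
  - name: LLM_API_BASE
    value: "http://lsd-llama32-3b-service.serving.svc.cluster.local:8321/v1"
  - name: LLM_API_KEY
    value: "dummy"
  - name: MCP_TRANSPORT
    value: "streamable_http"
```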
- In the Kagenti UI, navigate to Agent Catalog
- Find your deployed agent and click View Details
- Use the chat interface at the bottom to test:
Hello! Can you tell me about yourself?
The agent connects to LlamaStack using these settings:
| Setting | Value |
|---|---|
| Service URL | `http://lsd-llama32-3b-service.serving.svc.cluster.local:8321/v1` |
| Model ID | `vllm-inference/llama32-3b` |
| API Format | OpenAI-compatible (`/v1/chat/completions`) |
| Authentication | None required (set `LLM_API_KEY=dummy`) |
If the agent fails to deploy or respond:
- Check that LlamaStack is running:

  ```shell
  oc get pods -n serving | grep llama
  oc get llsd -n serving
  ```

- Test the LlamaStack endpoint:

  ```shell
  oc exec -n serving deployment/lsd-llama32-3b -- curl -s -X POST \
    http://localhost:8321/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "vllm-inference/llama32-3b", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 20}'
  ```
- Check the agent pod logs:

  ```shell
  oc logs -n kagenti-system -l app=<agent-name> --tail=100
  ```
- Verify network connectivity:

  ```shell
  oc exec -n kagenti-system deployment/<agent-deployment> -- curl -s http://lsd-llama32-3b-service.serving.svc.cluster.local:8321/v1/models
  ```
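If the agent image does not include curl, a throwaway debug pod can be used instead; a sketch using a public curl image (assumption: the cluster can pull `curlimages/curl`):

```shell
# Run a temporary pod to test connectivity to the Llama Stack service
oc run curl-test --rm -it --restart=Never -n kagenti-system \
  --image=curlimages/curl -- \
  curl -s http://lsd-llama32-3b-service.serving.svc.cluster.local:8321/v1/models
```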