Description
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
What is the outcome that you are trying to reach?
Currently, there is no standardized, end-to-end guide for deploying the NVIDIA AI-Q Research Assistant Blueprint on Amazon EKS. Users who want to leverage this advanced RAG architecture on AWS face the significant problem of manually configuring a complex environment. This includes provisioning multiple, specialized GPU node groups, setting up the NVIDIA GPU Operator, integrating with AWS services like OpenSearch Serverless and Load Balancers, and correctly configuring IAM Roles for Service Accounts (IRSA) for secure access. This complexity creates a high barrier to adoption for teams wanting to build powerful research assistants on EKS.
The goal is to publish a repeatable, end-to-end solution for deploying the NVIDIA Deep Research Agent Blueprint on Amazon EKS. This will provide the community with a powerful, enterprise-grade deep research agent solution using NVIDIA NIMs on AWS infrastructure.
Describe the solution you would like
I propose adding a new solution that provides a comprehensive guide and assets for deploying the NVIDIA Deep Research Agent Blueprint. This solution will include:
- Infrastructure as Code to provision an EKS cluster with multiple specialized GPU node groups for all the NVIDIA NIMs in the solution.
- Kubernetes manifests/Helm charts to deploy the solution, including the 49B Llama Nemotron model, NeMo Retriever NIMs, data ingestion services, and other components.
- Integration with key AWS services, such as Amazon OpenSearch Serverless for vector storage, IAM Roles for Service Accounts (IRSA) for secure access, and the AWS Load Balancer Controller for exposing services.
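To make the node-group requirement concrete, the layout could look roughly like the following eksctl sketch. The cluster name, region, instance types, and sizes here are illustrative assumptions, not final recommendations; the actual GPU instance types would depend on the NIM sizing guidance.

```yaml
# Hypothetical eksctl ClusterConfig sketch — names, region, instance
# types, and sizes are placeholders, not validated recommendations.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: aiq-research-assistant
  region: us-west-2
managedNodeGroups:
  - name: llm-nodes          # node group for the 49B Llama Nemotron NIM
    instanceType: p5.48xlarge
    minSize: 1
    maxSize: 2
  - name: retriever-nodes    # node group for NeMo Retriever embedding/reranking NIMs
    instanceType: g5.12xlarge
    minSize: 1
    maxSize: 3
```

A separate node group per model class keeps scheduling simple (one GPU profile per group) and lets each tier scale independently.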
Describe alternatives you have considered
The alternative is for individual users to manually adapt NVIDIA's general deployment guides for EKS. This is a complex and time-consuming process that involves significant manual configuration of networking, IAM policies, OIDC providers, GPU drivers, and Kubernetes storage. This manual approach is error-prone and presents a high barrier to entry. A dedicated ai-on-eks solution would drastically simplify and standardize the process.
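As an illustration of the manual IRSA wiring this alternative requires, the essential step is annotating a Kubernetes ServiceAccount with an IAM role ARN after an OIDC provider is associated with the cluster. The names, namespace, account ID, and role below are placeholders:

```yaml
# Hypothetical ServiceAccount granting ingestion pods access to
# OpenSearch Serverless via IRSA; all identifiers are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aiq-ingestion
  namespace: aiq
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/aiq-opensearch-access
```

Getting this right by hand also means creating the IAM role with a correct OIDC trust policy and scoping its permissions to the OpenSearch collection, which is exactly the kind of boilerplate a packaged solution would standardize.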
Additional context
Architecture diagram provided.
