🚧 CLOSED BETA VERSION 🚧
A modern job execution platform that uses Claude AI to plan and execute workloads on AWS EKS with both CPU and GPU support.
Beta Notice: This is a closed beta release. For questions, issues, or feedback, please contact: [email protected]
This project includes a cross-platform bootstrap that sets up an EKS cluster and deploys the API.
It will:
- Prompt you for your AWS access keys, AWS Region, and your ANTHROPIC_API_KEY
- Create/update an EKS cluster (CPU or GPU)
- Configure NVIDIA device plugin as an EKS add-on (GPU mode)
- Apply Kubernetes namespace + RBAC
- Deploy the API (via k8s/api.yaml if present, or via Helm if you have a chart)
- AWS account with permissions for EKS, IAM, and EC2
- AWS IAM Access Keys (Access Key ID + Secret Access Key)
- Auto-installed by the bootstrap scripts:
  - eksctl (automatically installed)
  - kubectl (automatically installed on Windows)
- Manually install (one-time setup):
  - aws (AWS CLI v2) - Install Guide
  - helm - Install Guide
- Your Anthropic API key
# Run as Administrator (for tool installation)
.\scripts\bootstrap-eks.ps1
# Optional GPU mode:
.\scripts\bootstrap-eks.ps1 -GPU
# Pin specific eksctl version:
.\scripts\bootstrap-eks.ps1 -EksctlVersion 0.214.0
# From repo root
bash scripts/bootstrap-eks.sh
# Optional GPU mode:
GPU=1 bash scripts/bootstrap-eks.sh
# Pin specific eksctl version:
EKSCTL_VERSION=0.214.0 bash scripts/bootstrap-eks.sh
- Auto-installs missing tools:
  - eksctl (via Chocolatey/Homebrew or direct download)
  - kubectl (Windows only; the script assumes macOS/Linux users already have it)
- Prompts you for:
  - Project name (default: altngc)
  - AWS_ACCESS_KEY_ID
  - AWS_SECRET_ACCESS_KEY
  - AWS_REGION (e.g., us-east-1)
  - GPU mode (yes/no)
  - ANTHROPIC_API_KEY (if not already set in your environment)
- Creates or updates an EKS cluster:
  - CPU: m5.large nodes
  - GPU: g5.xlarge nodes + installs NVIDIA device plugin as an EKS add-on
- Applies:
  - Kubernetes namespace (matching your project name)
  - RBAC from infra/k8s/altngc-rbac.yaml
- Deploys your API:
  - If k8s/api.yaml exists → kubectl apply
  - Else prompts to deploy via Helm chart charts/api (you'll provide image:tag)
- Prints cluster nodes and (if deployed) API pods.
- Check nodes and pods:
  kubectl -n <project> get nodes -o wide
  kubectl -n <project> get pods -o wide
- Expose the API via:
  - A Service of type LoadBalancer, or
  - An Ingress + Ingress Controller (e.g., AWS Load Balancer Controller)
- Missing tools: Only aws and helm need manual install; eksctl/kubectl are auto-installed.
- EKS auth: Ensure your AWS user/role can create/manage EKS and node groups.
- GPU jobs pending: Confirm GPU node type and NVIDIA add-on are present:
  kubectl -n kube-system get daemonset nvidia-device-plugin-daemonset
- API not exposed: Add a Service (type LoadBalancer) or an Ingress manifest.
This project does not require AWS SSO or profiles.
Instead, you'll provide your classic IAM access keys:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_REGION
The bootstrap script will prompt you for these values on first run. They are exported only for the current session and passed directly to aws, eksctl, and kubectl.
Make sure the IAM user/role backing these keys has:
- eks:*
- ec2:*
- iam:*
- s3:*
(Or attach the AdministratorAccess policy for quickstarts.)
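To illustrate the "prompt on first run" behavior described above, here is a minimal TypeScript sketch that reports which of the three variables are unset in an environment map. The helper name missingAwsEnv is hypothetical and is not part of the bootstrap scripts, which are PowerShell and bash.

```typescript
// Hypothetical helper for illustration; the real bootstrap scripts are not shown here.
const REQUIRED_AWS_VARS = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"] as const;

type EnvMap = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank,
// i.e. the values a script would still need to prompt for.
function missingAwsEnv(env: EnvMap): string[] {
  return REQUIRED_AWS_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js process you could call missingAwsEnv(process.env) and prompt only for the names it returns.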
- AI-Powered Planning: Claude translates natural language prompts into Kubernetes job specifications
- EKS Integration: Runs jobs on AWS EKS with automatic scaling
- GPU Support: Supports NVIDIA GPUs with proper node selection and tolerations
- Safety First: Image allowlists, resource caps, cooldowns, and preflight checks
- Real-time Streaming: Live job status and log streaming via Server-Sent Events
- S3 Integration: Presigned URLs for secure file uploads/downloads
- Modern UI: React + TypeScript frontend with real-time updates
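As a rough picture of what "node selection and tolerations" can look like, the sketch below builds a Kubernetes Job manifest for a GPU workload. It assumes the accelerator=nvidia label and the nvidia.com/gpu=true:NoSchedule taint used by the GPU node group commands elsewhere in this README, plus the 10-minute TTL listed under the safety limits. The function name buildGpuJob and the option names are hypothetical; the planner's actual output format is not documented here.

```typescript
// Illustrative only: the planner's real job spec is not shown in this README.
interface GpuJobOptions {
  name: string;
  namespace: string;
  image: string;
  command: string[];
  gpus: number;
}

// Builds a batch/v1 Job that can only land on tainted, labeled GPU nodes.
function buildGpuJob(opts: GpuJobOptions) {
  return {
    apiVersion: "batch/v1",
    kind: "Job",
    metadata: { name: opts.name, namespace: opts.namespace },
    spec: {
      ttlSecondsAfterFinished: 600, // clean up 10 minutes after the job finishes
      backoffLimit: 0, // assumption: do not retry failed pods
      template: {
        spec: {
          restartPolicy: "Never",
          // Select nodes carrying the GPU node group's label.
          nodeSelector: { accelerator: "nvidia" },
          // Tolerate the taint that keeps CPU-only pods off GPU nodes.
          tolerations: [
            { key: "nvidia.com/gpu", operator: "Equal", value: "true", effect: "NoSchedule" },
          ],
          containers: [
            {
              name: "main",
              image: opts.image,
              command: opts.command,
              // The NVIDIA device plugin exposes GPUs as this extended resource.
              resources: { limits: { "nvidia.com/gpu": opts.gpus } },
            },
          ],
        },
      },
    },
  };
}
```

Serializing the result with JSON.stringify gives a manifest that kubectl apply -f accepts, since kubectl reads JSON as well as YAML.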
- Node.js 18+
- Docker & Docker Compose
- AWS CLI configured with appropriate permissions
- kubectl configured for EKS access
- Anthropic API key for Claude integration
If you want a zero-K8s quickstart, run the Local Demo:
# 1) Put keys in api/.env
# ANTHROPIC_API_KEY=...
# AWS_REGION=us-east-1
# (Optional) AWS creds if not using an instance role/SSO
# 2) Start DB + API
npm run up # docker compose up --build
# 3) Health check
curl http://localhost:8000/health
This runs Postgres + the API locally with RUNNER=local (configure in docker-compose.yml).
git clone <repo-url> altngc2
cd altngc2
npm install
Create an api/.env file:
# Claude API
ANTHROPIC_API_KEY=your_claude_api_key_here
CLAUDE_MODEL=claude-3-5-sonnet-latest
CLAUDE_TIMEOUT_MS=12000
CLAUDE_MAX_TOKENS=800
CLAUDE_TEMPERATURE=0
# AWS & Kubernetes
AWS_REGION=us-east-1
K8S_NAMESPACE=altngc
S3_BUCKET=altngc-artifacts-yourid
# Database
DATABASE_URL="postgresql://postgres:password@localhost:5432/altngc"
# API Server
PORT=8000
# Configure AWS credentials (choose one)
aws sso login --profile your-profile
export AWS_PROFILE=your-profile
# OR export access keys directly (temporary credentials also need AWS_SESSION_TOKEN)
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
# Connect to EKS cluster
aws eks update-kubeconfig --name altngc --region us-east-1
# Start database
docker compose up -d
# Start API server
cd api && npm run dev
# Start frontend (in another terminal)
cd web && npm run dev
kubectl apply -f infra/k8s/altngc-rbac.yaml
eksctl create cluster --name altngc --region us-east-1 --nodes 2 --node-type t3.small --with-oidc
aws eks update-kubeconfig --name altngc --region us-east-1
kubectl apply -f infra/k8s/altngc-rbac.yaml
eksctl create nodegroup --cluster altngc --name gpu-ng --region us-east-1 \
--node-type g4dn.xlarge --nodes 1 --managed \
--node-labels accelerator=nvidia \
--node-taints nvidia.com/gpu=true:NoSchedule
kubectl apply -n kube-system -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.16.2/nvidia-device-plugin.yml
curl http://localhost:8000/health
# Expected: {"api":"ok","db":"up","k8s":"up"}
curl -X POST http://localhost:8000/api/v1/auto/plan-and-run \
-H "Content-Type: application/json" \
-d '{"prompt":"echo hello from alpine"}'
curl -X POST http://localhost:8000/api/v1/auto/plan-and-run \
-H "Content-Type: application/json" \
-d '{"prompt":"Run nvidia-smi on 1 GPU"}'
curl -X POST http://localhost:8000/api/v1/uploads/presign \
-H "Content-Type: application/json" \
-d '{"key":"test.txt","contentType":"text/plain"}'
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Web Frontend   │───▶│   API Server    │───▶│   EKS Cluster   │
│  (React + TS)   │    │    (Fastify)    │    │   (CPU + GPU)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                      │
                                ▼                      ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │   Claude API    │    │   PostgreSQL    │
                       │   (Planning)    │    │   (Job State)   │
                       └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │    S3 Bucket    │
                       │   (Artifacts)   │
                       └─────────────────┘
- Image Allowlist: Only trusted container images allowed
- Resource Caps: CPU (16 cores), Memory (64GB), GPU (2 units)
- Cooldown: 5-second minimum between job submissions
- Timeout: 12-second Claude API timeout
- TTL: Kubernetes jobs auto-cleanup after 10 minutes
- Preflight: GPU availability checks before scheduling
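The caps and cooldown above could be enforced with checks along the following lines. This is a sketch under stated assumptions, not the project's actual validation code: the names checkCaps and cooldownRemainingMs, the request shape, and the error messages are all hypothetical. Only the numeric limits come from the list above.

```typescript
// Limits taken from the list above; everything else is illustrative.
interface ResourceRequest {
  cpuCores: number;
  memoryGB: number;
  gpus: number;
}

const CAPS: ResourceRequest = { cpuCores: 16, memoryGB: 64, gpus: 2 };
const COOLDOWN_MS = 5000;

// Returns one message per violated cap; an empty array means the request is allowed.
function checkCaps(req: ResourceRequest): string[] {
  const violations: string[] = [];
  if (req.cpuCores > CAPS.cpuCores) violations.push("cpu exceeds " + CAPS.cpuCores + " cores");
  if (req.memoryGB > CAPS.memoryGB) violations.push("memory exceeds " + CAPS.memoryGB + "GB");
  if (req.gpus > CAPS.gpus) violations.push("gpu exceeds " + CAPS.gpus + " units");
  return violations;
}

// Milliseconds a caller must still wait before the next submission (0 if none).
function cooldownRemainingMs(lastSubmitMs: number | null, nowMs: number): number {
  if (lastSubmitMs === null) return 0;
  return Math.max(0, COOLDOWN_MS - (nowMs - lastSubmitMs));
}
```

Returning every violation at once, rather than failing on the first, lets a client fix an over-sized request in one round trip.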
- Real-time job status via Server-Sent Events
- Pod log streaming
- Health checks for DB and K8s connectivity
- Comprehensive error handling and logging
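For clients that cannot use a browser EventSource, Server-Sent Events can be parsed by hand. The sketch below follows the standard SSE wire format (events separated by a blank line, field: value lines, comment lines starting with a colon). The event names and payload shape this API emits are not documented here, so the function treats data as an opaque string; parseSseText is a hypothetical name.

```typescript
interface SseEvent {
  event: string; // defaults to "message" when no event field is present
  data: string; // multiple data lines are joined with newlines
}

// Parses a complete chunk of SSE text into events.
// A streaming client would also need to buffer partial frames between reads.
function parseSseText(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.charAt(0) === ":") continue; // blank or comment line
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.charAt(0) === " ") value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Per the SSE format, a block with no data lines dispatches nothing.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```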
- Environment-based configuration (no hardcoded secrets)
- AWS credential chain (SSO/profiles preferred)
- RBAC for Kubernetes permissions
- Input validation and sanitization
- Rate limiting and resource caps
- POST /api/v1/auto/plan-and-run - Submit natural language job
- GET /api/v1/jobs/{id}/stream - Stream job status/logs (SSE)
- POST /api/v1/jobs/{id}/run - Manually trigger job execution
- GET /health - System health check
- POST /api/v1/uploads/presign - Get presigned upload URL
- GET /api/v1/projects - List projects
- POST /api/v1/projects - Create project
- GET /api/v1/projects/{id}/jobs - List project jobs
- GET /api/v1/jobs/{id} - Get job details
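A small client-side sketch of how these endpoints might be called from TypeScript. It only builds the request pieces, mirroring the curl examples in this README. The helper names planAndRunRequest and jobStreamUrl are hypothetical, and the response shapes are not documented here, so none are assumed.

```typescript
interface HttpRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Removes trailing slashes so paths can be appended safely.
function trimBase(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

// Request for POST /api/v1/auto/plan-and-run with a natural language prompt.
function planAndRunRequest(baseUrl: string, prompt: string): HttpRequest {
  return {
    url: trimBase(baseUrl) + "/api/v1/auto/plan-and-run",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  };
}

// URL for GET /api/v1/jobs/{id}/stream (Server-Sent Events).
function jobStreamUrl(baseUrl: string, jobId: string): string {
  return trimBase(baseUrl) + "/api/v1/jobs/" + encodeURIComponent(jobId) + "/stream";
}
```

On Node.js 18+ the result can be passed to the built-in fetch, for example fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).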
1. "No GPU nodes available"
- Ensure GPU nodegroup is created and NVIDIA device plugin is installed
- Check node labels: kubectl get nodes --show-labels
2. "Invalid JSON from model"
- Check Claude API key is valid
- Verify network connectivity to Anthropic API
3. "Job not found"
- Ensure job was created successfully in database
- Check job ID in the response
4. Kubernetes connection issues
- Verify kubectl get pods -n altngc works
- Check AWS credentials and kubeconfig
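When working through the issues above, the /health response is a quick first check. The sketch below compares a response body against the expected {"api":"ok","db":"up","k8s":"up"} and lists the components that differ. The values the API reports for a failing component are not documented here, so anything other than the expected value is treated as failing; failingComponents is a hypothetical name.

```typescript
// Expected values copied from the health check example in this README.
const EXPECTED_HEALTH: Record<string, string> = { api: "ok", db: "up", k8s: "up" };

// Returns the component names whose reported status differs from the expected one.
// An unparseable or non-object body counts as every component failing.
function failingComponents(body: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return Object.keys(EXPECTED_HEALTH);
  }
  if (typeof parsed !== "object" || parsed === null) {
    return Object.keys(EXPECTED_HEALTH);
  }
  const health = parsed as Record<string, unknown>;
  return Object.keys(EXPECTED_HEALTH).filter((key) => health[key] !== EXPECTED_HEALTH[key]);
}
```

A result containing k8s points at the kubeconfig and AWS credential checks in item 4, while db points at the Postgres container.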
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
MIT License - see LICENSE file for details
Built with ❤️ using Claude AI, React, Fastify, and Kubernetes