AltNGC2 - Claude-Powered EKS Job Runner

🚧 CLOSED BETA VERSION 🚧

A modern job execution platform that uses Claude AI to plan and execute workloads on AWS EKS with both CPU and GPU support.

Beta Notice: This is a closed beta release. For questions, issues, or feedback, please contact: [email protected]

🚀 Cloud Quickstart (One-liner EKS)

This project includes a cross-platform bootstrap that sets up an EKS cluster and deploys the API.

It will:

Prompt you for your AWS access keys, AWS region, and ANTHROPIC_API_KEY
Create/update an EKS cluster (CPU or GPU)
Configure NVIDIA device plugin as an EKS add-on (GPU mode)
Apply Kubernetes namespace + RBAC
Deploy the API (via k8s/api.yaml if present, or via Helm if you have a chart)

Prerequisites

AWS account with permissions for EKS, IAM, and EC2
AWS IAM Access Keys (Access Key ID + Secret Access Key)
Auto-installed by the bootstrap scripts:
- eksctl
- kubectl (Windows only; macOS/Linux users are assumed to have it already)
Manually install (one-time setup):
- aws (AWS CLI v2) - Install Guide
- helm - Install Guide
Your Anthropic API key

Windows (PowerShell)

# Run as Administrator (for tool installation)
.\scripts\bootstrap-eks.ps1

# Optional GPU mode:
.\scripts\bootstrap-eks.ps1 -GPU

# Pin specific eksctl version:
.\scripts\bootstrap-eks.ps1 -EksctlVersion 0.214.0

macOS / Linux (bash/zsh)

# From repo root
bash scripts/bootstrap-eks.sh

# Optional GPU mode:
GPU=1 bash scripts/bootstrap-eks.sh

# Pin specific eksctl version:
EKSCTL_VERSION=0.214.0 bash scripts/bootstrap-eks.sh

What the script does

Auto-installs missing tools:
- eksctl (via Chocolatey/Homebrew or direct download)
- kubectl (Windows only - assumes macOS/Linux users have it)
Prompts you for:
- Project name (default: altngc)
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_REGION (e.g., us-east-1)
- GPU mode (yes/no)
- ANTHROPIC_API_KEY (if not already set in your environment)
Creates or updates an EKS cluster:
- CPU: m5.large nodes
- GPU: g5.xlarge nodes + installs NVIDIA device plugin as an EKS add-on
Applies:
- Kubernetes namespace (matching your project name)
- RBAC from infra/k8s/altngc-rbac.yaml
Deploys your API:
- If k8s/api.yaml exists → kubectl apply
- Else prompts to deploy via Helm chart charts/api (you'll provide image:tag)
Prints cluster nodes and (if deployed) API pods.

After bootstrap

Check nodes and pods:

kubectl get nodes -o wide   # nodes are cluster-scoped, so no namespace flag is needed
kubectl -n <project> get pods -o wide

Expose the API via:
- A Service of type LoadBalancer, or
- An Ingress + Ingress Controller (e.g., AWS Load Balancer Controller)
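
As a rough illustration of the first option, the sketch below builds a Service of type LoadBalancer as a plain object and serializes it to JSON, which kubectl accepts alongside YAML. The `app` label, the `-lb` name suffix, and the assumption that the API pods listen on port 8000 are illustrative guesses, not values taken from this repository's manifests.

```typescript
// Build a Service of type LoadBalancer as a plain object.
// Assumptions (hypothetical): API pods are labeled app=<appLabel> and listen on 8000.
function buildLoadBalancerService(namespace: string, appLabel: string) {
  return {
    apiVersion: "v1",
    kind: "Service",
    metadata: { name: `${appLabel}-lb`, namespace },
    spec: {
      type: "LoadBalancer",
      selector: { app: appLabel },
      // Expose port 80 externally and forward to the container port.
      ports: [{ name: "http", port: 80, targetPort: 8000, protocol: "TCP" }],
    },
  };
}

// JSON form of the manifest, ready to be written to a file.
const manifestJson = JSON.stringify(buildLoadBalancerService("altngc", "api"), null, 2);
```

Saving `manifestJson` to a file such as service.json and running kubectl apply -f service.json would create the Service, provided the selector matches the labels on your API pods.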

Troubleshooting

Missing tools: Only aws and helm always need a manual install. eksctl is auto-installed everywhere; kubectl is auto-installed on Windows only, so install it yourself on macOS/Linux.
EKS auth: Ensure your AWS user/role can create/manage EKS and node groups.
GPU jobs pending: Confirm GPU node type and NVIDIA add-on are present:
```
kubectl -n kube-system get daemonset nvidia-device-plugin-daemonset
```
API not exposed: Add a Service (type LoadBalancer) or an Ingress manifest.

AWS Credentials

The bootstrap scripts do not require AWS SSO or named profiles.

Instead, you'll provide your classic IAM access keys:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_REGION

The bootstrap script will prompt you for these values on first run. They are exported only for the current session and passed directly to aws, eksctl, and kubectl.

Make sure the IAM user/role backing these keys has:

eks:*
ec2:*
iam:*
s3:*

(Or attach the AdministratorAccess policy for quickstarts.)
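
Since the three variables above must all be present before aws, eksctl, and kubectl can run, a small preflight check can catch a missing value early. The following is a minimal sketch of such a check; the function name `missingAwsVars` is hypothetical and is not part of the bootstrap scripts.

```typescript
// The three variables the bootstrap prompts for.
const REQUIRED_AWS_VARS = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"];

// Return the names of required variables that are unset or blank.
function missingAwsVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_AWS_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In Node.js you would call it as `missingAwsVars(process.env)` and stop with an error message if the returned list is not empty.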

🚀 Features

AI-Powered Planning: Claude translates natural language prompts into Kubernetes job specifications
EKS Integration: Runs jobs on AWS EKS with automatic scaling
GPU Support: Supports NVIDIA GPUs with proper node selection and tolerations
Safety First: Image allowlists, resource caps, cooldowns, and preflight checks
Real-time Streaming: Live job status and log streaming via Server-Sent Events
S3 Integration: Presigned URLs for secure file uploads/downloads
Modern UI: React + TypeScript frontend with real-time updates
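
To make the planning and GPU features more concrete, here is a minimal sketch of how a plan might be turned into a Kubernetes Job manifest. The `JobPlan` shape, the container name, and `backoffLimit: 0` are assumptions for illustration; the node label `accelerator=nvidia`, the `nvidia.com/gpu=true:NoSchedule` taint, and the 10-minute TTL come from elsewhere in this README. The actual planner output format may differ.

```typescript
// Hypothetical plan shape; the real planner output is not documented here.
interface JobPlan {
  name: string;
  image: string;
  command: string[];
  gpus: number; // 0 for CPU-only jobs
}

// Build a batch/v1 Job manifest as a plain object.
function buildJobManifest(plan: JobPlan, namespace: string) {
  const wantsGpu = plan.gpus > 0;
  return {
    apiVersion: "batch/v1",
    kind: "Job",
    metadata: { name: plan.name, namespace },
    spec: {
      ttlSecondsAfterFinished: 600, // clean up 10 minutes after the Job finishes
      backoffLimit: 0, // assumption: do not retry failed pods
      template: {
        spec: {
          restartPolicy: "Never",
          // GPU jobs target nodes labeled accelerator=nvidia ...
          nodeSelector: wantsGpu ? { accelerator: "nvidia" } : {},
          // ... and tolerate the nvidia.com/gpu=true:NoSchedule taint.
          tolerations: wantsGpu
            ? [{ key: "nvidia.com/gpu", operator: "Equal", value: "true", effect: "NoSchedule" }]
            : [],
          containers: [
            {
              name: "main",
              image: plan.image,
              command: plan.command,
              resources: wantsGpu ? { limits: { "nvidia.com/gpu": plan.gpus } } : {},
            },
          ],
        },
      },
    },
  };
}
```

A CPU-only plan (`gpus: 0`) produces an empty node selector and no tolerations, so it can schedule on any untainted node.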

📋 Prerequisites (Local Development)

Node.js 18+
Docker & Docker Compose
AWS CLI configured with appropriate permissions
kubectl configured for EKS access
Anthropic API key for Claude integration

⚡ Local Demo (No Kubernetes)

If you want a zero-K8s quickstart, run the Local Demo:

# 1) Put keys in api/.env
# ANTHROPIC_API_KEY=...
# AWS_REGION=us-east-1
# (Optional) AWS creds if not using an instance role/SSO

# 2) Start DB + API
npm run up   # docker compose up --build

# 3) Health check
curl http://localhost:8000/health

This runs Postgres + the API locally with RUNNER=local (configure in docker-compose.yml).

⚡ Manual Setup (Local Development)

1. Clone and Install Dependencies

git clone <repo-url> altngc2
cd altngc2
npm install

2. Set Up Environment

Create api/.env file:

# Claude API
ANTHROPIC_API_KEY=your_claude_api_key_here
CLAUDE_MODEL=claude-3-5-sonnet-latest
CLAUDE_TIMEOUT_MS=12000
CLAUDE_MAX_TOKENS=800
CLAUDE_TEMPERATURE=0

# AWS & Kubernetes
AWS_REGION=us-east-1
K8S_NAMESPACE=altngc
S3_BUCKET=altngc-artifacts-yourid

# Database
DATABASE_URL="postgresql://postgres:password@localhost:5432/altngc"

# API Server
PORT=8000
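
For orientation, the sketch below shows one way an API process could read these variables, with numeric values parsed and validated. The fallback values simply mirror the sample file above and are an assumption; the API's real defaults and loader are not shown in this README. S3_BUCKET and DATABASE_URL are left out for brevity.

```typescript
type Env = Record<string, string | undefined>;

interface ApiConfig {
  anthropicApiKey: string;
  claudeModel: string;
  claudeTimeoutMs: number;
  claudeMaxTokens: number;
  claudeTemperature: number;
  awsRegion: string;
  k8sNamespace: string;
  port: number;
}

// Parse a numeric variable, falling back when it is unset or blank.
function numberFrom(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!isFinite(value)) throw new Error(`${name} must be a number, got "${raw}"`);
  return value;
}

// Hypothetical loader; fallbacks mirror the sample api/.env values.
function loadConfig(env: Env): ApiConfig {
  const key = env["ANTHROPIC_API_KEY"];
  if (!key) throw new Error("ANTHROPIC_API_KEY is required");
  return {
    anthropicApiKey: key,
    claudeModel: env["CLAUDE_MODEL"] || "claude-3-5-sonnet-latest",
    claudeTimeoutMs: numberFrom(env, "CLAUDE_TIMEOUT_MS", 12000),
    claudeMaxTokens: numberFrom(env, "CLAUDE_MAX_TOKENS", 800),
    claudeTemperature: numberFrom(env, "CLAUDE_TEMPERATURE", 0),
    awsRegion: env["AWS_REGION"] || "us-east-1",
    k8sNamespace: env["K8S_NAMESPACE"] || "altngc",
    port: numberFrom(env, "PORT", 8000),
  };
}
```

In Node.js this would be called once at startup as `loadConfig(process.env)`, so a missing key or a malformed number fails fast instead of surfacing later as a request error.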

3. Set Up AWS & Kubernetes

# Configure AWS credentials (choose one)
aws sso login --profile your-profile
export AWS_PROFILE=your-profile

# OR export access keys directly (temporary credentials also need AWS_SESSION_TOKEN)
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret

# Connect to EKS cluster
aws eks update-kubeconfig --name altngc --region us-east-1

4. Start Services

# Start database
docker compose up -d

# Start API server
cd api && npm run dev

# Start frontend (in another terminal)
cd web && npm run dev

5. Deploy Kubernetes RBAC

kubectl apply -f infra/k8s/altngc-rbac.yaml

🏗️ EKS Cluster Setup

Create EKS Cluster (CPU)

eksctl create cluster --name altngc --region us-east-1 --nodes 2 --node-type t3.small --with-oidc
aws eks update-kubeconfig --name altngc --region us-east-1
kubectl apply -f infra/k8s/altngc-rbac.yaml

Add GPU Node Group (Optional)

eksctl create nodegroup --cluster altngc --name gpu-ng --region us-east-1 \
  --node-type g4dn.xlarge --nodes 1 --managed \
  --node-labels accelerator=nvidia \
  --node-taints nvidia.com/gpu=true:NoSchedule

kubectl apply -n kube-system -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.16.2/deployments/static/nvidia-device-plugin.yml

🧪 Testing

Health Check

curl http://localhost:8000/health
# Expected: {"api":"ok","db":"up","k8s":"up"}
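
If you script this check, it helps to report which component is failing rather than only comparing the whole body. The sketch below assumes the response always has the three fields shown in the expected output and that any value other than "ok" or "up" means a problem; both are assumptions about the API.

```typescript
// Shape of the /health response, based on the expected output above.
interface Health {
  api: string;
  db: string;
  k8s: string;
}

// Return the names of components that are not reporting healthy.
function unhealthyComponents(h: Health): string[] {
  const failing: string[] = [];
  if (h.api !== "ok") failing.push("api");
  if (h.db !== "up") failing.push("db");
  if (h.k8s !== "up") failing.push("k8s");
  return failing;
}
```

An empty result means everything is healthy; otherwise the returned names tell you whether to look at the database or the cluster connection first.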

CPU Job Test

curl -X POST http://localhost:8000/api/v1/auto/plan-and-run \
  -H "Content-Type: application/json" \
  -d '{"prompt":"echo hello from alpine"}'

GPU Job Test (requires GPU nodes)

curl -X POST http://localhost:8000/api/v1/auto/plan-and-run \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Run nvidia-smi on 1 GPU"}'
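
The same two requests can be issued from TypeScript. The sketch below only builds the request (URL, method, headers, body) so that it stays independent of any HTTP client; the helper name `planAndRunRequest` is hypothetical, and the response format is not documented here, so nothing is assumed about it.

```typescript
interface HttpRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Build the POST request for /api/v1/auto/plan-and-run.
function planAndRunRequest(baseUrl: string, prompt: string): HttpRequest {
  if (prompt.trim() === "") throw new Error("prompt must not be empty");
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: baseUrl.replace(/\/+$/, "") + "/api/v1/auto/plan-and-run",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  };
}

const cpuRequest = planAndRunRequest("http://localhost:8000", "echo hello from alpine");
```

On Node.js 18+ the result can be sent with the built-in fetch, for example `fetch(cpuRequest.url, { method: cpuRequest.method, headers: cpuRequest.headers, body: cpuRequest.body })`.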

S3 Presigned URL Test

curl -X POST http://localhost:8000/api/v1/uploads/presign \
  -H "Content-Type: application/json" \
  -d '{"key":"test.txt","contentType":"text/plain"}'

🔧 Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Web Frontend  │───▶│   API Server    │───▶│   EKS Cluster   │
│  (React + TS)   │    │   (Fastify)     │    │   (CPU + GPU)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                        │
                                ▼                        ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │   Claude API    │    │   PostgreSQL    │
                       │   (Planning)    │    │   (Job State)   │
                       └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │   S3 Bucket     │
                       │  (Artifacts)    │
                       └─────────────────┘

🛡️ Safety Features

Image Allowlist: Only container images on a trusted allowlist can run
Resource Caps: CPU (16 cores), memory (64 GB), GPU (2 units)
Cooldown: At least 5 seconds must pass between job submissions
Timeout: Claude API calls time out after 12 seconds
TTL: Finished Kubernetes Jobs are cleaned up automatically after 10 minutes
Preflight: GPU availability is checked before scheduling
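
A rough sketch of how the allowlist, resource caps, and cooldown could be enforced before a job is submitted is shown below. The `JobRequest` shape and the allowlist entries are hypothetical, and the image check is deliberately simple (it compares only the text before the first colon, so registries with a port number would need more care). The project's actual checks may be structured differently.

```typescript
// Hypothetical request shape for illustration.
interface JobRequest {
  image: string;
  cpuCores: number;
  memoryGb: number;
  gpus: number;
}

const CAPS = { cpuCores: 16, memoryGb: 64, gpus: 2 };
const COOLDOWN_MS = 5000;
// Hypothetical allowlist; the project's real list is not shown in this README.
const ALLOWED_IMAGES = ["alpine", "busybox"];

// Return a list of problems; an empty list means the request passes.
function checkJobRequest(req: JobRequest, lastSubmitMs: number | null, nowMs: number): string[] {
  const problems: string[] = [];
  // Compare the repository part only, ignoring a trailing ":tag".
  const repo = req.image.split(":")[0];
  if (ALLOWED_IMAGES.indexOf(repo) === -1) problems.push(`image not on allowlist: ${req.image}`);
  if (req.cpuCores > CAPS.cpuCores) problems.push(`cpu exceeds ${CAPS.cpuCores} cores`);
  if (req.memoryGb > CAPS.memoryGb) problems.push(`memory exceeds ${CAPS.memoryGb} GB`);
  if (req.gpus > CAPS.gpus) problems.push(`gpu count exceeds ${CAPS.gpus}`);
  if (lastSubmitMs !== null && nowMs - lastSubmitMs < COOLDOWN_MS) {
    problems.push(`cooldown: wait ${COOLDOWN_MS - (nowMs - lastSubmitMs)} ms`);
  }
  return problems;
}
```

Returning every problem at once, instead of failing on the first, lets the caller show the user a complete list of what to fix in a single response.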

📊 Monitoring

Real-time job status via Server-Sent Events
Pod log streaming
Health checks for DB and K8s connectivity
Comprehensive error handling and logging

🔒 Security

Environment-based configuration (no hardcoded secrets)
AWS credential chain (SSO/profiles preferred)
RBAC for Kubernetes permissions
Input validation and sanitization
Rate limiting and resource caps

📚 API Endpoints

Core Endpoints

POST /api/v1/auto/plan-and-run - Submit natural language job
GET /api/v1/jobs/{id}/stream - Stream job status/logs (SSE)
POST /api/v1/jobs/{id}/run - Manually trigger job execution
GET /health - System health check
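
The stream endpoint uses Server-Sent Events, a line-based text format in which events are separated by blank lines. The sketch below parses a complete chunk of SSE text into events; it handles `event:` and `data:` fields and comment lines, and ignores `id:` and `retry:`. The event name used in the usage note is made up, since this README does not list the event names the API emits.

```typescript
interface SseEvent {
  event: string; // "message" when no event: field is present
  data: string;
}

// Parse a complete block of SSE text into events.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line.
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      // Skip empty lines and comments (lines starting with ":").
      if (line === "" || line.charAt(0) === ":") continue;
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      // A single leading space after the colon is not part of the value.
      if (value.charAt(0) === " ") value = value.slice(1);
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Multiple data: lines in one event are joined with newlines.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

For example, the text "event: status\ndata: running\n\n" yields one event named "status" with data "running". A real streaming client would also need to buffer partial chunks until a blank line arrives, which this sketch leaves out.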

S3 Integration

POST /api/v1/uploads/presign - Get presigned upload URL

Standard CRUD

GET /api/v1/projects - List projects
POST /api/v1/projects - Create project
GET /api/v1/projects/{id}/jobs - List project jobs
GET /api/v1/jobs/{id} - Get job details

🚨 Troubleshooting

Common Issues

1. "No GPU nodes available"

Ensure GPU nodegroup is created and NVIDIA device plugin is installed
Check node labels: kubectl get nodes --show-labels

2. "Invalid JSON from model"

Check Claude API key is valid
Verify network connectivity to Anthropic API

3. "Job not found"

Ensure job was created successfully in database
Check job ID in the response

4. Kubernetes connection issues

Verify kubectl get pods -n altngc works
Check AWS credentials and kubeconfig

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

📄 License

MIT License - see LICENSE file for details

Built with ❤️ using Claude AI, React, Fastify, and Kubernetes
