ave2400/PromptOpts

AltNGC2 - Claude-Powered EKS Job Runner

🚧 CLOSED BETA VERSION 🚧

A modern job execution platform that uses Claude AI to plan and execute workloads on AWS EKS with both CPU and GPU support.

Beta Notice: This is a closed beta release. For questions, issues, or feedback, please contact: [email protected]

🚀 Cloud Quickstart (One-liner EKS)

This project includes a cross-platform bootstrap that sets up an EKS cluster and deploys the API.

It will:

  • Prompt you for your AWS access keys, AWS Region, and ANTHROPIC_API_KEY
  • Create/update an EKS cluster (CPU or GPU)
  • Configure NVIDIA device plugin as an EKS add-on (GPU mode)
  • Apply Kubernetes namespace + RBAC
  • Deploy the API (via k8s/api.yaml if present, or via Helm if you have a chart)

Prerequisites

  • AWS account with permissions for EKS, IAM, and EC2
  • AWS IAM Access Keys (Access Key ID + Secret Access Key)
  • Auto-installed by the bootstrap scripts:
    • eksctl
    • kubectl (Windows only; the macOS/Linux script assumes it is already installed)
  • Manually install (one-time setup):
    • AWS CLI (aws)
    • Helm (helm)
  • Your Anthropic API key

Windows (PowerShell)

# Run as Administrator (for tool installation)
.\scripts\bootstrap-eks.ps1

# Optional GPU mode:
.\scripts\bootstrap-eks.ps1 -GPU

# Pin specific eksctl version:
.\scripts\bootstrap-eks.ps1 -EksctlVersion 0.214.0

macOS / Linux (bash/zsh)

# From repo root
bash scripts/bootstrap-eks.sh

# Optional GPU mode:
GPU=1 bash scripts/bootstrap-eks.sh

# Pin specific eksctl version:
EKSCTL_VERSION=0.214.0 bash scripts/bootstrap-eks.sh

What the script does

  1. Auto-installs missing tools:

    • eksctl (via Chocolatey/Homebrew or direct download)
    • kubectl (Windows only - assumes macOS/Linux users have it)
  2. Prompts you for:

    • Project name (default: altngc)
    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY
    • AWS_REGION (e.g., us-east-1)
    • GPU mode (yes/no)
    • ANTHROPIC_API_KEY (if not already set in your environment)
  3. Creates or updates an EKS cluster:

    • CPU: m5.large nodes
    • GPU: g5.xlarge nodes + installs NVIDIA device plugin as an EKS add-on
  4. Applies:

    • Kubernetes namespace (matching your project name)
    • RBAC from infra/k8s/altngc-rbac.yaml
  5. Deploys your API:

    • If k8s/api.yaml exists → kubectl apply
    • Else prompts to deploy via Helm chart charts/api (you'll provide image:tag)
  6. Prints cluster nodes and (if deployed) API pods.
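
The branching in the steps above can be summarized as a small decision function. This is only an illustrative sketch, not the bootstrap script itself: the names `BootstrapAnswers`, `BootstrapPlan`, and `planBootstrap` are hypothetical, while the instance types and deployment paths are the ones listed above.

```typescript
// Hypothetical types; the real logic lives in bootstrap-eks.ps1 / bootstrap-eks.sh.
interface BootstrapAnswers {
  project: string;         // project name, also used as the namespace
  gpu: boolean;            // GPU mode (yes/no)
  hasApiManifest: boolean; // whether k8s/api.yaml exists
}

interface BootstrapPlan {
  nodeType: "m5.large" | "g5.xlarge";
  installNvidiaPlugin: boolean;
  namespace: string;
  deploy: "kubectl-apply" | "helm-prompt";
}

function planBootstrap(answers: BootstrapAnswers): BootstrapPlan {
  return {
    // CPU clusters use m5.large nodes, GPU clusters use g5.xlarge nodes.
    nodeType: answers.gpu ? "g5.xlarge" : "m5.large",
    // The NVIDIA device plugin is only installed in GPU mode.
    installNvidiaPlugin: answers.gpu,
    // The namespace matches the project name.
    namespace: answers.project,
    // k8s/api.yaml wins if present; otherwise the script offers a Helm deploy.
    deploy: answers.hasApiManifest ? "kubectl-apply" : "helm-prompt",
  };
}
```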

After bootstrap

  • Check nodes and pods:
    kubectl get nodes -o wide
    kubectl -n <project> get pods -o wide
  • Expose the API via:
    • A Service of type LoadBalancer, or
    • An Ingress + Ingress Controller (e.g., AWS Load Balancer Controller)

Troubleshooting

  • Missing tools: Only aws and helm need manual install. eksctl is auto-installed, and kubectl is auto-installed on Windows.
  • EKS auth: Ensure your AWS user/role can create/manage EKS and node groups.
  • GPU jobs pending: Confirm GPU node type and NVIDIA add-on are present:
    kubectl -n kube-system get daemonset nvidia-device-plugin-daemonset
  • API not exposed: Add a Service (type LoadBalancer) or an Ingress manifest.

AWS Credentials

The bootstrap scripts do not require AWS SSO or profiles.

Instead, you'll provide your classic IAM access keys:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_REGION

The bootstrap script will prompt you for these values on first run. They are exported only for the current session and passed directly to aws, eksctl, and kubectl.

Make sure the IAM user/role backing these keys has:

  • eks:*
  • ec2:*
  • iam:*
  • s3:*
  • cloudformation:* (eksctl provisions clusters and node groups through CloudFormation stacks)

(Or attach the AdministratorAccess policy for quickstarts.)
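
As a rough illustration of the "prompt on first run" behavior, the sketch below reports which of the three variables still need a value. The constant and function names are hypothetical; only the variable names come from this README.

```typescript
// The three values the bootstrap script asks for.
const REQUIRED_AWS_VARS = [
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
] as const;

// Returns the names that are unset or blank in the given environment map.
function missingAwsVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_AWS_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js script this could be called as `missingAwsVars(process.env)`, prompting only for the names it returns.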


🚀 Features

  • AI-Powered Planning: Claude translates natural language prompts into Kubernetes job specifications
  • EKS Integration: Runs jobs on AWS EKS with automatic scaling
  • GPU Support: Supports NVIDIA GPUs with proper node selection and tolerations
  • Safety First: Image allowlists, resource caps, cooldowns, and preflight checks
  • Real-time Streaming: Live job status and log streaming via Server-Sent Events
  • S3 Integration: Presigned URLs for secure file uploads/downloads
  • Modern UI: React + TypeScript frontend with real-time updates

📋 Prerequisites

  • Node.js 18+
  • Docker & Docker Compose
  • AWS CLI configured with appropriate permissions
  • kubectl configured for EKS access
  • Anthropic API key for Claude integration

⚡ Local Demo (No Kubernetes)

If you want a zero-K8s quickstart, run the Local Demo:

# 1) Put keys in api/.env
# ANTHROPIC_API_KEY=...
# AWS_REGION=us-east-1
# (Optional) AWS creds if not using an instance role/SSO

# 2) Start DB + API
npm run up   # docker compose up --build

# 3) Health check
curl http://localhost:8000/health

This runs Postgres + the API locally with RUNNER=local (configure in docker-compose.yml).


⚡ Manual Setup (Local Development)

1. Clone and Install Dependencies

git clone <repo-url> altngc2
cd altngc2
npm install

2. Set Up Environment

Create api/.env file:

# Claude API
ANTHROPIC_API_KEY=your_claude_api_key_here
CLAUDE_MODEL=claude-3-5-sonnet-latest
CLAUDE_TIMEOUT_MS=12000
CLAUDE_MAX_TOKENS=800
CLAUDE_TEMPERATURE=0

# AWS & Kubernetes
AWS_REGION=us-east-1
K8S_NAMESPACE=altngc
S3_BUCKET=altngc-artifacts-yourid

# Database
DATABASE_URL="postgresql://postgres:password@localhost:5432/altngc"

# API Server
PORT=8000
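
A typed loader makes it easier to catch a missing or malformed value at startup. The sketch below is one possible shape, not the project's actual config code: `ApiConfig`, `loadConfig`, and the choice of which variables are required versus defaulted are assumptions; the variable names and default values mirror the sample file above.

```typescript
type Env = Record<string, string | undefined>;

interface ApiConfig {
  anthropicApiKey: string;
  claudeModel: string;
  claudeTimeoutMs: number;
  claudeMaxTokens: number;
  claudeTemperature: number;
  awsRegion: string;
  k8sNamespace: string;
  s3Bucket: string;
  databaseUrl: string;
  port: number;
}

// Reads a variable that has no sensible default and fails fast if it is absent.
function required(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Reads a numeric variable, falling back to a default when it is unset.
function numberOr(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) {
    throw new Error(`${name} must be a number, got "${raw}"`);
  }
  return parsed;
}

function loadConfig(env: Env): ApiConfig {
  return {
    anthropicApiKey: required(env, "ANTHROPIC_API_KEY"),
    claudeModel: env["CLAUDE_MODEL"] ?? "claude-3-5-sonnet-latest",
    claudeTimeoutMs: numberOr(env, "CLAUDE_TIMEOUT_MS", 12000),
    claudeMaxTokens: numberOr(env, "CLAUDE_MAX_TOKENS", 800),
    claudeTemperature: numberOr(env, "CLAUDE_TEMPERATURE", 0),
    awsRegion: env["AWS_REGION"] ?? "us-east-1",
    k8sNamespace: env["K8S_NAMESPACE"] ?? "altngc",
    s3Bucket: required(env, "S3_BUCKET"),
    databaseUrl: required(env, "DATABASE_URL"),
    port: numberOr(env, "PORT", 8000),
  };
}
```

With a dotenv-style loader in place, this would typically be invoked once at startup as `loadConfig(process.env)`.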

3. Set Up AWS & Kubernetes

# Configure AWS credentials (choose one)
aws sso login --profile your-profile
export AWS_PROFILE=your-profile

# OR export IAM access keys directly
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
# Temporary (STS) credentials also need: export AWS_SESSION_TOKEN=your_token

# Connect to EKS cluster
aws eks update-kubeconfig --name altngc --region us-east-1

4. Start Services

# Start database
docker compose up -d

# Start API server
cd api && npm run dev

# Start frontend (in another terminal)
cd web && npm run dev

5. Deploy Kubernetes RBAC

kubectl apply -f infra/k8s/altngc-rbac.yaml

🏗️ EKS Cluster Setup

Create EKS Cluster (CPU)

eksctl create cluster --name altngc --region us-east-1 --nodes 2 --node-type t3.small --with-oidc
aws eks update-kubeconfig --name altngc --region us-east-1
kubectl apply -f infra/k8s/altngc-rbac.yaml

Add GPU Node Group (Optional)

eksctl create nodegroup --cluster altngc --name gpu-ng --region us-east-1 \
  --node-type g4dn.xlarge --nodes 1 --managed \
  --node-labels accelerator=nvidia \
  --node-taints nvidia.com/gpu=true:NoSchedule

kubectl apply -n kube-system -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.16.2/nvidia-device-plugin.yml
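
Because the GPU node group above is labeled `accelerator=nvidia` and tainted `nvidia.com/gpu=true:NoSchedule`, a GPU job only lands there if its pod spec carries a matching node selector and toleration. The sketch below shows what such a Job manifest could look like when built in TypeScript. `JobRequest` and `buildJobManifest` are hypothetical names, `backoffLimit: 0` is an assumption, and the 600-second TTL reflects the 10-minute cleanup described under Safety Features.

```typescript
// Hypothetical request shape; the API's real planner output may differ.
interface JobRequest {
  name: string;
  image: string;
  command: string[];
  gpus: number; // 0 for a CPU-only job
}

function buildJobManifest(req: JobRequest, namespace: string = "altngc") {
  const wantsGpu = req.gpus > 0;
  return {
    apiVersion: "batch/v1",
    kind: "Job",
    metadata: { name: req.name, namespace },
    spec: {
      ttlSecondsAfterFinished: 600, // clean up 10 minutes after completion
      backoffLimit: 0, // assumption: do not retry failed pods
      template: {
        spec: {
          restartPolicy: "Never",
          // Matches --node-labels accelerator=nvidia
          nodeSelector: wantsGpu ? { accelerator: "nvidia" } : undefined,
          // Matches --node-taints nvidia.com/gpu=true:NoSchedule
          tolerations: wantsGpu
            ? [{ key: "nvidia.com/gpu", operator: "Equal", value: "true", effect: "NoSchedule" }]
            : undefined,
          containers: [
            {
              name: "main",
              image: req.image,
              command: req.command,
              // The NVIDIA device plugin exposes GPUs as the nvidia.com/gpu resource.
              resources: wantsGpu ? { limits: { "nvidia.com/gpu": req.gpus } } : undefined,
            },
          ],
        },
      },
    },
  };
}
```

Fields set to `undefined` are dropped by `JSON.stringify`, so a CPU-only job serializes without a node selector, tolerations, or GPU limits.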

🧪 Testing

Health Check

curl http://localhost:8000/health
# Expected: {"api":"ok","db":"up","k8s":"up"}
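
For scripted checks it can help to turn the response into a list of failing components. The sketch below assumes the body has exactly the three fields shown in the expected output and treats any value other than "ok" (api) or "up" (db, k8s) as unhealthy; the function name is hypothetical and the values reported on failure are not documented here.

```typescript
interface HealthReport {
  api: string;
  db: string;
  k8s: string;
}

// Returns the names of components that are not in their expected healthy state.
function unhealthyComponents(body: string): string[] {
  const report = JSON.parse(body) as Partial<HealthReport>;
  const problems: string[] = [];
  if (report.api !== "ok") problems.push("api");
  if (report.db !== "up") problems.push("db");
  if (report.k8s !== "up") problems.push("k8s");
  return problems;
}
```

An empty array means all three checks passed; a missing field is reported as unhealthy.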

CPU Job Test

curl -X POST http://localhost:8000/api/v1/auto/plan-and-run \
  -H "Content-Type: application/json" \
  -d '{"prompt":"echo hello from alpine"}'

GPU Job Test (requires GPU nodes)

curl -X POST http://localhost:8000/api/v1/auto/plan-and-run \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Run nvidia-smi on 1 GPU"}'
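
The same two requests can be issued from TypeScript. The sketch below only builds the request, mirroring the curl flags above, and leaves sending it to the caller because the response shape is not documented in this README. `HttpRequest` and `planAndRunRequest` are hypothetical names.

```typescript
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the POST request for /api/v1/auto/plan-and-run.
function planAndRunRequest(prompt: string, baseUrl: string = "http://localhost:8000"): HttpRequest {
  if (prompt.trim() === "") {
    throw new Error("prompt must not be empty");
  }
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/auto/plan-and-run`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  };
}

// Example (Node.js 18+ provides a global fetch):
//   const req = planAndRunRequest("Run nvidia-smi on 1 GPU");
//   const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```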

S3 Presigned URL Test

curl -X POST http://localhost:8000/api/v1/uploads/presign \
  -H "Content-Type: application/json" \
  -d '{"key":"test.txt","contentType":"text/plain"}'

🔧 Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Web Frontend  │───▶│   API Server    │───▶│   EKS Cluster   │
│  (React + TS)   │    │   (Fastify)     │    │   (CPU + GPU)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                        │
                                ▼                        ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │   Claude API    │    │   PostgreSQL    │
                       │   (Planning)    │    │   (Job State)   │
                       └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │   S3 Bucket     │
                       │  (Artifacts)    │
                       └─────────────────┘

🛡️ Safety Features

  • Image Allowlist: Only trusted container images allowed
  • Resource Caps: CPU (16 cores), Memory (64GB), GPU (2 units)
  • Cooldown: 5-second minimum between job submissions
  • Timeout: 12-second Claude API timeout
  • TTL: Kubernetes jobs auto-cleanup after 10 minutes
  • Preflight: GPU availability checks before scheduling
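
To make the limits above concrete, here is a minimal sketch of a pre-submission check covering the allowlist, the resource caps, and the cooldown. It is not the project's validator: the type and function names are hypothetical, and the allowlist entries are placeholders, since the real list is not given in this README.

```typescript
interface JobResources {
  image: string;
  cpuCores: number;
  memoryGb: number;
  gpus: number;
}

// Caps and cooldown as stated in the list above.
const CAPS = { cpuCores: 16, memoryGb: 64, gpus: 2 };
const COOLDOWN_MS = 5000;

// Placeholder allowlist; the real one is an assumption.
const ALLOWED_IMAGES: ReadonlySet<string> = new Set([
  "alpine:3.20",
  "nvidia/cuda:12.4.1-base-ubuntu22.04",
]);

// Returns a list of violations; an empty list means the job may be submitted.
function validateJob(job: JobResources, nowMs: number, lastSubmitMs: number | null): string[] {
  const errors: string[] = [];
  if (!ALLOWED_IMAGES.has(job.image)) {
    errors.push(`image not on allowlist: ${job.image}`);
  }
  if (job.cpuCores > CAPS.cpuCores) {
    errors.push(`cpu ${job.cpuCores} exceeds cap of ${CAPS.cpuCores} cores`);
  }
  if (job.memoryGb > CAPS.memoryGb) {
    errors.push(`memory ${job.memoryGb}GB exceeds cap of ${CAPS.memoryGb}GB`);
  }
  if (job.gpus > CAPS.gpus) {
    errors.push(`gpus ${job.gpus} exceeds cap of ${CAPS.gpus}`);
  }
  if (lastSubmitMs !== null && nowMs - lastSubmitMs < COOLDOWN_MS) {
    errors.push(`cooldown: wait ${COOLDOWN_MS - (nowMs - lastSubmitMs)}ms before resubmitting`);
  }
  return errors;
}
```

Collecting every violation, instead of failing on the first, lets the caller show the user all problems in one response.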

📊 Monitoring

  • Real-time job status via Server-Sent Events
  • Pod log streaming
  • Health checks for DB and K8s connectivity
  • Comprehensive error handling and logging

🔒 Security

  • Environment-based configuration (no hardcoded secrets)
  • AWS credential chain (SSO/profiles preferred)
  • RBAC for Kubernetes permissions
  • Input validation and sanitization
  • Rate limiting and resource caps

📚 API Endpoints

Core Endpoints

  • POST /api/v1/auto/plan-and-run - Submit natural language job
  • GET /api/v1/jobs/{id}/stream - Stream job status/logs (SSE)
  • POST /api/v1/jobs/{id}/run - Manually trigger job execution
  • GET /health - System health check
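
The stream endpoint uses Server-Sent Events, a plain-text format in which events are separated by blank lines and each event carries `event:` and `data:` fields. The sketch below parses a complete chunk of such text; it follows the standard SSE framing, but the event names this API actually emits are not documented here, so "status" in any example is only an assumption. A production client would also need to buffer partial events across network chunks.

```typescript
interface SseEvent {
  event: string; // defaults to "message" when no event field is present
  data: string;
}

// Parses a block of SSE text that contains only complete events.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      // Blank lines and comment lines (often used as keep-alives) carry no fields.
      if (line === "" || line.startsWith(":")) continue;
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      // A single leading space after the colon is not part of the value.
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Events without any data field are not dispatched.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

In a browser, the built-in `EventSource` handles this framing automatically; a manual parser like this is mainly useful when reading the stream with `fetch` or in tests.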

S3 Integration

  • POST /api/v1/uploads/presign - Get presigned upload URL

Standard CRUD

  • GET /api/v1/projects - List projects
  • POST /api/v1/projects - Create project
  • GET /api/v1/projects/{id}/jobs - List project jobs
  • GET /api/v1/jobs/{id} - Get job details

🚨 Troubleshooting

Common Issues

1. "No GPU nodes available"

  • Ensure GPU nodegroup is created and NVIDIA device plugin is installed
  • Check node labels: kubectl get nodes --show-labels

2. "Invalid JSON from model"

  • Check Claude API key is valid
  • Verify network connectivity to Anthropic API
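
Beyond key and network problems, a plausible cause of this error (an assumption, since the API's parsing code is not shown here) is a model reply that wraps the JSON in extra prose, or one that is cut off by a low CLAUDE_MAX_TOKENS value. The sketch below shows one tolerant way to pull a JSON object out of a reply; `extractJsonObject` is a hypothetical helper, not part of the project.

```typescript
// Extracts and parses the outermost {...} span from a model reply.
function extractJsonObject(reply: string): unknown {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("Invalid JSON from model: no JSON object found");
  }
  try {
    return JSON.parse(reply.slice(start, end + 1));
  } catch (err) {
    // Typical for truncated replies, where the closing braces never arrive.
    throw new Error("Invalid JSON from model: object could not be parsed");
  }
}
```

If replies are consistently truncated, raising CLAUDE_MAX_TOKENS is likely a better fix than more lenient parsing.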

3. "Job not found"

  • Ensure job was created successfully in database
  • Check job ID in the response

4. Kubernetes connection issues

  • Verify kubectl get pods -n altngc works
  • Check AWS credentials and kubeconfig

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details


Built with ❤️ using Claude AI, React, Fastify, and Kubernetes
