Deploy Qwen2.5 0.5B Large Language Model on Amazon EKS using vLLM, Terraform, and Helm with cost-effective t3.micro spot instances.
This project provides a minimal, cost-effective setup to run Qwen2.5 0.5B on AWS EKS:
- Model: Qwen2.5-0.5B-Instruct (CPU-only, no GPU required)
- Infrastructure: EKS cluster with Karpenter autoscaling
- Compute: t3.micro spot instances (~$0.005/hour)
- Automation: Infrastructure as Code with Terraform and Helm
Karpenter is deployed as a Kubernetes controller that observes unschedulable pods and provisions optimal EC2 instances. Unlike Cluster Autoscaler or managed node groups, Karpenter evaluates instance types dynamically and can provision nodes in <30 seconds without ASG warm-up delays.
Key Advantages:
- Right-sizing: Evaluates all compatible instance types, selects cheapest fit
- Spot-first: Prefers Spot capacity (up to 90% savings) with On-Demand fallback
- Zero-min scaling: Can scale to zero worker nodes, unlike managed node groups
- Fast provisioning: Direct EC2 API calls, no ASG lifecycle hooks
- Consolidation: Actively binpacks and terminates underutilized nodes
Location: eks/modules/eks/addons.tf
```hcl
enable_karpenter = true

karpenter = {
  chart_version       = "1.1.2"
  repository_username = data.aws_ecrpublic_authorization_token.token.user_name
  repository_password = data.aws_ecrpublic_authorization_token.token.password
}
```

Deployment Details:
- Chart: Deployed via Helm using the `aws-ia/eks-blueprints-addons` module
- Namespace: `karpenter` (created by the Helm chart)
- Scheduling: Controller runs on Fargate (not on Karpenter-managed nodes)
- Image: Public ECR (`public.ecr.aws/karpenter/karpenter`); requires a `us-east-1` provider for the auth token
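ECR Public only issues auth tokens from `us-east-1`, regardless of where the cluster runs, which is why the module wires in a dedicated provider. As a rough sketch of what that token is equivalent to (a hypothetical manual login, not something the Terraform module requires you to run):

```bash
# Hypothetical manual equivalent of the data source: ECR Public only issues
# auth tokens in us-east-1, regardless of the cluster's region.
aws ecr-public get-login-password --region us-east-1 | \
  helm registry login public.ecr.aws --username AWS --password-stdin
```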
Controller IAM Role (created by aws-ia/eks-blueprints-addons):
- Trust: IRSA (IAM Roles for Service Accounts) via OIDC provider
- Permissions: EC2 instance creation, termination, Describe APIs, Launch Templates
- Access Entry: `aws_eks_access_entry.karpenter_node_access_entry` grants cluster API access
Node IAM Role (karpenter_node configuration):
- Base Policy: Standard EKS node policy (EKS CNI, container registry pull, CloudWatch logs)
- Additional Policies: `AmazonSSMManagedInstanceCore` (SSM Session Manager access)
- Role Name: Uses the `cluster_name` prefix (no random suffix per config)
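To sanity-check the IRSA wiring after deployment, you can read the role ARN annotation off the controller's service account (assuming the chart's default service account name, `karpenter`):

```bash
# Show the IAM role ARN bound to the Karpenter controller via IRSA
kubectl get serviceaccount karpenter -n karpenter \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
```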
Purpose: Defines EC2 instance configuration (AMI, subnet, security groups, IAM role, tags)
Location: eks/modules/eks/karpenter/default-nodeclass.yaml
```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${cluster_name}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${cluster_name}
```

Key Fields:
- `amiSelectorTerms`: Dynamic AMI selection (latest AL2 alias)
- `subnetSelectorTerms`: Tag-based subnet discovery (private subnets only)
- `securityGroupSelectorTerms`: Tag-based security group discovery
- `role`: References the `karpenter_node` IAM role name
Purpose: Defines scheduling constraints, instance requirements, limits, and disruption policies
Location: eks/modules/eks/karpenter/freetier-nodepool.yaml
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: freetier-cpu
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.k8s.aws/instance-type
          operator: In
          values: ["t3.micro"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  limits:
    cpu: 1 # Maximum total CPU across all nodes in this pool
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
```

Configuration Details:
| Field | Value | Rationale |
|---|---|---|
| `instance-type` | `t3.micro` | Free-tier eligible, 2 vCPU, 1 GB RAM |
| `capacity-type` | `spot` | Up to 90% cost savings vs On-Demand |
| `arch` | `amd64` | x86_64 architecture |
| `limits.cpu` | `1` | Prevents the cluster from scaling beyond budget |
| `consolidationPolicy` | `WhenEmpty` | Only consolidates when a node is empty (safer for development) |
| `consolidateAfter` | `30s` | Quick scale-down to minimize idle costs |
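Actual Spot savings fluctuate by region and time. If you want to check the current t3.micro Spot price before relying on the estimate above, a quick AWS CLI query (the region shown is an example) looks like:

```bash
# Recent t3.micro Spot prices (adjust --region to match your deployment)
aws ec2 describe-spot-price-history \
  --instance-types t3.micro \
  --product-descriptions "Linux/UNIX" \
  --max-items 5 \
  --region us-west-1
```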
```mermaid
sequenceDiagram
    participant Pod as Pod (Pending)
    participant Scheduler as kube-scheduler
    participant Controller as Karpenter Controller
    participant NodePool as NodePool CR
    participant EC2Class as EC2NodeClass CR
    participant EC2API as AWS EC2 API
    participant Node as EC2 Instance

    Pod->>Scheduler: Create/Pending
    Scheduler->>Scheduler: Evaluate nodeSelector/taints/affinity
    Scheduler->>Pod: Mark Unschedulable (NoNodes)
    Controller->>Pod: Watch Event (PodUnschedulable)
    Controller->>NodePool: List matching NodePools
    NodePool->>Controller: Return requirements/limits
    Controller->>EC2Class: Get subnet/SG/AMI config
    EC2Class->>Controller: Return selector terms
    Controller->>EC2API: Evaluate instance types (price/compatibility)
    Controller->>EC2API: CreateFleet/LaunchInstance (Spot preferred)
    EC2API->>Node: Instance Launching
    Node->>Scheduler: Node Ready (kubelet registered)
    Scheduler->>Pod: Bind to Node
    Pod->>Pod: Running
```
Technical Steps:
1. Pod Creation: Pod resource created with `nodeSelector: {instanceType: freetier}` or matching taints
2. Scheduler Evaluation: kube-scheduler evaluates existing nodes, finds none match -> `Unschedulable`
3. Controller Watch: Karpenter watches Pod events via informer, detects `Unschedulable` with reason `NoNodes`
4. NodePool Matching: Controller evaluates NodePool `requirements` against pod `requests`/`nodeSelector`
5. Instance Selection:
   - Queries EC2 pricing/availability APIs
   - Filters by `requirements` (instance-type, capacity-type, arch, zone)
   - Selects the cheapest compatible instance (Spot if available)
6. EC2 Provisioning:
   - Uses `EC2NodeClass` to resolve subnet/SG/AMI
   - Calls `RunInstances` or `CreateFleet` with user-data for the bootstrap script
   - Tags the instance with `karpenter.sh/nodepool`, `karpenter.sh/discovery`
7. Node Registration: Bootstrap script installs kubelet, joins the cluster via the EKS API
8. Pod Binding: Scheduler assigns the pod to the new node, kubelet starts the container
Provisioning Time: Typically 20-30 seconds from pending pod to running (faster than ASG warm-up ~2-5 minutes)
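To see this flow end to end, you can create a throwaway pod that targets the pool via the `karpenter.sh/nodepool` node label and watch Karpenter bring up a node for it. The pod below is purely illustrative and not part of this project's manifests:

```bash
# Throwaway pod pinned to the freetier-cpu pool via the node label Karpenter
# sets on nodes it provisions; it stays Pending until a matching node exists.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: karpenter-test
spec:
  nodeSelector:
    karpenter.sh/nodepool: freetier-cpu
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
      resources:
        requests:
          cpu: 200m
          memory: 64Mi
EOF

# Watch the NodeClaim appear (Ctrl+C to stop), then clean up
kubectl get nodeclaims -w
kubectl delete pod karpenter-test
```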
Consolidation Policy: WhenEmpty
- Only consolidates nodes with zero pods (safer for workloads with no PDB)
- After the `consolidateAfter` duration, evicts pods, drains the node, and terminates the instance
Alternative Policies:
- `WhenEmptyOrUnderutilized` (named `WhenUnderutilized` before Karpenter v1): Consolidates even with pods present (requires a PodDisruptionBudget)
- `Never`: Disables consolidation (not recommended for cost savings)
Interruption Handling:
- Spot interruptions trigger `NodeClaim` deletion → pods rescheduled → new node provisioned
- PodDisruptionBudget protects against voluntary evictions (consolidation), not Spot interruptions
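If you later scale the model to multiple replicas, a minimal PodDisruptionBudget keeps consolidation from evicting everything at once. The sketch below assumes the workload labels and namespace used later in this README (`app=qwen2.5-0.5b` in namespace `qwen2.5-0.5b`):

```bash
# Minimal PDB sketch: keeps at least one pod available during voluntary
# disruptions such as Karpenter consolidation (does not stop Spot reclaims).
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: qwen-pdb
  namespace: qwen2.5-0.5b
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: qwen2.5-0.5b
EOF
```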
Subnet Discovery:
- Subnets must be tagged: `karpenter.sh/discovery: <cluster_name>`
- Terraform tags private subnets automatically
- Karpenter provisions nodes in discovered subnets across AZs for HA
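You can confirm which subnets Karpenter will discover by querying the tag directly (replace `my-qwen-cluster` with your cluster name):

```bash
# List the subnets carrying the Karpenter discovery tag for this cluster
aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=my-qwen-cluster" \
  --query "Subnets[].{Id:SubnetId,Az:AvailabilityZone,Cidr:CidrBlock}" \
  --output table
```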
Security Groups:
- Tagged: `karpenter.sh/discovery: <cluster_name>`
- Uses the cluster security group + node security group (if created)
- Allows traffic: control plane ↔ nodes, nodes ↔ nodes (pod networking)
Monitoring:
```bash
# Controller logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f

# Node claims (provisioned nodes)
kubectl get nodeclaims

# NodePool status
kubectl get nodepools freetier-cpu -o yaml

# Provisioning metrics (if Prometheus enabled)
karpenter_provisioner_nodepool_limit{...} # CPU limit
karpenter_provisioner_nodepool_usage{...} # Current usage
```

Troubleshooting:
- No nodes provisioned: Check NodePool `limits.cpu`, Spot capacity in the region, and IAM permissions
- Wrong instance type: Verify NodePool `requirements` match pod `requests`
- Nodes terminate immediately: Check `consolidationPolicy`, pod count, and PDBs
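When something goes wrong, the NodeClaim events usually spell out the reason (Spot capacity, IAM, or limits). Two commands that help narrow it down:

```bash
# Inspect NodeClaims for launch or registration failures
kubectl describe nodeclaims

# Recent events across namespaces, newest last, filtered for Karpenter
kubectl get events -A --sort-by=.lastTimestamp | grep -i karpenter
```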
This deployment uses a standard AWS VPC architecture with public and private subnets across multiple Availability Zones:
```mermaid
flowchart TB
    Internet((Internet)):::ext
    EKSAPI[EKS API Endpoint<br/>Public]:::cp

    subgraph VPC[VPC - IPv4 Only]
        IGW[Internet Gateway]:::net
        subgraph AZA[Availability Zone A]
            PubA[Public Subnet A<br/>NAT Gateway]:::net
            PrivA[Private Subnet A<br/>Karpenter Nodes<br/>Fargate Pods]:::node
        end
        subgraph AZB[Availability Zone B]
            PubB[Public Subnet B<br/>Optional: ALB/NLB]:::lb
            PrivB[Private Subnet B<br/>Karpenter Nodes<br/>Fargate Pods]:::node
        end
    end

    Internet --> EKSAPI
    EKSAPI -->|TLS + IAM Auth| VPC
    PubA --> IGW
    PubB --> IGW
    PrivA -->|0.0.0.0/0| PubA
    PrivB -->|0.0.0.0/0| PubA

    classDef cp fill:#e8f0fe,stroke:#3b82f6,color:#1e3a8a
    classDef net fill:#eef2ff,stroke:#6366f1,color:#3730a3
    classDef node fill:#ecfdf5,stroke:#10b981,color:#065f46
    classDef lb fill:#fde68a,stroke:#d97706,color:#78350f
    classDef ext fill:#f1f5f9,stroke:#94a3b8,color:#334155
```
Network Configuration:
- VPC: Single VPC spanning 2 Availability Zones for high availability
- Public Subnets: Host NAT Gateway and optional Load Balancers
- Private Subnets: Host Karpenter-provisioned EC2 nodes and Fargate pods
- Egress: All outbound traffic from private subnets routes through NAT Gateway
- Ingress: EKS API endpoint is public (configured via Terraform)
- IPv4 Only: This deployment uses IPv4 addressing only. Dual-stack (IPv4/IPv6) networking is currently out of scope.
Key Networking Notes:
- Karpenter nodes are provisioned in private subnets tagged for Karpenter discovery
- System pods (CoreDNS, Karpenter controller) run on Fargate
- Workload pods run on Karpenter-provisioned EC2 nodes
- Single NAT Gateway reduces costs but creates a single point of egress
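To confirm that node egress really leaves through the single NAT Gateway, you can check the public IP seen from inside a throwaway pod; every pod on the private subnets should report the same Elastic IP (this check is illustrative, not part of the deployment):

```bash
# All egress from the private subnets should surface as the NAT Gateway's EIP
kubectl run egress-check --rm -it --restart=Never \
  --image=curlimages/curl --command -- curl -s https://checkip.amazonaws.com
```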
- AWS Account with appropriate permissions
- AWS CLI installed and configured
- Terraform >= 1.13 installed
- kubectl installed
- Helm 3.x installed
.
├── eks/
│ ├── deploy/clusters/dev/ # EKS cluster deployment
│ └── modules/eks/ # EKS Terraform module
├── qwen-vllm/
│ ├── deploy/ # Qwen model deployment
│ └── modules/qwen/ # Qwen Terraform module
└── genai-app/ # Chatbot application
1. Navigate to the EKS deployment directory:

   ```bash
   cd eks/deploy/clusters/dev
   ```

2. Configure your cluster:

   Edit `config.yaml` and update:
   - `aws.account_id`: Your AWS account ID
   - `aws.region`: Your AWS region (e.g., `us-west-1`)
   - `eks.name`: Your desired cluster name (default: `yuklia-sbx-inference`)
   - `default_tags`: Your account details

   Example `config.yaml`:

   ```yaml
   aws:
     account_id: "123456789012"
     region: us-west-1
   eks:
     name: "my-qwen-cluster"
     team: "mlops"
     gpu_mng:
       enable: false # Disabled for CPU-only Qwen model
   default_tags:
     aws_account_id: "123456789012"
     aws_account_name: "my-account"
     project: "Qwen2.5 Educational"
   ```

3. Initialize Terraform:

   ```bash
   terraform init
   ```

4. Review the deployment plan:

   ```bash
   terraform plan -out=eks-plan.json
   ```

   This will show you what resources will be created:
   - VPC with public/private subnets
   - EKS cluster
   - Karpenter for node autoscaling
   - Internet Gateway and NAT Gateway
   - Security groups and IAM roles

5. Deploy the EKS cluster:

   ```bash
   terraform apply eks-plan.json
   ```

   ⏱️ This takes approximately 15-20 minutes to complete.

6. Configure kubectl:

   After the cluster is created, configure kubectl to access it:

   ```bash
   aws eks update-kubeconfig \
     --region us-west-1 \
     --name my-qwen-cluster
   ```

   Replace `us-west-1` and `my-qwen-cluster` with your values.

7. Verify the cluster:

   ```bash
   # Check cluster nodes
   kubectl get nodes

   # Check Karpenter
   kubectl get pods -n karpenter

   # Verify Karpenter nodepools
   kubectl get nodepools
   kubectl get ec2nodeclasses
   ```

   You should see:
   - Karpenter pods running
   - `freetier-cpu` nodepool configured for t3.micro spot instances
   - `default` EC2NodeClass available
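At this point the `freetier-cpu` pool usually has zero nodes, which is expected. If you want to watch Karpenter react when the model is deployed in Step 2, keep one of these watches running in a second terminal:

```bash
# Watch Karpenter create a NodeClaim for the model pod (Ctrl+C to stop)...
kubectl get nodeclaims -w

# ...or watch the node itself register with the cluster
kubectl get nodes -l karpenter.sh/nodepool=freetier-cpu -w
```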
1. Navigate to the model deployment directory:

   ```bash
   cd ../../../../qwen-vllm/deploy
   ```

2. Configure the model deployment:

   Edit `config_qwen2.5.yaml` and update:
   - `eks.name`: Must match the EKS cluster name from Step 1
   - `aws.region`: Must match the region from Step 1
   - `default_tags.aws_account_id`: Your AWS account ID

   Example:

   ```yaml
   aws:
     region: "us-west-1"
   eks:
     name: "my-qwen-cluster" # Must match EKS cluster name
   model_size: "0.5b"
   model_name: "Qwen/Qwen2.5-0.5B-Instruct"
   resources:
     limits_cpu: 1
     requests_cpu: 1
     limits_memory: "900Mi"
     requests_memory: "800Mi"
   volumes:
     size: "512Mi"
   default_tags:
     aws_account_id: "123456789012"
     aws_account_name: "my-account"
     project: "Qwen2.5 Educational"
   ```

3. Initialize Terraform:

   ```bash
   terraform init
   ```

4. Deploy the model:

   ```bash
   terraform plan -out=qwen-plan.json
   terraform apply qwen-plan.json
   ```

5. Monitor the deployment:

   ```bash
   # Watch pod status
   kubectl get pods -n qwen2.5-0.5b -w

   # Check logs (wait for pod to be running)
   kubectl logs -n qwen2.5-0.5b -l app=qwen2.5-0.5b -f
   ```

   The model will:
   - Download from HuggingFace (~200MB)
   - Load into memory
   - Start the vLLM server

   ⏱️ This takes 5-10 minutes depending on download speed.

6. Verify the model is ready:

   Look for this in the logs:

   ```
   INFO: Started server process
   INFO: Uvicorn running on http://0.0.0.0:8000
   ```

7. Port forward to access the API:

   ```bash
   kubectl port-forward svc/qwen2.5-0.5b-service 8000:8000 -n qwen2.5-0.5b
   ```

8. Test with curl:

   ```bash
   curl -X POST http://localhost:8000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
       "model": "Qwen/Qwen2.5-0.5B-Instruct",
       "messages": [{"role": "user", "content": "Is life possible on Enceladus moon?"}]
     }'
   ```

9. Test with the web app (optional):

   ```bash
   cd ../../genai-app
   python3 -m venv .venv
   source .venv/bin/activate
   pip install -r requirements.txt
   python app.py
   ```

   Then open http://localhost:7860 in your browser.
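While the port-forward is active, the other OpenAI-compatible endpoints exposed by vLLM are handy for a quick health check and to confirm the served model name used in the request above:

```bash
# Basic liveness check of the vLLM server
curl -s http://localhost:8000/health

# List the models served by the OpenAI-compatible API
curl -s http://localhost:8000/v1/models
```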
- EKS Control Plane: ~$0.10/hour (charged 24/7 when cluster exists)
- t3.micro Spot Instance: ~$0.004-0.005/hour (only when model is running)
- NAT Gateway: ~$0.045/hour + data transfer (only during deployment/inference)
- Data Transfer: Minimal for small model
Total: Approximately $0.15-0.20/hour when running (≈ $0.10 control plane + $0.005 Spot instance + $0.045 NAT Gateway, plus data transfer), and ~$0.10/hour when idle (just the control plane).
To remove all resources and avoid charges:
1. Delete the model deployment:

   ```bash
   cd qwen-vllm/deploy
   terraform destroy
   ```

2. Delete the EKS cluster:

   ```bash
   cd ../../eks/deploy/clusters/dev
   terraform destroy
   ```
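Once both destroys complete, it is worth double-checking that nothing billable is left behind (the region shown is an example):

```bash
# Confirm the cluster is gone
aws eks list-clusters --region us-west-1

# Confirm no stray EC2 instances are still running
aws ec2 describe-instances --region us-west-1 \
  --filters "Name=instance-state-name,Values=pending,running" \
  --query "Reservations[].Instances[].{Id:InstanceId,Type:InstanceType}" \
  --output table
```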