Qwen2.5 LLM on EKS + Karpenter

Deploy Qwen2.5 0.5B Large Language Model on Amazon EKS using vLLM, Terraform, and Helm with cost-effective t3.micro spot instances.

Overview

This project provides a minimal, cost-effective setup to run Qwen2.5 0.5B on AWS EKS:

  • Model: Qwen2.5-0.5B-Instruct (CPU-only, no GPU required)
  • Infrastructure: EKS cluster with Karpenter autoscaling
  • Compute: t3.micro spot instances (~$0.005/hour)
  • Automation: Infrastructure as Code with Terraform and Helm

Karpenter Architecture & Implementation

Overview

Karpenter is deployed as a Kubernetes controller that observes unschedulable pods and provisions optimal EC2 instances. Unlike Cluster Autoscaler or managed node groups, Karpenter evaluates instance types dynamically and can provision nodes in <30 seconds without ASG warm-up delays.

Key Advantages:

  • Right-sizing: Evaluates all compatible instance types, selects cheapest fit
  • Spot-first: Prefers Spot capacity (up to 90% savings) with On-Demand fallback
  • Zero-min scaling: Can scale to zero worker nodes, unlike managed node groups
  • Fast provisioning: Direct EC2 API calls, no ASG lifecycle hooks
  • Consolidation: Actively binpacks and terminates underutilized nodes

Controller Deployment

Location: eks/modules/eks/addons.tf

# Karpenter is enabled through the aws-ia/eks-blueprints-addons module
enable_karpenter = true

karpenter = {
  chart_version       = "1.1.2"
  # ECR Public auth token (us-east-1 provider) used to pull the Karpenter chart
  repository_username = data.aws_ecrpublic_authorization_token.token.user_name
  repository_password = data.aws_ecrpublic_authorization_token.token.password
}

Deployment Details:

  • Chart: Deployed via Helm using aws-ia/eks-blueprints-addons module
  • Namespace: karpenter (created by Helm chart)
  • Scheduling: Controller runs on Fargate (not Karpenter-managed nodes)
  • Image: Public ECR (public.ecr.aws/karpenter/karpenter), requires us-east-1 provider for auth token
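
To confirm the controller really runs on Fargate rather than on Karpenter-managed nodes, check which nodes host the karpenter pods; EKS labels Fargate-backed nodes with eks.amazonaws.com/compute-type=fargate:

# Controller pods and the (Fargate) nodes they are scheduled on
kubectl get pods -n karpenter -o wide

# Fargate-backed nodes carry this EKS label
kubectl get nodes -l eks.amazonaws.com/compute-type=fargate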

IAM & Permissions

Controller IAM Role (created by aws-ia/eks-blueprints-addons):

  • Trust: IRSA (IAM Roles for Service Accounts) via OIDC provider
  • Permissions: EC2 instance creation, termination, Describe APIs, Launch Templates
  • Access Entry: aws_eks_access_entry.karpenter_node_access_entry grants cluster API access

Node IAM Role (karpenter_node configuration):

  • Base Policy: Standard EKS node policy (EKS CNI, container registry pull, CloudWatch logs)
  • Additional Policies: AmazonSSMManagedInstanceCore (SSM Session Manager access)
  • Role Name: Uses cluster_name prefix (no random suffix per config)
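
A quick way to sanity-check the IRSA wiring and node role is to read the controller service account's role annotation and list the node role's attached policies. This assumes the chart's default service account name (karpenter); the node role name below is a placeholder:

# IRSA: the service account should carry an eks.amazonaws.com/role-arn annotation
kubectl get sa karpenter -n karpenter -o yaml | grep role-arn

# Node role: confirm the expected managed policies (including AmazonSSMManagedInstanceCore)
aws iam list-attached-role-policies --role-name <karpenter-node-role-name>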

Custom Resources

EC2NodeClass

Purpose: Defines EC2 instance configuration (AMI, subnet, security groups, IAM role, tags)

Location: eks/modules/eks/karpenter/default-nodeclass.yaml

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${cluster_name}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${cluster_name}

Key Fields:

  • amiSelectorTerms: Dynamic AMI selection (AL2 latest)
  • subnetSelectorTerms: Tag-based subnet discovery (private subnets only)
  • securityGroupSelectorTerms: Tag-based security group discovery
  • role: References karpenter_node IAM role name
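
Once applied, the resolved AMIs, subnets, and security groups should show up in the EC2NodeClass status (Karpenter v1 reports them under .status):

# Resolved AMIs, subnets, and security groups appear in the status block
kubectl describe ec2nodeclass default

# Or print just the status
kubectl get ec2nodeclass default -o jsonpath='{.status}'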

NodePool

Purpose: Defines scheduling constraints, instance requirements, limits, and disruption policies

Location: eks/modules/eks/karpenter/freetier-nodepool.yaml

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: freetier-cpu
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.k8s.aws/instance-type
          operator: In
          values: ["t3.micro"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  limits:
    cpu: 1  # Maximum total CPU across all nodes in this pool
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s

Configuration Details:

Field               | Value     | Rationale
--------------------|-----------|------------------------------------------------------------------
instance-type       | t3.micro  | Free-tier eligible burstable instance (2 vCPUs, 1 GiB RAM)
capacity-type       | spot      | Up to 90% cost savings vs On-Demand
arch                | amd64     | x86_64 architecture
limits.cpu          | 1         | Caps total provisioned CPU so the pool cannot scale beyond budget
consolidationPolicy | WhenEmpty | Only consolidates nodes that are empty (safer for development)
consolidateAfter    | 30s       | Quick scale-down to minimize idle costs
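
To exercise this NodePool end to end, an unschedulable pod whose requests fit on a t3.micro is enough to trigger a scale-up. A minimal sketch; the pod name, image, and requests are illustrative, and it assumes no Fargate profile covers the default namespace:

# Illustrative scale-up test: a pod small enough for a t3.micro that stays
# Pending until Karpenter provisions a node from the freetier-cpu pool.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: karpenter-scale-test   # illustrative name
spec:
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
      resources:
        requests:
          cpu: 200m
          memory: 128Mi
EOF

# Clean up afterwards so consolidation can reclaim the node
kubectl delete pod karpenter-scale-test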

Provisioning Flow

sequenceDiagram
    participant Pod as Pod (Pending)
    participant Scheduler as kube-scheduler
    participant Controller as Karpenter Controller
    participant NodePool as NodePool CR
    participant EC2Class as EC2NodeClass CR
    participant EC2API as AWS EC2 API
    participant Node as EC2 Instance
    
    Pod->>Scheduler: Create/Pending
    Scheduler->>Scheduler: Evaluate nodeSelector/taints/affinity
    Scheduler->>Pod: Mark Unschedulable (NoNodes)
    Controller->>Pod: Watch Event (PodUnschedulable)
    Controller->>NodePool: List matching NodePools
    NodePool->>Controller: Return requirements/limits
    Controller->>EC2Class: Get subnet/SG/AMI config
    EC2Class->>Controller: Return selector terms
    Controller->>EC2API: Evaluate instance types (price/compatibility)
    Controller->>EC2API: CreateFleet/LaunchInstance (Spot preferred)
    EC2API->>Node: Instance Launching
    Node->>Scheduler: Node Ready (kubelet registered)
    Scheduler->>Pod: Bind to Node
    Pod->>Pod: Running

Technical Steps:

  1. Pod Creation: A pod is created that no existing node can satisfy (for example via a nodeSelector such as instanceType: freetier, or resource requests and tolerations that only this pool can meet)
  2. Scheduler Evaluation: kube-scheduler evaluates existing nodes, finds none that fit -> marks the pod Unschedulable
  3. Controller Watch: Karpenter watches Pod events via an informer and picks up pods with the Unschedulable condition
  4. NodePool Matching: Controller evaluates NodePool requirements against pod requests/nodeSelector
  5. Instance Selection:
    • Queries EC2 pricing/availability APIs
    • Filters by requirements (instance-type, capacity-type, arch, zone)
    • Selects cheapest compatible instance (Spot if available)
  6. EC2 Provisioning:
    • Uses EC2NodeClass to resolve subnet/SG/AMI
    • Calls RunInstances or CreateFleet with user-data for bootstrap script
    • Tags instance with karpenter.sh/nodepool, karpenter.sh/discovery
  7. Node Registration: Bootstrap script installs kubelet, joins cluster via EKS API
  8. Pod Binding: Scheduler assigns pod to new node, Kubelet starts container

Provisioning Time: Typically 20-30 seconds from pending pod to running (faster than ASG warm-up ~2-5 minutes)
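
To observe this flow in practice (for example after creating the test pod shown earlier), watch the NodeClaim and Node objects from a second terminal; <nodeclaim-name> is a placeholder:

# Watch Karpenter create a NodeClaim and the EC2 node register
kubectl get nodeclaims -w
kubectl get nodes -w

# Inspect the decision Karpenter made (instance type, capacity type, zone)
kubectl describe nodeclaim <nodeclaim-name>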

Consolidation & Disruption

Consolidation Policy: WhenEmpty

  • Only consolidates nodes with zero (non-DaemonSet) pods (safer for workloads with no PDB)
  • After the consolidateAfter duration, the empty node is cordoned and its instance terminated

Alternative Policies:

  • WhenEmptyOrUnderutilized (Karpenter v1): Also consolidates nodes that still run pods (pair with PodDisruptionBudgets to protect workloads)
  • Never: Disables consolidation (not recommended if cost savings matter)

Interruption Handling:

  • Spot interruptions trigger NodeClaim deletion → pods rescheduled → new node provisioned
  • PodDisruptionBudget protects against voluntary evictions (consolidation, not Spot interruptions)
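
If you later switch to a policy that can disrupt non-empty nodes, a PodDisruptionBudget limits voluntary evictions. A minimal sketch for the model workload, assuming the qwen2.5-0.5b namespace and app=qwen2.5-0.5b label used later in this README:

cat <<'EOF' | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: qwen2.5-0.5b-pdb
  namespace: qwen2.5-0.5b
spec:
  minAvailable: 1          # keep at least one model replica running
  selector:
    matchLabels:
      app: qwen2.5-0.5b
EOF

With a single replica, minAvailable: 1 effectively blocks voluntary consolidation of the node hosting the model; it does not prevent Spot interruptions.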

Operational Considerations

Subnet Discovery:

  • Subnets must be tagged: karpenter.sh/discovery: <cluster_name>
  • Terraform tags private subnets automatically
  • Karpenter provisions nodes in discovered subnets across AZs for HA

Security Groups:

  • Tagged: karpenter.sh/discovery: <cluster_name>
  • Uses cluster security group + node security group (if created)
  • Allows traffic: control plane ↔ nodes, nodes ↔ nodes (pod networking)
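
To verify that discovery tags are in place before Karpenter tries to launch anything, list the tagged subnets and security groups directly; <cluster_name> is a placeholder:

# Subnets discoverable by Karpenter
aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=<cluster_name>" \
  --query "Subnets[].{id:SubnetId,az:AvailabilityZone}"

# Security groups discoverable by Karpenter
aws ec2 describe-security-groups \
  --filters "Name=tag:karpenter.sh/discovery,Values=<cluster_name>" \
  --query "SecurityGroups[].GroupId"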

Monitoring:

# Controller logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f

# Node claims (provisioned nodes)
kubectl get nodeclaims

# NodePool status
kubectl get nodepools freetier-cpu -o yaml

# Provisioning metrics (if Prometheus is enabled; exact metric names vary by Karpenter version)
karpenter_provisioner_nodepool_limit{...}  # CPU limit
karpenter_provisioner_nodepool_usage{...}   # Current usage

Troubleshooting:

  • No nodes provisioned: Check NodePool limits.cpu, Spot capacity in region, IAM permissions
  • Wrong instance type: Verify NodePool requirements match pod requests
  • Nodes terminate immediately: Check consolidationPolicy, pod count, PDBs
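
For the failure modes above, the NodeClaim/NodePool status conditions and controller logs usually explain the decision. A few starting points, reusing the label selector from the Monitoring section:

# Why didn't a node appear? Status conditions usually name the blocker
kubectl describe nodeclaims
kubectl describe nodepool freetier-cpu

# Search the controller logs for scheduling or launch errors
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=200 | grep -iE "error|fail"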

Network Architecture

This deployment uses a standard AWS VPC architecture with public and private subnets across multiple Availability Zones:

flowchart TB
    Internet((Internet)):::ext
    EKSAPI[EKS API Endpoint<br/>Public]:::cp
    
    subgraph VPC[VPC - IPv4 Only]
        IGW[Internet Gateway]:::net
        
        subgraph AZA[Availability Zone A]
            PubA[Public Subnet A<br/>NAT Gateway]:::net
            PrivA[Private Subnet A<br/>Karpenter Nodes<br/>Fargate Pods]:::node
        end
        
        subgraph AZB[Availability Zone B]
            PubB[Public Subnet B<br/>Optional: ALB/NLB]:::lb
            PrivB[Private Subnet B<br/>Karpenter Nodes<br/>Fargate Pods]:::node
        end
    end
    
    Internet --> EKSAPI
    EKSAPI -->|TLS + IAM Auth| VPC
    PubA --> IGW
    PubB --> IGW
    PrivA -->|0.0.0.0/0| PubA
    PrivB -->|0.0.0.0/0| PubA
    
    classDef cp fill:#e8f0fe,stroke:#3b82f6,color:#1e3a8a
    classDef net fill:#eef2ff,stroke:#6366f1,color:#3730a3
    classDef node fill:#ecfdf5,stroke:#10b981,color:#065f46
    classDef lb fill:#fde68a,stroke:#d97706,color:#78350f
    classDef ext fill:#f1f5f9,stroke:#94a3b8,color:#334155

Network Configuration:

  • VPC: Single VPC spanning 2 Availability Zones for high availability
  • Public Subnets: Host NAT Gateway and optional Load Balancers
  • Private Subnets: Host Karpenter-provisioned EC2 nodes and Fargate pods
  • Egress: All outbound traffic from private subnets routes through NAT Gateway
  • Ingress: EKS API endpoint is public (configured via Terraform)
  • IPv4 Only: This deployment uses IPv4 addressing only. Dual-stack (IPv4/IPv6) networking is currently out of scope.

Key Networking Notes:

  • Karpenter nodes are provisioned in private subnets tagged for Karpenter discovery
  • System pods (CoreDNS, Karpenter controller) run on Fargate
  • Workload pods run on Karpenter-provisioned EC2 nodes
  • Single NAT Gateway reduces costs but creates a single point of egress
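
To confirm the single-NAT-Gateway layout after terraform apply, list the gateways in the VPC; <vpc-id> is a placeholder:

# Expect exactly one NAT Gateway, located in a public subnet
aws ec2 describe-nat-gateways \
  --filter "Name=vpc-id,Values=<vpc-id>" \
  --query "NatGateways[].{id:NatGatewayId,subnet:SubnetId,state:State}"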

Prerequisites

  • AWS Account with appropriate permissions
  • AWS CLI installed and configured
  • Terraform >= 1.13 installed
  • kubectl installed
  • Helm 3.x installed

Project Structure

.
├── eks/
│   ├── deploy/clusters/dev/      # EKS cluster deployment
│   └── modules/eks/              # EKS Terraform module
├── qwen-vllm/
│   ├── deploy/                   # Qwen model deployment
│   └── modules/qwen/             # Qwen Terraform module
└── genai-app/                    # Chatbot application

Deployment Guide

Step 1: Deploy EKS Cluster

  1. Navigate to EKS deployment directory:

    cd eks/deploy/clusters/dev
  2. Configure your cluster:

    Edit config.yaml and update:

    • aws.account_id: Your AWS account ID
    • aws.region: Your AWS region (e.g., us-west-1)
    • eks.name: Your desired cluster name (default: yuklia-sbx-inference)
    • default_tags: Your account details

    Example config.yaml:

    aws:
      account_id: "123456789012"
      region: us-west-1
    eks:
      name: "my-qwen-cluster"
      team: "mlops"
      gpu_mng:
        enable: false  # Disabled for CPU-only Qwen model
    default_tags:
      aws_account_id: "123456789012"
      aws_account_name: "my-account"
      project: "Qwen2.5 Educational"
  3. Initialize Terraform:

    terraform init
  4. Review the deployment plan:

    terraform plan -out=eks-plan.json

    This will show you what resources will be created:

    • VPC with public/private subnets
    • EKS cluster
    • Karpenter for node autoscaling
    • Internet Gateway and NAT Gateway
    • Security groups and IAM roles
  5. Deploy the EKS cluster:

    terraform apply eks-plan.json

    ⏱️ This takes approximately 15-20 minutes to complete.

  6. Configure kubectl:

    After the cluster is created, configure kubectl to access it:

    aws eks update-kubeconfig \
      --region us-west-1 \
      --name my-qwen-cluster

    Replace us-west-1 and my-qwen-cluster with your values.

  7. Verify the cluster:

    # Check cluster nodes
    kubectl get nodes
    
    # Check Karpenter
    kubectl get pods -n karpenter
    
    # Verify Karpenter nodepools
    kubectl get nodepools
    kubectl get ec2nodeclasses

    You should see:

    • Karpenter pods running
    • freetier-cpu nodepool configured for t3.micro spot instances
    • default EC2NodeClass available

Step 2: Deploy Qwen2.5 Model

  1. Navigate to model deployment directory:

    cd ../../../qwen-vllm/deploy
  2. Configure the model deployment:

    Edit config_qwen2.5.yaml and update:

    • eks.name: Must match the EKS cluster name from Step 1
    • aws.region: Must match the region from Step 1
    • default_tags.aws_account_id: Your AWS account ID

    Example:

    aws:
      region: "us-west-1"
    eks:
      name: "my-qwen-cluster"  # Must match EKS cluster name
    model_size: "0.5b"
    model_name: "Qwen/Qwen2.5-0.5B-Instruct"
    resources:
      limits_cpu: 1
      requests_cpu: 1
      limits_memory: "900Mi"
      requests_memory: "800Mi"
    volumes:
      size: "512Mi"
    default_tags:
      aws_account_id: "123456789012"
      aws_account_name: "my-account"
      project: "Qwen2.5 Educational"
  3. Initialize Terraform:

    terraform init
  4. Deploy the model:

    terraform plan -out=qwen-plan.json
    terraform apply qwen-plan.json
  5. Monitor the deployment:

    # Watch pod status
    kubectl get pods -n qwen2.5-0.5b -w
    
    # Check logs (wait for pod to be running)
    kubectl logs -n qwen2.5-0.5b -l app=qwen2.5-0.5b -f

    The model will:

    • Download from HuggingFace (~200MB)
    • Load into memory
    • Start the vLLM server

    ⏱️ This takes 5-10 minutes depending on download speed.

  6. Verify the model is ready:

    Look for this in the logs:

    INFO:     Started server process
    INFO:     Uvicorn running on http://0.0.0.0:8000
    

Step 3: Test the Model

  1. Port forward to access the API:

    kubectl port-forward svc/qwen2.5-0.5b-service 8000:8000 -n qwen2.5-0.5b
  2. Test with curl:

    curl -X POST http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "Qwen/Qwen2.5-0.5B-Instruct",
        "messages": [{"role": "user", "content": "Is life possible on Enceladus moon?"}]
      }'
  3. Test with the web app (optional):

    cd ../../genai-app
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    python app.py

    Then open http://localhost:7860 in your browser.
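
If the chat completion request in Step 3 returns an error, a quick sanity check (with the port-forward still active) is vLLM's OpenAI-compatible model listing endpoint:

# Should list Qwen/Qwen2.5-0.5B-Instruct if the server is healthy
curl http://localhost:8000/v1/models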

Cost Breakdown

  • EKS Control Plane: ~$0.10/hour (charged 24/7 when cluster exists)
  • t3.micro Spot Instance: ~$0.004-0.005/hour (only when model is running)
  • NAT Gateway: ~$0.045/hour + data transfer (only during deployment/inference)
  • Data Transfer: Minimal for small model

Total: Approximately $0.15-0.20/hour when running, ~$0.10/hour when idle (just control plane).

Cleanup

To remove all resources and avoid charges:

  1. Delete the model deployment:

    cd qwen-vllm/deploy
    terraform destroy
  2. Delete the EKS cluster:

    cd ../../eks/deploy/clusters/dev
    terraform destroy
