Description
What were you trying to accomplish?
Use eksctl to create a managed nodegroup that uses a Capacity Block for ML reservation (p5.48xlarge and p5en.48xlarge).
What happened?
Creating the nodegroup fails. Checking the generated CloudFormation template shows that CapacityReservationTarget is empty:
"CapacityReservationSpecification": {
  "CapacityReservationTarget": {}
},
"InstanceMarketOptions": {
  "MarketType": "capacity-block"
},
How to reproduce it?
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: aws-us-east-01
  region: us-east-1
managedNodeGroups:
  - name: p5-ng-efa
    instanceType: p5.48xlarge # 8 x 3.84 TB NVMe SSD
    capacityReservation:
      capacityReservationTarget:
        capacityReservationId: "cr-0a91582543292e10a" # Replace with your ML Capacity Block ID
    instanceMarketOptions:
      marketType: capacity-block
    privateNetworking: true
    efaEnabled: true
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
    desiredCapacity: 1
    minSize: 1
    maxSize: 1
    volumeSize: 200
    tags:
      "eks/node-type": "gpu"
    labels:
      eks/node-type: gpu
      eks/gpu-type: nvidia
      nvidia.com/gpu.present: "true"
    taints:
      - key: nvidia.com/gpu
        effect: "NoSchedule"
    subnets:
      - subnet-01290a2d94274f6ee #AZ E, IAD5
    ssh: # use existing EC2 key
      publicKeyName: LabKey
Logs
Part of the generated CloudFormation template:
"LaunchTemplate": {
  "Type": "AWS::EC2::LaunchTemplate",
  "Properties": {
    "LaunchTemplateData": {
      "BlockDeviceMappings": [
        {
          "DeviceName": "/dev/xvda",
          "Ebs": {
            "Iops": 3000,
            "Throughput": 125,
            "VolumeSize": 200,
            "VolumeType": "gp3"
          }
        }
      ],
      "CapacityReservationSpecification": {
        "CapacityReservationTarget": {}
      },
      "InstanceMarketOptions": {
        "MarketType": "capacity-block"
      },
      "InstanceType": "p5.48xlarge",
      "KeyName": "LabKey",
      "MetadataOptions": {
        "HttpPutResponseHopLimit": 2,
        "HttpTokens": "required"
      },
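For anyone who wants to confirm the same symptom without reading the whole template by eye, here is a minimal sketch of a check. It assumes the generated CloudFormation template is available as JSON with the usual top-level "Resources" map. The helper name `find_empty_reservation_targets` and the trimmed `sample` template are hypothetical and only for illustration; they are not part of eksctl.

```python
import json


def find_empty_reservation_targets(template: dict) -> list[str]:
    """Return logical IDs of launch templates whose CapacityReservationTarget is present but empty."""
    empty = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EC2::LaunchTemplate":
            continue
        data = resource.get("Properties", {}).get("LaunchTemplateData", {})
        spec = data.get("CapacityReservationSpecification", {})
        # Flag only the case seen in this issue: the key exists but holds {}.
        if "CapacityReservationTarget" in spec and not spec["CapacityReservationTarget"]:
            empty.append(logical_id)
    return empty


# Hypothetical stand-in for the template, trimmed to the fields shown in this issue.
sample = json.loads("""
{
  "Resources": {
    "LaunchTemplate": {
      "Type": "AWS::EC2::LaunchTemplate",
      "Properties": {
        "LaunchTemplateData": {
          "CapacityReservationSpecification": {
            "CapacityReservationTarget": {}
          },
          "InstanceMarketOptions": {
            "MarketType": "capacity-block"
          },
          "InstanceType": "p5.48xlarge"
        }
      }
    }
  }
}
""")

print(find_empty_reservation_targets(sample))
```

With the fragment from this issue the helper reports the `LaunchTemplate` resource. A template where the reservation ID was rendered correctly would return an empty list.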
Anything else we need to know?
N/A
It's very easy to reproduce.
Versions
Admin:~/environment/koala $ eksctl info
eksctl version: 0.215.0
kubectl version: v1.31.0-eks-a737599
OS: linux