
[Bug] Amazon EKS + Karpenter + Cilium #7003

Closed
ruzickap opened this issue Aug 27, 2023 · 3 comments
Labels
kind/feature New feature or request kind/help Request for help priority/important-longterm Important over the long term, but may not be currently staffed and/or may require multiple releases

Comments

@ruzickap

What were you trying to accomplish?

Install Amazon EKS + Karpenter + Cilium or simply Amazon EKS + Karpenter + "taints".

What happened?

I'm trying to install Amazon EKS with Karpenter and Cilium using eksctl.
The Cilium documentation recommends using taints, but these prevent Karpenter from starting, and the whole eksctl command then fails.

Is there a way to install EKS + Karpenter with "NoExecute" taints (or skip waiting for Karpenter)?

How to reproduce it?

# AWS Region
export AWS_DEFAULT_REGION="us-east-1"
export CLUSTER_NAME="k01"
export KUBECONFIG="kubeconfig-${CLUSTER_NAME}.conf"

tee "eksctl-${CLUSTER_NAME}.yaml" << EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${AWS_DEFAULT_REGION}
  tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}
iam:
  withOIDC: true
karpenter:
  version: v0.29.2
  createServiceAccount: true
  withSpotInterruptionQueue: true
managedNodeGroups:
  - name: mng01-ng
    instanceType: t4g.medium
    desiredCapacity: 2
    minSize: 2
    maxSize: 5
    volumeSize: 20
    taints:
     - key: "node.cilium.io/agent-not-ready"
       value: "true"
       effect: "NoExecute"
    privateNetworking: true
EOF

Let's start eksctl to see the error log:

$ eksctl create cluster --config-file "eksctl-${CLUSTER_NAME}.yaml" --kubeconfig "${KUBECONFIG}"
...
2023-08-27 12:12:04 [ℹ]  waiting for at least 2 node(s) to become ready in "mng01-ng"
2023-08-27 12:12:04 [ℹ]  nodegroup "mng01-ng" has 2 node(s)
2023-08-27 12:12:04 [ℹ]  node "ip-192-168-114-136.ec2.internal" is ready
2023-08-27 12:12:04 [ℹ]  node "ip-192-168-88-108.ec2.internal" is ready
2023-08-27 12:12:04 [ℹ]  1 task: { create karpenter for stack "k01" }
2023-08-27 12:12:04 [ℹ]  building nodegroup stack "eksctl-k01-karpenter"
2023-08-27 12:12:05 [ℹ]  deploying stack "eksctl-k01-karpenter"
2023-08-27 12:12:05 [ℹ]  waiting for CloudFormation stack "eksctl-k01-karpenter"
2023-08-27 12:12:36 [ℹ]  waiting for CloudFormation stack "eksctl-k01-karpenter"
2023-08-27 12:13:12 [ℹ]  waiting for CloudFormation stack "eksctl-k01-karpenter"
2023-08-27 12:14:52 [ℹ]  waiting for CloudFormation stack "eksctl-k01-karpenter"
2023-08-27 12:14:52 [ℹ]  1 task: { create IAM role for serviceaccount "karpenter/karpenter" }
2023-08-27 12:14:52 [ℹ]  1 task: { create IAM role for serviceaccount "karpenter/karpenter" }
2023-08-27 12:14:52 [ℹ]  building iamserviceaccount stack "eksctl-k01-addon-iamserviceaccount-karpenter-karpenter"
2023-08-27 12:14:53 [ℹ]  deploying stack "eksctl-k01-addon-iamserviceaccount-karpenter-karpenter"
2023-08-27 12:14:53 [ℹ]  waiting for CloudFormation stack "eksctl-k01-addon-iamserviceaccount-karpenter-karpenter"
2023-08-27 12:15:23 [ℹ]  waiting for CloudFormation stack "eksctl-k01-addon-iamserviceaccount-karpenter-karpenter"
2023-08-27 12:15:24 [ℹ]  adding identity "arn:aws:iam::729560437327:role/eksctl-KarpenterNodeRole-k01" to auth ConfigMap
2023-08-27 12:15:24 [ℹ]  adding Karpenter to cluster k01
Error: failed to install Karpenter: failed to install Karpenter chart: failed to install chart: timed out waiting for the condition
$ kubectl get pods -A
NAMESPACE     NAME                         READY   STATUS    RESTARTS   AGE
karpenter     karpenter-54cf6c6b86-kcdh7   0/1     Pending   0          16m
karpenter     karpenter-54cf6c6b86-w9th5   0/1     Pending   0          16m
kube-system   aws-node-6xjsj               1/1     Running   0          21m
kube-system   aws-node-x4g69               1/1     Running   0          21m
kube-system   coredns-7975d6fb9b-j6fcj     0/1     Pending   0          30m
kube-system   coredns-7975d6fb9b-qsnhx     0/1     Pending   0          30m
kube-system   kube-proxy-f7tgp             1/1     Running   0          21m
kube-system   kube-proxy-hzkb5             1/1     Running   0          21m
$ kubectl describe pod -n karpenter karpenter-54cf6c6b86-kcdh7
...
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  2m33s (x4 over 17m)  default-scheduler  0/2 nodes are available: 2 node(s) had untolerated taint {node.cilium.io/agent-not-ready: true}. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
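
For reference, the taint itself could in principle be tolerated if Karpenter were installed through its own Helm chart rather than by eksctl, since the chart accepts a `tolerations` value. A hypothetical values fragment (not something eksctl exposes today, and only viable if Karpenter can run before Cilium provides pod networking):

```yaml
# Hypothetical Karpenter Helm values fragment: tolerate the Cilium
# "agent-not-ready" taint so the controller pods can schedule before
# Cilium is installed. Not currently configurable through eksctl.
tolerations:
  - key: node.cilium.io/agent-not-ready
    operator: Equal
    value: "true"
    effect: NoExecute
```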

Logs
(see above)

Versions

❯ eksctl info
eksctl version: 0.154.0-dev+f73436487.2023-08-24T11:52:10Z
kubectl version: v1.27.2
OS: darwin
@TiberiuGC
Contributor

TiberiuGC commented Sep 8, 2023

Hi @ruzickap - at the moment there is no workaround to achieve this using eksctl. There are two options to support this:

  1. Create the cluster with nodegroup taints and install Cilium. Afterwards, install Karpenter. However, there is no option to install Karpenter post cluster creation using eksctl. There is an open ticket - [Feature] I would like to be able to setup Karpenter for an existing cluster which I created via EKSCTL #6494 - but we don't know when we'll be able to deliver this.

  2. Create the cluster with nodegroup taints and Karpenter (basically as you do above), and, as you suggested, don't wait for Karpenter to become active. At the moment waiting is set to true by default when installing the helm chart.


    One idea is to provide an option via the config file to make the waiting configurable, e.g.

     ```
     karpenter:
       version: v0.29.2
       createServiceAccount: true
       withSpotInterruptionQueue: true
       wait: true # default is false
     ```
    

    But this needs more thought and internal discussions.
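
For context, option 1 done outside eksctl might look roughly like the following - a hedged sketch assuming the official Karpenter OCI chart and a pre-created controller IAM role and interruption queue (placeholders in angle brackets):

```shell
# Hedged sketch of option 1: after creating the cluster (with the
# taint) and installing Cilium, install Karpenter directly via Helm.
# <ACCOUNT_ID> and <KARPENTER_CONTROLLER_ROLE> are placeholders; the
# IAM role and interruption queue must already exist.
helm upgrade --install karpenter \
  oci://public.ecr.aws/karpenter/karpenter \
  --version v0.29.2 \
  --namespace karpenter --create-namespace \
  --set settings.aws.clusterName="${CLUSTER_NAME}" \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::<ACCOUNT_ID>:role/<KARPENTER_CONTROLLER_ROLE>" \
  --wait
```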

I have actually been testing option 2 myself, and after installing Cilium on such a cluster, Karpenter comes alive as expected. The only thing to bear in mind is that the user has to properly configure the cilium-operator service account and Helm values. I will point to the issue you opened in the Cilium project for future reference - cilium/cilium#27224.
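
For anyone following along, the Cilium side of that setup might look roughly like this - a hedged sketch of Helm values for ENI mode on EKS (exact key names vary across Cilium releases, and the IRSA role ARN is a placeholder):

```yaml
# Hedged sketch of Cilium Helm values for EKS in ENI mode.
# Key names vary by Cilium version; the role ARN is a placeholder.
eni:
  enabled: true
ipam:
  mode: eni
egressMasqueradeInterfaces: eth0
serviceAccounts:
  operator:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<CILIUM_OPERATOR_ROLE>
```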

@TiberiuGC TiberiuGC added the priority/important-longterm Important over the long term, but may not be currently staffed and/or may require multiple releases label Sep 8, 2023
@ruzickap
Author

ruzickap commented Sep 8, 2023

Thank you for the nice summary.

The second option, with the wait parameter, may be a quick win, because it should not be too difficult to add.

@Himangini Himangini added kind/feature New feature or request kind/help Request for help and removed kind/bug labels Sep 12, 2023
@TiberiuGC
Contributor

Hi @ruzickap - we've concluded that there's plenty of community traction for supporting option 1, whereas adding the wait flag is, for now, backed only by this particular use case. I will leave this ticket open as a feature request, mainly for visibility, to check whether other community members have use cases that would benefit from option 2.

For the moment, please stay tuned for any progress on #6494
