Bug Description
Hi, we are using Terraform + Flux to deploy our clusters to EKS. We have separate Flux Git repos for infrastructure components, apps, and our cluster fleet.
When deploying a cluster, we deploy infra components in logical groups (storage, networking, monitoring, etc.), using Flux's dependency handling to make sure the CRDs, controllers and configs are applied in the correct order.
As part of our networking component group, we install aws-load-balancer-controller, along with a few other components (Envoy Gateway, cert-manager, external-dns), currently all via Helm charts.
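For context, the ordering between component groups is handled with Flux's `dependsOn`; a minimal sketch of what that looks like for the networking group (the names and paths here are illustrative, not our exact layout):

```yaml
# Minimal sketch of the group-level ordering (illustrative names/paths).
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: networking
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/networking   # contains the HelmReleases mentioned above
  prune: true
  sourceRef:
    kind: GitRepository
    name: infrastructure
  dependsOn:
    - name: storage                   # the networking group waits for the storage group
```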
The problem is that on the initial cluster deploy, flux HelmReleases for these other components will fail with the error:
* Internal error occurred: failed calling webhook "mservice.elbv2.k8s.aws": failed to call webhook: Post "https://aws-load-balancer-webhook-service.networking.svc:443/mutate-v1-service?timeout=10s": no endpoints available for service "aws-load-balancer-webhook-service"
A Flux HelmRelease does not retry installs by default, so this leaves those releases in a permanently failed state, requiring us to manually delete them and have Flux re-reconcile them, which then succeeds, and everything on the cluster is happy.
If we increase the number of retries on these HelmReleases, the initial deploy will succeed (after a retry).
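Concretely, "increasing the number of retries" means adding install remediation to the affected HelmReleases, roughly like this (release and source names are illustrative):

```yaml
# Illustrative: retrying the initial install on one of the affected releases.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: external-dns
  namespace: networking
spec:
  interval: 10m
  chart:
    spec:
      chart: external-dns
      sourceRef:
        kind: HelmRepository
        name: external-dns            # illustrative source name
  install:
    remediation:
      retries: 3                      # retry if the first install hits the webhook error
```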
I don't know much about mutating webhooks, but it looks like aws-load-balancer-controller registers its mutating webhook before the aws-load-balancer-webhook-service Service has ready endpoints, so these initial admission requests fail.
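To make the suspected race concrete, this is roughly the shape of the webhook registration involved, reconstructed from the error message above; the configuration name, failure policy and rules are assumptions on my part, not verified against the chart:

```yaml
# Rough reconstruction from the error message; fields not present in the error
# (configuration name, failurePolicy, rules) are assumptions.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: aws-load-balancer-webhook          # assumed chart default
webhooks:
  - name: mservice.elbv2.k8s.aws
    failurePolicy: Fail                     # a Fail policy would explain the hard error
    clientConfig:
      service:
        name: aws-load-balancer-webhook-service
        namespace: networking
        path: /mutate-v1-service
        port: 443
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["services"]
    sideEffects: None
    admissionReviewVersions: ["v1"]
```

If a registration like this exists before the webhook Service has ready endpoints, matching Service operations fail in the way shown above.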
Steps to Reproduce
- provision a new EKS cluster
- deploy a few components using Flux HelmRelease, including aws-load-balancer-controller (a minimal sketch follows below; if required, I can try to provide a full minimal project to reproduce this).
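A minimal sketch of the controller install itself, assuming the upstream eks-charts Helm repository and the values listed under Environment below:

```yaml
# Minimal sketch of the controller install (assumes the upstream eks-charts repo).
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: eks-charts
  namespace: networking
spec:
  interval: 1h
  url: https://aws.github.io/eks-charts
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: aws-load-balancer-controller
  namespace: networking
spec:
  interval: 10m
  chart:
    spec:
      chart: aws-load-balancer-controller
      version: 1.12.0
      sourceRef:
        kind: HelmRepository
        name: eks-charts
  values:
    clusterName: my-cluster
    controllerConfig:
      featureGates:
        ServiceTypeLoadBalancerOnly: true
    serviceAccount:
      create: false
      name: aws-load-balancer-controller
```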
Expected Behavior
Other HelmReleases do not error out with the (presumably timing-related) webhook error during the initial cluster deployment.
Current Workarounds
I could add retries to all other HelmReleases, or make aws-load-balancer-controller a separate component and have all other infra components depend on it, but that seems like a bad approach when those components don't actually depend on it being fully deployed.
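For reference, the second workaround would mean something like this on every other networking HelmRelease (release and source names are illustrative):

```yaml
# Illustrative: forcing an ordering that isn't a real dependency.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cert-manager
  namespace: networking
spec:
  # every other networking release would have to declare this, even though
  # none of them actually needs the controller to be up
  dependsOn:
    - name: aws-load-balancer-controller
      namespace: networking
  interval: 10m
  chart:
    spec:
      chart: cert-manager
      sourceRef:
        kind: HelmRepository
        name: jetstack                # illustrative source name
```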
Environment
- AWS Load Balancer controller version: v2.12.0
- Kubernetes version: 1.31
- Using EKS (yes/no), if so version?: yes, 1.31
- Using Service or Ingress:
- AWS region: eu-west-1
- How was the aws-load-balancer-controller installed: via a Flux HelmRelease (Helm chart)
helm ls:
aws-load-balancer-controller networking 1 2025-04-10 11:28:42.887441528 +0000 UTC deployed aws-load-balancer-controller-1.12.0 v2.12.0
- helm values:
USER-SUPPLIED VALUES:
clusterName: my-cluster
controllerConfig:
  featureGates:
    ServiceTypeLoadBalancerOnly: true
serviceAccount:
  create: false
  name: aws-load-balancer-controller
- Current state of the Controller configuration:
kubectl -n <controllernamespace> describe deployment aws-load-balancer-controller
Name:                   aws-load-balancer-controller
Namespace:              networking
CreationTimestamp:      Thu, 10 Apr 2025 13:28:46 +0200
Labels:                 app.kubernetes.io/instance=aws-load-balancer-controller
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=aws-load-balancer-controller
                        app.kubernetes.io/version=v2.12.0
                        helm.sh/chart=aws-load-balancer-controller-1.12.0
                        helm.toolkit.fluxcd.io/name=aws-load-balancer-controller
                        helm.toolkit.fluxcd.io/namespace=networking
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: aws-load-balancer-controller
                        meta.helm.sh/release-namespace: networking
Selector:               app.kubernetes.io/instance=aws-load-balancer-controller,app.kubernetes.io/name=aws-load-balancer-controller
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app.kubernetes.io/instance=aws-load-balancer-controller
                    app.kubernetes.io/name=aws-load-balancer-controller
  Annotations:      prometheus.io/port: 8080
                    prometheus.io/scrape: true
  Service Account:  aws-load-balancer-controller
  Containers:
   aws-load-balancer-controller:
    Image:       public.ecr.aws/eks/aws-load-balancer-controller:v2.12.0
    Ports:       9443/TCP, 8080/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      --cluster-name=my-cluster
      --ingress-class=alb
      --feature-gates=ServiceTypeLoadBalancerOnly=true
    Liveness:     http-get http://:61779/healthz delay=30s timeout=10s period=10s #success=1 #failure=2
    Readiness:    http-get http://:61779/readyz delay=10s timeout=10s period=10s #success=1 #failure=2
    Environment:  <none>
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
  Volumes:
   cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  aws-load-balancer-tls
    Optional:    false
  Priority Class Name:  system-cluster-critical
  Node-Selectors:       <none>
  Tolerations:          <none>
Conditions:
  Type         Status  Reason
  ----         ------  ------
  Available    True    MinimumReplicasAvailable
  Progressing  True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   aws-load-balancer-controller-c95cbff64 (2/2 replicas created)
Events:
  Type    Reason             Age  From                   Message
  ----    ------             ---- ----                   -------
  Normal  ScalingReplicaSet  29m  deployment-controller  Scaled up replica set aws-load-balancer-controller-c95cbff64 to 2