Conversation

@rybnico commented Nov 11, 2025

What this PR does / why we need it:

I realised that we were getting many failed requests to load balancers on our OpenStack-based Kubernetes platforms when we did maintenance that involved replacing nodes in a rolling fashion. I opened an issue in the Kubernetes project and was told that the correct solution is to have HTTP health checks on the Services, to make sure not only that the port is reachable but also that the service behind the port responds.

As the OCCM only supports HTTP health checks for load balancers with a TrafficPolicy of "Local" on the HealthCheckNodePort, I implemented HTTP monitors using additional annotations.
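Roughly, the idea could be sketched as follows. This is a minimal illustration only, not the exact code in this PR: the annotation name and the ensureHTTPMonitor helper are hypothetical, and gophercloud v1-style calls are assumed.

// Illustrative sketch only: create an Octavia HTTP health monitor for a pool,
// driven by a (hypothetical) Service annotation.
package healthmonitor

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/monitors"
)

func ensureHTTPMonitor(lb *gophercloud.ServiceClient, poolID string, annotations map[string]string) (*monitors.Monitor, error) {
	// Hypothetical annotation name; default to /healthz when it is unset.
	urlPath := annotations["loadbalancer.openstack.org/health-monitor-url-path"]
	if urlPath == "" {
		urlPath = "/healthz"
	}
	opts := monitors.CreateOpts{
		PoolID:        poolID,
		Type:          "HTTP", // instead of the default TCP connect check
		Delay:         5,      // seconds between checks
		Timeout:       3,      // per-check timeout in seconds
		MaxRetries:    3,      // successful checks before a member is marked ONLINE
		HTTPMethod:    "GET",
		URLPath:       urlPath,
		ExpectedCodes: "200",
	}
	return monitors.Create(lb, opts).Extract()
}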

Release note:

NONE

@k8s-ci-robot added the release-note-none (Denotes a PR that doesn't merit a release note.) and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) labels on Nov 11, 2025
@k8s-ci-robot

Welcome @rybnico!

It looks like this is your first PR to kubernetes/cloud-provider-openstack 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/cloud-provider-openstack has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the needs-ok-to-test label (Indicates a PR that requires an org member to verify it is safe to test.) on Nov 11, 2025
@k8s-ci-robot

Hi @rybnico. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kayrus for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) on Nov 11, 2025
@kayrus commented Nov 11, 2025

As the OCCM only supports HTTP health checks for load balancers with a TrafficPolicy of "Local" on the HealthCheckNodePort, I implemented HTTP monitors using additional annotations.

I have never faced such an issue before. With TrafficPolicy: Cluster, k8s automatically marks a pod as unavailable, and traffic is redirected by the node's kube-proxy to another node. If a node is marked for maintenance, its payload should be gracefully transferred to another node. When the payload is about to shut down, it must stop receiving new requests and gracefully handle the connections that are already open. Thus I believe there is something wrong with your setup or your applications.

BTW, with an HTTP healthcheck you'll get the same problem, since load balancers usually have healthcheck latency, and this latency is much higher than the time the k8s API needs to understand that a pod is going to shut down; see kubernetes/kubernetes#133165 for more details. See also #2148, and try using the node selector for node maintenance (#2601).

See also #2869 docs update, which should be available since k8s 1.34

@kayrus commented Nov 11, 2025

/hold

@k8s-ci-robot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Nov 11, 2025
@rybnico commented Nov 11, 2025

I have never faced such an issue before. With TrafficPolicy: Cluster, k8s automatically marks a pod as unavailable, and traffic is redirected by the node's kube-proxy to another node. If a node is marked for maintenance, its payload should be gracefully transferred to another node. When the payload is about to shut down, it must stop receiving new requests and gracefully handle the connections that are already open. Thus I believe there is something wrong with your setup or your applications.

I encountered this issue when adding new nodes. New nodes may be added to the LoadBalancer before the network is fully set up. The service node ports open before Kubernetes networking can forward requests to the service pods, so the TCP health check succeeds but the application itself cannot be reached.

BTW, with an HTTP healthcheck you'll get the same problem, since load balancers usually have healthcheck latency, and this latency is much higher than the time the k8s API needs to understand that a pod is going to shut down; see kubernetes/kubernetes#133165 for more details. See also #2148, and try using the node selector for node maintenance (#2601).

The LoadBalancer should not add a new backend until at least one successful HTTP health check has been performed. This should help with the scenario I described above.

See also #2869 docs update, which should be available since k8s 1.34

I can't think of a solution for auto-labelling nodes for Cluster-API or KOPS-based deployments without manual action or a custom controller.

Aside from the issues we encountered with our load balancers when maintaining the clusters, I believe that support for HTTP health checks would be beneficial in general.

@kayrus commented Nov 11, 2025

The LoadBalancer should not add a new backend until at least one successful HTTP health check has been performed. This should help with the scenario I described above.

Unfortunately this is not true for OpenStack Octavia. A new pool member receives traffic immediately, regardless of the healthcheck results. This is true for the OpenStack Amphora and F5 Octavia backends/drivers, and other backends/drivers should follow the same approach. If not, please provide evidence.

Aside from the issues we encountered with our load balancers when maintaining the clusters, I believe that support for HTTP health checks would be beneficial in general.

I don't think so, considering the facts above and the overall high latency of healthchecks on the LB side. Believe me, I have been trying to find a solution for the TrafficPolicy: Local downtime when a new pool member is added for several years. The only way to do this properly is to fix the Octavia default behavior, then fix the corresponding drivers/backends. Until this is implemented, please use service annotations and node labels to exclude a node from the LB pool members.

@rybnico commented Nov 13, 2025

A new pool member receives traffic immediately, regardless of the healthcheck results. This is true for the OpenStack Amphora and F5 Octavia backends/drivers, and other backends/drivers should follow the same approach. If not, please provide evidence.

You're absolutely right. This doesn't seem logical to me, and other load balancer solutions that I know of always check new backends before sending traffic to them. Sorry for the misinformation, and thank you for clearing things up.

I found a theoretical solution to this problem: if you create a new member with AdminStateUp set to false and then set it to true after creation, the operating status doesn't become ONLINE if the health check fails. This solution is a bit cumbersome to implement (new members have to be created with AdminStateUp set to false and updated later), but I think it would solve this issue and others relating to TCP/UDP health checks too.
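A minimal sketch of the two steps, assuming gophercloud v1-style calls; the package and helper names are mine for illustration, not OCCM code:

// Sketch of the two-step workaround described above.
package memberstaging

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/pools"
)

// Step 1: create the member with AdminStateUp=false so it receives no traffic yet.
func addMemberDisabled(lb *gophercloud.ServiceClient, poolID, address string, port int) (*pools.Member, error) {
	down := false
	opts := pools.CreateMemberOpts{
		Address:      address,
		ProtocolPort: port,
		AdminStateUp: &down,
	}
	return pools.CreateMember(lb, poolID, opts).Extract()
}

// Step 2, in a later pass: flip AdminStateUp to true; per the behaviour described
// above, the member should only become ONLINE once the health monitor passes.
func enableMember(lb *gophercloud.ServiceClient, poolID, memberID string) (*pools.Member, error) {
	up := true
	opts := pools.UpdateMemberOpts{AdminStateUp: &up}
	return pools.UpdateMember(lb, poolID, memberID, opts).Extract()
}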

I would be happy to give it a try if you're interested in an implementation.

@kayrus commented Nov 13, 2025

I found a theoretical solution to this problem: if you create a new member with AdminStateUp set to false and then set it to true after creation, the operating status doesn't become ONLINE if the health check fails. This solution is a bit cumbersome to implement (new members have to be created with AdminStateUp set to false and updated later), but I think it would solve this issue and others relating to TCP/UDP health checks too.

I'm aware of this, but in this case we need two reconciliation cycles:

  1. add an offline member
  2. mark it online

And we must have a delay between these cycles, which you cannot really determine. Besides, the current upstream k8s cloud-controller logic doesn't have a two-cycle mechanism.

@kayrus commented Nov 13, 2025

Just for reference, my research on this topic:

haproxy adds a pool member and considers it alive: https://github.com/haproxy/haproxy/blob/697f7d1142a26352940532d0481cc17f9225d0d1/src/server.c#L2836

To change this behavior, the haproxy config must contain init-state fully-down, but this must be indicated through the Octavia API. The resulting haproxy template patch should look like this:

diff --git a/octavia/common/jinja/haproxy/combined_listeners/templates/macros.j2 b/octavia/common/jinja/haproxy/combined_listeners/templates/macros.j2
index 5a81c37..45c6b91 100644
--- a/octavia/common/jinja/haproxy/combined_listeners/templates/macros.j2
+++ b/octavia/common/jinja/haproxy/combined_listeners/templates/macros.j2
@@ ... @@ {% macro member_macro(constants, lib_consts, pool, member) %}
     {% if pool.health_monitor and pool.health_monitor.enabled %}
         {% if member.monitor_address %}
@@ ... @@
     {% endif %}
     {% if pool.alpn_protocols is defined %}
         {% set alpn_opt = " alpn %s"|format(pool.alpn_protocols) %}
     {% else %}
         {% set alpn_opt = "" %}
     {% endif %}
+    {% if member.monitor_init_state_down %}
+        {% set init_state_opt = " init-state fully-down" %}
+    {% else %}
+        {% set init_state_opt = "" %}
+    {% endif %}
-    {{ "server %s %s:%d weight %s%s%s%s%s%s%s%s%s%s%s%s%s%s%s"|e|format(
-        member.id, member.address, member.protocol_port, member.weight,
-        hm_opt, persistence_opt, proxy_protocol_opt, member_backup_opt,
-        member_enabled_opt, def_opt_prefix, def_crt_opt, ca_opt, crl_opt,
-        def_verify_opt, def_sni_opt, ciphers_opt, tls_versions_opt,
-        alpn_opt)|trim() }}
+    {{ "server %s %s:%d weight %s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s"|e|format(
+        member.id, member.address, member.protocol_port, member.weight,
+        hm_opt, persistence_opt, proxy_protocol_opt, member_backup_opt,
+        member_enabled_opt, def_opt_prefix, def_crt_opt, ca_opt, crl_opt,
+        def_verify_opt, def_sni_opt, ciphers_opt, tls_versions_opt,
+        alpn_opt, init_state_opt)|trim() }}
 {% endmacro %}

Once the upstream Octavia codebase gets this support, backend vendors should also add it, and then we can implement support for this option in OCCM.

@kayrus commented Nov 18, 2025

@rybnico commented Nov 18, 2025

Thank you for the detailed information, @kayrus!

I just wanted to ask if you had created a PR/bug in the Octavia project. Then I saw your reply.

I will keep an eye on the Octavia bug. Please tell me if there's anything I can help with.

@kayrus commented Nov 18, 2025

I just wanted to ask if you had created a PR/bug in the Octavia project.

No, I hope that someone from the Octavia team can handle this.

Please tell me if there's anything I can help with.

You can subscribe to this bug to increase its severity.
