-
Notifications
You must be signed in to change notification settings - Fork 203
K8SPXC-1734 Add configurable HAProxy health check parameters #2207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
K8SPXC-1734 Add configurable HAProxy health check parameters #2207
Conversation
bc23e2f to
cb1bef1
Compare
|
I'm btw happy to add this change to the Helm chart and the docs as well, but wanted to wait with those PRs until I knew for sure this would be accepted. |
cb1bef1 to
c4cc754
Compare
|
@timstoop haproxy does not have |
|
I totally agree that it makes sense to have it as default, it's what we were planning on doing in our Helm default code anyway. I only made it togglable in case there were good reasons that I didn't see and I really wanted to have this change implemented :-) I totally missed the 2205 PR, no idea how that could happen. |
yes, please |
| firs_node_replica='' | ||
| main_node='' | ||
|
|
||
| SERVER_OPTIONS=${HA_SERVER_OPTIONS:-'resolvers kubernetes check inter 10000 rise 1 fall 2 weight 1'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if I want to set custom HA_SERVER_OPTIONS via a secret? Will it work with your changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that would break. Fixed in my latest commit!
|
Removed on marked down shutdown sessions option, as requested. Also fixed the handling of HA_SERVER_OPTIONS via a secret. |
|
hello @timstoop , we are planning to include this PR in the next PXC release. Is it possible for you to have a look on the conflicts and the test failures so that we can proceed with the review and the testing? Thanks a lot! |
f8e092d to
6107a65
Compare
|
Rebased on main. |
6107a65 to
c797815
Compare
c797815 to
7952d6c
Compare
Add support for configuring HAProxy health check parameters through the PerconaXtraDBCluster CR spec, allowing operators to tune health check behavior for their specific environments. Changes: - Added healthCheck field to HAProxySpec with interval, fall, rise, and shutdownOnMarkDown options - Modified haproxy_add_pxc_nodes.sh to use HA_SERVER_OPTIONS and HA_SHUTDOWN_ON_MARK_DOWN environment variables - Updated StatefulSet generation to pass health check config as env vars - Added comprehensive test coverage for health check configurations
7952d6c to
2ecd007
Compare
|
Rebased and cleaned up the branch. Now contains a single commit with only the necessary changes:
All generated files (CRDs, manifests) created with correct tool versions. |
Add HA_SERVER_OPTIONS environment variable to HAProxy statefulset comparison files to match the new default configuration generated by the operator. This environment variable is now added to all HAProxy deployments with default values: resolvers kubernetes check inter 10000 rise 1 fall 2 weight 1
|
Updated e2e test comparison files to include the new I updated these manually by adding the env var to each HAProxy statefulset comparison file. Is there tooling or a script to regenerate these comparison files that I should be using instead? |
|
Note: There are 2 other test failures (custom-users-8-0, demand-backup-encrypted-with-tls-8-0) that appear unrelated to this PR. The custom-users test shows a password rotation issue, which doesn't involve HAProxy configuration. |
Move HA_SERVER_OPTIONS env var to be added after REPLICAS_SVC_ONLY_READERS to maintain consistent ordering in the generated StatefulSet configuration. This ensures the environment variables appear in the expected order: 1. PXC_SERVICE 2. IS_PROXY_PROTOCOL (if present) 3. REPLICAS_SVC_ONLY_READERS 4. HA_SERVER_OPTIONS
pkg/pxc/app/statefulset/haproxy.go
Outdated
| // Add health check configuration env vars after REPLICAS_SVC_ONLY_READERS | ||
| if cr.Spec.HAProxy != nil { | ||
| container.Env = append(container.Env, buildHAProxyHealthCheckEnvVars(cr.Spec.HAProxy.HealthCheck)...) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to add this env var only if crVersion is 1.19.0 and above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upgrade-haproxy fails because of this
Only add HA_SERVER_OPTIONS environment variable for crVersion 1.19.0 and above to maintain backward compatibility during upgrades. Fixes upgrade-haproxy test failures.
commit: 89209ce |
CHANGE DESCRIPTION
Problem:
When a PXC node fails (e.g., during rolling restart), HAProxy takes 20+ seconds to detect the failure with default settings (
check inter 10000+fall 2= 20s). Worse, existing client connections to the failed backend are NOT terminated, causing them to hang until TCP timeout (potentially minutes).The only workaround is to provide a complete HAProxy configuration via
haproxy.configuration, which duplicates operator logic, breaks on upgrades, and is difficult to maintain.Cause:
HAProxy backend health check parameters (
interval,fall,rise,on-marked-down shutdown-sessions) are hardcoded in the operator and not exposed through the CR API.Solution:
Add a new
healthCheckfield to the HAProxy spec allowing granular control:This enables fast failover (6s vs 20s) and active connection cleanup without overriding the entire configuration.
CHECKLIST
Jira
Needs Doc) and QA (Needs QA)?Tests
compare/*-oc.yml)?Config/Logging/Testability
Additional Notes:
Fixes #2206