Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(healthchecks): Adding Agent Health Checks for legacy & hubble control planes #1092
base: main
Are you sure you want to change the base?
feat(healthchecks): Adding Agent Health Checks for legacy & hubble control planes #1092
Changes from 4 commits
c7deff8
a1501ce
9c6f39b
df90b99
a200860
e4df974
cf30c6c
755a82b
c2dfc17
5d2a80e
2bf29fa
9004b86
cbcc57d
8e06fd1
201e8c9
f897632
12dd9c1
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why create a new port? and not reuse the existing 10093 ? since we are a hostnet: true pod, any port number we use comes with restrictions as it takes away from nodes usable port ranges. In AKS, we will have to pre register the ports we will be using to make sure customers are also aware of the ports they should not use to avoid any conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be already configured in Legacy & Hubble control planes as default HealthProbeBindAddress.
Its always provided when we set up k8s controller runtime options.
Hubble:
retina/pkg/config/hubble_config_linux.go
Line 42 in 4b12472
Legacy: https://github.com/microsoft/retina/blob/4b12472cc5007e0931238dd5e26f819e6be093b5/cmd/root.go#L42C10-L42C15
I think also @rbtr mentioned having a configurable port.
Do you think we should change to 10093?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I did some testing and got errors when using port 10093.
I don't think we can reuse the Retina port, some findings below:
We start a http api server with that port in controller manager:
retina/pkg/managers/controllermanager/controllermanager.go
Line 56 in b27958f
And the trail for setting up health check is:
The HealthProbeBindAddress we pass as part of options in the ctrl manager ends up here where a listener is created.
comment:
This will throw an error if the bind address is invalid or already in use.
(same logic as providing pprof address below)
https://github.com/kubernetes-sigs/controller-runtime/blob/aea2e32a936584b06ae6f7992f856fe7292b0297/pkg/manager/manager.go#L407
If listener exists k8s tries to add HealthProbeServer:
https://github.com/kubernetes-sigs/controller-runtime/blob/aea2e32a936584b06ae6f7992f856fe7292b0297/pkg/manager/internal.go#L400
addHealthProverServer() definition - creates a new server:
https://github.com/kubernetes-sigs/controller-runtime/blob/aea2e32a936584b06ae6f7992f856fe7292b0297/pkg/manager/internal.go#L290
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree with Vamsi. Controller Manager's implementation isn't fit for purpose here, so we should just use the standard library and write the two HTTP handlers on whatever server is bound to 10093. It will probably provide a better healthcheck overall, since we're testing whether traffic can be served on 10093.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vakalapa @timraymond Updated the PR.