-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add check to access agent-metrics endpoint #151
base: main
Are you sure you want to change the base?
Conversation
cli := &http.Client{ | ||
Transport: &http.Transport{}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is worth configuring a timeout here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a timeout and made a couple of other changes and fixes:
- access agent-metrics (we don't have any health endpoints exposed, this was always restarting)
- make the health more pessimistic, return unhealthy when restart is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just left one comment.
This is intended as minimal change to fix signadot/signadot#5172 tested with warp, it causes restarts and we are able to connect whereas previously it hangs still to consider: - do we need to make this visible in status? - do we need to make check period configurable? add check to agent-metrics endpoints in tp monitor
7e8ac89
to
ed05c99
Compare
thanks, I made a couple of changes after more testing. could you look again? |
Yes, still LGTM. |
More testing revealed the status visibility pessimism is too pessimistic sometimes, looking at that. |
This is intended as minimal change to fix
https://github.com/signadot/signadot/issues/5172
tested with warp, it causes restarts and we are able to connect whereas previously it hangs
regarding discussion around:
do we need to make this visible in status?
I changed it so that it is more visible in status, returning unhealthy when a restart is needed instead of healthy.
do we need to make check period configurable?
I think not, because it is actually complicated because there 2 check periods for healthy and unhealthy state. Also, the check period relates to the timeout for the added checking request.