Nodes summary #83

AdamsGH · 2024-10-18T11:53:56Z

Hey! Here is a little improvement to add /nodes endpoint with some code refactoring for this needs. Also add ghcr registry build workflow. I hope this can solve issue from #28, at least that's my case.

Here is a little summary:

Added build_ghcr.yaml for GitHub Container Registry workflow.
Included a Helm chart in the helm directory for easier deployment.
Updated main.go:
- Added a new /nodes endpoint to retrieve metrics for all nodes in the cluster.
- Improved error handling and context management.
Updated main_test.go with new tests for the added functionality.
Updated README.md:
Added information about the new /nodes endpoint.
- Included details about command-line flags and available metrics.

george-angel · 2024-10-31T09:29:54Z

Hello @AdamsGH - sorry I was on leave.

Can you please explain more about your use-case and why you need a new endpoint for collecting summary from all Nodes?

Thanks!

AdamsGH · 2024-10-31T12:53:28Z

Hi, @george-angel! No problem at all, thanks for your answer!

And yep, sure. We are working on the k8s management project where new nodes can be added in a few clicks. Your project works really well (and thank you, it's really great!) for us, but to automate the process we need somehow to get the name of each node and create a service monitor for it. I try to find a less painful solution available, but the only answer I've found so far is to deploy a pod to every node so they can create a service monitor for their own endpoint, which doesn't sound like a good solution for me.

Using /nodes from my fork solved problem with only one simple service monitor and provide data for every single node and without security problems. Idk if it fits well in the way you see your project, but I hope this contribution may be useful for someone else. And if I haven't messed anything up during testing - shouldn't break any of the existing functionality

george-angel · 2024-11-01T05:10:59Z

Thanks for explaining.

Have you seen the scrape config example we provide (linked in the README) https://github.com/utilitywarehouse/kube-summary-exporter/blob/master/manifests/scrape-config.yaml ?

This is what we use ourselves. Our largest production cluster is 170 Nodes and >1700 PVCs. For this we run a single replica Deployment and its only using 30Mi memory. This way we leave the Node discovery to Prometheus. I think that sits neatly where it should.

This way when you go to Prometheus UI and check "Targets" you will see an endpoint per Node, and you get instance label assigned by Prometheus.

Would this work for you?

AdamsGH · 2024-11-01T06:08:28Z

Yeah, I saw it, look really good. But unfortunately our whole system is fundamentally built on service monitors, so adding scrape configs in our case is not an option at least for now =(

george-angel · 2024-11-01T06:45:19Z

I see!

In that case I'm happy to accept a new /nodes endpoint to address this use-case. However:

I don't believe we need the GHCR build - we already have a build and push images to Quay.
Helm - we don't use Helm, and I don't want to take on the maintenance of this.

AdamsGH · 2024-11-01T09:10:21Z

I'm glad to hear that! Rolled back unnecessary commits and left only main part

george-angel · 2024-11-04T03:30:45Z

Thanks @AdamsGH !

I'm sorry I have pushed some changes since you opened the PR to fix bugs I found while reviewing. Could you please resolve conflicts?

I'm just getting a few more people to cast their eyes over your changes to see if I missed anything 👍

asiyani

small request, could you please add some fake node data to test-summary.prom file so that node labels in the test is not empty string.
apart from that all good. 👍
Thank you for the PR!

Verify its set as a label in tests

Add new endpoint with all nodes data

bd0b6e8

AdamsGH force-pushed the nodes-summary branch from 8f114e4 to bd0b6e8 Compare November 1, 2024 09:06

george-angel requested a review from asiyani November 4, 2024 08:19

asiyani approved these changes Nov 4, 2024

View reviewed changes

george-angel added 2 commits November 15, 2024 14:39

Merge remote-tracking branch 'origin/master' into nodes-summary

da0f0a6

Add nodeName to test data

dd14294

Verify its set as a label in tests

asiyani approved these changes Nov 15, 2024

View reviewed changes

george-angel merged commit 13d04b5 into utilitywarehouse:master Nov 18, 2024
6 of 7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Nodes summary #83

Nodes summary #83

Uh oh!

AdamsGH commented Oct 18, 2024

Uh oh!

george-angel commented Oct 31, 2024

Uh oh!

AdamsGH commented Oct 31, 2024

Uh oh!

george-angel commented Nov 1, 2024

Uh oh!

AdamsGH commented Nov 1, 2024

Uh oh!

george-angel commented Nov 1, 2024

Uh oh!

AdamsGH commented Nov 1, 2024

Uh oh!

george-angel commented Nov 4, 2024

Uh oh!

asiyani left a comment

Uh oh!

Uh oh!

Uh oh!

Nodes summary #83

Nodes summary #83

Uh oh!

Conversation

AdamsGH commented Oct 18, 2024

Uh oh!

george-angel commented Oct 31, 2024

Uh oh!

AdamsGH commented Oct 31, 2024

Uh oh!

george-angel commented Nov 1, 2024

Uh oh!

AdamsGH commented Nov 1, 2024

Uh oh!

george-angel commented Nov 1, 2024

Uh oh!

AdamsGH commented Nov 1, 2024

Uh oh!

george-angel commented Nov 4, 2024

Uh oh!

asiyani left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!