Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nodes summary #83

Merged
merged 3 commits into from
Nov 18, 2024
Merged

Conversation

AdamsGH
Copy link
Contributor

@AdamsGH AdamsGH commented Oct 18, 2024

Hey! Here is a little improvement to add /nodes endpoint with some code refactoring for this needs. Also add ghcr registry build workflow. I hope this can solve issue from #28, at least that's my case.

Here is a little summary:

  1. Added build_ghcr.yaml for GitHub Container Registry workflow.
  2. Included a Helm chart in the helm directory for easier deployment.
  3. Updated main.go:
    • Added a new /nodes endpoint to retrieve metrics for all nodes in the cluster.
    • Improved error handling and context management.
  4. Updated main_test.go with new tests for the added functionality.
  5. Updated README.md:
  6. Added information about the new /nodes endpoint.
    • Included details about command-line flags and available metrics.

@george-angel
Copy link
Member

Hello @AdamsGH - sorry I was on leave.

Can you please explain more about your use-case and why you need a new endpoint for collecting summary from all Nodes?

Thanks!

@AdamsGH
Copy link
Contributor Author

AdamsGH commented Oct 31, 2024

Hi, @george-angel! No problem at all, thanks for your answer!

And yep, sure. We are working on the k8s management project where new nodes can be added in a few clicks. Your project works really well (and thank you, it's really great!) for us, but to automate the process we need somehow to get the name of each node and create a service monitor for it. I try to find a less painful solution available, but the only answer I've found so far is to deploy a pod to every node so they can create a service monitor for their own endpoint, which doesn't sound like a good solution for me.

Using /nodes from my fork solved problem with only one simple service monitor and provide data for every single node and without security problems. Idk if it fits well in the way you see your project, but I hope this contribution may be useful for someone else. And if I haven't messed anything up during testing - shouldn't break any of the existing functionality

@george-angel
Copy link
Member

Thanks for explaining.

Have you seen the scrape config example we provide (linked in the README) https://github.com/utilitywarehouse/kube-summary-exporter/blob/master/manifests/scrape-config.yaml ?

This is what we use ourselves. Our largest production cluster is 170 Nodes and >1700 PVCs. For this we run a single replica Deployment and its only using 30Mi memory. This way we leave the Node discovery to Prometheus. I think that sits neatly where it should.

This way when you go to Prometheus UI and check "Targets" you will see an endpoint per Node, and you get instance label assigned by Prometheus.

2024-11-01-150819_2560x363_scrot

Would this work for you?

@AdamsGH
Copy link
Contributor Author

AdamsGH commented Nov 1, 2024

Yeah, I saw it, look really good. But unfortunately our whole system is fundamentally built on service monitors, so adding scrape configs in our case is not an option at least for now =(

@george-angel
Copy link
Member

I see!

In that case I'm happy to accept a new /nodes endpoint to address this use-case. However:

  • I don't believe we need the GHCR build - we already have a build and push images to Quay.
  • Helm - we don't use Helm, and I don't want to take on the maintenance of this.

@AdamsGH
Copy link
Contributor Author

AdamsGH commented Nov 1, 2024

I'm glad to hear that! Rolled back unnecessary commits and left only main part

@george-angel
Copy link
Member

Thanks @AdamsGH !

I'm sorry I have pushed some changes since you opened the PR to fix bugs I found while reviewing. Could you please resolve conflicts?

I'm just getting a few more people to cast their eyes over your changes to see if I missed anything 👍

@george-angel george-angel requested a review from asiyani November 4, 2024 08:19
Copy link
Contributor

@asiyani asiyani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small request, could you please add some fake node data to test-summary.prom file so that node labels in the test is not empty string.
apart from that all good. 👍
Thank you for the PR!

@george-angel george-angel merged commit 13d04b5 into utilitywarehouse:master Nov 18, 2024
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants