
Contributor

@jigisha620 jigisha620 commented Oct 30, 2025

Fixes #N/A

Description
This PR adds a function that calculates cluster cost, which can be used to measure performance. For example, when Karpenter performs consolidation, the function can be used to compare cluster cost before and after consolidation.
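
As a rough illustration of the before/after comparison described here, the sketch below sums an hourly price per node from a price table keyed by instance type. The `node` struct, the `prices` map, and the `clusterCost` helper are hypothetical stand-ins and are not the code in this PR, which works against the Kubernetes API and `env.InstanceTypes`.

```go
package main

import "fmt"

// node is a simplified, hypothetical stand-in for a Kubernetes Node.
type node struct {
	Name         string
	InstanceType string
}

// clusterCost sums the hourly price of every node whose instance type has a
// known price. It returns 0.0 when there are no nodes or no pricing data.
func clusterCost(nodes []node, prices map[string]float64) float64 {
	total := 0.0
	for _, n := range nodes {
		if price, ok := prices[n.InstanceType]; ok {
			total += price
		}
	}
	return total
}

func main() {
	// Illustrative prices only; not real pricing data.
	prices := map[string]float64{"small": 0.25, "large": 1.0}

	before := []node{
		{Name: "a", InstanceType: "small"},
		{Name: "b", InstanceType: "small"},
		{Name: "c", InstanceType: "large"},
	}
	after := []node{{Name: "d", InstanceType: "large"}}

	costBefore := clusterCost(before, prices)
	costAfter := clusterCost(after, prices)
	fmt.Printf("before=%.2f after=%.2f saved=%.2f\n", costBefore, costAfter, costBefore-costAfter)
}
```

A test could record the cost before triggering consolidation, wait for the cluster to settle, and then assert that the second value is no higher than the first.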

How was this change tested?
https://github.com/jigisha620/karpenter/actions/runs/18952021119/job/54118913345

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jigisha620
Once this PR has been reviewed and has the lgtm label, please assign tzneal for approval. For more information see the Code Review Process.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 30, 2025
// them against pricing data from env.InstanceTypes. Returns 0.0 if no nodes or pricing data exists.
//
//nolint:gocyclo
func (env *Environment) GetClusterCost() float64 {
Contributor

I wonder if a metric scraping approach is more extensible. Karpenter should emit cost metrics soon, and if we want to set other performance SLAs we might want to do it via metrics scraping. What are your thoughts?
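
To make the metric scraping idea concrete, here is a minimal sketch that sums the samples of one metric from a Prometheus text-format payload using only the standard library. The metric name `example_cluster_cost` and the `sumMetric` helper are assumptions made for illustration; the thread does not name the metric Karpenter will emit, and this parser handles only the simple `name{labels} value` form.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumMetric adds up every sample of the named metric in a Prometheus
// text-format payload. Comments and other metrics are ignored.
func sumMetric(payload, name string) (float64, error) {
	total := 0.0
	scanner := bufio.NewScanner(strings.NewReader(payload))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if !strings.HasPrefix(line, name) {
			continue
		}
		rest := line[len(name):]
		// Skip metrics that merely share the prefix, such as name_other.
		if rest == "" || (rest[0] != '{' && rest[0] != ' ') {
			continue
		}
		// Drop the label set, if present.
		if rest[0] == '{' {
			end := strings.LastIndex(rest, "}")
			if end < 0 {
				return 0, fmt.Errorf("malformed sample: %q", line)
			}
			rest = rest[end+1:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			return 0, fmt.Errorf("sample has no value: %q", line)
		}
		value, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return 0, fmt.Errorf("parsing value in %q: %w", line, err)
		}
		total += value
	}
	return total, scanner.Err()
}

func main() {
	// Hypothetical scrape output; the metric name is an assumption.
	payload := `# TYPE example_cluster_cost gauge
example_cluster_cost{nodepool="default"} 1.5
example_cluster_cost{nodepool="batch"} 0.25
`
	total, err := sumMetric(payload, "example_cluster_cost")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("total=%.2f\n", total)
}
```

The same helper could read any other gauge from the same payload, which is the extensibility argument made in this comment.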

Contributor Author

@jigisha620 jigisha620 Oct 30, 2025

We could move to metrics once we have them, but until then I think this is the most straightforward way to get the cluster cost.

coveralls commented Oct 30, 2025

Pull Request Test Coverage Report for Build 18954237020

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.03%) to 81.593%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| pkg/controllers/disruption/drift.go | 2 | 88.0% |
| pkg/controllers/node/termination/controller.go | 2 | 77.14% |

| Totals | |
| --- | --- |
| Change from base Build 18947767733 | -0.03% |
| Covered Lines | 11565 |
| Relevant Lines | 14174 |

💛 - Coveralls

Contributor Author

@jigisha620 jigisha620 left a comment

/karpenter perf

Contributor Author

@jigisha620 jigisha620 left a comment

/karpenter perf

@jigisha620 jigisha620 force-pushed the perf-tests branch 2 times, most recently from 8d8477b to bda31a5 Compare October 30, 2025 19:25
for _, path := range paths {
if path != "" {
if data, err := os.ReadFile(path); err != nil {
return nil, fmt.Errorf("could not read instance types file %s: %w", path, err)
Member

nit:

Suggested change
- return nil, fmt.Errorf("could not read instance types file %s: %w", path, err)
+ return nil, serrors.Wrap(fmt.Errorf("could not read instance types file, %w", err), "file", path)
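
The idea behind this kind of suggestion is to keep the error message constant and carry variable context, such as the file path, as key/value pairs. The sketch below is a standard-library analog of that pattern, not the `serrors` package itself; `fieldError`, `wrapWithFields`, `readInstanceTypes`, and `errMissing` are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errMissing simulates the underlying read failure.
var errMissing = errors.New("file does not exist")

// fieldError keeps key/value context next to the wrapped error instead of
// interpolating it into the message.
type fieldError struct {
	err    error
	fields []any
}

func (e *fieldError) Error() string { return e.err.Error() }
func (e *fieldError) Unwrap() error { return e.err }

// wrapWithFields attaches alternating keys and values to err.
func wrapWithFields(err error, keysAndValues ...any) error {
	return &fieldError{err: err, fields: keysAndValues}
}

// readInstanceTypes always fails here; a real implementation would read the file.
func readInstanceTypes(path string) error {
	return wrapWithFields(fmt.Errorf("could not read instance types file, %w", errMissing), "file", path)
}

func main() {
	err := readInstanceTypes("/tmp/missing.json")
	fmt.Println(err) // the message does not vary with the path

	var fe *fieldError
	if errors.As(err, &fe) {
		fmt.Println("fields:", fe.fields)
	}
	// The original cause is still reachable through the wrap chain.
	fmt.Println(errors.Is(err, errMissing))
}
```

Because the message no longer varies per file, log aggregation can group identical failures while the path remains available as structured data.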


instanceTypes, err := kwok.ConstructInstanceTypes(ctx)
if err != nil {
log.Default().Printf("failed constructing instance types: %v", err)
Member

Shouldn't this also be a fatal error?

Suggested change
- log.Default().Printf("failed constructing instance types: %v", err)
+ log.Default().Printf("failed constructing instance types, %v", err)
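
A small sketch of two details relevant to this thread, under stated assumptions. First, `%w` has wrapping semantics only in `fmt.Errorf`; the `log` package's Printf-style functions format an error with `%v`. Second, if the failure should be fatal, `log.Fatalf` logs the message and then exits with status 1. The `constructInstanceTypes` and `setup` functions and the `errNoPricing` value are hypothetical stand-ins, not the code under review.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

var errNoPricing = errors.New("no pricing data")

// constructInstanceTypes is a hypothetical stand-in for the real constructor.
func constructInstanceTypes(fail bool) ([]string, error) {
	if fail {
		return nil, errNoPricing
	}
	return []string{"small", "large"}, nil
}

// setup returns the error so that the caller decides whether it is fatal.
func setup(fail bool) ([]string, error) {
	instanceTypes, err := constructInstanceTypes(fail)
	if err != nil {
		// %w belongs in fmt.Errorf, where it wraps err for errors.Is/As.
		return nil, fmt.Errorf("failed constructing instance types, %w", err)
	}
	return instanceTypes, nil
}

func main() {
	instanceTypes, err := setup(false)
	if err != nil {
		// Logging uses %v; Fatalf logs the message and exits with status 1.
		log.Fatalf("setup failed: %v", err)
	}
	fmt.Println(len(instanceTypes), "instance types loaded")
}
```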

// Get the output dir if it's set
outputDir, _ := os.LookupEnv("OUTPUT_DIR")

instanceTypes, err := kwok.ConstructInstanceTypes(ctx)
Member

I don't like the idea of us taking a direct dependency on the kwok provider implementation inside our common testing package. If we were to have this type of util, I would constrain it to a dedicated kwok testing package so that kwok details don't leak into our common packages. This problem goes away though if we just wait to rely on the upstream metric.
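
One way to picture the separation described here is a small interface that the common package depends on, with the provider-specific lookup supplied from elsewhere. This is only a sketch of that design idea; `PriceSource`, `staticPrices`, and `totalCost` are hypothetical names and do not exist in Karpenter.

```go
package main

import "fmt"

// PriceSource is a hypothetical interface that a common test package could
// depend on instead of importing a specific provider.
type PriceSource interface {
	HourlyPrice(instanceType string) (float64, bool)
}

// staticPrices stands in for a provider-specific implementation, for example
// one that lives in a dedicated kwok testing package.
type staticPrices map[string]float64

func (s staticPrices) HourlyPrice(instanceType string) (float64, bool) {
	price, ok := s[instanceType]
	return price, ok
}

// totalCost knows only about PriceSource, so provider details stay out of it.
func totalCost(src PriceSource, instanceTypes []string) float64 {
	total := 0.0
	for _, it := range instanceTypes {
		if price, ok := src.HourlyPrice(it); ok {
			total += price
		}
	}
	return total
}

func main() {
	// Illustrative prices only.
	src := staticPrices{"small": 0.25, "large": 1.0}
	fmt.Printf("%.2f\n", totalCost(src, []string{"small", "small", "large"}))
}
```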


By("Calculating cluster cost")
nodes := &corev1.NodeList{}
Expect(env.Client.List(env, nodes, client.HasLabels{test.DiscoveryLabel})).To(Succeed())
Member

If we go with this approach, we should scrape NodeClaims rather than nodes since they exist for the instance's entire lifetime whereas nodes only exist for a subset of it.
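
The difference can be seen in a simplified model where an instance has been launched but its node has not registered yet. The `nodeClaim` struct and both helpers below are hypothetical stand-ins, not Karpenter types; the sketch only illustrates why counting nodes can undercount cost.

```go
package main

import "fmt"

// nodeClaim is a simplified stand-in: NodeName stays empty until the
// instance's node has registered with the cluster.
type nodeClaim struct {
	InstanceType string
	NodeName     string
}

// costFromNodes counts only instances that currently have a registered node.
func costFromNodes(claims []nodeClaim, prices map[string]float64) float64 {
	total := 0.0
	for _, c := range claims {
		if c.NodeName != "" {
			total += prices[c.InstanceType]
		}
	}
	return total
}

// costFromNodeClaims counts every launched instance, registered or not.
func costFromNodeClaims(claims []nodeClaim, prices map[string]float64) float64 {
	total := 0.0
	for _, c := range claims {
		total += prices[c.InstanceType]
	}
	return total
}

func main() {
	// Illustrative prices only.
	prices := map[string]float64{"small": 0.25, "large": 1.0}
	claims := []nodeClaim{
		{InstanceType: "large", NodeName: "node-1"},
		{InstanceType: "small"}, // launched, node not yet registered
	}
	fmt.Printf("nodes=%.2f nodeclaims=%.2f\n",
		costFromNodes(claims, prices), costFromNodeClaims(claims, prices))
}
```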

Comment on lines +1294 to +1295
OS: node.Labels["kubernetes.io/os"],
Arch: node.Labels["kubernetes.io/arch"],
Member

Why do we care about OS and arch?
