feat: Add GPU usage to pod view #3437

NirLevy98 · 2025-07-06T08:53:07Z

Description

Adds a GPU column to the pods view that displays the total number of GPUs requested by each pod.

Motivation

When managing GPU workloads in Kubernetes clusters, it's useful to quickly see which pods are consuming GPU resources directly in the k9s interface without having to describe each pod individually.

NirLevy98 · 2025-07-09T18:37:30Z

@derailed
Could you review this, please?
Thanks!

derailed

@NirLevy98 Nice! I figured that would be coming next. Thank you for this update!

derailed · 2025-07-11T02:05:02Z

internal/render/pod.go

@@ -67,6 +68,7 @@ var defaultPodHeader = model1.Header{
 	model1.HeaderColumn{Name: "%CPU/L", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},
 	model1.HeaderColumn{Name: "%MEM/R", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},
 	model1.HeaderColumn{Name: "%MEM/L", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},
+	model1.HeaderColumn{Name: "GPUS", Attrs: model1.Attrs{Align: tview.AlignRight}},


I think it would be good to stay consistent with CPU, i.e. expose 'GPU/RL', '%GPU/R', and '%GPU/L' as separate columns that one can sort on or use to see where things stand.
We should also surface GPU reporting in the container view.
What do you think?

@derailed Done! I’ve updated it to match the CPU/MEM pattern with the GPU, GPU/RL, %GPU/R, and %GPU/L columns.
As for the container view, I think it’s nice to have, but not a must.
Anyway, I’d like to merge this feature if that’s okay, and start using it :)

internal/render/pod.go

derailed

@NirLevy98 Thank you for the updates. I have actually started working on this...
I think we should update the container view as well so we remain consistent across all resource usage.

internal/render/pod.go

derailed · 2025-07-13T15:35:43Z

internal/render/pod.go

@@ -191,6 +199,10 @@ func (p *Pod) defaultRow(pwm *PodWithMetrics, row *model1.Row) error {
 		client.ToPercentageStr(c.cpu, r.lcpu),
 		client.ToPercentageStr(c.mem, r.mem),
 		client.ToPercentageStr(c.mem, r.lmem),
+		strconv.FormatInt(g.current, 10),


I think you can use toMi here.

Didn't work for me

internal/render/pod_test.go

NirLevy98 · 2025-07-14T14:30:00Z

@derailed What do you think? :)

derailed

@NirLevy98 Thank you for the update. Though I do appreciate your enthusiasm to get this out, I hope you can appreciate that I end up having to support all the code that makes it through review. It's close but needs some TLC.

internal/render/pod.go

derailed · 2025-07-14T15:57:31Z

internal/render/pod.go

 	cc := make([]v1.Container, 0, len(spec.InitContainers)+len(spec.Containers))
 	cc = append(cc, filterSidecarCO(spec.InitContainers)...)
 	cc = append(cc, spec.Containers...)

+	// Get CPU and Memory requests/limits
 	rcpu, rmem := cosRequests(cc)


Given the above, this should also return rgpu, i.e. there is no need for cosGPU below, since we can collect all requests/limits for CPU, GPU, and memory in one pass.
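To illustrate the single-pass idea from the comment above, here is a minimal sketch. It is not the PR's code: `ResourceList`, `Container`, `totals`, `sumRequests`, and the `gpuResources` list are hypothetical stand-ins for the Kubernetes and k9s types (the real code works with `v1.Container`, `resource.Quantity`, and `config.KnownGPUVendors`, whose contents are not shown in this thread).

```go
package main

import "fmt"

// ResourceList is a simplified, hypothetical stand-in for v1.ResourceList:
// it maps a resource name to a plain integer amount.
type ResourceList map[string]int64

// Container is a hypothetical stand-in for v1.Container.
type Container struct {
	Name     string
	Requests ResourceList
}

// gpuResources is an assumed list of GPU resource names, used here only
// for illustration in place of config.KnownGPUVendors.
var gpuResources = []string{"nvidia.com/gpu", "amd.com/gpu"}

// totals carries CPU, memory, and GPU with the same type, so GPU is
// handled like any other resource.
type totals struct{ cpu, mem, gpu int64 }

// sumRequests walks the containers once and accumulates all three
// resources, instead of running a separate GPU-only pass.
func sumRequests(cc []Container) totals {
	var t totals
	for _, c := range cc {
		// Reading a missing key (or a nil map) yields 0, so containers
		// without requests contribute nothing.
		t.cpu += c.Requests["cpu"]
		t.mem += c.Requests["memory"]
		for _, name := range gpuResources {
			t.gpu += c.Requests[name]
		}
	}
	return t
}

func main() {
	cc := []Container{
		{Name: "train", Requests: ResourceList{"cpu": 500, "memory": 1024, "nvidia.com/gpu": 2}},
		{Name: "sidecar", Requests: ResourceList{"cpu": 100, "memory": 256}},
	}
	fmt.Printf("%+v\n", sumRequests(cc))
}
```

The same loop shape would apply to limits; only the field being read changes.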

internal/render/pod.go

derailed · 2025-07-14T15:59:42Z

internal/render/pod.go

+
+		// Check requests
+		if container.Resources.Requests != nil {
+			for _, gpuResource := range config.KnownGPUVendors {


This function will go away, but we can DRY this up into a helper that we can reuse when collecting requests and limits.

I removed this function. I’m not sure we need to move it now. Can you take a look at the new code and let me know what you think?

internal/render/pod_int_test.go

derailed · 2025-07-14T16:01:46Z

internal/render/pod_test.go

@@ -661,6 +664,166 @@ func TestCheckPhase(t *testing.T) {
 	}
 }

+func TestPodGPUCalculation(t *testing.T) {


This should all fold under gatherCoMX

NirLevy98 · 2025-07-15T10:28:10Z

@derailed
Fixed, and all the tests passed

derailed

@NirLevy98 Thank you for these updates!
Definitely better, but still not quite what I'd envisioned:

Containers need to expose GPU so users can track which container is pulling resources
GPU should be treated as just another resource

I need to push a new release and I'm running out of time on these reviews. I'll have some of this implemented in the next drop and we can iterate from there.
Thank you for your support and guidance on seeing this through.

derailed · 2025-07-15T13:44:21Z

internal/render/pod.go

@@ -67,6 +68,7 @@ var defaultPodHeader = model1.Header{
 	model1.HeaderColumn{Name: "%CPU/L", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},
 	model1.HeaderColumn{Name: "%MEM/R", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},
 	model1.HeaderColumn{Name: "%MEM/L", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},
+	model1.HeaderColumn{Name: "GPU", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},


We should expose request/limit for gpu resources.

derailed · 2025-07-15T13:45:38Z

internal/render/pod.go

+			mem.Add(*requests.Memory())
+		}
+
+		for _, gpuResource := range config.KnownGPUVendors {


Again, this should be a helper.
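One possible shape for such a helper is sketched below. All names here (`ResourceList`, `Resources`, `knownGPUs`, `extractGPU`) are hypothetical and chosen for illustration; the vendor list stands in for `config.KnownGPUVendors`, and plain integers stand in for `resource.Quantity`. The point is only that the vendor loop lives in one place and is called for both requests and limits.

```go
package main

import "fmt"

// ResourceList is a simplified, hypothetical stand-in for v1.ResourceList.
type ResourceList map[string]int64

// Resources is a hypothetical stand-in for v1.ResourceRequirements.
type Resources struct {
	Requests ResourceList
	Limits   ResourceList
}

// knownGPUs is an assumed list of GPU resource names, for illustration only.
var knownGPUs = []string{"nvidia.com/gpu", "amd.com/gpu"}

// extractGPU returns the amount of the first known GPU resource present in
// rl, and false when none is set. A nil map is handled safely.
func extractGPU(rl ResourceList) (int64, bool) {
	for _, name := range knownGPUs {
		if q, ok := rl[name]; ok {
			return q, true
		}
	}
	return 0, false
}

func main() {
	r := Resources{
		Requests: ResourceList{"cpu": 500, "nvidia.com/gpu": 1},
		Limits:   ResourceList{"cpu": 1000, "nvidia.com/gpu": 2},
	}
	// The same helper serves both call sites.
	if q, ok := extractGPU(r.Requests); ok {
		fmt.Println("gpu request:", q)
	}
	if q, ok := extractGPU(r.Limits); ok {
		fmt.Println("gpu limit:", q)
	}
}
```

Returning a presence flag alongside the amount lets the caller distinguish "no GPU configured" from an explicit zero, which matters when deciding whether to render a placeholder in a column.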

derailed · 2025-07-15T13:46:09Z

internal/render/pod.go

 	return
 }

-func cosLimits(cc []v1.Container) (cpuQ, memQ resource.Quantity) {
+func cosRequests(cc []v1.Container) (cpuQ, memQ resource.Quantity, gpu int64) {


Why is gpu different? It should be a Quantity.

derailed · 2025-07-16T14:10:42Z

Closing as of v0.50.8

NirLevy98 force-pushed the add-gpus-to-pod-panel branch 3 times, most recently from 816c10b to 12797d1 Compare July 7, 2025 07:24

derailed requested changes Jul 11, 2025

NirLevy98 force-pushed the add-gpus-to-pod-panel branch 2 times, most recently from 3839765 to 820e6d1 Compare July 13, 2025 12:52

derailed requested changes Jul 13, 2025

NirLevy98 force-pushed the add-gpus-to-pod-panel branch 2 times, most recently from f75040f to ac2546a Compare July 13, 2025 17:14

NirLevy98 requested a review from derailed July 13, 2025 17:54

derailed requested changes Jul 14, 2025

NirLevy98 force-pushed the add-gpus-to-pod-panel branch 8 times, most recently from 9ca83df to 4d927c3 Compare July 15, 2025 10:02

NirLevy98 requested a review from derailed July 15, 2025 10:02

NirLevy98 force-pushed the add-gpus-to-pod-panel branch 2 times, most recently from c1e7b8d to c5c60f3 Compare July 15, 2025 10:53

feat: Add GPU usage to pod view

2a833e4

NirLevy98 force-pushed the add-gpus-to-pod-panel branch from c5c60f3 to 2a833e4 Compare July 15, 2025 10:58

derailed requested changes Jul 15, 2025

derailed mentioned this pull request Jul 15, 2025

Rel v0.50.8 #3457

Merged

derailed closed this Jul 16, 2025
