feat: Add GPU usage to pod view #3437
Conversation
Force-pushed from 816c10b to 12797d1
@derailed
@NirLevy98 Nice! Figured that would be coming next. Thank you for this update!
internal/render/pod.go
Outdated
@@ -67,6 +68,7 @@ var defaultPodHeader = model1.Header{
model1.HeaderColumn{Name: "%CPU/L", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},
model1.HeaderColumn{Name: "%MEM/R", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},
model1.HeaderColumn{Name: "%MEM/L", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},
model1.HeaderColumn{Name: "GPUS", Attrs: model1.Attrs{Align: tview.AlignRight}},
I think it might be good to keep consistency with CPU, i.e. 'GPU/R', 'GPU/L' plus '%GPU/R' and '%GPU/L' as separate columns, so one can sort or see where they are at.
We should volunteer gpu reporting in the container view as well.
What do you think?
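The split the review asks for could look like the sketch below. It uses simplified stand-in types for k9s' `model1.HeaderColumn`/`model1.Attrs` and a placeholder alignment constant, since the real internals aren't importable here; the column names mirror the existing CPU/MEM request/limit pattern.

```go
package main

import "fmt"

// Minimal stand-ins for k9s' model1.HeaderColumn/Attrs, for illustration only.
type Attrs struct {
	Align int
	MX    bool
}

type HeaderColumn struct {
	Name  string
	Attrs Attrs
}

const alignRight = 2 // placeholder for tview.AlignRight

// gpuColumns sketches the suggested split: request/limit columns plus
// percentage columns, mirroring the existing CPU/MEM layout.
func gpuColumns() []HeaderColumn {
	return []HeaderColumn{
		{Name: "GPU/R", Attrs: Attrs{Align: alignRight, MX: true}},
		{Name: "GPU/L", Attrs: Attrs{Align: alignRight, MX: true}},
		{Name: "%GPU/R", Attrs: Attrs{Align: alignRight, MX: true}},
		{Name: "%GPU/L", Attrs: Attrs{Align: alignRight, MX: true}},
	}
}

func main() {
	for _, c := range gpuColumns() {
		fmt.Println(c.Name)
	}
}
```

The `MX: true` flag matches the metrics columns above it, so the columns only render when metrics are available.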
@derailed Done! I’ve updated it to match the CPU/MEM pattern with the GPU, GPU/RL, %GPU/R, and %GPU/L columns.
As for the container view, I think it’s nice to have, but not a must.
Anyway, I’d like to merge this feature if that’s okay, and start using it :)
Force-pushed from 3839765 to 820e6d1
@NirLevy98 Thank you for the updates. I have actually started working on this...
I think we should update container view as well so we remain consistent across all resource usage.
internal/render/pod.go
Outdated
@@ -191,6 +199,10 @@ func (p *Pod) defaultRow(pwm *PodWithMetrics, row *model1.Row) error {
client.ToPercentageStr(c.cpu, r.lcpu),
client.ToPercentageStr(c.mem, r.mem),
client.ToPercentageStr(c.mem, r.lmem),
strconv.FormatInt(g.current, 10),
I think you can use toMi here.
Didn't work for me
Force-pushed from f75040f to ac2546a
@derailed WDYT? :)
@NirLevy98 Thank you for the update. Though I do appreciate your enthusiasm to get this out, I hope you can appreciate that I end up having to support all code that makes it thru review. It's close but needs some TLC.
internal/render/pod.go
Outdated
cc := make([]v1.Container, 0, len(spec.InitContainers)+len(spec.Containers))
cc = append(cc, filterSidecarCO(spec.InitContainers)...)
cc = append(cc, spec.Containers...)

// Get CPU and Memory requests/limits
rcpu, rmem := cosRequests(cc)
Given the above, this should return rgpu, i.e. no need for cosGPU below since we can collect all requests/limits in one swoop for cpu, gpu and mem.
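The one-swoop collection could be sketched as below. This is a simplified illustration, not the k9s implementation: `ResourceList` and `Container` are local stand-ins for the `v1` types (real code would sum `resource.Quantity` values with `Add`), and `knownGPUVendors` is a hypothetical stand-in for `config.KnownGPUVendors`.

```go
package main

import "fmt"

// Simplified stand-in for v1.ResourceList (real code uses resource.Quantity).
type ResourceList map[string]int64

// Simplified stand-in for v1.Container.
type Container struct {
	Name     string
	Requests ResourceList
}

// Hypothetical vendor list mirroring config.KnownGPUVendors.
var knownGPUVendors = []string{"nvidia.com/gpu", "amd.com/gpu"}

// cosRequests sums cpu, mem and gpu requests across containers in one pass,
// so no separate cosGPU walk over the containers is needed.
func cosRequests(cc []Container) (cpu, mem, gpu int64) {
	for _, c := range cc {
		if c.Requests == nil {
			continue
		}
		cpu += c.Requests["cpu"]
		mem += c.Requests["memory"]
		for _, vendor := range knownGPUVendors {
			gpu += c.Requests[vendor] // missing keys read as zero
		}
	}
	return
}

func main() {
	cc := []Container{
		{Name: "app", Requests: ResourceList{"cpu": 500, "memory": 256, "nvidia.com/gpu": 2}},
		{Name: "sidecar", Requests: ResourceList{"cpu": 100, "memory": 64}},
	}
	cpu, mem, gpu := cosRequests(cc)
	fmt.Println(cpu, mem, gpu) // 600 320 2
}
```

A matching cosLimits would walk `c.Resources.Limits` the same way, keeping request and limit collection symmetric.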
Done
internal/render/pod.go
Outdated
// Check requests
if container.Resources.Requests != nil {
	for _, gpuResource := range config.KnownGPUVendors {
This function will go away, but we can DRY this up into a helper that we can reuse when collecting requests/limits.
I removed this function. I’m not sure we need to move it now. Can you take a look at the new code and let me know what you think?
internal/render/pod_test.go
Outdated
@@ -661,6 +664,166 @@ func TestCheckPhase(t *testing.T) {
	}
}

func TestPodGPUCalculation(t *testing.T) {
This should all fold under gatherCoMX
Done
Force-pushed from 9ca83df to 4d927c3
@derailed
Force-pushed from c1e7b8d to c5c60f3
Force-pushed from c5c60f3 to 2a833e4
@NirLevy98 Thank you for these updates!
Definitely better, but still not quite what I'd envisioned:
- Containers need to expose gpu so users can track which container is pulling resources
- GPU should be treated as just another resource
I need to push a new release and am running out of time on these reviews. I'll have some of this implemented in the next drop and we can iterate from there.
Thank you for your support and guidance on seeing this thru.
@@ -67,6 +68,7 @@ var defaultPodHeader = model1.Header{
model1.HeaderColumn{Name: "%CPU/L", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},
model1.HeaderColumn{Name: "%MEM/R", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},
model1.HeaderColumn{Name: "%MEM/L", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},
model1.HeaderColumn{Name: "GPU", Attrs: model1.Attrs{Align: tview.AlignRight, MX: true}},
We should expose request/limit for gpu resources.
	mem.Add(*requests.Memory())
}

for _, gpuResource := range config.KnownGPUVendors {
Again this should be a helper
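The helper being asked for could be factored out as below. This is a sketch with a simplified `ResourceList` stand-in (real code would look up `resource.Quantity` values keyed by `v1.ResourceName` and accumulate with `Add`), and `knownGPUVendors` is a hypothetical mirror of `config.KnownGPUVendors`; the same helper would then serve both the requests and limits passes.

```go
package main

import "fmt"

// Simplified stand-in for v1.ResourceList, for illustration only.
type ResourceList map[string]int64

// Hypothetical vendor list mirroring config.KnownGPUVendors.
var knownGPUVendors = []string{"nvidia.com/gpu", "amd.com/gpu"}

// gpuResource sums GPU entries across known vendor resource names: the
// reusable helper the review asks for, callable on requests and on limits.
func gpuResource(rl ResourceList) int64 {
	var total int64
	for _, vendor := range knownGPUVendors {
		total += rl[vendor] // missing keys read as zero
	}
	return total
}

func main() {
	requests := ResourceList{"cpu": 500, "nvidia.com/gpu": 2}
	limits := ResourceList{"nvidia.com/gpu": 2, "amd.com/gpu": 1}
	fmt.Println(gpuResource(requests), gpuResource(limits)) // 2 3
}
```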
	return
}

func cosLimits(cc []v1.Container) (cpuQ, memQ resource.Quantity) {
func cosRequests(cc []v1.Container) (cpuQ, memQ resource.Quantity, gpu int64) {
Why is gpu different? It should be a Quantity.
Closing as of v0.50.8
Description
Adds a GPU column to the pods view that displays the total number of GPUs requested by each pod.
Motivation
When managing GPU workloads in Kubernetes clusters, it's useful to quickly see which pods are consuming GPU resources directly in the k9s interface without having to describe each pod individually.