-
Notifications
You must be signed in to change notification settings - Fork 232
Tracy GPU profiling v2 #1889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracy GPU profiling v2 #1889
Conversation
ad43d2f to
96ed23a
Compare
96ed23a to
6bfc981
Compare
|
I added Then, in the last commit, I took a stab at making the thing optional. There are two ways I imagine this can go:
Option 1 sounds enticing, however there's one problem with it: I'm not sure it's possible to plug in tracy-client's So here I tried Option 2. It's kinda wordy with all the cfgs, but I tried to make it manageable. My goals were:
So, the profiler API is always present. If the new One new question: should the |
6bfc981 to
3f95c76
Compare
Drakulix
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like the api quite well. I am not sure however, if we want to force tracy_gpu_profiling to enable renderer_gl.
After all, once we gain a Vulkan renderer, that might be able to add profiling with tracy and a similar api?
Without gl_renderer the whole module shouldn't be compiled anyway, right?
| "GL_EXT_texture_format_BGRA8888", | ||
| "GL_EXT_unpack_subimage", | ||
| "GL_OES_EGL_sync", | ||
| "GL_EXT_disjoint_timer_query", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can stay. While generating code for unused extensions slightly increases the compile times for non-profile builds as well, I don't think it does much in the grand scheme of things.
3f95c76 to
01278f6
Compare
Maybe we can postpone this change until then? Right now the thing is very tied to GL and I'm not sure how exactly it will look with a second renderer backend. |
01278f6 to
f603c24
Compare
|
Alright, this should be done. Updated niri commit to the new Screenshot of the new GPU names commit on my laptop:
Please squash when merging if I don't. |
Remove profiler glFlush() and most calls outside GlesFrame Rename EnteredGpuTracepoint -> GpuSpan Expose profiling methods on GlesFrame Add GPU span to blit() It exports a sync point so there's a flush we can piggyback off. Gate GPU profiling behind new tracy_gpu_profiling feature flag Add profiling scope to QueryPool::collect() Implement deleting timestamp queries Bump tracy-client Handle missing timer query extension Document methods Add GlesFrame::with_gpu_span() instead of the error-prone manual API Make ScopedGpuSpan use a shared borrow making it possible to nest them The enter() in render texture actually had a mistake where early error returns were possible without exit(), which would cause an assertion failure. Co-authored-by: Christian Meissl <[email protected]>
Lets you differentiate between them easier in Tracy.
f603c24 to
ebadbd9
Compare
|
My previous squash moved the Co-authored-by line to the middle of the message 🤦 |
Drakulix
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @cmeissl Do you want to take a quick look at this as well?
Yes, will take a look tomorrow latest. |
If these checks aren't hit it's not the end of the world, so let's leave them debug-only.
Yeah, doing this were we flush makes sense. Though it would be nice if Tracy could help us see what time is spent on buffer imports. |
|
|
||
| let context = client | ||
| .new_gpu_context( | ||
| Some(gpu_name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cosmic-comp uses separate surface threads for each output being rendered, each with their own shared EGL context.
This seems to result in the contexts for all connectors of the same name being given the same name in tracy. (Why CPU activity is labeled with profiling::register_thread!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you suggest naming these instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although it looks like tracy does show the thread name as well when you hover over a GPU zone. So maybe that's okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, hovering over GPU zones highlights the corresponding CPU zone
ids1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks reasonable.
I need to get used to some of the relevant features in Tracy, but it seems to be working in cosmic-comp.
cmeissl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking this up!






Supersedes #1134, cc @cmeissl. Depends on nagisa/rust_tracy_client#153.
Rebased things and reimplemented the timestamp query tracking. It now more or less matches what I do in https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3391 and should be correct.
This is still WIP:
still tied to tracy-client; my plan is to try abstracting it away, so the downstream compositor can plug in Tracy on its own (the API will be Tracy shaped though)someone needs to verify that I rebased stuff correctly (that the GPU spans make sense)is there a better place to callcollect()? During my testing I already hit a case where I needed to addcollect()tocleanup_texture_cache()since otherwise no-damage frames build up timestamps that don't get collected. Can we avoid this situation somehow?probably need to callsync_gpu_time()in more places, for example things likeimport_shm_bufferare sometimes called on their own, and without a sync close by, their GPU timestamps driftneed to figure out how to delete the timestamp queriesI set the max queries to 1024 matching Mutter; we can bump this if needed (and replace inline array with Vec in that case I suppose)collect()(ideally before a batch of GPU work) and it needs to callsync_gpu_time()after a batch of GPU work. But ideally not in the middle of ongoing GPU work to avoid stalls. Currently I put these inrender()andblit()-- the two places that create an EGL fence and flush GL at the end. And all GPU profiling spans are therefore also restricted to withinrender()andblit(), otherwise withoutcollect()timestamps may build up and overflow, and withoutsync_gpu_time()they may drift. This misses some potentially interesting (?) places like buffer imports. Not sure what to do about this.