@YaLTeR
Contributor

@YaLTeR YaLTeR commented Dec 27, 2025

Supersedes #1134, cc @cmeissl. Depends on nagisa/rust_tracy_client#153.

Rebased things and reimplemented the timestamp query tracking. It now more or less matches what I do in https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3391 and should be correct.

This is still WIP:

  • still tied to tracy-client; my plan is to try abstracting it away, so the downstream compositor can plug in Tracy on its own (the API will be Tracy shaped though)
  • someone needs to verify that I rebased stuff correctly (that the GPU spans make sense)
  • is there a better place to call collect()? During my testing I already hit a case where I needed to add collect() to cleanup_texture_cache() since otherwise no-damage frames build up timestamps that don't get collected. Can we avoid this situation somehow?
  • probably need to call sync_gpu_time() in more places, for example things like import_shm_buffer are sometimes called on their own, and without a sync close by, their GPU timestamps drift
  • need to figure out how to delete the timestamp queries
  • I set the max queries to 1024 matching Mutter; we can bump this if needed (and replace inline array with Vec in that case I suppose)
  • The instrumentation needs to regularly call collect() (ideally before a batch of GPU work) and it needs to call sync_gpu_time() after a batch of GPU work. But ideally not in the middle of ongoing GPU work to avoid stalls. Currently I put these in render() and blit() -- the two places that create an EGL fence and flush GL at the end. And all GPU profiling spans are therefore also restricted to within render() and blit(), otherwise without collect() timestamps may build up and overflow, and without sync_gpu_time() they may drift. This misses some potentially interesting (?) places like buffer imports. Not sure what to do about this.
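To make the collect() constraint from the list above concrete, here is a minimal sketch of a fixed-capacity query pool. It makes no GL calls: `QueryPool`, `issue()`, `pending()` and `render_frame()` are hypothetical stand-ins, not Smithay's actual types. Only the 1024 limit and the "collect before a batch of GPU work" ordering come from the description above. A real collect() would read back only the results that are already available; this sketch assumes all of them are.

```rust
/// Maximum number of in-flight timestamp queries (1024 in this PR, matching Mutter).
pub const MAX_QUERIES: usize = 1024;

/// Hypothetical stand-in for a pool of GL timestamp queries.
/// It only counts queries; no GL objects are created.
pub struct QueryPool {
    /// Queries issued but not yet collected.
    pending: usize,
}

impl QueryPool {
    pub fn new() -> Self {
        Self { pending: 0 }
    }

    /// Record one timestamp query.
    /// Returns `false` when the pool is full and the timestamp must be dropped.
    pub fn issue(&mut self) -> bool {
        if self.pending == MAX_QUERIES {
            return false;
        }
        self.pending += 1;
        true
    }

    /// Read back all pending results and free their slots.
    /// Simplification: assumes every pending result is already available.
    pub fn collect(&mut self) -> usize {
        let n = self.pending;
        self.pending = 0;
        n
    }

    /// Number of queries currently waiting for collection.
    pub fn pending(&self) -> usize {
        self.pending
    }
}

/// One frame: collect before the GPU work, then issue a begin and an end
/// timestamp per span. Returns how many timestamps were dropped.
pub fn render_frame(pool: &mut QueryPool, spans: usize) -> usize {
    pool.collect();
    let mut dropped = 0;
    for _ in 0..spans * 2 {
        if !pool.issue() {
            dropped += 1;
        }
    }
    // A real renderer would flush here and then call sync_gpu_time().
    dropped
}
```

If spans are recorded on a path that never reaches collect() (the no-damage frame case mentioned above), `issue()` eventually starts returning `false`, which is the overflow being described.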

@YaLTeR
Contributor Author

YaLTeR commented Dec 27, 2025

Oh yeah, obligatory screenshots. Winit with some offscreen rendering, then the final render:

Screenshot from 2025-12-27 13-42-01

TTY:

image

@YaLTeR
Contributor Author

YaLTeR commented Dec 27, 2025

Hm, I notice that it likes to cause constant back and forth to some GL thread:

image

Possibly it's the glFlush() calls on exit(). Maybe need to bring it back from there to finish_internal().

It's a similar problem to sync_gpu_time() I guess. We need to call glFlush() when we know there won't be GPU work immediately thereafter, and we don't want to call it if we know that there will be GPU work immediately.

@YaLTeR YaLTeR force-pushed the tracy-gpu-profiling branch from ad43d2f to 96ed23a Compare December 27, 2025 15:33
@YaLTeR
Contributor Author

YaLTeR commented Dec 27, 2025

I pushed a commit where I removed all profiler calls outside render() to avoid the collect() and sync_gpu_time() problem, also removed the glFlush() in exit. I think the GPU spans make much more sense this way? Here's an example when screencasting output with OBS:

image

@YaLTeR YaLTeR force-pushed the tracy-gpu-profiling branch from 96ed23a to 6bfc981 Compare December 28, 2025 06:40
@YaLTeR
Contributor Author

YaLTeR commented Dec 28, 2025

I added with_profiled_context() and manual enter/exit span on GlesFrame to make it possible for compositors to profile their custom render elements.

Then, in the last commit, I took a stab at making the thing optional. There are two ways I imagine this can go:

  1. Have a GpuProfiler trait with methods for enter span, exit span, upload GPU timestamp. Receive an instance of this type when creating GlesRenderer.
  2. Depend on Tracy and have a bunch of cfgs to make it optional.

Option 1 sounds enticing, however there's one problem with it: I'm not sure it's possible to plug in tracy-client's &'static span locations this way. Or in general with Rust, at least until maybe we have const fn trait methods. So we'd have to use allocated spans which is a bit meh.
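For illustration, Option 1 might look roughly like the sketch below. The trait name comes from the list above, but every method signature, `RecordingProfiler` and `Renderer` are assumptions made up for this example, not an API that exists in Smithay or tracy-client. Note that the span name arrives as a runtime `&str`, so an implementation cannot tie it to a per-call-site `&'static` span location; that is the limitation described above.

```rust
/// Hypothetical trait for Option 1; names and signatures are illustrative only.
pub trait GpuProfiler {
    /// Begin a GPU span. `query_id` identifies the start timestamp query.
    /// The name is only known at runtime, so a Tracy-backed implementation
    /// would have to use an allocated span here.
    fn enter_span(&mut self, name: &str, query_id: u16);
    /// End the innermost span. `query_id` identifies the end timestamp query.
    fn exit_span(&mut self, query_id: u16);
    /// Report the GPU time read back for an earlier query.
    fn upload_gpu_timestamp(&mut self, query_id: u16, gpu_ns: i64);
}

/// Test implementation that records calls instead of talking to a profiler.
#[derive(Default)]
pub struct RecordingProfiler {
    pub events: Vec<String>,
}

impl GpuProfiler for RecordingProfiler {
    fn enter_span(&mut self, name: &str, query_id: u16) {
        self.events.push(format!("enter {name} #{query_id}"));
    }
    fn exit_span(&mut self, query_id: u16) {
        self.events.push(format!("exit #{query_id}"));
    }
    fn upload_gpu_timestamp(&mut self, query_id: u16, gpu_ns: i64) {
        self.events.push(format!("timestamp #{query_id} = {gpu_ns}"));
    }
}

/// Stand-in for a renderer that receives the profiler at construction time.
pub struct Renderer<P: GpuProfiler> {
    pub profiler: P,
    next_query: u16,
    pending: Vec<u16>,
}

impl<P: GpuProfiler> Renderer<P> {
    pub fn new(profiler: P) -> Self {
        Self { profiler, next_query: 0, pending: Vec::new() }
    }

    /// Allocate the next query id and remember it for later collection.
    fn next_id(&mut self) -> u16 {
        let id = self.next_query;
        self.next_query = self.next_query.wrapping_add(1);
        self.pending.push(id);
        id
    }

    /// Wrap the (omitted) GPU work in a span.
    pub fn render(&mut self) {
        let start = self.next_id();
        self.profiler.enter_span("render", start);
        // ... GPU work would be submitted here ...
        let end = self.next_id();
        self.profiler.exit_span(end);
    }

    /// Upload a timestamp for every pending query, using `read_timestamp`
    /// in place of the GL readback.
    pub fn collect(&mut self, read_timestamp: impl Fn(u16) -> i64) {
        for id in self.pending.drain(..) {
            self.profiler.upload_gpu_timestamp(id, read_timestamp(id));
        }
    }
}
```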

So here I tried Option 2. It's kinda wordy with all the cfgs, but I tried to make it manageable. My goals were:

  • Make tracy-client dep optional. While it has its own enable/disable mechanism, it still links Tracy even when disabled, so it should be optional in Smithay.
  • Even when the instrumentation is enabled, leave the actual tracy-client enablement flags to the compositor (since it may want to have it on-demand or something else).
  • Keep the cfgs inside Smithay's profiler impl. It's just nice when you don't need to spam cfg in your compositor for profiling.

So, the profiler API is always present. If the new tracy_gpu_profiling feature is on, it calls into tracy-client and does the whole OpenGL timestamp querying. If the feature is off, it compiles into mostly empty stubs. The compositor can write GPU profiling instrumentation, then enable the features to make it active. Example here: YaLTeR/niri@745f4a9
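The "API always present, stubs when the feature is off" arrangement can be sketched as follows. Only the feature name `tracy_gpu_profiling` comes from this PR; `Profiler`, its methods, `ENABLED` and `draw_custom_element()` are hypothetical, and the enabled variant just counts spans where the real code would issue timestamp queries and call into tracy-client.

```rust
#[cfg(feature = "tracy_gpu_profiling")]
mod imp {
    /// Active variant. In the real code this is where GL timestamp queries
    /// and tracy-client calls would go; this sketch only counts spans.
    #[derive(Default)]
    pub struct Profiler {
        spans_entered: u64,
    }

    impl Profiler {
        pub fn enter(&mut self, _name: &'static str) {
            self.spans_entered += 1;
        }
        pub fn exit(&mut self) {}
        pub fn spans_entered(&self) -> u64 {
            self.spans_entered
        }
    }
}

#[cfg(not(feature = "tracy_gpu_profiling"))]
mod imp {
    /// Stub variant: same methods, empty bodies, no fields.
    #[derive(Default)]
    pub struct Profiler;

    impl Profiler {
        #[inline(always)]
        pub fn enter(&mut self, _name: &'static str) {}
        #[inline(always)]
        pub fn exit(&mut self) {}
        pub fn spans_entered(&self) -> u64 {
            0
        }
    }
}

pub use imp::Profiler;

/// Whether this build has the instrumentation compiled in.
pub const ENABLED: bool = cfg!(feature = "tracy_gpu_profiling");

/// Compositor-side code: written once, with no cfg attributes of its own.
pub fn draw_custom_element(profiler: &mut Profiler) -> u64 {
    profiler.enter("custom element");
    // ... draw calls would go here ...
    profiler.exit();
    profiler.spans_entered()
}
```

The caller compiles unchanged in both configurations, which is the "no cfg spam in the compositor" goal listed above.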

One new question: should the GL_EXT_disjoint_timer_query generation in build.rs also be disabled when the feature is disabled, or is it fine to leave that in?

@YaLTeR
Contributor Author

YaLTeR commented Dec 28, 2025

And here's a screenshot showcasing multi-GPU rendering (niri always renders on the main GPU). Hopefully it looks right.

image

@YaLTeR YaLTeR force-pushed the tracy-gpu-profiling branch from 6bfc981 to 3f95c76 Compare January 5, 2026 12:04
Member

@Drakulix Drakulix left a comment

Overall I like the api quite well. I am not sure however, if we want to force tracy_gpu_profiling to enable renderer_gl.

After all, once we gain a Vulkan renderer, that might be able to add profiling with tracy and a similar api?

Without renderer_gl the whole module shouldn't be compiled anyway, right?

"GL_EXT_texture_format_BGRA8888",
"GL_EXT_unpack_subimage",
"GL_OES_EGL_sync",
"GL_EXT_disjoint_timer_query",
Member

This can stay. While generating code for unused extensions slightly increases the compile times for non-profile builds as well, I don't think it does much in the grand scheme of things.

@YaLTeR YaLTeR force-pushed the tracy-gpu-profiling branch from 3f95c76 to 01278f6 Compare January 5, 2026 13:19
@YaLTeR
Contributor Author

YaLTeR commented Jan 5, 2026

> Overall I like the api quite well. I am not sure however, if we want to force tracy_gpu_profiling to enable renderer_gl.
>
> After all, once we gain a Vulkan renderer, that might be able to add profiling with tracy and a similar api?
>
> Without renderer_gl the whole module shouldn't be compiled anyway, right?

Maybe we can postpone this change until then? Right now the thing is very tied to GL and I'm not sure how exactly it will look with a second renderer backend.

@YaLTeR YaLTeR force-pushed the tracy-gpu-profiling branch from 01278f6 to f603c24 Compare January 5, 2026 13:47
@YaLTeR
Contributor Author

YaLTeR commented Jan 5, 2026

Alright, this should be done. Updated niri commit to the new GlesFrame::with_gpu_span() API: YaLTeR/niri@e558074

Screenshot of the new GPU names commit on my laptop:

image

Please squash when merging if I don't.

@YaLTeR YaLTeR marked this pull request as ready for review January 5, 2026 13:57
YaLTeR and others added 3 commits January 5, 2026 16:58
Remove profiler glFlush() and most calls outside GlesFrame

Rename EnteredGpuTracepoint -> GpuSpan

Expose profiling methods on GlesFrame

Add GPU span to blit()

It exports a sync point so there's a flush we can piggyback off.

Gate GPU profiling behind new tracy_gpu_profiling feature flag

Add profiling scope to QueryPool::collect()

Implement deleting timestamp queries

Bump tracy-client

Handle missing timer query extension

Document methods

Add GlesFrame::with_gpu_span() instead of the error-prone manual API

Make ScopedGpuSpan use a shared borrow making it possible to nest them

The enter() in render texture actually had a mistake where early error
returns were possible without exit(), which would cause an assertion
failure.

Co-authored-by: Christian Meissl <[email protected]>
Lets you differentiate between them easier in Tracy.
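The commit messages above give the reason for the closure-based GlesFrame::with_gpu_span(): with manual enter()/exit(), an early error return can skip exit(). A minimal sketch of that idea follows. `Frame` and `ScopedSpan` are hypothetical stand-ins for GlesFrame and ScopedGpuSpan, and the signature shown is an assumption, not the real Smithay API. The guard holds a shared borrow, so spans can nest.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for GlesFrame; it only tracks span bookkeeping.
pub struct Frame {
    open: Cell<u32>,
    closed: Cell<u32>,
}

/// Hypothetical stand-in for ScopedGpuSpan: closes its span when dropped.
/// It holds a shared borrow of the frame, so nested spans are possible.
struct ScopedSpan<'a> {
    frame: &'a Frame,
}

impl<'a> ScopedSpan<'a> {
    fn enter(frame: &'a Frame) -> Self {
        frame.open.set(frame.open.get() + 1);
        Self { frame }
    }
}

impl Drop for ScopedSpan<'_> {
    fn drop(&mut self) {
        let open = self.frame.open.get();
        // Mirrors the assertion failure mentioned in the commit message.
        assert!(open > 0, "exit() without matching enter()");
        self.frame.open.set(open - 1);
        self.frame.closed.set(self.frame.closed.get() + 1);
    }
}

impl Frame {
    pub fn new() -> Self {
        Self { open: Cell::new(0), closed: Cell::new(0) }
    }

    /// Run `f` inside a GPU span. The span ends when `f` returns,
    /// whatever value it returns.
    pub fn with_gpu_span<T>(&self, _name: &'static str, f: impl FnOnce(&Self) -> T) -> T {
        let _span = ScopedSpan::enter(self);
        f(self)
    }

    /// Spans entered but not yet exited.
    pub fn open_spans(&self) -> u32 {
        self.open.get()
    }

    /// Spans that have been exited.
    pub fn closed_spans(&self) -> u32 {
        self.closed.get()
    }
}

/// An early `return Err(..)` leaves only the closure, so the span still ends.
pub fn render_texture(frame: &Frame, valid: bool) -> Result<(), &'static str> {
    frame.with_gpu_span("render_texture", |frame| {
        if !valid {
            return Err("invalid texture");
        }
        // Nested span, possible because only a shared borrow is needed.
        frame.with_gpu_span("draw", |_| ());
        Ok(())
    })
}
```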
@YaLTeR YaLTeR force-pushed the tracy-gpu-profiling branch from f603c24 to ebadbd9 Compare January 5, 2026 13:58
@YaLTeR
Contributor Author

YaLTeR commented Jan 5, 2026

My previous squash moved the Co-authored-by line to the middle of the message 🤦

Member

@Drakulix Drakulix left a comment

LGTM. @cmeissl Do you want to take a quick look at this as well?

@cmeissl
Collaborator

cmeissl commented Jan 5, 2026

> LGTM. @cmeissl Do you want to take a quick look at this as well?

Yes, will take a look tomorrow at the latest.

If these checks aren't hit it's not the end of the world, so let's leave
them debug-only.
@ids1024
Member

ids1024 commented Jan 7, 2026

> The instrumentation needs to regularly call collect() (ideally before a batch of GPU work) and it needs to call sync_gpu_time() after a batch of GPU work. But ideally not in the middle of ongoing GPU work to avoid stalls. Currently I put these in render() and blit() -- the two places that create an EGL fence and flush GL at the end. And all GPU profiling spans are therefore also restricted to within render() and blit(), otherwise without collect() timestamps may build up and overflow, and without sync_gpu_time() they may drift. This misses some potentially interesting (?) places like buffer imports. Not sure what to do about this.

Yeah, doing this where we flush makes sense. Though it would be nice if Tracy could help us see what time is spent on buffer imports.


let context = client
    .new_gpu_context(
        Some(gpu_name),
Member

cosmic-comp uses separate surface threads for each output being rendered, each with their own shared EGL context.

This seems to result in the contexts for all connectors of the same name being given the same name in Tracy. (Whereas CPU activity is labeled with profiling::register_thread!.)

Contributor Author

How would you suggest naming these instead?

Member

Although it looks like Tracy does show the thread name as well when you hover over a GPU zone. So maybe that's okay.

Contributor Author

Yeah, hovering over GPU zones highlights the corresponding CPU zone

Member

@ids1024 ids1024 left a comment

This all looks reasonable.

I need to get used to some of the relevant features in Tracy, but it seems to be working in cosmic-comp.

Collaborator

@cmeissl cmeissl left a comment

Thanks for picking this up!

@Drakulix Drakulix merged commit df89244 into Smithay:master Jan 9, 2026
13 checks passed