Tracy GPU profiling v2 #1889

YaLTeR · 2025-12-27T12:09:17Z

Supersedes #1134, cc @cmeissl. Depends on nagisa/rust_tracy_client#153.

Rebased things and reimplemented the timestamp query tracking. It now more or less matches what I do in https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3391 and should be correct.

This is still WIP:

- ~~still tied to tracy-client; my plan is to try abstracting it away, so the downstream compositor can plug in Tracy on its own (the API will be Tracy shaped though)~~
- ~~someone needs to verify that I rebased stuff correctly (that the GPU spans make sense)~~
- is there a better place to call collect()? During my testing I already hit a case where I needed to add collect() to cleanup_texture_cache() since otherwise no-damage frames build up timestamps that don't get collected. Can we avoid this situation somehow?
- ~~probably need to call sync_gpu_time() in more places, for example things like import_shm_buffer are sometimes called on their own, and without a sync close by, their GPU timestamps drift~~
- ~~need to figure out how to delete the timestamp queries~~
- ~~I set the max queries to 1024 matching Mutter; we can bump this if needed (and replace inline array with Vec in that case I suppose)~~
- The instrumentation needs to regularly call collect() (ideally before a batch of GPU work) and it needs to call sync_gpu_time() after a batch of GPU work. But ideally not in the middle of ongoing GPU work to avoid stalls. Currently I put these in render() and blit() -- the two places that create an EGL fence and flush GL at the end. And all GPU profiling spans are therefore also restricted to within render() and blit(), otherwise without collect() timestamps may build up and overflow, and without sync_gpu_time() they may drift. This misses some potentially interesting (?) places like buffer imports. Not sure what to do about this.
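
To illustrate why collect() has to run regularly, here is a minimal sketch of a fixed-capacity timestamp query pool. It is not the code from this PR: `QueryPool` here is a simplified stand-in, `Query`, `record()` and `simulate_gpu_finish()` are hypothetical names, and the GPU is simulated with a plain field instead of real GL timer queries.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a GL timestamp query object.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Query {
    id: u32,
    /// `Some` once the (simulated) GPU has written the timestamp.
    result_ns: Option<u64>,
}

/// Simplified fixed-capacity pool of in-flight timestamp queries.
struct QueryPool {
    capacity: usize,
    in_flight: VecDeque<Query>,
    next_id: u32,
}

impl QueryPool {
    fn new(capacity: usize) -> Self {
        Self { capacity, in_flight: VecDeque::new(), next_id: 0 }
    }

    /// Records a new timestamp query. Returns `None` when the pool is full,
    /// which is what happens if collect() is not called often enough.
    fn record(&mut self) -> Option<u32> {
        if self.in_flight.len() == self.capacity {
            return None;
        }
        let id = self.next_id;
        self.next_id = self.next_id.wrapping_add(1);
        self.in_flight.push_back(Query { id, result_ns: None });
        Some(id)
    }

    /// Test helper: pretend the GPU finished every outstanding query.
    fn simulate_gpu_finish(&mut self, now_ns: u64) {
        for q in self.in_flight.iter_mut() {
            if q.result_ns.is_none() {
                q.result_ns = Some(now_ns);
            }
        }
    }

    /// Drains finished queries from the front, in submission order.
    /// Stops at the first unfinished query, so it never waits on the GPU.
    fn collect(&mut self) -> Vec<(u32, u64)> {
        let mut done = Vec::new();
        while let Some(&Query { id, result_ns: Some(ns) }) = self.in_flight.front() {
            done.push((id, ns));
            self.in_flight.pop_front();
        }
        done
    }
}
```

With a capacity of 2, a third `record()` fails until the simulated GPU finishes and `collect()` frees the slots, which mirrors the "timestamps may build up and overflow" concern above.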

YaLTeR · 2025-12-27T12:14:18Z

Oh yeah, obligatory screenshots. Winit with some offscreen rendering, then the final render:

TTY:

YaLTeR · 2025-12-27T12:52:38Z

Hm, I notice that it likes to cause constant back and forth to some GL thread:

Possibly it's the glFlush() calls on exit(). Maybe need to bring it back from there to finish_internal().

It's a similar problem to sync_gpu_time() I guess. We need to call glFlush() when we know there won't be GPU work immediately thereafter, and we don't want to call it if we know that there will be GPU work immediately.

YaLTeR · 2025-12-27T15:43:15Z

I pushed a commit where I removed all profiler calls outside render() to avoid the collect() and sync_gpu_time() problem, also removed the glFlush() in exit. I think the GPU spans make much more sense this way? Here's an example when screencasting output with OBS:

YaLTeR · 2025-12-28T06:54:38Z

I added with_profiled_context() and manual enter/exit span on GlesFrame to make it possible for compositors to profile their custom render elements.

Then, in the last commit, I took a stab at making the thing optional. There are two ways I imagine this can go:

1. Have a GpuProfiler trait with methods for enter span, exit span, upload GPU timestamp. Receive an instance of this type when creating GlesRenderer.
2. Depend on Tracy and have a bunch of cfgs to make it optional.

Option 1 sounds enticing, however there's one problem with it: I'm not sure it's possible to plug in tracy-client's &'static span locations this way. Or in general with Rust, at least until maybe we have const fn trait methods. So we'd have to use allocated spans which is a bit meh.
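
For reference, a rough sketch of what the Option 1 trait could look like. Only the trait name and the three operations come from the description above; the method signatures, `RecordingProfiler` and `render()` are assumptions made up for illustration. Note how the span name is passed as a plain `&str` and copied on every enter, which is the "allocated spans" cost mentioned above.

```rust
/// Hypothetical shape of the Option 1 trait (signatures are assumptions).
pub trait GpuProfiler {
    fn enter_span(&mut self, name: &str);
    fn exit_span(&mut self);
    fn upload_gpu_timestamp(&mut self, query_id: u16, gpu_time_ns: i64);
}

/// Toy implementation that records calls instead of talking to a profiler.
#[derive(Default)]
pub struct RecordingProfiler {
    pub open: Vec<String>,
    pub closed: Vec<String>,
    pub timestamps: Vec<(u16, i64)>,
}

impl GpuProfiler for RecordingProfiler {
    fn enter_span(&mut self, name: &str) {
        // Allocates per span, since no &'static location is available here.
        self.open.push(name.to_owned());
    }

    fn exit_span(&mut self) {
        if let Some(name) = self.open.pop() {
            self.closed.push(name);
        }
    }

    fn upload_gpu_timestamp(&mut self, query_id: u16, gpu_time_ns: i64) {
        self.timestamps.push((query_id, gpu_time_ns));
    }
}

/// Stand-in for renderer code that is generic over the profiler.
pub fn render<P: GpuProfiler>(profiler: &mut P) {
    profiler.enter_span("render");
    profiler.enter_span("draw elements");
    profiler.exit_span();
    profiler.exit_span();
    profiler.upload_gpu_timestamp(0, 1_000);
}
```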

So here I tried Option 2. It's kinda wordy with all the cfgs, but I tried to make it manageable. My goals were:

- Make tracy-client dep optional. While it has its own enable/disable mechanism, it still links Tracy even when disabled, so it should be optional in Smithay.
- Even when the instrumentation is enabled, leave the actual tracy-client enablement flags to the compositor (since it may want to have it on-demand or something else).
- Keep the cfgs inside Smithay's profiler impl. It's just nice when you don't need to spam cfg in your compositor for profiling.

So, the profiler API is always present. If the new tracy_gpu_profiling feature is on, it calls into tracy-client and does the whole OpenGL timestamp querying. If the feature is off, it compiles into mostly empty stubs. The compositor can write GPU profiling instrumentation, then enable the features to make it active. Example here: YaLTeR/niri@745f4a9
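
The "API always present, stubs when the feature is off" pattern can be sketched roughly like this. The feature name `tracy_gpu_profiling` is the one from this PR; the `GpuProfiler` type, its methods, `ENABLED` and `render_frame()` are hypothetical and much simpler than the real implementation.

```rust
// Real implementation, compiled only when the feature is enabled.
#[cfg(feature = "tracy_gpu_profiling")]
mod imp {
    pub struct GpuProfiler {
        depth: u32,
    }

    impl GpuProfiler {
        pub const ENABLED: bool = true;

        pub fn new() -> Self {
            Self { depth: 0 }
        }

        pub fn enter(&mut self, _name: &'static str) {
            // The real code would record a GPU timestamp query here.
            self.depth += 1;
        }

        pub fn exit(&mut self) {
            self.depth -= 1;
        }
    }
}

// Stub implementation with the same API, compiled when the feature is off.
#[cfg(not(feature = "tracy_gpu_profiling"))]
mod imp {
    pub struct GpuProfiler;

    impl GpuProfiler {
        pub const ENABLED: bool = false;

        pub fn new() -> Self {
            Self
        }

        #[inline(always)]
        pub fn enter(&mut self, _name: &'static str) {}

        #[inline(always)]
        pub fn exit(&mut self) {}
    }
}

pub use imp::GpuProfiler;

/// Caller code needs no cfg at all: it compiles against either variant.
pub fn render_frame(profiler: &mut GpuProfiler) -> u32 {
    profiler.enter("render");
    let elements_drawn = 3; // placeholder for actual rendering work
    profiler.exit();
    elements_drawn
}
```

The point of the design is the last function: the compositor writes its instrumentation once, and only the feature flag decides whether it does anything.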

One new question: should the GL_EXT_disjoint_timer_query generation in build.rs also be disabled when the feature is disabled, or is it fine to leave that in?

YaLTeR · 2025-12-28T07:07:44Z

And here's a screenshot showcasing multi-GPU rendering (niri always renders on the main GPU). Hopefully it looks right.

Drakulix

Overall I like the api quite well. I am not sure however, if we want to force tracy_gpu_profiling to enable renderer_gl.

After all, once we gain a Vulkan renderer, that might be able to add profiling with tracy and a similar api?

Without gl_renderer the whole module shouldn't be compiled anyway, right?

src/backend/renderer/gles/profiler.rs

Drakulix · 2026-01-05T12:28:06Z

build.rs

                "GL_EXT_texture_format_BGRA8888",
                "GL_EXT_unpack_subimage",
                "GL_OES_EGL_sync",
+                "GL_EXT_disjoint_timer_query",


This can stay. While generating code for unused extensions slightly increases the compile times for non-profile builds as well, I don't think it does much in the grand scheme of things.

YaLTeR · 2026-01-05T13:24:56Z

> Overall I like the api quite well. I am not sure however, if we want to force tracy_gpu_profiling to enable renderer_gl.
>
> After all, once we gain a Vulkan renderer, that might be able to add profiling with tracy and a similar api?
>
> Without gl_renderer the whole module shouldn't be compiled anyway, right?

Maybe we can postpone this change until then? Right now the thing is very tied to GL and I'm not sure how exactly it will look with a second renderer backend.

YaLTeR · 2026-01-05T13:57:46Z

Alright, this should be done. Updated niri commit to the new GlesFrame::with_gpu_span() API: YaLTeR/niri@e558074

Screenshot of the new GPU names commit on my laptop:

Please squash when merging if I don't.

Remove profiler glFlush() and most calls outside GlesFrame
Rename EnteredGpuTracepoint -> GpuSpan
Expose profiling methods on GlesFrame
Add GPU span to blit()
It exports a sync point so there's a flush we can piggyback off.
Gate GPU profiling behind new tracy_gpu_profiling feature flag
Add profiling scope to QueryPool::collect()
Implement deleting timestamp queries
Bump tracy-client
Handle missing timer query extension
Document methods
Add GlesFrame::with_gpu_span() instead of the error-prone manual API
Make ScopedGpuSpan use a shared borrow making it possible to nest them
The enter() in render texture actually had a mistake where early error returns were possible without exit(), which would cause an assertion failure.

Co-authored-by: Christian Meissl <[email protected]>
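
To show why a closure-based API avoids the missing-exit() mistake, here is a minimal sketch. It borrows the name `with_gpu_span()` from the commit message, but `Profiler`, its event log and this `render_texture()` are hypothetical stand-ins, not the GlesFrame code.

```rust
use std::cell::RefCell;

/// Hypothetical recorder standing in for the real GPU profiler.
#[derive(Default)]
struct Profiler {
    events: RefCell<Vec<String>>,
}

impl Profiler {
    fn enter(&self, name: &'static str) {
        self.events.borrow_mut().push(format!("enter {name}"));
    }

    fn exit(&self, name: &'static str) {
        self.events.borrow_mut().push(format!("exit {name}"));
    }

    /// Runs `f` inside a span. An early `return` inside the closure only
    /// leaves the closure, so exit() still runs afterwards.
    fn with_gpu_span<T>(&self, name: &'static str, f: impl FnOnce() -> T) -> T {
        self.enter(name);
        let result = f();
        self.exit(name);
        result
    }
}

/// Stand-in for a render function with an early error return.
fn render_texture(profiler: &Profiler, valid: bool) -> Result<u32, &'static str> {
    profiler.with_gpu_span("render_texture", || {
        if !valid {
            return Err("invalid texture");
        }
        Ok(1)
    })
}
```

With manual enter()/exit() calls, the `return Err(...)` line would skip exit() unless every error path remembered to call it.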

Lets you differentiate between them easier in Tracy.

YaLTeR · 2026-01-05T13:59:19Z

My previous squash moved the Co-authored-by line to the middle of the message 🤦

Drakulix

LGTM. @cmeissl Do you want to take a quick look at this as well?

cmeissl · 2026-01-05T17:26:47Z

> LGTM. @cmeissl Do you want to take a quick look at this as well?

Yes, will take a look tomorrow at the latest.

If these checks aren't hit it's not the end of the world, so let's leave them debug-only.

ids1024 · 2026-01-07T19:47:12Z

> The instrumentation needs to regularly call collect() (ideally before a batch of GPU work) and it needs to call sync_gpu_time() after a batch of GPU work. But ideally not in the middle of ongoing GPU work to avoid stalls. Currently I put these in render() and blit() -- the two places that create an EGL fence and flush GL at the end. And all GPU profiling spans are therefore also restricted to within render() and blit(), otherwise without collect() timestamps may build up and overflow, and without sync_gpu_time() they may drift. This misses some potentially interesting (?) places like buffer imports. Not sure what to do about this.

Yeah, doing this where we flush makes sense. Though it would be nice if Tracy could help us see what time is spent on buffer imports.

ids1024 · 2026-01-07T20:18:38Z

src/backend/renderer/gles/profiler.rs

+
+            let context = client
+                .new_gpu_context(
+                    Some(gpu_name),


cosmic-comp uses separate surface threads for each output being rendered, each with their own shared EGL context.

This seems to result in the contexts for all connectors of the same name being given the same name in Tracy. (While CPU activity is labeled with profiling::register_thread!).

How would you suggest naming these instead?

Although it looks like tracy does show the thread name as well when you hover over a GPU zone. So maybe that's okay.

Yeah, hovering over GPU zones highlights the corresponding CPU zone

ids1024

This all looks reasonable.

I need to get used to some of the relevant features in Tracy, but it seems to be working in cosmic-comp.

cmeissl

Thanks for picking this up!

YaLTeR force-pushed the tracy-gpu-profiling branch from ad43d2f to 96ed23a Compare December 27, 2025 15:33

YaLTeR force-pushed the tracy-gpu-profiling branch from 96ed23a to 6bfc981 Compare December 28, 2025 06:40

YaLTeR force-pushed the tracy-gpu-profiling branch from 6bfc981 to 3f95c76 Compare January 5, 2026 12:04

Drakulix reviewed Jan 5, 2026

YaLTeR force-pushed the tracy-gpu-profiling branch from 3f95c76 to 01278f6 Compare January 5, 2026 13:19

YaLTeR force-pushed the tracy-gpu-profiling branch from 01278f6 to f603c24 Compare January 5, 2026 13:47

YaLTeR marked this pull request as ready for review January 5, 2026 13:57

YaLTeR and others added 3 commits January 5, 2026 16:58

Warn if timer query ext is not supported

eba5fdc

Use actual GPU name as GPU context name

ebadbd9

Lets you differentiate between them easier in Tracy.

YaLTeR force-pushed the tracy-gpu-profiling branch from f603c24 to ebadbd9 Compare January 5, 2026 13:58

Drakulix approved these changes Jan 5, 2026

Change drop asserts to debug_assert!()

0924f6e

If these checks aren't hit it's not the end of the world, so let's leave them debug-only.
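
As an illustration of a nestable scoped span with a debug-only drop check, here is a minimal sketch. The name `ScopedGpuSpan` appears in this PR, but this `Profiler`, its depth counters and the exact check are assumptions and do not reflect the real implementation.

```rust
use std::cell::Cell;

/// Hypothetical profiler state; Cell allows mutation through a shared borrow.
#[derive(Default)]
struct Profiler {
    depth: Cell<u32>,
    max_depth: Cell<u32>,
}

/// Guard holding only a shared borrow, so several can be alive at once.
struct ScopedGpuSpan<'a> {
    profiler: &'a Profiler,
    depth_at_enter: u32,
}

impl Profiler {
    fn span(&self) -> ScopedGpuSpan<'_> {
        let depth = self.depth.get() + 1;
        self.depth.set(depth);
        self.max_depth.set(self.max_depth.get().max(depth));
        ScopedGpuSpan { profiler: self, depth_at_enter: depth }
    }
}

impl Drop for ScopedGpuSpan<'_> {
    fn drop(&mut self) {
        // Checked in debug builds only; release builds skip the comparison.
        debug_assert_eq!(
            self.profiler.depth.get(),
            self.depth_at_enter,
            "GPU spans dropped out of order"
        );
        self.profiler.depth.set(self.depth_at_enter - 1);
    }
}
```

Because the guard takes `&Profiler` rather than `&mut Profiler`, an inner span can be opened while the outer guard is still in scope.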

ids1024 reviewed Jan 7, 2026

ids1024 approved these changes Jan 7, 2026

cmeissl approved these changes Jan 9, 2026

Drakulix merged commit df89244 into Smithay:master Jan 9, 2026
13 checks passed
