
perf record with --call-graph=fp --freq=max #1969

Open
wants to merge 1 commit into master

Conversation

saethlin (Member)

I've been carrying this patch locally for months/years; @Noratrieb asked me to make this PR.

I only find profile_local perf-record useful with this patch applied; otherwise the profile doesn't contain enough samples to draw any conclusions from the data. And since we're sampling as fast as possible to get a reasonable signal-to-noise ratio on a microbenchmark, we need to use frame pointers, which are now enabled by default in the compiler profile (and are enabled in the distributed standard library too!).
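For reference, the recording this PR enables can be sketched like this. The helper name and argument layout below are illustrative, not rustc-perf's actual code:

```rust
use std::process::Command;

/// Build a `perf record` invocation with frame-pointer call graphs.
/// Hypothetical helper for illustration; not rustc-perf's real code.
fn perf_record_cmd(freq: &str, benchmark: &[&str]) -> Command {
    let mut cmd = Command::new("perf");
    cmd.arg("record")
        // Frame-pointer unwinding: cheap per sample, so it tolerates high rates.
        .arg("--call-graph=fp")
        // "max" asks the kernel for the highest rate it allows
        // (see /proc/sys/kernel/perf_event_max_sample_rate).
        .arg(format!("--freq={freq}"))
        .arg("--");
    cmd.args(benchmark);
    cmd
}

fn main() {
    let cmd = perf_record_cmd("max", &["rustc", "main.rs"]);
    let args: Vec<String> = cmd
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    println!("perf {}", args.join(" "));
}
```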

Kobzol (Contributor) commented Aug 24, 2024

Interesting! I wonder if dwarf produces better output than fp if you have debuginfo enabled. I usually enable debuginfo when profiling the compiler, but that of course doesn't help when profiling the distributed optimized artifacts :) I'm just wondering whether it would make sense to make this configurable, or somehow detect if the compiler has debuginfo (but that sounds way overkill).

Btw, how much RAM do you have? 😆 I tried to generate a profile for cargo/Full/Debug with --freq=max, and doing perf report on the results OOMs with 32 GiB of RAM (not all of it is available, though). I'm not sure why; the perf.data result is only about half a gig, which doesn't sound that bad. With --freq=997 (which is what I normally use), it's only 7 MiB on disk and perf report works. Maybe we could find some compromise that would give the recording a high sampling rate while still being usable, as max seems like it might be too much for some benchmarks.

Noratrieb (Member)

dwarf is more precise than fp (it knows about inlined functions), but it produces a lot more data and is therefore more keen to just break. It also requires the lower frequency, which in turn makes it less precise. fp generally works better and faster, but can't know about inlined functions.
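For concreteness, the two unwinding modes differ in the --call-graph argument passed to perf (the dwarf variant optionally takes a per-sample user-stack dump size; the frequencies below illustrate the trade-off, they're not a recommendation):

```rust
fn main() {
    // fp: the kernel walks the frame-pointer chain at sample time,
    // so each sample is tiny and a very high frequency is sustainable.
    let fp = ["record", "--call-graph=fp", "--freq=max"];

    // dwarf: perf copies a chunk of the user stack with every sample
    // (8192 bytes is perf's default dump size) and unwinds it offline,
    // so the data volume pushes you toward a much lower frequency.
    let dwarf = ["record", "--call-graph=dwarf,8192", "--freq=997"];

    println!("perf {}", fp.join(" "));
    println!("perf {}", dwarf.join(" "));
}
```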

saethlin (Member, Author) commented Aug 24, 2024

> I usually enable debuginfo when profiling the compiler, but that of course doesn't help when profiling the distributed optimized artifacts :)

I always set debuginfo-level = 1 because it doesn't turn off optimizations. Variable-level debuginfo is pretty much just a waste anyway with optimizations enabled.
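(For reference, that corresponds to the following in rustc's bootstrap config.toml; level 1 emits line tables only, which is enough to symbolize profiles without the size and compile-time cost of full variable-level debuginfo:)

```toml
# rustc bootstrap config.toml: line tables only, no variable-level debuginfo
[rust]
debuginfo-level = 1
```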

> I tried to generate a profile for cargo/Full/Debug with --freq=max and doing perf report on the results OOMs with 32 GiB of RAM (not all of it is available though).

That's unfortunate. On my system this recording peaks at 19.6 GB memory usage; 20x blowup for in-memory data structures is pretty typical in my experience. My system has 128 GB of memory but that's not really relevant because the perf-report UI becomes unusably slow when you load this much data into it.

Granted, I'd never use this for profiling primary benchmarks. We should probably set a lower frequency for them.


> and is therefore more keen to just break.

Yup. About half the time I reach for it, perf with dwarf callgraphs is completely unusable due to bugs in perf. It crashes when loading its recordings, with a variety of errors depending on your kernel/perf version. Occasionally it just segfaults.

Kobzol (Contributor) commented Aug 24, 2024

Makes sense. So I would suggest this:

  1. Switch the frame resolving mode from dwarf to fp
  2. Make the default freq 997 (or something like that), so that it remains reasonably possible to run the profiler on all of our benchmarks, as was possible before.
  3. Make it possible to override the frequency through a CLI flag (or just an env variable, which would be simpler to thread through rustc-perf, although a bit opaque), so that you can set a higher frequency for smaller benchmarks.

I can do 3) in a follow-up PR if you don't want to deal with it.
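The env-variable route from suggestion 3 could look roughly like this (PERF_RECORD_FREQ and the helper are hypothetical names for illustration, not existing rustc-perf code):

```rust
use std::env;

/// Sampling frequency to pass as `perf record --freq=...`.
/// Falls back to 997 Hz (a prime, so sampling doesn't run in lockstep
/// with periodic program activity) when the hypothetical
/// PERF_RECORD_FREQ variable is unset or invalid.
fn perf_record_freq() -> String {
    match env::var("PERF_RECORD_FREQ") {
        // Accept "max" or any positive integer verbatim.
        Ok(v) if v == "max" || v.parse::<u32>().map_or(false, |n| n > 0) => v,
        _ => "997".to_string(),
    }
}

fn main() {
    println!("--freq={}", perf_record_freq());
}
```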

Btw, what do you use to postprocess/analyze the perf.data file (apart from perf report)?
