Skip to content

Generalize 'inline' hints to 'instruction_frequency' or 'block_frequency'? #8

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
eqrion opened this issue Jun 18, 2024 · 5 comments

Comments

@eqrion
Copy link
Contributor

eqrion commented Jun 18, 2024

I've been trying to understand how to interpret the 'inline' hint section.

If I understand correctly, it only specifies how many times a call instruction is executed during an average execution of the function that contains it. This can be used to hint that a call is 'cold' and so even if the call target is completely monomorphic and inlineable, it may not be worth inlining.

This seems to be generally useful, and not specific to call instructions. Could we generalize so that these hints can point at any instruction? If we did that we could rename this to something like 'instruction_frequency' (or something else).

There's some strong overlap here with branch hinting, because the frequency of an instruction getting executed is really tied to how often its block is executed, which is tied to how often the branches that target it execute. So an alternate design could be to specify how frequently blocks/branches are executed, instead of the call instruction within them.

@ecmziegler
Copy link
Collaborator

We could do that, sure. Such information is very expensive to collect through instrumentation, but could be easily obtained through sampling-based profiling with incomplete but likely good enough information to be interesting.

I haven't thought much about what we could to with that information, but there is certainly potential. We could e.g. decide not to do certain expensive optimizations within rarely executed blocks (e.g. loop unrolling). Knowing that a loop gets executed 10000 times vs only once could make a difference.

@eqrion
Copy link
Contributor Author

eqrion commented Jun 20, 2024

We could do that, sure. Such information is very expensive to collect through instrumentation, but could be easily obtained through sampling-based profiling with incomplete but likely good enough information to be interesting.

I haven't thought much about what we could to with that information, but there is certainly potential. We could e.g. decide not to do certain expensive optimizations within rarely executed blocks (e.g. loop unrolling). Knowing that a loop gets executed 10000 times vs only once could make a difference.

I agree that collecting this information for all blocks would be expensive, and I don't have a big use-case for it outside of call instructions yet.

It seems like there wouldn't be much of a cost to generalizing this hint, as long as we can let the profilers (that record the hints) decide how much of this information is worth collecting and shipping in a binary. I'd guess they would by default only collect for call sites.

@titzer
Copy link
Contributor

titzer commented Jun 20, 2024

I like the idea of generalizing to instruction frequency. Personally I wouldn't over-index on how expensive it is to collect instruction frequency, because there are valid offline profiling scenarios where instrumentation cost isn't an issue. E.g. we're looking at using Wizard to analyze large repeatable program traces now and it's easy to run them over and over again with different analyses.

@ecmziegler
Copy link
Collaborator

We can certainly allow for additional hints here. I just want to limit the scope of the first version to quantities that we can reliable generate and that will impact optimizations in at least one engine.

I will try to come up with a list of potential annotatable instruction types and what we could use them for so people understand if it's worth collecting (and adding, because binary size is also a factor here) such information.

Generally, I want all of these optimizations to be extensible and optional, so adding future frequency annotations will be backwards compatible. Perhaps this needs to made clearer in the overview.

@ecmziegler
Copy link
Collaborator

I've been toying with the idea a little and I think we have three options:

  1. We can allow call frequencies to be added anywhere and leave it up to the engine which ones to use. -- I think this is not very desirable because without any guidance where to put hints, it's hard to make the right call for tool chains where to generate them.
  2. We can allow frequencies at specific sites that we consider interesting for optimizations. At the moment, these would be calls and loops. Other block-level information for infrequently used blocks are likely sufficiently well covered by the binary information provided in branch hints. -- This is currently my preferred option and is open to future extensions.
  3. We could extend the branch hints proposal to allow more quantitative information and extend its scope to other branching instructions like br_table and potentially also exception handling (though one can argue that this should always be so unlikely that it's never worth optimizing for). In theory, we could extract call frequencies for any instruction from that, but it requires modeling the program flow in the engine just to get useful information out of it. -- The complexity doesn't seem to be worth the effort, especially since any missing annotation spoils the whole model.

Option 2 would cover inlining, loop unrolling/peeling and deferred blocks. I'm wondering if there's an optimization use case for block-level annotations that I've overlooked that would require additional annotations.

The instruction_frequency name should allow extending the list of annotations in a semantically meaningful way in the future if needed.

ecmziegler added a commit that referenced this issue Jun 11, 2025
…15)

This change renames the inline section to instr_freq and allows such frequencies for arbitrary instructions with the given frequencies applying to all following instructions.

It also changes the convoluted logarithmic scheme to a more transparent log2 based scheme at an acceptable loss of resolution.

This implements the changes requested in issue #8.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants