-
Notifications
You must be signed in to change notification settings - Fork 1
Refactor inline section into more general instruction frequencies #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
8e193ce
158ecc6
1e83a70
c1b9aff
212140e
e45e148
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -49,31 +49,43 @@ It is expected and even desired that not all functions are annotated to keep thi | |
*Note: This should be moved to `metadata.function.compilation_order` without the byte offset if such a namespace will be supported by custom annotations.* | ||
|
||
|
||
### Inlining | ||
### Instruction frequencies | ||
|
||
An engine might decide to inline certain call targets based on its own feedback collection or other hints (e.g. *call targets* section), but explicit hints can be added per call target and per function using the following annotations. | ||
Instruction frequencies might be useful to guide optimizations like inlining, loop unrolling, block deferrals, etc. Within a function, these frequencies inform which blocks lie on the hot path and deserve more expensive optimizations, as well as which are on the cold path and might even allow very expensive steps to even execute the code within (e.g. outlining or de-optimization). An engine can take those decisisions based on the call frequency observed, but cannot assume that any part of the code is unreachable based on the call frequency. | ||
|
||
The `metadata.code.inline` section contains instruction level annotations for all affected call sites. | ||
* *byte offset* |U32| from the beginning of the function to the wire byte index of the call instruction (this must be a `call`, `call_ref` or a `call_indirect`, otherwise the hint will be ignored) | ||
The `metadata.code.instr_freq` section contains instruction level annotations for optimizable instruction dequences. | ||
* *byte offset* |U32| from the beginning of the function to the wire byte index of an instruction (e.g. start of a loop or a block containing a call instruction). | ||
* *hint length* |U32| in bytes (always 1 for now, might be higher for future extensions) | ||
* *log call frequency* |U8| determining the estimated number of times the callee gets called per call of the caller. | ||
* *offset log2 call frequency* |U8| determining the estimated number of times the callee gets called per call of the caller. | ||
ecmziegler marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
The call frequency can be thought of the estimated number of times a callee gets called during one call of the caller. It is a logarithmic value based on the formula $f = \max(1, \min(126, 10 \log_{10} \frac{n}{N} + 32))$ where $n$ is the number of callee calls from this call site and $N$ is the number of caller calls. | ||
Call frequencies are always relative to the surrounding function and therefore every instruction at the beginning of a function has an implicit frequency of 1 assigned to it. Each annotation remains valid for all following instructions until the next control flow instruction (`br`, `br_table`, `br_if`, `if`, `return`, `unreachable`, `br_catch`, `throw`). | ||
ecmziegler marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
For now, we expect annotations to only affect `call`, `call_ref`, `call_indirect` and `loop` instructions. Tools can focus on providing annotations for these instruction to make sure they have the expected impact without unnecessary binary bloat. In the future, this can be extended to more instructions if needed. The structure and the name of this annotation has been specifically chosen to allow for that. | ||
|
||
The instruction frequency can be thought of the estimated number of times a callee gets called during one call of the caller. It is a logarithmic value based on the formula $f = \max(1, \min(64, \log_{2} \frac{n}{N} + 32))$ where $n$ is the number of callee calls from this call site and $N$ is the number of caller calls. | ||
|
||
The actual decision which function should be inlined can be based on runtime data that the engine collected, additional heuristics and available resources. There is no guarantee that a function is or is not inlined, but it should roughly be expected that functions of higher call frequency are prefered over ones with lower frequency. | ||
ecmziegler marked this conversation as resolved.
Show resolved
Hide resolved
|
||
Special values of 0 and +127 indicate that a function should never or always be inlined respectively. Engines should respect such annotations over their own heuristics and toolchains should therefore avoid generating such annotations unless there is a good reason for it (e.g. "no inline" annotations in the source). | ||
|
||
|log call frequency|calls per parent call| | ||
|-----------------:|:-------------------:| | ||
| 0| *never inline*| | ||
| 1| <0.0008| | ||
| 22| 0.1 | | ||
| 32| 1 | | ||
| 42| 10 | | ||
| 52| 100 | | ||
| 62| 1,000 | | ||
| 126| >2,511,886,432 | | ||
| 127| *always inline*| | ||
Special values of 0 and 127 indicate that a function should never or always be inlined respectively. Engines should respect such annotations over their own heuristics and toolchains should therefore avoid generating such annotations unless there is a good reason for it (e.g. "no inline" annotations in the source). | ||
|
||
|binary value|log2 call frequency|calls per parent call| | ||
|-----------:|------------------:|:-------------------:| | ||
| 0| |*never inline* | | ||
| 1| -31| <4.657e-10| | ||
| 8| -24| 5.960e-08| | ||
| 16| -16| 1.526e-05| | ||
| 24| -8| 0.00391 | | ||
| 28| -4| 0.0625 | | ||
| 30| -2| 0.25 | | ||
| 31| -1| 0.5 | | ||
| 32| 0| 1 | | ||
| 33| +1| 2 | | ||
| 34| +2| 4 | | ||
| 36| +4| 16 | | ||
| 40| +8| 256 | | ||
| 48| +16| 65536 | | ||
| 56| +24| 1.678e+07| | ||
| 64| +32| >4.295e+09| | ||
| 127| |*always inline* | | ||
|
||
If the *byte offset* is 0, the hint applies to all call sites where the function is the **target**. It serves as a shorthand notation unless explicitly overridden. In this case, the call frequency should be a rough estimate of the average call frequency of all potential sites. *Note: This should likely be moved to a dedicated section for clearer separation, e.g. `metadata.function.inline` if such a namespace will be supported by custom annotations.* | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this part still make sense if we have moved to general instruction frequencies? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I'm not too happy about that part and I'd be in favor of dropping it unless tool chains insist on it. |
||
|
||
|
Uh oh!
There was an error while loading. Please reload this page.