Optimizer Cookbook
All the wasm-opt flags are documented in --help, of course, but it may not be obvious which of the various flags are more important and worth trying first. This page has suggestions for that.
There is also a special page for Wasm GC, which has many more considerations.
--low-memory-unused is a flag (not a pass) which tells the optimizer that "addresses below 1024 are unused". That rules out the load/store pointer overflow cases that would otherwise prevent certain optimizations. Specifically, it lets the optimizer fold added constants into load/store offsets, which is smaller and more efficient.
For that to be safe, you need to tell wasm-ld not to use address 1024 or below for globals (emcc does it automatically).
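As a minimal sketch of how the two halves fit together, the snippet below wraps the optimizer call in a small shell function. The names WASM_OPT and opt_low_memory are made up for this example and are not part of Binaryen. The commented link line assumes that wasm-ld's --global-base option is how you control where globals are placed in your toolchain; check your linker's documentation before relying on it.

```shell
# Illustrative only: WASM_OPT and opt_low_memory are names invented for this sketch.
WASM_OPT=${WASM_OPT:-wasm-opt}

# Link step (not run here). Assumption: wasm-ld's --global-base option is what
# keeps globals out of low memory in your setup, for example:
#   wasm-ld --global-base=1024 ... -o app.wasm

# Optimize $1 into $2 while telling Binaryen that low addresses are unused.
opt_low_memory() {
  "$WASM_OPT" --low-memory-unused -O3 "$1" -o "$2"
}
```

You would then call it as opt_low_memory app.wasm app.opt.wasm. If you build with emcc, the link step is already handled for you.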
--gufa is a new pass that is not on by default yet, so you need to run it manually. It infers constant values in a whole-program manner. It mostly helps Wasm GC by inferring exact types and such, but it can also infer function results on Wasm MVP, which then leads to more benefits.
--flatten --rereloop -Oz -Oz
Flattening the IR is necessary for running "re-reloop", which completely rewrites the control flow graph. That is sometimes slow, so it is not on by default, but sometimes it helps by a few percent.
Running -Oz twice is useful after it, as flattening the IR requires additional work to clean up. One way to think about this is that wasm-opt's default pipeline has been tuned on optimized LLVM output, so if you give it something less optimized it might take more than one cycle of the pipeline. The IR after flattening is less optimized than LLVM's optimized form, so it takes more work.
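Since the benefit is only "sometimes" and only a few percent, it is worth measuring on your own module. The sketch below builds both variants and prints their sizes. The helper names (WASM_OPT, size_of, compare_rereloop) and the output file suffixes are assumptions made for this example.

```shell
# Illustrative only: all names here are invented for this sketch.
WASM_OPT=${WASM_OPT:-wasm-opt}

# Print the size of a file in bytes (arithmetic expansion strips wc's padding).
size_of() { echo $(($(wc -c < "$1"))); }

# Build a plain -Oz variant and a flatten + rereloop variant of $1,
# then report the size of each.
compare_rereloop() {
  "$WASM_OPT" -Oz "$1" -o "$1.oz.wasm" || return 1
  "$WASM_OPT" --flatten --rereloop -Oz -Oz "$1" -o "$1.rereloop.wasm" || return 1
  echo "plain -Oz: $(size_of "$1.oz.wasm") bytes"
  echo "flatten + rereloop: $(size_of "$1.rereloop.wasm") bytes"
}
```

Call it as compare_rereloop app.wasm and keep whichever output is smaller.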
-tnh is a flag that means "assume traps never happen", which lets the optimizer remove code on paths leading to traps (since it is allowed to assume they never happen in practice when the program runs).
That can interfere with things like crash reporting, if you save info right before crashing, and it will remove runtime asserts of the form "if error, trap". But if you can live without those, it can help. For example, if we assume traps never happen then we can move a load into an if arm, so that it sometimes does not run. Without that assumption the load might trap, and skipping it would change observable behavior, so the move would not be allowed.
--converge runs all the optimizations you told it to in a loop while the file keeps shrinking. That is, --converge -Oz will keep running all the passes in -Oz until we reach a fixed point.
Usually the benefit of such additional cycles is limited, but sometimes it matters quite a lot, especially in larger programs.
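To make the "loop while the file keeps shrinking" idea concrete, here is a rough shell analogue. It is not how wasm-opt implements --converge (which loops internally); it simply re-runs a whole -Oz invocation and stops when a round no longer reduces the file size. WASM_OPT, size_of and shrink_to_fixed_point are hypothetical names for this sketch.

```shell
# Illustrative only: a shell analogue of the idea, not wasm-opt's own logic.
WASM_OPT=${WASM_OPT:-wasm-opt}

# Print the size of a file in bytes (arithmetic expansion strips wc's padding).
size_of() { echo $(($(wc -c < "$1"))); }

# Re-run -Oz on $1 in place for as long as each round makes the file smaller.
# Prints the final size in bytes.
shrink_to_fixed_point() {
  prev=$(size_of "$1")
  while "$WASM_OPT" -Oz "$1" -o "$1.next"; do
    next=$(size_of "$1.next")
    if [ "$next" -lt "$prev" ]; then
      mv "$1.next" "$1"   # this round helped, keep its output
      prev=$next
    else
      break               # no further shrinkage, stop here
    fi
  done
  rm -f "$1.next"
  echo "$prev"
}
```

In practice --converge -Oz is the simpler choice; the loop is only meant to show what "keeps shrinking" means.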
Most passes look at a single function at a time, and when they see any call of another function they assume it can have arbitrary effects. Computing global effects lets the optimizer do better, by computing each function's effects and then using that. For example, if a function just returns an integer then it does not have any effects, and the optimizer will be able to move a local.set past such a call.
To do this, use something like the following:
wasm-opt --generate-global-effects -O3
--generate-global-effects computes the effects, which will then be used in later passes.
This is not automatically recomputed. That is, --generate-global-effects --A --B will compute global effects once and then use them in both A and B, even if A decreased the effects of some function, which could have helped B. To compute effect info from scratch so it is maximally precise, add another invocation of it: --generate-global-effects --A --generate-global-effects --B.
Note that the logic assumes that optimization passes normally only decrease effects. That is what makes it ok to not automatically recompute effects during and after each pass. Normal optimization passes keep the behavior of the code identical, which means that no effects are added (but some might be removed, e.g. if they were in code we realize is dead). Some special passes can add effects, for example if they add instrumentation, and Binaryen will automatically discard global effect info when such a pass is run. (If you write such a custom pass yourself, you will need to mark it as addsEffects.)
--skip-pass=foo will skip the pass foo in the optimizer's normal pipeline. For example,
wasm-opt -O3 --skip-pass=coalesce-locals
will skip coalesce-locals, which normally is run at least once in -O3.
In general this should not be needed, as the normal optimization pipeline does the right thing. But in some cases it can be useful to skip specific passes. For example, imagine that you want to optimize but not coalesce locals, perhaps because you will run some analysis or custom operation on them later. Then skipping coalesce-locals as just shown might help.
Sometimes it does not make sense to inline an entire function, but inlining part of it can still help. Imagine that a function looks like this:
function foo(x) {
  if (!x) return;
  // ..lots of heavy work..
}
If that early return is hit often then it might be useful to inline that if, but not the rest of the work. That is, callers can do the check inline, and only call the heavy work section if needed:
foo(x);
// partial inlining =>
if (x) {
  foo_heavy_work(x);
}
To enable this, use the partial inlining flag:
wasm-opt --partial-inlining-ifs=1 [..]
The number is the number of ifs to potentially inline (a sequence of them can also be handled and not just a single one; values in the range of 1-4 might be worth experimenting with).
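One way to run that experiment is a small sweep over the suggested range, as sketched below. The helper names and output file suffixes are invented for this example, and pairing the flag with -O3 is an assumption about how you run the rest of the pipeline. Note that file size is only one signal: partial inlining is mainly about speed, so benchmark the outputs as well.

```shell
# Illustrative only: WASM_OPT and sweep_partial_inlining are invented names.
WASM_OPT=${WASM_OPT:-wasm-opt}

# Try --partial-inlining-ifs=1..4 on $1 and print the resulting size for each.
# Assumption: -O3 is the pipeline you want to combine the flag with.
sweep_partial_inlining() {
  for n in 1 2 3 4; do
    "$WASM_OPT" -O3 "--partial-inlining-ifs=$n" "$1" -o "$1.pi$n.wasm" || return 1
    echo "ifs=$n: $(($(wc -c < "$1.pi$n.wasm"))) bytes"
  done
}
```

Call it as sweep_partial_inlining app.wasm, then compare the app.wasm.pi1.wasm through app.wasm.pi4.wasm outputs.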
Binaryen can monomorphize functions based on their calls. That is, if a function has two callers, it can create two duplicate functions, one for each caller. If each caller passes in a constant or a refined type, for example, then we can specialize each of the new functions accordingly. This type of optimization happens automatically when inlining, since the passed values get combined with the inlined code and optimized together. Monomorphization instead combines the information about the call (the "call context") with the called code without inlining. It is therefore capable of finding opportunities where a call can be optimized even when inlining heuristics do not kick in.
Monomorphization will actually run the optimizer on the combined call context code + called code and see if we benefit from optimizing them together. Only if we benefit enough will we monomorphize.
For example, imagine that a function computes some complex mathematical value that depends on the inputs. Given certain inputs, perhaps we can compute that value at compile time. Inlining would find that, but monomorphization can also discover such opportunities where inlining misses them.
To monomorphize, use
wasm-opt --monomorphize
Note that you should run the full optimization pipeline after monomorphization as it can reduce the code size cost (by re-merging functions, applying inlining in new places, etc), and it can propagate the benefits of monomorphization onward. For example, a sequence you might try is to optimize both before and after:
wasm-opt -O3 --monomorphize -O3
By default monomorphization only operates where we see a very large benefit. You can adjust that with a flag:
wasm-opt --monomorphize --pass-arg=monomorphize-min-benefit@75
The minimum benefit is how much benefit we need to see for us to decide to optimize. "75" in this example means "75% benefit", so if monomorphization sees it removes 75% of the cost of the called code, it will optimize. That is, high values like 75% will monomorphize only when we see very significant benefits. There is a tradeoff here, as more monomorphization will increase code size, so it is worth experimenting with different values.
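To experiment with the tradeoff, it can help to parameterize the threshold. The sketch below combines the optimize-before-and-after sequence with the min-benefit argument shown above. WASM_OPT, monomorphize_with_benefit and the output file naming are assumptions made for this example.

```shell
# Illustrative only: WASM_OPT and monomorphize_with_benefit are invented names.
WASM_OPT=${WASM_OPT:-wasm-opt}

# Optimize $1, monomorphize with minimum benefit $2 (a percentage),
# then optimize again. Writes the result to "$1.mono$2.wasm".
monomorphize_with_benefit() {
  "$WASM_OPT" -O3 --monomorphize \
    "--pass-arg=monomorphize-min-benefit@$2" -O3 \
    "$1" -o "$1.mono$2.wasm"
}

# Example sweep (not run here):
#   for b in 25 50 75; do monomorphize_with_benefit app.wasm "$b"; done
```

Lower thresholds monomorphize more, so compare both code size and speed across the outputs before settling on a value.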