Skip to content

Optimize OperatorsReader control stack #2241

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 24, 2025
Merged

Conversation

Robbepop
Copy link
Collaborator

@Robbepop Robbepop commented Jun 18, 2025

Alternative or addition to #2228.

The OperatorsReader control stack that has been introduced in #2134 always contains at least one frame which causes this common path to always require heap allocations. Since the OperatorsReader type is used pervasively during parsing, validation and deconstructing init-expressions (or const-expressions) those heap allocations quickly sum up. Especially large Wasm modules could have hundreds or tousands of init-expressions.

This PR introduces a new ControlStack type that optimizes this by not requiring heap allocations for just a single frame on the control stack.

Inline annotations have been added where necessary to improve performance. Every such annotation has been benchmarked.

Benchmarks comparing main with this PR:

image

cc @alexcrichton @keithw

@Robbepop Robbepop requested a review from a team as a code owner June 18, 2025 14:21
@Robbepop Robbepop requested review from pchickey and removed request for a team June 18, 2025 14:21
@Robbepop Robbepop requested a review from alexcrichton June 18, 2025 15:14
@pchickey pchickey removed their request for review June 18, 2025 17:45
@alexcrichton
Copy link
Member

Thanks! I'm personally a bit hesitant to go the route of micro optimizations like this just yet mostly insofar that I don't believe this composes with #2228. If doing a pure parse in isolation then this complements #2228 but if doing a parse with a validator this won't be used (and that's the main case that I'd like to optimize for)

@Robbepop
Copy link
Collaborator Author

Robbepop commented Jun 23, 2025

@alexcrichton thanks for your reply. It seems that I have overlooked the fact that #2228 also provides a solution for init-expressions. When I have reviewed #2228 some days ago I didn't notice that. Though, this PR might still have positive impacts on performance since it inlines the top-most block and simplifies some users that do not care about the initialization of the operators reader - though in all honesty for those users performance might not be important.

Due to the results in #2180 (comment) where boths PRs showed improvements in different test cases I was under the impression that both PRs compose well with each other.

@alexcrichton
Copy link
Member

To be clear I agree this PR improves performance, but what it's improving, assuming #2228 is merged, is parse-without-validation performance. The validator won't use this optimization at all because it has its own stack of frames which doesn't have the top frame cached in the structure. Put another way the gains on validate/spidermonkey won't be additive, it's a this-pr-or-that-pr in terms of performance gains.

@Robbepop
Copy link
Collaborator Author

Robbepop commented Jun 23, 2025

To be clear I agree this PR improves performance, but what it's improving, assuming #2228 is merged, is parse-without-validation performance. The validator won't use this optimization at all because it has its own stack of frames which doesn't have the top frame cached in the structure. Put another way the gains on validate/spidermonkey won't be additive, it's a this-pr-or-that-pr in terms of performance gains.

I get that those PRs focus on different areas.

Though, to Wasmi, parse-without-validation performance matters too, since performance sensitive users can skip validation with this API which is used by some users when they can assert otherwise that they are operating with pre-validated Wasm binaries: https://docs.rs/wasmi/latest/wasmi/struct.Module.html#method.new_unchecked

I can understand if you do not want to go into the direction of this PR. Maybe a different solution is needed.

@alexcrichton
Copy link
Member

Ah ok, that makes sense. With the FrameStack trait added in #2228 it should in theory be possible to build the optimization in this PR both in-tree and out-of-tree. The implementation in-tree should be pretty minimal though so seems ok to add too. Would you be up for rebasing this?

@Robbepop
Copy link
Collaborator Author

Ah ok, that makes sense. With the FrameStack trait added in #2228 it should in theory be possible to build the optimization in this PR both in-tree and out-of-tree. The implementation in-tree should be pretty minimal though so seems ok to add too. Would you be up for rebasing this?

Sure, I will rebase on top of #2228 and see how the combined changes impacts benchmarks.

@keithw
Copy link
Contributor

keithw commented Jun 23, 2025

@Robbepop, I didn't realize Wasmi had a new_unchecked mode -- that's cool.

I took a look at how Wasmi is using wasmparser, and I think it may be possible for Wasmi to avoid ever creating awasmparser::OperatorsReader (and its internal Vec<FrameKind>) at all at this point. I think the FuncTranslator can probably implement the FrameStack trait using its ControlStack member, and then call BinaryReader::visit_operator directly. This is similar to what the wasmparser FuncValidator does. I'm hopeful this would get the Wasmi benchmarks back to where they were before #2134.

@Robbepop
Copy link
Collaborator Author

Robbepop commented Jun 23, 2025

@Robbepop, I didn't realize Wasmi had a new_unchecked mode -- that's cool.

Thanks! For Wasmi the new_unechecked module creation kinda is more useful than it probably is in Wasmtime since in combination with Wasmi's lazy translation it can translate Wasm modules so quickly that caching of "precompiled" serialized artifacts isn't even needed. This is why I care so much about wasmparser performance.

I took a look at how Wasmi is using wasmparser, and I think it may be possible for Wasmi to avoid ever creating awasmparser::OperatorsReader (and its internal Vec<FrameKind>) at all at this point. I think the FuncTranslator can probably implement the FrameStack trait using its ControlStack member, and then call BinaryReader::visit_operator directly. This is similar to what the wasmparser FuncValidator does. I'm hopeful this would get the Wasmi benchmarks back to where they were before #2134.

Yes you are right about how or where Wasmi can make use of the new FrameStack trait. I am eager to see how this impacts performance. Unfortunately, I won't really have time to do this until next week. I will inform you both here or in the other thread once I got there.

@Robbepop
Copy link
Collaborator Author

Robbepop commented Jun 24, 2025

I think the FuncTranslator can probably implement the FrameStack trait using its ControlStack member, and then call BinaryReader::visit_operator directly.

@keithw I checked today. Is it possible to change the signature of FrameStack::current_frame from Option<&FrameKind> to Option<FrameKind>? This would allow to efficiently convert from other representations of frames into wasmparser::FrameKind. Otherwise, I'd need to use wasmparser::FrameKind which is not very usable for Wasmi's FuncTranslator otherwise.

@keithw
Copy link
Contributor

keithw commented Jun 24, 2025

Sure, no objection from me.

@Robbepop
Copy link
Collaborator Author

@alexcrichton I rebased this PR on main.

Benchmarks show that validation benchmarks are not affected but parse benchmarks are, as expected:
image

@Robbepop Robbepop force-pushed the rf-try-opt-parser-stack branch from 9dc19b5 to cd6b84b Compare June 24, 2025 15:28
@Robbepop
Copy link
Collaborator Author

Robbepop commented Jun 24, 2025

I resolved conflicts and rebased on main. cc @alexcrichton

@alexcrichton alexcrichton enabled auto-merge June 24, 2025 15:30
@alexcrichton alexcrichton added this pull request to the merge queue Jun 24, 2025
Merged via the queue into main with commit 78f9dc8 Jun 24, 2025
32 checks passed
@alexcrichton alexcrichton deleted the rf-try-opt-parser-stack branch June 24, 2025 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants