-
-
Notifications
You must be signed in to change notification settings - Fork 222
Description
#503 covers the initial investigation into WebAssembly. Due to funding limitations, that work was scoped to a 6-week effort. This issue is about the next phase: productionizing the current prototype.
Note
We are seeking $20k $12k US in funding to make this project possible. If you or your company could benefit from the portability and performance improvements of this work and could potentially help fund it, please get in touch with @pdubroy.
This project is now fully funded through the generous support of @Shopify! 🙏
Background
The goal of the MVP was:
The primary goal is to be able to produce a WebAssembly module (
.wasm
) from an existing Ohm grammar. The Wasm module would export amatch
function, which could be used just like the existingGrammar.match
method.At minimum, we also need an API (or ABI) for walking the parse tree.
This was successfully achieved, and the experiment showed that the Wasm version has significantly better performance on some real-world grammars (e.g. >10x on a 10k ES5 source file).
Goals
The next phase is to make the current prototype production-ready:
- Implement grammar features that are not supported in the MVP: parameterized rules, left recursion, missing primitive rules (
caseInsensitive
,lower
,upper
,UnicodeLtmo
). - Add implicit space skipping.
- Remove the input length limitation (support inputs >64k)
- (Maybe) support grammars with >256 rules.
- Implement error-handling.
- Implement a drop-in replacement for
toAST()
so at least some users could adopt the Wasm version without large-scale code changes. - Make this available in a published NPM package (either a new major version of
ohm-js
, or in a separate package, etc.@ohmjs/wasm
). - Documentation.
Non-goals
Eventual goals, but not included in this phase:
- Incremental parsing.
- Fully replacing the functionality of
pexprs-eval.js
with the WebAssembly version, and rewriting the semantics code to work with CSTs in Wasm linear memory.
Estimate
- Parameterized rules: 1w
- Left recursion: 1w
- Missing primitive rules: 1w
- Space-skipping: 1.5w
- Input length: 1d
- Error-handling: 2w
toAST
: 1.5w- Publish package: 1d
- Docs: 2d
- More testing/benchmarks: 1w
Total: 10w
Time-permitting:
- Work on reducing memory usage and/or parsing performance:
- Dynamic memory allocation for memo table (HAMTs?)
- More sophisticated memoization strategies
- Compile-time optimization for parsing expressions
Reduced scope (~7w):
- Leave out implicit space-skipping and left recursion.
- Minimal documentation
- Focus testing & benchmarking on some existing real-world grammars (e.g. Shopify's liquid-html grammars).