Skip to content

Production-ready WebAssembly support #511

@pdubroy

Description

@pdubroy

#503 covers the initial investigation into WebAssembly. Due to funding limitations, that work was scoped to a 6-week effort. This issue is about the next phase: productionizing the current prototype.

Note

We are seeking $20k $12k US in funding to make this project possible. If you or your company could benefit from the portability and performance improvements of this work and could potentially help fund it, please get in touch with @pdubroy.

This project is now fully funded through the generous support of @Shopify! 🙏

Background

The goal of the MVP was:

The primary goal is to be able to produce a WebAssembly module (.wasm) from an existing Ohm grammar. The Wasm module would export a match function, which could be used just like the existing Grammar.match method.

At minimum, we also need an API (or ABI) for walking the parse tree.

This was successfully achieved, and the experiment showed that the Wasm version has significantly better performance on some real-world grammars (e.g. >10x on a 10k ES5 source file).

Goals

The next phase is to make the current prototype production-ready:

  • Implement grammar features that are not supported in the MVP: parameterized rules, left recursion, missing primitive rules (caseInsensitive, lower, upper, UnicodeLtmo).
  • Add implicit space skipping.
  • Remove the input length limitation (support inputs >64k)
  • (Maybe) support grammars with >256 rules.
  • Implement error-handling.
  • Implement a drop-in replacement for toAST() so at least some users could adopt the Wasm version without large-scale code changes.
  • Make this available in a published NPM package (either a new major version of ohm-js, or in a separate package, etc. @ohmjs/wasm).
  • Documentation.

Non-goals

Eventual goals, but not included in this phase:

  • Incremental parsing.
  • Fully replacing the functionality of pexprs-eval.js with the WebAssembly version, and rewriting the semantics code to work with CSTs in Wasm linear memory.

Estimate

  • Parameterized rules: 1w
  • Left recursion: 1w
  • Missing primitive rules: 1w
  • Space-skipping: 1.5w
  • Input length: 1d
  • Error-handling: 2w
  • toAST: 1.5w
  • Publish package: 1d
  • Docs: 2d
  • More testing/benchmarks: 1w

Total: 10w

Time-permitting:

  • Work on reducing memory usage and/or parsing performance:
    • Dynamic memory allocation for memo table (HAMTs?)
    • More sophisticated memoization strategies
    • Compile-time optimization for parsing expressions

Reduced scope (~7w):

  • Leave out implicit space-skipping and left recursion.
  • Minimal documentation
  • Focus testing & benchmarking on some existing real-world grammars (e.g. Shopify's liquid-html grammars).

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions