Rewrite the HTML multipage splitter in Rust #298

Open
@domenic

HTML's build pipeline currently works by having Wattsi produce all versions of the standard: singlepage, multipage, dev edition (a sort of multipage), commit snapshot, and review draft.

As part of the general project to move logic out of the untested giant blob that is Wattsi and into more cleanly factored Rust code in this repository, one nice step would be to move the splitting logic into Rust.

The goal is essentially to replace this part of Wattsi.

This would give us a good codebase with which to fix the problem noted in whatwg/html#5649 (comment): Wattsi's splitting logic does not work unless all <hN>s are direct children of <body>, which prevents us from moving to a <section>-based approach as described in whatwg/html#5649.

Suggested design:

  • Figure out how Wattsi's splitter logic interacts with its dev edition logic, and come up with a plan. My first guess would be to have Wattsi output a singlepage version of the dev edition, just like it outputs a singlepage "normal" edition, and then have the Rust-based splitter code run over both of them to produce multipage-dev and multipage-normal. But maybe that's not the right approach.

  • Update main.rs to add a new entrypoint for splitting, which takes a singlepage spec from stdin and writes the output to a folder specified as a command-line argument.

  • Write lots of little synthetic test cases for the splitting logic, similar to the ones in boilerplate.rs. Include ones where there are <section> wrappers. Be sure to cover fragment-links.json generation, table of contents creation, next/previous links, etc.

  • Make the tests pass. Be sure to use an efficient strategy, like Wattsi does, which minimizes tree walks and DOM node copies. Wattsi's strategy is roughly: do one walk to discover all the IDs and important DOM nodes. Create (# of sections) new documents, and move nodes from the original document into those new documents. The nodes that remain in the original document become the table of contents.
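
To make the strategy in the last bullet a bit more concrete, here is a rough sketch of the "one walk, move nodes, leftovers become the index" idea. Everything in it is hypothetical: `Node` is a toy stand-in for whatever DOM type the real code uses, and `SplitOutput`, `page_name`, `collect_ids`, and `split` are made-up names. It also assumes that a page starts at an `<h2>` with an ID, or at a `<section>` whose first child is such an `<h2>`, and that only the nodes before the first split point stay behind. The real rules in Wattsi may well differ.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a DOM node (hypothetical; not the parser's real type).
#[derive(Debug, PartialEq)]
struct Node {
    tag: String,
    id: Option<String>,
    children: Vec<Node>,
}

impl Node {
    fn new(tag: &str, id: Option<&str>, children: Vec<Node>) -> Node {
        Node { tag: tag.to_string(), id: id.map(str::to_string), children }
    }
}

/// Result of a split: one entry per page, plus whatever stayed behind.
struct SplitOutput {
    /// (page name, nodes moved into that page)
    pages: Vec<(String, Vec<Node>)>,
    /// Nodes left in the original document (front matter / table of contents).
    index: Vec<Node>,
    /// ID -> page name, in the spirit of fragment-links.json.
    fragment_links: BTreeMap<String, String>,
}

/// Assumed rule: a page starts at an <h2> with an ID, or at a <section>
/// whose first child is such an <h2>.
fn page_name(node: &Node) -> Option<&str> {
    let heading = if node.tag == "section" { node.children.first()? } else { node };
    if heading.tag == "h2" { heading.id.as_deref() } else { None }
}

/// Record every ID in the subtree as living on `page`.
fn collect_ids(node: &Node, page: &str, out: &mut BTreeMap<String, String>) {
    if let Some(id) = &node.id {
        out.insert(id.clone(), page.to_string());
    }
    for child in &node.children {
        collect_ids(child, page, out);
    }
}

/// Single pass over the children of <body>. Nodes are taken by value,
/// so each one is moved into its page rather than copied.
fn split(body: Vec<Node>) -> SplitOutput {
    let mut out = SplitOutput {
        pages: Vec::new(),
        index: Vec::new(),
        fragment_links: BTreeMap::new(),
    };
    for node in body {
        if let Some(name) = page_name(&node) {
            let name = name.to_string();
            out.pages.push((name, Vec::new()));
        }
        match out.pages.last_mut() {
            Some((name, nodes)) => {
                collect_ids(&node, name.as_str(), &mut out.fragment_links);
                nodes.push(node);
            }
            None => {
                // Nothing has been split off yet: the node stays in the index.
                collect_ids(&node, "index", &mut out.fragment_links);
                out.index.push(node);
            }
        }
    }
    out
}
```

A synthetic test in the style suggested above could feed `split` a body containing a bare `<h2>` followed by a `<section>`-wrapped `<h2>`, then check that both start a page and that IDs nested inside the `<section>` map to the right page in `fragment_links`. Next/previous links are not modeled here; they could probably be derived from the order of `pages`.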
