This repository was archived by the owner on May 18, 2025. It is now read-only.

Implement our own MDX parser #59

@NathanLovato

This task is about replacing remark, the MDX parser GDSchool currently uses, and the remark plugins we maintain for it, with our own MDX parser.

The MDX parser should take an MDX document (ideally a valid one), extract content such as imports and the YAML front matter, and output a TSX file that exposes the metadata as exported constants and a default export with a React component written in JSX.

export const title = "..."
export const index = 2
export const previous_lesson = {
  title: "Module Overview",
  slug: "module_overview"
} 
export const next_lesson = {...}
export const module_title = "Top Down Movement"

export const Content = () => {
  return <>
    <h1 className="main-title">Character Controller</h1>
    ... 
  </>
}

Stretch goal: output JavaScript instead, using the React.createElement API, to skip extra parsing steps in the build process.

export const Content = () => {
  // React.Fragment stands in for the <>...</> wrapper; an empty string is not a valid element type.
  return React.createElement(React.Fragment, null,
    React.createElement('h1', {className: 'main-title'}, "Character Controller")
  )
}

MDX processing needs

We maintain our own remark plugins to make some MDX components easier to write in the source documents. They apply the following transformations:

  • Sequences of Practice, Callout, and Searchable components are wrapped in a container. More component types may use this mechanism in the future.
  • Child components of Practice, YourTurn, and Challenge components are turned into properties of the parent component. For example, all the hint elements become an array of hints on the parent component.

We need to replicate this behavior in our MDX parser.
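The two transformations above could be sketched as follows. This is only an illustration under stated assumptions: the `MdxNode` shape, the `...Group` container naming, the `Hint` tag name, and the `hints` property name are hypothetical and not taken from the existing remark plugins. It also assumes a run of a single component still gets wrapped.

```typescript
// Hypothetical minimal node shape; the real parser's types may differ.
interface MdxNode {
  name: string; // component or tag name, e.g. "Practice"
  props: Record<string, unknown>;
  children: MdxNode[];
}

const node = (
  name: string,
  children: MdxNode[] = [],
  props: Record<string, unknown> = {}
): MdxNode => ({ name, props, children });

// Component names come from the issue; the "<Name>Group" container is an assumption.
const GROUPABLE = new Set(["Practice", "Callout", "Searchable"]);

// Wrap each run of adjacent siblings with the same groupable name in one container.
function groupSequences(siblings: MdxNode[]): MdxNode[] {
  const out: MdxNode[] = [];
  for (const current of siblings) {
    const last = out[out.length - 1];
    if (!GROUPABLE.has(current.name)) {
      out.push(current);
    } else if (last !== undefined && last.name === `${current.name}Group`) {
      last.children.push(current); // extend the run started earlier
    } else {
      out.push(node(`${current.name}Group`, [current])); // start a new run
    }
  }
  return out;
}

const PARENTS = new Set(["Practice", "YourTurn", "Challenge"]);
// Child tag -> parent property. Only hints are mentioned in the issue; the names are assumed.
const HOISTED = new Map<string, string>([["Hint", "hints"]]);

// Move hoistable children out of the child list and into an array property.
function hoistChildren(parent: MdxNode): MdxNode {
  if (!PARENTS.has(parent.name)) return parent;
  const props: Record<string, unknown> = { ...parent.props };
  const kept: MdxNode[] = [];
  for (const child of parent.children) {
    const propName = HOISTED.get(child.name);
    if (propName === undefined) {
      kept.push(child);
      continue;
    }
    const list = (props[propName] as MdxNode[][] | undefined) ?? [];
    props[propName] = [...list, child.children]; // one entry per hoisted child
  }
  return { ...parent, props, children: kept };
}
```

Both functions work on one level of the tree, so a full pass would call them while walking the document recursively.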

Markdown code block parsing needs

We need to turn Markdown code fences into a specific HTML structure, and we need to parse and highlight the GDScript code they contain. Options include using a PEG grammar with Nim's npeg library, writing our own specialized GDScript parser for highlighting, or passing the code to an external program like prism.js and injecting the result back. The existing build system runs prism.js within Next.js's build system.

Code fences should be turned into this pre and code structure:

<pre className="gdquest-code-container"><code className="gdquest-code">
// code here
</code></pre>

If the code block has the diff attribute (for example, if the language is diff-gdscript), we need to add a class to every line that starts with a plus or a minus sign.
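As a rough sketch of the diff handling, the function below detects the `diff-` prefix and wraps marked lines in a span. The class names `diff-added` and `diff-removed` and all function names are assumptions, highlighting is left out entirely, and preserving newlines once the markup is embedded in a TSX file is not addressed here.

```typescript
// Escape characters that are special in JSX text; braces would otherwise start an expression.
function escapeForJsx(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\{/g, "&#123;")
    .replace(/\}/g, "&#125;");
}

// Split a fence info string such as "diff-gdscript" into language and diff flag.
function parseFenceLanguage(info: string): { language: string; isDiff: boolean } {
  const name = info.trim().split(/\s+/)[0] ?? "";
  const isDiff = name === "diff" || name.startsWith("diff-");
  return { language: isDiff ? name.slice("diff-".length) : name, isDiff };
}

// Hypothetical class names; the issue does not specify them.
function diffClass(line: string): string | null {
  if (line.startsWith("+")) return "diff-added";
  if (line.startsWith("-")) return "diff-removed";
  return null;
}

// Render a code fence into the pre/code structure described above.
function renderCodeFence(info: string, code: string): string {
  const { isDiff } = parseFenceLanguage(info);
  const body = code
    .split("\n")
    .map((line) => {
      const className = isDiff ? diffClass(line) : null;
      const text = escapeForJsx(line);
      return className === null ? text : `<span className="${className}">${text}</span>`;
    })
    .join("\n");
  return `<pre className="gdquest-code-container"><code className="gdquest-code">\n${body}\n</code></pre>`;
}
```

Lines are only classified when the fence is a diff, so a regular GDScript line that starts with a minus sign is left alone.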

Markdown headings parsing

We need to extract the H1 heading to use it as a title fallback if a title is not specified in the YAML front matter of the document. We may also need to read the H2 headings to create a table of contents.
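A minimal sketch of the heading extraction, assuming ATX-style headings (`#` and `##`) and a simple line scan. The function and field names are hypothetical. Lines inside code fences are skipped, which matters here because GDScript comments also start with `#`.

```typescript
interface Headings {
  title: string | null; // text of the first H1, if any
  sections: string[]; // text of every H2, in document order
}

function extractHeadings(markdown: string): Headings {
  let title: string | null = null;
  const sections: string[] = [];
  let inFence = false;

  for (const line of markdown.split("\n")) {
    // Simplification: any line opening with three backticks or tildes toggles the fence state.
    if (/^\s*(`{3}|~{3})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;

    // One or two hashes followed by whitespace; "###" and deeper do not match.
    const match = /^(#{1,2})\s+(.+?)\s*$/.exec(line);
    if (match === null) continue;
    const text = match[2] ?? "";
    if (match[1] === "#") {
      if (title === null) title = text; // keep only the first H1
    } else {
      sections.push(text);
    }
  }
  return { title, sections };
}
```

The `title` result can serve as the fallback when the front matter has no title, and `sections` as the input for a table of contents.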

Front matter parsing

We use the YAML format for the front matter. We just need to parse it with a YAML parser and inject optional fields or metadata when they are missing. The two main pieces of metadata are title and unlocked; unlocked should default to false if not specified.
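The front matter step could look roughly like this. `parseFlatYaml` is a deliberately tiny stand-in that only understands flat `key: value` lines so the sketch stays self-contained; a real YAML parser would replace it. All names here are hypothetical, and the title fallback assumes the H1 text has been extracted separately.

```typescript
interface LessonMeta {
  title: string;
  unlocked: boolean;
  [key: string]: unknown;
}

// Split "---" delimited front matter from the document body.
function splitFrontMatter(source: string): { yaml: string; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(source);
  if (match === null) return { yaml: "", body: source };
  return { yaml: match[1] ?? "", body: source.slice(match[0].length) };
}

// Stand-in for a real YAML parser: handles flat `key: value` lines only.
function parseFlatYaml(yaml: string): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const line of yaml.split(/\r?\n/)) {
    const match = /^([A-Za-z_][\w-]*):\s*(.*)$/.exec(line);
    if (match === null) continue;
    const key = match[1] ?? "";
    const raw = (match[2] ?? "").trim();
    if (raw === "true" || raw === "false") {
      result[key] = raw === "true";
    } else if (raw !== "" && !Number.isNaN(Number(raw))) {
      result[key] = Number(raw);
    } else {
      result[key] = raw.replace(/^(["'])(.*)\1$/, "$2"); // strip matching quotes
    }
  }
  return result;
}

// Fill in missing metadata: title falls back to the H1, unlocked defaults to false.
function withDefaults(meta: Record<string, unknown>, fallbackTitle: string | null): LessonMeta {
  const rawTitle = meta["title"];
  const rawUnlocked = meta["unlocked"];
  const title = typeof rawTitle === "string" && rawTitle !== "" ? rawTitle : (fallbackTitle ?? "");
  const unlocked = typeof rawUnlocked === "boolean" ? rawUnlocked : false;
  return { ...meta, title, unlocked };
}
```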

Development

To approach this project, I would:

  • Look into reusing an existing Markdown parser for Nim, such as nim-markdown, which produces a token tree that we can traverse to generate the output we need. We have to see if it's usable as-is or if we need to fork it to support MDX-specific syntax like imports and exports.
  • Collect pairs of input MDX files and expected output TSX files to guide development and to test that the parser produces the expected output.

Parsed token structure

For rendering, Jad suggested creating a node tree in which tokens already represent HTML elements and their properties, so that a single function can render the whole tree.

This makes sense because HTML will be our only output for the foreseeable future: the parser can parse the Markdown directly into editable tokens that represent an HTML structure.

Pseudo code example:

{ token: 'Practice',
  render: { tag: 'section', class: 'gdquest-practice' },
  children: [{
    token: 'Requirement',
    render: { tag: 'div', class: 'gdquest-requirement' },
    children: [
      { type: 'TEXTNODE', tag: '', contents: 'blah blah' }
    ]
  }]
}

But with strong types and an object structure.
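One way to give the pseudo code above strong types is a discriminated union with a single recursive render function. This is only a sketch: the `kind` discriminator and the type names are assumptions, and text is emitted as-is without escaping.

```typescript
// Render information carried by each element node.
interface RenderInfo {
  tag: string;
  className?: string;
}

interface ElementNode {
  kind: "element";
  token: string; // source component name, e.g. "Practice"
  render: RenderInfo;
  children: TreeNode[];
}

interface TextNode {
  kind: "text";
  contents: string;
}

type TreeNode = ElementNode | TextNode;

// A single function renders the whole tree because every node already knows its tag.
function renderNode(node: TreeNode): string {
  if (node.kind === "text") return node.contents;
  const { tag, className } = node.render;
  const attrs = className === undefined ? "" : ` className="${className}"`;
  const inner = node.children.map(renderNode).join("");
  return `<${tag}${attrs}>${inner}</${tag}>`;
}

// The Practice example from the pseudo code, expressed with these types.
const practice: TreeNode = {
  kind: "element",
  token: "Practice",
  render: { tag: "section", className: "gdquest-practice" },
  children: [
    {
      kind: "element",
      token: "Requirement",
      render: { tag: "div", className: "gdquest-requirement" },
      children: [{ kind: "text", contents: "blah blah" }],
    },
  ],
};
```

Because the union is closed, the compiler flags any node kind the render function forgets to handle once more kinds are added.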

Parsed node tree manipulation

Make an API a bit like Godot's node API, for convenient manipulation: reordering, reparenting, deleting, and so on.

This should allow manipulating the node tree easily before rendering the output.
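A small sketch of what such an API might look like. The method names loosely mirror Godot's `add_child`, `remove_child`, `move_child`, and `reparent`; the class itself and its fields are hypothetical, and it does not guard against adding an ancestor as a child.

```typescript
class DocNode {
  name: string;
  parent: DocNode | null = null;
  readonly children: DocNode[] = [];

  constructor(name: string) {
    this.name = name;
  }

  // Append a child, detaching it from its previous parent first.
  addChild(child: DocNode): void {
    child.parent?.removeChild(child);
    child.parent = this;
    this.children.push(child);
  }

  // Detach a direct child; does nothing if the node is not a child.
  removeChild(child: DocNode): void {
    const index = this.children.indexOf(child);
    if (index === -1) return;
    this.children.splice(index, 1);
    child.parent = null;
  }

  // Reorder a direct child to a new position among its siblings.
  moveChild(child: DocNode, toIndex: number): void {
    const index = this.children.indexOf(child);
    if (index === -1) return;
    this.children.splice(index, 1);
    this.children.splice(toIndex, 0, child);
  }

  // Move this node, with its subtree, under another parent.
  reparent(newParent: DocNode): void {
    newParent.addChild(this);
  }

  // Delete this node from the tree by detaching it from its parent.
  remove(): void {
    this.parent?.removeChild(this);
  }
}
```

Keeping `parent` in sync inside `addChild` and `removeChild` means the other methods can be built on top of those two.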
