
Pipeline: Lexing

Giorgio Garofalo edited this page Apr 30, 2025 · 13 revisions


Lexing

Main packages: core.lexer

Lexing is like breaking a sentence down into its individual words before working out what the sentence means. When you read a paragraph, you first recognize its individual pieces, such as nouns, verbs, and punctuation, and only then understand the message.

In Quarkdown, the lexing process scans a source file, which is nothing but a sequence of characters, and splits it into small pieces called tokens. Each token represents a different element, such as a heading, a paragraph, or bold text, and stores basic information such as its type, its position in the text, and its textual content (lexeme).
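To make the three pieces of information concrete, here is a minimal sketch of a token record and a scanner that fills it in. It is a single flat lexer written only to illustrate type, position, and lexeme; `Token`, `RULES`, and `tokenize` are hypothetical names, the rules are heavily simplified, and none of this reproduces Quarkdown's own classes.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    type: str    # what the piece of text is, e.g. "HEADING" or "STRONG"
    line: int    # 1-based line where the lexeme starts
    column: int  # 1-based column where the lexeme starts
    lexeme: str  # the exact source text the token covers


# Hypothetical token kinds, tried in order at each position; the first match wins.
RULES = [
    ("HEADING", re.compile(r"^#{1,6} [^\n]*", re.MULTILINE)),  # only at the start of a line
    ("STRONG", re.compile(r"\*\*[^*\n]+\*\*")),
    ("NEWLINE", re.compile(r"\n")),
    ("TEXT", re.compile(r"[^*\n]+|\*")),  # anything else, including a lone '*'
]


def tokenize(source: str) -> list[Token]:
    tokens, pos, line, line_start = [], 0, 1, 0
    while pos < len(source):
        for kind, rule in RULES:
            match = rule.match(source, pos)
            if match:
                # Column is the distance from the start of the current line.
                tokens.append(Token(kind, line, pos - line_start + 1, match.group()))
                pos = match.end()
                if kind == "NEWLINE":
                    line, line_start = line + 1, pos
                break
    return tokens
```

For `"# Title\nSome **bold** text"`, this yields a heading token at line 1, column 1, and a `STRONG` token with lexeme `**bold**` at line 2, column 6.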

Markdown recognizes two macro-categories of tokens: block tokens and inline tokens. The difference is based on how these elements are structured in the document:

  • Blocks are sections that define the outer structure of a document, such as a paragraph, a list, a heading, a code block, or a quote.

    # A heading
    
    A paragraph
    
    > A quote
    
    - A list
    - of multiple items
  • Inlines are elements that appear inside blocks and most commonly define textual features: formatting such as bold, italics, and monospaced text, as well as links and images.

    A **formatted** _text_.
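As a rough illustration of what a block token covers, the sketch below splits the block example from the list into chunks separated by blank lines and classifies each chunk by its leading marker. `BlockToken` and `lex_blocks` are hypothetical names, and the rules are heavily simplified (no code blocks, no nesting, no lines that continue a block without a marker); this is not Quarkdown's implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockToken:
    type: str    # kind of block, e.g. "HEADING" or "QUOTE"
    start: int   # offset of the block's first character in the source
    lexeme: str  # raw text of the whole block, inner formatting left untouched


def lex_blocks(source: str) -> list[BlockToken]:
    """Split the source on empty lines and classify each chunk by its first characters."""
    tokens = []
    # Each match is a run of consecutive non-empty lines, i.e. one block.
    for match in re.finditer(r"(?:[^\n]+\n?)+", source):
        chunk = match.group().rstrip("\n")
        if chunk.startswith("#"):
            kind = "HEADING"
        elif chunk.startswith(">"):
            kind = "QUOTE"
        elif chunk.startswith("- "):
            kind = "LIST"
        else:
            kind = "PARAGRAPH"
        tokens.append(BlockToken(kind, match.start(), chunk))
    return tokens
```

Note that the two list items end up in a single `LIST` token: at this stage only the outer structure matters, and the items themselves are left for later steps.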

To accomplish this separation, two distinct lexers are supplied: a block lexer and an inline lexer, which extract their corresponding tokens.
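The inline lexer works on the text inside a block. A minimal sketch, assuming a single regular expression with one named group per inline kind, can tokenize the inline example above; `InlineToken` and `lex_inline` are illustrative names, not Quarkdown's API, and only bold, italics, and monospaced text are covered.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class InlineToken:
    type: str    # "STRONG", "EMPHASIS", "CODE" or "TEXT"
    start: int   # offset within the inline text being lexed
    lexeme: str  # matched source, delimiters included


# One alternative per inline kind; the group that matched names the token type.
INLINE_PATTERN = re.compile(
    r"(?P<STRONG>\*\*[^*]+\*\*)"
    r"|(?P<EMPHASIS>_[^_]+_)"
    r"|(?P<CODE>`[^`]+`)"
    r"|(?P<TEXT>[^*_`]+|[*_`])"  # plain text, or a lone delimiter treated as text
)


def lex_inline(text: str) -> list[InlineToken]:
    # The TEXT group accepts any character, so consecutive matches cover the whole input.
    return [
        InlineToken(m.lastgroup, m.start(), m.group())
        for m in INLINE_PATTERN.finditer(text)
    ]
```

Applied to `A **formatted** _text_.`, it produces `TEXT`, `STRONG`, `TEXT`, `EMPHASIS`, `TEXT`.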

Function calls are extracted both as blocks and inlines, with just a few differences between them.
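The hypothetical sketch below shows the same call being recognized in both contexts. It assumes a simplified `.name {arg}` call syntax and, for illustration only, that a call filling its whole line is a block while a call embedded in running text is an inline. The real rules, and the actual differences between the two forms, are not reproduced here; every name is hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionCallToken:
    type: str              # "BLOCK_CALL" or "INLINE_CALL"
    name: str              # function name, without the leading dot
    args: tuple[str, ...]  # argument contents, without braces


# Assumed, simplified call syntax: .name {arg1} {arg2} ...
CALL = r"\.(?P<name>[A-Za-z]\w*)(?P<args>(?:[ \t]*\{[^{}]*\})*)"
BLOCK_CALL = re.compile(rf"^{CALL}[ \t]*$", re.MULTILINE)  # the call fills its whole line
INLINE_CALL = re.compile(CALL)                              # the call sits anywhere in text


def _to_token(kind: str, match: re.Match) -> FunctionCallToken:
    args = tuple(re.findall(r"\{([^{}]*)\}", match.group("args")))
    return FunctionCallToken(kind, match.group("name"), args)


def lex_block_calls(source: str) -> list[FunctionCallToken]:
    return [_to_token("BLOCK_CALL", m) for m in BLOCK_CALL.finditer(source)]


def lex_inline_calls(text: str) -> list[FunctionCallToken]:
    return [_to_token("INLINE_CALL", m) for m in INLINE_CALL.finditer(text)]
```

Both token kinds share the same name-and-arguments shape, which is one way the extraction logic could be reused for blocks and inlines.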

At the beginning, only the block lexer is invoked. Once the source has been broken down into its outer blocks, these are passed to the parser, which is responsible for searching them for nested information.
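The order of the two phases can be sketched as follows, under the assumption that the parser finds nested information by running an inline lexer on each block's raw content. Block rules are reduced to level-1 headings and paragraphs, inline rules to strong text, and `Node`, `lex_blocks`, `lex_inline`, and `parse` are illustrative names only.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    type: str                                     # block kind, e.g. "HEADING"
    text: str                                     # block content without its Markdown marker
    children: list = field(default_factory=list)  # inline tokens found by the parser


def lex_blocks(source: str) -> list[tuple[str, str]]:
    """Phase 1: only the outer blocks are recognized; their content stays raw."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", source.strip()):  # blank lines separate blocks
        if chunk.startswith("# "):
            blocks.append(("HEADING", chunk[2:]))
        else:
            blocks.append(("PARAGRAPH", chunk))
    return blocks


def lex_inline(text: str) -> list[tuple[str, str]]:
    """Inline lexer: strong text or plain text."""
    return [
        ("STRONG", m.group(1)) if m.group(1) is not None else ("TEXT", m.group(0))
        for m in re.finditer(r"\*\*([^*]+)\*\*|[^*]+|\*", text)
    ]


def parse(source: str) -> list[Node]:
    """Phase 2: the parser walks the blocks and looks for nested (inline) content."""
    return [Node(kind, raw, lex_inline(raw)) for kind, raw in lex_blocks(source)]
```

Here `lex_blocks` never looks at `**` delimiters; inline syntax is only examined once `parse` visits each block.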

See next: Parsing
