Here you can find definitions for words that are commonly used in the compiler along with links to the codebase. Check https://www.roc-lang.org/tutorial if you want to know about general Roc terms. Feel free to ask for a term to be added or add one yourself!
Contributor note: definitions should be roughly ordered as in a tutorial, e.g. Parser should be explained before Canonicalization.
Command Line Interface. The entrypoint of the compiler that brings together all functionality in the Roc toolset and makes it accessible to the user through the terminal, e.g. `roc build main.roc`.
- new compiler: src/main.zig
- old compiler: crates/cli/src/main.rs
A .roc file forms one module.
Types of modules:
- app (example): Applications are combined with a platform and compiled into an executable.
- module (example): Provide types and functions which can be imported into other modules.
- package (example): Organises modules to share functionality across applications and platforms.
- platform (example): Provides memory management and effects, like writing to files and network communication, to interface with the outside world. Detailed explanation.
- hosted (example): Lists all Roc types and functions provided by the platform.
Implementation:
- new compiler:
- old compiler:
IR (Intermediate Representation)
A memory optimization technique where only one copy of each distinct value is stored in memory, regardless of how many times it appears in a program or IR. For example, a function named `foo` may be called many times in a Roc file, but we store `foo` once and use an index to refer to `foo` at the call sites.
Uses of interning:
- new compiler: collections/SmallStringInterner.zig, ident.zig, ModuleEnv.zig, tokenize.zig, ...
- old compiler: small_string_interner.rs, mono_module.rs, format.rs, ...
- There are many more uses of interning, I recommend searching for "interner" (case-insensitive).
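The idea behind interning can be sketched in a few lines. This is a hypothetical illustration, not the real `SmallStringInterner`: each distinct string is stored exactly once, and callers hold a small integer index instead of the string itself.

```python
# Minimal sketch of string interning (illustrative only; the real
# interner lives in collections/SmallStringInterner.zig).

class Interner:
    def __init__(self):
        self.strings = []   # index -> string, each stored once
        self.indices = {}   # string -> index, for deduplication

    def intern(self, s):
        """Return the index for s, storing s only on first sight."""
        if s not in self.indices:
            self.indices[s] = len(self.strings)
            self.strings.append(s)
        return self.indices[s]

    def lookup(self, idx):
        """Recover the original string from its index."""
        return self.strings[idx]

interner = Interner()
a = interner.intern("foo")
b = interner.intern("foo")   # second occurrence: no new storage
c = interner.intern("bar")
print(a == b, a != c, interner.lookup(a))  # True True foo
```

Because indices are small fixed-size integers, IR nodes that would otherwise carry many copies of the same string stay compact and comparisons become integer comparisons.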
Any text in a Roc source file that has significant content, but is not a Roc Str like "Hello". Used for variable names, record field names, type names, etc.
During tokenization all identifiers are put into a deduplicated collection and given an ID. That ID is used in IRs instead of the actual text to save memory.
Identifier in the compiler:
- new compiler:
  - `Ident`
  - Ident tokenization: check the functions `chompIdentLower` and `chompIdentGeneral`, and their uses.
  - Ident parsing: search `Ident`
- old compiler:
  - `IdentStr`
  - module/ident.rs
  - parsing: search "identifier" (case-insensitive)
A specific word that has a predefined meaning in the language, like `crash`, `if`, `when`, ... .
Many keywords cannot be used as a variable name.
We have an overview of all Roc keywords.
Keywords in the compiler:
An operator is a symbol or keyword that performs a specific operation on one or more operands (values or variables) to produce a result.
Some examples: `+`, `=`, `==`, `>`. A table of all operators in Roc.
`+` is an example of a binary operator because it works with two operands, e.g. `1 + 1`. Similarly, `!` (e.g. `!Bool.false`) is a unary operator.
Operators in the compiler:
- New compiler: search `Op` in tokenize.zig
- Old compiler: search `operator_help` in expr.rs
Syntax within a programming language that is designed to make things easier to read or express. It allows developers to write code in a more concise, readable, or convenient way without adding new functionality to the language itself.
Desugaring converts syntax sugar (like `x + 1`) into more fundamental operations (like `Num.add(x, 1)`).
A table of all operators in Roc and what they desugar to.
Desugaring in the compiler:
- New compiler: canonicalize.zig (WIP)
- Old compiler: desugar.rs
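Desugaring is just a tree rewrite. Below is a hypothetical sketch on a tiny tuple-based AST; the node shapes and the `OP_TO_FN` table are invented for illustration and do not match the real compiler's IR.

```python
# Minimal desugaring sketch: rewrite a sugared binary-operator node
# (x + 1) into a plain function call (Num.add(x, 1)).
# Node shapes here are invented, not the compiler's real AST.

OP_TO_FN = {"+": ("Num", "add"), "-": ("Num", "sub")}

def desugar(node):
    kind = node[0]
    if kind == "binop":
        _, op, left, right = node
        module, fn = OP_TO_FN[op]
        # Recurse so nested sugar like (x + 1) - 2 is also rewritten.
        return ("call", (module, fn), [desugar(left), desugar(right)])
    return node  # literals and variables need no desugaring

sugared = ("binop", "+", ("var", "x"), ("int", 1))
print(desugar(sugared))
# ('call', ('Num', 'add'), [('var', 'x'), ('int', 1)])
```

After this pass, later phases only ever see function calls, so they need no special handling for operators.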
A compiler phase is a distinct stage in the process the compiler goes through to translate high-level source code into machine code that a computer can execute. Compilers don't do this in one big step; they break it down into several phases, each handling a specific task. Some examples of phases: tokenization, parsing, code generation, ... .
The process of breaking down source code into smaller units called tokens. These tokens are the basic building blocks of a programming language, such as keywords, identifiers, operators, and symbols. The input code is scanned character by character and is grouped into meaningful sequences based on the language's syntax rules. This step makes parsing simpler.
Example source code:

```
module []

foo : U64
```

Corresponding tokens:

```
KwModule(1:1-1:7),OpenSquare(1:8-1:9),CloseSquare(1:9-1:10),Newline(1:1-1:1),
Newline(1:1-1:1),
LowerIdent(3:1-3:4),OpColon(3:5-3:6),UpperIdent(3:7-3:10),Newline(1:1-1:1)
```
New compiler:
Old compiler:
- We did not have a separate tokenization step; everything happened in the parser.
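The character-by-character grouping described above can be sketched with a toy tokenizer. This is purely illustrative: the token names mimic the example output, but the rules and structure are invented and far simpler than the real tokenizer in tokenize.zig.

```python
# Toy tokenizer sketch (illustrative only). It scans the source left to
# right and matches the first rule that fits, producing (name, text) pairs.
import re

TOKEN_SPEC = [
    ("KwModule",    r"module\b"),
    ("UpperIdent",  r"[A-Z][A-Za-z0-9]*"),
    ("LowerIdent",  r"[a-z][A-Za-z0-9]*"),
    ("OpColon",     r":"),
    ("OpenSquare",  r"\["),
    ("CloseSquare", r"\]"),
    ("Skip",        r"[ \t\n]+"),   # whitespace produces no token here
]

def tokenize(src):
    tokens, pos = [], 0
    while pos < len(src):
        for name, pattern in TOKEN_SPEC:
            m = re.match(pattern, src[pos:])
            if m:
                if name != "Skip":
                    tokens.append((name, m.group()))
                pos += m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
    return tokens

print(tokenize("module []\n\nfoo : U64"))
# [('KwModule', 'module'), ('OpenSquare', '['), ('CloseSquare', ']'),
#  ('LowerIdent', 'foo'), ('OpColon', ':'), ('UpperIdent', 'U64')]
```

Note the rule order matters: `KwModule` must be tried before `LowerIdent`, otherwise `module` would be tokenized as an ordinary identifier. The real tokenizer also records source positions (the `1:1-1:7` ranges in the example above), which this sketch omits.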
AST (Abstract Syntax Tree)
An AST organizes and represents the source code as a tree-like structure. So for the code below:
```
module []

foo : U64
```

The AST is:

```
(file
    (module (1:1-1:10))
    (type_anno (3:1-4:4)
        "foo"
        (tag (3:7-3:10) "U64")))
```
It captures the meaning of the code, while ignoring purely syntactic details like parentheses, commas, semicolons,... . Compared to raw source code, this structured format is much easier to analyze and manipulate programmatically by the next compiler phase.
The AST is created by the parser.
New compiler:
- See the `Node` struct in this file.
- You can see examples of ASTs in the .txt files in this folder.
Old compiler:
- See `FullAst` here
- Some tests
- Many snapshot tests
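A tree like the one in the example above can be modeled as nodes with a kind, an optional value, and children. This is a hypothetical sketch: the real `Node` struct stores indices into flat arrays rather than nested Python objects.

```python
# Hypothetical AST sketch mirroring the example above
# (illustrative only; not the compiler's real representation).
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                 # e.g. "file", "module", "type_anno", "tag"
    value: str = ""           # e.g. the identifier "foo" or the tag "U64"
    children: list = field(default_factory=list)

# module []
#
# foo : U64
ast = Node("file", children=[
    Node("module"),
    Node("type_anno", "foo", [Node("tag", "U64")]),
])

def names(node):
    """Collect every non-empty value in the tree, depth-first."""
    out = [node.value] if node.value else []
    for child in node.children:
        out += names(child)
    return out

print(names(ast))  # ['foo', 'U64']
```

Walking the tree like `names` does is the basic pattern later phases use: each phase traverses the AST (or a derived IR) and either analyzes it or produces a transformed tree.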
Monomorphization (mono, specialization)
Monomorphization, also known as type specialization, is the process of creating a distinct copy of each instance of a generic function or value based on all specific usages in a program.
For example, a function with the type `Num a -> Num a` may only be called in the program with a `U64` and an `I64`. Specialization will then create two functions, with the types `U64 -> U64` and `I64 -> I64`.
This trades off some compile time for much better runtime performance, since we don't need to look up which implementation to call at runtime (AKA dynamic dispatch).
Related Files:
- new compiler:
- old compiler: