Skip to content

Implement Occurs Check #7811

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 10, 2025
Merged

Implement Occurs Check #7811

merged 4 commits into from
Jun 10, 2025

Conversation

jaredramirez
Copy link
Collaborator

@jaredramirez jaredramirez commented May 29, 2025

Overview

This PR implements the occurs check for type variables. This means, for a given type variable:

  • Resolve it to its type (ie follow redirects until we get a concrete type)
  • Recursively unwrap all subtype variables for the type (ie all tags in tag unions, record fields in records, etc)
  • Check if we run into a variable we've seen already before

This is the 1st step in the larger project of supporting recursive types in the type system, but only for nominal types – as outlined here. This check is not yet used anywhere, that will come in a follow-up PR.

There has been some discussion in Zulip recently around if it's possible to allow non-nominal-recursion. Depending on how that shakes out, this implementation may change in the future.

Memory Management

During the occurs check, there are things we need to write down:

  • What variables we've seen so far (pushed & popped for each recursive branch)
  • All the variables we've visited and confirmed are not recursive (to avoid redundant work)
  • In the case of an error, what the chain of recursive types was (for error messages)

To capture this, we use a Scratch struct that allocates some array lists that can be re-used across many occurs check. This is the same pattern that unify uses to hold intermediate data.

Differences from the Rust Compiler

This implementation has not significant difference from the Rust compiler version. Originally, I did things slightly differently, but ultimately decided that it probably made the most sense to match the semantics of the Rust version. That said, if those reviewing like the original approach, we can easily revert.

Original "Differences from the Rust Compiler"

The Rust implementation of occurs takes a different approach to avoiding redundant work

  • It uses each variable’s Mark field to track visited state during the check
  • After the traversal, it resets all visited marks

This requires:

  • Mutably updating Mark values on each visited descriptor
  • Tracking visited variables separately in order to reset them

The Zig version does not use Mark at all. Instead:

  • It tracks visited variables explicitly in an array
  • Redundancy is avoided by checking this array during traversal

This results in the same number of allocations, but with two key differences:

  1. No need to mutate or reset Mark fields on descriptors
  2. Possibly slightly worse per-step performance due to linear O(n) check of the already visited array
    • Note that both the Zig and Rust versions do a linear scan of seen variables, though the total number of seen vars for any given recursive branch would be less than the total visited nodes.

That said, I'm unsure what the average number of variables in a type is, but this approach assumes it would be relatively small. (We could also use a Map to check for seen/visited vars, which may perform better than an array)

@jaredramirez jaredramirez self-assigned this May 29, 2025
@jaredramirez jaredramirez marked this pull request as ready for review May 29, 2025 23:53
@jaredramirez jaredramirez force-pushed the push-ovpsosyowlzp branch 9 times, most recently from 2d6d773 to c03d11e Compare June 1, 2025 21:04
@jaredramirez jaredramirez marked this pull request as draft June 1, 2025 22:59
@jaredramirez jaredramirez force-pushed the push-ovpsosyowlzp branch 6 times, most recently from 86e6e50 to ba1b414 Compare June 7, 2025 21:11
@jaredramirez jaredramirez marked this pull request as ready for review June 7, 2025 21:11
/// _only_ modifies a variable's `Mark`. Before returning, all visited nodes'
/// `Mark`s will be reset to `none`.
///
/// TODO: See if there's a way to represent this ^ in the type system? If we
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so! I think calling .items(.mark) on the MultiArrayList should get you a slice of just the marks, which would mean you could:

  • continue to take types_store: *Store
  • pass a *const Store and then a mutable slice of just the Marks to a helper function
  • now we know the helper only mutates the Marks

} else if (self.scratch.hasSeenVar(root.var_)) {
// Recursion point! We've already seen this var during traversal.
if (ctx.recursion_allowed and ctx.seen_nominal) {
// If recursion is allowed (we've passed through a Box, List or
Copy link
Contributor

@rtfeldman rtfeldman Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Box, List

😍 😍 😍

}

fn appendSeen(self: *Self, var_: Var) void {
_ = self.seen.append(self.gpa, var_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In working on unify I've actually personally been liking having things do std.mem.Allocation.Error!void and actually use try (instead of the exit on oom thing) - what do you think of doing more of that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm into that, imo the more explicit the points of failure are the better. And tracking the error in the type sigs would make it easier to understand what's allocating and what's not, at a glance

@@ -5,5 +5,8 @@ test {
testing.refAllDeclsRecursive(@import("main.zig"));
testing.refAllDeclsRecursive(@import("snapshot.zig"));
testing.refAllDeclsRecursive(@import("builtins/main.zig"));

// TODO: Remove after hooking up
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be removed now? 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this module isn't yet used in any codepath, it'll be DCE'd. But i'll remove in a subsequent PR that hooks occurs up into unify!

rtfeldman
rtfeldman previously approved these changes Jun 9, 2025
Copy link
Contributor

@rtfeldman rtfeldman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaredramirez this looks SWEET!!! 🤩

I left a few suggestions, but nothing blocking!

This change switches back from the "new" way to keep track of visited
node in the occurs check to set Marks, like the rust compiler does.
@jaredramirez jaredramirez merged commit 425602c into main Jun 10, 2025
23 checks passed
@jaredramirez jaredramirez deleted the push-ovpsosyowlzp branch June 10, 2025 00:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants