Skip to content

Refactor MIR graph rendering to use a unified traversal via GraphBuilder#132

Open
0xh4ty wants to merge 10 commits intoruntimeverification:masterfrom
0xh4ty:refactor/unified-graph-traversal
Open

Refactor MIR graph rendering to use a unified traversal via GraphBuilder#132
0xh4ty wants to merge 10 commits intoruntimeverification:masterfrom
0xh4ty:refactor/unified-graph-traversal

Conversation

@0xh4ty
Copy link
Contributor

@0xh4ty 0xh4ty commented Mar 2, 2026

Summary

This PR introduces a format-agnostic MIR graph traversal based on a new GraphBuilder abstraction. Traversal semantics are centralized in a single implementation, while individual renderers are responsible only for presentation.

As an initial consumer, the D2 renderer has been ported to this new traversal behind a parallel code path. The legacy D2 implementation is retained for now to allow safe comparison and validation.


What changed

  • Introduced a GraphBuilder trait describing semantic graph events
  • Added a single generic traversal (render_graph) that owns MIR graph structure and traversal order
  • Refactored the D2 renderer to implement GraphBuilder
  • Moved all semantic stringification to GraphContext
  • Traversal now passes full MIR structures (statements, terminators, call arguments) to renderers
  • Wired the new D2 path in parallel to the legacy one for validation
  • Verified that legacy and new D2 outputs are byte-for-byte identical across the test suite
  • Code passes both cargo fmt and cargo clippy

Why this change

Previously, each renderer duplicated traversal logic, which made the code harder to reason about and maintain. Centralizing traversal ensures:

  • a single source of truth for graph semantics
  • easier maintenance and future refactors
  • cleaner renderer implementations
  • safer extension to additional output formats

Validation and testing

The new traversal can be exercised by enabling the experimental D2 path:

SMIR_D2_NEW=1 cargo run -- --d2 tests/integration/programs/fibonacci.rs

To compare outputs against the legacy renderer:

git diff --no-index output-d2/fibonacci.smir.d2 fibonacci.smir.d2

No differences were observed across the existing integration test suite.


Status and follow-ups

  • The legacy D2 renderer is intentionally kept for now
  • Once this approach is validated, the D2 renderer can be switched to use the new traversal by default in follow-up commits.
  • Refactoring the DOT renderer to the same model will follow in a separate PR

Notes for reviewers

This PR is a structural refactor only. Traversal semantics and output behavior are unchanged, and the parallel wiring is intended to make review and validation straightforward.


Note on function IDs
The legacy D2 renderer uses short_name-based function IDs, which are prone to collisions in cases such as multiple monomorphizations of the same generic.

This behavior is intentionally preserved for now so that the new traversal can be validated via byte-for-byte output comparison. Collision avoidance will be addressed in follow-up commits by incorporating body hashes into function IDs.

@0xh4ty 0xh4ty requested a review from a team March 2, 2026 09:57
@Stevengre
Copy link
Contributor

LGTM! Would you remove the previous implementation before merging? Or just in another PR?

@dkcumming what do you think?

@0xh4ty
Copy link
Contributor Author

0xh4ty commented Mar 3, 2026

Thanks! I’ll remove the legacy D2 implementation in this PR itself.

/// Format-agnostic MIR graph traversal.
/// Owns traversal order and graph semantics, delegates rendering to `GraphBuilder`.
pub fn render_graph<B: GraphBuilder>(smir: &SmirJson, mut builder: B) -> B::Output {
let ctx = GraphContext::from_smir(smir);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    pub fn to_d2_file_new(&self) -> String {
        let ctx = GraphContext::from_smir(self);
        render_graph(self, D2Builder::new(&ctx))
    }

It seems not efficient. It builds GraphContext twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I’ll move GraphContext construction out of render_graph so it is built only once and passed through.

) {
let fn_id = short_name(name);
let label = name_lines(name);
let is_local = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be body.is_some() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should be body.is_some(). I’ll fix that.

pub trait GraphBuilder {
type Output;

fn begin_graph(&mut self, name: &str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to have some description here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I’ll expand the doc comment.

Copy link
Collaborator

@dkcumming dkcumming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@0xh4ty this is great work! I love what you have done so far! I think it's time to go all in and get the old stuff out and hook up the full implementation of d2 and dot (okay if that is another PR if necessary). I think isolating the shared logic out in the traverse module is a great improvement. Do you think you are fine to do the full conversion? Also I think Jianhong had some great feedback too

@cds-amal cds-amal self-requested a review March 3, 2026 23:25
Copy link
Collaborator

@cds-amal cds-amal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work @0xh4ty . @Stevengre gave some excellent feedback, and I added a few notes on the trait boundary design.

The traversal extraction is the right idea, and the hard part (generic traversal order, clean separation of items/blocks/edges) is already solid. The main thing to tighten up is where the format-agnostic boundary actually sits: a couple of the trait methods (block, block_edge) still carry D2-specific assumptions, which means the next renderer would inherit those assumptions rather than being free to do its own thing.

The comments above walk through the specifics. Once those methods pass structured data instead of raw MIR types and format-specific conventions, this will be a clean foundation for DOT, Mermaid, and whatever else comes next.

I also want to see more comments, as I want this project to be usable via cargo doc --open, one more sign of a mature Cargo library.

Finally, my vote is to rip out the old, and use this as the way forward. Though, you will have to make sure the operational parts remain consistent so we can generate graphs :)

) {
let fn_id = short_name(name);
let label = name_lines(name);
let is_local = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this true assignment is unconditional; is it needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be removed.

self.buf.push_str(&format!(" bb{}: \"{}\"\n", idx, label));
}

fn block_edge(&mut self, _fn_id: &str, from: usize, to: usize, _label: Option<&str>) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here: block_edge bakes D2's arrow syntax into the implementation, and the Option<&str> label parameter (which it looks like D2Builder ignores?) is a tell that the signature was shaped around what D2 needs today, with a speculative parameter tacked on for a future renderer.

This is the second method where format-specific concerns leak through the trait boundary, so it's worth stepping back and asking: what's the trait actually buying us?

A well-drawn trait boundary gives you two things: the code above it can change without touching the code below (new traversal logic doesn't require touching every renderer), and the code below can change without touching the code above (new output format, same traversal). But that only works if the boundary is at the right level of abstraction. When trait methods receive raw MIR types or carry format-specific assumptions, every new renderer has to understand the same internals, and the trait becomes ceremony rather than insulation.

The fix is the same as for block: have the traversal own the "what" (which blocks connect, what the edge represents) and pass structured, pre-rendered data. Let each renderer own the "how" (syntax, escaping, layout). That way the trait boundary actually earns its keep.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense. The label parameter was speculative and is not used by the current renderers, so I will remove it to avoid leaking format assumptions into the trait. I will keep traversal responsible for edge semantics and pass structured, pre-rendered data to builders, while the renderers handle syntax and layout.

@cds-amal
Copy link
Collaborator

cds-amal commented Mar 4, 2026

Hey @0xh4ty , I had the opportunity to think this through some more and, well, here you go :)

First: the GraphBuilder trait is the right abstraction. Separating "walk the graph" from "emit format-specific syntax" is exactly where we want to end up, and the incremental rollout (old path kept, new path behind an env var) is a clean way to get there without breaking anything.

The question I want to think through is: what happens when we go to add DOT, Mermaid, Markdown, or other formats on top of this? Right now, each builder holds a &GraphContext, calls ctx.render_stmt() / ctx.render_terminator() / ctx.render_operand() itself, and carries a lifetime parameter. That means every new format re-implements the same rendering logic. And the trait's surface area will grow as formats discover they need additional hooks (locals_table, empty_body, external_function, etc.); each of those becomes a new method that every builder has to implement, even if it's a no-op.

The thing is, the current GraphBuilder is quietly conflating two jobs: (1) "what is the graph structure?" (nodes, edges, containment) and (2) "how do I turn a Statement or Terminator into a human-readable string?" Every builder currently does both. If we pull those apart, adding a new format becomes "just format pre-rendered strings into your syntax" rather than "re-implement MIR rendering and format-specific syntax."

So here's a concrete idea: instead of the driver pushing raw stable_mir types to the builder via a sequence of granular calls (begin_function / block / block_edge / call_edge / end_function), the driver produces a RenderedFunction struct that the builder receives as one cohesive unit:

/// A single basic block with pre-rendered content and structural edges.
struct RenderedBlock<'a> {
    idx: usize,
    stmts: Vec<String>,              // pre-rendered via GraphContext
    terminator: String,               // pre-rendered
    raw_terminator: &'a Terminator,   // escape hatch for structural inspection
    cfg_edges: Vec<(usize, Option<String>)>,  // (target_block, optional label)
}

/// A fully analyzed function, ready for format-specific rendering.
struct RenderedFunction<'a> {
    id: String,                       // stable ID (short_name + body hash)
    display_name: String,             // name_lines() output for labels
    is_local: bool,
    locals: Vec<(usize, String)>,     // (index, type_with_layout string)
    blocks: Vec<RenderedBlock<'a>>,
    call_edges: Vec<CallEdge>,        // resolved callee IDs and pre-rendered args
}

struct CallEdge {
    block_idx: usize,
    callee_id: String,
    callee_name: String,
    rendered_args: String,
}

The traversal driver does all the GraphContext work (rendering statements, resolving callees, computing function IDs) and hands RenderedFunction values to the builder. The trait itself simplifies:

trait GraphBuilder {
    type Output;

    fn begin_graph(&mut self, name: &str);
    fn alloc_legend(&mut self, lines: &[String]);
    fn type_legend(&mut self, lines: &[String]);
    fn external_function(&mut self, id: &str, name: &str);
    fn render_function(&mut self, func: &RenderedFunction);
    fn static_item(&mut self, id: &str, name: &str);
    fn asm_item(&mut self, id: &str, content: &str);
    fn finish(self) -> Self::Output;
}

What does this actually buy us? A few things:

No lifetime on builders. RenderedFunction owns its strings, so the builder doesn't need &GraphContext or a lifetime parameter. D2Builder becomes a plain struct, not D2Builder<'a>.

Rendering logic written once. The driver calls ctx.render_stmt(), ctx.render_terminator(), ctx.render_operand(). No builder ever touches GraphContext. You write the rendering code once; every format gets it for free.

Still flexible where it matters. The raw_terminator field (and potentially raw_stmts if we add it later) is an escape hatch for builders that need structural information; say, coloring blocks differently based on terminator kind, or visually distinguishing Call terminators from Assert. Builders that don't need this just ignore it.

Fewer trait methods. The granular push-based sequence (begin_function / block / block_edge / end_function) collapses into one render_function(&RenderedFunction) call. Builders that want the push style can still iterate func.blocks internally; the trait just doesn't force a particular traversal order.

New hooks emerge as data, not methods. Instead of adding locals_table(), empty_body(), label_to_entry_edge() as separate trait methods that every builder must implement (even as no-ops), these become fields on RenderedFunction that builders can inspect or ignore. func.locals is empty if there are no locals; func.blocks is empty if there's no body. The builder decides how to handle each case in its own render_function impl. No trait churn.

One more thing worth noting: function IDs. The current branch uses short_name(name) for graph node IDs, which works for simple cases but will collide when two monomorphizations of the same generic (say, foo::<i32> and foo::<u64>) produce the same short name. A make_fn_id helper that incorporates a body hash as a tiebreaker fixes this; with the RenderedFunction approach, that helper lives in the driver and produces func.id, so no format has to worry about it:

/// Generate a stable, unique ID from a symbol name and body.
/// Handles the case where multiple monomorphizations of the same
/// generic produce the same short_name by incorporating a body hash.
fn make_fn_id(name: &str, body: &Body) -> String { ... }

In terms of concrete next steps, I'd suggest:

  1. Introduce RenderedBlock, RenderedFunction, and CallEdge structs (could live in traverse.rs alongside the driver).
  2. Have render_graph produce RenderedFunction values and pass them to the builder.
  3. Simplify D2Builder: drop the &'a GraphContext field and lifetime parameter, implement the narrower render_function(&RenderedFunction) method.
  4. Verify the D2 output is identical (should be a pure refactor; no output change).
  5. As a stretch goal: try implementing a second format (DOT or Mermaid) on top of the new trait to validate that it's genuinely easy to add.

Of course, feel free to push back, if you have a different roadmap in mind.

@0xh4ty
Copy link
Contributor Author

0xh4ty commented Mar 4, 2026

Thanks for the detailed explanation. That separation between graph structure and MIR string rendering makes sense. I will refactor the driver to produce RenderedFunction, RenderedBlock and CallEdge data with pre-rendered strings and simplify the GraphBuilder trait to operate on that representation. This should also remove the need for GraphContext inside builders and eliminate the lifetime parameter. I’ll update the PR accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants