Refactor MIR graph rendering to use a unified traversal via GraphBuilder by 0xh4ty · Pull Request #132 · runtimeverification/stable-mir-json

0xh4ty · 2026-03-02T09:57:24Z

Summary

This PR introduces a format-agnostic MIR graph traversal based on a new GraphBuilder abstraction. Traversal semantics are centralized in a single implementation, while individual renderers are responsible only for presentation.

As an initial consumer, the D2 renderer has been ported to this new traversal behind a parallel code path. The legacy D2 implementation is retained for now to allow safe comparison and validation.

What changed

Introduced a GraphBuilder trait describing semantic graph events
Added a single generic traversal (render_graph) that owns MIR graph structure and traversal order
Refactored the D2 renderer to implement GraphBuilder
Moved all semantic stringification to GraphContext
Traversal now passes full MIR structures (statements, terminators, call arguments) to renderers
Wired the new D2 path in parallel to the legacy one for validation
Verified that legacy and new D2 outputs are byte-for-byte identical across the test suite
Code passes both cargo fmt and cargo clippy

Why this change

Previously, each renderer duplicated traversal logic, which made the code harder to reason about and maintain. Centralizing traversal ensures:

a single source of truth for graph semantics
easier maintenance and future refactors
cleaner renderer implementations
safer extension to additional output formats

Validation and testing

The new traversal can be exercised by enabling the experimental D2 path:

SMIR_D2_NEW=1 cargo run -- --d2 tests/integration/programs/fibonacci.rs

To compare outputs against the legacy renderer:

git diff --no-index output-d2/fibonacci.smir.d2 fibonacci.smir.d2

No differences were observed across the existing integration test suite.

Status and follow-ups

The legacy D2 renderer is intentionally kept for now
Once this approach is validated, the D2 renderer can be switched to use the new traversal by default in follow-up commits.
Refactoring the DOT renderer to the same model will follow in a separate PR

Notes for reviewers

This PR is a structural refactor only. Traversal semantics and output behavior are unchanged, and the parallel wiring is intended to make review and validation straightforward.

Note on function IDs
The legacy D2 renderer uses short_name-based function IDs, which are prone to collisions in cases such as multiple monomorphizations of the same generic.

This behavior is intentionally preserved for now so that the new traversal can be validated via byte-for-byte output comparison. Collision avoidance will be addressed in follow-up commits by incorporating body hashes into function IDs.

…ll args

Stevengre · 2026-03-03T09:19:49Z

LGTM! Would you remove the previous implementation before merging? Or just in another PR?

@dkcumming what do you think?

0xh4ty · 2026-03-03T09:24:30Z

Thanks! I’ll remove the legacy D2 implementation in this PR itself.

Stevengre · 2026-03-03T09:33:39Z

src/mk_graph/traverse.rs

+/// Format-agnostic MIR graph traversal.
+/// Owns traversal order and graph semantics, delegates rendering to `GraphBuilder`.
+pub fn render_graph<B: GraphBuilder>(smir: &SmirJson, mut builder: B) -> B::Output {
+    let ctx = GraphContext::from_smir(smir);


pub fn to_d2_file_new(&self) -> String { let ctx = GraphContext::from_smir(self); render_graph(self, D2Builder::new(&ctx)) }

It seems not efficient. It builds GraphContext twice.

Good catch. I’ll move GraphContext construction out of render_graph so it is built only once and passed through.

Stevengre · 2026-03-03T09:58:13Z

src/mk_graph/traverse.rs

+) {
+    let fn_id = short_name(name);
+    let label = name_lines(name);
+    let is_local = true;


Should be body.is_some() ?

Yes, that should be body.is_some(). I’ll fix that.

Stevengre · 2026-03-03T10:07:28Z

src/mk_graph/traverse.rs

+pub trait GraphBuilder {
+    type Output;
+
+    fn begin_graph(&mut self, name: &str);


I'd like to have some description here.

Got it. I’ll expand the doc comment.

dkcumming

@0xh4ty this is great work! I love what you have done so far! I think it's time to go all in and get the old stuff out and hook up the full implementation of d2 and dot (okay if that is another PR if necessary). I think isolating the shared logic out in the traverse module is a great improvement. Do you think you are fine to do the full conversion? Also I think Jianhong had some great feedback too

cds-amal

Good work @0xh4ty . @Stevengre gave some excellent feedback, and I added a few notes on the trait boundary design.

The traversal extraction is the right idea, and the hard part (generic traversal order, clean separation of items/blocks/edges) is already solid. The main thing to tighten up is where the format-agnostic boundary actually sits: a couple of the trait methods (block, block_edge) still carry D2-specific assumptions, which means the next renderer would inherit those assumptions rather than being free to do its own thing.

The comments above walk through the specifics. Once those methods pass structured data instead of raw MIR types and format-specific conventions, this will be a clean foundation for DOT, Mermaid, and whatever else comes next.

I also want to see more comments, as I want this project to be usable via cargo doc --open, one more sign of a mature Cargo library.

Finally, my vote is to rip out the old, and use this as the way forward. Though, you will have to make sure the operational parts remain consistent so we can generate graphs :)

cds-amal · 2026-03-03T23:34:26Z

src/mk_graph/traverse.rs

+) {
+    let fn_id = short_name(name);
+    let label = name_lines(name);
+    let is_local = true;


this true assignment is unconditional; is it needed?

This will be removed.

cds-amal · 2026-03-03T23:55:33Z

src/mk_graph/output/d2.rs

+        self.buf.push_str(&format!("  bb{}: \"{}\"\n", idx, label));
+    }
+
+    fn block_edge(&mut self, _fn_id: &str, from: usize, to: usize, _label: Option<&str>) {


Same issue here: block_edge bakes D2's arrow syntax into the implementation, and the Option<&str> label parameter (which it looks like D2Builder ignores?) is a tell that the signature was shaped around what D2 needs today, with a speculative parameter tacked on for a future renderer.

This is the second method where format-specific concerns leak through the trait boundary, so it's worth stepping back and asking: what's the trait actually buying us?

A well-drawn trait boundary gives you two things: the code above it can change without touching the code below (new traversal logic doesn't require touching every renderer), and the code below can change without touching the code above (new output format, same traversal). But that only works if the boundary is at the right level of abstraction. When trait methods receive raw MIR types or carry format-specific assumptions, every new renderer has to understand the same internals, and the trait becomes ceremony rather than insulation.

The fix is the same as for block: have the traversal own the "what" (which blocks connect, what the edge represents) and pass structured, pre-rendered data. Let each renderer own the "how" (syntax, escaping, layout). That way the trait boundary actually earns its keep.

Thanks, that makes sense. The label parameter was speculative and is not used by the current renderers, so I will remove it to avoid leaking format assumptions into the trait. I will keep traversal responsible for edge semantics and pass structured, pre-rendered data to builders, while the renderers handle syntax and layout.

cds-amal · 2026-03-04T07:34:28Z

Hey @0xh4ty , I had the opportunity to think this through some more and, well, here you go :)

First: the GraphBuilder trait is the right abstraction. Separating "walk the graph" from "emit format-specific syntax" is exactly where we want to end up, and the incremental rollout (old path kept, new path behind an env var) is a clean way to get there without breaking anything.

The question I want to think through is: what happens when we go to add DOT, Mermaid, Markdown, or other formats on top of this? Right now, each builder holds a &GraphContext, calls ctx.render_stmt() / ctx.render_terminator() / ctx.render_operand() itself, and carries a lifetime parameter. That means every new format re-implements the same rendering logic. And the trait's surface area will grow as formats discover they need additional hooks (locals_table, empty_body, external_function, etc.); each of those becomes a new method that every builder has to implement, even if it's a no-op.

The thing is, the current GraphBuilder is quietly conflating two jobs: (1) "what is the graph structure?" (nodes, edges, containment) and (2) "how do I turn a Statement or Terminator into a human-readable string?" Every builder currently does both. If we pull those apart, adding a new format becomes "just format pre-rendered strings into your syntax" rather than "re-implement MIR rendering and format-specific syntax."

So here's a concrete idea: instead of the driver pushing raw stable_mir types to the builder via a sequence of granular calls (begin_function / block / block_edge / call_edge / end_function), the driver produces a RenderedFunction struct that the builder receives as one cohesive unit:

/// A single basic block with pre-rendered content and structural edges.
struct RenderedBlock<'a> {
    idx: usize,
    stmts: Vec<String>,              // pre-rendered via GraphContext
    terminator: String,               // pre-rendered
    raw_terminator: &'a Terminator,   // escape hatch for structural inspection
    cfg_edges: Vec<(usize, Option<String>)>,  // (target_block, optional label)
}

/// A fully analyzed function, ready for format-specific rendering.
struct RenderedFunction<'a> {
    id: String,                       // stable ID (short_name + body hash)
    display_name: String,             // name_lines() output for labels
    is_local: bool,
    locals: Vec<(usize, String)>,     // (index, type_with_layout string)
    blocks: Vec<RenderedBlock<'a>>,
    call_edges: Vec<CallEdge>,        // resolved callee IDs and pre-rendered args
}

struct CallEdge {
    block_idx: usize,
    callee_id: String,
    callee_name: String,
    rendered_args: String,
}

The traversal driver does all the GraphContext work (rendering statements, resolving callees, computing function IDs) and hands RenderedFunction values to the builder. The trait itself simplifies:

trait GraphBuilder {
    type Output;

    fn begin_graph(&mut self, name: &str);
    fn alloc_legend(&mut self, lines: &[String]);
    fn type_legend(&mut self, lines: &[String]);
    fn external_function(&mut self, id: &str, name: &str);
    fn render_function(&mut self, func: &RenderedFunction);
    fn static_item(&mut self, id: &str, name: &str);
    fn asm_item(&mut self, id: &str, content: &str);
    fn finish(self) -> Self::Output;
}

What does this actually buy us? A few things:

No lifetime on builders. RenderedFunction owns its strings, so the builder doesn't need &GraphContext or a lifetime parameter. D2Builder becomes a plain struct, not D2Builder<'a>.

Rendering logic written once. The driver calls ctx.render_stmt(), ctx.render_terminator(), ctx.render_operand(). No builder ever touches GraphContext. You write the rendering code once; every format gets it for free.

Still flexible where it matters. The raw_terminator field (and potentially raw_stmts if we add it later) is an escape hatch for builders that need structural information; say, coloring blocks differently based on terminator kind, or visually distinguishing Call terminators from Assert. Builders that don't need this just ignore it.

Fewer trait methods. The granular push-based sequence (begin_function / block / block_edge / end_function) collapses into one render_function(&RenderedFunction) call. Builders that want the push style can still iterate func.blocks internally; the trait just doesn't force a particular traversal order.

New hooks emerge as data, not methods. Instead of adding locals_table(), empty_body(), label_to_entry_edge() as separate trait methods that every builder must implement (even as no-ops), these become fields on RenderedFunction that builders can inspect or ignore. func.locals is empty if there are no locals; func.blocks is empty if there's no body. The builder decides how to handle each case in its own render_function impl. No trait churn.

One more thing worth noting: function IDs. The current branch uses short_name(name) for graph node IDs, which works for simple cases but will collide when two monomorphizations of the same generic (say, foo::<i32> and foo::<u64>) produce the same short name. A make_fn_id helper that incorporates a body hash as a tiebreaker fixes this; with the RenderedFunction approach, that helper lives in the driver and produces func.id, so no format has to worry about it:

/// Generate a stable, unique ID from a symbol name and body.
/// Handles the case where multiple monomorphizations of the same
/// generic produce the same short_name by incorporating a body hash.
fn make_fn_id(name: &str, body: &Body) -> String { ... }

In terms of concrete next steps, I'd suggest:

Introduce RenderedBlock, RenderedFunction, and CallEdge structs (could live in traverse.rs alongside the driver).
Have render_graph produce RenderedFunction values and pass them to the builder.
Simplify D2Builder: drop the &'a GraphContext field and lifetime parameter, implement the narrower render_function(&RenderedFunction) method.
Verify the D2 output is identical (should be a pure refactor; no output change).
As a stretch goal: try implementing a second format (DOT or Mermaid) on top of the new trait to validate that it's genuinely easy to add.

Of course, feel free to push back, if you have a different roadmap in mind.

0xh4ty · 2026-03-04T15:21:06Z

Thanks for the detailed explanation. That separation between graph structure and MIR string rendering makes sense. I will refactor the driver to produce RenderedFunction, RenderedBlock and CallEdge data with pre-rendered strings and simplify the GraphBuilder trait to operate on that representation. This should also remove the need for GraphContext inside builders and eliminate the lifetime parameter. I’ll update the PR accordingly.

0xh4ty added 10 commits March 2, 2026 14:53

mk_graph: introduce GraphBuilder trait

81ffe3b

d2: extract item traversal helper

565f2c9

d2: introduce D2Builder struct

95eee61

d2: implement GraphBuilder for D2Builder

34debc9

mk_graph: add generic graph traversal

fa119cd

d2: wire GraphBuilder traversal behind flag

a1799b8

mk_graph: make traversal independent of D2 formatting

ac9931b

mk_graph: add comments clarifying traversal responsibilities

cab4f95

mk_graph: extend GraphBuilder to pass statements, terminators, and ca…

7aa45ff

…ll args

d2: elide explicit lifetime in GraphBuilder impl

dbb514a

0xh4ty requested a review from a team March 2, 2026 09:57

Stevengre reviewed Mar 3, 2026

View reviewed changes

dkcumming reviewed Mar 3, 2026

View reviewed changes

cds-amal self-requested a review March 3, 2026 23:25

cds-amal requested changes Mar 4, 2026

View reviewed changes

Conversation

0xh4ty commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Why this change

Validation and testing

Status and follow-ups

Notes for reviewers

Uh oh!

Stevengre commented Mar 3, 2026

Uh oh!

0xh4ty commented Mar 3, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dkcumming left a comment

Choose a reason for hiding this comment

Uh oh!

cds-amal left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cds-amal commented Mar 4, 2026

Uh oh!

0xh4ty commented Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

0xh4ty commented Mar 2, 2026 •

edited

Loading

cds-amal left a comment •

edited

Loading