
Runtime formatting & JSON serialization: performance, determinism, and design concerns #1001

@emil14

Description

The current runtime message implementation uses JSON serialization as the canonical string representation for most message types (String(), MarshalJSON(), printing to stdout, etc.). This aligns well with the language design goal of strict separation between code and data and the guarantee that all runtime data is serializable.

However, the current approach introduces several performance, correctness, and maintainability concerns, especially as message sizes and output frequency grow.

This issue proposes reviewing and potentially redesigning the runtime formatting / serialization layer.

Current behavior (summary)

From the runtime implementation:

  • Most message types (BoolMsg, IntMsg, FloatMsg, StringMsg, ListMsg, DictMsg, StructMsg) implement:

    • String() → often calls MarshalJSON()
    • MarshalJSON() → usually delegates to encoding/json
  • Composite types (ListMsg, DictMsg, StructMsg) rely on:

    • Go reflection (json.Marshal)
    • Temporary maps / slices
    • Post-processing JSON strings with strings.ReplaceAll to insert spaces
  • UnionMsg.String() manually formats JSON-like output using fmt.Sprintf

  • Printing and formatting are implicitly tied to JSON encoding, even for simple debug output
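A minimal sketch of the pattern described above, for readers who have not looked at the runtime code. ListMsg here is a simplified, hypothetical stand-in that holds arbitrary Go values rather than Msg values. The exact ReplaceAll call is also an assumption: the issue only says spaces are inserted, so a plain "," → ", " substitution is used.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ListMsg is a simplified, hypothetical stand-in for the runtime's list
// message; the real type holds Msg values, not arbitrary Go values.
type ListMsg struct{ items []any }

// MarshalJSON delegates to encoding/json, which inspects each element
// via reflection and returns a freshly allocated []byte.
func (m ListMsg) MarshalJSON() ([]byte, error) {
	return json.Marshal(m.items)
}

// String reuses MarshalJSON, converts []byte to string (a copy), and then
// makes another full pass to insert a space after every comma.
func (m ListMsg) String() string {
	b, err := m.MarshalJSON()
	if err != nil {
		return "<error: " + err.Error() + ">"
	}
	return strings.ReplaceAll(string(b), ",", ", ")
}

func main() {
	fmt.Println(ListMsg{items: []any{1, 2, 3}})  // [1, 2, 3]
	fmt.Println(ListMsg{items: []any{1, "a,b"}}) // [1, "a, b"] (string content altered)
}
```

The second call shows a possible hazard of post-processing encoded text: if the replacement targets a bare comma, commas inside string values are rewritten too. Whether the real code is affected depends on the exact pattern it replaces.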

Identified issues

1. Performance overhead

  • encoding/json relies on reflection and allocates intermediate values when encoding composite values
  • MarshalJSON returns a []byte, and converting it to a string copies the whole buffer again
  • Each strings.ReplaceAll call makes another full pass over the encoded data and, when it finds a match, allocates a new string
  • Printing complex values on hot paths may become CPU-bound on encoding, even when I/O is not the bottleneck

2. Inconsistent serialization paths

  • Some types use json.Marshal, others hand-format JSON (UnionMsg)
  • Spacing and formatting rules are not centralized
  • String() and MarshalJSON() semantics are tightly coupled but implemented differently across types

3. Determinism concerns

  • DictMsg and StructMsg rely on Go maps internally during marshaling
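  • Go map iteration order is randomized, so key order is only stable where keys are sorted explicitly (encoding/json sorts map keys, but any code that ranges over a map directly does not)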
  • This affects reproducibility, snapshot tests, and debugging

4. Semantic edge cases

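  • JSON loses type distinctions (e.g. int vs. float, and uint if it is added in the future): with encoding/json, 1 and 1.0 encode to the same text
  • NaN / ±Inf handling for FloatMsg is undefined: encoding/json rejects these values with an error, and no alternative representation is defined
  • Binary data would require additional conventions (encoding/json, for example, encodes []byte as a base64 string that is indistinguishable from ordinary text)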
  • Round-tripping printed output back into the language is ambiguous

5. API & design rigidity

  • Formatting is implicitly JSON-only

  • No distinction between:

    • fast debug printing
    • canonical JSON serialization
    • user-facing formatting
  • Hard to optimize or evolve without touching many message types

Why this matters

While JSON-as-default is a reasonable MVP decision, the runtime now:

  • Pays the full JSON encoding cost even for debug output that does not need canonical JSON
  • Mixes concerns (debug printing vs canonical serialization)
  • Makes future optimizations harder due to scattered logic

Given that the runtime already has a closed set of message kinds (BoolMsg, IntMsg, ListMsg, StructMsg, UnionMsg, etc.), this is an opportunity to:

  • Avoid reflection entirely
  • Centralize formatting logic
  • Make performance characteristics explicit and predictable

Proposed directions (non-exclusive)

Option A: Centralized non-reflective encoder

  • Implement a custom encoder that operates directly on Msg variants
  • Stream output to io.Writer instead of allocating []byte
  • Preserve JSON as the canonical format, but without encoding/json

Option B: Separate concerns

  • String() → fast, deterministic, debug-oriented representation
  • MarshalJSON() / ToJSON() → canonical JSON serialization
  • Keep JSON for interop, not for every print

Option C: Explicit formatting policy

  • Define a single formatting contract for:

    • spacing
    • ordering
    • union encoding
  • Ensure determinism by sorting keys / fields explicitly
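A possible shape for Option C, where spacing, ordering and union encoding are defined in one place. The Policy type, its fields, the Canonical and Compact presets and the {"tag": ..., "value": ...} union shape are all hypothetical. Values are passed in already encoded, and keys are assumed to need no escaping.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Policy is a hypothetical single source of truth for formatting rules.
type Policy struct {
	ItemSep  string // between list items / dict entries
	KeySep   string // between a key and its value
	UnionTag string // key naming the active union variant
	UnionVal string // key holding the variant's payload
}

// Canonical and Compact are two policies sharing one implementation.
var (
	Canonical = Policy{ItemSep: ", ", KeySep: ": ", UnionTag: "tag", UnionVal: "value"}
	Compact   = Policy{ItemSep: ",", KeySep: ":", UnionTag: "tag", UnionVal: "value"}
)

// Dict formats already-encoded values under p, always sorting keys.
// Keys are assumed not to need JSON escaping, to keep the sketch short.
func (p Policy) Dict(fields map[string]string) string {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var sb strings.Builder
	sb.WriteByte('{')
	for i, k := range keys {
		if i > 0 {
			sb.WriteString(p.ItemSep)
		}
		sb.WriteString(`"` + k + `"` + p.KeySep + fields[k])
	}
	sb.WriteByte('}')
	return sb.String()
}

// Union encodes a tagged variant as a two-field dict under the same policy.
func (p Policy) Union(tag, encodedValue string) string {
	return p.Dict(map[string]string{p.UnionTag: `"` + tag + `"`, p.UnionVal: encodedValue})
}

func main() {
	fmt.Println(Canonical.Union("Some", "42")) // {"tag": "Some", "value": 42}
	fmt.Println(Compact.Union("Some", "42"))   // {"tag":"Some","value":42}
}
```

Because keys are always sorted, the union's field order follows the chosen key names rather than any declaration order; a real policy might prefer to fix that order explicitly.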

Acceptance criteria / next steps

  • Decide whether JSON should remain the default for String()

  • Measure current performance (benchmarks on nested messages)

  • Identify hot paths (printing, logging, REPL, tests)

  • Prototype either:

    • a custom encoder, or
    • a split between debug printing and JSON serialization
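For the measurement step, a first rough comparison can be made without the runtime types. The sketch uses a nested []any as a hypothetical proxy for nested messages and compares the json.Marshal + ReplaceAll path with a direct type-switch writer that produces the same text. A proper benchmark would live in a _test.go file as a BenchmarkXxx function run with go test -bench . -benchmem; testing.AllocsPerRun is used here only so the sketch runs as a plain program.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
	"testing"
)

// nested builds a hypothetical nested list (depth levels, width items each)
// as a rough proxy for nested runtime messages.
func nested(depth, width int) any {
	if depth == 0 {
		return 42
	}
	items := make([]any, width)
	for i := range items {
		items[i] = nested(depth-1, width)
	}
	return items
}

// viaJSON mirrors the current path: reflection-based encoding, a
// []byte -> string copy, and a ReplaceAll pass.
func viaJSON(v any) string {
	b, _ := json.Marshal(v)
	return strings.ReplaceAll(string(b), ",", ", ")
}

// direct writes the same text with a type switch and no post-processing.
func direct(sb *strings.Builder, v any) {
	switch x := v.(type) {
	case int:
		sb.WriteString(strconv.Itoa(x))
	case []any:
		sb.WriteByte('[')
		for i, item := range x {
			if i > 0 {
				sb.WriteString(", ")
			}
			direct(sb, item)
		}
		sb.WriteByte(']')
	}
}

func viaDirect(v any) string {
	var sb strings.Builder
	direct(&sb, v)
	return sb.String()
}

func main() {
	v := nested(3, 4)
	fmt.Println("same output:", viaJSON(v) == viaDirect(v))
	// AllocsPerRun reports the average number of heap allocations per call.
	fmt.Println("json allocs/op:  ", testing.AllocsPerRun(100, func() { _ = viaJSON(v) }))
	fmt.Println("direct allocs/op:", testing.AllocsPerRun(100, func() { _ = viaDirect(v) }))
}
```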

References

Relevant runtime code:

  • Msg interface
  • ListMsg.MarshalJSON
  • DictMsg.MarshalJSON
  • StructMsg.MarshalJSON
  • UnionMsg.String

Notes

This issue is not about changing language semantics.
It is about making formatting explicit, deterministic, and efficient, while preserving the core design principle: all runtime data is serializable.

Metadata

Assignees

No one assigned

    Labels

    idea (Thinking needed), optimisation (Make it fast), p2 (Someday we should do it. I hope)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
