
Add internal code mode module for programmatic notebook control #8670

Merged: manzt merged 27 commits into main from push-qtlwwmptyrvr on Mar 16, 2026
@manzt (Collaborator) commented Mar 12, 2026

This introduces `marimo._code_mode`, an internal, agent-only API that gives programmatic access to a running marimo notebook. The motivating use case is letting agents (e.g. from a scratchpad) insert, delete, replace, and reorder cells without going through the frontend UI. For example:

import marimo._code_mode as cm

async with cm.get_context() as ctx:
    # Install packages (queued, installed before cell ops)
    ctx.install_packages("pandas", "polars>=0.20")

    # Cell ops (appends at end by default)
    cid = ctx.create_cell("import pandas as pd")
    ctx.create_cell("df = pd.DataFrame()", after=cid)
    ctx.create_cell("setup()", before=cid, hide_code=True, disabled=True)
    
    ctx.update_cell("my_cell", code="x = 42")
    ctx.update_cell("other", hide_code=False, disabled=True)
    ctx.delete_cell("old_cell")
    ctx.move_cell("my_cell", after="other_cell")

    # Set UI element values (batched)
    ctx.set_ui_value(slider, 10)

# Dry-run compile check is on by default; disable with:
async with cm.get_context(check=False) as ctx:
    ...


Copilot AI (Contributor) left a comment

Pull request overview

Adds an internal marimo._code_mode module to let agents programmatically edit a running notebook (insert/delete/replace/reorder cells and set UI values) by reducing edits into a plan and applying them to the kernel, with accompanying tests.

Changes:

  • Introduces marimo._code_mode with edit descriptors (_edits.py) and an async runtime context (_context.py) to apply edits and broadcast notifications.
  • Adds tests for plan building and apply_edit behavior.
  • Makes the kernel control loop more resilient by catching/logging exceptions during control request handling.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| `marimo/_code_mode/_edits.py` | Defines immutable edit descriptors (NotebookCellData, NotebookEdit) used to describe notebook edits. |
| `marimo/_code_mode/_context.py` | Implements AsyncCodeModeContext to apply edits against a live kernel/graph and send notifications/execute cells. |
| `marimo/_code_mode/__init__.py` | Exposes the internal API surface and provides usage docs/examples. |
| `marimo/_runtime/runtime.py` | Wraps kernel.handle_message() in the control loop with exception logging. |
| `tests/_code_mode/test_plan_building.py` | Adds unit tests for reducing edits into a plan (_build_plan). |
| `tests/_code_mode/test_apply_edit.py` | Adds integration-style tests for applying edits to a Kernel and observing graph/notifications. |
| `tests/_code_mode/__init__.py` | Initializes the new test package. |


Comment on lines +3576 to +3582
    try:
        await kernel.handle_message(request)
    except Exception:
        LOGGER.exception(
            "Failed to handle control request: %s",
            type(request).__name__,
        )

Copilot AI commented Mar 12, 2026

This new try/except logs and continues on any kernel.handle_message() exception. Two concerns: (1) exceptions from the merged UpdateUIElementCommand/ModelCommand branch (above) are still unhandled and can still crash the control loop, so the exception handling is inconsistent; (2) swallowing exceptions here may leave the client without a response/notification. Consider either applying the same guard to the merged branch and/or emitting an explicit failure notification/teardown so the system fails in a controlled way.
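
The second concern can be illustrated with a small sketch in which the loop both logs the failure and sends an explicit failure message, so the client is not left waiting. This is a hypothetical illustration rather than marimo's control loop; `control_loop`, `handle`, and `notify_failure` are assumed names.

```python
import asyncio
import logging

LOGGER = logging.getLogger("control_loop_sketch")


async def control_loop(requests, handle, notify_failure):
    """Handle each request; on failure, log it and report it, then continue."""
    for request in requests:
        try:
            await handle(request)
        except Exception as exc:
            LOGGER.exception(
                "Failed to handle control request: %s",
                type(request).__name__,
            )
            # Assumption: the client receives an explicit failure message.
            await notify_failure(request, exc)


async def demo():
    handled, failures = [], []

    async def handle(request):
        if request == "bad":
            raise ValueError("boom")
        handled.append(request)

    async def notify_failure(request, exc):
        failures.append((request, str(exc)))

    await control_loop(["a", "bad", "b"], handle, notify_failure)
    return handled, failures


handled, failures = asyncio.run(demo())
print(handled, failures)
```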

Comment on lines +252 to +255
    if entry.config is not None:
        cell.configure(entry.config.asdict())
        self._kernel.cell_metadata[entry.cell_id] = CellMetadata(
            config=entry.config

Copilot AI commented Mar 12, 2026

Test coverage gap: there’s no test that updates code for an existing cell without providing config and asserts the cell’s previous config (from kernel.cell_metadata) is preserved. Adding a regression test around this branch would catch config-loss issues when recompiling/re-registering a cell.

Suggested change

    - if entry.config is not None:
    -     cell.configure(entry.config.asdict())
    -     self._kernel.cell_metadata[entry.cell_id] = CellMetadata(
    -         config=entry.config
    + # Preserve existing config if no new config is provided
    + cfg = entry.config
    + if cfg is None:
    +     existing_metadata = self._kernel.cell_metadata.get(
    +         entry.cell_id
    +     )
    +     if existing_metadata is not None:
    +         cfg = existing_metadata.config
    + if cfg is not None:
    +     cell.configure(cfg.asdict())
    +     self._kernel.cell_metadata[entry.cell_id] = CellMetadata(
    +         config=cfg

Base automatically changed from push-ukxtvoylltuk to manzt/agent-cli March 12, 2026 21:12
@manzt manzt force-pushed the manzt/agent-cli branch from 101f362 to d6592fb Compare March 12, 2026 21:18
@manzt manzt force-pushed the push-qtlwwmptyrvr branch from ae4e010 to b1e266e Compare March 12, 2026 21:18
Base automatically changed from manzt/agent-cli to main March 12, 2026 21:40
@manzt manzt force-pushed the push-qtlwwmptyrvr branch from 543988e to dd19df2 Compare March 12, 2026 21:41
manzt and others added 18 commits March 15, 2026 21:12
The code mode API lacked a way to install packages programmatically.
This adds `install_packages(*packages)` which reads the user's
configured package manager and passes pip-style specifiers directly
through to it. Specifiers like `polars>=0.20` are passed as-is rather
than being parsed into name/version pairs, since pip and uv handle
the full specifier natively.

```python
async with cm.get_context() as nb:
    await nb.install_packages("pandas", "polars>=0.20")
```

Tests mock the underlying `package_manager.install()` to verify
each specifier string reaches the package manager unchanged.

The context module was getting large with op types, plan building, and
validation mixed in alongside the public AsyncCodeModeContext class.
This moves all internal machinery (op dataclasses, `_build_plan`,
`_validate_ops`, `_PlanEntry`) into a dedicated `_plan.py` module so
`_context.py` stays focused on the context manager API.

Also renames `test_apply_edit.py` to `test_context.py` to match the
module it tests, and switches notification assertions to use
`msgspec.to_builtins()` for typed snapshot comparisons instead of manual
dict-building helpers.

The code mode context's `install_packages` was async and executed
immediately, which didn't fit the batched mutation pattern used by
`add_cell`, `update_cell`, etc. It's now synchronous and queues
packages, which are installed one-by-one in `__aexit__` before cell ops
are applied. This ensures newly added cells can import the
just-installed packages.

```py
async with cm.get_context() as ctx:
    ctx.install_packages("pandas", "numpy>=2.0")
    ctx.add_cell("import pandas as pd")
```
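
The ordering described above (queue inside the block, install packages on exit, then apply cell ops) can be sketched as follows. `SketchContext` and its `log` attribute are hypothetical stand-ins for illustration; the real installer and cell machinery are replaced by log entries, and skipping the flush when the block raises is an assumption.

```python
import asyncio


class SketchContext:
    """Hypothetical stand-in for the code mode context (not marimo's class)."""

    def __init__(self) -> None:
        self._packages: list[str] = []
        self._cell_ops: list[str] = []
        self.log: list[str] = []  # records the order in which work is applied

    async def __aenter__(self) -> "SketchContext":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:  # assumption: nothing is flushed if the block raised
            for spec in self._packages:  # packages first, one by one
                self.log.append(f"install {spec}")
            for code in self._cell_ops:  # then cell ops
                self.log.append(f"add_cell {code}")
        return False  # never swallow exceptions from the block

    def install_packages(self, *packages: str) -> None:
        # Synchronous: only queues the specifiers, passed through unchanged.
        self._packages.extend(packages)

    def add_cell(self, code: str) -> None:
        self._cell_ops.append(code)


async def demo() -> list[str]:
    async with SketchContext() as ctx:
        ctx.add_cell("import pandas as pd")  # queued before the install call
        ctx.install_packages("pandas", "numpy>=2.0")
    return ctx.log


print(asyncio.run(demo()))
```

Even though the cell is queued first, the installs appear first in the log, which is what lets a new cell import a package requested in the same block.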

When code is executed via the `/api/kernel/execute` scratchpad endpoint,
`MissingPackageAlertNotification` is now suppressed from reaching the
frontend via a new Room-level suppression mechanism on
`session.scoped()`. The listener still captures the notification through
the event bus and surfaces a helpful error suggesting
`ctx.install_packages()` instead of the raw `ModuleNotFoundError`
traceback. The convention in docstrings is also updated from `nb` to
`ctx` throughout.

Calling `ctx.add_cell(...)` without `async with cm.get_context()`
silently queues operations that never flush, making it look like the
call succeeded when nothing actually happened. This was discovered
during agent-driven notebook editing where the missing `async with`
caused cells to never appear in the UI.

All mutating methods (`add_cell`, `update_cell`, `delete_cell`,
`move_cell`, `install_packages`) now check an `_entered` flag set by
`__aenter__` and raise immediately with a clear message showing the
correct pattern.
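
A minimal sketch of such a guard is shown below. `GuardedContext` is a hypothetical class, and both the exception type (`RuntimeError`) and the message wording are assumptions, since the commit message only says the methods raise with a clear message.

```python
import asyncio


class GuardedContext:
    """Hypothetical sketch of the entered-flag guard (not marimo's class)."""

    def __init__(self) -> None:
        self._entered = False
        self._ops: list[str] = []

    async def __aenter__(self) -> "GuardedContext":
        self._entered = True
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        return False

    def _require_entered(self) -> None:
        if not self._entered:
            # Assumption: exception type and wording are illustrative only.
            raise RuntimeError(
                "Operations must be queued inside "
                "`async with cm.get_context() as ctx:`"
            )

    def add_cell(self, code: str) -> None:
        self._require_entered()
        self._ops.append(code)


async def demo() -> int:
    async with GuardedContext() as ctx:
        ctx.add_cell("x = 1")  # allowed: the context has been entered
    return len(ctx._ops)


print(asyncio.run(demo()))

try:
    GuardedContext().add_cell("x = 1")  # no `async with`: fails loudly
except RuntimeError as err:
    print(err)
```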
Extracts `is_headless_request()` into `_messaging/context.py` (replaces
the private `_accepts_html()` in `tracebacks.py`). The missing-packages
hook now checks this before broadcasting
`MissingPackageAlertNotification`, so non-browser clients like the
`/execute` SSE endpoint no longer receive the alert — the
`ModuleNotFoundError` already surfaces as a cell error with a
`ctx.install_packages(...)` hint.

This also removes the Room-level `_suppressed_types` mechanism and the
`suppress` parameter on `session.scoped()`, which were added to solve
the same problem with more machinery.

The code mode context manager now validates all queued cell ops before
touching the graph. On `__aexit__`, `_dry_run_compile` compiles each
cell with `compile_cell`, temporarily registers it in the graph, and
checks for newly introduced multiply-defined names or cycles. If
anything fails the graph is restored and no mutations occur. Existing
graph problems are snapshotted beforehand so only new issues are
flagged.

`get_context()` accepts `check=True` (default) to control this. The
runtime errors include a `check=False` hint for callers who want to
bypass validation intentionally.
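
The idea of flagging only newly introduced problems can be sketched with the standard `ast` module. This is a simplified illustration, not marimo's `_dry_run_compile`: it covers only the multiply-defined-names check (cycle detection is omitted), treats plain assignments as definitions, and the helper names `defs_of`, `multiply_defined`, and `dry_run` are hypothetical.

```python
import ast


def defs_of(code: str) -> set[str]:
    """Names assigned in a cell; raises SyntaxError if the code does not parse."""
    return {
        node.id
        for node in ast.walk(ast.parse(code))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }


def multiply_defined(cells: dict[str, str]) -> set[str]:
    """Names defined by more than one cell."""
    seen: set[str] = set()
    dupes: set[str] = set()
    for code in cells.values():
        defs = defs_of(code)
        dupes |= defs & seen
        seen |= defs
    return dupes


def dry_run(cells: dict[str, str], proposed: dict[str, str]) -> None:
    """Raise if `proposed` introduces new problems; `cells` is never mutated."""
    existing = multiply_defined(cells)        # snapshot pre-existing problems
    trial = {**cells, **proposed}             # validate against a copy
    new = multiply_defined(trial) - existing  # flag only what the batch adds
    if new:
        raise RuntimeError(
            f"Multiply-defined names: {sorted(new)} "
            "(pass check=False to skip this check)"
        )


cells = {"a": "x = 1"}
dry_run(cells, {"b": "y = x + 1"})  # fine: no new conflicts
try:
    dry_run(cells, {"b": "x = 2"})  # conflicts with cell "a"
except RuntimeError as err:
    print(err)
```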

`set_ui_value` is now sync and queue-based like the other mutation
methods. Updates are collected during the block and flushed as a single
batched `UpdateUIElementCommand` on exit, after cell ops are applied.

When `update_cell` changed a cell's code without explicitly passing
config kwargs, the cell's existing configuration (e.g. `hide_code`,
`disabled`) was silently lost. The recompiled cell would get a bare
default config, and the frontend notification would hardcode
`hide_code=True` regardless of prior state.

Now the `code_changed` path reads back the cell's existing `CellConfig`
from kernel metadata and carries it forward. The notification fallback
also uses stored metadata instead of a hardcoded default. New cells
created via `create_cell` still default to `hide_code=True`.
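
The carry-forward rule can be sketched as a small merge function in which `None` means "keep the stored value". `SketchConfig` and `resolve_config` are hypothetical names used only for illustration; marimo's actual `CellConfig` handling may differ.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for a cell config (not marimo's CellConfig)."""

    hide_code: bool = False
    disabled: bool = False


def resolve_config(
    existing: Optional[SketchConfig],
    hide_code: Optional[bool] = None,
    disabled: Optional[bool] = None,
) -> SketchConfig:
    """Merge explicit overrides onto the stored config; None keeps the old value."""
    # New cells (no stored config) default to hide_code=True, as described above.
    base = existing if existing is not None else SketchConfig(hide_code=True)
    overrides = {
        key: value
        for key, value in {"hide_code": hide_code, "disabled": disabled}.items()
        if value is not None
    }
    return replace(base, **overrides)


stored = SketchConfig(hide_code=False, disabled=True)
print(resolve_config(stored))                  # code-only update keeps both flags
print(resolve_config(stored, hide_code=True))  # only hide_code changes
print(resolve_config(None))                    # new cell default
```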
These methods are the primary interface for AI agents editing notebooks
programmatically. The previous docstrings were terse one-liners that
didn't explain arguments or show usage patterns. The new docstrings
follow the marimo convention (Examples with code blocks, then Args) so
that agents working with this API can understand the semantics of each
parameter — particularly the "None means keep existing" behavior of
`update_cell` and the `draft` flag.

The code mode context was calling `graph.delete_cell()` directly when
removing or updating cells, bypassing the kernel's cleanup path. This
left stale variables in `kernel.globals`, leaked UI elements (no
`RemoveUIElementsNotification`), and skipped lifecycle hook disposal.

Now deletions go through `Kernel._delete_cell` and updates through
`Kernel._deactivate_cell`, which properly invalidate globals, dispose UI
elements, and fire lifecycle hooks. We use these internal primitives
directly rather than `DeleteCellCommand` because we need synchronous
graph manipulation within a single atomic batch — the command path would
trigger `_run_cells` for descendants, which we already handle via
`ExecuteCellsCommand` at the end.

Also merges the previously duplicated `is_new` / `code_changed` branches
in `_apply_ops`, preserves existing cell config on code-only updates,
simplifies the notification config lookup, and fixes a mypy error where
`_cell_manager` should have been the public `cell_manager` property.

The previous `_apply_ops` manually compiled cells, registered them in
the graph, and called private kernel methods (`_delete_cell`,
`_deactivate_cell`) to clean up state. This reimplemented half of what
`mutate_graph` already does and was fragile — any changes to the
kernel's cleanup path would need to be mirrored here.

Now `_apply_ops` builds `ExecuteCellCommand` / `DeleteCellCommand` lists
and passes them to `mutate_graph`, which handles compilation,
registration, deletion, globals cleanup, UI element disposal, and
lifecycle hooks through its established code path. Configs are resolved
before the call (since `mutate_graph` may delete metadata for replaced
cells) and applied after registration. Draft cells are excluded from the
run set returned by `mutate_graph`.

`mutate_graph` calls `_deactivate_cell` (which removes a cell from the
dict) then `_try_registering_cell` (which re-adds it at the end). This
means updated cells lose their position in the ordering. Since the code
mode plan tracks the intended cell order, we need to reorder the graph's
internal dict after mutation to match.

Adds `Topology.reorder_nodes` to rearrange the cells dict in-place, and
calls it in `_apply_ops` right after `mutate_graph` returns.
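
Because Python dicts preserve insertion order, removing and re-adding a key moves it to the end. A minimal sketch of reordering a dict in place is shown below; `reorder_in_place` is a hypothetical helper and is not claimed to match the signature of `Topology.reorder_nodes`.

```python
def reorder_in_place(cells: dict, order: list) -> None:
    """Rearrange `cells` so its keys follow `order`, keeping the same dict object."""
    if len(order) != len(cells) or set(order) != set(cells):
        raise ValueError("order must be a permutation of the existing keys")
    items = [(key, cells[key]) for key in order]
    cells.clear()        # same object, so existing references stay valid
    cells.update(items)  # re-insert in the intended order


cells = {"a": 1, "b": 2, "c": 3}
cells["a"] = cells.pop("a")  # simulates deactivate + re-register: "a" moves last
print(list(cells))
reorder_in_place(cells, ["a", "b", "c"])
print(list(cells))
```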
The dry-run compile check in `_dry_run_compile` evicts and re-registers
cells to validate updates, but `register_cell` appends to the end of the
dict. This corrupts the cell ordering that `_apply_ops` reads on line
620 via `list(self.graph.cells.keys())`, so the plan built from that
ordering preserves the wrong position. The fix snapshots the cell order
before any mutations and restores it after cleanup.

Separately, `run_scratchpad` was not flushing `state_updates` after
execution. When a widget `.observe()` callback calls a `mo.state` setter
from the scratchpad, the update gets queued but never processed because
`run_scratchpad` returns without calling `_run_cells(set())`. Other code
paths like `set_ui_element_value` and `handle_receive_model_message`
already do this flush. Adding the same pattern to `run_scratchpad` fixes
downstream cell reactivity for programmatic widget state changes.

The dry-run compile check in `_dry_run_compile` did not simulate
`_DeleteOp`, so deleting a cell and creating a replacement that defines
the same names would falsely raise "Multiply-defined names". Now delete
ops evict cells from the graph during the dry run, matching the behavior
already in place for `_UpdateOp`.

Also renames `update_cell` to `edit_cell` for a clearer API.

When cell operations are applied via `async with cm.get_context()`,
`__aexit__` now prints a line per operation to stdout. This gives agents
confirmation that ops took effect without needing to re-query the graph.
Previously, success was completely silent, making it impossible to
distinguish "operations applied" from "nothing happened" when executing
remotely via the scratchpad.

Output looks like:

    ✓ created cell 'data_loader'
    ✓ edited code of cell 'a1b2c3d4'
    ✓ deleted cell 'scratch'

Cell names are used when available, otherwise the first 8 characters
of the cell ID.
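
The labelling rule can be sketched in a few lines. `op_line` is a hypothetical helper written for illustration; only the output format and the name-or-first-8-characters rule come from the description above.

```python
from typing import Optional


def op_line(verb: str, cell_id: str, name: Optional[str] = None) -> str:
    """Format one confirmation line: prefer the cell name, else a short ID."""
    label = name if name else cell_id[:8]
    return f"✓ {verb} cell '{label}'"


print(op_line("created", "Hbol", name="data_loader"))
print(op_line("edited code of", "a1b2c3d4e5f6"))
```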
When using `/api/execute` in headless mode, the notebook may not have been
instantiated yet, meaning the kernel's globals are empty and scratchpad
code can't reference notebook variables. This adds a
`session.instantiate()` call before enqueuing the scratchpad command. The
kernel already no-ops if the notebook is already instantiated, so this
is safe on every call. Queue ordering guarantees instantiation completes
before the scratchpad runs.

The code_mode API lacked support for cell names and couldn't convert a
regular cell into a setup cell. `create_cell` and `edit_cell` now accept
a `name` parameter. Passing `name="setup"` uses the well-known setup
cell ID so the frontend recognizes it as a setup cell — the name itself
is cleared since setup identity is purely a cell_id concern.

`edit_cell` handles the tricky case: when an existing cell is converted
to setup, it migrates the cell_id via a `new_cell_id` field on
`_UpdateOp`. The plan builder swaps the ID in place (preserving
position), and `_apply_ops` sees the old ID as deleted and the new one
as added, which is exactly what `mutate_graph` needs.

Cell IDs now use `CellIdGenerator` (producing short 4-letter IDs like
`Hbol`) instead of UUIDs. The generator is seeded with existing graph
cell IDs to avoid collisions. This was necessary because `cell_manager`
is unreachable from the kernel process — it lives on the server side.

```python
async with cm.get_context() as ctx:
    ctx.create_cell("import marimo as mo", name="setup")
    ctx.edit_cell("Hbol", name="setup")  # migrates cell_id
```
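
The collision-avoidance idea (seed the generator with the IDs already in the graph, then reject any candidate that has been seen) can be sketched as follows. `SketchIdGenerator` is hypothetical and is not marimo's `CellIdGenerator`; the alphabet and the `rng_seed` parameter are assumptions for illustration.

```python
import random
import string


class SketchIdGenerator:
    """Hypothetical 4-letter ID generator that avoids known IDs."""

    def __init__(self, existing=(), rng_seed=None) -> None:
        self._seen = set(existing)            # IDs already present in the graph
        self._rng = random.Random(rng_seed)   # rng_seed only makes runs repeatable

    def new_id(self) -> str:
        while True:
            candidate = "".join(self._rng.choices(string.ascii_letters, k=4))
            if candidate not in self._seen:   # retry on collision
                self._seen.add(candidate)
                return candidate


gen = SketchIdGenerator(existing={"Hbol"}, rng_seed=0)
print(gen.new_id(), gen.new_id())
```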
When batching multiple operations in a single `async with` block,
`before`/`after` targets could only reference cells by their live graph
name or by a pending add's cell ID. This meant that `create_cell(...,
name="foo")` followed by `create_cell(..., after="foo")` would fail, and
so would referencing a cell by a name assigned via `edit_cell` in the
same batch.

`_resolve_target` now also searches pending adds by name and queued
`_UpdateOp` renames, so all of these work within a single batch:

    ctx.create_cell("x = 1", name="first")
    ctx.create_cell("y = x + 1", after="first")

    ctx.edit_cell("old_cell", name="renamed")
    ctx.create_cell("z = 1", after="renamed")
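
A minimal sketch of this kind of lookup is shown below. It is not marimo's `_resolve_target`: the `PendingAdd` type, the argument names, and the lookup order (live IDs, live names, pending adds, pending renames) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingAdd:
    """Hypothetical record of a cell queued for creation in the current batch."""

    cell_id: str
    name: Optional[str] = None


def resolve_target(
    target: str,
    live_ids: set[str],
    live_names: dict[str, str],       # cell name -> cell ID in the live graph
    pending_adds: list[PendingAdd],
    pending_renames: dict[str, str],  # new name -> cell ID, from queued edits
) -> str:
    """Resolve a `before`/`after` target to a cell ID."""
    if target in live_ids:
        return target
    if target in live_names:
        return live_names[target]
    for add in pending_adds:  # cells created earlier in the same batch
        if target in (add.cell_id, add.name):
            return add.cell_id
    if target in pending_renames:  # names assigned earlier in the same batch
        return pending_renames[target]
    raise KeyError(f"Unknown cell target: {target!r}")


adds = [PendingAdd(cell_id="Hbol", name="first")]
print(resolve_target("first", set(), {}, adds, {}))
print(resolve_target("renamed", {"Xyzw"}, {"old_cell": "Xyzw"}, [], {"renamed": "Xyzw"}))
```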
@manzt manzt enabled auto-merge (squash) March 16, 2026 03:55
@manzt manzt disabled auto-merge March 16, 2026 03:55
@manzt manzt merged commit 1e5a7bf into main Mar 16, 2026
35 of 43 checks passed
@manzt manzt deleted the push-qtlwwmptyrvr branch March 16, 2026 03:55
@github-actions

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.20.5-dev70


Labels: enhancement (New feature or request), preview (Experimental or preview-only feature)
