Add internal code mode module for programmatic notebook control#8670
Pull request overview
Adds an internal marimo._code_mode module to let agents programmatically edit a running notebook (insert/delete/replace/reorder cells and set UI values) by reducing edits into a plan and applying them to the kernel, with accompanying tests.
Changes:
- Introduces `marimo._code_mode` with edit descriptors (`_edits.py`) and an async runtime context (`_context.py`) to apply edits and broadcast notifications.
- Adds tests for plan building and `apply_edit` behavior.
- Makes the kernel control loop more resilient by catching/logging exceptions during control request handling.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| `marimo/_code_mode/_edits.py` | Defines immutable edit descriptors (`NotebookCellData`, `NotebookEdit`) used to describe notebook edits. |
| `marimo/_code_mode/_context.py` | Implements `AsyncCodeModeContext` to apply edits against a live kernel/graph and send notifications/execute cells. |
| `marimo/_code_mode/__init__.py` | Exposes the internal API surface and provides usage docs/examples. |
| `marimo/_runtime/runtime.py` | Wraps `kernel.handle_message()` in the control loop with exception logging. |
| `tests/_code_mode/test_plan_building.py` | Adds unit tests for reducing edits into a plan (`_build_plan`). |
| `tests/_code_mode/test_apply_edit.py` | Adds integration-style tests for applying edits to a `Kernel` and observing graph/notifications. |
| `tests/_code_mode/__init__.py` | Initializes the new test package. |
    try:
        await kernel.handle_message(request)
    except Exception:
        LOGGER.exception(
            "Failed to handle control request: %s",
            type(request).__name__,
        )
This new try/except logs and continues on any `kernel.handle_message()` exception. Two concerns: (1) exceptions from the merged `UpdateUIElementCommand`/`ModelCommand` branch (above) are still unhandled and can still crash the control loop, so the exception handling is inconsistent; (2) swallowing exceptions here may leave the client without a response/notification. Consider applying the same guard to the merged branch, emitting an explicit failure notification/teardown, or both, so the system fails in a controlled way.
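A minimal sketch of the first suggestion, assuming a queue-driven loop: routing every branch through one guarded helper keeps the handling consistent, and the `failures` list stands in for an explicit failure notification. The names `guarded`, `control_loop`, and `failures` are hypothetical and are not taken from marimo's runtime.

```python
import asyncio
import logging

LOGGER = logging.getLogger("control_loop_sketch")


async def guarded(handler, request, failures):
    """Run one handler call; log and record a failure instead of raising."""
    try:
        await handler(request)
    except Exception:
        LOGGER.exception(
            "Failed to handle control request: %s", type(request).__name__
        )
        failures.append(request)  # stand-in for a failure notification


async def control_loop(queue, handler, failures):
    """Drain the queue until a None sentinel arrives."""
    while True:
        request = await queue.get()
        if request is None:
            break
        # Every branch would go through the same guard.
        await guarded(handler, request, failures)


async def demo():
    handled, failures = [], []

    async def handler(request):
        if request == "bad":
            raise ValueError("boom")
        handled.append(request)

    queue = asyncio.Queue()
    for item in ("a", "bad", "b", None):
        queue.put_nowait(item)
    await control_loop(queue, handler, failures)
    return handled, failures
```

In this sketch the loop keeps serving requests after the failing one, and the caller can still see which request failed.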
marimo/_code_mode/_context.py
Outdated
    if entry.config is not None:
        cell.configure(entry.config.asdict())
        self._kernel.cell_metadata[entry.cell_id] = CellMetadata(
            config=entry.config
Test coverage gap: there’s no test that updates code for an existing cell without providing config and asserts the cell’s previous config (from kernel.cell_metadata) is preserved. Adding a regression test around this branch would catch config-loss issues when recompiling/re-registering a cell.
    - if entry.config is not None:
    -     cell.configure(entry.config.asdict())
    -     self._kernel.cell_metadata[entry.cell_id] = CellMetadata(
    -         config=entry.config
    + # Preserve existing config if no new config is provided
    + cfg = entry.config
    + if cfg is None:
    +     existing_metadata = self._kernel.cell_metadata.get(
    +         entry.cell_id
    +     )
    +     if existing_metadata is not None:
    +         cfg = existing_metadata.config
    + if cfg is not None:
    +     cell.configure(cfg.asdict())
    +     self._kernel.cell_metadata[entry.cell_id] = CellMetadata(
    +         config=cfg
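The fallback in the suggestion can be isolated into a small pure function, which also makes the requested regression test easy to write. This is a sketch only: `Config`, `Metadata`, and `resolve_config` are hypothetical stand-ins, not marimo's `CellConfig` or `CellMetadata`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:  # hypothetical stand-in for CellConfig
    hide_code: bool = False
    disabled: bool = False


@dataclass(frozen=True)
class Metadata:  # hypothetical stand-in for CellMetadata
    config: Config


def resolve_config(new_config, cell_id, metadata):
    """Prefer the new config; otherwise fall back to the stored one."""
    if new_config is not None:
        return new_config
    existing = metadata.get(cell_id)
    return existing.config if existing is not None else None


# A code-only update (new_config=None) keeps the previous config.
stored = {"a": Metadata(Config(hide_code=True))}
kept = resolve_config(None, "a", stored)
replaced = resolve_config(Config(disabled=True), "a", stored)
unknown = resolve_config(None, "missing", stored)
```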
The code mode API lacked a way to install packages programmatically.
This adds `install_packages(*packages)` which reads the user's
configured package manager and passes pip-style specifiers directly
through to it. Specifiers like `polars>=0.20` are passed as-is rather
than being parsed into name/version pairs, since pip and uv handle
the full specifier natively.
```python
async with cm.get_context() as nb:
await nb.install_packages("pandas", "polars>=0.20")
```
Tests mock the underlying `package_manager.install()` to verify
each specifier string reaches the package manager unchanged.
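The testing approach described above might look roughly like the following sketch. It assumes an awaitable `install(spec)` that takes one specifier string; that signature and the `install_all` helper are assumptions made for illustration and are not confirmed by this PR.

```python
import asyncio
from unittest.mock import AsyncMock


async def install_all(package_manager, *packages):
    """Pass each specifier through untouched (hypothetical helper)."""
    for spec in packages:
        await package_manager.install(spec)


# AsyncMock records every awaited call, so the test can inspect the
# exact strings that reached the (mocked) package manager.
package_manager = AsyncMock()
asyncio.run(install_all(package_manager, "pandas", "polars>=0.20"))
received = [call.args for call in package_manager.install.await_args_list]
```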
The context module was getting large with op types, plan building, and validation mixed in alongside the public AsyncCodeModeContext class. This moves all internal machinery (op dataclasses, `_build_plan`, `_validate_ops`, `_PlanEntry`) into a dedicated `_plan.py` module so `_context.py` stays focused on the context manager API. Also renames `test_apply_edit.py` to `test_context.py` to match the module it tests, and switches notification assertions to use `msgspec.to_builtins()` for typed snapshot comparisons instead of manual dict-building helpers.
The code mode context's `install_packages` was async and executed
immediately, which didn't fit the batched mutation pattern used by
`add_cell`, `update_cell`, etc. It's now synchronous and queues
packages, which are installed one-by-one in `__aexit__` before cell ops
are applied. This ensures newly added cells can import the
just-installed packages.
```py
async with cm.get_context() as ctx:
ctx.install_packages("pandas", "numpy>=2.0")
ctx.add_cell("import pandas as pd")
```
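The ordering guarantee can be illustrated with a toy context manager. `SketchContext` is a hypothetical stand-in that only records the order of side effects; skipping the flush when the block raises is an assumption of this sketch, not something the commit message states.

```python
import asyncio


class SketchContext:
    """Hypothetical stand-in for the code mode context; not marimo's class."""

    def __init__(self, log):
        self._log = log  # records the order of side effects
        self._packages = []
        self._cells = []

    def install_packages(self, *packages):
        self._packages.extend(packages)  # queue only, no I/O here

    def add_cell(self, code):
        self._cells.append(code)

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # assumption: do not flush if the block raised
        for spec in self._packages:  # packages first, one by one
            self._log.append(("install", spec))
        for code in self._cells:  # then cell ops
            self._log.append(("cell", code))
        return False


async def demo():
    log = []
    async with SketchContext(log) as ctx:
        # Call order inside the block does not matter; flush order does.
        ctx.add_cell("import pandas as pd")
        ctx.install_packages("pandas", "numpy>=2.0")
    return log
```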
When code is executed via the `/api/kernel/execute` scratchpad endpoint,
`MissingPackageAlertNotification` is now suppressed from reaching the
frontend via a new Room-level suppression mechanism on
`session.scoped()`. The listener still captures the notification through
the event bus and surfaces a helpful error suggesting
`ctx.install_packages()` instead of the raw `ModuleNotFoundError`
traceback. The convention in docstrings is also updated from `nb` to
`ctx` throughout.
Calling `ctx.add_cell(...)` without `async with cm.get_context()` silently queues operations that never flush, making it look like the call succeeded when nothing actually happened. This was discovered during agent-driven notebook editing where the missing `async with` caused cells to never appear in the UI. All mutating methods (`add_cell`, `update_cell`, `delete_cell`, `move_cell`, `install_packages`) now check an `_entered` flag set by `__aenter__` and raise immediately with a clear message showing the correct pattern.
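A minimal sketch of the guard, assuming a single boolean flag checked by each mutating method. `GuardedContext` and the error message wording are hypothetical; only the `_entered` flag name comes from the description above.

```python
import asyncio


class GuardedContext:
    """Hypothetical sketch of the `_entered` check; not marimo's class."""

    def __init__(self):
        self._entered = False
        self.ops = []

    def _require_entered(self):
        if not self._entered:
            raise RuntimeError(
                "Operations must be queued inside "
                "`async with cm.get_context() as ctx:`"
            )

    def add_cell(self, code):
        self._require_entered()
        self.ops.append(("add", code))

    async def __aenter__(self):
        self._entered = True
        return self

    async def __aexit__(self, *exc_info):
        self._entered = False
        return False


async def demo():
    ctx = GuardedContext()
    try:
        ctx.add_cell("x = 1")  # outside the block: fails loudly
    except RuntimeError as err:
        early_error = str(err)
    else:
        early_error = None
    async with ctx:
        ctx.add_cell("x = 1")  # inside the block: queued
    return early_error, ctx.ops
```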
Extracts `is_headless_request()` into `_messaging/context.py` (replaces the private `_accepts_html()` in `tracebacks.py`). The missing-packages hook now checks this before broadcasting `MissingPackageAlertNotification`, so non-browser clients like the `/execute` SSE endpoint no longer receive the alert; the `ModuleNotFoundError` already surfaces as a cell error with a `ctx.install_packages(...)` hint. This also removes the Room-level `_suppressed_types` mechanism and the `suppress` parameter on `session.scoped()`, which were added to solve the same problem with more machinery.
The code mode context manager now validates all queued cell ops before touching the graph. On `__aexit__`, `_dry_run_compile` compiles each cell with `compile_cell`, temporarily registers it in the graph, and checks for newly introduced multiply-defined names or cycles. If anything fails the graph is restored and no mutations occur. Existing graph problems are snapshotted beforehand so only new issues are flagged. `get_context()` accepts `check=True` (default) to control this. The runtime errors include a `check=False` hint for callers who want to bypass validation intentionally. `set_ui_value` is now sync and queue-based like the other mutation methods. Updates are collected during the block and flushed as a single batched `UpdateUIElementCommand` on exit, after cell ops are applied.
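The "snapshot existing problems, flag only new ones" idea can be sketched without a real graph. The version below is a simplification under stated assumptions: it uses `ast` to collect top-level definitions instead of `compile_cell`, works on a plain `dict` of cell ID to code, and does not attempt cycle detection. All function names are hypothetical.

```python
import ast
from collections import Counter


def defined_names(code):
    """Top-level names a cell assigns, imports, or defines (simplified)."""
    names = set()
    for node in ast.parse(code).body:  # raises SyntaxError on bad code
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                names.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.Assign, ast.AnnAssign, ast.AugAssign)):
            for sub in ast.walk(node):
                if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store):
                    names.add(sub.id)
    return names


def multiply_defined(cells):
    """Names defined by more than one cell."""
    counts = Counter(n for code in cells.values() for n in defined_names(code))
    return {name for name, count in counts.items() if count > 1}


def check_batch(cells, updates, deletes=()):
    """Return names that become multiply defined only because of this batch."""
    before = multiply_defined(cells)  # snapshot of pre-existing problems
    proposed = {cid: code for cid, code in cells.items() if cid not in deletes}
    proposed.update(updates)
    return multiply_defined(proposed) - before


cells = {"a": "x = 1", "b": "y = x + 1"}
conflict = check_batch(cells, {"c": "x = 2"})
replaced = check_batch(cells, {"c": "x = 2"}, deletes={"a"})
preexisting = check_batch({"a": "x = 1", "b": "x = 2"}, {"c": "z = 3"})
```

Because the check runs on a copy, nothing needs to be restored when it fails; the real implementation works against the live graph and therefore has to undo its temporary registrations.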
When `update_cell` changed a cell's code without explicitly passing config kwargs, the cell's existing configuration (e.g. `hide_code`, `disabled`) was silently lost. The recompiled cell would get a bare default config, and the frontend notification would hardcode `hide_code=True` regardless of prior state. Now the `code_changed` path reads back the cell's existing `CellConfig` from kernel metadata and carries it forward. The notification fallback also uses stored metadata instead of a hardcoded default. New cells created via `create_cell` still default to `hide_code=True`.
These methods are the primary interface for AI agents editing notebooks programmatically. The previous docstrings were terse one-liners that didn't explain arguments or show usage patterns. The new docstrings follow the marimo convention (Examples with code blocks, then Args) so that agents working with this API can understand the semantics of each parameter — particularly the "None means keep existing" behavior of `update_cell` and the `draft` flag.
The code mode context was calling `graph.delete_cell()` directly when removing or updating cells, bypassing the kernel's cleanup path. This left stale variables in `kernel.globals`, leaked UI elements (no `RemoveUIElementsNotification`), and skipped lifecycle hook disposal. Now deletions go through `Kernel._delete_cell` and updates through `Kernel._deactivate_cell`, which properly invalidate globals, dispose UI elements, and fire lifecycle hooks. We use these internal primitives directly rather than `DeleteCellCommand` because we need synchronous graph manipulation within a single atomic batch — the command path would trigger `_run_cells` for descendants, which we already handle via `ExecuteCellsCommand` at the end. Also merges the previously duplicated `is_new` / `code_changed` branches in `_apply_ops`, preserves existing cell config on code-only updates, simplifies the notification config lookup, and fixes a mypy error where `_cell_manager` should have been the public `cell_manager` property.
The previous `_apply_ops` manually compiled cells, registered them in the graph, and called private kernel methods (`_delete_cell`, `_deactivate_cell`) to clean up state. This reimplemented half of what `mutate_graph` already does and was fragile — any changes to the kernel's cleanup path would need to be mirrored here. Now `_apply_ops` builds `ExecuteCellCommand` / `DeleteCellCommand` lists and passes them to `mutate_graph`, which handles compilation, registration, deletion, globals cleanup, UI element disposal, and lifecycle hooks through its established code path. Configs are resolved before the call (since `mutate_graph` may delete metadata for replaced cells) and applied after registration. Draft cells are excluded from the run set returned by `mutate_graph`.
`mutate_graph` calls `_deactivate_cell` (which removes a cell from the dict) then `_try_registering_cell` (which re-adds it at the end). This means updated cells lose their position in the ordering. Since the code mode plan tracks the intended cell order, we need to reorder the graph's internal dict after mutation to match. Adds `Topology.reorder_nodes` to rearrange the cells dict in-place, and calls it in `_apply_ops` right after `mutate_graph` returns.
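Reordering a `dict` in place can be done by popping and re-inserting each key in the desired order, since Python dicts preserve insertion order. The helper below is a sketch of that technique on a plain `dict`; it is not the actual `Topology.reorder_nodes` implementation, and the validation step is an assumption.

```python
def reorder_in_place(cells, order):
    """Rearrange `cells` so its keys follow `order`, keeping the same dict object."""
    if len(order) != len(cells) or set(order) != set(cells):
        raise ValueError("order must be a permutation of the existing keys")
    for key in order:
        # pop + re-insert moves the key to the end; doing this for every
        # key in sequence leaves the dict in exactly `order`.
        cells[key] = cells.pop(key)


cells = {"a": 1, "b": 2, "c": 3}
same_object = cells
cells["a"] = cells.pop("a")  # simulate an update that moves "a" to the end
after_update = list(cells)
reorder_in_place(cells, ["a", "b", "c"])
```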
The dry-run compile check in `_dry_run_compile` evicts and re-registers cells to validate updates, but `register_cell` appends to the end of the dict. This corrupts the cell ordering that `_apply_ops` reads on line 620 via `list(self.graph.cells.keys())`, so the plan built from that ordering preserves the wrong position. The fix snapshots the cell order before any mutations and restores it after cleanup. Separately, `run_scratchpad` was not flushing `state_updates` after execution. When a widget `.observe()` callback calls a `mo.state` setter from the scratchpad, the update gets queued but never processed because `run_scratchpad` returns without calling `_run_cells(set())`. Other code paths like `set_ui_element_value` and `handle_receive_model_message` already do this flush. Adding the same pattern to `run_scratchpad` fixes downstream cell reactivity for programmatic widget state changes.
The dry-run compile check in `_dry_run_compile` did not simulate `_DeleteOp`, so deleting a cell and creating a replacement that defines the same names would falsely raise "Multiply-defined names". Now delete ops evict cells from the graph during the dry run, matching the behavior already in place for `_UpdateOp`. Also renames `update_cell` to `edit_cell` for a clearer API.
When cell operations are applied via `async with cm.get_context()`,
`__aexit__` now prints a line per operation to stdout. This gives agents
confirmation that ops took effect without needing to re-query the graph.
Previously, success was completely silent, making it impossible to
distinguish "operations applied" from "nothing happened" when executing
remotely via the scratchpad.
Output looks like:
    ✓ created cell 'data_loader'
    ✓ edited code of cell 'a1b2c3d4'
    ✓ deleted cell 'scratch'
Cell names are used when available, otherwise the first 8 characters
of the cell ID.
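The labeling rule is small enough to sketch directly. `describe_op` is a hypothetical helper; only the output format and the "name, else first 8 characters of the ID" rule come from the description above.

```python
def describe_op(verb, cell_id, name=None):
    """Format one confirmation line; prefer the cell name over the ID."""
    label = name if name else cell_id[:8]
    return f"✓ {verb} cell '{label}'"


named = describe_op("created", "Hbol", name="data_loader")
unnamed = describe_op("edited code of", "a1b2c3d4e5f6")
```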
When using `/api/execute` in headless mode, the notebook may not have been instantiated yet, meaning the kernel's globals are empty and scratchpad code can't reference notebook variables. This adds a `session.instantiate()` call before enqueuing the scratchpad command. The kernel already no-ops if the notebook is already instantiated, so this is safe on every call. Queue ordering guarantees instantiation completes before the scratchpad runs.
The code_mode API lacked support for cell names and couldn't convert a
regular cell into a setup cell. `create_cell` and `edit_cell` now accept
a `name` parameter. Passing `name="setup"` uses the well-known setup
cell ID so the frontend recognizes it as a setup cell — the name itself
is cleared since setup identity is purely a cell_id concern.
`edit_cell` handles the tricky case: when an existing cell is converted
to setup, it migrates the cell_id via a `new_cell_id` field on
`_UpdateOp`. The plan builder swaps the ID in place (preserving
position), and `_apply_ops` sees the old ID as deleted and the new one
as added, which is exactly what `mutate_graph` needs.
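The in-place swap and the resulting deleted/added view can be sketched on plain lists. The value of `SETUP_ID` and both helper names are assumptions for illustration; the real setup cell ID is not given in this PR text.

```python
SETUP_ID = "setup"  # assumption: placeholder for the well-known setup cell ID


def swap_id(order, old_id, new_id):
    """Replace old_id with new_id at the same position in the ordering."""
    if new_id in order:
        raise ValueError(f"cell id {new_id!r} already exists")
    return [new_id if cell_id == old_id else cell_id for cell_id in order]


def diff_ids(before, after):
    """IDs that disappeared and IDs that appeared, in list order."""
    deleted = [cell_id for cell_id in before if cell_id not in after]
    added = [cell_id for cell_id in after if cell_id not in before]
    return deleted, added


order = ["a", "Hbol", "c"]
new_order = swap_id(order, "Hbol", SETUP_ID)
deleted, added = diff_ids(order, new_order)
```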
Cell IDs now use `CellIdGenerator` (producing short 4-letter IDs like
`Hbol`) instead of UUIDs. The generator is seeded with existing graph
cell IDs to avoid collisions. This was necessary because `cell_manager`
is unreachable from the kernel process — it lives on the server side.
```python
async with cm.get_context() as ctx:
ctx.create_cell("import marimo as mo", name="setup")
ctx.edit_cell("Hbol", name="setup") # migrates cell_id
```
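Collision avoidance with a set of known IDs might look like the sketch below. `SeededIdGenerator` is hypothetical and marimo's `CellIdGenerator` may work differently; the fixed random seed is only there to make the example reproducible.

```python
import random
import string


class SeededIdGenerator:
    """Hypothetical sketch; marimo's CellIdGenerator may differ."""

    def __init__(self, existing_ids=(), seed=42):
        self._seen = set(existing_ids)  # IDs already present in the graph
        self._rng = random.Random(seed)

    def create_id(self):
        # Draw 4-letter candidates until one is unused.
        while True:
            candidate = "".join(self._rng.choices(string.ascii_letters, k=4))
            if candidate not in self._seen:
                self._seen.add(candidate)
                return candidate


first = SeededIdGenerator().create_id()
# Same seed, but the first candidate is now taken, so it must be skipped.
second = SeededIdGenerator(existing_ids=[first]).create_id()
```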
When batching multiple operations in a single `async with` block,
`before`/`after` targets could only reference cells by their live graph
name or by a pending add's cell ID. This meant that `create_cell(...,
name="foo")` followed by `create_cell(..., after="foo")` would fail, and
so would referencing a cell by a name assigned via `edit_cell` in the
same batch.
`_resolve_target` now also searches pending adds by name and queued
`_UpdateOp` renames, so all of these work within a single batch:
    ctx.create_cell("x = 1", name="first")
    ctx.create_cell("y = x + 1", after="first")
    ctx.edit_cell("old_cell", name="renamed")
    ctx.create_cell("z = 1", after="renamed")
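One way to picture the extended lookup is a function that checks pending adds, then queued renames, then the live graph. The data shapes and the search order below are assumptions made for this sketch; the actual `_resolve_target` may be structured differently.

```python
def resolve_target(target, graph_names, pending_adds, pending_renames):
    """Resolve a before/after target to a cell ID (hypothetical sketch).

    graph_names: dict mapping live cell name -> cell ID
    pending_adds: list of (cell_id, name or None) queued in this batch
    pending_renames: dict mapping cell ID -> new name queued in this batch
    """
    for cell_id, name in pending_adds:
        if target == cell_id or (name is not None and target == name):
            return cell_id
    for cell_id, new_name in pending_renames.items():
        if target == new_name:
            return cell_id
    if target in graph_names:
        return graph_names[target]
    if target in graph_names.values():
        return target  # already a live cell ID
    raise KeyError(f"unknown cell target: {target!r}")


graph_names = {"old_cell": "Hbol"}
pending_adds = [("Abcd", "first")]
pending_renames = {"Hbol": "renamed"}

by_pending_name = resolve_target("first", graph_names, pending_adds, pending_renames)
by_rename = resolve_target("renamed", graph_names, pending_adds, pending_renames)
by_live_name = resolve_target("old_cell", graph_names, pending_adds, pending_renames)
```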
🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.20.5-dev70
This introduces `marimo._code_mode`, an internal agent-only API that gives programmatic access to a running marimo notebook. The motivating use case is letting agents (e.g. from a scratchpad) insert, delete, replace, and reorder cells without going through the frontend UI. e.g.,