
MCP + MCTS Coding Agents

This repository contains the code and Docker setup for the O’Reilly Demo Day session:

Building Robust AI Apps and Agents with MCP


Demo Overview

This code shows how autonomous agents can use MCP not just to connect with tools but to communicate with each other.

We run a multi-agent coding system where agents collaborate by:

  • generating Python solutions,
  • testing and benchmarking them inside a sandbox,
  • refining based on structured feedback,
  • and evolving code together toward higher quality.

MCP acts as the shared protocol between reasoning agents, enabling structured coordination, reflection, and continuous improvement.
This is a step toward self-improving AI systems.

Why MCP?

Traditional local execution (like a Python REPL) comes with risks and limitations:

  • Security risks: Executing arbitrary generated code on the host machine can lead to data leaks or system damage.
  • Lack of isolation: Mistakes (e.g. infinite loops, recursion errors) can hang the process.
  • Difficult scaling: Running code on multiple agents across nodes requires orchestration and sandboxing.

MCP (Model Context Protocol) solves these problems:

  • Runs Python in Pyodide + Deno, sandboxed from the host OS.
  • Automatically installs packages into the sandbox when needed.
  • Captures stdout, stderr, and return values cleanly.
  • Works across nodes and clusters (e.g. Kubernetes with gVisor).
  • Provides a shared protocol for inter-agent communication, not just tool calls.

This makes MCP an excellent fit for multi-agent systems where reasoning agents collaborate safely.
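As a concrete illustration, here is a minimal sketch of a client executing code through such a sandbox. It assumes Deno plus the jsr:@pydantic/mcp-run-python server and the official mcp Python client; the exact wiring in this repo's scripts may differ.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumes Deno is installed locally; the repo's container presumably
# launches an equivalent stdio server via entrypoint.sh.
server = StdioServerParameters(
    command="deno",
    args=[
        "run", "-N", "-R=node_modules", "-W=node_modules",
        "--node-modules-dir=auto", "jsr:@pydantic/mcp-run-python", "stdio",
    ],
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # The code runs in Pyodide, never on the host; stdout, stderr,
            # and the return value come back as structured content.
            result = await session.call_tool(
                "run_python_code", {"python_code": "print(sum(range(10)))"}
            )
            print(result.content[0].text)

asyncio.run(main())
```

Because execution happens inside Pyodide, an infinite loop or a destructive call can at worst ruin the sandbox, never the host.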


Prerequisites

  • Docker installed (version 24+ recommended).
  • An OpenAI API key stored in a .env file.

Your .env file should look like this:

OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o

Step 1: Build Docker image

The provided Dockerfile builds a container with:

  • Python 3.11 slim
  • Deno (for the MCP Pyodide backend)
  • mcp-run-python, treequest, langgraph, langchain-openai, and other dependencies

Build the image:

docker build -t mcp-run-python:latest .

Step 2: Start the MCP server (stdio mode)

This runs the Python MCP server inside the container:

docker run --rm -it mcp-run-python:latest /usr/local/bin/entrypoint.sh stdio

Step 3: Run the Tree Search + MCP Demo

From the project root, run:

docker run --rm -it \
  --env-file .env \
  -v "$PWD":/app -w /app \
  mcp-run-python:latest \
  python treesearch_fib.py

This executes the LangGraph + TreeQuest + MCP pipeline of sandboxed agents (the core search loop is sketched after this list). The system will:

  • Iteratively generate Fibonacci implementations,
  • Run unit tests and benchmarks in the sandbox,
  • Refine the code based on structured feedback,
  • Track performance improvements step by step,
  • And finally print the best evolved solution with a score.
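Under the hood, the search loop looks roughly like the sketch below. Here llm_generate and sandbox_score are hypothetical stand-ins for the demo's LLM call and its MCP-sandboxed test/benchmark run; the TreeQuest calls follow its AB-MCTS interface.

```python
import treequest as tq

# Hypothetical helpers: llm_generate(parent_code) asks the model for a new
# or refined candidate, and sandbox_score(code) runs unit tests + benchmarks
# through MCP and returns a score in [0, 1].
def expand(parent_state):
    parent_code = parent_state["code"] if parent_state else None
    code = llm_generate(parent_code)   # generate fresh (root) or refine (child)
    return {"code": code}, sandbox_score(code)

algo = tq.ABMCTSA()                    # adaptive-branching MCTS
tree = algo.init_tree()
for _ in range(20):                    # search budget
    tree = algo.step(tree, {"refine": expand})

best_state, best_score = tq.top_k(tree, algo, k=1)[0]
print(best_score, best_state["code"])
```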

Example Output

See exmaple_output.png in the repository for a screenshot of a sample run.

Best answer

```python
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b
```

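For example, list(fibonacci(10)) produces [0, 1, 1, 2, 3, 5, 8, 13, 21, 34].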
Key Idea

  • Each agent generates or refines code using the LLM.
  • MCP provides a safe execution sandbox for correctness + performance checks (see the harness sketch after this list).
  • Agents communicate through MCP, sharing results and corrections.
  • Over iterations, the system self-improves its code automatically.
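To make the sandbox check concrete: a hypothetical correctness-plus-performance harness might look like the template below. The agent formats a candidate into it and ships the result to the sandbox; the JSON printed to stdout becomes the structured feedback for the next refinement.

```python
# Hypothetical harness template; build_payload() output would be sent to
# the sandbox via the run_python_code tool.
HARNESS = '''
import json, time

{candidate}

tests = [(0, []), (1, [0]), (5, [0, 1, 1, 2, 3])]
passed = all(list(fibonacci(n)) == want for n, want in tests)

start = time.perf_counter()
list(fibonacci(10_000))
elapsed = time.perf_counter() - start

print(json.dumps({{"passed": passed, "elapsed": elapsed}}))
'''

def build_payload(candidate_code: str) -> str:
    # Inject the candidate's source into the test + benchmark template.
    return HARNESS.format(candidate=candidate_code)
```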

How to Expand This Demo

This repo is a boilerplate: the intention is to move beyond the idea that MCP is just a tool connector.
Instead, think of MCP as shared infrastructure for agent collaboration.

Here are some ways to extend this demo:

1. Sandbox Variations

  • Run MCP servers on Kubernetes with gVisor for secure multi-node isolation.
  • Swap in E2B sandboxes for cloud-based ephemeral compute.
  • Add specialized MCP servers (databases, APIs, file systems).

2. Search Algorithms

  • Replace TreeQuest AB-MCTS with:
    • Classic MCTS
    • Beam search
    • Best-of-N sampling
    • Hybrid evolutionary search

Each will shape how agents explore and refine code.
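For instance, Best-of-N sampling collapses the tree into a flat loop. A minimal sketch, reusing the hypothetical llm_generate / sandbox_score helpers from the search-loop sketch above:

```python
# Best-of-N: score n independent candidates, keep the winner. No tree state,
# so exploration breadth is traded for zero refinement depth.
def best_of_n(n: int = 16):
    candidates = [llm_generate(None) for _ in range(n)]
    scored = [(sandbox_score(code), code) for code in candidates]
    return max(scored)  # (best_score, best_code)
```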

3. Benchmarks

  • Extend beyond runtime + memory:
    • Add time complexity inference from LLMs.
    • Include robustness benchmarks (edge cases, random inputs).
    • Add energy efficiency or cost metrics for sustainability-aware code.

4. Unit Tests

  • Provide custom unit test harnesses for different problem domains (sorting, graph algorithms, data structures).
  • Introduce an agent that writes new unit tests on the fly to challenge other agents.
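Such a tester agent could start as small as the sketch below, where llm_complete(prompt) is a hypothetical one-shot LLM call; the generated tests would then run against the current best candidate inside the MCP sandbox.

```python
# Hypothetical adversarial tester: ask the LLM for tests designed to break
# the candidate, rather than confirm it.
def adversarial_tests(candidate_code: str) -> str:
    prompt = (
        "Write assert-based Python unit tests that try to break this "
        "function on edge cases (n = 0, negative n, very large n):\n\n"
        + candidate_code
    )
    return llm_complete(prompt)
```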

5. Multi-Agent Roles

  • Add agents with distinct roles:
    • Coder agent: writes code.
    • Tester agent: builds unit tests dynamically.
    • Benchmark agent: runs stress tests.
    • Reviewer agent: evaluates readability and style.

These can coordinate via MCP to continuously improve solutions.
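One possible wiring for these roles is the minimal LangGraph sketch below. The node bodies are placeholders; in the real pipeline each would call the LLM and/or an MCP tool.

```python
from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class AgentState(TypedDict):
    code: str
    tests: str
    report: str

# Placeholder nodes: each returns a partial state update.
def coder(state: AgentState) -> dict:
    return {"code": "def fibonacci(n): ..."}        # LLM writes/refines code

def tester(state: AgentState) -> dict:
    return {"tests": "assert ..."}                  # LLM writes unit tests

def benchmark(state: AgentState) -> dict:
    return {"report": "runtime: ..."}               # sandbox stress tests

def reviewer(state: AgentState) -> dict:
    return {"report": state["report"] + " | style: ok"}  # readability review

builder = StateGraph(AgentState)
builder.add_node("coder", coder)
builder.add_node("tester", tester)
builder.add_node("benchmark", benchmark)
builder.add_node("reviewer", reviewer)
builder.add_edge(START, "coder")
builder.add_edge("coder", "tester")
builder.add_edge("tester", "benchmark")
builder.add_edge("benchmark", "reviewer")
builder.add_edge("reviewer", END)

graph = builder.compile()
print(graph.invoke({"code": "", "tests": "", "report": ""}))
```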


License

This demo is provided as part of O’Reilly Demo Day educational content.
It is licensed under the MIT License.
