
feat(mcp): MVP MCP server — 5 tools, stdio transport, ~24ms node eval (stacked on #22) #23

Open: misaelzapata wants to merge 3 commits into feat/slirp-net-and-atomic-disk-meta from feat/mcp-server


@misaelzapata (Owner)

Summary

Adds gocracker-mcp, a JSON-RPC 2.0 server that lets AI clients
(Claude Desktop, Claude Code, custom MCP-aware agents) execute code
in gocracker sandboxes via well-typed tools. Speaks
modelcontextprotocol.io spec rev 2025-11-25.

Stacked on top of #22 (feat/slirp-net-and-atomic-disk-meta), which
adds the node-warm runtime path that the process.eval_node tool
depends on. Review after #22 lands.

What ships

5 tools, all thin SDK wrappers

| Tool | Backed by | Use |
|---|---|---|
| `sandbox.lease` | `Client.LeaseSandbox` | Warm-pool lease (~9 ms primitive) |
| `sandbox.delete` | `Client.Delete` | Idempotent teardown |
| `sandbox.recycle` | `Client.Recycle` | release+lease in one round-trip |
| `process.exec` | `ToolboxClient.Exec` | Generic command exec, ~36 ms with node startup |
| `process.eval_node` | `ToolboxClient.Exec(["node-warm", src])` | ~24 ms in-guest JS eval against pre-loaded V8 |

process.eval_node is the marquee differentiator. It routes to the
in-guest pre-loaded node REPL (shipped on #22 as Pieza B); no other
public MCP server has anything close at this latency. State (globals)
persists across calls in the same sandbox — useful for stateful AI
loops.

Transport: stdio (MVP)

Claude Desktop spawns gocracker-mcp as a subprocess, frames JSON-RPC
over stdin/stdout. Stderr is the diagnostic log surface — stdout is
RESERVED for the wire protocol.

Streamable HTTP (multi-tenant, bearer auth) is a follow-up; same
Server.Handle core, just a different framing layer.

Tests

10 tests covering initialize, tools/list (deterministic sort),
tools/call happy + error paths, JSON-RPC version validation, parse
errors, the stdio loop. All against an httptest-backed fake
sandboxd; no real KVM. ~225 LoC.

`go test ./sandboxes/internal/mcp/` → ok 0.005s
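
As a rough illustration of the "httptest-backed fake sandboxd" approach, the sketch below stands up an in-process HTTP server and calls it the way a client would. The `/lease` path, the `{"id": ...}` response shape, and the helper names are assumptions made for this example; they are not taken from the gocracker SDK or from sandboxd's real API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newFakeSandboxd returns an in-process HTTP server answering one
// hypothetical lease endpoint. Path and JSON shape are illustrative only.
func newFakeSandboxd() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/lease", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{"id": "sb-fake"})
	})
	return httptest.NewServer(mux)
}

// leaseID posts to the fake and extracts the sandbox id from the reply.
func leaseID(baseURL string) (string, error) {
	resp, err := http.Post(baseURL+"/lease", "application/json", nil)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	var out struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal(body, &out); err != nil {
		return "", err
	}
	return out.ID, nil
}

func main() {
	srv := newFakeSandboxd()
	defer srv.Close() // no KVM, no real daemon: the fake lives in this process
	id, err := leaseID(srv.URL)
	fmt.Println(id, err)
}
```

Because the fake binds to a loopback port chosen by `httptest`, tests built this way can run in parallel and need no privileges.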

Manual smoke

```bash
$ (echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-11-25","clientInfo":{"name":"smoke","version":"1"}}}'
   echo '{"jsonrpc":"2.0","method":"notifications/initialized"}'
   echo '{"jsonrpc":"2.0","id":2,"method":"tools/list"}') \
  | bin/gocracker-mcp --sandboxd http://127.0.0.1:9091

{"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2025-11-25","capabilities":{"tools":{}},"serverInfo":{"name":"gocracker-mcp","version":"dev"}}}
{"jsonrpc":"2.0","id":2,"result":{"tools":[...5 entries with full inputSchema...]}}
```

Two valid JSON-RPC frames out, all 5 tools listed with their JSON
Schemas. Verified.

Code layout (~1.3k LoC, all new files)

  • `sandboxes/cmd/gocracker-mcp/main.go` — entry binary (99 LoC)
  • `sandboxes/internal/mcp/protocol.go` — wire types (180 LoC)
  • `sandboxes/internal/mcp/server.go` — Server + dispatch (215 LoC)
  • `sandboxes/internal/mcp/tools.go` — 5 tool handlers (335 LoC)
  • `sandboxes/internal/mcp/util.go` — helpers (33 LoC)
  • `sandboxes/internal/mcp/server_test.go` — 10 tests (225 LoC)
  • `docs/design/mcp-server.md` — design doc (208 LoC)

Zero changes to existing packages. This is purely additive.

Why this matters

State of the art for "execute code" MCP servers: Docker (100 ms+
cold start), V8 isolates (no filesystem, no process state), or cold
microVMs (~125–500 ms even warm — E2B, Daytona, Cloudflare Code Mode,
Arrakis, ConTree all cluster here). Per the boxlite/HN research,
nobody combines a real Linux guest with sub-30 ms warm-pool restore
and sub-30 ms in-guest JS eval.

gocracker MCP gets a ~10 ms latency floor while keeping a real Linux
guest with files, processes, snapshots — the AI can `pip install`,
write to `/tmp`, fork a server, and have all of it inside a real
KVM isolation boundary.

Claude Desktop integration

```json
{
  "mcpServers": {
    "gocracker": {
      "command": "/usr/local/bin/gocracker-mcp",
      "args": ["--sandboxd", "http://127.0.0.1:9091"]
    }
  }
}
```

Restart Claude Desktop. Ask: "Lease a base-node-warm sandbox and run
`console.log(process.version)` in it." Claude calls the right tools and
replies with the output.

Next phases (separate PRs after this lands)

Documented in `docs/design/mcp-server.md`:

  1. `sandbox.fan_out` — N=4..64 microsecond CoW fork in one RPC.
    The primitive that nobody else has at this latency. ~200 LoC.
  2. `speculate.race` — fork N children, run candidates concurrently,
    first to satisfy `success_predicate` wins. ~150 LoC.
  3. `checkpoint.tree.diff` — diff two checkpoints (files, env,
    RSS pages). Surfaces gocracker's existing dirty-log capture. ~250 LoC.
  4. `process.exec_stream` — SSE streaming for long-running jobs. ~80 LoC.
  5. `files.put` / `files.get` / `preview.mint` — std SDK
    passthroughs. ~100 LoC each.
  6. Streamable HTTP transport — multi-tenant, bearer auth. ~150 LoC.

Test plan

Adds gocracker-mcp, a JSON-RPC 2.0 server that lets AI clients (Claude
Desktop, Claude Code, custom MCP-aware agents) execute code in
gocracker sandboxes by calling well-typed tools. Speaks
modelcontextprotocol.io spec rev 2025-11-25.

The differentiator vs E2B / Daytona / Cloudflare Code Mode / Arrakis
is the new process.eval_node tool: it routes JS source to an in-guest
pre-loaded V8 instance (the node-warm runtime shipped on
feat/slirp-net-and-atomic-disk-meta cdbcfc7) for ~24 ms in-guest exec
vs ~36 ms for fork+exec'ed `node`. Combined with sandboxd's sub-30 ms
warm-pool restore, an AI tool call lands well under 100 ms total.

# What ships

## 5 tools, all thin SDK wrappers (no VMM state in this server)

- sandbox.lease(template_id, timeout_ms?) → warm-pool lease via
  Client.LeaseSandbox.
- sandbox.delete(id) → Client.Delete.
- sandbox.recycle(id) → release-and-lease in one round-trip.
- process.exec(sandbox_id, cmd[], env?, env_map?, workdir?,
  timeout_ms?, stdin?) → ToolboxClient.Exec.
- process.eval_node(sandbox_id, source, timeout_ms?) → ToolboxClient.Exec
  with cmd[0]="node-warm". Requires base-node-warm template.

Each is ~30–80 LoC; the server holds no state, sandboxd is the source
of truth.
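
To make the tool signatures above concrete, here is a minimal sketch of the `tools/call` frame a client might send for `process.eval_node`. The `name`/`arguments` envelope follows the MCP `tools/call` convention and the argument names come from the list above; the Go type and function names are invented for this example and are not the ones in `protocol.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// callParams mirrors the MCP tools/call params object.
type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// request is a JSON-RPC 2.0 request carrying a tools/call.
type request struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

// evalNodeFrame builds one line-delimited frame asking the server to
// evaluate JS source in an already-leased sandbox.
func evalNodeFrame(id int, sandboxID, source string) ([]byte, error) {
	return json.Marshal(request{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params: callParams{
			Name: "process.eval_node",
			Arguments: map[string]any{
				"sandbox_id": sandboxID,
				"source":     source,
			},
		},
	})
}

func main() {
	frame, err := evalNodeFrame(3, "sb-1", "1+1")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(frame)) // one frame per line on the server's stdin
}
```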

## Stdio transport (default)

Claude Desktop spawns gocracker-mcp as a subprocess, frames JSON-RPC
over stdin/stdout, reads diagnostic logs from stderr. ServeStdio in
sandboxes/internal/mcp/server.go drives the loop until EOF or ctx
cancel.

Stderr is the ONLY log surface — stdout is reserved for JSON-RPC
responses. Polluting stdout breaks the wire protocol. Indirected
through stderrSink so tests can swap it.

## Wire format

JSON-RPC 2.0, line-delimited. Tool errors come back two ways:
- Protocol-level (parse, unknown method, bad params) → JSON-RPC error
  object with -32xxx codes.
- Tool-level (sandbox not found, exec failed) → successful result
  with isError=true and the message in a text content block. The LLM
  sees these inside its own context and adjusts.

# Tests

10 tests covering initialize handshake, tools/list (deterministic
sort), tools/call happy + error paths, JSON-RPC version validation,
parse errors, the stdio loop. All run against an httptest-backed
fake sandboxd; no real KVM needed. ~225 LoC.

  go test ./sandboxes/internal/mcp/  →  ok 0.005s
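
The "deterministic sort" requirement for tools/list exists because Go map iteration order is randomized. A minimal sketch of one way to satisfy it, assuming tools are registered in a map keyed by name (the `tool` type and `sortedTools` helper are hypothetical, not the helper in `util.go`):

```go
package main

import (
	"fmt"
	"sort"
)

// tool is a stand-in for the server's registered tool type.
type tool struct {
	Name        string
	Description string
}

// sortedTools copies the registry into a slice ordered by name, so that
// repeated tools/list calls return byte-identical output.
func sortedTools(reg map[string]tool) []tool {
	out := make([]tool, 0, len(reg))
	for _, t := range reg {
		out = append(out, t)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	reg := map[string]tool{}
	for _, n := range []string{"sandbox.recycle", "process.exec", "sandbox.lease", "process.eval_node", "sandbox.delete"} {
		reg[n] = tool{Name: n}
	}
	for _, t := range sortedTools(reg) {
		fmt.Println(t.Name)
	}
}
```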

# Manual smoke

  $ (echo '{"jsonrpc":"2.0","id":1,"method":"initialize",...}'
     echo '{"jsonrpc":"2.0","method":"notifications/initialized"}'
     echo '{"jsonrpc":"2.0","id":2,"method":"tools/list"}') \
    | bin/gocracker-mcp --sandboxd http://127.0.0.1:9091

Returns 2 valid JSON-RPC frames: initialize result with protocol
version + capabilities, then tools/list result with all 5 tools and
their inputSchema (verified).

# Code layout

- sandboxes/cmd/gocracker-mcp/main.go    (99 LoC)  — entry binary
- sandboxes/internal/mcp/protocol.go    (180 LoC) — wire types
- sandboxes/internal/mcp/server.go      (215 LoC) — Server + dispatch
- sandboxes/internal/mcp/tools.go       (335 LoC) — 5 tool handlers
- sandboxes/internal/mcp/util.go        (33 LoC)  — small helpers
- sandboxes/internal/mcp/server_test.go (225 LoC) — 10 tests
- docs/design/mcp-server.md             (208 LoC) — design doc

Total: ~1.3k LoC, all under sandboxes/cmd/gocracker-mcp and
sandboxes/internal/mcp. Zero changes to existing packages.

# Claude Desktop integration

Add to claude_desktop_config.json:

  {
    "mcpServers": {
      "gocracker": {
        "command": "/usr/local/bin/gocracker-mcp",
        "args": ["--sandboxd", "http://127.0.0.1:9091"]
      }
    }
  }

# Next phases (separate PRs)

Documented in docs/design/mcp-server.md:

1. sandbox.fan_out — N=4..64 microsecond CoW fork in one RPC. The
   primitive that nobody else has at this latency. ~200 LoC.
2. speculate.race — fork N children, run candidates concurrently,
   first to satisfy success_predicate wins. ~150 LoC.
3. checkpoint.tree.diff — diff two checkpoints (files, env, RSS).
   Surfaces gocracker's existing dirty-log capture as a verb. ~250 LoC.
4. process.exec_stream — SSE streaming stdout/stderr. ~80 LoC.
5. files.put / files.get / preview.mint — std SDK passthroughs. ~100 LoC.
6. Streamable HTTP transport (multi-tenant, bearer auth). ~150 LoC.

# Builds on (parent branch)

Stacked on top of feat/slirp-net-and-atomic-disk-meta (PR #22), which
added the node-warm runtime path that process.eval_node depends on.
This MCP branch should be reviewed AFTER #22 lands so the eval_node
tool has a real base-node-warm template to point at.
Copilot AI review requested due to automatic review settings May 7, 2026 01:06

Copilot AI left a comment


Pull request overview

Adds an MVP Model Context Protocol (MCP) JSON-RPC 2.0 server (gocracker-mcp) that exposes gocracker sandbox primitives over stdio, enabling MCP-aware AI clients to lease/recycle sandboxes and execute commands (including the warm node-warm eval path).

Changes:

  • Introduces a new MCP server core (protocol types, dispatch, stdio transport) plus 5 tool handlers backed by the existing Go SDK.
  • Adds unit tests using an httptest fake sandboxd.
  • Adds a design doc describing architecture, wire format, and roadmap.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 9 comments.

| File | Description |
|---|---|
| `sandboxes/internal/mcp/protocol.go` | Defines JSON-RPC/MCP wire types and constants. |
| `sandboxes/internal/mcp/server.go` | Implements request dispatch + stdio framing loop. |
| `sandboxes/internal/mcp/tools.go` | Registers/implements the 5 MVP MCP tools (SDK wrappers). |
| `sandboxes/internal/mcp/util.go` | Adds stderr sink indirection + deterministic tool sorting helper. |
| `sandboxes/internal/mcp/server_test.go` | Adds tests for initialize/tools/list/tools/call + stdio round-trip. |
| `sandboxes/cmd/gocracker-mcp/main.go` | New CLI entrypoint wiring sandboxd client + stdio server loop. |
| `docs/design/mcp-server.md` | Design documentation + quick start instructions. |
| `.gitignore` | Ignores local `.wrangler/` directory. |


Comment on lines +179 to +190
InputSchema: json.RawMessage(`{
"type": "object",
"properties": {
"sandbox_id": {"type": "string"},
"cmd": {"type": "array", "items": {"type": "string"}},
"env": {"type": "object", "additionalProperties": {"type": "string"}},
"workdir": {"type": "string"},
"timeout_ms": {"type": "integer"},
"stdin": {"type": "string"}
},
"required": ["sandbox_id", "cmd"]
}`),
s.RegisterTool(Tool{
Name: "process.exec",
Description: `Run a command inside the sandbox via the toolbox /exec endpoint. Blocks until the process exits ` +
`(or timeout), then returns aggregated stdout/stderr/exit_code. For long-running jobs use process.exec_stream.`,
Comment on lines +81 to +85
// Handle dispatches one JSON-RPC frame. Returns nil for notifications
// (no id), otherwise a Response. Errors are reported as JSON-RPC
// error objects, never raw Go errors — the caller doesn't need to
// translate.
func (s *Server) Handle(ctx context.Context, raw []byte) *Response {
Comment on lines +86 to +92
var req Request
if err := json.Unmarshal(raw, &req); err != nil {
return errorResponse(nil, ErrParseError, "parse error: "+err.Error())
}
if req.JSONRPC != JSONRPCVersion {
return errorResponse(req.ID, ErrInvalidRequest, "jsonrpc must be \"2.0\"")
}
Comment on lines +71 to +90
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

// SIGINT / SIGTERM cancels ctx so ServeStdio returns; the parent
// (Claude Desktop) usually just closes our stdin which also makes
// ServeStdio return. Handling signals is belt-and-suspenders.
sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
go func() {
<-sigCh
cancel()
}()

fmt.Fprintf(os.Stderr, "[gocracker-mcp] starting (sandboxd=%s, version=%s)\n",
*sandboxdURL, buildinfo.Version)

if err := server.ServeStdio(ctx, os.Stdin, os.Stdout); err != nil {
fmt.Fprintf(os.Stderr, "[gocracker-mcp] fatal: %v\n", err)
os.Exit(1)
}
Comment on lines +34 to +40
// initialised tracks whether the client has completed the MCP
// `initialize` handshake. Most methods refuse to run before this
// per the spec; we soft-enforce it (log + serve) to keep the
// server usable from raw `curl` for debugging.
initialised bool
mu sync.Mutex

Comment thread docs/design/mcp-server.md
Comment on lines +171 to +176
```bash
(echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-11-25","clientInfo":{"name":"curl","version":"1"}}}'
 echo '{"jsonrpc":"2.0","method":"notifications/initialized"}'
 echo '{"jsonrpc":"2.0","id":2,"method":"tools/list"}') \
  | gocracker-mcp --sandboxd http://127.0.0.1:9091 \
  | jq .
```
Comment thread docs/design/mcp-server.md
"execute code" MCP servers either run Docker (100 ms+ cold start),
V8 isolates (no filesystem, no process state), or cold microVMs
(~125–500 ms even warm). gocracker's warm-pool + node-warm gets the
~10 ms ms-floor while keeping a real Linux guest with files,
t.Fatalf("expected nil response for notification, got %+v", resp)
}
}

… competitors

Adds docs/perf-snapshot-2026-05-06.md as a single-source-of-truth
reference for the current performance numbers, the timing breakdown,
and the public-product comparison. Useful for:

- Answering HN-style perf questions ("how does this compare to
  boxlite?") without needing to re-bench every time.
- Confirming the gocracker-mcp #23 latency claims with measured data.
- Onboarding contributors to which path each number belongs to (CLI
  cold vs WARM vs node-warm vs daemon-mode pool primitive).

# What's measured (5 paths)

  Cold-CLI                gocracker run -dockerfile          240 ms median
  WARM-CLI                gocracker run -warm                 71 ms median
  NODE-WARM-CLI           ... -warm-runtime node -cmd node-warm
                                                              70 ms median
  POOL-PRIMITIVE          pool-bench /bin/true                 7 ms median
  POOL-PRIMITIVE node -v  pool-bench node -v                  37 ms median

# Trace breakdown

Two side-by-side timelines (node-warm REPL eval vs regular
fork+exec node -v) showing where each ms goes inside the gocracker
process. node-warm guest exec is 24 ms vs 31 ms for fork+exec
node -v (~28 % faster in-guest).

The CLI wall-clock 70 ms is dominated by the ~35-40 ms host startup
(sudo + Go init) which the trace anchor doesn't capture. Daemon-mode
callers (sandboxd lease + exec via SDK or MCP) pay 7-37 ms total
because that startup is amortised.

# Comparison table

vs 9 competitors (boxlite, E2B, Daytona, Vercel Sandbox, Cloudflare
Code Mode, Modal, Arrakis, node-vmm). Each row cites its source.
Key findings:

- boxlite has NO in-memory snapshot/restore (qcow2 disk-only +
  freeze/thaw, restore = cold reboot). Source: acmerfight gist.
  Their <50 ms claim is unbenchmarked cold boot, not warm restore.
- gocracker is the only project combining KVM real isolation +
  in-memory dirty-delta snapshot/restore + pre-loaded runtime
  (node-warm) at sub-30 ms latency.
- Cloudflare Code Mode is faster (<5 ms) but trades off filesystem
  and process state — different threat model.

# Source links

ComputeSDK leaderboard, boxlite docs + source-verified gist, E2B's
UFFD blog, Daytona WarmPoolService deepwiki, Modal mem-snapshots
blog, Arrakis repo, Cloudflare Code Mode blog. Every claim cited.

# Reproduce

Last section gives the exact bench commands for each path so anyone
can verify against their own host. All sources are in the repo at
PR #22 (perf foundation) + PR #23 (MCP).
…inding

Bug surfaced during end-to-end testing of feat/mcp-server: piping a
malformed line into the stdio loop fatal-exited the entire binary
("[gocracker-mcp] fatal: decode: invalid character 'i' looking for
beginning of value"), breaking subsequent valid frames.

Root cause: ServeStdio used json.Decoder, which is stateful — once
it hits unexpected input, the rest of the stream is corrupt and
cannot be recovered. The unit test TestParseError didn't catch this
because it exercised Handle() directly, not the ServeStdio loop.

Fix: switch to bufio.Scanner + per-line json.Unmarshal. Each line
is independently parsed; a malformed line emits a JSON-RPC parse-
error response (id=null per spec) and the loop keeps reading. Real
read errors (EOF, scanner buffer overflow) still terminate cleanly.

Buffer is bumped to 4 MiB max line: MCP messages can carry
multi-hundred-KB JS source via process.eval_node, and with the default
64 KiB Scanner token limit such a line would not be parsed at all (the
scan stops with bufio.ErrTooLong).

Adds TestServeStdioRecoversFromBadJSON regression test covering the
specific failure mode (bad line, then valid ping, expects two
responses with the second being a successful ping).

# End-to-end test results (covered by this commit + manual smoke)

  $ sudo bin/gocracker-sandboxd serve --addr :9092 --kernel-path ...
  $ # via curl: cold-create alpine sandbox
  $ # via MCP: initialize → exec(echo) → exec(uname) → exec(env_map) → exec(exit 7) → delete

  init: server=gocracker-mcp proto=2025-11-25                            ✅
  echo: 64 ms wall, exit=0, stdout="hello mcp\n"                         ✅
  uname: <3 ms round-trip, "Linux gocracker 6.1.102 PREEMPT ... x86_64"  ✅
  env_map: <3 ms, MCP_TEST=works (env_map → KEY=VALUE flatten works)     ✅
  exit-7: exit=7, stderr="to-stderr" (non-zero exit + stderr captured)   ✅
  delete: {"id":"sb-...","ok":true}                                      ✅
  mcp clean exit rc=0                                                    ✅
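
The env_map → KEY=VALUE flattening checked above could look roughly like the following. This is a sketch under assumptions: `flattenEnv` is a hypothetical name, and sorting the keys is a choice made here for reproducible output, not something the PR states.

```go
package main

import (
	"fmt"
	"sort"
)

// flattenEnv turns a map into the KEY=VALUE slice form that exec-style
// APIs usually take, in sorted key order.
func flattenEnv(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		out = append(out, k+"="+m[k])
	}
	return out
}

func main() {
	fmt.Println(flattenEnv(map[string]string{"MCP_TEST": "works", "A": "1"}))
}
```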

# Real-deployment finding (NOT fixed in this commit; followup)

The sandbox UDS in sandboxd's state-dir defaults to root-only
permissions:

  $ ls -la /tmp/state/sandboxes/sb-X.sock
  ls: Permission denied

So gocracker-mcp running as the user (Claude Desktop's spawning
model) can't open the socket; it hangs silently waiting on a dial
that fails with EACCES. Today: workaround by running gocracker-mcp
under sudo (matching sandboxd's user). Followup: sandboxd should
expose --uds-group GROUP to chmod the UDS files to a known group,
so Claude Desktop's user-context MCP server can talk to a root-owned
sandboxd without privilege escalation. Tracked as a known limitation
in docs/design/mcp-server.md ("Auth / multi-tenancy" follow-up).

# Tests

  go test -count=1 ./sandboxes/internal/mcp/  →  ok  0.005s   (11/11)