feat(mcp): MVP MCP server — 5 tools, stdio transport, ~24ms node eval (stacked on #22)#23
Open
misaelzapata wants to merge 3 commits into
Adds gocracker-mcp, a JSON-RPC 2.0 server that lets AI clients (Claude Desktop, Claude Code, custom MCP-aware agents) execute code in gocracker sandboxes by calling well-typed tools. Speaks modelcontextprotocol.io spec rev 2025-11-25.

The differentiator vs E2B / Daytona / Cloudflare Code Mode / Arrakis is the new process.eval_node tool: it routes JS source to an in-guest pre-loaded V8 instance (the node-warm runtime shipped on feat/slirp-net-and-atomic-disk-meta cdbcfc7) for ~24 ms in-guest exec vs ~36 ms for fork+exec'ed `node`. Combined with sandboxd's sub-30 ms warm-pool restore, an AI tool call lands well under 100 ms total.

# What ships

## 5 tools, all thin SDK wrappers (no VMM state in this server)

- sandbox.lease(template_id, timeout_ms?) → warm-pool lease via Client.LeaseSandbox.
- sandbox.delete(id) → Client.Delete.
- sandbox.recycle(id) → release-and-re-lease in one round-trip.
- process.exec(sandbox_id, cmd[], env?, env_map?, workdir?, timeout_ms?, stdin?) → ToolboxClient.Exec.
- process.eval_node(sandbox_id, source, timeout_ms?) → ToolboxClient.Exec with cmd[0]="node-warm". Requires the base-node-warm template.

Each is ~30–80 LoC; the server holds no state, and sandboxd is the source of truth.

## Stdio transport (default)

Claude Desktop spawns gocracker-mcp as a subprocess, frames JSON-RPC over stdin/stdout, and reads diagnostic logs from stderr. ServeStdio in sandboxes/internal/mcp/server.go drives the loop until EOF or ctx cancel.

Stderr is the ONLY log surface; stdout is reserved for JSON-RPC responses. Polluting stdout breaks the wire protocol. Logging is indirected through stderrSink so tests can swap it.

## Wire format

JSON-RPC 2.0, line-delimited. Tool errors come back two ways:

- Protocol-level (parse, unknown method, bad params) → JSON-RPC error object with -32xxx codes.
- Tool-level (sandbox not found, exec failed) → successful result with isError=true and the message in a text content block. The LLM sees these inside its own context and adjusts.
# Tests

10 tests covering initialize handshake, tools/list (deterministic sort), tools/call happy + error paths, JSON-RPC version validation, parse errors, the stdio loop. All run against an httptest-backed fake sandboxd; no real KVM needed. ~225 LoC.

    go test ./sandboxes/internal/mcp/ → ok 0.005s

# Manual smoke

    $ (echo '{"jsonrpc":"2.0","id":1,"method":"initialize",...}'
       echo '{"jsonrpc":"2.0","method":"notifications/initialized"}'
       echo '{"jsonrpc":"2.0","id":2,"method":"tools/list"}') \
      | bin/gocracker-mcp --sandboxd http://127.0.0.1:9091

Returns 2 valid JSON-RPC frames: initialize result with protocol version + capabilities, then tools/list result with all 5 tools and their inputSchema (verified).

# Code layout

- sandboxes/cmd/gocracker-mcp/main.go (99 LoC): entry binary
- sandboxes/internal/mcp/protocol.go (180 LoC): wire types
- sandboxes/internal/mcp/server.go (215 LoC): Server + dispatch
- sandboxes/internal/mcp/tools.go (335 LoC): 5 tool handlers
- sandboxes/internal/mcp/util.go (33 LoC): small helpers
- sandboxes/internal/mcp/server_test.go (225 LoC): 10 tests
- docs/design/mcp-server.md (208 LoC): design doc

Total: ~1.3k LoC, all under sandboxes/cmd/gocracker-mcp and sandboxes/internal/mcp. Zero changes to existing packages.

# Claude Desktop integration

Add to claude_desktop_config.json:

    {
      "mcpServers": {
        "gocracker": {
          "command": "/usr/local/bin/gocracker-mcp",
          "args": ["--sandboxd", "http://127.0.0.1:9091"]
        }
      }
    }

# Next phases (separate PRs)

Documented in docs/design/mcp-server.md:

1. sandbox.fan_out: N=4..64 microsecond CoW fork in one RPC. The primitive that nobody else has at this latency. ~200 LoC.
2. speculate.race: fork N children, run candidates concurrently, first to satisfy success_predicate wins. ~150 LoC.
3. checkpoint.tree.diff: diff two checkpoints (files, env, RSS). Surfaces gocracker's existing dirty-log capture as a verb. ~250 LoC.
4. process.exec_stream: SSE streaming stdout/stderr. ~80 LoC.
5. files.put / files.get / preview.mint: std SDK passthroughs. ~100 LoC.
6. Streamable HTTP transport (multi-tenant, bearer auth). ~150 LoC.

# Builds on (parent branch)

Stacked on top of feat/slirp-net-and-atomic-disk-meta (PR #22), which added the node-warm runtime path that process.eval_node depends on. This MCP branch should be reviewed AFTER #22 lands so the eval_node tool has a real base-node-warm template to point at.
Pull request overview
Adds an MVP Model Context Protocol (MCP) JSON-RPC 2.0 server (gocracker-mcp) that exposes gocracker sandbox primitives over stdio, enabling MCP-aware AI clients to lease/recycle sandboxes and execute commands (including the warm node-warm eval path).
Changes:
- Introduces a new MCP server core (protocol types, dispatch, stdio transport) plus 5 tool handlers backed by the existing Go SDK.
- Adds unit tests using an `httptest` fake sandboxd.
- Adds a design doc describing architecture, wire format, and roadmap.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| sandboxes/internal/mcp/protocol.go | Defines JSON-RPC/MCP wire types and constants. |
| sandboxes/internal/mcp/server.go | Implements request dispatch + stdio framing loop. |
| sandboxes/internal/mcp/tools.go | Registers/implements the 5 MVP MCP tools (SDK wrappers). |
| sandboxes/internal/mcp/util.go | Adds stderr sink indirection + deterministic tool sorting helper. |
| sandboxes/internal/mcp/server_test.go | Adds tests for initialize/tools/list/tools/call + stdio round-trip. |
| sandboxes/cmd/gocracker-mcp/main.go | New CLI entrypoint wiring sandboxd client + stdio server loop. |
| docs/design/mcp-server.md | Design documentation + quick start instructions. |
| .gitignore | Ignores local .wrangler/ directory. |
Comment on lines +179 to +190

    InputSchema: json.RawMessage(`{
        "type": "object",
        "properties": {
            "sandbox_id": {"type": "string"},
            "cmd": {"type": "array", "items": {"type": "string"}},
            "env": {"type": "object", "additionalProperties": {"type": "string"}},
            "workdir": {"type": "string"},
            "timeout_ms": {"type": "integer"},
            "stdin": {"type": "string"}
        },
        "required": ["sandbox_id", "cmd"]
    }`),

    s.RegisterTool(Tool{
        Name: "process.exec",
        Description: `Run a command inside the sandbox via the toolbox /exec endpoint. Blocks until the process exits ` +
            `(or timeout), then returns aggregated stdout/stderr/exit_code. For long-running jobs use process.exec_stream.`,

Comment on lines +81 to +85

    // Handle dispatches one JSON-RPC frame. Returns nil for notifications
    // (no id), otherwise a Response. Errors are reported as JSON-RPC
    // error objects, never raw Go errors; the caller doesn't need to
    // translate.
    func (s *Server) Handle(ctx context.Context, raw []byte) *Response {

Comment on lines +86 to +92

    var req Request
    if err := json.Unmarshal(raw, &req); err != nil {
        return errorResponse(nil, ErrParseError, "parse error: "+err.Error())
    }
    if req.JSONRPC != JSONRPCVersion {
        return errorResponse(req.ID, ErrInvalidRequest, "jsonrpc must be \"2.0\"")
    }

Comment on lines +71 to +90

    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    // SIGINT / SIGTERM cancels ctx so ServeStdio returns; the parent
    // (Claude Desktop) usually just closes our stdin which also makes
    // ServeStdio return. Handling signals is belt-and-suspenders.
    sigCh := make(chan os.Signal, 1)
    signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
    go func() {
        <-sigCh
        cancel()
    }()

    fmt.Fprintf(os.Stderr, "[gocracker-mcp] starting (sandboxd=%s, version=%s)\n",
        *sandboxdURL, buildinfo.Version)

    if err := server.ServeStdio(ctx, os.Stdin, os.Stdout); err != nil {
        fmt.Fprintf(os.Stderr, "[gocracker-mcp] fatal: %v\n", err)
        os.Exit(1)
    }

Comment on lines +34 to +40

    // initialised tracks whether the client has completed the MCP
    // `initialize` handshake. Most methods refuse to run before this
    // per the spec; we soft-enforce it (log + serve) to keep the
    // server usable from raw `curl` for debugging.
    initialised bool
    mu          sync.Mutex

Comment on lines +171 to +176

    (echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-11-25","clientInfo":{"name":"curl","version":"1"}}}'
     echo '{"jsonrpc":"2.0","method":"notifications/initialized"}'
     echo '{"jsonrpc":"2.0","id":2,"method":"tools/list"}') \
      | gocracker-mcp --sandboxd http://127.0.0.1:9091 \
      | jq .
    ```

    "execute code" MCP servers either run Docker (100 ms+ cold start),
    V8 isolates (no filesystem, no process state), or cold microVMs
    (~125–500 ms even warm). gocracker's warm-pool + node-warm gets the
    ~10 ms ms-floor while keeping a real Linux guest with files,

            t.Fatalf("expected nil response for notification, got %+v", resp)
        }
    }

… competitors
Adds docs/perf-snapshot-2026-05-06.md as a single-source-of-truth
reference for the current performance numbers, the timing breakdown,
and the public-product comparison. Useful for:
- Answering HN-style perf questions ("how does this compare to
boxlite?") without needing to re-bench every time.
- Confirming the gocracker-mcp #23 latency claims with measured data.
- Onboarding contributors to which path each number belongs to (CLI
cold vs WARM vs node-warm vs daemon-mode pool primitive).
# What's measured (5 paths)
    Path                    Command                                 Median
    Cold-CLI                gocracker run -dockerfile               240 ms
    WARM-CLI                gocracker run -warm                      71 ms
    NODE-WARM-CLI           ... -warm-runtime node -cmd node-warm    70 ms
    POOL-PRIMITIVE          pool-bench /bin/true                      7 ms
    POOL-PRIMITIVE node -v  pool-bench node -v                       37 ms
# Trace breakdown
Two side-by-side timelines (node-warm REPL eval vs regular
fork+exec node -v) showing where each ms goes inside the gocracker
process. node-warm guest exec is 24 ms vs 31 ms for fork+exec
node -v (~28 % faster in-guest).
The CLI wall-clock 70 ms is dominated by the ~35-40 ms host startup
(sudo + Go init) which the trace anchor doesn't capture. Daemon-mode
callers (sandboxd lease + exec via SDK or MCP) pay 7-37 ms total
because that startup is amortised.
# Comparison table
vs 9 competitors (boxlite, E2B, Daytona, Vercel Sandbox, Cloudflare
Code Mode, Modal, Arrakis, node-vmm). Each row cites its source.
Key findings:
- boxlite has NO in-memory snapshot/restore (qcow2 disk-only +
freeze/thaw, restore = cold reboot). Source: acmerfight gist.
Their <50 ms claim is unbenchmarked cold boot, not warm restore.
- gocracker is the only project combining KVM real isolation +
in-memory dirty-delta snapshot/restore + pre-loaded runtime
(node-warm) at sub-30 ms latency.
- Cloudflare Code Mode is faster (<5 ms) but trades off filesystem
and process state — different threat model.
# Source links
ComputeSDK leaderboard, boxlite docs + source-verified gist, E2B's
UFFD blog, Daytona WarmPoolService deepwiki, Modal mem-snapshots
blog, Arrakis repo, Cloudflare Code Mode blog. Every claim cited.
# Reproduce
Last section gives the exact bench commands for each path so anyone
can verify against their own host. All sources are in the repo at
PR #22 (perf foundation) + PR #23 (MCP).
…inding
Bug surfaced during end-to-end testing of feat/mcp-server: piping a
malformed line into the stdio loop fatal-exited the entire binary
("[gocracker-mcp] fatal: decode: invalid character 'i' looking for
beginning of value"), breaking subsequent valid frames.
Root cause: ServeStdio used json.Decoder, which is stateful — once
it hits unexpected input, the rest of the stream is corrupt and
cannot be recovered. The unit test TestParseError didn't catch this
because it exercised Handle() directly, not the ServeStdio loop.
Fix: switch to bufio.Scanner + per-line json.Unmarshal. Each line
is independently parsed; a malformed line emits a JSON-RPC parse-
error response (id=null per spec) and the loop keeps reading. Real
read errors (EOF, scanner buffer overflow) still terminate cleanly.
Buffer is bumped to 4 MiB max line: MCP messages can carry
multi-hundred-KB JS source via process.eval_node, and the default
64 KiB Scanner token limit would make Scan stop with
bufio.ErrTooLong on such a line instead of delivering it.
Adds TestServeStdioRecoversFromBadJSON regression test covering the
specific failure mode (bad line, then valid ping, expects two
responses with the second being a successful ping).
# End-to-end test results (covered by this commit + manual smoke)
$ sudo bin/gocracker-sandboxd serve --addr :9092 --kernel-path ...
$ # via curl: cold-create alpine sandbox
$ # via MCP: initialize → exec(echo) → exec(uname) → exec(env_map) → exec(exit 7) → delete
init: server=gocracker-mcp proto=2025-11-25 ✅
echo: 64 ms wall, exit=0, stdout="hello mcp\n" ✅
uname: <3 ms round-trip, "Linux gocracker 6.1.102 PREEMPT ... x86_64" ✅
env_map: <3 ms, MCP_TEST=works (env_map → KEY=VALUE flatten works) ✅
exit-7: exit=7, stderr="to-stderr" (non-zero exit + stderr captured) ✅
delete: {"id":"sb-...","ok":true} ✅
mcp clean exit rc=0 ✅
# Real-deployment finding (NOT fixed in this commit; followup)
The sandbox UDS in sandboxd's state-dir defaults to root-only
permissions:
$ ls -la /tmp/state/sandboxes/sb-X.sock
ls: Permission denied
So gocracker-mcp running as the user (Claude Desktop's spawning
model) can't open the socket; it hangs silently waiting on a dial
that fails with EACCES. Today: workaround by running gocracker-mcp
under sudo (matching sandboxd's user). Followup: sandboxd should
expose --uds-group GROUP to chmod the UDS files to a known group,
so Claude Desktop's user-context MCP server can talk to a root-owned
sandboxd without privilege escalation. Tracked as a known limitation
in docs/design/mcp-server.md ("Auth / multi-tenancy" follow-up).
# Tests
go test -count=1 ./sandboxes/internal/mcp/ → ok 0.005s (11/11)
Summary
Adds `gocracker-mcp`, a JSON-RPC 2.0 server that lets AI clients
(Claude Desktop, Claude Code, custom MCP-aware agents) execute code
in gocracker sandboxes via well-typed tools. Speaks
modelcontextprotocol.io spec rev 2025-11-25.

Stacked on top of #22 (feat/slirp-net-and-atomic-disk-meta), which
adds the node-warm runtime path that the `process.eval_node` tool
depends on. Review after #22 lands.
What ships
5 tools, all thin SDK wrappers
| Tool | SDK call |
|---|---|
| `sandbox.lease` | `Client.LeaseSandbox` |
| `sandbox.delete` | `Client.Delete` |
| `sandbox.recycle` | `Client.Recycle` |
| `process.exec` | `ToolboxClient.Exec` |
| `process.eval_node` | `ToolboxClient.Exec(["node-warm", src])` |

`process.eval_node` is the marquee differentiator. It routes to the
in-guest pre-loaded node REPL (shipped on #22 as Pieza B); no other
public MCP server has anything close at this latency. State (globals)
persists across calls in the same sandbox, which is useful for
stateful AI loops.
Transport: stdio (MVP)
Claude Desktop spawns `gocracker-mcp` as a subprocess and frames
JSON-RPC over stdin/stdout. Stderr is the diagnostic log surface;
stdout is RESERVED for the wire protocol.

Streamable HTTP (multi-tenant, bearer auth) is a follow-up; same
`Server.Handle` core, just a different framing layer.

Tests
10 tests covering initialize, tools/list (deterministic sort),
tools/call happy + error paths, JSON-RPC version validation, parse
errors, the stdio loop. All against an `httptest`-backed fake
sandboxd; no real KVM. ~225 LoC.
`go test ./sandboxes/internal/mcp/` → ok 0.005s
Manual smoke
```bash
$ (echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-11-25","clientInfo":{"name":"smoke","version":"1"}}}'
   echo '{"jsonrpc":"2.0","method":"notifications/initialized"}'
   echo '{"jsonrpc":"2.0","id":2,"method":"tools/list"}') \
  | bin/gocracker-mcp --sandboxd http://127.0.0.1:9091
{"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2025-11-25","capabilities":{"tools":{}},"serverInfo":{"name":"gocracker-mcp","version":"dev"}}}
{"jsonrpc":"2.0","id":2,"result":{"tools":[...5 entries with full inputSchema...]}}
```
Two valid JSON-RPC frames out, all 5 tools listed with their JSON
Schemas. Verified.
Code layout (~1.3k LoC, all new files)
Zero changes to existing packages. This is purely additive.
Why this matters
State of the art for "execute code" MCP servers: Docker (100 ms+
cold start), V8 isolates (no filesystem, no process state), or cold
microVMs (~125–500 ms even warm; E2B, Daytona, Cloudflare Code Mode,
Arrakis, ConTree all cluster here). Per the boxlite/HN research,
nobody combines a real Linux guest with sub-30 ms warm-pool restore
and sub-30 ms in-guest JS eval.
gocracker MCP gets the ~10 ms latency floor while keeping a real
Linux guest with files, processes, snapshots: the AI can
`pip install`, write to `/tmp`, fork a server, and have all of it
inside a real KVM isolation boundary.
Claude Desktop integration
```json
{
"mcpServers": {
"gocracker": {
"command": "/usr/local/bin/gocracker-mcp",
"args": ["--sandboxd", "http://127.0.0.1:9091"]
}
}
}
```
Restart Claude Desktop. Ask: "Lease a base-node-warm sandbox and run
`console.log(process.version)` in it." — Claude calls the right
tools and replies with the output.
Next phases (separate PRs after this lands)
Documented in `docs/design/mcp-server.md`:
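- `sandbox.fan_out`: N=4..64 microsecond CoW fork in one RPC.
  The primitive that nobody else has at this latency. ~200 LoC.
- `speculate.race`: fork N children, run candidates concurrently,
  first to satisfy `success_predicate` wins. ~150 LoC.
- `checkpoint.tree.diff`: diff two checkpoints (files, env,
  RSS pages). Surfaces gocracker's existing dirty-log capture. ~250 LoC.
- `files.put` / `files.get` / `preview.mint`: std SDK
  passthroughs. ~100 LoC each.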
Test plan
returns valid JSON-RPC frames, all 5 tools listed
reviewer (feat: slirp net + atomic snapshots + Pieza B warm-runtime path (108ms→92ms WARM CLI, ~24ms node-warm guest eval) #22 needs to land first so `base-node-warm` is
available for `process.eval_node` to point at)