scriptorium is an MCP runtime for grounded Markdown and code retrieval. This repository contains the Go implementation, release packaging, and artifact builders for the runtime.
Even when library documentation and sample code are well maintained, AI assistants often fail to use them effectively in practice.
An AI client tends to scan broad portions of the documentation on each request, which makes it hard to retrieve only the necessary information efficiently. It also makes the set of referenced sources inconsistent and hard to reproduce.
In addition, information that is easy for humans to understand is not always structured in a way that AI can handle effectively, even when documentation, sample code, and related materials are all available.
As a result, teams can end up in a state where "the information exists, but it cannot be used well."
scriptorium was developed to address this problem by treating documentation, sample code, and implementation knowledge as a single connected source of truth that AI can reference naturally.
scriptorium is an MCP runtime that analyzes Markdown and code together and attaches grounded references to the sources behind search, retrieval, and implementation-guide generation results.
This makes the origin of retrieved information traceable and helps preserve reproducibility and verifiability in development workflows, including AI-assisted ones.
It also allows users to access the knowledge they need in a consistent way without having to think explicitly about where the information lives or how it should be referenced.
Current runtime and artifact behavior is documented in:
Product-facing behavior should follow the runtime docs above.
- Contribution guide: CONTRIBUTING.md
- Japanese contribution guide: CONTRIBUTING.ja.md
- Code of Conduct: CODE_OF_CONDUCT.md
- Japanese Code of Conduct: CODE_OF_CONDUCT.ja.md
- Security policy: SECURITY.md
- Japanese security policy: SECURITY.ja.md
- Support policy: .github/SUPPORT.md
- Japanese support policy: .github/SUPPORT.ja.md
- Changelog: CHANGELOG.md
- Japanese changelog: CHANGELOG.ja.md
- License: LICENSE
- Japanese license reference: LICENSE.ja.md
- Third-party notices: THIRD-PARTY-NOTICES.md
Annotated tags in the format v<major>.<minor>.<patch> are used to publish
GitHub Releases.
Versioned release notes are maintained in CHANGELOG.md.
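As a rough illustration of the tag format above, the following Go sketch checks whether a tag name matches v<major>.<minor>.<patch>. The helper name isReleaseTag and the exact pattern (no leading zeros, no pre-release suffix) are assumptions made for this example, not the release workflow's actual rule.

```go
package main

import (
	"fmt"
	"regexp"
)

// releaseTagPattern is an assumed reading of v<major>.<minor>.<patch>:
// three dot-separated non-negative integers without leading zeros.
var releaseTagPattern = regexp.MustCompile(`^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$`)

// isReleaseTag is a hypothetical helper that reports whether tag
// matches the assumed release tag pattern.
func isReleaseTag(tag string) bool {
	return releaseTagPattern.MatchString(tag)
}

func main() {
	for _, tag := range []string{"v1.4.0", "1.4.0", "v1.4", "v1.4.0-rc1"} {
		fmt.Printf("%-12s %v\n", tag, isReleaseTag(tag))
	}
}
```

Only the first tag in the loop satisfies the pattern; whether the real workflow also accepts suffixes such as -rc1 is not stated here.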
The release workflow builds versioned archives for:
- Linux: amd64, arm64
- Windows: amd64, arm64
- macOS: amd64, arm64
Linux release archives contain scriptorium, scriptorium-index,
scriptorium-snapshot, scriptorium.sbom.spdx.json, LICENSE,
THIRD-PARTY-NOTICES.md, and user-facing Linux setup guides under
docs/.
Windows release archives contain scriptorium.exe,
scriptorium-index.exe, scriptorium-snapshot.exe,
scriptorium.sbom.spdx.json, LICENSE, THIRD-PARTY-NOTICES.md, and
user-facing Windows setup guides under docs/.
macOS release archives contain the platform binaries together with the
repository README files, scriptorium.sbom.spdx.json, LICENSE, and
THIRD-PARTY-NOTICES.md.
Normal CI validates pull requests and non-tag pushes on Linux, Windows, and macOS.
Current implementation includes:
- MCP runtime entrypoint
- docs index builder entrypoint
- git snapshot builder entrypoint
- runtime configuration loading
- refId parsing and spec-compliant heading slug generation
- tool result envelope generation
- diagnostics payload scaffolding
- display-path-based filesystem guarding and deterministic file scanning
- text decoding for UTF-8, UTF-16, and Shift_JIS-style fallbacks
- Markdown heading-block parsing and docs filesystem fallback scanning
- runtime search semantics for docs/code ranking with heading/path boosts, code path fallback, and clustered code_range results
- initial get_content semantics for markdown, code, and file refs
- initial expand_related semantics for filesystem-backed docs and sample roots
- initial summarize_flow semantics for filesystem-backed markdown and code refs
- additive guide_implementation MCP tool that composes retrieval and flow summarization into grounded implementation guidance
- SQLite/FTS-backed docs index builder/runtime support with auto-discovery, verification, and stale fallback
- SQLite/FTS-backed git snapshot builder/runtime support for snapshot-backed code search and content resolution
- acceptance-focused verification for documented runtime scenarios
- MCP stdio transport with initialize, ping, tools/list, and tools/call
- Obtain the distributed binaries for your OS, or download a published release archive from GitHub Releases.
- Prepare the Markdown directory to index and search.
- If you use Git snapshots, make sure the git command is available.
The distribution is expected to contain at least these three executables:
- scriptorium
- scriptorium-index
- scriptorium-snapshot
On Windows they are typically distributed with the .exe extension.
mkdir scriptorium
cd scriptorium
# Place the distributed binaries here
If you want faster docs search, generate the SQLite index first.
Windows (PowerShell):
.\scriptorium-index.exe --docs-root .\docs --out .\scriptorium-index.sqlite
macOS / Linux:
./scriptorium-index --docs-root ./docs --out ./scriptorium-index.sqlite
If you want snapshot-backed code search, generate a snapshot from the target repository.
Windows (PowerShell):
.\scriptorium-snapshot.exe --repo . --out .\scriptorium-snapshot.sqlite --samples HEAD --code-extensions .go,.ts,.cs
macOS / Linux:
./scriptorium-snapshot --repo . --out ./scriptorium-snapshot.sqlite --samples HEAD --code-extensions .go,.ts,.cs
Typical --samples values:
- HEAD: snapshot the current HEAD
- WORKTREE: snapshot the working tree
- feature/foo: snapshot a specific branch or ref
--samples ALL expands local branches only. It does not automatically include remote-tracking refs such as origin/feature/foo, even when --fetch is enabled.
The required configuration is SCRIPTORIUM_MARKDOWN_DIR. SCRIPTORIUM_CODE_ROOTS enables direct filesystem-backed code search, and SCRIPTORIUM_INDEX_FILE / SCRIPTORIUM_SNAPSHOT_FILE enable prebuilt artifacts. Path-bearing list variables use the host OS path-list separator (; on Windows, : on macOS / Linux) or newlines, so paths containing spaces remain intact.
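To make the separator rule concrete, here is a minimal Go sketch of how a path-list value could be split on the host separator or on newlines while leaving spaces inside paths untouched. The function splitPathList is hypothetical and written for this example only; the runtime's actual parsing may differ in details such as trimming or handling of empty entries.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// splitPathList is a hypothetical helper. It splits value on sep and on
// line breaks, trims surrounding whitespace from each entry, and drops
// empty entries. Spaces inside a path are preserved.
func splitPathList(value string, sep rune) []string {
	fields := strings.FieldsFunc(value, func(r rune) bool {
		return r == sep || r == '\n' || r == '\r'
	})
	paths := make([]string, 0, len(fields))
	for _, f := range fields {
		if p := strings.TrimSpace(f); p != "" {
			paths = append(paths, p)
		}
	}
	return paths
}

func main() {
	// os.PathListSeparator is ';' on Windows and ':' on macOS / Linux.
	value := os.Getenv("SCRIPTORIUM_INDEX_FILE")
	for _, p := range splitPathList(value, os.PathListSeparator) {
		fmt.Println(p)
	}
}
```

Because the split never happens on a space, a value such as "/data/my docs/a.sqlite:/data/b.sqlite" yields two paths on macOS / Linux under this sketch.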
Windows (PowerShell):
$env:SCRIPTORIUM_MARKDOWN_DIR = (Resolve-Path .\docs)
$env:SCRIPTORIUM_INDEX_FILE = (Resolve-Path .\scriptorium-index.sqlite)
$env:SCRIPTORIUM_SNAPSHOT_FILE = (Resolve-Path .\scriptorium-snapshot.sqlite)
$env:SCRIPTORIUM_CODE_EXTENSIONS = ".go,.ts,.cs"
$env:SCRIPTORIUM_INDEX_VERIFY = "full"
Optional MCP identity metadata can make the server easier for AI clients to select. SCRIPTORIUM_SERVER_NAME overrides serverInfo.name, SCRIPTORIUM_MCP_PROFILE provides a profile identifier, SCRIPTORIUM_MCP_TOOL_PREFIX prefixes advertised tool names, and SCRIPTORIUM_MCP_DOMAIN_DESCRIPTION / SCRIPTORIUM_MCP_CORPUS_SUMMARY make tool descriptions more specific. SCRIPTORIUM_MCP_EXAMPLE_QUERIES accepts newline-delimited examples that show up in diagnostics.
Example with multiple artifacts:
$env:SCRIPTORIUM_INDEX_FILE = @(
(Resolve-Path .\artifacts\docs-a.sqlite)
(Resolve-Path .\artifacts\docs-b.sqlite)
) -join [IO.Path]::PathSeparator
$env:SCRIPTORIUM_SNAPSHOT_FILE = @(
(Resolve-Path .\artifacts\repo-a.sqlite)
(Resolve-Path .\artifacts\repo-b.sqlite)
) -join [IO.Path]::PathSeparator
Example with direct filesystem sample roots:
$env:SCRIPTORIUM_CODE_ROOTS = (Resolve-Path .\src)
To start the runtime on Windows (PowerShell):
.\scriptorium.exe
macOS / Linux:
./scriptorium
After startup it accepts initialize, tools/list, and tools/call over stdio transport.
Call diagnostics first to inspect docs index, git snapshot, and cache state.
Recommended checks:
- docs.docsOnlyMode
- docs.index.enabled
- code.gitSnapshot.enabled
- caches.textFiles
- caches.sampleRoots
- retrieval.corpus
- retrieval.fallback
- retrieval.warnings
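For readers wiring up a client by hand, the sketch below shows what a tools/call request for diagnostics might look like as a single line of JSON-RPC 2.0, which is the framing MCP uses over stdio. The type and function names (request, newToolCall, encodeLine) are invented for this example, and passing an empty arguments object to diagnostics is an assumption. A real session would also need to complete the initialize handshake before calling tools.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// request is a JSON-RPC 2.0 request envelope as used by MCP.
type request struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// newToolCall builds a tools/call request for the named tool.
// A nil args map is sent as an empty JSON object.
func newToolCall(id int, tool string, args map[string]any) request {
	if args == nil {
		args = map[string]any{}
	}
	return request{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  map[string]any{"name": tool, "arguments": args},
	}
}

// encodeLine renders req as one newline-terminated line of JSON,
// matching the newline-delimited framing of the MCP stdio transport.
func encodeLine(req request) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b) + "\n", nil
}

func main() {
	// Assumption: diagnostics can be called without arguments.
	line, err := encodeLine(newToolCall(2, "diagnostics", nil))
	if err != nil {
		panic(err)
	}
	fmt.Print(line)
}
```

Writing that line to the runtime's stdin and reading one line back from stdout would be the next step; the checks listed above are fields to look for in the returned payload.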
Recommended MCP tools:
- diagnostics for runtime state and cache visibility
- search, get_content, expand_related, and summarize_flow for primitive retrieval workflows
- guide_implementation when the client wants a single grounded implementation guide with supporting docs and sample-code refs
Implementation-guidance retrieval:
- search accepts additive mode=implementation to enable framework-aware ranking and docs/code diversification.
- expand_related accepts additive mode=implementation and the framework_links signal for startup wiring, DI, route, attribute/decorator, and config-style relations.
- guide_implementation uses those implementation-aware retrieval hints internally.
If you want to shape docs for better scriptorium parsing and retrieval, see MARKDOWN_AUTHORING_GUIDE.md.
The most important points are:
- Put one implementation topic in each heading block.
- Use numbered lists for procedural steps.
- Keep API names, config keys, routes, type names, and file names literal.
- Keep important examples in the same heading block as the explanation.
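To show why the heading block is the unit that matters in the points above, here is a simplified Go sketch that groups Markdown lines under their nearest preceding heading. It is not scriptorium's parser: the names block, splitHeadingBlocks, and isHeading are hypothetical, and the sketch handles only ATX headings and backtick fences.

```go
package main

import (
	"fmt"
	"strings"
)

// fence is the three-backtick marker that opens or closes a code fence.
var fence = strings.Repeat("`", 3)

// block is one heading plus the lines that follow it up to the next heading.
type block struct {
	Heading string
	Body    []string
}

// isHeading reports whether line starts with 1 to 6 '#' characters
// followed by a space.
func isHeading(line string) bool {
	level := len(line) - len(strings.TrimLeft(line, "#"))
	return level >= 1 && level <= 6 && len(line) > level && line[level] == ' '
}

// splitHeadingBlocks groups lines under the nearest preceding heading.
// Lines inside code fences are never treated as headings, and lines
// before the first heading are dropped in this simplified version.
func splitHeadingBlocks(markdown string) []block {
	var blocks []block
	inFence := false
	for _, line := range strings.Split(markdown, "\n") {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, fence) {
			inFence = !inFence
		}
		if !inFence && isHeading(trimmed) {
			title := strings.TrimSpace(strings.TrimLeft(trimmed, "#"))
			blocks = append(blocks, block{Heading: title})
			continue
		}
		if len(blocks) > 0 {
			last := &blocks[len(blocks)-1]
			last.Body = append(last.Body, line)
		}
	}
	return blocks
}

func main() {
	doc := "# Setup\n1. Install\n2. Run\n## Config\nSCRIPTORIUM_MARKDOWN_DIR is required."
	for _, b := range splitHeadingBlocks(doc) {
		fmt.Printf("%s: %d body lines\n", b.Heading, len(b.Body))
	}
}
```

Under a scheme like this, an example placed under a different heading from its explanation ends up in a different block, which is the situation the authoring points try to avoid.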
- Open engineering work is tracked in GitHub Issues.
- Coverage command:
  - PowerShell: ./scripts/test-coverage.ps1
  - Bash: ./scripts/test-coverage.sh
- The coverage scripts run package-by-package go test -coverprofile commands, merge them into coverage.out, and print the go tool cover -func summary.
- Current merged statement coverage: 99.2%.
- Packages with the most remaining room are internal/ref (98.2%), internal/docsindex (98.5%), internal/content (98.7%), internal/related (98.7%), and internal/gitsnapshot (98.9%).
- Historical test-audit notes from the migration era are not part of the active workflow.
- Coverage mapping is tracked by scenario rather than by file name because repository tests are organized by package.
- The git snapshot builder now applies the same size, ignored-directory, and fallback-decoding rules during snapshot materialization that runtime filesystem sample roots use, and follows the same display-path-based symlink policy when symlinks are enabled.
- Build-time fetch is intentionally simple: --fetch enables it, --fetch-on-start decides whether it runs before snapshot generation, and --fetch-remote selects the remote when enabled.
- diagnostics.caches.textFiles now reflects the live filesystem text cache used by docs and filesystem-backed code reads.
- diagnostics.code.sampleSources now lists both filesystem roots and snapshot-backed roots so runtime source coverage is visible without cross-checking gitSnapshot.roots.
- diagnostics.server now reports runtime=go and runtimeVersion only.
- diagnostics.caches.textFiles includes evictions.
- diagnostics.caches.sampleRoots now reflects the live runtime code-source cache used to resolve filesystem and snapshot-backed code sources, with evictions, filesystemRoots, and snapshotRoots counters for the current cache contents.
- The repository now uses modernc.org/sqlite as a pure-Go SQLite driver so the docs index and git snapshot artifacts can match the original SQLite/FTS architecture without adding a CGO requirement.
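The coverage notes earlier in this list mention merging per-package cover profiles into coverage.out. One common way to do that, sketched here in Go, is to keep the first "mode:" line and append the entries from every profile. The function mergeCoverProfiles and the example.com/m paths are hypothetical; the repository's scripts may merge profiles differently.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeCoverProfiles concatenates the entries of several Go cover profiles.
// Each profile starts with a "mode: ..." line; the merged output keeps the
// first one and rejects profiles recorded with a different mode.
func mergeCoverProfiles(profiles []string) (string, error) {
	var out strings.Builder
	mode := ""
	for _, p := range profiles {
		for _, line := range strings.Split(p, "\n") {
			line = strings.TrimSpace(line)
			if line == "" {
				continue
			}
			if strings.HasPrefix(line, "mode:") {
				if mode == "" {
					mode = line
					out.WriteString(line + "\n")
				} else if line != mode {
					return "", fmt.Errorf("mixed cover modes: %q and %q", mode, line)
				}
				continue
			}
			if mode == "" {
				return "", fmt.Errorf("profile entry before mode line: %q", line)
			}
			out.WriteString(line + "\n")
		}
	}
	return out.String(), nil
}

func main() {
	// Hypothetical per-package profiles, as produced by go test -coverprofile.
	a := "mode: set\nexample.com/m/internal/ref/ref.go:10.2,12.3 2 1\n"
	b := "mode: set\nexample.com/m/internal/docs/docs.go:5.1,9.2 3 0\n"
	merged, err := mergeCoverProfiles([]string{a, b})
	if err != nil {
		panic(err)
	}
	fmt.Print(merged)
}
```

The merged text can then be written to coverage.out and summarized with go tool cover -func=coverage.out.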
- cmd/scriptorium: runtime server source entrypoint for the distributed scriptorium binary
- cmd/build-docs-index: docs index builder source entrypoint for the distributed scriptorium-index binary
- cmd/build-git-snapshot: git snapshot builder source entrypoint for the distributed scriptorium-snapshot binary
- internal/app: command orchestration
- internal/config: environment and runtime configuration
- internal/content: get_content request handling and range selection
- internal/diagnostics: diagnostics payload
- internal/docs: Markdown block parsing and docs filesystem scanning
- internal/docsindex: docs index artifact build/load/verify helpers
- internal/filesafe: guarded filesystem access and deterministic scans
- internal/flow: summarize_flow extraction, merge, and confidence logic
- internal/gitsnapshot: snapshot artifact build/load helpers
- internal/protocol: tool result envelope helpers
- internal/related: expand_related request handling and scoring
- internal/ref: path normalization and refId handling
- internal/search: query normalization and filesystem-backed search helpers
- internal/source: shared filesystem-backed code source resolution
- internal/textdecode: spec-oriented text decoding helpers
- internal/textutil: shared line normalization, splitting, and centered range helpers
