Releases: moonrunnerkc/ruleprobe
v4.5.0 - Pivot to ESLint Config Translation
RuleProbe now focuses on three things: translating instruction files into ESLint configs, detecting drift between an instruction file and an existing config, and extracting rules from an ESLint config back into instruction-file form.
New Commands
- lint-config: Translates an instruction file into a flat or legacy ESLint config
- drift: Compares an instruction file against an existing ESLint config and reports mismatches
- extract: Parses an ESLint config and emits a markdown rules section for instruction files
Breaking Changes
- Removed compare command - agent comparison is no longer a primary use case
- Removed tasks and task commands - task template listing and printing removed
- Removed run command - agent invocation via the Claude Agent SDK removed
- Removed runner module from public API - buildAgentConfig, invokeAgent, watchForCompletion, countCodeFiles are no longer exported
- Removed formatComparisonMarkdown from reporter
- Deprecated verify command - still works, but primary focus shifted
- RuleCategory narrowed - removed test-requirement, dependency, preference, file-structure, tooling, testing, workflow
- VerifierType narrowed - removed treesitter, preference, tooling, config-file, git-history
- 67 unmappable matchers removed - only 34 ESLint-mappable matchers remain across 7 categories
Matcher Audit
Categories remaining: naming, forbidden-pattern, structure, import-pattern, error-handling, type-safety, code-style (plus agent-behavior for semantic analysis).
All 34 remaining matchers produce valid ESLint rule entries. Unmappable rules (test requirements, project config, git conventions) are reported as comments in lint-config output.
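The translate and drift steps can be pictured with a small sketch. The matcher IDs and the mapping table below are hypothetical and chosen only for illustration (the ESLint rule names themselves are real); RuleProbe's actual matcher table and output format may differ.

```typescript
// Hypothetical sketch: not RuleProbe's real matcher table or output format.
type Severity = "off" | "warn" | "error";
type EslintRules = Record<string, Severity>;

// Assumed mapping from illustrative matcher IDs to real ESLint rule names.
const MATCHER_TO_ESLINT: Record<string, string> = {
  "no-console-log": "no-console",
  "prefer-const": "prefer-const",
  "no-any": "@typescript-eslint/no-explicit-any",
};

interface Translation {
  rules: EslintRules;
  unmappable: string[]; // kept so they can be reported as comments, not dropped
}

// Translate extracted matcher IDs into ESLint rule entries.
function translate(matcherIds: string[]): Translation {
  const rules: EslintRules = {};
  const unmappable: string[] = [];
  for (const id of matcherIds) {
    const eslintRule = MATCHER_TO_ESLINT[id];
    if (eslintRule === undefined) unmappable.push(id);
    else rules[eslintRule] = "error";
  }
  return { rules, unmappable };
}

// Drift: rules implied by the instruction file that the config lacks or sets differently.
function detectDrift(expected: EslintRules, actual: EslintRules): string[] {
  return Object.keys(expected)
    .filter((ruleName) => actual[ruleName] !== expected[ruleName])
    .sort();
}

const translated = translate(["no-console-log", "prefer-const", "git-commit-style"]);
const drift = detectDrift(translated.rules, {
  "no-console": "error",
  "prefer-const": "warn",
});
console.log(translated.unmappable, drift);
```

Keeping the unmappable IDs in the result, rather than discarding them, mirrors the note above that such rules surface as comments in the generated config.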
Install
npm install -g ruleprobe@4.5.0
v4.0.0: Open-Source Semantic Tier
RuleProbe v4.0.0
Released: April 2026
Architecture change: three repos to one
v3.0.0 split the semantic analysis across three repositories to protect proprietary IP:
- ruleprobe (public, MIT): CLI + thin HTTP client
- ruleprobe-semantic (private): ASPE engine (fingerprinting, similarity, LLM escalation)
- ruleprobe-api-service (private): HTTP server, license gating, Anthropic proxy
v4.0.0 consolidates everything into the single ruleprobe repo. The semantic engine is now fully open source under MIT. No separate server, no license keys, no private repos.
What moved where
| Source | Destination | Files |
|---|---|---|
| ruleprobe-semantic/src/ | src/semantic/engine/ | 39 |
| ruleprobe-api-service/src/services/anthropic-proxy.ts | src/semantic/anthropic-caller.ts | 1 |
| ruleprobe-semantic/tests/ | tests/semantic/engine/ | 18 |
What was deleted
| File | Reason |
|---|---|
| src/semantic/client.ts | HTTP client (replaced by direct engine call) |
| tests/semantic/client.test.ts | HTTP client tests (no longer needed) |
| All ruleprobe-api-service code | Server, routes, license service, rate limiter, SQLite store |
Breaking changes
CLI flag rename
--license-key is removed. Set the ANTHROPIC_API_KEY environment variable instead (the same pattern as --llm-extract with OPENAI_API_KEY), or pass --anthropic-key.
Before:
ruleprobe analyze ./my-project --semantic --license-key <key>
After:
ANTHROPIC_API_KEY=sk-ant-... ruleprobe analyze ./my-project --semantic
Or pass it explicitly:
ruleprobe analyze ./my-project --semantic --anthropic-key <key>
No API service required
The RULEPROBE_API_ENDPOINT environment variable and apiEndpoint config field are removed. All analysis runs locally.
Config file changes
In .ruleprobe/config.json, replace licenseKey with anthropicApiKey:
Before:
{ "licenseKey": "rp-..." }
After:
{ "anthropicApiKey": "sk-ant-..." }
Data flow change
Before (v3.0.0):
- CLI extracts raw vectors locally
- CLI sends vectors over HTTP to API service
- API service runs semantic engine, calls Anthropic API
- API service returns verdicts to CLI
After (v4.0.0):
- CLI extracts raw vectors locally (unchanged)
- CLI runs semantic engine directly (no network)
- Engine calls Anthropic API with user's key (only when LLM needed)
- CLI integrates verdicts (unchanged)
Test count
| Before | After |
|---|---|
| ruleprobe: 864 | ruleprobe: 1,085+ |
| ruleprobe-semantic: 221 | (merged into ruleprobe) |
| ruleprobe-api-service: 54 | (deleted) |
| Total: 1,139 | Total: 1,085+ |
API service tests (54) were not migrated as they tested HTTP routes, license validation, rate limiting, and SQLite storage, none of which exist in the consolidated architecture.
Migration checklist
- Remove RULEPROBE_LICENSE_KEY from environment variables and CI secrets
- Set ANTHROPIC_API_KEY where semantic analysis is used
- Replace --license-key <key> with --anthropic-key <key> (or just set the env var)
- Remove apiEndpoint from .ruleprobe/config.json if present
- Replace licenseKey with anthropicApiKey in .ruleprobe/config.json if present
- Stop any running ruleprobe-api-service instances
v3.0.0: Semantic Analysis Tier (ASPE)
RuleProbe v3.0.0
Released: April 2026
45 files changed, +3,861 / -251 lines since v2.0.0. 10 new source files, 7 new test files, 864 tests across 68 test files (was 572 across 52 at v2.0.0). 3 commits.
What changed
v2.0.0 delivered deterministic verification with 78 matchers. The real-world gap was twofold: (1) rules like "follow existing patterns" or "maintain consistency" have no deterministic check, and (2) performance collapsed on large codebases (PostHog: 7,000+ files). v3.0.0 adds the semantic analysis tier (ASPE), fixes 12 root-cause bugs found during E2E validation, and delivers a 50% performance improvement on large repos.
Summary of changes
1. Semantic analysis tier (new)
Full client-side integration for the paid ASPE (Adaptive Structural Profile Engine) tier. Eight new source files in src/semantic/:
- local-extractor.ts: single-pass tree-sitter scanner producing RawExtractionPayload (AST node type counts, nesting depths, opaque sub-tree hashes). No source code, variable names, comments, file paths, or imports leave the machine.
- client.ts: HTTP client sending raw vectors to the API service, receiving SemanticVerdict[] back. Handles license validation, graceful degradation on network failure, timeout, and retry.
- config.ts: license key resolution (CLI flag > env var RULEPROBE_LICENSE_KEY > .ruleprobe/config.json), API endpoint configuration.
- types.ts: public contract types (StructuralProfile, FeatureVector, CrossFileGraph, SemanticVerdict, RawExtractionPayload, etc.).
- ast-visitor.ts: recursive tree visitor, canonical shape hashing (SHA-256 of AST structure), deviation comment detection, node classification.
- file-walker.ts: file discovery respecting .gitignore, node_modules, dist, build, .next exclusions, sorted for deterministic order.
- audit-log.ts: timestamped logging of every API call to .ruleprobe/semantic-log/.
- index.ts: orchestrator wiring local extraction, remote analysis, and result integration.
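The license key precedence described for config.ts (CLI flag, then RULEPROBE_LICENSE_KEY, then .ruleprobe/config.json) could be sketched roughly as follows. The function name, the KeySources shape, and the treatment of empty strings are assumptions for illustration, not the module's actual API.

```typescript
// Sketch of the stated precedence: CLI flag > env var > config file.
// All names here are hypothetical.
interface KeySources {
  cliFlag?: string; // value passed via --license-key, if any
  env: Record<string, string | undefined>; // e.g. process.env
  configFile?: { licenseKey?: string }; // parsed .ruleprobe/config.json
}

function resolveLicenseKey(sources: KeySources): string | undefined {
  const candidates = [
    sources.cliFlag,
    sources.env["RULEPROBE_LICENSE_KEY"],
    sources.configFile?.licenseKey,
  ];
  // Assumption: empty strings count as "not provided" and fall through.
  return candidates.find((value) => value !== undefined && value !== "");
}

const fromEnv = resolveLicenseKey({
  env: { RULEPROBE_LICENSE_KEY: "rp-from-env" },
  configFile: { licenseKey: "rp-from-config" },
});
const fromFlag = resolveLicenseKey({
  cliFlag: "rp-from-flag",
  env: { RULEPROBE_LICENSE_KEY: "rp-from-env" },
});
console.log(fromEnv, fromFlag);
```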
Privacy guarantee: only numeric vectors, opaque hashes, boolean flags, and rule text are transmitted. Verified by automated privacy test against excalidraw (626 files) and PostHog (7,160 files). See verification/e2e-verification-report.md section 5.
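To illustrate why an opaque sub-tree hash carries no identifiers, here is a minimal sketch of structure-only hashing. The ShapeNode type and the serialization format are assumptions; the notes above only state that the hash is a SHA-256 of AST structure.

```typescript
import { createHash } from "node:crypto";

// Minimal stand-in for an AST node: only the node type and children are kept.
interface ShapeNode {
  type: string;
  children: ShapeNode[];
}

// Serialize structure only (node types and nesting), never names or literal values.
function canonicalShape(node: ShapeNode): string {
  return `${node.type}(${node.children.map(canonicalShape).join(",")})`;
}

function shapeHash(node: ShapeNode): string {
  return createHash("sha256").update(canonicalShape(node)).digest("hex");
}

const leaf = (type: string): ShapeNode => ({ type, children: [] });

// Two declarations with (conceptually) different variable names but the same shape.
const declA: ShapeNode = { type: "lexical_declaration", children: [leaf("identifier"), leaf("number")] };
const declB: ShapeNode = { type: "lexical_declaration", children: [leaf("identifier"), leaf("number")] };
// Same outer node, different initializer kind.
const declC: ShapeNode = { type: "lexical_declaration", children: [leaf("identifier"), leaf("string")] };

console.log(shapeHash(declA) === shapeHash(declB), shapeHash(declA) === shapeHash(declC));
```

Because the input to the hash contains node types only, two files with identical structure produce identical hashes regardless of what their variables are called.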
2. CLI semantic flags (new)
Six new flags on the analyze command:
| Flag | Description |
|---|---|
| --semantic | Enable semantic analysis (requires license key) |
| --license-key <key> | License key for the semantic tier |
| --max-llm-calls <n> | Cap LLM calls per analysis (default: 20) |
| --no-cache | Disable profile caching |
| --semantic-log | Print what was sent/received to stdout |
| --cost-report | Show token cost breakdown |
Without --semantic, the analyze command runs deterministic analysis only (unchanged behavior). If the license key is invalid or the API is unreachable, semantic analysis is skipped gracefully and deterministic results are still returned.
3. Batch AST verifier (performance fix)
Problem: v2.0.0 parsed every file once per AST rule, yielding O(rules * files) ts-morph parse calls. On PostHog (7,000+ files, ~30 AST rules), this was prohibitively slow.
Fix: new ast-verifier-batch.ts creates one ts-morph Project, parses each file once, runs every non-type-aware rule against it, then discards the SourceFile. The number of parse calls drops from O(rules * files) to O(files).
The verifier router (src/verifier/index.ts) now collects all AST rules, runs them through the batch verifier in a single pass, then routes remaining rule types individually.
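The difference between the two loop orders can be seen in a toy model where parsing is simulated and counted. This is a sketch of the idea only; the types and function names are hypothetical and no ts-morph calls are made.

```typescript
// Illustrative stand-ins: parsing is simulated so the number of parses can be observed.
interface SketchRule {
  id: string;
  check: (ast: string) => boolean;
}

let parseCalls = 0;
function fakeParse(file: string): string {
  parseCalls += 1;
  return `ast:${file}`;
}

// v2.0.0-style loop: every rule re-parses every file.
function verifyPerRule(repoFiles: string[], sketchRules: SketchRule[]): number {
  parseCalls = 0;
  for (const rule of sketchRules) {
    for (const file of repoFiles) rule.check(fakeParse(file));
  }
  return parseCalls;
}

// Batch loop: parse each file once, run all rules, then let the AST go out of scope.
function verifyBatch(repoFiles: string[], sketchRules: SketchRule[]): number {
  parseCalls = 0;
  for (const file of repoFiles) {
    const ast = fakeParse(file);
    for (const rule of sketchRules) rule.check(ast);
  }
  return parseCalls;
}

const repoFiles = ["a.ts", "b.ts", "c.ts"];
const sketchRules: SketchRule[] = [
  { id: "r1", check: (ast) => ast.length > 0 },
  { id: "r2", check: (ast) => ast.startsWith("ast:") },
];
const perRuleParses = verifyPerRule(repoFiles, sketchRules);
const batchParses = verifyBatch(repoFiles, sketchRules);
console.log(perRuleParses, batchParses);
```

The total number of rule checks is the same in both versions; only the expensive parse step is deduplicated.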
4. Tree-sitter WASM stability fix
Problem: creating a new Parser and calling Language.load() for every file caused WASM function table exhaustion on large repos (PostHog: 7,000+ Python files). Parser objects were leaked because parser.delete() was called inconsistently.
Fix: treesitter-loader.ts now caches one Parser instance per language. Language.load() is called once per grammar. The parseWithTreeSitter() return type no longer includes the parser (callers must not delete shared parsers). Tree deletion remains the caller's responsibility.
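A per-language cache along these lines is one way to picture the fix. StubParser stands in for the real tree-sitter Parser, and the real grammar load is asynchronous, so an actual cache would likely hold promises; this synchronous version is a simplification.

```typescript
// Stand-in for a tree-sitter parser; no web-tree-sitter APIs are used here.
class StubParser {
  static created = 0;
  language: string;
  constructor(language: string) {
    this.language = language;
    StubParser.created += 1;
  }
}

const parserByLanguage = new Map<string, StubParser>();

// Return the shared parser for a language, creating it on first use only.
function getSharedParser(language: string): StubParser {
  let parser = parserByLanguage.get(language);
  if (parser === undefined) {
    parser = new StubParser(language); // stands in for grammar load + parser setup
    parserByLanguage.set(language, parser);
  }
  return parser;
}

const sourcePaths = ["a.py", "b.py", "c.go", "d.py"];
for (const sourcePath of sourcePaths) {
  getSharedParser(sourcePath.endsWith(".py") ? "python" : "go");
}
console.log(StubParser.created);
```

Since callers receive a shared instance, none of them may dispose of it, which is why the parser was dropped from the parseWithTreeSitter() result.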
5. UPPER_CASE constant naming check (new matcher)
New AST check in src/ast-checks/naming.ts: verifies that module-scope const declarations with primitive initializers use UPPER_CASE naming. Skips destructured bindings, function expressions, arrow functions, objects, and arrays.
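The scoping rules for this check can be summarized in a small sketch. The ConstDeclaration shape and the exact regular expression are assumptions; the real check works on ts-morph nodes rather than a pre-classified record.

```typescript
// Assumed pattern: uppercase letters and digits, with single underscores between words.
const UPPER_CASE_PATTERN = /^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$/;

type InitializerKind = "primitive" | "function" | "arrow" | "object" | "array";

// Hypothetical, pre-classified view of a const declaration.
interface ConstDeclaration {
  identifier: string;
  moduleScope: boolean;
  destructured: boolean;
  initializer: InitializerKind;
}

// True when the declaration is in scope for the check and breaks the convention.
function violatesUpperCase(decl: ConstDeclaration): boolean {
  const inScope =
    decl.moduleScope && !decl.destructured && decl.initializer === "primitive";
  return inScope && !UPPER_CASE_PATTERN.test(decl.identifier);
}

const okConstant = violatesUpperCase({ identifier: "MAX_RETRIES", moduleScope: true, destructured: false, initializer: "primitive" });
const badConstant = violatesUpperCase({ identifier: "maxRetries", moduleScope: true, destructured: false, initializer: "primitive" });
const skippedArrow = violatesUpperCase({ identifier: "handleClick", moduleScope: true, destructured: false, initializer: "arrow" });
console.log(okConstant, badConstant, skippedArrow);
```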
6. Analyze command decomposition
src/commands/analyze.ts was refactored: formatter functions extracted to src/commands/analyze-formatters.ts (289 lines) to comply with the 300-line file limit. The analyze handler now supports semantic integration, JSON/markdown output for analyze results, and the --threshold flag for CI pass/fail determination.
7. Matcher broadening
Existing matcher files received targeted additions:
- rule-patterns.ts: +19 lines, broader keyword matching for existing patterns
- rule-patterns-extended.ts: +50 lines, additional recognition patterns
- rule-patterns-preference.ts: +4 lines, regex fix for preference pair detection
8. Calibration data
Real-world calibration fixtures added from excalidraw (626 files) and PostHog (7,160 files):
| Metric | excalidraw | PostHog |
|---|---|---|
| Files extracted | 626 | 7,160 |
| Extraction time | 7.5s | 38.6s |
| Unique AST node types | 278 | 298 |
| Sub-tree hashes | 9,262 | 119,254 |
| Topic-matched rules | 24/39 | 45/68 |
| Mean similarity (matched) | 0.9833 | 0.9834 |
Calibrated threshold: 0.85 (pre-calibration default confirmed; all topic-matched rules scored above 0.95 in both repos).
Calibrated weights: Jaccard=0.4, Cosine=0.6 (confirmed; near-identical means across repos of vastly different sizes validates scale-independence).
Full calibration report: tests/fixtures/calibration/CALIBRATION-REPORT.md
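As a rough illustration of how the calibrated weights and threshold might combine, the sketch below blends a Jaccard score with a cosine score. The weights (0.4, 0.6) and threshold (0.85) come from the figures above; applying Jaccard to sub-tree hash sets and cosine to node-type count vectors is an assumption of this sketch, as are all function names.

```typescript
const JACCARD_WEIGHT = 0.4;
const COSINE_WEIGHT = 0.6;
const SIMILARITY_THRESHOLD = 0.85;

// Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|.
function jaccardSimilarity(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  a.forEach((item) => {
    if (b.has(item)) intersection += 1;
  });
  return intersection / (a.size + b.size - intersection);
}

// Cosine similarity of two count vectors; missing entries are treated as 0.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const dims = Math.max(a.length, b.length);
  for (let i = 0; i < dims; i += 1) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

function combinedSimilarity(hashesA: Set<string>, hashesB: Set<string>, countsA: number[], countsB: number[]): number {
  return JACCARD_WEIGHT * jaccardSimilarity(hashesA, hashesB) + COSINE_WEIGHT * cosineSimilarity(countsA, countsB);
}

const score = combinedSimilarity(
  new Set(["h1", "h2", "h3"]),
  new Set(["h1", "h2", "h4"]),
  [3, 1, 0],
  [3, 1, 0],
);
console.log(score, score >= SIMILARITY_THRESHOLD);
```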
Bug fixes (12 root-cause resolutions)
All found during E2E verification against excalidraw and PostHog. Each is a root-cause fix, not a workaround.
| # | Bug | Root Cause | Fix |
|---|---|---|---|
| 1 | Tree-sitter WASM crash on large repos | New Parser per file exhausts WASM function table | Cache one Parser per language in treesitter-loader.ts |
| 2 | O(rules*files) AST performance | Each rule triggered a full ts-morph parse of every file | Batch AST verifier: parse each file once |
| 3 | Matchers not wired to runner | Some v2.0.0 matchers never ran during verification | Connected all matchers through verifier routing |
| 4 | Missing markdown/json analyze output | analyze only output text format | Added format routing for analyze command |
| 5 | Narrow matcher regexes | Some rule patterns not triggering on real instruction text | Broadened keyword matching in 3 pattern files |
| 6 | Enum string comparison mismatch | VariableDeclarationKind compared as wrong type | Import and use VariableDeclarationKind enum directly |
| 7 | analyze.ts exceeds 300-line limit | Formatters inline in handler | Extract to analyze-formatters.ts |
| 8 | Stale parser.delete() in tests | Tests calling delete on shared parser | Remove stale cleanup calls |
| 9 | Client/API header mismatch | client.ts sent license key in body, API expected x-license-key header | Added header to fetch call |
| 10 | AnalyzeResponse shape mismatch | Client expected {verdicts, report}, API returns {report} with report.verdicts nested | Fixed type and read path |
| 11 | JSON format corruption | Semantic summary text appended after JSON.stringify output | Guard semantic summary for non-JSON formats |
| 12 | Test mock shape mismatch | Mocks had verdicts at top level, code reads report.verdicts | Fixed mocks to nest verdicts inside report |
Breaking changes
New verifier return type for tree-sitter
parseWithTreeSitter() no longer returns a parser in its result object. Callers that previously called parser.delete() must remove that call. The parser is now shared and cached internally.
// Before (v2.0.0)
const result = await parseWithTreeSitter(path, lang);
// result: { root, tree, parser }
result.parser.delete();
// After (v3.0.0)
const result = await parseWithTreeSitter(path, lang);
// result: { root, tree }
// parser is cached internally, do NOT delete
Analyze handler is now async
src/commands/analyze.ts handler changed from sync to async to support the semantic pipeline. If you call handleAnalyze() programmatically, await the result.
New CLI options
- --semantic on analyze: enable semantic analysis
- --license-key <key> on analyze: license key
- --max-llm-calls <n> on analyze: cap LLM calls (default: 20)
- --no-cache on analyze: disable profile caching
- --semantic-log on analyze: print API call log
- --cost-report on analyze: show token cost breakdown
- --threshold <number> on analyze: compliance threshold for CI pass/fail (default: 0.8)
New public types
All new types are in src/semantic/types.ts:
- PatternTopic
- StructuralProfile
- FeatureVector
- CrossFileGraph
- SemanticVerdict
- StructuralViolation
- SemanticAnalysisConfig
- QualifierType (re-exported from core types)
- QualifierContext
- SemanticAnalysisReport
- CrossFileFinding
- RawExtractionPayload
- RawFileVector
- ExtractedRulePayload
Note: the semantic module is not re-exported from src/index.ts. Import directly from ruleprobe/dist/semantic/index.js if needed programmatically.
Files created
| File | Lines | Purpose |
|---|---|---|
| src/semantic/types.ts | 155 | Public contract types for semantic tier |
| src/semantic/local-extractor.ts | 261 | Single-pass tree-sitter AST extraction |
| src/semantic/client.ts | 164 | HTTP client for API service |
| src/semantic/config.ts | 120 | License key and endpoint resolution |
| src/semantic/index.ts | 156 | Semantic p... |
v2.0.0
RuleProbe v2.0.0
Released: April 2026
45 files changed, +885 / -105 lines since v1.0.0. 17 new source files, 12 new test files, 572 tests across 52 test files.
What changed
v1.0.0 could only verify naming conventions. A real-world audit against 8 repos (next.js, langchain, excalidraw, zed, elasticsearch, codex, cline, PostHog) found 98% of instruction file statements were unverifiable. v2.0.0 closes that gap with four new matcher categories, compliance scoring, multi-file analysis, structured extraction, and new report formats.
Breaking changes
Compliance scoring replaces binary pass/fail
RuleResult gains a compliance field (number, 0 to 1). Deterministic checks return 0 or 1. Pattern checks (prefer X over Y) return the ratio. Coverage checks (test colocation) return the percentage of source files with tests.
DEFAULT_COMPLIANCE_THRESHOLD is 0.8. The --threshold CLI option controls pass/fail determination.
// Before (v1)
interface RuleResult {
rule: Rule;
passed: boolean;
evidence: Evidence[];
}
// After (v2)
interface RuleResult {
rule: Rule;
passed: boolean;
compliance: number; // 0-1, new
evidence: Evidence[];
}
All existing verifiers updated. Code consuming RuleResult needs no changes unless it checked the shape directly, but the compliance field is now always present.
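How a compliance score turns into pass/fail can be sketched as below. The 0.8 default is stated above; whether the comparison is inclusive (>=) is an assumption, and the ScoredResult type is a simplified stand-in for RuleResult.

```typescript
const DEFAULT_COMPLIANCE_THRESHOLD = 0.8; // value stated in the release notes

// Simplified stand-in for RuleResult.
interface ScoredResult {
  ruleId: string;
  compliance: number; // 0 to 1
}

// Assumption: a rule passes when compliance meets or exceeds the threshold.
function passesThreshold(result: ScoredResult, threshold: number = DEFAULT_COMPLIANCE_THRESHOLD): boolean {
  return result.compliance >= threshold;
}

const preferConst: ScoredResult = { ruleId: "prefer-const", compliance: 0.85 };
console.log(passesThreshold(preferConst), passesThreshold(preferConst, 0.9));
```

Deterministic checks, which only ever report 0 or 1, behave exactly like the old binary result under any threshold strictly between 0 and 1.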
Structured rule extraction
Rule gains two new fields:
interface Rule {
// ... existing fields unchanged
section?: string; // markdown header the rule was found under
qualifier?: QualifierType;
}
QualifierType is a new union:
type QualifierType =
| 'always' // "always use", "must", "required", or no qualifier keyword
| 'prefer' // "prefer", "favor", "default to", "instead of"
| 'when-possible' // "when possible", "where feasible", "ideally"
| 'avoid-unless' // "avoid unless", "only when necessary", "except when"
| 'try-to' // "try to", "aim for", "should generally"
| 'never' // "never", "do not", "must not", "forbidden"
;
Detection is deterministic keyword/phrase matching during the extraction pass. No NLP, no LLM. Rules with no qualifier keyword default to 'always'.
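A deterministic phrase matcher of this kind might look like the following sketch. The phrases are taken from the comments on the union above; the order in which qualifiers are checked and the use of plain substring matching are assumptions, and the real qualifier-detector.ts may differ.

```typescript
type QualifierType = "always" | "prefer" | "when-possible" | "avoid-unless" | "try-to" | "never";

// More specific qualifiers are listed first so that "must not" wins over "must".
const QUALIFIER_PHRASES: Array<[QualifierType, string[]]> = [
  ["avoid-unless", ["avoid unless", "only when necessary", "except when"]],
  ["never", ["never", "do not", "must not", "forbidden"]],
  ["when-possible", ["when possible", "where feasible", "ideally"]],
  ["try-to", ["try to", "aim for", "should generally"]],
  ["prefer", ["prefer", "favor", "default to", "instead of"]],
  ["always", ["always use", "must", "required"]],
];

function detectQualifier(instruction: string): QualifierType {
  const text = instruction.toLowerCase();
  for (const [qualifier, phrases] of QUALIFIER_PHRASES) {
    if (phrases.some((phrase) => text.includes(phrase))) return qualifier;
  }
  return "always"; // no qualifier keyword found
}

console.log(
  detectQualifier("Never use any"),
  detectQualifier("Prefer const over let"),
  detectQualifier("Use camelCase for variables"),
);
```

Substring matching keeps the sketch short but can misfire on words that merely contain a keyword; a stricter version would match on word boundaries.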
Expanded category and verifier unions
// New categories
type RuleCategory =
| 'naming' | 'forbidden-pattern' | 'structure' | 'test-requirement'
| 'import-pattern' | 'error-handling' | 'type-safety' | 'code-style'
| 'dependency'
| 'preference' // new
| 'file-structure' // new
| 'tooling' // new
| 'testing' // new
;
// New verifier types
type VerifierType = 'ast' | 'regex' | 'filesystem' | 'treesitter'
| 'preference' // new
| 'tooling' // new
;
Exhaustive switch statements and Record<RuleCategory, ...> types need updating.
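For consumers wondering what "need updating" means in practice, this sketch shows the usual exhaustiveness pattern applied to the v2.0.0 VerifierType union. The labels and the engineLabel function are hypothetical; only the union members come from the notes above.

```typescript
type VerifierType = "ast" | "regex" | "filesystem" | "treesitter" | "preference" | "tooling";

// Compile-time guard: reaching this with a non-never value is a type error.
function assertNever(value: never): never {
  throw new Error(`Unhandled verifier type: ${String(value)}`);
}

// Hypothetical labels, for illustration only.
function engineLabel(type: VerifierType): string {
  switch (type) {
    case "ast":
      return "AST";
    case "regex":
      return "regex";
    case "filesystem":
      return "file system";
    case "treesitter":
      return "tree-sitter";
    case "preference": // new in v2.0.0
      return "prefer-pair counting";
    case "tooling": // new in v2.0.0
      return "package.json/lockfile/config";
    default:
      return assertNever(type);
  }
}

// A Record keyed by the union also fails to compile if a member is missing.
const resultsByEngine: Record<VerifierType, number> = {
  ast: 0, regex: 0, filesystem: 0, treesitter: 0, preference: 0, tooling: 0,
};
console.log(engineLabel("tooling"), Object.keys(resultsByEngine).length);
```

Removing either of the two new cases makes the default branch a compile error, which is the failure mode existing exhaustive switches hit after upgrading.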
New report format values
ReportFormat gains three new values: 'summary', 'detailed', 'ci'. Existing formats (text, json, markdown, rdjson) behave identically.
New features
Prefer-pattern matchers (category: preference)
The most common instruction type across all audited repos. Extracts "prefer X over Y", "use X instead of Y", "X over Y", "favor X over Y" patterns and counts occurrences of both sides via ts-morph AST analysis.
8 prefer-pairs ship in v2.0.0:
| Pair | Preferred | Alternative |
|---|---|---|
| const-vs-let | const | let |
| named-vs-default-exports | named exports | default exports |
| interface-vs-type | interface | type alias |
| async-await-vs-then | async/await | .then() chains |
| arrow-vs-function-declarations | arrow functions | function declarations |
| template-literals-vs-concatenation | template literals | string concatenation |
| optional-chaining-vs-nested-conditionals | optional chaining (?.) | nested conditionals |
| functional-vs-class-components | functional components | class components |
Returns compliance as a ratio (e.g., 0.85 = 85% preferred usage). Adding a new pair requires only adding an entry to the PREFER_PAIRS array in src/verifier/prefer-pairs.ts. No other code changes needed.
If a pair references a pattern without a corresponding AST query, the result reports it as "detected but not yet verifiable" rather than silently dropping it.
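The compliance ratio for a prefer-pair reduces to simple arithmetic over the two occurrence counts. In the sketch below, the PairCounts type and the choice to score 1 when neither side occurs are assumptions; the actual entry shape in PREFER_PAIRS is not shown in these notes.

```typescript
// Occurrence counts for one prefer-pair, e.g. const vs let.
interface PairCounts {
  preferred: number;
  alternative: number;
}

// Ratio of preferred usage. Assumption: with no occurrences of either side
// there is nothing to violate, so the score is 1.
function preferenceCompliance(counts: PairCounts): number {
  const total = counts.preferred + counts.alternative;
  return total === 0 ? 1 : counts.preferred / total;
}

// 17 const declarations and 3 let declarations.
const constVsLet = preferenceCompliance({ preferred: 17, alternative: 3 });
console.log(constVsLet);
```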
File/path existence matchers (category: file-structure)
5 matchers for instructions referencing project structure:
- tests-dir: "Tests go in __tests__/" (directory must exist and contain files)
- components-dir: "Components live in src/components/"
- env-file: "Use .env.local for local config"
- module-index: "Every module needs an index.ts" (checks all module directories, returns compliance ratio)
- src-dir: "Source code in src/"
Dependency/tooling matchers (category: tooling)
9 matchers checking package.json, lockfiles, and config files:
- Package managers: pnpm, yarn, bun (checks lockfile presence, flags competing lockfiles)
- Test frameworks: vitest, jest, pytest (checks config files and package.json dependencies/scripts)
- Tools: eslint, prettier, biome (scans config-like files for references)
When a competing tool is detected alongside the required one (e.g., both pnpm-lock.yaml and package-lock.json), compliance is set to 0.5 and the conflict is reported.
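The competing-lockfile rule can be sketched as a function over the file names found in the project root. The 0.5 score for a conflict is from the text above; returning 0 when the required lockfile is missing, and all names in the code, are assumptions.

```typescript
type PackageManager = "pnpm" | "yarn" | "bun" | "npm";

// Conventional lockfile names (newer bun versions also write bun.lock).
const LOCKFILES: Record<PackageManager, string[]> = {
  pnpm: ["pnpm-lock.yaml"],
  yarn: ["yarn.lock"],
  bun: ["bun.lockb", "bun.lock"],
  npm: ["package-lock.json"],
};

interface LockfileResult {
  compliance: number;
  conflicts: string[]; // competing lockfiles that were found
}

function checkPackageManager(required: PackageManager, rootFiles: string[]): LockfileResult {
  const present = new Set(rootFiles);
  const hasRequired = LOCKFILES[required].some((file) => present.has(file));
  const conflicts: string[] = [];
  for (const manager of Object.keys(LOCKFILES) as PackageManager[]) {
    if (manager === required) continue;
    for (const file of LOCKFILES[manager]) {
      if (present.has(file)) conflicts.push(file);
    }
  }
  // Assumption: a missing required lockfile scores 0.
  if (!hasRequired) return { compliance: 0, conflicts };
  return { compliance: conflicts.length > 0 ? 0.5 : 1, conflicts };
}

const mixed = checkPackageManager("pnpm", ["package.json", "pnpm-lock.yaml", "package-lock.json"]);
console.log(mixed.compliance, mixed.conflicts);
```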
Test pattern matchers (category: testing)
3 matchers for testing conventions:
- colocate-tests: Checks source-to-test colocation ratio across the project
- describe-it-blocks: Verifies test files use describe()/it() structure
- no-console-in-tests: Flags console.log/warn/error in test files
Multi-file project analysis
New top-level function and CLI command:
import { analyzeProject, discoverInstructionFiles } from 'ruleprobe';
const analysis = analyzeProject('/path/to/project');
// analysis.files: per-file extraction results
// analysis.conflicts: cross-file contradictions
// analysis.redundancies: same instruction in multiple files
// analysis.coverageMap: which categories are in which files
discoverInstructionFiles() checks for all recognized instruction file names: CLAUDE.md, AGENTS.md, .cursorrules, .github/copilot-instructions.md, GEMINI.md, .windsurfrules. The list is a typed constant (INSTRUCTION_FILE_NAMES) for easy extension.
CLI: ruleprobe analyze <project-dir> [--format text|json] [--output path]
Report formats
- --format summary: Compact table with per-category pass/total/score, designed as the default CLI output
- --format detailed: Full per-rule breakdown with compliance percentages, code locations, and evidence
- --format ci: Minimal key=value output with GitHub Actions ::error annotations for failures
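For readers unfamiliar with GitHub Actions annotations, the ci format's general shape could be approximated like this. The ::error file=...,line=...::message syntax and its escaping rules are GitHub's workflow-command format; the key names (passed, total) and the CiFailure type are hypothetical, and RuleProbe's actual output may be laid out differently.

```typescript
interface CiFailure {
  file: string;
  line: number;
  message: string;
}

// Escaping for workflow-command message data.
function escapeData(value: string): string {
  return value.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

// Property values additionally escape ':' and ','.
function escapeProperty(value: string): string {
  return escapeData(value).replace(/:/g, "%3A").replace(/,/g, "%2C");
}

function toErrorAnnotation(failure: CiFailure): string {
  return `::error file=${escapeProperty(failure.file)},line=${failure.line}::${escapeData(failure.message)}`;
}

// Hypothetical key names for the key=value header lines.
function formatCi(passed: number, total: number, failures: CiFailure[]): string {
  const header = [`passed=${passed}`, `total=${total}`];
  return [...header, ...failures.map(toErrorAnnotation)].join("\n");
}

const ciOutput = formatCi(11, 12, [
  { file: "src/a.ts", line: 12, message: "no-any: explicit any found" },
]);
console.log(ciOutput);
```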
New CLI options
- --threshold <number> on verify: compliance threshold (0-1) for pass/fail determination (default: 0.8)
- ruleprobe analyze <project-dir>: discover and analyze all instruction files in a project
New public API exports
Functions: analyzeProject, discoverInstructionFiles
Types: QualifierType, ProjectAnalysis, FileAnalysis, CrossFileConflict, CrossFileRedundancy
Constants: INSTRUCTION_FILE_NAMES, DEFAULT_COMPLIANCE_THRESHOLD
Stats
| Metric | v1.0.0 | v2.0.0 |
|---|---|---|
| Source files | 75 | 92 |
| Source lines | 8,607 | 11,115 |
| Test files | 40 | 52 |
| Tests | 434 | 572 |
| Rule matchers | 53 | 78 |
| Rule categories | 9 | 13 |
| Verifier engines | 4 | 6 |
| CLI commands | 6 | 7 |
Files created
| File | Lines | Purpose |
|---|---|---|
| src/parsers/qualifier-detector.ts | 104 | Deterministic qualifier detection from instruction text |
| src/parsers/rule-patterns-preference.ts | 182 | 8 preference matchers |
| src/parsers/rule-patterns-file-structure.ts | 124 | 5 file structure matchers |
| src/parsers/rule-patterns-tooling.ts | 186 | 9 tooling matchers |
| src/parsers/rule-patterns-testing.ts | 76 | 3 testing matchers |
| src/parsers/instruction-patterns.ts | 106 | Instruction candidate regex patterns (extracted from rule-extractor.ts) |
| src/verifier/prefer-pairs.ts | 132 | Prefer-pair definitions and lookup |
| src/verifier/preference-verifier.ts | 300 | AST-based preference counting |
| src/verifier/tooling-verifier.ts | 230 | Package.json/lockfile/config verification |
| src/verifier/file-structure-checks.ts | 219 | Directory/file existence and compliance checks |
| src/verifier/test-regex-checks.ts | 83 | Test file regex checks (describe/it, no-console) |
| src/analyzers/project-analyzer.ts | 239 | Multi-file discovery, conflict/redundancy detection |
| src/analyzers/index.ts | 8 | Barrel export |
| src/reporter/summary.ts | 70 | Compact summary table formatter |
| src/reporter/detailed.ts | 103 | Per-rule detailed breakdown formatter |
| src/reporter/ci.ts | 62 | CI-friendly output with GitHub Actions annotations |
| src/commands/analyze.ts | 108 | Handler for the analyze CLI command |
Files modified
| File | Change |
|---|---|
| src/types.ts | Added 4 categories, 2 verifier types, QualifierType, compliance on RuleResult, section/qualifier on Rule, INSTRUCTION_FILE_NAMES, ProjectAnalysis types, DEFAULT_COMPLIANCE_THRESHOLD, 3 new report format values |
| src/index.ts | Added exports for new types, analyzeProject, discoverInstructionFiles |
| src/parsers/rule-extractor.ts | Imports 8 matcher sources + qualifier detector; attaches section/qualifier to rules; extracted instruction patterns to separate file |
| src/verifier/index.ts | Routes preference and tooling verifier types |
| src/verifier/ast-verifier.ts | All returns include compliance field |
| src/verifier/regex-verifier.ts | Added describe-it-structure and no-console-in-tests cases; all returns include compliance |
| src/verifier/file-verifier.ts | Added directory-exists-with-files, file-pattern-exists, module-index-required, `te... |
v1.0.0
14 commits, 100 files changed, +9,017 lines since v0.1.0.
Breaking Changes
- verifyOutput is now async. Returns Promise<RuleResult[]> instead of RuleResult[]. Callers must await it.
- RuleCategory union expanded from 5 to 9 members: added error-handling, type-safety, code-style, dependency. Exhaustive switch statements and Record<RuleCategory, ...> types need updating.
- VerifierType union expanded: added treesitter.
New Features
53 matchers across 9 categories (was 15 matchers, 5 categories). 19 new AST checks, 7 new regex checks, 5 new filesystem checks, 4 new tree-sitter checks covering error handling, type safety, code style, and dependency verification.
User-defined rules via ruleprobe.config.ts. Add custom rules, override extracted rule severity or thresholds, exclude rules entirely. Auto-discovered in the working directory or specified with --config. defineConfig() export provides TypeScript type checking. Supports .ts, .js, .json, and .ruleproberc.json formats.
LLM-assisted extraction (--llm-extract). Sends unparseable instruction lines through an OpenAI-compatible API for a second extraction pass. Extracted rules tagged with extractionMethod: 'llm', confidence: 'medium', severity warning. Requires OPENAI_API_KEY. Opt-in only; default behavior unchanged.
Rubric decomposition (--rubric-decompose). Breaks subjective instructions ("write clean code") into weighted concrete checks (max function length, no magic numbers, etc.) via LLM. Tagged with extractionMethod: 'rubric', confidence: 'low'. Requires OPENAI_API_KEY. Opt-in only.
Agent invocation (ruleprobe run). Invoke Claude via the Agent SDK, capture output, verify, and report in one step. Also supports --watch mode for any agent that writes to a directory. Requires @anthropic-ai/claude-agent-sdk and ANTHROPIC_API_KEY for SDK mode. Watch mode needs no dependencies.
Tree-sitter multi-language support. Python and Go get naming and function-length checks via WASM grammars. Grammar packages (web-tree-sitter, tree-sitter-python, tree-sitter-go) ship as regular dependencies. If loading fails on a platform, tree-sitter checks are skipped and other verifiers still run.
Type-aware checks (--project). Pass a tsconfig.json to enable cross-file type analysis: implicit any detection through aliases, unused exports, unresolved imports. Falls back to isolated-file parsing automatically if compilation fails.
New CLI Flags
- --llm-extract on parse and verify
- --rubric-decompose on verify
- --config on verify, compare, and run
- --project on verify and run
New Public API Exports
Functions: defineConfig, loadConfig, applyConfig, extractWithLlm, createOpenAiProvider, buildAgentConfig, invokeAgent, isAgentSdkAvailable, hasAgentOutput, watchForCompletion, countCodeFiles
Types: VerifyOptions, RuleProbeConfig, CustomRule, RuleOverride, LlmProvider, LlmRuleCandidate, LlmExtractionResult, LlmExtractOptions, OpenAiProviderConfig, AgentInvocationConfig, RunOptions, InvocationResult, WatchOptions, WatchResult
Resolved Limitations
Every limitation documented in v0.1.0 has been addressed:
- "TypeScript and JavaScript only": Python and Go via tree-sitter.
- "No subjective evaluation": --rubric-decompose decomposes subjective rules into measurable proxies.
- "No automated agent invocation": ruleprobe run with Claude SDK and watch mode.
- "Conservative extraction (15 matchers)": 53 matchers, plus --llm-extract for the remainder.
- "Type-level checks are limited": --project enables TypeChecker-dependent analysis.
Stats
| Metric | v0.1.0 | v1.0.0 |
|---|---|---|
| Source files | 30 | 75 |
| Source lines | 3,328 | 8,607 |
| Test files | 13 | 27 |
| Rule matchers | 15 | 53 |
| Rule categories | 5 | 9 |
| Verifier engines | 3 | 4 |
| CLI commands | 5 | 6 |
| Public API exports | 15 | 40 |
v0.1.0
RuleProbe v0.1.0 — the first release. Parse AI agent instruction files, verify output against extracted rules, get deterministic adherence reports.
What it does
Give RuleProbe an instruction file (CLAUDE.md, AGENTS.md, .cursorrules, copilot-instructions.md, GEMINI.md, .windsurfrules) and a directory of agent-generated code. It extracts machine-verifiable rules, runs them against the code, and tells you exactly which ones passed and which ones failed, with file paths and line numbers.
No LLM evaluation. No judgment calls. Same input, same output, every time.
Install
npm install -g ruleprobe
Highlights
Parser: 15 rule matchers across 5 categories (naming, forbidden-pattern, structure, test-requirement, import-pattern). Rules it can't confidently classify are reported as unparseable so you know what was skipped.
Verifiers: AST checks via ts-morph (camelCase, PascalCase, no any, no console.log, named exports, JSDoc, import patterns), file system checks (kebab-case names, test file existence), and regex checks (line length, file length).
CLI: 5 commands.
- ruleprobe parse: extract rules from an instruction file
- ruleprobe verify: check agent output against those rules
- ruleprobe compare: side-by-side comparison across agents
- ruleprobe tasks / ruleprobe task <id>: built-in task templates
Reports: text (terminal), JSON (CI), markdown (publishing), rdjson (reviewdog).
GitHub Action: composite action you can drop into any repo. Runs on every PR, posts results as a comment, supports reviewdog for inline annotations. No API keys beyond GITHUB_TOKEN.
Structured exit codes: 0 all passed, 1 violations found, 2 execution error.
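A wrapper script consuming these exit codes might map outcomes as in the sketch below. The RunOutcome type and function name are hypothetical; only the three code values and their meanings come from the line above.

```typescript
// Exit codes as documented: 0 all passed, 1 violations found, 2 execution error.
type RunOutcome =
  | { kind: "ok"; violations: number }
  | { kind: "error"; reason: string };

function exitCodeFor(outcome: RunOutcome): 0 | 1 | 2 {
  if (outcome.kind === "error") return 2;
  return outcome.violations > 0 ? 1 : 0;
}

console.log(
  exitCodeFor({ kind: "ok", violations: 0 }),
  exitCodeFor({ kind: "ok", violations: 3 }),
  exitCodeFor({ kind: "error", reason: "instruction file not found" }),
);
```

Separating code 2 from code 1 lets CI distinguish "the code broke a rule" from "the tool could not run", which usually warrant different handling.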
Programmatic API: parseInstructionFile, verifyOutput, generateReport, formatReport, extractRules.
Security: never executes scanned code, never makes network calls, path traversal protection, all dependencies pinned to exact versions.
Numbers
- 30 source files, ~3,300 lines of TypeScript
- 206 tests across 23 test files
- 3 task templates: rest-endpoint, utility-module, react-component
Known limitations
- TypeScript and JavaScript only (AST checks use ts-morph)
- No subjective rule evaluation
- No automated agent invocation (planned for v0.2.0)
- Conservative extraction: prefers skipping rules over misclassifying them