
Releases: moonrunnerkc/ruleprobe

v4.5.0

moonrunnerkc released this 08 May 18:51

v4.5.0 - Pivot to ESLint Config Translation

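RuleProbe's core value proposition is now translate, detect drift, extract: it turns instruction files into ESLint configs, reports where an existing ESLint config has drifted from an instruction file, and extracts rules from an ESLint config back into instruction-file markdown.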

New Commands

  • lint-config - Translates an instruction file into a flat or legacy ESLint config
  • drift - Compares an instruction file against an existing ESLint config and reports mismatches
  • extract - Parses an ESLint config and emits a markdown rules section for instruction files
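Conceptually, the drift check is a diff between two rule maps. The sketch below assumes both sides have already been reduced to ESLint rule names and severities; `RuleLevel`, `DriftReport`, and `detectDrift` are hypothetical names, not RuleProbe's API, and it ignores rule options, which real configs also carry.

```typescript
// Hypothetical sketch: diff the ESLint rules implied by an instruction file
// against the rules an existing ESLint config sets. Names are illustrative.
type RuleLevel = 'off' | 'warn' | 'error';

interface DriftReport {
  missing: string[]; // implied by the instructions, absent from the config
  mismatched: { rule: string; expected: RuleLevel; actual: RuleLevel }[];
  extra: string[]; // set in the config, not implied by the instructions
}

const has = (obj: object, key: string): boolean => Object.prototype.hasOwnProperty.call(obj, key);

function detectDrift(
  expected: Record<string, RuleLevel>,
  actual: Record<string, RuleLevel>,
): DriftReport {
  const report: DriftReport = { missing: [], mismatched: [], extra: [] };
  for (const [rule, level] of Object.entries(expected)) {
    if (!has(actual, rule)) {
      report.missing.push(rule);
      continue;
    }
    const actualLevel = actual[rule] as RuleLevel;
    if (actualLevel !== level) report.mismatched.push({ rule, expected: level, actual: actualLevel });
  }
  for (const rule of Object.keys(actual)) {
    if (!has(expected, rule)) report.extra.push(rule);
  }
  return report;
}

// The instructions forbid `any` and console output; the config only warns on console.
const drift = detectDrift(
  { '@typescript-eslint/no-explicit-any': 'error', 'no-console': 'error' },
  { 'no-console': 'warn', eqeqeq: 'error' },
);
console.log(drift);
```

Reporting "extra" rules separately keeps the check one-directional by default: a config that is stricter than the instructions is not necessarily wrong, so a caller can choose whether to treat it as drift.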

Breaking Changes

  • Removed compare command - agent comparison is no longer a primary use case
  • Removed tasks and task commands - task template listing and printing removed
  • Removed run command - agent invocation via the Claude Agent SDK removed
  • Removed runner module from public API - buildAgentConfig, invokeAgent, watchForCompletion, countCodeFiles are no longer exported
  • Removed formatComparisonMarkdown from reporter
  • Deprecated verify command - still works, but is no longer the primary focus
  • RuleCategory narrowed - removed test-requirement, dependency, preference, file-structure, tooling, testing, workflow
  • VerifierType narrowed - removed treesitter, preference, tooling, config-file, git-history
  • 67 unmappable matchers removed - only 34 ESLint-mappable matchers remain across 7 categories

Matcher Audit

Categories remaining: naming, forbidden-pattern, structure, import-pattern, error-handling, type-safety, code-style (plus agent-behavior for semantic analysis).

All 34 remaining matchers produce valid ESLint rule entries. Unmappable rules (test requirements, project config, git conventions) are reported as comments in lint-config output.
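To illustrate the comment-for-unmappable behavior, here is a sketch that renders extracted rules as flat-config source text. The `ExtractedRule` shape and `renderFlatConfig` are hypothetical; the `export default [ { rules: { ... } } ]` layout is standard ESLint flat config, but RuleProbe's exact output format is not shown in these notes.

```typescript
// Hypothetical sketch: render extracted rules as eslint.config.js source.
// Mappable rules become entries; unmappable ones become comments.
interface ExtractedRule {
  text: string; // original instruction text
  eslintRule?: string; // e.g. 'no-console'; undefined when unmappable
  level?: 'warn' | 'error';
}

function renderFlatConfig(rules: ExtractedRule[]): string {
  const entries: string[] = [];
  const comments: string[] = [];
  for (const rule of rules) {
    if (rule.eslintRule !== undefined) {
      entries.push(`      ${JSON.stringify(rule.eslintRule)}: ${JSON.stringify(rule.level ?? 'error')},`);
    } else {
      // Collapse whitespace so a multi-line instruction stays one comment line.
      comments.push(`  // Not mappable to ESLint: ${rule.text.replace(/\s+/g, ' ').trim()}`);
    }
  }
  return ['export default [', '  {', '    rules: {', ...entries, '    },', '  },', ...comments, '];'].join('\n');
}

const configSource = renderFlatConfig([
  { text: 'Never use console.log', eslintRule: 'no-console' },
  { text: 'Avoid `any`', eslintRule: '@typescript-eslint/no-explicit-any', level: 'warn' },
  { text: 'Every feature needs a test' },
]);
console.log(configSource);
```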

Install

npm install -g ruleprobe@4.5.0

v4.0.0: Open-Source Semantic Tier

moonrunnerkc released this 15 Apr 02:43

RuleProbe v4.0.0

Released: April 2026

Architecture change: three repos to one

v3.0.0 split the semantic analysis across three repositories to protect proprietary IP:

  • ruleprobe (public, MIT): CLI + thin HTTP client
  • ruleprobe-semantic (private): ASPE engine (fingerprinting, similarity, LLM escalation)
  • ruleprobe-api-service (private): HTTP server, license gating, Anthropic proxy

v4.0.0 consolidates everything into the single ruleprobe repo. The semantic engine is now fully open source under MIT. No separate server, no license keys, no private repos.

What moved where

| Source | Destination | Files |
| --- | --- | --- |
| ruleprobe-semantic/src/ | src/semantic/engine/ | 39 |
| ruleprobe-api-service/src/services/anthropic-proxy.ts | src/semantic/anthropic-caller.ts | 1 |
| ruleprobe-semantic/tests/ | tests/semantic/engine/ | 18 |

What was deleted

| File | Reason |
| --- | --- |
| src/semantic/client.ts | HTTP client (replaced by direct engine call) |
| tests/semantic/client.test.ts | HTTP client tests (no longer needed) |
| All ruleprobe-api-service code | Server, routes, license service, rate limiter, SQLite store |

Breaking changes

CLI flag rename

--license-key is removed. Use the ANTHROPIC_API_KEY environment variable instead (the same pattern as --llm-extract with OPENAI_API_KEY).

Before:

ruleprobe analyze ./my-project --semantic --license-key <key>

After:

ANTHROPIC_API_KEY=sk-ant-... ruleprobe analyze ./my-project --semantic

Or pass it explicitly:

ruleprobe analyze ./my-project --semantic --anthropic-key <key>

No API service required

The RULEPROBE_API_ENDPOINT environment variable and apiEndpoint config field are removed. All analysis runs locally.

Config file changes

In .ruleprobe/config.json, replace licenseKey with anthropicApiKey:

Before:

{ "licenseKey": "rp-..." }

After:

{ "anthropicApiKey": "sk-ant-..." }
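The v3.0.0 notes give the license-key lookup order as CLI flag, then environment variable, then .ruleprobe/config.json. Assuming v4.0.0 keeps that precedence for the Anthropic key (these notes do not say so explicitly), resolution could be sketched as below; `resolveAnthropicKey` and `KeySources` are hypothetical.

```typescript
import { existsSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

// Hypothetical sketch. Assumes the v3.0.0 precedence (flag > env var > config
// file) carries over to the Anthropic key; the release notes do not state this.
interface KeySources {
  flag?: string; // value of --anthropic-key
  env: Record<string, string | undefined>; // typically process.env
  projectDir: string; // directory containing .ruleprobe/config.json
}

function resolveAnthropicKey(sources: KeySources): string | undefined {
  if (sources.flag) return sources.flag;
  const fromEnv = sources.env['ANTHROPIC_API_KEY'];
  if (fromEnv) return fromEnv;
  const configPath = join(sources.projectDir, '.ruleprobe', 'config.json');
  if (!existsSync(configPath)) return undefined;
  const parsed: unknown = JSON.parse(readFileSync(configPath, 'utf8'));
  // Reads the anthropicApiKey field shown in the config example above.
  const key = (parsed as { anthropicApiKey?: unknown } | null)?.anthropicApiKey;
  return typeof key === 'string' ? key : undefined;
}

console.log(resolveAnthropicKey({ env: process.env, projectDir: process.cwd() }) ? 'key found' : 'no key');
```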

Data flow change

Before (v3.0.0):

  1. CLI extracts raw vectors locally
  2. CLI sends vectors over HTTP to API service
  3. API service runs semantic engine, calls Anthropic API
  4. API service returns verdicts to CLI

After (v4.0.0):

  1. CLI extracts raw vectors locally (unchanged)
  2. CLI runs semantic engine directly (no network)
  3. Engine calls Anthropic API with user's key (only when LLM needed)
  4. CLI integrates verdicts (unchanged)

Test count

| Repository | Before | After |
| --- | --- | --- |
| ruleprobe | 864 | 1,085+ |
| ruleprobe-semantic | 221 | merged into ruleprobe |
| ruleprobe-api-service | 54 | deleted |
| Total | 1,139 | 1,085+ |

The 54 API service tests were not migrated because they covered HTTP routes, license validation, rate limiting, and SQLite storage, none of which exist in the consolidated architecture.

Migration checklist

  1. Remove RULEPROBE_LICENSE_KEY from environment variables and CI secrets
  2. Set ANTHROPIC_API_KEY where semantic analysis is used
  3. Replace --license-key <key> with --anthropic-key <key> (or just set the env var)
  4. Remove apiEndpoint from .ruleprobe/config.json if present
  5. Replace licenseKey with anthropicApiKey in .ruleprobe/config.json if present
  6. Stop any running ruleprobe-api-service instances

v3.0.0: Semantic Analysis Tier (ASPE)

moonrunnerkc released this 15 Apr 01:24

RuleProbe v3.0.0

Released: April 2026

45 files changed, +3,861 / -251 lines since v2.0.0. 10 new source files, 7 new test files, 864 tests across 68 test files (was 572 across 52 at v2.0.0). 3 commits.

What changed

v2.0.0 delivered deterministic verification with 78 matchers. The real-world gap was twofold: (1) rules like "follow existing patterns" or "maintain consistency" have no deterministic check, and (2) performance collapsed on large codebases (PostHog: 7,000+ files). v3.0.0 adds the semantic analysis tier (ASPE), fixes 12 root-cause bugs found during E2E validation, and delivers a 50% performance improvement on large repos.

Summary of changes

1. Semantic analysis tier (new)

Full client-side integration for the paid ASPE (Adaptive Structural Profile Engine) tier. Eight new source files in src/semantic/:

  • local-extractor.ts: single-pass tree-sitter scanner producing RawExtractionPayload (AST node type counts, nesting depths, opaque sub-tree hashes). No source code, variable names, comments, file paths, or imports leave the machine.
  • client.ts: HTTP client sending raw vectors to the API service, receiving SemanticVerdict[] back. Handles license validation, graceful degradation on network failure, timeout, and retry.
  • config.ts: license key resolution (CLI flag > env var RULEPROBE_LICENSE_KEY > .ruleprobe/config.json), API endpoint configuration.
  • types.ts: public contract types (StructuralProfile, FeatureVector, CrossFileGraph, SemanticVerdict, RawExtractionPayload, etc.).
  • ast-visitor.ts: recursive tree visitor, canonical shape hashing (SHA-256 of AST structure), deviation comment detection, node classification.
  • file-walker.ts: file discovery respecting .gitignore, node_modules, dist, build, .next exclusions, sorted for deterministic order.
  • audit-log.ts: timestamped logging of every API call to .ruleprobe/semantic-log/.
  • index.ts: orchestrator wiring local extraction, remote analysis, and result integration.

Privacy guarantee: only numeric vectors, opaque hashes, boolean flags, and rule text are transmitted. Verified by automated privacy test against excalidraw (626 files) and PostHog (7,160 files). See verification/e2e-verification-report.md section 5.
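The canonical shape hashing attributed to ast-visitor.ts can be illustrated on a toy tree: two snippets with the same structure hash identically no matter what their identifiers are called, which is what lets such hashes travel as opaque values. `ShapeNode`, `canonicalShape`, and `shapeHash` are hypothetical; the real visitor runs over tree-sitter nodes and its exact serialization is not documented here.

```typescript
import { createHash } from 'node:crypto';

// Toy AST node. Only `type` and `children` feed the hash; `text`
// (identifier names, literal values) is deliberately ignored.
interface ShapeNode {
  type: string;
  text?: string | undefined;
  children: ShapeNode[];
}

type Shape = [string, Shape[]];

// Canonical form: nested [type, children] arrays. JSON encoding keeps it
// unambiguous even for anonymous node types such as '(' or ','.
function canonicalShape(node: ShapeNode): Shape {
  return [node.type, node.children.map(canonicalShape)];
}

function shapeHash(node: ShapeNode): string {
  return createHash('sha256').update(JSON.stringify(canonicalShape(node))).digest('hex');
}

const leaf = (type: string, text?: string): ShapeNode => ({ type, text, children: [] });
const callFoo: ShapeNode = { type: 'call_expression', children: [leaf('identifier', 'foo'), leaf('arguments')] };
const callBar: ShapeNode = { type: 'call_expression', children: [leaf('identifier', 'bar'), leaf('arguments')] };

console.log(shapeHash(callFoo) === shapeHash(callBar)); // true: only the names differ
```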

2. CLI semantic flags (new)

Six new flags on the analyze command:

| Flag | Description |
| --- | --- |
| --semantic | Enable semantic analysis (requires license key) |
| --license-key <key> | License key for the semantic tier |
| --max-llm-calls <n> | Cap LLM calls per analysis (default: 20) |
| --no-cache | Disable profile caching |
| --semantic-log | Print what was sent/received to stdout |
| --cost-report | Show token cost breakdown |

Without --semantic, the analyze command runs deterministic analysis only (unchanged behavior). If the license key is invalid or the API is unreachable, semantic analysis is skipped gracefully and deterministic results are still returned.
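The fallback rule above can be sketched as a wrapper in which a semantic-tier failure never discards the deterministic results. `analyzeWithFallback` and `AnalysisResult` are hypothetical stand-ins; the string arrays are placeholders for real rule results.

```typescript
// Hypothetical sketch of the fallback: the semantic tier may fail, but the
// deterministic results are always returned.
interface AnalysisResult {
  deterministic: string[]; // placeholder for deterministic rule results
  semantic?: string[]; // present only when the semantic tier succeeded
  semanticSkipped?: string; // reason the semantic tier was skipped
}

async function analyzeWithFallback(
  runDeterministic: () => Promise<string[]>,
  runSemantic?: () => Promise<string[]>, // undefined when --semantic is not passed
): Promise<AnalysisResult> {
  const deterministic = await runDeterministic();
  if (runSemantic === undefined) return { deterministic };
  try {
    return { deterministic, semantic: await runSemantic() };
  } catch (err) {
    // Invalid key, unreachable API, timeout: skip semantic analysis only.
    return { deterministic, semanticSkipped: err instanceof Error ? err.message : String(err) };
  }
}

analyzeWithFallback(
  async () => ['naming: passed'],
  async () => {
    throw new Error('API unreachable');
  },
).then((result) => console.log(result));
```

Running the deterministic pass first means its results exist before any network work starts, so a slow or failing semantic call can only add information, never remove it.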

3. Batch AST verifier (performance fix)

Problem: v2.0.0 parsed every file once per AST rule, yielding O(rules * files) ts-morph parse calls. On PostHog (7,000+ files, ~30 AST rules), this was prohibitively slow.

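Fix: the new ast-verifier-batch.ts creates one ts-morph Project, parses each file once, runs every non-type-aware rule against it, then discards the SourceFile. The number of parse calls drops from O(rules * files) to O(files); rules still run against every file, but the expensive parse is no longer repeated per rule.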

The verifier router (src/verifier/index.ts) now collects all AST rules, runs them through the batch verifier in a single pass, then routes remaining rule types individually.
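The batch strategy can be sketched independently of ts-morph: parse once per file, run every rule on the parsed result, move on. The toy "parser" below just returns source text and the rules are regexes; `verifyBatch`, `BatchRule`, and `Violation` are hypothetical names, whereas the real verifier parses into a ts-morph SourceFile and excludes type-aware rules.

```typescript
// Hypothetical sketch of the batch strategy: parse each file once, run every
// rule against the parsed result, then drop it before the next file.
interface Violation {
  file: string;
  rule: string;
}

interface BatchRule<Parsed> {
  id: string;
  check: (parsed: Parsed) => boolean; // true when the file passes
}

function verifyBatch<Parsed>(
  files: string[],
  parse: (file: string) => Parsed,
  rules: BatchRule<Parsed>[],
): Violation[] {
  const violations: Violation[] = [];
  for (const file of files) {
    const parsed = parse(file); // exactly one parse per file: O(files) parse calls
    for (const rule of rules) {
      if (!rule.check(parsed)) violations.push({ file, rule: rule.id });
    }
    // `parsed` goes out of scope here; the real verifier discards the SourceFile.
  }
  return violations;
}

let parseCalls = 0;
const sources: Record<string, string> = { 'a.ts': 'const x: any = 1;', 'b.ts': 'console.log(1);' };
const found = verifyBatch(
  Object.keys(sources),
  (file: string): string => {
    parseCalls++;
    return sources[file] ?? '';
  },
  [
    { id: 'no-any', check: (src) => !/:\s*any\b/.test(src) },
    { id: 'no-console', check: (src) => !src.includes('console.') },
  ],
);
console.log(parseCalls, found);
```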

4. Tree-sitter WASM stability fix

Problem: creating a new Parser and calling Language.load() for every file caused WASM function table exhaustion on large repos (PostHog: 7,000+ Python files). Parser objects were leaked because parser.delete() was called inconsistently.

Fix: treesitter-loader.ts now caches one Parser instance per language. Language.load() is called once per grammar. The parseWithTreeSitter() return type no longer includes the parser (callers must not delete shared parsers). Tree deletion remains the caller's responsibility.
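A per-language cache like the one described can be sketched generically. Here `createParser` is a hypothetical stand-in for `new Parser()` plus `Language.load()`, and `ParserCache` is not the actual treesitter-loader.ts code.

```typescript
// Hypothetical sketch: one parser per language, created lazily and reused.
// Caching the promise (not the resolved parser) means concurrent requests
// for the same language share a single grammar load.
class ParserCache<P> {
  private readonly parsers = new Map<string, Promise<P>>();
  private readonly createParser: (lang: string) => Promise<P>;

  constructor(createParser: (lang: string) => Promise<P>) {
    this.createParser = createParser;
  }

  get(lang: string): Promise<P> {
    let parser = this.parsers.get(lang);
    if (parser === undefined) {
      parser = this.createParser(lang); // grammar loaded once per language
      this.parsers.set(lang, parser);
    }
    return parser; // shared instance: callers must not delete it
  }
}

let loads = 0;
const cache = new ParserCache(async (lang: string) => {
  loads++;
  return { lang };
});
Promise.all([cache.get('python'), cache.get('python'), cache.get('go')]).then(() => {
  console.log(loads); // 2
});
```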

5. UPPER_CASE constant naming check (new matcher)

New AST check in src/ast-checks/naming.ts: verifies that module-scope const declarations with primitive initializers use UPPER_CASE naming. Skips destructured bindings, function expressions, arrow functions, objects, and arrays.
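The decision logic of this check can be sketched on a simplified declaration record instead of real ts-morph nodes. `ConstDeclaration`, `violatesUpperCase`, and the exact set of initializer kinds treated as primitive (string, number, boolean, bigint literals) are assumptions for illustration.

```typescript
// Hypothetical sketch of the decision logic; the real check walks ts-morph
// declarations. Which initializers count as "primitive" is an assumption.
type InitializerKind =
  | 'string' | 'number' | 'boolean' | 'bigint' // primitives: checked
  | 'object' | 'array' | 'function' | 'arrow' | 'other'; // skipped

interface ConstDeclaration {
  name: string;
  moduleScope: boolean;
  destructured: boolean;
  initializer: InitializerKind;
}

const PRIMITIVES = new Set<InitializerKind>(['string', 'number', 'boolean', 'bigint']);
const UPPER_CASE = /^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$/;

// Returns true when the declaration violates the UPPER_CASE convention.
function violatesUpperCase(decl: ConstDeclaration): boolean {
  if (!decl.moduleScope || decl.destructured) return false;
  if (!PRIMITIVES.has(decl.initializer)) return false;
  return !UPPER_CASE.test(decl.name);
}

console.log(violatesUpperCase({ name: 'maxRetries', moduleScope: true, destructured: false, initializer: 'number' }));
```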

6. Analyze command decomposition

src/commands/analyze.ts was refactored: formatter functions extracted to src/commands/analyze-formatters.ts (289 lines) to comply with the 300-line file limit. The analyze handler now supports semantic integration, JSON/markdown output for analyze results, and the --threshold flag for CI pass/fail determination.

7. Matcher broadening

Existing matcher files received targeted additions:

  • rule-patterns.ts: +19 lines, broader keyword matching for existing patterns
  • rule-patterns-extended.ts: +50 lines, additional recognition patterns
  • rule-patterns-preference.ts: +4 lines, regex fix for preference pair detection

8. Calibration data

Real-world calibration fixtures added from excalidraw (626 files) and PostHog (7,160 files):

| Metric | excalidraw | PostHog |
| --- | --- | --- |
| Files extracted | 626 | 7,160 |
| Extraction time | 7.5s | 38.6s |
| Unique AST node types | 278 | 298 |
| Sub-tree hashes | 9,262 | 119,254 |
| Topic-matched rules | 24/39 | 45/68 |
| Mean similarity (matched) | 0.9833 | 0.9834 |

Calibrated threshold: 0.85 (pre-calibration default confirmed; all topic-matched rules scored above 0.95 in both repos).

Calibrated weights: Jaccard=0.4, Cosine=0.6 (confirmed; the near-identical means across repos of vastly different sizes validate scale-independence).

Full calibration report: tests/fixtures/calibration/CALIBRATION-REPORT.md
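The notes give the weights and threshold but not the inputs. Assuming Jaccard is taken over sets of sub-tree hashes, cosine over AST node-type count vectors, and the two are combined as a weighted sum, a score could be sketched as follows; `Profile` and `similarity` are hypothetical names, not the engine's API.

```typescript
// Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  for (const item of a) if (b.has(item)) intersection++;
  return intersection / (a.size + b.size - intersection);
}

// Cosine similarity of two sparse count vectors keyed by node type.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [key, value] of a) {
    dot += value * (b.get(key) ?? 0);
    normA += value * value;
  }
  for (const value of b.values()) normB += value * value;
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Weights and threshold from the calibration notes; the weighted-sum form is an assumption.
const JACCARD_WEIGHT = 0.4;
const COSINE_WEIGHT = 0.6;
const THRESHOLD = 0.85;

interface Profile {
  subtreeHashes: Set<string>;
  nodeTypeCounts: Map<string, number>;
}

function similarity(a: Profile, b: Profile): number {
  return JACCARD_WEIGHT * jaccard(a.subtreeHashes, b.subtreeHashes)
    + COSINE_WEIGHT * cosine(a.nodeTypeCounts, b.nodeTypeCounts);
}

const p: Profile = { subtreeHashes: new Set(['h1', 'h2', 'h3']), nodeTypeCounts: new Map([['if_statement', 4], ['call_expression', 10]]) };
const q: Profile = { subtreeHashes: new Set(['h1', 'h2', 'h4']), nodeTypeCounts: new Map([['if_statement', 5], ['call_expression', 9]]) };
const score = similarity(p, q);
console.log(score.toFixed(4), score >= THRESHOLD);
```

In this toy pair the node-type mix is almost identical (cosine near 0.99) but only half the sub-tree hashes overlap, so the combined score lands below 0.85.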

Bug fixes (12 root-cause resolutions)

All found during E2E verification against excalidraw and PostHog. Each is a root-cause fix, not a workaround.

| # | Bug | Root cause | Fix |
| --- | --- | --- | --- |
| 1 | Tree-sitter WASM crash on large repos | New Parser per file exhausts WASM function table | Cache one Parser per language in treesitter-loader.ts |
| 2 | O(rules*files) AST performance | Each rule triggered a full ts-morph parse of every file | Batch AST verifier: parse each file once |
| 3 | Matchers not wired to runner | Some v2.0.0 matchers never ran during verification | Connected all matchers through verifier routing |
| 4 | Missing markdown/json analyze output | analyze only output text format | Added format routing for analyze command |
| 5 | Narrow matcher regexes | Some rule patterns not triggering on real instruction text | Broadened keyword matching in 3 pattern files |
| 6 | Enum string comparison mismatch | VariableDeclarationKind compared as wrong type | Import and use VariableDeclarationKind enum directly |
| 7 | analyze.ts exceeds 300-line limit | Formatters inline in handler | Extract to analyze-formatters.ts |
| 8 | Stale parser.delete() in tests | Tests calling delete on shared parser | Remove stale cleanup calls |
| 9 | Client/API header mismatch | client.ts sent license key in body, API expected x-license-key header | Added header to fetch call |
| 10 | AnalyzeResponse shape mismatch | Client expected {verdicts, report}, API returns {report} with report.verdicts nested | Fixed type and read path |
| 11 | JSON format corruption | Semantic summary text appended after JSON.stringify output | Guard semantic summary for non-JSON formats |
| 12 | Test mock shape mismatch | Mocks had verdicts at top level, code reads report.verdicts | Fixed mocks to nest verdicts inside report |

Breaking changes

New verifier return type for tree-sitter

parseWithTreeSitter() no longer returns a parser in its result object. Callers that previously called parser.delete() must remove that call. The parser is now shared and cached internally.

// Before (v2.0.0)
const result = await parseWithTreeSitter(path, lang);
// result: { root, tree, parser }
result.parser.delete();

// After (v3.0.0)
const result = await parseWithTreeSitter(path, lang);
// result: { root, tree }
// parser is cached internally, do NOT delete it
result.tree.delete(); // tree deletion is still the caller's responsibility

Analyze handler is now async

src/commands/analyze.ts handler changed from sync to async to support the semantic pipeline. If you call handleAnalyze() programmatically, await the result.

New CLI options

  • --semantic on analyze: enable semantic analysis
  • --license-key <key> on analyze: license key
  • --max-llm-calls <n> on analyze: cap LLM calls (default: 20)
  • --no-cache on analyze: disable profile caching
  • --semantic-log on analyze: print API call log
  • --cost-report on analyze: show token cost breakdown
  • --threshold <number> on analyze: compliance threshold for CI pass/fail (default: 0.8)

New public types

All new types are in src/semantic/types.ts:

  • PatternTopic
  • StructuralProfile
  • FeatureVector
  • CrossFileGraph
  • SemanticVerdict
  • StructuralViolation
  • SemanticAnalysisConfig
  • QualifierType (re-exported from core types)
  • QualifierContext
  • SemanticAnalysisReport
  • CrossFileFinding
  • RawExtractionPayload
  • RawFileVector
  • ExtractedRulePayload

Note: the semantic module is not re-exported from src/index.ts. Import directly from ruleprobe/dist/semantic/index.js if needed programmatically.

Files created

| File | Lines | Purpose |
| --- | --- | --- |
| src/semantic/types.ts | 155 | Public contract types for semantic tier |
| src/semantic/local-extractor.ts | 261 | Single-pass tree-sitter AST extraction |
| src/semantic/client.ts | 164 | HTTP client for API service |
| src/semantic/config.ts | 120 | License key and endpoint resolution |
| src/semantic/index.ts | 156 | Semantic p...

v2.0.0

moonrunnerkc released this 13 Apr 18:57

RuleProbe v2.0.0

Released: April 2026

45 files changed, +885 / -105 lines since v1.0.0. 17 new source files, 12 new test files, 572 tests across 52 test files.

What changed

v1.0.0 could only verify naming conventions. A real-world audit against 8 repos (next.js, langchain, excalidraw, zed, elasticsearch, codex, cline, PostHog) found 98% of instruction file statements were unverifiable. v2.0.0 closes that gap with four new matcher categories, compliance scoring, multi-file analysis, structured extraction, and new report formats.

Breaking changes

Compliance scoring replaces binary pass/fail

RuleResult gains a compliance field (number, 0 to 1). Deterministic checks return 0 or 1. Pattern checks (prefer X over Y) return the ratio. Coverage checks (test colocation) return the percentage of source files with tests.

DEFAULT_COMPLIANCE_THRESHOLD is 0.8. The --threshold CLI option controls pass/fail determination.

// Before (v1)
interface RuleResult {
  rule: Rule;
  passed: boolean;
  evidence: Evidence[];
}

// After (v2)
interface RuleResult {
  rule: Rule;
  passed: boolean;
  compliance: number; // 0-1, new
  evidence: Evidence[];
}

All existing verifiers were updated. Code that consumes RuleResult needs no changes unless it checks the object's exact shape; note that the compliance field is now always present.
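As a sketch of how a coverage-style check could feed the threshold: the code below computes a test-colocation ratio and converts it to pass/fail. The `<name>.test.ts` naming convention, the helper names, and the use of `>=` for the threshold comparison are assumptions; `DEFAULT_COMPLIANCE_THRESHOLD` here is a local stand-in for the exported constant.

```typescript
// Local stand-in for the exported constant; `>=` comparison is an assumption.
const DEFAULT_COMPLIANCE_THRESHOLD = 0.8;

// Hypothetical coverage check: fraction of .ts source files that have a
// colocated `<name>.test.ts` file next to them.
function colocationCompliance(files: string[]): number {
  const fileSet = new Set(files);
  const sources = files.filter((f) => f.endsWith('.ts') && !f.endsWith('.test.ts'));
  if (sources.length === 0) return 1; // nothing to check (assumption)
  const covered = sources.filter((f) => fileSet.has(f.replace(/\.ts$/, '.test.ts')));
  return covered.length / sources.length;
}

function toResult(compliance: number, threshold = DEFAULT_COMPLIANCE_THRESHOLD) {
  return { passed: compliance >= threshold, compliance };
}

// 4 of 5 source files have tests: compliance 0.8, exactly at the default threshold.
const files = ['src/a.ts', 'src/a.test.ts', 'src/b.ts', 'src/b.test.ts', 'src/c.ts', 'src/d.ts', 'src/d.test.ts', 'src/e.ts', 'src/e.test.ts'];
console.log(toResult(colocationCompliance(files)));
```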

Structured rule extraction

Rule gains two new fields:

interface Rule {
  // ... existing fields unchanged
  section?: string;     // markdown header the rule was found under
  qualifier?: QualifierType;
}

QualifierType is a new union:

type QualifierType =
  | 'always'        // "always use", "must", "required", or no qualifier keyword
  | 'prefer'        // "prefer", "favor", "default to", "instead of"
  | 'when-possible' // "when possible", "where feasible", "ideally"
  | 'avoid-unless'  // "avoid unless", "only when necessary", "except when"
  | 'try-to'        // "try to", "aim for", "should generally"
  | 'never'         // "never", "do not", "must not", "forbidden"
  ;

Detection is deterministic keyword/phrase matching during the extraction pass. No NLP, no LLM. Rules with no qualifier keyword default to 'always'.
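A minimal sketch of this keyword matching, using the phrase lists from the QualifierType comments above. The group order is an assumption chosen so that more specific phrases win (for example, "must not" is checked before "must"); `detectQualifier` is a hypothetical name, and RuleProbe's actual priority rules are not documented here.

```typescript
// Local copy of the QualifierType union above.
type QualifierType = 'always' | 'prefer' | 'when-possible' | 'avoid-unless' | 'try-to' | 'never';

// Checked top to bottom; the order is an assumption so specific phrases win.
const QUALIFIER_PHRASES: Array<[QualifierType, string[]]> = [
  ['never', ['never', 'do not', 'must not', 'forbidden']],
  ['avoid-unless', ['avoid unless', 'only when necessary', 'except when']],
  ['when-possible', ['when possible', 'where feasible', 'ideally']],
  ['try-to', ['try to', 'aim for', 'should generally']],
  ['prefer', ['prefer', 'favor', 'default to', 'instead of']],
  ['always', ['always use', 'must', 'required']],
];

// Whole-word match, so "never" does not fire on "nevertheless".
function containsPhrase(text: string, phrase: string): boolean {
  const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`\\b${escaped}\\b`).test(text);
}

function detectQualifier(instruction: string): QualifierType {
  const text = instruction.toLowerCase();
  for (const [qualifier, phrases] of QUALIFIER_PHRASES) {
    if (phrases.some((phrase) => containsPhrase(text, phrase))) return qualifier;
  }
  return 'always'; // no qualifier keyword: default per the release notes
}

console.log(detectQualifier('You must not use var'));
```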

Expanded category and verifier unions

// New categories
type RuleCategory =
  | 'naming' | 'forbidden-pattern' | 'structure' | 'test-requirement'
  | 'import-pattern' | 'error-handling' | 'type-safety' | 'code-style'
  | 'dependency'
  | 'preference'      // new
  | 'file-structure'  // new
  | 'tooling'         // new
  | 'testing'         // new
  ;

// New verifier types
type VerifierType = 'ast' | 'regex' | 'filesystem' | 'treesitter'
  | 'preference'  // new
  | 'tooling'     // new
  ;

Exhaustive switch statements and Record<RuleCategory, ...> types need updating.

New report format values

ReportFormat gains three new values: 'summary', 'detailed', 'ci'. Existing formats (text, json, markdown, rdjson) behave identically.

New features

Prefer-pattern matchers (category: preference)

This is the most common instruction type across all audited repos. The matchers extract "prefer X over Y", "use X instead of Y", "X over Y", and "favor X over Y" patterns, then count occurrences of both sides via ts-morph AST analysis.

8 prefer-pairs ship in v2.0.0:

| Pair | Preferred | Alternative |
| --- | --- | --- |
| const-vs-let | const | let |
| named-vs-default-exports | named exports | default exports |
| interface-vs-type | interface | type alias |
| async-await-vs-then | async/await | .then() chains |
| arrow-vs-function-declarations | arrow functions | function declarations |
| template-literals-vs-concatenation | template literals | string concatenation |
| optional-chaining-vs-nested-conditionals | optional chaining (?.) | nested conditionals |
| functional-vs-class-components | functional components | class components |

Returns compliance as a ratio (e.g., 0.85 = 85% preferred usage). Adding a new pair requires only adding an entry to the PREFER_PAIRS array in src/verifier/prefer-pairs.ts. No other code changes needed.

If a pair references a pattern without a corresponding AST query, the result reports it as "detected but not yet verifiable" rather than silently dropping it.
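The table-driven design and the "not yet verifiable" fallback can be sketched as follows. The `PreferPair` fields, the regex counting (a stand-in for the real ts-morph queries), and the `example-unqueried-pair` entry are hypothetical; the actual PREFER_PAIRS schema in src/verifier/prefer-pairs.ts is not shown in these notes.

```typescript
// Hypothetical shape of a PREFER_PAIRS entry; not the real schema.
interface PreferPair {
  id: string;
  preferred: string;
  alternative: string;
  // Returns [preferred, alternative] occurrence counts for one file.
  // Left undefined when no AST query exists yet for the pair.
  count?: (source: string) => [number, number];
}

type PairResult =
  | { id: string; status: 'verified'; compliance: number }
  | { id: string; status: 'detected but not yet verifiable' };

const countMatches = (source: string, pattern: RegExp): number => (source.match(pattern) ?? []).length;

const PREFER_PAIRS: PreferPair[] = [
  {
    id: 'const-vs-let',
    preferred: 'const',
    alternative: 'let',
    // Regex stand-in for the real ts-morph query.
    count: (src) => [countMatches(src, /\bconst\b/g), countMatches(src, /\blet\b/g)],
  },
  // Hypothetical pair with no query yet: reported, not silently dropped.
  { id: 'example-unqueried-pair', preferred: 'X', alternative: 'Y' },
];

function evaluatePair(pair: PreferPair, sources: string[]): PairResult {
  const count = pair.count;
  if (count === undefined) return { id: pair.id, status: 'detected but not yet verifiable' };
  let preferred = 0;
  let alternative = 0;
  for (const src of sources) {
    const [p, a] = count(src);
    preferred += p;
    alternative += a;
  }
  const total = preferred + alternative;
  // No occurrences of either side counts as compliant (an assumption).
  return { id: pair.id, status: 'verified', compliance: total === 0 ? 1 : preferred / total };
}

for (const pair of PREFER_PAIRS) {
  console.log(evaluatePair(pair, ['const a = 1;\nconst b = 2;\nlet c = 3;']));
}
```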

File/path existence matchers (category: file-structure)

5 matchers for instructions referencing project structure:

  • tests-dir: "Tests go in __tests__/" (directory must exist and contain files)
  • components-dir: "Components live in src/components/"
  • env-file: "Use .env.local for local config"
  • module-index: "Every module needs an index.ts" (checks all module directories, returns compliance ratio)
  • src-dir: "Source code in src/"

Dependency/tooling matchers (category: tooling)

9 matchers checking package.json, lockfiles, and config files:

  • Package managers: pnpm, yarn, bun (checks lockfile presence, flags competing lockfiles)
  • Test frameworks: vitest, jest, pytest (checks config files and package.json dependencies/scripts)
  • Tools: eslint, prettier, biome (scans config-like files for references)

When a competing tool is detected alongside the required one (e.g., both pnpm-lock.yaml and package-lock.json), compliance is set to 0.5 and the conflict is reported.
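The lockfile part of this check can be sketched as follows. The lockfile names are the standard ones for each tool and the 0.5 score for a competing lockfile follows the notes; `checkPackageManager` is hypothetical, and scoring a missing required lockfile as 0 is an assumption.

```typescript
// Hypothetical sketch of a package-manager lockfile check.
type PackageManager = 'npm' | 'pnpm' | 'yarn' | 'bun';

const LOCKFILES: Record<PackageManager, string[]> = {
  npm: ['package-lock.json'],
  pnpm: ['pnpm-lock.yaml'],
  yarn: ['yarn.lock'],
  bun: ['bun.lockb', 'bun.lock'],
};

interface ToolingResult {
  compliance: number;
  conflicts: string[]; // competing lockfiles found next to the required one
}

function checkPackageManager(required: PackageManager, rootFiles: string[]): ToolingResult {
  const present = new Set(rootFiles);
  const hasRequired = LOCKFILES[required].some((f) => present.has(f));
  const conflicts = (Object.keys(LOCKFILES) as PackageManager[])
    .filter((pm) => pm !== required)
    .flatMap((pm) => LOCKFILES[pm].filter((f) => present.has(f)));
  if (!hasRequired) return { compliance: 0, conflicts }; // missing lockfile: 0 (assumption)
  return { compliance: conflicts.length > 0 ? 0.5 : 1, conflicts };
}

// The example from the notes: both pnpm-lock.yaml and package-lock.json present.
console.log(checkPackageManager('pnpm', ['package.json', 'pnpm-lock.yaml', 'package-lock.json']));
```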

Test pattern matchers (category: testing)

3 matchers for testing conventions:

  • colocate-tests: Checks source-to-test colocation ratio across the project
  • describe-it-blocks: Verifies test files use describe()/it() structure
  • no-console-in-tests: Flags console.log/warn/error in test files

Multi-file project analysis

New top-level function and CLI command:

import { analyzeProject, discoverInstructionFiles } from 'ruleprobe';

const analysis = analyzeProject('/path/to/project');
// analysis.files: per-file extraction results
// analysis.conflicts: cross-file contradictions
// analysis.redundancies: same instruction in multiple files
// analysis.coverageMap: which categories are in which files

discoverInstructionFiles() checks for all recognized instruction file names: CLAUDE.md, AGENTS.md, .cursorrules, .github/copilot-instructions.md, GEMINI.md, .windsurfrules. The list is a typed constant (INSTRUCTION_FILE_NAMES) for easy extension.

CLI: ruleprobe analyze <project-dir> [--format text|json] [--output path]
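The discovery step amounts to checking a fixed list of known paths under the project root. The sketch below is a hypothetical re-implementation (`findInstructionFiles`) using a local copy of the names listed above; it checks only those exact paths, and whether the real discoverInstructionFiles() searches deeper is not stated here.

```typescript
import { existsSync, mkdtempSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Local copy of the file names listed above; the real list is exported as INSTRUCTION_FILE_NAMES.
const KNOWN_INSTRUCTION_FILES = [
  'CLAUDE.md',
  'AGENTS.md',
  '.cursorrules',
  '.github/copilot-instructions.md',
  'GEMINI.md',
  '.windsurfrules',
] as const;

// Hypothetical re-implementation: returns the known paths that exist under projectDir.
function findInstructionFiles(projectDir: string): string[] {
  return KNOWN_INSTRUCTION_FILES.map((name) => join(projectDir, name)).filter((p) => existsSync(p));
}

// Demo: a temporary project containing only CLAUDE.md.
const demoDir = mkdtempSync(join(tmpdir(), 'ruleprobe-demo-'));
writeFileSync(join(demoDir, 'CLAUDE.md'), '# Rules\n');
console.log(findInstructionFiles(demoDir));
```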

Report formats

  • --format summary: Compact table with per-category pass/total/score, designed as the default CLI output
  • --format detailed: Full per-rule breakdown with compliance percentages, code locations, and evidence
  • --format ci: Minimal key=value output with GitHub Actions ::error annotations for failures
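The `::error` annotations use GitHub Actions' workflow-command syntax (`::error file=...,line=...,title=...::message`), with `%`, CR, and LF escaped in the message and `:` and `,` also escaped in properties. The sketch below applies that syntax; `formatCi`, `Failure`, and the `passed=`/`violations=` key names are hypothetical, since the notes do not list RuleProbe's actual keys.

```typescript
// Hypothetical sketch of CI-style output: key=value summary lines plus one
// GitHub Actions `::error` workflow command per failure.
interface Failure {
  ruleId: string;
  file: string;
  line: number;
  message: string;
}

// Workflow-command escaping as used by GitHub Actions (`%` must be escaped first).
const escapeData = (s: string): string => s.replace(/%/g, '%25').replace(/\r/g, '%0D').replace(/\n/g, '%0A');
const escapeProperty = (s: string): string => escapeData(s).replace(/:/g, '%3A').replace(/,/g, '%2C');

function formatCi(rulesPassed: number, failures: Failure[]): string {
  const lines = [`passed=${rulesPassed}`, `violations=${failures.length}`];
  for (const f of failures) {
    lines.push(
      `::error file=${escapeProperty(f.file)},line=${f.line},title=${escapeProperty(f.ruleId)}::${escapeData(f.message)}`,
    );
  }
  return lines.join('\n');
}

console.log(formatCi(12, [{ ruleId: 'no-console', file: 'src/app.ts', line: 14, message: 'console.log found' }]));
```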

New CLI options

  • --threshold <number> on verify: compliance threshold (0-1) for pass/fail determination (default: 0.8)
  • ruleprobe analyze <project-dir>: discover and analyze all instruction files in a project

New public API exports

Functions: analyzeProject, discoverInstructionFiles

Types: QualifierType, ProjectAnalysis, FileAnalysis, CrossFileConflict, CrossFileRedundancy

Constants: INSTRUCTION_FILE_NAMES, DEFAULT_COMPLIANCE_THRESHOLD

Stats

| Metric | v1.0.0 | v2.0.0 |
| --- | --- | --- |
| Source files | 75 | 92 |
| Source lines | 8,607 | 11,115 |
| Test files | 40 | 52 |
| Tests | 434 | 572 |
| Rule matchers | 53 | 78 |
| Rule categories | 9 | 13 |
| Verifier engines | 4 | 6 |
| CLI commands | 6 | 7 |

Files created

| File | Lines | Purpose |
| --- | --- | --- |
| src/parsers/qualifier-detector.ts | 104 | Deterministic qualifier detection from instruction text |
| src/parsers/rule-patterns-preference.ts | 182 | 8 preference matchers |
| src/parsers/rule-patterns-file-structure.ts | 124 | 5 file structure matchers |
| src/parsers/rule-patterns-tooling.ts | 186 | 9 tooling matchers |
| src/parsers/rule-patterns-testing.ts | 76 | 3 testing matchers |
| src/parsers/instruction-patterns.ts | 106 | Instruction candidate regex patterns (extracted from rule-extractor.ts) |
| src/verifier/prefer-pairs.ts | 132 | Prefer-pair definitions and lookup |
| src/verifier/preference-verifier.ts | 300 | AST-based preference counting |
| src/verifier/tooling-verifier.ts | 230 | Package.json/lockfile/config verification |
| src/verifier/file-structure-checks.ts | 219 | Directory/file existence and compliance checks |
| src/verifier/test-regex-checks.ts | 83 | Test file regex checks (describe/it, no-console) |
| src/analyzers/project-analyzer.ts | 239 | Multi-file discovery, conflict/redundancy detection |
| src/analyzers/index.ts | 8 | Barrel export |
| src/reporter/summary.ts | 70 | Compact summary table formatter |
| src/reporter/detailed.ts | 103 | Per-rule detailed breakdown formatter |
| src/reporter/ci.ts | 62 | CI-friendly output with GitHub Actions annotations |
| src/commands/analyze.ts | 108 | Handler for the analyze CLI command |

Files modified

| File | Change |
| --- | --- |
| src/types.ts | Added 4 categories, 2 verifier types, QualifierType, compliance on RuleResult, section/qualifier on Rule, INSTRUCTION_FILE_NAMES, ProjectAnalysis types, DEFAULT_COMPLIANCE_THRESHOLD, 3 new report format values |
| src/index.ts | Added exports for new types, analyzeProject, discoverInstructionFiles |
| src/parsers/rule-extractor.ts | Imports 8 matcher sources + qualifier detector; attaches section/qualifier to rules; extracted instruction patterns to separate file |
| src/verifier/index.ts | Routes preference and tooling verifier types |
| src/verifier/ast-verifier.ts | All returns include compliance field |
| src/verifier/regex-verifier.ts | Added describe-it-structure and no-console-in-tests cases; all returns include compliance |
| src/verifier/file-verifier.ts | Added directory-exists-with-files, file-pattern-exists, module-index-required, `te...

v1.0.0

moonrunnerkc released this 08 Apr 00:21

14 commits, 100 files changed, +9,017 lines since v0.1.0.

Breaking Changes

  • verifyOutput is now async. Returns Promise<RuleResult[]> instead of RuleResult[]. Callers must await it.
  • RuleCategory union expanded from 5 to 9 members: added error-handling, type-safety, code-style, dependency. Exhaustive switch statements and Record<RuleCategory, ...> types need updating.
  • VerifierType union expanded: added treesitter.

New Features

53 matchers across 9 categories (was 15 matchers, 5 categories). 19 new AST checks, 7 new regex checks, 5 new filesystem checks, 4 new tree-sitter checks covering error handling, type safety, code style, and dependency verification.

User-defined rules via ruleprobe.config.ts. Add custom rules, override extracted rule severity or thresholds, exclude rules entirely. Auto-discovered in the working directory or specified with --config. defineConfig() export provides TypeScript type checking. Supports .ts, .js, .json, and .ruleproberc.json formats.

LLM-assisted extraction (--llm-extract). Sends unparseable instruction lines through an OpenAI-compatible API for a second extraction pass. Extracted rules are tagged with extractionMethod: 'llm', confidence: 'medium', and severity warning. Requires OPENAI_API_KEY. Opt-in only; default behavior is unchanged.

Rubric decomposition (--rubric-decompose). Breaks subjective instructions ("write clean code") into weighted concrete checks (max function length, no magic numbers, etc.) via LLM. The resulting checks are tagged with extractionMethod: 'rubric', confidence: 'low'. Requires OPENAI_API_KEY. Opt-in only.

Agent invocation (ruleprobe run). Invoke Claude via the Agent SDK, capture output, verify, and report in one step. Also supports --watch mode for any agent that writes to a directory. Requires @anthropic-ai/claude-agent-sdk and ANTHROPIC_API_KEY for SDK mode. Watch mode needs no dependencies.

Tree-sitter multi-language support. Python and Go get naming and function-length checks via WASM grammars. Grammar packages (web-tree-sitter, tree-sitter-python, tree-sitter-go) ship as regular dependencies. If loading fails on a platform, tree-sitter checks are skipped and other verifiers still run.

Type-aware checks (--project). Pass a tsconfig.json to enable cross-file type analysis: implicit any detection through aliases, unused exports, unresolved imports. Falls back to isolated-file parsing automatically if compilation fails.

New CLI Flags

  • --llm-extract on parse and verify
  • --rubric-decompose on verify
  • --config on verify, compare, and run
  • --project on verify and run

New Public API Exports

Functions: defineConfig, loadConfig, applyConfig, extractWithLlm, createOpenAiProvider, buildAgentConfig, invokeAgent, isAgentSdkAvailable, hasAgentOutput, watchForCompletion, countCodeFiles

Types: VerifyOptions, RuleProbeConfig, CustomRule, RuleOverride, LlmProvider, LlmRuleCandidate, LlmExtractionResult, LlmExtractOptions, OpenAiProviderConfig, AgentInvocationConfig, RunOptions, InvocationResult, WatchOptions, WatchResult

Resolved Limitations

Every limitation documented in v0.1.0 has been addressed:

  • "TypeScript and JavaScript only": Python and Go via tree-sitter.
  • "No subjective evaluation": --rubric-decompose decomposes subjective rules into measurable proxies.
  • "No automated agent invocation": ruleprobe run with Claude SDK and watch mode.
  • "Conservative extraction (15 matchers)": 53 matchers, plus --llm-extract for the remainder.
  • "Type-level checks are limited": --project enables TypeChecker-dependent analysis.

Stats

| Metric | v0.1.0 | v1.0.0 |
| --- | --- | --- |
| Source files | 30 | 75 |
| Source lines | 3,328 | 8,607 |
| Test files | 13 | 27 |
| Rule matchers | 15 | 53 |
| Rule categories | 5 | 9 |
| Verifier engines | 3 | 4 |
| CLI commands | 5 | 6 |
| Public API exports | 15 | 40 |

v0.1.0

moonrunnerkc released this 06 Apr 22:57

RuleProbe v0.1.0 — the first release. Parse AI agent instruction files, verify output against extracted rules, get deterministic adherence reports.

What it does

Give RuleProbe an instruction file (CLAUDE.md, AGENTS.md, .cursorrules, copilot-instructions.md, GEMINI.md, .windsurfrules) and a directory of agent-generated code. It extracts machine-verifiable rules, runs them against the code, and tells you exactly which ones passed and which ones failed, with file paths and line numbers.

No LLM evaluation. No judgment calls. Same input, same output, every time.

Install

npm install -g ruleprobe

Highlights

Parser: 15 rule matchers across 5 categories (naming, forbidden-pattern, structure, test-requirement, import-pattern). Rules it can't confidently classify are reported as unparseable so you know what was skipped.

Verifiers: AST checks via ts-morph (camelCase, PascalCase, no any, no console.log, named exports, JSDoc, import patterns), file system checks (kebab-case names, test file existence), and regex checks (line length, file length).

CLI: 5 commands.

  • ruleprobe parse — extract rules from an instruction file
  • ruleprobe verify — check agent output against those rules
  • ruleprobe compare — side-by-side comparison across agents
  • ruleprobe tasks / ruleprobe task <id> — built-in task templates

Reports: text (terminal), JSON (CI), markdown (publishing), rdjson (reviewdog).

GitHub Action: composite action you can drop into any repo. Runs on every PR, posts results as a comment, supports reviewdog for inline annotations. No API keys beyond GITHUB_TOKEN.

Structured exit codes: 0 all passed, 1 violations found, 2 execution error.

Programmatic API: parseInstructionFile, verifyOutput, generateReport, formatReport, extractRules.

Security: never executes scanned code, never makes network calls, path traversal protection, all dependencies pinned to exact versions.
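Path traversal protection usually comes down to resolving a requested path against the scan root and rejecting anything that lands outside it. The sketch below shows that common technique with Node's `path` module; `isInsideRoot` is hypothetical and not necessarily how RuleProbe implements its guard.

```typescript
import { isAbsolute, relative, resolve, sep } from 'node:path';

// Hypothetical sketch of a path-traversal guard: accept a requested path only
// if it resolves to a location inside the scanned root.
function isInsideRoot(root: string, requested: string): boolean {
  const rootAbs = resolve(root);
  const rel = relative(rootAbs, resolve(rootAbs, requested));
  if (rel === '') return true; // the root itself
  // `..` or `../x` escapes the root; an absolute result means another drive on Windows.
  return rel !== '..' && !rel.startsWith('..' + sep) && !isAbsolute(rel);
}

console.log(isInsideRoot('/repo', '../etc/passwd')); // false
```

Checking for `'..' + sep` rather than a bare `'..'` prefix keeps legitimate names such as `..notes.md` inside the root. This check is purely lexical: it does not follow symlinks, which would additionally need `realpath` on both sides.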

Numbers

  • 30 source files, ~3,300 lines of TypeScript
  • 206 tests across 23 test files
  • 3 task templates: rest-endpoint, utility-module, react-component

Known limitations

  • TypeScript and JavaScript only (AST checks use ts-morph)
  • No subjective rule evaluation
  • No automated agent invocation (planned for v0.2.0)
  • Conservative extraction: prefers skipping rules over misclassifying them
