Catch visual regressions with your AI agent - diff two renderings of a site, read one tiny report, open an image only when something breaks.
A portable, agent-installable visual regression / screenshot-diff skill powered by odiff + Playwright.
Install • Why • How it works • Usage • Commands • Config • FAQ
visual-diff is an Agent Skill for AI coding assistants. It captures every page in a list from two URLs - a local build and a live/staging build - diffs each pair with the native odiff engine, and writes a compact pass/fail table. Your agent reads the table and only opens a highlighted diff image for the pages that actually changed. No more burning context on full-page screenshots to prove two sites match.
Use it to verify a migration or port, check visual parity after a refactor, QA a redesign before shipping, or run pixel-accurate visual regression testing straight from your agent.
Works in Claude Code, Claude Desktop, OpenAI Codex, Google Antigravity, Cursor, and opencode - plus any agent that can read a Markdown skill and run Node.
The fastest way to install visual-diff is to let your AI agent do it. Paste this one line into Claude Code, Codex, Cursor, opencode, Antigravity, or Claude Desktop:
If you are an agent, read https://github.com/wpfyorg/visual-diff-skill/blob/master/LLM-INSTALL.md and install this stack.
The agent reads LLM-INSTALL.md - a step-by-step guide written for the agent - then places the skill in the right folder for your tool, installs the dependencies, and verifies it works. If it gets stuck, point it at skills/visual-diff/SKILL.md, which is self-describing.
If your tool supports the open npx skills add standard (Claude Code, Codex, Cursor, opencode, and more):
npx skills add https://github.com/wpfyorg/visual-diff-skill --skill visual-diff
cd ~/.claude/skills/visual-diff && npm install && npx playwright install chromium # install runtime depsgit clone https://github.com/wpfyorg/visual-diff-skill
cp -R visual-diff-skill/skills/visual-diff ~/.claude/skills/visual-diff # adjust path for your tool
cd ~/.claude/skills/visual-diff
npm install && npx playwright install chromiumThen copy config.example.json into your project as visual-diff.config.json and edit it.
| Tool | Where the skill goes | Trigger |
|---|---|---|
| Claude Code | ~/.claude/skills/visual-diff/ (global) or .claude/skills/visual-diff/ (project) |
/visual-diff |
| Claude Desktop | Settings → Capabilities → Skills → add the folder | invoke "visual-diff" |
| OpenAI Codex | .agents/skills/visual-diff/ (+ a /visual-diff prompt) |
/visual-diff |
| Google Antigravity | .agents/skills/visual-diff/ |
/visual-diff |
| Cursor | .cursor/commands/visual-diff.md or .cursor/rules/ |
/visual-diff |
| opencode | .opencode/command/visual-diff.md or ~/.config/opencode/ |
/visual-diff |
| Any Markdown-skill agent | wherever it reads skills; just run the Node script | run the script |
The skill is just a folder of Markdown + Node - no proprietary format. If your agent can read a SKILL.md and shell out to node, it can run visual-diff.
We were porting a hand-authored static site into a WordPress/Bricks Builder build, page by page. Every time we changed a section we needed to answer one question: does the live build still match the source?
The naive way an AI agent does this is brutal on tokens - it screenshots both sites full-page and reads two giant PNGs per page into context, then eyeballs the difference. Fourteen pages × two renderings × desktop + mobile is a context bloodbath, and "looks about right" is not a real gate.
So we flipped it. Let a deterministic image differ do the matching and emit a number. The agent reads a one-line-per-page table, sees ✅ pass / ❌ fail, and opens an actual image only for the handful of pages that regressed. Same verification, a fraction of the tokens, and a hard numeric acceptance gate (default < 0.1% pixel difference) instead of vibes.
It turned out to be useful far beyond our migration - any time you have "the same pages rendered two ways" (refactor vs. main, staging vs. prod, framework A vs. framework B, before vs. after a dependency bump), this is the skill.
Tell your agent how close the two builds must match and let it loop. "Consistency" is just
100% − diff%, which maps directly to the skill's --threshold gate. Paste one of these:
| You want… | Threshold | Paste this to your agent |
|---|---|---|
| 90% consistency (rough parity - early porting) | 10 |
Run /visual-diff with --threshold 10. Fix every ❌ fail in my source, then re-run until all pages pass. Read only report.md; open a diff image only for failing rows. |
| 95% consistency (close - pre-review) | 5 |
Run /visual-diff with --threshold 5. Iterate: for each ❌ fail, open its diff*.png, fix the source, re-run that page, repeat until every page passes at 95%. |
| 99% consistency (tight - pre-ship QA) | 1 |
Get my live build to 99% visual parity with local. Run /visual-diff --threshold 1, fix each failing region (font, spacing, color, radius, icon size), and don't stop until all pages pass. |
| 99.9% - default (pixel-perfect) | 0.1 |
Run /visual-diff and don't declare done until every page is under the 0.1% gate. |
How the agent hits the target. The skill turns "make it consistent" into a closed loop the agent can run unattended:
- Set the gate - your target becomes
--threshold (100 − target), so 95% →--threshold 5. - Measure - the runner diffs every page and writes
report.mdwith a real number per page. - Triage cheaply - the agent reads only the table and opens a
diff*.pngonly for❌ failrows (changed pixels are magenta), so it sees exactly where a page drifts. - Fix the source - it edits your code/build for each failing region. The usual culprits are named for it: font fallback, letter-spacing, line-height, an off-by-a-hex brand color, border-radius, icon size, section padding.
- Re-run the failing page -
--page <slug>re-checks just that one, fast. - Loop until green - repeat 3–5 until every page is under the gate. Exit code
0is the agent's stop signal;2means keep going.
The numeric gate is what makes this work without you in the loop: the agent has an objective "am I done yet?" instead of guessing from screenshots.
local base live base
http://127.0.0.1:8765 https://staging.example.com
| |
v v
+-----------------------------------------------+
| Playwright (headless Chromium) |
| - reduced-motion -> deterministic frames |
| - waits document.fonts.ready -> stable text |
| - optional full-page scroll to fire reveals |
+-----------------------------------------------+
| local.png | live.png
+--------------+----------------+
v
+-------------------------+
| odiff (native, fast) |
| - pads size mismatches |
| - highlights changes | -> diff.png (magenta #ff00ff)
| - returns diff % |
+-------------------------+
v
report.md <- the ONLY thing the agent reads
+-------------------------------------------------------+
| | Page | Viewport | Diff % | Status | Diff image| |
| | homepage | desktop | 0.00% | pass | - | |
| | pricing | desktop | 2.41% | fail | diff.png | |
+-------------------------------------------------------+
- Capture - For each page, Playwright loads the same
pathfrom both bases at a fixed viewport (desktop 1280×800; add mobile 390×844 with--mobile). It emulatesprefers-reduced-motionto freeze animations, waits for web fonts so text metrics are identical, and (with--full) scrolls the page to triggerIntersectionObserverreveal animations before shooting. - Diff - odiff compares the two PNGs natively (much faster than pixelmatch on full pages). If the two images differ in size (e.g. the live page is taller), both are padded onto a shared canvas so a height difference correctly raises the diff % instead of erroring out. Changed pixels are painted magenta in
diff.png. - Gate & report - Each page passes when its diff is below the threshold (default 0.1%). The runner writes
report.md(a small table),report.json(machine-readable), and the per-page PNGs. Exit code is0if everything passes,2if anything fails - so it drops straight into CI. - The token-saving discipline - Your agent reads only
report.md, then opens adiff*.pngonly for❌ failrows. Passing pages are never loaded into context.
- Start your local server at whatever
localBasepoints to - e.g.python3 -m http.server 8765 --directory web,npm run dev, etc. - Trigger the skill in your agent (
/visual-diff) or run the script directly:
node ~/.claude/skills/visual-diff/scripts/visual-diff.mjs # all pages, desktop
node ~/.claude/skills/visual-diff/scripts/visual-diff.mjs --mobile # + mobile viewport
node ~/.claude/skills/visual-diff/scripts/visual-diff.mjs --full # full-page captures
node ~/.claude/skills/visual-diff/scripts/visual-diff.mjs --page pricing- Read
tmp/visual-diff/<run>/report.md. Open adiff*.pngonly for failing rows.
Output lands in <cwd>/tmp/visual-diff/<run-timestamp>/:
tmp/visual-diff/2026-06-23T04-08-00Z/
├── report.md <- read this
├── report.json <- machine-readable (CI)
└── homepage/
├── local.png live.png diff.png
| Flag | Effect |
|---|---|
| (none) | All pages, desktop 1280×800 |
--page <slug> |
Diff a single page from the manifest |
--mobile |
Also capture mobile 390×844 |
--full |
Full-page captures (scrolls to trigger reveals; size diffs are padded) |
--local-base <url> |
Override the local base URL |
--live-base <url> |
Override the live base URL |
--threshold <n> |
Override the pass gate (percent, default 0.1) |
--config <file> |
Use a specific config file |
--pages <file> |
Use a specific pages manifest |
--help |
Print usage |
Exit codes: 0 = every page within the gate · 2 = one or more fail/error.
Create visual-diff.config.json in your project root (copy config.example.json):
{
"localBase": "http://127.0.0.1:8765",
"liveBase": "https://staging.example.com",
"thresholdPct": 0.1,
"pages": [
{ "slug": "homepage", "path": "/" },
{ "slug": "pricing", "path": "/pricing/" },
{ "slug": "about", "path": "/about/" }
]
}path is shared between both bases. You can keep pages in a separate visual-diff.pages.json ({ "pages": [...] }) instead - the runner auto-discovers it.
Config resolution order (highest priority first):
- CLI flag -
--local-base,--live-base,--page,--pages,--threshold,--config - Env var -
VDIFF_LOCAL_BASE,VDIFF_LIVE_BASE,VDIFF_PAGES,VDIFF_THRESHOLD visual-diff.config.jsonin the current working directory- Built-in defaults -
localhost:8765vslocalhost:8766, single/page,0.1%gate
| Dependency | Why |
|---|---|
| Node.js ≥ 18 | runtime |
odiff-bin |
native, fast image diffing |
playwright |
headless Chromium capture (npx playwright install chromium) |
pngjs |
size-mismatch padding + diff re-encode so the image is vision-API readable |
Install all three inside the skill folder: npm install && npx playwright install chromium.
Is this only for static sites? No. Any two URLs that render the same path work - staging vs. prod, a refactor branch served on another port vs. main, framework A vs. framework B, before/after a redesign.
A page fails but looks identical to me. Live captures can be noisy - cookie banners, late-loading fonts, lazy content, A/B variants. Re-run first. If it persists, raise the per-pixel threshold or add ignoreRegions in scripts/lib/odiff-compare.mjs. Common real causes: font fallback, letter-spacing, line-height, an off-by-a-hex brand color, border-radius, icon size, or section padding.
Why odiff instead of pixelmatch? odiff is a native binary and dramatically faster on full-page screenshots, and it returns a clean diff percentage the runner can gate on without decoding the image.
Does it work in CI? Yes - it's a plain Node script with a meaningful exit code (2 on any failure) and a report.json. Start your server, run it, fail the job on exit 2.
Can the agent install it itself? That's the intended path - see Install (just ask your agent).
MIT © wpfyorg. No project-specific URLs or paths remain in this build - drop it into any repo.
Keywords: visual regression testing · screenshot diff · pixel diff · odiff · Playwright · AI agent skill · Claude Code skill · Codex skill · Cursor · opencode · Antigravity · visual QA · CI visual testing · migration parity check.