@blazediff/agent
Agentic visual regression for BlazeDiff. Auto-discovers routes, captures deterministic screenshots, diffs them against committed baselines with the native BlazeDiff core, and hands ambiguous diffs back to your coding agent (Claude Code, Cursor, Codex) to judge.
The package ships a deterministic CLI (blazediff-agent) plus a portable playbook (SKILL.md) that any coding agent drives. No embedded LLM call, no API key in the default flow - your coding agent supplies the loop, vision, and context engineering.
Installation
Global
npm install -g @blazediff/agent
Local
npm install --save-dev @blazediff/agent
Chromium
First run will prompt to install bundled Playwright Chromium - no sudo, no npx playwright install --with-deps.
blazediff-agent browsers install --check --json # check
blazediff-agent browsers install # install if missing
Onboard a coding agent
blazediff-agent onboard installs the BlazeDiff playbook into whatever coding-agent harness lives in your project. Run once per project.
# Auto-detect (Claude Code, Codex, Cursor)
blazediff-agent onboard --json
# Explicit
blazediff-agent onboard --harness codex
blazediff-agent onboard --harness claude,codex
blazediff-agent onboard --harness all
Per harness:
| Harness | Target | Scope | Detection signal |
|---|---|---|---|
| Claude Code | <project>/.claude/skills/blazediff/SKILL.md | project | CLAUDE.md or .claude/ |
| Codex | ~/.codex/prompts/blazediff.md | user-global | AGENTS.md, .codex/, or ~/.codex/ |
| Cursor | <project>/.cursor/rules/blazediff.mdc | project | .cursor/ or .cursorrules |
Codex is user-global because OpenAI’s Codex CLI looks for slash-command prompts in ~/.codex/prompts/ — installing there means /blazediff works in every project on your machine. On a TTY with no detection, the command prompts. Pass --force to overwrite a hand-edited file.
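The detection signals in the table can be read as a simple lookup. The following is a minimal sketch of that idea, not the CLI's actual implementation: the function name, the harness key "cursor", and the convention of passing in a list of paths known to exist are all assumptions made for illustration.

```typescript
// Hypothetical sketch: the real detection logic in blazediff-agent may differ.
type Harness = "claude" | "codex" | "cursor";

// Detection signals copied from the table above. Paths are relative to the
// project root, except "~/.codex/", which stands for the user's home directory.
const SIGNALS: Record<Harness, string[]> = {
  claude: ["CLAUDE.md", ".claude/"],
  codex: ["AGENTS.md", ".codex/", "~/.codex/"],
  cursor: [".cursor/", ".cursorrules"],
};

// Given the paths known to exist, return every harness with at least one signal.
function detectHarnesses(existingPaths: string[]): Harness[] {
  const present = new Set(existingPaths);
  const harnesses = Object.keys(SIGNALS) as Harness[];
  return harnesses.filter((h) => SIGNALS[h].some((p) => present.has(p)));
}
```

An empty result corresponds to the "no detection" case, where the command prompts on a TTY.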
Quickstart from a coding agent
Once onboarded, from Claude Code / Codex / Cursor:
/blazediff --cwd apps/website
The skill detects whether you’re authoring (no .blazediff/manifest.json) or checking (manifest exists), runs the right flow end-to-end, and stops to ask for confirmation only before destructive operations (rewriting baselines, masking).
Quickstart from the CLI
Author baselines
# 1. Generate config from your dev script
blazediff-agent init --json
# 2. Ensure Chromium is installed
blazediff-agent browsers install --check --json
# 3. Start the configured dev server (waits up to 60s for the port)
blazediff-agent serve-status --detach --json
# 4. Capture baselines in one call - pipe a JSON list of routes
cat <<'EOF' | blazediff-agent capture --stdin --mode baseline --json
[
  {"id": "home", "url": "/", "mask": [".timestamp"]},
  {"id": "pricing", "url": "/pricing"}
]
EOF
# 5. Stop the dev server (mandatory teardown)
blazediff-agent serve-status --kill --json
Commit .blazediff/ (config + manifest + baselines).
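When the route list is generated by a script rather than typed by hand, it may help to build the stdin payload programmatically. The sketch below assumes only the three fields shown in the JSON above (id, url, mask). The helper name and the duplicate-id check are hypothetical; whether the CLI itself rejects duplicate ids is not stated here.

```typescript
// Route shape inferred from the JSON in step 4; only id, url, and mask appear there.
interface RouteInput {
  id: string;
  url: string;
  mask?: string[];
}

// Hypothetical helper: refuse duplicate ids, then serialize the list
// in the array form that `capture --stdin` reads.
function buildCapturePayload(routes: RouteInput[]): string {
  const seen = new Set<string>();
  for (const route of routes) {
    if (seen.has(route.id)) {
      throw new Error(`duplicate route id: ${route.id}`);
    }
    seen.add(route.id);
  }
  return JSON.stringify(routes, null, 2);
}

const payload = buildCapturePayload([
  { id: "home", url: "/", mask: [".timestamp"] },
  { id: "pricing", url: "/pricing" },
]);
```

The resulting string can be piped into blazediff-agent capture --stdin --mode baseline --json in place of the heredoc.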
Commands
| Command | Purpose |
|---|---|
| onboard | Install the playbook into the detected coding-agent harness (Claude Code, Codex, Cursor) |
| init | Detect framework/dev-script, write .blazediff/config.json + .gitignore |
| discover | BFS-crawl routes from baseUrl (depth 2, ≤50 routes) as a fallback when source-walking fails |
| capture --stdin | Read a JSON array of routes from stdin, screenshot each, write baselines/actuals + manifest |
| check | Re-capture every manifest entry, diff against baseline, emit CheckReport |
| run | Pipelines capture → diff → verdict → judge via LangGraph for parallelism + LangSmith traces |
| rewrite <id...> | Re-baseline existing manifest entries (mask/viewport/waitFor preserved) |
| diff <id> | Re-diff one entry against its actual capture without re-screenshotting |
| manifest | Inspect / list manifest entries |
| serve-status | --detach / --kill / --status against the configured dev server |
| browsers install | Install bundled Playwright Chromium |
| reset --yes | Wipe .blazediff/ |
All commands accept --json for machine-readable output. Pass -C, --cwd <abs-path> to operate on a sub-directory (e.g. one app inside a monorepo).
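The discover command is described as a breadth-first crawl capped at depth 2 and 50 routes. A minimal sketch of that traversal follows, run over an in-memory link graph instead of a live site. The graph type and function name are hypothetical, and the real command may order or filter links differently.

```typescript
// Hypothetical link graph: each route maps to the same-origin links found on it.
type LinkGraph = Record<string, string[]>;

// Breadth-first discovery with a depth cap and a total-route cap.
function bfsDiscover(
  graph: LinkGraph,
  start = "/",
  maxDepth = 2,
  maxRoutes = 50,
): string[] {
  const visited = new Set<string>([start]);
  const order: string[] = [start];
  let frontier: string[] = [start];

  for (let depth = 0; depth < maxDepth && order.length < maxRoutes; depth++) {
    const next: string[] = [];
    for (const route of frontier) {
      for (const link of graph[route] ?? []) {
        if (visited.has(link)) continue;
        if (order.length >= maxRoutes) return order; // route cap reached
        visited.add(link);
        order.push(link);
        next.push(link);
      }
    }
    frontier = next; // descend one level
  }
  return order;
}
```

Processing one frontier per loop iteration keeps the depth bookkeeping explicit, so a page three links away from the start is never visited under the default cap.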
The judging model
The heuristic verdict pipeline emits one of four labels per failing entry:
| Label | Meaning | Default action |
|---|---|---|
| regression-likely | Confident structural change | Investigate; do not rewrite |
| intentional-likely | Confident styling/typographic change | Ask user, then rewrite |
| noise-likely | Confident non-deterministic source | Ask user; prefer masking over rewriting |
| ambiguous | Heuristic couldn’t classify | Defer to host judge |
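For a host agent or script that consumes these labels, the table can be encoded as a typed lookup. This is only an illustrative sketch: the four labels come from the table, while the action identifiers and helper name are hypothetical shorthand for the "Default action" column.

```typescript
// The four labels the heuristic pipeline emits.
type VerdictLabel =
  | "regression-likely"
  | "intentional-likely"
  | "noise-likely"
  | "ambiguous";

// Hypothetical shorthand for the "Default action" column.
type DefaultAction =
  | "investigate"
  | "ask-then-rewrite"
  | "ask-then-mask"
  | "defer-to-host-judge";

const DEFAULT_ACTION: Record<VerdictLabel, DefaultAction> = {
  "regression-likely": "investigate",
  "intentional-likely": "ask-then-rewrite",
  "noise-likely": "ask-then-mask",
  ambiguous: "defer-to-host-judge",
};

// Only ambiguous entries are handed to the host judge.
function needsHostJudge(label: VerdictLabel): boolean {
  return DEFAULT_ACTION[label] === "defer-to-host-judge";
}
```

Typing the map as Record<VerdictLabel, DefaultAction> makes the compiler flag any label that is left without an action.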
For ambiguous, the --judge host backend writes a JudgmentRequest to .blazediff/judgments/<id>/request.json containing:
- regions[] - bounding boxes, pixel counts, and change types per detected region
- paths.locator (locator.png) - a ~400 px overview thumbnail with every region outlined in red
- paths.tiles (regions.png) - a vertical stack of [baseline | actual] pairs, one row per region, at native resolution
- paths.{baseline,actual,diff} - full-page PNGs as a fallback
- heuristicVerdict and full manifestEntry context
Token discipline. The region tiles are 10-100x smaller than the full-page PNGs. A well-behaved host agent reads regions.png + locator.png first and only falls back to the full-page PNGs if a region clearly continues outside its crop.
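The read order described above can be expressed as a small helper. This is a sketch under assumptions: the path keys mirror the fields named in the request, but the exact request.json schema and the function name are not confirmed by this document.

```typescript
// Path keys mirror the JudgmentRequest fields listed above; the exact
// request.json schema beyond those names is an assumption.
interface JudgmentPaths {
  locator: string;
  tiles: string;
  baseline: string;
  actual: string;
  diff: string;
}

// Return the images to read, cheapest first. Full-page PNGs are appended
// only when a region clearly continues outside its crop.
function imagesToRead(
  paths: JudgmentPaths,
  regionContinuesOutsideCrop: boolean,
): string[] {
  const cheap = [paths.tiles, paths.locator];
  if (!regionContinuesOutsideCrop) return cheap;
  return [...cheap, paths.baseline, paths.actual, paths.diff];
}
```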
The host agent writes its verdict to .blazediff/judgments/<id>/verdict.json:
{
  "id": "agent",
  "verdict": {
    "label": "intentional-likely",
    "headline": "Em-dash replaced with hyphen in copy",
    "rationale": ["region tile shows only typographic substitution"],
    "action": "rewrite-if-intended"
  },
  "rationale": "Full paragraph explanation...",
  "confidence": 0.95
}
Then re-run blazediff-agent check --apply-judgments --json to merge verdicts into the report. No re-screenshot.
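A host agent may want to sanity-check a verdict before writing it. The sketch below validates the fields shown in the example above. It is not the CLI's own validation: the function names are hypothetical, and the constraints that confidence lies in [0, 1] and that label is one of the four heuristic labels are assumptions.

```typescript
// Labels taken from the verdict table; whether a host verdict may use all
// four is an assumption.
const KNOWN_LABELS = [
  "regression-likely",
  "intentional-likely",
  "noise-likely",
  "ambiguous",
];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return a list of problems; an empty list means the shape looks plausible.
function validateHostVerdict(raw: unknown): string[] {
  if (!isRecord(raw)) return ["verdict file must contain a JSON object"];
  const problems: string[] = [];

  if (typeof raw["id"] !== "string") problems.push("id must be a string");

  const verdict = raw["verdict"];
  if (!isRecord(verdict)) {
    problems.push("verdict must be an object");
  } else {
    const label = verdict["label"];
    if (typeof label !== "string" || !KNOWN_LABELS.includes(label)) {
      problems.push("verdict.label must be a known label");
    }
    if (typeof verdict["headline"] !== "string") {
      problems.push("verdict.headline must be a string");
    }
    if (!Array.isArray(verdict["rationale"])) {
      problems.push("verdict.rationale must be an array");
    }
  }

  const confidence = raw["confidence"];
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    problems.push("confidence must be a number in [0, 1]");
  }
  return problems;
}
```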
Masking unstable regions
When a diff is noise-likely - or when a regression-likely / intentional-likely diff is actually caused by something inherently non-deterministic in the page - the right fix is usually a mask, not a rebaseline. A rebaseline just resets the clock on a flake; a mask removes it.
Mask whenever the changing region is:
- An auto-cycling animation: carousels, marquees, demo widgets with setInterval, video posters, Lottie loops
- A third-party iframe or embed: Storybook, YouTube, codesandbox, Stripe checkout - anything whose load timing or content you don’t control. networkidle does not wait for embedded iframe subresources.
- Time-derived: Date.now() clocks, “X minutes ago” timestamps, today-highlighted calendars, expiry countdowns
- Per-session randomness: avatars seeded from session id, A/B-test variants, generated IDs, shuffled lists
- Anti-bot / personalization noise: async cookie banners, recommendation strips, geo-derived prices
Don’t mask real content that just happens to be changing - that’s the change you want the test to catch.
Default attribute
The agent always masks any element matching [data-blazediff-agent-mask]. No manifest changes are needed. This is the preferred path whenever you can edit the source.
<div data-blazediff-agent-mask>...</div>
// or with a reason inline:
<div data-blazediff-agent-mask="report-carousel">...</div>
The attribute value is ignored by the matcher (presence is enough); use it to document intent for future readers. Add the attribute to a shared component (layout, header, footer) and the mask applies on every route automatically.
Per-entry selector (fallback)
When you can’t edit the source (third-party iframe, framework-owned element), fall back to a CSS selector on the manifest entry. Selectors are passed to document.querySelectorAll, then painted with a magenta rect over the bounding rect in both baseline and actual.
- For external embeds, target the element type: iframe, video, [data-testid="storybook-preview"].
- Avoid Tailwind class chains and nth-child selectors. They break on the next style tweak.
- Scope matters. Each manifest entry has its own mask array, so iframe on /examples/web-components won’t affect /home.
Re-capture the affected entries with the new mask list. The mask list replaces the existing one. Include every selector you want kept.
cat <<'EOF' | blazediff-agent capture --stdin --mode baseline --json
[
  {"id": "examples-web-components", "url": "/examples/web-components", "mask": ["iframe"]}
]
EOF
Re-run check / run to confirm the entry now passes.
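Because the new mask list replaces the old one, it is easy to drop an existing selector by accident. The sketch below shows one way to build the re-capture payload so that existing selectors are kept. The helper name and the trimmed-down entry type are hypothetical; only id, url, and mask are taken from the examples above.

```typescript
// Trimmed-down view of a manifest entry: only the fields needed for re-capture.
interface MaskedRoute {
  id: string;
  url: string;
  mask: string[];
}

// Hypothetical helper: keep every existing selector and append the new ones,
// skipping duplicates, since capture replaces the mask list wholesale.
function withAddedMasks(entry: MaskedRoute, extraSelectors: string[]): MaskedRoute {
  const mask = [...entry.mask];
  for (const selector of extraSelectors) {
    if (!mask.includes(selector)) mask.push(selector);
  }
  return { id: entry.id, url: entry.url, mask };
}

const updated = withAddedMasks(
  { id: "home", url: "/", mask: [".timestamp"] },
  ["iframe", ".timestamp"],
);
// Serialize as a one-element array for `capture --stdin`.
const maskPayload = JSON.stringify([updated]);
```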
Configuration
.blazediff/config.json is written by init and committed:
{
  "devServer": {
    "command": "pnpm dev",
    "port": 3000,
    "readyTimeoutMs": 60000
  },
  "framework": "next",
  "packageManager": "pnpm",
  "baseUrl": "http://127.0.0.1:3000"
}
Omit devServer to point the agent at an already-running URL (set baseUrl directly):
blazediff-agent init --url https://staging.example.com --json
.blazediff/manifest.json is written by capture - never edit it directly. Each entry holds:
{
  id: string;
  url: string;
  mask: string[]; // CSS selectors
  viewport: { width: number; height: number };
  waitFor: ("networkidle" | "fonts" | string)[];
  fullPage: boolean;
}
CI
In CI (CI=1 or no TTY), only check and run are allowed. init / capture / rewrite / reset are explicitly blocked - authoring belongs on the developer’s machine.
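The CI rule can be sketched as a small gate. This illustrates the rule as stated and is not the CLI's source: the function names are hypothetical, and the sketch blocks only the four commands named above, because the text does not say how other commands (for example browsers install) are treated in CI.

```typescript
// Commands the text lists as explicitly blocked in CI.
const BLOCKED_IN_CI = ["init", "capture", "rewrite", "reset"];

// CI is signalled by CI=1 or by the absence of a TTY.
function isCiEnvironment(
  env: Record<string, string | undefined>,
  hasTty: boolean,
): boolean {
  return env["CI"] === "1" || !hasTty;
}

// Outside CI everything is allowed; inside CI the authoring commands are refused.
function isCommandAllowed(
  command: string,
  env: Record<string, string | undefined>,
  hasTty: boolean,
): boolean {
  if (!isCiEnvironment(env, hasTty)) return true;
  return !BLOCKED_IN_CI.includes(command);
}
```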
GitHub Actions
- run: pnpm install
- run: npx blazediff-agent browsers install
- run: npx blazediff-agent --cwd apps/website check --json
Exit codes:
- 0 - every entry passed
- 1 - at least one regression, intentional, noise, or pending-judgment entry
- non-zero with structured JSON error on infra failures (missing manifest, no chromium, etc.)
Hard rules
- Never --mode baseline an existing manifest entry without explicit user request.
- Never edit .blazediff/manifest.json directly.
- In CI (CI=1 or no TTY), only check / run are allowed.
- A route that times out is logged once in the result array and skipped - never blocks the run.
- Never leave a dev server running after authoring exits. serve-status --kill is mandatory teardown.