@blazediff/agent

Agentic visual regression for BlazeDiff. Auto-discovers routes, captures deterministic screenshots, diffs them against committed baselines with the native BlazeDiff core, and hands ambiguous diffs back to your coding agent (Claude Code, Cursor, Codex) to judge.

The package ships a deterministic CLI (blazediff-agent) plus a portable playbook (SKILL.md) that any coding agent drives. No embedded LLM call, no API key in the default flow - your coding agent supplies the loop, vision, and context engineering.


Installation

Global


npm install -g @blazediff/agent


pnpm add -g @blazediff/agent


yarn global add @blazediff/agent


bun add --global @blazediff/agent

Local


npm install --save-dev @blazediff/agent


pnpm add --save-dev @blazediff/agent


yarn add --dev @blazediff/agent


bun add --dev @blazediff/agent

Chromium

The first run prompts you to install the bundled Playwright Chromium - no sudo and no npx playwright install --with-deps required.


blazediff-agent browsers install --check --json   # check
blazediff-agent browsers install                  # install if missing

Onboard a coding agent

blazediff-agent onboard installs the BlazeDiff playbook into whatever coding-agent stack lives in your project. Run once per project.


# Auto-detect (Claude Code, Codex, Cursor)
blazediff-agent onboard --json
 
# Explicit
blazediff-agent onboard --stack codex
blazediff-agent onboard --stack claude,codex
blazediff-agent onboard --stack all
 
# No coding agent - install the local (Moondream + Qwen) judge
blazediff-agent onboard --stack local

Per stack:

Stack	Target	Scope	Detection signal
Claude Code	`<project>/.claude/skills/blazediff/SKILL.md`	project	`CLAUDE.md` or `.claude/`
Codex	`~/.codex/skills/blazediff/SKILL.md`	user-global	`AGENTS.md`, `.codex/`, or `~/.codex/`
Cursor	`<project>/.cursor/rules/blazediff.mdc`	project	`.cursor/` or `.cursorrules`

Codex is user-global because OpenAI’s Codex CLI discovers skills under ~/.codex/skills/<name>/SKILL.md - installing there means /blazediff works in every project on your machine. On a TTY, if no stack is detected, the command prompts for one. Pass --force to overwrite a hand-edited file.
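The detection signals in the table can be read as a simple lookup. The sketch below is illustrative only and is not the CLI's implementation: `detectStacks`, the `STACK_SIGNALS` map, and the `cursor` key are hypothetical names, and the input is assumed to be a list of paths already known to exist (home-directory entries written with a literal `~/` prefix).

```javascript
// Detection signals per stack, copied from the table above.
const STACK_SIGNALS = {
  claude: ["CLAUDE.md", ".claude/"],
  codex: ["AGENTS.md", ".codex/", "~/.codex/"],
  cursor: [".cursor/", ".cursorrules"],
};

// Hypothetical helper: return every stack with at least one signal present.
function detectStacks(existingPaths) {
  const present = new Set(existingPaths);
  return Object.entries(STACK_SIGNALS)
    .filter(([, signals]) => signals.some((signal) => present.has(signal)))
    .map(([stack]) => stack);
}

console.log(detectStacks(["CLAUDE.md", ".cursorrules"])); // [ 'claude', 'cursor' ]
console.log(detectStacks(["package.json"])); // []
```

An empty result corresponds to the "no detection" case, where the command prompts on a TTY.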

Quickstart from a coding agent

Once onboarded, from Claude Code / Codex / Cursor:


/blazediff --cwd apps/website

The skill detects whether you’re authoring (no .blazediff/manifest.json) or checking (manifest exists), runs the right flow end-to-end, and stops to ask for confirmation only before destructive operations (rewriting baselines, masking).

Quickstart from the CLI


Author baselines


# 1. Setup: config from your dev script + Chromium + playbook
#    (--no-capture: baselines are captured explicitly in step 3 below)
blazediff-agent onboard --no-capture
 
# 2. Start the configured dev server (waits up to 60s for the port)
blazediff-agent serve-status --detach --json
 
# 3. Capture baselines in one call - pipe a JSON list of routes
cat <<'EOF' | blazediff-agent capture --stdin --mode baseline --json
[
  {"id": "home", "url": "/", "mask": [".timestamp"]},
  {"id": "pricing", "url": "/pricing"}
]
EOF
 
# 4. Stop the dev server (mandatory teardown)
blazediff-agent serve-status --kill --json

Commit .blazediff/ (config + manifest + baselines).


Check (CI verb)


blazediff-agent check --judge host --json

The CLI starts the dev server automatically when config.devServer is set, runs every manifest entry through Playwright, diffs each capture against its baseline, and emits a CheckReport:


{
  "summaryPath": ".blazediff/summary.md",
  "totalEntries": 23,
  "passed": 22,
  "failed": 0,
  "pendingJudgments": 1,
  "results": [
    {
      "id": "agent",
      "url": "/agent",
      "status": "needs-judgment",
      "verdict": {
        "label": "ambiguous",
        "headline": "5 regions: 4 content-change, 1 addition @ left (0.13%, low)",
        "action": "investigate"
      }
    }
  ]
}

results[] lists non-pass entries only. Full per-entry details (regions, paths, rationale) live in .blazediff/summary.md (a 5-column markdown table with inline image previews) and .blazediff/judgments/<id>/request.json.
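A script that consumes the report might start by locating the judgment requests that are still open. The following is a minimal sketch, assuming the report has already been parsed from the `--json` output; `pendingRequestPaths` is a hypothetical helper, and the sample object is trimmed from the report shown above.

```javascript
// Sample CheckReport, trimmed from the example above.
const sampleReport = {
  totalEntries: 23,
  passed: 22,
  failed: 0,
  pendingJudgments: 1,
  results: [
    {
      id: "agent",
      url: "/agent",
      status: "needs-judgment",
      verdict: { label: "ambiguous", action: "investigate" },
    },
  ],
};

// Hypothetical helper: list the request files a host agent still has to judge.
function pendingRequestPaths(checkReport) {
  return checkReport.results
    .filter((result) => result.status === "needs-judgment")
    .map((result) => `.blazediff/judgments/${result.id}/request.json`);
}

console.log(pendingRequestPaths(sampleReport));
// [ '.blazediff/judgments/agent/request.json' ]
```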


Accept an intentional regression


# By id
blazediff-agent rewrite home pricing --json
 
# All failures from the last check
blazediff-agent rewrite --failed --json
 
# All entries (rare; usually wrong)
blazediff-agent rewrite --all --json

rewrite preserves the existing manifest entry’s mask / viewport / waitFor / fullPage; only the baseline PNG is regenerated. Re-run check afterwards to confirm clean.


Wipe and start over


blazediff-agent reset --yes --json

Deletes the entire .blazediff/ directory (config, manifest, baselines, actual, judgments, summary, pid/log). The tracked dev server is stopped first. This discards committed baselines, so confirm explicitly before running.

Commands

Command	Purpose
`onboard`	Interactive setup: write `.blazediff/config.json` + `.gitignore`, install Chromium, install the playbook into the detected coding-agent stack (Claude Code, Codex, Cursor, or `--stack local`), and optionally capture baselines
`discover`	BFS-crawl routes from `baseUrl` (depth 2, ≤50 routes) as a fallback when source-walking fails
`capture --stdin`	Read a JSON array of routes from stdin, screenshot each, write baselines/actuals + manifest
`check`	Re-capture every manifest entry, diff against baseline, emit `CheckReport`. Uses LangGraph for per-entry parallelism; suspends on ambiguous entries when `--judge host` and resumes via `--apply-judgments`
`rewrite <id...>`	Re-baseline existing manifest entries (mask/viewport/waitFor/fullPage preserved)
`diff <id>`	Re-diff one entry against its actual capture without re-screenshotting
`manifest`	Inspect / list manifest entries (`add --harness <name>` to attach a harness)
`auth init`	Record a login flow via Playwright codegen into `.blazediff/harnesses/auth.js` (fallback for OAuth/SSO/MFA; simple forms are authored directly)
`serve-status`	`--detach` / `--kill` / `--status` against the configured dev server
`browsers install`	Install bundled Playwright Chromium
`reset --yes`	Wipe `.blazediff/`

All commands accept --json for machine-readable output. Pass -C, --cwd <abs-path> to operate on a sub-directory (e.g. one app inside a monorepo).
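The `discover` row describes a bounded breadth-first crawl. As a rough illustration of that idea (not the CLI's code), the sketch below runs a BFS over an in-memory link graph with the same limits. `bfsRoutes` and the `getLinks` callback are hypothetical, and counting depth as link hops from the start route is an assumption.

```javascript
// Breadth-first crawl with the limits quoted for `discover`.
// `getLinks(route)` is a hypothetical callback returning routes linked from a page.
function bfsRoutes(start, getLinks, maxDepth = 2, maxRoutes = 50) {
  const seen = new Set([start]);
  const queue = [{ route: start, depth: 0 }];
  const routes = [];
  while (queue.length > 0 && routes.length < maxRoutes) {
    const { route, depth } = queue.shift();
    routes.push(route);
    if (depth === maxDepth) continue; // do not expand past the depth limit
    for (const next of getLinks(route)) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push({ route: next, depth: depth + 1 });
      }
    }
  }
  return routes;
}

// Toy site: "/docs/a/deep" sits three hops from "/" and is never reached.
const toySite = {
  "/": ["/pricing", "/docs"],
  "/docs": ["/docs/a"],
  "/docs/a": ["/docs/a/deep"],
};
console.log(bfsRoutes("/", (route) => toySite[route] ?? []));
// [ '/', '/pricing', '/docs', '/docs/a' ]
```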

The judging model

The heuristic verdict pipeline emits one of four labels per failing entry:

Label	Meaning	Default action
`regression-likely`	Confident structural change	Investigate; do not rewrite
`intentional-likely`	Confident styling/typographic change	Ask user, then rewrite
`noise-likely`	Confident non-deterministic source	Ask user; prefer masking over rewriting
`ambiguous`	Heuristic couldn’t classify	Defer to host judge
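A host script can encode the table as a lookup so that unknown labels fail loudly. This is only a sketch: `defaultActionFor` is a hypothetical helper, and the action strings paraphrase the table rather than reproduce the CLI's `action` values.

```javascript
// Default actions per label, paraphrased from the table above.
const DEFAULT_ACTIONS = new Map([
  ["regression-likely", "investigate; do not rewrite"],
  ["intentional-likely", "ask user, then rewrite"],
  ["noise-likely", "ask user; prefer masking over rewriting"],
  ["ambiguous", "defer to host judge"],
]);

// Hypothetical helper: look up the default action, rejecting unknown labels.
function defaultActionFor(label) {
  if (!DEFAULT_ACTIONS.has(label)) {
    throw new Error(`unknown verdict label: ${label}`);
  }
  return DEFAULT_ACTIONS.get(label);
}

console.log(defaultActionFor("ambiguous")); // defer to host judge
```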

For ambiguous, the --judge host backend writes a JudgmentRequest to .blazediff/judgments/<id>/request.json containing:

regions[] - bounding boxes, pixel counts, and change types per detected region
paths.locator (locator.png) - a ~400 px overview thumbnail with every region outlined in red
paths.tiles (regions.png) - a vertical stack of [baseline | actual] pairs, one row per region, at native resolution
paths.{baseline,actual,diff} - full-page PNGs as a fallback
heuristicVerdict and full manifestEntry context

Token discipline. The region tiles are 10-100x smaller than the full-page PNGs. A well-behaved host agent reads regions.png + locator.png first and only falls back to the full-page PNGs if a region clearly continues outside its crop.
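The read order described above could be sketched as follows. `imagesToRead` and its `regionSpillsOutsideCrop` flag are hypothetical; the `paths` field names come from the list above, while the file names for the full-page PNGs are placeholders.

```javascript
// Hypothetical helper: order the images a host agent reads for one request.
// Cheap region tiles and the locator come first; full-page PNGs are only
// appended when a region clearly continues outside its crop.
function imagesToRead(request, regionSpillsOutsideCrop = false) {
  const { tiles, locator, baseline, actual, diff } = request.paths;
  const cheap = [tiles, locator];
  return regionSpillsOutsideCrop ? [...cheap, baseline, actual, diff] : cheap;
}

const sampleRequest = {
  paths: {
    locator: "locator.png",
    tiles: "regions.png",
    baseline: "baseline.png", // placeholder file name
    actual: "actual.png", // placeholder file name
    diff: "diff.png", // placeholder file name
  },
};

console.log(imagesToRead(sampleRequest)); // [ 'regions.png', 'locator.png' ]
```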

The host agent writes its verdict to .blazediff/judgments/<id>/verdict.json:


{
  "id": "agent",
  "verdict": {
    "label": "intentional-likely",
    "headline": "Em-dash replaced with hyphen in copy",
    "rationale": ["region tile shows only typographic substitution"],
    "action": "rewrite-if-intended"
  },
  "rationale": "Full paragraph explanation...",
  "confidence": 0.95
}

Then re-run blazediff-agent check --apply-judgments --json to merge verdicts into the report. No re-screenshot.
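A host agent could assemble the verdict object programmatically before writing it to disk. The sketch below mirrors the field names in the example verdict; `buildVerdict`, its `reasons` parameter, and both validation rules (label must be one of the four heuristic labels, confidence within 0 to 1) are assumptions rather than the CLI's documented schema.

```javascript
const VERDICT_LABELS = [
  "regression-likely",
  "intentional-likely",
  "noise-likely",
  "ambiguous",
];

// Hypothetical helper: build the object to serialize into
// .blazediff/judgments/<id>/verdict.json.
function buildVerdict({ id, label, headline, reasons, action, rationale, confidence }) {
  if (!VERDICT_LABELS.includes(label)) {
    throw new Error(`unknown label: ${label}`);
  }
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new Error("confidence must be between 0 and 1"); // assumed range
  }
  return {
    id,
    verdict: { label, headline, rationale: reasons, action },
    rationale,
    confidence,
  };
}

const sampleVerdict = buildVerdict({
  id: "agent",
  label: "intentional-likely",
  headline: "Em-dash replaced with hyphen in copy",
  reasons: ["region tile shows only typographic substitution"],
  action: "rewrite-if-intended",
  rationale: "Full paragraph explanation...",
  confidence: 0.95,
});
console.log(JSON.stringify(sampleVerdict, null, 2));
```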

Masking unstable regions

When a diff is noise-likely - or when a regression-likely / intentional-likely diff is actually caused by something inherently non-deterministic in the page - the right fix is usually a mask, not a rebaseline. A rebaseline just resets the clock on a flake; a mask removes it.

Mask whenever the changing region is:

An auto-cycling animation: carousels, marquees, demo widgets with setInterval, video posters, Lottie loops
A third-party iframe or embed: Storybook, YouTube, codesandbox, Stripe checkout - anything whose load timing or content you don’t control. networkidle does not wait for embedded iframe subresources.
Time-derived: Date.now() clocks, “X minutes ago” timestamps, today-highlighted calendars, expiry countdowns
Per-session randomness: avatars seeded from session id, A/B-test variants, generated IDs, shuffled lists
Anti-bot / personalization noise: async cookie banners, recommendation strips, geo-derived prices

Don’t mask real content that just happens to be changing - that’s the change you want the test to catch.

Default attribute

The agent always masks any element matching [data-blazediff-agent-mask]. No manifest changes are needed. This is the preferred path whenever you can edit the source.


<div data-blazediff-agent-mask>...</div>
<!-- or with a reason inline: -->
<div data-blazediff-agent-mask="report-carousel">...</div>

The attribute value is ignored by the matcher (presence is enough); use it to document intent for future readers. Add the attribute to a shared component (layout, header, footer) and the mask applies on every route automatically.

Per-entry selector (fallback)

When you can’t edit the source (third-party iframe, framework-owned element), fall back to a CSS selector on the manifest entry. Each selector is passed to document.querySelectorAll, and every matched element’s bounding rect is painted over with a magenta rectangle in both baseline and actual.

For external embeds, target the element type: iframe, video, [data-testid="storybook-preview"].
Avoid Tailwind class chains and nth-child selectors. They break on the next style tweak.
Scope matters. Each manifest entry has its own mask array, so iframe on /docs/ui-components/vanilla won’t affect /home.

Re-capture the affected entries with the new mask list. The mask list replaces the existing one. Include every selector you want kept.


cat <<'EOF' | blazediff-agent capture --stdin --mode baseline --json
[
  {"id": "examples-vanilla", "url": "/docs/ui-components/vanilla", "mask": ["iframe"]}
]
EOF

Re-run check to confirm the entry now passes.
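Because the mask list replaces the existing one, it helps to build the stdin payload from the current selectors plus the new ones. A minimal sketch, assuming you already know the entry's current `id`, `url`, and `mask`; `withAddedMasks` is a hypothetical helper, and only the three fields used in the examples above are carried over.

```javascript
// Hypothetical helper: build a `capture --stdin` route that keeps the
// existing selectors and appends new ones without duplicates.
function withAddedMasks(entry, newSelectors) {
  const mask = [...new Set([...(entry.mask ?? []), ...newSelectors])];
  return { id: entry.id, url: entry.url, mask };
}

const existingEntry = { id: "home", url: "/", mask: [".timestamp"] };
const maskPayload = [withAddedMasks(existingEntry, ["iframe", ".timestamp"])];

// Pipe this JSON into `blazediff-agent capture --stdin --mode baseline --json`.
console.log(JSON.stringify(maskPayload));
// [{"id":"home","url":"/","mask":[".timestamp","iframe"]}]
```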

Configuration

.blazediff/config.json is written by onboard and committed:


{
  "devServer": {
    "command": "pnpm dev",
    "port": 3000,
    "readyTimeoutMs": 60000
  },
  "framework": "next",
  "packageManager": "pnpm",
  "baseUrl": "http://127.0.0.1:3000"
}

Per-route behavior (login, interactions) lives in harnesses, not config - see Harnesses below.

Omit devServer to point the agent at an already-running URL (set baseUrl directly):


blazediff-agent onboard --url https://staging.example.com --json

.blazediff/manifest.json is written by capture - never edit it directly. Each entry holds:


{
  id: string;
  url: string;
  mask: string[];               // CSS selectors
  viewport: { width: number; height: number };
  waitFor: ("networkidle" | "fonts" | string)[];
  fullPage: boolean;
  harnesses?: { name: string; params?: Record<string, unknown> }[];
  parent?: string;              // set on sub-entries from screenshot(name)
  derived?: boolean;
}

Harnesses

A harness is a pluggable script in .blazediff/harnesses/<name>.js, attached to an entry via its harnesses: [{ name, params? }] list. Login is just one kind of harness - anything that drives the page before or around a screenshot is the same concept.

A harness is an ESM module (.js / .mjs - TypeScript is not auto-transpiled) that default-exports a Harness. Two phases:

setup - runs before navigation (establish a session, e.g. login).
interact (default) - runs after the base screenshot; drives the page and may emit extra named screenshots via screenshot(name). Each becomes its own baseline entry, id <entry>__<name>.


export interface HarnessContext<P = Record<string, unknown>> {
  page: import("playwright").Page;
  browser: import("playwright").Browser;
  context: import("playwright").BrowserContext;
  params: P;                            // e.g. { persona: "default" }
  screenshot(name: string): Promise<void>;
}
export interface Harness<P = Record<string, unknown>> {
  phase?: "setup" | "interact";
  run(ctx: HarnessContext<P>): Promise<void>;
}

Interaction harnesses

For a test that needs the page driven mid-flow (open a menu, switch a tab, then shoot again), write an interact harness and attach it by name:


// .blazediff/harnesses/weather-menu.js
/** @type {import("@blazediff/agent").Harness} */
export default {
  async run({ page, screenshot }) {
    await page.getByRole("button", { name: "More options" }).click();
    await screenshot("menu"); // -> baseline "weather__menu"
  },
};


{ "id": "weather", "url": "/weather", "harnesses": [{ "name": "weather-menu" }] }

The base shot weather fires automatically; each screenshot(name) call becomes its own manifest/baseline/diff entry (here, weather__menu). To re-baseline a multi-shot entry, run rewrite <parent-id>: it re-runs the harness and regenerates all children.
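The parent/child naming lends itself to a small helper when deciding which ids to pass to `rewrite`. The sketch below is an illustration under stated assumptions: `childEntryId` and `rewriteTargets` are hypothetical, and the entries are assumed to carry the `parent` field described in the manifest shape.

```javascript
// Sub-entry ids follow the <entry>__<name> pattern.
function childEntryId(parentId, shotName) {
  return `${parentId}__${shotName}`;
}

// Hypothetical helper: replace each failing sub-entry with its parent so the
// harness re-runs once per parent, and drop duplicates.
function rewriteTargets(failingIds, manifestEntries) {
  const parentOf = new Map(manifestEntries.map((entry) => [entry.id, entry.parent]));
  return [...new Set(failingIds.map((id) => parentOf.get(id) ?? id))];
}

const sampleEntries = [
  { id: "weather", url: "/weather" },
  { id: childEntryId("weather", "menu"), url: "/weather", parent: "weather" },
  { id: "pricing", url: "/pricing" },
];

console.log(rewriteTargets(["weather__menu", "weather", "pricing"], sampleEntries));
// [ 'weather', 'pricing' ]
```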

Routes behind a login flow are captured through a setup harness. Credentials live in environment variables - never in the harness file, the manifest, or LLM context (the harness only references process.env.BLAZEDIFF_AUTH_*).

For a plain email/password form the agent writes the harness directly - it identifies the form fields from the login route source or a DOM snapshot and emits .blazediff/harnesses/auth.js:


/** @type {import("@blazediff/agent").Harness<{ persona?: string }>} */
export default {
  phase: "setup",
  async run({ page, params }) {
    const upper = (params.persona ?? "default").toUpperCase().replace(/[^A-Z0-9]/g, "_");
    const email = process.env[`BLAZEDIFF_AUTH_${upper}_EMAIL`];
    const password = process.env[`BLAZEDIFF_AUTH_${upper}_PASSWORD`];
    if (!email || !password) throw new Error(`missing BLAZEDIFF_AUTH_${upper}_EMAIL / _PASSWORD`);
    await page.goto("http://127.0.0.1:3000/login");
    await page.locator('input[name="email"]').fill(email);
    await page.locator('input[name="password"]').fill(password);
    await Promise.all([
      page.waitForURL((u) => !u.pathname.startsWith("/login")),
      page.getByRole("button", { name: /sign in|log in/i }).click(),
    ]);
  },
};

For flows that can’t be reduced to fill-and-submit - OAuth/SSO, magic links, MFA, captcha - record it interactively instead:


blazediff-agent auth init --persona default --login-url http://127.0.0.1:3000/login

This opens a Playwright recorder; log in once, and on close the agent swaps the typed email/password for process.env.BLAZEDIFF_AUTH_<PERSONA>_* and writes the same .blazediff/harnesses/auth.js.

Per-entry. Add the harness to the entry’s harnesses list:


{ "id": "dashboard", "url": "/dashboard",
  "harnesses": [{ "name": "auth", "params": { "persona": "default" } }] }

Credentials. The CLI auto-loads env files from --cwd - .blazediff/.env[.local] (blazediff-scoped, auto-gitignored) then the project-root .env[.local] - before any harness runs. Real exported env vars win; .blazediff/ files beat the root. So just drop them in .blazediff/.env:


printf 'BLAZEDIFF_AUTH_DEFAULT_EMAIL=you@example.com\nBLAZEDIFF_AUTH_DEFAULT_PASSWORD=hunter2\n' \
  > .blazediff/.env
blazediff-agent check

The harness throws a clear error at capture time if its vars are missing.
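The precedence rule (exported variables, then .blazediff/ files, then the project root) can be pictured as layered object spreads. This is a sketch, not the CLI's loader: `parseEnvFile` and `resolveEnv` are hypothetical, the parser handles only plain KEY=VALUE lines, and the ordering between `.env` and `.env.local` is left out because the text above does not specify it.

```javascript
// Minimal KEY=VALUE parser; quotes, `export`, and multiline values are ignored.
function parseEnvFile(text) {
  const vars = {};
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq > 0) vars[trimmed.slice(0, eq)] = trimmed.slice(eq + 1);
  }
  return vars;
}

// Later spreads win: root .env < .blazediff/.env < real exported variables.
function resolveEnv({ rootEnvText = "", blazediffEnvText = "", exported = {} }) {
  return {
    ...parseEnvFile(rootEnvText),
    ...parseEnvFile(blazediffEnvText),
    ...exported,
  };
}

const resolvedEnv = resolveEnv({
  rootEnvText: "A=root\nB=root\nC=root",
  blazediffEnvText: "# blazediff-scoped\nB=scoped\nC=scoped",
  exported: { C: "real" },
});
console.log(resolvedEnv); // { A: 'root', B: 'scoped', C: 'real' }
```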

Multiple personas. Use a different params.persona per entry; each maps to its own BLAZEDIFF_AUTH_<PERSONA>_* pair. One harness file serves them all.
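To see which variables a given persona needs, the normalization used in the auth harness above can be pulled into a standalone function. `authEnvNames` is a hypothetical name; the transformation itself is copied from the harness.

```javascript
// Same normalization as the auth harness: upper-case the persona, then
// replace anything outside A-Z / 0-9 with an underscore.
function authEnvNames(persona = "default") {
  const upper = persona.toUpperCase().replace(/[^A-Z0-9]/g, "_");
  return {
    email: `BLAZEDIFF_AUTH_${upper}_EMAIL`,
    password: `BLAZEDIFF_AUTH_${upper}_PASSWORD`,
  };
}

console.log(authEnvNames().email); // BLAZEDIFF_AUTH_DEFAULT_EMAIL
console.log(authEnvNames("billing-admin").password); // BLAZEDIFF_AUTH_BILLING_ADMIN_PASSWORD
```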

Note. Every harness-gated capture runs in a fresh browser context (storageState reuse is not yet implemented), so a setup harness re-runs per entry.

Working reference. examples/agent-auth-spa-example in the repo is a Vite + React SPA with 2 public and 8 auth-gated routes. It ships a .blazediff/harnesses/auth.js and committed baselines, so you can clone the repo and run pnpm --filter @blazediff/agent-auth-spa-example check to see the full flow pass 10/10.

CI

In CI (CI=1 or no TTY), only check is allowed. onboard / capture / rewrite / reset are explicitly blocked - authoring belongs at the developer’s machine.
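The gating rule can be sketched as a predicate over the environment and the command name. This is an assumption-laden illustration, not the CLI's code: `isCiLike` and `isCommandAllowed` are hypothetical, "no TTY" is modelled as a boolean passed in by the caller, and only the four commands named above are treated as blocked.

```javascript
// Authoring commands that are blocked in CI, per the rule above.
const CI_BLOCKED = new Set(["onboard", "capture", "rewrite", "reset"]);

// Assumption: CI means CI=1 in the environment or no attached TTY.
function isCiLike(env, isTTY) {
  return env.CI === "1" || !isTTY;
}

// Hypothetical predicate: may this command run in the current environment?
function isCommandAllowed(command, env, isTTY) {
  return !(isCiLike(env, isTTY) && CI_BLOCKED.has(command));
}

console.log(isCommandAllowed("check", { CI: "1" }, false)); // true
console.log(isCommandAllowed("rewrite", { CI: "1" }, false)); // false
console.log(isCommandAllowed("rewrite", {}, true)); // true
```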

GitHub Actions


- run: pnpm install
- run: npx blazediff-agent browsers install
- run: npx blazediff-agent --cwd apps/website check --json
  env:
    # Only needed if any entry uses a login harness. One pair per persona.
    # (In CI, set these as secrets rather than committing .blazediff/.env.)
    BLAZEDIFF_AUTH_DEFAULT_EMAIL: ${{ secrets.BLAZEDIFF_AUTH_DEFAULT_EMAIL }}
    BLAZEDIFF_AUTH_DEFAULT_PASSWORD: ${{ secrets.BLAZEDIFF_AUTH_DEFAULT_PASSWORD }}

Exit codes:

0 - every entry passed
1 - at least one regression, intentional, noise, or pending-judgment entry
non-zero with structured JSON error on infra failures (missing manifest, no chromium, etc.)
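For scripts that post-process the JSON report, the first two rules can be mirrored from the report's own counters. A minimal sketch, assuming a parsed CheckReport with the `totalEntries` and `passed` fields shown earlier; `expectedExitCode` is a hypothetical helper and does not model infra failures.

```javascript
// Hypothetical helper: 0 when every entry passed, otherwise 1.
function expectedExitCode(report) {
  return report.passed === report.totalEntries ? 0 : 1;
}

console.log(expectedExitCode({ totalEntries: 23, passed: 23 })); // 0
console.log(expectedExitCode({ totalEntries: 23, passed: 22 })); // 1
```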

Hard rules

Never re-capture an existing manifest entry with --mode baseline without an explicit user request.
Never edit .blazediff/manifest.json directly.
In CI (CI=1 or no TTY), only check is allowed.
A route that times out is logged once in the result array and skipped - never blocks the run.
Never leave a dev server running after authoring exits. serve-status --kill is mandatory teardown.