
Agentic Visual Testing - Running in CI

With baselines committed (see Setting Up), CI's job is one verb: check. It re-captures every manifest entry, diffs each against its baseline, and fails the build if any entry does not pass.

The check command

blazediff-agent check --judge host --json

The CLI starts the dev server automatically when config.devServer is set, runs every entry through Playwright, diffs each capture, and emits a CheckReport:

{
  "summaryPath": ".blazediff/summary.md",
  "totalEntries": 23,
  "passed": 22,
  "failed": 0,
  "pendingJudgments": 1,
  "results": [
    {
      "id": "agent",
      "url": "/agent",
      "status": "needs-judgment",
      "verdict": {
        "label": "ambiguous",
        "headline": "5 regions: 4 content-change, 1 addition @ left (0.13%, low)",
        "action": "investigate"
      }
    }
  ]
}

results[] lists non-pass entries only. Full per-entry detail lives in .blazediff/summary.md and .blazediff/judgments/<id>/request.json.
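As a rough illustration of consuming the report in a CI script, the sketch below types the fields visible in the sample above and prints one line per non-pass entry. The interfaces are inferred from that sample only, so the real CheckReport may carry more fields or different optionality, and `describeNonPassing` is a hypothetical helper, not part of the CLI.

```typescript
// Shapes inferred from the sample report only; the real schema may differ.
interface Verdict {
  label: string;
  headline: string;
  action: string;
}

interface CheckResult {
  id: string;
  url: string;
  status: string;
  verdict?: Verdict; // assumed optional: the sample shows it on a needs-judgment entry
}

interface CheckReport {
  summaryPath: string;
  totalEntries: number;
  passed: number;
  failed: number;
  pendingJudgments: number;
  results: CheckResult[];
}

// Hypothetical helper: one log line per non-pass entry.
function describeNonPassing(report: CheckReport): string[] {
  return report.results.map((r) => {
    const detail = r.verdict
      ? `${r.verdict.label}: ${r.verdict.headline}`
      : "no verdict";
    return `${r.id} (${r.url}) [${r.status}] ${detail}`;
  });
}

// The sample report from this page, as it would arrive on stdout with --json.
const sampleReport: CheckReport = JSON.parse(`{
  "summaryPath": ".blazediff/summary.md",
  "totalEntries": 23,
  "passed": 22,
  "failed": 0,
  "pendingJudgments": 1,
  "results": [
    {
      "id": "agent",
      "url": "/agent",
      "status": "needs-judgment",
      "verdict": {
        "label": "ambiguous",
        "headline": "5 regions: 4 content-change, 1 addition @ left (0.13%, low)",
        "action": "investigate"
      }
    }
  ]
}`);

for (const line of describeNonPassing(sampleReport)) {
  console.log(line);
}
```

Because results[] holds only non-pass entries, an empty array here is consistent with a fully passing run; the counts at the top of the report remain the source of truth.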

Check-only in CI. When CI=1 or there's no TTY, only check runs. The onboard, capture, rewrite, and reset commands are blocked: authoring belongs on a developer's machine, where baseline changes can be reviewed.
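The gating rule can be pictured with a small sketch. This is not the CLI's source: `isCheckOnly` and `isAllowed` are hypothetical names, and the code treats only the literal value CI=1 as the trigger because that is what the text states. Whether other values such as CI=true also count is not covered here.

```typescript
// Hypothetical model of the check-only rule; not taken from the CLI's source.
type Command = "check" | "onboard" | "capture" | "rewrite" | "reset";

type Env = Record<string, string | undefined>;

// Check-only mode applies when CI=1 or when no TTY is attached.
function isCheckOnly(env: Env, isTTY: boolean): boolean {
  return env["CI"] === "1" || !isTTY;
}

// In check-only mode, every authoring command is blocked.
function isAllowed(command: Command, env: Env, isTTY: boolean): boolean {
  return command === "check" || !isCheckOnly(env, isTTY);
}

console.log(isAllowed("check", { CI: "1" }, false)); // true
console.log(isAllowed("capture", { CI: "1" }, true)); // false
console.log(isAllowed("rewrite", {}, true)); // true
```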

GitHub Actions

- run: pnpm install
- run: npx blazediff-agent browsers install
- run: npx blazediff-agent --cwd apps/website check --json
  env:
    # Only needed if any entry uses a login harness. One pair per persona.
    # In CI, set these as secrets rather than committing .blazediff/.env.
    BLAZEDIFF_AUTH_DEFAULT_EMAIL: ${{ secrets.BLAZEDIFF_AUTH_DEFAULT_EMAIL }}
    BLAZEDIFF_AUTH_DEFAULT_PASSWORD: ${{ secrets.BLAZEDIFF_AUTH_DEFAULT_PASSWORD }}

Pass -C, --cwd <abs-path> to target one app inside a monorepo.

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Every entry passed |
| 1 | At least one regression, intentional, noise, or pending-judgment entry |
| non-zero + JSON | Infra failure (missing manifest, no Chromium, etc.) |

A route that times out is logged once in the result array and skipped - it never blocks the run.
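To make the first two rows of the exit-code table concrete, here is a minimal sketch that derives the expected code from a report's counts. `expectedExitCode` is a hypothetical helper. It assumes "every entry passed" means passed equals totalEntries, does not model infra failures, and does not account for how a timed-out, skipped route is counted, since the text does not say.

```typescript
// Only the two counts needed for this sketch; names follow the sample CheckReport.
interface ReportCounts {
  totalEntries: number;
  passed: number;
}

// Hypothetical: 0 when every entry passed, otherwise 1.
// Infra failures and skipped (timed-out) routes are deliberately not modeled.
function expectedExitCode(counts: ReportCounts): 0 | 1 {
  return counts.passed === counts.totalEntries ? 0 : 1;
}

// Counts from the sample report: 22 of 23 passed, one pending judgment.
console.log(expectedExitCode({ totalEntries: 23, passed: 22 })); // 1
console.log(expectedExitCode({ totalEntries: 23, passed: 23 })); // 0
```

Note that the sample report has failed: 0 and would still exit with 1, because a pending judgment alone is enough to fail the run.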

When a check fails

An exit code of 1 usually means a diff needs a verdict. Locally, your coding agent reads the judgment request and decides; intentional changes are accepted with rewrite. That loop (verdicts, harnesses, and masking flakes) is covered in Judging and Harnesses.
