AI Test Generation

| I need to... | Guide | Summary |
| --- | --- | --- |
| Decide when AI test generation pays off (and when to skip it) | Overview | Use AI generation when backfilling coverage on an existing site or translating a user story into tests. Skip it when you can't articulate what "correct" looks like; the agent will pick a definition for you. Always go plan-first; code without a reviewed plan skips the only review gate a non-developer can use. |
| Understand the Plan → Review → Generate → Heal cycle | Four-Phase Pattern | Every AI test generation cycle runs four phases: Plan (Planner writes a Markdown spec), Review (human approves), Generate (Generator writes code from the spec), Heal (Healer fixes locators when CI breaks). Never skip Plan or Review; editing tests instead of the spec causes silent drift. |
| Author or review a Markdown test plan | Test Plan Format | Plans use a specific Markdown hierarchy: H2 = epic, H3 = scenario group (test.describe), H4 = single test, numbered Steps, bulleted Expected results, bulleted Negative checks. Free-form prose breaks the Generator. |
| Decide what goes in the plan vs in generated code | Plan vs Code Boundary | Scenario intent, step lists, acceptance criteria, negative assertions, and field labels belong in the plan. CSS selectors, wait tactics, fixtures, and data construction belong in generated code. If a manual tester could act on it by reading the rendered page, it goes in the plan. |
| Write acceptance criteria the Generator can turn into expect() calls | Acceptance Criteria | Write each criterion as an observable present-tense state ("The success message is visible"), one fact per bullet, independently checkable. The Generator emits one expect() per bullet. 7+ criteria in one scenario usually means split it. |
| Add negative assertions to every scenario | Negative Assertions | Include a "Negative checks" subsection in every scenario. The AI Planner encodes what it observes; explicit negative checks force the Planner and reviewer to think about what should NOT happen, catching bugs before they become enshrined as ground truth. |
| Seed plan generation from the codebase | Input: Code Analysis | Point the Planner at routing + form + permissions + one existing spec, not the whole codebase. For Drupal, *.routing.yml and buildForm() yield routes, field labels, and required-field negatives automatically. Always combine code analysis with live exploration. |
| Translate a user story or Jira ticket into a test plan | Input: User Stories | Map "As an X, I want Y so that Z" to Preconditions = X, scenario title = "X does Y", first criterion = Z made observable. Extract Acceptance Criteria and Description only from Jira; tell the Planner explicitly not to add criteria from comments or related tickets. |
| Turn a vague developer prompt into a bounded plan | Input: Raw Prompt | Raw prompts produce over-broad crawls, hallucinated fields, and happy-path-only coverage. Have the Planner ask five scoping questions first, or produce a draft with a "Clarifications needed" block. |
| Scope a plan to a single flow or viewport | Targeted Scope | A scope-narrow prompt expresses: feature, surface (viewport + user agent), and explicit exclusions. Viewport goes in Preconditions, not Steps. Always plan a sibling desktop scenario; targeting only mobile misses responsive issues. |
| Use crawled-site discovery (and when not to) | Crawled Site | Crawl-based discovery catches forgotten surfaces on inherited projects but produces flat, unmaintainable plans on Drupal sites. Use with strict path prefix restrictions, depth cap of 2, and admin/user exclusions, and treat output as discovery material, not production tests. |
| Combine multiple input modalities | Hybrid Inputs | When inputs conflict: user-story constraints win, code analysis fills vocabulary, live exploration validates, crawl only fills gaps inside scope. The Planner's job is reduction: output covering more surface than the input asked for is a Planner failure. |
| Set up Playwright Test Agents in the project | Playwright Test Agents | Playwright 1.56+ ships three agents: Planner (writes specs/feature.md), Generator (writes tests/feature.spec.ts from approved plan), Healer (fixes failing locators without touching assertions). Invoke via Claude Code with Playwright MCP installed. |
| Wire Playwright MCP into Claude Code | Playwright MCP Setup | Install with claude mcp add playwright npx @playwright/mcp@latest, then restart Claude Code. Use Playwright MCP for test generation; Chrome DevTools MCP for performance traces. Never run against production URLs. |
| Run the full end-to-end generation loop | End-to-End Workflow | The full loop is 10 steps from intent to CI. Three separate reviewers for plan, generated code, and every Healer patch; same-person review of all three defeats every gate. Includes review checklists for plans and generated code. |
| Apply the pattern with ATK or Drupal-specific tools | Drupal & ATK Notes | For Drupal, point the Planner at routing + buildForm() + permissions. Use ATK's catalog for generic Drupal tests; AI generation for project-specific features. Set testIdAttribute:'data-qa-id' for stable selectors when ATK is installed. |
| Avoid the encode-current-behavior trap and other anti-patterns | Anti-Patterns | The AI Planner encodes what it observes, including bugs, as expected behavior. Mitigations: mandatory plan review, negative assertions, draft-only status, Healer auto-commit disabled. The triple-review rule (plan, code, Healer patch) applies without exception. |
| Find tool URLs and install commands | Code Reference | Core toolchain: Playwright 1.56+ Test Agents, @playwright/mcp, Claude Code. Install commands, key documentation URLs, and community skill packs. |
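
Several of the rules summarized above (H4 = single test, numbered Steps, bulleted Expected results, a Negative checks subsection in every scenario, and the 7+ criteria threshold) are mechanical enough to check before a human reviews the plan. The following is a minimal sketch of such a check, not part of Playwright or its Test Agents. The names `parsePlan` and `lintPlan`, the exact label spellings ("Steps:", "Expected results:", "Negative checks:"), and the sample contact-form plan are all assumptions made for illustration.

```typescript
// Hypothetical plan linter; nothing here is a Playwright API.
type Section = "steps" | "expected" | "negative";

interface Scenario {
  title: string;
  steps: string[];
  expected: string[];
  negative: string[];
}

// Assumed label spellings; adjust to whatever the real plans use.
const LABELS: Record<string, Section> = {
  "steps": "steps",
  "expected results": "expected",
  "negative checks": "negative",
};

// Returns the first capture group of `re` on `line`, or undefined.
function capture(re: RegExp, line: string): string | undefined {
  return re.exec(line)?.[1];
}

function parsePlan(markdown: string): Scenario[] {
  const scenarios: Scenario[] = [];
  let current: Scenario | undefined;
  let section: Section | undefined;

  for (const raw of markdown.split(/\r?\n/)) {
    const line = raw.trim();
    const heading = /^(#{1,6})\s+(.*)$/.exec(line);
    if (heading) {
      // Any heading closes the open scenario; only an H4 opens a new one.
      current = undefined;
      section = undefined;
      if ((heading[1] ?? "").length === 4) {
        current = { title: heading[2] ?? "", steps: [], expected: [], negative: [] };
        scenarios.push(current);
      }
      continue;
    }
    if (!current || line === "") continue;

    // "Steps:", "**Steps**" and similar all normalize to the same key.
    const label = LABELS[line.replace(/[*:]/g, "").trim().toLowerCase()];
    if (label) {
      section = label;
      continue;
    }

    const numbered = capture(/^\d+\.\s+(.*)$/, line);
    const bullet = capture(/^[-*]\s+(.*)$/, line);
    if (section === "steps" && numbered !== undefined) current.steps.push(numbered);
    else if (section === "expected" && bullet !== undefined) current.expected.push(bullet);
    else if (section === "negative" && bullet !== undefined) current.negative.push(bullet);
  }
  return scenarios;
}

function lintPlan(markdown: string): string[] {
  const problems: string[] = [];
  for (const s of parsePlan(markdown)) {
    if (s.steps.length === 0) problems.push(`${s.title}: no numbered steps`);
    if (s.expected.length === 0) problems.push(`${s.title}: no expected results`);
    if (s.negative.length === 0) problems.push(`${s.title}: no negative checks`);
    if (s.expected.length >= 7) {
      problems.push(`${s.title}: ${s.expected.length} criteria, consider splitting`);
    }
  }
  return problems;
}

// Made-up sample plan: the second scenario omits its negative checks.
const samplePlan = [
  "## Contact form",
  "### Anonymous visitor",
  "#### Visitor submits a valid message",
  "Steps:",
  "1. Open the contact page",
  "2. Fill in Name, Email, and Message",
  "3. Click Send",
  "Expected results:",
  "- The success message is visible",
  "Negative checks:",
  "- No error message is visible",
  "#### Visitor submits an empty form",
  "Steps:",
  "1. Open the contact page",
  "2. Click Send",
  "Expected results:",
  "- The Name field shows a required-field error",
].join("\n");

console.log(lintPlan(samplePlan));
// → [ 'Visitor submits an empty form: no negative checks' ]
```

A check like this could run before the Review phase so that reviewers spend their attention on whether the criteria are right rather than on whether the plan is well-formed. It does not replace human review: it cannot tell whether a listed expectation encodes a bug as expected behavior.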