AI Test Generation

I need to...	Guide	Summary
Decide when AI test generation pays off (and when to skip it)	Overview	Use AI generation when backfilling coverage on an existing site or translating a user story into tests. Skip it when you can't articulate what "correct" looks like — the agent will pick a definition for you. Always go plan-first; code without a reviewed plan skips the only review gate a non-developer can use.
Understand the Plan → Review → Generate → Heal cycle	Four-Phase Pattern	Every AI test generation cycle runs four phases: Plan (Planner writes a Markdown spec), Review (human approves), Generate (Generator writes code from the spec), Heal (Healer fixes locators when CI breaks). Never skip Plan or Review; editing tests instead of the spec causes silent drift.
Author or review a Markdown test plan	Test Plan Format	Plans use a specific Markdown hierarchy: H2 = epic, H3 = scenario group (test.describe), H4 = single test, numbered Steps, bulleted Expected results, bulleted Negative checks. Free-form prose breaks the Generator.
Decide what goes in the plan vs in generated code	Plan vs Code Boundary	Scenario intent, step lists, acceptance criteria, negative assertions, and field labels belong in the plan. CSS selectors, wait tactics, fixtures, and data construction belong in generated code. If a manual tester could act on it by reading the rendered page, it goes in the plan.
Write acceptance criteria the Generator can turn into expect() calls	Acceptance Criteria	Write each criterion as an observable present-tense state ("The success message is visible"), one fact per bullet, independently checkable. The Generator emits one expect() per bullet. Seven or more criteria in one scenario usually means the scenario should be split.
Add negative assertions to every scenario	Negative Assertions	Include a "Negative checks" subsection in every scenario. The AI Planner encodes what it observes — explicit negative checks force the Planner and reviewer to think about what should NOT happen, catching bugs before they become enshrined as ground truth.
Seed plan generation from the codebase	Input: Code Analysis	Point the Planner at routing + form + permissions + one existing spec — not the whole codebase. For Drupal, *.routing.yml and buildForm() yield routes, field labels, and required-field negatives automatically. Always combine code analysis with live exploration.
Translate a user story or Jira ticket into a test plan	Input: User Stories	Map "As an X, I want Y so that Z" to Preconditions = X, scenario title = "X does Y", first criterion = Z made observable. Extract Acceptance Criteria and Description only from Jira — tell the Planner explicitly not to add criteria from comments or related tickets.
Turn a vague developer prompt into a bounded plan	Input: Raw Prompt	Raw prompts produce over-broad crawls, hallucinated fields, and happy-path-only coverage. Have the Planner ask five scoping questions first, or produce a draft with a "Clarifications needed" block.
Scope a plan to a single flow or viewport	Targeted Scope	A scope-narrow prompt expresses: feature, surface (viewport + user agent), and explicit exclusions. Viewport goes in Preconditions, not Steps. Always plan a sibling desktop scenario — targeting only mobile misses responsive issues.
Use crawled-site discovery (and when not to)	Crawled Site	Crawl-based discovery catches forgotten surfaces on inherited projects but produces flat, unmaintainable plans on Drupal sites. Use with strict path prefix restrictions, depth cap of 2, and admin/user exclusions — and treat output as discovery material, not production tests.
Combine multiple input modalities	Hybrid Inputs	When inputs conflict: user-story constraints win, code analysis fills vocabulary, live exploration validates, crawl only fills gaps inside scope. The Planner's job is reduction — output covering more surface than the input asked for is a Planner failure.
Set up Playwright Test Agents in the project	Playwright Test Agents	Playwright 1.56+ ships three agents: Planner (writes specs/feature.md), Generator (writes tests/feature.spec.ts from approved plan), Healer (fixes failing locators without touching assertions). Invoke via Claude Code with Playwright MCP installed.
Wire Playwright MCP into Claude Code	Playwright MCP Setup	Install with `claude mcp add playwright npx @playwright/mcp@latest`, then restart Claude Code. Use Playwright MCP for test generation; Chrome DevTools MCP for performance traces. Never run against production URLs.
Run the full end-to-end generation loop	End-to-End Workflow	The full loop runs 10 steps from intent to CI. Use three separate reviewers for the plan, the generated code, and every Healer patch; if one person reviews all three, every gate is defeated. Includes review checklists for plans and generated code.
Apply the pattern with ATK or Drupal-specific tools	Drupal & ATK Notes	For Drupal, point the Planner at routing + buildForm() + permissions. Use ATK's catalog for generic Drupal tests and AI generation for project-specific features. Set testIdAttribute: 'data-qa-id' for stable selectors when ATK is installed.
Avoid the encode-current-behavior trap and other anti-patterns	Anti-Patterns	The AI Planner encodes what it observes — including bugs — as expected behavior. Mitigations: mandatory plan review, negative assertions, draft-only status, Healer auto-commit disabled. The triple-review rule (plan, code, Healer patch) applies without exception.
Find tool URLs and install commands	Code Reference	Core toolchain: Playwright 1.56+ Test Agents, @playwright/mcp, Claude Code. Install commands, key documentation URLs, and community skill packs.
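
The Test Plan Format row says free-form prose breaks the Generator, so a reviewer may want a quick structural check before approving a plan. The following is a minimal sketch of such a check, not part of Playwright or ATK. `lintPlan`, `PlanIssue`, and the exact label spellings ("Steps", "Expected results", "Negative checks") are hypothetical names chosen for illustration, and the sketch only knows the three sections named in that row.

```typescript
// Hypothetical plan lint: checks the H2 > H3 > H4 hierarchy and the list style
// of each labelled section inside a test. Illustrative only.
type PlanIssue = { line: number; message: string };

const NUMBERED = /^\d+\.\s+\S/;
const BULLETED = /^[-*]\s+\S/;

// Section label -> required list style (labels assumed from the plan format).
const SECTION_STYLE = new Map<string, RegExp>([
  ["Steps", NUMBERED],
  ["Expected results", BULLETED],
  ["Negative checks", BULLETED],
]);

function lintPlan(markdown: string): PlanIssue[] {
  const issues: PlanIssue[] = [];
  let lastLevel = 1;                 // pretend an implicit H1 precedes the plan
  let inTest = false;                // true while under an H4 (single test)
  let section: string | null = null; // current labelled section, if any

  markdown.split(/\r?\n/).forEach((raw, index) => {
    const text = raw.trim();
    const line = index + 1;
    if (text === "") return;

    const heading = /^(#{1,6})\s+\S/.exec(text);
    if (heading) {
      const level = heading[1].length;
      if (level < 2 || level > 4) {
        issues.push({ line, message: `H${level} is outside the H2-H4 hierarchy` });
      } else if (level > lastLevel + 1) {
        issues.push({ line, message: `H${level} skips a level after H${lastLevel}` });
      }
      lastLevel = level;
      inTest = level === 4;
      section = null;
      return;
    }
    if (!inTest) return; // text under H2/H3 is not checked in this sketch

    // Accept "Steps:" as well as "**Steps:**" as a section label.
    const label = text.replace(/\*/g, "").replace(/:$/, "").trim();
    if (SECTION_STYLE.has(label)) {
      section = label;
      return;
    }

    const style = section === null ? undefined : SECTION_STYLE.get(section);
    if (!style) {
      issues.push({ line, message: "free-form prose outside a labelled section" });
    } else if (!style.test(text)) {
      const kind = style === NUMBERED ? "numbered" : "bulleted";
      issues.push({ line, message: `"${section}" items must be ${kind}` });
    }
  });
  return issues;
}
```

An empty result means the structure looks plausible; it says nothing about whether the scenarios themselves are correct, which remains the human reviewer's job.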
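
The Input: User Stories row maps "As an X, I want Y so that Z" onto plan fields. A rough sketch of that mapping follows. `mapUserStory` and `ScenarioSeed` are hypothetical names, the "X can Y" title wording is an assumption standing in for "X does Y" (verb conjugation is left to the author), and turning Z into an observable state is deliberately left as a placeholder for a human to rewrite.

```typescript
// Hypothetical mapping from a one-line user story to the seed of a plan scenario.
type ScenarioSeed = {
  preconditions: string;  // X: who is acting
  title: string;          // "X does Y" (approximated here as "X can Y")
  firstCriterion: string; // Z, still to be made observable by a human
};

function mapUserStory(story: string): ScenarioSeed | null {
  // Matches "As a/an X, I want (to) Y so that Z" with an optional trailing period.
  const pattern = /^As an? (.+?), I want (?:to )?(.+?),? so that (.+?)\.?$/i;
  const match = pattern.exec(story.trim());
  if (!match) return null; // not in the expected shape: ask for clarification instead

  const [, role, goal, benefit] = match;
  return {
    preconditions: `User role: ${role}`,
    title: `${role} can ${goal}`,
    firstCriterion: `TODO make observable: ${benefit}`,
  };
}
```

Returning `null` for stories that do not fit the template keeps the sketch from guessing, in line with the advice to have the Planner ask for clarification rather than invent criteria.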
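
The Crawled Site row recommends a strict path prefix, a depth cap of 2, and admin/user exclusions. One way to express those three rules as a single predicate is sketched below. `CrawlScope` and `inScope` are hypothetical names, and the `/admin` and `/user` prefixes and the staging host in the usage note are assumptions about a typical Drupal site, not values taken from any tool.

```typescript
// Hypothetical crawl-scope filter: a URL is crawled only if every rule passes.
type CrawlScope = {
  origin: string;             // e.g. "https://staging.example.test" (assumed host)
  pathPrefix: string;         // only paths at or under this prefix are in scope
  maxDepth: number;           // link depth cap; the guide suggests 2
  excludedPrefixes: string[]; // e.g. ["/admin", "/user"] (assumed Drupal paths)
};

function inScope(url: string, depth: number, scope: CrawlScope): boolean {
  if (depth > scope.maxDepth) return false;

  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return false; // unparseable links are skipped rather than guessed at
  }
  if (parsed.origin !== scope.origin) return false;

  const path = parsed.pathname;
  // True when `path` equals the prefix or sits below it on a segment boundary,
  // so "/shop" does not accidentally match "/shopping".
  const isUnder = (prefix: string): boolean =>
    path === prefix || path.startsWith(prefix.endsWith("/") ? prefix : prefix + "/");

  if (scope.excludedPrefixes.some(isUnder)) return false;
  return isUnder(scope.pathPrefix);
}
```

With `pathPrefix: "/shop"` and `maxDepth: 2`, a link to `/shop/cart` found at depth 1 is kept, while `/admin/config`, `/shopping`, and anything found at depth 3 are dropped. Whatever survives the filter is still discovery material to review, not a production test list.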