Crawled Site
When to Use
Use crawled discovery when backfilling tests on an inherited project with no docs and no organized user stories. Do not use it as the default workflow.
Decision
| Use crawl when... | Don't crawl when... |
|---|---|
| Backfilling tests on an inherited project with no docs | You have user stories — use them instead |
| Mapping unknown surface area before writing real tests | You know what to test — be explicit |
| Producing a one-off discovery report | You expect the output to ship as production tests |
Tradeoffs
Pros:
- Catches surfaces nobody remembered (admin pages, legacy routes, settings forms)
- Good for "regression net" plans where breadth > depth
- Useful when documentation is poor
Cons:
- Plans become flat: "Visit /node/1, see a node"
- Crawls follow auth-gated links, which fail without a seeded login and yield useless "redirect to /login" scenarios
- Drupal sites have infinite-axis surfaces (filtered views, paginated listings, every node URL), and Planner can't tell which are meaningfully different
- Generated plans become unmaintainable: every new node creates a hypothetical scenario
Pattern: bounded crawl
Crawl from sitemap.xml.
Cap depth at 2.
Restrict to path prefixes: /contact, /search, /about.
Exclude /admin/*, /user/*, /node/*.
Treat plans as starting points humans prune.
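
The bounded-crawl rules above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation. The function names, the `example.com` URLs, and the `get_links` callable (whatever returns the links found on a page) are all hypothetical. It also assumes that "depth" means link hops from the sitemap seeds and that prefixes match whole path segments.

```python
from collections import deque
from fnmatch import fnmatchcase
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

# Values taken from the pattern above; adjust per project.
ALLOWED_PREFIXES = ("/contact", "/search", "/about")
EXCLUDED_PATTERNS = ("/admin/*", "/user/*", "/node/*")
MAX_DEPTH = 2

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return every <loc> value from a sitemap.xml document string."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


def in_scope(url):
    """True if the URL path is under an allowed prefix and not excluded."""
    path = urlparse(url).path or "/"
    if any(fnmatchcase(path, pattern) for pattern in EXCLUDED_PATTERNS):
        return False
    # Assumption: "/contact" covers "/contact/sales" but not "/contacts".
    return any(path == p or path.startswith(p + "/") for p in ALLOWED_PREFIXES)


def bounded_crawl(seeds, get_links, max_depth=MAX_DEPTH):
    """Breadth-first crawl from the seeds, never going past max_depth hops.

    get_links is a hypothetical callable: url -> iterable of linked URLs.
    """
    seen = set()
    queue = deque()
    for url in seeds:
        if in_scope(url) and url not in seen:
            seen.add(url)
            queue.append((url, 0))

    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # depth cap reached: record the page, do not follow links
        for link in get_links(url):
            if link not in seen and in_scope(link):
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Something like `bounded_crawl(sitemap_urls(xml_text), get_links)` then yields a short URL list for a human to prune before any plan is generated. The explicit exclusions are redundant with the allowlist here, but keeping both guards against someone later widening the prefixes.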
Common Mistakes
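- Wrong: Crawl-and-generate as the default workflow, which produces 200 redundant scenarios that all break on one selector change and that a PM cannot review → Right: start from user stories and crawl only for discovery
- Wrong: No path exclusions, so Planner generates auth-gate scenarios for every admin URL → Right: exclude /admin/*, /user/*, /node/* before crawling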
- Wrong: Treating crawl output as final tests → Right: crawl output is discovery material; humans prune and scope before generating code