
Schema.org for AI Discovery

When to Use

You want structured data to improve how AI systems understand, extract, and cite your content, beyond traditional Google rich results. Google AI Overviews, Microsoft Copilot, and ChatGPT are confirmed to use structured data for entity understanding and answer generation. This guide covers which Schema.org types matter most for Generative Engine Optimization (GEO) and how to implement them in Drupal. For general Schema Metatag setup, see Schema Metatag Setup.

Decision

| Goal | Schema type | Priority |
| --- | --- | --- |
| Article or blog post | Article (or BlogPosting, NewsArticle) | High: most AI platforms prioritize editorial content |
| Q&A or FAQ content | FAQPage | High: directly maps to AI answer format |
| Product or service | Product | High: used by shopping AI features |
| Step-by-step process | HowTo | High: maps to AI Overviews procedural answers |
| Organization/brand identity | Organization | High: entity disambiguation for brand mentions |
| Voice or AI assistant readout | SpeakableSpecification | Medium: marks specific content for audio extraction |
| Events | Event | Medium: Google and Perplexity surface event data |
| Persons / authors | Person | Medium: E-E-A-T signals for author credibility |
| Breadcrumb trail | BreadcrumbList | Medium: AI uses for page context |
| Courses or educational content | Course | Low-Medium: emerging AI education use |

Confirmed Impact

| Platform | How structured data is used | Evidence |
| --- | --- | --- |
| Google AI Overviews | Entity extraction, answer generation, source selection | Google Search Central documentation (2024) |
| Microsoft Copilot | Bing structured data feeds AI answer generation | Microsoft Learn, Bing Webmaster Guidelines |
| ChatGPT Browse | Schema used during web retrieval for entity understanding | data.world study: GPT-4 accuracy 16% → 54% with schema |
| Google Rich Results | Product rich snippets in SERP; since 2023, FAQ rich results are limited to authoritative government and health sites and HowTo rich results are retired | Google Rich Results Test |

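The data.world finding (GPT-4 accuracy rising from 16% to 54%) is the strongest quantified evidence, but it is indirect: the benchmark measured question answering over enterprise SQL data with and without a knowledge-graph representation, not Schema.org markup on web pages. The proposed mechanism carries over: LLMs can treat JSON-LD properties as explicit, machine-readable assertions rather than inferring facts from prose.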

Priority Schema Types for GEO

Article

Maps your editorial content to AI answer generation. AI systems prefer Article over bare WebPage for content they will cite.

Required for GEO impact:

  • headline: matches page title
  • datePublished: enables recency filtering
  • dateModified: 89.7% of ChatGPT citations go to recently modified pages
  • author (Person type): E-E-A-T signal
  • publisher (Organization type): brand entity
  • image: AI visual context
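As a minimal sketch of what those properties look like once assembled, the Python below builds an Article JSON-LD object. The function name `build_article_jsonld`, the example URLs, and the author and site names are hypothetical, chosen only for illustration; in Drupal the same values would normally come from tokens rather than hand-written code.

```python
import json
from datetime import datetime, timezone


def build_article_jsonld(headline, published, modified, author_name,
                         publisher_name, image_url):
    """Return a dict holding the Article properties listed above."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # ISO 8601 timestamps keep the dates unambiguous for parsers.
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        "author": {"@type": "Person", "name": author_name},
        "publisher": {"@type": "Organization", "name": publisher_name},
        "image": image_url,
    }


# Hypothetical example values.
article = build_article_jsonld(
    headline="Schema.org for AI Discovery",
    published=datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc),
    modified=datetime(2025, 3, 2, 14, 30, tzinfo=timezone.utc),
    author_name="Jane Editor",
    publisher_name="Example Site",
    image_url="https://example.com/images/schema-guide.png",
)
print(json.dumps(article, indent=2))
```

The printed JSON is what would sit inside a `<script type="application/ld+json">` element on the article page.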

FAQPage

The highest-impact type for AI Overviews. FAQ content maps directly to the question-answer format AI systems use for answers. Google AI Overviews heavily samples FAQPage content.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is Generative Engine Optimization?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "GEO is the practice of optimizing content to be cited and summarized by AI systems such as ChatGPT, Google AI Overviews, and Perplexity."
      }
    }
  ]
}
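When questions and answers live in structured fields rather than hand-written markup, the FAQPage object can be generated from them. The sketch below assumes the pairs are already available as plain-text tuples; `build_faq_jsonld` is a hypothetical helper, not part of any Drupal module.

```python
import json


def build_faq_jsonld(pairs):
    """Convert (question, answer) tuples into a FAQPage JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = build_faq_jsonld([
    ("What is Generative Engine Optimization?",
     "GEO is the practice of optimizing content to be cited and "
     "summarized by AI systems."),
    ("Which format should structured data use?",
     "JSON-LD in a script tag."),
])
print(json.dumps(faq, indent=2))
```

Each tuple becomes one Question entry, so the markup stays in step with the visible FAQ content as long as both are generated from the same source.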

HowTo

Maps to procedural AI answers. When a user asks "how to configure X in Drupal," AI systems with HowTo markup can extract steps directly rather than parsing prose.

{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to configure llms.txt in Drupal",
  "step": [
    {
      "@type": "HowToStep",
      "name": "Create the file",
      "text": "Create web/llms.txt in your Drupal root."
    },
    {
      "@type": "HowToStep",
      "name": "Add the H1 and description",
      "text": "Start with # Site Name and a blockquote description."
    }
  ]
}

Organization

Entity disambiguation. AI systems use Organization markup to associate brand mentions across the web. Without it, "Drupal" might be attributed to multiple unrelated entities. Include for every site.

Key properties: name, url, logo, sameAs (Wikipedia, LinkedIn, social profiles)
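For illustration, an Organization object with those properties might be assembled as follows. Every value is a placeholder (the `example.com` URLs and the profile links are assumptions, not real accounts); `sameAs` takes a list of URLs that identify the same entity elsewhere.

```python
import json

# Placeholder values; replace with the site's real name, URLs, and profiles.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Site",
    "url": "https://example.com/",
    "logo": "https://example.com/logo.png",
    # Each sameAs URL points at another page describing the same entity.
    "sameAs": [
        "https://en.wikipedia.org/wiki/Example",
        "https://www.linkedin.com/company/example",
    ],
}
print(json.dumps(organization, indent=2))
```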

SpeakableSpecification

Marks specific text blocks for voice assistants and AI audio readout. Currently used by Google Assistant and emerging AI voice features.

{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": [".article-headline", ".article-summary"]
  }
}

Target: headline + first paragraph (the direct answer). Do not mark entire page bodies — AI voice systems need the 1-2 sentence summary, not a full article.

Schema Stacking

Multiple Schema.org types can coexist on a single page. Stack types to give AI systems richer entity context.

Common stack patterns:

| Page type | Primary type | Stack with |
| --- | --- | --- |
| Blog post | Article | BreadcrumbList, Person (author), Organization (publisher) |
| FAQ page | FAQPage | WebPage, Organization |
| Tutorial | HowTo | Article, BreadcrumbList |
| Product page | Product | Organization, BreadcrumbList, Offer |
| Team member | Person | Organization |

Stacking as separate JSON-LD blocks (preferred approach):

<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", ...}
</script>

<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "BreadcrumbList", ...}
</script>

Separate script blocks are cleaner than nested types and easier to debug with Google's Rich Results Test.
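One way to produce separate blocks from a list of schema objects is sketched below. `render_jsonld_blocks` is a hypothetical helper, and the sample objects are deliberately small; the `</` escaping is a common precaution so that a string value cannot close the script element early.

```python
import json


def render_jsonld_blocks(entities):
    """Render each schema object as its own JSON-LD script block."""
    blocks = []
    for entity in entities:
        payload = json.dumps(entity, indent=2)
        # Escape "</" so a string value cannot terminate the script tag.
        payload = payload.replace("</", "<\\/")
        blocks.append(
            '<script type="application/ld+json">\n'
            + payload
            + "\n</script>"
        )
    return "\n\n".join(blocks)


html = render_jsonld_blocks([
    {"@context": "https://schema.org", "@type": "Article",
     "headline": "Example headline"},
    {"@context": "https://schema.org", "@type": "BreadcrumbList",
     "itemListElement": [
         {"@type": "ListItem", "position": 1, "name": "Home",
          "item": "https://example.com/"},
     ]},
])
print(html)
```

Because every object carries its own `@context`, each block can be pasted into a validator on its own.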

Drupal Implementation with schema_metatag

Schema Metatag 3.0.4 implements JSON-LD via the Metatag module's token system. Configuration is at /admin/config/search/metatag.

Step 1: Enable relevant submodules

drush en schema_article schema_web_page schema_organization schema_person

For FAQPage and HowTo, you need a compatible field or a custom plugin — these types require dynamic structured data from field values, not static token mappings.

Step 2: Configure Article per content type

At /admin/config/search/metatag → Edit defaults for your Article content type:

| Schema property | Token |
| --- | --- |
| @type | Article |
| headline | [node:title] |
| datePublished | [node:created:html_datetime] |
| dateModified | [node:changed:html_datetime] |
| author.name | [node:author:display-name] |
| author.@type | Person |
| publisher.name | [site:name] |
| publisher.@type | Organization |
| image | [node:field_image:entity:url] |

Step 3: Organization in Global defaults

Set Organization in the Global metatag defaults so every page carries publisher entity data. This is the minimum viable schema for brand entity disambiguation.

Step 4: Verify output

curl -s https://example.com/your-article | grep -A 20 'application/ld+json'

Or use View Source in browser and search for ld+json.
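For a scripted check, the sketch below pulls every JSON-LD block out of an HTML string using only the Python standard library and reports the `@type` of each object. The sample markup is made up for the example; in practice the HTML would come from the curl command above or any HTTP client. It also unwraps an `@graph` array, on the assumption that some generators group several objects in one block.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of application/ld+json script tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            # json.loads raises ValueError on malformed JSON-LD.
            self.items.append(json.loads("".join(self._buffer)))


def extract_types(html):
    """Return the @type of every JSON-LD object found in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    types = []
    for item in parser.items:
        # A block may hold one object, a list, or an @graph wrapper.
        nodes = item if isinstance(item, list) else item.get("@graph", [item])
        types.extend(node.get("@type") for node in nodes)
    return types


sample = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [
  {"@type": "Article", "headline": "Example headline"},
  {"@type": "Organization", "name": "Example Site"}
]}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "BreadcrumbList"}
</script>
</head><body><p>Body text</p></body></html>
"""
print(extract_types(sample))
```

A missing type in the printed list is a quick signal that a submodule is not enabled or a default has not been configured for that content type.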

Common Mistakes

  • Wrong: Using Microdata or RDFa instead of JSON-LD → Right: JSON-LD is Google's recommendation and is cleanest for AI parsing; JSON-LD lives in <script> tags, not embedded in HTML attributes
  • Wrong: Adding FAQPage markup to a page that is not actually a FAQ → Right: Google and AI systems penalize misleading structured data; only use FAQPage when the page structure matches
  • Wrong: Setting dateModified to the original publish date → Right: dateModified must reflect the last actual content change; AI uses this for recency filtering
  • Wrong: Omitting author because the site has no bylines → Right: Create a Person entity for the organization's content team; E-E-A-T matters to AI citation weighting
  • Wrong: Stacking all types in one nested JSON-LD object → Right: Use separate <script type="application/ld+json"> blocks per type for cleaner validation
  • Wrong: Only validating with Google Rich Results Test → Right: Also test with schema.org/validator and inspect actual JSON-LD in page source; Rich Results Test only covers Google-specific rich result types
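Several of these mistakes can be caught mechanically. The sketch below is a hypothetical lint function, not a replacement for a validator: it checks an Article object for the properties this guide lists as required for GEO impact and flags a `dateModified` that precedes `datePublished`. The rule set and messages are assumptions chosen for the example.

```python
from datetime import datetime

REQUIRED_ARTICLE_PROPERTIES = (
    "headline", "datePublished", "dateModified",
    "author", "publisher", "image",
)


def lint_article(data):
    """Return a list of human-readable problems found in an Article dict."""
    problems = []
    if data.get("@type") != "Article":
        problems.append("@type is not Article")
    for prop in REQUIRED_ARTICLE_PROPERTIES:
        if not data.get(prop):
            problems.append(f"missing property: {prop}")
    published = data.get("datePublished")
    modified = data.get("dateModified")
    if published and modified:
        # fromisoformat parses strings such as 2025-01-15T09:00:00+00:00.
        if datetime.fromisoformat(modified) < datetime.fromisoformat(published):
            problems.append("dateModified is earlier than datePublished")
    return problems


# Hypothetical object with two missing properties and reversed dates.
report = lint_article({
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "datePublished": "2025-03-02T14:30:00+00:00",
    "dateModified": "2025-01-15T09:00:00+00:00",
    "publisher": {"@type": "Organization", "name": "Example Site"},
})
print(report)
```

An empty list means none of these particular checks failed; it does not mean the markup is valid Schema.org.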

See Also