AI Search (Vector / RAG)

When to Use

Use this guide when setting up semantic search or Retrieval-Augmented Generation (RAG) with vector databases. Use the AI Assistant API to wire the rag_action plugin into an assistant.

Decision

| Situation | Choose | Why |
|---|---|---|
| Accurate semantic search | contextual_chunks strategy | Multiple vectors; each chunk enriched with title + context |
| Faster, less accurate | average_pool strategy | Single composite vector; simpler |
| RAG in a chatbot | rag_action plugin on assistant | Retrieves semantically relevant content into the LLM context |
| Hybrid with keyword search | Boost processors | Combines vector and DB/Solr results |

Pattern

$index = \Drupal\search_api\Entity\Index::load('my_ai_index');
$query = $index->query(['limit' => 10]);
$query->keys('semantic search phrase');

// Optional: return chunk-level results instead of item-level results.
// Must be set before execute().
$query->setOption('search_api_ai_get_chunks_result', TRUE);

$results = $query->execute();

foreach ($results->getResultItems() as $item) {
  $score = $item->getScore();          // relevance score from vector similarity
  $content = $item->getExtraData('content');
  $entity_id = $item->getExtraData('drupal_entity_id');
}

Setup Steps

  1. Install a VDB provider (ai_vdb_provider_pinecone, ai_vdb_provider_milvus, etc.)
  2. Create a Search API Server: choose "AI Search" backend
  3. Configure VDB connection, embeddings engine, embedding strategy
  4. Create a Search API Index on that server
  5. On the index's Fields tab, assign an indexing option to each field
  6. Index content

Indexing Options

| Option | Description |
|---|---|
| main_content | Chunked and embedded; at least one field is required |
| contextual_content | Prepended to every chunk for context |
| attributes | Stored as VDB metadata for filtering |
| ignore | Not processed |
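To illustrate what the contextual_content option does, here is a simplified sketch in plain PHP (not the module's actual code, and `enrich_chunks()` is a hypothetical helper): each chunk of main_content gets the contextual fields prepended before embedding, so every vector carries the item's context.

```php
<?php
// Sketch only: how contextual_content fields behave at indexing time.
// Each main_content chunk is prefixed with the contextual fields so the
// embedding model sees the context on every chunk.
function enrich_chunks(array $chunks, array $contextual): array {
  $prefix = implode("\n", $contextual);
  return array_map(
    fn(string $chunk): string => $prefix . "\n" . $chunk,
    $chunks
  );
}

$chunks = ['First body chunk.', 'Second body chunk.'];
$contextual = ['Title: My Article', 'Tags: drupal, search'];
print_r(enrich_chunks($chunks, $contextual));
```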

Embedding Strategies

| Strategy | Description |
|---|---|
| contextual_chunks | Multiple vectors per item; each chunk enriched with title + context. Most accurate. Default. |
| average_pool | Single composite vector via average pooling. Faster, less accurate. |
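Average pooling itself is simple: the per-chunk embedding vectors are collapsed into one vector by taking the element-wise mean. A minimal sketch in plain PHP (not the module's implementation):

```php
<?php
// Sketch of average pooling: collapse multiple chunk embeddings into a
// single composite vector by element-wise mean.
function average_pool(array $vectors): array {
  $count = count($vectors);
  $dims = count($vectors[0]);
  $pooled = array_fill(0, $dims, 0.0);
  foreach ($vectors as $vector) {
    foreach ($vector as $i => $value) {
      $pooled[$i] += $value / $count;
    }
  }
  return $pooled;
}

print_r(average_pool([[1.0, 0.0], [0.0, 1.0]])); // [0.5, 0.5]
```

The trade-off in the table follows directly: one vector means one cheap lookup, but averaging blurs together chunks about different topics, which is why contextual_chunks is more accurate.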

Hybrid Search Processors

| Processor | Backend | Description |
|---|---|---|
| database_boost_by_ai_search | search_api_db | Injects AI-matched IDs into the DB query |
| solr_boost_by_ai_search | search_api_solr | Elevates AI-matched IDs in Solr results |
| ai_search_score_threshold | search_api_ai_search | Filters out results below a minimum relevance score |
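Conceptually, the score-threshold processor just drops low-relevance results before they reach the user. A simplified sketch of that behavior (hypothetical `filter_by_score()` helper, not the ai_search_score_threshold plugin's actual code):

```php
<?php
// Sketch only: drop results whose relevance score falls below a
// configured threshold, as a score-threshold processor does conceptually.
function filter_by_score(array $results, float $threshold): array {
  return array_values(array_filter(
    $results,
    fn(array $result): bool => $result['score'] >= $threshold
  ));
}

$results = [
  ['id' => 'node/1', 'score' => 0.91],
  ['id' => 'node/2', 'score' => 0.42],
];
print_r(filter_by_score($results, 0.5)); // only node/1 survives
```

Tuning the threshold matters for RAG: too low and irrelevant chunks pollute the LLM context, too high and the assistant gets no grounding material at all.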

Custom Embedding Strategy

use Drupal\ai_search\Attribute\EmbeddingStrategy;
use Drupal\Core\StringTranslation\TranslatableMarkup;

#[EmbeddingStrategy(
  id: 'my_strategy',
  label: new TranslatableMarkup('My Strategy'),
  description: new TranslatableMarkup('Custom chunking approach'),
)]
class MyStrategy extends EmbeddingBase {
  // Override getEmbedding() or getChunks().
}
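The chunking logic a custom strategy might supply can be sketched in plain PHP. This is a hypothetical, simplified version (`chunk_text()` is not part of the module's API, and the real strategies size chunks by tokenizer tokens, not words): split text into fixed-size windows with overlap so context carries across chunk boundaries.

```php
<?php
// Simplified sketch of custom chunking: fixed-size word windows with
// overlap. Real strategies measure size in tokenizer tokens, not words.
function chunk_text(string $text, int $size = 4, int $overlap = 1): array {
  $words = preg_split('/\s+/', trim($text));
  $chunks = [];
  for ($i = 0; $i < count($words); $i += $size - $overlap) {
    $chunks[] = implode(' ', array_slice($words, $i, $size));
    if ($i + $size >= count($words)) {
      break;
    }
  }
  return $chunks;
}

print_r(chunk_text('one two three four five six seven'));
```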

Common Mistakes

  • Wrong: No main_content field assigned → Right: At least one field must be main_content for embeddings to work
  • Wrong: Mismatched tokenizer model → Right: Match the tokenizer model to the embeddings model; chunk sizes are calculated from the tokenizer, so a mismatch produces wrongly sized chunks
  • Wrong: Not re-indexing after strategy change → Right: Existing vectors don't match new strategy; must reindex

See Also