Recipe

Query understanding pipeline

Transform raw user queries into structured, intent-aware representations before they touch the retrieval layer.

Stage 1 — Normalization

Strip trailing punctuation, collapse whitespace, lowercase the query, and expand common contractions. This ensures downstream stages operate on a canonical form.
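
As a rough illustration, the four normalization steps could be sketched as below. The CONTRACTIONS table and the normalize function name are assumptions made for this example, not part of the recipe, and a real deployment would likely need a fuller contraction list.

```python
import re

# Hypothetical contraction table, for illustration only.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "doesn't": "does not",
    "isn't": "is not",
    "i'm": "i am",
    "what's": "what is",
}


def normalize(query: str) -> str:
    """Return a canonical form of a raw user query."""
    # Collapse runs of whitespace into single spaces and trim the ends.
    text = re.sub(r"\s+", " ", query).strip()
    # Strip trailing punctuation, then any space left exposed by that.
    text = text.rstrip(".,;:!?").rstrip()
    # Lowercase so later lookups are case-insensitive.
    text = text.lower()
    # Treat the typographic apostrophe like the ASCII one before lookup.
    text = text.replace("\u2019", "'")
    # Expand contractions word by word; unknown words pass through unchanged.
    words = [CONTRACTIONS.get(word, word) for word in text.split(" ")]
    return " ".join(words)
```

Calling normalize("  Why  WON'T my app   start?! ") returns "why will not my app start". Lowercasing happens before the contraction lookup so the table only needs lowercase keys.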

Stage 2 — Entity extraction

Identify named entities (product names, version numbers, error codes) using a lightweight span classifier. Each extracted entity is attached to the query as structured metadata.
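
The recipe does not specify the span classifier, so the sketch below swaps in regular expressions and a small product list as a stand-in, just to show the shape of the extracted metadata. The Entity type, the patterns, and the product names are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str  # entity type, e.g. "version"
    text: str   # matched surface form
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


# Hypothetical patterns standing in for a trained span classifier.
PATTERNS = {
    "version": re.compile(r"\bv?\d+(?:\.\d+)+\b"),
    "error_code": re.compile(r"\b(?:err|e)[-_]?\d{3,5}\b"),
}

# Hypothetical product names, matched literally.
PRODUCTS = ("acme sync", "acme cloud")


def extract_entities(normalized: str) -> list[Entity]:
    """Return entity spans found in an already-normalized query."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(normalized):
            found.append(Entity(label, match.group(), match.start(), match.end()))
    for name in PRODUCTS:
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", normalized):
            found.append(Entity("product", match.group(), match.start(), match.end()))
    # Order spans by position so consumers can walk the query left to right.
    return sorted(found, key=lambda entity: entity.start)
```

For the input "acme sync 2.4.1 fails with err-4012" this yields a product span, a version span, and an error_code span, in that order. Keeping character offsets alongside the text lets later stages map an entity back to its place in the query.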

Stage 3 — Intent classification

Route the query into one of five intent buckets: troubleshooting, how-to, reference, comparison, or purchase. Intent drives retrieval strategy selection.
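
To make the five buckets concrete, here is a minimal sketch that uses ordered keyword cues in place of a real classifier. The cue phrases, their priority order, and the choice of reference as the fallback are assumptions for illustration; the recipe itself does not say how the classifier is built.

```python
from enum import Enum


class Intent(str, Enum):
    TROUBLESHOOTING = "troubleshooting"
    HOW_TO = "how-to"
    REFERENCE = "reference"
    COMPARISON = "comparison"
    PURCHASE = "purchase"


# Hypothetical cue phrases, checked in order; the first match wins.
CUES = [
    (Intent.COMPARISON, ("vs", "versus", "compare", "difference between")),
    (Intent.PURCHASE, ("price", "pricing", "buy", "cost")),
    (Intent.TROUBLESHOOTING, ("error", "fails", "not working", "crash")),
    (Intent.HOW_TO, ("how do i", "how to", "set up", "configure")),
]


def classify_intent(normalized: str) -> Intent:
    """Assign a normalized query to one of the five intent buckets."""
    # Pad with spaces so cues only match whole words or phrases.
    padded = f" {normalized} "
    for intent, cues in CUES:
        if any(f" {cue} " in padded for cue in cues):
            return intent
    # Assumed fallback: treat anything unmatched as a reference lookup.
    return Intent.REFERENCE
```

With these cues, "acme sync fails with err-4012" is classified as troubleshooting and "acme sync vs acme cloud" as comparison. Because Intent subclasses str, its value serializes to JSON without extra conversion.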

Stage 4 — Query expansion

Generate synonym variants and hyponym substitutions using a pre-built domain thesaurus. Expanded forms are scored and deduplicated before they are merged into the final list of expansions.
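
One way to picture the expand, score, and deduplicate steps is the sketch below. The thesaurus entries and the relation weights are invented for this example, and scoring a variant purely by relation type is an assumption; a production scorer would probably use more signal.

```python
# Hypothetical thesaurus: term -> list of (replacement, relation) pairs.
THESAURUS = {
    "error": [("failure", "synonym"), ("fault", "synonym")],
    "database": [("postgres", "hyponym"), ("mysql", "hyponym")],
}

# Hypothetical weights: synonyms are trusted more than hyponyms.
WEIGHTS = {"synonym": 0.9, "hyponym": 0.7}


def expand(normalized: str, limit: int = 5) -> list[tuple[str, float]]:
    """Return up to `limit` scored variants of a normalized query."""
    tokens = normalized.split()
    original = " ".join(tokens)
    best: dict[str, float] = {}
    for i, token in enumerate(tokens):
        for replacement, relation in THESAURUS.get(token, []):
            # Substitute one token at a time to keep variants close to the query.
            variant = " ".join(tokens[:i] + [replacement] + tokens[i + 1:])
            if variant == original:
                continue
            score = WEIGHTS[relation]
            # Deduplicate: keep only the highest score seen for each variant.
            if score > best.get(variant, 0.0):
                best[variant] = score
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

For "database error" this produces the two synonym variants ("database failure", "database fault") ahead of the two hyponym variants ("mysql error", "postgres error"). The limit parameter is an added safeguard to stop expansions from growing with query length.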

Output contract

The pipeline emits a JSON object with fields: normalized, entities, intent, and expansions. Downstream retrievers consume this contract directly.
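
The four field names come from the contract above, but their value types are not spelled out, so the sketch below assumes a string for normalized and intent, a list of objects for entities, and a list of strings for expansions. The QueryUnderstanding class and the parse helper are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass

REQUIRED_FIELDS = {"normalized", "entities", "intent", "expansions"}


@dataclass
class QueryUnderstanding:
    normalized: str        # canonical query text
    entities: list[dict]   # assumed shape: one object per extracted entity
    intent: str            # one of the five intent names
    expansions: list[str]  # assumed shape: expanded query strings

    def to_json(self) -> str:
        """Serialize to the JSON object that retrievers consume."""
        return json.dumps(asdict(self), sort_keys=True)


def parse(payload: str) -> QueryUnderstanding:
    """Parse pipeline output, rejecting payloads that lack a contract field."""
    data = json.loads(payload)
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Ignore unknown keys so producers can add fields without breaking consumers.
    return QueryUnderstanding(**{name: data[name] for name in REQUIRED_FIELDS})
```

A retriever could call parse on the pipeline output and fail fast if a field is absent, rather than discovering the gap deep inside retrieval logic.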