Query understanding pipeline
Transform raw user queries into structured, intent-aware representations before they touch the retrieval layer.
Stage 1 — Normalization
Strip trailing punctuation, collapse whitespace, lowercase the query, and expand common contractions. This ensures downstream stages operate on a canonical form.
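The four normalization steps can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the function name `normalize`, the `CONTRACTIONS` table, and the set of trailing punctuation characters are all assumptions made for the example.

```python
import re

# Hypothetical contraction table; a production list would be much longer.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "doesn't": "does not",
    "isn't": "is not",
    "it's": "it is",
    "i'm": "i am",
}

# One alternation over all known contractions, matched on word boundaries.
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b"
)

# Assumed set of punctuation characters to strip from the end of a query.
TRAILING_PUNCT = "?!.,;:"


def normalize(query: str) -> str:
    """Return a canonical form of a raw user query."""
    text = query.replace("\u2019", "'")              # curly apostrophe to straight
    text = text.strip().rstrip(TRAILING_PUNCT)        # strip trailing punctuation
    text = re.sub(r"\s+", " ", text).strip()          # collapse whitespace
    text = text.lower()                               # lowercase
    # Expand contractions last, so the lowercase table keys always match.
    return _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(1)], text)
```

For example, `normalize("  Why can't   I log in?? ")` returns `"why cannot i log in"`. Lowercasing before contraction expansion keeps the lookup table small, since it only needs lowercase keys.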
Stage 2 — Entity extraction
Identify named entities (product names, version numbers, error codes) using a lightweight span classifier. Extracted entities are attached as structured metadata.
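To make the shape of the extracted metadata concrete, the sketch below uses regular expressions and a small product gazetteer as a stand-in for the span classifier. It is not the classifier itself. The `Entity` record, the `PRODUCTS` list, and the version and error-code patterns are hypothetical, and the patterns assume the query has already been lowercased by Stage 1.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str    # surface form as it appears in the normalized query
    label: str   # "product", "version", or "error_code"
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive


# Hypothetical product gazetteer (lowercase, to match normalized queries).
PRODUCTS = ("acme sync", "acme vault")

# Assumed surface patterns; a trained span classifier would replace these.
PATTERNS = [
    ("product", re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in PRODUCTS) + r")\b")),
    ("version", re.compile(r"\bv?\d+(?:\.\d+)+\b")),        # e.g. 2.4.1, v10.2
    ("error_code", re.compile(r"\b[a-z]{2,5}-\d{3,5}\b")),  # e.g. err-1042
]


def extract_entities(normalized: str) -> list[Entity]:
    """Return all pattern matches as entities, ordered by position."""
    found = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(normalized):
            found.append(Entity(m.group(0), label, m.start(), m.end()))
    return sorted(found, key=lambda e: e.start)
```

Calling `extract_entities("acme sync 2.4.1 fails with err-1042")` yields three entities labeled `product`, `version`, and `error_code`, in that order. Keeping character offsets on each entity lets later stages avoid rewriting entity spans.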
Stage 3 — Intent classification
Route the query into one of five intent buckets: troubleshooting, how-to, reference, comparison, or purchase. Intent drives retrieval strategy selection.
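The routing step can be illustrated with a keyword-scoring baseline. The text does not specify the classifier, so this is only a sketch under assumptions: the cue phrases in `INTENT_KEYWORDS`, the tie-breaking by dictionary order, and the `reference` fallback for queries with no cue are all hypothetical choices. Only the five bucket names come from the description above.

```python
import re

# Hypothetical cue phrases per intent; the five bucket names are from the spec.
INTENT_KEYWORDS = {
    "troubleshooting": ("error", "fails", "failed", "crash", "not working", "cannot"),
    "how-to": ("how do", "how to", "how can", "steps to", "set up", "configure"),
    "reference": ("what is", "definition", "syntax", "parameters"),
    "comparison": ("vs", "versus", "compare", "difference between"),
    "purchase": ("price", "pricing", "buy", "subscription", "discount"),
}


def _count_hits(text: str, phrases) -> int:
    """Count how many cue phrases occur in text as whole words or phrases."""
    return sum(
        1 for p in phrases
        if re.search(r"\b" + re.escape(p) + r"\b", text)
    )


def classify_intent(normalized: str, default: str = "reference") -> str:
    """Pick the intent with the most cue hits; fall back to `default` on zero hits."""
    scores = {
        intent: _count_hits(normalized, phrases)
        for intent, phrases in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties resolve to the earliest bucket
    return best if scores[best] > 0 else default
```

For instance, `classify_intent("how do i configure sso")` returns `"how-to"`, and a query with no cue phrase falls back to the assumed default. A retrieval layer could then key its strategy selection on the returned bucket name.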
Stage 4 — Query expansion
Generate synonym variants and hyponym substitutions using a pre-built domain thesaurus. Expanded forms are scored and deduplicated before merging.
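A small sketch shows how thesaurus-driven expansion with scoring and deduplication might fit together. The thesaurus entries, the fixed scores for synonyms and hyponyms, and the single-token lookup are assumptions for illustration. The description above does not say how variants are scored.

```python
# Hypothetical domain thesaurus keyed by single lowercase tokens.
THESAURUS = {
    "login": {"synonyms": ["sign in", "log in"], "hyponyms": ["sso login"]},
    "error": {"synonyms": ["failure"], "hyponyms": []},
}

# Assumed scores: synonyms preserve meaning better than narrower hyponyms.
SYNONYM_SCORE = 0.9
HYPONYM_SCORE = 0.7


def expand(normalized: str, max_expansions: int = 5) -> list[dict]:
    """Return scored, deduplicated variants of a normalized query."""
    tokens = normalized.split()
    best: dict[str, float] = {}  # variant text -> highest score seen (dedup)
    for i, tok in enumerate(tokens):
        entry = THESAURUS.get(tok)
        if entry is None:
            continue
        for relation, score in (("synonyms", SYNONYM_SCORE),
                                ("hyponyms", HYPONYM_SCORE)):
            for alt in entry.get(relation, []):
                # Substitute one token at a time, leaving the rest unchanged.
                variant = " ".join(tokens[:i] + [alt] + tokens[i + 1:])
                if variant != normalized and score > best.get(variant, 0.0):
                    best[variant] = score
    # Highest score first; alphabetical order makes ties deterministic.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"text": t, "score": s} for t, s in ranked[:max_expansions]]
```

With this toy thesaurus, `expand("login error")` produces four variants, with the three synonym substitutions ranked ahead of the hyponym substitution `"sso login error"`. Keying the intermediate dictionary on the variant text is what removes duplicates: a form reached by two routes is kept once, with its best score.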
Output contract
The pipeline emits a JSON object with four fields: normalized, entities, intent, and expansions. Downstream retrievers consume this object directly.
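One way to pin the contract down in code is a small record type that serializes to JSON. The four field names come from the description above. The class name `QueryUnderstanding`, the element types of `entities` and `expansions`, and the validation rule are assumptions for the sake of the example.

```python
import json
from dataclasses import asdict, dataclass

# The five intent buckets named in Stage 3.
INTENTS = {"troubleshooting", "how-to", "reference", "comparison", "purchase"}


@dataclass
class QueryUnderstanding:
    normalized: str    # canonical query text from Stage 1
    entities: list     # assumed: list of dicts with "text" and "label"
    intent: str        # one of INTENTS
    expansions: list   # assumed: list of dicts with "text" and "score"

    def __post_init__(self) -> None:
        # Reject values outside the five documented intent buckets.
        if self.intent not in INTENTS:
            raise ValueError(f"unknown intent: {self.intent!r}")

    def to_json(self) -> str:
        """Serialize to the JSON object consumed by retrievers."""
        return json.dumps(asdict(self))


result = QueryUnderstanding(
    normalized="acme sync fails with err-1042",
    entities=[{"text": "err-1042", "label": "error_code"}],
    intent="troubleshooting",
    expansions=[],
)
payload = result.to_json()
```

Parsing `payload` with `json.loads` gives back an object with exactly the four contract fields. Validating `intent` at construction time is a design choice that surfaces a mislabelled query inside the pipeline rather than in a downstream retriever.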