Recipe
Intent classifier design
A production pattern for routing user prompts to the correct downstream handler using lightweight embedding similarity.
Architecture
Ingest → embed → compare against labeled intent centroids → route to handler. Because the classifier runs before any LLM call, classification latency stays under 15 ms and its cost stays near zero.
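The flow above can be sketched as a small dispatcher. This is a minimal sketch, not the recipe's own code: `route`, `toy_embed`, `toy_classify`, and the handler signatures are hypothetical. The embedder and classifier are passed in as functions so the routing logic stays independent of the model, and any LLM call happens only inside a handler, after classification.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]

def route(
    prompt: str,
    embed: Callable[[str], Vector],
    classify: Callable[[Vector], Optional[str]],
    handlers: Dict[str, Callable[[str], str]],
    fallback: Callable[[str], str],
) -> str:
    """Embed -> classify -> dispatch. No LLM runs before a handler is chosen."""
    vector = embed(prompt)        # cheap local embedding
    intent = classify(vector)     # centroid comparison; None means low confidence
    if intent is None or intent not in handlers:
        return fallback(prompt)
    return handlers[intent](prompt)  # an LLM may be invoked only from here on

# Hypothetical toy stand-ins for the embedding model and centroid classifier.
def toy_embed(text: str) -> Vector:
    return [1.0, 0.0] if "error" in text.lower() else [0.0, 1.0]

def toy_classify(vector: Vector) -> Optional[str]:
    return "debug_error" if vector[0] > 0.5 else None

handlers = {"debug_error": lambda p: f"debugging: {p}"}
result = route("TypeError: x is None", toy_embed, toy_classify, handlers, lambda p: "clarify")
# result == "debugging: TypeError: x is None"
```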
Intent taxonomy
- generate_code: produce a complete file or function
- explain_concept: teach a technical topic
- debug_error: diagnose a stack trace or log
- refactor: improve existing code without changing behavior
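One way to keep the taxonomy in code is a string-valued `Enum`, so intent names double as routing keys, alongside the labeled examples used to build centroids. This is an illustrative sketch: `Intent`, `SEED_EXAMPLES`, and `underfilled_intents` are hypothetical names, and the single seed prompt per intent is a placeholder for the 20–50 examples the recipe recommends.

```python
from enum import Enum
from typing import Dict, List

class Intent(str, Enum):
    """The four intents from the taxonomy; values double as routing keys."""
    GENERATE_CODE = "generate_code"
    EXPLAIN_CONCEPT = "explain_concept"
    DEBUG_ERROR = "debug_error"
    REFACTOR = "refactor"

# Hypothetical placeholder prompts; a real set holds 20-50 labeled examples per intent.
SEED_EXAMPLES: Dict[Intent, List[str]] = {
    Intent.GENERATE_CODE: ["Write a Python function that parses a CSV file"],
    Intent.EXPLAIN_CONCEPT: ["Explain how a hash map handles collisions"],
    Intent.DEBUG_ERROR: ["Why does this raise KeyError: 'user_id'?"],
    Intent.REFACTOR: ["Clean up this loop without changing its output"],
}

def underfilled_intents(examples: Dict[Intent, List[str]], minimum: int = 20) -> List[Intent]:
    """Return intents with fewer than `minimum` labeled examples, in taxonomy order."""
    return [intent for intent in Intent if len(examples.get(intent, [])) < minimum]
```

Running `underfilled_intents(SEED_EXAMPLES)` on the placeholder data flags all four intents, which is the signal to collect more examples before precomputing centroids.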
Embedding pipeline
Use a small, fast embedding model (all-MiniLM-L6-v2, which produces 384-dimensional vectors). Precompute one centroid per intent from 20–50 labeled examples. At runtime, compute the cosine similarity between the prompt embedding and each centroid, then select the highest-scoring intent if its score exceeds a confidence threshold (default 0.65).
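The centroid comparison can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: each centroid is taken as the re-normalized mean of unit-length example embeddings (one common choice; the recipe does not specify how centroids are averaged), and vectors are plain Python lists. In production the vectors would come from all-MiniLM-L6-v2, for example via the sentence-transformers `SentenceTransformer.encode` method; here toy 3-dimensional vectors stand in for the 384-dimensional ones, and `normalize`, `centroid`, `cosine`, and `classify` are hypothetical names.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]

def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def centroid(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Assumed centroid: mean of unit-normalized example embeddings, re-normalized."""
    unit = [normalize(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(u[i] for u in unit) / len(unit) for i in range(dim)]
    return normalize(mean)

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def classify(
    query: Sequence[float],
    centroids: Dict[str, Vector],
    threshold: float = 0.65,
) -> Tuple[Optional[str], float]:
    """Return (best_intent, best_score); best_intent is None unless the score exceeds threshold."""
    best_intent, best_score = max(
        ((intent, cosine(query, c)) for intent, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return (best_intent if best_score > threshold else None), best_score

# Toy centroids built from two hypothetical example embeddings each.
centroids = {
    "debug_error": centroid([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]),
    "refactor": centroid([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]),
}
```

A query close to the `debug_error` examples, such as `[1.0, 0.05, 0.0]`, scores near 1.0 and is routed there; a query orthogonal to both centroids, such as `[0.0, 0.0, 1.0]`, scores far below 0.65 and returns `None`, which hands it to the fallback path.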
Fallback strategy
If no intent exceeds the threshold, route to a general-purpose handler that asks a clarifying question. Log low-confidence prompts for taxonomy expansion.
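The fallback branch might look like the following sketch. It assumes the classifier returns `None` for low-confidence prompts and uses Python's standard `logging` module; `handle_with_fallback`, `clarifying_question`, and the in-memory `review_queue` are hypothetical stand-ins (a real deployment would likely back the general-purpose handler with an LLM and persist the queue).

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("intent_router")

def clarifying_question() -> str:
    """Hypothetical general-purpose handler: ask the user to narrow the request."""
    return ("Could you clarify what you need: new code, an explanation, "
            "help with an error, or a refactor?")

def handle_with_fallback(
    prompt: str,
    intent: Optional[str],
    score: float,
    dispatch: Callable[[str, str], str],
    review_queue: List[str],
) -> str:
    """Dispatch confident prompts; otherwise log the prompt and ask for clarification."""
    if intent is not None:
        return dispatch(intent, prompt)
    # Low confidence: keep the prompt so reviewers can expand the taxonomy later.
    logger.info("low-confidence prompt (best score %.2f): %r", score, prompt)
    review_queue.append(prompt)
    return clarifying_question()
```

Logging the best score alongside the prompt makes it easier to tell near misses, which may only need more examples for an existing intent, from prompts that suggest a new intent altogether.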