Recipe

Intent classifier design

A production pattern for routing user prompts to the correct downstream handler using lightweight embedding similarity.

Architecture

Ingest → embed → compare against labeled intent centroids → route to handler. The classifier runs before any LLM call, so the routing step itself stays under 15 ms and adds almost no cost.

Intent taxonomy

• generate_code: produce a complete file or function
• explain_concept: teach a technical topic
• debug_error: diagnose a stack trace or log
• refactor: improve existing code without changing behavior
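
One way to keep the taxonomy and the handlers in sync is to define the intents as an enum and map each one to a callable. The sketch below is illustrative only: the Intent enum, the stub handlers, and dispatch() are hypothetical names, not part of an existing codebase.

```python
from enum import Enum
from typing import Callable, Dict


class Intent(str, Enum):
    """The four intents from the taxonomy above."""
    GENERATE_CODE = "generate_code"
    EXPLAIN_CONCEPT = "explain_concept"
    DEBUG_ERROR = "debug_error"
    REFACTOR = "refactor"


def make_stub_handler(intent: Intent) -> Callable[[str], str]:
    """Build a placeholder handler; a real one would call the downstream service."""
    def handler(prompt: str) -> str:
        return f"[{intent.value}] {prompt}"
    return handler


# Hypothetical registry: every intent must have exactly one handler.
HANDLERS: Dict[Intent, Callable[[str], str]] = {
    intent: make_stub_handler(intent) for intent in Intent
}


def dispatch(intent: Intent, prompt: str) -> str:
    """Route a prompt to the handler registered for its intent."""
    return HANDLERS[intent](prompt)
```

Building the registry from the enum means that adding a fifth intent to the taxonomy cannot silently leave it without a handler.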

Embedding pipeline

Use a small, fast model (all-MiniLM-L6-v2, 384-dim). Precompute one centroid per intent as the mean embedding of 20–50 labeled examples. At runtime, compute the cosine similarity between the user prompt embedding and each centroid, then select the highest-scoring intent, provided its score exceeds a confidence threshold (default 0.65).
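
The centroid and similarity steps can be sketched as follows. This is a minimal, dependency-free illustration that assumes the embeddings have already been computed (for example with the sentence-transformers package, which is an assumption here, not something this recipe prescribes). The function names and the 3-dimensional toy vectors are hypothetical; real vectors would have 384 dimensions, and production code would more likely use NumPy.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors (the centroid)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_centroids(examples: Dict[str, Sequence[Vector]]) -> Dict[str, List[float]]:
    """Precompute one centroid per intent from its labeled example embeddings."""
    return {intent: mean_vector(vectors) for intent, vectors in examples.items()}


def classify(
    embedding: Vector,
    centroids: Dict[str, Vector],
    threshold: float = 0.65,
) -> Tuple[Optional[str], float]:
    """Return (intent, score) for the best centroid, or (None, score) if the
    best score does not exceed the threshold."""
    if not centroids:
        return None, 0.0
    best_intent, best_score = max(
        ((intent, cosine_similarity(embedding, centroid))
         for intent, centroid in centroids.items()),
        key=lambda pair: pair[1],
    )
    if best_score > threshold:
        return best_intent, best_score
    return None, best_score


# Toy data standing in for real 384-dim embeddings.
toy_examples = {
    "debug_error": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "refactor": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}
toy_centroids = build_centroids(toy_examples)
intent, score = classify([1.0, 0.05, 0.0], toy_centroids)
```

Returning the score alongside the intent, even when nothing clears the threshold, lets the caller log how close the nearest intent was.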

Fallback strategy

If no intent exceeds the threshold, route to a general-purpose handler that asks a clarifying question. Log low-confidence prompts for taxonomy expansion.
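
A minimal sketch of the fallback path, assuming the classifier signals low confidence by returning None for the intent. The route() function, the clarifying_handler wording, and the "intent_router" logger name are hypothetical choices for illustration.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("intent_router")  # hypothetical logger name

Handler = Callable[[str], str]


def clarifying_handler(prompt: str) -> str:
    """General-purpose fallback: ask the user what they need."""
    return (
        "Could you say more about what you need: new code, an explanation, "
        "help debugging an error, or a refactor?"
    )


def route(
    prompt: str,
    intent: Optional[str],
    score: float,
    handlers: Dict[str, Handler],
    fallback: Handler = clarifying_handler,
) -> str:
    """Call the handler for `intent`, or fall back and log the prompt
    when the classifier found no intent above the threshold."""
    if intent is None or intent not in handlers:
        # Logged prompts can be reviewed later to decide whether the
        # taxonomy needs a new intent or more labeled examples.
        logger.info("low_confidence_prompt score=%.3f prompt=%r", score, prompt)
        return fallback(prompt)
    return handlers[intent](prompt)
```

Treating an unknown intent label the same as a low-confidence result keeps the router from raising if the centroids and the handler table drift apart.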
