
Recipe: Question → best model router

Classify every incoming prompt and route it to the cheapest model that can answer correctly — without sacrificing quality.

Step 1 — Classify the intent

Run a fast, lightweight classifier (e.g. a fine-tuned BERT variant, or a small LLM with a zero-shot structured prompt) that tags the question with exactly one of: factual, reasoning, creative, or code.
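As a rough illustration of the small-LLM option, the sketch below builds a structured prompt and validates the reply. The `complete` callable is a hypothetical stand-in for whatever client you use to call the model; the prompt wording and the helper names are assumptions, not part of this recipe.

```python
from typing import Callable, Optional

# The four intent labels named in this recipe.
INTENTS = ("factual", "reasoning", "creative", "code")

# Hypothetical prompt wording; tune it for your own classifier model.
PROMPT_TEMPLATE = (
    "Classify the user question into exactly one of: "
    "factual, reasoning, creative, code.\n"
    "Reply with the label only.\n\n"
    "Question: {question}\nLabel:"
)


def build_prompt(question: str) -> str:
    """Fill the structured prompt with the incoming question."""
    return PROMPT_TEMPLATE.format(question=question)


def parse_label(raw: str) -> Optional[str]:
    """Return the intent named in the reply, or None if it is not a valid label."""
    cleaned = raw.strip().lower().strip(".\"'` ")
    return cleaned if cleaned in INTENTS else None


def classify(question: str, complete: Callable[[str], str]) -> Optional[str]:
    """Classify a question using `complete`, an assumed prompt -> text function."""
    return parse_label(complete(build_prompt(question)))
```

Returning `None` for an unparseable reply lets the caller treat it like a low-confidence result instead of guessing a label.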

Step 2 — Map intent to model tier

Intent	Model	Why
factual	Haiku / 4o-mini	Lowest latency, high accuracy on retrieval
reasoning	Sonnet / 4o	Multi-step logic needs depth
creative	Opus / 4o	Nuance and tone matter
code	Sonnet / 4o	Strong structured output
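The table can be kept as a plain lookup so the mapping lives in one place. This is a minimal sketch; the string values are shorthand placeholders taken from the first model in each row, not real API model identifiers, and `model_for` is a hypothetical helper name.

```python
# Intent -> model mapping from the table above.
# Values are placeholder names, not actual provider model IDs.
ROUTES = {
    "factual": "haiku",
    "reasoning": "sonnet",
    "creative": "opus",
    "code": "sonnet",
}


def model_for(intent: str) -> str:
    """Look up the model for an intent; reject labels the table does not cover."""
    try:
        return ROUTES[intent]
    except KeyError:
        raise ValueError(f"unknown intent: {intent!r}") from None
```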

Step 3 — Add a fallback

If the classifier confidence is below your threshold (e.g. 0.7), route to your mid-tier model. Never drop a request — degrade gracefully.
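A minimal sketch of the fallback rule, assuming the classifier returns an intent plus a confidence in [0, 1]. The `Classification` type, the placeholder model names, and the choice to also send unknown intents to the mid tier are assumptions made for illustration.

```python
from dataclasses import dataclass

# Placeholder model names, not real provider model IDs.
ROUTES = {
    "factual": "haiku",
    "reasoning": "sonnet",
    "creative": "opus",
    "code": "sonnet",
}
MID_TIER = "sonnet"


@dataclass(frozen=True)
class Classification:
    intent: str
    confidence: float


def route(result: Classification, threshold: float = 0.7) -> str:
    """Pick a model; fall back to the mid tier when the classifier is unsure."""
    # Assumption: an unrecognized intent is handled like a low-confidence one,
    # so the request is never dropped.
    if result.confidence < threshold or result.intent not in ROUTES:
        return MID_TIER
    return ROUTES[result.intent]
```

Note that a confidence exactly equal to the threshold is trusted; only values strictly below it trigger the fallback.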

Step 4 — Measure and tighten

Log every route decision alongside a human-eval or auto-eval score. Review the confidence threshold and the intent-to-model mapping weekly, and adjust them based on those scores. The goal: push as much volume as possible to the cheapest model without regressing quality.
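One way to make the logged decisions reviewable is to aggregate scores per (intent, model) pair. The sketch below assumes each log record is a dict with `intent`, `model`, and a numeric `score`; the record shape and the `summarize` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def summarize(log: Iterable[dict]) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Group eval scores by (intent, model) and return (count, mean score)."""
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for record in log:
        buckets[(record["intent"], record["model"])].append(record["score"])
    return {key: (len(scores), mean(scores)) for key, scores in buckets.items()}


# Example records; the scores here are made-up illustration values.
example_log = [
    {"intent": "factual", "model": "haiku", "score": 1.0},
    {"intent": "factual", "model": "haiku", "score": 0.5},
    {"intent": "code", "model": "sonnet", "score": 1.0},
]
summary = summarize(example_log)
```

A pair with high volume and a steady mean score is a candidate for moving to a cheaper model; a pair whose mean drops after a change is a signal to roll it back.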

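Pro tip: Cache classification results keyed by a hash of the prompt so you do not pay for classification twice. A Bloom filter alone is not enough here: it can only tell you that a prompt was probably seen before, not return the stored label. At scale, the classifier itself becomes a cost center, so treat it like one.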
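A minimal sketch of caching classifier results, assuming an in-memory dict keyed by the SHA-256 of the prompt. A dict is used because the cache has to return the stored label, not just report membership. The class name, the whitespace normalization, and the unbounded store are illustrative assumptions; a production setup would likely want eviction and a shared store.

```python
import hashlib
from typing import Callable, Dict


class ClassificationCache:
    """Memoize classifier results so identical prompts are classified once."""

    def __init__(self, classify: Callable[[str], str]) -> None:
        self._classify = classify
        self._store: Dict[str, str] = {}
        self.misses = 0  # number of times the underlying classifier was called

    @staticmethod
    def key(prompt: str) -> str:
        # Assumption: prompts differing only in whitespace count as identical.
        normalized = " ".join(prompt.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> str:
        k = self.key(prompt)
        if k not in self._store:
            self.misses += 1
            self._store[k] = self._classify(prompt)
        return self._store[k]
```

Tracking `misses` gives a direct view of how many classifier calls the cache is saving.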