Router-based retrieval
Route every query to the smallest model that can answer it. Meridian's adaptive router inspects prompt complexity, retrieval depth, and latency budget, then picks from a pool of 250+ models spanning the gpt-5-mini through opus-4-7 reasoning tiers, so you never have to hard-code a model id.
1. Tag your documents at ingest
When you push corpora to /v1/retrieval/ingest, attach a route_hint so the router knows which retrieval tier (fast, balanced, deep) a chunk belongs to. Hints are advisory — the router may override them when the query says otherwise.
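As a rough sketch of what tagging at ingest might look like, the script below builds a request body for a single chunk and validates the hint against the three tiers named above. Only the /v1/retrieval/ingest path, the route_hint field, and the tier names come from this page. The body layout ("corpus", "documents", "text"), the build_ingest_payload helper, and the assumption that ingest lives on the same host as the chat endpoint are illustrative guesses, not the documented schema.

```shell
#!/usr/bin/env bash
# Sketch only: the payload field names other than route_hint are assumed.

# Build an ingest body for one chunk, rejecting hints outside the three tiers.
# Note: values are interpolated as-is, so they must not contain quotes or
# backslashes; use a JSON tool such as jq for arbitrary text.
build_ingest_payload() {
  local corpus="$1" hint="$2" text="$3"
  case "$hint" in
    fast|balanced|deep) ;;
    *) echo "route_hint must be fast, balanced, or deep" >&2; return 1 ;;
  esac
  printf '{"corpus": "%s", "documents": [{"text": "%s", "route_hint": "%s"}]}\n' \
    "$corpus" "$text" "$hint"
}

payload=$(build_ingest_payload "support-tickets" "deep" "Refund escalation policy, 2024 revision.")
echo "$payload"

# Send the request only when a key is configured (host assumed to match the chat endpoint).
if [ -n "${MERIDIAN_KEY:-}" ]; then
  curl -sS https://llm.getnimbus.net/v1/retrieval/ingest \
    -H "Authorization: Bearer $MERIDIAN_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload"
fi
```

Validating the hint client-side is a design choice rather than a requirement: since hints are advisory, a typo would otherwise go unnoticed instead of failing loudly.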
2. Call the router endpoint
Use model: meridian/router instead of a specific deployment. The gateway returns the same OpenAI-compatible shape, with an extra x-meridian-routed-to header so you can audit which model actually served the request.
curl https://llm.getnimbus.net/v1/chat/completions \
  -H "Authorization: Bearer $MERIDIAN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meridian/router",
    "messages": [
      {"role": "user", "content": "Summarize Q3 churn drivers."}
    ],
    "retrieval": {
      "corpus": "support-tickets",
      "depth": "balanced",
      "top_k": 8
    }
  }'
3. Inspect routing decisions
Every routed request lands in the Admin Console under Traffic → Routes. You can see cost, latency, and chosen tier per call. When a class of prompts consistently lands on a tier that's too expensive, pin them to a cheaper deployment with a route override rule — no code change required.
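To complement the console view, the x-meridian-routed-to header from step 2 can also be captured client-side and logged next to each request. The sketch below parses that header out of a raw header dump. The routed_to helper, the temp-file handling, and the model id in the demo are hypothetical, and the demo runs on fabricated headers rather than a real gateway response.

```shell
#!/usr/bin/env bash
# Extract the routed-to model from a file of raw HTTP response headers,
# such as one written by curl's -D/--dump-header option.
routed_to() {
  # Header names are case-insensitive; strip the CR that ends each header line.
  tr -d '\r' < "$1" | awk -F': *' 'tolower($1) == "x-meridian-routed-to" { print $2 }'
}

# Offline demo with a made-up header dump; "example-model-small" is a placeholder id.
headers_file=$(mktemp)
printf 'HTTP/1.1 200 OK\r\ncontent-type: application/json\r\nx-meridian-routed-to: example-model-small\r\n\r\n' > "$headers_file"
model=$(routed_to "$headers_file")
echo "served by: $model"
rm -f "$headers_file"
```

Against the live gateway, the same function should work if the curl call from step 2 is run with -D "$headers_file" so the response headers are written to a file, assuming the header is emitted as described above.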