Recipe

Router-based retrieval

Route every query to the smallest model that can answer it. Meridian's adaptive router inspects prompt complexity, retrieval depth, and latency budget, then picks from a pool of 250+ models spanning gpt-5-mini through opus-4-7 reasoning tiers — without you hard-coding a single model id.

1. Tag your documents at ingest

When you push corpora to /v1/retrieval/ingest, attach a route_hint so the router knows which retrieval tier (fast, balanced, deep) a chunk belongs to. Hints are advisory — the router may override them when the query says otherwise.
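As a minimal sketch of what tagging at ingest could look like, the helper below builds a request body for /v1/retrieval/ingest. Only the endpoint path, the route_hint field, and the three tier names come from this recipe. The rest of the payload shape (corpus, documents, text) and the helper name build_ingest_payload are assumptions made for illustration, so check them against the ingest API reference before relying on them.

```python
import json

# Tier names come from the recipe text; the surrounding payload shape
# (corpus / documents / text) is an assumption for illustration only.
ROUTE_HINTS = {"fast", "balanced", "deep"}


def build_ingest_payload(corpus, chunks):
    """Build a JSON-serializable body for POST /v1/retrieval/ingest.

    `chunks` is an iterable of (text, route_hint) pairs.
    """
    documents = []
    for text, route_hint in chunks:
        # Reject typos early; the router only knows the three tiers above.
        if route_hint not in ROUTE_HINTS:
            raise ValueError(f"unknown route_hint: {route_hint!r}")
        documents.append({"text": text, "route_hint": route_hint})
    return {"corpus": corpus, "documents": documents}


payload = build_ingest_payload(
    "support-tickets",
    [
        ("How do I reset my password?", "fast"),
        ("Multi-quarter churn analysis by plan and region.", "deep"),
    ],
)
print(json.dumps(payload, indent=2))
```

Validating the hint on the client keeps misspelled tiers out of the corpus. Because hints are advisory, a wrong but valid tier is still recoverable at query time.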

2. Call the router endpoint

Use model: meridian/router instead of a specific deployment. The gateway returns the same OpenAI-compatible response shape, plus an extra x-meridian-routed-to header so you can audit which model actually served the request.

curl https://llm.getnimbus.net/v1/chat/completions \
  -H "Authorization: Bearer $MERIDIAN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meridian/router",
    "messages": [
      {"role": "user", "content": "Summarize Q3 churn drivers."}
    ],
    "retrieval": {
      "corpus": "support-tickets",
      "depth": "balanced",
      "top_k": 8
    }
  }'
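
The same call can be sketched in Python with only the standard library, which also makes it easy to read the x-meridian-routed-to header. The URL, body fields, and header name mirror the curl example above. The helper names (build_router_request, routed_model, ask_router) are hypothetical, and the choices[0].message.content path assumes the response follows the usual OpenAI-compatible layout.

```python
import json
import urllib.request

BASE_URL = "https://llm.getnimbus.net/v1/chat/completions"


def build_router_request(api_key, question, corpus, depth="balanced", top_k=8):
    """Build (but do not send) the routed chat completion request."""
    body = {
        "model": "meridian/router",  # let the router pick the deployment
        "messages": [{"role": "user", "content": question}],
        "retrieval": {"corpus": corpus, "depth": depth, "top_k": top_k},
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def routed_model(headers):
    """Case-insensitive lookup of x-meridian-routed-to; None if absent."""
    for name, value in headers.items():
        if name.lower() == "x-meridian-routed-to":
            return value
    return None


def ask_router(api_key, question, corpus):
    """Send the request and return (routed_to, answer_text)."""
    req = build_router_request(api_key, question, corpus)
    with urllib.request.urlopen(req) as resp:
        routed_to = routed_model(resp.headers)
        # Assumes the OpenAI-compatible response layout.
        answer = json.load(resp)["choices"][0]["message"]["content"]
    return routed_to, answer
```

Calling ask_router(os.environ["MERIDIAN_KEY"], "Summarize Q3 churn drivers.", "support-tickets") would then return both the answer and the model that served it, which you can log alongside your own request id.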

3. Inspect routing decisions

Every routed request lands in the Admin Console under Traffic → Routes. You can see cost, latency, and chosen tier per call. When a class of prompts consistently lands on a tier that's too expensive, pin them to a cheaper deployment with a route override rule — no code change required.
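
If you also record the x-meridian-routed-to value in your own logs, a small client-side tally can help you spot candidates for an override rule before you open the console. The sketch below is not part of Meridian. The function name tally_routes, the prompt-class labels, and the sample calls are all made up for illustration.

```python
from collections import Counter, defaultdict


def tally_routes(calls):
    """Count which model served each prompt class.

    `calls` is an iterable of (prompt_class, routed_to) pairs, where
    routed_to is the x-meridian-routed-to value recorded for that call.
    """
    by_class = defaultdict(Counter)
    for prompt_class, routed_to in calls:
        by_class[prompt_class][routed_to] += 1
    return by_class


# Illustrative sample data only; real values would come from your logs.
calls = [
    ("faq", "gpt-5-mini"),
    ("faq", "opus-4-7"),
    ("faq", "opus-4-7"),
    ("report", "opus-4-7"),
]

for prompt_class, counts in tally_routes(calls).items():
    model, n = counts.most_common(1)[0]
    total = sum(counts.values())
    print(f"{prompt_class}: most often served by {model} ({n} of {total})")
```

A class such as simple FAQ prompts that lands mostly on a high reasoning tier is the kind of pattern a route override rule is meant to correct.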