Cost-aware routing strategy

Meridian routes chat completions across 250+ models through a single API. For each request, cost-aware routing picks the cheapest model that meets your latency and quality bar, so you pay frontier-model prices only on the requests that need frontier-model quality.

1. Tag your traffic

Send an x-meridian-tier header on every request, set to bulk, standard, or premium. Bulk traffic routes to Llama-4 and DeepSeek class models at roughly 1/40th the cost of a frontier reasoning call.
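
A typo in the tier value is easy to miss, so it can help to validate it on the client before the request goes out. The sketch below is one way to do that in POSIX shell. The function name meridian_tier_header is hypothetical and not part of any Meridian tooling; it only assumes the three tier values listed above.

```shell
# Hypothetical helper (not part of Meridian): print the tier header for a
# valid tier name, or fail with a message on stderr for anything else.
meridian_tier_header() {
  case "$1" in
    bulk|standard|premium)
      # Only the three tier values named in this guide are accepted.
      printf 'x-meridian-tier: %s\n' "$1"
      ;;
    *)
      printf 'unknown tier: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

The output can be passed straight to curl, for example -H "$(meridian_tier_header bulk)".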

2. Use the router alias

Set model: "meridian/router" and let the adaptive router pick. The router watches per-tenant spend, latency budgets, and recent quality scores to land each call on the right tier.

3. Example request

curl https://meridian.getnimbus.net/v1/chat/completions \
  -H "Authorization: Bearer $MERIDIAN_KEY" \
  -H "x-meridian-tier: bulk" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meridian/router",
    "messages": [
      {"role": "user", "content": "Summarize this log."}
    ]
  }'

Typical savings: 60-85% versus single-model frontier baselines.
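
To see how a savings figure depends on your traffic mix, here is a rough back-of-the-envelope sketch. It is an illustration under simplifying assumptions, not Meridian's pricing model: every request is assumed to be the same size, bulk requests cost a fixed fraction of a frontier call, and all other traffic is billed at the full frontier price. The function name estimate_savings_pct is hypothetical.

```shell
# Hypothetical estimate, not a Meridian tool.
# $1 = share of requests tagged bulk, between 0 and 1
# $2 = bulk cost as a fraction of a frontier call (1/40 = 0.025)
# Prints the estimated percentage saved versus sending everything to a
# frontier model: 100 * share * (1 - ratio).
estimate_savings_pct() {
  awk -v share="$1" -v ratio="$2" \
    'BEGIN { printf "%.1f\n", 100 * share * (1 - ratio) }'
}

# Example: 80% of requests tagged bulk at 1/40th the frontier cost.
estimate_savings_pct 0.80 0.025
```

Under these assumptions the saving scales almost linearly with the bulk share, so the share of traffic you can safely tag as bulk is the main lever.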