
Latency-aware routing

Meridian's gateway can route each request to the fastest healthy model that satisfies your quality bar. This recipe wires up p50/p95 latency tracking per upstream, then biases the router toward low-tail-latency providers without sacrificing answer quality. Use this when your product has a hard time budget (chat UX, voice, autocomplete) and you'd rather drop a slow provider than wait for it.

1. Enable latency telemetry

Set x-meridian-track-latency: 1 on every request. The gateway records p50 and p95 latency per upstream over a rolling 5-minute window and exposes the figures at the /v1/health/latency endpoint.
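To make the rolling-window figures concrete, here is a minimal Python sketch of how p50/p95 over a 5-minute window could be computed for a single upstream. It is not the gateway's implementation: the RollingLatency class, its method names, and the nearest-rank percentile definition are assumptions made for illustration only.

```python
import math
from collections import deque


class RollingLatency:
    """Hypothetical tracker: latency samples for one upstream over a time window."""

    def __init__(self, window_s=300.0):  # 300 s = the 5-minute window from the text
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, latency_ms), oldest first

    def record(self, now_s, latency_ms):
        """Store one sample and drop anything that has aged out of the window."""
        self.samples.append((now_s, latency_ms))
        self._evict(now_s)

    def _evict(self, now_s):
        # Samples are appended in time order, so expired ones sit at the left.
        while self.samples and self.samples[0][0] <= now_s - self.window_s:
            self.samples.popleft()

    def percentile(self, now_s, q):
        """Nearest-rank percentile for integer q in 1..100, or None if empty."""
        self._evict(now_s)
        if not self.samples:
            return None
        ordered = sorted(latency for _, latency in self.samples)
        rank = max(1, math.ceil(q * len(ordered) / 100))
        return ordered[rank - 1]
```

With ten samples of 10, 20, ..., 100 ms, percentile(now, 50) gives the fifth-smallest value and percentile(now, 95) gives the largest, which shows how a single slow request can dominate p95 on a small sample.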

2. Pick a routing policy

Set x-meridian-policy: fastest-of-tier and pin a quality tier with x-meridian-tier (e.g. frontier). The router then picks whichever upstream in that tier currently has the lowest p95. Slow providers are demoted automatically and re-probed after 60s.
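The selection rule can be sketched in a few lines of Python. This is an illustration of the behavior described above, not Meridian's router: the Upstream type, the demoted_at_s field, and pick_upstream are hypothetical names, and the sketch treats "re-probed after 60s" as "eligible for selection again after 60 seconds", which is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional

REPROBE_AFTER_S = 60.0  # re-probe delay stated in the text


@dataclass
class Upstream:
    name: str
    tier: str
    p95_ms: float
    demoted_at_s: Optional[float] = None  # None means the upstream is not demoted


def pick_upstream(upstreams: List[Upstream], tier: str, now_s: float) -> Optional[Upstream]:
    """Return the lowest-p95 upstream in `tier`, skipping recently demoted ones."""
    eligible = [
        u for u in upstreams
        if u.tier == tier
        and (u.demoted_at_s is None or now_s - u.demoted_at_s >= REPROBE_AFTER_S)
    ]
    if not eligible:
        return None  # nothing healthy in this tier right now
    return min(eligible, key=lambda u: u.p95_ms)
```

Filtering by tier before comparing p95 is what keeps the quality bar fixed: a faster upstream in a lower tier is never chosen.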

3. Wire it into your call

Send the policy headers alongside your model alias. The gateway returns the chosen upstream in the x-meridian-upstream response header so you can log it.

curl https://llm.getnimbus.net/v1/chat/completions \
  -H "Authorization: Bearer $MERIDIAN_KEY" \
  -H "Content-Type: application/json" \
  -H "x-meridian-policy: fastest-of-tier" \
  -H "x-meridian-tier: frontier" \
  -H "x-meridian-track-latency: 1" \
  -d '{"model":"meridian/auto","messages":[
    {"role":"user","content":"Summarize this ticket."}
  ]}'
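For application code, the same call might look like the following Python sketch, which uses only the standard library. The URL, headers, and model alias come from the curl example; build_request and send are hypothetical helper names, and the assumption that the response body is JSON is mine rather than something this page states.

```python
import json
import urllib.request

# Same endpoint as the curl example above.
GATEWAY_URL = "https://llm.getnimbus.net/v1/chat/completions"


def build_request(api_key, prompt, tier="frontier"):
    """Build (but do not send) a chat request routed with fastest-of-tier."""
    body = {
        "model": "meridian/auto",
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "x-meridian-policy": "fastest-of-tier",
        "x-meridian-tier": tier,
        "x-meridian-track-latency": "1",
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(req, timeout_s=10.0):
    """Send the request; return (chosen upstream or None, parsed response body)."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        upstream = resp.headers.get("x-meridian-upstream")  # lookup is case-insensitive
        return upstream, json.load(resp)  # assumes a JSON response body
```

Calling send(build_request(key, "Summarize this ticket.")) returns the upstream name first, so it can be logged next to your own request timing.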

Tip: combine with x-meridian-budget-ms: 800 to hard-cancel any upstream that exceeds your budget (800 ms here) and fail over to the next-fastest peer.
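The cancel-and-fail-over behavior can be pictured with a small asyncio sketch. It runs client-side purely for illustration and is not how the gateway is implemented: call_with_budget and the call argument are hypothetical, and the sketch assumes the budget applies to each upstream attempt separately rather than to the request as a whole.

```python
import asyncio


async def call_with_budget(upstreams, call, budget_ms):
    """Try upstreams in order (fastest first); cancel any attempt over budget.

    `call` is a hypothetical coroutine function taking an upstream name.
    Returns (upstream_name, result) for the first attempt that finishes in time.
    """
    for name in upstreams:
        try:
            # wait_for cancels the pending attempt once the timeout elapses.
            result = await asyncio.wait_for(call(name), timeout=budget_ms / 1000)
            return name, result
        except asyncio.TimeoutError:
            continue  # over budget: fail over to the next-fastest peer
    raise TimeoutError("every upstream exceeded the latency budget")
```

Under this per-attempt reading, worst-case latency is the budget multiplied by the number of peers tried, which is worth keeping in mind when choosing a value such as 800.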