Latency-aware routing
Meridian's gateway can route each request to the fastest healthy model that satisfies your quality bar. This recipe wires up p50/p95 latency tracking per upstream, then biases the router toward low-tail-latency providers without sacrificing answer quality. Use this when your product has a hard time budget (chat UX, voice, autocomplete) and you'd rather drop a slow provider than wait for it.
1. Enable latency telemetry
Set x-meridian-track-latency: 1 on every request. The gateway records per-upstream p50 and p95 latency over a rolling 5-minute window and exposes both at the /v1/health/latency endpoint.
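To make the rolling-window idea concrete, here is a minimal Python sketch of how p50/p95 could be computed over a 5-minute window. It is an illustration only, not the gateway's implementation: the RollingLatency class, its method names, and the nearest-rank percentile method are all assumptions made for this example.

```python
import math
from collections import deque

WINDOW_SECONDS = 300  # the 5-minute window described above


class RollingLatency:
    """Hypothetical helper: keeps (timestamp, latency_ms) samples for one upstream."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.samples = deque()  # oldest sample on the left

    def record(self, now, latency_ms):
        self.samples.append((now, latency_ms))
        self._evict(now)

    def _evict(self, now):
        # Drop samples that have aged out of the window.
        while self.samples and now - self.samples[0][0] > self.window_seconds:
            self.samples.popleft()

    def percentile(self, now, pct):
        """Nearest-rank percentile of the samples still in the window, or None."""
        self._evict(now)
        if not self.samples:
            return None
        ordered = sorted(latency for _, latency in self.samples)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]


# Five sample latencies (ms) recorded one second apart.
tracker = RollingLatency()
for i, ms in enumerate([120, 80, 100, 900, 110]):
    tracker.record(now=float(i), latency_ms=ms)

p50 = tracker.percentile(now=5.0, pct=50)  # median of the window
p95 = tracker.percentile(now=5.0, pct=95)  # tail latency, dominated by the 900 ms outlier
```

A single slow request barely moves p50 but sets p95, which is why a tail-latency-aware policy ranks upstreams on p95 rather than on the median.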
2. Pick a routing policy
Choose the fastest-of-tier policy (sent in the x-meridian-policy header) to pin a quality tier (e.g. "frontier", sent in x-meridian-tier) and let the router pick whichever upstream in that tier currently has the lowest p95. Slow providers are demoted automatically and re-probed after 60 seconds.
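The selection rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the router's actual code: the Upstream record, the eligible and pick_fastest_of_tier functions, and the provider names are hypothetical, and the criterion that triggers a demotion is not specified here, so the sketch takes the demotion time as given.

```python
from dataclasses import dataclass
from typing import Optional

REPROBE_SECONDS = 60  # re-probe delay described above


@dataclass
class Upstream:
    """Hypothetical record of one provider as the router might see it."""
    name: str
    tier: str
    p95_ms: float
    demoted_at: Optional[float] = None  # time of demotion, None if healthy


def eligible(upstream, tier, now):
    # Only upstreams in the pinned tier are considered at all.
    if upstream.tier != tier:
        return False
    if upstream.demoted_at is None:
        return True
    # A demoted upstream becomes a candidate again once the re-probe delay has passed.
    return now - upstream.demoted_at >= REPROBE_SECONDS


def pick_fastest_of_tier(upstreams, tier, now):
    """Return the eligible upstream with the lowest p95, or None if there is none."""
    candidates = [u for u in upstreams if eligible(u, tier, now)]
    if not candidates:
        return None
    return min(candidates, key=lambda u: u.p95_ms)


pool = [
    Upstream("provider-a", "frontier", p95_ms=640.0),
    Upstream("provider-b", "frontier", p95_ms=410.0, demoted_at=100.0),
    Upstream("provider-c", "standard", p95_ms=200.0),
]

# 30 s after provider-b was demoted, only provider-a qualifies in the tier.
choice_during_demotion = pick_fastest_of_tier(pool, "frontier", now=130.0)
# 60 s after the demotion, provider-b is considered again and wins on p95.
choice_after_reprobe = pick_fastest_of_tier(pool, "frontier", now=160.0)
```

Note that provider-c is never chosen even though it is the fastest overall: the tier pin is what keeps the policy from trading quality for speed.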
3. Wire it into your call
Send the policy and tier headers alongside your model alias. The gateway returns the chosen upstream in the x-meridian-upstream response header so you can log it.
curl https://llm.getnimbus.net/v1/chat/completions \
  -H "Authorization: Bearer $MERIDIAN_KEY" \
  -H "Content-Type: application/json" \
  -H "x-meridian-policy: fastest-of-tier" \
  -H "x-meridian-tier: frontier" \
  -H "x-meridian-track-latency: 1" \
  -d '{"model":"meridian/auto","messages":[
    {"role":"user","content":"Summarize this ticket."}
  ]}'

Tip: combine with x-meridian-budget-ms: 800 to hard-cancel any upstream that exceeds your budget and fail over to the next-fastest peer.
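The budget-and-failover behavior can be pictured with a short asyncio sketch. It is a client-side simulation under assumptions, not the gateway's logic: call_with_budget and the two simulated upstreams are hypothetical, and the sketch assumes the budget applies to each attempt separately rather than to the request as a whole.

```python
import asyncio


async def call_with_budget(upstreams, budget_ms):
    """Try upstreams in order; cancel any call that exceeds budget_ms.

    `upstreams` is a list of (name, coroutine_function) pairs, already sorted
    fastest-first. Returns (name, result), or raises TimeoutError if every
    upstream runs over budget.
    """
    for name, call in upstreams:
        try:
            result = await asyncio.wait_for(call(), timeout=budget_ms / 1000)
            return name, result
        except asyncio.TimeoutError:
            continue  # wait_for has already cancelled the slow call
    raise TimeoutError("every upstream exceeded the budget")


# Simulated providers standing in for real network calls.
async def slow_upstream():
    await asyncio.sleep(1.0)
    return "slow answer"


async def fast_upstream():
    await asyncio.sleep(0.001)
    return "fast answer"


winner = asyncio.run(
    call_with_budget(
        [("provider-a", slow_upstream), ("provider-b", fast_upstream)],
        budget_ms=100,
    )
)
```

In this run provider-a is cancelled once it passes the 100 ms budget and the answer comes from provider-b. With a per-attempt budget, the worst-case wait is roughly the budget multiplied by the number of upstreams tried, which is worth keeping in mind when choosing the value.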