Model routing
Every model alias in Meridian is a routing decision — bare names distribute load, prefixed names pin regions, and the adaptive router picks the fastest healthy endpoint automatically.
Bare alias — round‑robin across regions
When you send a request to a bare alias like gpt-4.1, Meridian resolves it to a pool of four geographically distributed backends. Each successive request cycles to the next region in the ring — US East, US West, EU West, and AP Southeast — so no single endpoint becomes a hot spot.
POST /v1/chat/completions
Authorization: Bearer sk-...
Content-Type: application/json
{
"model": "gpt-4.1",
"messages": [...]
}
→ request 1 routes to us-east
→ request 2 routes to us-west
→ request 3 routes to eu-west
→ request 4 routes to ap-southeast
→ request 5 wraps back to us-east
The round‑robin cursor is tracked per API key, so requests from two different keys advance through the ring independently. If a region returns a 5xx or exceeds the deadline, Meridian skips it for the remainder of the ring cycle and retries the next candidate; a request fails only when every region is unhealthy.
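The per-key cursor and the retry-on-failure behaviour can be sketched roughly as follows. This is an illustration under stated assumptions, not Meridian's implementation: `RoundRobinRouter` and the `send` callback are hypothetical names, and the sketch only retries within a single request; it does not model skipping a failed region for the remainder of the ring cycle.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Ring order taken from the example trace above.
REGIONS = ["us-east", "us-west", "eu-west", "ap-southeast"]


class RoundRobinRouter:
    """Per-API-key round-robin over a fixed ring of regions (illustrative)."""

    def __init__(self, regions: List[str]) -> None:
        self.regions = list(regions)
        # One cursor per API key, so keys advance independently.
        self.cursors: Dict[str, int] = defaultdict(int)

    def route(self, api_key: str, send: Callable[[str], bool]) -> str:
        """Try regions in ring order, starting at this key's cursor.

        `send(region)` is a hypothetical callback that returns True on
        success and False on a 5xx or a missed deadline.
        """
        start = self.cursors[api_key]
        # Advance the cursor once per request, whatever the outcome.
        self.cursors[api_key] = (start + 1) % len(self.regions)
        for offset in range(len(self.regions)):
            region = self.regions[(start + offset) % len(self.regions)]
            if send(region):
                return region
        # Every candidate failed: the request cannot be served.
        raise RuntimeError("all regions unhealthy")


router = RoundRobinRouter(REGIONS)
picks = [router.route("key-a", lambda region: True) for _ in range(5)]
print(picks)  # the fifth pick wraps back to us-east
```

Keeping the cursor in a dictionary keyed by API key is what lets a second key start at us-east regardless of how far the first key has advanced.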
Prefixed alias — pin a specific region
Prepend a provider prefix to the alias to lock every request to a single region. The full model name follows the convention provider/alias.
"model": "azure-swc/gpt-4.1"
→ every call hits Sweden Central (Azure)
→ no round‑robin, no failover to other regions
→ latency is predictable; useful for compliance or data‑residency
Supported provider prefixes: azure-eus, azure-wus, azure-swc, azure-sin, openai, anthropic, groq, together. Each prefix maps to exactly one physical region. If the pinned region is unhealthy, the request fails fast with a 502; Meridian does not silently re‑route pinned requests.
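As a rough illustration of the provider/alias convention and the fail-fast rule, the sketch below splits a model name and returns a 502 when the pinned prefix is not healthy. The function names `parse_model` and `route_pinned` and the `healthy` set are assumptions made for this example; the special azure/model-router alias is outside its scope.

```python
from typing import Optional, Set, Tuple

# Prefixes listed in the text above. The sketch tracks only the prefix,
# not the physical region each one maps to.
SUPPORTED_PREFIXES = {
    "azure-eus", "azure-wus", "azure-swc", "azure-sin",
    "openai", "anthropic", "groq", "together",
}


def parse_model(model: str) -> Tuple[Optional[str], str]:
    """Split 'provider/alias' into (prefix, alias); bare aliases get None."""
    prefix, sep, alias = model.partition("/")
    if not sep:
        return None, model
    if prefix not in SUPPORTED_PREFIXES:
        raise ValueError(f"unknown provider prefix: {prefix}")
    return prefix, alias


def route_pinned(model: str, healthy: Set[str]) -> Tuple[int, str]:
    """Return (status, prefix) for a pinned alias, with no failover.

    `healthy` is a hypothetical set of prefixes currently passing checks.
    """
    prefix, _alias = parse_model(model)
    if prefix is None:
        raise ValueError("not a pinned alias")
    if prefix not in healthy:
        # Fail fast instead of re-routing to another region.
        return 502, prefix
    return 200, prefix


print(route_pinned("azure-swc/gpt-4.1", healthy={"azure-swc"}))
print(route_pinned("azure-swc/gpt-4.1", healthy={"azure-eus"}))
```

The second call returns a 502 even though another region is healthy, which is the point of pinning: the caller, not the router, decides whether to try elsewhere.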
Adaptive router — auto‑select fastest healthy region
The azure/model-router alias delegates region selection to an adaptive control loop. Every 30 seconds the router probes each Azure region with a lightweight health check, recording median latency and error rate over a rolling 5‑minute window.
On each incoming request the router picks the region with the lowest median latency that is currently passing health checks. If the chosen region fails mid‑request, Meridian transparently retries the next‑best candidate — the client sees a single successful response with no indication of the internal failover.
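One way to picture the selection step is a fixed-size window of probe samples per region, ranked by median latency. The sketch below is a minimal model under assumptions: `RegionStats` and `rank_regions` are hypothetical names, and "healthy" is simplified to "the most recent probe succeeded", whereas the text only says that the router also tracks an error rate. With a probe every 30 seconds and a 5-minute window, each region holds 10 samples.

```python
from collections import deque
from statistics import median
from typing import Deque, Dict, List, Tuple

PROBE_INTERVAL_S = 30
WINDOW_S = 5 * 60
WINDOW_SAMPLES = WINDOW_S // PROBE_INTERVAL_S  # 10 probes per region


class RegionStats:
    """Rolling window of probe results for one region (illustrative)."""

    def __init__(self) -> None:
        # Each sample is (latency_ms, ok); the deque evicts the oldest probe.
        self.samples: Deque[Tuple[float, bool]] = deque(maxlen=WINDOW_SAMPLES)

    def record(self, latency_ms: float, ok: bool) -> None:
        self.samples.append((latency_ms, ok))

    def median_latency(self) -> float:
        return median(latency for latency, _ in self.samples)

    def healthy(self) -> bool:
        # Assumption: healthy means the latest probe succeeded.
        return bool(self.samples) and self.samples[-1][1]


def rank_regions(stats: Dict[str, RegionStats]) -> List[str]:
    """Healthy regions ordered by median latency, fastest first."""
    healthy = [name for name, s in stats.items() if s.healthy()]
    return sorted(healthy, key=lambda name: stats[name].median_latency())


stats = {name: RegionStats() for name in ("us-east", "us-west", "sweden", "singapore")}
for name, latency in (("us-east", 12), ("us-west", 18), ("sweden", 8), ("singapore", 42)):
    stats[name].record(latency, ok=True)

candidates = rank_regions(stats)
print(candidates[0], candidates[1])  # first choice, then the retry candidate
```

The first entry in the ranking is the region a new request would land on, and the second is the next-best candidate used if that attempt fails.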
"model": "azure/model-router"
→ probe loop: us-east 12ms ✓ us-west 18ms ✓ sweden 8ms ✓ singapore 42ms ✓
→ request lands → sweden (8ms median)
→ sweden times out → retry us-east (12ms) → 200 OK
→ client latency: time lost to the sweden timeout + ~12ms for the retry
Quick reference
| Alias pattern | Behaviour | Failover |
|---|---|---|
| gpt-4.1 | Round‑robin across 4 regions | Skip unhealthy; retry next |
| azure-swc/gpt-4.1 | Pinned to Sweden Central | None — fast 502 |
| azure/model-router | Adaptive lowest‑latency healthy region | Transparent retry on next‑best |