Model routing

Every model alias in Meridian is a routing decision — bare names distribute load, prefixed names pin regions, and the adaptive router picks the fastest healthy endpoint automatically.

Bare alias — round‑robin across regions

When you send a request to a bare alias like gpt-4.1, Meridian resolves it to a pool of four geographically distributed backends. Each successive request cycles to the next region in the ring — US East, US West, EU West, and AP Southeast — so no single endpoint becomes a hot spot.

POST /v1/chat/completions
Authorization: Bearer sk-...
Content-Type: application/json

{
  "model": "gpt-4.1",
  "messages": [...]
}

→ request 1  routes to  us-east
→ request 2  routes to  us-west
→ request 3  routes to  eu-west
→ request 4  routes to  ap-southeast
→ request 5  wraps back to  us-east

The round‑robin cursor is per‑API‑key, so two different keys interleave independently. If a region returns a 5xx or exceeds the deadline, Meridian skips it for the remainder of the ring cycle and retries the next candidate — no request is dropped unless every region is unhealthy.
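The per-key cursor and skip-on-failure behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Meridian's implementation: the `RoundRobinRouter` class and the `send` callable are hypothetical names, and the sketch assumes the cursor advances to one past the region that actually served the request. It also only skips a failed region within one request's retry loop; how Meridian tracks skipped regions across requests is not specified here.

```python
from collections import defaultdict

# Region ring taken from the example above.
REGIONS = ["us-east", "us-west", "eu-west", "ap-southeast"]


class RoundRobinRouter:
    """Per-key round-robin over a fixed ring of regions (illustrative only)."""

    def __init__(self, regions=REGIONS):
        self.regions = list(regions)
        self.cursors = defaultdict(int)  # one independent cursor per API key

    def route(self, api_key, send):
        """Try each region at most once, starting at this key's cursor.

        `send(region)` is a hypothetical callable returning True on success
        and False on a 5xx or a missed deadline.
        """
        start = self.cursors[api_key]
        n = len(self.regions)
        for offset in range(n):
            region = self.regions[(start + offset) % n]
            if send(region):
                # Assumption: the next request starts one past the serving region.
                self.cursors[api_key] = (start + offset + 1) % n
                return region
        # Every region failed, so the request cannot be served.
        raise RuntimeError("all regions unhealthy")


router = RoundRobinRouter()
print([router.route("key-a", lambda region: True) for _ in range(5)])
# ['us-east', 'us-west', 'eu-west', 'ap-southeast', 'us-east']
```

Because the cursor is stored per key, a second key such as "key-b" would start at us-east regardless of how many requests "key-a" has made.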

Prefixed alias — pin a specific region

Prepend a provider prefix to lock every request to a single region. The format is provider/alias, with the prefix and the model alias separated by a slash.

"model": "azure-swc/gpt-4.1"

→ every call hits Sweden Central (Azure)
→ no round‑robin, no failover to other regions
→ latency is predictable; useful for compliance or data‑residency

Supported provider prefixes: azure-eus, azure-wus, azure-swc, azure-sin, openai, anthropic, groq, together. Each prefix maps to exactly one physical region. If the pinned region is unhealthy, the request fails fast with a 502 — Meridian does not silently re‑route pinned requests.
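To make the fail-fast rule concrete, here is a small sketch of how a prefixed alias might be parsed and routed. The function names, the `PinnedRegionUnavailable` exception, and the `is_healthy` callable are all hypothetical; only the prefix list and the 502 status come from the text above. The adaptive alias azure/model-router is not in the prefix list and is assumed to be handled by a separate code path.

```python
# Prefix list copied from the documentation above.
SUPPORTED_PREFIXES = {
    "azure-eus", "azure-wus", "azure-swc", "azure-sin",
    "openai", "anthropic", "groq", "together",
}


class PinnedRegionUnavailable(Exception):
    """Hypothetical error for an unhealthy pinned region; maps to HTTP 502."""
    status_code = 502


def parse_alias(model):
    """Split 'prefix/alias' into (prefix, alias); prefix is None for bare aliases."""
    prefix, sep, alias = model.partition("/")
    if not sep:
        return None, model
    if prefix not in SUPPORTED_PREFIXES:
        raise ValueError(f"unknown provider prefix: {prefix!r}")
    return prefix, alias


def route_pinned(model, is_healthy):
    """Return the pinned prefix, or fail fast without trying other regions."""
    prefix, _alias = parse_alias(model)
    if prefix is None:
        raise ValueError("route_pinned expects a prefixed alias")
    if not is_healthy(prefix):
        # No failover: a pinned request is never re-routed elsewhere.
        raise PinnedRegionUnavailable(f"{prefix} is unhealthy")
    return prefix


print(route_pinned("azure-swc/gpt-4.1", lambda prefix: True))  # azure-swc
```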

Adaptive router — auto‑select fastest healthy region

The azure/model-router alias delegates region selection to an adaptive control loop. Every 30 seconds the router probes each Azure region with a lightweight health check, recording median latency and error rate over a rolling 5‑minute window.

Probe interval    30 s    per region, staggered
Decision window   5 min   rolling median latency
Cutoff threshold  2 %     error rate before exclusion
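The rolling-window bookkeeping implied by these figures could look something like the sketch below. It is an assumption-laden illustration: the `RegionStats` class is a hypothetical name, timestamps are plain seconds supplied by the caller, median latency is computed over successful probes only, and a region sitting exactly at the 2 % cutoff is treated as still healthy. The source does not say how Meridian handles any of these details.

```python
from collections import deque
from statistics import median

WINDOW_SECONDS = 5 * 60   # rolling decision window
ERROR_CUTOFF = 0.02       # 2 % error rate before exclusion


class RegionStats:
    """Rolling probe results for one region (illustrative only)."""

    def __init__(self):
        self.samples = deque()  # (timestamp_s, latency_ms, ok), oldest first

    def record(self, now, latency_ms, ok):
        self.samples.append((now, latency_ms, ok))
        self._evict(now)

    def _evict(self, now):
        # Drop probes that have aged out of the 5-minute window.
        while self.samples and self.samples[0][0] <= now - WINDOW_SECONDS:
            self.samples.popleft()

    def median_latency(self, now):
        self._evict(now)
        latencies = [lat for _, lat, ok in self.samples if ok]
        return median(latencies) if latencies else None

    def error_rate(self, now):
        self._evict(now)
        if not self.samples:
            return 0.0
        failures = sum(1 for _, _, ok in self.samples if not ok)
        return failures / len(self.samples)

    def is_healthy(self, now):
        # A region with no successful probes in the window counts as unhealthy.
        return (self.median_latency(now) is not None
                and self.error_rate(now) <= ERROR_CUTOFF)
```

With one probe every 30 seconds the window holds about ten samples per region, so under these assumptions a single failed probe (10 %) is already enough to exceed the cutoff.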

On each incoming request the router picks the region with the lowest median latency that is currently passing health checks. If the chosen region fails mid‑request, Meridian transparently retries the next‑best candidate — the client sees a single successful response with no indication of the internal failover.

"model": "azure/model-router"

→ probe loop:  us-east 12ms ✓   us-west 18ms ✓   sweden 8ms ✓   singapore 42ms ✓
→ request lands → sweden (8ms median)
→ sweden times out → retry us-east (12ms) → 200 OK
→ client latency: time spent waiting on the sweden timeout plus the us-east retry (~12ms)
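The selection and retry steps in this trace can be approximated with a few lines of Python. This is a sketch under stated assumptions rather than the router's actual logic: `pick_order`, `route_adaptive`, and `send` are hypothetical names, a mid-request failure is modelled as a `TimeoutError`, and the 0.0 error rates are placeholders, since the trace only shows that each region passes its health check.

```python
def pick_order(stats, cutoff=0.02):
    """Return healthy regions sorted by median latency, fastest first.

    `stats` maps region -> (median_latency_ms, error_rate).
    """
    healthy = [(lat, region) for region, (lat, err) in stats.items() if err <= cutoff]
    return [region for _lat, region in sorted(healthy)]


def route_adaptive(stats, send):
    """Try the fastest healthy region, then fall back to the next-best on timeout."""
    for region in pick_order(stats):
        try:
            return region, send(region)
        except TimeoutError:
            continue  # the caller never sees this failed attempt
    raise RuntimeError("no healthy region could serve the request")


# Median latencies from the trace above; error rates are assumed.
stats = {
    "us-east": (12, 0.0),
    "us-west": (18, 0.0),
    "sweden": (8, 0.0),
    "singapore": (42, 0.0),
}


def send(region):
    # Simulate the trace: sweden times out, every other region answers.
    if region == "sweden":
        raise TimeoutError
    return "200 OK"


print(route_adaptive(stats, send))  # ('us-east', '200 OK')
```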

Quick reference

Alias pattern       Behaviour                               Failover
gpt-4.1             Round‑robin across 4 regions            Skip unhealthy; retry next
azure-swc/gpt-4.1   Pinned to Sweden Central                None (fast 502)
azure/model-router  Adaptive lowest‑latency healthy region  Transparent retry on next‑best