Resilience

Failover behavior

When a regional gateway returns a 5xx error, Meridian automatically retries the request against an adjacent region using the same model. Up to three hops are attempted before the request is surfaced as a hard failure.

How it works

Every model deployment is replicated across multiple geographic regions. When your application sends a request, the nearest healthy gateway handles it. If that gateway returns any 5xx status (such as 502, 503, or 504), the load balancer immediately marks the region as degraded and re-routes the identical payload to the next-closest region that hosts the same model.
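The routing order described above can be sketched as a filter followed by a sort. This is a minimal illustration, not Meridian's implementation: the Region type, the distance_ms proximity metric, the sample distances, and the us-west / other-model entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    distance_ms: float   # hypothetical proximity metric, e.g. measured round-trip time
    models: frozenset    # models deployed in this region
    healthy: bool = True


def candidate_regions(regions, model):
    """Return the healthy regions that host `model`, nearest first."""
    eligible = [r for r in regions if r.healthy and model in r.models]
    return sorted(eligible, key=lambda r: r.distance_ms)


# Illustrative data only; the distances are invented.
REGIONS = [
    Region("eu-west", 85.0, frozenset({"gpt-4o"})),
    Region("us-east", 12.0, frozenset({"gpt-4o"})),
    Region("ap-southeast", 190.0, frozenset({"gpt-4o"})),
    Region("us-west", 40.0, frozenset({"other-model"})),  # does not host gpt-4o
]

order = [r.name for r in candidate_regions(REGIONS, "gpt-4o")]
```

Here us-west is skipped even though it is closer than eu-west, because it does not host the requested model.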

Hop sequence

us-east → 503 → eu-west → 502 → ap-southeast → 200 ✓

The client sees a single successful response. Intermediate failures are invisible to the client: no partial streaming, no stale state.

Constraints

  • Max 3 hops. After the third consecutive 5xx, the request fails with a 502 and the original error chain is logged.
  • Same model only. Failover never switches models. A gpt-4o request stays on gpt-4o.
  • Idempotency keys. Retries carry the same x-request-id header so providers can deduplicate.
  • Streaming cutoff. If a stream fails mid-stream, the entire request is retried from scratch in the next region.
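The constraints above can be sketched as a small retry loop. This is an illustration under stated assumptions, not the gateway's actual code: send is a hypothetical transport callable, "max 3 hops" is read as three attempts in total, and the regions list is assumed to be already ordered nearest first.

```python
import uuid

MAX_ATTEMPTS = 3  # assumption: "max 3 hops" means three attempts in total


def send_with_failover(regions, model, payload, send):
    """Try regions in order until one returns a non-5xx status.

    `send(region, model, payload, headers)` is a hypothetical callable
    that returns a (status, body) tuple.
    """
    # One ID for the whole request, reused on every attempt so that
    # providers can deduplicate retries.
    headers = {"x-request-id": str(uuid.uuid4())}
    error_chain = []
    for region in regions[:MAX_ATTEMPTS]:
        # The model never changes between attempts; only the region does.
        status, body = send(region, model, payload, headers)
        if status < 500:
            return status, body, error_chain
        error_chain.append((region, status))
    # Every attempt returned 5xx: surface a hard failure as a 502.
    return 502, None, error_chain


def fake_send(region, model, payload, headers):
    # Stand-in transport that reproduces the hop sequence shown earlier.
    statuses = {"us-east": 503, "eu-west": 502, "ap-southeast": 200}
    return statuses[region], {"served_by": region}


status, body, chain = send_with_failover(
    ["us-east", "eu-west", "ap-southeast"], "gpt-4o", {"prompt": "hi"}, fake_send
)
```

In this run the caller receives the 200 from ap-southeast, and chain holds the two intermediate failures. Each attempt sends the full payload again, which is also how the sketch models a stream that is restarted from scratch.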

Circuit breaker

A region that returns 5xx for more than 30% of requests over a 60-second window is automatically removed from the routing pool for 120 seconds. During this cool-down period, requests skip the degraded region entirely, which reduces tail latency.

The circuit state resets automatically once the 120 seconds have elapsed. No manual intervention is required.
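A sliding-window breaker with these thresholds might look like the sketch below. The class and method names are hypothetical, and MIN_SAMPLES is an added assumption that is not part of the documented behavior: it only keeps a single early failure from tripping the circuit in this example. Time is passed in explicitly so the logic is easy to follow and test.

```python
from collections import deque

WINDOW_S = 60.0               # observation window, from the text above
ERROR_RATE_THRESHOLD = 0.30   # trip when more than 30% of requests return 5xx
COOLDOWN_S = 120.0            # time the region stays out of the routing pool
MIN_SAMPLES = 10              # assumption, not documented: minimum volume before tripping


class RegionCircuit:
    """Tracks recent outcomes for one region (hypothetical helper)."""

    def __init__(self):
        self.samples = deque()   # (timestamp, is_5xx) pairs, oldest first
        self.open_until = 0.0    # region is skipped until this timestamp

    def record(self, now, status):
        self.samples.append((now, status >= 500))
        # Drop samples that have aged out of the 60-second window.
        while self.samples and self.samples[0][0] <= now - WINDOW_S:
            self.samples.popleft()
        if len(self.samples) < MIN_SAMPLES:
            return
        errors = sum(1 for _, failed in self.samples if failed)
        if errors / len(self.samples) > ERROR_RATE_THRESHOLD:
            self.open_until = now + COOLDOWN_S
            self.samples.clear()  # start with a clean window after the cool-down

    def is_routable(self, now):
        # The circuit closes on its own once the cool-down has elapsed.
        return now >= self.open_until


circuit = RegionCircuit()
for t in range(6):
    circuit.record(float(t), 200)
for t in range(6, 10):
    circuit.record(float(t), 503)   # 4 of 10 requests failed: 40% > 30%
```

After the tenth sample the region is skipped until 120 seconds after the trip, then becomes routable again without any reset call.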

Observability

Every failover event is emitted as a structured log line with the hop index, region, status code, and latency. You can stream these to your dashboard or SIEM via the webhooks integration.
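As an illustration of what such a log line could contain, the sketch below serializes one failover event as JSON. The field names (event, request_id, hop, region, status, latency_ms) and the sample values are assumptions made for this example; the actual schema may differ.

```python
import json


def failover_log_line(request_id, hop, region, status, latency_ms):
    """Serialize one failover event as a single JSON log line.

    Field names are hypothetical; they mirror the attributes listed above.
    """
    return json.dumps(
        {
            "event": "failover",
            "request_id": request_id,
            "hop": hop,
            "region": region,
            "status": status,
            "latency_ms": latency_ms,
        },
        sort_keys=True,
    )


# Sample values are invented for illustration.
line = failover_log_line("req-123", 1, "us-east", 503, 412)
record = json.loads(line)
```

One JSON object per line keeps the events easy to filter by region or status code in a dashboard or SIEM.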