
Recipe

AI Gateway Architecture

A Meridian AI gateway sits between your application and its upstream model providers, unifying billing, retries, fallback routing, and observability behind a single OpenAI-compatible endpoint. This recipe walks through the three load-bearing layers of a production-grade gateway built on Meridian: the edge router, the retry and fallback core, and the metering tap.

1. The Edge Router

Every request enters through a thin edge layer that authenticates the caller, parses the model alias, and selects an upstream pool. Meridian ships a 54-alias catalog spanning Azure OpenAI, Anthropic, xAI, DeepSeek, and Cohere. An alias such as azure/model-router resolves to adaptive routing across the entire fleet.
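The three edge steps (authenticate, parse the alias, select a pool) can be sketched as below. This is a minimal illustration, not Meridian's implementation: the CATALOG and API_KEYS tables, the pool names, the customer ID, and the route_request function are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical alias catalog: alias -> upstream pool. Pool names are made up.
CATALOG = {
    "azure/model-router": ["pool-a", "pool-b"],
}

# Hypothetical key store: bearer token -> customer ID.
API_KEYS = {"sk-meridian-test": "customer-123"}


class AuthError(Exception):
    """Raised when the Authorization header is missing or not recognized."""


class UnknownAlias(Exception):
    """Raised when the requested model alias is not in the catalog."""


@dataclass(frozen=True)
class Route:
    customer_id: str
    alias: str
    pool: Tuple[str, ...]


def route_request(authorization: str, body: dict) -> Route:
    # Step 1: authenticate the caller from the "Bearer <token>" header.
    scheme, _, token = authorization.partition(" ")
    if scheme != "Bearer" or token not in API_KEYS:
        raise AuthError("invalid or missing bearer token")

    # Step 2: parse the model alias from the request body.
    alias = body.get("model", "")
    if alias not in CATALOG:
        raise UnknownAlias(alias)

    # Step 3: select the upstream pool registered for that alias.
    return Route(API_KEYS[token], alias, tuple(CATALOG[alias]))
```

Keeping this layer free of network calls is a design choice in the sketch: rejecting bad keys and unknown aliases before any upstream is contacted keeps the edge cheap.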

2. The Retry & Fallback Core

When an upstream returns a 429, a 503, or a content-filter rejection, the core retries on a backoff curve and then cascades to a sibling deployment in a different region. Reasoning models with hidden chain-of-thought consume token budget before emitting any visible text, so the core enforces a max_tokens floor of 2048: requests that ask for less are raised to that value.
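A rough sketch of the retry-then-cascade loop follows. The names here are assumptions: send is a hypothetical callable returning (status, payload), and detecting a content-filter rejection through an error code of "content_filter" in the payload is a guess at how an upstream might signal it. The backoff curve shown is exponential with full jitter, which is one common choice and not necessarily the curve Meridian uses.

```python
import random
import time

RETRYABLE_STATUS = {429, 503}
MIN_MAX_TOKENS = 2048  # floor described in the text above


def apply_token_floor(body: dict) -> dict:
    """Return a copy of the request with max_tokens raised to the floor."""
    floored = dict(body)
    floored["max_tokens"] = max(int(body.get("max_tokens") or 0), MIN_MAX_TOKENS)
    return floored


def is_retryable(status: int, payload: dict) -> bool:
    if status in RETRYABLE_STATUS:
        return True
    # Assumption: content-filter rejections carry this error code in the body.
    error = payload.get("error") or {}
    return error.get("code") == "content_filter"


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with full jitter, capped at `cap` seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_fallback(send, deployments, body,
                       retries_per_deployment=2, sleep=time.sleep):
    """Retry each deployment with backoff, then cascade to the next one."""
    body = apply_token_floor(body)
    last_status = None
    for deployment in deployments:
        for attempt in range(retries_per_deployment + 1):
            status, payload = send(deployment, body)
            if not is_retryable(status, payload):
                return deployment, status, payload
            last_status = status
            # Sleep only between retries on the same deployment.
            if attempt < retries_per_deployment:
                sleep(backoff_delay(attempt))
    raise RuntimeError(f"all deployments failed; last status: {last_status}")
```

Passing sleep as a parameter keeps the loop testable without real delays. The order of deployments is where a caller would encode "sibling deployment in a different region".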

3. The Metering Tap

Every response is metered: prompt tokens, completion tokens, reasoning tokens, latency, and upstream cost. The tap writes to a usage stream that bills the customer at raw provider cost plus a configurable markup. Meridian defaults to a 20% markup.
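The billing arithmetic can be illustrated with a small sketch. It reads the 20% figure as a markup applied on top of raw provider cost. The UsageRecord shape, the field names, and the six-decimal rounding are assumptions made for the example, not Meridian's actual schema.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

DEFAULT_MARKUP = Decimal("0.20")  # 20% over raw provider cost


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical record shape covering the metered fields named above.
    prompt_tokens: int
    completion_tokens: int
    reasoning_tokens: int
    latency_ms: int
    upstream_cost_usd: Decimal


def billed_amount(record: UsageRecord, markup: Decimal = DEFAULT_MARKUP) -> Decimal:
    """Customer price = upstream cost * (1 + markup), rounded to micro-dollars."""
    total = record.upstream_cost_usd * (Decimal(1) + markup)
    return total.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)


record = UsageRecord(
    prompt_tokens=12,
    completion_tokens=40,
    reasoning_tokens=300,
    latency_ms=850,
    upstream_cost_usd=Decimal("0.010000"),
)
print(billed_amount(record))  # prints 0.012000
```

Decimal is used instead of float so that per-request amounts sum without binary rounding drift. Under this reading, a request costing $0.010 upstream is billed at $0.012, so the gateway keeps one sixth of the billed price.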

POST /v1/chat/completions
Authorization: Bearer sk-meridian-...
Content-Type: application/json

{
  "model": "azure/model-router",
  "max_tokens": 2048,
  "messages": [{"role": "user", "content": "Hello"}]
}