Recipe
AI Gateway Architecture
A Meridian AI gateway sits between your application and dozens of upstream model providers, unifying billing, retries, fallback routing, and observability behind a single OpenAI-compatible endpoint. This recipe walks through the three load-bearing layers of a production-grade gateway built on Meridian.
1. The Edge Router
Every request enters through a thin edge layer that authenticates the caller, parses the model alias, and selects an upstream pool. Meridian ships a 54-alias catalog spanning Azure OpenAI, Anthropic, xAI, DeepSeek, and Cohere. Aliases like azure/model-router resolve to adaptive routing across the entire fleet.
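The three edge steps (authenticate, parse the alias, select a pool) can be sketched as follows. This is a minimal illustration, not Meridian's implementation: the CATALOG and API_KEYS tables, the deployment names, the anthropic/example-model alias, and the route function are all hypothetical stand-ins.

```python
from dataclasses import dataclass

# Hypothetical catalog: model alias -> upstream pool of deployments.
# A real catalog would be loaded from gateway configuration.
CATALOG = {
    "azure/model-router": ["azure-eastus", "azure-westeurope"],
    "anthropic/example-model": ["anthropic-primary"],
}

# Hypothetical key store: API key -> customer id.
API_KEYS = {"sk-meridian-demo": "customer-1"}


@dataclass
class Route:
    customer: str
    provider: str
    pool: list[str]


def route(authorization: str, model: str) -> Route:
    """Authenticate the caller, parse the alias, and pick an upstream pool."""
    scheme, _, token = authorization.partition(" ")
    if scheme != "Bearer" or token not in API_KEYS:
        raise PermissionError("unknown or malformed API key")
    if model not in CATALOG:
        raise KeyError(f"unknown model alias: {model}")
    # The text before the first slash is treated as the provider name.
    provider = model.split("/", 1)[0]
    return Route(API_KEYS[token], provider, CATALOG[model])
```

Keeping the edge layer to a pair of table lookups means it adds little latency and leaves retry and fallback decisions to the core.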
2. The Retry & Fallback Core
When an upstream returns a 429, a 503, or a content-filter rejection, the core retries on a backoff curve and then cascades to a sibling deployment in a different region. Reasoning models with hidden chain-of-thought consume token budget before emitting any text, so the core enforces a max_tokens floor of 2048.
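A rough sketch of that retry-then-cascade loop is shown below. The source does not specify the backoff curve, so exponential backoff with full jitter is an assumption here, as are the UpstreamError type, the send callback, and the choice to apply the 2048 floor to every request rather than only to reasoning models.

```python
import random
import time

RETRYABLE_STATUSES = {429, 503}
MIN_MAX_TOKENS = 2048  # floor stated in the recipe


class UpstreamError(Exception):
    """Hypothetical error raised by the send callback on upstream failure."""

    def __init__(self, status, content_filtered=False):
        super().__init__(f"upstream failed with status {status}")
        self.status = status
        self.content_filtered = content_filtered


def backoff_delays(attempts, base=0.5, cap=8.0):
    """Assumed curve: exponential backoff with full jitter, capped at `cap` seconds."""
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]


def call_with_fallback(request, deployments, send, attempts=3, sleep=time.sleep):
    """Retry each deployment with backoff, then cascade to the next sibling."""
    if not deployments:
        raise ValueError("no deployments to try")
    # Raise max_tokens to the floor so hidden reasoning cannot starve the output.
    floor = max(request.get("max_tokens") or 0, MIN_MAX_TOKENS)
    request = {**request, "max_tokens": floor}
    last_error = None
    for deployment in deployments:
        for delay in backoff_delays(attempts):
            try:
                return send(deployment, request)
            except UpstreamError as err:
                if err.status not in RETRYABLE_STATUSES and not err.content_filtered:
                    raise  # non-retryable failures surface immediately
                last_error = err
                sleep(delay)
    raise last_error
```

Passing sleep as a parameter keeps the loop testable without real delays. The deployments list is expected to be ordered so that siblings in other regions follow the primary.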
3. The Metering Tap
Every response is metered: prompt tokens, completion tokens, reasoning tokens, latency, and upstream cost. The tap writes to a usage stream that bills the customer with a configurable markup. By default, Meridian adds 20% on top of raw provider cost.
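As a worked example of the billing arithmetic, the sketch below treats the default 20% as a multiplier applied to raw provider cost. The UsageRecord fields mirror the metered quantities listed above, but the class, the billed_amount function, and the six-decimal rounding are illustrative assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal

DEFAULT_MARKUP = Decimal("0.20")  # 20% over raw provider cost, per the recipe


@dataclass(frozen=True)
class UsageRecord:
    prompt_tokens: int
    completion_tokens: int
    reasoning_tokens: int
    latency_ms: int
    upstream_cost: Decimal  # raw provider cost, in USD


def billed_amount(record: UsageRecord, markup: Decimal = DEFAULT_MARKUP) -> Decimal:
    """Return the customer charge: upstream cost plus the configured markup."""
    # Rounding to micro-dollars is an assumption made for this sketch.
    return (record.upstream_cost * (1 + markup)).quantize(Decimal("0.000001"))


record = UsageRecord(
    prompt_tokens=12,
    completion_tokens=40,
    reasoning_tokens=300,
    latency_ms=850,
    upstream_cost=Decimal("0.010000"),
)
print(billed_amount(record))  # prints 0.012000
```

Decimal is used instead of float so that per-request charges sum exactly when aggregated into an invoice.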
POST /v1/chat/completions
Authorization: Bearer sk-meridian-...
Content-Type: application/json

{
  "model": "azure/model-router",
  "max_tokens": 2048,
  "messages": [{"role": "user", "content": "Hello"}]
}