RECIPE

Tenant isolation strategy

Meridian routes inference for hundreds of customer tenants through a single gateway pool. This recipe shows how to enforce tenant isolation at the API key, quota, and audit layers so one noisy tenant never degrades another and every token is attributable.

1. Issue a tenant-scoped API key

Every tenant gets its own key, prefixed with its tenant ID. The gateway validates the prefix on every request and binds the trace to that tenant. Never share keys across tenants, and never embed a master key in client code.

curl https://llm.getnimbus.net/v1/keys \
  -H "Authorization: Bearer $MERIDIAN_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "acme_corp",
    "name": "acme-prod-key",
    "monthly_budget_usd": 500,
    "rpm_limit": 120,
    "allowed_models": ["azure/model-router", "azure-swc/gpt-4.1"]
  }'
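
The prefix check described above can be sketched in Python. This is a minimal illustration, not Meridian's actual implementation: the key layout `<tenant_id>.<secret>`, the `hash_key` and `tenant_from_key` helpers, and the in-memory `key_hashes` store are all assumptions made for the example.

```python
import hashlib
import hmac
from typing import Dict, Optional


def hash_key(api_key: str) -> str:
    """Return the SHA-256 hex digest stored in place of the raw key."""
    return hashlib.sha256(api_key.encode()).hexdigest()


def tenant_from_key(api_key: str, key_hashes: Dict[str, str]) -> Optional[str]:
    """Return the tenant bound to api_key, or None if the key is invalid.

    Assumes a hypothetical key layout of "<tenant_id>.<secret>" and a
    hypothetical store mapping tenant_id -> SHA-256 hash of the full key.
    """
    tenant_id, sep, _secret = api_key.partition(".")
    if not sep or not tenant_id:
        return None  # no tenant prefix present
    expected = key_hashes.get(tenant_id)
    if expected is None:
        return None  # prefix names an unknown tenant
    # Constant-time comparison avoids leaking how much of the hash matched.
    if not hmac.compare_digest(hash_key(api_key), expected):
        return None
    return tenant_id
```

Because the tenant is derived from the key itself rather than from a request field, a caller cannot claim another tenant's identity without holding that tenant's secret.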

2. Enforce per-tenant quotas

Quotas are enforced in the gateway hot path before the upstream call. Each tenant has an independent budget bucket, RPM bucket, and TPM bucket. When a bucket drains, that tenant receives an HTTP 429 (Too Many Requests) response while every other tenant continues uninterrupted.

Budget is decremented when the request completes, based on the actual upstream cost.
RPM is a sliding window keyed on tenant_id.
TPM counts both prompt and completion tokens, including reasoning tokens.
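
The bucket rules above can be sketched as follows. This is an illustrative, single-process sketch under stated assumptions: the class names `SlidingWindowLimiter` and `BudgetBucket` and the `admit` function are hypothetical, and a real gateway pool would need shared state across instances, which is not shown.

```python
import time
from collections import defaultdict, deque
from typing import Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per tenant in any `window_s`-second window."""

    def __init__(self, limit: int, window_s: float = 60.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits = defaultdict(deque)  # tenant_id -> timestamps of admitted requests

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[tenant_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


class BudgetBucket:
    """Monthly budget for one tenant, charged after each request completes."""

    def __init__(self, monthly_budget_usd: float) -> None:
        self.remaining_usd = monthly_budget_usd

    def has_funds(self) -> bool:
        return self.remaining_usd > 0

    def charge(self, cost_usd: float) -> None:
        self.remaining_usd -= cost_usd


def admit(tenant_id: str, rpm: SlidingWindowLimiter,
          budgets: Dict[str, BudgetBucket], now: Optional[float] = None) -> int:
    """Return 200 if the request may proceed upstream, 429 if a bucket is drained."""
    if not budgets[tenant_id].has_funds():
        return 429
    if not rpm.allow(tenant_id, now):
        return 429
    return 200
```

Since every structure is keyed on tenant_id, draining one tenant's bucket has no effect on the others. A TPM bucket could follow the same pattern with token counts in place of request counts; how tokens that are only known after completion get accounted for is not specified in this recipe.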

3. Audit every request

Each call writes one row to the audit log with the tenant ID, model alias, upstream provider, token counts, cost in USD, and latency. The admin console reads this table to render cost dashboards and to flag tenants whose 95th-percentile latency exceeds the SLO.
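
As a rough sketch of the latency flagging described above, the snippet below models an audit row and flags tenants by p95 latency. The `AuditRow` field names, the nearest-rank percentile method, and the `tenants_over_slo` helper are assumptions for illustration; the actual table schema and the console's percentile method are not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class AuditRow:
    # Hypothetical column names mirroring the fields listed in the text.
    tenant_id: str
    model_alias: str
    upstream_provider: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    latency_ms: float


def p95(values: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not values:
        raise ValueError("p95 of an empty sequence is undefined")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def tenants_over_slo(rows: Iterable[AuditRow], slo_ms: float) -> List[str]:
    """Return the sorted tenant IDs whose p95 latency exceeds slo_ms."""
    latencies = defaultdict(list)
    for row in rows:
        latencies[row.tenant_id].append(row.latency_ms)
    return sorted(t for t, ms in latencies.items() if p95(ms) > slo_ms)
```

Using p95 rather than the mean keeps a single slow request from flagging a tenant, while a sustained tail of slow requests still shows up.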

Pair audit rows with the X-Request-Id header so a customer support ticket can be traced from a single failed completion all the way back to the upstream Azure region.
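
A support lookup along these lines might look like the sketch below. It assumes each audit row is a dict carrying hypothetical `request_id` and `upstream_region` columns, and that the ID is a random UUID; none of these details are confirmed by this recipe.

```python
import uuid
from typing import Dict, Iterable, Optional


def new_request_id() -> str:
    """Generate an ID to send as X-Request-Id (assumed format: UUID4 hex)."""
    return uuid.uuid4().hex


def find_audit_row(audit_rows: Iterable[Dict[str, object]],
                   request_id: str) -> Optional[Dict[str, object]]:
    """Return the audit row recorded for request_id, or None if absent."""
    for row in audit_rows:
        if row.get("request_id") == request_id:
            return row
    return None


# Usage: the ID a customer quotes in a ticket leads to the tenant and region.
rid = new_request_id()
audit_log = [
    {"request_id": rid, "tenant_id": "acme_corp", "upstream_region": "example-region"},
]
match = find_audit_row(audit_log, rid)
```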