Documentation

Performance hub

Everything you need to build fast, resilient integrations on the Meridian inference platform. Explore latency profiles, concurrency models, caching strategies, and failover patterns.

Latency Guide

Understand round-trip timing, cold-start profiles, and how to measure end-to-end latency across regions.

Read guide

Retries & Backoff

Implement exponential backoff with jitter, circuit breakers, and token buckets for resilient upstream calls.

Read guide
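
As a rough illustration of exponential backoff with jitter, here is a minimal sketch using the "full jitter" variant, where each delay is drawn uniformly between zero and a capped exponential ceiling. The function name `call_with_retries`, its parameters, and the choice of retryable exceptions are assumptions made for this example; they are not part of any Meridian SDK.

```python
import random
import time


def call_with_retries(fn, max_retries=5, base=0.5, cap=30.0,
                      sleep=time.sleep,
                      retry_on=(ConnectionError, TimeoutError)):
    """Call fn(), retrying on transient errors with full-jitter backoff.

    Hypothetical helper: names and defaults are illustrative only.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            # Out of retries: surface the last error to the caller.
            if attempt == max_retries:
                raise
            # Ceiling doubles each attempt (base, 2*base, 4*base, ...) up to cap.
            ceiling = min(cap, base * 2 ** attempt)
            # Full jitter: sleep a random duration in [0, ceiling].
            sleep(random.uniform(0.0, ceiling))
```

Randomizing the whole delay, rather than adding a small offset to a fixed delay, spreads retries from many clients over time so they are less likely to hit the upstream in synchronized waves.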

Concurrency Best Practices

Tune worker pools, connection limits, and semaphore patterns to maximize throughput without saturation.

Read guide
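
The semaphore pattern can be sketched in a few lines: a fixed number of permits bounds how many calls are in flight at once. The helper name `gather_bounded` and the default limit are assumptions for illustration, and the right limit for a real integration depends on your own quotas and measurements.

```python
import asyncio


async def gather_bounded(tasks, limit=8):
    """Run zero-argument coroutine functions with at most `limit` in flight.

    Hypothetical helper: not part of any Meridian client library.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(task):
        # Each call must acquire a permit before it starts its work.
        async with semaphore:
            return await task()

    # gather() returns results in the same order as the input tasks.
    return await asyncio.gather(*(guarded(task) for task in tasks))
```

A caller would pass a list of coroutine functions, for example `asyncio.run(gather_bounded([fetch_a, fetch_b], limit=2))`, where `fetch_a` and `fetch_b` are placeholders for your own request functions.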

Response Caching

Cache deterministic responses at the edge using stale-while-revalidate, ETag validation, and cache-tag invalidation.

Read guide

Batch Inference

Coalesce requests into batched inference calls to reduce per-token overhead and improve GPU utilization.

Read guide

Streaming Deep Dive

Use Server-Sent Events (SSE) and chunked transfer encoding for progressive delivery of tokens, audio, and structured output.

Read guide

Failover Behavior

Configure regional failover, health-check thresholds, and graceful degradation when primaries are unreachable.

Read guide

Not sure where to start?

Begin with the Latency Guide to baseline your integration, then layer on retries, caching, and streaming as your traffic grows.

Start with the Latency Guide