Performance hub
Everything you need to build fast, resilient integrations on the Meridian inference platform. Explore latency profiles, concurrency models, caching strategies, and failover patterns.
Latency Guide
Understand round-trip timing, cold-start profiles, and how to measure end-to-end latency across regions.
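Here is a minimal sketch of baselining latency. It assumes you can wrap one request in a zero-argument callable. The `call` argument stands in for whatever client call you make to Meridian, and `measure_latency_ms` and `percentile` are illustrative helpers, not part of any SDK. Each call is timed with a monotonic clock, and the helpers report nearest-rank percentiles, which show tail behavior that an average hides.

```python
import math
import time
from typing import Callable, List, Sequence


def measure_latency_ms(call: Callable[[], object], samples: int = 20) -> List[float]:
    """Time `call` end to end, returning one wall-clock duration (ms) per sample."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()  # monotonic, high-resolution clock
        call()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]
```

The first few samples may include connection setup or cold-start cost. Report them separately if you want to keep cold-start and steady-state latency apart.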
Retries & Backoff
Implement exponential backoff with jitter, circuit breakers, and token buckets for resilient upstream calls.
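Here is a minimal sketch of exponential backoff with full jitter, covering only the backoff piece. Circuit breakers and token buckets would wrap around it. `RetryableError`, `backoff_delay`, and `call_with_retries` are hypothetical names. Which failures count as retryable, and the `base` and `cap` values, are assumptions you should tune for your own traffic.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for transient failures (e.g. HTTP 429 or 503)."""


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Full jitter: a uniform draw from [0, min(cap, base * 2**attempt)] seconds."""
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    ceiling = min(cap, base * (2 ** min(attempt, 32)))
    return random.uniform(0.0, ceiling)


def call_with_retries(fn: Callable[[], T], max_attempts: int = 5,
                      base: float = 0.1, cap: float = 10.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying RetryableError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        try:
            return fn()
        except RetryableError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last transient error
            sleep(backoff_delay(attempt - 1, base, cap))
```

Full jitter draws each delay uniformly between zero and the exponential ceiling. When many clients fail at the same moment, this spreads their retries out so they do not hit the upstream again in lockstep.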
Concurrency Best Practices
Tune worker pools, connection limits, and semaphore patterns to maximize throughput without saturating connections or upstream capacity.
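One common semaphore pattern is sketched below with Python's `asyncio`. It assumes each upstream request is an awaitable you can create on demand. `gather_bounded` is a hypothetical helper, not a Meridian SDK function. An `asyncio.Semaphore` caps how many requests are in flight at once, so a burst of work waits locally instead of opening an unbounded number of connections.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def gather_bounded(factories: Iterable[Callable[[], Awaitable[T]]],
                         limit: int) -> List[T]:
    """Run the awaitables with at most `limit` in flight; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # waits here once `limit` calls are in flight
            return await factory()

    return list(await asyncio.gather(*(run(f) for f in factories)))
```

Keep `limit` at or below your HTTP client's connection-pool size and any concurrency limit your plan enforces, if one applies. The best value is one you measure under load.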
Response Caching
Cache deterministic responses at the edge with stale-while-revalidate, ETag, and cache-tag invalidation.
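The sketch below shows the freshness decision a cache makes under `Cache-Control: max-age=N, stale-while-revalidate=M` (RFC 5861), with ETag revalidation once the stale window has passed. `CachedResponse`, `Freshness`, `classify`, and `conditional_headers` are illustrative names. The sketch assumes responses are deterministic for a given request and that the upstream sends an `ETag` header.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Freshness(Enum):
    FRESH = "fresh"      # serve from cache without contacting upstream
    STALE = "stale"      # serve cached copy now, revalidate in the background
    EXPIRED = "expired"  # revalidate (conditional request) before serving


@dataclass
class CachedResponse:
    body: bytes
    etag: str                      # opaque validator from the ETag response header
    stored_at: float               # seconds, same clock as `now` below
    max_age: float                 # Cache-Control: max-age
    stale_while_revalidate: float  # Cache-Control: stale-while-revalidate


def classify(entry: CachedResponse, now: float) -> Freshness:
    age = now - entry.stored_at
    if age < entry.max_age:
        return Freshness.FRESH
    if age <= entry.max_age + entry.stale_while_revalidate:
        return Freshness.STALE
    return Freshness.EXPIRED


def conditional_headers(entry: CachedResponse) -> Dict[str, str]:
    # A 304 Not Modified reply means the cached body is still valid.
    return {"If-None-Match": entry.etag}
```

If revalidation returns `304 Not Modified`, keep the cached body and reset `stored_at`. If it returns `200`, replace the entry.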
Batch Inference
Coalesce requests into batched inference calls to reduce per-token overhead and improve GPU utilization.
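Here is a minimal micro-batching sketch in Python `asyncio`. It assumes the upstream accepts a list of inputs and returns results in the same order; `batch_fn` stands in for that call, and `MicroBatcher` is a hypothetical class, not a Meridian API. Each caller awaits its own result while a background task groups queued requests. After the first item of a batch arrives, the task waits `max_wait` seconds, then sends up to `max_batch_size` queued items in one call.

```python
import asyncio
from typing import Awaitable, Callable, Generic, List, Optional, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


class MicroBatcher(Generic[In, Out]):
    """Coalesce individual submit() calls into batched calls to `batch_fn`."""

    def __init__(self, batch_fn: Callable[[List[In]], Awaitable[List[Out]]],
                 max_batch_size: int = 8, max_wait: float = 0.005) -> None:
        self._batch_fn = batch_fn
        self._max_batch_size = max_batch_size
        self._max_wait = max_wait
        self._queue: Optional[asyncio.Queue] = None
        self._worker: Optional[asyncio.Task] = None  # reference keeps the task alive

    async def submit(self, item: In) -> Out:
        if self._queue is None:
            # Created lazily so the queue and worker bind to the running loop.
            self._queue = asyncio.Queue()
            self._worker = asyncio.create_task(self._run(self._queue))
        future = asyncio.get_running_loop().create_future()
        self._queue.put_nowait((item, future))  # unbounded queue: never blocks
        return await future

    async def _run(self, queue: asyncio.Queue) -> None:
        while True:
            batch = [await queue.get()]
            await asyncio.sleep(self._max_wait)  # let concurrent requests pile up
            while len(batch) < self._max_batch_size and not queue.empty():
                batch.append(queue.get_nowait())
            futures = [fut for _, fut in batch]
            try:
                results = await self._batch_fn([item for item, _ in batch])
                if len(results) != len(batch):
                    raise ValueError("batch_fn must return one result per input")
                for fut, result in zip(futures, results):
                    if not fut.done():  # skip callers that were cancelled
                        fut.set_result(result)
            except Exception as exc:
                # Fail every caller in the batch rather than leaving them waiting.
                for fut in futures:
                    if not fut.done():
                        fut.set_exception(exc)
```

The fixed wait adds a few milliseconds of latency per request in exchange for fewer, larger calls. A common refinement flushes as soon as the batch is full or the deadline passes, whichever comes first.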
Streaming Deep Dive
Use Server-Sent Events (SSE) and chunked transfer encoding for progressive delivery of tokens, audio, and structured output.
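The sketch below covers the client side of SSE. It turns decoded lines from a streaming response into events, following the event-stream rules in the HTML Living Standard: `data:` lines accumulate and are joined with newlines, a blank line dispatches the event, and lines starting with `:` are comments. `SSEEvent` and `parse_sse` are illustrative names. How you obtain the lines depends on your HTTP client, and the sketch does not assume any particular event names from Meridian.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class SSEEvent:
    event: str           # "message" when the server sends no `event:` field
    data: str
    id: Optional[str]    # last event ID seen so far (persists across events)


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Parse decoded SSE lines (without trailing newlines) into events."""
    event_type = ""
    data_lines: List[str] = []
    last_id: Optional[str] = None
    for line in lines:
        if line == "":
            # Blank line: dispatch the buffered event if it carried any data.
            if data_lines:
                yield SSEEvent(event_type or "message", "\n".join(data_lines), last_id)
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        elif field == "id":
            last_id = value
        # Other fields (e.g. "retry") are ignored in this sketch.
    # An event not terminated by a blank line is discarded, as the spec requires.
```

Because parsing is incremental, the caller can render each token as its event arrives rather than waiting for the full response.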
Failover Behavior
Configure regional failover, health-check thresholds, and graceful degradation when primaries are unreachable.
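Here is a sketch of threshold-based health tracking with ordered failover. `FailoverPool` and the region names in the test are hypothetical, and the thresholds are placeholder values. An endpoint is marked down after `unhealthy_threshold` consecutive failed checks. It is marked up again only after `healthy_threshold` consecutive successes, so a single flaky probe does not flip traffic back and forth.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Endpoint:
    name: str
    healthy: bool = True
    consecutive_failures: int = 0
    consecutive_successes: int = 0


class FailoverPool:
    """Ordered endpoints (primary first) with hysteresis on health transitions."""

    def __init__(self, names: List[str], unhealthy_threshold: int = 3,
                 healthy_threshold: int = 2) -> None:
        self.endpoints = [Endpoint(n) for n in names]
        self._by_name: Dict[str, Endpoint] = {e.name: e for e in self.endpoints}
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold

    def record(self, name: str, ok: bool) -> None:
        """Record one health-check result for endpoint `name`."""
        ep = self._by_name[name]
        if ok:
            ep.consecutive_successes += 1
            ep.consecutive_failures = 0
            if not ep.healthy and ep.consecutive_successes >= self.healthy_threshold:
                ep.healthy = True
        else:
            ep.consecutive_failures += 1
            ep.consecutive_successes = 0
            if ep.healthy and ep.consecutive_failures >= self.unhealthy_threshold:
                ep.healthy = False

    def select(self) -> Optional[str]:
        """Highest-priority healthy endpoint, or None if every endpoint is down."""
        for ep in self.endpoints:
            if ep.healthy:
                return ep.name
        return None
```

When `select()` returns `None`, no region is healthy. The caller should then fall back to its planned degraded behavior, such as serving cached results or returning a clear error.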
Not sure where to start?
Begin with the Latency Guide to baseline your integration, then layer on retries, caching, and streaming as your traffic grows.