Async Runtime Design
Build a Meridian-grade async runtime that schedules thousands of LLM calls, retries on transient gateway failures, and surfaces backpressure to the caller without dropping work. This recipe walks through the executor, the queue, and the supervision tree.
1. Executor topology
Pick a single multi-threaded executor per process and pin worker count to min(cpus, 16). LLM calls are IO-bound, so threads spend most of their life parked on the network; oversubscribing past 16 buys nothing and burns context-switch overhead. Reserve a dedicated blocking pool for tokenization and compression.
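The worker-count rule above can be sketched with the standard library alone. The names `MAX_WORKERS`, `clamp_workers`, and `worker_count` are hypothetical and chosen for illustration; the fallback to one worker when the CPU count cannot be detected is an assumption, not something this recipe specifies.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Upper bound on executor workers, taken from the min(cpus, 16) rule.
const MAX_WORKERS: usize = 16;

/// Pure helper: clamp a detected CPU count to the worker cap.
fn clamp_workers(cpus: usize) -> usize {
    cpus.min(MAX_WORKERS)
}

/// Detect available parallelism and apply the cap.
/// Falls back to 1 if detection fails (an assumption for this sketch).
fn worker_count() -> usize {
    let cpus = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    clamp_workers(cpus)
}
```

If the runtime is built on Tokio (an assumption; the recipe does not name one), the result would typically be passed to `tokio::runtime::Builder::new_multi_thread().worker_threads(n)`, with the size of the blocking pool for tokenization and compression set separately through `max_blocking_threads`.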
2. Bounded queue + backpressure
Every job enters a bounded MPSC channel sized to roughly 4x the worker count. When the channel is full, the producer awaits capacity instead of buffering without bound. This is the only honest way to surface load to the caller. Couple it with a per-tenant semaphore so one customer cannot starve the others.
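A minimal sketch of the bounded queue and the per-tenant cap, using the standard library's synchronous `sync_channel` as a stand-in for an async bounded channel: where an async producer would await capacity, `send` here blocks and `try_send` reports `Full`. The `Job` struct, `TenantLimiter`, and every function name below are hypothetical, and the limiter is a plain counter standing in for a real semaphore.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical job type for illustration.
#[allow(dead_code)]
struct Job {
    tenant: String,
}

/// Bounded MPSC queue sized to 4x the worker count.
fn job_queue(workers: usize) -> (SyncSender<Job>, Receiver<Job>) {
    sync_channel(workers * 4)
}

/// Per-tenant cap on in-flight jobs (a counter standing in for a semaphore).
struct TenantLimiter {
    max_in_flight: usize,
    in_flight: HashMap<String, usize>,
}

impl TenantLimiter {
    fn new(max_in_flight: usize) -> Self {
        Self { max_in_flight, in_flight: HashMap::new() }
    }

    /// Takes a slot and returns true if the tenant is under its cap.
    fn try_acquire(&mut self, tenant: &str) -> bool {
        let n = self.in_flight.entry(tenant.to_string()).or_insert(0);
        if *n < self.max_in_flight {
            *n += 1;
            true
        } else {
            false
        }
    }

    /// Gives a slot back when one of the tenant's jobs finishes.
    fn release(&mut self, tenant: &str) {
        if let Some(n) = self.in_flight.get_mut(tenant) {
            *n = n.saturating_sub(1);
        }
    }
}

/// Fills the queue with no consumer running and counts how many jobs are
/// accepted before the channel reports that it is full.
fn accepted_before_full(workers: usize) -> usize {
    let (tx, _rx) = job_queue(workers); // `_rx` keeps the receiver alive
    let mut accepted = 0;
    loop {
        let job = Job { tenant: "acme".to_string() };
        match tx.try_send(job) {
            Ok(()) => accepted += 1,
            Err(TrySendError::Full(_)) => return accepted,
            Err(TrySendError::Disconnected(_)) => unreachable!("receiver is alive"),
        }
    }
}
```

With two workers the queue holds eight jobs before `try_send` reports `Full`; that rejection is the backpressure signal the producer would otherwise experience as a pending send.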
3. Supervision and retry
Wrap each worker in a supervisor that restarts it on panic with exponential backoff. Classify errors at the boundary: 429 and 5xx responses retry with jittered backoff; every other 4xx response fails fast. The runtime owns the retry policy so call sites stay declarative.
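```rust
use std::time::Duration;
use tokio::time::sleep; // assumption: Tokio supplies the async timer

// `gateway`, `Job`, `Output`, `Error`, and `Result` come from the surrounding runtime.
async fn run(job: Job) -> Result<Output> {
    const MAX_ATTEMPTS: u32 = 5;
    let mut delay_ms: u64 = 250;
    for attempt in 1..=MAX_ATTEMPTS {
        match gateway.call(&job).await {
            Ok(out) => return Ok(out),
            Err(e) if e.retryable() => {
                if attempt == MAX_ATTEMPTS {
                    break; // budget spent: skip the pointless final sleep
                }
                // Jitter spreads retries out; assumes the `rand` crate.
                let jitter_ms = rand::random::<u64>() % (delay_ms / 2 + 1);
                sleep(Duration::from_millis(delay_ms + jitter_ms)).await;
                delay_ms = (delay_ms * 2).min(8_000); // exponential backoff, capped at 8 s
            }
            Err(e) => return Err(e), // non-retryable: fail fast
        }
    }
    Err(Error::Exhausted)
}
```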
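The classification and supervision rules in this section can be sketched as follows. This is an illustration under stated assumptions rather than the runtime's actual code: `Disposition`, `classify`, `backoff_ms`, and `supervise` are hypothetical names, OS threads stand in for async worker tasks, and the 8-second cap is borrowed from the retry loop.

```rust
use std::thread;
use std::time::Duration;

/// What the runtime should do with a failed gateway call.
#[derive(Debug, PartialEq, Eq)]
enum Disposition {
    Retry,
    FailFast,
}

/// 429 and 5xx are treated as transient; every other error status fails fast.
/// `status` is assumed to be the HTTP status of a failed call.
fn classify(status: u16) -> Disposition {
    match status {
        429 | 500..=599 => Disposition::Retry,
        _ => Disposition::FailFast,
    }
}

/// Exponential backoff: base * 2^attempt, capped at `cap_ms`.
fn backoff_ms(attempt: u32, base_ms: u64, cap_ms: u64) -> u64 {
    base_ms.saturating_mul(2u64.saturating_pow(attempt)).min(cap_ms)
}

/// Runs `work` on its own thread and restarts it after a panic, sleeping with
/// exponential backoff between restarts. `work` receives the restart index.
/// Returns the number of restarts needed, or Err once the budget is spent.
fn supervise<F>(work: F, max_restarts: u32, base_ms: u64) -> Result<u32, &'static str>
where
    F: Fn(u32) + Send + Clone + 'static,
{
    for restart in 0..=max_restarts {
        let w = work.clone();
        // `join` returns Err if the worker thread panicked.
        if thread::spawn(move || w(restart)).join().is_ok() {
            return Ok(restart);
        }
        if restart < max_restarts {
            thread::sleep(Duration::from_millis(backoff_ms(restart, base_ms, 8_000)));
        }
    }
    Err("restart budget exhausted")
}
```

Keeping `classify` as a pure function at the gateway boundary lets the retry loop depend only on the resulting `Disposition`, which is one way to keep call sites declarative.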