Queue system design
How Meridian processes recipe generation requests at scale with fairness, retries, and observability.
Overview
Every recipe generation request enters a distributed queue backed by Upstash QStash. This decouples the API layer from GPU inference workers, absorbs traffic spikes, and guarantees at-least-once delivery.
Queue topology
Client → API (Vercel Edge)
→ QStash (persist + dedupe)
→ Worker pool (GPU inference)
→ Upstash KV (result cache)
→ Client poll / webhook
The edge API enqueues a message and returns a job ID immediately. Workers pull jobs, run the model, and write results to KV with a 24-hour TTL. Clients poll the status endpoint or receive a webhook callback.
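The enqueue, work, and poll flow can be sketched in memory. This is an illustration only: `JobStore`, its method names, and the status values are hypothetical, a list and a dict stand in for QStash and Upstash KV, and a random UUID is used as the job ID.

```python
import uuid

RESULT_TTL_S = 24 * 60 * 60  # 24-hour result TTL from the text


class JobStore:
    """In-memory stand-in for the queue plus the result cache (hypothetical)."""

    def __init__(self):
        self.pending = []   # stand-in for the QStash queue
        self.results = {}   # stand-in for KV: job_id -> (result, expires_at)

    def enqueue(self, prompt):
        # The API layer returns a job ID immediately, without waiting for inference.
        job_id = str(uuid.uuid4())
        self.pending.append((job_id, prompt))
        return job_id

    def work_one(self, run_model, now):
        # A worker takes the oldest job, runs the model, and stores the result with a TTL.
        job_id, prompt = self.pending.pop(0)
        self.results[job_id] = (run_model(prompt), now + RESULT_TTL_S)

    def status(self, job_id, now):
        # What a polling client would see for this job.
        if any(jid == job_id for jid, _ in self.pending):
            return {"status": "pending"}
        entry = self.results.get(job_id)
        if entry is not None and now < entry[1]:
            return {"status": "done", "result": entry[0]}
        return {"status": "unknown"}  # never enqueued, or the result has expired
```

Passing `now` explicitly keeps the sketch deterministic; a real store would rely on the TTL support of the cache itself.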
Fair scheduling
Each user has a token bucket that refills per minute: 3 requests per minute (RPM) on the free tier and 30 RPM on the Pro tier. Requests over the limit receive HTTP 429 with a Retry-After header. Jobs admitted within a user's limit are processed in FIFO order.
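A per-user token bucket with these limits might look like the sketch below. It assumes the bucket capacity equals the tier's RPM and that tokens refill continuously; the class and function names are hypothetical, and the source does not say how Retry-After is computed.

```python
import math
from dataclasses import dataclass, field

# Tier limits from the text, in requests per minute.
TIER_RPM = {"free": 3, "pro": 30}


@dataclass
class TokenBucket:
    capacity: int                       # assumed: burst size equals the tier's RPM
    tokens: float = field(init=False)
    last_refill: float = 0.0

    def __post_init__(self):
        self.tokens = float(self.capacity)  # start with a full bucket

    def try_acquire(self, now: float):
        """Return (allowed, retry_after_seconds) for a request arriving at `now`."""
        elapsed = max(0.0, now - self.last_refill)
        # Refill at capacity tokens per 60 seconds, capped at capacity.
        self.tokens = min(float(self.capacity),
                          self.tokens + elapsed * self.capacity / 60.0)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0
        # Seconds until one full token is available; would feed the Retry-After header.
        wait = (1.0 - self.tokens) * 60.0 / self.capacity
        return False, math.ceil(wait)


def bucket_for(tier: str) -> TokenBucket:
    return TokenBucket(capacity=TIER_RPM[tier])
```

For example, a free-tier user who sends four requests at once gets three accepted and the fourth rejected, with a Retry-After value of 20 seconds under these assumptions.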
Workers do not run jobs concurrently: each worker processes one job at a time to avoid GPU memory contention. Capacity scales horizontally by adding workers, and QStash distributes messages across them round-robin.
Retry policy
- Max 3 attempts with exponential backoff: 1s, 4s, 16s.
- Jitter of ±25% on each delay keeps retries from synchronizing into a thundering herd during retry storms.
- Permanent failures (invalid prompt, banned content) go to a dead-letter log and are not retried.
- Transient failures (GPU OOM, timeout) trigger retry with backoff.
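The retry rules above can be sketched as follows. This reads the policy as up to three retries after the initial try, with delays of 1s, 4s, and 16s (a factor of 4 per retry); that reading, the exception classes, and the choice to re-raise once retries are exhausted are assumptions, not details from the source.

```python
import random
import time

BASE_DELAY_S = 1.0
BACKOFF_FACTOR = 4      # yields 1s, 4s, 16s
MAX_RETRIES = 3         # assumed reading of "max 3 attempts"
JITTER = 0.25           # ±25%


class PermanentError(Exception):
    """Hypothetical: invalid prompt, banned content."""


class TransientError(Exception):
    """Hypothetical: GPU OOM, timeout."""


def backoff_delay(retry: int, rng=random) -> float:
    """Delay in seconds before retry number `retry` (1-based), with jitter."""
    base = BASE_DELAY_S * BACKOFF_FACTOR ** (retry - 1)
    return base * rng.uniform(1 - JITTER, 1 + JITTER)


def run_with_retries(handler, job, sleep=time.sleep, dead_letter=None):
    dead_letter = dead_letter if dead_letter is not None else []
    for retry in range(MAX_RETRIES + 1):
        try:
            return handler(job)
        except PermanentError as exc:
            # Permanent failures are logged and never retried.
            dead_letter.append((job, str(exc)))
            return None
        except TransientError:
            if retry == MAX_RETRIES:
                raise  # assumption: surface the error once retries run out
            sleep(backoff_delay(retry + 1))
```

Injecting `sleep` and `dead_letter` keeps the sketch testable without real waiting.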
Idempotency
Every request carries an idempotency key (UUID v7). QStash deduplicates messages with the same key within a 1-hour window. Duplicate submissions return the cached result from KV immediately.
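As a rough illustration of the dedupe window, the sketch below tracks keys in memory. `IdempotentQueue` and its methods are hypothetical; in the system described, deduplication happens inside QStash and results live in KV. Plain strings stand in for UUID v7 keys.

```python
DEDUPE_WINDOW_S = 60 * 60  # 1-hour window from the text


class IdempotentQueue:
    """In-memory stand-in for queue-side dedupe plus the result cache (hypothetical)."""

    def __init__(self):
        self.first_seen = {}  # idempotency key -> time of first submission
        self.results = {}     # stand-in for KV: key -> result
        self.queue = []

    def submit(self, key, payload, now):
        seen_at = self.first_seen.get(key)
        if seen_at is not None and now - seen_at < DEDUPE_WINDOW_S:
            # Duplicate inside the window: do not enqueue again.
            # `result` is None if the original job has not finished yet.
            return {"status": "duplicate", "result": self.results.get(key)}
        self.first_seen[key] = now
        self.queue.append((key, payload))
        return {"status": "enqueued", "result": None}

    def complete(self, key, result):
        # Called when a worker finishes the job for this key.
        self.results[key] = result
```

How a duplicate is answered while the original job is still running is not specified in the source; the sketch simply returns no result in that case.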
Observability
Queue depth, processing latency (p50/p95/p99), and error rates are exported as Prometheus metrics from each worker. A Grafana dashboard surfaces queue health. Alerts fire on p99 latency exceeding 30s or dead-letter queue growth.
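To make the latency percentiles and alert conditions concrete, here is a small sketch using the nearest-rank method. In practice these values would come from Prometheus queries rather than application code; the function names and the way dead-letter growth is detected are assumptions.

```python
import math

P99_ALERT_THRESHOLD_S = 30.0  # alert threshold from the text


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples):
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}


def should_alert(samples, dead_letter_before, dead_letter_now):
    # Fire when p99 latency exceeds 30s or the dead-letter count has grown.
    return (percentile(samples, 99) > P99_ALERT_THRESHOLD_S
            or dead_letter_now > dead_letter_before)
```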
Circuit breaker
If the worker pool error rate exceeds 20% over a 60-second window, the API layer stops enqueuing new jobs and returns HTTP 503. The breaker half-opens after 30 seconds, allowing one probe request. Success resets the breaker; failure re-opens it.
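The breaker's three states can be sketched as a small state machine. The thresholds come from the text; the class name, the `min_samples` guard (which stops a single early failure from tripping the breaker), and the choice to ignore results that arrive while the breaker is open are assumptions.

```python
from collections import deque


class CircuitBreaker:
    def __init__(self, threshold=0.20, window_s=60.0, cooldown_s=30.0, min_samples=5):
        self.threshold = threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.min_samples = min_samples   # hypothetical guard, not in the source
        self.state = "closed"
        self.events = deque()            # (timestamp, succeeded)
        self.opened_at = 0.0
        self.probe_in_flight = False

    def allow(self, now):
        """Should the API enqueue a new job? False maps to HTTP 503."""
        if self.state == "open":
            if now - self.opened_at < self.cooldown_s:
                return False
            self.state = "half_open"
            self.probe_in_flight = False
        if self.state == "half_open":
            if self.probe_in_flight:
                return False
            self.probe_in_flight = True  # exactly one probe request
            return True
        return True

    def record(self, succeeded, now):
        """Report the outcome of a job."""
        if self.state == "open":
            return  # assumption: late results do not affect an open breaker
        if self.state == "half_open":
            self.probe_in_flight = False
            if succeeded:
                self.state = "closed"
                self.events.clear()
            else:
                self.state = "open"
                self.opened_at = now
            return
        self.events.append((now, succeeded))
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        failures = sum(1 for _, ok in self.events if not ok)
        if (len(self.events) >= self.min_samples
                and failures / len(self.events) > self.threshold):
            self.state = "open"
            self.opened_at = now
```

The API layer would call `allow` before enqueuing and `record` when a worker reports an outcome.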
Next: learn how Meridian handles prompt engineering for optimal recipe generation.