Architecture

Queue system design

How Meridian processes recipe generation requests at scale with fairness, retries, and observability.

Overview

Every recipe generation request enters a distributed queue backed by Upstash QStash. This decouples the API layer from GPU inference workers, absorbs traffic spikes, and guarantees at-least-once delivery.

Queue topology

Client → API (Vercel Edge)
   → QStash (persist + dedupe)
     → Worker pool (GPU inference)
       → Upstash KV (result cache)
         → Client poll / webhook

The edge API enqueues a message and returns a job ID immediately. QStash delivers each job to a worker, which runs the model and writes the result to KV with a 24-hour TTL. Clients poll the status endpoint or receive a webhook callback.
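
The request flow can be sketched with an in-memory stand-in. This is a rough illustration only, not Meridian's code: the class and method names (InMemoryPipeline, enqueue, work_once, status) are hypothetical, a deque stands in for QStash, and a dict stands in for KV. Only the 24-hour TTL comes from the text above.

```python
import time
import uuid
from collections import deque

RESULT_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL from the text


class InMemoryPipeline:
    """Toy stand-in for the edge API, the queue, one worker, and the result cache."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.queue = deque()  # stands in for QStash (hypothetical simplification)
        self.kv = {}          # job_id -> (expires_at, result); stands in for KV

    def enqueue(self, prompt: str) -> str:
        """Edge API: store the message and return a job ID immediately."""
        job_id = str(uuid.uuid4())
        self.queue.append((job_id, prompt))
        return job_id

    def work_once(self, model) -> bool:
        """Worker: take one job, run the model, cache the result with a TTL."""
        if not self.queue:
            return False
        job_id, prompt = self.queue.popleft()
        self.kv[job_id] = (self.clock() + RESULT_TTL_SECONDS, model(prompt))
        return True

    def status(self, job_id: str) -> dict:
        """Status endpoint: clients poll this until a result appears.

        Simplification: unknown job IDs are also reported as pending.
        """
        entry = self.kv.get(job_id)
        if entry is None:
            return {"state": "pending"}
        expires_at, result = entry
        if self.clock() >= expires_at:
            del self.kv[job_id]
            return {"state": "expired"}
        return {"state": "done", "result": result}
```

A client would call enqueue once, then poll status with the returned job ID until the state is no longer pending. A webhook variant would have the worker push the result instead.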

Fair scheduling

Each user has a per-minute token bucket: 3 requests per minute (RPM) on the free tier and 30 RPM on the Pro tier. Requests over the limit receive HTTP 429 with a Retry-After header. Within a user's bucket, jobs are processed in FIFO order.
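
A per-user token bucket with these limits might look like the sketch below. It assumes a continuously refilling bucket whose capacity equals the tier's RPM, and it returns HTTP 202 for accepted requests; both are assumptions, as are the names TokenBucket and try_acquire. Only the 3 and 30 RPM limits, the 429 status, and the Retry-After header come from the text.

```python
import math
import time

TIER_RPM = {"free": 3, "pro": 30}  # limits from the text


class TokenBucket:
    """One bucket per user; capacity and refill rate are derived from the tier's RPM."""

    def __init__(self, rpm: int, clock=time.monotonic):
        self.capacity = float(rpm)
        self.refill_per_second = rpm / 60.0
        self.clock = clock
        self.tokens = float(rpm)  # start full (assumption)
        self.updated_at = clock()

    def try_acquire(self):
        """Return (status, headers): 202 if a token was taken, else 429 with Retry-After."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 202, {}
        # Seconds until one whole token is available, rounded up for the header.
        wait = (1.0 - self.tokens) / self.refill_per_second
        return 429, {"Retry-After": str(math.ceil(wait))}
```

With the free-tier limit, a burst of three requests is accepted and the fourth is rejected until roughly 20 seconds have passed, since one token refills every 60 / 3 seconds.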

Workers do not run jobs concurrently: each worker processes one job at a time to avoid GPU memory contention. Horizontal scaling adds workers; QStash distributes messages round-robin.

Retry policy

Max 3 attempts with exponential backoff: 1s, 4s, 16s.
Jitter of ±25% on each delay prevents a thundering herd during retry storms.
Permanent failures (invalid prompt, banned content) go to a dead-letter log and are not retried.
Transient failures (GPU OOM, timeout) trigger retry with backoff.
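
The retry rules above can be sketched as follows. The sketch reads the three listed delays as the waits before three retries that follow an initial attempt; if "max 3 attempts" is meant to include the first try, the last delay would go unused. The exception classes, function names, and the list used as a dead-letter log are hypothetical.

```python
import random

BASE_DELAYS_SECONDS = (1.0, 4.0, 16.0)  # backoff schedule from the text
JITTER_FRACTION = 0.25                  # ±25% jitter from the text


class PermanentError(Exception):
    """Invalid prompt or banned content: never retried."""


class TransientError(Exception):
    """GPU OOM or timeout: retried with backoff."""


def backoff_delay(retry_index: int, rng=random) -> float:
    """Base delay for this retry, scaled by a random factor in [0.75, 1.25]."""
    base = BASE_DELAYS_SECONDS[retry_index]
    return base * (1.0 + rng.uniform(-JITTER_FRACTION, JITTER_FRACTION))


def run_with_retries(job, sleep, dead_letter: list, rng=random):
    """Run job(); retry transient failures, dead-letter permanent ones.

    Assumption: one initial attempt plus one retry per listed delay.
    """
    max_retries = len(BASE_DELAYS_SECONDS)
    for retry_index in range(max_retries + 1):
        try:
            return job()
        except PermanentError as exc:
            dead_letter.append(str(exc))  # logged, not retried
            return None
        except TransientError:
            if retry_index == max_retries:
                raise  # retries exhausted
            sleep(backoff_delay(retry_index, rng))
```

Passing sleep as a parameter keeps the sketch testable; in a worker it would typically be time.sleep or an async equivalent.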

Idempotency

Every request carries an idempotency key (UUID v7). QStash deduplicates messages with the same key within a 1-hour window. Duplicate submissions return the cached result from KV immediately.
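
The deduplication behavior can be illustrated with a small in-memory model. It is not the QStash API: the class IdempotentSubmitter and its methods are hypothetical, dicts stand in for QStash's dedupe state and for KV, and the result cache's own TTL is ignored. Keys are passed in as strings, so the sketch does not depend on how the UUID v7 is generated.

```python
import time

DEDUPE_WINDOW_SECONDS = 60 * 60  # 1-hour window from the text


class IdempotentSubmitter:
    """Models duplicate handling for submissions that share an idempotency key."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.seen = {}      # key -> time the key was last enqueued
        self.results = {}   # key -> cached result (stands in for KV)
        self.enqueued = []  # keys actually forwarded to the queue

    def submit(self, key: str) -> dict:
        # A cached result is returned immediately, without enqueuing again.
        if key in self.results:
            return {"state": "done", "result": self.results[key]}
        now = self.clock()
        first_seen = self.seen.get(key)
        if first_seen is not None and now - first_seen < DEDUPE_WINDOW_SECONDS:
            return {"state": "pending"}  # duplicate inside the window: dropped
        self.seen[key] = now
        self.enqueued.append(key)
        return {"state": "enqueued"}

    def complete(self, key: str, result) -> None:
        """Called when a worker finishes the job for this key."""
        self.results[key] = result
```

In this model a client that retries a timed-out submission with the same key never triggers a second inference run while the original job is still inside the window.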

Observability

Queue depth, processing latency (p50/p95/p99), and error rates are exported as Prometheus metrics from each worker. A Grafana dashboard surfaces queue health. Alerts fire when p99 latency exceeds 30s or the dead-letter log grows.
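
To make the latency percentiles and the alert condition concrete, here is a standard-library sketch. It uses the nearest-rank method over raw samples, which is an assumption; a Prometheus setup would more likely estimate quantiles from histogram buckets. The function names are hypothetical, and the 30-second threshold comes from the text.

```python
P99_ALERT_THRESHOLD_SECONDS = 30.0  # alert threshold from the text


def percentile(samples, pct: int) -> float:
    """Nearest-rank percentile of a non-empty list; pct is an integer in 1..100."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def latency_summary(samples) -> dict:
    """The three percentiles named in the text, keyed as p50/p95/p99."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}


def should_alert(samples, dead_letter_before: int, dead_letter_now: int) -> bool:
    """True if p99 latency exceeds the threshold or the dead-letter count grew."""
    p99_too_high = percentile(samples, 99) > P99_ALERT_THRESHOLD_SECONDS
    dead_letter_grew = dead_letter_now > dead_letter_before
    return p99_too_high or dead_letter_grew
```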

Circuit breaker

If the worker pool error rate exceeds 20% over a 60-second window, the API layer stops enqueuing new jobs and returns HTTP 503. The breaker half-opens after 30 seconds, allowing one probe request. Success resets the breaker; failure re-opens it.
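
The breaker's state machine can be sketched as below. The thresholds come from the text; the class and method names are hypothetical, and MIN_SAMPLES is an added assumption so that a single early failure does not trip the breaker. The sketch also assumes that any outcome recorded while half-open belongs to the probe request.

```python
import time
from collections import deque

ERROR_RATE_THRESHOLD = 0.20  # from the text
WINDOW_SECONDS = 60.0        # from the text
OPEN_SECONDS = 30.0          # from the text
MIN_SAMPLES = 10             # assumption, not in the text


class CircuitBreaker:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.outcomes = deque()  # (timestamp, succeeded) pairs inside the window
        self.state = "closed"
        self.opened_at = 0.0
        self.probe_in_flight = False

    def allow_request(self) -> bool:
        """True if the API may enqueue; False means respond with HTTP 503."""
        now = self.clock()
        if self.state == "open" and now - self.opened_at >= OPEN_SECONDS:
            self.state = "half_open"
            self.probe_in_flight = False
        if self.state == "closed":
            return True
        if self.state == "half_open" and not self.probe_in_flight:
            self.probe_in_flight = True  # exactly one probe is let through
            return True
        return False

    def record(self, succeeded: bool) -> None:
        """Report the outcome of a job processed by the worker pool."""
        now = self.clock()
        if self.state == "half_open":
            if succeeded:
                self.state = "closed"  # probe succeeded: reset
                self.outcomes.clear()
            else:
                self._open(now)        # probe failed: re-open
            return
        if self.state == "open":
            return  # ignore late results while open
        self.outcomes.append((now, succeeded))
        while self.outcomes and now - self.outcomes[0][0] > WINDOW_SECONDS:
            self.outcomes.popleft()
        failures = sum(1 for _, ok in self.outcomes if not ok)
        enough = len(self.outcomes) >= MIN_SAMPLES
        if enough and failures / len(self.outcomes) > ERROR_RATE_THRESHOLD:
            self._open(now)

    def _open(self, now: float) -> None:
        self.state = "open"
        self.opened_at = now
        self.outcomes.clear()
```

The API layer would call allow_request before enqueuing and return HTTP 503 when it is False; workers, or whatever aggregates their results, would call record after each job.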
