Integration Guide
Modal Workers + Meridian
Offload asynchronous LLM inference jobs to Modal's serverless GPU infrastructure. Meridian handles queuing, retries, and result delivery, so your workers stay stateless and fast.
⚡
GPU Inference
Run Llama, Mistral, or custom models on A100/H100 GPUs with no cold-start tuning required.
📬
Async Queues
Meridian enqueues jobs; Modal workers pull and execute them. Results are delivered back via webhooks.
📈
Auto-scale
Modal scales containers to zero when idle. Pay only for GPU-seconds consumed.
Architecture
Meridian API → Redis Queue → Modal Worker → Webhook → Client
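The flow above can be sketched in a few lines of standard-library Python. This is an illustrative simulation only, not the Meridian or Modal API: an in-memory `queue.Queue` stands in for the Redis queue, `run_inference` is a placeholder for the GPU model call, and the `deliver` callback stands in for the webhook POST to the client. The names `Job`, `run_inference`, `drain`, and the `max_attempts` limit of 3 are all assumptions made for the example.

```python
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    """A queued inference request (hypothetical shape)."""
    job_id: str
    prompt: str
    attempts: int = 0


def run_inference(prompt: str) -> str:
    """Placeholder for the model call a real GPU worker would make."""
    return prompt.upper()


def drain(
    jobs: "queue.Queue[Job]",
    infer: Callable[[str], str],
    deliver: Callable[[dict], None],
    max_attempts: int = 3,
) -> None:
    """Pull jobs until the queue is empty, retrying failed jobs.

    The worker keeps no state between jobs: everything it needs
    travels on the Job itself, so any worker can pick up a retry.
    """
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return  # nothing left to do; a serverless worker would exit here

        job.attempts += 1
        try:
            output = infer(job.prompt)
        except Exception as exc:
            if job.attempts < max_attempts:
                jobs.put(job)  # re-enqueue for another attempt
            else:
                # Out of attempts: report the failure to the client.
                deliver({"job_id": job.job_id, "status": "failed", "error": str(exc)})
            continue

        # Success: hand the result to the webhook stand-in.
        deliver({"job_id": job.job_id, "status": "succeeded", "output": output})


# Usage: enqueue one job, run the worker, collect "webhook" payloads.
jobs: "queue.Queue[Job]" = queue.Queue()
jobs.put(Job(job_id="job-1", prompt="hello"))
received: list = []
drain(jobs, run_inference, received.append)
```

In this sketch the retry counter lives on the job rather than in the worker, which is one way to keep workers stateless. A real deployment would replace the in-memory queue with the managed queue and the callback with an HTTP request.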
Ready to ship GPU-powered features?
Connect your Modal account in the Meridian dashboard and start deploying workers in under 5 minutes.
Open Dashboard