Integration Guide

Modal Workers + Meridian

Offload async LLM inference jobs to Modal's serverless GPU infrastructure. Meridian handles queuing, retries, and result delivery, so your workers stay stateless and fast.

GPU Inference

Run Llama, Mistral, or custom models on A100/H100 GPUs without manual cold-start tuning.

Async Queues

Meridian enqueues jobs; Modal workers pull and execute them. Results stream back via webhooks.
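The pull, execute, and deliver loop can be sketched in a few lines. This is an illustration only, not Meridian's or Modal's actual API: the `Job` fields, `run_worker`, the payload shape, and the retry limit are all assumptions, and an in-memory `queue.Queue` stands in for the real queue.

```python
import json
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    # Hypothetical job shape; the real Meridian job schema may differ.
    job_id: str
    prompt: str
    webhook_url: str
    attempts: int = 0


def run_worker(
    jobs: "queue.Queue[Job]",
    infer: Callable[[str], str],
    deliver: Callable[[str, str], None],
    max_attempts: int = 3,
) -> None:
    """Drain the queue: run each job, re-enqueue on failure, deliver the result."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return  # nothing left to do; a stateless worker can exit here
        job.attempts += 1
        try:
            output = infer(job.prompt)
            payload = {"job_id": job.job_id, "status": "succeeded", "output": output}
        except Exception as exc:
            if job.attempts < max_attempts:
                jobs.put(job)  # put it back for another attempt
                continue
            payload = {"job_id": job.job_id, "status": "failed", "error": str(exc)}
        # Stand-in for an HTTP POST to the client's webhook URL.
        deliver(job.webhook_url, json.dumps(payload))


# Usage: an inference function that fails once, then succeeds.
calls = {"n": 0}
delivered = []


def flaky_infer(prompt: str) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient GPU error")
    return prompt.upper()


jobs = queue.Queue()
jobs.put(Job("job-1", "hello", "https://example.com/hooks/meridian"))
run_worker(jobs, flaky_infer, lambda url, body: delivered.append((url, body)))
```

Because the worker keeps no state between jobs, the retry count travels with the job itself. In this sketch the first attempt fails, the job is re-enqueued, and the second attempt produces a single webhook delivery.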

Auto-scale

Modal scales containers to zero when idle. Pay only for GPU-seconds consumed.
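To see what per-GPU-second billing means in practice, here is a rough back-of-the-envelope calculation. The rate, job count, and job duration are made-up placeholders, not Modal's published pricing; substitute your own figures.

```python
def gpu_cost(jobs_per_day: int, seconds_per_job: float, usd_per_gpu_second: float) -> float:
    """Daily cost when billed only for the GPU-seconds that jobs actually consume."""
    return jobs_per_day * seconds_per_job * usd_per_gpu_second


# Hypothetical numbers for illustration only.
RATE = 0.001           # USD per GPU-second (assumed)
JOBS_PER_DAY = 10_000  # assumed workload
SECONDS_PER_JOB = 2.5  # assumed average inference time

# 10,000 jobs x 2.5 s = 25,000 GPU-seconds per day.
scale_to_zero = gpu_cost(JOBS_PER_DAY, SECONDS_PER_JOB, RATE)

# For comparison: one GPU held around the clock, 86,400 seconds per day.
always_on = 86_400 * RATE

print(f"scale-to-zero: ${scale_to_zero:.2f}/day, always-on: ${always_on:.2f}/day")
```

Under these assumed numbers the workload keeps a GPU busy for under a third of the day, so the idle time is where the savings come from. The estimate ignores container startup time, which may also be billed.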

Architecture

Meridian API → Redis Queue → Modal Worker → Webhook → Client
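The last hop in this flow, the webhook arriving at the client, might be handled along these lines. The payload fields (`job_id`, `status`) and the `handle_webhook` function are hypothetical, since the actual webhook schema is not described here. The sketch also assumes a retried delivery can arrive more than once, so it treats duplicates as harmless.

```python
import json


def handle_webhook(body: bytes, results: dict) -> int:
    """Record a result event and return the HTTP status code to respond with."""
    try:
        event = json.loads(body)
        job_id = event["job_id"]  # assumed field name
        event["status"]           # assumed field name; must be present
    except (ValueError, KeyError, TypeError):
        return 400  # malformed JSON or missing fields
    if job_id not in results:
        results[job_id] = event  # first delivery wins; repeats are ignored
    return 200


# Usage: the same event delivered twice is stored once.
results = {}
body = json.dumps({"job_id": "job-1", "status": "succeeded", "output": "HELLO"}).encode()
first = handle_webhook(body, results)
second = handle_webhook(body, results)
bad = handle_webhook(b"not json", results)
```

Keeping the handler a pure function of the request body makes it easy to test and to mount in whichever web framework the client already uses.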

Ready to ship GPU-powered features?

Connect your Modal account in the Meridian dashboard and start deploying workers in under 5 minutes.