Integration Guide

Modal compute + Meridian

Run Meridian inference jobs on Modal's serverless GPU infrastructure. Deploy in seconds, scale to zero when idle, and pay only for the compute you actually use. No Kubernetes, no cold-start headaches.

Quickstart

Install the dependencies and deploy your first inference function in under two minutes.

1. Install

pip install modal meridian-client

2. Write your inference function

Create meridian_inference.py with a Modal app that wraps the Meridian client.

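import os

import modal
from meridian_client import MeridianClient

app = modal.App("meridian-inference")
image = modal.Image.debian_slim().pip_install("meridian-client")

# Attach the secret to the function; Modal injects its keys as
# environment variables inside the container.
@app.function(
    image=image,
    gpu="A10G",
    secrets=[modal.Secret.from_name("meridian-api-key")],
)
def run_inference(prompt: str, model: str = "meridian-7b"):
    # Assumes the secret stores the key under the name MERIDIAN_API_KEY.
    client = MeridianClient(api_key=os.environ["MERIDIAN_API_KEY"])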
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        max_tokens=1024,
    )
    return response.choices[0].message.content

@app.local_entrypoint()
def main():
    result = run_inference.remote(
        "Explain quantum entanglement to a 12-year-old"
    )
    print(result)

3. Deploy

modal deploy meridian_inference.py

4. Call from anywhere

Once deployed, invoke your function from any Python environment using Modal's function lookup.

import modal

# Function.from_name replaces the deprecated Function.lookup in recent Modal releases.
f = modal.Function.from_name("meridian-inference", "run_inference")
result = f.remote("Summarize this research paper...")
print(result)

Why Modal + Meridian?

GPU on demand

A10G, A100, H100 — attach any GPU class per function. No reserved instances required.

Scale to zero

Functions idle at zero cost. Modal cold-starts in under a second when a request arrives.

Secrets management

Store your Meridian API key in Modal Secrets. Never hardcode credentials in source.
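As a minimal sketch of what this looks like inside a function: when a secret is attached through the secrets= argument of @app.function(), Modal exposes its entries as environment variables in the container. The helper name load_meridian_api_key and the variable name MERIDIAN_API_KEY are assumptions for illustration; use whatever key name your secret actually stores.

```python
import os


def load_meridian_api_key(env_var: str = "MERIDIAN_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it is absent.

    The default variable name is an assumption; it must match the key
    stored in the Modal secret attached to the function.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; attach the Modal secret to this function"
        )
    return key
```

A secret with this shape could be created with the Modal CLI, for example modal secret create meridian-api-key MERIDIAN_API_KEY=<your key>, and the returned string passed to the client constructor.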

Batch inference

Use a function's .map() method (for example, run_inference.map(prompts)) to fan out across hundreds of GPUs for bulk processing jobs.
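One way to structure a bulk job is sketched below. It takes any Modal function handle (such as run_inference from the quickstart) and calls its .map() method with return_exceptions=True, so a single failed prompt does not abort the whole batch. The helper name run_batch is hypothetical, and the pairing of prompts to outputs relies on .map() returning results in input order, which is its default behavior.

```python
from typing import Any, Iterable, List, Tuple


def run_batch(
    fn: Any, prompts: Iterable[str]
) -> Tuple[List[Tuple[str, Any]], List[Tuple[str, Exception]]]:
    """Fan prompts out with fn.map() and split successes from failures.

    `fn` is expected to be a Modal function handle; anything with a
    compatible .map(iterable, return_exceptions=...) method also works.
    """
    prompt_list = list(prompts)
    successes: List[Tuple[str, Any]] = []
    failures: List[Tuple[str, Exception]] = []
    # Outputs arrive in input order, so zip pairs each with its prompt.
    for prompt, output in zip(
        prompt_list, fn.map(prompt_list, return_exceptions=True)
    ):
        if isinstance(output, Exception):
            failures.append((prompt, output))
        else:
            successes.append((prompt, output))
    return successes, failures
```

Inside a local entrypoint this might be called as successes, failures = run_batch(run_inference, prompts), with the failures list retried or logged afterwards.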

Web endpoints

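Expose your inference function as a REST endpoint by stacking @modal.fastapi_endpoint() (named @modal.web_endpoint() in older Modal releases) under @app.function(), then run modal serve for a live development URL.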
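From the client side, calling such an endpoint is an ordinary HTTP request. The sketch below uses only the standard library and assumes the endpoint accepts a POST with a JSON body containing prompt and model fields and returns JSON; the function names and the request shape are illustrative assumptions, and the URL is whatever modal serve or modal deploy prints for your app.

```python
import json
import urllib.request
from typing import Any


def build_inference_request(
    endpoint_url: str, prompt: str, model: str = "meridian-7b"
) -> urllib.request.Request:
    """Build a JSON POST request for a (hypothetical) inference endpoint."""
    payload = json.dumps({"prompt": prompt, "model": model}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_endpoint(endpoint_url: str, prompt: str, timeout: float = 60.0) -> Any:
    """Send the request and decode the JSON response body."""
    request = build_inference_request(endpoint_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit-test without a live endpoint.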

Usage-based billing

Pay per second of GPU time. No monthly commitments, no overprovisioning.
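Per-second billing makes a rough budget easy to estimate: total cost is the number of calls times the GPU seconds per call times the per-second rate. The sketch below is only that arithmetic; the function name is hypothetical, and the rate must come from Modal's current pricing for your GPU class, since no real prices are assumed here.

```python
def estimate_gpu_cost(
    calls: int, seconds_per_call: float, rate_per_second: float
) -> float:
    """Estimate spend under per-second GPU billing.

    All inputs are supplied by the caller; nothing here encodes an
    actual price.
    """
    if calls < 0 or seconds_per_call < 0 or rate_per_second < 0:
        raise ValueError("inputs must be non-negative")
    return calls * seconds_per_call * rate_per_second
```

The estimate excludes container startup time and any idle window before scale-down, so treat it as a lower bound.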