Modal compute + Meridian
Run Meridian inference jobs on Modal's serverless GPU infrastructure. Deploy in seconds, scale to zero when idle, and pay only for the compute you actually use. No Kubernetes, no cold-start headaches.
Quickstart
Install the dependencies and deploy your first inference function in under two minutes.
Install
pip install modal meridian-clientWrite your inference function
Create meridian_inference.py with a Modal app that wraps the Meridian client.
import modal
from meridian_client import MeridianClient
app = modal.App("meridian-inference")
image = modal.Image.debian_slim().pip_install("meridian-client")
@app.function(image=image, gpu="A10G")
def run_inference(prompt: str, model: str = "meridian-7b"):
client = MeridianClient(
api_key=modal.Secret.from_name("meridian-api-key")
)
response = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
temperature=0.7,
max_tokens=1024,
)
return response.choices[0].message.content
@app.local_entrypoint()
def main():
result = run_inference.remote(
"Explain quantum entanglement to a 12-year-old"
)
print(result)Deploy
modal deploy meridian_inference.pyCall from anywhere
Once deployed, invoke your function from any Python environment using Modal's function lookup.
import modal
f = modal.Function.lookup("meridian-inference", "run_inference")
result = f.remote("Summarize this research paper...")
print(result)Why Modal + Meridian?
GPU on demand
A10G, A100, H100 — attach any GPU class per function. No reserved instances required.
Scale to zero
Functions idle at zero cost. Modal cold-starts in under a second when a request arrives.
Secrets management
Store your Meridian API key in Modal Secrets. Never hardcode credentials in source.
Batch inference
Use Modal.map() to fan out across hundreds of GPUs for bulk processing jobs.
Web endpoints
Expose your inference function as a REST endpoint with @app.function() and modal serve.
Usage-based billing
Pay per second of GPU time. No monthly commitments, no overprovisioning.