Recipe
Multimodal LLM primer
A practical walkthrough for routing image, audio, and text inputs through Meridian's multimodal endpoints. Covers payload shape, model selection, and cost-aware fallback so you can ship a working vision pipeline in under an afternoon.
1. Pick a vision-capable model
Not every model accepts images. Route through model-router and Meridian picks the cheapest vision-capable backend that fits your input size and latency target. For high-resolution OCR, pin gpt-4o directly.
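As a rough sketch of that choice in Python: the two model names come from the text above, while the helper name `pick_model` and its `high_res_ocr` flag are hypothetical and not part of Meridian's API.

```python
# Model names taken from the recipe text.
ROUTER_MODEL = "model-router"  # lets Meridian choose a vision-capable backend
OCR_MODEL = "gpt-4o"           # pinned directly for high-resolution OCR


def pick_model(high_res_ocr: bool = False) -> str:
    """Return the value to place in the request's "model" field.

    Hypothetical helper: the name and flag are illustrative only.
    """
    return OCR_MODEL if high_res_ocr else ROUTER_MODEL
```

Defaulting to the router keeps cost-aware selection on unless a caller explicitly opts into the pinned model.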
2. Shape the request
Images go in the content array as typed parts. Inline images under 5 MB as base64 data URLs; use signed URLs for anything larger.
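The size rule can be sketched as a small Python helper that builds the image part. This is a minimal sketch under stated assumptions: `image_part` and `build_payload` are hypothetical names, "5MB" is read as 5,000,000 bytes, and the limit is checked against the raw image bytes rather than the encoded string. Confirm both against the API reference before relying on them.

```python
import base64
from typing import Optional

# ASSUMPTION: "5MB" interpreted as decimal megabytes, measured on raw bytes.
INLINE_LIMIT_BYTES = 5 * 1000 * 1000


def image_part(image_bytes: bytes, mime_type: str = "image/png",
               signed_url: Optional[str] = None) -> dict:
    """Build an image_url content part (hypothetical helper)."""
    if len(image_bytes) < INLINE_LIMIT_BYTES:
        # Small image: embed it inline as a base64 data URL.
        encoded = base64.b64encode(image_bytes).decode("ascii")
        url = f"data:{mime_type};base64,{encoded}"
    elif signed_url is not None:
        # Large image: reference it by the caller-supplied signed URL.
        url = signed_url
    else:
        raise ValueError("image is at or above the inline limit; supply a signed URL")
    return {"type": "image_url", "image_url": {"url": url}}


def build_payload(prompt: str, part: dict, model: str = "model-router") -> dict:
    """Assemble a request body with one text part and one image part."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{"type": "text", "text": prompt}, part],
        }],
    }
```

Calling `build_payload("What's in this chart?", image_part(png_bytes))` yields a body in the same shape as the request shown in this step.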
POST https://llm.getnimbus.net/v1/chat/completions
{
  "model": "model-router",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "text", "text": "What's in this chart?" },
      { "type": "image_url",
        "image_url": { "url": "data:image/png;base64,..." } }
    ]
  }]
}
3. Handle fallback
If the router rejects a malformed image, Meridian returns a structured error with a suggested retry model. Catch unsupported_modality and retry against the suggestion before surfacing a user-facing failure.
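The catch-and-retry step might look like the Python sketch below. The error body shape is an assumption, since the recipe does not show it: the code expects `{"error": {"code": ..., "suggested_model": ...}}`, and the `suggested_model` field name in particular is hypothetical. The `send` callable stands in for whatever HTTP client posts the payload and returns the decoded JSON.

```python
from typing import Callable


def complete_with_fallback(payload: dict, send: Callable[[dict], dict]) -> dict:
    """Send once; on unsupported_modality, retry once against the suggested model.

    ASSUMPTION: errors arrive as {"error": {"code": ..., "suggested_model": ...}}.
    The field names are illustrative, not confirmed by the docs.
    """
    response = send(payload)
    error = response.get("error")
    if not error or error.get("code") != "unsupported_modality":
        # Success, or an error this fallback does not handle.
        return response

    suggested = error.get("suggested_model")  # hypothetical field name
    if not suggested:
        # No suggestion to act on, so return the original error.
        return response

    # Retry with the same messages, swapping only the model.
    return send({**payload, "model": suggested})
```

Limiting the retry to a single attempt keeps a bad suggestion from looping; if the second response is still an error, it is returned to the caller to surface.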