Recipe

Multimodal LLM primer

A practical walkthrough for routing image, audio, and text inputs through Meridian's multimodal endpoints. It covers model selection, payload shape, and cost-aware fallback, so you can ship a working vision pipeline in an afternoon.

1. Pick a vision-capable model

Not every model accepts images. Set the model to model-router and Meridian picks the cheapest vision-capable backend that fits your input size and latency target. For high-resolution OCR, skip the router and pin gpt-4o directly.
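As a minimal sketch, the selection rule above fits in one helper. The model names come from this guide; the helper name choose_model and its single high_res_ocr flag are hypothetical, and a real pipeline may weigh more signals before pinning a model.

```python
# Hypothetical helper: the model names come from this guide, but the
# function and its single boolean flag are illustrative assumptions.
ROUTER_MODEL = "model-router"
OCR_MODEL = "gpt-4o"


def choose_model(high_res_ocr: bool) -> str:
    """Return the value for the request's "model" field.

    Default to the router so Meridian can pick the cheapest
    vision-capable backend; pin gpt-4o only for high-resolution OCR.
    """
    return OCR_MODEL if high_res_ocr else ROUTER_MODEL


print(choose_model(False))  # model-router
print(choose_model(True))   # gpt-4o
```

Keeping the router as the default means cost and latency tuning stay on Meridian's side; pinning is the exception you opt into.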

2. Shape the request

Images go in the content array as typed image_url parts alongside the text. Inline images under 5MB as base64 data URLs; pass anything larger as a signed URL.
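A minimal sketch of the size rule, assuming "5MB" means 5 * 1024 * 1024 bytes of raw image data before base64 encoding (base64 grows the payload by about a third, so if the limit applies to the encoded size, compare len(encoded) instead). The function name image_part is hypothetical, and producing the signed URL (uploading to your storage) is left to the caller.

```python
import base64
from typing import Optional

# Assumption: the 5MB limit is measured on raw (pre-base64) bytes.
INLINE_LIMIT_BYTES = 5 * 1024 * 1024


def image_part(data: bytes, mime_type: str, signed_url: Optional[str] = None) -> dict:
    """Build an image_url content part, inlining small images as base64."""
    if len(data) < INLINE_LIMIT_BYTES:
        # Small image: embed it directly as a data URL.
        encoded = base64.b64encode(data).decode("ascii")
        url = f"data:{mime_type};base64,{encoded}"
    elif signed_url is not None:
        # Large image: reference it by a signed URL supplied by the caller.
        url = signed_url
    else:
        raise ValueError("image is 5MB or larger; upload it and pass a signed URL")
    return {"type": "image_url", "image_url": {"url": url}}
```

For example, image_part(png_bytes, "image/png") returns a part ready to drop into the content array next to the text part.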

POST https://llm.getnimbus.net/v1/chat/completions
{
  "model": "model-router",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "text", "text": "What's in this chart?" },
      { "type": "image_url",
        "image_url": { "url": "data:image/png;base64,..." } }
    ]
  }]
}
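The POST shown above can be assembled with only the Python standard library. This is a sketch: the guide does not say how requests are authenticated, so the bearer-token header and the MERIDIAN_API_KEY environment variable are hypothetical, as are the helper names build_request and send.

```python
import json
import os
import urllib.request

ENDPOINT = "https://llm.getnimbus.net/v1/chat/completions"


def build_request(question: str, image_url: str, model: str = "model-router") -> urllib.request.Request:
    """Assemble the chat-completions POST as a urllib Request (not yet sent)."""
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }
    headers = {"Content-Type": "application/json"}
    # Assumption: bearer-token auth from a hypothetical MERIDIAN_API_KEY
    # variable; check Meridian's auth docs for the real scheme.
    api_key = os.environ.get("MERIDIAN_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the Request separately from sending it keeps the payload easy to inspect and test without touching the network.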

3. Handle fallback

If the router rejects a malformed image, Meridian returns a structured error that includes a suggested retry model. Catch the unsupported_modality error, retry against the suggested model, and surface a user-facing failure only if that retry also fails.
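A minimal sketch of the retry step. The guide only says the error is structured and names a suggested model, so the field names (error.code, error.suggested_model) and the treatment of any 2xx status as success are assumptions. The send argument is any function you supply that posts a payload and returns (status, decoded JSON body), which keeps the logic testable without a network.

```python
from typing import Callable, Tuple

# Hypothetical error shape (field names are assumptions, not Meridian's API):
#   {"error": {"code": "unsupported_modality", "suggested_model": "<model>"}}
SendFn = Callable[[dict], Tuple[int, dict]]


class VisionRequestError(Exception):
    """Raised when the request still fails after the fallback attempt."""


def call_with_fallback(send: SendFn, payload: dict) -> dict:
    status, body = send(payload)
    if 200 <= status < 300:
        return body
    error = body.get("error", {})
    suggested = error.get("suggested_model")
    if error.get("code") == "unsupported_modality" and suggested:
        # Retry once against the suggested model before giving up.
        retry_payload = {**payload, "model": suggested}
        status, body = send(retry_payload)
        if 200 <= status < 300:
            return body
    # Only now surface the failure to the user.
    raise VisionRequestError(f"vision request failed: {body.get('error')}")
```

Retrying exactly once avoids loops if the suggested model also rejects the input; any other error code goes straight to the caller.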