Recipe

Vision models primer

Meridian routes vision-capable requests to the cheapest model that satisfies your accuracy floor. This recipe walks through sending an image plus a prompt, picking a model tier, and handling streamed multimodal output without lock-in to any single provider.

1. Pick a vision tier

Meridian exposes three vision tiers: azure/gpt-4o-mini for OCR and simple captions, azure/gpt-4o for charts and diagrams, and azure/model-router when you want the gateway to choose adaptively per request. Tier choice is one string change, not a code rewrite.
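
As a rough illustration of keeping tier choice to a single string, the sketch below maps a task label to one of the three model identifiers named above. The VisionTask labels and the pickVisionModel helper are hypothetical names introduced for this example, not part of the Meridian SDK.

```typescript
// Hypothetical task labels for illustration; only the model strings come from this recipe.
type VisionTask = 'ocr' | 'caption' | 'chart' | 'diagram' | 'auto';

// Returns the model identifier to pass as `model` in a request.
function pickVisionModel(task: VisionTask): string {
  switch (task) {
    case 'ocr':
    case 'caption':
      return 'azure/gpt-4o-mini'; // OCR and simple captions
    case 'chart':
    case 'diagram':
      return 'azure/gpt-4o'; // charts and diagrams
    case 'auto':
      return 'azure/model-router'; // let the gateway choose per request
  }
}
```

With a helper like this, the rest of the request code stays the same whichever tier is selected; only the returned string changes.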

2. Send an image plus prompt

The SDK accepts either a remote URL or a base64 data URI. Remote URLs are fetched server-side by the gateway and never billed as egress to your account.

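import { Meridian } from '@meridian/sdk';

// Read the API key from the environment rather than hard-coding it.
const meridian = new Meridian({ apiKey: process.env.MERIDIAN_KEY });

const response = await meridian.chat.completions.create({
  model: 'azure/gpt-4o', // vision tier; change this string to switch tiers
  messages: [
    {
      role: 'user',
      // A multimodal message is an array of parts: text first, then the image.
      content: [
        { type: 'text', text: 'Describe this chart in detail.' },
        {
          type: 'image_url',
          // Remote URL; a base64 data URI is accepted in the same field.
          image_url: { url: 'https://example.com/q4-revenue.png' },
        },
      ],
    },
  ],
  max_tokens: 1024, // cap on output tokens
});

// Print the model's description of the image.
console.log(response.choices[0].message.content);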
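
The request above uses a remote URL. For the base64 data URI path, a minimal sketch of building the image part from raw bytes might look like the following. The toDataUri and imagePart helpers are hypothetical names for this example; the part shape mirrors the image_url object in the request above, and the sketch assumes a Node.js runtime.

```typescript
import { Buffer } from 'node:buffer';

// Encode raw image bytes as a data URI, e.g. "data:image/png;base64,....".
function toDataUri(bytes: Uint8Array, mimeType: string): string {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// Build a content part in the same shape as the image_url part shown above.
function imagePart(url: string): { type: 'image_url'; image_url: { url: string } } {
  return { type: 'image_url', image_url: { url } };
}
```

In practice the bytes would come from a file read or an upload, and the result of imagePart(toDataUri(bytes, 'image/png')) would take the place of the remote-URL part in the content array.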

3. Cost and latency

Vision tokens are billed at the same per-token rate as text. A 720p (1280×720) image costs roughly 1,100 input tokens on the gpt-4o family. Median latency through the Meridian gateway is under 2.4 s for a single-image prompt with a 1k-token output cap. To estimate your bill, take raw Azure pricing and add a 20% markup.
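
The arithmetic in this section can be sketched as a small estimator. It takes the roughly 1,100 tokens per 720p image and the 20% markup from the text above; the per-million-token rates are inputs you would supply from your own Azure price sheet, and the CostInputs and estimateRequestCostUsd names are hypothetical.

```typescript
// Figures taken from this section: ~1,100 input tokens per 720p image, 20% markup.
const TOKENS_PER_720P_IMAGE = 1100;
const GATEWAY_MARKUP = 0.2;

interface CostInputs {
  imageCount: number; // number of 720p images in the prompt
  promptTextTokens: number; // text tokens in the prompt
  outputTokens: number; // expected output tokens (at most max_tokens)
  inputUsdPerMillionTokens: number; // assumption: raw Azure input rate you look up
  outputUsdPerMillionTokens: number; // assumption: raw Azure output rate you look up
}

// Rough per-request estimate: raw token cost plus the markup.
function estimateRequestCostUsd(c: CostInputs): number {
  const inputTokens = c.imageCount * TOKENS_PER_720P_IMAGE + c.promptTextTokens;
  const rawUsd =
    (inputTokens * c.inputUsdPerMillionTokens +
      c.outputTokens * c.outputUsdPerMillionTokens) /
    1_000_000;
  return rawUsd * (1 + GATEWAY_MARKUP);
}
```

Because image tokens are billed at the text rate, they are simply added to the prompt's text tokens before pricing. Images at other resolutions will tokenize differently, so treat the result as an estimate rather than an invoice figure.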