Recipe
Vision models primer
Meridian routes vision-capable requests to the cheapest model that satisfies your accuracy floor. This recipe walks through sending an image plus a prompt, picking a model tier, and handling streamed multimodal output without lock-in to any single provider.
1. Pick a vision tier
Meridian exposes three vision tiers: azure/gpt-4o-mini for OCR and simple captions, azure/gpt-4o for charts and diagrams, and azure/model-router when you want the gateway to choose adaptively per request. Tier choice is one string change, not a code rewrite.
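Because the tier is only a model string, the choice can live in a single lookup. The sketch below is illustrative: the `VisionTask` labels and the `pickVisionModel` helper are hypothetical names and not part of the Meridian SDK. Only the three model strings come from the tiers described above.

```typescript
// Hypothetical task labels used for illustration; not an SDK type.
type VisionTask = 'ocr' | 'caption' | 'chart' | 'diagram' | 'unknown';

// The model strings are the three vision tiers named in this recipe.
const TIER_BY_TASK: Record<VisionTask, string> = {
  ocr: 'azure/gpt-4o-mini',
  caption: 'azure/gpt-4o-mini',
  chart: 'azure/gpt-4o',
  diagram: 'azure/gpt-4o',
  // When the workload is not known up front, let the gateway choose.
  unknown: 'azure/model-router',
};

// Hypothetical helper: returns the model string to pass as `model`.
function pickVisionModel(task: VisionTask): string {
  return TIER_BY_TASK[task];
}
```

Keeping the mapping in one place means a tier change stays a one-line edit to the table and does not touch call sites.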
2. Send an image plus prompt
The SDK accepts either a remote URL or a base64 data URI. Remote URLs are fetched server-side by the gateway and never billed as egress to your account.
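For the base64 option, a minimal sketch of building the data URI might look like the following. It assumes a Node.js runtime (for `Buffer`); `toDataUri` and `imagePart` are hypothetical helper names and not SDK functions. The content-part shape mirrors the one used in this recipe's request.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper: wraps raw image bytes in a base64 data URI.
function toDataUri(bytes: Uint8Array, mimeType: string): string {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// Hypothetical helper: builds an image content part from either
// a remote URL or a data URI, in the shape this recipe's request uses.
function imagePart(url: string) {
  return { type: 'image_url' as const, image_url: { url } };
}
```

In practice the bytes would typically come from reading a local file, and the resulting part goes in the `content` array wherever a remote-URL part would otherwise appear.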
import { Meridian } from '@meridian/sdk';

const meridian = new Meridian({ apiKey: process.env.MERIDIAN_KEY });

const response = await meridian.chat.completions.create({
  model: 'azure/gpt-4o', // chart and diagram tier
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this chart in detail.' },
        {
          type: 'image_url',
          // Remote URL; the gateway fetches it server-side.
          image_url: { url: 'https://example.com/q4-revenue.png' },
        },
      ],
    },
  ],
  max_tokens: 1024,
});
console.log(response.choices[0].message.content);
3. Cost and latency
Vision tokens are billed at the same per-token rate as text. A 720p image costs roughly 1,100 input tokens on the gpt-4o family. Median latency through the Meridian gateway is under 2.4 s for a single-image prompt with a 1k output cap. To estimate your bill, take raw Azure pricing and add a 20% markup.
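The arithmetic above can be sketched as a small estimator. The 1,100-token image cost and the 20% markup are the approximate figures from this section. The `RawRates` type, the `estimateCostUsd` helper, and the example rates in the usage note are hypothetical placeholders, so substitute current raw Azure prices for your model.

```typescript
// Approximate figures taken from this recipe.
const TOKENS_PER_720P_IMAGE = 1100;
const GATEWAY_MARKUP = 0.2;

// Raw (pre-markup) provider prices in USD per one million tokens.
// Hypothetical type: the caller supplies current values.
interface RawRates {
  inputUsdPerMillion: number;
  outputUsdPerMillion: number;
}

// Hypothetical helper: rough per-request cost estimate in USD.
function estimateCostUsd(
  imageCount: number,
  textInputTokens: number,
  outputTokens: number,
  rates: RawRates,
): number {
  // Images are billed as input tokens at the same rate as text.
  const inputTokens = imageCount * TOKENS_PER_720P_IMAGE + textInputTokens;
  const rawUsd =
    (inputTokens * rates.inputUsdPerMillion +
      outputTokens * rates.outputUsdPerMillion) / 1e6;
  return rawUsd * (1 + GATEWAY_MARKUP);
}
```

As a worked example with placeholder rates of $2.50 per million input tokens and $10 per million output tokens, one 720p image plus 100 prompt tokens and 1,000 output tokens gives a raw cost of $0.013, or about $0.0156 after the markup.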