RECIPE

Speech-to-text patterns

Meridian routes speech-to-text through the same gateway as chat completions. Pick a model, hand it audio bytes, get back word-level timestamps and speaker diarization in a single round-trip. The patterns below cover the three shapes most production apps need: file upload, streaming mic, and async callback.

1. Buffered file transcription

Best for call recordings, voicemails, and any audio that already lives on disk or in object storage. Pass the raw bytes, request diarization, and Meridian returns segments ordered by start time. Files up to 200 MB go through in one request; larger payloads should be split into chunks at silence boundaries.

import { Meridian } from "@meridian/sdk";

const meridian = new Meridian({
  apiKey: process.env.MERIDIAN_API_KEY,
});

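// Download the recording and fail fast on a non-2xx response,
// so an HTML error page is never sent for transcription.
const audio = await fetch("https://example.com/call.wav");
if (!audio.ok) {
  throw new Error(`Audio download failed with status ${audio.status}`);
}
const buffer = Buffer.from(await audio.arrayBuffer());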

const transcript = await meridian.audio.transcribe({
  model: "whisper-large-v3",
  audio: buffer,
  language: "en",
  diarize: true,
  timestamps: "word",
});

for (const segment of transcript.segments) {
  console.log(`[${segment.speaker}] ${segment.text}`);
}
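
For recordings over the single-request limit, one way to pick split points is to look for low-energy windows in the decoded audio. The sketch below is illustrative only and is not part of the Meridian SDK: it assumes the audio has already been decoded to 16-bit PCM samples, and the function name, window size, and RMS threshold are hypothetical choices that would need tuning for real recordings.

```typescript
// Root-mean-square amplitude of samples[start, end).
function rms(samples: Int16Array, start: number, end: number): number {
  let sumSquares = 0;
  for (let i = start; i < end; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / (end - start));
}

// Returns sample indices at which to cut so that no chunk exceeds
// maxChunkSamples. Each cut lands in the middle of the latest quiet
// window before the limit; if no quiet window exists, it falls back
// to a hard cut at the limit.
export function findSplitPoints(
  samples: Int16Array,
  maxChunkSamples: number,
  windowSamples: number,
  silenceRms: number,
): number[] {
  if (maxChunkSamples <= 0 || windowSamples <= 0) {
    throw new RangeError("maxChunkSamples and windowSamples must be positive");
  }
  const cuts: number[] = [];
  let chunkStart = 0;
  while (samples.length - chunkStart > maxChunkSamples) {
    const limit = chunkStart + maxChunkSamples;
    let cut = limit; // fallback: hard cut when no silence is found
    // Walk backwards from the limit so each chunk is as long as possible.
    for (let end = limit; end - windowSamples > chunkStart; end -= windowSamples) {
      const start = end - windowSamples;
      if (rms(samples, start, end) < silenceRms) {
        cut = start + Math.floor(windowSamples / 2);
        break;
      }
    }
    cuts.push(cut);
    chunkStart = cut;
  }
  return cuts;
}
```

Each resulting chunk could then be submitted as its own transcription request; segment timestamps from later chunks would need to be offset by the chunk's start time when the results are merged.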

2. Streaming microphone input

Open a WebSocket to wss://meridian.getnimbus.net/v1/audio/stream, push 20 ms PCM frames, and read partial transcripts as they arrive. Median first-token latency is under 220 ms in us-east. Use this for live captions, voice agents, and any UI that needs interim text before the speaker finishes.
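
Microphone APIs rarely deliver audio in exact 20 ms pieces, so the client usually has to re-frame it before sending. The helper below is a minimal sketch of that framing step. The audio format (16 kHz, mono, 16-bit PCM) is an assumption, since the text above does not specify one, and the function names are hypothetical rather than part of the Meridian SDK.

```typescript
// Assumed audio format (not stated by the docs): 16 kHz, mono, 16-bit PCM.
const SAMPLE_RATE_HZ = 16_000;
const CHANNELS = 1;
const BYTES_PER_SAMPLE = 2;
const FRAME_MS = 20;

// Bytes in one frame. With the defaults:
// 16000 samples/s * 0.020 s * 1 channel * 2 bytes = 640 bytes.
export function frameSizeBytes(
  sampleRateHz: number = SAMPLE_RATE_HZ,
  channels: number = CHANNELS,
  bytesPerSample: number = BYTES_PER_SAMPLE,
  frameMs: number = FRAME_MS,
): number {
  const samplesPerFrame = Math.floor((sampleRateHz * frameMs) / 1000);
  return samplesPerFrame * channels * bytesPerSample;
}

// Split a PCM buffer into full frames. Leftover bytes are returned as
// `remainder` so the caller can prepend them to the next captured chunk.
export function splitIntoFrames(
  pcm: Uint8Array,
  frameBytes: number,
): { frames: Uint8Array[]; remainder: Uint8Array } {
  if (frameBytes <= 0) {
    throw new RangeError("frameBytes must be positive");
  }
  const frames: Uint8Array[] = [];
  let offset = 0;
  while (offset + frameBytes <= pcm.length) {
    frames.push(pcm.subarray(offset, offset + frameBytes));
    offset += frameBytes;
  }
  return { frames, remainder: pcm.subarray(offset) };
}
```

Each element of `frames` can be passed to the socket's `send` method as binary data. Carrying `remainder` into the next call keeps frame boundaries aligned across capture callbacks.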

3. Async batch with webhooks

For overnight jobs or queues, submit with callback_url and Meridian POSTs the finished transcript when it is ready. There is no polling and no held connection. The callback is signed with your webhook secret, so you can verify the payload before persisting it.
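
Verifying the callback might look like the sketch below. The signing scheme is an assumption: the text above does not say how the signature is computed or which header carries it, so this example assumes a hex-encoded HMAC-SHA256 of the raw request body, which is a common convention. Both function names are hypothetical; check the webhook reference for the actual scheme before relying on this.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: hex-encoded HMAC-SHA256 over the raw request body.
// Useful for generating signatures in local tests.
export function signPayload(rawBody: string | Buffer, secret: string): string {
  return createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Returns true only if signatureHex matches the HMAC of rawBody.
// Pass the body exactly as received, before any JSON parsing,
// because re-serialized JSON may not be byte-identical.
export function verifyWebhookSignature(
  rawBody: string | Buffer,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so reject early.
  if (received.length !== expected.length) {
    return false;
  }
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(received, expected);
}
```

In a handler, the transcript would be parsed and persisted only after `verifyWebhookSignature` returns true; any other request can be rejected with a 401.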