
Audio models primer

Meridian exposes a unified audio surface across transcription, text-to-speech, and real-time streaming endpoints. This primer walks through the three most common shapes you will hit when wiring an agent or a voice product on top of the gateway.

1. Transcription

Send a short clip to /v1/audio/transcriptions with a multipart/form-data body. The gateway routes to the cheapest backend that meets your latency budget and returns plain text or a verbose JSON envelope.
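
As a rough illustration of the multipart/form-data body described above, the sketch below builds (but does not send) a transcription request using only the Python standard library. The host comes from the curl example in this primer; the form field names `file`, `model`, and `response_format`, and the placeholder model name in the usage note, are assumptions rather than documented Meridian parameters.

```python
import io
import uuid
import urllib.request

# Host taken from the curl example in this primer.
BASE_URL = "https://meridian.getnimbus.net"


def build_transcription_request(audio_bytes, filename, api_key, model,
                                response_format="text"):
    """Build (but do not send) a multipart/form-data transcription request.

    The field names "model", "response_format", and "file" are assumptions.
    """
    boundary = uuid.uuid4().hex
    fields = {"model": model, "response_format": response_format}

    buf = io.BytesIO()
    # Plain text fields first, one multipart section each.
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    # Then the audio clip as a file section.
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
    buf.write(audio_bytes)
    # Closing boundary ends the body.
    buf.write(f"\r\n--{boundary}--\r\n".encode())

    return urllib.request.Request(
        BASE_URL + "/v1/audio/transcriptions",
        data=buf.getvalue(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

A call such as `build_transcription_request(clip, "clip.wav", key, model="your-asr-model")` returns a request object that `urllib.request.urlopen` could send; the model name there is a hypothetical placeholder, not a known Meridian model.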

2. Text-to-speech

Hit /v1/audio/speech with a voice ID and an input string. The output streams as MP3 or PCM. Voices are stable across model upgrades, so you can pin one in production without rev-locking the underlying weights.
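
A minimal Python sketch of the same call, assuming the JSON fields used by the curl example in this primer (`model`, `voice`, `input`). The `response_format` field for choosing PCM is an assumption, and `save_stream` is a hypothetical helper that copies a streamed body to disk in chunks instead of buffering the whole file.

```python
import json
import urllib.request

# Host taken from the curl example in this primer.
BASE_URL = "https://meridian.getnimbus.net"


def build_speech_request(text, api_key, voice="alloy", model="tts-1",
                         response_format=None):
    """Build (but do not send) a text-to-speech request with a pinned voice."""
    payload = {"model": model, "voice": voice, "input": text}
    if response_format is not None:
        # Assumed field name for selecting the output encoding, e.g. "pcm".
        payload["response_format"] = response_format
    return urllib.request.Request(
        BASE_URL + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def save_stream(response, path, chunk_size=8192):
    """Copy a streamed response body to disk; returns the bytes written."""
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

In use, the two pieces would be combined along the lines of `with urllib.request.urlopen(req) as resp: save_stream(resp, "hello.mp3")`. Keeping the voice as an explicit argument makes the pinned value visible at the call site.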

3. Real-time

For live agents, open a WebSocket to /v1/realtime and stream 16 kHz PCM frames. The session multiplexes ASR, LLM, and TTS over a single duplex channel, with a round-trip latency under 400 ms.
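
To make the framing step concrete, here is a small sketch that slices raw audio into fixed-size frames ready to be written to the socket. Only the 16 kHz sample rate comes from this primer; the 16-bit mono sample format and the 20 ms frame length are assumptions. The wire format on /v1/realtime (binary frames versus JSON-wrapped audio) is not specified here, so the sketch stops short of sending anything; that would also need a third-party WebSocket client.

```python
SAMPLE_RATE_HZ = 16_000   # from the primer
BYTES_PER_SAMPLE = 2      # assumption: 16-bit signed mono PCM
FRAME_MS = 20             # assumption: 20 ms of audio per frame


def frame_size_bytes(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS,
                     bytes_per_sample=BYTES_PER_SAMPLE):
    """Bytes in one frame: samples per frame times bytes per sample."""
    return sample_rate_hz * frame_ms // 1000 * bytes_per_sample


def pcm_frames(pcm, frame_bytes=None, pad_last=True):
    """Yield fixed-size frames from a bytes object of raw PCM audio.

    When pad_last is true, the final short frame is padded with zero bytes
    (silence for signed PCM) so every frame has the same length.
    """
    if frame_bytes is None:
        frame_bytes = frame_size_bytes()
    for start in range(0, len(pcm), frame_bytes):
        frame = pcm[start:start + frame_bytes]
        if pad_last and len(frame) < frame_bytes:
            frame += b"\x00" * (frame_bytes - len(frame))
        yield frame
```

Under these assumptions each frame holds 320 samples, or 640 bytes. Smaller frames reduce buffering delay at the cost of more messages per second, which matters when the whole loop is meant to stay under 400 ms.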

Example: a text-to-speech request (section 2) that saves the audio as hello.mp3.

curl https://meridian.getnimbus.net/v1/audio/speech \
  -H "Authorization: Bearer $MERIDIAN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "voice": "alloy",
    "input": "Hello from Meridian."
  }' --output hello.mp3
