Audio models primer
Meridian exposes a unified audio surface across transcription, text-to-speech, and real-time streaming endpoints. This primer walks through the three most common shapes you will hit when wiring an agent or a voice product on top of the gateway.
1. Transcription
Send a short clip to /v1/audio/transcriptions with a multipart/form-data body. The gateway routes to the cheapest backend that meets your latency budget and returns plain text or a verbose JSON envelope.
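The upload described above can be sketched as a small shell helper. This is a sketch under stated assumptions, not the gateway's documented contract: the form field names (file, model, response_format), the model name asr-1, and the MERIDIAN_BASE variable are hypothetical placeholders. Only the endpoint path, the multipart/form-data body, and the bearer-token header come from this primer.

```shell
# Base URL of the gateway; MERIDIAN_BASE is a hypothetical override for testing.
MERIDIAN_BASE="${MERIDIAN_BASE:-https://meridian.getnimbus.net}"

# Print the full URL for an audio route, e.g. "transcriptions" or "speech".
audio_url() {
  printf '%s/v1/audio/%s\n' "$MERIDIAN_BASE" "$1"
}

# Upload a clip for transcription.
# curl sets the multipart/form-data Content-Type itself when -F is used.
# Assumed (not confirmed by the primer): field names and the model "asr-1".
transcribe() {
  clip="$1"
  format="${2:-text}"
  curl -sS "$(audio_url transcriptions)" \
    -H "Authorization: Bearer $MERIDIAN_KEY" \
    -F "file=@${clip}" \
    -F "model=asr-1" \
    -F "response_format=${format}"
}
```

With those assumptions, calling transcribe clip.wav would request plain text, and passing a second argument would select another response format if the gateway accepts one under that field name.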
2. Text-to-speech
Hit /v1/audio/speech with a voice id and an input string. The output streams as MP3 or PCM. Voices are stable across model upgrades, so you can pin one in production without locking to a specific revision of the underlying weights.
3. Real-time
For live agents, open a WebSocket to /v1/realtime and stream 16 kHz PCM frames. The session multiplexes ASR, LLM, and TTS over a single duplex channel, with round-trip latency under 400 ms.
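When chunking audio for the real-time socket, it helps to know how many bytes one frame holds. The arithmetic below is a sketch: the primer specifies only the 16 kHz sample rate, so the 16-bit mono sample format and the 20 ms frame duration are illustrative assumptions, and pcm_frame_bytes is a hypothetical helper name.

```shell
# Bytes in one PCM frame = sample rate * bytes per sample * channels * duration.
# Assumed for illustration: 2-byte (16-bit) samples, 1 channel, 20 ms frames.
pcm_frame_bytes() {
  rate_hz="$1"
  bytes_per_sample="$2"
  channels="$3"
  frame_ms="$4"
  echo $(( rate_hz * bytes_per_sample * channels * frame_ms / 1000 ))
}

pcm_frame_bytes 16000 2 1 20   # prints 640
```

Under the same assumptions, one second of audio is 32000 bytes, or 50 frames of 640 bytes each.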
Example: a text-to-speech request to /v1/audio/speech, saved as hello.mp3:
curl https://meridian.getnimbus.net/v1/audio/speech \
  -H "Authorization: Bearer $MERIDIAN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "voice": "alloy",
    "input": "Hello from Meridian."
  }' --output hello.mp3