Audio models

Speech-to-text and text-to-speech endpoints available through the Meridian API.

Speech-to-text

STT: whisper-1

OpenAI Whisper v3 large. Supports 99 languages, word-level timestamps, and diarization hints. Latency is about 800 ms for a 30-second clip.
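A minimal sketch of how a client might assemble a whisper-1 transcription request, using only the Python standard library. This page does not document the endpoint or payload, so the URL, the Bearer-token header, the JSON field names "model" and "audio", and base64-encoding the audio inside JSON are all hypothetical placeholders. Check the actual Meridian API reference before relying on any of them. The function only builds the request and does not send it.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint: the real Meridian URL is not given on this page.
TRANSCRIBE_URL = "https://api.meridian.example/v1/audio/transcriptions"


def build_transcription_request(audio: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a speech-to-text request for whisper-1.

    The JSON field names "model" and "audio" are assumptions for illustration.
    """
    body = {
        "model": "whisper-1",
        # Raw audio bytes are base64-encoded so they can travel in a JSON body.
        "audio": base64.b64encode(audio).decode("ascii"),
    }
    return urllib.request.Request(
        TRANSCRIBE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage: build a request for a (fake) audio payload; pass it to
# urllib.request.urlopen(req) to actually send it.
req = build_transcription_request(b"\x00\x01fake-wav-bytes", api_key="sk-test")
```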

TTS: tts-1

Standard quality. 24 kHz output, ~300 ms latency. Best for real-time streaming.

Voices: alloy, echo, fable, onyx, nova, shimmer

TTS-HD: tts-1-hd

High-definition. 48 kHz output, ~1.2 s latency. Fuller frequency range, reduced sibilance.

Voices: alloy, echo, fable, onyx, nova, shimmer
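The choice between tts-1 and tts-1-hd is mainly a latency-versus-quality trade-off. The sketch below picks a model from a latency budget using the approximate figures quoted on this page (~300 ms and ~1.2 s). The helper name and the "prefer quality when it fits" policy are illustrative assumptions. Real latency also depends on input length, load, and network conditions.

```python
# Approximate latencies quoted on this page; actual latency will vary.
TTS_LATENCY_S = {
    "tts-1": 0.3,     # standard quality, 24 kHz output
    "tts-1-hd": 1.2,  # high-definition, 48 kHz output
}


def pick_tts_model(latency_budget_s: float, prefer_quality: bool = True) -> str:
    """Return a TTS model whose quoted latency fits within the budget.

    Hypothetical helper: prefers tts-1-hd when it fits and quality is
    preferred, otherwise the fastest fitting model. Raises ValueError
    if no model fits.
    """
    fitting = [m for m, lat in TTS_LATENCY_S.items() if lat <= latency_budget_s]
    if not fitting:
        raise ValueError(f"no TTS model fits a {latency_budget_s} s budget")
    if prefer_quality and "tts-1-hd" in fitting:
        return "tts-1-hd"
    # Otherwise take the fastest model that fits the budget.
    return min(fitting, key=TTS_LATENCY_S.__getitem__)
```

For an interactive voice agent with a sub-second budget this returns tts-1. For offline narration, where a second or two is acceptable, it returns tts-1-hd.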

Pass the voice name in the voice field of the request body. If omitted, the voice defaults to alloy.
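A minimal sketch of serializing a TTS request body that validates the model and voice against the names listed on this page. The "input" field name for the text is a hypothetical placeholder, since the payload schema is not documented here. The client sends voice explicitly, defaulting to alloy. Because this page gives alloy as the default, that should behave the same as omitting the field.

```python
import json

VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
TTS_MODELS = ("tts-1", "tts-1-hd")


def build_tts_body(text: str, model: str = "tts-1", voice: str = "alloy") -> str:
    """Serialize a TTS request body as JSON.

    The "input" key is an assumed field name. voice defaults to "alloy",
    which mirrors the documented server-side default.
    """
    if model not in TTS_MODELS:
        raise ValueError(f"unknown TTS model: {model!r}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}; expected one of {VOICES}")
    return json.dumps({"model": model, "input": text, "voice": voice})


# Usage: high-definition output with the "nova" voice.
body = build_tts_body("Hello from Meridian.", model="tts-1-hd", voice="nova")
```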