Audio models
Speech-to-text and text-to-speech endpoints available through the Meridian API.
Speech-to-text
STT
whisper-1: OpenAI Whisper v3 large. 99 languages, word-level timestamps, diarization hints. Latency ~800ms for 30s clips.
- 16 kHz FLAC / WAV / MP3 input
- prompt parameter for domain vocabulary
- response_format: json, text, srt, vtt
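The parameters above can be checked on the client before a clip is uploaded. The following is a minimal sketch, not the Meridian client: the helper name build_transcription_fields, the use of a model field to carry the model ID, and the json default for response_format are assumptions made for illustration. Only the prompt and response_format parameter names, the four format values, and the three input types come from the list above. The helper sends nothing over the network.

```python
from pathlib import Path

# Values taken from the whisper-1 feature list.
SUPPORTED_EXTENSIONS = {".flac", ".wav", ".mp3"}
RESPONSE_FORMATS = {"json", "text", "srt", "vtt"}


def build_transcription_fields(audio_path, prompt=None, response_format="json"):
    """Validate inputs and return the non-file form fields for a request.

    Hypothetical helper: the "model" field name and the "json" default
    are assumptions, not documented Meridian behaviour.
    """
    suffix = Path(audio_path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio type: {suffix or 'none'}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")

    fields = {"model": "whisper-1", "response_format": response_format}
    if prompt:
        # Domain vocabulary that should bias recognition.
        fields["prompt"] = prompt
    return fields
```

For example, build_transcription_fields("standup.flac", prompt="Meridian, diarization", response_format="srt") returns the fields for a subtitle-style transcript, while a path ending in .ogg raises ValueError before any upload is attempted.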
Text-to-speech
TTS
tts-1: Standard quality. 24 kHz output, ~300ms latency. Best for real-time streaming.
Voices: alloy, echo, fable, onyx, nova, shimmer
TTS-HD
tts-1-hd: High-definition. 48 kHz output, ~1.2s latency. Fuller frequency range, reduced sibilance.
Voices: alloy, echo, fable, onyx, nova, shimmer
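The trade-off between the two models comes down to latency against output sample rate. One way to encode that choice is sketched below. The table values are copied from the two descriptions above and are approximate; the function name pick_tts_model and the selection rule (highest sample rate that fits a latency budget) are illustrative assumptions rather than anything the API prescribes.

```python
# Figures copied from the model descriptions; both latencies are approximate.
TTS_MODELS = {
    "tts-1": {"sample_rate_hz": 24_000, "latency_ms": 300},
    "tts-1-hd": {"sample_rate_hz": 48_000, "latency_ms": 1200},
}


def pick_tts_model(max_latency_ms):
    """Return the highest-sample-rate model within the budget, or None."""
    candidates = [
        name
        for name, spec in TTS_MODELS.items()
        if spec["latency_ms"] <= max_latency_ms
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda name: TTS_MODELS[name]["sample_rate_hz"])
```

With a 500 ms budget this selects tts-1, and with a 2000 ms budget it selects tts-1-hd. A budget below 300 ms returns None, since neither model's listed latency fits.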
Voice options
alloy
echo
fable
onyx
nova
shimmer
Pass voice in the request body. Default: alloy.
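A request body that carries the voice might be assembled as follows. This is a sketch under stated assumptions: the field names model and input, the helper name build_speech_body, and the choice of tts-1 as the helper's default model are hypothetical. Only the voice field, the six voice names, and the two model IDs come from this page. When no voice is given, the helper leaves the field out so that the documented default (alloy) applies.

```python
import json

# Voice names and model IDs as listed on this page.
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
TTS_MODEL_IDS = ("tts-1", "tts-1-hd")


def build_speech_body(text, model="tts-1", voice=None):
    """Return a JSON request body for a text-to-speech call.

    Hypothetical helper: "model" and "input" are assumed field names.
    """
    if model not in TTS_MODEL_IDS:
        raise ValueError(f"unknown model: {model}")
    body = {"model": model, "input": text}
    if voice is not None:
        if voice not in VOICES:
            raise ValueError(f"unknown voice: {voice}")
        body["voice"] = voice
    # With no voice set, the field is omitted and the server default is used.
    return json.dumps(body)
```

Calling build_speech_body("Hello", voice="nova") produces a body with voice set to nova, and an unlisted voice name raises ValueError before a request is made.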