Audio models

Speech-to-text and text-to-speech endpoints available through the Meridian API.

Speech-to-text

STT: whisper-1

OpenAI Whisper v3 large. Supports 99 languages, word-level timestamps, and diarization hints. Latency is about 800 ms for a 30-second clip.

  • 16 kHz FLAC / WAV / MP3 input
  • prompt parameter for domain vocabulary
  • response_format: json, text, srt, vtt
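As a rough illustration of how the two request parameters above might be assembled, here is a minimal Python sketch. The helper name build_transcription_fields, the "model" field name, and the choice of "json" as the fallback format are assumptions made for this example; only prompt, response_format, and the four format values come from this page.

```python
# Response formats listed for whisper-1 on this page.
RESPONSE_FORMATS = ("json", "text", "srt", "vtt")


def build_transcription_fields(response_format="json", prompt=None):
    """Build form fields for a hypothetical whisper-1 transcription request.

    Assumptions: the "model" field name and the "json" fallback are
    illustrative and not confirmed by this page.
    """
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(
            f"response_format must be one of {RESPONSE_FORMATS}, "
            f"got {response_format!r}"
        )
    fields = {"model": "whisper-1", "response_format": response_format}
    # Only send a prompt when the caller supplies domain vocabulary.
    if prompt:
        fields["prompt"] = prompt
    return fields


if __name__ == "__main__":
    # Example: request SRT subtitles and hint at domain-specific terms.
    print(build_transcription_fields("srt", prompt="Meridian, diarization"))
```

Validating the format on the client side catches typos such as "vvt" before any audio is uploaded.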

Text-to-speech

TTS: tts-1

Standard quality. 24 kHz output, ~300 ms latency. Best for real-time streaming.

Voices: alloy, echo, fable, onyx, nova, shimmer

TTS-HD: tts-1-hd

High-definition. 48 kHz output, ~1.2 s latency. Fuller frequency range, reduced sibilance.

Voices: alloy, echo, fable, onyx, nova, shimmer
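The trade-off between the two text-to-speech models can be expressed as a small selection helper. The sketch below is illustrative only: the function name pick_tts_model and the idea of choosing by a latency budget are assumptions, and the figures are the approximate values quoted above, not guarantees.

```python
# Approximate figures quoted on this page, ordered from highest to
# lowest output quality: (model id, sample rate in Hz, latency in ms).
TTS_MODELS = (
    ("tts-1-hd", 48_000, 1200),
    ("tts-1", 24_000, 300),
)


def pick_tts_model(max_latency_ms):
    """Return the highest-quality model whose quoted latency fits the budget.

    Returns None when neither model fits. This is a hypothetical helper,
    not part of the API.
    """
    for model_id, _sample_rate_hz, latency_ms in TTS_MODELS:
        if latency_ms <= max_latency_ms:
            return model_id
    return None


if __name__ == "__main__":
    print(pick_tts_model(500))   # budget suited to real-time streaming
    print(pick_tts_model(2000))  # budget suited to offline rendering
```

With a 500 ms budget the helper falls back to tts-1, which matches the page's note that the standard model is the better fit for real-time streaming.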

Voice options

alloy
echo
fable
onyx
nova
shimmer

Pass voice in the request body. Default: alloy.
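A request body that follows this rule might be built as in the sketch below. The helper name build_speech_body and the "model" and "input" field names are assumptions for illustration; the page only confirms that voice goes in the request body, that the default is alloy, and which voices and model ids exist.

```python
import json

# Voices and model ids listed on this page.
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
MODELS = ("tts-1", "tts-1-hd")


def build_speech_body(text, model="tts-1", voice=None):
    """Serialize a hypothetical text-to-speech request body as JSON.

    Assumptions: the "model" and "input" field names are illustrative.
    When voice is None the field is omitted, so the documented default
    (alloy) would apply.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    if voice is not None and voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {VOICES}")
    body = {"model": model, "input": text}
    if voice is not None:
        body["voice"] = voice
    return json.dumps(body)


if __name__ == "__main__":
    # Explicit voice on the high-definition model.
    print(build_speech_body("Hello from Meridian.", model="tts-1-hd", voice="nova"))
    # No voice given: the request relies on the default.
    print(build_speech_body("Hello again."))
```

Omitting the field rather than hard-coding "alloy" keeps the client in step with the service if the default ever changes.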