Speech-to-text (Whisper)

Convert audio into text using OpenAI's Whisper model. Upload an audio file as multipart/form-data and receive a transcription, optionally with word-level or segment-level timestamps.

POST https://api.getnimbus.net/v1/audio/transcriptions

Authenticate by sending your API key as a Bearer token in the Authorization header.

Request body

Send a multipart/form-data payload with the following fields:

Field	Type	Required	Description
file	binary	Yes	Audio file (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm)
model	string	Yes	Use `whisper-1`
language	string	No	ISO 639-1 code of the spoken language (e.g. `en`); supplying it improves accuracy
response_format	string	No	`json`, `text`, `srt`, `verbose_json`, or `vtt` (default: `json`)
timestamp_granularities	string[]	No	`word` and/or `segment` (requires `response_format` set to `verbose_json`)
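
The constraints in the table can be checked on the client before any audio is uploaded. The following is a minimal Python sketch, not an official SDK: `build_form_fields` is a hypothetical helper name, and sending the array field as repeated `timestamp_granularities[]` entries is an assumption based on the curl example on this page.

```python
from typing import List, Optional, Tuple

ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
ALLOWED_GRANULARITIES = {"word", "segment"}


def build_form_fields(
    model: str = "whisper-1",
    language: Optional[str] = None,
    response_format: str = "json",
    timestamp_granularities: Optional[List[str]] = None,
) -> List[Tuple[str, str]]:
    """Return the non-file form fields as (name, value) pairs.

    Hypothetical helper: it only mirrors the rules listed in the table.
    """
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")

    granularities = timestamp_granularities or []
    unknown = set(granularities) - ALLOWED_GRANULARITIES
    if unknown:
        raise ValueError(f"unsupported granularities: {sorted(unknown)}")
    if granularities and response_format != "verbose_json":
        raise ValueError(
            "timestamp_granularities requires response_format='verbose_json'"
        )

    fields = [("model", model), ("response_format", response_format)]
    if language is not None:
        fields.append(("language", language))
    # Assumption: array values are sent as repeated `name[]` form entries.
    fields.extend(("timestamp_granularities[]", g) for g in granularities)
    return fields
```

Raising early keeps a misconfigured request from spending an upload and a rate-limit slot on a call the server would presumably reject.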

Example

curl https://api.getnimbus.net/v1/audio/transcriptions \
  -H "Authorization: Bearer $MERIDIAN_API_KEY" \
  -F file="@audio.mp3" \
  -F model="whisper-1" \
  -F response_format="verbose_json" \
  -F "timestamp_granularities[]=word"
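
For callers who prefer Python to curl, the same request can be approximated with the standard library alone. This is a sketch under stated assumptions: `encode_multipart` and `transcribe` are hypothetical helper names, the file part is labeled `application/octet-stream` because this page does not say which content type the server expects, and error handling is omitted.

```python
import os
import urllib.request
import uuid
from typing import Iterable, Tuple

API_URL = "https://api.getnimbus.net/v1/audio/transcriptions"


def encode_multipart(
    fields: Iterable[Tuple[str, str]], filename: str, file_bytes: bytes
) -> Tuple[bytes, str]:
    """Encode text fields plus one `file` part; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, api_key: str) -> bytes:
    """POST the audio file and return the raw response body (network call)."""
    fields = [
        ("model", "whisper-1"),
        ("response_format", "verbose_json"),
        ("timestamp_granularities[]", "word"),
    ]
    with open(path, "rb") as audio:
        body, content_type = encode_multipart(
            fields, os.path.basename(path), audio.read()
        )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

A call would look like `transcribe("audio.mp3", os.environ["MERIDIAN_API_KEY"])`, reusing the environment variable name from the curl command.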

Response

{
  "text": "Hello, this is a transcription.",
  "task": "transcribe",
  "language": "english",
  "duration": 2.34,
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.34,
      "text": "Hello, this is a transcription.",
      "words": [
        { "word": "Hello,", "start": 0.0, "end": 0.62 },
        { "word": "this", "start": 0.62, "end": 0.94 },
        { "word": "is", "start": 0.94, "end": 1.12 },
        { "word": "a", "start": 1.12, "end": 1.28 },
        { "word": "transcription.", "start": 1.28, "end": 2.34 }
      ]
    }
  ]
}
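
To work with the word-level timestamps, a client can flatten the nested `segments` and `words` arrays into a single list. The sketch below assumes the response has exactly the shape shown above; `word_timings` is a hypothetical helper name, and the sample is a shortened copy of that response.

```python
import json
from typing import Any, Dict, List, Tuple

# Shortened copy of the sample response on this page.
SAMPLE = """
{
  "text": "Hello, this",
  "segments": [
    {
      "id": 0,
      "words": [
        { "word": "Hello,", "start": 0.0, "end": 0.62 },
        { "word": "this", "start": 0.62, "end": 0.94 }
      ]
    }
  ]
}
"""


def word_timings(response: Dict[str, Any]) -> List[Tuple[str, float, float]]:
    """Return (word, start, end) tuples in order, across all segments."""
    return [
        (word["word"], word["start"], word["end"])
        for segment in response.get("segments", [])
        for word in segment.get("words", [])
    ]


for word, start, end in word_timings(json.loads(SAMPLE)):
    print(f"{start:5.2f} to {end:5.2f}  {word}")
```

Using `.get` with an empty default means a response without `segments` or `words` (for example, one requested with `json` instead of `verbose_json`) yields an empty list rather than raising a `KeyError`.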

Limits

• Maximum file size: 25 MB
• Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
• Rate limit: 50 requests per minute per API key
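
These limits can also be enforced on the client so that oversized files and bursts are caught before a request is sent. The sketch below rests on several assumptions: "25 MB" is read conservatively as 25,000,000 bytes, the format is inferred from the file extension alone, and the rate limit is modeled as a sliding 60-second window, which may not match how the server counts requests. `check_upload` and `SlidingWindowLimiter` are hypothetical names.

```python
import os
import time
from collections import deque
from typing import Callable, Deque

MAX_BYTES = 25_000_000  # assumption: conservative reading of "25 MB"
SUPPORTED = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def check_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file would violate the documented limits."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in SUPPORTED:
        raise ValueError(f"unsupported format: {extension!r}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(
        self,
        max_calls: int = 50,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._calls: Deque[float] = deque()

    def acquire(self) -> float:
        """Record a call and return 0.0, or return the seconds left to wait."""
        now = self.clock()
        # Drop calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return 0.0
        return self.period - (now - self._calls[0])
```

A caller would run `check_upload(path, os.path.getsize(path))` once per file, then loop on `acquire()`, sleeping for the returned number of seconds until it returns `0.0`. The injectable `clock` exists only to make the limiter testable without real delays.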
