
Voice command → API action

Capture spoken commands from the browser, transcribe via Whisper, and trigger Meridian API endpoints with the parsed intent.

Step 1 — Capture audio

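Request microphone access with navigator.mediaDevices.getUserMedia() and record the stream with the browser's MediaRecorder API. Buffer the emitted chunks in memory and, when the user releases the push-to-talk button, combine them into a single audio Blob. MediaRecorder does not produce WAV: it records a compressed container such as audio/webm or audio/mp4, depending on the browser, and Whisper accepts both as-is.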
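A minimal browser-side sketch of this step in TypeScript. It uses the standard getUserMedia and MediaRecorder APIs; the function names pickMimeType and startPushToTalk and the list of candidate MIME types are illustrative assumptions, not part of any Meridian SDK.

```typescript
// Candidate container formats, in order of preference (illustrative list).
// MediaRecorder cannot produce WAV natively; Whisper accepts all of these.
const CANDIDATE_TYPES = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/mp4",
  "audio/ogg;codecs=opus",
];

// Return the first candidate the recorder supports, or "" to let the
// browser pick its default format.
function pickMimeType(isSupported: (type: string) => boolean): string {
  return CANDIDATE_TYPES.find(isSupported) ?? "";
}

// Start recording when push-to-talk is pressed. The returned function stops
// recording (call it on release) and resolves with one combined audio Blob.
async function startPushToTalk(): Promise<() => Promise<Blob>> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
  const chunks: Blob[] = [];

  // Buffer each chunk in memory as the recorder emits it.
  recorder.ondataavailable = (e: BlobEvent) => {
    if (e.data.size > 0) chunks.push(e.data);
  };
  recorder.start();

  return () =>
    new Promise<Blob>((resolve) => {
      recorder.onstop = () => {
        // Release the microphone so the browser's recording indicator turns off.
        stream.getTracks().forEach((track) => track.stop());
        resolve(new Blob(chunks, { type: recorder.mimeType }));
      };
      recorder.stop();
    });
}
```

In a UI, call startPushToTalk() on pointerdown, keep the returned function, and call it on pointerup to obtain the Blob for the transcription step.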

Step 2 — Transcribe

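POST the audio blob to the OpenAI Whisper transcription endpoint. You can call it directly, but doing so from the browser exposes your API key, so in production route the request through your backend. The response contains a plain-text transcript; with response_format set to verbose_json it also includes the detected language, the clip duration, and per-segment details such as avg_logprob that can serve as a rough confidence signal.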
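The sketch below shows the backend half of this step, assuming a Node 18+ (or similar) runtime with global fetch, FormData and Blob. It targets OpenAI's /v1/audio/transcriptions endpoint with the whisper-1 model. The transcribe helper, the Transcript type, the fetchImpl parameter (added so the call can be stubbed in tests) and the filename choice are illustrative assumptions; how the browser uploads the Blob to your backend is left to you.

```typescript
// Subset of the verbose_json response fields this recipe uses.
interface Transcript {
  text: string;
  language?: string;
  duration?: number;
}

// Forward a recorded clip to OpenAI's transcription endpoint.
// Keep apiKey on the server; never ship it to the browser.
async function transcribe(
  audio: Blob,
  apiKey: string,
  fetchImpl: typeof fetch = fetch,
): Promise<Transcript> {
  const form = new FormData();
  // The filename extension helps the API recognise the container format.
  const ext = audio.type.includes("mp4") ? "mp4" : audio.type.includes("ogg") ? "ogg" : "webm";
  form.append("file", audio, `command.${ext}`);
  form.append("model", "whisper-1");
  // verbose_json adds the detected language, duration and per-segment details.
  form.append("response_format", "verbose_json");

  const res = await fetchImpl("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Transcription failed: HTTP ${res.status}`);

  const data = (await res.json()) as Transcript;
  return { text: data.text, language: data.language, duration: data.duration };
}
```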

Step 3 — Parse intent

Send the transcript to a lightweight LLM with a system prompt that maps natural language to Meridian action names and payload shapes, and have the model return structured JSON.
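A sketch of the system prompt and of the validation around the model call, assuming the model is asked to reply with a single JSON object. The action catalogue (create_task, assign_task and their fields) is hypothetical, since the real Meridian action names are not listed here, and the LLM request itself is omitted because it depends on which provider you use.

```typescript
// Structured result expected from the intent-parsing model.
interface ParsedIntent {
  action: string;
  params: Record<string, unknown>;
}

// Hypothetical action catalogue: replace with your real Meridian action
// names and their payload fields.
const ACTIONS: Record<string, string[]> = {
  create_task: ["title", "due"],
  assign_task: ["taskId", "assignee"],
};

// Build a system prompt that lists the allowed actions and demands JSON only.
function buildSystemPrompt(actions: Record<string, string[]>): string {
  const lines = Object.entries(actions).map(
    ([name, fields]) => `- ${name}: { ${fields.join(", ")} }`,
  );
  return [
    "Map the user's spoken command to exactly one action.",
    'Reply with JSON only, shaped as {"action": string, "params": object}.',
    'If nothing matches, reply {"action": "unknown", "params": {}}.',
    "Allowed actions:",
    ...lines,
  ].join("\n");
}

// Validate the model's reply before anything is executed. "unknown" is not in
// the catalogue, so it is rejected too and the UI can ask the user to repeat.
function parseIntentReply(raw: string, actions: Record<string, string[]>): ParsedIntent {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Model reply was not valid JSON");
  }
  const { action, params } = (data ?? {}) as Partial<ParsedIntent>;
  // Own-property check so names like "toString" are not accepted.
  if (typeof action !== "string" || !Object.prototype.hasOwnProperty.call(actions, action)) {
    throw new Error(`Unknown or missing action: ${String(action)}`);
  }
  if (typeof params !== "object" || params === null || Array.isArray(params)) {
    throw new Error("params must be a JSON object");
  }
  return { action, params };
}
```

Rejecting unknown actions at this step, rather than in the execute step, keeps a hallucinated action name from ever reaching the Meridian API.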

Step 4 — Execute

Call the Meridian API with the resolved action and parameters, then surface the result in the UI with a confirmation toast or a spoken text-to-speech (TTS) response.
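A sketch of the execute call plus a spoken confirmation. The request body shape ({ action, params }), the executeAction and speakConfirmation helpers, the baseUrl parameter and the absence of an auth header are assumptions; check the Meridian API reference for the actual schema and authentication of POST /api/actions/execute. speechSynthesis and SpeechSynthesisUtterance are the standard browser Web Speech API; the toast is left to your UI framework.

```typescript
// Same shape as the output of the intent-parsing step.
interface ParsedIntent {
  action: string;
  params: Record<string, unknown>;
}

// Assumed request shape for POST /api/actions/execute (not confirmed).
async function executeAction(
  intent: ParsedIntent,
  baseUrl: string,
  fetchImpl: typeof fetch = fetch,
): Promise<unknown> {
  const res = await fetchImpl(`${baseUrl}/api/actions/execute`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ action: intent.action, params: intent.params }),
  });
  if (!res.ok) throw new Error(`Action ${intent.action} failed: HTTP ${res.status}`);
  return res.json();
}

// Speak a short confirmation via the Web Speech API; a no-op where the API
// is unavailable (for example, outside a browser).
function speakConfirmation(message: string): void {
  if (typeof speechSynthesis === "undefined") return;
  speechSynthesis.speak(new SpeechSynthesisUtterance(message));
}
```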

Endpoints used

  • POST /api/actions/execute
  • POST /api/intent/parse