Recipe: Voice form filler

Capture spoken input via Whisper and map it to structured form fields in real time.

Overview

This recipe demonstrates a voice-to-form pipeline. A user speaks into their microphone, the audio is streamed to OpenAI Whisper for transcription, and the resulting text is parsed into discrete form fields: name, email, phone, and freeform notes.

Architecture

1. Browser captures audio via the MediaRecorder API
2. Audio chunks are streamed to your API route
3. API route forwards the audio to Whisper for transcription
4. Structured extraction maps the text to form fields
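
Steps 1 to 3 above can be sketched roughly as follows. This is a minimal illustration, not the recipe's actual implementation: the function names, the `/api/transcribe` route, and the `{ text }` response shape of that route are all hypothetical. It also records short self-contained segments instead of timeslice chunks, because chunks produced by `MediaRecorder.start(timeslice)` after the first generally lack the container header and cannot be decoded on their own. The file name `segment.webm` assumes the recorder produces WebM; adjust it to match `recorder.mimeType`.

```typescript
// Step 1 (browser): record one self-contained audio segment of `ms` milliseconds.
// A fresh MediaRecorder per segment means the Blob is a complete, decodable file.
async function recordSegment(stream: MediaStream, ms: number): Promise<Blob> {
  const recorder = new MediaRecorder(stream);
  const parts: Blob[] = [];
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) parts.push(event.data);
  };
  const stopped = new Promise<void>((resolve) => {
    recorder.onstop = () => resolve();
  });
  recorder.start();
  setTimeout(() => recorder.stop(), ms);
  await stopped; // dataavailable fires before stop, so `parts` is complete here
  return new Blob(parts, { type: recorder.mimeType });
}

// Step 2 (browser): send the segment to your own API route.
// The route path and its { text } JSON response are assumptions.
async function uploadSegment(segment: Blob, route = "/api/transcribe"): Promise<string> {
  const res = await fetch(route, { method: "POST", body: segment });
  if (!res.ok) throw new Error(`Upload failed with status ${res.status}`);
  const data = (await res.json()) as { text: string };
  return data.text;
}

// Step 3 (server): build the multipart request for OpenAI's transcription endpoint.
function buildTranscriptionRequest(audio: Blob, apiKey: string) {
  const body = new FormData();
  body.append("file", audio, "segment.webm"); // extension assumed; match the real format
  body.append("model", "whisper-1");
  return {
    url: "https://api.openai.com/v1/audio/transcriptions",
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}` }, // fetch sets the multipart Content-Type
      body,
    },
  };
}

// Step 3 (server): forward the audio and return the transcript text.
async function transcribeSegment(audio: Blob, apiKey: string): Promise<string> {
  const { url, init } = buildTranscriptionRequest(audio, apiKey);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Transcription failed with status ${res.status}`);
  const data = (await res.json()) as { text: string };
  return data.text;
}
```

Keeping the request construction in its own function makes the server step easy to check without a network call, and keeps the API key on the server rather than in the browser.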

Field mapping

Spoken phrase               Extracted field
"My name is Jane Doe"       name
"Email jane@example.com"    email
"Phone is 555-0123"         phone
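
To make the mapping concrete, here is a small pattern-based sketch that handles the three phrasings in the table and treats any other sentence as freeform notes. It is only a stand-in: the recipe itself refers to an extraction prompt, which is not reproduced here, and the `FormFields` type, the `extractFields` function, and the regular expressions are hypothetical. Real transcripts may spell out addresses or digits differently, so these patterns should not be read as robust.

```typescript
interface FormFields {
  name?: string;
  email?: string;
  phone?: string;
  notes?: string;
}

// Patterns for the phrasings shown in the field-mapping table (case-insensitive).
const NAME_RE = /^my name is (.+)$/i;
const EMAIL_RE = /^(?:my )?email(?: is)? ([\w.+-]+@[\w-]+(?:\.[\w-]+)+)$/i;
const PHONE_RE = /^(?:my )?phone(?: number)?(?: is)? (\+?\d[\d\s()-]*\d)$/i;

function extractFields(transcript: string): FormFields {
  const fields: FormFields = {};
  const notes: string[] = [];

  // Split on sentence punctuation followed by whitespace (or at the very end),
  // so the dots inside an email address do not break a sentence apart.
  const sentences = transcript
    .split(/[.!?]\s+|[.!?]$/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  for (const sentence of sentences) {
    const name = sentence.match(NAME_RE);
    const email = sentence.match(EMAIL_RE);
    const phone = sentence.match(PHONE_RE);
    if (name) fields.name = name[1].trim();
    else if (email) fields.email = email[1];
    else if (phone) fields.phone = phone[1];
    else notes.push(sentence); // anything unrecognized becomes freeform notes
  }

  if (notes.length > 0) fields.notes = notes.join(" ");
  return fields;
}
```

For example, `extractFields("My name is Jane Doe. Email jane@example.com. Phone is 555-0123.")` fills `name`, `email`, and `phone` and leaves `notes` unset.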

Requirements

  • OpenAI API key with Whisper access
  • Browser supporting MediaRecorder (Chrome 49+, Firefox 25+)
  • HTTPS origin (or localhost) for getUserMedia
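
The browser-side requirements can be checked at runtime before showing the microphone button. The sketch below is one possible approach; `BrowserEnv` and `missingRequirements` are hypothetical names, and the function takes the environment as a parameter (in a page you would pass `window`) so it can be exercised outside a browser. The API key requirement is server-side and is not checked here.

```typescript
// Minimal structural view of the browser globals this check relies on.
interface BrowserEnv {
  isSecureContext?: boolean;
  MediaRecorder?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Returns a list of unmet requirements; an empty list means capture should be possible.
function missingRequirements(env: BrowserEnv): string[] {
  const missing: string[] = [];
  if (env.isSecureContext !== true) {
    missing.push("secure context (HTTPS or localhost)");
  }
  if (typeof env.MediaRecorder !== "function") {
    missing.push("MediaRecorder");
  }
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== "function") {
    missing.push("getUserMedia");
  }
  return missing;
}
```

If the returned list is non-empty, the form can fall back to plain keyboard input and tell the user which capability is missing.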

The full implementation, with streaming hooks and the extraction prompt, is available in the recipes repository.