Recipe: Voice form filler
Capture spoken input via Whisper and map it to structured form fields in real time.
Overview
This recipe demonstrates a voice-to-form pipeline. A user speaks into their microphone, the audio is streamed to OpenAI Whisper for transcription, and the resulting text is parsed into discrete form fields — name, email, phone, and freeform notes.
Architecture
1. Browser captures audio via the MediaRecorder API
2. Audio chunks are streamed to your API route
3. API route forwards the audio to Whisper for transcription
4. Structured extraction maps the transcribed text to form fields
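As a rough illustration of step 3, the API route could forward a recorded audio blob to OpenAI's transcription endpoint along these lines. This is a minimal sketch, not the repository's implementation: the function names `buildWhisperRequest` and `transcribe` and the file name `chunk.webm` are assumptions made for the example, and it relies on the global `fetch`, `FormData`, and `Blob` available in Node 18+.

```typescript
// Sketch of step 3: forward one recorded audio blob to the Whisper
// transcription endpoint. Function names here are hypothetical.
const WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions";

// Build the multipart request. The file name "chunk.webm" is an assumption;
// it should match the container the browser's MediaRecorder actually produced.
function buildWhisperRequest(audio: Blob, apiKey: string) {
  const body = new FormData();
  body.append("file", audio, "chunk.webm");
  body.append("model", "whisper-1");
  return {
    url: WHISPER_URL,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}` },
      body,
    },
  };
}

// Send the request and return the transcribed text.
// fetchImpl is injectable so the function can be exercised without network access.
async function transcribe(
  audio: Blob,
  apiKey: string,
  fetchImpl: typeof fetch = fetch,
): Promise<string> {
  const { url, init } = buildWhisperRequest(audio, apiKey);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`Transcription failed with status ${res.status}`);
  }
  const data = (await res.json()) as { text: string };
  return data.text;
}
```

Keeping this call inside the API route, rather than in the browser, means the OpenAI API key never reaches the client.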
Field mapping
| Spoken phrase | Extracted field |
|---|---|
| "My name is Jane Doe" | name |
| "Email jane@example.com" | email |
| "Phone is 555-0123" | phone |
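To make the mapping concrete, here is a small sketch that covers only the three phrasings in the table. It uses regular expressions as a stand-in for illustration; the recipe itself refers to an extraction prompt, so this is a swapped-in technique, not the recipe's method. The `FormFields` type, the patterns, and `extractFields` are all hypothetical names, and a real transcript may not spell an email address this cleanly.

```typescript
// Fields named in the overview. Every field is optional because a speaker
// may give them in any order or leave some out.
interface FormFields {
  name?: string;
  email?: string;
  phone?: string;
  notes?: string;
}

// Hypothetical patterns covering only the three phrasings in the table.
const NAME_PATTERN = /\b[Nn]ame is\s+([A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*)/;
const EMAIL_PATTERN = /\bemail(?: is)?\s+([\w.+-]+@[\w-]+(?:\.[\w-]+)+)/i;
const PHONE_PATTERN = /\bphone(?: number)?(?: is)?\s+(\+?\d[\d\s().-]*\d)/i;

function extractFields(transcript: string): FormFields {
  const fields: FormFields = {};
  const notes: string[] = [];

  // Split on sentence-ending punctuation followed by whitespace, so the dots
  // inside "jane@example.com" do not break a sentence apart.
  for (const sentence of transcript.split(/[.!?]\s+/)) {
    const nameMatch = sentence.match(NAME_PATTERN);
    const emailMatch = sentence.match(EMAIL_PATTERN);
    const phoneMatch = sentence.match(PHONE_PATTERN);
    if (nameMatch) fields.name = nameMatch[1];
    if (emailMatch) fields.email = emailMatch[1];
    if (phoneMatch) fields.phone = phoneMatch[1];
    // Anything that matched no pattern is kept as freeform notes.
    if (!nameMatch && !emailMatch && !phoneMatch && sentence.trim() !== "") {
      notes.push(sentence.trim());
    }
  }

  if (notes.length > 0) fields.notes = notes.join(". ");
  return fields;
}
```

For the input `"My name is Jane Doe. Email jane@example.com. Phone is 555-0123. Please call after five."`, this sketch returns `name: "Jane Doe"`, `email: "jane@example.com"`, `phone: "555-0123"`, and `notes: "Please call after five."`.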
Requirements
- OpenAI API key with Whisper access
- Browser supporting MediaRecorder (Chrome 49+, Firefox 25+)
- HTTPS origin for getUserMedia
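The browser-side requirements can be checked before the recorder UI is shown. The sketch below is one possible approach; `BrowserEnv` and `missingRequirements` are hypothetical names, and the environment is passed in as an argument so the check can also run outside a browser.

```typescript
// Minimal shape of the browser globals this check reads. Passing them in
// keeps the function usable outside a browser (for example in tests).
interface BrowserEnv {
  isSecureContext?: boolean;
  MediaRecorder?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Return a human-readable list of unmet requirements (empty when all are met).
function missingRequirements(env: BrowserEnv): string[] {
  const missing: string[] = [];
  if (!env.isSecureContext) {
    missing.push("secure context (HTTPS)");
  }
  if (typeof env.MediaRecorder !== "function") {
    missing.push("MediaRecorder");
  }
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== "function") {
    missing.push("getUserMedia");
  }
  return missing;
}
```

In a page script this would typically be called as `missingRequirements(window)`, falling back to a plain typed form when the returned list is not empty.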
The full implementation, including the streaming hooks and the extraction prompt, is available in the recipes repository.