Recipe Video Captioning
End-to-end workflow for generating accurate, timed captions from cooking videos using Meridian's multimodal pipeline.
Overview
Upload a recipe video and Meridian transcribes speech, identifies cooking steps, and overlays synchronized captions. The pipeline handles background noise, overlapping dialogue, and ingredient terminology out of the box.
Step 1 — Upload
POST your video to /api/v1/captioning/upload with multipart form data. Supported formats: MP4, MOV, WebM. Maximum file size is 2 GB.
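Checking the format and size limits locally before sending the request avoids a wasted upload. The following is a minimal sketch, not part of Meridian's tooling: the function name check_upload is hypothetical, and reading "2 GB" as 2 × 1024³ bytes is an assumption, since the exact byte limit is not stated.

```python
from pathlib import Path

# Limits taken from the text above. Treating "2 GB" as 2 * 1024**3 bytes
# is an assumption; the service may count decimal gigabytes instead.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".webm"}
MAX_UPLOAD_BYTES = 2 * 1024**3


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_UPLOAD_BYTES}")
    return problems
```

This check only looks at the file extension, so a mislabeled container would still pass it and be rejected, if at all, by the server.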
curl -X POST https://api.getnimbus.net/v1/captioning/upload \
-H "Authorization: Bearer $MERIDIAN_KEY" \
-F "video=@pasta_tutorial.mp4"
Step 2 — Process
The upload response returns a job ID. Use it to poll /api/v1/captioning/status until processing finishes, which typically takes under 90 seconds for a 10-minute video.
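The polling loop can be sketched as below. The response schema of the status endpoint is not described here, so the "status" key and the "completed" and "failed" values are hypothetical placeholders, as is the wait_for_job helper itself. The HTTP call is left to a caller-supplied function so the sketch makes no assumption about the client library.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval_s: float = 3.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports a terminal state or the timeout elapses.

    fetch_status is supplied by the caller; it should GET the status endpoint
    for the job ID and return the parsed JSON body. The "status" key and the
    "completed"/"failed" values are assumptions, not documented fields.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_status()
        state = body.get("status")
        if state == "completed":
            return body
        if state == "failed":
            raise RuntimeError(f"captioning job failed: {body}")
        if time.monotonic() >= deadline:
            raise TimeoutError("captioning job did not finish in time")
        sleep(interval_s)
```

A fixed interval of a few seconds is reasonable given the typical processing time quoted above; a longer video may justify a larger timeout.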
Speech-to-Text
Whisper-large-v3 tuned on culinary vocabulary.
Timestamp Alignment
Word-level sync with forced alignment.
Step 3 — Retrieve
Download the SRT or WebVTT file from /api/v1/captioning/download. Captions include ingredient names, quantities, and technique labels as metadata.
Output Format
1
00:00:02,140 --> 00:00:05,820
Start by dicing two medium onions
[ingredient: onion] [quantity: 2]

2
00:00:06,100 --> 00:00:09,450
Heat olive oil in a large skillet
[technique: sauté] [tool: skillet]
Pro Tip
Enable "recipe mode" with the recipe_mode=true query parameter to extract structured ingredient lists and step-by-step instructions alongside captions.
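The bracketed metadata in the Output Format sample can be separated from the spoken text with a small parser. This is a sketch under assumptions: the [key: value] tag syntax is inferred from the sample alone, and Cue and parse_srt are hypothetical names, not part of any Meridian SDK.

```python
import re
from dataclasses import dataclass, field

# SRT timing line, e.g. "00:00:02,140 --> 00:00:05,820".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)
# Metadata tag as it appears in the sample, e.g. "[ingredient: onion]".
TAG = re.compile(r"\[(\w+):\s*([^\]]+)\]")


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str = ""
    metadata: dict = field(default_factory=dict)


def _to_ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(srt_text):
    """Split SRT text into cues, moving [key: value] tags into metadata."""
    lines = [ln.strip() for ln in srt_text.splitlines()]
    cues = []
    for i, line in enumerate(lines):
        timing = TIMING.fullmatch(line)
        if timing:
            g = timing.groups()
            cues.append(Cue(_to_ms(*g[:4]), _to_ms(*g[4:])))
            continue
        if not cues or not line:
            continue  # skip the first index line and blank separators
        next_is_timing = i + 1 < len(lines) and TIMING.fullmatch(lines[i + 1])
        if line.isdigit() and next_is_timing:
            continue  # a bare number right before a timing line is a cue index
        cue = cues[-1]
        for key, value in TAG.findall(line):
            cue.metadata[key] = value.strip()
        spoken = TAG.sub("", line).strip()
        if spoken:
            cue.text = f"{cue.text} {spoken}".strip()
    return cues
```

Run on the sample, the first cue has text "Start by dicing two medium onions", a start time of 2140 ms, and metadata {"ingredient": "onion", "quantity": "2"}. Values are kept as strings, so any numeric conversion of quantities is left to the caller.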