Recipe Video Captioning

End-to-end workflow for generating accurate, timed captions from cooking videos using Meridian's multimodal pipeline.

Overview

Upload a recipe video and Meridian transcribes speech, identifies cooking steps, and overlays synchronized captions. The pipeline handles background noise, overlapping dialogue, and ingredient terminology out of the box.

Step 1 — Upload

POST your video to /api/v1/captioning/upload with multipart form data. Supported formats: MP4, MOV, WebM. Maximum file size is 2 GB.

curl -X POST https://api.getnimbus.net/api/v1/captioning/upload \
  -H "Authorization: Bearer $MERIDIAN_KEY" \
  -F "video=@pasta_tutorial.mp4"
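
A client can catch the format and size limits stated above before sending the request. The following is a minimal sketch in Python; the function name check_upload is hypothetical, and it assumes "2 GB" means decimal gigabytes, which is the more conservative reading.

```python
import os

# Limits taken from the upload step above.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".webm"}
# Assumption: "2 GB" is read as 2 * 10**9 bytes (the stricter interpretation).
MAX_UPLOAD_BYTES = 2 * 10**9


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

For a file on disk, pass os.path.getsize(path) as size_bytes. The check only looks at the file extension, so it does not guarantee the server will accept the container or codec.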

Step 2 — Process

Use the job ID returned by the upload request to poll /api/v1/captioning/status. Processing typically completes in under 90 seconds for a 10-minute video.
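
Polling can be sketched as a loop with a fixed interval and an overall timeout. The response shape is not documented here, so the "status" key and the values "completed" and "failed" below are assumptions, and fetch_status is a caller-supplied function that performs the actual GET and returns the decoded JSON body.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval: float = 3.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports a terminal state or the timeout elapses.

    The "status" key and its values are assumed, not documented field names.
    """
    waited = 0.0
    while True:
        body = fetch_status(job_id)
        state = body.get("status")
        if state == "completed":
            return body
        if state == "failed":
            raise RuntimeError(f"captioning job {job_id} failed: {body}")
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} not finished after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing the HTTP call and the sleep function in as parameters keeps the loop independent of any particular HTTP client and makes it easy to exercise without a network connection.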

Speech-to-Text

Transcription uses Whisper-large-v3 fine-tuned on culinary vocabulary.

Timestamp Alignment

Word-level synchronization using forced alignment.
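
To illustrate how word-level timestamps relate to caption timing, the sketch below groups timed words into cues and formats their boundaries as SRT timestamps. It does not perform forced alignment and is not Meridian's implementation; the (text, start, end) tuple layout, the function names, and the max_gap and max_words thresholds are all assumptions made for the example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def group_words(words, max_gap=0.6, max_words=8):
    """Group (text, start, end) word tuples into (start, end, text) cues.

    A new cue starts when the pause before a word exceeds max_gap seconds
    or the current cue already holds max_words words.
    """
    cues, current = [], []
    for word in words:
        pause_too_long = bool(current) and word[1] - current[-1][2] > max_gap
        if current and (pause_too_long or len(current) >= max_words):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)
    return [
        (srt_timestamp(c[0][1]), srt_timestamp(c[-1][2]), " ".join(w[0] for w in c))
        for c in cues
    ]
```

Each cue starts at the first word's start time and ends at the last word's end time, which is what keeps captions synchronized at word granularity.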

Step 3 — Retrieve

Download the SRT or WebVTT file from /api/v1/captioning/download. Captions include ingredient names, quantities, techniques, and tools as bracketed metadata tags.

Output Format

1
00:00:02,140 --> 00:00:05,820
Start by dicing two medium onions
[ingredient: onion] [quantity: 2]

2
00:00:06,100 --> 00:00:09,450
Heat olive oil in a large skillet
[technique: sauté] [tool: skillet]
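
A downloaded SRT file in this shape can be split into cues with the bracketed tags separated from the spoken text. The parser below is a sketch that assumes metadata always sits on its own line as [key: value] pairs, as in the sample above; parse_captions is a hypothetical helper, not part of the API.

```python
import re

# Matches one bracketed tag such as "[ingredient: onion]".
TAG = re.compile(r"\[(\w+):\s*([^\]]+)\]")


def parse_captions(srt_text: str) -> list[dict]:
    """Parse SRT text into cues with text and tag metadata kept apart."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 2 or not lines[0].isdigit() or "-->" not in lines[1]:
            continue  # skip anything that is not a well-formed cue
        start, end = (part.strip() for part in lines[1].split("-->"))
        text, metadata = [], {}
        for line in lines[2:]:
            tags = TAG.findall(line)
            if tags and not TAG.sub("", line).strip():
                # Line holds only tags; a repeated key overwrites the earlier value.
                metadata.update({key: value.strip() for key, value in tags})
            else:
                text.append(line)
        cues.append({
            "index": int(lines[0]),
            "start": start,
            "end": end,
            "text": " ".join(text),
            "metadata": metadata,
        })
    return cues
```

Because metadata is stored in a dictionary, a cue with two tags of the same kind (for example two ingredients) would keep only the last one; a list of pairs would be the safer structure if that case occurs in practice.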

Pro Tip

Enable "recipe mode" with the recipe_mode=true query parameter to extract structured ingredient lists and step-by-step instructions alongside captions.
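
The text above does not say which endpoint accepts recipe_mode, so the sketch below simply adds the parameter to whatever request URL is given while preserving any existing query string. The helper name with_recipe_mode and the example host are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_recipe_mode(url: str, enabled: bool = True) -> str:
    """Return url with recipe_mode set, keeping other query parameters."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["recipe_mode"] = "true" if enabled else "false"
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder host; substitute the real API base URL.
print(with_recipe_mode("https://example.invalid/api/v1/captioning/download"))
```

Building the query with urlencode rather than string concatenation avoids a doubled "?" when the URL already carries parameters.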