← Recipes
Recipe

Recipe: Screenshot QA

Feed a screenshot to a vision model and get a structured bug report in return.

Ingredients

  • A vision-capable model (GPT-4V, Claude 3.5 Sonnet, Gemini Pro Vision)
  • One screenshot of a UI bug, glitch, or unexpected state
  • A prompt template that asks for structured output

Steps

  1. Capture the screenshot. Include the full viewport — address bar, console errors, and surrounding UI all provide context the model can use.
  2. Paste into the prompt template. Ask the model to describe what it sees, identify anomalies, and propose a root cause.
  3. Request structured output. Specify fields: summary, severity, repro steps, expected vs actual, and a hypothesis.
  4. Review and refine. The first pass is a draft. Edit for accuracy, add reproduction steps the model cannot infer, and attach the original screenshot.

Prompt Template

You are a QA engineer reviewing a screenshot of a web
application. Analyze the image and return a JSON object:

{
  "summary": "one-line description",
  "severity": "low | medium | high | critical",
  "expected": "what should have happened",
  "actual": "what the screenshot shows",
  "hypothesis": "likely root cause",
  "repro_steps": ["step 1", "step 2"]
}

Only return valid JSON. No markdown wrapping.

Why This Works

Vision models excel at pattern recognition. A garbled layout, missing element, or misaligned text jumps out immediately — often faster than a human scanning the same frame. Structured output means the result lands in your tracker without reformatting.