Recipe: Screenshot QA
Feed a screenshot to a vision model and get a structured bug report in return.
Ingredients
- A vision-capable model (GPT-4V, Claude 3.5 Sonnet, Gemini Pro Vision)
- One screenshot of a UI bug, glitch, or unexpected state
- A prompt template that asks for structured output
Steps
- Capture the screenshot. Include the full browser window: the address bar, an open console showing any errors, and the surrounding UI all give the model context.
- Attach the screenshot to the prompt template. Ask the model to describe what it sees, identify anomalies, and propose a root cause.
- Request structured output. Specify fields: summary, severity, repro steps, expected vs actual, and a hypothesis.
- Review and refine. The first pass is a draft. Edit for accuracy, add reproduction steps the model cannot infer, and attach the original screenshot.
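The capture-and-attach steps above can be sketched in Python. This is a minimal, provider-neutral sketch: the helper names `screenshot_to_data_url` and `build_request` and the shape of the returned dict are assumptions made for illustration, not any vendor's API. Each provider's SDK has its own request format, so the dict would need to be mapped onto it.

```python
import base64


def screenshot_to_data_url(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a base64 data URL."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def build_request(png_bytes: bytes, prompt: str) -> dict:
    """Bundle the prompt text and the encoded screenshot.

    The dict layout is hypothetical; adapt it to the request
    format your model provider actually expects.
    """
    return {
        "prompt": prompt,
        "image": screenshot_to_data_url(png_bytes),
    }
```

A call might look like `build_request(Path("bug.png").read_bytes(), template_text)`, where `bug.png` and `template_text` stand in for your own screenshot file and the prompt template below.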
Prompt Template
You are a QA engineer reviewing a screenshot of a web
application. Analyze the image and return a JSON object:
{
  "summary": "one-line description",
  "severity": "low | medium | high | critical",
  "expected": "what should have happened",
  "actual": "what the screenshot shows",
  "hypothesis": "likely root cause",
  "repro_steps": ["step 1", "step 2"]
}
Only return valid JSON. No markdown wrapping.
Why This Works
Vision models are good at visual pattern recognition. A garbled layout, missing element, or misaligned text is often flagged right away, sometimes faster than a human scanning the same frame. Because the output is structured, the result can go into your tracker without reformatting.
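Before a report goes into a tracker, it helps to confirm the reply really matches the fields in the prompt template. The following is a minimal sketch of such a check, assuming the model's reply arrives as a plain string. The function name `parse_bug_report` is hypothetical, and the choice to tolerate stray text around the JSON object is a design assumption, since models do not always follow the "no markdown wrapping" instruction.

```python
import json

# Field names and severity values taken from the prompt template.
REQUIRED_FIELDS = (
    "summary", "severity", "expected", "actual", "hypothesis", "repro_steps",
)
SEVERITIES = {"low", "medium", "high", "critical"}


def parse_bug_report(reply: str) -> dict:
    """Parse a model reply and validate it against the template fields.

    Raises ValueError if the reply is not usable as a bug report.
    """
    # Keep only the outermost {...} span so stray wrapping text is ignored.
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in reply")

    # json.JSONDecodeError is a subclass of ValueError.
    report = json.loads(reply[start:end + 1])
    if not isinstance(report, dict):
        raise ValueError("reply is not a JSON object")

    missing = [name for name in REQUIRED_FIELDS if name not in report]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")

    if report["severity"] not in SEVERITIES:
        raise ValueError(f"unexpected severity: {report['severity']!r}")

    steps = report["repro_steps"]
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        raise ValueError("repro_steps must be a list of strings")

    return report
```

A reply that fails the check can be sent back to the model for another attempt or routed to a human for manual triage, which fits the review step in the recipe.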