Recipe
Metric anomaly explainer
Surface the root cause behind sudden metric shifts using layered decomposition and natural-language summaries.
Ingredients
- Time-series metric feed (Prometheus / Datadog / custom)
- Decomposition engine (STL or Prophet backend)
- LLM summarizer with structured prompt template
- Threshold config: z-score ≥ 3.0, min deviation 5%
Steps
- Ingest — pull the last 90 days of the target metric at native resolution.
- Decompose — split into trend, seasonal, and residual components.
- Detect — flag residuals exceeding the z-score threshold.
- Correlate — join anomaly windows against deployment events, config changes, and upstream dependency metrics.
- Explain — pass structured context to the LLM; emit a 3-bullet summary with confidence scores.
Output
Anomaly: p99 latency +340ms
Window: 14:22–14:29 UTC
Trend: +12ms/week (within baseline)
Seasonal: 0.94 match to daily pattern
Residual: 4.1σ
→ Deploy v2.7.1 (14:21) introduced connection-pool shrink. Rollback recommended. Confidence: 0.91.
Guardrails
- Require ≥ 60 days of history before emitting explanations.
- Suppress alerts during known maintenance windows.
- Rate-limit LLM calls to 10/hour per metric.