Recipe

Metric anomaly explainer

Surface the root cause behind sudden metric shifts using layered decomposition and natural-language summaries.

Ingredients

  • Time-series metric feed (Prometheus / Datadog / custom)
  • Decomposition engine (STL or Prophet backend)
  • LLM summarizer with structured prompt template
  • Threshold config: z-score ≥ 3.0, min deviation 5%

Steps

  1. Ingest — pull the last 90 days of the target metric at native resolution.
  2. Decompose — split into trend, seasonal, and residual components.
  3. Detect — flag residuals exceeding the z-score threshold.
  4. Correlate — join anomaly windows against deployment events, config changes, and upstream dependency metrics.
  5. Explain — pass structured context to the LLM; emit a 3-bullet summary with confidence scores.

Output

Anomaly: p99 latency +340ms

Window: 14:22–14:29 UTC

Trend: +12ms/week (within baseline)

Seasonal: 0.94 match to daily pattern

Residual: 4.1σ

→ Deploy v2.7.1 (14:21) introduced connection-pool shrink. Rollback recommended. Confidence: 0.91.

Guardrails

  • Require ≥ 60 days of history before emitting explanations.
  • Suppress alerts during known maintenance windows.
  • Rate-limit LLM calls to 10/hour per metric.
Meridian · getnimbus.net