Recipe

SLO and error budget

Define service-level objectives (SLOs) and spend error budgets deliberately to balance reliability against delivery velocity.

What is an SLO?

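A Service-Level Objective (SLO) is a target for how reliable your service should be over a measurement window, for example 99.9% availability or p95 latency under 100 ms over 30 days. It is an internal target, usually set stricter than any Service-Level Agreement (SLA) you commit to contractually, so that you notice problems before the contract is breached.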

Error budget

The error budget is 100% minus your SLO. With a 99.9% SLO, 0.1% of the window may be lost to failures, whether they come from risky deploys, experiments, or planned maintenance; over 30 days that is about 43 minutes of downtime. When the budget is exhausted, freeze non-essential changes until reliability recovers.
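As a rough sketch of the arithmetic above, the following converts an availability SLO into minutes of allowed downtime and reports how much budget is left. The function names and the 30-day default window are assumptions made for illustration, not part of any particular tool.

```python
def error_budget_minutes(slo_percent: float, window_days: float = 30.0) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    Hypothetical helper: the error budget is (100% - SLO) of the window.
    """
    if not 0.0 < slo_percent < 100.0:
        raise ValueError("slo_percent must be strictly between 0 and 100")
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - slo_percent) / 100.0


def budget_remaining_fraction(
    slo_percent: float, downtime_minutes: float, window_days: float = 30.0
) -> float:
    """Fraction of the budget still unspent; negative once it is overspent."""
    budget = error_budget_minutes(slo_percent, window_days)
    return (budget - downtime_minutes) / budget


def changes_frozen(
    slo_percent: float, downtime_minutes: float, window_days: float = 30.0
) -> bool:
    """Assumed policy: freeze once the budget is fully spent."""
    return budget_remaining_fraction(slo_percent, downtime_minutes, window_days) <= 0.0
```

For a 99.9% SLO over 30 days, `error_budget_minutes(99.9)` comes to roughly 43.2 minutes; a 99.99% SLO leaves about a tenth of that.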

Burn rate alerts

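A burn rate measures how fast you consume error budget relative to the SLO window: a burn rate of 1 uses up the budget exactly at the end of the window, and higher rates exhaust it sooner. Alert when a sustained burn rate threatens the budget. For a 30-day window, common thresholds are 14.4x over 1 hour for a critical alert (2% of the budget consumed) and 1x over 3 days for a warning (10% consumed).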
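The burn-rate figures can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a 30-day (720-hour) SLO window; the function names are hypothetical and the error ratio would come from your own metrics.

```python
SLO_WINDOW_HOURS = 30 * 24  # assumed 30-day SLO window


def burn_rate(error_ratio: float, slo: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows.

    `slo` is a fraction, e.g. 0.999 for 99.9%.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return error_ratio / (1.0 - slo)


def budget_consumed(rate: float, alert_window_hours: float) -> float:
    """Fraction of the whole budget spent if `rate` holds for the alert window."""
    return rate * alert_window_hours / SLO_WINDOW_HOURS


def hours_to_exhaustion(rate: float) -> float:
    """Hours until a full budget is gone at a constant burn rate."""
    if rate <= 0.0:
        raise ValueError("rate must be positive")
    return SLO_WINDOW_HOURS / rate
```

Under these assumptions, a 14.4x burn held for 1 hour spends 2% of the budget and would exhaust it in 50 hours, while a 1x burn held for 3 days spends 10%.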

Multi-window approach

Track error budgets over rolling windows: 30 days for long-term trends, 7 days for sprint-level decisions, and 1 day for operational response. Each window gates different classes of change.
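One way to sketch the rolling windows is to keep per-day request counts and compute the remaining budget over the most recent 30, 7, and 1 days. The `DayCounts` type, the function names, and the request-based availability measure are assumptions for illustration; which changes each window gates is left to your own policy.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

WINDOWS_DAYS = (30, 7, 1)  # windows named in the text above


@dataclass(frozen=True)
class DayCounts:
    """Hypothetical daily aggregate: successful and total requests."""
    good: int
    total: int


def budget_remaining(days: Sequence[DayCounts], window_days: int, slo: float) -> float:
    """Fraction of error budget left over the most recent `window_days`.

    `days` is ordered oldest to newest. The result is negative when the
    window has overspent its budget.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    recent = days[-window_days:]
    total = sum(d.total for d in recent)
    if total == 0:
        return 1.0  # no traffic, so nothing was spent
    bad = total - sum(d.good for d in recent)
    allowed_bad = total * (1.0 - slo)
    return 1.0 - bad / allowed_bad


def budget_report(days: Sequence[DayCounts], slo: float) -> Dict[int, float]:
    """Remaining budget per rolling window, keyed by window length in days."""
    return {w: budget_remaining(days, w, slo) for w in WINDOWS_DAYS}
```

A single bad day illustrates why the windows disagree: the same incident can overspend the 1-day budget several times over while leaving most of the 30-day budget intact.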