Alert fatigue reduction playbook
A step-by-step recipe to cut noise, restore signal, and stop your team from muting channels.
Ingredients
- Access to your alerting platform (PagerDuty, Opsgenie, or similar)
- 30 days of historical alert data (CSV export is fine)
- Stakeholder buy-in from one engineering lead
- A spreadsheet or notebook for the audit
Steps
- 1
Export & classify
Pull 30 days of alerts. Tag each as actionable, informational, or noise. Aim for <20% noise — if you are above 40%, you have a culture problem, not a tool problem.
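A minimal sketch of the tally for this step, assuming the export has been hand-tagged in a column named `classification`. The column name, the sample rows, and the helper names are illustrative assumptions, not a format any particular alerting platform produces.

```python
import csv
import io
from collections import Counter

# Hypothetical export: one row per alert, with a hand-filled
# "classification" column (actionable / informational / noise).
SAMPLE_CSV = """alert_id,source,classification
1,disk-usage,noise
2,api-latency,actionable
3,disk-usage,noise
4,deploy-bot,informational
5,api-latency,actionable
"""


def classification_shares(csv_text):
    """Return {classification: fraction of all alerts}."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["classification"].strip().lower() for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def verdict(noise_fraction):
    """Compare a noise fraction with the 20% target and 40% warning line."""
    if noise_fraction < 0.20:
        return "on target"
    if noise_fraction > 0.40:
        return "above 40%: likely a culture problem"
    return "needs tuning"


shares = classification_shares(SAMPLE_CSV)
print(shares)
print(verdict(shares.get("noise", 0.0)))
```

For a real export, read the file contents and pass them to `classification_shares` in place of `SAMPLE_CSV`.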
- 2
Identify the top-3 offenders
Sort by volume. The top three alert sources usually account for 60–80% of total noise. Focus here first — do not boil the ocean.
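The ranking in this step can be sketched as a frequency count over the tagged alerts. The `(source, classification)` tuple shape and the sample numbers below are made up for illustration; the point is to show how much of the noise the top three sources carry.

```python
from collections import Counter


def top_offenders(alerts, n=3):
    """Rank sources by noise volume.

    alerts: iterable of (source, classification) pairs.
    Returns (top, share): the n noisiest sources with their counts,
    and the fraction of all noise those sources account for.
    """
    noise_by_source = Counter(src for src, cls in alerts if cls == "noise")
    total_noise = sum(noise_by_source.values())
    top = noise_by_source.most_common(n)
    share = sum(count for _, count in top) / total_noise if total_noise else 0.0
    return top, share


# Illustrative data only: 100 noise alerts spread over five sources.
alerts = (
    [("disk-usage", "noise")] * 50
    + [("cert-expiry", "noise")] * 20
    + [("queue-depth", "noise")] * 15
    + [("cpu-spike", "noise")] * 10
    + [("heartbeat", "noise")] * 5
    + [("api-latency", "actionable")] * 30
)

top, share = top_offenders(alerts)
print(top)
print(f"Top 3 sources carry {share:.0%} of noise")
```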
- 3
Tune or tombstone
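For each top offender: adjust thresholds, widen evaluation windows, aggregate duplicates, or delete the rule entirely. If nobody has acted on an alert in 90 days, it is dead weight. Checking that requires more history than the 30-day export, so pull a longer range for these rules if your platform retains it.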
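Two of the options above lend themselves to a quick script: finding tombstone candidates and aggregating duplicates. The sketch below assumes you can obtain, per rule, the time someone last acted on it, and a time-sorted list of alert events. Both input shapes, the 10-minute window, and the function names are assumptions for illustration.

```python
from datetime import datetime, timedelta


def tombstone_candidates(last_action_by_rule, now, max_idle_days=90):
    """Return rules with no recorded human action in the last max_idle_days.

    last_action_by_rule maps rule name -> datetime of the last action,
    or None if nobody has ever acted on the rule.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(
        rule
        for rule, last in last_action_by_rule.items()
        if last is None or last < cutoff
    )


def aggregate_duplicates(events, window=timedelta(minutes=10)):
    """Collapse repeats of the same key into one grouped alert.

    events: list of (timestamp, key), sorted by timestamp.
    A repeat joins the open group for its key if it fires within
    `window` of that group's first alert; otherwise it starts a new group.
    Returns a list of (first_timestamp, key, count).
    """
    groups = []
    open_group = {}  # key -> index of that key's most recent group
    for ts, key in events:
        idx = open_group.get(key)
        if idx is not None and ts - groups[idx][0] <= window:
            first, _, count = groups[idx]
            groups[idx] = (first, key, count + 1)
        else:
            open_group[key] = len(groups)
            groups.append((ts, key, 1))
    return groups


now = datetime(2024, 6, 1)
last_action = {
    "disk-usage": datetime(2024, 1, 1),
    "api-latency": datetime(2024, 5, 20),
    "legacy-cron": None,
}
print(tombstone_candidates(last_action, now))

t0 = datetime(2024, 6, 1, 12, 0)
events = [
    (t0, "disk"),
    (t0 + timedelta(minutes=2), "disk"),
    (t0 + timedelta(minutes=3), "cpu"),
    (t0 + timedelta(minutes=9), "disk"),
    (t0 + timedelta(minutes=25), "disk"),
]
print(aggregate_duplicates(events))
```

Most alerting platforms offer their own grouping or deduplication settings; prefer those where available and treat this as a way to estimate the effect before changing a rule.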
- 4
Set a noise budget
Cap non-actionable alerts at 15% of total volume per week. Review in your weekly ops sync. If the budget breaks, freeze new alert rules until it is back under.
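The budget check itself is simple arithmetic: non-actionable alerts divided by total alerts for the week, compared against the 15% cap. A minimal sketch, with hypothetical function and field names:

```python
NOISE_BUDGET = 0.15  # cap on non-actionable share of weekly volume


def budget_status(actionable, non_actionable, budget=NOISE_BUDGET):
    """Report the week's non-actionable share and whether to freeze new rules."""
    total = actionable + non_actionable
    share = non_actionable / total if total else 0.0
    over = share > budget
    return {"share": share, "within_budget": not over, "freeze_new_rules": over}


# Illustrative weeks: 20 of 200 alerts (10%) and 40 of 200 alerts (20%).
good_week = budget_status(actionable=180, non_actionable=20)
bad_week = budget_status(actionable=160, non_actionable=40)
print(good_week)
print(bad_week)
```

In the second week the share is 40 / 200 = 20%, which exceeds the cap, so new alert rules would be frozen until the share falls back under 15%.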
- 5
Automate the guardrail
Schedule a weekly script that counts alert volume by classification and posts a summary to your team channel. Visibility alone often drives compliance.
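One possible shape for that weekly script, using only the standard library. It assumes your chat tool accepts a JSON POST with a `text` field at a webhook URL, which is common but not universal. The environment variable name `ALERT_SUMMARY_WEBHOOK`, the payload shape, and the sample week are all assumptions to adapt.

```python
import json
import os
import urllib.request
from collections import Counter


def build_summary(classifications, budget=0.15):
    """Format a plain-text summary from a list of classification labels.

    Anything not labeled "actionable" counts toward the non-actionable share.
    """
    counts = Counter(classifications)
    total = sum(counts.values())
    non_actionable = total - counts.get("actionable", 0)
    share = non_actionable / total if total else 0.0
    lines = [f"Weekly alert summary: {total} alerts"]
    for label in ("actionable", "informational", "noise"):
        lines.append(f"- {label}: {counts.get(label, 0)}")
    status = "within" if share <= budget else "over"
    lines.append(
        f"Non-actionable share: {share:.0%} ({status} the {budget:.0%} budget)"
    )
    return "\n".join(lines)


def post_summary(text, webhook_url):
    """POST the summary as JSON; the {"text": ...} payload is an assumption."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Illustrative week; replace with labels loaded from your own export.
week = ["actionable"] * 70 + ["informational"] * 20 + ["noise"] * 10
summary = build_summary(week)
print(summary)

# Only post when a webhook is configured (hypothetical variable name).
webhook_url = os.environ.get("ALERT_SUMMARY_WEBHOOK")
if webhook_url:
    post_summary(summary, webhook_url)
```

Any scheduler works for the weekly run. With cron, for example, the schedule `0 9 * * 1` runs a job at 09:00 every Monday.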
Expected outcome
Within two weeks, expect total alert volume to drop by 40–60%. On-call engineers stop dreading their shifts. Mean time to acknowledge (MTTA) for real incidents improves because signal is no longer buried.