Circuit Breaker
Prevent cascading failures when external services degrade.
Problem
A downstream service — license validation, payment gateway, or telemetry sink — begins timing out. Without a circuit breaker, every inbound request blocks on that dependency, exhausting thread pools and bringing the entire system down.
Solution
Wrap every external call in a state machine with three states: closed, open, and half-open. Track consecutive failures. When the threshold is breached, trip the circuit open and immediately reject requests for a cooldown window. After the window expires, allow one probe request through. Success resets the circuit; failure extends the open state.
Key Parameters
- Failure threshold — how many consecutive errors before tripping (typically 5).
- Cooldown window — how long the circuit stays open before probing (30–60 seconds).
- Half-open probe limit — number of requests allowed through during the probe phase (usually 1).
Fallback Strategy
When the circuit is open, return a cached response or a graceful degradation signal. For license checks, serve from the offline grace cache signed with HMAC. Never propagate the upstream failure to the end user.
Monitoring
Expose circuit state, trip count, and last-failure timestamp via a lightweight health endpoint. Alert when any circuit enters the open state so operators can investigate the root cause before the cooldown expires.