2025-01-22 · Aisha Rahman
Error Budgets Without the Theater
SLO slides often look precise and act vague. Teams pick 99.9% because it sounds serious, then ignore burn rates until an incident.
Start with one user journey and one SLI you can measure today, usually availability or success rate on a critical API. Set a target you can afford to miss occasionally: the gap between that target and 100% is your error budget, and it exists to prioritize reliability work, not to punish teams.
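To make the budget concrete, here is a minimal sketch of the arithmetic, assuming a request-based success-rate SLI. The function names and the 30-day window are illustrative choices, not something the worksheets prescribe.

```python
def allowed_failures(slo_target: float, total_requests: int) -> int:
    """Failed requests the SLO tolerates over the window (rounded)."""
    return round((1.0 - slo_target) * total_requests)


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    budget = (1.0 - slo_target) * total_requests
    if budget == 0:
        return 0.0  # a 100% target (or no traffic) leaves no budget to spend
    return 1.0 - failed_requests / budget


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Time-based view of the same budget: minutes of full outage tolerated."""
    return (1.0 - slo_target) * window_days * 24 * 60


if __name__ == "__main__":
    # Assumed example numbers: 99.9% target, 1M requests, 250 failures so far.
    print(allowed_failures(0.999, 1_000_000))
    print(budget_remaining(0.999, 1_000_000, 250))
    print(allowed_downtime_minutes(0.999))
```

At 99.9% over 30 days the time-based budget is roughly 43 minutes of full outage. Seeing that number is often what prompts an honest conversation about whether the target is affordable.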
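Alert on burn rate, not static thresholds on raw metrics. Burn rate is how fast you are spending the budget relative to the pace that would use it up exactly at the end of the SLO window, so a burn rate of 1 is on pace and anything higher exhausts the budget early. Two-window policies catch both fast burns and slow leaks. Document what happens when the budget is exhausted: feature freeze, reliability sprint, or explicit risk acceptance.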
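One way to sketch a burn-rate policy in code is below. It assumes you already have error ratios per lookback window from your metrics store. The rule names, window labels, and the 14.4 and 6 thresholds are illustrative values of the kind often used with a 30-day SLO, not settings taken from this post. Each rule pairs a long window with a short one, so an alert fires only while the burn is still happening.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRule:
    name: str
    long_window: str    # key into the error-ratio mapping, e.g. "1h"
    short_window: str   # shorter confirmation window, e.g. "5m"
    threshold: float    # burn-rate multiple that triggers the rule


# Hypothetical policy: one rule for fast burns, one for slow leaks.
RULES = [
    BurnRule("fast_burn", long_window="1h", short_window="5m", threshold=14.4),
    BurnRule("slow_leak", long_window="6h", short_window="30m", threshold=6.0),
]


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Observed error ratio divided by the ratio the SLO allows."""
    return error_ratio / (1.0 - slo_target)


def firing_rules(error_ratios, slo_target, rules=RULES):
    """Return names of rules whose long AND short windows both exceed the threshold."""
    fired = []
    for rule in rules:
        long_burn = burn_rate(error_ratios[rule.long_window], slo_target)
        short_burn = burn_rate(error_ratios[rule.short_window], slo_target)
        if long_burn >= rule.threshold and short_burn >= rule.threshold:
            fired.append(rule.name)
    return fired


if __name__ == "__main__":
    # Assumed sample: a sharp spike in the last hour against a 99.9% target.
    spike = {"5m": 0.02, "1h": 0.016, "30m": 0.018, "6h": 0.004}
    print(firing_rules(spike, 0.999))
```

Requiring both windows to exceed the threshold is a design choice: the long window filters out brief blips, and the short window stops the alert from lingering after the problem has cleared.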
Observability for System Designers walks through this with worksheets, not vendor pitches.
#SLO #observability