[Interactive service level calculator. Sample readout: Error Budget 0.100%, Time to Respond 2 days 1 hour.]
Earlier this week, Alex Ewerlöf released the Service Level Calculator via his newsletter and Substack. I've enjoyed Alex's content on reliability engineering, career growth and leadership, and organizational change. His calculator inspired me to subscribe (finally, sorry!) and to re-roll my own. A long time ago, I made a far-too-basic, assumption-filled downtime-to-SLO calculator that missed the nuance, and most of the point, of indicators and objectives.
There's a lot I like about Alex's calculator and a few things I dislike. Splitting the calculator into SLI, SLO, and Alerting categories is great. Including costs is great, and something that's almost always overlooked. The help texts are actually helpful, and the presets are useful. At the cost of some added complexity, he includes ways to change the events unit and amount, supports time-based indicators, and hides short-window alerts. I'm not a huge fan of the budget consumption graph.
Above you'll find my go at a calculator. I tried to simplify it down to what I've seen used and seen work, and I kept some of the same graphs that you can find in the SRE Workbook. Hopefully, you can connect what that workbook recommends with the interactive graphs here. I'll probably expand this tool in the coming weeks.
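If you'd rather poke at the arithmetic than the graphs, here is a minimal Python sketch of the basic error budget and burn rate math as the SRE Workbook describes it. It is not the code behind the calculator above. The function names and the 30-day default window are my own assumptions for illustration.

```python
# Illustrative sketch only: names and defaults are assumptions,
# not taken from either calculator.

def error_budget(slo_percent: float) -> float:
    """Fraction of events (or time) allowed to fail, e.g. 99.9 -> ~0.001."""
    return 1.0 - slo_percent / 100.0


def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of full downtime the budget covers for a time-based SLO."""
    return error_budget(slo_percent) * window_days * 24 * 60


def hours_to_exhaustion(burn_rate: float, window_days: int = 30) -> float:
    """Hours until the whole budget is gone at a constant burn rate.

    A burn rate of 1 spends the budget exactly over the SLO window.
    """
    return window_days * 24 / burn_rate


def budget_consumed(burn_rate: float, alert_window_hours: float,
                    window_days: int = 30) -> float:
    """Fraction of the budget spent during one alert window."""
    return burn_rate * alert_window_hours / (window_days * 24)


if __name__ == "__main__":
    slo = 99.9
    print(f"Error budget: {error_budget(slo):.3%}")
    print(f"Allowed downtime: {allowed_downtime_minutes(slo):.1f} minutes")
    print(f"Hours to exhaustion at 14.4x: {hours_to_exhaustion(14.4):.1f}")
    print(f"Budget spent in 1 hour at 14.4x: {budget_consumed(14.4, 1):.1%}")
```

The 14.4 burn rate is the workbook's example for a one-hour alert window: at that rate, about 2% of a 30-day budget is gone in an hour.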