Eric Workman

SRE Calculator Revisit


Indicator

(successful / valid) × 100

Objective

SLO: 99.9%
Window: 30 days

Error Budget: 0.100%
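
The error budget shown here is the complement of the SLO. A minimal sketch in Python, assuming an event-based indicator; the names `error_budget` and `allowed_failures` are made up for illustration and are not part of the calculator:

```python
def error_budget(slo_percent: float) -> float:
    """Return the error budget as a fraction of valid events.

    For an SLO of 99.9 this is roughly 0.001 (0.1%), up to float rounding.
    """
    return (100.0 - slo_percent) / 100.0


def allowed_failures(slo_percent: float, valid_events: int) -> float:
    """Return how many unsuccessful events the budget tolerates in the window."""
    return error_budget(slo_percent) * valid_events


# Hypothetical traffic figure: one million valid events in the 30-day window.
print(allowed_failures(99.9, 1_000_000))
```

With a 99.9% objective, roughly one in every thousand valid events may fail before the budget is spent.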

[Bar chart: Not Successful and Successful events out of Total]

Burn Rate

Burn Rate: 14.4x

Alert Windows

Consumption Threshold: 2%
Short / Long Ratio: 1/12

Long Window: 3600 seconds (1 hour)
Short Window: 300 seconds (5 minutes)
Time to Respond: 176400 seconds (2 days 1 hour)
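
The burn rate and window figures can be reproduced with a little arithmetic. This sketch assumes the burn rate is the consumption threshold scaled by the ratio of the SLO window to the long window, as in the SRE Workbook. It also assumes the time to respond is whatever remains before the budget runs out once the long window has elapsed. That second assumption is a guess that happens to match the 176400 seconds shown. `alert_windows` and its parameters are hypothetical names, not the calculator's own code.

```python
from dataclasses import dataclass


@dataclass
class AlertWindows:
    burn_rate: float
    long_window_s: float
    short_window_s: float
    time_to_respond_s: float


def alert_windows(
    slo_window_days: float,
    consumption_threshold: float,  # fraction of the budget, e.g. 0.02 for 2%
    long_window_s: float,
    short_long_ratio: float,       # e.g. 1 / 12
) -> AlertWindows:
    slo_window_s = slo_window_days * 86_400

    # Burn rate that consumes `consumption_threshold` of the budget
    # within a single long window.
    burn_rate = consumption_threshold * slo_window_s / long_window_s

    short_window_s = long_window_s * short_long_ratio

    # At this burn rate the whole budget lasts slo_window / burn_rate.
    # Assumption: the alert fires after one long window, so the time
    # left to respond is the remainder.
    time_to_exhaust_s = slo_window_s / burn_rate
    time_to_respond_s = time_to_exhaust_s - long_window_s

    return AlertWindows(burn_rate, long_window_s, short_window_s, time_to_respond_s)


w = alert_windows(30, 0.02, 3600, 1 / 12)
print(w)
```

With the settings above (30 days, 2%, 3600 seconds, 1/12) this gives a burn rate of about 14.4, a short window of about 300 seconds, and about 176400 seconds to respond.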

Alert Metric

(not_successful in the last 3600 sec / valid in the last 3600 sec) > 14.4 · 0.001
and
(not_successful in the last 300 sec / valid in the last 300 sec) > 14.4 · 0.001
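
The alert condition can be written as a small predicate. This is a sketch only: `burn_rate_alert` and its arguments are hypothetical, and treating an empty window (no valid events) as a zero error ratio is an assumption, not something the calculator specifies.

```python
def burn_rate_alert(
    long_not_successful: float,
    long_valid: float,
    short_not_successful: float,
    short_valid: float,
    burn_rate: float = 14.4,
    error_budget: float = 0.001,
) -> bool:
    """Return True when both windows exceed burn_rate * error_budget."""
    threshold = burn_rate * error_budget  # about 0.0144 with the defaults

    def error_ratio(not_successful: float, valid: float) -> float:
        # Assumption: a window with no valid events counts as a zero ratio.
        return not_successful / valid if valid > 0 else 0.0

    long_breached = error_ratio(long_not_successful, long_valid) > threshold
    short_breached = error_ratio(short_not_successful, short_valid) > threshold

    # Both windows must breach: the long window shows the burn is significant,
    # the short window shows it is still happening.
    return long_breached and short_breached


# Made-up counts: 2% errors over the last hour, 5% over the last 5 minutes.
print(burn_rate_alert(20, 1000, 5, 100))
```

Requiring the short window as well means the alert stops firing soon after the errors stop, even though the long window stays above the threshold for a while.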

Alert Demo

Intensity: 15%
Duration: 10 minutes

Detection Time: 8 minutes
Reset Time: 4 minutes

Earlier this week Alex Ewerlöf released the Service Level Calculator via his newsletter and Substack. I've enjoyed Alex's content on reliability engineering, career growth and leadership, and organizational change. His calculator inspired me to subscribe (finally, sorry!) and to re-roll my own. A long time ago, I made a far-too-basic, assumption-filled downtime-to-SLO calculator that missed the nuance, and most of the point, of indicators and objectives.

There's a lot I like about Alex's calculator and a few things I dislike. Splitting the calculator into SLI, SLO, and Alerting categories is great. Including costs is great, and that is almost always overlooked. The help texts are actually helpful, and the presets are useful. At the cost of added complexity, he includes ways to change the event unit and amount, supports time-based indicators, and hides short-window alerts. I'm not a huge fan of the budget consumption graph.

Above you'll find my go at a calculator. I tried to simplify it down to what I've seen used and seen work, and I kept some of the same graphs that you can find in the SRE Workbook. Hopefully, you can connect what that workbook recommends with the interactive graphs here. I'll probably expand this tool in the coming weeks.