Definition
A Service Level Agreement (SLA) is a formal commitment about how reliable a service will be, usually expressed as an uptime percentage over a period. When a provider offers a "99.9% uptime SLA," they are promising the service will be available at least 99.9% of the time, and typically agreeing to consequences — such as service credits — if they fall short.
The uptime figure in an SLA translates into a concrete downtime budget. Over a 30-day month, 99.9% allows about 43 minutes of downtime; 99.99% allows only about 4 minutes. Each additional "nine" cuts the permitted downtime tenfold, and usually brings a large jump in the cost and engineering effort needed to achieve it.
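The arithmetic behind those budgets can be sketched in a few lines of Python. The function name `downtime_budget_minutes` and the 30-day default window are assumptions chosen for illustration; an actual SLA defines its own window.

```python
def downtime_budget_minutes(target_pct: float, window_days: float = 30.0) -> float:
    """Allowed downtime in minutes for an uptime target over a window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - target_pct) / 100.0


# Print the budget for each additional "nine" over a 30-day window.
for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):.2f} min per 30 days")
```

For a 30-day window this works out to roughly 432, 43.2, 4.32, and 0.43 minutes, which shows the tenfold step between nines.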
When you offer a service, your SLA is the promise you make to customers. When you consume one, it is the promise being made to you — and the only way to know whether it is being kept is to measure independently.
Why It Matters
An SLA sets expectations and accountability. For customers, it defines what reliability they can count on and what they are owed if it slips. For providers, it is a public commitment that shapes architecture, on-call practices, and monitoring. Either way, an SLA is only as meaningful as your ability to measure against it — which is why independent monitoring matters.
How It Works
An SLA states a target uptime percentage and a measurement window. Allowed downtime equals the window length multiplied by (100% − the target); for a 30-day window at 99.9%, that is 43,200 minutes × 0.1% = 43.2 minutes. Providers track their own uptime; sophisticated customers track it independently with external monitoring, because a provider's internal measurement may use favorable assumptions or infrequent sampling. If measured uptime falls below the SLA target, the agreed remedy (often service credits) applies.
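One simple way to turn external checks into a compliance verdict is to treat the share of successful checks as the measured uptime. The sketch below assumes evenly spaced checks recorded as booleans; `SlaReport` and `evaluate_sla` are hypothetical names, not part of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class SlaReport:
    measured_uptime_pct: float
    target_pct: float
    breached: bool


def evaluate_sla(check_results: list[bool], target_pct: float) -> SlaReport:
    """Estimate uptime as the share of successful checks in the window."""
    if not check_results:
        raise ValueError("no checks recorded for this window")
    measured = 100.0 * sum(check_results) / len(check_results)
    return SlaReport(measured, target_pct, breached=measured < target_pct)


# 30 days of 1-minute checks (43,200 samples) with 70 failures.
results = [True] * 43_130 + [False] * 70
print(evaluate_sla(results, target_pct=99.9))
```

This counts every failed check as one full interval of downtime, which is an approximation: the coarser the check interval, the less precisely the result reflects real downtime.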
Real-World Example
A SaaS vendor offers a 99.9% monthly uptime SLA with a 10% service credit for misses. In a 30-day month, 99.9% permits 43m 12s of downtime (0.1% of 43,200 minutes). The vendor suffers a 70-minute outage, which exceeds the budget and puts the month's uptime at about 99.84%, so customers are entitled to the credit. A customer who relied only on the vendor's status page might never have noticed; one with their own external monitor has the timestamped evidence to claim it.
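The same scenario can be worked through from a timestamped incident log. This is a minimal sketch: the incident timestamps are invented for illustration, and the flat 10% credit simply mirrors the example's terms rather than any real vendor's contract.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)
TARGET_PCT = 99.9
CREDIT_PCT = 10  # credit owed on a miss, per the example SLA

# Hypothetical incident log: (start, end) pairs from an external monitor.
incidents = [
    (datetime(2024, 6, 12, 14, 5), datetime(2024, 6, 12, 15, 15)),  # 70 minutes
]

# Sum incident durations, then compare against the downtime budget.
downtime = sum((end - start for start, end in incidents), timedelta())
budget = WINDOW * (100 - TARGET_PCT) / 100
uptime_pct = 100 * (1 - downtime / WINDOW)
credit_owed = CREDIT_PCT if downtime > budget else 0

print(f"budget {budget}, downtime {downtime}, "
      f"uptime {uptime_pct:.3f}%, credit {credit_owed}%")
```

Keeping start and end times for every incident is what makes this calculation possible after the fact, and it is the evidence a credit claim would rest on.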
Best Practices
- Match your SLA target to reality: 99.9% is a realistic target for many small teams, but promise only what you can deliver
- State the measurement window and what counts as downtime explicitly
- Measure SLA compliance with external monitoring, not just the provider's own numbers
- Keep a complete incident history so you can prove or dispute SLA breaches
- Use a check interval fine enough to measure the target accurately (for example, 1-minute checks for a 99.99% target, whose 30-day budget is only about 4.3 minutes)
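The last point can be made concrete by asking how many failed checks fit inside the downtime budget. The sketch below assumes each failed check is counted as one full interval of downtime, which is a simplification; `failed_checks_within_budget` is a hypothetical helper.

```python
def failed_checks_within_budget(target_pct: float, interval_minutes: float,
                                window_days: float = 30.0) -> float:
    """How many failed checks fit in the downtime budget, assuming each
    failed check is counted as one full interval of downtime."""
    budget_minutes = window_days * 24 * 60 * (100.0 - target_pct) / 100.0
    return budget_minutes / interval_minutes


# Compare 1-minute and 5-minute checks against a 99.99% target.
for interval in (1, 5):
    n = failed_checks_within_budget(99.99, interval)
    print(f"{interval}-minute checks: budget covers {n:.2f} failed checks")
```

With a 99.99% target, the 30-day budget covers about four failed 1-minute checks but less than one failed 5-minute check, so at the coarser interval a single failure already reads as a breach.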
Common Mistakes
- Promising 100% uptime, which no provider can honestly guarantee
- Quoting an SLA without defining the measurement window or what 'downtime' means
- Trusting a provider's self-reported uptime without independent verification
- Setting a 99.99% target while measuring with 5-minute checks, where a single failed check already represents more downtime than the roughly 4-minute monthly budget
- Having no incident records, making SLA-credit claims impossible to substantiate
In Monitoristic
Monitoristic gives you the independent measurement an SLA needs: regular checks, an uptime percentage, and an automatic incident timeline with timestamps. Whether you are holding a vendor to their SLA or proving your own, you have your own record rather than relying on someone else's.