
What Is an Outage?

A specific event in which a service becomes unavailable or stops functioning correctly.

Definition

An outage is a discrete period during which a service is down or not working as intended. Where downtime is the cumulative measure, an outage is the individual incident — "the API outage on Tuesday" — with a start, a duration, and a resolution.

Outages range from total (everything is offline) to partial (one region, one feature, or one dependency is affected). Both matter, because even a partial outage can block a critical user flow like checkout or login.
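The total versus partial distinction can be sketched in Python as a small classifier over the latest check result for each component. The `Scope` enum, the `classify_scope` function, and the component names are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Scope(Enum):
    OPERATIONAL = "operational"
    PARTIAL = "partial"
    TOTAL = "total"


def classify_scope(component_ok: dict[str, bool]) -> Scope:
    """Classify scope from the latest check result per component.

    `component_ok` maps a component name (a region, feature, or
    dependency) to True if its most recent check passed.
    """
    if not component_ok:
        raise ValueError("no components to classify")
    failing = [name for name, ok in component_ok.items() if not ok]
    if not failing:
        return Scope.OPERATIONAL
    if len(failing) == len(component_ok):
        return Scope.TOTAL
    return Scope.PARTIAL


# Hypothetical example: only login is failing, so the outage is partial,
# even though login may be a critical flow.
print(classify_scope({"login": False, "checkout": True, "search": True}))
# prints: Scope.PARTIAL
```

A classifier like this only describes scope. Whether a partial outage is severe still depends on which flow is blocked.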

Why It Matters

Outages are the events that damage revenue and trust, and how you handle them defines your reliability reputation. Detecting an outage quickly, communicating it clearly, and resolving it fast are what separate a minor blip from a crisis. Tracking outages over time also reveals recurring causes, such as a fragile dependency or a risky deploy process, that are worth fixing.

How It Works

Monitoring detects an outage when checks start failing, records the start time, and (with alerting) notifies your team. When checks succeed again, the outage end time is recorded and the incident is resolved. The outage duration feeds your downtime and uptime numbers.
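The flow above can be sketched in Python as a small state machine over time-ordered check results. This is a minimal illustration, not how any particular monitoring product is implemented: the `Outage` class, the `find_outages` function, and the sample timestamps are all hypothetical, and a real system would also send alerts and usually confirm a failure before opening an incident.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class Outage:
    started_at: datetime
    resolved_at: Optional[datetime] = None

    @property
    def duration(self) -> Optional[timedelta]:
        # None while the outage is still ongoing.
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.started_at


def find_outages(checks: Iterable[tuple[datetime, bool]]) -> list[Outage]:
    """Turn time-ordered (timestamp, ok) check results into outages."""
    outages: list[Outage] = []
    current: Optional[Outage] = None
    for ts, ok in checks:
        if not ok and current is None:
            # First failing check: record the start time.
            # (An alert would be sent here in a real system.)
            current = Outage(started_at=ts)
            outages.append(current)
        elif ok and current is not None:
            # First passing check after failures: record the end time.
            current.resolved_at = ts
            current = None
    return outages


# Hypothetical one-minute checks: two failures, then recovery.
day = datetime(2024, 1, 2)
checks = [
    (day.replace(hour=10, minute=0), True),
    (day.replace(hour=10, minute=1), False),
    (day.replace(hour=10, minute=2), False),
    (day.replace(hour=10, minute=3), True),
]
outages = find_outages(checks)
print(outages[0].duration)  # 0:02:00
```

Note that the recorded duration is measured between checks, so it can differ from the true outage length by up to one check interval at each end.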

Real-World Example

At 9:14 AM a deploy breaks the login endpoint. Monitoring detects failing checks, opens an incident, and alerts the team. The bad deploy is rolled back and login recovers at 9:31 AM. The outage lasted 17 minutes and is logged with a full timeline.
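The arithmetic in this example, and how a single outage feeds an uptime figure, can be checked with a few lines of Python. The calendar date and the 30-day reporting period are assumptions added for illustration, and the calculation assumes this was the only outage in the period.

```python
from datetime import datetime, timedelta

# Times from the example; the date itself is arbitrary.
start = datetime(2024, 1, 2, 9, 14)
end = datetime(2024, 1, 2, 9, 31)
outage = end - start

# Assumed reporting period of 30 days with no other outages.
period = timedelta(days=30)
uptime_pct = 100 * (1 - outage / period)

print(outage)                # 0:17:00
print(f"{uptime_pct:.3f}%")  # 99.961%
```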

Best Practices

  • Detect outages fast with frequent checks and instant alerts
  • Communicate outages on a status page to reduce support load
  • Record a timeline for every outage: detected, investigating, resolved
  • Run a brief post-incident review to prevent repeats
  • Distinguish partial outages so you understand real user impact

Common Mistakes

  • Finding out about outages from customers instead of monitoring
  • Staying silent during an outage instead of posting status updates
  • Not recording outage timelines, so lessons are lost
  • Treating every outage as total when many are partial
  • Skipping the post-incident review and repeating the same failure

In Monitoristic

When Monitoristic records failed checks, it opens an incident automatically, notifies you via Telegram and webhooks, and re-checks every 60 seconds. The incident timeline captures the start, any updates, and the resolution, and it can be shown on your public status page.

Frequently Asked Questions

What is the difference between an outage and downtime?
An outage is a single event of unavailability; downtime is the total time across all outages in a period.

What is a partial outage?
A partial outage affects only part of a service — one region, feature, or dependency — while the rest keeps working. It can still block critical flows.

How should I communicate during an outage?
Post timely updates on a status page: acknowledge the issue, share what you know, and confirm when it's resolved. Clear communication reduces support tickets and preserves trust.

How fast should outages be detected?
As fast as your check interval allows. For revenue-critical services, 1-2 minute checks keep worst-case detection time low.
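
As a rough sketch of why the check interval bounds detection time: if a failure begins just after a passing check, the next check is up to one full interval away. The function below is hypothetical, and the `failures_to_alert` parameter assumes a setup that waits for several consecutive failed checks before alerting. It ignores check timeouts and alert delivery time.

```python
from datetime import timedelta


def worst_case_detection(interval: timedelta, failures_to_alert: int = 1) -> timedelta:
    """Approximate worst-case delay between a failure starting and an alert.

    Up to one interval passes before the first failing check, and each
    additional required confirmation adds one more interval.
    """
    if failures_to_alert < 1:
        raise ValueError("failures_to_alert must be at least 1")
    return interval * failures_to_alert


print(worst_case_detection(timedelta(minutes=1)))     # 0:01:00
print(worst_case_detection(timedelta(minutes=5), 2))  # 0:10:00
```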
