What Is an Outage? — Uptime Monitoring Glossary

Definition

An outage is a discrete period during which a service is down or not working as intended. Where downtime is the cumulative measure, an outage is the individual incident — "the API outage on Tuesday" — with a start, a duration, and a resolution.

Outages range from total (everything is offline) to partial (one region, one feature, or one dependency is affected). Both matter, because even a partial outage can block a critical user flow like checkout or login.

Why It Matters

Outages are the events that damage revenue and trust, and how you handle them defines your reliability reputation. Detecting an outage quickly, communicating it clearly, and resolving it fast are what separate a minor blip from a crisis. Tracking outages over time also reveals recurring causes, such as a fragile dependency or a risky deploy process, that are worth fixing.

How It Works

Monitoring detects an outage when checks start failing, records the start time, and (with alerting) notifies your team. When checks succeed again, the outage end time is recorded and the incident is resolved. The outage duration feeds your downtime and uptime numbers.
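The detection logic described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular monitoring tool is implemented: the Outage class, the find_outages function, and the sample check results are all hypothetical, and it assumes that a single failing check opens an outage and a single passing check closes it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple


@dataclass
class Outage:
    start: datetime
    end: Optional[datetime] = None  # None while the outage is still ongoing

    @property
    def duration(self) -> Optional[timedelta]:
        return None if self.end is None else self.end - self.start


def find_outages(checks: Iterable[Tuple[datetime, bool]]) -> List[Outage]:
    """Group chronologically ordered (timestamp, ok) check results into outages."""
    outages: List[Outage] = []
    current: Optional[Outage] = None
    for timestamp, ok in checks:
        if not ok and current is None:
            # First failing check: record the outage start.
            current = Outage(start=timestamp)
            outages.append(current)
        elif ok and current is not None:
            # First passing check after failures: record the end and resolve.
            current.end = timestamp
            current = None
    return outages


# Made-up sample data: one check per minute, two of them failing.
t0 = datetime(2024, 1, 2, 9, 13)
results = [True, False, False, True]
checks = [(t0 + timedelta(minutes=i), ok) for i, ok in enumerate(results)]
outages = find_outages(checks)
print(outages[0].start.time(), outages[0].duration)
```

Real monitors often require several consecutive failures before opening an incident, to avoid alerting on a single dropped request. That refinement is left out here to keep the sketch short.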

Real-World Example

At 9:14 AM, a deploy breaks the login endpoint. Monitoring detects failing checks, opens an incident, and alerts the team. The bad deploy is rolled back and login recovers at 9:31 AM. The outage lasted 17 minutes and is logged with a full timeline.
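As a quick check of the arithmetic in this example, the duration can be computed from the two timestamps. The calendar date and the event labels below are assumptions made for illustration; only the times come from the example.

```python
from datetime import datetime

# The date is arbitrary; the times are taken from the example above.
timeline = [
    ("started", datetime(2024, 1, 2, 9, 14)),
    ("resolved", datetime(2024, 1, 2, 9, 31)),
]

events = dict(timeline)
elapsed = events["resolved"] - events["started"]
duration_minutes = int(elapsed.total_seconds() // 60)
print(duration_minutes)  # 17
```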

Best Practices

Detect outages fast with frequent checks and instant alerts
Communicate outages on a status page to reduce support load
Record a timeline for every outage: detected, investigating, resolved
Run a brief post-incident review to prevent repeats
Distinguish partial outages so you understand real user impact

Common Mistakes

Finding out about outages from customers instead of monitoring
Staying silent during an outage instead of posting status updates
Not recording outage timelines, so lessons are lost
Treating every outage as total when many are partial
Skipping the post-incident review and repeating the same failure

In Monitoristic

When Monitoristic records failed checks, it opens an incident automatically, notifies you via Telegram and webhooks, and re-checks every 60 seconds. The incident timeline captures the start, any updates, and the resolution, and it can be shown on your public status page.

Frequently Asked Questions

What is the difference between an outage and downtime?

An outage is a single event of unavailability; downtime is the total time across all outages in a period.
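To make the relationship concrete, here is a small sketch that sums individual outage durations into downtime and derives an uptime percentage. The function name, the two outage durations, and the 30-day period are illustrative assumptions, and the calculation assumes the outages do not overlap and all fall inside the period.

```python
from datetime import timedelta
from typing import Iterable, Tuple


def downtime_and_uptime(
    outage_durations: Iterable[timedelta], period: timedelta
) -> Tuple[timedelta, float]:
    """Return total downtime and uptime percentage for a reporting period."""
    downtime = sum(outage_durations, timedelta())
    uptime_percent = 100 * (1 - downtime / period)
    return downtime, uptime_percent


# Hypothetical month: a 17-minute outage and a 5-minute outage.
downtime, uptime = downtime_and_uptime(
    [timedelta(minutes=17), timedelta(minutes=5)],
    timedelta(days=30),
)
print(downtime, f"{uptime:.3f}%")  # 0:22:00 99.949%
```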

What is a partial outage?

A partial outage affects only part of a service — one region, feature, or dependency — while the rest keeps working. It can still block critical flows.

How should I communicate during an outage?

Post timely updates on a status page: acknowledge the issue, share what you know, and confirm when it's resolved. Clear communication reduces support tickets and preserves trust.

How fast should outages be detected?

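As fast as your check interval allows. A failure that starts just after one check is not seen until the next, so detection can lag by up to a full interval. For revenue-critical services, checking every 1 to 2 minutes keeps worst-case detection time low.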
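A rough way to reason about this is to estimate the worst case from the check settings. The sketch below uses a simplified model, not any vendor's documented behavior: checks run on a fixed schedule, the failure begins immediately after a successful check, an alert fires after a configurable number of consecutive failures, and each failing check may take up to its timeout to report. All parameter names are hypothetical.

```python
def worst_case_detection_seconds(
    interval_s: float,
    failures_to_alert: int = 1,
    timeout_s: float = 0.0,
) -> float:
    """Upper bound on the time from failure to alert under the simplified model."""
    if interval_s <= 0 or failures_to_alert < 1 or timeout_s < 0:
        raise ValueError("interval must be > 0, failures >= 1, timeout >= 0")
    # The Nth failing check starts N intervals after the last good check,
    # then may take up to timeout_s to report the failure.
    return interval_s * failures_to_alert + timeout_s


print(worst_case_detection_seconds(60))           # 60.0
print(worst_case_detection_seconds(120, 2, 10))   # 250.0
```

Under this model, requiring two consecutive failures roughly doubles the worst-case detection time, which is the trade-off for fewer false alarms.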