What Is an Incident? — Uptime Monitoring Glossary

Definition

An incident is a documented event representing a service disruption, opened when a problem is detected and closed when it's resolved. It bundles together the timeline, updates, and outcome of an outage into a single record.

Where an alert is a momentary notification, an incident is the ongoing story: when it started, what was investigated, what actions were taken, and when service was restored. Incident tracking turns chaotic outages into structured, reviewable events.

Why It Matters

Incidents give outages structure and memory. They coordinate the team during a crisis, communicate status to customers, and create a record you can learn from afterward. Without incident tracking, the same failures tend to recur, and the details needed for post-mortems and SLA claims are lost.

How It Works

When monitoring detects a failure (often only after a follow-up check confirms it), an incident is created automatically with a start time. As responders work, they add updates, and the status page can display the incident. When checks pass again, the incident is resolved with an end time, producing a complete timeline used to calculate downtime and mean time to recovery (MTTR) and to support post-incident reviews.
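The lifecycle above can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's implementation: the names `Incident`, `IncidentTracker`, and `confirm_after` are hypothetical, and the rules used here (open after two consecutive failed checks, backdate the start to the first failure, resolve on the first passing check) are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Incident:
    started_at: datetime
    resolved_at: Optional[datetime] = None
    # Each update is a (timestamp, message) pair.
    updates: List[Tuple[datetime, str]] = field(default_factory=list)

    def add_update(self, at: datetime, message: str) -> None:
        self.updates.append((at, message))


class IncidentTracker:
    """Opens an incident after `confirm_after` consecutive failed checks
    and resolves it on the first passing check (assumed rules)."""

    def __init__(self, confirm_after: int = 2) -> None:
        self.confirm_after = confirm_after
        self.failures = 0
        self.first_failure_at: Optional[datetime] = None
        self.current: Optional[Incident] = None
        self.history: List[Incident] = []

    def record_check(self, at: datetime, ok: bool) -> None:
        if ok:
            # A passing check clears the failure streak and closes any open incident.
            self.failures = 0
            self.first_failure_at = None
            if self.current is not None:
                self.current.resolved_at = at
                self.current.add_update(at, "resolved")
                self.history.append(self.current)
                self.current = None
            return

        if self.failures == 0:
            self.first_failure_at = at
        self.failures += 1

        # Open an incident only once the failure is confirmed.
        if self.current is None and self.failures >= self.confirm_after:
            # Assumption: the start time is the first failed check,
            # not the moment of confirmation.
            self.current = Incident(started_at=self.first_failure_at)
            self.current.add_update(at, "detected")
```

Backdating the start to the first failed check keeps the recorded downtime closer to what users experienced; a tracker that stamps the confirmation time instead would under-report each outage by the confirmation delay.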

Real-World Example

Checkout starts failing at 10:02. An incident opens automatically and the team is alerted. They post updates ("investigating," then "rolling back a deploy"), and checkout recovers at 10:19, 17 minutes after the failure began. The incident auto-resolves with a full timeline that the team reviews the next day.
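As a quick check on the example, the downtime can be computed directly from the two timestamps. The calendar date below is a placeholder, since the example gives only clock times, and the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Placeholder date; the example specifies only the times 10:02 and 10:19.
day = datetime(2024, 1, 15)
started_at = day.replace(hour=10, minute=2)
resolved_at = day.replace(hour=10, minute=19)

# Updates posted during the incident (their exact times are not given).
updates = ["investigating", "rolling back a deploy"]

downtime = resolved_at - started_at
downtime_minutes = int(downtime.total_seconds() // 60)
print(f"Downtime: {downtime_minutes} minutes")  # Downtime: 17 minutes
```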

Best Practices

Open incidents automatically on confirmed failures
Keep a clear timeline: detected, investigating, identified, resolved
Communicate incidents on a status page to reduce support load
Resolve incidents promptly and record the end time accurately
Run brief post-incident reviews to prevent repeats
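
One way to keep the timeline clear is to treat the four stages as an ordered sequence and refuse to move backward. The sketch below assumes that stages may be skipped (for example, an incident that auto-resolves before anyone investigates); `Status` and `advance` are hypothetical names used for illustration.

```python
from enum import IntEnum


class Status(IntEnum):
    DETECTED = 1
    INVESTIGATING = 2
    IDENTIFIED = 3
    RESOLVED = 4


def advance(current: Status, new: Status) -> Status:
    """Return the new status, rejecting any move that is not forward."""
    if new <= current:
        raise ValueError(f"cannot move from {current.name} to {new.name}")
    return new


status = Status.DETECTED
status = advance(status, Status.INVESTIGATING)
status = advance(status, Status.RESOLVED)  # skipping IDENTIFIED is allowed
```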

Common Mistakes

Handling outages ad hoc with no incident record
Failing to communicate during an active incident
Leaving incidents open after the issue is resolved
Skipping the post-mortem and repeating the same failure
Losing timestamps needed for MTTR and SLA evidence
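
To show why accurate timestamps matter, here is a minimal sketch of an MTTR calculation, taking MTTR to mean the average time from start to resolution. The function name and the sample incidents are made up for illustration, and skipping still-open incidents is an assumption; a missing or wrong end time would distort the result directly.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

IncidentSpan = Tuple[datetime, Optional[datetime]]  # (started_at, resolved_at)


def mean_time_to_recovery(incidents: List[IncidentSpan]) -> Optional[timedelta]:
    """Average of (resolved_at - started_at) over resolved incidents.

    Incidents without an end time are skipped; returns None if none are resolved.
    """
    durations = [end - start for start, end in incidents if end is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Hypothetical sample data: two resolved incidents and one still open.
sample = [
    (datetime(2024, 1, 15, 10, 2), datetime(2024, 1, 15, 10, 19)),  # 17 min
    (datetime(2024, 1, 20, 14, 30), datetime(2024, 1, 20, 15, 3)),  # 33 min
    (datetime(2024, 1, 22, 16, 0), None),                           # open
]
mttr = mean_time_to_recovery(sample)
print(mttr)  # 0:25:00
```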

In Monitoristic

Monitoristic creates an incident automatically when a monitor goes down and resolves it when the monitor recovers, with a full timeline that is visible to you and your team and shown on your public status page.

Frequently Asked Questions

What is an incident in monitoring?

A tracked record of a service disruption from detection to resolution, including the timeline, updates, and outcome.

How is an incident different from an alert?

An alert is a momentary notification; an incident is the ongoing record of the whole event, including its timeline and resolution.

Should incidents be created automatically?

Yes, ideally. Automatic creation on confirmed failure ensures every outage is captured with accurate timestamps.

Why run a post-incident review?

To understand root causes and prevent the same failure from recurring, turning each outage into a learning opportunity.