Definition
A timeout is the maximum time a monitoring check will wait for a response before the check is considered failed. If your endpoint hasn't responded within the timeout, the check is marked down, even if a response would eventually have arrived.
The timeout defines the line between "slow but working" and "effectively down." Set it too low and you'll get false failures on legitimately slow responses; set it too high and a hanging endpoint takes far too long to be flagged.
Why It Matters
The timeout encodes how patient you're willing to be. For users, an endpoint that takes 30 seconds is broken in practice, so a sensible timeout makes your monitoring reflect real usability. It also caps how long a single hanging request can hold a check open, so a genuine problem is flagged within a known, bounded time.
How It Works
When a check runs, the monitor starts a clock. If a complete response arrives before the timeout, the check proceeds to status-code evaluation. If the timeout elapses first, the check is recorded as a failure (a timeout error) and, depending on configuration, triggers an alert and an incident.
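A minimal Python sketch of that flow, not any particular monitor's implementation. The names run_check, CheckResult, and fetch are hypothetical, and treating any 2xx status as success is an assumption; real monitors usually let you configure the expected status.

```python
import concurrent.futures
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool
    reason: str                      # "ok", "bad_status", "timeout", or "error"
    elapsed_s: float
    status_code: Optional[int] = None


def run_check(fetch: Callable[[], int], timeout_s: float) -> CheckResult:
    """Run fetch() (hypothetical: returns an HTTP status code) under one overall deadline."""
    started = time.monotonic()       # the monitor "starts a clock"
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        # Wait for the complete response, but no longer than the timeout.
        status = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The timeout elapsed first: record a timeout failure.
        return CheckResult(False, "timeout", time.monotonic() - started)
    except Exception:
        # Connection errors and similar failures are also recorded as failed checks.
        return CheckResult(False, "error", time.monotonic() - started)
    finally:
        # Do not block on the still-running request; it is abandoned, not cancelled.
        pool.shutdown(wait=False)

    # The response arrived in time: proceed to status-code evaluation.
    ok = 200 <= status < 300         # assumption: any 2xx counts as success
    return CheckResult(ok, "ok" if ok else "bad_status",
                       time.monotonic() - started, status)
```

The deadline here covers the whole call, which matches "a complete response arrives before the timeout." The abandoned worker thread keeps running until fetch returns, so a production monitor would more likely rely on an HTTP client with native deadline support.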
Real-World Example
An API usually responds in 200 ms, and its monitor uses a 10-second timeout. During a database stall it starts taking 25 seconds to respond. The check times out at 10 seconds and is marked down — correctly treating a 25-second response as unusable rather than waiting indefinitely.
Best Practices
- Set the timeout above normal response time but below "unusable"
- Account for occasional legitimate slowness to avoid false failures
- Use a stricter timeout on endpoints where speed is critical
- Pair the timeout with a slow-response threshold for early warning
- Review timeouts if your endpoint's normal latency changes
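One way to picture pairing a timeout with a slow-response threshold is a three-way classification of each check. The sketch below is illustrative only: Status, classify, and the 2-second threshold are hypothetical, while the 200 ms, 10-second, and 25-second figures come from the example above.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    UP = "up"        # responded faster than the slow-response threshold
    SLOW = "slow"    # responded before the timeout, but slowly: early warning
    DOWN = "down"    # no response within the timeout


def classify(elapsed_s: Optional[float],
             slow_threshold_s: float,
             timeout_s: float) -> Status:
    """Classify one check. elapsed_s is None when no response arrived at all."""
    if slow_threshold_s >= timeout_s:
        raise ValueError("slow-response threshold must be below the timeout")
    if elapsed_s is None or elapsed_s >= timeout_s:
        return Status.DOWN
    if elapsed_s >= slow_threshold_s:
        return Status.SLOW
    return Status.UP


# Hypothetical settings: 2 s slow threshold, 10 s timeout.
print(classify(0.2, 2.0, 10.0))   # Status.UP
print(classify(4.0, 2.0, 10.0))   # Status.SLOW
print(classify(25.0, 2.0, 10.0))  # Status.DOWN
```

Keeping the threshold strictly below the timeout leaves a band of "slow but working" responses that can warn you before the endpoint starts timing out.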
Common Mistakes
- Setting the timeout so low that normal slow responses fail
- Setting it so high that a hanging endpoint takes far too long to flag
- Using one timeout for endpoints with very different latencies
- Confusing the timeout with the check interval
- Never revisiting the timeout as the service evolves
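The difference between the timeout and the check interval shows up in how long a hang takes to be noticed. The arithmetic below is a rough sketch under stated assumptions: checks start every interval_s seconds, a single failed check is enough to flag the endpoint (no retries or confirmation checks), and the function names are hypothetical.

```python
def best_case_detection_s(timeout_s: float) -> float:
    """The hang begins just as a check starts, so only the timeout must elapse."""
    return timeout_s


def worst_case_detection_s(interval_s: float, timeout_s: float) -> float:
    """The hang begins just after a check started and succeeded.

    The next check starts up to one full interval later, then waits
    the full timeout before it is recorded as failed.
    """
    return interval_s + timeout_s


if __name__ == "__main__":
    # Hypothetical 60 s interval: raising the timeout from 10 s to 55 s
    # pushes worst-case detection from 70 s to 115 s.
    print(worst_case_detection_s(60, 10))  # 70
    print(worst_case_detection_s(60, 55))  # 115
```

The interval controls how often you look; the timeout controls how long each look is allowed to take. Both add to detection time, which is why an overly generous timeout delays flagging even when checks run frequently.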
In Monitoristic
Monitoristic lets you set a per-monitor timeout; if the endpoint doesn't respond within it, the check is marked failed. Combine it with the slow-response threshold to catch endpoints that are still up but degrading.