When Amazon Web Services Goes Down: A Survival Guide for Your Team
Your phone buzzes with a customer email: "Your app isn't loading." You check, and it's fine. You check from a VPN in another region: blank page. You check the AWS Health Dashboard: green across the board. Twitter tells the real story: us-east-1 is having issues. AWS's status page won't reflect it for another 40 minutes.
What Happens on Your Team
The DevOps / SRE Engineer
Checks CloudWatch: metrics are delayed. Checks the AWS Health Dashboard: it shows "No recent events." Checks Twitter: #awsdown is trending. Starts working through the services one by one: Are the EC2 instances responding? Is RDS healthy? Is S3 accessible? Is ElastiCache alive?
The real cost: AWS outages require triage across multiple services. Your app might use EC2, RDS, S3, CloudFront, and Lambda — any of which could be the root cause. Without external monitoring, you're debugging from inside the problem. If your monitoring runs on AWS and AWS is down, you're blind.
What they should have had: External monitoring that doesn't run on AWS. A monitor on your public endpoint, hosted outside AWS infrastructure, tells you whether users can reach your app — regardless of what AWS's own dashboards say.
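To make that concrete, here is a minimal sketch of an external probe in Python, assuming it runs on a machine outside AWS (a small VPS with another provider, for example). The names `probe`, `http_status`, and `ProbeResult` are illustrative, and treating any 2xx or 3xx response as "reachable" is an assumption you may want to tighten.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    url: str
    ok: bool
    status: Optional[int]
    error: Optional[str]


def http_status(url: str, timeout: float = 10.0) -> int:
    """Fetch the URL and return the final HTTP status code (redirects are followed)."""
    request = urllib.request.Request(url, headers={"User-Agent": "external-probe/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive as exceptions; the code is still a valid answer.
        return exc.code


def probe(url: str, fetch: Callable[[str], int] = http_status) -> ProbeResult:
    """Assumed rule: any 2xx/3xx status counts as reachable, everything else as down."""
    try:
        status = fetch(url)
    except OSError as exc:  # URLError, timeouts, and refused connections are OSError subclasses
        return ProbeResult(url, ok=False, status=None, error=str(exc))
    return ProbeResult(url, ok=200 <= status < 400, status=status, error=None)
```

Calling `probe("https://your-app.com/api/health")` on a schedule from outside AWS answers the only question that matters during an outage: can users reach you? The `fetch` parameter exists so the logic can be exercised without a network.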
The Support Lead
Customers are reporting the app is down. The support team can see it's down but doesn't know the cause or timeline. They draft a response: "We're investigating" — but have nothing specific to share.
The real cost: Without a status page, every customer becomes a support ticket. During a major AWS outage, ticket volume can spike 10x in an hour. The support team is overwhelmed with "is it down?" while engineering is busy investigating.
What they should have had: A public status page that updates automatically when the monitor detects downtime. Customers check the page instead of filing tickets. The support team references one link instead of writing custom responses.
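One way the "updates automatically" part might work is to derive the page's label from the monitor's most recent check results. The sketch below is an assumption, not a description of any particular status page product: the function name `page_status`, the three labels, and the threshold of three consecutive failures are all hypothetical choices.

```python
from typing import Sequence


def page_status(recent_checks: Sequence[bool], down_after: int = 3) -> str:
    """Map recent check results (oldest first, True = passed) to a status-page label."""
    # Count how many of the most recent checks failed in a row.
    trailing_failures = 0
    for ok in reversed(recent_checks):
        if ok:
            break
        trailing_failures += 1

    if trailing_failures == 0:
        return "operational"
    if trailing_failures < down_after:
        # A short run of failures may be a blip, so flag it without declaring an outage.
        return "degraded"
    return "outage"
```

Requiring several consecutive failures before showing "outage" trades a little speed for fewer false alarms. The page flips back to "operational" as soon as one check passes.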
The CTO
Asks the team: "Are we single-region?" "Do we have failover?" "What's our RTO?" In most cases, the answers reveal that disaster recovery was planned but never fully implemented.
The real cost: AWS outages expose infrastructure decisions that were deferred. Multi-region failover was on the roadmap but never prioritized. Now the CTO is making architecture decisions during an active incident — the worst time to make them.
What they should have had: Historical uptime data showing how often and how long their endpoints have been affected by AWS issues. Quarterly reviews of this data drive infrastructure investment decisions before the next outage.
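As a rough illustration of what that historical data can yield, the sketch below turns a log of check results into an uptime percentage and a list of outages with durations. It assumes the monitor stores time-ordered `(timestamp, ok)` pairs; the names `Outage`, `find_outages`, and `uptime_percent` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Sequence, Tuple

Check = Tuple[datetime, bool]  # (when the check ran, whether it passed)


@dataclass
class Outage:
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def find_outages(checks: Sequence[Check]) -> List[Outage]:
    """Group consecutive failed checks into outages.

    An outage runs from the first failed check to the next passing one,
    or to the last recorded check if the endpoint never recovered.
    """
    outages: List[Outage] = []
    started: Optional[datetime] = None
    for timestamp, ok in checks:
        if not ok and started is None:
            started = timestamp
        elif ok and started is not None:
            outages.append(Outage(started, timestamp))
            started = None
    if started is not None:
        outages.append(Outage(started, checks[-1][0]))
    return outages


def uptime_percent(checks: Sequence[Check]) -> float:
    """Share of checks that passed, as a percentage."""
    if not checks:
        raise ValueError("no checks recorded")
    passed = sum(1 for _, ok in checks if ok)
    return 100.0 * passed / len(checks)
```

Comparing the longest `duration` in a quarter against the RTO the team believes it has is a quick way to see whether that target is realistic.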
Why Monitor Amazon Web Services?
AWS powers a significant portion of the internet. A regional outage can take down your servers, databases, CDN, and storage. AWS's own status page has historically been slow to update during major incidents.
What to Monitor
- your-app.com: Your application's public endpoint
- your-app.com/api/health: Backend health check endpoint
- your-app.com/api/status: Service status endpoint
What You Should Actually Do
1. Monitor your application externally: your monitoring should not run on the same infrastructure as your app
2. Monitor your health check endpoint: not aws.amazon.com, but YOUR application's public URL
3. Set up non-AWS alert channels: if your alerts go through SES and AWS email is down, you get nothing
4. Create a status page: your customers need a non-AWS-dependent place to check your status
5. Bookmark health.aws.amazon.com, but don't rely on it as your only signal
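The point about non-AWS alert channels can be sketched as a simple fan-out: send every alert through several independent channels and treat a failure in one as expected rather than fatal. This is a minimal sketch under assumptions. `send_alert` and the channel names are hypothetical, and each channel is just a function you supply (an SMS gateway call, a chat webhook, and so on).

```python
from typing import Callable, Dict, List


def send_alert(message: str, channels: Dict[str, Callable[[str], None]]) -> List[str]:
    """Try every channel and return the names of the ones that delivered.

    Each send is isolated so one broken channel cannot block the rest.
    Raises RuntimeError only if no channel delivered at all.
    """
    delivered: List[str] = []
    errors: Dict[str, str] = {}
    for name, send in channels.items():
        try:
            send(message)
            delivered.append(name)
        except Exception as exc:  # a failing channel is expected during an outage
            errors[name] = str(exc)
    if not delivered:
        raise RuntimeError(f"alert not delivered on any channel: {errors}")
    return delivered
```

If one entry in `channels` depends on SES and another on a provider outside AWS, an AWS email outage costs you one channel instead of the whole alert.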
Amazon Web Services' Official Status Page
Amazon Web Services publishes real-time status at health.aws.amazon.com. Monitoristic doesn't replace this — it complements it. The official page tells you when Amazon Web Services reports an issue. Your own monitor tells you when your connection is affected, often before the status page updates. You also get push alerts instead of checking a webpage manually.
The most dangerous thing about an AWS outage is that your monitoring might be on AWS too. If your app, your monitoring, your alerts, and your status page all run on the same infrastructure, a single outage makes you completely blind. External monitoring — checking your app from outside AWS — is the only way to know what your users see when things go wrong.