AWS had a major outage on October 20, 2025. One month later, in November 2025, Amazon introduced a new Route 53 feature that was reported to help prevent similar disruptions. However, if your servers are in one AWS region and it goes down, switching DNS records alone is not enough.
AWS Outage on October 20, 2025
If there is no cloud, there are no cloud-native applications.
The outage affected businesses worldwide (over 3,500 companies across more than 60 countries, and at least 1,000 sites and apps). Websites and platforms went down, including food delivery apps and airline booking systems that run on AWS servers. The outage delayed flights, blocked online purchases, disrupted financial transactions, and stopped workers from accessing business systems. Most of the main AWS services were restored by the afternoon of October 20, but some, such as AWS Config, Redshift, and Connect, took longer to recover.
This was the third time in five years that things went wrong in the AWS US-EAST-1 (Northern Virginia) region. There were similar issues in 2020 and 2021, but AWS has not explained why this region continues to have problems.
Amazon Web Services data centers in Ashburn
AWS Outage Root Cause and Technical Details
Amazon's investigation found that an update to its DNS system caused the issues.
DynamoDB, a popular database service, was the first to become unreachable.
The DNS system converts website names into IP addresses. When it broke, applications trying to connect to DynamoDB couldn't find it. This is because they couldn't translate the DynamoDB API name into the IP address of its servers. Without being able to find the right IP address, they couldn't establish a connection.
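The failure described above can be sketched in a few lines of Python. This is a minimal illustration, not AWS client code: the function name `resolve_endpoint` and the injectable `resolver` parameter are assumptions made for the example, while `socket.getaddrinfo` and `socket.gaierror` are standard library names.

```python
import socket


def resolve_endpoint(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the IP addresses for hostname, or an empty list if DNS fails.

    `resolver` is a hypothetical injection point so the lookup can be
    replaced in tests; by default it is the standard socket.getaddrinfo.
    """
    try:
        records = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # No DNS answer: the client has no address to connect to,
        # even if the servers behind the name are healthy.
        return []
    # Each record is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address.
    return sorted({record[4][0] for record in records})
```

Calling `resolve_endpoint("dynamodb.us-east-1.amazonaws.com")` returns a list of addresses when DNS works. When the lookup fails, the list is empty and the application has nothing to connect to.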
Other AWS services also failed, affecting users everywhere from London to Tokyo. In total, 113 AWS services stopped working.
The problem started inside Amazon's EC2 network, specifically in a subsystem that manages resource allocation. EC2 gives companies virtual servers to run applications and websites. When the problem occurred, Amazon stopped new EC2 virtual machines from being created to prevent the issue from getting worse. As the situation improved, they allowed customers to create new EC2 servers again.
AWS Outage Economic Impact
AWS holds 37% of the world’s cloud market, with Microsoft Azure in second place and Google Cloud in third.
Large companies that rely on these cloud services suffer major losses when the systems go down. Major AWS clients can lose millions of dollars in revenue for every hour that systems are unavailable.
When Google or Microsoft Azure cloud systems fail, companies lose hundreds of billions of dollars. Airlines, factories, and hospitals all experience disruptions.
Financial Services Impact
In Britain, Lloyds Bank, Bank of Scotland, HMRC’s online services, Vodafone, and BT had problems. In the U.S., Coinbase, Robinhood, Venmo, and Perplexity also struggled with outages.
Consumer Services and Apps Impact
Amazon’s shopping site, Prime Video, and Alexa were down. Social and productivity apps (Reddit, Roblox, Snapchat, and Duolingo) also stopped working. Gaming platforms (Fortnite, Clash Royale, and Clash of Clans) were offline. Transportation and communication tools (Lyft and Zoom) could not connect.
Amazon Web Services Introduced a New Feature to Prevent Future Outages
In November 2025, one month after the incident, Amazon introduced a new Route 53 feature that was reported to help prevent similar disruptions.
The DNS Control Plane in AWS Route 53 is the API you use to add, change, or remove DNS records. The DNS Data Plane is what actually resolves DNS queries when users access your services.
During major problems in AWS's US East region, Route 53's Data Plane usually remains available because it is globally distributed across many locations. However, the Control Plane can fail during major US East outages, which prevents you from updating DNS records to redirect traffic.
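The split between the two planes can be shown with a toy model. This is only a sketch of the idea and not how Route 53 is implemented: the class `ToyDnsService`, its methods, and the example hostname and addresses are all hypothetical.

```python
class ToyDnsService:
    """Toy model: record changes (control plane) vs. lookups (data plane)."""

    def __init__(self, records):
        self._records = dict(records)
        self.control_plane_available = True

    def resolve(self, name):
        """Data plane: answer queries from the records already published."""
        return self._records.get(name)

    def update_record(self, name, address):
        """Control plane: change a record; fails while the control plane is down."""
        if not self.control_plane_available:
            raise RuntimeError("control plane unavailable: record not changed")
        self._records[name] = address


dns = ToyDnsService({"app.example.com": "198.51.100.10"})
dns.control_plane_available = False  # simulate a regional control plane outage

# Lookups still succeed, but they keep returning the old address.
current = dns.resolve("app.example.com")

try:
    dns.update_record("app.example.com", "203.0.113.20")
    redirected = True
except RuntimeError:
    redirected = False  # traffic cannot be pointed at another region
```

In this model, users can still resolve the name, but the operator cannot redirect them anywhere else until the control plane recovers.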
That is why AWS added this new feature, which comes with a 60-minute recovery guarantee for the ability to update DNS records. But even if you can update DNS records during an outage, that does not mean your applications will work.
Do not rely on DNS updates alone to handle outages. If your servers are in one AWS region and it goes down, switching DNS records alone is not enough. Your application and databases need to be running if you want your service to stay up.
AWS Outage Prevention
Microsoft Azure and Google Cloud also experience outages from time to time. You cannot prevent all downtime in the public cloud. The best approach is to plan well and respond quickly when something goes wrong.
Modern cloud-native applications should be spread across availability zones.
Use active-active deployment for systems that cannot go down. Run the same services in multiple regions at the same time. If one region goes down, traffic automatically switches to the working region. This keeps your services working with little or no downtime.
Alternatively, keep a small backup environment, called a pilot light. Have some databases and common infrastructure set up in a second region and keep them running with minimal resources. When you need to switch to it, scale up those services quickly to handle your production traffic.
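The traffic switch in an active-active setup can be sketched as follows. This is a simplified illustration under stated assumptions: the function `traffic_shares` is hypothetical, health is reduced to a true/false flag per region, and a real setup would rely on health checks and a routing service rather than a Python dictionary.

```python
def traffic_shares(region_health):
    """Split traffic evenly across the regions currently marked healthy.

    `region_health` maps a region name to True (healthy) or False (down).
    Returns a mapping of region name to its share of traffic (0.0 to 1.0),
    or an empty mapping if no region is healthy.
    """
    healthy = [region for region, ok in region_health.items() if ok]
    if not healthy:
        return {}
    share = 1.0 / len(healthy)
    return {region: share for region in healthy}


# Both regions serve traffic while both are healthy.
normal = traffic_shares({"us-east-1": True, "us-west-2": True})

# When one region goes down, the remaining region takes all traffic.
during_outage = traffic_shares({"us-east-1": False, "us-west-2": True})
```

The sketch only works if the second region already runs the application and its data. That is the difference between active-active and changing a DNS record after the fact.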
Be ready to switch to the backup (secondary) region quickly. If done right, single-cloud typically provides better reliability than multicloud for most organizations, unless specific business requirements necessitate multicloud.
Plan AWS architecture assuming every region can go dark. Your engineers must set up data replication correctly, test the network latency, and figure out how to keep data consistent across different regions.
Using multiple cloud providers for resilience usually costs more. Good single-cloud architecture is easier to plan, build, and scale.
Teams should analyze what makes their systems stop working when something breaks, such as DNS servers and data stores. When they think they have everything figured out, they should test how well the systems work during outages. Critical operations should be ready to launch from backup platforms or even handle tasks manually if the main system crashes. Often, backup plans like these also satisfy compliance requirements, while costing less than completely separate cloud systems.
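One way to rehearse a dependency failure is to simulate it in a test and check that the fallback path works. The sketch below assumes a read path with a primary data store and a backup. All names (`DependencyDown`, `read_with_fallback`, `failing_primary`, `replica`) and the sample data are hypothetical.

```python
class DependencyDown(Exception):
    """Raised by a dependency that is unreachable."""


def read_with_fallback(key, primary, fallback):
    """Read from the primary store; use the fallback if the primary is down.

    Returns (value, source) so the caller can see which path served the read.
    """
    try:
        return primary(key), "primary"
    except DependencyDown:
        return fallback(key), "fallback"


def failing_primary(key):
    # Stand-in for a data store in a region that has gone down.
    raise DependencyDown("simulated regional outage")


def replica(key):
    # Stand-in for a replica kept in a second region.
    return {"order-42": "shipped"}[key]


value, source = read_with_fallback("order-42", failing_primary, replica)
```

Running a drill like this regularly shows whether the backup path still works before a real outage forces the question.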
Gartner research shows that multicloud for resilience usually costs more and is more complicated. Additionally, in a world with only three major cloud providers (AWS, Microsoft Azure, Google Cloud) there’s not a lot of diversity. For most companies, a strong single-cloud design provides better uptime and easier management than multicloud, while still meeting regulatory requirements if you engineer and manage it correctly.
Public cloud is often the most practical and reliable way to get large computing power. To make it work, set up backups from the start in a different region. Practice recovery to find out whether your team can restore service quickly when a region goes down. Test disaster recovery often so everyone knows what to do. Companies that cut costs by skipping these protective steps have nothing to fall back on when AWS systems go down, and they remain vulnerable.