Mitigating Public Cloud Disaster: The Most Recent AWS Outage

A big Amazon Web Services (AWS) outage last week affected a large number of Internet services and gadgets at a time when making online sales and using cloud-based remote working platforms are vital for struggling businesses.

AWS

AWS, which is responsible for around 15 per cent of Amazon’s massive overall revenue, is made up of 175+ products and services such as computing, storage, networking, database, analytics, app services, mobile, developer tools, and tools for the Internet of Things (IoT). AWS revenue in 2019 was $35.03 billion.

Critical

As the biggest of the cloud companies (IaaS and PaaS), AWS is a critical backbone for a huge number of websites and apps worldwide. That is what makes a big outage such a potentially damaging event, with economic knock-on effects felt around the world.

What Happened?

From last Wednesday until 4.18 am on Thursday morning, an AWS outage affected 23 AWS geographic regions, with representatives from many apps, services, and websites taking to social media to describe how they had been impacted. Those highlighting the effects included Roku, Adobe Spark, The Washington Post (owned by Amazon boss Jeff Bezos), iRobot, and Flickr.

IoT Casualties

The outage was reported to have caused IoT gadgets such as robot vacuums and smart doorbells to suddenly stop working. For example, the Home App responsible for operating iRobot’s Roomba robot vacuum stopped working, as did Amazon’s own Ring smart doorbells.

The Cause

Ironically, the outage is believed to have been caused by Amazon introducing a small addition of capacity that was intended to improve the service.

What Does This Mean For Your Business?

Although the outage didn’t last for long, the fact that so many businesses now rely upon AWS (which has more than 45 per cent of the global cloud computing market) meant that the effects were widespread and are likely to have been disruptive, costly, and potentially dangerous in some cases. This incident could, therefore, be viewed as an example of why having only a few large companies managing cloud computing globally is not an ideal situation. The issue has been the subject of discussion and suggestions in recent times. For example, a study from Roland Berger and the Internet Economy Foundation (IE.F) highlighted the possible benefits of multi-cloud solutions for companies and public administrations, and of state rule-setting to help ensure fair competition in cloud computing.

Because of the pandemic, companies and organisations around the world have needed to rely heavily on cloud-based services, such as communications and collaborative working platforms, which makes this outage all the more worrying.

What Can Businesses Learn from This Outage?

When adopting cloud infrastructure, ultimately we are entrusting cloud providers (albeit the largest organisations in the world) to manage the infrastructure that provides the backbone to the cloud operation.

This means that consideration should be given to failover and redundancy, including across multiple geographic regions, as this outage demonstrated. You cannot plan for every failure, but the more you design your solution around the potential for failure or incidents within the public cloud, the more robust the solution becomes.
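As a rough illustration of what designing around regional failure can look like in application code, the sketch below tries an operation against a list of regions in priority order and falls back to the next one when a region fails. It is a minimal example under stated assumptions, not a production pattern: `call_with_failover`, `AllRegionsFailed`, and `fetch_orders` are hypothetical names invented here, the region identifiers are only illustrative, and the simulated outage stands in for a real service call.

```python
from typing import Callable, Dict, Sequence, TypeVar

T = TypeVar("T")


class AllRegionsFailed(Exception):
    """Raised when none of the configured regions could serve the request."""


def call_with_failover(regions: Sequence[str], operation: Callable[[str], T]) -> T:
    """Run `operation` against each region in order and return the first success."""
    errors: Dict[str, Exception] = {}
    for region in regions:
        try:
            return operation(region)
        except Exception as exc:  # deliberately broad: any failure triggers failover
            errors[region] = exc
    raise AllRegionsFailed(f"all regions failed: {errors}")


def fetch_orders(region: str) -> str:
    # Hypothetical stand-in for a real API call; simulates an outage in the primary region.
    if region == "us-east-1":
        raise ConnectionError("region unavailable")
    return f"orders served from {region}"


if __name__ == "__main__":
    # The primary region fails, so the request is served from the secondary region.
    print(call_with_failover(["us-east-1", "eu-west-1"], fetch_orders))
```

In practice, failover is more often handled at the infrastructure level (for example through DNS health checks or load balancing) than in application code, and it only helps if data and services have actually been replicated to the secondary region beforehand.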

Organisations need to prepare for online disruptions just as they would prepare for problems with on-premises services. And that requires a proper data protection and recovery plan to ensure that your own business doesn’t suffer when trouble arises in the cloud.

For more information about our public cloud services, please contact us.