Cloud Outage: Why and How Does It Happen?

The more your IT relies on cloud services, the more exposed you are to downtime and revenue losses caused by a cloud outage. Over 60% of organizations that use the public cloud reported losses from these incidents in 2022, so outages are not a rare event that companies can safely ignore.

But are outages enough of a reason to leave the cloud for good? Or should you stick with this infrastructure type despite the risk of occasional downtime?

This article goes through everything you need to know about cloud outages. We outline their main causes, examine eye-opening stats, show how to minimize the impact of cloud downtime, and look at the most impactful outages that occurred in recent years.


What Is a Cloud Outage?

A cloud outage is a time span during which a cloud provider's services are unavailable to end-users. The vendor's infrastructure goes down (due to a bug, power failure, etc.), and the clients lose access to cloud-based assets until the provider fixes the issue.

Impact-wise, there's no difference between an on-site data center going down and a cloud outage. You lose access to IT assets in both cases, but the hands-off approach to cloud computing adds a few unique considerations:

As with local hardware, there are two types of cloud outages: planned and unplanned.

Recent studies reveal that unplanned outages cost 35% more than planned downtime (both on-prem and in the cloud). The price difference exists because unexpected incidents take longer to identify and fix—and the longer an outage lasts, the bigger the damage.

Compared to on-site hardware, cloud-based infrastructure experiences more frequent but less severe downtime. Since no hosting system provides 100% uptime, clients are willing to tolerate occasional outages in return for the advantages of cloud computing. This willingness is also evident in market growth: the cloud will account for 14.2% of total global IT spending in 2024 (up from 9.1% in 2020).
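Because no hosting system offers 100% uptime, it can help to translate an uptime percentage into the downtime it still permits. The sketch below is a minimal Python example that assumes a 365-day year; the percentages are illustrative targets, not any particular provider's SLA.

```python
# Convert an uptime percentage (e.g., an illustrative SLA target) into the
# maximum downtime it still allows over a period.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def allowed_downtime_minutes(uptime_percent: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime an uptime target permits in the period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_minutes * (100 - uptime_percent) / 100


for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows {minutes:.1f} minutes "
          f"({minutes / 60:.2f} hours) of downtime per year")
```

Each extra "nine" cuts the permitted downtime by a factor of ten, which is why higher uptime guarantees get progressively harder (and more expensive) to meet.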

If regular outages are a deal-breaker, consider switching to a Bare Metal Cloud (BMC). BMC lets you host assets on a bare-metal dedicated server, a more reliable platform that also possesses cloud-like agility (near-instant scalability, deployments that require only a few clicks, various automation features, etc.).



Cloud Outage Causes

Cloud outages result from a number of causes both within and beyond the provider's control. Here's a list of the most common ones:

When something goes wrong at a hosting facility, the availability of data and user-facing apps becomes a top priority. Our article on high availability offers an in-depth look at how companies ensure end-users do not feel the impact of data center issues.

What Happens When the Cloud Goes Down?

In the best-case scenario, a cloud outage lasts only a few minutes and affects a small number of users or services. In the worst case, an outage paralyzes a client's business for half a day or longer. A company loses access to all cloud-based assets and stays cut off until the outage ends.

Though serious, mistakes by third-party providers caused "only" 7% of severe outages in 2021. An outage counts as severe if it involves one or more of the following:

Other causes account for a larger share of downtime (as shown in the donut chart below), but keep in mind that an average minute of downtime costs $5,600 (a figure that rises to $9,000 for enterprises). If you are unprepared (i.e., you have no data backup strategy, disaster recovery plan, etc.), a cloud outage could grind your service to a halt and take a massive toll on the bottom line.
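To put the per-minute figures above into perspective, here is a back-of-the-envelope estimate in Python. It assumes the $5,600 and $9,000 averages apply as flat per-minute rates, which is a simplification; actual losses depend on the business and rarely scale linearly. The seven-hour duration is only an example.

```python
# Rough direct-cost estimate for an outage, treating the average
# per-minute downtime costs as flat rates (a simplifying assumption).
AVERAGE_COST_PER_MINUTE = 5_600
ENTERPRISE_COST_PER_MINUTE = 9_000


def outage_cost(duration_hours: float,
                cost_per_minute: float = AVERAGE_COST_PER_MINUTE) -> float:
    """Estimate the cost of an outage lasting duration_hours."""
    return duration_hours * 60 * cost_per_minute


# Example: a seven-hour outage
print(f"Average company: ${outage_cost(7):,.0f}")
print(f"Enterprise: ${outage_cost(7, ENTERPRISE_COST_PER_MINUTE):,.0f}")
```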

A company that keeps a small segment of operations in the cloud is less vulnerable to outages. For example, if you only host emails in the cloud, even a day-long outage is not catastrophic. You can wait out the incident or run apps with reduced functionality, a strategy that does not work if you use the cloud to run an IoT platform or perform payment processing. 

In some cases, a cloud outage leads to permanent data loss (how much data you lose depends on how often you back up). Also, clients in heavily regulated industries can face legal fines if an outage leads to a data breach or leak, so be careful when deciding what to keep in cloud storage.
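The link between backup frequency and data loss is easy to quantify: anything written after the last successful backup is at risk, so the worst-case loss approaches the backup interval. A minimal Python sketch, assuming a hypothetical schedule with a backup every six hours:

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times: list[datetime],
                     failure_time: datetime) -> timedelta:
    """Return how much recent data is at risk: the time between the last
    backup taken at or before the failure and the failure itself."""
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        raise ValueError("no backup exists before the failure")
    return failure_time - max(earlier)


# Hypothetical schedule: a backup every 6 hours, starting at midnight.
start = datetime(2024, 1, 1, 0, 0)
backups = [start + timedelta(hours=6 * i) for i in range(4)]  # 00:00, 06:00, 12:00, 18:00

outage = datetime(2024, 1, 1, 17, 30)
print(data_loss_window(backups, outage))  # 5:30:00
```

Backing up more often shrinks this window, at the cost of more storage and bandwidth.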

Leading causes of IT downtime

What Can Users Do?

Here's what companies do to mitigate the impact of cloud outages:

phoenixNAP's backup and Disaster-Recovery-as-a-Service (DRaaS) offerings help prepare both data and infrastructure for prolonged cloud outages, so you can keep operating even during extended provider downtime.
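One widely used way to limit the impact of a single region or provider going down is client-side failover: try a primary endpoint and fall back to a standby hosted elsewhere. The Python sketch below assumes hypothetical endpoint URLs and a caller-supplied fetch function (simulated here so the example runs offline); a production setup would also need timeouts, retries with backoff, and health checks.

```python
from typing import Callable, Sequence


class AllEndpointsFailed(Exception):
    """Raised when every endpoint in the failover list is unavailable."""


def fetch_with_failover(endpoints: Sequence[str],
                        fetch: Callable[[str], str]) -> tuple[str, str]:
    """Try each endpoint in order and return (endpoint, response) from the
    first one that succeeds. `fetch` is expected to raise on failure."""
    errors: dict[str, Exception] = {}
    for endpoint in endpoints:
        try:
            return endpoint, fetch(endpoint)
        except Exception as exc:  # treat any error as the endpoint being unavailable
            errors[endpoint] = exc
    raise AllEndpointsFailed(errors)


# Hypothetical endpoints: a primary region and a standby in another region.
ENDPOINTS = [
    "https://app.primary-region.example.com",
    "https://app.standby-region.example.com",
]


def simulated_fetch(endpoint: str) -> str:
    # Stand-in for a real network call; pretends the primary region is down.
    if "primary" in endpoint:
        raise ConnectionError(f"{endpoint} is unreachable")
    return "200 OK"


print(fetch_with_failover(ENDPOINTS, simulated_fetch))
```

Keeping the fetch function pluggable lets the same failover logic wrap an HTTP client, a database driver, or a storage SDK.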

Biggest Recent Cloud Outages

Outages are an unavoidable part of using the cloud, and even the most popular providers (such as Azure, AWS, and Google Cloud) are not immune to downtime. Let's look at some of the most significant cloud outages in recent years.

Azure Outage (October 2021)

In October 2021, Microsoft Azure suffered a disruption that took down virtual machine services for six hours. For the duration of the outage, many users were unable to deploy new VMs or update extensions. Basic service management operations (such as start, create, and delete) also led to errors.

The outage occurred because VM queries could not retrieve the required version data of an artifact. According to Microsoft's post-incident report, the software error was introduced while the company was migrating one of its VM architectures.

Google Cloud Outage (November 2021)

Google Cloud went down for about two hours in mid-November 2021, affecting the likes of:

Impacted websites displayed 404 errors when visitors tried to access them. Google reported that the cause for the cloud outage was a glitch in a network configuration responsible for load balancing.

AWS Outage (December 2021)

A surge in connection activity overwhelmed networking devices in one of AWS's flagship facilities, affecting various websites and apps. Some of the most notable "victims" were:

The data center issue caused severe latency within internal AWS networks. Customer apps felt the ripple effects, suffering traffic delays or total shutdowns for about seven hours.

Two Consecutive IBM Outages (January 2022)

An issue with IBM's infrastructure affected cloud services in the Dallas region for over five hours. The in-house team resolved the problem but accidentally caused an additional hour-long issue with IBM's virtual private cloud (VPC) service. This secondary problem affected users around the globe, including in the USA, Japan, Canada, and Germany.

AWS/Slack Outage (February 2022)

In February, Slack suffered an outage of its AWS cloud resources that prevented normal use of the communication platform for five hours. More than 11,000 users reported being unable to:

Slack's team never shared the reason behind the outage and asked all affected users to restart the app and clear their cache after the recovery.


iCloud Outage (March 2022)

Fifteen major Apple services went down for four hours in March due to a cloud outage, including:

Apple's corporate and retail systems went down, too. The company later revealed that the root cause was a problem with its domain name system (DNS).

Google Cloud Outage (March 2022)

On March 8, 2022, users of Google Cloud suffered service errors for two and a half hours. Spotify and Discord were among those hit by the outage.

A change to the Traffic Director code that processes configurations caused the errors. According to Google's post-incident report, the code change did not account for a migration in the configuration data format, so the platform inadvertently deleted users' configurations.

Atlassian Outage (April 2022)

Atlassian's biggest outage of the year started on April 5 and ended on April 18 (although some users had their services restored by April 8). The company attributed the outage to inadequate team communication and a poorly planned incident response.

Although the outage lasted almost two weeks for some users, there were no reports of significant client data loss. Still, it disrupted users of both of Atlassian's flagship products, Trello and Jira.

Microsoft Azure Outage (June 2022)

On June 7, Azure customers could not connect to resources hosted in the East US 2 region (Virginia). The outage lasted about twelve hours but did not affect customers relying on zone-redundant infrastructure. Affected services included:

The culprit was a sudden power oscillation in one of the region's data centers, which caused Air Handling Units (AHUs), the equipment that circulates cooled air through the facility, to shut down.
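Customers on zone-redundant infrastructure rode out the East US 2 outage because their workloads also ran in other availability zones. A simplified way to see why: if each zone is independently available with probability a, a service that stays up as long as at least one of n zones is up has availability 1 - (1 - a)^n. The Python sketch below assumes independent zone failures, which is optimistic, since zones in one region can share failure causes; the 99% figure is purely illustrative.

```python
def redundant_availability(zone_availability: float, zones: int) -> float:
    """Availability of a service that stays up as long as at least one zone
    is up, assuming zone failures are independent (a simplification)."""
    if not 0 <= zone_availability <= 1 or zones < 1:
        raise ValueError("need 0 <= zone_availability <= 1 and zones >= 1")
    # Probability that every zone is down at once is (1 - a) ** n.
    return 1 - (1 - zone_availability) ** zones


for n in (1, 2, 3):
    print(f"{n} zone(s): {redundant_availability(0.99, n):.6f}")
```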

Cloudflare Outage (June 2022)

In June, an outage at Cloudflare caused major disruptions that lasted an hour and a half, taking down popular sites such as:

The San Francisco-based vendor explained that the unplanned downtime resulted from a change to the network configuration in 19 of its data centers.

If public cloud outages are a concern, consider data or cloud repatriation, which moves cloud-based assets (data, apps, workloads, etc.) back on-site to bare metal.

Do Not Overlook the Value of Cloud Outage Planning

Examples of cloud outages in recent years send a clear message: even though the cloud is an IT game-changer, the tech is not foolproof. Companies that care about end-users and app availability must be ready for occasional downtime, which makes backup and disaster recovery (BDR) an integral part of using cloud-based resources.