Cloud outages are not as rare as you might hope

July 26, 2016

At the beginning of June, Apple’s cloud suffered a widespread outage. For almost three hours, multiple iCloud and App Store services were disrupted, and iCloud users had to put off anything that depended on the data they stored in Apple’s cloud. Cloud outages are less pronounced than they were three or five years ago, but they still send a wave of panic through cloud users, because they revive every fear of losing control over proprietary data once the cloud becomes inaccessible.

In a cloud outage, the cause of the incident matters less than how quickly the issue is remedied and what preventive measures are adopted to keep it from recurring.

Before the Apple incident, in April 2016, Google Cloud Platform went dark (again), to a greater extent than one would expect from such a major public cloud provider.

AWS’s outage and the media reaction

Another June 2016 outage generated far more discussion than Apple’s. Because “a number of large companies including content streaming providers Stan and Foxtel along with a variety of shopping and media sites” depended on Amazon Web Services in Australia, where the incident took place, this outage provoked a strong media reaction.

The cause? Severe storms and floods that eventually affected the servers, even though they sat five meters above the ground. A storm-water drain that was not managed by the data center staff ended up showing that while business data may be in the cloud, its feet are on the ground, vulnerable to floods and damage. The critics were harsh, and perhaps so were the decisions that followed. The conclusion? Avoiding future incidents may make server infrastructure considerably more expensive, and its design must account for risks that lie outside the cloud provider’s control.

Every cloud outage becomes a “cautionary tale” that in turn affects the whole industry: whenever organizations consider that they might be forced, without warning, to halt their operations because the cloud is unreachable, they take a few steps back in their cloud adoption plans.

Bigger clients make failure unacceptable

It is true that outages are less pervasive than they used to be, because each mistake becomes a lesson learned and the infrastructure keeps improving. However, the time providers need to strengthen their hardware and come up with better, more viable plans is also time during which the cloud industry keeps gaining large clients. These large clients, in turn, serve many other businesses that depend on their services. Even a short interruption sets off a chain reaction that is simply unacceptable for many organizations. Nobody enjoys apologizing to angry clients for a fault that is not their own. Money and reputation are equally at stake here, and ultimately so is the cloud’s reputation for reliability and support.

An InformationWeek analysis of the Salesforce outage illustrates the point. When a Software-as-a-Service company goes down, damages follow, and who covers them becomes a matter of contracts and, perhaps, negotiation.

In this case the outage affected the US West Coast and lasted nearly a full day. It is easy to see how developers, salespeople, vendors and everyone else depending on Salesforce’s partnership and services found their operations halted or at least seriously hindered. According to the company, a database failure caused the outage. As the CRO of one client company put it, the incident caused the loss of “a whole day of work in revenue growth and selling, customer service, operations, billing, and overall productivity”.

Organizations that have already had a taste of the cloud might not be entirely put off by such incidents, but these incidents clearly erode the confidence that business partnerships require. When affected by such problems, businesses might not leave the cloud, but they certainly start planning to switch providers.

Comparing present cloud outages with past ones

A CRN article on the 10 biggest cloud outages of 2015 lists, among other incidents, the 40-hour Verizon Cloud outage (a scheduled event), the one-hour Google IaaS outage that the company itself deemed “unacceptable”, another Google IaaS error that lasted 45 minutes on March 9, the nearly 12-hour Apple iCloud outage of March 11, and the two-hour outage that brought down Microsoft Azure.

Looking back at the past year, it turns out that cloud outages happen more often than one would have thought. Still, one might say they are less “scary” than the widespread outages, dating from 2009 onward, that top the list of the worst incidents.

Having waited weeks for the damage to be fully repaired (delayed email, inaccessible personal data), the customers affected by these incidents learned once and for all the value of a backup strategy. Whether caused by faulty company infrastructure (as in the Yahoo Mail outages) or by storms and floods (Hurricane Sandy took out servers as well as buildings and cars), these incidents involved considerable downtime and financial consequences, enough to scare the more cautious companies back to on-premises servers.

Still, with or without outages, the future of work includes cloud computing. Public cloud has its own benefits and private cloud holds different strategic advantages, but a hybrid cloud may be the way to go: it adds an extra layer of backup while keeping the flexibility and new development options that public cloud offers.