Alleviating the effects of cloud outages in advance: a valuable exercise for companies

September 22, 2016

Cloud outages are a dreadful prospect for businesses that have entrusted their data storage, computing operations and communications to public or hybrid cloud service providers. Of course, a private cloud (by which we mean on-premises) can also suffer outages, but since the cause lies within the enterprise, reducing the risks or alleviating the effects in advance is part of a more complex internal strategy.

Here we therefore focus on preparing in advance for cloud outages that the company cannot control from within. Enterprises that are clients of cloud providers, and thus merely cloud users, may find themselves taken by surprise when the cloud goes down, if not completely infuriated.

The main goal is to minimize losses, since even when responsibility and liability are established and followed through, some consequences can prove irreparable. So what can companies do to prepare for possible cloud outages and reduce the unpleasant consequences?

Learning from outages

It is always better to learn from the mistakes or mishaps of others than to go through the same issues yourself. An AWS cloud outage, for example, became the basis for a specialized analysis of what companies can do to prevent or minimize losses. Proactive measures consist of:

  • Determine where your data is located (multiple locations are preferable to a single one, since an outage could completely block access to enterprise data stored in a single location, whereas deployment across multiple locations allows fail-over to a second device and continuity of operations);
  • Consider smaller cloud service providers, which are eager to showcase differentiating technologies and services; explore each provider’s service package;
  • Design your cloud-based computer system with redundancy and fault tolerance in mind; building resiliency into the software applications and the development operations could save companies a lot of trouble.
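
As a rough illustration of the first and third points, the sketch below reads a piece of data from several storage locations in turn and falls back to the next one when a location is unavailable. It is a minimal sketch only: `primary_region` and `secondary_region` are hypothetical stand-ins for real storage clients, and the names `read_with_failover` and `AllLocationsFailed` are invented for this example.

```python
from typing import Callable, Sequence


class AllLocationsFailed(Exception):
    """Raised when no storage location could serve the request."""


def read_with_failover(key: str, locations: Sequence[Callable[[str], bytes]]) -> bytes:
    """Try each storage location in order and return the first successful read."""
    errors = []
    for fetch in locations:
        try:
            return fetch(key)
        except Exception as exc:  # any failure is treated as "location unavailable"
            errors.append(exc)
    raise AllLocationsFailed(f"{len(errors)} location(s) failed for {key!r}")


# Hypothetical stand-ins for two storage locations (assumptions, not a real API).
def primary_region(key: str) -> bytes:
    raise ConnectionError("primary region is down")  # simulated outage


def secondary_region(key: str) -> bytes:
    return b"contents of " + key.encode()


# The primary location fails, so the read is served by the secondary one.
data = read_with_failover("report.csv", [primary_region, secondary_region])
```

With a single location in the list, the same outage would surface as `AllLocationsFailed`, which is the situation the first bullet warns about.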

Easy steps in preparing in advance

The same stance mentioned above (taking measures in advance and strategizing to better survive possible outages) can be broken down into a few essential steps:

  • Study your provider’s backup policy in detail before signing up for its services;
  • Maintain your self-reliance by instituting your own on-premises backup;
  • Consider (and adopt) hybrid cloud solutions;
  • Perform regular test restores for data recovery in the event of an outage.
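
The last step can be automated in a simple way. The sketch below restores a backup into a scratch directory and compares checksums with the original, which is one possible way to run a test restore. It assumes file-based backups; the file names are hypothetical, and `shutil.copy2` merely stands in for whatever restore procedure your backup tool actually provides.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def restore_matches_original(original: Path, backup: Path, restore_dir: Path) -> bool:
    """Restore the backup into restore_dir and compare it with the original."""
    restored = restore_dir / original.name
    shutil.copy2(backup, restored)  # stand-in for the real restore step
    return sha256_of(restored) == sha256_of(original)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "customers.db"      # hypothetical data file
    original.write_bytes(b"example data")
    backup = root / "customers.db.bak"    # hypothetical backup copy
    shutil.copy2(original, backup)
    restore_dir = root / "restore"
    restore_dir.mkdir()

    # A healthy backup restores to an identical file.
    restore_ok = restore_matches_original(original, backup, restore_dir)

    # A damaged backup is detected by the checksum comparison.
    backup.write_bytes(b"corrupted")
    corrupted_ok = restore_matches_original(original, backup, restore_dir)
```

Running such a check on a schedule turns "we have backups" into "we have backups that are known to restore".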

Strategize, or else… (Netflix’s example)

The same AWS outage from 2015 made some users rethink their strategies for fighting possible disruptions, turning to what have been called “chaos engineering” tactics.

In fact, this strategy goes beyond implementing theoretical measures: the companies that employ such methods induce failures in their systems to simulate naturally occurring incidents, in order to determine which problems they should solve. Netflix may well inspire others as far as this type of approach is concerned, and its declared goal was to make its systems “resilient to any of the underlying dependencies”.

Of course, this approach involves extra costs, most of them related to duplicating the storage tier, because these simulations need to take place on an extra technology layer, such as a duplicate software layer, in order not to risk disturbing real-time operations.
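
To make the idea of induced failures concrete, here is a minimal sketch of fault injection. It is not Netflix’s tooling: the wrapper, the `recommendations_service` dependency and the `homepage` function are all hypothetical names chosen for illustration. The wrapper makes a fraction of calls to a dependency fail, so that you can check whether the calling code degrades gracefully.

```python
import random
from typing import Callable


def with_injected_failures(func: Callable[[], str], failure_rate: float,
                           rng: random.Random) -> Callable[[], str]:
    """Wrap func so that a fraction of calls fail, simulating a dependency outage."""
    def wrapper() -> str:
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")
        return func()
    return wrapper


def recommendations_service() -> str:
    """Hypothetical dependency that normally works."""
    return "personalized list"


def homepage(fetch: Callable[[], str]) -> str:
    """Code under test: falls back to a default when the dependency fails."""
    try:
        return fetch()
    except ConnectionError:
        return "default list"


# Fail roughly half of the calls, with a fixed seed so the run is repeatable.
flaky = with_injected_failures(recommendations_service, failure_rate=0.5,
                               rng=random.Random(42))
results = [homepage(flaky) for _ in range(1000)]

# Total outage of the dependency: every call should use the fallback.
always_down = with_injected_failures(recommendations_service, failure_rate=1.0,
                                     rng=random.Random(0))
outage_results = [homepage(always_down) for _ in range(10)]
```

If `homepage` had no fallback, the injected failures would surface as unhandled errors during the experiment, which is exactly the kind of weakness this approach is meant to expose before a real outage does.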

Although some pointed out that the provider should have intervened to redirect the affected cloud services to a different data center as soon as possible, cloud service customers cannot afford to wait until this policy is enforced when their own operations and customers depend on a quick recovery. Duplicating data and testing on-premises recovery measures might be costly, but it can prove extremely useful.

Takeaways on ensuring company data is always available

Although preparing for the worst is something organizations may find unpleasant, especially after handpicking the most reliable cloud providers, this way of thinking is the most prudent. A few important takeaways on what organizations can do to minimize the effects of cloud outages through their own efforts would be:

  • Never rely solely on Service Level Agreements; instead, complete your own disaster recovery plan;
  • Prepare a “service escrow” component for the above-mentioned plan;
  • Make your entire staff active components of the Service Disaster Recovery Plan;
  • Ensure efficient data access, for example through end-user backups and data storage in multiple locations; there are workarounds that can give at least partial user access when the system is hit by a cloud outage, but access to core data sets is essential for this;
  • Prepare multiple communication channels so that updates and status reports are promptly delivered to your customers in case of an unpleasant event, in order to safeguard at least the customer relations part of your reputation; cloud outages perhaps cannot be helped when they do not depend on your company, but the way the aftermath is handled can help smooth out the consequences and show that you are doing all that is possible to speed up the remediation;
  • One of the most important steps in advance preparation is to fully understand how cloud computing works beforehand; it is indeed critical to get informed in time, to analyze case scenarios in order to anticipate what might happen in case of an outage, and to have your bases covered as far as your own customers’ questions are concerned.