When Good Cloud Goes Bad – Know All About Cloud Outages
Putting your applications into the cloud is easy and affordable, and a centralized, customizable management portal makes them simple to manage. Configuring and provisioning resources, once a manual chore, becomes quick and simple. On the downside, that convenience can also lead to cloud outages.
Just like any IT system, cloud-based services and servers can suffer from outages, but because of the large number of users, the consequences are usually greater.
In spite of this, more corporate customers and end users recognize that service disruptions are inevitable and that the benefits they get from cloud providers such as AWS or Microsoft Azure far outweigh the risks associated with cloud services.
Probable causes for cloud outages
Outages happen whether your computing runs on-premises, at another location, or in the cloud. Some of the usual causes of cloud outages are:
- Operator error
- Network issues
- Software upgrades
- Weather events
Whatever the cause, cloud providers need to carry out a post-incident analysis of how the outage was architecturally possible, what exactly happened, how the system recovered, and what improvements should be made.
Armed with this information and effective management, you can protect your users against these kinds of issues.
Responding to cloud outages
Cloud providers have learned that users are more willing to tolerate an outage if they get an immediate response acknowledging it. Providers should assure customers that they are in fact working to address the problem, and then follow up afterward with a full report on what went wrong, how they fixed it, and what they are going to do to prevent it from happening again.
A thorough Root Cause Analysis (RCA) of the cloud outage is necessary to answer why it happened and how to prevent similar events from happening.
Lessons learned from previous cloud outages
Here are some lessons that can help you prepare for the inevitable cloud outage and minimize the resulting downtime and loss of productivity for your organization.
- Have a backup plan
Check with your cloud services provider or hosting service about its disaster recovery policy. Also, always back up your data and keep redundant copies. Your planning should include a risk/benefit analysis that compares the cost of adding more redundancy with the cost of downtime.
Communicate your plans to all the stakeholders involved (employees, support teams, cloud providers), both during and after the planning phase. Doing so gives everyone involved clear accountability and responsibility during a cloud outage.
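The risk/benefit comparison described here comes down to simple arithmetic. The following is a minimal sketch, assuming you can estimate yearly downtime hours with and without the extra redundancy; the function names and all figures in the usage example are hypothetical and only illustrate the calculation.

```python
def expected_downtime_cost(hours_down_per_year: float, cost_per_hour: float) -> float:
    """Estimated yearly cost of downtime (hours of outage times cost per hour)."""
    return hours_down_per_year * cost_per_hour


def redundancy_pays_off(
    hours_down_without: float,
    hours_down_with: float,
    cost_per_hour: float,
    redundancy_cost_per_year: float,
) -> bool:
    """True if the downtime cost avoided exceeds the yearly cost of the redundancy."""
    savings = (
        expected_downtime_cost(hours_down_without, cost_per_hour)
        - expected_downtime_cost(hours_down_with, cost_per_hour)
    )
    return savings > redundancy_cost_per_year


# Hypothetical figures: redundancy cuts downtime from 8.76 to 0.876 hours a year,
# an hour of downtime costs 10,000, and the redundancy costs 50,000 a year.
print(redundancy_pays_off(8.76, 0.876, 10_000, 50_000))
```

The estimate is only as good as its inputs, so it is worth revisiting whenever your traffic, revenue, or provider pricing changes.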
- Simulate test outage scenarios
Be prepared for a cloud outage by simulating and testing your outage scenarios. Start with a simple outage that affects one application in one location, then extend it across your geo-locations, and gather information on how the system behaves and what the consequences are. Include the results in your actual outage plans.
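A simulated outage can start very small. The sketch below is one possible way to do it, assuming an application served from several regions; `RegionalService`, `SimulatedOutage`, and `serve_with_failover` are hypothetical names invented for this illustration, not part of any provider's tooling.

```python
class SimulatedOutage(Exception):
    """Raised by a service stub while an outage is being simulated."""


class RegionalService:
    """Stand-in for an application deployed in one region."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.outage = False  # flip to True to simulate an outage in this region

    def handle(self, request: str) -> str:
        if self.outage:
            raise SimulatedOutage(f"{self.region} is down")
        return f"{request} served from {self.region}"


def serve_with_failover(services, request):
    """Try each region in order; return the first response and the regions that failed."""
    failed = []
    for service in services:
        try:
            return service.handle(request), failed
        except SimulatedOutage:
            failed.append(service.region)
    raise RuntimeError(f"all regions failed: {failed}")


# Simulate an outage in the first region and record how the system behaves.
primary = RegionalService("region-a")
secondary = RegionalService("region-b")
primary.outage = True
response, failed_regions = serve_with_failover([primary, secondary], "GET /status")
print(response, "| failed regions:", failed_regions)
```

Recording which regions failed and which one answered gives you the kind of behavior data that can feed into the actual outage plan.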
- Keep track of the SLAs
A contract with the provider includes a Service Level Agreement (SLA) that guarantees a level of uptime, often expressed in terms of “nines”. For example, over a year, three nines (99.9%) means no more than 8.76 hours of downtime, four nines (99.99%) means no more than 52.56 minutes, and seven nines (99.99999%) means no more than 3.15 seconds.
If the provider misses that guarantee, it typically adjusts your payments to reflect the loss of service, usually in the form of service credits. Before selecting a particular provider, make sure you clearly understand the SLA terms and conditions.
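The downtime allowed by a given number of nines follows from simple arithmetic. Here is a minimal sketch, assuming a 365-day year and availability measured over the whole year; real SLAs often measure per month and define downtime in their own terms, and the function name is made up for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds; leap years ignored


def max_downtime_seconds(nines: int) -> float:
    """Maximum yearly downtime, in seconds, allowed by an availability of N nines."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** -nines  # e.g. 3 nines (99.9%) -> 0.001
    return SECONDS_PER_YEAR * unavailability


for n in (3, 4, 7):
    seconds = max_downtime_seconds(n)
    print(
        f"{n} nines: {seconds / 3600:.2f} hours, "
        f"{seconds / 60:.2f} minutes, {seconds:.2f} seconds per year"
    )
```

Each extra nine divides the allowed downtime by ten, which is why the higher tiers are so much harder and more expensive to deliver.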
- Securing your apps
Providers usually respond to a cloud outage by turning off the service in question, either automatically or manually, until the issue is resolved. Various management and monitoring tools can tell you whether a particular cloud service is down and which region is affected.
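At its simplest, this kind of monitoring is a health check per region. The following sketch uses only the Python standard library and assumes each region exposes an HTTP health URL; the endpoint URLs, the region names, and the functions `http_probe` and `regions_down` are hypothetical and stand in for whatever your monitoring tool provides.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection errors, timeouts, HTTP error statuses, and malformed URLs
        # are all treated as "down".
        return False


def regions_down(
    endpoints: Dict[str, str],
    probe: Callable[[str], bool] = http_probe,
) -> List[str]:
    """Return the sorted names of regions whose health endpoint fails the probe."""
    return sorted(region for region, url in endpoints.items() if not probe(url))


# Hypothetical endpoints; a fake probe is used here so the example runs offline.
endpoints = {
    "us-east": "https://us-east.example.invalid/health",
    "eu-west": "https://eu-west.example.invalid/health",
}
fake_probe = lambda url: "eu-west" in url  # pretend only eu-west is healthy
print(regions_down(endpoints, probe=fake_probe))
```

Passing the probe in as a parameter keeps the check testable without network access; in production you would call `regions_down(endpoints)` with the default HTTP probe on a schedule.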
- Self Reliance
Don’t always rely on the providers. Analyze the likely outcomes of a cloud outage in advance and have suitable plans to overcome the possible risks. Build redundancies into your apps to help you recognize technical errors and specification problems that might slow down your system.
It’s easy to forget that the cloud is not perfect. To prepare for the inevitable and reduce the impact of a cloud outage on your business, drop us a mail at firstname.lastname@example.org or call us at +91-80-4110-5555.