4 Lessons from This Year’s Top Data Failures

Ask most people in the IT industry, and some in the business world, to describe the ideal data storage, protection and recovery scenario, and most will have a very strong preference for one approach. That preference may be shaped by any number of factors, from their own experience to something they read online, and the opinions are as numerous and varied as the people who hold them.

To complicate things, nothing related to technology ever stands still, so what was true a year or two ago is likely not as accurate or applicable as it used to be, and getting attached to a strong opinion on a particular technology doesn’t necessarily pay. What pays is staying abreast of the ebb and flow in the industry and learning from others’ mistakes, especially when they are as costly as some of the data failures and outages we’ve witnessed so far this year. Let’s take a look at some of them and the conclusions that can be drawn from them.

1. The size of a company and its infrastructure is not always a good predictor of failover and recovery success.

One school of thought on the question of where to put your data firmly believes that keeping it in-house is the safest and most secure way to go. When the company is a large enterprise with the resources to invest in a state-of-the-art, geographically diverse infrastructure, surely its data is as safe as can be. The costly outages that some of the largest US airlines experienced this year suggest otherwise.

  • The Delta Airlines data center outage this August, caused by an electrical equipment failure at one of its data centers, grounded about 2,000 flights over three days and cost the company $150 million.
  • Southwest Airlines experienced an outage in July, which led it to cancel flights over three days. While it didn’t disclose the exact cost of the outage, CNN estimated it to have been at least $177 million in lost passenger revenue, based on ratios Southwest did provide.
  • Also in July, the National Science Foundation suffered a power outage that knocked out its data centers, networks and business applications, crippling its systems for close to 24 hours.
  • The State of Virginia Department of Motor Vehicles lost access to its IT systems for hours on May 24th as a result of a data center outage that disrupted network access for more than 60 state agencies.

2. The size of a hosting cloud provider company is also not always a good predictor of failover and recovery success.

The second school of thought in the world of data management is that the cloud is always the safer option, especially when it comes to the cloud storage giants. While it is true that the cloud technology is always evolving and improving, that doesn’t mean it’s failsafe.

  • Microsoft’s Azure cloud service experienced significant outages in Europe and India over the course of three days in early September, leading to a torrent of angry tweets from customers there.
  • Also in September, Cogeco Peer1’s data center in Atlanta experienced a partial power outage Thursday afternoon, affecting some of the customers in the facility.
  • One of Equinix’s Telecity data centers in London was affected in July by an outage caused by a problem with a UPS system.
  • In Australia, Amazon Web Services (AWS) stated that its utility provider suffered a failure at a regional substation this June, which resulted in the total loss of utility power to multiple AWS facilities. At one of these facilities, the power redundancy didn’t work as designed. As a result, AWS customers in Australia lost services for up to six hours.

3. Telecom companies are not necessarily better protected against data outages than cloud-only providers.

The protection of telecom infrastructure and services has been a matter of national security since the industry’s early days, and it has historically imposed some of the toughest requirements on telecom facilities, to the point of remaining operational in the wake of an EMP blast overhead (an interesting fact I once heard on a tour of a telephone company’s facility). Such tough requirements should surely mean a higher degree of reliability when it comes to data storage. Recent outages at different telecom providers show otherwise. And, with the advent of VoIP technology, outages no longer affect only the Internet, but telephone service as well.

  • A power failure at a data center brought down Comcast Business phone service for nearly 950,000 customers this July.
  • In January, all of JetBlue’s data center-based services failed for several hours after a maintenance issue caused a power failure at the airline’s data center, which is owned and operated by Verizon.
  • When Telia Carrier, the backbone network operator arm of the Swedish telco TeliaSonera, had problems on June 21st, it caused glitchy performance for a whole range of popular sites and services, from Amazon’s infrastructure cloud to Reddit and Facebook’s WhatsApp.
  • This September, Vodafone suffered a seven-hour TITSUP (Total Inability To Support Usual Performance), giving its customers seven hours of ‘try again later’.

4. Using a cloud-based SaaS solution doesn’t always mean better reliability for your business’s operations.

SaaS services are now used so widely at every level of business that establishing a company-wide SaaS policy has become a common and necessary practice for IT departments. One of the main advantages of SaaS solutions is that data is no longer stored on a server inside the organization and is usually accessible from anywhere with a secure internet connection. While that is largely true, when a failure happens, as it inevitably does, it can grind business operations to a halt.

  • In the UK this August, a power outage at the Solihull Data Centre lasting 10 days prompted the insurance tech firm SSP Worldwide to permanently decommission it. The outage had a huge impact on customers, who were unable to access SSP’s services. Over 40 per cent of UK brokers rely on SSP’s SaaS platform in their business operations.
  • When Salesforce experienced an outage and service disruption on its NA14 instance in May, its tech-savvy customers took to Twitter to complain, and organizations were prompted to evaluate the best way to work with cloud software providers.

The bottom line is…

At the end of the day, “The Cloud” is not an ephemeral place that effortlessly takes your data and somehow transforms it so that it can never be lost or inaccessible again. The cloud is made up of buildings, hardware, and the Internet that connects it all together and brings it to a screen near you. No solution can guarantee a ZERO chance of failure. For the large enterprise, the cautionary tale seems to be to not put all your eggs in one basket, whether you hold on to that basket or give it to someone else for safekeeping. A hybrid multi-cloud architecture may be the best bet at the moment: it ensures that your cloud backup has another backup with another provider that runs its own data centers. This arrangement, of course, has its own quirks and is only as good as the people who design the architecture and perform the regular testing and monitoring that give you the best chance of lowering your risk of data loss or outage.
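To make the “eggs in one basket” point concrete, here is a minimal Python sketch of the kind of check a team might run as part of that regular testing. Everything in it is an assumption made for illustration: the `BackupCopy` record, the provider labels, and the 30-day freshness window are hypothetical and do not come from any particular product or from the incidents above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BackupCopy:
    """One copy of a data set (hypothetical record, for illustration only)."""
    provider: str             # assumed label, e.g. "on-prem" or "cloud-a"
    last_verified: datetime   # when a restore test of this copy last succeeded


def independent_providers(copies, now, max_age=timedelta(days=30)):
    """Return the providers holding a copy whose restore test is recent enough."""
    return {c.provider for c in copies if now - c.last_verified <= max_age}


def is_single_basket(copies, now, max_age=timedelta(days=30)):
    """True when fewer than two providers hold a recently verified copy."""
    return len(independent_providers(copies, now, max_age)) < 2
```

A copy that has not passed a restore test within the window is treated as if it did not exist, which reflects the point that an untested backup offers little assurance. A scheduled job could call `is_single_basket` for each critical data set and raise an alert whenever it returns `True`.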

About the Author

Nadya Shkurdyuk
Nadya wears many hats as the head of business development at Prominic. Most of her free time is spent chasing after an active toddler, helping her husband run a family business and learning about something or other.
