The High Cost of “Unpredictable” IT Outages and Disruptions

Morgan Oats - 1 May 2017

(Updated October 10, 2019)

It is no secret that IT service outages and disruptions can cost companies anywhere from thousands to millions of dollars per incident – plus significant damage to company reputation and customer satisfaction. In the most high-profile cases, such as recent IT outages at some major US airlines, the costs can soar to over $150 million per incident (Major US airlines hit with systemwide outages).

While those kinds of major incidents make the headlines, thousands of lesser-known but equally disruptive service-level disruptions and outages happen daily in just about every sizeable enterprise.

The costs of these frequent, often daily, incidents, such as an unexpected slowdown in the response time of a key business application during prime shift (or on Black Friday), can have a significant cumulative financial impact that may not be readily visible in the company’s accounting system.

Analyst Estimates of IT Service Incidents

ISO defines an ‘incident’ as: “an unplanned interruption to a service, a reduction in the quality of a service or an event that has not yet impacted the service to the customer.”

Determining the cost of service incidents is not a new endeavor. Numerous studies and surveys have been conducted to arrive at a concrete figure. Sample sizes and company composition vary across the studies, but the consensus remains that downtime is, as expected, expensive.

According to these studies, this is the cost of service downtime:

ITIC: $100,000 per hour for 98% of businesses
Gartner: $5,600 per minute, or over $300k per hour
Avaya: $140k per incident for the average company; $540k per incident for the financial sector
IDC: $1.25 billion to $2.5 billion per year for Fortune 1000 companies with $1.39 billion revenue.
IHS: $700 billion per year for North American companies, according to IT decision-makers at 400 medium and large organizations in North America that use information and communication technology.
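
Only the time-based figures in the list above can be put on a common footing. As a minimal sketch of that arithmetic, the snippet below converts Gartner's per-minute figure to an hourly rate and prices a hypothetical 45-minute outage at the ITIC hourly rate. The function names and the 45-minute duration are illustrative assumptions, not part of any of the cited studies.

```python
# Time-based figures quoted in the list above.
GARTNER_PER_MINUTE = 5_600   # dollars per minute
ITIC_PER_HOUR = 100_000      # dollars per hour


def per_minute_to_per_hour(cost_per_minute):
    """Convert a per-minute downtime cost to a per-hour cost."""
    return cost_per_minute * 60


def outage_cost(cost_per_hour, minutes):
    """Cost of an outage of the given length at a flat hourly rate."""
    return cost_per_hour * minutes / 60


gartner_per_hour = per_minute_to_per_hour(GARTNER_PER_MINUTE)
print(gartner_per_hour)                 # 336000
print(outage_cost(ITIC_PER_HOUR, 45))   # 75000.0 (hypothetical 45-minute outage)
```

Gartner's per-minute figure works out to $336,000 per hour, which is in line with the roughly $300k per hour quoted. The per-incident and per-year estimates cannot be converted this way without knowing incident duration and frequency.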

Availability and Downtime Cost Categories to Consider

Application performance incidents can sometimes be directly translated into ‘hard costs’:

Employee time spent ‘firefighting’ performance issues, plus overtime or contractor costs. This can extend beyond the IT employees working the specific issue, for example to call center personnel trying to calm upset users or customers
Service Level Agreement (SLA) financial penalties
Lost or deferred revenue if the applications affected are revenue generating
Government fines if the applications affected have regulatory requirements
Litigation or settlement costs
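
To make these categories concrete, here is a rough sketch of how the hard costs of a single incident might be tallied. The class name, field names, and every dollar amount are hypothetical placeholders chosen for illustration; real figures would come from your own time tracking, contracts, and finance data.

```python
from dataclasses import dataclass


@dataclass
class IncidentHardCosts:
    """Hypothetical tally of the hard-cost categories listed above."""
    staff_hours: float                    # IT, call center, and other staff time
    hourly_rate: float                    # blended loaded hourly rate
    overtime_or_contractor: float = 0.0   # extra labor spend
    sla_penalties: float = 0.0
    lost_revenue: float = 0.0             # lost or deferred revenue
    government_fines: float = 0.0
    litigation: float = 0.0               # litigation or settlement costs

    def labor(self):
        """Regular staff time plus overtime or contractor costs."""
        return self.staff_hours * self.hourly_rate + self.overtime_or_contractor

    def total(self):
        """Sum of all hard-cost categories for the incident."""
        return (self.labor() + self.sla_penalties + self.lost_revenue
                + self.government_fines + self.litigation)


# Illustrative numbers only.
example = IncidentHardCosts(
    staff_hours=40,
    hourly_rate=75.0,
    overtime_or_contractor=1_500.0,
    sla_penalties=10_000.0,
    lost_revenue=25_000.0,
)
print(example.labor())   # 4500.0
print(example.total())   # 39500.0
```

Summing such tallies across a month of incidents is one way to surface the cumulative impact that does not show up as a single line in the accounting system.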

Disruptions clearly cost organizations financially, but there are also other negative effects, or ‘soft costs’, which can linger beyond the immediate financial impact, such as:

Customer or user satisfaction decreases
Loss of reputation or brand image
Loss of employee morale and increased turnover among employees who are constantly called in during off hours or on weekends to deal with incidents
Negative image of your IT team and your management
Lost market opportunities

The exact cost figures and financial categories may vary, but the conclusion is the same: IT performance issues are very costly, and it is foolish not to use the latest available technology to prevent them before they occur. Wherever your company falls on the spectrum of incident and downtime costs, you cannot afford them, especially when preventing most service disruptions is entirely feasible with modern software technologies.

Modern Approaches to Preventing IT Outages and Disruptions

Using modern artificial intelligence techniques, combined with automated analysis of end-to-end infrastructure performance for key business applications, offers the ability to catch many incidents before they impact business users. In our experience, over 85% of ‘unpredictable’ incidents in enterprises could have been easily predicted – even up to days in advance of the actual service disruption.

Predicting incidents, however, requires continual automated analysis of critical ‘leading indicator’ metrics across the infrastructure, combined with knowledge of what is ‘good or bad’ for each specific infrastructure component. One of our customers, Medavie Blue Cross, provides an example of how easy it can be to make hidden issues visible when using a modern approach to infrastructure performance that includes those capabilities:

“We installed IntelliMagic Vision and looked at the fabric dashboard. It immediately showed an issue that had been hidden until then…. IntelliMagic is crucial to avoiding performance and configuration issues” – Marc LeBlanc, Medavie Blue Cross Storage Administrator
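
The general idea of checking leading-indicator metrics against component-specific limits can be sketched in a few lines. This is a deliberately simplified illustration and not a description of how IntelliMagic Vision works: the metric names, threshold values, and the straight-line projection are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warning: float
    critical: float


# Hypothetical per-component knowledge of what is 'good or bad'.
THRESHOLDS = {
    "fabric_port_utilization_pct": Threshold(warning=60.0, critical=80.0),
    "disk_response_time_ms": Threshold(warning=5.0, critical=10.0),
}


def rate(metric, value):
    """Rate a single sample against the thresholds for its metric."""
    t = THRESHOLDS[metric]
    if value >= t.critical:
        return "critical"
    if value >= t.warning:
        return "warning"
    return "ok"


def intervals_until_critical(metric, history):
    """Project how many sampling intervals remain before the critical
    threshold is crossed, using a straight line from the first to the
    last sample. Returns None if the metric is not rising."""
    remaining = THRESHOLDS[metric].critical - history[-1]
    if remaining <= 0:
        return 0.0  # already at or past the critical threshold
    if len(history) < 2:
        return None
    slope = (history[-1] - history[0]) / (len(history) - 1)
    if slope <= 0:
        return None
    return remaining / slope


# Four daily samples of a hypothetical fabric port utilization metric.
daily = [50.0, 55.0, 60.0, 65.0]
print(rate("fabric_port_utilization_pct", daily[-1]))                # warning
print(intervals_until_critical("fabric_port_utilization_pct", daily))  # 3.0
```

In this toy data the metric is still below the critical level, yet the trend suggests it would cross that level in about three days, which is the kind of advance notice that turns an ‘unpredictable’ incident into a predictable one.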

So, by that estimate, there is an 85% chance your most recent infrastructure incident could have been predicted and its associated hard and soft costs avoided. If you would like to see for yourself where your next ‘unpredictable’ performance issue will be, you can provide IntelliMagic with a week of performance metadata from your current environment and get back a customized analysis of risk areas.

Whether your company’s cost per incident is $10,000 or $150 million, significantly reducing the number of incidents will generate meaningful cost savings very quickly. And perhaps more importantly, it will keep your company out of the press headlines – your CEO will thank you for that.

