The Real Cost of Downtime

What is involved in calculating the cost of downtime?

When it comes to calculating the cost of downtime, the answer may not be as obvious as you think. Unexpected IT outages can unleash a procession of consequences that are direct and indirect, tangible and intangible, short-term and long-term, immediate and far-reaching. This article is based on US studies, and costs are quoted in US dollars (the principles, however, apply to any country and currency). The costs of downtime include:

Tangible/Direct Costs   

Lost transaction revenue
Lost wages
Lost inventory
Remedial labour costs
Marketing costs
Bank fees
Legal penalties

Intangible/Indirect Costs

Lost business opportunities
Loss of employees and/or employee morale
Decrease in stock value
Loss of customer/partner goodwill
Brand damage
Driving business to competitors
Bad publicity/press

The monetary value that can be assigned to each hour of downtime varies widely depending on the nature of your business, the size of your company and how critical your IT systems are to your primary revenue-generating processes. For instance, a global financial services firm may lose millions of dollars for every hour of downtime, whereas a small manufacturer that uses IT only as an administrative tool might lose only a small amount of productivity.

Are there industry averages that I can use to benchmark my downtime costs?

Yes. According to estimates from studies and surveys performed by IT industry analyst firms, businesses lose on average between $84,000 and $108,000 (US) for every hour of IT system downtime. Financial services, telecommunications, manufacturing and energy are among the industries with the highest rates of revenue loss during IT downtime.

Typical Hourly Cost of Downtime by Industry

Industry              Hourly cost (US $)
Brokerage services    6.48 million
Energy                2.8 million
Telecom               2.0 million
Manufacturing         1.6 million
Retail                1.1 million
Health care           636,000
Media                 90,000

Sources: Network Computing, the Meta Group and Contingency Planning Research. All figures in US dollars.

What is the “true cost of downtime”?

While the costs of idle labour and lost productivity may seem to be the most substantial costs of downtime, any estimate of the true cost of downtime should include the value of the opportunities that were lost while the applications were unavailable.

For example, consider a company that averages a gross profit margin of $100,000 per hour from web and telemarketing sales. If its order-processing systems crash for an hour, making it impossible to take orders, what is the cost of the outage? The easy, but erroneous, answer would be $100,000. Some customers will be persistent and call or click back at another time. Sales to them are not lost; cash flow is simply delayed. However, some prospects and customers will give up and go to a competitor.

Still, the value of the purchases that these customers would have made during the outage likely understates the loss to the company, because a satisfied customer can become a loyal customer. Dissatisfied customers, and prospects that never become customers, do not. Consider a prospect who would have made an immediate $100 purchase and then repeated that purchase once a year. Using a discount rate of 15 percent, the present value of 20 such annual purchases (the first made immediately) is $719.82. In this example, the company’s loss is more than seven times the value of the first lost sale. (A lower discount rate produces an even higher value.)
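The present-value figure above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: 20 annual purchases of $100, the first made immediately (and therefore not discounted), each later one discounted at 15 percent a year. The function name `lifetime_value` is hypothetical.

```python
def lifetime_value(purchase: float, rate: float, years: int) -> float:
    """Present value of `years` equal annual purchases, the first made today.

    Assumes purchase t (t = 0, 1, ..., years - 1) arrives at the start of
    year t, so it is discounted by (1 + rate) ** t.
    """
    return sum(purchase / (1 + rate) ** t for t in range(years))


pv = lifetime_value(purchase=100.0, rate=0.15, years=20)
print(f"Present value: ${pv:,.2f}")                # Present value: $719.82
print(f"Multiple of first sale: {pv / 100:.1f}x")  # Multiple of first sale: 7.2x
print(lifetime_value(100.0, 0.10, 20) > pv)        # True: a lower rate gives a higher value
```

Changing `rate` or `years` shows how sensitive the loyalty argument is to those two assumptions.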

How do I determine the true cost of downtime for my organisation?

While it is difficult to calculate a definitive cost of downtime precisely, the following steps will help you develop a reasonable estimate.

Step 1.
Understand Your Current Reliability

One step in projecting the number of hours that a system will be down each year is to estimate the system’s reliability. This is not the same as the reliability figures provided by hardware vendors, because a system depends on a combination of hardware, software and networking components. To use a network-based system, at a minimum, all of the following must work:

Power supplies.
Central processing units (in all relevant servers and client computers).
Operating systems on all participating systems.
Server disk drives.
Database management system on the servers.
Application software.
Network switching and routing devices.
Network connections.

For example, 99 percent CPU reliability does not necessarily mean that the system will achieve 99 percent uptime. If the system depends on 10 components, each of which is 99 percent reliable, and it works only when all 10 are working, its overall reliability is 0.99^10, or about 90.44 percent (assuming the components fail independently of one another).

Therefore, this system is expected to be unavailable about 9.56 percent of the time, which, in a 24-hour-a-day, 365-day-a-year environment, translates to almost 838 hours (35 days) of downtime each year.
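The arithmetic in the two paragraphs above can be checked with a short sketch. It assumes, as the example does, that the components are in series (the system is up only when every component is up) and that their failures are independent; `series_availability` and `HOURS_PER_YEAR` are illustrative names.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a 24x7, 365-day year


def series_availability(component_availabilities: list[float]) -> float:
    """Availability of a system that works only when every component works.

    Assumes component failures are independent, so the probabilities multiply.
    """
    return prod(component_availabilities)


availability = series_availability([0.99] * 10)
downtime_hours = (1 - availability) * HOURS_PER_YEAR
print(f"System availability: {availability:.2%}")   # System availability: 90.44%
print(f"Expected downtime: {downtime_hours:.0f} hours "
      f"({downtime_hours / 24:.0f} days) per year")  # 838 hours (35 days) per year
```

Real components rarely share the same reliability figure, so in practice the list would hold a separate estimate for each component.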

Step 2.
Determine the Amount of Planned Downtime

While unplanned downtime may be significant, often more than 90 percent of downtime is planned. Estimates of yearly planned downtime are usually more accurate than estimates of unplanned downtime, because maintenance activities either follow rigid schedules or recur at frequencies that are reasonably predictable over a year. The first step in deriving an estimate is to perform a rigorous audit of all normal maintenance activities, such as database backups and reorganisations. For each such activity, multiply the historical average downtime per occurrence, adjusted for any growth trends, by the number of times the activity is performed per year.

The timing of other planned activities, such as hardware and software updates, is less consistent, but historical averages provide a sufficient guide to the frequency and duration of the required downtime. These averages can be adjusted to incorporate any knowledge of upcoming upgrade requirements.
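The audit arithmetic described above can be sketched as follows. The activity names, durations, frequencies and the 10 percent growth adjustment are hypothetical examples, and `MaintenanceActivity` and `planned_downtime_hours` are illustrative names rather than part of any tool.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceActivity:
    name: str
    avg_hours_per_occurrence: float  # historical average downtime per occurrence
    occurrences_per_year: int
    growth_factor: float = 1.0       # e.g. 1.1 if durations are trending 10% longer


def planned_downtime_hours(activities: list[MaintenanceActivity]) -> float:
    """Sum over activities of average downtime x growth adjustment x yearly frequency."""
    return sum(
        a.avg_hours_per_occurrence * a.growth_factor * a.occurrences_per_year
        for a in activities
    )


# Hypothetical audit results, for illustration only.
audit = [
    MaintenanceActivity("Nightly database backup", 0.5, 365, growth_factor=1.1),
    MaintenanceActivity("Quarterly database reorganisation", 4.0, 4),
    MaintenanceActivity("Software updates (historical average)", 2.0, 6),
]
print(f"Planned downtime: {planned_downtime_hours(audit):.1f} hours per year")
```

A known upcoming upgrade can be reflected simply by adding it to the list as one more activity.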

Step 3.
Calculate Hourly Costs

While it is impossible to predict the precise loss from an outage, it is important to derive reasonable estimates. Only then is it possible to judge the economically appropriate level of investment in data recovery or information availability software solutions. Losses in labour, revenue and service all contribute to the total cost of downtime. A good starting point for evaluating these factors is to collect statistics on both the duration and the associated costs of past downtime, as recorded by the accounting department. The main cost categories are:

Labour: Employees generally continue to receive full pay even if an out-of-service system cripples their productivity. A historical analysis usually provides a sufficient prediction of the cost of this lost time.

Examine which employees were affected by past outages, how many were affected and to what extent. Some employees can continue to do some productive work during a system outage, while others may be completely idle. Estimate each group’s decline in productivity as a percentage of normal output.

Next, estimate the value of an hour of lost productivity. A good surrogate measure is the average hourly cost of salary, benefits and overhead for the affected group; the human resources department can usually provide this number. Because businesses aim to earn a profit, the value employees contribute is usually greater than the cost of employing them. Using salary, benefits and overhead as the measure of lost productivity therefore yields a very conservative cost/benefit analysis.

The following equation can be used to calculate the average labour cost of downtime. Since labour costs and the impact of outages vary, to achieve a high degree of accuracy, this equation must be repeated for each department and employee classification. However, a shortcut that groups similar employees into a single class is usually sufficient.

LABOUR COST = P × E × R × H

where:

P = number of people affected
E = average percentage by which they are affected
R = average employee cost per hour
H = number of hours of outage
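A direct translation of the labour-cost equation, applied per employee class and then summed as the text suggests. This is a minimal sketch: E is expressed as a fraction rather than a percentage, and the employee classes, head counts and hourly costs are invented for illustration.

```python
def labour_cost(people: int, pct_affected: float,
                cost_per_hour: float, hours: float) -> float:
    """LABOUR COST = P x E x R x H, with E given as a fraction (0.0 to 1.0)."""
    return people * pct_affected * cost_per_hour * hours


# Hypothetical employee classes: (P people, E fraction affected, R cost per hour).
employee_classes = [
    (200, 1.0, 45.0),  # e.g. order-entry staff who are completely idled
    (50, 0.4, 80.0),   # e.g. analysts who can keep doing some work offline
]
outage_hours = 3.0

total = sum(labour_cost(p, e, r, outage_hours) for p, e, r in employee_classes)
print(f"Labour cost of a 3-hour outage: ${total:,.0f}")  # $31,800
```

Grouping similar employees into a few classes, as here, is the shortcut the text describes; a more precise estimate would use one entry per department and employee classification.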

Revenue: The simplest way to project the potential annual revenue loss from downtime is with the equation:

LOST REVENUE = (GR / TH) × I × H

where:

GR = gross yearly revenue
TH = total yearly business hours
I = percentage impact
H = number of hours of downtime

The first two elements of this equation provide an estimate of the revenue generated in an hour. The percentage impact is an adjustment that scales the hourly revenue number based on a best estimate of both the company’s ability to recover business lost during an outage and the lifetime value of customers who are permanently lost to the competition.
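The lost-revenue equation translates directly into code. In this sketch the percentage impact I is passed as a fraction, and the company figures (revenue, business hours, impact and downtime) are hypothetical.

```python
def lost_revenue(gross_yearly_revenue: float, yearly_business_hours: float,
                 impact: float, downtime_hours: float) -> float:
    """LOST REVENUE = (GR / TH) x I x H, with I given as a fraction."""
    revenue_per_hour = gross_yearly_revenue / yearly_business_hours  # GR / TH
    return revenue_per_hour * impact * downtime_hours


# Hypothetical company: $50 million a year over 2,000 business hours, with an
# estimated 60% of outage-window revenue never recovered, and 10 hours of downtime.
print(f"Lost revenue: ${lost_revenue(50_000_000, 2_000, 0.60, 10):,.0f}")  # $150,000
```

A business that sells around the clock would use 8,760 for TH instead of a business-hours figure.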

Loyalty Factor: The sales-per-hour figure does not include the value of customer loyalty. To assess total lost sales more accurately, the impact percentage must be increased to reflect the lifetime value of customers who permanently defect to a competitor. If a large percentage of customers typically become very loyal after a satisfactory buying experience, the impact factor may significantly exceed 100 percent, possibly by a high multiple. Because determining lifetime value requires a long history of data and assumes, often inaccurately, that the future will reflect the past, an educated guess must suffice.
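One hypothetical way to turn that educated guess into a number is to assume that customers who come back later cost nothing (their cash flow is merely delayed), while customers who defect cost their whole lifetime value, expressed as a multiple of a single sale, such as the "more than seven times" (about 7.2) from the earlier present-value example. This decomposition and the 30 percent defection rate are illustrative assumptions, not a formula from the studies cited.

```python
def loyalty_adjusted_impact(defect_fraction: float, lifetime_multiple: float) -> float:
    """Hypothetical impact factor I for the lost-revenue equation.

    Customers who simply return later only delay cash flow, so they are
    assumed to add nothing. Customers who defect are assumed to cost their
    whole lifetime value, expressed as a multiple of one sale.
    """
    return defect_fraction * lifetime_multiple


impact = loyalty_adjusted_impact(defect_fraction=0.30, lifetime_multiple=7.2)
print(f"Impact factor: {impact:.0%}")  # Impact factor: 216%
```

Under this model, I exceeds 100 percent whenever the share of defecting customers multiplied by the lifetime multiple is greater than 1.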

Intangible Costs: This category covers the more intangible costs of downtime, along with miscellaneous service costs that do not fall into any of the categories above. Questions to consider include:

Will there be any late delivery surcharges?
Will overtime pay be required to make up for lost productivity?
Will any critical financial filing deadlines be missed? If so, will penalties be assessed?
Will frequent and/or long system outages tarnish the company’s image in the minds of customers and investors?
Will there be an adverse effect on the company’s stock price?
Will a loss of customer goodwill erode the company’s ongoing revenue stream?
Will it be necessary to plan and execute campaigns to explain and apologise for the lack of service?

Service costs are rarely zero. Downtime usually leads to a cascade of related costs. The accounting department can help to identify all such service costs incurred during or after a previous outage. The total of these costs must be divided by the total number of hours the systems were down to determine the cost per hour.
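The per-hour service cost is a simple ratio of recorded costs to recorded downtime. The sketch below uses invented figures of the kind the accounting department might supply; `service_cost_per_hour` is an illustrative name.

```python
def service_cost_per_hour(service_costs: dict[str, float], outage_hours: float) -> float:
    """Total service costs from past outages divided by total hours of downtime."""
    return sum(service_costs.values()) / outage_hours


# Hypothetical costs recorded during and after 16 hours of past outages.
past_service_costs = {
    "late delivery surcharges": 12_000.0,
    "overtime to clear the backlog": 18_500.0,
    "missed filing penalty": 25_000.0,
    "apology campaign": 40_000.0,
}
rate = service_cost_per_hour(past_service_costs, outage_hours=16.0)
print(f"Service cost per hour of downtime: ${rate:,.2f}")  # $5,968.75
```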

Time-Dependent Costs: The hourly cost of downtime varies with the time of day. In most companies, few employees work in the middle of the night, so a system shutdown then would have only minimal impact on corporate productivity. Likewise, even companies that sell around the clock experience busy and slow periods. For instance, if a North America-focused retailer’s website is unavailable (for whatever reason) at 4 a.m., the impact would likely be significantly less than if it were down at 2 p.m. Downtime costs also usually vary depending on whether an outage occurs on a weekday, a weekend or a holiday. Since unplanned downtime can occur at any time, one approach to calculating its hourly cost is to average all hourly costs across a whole week. However, because some problems result from system overloading, which occurs at the most costly times, a more conservative approach is to weight the average toward those peak periods. Unlike unplanned downtime, planned downtime can be scheduled for the least costly times. However, if scheduling maintenance at night or on weekends requires the payment of overtime and/or shift premiums, these costs must be factored into the calculation.
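The difference between a simple weekly average and a peak-weighted average can be illustrated with a small sketch. The hourly cost figures, the 09:00 to 17:00 weekday peak and the assumption that peak hours are twice as likely to suffer an outage are all hypothetical.

```python
from typing import Optional


def average_hourly_cost(hourly_costs: list[float],
                        weights: Optional[list[float]] = None) -> float:
    """Average cost of one downtime hour across a week of hourly estimates.

    With no weights, every hour is treated as equally likely to suffer an
    outage. Weights (for example, higher at peak load, when overload failures
    are more likely) give the more conservative weighted average.
    """
    if weights is None:
        weights = [1.0] * len(hourly_costs)
    return sum(c * w for c, w in zip(hourly_costs, weights)) / sum(weights)


# Hypothetical week: $50,000/hour during weekday business hours (09:00-17:00),
# $5,000/hour otherwise; peak hours assumed twice as likely to fail.
costs, peak_weights = [], []
for day in range(7):            # 0 = Monday ... 6 = Sunday
    for hour in range(24):
        peak = day < 5 and 9 <= hour < 17
        costs.append(50_000.0 if peak else 5_000.0)
        peak_weights.append(2.0 if peak else 1.0)

print(f"Simple weekly average: ${average_hourly_cost(costs):,.0f}")               # $15,714
print(f"Peak-weighted average: ${average_hourly_cost(costs, peak_weights):,.0f}")  # $22,308
```

For planned downtime, the relevant figure is closer to the cost of the cheapest hours, plus any overtime or shift premiums paid to work them.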

Total Costs: Adding up all of the above costs gives a reasonable forecast of the expected loss from an hour of downtime for a particular system. (Costs vary with the nature of the application, so this calculation must be performed for each system.) To calculate the expected annual cost, multiply this figure by the expected number of hours of downtime per year. When all factors are considered, the potential loss from downtime shocks most people the first time they calculate it. To benchmark your cost of downtime against averages for your industry, see the Typical Hourly Cost of Downtime by Industry table earlier in this article.
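Putting the pieces together for each system might look like the sketch below. The per-system hourly figures and expected downtime hours are hypothetical, and `expected_annual_cost` is an illustrative name.

```python
def expected_annual_cost(labour_per_hour: float, revenue_per_hour: float,
                         service_per_hour: float, downtime_hours_per_year: float) -> float:
    """Expected yearly loss for one system: total hourly cost x expected downtime hours."""
    return (labour_per_hour + revenue_per_hour + service_per_hour) * downtime_hours_per_year


# Hypothetical per-system estimates; the calculation is repeated for each system.
systems = {
    # name: (labour/hour, lost revenue/hour, service/hour, expected downtime hours/year)
    "Order processing": (10_000.0, 15_000.0, 6_000.0, 40.0),
    "Internal email": (2_000.0, 0.0, 500.0, 20.0),
}
for name, figures in systems.items():
    print(f"{name}: ${expected_annual_cost(*figures):,.0f} per year")
# Order processing: $1,240,000 per year
# Internal email: $50,000 per year
```

Because planned and unplanned hours usually carry different hourly costs, a fuller version might run this separately for each kind of downtime and add the results.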

Information-Sharing Opportunity Costs: The above discussion considers only the cost of downtime for systems as they exist now. It does not estimate the value that could be gained by making data available to those who could benefit from it when that data is currently inaccessible because it sits on isolated and/or incompatible systems. Nor does it account for the costs that could be eliminated by automating or eliminating the manual transfer or re-entry of data that must be available in two or more incompatible systems. Some of these costs are relatively easy to calculate.

For example, an operator may manually extract data nightly from one system into a flat file and then load it into another. By automating such a transfer, the cost of that operator’s time could be eliminated. Other costs, particularly the value lost by not having ready access to enterprise data, are more difficult to estimate. For instance, consider a company that cannot give customers online access to billing information because that data currently resides on a platform that is incompatible with the systems that customers can access. It is impossible to determine conclusively how highly customers would value online access and, more importantly, how that would affect their purchase decisions and long-term loyalty. Yet, this value must be estimated in order to evaluate the potential return on a data recovery/information availability software solution.

Does downtime affect my compliance readiness?

Yes. Many current regulations require companies to meet more stringent information availability standards. Several new laws and regulations, aimed at specific industries or at a broad cross-section of companies, mandate the protection of business data and system availability. Businesses may incur government penalties for failing to comply with these data or business availability requirements. Any downtime calculation should therefore include an estimate of the financial penalties, legal liabilities and other costs of failing to provide the required data or responsiveness within the specified time period.

What is a business impact analysis? And why do I need one?

A business impact analysis covers several areas, but its ultimate goal is to help you determine the cost of downtime. It identifies the critical business functions based on data or application integrity and the sensitivity of each function to downtime. You will want to determine the maximum outage that each business function can sustain before the business is affected.

The major benefit of a business impact analysis is that it identifies the costs associated with downtime as well as the financial impact on the business. It addresses both long- and short-term outages and helps you determine what the recovery point objective should be for each business function (see “What is a recovery point?” in Workbook 3: Determining Business Resiliency).

A business impact analysis will determine the data and resources most critical to the organisation and the methods and alternatives required to sustain availability and continuity. Finally, the business impact analysis will help define the solutions or tactics required to offset the costs incurred during a business function outage.

9 Steps to Business Impact Assessment

1 – Identify critical business functions based on data/application integrity and time sensitivity to downtime.

2 – Determine the maximum outage that a specific business function can sustain before it impacts the business.

3 – Determine the costs associated with the various disruption scenarios.

4 – Identify the financial (revenue), productivity (expense) and personal (goodwill) impacts of a business function disruption.

5 – Assess both long-term (permanent) and short-term outages or disruptions.

6 – Determine the recovery priority of each business function.

7 – Identify the most critical or vital data and the resources required to resume a business function.

8 – Define alternatives to sustain continuity.

9 – Define the various solutions/tactics that can be used to offset the costs incurred during a business function outage.

What should I include in a downtime threat analysis?

To understand the potential impact downtime can have on your organisation, you should identify threats from both internal and external sources. These could include natural events as well as man-made events. Spend time thinking about what could actually happen in your region of the world and plan accordingly. Events that cause or contribute to downtime may be accidental or intentional. Some events may be within your control while others are not. Some, like hurricanes, will give you ample warning; others may happen quickly and leave you very little reaction time.

Once you determine which types of events are likely to affect you, establish ways to gather information about them. For example, you might add email alerts from local weather stations to your plan so that you are aware of weather events. For some types of events, you will want to assess both the likelihood of the event and its potential severity with reasonable confidence in order to react properly. Once a process for identifying and monitoring events is in place, you will need procedures to ensure that it is a sustained effort. For example, if you receive emails alerting you to weather conditions, you need to delete old alerts, keep recipient information current and update the evaluation process over time.

Also, be sure to identify the key security and legislative issues that may affect your response. For example, if you must move locations when a disaster occurs, make sure the proper security is in place for users and for devices connecting to the new server. The final step is to carry out a cost-benefit analysis of the identified loss potential.

Cost-Benefit Analysis Process

1 – Identify threats from both internal and external sources. These should include, but not be limited to, the following:

Natural, man-made, technological or political disasters.
Accidental versus intentional.
Internal versus external.
Controllable risks versus those beyond the organisation’s control.
Events with prior warnings versus those with no prior warnings.

2 – Determine the probability of each event.

Create methods of information gathering on each event.
Identify information sources.
Assess and assign a credibility factor to each information source.
Develop a suitable method to evaluate probability versus severity.

3 – Identify the relevant key security, legislative or compliance issues.

4 – Establish a cost to be associated with each compliance issue.

5 – Establish a way to support your evaluation process on an ongoing basis.
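One common way to "evaluate probability versus severity" in risk management is annualised loss expectancy: the expected number of occurrences per year multiplied by the cost of a single occurrence. The sketch below swaps in that technique as one possible method; it does not model the credibility factors above, and the threats, rates and costs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    events_per_year: float   # estimated annual rate of occurrence
    cost_per_event: float    # estimated downtime cost of a single occurrence

    @property
    def expected_annual_loss(self) -> float:
        return self.events_per_year * self.cost_per_event


# Hypothetical threat register, for illustration only.
threats = [
    Threat("Regional flood", events_per_year=0.02, cost_per_event=2_000_000),
    Threat("Ransomware attack", events_per_year=0.10, cost_per_event=800_000),
    Threat("Utility power failure", events_per_year=2.0, cost_per_event=30_000),
]
for t in sorted(threats, key=lambda t: t.expected_annual_loss, reverse=True):
    print(f"{t.name}: ${t.expected_annual_loss:,.0f} per year")
# Ransomware attack: $80,000 per year
# Utility power failure: $60,000 per year
# Regional flood: $40,000 per year
```

Ranking by expected annual loss highlights where spending is likely to pay off, but a rare, very severe event may still deserve attention on its own merits.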

How do I present my findings to management?

Document the costs of downtime, including the effects on the company’s brand image. Base the recommended solution on the cost of planned and unplanned outages compared with the cost of the solution. Show how the solution offsets the costs of downtime, especially planned downtime. Show how additional benefits can increase the ROI of the solution. Present several options, with their costs, against a number of typical scenarios.
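A simple comparison table for management can be generated from yearly figures, as in the sketch below. The two options, their costs, the downtime costs they avoid and the extra benefits are hypothetical, and the ROI definition used here, (benefits minus cost) divided by cost, is one common convention rather than the only one.

```python
def roi(downtime_cost_avoided: float, solution_cost: float,
        additional_benefits: float = 0.0) -> float:
    """Simple yearly ROI: (benefits - cost) / cost."""
    return (downtime_cost_avoided + additional_benefits - solution_cost) / solution_cost


# Hypothetical options: (downtime cost avoided, solution cost, additional benefits) per year.
options = {
    "Option A: faster backup and restore": (300_000.0, 100_000.0, 0.0),
    "Option B: continuous replication": (1_000_000.0, 400_000.0, 200_000.0),
}
for name, (avoided, cost, extra) in options.items():
    print(f"{name}: ROI {roi(avoided, cost):.0%} without extra benefits, "
          f"{roi(avoided, cost, extra):.0%} with them; "
          f"net benefit ${avoided + extra - cost:,.0f}")
```

Showing net benefit alongside ROI matters: in this example both options reach the same ROI once additional benefits are included, yet Option B delivers four times the net benefit.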
