Downtime prevention is becoming a top priority for organizations across all market sectors — from manufacturing, building security and telecommunications to financial services, public safety and healthcare. What’s driving this requirement for always-on applications? It’s partly due to the rapid expansion of users, environments, and devices. Increasingly, however, organizations require high application availability to compete successfully in a global economy, comply with regulations, mitigate potential disasters, and plan for business continuity. All these factors contribute to a growing demand for high-performance availability solutions to keep applications up and running.
The good news is that there are many effective availability solutions on the market today, including standard servers with backup, continuous data replication, traditional high-availability clusters, virtualization, and fault-tolerant solutions. But with so many options, figuring out which approach is good enough for your organization can seem overwhelming.
Understanding the criticality of your computing environment is a good place to start. This involves assessing downtime consequences on an application-by-application basis. If you’ve virtualized applications to save costs and optimize resources, remember that your virtualized servers present a single point of failure that extends to all the virtual machines running on them, thereby increasing the potential impact of downtime. Depending on the criticality of your applications, you may be able to get by with the availability features built into your existing infrastructure or you may need to invest in a more powerful and reliable availability solution — perhaps one that proactively prevents downtime rather than just speeding and simplifying recovery.
But availability level is not the only factor to consider when selecting a best-fit solution to protect your applications against downtime. Stratus has created a Downtime Prevention Buyer’s Guide to streamline the evaluation process and, ultimately, help you make the right choice of availability solution. The guide presents six key questions you should ask vendors along with valuable insights into the strengths and limitations of various approaches. You can use vendors’ responses to objectively compare solutions and identify those that best meet your availability requirements, recovery time objectives, IT management capabilities, and return on investment goals, while integrating seamlessly within your existing IT infrastructure.
Craig Resnick of the ARC Advisory Group shared his insights on how to eliminate unplanned downtime and future-proof automation system assets in a recent webinar. The webinar reviewed the consequences of unplanned downtime and some of its leading causes. It discussed strategies to reduce unplanned downtime by implementing updated SCADA systems and using technologies such as virtualization and fault-tolerant computers, as well as how organizations can leverage those strategies to prepare for the coming wave of IIoT.
Here’s a summary of the key take-aways:
- Understanding the true impact of unplanned downtime can lead to a better understanding of where investments can be made in automation systems to reduce such events.
- Unplanned downtime can occur from a variety of areas, including human errors, failure of assets that are not part of the direct supervisory and control chain, and failure of the SCADA systems themselves. The result is lowered OEE, decreased efficiency and reduced profitability.
- Adopting standards-based platforms and implementing technologies such as virtualization can consolidate SCADA server infrastructure and deliver a range of benefits, such as simplified management, easy testing and upgrading of existing and new applications and preparation for the IIoT.
- When virtualizing it is important to understand that you need to protect your server assets, as moving everything to a single virtualized platform means that everything fails if the platform fails. There are various strategies to prevent this, but it is important to ensure that you don’t swap the complexity of a single server per application for a complex failure recovery mechanism in a virtualized environment.
- Fault-tolerant platforms are a key way to avoid this complexity, delivering simplicity and reliability in virtualized implementations, eliminating unplanned downtime and preventing data loss – a critical element in many automation environments, and essential for IIoT analytics. It is important to note that disaster recovery should not be confused with fault tolerance. DR provides geographic redundancy in case of catastrophic failures, but will not prevent some downtime or data loss. In fact, fault tolerance and DR are complementary, and they are often implemented together.
- IIoT is driving OT and IT together, so it is important to understand the priorities of each organization. In fact, OT and IT share a lot of common ground when it comes to key issues, and this is a good starting point for cooperation in the move towards IIoT. Common requirements include no unscheduled downtime, cyber-security, the need for scalable and upgradeable systems and applications, as well as measurable increases in ROI, ROA and KPIs. Last but not least is future-proofing systems and preparing for future IIoT applications.
This webinar is a good way to start the process of looking into what needs to be considered for upgrading and modernizing automation compute assets, using technologies such as virtualization and fault tolerance, as the industry evolves to increased levels of efficiency and moves towards implementing IIoT.
What does Always-On mean, and who is really Always-On?
While there are several companies claiming to have an always-on solution, always-on actually represents a much wider spectrum than just on/off. IBM, HPE, Oracle, Microsoft and VMware are among the usual tech titans talking about Always-On solutions of one type or another, joined by others such as Stratus, Veeam and NEC. But what exactly does it mean within the spectrum of always-on solutions?
An always-on environment would be exactly that: always on. Creating an always-on environment starts with a solution that ensures that the customer’s virtual machines and applications are up and running 24×7 and aren’t being recovered or restored after a catastrophic outage has crashed the host server. Industry-wide, there are only two vendors capable of providing a solution that eliminates any amount of unplanned downtime: Stratus and VMware. While both vendors ensure maximum uptime, only Stratus delivers a single-server solution backed by a $50,000 Zero Downtime Guarantee. Furthermore, Stratus’ hardware-based solution incurs no performance overhead, nor does it limit the number of vCPUs or amount of virtual memory supported per VM.
Another company that uses “Always-On” in its tagline, Veeam is one of the industry’s leading providers of solutions that address backup requirements for virtualized workloads. Founded in 2006, the Swiss-based company is ranked by IDC as a top 5 data protection software vendor whose products are experiencing rapid adoption, including a large percentage of Stratus customers.
Veeam Backup & Replication and Veeam ONE (offered as a packaged solution called the Veeam Availability Suite) together form a well-designed solution that delivers a combination of backup and recovery capabilities and onsite and remote image-based replication with monitoring and alerting features. Taken together, this combination of attributes enables users to meet service level agreements that require recovery time objectives (RTO) and recovery point objectives (RPO) of less than 15 minutes for all applications and data.
While RPOs and RTOs of 15 minutes or less are very impressive and may be acceptable for many environments, we believe that ultimately, Veeam’s offerings can be more aptly categorized as an “always-recoverable” solution.
Ultimately, users seeking to maximize the total availability of their environment and its data should consider a combination of “always-on” and “always-recoverable” solutions from best-in-class vendors including Stratus and Veeam.
Learn what you can do to eliminate the costs of downtime with Application Availability Solutions from Stratus.
#Uptime Facts to Keep Your Organization “Always-On”
By Jason Andersen
Downtime. It can mean lost revenue, customer dissatisfaction, reputation damage and, depending on the nature of your organization, even loss of life. That’s why we at Stratus are committed to building technology that enables always-on applications.
To communicate the importance of keeping your systems up and running in order to ensure uninterrupted 24 x 7 x 365 performance of your essential business operations, we recently shared some #Uptime Facts on our Twitter account, @StratusAlwaysOn.
Be sure to start following us on Twitter for more info and stats, helpful advice and technological solutions to your business challenges. And, in the meantime, you can find a compilation of these eye-opening #Uptime Facts below:
- #Uptime Facts: Underutilized servers could be costing your organization an additional $1.68 million per year. [Uptime Institute]
- #Uptime Facts: A recent study found over 80% of #sysadmins reported that their #downtime costs exceeded $50k per hour. [The Availability Digest]
- #Uptime Facts: At 99.9995% #Availability, your average downtime is as low as 2 minutes and 38 seconds [Stratus High Availability Journey Infographic]
- #Uptime Facts: 2014 survey from @uptimeinstitute found 56% of organizations don’t have unscheduled datacenter disaster drills [Uptime Institute]
- #Uptime Facts: Running as is, data centers are wasting 16.8 kW of power = $1.68 million per year. [Uptime Institute]
- #Uptime Facts: 80% of outages impacting mission-critical services will be caused by people and process issues. #Downtime [Gartner]
- #Uptime Facts: Approximately 20% of servers in large #datacenters are comatose. Unused servers drain both your energy and your budget. [Uptime Institute]
- #Uptime Facts: The average server operates at only 12-18% of capacity [Uptime Institute]
- #Uptime Facts: Did you know? About 50% of server power draw comes from just turning them on. #Datacenter [Uptime Institute]
- #Uptime Facts: 27% of those surveyed running #Virtualization had downtime events lasting over 1 hour. [Stratus High Availability Journey Infographic]
- #Uptime Facts: 59% of Fortune 500 companies experience a minimum of 1.6 hours of downtime per week [FierceWirelessCommunications]
- #Uptime Facts: Database failover can take up to 20 minutes [Stratus High Availability Journey Infographic]
- #Uptime Facts: 20% of servers in #Datacenters are obsolete, outdated, or unused [Uptime Institute]
- #Uptime Facts: Hidden costs of downtime include lost customers, recovery costs, damaged reputation, lost employee productivity, & regulatory impact [Stratus High Availability Journey Infographic]
- #Uptime Facts: Most common causes of #Downtime are Hardware Failure, Upgrades, and Migrations [Stratus High Availability Journey Infographic]
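One of the facts above, that 99.9995% availability works out to roughly 2 minutes and 38 seconds of downtime, follows directly from converting an availability percentage to annual downtime. A minimal sketch of that conversion (Python, assuming a 365-day year of 525,600 minutes):

```python
# Convert an availability percentage to expected downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def annual_downtime_minutes(availability_pct: float) -> float:
    """Expected annual downtime, in minutes, for a given availability %."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.9, 99.99, 99.9995):
    print(f"{pct}% availability -> {annual_downtime_minutes(pct):.2f} minutes/year")
```

At 99.9995%, the result is about 2.63 minutes, i.e. 2 minutes 38 seconds, matching the infographic's figure.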
These statistics on the causes and costs of downtime can help companies understand their risk exposure and potential losses, and build the business case for additional investment in availability protection.
Let’s take a quick look at the high level findings published in the Infographic.
- 91% still experience downtime
- 33% of all downtime is caused by IT equipment failure
- IT equipment failure is the most expensive type of outage (23% of costs), nearly twice as high as every other cause except cyber crime (21%)
- Average length of downtime is still over 86 minutes
- Average cost of downtime has increased 54% to $8,023 per minute
Based on these statistics, 30% (33% of 91%) of all data centers will have downtime related to IT equipment failure. Assuming they only have one incident of the average length, they would incur $689,978 (86 x $8,023) in downtime related costs.
Stratus can address 33% of the most costly downtime with our fault-tolerant hardware and software solutions.
52% believe the outages could have been prevented. This makes sense, because 48% of downtime is caused by accidents and human error. Only training, personnel changes, or outsourcing can address that cause of downtime.
70% believe cloud is equal to or better than their existing availability. That’s if you don’t look too closely at the SLA details (i.e. excluding “emergency maintenance,” or downtime only counting toward the SLA if over xx min per incident). Certainly most cloud providers can deliver better than the 99.98% [(525,600-86)/525,600] availability these data centers are currently averaging (assuming only one incident of average length). But remember, all SLAs are limited to the cost of the service, which I assume is far less than the almost $700k downtime-related cost most in the survey have realized.
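The 99.98% figure above comes from treating one 86-minute outage as the only downtime in a 525,600-minute year; a quick sketch of that calculation:

```python
# Implied availability when a single average-length outage is the
# year's only downtime.
MINUTES_PER_YEAR = 525_600   # 365 * 24 * 60
avg_outage_minutes = 86

availability = (MINUTES_PER_YEAR - avg_outage_minutes) / MINUTES_PER_YEAR
print(f"Implied availability: {availability:.2%}")
```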
Cloud solutions are constantly improving; but we continue to hear from our customers that availability still has a long way to go, especially when it comes to stateful legacy workloads that don’t have availability built into the application like native cloud apps. Of course, this is something that we at Stratus are working on.
I say look into availability options and invest upfront in the best availability you can afford. It might not pay dividends immediately, but an ounce of prevention is worth a pound of cure: $50k spent on availability could be worth $700k in avoided costs, not to mention headaches and a tarnished reputation.
Everybody is texting these days. Teenagers, soccer moms, business people, and even grandparents have jumped on the bandwagon, sending more text messages, photos and videos than ever before. In fact, according to a 2011 Pew Internet survey, “Americans and Text Messaging,” 73 percent of cell phone users text, and nearly one-third of them would rather text than talk. With texting on the rise, it’s inevitable that 9-1-1 technology must evolve to meet the needs of today’s mobile citizens. That’s what Next Generation 9-1-1 (NG9-1-1) is all about.
NG9-1-1 is a national initiative that aims to update and improve emergency communications services. The end goal is to upgrade the country’s 9-1-1 infrastructure so that the public can not only call, but also transmit text, video, photos, and more to a Public Safety Answering Point (PSAP). In turn, the PSAP will be able to process the data, transmit it as necessary, and get it out to first responders. Unlike today’s system, the new infrastructure will also support the transmission of calls and information across county and state lines. These enhanced capabilities will be instrumental in increasing public safety by helping law enforcement, firefighters, EMTs, and other first responders get better information about the situations they face in the field.
While migrating your PSAP to NG9-1-1 may seem overwhelming at first, proper planning can help ensure a smooth and manageable transition. Give careful upfront consideration to all your technology needs — ESInet, CTI software, CAD systems, mobile data networks, TDD software, and more. Think about how you will fund your NG9-1-1 system. Explore potential liability issues. Create a public education plan. And figure out the best way to protect your NG9-1-1 solution against downtime that could lead to tragic consequences. Looking for practical advice on how to successfully move your PSAP to NG9-1-1? Download our informative white paper, “What You Need to Know About Migrating to Next Generation 9-1-1 Technology,” to learn more.
It’s no secret that system downtime is bad for business. For one thing, it’s expensive. According to a 2012 Aberdeen Group report, the average cost of an hour of downtime is now $138,888 USD — up more than 30% from 2010. Given these rising costs, it’s no wonder that ensuring high availability of business-critical applications is becoming a top priority for companies of all sizes.
When it comes to choosing the right downtime protection, there are a couple of important things to keep in mind. First, deployment of applications on hypervisor software for server virtualization is increasing at a steady pace and is expected to continue until almost all applications are implemented on virtualized servers. As a result, you need to make sure that your downtime protection is able to support virtualized as well as non-virtualized applications. Second, with IT spending and headcount on the decline, downtime protection should be easy to install and maintain since there are fewer IT resources available to manage the assets.
Available downtime protection options range from adding no additional protection other than that offered by general-purpose servers to deploying applications on fault-tolerant hardware. Which option you choose will depend on the type of application in question. If the application is mission-critical, then you’ll need higher levels of protection. A strong segment of companies are choosing to protect each of their mission critical applications with fault-tolerant servers because they provide the highest availability, require no specialized IT skills, and are now priced within reach of even small to mid-size companies. Looking for guidance in choosing the right downtime protection for your “can’t fail” applications? Download the Aberdeen Group report to learn more.
Customer Spotlight: Makro South Africa Future-Proofs EFT Credit & Debit Card Processing Infrastructure
Makro South Africa (SA) had hosted its credit and debit card processing service on Stratus® ftServer® systems since their previous server failed them for two hours in 2005. This failure cost the business over one million rand, not to mention its hard-earned brand reputation. At the time, Makro had been quick to understand that their ftServer system would soon pay for itself. It did.
When, in 2011, Makro SA came to replace the infrastructure and re-architect the payment processing service, it was decision time.
Did the organisation still need their Stratus ftServer systems?
It did. Read on …
If you are an IT decision maker looking for application high availability and business continuity, Stratus acquisition of Marathon Technologies is relevant to you.
Stratus, the company known for products and services that keep mission-critical applications up and running all the time, announced on Monday the acquisition of Marathon Technologies. Marathon’s specialty is software-based solutions for high availability, fault tolerance and disaster recovery. Its everRun® MX is the world’s first software-based, fault-tolerant solution to support multi-core/multi-processor Microsoft applications, and the addition of the Marathon everRun product line further solidifies Stratus’s position as the leading provider of availability solutions.
We welcome Marathon’s customers, channel partners and employees to the Stratus community. Stratus is the leader in high availability and fault tolerant solutions for both software and hardware whether in a physical or virtualized cloud environment.
You can read our recent announcement here.
If someone told you your company could easily save tens of thousands of dollars every year, would you be interested in finding out how? It seems like a no-brainer, but you might be surprised at the answer. According to an array of industry surveys, including one conducted by Stratus Technologies and ITIC, the majority of companies do not calculate the potential impact of IT downtime. Those that say they do may not be calculating it properly. The most recent Stratus-ITIC survey (December 2011) found that 52 percent of businesses do not know the potential financial impact of IT downtime to their organization. Personally, I suspect this figure understates the problem, as it’s been my experience that fewer than 10 percent of companies can assign a value – monetary or otherwise – to their cost of downtime.
Does knowing your cost of downtime really matter? Consider this: In February 2012, Aberdeen conducted an in-depth analysis of factors surrounding datacenter downtime. They found that compared to figures reported in June 2010, the average cost of an hour of downtime increased by 38 percent. The hourly cost of downtime for the average company is now estimated to be $181,770. Even for those classified as “best-in-class” companies – i.e. those performing in the top 20 percent, downtime still costs $101,600 an hour.
Ultimately, CTOs and CIOs who understand the impact of downtime on their organization and its deliverables are in a strong position to justify investment decisions and changes that can have a significant impact on their bottom line and help them achieve their service-level agreement commitments to clients.
Our studies have shown that when trying to arrive at the impact of downtime, most companies typically consider only the most obvious direct costs and drastically underestimate the total costs associated with an outage. Arriving at a true accounting of the cost of downtime requires a more comprehensive analysis that should consider some of the factors listed below:
- Lost productivity/reduced production
- Goods and materials lost/disposal and cleanup costs
- Financial impact of customer dissatisfaction
- Contract penalties
- Compliance violations
- Negative effects on your reputation
- Upstream and downstream value-chain ripple effects
- IT recovery costs, meaning out-of-pocket expenses needed by the IT staff to restore the system
- Employee recovery cost, meaning the time it takes to get back up to speed once applications are back up and running
- Missed deadlines that result in employee overtime and priority shipping charges
- Potential litigation/loss of stock value
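As a rough illustration of why the obvious direct costs understate the total, here is a minimal sketch that tallies several of the factors above for a single outage. Every figure is hypothetical, chosen only to show the shape of the calculation:

```python
# Hypothetical cost breakdown for a single one-hour outage (USD).
# All line items below are illustrative, not survey data.
outage_costs = {
    "lost_productivity":       45_000,
    "scrapped_goods_cleanup":  12_000,
    "contract_penalties":      20_000,
    "it_recovery":              8_000,
    "employee_recovery":        6_500,
    "overtime_and_shipping":    9_500,
}

# Many companies stop at the most visible direct costs...
direct = outage_costs["scrapped_goods_cleanup"] + outage_costs["it_recovery"]
# ...while a full accounting sums every factor.
total = sum(outage_costs.values())

print(f"Direct costs alone: ${direct:,}")
print(f"Full accounting:    ${total:,}")
```

Even with made-up numbers, the gap between the "obvious" figure and the full total shows why comprehensive accounting matters.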
Here are a few real world scenarios that can help you see just how costly downtime can be:
1. A manufacturing company produces components for other third parties that are incorporated into their products. The server that controls the assembly line goes down and the line is disrupted, let’s say for an hour. In addition to the costs associated with reworking the goods that were not produced during the outage your promised delivery time has been completely thrown off schedule. The end result? Upset employees who have to remain at work until the job is completed. Overtime costs. Expedited shipping costs. And maybe most important: customers who may begin to question your credibility as a reliable supplier.
2. Your anniversary is coming up and you want to surprise your significant other with some flowers. You place the order 3 weeks in advance to be delivered at their workplace. On that day you are waiting for the ‘thank you’ call that never comes. As it turns out, the server at the flower shop went down and orders – including yours – weren’t fulfilled. While the florist has to refund the money associated with the unfulfilled orders, they have lost a customer they will never recover. In our socially networked world, you’re going to go out of your way to ensure that everyone you can think of knows what happened and urge them not to order from this vendor.
3. While the cost of downtime is usually measured in financial terms, there is one case where the stakes are much higher – public safety. Here, server uptime can mean the difference between life and death for the person awaiting assistance. Similarly, responders can be placed in harm’s way when they arrive at the scene without information that can be crucial to the performance of their duties. Emergencies happen at all times of the day, so 24-hour uptime is especially important here.
Where does Stratus come in?
If businesses are unclear what IT downtime really costs, it’s unlikely they’re properly prepared to deal with it and its consequences.
Stratus has been protecting critical business applications against downtime and data loss for more than three decades. This is the only thing we do. We have the products, services and people focused on delivering uptime and reliability that are a cut above all the rest.
Learn more about all of our uptime solutions.