Assume The Worst In IT Disaster Recovery Plan
Insurers should leave nothing to chance to assure access to critical data in a crisis
On Oct. 17, 1989, San Francisco was rocked by a powerful earthquake. Lasting just 15 seconds, it left 63 people dead and caused massive damage throughout the city.
At the time, I was supporting a prominent mainframe data backup and recovery product and had the opportunity to talk at length with customers in the San Francisco Bay Area. From their experiences restoring their IT operations, I learned critical lessons about IT disaster recovery that are often overlooked.
The most important lesson learned was that within a disaster response there are many layers of recovery and many hidden assumptions in typical recovery plans.
For many, the largest obstacle was the inability of key personnel to reach the pre-designated location from which they were to execute the disaster response. In some cases, this was because the roadways were impassable. In others, it was because the telephone systems were down or so congested they were unusable. In no case did the barrier to resuming operation involve recovering data.
The unifying theme from this earthquake experience is that being well-positioned to recover from a disaster involves a lot more than knowing where the backup tapes are kept. In a real emergency, seemingly safe assumptions can become significant obstacles. Delay in recovering information technology services can jeopardize the long-term viability of the entire enterprise.
In addressing disaster recovery, IT professionals cannot afford to think within the confines of their discipline. The key to creating successful disaster recovery strategies is to understand everything that may be required to deal with the emergency at hand, leaving nothing to chance.
For the insurance industry, the disaster recovery risks are no different. Today, a company's intellectual property, found in electronic files and data, is critical to transacting, tracking and servicing the document-intensive process that drives the business. To achieve a comprehensive disaster recovery program, we must first explore what the concept itself actually covers.
Have You Fully Mitigated Your Risks?
The best disaster recovery plans have often been derived from exacting disaster scenarios. Conducting the scenarios as thought experiments, disaster planning professionals work through recovery from a theoretical disaster in the most minute detail. Questionable assumptions, as well as single points of failure, are uncovered and analyzed. Together, these are balanced against probabilities and risk scenarios.
Ultimately, procedures are developed to mitigate risks, predictable and otherwise. Whenever practical, these procedures are tested and proven by simulating actual recovery.
A devastating earthquake is at one end of the spectrum of recoverable disasters. Many lesser forms of possible disaster, such as total server failure, occur with significantly greater frequency.
Technology today offers far more options for recovering fully from such failures and minimizing business interruption. Preparing for technological disaster recovery is a product of in-depth planning and systematic “rehearsal” of who does what, when and from where. After all, a risk, once mitigated, is a disaster averted.
Choosing the appropriate collection of measures to ensure continued operation or rapid recoverability of operations is a task incumbent on the managers of any IT-dependent enterprise. Failing to do so can be costly. (See the accompanying “Tips” sidebar for some key risk management and loss control considerations.)
Will You Be Ready?
In sum, good disaster recovery usually results from excruciatingly detailed planning and deliberate decision-making about risks and benefits. The ideal disaster recovery plan results when risks are identified, scenarios are thought through, and the risk itself is mitigated or averted.
The insurance industry, above most others, has a deep understanding of the importance of risk analysis and mitigation as a key factor in successful business operation. The prospect of a disaster, with its debilitating and possibly devastating impact on a company's key assets, should not be left unaddressed.
Steve Drill is vice president of products for Q.Know Technologies Inc., based in Reston, Va.
Sidebar: Loss Control Tips
Will Your IT System Survive A Catastrophe?
By Steve Drill
In the case of electronic information resources, there are special considerations, and some of these are not obvious. System data backups do not themselves ensure successful recovery from disaster. To understand this point, the following questions should be considered:
Is all the necessary electronic documentation and data being backed up?
A common approach is to back up all servers and shared file stores, neglecting the corporate documentation that may be on workstations, laptops and mobile devices. This may be an unacceptable risk if no other operational mechanism or procedure causes the data on these systems to be shadowed on systems within the backup scheme.
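As a rough illustration of this coverage check, the Python sketch below compares a device inventory against the systems in the backup scheme. The device names, the `shadowed_to` mapping and the function itself are hypothetical, invented only to show the idea. A real inventory would come from an asset management system.

```python
def uncovered_devices(inventory, backed_up, shadowed_to):
    """Return devices whose data never reaches the backup scheme.

    inventory:   all devices that may hold corporate documents
    backed_up:   devices that are backed up directly
    shadowed_to: maps a device to the system its data is copied to, if any
    """
    gaps = []
    for device in inventory:
        if device in backed_up:
            continue
        # A device is still protected if its data is shadowed onto a backed-up system.
        if shadowed_to.get(device) in backed_up:
            continue
        gaps.append(device)
    return sorted(gaps)


# Hypothetical example data.
inventory = {"file-server", "db-server", "laptop-claims-01", "laptop-sales-07", "phone-exec-02"}
backed_up = {"file-server", "db-server"}
shadowed_to = {"laptop-claims-01": "file-server"}

print(uncovered_devices(inventory, backed_up, shadowed_to))
```

Any device the function returns holds data that exists nowhere within the backup scheme.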
If restoration from the backups is executed, will the result be a functional and correct system?
Software applications are often composed of numerous code and data elements stored in multiple locations. The design of the applications often relies on these separate components being synchronized with each other. Backup operations performed against the running systems may not preserve this synchrony. What appears superficially to be a backup may not in fact be capable of reproducing a functional system.
Make sure that the backup process exploits a snapshot, suspend-resume, or orchestrated application shutdown and restart for backup, not at a component level but at an entire application level.
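The suspend-resume idea can be sketched in a few lines of Python. This is a minimal illustration only: the `Component` class is a hypothetical stand-in for a database, file store or index, and a real product would use its own snapshot or quiesce mechanism. The point it shows is that every component is suspended before any one of them is copied, and all are resumed afterward even if the copy fails.

```python
from contextlib import contextmanager


class Component:
    """Hypothetical stand-in for one piece of an application."""

    def __init__(self, name):
        self.name = name
        self.running = True
        self.version = 0  # incremented whenever the component accepts a change

    def write(self):
        if not self.running:
            raise RuntimeError(f"{self.name} is suspended")
        self.version += 1


@contextmanager
def application_quiesced(components):
    """Suspend every component, then resume them all, even if the backup fails."""
    for c in components:
        c.running = False
    try:
        yield
    finally:
        for c in reversed(components):
            c.running = True


def backup_application(components):
    """Record the state of all components while none of them can change."""
    with application_quiesced(components):
        return {c.name: c.version for c in components}


app = [Component("database"), Component("document-store"), Component("index")]
for c in app:
    c.write()

snapshot = backup_application(app)
print(snapshot)
print(all(c.running for c in app))
```

Because no component can accept a write while the copy is taken, the recorded states are guaranteed to belong together.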
If the physical hardware must be replaced, where/how can it be quickly and easily obtained? How long would that take?
Any physical component of the IT infrastructure can fail. In the best case, redundant systems automatically compensate for failures, allowing staff to replace the failed components at their leisure and without service outage. In the worst case, the application runs only on discontinued or obsolete hardware for which no spare is available.
Have the organization's most critical systems been identified?
The importance of this knowledge becomes obvious when we address the recovery. In what order does the organization need to have the systems back in operation? Some systems may be so critical that even minutes of downtime create serious consequences. Others may be so ancillary that weeks can pass before they are missed.
The most successful recovery strategies will optimize the priorities of the business. Usually, this means optimizing the backup strategy, as well. Mission-critical applications and documentation should be supported by backup and redundancy strategies that would not be prohibitively expensive for the IT operation as a whole.
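One simple way to make these priorities explicit is to record, for each system, how long the business can tolerate its absence and how long restoration is expected to take. The Python sketch below does this with made-up systems and figures. The names and numbers are assumptions for illustration, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    max_downtime_hours: float  # how long the business can go without it
    est_restore_hours: float   # how long restoration is expected to take


def recovery_order(systems):
    """Most time-critical systems first."""
    return sorted(systems, key=lambda s: s.max_downtime_hours)


def at_risk(systems):
    """Systems whose expected restoration takes longer than the business can tolerate."""
    return [s.name for s in systems if s.est_restore_hours > s.max_downtime_hours]


# Hypothetical figures.
systems = [
    System("archive-reports", max_downtime_hours=336, est_restore_hours=24),
    System("claims-processing", max_downtime_hours=0.5, est_restore_hours=4),
    System("policy-admin", max_downtime_hours=8, est_restore_hours=6),
]

print([s.name for s in recovery_order(systems)])
print(at_risk(systems))
```

Systems flagged by `at_risk` are the candidates for stronger measures such as redundancy, since restoring them from backup alone would take too long.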
Does the data/electronic documentation protection scheme match the criticality and volatility of the given application? Is last week's data sufficient? Must data and documentation be recovered within a day, an hour, a transaction?
Restoring from a backup can, with proper planning, produce a running system. However, ensuring that the system is back in sync with the business thereafter can be a formidable challenge. How much change is happening and how frequently? How much manual data entry or data repair can you afford? Ascertaining these priorities and the organization's appetite for this level of planning will greatly speed the recovery process.
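A back-of-the-envelope calculation can put numbers on these questions. The Python sketch below assumes the worst case, in which the failure strikes just before the next backup would have run, and uses hypothetical figures for change volume and re-entry effort.

```python
def worst_case_reentry(backup_interval_hours, changes_per_hour, minutes_per_change):
    """Estimate changes lost, and hours of manual re-entry, if failure strikes
    just before the next backup runs."""
    lost_changes = backup_interval_hours * changes_per_hour
    reentry_hours = lost_changes * minutes_per_change / 60
    return lost_changes, reentry_hours


# Hypothetical application: 20 changes per hour, 3 minutes to re-enter each one.
weekly = worst_case_reentry(backup_interval_hours=168, changes_per_hour=20, minutes_per_change=3)
nightly = worst_case_reentry(backup_interval_hours=24, changes_per_hour=20, minutes_per_change=3)

print(weekly)   # (3360, 168.0)
print(nightly)  # (480, 24.0)
```

Under these assumed figures, a weekly backup leaves up to 168 hours of manual re-entry, while a nightly backup cuts that to 24.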
Can servers be restored? Can it be achieved internally?
Face it, the “system” is not working unless the users are working. Everything from a user's keyboard to the server has to be working for that to happen. Having the redundant, mirror server system in an underground vault can be very impressive, but the whole point is lost if users cannot be redirected to it when needed, regardless of personnel location for plan activation.
Reproduced from National Underwriter Edition, February 25, 2005. Copyright 2005 by The National Underwriter Company in the serial publication. All rights reserved. Copyright in this article as an independent work may be held by the author.