In modern companies, one of the most important factors affecting both market competitiveness and smooth operation is guaranteed business continuity in the broad sense. For IT systems and infrastructure, ensuring such continuity comes down primarily to defining and implementing disaster recovery (DR) processes. This includes developing a set of policies, processes and procedures for restoring and maintaining mission-critical IT systems and infrastructure.
In other words, disaster recovery – the above-mentioned measures for restoring business continuity after a disaster or other failure – is part of a broader concept referred to as ‘business continuity management.’ This concept covers restoring all areas of the organisation to working order after a sudden, unexpected enterprise-wide disaster or failure. This can be a natural disaster, such as a fire, earthquake, flood or water leak, or a major failure of mission-critical systems due to technical or human factors, whether accidental or intentional.
Note, however, that disaster recovery focuses exclusively on restoring mission-critical processes in the area of IT (or, more broadly, ICT), not on restoring the business continuity of the entire company. It is also worth noting that, in practice, disaster recovery is mentioned most often in the context of major failures, such as the failure of the server room, all company computers or even the entire data centre. Disaster recovery does not apply to individual servers or workstations. In such cases, continuity depends on backup and emergency restoration procedures.
Disaster Recovery Plan – DRP
Until recently, disaster recovery procedures were associated with large information and telecommunications systems, especially entire data centres. Such measures were implemented only by large international corporations and organisations, mostly banks, financial institutions and telecom operators; that is, companies whose existence depends on continuous access to services and for which even a brief interruption can result in losses of hundreds of millions of dollars.
Currently, large and medium-sized companies with distributed structures are also investing in disaster recovery. These are mostly multi-branch companies, often with offices scattered throughout the world, in which a failure of the IT system or infrastructure makes normal functioning impossible. Implementing disaster recovery can shorten the RTO (Recovery Time Objective) and RPO (Recovery Point Objective) to nearly zero, so that the company can operate virtually without interruption (more on this in a moment) and without fear of major failures or disasters.
The most important document in the development of a disaster recovery strategy is the disaster recovery plan. This plan describes the entire solution and must include such elements as an analysis of risk and business requirements, a catalogue of the processes and applications covered by the plan (along with the specification of their parameters), and an organisational chart for the disaster recovery project. The organisational chart must include one scheme that corresponds to the organisational structure under normal, everyday working conditions and another that describes the structure in force during disasters or failures, similar to the way an army is organised in times of peace and war. In addition, the DRP document must include templates and procedures for the disaster recovery processes themselves, as well as various scenarios for actions to be taken in the event of a failure or disaster. Of course, the disaster recovery plan should be a living document, subject to regular revisions and changes as the company develops, both in terms of technology and organisational structure.
The whole idea of the disaster recovery plan is based on calculating the risk associated with the loss of corporate data, and this calculation should be performed before any such loss occurs. It is assumed that at any moment a company can face an unpredictable large-scale failure that will paralyse the operation of the IT systems and infrastructure and, consequently, the operation of the entire company or part of it. If we look at the elements of the disaster recovery plan listed above, it becomes clear that this plan allows us to avoid the usual hectic and chaotic rescue operation, which in most cases does more harm than good.
In order to calculate the risk, we have to consider the two most important indicators mentioned earlier: RPO and RTO. The first, RPO, determines how current the data will be after recovery. In other words, it tells us the maximum acceptable age of the data that are critical to the company's operation, and therefore how much recent work may be lost. This in turn allows us to estimate the loss and how much it will cost the company. For a stock exchange or an international investment bank, these values must literally be counted in seconds. For a small, niche online store, even a day of downtime should not cause excessive losses from a business point of view. In the latter case, data from a nightly backup (an RPO of about 24 hours) should be more than enough.
RTO, on the other hand, specifies the maximum time within which we need to recover our data and fully restore the system to working order. To estimate this parameter, we should take into account the first indicator (RPO) as well as infrastructure capabilities, network bandwidth and the skills of the IT staff. It is worth keeping in mind that, according to statistics gathered for the US market, 93% of companies that had no access to key data for more than ten days collapsed within one year of the failure, and 50% of them went bankrupt immediately.
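As a rough illustration of how the two indicators can be checked against a concrete setup, the sketch below treats RPO as the maximum acceptable age of restored data and RTO as the maximum acceptable time to restore the system. The class and function names, the throughput figure and the fixed overhead are all hypothetical assumptions chosen for the example, not values from any real product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjectives:
    """Targets for one system (illustrative values only)."""
    rpo: timedelta  # maximum acceptable age of the data after recovery
    rto: timedelta  # maximum acceptable time to restore the system


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # A failure just before the next backup loses one full interval of data.
    return backup_interval


def estimated_restore_time(data_gb: float,
                           throughput_gb_per_hour: float,
                           fixed_overhead: timedelta) -> timedelta:
    # Transfer time plus a fixed overhead (staff response, verification).
    return timedelta(hours=data_gb / throughput_gb_per_hour) + fixed_overhead


def meets_objectives(objectives: RecoveryObjectives,
                     backup_interval: timedelta,
                     restore_time: timedelta) -> bool:
    # Both conditions must hold: data is fresh enough and restore is fast enough.
    return (worst_case_data_loss(backup_interval) <= objectives.rpo
            and restore_time <= objectives.rto)


# Hypothetical small online store: nightly backup, 200 GB restored at 100 GB/h.
store = RecoveryObjectives(rpo=timedelta(hours=24), rto=timedelta(hours=24))
restore = estimated_restore_time(200, 100, timedelta(hours=1))
print(meets_objectives(store, timedelta(hours=24), restore))  # True
```

The same nightly backup would fail the check for objectives counted in seconds, which is why institutions such as stock exchanges need replication rather than periodic backups.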
As already mentioned, when estimating both indicators it is important to correctly determine the potential losses the company would suffer from a failure and the costs of implementing a disaster recovery system. And, of course, these costs need to be balanced against each other. It should also be noted that, according to the SHARE classification, there are seven levels of disaster recovery security, from the simplest backup systems, in which the recovery plan is not defined at all and the recovery time is not specified, to automated switching to backup data centres in a matter of milliseconds. This is why it is also important to think about which solutions will be appropriate for our company and infrastructure. In most cases, the best approach is to invest in fast and safe backup systems such as Xopero Cloud, with elements of a large disaster recovery system that will protect the mission-critical processes. Otherwise, it is worth giving up in-house recovery altogether and entrusting the protection of our data to an external company that will handle it quickly and efficiently.
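The balancing of costs described above can be sketched as a simple expected-cost comparison. This is a minimal model under stated assumptions: the option names, prices, downtime figures and disaster frequency are invented for illustration, and a real analysis would also account for data loss, reputation and contractual penalties.

```python
from dataclasses import dataclass


@dataclass
class DrOption:
    name: str
    annual_cost: float     # yearly cost of running the solution
    downtime_hours: float  # expected outage length if a disaster occurs


def expected_total_cost(option: DrOption,
                        outage_cost_per_hour: float,
                        disasters_per_year: float) -> float:
    # Yearly solution cost plus the expected yearly loss from downtime.
    expected_loss = (disasters_per_year * option.downtime_hours
                     * outage_cost_per_hour)
    return option.annual_cost + expected_loss


def cheapest(options, outage_cost_per_hour, disasters_per_year) -> DrOption:
    return min(options, key=lambda o: expected_total_cost(
        o, outage_cost_per_hour, disasters_per_year))


# Hypothetical options, ordered from simplest to most sophisticated.
options = [
    DrOption("nightly backup only", 2_000, 48),
    DrOption("cloud backup with fast restore", 10_000, 4),
    DrOption("standby data centre", 150_000, 0.01),
]

# Small shop: an hour of downtime is cheap, so the simple option wins.
print(cheapest(options, 500, 0.1).name)
# Large institution: an hour of downtime is very expensive.
print(cheapest(options, 1_000_000, 0.1).name)
```

With these invented figures, the first call selects the nightly backup and the second selects the standby data centre, which mirrors the contrast between a niche online store and an investment bank.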