Procedure Disaster Recovery Plan

From D4Science Wiki

Disaster Recovery Plan

Disasters can be classified into two broad categories. The first comprises natural disasters such as floods, hurricanes, tornadoes, or earthquakes. While preventing a natural disaster is impossible, risk management measures have been applied to avoid disaster-prone situations. The second category comprises man-made disasters, such as hazardous material spills, infrastructure failures, bio-terrorism, and disastrous IT bugs or failed change implementations. For these, monitoring, testing, and mitigation plans have also been defined.

To recover systems and data after a disaster, D4Science relies on the following measures:

  • use of high-availability systems that keep both data and systems replicated on-site and off-site, enabling continuous access to systems and data even after a disaster;
  • use of Hybrid Cloud solutions that replicate the main D4Science data center both on-site and off-site. This allows most D4Science services to fail over instantly to local on-site hardware, while in the event of a physical disaster at the main D4Science data center, servers can be brought up in two additional data centers federated with D4Science;
  • regular backups of all storage devices, with at most one day between two consecutive backups;
  • replication of services to an off-site location, which removes the need to restore the services themselves (only the data need to be restored or synchronized).
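
The one-day backup rule lends itself to an automated check. The sketch below is illustrative only: it assumes a hypothetical log of backup completion timestamps for each storage device, and the function names (`policy_violations`, `is_overdue`) and device names are invented for the example, not part of D4Science's tooling.

```python
from datetime import datetime, timedelta

# Policy from the plan: at most one day between two consecutive backups.
MAX_BACKUP_GAP = timedelta(days=1)


def policy_violations(timestamps, max_gap=MAX_BACKUP_GAP):
    """Return (earlier, later) pairs of consecutive backups more than max_gap apart."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]


def is_overdue(timestamps, now, max_gap=MAX_BACKUP_GAP):
    """True if no backup exists or the latest one is older than max_gap at time `now`."""
    if not timestamps:
        return True
    return now - max(timestamps) > max_gap


if __name__ == "__main__":
    # Hypothetical backup logs: one list of completion times per storage device.
    logs = {
        "storage-a": [datetime(2024, 3, 1, 2), datetime(2024, 3, 2, 2), datetime(2024, 3, 3, 2)],
        "storage-b": [datetime(2024, 3, 1, 2), datetime(2024, 3, 2, 14)],  # 36-hour gap
    }
    now = datetime(2024, 3, 3, 12)
    for device, stamps in logs.items():
        for earlier, later in policy_violations(stamps):
            print(f"{device}: {later - earlier} between {earlier} and {later}")
        if is_overdue(stamps, now):
            print(f"{device}: next backup is overdue")
```

Checking both past gaps and the age of the latest backup covers two distinct failure modes: a backup that ran late in the past, and a backup that has not run yet at all.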

In addition to preparing for the need to recover systems, D4Science also implements precautionary measures aimed at preventing a disaster in the first place. These include:

  • local mirrors of systems and/or data and use of disk protection technology such as RAID;
  • surge protectors to minimize the effect of power surges on delicate electronic equipment;
  • use of an uninterruptible power supply (UPS) and backup generator to keep systems going in the event of a power failure;
  • fire prevention/mitigation systems equipped with alarms and fire extinguishers;
  • firewalls and network security frameworks to prevent intrusions and attacks.
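
Of the measures above, RAID is the one whose protection comes from an algorithm. The plan does not say which RAID level D4Science uses, so the following is a minimal sketch assuming a RAID 5-style parity scheme: each stripe stores the byte-wise XOR of its data blocks, and any single lost block can be rebuilt by XOR-ing the surviving blocks with the parity. Real RAID 5 rotates the parity block across disks; this sketch ignores placement and shows only the parity arithmetic. The disk layout and block contents are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks in a stripe must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_block(data_blocks):
    """Parity stored alongside one stripe of data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recover the single lost data block: XOR of all survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    # One stripe spread over three hypothetical data disks.
    stripe = [b"D4Sc", b"ienc", b"e-DR"]
    parity = parity_block(stripe)

    failed_disk = 1
    survivors = [block for i, block in enumerate(stripe) if i != failed_disk]
    recovered = rebuild_missing(survivors, parity)
    print("rebuilt block:", recovered, "matches:", recovered == stripe[failed_disk])
```

Mirroring (RAID 1), also covered by the first item above, is the simpler case: every block is written to two disks, so a lost copy is recovered by reading the other one.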