Human error, natural disasters, disgruntled staff, malicious hackers and cyberattacks: there are many ways a major event can completely derail an organisation’s operations, often resulting in financial loss, reputational damage, and compliance issues.

Disaster recovery is (or should be) an integral part of your business continuity strategy and planning.

IT teams should fully plan the operations and processes required to re-establish access to applications, data, and IT resources in the shortest possible time when an incident occurs.

If necessary, this may involve failing over to a full recovery site with redundant, pre-configured servers and up-to-date copies of data.

Unlike failing over a single server or application, a full systems failover will most likely require the decision of the incident manager for the business.

A genuine recovery site should be able to support all (core) business systems without requiring additional configuration or licensing when it becomes active. In a genuine DR scenario, the IT team will be under huge pressure to re-establish normal operations.

Disaster Recovery or Backup?

Backups are a vital part of business resilience and compliance, but they shouldn’t be confused with a full DR plan. A good backup policy is great for governance and for restoring smaller amounts of specific data in day-to-day operations, but it becomes irrelevant if you have no systems to restore to.

Your disaster recovery solution should replicate your critical systems with the aim of quickly failing over to assure availability and continuity of the affected applications. It should be able to achieve a recovery time objective (RTO) of minutes and a recovery point objective (RPO) of seconds (with ransomware recovery as a caveat). These objectives usually require a separate site with an operational IT infrastructure that is ready for failover at any time without additional configuration.


Disaster Recovery – Impact and Risk Analysis

Analyse, understand and plan to mitigate direct and indirect financial loss. Pay special attention to applications that are critical for revenue-generating processes.

For example: time recording systems, billing systems, and external-facing systems that are provided to your customers for a fee.

Longer-term indirect financial loss may occur if customers switch to a competitor or competing service because your systems are not available.

Reputational damage: disruptions and, even worse, loss of client data can significantly harm your reputation. Minimising disruption and ensuring the availability and security of client data will help avoid reputational damage, which is often difficult to reverse.

Identify the risk of fines for failing to maintain compliance standards, both legislative and industry-specific.

Conduct a risk analysis of all of your business applications and calculate the business impact of each. Not all applications will have the same risk or negative impact if unavailable.

Consider the consequences of an application’s downtime on both internal operations and its external visibility to clients; consider costs, reputation, and compliance.
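
As a rough illustration of calculating business impact per application, the sketch below combines direct loss, client visibility and compliance fines into a single figure. The field names, the weighting approach and every number are hypothetical and only there to show the shape of the calculation; substitute values from your own analysis.

```python
from dataclasses import dataclass


@dataclass
class AppImpact:
    name: str
    revenue_loss_per_hour: float  # direct loss while the application is down
    compliance_fine: float        # one-off fine if the outage breaches a standard
    reputation_weight: float      # 1.0 = internal only, higher = more client-visible


def outage_impact(app: AppImpact, hours_down: float) -> float:
    """Estimate outage cost: direct loss scaled by visibility, plus any fine."""
    direct = app.revenue_loss_per_hour * hours_down
    return direct * app.reputation_weight + app.compliance_fine


# Hypothetical applications and figures, for illustration only.
apps = [
    AppImpact("time-recording", 500.0, 0.0, 1.0),
    AppImpact("billing", 2000.0, 10000.0, 1.5),
    AppImpact("intranet-wiki", 50.0, 0.0, 1.0),
]

# Rank applications by the estimated impact of a four-hour outage.
ranked = sorted(apps, key=lambda a: outage_impact(a, hours_down=4), reverse=True)
for app in ranked:
    print(f"{app.name}: {outage_impact(app, hours_down=4):,.0f}")
```

A ranking like this is only a starting point for discussion with the business; the weighting of reputation and compliance is a judgement call rather than a formula.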

Recovery Time Objective (RTO) – Define the maximum acceptable delay between the interruption of each application and the restoration of its service.

Recovery Point Objective (RPO) – Define the maximum acceptable gap between the data in the recovery system and the data stored in the source application (how much data loss is acceptable, usually calculated in minutes).

Your risk analysis should separate your applications and systems into Tiers 1, 2, and 3, with Tier 1 being the most critical and Tier 3 the least important.
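
The two objectives and the tiering can be made concrete with a small sketch that checks whether an incident or drill stayed within a tier's targets. The tier thresholds below are placeholders chosen for illustration, not figures from this article, and the function and variable names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable window of lost data


# Placeholder targets for illustration only; set these from your own analysis.
TIER_OBJECTIVES = {
    1: Objectives(rto=timedelta(minutes=15), rpo=timedelta(minutes=1)),
    2: Objectives(rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    3: Objectives(rto=timedelta(hours=24), rpo=timedelta(hours=24)),
}


def meets_objectives(
    tier: int,
    outage_start: datetime,
    service_restored: datetime,
    last_replicated: datetime,
) -> bool:
    """Return True if both downtime and data loss are within the tier's targets."""
    target = TIER_OBJECTIVES[tier]
    downtime = service_restored - outage_start   # compared against the RTO
    data_loss = outage_start - last_replicated   # compared against the RPO
    return downtime <= target.rto and data_loss <= target.rpo


outage = datetime(2024, 1, 1, 9, 0, 0)
restored = outage + timedelta(minutes=12)
replicated = outage - timedelta(seconds=30)
print(meets_objectives(1, outage, restored, replicated))
```

Recording these three timestamps during every drill gives you measured values to compare with the objectives, rather than estimates.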

Common Figures for Tier 1, Tier 2, and Tier 3 Recovery


Disaster Recovery in the Cloud (DRaaS)

Unlike on-premises data centres, cloud providers such as AWS allow you to pay only for the DR resources you use, when you use them.

No hardware: when you use AWS as your recovery site, you pay for a fully provisioned recovery site only when it is used, such as during a disaster or drill. There is no CapEx investment or unnecessary duplicate provisioning of resources.

Minimise the need for duplicate software licenses by using AWS as your recovery site along with an appropriate replication tool. The disaster recovery tool can keep servers continuously in sync on AWS without running most operating system or application licenses. In the event of a disaster or a test, you can launch your servers and then pay for third-party licenses as needed.

Minimise infrastructure by replicating your applications into a low-cost storage area, which virtually eliminates the need to pay for expensive infrastructure and storage. During a disaster or drill, you can launch fully provisioned workloads, and only then do you need to pay for more comprehensive compute resources.
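
The pay-when-used model described above can be sketched as a simple annual cost comparison. This is a deliberately simplified model with made-up hourly rates, not AWS pricing: it assumes a flat hourly rate for an always-on recovery site, a lower flat rate for the replication staging area, and a full rate that applies only during drills or disasters.

```python
HOURS_PER_YEAR = 24 * 365


def always_on_cost(site_rate_per_hour: float) -> float:
    """Annual cost of a recovery site that is fully provisioned all year."""
    return site_rate_per_hour * HOURS_PER_YEAR


def on_demand_cost(
    staging_rate_per_hour: float,
    recovery_rate_per_hour: float,
    active_hours: float,
) -> float:
    """Annual cost of low-cost staging all year plus full compute only when active."""
    staging = staging_rate_per_hour * HOURS_PER_YEAR
    recovery = recovery_rate_per_hour * active_hours
    return staging + recovery


# Hypothetical rates: 10.0/hour for a full site, 1.0/hour for staging,
# and 48 hours of drills or real failover in the year.
print(always_on_cost(10.0))
print(on_demand_cost(1.0, 10.0, active_hours=48))
```

Running the comparison with your own rates and expected drill hours shows how sensitive the saving is to the time the recovery site is actually active.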

Management and monitoring: AWS disaster recovery solutions provide advanced automation, which means fewer IT resources are required to maintain and launch your applications. Servers can boot natively on AWS, even if they originated from non-cloud infrastructure.

AWS Elastic Disaster Recovery Features:

  • Recover your applications on AWS from physical infrastructure, VMware vSphere, Microsoft Hyper-V, and cloud infrastructure. You can also use it to recover Amazon EC2 instances into a different AWS Region or Availability Zone
  • Continuously replicate applications and databases from any supported source into a staging area subnet in your AWS account, in the AWS Region you select, using low-cost storage and minimal compute resources to maintain ongoing replication
  • During normal operation, maintain readiness by monitoring replication and periodically performing non-disruptive tests and failback drills
  • To recover applications, you can launch recovery instances on AWS within minutes, using the most up-to-date server state or a previous point in time
  • After your recovery instances are running on AWS, you can choose to keep them there, or initiate data replication back to your primary site once the issue is resolved
  • Achieve RTOs of minutes by launching the disaster recovery site on demand and RPOs of seconds using continuous data replication
  • Support all applications running on x86 architecture (including physical and virtual)
  • Support any hypervisor (including physical servers, where there is no hypervisor at all)
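
As an illustration of the monitoring and drill steps in the list above, the sketch below finds source servers whose replication is healthy and starts a non-disruptive drill for them. It assumes a boto3-style client for the Elastic Disaster Recovery (`drs`) service and uses the `describe_source_servers` and `start_recovery` operations with the request and response fields as I understand them; check the names against the current API reference before relying on them. The helper function names are hypothetical.

```python
def servers_ready_for_drill(client) -> list:
    """Return IDs of source servers whose replication state is CONTINUOUS."""
    ready = []
    kwargs = {"filters": {}}
    while True:
        page = client.describe_source_servers(**kwargs)
        for server in page.get("items", []):
            info = server.get("dataReplicationInfo", {})
            if info.get("dataReplicationState") == "CONTINUOUS":
                ready.append(server["sourceServerID"])
        token = page.get("nextToken")
        if not token:
            return ready
        kwargs["nextToken"] = token  # fetch the next page of results


def start_drill(client, server_ids: list) -> str:
    """Launch drill instances for the given servers and return the job ID."""
    response = client.start_recovery(
        isDrill=True,  # a drill, not a real recovery
        sourceServers=[{"sourceServerID": sid} for sid in server_ids],
    )
    return response["job"]["jobID"]
```

In practice the client would be created with something like `boto3.client("drs", region_name=...)`, using credentials that are allowed to launch recovery instances. Passing the client in as a parameter keeps the functions easy to exercise against a stub before touching a real account.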

Summary

Conducting a proper business impact analysis for each application will help you determine whether it requires a lower-cost backup solution that focuses on data retention or a more robust disaster recovery solution that can minimise downtime during a major incident.

Once you have determined which applications require disaster recovery and what their recovery objectives are, consider using AWS to set up a flexible, cost-effective, and reliable recovery site. Moving from an on-premises recovery site to a cloud-based recovery site can reduce your disaster recovery TCO while offering additional benefits such as automation, isolated testing environments, and point-in-time recovery.

Contact us for further information about implementation and best practices using AWS Elastic Disaster Recovery.
