Disaster Recovery with VMware Site Recovery Manager (SRM)

I’ve been working on several projects recently that have included requirements for Disaster Recovery and Business Continuity. Before discussing some of the common technologies and solutions available, I’ll briefly explain how much easier it is to design and build DR plans for a virtualised environment.

Before virtualisation became mainstream, many environments consisted of a large number of physical servers, each with different hardware components and specifications. Typically these servers were backed up to tape, and the tapes were stored offsite. In the event of a disaster, which could have been anything from the loss of a single server to a complete site failure, these tapes were relied upon to recover the environment. In a previous role around 10 years ago, I remember our annual DR test, when a large truck full of hardware would pull up in the car park ready for us to test the recovery plan. In a nutshell, this was not a smooth process: it was hindered by faulty tapes and driver issues, and it took a long time to get the environment back up and running. Not to mention the amount of manual work involved, such as following a script that dictated the recovery order and priorities. All of this was far from ideal.

Fast forward to today’s virtualised environments and the whole process is much easier. The most challenging part can be getting the DR plan agreed so that it meets the business requirements, and this is where a service catalogue is key. The three key elements that should be included in any DR plan are the Recovery Point Objective (RPO), the Recovery Time Objective (RTO) and the priority of each server or application. For those who aren’t familiar with these terms, I’ve included my simple explanation below:

  • The RPO can be seen as how much data, measured in time, the business can afford to lose.

  • The RTO can be seen as how quickly the business needs the application or data to be available again.

  • The priority is the order in which applications or data are recovered, based on their importance to the business.
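To make the three terms concrete, here is a minimal Python sketch of how a service catalogue entry might capture them. The CatalogueEntry class, its fields and the sample applications are hypothetical illustrations, not part of any product. The point is simply that priority drives the recovery order, and the tighter RTO breaks ties.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class CatalogueEntry:
    """A hypothetical service catalogue entry for one application."""
    application: str
    rpo: timedelta  # maximum acceptable data loss, expressed as time
    rto: timedelta  # maximum acceptable time until the service is available again
    priority: int   # 1 = recover first


def recovery_order(entries):
    """Order applications by priority, breaking ties with the tightest RTO."""
    return [e.application for e in sorted(entries, key=lambda e: (e.priority, e.rto))]


# Illustrative sample data only.
catalogue = [
    CatalogueEntry("File server", rpo=timedelta(hours=24), rto=timedelta(hours=24), priority=3),
    CatalogueEntry("Exchange", rpo=timedelta(minutes=15), rto=timedelta(hours=2), priority=1),
    CatalogueEntry("Intranet", rpo=timedelta(hours=1), rto=timedelta(hours=8), priority=2),
    CatalogueEntry("Domain controllers", rpo=timedelta(hours=1), rto=timedelta(hours=1), priority=1),
]

print(recovery_order(catalogue))
# → ['Domain controllers', 'Exchange', 'Intranet', 'File server']
```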

A full DR plan involves much more detail than this, and many organisations have a team of people working together to manage the process.

Once the DR requirements have been determined, it’s time to turn our attention to the technology that can meet them. At Computerworld we have several products in our portfolio that meet most DR requirements, and one of these is VMware Site Recovery Manager (SRM).

SRM is a software product from VMware that integrates with VMware vSphere to provide automated recovery in the event of a disaster. The first requirement is a supported vSphere infrastructure at both the primary and secondary sites. SRM also requires an underlying replication technology, which can be either array-based replication or vSphere Replication. Many SAN vendors, such as Dell and Nimble, provide Storage Replication Adapters (SRAs) that integrate SRM with their array-based replication. With array-based replication, data is replicated at the block level, and each volume has a replication schedule based on the RPO set by the business. For example, tier 1 volumes might be replicated every 15 minutes, tier 2 volumes every hour, tier 3 volumes once per day, and so on. Because replication occurs per volume, it’s important to ensure each virtual machine is placed on the correct volume. vSphere Replication, by contrast, is hypervisor-based and replicates at the VM level. This method is storage agnostic and can be useful if each site uses a different storage vendor.
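Matching a replication schedule to an RPO is simple arithmetic. Suppose replication runs every interval T and each cycle takes d to reach the recovery site. If a disaster strikes just before a cycle completes, only the copy from the previous completed cycle is available, so the worst-case data loss is roughly T + d. The Python sketch below uses the example tiers above and assumes d is shorter than T. SCHEDULES, worst_case_data_loss and pick_tier are hypothetical names, not part of SRM or any SRA. The sketch picks the least frequent tier that still meets a given RPO.

```python
from datetime import timedelta

# Example replication schedules from the tiering described above.
SCHEDULES = {
    "tier 1": timedelta(minutes=15),
    "tier 2": timedelta(hours=1),
    "tier 3": timedelta(days=1),
}


def worst_case_data_loss(interval, transfer_time):
    """Disaster just before an in-flight cycle finishes: only the copy taken one
    interval earlier is available (assumes transfer_time < interval)."""
    return interval + transfer_time


def pick_tier(rpo, transfer_time=timedelta(0)):
    """Return the least frequent tier whose worst-case data loss still meets
    the RPO, or None if no tier does."""
    candidates = [
        (interval, tier)
        for tier, interval in SCHEDULES.items()
        if worst_case_data_loss(interval, transfer_time) <= rpo
    ]
    return max(candidates)[1] if candidates else None


print(pick_tier(timedelta(hours=4)))                                       # → tier 2
print(pick_tier(timedelta(hours=1), transfer_time=timedelta(minutes=10)))  # → tier 1
print(pick_tier(timedelta(minutes=10)))                                    # → None
```

Choosing the least frequent schedule that still meets the RPO keeps replication traffic down. A tier 1 schedule would also satisfy a four-hour RPO, but it would replicate far more often than the business needs.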

Once the data is available at the recovery site, the next stage is to create protection groups and recovery plans. A protection group is essentially a set of virtual machines that are recovered together, for example multiple Exchange Servers. Protection groups matter most with array-based replication, because the protection groups are then determined by how the virtual machines are placed across the datastores.
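A simplified model shows why placement matters with array-based replication. The model treats each replicated datastore as the unit of protection. An application whose VMs are split across datastores then pulls in everything else on those datastores, and its VMs may be replicated on different schedules. The Python sketch below uses hypothetical VM and datastore names. It also leaves out a detail of real SRM, which merges datastores that share virtual machines into a single datastore group.

```python
from collections import defaultdict


def protection_groups_by_datastore(vm_placement):
    """Simplified model: one protection group per replicated datastore."""
    groups = defaultdict(list)
    for vm, datastore in vm_placement.items():
        groups[datastore].append(vm)
    return {ds: sorted(vms) for ds, vms in groups.items()}


def split_applications(vm_placement, app_members):
    """Report applications whose VMs sit on more than one datastore."""
    split = {}
    for app, vms in app_members.items():
        datastores = {vm_placement[vm] for vm in vms}
        if len(datastores) > 1:
            split[app] = sorted(datastores)
    return split


# Hypothetical placement: one Exchange server has landed on a tier 2 datastore.
placement = {
    "EXCH01": "DS-Tier1-A",
    "EXCH02": "DS-Tier1-A",
    "EXCH03": "DS-Tier2-B",
    "WEB01": "DS-Tier2-B",
}
apps = {"Exchange": ["EXCH01", "EXCH02", "EXCH03"], "Web": ["WEB01"]}

print(protection_groups_by_datastore(placement))
# → {'DS-Tier1-A': ['EXCH01', 'EXCH02'], 'DS-Tier2-B': ['EXCH03', 'WEB01']}
print(split_applications(placement, apps))
# → {'Exchange': ['DS-Tier1-A', 'DS-Tier2-B']}
```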

The final step is the creation of recovery plans. As you would expect, these are created at the recovery site, and they let you be very granular about what is recovered. A recovery plan can recover an entire site or just a certain application, such as the multiple Exchange Servers from my previous example. Several advanced features inside the recovery plan allow you to configure pre- and post-scripts, set virtual machine dependencies and even change the IP address of each recovered virtual machine. This gives a great deal of flexibility when implementing a recovery plan to meet the business needs defined in the service catalogue.
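Together, virtual machine dependencies and priorities define a topological ordering. The Python sketch below is a generic illustration and does not show how SRM computes its plan internally. It applies Kahn's algorithm with a priority queue, so each VM starts only after the VMs it depends on, and the highest-priority ready VM (the lowest number) goes first. The function name, VM names and priority values are hypothetical.

```python
import heapq


def power_on_order(priority, depends_on):
    """Order VMs so each starts after its dependencies (Kahn's algorithm),
    taking the highest-priority ready VM (lowest number) first."""
    # Outstanding dependencies for every VM in the plan.
    remaining = {vm: set(depends_on.get(vm, ())) for vm in priority}
    # Reverse edges: which VMs are waiting on each VM.
    dependents = {vm: [] for vm in priority}
    for vm, deps in remaining.items():
        for dep in deps:
            if dep not in dependents:
                raise ValueError(f"{vm} depends on {dep}, which is not in the plan")
            dependents[dep].append(vm)

    # Min-heap of VMs with no outstanding dependencies, keyed by (priority, name).
    ready = [(priority[vm], vm) for vm, deps in remaining.items() if not deps]
    heapq.heapify(ready)

    order = []
    while ready:
        _, vm = heapq.heappop(ready)
        order.append(vm)
        for waiting in dependents[vm]:
            remaining[waiting].discard(vm)
            if not remaining[waiting]:
                heapq.heappush(ready, (priority[waiting], waiting))

    if len(order) != len(priority):
        raise ValueError("circular dependency between virtual machines")
    return order


# Hypothetical plan: Exchange and SQL wait for the domain controller.
priority = {"DC01": 1, "SQL01": 2, "EXCH01": 2, "EXCH02": 2, "APP01": 3}
depends_on = {
    "SQL01": ["DC01"],
    "EXCH01": ["DC01"],
    "EXCH02": ["DC01"],
    "APP01": ["SQL01"],
}
print(power_on_order(priority, depends_on))
# → ['DC01', 'EXCH01', 'EXCH02', 'SQL01', 'APP01']
```

Raising an error on a cycle mirrors the practical rule that a circular dependency can never be satisfied. Catching it when the plan is designed is far better than discovering it during a failover.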

Once SRM is fully configured, you can run scheduled DR tests that do not impact the production environment. This is achieved by using dedicated hardware at the recovery site and isolated networking. For those of you who know vSphere, SRM presents the replicated volumes to each host (if using array-based replication) and registers the virtual machines automatically. Each virtual machine is then powered on in the order you define in the recovery plan. These tests let you fine-tune the plan and resolve any issues in the environment, so you can be confident that if the production site did fail, the business could be back up and running quickly.
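Test runs also give you timings to compare against the RTO. The sketch below assumes a simplified model. Groups of VMs start one after another, and VMs within a group start in parallel, so each group takes as long as its slowest VM. The function name, group numbers, VM names, timings and storage overhead are all made-up figures for illustration, not SRM output.

```python
from datetime import timedelta


def estimated_recovery_time(group_timings, storage_overhead=timedelta(0)):
    """Estimate end-to-end recovery time from per-VM startup times.

    Assumes groups start one after another and VMs within a group start in
    parallel, so each group takes as long as its slowest VM.
    """
    total = storage_overhead
    for group in sorted(group_timings):
        total += max(group_timings[group].values(), default=timedelta(0))
    return total


# Made-up timings from a hypothetical test run (time until each VM was usable).
timings = {
    1: {"DC01": timedelta(minutes=6)},
    2: {"SQL01": timedelta(minutes=9), "EXCH01": timedelta(minutes=12)},
    3: {"APP01": timedelta(minutes=5)},
}
rto = timedelta(hours=1)
estimate = estimated_recovery_time(timings, storage_overhead=timedelta(minutes=10))
print(estimate, "meets RTO" if estimate <= rto else "exceeds RTO")
# → 0:33:00 meets RTO
```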

If you are interested in finding out more about SRM, join me on our upcoming webinar, where I’ll take a deeper look at some of its features. If you would like to know more about Disaster Recovery planning and how we can help you, please contact your Computerworld account manager.