Disaster Recovery
Written by: Steve McMaster
From Eye of the Storm - November 2008
We’ve spent the last month or so preparing for a complete data center rewiring project. This project will require shutting down our entire data center for an extended period of time: downtime we just cannot have. We have redundancy internally, but once the data center itself is shut down, we have no way of keeping our core services running. So part of this preparation has been building a replica of those core services off-site, at facilities that are themselves very redundant. Over the past couple of weeks I’ve found that while internal redundancy is key to achieving high levels of uptime, having a DR site will push you one step closer to 100% uptime.
DR, or Disaster Recovery, is the key to a 24/7 e-presence (and yes, I expect to be murdered for using such a term). You can put as much work into battery backups and redundancy as you want, but if a giant meteor crashes into the building where your main data center is, a million battery backups and ten different ISPs aren’t going to help you in the slightest. A second (or even a third) remote site hosting duplicates (carefully designed duplicates, anyway) gives you more flexibility, more redundancy, and often a nice, fat insurance policy.
Let’s take a look at redundancy first. Take a typical small- to medium-sized company. This company has two Internet providers, each on its own Cisco router and each in its own rack. Each of these racks has its own internal and external switch and runs on its own power circuit. The company also has a firewall cluster, with one member in each rack, and two UPS units in each rack. Basically, all of its core network resources are redundant; even some of its network services (such as DHCP and DNS) run on Linux clusters. All of this provides protection against power failure, Internet circuit failure, and hardware failure. Either Internet circuit can fail, the building power can fail, one of the routers or one of the firewalls can fail, and so on. However, as I said earlier, a meteor destroying the building could take out everything they have: their entire e-presence. A more likely example is a move to a new facility. They could try to run one rack in the new facility while simultaneously running the other at the old one, but they would eventually run into IP addressing issues, because a company of this size typically cannot announce the same IP range from two locations.
Let’s look at another option this pretend company has. Many providers offer colocation or, if you don’t need to go that far, dedicated server hosting. The biggest difference between the two is that with colocation you provide your own server, while with dedicated hosting the provider supplies one, usually at a higher cost. With this off-site server you can replicate your existing services (such as your email server), your public websites, and even your remote access endpoints. The most cost-efficient way to do this is to run VMware on the remote server, which lets you host several replicated services on the same machine and so saves on rental costs.
This second option is the most flexible. If the company ever needs to move, it can do so without worrying about its services becoming unavailable. If a meteor crashes into the building, employees can, with a few extra steps, take their laptops to the nearest Starbucks or other convenient wireless hotspot, connect to the VPN at the DR site, and work as usual (well, as close to usual as you can work when your place of employment was just destroyed by a meteor). The same applies whenever maintenance needs to be done on the core network equipment: everyone can work from a remote location, albeit probably in a limited manner, until the main facility is available again.
Keep in mind that this remote facility can be built using open source technologies. Doing so minimizes the cost of an off-site DR setup, so you are essentially paying only for the hardware rental or the connectivity, depending on the type of hosting you choose.




