Can you afford to lose all your data? Of course you can't. And if you can't, then not paying good money to back it all up just doesn't make sense. In this article, we examine some of the options every online business should be considering for backing up its data, in case the worst happens.
Back up your backups! Say that ten times fast. As an Internet business owner, I have learned this lesson more times than I care to count. Just when you think you have enough data backup safeguards in place, one of them fails and all your data is gone, your web site is down, your business is dead. That is one of the most catastrophic things that can happen to a business, where many jobs and incomes are on the line. There really is no room for error, and you will not realize just how extensive the damage is until it's too late. So I hope to give you some food for thought on best practices for backing up your web site or server, so that you can avoid the pain and distress I have experienced first hand.
In this article I hope to point out some of the obvious, and maybe not so obvious, ways that people and businesses back up their websites. Then we will discuss what can go wrong with those approaches, and finally we will cover some backup and data protection concepts that have worked for us and may work for your business. The key to getting the most out of these ideas is keeping an open mind. I can almost guarantee that some of these concepts will seem overly redundant and even a bit obsessive-compulsive; if that's the case, it's a good thing. And if you are budget conscious (and these days, who isn't?), many of these solutions are going to sound overly expensive. The comparison I try to run in my head is the cost of backing up data versus the cost of being out of business. That's usually a very easy computation to make.
The concepts and ideas we are going to cover apply most to anyone who leases or owns a dedicated server, or two, or twenty. It is possible to apply some of this article to shared accounts, but most of the techniques discussed will not be applicable there. For simple, straightforward web sites, chances are you will be fine with a simple host-offered backup service. What we are aiming to cover in this article are the more complex backup scenarios, which involve a lot of mission-critical data, files, programming, and possibly customer information such as credit cards, addresses and other sensitive details.
Back up Your Backups - RAID, Shmade
I used to think that so long as I had a RAID configuration, my data would be safe and I could sleep at night. After all, that's what hosting companies tout the most for data backup: making sure all of your hard drives are in a RAID configuration. Webopedia defines RAID as follows:
Short for Redundant Array of Independent (or Inexpensive) Disks, a category of disk drives that employ two or more drives in combination for fault tolerance and performance. RAID disk drives are used frequently on servers but aren't generally necessary for personal computers.
Now there are a number of different types of RAID with which you should make sure you are familiar:
Level 0 -- Striped Disk Array without Fault Tolerance: Provides data striping (spreading out blocks of each file across multiple disk drives) but no redundancy. This improves performance but does not deliver fault tolerance. If one drive fails, then all data in the array is lost.
Level 1 -- Mirroring and Duplexing: Provides disk mirroring. Level 1 provides twice the read transaction rate of single disks and the same write transaction rate as single disks.
Level 2 -- Error-Correcting Coding: Not a typical implementation, and rarely used, Level 2 stripes data at the bit level rather than the block level.
Level 3 -- Bit-Interleaved Parity: Provides byte-level striping with a dedicated parity disk. Level 3, which cannot service simultaneous multiple requests, also is rarely used.
Level 4 -- Dedicated Parity Drive: A commonly used implementation of RAID, Level 4 provides block-level striping (like Level 0) with a parity disk. If a data disk fails, the parity data is used to create a replacement disk. A disadvantage to Level 4 is that the parity disk can create write bottlenecks.
Level 5 -- Block Interleaved Distributed Parity: Provides data striping at the block level, with the parity (error correction) information distributed across all drives. This results in excellent performance and good fault tolerance. Level 5 is one of the most popular implementations of RAID.
Level 6 -- Independent Data Disks with Double Parity: Provides block-level striping with parity data distributed across all disks.
Level 0+1 -- A Mirror of Stripes: Not one of the original RAID levels, two RAID 0 stripes are created, and a RAID 1 mirror is created over them. Used for both replicating and sharing data among disks.
Level 10 -- A Stripe of Mirrors: Not one of the original RAID levels, multiple RAID 1 mirrors are created, and a RAID 0 stripe is created over these.
Level 7: A trademark of Storage Computer Corporation that adds caching to Levels 3 or 4.
RAID S: EMC Corporation's proprietary striped parity RAID system used in its Symmetrix storage systems.
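To make the parity idea behind Levels 4 and 5 a little more concrete, here is a minimal Python sketch of how a lost block can be rebuilt from the surviving blocks plus parity. It is an illustration only: the function name xor_blocks and the sample block contents are made up for this example, and a real RAID controller does this work on disk stripes in hardware or in the operating system, not in a script.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per data drive (all the same length).
data = [b"ORDERS..", b"INVOICES", b"CUSTOMER"]

# The parity block is simply the XOR of all the data blocks.
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR cancels itself out, combining the survivors with the parity block gives back exactly the missing block. It also shows the limit of the scheme: lose two blocks at once, or have the parity itself damaged, and there is nothing left to rebuild from.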
The types of RAID we want to concern ourselves with the most for backup purposes are RAID 1 and RAID 5. RAID alone should not make you feel all warm and fuzzy. What a RAID configuration helps with most is the failure of a single drive. In that case, the mirrored copy (RAID 1) or the parity data (RAID 5) lets you rebuild the failed drive with no data lost.
That is in theory. In practice, a RAID controller can go bad, possibly from an electrical short or spike, dust, an ant (did I say ant? Yes, I did!) or some other malfunction, and that can damage all of the drives connected to it simultaneously. Sound far-fetched? It isn't. This has happened to us on at least three different occasions!
Back up Your Backups - Tape, Shmape
Tape is one of the oldest ways to back up data, and it is still used today. A tape backup usually consists of an automated process which can run incrementally each time data is altered, or nightly, or weekly, or at some other regular interval.
There are three big problems with relying upon this solution. First, tape backups are prone to hogging system resources while they're doing their thing. Second, most tape backups are only as good as the software that runs them, and often that software has difficulty backing up certain files or directories. Finally, restoring from a tape backup is extremely painful and time consuming. One other thing to consider is that the tape itself can become corrupted over time, depending upon temperature conditions, so even if all your data has been written to tape, you may still not have a complete backup.
Many hosting companies to this day still offer tape backups as an affordable alternative to a RAID configuration. Or they may combine tape backups with CD/DVD backups, archiving your tape data onto CDs or DVDs each day, week, month, or whatever interval you are willing to pay for. Naturally, CD/DVD backups are only as good as the tape backup, since the data is pulled from the tape first and then burned onto the disc. And even CDs and DVDs can get scratched or lost. You get the idea.
Ultimately this kind of backup procedure is better than nothing, but it should only be considered as a standalone solution if you simply cannot afford anything else. Still, the other question you should be asking yourself is whether you can actually afford to lose all of your data. Not many people or businesses can afford such a disaster.
Back up Your Backups - Other Types of Not So Obvious Backups
Some hosting companies offer a more direct backup approach, whereby they grab all of the content off your hard drive nightly or weekly and place it on one of their hard drives on a separate server. This approach is an unusual one, and not many hosting companies offer it. The ones that do tend to charge an arm and a leg for it. However, this means of backup has some serious upside. First of all, it is usually much more reliable than the previously discussed backup options. Second, restoring from an identical backup which is located at your existing hosting company is usually pretty painless. There will still be some down time, but usually not as much as with the other backup scenarios (the exception being a RAID configuration where the RAID stays intact).
There are also backup services outside the web hosting spectrum, where you pay a third party to log onto your server(s) and download your data as frequently as you wish to pay for. This is not a bad solution, and it takes into account the small possibility that something disastrous happens to your entire hosting company. At least with this scenario your data should be safe, since it is being stored outside the hosting company's facilities. The main downside is that it tends to be extremely expensive and not entirely secure, as a third party would have all of your data and would require FTP access to all of your boxes in order to pull down the data.
One other possibility, depending on how handy you are with customizing and creating your own scripts, is to put together a cron job which grabs select data at regular intervals and copies it to a destination of your choice. "Cron" is the clock daemon in UNIX which executes commands at specified dates and times according to the instructions you provide it. Cron jobs are used to run scheduled work such as system tasks and nightly security checks. In this particular instance, a cron job would handle backups as well.
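As a rough idea of what such a homegrown job might look like, here is a minimal Python sketch that packs a few directories into a timestamped archive. Everything specific in it is an assumption for illustration: the function name make_backup, the archive naming scheme, and the paths and schedule in the comments are hypothetical, and this is not the script we actually run.

```python
import sys
import tarfile
from datetime import datetime
from pathlib import Path

def make_backup(sources, dest_dir, now=None):
    """Pack the given files/directories into a timestamped .tar.gz archive."""
    now = now or datetime.now()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"backup-{now:%Y%m%d-%H%M%S}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for src in map(Path, sources):
            if src.exists():  # silently skip sources that are not there
                tar.add(src, arcname=src.name)
    return archive

# Usage: site_backup.py DEST_DIR SOURCE [SOURCE ...]
#
# A crontab line like the following (hypothetical paths) would run it
# every night at 2:00 a.m.:
#   0 2 * * * /usr/bin/python3 /usr/local/bin/site_backup.py /mnt/backup /var/www /etc
if __name__ == "__main__" and len(sys.argv) >= 3:
    print(make_backup(sys.argv[2:], sys.argv[1]))
```

The destination matters more than the script. An archive written to the same drive, or the same RAID array, as the original data goes down with it, so point the destination at a different disk or, better yet, a different machine.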
And Now It's Time For the Breakdown
Okay, so we have covered all of the different types, forms, and functions of backups, and the obvious question is: which one should you use? Well, I can tell you the kind of setup Developer Shed uses, and you can decide for yourself if any of it makes sense. I will say this much: at some point during my 10 years of running web sites I have tried all of the previously discussed solutions, and I have still lost my share of data to all kinds of freaky situations you never think could occur until they actually do.
We actually employ a mix of all of the above backup solutions. Like I said at the beginning of this article, you might think it overkill to have so many different solutions in place which are all basically doing the same thing: backing up data. But it has been my experience that whenever I have had to rely upon a backup due to hard drive failure or any other kind of hardware issue, more than half of the time it was not a complete restore.
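One way to find out whether a backup is complete before you actually need it is to compare it, file by file, against the live data. The Python sketch below does that with SHA-256 checksums. It is only an illustration under some assumptions: the function names sha256_of, manifest and compare are invented for this example, and it assumes the backup has been unpacked or mounted as an ordinary directory tree.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in root.rglob("*")
        if p.is_file()
    }

def compare(source_root, backup_root):
    """Return (missing, changed): files absent from, or different in, the backup."""
    src = manifest(source_root)
    bak = manifest(backup_root)
    missing = sorted(set(src) - set(bak))
    changed = sorted(name for name in src if name in bak and src[name] != bak[name])
    return missing, changed
```

Two empty lists mean every file in the source has an identical copy in the backup. Files that change between the backup run and the check will show up as changed, so a check like this is most meaningful right after the backup finishes.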
We have our own homegrown cron jobs which back up critical data on a daily basis, we have RAID across all of our hard drives and servers, and we also have secondary boxes which are constantly replicating all of our data. Finally, we have our hosting company download all of our data nightly to one of their own servers. Would you consider this to be overkill? It most likely is. Even as I sit here typing this article, it occurs to me that I should probably look into one of those off-site third-party backup solutions, because we are located in South Florida, landing site of numerous hurricanes, and our hosting company is also located in South Florida. I suppose the answer to that question depends on just how paranoid I really am. Please excuse me, I have a few phone calls I need to make.