Lightroom: Backups and the Debunking Unit
Lately there have been many blog posts on backups: how to back up, the hardware... Some blog post started it (I don't know which one), then everybody repeated it. Most of them are repeating the BS from the "semi-techies" and the salespeople. So I had to call, at my own expense I may add, the famous Debunking Unit to clean up some of the mess.
Notice that almost everybody talks about backups and very few people talk about recovery and restoring.
Hardware
- "You need at least one RAID device to store your images." RAID stands for Redundant Array of Independent Disks. The very simplified purpose of RAID is to split the data between hard drives so that if one drive dies, you can replace it without losing the data.
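To make the "replace a dead drive without losing the data" idea concrete, here is a minimal sketch of the XOR parity trick that RAID 5 style setups are built on. The function name and the two-data-drives-plus-parity layout are made up for illustration; a real RAID stripes data and rotates parity across the drives, which this toy version does not do.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR several equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Two hypothetical data drives, each holding one small block.
drive_a = b"IMG_0001"
drive_b = b"IMG_0002"

# The parity drive stores the XOR of the data blocks.
parity = xor_blocks([drive_a, drive_b])

# Drive A dies: rebuild its block from the surviving drive and the parity.
rebuilt_a = xor_blocks([drive_b, parity])
```

Note what this does and does not buy you: the dead drive's block comes back, but if a file was deleted or overwritten, the parity faithfully follows that change too.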
The problem is that there are two main ways of doing the RAID "thing," hardware or software. The hardware way is more expensive but it's faster and can scale to much larger sizes than software. The problem with the hardware is that if it breaks, you will need an exact replacement. Will your RAID controller be available in 3 to 5 years? 3Ware has been around for more than 15 years. Will the RAID box be available in 3 to 5 years?
RAID 1, RAID 5 and RAID 10 only protect you from a drive going bad; they don't protect you from the biggest problem, the user. Users delete files by mistake or overwrite files by mistake… RAID doesn't protect you from any other computer failure either. RAID 0 protects from nothing! If you are going the RAID route, you should buy two and "mirror/sync" the second one so that you can use it if the first one goes south.
- Hard drives: "Don't go cheap." Today's hard drives, and I'm talking about the 2TB or the 3TB drives, cost in the range of $100 (Canadian) to $650 (Canadian) depending on the size and the speed.
Price has nothing to do with the reliability or the quality of the drive. The likelihood of having a bad sector or a component fail is the same on a cheap or an expensive hard drive. All current technology hard drives support Self-Monitoring, Analysis, and Reporting Technology, aka S.M.A.R.T. This is well supported on Windows, Mac, Linux, FreeBSD, Solaris... The drives will tell you when a problem is developing and they will try to fix the problem before the drive goes "kaput."
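If you want to ask a drive how it feels, here is a rough sketch that reads the overall S.M.A.R.T. verdict. It assumes the smartmontools package is installed (it provides the `smartctl` command), that you have the rights to query the device, and that the drive prints the ATA-style "overall-health" line; SCSI/SAS drives word it differently and are not handled here. The function names and the device path are my own placeholders.

```python
import subprocess

# Line printed by `smartctl -H` for ATA/SATA drives (assumption: other
# drive types use different wording and will return None here).
MARKER = "SMART overall-health self-assessment test result:"

def parse_smart_health(report):
    """Return the verdict text (e.g. 'PASSED') or None if the line is absent."""
    for line in report.splitlines():
        if line.startswith(MARKER):
            return line[len(MARKER):].strip()
    return None

def check_drive(device):
    """Run `smartctl -H` on a device such as '/dev/sda' (hypothetical path)."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
    )
    return parse_smart_health(result.stdout)
```

Something like `check_drive("/dev/sda")` run from a scheduled job is enough to get an early warning; anything other than a passing verdict is a good reason to copy the data somewhere else right away.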
RAID is about storage and has nothing to do with backups.
Software
All current operating systems (OS), be it Windows, Mac or Linux, support software RAID, where you just plug in the hard drive and the OS will do the RAID "thing."
It's usually a little bit slower and can't really scale past 5 hard drives, and that's already pushing it.
Software RAID works across OS versions. The RAID from Windows, Mac and Linux... has been the same since the mid-1990s with the introduction of Windows NT 3.51.
The Cloud
Store your stuff in the cloud and you won't have to worry about it: no hardware to buy, no software to configure… It's convenient to share files between people, computers... It used to be called networking, intranet and VPNs.
- Can you trust that these people will be in business 6 months from now? Nobody can tell you that.
We just had the MegaUpload problem where the US government seized all the computers and arrested all the people involved, in New Zealand, because of "piracy complaints." I am sure that there were some pirated movies being shared but the majority of the data stolen by the US government was regular data from companies using these servers for backups and for sharing documents, projects, programming...
6 months later, the US government wants the people/companies to pay to return their data.
I don't know about your volumes of data. I'm writing this blog post over the Internet at a speed that is barely faster than dial-up. Definitely no high-speed, especially on uploads. It would take about a couple of months to back up my stuff!
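You can run the same arithmetic for your own connection. The sketch below is just size divided by speed; the function name is mine, and the 1000 GB / 1 Mbit/s figures in the comment are hypothetical round numbers, not my actual setup. It also assumes the line runs flat out 24 hours a day with no overhead, which is optimistic.

```python
def upload_days(size_gb, upload_mbit_per_s):
    """Days needed to upload size_gb gigabytes at a sustained upload speed."""
    bits_to_send = size_gb * 1_000_000_000 * 8        # decimal GB -> bits
    seconds = bits_to_send / (upload_mbit_per_s * 1_000_000)
    return seconds / 86_400                            # seconds per day

# Hypothetical example: 1000 GB of images over a 1 Mbit/s upload link.
days = upload_days(1000, 1.0)
```

With those made-up numbers you are looking at roughly three months of continuous uploading, and a restore has to come back down the same pipe.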
What does Google do?
Most of Google's "stuff" doesn't use RAID or backups as we know them! This is about Google's Internet "stuff."
- Google does not buy computers from Dell, HP, Acer... Google only buys motherboards, hard drives, power supplies and the various components. No case, no nothing. 22 computers per rack. Only one hard drive per computer. Each rack has the "whole Internet", actually it's only the Google cache which means no video, no image, no graphics...
- As of last count, March 2012, Google had 1.3 million of these racks spread around the world.
- As of last count, March 2012, Google had a 9% failure rate. This means that 9% of the racks are broken/shut down for one reason or another.
Google doesn't care. That's why there are so many others that can take over.
What do I do?
As far as backup is concerned, I went with the Google model but on a much smaller scale. I currently have 9 different standalone hard drives for backup. I will be buying a few more when the price is low enough. My price point is less than $80 Canadian for the 2TB hard drives.
Whenever somebody tells you about how his backup is better or cheaper or safer than everybody else's, ask this simple question:
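The idea of many plain, independent copies can be sketched in a few lines. This is not my actual backup tooling, only an illustration: it copies one file to several drive folders and checks each copy against a SHA-256 checksum of the original. The function names and the drive paths you would pass in are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path):
    """Checksum a file in 1 MB chunks so large images don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_to_drives(source, drive_roots):
    """Copy one file to every drive root; return the roots whose copy verified."""
    source = Path(source)
    expected = sha256_of(source)
    verified = []
    for root in drive_roots:
        target = Path(root) / source.name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)          # copies data and timestamps
        if sha256_of(target) == expected:     # re-read the copy to verify it
            verified.append(str(root))
    return verified
```

Because every drive holds a complete, ordinary copy, losing one drive (or several) costs nothing but the price of the drive, and any surviving drive can be read on any computer.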
How do you do a bare metal recovery?
What's bare metal recovery? Everything is gone. There was a fire, a flood or somebody stole all the computers.
I can explain the steps in less than 10 minutes and the recovery will take between 4 and 8 hours. It's fairly easy to cut down the recovery to 30 minutes! Easy but expensive.
Here are my answers:
