Monday, January 12, 2009

Winter Musings: January 2009. Data Storage II: Archiving.

Sorry about the long hiatus from my musings. The holidays and some other catch-up work kept me from blogging for a while.

So today we are talking about the ways you can store and archive your data. There are 3 or 4 commonly accepted methods, depending on how complex you want to make it for yourself. The common thread is our need to maintain multiple reliable copies of a single file. You can use a RAID storage device, DVDs/CD-Rs, external hard drives, or remote data hosting. (Please excuse my wordiness; there is a lot of interesting info to cover.)

RAIDs
RAID stands for “Redundant Array of Inexpensive Disks” (these days often read as “Independent Disks”), so every RAID implementation requires more than one hard drive. There are several types of RAID, the most common of which are JBOD, 0, 1, 5, 6, and 10. JBOD stands for “just a bunch of disks” and, strictly speaking, is not a RAID level at all; it lets you make multiple hard drives behave as if they were a single drive or volume. JBOD offers no data protection, however. It is most useful when you have several small and cheap hard drives that you want to gang together. JBOD is less relevant these days as 1TB drives are increasingly available.

RAID0 is called “striping”, a method of spreading data in parallel across 2 or more drives with the benefit of faster reads and writes. The speed gain is a result of performing reads or writes simultaneously on each drive. In practice you will not see a full 2x performance gain from two drives. Like JBOD it offers no data redundancy; in fact, losing any one drive loses the whole array.
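To make striping a little more concrete, here is a minimal sketch in Python. It is only an illustration: the function names `stripe` and `unstripe` and the tiny 4-byte chunk size are my own assumptions, and a real controller works on disk blocks, not Python byte strings.

```python
def stripe(data: bytes, drives: int = 2, chunk: int = 4) -> list:
    """Deal fixed-size chunks of data round-robin across the drives."""
    out = [bytearray() for _ in range(drives)]
    for i in range(0, len(data), chunk):
        out[(i // chunk) % drives] += data[i:i + chunk]
    return [bytes(d) for d in out]


def unstripe(parts: list, chunk: int = 4) -> bytes:
    """Rebuild the original data by reading chunks back in the same order."""
    data = bytearray()
    offsets = [0] * len(parts)
    drive = 0
    while any(offsets[d] < len(parts[d]) for d in range(len(parts))):
        data += parts[drive][offsets[drive]:offsets[drive] + chunk]
        offsets[drive] += chunk
        drive = (drive + 1) % len(parts)
    return bytes(data)


parts = stripe(b"ABCDEFGHIJ")
print(parts)            # [b'ABCDIJ', b'EFGH']
print(unstripe(parts))  # b'ABCDEFGHIJ'
```

Notice that each "drive" holds only part of the file, which is why a single failure in RAID0 takes everything with it.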

RAID1 is referred to as “mirroring”: one drive serves as the main drive while the second is hidden and used only to maintain an exact copy of the first. For example, with two 500GB drives you effectively have only 500GB of storage, so the effective capacity is 50% of the total. Losing one drive, however, will not result in any data loss. This scheme is simple and highly fault tolerant, as the likelihood of two drives dying at the same time is slim.

RAID5 requires at least three hard drives of identical capacity; with four drives you get a usable capacity of 75% of the total. This implementation is more storage efficient, but you sacrifice some speed and simplicity. Your data is spread across the drives along with parity information, so the array can tolerate the loss of any one drive while you continue operating. RAID5 writes are a bit slower because each write also requires calculating and updating parity, and reads slow down when a failed drive's data has to be reconstructed. The speed also varies depending on whether the system uses a software (cheaper) or hardware (more expensive) RAID controller.
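The trick that lets RAID5 survive a lost drive is XOR parity. Here is a rough Python sketch of the idea, assuming three data drives plus one drive's worth of parity. The helper name `xor_blocks` and the sample data are made up for illustration, and a real RAID5 rotates the parity across all of the drives instead of keeping it on one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "drives" holding equal-sized blocks (hypothetical contents).
data = [b"wedding1", b"portrait", b"session3"]

# The parity block is the XOR of all the data blocks.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR the survivors with the parity
# block and the missing data falls out.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'portrait'
```

This works because XORing a value with itself cancels out, so whatever is left over after combining the survivors with the parity is exactly the missing block.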

RAID6 is an extension of RAID5 that needs at least four drives; with five drives you get 60% capacity efficiency. This RAID level is more fault tolerant, permitting 2 drives to fail while still storing your data error-free. However, this configuration is somewhat impractical for small office environments: the likelihood of 2 drives failing at the same time is slim, so the extra redundancy rarely pays off.

RAID10 is a combination of RAID0 and RAID1 that requires at least four identical drives and often uses more. This implementation is commonly found in corporate data centers and mid-level business servers. Data is mirrored and striped to maximize access speed and redundancy. As with RAID1 you effectively operate at 50% of total capacity. Don’t let your “green” friends know you have one of these at home unless it runs on solar panels and hamster power.
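If you want to compare the capacity numbers above for your own pile of drives, the arithmetic is simple enough to sketch in Python. The function name `usable_fraction` is my own, it assumes all drives are the same size, and RAID1 and RAID10 are treated as plain two-way mirrors.

```python
def usable_fraction(level: str, drives: int) -> float:
    """Fraction of total raw capacity you can actually store data on."""
    minimum = {"JBOD": 2, "0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unknown RAID level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"RAID{level} needs at least {minimum[level]} drives")

    if level in ("JBOD", "0"):
        return 1.0                      # no redundancy, nothing given up
    if level == "1":
        return 1 / drives               # every drive holds the same copy
    if level == "5":
        return (drives - 1) / drives    # one drive's worth of parity
    if level == "6":
        return (drives - 2) / drives    # two drives' worth of parity
    if drives % 2:
        raise ValueError("RAID10 needs an even number of drives")
    return 0.5                          # striped pairs of mirrors


for level, drives in [("1", 2), ("5", 4), ("6", 5), ("10", 8)]:
    print(f"RAID{level} with {drives} drives: {usable_fraction(level, drives):.0%}")
```

Multiply the fraction by the raw total (for example, 4 x 500GB) to get the space you actually end up with.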


Burned Media
We are probably all familiar with DVD+/-R and CD-R disks these days - virtually every computer has a multi-format burner. No doubt you also know that DVDs enjoy higher capacities than CD-Rs (roughly 4.7GB v. 700MB). You have probably noticed that the two sides of these disks are colored differently. To store data on this media, a laser alters (“burns”) spots in a layer of dye during the write process. The embedded silver or gold colored metal is used to reflect the laser when reading. This media is not the same as manufactured CDs, which have the data physically pressed into the actual plastic material - no dyes involved.

Writeable disks are NOT archival and generally have a useful lifespan in the 10 year range. Duration can vary depending on the storage environment. Readability depends on 3 factors: reflective material, dye chemistry, and obsolescence. Reflective materials are often alloys of aluminum, which may corrode or oxidize over time, rendering the disk unreadable. Be wary when buying gold disks, as anything can be colored gold but few materials share its stable properties. Manufacturers claim gold disks are inherently archival, but this is not completely true and does not account for the dye used. Dyes deteriorate over time and with exposure to UV light and heat, again resulting in an unreadable disk. In the photography world, archival has traditionally meant a hundred years or more - the planets would have to align for this to happen. As for the last item, a disk is only good so long as you have something to read it with. I challenge anyone to readily find a 5.25" floppy disk drive today. How about a 3.5" drive? Remember the 8-track tape? There will come a time when you can no longer find a drive to read that 100-year old DVD. Hmmm, even if you can find a drive, it is likely the RAW or JPEG or TIFF formats will have become archaic too. Maybe we will all start using film again.

For completeness, not all formats are created equal. With DVDs you have a choice between DVD-R and DVD+R. Capacities are roughly equal but the data encoding algorithms are not. Suffice it to say that DVD+R has several benefits in this arena which make it a better choice for data storage. Chances are good that your drive can read both formats. If given a choice, buy DVD+Rs. They don't really cost much more and provide a layer of protection against loss.

Bottom line, use DVDs to make short-term copies. Don't expect them to last more than a decade. Keep file copies on DVD and on other storage devices. Upgrade your archive when technology changes. Do some research and invest in a reputable brand. Taiyo Yuden and TDK are well respected. These are the brands I use.
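Since the whole point is keeping multiple reliable copies, it helps to check now and then that the copies still match the original. One way to do that, sketched here in Python with the standard `hashlib` module, is to compare checksums. The function names and the file paths in the comment are hypothetical; substitute your own.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, reading it a chunk at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def copies_match(original, *copies):
    """True only if every copy has the same checksum as the original."""
    expected = sha256_of(original)
    return all(sha256_of(copy) == expected for copy in copies)


# Example with made-up paths for a working copy, a NAS copy, and a DVD copy:
# copies_match("D:/weddings/smith/IMG_0001.CR2",
#              "N:/archive/smith/IMG_0001.CR2",
#              "E:/smith/IMG_0001.CR2")
```

If a burned disk starts to fail, a mismatch (or a read error) here is your cue to re-burn from one of the good copies while you still have them.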


Hosted Archiving
Have you heard of services that will store your priceless wine collection in a large temperature-controlled and secured warehouse? Well, those exist for your data too. Carbonite, Iron Mountain, Mozy, and SystemSafe are some that I have found. Amazon offers this too, and I'm sure Google will eventually rule cloud computing, online storage, search, fast food, and home building. Data archiving has long been the world of corporate computing and warehousing. Critical data is often bundled up late at night and transmitted to remote servers for safekeeping. One company I worked for would send DAT tapes via courier. With some services files are instantly available via Internet browser, while others may require a more involved process. The selling point is that your data will be kept safe. Of the above, Carbonite has been making waves and actively marketing to small businesses. Their prices seem fairly reasonable. On top of that, Carbonite has no cap on the amount of data you can store. Sounds great!

How does it work? You open an account, give them your credit card, and you are ready to go. Most have a small client application that runs on your local computer and communicates with the storage company's network. From the client you can schedule regular backups, determine what files and folders to back up, view history, keep track of usage, and restore files. Of course, you will want the fastest Internet connection you can get, and you should schedule backups to run at night. When the backups occur, the software will usually apply lossless compression to your files and encrypt them before transferring.
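"Lossless" means that what comes back out is byte-for-byte what went in. Here is a small Python sketch of that round trip using the standard `zlib` module. It is not how any particular backup service does it, and I have left the encryption step out because it needs a third-party library and proper key handling. The sample data is made up.

```python
import zlib

# Stand-in for a file's contents; real backups read this from disk.
original = b"RAW sensor data " * 1000

# Compress before sending (9 is the strongest, slowest setting).
packed = zlib.compress(original, 9)

# Decompress on restore and confirm nothing was lost.
restored = zlib.decompress(packed)

print(len(original), "bytes before,", len(packed), "bytes after")
print("identical after restore:", restored == original)
```

Repetitive data like this shrinks dramatically. Already-compressed files such as JPEGs will barely shrink at all, so do not count on compression to shorten your uploads much.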

If cost is no factor, this is probably the best long-term archiving option. But keep in mind that few of us require totally secure facilities like a large bank would. One of the few drawbacks to something like this is actually getting the bulk of your data there in the first place. You will probably need weeks to upload everything because you'll probably only want to do it during periods of inactivity. Something worth mentioning is that the archiving solution is only as good as the company doing the archiving. Companies go out of business all the time, so stay on top of it.
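To see why that first upload can take weeks, here is a back-of-the-envelope calculation in Python. The function name and the example numbers (a 200GB archive, a 1 Mbit/s upload speed, 8 hours of uploading per night) are assumptions for illustration, not measurements, and it ignores compression and protocol overhead.

```python
def nights_to_upload(archive_gb, upload_mbps, hours_per_night):
    """Estimate how many nightly sessions the initial upload will take."""
    bits = archive_gb * 8 * 1000**3               # gigabytes -> bits
    seconds = bits / (upload_mbps * 1_000_000)    # bits / (bits per second)
    return seconds / 3600 / hours_per_night


# 200GB of images over a 1 Mbit/s uplink, 8 hours a night:
print(round(nights_to_upload(200, 1, 8), 1))  # 55.6
```

That is nearly two months of nights just to get the initial archive in place; after that, only new and changed files need to go up.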


What I Do
I prefer maintaining at least 2 copies of my images and usually have 3. My working copy is always on a local internal SATA hard drive. This gives me fast everyday access for tasks like making albums, enlargements, and online portfolios. Once I’ve completed editing a wedding or session, RAW and XMP files are saved to a RAID5 NAS and burned to a DVD+R for long-term storage. Both require little time to obtain a needed file and are relatively safe. Perhaps if hosting costs come down even more, I might get rid of the RAID.


--------
Winter Musings are monthly posts between November and February. They cover a range of topics related to wedding photography with couples and photographers in mind. I hope you will tune in next month. Comments and requests are appreciated!
