Photographers create assets – images. Ideally, these assets can be used for the life of the photographer and beyond.
In the “good” old days of film, the problem was keeping the materials, i.e. negatives and prints, safe from harm (fire, flood, mildew, theft, etc.) and in pristine condition. The latter could be a problem if the materials were not of archival quality or were contaminated in some way.
As “digital” photographers, our assets are by nature primarily digital files. Compared to negatives or prints, digital files have the advantage that the problems associated with film are easily mitigated or cannot occur at all: keeping a digital file safe from harm is easy if enough copies are kept in enough different physical locations. Nor does a digital file deteriorate the way a negative or print does: an exact copy is as good as the original.
There are, however, a number of risks and issues specific to digital files:
- All currently available media deteriorate over time.
- Files may suffer from “bit rot”, usually through copying errors or because of media deterioration.
- Hardware becomes obsolete, and it may not be possible to use today's media in 10 or 20 years because the physical connectors are no longer supported, because there are no drivers, or because the hardware cannot be replaced if it fails.
- File formats become obsolete and are no longer supported.
Let’s look at the specific problems.
An old quip is that the question is not whether a hard drive will crash, but when. Hard drives are mechanical devices, so it is just a matter of time until they wear out, dust gets inside, or something else goes wrong. And if you think that keeping the hard drive in a cool, dark place will help, I have to disappoint you: the lubricants deteriorate and eventually the hard drive will fail. In fact, a drive that is spun up regularly (every few weeks) will last much longer than a drive that is just kept in storage.
Optical media, such as CDs and DVDs, will deteriorate quickly if not stored perfectly. And even if they are stored under perfect archival conditions, there is ample evidence that the realistic data retention time is much, much shorter than the time claimed by manufacturers. A friend of mine, for example, found that after 5 years, 20-50% of his burned CDs and DVDs showed one or more errors that could not be corrected.
Magneto-Optical (MO) media were guaranteed for 30 years, and it seems that these claims were accurate. Unfortunately, this is obsolete technology and the drives are no longer being made.
Ultra Density Optical (UDO) media are interesting, with manufacturers guaranteeing 50+ years of data retention. The drives start at about USD 1000 and media cost about USD 2 per GB, so UDO is not cheap. It may still be a viable option if the data is valuable enough.
Solid state memory, i.e. flash, is usually able to retain data for 10 years. After that, all bets are off.
There is, of course, an easy way to deal with these issues: copy to a new medium regularly. Regularly means that you must guarantee that a copy (better: two) is created well before errors can be expected.
Data may change as the medium it is stored on deteriorates. Data can also change due to copying errors.
There are various techniques to detect and correct bit rot. They all involve checksums, which detect changes, and error correction codes, which repair them.
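To make the difference between detecting and correcting concrete, here is a toy sketch in Python of a Hamming(7,4) code, one of the simplest error correction codes. It is an illustration I picked, not what any particular drive or archiving tool uses (real media rely on much stronger codes): every 4 data bits are stored as 7, which is enough to locate and flip back any single corrupted bit in each 7-bit group. The function names are made up for this example.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = ((nibble >> i) & 1 for i in range(4))
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword: list[int]) -> int:
    """Fix at most one flipped bit, then return the 4 data bits."""
    b = list(codeword)
    s1 = b[0] ^ b[2] ^ b[4] ^ b[6]
    s2 = b[1] ^ b[2] ^ b[5] ^ b[6]
    s3 = b[3] ^ b[4] ^ b[5] ^ b[6]
    error_pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if error_pos:
        b[error_pos - 1] ^= 1
    return b[2] | (b[4] << 1) | (b[5] << 2) | (b[6] << 3)


def encode_bytes(data: bytes) -> list[list[int]]:
    """Split each byte into a low and a high nibble and encode both."""
    return [hamming74_encode(n) for byte in data for n in (byte & 0x0F, byte >> 4)]


def decode_bytes(codewords: list[list[int]]) -> bytes:
    """Decode codeword pairs (low nibble first) back into bytes."""
    nibbles = [hamming74_decode(cw) for cw in codewords]
    return bytes(lo | (hi << 4) for lo, hi in zip(nibbles[0::2], nibbles[1::2]))


if __name__ == "__main__":
    original = b"IMG_0001.DNG"
    stored = encode_bytes(original)
    stored[5][4] ^= 1  # simulate bit rot: flip one data bit in one codeword
    assert decode_bytes(stored) == original
    print("single-bit error corrected")
```

The price is storage: this scheme needs 75% more space, and two flipped bits in the same 7-bit group would be "corrected" to the wrong value. That trade-off is one reason why simply keeping several complete copies on different drives is attractive.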
Hardware becomes obsolete with time. Do you still have a ZIP drive to read ZIP media? If you do, does it work with your current computer? And what happens when the drive breaks and you need a new one?
It is relatively easy to protect against this: regularly copy your data to a new medium that will remain supported for a number of years.
If you have data stored in a proprietary format, chances are that you have experienced this problem already. Every new version of Microsoft Word, for example, seems to introduce subtle differences in the way old documents look.
Data stored in an open format, preferably with one or more open source implementations for reading and writing the format, is much safer from becoming obsolete. It seems that JPEG and TIFF, for example, will be around for a very long time, if not forever.
So how does this translate into a viable strategy for keeping my digital assets safe for a long time?
First off, I keep all my data in open formats. For digital negatives, I use DNG files with XMP metadata. Processed files are stored as TIFF. I must admit that I sometimes also use PSD (Adobe Photoshop), which is proprietary and not open, but only to store working copies, which really only make sense in Photoshop anyway.
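Since DNG is built on TIFF, both start with the same TIFF header bytes, which makes it easy to audit an archive by file content rather than by extension. A rough sketch in Python, assuming the only formats of interest are TIFF/DNG, JPEG and PSD; the function names are hypothetical, and skipping `.md5` and `.xmp` sidecar files is my own assumption.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of the formats discussed above.
SIGNATURES = [
    (b"II*\x00", "TIFF/DNG"),  # little-endian TIFF header; DNG is TIFF-based
    (b"MM\x00*", "TIFF/DNG"),  # big-endian TIFF header
    (b"\xff\xd8\xff", "JPEG"),
    (b"8BPS", "PSD"),  # Adobe Photoshop, proprietary
]
OPEN_FORMATS = {"TIFF/DNG", "JPEG"}
SKIP_SUFFIXES = {".md5", ".xmp"}  # hypothetical: text sidecars are not checked


def identify(path: Path) -> str:
    """Classify a file by its first bytes rather than by its extension."""
    with path.open("rb") as f:
        head = f.read(8)
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    return "unknown"


def report_non_open(root: Path) -> dict[Path, str]:
    """Map every file under root that is not TIFF/DNG or JPEG to its detected kind."""
    report = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() in SKIP_SUFFIXES:
            continue
        kind = identify(path)
        if kind not in OPEN_FORMATS:
            report[path] = kind
    return report
```

Anything it reports is either a candidate for conversion to TIFF or, like the Photoshop working copies, a deliberate exception.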
I then calculate an MD5 hash of each file and store it with the file. This way I can easily detect any change to the file.
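A minimal sketch of this step in Python, assuming one sidecar file per image named `<image>.md5` (my naming for this example; any consistent scheme works). Writing each sidecar in the `HASH  FILENAME` line format of the `md5sum` tool means it can also be checked with `md5sum -c` from the file's directory, without any custom code. `md5_of_file` and `write_sidecar_hashes` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large TIFFs need not fit in memory


def md5_of_file(path: Path) -> str:
    """Return the hex MD5 digest of a file's contents."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar_hashes(root: Path) -> int:
    """Write '<file>.md5' next to every file under root; return how many were written."""
    count = 0
    for path in sorted(root.rglob("*")):  # sorted() lists everything before we add files
        if not path.is_file() or path.suffix == ".md5":
            continue  # skip directories and existing sidecars
        sidecar = path.with_name(path.name + ".md5")
        sidecar.write_text(f"{md5_of_file(path)}  {path.name}\n", encoding="utf-8")
        count += 1
    return count
```

For example, `write_sidecar_hashes(Path("/Volumes/Archive1"))` would hash everything on a drive mounted at that (hypothetical) path. MD5 is no longer considered safe against deliberate tampering, but that does not matter here: the goal is to catch accidental changes, which it does reliably.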
The files are stored on multiple hard drives. I use a combination of external FireWire and USB drives and internal drives that I swap quickly using a Sharkoon SATA QuickPort, a docking bay that connects a SATA drive to a USB port.
The drives containing multiple copies of the files are stored at home, in the office, and offsite (bank vault, friend’s house, etc.).
I have a recurring calendar task to check the hashes on every drive every few months. If a single error is found on a drive, the entire drive is tossed. This is not a problem as long as there is still a good copy on a different drive. The integrity check has the added benefit of ensuring that the drive's lubricants get a good workout.
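The periodic check can be sketched in Python as well. This assumes each file has a `<file>.md5` sidecar next to it in `md5sum` format (`HASH  FILENAME`); that layout and the name `verify_drive` are hypothetical, so adapt them to however the hashes are actually stored. It recomputes every hash and returns the files that no longer match, so under the rule above a non-empty result means the drive is retired.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_drive(root: Path) -> list[Path]:
    """Return files under root that are missing, unreadable, or fail their hash check."""
    failures = []
    for sidecar in sorted(root.rglob("*.md5")):
        try:
            expected, _, name = sidecar.read_text(encoding="utf-8").strip().partition("  ")
            ok = md5_of_file(sidecar.with_name(name)) == expected
        except (OSError, ValueError):  # missing file, unreadable or garbled sidecar
            ok = False
        if not ok:
            failures.append(sidecar.with_suffix(""))  # 'a.tif.md5' -> 'a.tif'
    return failures
```

Called as e.g. `verify_drive(Path("/Volumes/Archive1"))` (path hypothetical). Because it reads every file in full, it also spins the drive up and exercises it, just as the routine check is meant to.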
There is another calendar task to replace every drive after it is 3 years old. With the inevitable march of technology increasing the capacity of hard drives, I can consolidate the content of two old drives on a single new drive.
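Consolidating two old drives onto a new one boils down to a copy followed by a verification. A sketch in Python, assuming each old drive is copied into its own subfolder named after its mount point; `consolidate` and this folder layout are hypothetical. Any `.md5` sidecar files are copied along with the images, so the new drive can be checked the same way as the old ones.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def consolidate(sources: list[Path], destination: Path) -> None:
    """Copy each source drive into destination/<drive name>, verifying every file."""
    for source in sources:
        target_root = destination / source.name
        for src in sorted(source.rglob("*")):
            if not src.is_file():
                continue
            dst = target_root / src.relative_to(source)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copies contents plus timestamps
            if md5_of_file(dst) != md5_of_file(src):
                raise OSError(f"copy verification failed: {src} -> {dst}")
```

It is worth checking the old drives' hashes before copying, so that a file that has already rotted is not faithfully copied onto the new drive. Two source drives with the same name would end up in the same folder, so give them distinct names first.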
This strategy is probably not perfect, but it does give me a high degree of confidence that the files will still be usable in the distant future.