Managing a digital library of images
Workflow, storage and backup
The need for a cataloguing system
It used to be that only professionals or advanced (and rich!) amateurs would shoot hundreds of photographs in a couple of days. Part of this was because of the cost. Even going the budget route, each image cost about 50c to produce once you took film, developing and printing into account. Also, more importantly, every image cost that much to make. If photography is just a hobby, spending a hundred bucks a week on it isn't really an option. As a result, there were fewer images to manage and you always had physical copies.
Now, with digital cameras, the 'film' is reusable, needs no developing and you only print what you really really like, liberating you to take hundreds of images over a single weekend. The flip side of this bounty is that a good way to catalogue and maintain the image library becomes very important - you collect a couple of years' worth of film shots in a few months. Below is the method I use - it is largely camera independent but somewhat Windows tilted on the computer side. I am sure the principles can be easily applied even if you use Macs or Linux boxes. This process also lets you put out a webpage and quickly track back any image that a reader of your website has a question about: "When was this taken?", "What's the EXIF metadata?", "Can I have a larger JPG for my school project?" etc.
Pre-processing
Start with a root folder where you will keep your images. C:\Photographs is as good a choice as anything else. Create a sub-folder under this with a name such as cdnnn, starting with cd001. Underneath, create these sub-directories: Raw, OrigJpg, Tifs, Thumbnails, Standard and Large.
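The folder layout above can be created by hand, but a small script keeps it consistent from one album to the next. The following is only a sketch: the function name create_album is hypothetical, and the folder names are the ones listed above.

```python
from pathlib import Path

# Sub-folder names taken from the article.
SUBFOLDERS = ["Raw", "OrigJpg", "Tifs", "Thumbnails", "Standard", "Large"]


def create_album(root: Path, number: int) -> Path:
    """Create root/cdNNN with its six sub-folders and return the album path."""
    album = root / f"cd{number:03d}"  # 1 -> cd001, 12 -> cd012
    for name in SUBFOLDERS:
        # exist_ok=True lets the function be re-run without raising an error
        (album / name).mkdir(parents=True, exist_ok=True)
    return album
```

Calling create_album(Path(r"C:\Photographs"), 1) would create C:\Photographs\cd001 with the six sub-folders.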
My camera is the Canon Powershot G2, which can shoot in the usual JPG mode but also in Canon's proprietary RAW mode, which stores the unprocessed sensor data with lossless compression and gives a much smaller file than the equivalent TIFF.
You will shoot either in RAW or JPG or a combination of both. Once you suck these images into your computer, move all the Original JPGs to OrigJpg and the RAWs to Raw under cdnnn. Delete all the absolute losers - these would be the ones that are wrongly exposed or shaken or out of focus or just plain bad.
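Sorting the downloaded files into OrigJpg and Raw is easy to automate. This sketch assumes the RAW files carry a .crw extension (adjust the mapping for your camera); the function name sort_downloads and the folder you download into are hypothetical. Deleting the losers remains a manual job.

```python
import shutil
from pathlib import Path

# Assumed mapping of file extension to sub-folder; .crw is an assumption
# about the camera's RAW extension, so change it to match your files.
DESTINATIONS = {".jpg": "OrigJpg", ".jpeg": "OrigJpg", ".crw": "Raw"}


def sort_downloads(incoming: Path, album: Path) -> int:
    """Move JPGs and RAWs from incoming into the album; return the count moved."""
    moved = 0
    # sorted() takes a snapshot of the listing before any file is moved
    for item in sorted(incoming.iterdir()):
        folder = DESTINATIONS.get(item.suffix.lower())
        if item.is_file() and folder:
            target_dir = album / folder
            target_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(item), str(target_dir / item.name))
            moved += 1
    return moved
```

Files with any other extension are left where they are.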
Next, rotate whatever images you need to. Warning: right-clicking on an image in Windows XP and choosing to rotate will degrade the JPG quality. Win XP will warn you, but Canon's own Zoombrowser and even Adobe Photoshop CS will not. The way to go is lossless rotation. Many programs can do this. I use - and very highly recommend - Irfanview.
One step I have started to follow of late is to update the metadata of the image. Adobe's Photoshop CS will let you update the metadata of multiple images in one go. So you could update the EXIF/IPTC of all the images from a trip to Yosemite National Park with the phrase "Yosemite National Park, California". Then, to a smaller selection, you can add more data like "Bridal Veil Falls" or "Ahwahnee Lodge". Once you have tagged your images to your satisfaction, you are under less pressure to go with overly descriptive file names. This metadata is also a great place to add copyright information with your contact email address.
Choose a descriptive file name for each of the images - for example: "calif_capitol_side_thro_trees_htal", "drum_bridge_1" etc. Once all the file names are changed to be more meaningful, batch rename all the files to have the same prefix as the directory they are under - for example, rename all the files under cd001 to "cd001_"+file name. That is to say that "drum_bridge_1" becomes "cd001_drum_bridge_1". Renaming multiple files in one go can be done in Windows XP. If you use Windows NT/2000 like I do, use a tool called Flash Renamer.
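If you would rather script the prefix step than use a renaming tool, something along these lines would do it. It is a sketch, not a replacement for the tools named above; the function name add_album_prefix is hypothetical.

```python
from pathlib import Path


def add_album_prefix(folder: Path, prefix: str) -> list[str]:
    """Rename every file in folder to '<prefix>_<old name>'; return the new names."""
    renamed = []
    for item in sorted(folder.iterdir()):
        # Skip sub-folders and files that already carry the prefix,
        # so running the function twice does not double the prefix.
        if item.is_file() and not item.name.startswith(prefix + "_"):
            new_name = f"{prefix}_{item.name}"
            item.rename(item.with_name(new_name))
            renamed.append(new_name)
    return renamed
```

For example, add_album_prefix(Path(r"C:\Photographs\cd001\OrigJpg"), "cd001") would turn drum_bridge_1.jpg into cd001_drum_bridge_1.jpg.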
If you shot in RAW, batch convert them to TIFs with your favorite software and move the TIFs to the Tifs folder.
Workflow
Go with your standard workflow: Auto-Levels, Saturation, Unsharp Mask, Resize, Borders, Copyright statement etc. Use Adobe Photoshop's Actions to implement your workflow consistently for all of your images. Once this is done, save your image as a JPG in three sizes:
Thumbnail. I find that about 135 X 185 pixels is a good compromise between size and visibility. This is for the thumbnail on your webpage or for leaving on your hard disk for a quick preview.
Standard. This is about 500 X 700 pixels. If you have a website with hundreds of images, embed the pages with the thumbnails, which point to this larger image.
Large. This is up to you, but 1000 X 750 just about fills a standard 1024 X 768 pixel monitor. You can further link the Standard image on your webpage to this Large size if you have the bandwidth to spare.
Click on the image at the top of this article to see what I mean, or see it in action on this website on the Big Sur page or the Golden Gate Park page. Also, here are some additional thoughts on good website design.
Change any of the above sizes to your liking, but remember to leave the original file from your camera (the RAW, TIF or JPG) untouched. This is the master copy should you ever change your workflow or want the un-manipulated version for a print or other purpose.
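To see what the three sizes work out to for a given original, the arithmetic can be sketched as below. The long-edge targets (185, 700 and 1000 pixels) are my reading of the sizes quoted above, and the function name scale_to_long_edge is hypothetical; the actual resizing is still done in your image editor.

```python
# Assumed long-edge targets in pixels, read off the three sizes in the article.
LONG_EDGE = {"Thumbnails": 185, "Standard": 700, "Large": 1000}


def scale_to_long_edge(width: int, height: int, long_edge: int) -> tuple[int, int]:
    """Return (width, height) scaled so the longer side equals long_edge.

    The aspect ratio is preserved and images are never enlarged.
    """
    longest = max(width, height)
    if longest <= long_edge:
        return width, height  # already small enough, leave untouched
    new_w = max(1, round(width * long_edge / longest))
    new_h = max(1, round(height * long_edge / longest))
    return new_w, new_h
```

A 2272 X 1704 original, for instance, comes out at 1000 X 750 for the Large size, and the same image shot vertically comes out at 750 X 1000.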
Move each of these images to the appropriate folder. Finally, you have a structure such as this:
cd001
-->Raw
-->Tifs
-->OrigJpg
-->Thumbnails
-->Standard
-->Large
with the re-sized images in the appropriate folders.
Back Up
Now, burn a CD of the entire cd001 directory. Next, make a second CD just in case you scratch or lose the first. Make three CDs if you want to be really careful. CD-Rs vary from free(!) on mail-in rebates to some $10-$20 for a pack of 100. It is probably a good idea to buy a name brand rather than an unknown cheap brand which might deteriorate. However, don't sweat that a CD-R will not last 25 years - see the 'Longevity' section below.
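Before burning, it can be handy to check that the whole directory actually fits on one disc. This is an optional extra and only a sketch: the 700MB figure is the nominal CD capacity and the usable space on a real disc may differ, and the function names are hypothetical.

```python
from pathlib import Path

# Assumed nominal capacity of a CD-R in bytes; real usable space may differ.
CD_CAPACITY_BYTES = 700 * 1000 * 1000


def folder_size(folder: Path) -> int:
    """Total size in bytes of all files under folder, including sub-folders."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def fits_on_disc(folder: Path, capacity: int = CD_CAPACITY_BYTES) -> bool:
    """True if everything under folder fits within the given capacity."""
    return folder_size(folder) <= capacity
```

If fits_on_disc returns False for cd001, that is a sign the album should have been split into cd001 and cd002.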
Then, go out, shoot some more, create cd002 and so on...
Search and Retrieval
A typical thumbnail is about 10K to 15K. 20K max. Per 1GB of hard disk, you can store roughly 50 to 100 THOUSAND thumbnails, which your Operating System can then easily search by file name. This is the simplest way. An even better method is to fill in the EXIF/IPTC metadata tags with relevant information. Programs like Adobe Photoshop CS or Irfanview will let you edit the EXIF/IPTC to add whatever keywords you think are necessary. For example, the image of the Korean War Memorial at the top of this page could have all of the following keywords: "korean war memorial", "memorial", "washington DC", "soldiers", "statues" etc - putting all of which into a file name is impossible. The real benefit of tagging images with metadata will likely come in the near future, when search engines like Google and even operating systems like Windows XP start to be capable of searching by words in the metadata and not just by file name.
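The file-name search described above can also be scripted. This sketch looks only inside the Thumbnails folders kept on the hard disk and matches a keyword against the file name; it does not read EXIF/IPTC metadata. The function name find_images is hypothetical.

```python
from pathlib import Path


def find_images(root: Path, keyword: str) -> list[Path]:
    """Return thumbnails under root whose file name contains keyword.

    The match is case-insensitive and limited to folders named 'Thumbnails'.
    """
    keyword = keyword.lower()
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and p.parent.name == "Thumbnails"
        and keyword in p.name.lower()
    )
```

Because every file name starts with its album prefix, a hit such as cd001_drum_bridge_1.jpg tells you straight away which CD holds the original.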
Longevity
How easy will it be to find, in the year 2025, the image you took in 2003? Will it be on a CD, but unreadable because CD-ROM drives have gone the way of the 5 1/4" floppy disk drives? My approach to this is to keep changing the media every few years - this needn't be as time consuming as you think, because every new generation of media holds many times as much data as the previous one and so the number of discrete pieces of storage that you use will shrink dramatically. For example, 5 1/4" floppies held 360K and the 3.5" holds 1.44MB, ie a quadrupling in capacity. A CD holds 700MB, which is nearly a 500X increase in capacity over a 3.5" floppy. As of today (2003), the next 'big' medium is a DVD, which holds 4.7GB of data. You will be able to fit six or seven full CDs' worth of photographs on a single DVD, and more if your CDs are not filled to capacity. It might well be that as you go on collecting images of ever larger resolution, the number of pieces of media on which you will store them will either remain constant or even go down. Let's call that Srinivasan's Law - remember, you read it here first. While on the topic of predictions, here are some thoughts on the digital cameras of the future.
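The capacity arithmetic is easy to check for yourself. The sketch below uses nominal capacities in megabytes (0.36, 1.44, 700 and 4700); real formatted capacities differ slightly, and the function names are hypothetical.

```python
import math

# Nominal capacities in megabytes; formatted capacities differ slightly.
FLOPPY_525_MB = 0.36
FLOPPY_35_MB = 1.44
CD_MB = 700
DVD_MB = 4700


def capacity_ratio(new_mb: float, old_mb: float) -> float:
    """How many times more data the newer medium holds than the older one."""
    return new_mb / old_mb


def discs_needed(total_mb: float, capacity_mb: float) -> int:
    """Number of discs of capacity_mb needed to hold total_mb of data."""
    return math.ceil(total_mb / capacity_mb)
```

With these figures a CD holds roughly 486 times as much as a 3.5" floppy, and discs_needed(100 * CD_MB, DVD_MB) gives 15, so a hundred full CDs would shrink to fifteen DVDs.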