From Print To Screen
Part 3 - The Scan Library
by Ben | 16th Oct 2018
Now we have our issue scanned, it's time to organise the files and add them to the scan library. The initial scan folder has files with generic numbered names, so I first rename them using a renaming utility with a quick saved search/replace template, so they get renamed as follows:
Now I move the files over to the scan library drive, where all the scans live. This has the following folder structure:-
/muzines/sound_on_sound/1986/sos_86_01_jan/01 original scans/*images*
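As a minimal sketch, the folder path above could be built in Python like this. The function name, its parameters and the month table are hypothetical; only the resulting path layout comes from the example above.

```python
from pathlib import PurePosixPath

# Three-letter month suffixes, as seen in issue names like "sos_86_01_jan".
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def issue_folder(root, publication, short_name, year, month,
                 subfolder="01 original scans"):
    """Build the library folder for one issue (hypothetical helper)."""
    # e.g. short_name="sos", year=1986, month=1 -> "sos_86_01_jan"
    issue = f"{short_name}_{year % 100:02d}_{month:02d}_{MONTHS[month - 1]}"
    return PurePosixPath(root) / publication / str(year) / issue / subfolder


print(issue_folder("/muzines", "sound_on_sound", "sos", 1986, 1))
```

Keeping the year, month number and month name in the issue folder name means the folders sort chronologically while staying readable.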
Now the files are in place, it's time to import them into the scan library.
Fortunately, I have a fair amount of photography experience, so I already use an amazing tool to manage, organise and process large quantities of image files non-destructively - Adobe Lightroom.
This handles tens of thousands of images with ease, and has a number of significant advantages in organisation, tagging, image processing and workflow - you can batch process images, and you can crop, rotate, fix and tweak images non-destructively.
Lightroom is ideal for maintaining and processing a large image scan library efficiently. There is no way I would attempt a project like this by just saving files to folders and processing them one at a time, destructively. Lightroom is all about Workflow, and Workflow is key.
The dedicated muzines Lightroom library looks like this:
There is a Collection Set for each publication ("Mag - Sound On Sound"). Inside that, I break the issues down into Collection Sets for individual years ("1985") to help keep the lists manageable, with a Collection for each issue containing the scanned pages ("sos_85_11_nov").
This lets me easily get to the scans for any issue of any publication, even with hundreds of issues and tens of thousands of scans.
On import, as well as putting the scans into the correct collection smart folder, I also tag the issues with the publication short name (e.g. "SOS"), the page size "A4", and the name of the person who scanned it, if it was not scanned by me.
(I used to also add some additional metadata - for example, tagging "Cover", "Contents Page", "Editorial", "Full Page Advertisement", "Supplement" pages etc. so I could quickly find various page types, but I found in practice I didn't really need it enough to continue with it.)
I have additional tags set up to use so I can mark pages as poor quality, and flag for rescanning should there be a scan problem and so on. This lets me easily filter and show all pages flagged for rescanning, and so on, to help keep track of problems that need to be resolved.
I will set the display order to "Filename" if it's not already, which means the scans will be displayed in page order. If any custom ordering is required (for example, moving supplemental pages that are not page numbered and would otherwise break the contiguous page numbering and article indexes) I will move those to the end of the pages, and LR will change to "Custom Order" to retain my chosen ordering for that issue.
For scans that have alternating page orientations due to the scanning process (i.e. every other image is upside down), I don't deal with that yet, for a particular reason (which I'll go into in Part 4). All I'm concerned with in the import stage is getting the scans into the library, in the correct place and tagged appropriately.
In some cases, I might have scanned an issue that has missing pages. For this situation I have a placeholder image in the library (in the "Helper Images" collection) which I will (virtually) copy into place in the issue for each missing page. This means I can export the scans for the website in the correct page order and respect the article numbering. I will also rename the issue collection to include "*INCOMPLETE*" as a visual reminder the issue is not complete.
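Working out which pages need a placeholder could be scripted. This is a minimal sketch, assuming the renamed files end in a zero-padded page number such as "sos_86_01_jan_003.jpg". That pattern is hypothetical, as is the `missing_pages` helper; the real file naming may differ.

```python
import re


def missing_pages(filenames, last_page):
    """Return page numbers in 1..last_page that have no scan file.

    Assumes (hypothetically) that each filename ends in its page
    number just before the extension, e.g. "..._003.jpg".
    """
    present = set()
    for name in filenames:
        match = re.search(r"(\d+)\.\w+$", name)
        if match:
            present.add(int(match.group(1)))
    return [page for page in range(1, last_page + 1) if page not in present]


scans = ["sos_86_01_jan_001.jpg", "sos_86_01_jan_002.jpg", "sos_86_01_jan_004.jpg"]
print(missing_pages(scans, last_page=5))
```

The last page number has to be supplied by hand, since missing pages at the end of an issue cannot be inferred from the files alone.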
When I rescan pages, or get replacement pages from other contributors with a better copy of the issue, the files are named similarly, but I put them in a separate "02 Replacements" folder rather than in the "01 Original Scans" folder in the library files. This keeps the two scan sessions' files separate. After importing these into the library and putting them into place, I will tag them as "Rescanned" and with the name of who did the scans. This just helps me keep track of the source of the files in the future.
Renaming the scans takes only a few seconds, and moving the files to the scan library takes a minute or so to copy over. Then the import to LR process and ordering takes another minute or two - it's pretty quick. So going from a folder of raw scans to properly renamed and relocated files, and imported into LR, takes about five minutes. No big deal! ;)
So, once an issue is scanned, the procedure is:
- Rename the scans
- Move the scans to the scan library drive
- Import them into Lightroom, setting some default tags and a destination
- (Very Occasionally) If there are missing or significantly damaged pages, handle and tag those, and update the collection name as a reminder. I will also mark the issue as "Incomplete" on the website.
- Generate smart previews
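The first two steps of that procedure could be sketched as a small Python script. This is not the renaming utility mentioned earlier, just an illustration: the `rename_and_move` function, the "issue name plus three-digit page number" filename pattern, and the assumption that the raw scans sort into page order by name are all hypothetical. It also copies rather than moves, leaving the raw scans in place.

```python
import shutil
from pathlib import Path


def rename_and_move(scan_dir, dest_dir, issue_name, dry_run=True):
    """Plan (and optionally perform) the rename + copy into the library.

    Returns a list of (source, target) path pairs. With dry_run=True
    nothing is written, so the plan can be checked first.
    """
    scan_dir, dest_dir = Path(scan_dir), Path(dest_dir)
    # Assumes the generic numbered names sort into page order.
    scans = sorted(p for p in scan_dir.iterdir() if p.is_file())
    plan = []
    for page, src in enumerate(scans, start=1):
        # Hypothetical pattern: <issue>_<3-digit page><lowercased extension>
        target = dest_dir / f"{issue_name}_{page:03d}{src.suffix.lower()}"
        plan.append((src, target))
    if not dry_run:
        dest_dir.mkdir(parents=True, exist_ok=True)
        for src, target in plan:
            shutil.copy2(src, target)  # copy2 also preserves timestamps
    return plan
```

Running it once with `dry_run=True` and printing the returned pairs gives a chance to catch a wrong page order before any files are written.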
A great LR feature is Smart Previews, which let me see, and work on, scans on my laptop even when I don't have the scan library drive connected. On import, LR builds the lower quality smart previews to a local cache file. This means I can sit in a coffee shop and reference and process scans without having to have 2TB of scan library files with me. That removes another bottleneck, as I can choose to work on the scans wherever I am.
And working on the scans - Processing - is what I'll be documenting next time. A3 -> A4 conversions, cropping, rotating, straightening, colour balance, image repairing, print-through minimisation, all that fun stuff!
Next part: From Print to Screen - Part 4: Processing the Scans