From Print To Screen
Part 4 - Processing the Scans
by Ben | 25th Jan 2019
Ok, the scans are in the library, and the next task is to prepare them to be output for use on the site, and also for the OCR process.
The main tasks are cropping and straightening, but depending on the scans, there may be other processes required - tweaking the colour balance, repairing damage and other page distractions. I start with the cropping.
Scans I do are generally either in an A3, two-page format, or an A4 single page format, but all pages end up on the site as single pages, so A3 scans with two pages will need to be split into two.
I select all the A3 scans for the issue, and create a new Collection for them, with ">A4" appended to the name, including the selected images as virtual copies - this means the scans are duplicated and can be processed independently and non-destructively, but still refer to the original scan files on disk so no new files are created. All processing is now done with the scans in the ">A4" Collection.
With all the scans selected, we hit "R" for the crop tool. I'll first crop to the top and bottom edges, and then drag in the edge to crop to the left page, and hit return - LR will crop all the selected images in the same way and we'll have the left pages only. The images are still selected, so we'll do a "Create virtual copy" operation which will create a virtual copy of each page - this will become our right page. The new virtual copies will remain selected, so hit "R" again to crop, and we reposition the crop to show the right page, and hit enter.
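As a rough illustration of what the two crops amount to outside of Lightroom, here is a minimal Python sketch that works out the left-page and right-page crop rectangles for a two-page scan. The `Box` type, the `split_spread` function and the pixel sizes are all hypothetical names and numbers made up for this example, and it assumes the binding sits at a single vertical position on the scan.

```python
from typing import NamedTuple, Optional, Tuple


class Box(NamedTuple):
    """A crop rectangle in pixels: left, top, right, bottom."""
    left: int
    top: int
    right: int
    bottom: int


def split_spread(width: int, height: int, top: int = 0, bottom: int = 0,
                 gutter: Optional[int] = None) -> Tuple[Box, Box]:
    """Return (left_page, right_page) crop boxes for a two-page scan.

    top/bottom are the amounts trimmed from the top and bottom edges.
    gutter is the x position of the binding; it defaults to the centre,
    which is an assumption that real scans will not always satisfy.
    """
    if gutter is None:
        gutter = width // 2
    if not 0 < gutter < width:
        raise ValueError("gutter must lie inside the scan")
    if top < 0 or bottom < 0 or top + bottom >= height:
        raise ValueError("top/bottom trims leave no image")
    # Both pages share the same top and bottom edges, so they end up
    # the same height, just as with a batch crop.
    left_page = Box(0, top, gutter, height - bottom)
    right_page = Box(gutter, top, width, height - bottom)
    return left_page, right_page


# Hypothetical 5000 x 3500 pixel scan, trimming 40 pixels top and bottom.
left, right = split_spread(5000, 3500, top=40, bottom=40)
print(left)
print(right)
```

The two boxes play the same role as the original crop and its virtual copy: one source scan, two independent crops, and no new image files.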
For images scanned as A4, in most cases we will have alternating rotations for each page. I mentioned in the previous part that we do not rotate these yet. We first want to do a batch crop on all pages so we crop in the same place - otherwise we'd have to crop half the images one way, and then the other half with a separate crop, and it's hard to keep page sizes consistent.
So, select all the scans, hit "R" and crop to the page size we need (I usually come in a little from the absolute edges as this helps keep a cleaner image and gives a little room to shift the scanned image around). Once the pages are cropped, we can then select the upside down ones, and command-] to rotate them 180 degrees to the proper orientation.
Shouldn't this batch cropping be enough? Well, no, it isn't. Contrary to what you may think if you've not had publishing or printing experience, magazine page sizes are *not* consistent throughout an issue. Due to the binding, the outside pages are bigger than the inside pages, so the horizontal page size shrinks as you get to the middle of the magazine, and then grows again as you go to the back.
So depending on which page I made the initial crop on, I'm going to have a bunch of page creep to deal with, and I'm going to have to go through page by page and correct for it. While I'm doing this, I will also deal with page straightening - both these tasks are done with the crop tool in LR.
Why the need for straightening? Shouldn't the scans be aligned with the scanner edges and therefore already be straight?
Well, there are a number of things to bear in mind here. Sometimes magazines can be physically cropped badly, usually at a jaunty angle, so even if the magazine edge is against the scanner edge, the page content will still be crooked. Sometimes with dual-page A3 scans, the act of pressing the binding down to flatten the page will make pages crooked. And often, even if the page is straight, the printing on the page might have gone on crooked at print time. In older magazines, which were set pre-digital, parts of the page were often manually positioned inaccurately and at an angle. All these things mean that I like to eyeball the straightening process and do a decent job of it - it helps the OCR process too.
So, we start with the first page, hit D for the Develop module, and hit R to go into the crop tool. By default, the cursor keys will nudge the image around the crop, but if you position the mouse over the Angle, then the left/right cursors will nudge the rotation instead. So if the page needs rotating a little, I do this by eye until it looks good, then command-right to go to the next page, and keep going.
I judge the angle by eye (although there are some guides to help see the horizontals). I prefer this to LR's automatic straightening routines, which work ok about 85% of the time, but the failures are slower to identify and deal with than just systematically going through and doing it manually. Also, often you need to make some judgements on what looks the best when the content isn't particularly straight in the first place, or when there are multiple items on the page with different levels of straightness. Most of the time, I'm trying to get the core body text to be straight as it makes the OCR process better, and for the rest, I'm just trying to get the nicest balance and get the most out of the scanned page image.
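To give a sense of how small these rotations usually are, here is a minimal Python sketch of the arithmetic. It is not how the pages are straightened here (that is done by eye). It only measures how far a line of body text is off horizontal, given two points picked along its baseline. The function name `skew_angle_degrees` and the coordinates are hypothetical.

```python
import math


def skew_angle_degrees(x1: float, y1: float, x2: float, y2: float) -> float:
    """Angle, in degrees, of the line through two points relative to horizontal.

    Coordinates are image pixels, with y increasing downwards. A result
    of 0.0 means the line is already level; rotating the page by the
    opposite amount would level it.
    """
    if (x1, y1) == (x2, y2):
        raise ValueError("the two points must be different")
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


# Hypothetical baseline that drops 35 pixels over a 2000 pixel run.
angle = skew_angle_degrees(100, 500, 2100, 535)
print(f"baseline is off by {angle:.2f} degrees")
```

A drop of 35 pixels across 2000 works out at about one degree, which is the sort of size where a nudge on the Angle parameter is enough.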
From time to time I will need to nudge the page left/right or up/down, so I just move the mouse a little off the Angle parameter, nudge the page with the cursors, then back to the Angle parameter. Occasionally I will need to move the crop edges on particular pages but I try to minimise this to keep the pages a uniform aspect ratio.
In this way, it's fairly fast to step through a complete issue page by page and carry out any necessary minor adjustments to the crop and page angle.
Scans generally need a bit of colour and contrast boost, as scanners tend to scan fairly flat/neutral (and I certainly set my scanners to a neutral setting). This means no colour processing is baked into the scans, so it's completely non-destructive.
I have a LR preset designed to minimise print-through (where you can see a faint image of the content that was on the other side of the scanned page) and also boost the colour and contrast a little.
So I will make sure all the scans are selected, and apply the "Magazine Page Process" preset, and all the pages will have those settings non-destructively applied.
Scans often have a colour cast - it's quite usual for old magazines to have pages that scan rather yellow, or orange, as they age over time and we want to remove that where possible.
For black & white issues, it's fairly straightforward - we can either set the pages to be black and white and discard the colour completely, or alternatively I might reduce the colour saturation significantly, for a gentler conversion to B&W.
For colour magazines, or magazines that contain a mixture of colour and B&W, I'll often use Lightroom's HSL tools to reduce the saturation of the yellows/reds/oranges, and sometimes bump up the luminance to brighten up the aged areas if they are causing darkening. Another approach is to use LR's white balance tool to set the white balance for the entire issue.
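The idea behind the HSL approach can be sketched in a few lines of Python using the standard library's `colorsys` module. This is only an illustration of selectively desaturating warm hues on a single pixel. Lightroom's HSL panel works differently internally, and the function name `desaturate_warm`, the hue limits and the sample colours are all assumptions made for the example.

```python
import colorsys
from typing import Tuple

RGB = Tuple[float, float, float]


def desaturate_warm(rgb: RGB, amount: float = 0.5,
                    hue_max_degrees: float = 70.0) -> RGB:
    """Reduce the saturation of red/orange/yellow pixels only.

    rgb holds floats in the range 0..1. amount is the fraction of
    saturation removed (0 = no change, 1 = fully grey). Hues from red
    (0 degrees) up to hue_max_degrees count as warm, as do reds just
    below 360 degrees. Both limits are arbitrary choices for this sketch.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    h, l, s = colorsys.rgb_to_hls(*rgb)
    hue = h * 360.0
    is_warm = hue <= hue_max_degrees or hue >= 345.0
    if s > 0.0 and is_warm:
        s *= 1.0 - amount
    # Lightness is untouched, so only the colour cast is reduced.
    return colorsys.hls_to_rgb(h, l, s)


aged_paper = (0.9, 0.8, 0.5)   # hypothetical yellowed page colour
blue_ink = (0.2, 0.3, 0.9)     # hypothetical cool colour, left alone
print(desaturate_warm(aged_paper))
print(desaturate_warm(blue_ink))
```

Because only the warm hues are touched, blues and greens elsewhere on the page keep their colour, which is why this suits issues that mix colour and B&W pages.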
Which method I use depends on what's giving the best results with the smallest amount of effort, bearing in mind the page distribution of the issue and the problems to be fixed.
When a page hasn't been pressed down firmly onto the scanner, such as in the middle binding area on a dual-page A3 scan, you will often get fairly ugly dark shadowing on the image. With magazines with thick bindings, it's often very hard to get this flat all the way to the binding - but a little bit of LR processing can improve these quite a bit.
I will quickly go through the pages and mark the left pages that require edge processing, select those and use the Graduated Filter tool in LR, set to increase shadows and exposure, and draw it in from the page edge. This produces a faded-in exposure boost to compensate for the edge darkening over the selected left pages. Then I will do the same for the right pages, where the darkening comes in from the other side.
Again, I have presets set up for left and right edge processing, but these will usually need to be tweaked as I apply them, as the amount required will vary per issue. Sometimes, a few pages might need heavier-handed processing than others.
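The effect of a graduated brightening can be sketched as a per-column multiplier that is strongest at the binding edge and fades back to nothing. The Python below is a simplified stand-in, not Lightroom's Graduated Filter. It uses a plain linear ramp on pixel values, and the names `edge_gain` and `apply_gain` and all the numbers are hypothetical.

```python
from typing import List


def edge_gain(width: int, ramp_width: int, max_gain: float,
              side: str = "right") -> List[float]:
    """Per-column brightness multipliers for one page.

    The multiplier is max_gain at the chosen edge and fades linearly
    to 1.0 (no change) over ramp_width columns. For a left page the
    binding shadow is on the right edge, and vice versa.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if not 0 < ramp_width <= width:
        raise ValueError("ramp_width must be between 1 and width")
    gains = []
    for x in range(width):
        distance = x if side == "left" else width - 1 - x
        if distance >= ramp_width:
            gains.append(1.0)
        else:
            t = 1.0 - distance / ramp_width  # 1.0 at the edge, fading out
            gains.append(1.0 + (max_gain - 1.0) * t)
    return gains


def apply_gain(row: List[int], gains: List[float]) -> List[int]:
    """Brighten one row of 8-bit grey values, clipping at 255."""
    return [min(255, round(v * g)) for v, g in zip(row, gains)]


# Hypothetical 10 pixel wide left page with a shadow near the binding.
gains = edge_gain(10, ramp_width=4, max_gain=1.5, side="right")
print(apply_gain([100] * 10, gains))
```

Swapping `side` to `"left"` gives the mirror image for right-hand pages, and `ramp_width` and `max_gain` are the two values that would need tweaking per issue.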
LR has a few simple tools for non-destructively removing spots and blemishes, and if anything stands out while I'm going through the pages I will just use those to tackle them - but more serious things will need to be handled in Photoshop.
These things include where someone has written on the page (eg, filled in a competition form with their address details), where a page is ripped/damaged, or there are other serious page distractions or scanning artifacts that need fixing. Some of this includes synthesizing new image areas - I'll be looking at this when I go over handling the article images - as I have to do this quite a bit.
How long scan processing takes really depends on the state of the source images. In the best case, where the images are already cropped nicely to A4, and need no straightening or fixes, it can be literally just a minute or two to run through the pages to check them by eye, batch apply the colour preset, and they are ready to go.
In the worst case, the pages may need a lot of manual straightening, fixing of blemishes and other scan problems, exporting to Photoshop to remove unwanted content and colour balance issues - and with some magazines pushing over 200 pages, it takes more time - up to about an hour or so. On average though, I typically take around 10-20 minutes to get one issue's scans ready for output.
So, we've started with our raw scans, and cropped, rotated, converted to A4 if necessary, straightened and jazzed up our scans, all shiny and ready for Export. Next time!
Next part: From Print to Screen - Part 5: Outputting the Scans to Use