From Print To Screen
Part Two - Scanning
by Ben | 12th January 2018
So - it all starts with the scanning. It's important to get good quality scans, as this makes all the subsequent steps easier.
I have used a variety of scanners since starting mu:zines, but currently my main scanner is a Mustek 2400S A3 scanner, and I also have a Plustek Opticbook 3900 A4 edge scanner, which comes in handy for hard-spined issues that can't be flattened.
An A3 scanner is very useful as you can scan a two-page spread of a typical A4 magazine in one pass, and you get to keep the page contents across the fold, which is useful for images that cross the centre page boundary.
It also means I can scan A3 magazines, like the large, almost newspaper-like format Making Music - albeit one page at a time.
I don't own any scanners with automatic sheet feeders, and for my use cases they aren't that useful, as you have to destructively process each magazine (separate all the pages) to use them. Where this is unavoidable (see "Hard Bindings" lower down), these scanners might be more useful, provided they can reliably feed individual sheets through without any sticking together (missing or unscanned pages are often difficult to catch).
The scan software I use is Vuescan, from Hamrick Software, which I highly recommend. I cannot stress enough how important it is to be able to hit one button and have the page scanned, named and saved to disk, without a separate save step, extra mouse clicks or other interface interruptions. I perform a scan tens of thousands of times, so it's important to keep each scan as quick and effortless as possible.
Vuescan's scanner handling is very good, and it is cross-platform, so it works nicely on the Mac with supported scanners. However, scanner driver support on the Mac is weaker and the scanners I currently use really require Windows, so I mostly run Vuescan in a Windows VM (or sometimes on a separate Windows laptop). Not ideal, but not really a problem in practice.
I control Vuescan remotely with MIDI commands, which lets me keep a little Korg Nanopad by my scanner (or, even better, a foot pedal connected to a MIDI keyboard). All I do is line up the page and hit a pad to trigger the scan, after which the file is immediately saved. I prefer triggering each scan manually rather than setting Vuescan to scan automatically every few seconds, as pages sometimes take different amounts of time to line up correctly.
This MIDI triggering is handled by Keyboard Maestro, in which I've set up a macro script which listens for the incoming MIDI command of my button press and initiates a scan in Vuescan. I use Keyboard Maestro quite a bit to automate or "drive" various processes that would otherwise have to be done manually. More on this in later parts of this series...
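Keyboard Maestro does the listening, but the trigger logic itself is simple enough to sketch. The Python below is a hypothetical stand-in, not how Keyboard Maestro works internally: it checks whether a raw three-byte MIDI message is a note-on for the pad assigned to scanning. The note number 36 is an assumption; the real value depends on how the pad or pedal is mapped.

```python
# Hypothetical stand-in for the Keyboard Maestro macro: decide whether a raw
# MIDI message should trigger a scan. SCAN_NOTE = 36 is an assumed pad mapping.
SCAN_NOTE = 36


def is_scan_trigger(message: bytes, note: int = SCAN_NOTE) -> bool:
    """Return True if a raw 3-byte MIDI message is a note-on for `note`."""
    if len(message) != 3:
        return False
    status, data1, velocity = message
    # Note-on status bytes are 0x90-0x9F (high nibble 9, low nibble = channel).
    is_note_on = (status & 0xF0) == 0x90
    # By MIDI convention, a note-on with velocity 0 means note-off.
    return is_note_on and velocity > 0 and data1 == note


def count_scan_triggers(messages: list[bytes]) -> int:
    """Count how many messages in a stream would start a scan."""
    return sum(1 for m in messages if is_scan_trigger(m))
```

A pad press sends a note-on followed by a note-off, and only the note-on matches, so one press starts exactly one scan.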
I scan at 300dpi to uncompressed TIFFs, which are saved to a scan folder and automatically named and numbered by Vuescan. Scans are large, up to 60MB for a full A3 scan, meaning a typical full issue weighs in at somewhere between 1.5GB and 3GB.
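Those sizes follow directly from the page dimensions. As a rough sanity check, this small Python sketch estimates the size of an uncompressed scan, assuming 24-bit RGB (3 bytes per pixel) and ignoring TIFF header overhead; the function name is just for illustration.

```python
MM_PER_INCH = 25.4


def uncompressed_size_mb(width_mm: float, height_mm: float,
                         dpi: int = 300, bytes_per_pixel: int = 3) -> float:
    """Estimate an uncompressed scan's size in MB (10**6 bytes).

    Assumes 24-bit RGB (3 bytes per pixel) and ignores file header overhead.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel / 1_000_000


print(f"A3: {uncompressed_size_mb(297, 420):.1f} MB")  # A3: 52.2 MB
print(f"A4: {uncompressed_size_mb(210, 297):.1f} MB")  # A4: 26.1 MB
```

A full A3 page comes out at roughly 3508 x 4961 pixels, or about 52MB. A crop slightly larger than the page (or a higher bit depth) pushes that towards 60MB, and 40-50 such scans land at roughly 2.1-2.6GB per issue.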
JPEG files would take up less space, but I prefer my master files to be as uncompressed as possible for editing. I can always output or back up to JPEGs if I need a smaller safety copy.
I keep scanner settings for colour and so on fairly neutral. The scanner software has various options to set things like white balance automatically, and while they work most of the time, they can be fooled by pages that are intentionally off-colour (particularly pale pink, yellow or blue, which happens quite a lot). The result is scanned pages with incorrect colours that have to be investigated manually after the fact, which is another workflow breakdown. I find it's best to leave most of this processing off and just capture the scans as accurately as possible; colours can be changed non-destructively and more easily later in the process.
Similarly with crop settings - I have a few templates with crop areas for each type of magazine to process. Don't try to set a crop for each page, or expect this to be the same for each page in an issue - it won't be, and it just slows the process down. My crop is usually slightly wider than the full magazine, as pages will spread when you put pressure on the spine, and the page sizes will vary through the magazine anyway due to creep (pages begin wider and get smaller towards the centre of the magazine, then get larger again as you work to the back.) So there will usually be some black outside border beyond the page edge, which is fine and is easily dealt with later - as long as I'm capturing the whole page, I'm good.
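A per-magazine crop template boils down to the nominal page size plus a safety margin, converted to pixels. The sketch below is hypothetical: the `CropBox` type, the 5mm default margin and the assumption that pages are butted against the scanner's origin corner are all illustrative choices, not Vuescan settings.

```python
from dataclasses import dataclass

MM_PER_INCH = 25.4


@dataclass(frozen=True)
class CropBox:
    """A crop rectangle in pixels, measured from the scanner's origin corner."""
    left: int
    top: int
    width: int
    height: int


def mm_to_px(mm: float, dpi: int = 300) -> int:
    return round(mm / MM_PER_INCH * dpi)


def crop_template(page_w_mm: float, page_h_mm: float,
                  margin_mm: float = 5.0, dpi: int = 300) -> CropBox:
    """Crop box for one magazine type: the nominal page size plus a margin.

    Assumes pages are butted against the scanner's origin corner, so the
    extra margin (an assumed 5mm default) goes on the far edges, where
    pages spread under pressure and vary in width due to creep.
    """
    return CropBox(
        left=0,
        top=0,
        width=mm_to_px(page_w_mm + margin_mm, dpi),
        height=mm_to_px(page_h_mm + margin_mm, dpi),
    )


# Single A4 page (210 x 297 mm) at 300dpi.
print(crop_template(210, 297))
```

The margin trades a little black border, which gets cropped away later, for never clipping a page that has spread or crept outwards.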
Two things to bear in mind when scanning. You want to take care to make good scans, as this saves post-processing time later. At the same time, you want to be fast. There's no point spending five minutes lining things up, running test scans, micro-adjusting the rotation and finding unique combinations of weights to flatten the page; that's fine if you have a handful of scans to do, but if you want to get through 200-300 pages in an evening, you can't afford to go for "perfection" or you'll never get anywhere. So that's my goal: take care to get scans as good as I can, while being as fast as I can. Fast & Good is better than Slow & Great, for my purposes.
Ideally you want a consistent position and orientation (so you can batch crop multiple images easily), you want the page to be straight (minimising the need to rotate manually in post-processing), and above all you need the page to be flat against the scanner glass. Otherwise the scan gets blurry, which has a knock-on effect further down the line on the OCR processing.
So line the page up along the scanner edges, and make sure the page is weighted evenly while being scanned, either by holding the lid down or by weighting the page. Just closing the scanner lid under its own weight is not enough for magazines: they almost always need pressure to flatten the page, and old or particularly misshapen or wrinkly pages need quite a lot of it. If the page is not flat, the scan will have blurry areas, which vastly lowers the quality of the OCR results and again means more time in post-processing. Sometimes issues are in such poor condition that the warping makes it next to impossible to get a page totally flat, no matter what you do.
If you can find a weight of the perfect size that covers the entire scan area and is decently heavy, you may choose to use that, but in most cases I prefer simply closing the lid and holding it down evenly with pressure. It's more arm work, but it's faster than constantly moving a heavy book on and off each page. For magazines with thicker spines that stand up and make it hard to press the lid down, or sometimes towards the beginning and end of an issue where the page thicknesses are mismatched, I'll put an extra magazine on top of the scanned magazine, flush with the spine, to make it easier to apply consistent pressure.
Sometimes the magazine pages have been trimmed badly, so the page edges are not straight, often to a surprising degree! In this case, on an issue-by-issue basis, pick the edge that keeps the content as straight as possible, line that up with the appropriate scanner edge, and leave the other edge at an angle. Cropping is easier to deal with than manually straightening pages.
Look out for small bits of loose paper and other debris that can sometimes be tucked between pages and drop onto the scanner glass; they will then appear on every subsequent scan until you spot them. Also check for folded corners and unfold them.
Sometimes, particularly with single-page scanning, it's more practical to turn the magazine round for alternate pages (so the overhanging page, the one off the scanner glass, is always on the same side, or so the page always lines up against a particular scanner edge). In these cases I don't worry about the varying page orientation, as that is easily taken care of later.
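The later fix-up is mechanical. As an illustration only, and not necessarily how it's done in the mu:zines workflow, this sketch rotates every other scan by 180 degrees, assuming the magazine was turned round for alternate scans. Images are plain lists of pixel rows (a hypothetical `Image` alias) to keep it self-contained; a real pipeline would use an image library.

```python
# Hypothetical representation: a greyscale image as a list of pixel rows.
Image = list[list[int]]


def rotate_180(image: Image) -> Image:
    """Rotate an image 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in image[::-1]]


def normalise_orientation(scans: list[Image], flipped_parity: int = 1) -> list[Image]:
    """Rotate every other scan back upright.

    Assumes scans are in capture order and that the magazine was turned
    round for scans whose 0-based index has the given parity.
    """
    return [rotate_180(img) if i % 2 == flipped_parity else img
            for i, img in enumerate(scans)]
```

Because the flips follow a strict alternating pattern, the whole batch can be corrected in one pass, with no per-page decisions.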
Often when you scan a page, the contents of the reverse side show through faintly in the image, particularly when there are heavy dark areas on the back of a light page and/or the magazine is printed on very thin paper. When I want a high-quality scan and can't separate the pages, I use a few sheets of thick black paper, inserting one behind the page to be scanned before every page turn.
This does slow down the scan process, but with practice it's possible to do it fairly quickly when scanning single pages. When scanning a full spread (two pages at once), you need *two* A4 sheets, one behind each page being scanned, so this is slower and even more tedious.
For mu:zines, I generally don't bother with this "bleed sheet", due to the sheer amount of scanning required, and ultimately, the overall goal here is not to get the highest quality perfect archive scans - just ones good enough to view, and to extract the content from. The page scans are not the main end product. However, on other archive scanning projects where print-through reduces the scan quality significantly, I will use a bleed sheet, for much improved final scans.
I have a process later on in the workflow to help minimise any page print-through present, which I'll talk about in later parts of this series.
Hard-bound magazines are particularly challenging to scan, as their spines make them very hard or impossible to flatten out onto the scanner for a good scan. Examples of these include the later Sound On Sound issues, the later International Musician issues, the later Music Technology issues and The Mix.
For the later Sound On Sound issues I scanned, each of which contains well over 200 pages, I decided to separate the pages from the magazine, effectively destroying it in the process but making it far easier to scan each A4 page flat. That is still my preferred method for scanning hard-bound issues, but obviously it can't be used for everything, for example donated issues that need to be returned intact.
In those cases, I turn to my Plustek scanner, whose glass goes right to the edge, and scan single A4 pages with the binding pressed hard against the edge to capture as much of the margin as possible. You do lose some margin this way, but it's an acceptable compromise compared with trying to flatten a hard-bound issue, and it gives good, flat scans. You have to do twice as many scans, as you can't scan in page pairs, but there is less area to scan and the Plustek is fairly fast, so it's not too painful. It does take some effort to really press the binding in. Depending on the magazine, I might try to get the centre of the page over the edge and onto the glass to capture the whole page.
If you do want to separate out pages from hard-bound issues, the best way I have found to do it is to locate the centre pages, and break the issue here by flattening the pages hard, then use a knife to cut down the inside centre and break the spine. Here the magazine should break into two halves, and you can work through each half, carefully pulling/detaching each page. Remember to keep them in order!
Sometimes additional scissor or knife-work can be done to speed up the process by cutting off part of the binding over multiple pages to make separating easier.
Once done, you should have a nice stack of clean A4 pages to scan (both sides, don't forget!). For a 200+ page issue of Sound On Sound, it typically takes me about half an hour to separate the pages; once you've done a few and got the technique, it's fairly straightforward and reasonably fast.
Before I had an A3 scanner, I left the (A3-sized) Making Music issues unscanned. Unless you've only got a few pages to do, it's not worth trying to scan page sections and merge them together afterwards: it's a lot of work, the pages never line up particularly well no matter how hard you try, and it's a very poor workflow to repeat for every page. If you have pages larger than A4 to scan, an A3 scanner really is a necessity.
For anything larger than A3, you have no choice but to either scan each page in sections and merge them later, or go to an outside third party who can handle it. Thankfully, the largest single pages I have needed to scan are A3 size.
Inserts that are just product ads or brochures that aren't part of a magazine don't get scanned, but for pullout content that is part of the issue or that we do want to archive, I will scan these *after* the issue, and those pages will be added at the end of the content so as not to break the issue's page numbering.
How long an issue takes to scan depends on a number of things, including how many scans you need to do, how long the scanner takes to complete a scan, how much time you need to line up and flatten each page, and how fast you can turn and prepare a new page to scan.
Most scanners take a few seconds after a scan has been initiated before the head moves and starts to scan, and once a scan is complete, a few more seconds to return the head to the start. So if you get a page ready, press your "Scan" button, then wait 6 seconds for the scan to start, 10-30 seconds for the scan itself, and another few seconds for the head to return on every scan, you waste a fair amount of time, which really adds up over hundreds or thousands of pages.
I tend to use those extra "wasted" seconds for page turns and lining up the next scan. While the scanner head is returning from the previous scan, I lift the lid, remove the magazine and turn to the next page. I then hit Scan and have about 6 seconds before the scan starts, and that's when I line the page up (and reposition a print-through sheet if necessary, and weight the magazine) ready for the scan. So I'm effectively starting the scan procedure before the magazine is in place, to minimise the turnaround time between scans. The page is ready by the time the scanner head starts to move, and I'm saving ten seconds per scan, which adds up to over an hour across an evening's scanning, just by pressing the scan button a little earlier. Little things add up!
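The saving is easy to model. This rough sketch uses assumed timings: the 6-second start-up delay described here, an assumed 4-second head return, 20 seconds of scanning and 10 seconds of page handling. In the serial workflow every step happens one after another; when handling overlaps the return and start-up delay, only the longer of the two counts. The function names are illustrative.

```python
def scan_cycle_seconds(scan_s: float, startup_s: float = 6.0,
                       return_s: float = 4.0, handling_s: float = 10.0,
                       overlap: bool = False) -> float:
    """Seconds per scan cycle. Default timings are assumptions, not measurements."""
    if overlap:
        # Turning and lining up the next page happens while the head returns
        # and during the start-up delay, so only the longer of the two counts.
        return scan_s + max(handling_s, startup_s + return_s)
    # Serial workflow: handle the page, press Scan, wait, scan, wait for return.
    return handling_s + startup_s + scan_s + return_s


def time_saved_seconds(num_scans: int, scan_s: float, **timings: float) -> float:
    """Total seconds saved over a session by overlapping handling with the delays."""
    serial = scan_cycle_seconds(scan_s, overlap=False, **timings)
    overlapped = scan_cycle_seconds(scan_s, overlap=True, **timings)
    return num_scans * (serial - overlapped)
```

With these numbers a cycle drops from 40 to 30 seconds. In general the saving per scan is min(handling, start-up + return), so at ten seconds per scan, 360 scans save exactly an hour.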
For a smallish, typical 60-page issue in good condition and easily flattened, where I'm scanning two A4 pages at once in A3 size (so 30 A3 scans), I can get it done in under 30 minutes. The same issue, scanned in A4 with print-through sheets added, I can also do in about 30 minutes.
Most magazines are in the 80-100 page range (40-50 A3 scans) and on average take around 60-90 minutes to scan, with the large Sound On Sound issues running up to 250 pages and requiring 250 scans (one A4 scan per page); those take well over 2 hours per issue.
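Planning a session is mostly a matter of counting scans. A small sketch, assuming two pages per scan for A4 spreads on the A3 scanner, one page per scan otherwise, and an estimated ~52MB per uncompressed 24-bit A3 scan at 300dpi (about 26MB for A4); these storage figures are arithmetic estimates, not measurements, and `plan_issue` is a hypothetical helper.

```python
import math


def plan_issue(pages: int, pages_per_scan: int = 2,
               mb_per_scan: float = 52.0) -> tuple[int, float]:
    """Return (scans needed, approximate storage in GB) for one issue.

    mb_per_scan defaults to an estimated ~52MB for an uncompressed
    24-bit A3 scan at 300dpi; pass ~26 for single A4 pages.
    """
    scans = math.ceil(pages / pages_per_scan)
    return scans, scans * mb_per_scan / 1000


for pages, per_scan, mb in [(60, 2, 52.0), (100, 2, 52.0), (250, 1, 26.0)]:
    scans, gb = plan_issue(pages, per_scan, mb)
    print(f"{pages} pages -> {scans} scans, ~{gb:.1f} GB")
```

The ceiling division means an odd page count still gets its final scan.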
Scanning can be quite daunting, particularly if you are looking at a large scan pile of issues (such as the two-foot-high pile of IM&RW issues in front of me as I write this!), but like any large task, you tackle it one step at a time. Having a well-practised procedure, so you can scan each page quickly, makes scanning a 100-page magazine doable, and something you can do while listening to music or watching TV.
To date I've personally scanned over 500 full magazine issues, and there are now over 40,000 pages in the Scan Library - and this is what we'll be looking at in the next part of this series.
Next part: From Print to Screen - Part 3: The Scan Library