From Print To Screen
Part 5 - Outputting the Scans to Use
by Ben | 12th November 2019
Following on from last time, the scans from our issue are ready for use, so we need to export them from the Scan Library in Lightroom to the website, and we also need to export them to use in OCR processing.
I need to export three sets of files. The small thumbnails used in the page scan drawer and the large page scans for the full view need to be named correctly and placed in the correct folder on the website (my local development version of the site, which runs on my computer - everything that I do on the site is done on my computer first, and then synced up to the live server when ready).
I also need to export full-size versions of the pages with editorial content for OCRing - i.e. I don't want full-page ads, or any content I don't intend to OCR.
Again, we don't want to have to do any repetitive procedure manually, so I have a workflow triggered by Keyboard Maestro that does all the necessary steps for me, and I don't have to think about it. (See the "Keyboard Maestro" panel below for more info.)
With the issue selected in Lightroom, I hit my keyboard shortcut to run my export macro. This does the following:
The large versions go to:
*website*/images_mag/scans/mt/mt_94_02/l/mt_94_02-1.jpg etc
The small versions go to:
*website*/images_mag/scans/mt/mt_94_02/n/mt_94_02-1.jpg etc
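As a rough illustration of that naming scheme, here is a minimal Python sketch that builds both destination paths for a page. It is not the export macro itself (that lives in Lightroom and Keyboard Maestro); the function name `scan_paths` and its parameters are made up for this example, and it assumes the file name repeats the issue code followed by the page number, as in the small-version path above.

```python
from pathlib import PurePosixPath


def scan_paths(site_root, mag, issue, page):
    """Return (large, small) destination paths for one page scan.

    Hypothetical helper: assumes the layout
    <site_root>/images_mag/scans/<mag>/<issue>/<l|n>/<issue>-<page>.jpg
    """
    base = PurePosixPath(site_root) / "images_mag" / "scans" / mag / issue
    name = f"{issue}-{page}.jpg"
    # "l" holds the large page scans, "n" the small thumbnails.
    return base / "l" / name, base / "n" / name


large, small = scan_paths("website", "mt", "mt_94_02", 1)
print(large)
print(small)
```

Keeping the path logic in one place like this means the large and small versions cannot drift apart in their naming.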
While those are exporting (they typically take a few minutes to render and resize a few hundred jpegs from the source scans), I'll go back to the issue in Lightroom, quickly step through the pages with the cursor keys, and "pick" (press the 'P' key) the cover page and any pages I will need for the OCR processing (contents, news, competitions, articles and so on).
There may be some pages, like news, letters and so on, that I know I won't be OCRing now but might later go back and add to the site (for completeness). I include those in the OCR document now, so this will be easier when/if I come to tackle them: I won't have to go searching through each issue in the library, do another set of exports, merge those pages into the OCR document and other tedious tasks. I'd just open the OCR file and read/export the required pages.
When all the editorial pages are Picked (this only takes 30 seconds or so to run through), I do a "filter by Pick flag" to show just the Picked pages, select all of those (or just use "Select flagged"), and hit another key to run the export for OCR processing. This exports high-quality full-size jpegs to my processing folder, and takes about 2 minutes for a typical issue (say, 60 editorial pages out of a 100-page magazine).
These go to:
*muzines*/processing/mt/mt_94_02_feb/scans/01.jpg etc
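The path above only shows `01.jpg`, so the exact numbering rule is not spelled out here. As a sketch under the assumption that the picked pages are simply renumbered in order with two-digit zero padding, the mapping could look like this in Python (the function `ocr_export_names` is hypothetical, not part of Lightroom or the site):

```python
def ocr_export_names(picked_pages, width=2):
    """Map original page numbers to sequential, zero-padded file names.

    Assumption: picked pages are renumbered 01, 02, ... in page order.
    """
    ordered = sorted(picked_pages)
    return {
        page: f"{index:0{width}d}.jpg"
        for index, page in enumerate(ordered, start=1)
    }


# Cover (1), contents (3), news (4) and an article page (10) were picked.
print(ocr_export_names([1, 3, 4, 10]))
```

Zero padding keeps the files in page order when a folder listing sorts them by name.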
Exports done, I now need to tell the website CMS that the scans for this issue are now available. So I edit that issue, and tick the "Scans" checkbox.
This will scan the filesystem to check that the expected files are in the correct place and named accordingly, and update some housekeeping things in the database, like how many pages this issue has, and so on. Once that's done, the scan pages will be available to be displayed on the (development) site.
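To give a feel for what such a check might involve, here is a minimal Python sketch. It is not the site's actual CMS code; the function `check_issue_scans`, its return fields, and the rule that pages are numbered from 1 with no gaps are all assumptions made for illustration, based on the folder layout shown earlier.

```python
import re
from pathlib import Path


def check_issue_scans(scan_root, issue):
    """Check the large ("l") and small ("n") scan folders for one issue.

    Hypothetical sketch: expects files named <issue>-<page>.jpg and
    assumes pages run from 1 to the highest page number found.
    """
    issue_dir = Path(scan_root) / issue
    pattern = re.compile(rf"^{re.escape(issue)}-(\d+)\.jpg$")

    def page_numbers(subdir):
        folder = issue_dir / subdir
        found = set()
        if folder.is_dir():
            for entry in folder.iterdir():
                match = pattern.match(entry.name)
                if match:
                    found.add(int(match.group(1)))
        return found

    large = page_numbers("l")
    small = page_numbers("n")
    page_count = max(large | small, default=0)
    expected = set(range(1, page_count + 1))
    return {
        "page_count": page_count,  # would be stored against the issue
        "missing_large": sorted(expected - large),
        "missing_small": sorted(expected - small),
        "ok": page_count > 0 and large == expected and small == expected,
    }
```

A call such as `check_issue_scans("images_mag/scans/mt", "mt_94_02")` would report the page count and any page that is present in one folder but missing from the other.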
So all in all it's a pretty straightforward process - to export an issue, I basically select it in LR, hit a key and the files will be rendered out for the site, then pick the editorial pages and export those for processing. The export process takes about five minutes in all, and is by far the easiest of all the steps!
Ok - what about those pages we exported for the OCR process? We'll be looking at those in the next part, when we start our look at the whole OCR process - this will take a few blog entries to cover...
Next part: From Print to Screen - Part 6 - OCR Part 1a - Contents & Metadata