Workflow in the DLC, Featuring Prime Recognition™ Software

Overview

Going from letters on the printed page to online searchable text involves the following steps:

  • Digital Scanning: Converting the original materials into image files
  • Quality Control (QC): Inspecting the images and creating metadata
  • Optical Character Recognition (OCR): Converting image files into text files
  • Markup: Applying machine-readable metadata code to our content

Image Creation and Quality Control

After digital scanning is complete and the images have passed quality control for image quality and skew, PrimeOCR™ performs image zoning when the target data is arranged in columns or tables.
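
To make the idea of image zoning concrete, here is a minimal sketch of one common approach: splitting a page into column zones with a vertical projection profile. This is not PrimeOCR's own zoning method, which the page does not describe. The function name find_column_zones, the min_gap parameter, and the representation of the page as rows of 0 (white) and 1 (black) pixels are all assumptions made for illustration.

```python
def find_column_zones(image, min_gap=2):
    """Return (start, end) pixel-column pairs for each text zone.

    `image` is a list of rows; each row is a list of 0 (white) / 1 (black).
    Two zones are split wherever at least `min_gap` consecutive pixel
    columns contain no black pixels. (Hypothetical helper, for illustration.)
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection profile: black-pixel count per pixel column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    zones = []
    start = None      # left edge of the zone currently being built
    last_ink = None   # most recent pixel column that contained ink
    for x, count in enumerate(profile):
        if count == 0:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The blank run since the last inked column is wide enough
            # to count as a gutter, so close the previous zone.
            zones.append((start, last_ink))
            start = x
        last_ink = x
    if start is not None:
        zones.append((start, last_ink))
    return zones


page = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
]
print(find_column_zones(page, min_gap=2))
```

A larger min_gap keeps narrow gaps between words or letters from being mistaken for column gutters; a real page would also need the skew corrected first, which is why skew is checked during quality control.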

Text Creation

Plain-text files are created from the TIFF image files by means of optical character recognition (OCR). The alternative to OCR is lots of typing.

Original Image File (TIFF): [sample image file]

Plain Text File (TXT):

    Shingles- Manufacturers of.
    DIXON NICHOLAS, First av c Miller (for ad.
    see index)
    Silver and Silver Plated Ware.
    AYRES C. L., Franklin c Jackson (for ad. see
    index)
    Skating Rinks- Roller.
    Jackson c Morgan Charles Parcell, prop.

  • About OCR: The branch of computer science that deals with extracting text from an electronic image file.
    • Preprocessing (deskew, despeckle)
    • Algorithms for character modeling
    • Lexical checking
  • Newsgroup: comp.ai.doc-analysis.ocr.
  • About & Running PrimeOCR™
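
The lexical-checking step in the list above can be illustrated with a small sketch: tokens that do not appear in a word list are flagged, and near matches are suggested. This uses Python's standard difflib module and is only one simple way to do it, not a description of how PrimeOCR checks words. The function name lexical_check, the cutoff value, and the sample lexicon are assumptions.

```python
import difflib


def lexical_check(tokens, lexicon, cutoff=0.8):
    """Map each token not found in `lexicon` to a list of close matches.

    Comparison is case-insensitive and ignores surrounding punctuation.
    (Hypothetical helper, for illustration only.)
    """
    known = {word.lower() for word in lexicon}
    candidates = sorted(known)
    suspects = {}
    for token in tokens:
        word = token.strip(".,;:()").lower()
        if not word or word in known:
            continue
        # get_close_matches ranks candidates by SequenceMatcher similarity
        # and drops any that score below `cutoff`.
        suspects[token] = difflib.get_close_matches(
            word, candidates, n=3, cutoff=cutoff
        )
    return suspects


lexicon = ["silver", "plated", "ware", "and"]
print(lexical_check(["Silver", "and", "Si1ver", "Plated"], lexicon))
```

A directory page like the sample above is full of surnames and abbreviations ("av", "c", "prop."), so a general dictionary would flag many correct tokens; a lexicon built from the collection itself would likely work better.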

Markup

Applying markup to the textual product of OCR comprises three topics, in order of application:

Text Quality Control

Prime Recognition™'s output is more than 99% accurate, which reduces the time we need to spend on quality control. Still, we currently proofread the tables of contents in the SGML file.
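
The page does not say how the accuracy figure is measured. One common convention, assumed here, is character accuracy: one minus the edit distance between the OCR output and a proofread reference, divided by the length of the reference. The sketch below shows that calculation; the function names are hypothetical. Under this measure, even 99% accuracy on a page of 2,000 characters (an illustrative page size) leaves about 20 character errors, which is why some proofreading remains worthwhile.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ocr_text, reference):
    """Fraction of the reference text that the OCR output got right."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(ocr_text, reference)
    return max(0.0, 1.0 - errors / len(reference))


print(character_accuracy("Si1ver and Silver Plated Ware.",
                         "Silver and Silver Plated Ware."))
```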