Digital Library Center
Smathers Libraries
University of Florida
P.O. Box 117003
Gainesville, FL 32611 USA
P: 352.273.2900
F: 352.846.3702
DLC@uflib.ufl.edu
Digital Library Center: Documentation & Technologies: Digitization Activities and Average Times
Below is a list of the component activities in digitization offered by the DLC with estimates of average times per component. All digitization complies with national standards. See the average file sizes and project planning pages for more resources for planning projects.
Time Requirements by Workflow Component
| Digitization Workflow Category | Type of Process for the Workflow Category | Processing Required | Average Time Requirements |
|---|---|---|---|
| Metadata | Catalog record available | DLC evaluates existing record, ingests, and massages records as needed. | Average time: 1 - 5 minutes per item |
| | Spreadsheet available and accurate | DLC reviews, enhances, imports, and verifies. | Average time: 40 minutes - 2 hours per spreadsheet; the average spreadsheet has 200 items. Note: this covers only the import process. The DLC trains others on what information is needed and assists in creating the spreadsheet until the creator is comfortable doing so alone. |
| | Spreadsheet available, but incomplete or inaccurate | Example: a Word file with a table with a single line listing titles, authors, and dates without any consistent separation (no columns, tabs, or commas that can be used to create tabular data). DLC finds a way to separate the rows into tabular data if possible, or copies and pastes all information into a spreadsheet in the correct format. Then, DLC sends the spreadsheet to the selector with any recommendations for added fields and asks for feedback. | Average time: 1 minute per item to create the spreadsheet item. Additional time required: 40 minutes - 2 hours for the completed spreadsheet. |
| | No catalog record, spreadsheet, inventory, finding aid, etc., and materials can be determined. | Example: a box of only books with no other information. DLC reviews materials, sorting and creating metadata as possible. DLC offers training for future spreadsheet and metadata creation. OR: for items needing actual catalog records in a traditional format, DLC sets a meeting with Cataloging and together they establish a workflow to have the items cataloged in Cataloging and then returned to the DLC for digitization. | Average time: 10 minutes per item. Additional time required: 40 minutes - 2 hours for the completed spreadsheet. OR: average time is one or more 1-hour meetings plus Cataloging time to catalog materials. |
| | No catalog record, spreadsheet, inventory, finding aid, etc., and materials cannot be determined. | DLC reviews materials, sorting and creating metadata as possible. After sorting and review, DLC staff create a brief spreadsheet. If a Collection Manager is available, DLC staff send the spreadsheet and ask the Collection Manager for feedback. If no Collection Manager is available, DLC staff attempt to work using the newly created spreadsheet. | Average time: varies and can only be determined on a case-by-case basis |
| Copyright | Permissions cleared | Permissions status clearly documented and provided when physical materials received. | Average time: 0 - 1 minute to check documentation in files and update if needed. |
| | Officially published in US pre-1923, clear public domain | Information is available in a published document. No requirements to consult documentation on length of copyright by year or country; no requirements to consult book copyright renewal database. | Average time: 1 minute to read the information and verify status as cleared. |
| | Archival, permissions status communicated after inquiry | | Average time: 1 - 3 minutes to call or email to check and update documentation. |
| | Permissions not cleared, but permissions status and the need for DARK archiving clearly documented and provided when physical materials received; or permissions status easy to ascertain | Dark Archiving, if identified as such, requires no additional research. | Average time: 0 - 1 minute to check documentation in files and update if needed. |
| | Permissions not cleared, but wanted and permissions status clearly documented and provided when physical materials received; or permissions status easy to ascertain | Requesting permissions | Average time: 20 minutes. Average process includes checking all pertinent copyright rules; searching for the copyright holder; sending the permissions request to the copyright holder; updating documentation in files that the permissions request was sent and noting the information found on the copyright holder; and, when applicable, scheduling a follow-up inquiry. Note: some materials are significant enough for the allocation of additional resources for pursuing permissions. Those are handled on a case-by-case basis and normally require at least 2 hours. At least 30 minutes of this time is normally in meetings with collection managers, where the necessary background is communicated on how to possibly locate the rights holder and why the particular materials are significant. |
| | Unclear copyright status, holder, etc. | Copyright research, and requesting permissions. | Average time: 10 - 20 minutes for copyright research. Copyright research consists of searching for information on the materials and copyright holder. If information can't be located quickly, the item is deferred unless it warrants additional resources. Additional average time required: 20 minutes to request permissions, only if the copyright holder is located. |
| Material Preparation | Disbinding a book | Also includes any clean-up of physical materials, placing in labeled folders and boxes, and placing those on appropriate book trucks or shelves to be reviewed for appropriate imaging technology | Average time for disbinding a book: 8 minutes per book |
| | Cutting newspaper pages (normal newspaper size*) | Includes placing in labeled boxes and/or placing those on appropriate book trucks or shelves to be queued for imaging. *Some newspapers (e.g., Iguana, Justice) are 8 1/2 x 11 and are cut using a paper cutter, and then go through the high speed scanner. | Average time: 20 minutes per inch of newspaper. One month of newspapers from August 2008, with no born digital titles, is 16 inches. One month of newspapers from October 2009, with 37 newspapers born digital (total of 72 newspaper titles in the Florida newspaper queue), would be under 1/2 of this, or under 8 inches. |
| | Preparing archival files | Sorting, separating, unfolding, flattening, removing staples, paperclips, debris, etc. | Average time: varies and can only be determined on a case-by-case basis |
| | Collating, de-duping | Breakdown: Separating out / checking title: 5 sec/title. Collating for input into tracking: 2 secs for monthly, 30 secs for daily. Inputting into tracking (calling up tracking, inputting, printing tracking sheet, placing on shelf, recording in xls for physical tracking): 40 secs for monthly and 3:10 for daily. | Average time for new and non-organized or non-inventoried collections: varies and can only be determined on a case-by-case basis. Average time for collating newspapers: 47 seconds for one month of a monthly newspaper; 3:45 for one month of a daily newspaper. Average time for de-duping: varies, but close to collation time after initial physical material ingest, inventory, and review; duplicates add a further time component if they cannot be discarded or returned and must be arranged and kept for an unknown length of time. |
| Imaging: Physical Materials | Books | Disbound, and can go through the high speed scanner | Average time: 10 - 15 minutes for 300 normal pages (300 dpi grayscale; time increases if many color pages). Average time, brittle: 45 - 60 minutes for 300 pages. *Time varies if the scanner has to be cleaned. Brittle pages must be scanned at a slower rate to help prevent rips and jamming. |
| | Books | Bound, average book | Average time, scanned on a copibook: 90 - 110 minutes for 300 normal pages (no foldouts, tip-ins or oversized pages). Average time, if oversized: use times listed for maps and oversized items below. Average time with processing: see the post-processing for images section for books for a more accurate assessment of the time for scanning and image processing for a single item. Processing time required is directly related to the imaging technology, so it will vary based on the scanning equipment used. |
| | Maps and Oversize Items | One full capture using the large format camera, not multiple captures and splicing (as is required for many oversize materials) | Average time: 15 minutes for a single capture and processing. Average time for multiple captures and splicing: 30 minutes for two captures (includes processing and splicing), plus 10 minutes for each additional capture (e.g., 3 captures = 40 minutes; 4 captures = 50 minutes). |
| | Photos, Loose | Photos, loose and not oversized, are scanned on the flatbed scanners at 600 dpi | Average time: 1 - 3 minutes to scan per photo |
| | Photos, Mounted (scrapbook, etc) | Photos, Mounted (scrapbook, etc) | Average time: 45 - 60 min. for 75 pgs |
| | Photos, Aerials | Average time estimate for scanning and image quality control is based on three successful Florida aerials grants. | Average time: 9.6 photos/pages per hour |
| | Slides, 35mm | Color slides are scanned at 4000 dpi with the bulk loader, at 24 per hour. Time increases for older, non-plastic-mounted slides because they tend to jam the slide scanner. | Average time: color slides at 4000 dpi, 24 per 60 min. to scan |
| | 4x5 color transparencies | 4x5 color transparencies at 600 dpi | Average time: 3 min. per transparency to scan only |
| | Slides, Glass | Scanning only: 4x5 at 900 dpi, 3.5 min. | Average time: 3 - 3.5 minutes each |
| | Archival materials | Average times for archival materials vary widely because of special handling needs and average length. If all of the pages are for the same item and can be handled the same way, the overall time is reduced, and the overhead from switching to a new item and labeling it is reduced as well. | Average time, scanned on a copibook: 90 - 110 minutes for 300 normal pages (no foldouts, tip-ins or oversized pages; no need for backing; all pages are for the same item) |
| | Newspapers: Current | | Average time per page in color: 30 sec. Average time per page in black and white: 15 sec. |
| | Newspapers: Bound | Additional time depends on: the gutter; whether the paper can be captured 1 up or 2 up; turning odd and even pages; whether a glass plate is required to flatten the pages | Average time per page: at least 3x more than for unbound newspapers |
| | Newspapers: Brittle (requiring large format camera) | | Average time: at least 3x more than for normal unbound newspapers; can be even higher |
| | Object, Flat | Using DSLR camera | Average time: set up time can be several hours for a single shot; set up is the largest time component |
| | Object, Rotation | Using DSLR camera connected to turntable in DLC. Additional time is required for equipment packing, traveling to location, setup, and repacking and returning. | Average time: set up time can be several hours for a single item for 126 images; set up is the largest time component |
| Digital Reformatting / Digital Conversion from Analog | Audio: Record; Cassette tape; Reel to Reel tape. Video: VHS | Record | Average time: actual digitization time is equivalent to the length of the audio or video file. Thus, 1 hour of audio takes 1 hour to digitize. Set up time is in addition to this; however, the estimate includes set up time within the actual time required because of variances in the degree of supervision needed for the digitization process. Digitization may or may not need direct supervision at all times. Whether it needs to be supervised affects how much other work can be done simultaneously. Other work is most often image post-processing. |
| Digital File Ingest | Imaging Ingest: Legacy DLC files | Files on CDs, DVDs, portable hard drives, SAN | Average time: varies dramatically |
| | Imaging Ingest: Born-Digital IR Materials | Variables include server space available, number of items, size of each item, format of each item (PDF, HTML, AVI, AVI streaming which needs to be ripped or which requires contacting AT for copies) | Average time for 1 volume, new item: 4 min. Additional volume for serial item already in tracking: 1 - 4 minutes. Average time for new groups of materials: varies based on number and type of items. |
| | Imaging Ingest: Born-Digital Newspapers | 1 - 3 minutes per issue covers time to check spreadsheet, add item to tracking with brief data, match new BIBVID to vendor naming structure, and bulk rename, checking data while doing so. | Average time: 1 - 3 minutes per issue, if the files are from a hard drive, are not tarred or zipped, do not have errors, and have some human readable title and date identification (in the file name itself, or in a spreadsheet or xml file). Average time if on CD/DVD: 5 - 15 minutes per item.* *Includes time to copy files from CD/DVD to a hard drive. Also includes time to recheck the copy process because CD/DVDs have a much higher likelihood of errors. |
| | Imaging Ingest: Vendor Files | Variables are: file identification and usable structure for batch renaming; files on drives or decaying disks; files tarred and zipped, which have a greater frequency of integrity errors | Average times vary widely. Example: ingesting the 94 issues (for v. 1-18) burned to disk for FLMNH bulletins required over 12 staff hours. Time was required to work with the disks (two had cyclic redundancy errors), normalize the file quality, QC the files and notice that pages were missing, locate the missing pages or rescan, reprocess, and then OCR and load. |
| Post-processing for digital files | Splitting pages | Required for bulk digitization from microfilm scanned with 2 pages per image. | Average time: no files needing page splitting currently; prior time estimates not available. |
| | Splitting separate items | For digitized microfilm, partner files, and retro ingests that were never processed into items. Average reel has 5 - 10 items. Time required depends on the quality of the film and the accuracy/inclusion of description images (i.e., targets that say the item title and reel position). | Average time: 15 - 30 minutes to split a reel of digitized microfilm into items |
| | Scan, crop, deskew, levels for Baldwin Books | Scanning and initial image processing (deskew, crop). Kodak DCS 24n megapixel DSLR camera: 3 min/page x 200 pg = 600 min/60 = 10 hrs/volume. Copibook scanner: .60 min/page x 200 pg = 3 1/3 hr/volume. Flatbed scanners: 3 min/page x 200 pg = 600 min/60 = 10 hrs/volume. | Average time if DSLR camera: 3 min/page. Average time if scanned with Copibook scanner: .60 min/page. Average time if scanned on a flatbed scanner: 3 min/page. *Please note: in most cases, the times for scanning and image processing are inseparable because the imaging technology used does alter the amount of image processing (deskewing, cropping) required. |
| | Crop, deskew, levels, color correction | Disbound volumes | Average time for disbound volumes: 60 - 90 min for 200 pgs |
| | Batching and Copyright blur | Time depends on the amount of material in copyright. Okeechobee News normally requires 1 minute; Miami Times normally requires 30 minutes | Average time: 1 - 30 minutes |
| Quality Control Review and Structural Metadata Creation | Brief items | Short research items (under ~40 pages), where a table of contents is very unlikely to be used and wouldn't prove of much benefit, only have pagination and quality review during QC; no table of contents style metadata is added | Average time: 1 - 3 minutes per item (item is normally under 40 pages) if no errors |
| | IR | | Average time: 1 - 3 minutes per item (item is normally under 40 pages) if no errors |
| | Newspapers | Sections (A, B, C) and page numbers added, final quality review of item | Average time: 1 - 3 minutes per item (item is normally under 40 pages) if no errors |
| | Books, Complex | Average time estimate for QC alone is based on the Baldwin Phase III grant time requirements. | Average time: 40 - 60 min/volume (average of 40 for volumes with no errors and 60 for volumes with errors) |
| | Photos, Aerials | Average time estimate for scanning and image quality control is based on two successfully completed grants for Florida aerials. | Average time for scanning and image quality control: 9.6 photos/pages per hour |
| OCR; Loading; Archiving to FDA and Internally | OCR | | Average time: OCR runs constantly against available materials. Average labor time is 15 - 20 minutes per day for all materials to be processed that day. Time is to check process, refine any jobs as needed, and correct any errors. |
| | Archiving to FDA | FTP and loading drives and mailing (forms, error correction, ingest of reports) | Average time: 3 hours to set up external hard drive for file transfer and start file transfer (10 minutes), transfer files (varies based on size of all files being transferred; done on a separate machine and does not interfere with other work), and then drive to drop off the drives and drive back to work. Average drop off has been 10 hard drives in one trip. Goal is to have FDA catch up on backlog and be able to FTP daily work and have that process easily without the need to use external hard drives. |
| | Archiving internally | Required components of burning DVDs: 1. Labels: printed in batches of 100, 5 minutes to renumber and print: .33 seconds printing time for each label. 2. Labeling each DVD: 10 sec. 3. Burning: 7 - 8 minutes per DVD (4.4GB). 4. File sort: 20 seconds per DVD. 5. Filing DVD: 10 seconds per DVD. 6. Transferring files: moving files from the SAN to a local drive to burn locally and not across the network. Done overnight to reduce time delay; otherwise can take 1 - 3 hours depending on drive availability and system time. | Average time required with burning DVDs: 9 minutes for each DVD (4.4GB)*. Average time expected with Tivoli automation: 0; time would be replaced with 100% load verification. |
| | Load and metadata verification | | Average time: 1 hour per day for brief validation using only file names and the m=han page; spot checking under 10% of load items |
| Post-processing disposition & ongoing changes and corrections | Returning physical materials, updating holdings records to discharge or withdraw item from DLC | Example: all IFAS documents must be completed before they can be returned, and they must be properly ordered for all issues. This means that the DLC must store all completed items, keep them in order, and must only file newly completed items in the correct order. Once all are done, only then will the holding records be updated in one large batch. | Average time: varies on requirements for returning items. |
| | Material reclamation | Pulling folders, relabeling boxes | Average time: varies. |
| | Metadata updates, Manual | Involves updating the metadata of one or more items. Single items are done manually and large projects (including serial hierarchy changes) employ a combination of automated and manual methods. | Average time: 10 minutes/title (manual assignment) |
| | Serial Hierarchy | Prior work required manual updates for each item (10 minutes x 100). With the new tool, DLC staff can update serial hierarchy for batches of items. The tool is being refined for optimal performance. | Average time: 10 minutes for 100 items |
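
Several figures in the table are sums or simple formulas over smaller steps. As a minimal sketch (the function and constant names here are hypothetical and are not DLC tools), the following Python reproduces two of them from the table's own numbers: the newspaper collation totals and the multi-capture time for maps and oversize items.

```python
# Illustrative only: names are hypothetical; the numbers come from the table above.

# Collation steps, in seconds, for one month of a single newspaper title.
COLLATION_STEPS = {
    "monthly": {"check_title": 5, "collate": 2, "tracking_input": 40},
    "daily": {"check_title": 5, "collate": 30, "tracking_input": 3 * 60 + 10},
}


def collation_seconds(frequency: str) -> int:
    """Sum the per-step collation times for one month of a title."""
    return sum(COLLATION_STEPS[frequency].values())


def map_capture_minutes(captures: int) -> int:
    """Imaging time for a map or oversize item, including processing.

    One capture takes 15 minutes; two captures with splicing take 30;
    each capture beyond the second adds 10 minutes.
    """
    if captures < 1:
        raise ValueError("at least one capture is required")
    if captures == 1:
        return 15
    return 30 + 10 * (captures - 2)


def as_min_sec(seconds: int) -> str:
    """Format a number of seconds as m:ss."""
    return f"{seconds // 60}:{seconds % 60:02d}"


if __name__ == "__main__":
    print(as_min_sec(collation_seconds("monthly")))  # 0:47
    print(as_min_sec(collation_seconds("daily")))    # 3:45
    print(map_capture_minutes(3), map_capture_minutes(4))  # 40 50
```

The step names in `COLLATION_STEPS` are shorthand for the three breakdown items in the collating row; the totals match the 47 seconds and 3:45 stated there.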
As Abby Smith notes in the CLIR report on "Strategies for Building Digitized Collections":
Reliable and meaningful cost data about digitization are rare and not often useful in comparative contexts. Costing out the elements of digitizing means beginning with selection and going to physical preparation, cataloging, physical capture, creation of metadata, mounting and managing files, designing and maintaining the site, providing additional user services, and going through to implementing a long-term preservation strategy. Virtually every step in digitization involves human intervention and skill, and these costs, unlike those of storage, for example, are unlikely to go down. (Section 4; 2001)
Example Projects:
| Bound Books: Baldwin Library of Historical Children's Literature (NEH Grant, Phase III) | |
| Overview | Details |
| Catalog records created by Cataloging. Copyright status already known to be public domain. Physical material prep. and post-proc. by Preservation. DLC digitization total average time for a 200 page book: | DLC handled digitization (imaging, image processing, QC with structural metadata, OCR, loading, and archiving) for 2,500 books over 2 years, or 1,250 books per year. For each of the two grant years, dedicated staff time for cost share in the DLC: 2.15 FTE. Total of 2,500 volumes, or 500,000 pages over a two year period. Scanning & initial image processing (deskew, crop). Pre-processing, QC and preliminary XML creation (derive jpgs from master tiff images, create table of content images to use in XML creation, check for missing and/or unacceptable images, assign page numbers, division names, and chapter titles). From numbers recorded in the previous two phases, approximately ¾ of the volumes imaged have no errors necessitating rescanning; ¼ of the volumes have errors. Mark-up (metadata review and revision; text review): 10 min/volume. The full grant proposal is online here. |
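
The Baldwin totals can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, the 40 and 60 minute QC figures are taken from the "Books, Complex" row of the workflow table, and combining them with the ¾ / ¼ error split into an expected QC time, as well as the volumes-per-FTE figure, are derived values rather than numbers reported by the DLC.

```python
from fractions import Fraction

# Figures stated for the Baldwin Phase III project.
VOLUMES = 2_500
PAGES_PER_VOLUME = 200
GRANT_YEARS = 2
FTE_PER_YEAR = Fraction(215, 100)  # 2.15 FTE of DLC cost share per grant year

# QC times from the "Books, Complex" row, in minutes per volume.
QC_MIN_NO_ERRORS = 40
QC_MIN_WITH_ERRORS = 60
SHARE_WITH_ERRORS = Fraction(1, 4)  # about a quarter of volumes need rescans

total_pages = VOLUMES * PAGES_PER_VOLUME
volumes_per_year = VOLUMES // GRANT_YEARS

# Assumption: weight the two QC times by the share of volumes in each group.
expected_qc_minutes = (
    (1 - SHARE_WITH_ERRORS) * QC_MIN_NO_ERRORS
    + SHARE_WITH_ERRORS * QC_MIN_WITH_ERRORS
)

# Derived figure (not in the source): volumes handled per FTE per year.
volumes_per_fte_year = volumes_per_year / FTE_PER_YEAR

if __name__ == "__main__":
    print(total_pages)                            # 500000
    print(volumes_per_year)                       # 1250
    print(expected_qc_minutes)                    # 45
    print(round(float(volumes_per_fte_year), 1))  # 581.4
```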
| Aerial Photographs: Florida Aerial Grant (LSTA Grant: Phase III) | |
| Overview | Details |
| Metadata and material prep and post proc.: Map Library. DLC digitization: 1,390 hours for 13,418 images. Plus: DLC cost share of .23 for one year for ingest of another 7,473 already digital images, and training and supervising students | Digitize 13,418 historical aerial photographs and 120 paper indexes. OPS scanning: 1,125 hours. OPS metadata/quality control student: 225 hours. DLC cost share of .23: for ingest of other 7,473 images, system upgrades, and training and supervising students |
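
The aerial photograph totals are consistent with the 9.6 photos/pages per hour quoted in the workflow table. A minimal sketch of that check follows; the names are hypothetical. The itemized OPS hours sum to 1,350, so the rate is computed from the 1,390-hour total given in the overview.

```python
# Figures stated for the Florida Aerial Grant (Phase III).
IMAGES_SCANNED = 13_418
DLC_DIGITIZATION_HOURS = 1_390
OPS_SCANNING_HOURS = 1_125
OPS_QC_HOURS = 225


def images_per_hour(images: int, hours: float) -> float:
    """Average throughput for scanning plus image quality control."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return images / hours


rate = images_per_hour(IMAGES_SCANNED, DLC_DIGITIZATION_HOURS)
itemized_ops_hours = OPS_SCANNING_HOURS + OPS_QC_HOURS

if __name__ == "__main__":
    print(f"{rate:.2f} images per hour")  # 9.65 images per hour
    print(itemized_ops_hours)             # 1350
```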
| Large Format Architectural Drawings/Photographs: Flagler Architectural Drawings (NPS Grant proposed) | |
| Overview | Details |
| 267 architectural drawings/blueprints. OPS time: 654 hours. Average pages per hour, without factoring in cost share time: Plus: DLC cost share, years 1 and 2 | 267 architecture drawings, blueprints and related material. OPS time: 654 hours. DLC cost share, year one: .10 |
| Archival and Mixed Materials: Historic Everglades (NHPRC Grant) | |
| Overview | Details |
| Spreadsheets for metadata by Special Collections. Material prep. and post-proc. by Preservation. Digitization by DLC: Plus: DLC cost share, .30 FTE for each of the three years | DLC cost share: .30 FTE for each of the three years. Based on experience with test sets, we're building in a 10% reshoot rate for pages, 15% reshoot for letterbooks, and 15% for photos. Adjusted estimates are: |
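
One way to apply the reshoot rates above is sketched here. This is an assumption-laden illustration, not the project's actual calculation: it treats each rate as extra captures added on top of a base count, rounds partial reshoots up, and uses made-up base counts in the usage example. The function and dictionary names are hypothetical.

```python
# Reshoot rates stated for the Historic Everglades project, in percent.
RESHOOT_PERCENT = {"pages": 10, "letterbooks": 15, "photos": 15}


def adjusted_captures(base_count: int, material: str) -> int:
    """Return the base count plus the expected number of reshoots.

    Assumption: reshoots are extra captures equal to the stated
    percentage of the base count, rounded up to a whole capture.
    """
    if base_count < 0:
        raise ValueError("base_count must not be negative")
    pct = RESHOOT_PERCENT[material]
    reshoots = -(-base_count * pct // 100)  # integer ceiling division
    return base_count + reshoots


if __name__ == "__main__":
    # Hypothetical base counts, for illustration only.
    print(adjusted_captures(1000, "pages"))       # 1100
    print(adjusted_captures(333, "letterbooks"))  # 383
    print(adjusted_captures(200, "photos"))       # 230
```

Integer arithmetic is used for the percentage so that round counts such as 1,000 pages at 10% do not pick up an extra capture from floating-point error.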
