Document Scanning Basics

Posted on November 18, 2015 by

Scanning & Basic Terminology

I am often asked to explain what goes into document scanning. It is much more involved than laying a piece of paper on the scanner and pressing a button. This is the first in a series of blog posts about the document scanning and conversion services that Pinehurst offers. Those services include OCR, output in various formats such as PDF or TIF images, the creation of XML or HTML, and indexing.

Types of Document Scanners Pinehurst Technologies Uses

  • Sheet-fed (high speed)  — Typically used for high-volume black & white scanning. These can also be used at higher resolutions and for color and grayscale.
  • Large Format  — Scanning of architectural/engineering plans or other items larger than 11 x 17.
  • Flatbed — Used for color and/or grayscale images that can later be integrated into black and white pages. Also used for slides and negative scanning.
  • Non-destructive — Used for publications that cannot be despined or are too fragile to process through a sheet-fed scanner.
nondestructive scanner

One-bit Monochrome (black and white)

One-bit monochrome images are generally used for scanning standard text documents that do not contain color or grayscale. The most common format is TIF.

Grayscale

Grayscale is used to scan images for output in black and white. It should not be confused with line art (B/W) mode: grayscale images are made up of many shades of gray, whereas one-bit monochrome images contain only black and white pixels. The most common format is JPEG.
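To make the difference between the two modes concrete, here is a minimal Python sketch that turns a few 8-bit grayscale pixel values (0 = black, 255 = white) into one-bit pixels. The function name `to_monochrome` and the cutoff of 128 are assumptions chosen for illustration, not settings Pinehurst uses; real scanning software usually picks the cutoff more carefully.

```python
def to_monochrome(gray_rows, threshold=128):
    """Map 8-bit grayscale values (0-255) to one-bit values.

    Returns 0 for black and 1 for white. The threshold of 128 is an
    illustrative assumption, not a recommended production setting.
    """
    return [
        [1 if value >= threshold else 0 for value in row]
        for row in gray_rows
    ]


# A tiny 2 x 4 "image" with several shades of gray.
gray = [
    [0, 64, 128, 255],
    [200, 100, 50, 130],
]

mono = to_monochrome(gray)
print(mono)  # [[0, 0, 1, 1], [1, 0, 0, 1]]
```

Every shade of gray in the input collapses to either black or white, which is why one-bit mode suits plain text but loses the detail in photographs.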

Projects such as journals or magazines can have color or grayscale images integrated into one-bit monochrome images. This is usually done if the monochrome scans are converted to PDF.

Pixels

A pixel is the basic unit of color in a computer image or on a computer display.

DPI

Dots Per Inch. DPI measures the resolution of an image both on screen and in print, which in turn determines the quality of a scan. The most common resolution that Pinehurst uses is 300 dpi. The advantages of 300 dpi are that the quality is generally good (depending on the original) and that it produces a smaller file than higher resolutions do. For some projects we scan monochrome images at 600 or even 1200 dpi, depending on client needs. If you plan on using OCR to make your scans searchable, use 300 dpi as a minimum.
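The link between dpi and file size can be worked out with simple arithmetic. The sketch below is a rough illustration only: it assumes a letter-size page (8.5 x 11 inches), uses a hypothetical helper named `scan_size`, and reports the raw, uncompressed pixel data. Actual TIF, JPEG, or PDF files are usually compressed and will be smaller.

```python
def scan_size(width_in, height_in, dpi, bits_per_pixel):
    """Return (width_px, height_px, raw_bytes) for a scanned page.

    raw_bytes is the uncompressed pixel data only; it ignores file
    headers, row padding, and compression.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    raw_bytes = (total_bits + 7) // 8  # round up to whole bytes
    return width_px, height_px, raw_bytes


# Assumed page size: US letter, 8.5 x 11 inches.
for dpi in (300, 600):
    for label, bits in (("one-bit", 1), ("grayscale", 8), ("color", 24)):
        w, h, size = scan_size(8.5, 11, dpi, bits)
        print(f"{dpi} dpi {label}: {w} x {h} px, {size:,} bytes raw")
```

At 300 dpi the page is 2550 x 3300 pixels, or about 1 MB of raw one-bit data. Doubling the resolution to 600 dpi quadruples the pixel count, so the raw data grows to about 4.2 MB before compression.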

Common Image File Formats

  • tif or tiff — Tagged Image File Format
  • gif — Graphics Interchange Format
  • jpg or jpeg — Joint Photographic Experts Group
  • pdf — Portable Document Format
  • png — Portable Network Graphics