The Scanner - PC's Best Friend
The scanner can convert anything you have on paper—or, for that matter, anything reasonably flat—into computer-compatible electronic form. Dot-by-dot, a scanner can reproduce photos, line drawings, and even collages in detail sharper than your laser printer can duplicate. Better yet, equip your PC with optical character recognition software and the images your scanner captures of typed or printed text can be converted into ASCII files for your word processor, database, or publishing system.
The essence of any scanner is elementary. The scanner detects differences in the brightness of reflections off an image or object using an array of light sensors. In most cases, the scanner has a linear array of these sensors, typically charge-coupled devices or CCDs, squeezed together hundreds per inch in a narrow strip that stretches across the full width of the largest image that can be scanned. The width of each scanning element determines the finest resolution the scanner can detect within a single line. The narrower each scanning element and the closer they are all packed together, the higher the resolution and the finer the detail that can be captured.
This line-up of sensors registers a single, thin line of the image at a time. Circuitry inside the scanner reads each sensing element one by one in order and creates a string of serial data representing the brightness of each point in each individual scan line. Once the scanner has collected and arranged the data from each dot on the line, it advances the sensing element to read the next line.
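The line-by-line readout described above can be sketched in a few lines of Python. This is a toy model rather than any scanner's actual firmware: the names `read_scan_line` and `scan_page`, the 0-to-255 brightness scale, and the tiny list-of-lists "page" are all assumptions made for illustration.

```python
from typing import List

# Hypothetical page: rows of brightness values, 0 (black) to 255 (white).
Page = List[List[int]]


def read_scan_line(page: Page, line_index: int) -> List[int]:
    """Read each sensing element one by one, in order, for a single line."""
    samples: List[int] = []
    for element in range(len(page[line_index])):
        samples.append(page[line_index][element])
    return samples


def scan_page(page: Page) -> List[int]:
    """Collect one line, advance to the next, and build one serial stream."""
    stream: List[int] = []
    for line_index in range(len(page)):
        stream.extend(read_scan_line(page, line_index))
    return stream


page = [
    [255, 0, 255],
    [0, 0, 0],
]
print(scan_page(page))  # [255, 0, 255, 0, 0, 0]
```

The output is a single flat stream of brightness values, one line after another, which is roughly what the text means by a string of serial data.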
Types
The fundamental design difference between scanners is how the scanning sensor moves in relation to the image that’s being scanned. Somehow the long line of sensing elements must shift their attention with extreme precision over the entire surface of the image to be captured. Nearly all scanners require a mechanical sweep of the sensors across the image, although a few low-resolution scanners use video technology instead. To make a sweep in a mechanical scanner, two primary strategies have emerged. One requires the image sensor to move across a fixed original; the other moves the original in front of a fixed sensor. With a video scanner, nothing moves except an electron beam.
Drum Scanners
Drum scanners exemplify the latter technology. They work like printing presses in reverse. You feed a piece of paper that bears the image you want to capture into the scanner, and the paper wraps around a rotating drum that spins the image past a sensor string that’s fixed in place inside the machine.
Flatbed Scanners
A flatbed scanner uses an automatic mechanism to move the sensor. It earns its name from the flat glass surface upon which you must place the item to be scanned, face down. The scanning sensors are mounted on a bar that moves under the glass, automatically sweeping across the image.
Flatbed and drum scanners are designed with precision mechanisms that step the sensors or image a small increment at a time, each increment representing a single scan line. The movement of the mechanism, which is carefully controlled by the electronics of the scanner, determines the width of each line (and thus the resolution of the scanner in that direction).
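The relationship between step size and resolution along the direction of travel is simple arithmetic, sketched below. The 300-lines-per-inch figure and the 11-inch page length are example values chosen for illustration, not the specifications of any particular scanner, and the function names are hypothetical.

```python
def step_size_inches(lines_per_inch: int) -> float:
    """Distance the mechanism advances between two scan lines."""
    return 1.0 / lines_per_inch


def steps_for_page(page_length_inches: float, lines_per_inch: int) -> int:
    """Number of scan lines (mechanical steps) needed to cover the page."""
    return round(page_length_inches * lines_per_inch)


print(steps_for_page(11.0, 300))  # 3300
print(step_size_inches(300))
```

Halving the step size doubles the number of lines the mechanism must capture for the same page, which is why finer resolution in this direction depends on a more precise mechanism.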
Hand Scanners
Hand scanners must cope with the vagaries of the sweep of your all-too-human hand. If you move your hand at a speed other than that at which the scanner expects, lines will be scanned as too wide or too narrow, resulting in image distortion. To avoid such disasters, the hand scanner uses a feedback mechanism that tracks the position of the image. Most have a roller that presses down against the image you are scanning to sense how fast you drag the scanner along. The rate at which the roller spins gives the scanner’s electronics the feedback it needs about scanning speed. From this information, the software that controls the hand scanner can give each scanned dot its proper place.
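One plausible way for software to use the roller feedback is sketched below. It assumes the scanner reports a roller position (in inches) with every captured line; the function name `place_lines`, the data layout, and the policy of repeating the previous line to fill gaps are all assumptions for illustration, not a description of how any real hand scanner works.

```python
from typing import List, Optional, Tuple

Line = List[int]


def place_lines(samples: List[Tuple[float, Line]], lines_per_inch: int) -> List[Line]:
    """Give each captured line its proper row, based on roller position.

    samples holds (position_in_inches, line_data) pairs in capture order.
    """
    if not samples:
        return []
    last_row = max(round(pos * lines_per_inch) for pos, _ in samples)
    rows: List[Optional[Line]] = [None] * (last_row + 1)
    for pos, line in samples:
        row = round(pos * lines_per_inch)
        if rows[row] is None:
            # Slow sweep: later samples that land on the same row are dropped.
            rows[row] = line
    # Fast sweep: rows the hand skipped over repeat the previous line.
    filled: List[Line] = []
    previous: Line = []
    for line in rows:
        if line is None:
            line = previous
        filled.append(line)
        previous = line
    return filled


samples = [(0.0, [1]), (0.04, [2]), (0.1, [3]), (0.3, [4])]
print(place_lines(samples, 10))  # [[1], [3], [3], [4]]
```

In the example, the second sample arrives while the hand has barely moved, so it is discarded, and the jump from 0.1 to 0.3 inch leaves a missing row that is filled with a copy of its neighbor.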
Video Scanners
A video scanner is the electronic equivalent of a photographic copy stand. The video scanner uses a conventional video camera to capture an image. Most video scanners permanently mount the camera on a stand and give you a stage on which you put the item to be scanned. The stage may have a backlight to allow you to scan photographic slides or negatives or it may be a large bed for sheets of paper or even three-dimensional objects.
Optical Character Recognition
Scanners do not care what you point them at. They will capture anything with adequate contrast, whether drawings or text. However, text captured by a scanner will be in bit-image form, which makes it useless to word processors, which work with ASCII codes. You can translate text in graphic form into ASCII codes using optical character recognition (OCR). Add character recognition software to your scanner, and you can quickly convert almost anything you can read on paper into word processor, database, or spreadsheet files.
Early OCR software used a technique called matrix matching. The computer would compare small parts of each bit image it scanned to bit patterns it had stored in a library to find what character was the most similar to the bit pattern scanned. For example, a letter "A" would be recognized as a pointed tower 40 bits high with a 20-bit wide crossbar.
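Matrix matching can be illustrated with a minimal sketch. The three-character library, the 3-by-5 grid size, and the "fewest differing cells" score are assumptions chosen to keep the example small; real OCR libraries were far larger, and the names used here are hypothetical.

```python
from typing import Dict, List

Glyph = List[str]  # rows of '#' (ink) and '.' (paper)

# Hypothetical stored library of bit patterns, one per character.
LIBRARY: Dict[str, Glyph] = {
    "A": [".#.", "#.#", "###", "#.#", "#.#"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def mismatches(a: Glyph, b: Glyph) -> int:
    """Count the cells where two same-sized bit patterns differ."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def match_character(scanned: Glyph) -> str:
    """Return the library character most similar to the scanned pattern."""
    return min(LIBRARY, key=lambda ch: mismatches(scanned, LIBRARY[ch]))


# An "A" with one speck of ink missing from its right leg.
noisy_a = [".#.", "#.#", "###", "#.#", "#.."]
print(match_character(noisy_a))  # A
```

Note that the comparison only works when the scanned pattern has the same dimensions as the stored ones, which hints at why this approach struggles with unfamiliar sizes and fonts.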
Most of today’s OCR systems use feature matching. Feature matching systems don’t just look and compare but also analyze each bit pattern that’s scanned. When a feature matching system sees the letter "A," it derives the essential features of the character from the pattern of bits—an upslope, a peak, and a downslope with a horizontal bar across. In fact, feature matching recognition software doesn’t need to know the size or font of the characters it is to recognize beforehand. Even typeset text with variable character spacing is no problem. Feature matching software can thus race through a scan very quickly while making few errors.
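The idea of recognizing a character from its features rather than its exact bits can be sketched as follows. To stay short, this example swaps in two much cruder features than the slopes and peaks described above: the number of enclosed holes and the presence of a full-width crossbar. Those features, the two-entry table, and all names here are assumptions for illustration only.

```python
from typing import Dict, List, Set, Tuple

Glyph = List[str]  # rows of '#' (ink) and '.' (paper)


def count_holes(glyph: Glyph) -> int:
    """Count enclosed paper regions; the count does not depend on glyph size."""
    height, width = len(glyph), len(glyph[0])
    blank = "." * (width + 2)
    # Pad with a paper border so all outside paper forms a single region.
    padded = [blank] + ["." + row + "." for row in glyph] + [blank]
    seen: Set[Tuple[int, int]] = set()
    regions = 0
    for r in range(height + 2):
        for c in range(width + 2):
            if padded[r][c] != "." or (r, c) in seen:
                continue
            regions += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:  # flood-fill one connected paper region
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < height + 2 and 0 <= nx < width + 2
                    if inside and padded[ny][nx] == "." and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return regions - 1  # every region except the outside one is a hole


def has_crossbar(glyph: Glyph) -> bool:
    """True if some row is ink from edge to edge."""
    return any(set(row) == {"#"} for row in glyph)


# Hypothetical feature table: (holes, crossbar) -> character.
FEATURE_TABLE: Dict[Tuple[int, bool], str] = {(1, True): "A", (1, False): "O"}


def classify(glyph: Glyph) -> str:
    return FEATURE_TABLE.get((count_holes(glyph), has_crossbar(glyph)), "?")


SMALL_A = [".#.", "#.#", "###", "#.#", "#.#"]
BIG_A = ["..##..", ".#..#.", "#....#", "######", "#....#", "#....#"]
LETTER_O = [".#.", "#.#", "#.#", ".#."]
print(classify(SMALL_A), classify(BIG_A), classify(LETTER_O))  # A A O
```

Both sizes of "A" produce the same features, so the same table entry recognizes them without any stored pattern of a particular size.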
Electrical Interfacing
At least four different interface designs are used by scanners—SCSI (the Small Computer System Interface), GPIB (General Purpose Interface Bus), standard serial, and proprietary.
The least desirable of these is standard serial: serial ports are simply too slow to handle the data generated by a scanner. Most desktop scanners are moving to the SCSI interface for its high speed. Hand scanners generally use proprietary connections because the tiny devices have neither the room nor the need for standardized interface circuitry. GPIB was originally developed by Hewlett-Packard Company (hence its original moniker, the Hewlett-Packard Interface Bus) for interconnecting its test and measurement equipment. It provides a medium-to-high speed connection that fits neatly between serial and SCSI.
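A quick back-of-the-envelope calculation shows why interface speed matters. The page size, resolution, and bit depth below are example values, and the two transfer rates are assumptions picked for illustration: 11,520 bytes per second corresponds to a 115,200 bit-per-second serial link carrying 10 bits on the wire per byte, and 5,000,000 bytes per second stands in for a fast parallel interface. Neither figure is taken from the article.

```python
def scan_size_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Raw, uncompressed size of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


def transfer_seconds(size_bytes: int, bytes_per_second: float) -> float:
    """Time to move the scan across an interface at a sustained rate."""
    return size_bytes / bytes_per_second


# Letter-size page, 300 dots per inch, 8-bit grayscale (example values).
size = scan_size_bytes(8.5, 11.0, 300, 8)
print(size)                                       # 8415000
print(round(transfer_seconds(size, 11_520)))      # 730
print(round(transfer_seconds(size, 5_000_000), 2))  # 1.68
```

Under these assumptions a single grayscale page takes roughly twelve minutes over the serial link but under two seconds over the faster connection.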
Application Interfacing
As with other input devices, scanners have their own control and signaling systems that must link to your software to be used effectively. Early scanners used their own proprietary application interfaces to relay commands and data. Consequently, each scanner required its own software or drivers. Oftentimes you could only use the scanner manufacturer’s own software to grab images.
Thanks to a concerted effort by the scanner industry, that situation has changed. Now you can expect any scanner to work with just about any graphics program. Moreover, scanning is consistent across applications. The same screens that control your scanner in Photoshop appear in Corel Photo-Paint.
Central to this standardization is Twain. First released in early 1992, Twain is a scanner software interface standard developed by a consortium of scanner and software makers called (in its final form) the Working Group for Twain. The primary companies involved in forming the working group included Aldus Corporation, Caere Corporation, Eastman Kodak Company, Hewlett-Packard, and Logitech.
The Twain name requires some explanation. Twain is not an acronym, so only its initial letter needs to be capitalized; rendering it in all capital letters is a typographic error. Officially, the Twain developers explained that the name refers to the purpose of the interface: making the twain (an archaic word for "two") meet, the two being applications and scanners.
When the Twain interface was being developed, it wore a number of different names. The most common of these were Direct-Connect and CLASP, the latter of which stands for the Connecting Link for Applications and Source Peripherals. The developers of Twain considered these and others as the formal name of the interface. After searching through lists of trademarks in use, however, they found so many conflicts they felt that lawsuits would be a distinct possibility were any of the developmental names to be used. Instead they chose the name Twain to, in the words of one of the developers, "describe this interface which brings together two entities: applications and input devices."
Twain links programs and scanner hardware, giving software writers a standard set of function calls by which to control the features of any scanner. One set of Twain drivers will handle any compatible scanning device. Because the Twain connection has two ends—your scanner and your software—to take advantage of it requires that both be Twain-compatible.
Twain defines its hardware interface as its Source. The Source is hardware or firmware in a scanner that controls the information that flows from the scanner into Twain. The scanner maker designs the Source to match its particular hardware and interface. Your software links to the Twain Source through a Source Manager, which is essentially a set of program calls.
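The division of labor between application, Source Manager, and Source can be pictured with a toy model. Everything in this sketch is hypothetical: the class and method names are invented for illustration, the "scanners" return canned data, and the code does not reproduce the actual Twain function calls.

```python
from typing import Dict, List

Image = List[List[int]]


class Source:
    """Hypothetical stand-in for the Source a scanner maker supplies."""

    def __init__(self, name: str, image: Image) -> None:
        self.name = name
        self._image = image  # canned data standing in for real hardware

    def acquire(self) -> Image:
        """Device-specific work would happen here."""
        return self._image


class SourceManager:
    """Hypothetical stand-in for the Source Manager that applications call."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, source: Source) -> None:
        self._sources[source.name] = source

    def list_sources(self) -> List[str]:
        return sorted(self._sources)

    def acquire(self, source_name: str) -> Image:
        # The application never talks to the scanner directly.
        return self._sources[source_name].acquire()


manager = SourceManager()
manager.register(Source("flatbed", [[0, 255], [255, 0]]))
manager.register(Source("hand", [[128]]))
print(manager.list_sources())      # ['flatbed', 'hand']
print(manager.acquire("flatbed"))  # [[0, 255], [255, 0]]
```

The point of the arrangement is that the application code at the bottom stays the same no matter which scanner sits behind the manager.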
Twain takes the form of a software driver. The original driver was written in 16-bit code and takes the name TWAIN.DLL. The working group has also created a full 32-bit version of the driver called TWAIN_32.DLL. Both the 16- and 32-bit versions work with Windows 95 and NT.
Harry Husted is a freelance writer and author. His writing projects include ghostwriting, copywriting, web site content, and DTP. His credits include articles for Internet Day, Internet World, Advertising Today, Advertising Age, L-Advertising, and a host of others. Harry is also the author of three books: Learn How to Repair Computers: Get Certified in 15 Weeks, How to Write Your Way to Millions, and How to Find and Start a Legitimate Home Business. He can be reached through his site at http://www.creatingwords.com
This article is copyright (c) 2002 by Harry Husted, and may be reprinted in its entirety as long as this byline and copyright statement are included.