Paperwork which can be scanned for picture processing or storage are sometimes captured at eight bits per pixel. These are known as grayscale pictures in that 256 ranges of grey are in a position to be represented within the picture. The potential to characterize shades of grey is required to faithfully seize images and graphics however will not be usually wanted for textual content. Consequently, scanned pictures are sometimes binarized.
This course of includes the collection of a threshold worth after which altering all pixel values beneath that threshold to black and all pixel values above the brink to white. This compresses the picture by solely requiring 1 bit per pixel as a substitute of eight bits per pixel. For instance, an eight.5 inch by 11 inch doc scanned at 200 samples per inch, has its storage requirement drop from three.6 megabytes to 450 kilobytes. Binarization additionally happens when paperwork are transmitted by fax as pictures are created which can be represented with 1 bit per pixel as effectively.
Given a correct threshold, binarized pictures that include textual content can usually nonetheless be learn by both human eye or OCR algorithms. Certainly, some OCR algorithms start the OCR course of by binarizing the picture because the textual content is often darkish towards a lighter background. Nevertheless, if bar codes are current inside the binarized picture, they will by altered sufficient by the binarization course of to trigger problem for a decode algorithm.
The picture to the left exhibits a part of the beginning sample of a Code 39 linear bar code. The highest portion exhibits the picture represented as eight bits per pixel, whereas the decrease portion exhibits a binarized model utilizing a threshold of 100. Observe that the graceful transitions from white to black within the grayscale picture have been reworked into ragged edges which have plus or minus one pixel variability. In consequence, the width of every bar or area can range by plus or minus 2 pixels. On condition that the knowledge in linear bar codes is contained within the widths of the bars and areas, decode issues may end up relying on the dimensions and sort of the unique bar code image. The rest of this text will concentrate on two points that needs to be thought of to enhance the learn fee of binarized bar codes: pattern density and symbology kind.
Pattern density is the variety of samples per unit space that a picture is captured. Typical pattern densities used for doc processing vary from 200 to 300 samples per inch (dots per inch or DPI). Fax transmission can range from 100 as much as 400 DPI. In bar code decoding, a extra essential parameter to think about is the samples per module, the place the module measurement is the dimensions of the smallest function of the bar code. This distance can also be known as the X dimension. This parameter consists of each the pattern density of the picture seize machine together with the dimensions of the bar code being scanned. Volo(TM), a bar code decode software program toolkit from Omniplanar®, requires a minimal of 1.6 pixels per module for linear bar codes in a grayscale picture. From above, if the width of a bar or area can range by 2 pixels after binarization, one can see that issues can exist within the brief widths at this low pattern density. Consequently, Omniplanar recommends a typical worth of four pixels per module for bar codes that might be binarized. For a doc scanned at 200 DPI, this requires a minimal module measurement within the printed bar code of .020 inchs or 20 mils. Rising the samples per module past four can also be advisable for pictures that can undergo a number of binarization processes reminiscent of paperwork that might be faxed a number of occasions.
The selection of symbology used to encode information also needs to be thought of rigorously. Linear (or 1D) bar codes may be broadly categorised into two distinct teams: huge/slim codes and a number of width codes. In a large/slim code, every bar or area can solely be both slim or huge. The huge to slim ratio, which provides the dimensions ratio between the 2 widths, sometimes ranges from 2 to three. Standard symbologies that use solely 2 widths embrace Code 39 and Interleaved 2 of 5. A number of width codes have greater than 2 alternatives for the widths of bars and areas. Standard symbologies that make use of greater than two potential widths embrace UPC and Code 128, each of which permit a bar or area to be both 1, 2, three, or four modules. Given the identical samples per module, huge/slim symbologies sometimes survive the binarization course of with greater learn fee efficiency than their a number of width counterparts. That is merely on account of the truth that within the presence of edge variation on account of binarization, it’s simpler to find out if a run is X or 3X huge (the place X is the size of a module and an assumed huge to slim ratio of three) or whether it is 1X, 2X, 3X or 4X as is the case with a number of width symbologies. The desk beneath classifies the linear symbologies supported by Omniplanar’s Volo into the 2 courses talked about above.
- Code 39
- Interleaved 2 of 5
- Patch Code
A number of Width
- Code 128
- Code 93
Whereas huge/slim symbologies are most popular in purposes the place binarization will happen, they don’t usually have as excessive a knowledge density (information saved per unit width) in comparison with a number of width symbologies with the identical X dimension. This may be a problem the place restricted area on the doc is supplied for the bar code image.
2D (or two-dimensional) matrix symbologies are sometimes presence/absence codes. They use a daily grid of potential cell positions, and the presence or absence of a cell in a grid location encodes the information. Two-dimensional symbols have a lot greater information density than 1D codes as data is saved in each dimensions of the image. As well as, they usually present error correction to supply redundancy in case some cell information values can’t be decided. When a 2D image is binarized with enough samples per module (the dimensions of a grid place), they have an inclination to do higher than linear codes as the sting data will not be as essential: it’s the place of the middle of the cell that’s extra essential. Volo requires a minimal of two.75 pixels per module in a grayscale picture for dependable operation. If further samples per module past 2.75 is supplied, efficiency continues to enhance and is an effective selection for pictures that can endure a number of binarizations. A pattern density of 5 pixels or extra for 2D matrix symbologies offers good learn efficiency after binarization. 2D Matrix codes supported by Omniplanar’s Volo embrace Knowledge Matrix, QR Code, and Aztec Code.
PDF417 is also known as a 2D symbology. Nevertheless, it isn’t a real matrix code. It’s a stacked linear code, consisting of a number of rows of brief linear codewords. container house and areas to have as much as 6 totally different widths and as such it may be adversely affected by binarization at decrease samples per module.