A while ago I needed to scan a faded report that was printed in 1989 on a 9-pin dot-matrix printer. However, current optical character recognition (OCR) software seems to work best on freshly produced laser-printed documents. After much trial and error, I developed a method of processing the scanned page images to improve the accuracy of the ReadIris Pro 8 OCR application when it recognizes faded dot-matrix printouts.
I scanned the dot-matrix printouts at 300dpi and saved them as greyscale images with 256 shades of grey. I then used Photoshop to join up the dots in each character, so that each letter more closely resembles the solid letters that ReadIris Pro recognizes more accurately.
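I connected the dots in four steps: scanning the page, applying the Find Edges filter, applying a threshold of 200, and applying the Blur More filter.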
I then used ReadIris Pro to recognize the text in the image produced in stage 4 using the following settings:
The following steps show the results of applying each stage to the top-left portion of a scan of a faded, dot-matrix printout.
Scan the document at 300dpi and save it as a greyscale image with 256 levels of grey.
Select Filter > Stylize > Find Edges to apply the Find Edges filter to the image scanned in step 1.
Select Image > Adjust > Threshold to apply a threshold of 200 to the image produced at step 2.
Select Filter > Blur > Blur More to apply the Blur More filter to the image produced at step 3.
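For anyone without Photoshop, the three filter steps can be roughly approximated in code. The following is only a sketch in plain Python, working on a greyscale image held as a list of rows of 0-255 values. The function names, the Laplacian kernel used for edge detection, and the 3x3 box blur are my assumptions and stand in for Photoshop's Find Edges and Blur More filters, whose exact kernels I don't know. Only the threshold level of 200 comes from the steps above.

```python
def convolve3x3(img, kernel):
    """Apply a 3x3 kernel to a greyscale image, replicating pixels at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for ky in (-1, 0, 1):
                for kx in (-1, 0, 1):
                    yy = min(max(y + ky, 0), h - 1)
                    xx = min(max(x + kx, 0), w - 1)
                    total += img[yy][xx] * kernel[ky + 1][kx + 1]
            out[y][x] = total
    return out


def clamp(value):
    """Round to the nearest integer and limit it to the 0-255 range."""
    return max(0, min(255, int(round(value))))


def find_edges(img):
    """Assumed stand-in for Find Edges: Laplacian edges, drawn dark on white."""
    kernel = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
    edges = convolve3x3(img, kernel)
    return [[255 - clamp(abs(v)) for v in row] for row in edges]


def threshold(img, level=200):
    """Pixels at or above `level` become white (255); all others become black (0)."""
    return [[255 if v >= level else 0 for v in row] for row in img]


def blur_more(img):
    """Assumed stand-in for Blur More: a simple 3x3 box blur."""
    kernel = [[1 / 9] * 3 for _ in range(3)]
    return [[clamp(v) for v in row] for row in convolve3x3(img, kernel)]


def connect_dots(img, level=200):
    """Run the three filter steps in the order used above."""
    return blur_more(threshold(find_edges(img), level))


# A single faded (mid-grey) dot on a white 5x5 background.
page = [[255] * 5 for _ in range(5)]
page[2][2] = 150
result = connect_dots(page)
```

In this sketch the edge step darkens the pixels around each dot as well as the dot itself, so after thresholding a faint one-pixel dot becomes a solid 3x3 black blob. That growth is what lets neighbouring dots merge into continuous strokes, and the final blur softens the jagged outline.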
I don't have any figures, such as the increase in the number of characters correctly identified, to evaluate the performance of my method. However, in my use it did reduce the number of incorrect characters.
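One way such a figure could be obtained is to type out a short passage by hand and compare it with the OCR output before and after processing. The sketch below uses Python's standard difflib module for this. The function name character_accuracy and the sample strings are hypothetical and are not measurements from my scans.

```python
from difflib import SequenceMatcher


def character_accuracy(reference, ocr_output):
    """Fraction of the reference characters that the OCR output reproduces in order."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


# Made-up strings for illustration only, not real OCR results.
reference = "FADED REPORT"
unprocessed = character_accuracy(reference, "FAOEO REP0RT")
processed = character_accuracy(reference, "FADED REP0RT")
```

Comparing the two scores for the same passage would show whether the processing helped and by roughly how much.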