T O P I C R E V I E W
Sidney Egnew
Posted - May 04 2017 : 13:46:54 I am doing OCR on full-color document images. It would be extremely helpful to be able to drop anything with color, leaving only black and gray, before I run the OCR.
How can this be done?
3 L A T E S T R E P L I E S (Newest First)
xequte
Posted - May 04 2017 : 23:29:59 Hi Sidney
2. There is some good advice on cleaning images before OCR with Tesseract at:
Posted - May 04 2017 : 23:11:55 It appears IEVision may have some issue unrelated to the color background. I took an image of a colored check and converted everything that was not gray or black to white, and everything that was gray to black. All that remained was the letters.
For both the original and the black/white lettering, IEVision did not OCR a significant portion of the image correctly. With IEVision, it appears the extra work to clean the image was of no value.
1) What might be causing this?
As for dropping colors and leaving black, there are two ways I know of to do this.
Black and gray values in RGB have all three channels about equal, but when the values are greater than about 220 the pixel is closer to white than gray. Everything else is colored.
Alternatively, I could convert RGB to HSV. When H and S are near 0% and V is less than 75%, the pixel is gray or black. All other pixels are color.
In either method, I just change each pixel to black or white as appropriate, which keeps the lettering and gets rid of the coloring.
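The two tests described above can be sketched roughly as follows. This is an illustrative Python sketch only, not ImageEn or IEVision code. The function names, the `tolerance` of 30 for "about equal" channels, and the `max_saturation` of 0.15 for "near 0%" are assumptions chosen for the example; only the 220 and 75% cutoffs come from the description above. The HSV test checks S and V only, since hue carries no useful information when saturation is close to zero.

```python
import colorsys

BLACK = (0, 0, 0)
WHITE = (255, 255, 255)

def is_gray_rgb(r, g, b, tolerance=30, white_cutoff=220):
    """RGB test: the three channels are about equal and not bright enough to be white.

    tolerance is an assumed value for "about equal"; white_cutoff is the ~220 figure.
    """
    channels_close = max(r, g, b) - min(r, g, b) <= tolerance
    return channels_close and max(r, g, b) <= white_cutoff

def is_gray_hsv(r, g, b, max_saturation=0.15, max_value=0.75):
    """HSV test: low saturation and value below 75%.

    max_saturation is an assumed value for "near 0%".
    """
    # colorsys expects and returns floats in the range 0.0 to 1.0.
    _h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return s <= max_saturation and v < max_value

def drop_color(pixels, is_gray=is_gray_rgb):
    """Map gray/black pixels to black and everything else to white."""
    return [BLACK if is_gray(r, g, b) else WHITE for (r, g, b) in pixels]

if __name__ == "__main__":
    sample = [(0, 0, 0), (128, 128, 128), (255, 255, 255), (200, 30, 30)]
    print(drop_color(sample))               # RGB test
    print(drop_color(sample, is_gray_hsv))  # HSV test
```

The two tests do not agree on every pixel. A light gray such as (200, 200, 200) passes the RGB test but has a V of about 78%, so the HSV test treats it as white. Very dark pixels with a slight color cast can also show high saturation, so the thresholds would need tuning against real scans.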
2) Do you provide the capability to do this?
3) Can I access a pixel's HSV values instead of RGB? (I know how to calculate HSV if you don't provide the capability.)
Thanks
xequte
Posted - May 04 2017 : 16:12:27 Hi
Do you mean grayscale the image or exclude/delete any non-gray pixels?