ImageEn for Delphi and C++ Builder

 

ImageEn Forum
 Keep black and drop colors

Sidney Egnew

USA
55 Posts

Posted - May 04 2017 : 13:46:54
I am doing OCR on full color document images. It would be extremely helpful to be able to drop anything with color, leaving only black and gray, before I OCR.

How can this be done?

xequte

38610 Posts

Posted - May 04 2017 : 16:12:27
Hi

Do you mean grayscale the image or exclude/delete any non-gray pixels?

Nigel
Xequte Software
www.xequte.com
nigel@xequte.com

Sidney Egnew

USA
55 Posts

Posted - May 04 2017 : 23:11:55
It appears IEVision may have some issue unrelated to the color background. I took an image of a colored check and converted everything that was not gray or black to white and everything that was gray to black. All that remained was the letters.

For both the original and the black/white lettering, IEVision did not OCR a significant portion of the image correctly. It appears that, with IEVision, the extra work to clean the image was of no value.

1) What might be causing this?

As for dropping colors and leaving black, there are two ways I know to do this.

In RGB, black and gray pixels have all three channel values about equal, but when the values are greater than about 220 the pixel is closer to white than gray. Everything else is colored.
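A minimal sketch of the RGB test described above, written in Python for illustration only (it is not ImageEn code). The 220 cutoff comes from the post; the `tolerance` of 30 used for "about equal" and all function names are assumptions that would need tuning on real scans.

```python
def is_gray_or_black(r, g, b, tolerance=30, white_cutoff=220):
    """Return True when the channels are roughly equal and the pixel is not near-white.

    tolerance    -- assumed maximum spread between the largest and smallest channel
    white_cutoff -- channel values above this are treated as white (from the post)
    """
    hi = max(r, g, b)
    lo = min(r, g, b)
    return (hi - lo) <= tolerance and hi <= white_cutoff


def keep_black_drop_color(pixels, tolerance=30, white_cutoff=220):
    """Map gray/black pixels to black and every other pixel to white.

    pixels is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    A new image is returned; the input is left unchanged.
    """
    black, white = (0, 0, 0), (255, 255, 255)
    return [
        [black if is_gray_or_black(r, g, b, tolerance, white_cutoff) else white
         for (r, g, b) in row]
        for row in pixels
    ]
```

For example, `keep_black_drop_color([[(10, 10, 10), (0, 0, 255)]])` turns the dark pixel black and the blue pixel white.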

I could convert RGB to HSV. When H and S are near 0% and V is less than 75%, the pixel is gray or black. All other pixels are colored.
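A rough sketch of the HSV test, using Python's standard `colorsys` module for illustration (it is not ImageEn code). The 75% value limit comes from the post; the 15% saturation limit and the function name are assumptions. The sketch checks saturation and value only, because hue carries no information once saturation is close to zero.

```python
import colorsys


def is_gray_or_black_hsv(r, g, b, max_saturation=0.15, max_value=0.75):
    """Return True when an 8-bit RGB pixel is low-saturation and not too bright.

    max_saturation -- assumed limit for "near 0%" saturation
    max_value      -- brightness limit below which a gray counts as gray/black
    """
    # colorsys works on floats in the range 0.0..1.0
    _hue, saturation, value = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return saturation <= max_saturation and value < max_value
```

Very dark pixels can report a high saturation (for example a near-black with a slight red cast), so a separate low-value cutoff may be needed if such pixels should count as black.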

In either method, I just change each pixel to black or white as appropriate, which keeps the lettering and gets rid of the coloring.

2) Do you provide the capability to do this?

3) Can I access the pixels' HSV values instead of RGB? (I know how to calculate HSV if you don't provide the capability.)


Thanks

xequte

38610 Posts

Posted - May 04 2017 : 23:29:59
Hi Sidney

2. There is some good advice on cleaning images before OCR with Tesseract at:

https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
http://stackoverflow.com/questions/9480013/image-processing-to-improve-tesseract-ocr-accuracy

In terms of ImageEn methods, I would try the automatic enhancement methods to see which works best:

https://www.imageen.com/help/TImageEnProc.AutoImageEnhance1.html
https://www.imageen.com/help/TImageEnProc.AutoImageEnhance2.html
https://www.imageen.com/help/TImageEnProc.AutoImageEnhance3.html

The following can also be useful with B+W images:

http://www.imageen.com/help/TImageEnProc.RemoveNoise.html
http://www.imageen.com/help/TImageEnProc.RemoveIsolatedPixels.html

Also, try the various color enhancement functions:

https://www.imageen.com/help/TImageEnProc.DoPreviews.html


3. Just convert the RGB result to HSV using:

https://www.imageen.com/help/RGB2HSV.html

For performance, ensure you use scanlines rather than directly accessing pixels:

https://www.imageen.com/help/TIEBitmap.ScanLine.html
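The row-at-a-time idea can be sketched as follows, in Python for illustration only (it does not use the ImageEn API). It assumes each scanline is a buffer of packed 3-byte pixels, possibly followed by padding bytes; the function name and the `tolerance` default are hypothetical. Because the gray test is symmetric in the three channels, the byte order within a pixel (RGB or BGR) does not matter here.

```python
def threshold_row_in_place(row, width, tolerance=30, white_cutoff=220):
    """Rewrite one scanline so gray/black pixels become black and the rest white.

    row   -- mutable byte buffer (e.g. bytearray) holding 3 bytes per pixel
    width -- number of pixels in the row; any padding after them is left untouched
    """
    for x in range(width):
        i = 3 * x
        hi = max(row[i], row[i + 1], row[i + 2])
        lo = min(row[i], row[i + 1], row[i + 2])
        # Channels roughly equal and not near-white: treat as lettering.
        value = 0 if (hi - lo) <= tolerance and hi <= white_cutoff else 255
        row[i] = row[i + 1] = row[i + 2] = value
```

Working on one row buffer at a time avoids a per-pixel accessor call, which is the point of the scanline advice above.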




Nigel
Xequte Software
www.xequte.com
nigel@xequte.com