 How to Improve Blank Page Detection

Sidney Egnew

USA
55 Posts

Posted - Nov 18 2024 :  08:44:00
I have over 110,000 PDF files that were scanned in color in duplex mode. I used the code shown below to classify the documents as follows:
53,077 - Classified as all blank backs using 99.8% threshold
19,252 - Classified as duplex
35,365 - Waiting to be classified

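// Check each back page (index 1, 3, 5, ...) and record the lowest
// dominant-color percentage that falls below the blank threshold.
v_Index := 1;
v_MinPercent := MaxInt;  // Sentinel: no non-blank back found yet
repeat
  v_ImageView.ClearAll;
  v_ImageView.IO.Params.FileName := v_FileName;
  v_ImageView.IO.Params.ImageIndex := v_Index;  // Page to load
  v_ImageView.IO.LoadFromFilePDF(v_FileName);
  v_ColorPercent := v_ImageView.Proc.GetDominantColor(v_RGB);
  if v_ColorPercent < 99.8 then  // This back is not blank
    v_MinPercent := Min(v_MinPercent, v_ColorPercent);
  v_Index := v_Index + 2;  // Skip the next front page
until (v_Index >= v_ImageView.IO.Params.ImageCount);
if v_MinPercent = MaxInt then
  UpdateDuplex(p_ScanPageNo, -1, 'N')  // Every back was blank
else
  UpdateDuplex(p_ScanPageNo, v_MinPercent, 'Y');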

I sampled documents identified as duplex with percentages of 96-97%, and 55% of the sample had been classified incorrectly. Since more than half of all duplex-classified documents have a percentage of 96% or higher, a very large number of documents have probably been misclassified.

The sampled backs that did have content showed text, watermarks, handwriting, and in a few instances paper damage. The documents that should have been classified as non-duplex are clearly blank to the human eye. What can be done to improve the image classification?

Thanks

xequte

38607 Posts

Posted - Nov 18 2024 :  22:14:59
Hi Sidney

Can you save some of the pages that are mis-classified to PNG files and post or email them to us?


Nigel
Xequte Software
www.imageen.com

xequte

38607 Posts

Posted - Nov 19 2024 :  18:10:02
Hi Sidney

Yes, these images contain artifacts that reduce the percentage of the dominant color.

I think you need to have a special case for images that are almost blank like this (e.g. >98%).

For those "maybe blank" images, do a further test: reduce the number of colors (merging similar colors) and then call GetDominantColor() again.

For example, your image, 5714771_B.png (which has a gray smudge), returns 98.6% for GetDominantColor(), but I could easily increase this to nearly 100% by reducing the color depth, e.g. using thresholding or adjusting the levels.

You should try the Every Method demo and add this to the end of the PerformOperation method:

  dd := DestIEViewer.Proc.GetDominantColor( rgb );
  Desc := Desc + format( ' + Dom Color: %s (%s%%)', [ ColorToHex( TRGB2TColor( rgb )), FloatToStrF( dd, ffGeneral, 4, 4 )]);

Then you can try out the various color adjustment and depth methods to find which gives the most reliable result (without increasing the rate of false positives).
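
To show why merging similar colors raises the dominant-color percentage, here is a minimal console sketch in plain Pascal. It does not use ImageEn at all: the page is a hypothetical array of gray levels, and DominantShare and Quantize are illustrative helpers, not library calls. The pixel counts are contrived so that the "before" figure mirrors the 98.6% mentioned above.

```pascal
program DominantShareDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical stand-in for a scanned page: one gray level (0..255) per pixel.
const
  PixelCount = 1000;

type
  TGrayPage = array[0..PixelCount - 1] of Byte;

// Percentage of pixels that share the most common gray level.
function DominantShare(const Page: TGrayPage): Double;
var
  Counts: array[0..255] of Integer;
  i, Best: Integer;
begin
  for i := 0 to 255 do
    Counts[i] := 0;
  for i := 0 to PixelCount - 1 do
    Inc(Counts[Page[i]]);
  Best := 0;
  for i := 0 to 255 do
    if Counts[i] > Best then
      Best := Counts[i];
  Result := 100.0 * Best / PixelCount;
end;

// Merge similar levels by snapping each pixel to the bottom of its bucket.
procedure Quantize(var Page: TGrayPage; BucketSize: Integer);
var
  i: Integer;
begin
  for i := 0 to PixelCount - 1 do
    Page[i] := (Page[i] div BucketSize) * BucketSize;
end;

var
  Page: TGrayPage;
  i: Integer;
begin
  // 986 paper-white pixels plus a 14-pixel light gray smudge (levels 240..253).
  for i := 0 to PixelCount - 1 do
    Page[i] := 255;
  for i := 0 to 13 do
    Page[i] := 240 + i;

  WriteLn('Before: ', DominantShare(Page):0:1, '%');  // 98.6%
  Quantize(Page, 32);  // 240..255 all fall into the same bucket
  WriteLn('After:  ', DominantShare(Page):0:1, '%');  // 100.0%
end.
```

The bucket size of 32 is an arbitrary choice for the demo. A coarser bucket hides more smudges but can also hide faint real content, which is the false-positive risk to watch for when trying the real color-reduction methods.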

Nigel
Xequte Software
www.imageen.com

Sidney Egnew

USA
55 Posts

Posted - Nov 19 2024 :  21:33:56
I will keep your solution in mind. But I am not too concerned about the smudges as there are watermarks and other valid markings that might be lost. Going forward, I can ask users if they want to ignore the backs when the dominant color is close. Many documents are only front and back and those are indexed anyway.
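
Asking users only about the close calls amounts to a three-way triage on the dominant-color percentage. A rough sketch, in plain Pascal with no ImageEn dependency: TBackClass, ClassifyBack, and the 96% lower cutoff are all assumptions for illustration (96% is picked only because the sampling earlier in this thread found many misclassifications from there upward), and the 99.8% figure is the blank threshold used in the original code.

```pascal
program BackTriageDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical classification of a back page.
  TBackClass = (bcBlank, bcAskUser, bcHasContent);

const
  BlankPercent     = 99.8;  // Blank threshold from the original code
  CloseCallPercent = 96.0;  // Assumed lower bound of the "close" band; tune on samples

// Map a dominant-color percentage to one of three outcomes.
function ClassifyBack(DominantPercent: Double): TBackClass;
begin
  if DominantPercent >= BlankPercent then
    Result := bcBlank
  else if DominantPercent >= CloseCallPercent then
    Result := bcAskUser      // Close call: let the user decide
  else
    Result := bcHasContent;
end;

const
  Names: array[TBackClass] of string = ('blank', 'ask user', 'has content');

begin
  WriteLn(Names[ClassifyBack(99.9)]);  // blank
  WriteLn(Names[ClassifyBack(97.2)]);  // ask user
  WriteLn(Names[ClassifyBack(91.0)]);  // has content
end.
```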

I am more interested in detecting the roller marks. They seem to be predominately near the left edge of the paper. How can I trim a bit off the images before checking for the dominant color?

Thanks

xequte

38607 Posts

Posted - Nov 19 2024 :  22:14:37
Hi Sidney

You can just ignore the border area when testing if blank:

// Test if the image is blank (with 1% threshold and ignoring the border area)
threshold  := 1.0;  // Allow 1% of image to be a different color
borderPerc := 10;   // Border area is 10% of width/height
ImageEnView1.SelectionBase := iesbBitmap;
ImageEnView1.Select( MulDiv( ImageEnView1.IEBitmap.Width, borderPerc, 100 ),
                     MulDiv( ImageEnView1.IEBitmap.Height, borderPerc, 100 ),
                     MulDiv( ImageEnView1.IEBitmap.Width, 100 - borderPerc, 100 ),
                     MulDiv( ImageEnView1.IEBitmap.Height, 100 - borderPerc, 100 ));
if ImageEnView1.Proc.GetDominantColor(cl) >= 100 - threshold then
  ShowMessage('Image is blank!')
else
  ShowMessage('Image is NOT blank!');
ImageEnView1.Deselect();
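
Since the roller marks are reported to sit mostly near the left edge, a variation on the code above could trim only that side. The following is a sketch, not tested against ImageEn: it uses only the calls already shown in this thread (SelectionBase, Select, GetDominantColor, Deselect), while the unit names in the uses clause, the helper name IsBlankIgnoringLeftEdge, and its parameters are assumptions.

```pascal
unit BlankBackCheck;

interface

uses
  imageenview, hyiedefs;  // Assumed ImageEn units for TImageEnView and TRGB

// Hypothetical helper: True if the loaded image is blank once a strip of
// LeftTrimPerc percent of the width is ignored on the left side.
// Threshold is the percentage of pixels allowed to differ (e.g. 1.0).
function IsBlankIgnoringLeftEdge(View: TImageEnView; LeftTrimPerc: Integer;
  Threshold: Double): Boolean;

implementation

function IsBlankIgnoringLeftEdge(View: TImageEnView; LeftTrimPerc: Integer;
  Threshold: Double): Boolean;
var
  cl: TRGB;
begin
  View.SelectionBase := iesbBitmap;  // Coordinates are in bitmap pixels
  // Select everything to the right of the trimmed strip, full height.
  // Passing Width/Height as the right/bottom edge follows the example above.
  View.Select((View.IEBitmap.Width * LeftTrimPerc) div 100, 0,
              View.IEBitmap.Width, View.IEBitmap.Height);
  try
    Result := View.Proc.GetDominantColor(cl) >= 100 - Threshold;
  finally
    View.Deselect;  // Always clear the selection, even on an exception
  end;
end;

end.
```

A call might look like IsBlankIgnoringLeftEdge(ImageEnView1, 5, 0.2), where 5% and 0.2% are placeholder values to be tuned against sampled pages.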


Nigel
Xequte Software
www.imageen.com