TOPIC REVIEW |
coderboy |
Posted - Oct 24 2023 : 08:15:14 Is there a way to determine if a page in a PDF is an image from a scanned paper document? |
5 LATEST REPLIES (Newest First) |
xequte |
Posted - Oct 25 2023 : 23:09:45 Hi Ale
We'll see how it goes. But we will be limited by the functionality that PDFium provides.
Nigel Xequte Software www.imageen.com
|
aleatprog |
Posted - Oct 25 2023 : 08:53:13 That would be a nice feature for the next official update. For multilayer PDFs, maybe in the future it could also indicate which layer each image is in, so that in OCRed scans the overlaying or underlaying images can be processed separately.
Ale |
xequte |
Posted - Oct 25 2023 : 00:25:48 Hi
If you email me you can test a beta that lets you retrieve a list of all the objects on the page (just the type of each object). That would tell you if it contains any images, but not anything more than that.
Nigel Xequte Software www.imageen.com
|
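A minimal sketch of how a per-page list of object types could be used to answer the original question. It is written in Python purely for illustration; the beta's real method and type names are not given in this thread, so the `"image"` label and the `page_contains_image` helper are assumptions, not ImageEn API.

```python
from typing import Iterable

# Hypothetical label for image objects; the beta's actual type names
# are not stated in the thread.
IMAGE_TYPE = "image"


def page_contains_image(object_types: Iterable[str]) -> bool:
    """Return True if any object type in the list denotes an image."""
    return any(t.strip().lower() == IMAGE_TYPE for t in object_types)


if __name__ == "__main__":
    # Made-up type list for a page with text, an image, and a path.
    print(page_contains_image(["text", "Image", "path"]))
```

As Nigel notes, this only says whether an image is present on the page, not how large it is or whether it is a scan.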
coderboy |
Posted - Oct 24 2023 : 13:22:04 Hi Ale,
Some PDFs have text embedded in the page along with the image, so that wouldn't work. I need to check whether the page has an image so that it can be flagged for someone to review the textual content or to add textual content.
Thanks. |
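The flagging rule described here can be sketched as follows. This is an illustration in Python, not ImageEn code: it assumes the page text and an "image present" flag have already been obtained by some other means, and the `review_reason` helper and its return strings are hypothetical.

```python
from typing import Optional


def review_reason(page_text: str, has_image: bool) -> Optional[str]:
    """Decide whether a page should be flagged for manual review.

    Returns None when the page holds no image and needs no review.
    """
    if not has_image:
        return None
    if page_text.strip():
        # Image plus embedded text, e.g. an OCRed scan: check the text.
        return "review textual content"
    # Image with no text at all: text has to be added.
    return "add textual content"


if __name__ == "__main__":
    print(review_reason("", True))
    print(review_reason("Invoice 2023", True))
    print(review_reason("Invoice 2023", False))
```

Keying the decision on the image rather than on missing text is what covers pages where a scan already carries an embedded text layer.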
aleatprog |
Posted - Oct 24 2023 : 10:25:08 Hi coder,
Try to extract the textual content using TIEPdfViewerInteraction.GetText. If there isn't any textual content, it may be a scan.
Ale |