1. Can you give us the steps to reproduce the issue of being unable to extract text after page 1?
After creating the TImageEnView, I set the following:
v_ImageEnView.Align := alNone;
v_ImageEnView.Width := 612;
v_ImageEnView.Height := 792;
To load the PDF:
v_ImageEnView.PdfViewer.Enabled := True;
v_ImageEnView.IO.LoadFromFilePDF(v_FileName,-1,-1,c_Password);
v_ImageEnView.Fit;
v_TextMemo.Text := v_ImageEnView.PdfViewer.GetText(0,1000000);
// Initialize PDF data position
v_LineNo := 0;
v_Top := 21;
v_Bottom := 27;
while GetDataLine do
begin
  if v_RecordType = '5' then
    ProcessBatchHeader
  else if v_RecordType = '6' then
    ProcessEntryDetail
  else if v_RecordType = '7' then
    ProcessAddendaRecord
  else if v_RecordType = '8' then
    ProcessBatchControl
  else
    LogMemo.Lines.Add('Error Processing Record: '+v_RecordType);
  Application.ProcessMessages;
end;
function GetDataLine: Boolean;
begin
  // Advance to the next text row (rows are 9 units apart)
  v_LineNo := v_LineNo+1;
  v_Top := v_Top+9;
  v_Bottom := v_Bottom+9;
  v_RecordType := StringField(0,'RECORDTYPE');
  if v_RecordType = '5' then
    Result := True
  else if v_RecordType = '6' then
    Result := True
  else if v_RecordType = '7' then
    Result := True
  else if v_RecordType = '8' then
    Result := True
  else
    Result := False;
end;
function StringField (const p_RecordType: Integer;
                      const p_FieldName: String): String;
begin
  // Get Field Location from Database
  dmMain.GetFieldLocation (p_RecordType,p_FieldName,
                           v_Left,v_Right,v_FieldSize,v_FieldType);
  v_Rect := TRect.Create(v_Left,v_Top,v_Right,v_Bottom);
  Result := Trim(v_ImageEnView.PdfViewer.GetText(v_Rect));
end;
When the last data on the page is processed, GetDataLine returns False and the process ends. That is why the data on the other pages is not processed.
To fix this, I would need to add code that checks whether there are more pages, loads the next page, and repeats the process (assuming the offsets are the same on all pages).
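Roughly, I picture something like the sketch below. It is untested, and I am assuming that PdfViewer exposes a PageCount property and a settable, zero-based PageIndex property; I have not confirmed either. v_Page is a new Integer variable, and ProcessCurrentRecord is a hypothetical wrapper around the if/else chain in my loop above.

```pascal
// Sketch only: PageCount and PageIndex are ASSUMED members of PdfViewer.
// v_Page and ProcessCurrentRecord are hypothetical names, not existing code.
v_LineNo := 0;
for v_Page := 0 to v_ImageEnView.PdfViewer.PageCount - 1 do
begin
  // Assumed to switch the viewer to the given page
  v_ImageEnView.PdfViewer.PageIndex := v_Page;

  // Reset the row window to the start position for each page
  // (assumes the offsets are the same on all pages)
  v_Top := 21;
  v_Bottom := 27;

  // Same row-by-row loop as before, now run once per page
  while GetDataLine do
  begin
    ProcessCurrentRecord;
    Application.ProcessMessages;
  end;
end;
```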
Can you tell me how to do that?
But it would be so much better if the OCR simply returned the correct number of spaces instead of collapsing them into a single space. Then I could just process the text without all the rectangles.
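To show what I mean, here is a small sketch of how simple the parsing could be if each line came back with its column spacing intact. The sample line and the column positions are made up for illustration and are not my real layout; FieldAt is a hypothetical helper.

```pascal
program FixedWidthSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Returns p_Length characters starting at 1-based column p_Start, trimmed.
function FieldAt(const p_Line: String; p_Start, p_Length: Integer): String;
begin
  Result := Trim(Copy(p_Line, p_Start, p_Length));
end;

var
  v_Line: String;
begin
  // Made-up line: record type in column 1, a code in columns 2-3,
  // three spaces of padding, then a 10-character name field.
  v_Line := '6' + '22' + '   ' + 'ABC       ';

  WriteLn('Record type: ', FieldAt(v_Line, 1, 1));
  WriteLn('Code:        ', FieldAt(v_Line, 2, 2));
  WriteLn('Name:        ', FieldAt(v_Line, 7, 10));
end.
```

With the spacing preserved, every field is a plain Copy by column, so no rectangle lookups would be needed.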
2. Did you consider iterating over all the text objects in the page to get their positions?
Yes, that is one of the many "number of approaches" I tried. I used demos\other\PDFPageObjects.
All it shows is Object[0] - Form.