Description
I used this workflow
ocrd-olena-binarize -I IMG -O BIN && ocrd-anybaseocr-crop -I BIN -O CROP && ocrd-cis-ocropy-denoise -I CROP -O DENOISE
on this image to get the following result:
And that's fantastic: barely any noise and the cropped area contains the text and nothing else.
However, due to a slight oversight, the image has to go through the same workflow again, this time with different dimensions and additional black edges on the left- and right-hand sides.
original tiff: 1330 x 2163 px
edited tiff: 2479 x 3508 px (including black edges)
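As a rough sanity check on those numbers, here is a small shell sketch that estimates how wide the black edges are. It assumes the original was scaled uniformly to the new height and then padded equally on both sides, which may not match exactly how the edited file was produced. The variable names are purely illustrative.

```shell
#!/bin/sh
# Dimensions taken from the two TIFFs listed above.
orig_w=1330; orig_h=2163
new_w=2479;  new_h=3508

# ASSUMPTION: the page was scaled uniformly so that its height fills the new canvas.
# Integer arithmetic, so the result is rounded down to whole pixels.
scaled_w=$(( orig_w * new_h / orig_h ))

# Whatever width is left over would be black padding, split across both sides.
border_total=$(( new_w - scaled_w ))
border_each=$(( border_total / 2 ))

echo "scaled page width: ${scaled_w} px"
echo "black edge per side: about ${border_each} px"
```

Under that assumption the page content is about 2157 px wide and each black edge is about 161 px, i.e. roughly 13% of the edited image's width is black border.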
The same workflow with the edited image outputs this:
I have tried several different compression rates, and I have also tried keeping the original image size (only adding the black edges), but the result is always very similar to the image above: the cropping is not as close to the text as it is with the original tiff. Is there a way to tweak the parameters of ocrd-anybaseocr-crop to get a better result, or is it easiest to crop manually (though still within OCR-D, because I need those specific image dimensions in the PAGE file)? Or is there a better way to transform the original tiff?
Any help would be appreciated.