Open
Description
Current Behavior
Tesseract 5.5.1 tries to read the thumbnail images embedded in TIFF files,
fails to open them, and prints this warning:
Error in pixReadFromTiffStream: old style jpeg format is not supported
test image
wget https://github.com/user-attachments/files/23533251/helloworld-with-thumb.tiff
tesseract -l eng helloworld-with-thumb.tiff -
Page 1
Hello world
Error in pixReadFromTiffStream: old style jpeg format is not supported
Reproduce
In GIMP: export as -> test.tiff, with compression set to jpeg and "save thumbnail" enabled
Run tesseract on test.tiff
Expected Behavior
Tesseract should recognize that these are only thumbnail images
and silently ignore them.
Suggested Fix
No response
tesseract version
$ tesseract -v
tesseract 5.5.1
leptonica-1.85.0
libgif 5.2.2 : libjpeg 6b (libjpeg-turbo 3.1.0) : libpng 1.6.47 : libtiff 4.7.0 : zlib 1.3.1 : libwebp 1.5.0 : libopenjp2 2.5.2
Operating System
No response
Other Operating System
No response
uname -a
No response
Compiler
No response
CPU
No response
Virtualization / Containers
No response
Other Information
The warning message comes from leptonica/src/tiffio.c:
/* Old style jpeg is not supported. We tried supporting 8 bpp.
 * TIFFReadScanline() fails on this format, so we used RGBA
 * reading, which generates a 4 spp image, and pulled out the
 * red component. However, there were problems with double-frees
 * in cleanup. For RGB, tiffbpl is exactly half the size that
 * you would expect for the raster data in a scanline, which
 * is 3 * w. */
TIFFGetFieldDefaulted(tif, TIFFTAG_COMPRESSION, &tiffcomp);
if (tiffcomp == COMPRESSION_OJPEG) {
    L_ERROR("old style jpeg format is not supported\n", __func__);
    return NULL;
}
continue #3008
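One possible way to tell a thumbnail from a real page is sketched below. It is only an illustration and not a patch for leptonica. It assumes that the thumbnail directory sets the reduced-resolution bit of the TIFF NewSubfileType tag, which has not been verified for the file attached to this report. The helper names and the `SKETCH_` constants are hypothetical; the numeric values are taken from the TIFF 6.0 specification and are redefined locally so that the sketch does not depend on libtiff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the TIFF 6.0 specification. libtiff's tiff.h exposes the
 * same numbers as FILETYPE_REDUCEDIMAGE and COMPRESSION_OJPEG; they are
 * redefined here only to keep the sketch free of a libtiff dependency. */
#define SKETCH_FILETYPE_REDUCEDIMAGE 0x1u
#define SKETCH_COMPRESSION_OJPEG     6u

/* Hypothetical helper: true when the NewSubfileType value marks the
 * directory as a reduced-resolution copy of another image. */
static bool sketch_is_reduced_image(uint32_t subfile_type)
{
    return (subfile_type & SKETCH_FILETYPE_REDUCEDIMAGE) != 0;
}

/* Hypothetical policy: skip a directory quietly only when it is both a
 * reduced-resolution image and compressed with old-style JPEG. Any other
 * old-style JPEG directory would still be reported as unsupported. */
static bool sketch_should_skip_silently(uint32_t subfile_type,
                                        uint16_t compression)
{
    return sketch_is_reduced_image(subfile_type) &&
           compression == SKETCH_COMPRESSION_OJPEG;
}
```

In a real reader the two inputs would presumably come from the current directory, for example by querying TIFFTAG_SUBFILETYPE and TIFFTAG_COMPRESSION with TIFFGetFieldDefaulted() before the COMPRESSION_OJPEG check quoted above. Whether that is the right place for such a check is left to the maintainers.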