
Error in pixReadFromTiffStream: old style jpeg format is not supported #4475

@milahu

Description


Current Behavior

Tesseract 5.5.1 tries to read the thumbnail images embedded in TIFF files,
but it fails to decode them and prints this message:

Error in pixReadFromTiffStream: old style jpeg format is not supported

Test image

helloworld-with-thumb.tiff

wget https://github.com/user-attachments/files/23533251/helloworld-with-thumb.tiff
tesseract -l eng helloworld-with-thumb.tiff -
Page 1
Hello world
Error in pixReadFromTiffStream: old style jpeg format is not supported

Reproduce

In GIMP: Export As -> test.tiff, with compression: JPEG and "save thumbnail" enabled
Run: tesseract test.tiff -
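To check which directories in a file like test.tiff are thumbnails, you can inspect the TIFF's chain of image file directories (IFDs) and read two tags from each: NewSubfileType (tag 254, where bit 0 marks a reduced-resolution image) and Compression (tag 259, where 6 is old-style JPEG and 7 is JPEG). The following is a minimal plain-C sketch under these assumptions: the file is classic TIFF (not BigTIFF), and the thumbnail is stored as an additional top-level IFD. SubIFDs are not followed. The names `ifd_info` and `list_ifds` are hypothetical and are not part of Tesseract or Leptonica.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tag and type values from the TIFF 6.0 specification. */
#define TAG_NEWSUBFILETYPE 254
#define TAG_COMPRESSION    259
#define TIFF_SHORT         3

typedef struct {
    uint32_t subfiletype;  /* NewSubfileType: bit 0 set = reduced-resolution (thumbnail) */
    uint32_t compression;  /* 1 = none, 6 = old-style JPEG (OJPEG), 7 = JPEG */
} ifd_info;

/* Read 16/32-bit values in the file's byte order (big = 1 for "MM"). */
static uint16_t rd16(const uint8_t *p, int big) {
    return big ? (uint16_t)((p[0] << 8) | p[1]) : (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p, int big) {
    return big ? ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3]
               : (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walks the top-level IFD chain of an in-memory classic TIFF file and stores
 * up to max entries in out. Returns the number stored, or -1 if malformed. */
static int list_ifds(const uint8_t *buf, size_t len, ifd_info *out, int max) {
    if (len < 8) return -1;
    int big;
    if (memcmp(buf, "II", 2) == 0) big = 0;       /* little-endian */
    else if (memcmp(buf, "MM", 2) == 0) big = 1;  /* big-endian */
    else return -1;
    if (rd16(buf + 2, big) != 42) return -1;      /* classic TIFF magic number */

    uint32_t off = rd32(buf + 4, big);            /* offset of the first IFD */
    int n = 0;
    while (off != 0 && n < max) {                 /* max also bounds cyclic chains */
        if ((size_t)off + 2 > len) return -1;
        uint16_t count = rd16(buf + off, big);
        if ((size_t)off + 2 + 12u * count + 4 > len) return -1;

        ifd_info info = {0, 1};                   /* spec defaults: full-res, uncompressed */
        for (uint16_t i = 0; i < count; i++) {
            const uint8_t *e = buf + off + 2 + 12u * i;  /* 12-byte directory entry */
            uint16_t tag = rd16(e, big), type = rd16(e + 2, big);
            /* Single SHORT values sit in the first 2 bytes of the value field. */
            uint32_t value = (type == TIFF_SHORT) ? rd16(e + 8, big) : rd32(e + 8, big);
            if (tag == TAG_NEWSUBFILETYPE) info.subfiletype = value;
            else if (tag == TAG_COMPRESSION) info.compression = value;
        }
        out[n++] = info;
        off = rd32(buf + off + 2 + 12u * count, big);  /* next IFD, 0 = end */
    }
    return n;
}
```

Load the file into memory and call `list_ifds` on it. If an entry has `subfiletype & 1` set and `compression == 6`, that directory is an old-style JPEG thumbnail, which is the case that triggers the Leptonica error.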

Expected Behavior

Tesseract should recognize that these are only thumbnail images
and silently ignore them.
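Here is a minimal sketch of the page selection this expected behavior implies. It assumes each thumbnail is a separate TIFF directory whose NewSubfileType has the reduced-resolution bit (libtiff's `FILETYPE_REDUCEDIMAGE`, value 0x1) set. The struct and function names are hypothetical. This is not how Tesseract currently iterates pages.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of the TIFF NewSubfileType tag (254): reduced-resolution image. */
#define SUBFILE_REDUCED_IMAGE 0x1u

/* Hypothetical per-directory summary. With libtiff, the values could be read via
 * TIFFGetFieldDefaulted(tif, TIFFTAG_SUBFILETYPE, &v) and TIFFTAG_COMPRESSION. */
typedef struct {
    uint32_t subfiletype;
    uint16_t compression;  /* kept for context only; it does not affect selection */
} dir_info;

/* Returns 1 if the directory is only a thumbnail. */
static int is_thumbnail(const dir_info *d) {
    return (d->subfiletype & SUBFILE_REDUCED_IMAGE) != 0;
}

/* Writes the indices of directories that should be OCRed into pages,
 * skipping thumbnails before any decoder sees them. Returns the count. */
static size_t select_pages(const dir_info *dirs, size_t ndirs, size_t *pages) {
    size_t n = 0;
    for (size_t i = 0; i < ndirs; i++) {
        if (!is_thumbnail(&dirs[i]))
            pages[n++] = i;
    }
    return n;
}
```

Filtering on the subfile type, rather than on the compression scheme, means an old-style JPEG thumbnail is never decoded at all. A full-resolution page in an unsupported format would still be reported.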

Suggested Fix

No response

tesseract version

$ tesseract -v
tesseract 5.5.1
 leptonica-1.85.0
  libgif 5.2.2 : libjpeg 6b (libjpeg-turbo 3.1.0) : libpng 1.6.47 : libtiff 4.7.0 : zlib 1.3.1 : libwebp 1.5.0 : libopenjp2 2.5.2

Operating System

No response

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

The message comes from leptonica/src/tiffio.c:

        /* Old style jpeg is not supported.  We tried supporting 8 bpp.
         * TIFFReadScanline() fails on this format, so we used RGBA
         * reading, which generates a 4 spp image, and pulled out the
         * red component.  However, there were problems with double-frees
         * in cleanup.  For RGB, tiffbpl is exactly half the size that
         * you would expect for the raster data in a scanline, which
         * is 3 * w.  */
    TIFFGetFieldDefaulted(tif, TIFFTAG_COMPRESSION, &tiffcomp);
    if (tiffcomp == COMPRESSION_OJPEG) {
        L_ERROR("old style jpeg format is not supported\n", __func__);
        return NULL;
    }
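One way to get the expected behavior without adding old-style JPEG support would be to make this check depend on the directory's NewSubfileType, so it stays quiet only for thumbnails. The sketch below models that decision in plain C. The enum and function names are hypothetical, and this is not Leptonica's actual code. In `pixReadFromTiffStream` the subfile type would have to be read first, for example with `TIFFGetFieldDefaulted(tif, TIFFTAG_SUBFILETYPE, &subfiletype)`.

```c
#include <stdint.h>

/* TIFF 6.0 values; libtiff names them COMPRESSION_OJPEG and FILETYPE_REDUCEDIMAGE. */
#define OJPEG_COMPRESSION  6u
#define REDUCED_IMAGE_FLAG 0x1u

/* Hypothetical outcomes for reading one TIFF directory. */
typedef enum { DIR_DECODE, DIR_SKIP_QUIETLY, DIR_ERROR } dir_action;

/* Decides what to do with a directory before decoding it. Only the
 * old-style JPEG case from tiffio.c is modeled; everything else is decoded. */
static dir_action classify_directory(uint32_t compression, uint32_t subfiletype) {
    if (compression != OJPEG_COMPRESSION)
        return DIR_DECODE;
    /* An OJPEG thumbnail is still unsupported, but not worth an error message. */
    if (subfiletype & REDUCED_IMAGE_FLAG)
        return DIR_SKIP_QUIETLY;
    /* A full-resolution OJPEG page keeps the existing L_ERROR behavior. */
    return DIR_ERROR;
}
```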

This continues #3008.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
