Vertical writing systems are not handled correctly in gImageReader #683

Open
@lhy7889678

Text in vertical writing systems can be OCRed (fairly) reliably with the tesseract command-line tool, but gImageReader produces garbled characters for the same input by default. Horizontal writing systems are not affected.

Here are some sample images (in chi_sim, jpn, chi_sim_vert, jpn_vert respectively):

[images: chi_sim, jpn, chi_sim_vert, jpn_vert]

Here are the results using tesseract:

[screenshot: tesseract]

(縦組み is not OCRed correctly, but that is not a big problem.)
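
For anyone who wants to reproduce the command-line comparison, here is a minimal sketch of how it could be scripted. The exact command used for the results above is not shown in this report, so the invocation below is an assumption: it relies only on the standard `tesseract IMAGE OUTPUTBASE -l LANG` form, with `stdout` as the output base. The helper names `build_tesseract_cmd` and `run_ocr` and the file name in the comment are hypothetical.

```python
import subprocess


def build_tesseract_cmd(image_path, lang):
    """Build a tesseract CLI invocation that prints recognized text to stdout."""
    # Passing "stdout" as the output base makes tesseract write the text
    # to standard output instead of creating a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_ocr(image_path, lang):
    """Run tesseract and return its text output (needs tesseract on PATH)."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example call (hypothetical file name, assumes jpn_vert tessdata is installed):
#   print(run_ocr("jpn_vert.png", "jpn_vert"))
```

Running this against the same image that gImageReader garbles makes it easy to compare the two outputs side by side.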

Here is the result using gImageReader (taking jpn_vert as an example):

[screenshot: gimagereader]

I noticed that after rotating the image 90° counterclockwise, the result is correct:

[screenshot: gimagereader_rot]

(and 縦組み is OCRed correctly!)
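
To make the rotation workaround concrete, here is a small sketch of what "rotate 90° counterclockwise" does to row-major image data. It is only an illustration on a toy grid, not gImageReader's code, and `rotate_ccw` is a hypothetical helper.

```python
def rotate_ccw(grid):
    """Rotate a row-major 2D grid 90 degrees counterclockwise."""
    # Transposing turns columns into rows; reversing the row order then
    # puts the original rightmost column on top.
    return [list(row) for row in zip(*grid)][::-1]


page = [
    ["A", "B"],
    ["C", "D"],
]
rotated = rotate_ccw(page)
# The original right-hand column (B, D) is now the top row.
```

On a real image the same rotation can be done with Pillow's `Image.rotate(90, expand=True)`, assuming Pillow is available; whether that matches what gImageReader's own rotation control does internally is not something this report confirms.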

This problem was already reported in issue #552, but there it was mistakenly regarded as a bug in tessdata. Since the tesseract command-line tool handles the same images correctly with the same tessdata, the fault must lie in gImageReader.

I'm using gImageReader 3.4.2 and tesseract 5.4.1 on Arch Linux, with the default tessdata provided by tesseract. The gImageReader "About" dialog reports tesseract 5.3.4 rather than 5.4.1, so this version mismatch might be related to the problem.
