-
It could be a bug happening at mmocr/mmocr/utils/img_utils.py, line 63 in b0b6dad, which crops the detected region, but with some slight padding. You can try to bypass the padding.
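A minimal sketch of that workaround: the snippet below crops just the tight bounding rectangle of a detected quadrangle, with no extra padding. It is an illustrative stand-in, not MMOCR's actual code; if the crop_img helper in your MMOCR version exposes pad-ratio arguments, passing zeros there may already be enough.

```python
import numpy as np

def crop_without_padding(src_img, box):
    """Crop the tight axis-aligned rectangle around a quadrangle box.

    box: 8 floats [x1, y1, x2, y2, x3, y3, x4, y4], as produced by the
    detector. Illustrative replacement for the padded crop in
    img_utils.py, not MMOCR's own implementation.
    """
    pts = np.asarray(box, dtype=np.float32).reshape(-1, 2)
    h, w = src_img.shape[:2]
    # Clamp the rectangle to the image bounds before slicing.
    x0 = max(int(np.floor(pts[:, 0].min())), 0)
    x1 = min(int(np.ceil(pts[:, 0].max())), w)
    y0 = max(int(np.floor(pts[:, 1].min())), 0)
    y1 = min(int(np.ceil(pts[:, 1].max())), h)
    return src_img[y0:y1, x0:x1]
```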
-
I would also like to suggest that the issue you are experiencing is known as a "word segmentation error" or "tokenization error". Word segmentation is the process of dividing a sentence into individual words, or tokens, and it is a critical step in optical character recognition (OCR) tasks. In your case, it seems that the segmentation performed by the recognition model is not accurate, resulting in letters from adjacent words being included in each word prediction.

There could be several reasons why the recognition model is struggling with word segmentation. One possible cause is the font and style of the text in the image: certain fonts and styles can be more challenging for OCR algorithms to read accurately, especially if the characters are closely spaced or have unusual shapes.

To handle such predictions, there are a few strategies you could try. One approach is to preprocess the image to enhance the contrast and sharpness of the text, which could make the characters more distinct and easier to segment accurately; see the sketch below.
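A minimal preprocessing sketch along those lines, using OpenCV: CLAHE for local contrast and an unsharp mask for sharpening. The parameter values are illustrative starting points, not tuned for any particular dataset.

```python
import cv2

def enhance_for_ocr(img_bgr):
    """Boost contrast and sharpness of text before running OCR."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    # Contrast Limited Adaptive Histogram Equalization: raises local
    # contrast without blowing out bright regions.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    contrasted = clahe.apply(gray)
    # Unsharp mask: img + 0.5 * (img - blurred) emphasizes edges.
    blurred = cv2.GaussianBlur(contrasted, (0, 0), sigmaX=2.0)
    return cv2.addWeighted(contrasted, 1.5, blurred, -0.5, 0)

# Usage: feed the enhanced crop to the recognizer instead of the raw one.
# crop = enhance_for_ocr(cv2.imread("word_crop.png"))
```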
-
If the actual text in the image is, for example:
"What are the components of a yield"
the detection model (DBNet++) detects a bbox for each of the words,
but the recognition model (SAR) is adding letters from adjacent words to each word prediction. It gives something like:
"?Whata tare ethe components ofay ayield"
What is causing this, and how do I handle such predictions?