-
It could be a bug happening at mmocr/mmocr/utils/img_utils.py, line 63 in b0b6dad, which crops the detected region, but with some slight padding. You can try to bypass the padding.
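A minimal sketch of that workaround: the snippet below crops just the tight bounding rectangle of a detected quadrangle, with no extra padding. It is an illustrative stand-in, not MMOCR's actual code; if the crop_img helper in your MMOCR version exposes pad-ratio arguments, passing zeros there may already be enough.

```python
import numpy as np

def crop_without_padding(src_img, box):
    """Crop the tight axis-aligned rectangle around a quadrangle box.

    box: 8 floats [x1, y1, x2, y2, x3, y3, x4, y4], as produced by the
    detector. Illustrative replacement for the padded crop in
    img_utils.py, not MMOCR's own implementation.
    """
    pts = np.asarray(box, dtype=np.float32).reshape(-1, 2)
    h, w = src_img.shape[:2]
    # Clamp the rectangle to the image bounds before slicing.
    x0 = max(int(np.floor(pts[:, 0].min())), 0)
    x1 = min(int(np.ceil(pts[:, 0].max())), w)
    y0 = max(int(np.floor(pts[:, 1].min())), 0)
    y1 = min(int(np.ceil(pts[:, 1].max())), h)
    return src_img[y0:y1, x0:x1]
```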
-
I would also like to suggest that the issue you are experiencing is known as a "word segmentation error" or "tokenization error". Word segmentation is the process of dividing a sentence into individual words, or tokens, and it is a critical step in optical character recognition (OCR) tasks. In your case, it seems that the segmentation performed by the recognition model is not accurate, resulting in letters from adjacent words being included in each word prediction.

There could be several reasons why the recognition model is struggling with word segmentation. One possible cause is the font and style of the text in the image: certain fonts and styles can be more challenging for OCR algorithms to read accurately, especially if the characters are closely spaced or have unusual shapes.

To handle such predictions, there are a few strategies you could try. One approach is to preprocess the image to enhance the contrast and sharpness of the text, which could make the characters more distinct and easier to segment accurately; see the sketch below.
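A minimal preprocessing sketch along those lines, using OpenCV: CLAHE for local contrast and an unsharp mask for sharpening. The parameter values are illustrative starting points, not tuned for any particular dataset.

```python
import cv2

def enhance_for_ocr(img_bgr):
    """Boost contrast and sharpness of text before running OCR."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    # Contrast Limited Adaptive Histogram Equalization: raises local
    # contrast without blowing out bright regions.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    contrasted = clahe.apply(gray)
    # Unsharp mask: img + 0.5 * (img - blurred) emphasizes edges.
    blurred = cv2.GaussianBlur(contrasted, (0, 0), sigmaX=2.0)
    return cv2.addWeighted(contrasted, 1.5, blurred, -0.5, 0)

# Usage: feed the enhanced crop to the recognizer instead of the raw one.
# crop = enhance_for_ocr(cv2.imread("word_crop.png"))
```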
-
If the actual text in the image is, for example:
"What are the components of a yield"
the detection model (DBNet++) detects a bbox for each of the words,
but the recognition model (SAR) is adding letters from adjacent words to each word prediction. It gives something like:
"?Whata tare ethe components ofay ayield"
What is causing this, and how do I handle such predictions?