Require detailed explanation on a few points #423
-
Hello fg-mindee & charlesmindee,
-
Hi @vishveshtrivedi 👋 Glad to hear that the library is useful! Here are some answers to your questions:
- The --pretrained flag will start your training from the version we trained. You will need to format your dataset to the format mentioned in the README for it to work.
- Passing return_model_output=True as an argument of your…
Hope this helps, let me know if I misunderstood something :)
-
Thanks for the quick reply.
Thanks a lot!!
-
Correct!
Regarding the quantity of data, cf. my answer on point 6 ;)
For now, it's true that users cannot change the threshold for postprocessing. But once your model is instantiated, you can always do: model.postprocessor.box_thresh = your_new_threshold. Setting a lower value will keep more boxes.
It's the postprocessing from the paper; if you want to check the code: https://github.com/mindee/doctr/blob/main/doctr/models/detection/core.py#L85-L116
predictions are the postprocessed results (the boxes), while out_map is the logits tensor coming out of the model (a segmentation map of sorts) :) Let me know if that isn't clear!
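To make the threshold tweak above concrete, here is a minimal sketch, assuming the TensorFlow db_resnet50 detection model from the doctr version discussed in this thread (the threshold value and the dummy input are placeholders; attribute names and output keys may differ slightly between releases):

```python
import numpy as np
from doctr.models import detection

# Instantiate a pretrained DB-based detection model
model = detection.db_resnet50(pretrained=True)

# Loosen the postprocessing: a lower box_thresh keeps more candidate boxes
model.postprocessor.box_thresh = 0.05  # illustrative value, the default is higher

# Dummy batch: one 1024x1024 RGB page with pixel values in [0, 1]
pages = np.random.rand(1, 1024, 1024, 3).astype(np.float32)

# return_model_output=True also exposes the raw segmentation map ("out_map"),
# i.e. the logits tensor, alongside the postprocessed predictions (the boxes)
out = model(pages, return_model_output=True)
```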
-
Thanks a lot for the reply!!
-
Simply put:
If at some point we see that we need bigger architectures, we'll try them; for now we favour lighter models 👍
-
Thanks!!!!
-
Hi @fg-mindee,
-
Hi @vishveshtrivedi, The bin_thresh value is used to binarize the raw segmentation map: if you lower it, you will most likely detect more words, but the risk is to lose the space between words. It should probably lead to a higher recall and a lower precision for the detection task. The recognition task does not use bin_thresh, but the boxes detected in the detection task with bin_thresh are used to recognize words, so in a way it is related. If you have too high a bin_thresh you will end up with (too) large boxes, maybe with more than one word in each box. This can lead to a bad final recognition result, because we don't have spaces in our vocabularies and thus our models can only deal with one word per box.
Thank you and have a good day!
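As a rough illustration of the role of bin_thresh (a conceptual sketch only, not the library's implementation), the raw segmentation map is thresholded before boxes are fitted around the connected text regions:

```python
import numpy as np

# Dummy probability map, standing in for the detection model's output for one page
prob_map = np.random.rand(1024, 1024).astype(np.float32)

bin_thresh = 0.3  # illustrative value
binary_map = (prob_map >= bin_thresh).astype(np.uint8)

# Lowering bin_thresh marks more pixels as text, so neighbouring words are more
# likely to merge into one oversized box; since the vocabularies contain no
# space character, a box holding several words hurts the recognition step.
```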
-
Hi @charlesmindee, Thanks a lot!
-
For the first point, it seems quite logical to detect more words when you decrease the threshold, as explained above.
-
Hi @charlesmindee,
-
Hi, for the first case the result is logical, as you mentioned; for the second case it is quite weird. I must admit I can't really explain that: it is strange because the model did recognize the right digits but replaced the second slash and removed the first one. Which recognition model did you use?
-
Hi @charlesmindee,
-
Hi @vishveshtrivedi,
-
Hi @charlesmindee, Also, an interesting thing I noticed: when I changed the image DPI from 500 to 600 (we are converting the PDF to images), the recognition improved a lot in a few images. Is there some reason for the recognition being so sensitive to DPI? Finally, would it be recommended for the input image to be preprocessed in a certain way (binarized, greyscale, deblurred, etc.)? Thank you so much!
-
Hi @vishveshtrivedi, If you increase the DPI, you will have higher-resolution images from your PDF pages, and this can help the recognition model distinguish letters written in small fonts, or slightly blurred lines which can't be resolved at a lower resolution. However, 500 DPI is already a huge resolution (4134 x 5846 pixels for an A4 page). Are you feeding the model with a document from_pdf, or do you convert your PDF to images before creating your document object? We almost exclusively work with a DPI of 144, and it seems to be enough for A4 PDF pages. You don't need to binarize/greyscale/... or otherwise preprocess your images before feeding the model, it should work fine! Of course, if you work with particularly noisy or blurred documents, preprocessing the data should only improve your performance. Have a nice day!
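For reference, a hedged sketch of converting a PDF to images at a chosen DPI before building the document object (pdf2image is just one common tool for this conversion; the thread does not say which converter was actually used):

```python
from pdf2image import convert_from_path  # requires poppler to be installed

# Render each page of the PDF as a PIL image at 144 DPI
pages = convert_from_path("path/to/document.pdf", dpi=144)
print(len(pages), pages[0].size)  # number of pages, (width, height) of the first one
```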
-
Hi @charlesmindee, Thanks a lot!
-
OK, you can also use our PDF converter by instantiating a document with from_pdf(); it will use a 144 DPI rate for the conversion. Thanks, and don't hesitate to come back with new questions!
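A minimal sketch of that route, assuming the DocumentFile reader (the module path and return type have moved between releases: older versions expose it under doctr.documents and may need an extra .as_images() call, newer ones under doctr.io return numpy arrays directly):

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Render the PDF pages with the built-in converter (144 DPI by default)
doc = DocumentFile.from_pdf("path/to/document.pdf")

# Run the end-to-end predictor on the rendered pages
predictor = ocr_predictor(pretrained=True)
result = predictor(doc)
```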
-
Hi @charlesmindee,
-
Hi @charlesmindee,
-
Hi @charlesmindee,
-
Hi @charlesmindee,