How to perform inference with a model I trained with docTR? #568

kforcodeai · 2021-11-01T07:45:42Z

I have trained a detection model and a recognition model on my own dataset using your framework, but I am unable to use them with ocr_predictor.
How do I pass the model path to

doctr/doctr/models/zoo.py

Line 26 in b149266

def ocr_predictor(
    det_arch: str = 'db_resnet50',
    reco_arch: str = 'crnn_vgg16_bn',
    pretrained: bool = False,
    **kwargs: Any
) -> OCRPredictor:
    """End-to-end OCR architecture using one model for localization, and another for text recognition.
    Example::
        >>> import numpy as np
        >>> from doctr.models import ocr_predictor
        >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
        >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
        >>> out = model([input_page])
    Args:
        arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
        pretrained: If True, returns a model pre-trained on our OCR dataset
    Returns:
        OCR predictor
    """

    return _predictor(det_arch, reco_arch, pretrained, **kwargs)

What kwargs should I use?

Also, even when I put my models in cache_dir, the code goes on to download its own and throws a hash mismatch error.
https://github.com/mindee/doctr/blob/b149266ea57fd59047193a01c328c2b8ecb9330a/doctr/models/data_utils.py

@fg-mindee

Answered by fg-mindee

fg-mindee · 2021-11-01T11:01:37Z

Hello @K-for-Code 👋

The factory function ocr_predictor is a bit more high-level than that, but you can easily achieve what you want :)
Here is a short example of how to do this:

import os

os.environ["USE_TORCH"] = "1"

import torch
from doctr.models.predictor import OCRPredictor
from doctr.models.detection.predictor import DetectionPredictor
from doctr.models.recognition.predictor import RecognitionPredictor
from doctr.models.preprocessor import PreProcessor
# from doctr.models.utils import load_pretrained_params

# Instantiate your model here
det_model = ...
reco_model = ...
# Load the checkpoints you produced
# Option 1: from a URL
# load_pretrained_params(det_model, "<URL_TO_DET_CHECKPOINT>")
# load_pretrained_params(reco_model, "<URL_TO_RECO_CHECKPOINT>")
# Option 2 (PyTorch): from local checkpoint files
det_params = torch.load("path/to/your/local/det_checkpoint.pt", map_location="cpu")
reco_params = torch.load("path/to/your/local/reco_checkpoint.pt", map_location="cpu")
det_model.load_state_dict(det_params)
reco_model.load_state_dict(reco_params)

# Ask the preprocessor of each task to resize and normalize similarly to your training
# cf. https://github.com/mindee/doctr/blob/main/references/detection/train_pytorch.py#L94 & https://github.com/mindee/doctr/blob/main/references/detection/train_pytorch.py#L109
det_predictor = DetectionPredictor(PreProcessor((1024, 1024), batch_size=1, mean=(0.798, 0.785, 0.772), std=(0.264, 0.2749, 0.287)), det_model)
# cf. https://github.com/mindee/doctr/blob/main/references/recognition/train_pytorch.py#L97 & https://github.com/mindee/doctr/blob/main/references/recognition/train_pytorch.py#L111
reco_predictor = RecognitionPredictor(PreProcessor((32, 128), preserve_aspect_ratio=True, batch_size=32, mean=(0.694, 0.695, 0.693), std=(0.299, 0.296, 0.301)), reco_model)

predictor = OCRPredictor(det_predictor, reco_predictor)

Let me know if you still have questions!
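
The mean and std values passed to PreProcessor matter because the model only behaves well on inputs scaled the way they were during training. As a rough illustration of per-channel normalization, here is a minimal sketch. It assumes 8-bit pixels are first scaled to [0, 1] and then standardized, which is the conventional scheme and not necessarily exactly what PreProcessor does internally; normalize_pixel is a hypothetical helper, not part of docTR.

```python
from typing import Sequence, Tuple


def normalize_pixel(
    rgb: Sequence[int], mean: Sequence[float], std: Sequence[float]
) -> Tuple[float, ...]:
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, mean, std))


# Detection statistics taken from the snippet above
det_mean = (0.798, 0.785, 0.772)
det_std = (0.264, 0.2749, 0.287)

# A pure white pixel, normalized with the detection statistics
white = normalize_pixel((255, 255, 255), det_mean, det_std)
print(white)
```

If the statistics used at inference differ from those used in training, every input is shifted and rescaled relative to what the model saw, which typically degrades predictions.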

15 replies

fg-mindee Nov 9, 2021

This is really strange: I'm unable to reproduce this behaviour on my end, so it's quite hard to understand where it comes from. I'm almost certain it's because your instantiated sar_resnet31 is the TF version.
But os.environ["USE_TORCH"] = "1" should have done the trick.

If you are running this in a notebook or something similar, make sure to execute

import os
os.environ["USE_TORCH"] = "1"

before any import of doctr. To confirm that this has worked, could you add the following lines at the end of your script and report back the console output, please?

from doctr.file_utils import is_tf_available, is_torch_available
print(f"TF available: {is_tf_available()}, PyTorch available: {is_torch_available()}")

You should be getting (False, True) for the snippet to work. I suspect that you're getting (True, True) or (True, False).

kforcodeai Nov 11, 2021
Author

@fg-mindee, yes I am using a Jupyter Notebook.

fg-mindee Nov 12, 2021

I'm really unable to reproduce this behaviour. Could you try running your code as a Python script or directly from the command line, rather than in Jupyter, please?

kforcodeai Nov 12, 2021
Author

@fg-mindee I was able to resolve this: if we move the import torch statement before importing sar_resnet31, it works.

fg-mindee Nov 12, 2021

That's quite strange 😅

But glad you managed to solve this!

adesgautam · 2021-12-14T08:14:07Z

Hi, how do I perform inference with trained recognition models using TensorFlow?

2 replies

fg-mindee Dec 14, 2021

Almost the same but only doing text recognition:

import os

os.environ["USE_TF"] = "1"

from doctr.models.recognition.predictor import RecognitionPredictor
from doctr.models.preprocessor import PreProcessor

# Instantiate your model here
reco_model = ...
# Load the checkpoints you produced
reco_model.load_weights("path/to/your/local/reco_checkpoint")

# Ask the preprocessor of each task to resize and normalize similarly to your training
# cf. https://github.com/mindee/doctr/blob/main/references/recognition/train_tensorflow.py
reco_predictor = RecognitionPredictor(PreProcessor((32, 128), preserve_aspect_ratio=True, batch_size=32, mean=(0.694, 0.695, 0.693), std=(0.299, 0.296, 0.301)), reco_model)

# Do inference
input_tensor = ...
out = reco_predictor(input_tensor)

Let me know if you encounter some problems!

adesgautam Dec 14, 2021

Thank you so much!

I used the following to load the model:

from doctr.models import crnn_vgg16_bn
from doctr.models.recognition.predictor import RecognitionPredictor
from doctr.models.preprocessor import PreProcessor

# Build the architecture without pretrained weights, then load the trained checkpoint
model = crnn_vgg16_bn(pretrained=False, pretrained_backbone=False)
model.load_weights("doctr/crnn_vgg16_bn_20211214-090713/weights")
reco_predictor = RecognitionPredictor(
    PreProcessor(
        (32, 128),
        preserve_aspect_ratio=True,
        batch_size=1,
        mean=(0.694, 0.695, 0.693),
        std=(0.299, 0.296, 0.301),
    ),
    model,
)

dhea1323 · 2022-03-23T09:00:05Z

Hi @fg-mindee,
I want to run the detection and recognition stages separately, without predictor = OCRPredictor(det_predictor, reco_predictor). Here's an example:

doc = DocumentFile.from_pdf("path/to/pdf_file.pdf")
# Detection model
det_model = db_resnet50(pretrained=False)
det_param = torch.load("./path/to/load_model.pt", map_location="cpu")
det_model.load_state_dict(det_param)
det_predictor = DetectionPredictor(PreProcessor((1024, 1024), batch_size=1, mean=(0.798, 0.785, 0.772), std=(0.264, 0.2749, 0.287)), det_model)
detection = det_predictor(doc)

# Recognition model
reco_model = crnn_vgg16_bn(pretrained=False)
reco_param = torch.load("./path/to/load_model.pt", map_location="cpu")
reco_model.load_state_dict(reco_param)
reco_predictor = RecognitionPredictor(PreProcessor((32, 128), preserve_aspect_ratio=True, batch_size=32, mean=(0.694, 0.695, 0.693), std=(0.299, 0.296, 0.301)), reco_model)        
recognition = reco_predictor(detection)

And of course I got an error 😅 because I don't know what I'm supposed to do with the output of the detection stage, detection = det_predictor(doc), before proceeding to the recognition stage, recognition = reco_predictor(detection).

Here's the error: ValueError: incorrect input shape: all crops are expected to be multi-channel 2D images.
I expect to get the final result as a string with .render().

Thank you.

1 reply

frgfm Aug 5, 2022
Maintainer

Oops, for some reason I didn't see your message @dhea1323!

Did you manage to solve your problem?
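
The ValueError quoted above indicates that the recognition stage expects image crops (multi-channel 2D images), not the raw output of the detection stage. One way to bridge the two stages is to cut each detected box out of the page first. The following is a minimal sketch, assuming the detection output can be reduced to relative (xmin, ymin, xmax, ymax) boxes in [0, 1]; the exact structure returned by DetectionPredictor depends on the docTR version, and rel_box_to_slices is a hypothetical helper, not part of docTR.

```python
import math
from typing import Tuple


def rel_box_to_slices(
    box: Tuple[float, float, float, float], height: int, width: int
) -> Tuple[slice, slice]:
    """Convert a relative (xmin, ymin, xmax, ymax) box to (row, column) pixel slices."""
    xmin, ymin, xmax, ymax = (min(max(v, 0.0), 1.0) for v in box)  # clamp to [0, 1]
    x0, x1 = math.floor(xmin * width), math.ceil(xmax * width)
    y0, y1 = math.floor(ymin * height), math.ceil(ymax * height)
    return slice(y0, y1), slice(x0, x1)


# Stand-in for a 600 x 800 page: a nested list of RGB pixels (rows, then columns)
height, width = 600, 800
page = [[(0, 0, 0)] * width for _ in range(height)]

rows, cols = rel_box_to_slices((0.25, 0.5, 0.75, 1.0), height, width)
crop = [row[cols] for row in page[rows]]
print(len(crop), len(crop[0]))  # crop height and width in pixels
```

With a NumPy page of shape (H, W, C), the same slices would give crop = page[rows, cols], and the resulting list of crops is the kind of input the recognition predictor checks for.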

shahdghorsi · 2022-11-24T17:58:18Z

Hello @fg-mindee,
I used one of the pretrained models and I am trying to save the output instead of showing it, but I could not achieve that.
result = model(doc)
result.show(doc)  # what should I use instead of show() here to save the output?

Thanks,

4 replies

felixdittrich92 Nov 25, 2022
Maintainer

Hi @adesgautam 👋,

You have some options:

.render() -> get the predicted text as a string
.export() -> get the results as a JSON-style dict
.export_as_xml() -> get the XML as a string, along with the whole ElementTree object
.synthesize() -> get a list of pages (each a blank image with the results printed on it) # I think this is what you are looking for :)
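
Since .export() returns a plain dict, one simple way to save the output is to write that dict to disk with the standard json module. This is a minimal sketch, assuming the dict contains only JSON-serializable values; the exported dict below is a hand-written stand-in for result.export(), and save_export is a hypothetical helper, not part of docTR.

```python
import json
import os
import tempfile


def save_export(exported: dict, path: str) -> None:
    """Write an exported OCR result to a UTF-8 encoded JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(exported, f, ensure_ascii=False, indent=2)


# Stand-in for `result.export()`; a real export holds the recognized content
exported = {"pages": [{"page_idx": 0, "blocks": []}]}

out_path = os.path.join(tempfile.gettempdir(), "doctr_export.json")
save_export(exported, out_path)
```

Note that JSON has no tuple type, so any tuples in the dict come back as lists when the file is reloaded.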

shahdghorsi Nov 25, 2022

Thank you so much @felixdittrich92, that saved a picture.

shahdghorsi Nov 25, 2022

I was wondering whether it is possible to get the original document, rather than a synthesized image, with the bounding boxes drawn on it.

felixdittrich92 Nov 25, 2022
Maintainer

We do not provide a 'ready to use' functionality for this, but you can solve it on your own :)

Ref: #570 (comment)

You could, for example, break it down into a for loop, draw all the boxes on each page with OpenCV, and save the image.
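
The per-page loop suggested here could start by turning the exported geometry into pixel coordinates. The sketch below assumes the export follows a pages → blocks → lines → words hierarchy, where each page carries "dimensions" as (height, width) and each word carries a relative "geometry" of ((xmin, ymin), (xmax, ymax)); check this against the output of your docTR version. word_boxes is a hypothetical helper, and the page dict is hand-written sample data.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def word_boxes(page: dict) -> List[Tuple[str, Point, Point]]:
    """Collect (text, top-left, bottom-right) pixel boxes for every word on a page."""
    height, width = page["dimensions"]
    boxes = []
    for block in page["blocks"]:
        for line in block["lines"]:
            for word in line["words"]:
                (xmin, ymin), (xmax, ymax) = word["geometry"]
                pt1 = (int(xmin * width), int(ymin * height))
                pt2 = (int(xmax * width), int(ymax * height))
                boxes.append((word["value"], pt1, pt2))
    return boxes


# Hand-written sample page in the assumed export layout
page = {
    "dimensions": (600, 800),
    "blocks": [
        {"lines": [{"words": [{"value": "Hello", "geometry": ((0.25, 0.5), (0.5, 0.75))}]}]}
    ],
}

boxes = word_boxes(page)
print(boxes)
```

Each (pt1, pt2) pair can then be passed to a drawing call such as cv2.rectangle(image, pt1, pt2, color, thickness) on the original page image, which can be written out afterwards with cv2.imwrite.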
