"I" (vowel) symbol recognition

### 🚀 The feature

The "I" (vowel) symbol is systematically missing from the recognition model output when using a vertical image layout with the **preserve_aspect_ratio** option set to **True**.

### Motivation, pitch

I have discovered that the model consistently struggles to recognize the vowel "I" in various positions within a sentence, particularly at the start and end, when the **preserve_aspect_ratio** parameter in the Pre-Processor is set to **True**.

Setting **preserve_aspect_ratio**=False helps mitigate the issue, but only for vertically oriented text.

For horizontally oriented text, however, setting **preserve_aspect_ratio**=True results in better recognition of "I" symbols.

I have also tried changing the interpolation method in the image resize preprocessing stage. While this improves the overall recognition quality, it does not affect the recognition of "I"s.

### Alternatives

I have three suggestions for fixing this issue:

1. Use a conditional check to set the **preserve_aspect_ratio** parameter based on the ratio of the image sides: detecting horizontal text would select **True**, and detecting vertical text would select **False**.

2. Allow the **preserve_aspect_ratio** value to be chosen when calling the model.

3. Add more horizontal text samples to the datasets and retrain/fine-tune the detection and recognition models.
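A minimal sketch of suggestion 1, assuming the orientation can be inferred from the page shape alone. The helper names `choose_preserve_aspect_ratio` and `select_predictor` are hypothetical and not part of doctr. The `predictors` mapping is assumed to hold two `ocr_predictor` instances built in advance, one per **preserve_aspect_ratio** value, which doubles the memory footprint.

```python
from typing import Any, Callable, Dict, Sequence


def choose_preserve_aspect_ratio(height: int, width: int) -> bool:
    """Hypothetical rule: True for landscape or square pages, False for portrait pages."""
    return width >= height


def select_predictor(
    page_shape: Sequence[int],
    predictors: Dict[bool, Callable[..., Any]],
) -> Callable[..., Any]:
    """Pick the predictor whose preserve_aspect_ratio matches the page orientation.

    `page_shape` is expected to be (height, width, ...), as in a NumPy image array.
    `predictors` maps a preserve_aspect_ratio value to a predictor built with it
    (assumption: both predictors are created once at startup).
    """
    height, width = page_shape[0], page_shape[1]
    return predictors[choose_preserve_aspect_ratio(height, width)]
```

With a page loaded as a NumPy array, usage would look like `select_predictor(page.shape, predictors)([page])`. The `width >= height` threshold is only an illustration; a different cut-off may work better on real documents.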

### Additional context

I tested the issue on the latest version of the doctr library (torch) with an image composed of different sentences containing "I"s.

The picture compares two OCR runs on the same image using different values of the **preserve_aspect_ratio** parameter.

Meaning of the box colors:

- blue marks outlier results with **preserve_aspect_ratio** set to **False**;
- red marks outlier results with **preserve_aspect_ratio** set to **True**;
- gray means there is no change between runs;
- other colors represent a partial difference.

The two attached JSON files show the doctr output on the same image. In `doctr_ocr_par_False` there are 10 "I" occurrences, while in `doctr_ocr_par_True` there are 5.


<img width="1189" height="454" alt="Image" src="https://github.com/user-attachments/assets/f7459f3b-bf81-4a6f-8349-be52c3f3321f" />

<img width="2550" height="3300" alt="Image" src="https://github.com/user-attachments/assets/b597dcab-944b-4e16-a2f0-fa21027111a6" />

[doctr_ocr_par_False.json](https://github.com/user-attachments/files/21184120/doctr_ocr_par_False.json)

[doctr_ocr_par_True.json](https://github.com/user-attachments/files/21184121/doctr_ocr_par_True.json)
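The two exports above can be compared with a short script. This is a sketch that assumes the usual nesting of `result.export()` (pages, blocks, lines, words, each word carrying a `value` string). The helper names are hypothetical, and the script counts every "I" character, which may not match exactly how the figures of 10 and 5 were obtained.

```python
import json
from typing import Any, Dict, Iterator


def iter_word_values(export: Dict[str, Any]) -> Iterator[str]:
    """Yield the text of every word in a doctr export dictionary."""
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    yield word.get("value", "")


def count_symbol(export: Dict[str, Any], symbol: str = "I") -> int:
    """Count occurrences of `symbol` across all recognized words."""
    return sum(value.count(symbol) for value in iter_word_values(export))


def load_export(path: str) -> Dict[str, Any]:
    """Load an export saved as JSON (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `count_symbol(load_export("doctr_ocr_par_False.json"))` would give the count for the first file, assuming it has been downloaded to the working directory.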

The basic script used for tests:
```python
from fastapi import FastAPI, UploadFile, File
import numpy as np
from doctr.models import ocr_predictor
from PIL import Image
import io
import torch
import uvicorn

app = FastAPI()

DETECTION_MODEL = "db_resnet50"
RECOGNITION_MODEL = "crnn_mobilenet_v3_large"
PRESERVE_ASPECT_RATIO = False

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = ocr_predictor(
    pretrained=True,
    det_arch=DETECTION_MODEL,
    reco_arch=RECOGNITION_MODEL,
    assume_straight_pages=True,
    preserve_aspect_ratio=PRESERVE_ASPECT_RATIO,
    symmetric_pad=True,
).to(device=DEVICE)

@app.post("/ocr")
async def ocr(file: UploadFile = File(...)):
    image = await file.read()
    await file.close()

    doc = []
    image_pil = Image.open(io.BytesIO(image)).convert("RGB")
    doc.append(np.asarray(image_pil))

    result = model(doc)
    return result.export()


if __name__ == "__main__":
    uvicorn.run("main:app", host="0.0.0.0", port=44556, reload=False)
```
