block segmentation: overlaps and quality of prebuilt models

Once I got the block segmentation to actually run, I was puzzled by the extremely poor results of the provided model.

Here's how I gradually worked to isolate the problem.

- using default 0.9 confidence threshold:

| a | b |
| --- | --- |
| ![FILE_0001_REGIONS-ANYOCR_bbox-best_pageviewer](https://user-images.githubusercontent.com/38561704/106795546-7f136780-665a-11eb-9e17-15c2c15185f8.png) | ![FILE_0002_REGIONS-ANYOCR_bbox-best_pageviewer](https://user-images.githubusercontent.com/38561704/106795582-8b97c000-665a-11eb-8296-eefe7e3263e6.png) |
- using lower 0.5 confidence threshold:

| a | b |
| --- | --- |
| ![FILE_0001_REGIONS-ANYOCR_bbox-all_pageviewer](https://user-images.githubusercontent.com/38561704/106795966-0b258f00-665b-11eb-8ca5-30adc25a66b9.png) | ![FILE_0002_REGIONS-ANYOCR_bbox-all_pageviewer](https://user-images.githubusercontent.com/38561704/106796012-18427e00-665b-11eb-9c08-2950c77d464d.png) |

- using default 0.9 confidence threshold, but annotating a polygon from the **mask**:

| a | b |
| --- | --- |
| ![FILE_0001_REGIONS-ANYOCR_mask-best_pageviewer](https://user-images.githubusercontent.com/38561704/106796082-30b29880-665b-11eb-8944-175afdcae58c.png) | ![FILE_0002_REGIONS-ANYOCR_mask-best_pageviewer](https://user-images.githubusercontent.com/38561704/106796106-38723d00-665b-11eb-9e54-9cece3f6f700.png) |
- using lower 0.5 confidence threshold, but annotating a polygon from the mask:

| a | b |
| --- | --- |
| ![FILE_0001_REGIONS-ANYOCR_mask-all_pageviewer](https://user-images.githubusercontent.com/38561704/106796164-4de76700-665b-11eb-8b63-9c58e51e52d1.png) | ![FILE_0002_REGIONS-ANYOCR_mask-all_pageviewer](https://user-images.githubusercontent.com/38561704/106796199-58096580-665b-11eb-8b0f-673fddbde014.png) |
- using lower 0.5 confidence threshold, but annotating a polygon from the mask, and doing non-maximum suppression and other **post-processing** (like checking for containment):

| a | b |
| --- | --- |
| ![FILE_0001_REGIONS-ANYOCR_mask-all-nms_pageviewer](https://user-images.githubusercontent.com/38561704/106872397-8aef4000-66d3-11eb-9a39-a8f048476e5c.png) | ![FILE_0002_REGIONS-ANYOCR_mask-all-nms_pageviewer](https://user-images.githubusercontent.com/38561704/106872444-980c2f00-66d3-11eb-9b16-25bb07402460.png) |
- using even lower 0.02 confidence threshold, but annotating a polygon from the **mask**, and **suppressing the classes** `header`, `footer`, `footnote`, `footnote-continued`, `endnote`, `keynote` (reserving their probability mass):

| a | b |
| --- | --- |
| ![FILE_0001_REGIONS-ANYOCR_mask-all-active_pageviewer](https://user-images.githubusercontent.com/38561704/106877262-dce69480-66d8-11eb-957d-e85eca828206.png) | ![FILE_0002_REGIONS-ANYOCR_mask-all-active_pageviewer](https://user-images.githubusercontent.com/38561704/106877298-e4a63900-66d8-11eb-9570-7b6f1e994e7f.png) |
- using even lower 0.02 confidence threshold, but annotating a polygon from the **mask**, and **suppressing the classes** `header`, `footer`, `footnote`, `footnote-continued`, `endnote`, `keynote` (reserving their probability mass), and doing non-maximum suppression and other **post-processing** (like checking for containment):

| a | b |
| --- | --- |
| ![FILE_0001_REGIONS-ANYOCR_mask-all-active-nms_pageviewer](https://user-images.githubusercontent.com/38561704/106877078-a90b6f00-66d8-11eb-9899-27582a9d0bb7.png) | ![FILE_0002_REGIONS-ANYOCR_mask-all-active-nms_pageviewer](https://user-images.githubusercontent.com/38561704/106877158-bfb1c600-66d8-11eb-9fbe-29fc65eb72da.png) |
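
To make the thresholding and class suppression in the list above concrete, here is a minimal sketch in plain Python. It is not the processor's actual code. The class list, its order and the function names are hypothetical. It also assumes that "reserving their probability mass" means the scores of the remaining classes are not renormalised after suppression, which is only one possible reading.

```python
# Hypothetical label set; the real model's classes and their order may differ.
CLASSES = ["paragraph", "heading", "footnote", "header", "footer", "image"]
SUPPRESSED = {"header", "footer", "footnote"}


def pick_label(probs, classes=CLASSES, suppressed=SUPPRESSED):
    """Return (label, score) of the best class that is not suppressed.

    Assumption: scores are NOT renormalised, so the probability the model
    assigned to suppressed classes is simply left unused.
    """
    candidates = [(p, c) for p, c in zip(probs, classes) if c not in suppressed]
    score, label = max(candidates)
    return label, score


def filter_detections(detections, min_confidence):
    """Keep detections whose best active class reaches min_confidence.

    detections: list of per-class probability vectors, one per region.
    """
    kept = []
    for probs in detections:
        label, score = pick_label(probs)
        if score >= min_confidence:
            kept.append((label, score))
    return kept


# A region the model is 90% sure is a footnote.
region = [0.05, 0.02, 0.90, 0.01, 0.01, 0.01]
print(filter_detections([region], min_confidence=0.9))
print(filter_detections([region], min_confidence=0.02))
```

Under these assumptions, a region that the model labels as a footnote with high confidence only survives as a generic text region if the threshold is set very low, which would be consistent with needing 0.02 once the classes are suppressed.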

So all these refinements seem crucial. 
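
For the post-processing step, a rough sketch of greedy non-maximum suppression with an additional containment check could look like the following. For brevity it works on axis-aligned boxes rather than on the masks or polygons used above, and the function names and both thresholds (`max_iou`, `max_contained`) are illustrative assumptions, not the values used for the screenshots.

```python
def area(box):
    """Area of a box (x0, y0, x1, y1); degenerate boxes count as 0."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def intersection(a, b):
    """Area of the overlap between two boxes."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return area((x0, y0, x1, y1))


def suppress(detections, max_iou=0.5, max_contained=0.9):
    """Greedy NMS plus containment check.

    detections: list of (score, box) with box = (x0, y0, x1, y1).
    Candidates are visited by descending score; one is dropped if it
    overlaps an already kept box too much (IoU) or lies almost entirely
    inside a kept box.
    """
    kept = []
    for score, box in sorted(detections, reverse=True):
        for _, other in kept:
            inter = intersection(box, other)
            union = area(box) + area(other) - inter
            if union and inter / union > max_iou:
                break  # near-duplicate of a better-scoring region
            if area(box) and inter / area(box) > max_contained:
                break  # swallowed by a better-scoring region
        else:
            kept.append((score, box))
    return kept


detections = [
    (0.9, (0, 0, 100, 100)),
    (0.8, (10, 10, 40, 40)),      # fully inside the first box
    (0.7, (5, 0, 100, 100)),      # near-duplicate of the first box
    (0.6, (200, 200, 300, 300)),  # disjoint
]
print(suppress(detections))
```

Note that the containment check here only drops a lower-scoring box inside a higher-scoring one. Whether the outer or the inner region should win is a design decision this sketch does not settle.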

But it appears that this model was trained on **highly overlapping** regions, which makes it next to impossible to avoid these overlaps during prediction. An equally serious problem seems to be the nature of the applied **classification**: footnotes are simply not visually distinguishable from other text regions (only textually/logically), so they just usurp all the energy of their look-alikes. IMHO an adequate model would treat this subclassification as a secondary task.

Hence, inevitably, we need to retrain this. 

@n00blet @mahmed1995 @khurramHashmi @mjenckel can you please provide details about the training procedure and dataset you used? There's virtually nothing about this in the OCR-D reader, and your final DFG presentation poster only references one paper on page frame detection and one on dewarping. Am I correct in assuming [this repo](https://github.com/khurramHashmi/Ocrd_anybaseocr_block_segmentation) is where your training tools reside?

