
block segmentation: overlaps and quality of prebuilt models #82

@bertsky

Once I got the block segmentation to actually run, I was puzzled by the extremely bad results of the provided model.

Here's how I gradually worked to isolate the problem.

  • using the default confidence threshold of 0.9:
    [screenshots a, b: FILE_0001/FILE_0002_REGIONS-ANYOCR_bbox-best_pageviewer]
  • using a lower confidence threshold of 0.5:
    [screenshots a, b: FILE_0001/FILE_0002_REGIONS-ANYOCR_bbox-all_pageviewer]
  • using the default confidence threshold of 0.9, but annotating a polygon from the mask:
    [screenshots a, b: FILE_0001/FILE_0002_REGIONS-ANYOCR_mask-best_pageviewer]
  • using a lower confidence threshold of 0.5, but annotating a polygon from the mask:
    [screenshots a, b: FILE_0001/FILE_0002_REGIONS-ANYOCR_mask-all_pageviewer]
  • using a lower confidence threshold of 0.5, but annotating a polygon from the mask, and applying non-maximum suppression and other post-processing (such as checking for containment):
    [screenshots a, b: FILE_0001/FILE_0002_REGIONS-ANYOCR_mask-all-nms_pageviewer]
  • using an even lower confidence threshold of 0.02, but annotating a polygon from the mask, and suppressing the classes header, footer, footnote, footnote-continued, endnote and keynote (reserving their probability mass):
    [screenshots a, b: FILE_0001/FILE_0002_REGIONS-ANYOCR_mask-all-active_pageviewer]
  • using an even lower confidence threshold of 0.02, but annotating a polygon from the mask, suppressing the classes header, footer, footnote, footnote-continued, endnote and keynote (reserving their probability mass), and applying non-maximum suppression and other post-processing (such as checking for containment):
    [screenshots a, b: FILE_0001/FILE_0002_REGIONS-ANYOCR_mask-all-active-nms_pageviewer]
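
To make the class suppression in the last two variants concrete, here is a minimal Python sketch of picking a label for a single detection. It assumes each detection comes with a per-class probability vector (as a Mask R-CNN-style classifier head would produce; that is an assumption about the model), and it reads "reserving their probability mass" as zeroing the suppressed classes without renormalizing the rest. That reading is my assumption, not necessarily what the processor does; under it the surviving scores stay small, which would be consistent with needing a threshold as low as 0.02. The class list (including "paragraph" and "heading") and the function name `pick_class` are hypothetical.

```python
# Hypothetical label set for illustration; the real model's classes may differ.
CLASSES = ["paragraph", "heading", "header", "footer", "footnote",
           "footnote-continued", "endnote", "keynote"]

# Classes suppressed in the last two experiments above.
SUPPRESSED = {"header", "footer", "footnote", "footnote-continued",
              "endnote", "keynote"}


def pick_class(scores, classes=CLASSES, suppressed=SUPPRESSED,
               min_confidence=0.02):
    """Return (label, score) of the best non-suppressed class, or None.

    scores: per-class probabilities of one detection, in the order of `classes`.
    Suppressed classes are skipped, but the remaining scores are NOT
    renormalized, i.e. the suppressed probability mass is held back
    (assumed interpretation).
    """
    best_label, best_score = None, 0.0
    for label, score in zip(classes, scores):
        if label in suppressed:
            continue
        if score > best_score:
            best_label, best_score = label, score
    if best_label is None or best_score < min_confidence:
        return None
    return best_label, best_score


# A region the model considers a footnote with 0.9 confidence:
scores = [0.05, 0.01, 0.0, 0.0, 0.9, 0.04, 0.0, 0.0]
print(pick_class(scores))                      # ('paragraph', 0.05)
print(pick_class(scores, min_confidence=0.9))  # None
```

The example shows why the threshold has to drop so far: once footnote is ruled out, the best remaining class only has a score of 0.05.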

So all these refinements seem crucial.
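
The non-maximum suppression and containment check from the list above could look roughly like the following pure-Python sketch. For simplicity it works on axis-aligned boxes rather than mask polygons, and the function names and the thresholds `max_iou` and `max_share` are hypothetical, not the parameters of the actual processor.

```python
def area(box):
    """Area of an axis-aligned box (x0, y0, x1, y1); 0 if degenerate."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def intersection_area(a, b):
    """Area of the overlap between two boxes (0 if they are disjoint)."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def postprocess(detections, max_iou=0.5, max_share=0.9):
    """Greedy non-maximum suppression plus a containment check.

    detections: list of (box, score) pairs with box = (x0, y0, x1, y1).
    Detections are visited by decreasing score; one is dropped if, with
    respect to any already kept box, its IoU exceeds max_iou or more than
    max_share of its own area lies inside that kept box.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        own = area(box)
        suppressed = False
        for kept_box, _ in kept:
            inter = intersection_area(box, kept_box)
            union = own + area(kept_box) - inter
            iou = inter / union if union > 0 else 0.0
            share = inter / own if own > 0 else 0.0
            if iou > max_iou or share > max_share:
                suppressed = True
                break
        if not suppressed:
            kept.append((box, score))
    return kept


dets = [((0, 0, 100, 100), 0.9), ((5, 5, 105, 105), 0.8),
        ((10, 10, 30, 30), 0.7), ((200, 200, 300, 300), 0.6)]
print(postprocess(dets))
# → [((0, 0, 100, 100), 0.9), ((200, 200, 300, 300), 0.6)]
```

Here the second box is removed by IoU (about 0.82) and the third by containment (it lies entirely inside the first). Only the lower-scoring box of a pair is tested for containment, so a small high-confidence region inside a larger, weaker one survives alongside it.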

But it appears that this model was trained on highly overlapping regions, which makes it next to impossible to avoid such overlaps during prediction. An equally serious problem seems to be the nature of the classification scheme: footnotes simply are not visually distinguishable from other text regions (only textually/logically), so they just usurp all the energy of their look-alikes. IMHO, an adequate model would treat this subclassification as a secondary task.
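
Once the training data is available, the suspicion of heavily overlapping regions could be checked by rasterizing each page's region annotations and measuring how much of the annotated area is covered more than once. The sketch below assumes axis-aligned integer boxes and a small page size; `overlap_ratio` is a hypothetical helper, and for real page sizes one would downscale or switch to NumPy.

```python
def overlap_ratio(boxes, width, height):
    """Fraction of annotated pixels that are covered by two or more regions.

    boxes: list of (x0, y0, x1, y1) integer boxes (x1, y1 exclusive).
    """
    # Per-pixel count of how many regions cover that pixel.
    counts = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            row = counts[y]
            for x in range(max(0, x0), min(width, x1)):
                row[x] += 1
    covered = sum(1 for row in counts for c in row if c >= 1)
    multiple = sum(1 for row in counts for c in row if c >= 2)
    return multiple / covered if covered else 0.0


# Two 10x10 regions sharing a 5x10 strip: 50 of 150 annotated pixels overlap.
print(overlap_ratio([(0, 0, 10, 10), (5, 0, 15, 10)], 20, 20))
```

A value close to 0 per page would be expected for a layout ground truth meant to be used without post-processing; consistently high values would support the suspicion above.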

Hence, inevitably, we need to retrain this.
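
One way to set up the retraining so that the subclassification becomes a secondary task would be to collapse the visually indistinguishable text classes into a single detection class and keep the original class as a separate attribute, to be predicted later by a classifier that can use textual or logical cues. A minimal sketch, assuming annotations are dicts with a `label` key; the set of collapsed classes, the coarse label "text", the "paragraph" class, the `subtype` key and the function name `to_two_level` are all hypothetical choices.

```python
# Hypothetical: classes that differ from ordinary text only textually/logically.
TEXT_LIKE = {"paragraph", "header", "footer", "footnote",
             "footnote-continued", "endnote", "keynote"}


def to_two_level(annotations, coarse_label="text"):
    """Split each annotation into a primary (visual) and a secondary label.

    annotations: list of dicts with at least a 'label' key.
    Returns new dicts whose 'label' is the class the detector is trained on,
    with the original class preserved under 'subtype' for a second stage.
    """
    result = []
    for ann in annotations:
        label = ann["label"]
        primary = coarse_label if label in TEXT_LIKE else label
        result.append({**ann, "label": primary, "subtype": label})
    return result


print(to_two_level([{"label": "footnote", "id": 1},
                    {"label": "image", "id": 2}]))
```

The detector then only has to learn visual distinctions, while footnotes, headers and the like are told apart in a separate step.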

@n00blet @mahmed1995 @khurramHashmi @mjenckel, can you please provide details about the training procedure and the dataset you used? There's virtually nothing about this in the OCR-D reader, and your final DFG presentation poster only references one paper on page frame detection and one on dewarping. Am I correct in assuming that this repo is where your training tools reside?
