Vertical block separation #1512

@snewman-aa

🚀 The feature

Currently, builder.py has a paragraph_break parameter that controls whether sub_lines that are close enough to each other (in relative distance) are merged.

I would appreciate a similar parameter for merging stacked lines that are vertically close enough into a single block.

Motivation, pitch

[Image: Screenshot 2024-03-13 at 2 01 24 PM]

Currently, when I run docTR on the above image (and on other images with similar lower thirds), result.render() returns the string below, where \n\n marks the boundary between blocks. I would like to be able to direct the builder to merge lines that are this close into one block containing two lines, rather than producing two blocks of one line each.
REP. PAUL LEONARD\n\nD-DAYTON

Here is the document object:

Document(
  (pages): [Page(
    dimensions=(360, 480)
    (blocks): [
      Block(
        (lines): [Line(
          (words): [
            Word(value='REP.', confidence=0.99),
            Word(value='PAUL', confidence=1.0),
            Word(value='LEONARD', confidence=1.0),
          ]
        )]
        (artefacts): []
      ),
      Block(
        (lines): [Line(
          (words): [Word(value='D-DAYTON', confidence=0.99)]
        )]
        (artefacts): []
      ),
    ]
  )]
)
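To make the request concrete, here is a rough, self-contained sketch of the merging behaviour I have in mind. It does not use docTR's classes: SimpleLine, SimpleBlock, merge_close_blocks and the vertical_break threshold are all hypothetical names made up for illustration, and the box coordinates are invented (the real geometry is not shown in the output above). It only assumes boxes are given as ((xmin, ymin), (xmax, ymax)) in relative page coordinates.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# ((xmin, ymin), (xmax, ymax)) in relative page coordinates (assumed convention)
Box = Tuple[Tuple[float, float], Tuple[float, float]]


@dataclass
class SimpleLine:
    """Hypothetical stand-in for an OCR line: its text and bounding box."""
    text: str
    geometry: Box


@dataclass
class SimpleBlock:
    """Hypothetical stand-in for a block: a list of lines."""
    lines: List[SimpleLine] = field(default_factory=list)

    @property
    def geometry(self) -> Box:
        # Smallest box enclosing every line of the block
        xmin = min(l.geometry[0][0] for l in self.lines)
        ymin = min(l.geometry[0][1] for l in self.lines)
        xmax = max(l.geometry[1][0] for l in self.lines)
        ymax = max(l.geometry[1][1] for l in self.lines)
        return ((xmin, ymin), (xmax, ymax))


def merge_close_blocks(blocks: List[SimpleBlock], vertical_break: float) -> List[SimpleBlock]:
    """Merge a block into the previous one (in top-to-bottom order) when the
    vertical gap between them is below `vertical_break` (hypothetical parameter)
    and the two boxes overlap horizontally."""
    merged: List[SimpleBlock] = []
    for block in sorted(blocks, key=lambda b: b.geometry[0][1]):
        if merged:
            prev = merged[-1]
            (px0, _), (px1, py1) = prev.geometry
            (bx0, by0), (bx1, _) = block.geometry
            gap = by0 - py1
            overlaps = min(px1, bx1) > max(px0, bx0)
            if gap < vertical_break and overlaps:
                prev.lines.extend(block.lines)
                continue
        # Copy the line list so the input blocks are left untouched
        merged.append(SimpleBlock(lines=list(block.lines)))
    return merged


def render(blocks: List[SimpleBlock]) -> str:
    """Join lines with a newline and blocks with a blank line."""
    return "\n\n".join("\n".join(l.text for l in b.lines) for b in blocks)


# Invented coordinates for the two lines of the lower third
blocks = [
    SimpleBlock([SimpleLine("REP. PAUL LEONARD", ((0.10, 0.80), (0.45, 0.85)))]),
    SimpleBlock([SimpleLine("D-DAYTON", ((0.10, 0.86), (0.25, 0.90)))]),
]

merged_text = render(merge_close_blocks(blocks, vertical_break=0.02))
unmerged_text = render(merge_close_blocks(blocks, vertical_break=0.0))
```

With the invented coordinates, a threshold of 0.02 gives one block with two lines, while a threshold of 0.0 keeps the two separate blocks as in the current output. The sketch only compares each block with the one directly above it, so it would likely need more care for multi-column layouts.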

Alternatives

No response

Additional context

No response
