Extracting from pdfs #1279

@efuae

Hello, I am using Grobid for a project that works with PDF drug labels. I have noticed two problems when a PDF is extracted into XML:

  1. It often fails to extract the text that comes right after an image.
  2. It sometimes merges a new heading into the preceding one. For example, after extracting section 12.3, it extracts section 12.4 as a continuation of the preceding header.
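
To make the second problem easier to spot in the output, here is a minimal sketch that scans a TEI document for `<head>` elements containing more than one section number (for example both 12.3 and 12.4). It assumes the merged heading ends up inside a single `<head>` element in the standard TEI namespace; the function name `find_merged_heads` and the "number.number" pattern are illustrative choices, not part of Grobid.

```python
import re
import xml.etree.ElementTree as ET

# TEI namespace used by TEI XML documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Assumption: section numbers look like "12.3" (digits, a dot, digits).
SECTION_NUMBER = re.compile(r"\b\d+\.\d+\b")


def find_merged_heads(tei_xml):
    """Return the text of every <head> that mentions more than one section number."""
    root = ET.fromstring(tei_xml)
    merged = []
    for head in root.iterfind(".//tei:head", TEI_NS):
        # Collapse whitespace in the heading text, including nested elements.
        text = " ".join("".join(head.itertext()).split())
        # The section number may also be stored in the "n" attribute.
        label = head.get("n", "")
        full = f"{label} {text}".strip()
        if len(set(SECTION_NUMBER.findall(full))) > 1:
            merged.append(full)
    return merged
```

This is only a heuristic: a heading that legitimately cites another section number would also be flagged, so the results need a manual check against the PDF.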

Could this be looked at please?
