llmsherpa is Missing Information #91

Open
hasandot opened this issue Jul 2, 2024 · 0 comments

hasandot commented Jul 2, 2024

I used llmsherpa to process this PDF, which is a network protocol specification document.

I used the demo you provide in Colab.

It runs without any errors.
However, when I convert the result to text, only a portion of the PDF is converted, so a lot of information is missing.
I tried both the PDF URL and a local PDF file path.

  1. I printed all the section titles, and the output does not match the PDF. The output is provided here.
  2. I also converted the PDF to text, and the result is significantly smaller than the original. The converted text file is here.
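
The two checks above could be quantified roughly as follows. This is only a minimal sketch, not part of llmsherpa: the helper names `missing_titles` and `coverage_ratio` and the sample titles are hypothetical placeholders. In practice the extracted titles and text would presumably come from the parsed document (for example the `title` of each entry in `doc.sections()` and the output of `doc.to_text()`), and the expected titles from the PDF's table of contents.

```python
def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so minor formatting differences are ignored."""
    return " ".join(title.lower().split())


def missing_titles(expected: list[str], extracted: list[str]) -> list[str]:
    """Return the expected section titles that do not appear in the extracted ones."""
    seen = {normalize(t) for t in extracted}
    return [t for t in expected if normalize(t) not in seen]


def coverage_ratio(reference_text: str, extracted_text: str) -> float:
    """Crude size comparison: extracted length divided by reference length."""
    if not reference_text:
        return 0.0
    return len(extracted_text) / len(reference_text)


# Hypothetical placeholder titles, NOT taken from the actual PDF.
expected = ["1 Introduction", "2 Message Format", "3 State Machine"]
extracted = ["1  introduction", "3 State Machine"]

print("Missing sections:", missing_titles(expected, extracted))
print("Text coverage:", coverage_ratio("abcd", "ab"))
```

A low coverage ratio together with a long list of missing titles would suggest that whole sections are being dropped during parsing, rather than only their formatting being lost.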

My main concern: is there any particular reason why llmsherpa might not work for network protocol specification PDF documents?

hasandot changed the title from "Pdf Processing is Missing Information" to "llmsherpa is Missing Information" on Jul 2, 2024