Textract plugin fails to install #412

Closed
timalamenciak opened this issue Jul 23, 2024 · 3 comments

Comments

@timalamenciak
Contributor

Textract 1.6.5 has problems in its build file that cause the install to fail - documented here: deanmalmgren/textract#476

The only workaround that has worked for me is installing textract-py3, a minimally maintained version without this build problem: https://github.com/KyleKing/textract-py3

However, when I try to point OntoGPT at a PDF, it hangs. Presumably it's not trying to use textract-py3 and is looking for vanilla textract.
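
To check which of the two packages is actually present in an environment, a small diagnostic along these lines may help. It is only a sketch: `installed_versions` is a hypothetical helper, and it assumes the distribution names are `textract` and `textract-py3` as written in this thread. It uses only the standard library's `importlib.metadata`.

```python
from importlib import metadata


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumed from this thread, not verified against PyPI.
    for name, version in installed_versions(["textract", "textract-py3"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If both names show a version, the two installs may be shadowing each other, which would be worth ruling out before looking at the PDF hang itself.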

@caufieldjh
Member

Strange - I'm unable to reproduce this issue.
Installing with either pip (as pip install ontogpt[textract]) or poetry (as poetry install --extras textract) appears to work without issue. Of course, in both cases it's installing textract 1.5.0, which is pretty old, on Python 3.10.
I'll enable using textract-py3 as an optional dependency.

@caufieldjh
Member

caufieldjh commented Jul 23, 2024

OK, please let me know if PDF processing works after removing textract and installing the textract-py3 extra (using the repo version - I haven't included this in a release yet).

@timalamenciak
Contributor Author

Installing the repo version seemed to work off the bat - it installs textract 1.5.0, which may not have this problem.

Of course, now textract throws an error with the PDFs because of some undefined characters, but I think that is beyond the scope of this issue. Thanks for the quick response.
