Replace Antiword with a Python alternative #468

Open
SMillerDev opened this issue Jun 12, 2023 · 2 comments

@SMillerDev

Is your feature request related to a problem? Please describe.
Antiword hasn't been updated for a while and now the source has completely disappeared. It would be good to use an alternative way to parse word files.

Which filetype should textract support?
docx

Which external software (python or command line tool), can parse the requested file type
https://pypi.org/project/docx-parser/

Describe alternatives you've considered
Doing nothing, in which case package managers drop antiword and everything that depends on it, including textract.

Additional context
Relates to Homebrew/homebrew-core#131387

@michelemaroni

According to the documentation, antiword is used for parsing old MS Word binary doc files (Word 97-2003), while newer MS Word docx files are parsed with python-docx2txt. It is not clear how docx-parser would help with the older Word 97-2003 files.

One issue to consider is that a file with the .doc extension can be either a Word 97-2003 binary file or a newer Word file.
AbiWord might be a better alternative in this regard.
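
To illustrate the extension ambiguity, here is a minimal sketch of telling the two formats apart by their leading bytes instead of the file name. The function names `sniff_word_format` and `sniff_word_file` are hypothetical and are not part of textract. The check only identifies the container: the OLE2 signature is shared by other legacy Office formats (for example .xls), and the ZIP signature by any ZIP archive, so a real implementation would likely need a further check.

```python
# Leading bytes of an OLE2 / Compound File Binary container (Word 97-2003 .doc).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Leading bytes of a ZIP local file header (a .docx file is a ZIP archive).
ZIP_MAGIC = b"PK\x03\x04"


def sniff_word_format(header: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper)."""
    if header.startswith(OLE2_MAGIC):
        return "doc-binary"
    if header.startswith(ZIP_MAGIC):
        return "docx-zip"
    return "unknown"


def sniff_word_file(path: str) -> str:
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as fh:
        return sniff_word_format(fh.read(8))
```

A caller could then route "doc-binary" files to whatever replaces antiword and "docx-zip" files to the existing docx parser, regardless of the extension.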

@SMillerDev
Author

Thanks for pointing that out; I must have misread what antiword was actually used for. I don't use textract myself, so unfortunately I can't help much with evaluating AbiWord. I just wanted to make sure that the team here was aware of the disappearance of Antiword.
