Is your feature request related to a problem? Please describe.
Antiword hasn't been updated for a while, and now its source has completely disappeared. It would be good to use an alternative way to parse Word files.
According to the documentation, antiword is used for parsing old MS Word binary doc files (Word 97-2003), while newer MS Word docx files are parsed with python-docx2txt. It is not clear how docx-parser would help with the older Word 97-2003 files.
One issue to consider is that a file with the doc extension can be either a Word 97-2003 file or a newer Word file.
Maybe abiword could be a better alternative in this regard.
Thanks for pointing that out; I must have misread what antiword was actually used for. I don't actually use textract, so unfortunately I can't help much with the consideration of Abiword. I just wanted to make sure that the team here was aware of the disappearance of Antiword.
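To illustrate the point about the extension being ambiguous, here is a minimal sketch of checking a file's contents rather than its name. It relies on two well-known facts: legacy binary Word files are OLE2 compound files that start with the signature `D0 CF 11 E0 A1 B1 1A E1`, and newer Word files are ZIP archives containing `word/document.xml`. The function name `sniff_word_format` and its return labels are made up for this example and are not part of textract. Note that other legacy Office files (xls, ppt) share the same OLE2 signature, so `"ole2"` only means "legacy binary container", not "definitely Word".

```python
import zipfile

# Signature shared by all OLE2 compound files (legacy .doc, .xls, .ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_word_format(path):
    """Guess the container format of a file, ignoring its extension.

    Returns "ole2" for a legacy binary container, "ooxml" for a ZIP
    archive that contains word/document.xml, and "unknown" otherwise.
    (Hypothetical helper for illustration only.)
    """
    with open(path, "rb") as fh:
        header = fh.read(len(OLE2_MAGIC))
    if header == OLE2_MAGIC:
        return "ole2"
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            if "word/document.xml" in zf.namelist():
                return "ooxml"
    return "unknown"
```

A caller could use the result to pick a backend (for example, a legacy-format tool for `"ole2"` and a docx parser for `"ooxml"`) instead of trusting the file name.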
Which filetype should textract support?
docx
Which external software (Python or command-line tool) can parse the requested file type?
https://pypi.org/project/docx-parser/
Describe alternatives you've considered
Nothing is done, and package managers drop antiword and everything that depends on it, including textract.
Additional context
Relates to Homebrew/homebrew-core#131387