While using unstructured-ingest fully locally for partitioning odt/doc/docx files, I get this error a lot in the uploading stage:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 244474: character maps to <undefined>
I think this happens because the default encoding on Windows is not UTF-8. The issue is fixed by changing:
with path.open() as f:
to
with path.open(encoding="utf-8") as f:
inside the function get_json_data that is located inside utils/data_prep.py.
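For anyone who wants to reproduce this without Windows, here is a minimal sketch of what I believe is going on. It assumes the failing file is UTF-8 JSON and that the Windows default is cp1252; read_json is a hypothetical stand-in for get_json_data (the real function body may differ), and the file name is made up.

```python
import json
import tempfile
from pathlib import Path


def read_json(path: Path, encoding=None):
    # Hypothetical stand-in for get_json_data; the real function may differ.
    # encoding=None falls back to the platform default (often cp1252 on Windows).
    with path.open(encoding=encoding) as f:
        return json.load(f)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "elements.json"  # made-up file name
        # "Á" (U+00C1) is stored as bytes C3 81 in UTF-8, and 0x81 has no
        # mapping in cp1252, which triggers the 'charmap' error.
        payload = json.dumps([{"text": "Á"}], ensure_ascii=False)
        path.write_text(payload, encoding="utf-8")

        try:
            # Simulate the Windows default by forcing cp1252.
            read_json(path, encoding="cp1252")
            failed = False
        except UnicodeDecodeError:
            failed = True

        # Passing the encoding explicitly reads the file correctly.
        data = read_json(path, encoding="utf-8")
    return failed, data


if __name__ == "__main__":
    print(demo())
```

Passing encoding="utf-8" explicitly makes the read independent of the platform default, which is why the one-line change above works for me.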
It would be nice if this were fixed in a future update.