[BUG] Word docx failing embedding #435

Open
vap0rtranz opened this issue Oct 28, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@vap0rtranz

Description

Embeddings are failing for Word docx format.

The unstructured loader/reader gives an error.

This is with nomic-embed-text as the embedding model.

Reproduction steps

1. In UI, select "Click to Upload" and attach local Word docx 
2. Select "Upload and Index"
3. See the error in the logs below

Screenshots

![DESCRIPTION](LINK.png)

Logs

Using reader <kotaemon.loaders.unstructured_loader.UnstructuredReader object at 0x7f984bfba020>
No module named 'unstructured'
Traceback (most recent call last):
  File "/media/justin/external/CodeReady/venv-external/lib/python3.10/site-packages/ktem/index/file/pipelines.py", line 795, in stream
    file_id, docs = yield from pipeline.stream(
  File "/media/justin/external/CodeReady/venv-external/lib/python3.10/site-packages/ktem/index/file/pipelines.py", line 642, in stream
    docs = self.loader.load_data(file_path, extra_info=extra_info)
  File "/media/justin/external/CodeReady/venv-external/lib/python3.10/site-packages/kotaemon/loaders/unstructured_loader.py", line 70, in load_data
    from unstructured.partition.auto import partition
ModuleNotFoundError: No module named 'unstructured'

Browsers

No response

OS

Linux

Additional information

No response

@vap0rtranz vap0rtranz added the bug Something isn't working label Oct 28, 2024
@KKenny0
Contributor

KKenny0 commented Oct 31, 2024

The 'unstructured' module is probably not installed in the environment that runs the app. You can install it with pip: pip install unstructured
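To confirm whether the module is visible to the interpreter that actually runs the app, something like the following may help. This is only a sketch: `is_module_available` is a hypothetical helper name, not part of kotaemon or unstructured, and it only checks top-level module names.

```python
import importlib.util
import sys


def is_module_available(name: str) -> bool:
    """Return True if a top-level module `name` can be found by this interpreter.

    Hypothetical helper for illustration; it looks the module up on the
    import path without importing it.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Show which interpreter (and therefore which virtualenv) is in use.
    print("interpreter:", sys.executable)
    status = "found" if is_module_available("unstructured") else "missing"
    print("unstructured:", status)
```

Run it with the same Python that launches the app (the traceback points at a virtualenv under venv-external). If it reports the module as missing, installing with `python -m pip install unstructured` from that same interpreter keeps pip and the app pointed at one environment.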

@vap0rtranz
Author

Hmm, OK I installed unstructured. It was indeed not installed. Now there's a different error that blocks the indexing.

It may be faster to reinstall but I've had installation issues: #425
