Unable to upload PDF files: Table was not found #2521

Open
Timi7007 opened this issue Oct 22, 2024 · 5 comments
Labels
needs info / can't replicate Issues that require additional information and/or cannot currently be replicated, but possible bug

Comments

@Timi7007

How are you running AnythingLLM?

Docker (local)

What happened?

I'm afraid I'm doing something wrong, as I can't get documents added using the "Save and embed" dialog. Logs show the following:

[backend] info: Adding new vectorized document into namespace buffalo-bills
[backend] info: [RecursiveSplitter] Will split with {"chunkSize":8192,"chunkOverlap":20}
[backend] info: Chunks created from document: 15
[backend] info: [OllamaEmbedder] Embedding 15 chunks of text with nomic-embed-text:latest.
[backend] info: Inserting vectorized chunks into LanceDB collection.
[backend] error: addDocumentToNamespace Table 'buffalo-bills' was not found
[backend] error: Failed to vectorize Buffalo Bills - Wikipedia.pdf

The "Table 'buffalo-bills' was not found" error gets forwarded to the frontend.

Are there known steps to reproduce?

The vector DB is set to LanceDB (the default) and the embedding provider is Ollama. I've tried different embedding models with the same result.

@Timi7007 Timi7007 added the possible bug Bug was reported but is not confirmed or is unable to be replicated. label Oct 22, 2024
@Timi7007
Author

I've just tried again using the native "AnythingLLM Embedder", and it also fails:

[backend] info: Adding new vectorized document into namespace buffalo-bills
[backend] info: [NativeEmbedder] Initialized
[backend] info: [RecursiveSplitter] Will split with {"chunkSize":1000,"chunkOverlap":20}
[backend] info: Chunks created from document: 76
[backend] info: [NativeEmbedder] Embedded Chunk 1 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 2 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 3 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 4 of 4
[backend] info: Inserting vectorized chunks into LanceDB collection.
[backend] error: addDocumentToNamespace lance error: LanceError(IO): Generic LocalFileSystem error: Unable to copy file from /app/server/storage/lancedb/buffalo-bills.lance/_versions/.tmp_1.manifest_4be321d5-b2ec-4add-9c83-2c258c1669b6 to /app/server/storage/lancedb/buffalo-bills.lance/_versions/1.manifest: Function not implemented (os error 38), /home/build_user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-table-0.12.1/src/io/commit.rs:692:54
[backend] error: Failed to vectorize 2021 Buffalo Bills season - Wikipedia.pdf
[backend] info: Adding new vectorized document into namespace buffalo-bills
[backend] info: [NativeEmbedder] Initialized
[backend] info: [RecursiveSplitter] Will split with {"chunkSize":1000,"chunkOverlap":20}
[backend] info: Chunks created from document: 95
[backend] info: [NativeEmbedder] Embedded Chunk 1 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 2 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 3 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 4 of 4
[backend] info: Inserting vectorized chunks into LanceDB collection.
[backend] error: addDocumentToNamespace Table 'buffalo-bills' was not found
[backend] error: Failed to vectorize 2022 Buffalo Bills season - Wikipedia.pdf

These even seem to be separate errors. Please advise.

@timothycarambat
Member

What does your PDF look like? Clearly there is some external or embedded reference to a table that cannot be parsed out of the document.

@Timi7007
Author

I tried again with a plain .txt file: a single line, one sentence, no special characters. Still the same issue:

[collector] info: -- Working test.txt --
[collector] info: [SUCCESS]: test.txt converted & ready for embedding.
[backend] info: [CollectorApi] Document test.txt uploaded processed and successfully. It is now available in documents.
[backend] info: [Event Logged] - document_uploaded
[backend] info: Adding new vectorized document into namespace testworkspace
[backend] info: [NativeEmbedder] Initialized
[backend] info: [RecursiveSplitter] Will split with {"chunkSize":1000,"chunkOverlap":20}
[backend] info: Chunks created from document: 1
[backend] info: [NativeEmbedder] Embedded Chunk 1 of 1
[backend] info: Inserting vectorized chunks into LanceDB collection.
[backend] error: addDocumentToNamespace lance error: LanceError(IO): Generic LocalFileSystem error: Unable to copy file from /app/server/storage/lancedb/testworkspace.lance/_versions/.tmp_1.manifest_404b8afe-8daf-4083-9a62-785ca4d619a9 to /app/server/storage/lancedb/testworkspace.lance/_versions/1.manifest: Function not implemented (os error 38), /home/build_user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-table-0.12.1/src/io/commit.rs:692:54
[backend] error: Failed to vectorize test.txt
[backend] info: [Event Logged] - workspace_documents_added
[backend] info: Adding new vectorized document into namespace testworkspace
[backend] info: [NativeEmbedder] Initialized
[backend] info: [RecursiveSplitter] Will split with {"chunkSize":1000,"chunkOverlap":20}
[backend] info: Chunks created from document: 1
[backend] info: [NativeEmbedder] Embedded Chunk 1 of 1
[backend] info: Inserting vectorized chunks into LanceDB collection.
[backend] error: addDocumentToNamespace Table 'testworkspace' was not found
[backend] error: Failed to vectorize test.txt
[backend] info: [TELEMETRY SENT] {"event":"documents_embedded_in_workspace","properties":{"LLMSelection":"ollama","Embedder":"native","VectorDbSelection":"lancedb","TTSSelection":"native","runtime":"docker"}}
[backend] info: [Event Logged] - workspace_documents_added

@timothycarambat
Member

This is your issue; it comes from the LanceDB integration that stores the vectors:
[backend] error: addDocumentToNamespace lance error: LanceError(IO): Generic LocalFileSystem error: Unable to copy file from /app/server/storage/lancedb/testworkspace.lance/_versions/.tmp_1.manifest_404b8afe-8daf-4083-9a62-785ca4d619a9 to /app/server/storage/lancedb/testworkspace.lance/_versions/1.manifest: Function not implemented (os error 38), /home/build_user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-table-0.12.1/src/io/commit.rs:692:54

The issue says you are running in Docker. What OS are you running on, and are you using the official image or a custom build?

So the root cause is that tables cannot be written to Lance files, which causes the upserts to fail.
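One way to narrow this down, offered as an assumption rather than a confirmed diagnosis: os error 38 is `ENOSYS` ("Function not implemented") on Linux, and as far as I know the Rust `object_store` crate's local backend performs the "copy if not exists" step of a commit with a hard link. If that holds, a filesystem under the storage directory that refuses hard links would produce exactly this error. The sketch below uses a hypothetical helper, `supports_hard_links`, that tries to hard-link a scratch file in a given directory and reports any failure.

```python
import errno
import os
import sys
import tempfile
from typing import Tuple


def supports_hard_links(directory: str) -> Tuple[bool, str]:
    """Try to hard-link a scratch file inside `directory`.

    Returns (ok, detail). Assumption: a Lance commit on local storage needs
    hard links, so a failure here (e.g. ENOSYS, "Function not implemented",
    errno 38 on Linux) would be consistent with the error in the logs.
    """
    # Create a throwaway source file in the directory under test.
    fd, src = tempfile.mkstemp(prefix=".linkprobe_", dir=directory)
    os.close(fd)
    dst = src + ".link"
    try:
        os.link(src, dst)
    except OSError as exc:
        # Translate the numeric errno to its symbolic name, e.g. "ENOSYS".
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, f"{name}: {exc.strerror}"
    else:
        os.unlink(dst)  # clean up the link we just created
        return True, "hard links supported"
    finally:
        os.unlink(src)  # always remove the scratch file


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    ok, detail = supports_hard_links(target)
    print(f"{target}: {detail}")
```

For example, run it on the host against the directory that backs the container's `/app/server/storage` (how that directory is mounted is an assumption about your setup). A `False` result reporting `ENOSYS` would point at the filesystem rather than at the documents being uploaded.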

@timothycarambat
Member

We have both x86 and ARM images available. Running an image built for an incompatible architecture on the host via Docker typically causes issues like this. Mounting the Docker storage on a network drive can also cause I/O operation failures.
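Both causes named above can be checked on a Linux host with a rough sketch like the following. The helper names (`docker_arch`, `mount_for_path`, `is_network_mount`) are hypothetical, and the set of network filesystem types is an assumption, not an exhaustive list. It maps `platform.machine()` to Docker's architecture names so the result can be compared with `docker image inspect --format '{{.Architecture}}' <image>`, and it reads `/proc/mounts` to find which mount a storage path lives on.

```python
import os
import platform
import re
import sys
from typing import Optional, Tuple

# Assumption: common network filesystem types; extend as needed.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "smbfs", "fuse.sshfs"}

# Map platform.machine() values to the names Docker uses for image architectures.
_DOCKER_ARCH = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def docker_arch(machine: Optional[str] = None) -> str:
    """Return the host architecture using Docker's naming (e.g. amd64, arm64)."""
    m = (machine or platform.machine()).lower()
    return _DOCKER_ARCH.get(m, m)


def _unescape(field: str) -> str:
    # /proc/mounts writes spaces and similar characters as octal escapes (\040 etc.).
    return re.sub(r"\\([0-7]{3})", lambda mt: chr(int(mt.group(1), 8)), field)


def mount_for_path(path: str, mounts_text: str) -> Optional[Tuple[str, str]]:
    """Return (mount_point, fstype) of the most specific mount containing path."""
    best: Optional[Tuple[str, str]] = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = _unescape(fields[1]), fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later entry win, since later mounts shadow earlier ones.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype)
    return best


def is_network_mount(path: str, mounts_text: str) -> bool:
    """True if the mount holding `path` uses one of NETWORK_FS_TYPES."""
    found = mount_for_path(path, mounts_text)
    return found is not None and found[1] in NETWORK_FS_TYPES


if __name__ == "__main__":
    target = os.path.realpath(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("host arch (Docker naming):", docker_arch())
    if os.path.exists("/proc/mounts"):  # Linux only
        with open("/proc/mounts") as f:
            mounts = f.read()
        print("mount:", mount_for_path(target, mounts))
        print("network filesystem:", is_network_mount(target, mounts))
```

Pass the host path that is bind-mounted as the container's storage directory. An architecture that differs from the image's, or a `cifs`/`nfs` mount, would match one of the two failure modes described above.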

@timothycarambat timothycarambat added needs info / can't replicate Issues that require additional information and/or cannot currently be replicated, but possible bug and removed possible bug Bug was reported but is not confirmed or is unable to be replicated. labels Oct 25, 2024