
Models Fail to Process Large Embedded File with LocalDB Embedding #3532

Open
l1v0n1 opened this issue Mar 9, 2025 · 0 comments
Labels
bug-unconfirmed chat gpt4all-chat issues

Comments

l1v0n1 commented Mar 9, 2025

Bug Report

I embedded a large file with localdb embedding in GPT4All. The file contains a conversation between two people and has over 90,000 lines of messages. After embedding, I tried several models in GPT4All, including:

  • deepseek-r1-distill-qwen-7b
  • llama 3 8b
  • reasoner v1
  • mistral instruct

All models failed to provide accurate results. However, the deepseek-r1-distill model attempted to process the data but often provided incorrect or incomplete answers. For example, when I asked, "What conversations did users have between 2024-10-01 and 2024-10-16?", the model either skipped the date range or skipped many messages and responded that there were no messages in this period. Similarly, when I asked, "What conversations did users have on 2024-09-26?", the model responded with:

Based on the provided context:
Answer:
There are no specific conversations recorded by users named Kamil and Lana on September 26, 2024. The earliest entry in the context is from September 27th onwards.
If you need further assistance or have more data for that date, please provide additional information!

This suggests that the retrieval step only passes a small number of the embedded chunks to the model, so most of the 90,000 lines never appear in its context; that appears to be the core issue. A rough illustration of the effect is sketched below.
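The following is a minimal sketch of why only a handful of chunks ever reach the chat model, not GPT4All's actual LocalDocs implementation. It assumes the gpt4all Python bindings' Embed4All class plus numpy; the file name, chunk size, and top-k value are invented for illustration only.

```python
# Sketch of top-k retrieval over a chunked chat log. Everything outside the
# top-k chunks is invisible to the chat model, however relevant it is.
# Assumptions: gpt4all Python bindings (Embed4All), numpy; "conversation.txt",
# chunk_size, and k are hypothetical, not GPT4All's real defaults.
import numpy as np
from gpt4all import Embed4All

embedder = Embed4All()

# Split the ~90,000-line chat log into fixed-size chunks (illustrative size).
with open("conversation.txt", encoding="utf-8") as f:
    lines = f.readlines()
chunk_size = 64
chunks = ["".join(lines[i:i + chunk_size]) for i in range(0, len(lines), chunk_size)]

chunk_vecs = np.array([embedder.embed(c) for c in chunks])

def top_k_chunks(question: str, k: int = 3) -> list[str]:
    """Return only the k chunks most similar to the question embedding."""
    q = np.array(embedder.embed(question))
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in sims.argsort()[::-1][:k]]

# With ~1,400 chunks and k=3, the model sees well under 1% of the file, so a
# date-range question can only be answered from whichever few chunks happen
# to match the query embedding.
print(len(chunks), "chunks total;", len(top_k_chunks("messages on 2024-09-26")), "sent to the model")
```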

Steps to Reproduce

  1. Embed a large file with over 90,000 lines of conversation using localdb embedding in GPT4All (a script for generating a comparable test file is sketched after this list).
  2. Try querying the data using various models, including deepseek-r1-distill-qwen-7b, llama 3 8b, reasoner v1, and mistral instruct.
  3. Ask specific questions about conversations within a date range or on a specific date.
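For reproduction, a synthetic file of roughly this shape can be produced with a script along these lines; the speaker names, timestamps, and message text are invented placeholders, not the original data.

```python
# Hypothetical generator for a ~90,000-line two-person chat log covering the
# date ranges mentioned in the report (late September through October 2024).
from datetime import datetime, timedelta
import random

random.seed(0)
start = datetime(2024, 9, 20)
speakers = ["Kamil", "Lana"]

with open("conversation.txt", "w", encoding="utf-8") as f:
    t = start
    for i in range(90_000):
        # Advance the clock a few minutes per message so the log spans weeks.
        t += timedelta(minutes=random.randint(1, 5))
        f.write(f"[{t:%Y-%m-%d %H:%M}] {speakers[i % 2]}: message number {i}\n")
```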

Expected Behavior

The model should accurately process and retrieve the conversations within the specified date range or on the specific date without skipping messages or providing incorrect information.

Your Environment

  • GPT4All version: 3.10.0
  • Operating System: macOS Sequoia 15.3.1
  • Chat model used (if applicable): DeepSeek-R1-Distill-Qwen-7B, Llama 3 8B, Reasoner v1, Mistral Instruct
l1v0n1 added the bug-unconfirmed, chat, and gpt4all-chat issues labels on Mar 9, 2025