I embedded a large file in GPT4All using the LocalDocs embedding feature. The file contains a conversation between two people, with over 90,000 lines of messages. After embedding, I tried several models in GPT4All, including:
DeepSeek-R1-Distill-Qwen-7B
Llama 3 8B
Reasoner v1
Mistral Instruct
All models failed to provide accurate results. The DeepSeek-R1-Distill model at least attempted to process the data, but it often gave incorrect or incomplete answers. For example, when I asked, "What conversations did users have between 2024-10-01 and 2024-10-16?", the model either ignored the date range or skipped many messages and claimed there were no messages in that period. Similarly, when I asked, "What conversations did users have on 2024-09-26?", the model responded with:
Based on the provided context:
Answer:
There are no specific conversations recorded by users named Kamil and Lana on September 26, 2024. The earliest entry in the context is from September 27th onwards.
If you need further assistance or have more data for that date, please provide additional information!
This suggests that only a small number of retrieved chunks are being shown to the model, which seems to be the core issue.
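A back-of-the-envelope calculation illustrates the scale problem. The chunk size and retrieved-chunk count below are assumptions for illustration, not GPT4All's exact defaults:

```python
# Rough estimate of how little of a large document a retrieval query can surface.
# LINES_PER_CHUNK and RETRIEVED_CHUNKS are assumed values, not GPT4All's settings.
TOTAL_LINES = 90_000
LINES_PER_CHUNK = 10        # assumed chunking granularity
RETRIEVED_CHUNKS = 3        # assumed number of chunks injected per query

total_chunks = TOTAL_LINES // LINES_PER_CHUNK
coverage = RETRIEVED_CHUNKS / total_chunks

print(f"Total chunks: {total_chunks}")                       # 9000
print(f"Chunks seen per query: {RETRIEVED_CHUNKS}")          # 3
print(f"Document coverage per query: {coverage:.4%}")        # 0.0333%
```

Under these assumptions the model sees roughly 0.03% of the conversation per query, which would be consistent with it "skipping" most messages in a date range.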
Steps to Reproduce
Embed a large file (over 90,000 lines of conversation) using LocalDocs embedding in GPT4All.
Query the data using various models, including DeepSeek-R1-Distill-Qwen-7B, Llama 3 8B, Reasoner v1, and Mistral Instruct.
Ask specific questions about conversations within a date range or on a specific date.
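For comparison, date filtering like this is trivial outside the model. A minimal sketch, assuming each line of the file starts with an ISO date such as `2024-09-26 Kamil: ...` (the exact file format here is an assumption):

```python
from datetime import date

def messages_in_range(lines, start, end):
    """Return lines whose leading YYYY-MM-DD date falls within [start, end]."""
    hits = []
    for line in lines:
        try:
            d = date.fromisoformat(line[:10])  # first 10 chars: YYYY-MM-DD
        except ValueError:
            continue  # line does not start with a date; skip it
        if start <= d <= end:
            hits.append(line)
    return hits

log = [
    "2024-09-26 Kamil: are we still on for tomorrow?",
    "2024-09-27 Lana: yes, see you at noon",
    "2024-10-02 Kamil: running late",
]
print(messages_in_range(log, date(2024, 10, 1), date(2024, 10, 16)))
# -> ['2024-10-02 Kamil: running late']
```

A deterministic filter like this finds every matching message, which is exactly what the embedding-based retrieval fails to do here.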
Expected Behavior
The model should accurately retrieve and report the conversations within the specified date range or on the specific date, without skipping messages or returning incorrect information.
Your Environment
GPT4All version: 3.10.0
Operating System: macOS Sequoia 15.3.1
Chat model used (if applicable): DeepSeek-R1-Distill-Qwen-7B, Llama 3 8B, Reasoner v1, Mistral Instruct