
Models Fail to Process Large Embedded File with LocalDB Embedding #3532

Open
l1v0n1 opened this issue Mar 9, 2025 · 0 comments
Labels
bug-unconfirmed chat gpt4all-chat issues

Comments

l1v0n1 commented Mar 9, 2025

Bug Report

I embedded a large file with localdb embedding in GPT4All. The file contains a conversation between two people and has over 90,000 lines of messages. After embedding, I tried several models in GPT4All, including:

  • deepseek-r1-distill-qwen-7b
  • llama 3 8b
  • reasoner v1
  • mistral instruct

All models failed to provide accurate results. However, the deepseek-r1-distill model attempted to process the data but often provided incorrect or incomplete answers. For example, when I asked, "What conversations did users have between 2024-10-01 and 2024-10-16?", the model either skipped the date range or skipped many messages and responded that there were no messages in this period. Similarly, when I asked, "What conversations did users have on 2024-09-26?", the model responded with:

Based on the provided context:
Answer:
There are no specific conversations recorded by users named Kamil and Lana on September 26, 2024. The earliest entry in the context is from September 27th onwards.
If you need further assistance or have more data for that date, please provide additional information!

This suggests that the retrieval step only passes a small number of the embedded chunks to the model, so most of the 90,000 lines never appear in its context; that appears to be the core issue. A rough illustration of the effect is sketched below.
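The following is a minimal sketch of why only a handful of chunks ever reach the chat model, not GPT4All's actual LocalDocs implementation. It assumes the gpt4all Python bindings' Embed4All class plus numpy; the file name, chunk size, and top-k value are invented for illustration only.

```python
# Sketch of top-k retrieval over a chunked chat log. Everything outside the
# top-k chunks is invisible to the chat model, however relevant it is.
# Assumptions: gpt4all Python bindings (Embed4All), numpy; "conversation.txt",
# chunk_size, and k are hypothetical, not GPT4All's real defaults.
import numpy as np
from gpt4all import Embed4All

embedder = Embed4All()

# Split the ~90,000-line chat log into fixed-size chunks (illustrative size).
with open("conversation.txt", encoding="utf-8") as f:
    lines = f.readlines()
chunk_size = 64
chunks = ["".join(lines[i:i + chunk_size]) for i in range(0, len(lines), chunk_size)]

chunk_vecs = np.array([embedder.embed(c) for c in chunks])

def top_k_chunks(question: str, k: int = 3) -> list[str]:
    """Return only the k chunks most similar to the question embedding."""
    q = np.array(embedder.embed(question))
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in sims.argsort()[::-1][:k]]

# With ~1,400 chunks and k=3, the model sees well under 1% of the file, so a
# date-range question can only be answered from whichever few chunks happen
# to match the query embedding.
print(len(chunks), "chunks total;", len(top_k_chunks("messages on 2024-09-26")), "sent to the model")
```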

Steps to Reproduce

  1. Embed a large file with over 90,000 lines of conversation using localdb embedding in GPT4All (a script for generating a comparable test file is sketched after this list).
  2. Try querying the data using various models, including deepseek-r1-distill-qwen-7b, llama 3 8b, reasoner v1, and mistral instruct.
  3. Ask specific questions about conversations within a date range or on a specific date.
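For reproduction, a synthetic file of roughly this shape can be produced with a script along these lines; the speaker names, timestamps, and message text are invented placeholders, not the original data.

```python
# Hypothetical generator for a ~90,000-line two-person chat log covering the
# date ranges mentioned in the report (late September through October 2024).
from datetime import datetime, timedelta
import random

random.seed(0)
start = datetime(2024, 9, 20)
speakers = ["Kamil", "Lana"]

with open("conversation.txt", "w", encoding="utf-8") as f:
    t = start
    for i in range(90_000):
        # Advance the clock a few minutes per message so the log spans weeks.
        t += timedelta(minutes=random.randint(1, 5))
        f.write(f"[{t:%Y-%m-%d %H:%M}] {speakers[i % 2]}: message number {i}\n")
```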

Expected Behavior

The model should accurately process and retrieve the conversations within the specified date range or on the specific date without skipping messages or providing incorrect information.

Your Environment

  • GPT4All version: 3.10.0
  • Operating System: macOS Sequoia 15.3.1
  • Chat model used (if applicable): DeepSeek-R1-Distill-Qwen-7B, Llama 3 8B, Reasoner v1, Mistral Instruct
l1v0n1 added the bug-unconfirmed, chat, and gpt4all-chat issues labels on Mar 9, 2025