
[BUG] Accuracy is very high when chatting with a single document but very low when chatting with two files, even though the information panel shows the most relevant content #460

Open
sandbury opened this issue Nov 4, 2024 · 2 comments
Labels
bug Something isn't working

Comments


sandbury commented Nov 4, 2024

Description

The accuracy when chatting with a single document is very high, but when chatting with two files the accuracy is very low, even though the information panel shows the most relevant content. I built my local RAG with Ollama.

Reproduction steps

1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

Screenshots

![DESCRIPTION](LINK.png)

Logs

Session reasoning type None
Session LLM None
Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Reasoning state {'app': {'regen': False}, 'pipeline': {}}
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x7f8dc0316b30>, FSPath=PosixPath('/code/kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x7f8dc0316dd0>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f8dbc4dee60>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f8dbc4def80>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x7f8dbc4df070>), mmr=False, rerankers=[CohereReranking(cohere_api_key='', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x7f8dda14e320>, FSPath=<theflow.base.unset_ object at 0x7f8dda14e320>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x7f8dda14e320>, VS=<theflow.base.unset_ object at 0x7f8dda14e320>, file_ids=[], user_id=<theflow.base.unset_ object at 0x7f8dda14e320>)]
searching in doc_ids ['1bc5ea48-2e16-4ed9-8df7-83a95e111bf7', 'a8b87f79-bb05-483a-bfa6-77e4b491ae60']
retrieval_kwargs: dict_keys(['do_extend', 'scope', 'filters'])
Number of requested results 100 is greater than number of elements in index 21, updating n_results = 21
Got 6 from vectorstore
Got 0 from docstore
Cohere API key not found. Skipping reranking.
Got raw 6 retrieved documents
thumbnail docs 0 non-thumbnail docs 6 raw-thumbnail docs 0
retrieval step took 0.563899040222168
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Got 6 retrieved documents
len (original) 4494
len (trimmed) 4494
Got 0 images
Trying LLM streaming
CitationPipeline: invoking LLM
LLM rerank scores [1.0, 0.3, 0.3, 0.2, 0.2, 0.2]
CitationPipeline: finish invoking LLM
Got 0 cited docs
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
User-id: 1, can see public conversations: True

Browsers

No response

OS

No response

Additional information

No response

sandbury added the bug label Nov 4, 2024
taprosoft (Collaborator) commented

@sandbury This is probably because Ollama's default context size is 2048 (ollama/ollama#1005).
This means that even if the retrieved documents are correct, documents at the end of the context are cropped to the model's context size and may produce a less meaningful result.
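
As a possible workaround (a minimal sketch, not kotaemon's own configuration): Ollama lets you raise the context window per request through the `num_ctx` option on its native `/api/generate` endpoint. The endpoint, port, and option name are Ollama's documented API; the model name `llama3` and the window size 8192 are assumptions to adapt to your setup.

```python
import requests

# Sketch: request a larger context window from a local Ollama server.
# num_ctx defaults to 2048, so long multi-document prompts get cropped.
resp = requests.post(
    "http://localhost:11434/api/generate",  # default Ollama endpoint
    json={
        "model": "llama3",  # assumption: substitute your local model
        "prompt": "Answer using the retrieved passages ...",
        "stream": False,
        "options": {"num_ctx": 8192},  # raise the window above the 2048 default
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```

The same override can also be baked into the model with a Modelfile line such as `PARAMETER num_ctx 8192`, so every client using that model gets the larger window.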


QuangTQV commented Nov 4, 2024

> @sandbury This is probably because Ollama's default context size is 2048 (ollama/ollama#1005). This means that even if the retrieved documents are correct, documents at the end of the context are cropped to the model's context size and may produce a less meaningful result.

Do you have documentation explaining how you load data, perform chunking, and retrieve text, tables, and images?
