[BUG] NanoGraphRag / KeyError: '7' #451

Open
vipervs opened this issue Nov 1, 2024 · 1 comment
Labels
bug Something isn't working

Comments

vipervs commented Nov 1, 2024

Description

I got the following error when running a simple QA with NanoGraphRAG:
Model: GPT-4o-mini

User-id: 1, can see public conversations: True
Session reasoning type None
Session LLM openai
Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Reasoning state {'app': {'regen': False}, 'pipeline': {}}
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x306b46c50>, FSPath=PosixPath('/Users/andi/kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x306b47a30>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x331cd0520>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x331cd1300>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x331f94dc0>), mmr=False, rerankers=[CohereReranking(cohere_api_key='WDIdNCKpcA7TlUc4y0IpjisPdNSPdZV8p7kXOrxI', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), NanoGraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x102dce230>, FSPath=<theflow.base.unset_ object at 0x102dce230>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x102dce230>, VS=<theflow.base.unset_ object at 0x102dce230>, file_ids=['bac8649f-72af-44e6-b4c6-91f218d6d6a9'], user_id=<theflow.base.unset_ object at 0x102dce230>)]
searching in doc_ids []
INFO:ktem.index.file.pipelines:Skip retrieval because of no selected files: DocumentRetrievalPipeline(
(vector_retrieval): <function Function._prepare_child.<locals>.exec at 0x331c1dfc0>
(embedding): <function Function._prepare_child.<locals>.exec at 0x331c1df30>
)
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
GraphRAG embedding dim 3072
INFO:nano-graphrag:Load KV full_docs with 0 data
INFO:nano-graphrag:Load KV text_chunks with 0 data
INFO:nano-graphrag:Load KV llm_response_cache with 0 data
INFO:nano-graphrag:Load KV community_reports with 0 data
INFO:nano-graphrag:Loaded graph from /Users/andi/kotaemon/ktem_app_data/user_data/files/nano_graphrag/d897887f-bb79-42f5-aabd-d398b9a7f669/input/graph_chunk_entity_relation.graphml with 290 nodes, 188 edges
INFO:nano-vectordb:Load (276, 3072) data
INFO:nano-vectordb:Init {'embedding_dim': 3072, 'metric': 'cosine', 'storage_file': '/Users/andi/kotaemon/ktem_app_data/user_data/files/nano_graphrag/d897887f-bb79-42f5-aabd-d398b9a7f669/input/vdb_entities.json'} 276 data
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Traceback (most recent call last):
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/queueing.py", line 575, in process_events
response = await route_utils.call_process_api(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
result = await self.call_function(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await utils.async_iteration(iterator)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
return await iterator.__anext__()
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
return await anyio.to_thread.run_sync(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
return await future
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
result = context.run(func, *args)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
return next(iterator)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
response = next(iterator)
File "/Users/andi/kotaemon/libs/ktem/ktem/pages/chat/__init__.py", line 899, in chat_fn
for response in pipeline.stream(chat_input, conversation_id, chat_history):
File "/Users/andi/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 705, in stream
docs, infos = self.retrieve(message, history)
File "/Users/andi/kotaemon/libs/ktem/ktem/reasoning/simple.py", line 503, in retrieve
retriever_docs = retriever_node(text=query)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/base.py", line 1097, in __call__
raise e from None
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/base.py", line 1088, in __call__
output = self.fl.exec(func, args, kwargs)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/backends/base.py", line 151, in exec
return run(*args, **kwargs)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/middleware.py", line 144, in __call__
raise e from None
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/middleware.py", line 141, in __call__
_output = self.next_call(*args, **kwargs)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/middleware.py", line 117, in __call__
return self.next_call(*args, **kwargs)
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/theflow/base.py", line 1017, in _runx
return self.run(*args, **kwargs)
File "/Users/andi/kotaemon/libs/ktem/ktem/index/file/graph/nano_pipelines.py", line 355, in run
entities, relationships, reports, sources = asyncio.run(
File "/opt/homebrew/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/Users/andi/kotaemon/libs/ktem/ktem/index/file/graph/nano_pipelines.py", line 142, in nano_graph_rag_build_local_query_context
use_communities = await _find_most_related_community_from_entities(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/nano_graphrag/_op.py", line 698, in _find_most_related_community_from_entities
related_community_keys = sorted(
File "/Users/andi/kotaemon/venv/lib/python3.10/site-packages/nano_graphrag/_op.py", line 702, in <lambda>
related_community_datas[k]["report_json"].get("rating", -1),
KeyError: '7'
INFO:httpx:HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
User-id: 1, can see public conversations: True


The main issue is the KeyError: '7', raised inside the _find_most_related_community_from_entities function of the nano_graphrag module. The sort key looks up related_community_datas[k] for every candidate community key, and the key '7' is not in that dictionary. The log above also shows community_reports loaded with 0 data, which is consistent with no report being stored for that community.

Here’s what could be contributing to this problem:

1. Missing Data in Dictionary: related_community_datas has no entry for the key '7', so related_community_datas[k] raises the KeyError before "report_json" is ever read. This is the failure the traceback shows.
2. Incomplete or Incorrect Data Structure: Entries that exist but lack "report_json" would break the same expression, although the error would then read KeyError: 'report_json' rather than '7'.
3. Data Retrieval Logic: The lambda assumes that every key being sorted has an entry with a "report_json" field. Only the "rating" lookup has a default (-1); the two lookups before it do not.
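To make the distinction concrete, here is a minimal sketch that mirrors the expression from the traceback. The two dictionaries are made-up stand-ins, not data from nano-graphrag, and the helper names are hypothetical:

```python
# Hypothetical stand-ins for the two dictionaries used in the sort key.
related_community_keys_counts = {"3": 2, "7": 1}
related_community_datas = {"3": {"report_json": {"rating": 8.0}}}  # no entry for "7"


def failing_key(k):
    # Same shape as the expression in the traceback: the outer [k] lookup runs first.
    return (
        related_community_keys_counts[k],
        related_community_datas[k]["report_json"].get("rating", -1),
    )


def missing_key_name(keys):
    """Return the key named by the KeyError, or None if sorting succeeds."""
    try:
        sorted(keys, key=failing_key, reverse=True)
    except KeyError as exc:
        return exc.args[0]
    return None


print(missing_key_name(related_community_keys_counts.keys()))
```

Under these assumptions the error names the community key itself, which matches the KeyError: '7' in the log.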

How to Address This Issue:

• Check Data Integrity: Log the keys of related_community_datas and compare them with the community keys referenced by the retrieved entities, to see which reports are missing and how the existing ones are structured.
• Handle Missing Keys Gracefully: Skip keys that have no entry, or use .get() with default values at each lookup, so that a missing report does not raise a KeyError.
• Review Data Loading: Ensure that the data loaded into related_community_datas is complete and consistent with what the query expects. This might involve reviewing how the community reports are generated or loaded before they are processed.
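The "handle missing keys" suggestion could look roughly like the sketch below. It is an illustration only, not the nano-graphrag implementation: rank_communities and its arguments are hypothetical names, and it assumes the same (count, rating) ordering that appears in the traceback.

```python
def rank_communities(counts, datas):
    """Sort community keys by (count, rating), skipping keys with no report.

    counts: dict mapping community key -> number of related entities
    datas:  dict mapping community key -> report dict (may miss keys or hold None)
    Returns (ranked_keys, skipped_keys).
    """

    def rating_of(k):
        report = datas[k].get("report_json")
        # Treat a missing or malformed report_json as "no rating".
        if not isinstance(report, dict):
            return -1
        return report.get("rating", -1)

    known = [k for k in counts if isinstance(datas.get(k), dict)]
    skipped = [k for k in counts if k not in known]
    ranked = sorted(known, key=lambda k: (counts[k], rating_of(k)), reverse=True)
    return ranked, skipped


# Made-up example data: community "7" has no report at all.
counts = {"3": 2, "7": 1, "5": 2}
datas = {"3": {"report_json": {"rating": 8.0}}, "5": {"report_json": {}}}
ranked, skipped = rank_communities(counts, datas)
print("ranked:", ranked, "skipped:", skipped)
```

Returning the skipped keys instead of silently dropping them keeps the missing reports visible, which also helps with the data integrity check.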

Reproduction steps

1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

Screenshots

![DESCRIPTION](LINK.png)

Logs

No response

Browsers

No response

OS

No response

Additional information

No response

@vipervs vipervs added the bug Something isn't working label Nov 1, 2024
@taprosoft
Collaborator

@vipervs this seems to be a nano-graphrag-specific issue. Sometimes I observe that JSON community generation can be funky when not using larger LLMs (GPT-4o). Please also raise this, along with your request and model configuration, at https://github.com/gusye1234/nano-graphrag/.

2 participants