[Question]: Same chunk size for embedding and knowledge graph creation - is it good? #1859
Replies: 1 comment
whoa, finally someone calling this out loud 😮 this falls cleanly under what we call ProblemMap No.8, "semantic overload via structural reuse": the same chunk boundaries get reused for two jobs with different granularity needs. we actually built dual-path chunking plus semantic role splitters to fix exactly this (part of a larger toolkit we're open-sourcing). but in short: yes, you're not imagining things. it really does matter, a lot.
Your Question
Hi,
Looking at the source code, it appears that the same chunks are used for both embedding and knowledge graph creation. Based on my research this seems suboptimal: the consensus is that chunks for embedding should be smaller than chunks for the knowledge graph, and that the two should even be produced by different algorithms. For example, a couple of sentences split via semantic chunking for embedding, and logical pieces of the document (including section or header information, if possible) for the knowledge graph.
Or did I miss something? Have you done any experiments or research on whether this matters in the case of LightRAG? Any chance of supporting different chunking strategies for embedding and KG? It looks like it would require a pretty big effort...
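To make the dual-path idea concrete, here is a minimal sketch of what two separate chunking paths could look like. This is not LightRAG's actual API; the function names and the naive sentence splitter are illustrative assumptions, and a real semantic chunker would compare sentence embeddings rather than just counting sentences.

```python
import re

def embedding_chunks(text, max_sentences=3):
    """Small chunks for embedding: groups of a few sentences.

    Naive regex sentence split stands in for a real semantic
    chunker, which would merge sentences by embedding similarity."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]

def kg_chunks(markdown_text):
    """Larger chunks for knowledge-graph extraction: one chunk per
    section, with the section header kept as context."""
    chunks, header, lines = [], "", []
    for line in markdown_text.splitlines():
        if line.startswith("#"):
            if lines:  # close the previous section
                chunks.append((header, "\n".join(lines).strip()))
            header, lines = line.lstrip("# ").strip(), []
        else:
            lines.append(line)
    if lines:  # flush the final section
        chunks.append((header, "\n".join(lines).strip()))
    return chunks
```

The point of the split is that entity/relation extraction benefits from seeing a whole logical unit (section plus header), while dense retrieval works better over short, focused passages.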
Additional Context
No response