[Question]: Same chunk size for embedding and knowledge graph creation - is it good? #1859
Replies: 1 comment
whoa, finally someone calling this out loud 😮 this falls cleanly under what we call ProblemMap No.8, "semantic overload via structural reuse": the same chunk boundaries get reused for two jobs with different granularity needs. we actually built dual-path chunking plus semantic role splitters to fix exactly this (part of a larger toolkit we're open-sourcing). but in short: yes, you're not imagining things. it really does matter, a lot.
Your Question
Hi,
Looking at the source code, it appears that the same chunks are used for both embedding and knowledge graph creation. Based on my research this seems suboptimal: the consensus is that chunks for embedding should be smaller than chunks for the knowledge graph, and that the two should even be produced by different algorithms. For example, a couple of sentences split via semantic chunking for embedding, and logical pieces of the document (including section or header information, if possible) for the knowledge graph.
Or did I miss something? Have you done any experiments or research on whether this matters in the case of LightRAG? Any chance of supporting different chunking strategies for embedding and KG? It looks like it would require a pretty big effort...
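To make the dual-path idea concrete, here is a minimal sketch of what two separate chunking paths could look like. This is not LightRAG's actual API; the function names and the naive sentence splitter are illustrative assumptions, and a real semantic chunker would compare sentence embeddings rather than just counting sentences.

```python
import re

def embedding_chunks(text, max_sentences=3):
    """Small chunks for embedding: groups of a few sentences.

    Naive regex sentence split stands in for a real semantic
    chunker, which would merge sentences by embedding similarity."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]

def kg_chunks(markdown_text):
    """Larger chunks for knowledge-graph extraction: one chunk per
    section, with the section header kept as context."""
    chunks, header, lines = [], "", []
    for line in markdown_text.splitlines():
        if line.startswith("#"):
            if lines:  # close the previous section
                chunks.append((header, "\n".join(lines).strip()))
            header, lines = line.lstrip("# ").strip(), []
        else:
            lines.append(line)
    if lines:  # flush the final section
        chunks.append((header, "\n".join(lines).strip()))
    return chunks
```

The point of the split is that entity/relation extraction benefits from seeing a whole logical unit (section plus header), while dense retrieval works better over short, focused passages.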
Additional Context
No response