QueryParam to format retrieved context #1725
Replies: 1 comment
Yeah, this hits a very real pain point. I’ve run into the same issue: when the retrieved context comes back as one big blob (even if it’s technically a dict), it becomes impossible to trace the semantic boundary of each document. Attribution, scoring, even just reranking all get mushy.

At one point, I started treating the retrieved context more like a stream of “semantic packets”, where each packet tries to preserve internal coherence and is ID-tagged. But to even attempt that, I had to override the default return format and rebuild a little formatting layer on top. Tedious, but worth it.

A built-in queryParam for controlling that structure would be a massive win. It wouldn’t even need to be fancy: just give us the option to get back a list of strings, so downstream models don’t hallucinate over mangled document boundaries.

+1 on this. It’s small, but foundational.
Summary
It would be beneficial to introduce a queryParam (or similar configuration) in LightRAG that allows developers to control the format of the retrieved context, specifically to return it as a list of strings, where each string represents a retrieved document.
Motivation
In many downstream applications, particularly those involving grounding, retrieval-augmented generation (RAG), or knowledge graph integration, it is important to maintain clear document boundaries in the retrieved context. Currently, the context is returned as a single dictionary, which:
- Makes post-processing, such as attribution, scoring, or reranking, more error-prone
- Breaks compatibility with structured LLM inputs that expect a list of discrete documents
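
To make the request concrete, here is a rough sketch of the kind of formatting layer one currently has to write by hand. It is not LightRAG's API: the shape of the input dict (section names mapping to lists of strings or dicts with a `content` field) and the names `Packet`, `split_context`, and `as_documents` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Packet:
    """One retrieved document with a stable ID (hypothetical structure)."""
    doc_id: str
    text: str


def split_context(context: dict[str, list[Any]]) -> list[Packet]:
    """Flatten a context dict into ID-tagged packets, one per document.

    Assumption: each key names a section and maps to a list of items,
    where an item is either a plain string or a dict with a "content" field.
    """
    packets: list[Packet] = []
    for section, items in context.items():
        for index, item in enumerate(items):
            text = item["content"] if isinstance(item, dict) else str(item)
            text = text.strip()
            if text:  # skip empty entries so they never become documents
                packets.append(Packet(doc_id=f"{section}-{index}", text=text))
    return packets


def as_documents(context: dict[str, list[Any]]) -> list[str]:
    """Return the list-of-strings view: one string per retrieved document."""
    return [packet.text for packet in split_context(context)]
```

With something like this, rerankers and attribution code can work on one document at a time and refer back to it by `doc_id`. A built-in option would make this extra layer unnecessary.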