Feature: Implemented an option to use DeepSeek Reasoner for intelligent semantic node re-ranking and retrieval #1400
Conversation
@danielaskdd Please review at your convenience. This is my second PR after the first successful one. The main reason I added this is that most KG models simply use vector similarity to pick the relevant nodes to answer the query; even Microsoft GraphRAG does the same. This change would make LightRAG different and more efficient!
Here are a few questions:
Providing the model name as a query parameter is not a very general approach: it suggests users can select any model from any provider, which does not reflect the actual situation.
@LarFii Please review the impact of the reranking approach on improving query results at your earliest convenience.
@danielaskdd Thanks for the question!

Why use a reasoning model for node re-ranking in knowledge graphs?

I chose a reasoning model specifically for knowledge graph node re-ranking because traditional vector similarity alone misses critical semantic relationships between nodes. In knowledge graphs, not all nodes are equal: some contain more valuable information or form critical junction points between concepts. The current approach in LightRAG simply ranks nodes by vector similarity scores, which only measures how "close" the node embedding is to the query embedding in vector space. The DeepSeek Reasoner can instead evaluate nodes on factors such as information richness, contextual relevance, and centrality to the query's needs.
Simply checking relationship depth (like counting how many edges a node has) isn't enough, because it doesn't consider the semantic importance of those relationships to the specific query.

Examples where node re-ranking improved knowledge graph queries

I've seen substantial improvements in knowledge graph exploration with this approach.
The key difference was that the reasoning model could understand which nodes contained the most relevant relationship information for answering the query, not just which ones had similar terminology.

Regarding model name as parameter

Good point! 😅 Yes, right now I'm passing the model name as a parameter, which isn't ideal. I did it this way initially because I was experimenting with different reasoning models to see which worked best for node evaluation. In reality, the current code is specifically designed for DeepSeek Reasoner, since I had to add special handling for its response format and chain-of-thought capabilities when analyzing node relationships and importance. I plan to address this in a follow-up update.
Thanks for pointing this out - it would definitely make the API cleaner and more honest about what's actually happening under the hood!
@danielaskdd Please see the sample responses. You can run these different responses against an AI judge like GPT to compare responses and see that hybrid response with reasoning stands out.
Why choose a reasoning model, not the v3 chat model instead? Did you compare the results with reranking models such as BGE or Jina?
I have a question: the thought process of reasoning models is quite time-consuming. Have you checked whether the problem scenario is actually suitable for this?
Why choose a reasoning model, not a v3 chat model or specialized reranker?

I chose a reasoning model specifically because node re-ranking requires evaluating complex relationships between information pieces, not just relevance scoring. While chat models like GPT-4o are powerful general-purpose models, they aren't explicitly optimized for multi-step reasoning about information relevance. Reasoning models like DeepSeek are fine-tuned for exactly that kind of multi-step relevance analysis.
I did explore specialized rerankers like BGE and Jina, which are excellent for traditional document retrieval. However, knowledge graph node re-ranking is different: it's not just about query-document relevance, but about understanding how nodes connect and complement each other, which these rerankers are typically not trained to evaluate.
DeepSeek Reasoner has shown exceptional performance on knowledge-intensive reasoning benchmarks, making it well suited for this task. That said, I'd be very interested in running a comparative analysis with these specialized rerankers in the future!

Regarding performance concerns

You're absolutely right that reasoning models add processing time - it's a legitimate tradeoff. That's precisely why I implemented this as an optional parameter rather than the default behavior, giving users the flexibility to choose between speed and accuracy based on their priorities.
In my work with legal and construction use cases, we've found users strongly prefer accuracy over speed when dealing with complex analytical questions. The additional 1-2 seconds is well worth the improved answer quality. For production systems, you could even implement adaptive behavior: use simple vector retrieval for straightforward queries and automatically enable reasoning re-ranking only for complex analytical questions where the extra processing time delivers meaningful improvements.
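The adaptive behavior described above could be sketched as a simple router. This is a hypothetical illustration, not code from the PR: the heuristic (analytical keywords plus query length) and the function names `needs_reasoning_rerank` and `retrieve` are assumptions for demonstration.

```python
# Hypothetical sketch: route simple queries to plain vector retrieval and
# complex analytical queries to reasoning-based re-ranking. The keyword/length
# heuristic below is an illustrative assumption, not part of the PR.

ANALYTICAL_MARKERS = {"why", "compare", "relationship", "impact", "explain", "how"}

def needs_reasoning_rerank(query: str, min_words: int = 12) -> bool:
    """Return True when the query looks complex enough to justify the
    extra 1-2 seconds of reasoning-model latency."""
    words = query.lower().split()
    has_marker = any(w.strip("?,.") in ANALYTICAL_MARKERS for w in words)
    return has_marker and len(words) >= min_words

def retrieve(query: str) -> str:
    # Dispatch to the appropriate pipeline (labels are placeholders).
    if needs_reasoning_rerank(query):
        return "vector_topk + reasoning_rerank"
    return "vector_topk"

print(retrieve("What is LightRAG?"))                     # simple lookup
print(retrieve("Explain the relationship between contract clauses 4 and 7 "
               "and their impact on liability"))          # analytical query
```

A production version would likely classify query complexity with a cheap model call rather than keywords, but the routing structure stays the same.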
Given that reranking nodes has the potential to improve query performance, how do you view the possibility of reranking edges to achieve similar benefits?
@danielaskdd that can definitely be achieved. I don't think that'd be a problem. Also, I have been seeing "query time" raised as an issue on the PR you mentioned and others, and I kinda disagree: time is not a factor for many applications. For example, in the law and M&A applications we work with, people index the entire document only once per day using LightRAG and answer some really critical pre-configured questions. Time is not a factor there, but accuracy is. That's why it's an OPTIONAL parameter, for people who care more about accuracy than the time taken to answer a query.
While there is another pull request (#1415) that addresses the topic of reranking, my primary concern is how to design a universal and robust reranking interface that is compatible with the RESTful API.
Yes, we can definitely look into it. But what's your general perception of the idea? My team would love to use an updated version with DeepSeek, which was the prime motivation for creating this PR. @danielaskdd
Description
I've added an optional reasoning-based node re-ranking feature to LightRAG that improves retrieval quality by using the DeepSeek Reasoner (R1) model to re-rank knowledge graph nodes based on their relevance to the query. This goes beyond simple vector similarity by incorporating deep semantic understanding and multi-hop reasoning when selecting nodes.

Note: This feature could be extended to use other reasoning models down the line.
Unlike the current approach of traditional cosine similarity, which only measures proximity in embedding space, this reasoning-based approach considers factors like information richness, contextual relevance, and centrality to the query's needs. This is particularly valuable for complex queries where the most similar embedding doesn't always contain the most useful information.
Current Flow (Vector Similarity Only): query → query embedding → top-k nodes by cosine similarity → context assembly → answer

New Implementation (With Reasoning Re-ranking): query → query embedding → top-k candidate nodes by cosine similarity → DeepSeek Reasoner re-ranks candidates by semantic relevance → context assembly → answer
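The re-ranking step can be pictured as a small function between vector retrieval and context assembly. This is a minimal sketch, not the PR's `_rerank_nodes_with_reasoning` implementation: the prompt wording, the JSON response format, and the `reasoning_complete` callable (standing in for a DeepSeek Reasoner client) are all assumptions.

```python
# Minimal sketch of reasoning-based node re-ranking. `reasoning_complete` is
# an assumed callable wrapping the reasoning-model endpoint (stubbed below);
# the prompt and JSON contract are illustrative, not the PR's exact code.
import json

def rerank_nodes(query, nodes, reasoning_complete, top_k=3):
    """Ask the reasoning model to score each candidate node for the query,
    then return the top_k nodes by score. Falls back to the original
    vector-similarity order if the response cannot be parsed."""
    listing = "\n".join(f"{i}: {n['description']}" for i, n in enumerate(nodes))
    prompt = (
        f"Query: {query}\n"
        f"Candidate nodes:\n{listing}\n"
        'Score each node 0-10 for relevance, information richness, and '
        'connectivity. Reply with JSON: {"scores": [...]}'
    )
    try:
        scores = json.loads(reasoning_complete(prompt))["scores"]
        order = sorted(range(len(nodes)), key=lambda i: -scores[i])
    except (json.JSONDecodeError, KeyError, IndexError, TypeError):
        order = list(range(len(nodes)))  # keep vector-similarity order
    return [nodes[i] for i in order[:top_k]]

# Stub standing in for the model call, for demonstration only.
def fake_reasoner(prompt):
    return '{"scores": [2, 9, 5]}'

nodes = [{"description": d} for d in ("tax law", "merger clause", "liability")]
top = rerank_nodes("M&A liability", nodes, fake_reasoner, top_k=2)
print([n["description"] for n in top])  # highest-scored nodes first
```

The fallback path matters in practice: if the model returns malformed JSON, the pipeline degrades gracefully to plain vector ordering instead of failing the query.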
Note: This feature is currently only available for `local` and `hybrid` modes.

Related Issues
No specific issue - this enhancement came from observing limitations in pure vector similarity retrieval when handling complex analytical queries.
Changes Made
- Added `use_reasoning_reranking` boolean flag to `QueryParam` to enable/disable reasoning-based re-ranking
- Added `reasoning_model_name` parameter to `QueryParam` to specify which reasoning model to use
- Added `_rerank_nodes_with_reasoning` function in `operate.py` to handle the reasoning-based re-ranking logic
- Added re-ranking prompts in `prompt.py`
- Added `deepseek_r1_complete` function to capture and return the reasoning chain of thought
- Added a demo script (`lightrag_reasoning_rerank_demo.py`) showing the feature in action

Checklist
Additional Notes
This feature addresses several practical use cases, such as legal, M&A, and construction document analysis, where answer accuracy matters more than query latency.
Usage is straightforward:
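A hedged illustration of the two new flags: the dataclass below only mirrors the `QueryParam` fields this PR adds (`use_reasoning_reranking` and `reasoning_model_name`) so the example is self-contained; the defaults and the `mode` field value shown are assumptions, and the actual class lives in LightRAG itself.

```python
# Illustrative stand-in for LightRAG's QueryParam, showing only the two
# fields added by this PR. Field names come from the PR; defaults are assumed.
from dataclasses import dataclass

@dataclass
class QueryParam:
    mode: str = "hybrid"                    # re-ranking applies to local/hybrid modes
    use_reasoning_reranking: bool = False   # off by default: opt-in tradeoff
    reasoning_model_name: str = "deepseek-reasoner"

# Opt in to reasoning-based re-ranking for an accuracy-critical query:
param = QueryParam(mode="hybrid", use_reasoning_reranking=True)
# rag.query("How do clauses 4 and 7 interact?", param=param)  # actual call in LightRAG
print(param.mode, param.use_reasoning_reranking)
```

Leaving `use_reasoning_reranking=False` preserves the existing vector-similarity behavior unchanged, so existing callers are unaffected.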