How does "Deduplication to Optimize Graph Operation D(.)" identify duplicate nodes? #1526

Answered by reqyou
ntsarb asked this question in Q&A

I believe the answer can be found directly in the code. In operate.py, look at the functions merge_nodes_and_edges and _merge_nodes_then_upsert. The logic is as follows:

  1. Collect all nodes by entity_name:

    all_nodes[entity_name].extend(entities)

  2. Merge and upsert logic:

    • If the node already exists: append its description to already_description, then either:
      • Join the deduplicated descriptions:
        GRAPH_FIELD_SEP.join(sorted(set([dp["description"] for dp in nodes_data] + already_description)))
      or
      • If num_fragment >= force_llm_summary_on_merge, generate a new summary using:
        summary = await use_llm_func_with_cache(...)
    • If the node doesn’t exist: create it and, either way, merge the description or wr…
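
The two steps above can be sketched roughly as follows. This is a simplified illustration, not the actual code from operate.py: the helper names group_by_name, merge_node and fake_llm_summary, the dict used as a stand-in graph store, the "<SEP>" separator value, the threshold of 3, and the choice to split an existing description on the separator are all assumptions made for the example.

```python
import asyncio
from collections import defaultdict

GRAPH_FIELD_SEP = "<SEP>"  # assumed separator value, for illustration only


async def fake_llm_summary(descriptions):
    """Hypothetical stand-in for the cached LLM summarization call."""
    return "SUMMARY(" + " | ".join(descriptions) + ")"


def group_by_name(extracted):
    """Step 1: collect all extracted entities under their entity_name."""
    all_nodes = defaultdict(list)
    for entity in extracted:
        all_nodes[entity["entity_name"]].append(entity)
    return all_nodes


async def merge_node(entity_name, nodes_data, graph, force_llm_summary_on_merge=3):
    """Step 2: merge new descriptions with any existing node, then upsert."""
    already_description = []
    existing = graph.get(entity_name)  # graph is a plain dict in this sketch
    if existing is not None:
        # Assumption: a stored description is a separator-joined string.
        already_description.extend(existing["description"].split(GRAPH_FIELD_SEP))

    # Identical descriptions collapse here via set(); sorted() keeps it stable.
    fragments = sorted(
        set([dp["description"] for dp in nodes_data] + already_description)
    )
    num_fragment = len(fragments)

    if num_fragment >= force_llm_summary_on_merge:
        description = await fake_llm_summary(fragments)
    else:
        description = GRAPH_FIELD_SEP.join(fragments)

    graph[entity_name] = {"entity_name": entity_name, "description": description}
    return graph[entity_name]


async def demo():
    graph = {}
    extracted = [
        {"entity_name": "ALICE", "description": "An engineer"},
        {"entity_name": "ALICE", "description": "An engineer"},
        {"entity_name": "ALICE", "description": "Works at Acme"},
        {"entity_name": "BOB", "description": "A manager"},
    ]
    for name, nodes in group_by_name(extracted).items():
        await merge_node(name, nodes, graph)
    return graph


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch, two nodes count as duplicates only when their entity_name strings match exactly; the descriptions are then deduplicated by value before being joined or summarized.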

Answer selected by ntsarb