[Question]: Clarification on HyDE prompt, chunking, and LLM settings for Ultradomain Winrate (Table 1 reproduction) #1829
shshjhjh4455
started this conversation in
General
Replies: 0 comments
Your Question
Hi, I'm currently reproducing the Ultradomain winrate results in Table 1 of the LightRAG paper, particularly the comparisons between LightRAG, HyDE, and GraphRAG.
I successfully constructed the graph using LightRAG and evaluated winrate performance using the HyDE library. However, I'm seeing significant performance gaps when using HyDE under different configurations.
Specifically:
When using HyDE with the default prompt, the generated answers often contain hallucinations, and the resulting winrate is significantly lower than what's reported in the paper.
When modifying the prompt to explicitly restrict hallucination and enforce information-grounded answers, the winrate improves notably. (I'll attach both winrate evaluation graphs in this issue.)
To properly reproduce the experiment, could you clarify:
What prompt template was used with HyDE in Table 1?
What chunk size and chunking strategy were used when processing the Ultradomain documents?
Which language model (e.g., OpenAI GPT-4, Claude, GPT-4o, etc.) was used to generate answers in the HyDE evaluation?
Were any additional hyperparameters changed (e.g., top-k retrieval count, number of hypotheses n in HyDE generation)?
And similarly, what were the initial settings for GraphRAG in Table 1?
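For reference, this toy sketch shows how I understand the hyperparameters I am asking about (number of hypotheses n, top-k retrieval) to interact in HyDE-style retrieval. The `embed` and `generate` functions are placeholders, not the actual models or the HyDE library's API:

```python
# Toy sketch of HyDE-style retrieval with the hyperparameters in question:
# n hypothetical documents and top-k chunk retrieval. The embedding function
# is a deterministic placeholder, not a real encoder.

import math
import random

def embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic stand-in for a real text encoder."""
    rng = random.Random(sum(ord(c) for c in text))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def hyde_query_vector(question: str, generate, n: int = 8) -> list[float]:
    """Average the query embedding with n hypothetical-document embeddings,
    following the original HyDE formulation."""
    vecs = [embed(question)] + [embed(generate(question)) for _ in range(n)]
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]

def top_k(query_vec: list[float], chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks most cosine-similar to the query vector."""
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return num / den if den else 0.0
    return sorted(chunks, key=lambda c: cos(query_vec, embed(c)), reverse=True)[:k]
```

Knowing the actual values of n and k (and whether the query embedding was included in the average) would let me match the retrieval stage exactly.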
I'm attaching:
📊 Graph 1: HyDE default prompt, chunk size 128 → low winrate
📊 Graph 2: Custom hallucination-restricted prompt, same chunk size → improved winrate
📊 Table 1 From the Paper

📝 Full evaluation script with the adapted custom hallucination-restricted prompt: HyDE Evaluation Gist
Thank you in advance for your help — accurate reproduction of the original settings would be extremely valuable!