Hard to achieve high recall rate (recall@R) for huge workload #4190
Hi @EvagelineFEI, what does the data distribution look like for the smaller datasets compared with the larger ones? Is the data randomly sampled, and is its distribution uniform?
@EvagelineFEI how many query vectors are you using? All 10k from sift1b? And can you paste how you are computing recall@100?
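For context on why the recall code matters: "recall@R" is used in two different senses, and they give different numbers. In the TEXMEX/product-quantization convention it is the fraction of queries whose true nearest neighbor appears in the first R results; many HNSW benchmarks instead report the overlap between the R returned ids and the R true neighbors. A minimal sketch of both follows. The function names are hypothetical and the inputs are assumed to be per-query id lists ordered by distance (for example the `I` array returned by `index.search` and the rows of a ground-truth file); this is not code taken from the original post.

```python
def recall_at_r(found_ids, true_ids, r):
    """Fraction of queries whose true nearest neighbor is among the first r results."""
    hits = 0
    for found, truth in zip(found_ids, true_ids):
        # truth[0] is assumed to be the exact nearest neighbor of the query
        if truth[0] in list(found[:r]):
            hits += 1
    return hits / len(found_ids)


def intersection_recall(found_ids, true_ids, k):
    """Average overlap between the k returned ids and the k true nearest neighbors."""
    total = 0
    for found, truth in zip(found_ids, true_ids):
        total += len(set(found[:k]) & set(truth[:k]))
    return total / (len(found_ids) * k)


# Toy data: two queries, three results each (ids are made up for illustration).
found = [[1, 2, 3], [4, 5, 6]]
truth = [[1, 3, 9], [7, 4, 5]]
print(recall_at_r(found, truth, 3))
print(intersection_recall(found, truth, 3))
```

On the toy data the first measure counts one hit out of two queries, while the second counts four shared ids out of six, so it is worth stating which definition a reported 0.93 refers to.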
This actually looks like a classic case of what we call the High-Recall Tradeoff Loop (see WFGY ProblemMap No.6). Once you push HNSW toward the 0.93–0.95 recall ceiling on large-scale datasets (like 10M vectors), you often enter a zone where:
We've encountered this in several benchmark cases. A few suggestions that helped us break through:
Let me know if you're curious. We've actually open-sourced a semantic controller that handles this under an MIT license. Happy to share more if needed.

Hello team,
I am using faiss.IndexHNSWFlat, and my dataset is sift1b from http://corpus-texmex.irisa.fr/.
I added 10M vectors to the index and then tuned the parameters (M, efc, efs) to improve the recall rate (recall@100).
I have tried many combinations (M from 5 to 500), but my best result is stuck at 0.93.
My recall calculation code does not seem to contain an error, and results are better for smaller datasets (e.g. 1M or 2M, where recall can reach 0.96 or higher). What could be the problem?
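For anyone trying to reproduce this kind of sweep, here is a minimal sketch of how the three parameters map onto the Faiss Python API: M is the second constructor argument, and efc/efs correspond to `index.hnsw.efConstruction` and `index.hnsw.efSearch`. It assumes `faiss` and `numpy` are installed, uses small synthetic random vectors rather than sift1b, and computes exact ground truth on the same base set with `IndexFlatL2`. The function name `sweep_ef_search` and all default sizes are hypothetical, not the poster's setup.

```python
def sweep_ef_search(n_base=20000, n_query=100, d=32, k=10, M=32,
                    ef_construction=200, ef_values=(16, 64, 256), seed=0):
    """Return {efSearch: overlap recall@k} for an IndexHNSWFlat on random data."""
    # Imported lazily so the sketch can be loaded without faiss installed.
    import numpy as np
    import faiss

    rng = np.random.default_rng(seed)
    xb = rng.random((n_base, d), dtype=np.float32)   # synthetic base vectors
    xq = rng.random((n_query, d), dtype=np.float32)  # synthetic queries

    # Exact neighbors, computed on the same base set that the HNSW index holds.
    flat = faiss.IndexFlatL2(d)
    flat.add(xb)
    _, gt = flat.search(xq, k)

    index = faiss.IndexHNSWFlat(d, M)
    index.hnsw.efConstruction = ef_construction  # only affects vectors added afterwards
    index.add(xb)

    results = {}
    for ef in ef_values:
        index.hnsw.efSearch = ef  # search-time setting, no rebuild needed
        _, ids = index.search(xq, k)
        overlap = sum(len(set(a) & set(b))
                      for a, b in zip(ids.tolist(), gt.tolist()))
        results[ef] = overlap / (n_query * k)
    return results
```

Calling `sweep_ef_search()` returns one recall value per efSearch setting. Because the ground truth here comes from the same vectors that were indexed, any gap is due to the graph search itself rather than to a mismatch between the indexed subset and the ground-truth file.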
Some of my machine info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
CPU family: 6
Model: 79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
Stepping: 1
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.00