You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When searching with a large value of k (1000+), collecting the top-k results becomes dominant in the search time compared to exploring the index and computing distances.
Symptoms
For example, for an IVF200,Flat in 64D with 100k vectors and nprobe=10 we get something like (1 thread search):
The ivf: n1/n2 is n1=number of times the top-k structure was updated (cost log(k)), n2 = number of distances computed.
ie. returning 5k results is 20x slower than returning 1 result (Admittedly, this is an extreme case since we basically ask to return 5% of the dataset). The -1s correspond to empty results (that were not filled in during search).
heap and reservoir
The approach used in Faiss is to collect the results in a reservoir rather than a heap.
The reservoir is a table of size 2k (or some other factor) that is filled unconditionally with results. When it is full, we compute the median distance (complexity O(k)) and drop all results above the median.
This ought to be cheaper than the log(k) update cost of the heap.
When the scanning is finished, the top-k results out of the 2k are returned.
Note that both the heap and the reservoir, when they contain more than k results, are "guarded" by a threshold that rejects any distance worse than the current k-th result.
Current reservoir implementation
In fact, the reservoir is implemented only for the "flat" datasets, as ReservoirTopN.
It is enabled when k is larger than distance_compute_min_k_reservoir (a global variable set to 100 by default).
When searching on a flat index with the same config we get:
distance_compute_min_k_reservoir=100
RH: indicates the number of reservoir updates / total number of calls to the update function
Observations:
the break-down point between the two in speed is more around k=500 than 100. Above that the reservoir is indeed faster
the gap between k=1 and k=5000 is much higher for the Flat index than the IVFFlat index (20 s instead of 9s). This can be explained by looking at the number of top-k structure updates: for the IVF it is 50M, for Flat it is 280M. This is because (1) IVF scans a much smaller fraction of the dataset so statistically fewer updates and (2) it processes inverted lists from near to far, so the best vectors are processed first, then those that are less likely to be in the top-k.
What can be improved?
We could implement the reservoir for IVF. From the results above it would be around 25% faster.
This requires some re-engineering. In particular, the InvertedListScanner would take a result collector instead (or as an alternative) of the current labels and distances.
A broader approach would be to consider that this is similar to filtered search: the top-k is a "filter" over all the computed distances.
Therefore, at search time for IVF (or graph indexes) we would collect all results. In the end, we'd run a parition sort to keep only the results. This is an extreme case of reservoir. It is not clear without implementing how much faster this would be compared to the reservoir approach....
This discussion was converted from issue #4570 on September 08, 2025 17:31.
Heading
Bold
Italic
Quote
Code
Link
Numbered list
Unordered list
Task list
Attach files
Mention
Reference
Menu
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
When searching with a large value of k (1000+), collecting the top-k results becomes dominant in the search time compared to exploring the index and computing distances.
Symptoms
For example, for an IVF200,Flat in 64D with 100k vectors and nprobe=10 we get something like (1 thread search):
The ivf: n1/n2 is n1=number of times the top-k structure was updated (cost log(k)), n2 = number of distances computed.
ie. returning 5k results is 20x slower than returning 1 result (Admittedly, this is an extreme case since we basically ask to return 5% of the dataset). The -1s correspond to empty results (that were not filled in during search).
heap and reservoir
The approach used in Faiss is to collect the results in a reservoir rather than a heap.
The reservoir is a table of size 2k (or some other factor) that is filled unconditionally with results. When it is full, we compute the median distance (complexity O(k)) and drop all results above the median.
This ought to be cheaper than the log(k) update cost of the heap.
When the scanning is finished, the top-k results out of the 2k are returned.
Note that both the heap and the reservoir, when they contain more than k results, are "guarded" by a threshold that rejects any distance worse than the current k-th result.
Current reservoir implementation
In fact, the reservoir is implemented only for the "flat" datasets, as
ReservoirTopN.It is enabled when k is larger than
distance_compute_min_k_reservoir(a global variable set to 100 by default).When searching on a flat index with the same config we get:
distance_compute_min_k_reservoir=100
distance_compute_min_k_reservoir=10000
RH: indicates the number of reservoir updates / total number of calls to the update function
Observations:
What can be improved?
We could implement the reservoir for IVF. From the results above it would be around 25% faster.
This requires some re-engineering. In particular, the
InvertedListScannerwould take a result collector instead (or as an alternative) of the current labels and distances.A broader approach would be to consider that this is similar to filtered search: the top-k is a "filter" over all the computed distances.
Therefore, at search time for IVF (or graph indexes) we would collect all results. In the end, we'd run a parition sort to keep only the results. This is an extreme case of reservoir. It is not clear without implementing how much faster this would be compared to the reservoir approach....
Beta Was this translation helpful? Give feedback.
All reactions