Fix diskbbq flush logic #131470

Merged
merged 1 commit into from
Jul 17, 2025

Conversation

benwtrent
Member

I accidentally broke recall on flush by allowing vectors to be double-quantized. Additionally, we shouldn't use the first vector as a centroid; this can harm recall significantly when there is just one centroid.

Recall before this change:

index_name                             index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------------  ----------  --------  --------------  --------------------  ------------
corpus-dbpedia-entity-E5-small-0.fvec         ivf   1000000           25820                     0            14
corpus-dbpedia-entity-E5-small-0.fvec         ivf   1000000               0                 41693             0

index_name                             index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
-------------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------
corpus-dbpedia-entity-E5-small-0.fvec         ivf       50        13.05              0.00           0.00   76.61    0.63  285267.44                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      150        31.92              0.00           0.00   31.33    0.68  629033.22                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      200        34.79              0.00           0.00   28.74    0.69  679699.13                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      500        39.40              0.00           0.00   25.38    0.71  794375.05                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf     1000        45.99              0.00           0.00   21.74    0.72  940493.52                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf       50         1.52              0.00           0.00  655.74    0.74   24201.82                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      150         2.94              0.00           0.00  340.43    0.85   67943.31                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      200         3.81              0.00           0.00  262.81    0.87   89575.99                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      500         7.67              0.00           0.00  130.38    0.93  213586.44                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf     1000        14.85              0.00           0.00   67.33    0.96  402628.11                1.00

With this fix:

index_name                             index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------------  ----------  --------  --------------  --------------------  ------------
corpus-dbpedia-entity-E5-small-0.fvec         ivf   1000000           25304                     0            15
corpus-dbpedia-entity-E5-small-0.fvec         ivf   1000000               0                 42110             0

index_name                             index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
-------------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------
corpus-dbpedia-entity-E5-small-0.fvec         ivf       50        12.63              0.00           0.00   79.18    0.89  285527.22                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      150        32.49              0.00           0.00   30.77    0.94  619783.37                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      200        35.46              0.00           0.00   28.20    0.95  667903.47                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      500        40.38              0.00           0.00   24.76    0.97  781959.74                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf     1000        48.62              0.00           0.00   20.57    0.98  931017.40                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf       50         1.55              0.00           0.00  643.09    0.74   23595.57                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      150         2.98              0.00           0.00  335.29    0.85   66299.43                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      200         3.81              0.00           0.00  262.64    0.87   87416.15                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      500         8.80              0.00           0.00  113.64    0.93  209061.37                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf     1000        16.18              0.00           0.00   61.81    0.96  394906.29                1.00
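
One way to see why the first vector makes a poor single centroid: the mean of a set of vectors minimizes the total squared distance to them, while an arbitrary member generally does not. The sketch below is illustrative only and is not the code from this PR. `mean_centroid` and `total_squared_distance` are hypothetical helper names, and the toy vectors are made up.

```python
def mean_centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]


def total_squared_distance(vectors, centroid):
    """Sum of squared Euclidean distances from each vector to the centroid."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(v, centroid))
        for v in vectors
    )


# Toy data (hypothetical): the first vector sits at the edge of the cluster.
vectors = [[4.0, 0.0], [0.0, 0.0], [0.0, 2.0], [2.0, 2.0]]

first = vectors[0]             # "first vector as centroid"
mean = mean_centroid(vectors)  # [1.5, 1.0]

print(total_squared_distance(vectors, first))  # 44.0
print(total_squared_distance(vectors, mean))   # 15.0
```

Assuming vectors are quantized relative to their centroid, a centroid that sits farther from the data leaves larger residuals to encode, which would be consistent with the lower recall on the unmerged segments in the "before" table.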

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine added the Team:Search Relevance label Jul 17, 2025
Contributor

@john-wagster left a comment
lgtm

@benwtrent added the auto-merge-without-approval label Jul 17, 2025
@elasticsearchmachine merged commit cf5d40f into elastic:main Jul 17, 2025
33 checks passed
@benwtrent deleted the fix-diskbbq-flush branch July 17, 2025 18:56
Labels
auto-merge-without-approval, >non-issue, :Search Relevance/Vectors, Team:Search Relevance, v9.2.0
3 participants