Skip to content

Thresholding, overquery, and compression #254

@jbellis

Description

@jbellis

The interaction between threshold and topk is confusing, it's almost "stop the search whichever comes first", but not quite, because we don't actually start accumulating results towards topk until we clear the threshold.

I'd like to simplify this to

  1. Find a local maximum in the graph using an approach similar to TwoPhaseTracker
  2. Pull out the top k

However, this doesn't currently work because there's a ton of noise introduced by compression. We can fix this by performing step 1 using exact similarities for the top candidate.

We know that that introduces a lot of expensive full-resolution comparisons. But we might be able to make up for that if we push overquery into the search process. (So if user asks for 100 results we pass topk=100 into the search and let it deal with the noise.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions