
v0.5.0

@irevoire released this on 01 Oct 14:08
629d1d1

New features

Binary quantization by @irevoire in #82

Binary quantization lets you index up to 10 times more items in the same amount of disk space.
The drawback is that it reduces relevancy when querying documents.
The more dimensions your dataset has, the less relevancy is affected. After extensive benchmarking, we recommend using binary quantization if:

  • You have (or plan to have) more than 100,000 items in your database
  • Your items have more than 1,400 dimensions
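As a rough illustration of where the disk savings come from, the sketch below compares the raw size of one vector stored as 32-bit floats with the same vector stored at one bit per dimension. Packing the bits into 64-bit words is an assumption made for illustration, not arroy's documented on-disk layout, and the helper names are hypothetical. The raw vector payload shrinks by about 32x; the overall gain quoted above is lower, likely because an index stores more than the item vectors themselves.

```rust
use std::mem::size_of;

/// Bytes needed to store one vector as 32-bit floats.
fn f32_bytes(dimensions: usize) -> usize {
    dimensions * size_of::<f32>()
}

/// Bytes needed at one bit per dimension, rounded up to whole 64-bit words
/// (hypothetical packing, chosen only for this illustration).
fn binary_bytes(dimensions: usize) -> usize {
    let words = (dimensions + 63) / 64;
    words * size_of::<u64>()
}

fn main() {
    for dimensions in [768, 1536, 3072] {
        let full = f32_bytes(dimensions);
        let packed = binary_bytes(dimensions);
        println!(
            "{dimensions} dims: {full} bytes as f32, {packed} bytes quantized ({}x smaller)",
            full / packed
        );
    }
}
```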

To use the feature, change the Distance type you provide when opening a Writer and a Reader by prefixing it with BinaryQuantized.
For example, Euclidean becomes BinaryQuantizedEuclidean.
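The self-contained toy model below shows what swapping the distance type parameter amounts to. The Distance trait, its prepare function, and the Writer struct are hypothetical stand-ins that only borrow arroy's type names; they are not arroy's real traits or API. The quantized variant is assumed to map positive components to 1.0 and everything else to -1.0 before computing a Euclidean distance.

```rust
use std::marker::PhantomData;

// Toy model: these names mimic arroy's distance types, not its real traits.
trait Distance {
    /// Transforms a vector before it is stored.
    fn prepare(v: &[f32]) -> Vec<f32>;

    /// Euclidean distance between two stored vectors.
    fn distance(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
    }
}

struct Euclidean;
struct BinaryQuantizedEuclidean;

impl Distance for Euclidean {
    // Exact distance: store the vector unchanged.
    fn prepare(v: &[f32]) -> Vec<f32> {
        v.to_vec()
    }
}

impl Distance for BinaryQuantizedEuclidean {
    // Assumption: positive components become 1.0, all others -1.0.
    fn prepare(v: &[f32]) -> Vec<f32> {
        v.iter().map(|&x| if x > 0.0 { 1.0 } else { -1.0 }).collect()
    }
}

/// Hypothetical writer whose behavior is selected only by the type parameter `D`.
struct Writer<D: Distance> {
    items: Vec<Vec<f32>>,
    _distance: PhantomData<D>,
}

impl<D: Distance> Writer<D> {
    fn new() -> Self {
        Writer { items: Vec::new(), _distance: PhantomData }
    }

    fn add_item(&mut self, v: &[f32]) {
        self.items.push(D::prepare(v));
    }

    fn distance(&self, a: usize, b: usize) -> f32 {
        D::distance(&self.items[a], &self.items[b])
    }
}

fn main() {
    let (a, b) = ([0.9_f32, -0.2, 0.4], [0.1_f32, -0.8, 0.3]);

    // The only difference between the two indexes is the type parameter.
    let mut exact = Writer::<Euclidean>::new();
    let mut quantized = Writer::<BinaryQuantizedEuclidean>::new();
    exact.add_item(&a);
    exact.add_item(&b);
    quantized.add_item(&a);
    quantized.add_item(&b);

    println!("exact: {:.3}, quantized: {:.3}", exact.distance(0, 1), quantized.distance(0, 1));
}
```

In this toy, two vectors that are clearly apart under Euclidean collapse to distance 0 once quantized because their components share the same signs. That is the kind of relevancy loss described above, and it suggests one intuition for why more dimensions help: each extra sign gives the quantized vectors another chance to differ.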

Warning

Enabling binary quantization is a destructive operation. Once it is enabled, all your vectors are converted to contain only -1 and 1, and your original vectors can never be recovered.
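To see why the operation cannot be undone, here is a minimal sketch of sign-based binary quantization: each component is reduced to a single bit, and expanding the bits back yields only -1.0 and 1.0. The function names, the bit layout, and the choice to treat 0.0 as negative are assumptions for illustration; arroy's actual encoding may differ.

```rust
/// Packs the sign of each component into bits: 1 if positive, 0 otherwise
/// (so 0.0 is treated as negative in this sketch).
fn quantize(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Expands the bits back into a vector: set bits become 1.0, clear bits -1.0.
fn dequantize(words: &[u64], dimensions: usize) -> Vec<f32> {
    (0..dimensions)
        .map(|i| if (words[i / 64] >> (i % 64)) & 1 == 1 { 1.0 } else { -1.0 })
        .collect()
}

fn main() {
    let original = [0.42_f32, -0.07, 1.3, 0.0, -2.5];
    let restored = dequantize(&quantize(&original), original.len());
    // Only the signs survive; the magnitudes are gone.
    println!("{original:?} -> {restored:?}");

    // Different vectors with the same sign pattern become indistinguishable.
    assert_eq!(quantize(&[0.1, -3.0]), quantize(&[9.0, -0.5]));
}
```

Any two vectors whose components share the same signs map to the same bits, so no amount of post-processing can tell them apart again.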

Finally, binary quantization is not implemented for the dot-product distance.

Accept a function to abort the indexing process by @irevoire in #86

If you ever needed to stop arroy before it finished an indexing process, this feature is for you.
You can now provide a closure that arroy calls periodically during indexing. If it returns true, arroy stops as quickly as possible and returns the new BuildCancelled error.
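The mechanism works roughly like the toy loop below: the build polls the caller's closure between chunks of work and bails out with an error as soon as it returns true. The build function, the chunk size, and this Error enum are hypothetical; only the BuildCancelled name comes from arroy, and arroy's real polling points and closure bounds may differ.

```rust
use std::cell::Cell;

/// Hypothetical error type mirroring arroy's new `BuildCancelled` case.
#[derive(Debug, PartialEq)]
enum Error {
    BuildCancelled,
}

/// Toy indexing loop: processes items in chunks of 1,000 and asks `cancel`
/// before each chunk whether it should stop.
fn build(items: &[u32], cancel: impl Fn() -> bool) -> Result<u64, Error> {
    let mut checksum = 0u64;
    for chunk in items.chunks(1_000) {
        if cancel() {
            return Err(Error::BuildCancelled);
        }
        // Stand-in for real indexing work.
        checksum += chunk.iter().map(|&x| u64::from(x)).sum::<u64>();
    }
    Ok(checksum)
}

fn main() {
    let items: Vec<u32> = (0..10_000).collect();

    // Never cancel: the build runs to completion.
    println!("{:?}", build(&items, || false));

    // Cancel on the fourth poll, i.e. after three chunks have been processed.
    let polls = Cell::new(0);
    let result = build(&items, || {
        polls.set(polls.get() + 1);
        polls.get() > 3
    });
    println!("{result:?} after {} polls", polls.get());
}
```

In a real application the closure would more likely read an AtomicBool set by a shutdown handler, or compare Instant::now() against a deadline.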

Breaking

Rename the angular distance to cosine distance by @irevoire in #94

This is both API-breaking and DB-breaking, which means you will have to manually re-import all your vectors into arroy after upgrading.
Since "cosine" is the more common name, we renamed Angular and BinaryQuantizedAngular to Cosine and BinaryQuantizedCosine.
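For reference, cosine distance is usually defined as 1 minus the cosine similarity of two vectors, so it ranges from 0 (same direction) to 2 (opposite directions). The function below is a textbook sketch of that definition, not arroy's code; arroy's Cosine implementation may normalize or scale its result differently, and returning None for zero-length vectors is a choice made for this sketch.

```rust
/// Textbook cosine distance: 1 - (a . b) / (|a| |b|).
/// Returns None when either vector has zero length, since the angle is undefined.
fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}

fn main() {
    let a = [1.0_f32, 0.0];
    // Same direction (any length), orthogonal, and opposite directions.
    for b in [[2.0_f32, 0.0], [0.0, 3.0], [-3.0, 0.0]] {
        println!("{a:?} vs {b:?}: {:?}", cosine_distance(&a, &b));
    }
}
```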

Use builder pattern for the configuration by @irevoire in #96

This is API-breaking.

Since the APIs for querying vectors and building databases kept gaining optional parameters, we switched to a builder pattern. It should make the API easier to use and let us add new configuration options in the future without breaking changes.

Previously, you would write:

let results = reader.nns_by_item(&rtxn, item_id, n_results, search_k, None)?.unwrap();

Now, you write:

let results = reader.nns(n_results).search_k(search_k).by_item(&rtxn, item_id)?.unwrap();

The same goes for the build method. Instead of writing:

writer.build(&mut wtxn, &mut rng, None)?;

You now write:

writer.builder(&mut rng).build(&mut wtxn)?;
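Here is a small, self-contained model of the builder shape described above: required arguments go into the entry point (nns(count)), optional ones get chainable setters with defaults, and a terminal method (by_item) runs the query. The Reader, QueryBuilder, and brute-force search are hypothetical stand-ins, not arroy's implementation, and search_k is modeled only loosely as a cap on the number of candidates scanned.

```rust
use std::num::NonZeroUsize;

/// Hypothetical in-memory stand-in for an index reader.
struct Reader {
    items: Vec<Vec<f32>>,
}

/// Query builder: required parameters are passed to `Reader::nns`,
/// optional ones are set through chainable methods.
struct QueryBuilder<'a> {
    reader: &'a Reader,
    count: usize,
    search_k: Option<NonZeroUsize>,
}

impl Reader {
    fn nns(&self, count: usize) -> QueryBuilder<'_> {
        QueryBuilder { reader: self, count, search_k: None }
    }
}

fn squared_euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl QueryBuilder<'_> {
    /// Optional: caps how many candidates are scanned (unbounded by default).
    fn search_k(&mut self, search_k: NonZeroUsize) -> &mut Self {
        self.search_k = Some(search_k);
        self
    }

    /// Terminal method: returns None if `item` does not exist.
    fn by_item(&self, item: usize) -> Option<Vec<(usize, f32)>> {
        let query = self.reader.items.get(item)?;
        let budget = self.search_k.map_or(usize::MAX, NonZeroUsize::get);
        let mut hits: Vec<(usize, f32)> = self
            .reader
            .items
            .iter()
            .enumerate()
            .take(budget)
            .map(|(id, v)| (id, squared_euclidean(query, v)))
            .collect();
        hits.sort_by(|a, b| a.1.total_cmp(&b.1));
        hits.truncate(self.count);
        Some(hits)
    }
}

fn main() {
    let reader = Reader {
        items: vec![vec![0.0, 0.0], vec![1.0, 0.0], vec![5.0, 5.0], vec![0.5, 0.5]],
    };
    // Only the required parameter: default (unbounded) search budget.
    let all = reader.nns(2).by_item(0).unwrap();
    // Opt into the optional parameter with one extra call.
    let k = NonZeroUsize::new(3).unwrap();
    let bounded = reader.nns(2).search_k(k).by_item(0).unwrap();
    println!("{all:?}\n{bounded:?}");
}
```

Adding another optional setter later only means adding a field and a method; existing calls such as reader.nns(2).by_item(0) keep compiling, which is the non-breaking property the builder pattern is meant to provide.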

Maintenance

  • Treat warnings as errors in the CI by @irevoire in #97
  • Reorganize the NodeId so that appending vectors works in more cases, and add a test by @irevoire in #98
  • Store the list of updated IDs directly in LMDB instead of a roaring bitmap to improve vector insertion performance by @irevoire in #99
  • Increase the arroy version for the next release by @irevoire in #100

Full Changelog: v0.4.0...v0.5.0