
Commit 73f087e

Merge pull request #606 from unum-cloud/main-dev
Accuracy, Correctness, new Python & JavaScript APIs
2 parents e414f46 + 9828ed8 commit 73f087e


17 files changed: 519 additions & 235 deletions


.github/workflows/release.yml

Lines changed: 10 additions & 7 deletions
@@ -368,7 +368,7 @@ jobs:
 echo "last_release_id=$(echo "$response" | jq -r '.id')" >> $GITHUB_OUTPUT
 
 - name: Download release assets
-uses: robinraju/release-downloader@v1.8
+uses: robinraju/release-downloader@v1.12
 with:
 latest: true
 fileName: "*"
@@ -544,6 +544,7 @@ jobs:
 name: prebuilds
 path: prebuilds
 retention-days: 1
+overwrite: true
 
 publish_javascript:
 name: Publish JavaScript
@@ -564,7 +565,7 @@ jobs:
 node-version: 20
 
 - name: Download prebuilds
-uses: actions/download-artifact@v3
+uses: actions/download-artifact@v4
 
 - name: Look for links
 run: find . -type f -links +1
@@ -760,6 +761,7 @@ jobs:
 with:
 name: usearch-csharp-dependencies
 path: ${{ github.workspace }}/csharp/lib/**/*
+overwrite: true
 
 publish_csharp:
 name: Publish C#
@@ -781,7 +783,7 @@ jobs:
 run: git submodule update --init --recursive
 
 - name: Download usearch libs artifact
-uses: actions/download-artifact@v3
+uses: actions/download-artifact@v4
 with:
 name: usearch-csharp-dependencies
 path: ${{ env.USEARCH_LIBS }}
@@ -861,21 +863,22 @@ jobs:
 if: ${{ always() }}
 needs: build_docs
 steps:
-- uses: robinraju/[email protected]
+- name: Download release assets
+uses: robinraju/[email protected]
 with:
 latest: true
 fileName: docs.tar.gz
 - name: Unpack docs
 run: tar -xf ./docs.tar.gz
 - name: Setup GitHub Pages
-uses: actions/configure-pages@v2
+uses: actions/configure-pages@v5
 - name: Upload artifacts
-uses: actions/upload-pages-artifact@v1
+uses: actions/upload-pages-artifact@v3
 with:
 path: ./build/docs/html
 - name: Deploy to GitHub Pages
 id: deployment
-uses: actions/deploy-pages@v1
+uses: actions/deploy-pages@v4
 
 deploy_docs_vercel:
 name: Deploy Vercel

.vscode/settings.json

Lines changed: 14 additions & 2 deletions
@@ -122,7 +122,8 @@
 "xtree": "cpp",
 "xutility": "cpp",
 "execution": "cpp",
-"text_encoding": "cpp"
+"text_encoding": "cpp",
+"__functional_03": "cpp"
 },
 "cSpell.words": [
 "allclose",
@@ -151,6 +152,7 @@
 "FAISS",
 "fbin",
 "furo",
+"geospatial",
 "googleanalytics",
 "groundtruth",
 "hashable",
@@ -170,6 +172,7 @@
 "longlong",
 "memmap",
 "MSVC",
+"Multimodal",
 "Napi",
 "ndarray",
 "NDCG",
@@ -208,7 +211,10 @@
 "usecases",
 "Vardanian",
 "vectorize",
-"Xunit"
+"Vincenty",
+"Wasmer",
+"Xunit",
+"Yuga"
 ],
 "autoDocstring.docstringFormat": "sphinx",
 "java.configuration.updateBuildConfiguration": "interactive",
@@ -225,5 +231,11 @@
 "editor.formatOnSave": true,
 "editor.defaultFormatter": "golang.go"
 },
+"editor.tabSize": 4,
+"editor.insertSpaces": true,
+"prettier.singleQuote": true,
+"prettier.tabWidth": 4,
+"prettier.useTabs": false
+
 "dotnet.defaultSolution": "csharp/Cloud.Unum.USearch.sln"
 }

CONTRIBUTING.md

Lines changed: 4 additions & 4 deletions
@@ -226,10 +226,10 @@ nvm install 20
 Testing:
 
 ```sh
-npm install -g typescript
-npm install
-npm run build-js
-npm test
+npm install -g typescript # Install TypeScript globally
+npm install # Compile `javascript/lib.cpp`
+npm run build-js # Generate JS from TS
+npm test # Run the test suite
 ```
 
 To compile for AWS Lambda you'd need to recompile the binding.

README.md

Lines changed: 5 additions & 1 deletion
@@ -161,7 +161,7 @@ This can result in __20x cost reduction__ on AWS and other public clouds.
 index.save("index.usearch")
 
 loaded_copy = index.load("index.usearch")
-view = Index.restore("index.usearch", view=True)
+view = Index.restore("index.usearch", view=True, ...)
 
 other_view = Index(ndim=..., metric=...)
 other_view.view("index.usearch")
@@ -528,7 +528,11 @@ index = Index(ndim=ndim, metric=CompiledMetric(
 
 - [x] ClickHouse: [C++](https://github.com/ClickHouse/ClickHouse/pull/53447), [docs](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/annindexes#usearch).
 - [x] DuckDB: [post](https://duckdb.org/2024/05/03/vector-similarity-search-vss.html).
+- [x] ScyllaDB: [Rust](https://github.com/scylladb/vector-store), [presentation](https://www.slideshare.net/slideshow/vector-search-with-scylladb-by-szymon-wasik/276571548).
+- [x] TiDB & TiFlash: [C++](https://github.com/pingcap/tiflash), [announcement](https://www.pingcap.com/article/introduce-vector-search-indexes-in-tidb/).
+- [x] YugaByte: [C++](https://github.com/yugabyte/yugabyte-db/blob/366b9f5e3c4df3a1a17d553db41d6dc50146f488/src/yb/vector_index/usearch_wrapper.cc).
 - [x] Google: [UniSim](https://github.com/google/unisim), [RetSim](https://arxiv.org/abs/2311.17264) paper.
+- [x] MemGraph: [C++](https://github.com/memgraph/memgraph/blob/784dd8520f65050d033aea8b29446e84e487d091/src/storage/v2/indices/vector_index.cpp), [announcement](https://memgraph.com/blog/simplify-data-retrieval-memgraph-vector-search).
 - [x] LanternDB: [C++](https://github.com/lanterndata/lantern), [Rust](https://github.com/lanterndata/lantern_extras), [docs](https://lantern.dev/blog/hnsw-index-creation).
 - [x] LangChain: [Python](https://github.com/langchain-ai/langchain/releases/tag/v0.0.257) and [JavaScript](https://github.com/hwchase17/langchainjs/releases/tag/0.0.125).
 - [x] Microsoft Semantic Kernel: [Python](https://github.com/microsoft/semantic-kernel/releases/tag/python-0.3.9.dev) and C#.

cpp/test.cpp

Lines changed: 55 additions & 8 deletions
@@ -14,6 +14,7 @@
 */
 #include <algorithm> // `std::shuffle`
 #include <cassert> // `assert`
+#include <cmath> // `std::abs`
 #include <random> // `std::default_random_engine`
 #include <stdexcept> // `std::terminate`
 #include <unordered_map> // `std::unordered_map`
@@ -673,16 +674,16 @@ void test_cosine(std::size_t collection_size, std::size_t dimensions) {
 scalar_t const* row(std::size_t i) const noexcept { return (*vector_of_vectors_ptr)[i].data(); }
 
 float operator()(member_cref_t const& a, member_cref_t const& b) const {
-return metric_cos_gt<scalar_t>{}(row(get_slot(b)), row(get_slot(a)), dimensions);
+return metric_cos_gt<scalar_t, float>{}(row(get_slot(b)), row(get_slot(a)), dimensions);
 }
 float operator()(scalar_t const* some_vector, member_cref_t const& member) const {
-return metric_cos_gt<scalar_t>{}(some_vector, row(get_slot(member)), dimensions);
+return metric_cos_gt<scalar_t, float>{}(some_vector, row(get_slot(member)), dimensions);
 }
 float operator()(member_citerator_t const& a, member_citerator_t const& b) const {
-return metric_cos_gt<scalar_t>{}(row(get_slot(b)), row(get_slot(a)), dimensions);
+return metric_cos_gt<scalar_t, float>{}(row(get_slot(b)), row(get_slot(a)), dimensions);
 }
 float operator()(scalar_t const* some_vector, member_citerator_t const& member) const {
-return metric_cos_gt<scalar_t>{}(some_vector, row(get_slot(member)), dimensions);
+return metric_cos_gt<scalar_t, float>{}(some_vector, row(get_slot(member)), dimensions);
 }
 };
 
@@ -877,7 +878,7 @@ void test_absurd(std::size_t dimensions, std::size_t connectivity, std::size_t e
 template <typename scalar_at>
 void test_exact_search(std::size_t dataset_count, std::size_t queries_count, std::size_t wanted_count) {
 std::size_t dimensions = 32;
-metric_punned_t metric(dimensions, metric_kind_t::cos_k);
+metric_punned_t metric(dimensions, metric_kind_t::cos_k, scalar_kind<scalar_at>());
 
 std::random_device rd;
 std::mt19937 gen(rd());
@@ -886,9 +887,9 @@ void test_exact_search(std::size_t dataset_count, std::size_t queries_count, std
 std::generate(dataset.begin(), dataset.end(), [&] { return static_cast<scalar_at>(dis(gen)); });
 
 exact_search_t search;
-auto results = search( //
-(byte_t const*)dataset.data(), dataset_count, dimensions * sizeof(float), //
-(byte_t const*)dataset.data(), queries_count, dimensions * sizeof(float), //
+auto results = search( //
+(byte_t const*)dataset.data(), dataset_count, dimensions * sizeof(scalar_at), //
+(byte_t const*)dataset.data(), queries_count, dimensions * sizeof(scalar_at), //
 wanted_count, metric);
 
 for (std::size_t i = 0; i < results.size(); ++i)
@@ -1098,6 +1099,51 @@ template <typename key_at, typename slot_at> void test_replacing_update() {
 expect_eq(final_search[2].member.key, 44);
 }
 
+/**
+* Tests the filtered search functionality of the index.
+*/
+void test_filtered_search() {
+constexpr std::size_t dataset_count = 2048;
+constexpr std::size_t dimensions = 32;
+metric_punned_t metric(dimensions, metric_kind_t::cos_k);
+
+std::random_device rd;
+std::mt19937 gen(rd());
+std::uniform_real_distribution<> dis(0.0, 1.0);
+using vector_of_vectors_t = std::vector<std::vector<float>>;
+
+vector_of_vectors_t vector_of_vectors(dataset_count);
+for (auto& vector : vector_of_vectors) {
+vector.resize(dimensions);
+std::generate(vector.begin(), vector.end(), [&] { return dis(gen); });
+}
+
+index_dense_t index = index_dense_t::make(metric);
+index.reserve(dataset_count);
+for (std::size_t idx = 0; idx < dataset_count; ++idx)
+index.add(idx, vector_of_vectors[idx].data());
+expect_eq(index.size(), dataset_count);
+
+{
+auto predicate = [](index_dense_t::key_t key) { return key != 0; };
+auto results = index.filtered_search(vector_of_vectors[0].data(), 10, predicate);
+expect_eq(10, results.size()); // ! Should not contain 0
+for (std::size_t i = 0; i != results.size(); ++i)
+expect(0 != results[i].member.key);
+}
+{
+auto predicate = [](index_dense_t::key_t) { return false; };
+auto results = index.filtered_search(vector_of_vectors[0].data(), 10, predicate);
+expect_eq(0, results.size()); // ! Should not contain 0
+}
+{
+auto predicate = [](index_dense_t::key_t key) { return key == 10; };
+auto results = index.filtered_search(vector_of_vectors[0].data(), 10, predicate);
+expect_eq(1, results.size()); // ! Should not contain 0
+expect_eq(10, results[0].member.key);
+}
+}
+
 int main(int, char**) {
 test_uint40();
 test_cosine<float, std::int64_t, uint40_t>(10, 10);
@@ -1174,5 +1220,6 @@ int main(int, char**) {
 test_sets<std::int64_t, slot32_t>(set_size, 20, 30);
 test_strings<std::int64_t, slot32_t>();
 
+test_filtered_search();
 return 0;
 }
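`test_filtered_search` treats the predicate as a hard filter. Rejected keys must never appear, and the number of results is capped by how many keys pass (10, 0 and 1 in the three blocks). A brute-force reference lets you check those expectations without the HNSW graph. The sketch below is a hypothetical standalone helper, not a USearch API. `match_t`, `cosine_distance` and `filtered_brute_force` are illustrative names, and keys are assumed to be row indices, as in the test.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical result record, not USearch's own match type.
struct match_t {
    std::uint64_t key;
    float distance;
};

// Cosine distance `1 - cos(a, b)`, accumulated in `double`; zero-length vectors yield 0.
inline float cosine_distance(float const* a, float const* b, std::size_t dimensions) {
    double ab = 0, aa = 0, bb = 0;
    for (std::size_t i = 0; i != dimensions; ++i) {
        ab += double(a[i]) * b[i];
        aa += double(a[i]) * a[i];
        bb += double(b[i]) * b[i];
    }
    if (aa == 0 || bb == 0)
        return 0.f;
    return static_cast<float>(1.0 - ab / (std::sqrt(aa) * std::sqrt(bb)));
}

// Exact filtered k-NN: keys rejected by `predicate` never reach the results,
// so at most `min(wanted, accepted keys)` matches are returned, closest first.
inline std::vector<match_t> filtered_brute_force(std::vector<std::vector<float>> const& vectors,
                                                 float const* query, std::size_t wanted,
                                                 std::function<bool(std::uint64_t)> const& predicate) {
    std::vector<match_t> matches;
    for (std::size_t i = 0; i != vectors.size(); ++i) {
        std::uint64_t key = static_cast<std::uint64_t>(i); // keys are row indices, as in the test
        if (predicate(key))
            matches.push_back({key, cosine_distance(query, vectors[i].data(), vectors[i].size())});
    }
    std::size_t count = (std::min)(wanted, matches.size());
    std::partial_sort(matches.begin(), matches.begin() + static_cast<std::ptrdiff_t>(count), matches.end(),
                      [](match_t const& x, match_t const& y) { return x.distance < y.distance; });
    matches.resize(count);
    return matches;
}
```

Because it scans every vector, a reference like this suits only small datasets. On those, it can serve as exact ground truth for approximate `filtered_search` results.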

include/usearch/index.hpp

Lines changed: 20 additions & 20 deletions
@@ -2183,6 +2183,7 @@ class index_gt {
 */
 struct usearch_align_m context_t {
 top_candidates_t top_candidates{};
+top_candidates_t top_for_refine{};
 next_candidates_t next_candidates{};
 visits_hash_set_t visits{};
 std::default_random_engine level_generator{};
@@ -2498,6 +2499,13 @@ class index_gt {
 if (nodes_)
 std::memcpy(new_nodes.data(), nodes_.data(), sizeof(node_t) * size());
 
+// Pre-reserve the capacity for `top_for_refine`, which always contains at most one more
+// element than the connectivity factors.
+std::size_t connectivity_max = (std::max)(config_.connectivity_base, config_.connectivity);
+for (std::size_t i = 0; i != new_contexts.size(); ++i)
+if (!new_contexts[i].top_for_refine.reserve(connectivity_max + 1))
+return false;
+
 limits_ = limits;
 nodes_capacity_ = limits.members;
 nodes_ = std::move(new_nodes);
@@ -3179,17 +3187,11 @@ class index_gt {
 
 std::size_t memory_usage_per_node(level_t level) const noexcept { return node_bytes_(level); }
 
-double inverse_log_connectivity() const {
-return pre_.inverse_log_connectivity;
-}
+double inverse_log_connectivity() const { return pre_.inverse_log_connectivity; }
 
-std::size_t neighbors_base_bytes() const {
-return pre_.neighbors_base_bytes;
-}
+std::size_t neighbors_base_bytes() const { return pre_.neighbors_base_bytes; }
 
-std::size_t neighbors_bytes() const {
-return pre_.neighbors_bytes;
-}
+std::size_t neighbors_bytes() const { return pre_.neighbors_bytes; }
 
 #if defined(USEARCH_USE_PRAGMA_REGION)
 #pragma endregion
@@ -3790,7 +3792,7 @@ class index_gt {
 metric_at&& metric, compressed_slot_t new_slot, candidates_view_t new_neighbors, value_at&& value,
 level_t level, context_t& context) usearch_noexcept_m {
 
-top_candidates_t& top = context.top_candidates;
+top_candidates_t& top_for_refine = context.top_for_refine;
 std::size_t const connectivity_max = level ? config_.connectivity : config_.connectivity_base;
 
 // Reverse links from the neighbors:
@@ -3817,19 +3819,16 @@ class index_gt {
 continue;
 }
 
-// To fit a new connection we need to drop an existing one.
-top.clear();
-usearch_assert_m((top.capacity() >= (close_header.size() + 1)),
-"The memory must have been reserved in `add`");
-top.insert_reserved({context.measure(value, citerator_at(close_slot), metric), new_slot});
+top_for_refine.clear();
+top_for_refine.insert_reserved({context.measure(value, citerator_at(close_slot), metric), new_slot});
 for (compressed_slot_t successor_slot : close_header)
-top.insert_reserved(
+top_for_refine.insert_reserved(
 {context.measure(citerator_at(close_slot), citerator_at(successor_slot), metric), successor_slot});
 
 // Export the results:
 close_header.clear();
-candidates_view_t top_view =
-refine_(metric, connectivity_max, top, context, context.computed_distances_in_reverse_refines);
+candidates_view_t top_view = refine_(metric, connectivity_max, top_for_refine, context,
+context.computed_distances_in_reverse_refines);
 usearch_assert_m(top_view.size(), "This would lead to isolated nodes");
 for (std::size_t idx = 0; idx != top_view.size(); idx++)
 close_header.push_back(top_view[idx].slot);
@@ -4178,9 +4177,10 @@ class index_gt {
 // This can substantially grow our priority queue:
 next.insert({-successor_dist, successor_slot});
 if (is_dummy<predicate_at>() ||
-predicate(member_cref_t{node_at_(successor_slot).ckey(), successor_slot}))
+predicate(member_cref_t{node_at_(successor_slot).ckey(), successor_slot})) {
 top.insert({successor_dist, successor_slot}, top_limit);
-radius = top.top().distance;
+radius = top.top().distance;
+}
 }
 }
 }
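In the last `index.hpp` hunk, the new braces put `radius = top.top().distance` under the same predicate check as `top.insert(...)`. Before this change, only the insertion was guarded. The radius was therefore re-read after every visited successor, including rejected ones, which is safe only as long as `top` is never empty at that point. The sketch below mirrors the corrected shape, with a `std::priority_queue` standing in for USearch's `top_candidates_t`. `candidate_t`, `top_t` and `offer` are hypothetical names, and the bounded heap is a simplification of the library's sorted buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical stand-in for an entry of USearch's `top_candidates_t`.
struct candidate_t {
    float distance;
    std::uint32_t slot;
};

// Max-heap ordering: `top()` is the farthest of the kept candidates.
struct farther_on_top_t {
    bool operator()(candidate_t const& a, candidate_t const& b) const { return a.distance < b.distance; }
};

using top_t = std::priority_queue<candidate_t, std::vector<candidate_t>, farther_on_top_t>;

// Offers one visited successor to the bounded result set. As in the fixed loop,
// the radius is refreshed only inside the accepting branch, so `top()` is never
// called on an empty heap, even if the predicate rejects every successor.
inline void offer(top_t& top, float& radius, std::size_t top_limit, candidate_t successor,
                  std::function<bool(std::uint32_t)> const& predicate) {
    if (predicate(successor.slot)) {
        top.push(successor);
        if (top.size() > top_limit)
            top.pop(); // drop the farthest candidate to respect `top_limit`
        radius = top.top().distance;
    }
}
```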

javascript/README.md

Lines changed: 8 additions & 6 deletions
@@ -10,12 +10,6 @@ For Node.js environments, install USearch using `npm`:
 npm install usearch
 ```
 
-For front-end applications using WASM, use the Wasmer package manager:
-
-```sh
-wasmer install unum/usearch
-```
-
 ## Quickstart
 
 Create an index, add vectors, and perform searches with ease:
@@ -78,6 +72,14 @@ const batchResults = index.search(vectors, 2);
 const firstMatch = batchResults.get(0);
 ```
 
+Multi-threading is supported for batch operations:
+
+```js
+const threads_count = 0; // Zero for auto-detection or pass an unsigned integer
+index.add(keys, vectors, threads_count);
+const batchResults = index.search(vectors, 2, threads_count);
+```
+
 ## Index Introspection
 
 Inspect and interact with the index:
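The added README snippet passes `threads_count = 0` to request auto-detection. According to CONTRIBUTING.md, the Node.js addon is built from C++ (`javascript/lib.cpp`), but this diff does not show how the binding interprets that argument. The following is a plausible sketch. It assumes that `0` maps to `std::thread::hardware_concurrency()` and that each thread takes a contiguous slice of the batch. `resolve_threads` and `parallel_for` are hypothetical helpers, not functions from the binding.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical: treat 0 as "auto-detect". `hardware_concurrency()` may itself
// return 0 when the value is not computable, so fall back to a single thread.
inline std::size_t resolve_threads(std::size_t requested) {
    if (requested != 0)
        return requested;
    unsigned detected = std::thread::hardware_concurrency();
    return detected != 0 ? detected : 1;
}

// Hypothetical: calls `task(i)` for every `i` in `[0, count)`, giving each worker
// one contiguous slice of the batch. Never spawns more workers than items.
template <typename task_at>
void parallel_for(std::size_t count, std::size_t threads_count, task_at&& task) {
    std::size_t threads = (std::min)(resolve_threads(threads_count), (std::max)(count, std::size_t(1)));
    std::size_t slice = (count + threads - 1) / threads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t != threads; ++t) {
        std::size_t begin = t * slice;
        std::size_t end = (std::min)(count, begin + slice);
        if (begin >= end)
            break;
        workers.emplace_back([&task, begin, end] {
            for (std::size_t i = begin; i != end; ++i)
                task(i);
        });
    }
    for (std::thread& worker : workers)
        worker.join();
}
```

Each worker handles a disjoint range of indices, so a task that writes only to its own output slot needs no extra locking.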
