How to use sparse vectors for retrieval in SurrealDB? #5830

honhimW · 2025-04-24T06:22:21Z

Background:

I want to use SurrealDB for hybrid search, but the documentation does not seem to cover it. In addition, a large share of my data is in Chinese.

Problems:

First, the analyzers available for full-text search indexes do not support Chinese (the jieba PR, #4556, has made no progress so far). I can work around this by tokenizing the text myself before storing it, although the results may be somewhat worse.
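As a rough illustration of the pre-tokenization workaround, here is a minimal Python sketch. It is an assumption-laden stand-in, not the jieba integration from the PR: the hypothetical `pre_tokenize` helper splits runs of CJK characters into overlapping character bigrams and lowercases ASCII words, then joins everything with spaces so that an analyzer that only splits on blanks could index the result.

```python
import re

# Matches either a run of common CJK ideographs or a run of ASCII word characters.
_CHUNK = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9_]+")
_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def pre_tokenize(text: str) -> str:
    """Return a space-separated token string suitable for blank-based splitting.

    Hypothetical helper: character bigrams are a simple substitute for a real
    Chinese word segmenter such as jieba, and will be less precise.
    """
    tokens = []
    for match in _CHUNK.finditer(text):
        chunk = match.group()
        if _CJK_RUN.fullmatch(chunk):
            if len(chunk) == 1:
                # A lone ideograph is kept as its own token.
                tokens.append(chunk)
            else:
                # Overlapping bigrams: "全文检索" -> 全文, 文检, 检索
                tokens.extend(chunk[i:i + 2] for i in range(len(chunk) - 1))
        else:
            tokens.append(chunk.lower())
    return " ".join(tokens)
```

The same function would have to be applied to the query text before searching, so that stored tokens and query tokens are produced the same way.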

However, when I search, I first compute TF-IDF to pick the top-N words and then combine them into a search condition: tokenized @@ $tf_idf_q. If the condition contains words that do not exist in a document, no data is returned, because this is plain full-text matching rather than retrieval weighted by term frequency.
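To make the desired behaviour concrete, here is a minimal Python sketch of sparse-vector retrieval done entirely on the client side. It does not use any SurrealDB API, and all function names (`build_idf`, `sparse_vector`, `sparse_dot`, `rank`) are hypothetical. Each document and the query become a sparse TF-IDF vector stored as a dict, and the score is their dot product, so a query word that is missing from a document simply contributes zero instead of excluding the document.

```python
import math
from collections import Counter


def build_idf(docs):
    """Compute a smoothed IDF weight for every term seen in the tokenized docs."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def sparse_vector(tokens, idf):
    """Build an L2-normalized sparse TF-IDF vector as {term: weight}."""
    tf = Counter(t for t in tokens if t in idf)  # unknown terms are dropped
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def sparse_dot(a, b):
    """Dot product of two sparse vectors; iterates over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[t] for t, w in a.items() if t in b)


def rank(query_tokens, doc_vectors, idf, top_k=3):
    """Return up to top_k (doc_index, score) pairs with a positive score."""
    q = sparse_vector(query_tokens, idf)
    scored = [(sparse_dot(q, v), i) for i, v in enumerate(doc_vectors)]
    scored = [(s, i) for s, i in scored if s > 0]
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [(i, s) for s, i in scored[:top_k]]


if __name__ == "__main__":
    docs = [["向量", "检索", "数据库"], ["全文", "检索"], ["图", "数据库"]]
    idf = build_idf(docs)
    vectors = [sparse_vector(d, idf) for d in docs]
    # "不存在" appears in no document, yet matching documents are still ranked.
    print(rank(["向量", "检索", "不存在"], vectors, idf))
```

With this kind of scoring, a document that shares only some of the query words still gets a lower but non-zero score. Whether and how such vectors can be stored and queried inside SurrealDB is exactly the open question of this post.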

Replies: 0 comments
