Background:
I want to use SurrealDB for hybrid search, but the documentation does not seem to cover it. In addition, my data contains a large amount of Chinese text.
Problems:
Firstly, the analyzer for the full-text search index does not support Chinese (the jieba PR #4556 has made no progress so far). I can work around this by tokenizing the text before storing it, although the results may be somewhat worse.
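To make the pre-tokenization workaround concrete, here is a minimal sketch of what I mean, not a definitive implementation. The helper names `pretokenize` and `char_segment` are made up for illustration, and `char_segment` is only a crude per-character stand-in so the snippet runs without extra dependencies; in practice a real segmenter such as `jieba.lcut` would be passed in instead. The field name `tokenized` matches the one used in my search condition.

```python
from typing import Callable, Iterable


def pretokenize(text: str, segment: Callable[[str], Iterable[str]]) -> str:
    """Join the segmenter's tokens with single spaces, dropping empty tokens."""
    tokens = (tok.strip() for tok in segment(text))
    return " ".join(tok for tok in tokens if tok)


def char_segment(text: str) -> list[str]:
    """Hypothetical stand-in segmenter: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


# Store the original text next to its space-separated form, so that a
# whitespace-based analyzer can index the `tokenized` field.
text = "我想用混合搜索"
record = {"text": text, "tokenized": pretokenize(text, char_segment)}
```

The quality of the stored tokens depends entirely on the segmenter, which is why I expect the results to be somewhat worse than a native Chinese analyzer.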
When I search, however, I first compute TF-IDF to get the `topN` words and then recombine them into the search condition `tokenized @@ $tf_idf_q`. If the search condition contains words that do not exist in a document, no data is returned, because this is pure full-text matching rather than retrieval based on word frequency.
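The behaviour can be reproduced with a small pure-Python model. This is only a sketch of the matching semantics as I observe them, not SurrealDB's actual implementation: the function names, the smoothed IDF formula, and the toy corpus are all assumptions chosen for illustration. It shows that a word missing from the corpus gets the highest IDF, lands in the top N, and then makes an all-terms match return nothing, while an any-term match (shown only for contrast) still finds the document.

```python
import math
from collections import Counter


def top_n_tfidf(query_tokens, corpus, n):
    """Return up to n distinct query terms ranked by TF-IDF (smoothed IDF)."""
    tf = Counter(query_tokens)
    total = len(query_tokens)
    num_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        # Document frequency: number of documents containing the term.
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + num_docs) / (1 + df)) + 1.0
        scores[term] = (count / total) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]


def match_all(terms, doc_tokens):
    """Every term must be present (the behaviour described above)."""
    return all(t in doc_tokens for t in terms)


def match_any(terms, doc_tokens):
    """At least one term must be present (shown only for contrast)."""
    return any(t in doc_tokens for t in terms)


# Toy corpus of pre-tokenized documents, each represented as a set of tokens.
corpus = [
    {"surrealdb", "hybrid", "search"},
    {"vector", "index"},
]
query_tokens = ["hybrid", "search", "chinese"]  # "chinese" is in no document

top_terms = top_n_tfidf(query_tokens, corpus, n=2)
all_hits = [i for i, doc in enumerate(corpus) if match_all(top_terms, doc)]
any_hits = [i for i, doc in enumerate(corpus) if match_any(top_terms, doc)]
```

In this toy run `top_terms` contains the unseen word, so `all_hits` is empty even though the first document clearly relates to the query.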