v2.5.x/site/en/release_notes.md
+1 -1
@@ -38,7 +38,7 @@ Milvus 2.5 introduces a built-in Cluster Management WebUI, reducing system maint

 Milvus 2.5 leverages analyzers and indexing from Tantivy for text preprocessing and index building, supporting precise natural language matching of text data based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.

-For details, refer to [Keyword Match](keyword-match.md).
+For details, refer to [Text Match](keyword-match.md).
v2.5.x/site/en/tutorials/hybrid_search_with_milvus.md
+1 -1
@@ -18,7 +18,7 @@ In this tutorial, we will demonstrate how to conduct hybrid search with [Milvus]
 Milvus supports Dense, Sparse, and Hybrid retrieval methods:

 - Dense Retrieval: Utilizes semantic context to understand the meaning behind queries.
-- Sparse Retrieval: Emphasizes keyword matching to find results based on specific terms, equivalent to full-text search.
+- Sparse Retrieval: Emphasizes text matching to find results based on specific terms, equivalent to full-text search.
 - Hybrid Retrieval: Combines both Dense and Sparse approaches, capturing the full context and specific keywords for comprehensive search results.

 By integrating these methods, the Milvus Hybrid Search balances semantic and lexical similarities, improving the overall relevance of search outcomes. This notebook will walk through the process of setting up and using these retrieval strategies, highlighting their effectiveness in various search scenarios.
v2.5.x/site/en/userGuide/collections/manage-collections.md
+1 -1
@@ -85,7 +85,7 @@ For more information about searches and queries, refer to the articles in the [

 - [Full-Text Search](full-text-search.md)

-- [Keyword Match](keyword-match.md)
+- [Text Match](keyword-match.md)

 In addition, Milvus also provides enhancements to improve search performance and efficiency. They are disabled by default, and you can enable and use them according to your service requirements. They are
v2.5.x/site/en/userGuide/schema/analyzer/analyzer-overview.md
+2 -2
@@ -8,15 +8,15 @@ summary: "In text processing, an analyzer is a crucial component that converts r

 In text processing, an **analyzer** is a crucial component that converts raw text into a structured, searchable format. Each analyzer typically consists of two core elements: **tokenizer** and **filter**. Together, they transform input text into tokens, refine these tokens, and prepare them for efficient indexing and retrieval.

-In Milvus, analyzers are configured during collection creation when you add `VARCHAR` fields to the collection schema. Tokens produced by an analyzer can be used to build an index for keyword matching or converted into sparse embeddings for full text search. For more information, refer to [Keyword Match](keyword-match.md) or [Full Text Search](full-text-search.md).
+In Milvus, analyzers are configured during collection creation when you add `VARCHAR` fields to the collection schema. Tokens produced by an analyzer can be used to build an index for text matching or converted into sparse embeddings for full text search. For more information, refer to [Text Match](keyword-match.md) or [Full Text Search](full-text-search.md).

 <div class="alert note">

 The use of analyzers may impact performance:

 - **Full text search:** For full text search, DataNode and **QueryNode** channels consume data more slowly because they must wait for tokenization to complete. As a result, newly ingested data takes longer to become available for search.

-- **Keyword match:** For keyword matching, index creation is also slower since tokenization needs to finish before an index can be built.
+- **Text match:** For text matching, index creation is also slower since tokenization needs to finish before an index can be built.
v2.5.x/site/en/userGuide/search-query-get/boolean.md
+5 -4
@@ -835,9 +835,10 @@ Match operators include:
 - `like`: Match constants or prefixes (prefix%), infixes (%infix%), and suffixes (%suffix) within constants. It relies on a brute-force search mechanism using wildcards and does not involve text tokenization. While it can achieve exact matches, its query efficiency is relatively low, making it suitable for simple matching tasks or queries on smaller datasets.

 - `TEXT_MATCH`: Match specific terms or keywords on VARCHAR fields, using tokenization and inverted index to enable efficient text search. Compared to `like`, `TEXT_MATCH` offers more advanced text tokenization and filtering capabilities. It is suited for large-scale datasets where higher query performance is required for complex text search scenarios.
+
 <div class="alert note">

-To use the `TEXT_MATCH` filter expression, you must enable text matching for the target `VARCHAR` field when creating the collection. For details, refer to [Keyword Match](keyword-match.md).
+To use the `TEXT_MATCH` filter expression, you must enable text matching for the target `VARCHAR` field when creating the collection. For details, refer to [Text Match](keyword-match.md).

 </div>
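Review note: the difference between the two match operators in this hunk can be illustrated with a small pure-Python sketch. It is not Milvus code. The corpus, the helper names (`like_infix`, `tokenize`, `build_inverted_index`, `text_match`), and the stand-in tokenizer are all hypothetical, and the scan is case-sensitive only because Python's `in` operator is.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus, not taken from the Milvus docs.
DOCS = {
    1: "Apple iPhone 15 with a titanium frame",
    2: "Pineapple juice, freshly pressed",
    3: "Refurbished iPhone charger",
}


def like_infix(docs, infix):
    """Brute-force scan in the spirit of `like "%infix%"`: every row is checked."""
    return sorted(doc_id for doc_id, text in docs.items() if infix in text)


def tokenize(text):
    """Stand-in analyzer: split on non-alphanumeric characters, then lowercase."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[tok].add(doc_id)
    return index


def text_match(index, query):
    """Term lookup with OR semantics across the tokens of the query."""
    hits = set()
    for tok in tokenize(query):
        hits |= index.get(tok, set())
    return sorted(hits)


INDEX = build_inverted_index(DOCS)

print(like_infix(DOCS, "apple"))          # substring hit inside "Pineapple"
print(text_match(INDEX, "apple"))         # whole-term hit on "Apple"
print(text_match(INDEX, "Apple iPhone"))  # either term matches
```

In this toy, the substring scan finds `apple` inside `Pineapple`, while the term lookup matches only the standalone token, which is the behavioural distinction the two bullets describe.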
@@ -1022,11 +1023,11 @@ The filtered results are as follows:

 ```

-#### Example 3: Keyword match on VARCHAR fields
+#### Example 3: Text match on VARCHAR fields

-The `TEXT_MATCH` expression is used for keyword match on `VARCHAR` fields. By default, it applies an **OR** logic, but you can combine it with other logical operators to create more complex query conditions. For details, refer to [Keyword Match](keyword-match.md).
+The `TEXT_MATCH` expression is used for text match on `VARCHAR` fields. By default, it applies an **OR** logic, but you can combine it with other logical operators to create more complex query conditions. For details, refer to [Text Match](keyword-match.md).

-The following example demonstrates how to use the `TEXT_MATCH` expression to filter products where the `description` field contains either the keyword `"Apple"` or `"iPhone"`:
+The following example demonstrates how to use the `TEXT_MATCH` expression to filter products where the `description` field contains either the term `"Apple"` or `"iPhone"`:
v2.5.x/site/en/userGuide/search-query-get/keyword-match.md
+19 -19
@@ -1,38 +1,38 @@
 ---
 id: keyword-match.md
-summary: "Keyword match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria."
-title: Keyword Match
+summary: "Text match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria."
+title: Text Match
 ---

-# Keyword Match
+# Text Match

-Keyword match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.
+Text match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.

 <div class="alert note">

-Keyword match focuses on finding exact occurrences of the query terms, without scoring the relevance of the matched documents. If you want to retrieve the most relevant documents based on the semantic meaning and importance of the query terms, we recommend you use [Full Text Search](full-text-search.md).
+Text match focuses on finding exact occurrences of the query terms, without scoring the relevance of the matched documents. If you want to retrieve the most relevant documents based on the semantic meaning and importance of the query terms, we recommend you use [Full Text Search](full-text-search.md).

 </div>

 ## Overview

-Milvus integrates [Tantivy](https://github.com/quickwit-oss/tantivy) to power its underlying inverted index and keyword search. For each text entry, Milvus indexes it following the procedure:
+Milvus integrates [Tantivy](https://github.com/quickwit-oss/tantivy) to power its underlying inverted index and term-based text search. For each text entry, Milvus indexes it following the procedure:

 1. [Analyzer](analyzer-overview.md): The analyzer processes input text by tokenizing it into individual words, or tokens, and then applying filters as needed. This allows Milvus to build an index based on these tokens.

 2. [Indexing](index-scalar-fields.md): After text analysis, Milvus creates an inverted index that maps each unique token to the documents containing it.

-When a user performs a keyword match, the inverted index is used to quickly retrieve all documents containing the keywords. This is much faster than scanning through each document individually.
+When a user performs a text match, the inverted index is used to quickly retrieve all documents containing the keywords. This is much faster than scanning through each document individually.
-Keyword match works on the `VARCHAR` field type, which is essentially the string data type in Milvus. To enable keyword match, set both `enable_analyzer` and `enable_match` to `True` and then optionally configure an analyzer for text analysis when defining your collection schema.
+Text match works on the `VARCHAR` field type, which is essentially the string data type in Milvus. To enable text match, set both `enable_analyzer` and `enable_match` to `True` and then optionally configure an analyzer for text analysis when defining your collection schema.

 ### Set `enable_analyzer` and `enable_match`

-To enable keyword match for a specific `VARCHAR` field, set both the `enable_analyzer` and `enable_match` parameters to `True` when defining the field schema. This instructs Milvus to tokenize text and create an inverted index for the specified field, allowing fast and efficient keyword matches.
+To enable text match for a specific `VARCHAR` field, set both the `enable_analyzer` and `enable_match` parameters to `True` when defining the field schema. This instructs Milvus to tokenize text and create an inverted index for the specified field, allowing fast and efficient text matches.

 ```python
 from pymilvus import MilvusClient, DataType
@@ -51,7 +51,7 @@ schema.add_field(

 ### Optional: Configure an analyzer

-The performance and accuracy of keyword matching depend on the selected analyzer. Different analyzers are tailored to various languages and text structures, so choosing the right one can significantly impact search results for your specific use case.
+The performance and accuracy of text matching depend on the selected analyzer. Different analyzers are tailored to various languages and text structures, so choosing the right one can significantly impact search results for your specific use case.

 By default, Milvus uses the `standard` analyzer, which tokenizes text based on whitespace and punctuation, removes tokens longer than 40 characters, and converts text to lowercase. No additional parameters are needed to apply this default setting. For more information, refer to [Standard](standard-analyzer.md).
@@ -75,9 +75,9 @@ schema.add_field(

 Milvus also provides various other analyzers suited to different languages and scenarios. For more details, refer to [Overview](analyzer-overview.md).

-## Use keyword match
+## Use text match

-Once you have enabled keyword match for a VARCHAR field in your collection schema, you can perform keyword matches using the `TEXT_MATCH` expression.
+Once you have enabled text match for a VARCHAR field in your collection schema, you can perform text matches using the `TEXT_MATCH` expression.
-Keyword match can be used in combination with vector similarity search to narrow the search scope and improve search performance. By filtering the collection using keyword match before vector similarity search, you can reduce the number of documents that need to be searched, resulting in faster query times.
+Text match can be used in combination with vector similarity search to narrow the search scope and improve search performance. By filtering the collection using text match before vector similarity search, you can reduce the number of documents that need to be searched, resulting in faster query times.

 In this example, the `filter` expression filters the search results to only include documents that match the specified keywords `keyword1` or `keyword2`. The vector similarity search is then performed on this filtered subset of documents.
@@ -129,9 +129,9 @@ result = MilvusClient.search(

 ```

-### Query with keyword match
+### Query with text match

-Keyword match can also be used for scalar filtering in query operations. By specifying a `TEXT_MATCH` expression in the `expr` parameter of the `query()` method, you can retrieve documents that match the given keywords.
+Text match can also be used for scalar filtering in query operations. By specifying a `TEXT_MATCH` expression in the `expr` parameter of the `query()` method, you can retrieve documents that match the given keywords.

 The example below retrieves documents where the `text` field contains both keywords `keyword1` and `keyword2`.
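Review note: the "either term" and "both terms" cases discussed in this file differ only in how the filter string is assembled. The sketch below builds both forms as plain strings. The helper names `text_match_any` and `text_match_all` are hypothetical, and the expression syntax (space-separated terms inside one `TEXT_MATCH` call for OR, separate calls joined with `and` for AND) is an assumption based on how the Milvus documentation writes these filters. The sketch does no escaping, so it assumes terms without quotes.

```python
def text_match_any(field, terms):
    """OR semantics: a single TEXT_MATCH call with space-separated terms."""
    joined = " ".join(terms)
    return f"TEXT_MATCH({field}, '{joined}')"


def text_match_all(field, terms):
    """AND semantics: one TEXT_MATCH call per term, joined with `and`."""
    return " and ".join(f"TEXT_MATCH({field}, '{term}')" for term in terms)


# The field and keyword names mirror the placeholders used in the docs.
print(text_match_any("text", ["keyword1", "keyword2"]))
print(text_match_all("text", ["keyword1", "keyword2"]))
```

The resulting strings would then be passed as the filter expression of a search or query call.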
@@ -149,6 +149,6 @@ result = MilvusClient.query(

 ## Considerations

-- Enabling keyword matching for a field triggers the creation of an inverted index, which consumes storage resources. Consider storage impact when deciding to enable this feature, as it varies based on text size, unique tokens, and the analyzer used.
+- Enabling text matching for a field triggers the creation of an inverted index, which consumes storage resources. Consider storage impact when deciding to enable this feature, as it varies based on text size, unique tokens, and the analyzer used.

 - Once you've defined an analyzer in your schema, its settings become permanent for that collection. If you decide that a different analyzer would better suit your needs, you may consider dropping the existing collection and creating a new one with the desired analyzer configuration.
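Review note: the storage consideration above (index size depends on unique tokens and on the analyzer) can be made concrete with a rough pure-Python sketch. `standard_like` only approximates the default behaviour described in this file (split on whitespace and punctuation, drop tokens longer than 40 characters, lowercase); `whitespace_only` is a hypothetical alternative added for contrast. Neither is the real Milvus or Tantivy implementation, and the counts are a proxy for index size, not a measurement.

```python
import re


def standard_like(text, max_len=40):
    """Approximate the described default: split on non-word characters,
    drop tokens longer than max_len, and lowercase the rest."""
    tokens = re.findall(r"\w+", text)
    return [tok.lower() for tok in tokens if len(tok) <= max_len]


def whitespace_only(text):
    """Hypothetical alternative: split on whitespace, keep case and punctuation."""
    return text.split()


def index_footprint(docs, analyzer):
    """Return (unique_tokens, postings) for a toy inverted index."""
    postings = {}
    for doc_id, text in enumerate(docs):
        for tok in analyzer(text):
            postings.setdefault(tok, set()).add(doc_id)
    return len(postings), sum(len(ids) for ids in postings.values())


DOCS = [
    "Milvus stores vectors.",
    "milvus stores Vectors",
    "Vectors, vectors, VECTORS!",
]

print(index_footprint(DOCS, standard_like))
print(index_footprint(DOCS, whitespace_only))
```

On this toy corpus the normalizing analyzer collapses case and punctuation variants into three tokens, while the whitespace splitter keeps eight distinct ones, which is why the analyzer choice affects index size.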
v2.5.x/site/en/userGuide/search-query-get/multi-vector-search.md
+2 -2
@@ -19,9 +19,9 @@ Hybrid Search is suitable for the following two scenarios:

 Different types of vectors can represent different information, and using various embedding models can more comprehensively represent different features and aspects of the data. For example, using different embedding models for the same sentence can generate a dense vector to represent the semantic meaning and a sparse vector to represent the word frequency in the sentence.

-- **Sparse vectors:** Sparse vectors are characterized by their high vector dimensionality and the presence of few non-zero values. This structure makes them particularly well-suited for traditional information retrieval applications. In most cases, the number of dimensions used in sparse vectors correspond to different tokens across one or more languages. Each dimension is assigned a value that indicates the relative importance of that token within the document. This layout proves advantageous for tasks that involve keyword matching.
+- **Sparse vectors:** Sparse vectors are characterized by their high vector dimensionality and the presence of few non-zero values. This structure makes them particularly well-suited for traditional information retrieval applications. In most cases, the number of dimensions used in sparse vectors correspond to different tokens across one or more languages. Each dimension is assigned a value that indicates the relative importance of that token within the document. This layout proves advantageous for tasks that involve text matching.

-- **Dense vectors:** Dense vectors are embeddings derived from neural networks. When arranged in an ordered array, these vectors capture the semantic essence of the input text. Note that dense vectors are not limited to text processing; they are also extensively used in computer vision to represent the semantics of visual data. These dense vectors, usually generated by text embedding models, are characterized by most or all elements being non-zero. Thus, dense vectors are particularly effective for semantic search applications, as they can return the most similar results based on vector distance even in the absence of exact keyword matches. This capability allows for more nuanced and context-aware search results, often capturing relationships between concepts that might be missed by keyword-based approaches.
+- **Dense vectors:** Dense vectors are embeddings derived from neural networks. When arranged in an ordered array, these vectors capture the semantic essence of the input text. Note that dense vectors are not limited to text processing; they are also extensively used in computer vision to represent the semantics of visual data. These dense vectors, usually generated by text embedding models, are characterized by most or all elements being non-zero. Thus, dense vectors are particularly effective for semantic search applications, as they can return the most similar results based on vector distance even in the absence of exact text matches. This capability allows for more nuanced and context-aware search results, often capturing relationships between concepts that might be missed by keyword-based approaches.

 For more details, refer to [Sparse Vector](sparse_vector.md) and [Dense Vector](dense-vector.md).
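Review note: the sparse-vector description in this hunk (one dimension per token, few non-zero values, each value reflecting the token's weight) can be sketched with raw term counts. This is a simplified illustration, not how Milvus or any particular embedding model computes sparse embeddings. The helper names `build_vocab` and `sparse_tf` and the two-sentence corpus are hypothetical, and raw counts stand in for a real importance weighting.

```python
from collections import Counter


def build_vocab(sentences):
    """Assign one dimension index per distinct lowercase token."""
    vocab = {}
    for sentence in sentences:
        for tok in sentence.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def sparse_tf(sentence, vocab):
    """Return {dimension: weight}, storing only the non-zero entries."""
    counts = Counter(tok for tok in sentence.lower().split() if tok in vocab)
    return {vocab[tok]: float(n) for tok, n in counts.items()}


CORPUS = ["the cat sat", "the dog sat on the mat"]
VOCAB = build_vocab(CORPUS)

print(len(VOCAB))
print(sparse_tf("the dog sat on the mat", VOCAB))
```

Only the dimensions for tokens that occur in the sentence are stored; with a realistic vocabulary the vector would have many thousands of dimensions and still only a handful of entries.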
v2.5.x/site/en/userGuide/search-query-get/single-vector-search.md
+3 -3
@@ -946,11 +946,11 @@ AUTOINDEX considerably flattens the learning curve of ANN searches. However, the

 For details on full-text search, refer to [Full Text Search](full-text-search.md).

-- Keyword Match
+- Text Match

-Keyword match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.
+Text match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.

-For details on keyword match, refer to [Keyword Match](keyword-match.md).
+For details on text match, refer to [Text Match](keyword-match.md).