Commit 7c63c78

Milvus-doc-bot authored and committed
Release new docs to master
1 parent 26c0d22 commit 7c63c78

10 files changed: +36 -35 lines changed

v2.5.x/site/en/home/home.md

+1 -1
@@ -101,7 +101,7 @@ id: home.md
 _Nov 2024 - Milvus 2.5.0 release_
 
 - Added guidance on how to [conduct full text search](full-text-search.md).
-- Added guidance on how to [conduct keyword match](keyword-match.md).
+- Added guidance on how to [conduct text match](keyword-match.md).
 - Added guidance on how to [enable nullable and default values](nullable-and-default.md).
 - Added descriptions of [analyzers](analyzer-overview.md).
 - Added descriptions of [bitmap indexes](bitmap.md).

v2.5.x/site/en/menuStructure/en.json

+1 -1
@@ -702,7 +702,7 @@
 "children": []
 },
 {
-"label": "Keyword Match",
+"label": "Text Match",
 "id": "keyword-match.md",
 "order": 7,
 "children": []

v2.5.x/site/en/release_notes.md

+1 -1
@@ -38,7 +38,7 @@ Milvus 2.5 introduces a built-in Cluster Management WebUI, reducing system maint
 
 Milvus 2.5 leverages analyzers and indexing from Tantivy for text preprocessing and index building, supporting precise natural language matching of text data based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.
 
-For details, refer to [Keyword Match](keyword-match.md).
+For details, refer to [Text Match](keyword-match.md).
 
 #### Bitmap Index
 

v2.5.x/site/en/tutorials/hybrid_search_with_milvus.md

+1 -1
@@ -18,7 +18,7 @@ In this tutorial, we will demonstrate how to conduct hybrid search with [Milvus]
 Milvus supports Dense, Sparse, and Hybrid retrieval methods:
 
 - Dense Retrieval: Utilizes semantic context to understand the meaning behind queries.
-- Sparse Retrieval: Emphasizes keyword matching to find results based on specific terms, equivalent to full-text search.
+- Sparse Retrieval: Emphasizes text matching to find results based on specific terms, equivalent to full-text search.
 - Hybrid Retrieval: Combines both Dense and Sparse approaches, capturing the full context and specific keywords for comprehensive search results.
 
 By integrating these methods, the Milvus Hybrid Search balances semantic and lexical similarities, improving the overall relevance of search outcomes. This notebook will walk through the process of setting up and using these retrieval strategies, highlighting their effectiveness in various search scenarios.

v2.5.x/site/en/userGuide/collections/manage-collections.md

+1 -1
@@ -85,7 +85,7 @@ For more information about searches and queries, refer to the articles in the [
 
 - [Full-Text Search](full-text-search.md)
 
-- [Keyword Match](keyword-match.md)
+- [Text Match](keyword-match.md)
 
 In addition, Milvus also provides enhancements to improve search performance and efficiency. They are disabled by default, and you can enable and use them according to your service requirements. They are
 

v2.5.x/site/en/userGuide/schema/analyzer/analyzer-overview.md

+2 -2
@@ -8,15 +8,15 @@ summary: "In text processing, an analyzer is a crucial component that converts r
 
 In text processing, an **analyzer** is a crucial component that converts raw text into a structured, searchable format. Each analyzer typically consists of two core elements: **tokenizer** and **filter**. Together, they transform input text into tokens, refine these tokens, and prepare them for efficient indexing and retrieval.
 
-In Milvus, analyzers are configured during collection creation when you add `VARCHAR` fields to the collection schema. Tokens produced by an analyzer can be used to build an index for keyword matching or converted into sparse embeddings for full text search. For more information, refer to [Keyword Match](keyword-match.md) or [Full Text Search](full-text-search.md).
+In Milvus, analyzers are configured during collection creation when you add `VARCHAR` fields to the collection schema. Tokens produced by an analyzer can be used to build an index for text matching or converted into sparse embeddings for full text search. For more information, refer to [Text Match](keyword-match.md) or [Full Text Search](full-text-search.md).
 
 <div class="alert note">
 
 The use of analyzers may impact performance:
 
 - **Full text search:** For full text search, DataNode and **QueryNode** channels consume data more slowly because they must wait for tokenization to complete. As a result, newly ingested data takes longer to become available for search.
 
-- **Keyword match:** For keyword matching, index creation is also slower since tokenization needs to finish before an index can be built.
+- **Text match:** For text matching, index creation is also slower since tokenization needs to finish before an index can be built.
 
 </div>
 

v2.5.x/site/en/userGuide/search-query-get/boolean.md

+5 -4
@@ -835,9 +835,10 @@ Match operators include:
 - `like`: Match constants or prefixes (prefix%), infixes (%infix%), and suffixes (%suffix) within constants. It relies on a brute-force search mechanism using wildcards and does not involve text tokenization. While it can achieve exact matches, its query efficiency is relatively low, making it suitable for simple matching tasks or queries on smaller datasets.
 
 - `TEXT_MATCH`: Match specific terms or keywords on VARCHAR fields, using tokenization and inverted index to enable efficient text search. Compared to `like`, `TEXT_MATCH` offers more advanced text tokenization and filtering capabilities. It is suited for large-scale datasets where higher query performance is required for complex text search scenarios.
+
 <div class="alert note">
 
-To use the `TEXT_MATCH` filter expression, you must enable text matching for the target `VARCHAR` field when creating the collection. For details, refer to [Keyword Match](keyword-match.md).
+To use the `TEXT_MATCH` filter expression, you must enable text matching for the target `VARCHAR` field when creating the collection. For details, refer to [Text Match](keyword-match.md).
 
 </div>
 
@@ -1022,11 +1023,11 @@ The filtered results are as follows:​
 
 ```
 
-#### Example 3: Keyword match on VARCHAR fields
+#### Example 3: Text match on VARCHAR fields
 
-The `TEXT_MATCH` expression is used for keyword match on `VARCHAR` fields. By default, it applies an **OR** logic, but you can combine it with other logical operators to create more complex query conditions. For details, refer to [Keyword Match](keyword-match.md).
+The `TEXT_MATCH` expression is used for text match on `VARCHAR` fields. By default, it applies an **OR** logic, but you can combine it with other logical operators to create more complex query conditions. For details, refer to [Text Match](keyword-match.md).
 
-The following example demonstrates how to use the `TEXT_MATCH` expression to filter products where the `description` field contains either the keyword `"Apple"` or `"iPhone"`:
+The following example demonstrates how to use the `TEXT_MATCH` expression to filter products where the `description` field contains either the term `"Apple"` or `"iPhone"`:
 
 <div class="multipleCode">
 <a href="#python">Python </a>
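
Reader note on the boolean.md hunk: the OR-by-default and AND-by-combination behaviour it describes can be sketched with a small filter-string builder. This is only an illustration. The helper names `text_match_any` and `text_match_all` are hypothetical, and the space-separated multi-term form is an assumption that this diff does not show; the AND form mirrors the `filter` string visible in a keyword-match.md hunk header.

```python
def text_match_any(field, terms):
    """Build a filter for rows whose field contains at least one term.

    Assumption: several space-separated terms inside a single TEXT_MATCH
    call are combined with OR logic (the documented default).
    """
    joined = " ".join(terms)
    return f"TEXT_MATCH({field}, '{joined}')"


def text_match_all(field, terms):
    """Build a filter for rows whose field contains every term (AND)."""
    return " and ".join(f"TEXT_MATCH({field}, '{term}')" for term in terms)


either_filter = text_match_any("description", ["Apple", "iPhone"])
both_filter = text_match_all("text", ["machine", "deep"])
print(either_filter)
print(both_filter)
```

The helpers do not escape quotes inside terms, so they are suitable only for plain alphanumeric terms like the ones in the documentation examples.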

v2.5.x/site/en/userGuide/search-query-get/keyword-match.md

+19 -19
@@ -1,38 +1,38 @@
 ---
 id: keyword-match.md
-summary: "Keyword match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria."
-title: Keyword Match
+summary: "Text match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria."
+title: Text Match
 ---
 
-# Keyword Match
+# Text Match
 
-Keyword match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.
+Text match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.
 
 <div class="alert note">
 
-Keyword match focuses on finding exact occurrences of the query terms, without scoring the relevance of the matched documents. If you want to retrieve the most relevant documents based on the semantic meaning and importance of the query terms, we recommend you use [Full Text Search](full-text-search.md).
+Text match focuses on finding exact occurrences of the query terms, without scoring the relevance of the matched documents. If you want to retrieve the most relevant documents based on the semantic meaning and importance of the query terms, we recommend you use [Full Text Search](full-text-search.md).
 
 </div>
 
 ## Overview
 
-Milvus integrates [Tantivy](https://github.com/quickwit-oss/tantivy) to power its underlying inverted index and keyword search. For each text entry, Milvus indexes it following the procedure:
+Milvus integrates [Tantivy](https://github.com/quickwit-oss/tantivy) to power its underlying inverted index and term-based text search. For each text entry, Milvus indexes it following the procedure:
 
 1. [Analyzer](analyzer-overview.md): The analyzer processes input text by tokenizing it into individual words, or tokens, and then applying filters as needed. This allows Milvus to build an index based on these tokens.
 
 2. [Indexing](index-scalar-fields.md): After text analysis, Milvus creates an inverted index that maps each unique token to the documents containing it.
 
-When a user performs a keyword match, the inverted index is used to quickly retrieve all documents containing the keywords. This is much faster than scanning through each document individually.
+When a user performs a text match, the inverted index is used to quickly retrieve all documents containing the keywords. This is much faster than scanning through each document individually.
 
-![Keyword Match](../../../assets/keyword-match.png)
+![Text Match](../../../assets/keyword-match.png)
 
-## Enable keyword match
+## Enable text match
 
-Keyword match works on the `VARCHAR` field type, which is essentially the string data type in Milvus. To enable keyword match, set both `enable_analyzer` and `enable_match` to `True` and then optionally configure an analyzer for text analysis when defining your collection schema.
+Text match works on the `VARCHAR` field type, which is essentially the string data type in Milvus. To enable text match, set both `enable_analyzer` and `enable_match` to `True` and then optionally configure an analyzer for text analysis when defining your collection schema.
 
 ### Set `enable_analyzer` and `enable_match`
 
-To enable keyword match for a specific `VARCHAR` field, set both the `enable_analyzer` and `enable_match` parameters to `True` when defining the field schema. This instructs Milvus to tokenize text and create an inverted index for the specified field, allowing fast and efficient keyword matches.
+To enable text match for a specific `VARCHAR` field, set both the `enable_analyzer` and `enable_match` parameters to `True` when defining the field schema. This instructs Milvus to tokenize text and create an inverted index for the specified field, allowing fast and efficient text matches.
 
 ```python
 from pymilvus import MilvusClient, DataType
@@ -51,7 +51,7 @@ schema.add_field(​
 
 ### Optional: Configure an analyzer
 
-The performance and accuracy of keyword matching depend on the selected analyzer. Different analyzers are tailored to various languages and text structures, so choosing the right one can significantly impact search results for your specific use case.
+The performance and accuracy of text matching depend on the selected analyzer. Different analyzers are tailored to various languages and text structures, so choosing the right one can significantly impact search results for your specific use case.
 
 By default, Milvus uses the `standard` analyzer, which tokenizes text based on whitespace and punctuation, removes tokens longer than 40 characters, and converts text to lowercase. No additional parameters are needed to apply this default setting. For more information, refer to [Standard](standard-analyzer.md).
 
@@ -75,9 +75,9 @@ schema.add_field(​
 
 Milvus also provides various other analyzers suited to different languages and scenarios. For more details, refer to [Overview](analyzer-overview.md).
 
-## Use keyword match
+## Use text match
 
-Once you have enabled keyword match for a VARCHAR field in your collection schema, you can perform keyword matches using the `TEXT_MATCH` expression.
+Once you have enabled text match for a VARCHAR field in your collection schema, you can perform text matches using the `TEXT_MATCH` expression.
 
 ### TEXT_MATCH expression syntax
 
@@ -106,9 +106,9 @@ filter = "TEXT_MATCH(text, 'machine') and TEXT_MATCH(text, 'deep')"​
 
 ```
 
-### Search with keyword match
+### Search with text match
 
-Keyword match can be used in combination with vector similarity search to narrow the search scope and improve search performance. By filtering the collection using keyword match before vector similarity search, you can reduce the number of documents that need to be searched, resulting in faster query times.
+Text match can be used in combination with vector similarity search to narrow the search scope and improve search performance. By filtering the collection using text match before vector similarity search, you can reduce the number of documents that need to be searched, resulting in faster query times.
 
 In this example, the `filter` expression filters the search results to only include documents that match the specified keywords `keyword1` or `keyword2`. The vector similarity search is then performed on this filtered subset of documents.
 
@@ -129,9 +129,9 @@ result = MilvusClient.search(​
 
 ```
 
-### Query with keyword match
+### Query with text match
 
-Keyword match can also be used for scalar filtering in query operations. By specifying a `TEXT_MATCH` expression in the `expr` parameter of the `query()` method, you can retrieve documents that match the given keywords.
+Text match can also be used for scalar filtering in query operations. By specifying a `TEXT_MATCH` expression in the `expr` parameter of the `query()` method, you can retrieve documents that match the given keywords.
 
 The example below retrieves documents where the `text` field contains both keywords `keyword1` and `keyword2`.
 
@@ -149,6 +149,6 @@ result = MilvusClient.query(​
 
 ## Considerations
 
-- Enabling keyword matching for a field triggers the creation of an inverted index, which consumes storage resources. Consider storage impact when deciding to enable this feature, as it varies based on text size, unique tokens, and the analyzer used.
+- Enabling text matching for a field triggers the creation of an inverted index, which consumes storage resources. Consider storage impact when deciding to enable this feature, as it varies based on text size, unique tokens, and the analyzer used.
 
 - Once you've defined an analyzer in your schema, its settings become permanent for that collection. If you decide that a different analyzer would better suit your needs, you may consider dropping the existing collection and creating a new one with the desired analyzer configuration.
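
Reader note on the keyword-match.md hunks: the two-step procedure in the Overview (analyze, then build an inverted index) can be sketched in plain Python. This is a toy model, not Milvus or Tantivy code. The function names are hypothetical, and `analyze` only approximates the `standard` analyzer behaviour described in the doc (split on whitespace and punctuation, lowercase, drop tokens longer than 40 characters).

```python
import re
from collections import defaultdict


def analyze(text, max_token_len=40):
    """Toy analyzer: split on non-alphanumerics, lowercase, drop long tokens."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [t.lower() for t in tokens if len(t) <= max_token_len]


def build_inverted_index(docs):
    """Map each unique token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def text_match(index, query):
    """OR semantics: ids of documents containing at least one query term."""
    matched = set()
    for term in analyze(query):
        matched |= index.get(term, set())
    return matched


docs = {
    1: "Machine learning with Milvus",
    2: "Deep learning for vision",
    3: "Deep machine translation",
}
index = build_inverted_index(docs)

# Either term matches (OR), then both terms required (AND via intersection).
either = text_match(index, "machine deep")
both = text_match(index, "machine") & text_match(index, "deep")
print(sorted(either), sorted(both))
```

Each lookup touches only the posting sets of the query terms, which is why the doc describes this as faster than scanning every document. It also shows why the index costs storage that grows with the number of unique tokens.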

v2.5.x/site/en/userGuide/search-query-get/multi-vector-search.md

+2 -2
@@ -19,9 +19,9 @@ Hybrid Search is suitable for the following two scenarios:
 
 Different types of vectors can represent different information, and using various embedding models can more comprehensively represent different features and aspects of the data. For example, using different embedding models for the same sentence can generate a dense vector to represent the semantic meaning and a sparse vector to represent the word frequency in the sentence.
 
-- **Sparse vectors:** Sparse vectors are characterized by their high vector dimensionality and the presence of few non-zero values. This structure makes them particularly well-suited for traditional information retrieval applications. In most cases, the number of dimensions used in sparse vectors correspond to different tokens across one or more languages. Each dimension is assigned a value that indicates the relative importance of that token within the document. This layout proves advantageous for tasks that involve keyword matching.
+- **Sparse vectors:** Sparse vectors are characterized by their high vector dimensionality and the presence of few non-zero values. This structure makes them particularly well-suited for traditional information retrieval applications. In most cases, the number of dimensions used in sparse vectors correspond to different tokens across one or more languages. Each dimension is assigned a value that indicates the relative importance of that token within the document. This layout proves advantageous for tasks that involve text matching.
 
-- **Dense vectors:** Dense vectors are embeddings derived from neural networks. When arranged in an ordered array, these vectors capture the semantic essence of the input text. Note that dense vectors are not limited to text processing; they are also extensively used in computer vision to represent the semantics of visual data. These dense vectors, usually generated by text embedding models, are characterized by most or all elements being non-zero. Thus, dense vectors are particularly effective for semantic search applications, as they can return the most similar results based on vector distance even in the absence of exact keyword matches. This capability allows for more nuanced and context-aware search results, often capturing relationships between concepts that might be missed by keyword-based approaches.
+- **Dense vectors:** Dense vectors are embeddings derived from neural networks. When arranged in an ordered array, these vectors capture the semantic essence of the input text. Note that dense vectors are not limited to text processing; they are also extensively used in computer vision to represent the semantics of visual data. These dense vectors, usually generated by text embedding models, are characterized by most or all elements being non-zero. Thus, dense vectors are particularly effective for semantic search applications, as they can return the most similar results based on vector distance even in the absence of exact text matches. This capability allows for more nuanced and context-aware search results, often capturing relationships between concepts that might be missed by keyword-based approaches.
 
 For more details, refer to [Sparse Vector](sparse_vector.md) and [Dense Vector](dense-vector.md).
 

v2.5.x/site/en/userGuide/search-query-get/single-vector-search.md

+3 -3
@@ -946,11 +946,11 @@ AUTOINDEX considerably flattens the learning curve of ANN searches. However, the
 
 For details on full-text search, refer to [Full Text Search](full-text-search.md).
 
-- Keyword Match
+- Text Match
 
-Keyword match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.
+Text match in Milvus enables precise document retrieval based on specific terms. This feature is primarily used for filtered search to satisfy specific conditions and can incorporate scalar filtering to refine query results, allowing similarity searches within vectors that meet scalar criteria.
 
-For details on keyword match, refer to [Keyword Match](keyword-match.md).
+For details on text match, refer to [Text Match](keyword-match.md).
 
 - Use Partition Key
 
