[Voyage AI](https://voyageai.com/)'s embedding and ranking models are state-of-the-art in retrieval accuracy. The integration supports the following models:
- **`voyage-4-large`**, **`voyage-4`**, and **`voyage-4-lite`** - Latest general-purpose embedding models with a shared embedding space and MoE architecture
- **`voyage-3.5`** and **`voyage-3.5-lite`** - General-purpose embedding models with superior performance
- **`voyage-code-3`** - Optimized for code retrieval
- **`voyage-context-3`** - Contextualized chunk embedding model that preserves document context for improved retrieval accuracy
- **`voyage-multimodal-3.5`** - Multimodal model supporting text, images, and video (preview)

For the complete list of available models, see the [Embeddings Documentation](https://docs.voyageai.com/embeddings/) and [Contextualized Chunk Embeddings](https://docs.voyageai.com/docs/contextualized-chunk-embeddings).

## Supported Models

### Text Embedding Models

| Model | Description | Dimensions |
|-------|-------------|------------|
| `voyage-4-large` | Best general-purpose and multilingual retrieval quality | 256, 512, 1024 (default), 2048 |
| `voyage-4` | Optimized for general-purpose and multilingual retrieval quality | 256, 512, 1024 (default), 2048 |
| `voyage-4-lite` | Optimized for latency and cost | 256, 512, 1024 (default), 2048 |
| `voyage-3.5` | General-purpose embedding model | 1024 |
| `voyage-3.5-lite` | Efficient model with lower latency | 1024 |
| `voyage-code-3` | Optimized for code retrieval | 1024 |
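
The model list describes the `voyage-4` family as sharing one embedding space. If that holds, you could search documents embedded with `voyage-4-large` using queries embedded by `voyage-4-lite`, provided both use the same output dimension. The helper below is a hypothetical sketch of that compatibility rule. It is not part of `voyage-embedders-haystack`. `can_compare` and the constants are illustrative names, and the dimension values mirror the table above.

```python
# Hypothetical compatibility check; not part of voyage-embedders-haystack.
# Assumption (from the model list): voyage-4-large, voyage-4, and voyage-4-lite
# share one embedding space, so their vectors are comparable at equal dimensions.
SHARED_SPACE_FAMILY = {"voyage-4-large", "voyage-4", "voyage-4-lite"}
FAMILY_DIMENSIONS = (256, 512, 1024, 2048)  # mirrors the "Dimensions" column


def can_compare(doc_model: str, doc_dim: int, query_model: str, query_dim: int) -> bool:
    """Return True if document and query embeddings can be compared directly."""
    if doc_dim != query_dim:
        # Vectors of different lengths cannot be compared at all.
        return False
    if doc_model == query_model:
        # Same model at the same dimension: always the same space.
        return True
    # Different models: only members of the shared-space family are interchangeable.
    return (
        doc_model in SHARED_SPACE_FAMILY
        and query_model in SHARED_SPACE_FAMILY
        and doc_dim in FAMILY_DIMENSIONS
    )


print(can_compare("voyage-4-large", 1024, "voyage-4-lite", 1024))  # True
print(can_compare("voyage-4-large", 1024, "voyage-4-lite", 512))   # False
print(can_compare("voyage-3.5", 1024, "voyage-4", 1024))           # False
```

Re-embedding the corpus is only necessary when this check fails. Mixing models within the family at equal dimensions would, under the stated assumption, leave stored vectors usable.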

### Multimodal Embedding Models

| Model | Description | Dimensions | Modalities |
|-------|-------------|------------|------------|
| `voyage-multimodal-3` | Multimodal embedding model | 1024 | Text, Images |
| `voyage-multimodal-3.5` | Multimodal embedding model (preview) | 256, 512, 1024 (default), 2048 | Text, Images, Video |
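
Choosing between the two multimodal models usually depends on which modalities your data contains. The sketch below uses a hypothetical lookup that mirrors the "Modalities" column of the table. It is not an API of the integration, and `models_supporting` is an illustrative name.

```python
from typing import Dict, List, Set

# Hypothetical lookup mirroring the "Modalities" column; not an API of the integration.
MODALITIES: Dict[str, Set[str]] = {
    "voyage-multimodal-3": {"text", "image"},
    "voyage-multimodal-3.5": {"text", "image", "video"},
}


def models_supporting(required: Set[str]) -> List[str]:
    """Return the multimodal models whose modalities include everything in `required`."""
    return sorted(model for model, supported in MODALITIES.items() if required <= supported)


print(models_supporting({"text", "image"}))  # ['voyage-multimodal-3', 'voyage-multimodal-3.5']
print(models_supporting({"text", "video"}))  # ['voyage-multimodal-3.5']
```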

### Reranker Models

| Model | Description |
|-------|-------------|
| `rerank-2` | High-accuracy reranker model |
| `rerank-2-lite` | Efficient reranker with lower latency |
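
A reranker scores each query-document pair and returns the documents ordered by that score, optionally keeping only the best few. The sketch below shows only that reordering step. It uses invented scores in place of real `rerank-2` output and does not call `VoyageRanker` or the Voyage API. `rerank_by_score` and its `top_k` argument are hypothetical.

```python
from typing import List, Optional, Tuple


def rerank_by_score(
    docs: List[str], scores: List[float], top_k: Optional[int] = None
) -> List[Tuple[str, float]]:
    """Pair each document with its relevance score and return them best-first."""
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    # sorted() is stable, so documents with equal scores keep their original order.
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Scores are invented for illustration; a reranker model would normally produce them.
docs = ["Paris is in France.", "Bananas are yellow.", "The Eiffel Tower is in Paris."]
scores = [0.71, 0.02, 0.93]
for doc, score in rerank_by_score(docs, scores, top_k=2):
    print(f"{score:.2f}  {doc}")
```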

- [VoyageTextEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_text_embedder.py) - For embedding query text
- [VoyageDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_document_embedder.py) - For embedding documents
- [VoyageContextualizedDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_contextualized_document_embedder.py) - For contextualized chunk embeddings with `voyage-context-3`
- [VoyageMultimodalEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_multimodal_embedder.py) - For multimodal embeddings with `voyage-multimodal-3.5`
- [VoyageRanker](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/rankers/voyage/ranker.py) - For reranking documents

### Standard Embeddings

For improved retrieval quality, use `VoyageContextualizedDocumentEmbedder` with the `voyage-context-3` model. This component preserves context between related document chunks by grouping them together during embedding, which reduces the context loss that occurs when chunks are embedded independently.
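
The grouping step can be pictured as collecting each document's chunks into one ordered list, so that chunks from the same source are embedded together. The sketch below is a minimal illustration that makes two assumptions: chunks are plain dicts, and each chunk carries hypothetical `source_id` and `chunk_index` metadata keys. It does not show how `VoyageContextualizedDocumentEmbedder` is implemented internally, which may group documents differently.

```python
from collections import defaultdict
from typing import Dict, List


def group_chunks(chunks: List[dict]) -> List[List[str]]:
    """Group chunk texts by source document, ordered by chunk index within each group."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for chunk in chunks:
        # Hypothetical metadata key identifying the parent document.
        groups[chunk["meta"]["source_id"]].append(chunk)
    # Groups appear in first-seen order because dicts preserve insertion order.
    return [
        [c["content"] for c in sorted(members, key=lambda c: c["meta"]["chunk_index"])]
        for members in groups.values()
    ]


chunks = [
    {"content": "Part 2 of the annual report.", "meta": {"source_id": "report", "chunk_index": 1}},
    {"content": "Part 1 of the annual report.", "meta": {"source_id": "report", "chunk_index": 0}},
    {"content": "Installation guide.", "meta": {"source_id": "manual", "chunk_index": 0}},
]
print(group_chunks(chunks))
# [['Part 1 of the annual report.', 'Part 2 of the annual report.'], ['Installation guide.']]
```

The result is a list of lists, with one inner list per source document in reading order. Each chunk can then be embedded alongside its siblings rather than in isolation.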

**Important:** You must explicitly specify the `model` parameter when initializing any component. Choose from the available models listed in the [Embeddings Documentation](https://docs.voyageai.com/embeddings/). Recommended choices include:

- `voyage-4-large` - Best general-purpose and multilingual retrieval quality
- `voyage-4` - Balanced general-purpose and multilingual retrieval quality
- `voyage-4-lite` - Optimized for latency and cost
- `voyage-context-3` - Contextualized embeddings for improved retrieval (use with `VoyageContextualizedDocumentEmbedder`)
66
96
67
97
You can set the environment variable `VOYAGE_API_KEY` instead of passing the API key as an argument. To get an API key, please see the [Voyage AI website.](https://www.voyageai.com/)
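
The choice between an explicit argument and the `VOYAGE_API_KEY` environment variable can be sketched with the standard library alone. The sketch assumes an explicitly passed key takes priority. `resolve_api_key` is a hypothetical helper for illustration only. The integration's components already handle key lookup, so you do not need to add this.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None, env_var: str = "VOYAGE_API_KEY") -> str:
    """Return the explicit key if given, otherwise read it from `env_var`."""
    # Assumption: an explicitly passed key takes priority over the environment.
    if api_key:
        return api_key
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"No API key was passed and {env_var} is not set")
    return value
```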

For more examples, see the [contextualized embedder example](https://github.com/awinml/voyage-embedders-haystack/blob/voyage_context-3_model/examples/contextualized_embedder_example.py).

## Multimodal Embeddings

Voyage AI's `voyage-multimodal-3.5` model transforms unstructured data from multiple modalities (text, images, video) into a shared vector space. This enables mixed-media document retrieval and cross-modal semantic search.
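
Because every modality maps into the same vector space, cross-modal search reduces to ranking stored item vectors by their similarity to a query vector. The sketch below uses cosine similarity over tiny made-up 3-dimensional vectors, which stand in for real `voyage-multimodal-3.5` embeddings. The file names are hypothetical. Cosine similarity is one common choice of measure, not a requirement of the model.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def search(query: List[float], items: Dict[str, List[float]], top_k: int = 2) -> List[Tuple[str, float]]:
    """Rank stored items by cosine similarity to the query, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up vectors standing in for embeddings of an image, a video, and a text passage.
items = {
    "cat_photo.jpg": [0.9, 0.1, 0.0],
    "dog_video.mp4": [0.1, 0.9, 0.1],
    "cat_care_article.txt": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # stand-in for the embedding of the text query "a cat"
for name, score in search(query, items):
    print(f"{score:.3f}  {name}")
```

A text query can surface both an image and a text passage here because the comparison never looks at the original modality, only at the vectors.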

### Features

- **Multiple modalities**: Supports text, image, and video inputs
- **Variable dimensions**: Output dimensions of 256, 512, 1024 (default), or 2048
- **Interleaved content**: Mix text, images, and video within a single input
- **No preprocessing required**: Process documents with embedded images directly
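
Choosing an output dimension is mainly a storage trade-off: smaller vectors take less space and are faster to compare, usually at some cost in retrieval quality. The sketch below validates a requested dimension against the values listed in the features above. It also estimates raw index size, assuming each value is stored as a 4-byte float32 (quantized storage would be smaller). `choose_output_dimension` and `index_size_bytes` are hypothetical helpers, not part of the integration.

```python
from typing import Optional

# Values taken from the "Variable dimensions" feature.
ALLOWED_DIMENSIONS = (256, 512, 1024, 2048)
DEFAULT_DIMENSION = 1024


def choose_output_dimension(requested: Optional[int] = None) -> int:
    """Return the requested dimension, or the default, rejecting unsupported values."""
    if requested is None:
        return DEFAULT_DIMENSION
    if requested not in ALLOWED_DIMENSIONS:
        raise ValueError(f"Unsupported dimension {requested}; choose from {ALLOWED_DIMENSIONS}")
    return requested


def index_size_bytes(num_vectors: int, dimension: int, bytes_per_value: int = 4) -> int:
    """Raw storage for `num_vectors` embeddings, assuming float32 (4 bytes) per value."""
    return num_vectors * dimension * bytes_per_value


for dim in ALLOWED_DIMENSIONS:
    gib = index_size_bytes(1_000_000, choose_output_dimension(dim)) / 2**30
    print(f"{dim:>4} dims -> {gib:.2f} GiB per million vectors")
```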

- Mixed-media document retrieval (PDFs, slides with images)
- Image-text similarity search
- Video content retrieval and search
- Cross-modal semantic search

For more information, see the [Multimodal Embeddings Documentation](https://docs.voyageai.com/docs/multimodal-embeddings).
## License
`voyage-embedders-haystack` is distributed under the terms of the [Apache-2.0 license](https://github.com/awinml/voyage-embedders-haystack/blob/main/LICENSE).