Commit 4433d5f

voyage-multimodal-3.5 (video) support (#384)
* voyage-multimodal-3.5 (video) support
* Updated: voyage-multimodal-3.5 (video) support
* Updated: voyage-multimodal-3.5 (video) support
* Updated: voyage-multimodal-3.5 (video) support
* Adding VoyageAI V4 family models
1 parent a037b32 commit 4433d5f

1 file changed (+132, -9 lines)

integrations/voyage.md

@@ -24,17 +24,47 @@ toc: true

- [Installation](#installation)
- [Usage](#usage)
- [Supported Models](#supported-models)
- [Example](#example)
- [Contextualized Embeddings Example](#contextualized-embeddings-example)
- [Multimodal Embeddings](#multimodal-embeddings)

[Voyage AI](https://voyageai.com/)'s embedding and ranking models are state-of-the-art in retrieval accuracy. The integration supports the following models:
- **`voyage-4-large`**, **`voyage-4`**, and **`voyage-4-lite`** - Latest general-purpose embedding models with a shared embedding space and MoE architecture
- **`voyage-3.5`** and **`voyage-3.5-lite`** - Previous-generation general-purpose embedding models
- **`voyage-code-3`** - Optimized for code retrieval
- **`voyage-context-3`** - Contextualized chunk embedding model that preserves document context for improved retrieval accuracy
- **`voyage-multimodal-3.5`** - Multimodal model supporting text, images, and video (preview)

For the complete list of available models, see the [Embeddings Documentation](https://docs.voyageai.com/embeddings/) and [Contextualized Chunk Embeddings](https://docs.voyageai.com/docs/contextualized-chunk-embeddings).

## Supported Models

### Text Embedding Models

| Model | Description | Dimensions |
|-------|-------------|------------|
| `voyage-4-large` | Best general-purpose and multilingual retrieval quality | 256, 512, 1024 (default), 2048 |
| `voyage-4` | Optimized for general-purpose and multilingual retrieval quality | 256, 512, 1024 (default), 2048 |
| `voyage-4-lite` | Optimized for latency and cost | 256, 512, 1024 (default), 2048 |
| `voyage-3.5` | General-purpose embedding model | 256, 512, 1024 (default), 2048 |
| `voyage-3.5-lite` | Efficient model with lower latency | 256, 512, 1024 (default), 2048 |
| `voyage-code-3` | Optimized for code retrieval | 256, 512, 1024 (default), 2048 |
| `voyage-context-3` | Contextualized chunk embeddings (use with `VoyageContextualizedDocumentEmbedder`) | 256, 512, 1024 (default), 2048 |

### Multimodal Embedding Models

| Model | Description | Dimensions | Modalities |
|-------|-------------|------------|------------|
| `voyage-multimodal-3` | Multimodal embedding model | 1024 | Text, Images |
| `voyage-multimodal-3.5` | Multimodal embedding model (preview) | 256, 512, 1024 (default), 2048 | Text, Images, Video |

### Reranker Models

| Model | Description |
|-------|-------------|
| `rerank-2` | High-accuracy reranker model |
| `rerank-2-lite` | Efficient reranker with lower latency |

## Installation

```bash
pip install voyage-embedders-haystack
```

## Usage

You can use Voyage models with five components:
- [VoyageTextEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_text_embedder.py) - For embedding query text
- [VoyageDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_document_embedder.py) - For embedding documents
- [VoyageContextualizedDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_contextualized_document_embedder.py) - For contextualized chunk embeddings with `voyage-context-3`
- [VoyageMultimodalEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_multimodal_embedder.py) - For multimodal embeddings with `voyage-multimodal-3.5`
- [VoyageRanker](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/rankers/voyage/ranker.py) - For reranking documents

### Standard Embeddings
@@ -58,11 +89,10 @@ To create semantic embeddings for documents, use `VoyageDocumentEmbedder` in you
For improved retrieval quality, use `VoyageContextualizedDocumentEmbedder` with the `voyage-context-3` model. This component preserves context between related document chunks by grouping them together during embedding, reducing the context loss that occurs when chunks are embedded independently.
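
The grouping idea can be illustrated in plain Python. This is a minimal sketch, not the component's actual implementation: it assumes each chunk is a dict with a hypothetical `source_id` key naming its parent document and a `content` key holding the chunk text, and it builds a nested list with one inner list of chunk texts per document. That nested shape is how Voyage's contextualized chunk embedding API receives related chunks, but treat it as an assumption and check the Contextualized Chunk Embeddings docs linked above.

```python
from collections import defaultdict


def group_chunks_by_document(chunks: list[dict]) -> list[list[str]]:
    """Group flat chunks into one list of chunk texts per source document.

    Assumes each chunk is a dict with a hypothetical "source_id" key (the
    parent document) and a "content" key (the chunk text). Documents keep the
    order in which they first appear; chunks keep their order within a document.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for chunk in chunks:
        grouped[chunk["source_id"]].append(chunk["content"])
    return list(grouped.values())


chunks = [
    {"source_id": "paper", "content": "Chapter 1 introduces the method."},
    {"source_id": "manual", "content": "Press the reset button."},
    {"source_id": "paper", "content": "Chapter 2 evaluates it."},
]
print(group_chunks_by_document(chunks))
# [['Chapter 1 introduces the method.', 'Chapter 2 evaluates it.'], ['Press the reset button.']]
```

Keeping first-seen order makes the output deterministic, so the returned embeddings can be mapped back to the original chunks by position.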

**Important:** You must explicitly specify the `model` parameter when initializing any component. Choose from the available models listed in the [Embeddings Documentation](https://docs.voyageai.com/embeddings/). Recommended choices include:
- `voyage-4-large` - Best general-purpose and multilingual retrieval quality
- `voyage-4` - Balanced general-purpose and multilingual retrieval quality
- `voyage-4-lite` - Optimized for latency and cost
- `voyage-context-3` - Contextualized embeddings for improved retrieval (use with `VoyageContextualizedDocumentEmbedder`)

You can set the environment variable `VOYAGE_API_KEY` instead of passing the API key as an argument. To get an API key, see the [Voyage AI website](https://www.voyageai.com/).

@@ -188,6 +218,99 @@ result = embedder.run(documents=docs)

For more examples, see the [contextualized embedder example](https://github.com/awinml/voyage-embedders-haystack/blob/voyage_context-3_model/examples/contextualized_embedder_example.py).

## Multimodal Embeddings

Voyage AI's `voyage-multimodal-3.5` model transforms unstructured data from multiple modalities (text, images, video) into a shared vector space. This enables mixed-media document retrieval and cross-modal semantic search.
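
Because every modality lands in the same vector space, cross-modal search comes down to comparing vectors. The sketch below ranks candidates against a query by cosine similarity in plain Python. The three-dimensional vectors are toy values standing in for real embeddings (which have 256 to 2048 dimensions), and `cosine_similarity` and `rank_by_cosine` are hypothetical helpers, not part of the integration; in practice a document store or vector database would usually perform this ranking.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_cosine(query: list[float], candidates: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors; real ones would come from VoyageMultimodalEmbedder results
text_query = [0.9, 0.1, 0.0]  # stands in for a text query embedding
candidates = {
    "sunset_photo.jpg": [0.8, 0.2, 0.1],  # stands in for an image embedding
    "city_video.mp4": [0.1, 0.9, 0.3],  # stands in for a video embedding
    "beach_caption": [0.7, 0.0, 0.4],  # stands in for a text embedding
}
for name, score in rank_by_cosine(text_query, candidates):
    print(f"{name}: {score:.3f}")
```

With real data, the query vector would be `result['embeddings'][0]` from a text-only run, and the candidates would come from embedding images or videos with the same model.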

### Features

- **Multiple modalities**: Embeds text, images, and video into one shared vector space
- **Variable dimensions**: Output dimensions of 256, 512, 1024 (default), or 2048
- **Interleaved content**: A single input can mix text, images, and video
- **No preprocessing required**: Documents that contain images (for example, slides or PDF pages) can be embedded directly, without a separate text-extraction step

### Limits

- Images: at most 20 MB and 16 million pixels each
- Video: at most 20 MB
- Context: 32,000 tokens
- Token counting: 560 image pixels = 1 token, 1120 video pixels = 1 token
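
The pixel-to-token ratios above make it possible to estimate an input's token cost before sending it. The sketch below uses hypothetical helpers under several assumptions: fractional tokens are rounded up, the 32,000-token limit applies per input, `text_tokens` must come from elsewhere (the sketch does not tokenize text), and video frame sampling is not modeled, so `estimate_video_tokens` only converts a known pixel count (for example, a `video_pixels` value from an earlier response). Treat `result['meta']['total_tokens']` as the authoritative count.

```python
import math

# Limits and ratios listed above for voyage-multimodal-3.5
IMAGE_PIXELS_PER_TOKEN = 560
VIDEO_PIXELS_PER_TOKEN = 1120
MAX_IMAGE_PIXELS = 16_000_000
CONTEXT_TOKENS = 32_000


def estimate_image_tokens(width: int, height: int) -> int:
    """Estimate tokens for one image; rounding up is an assumption."""
    pixels = width * height
    if pixels > MAX_IMAGE_PIXELS:
        raise ValueError(f"{pixels} pixels exceeds the {MAX_IMAGE_PIXELS:,} pixel limit")
    return math.ceil(pixels / IMAGE_PIXELS_PER_TOKEN)


def estimate_video_tokens(video_pixels: int) -> int:
    """Convert a known video pixel count into tokens; frame sampling is not modeled."""
    return math.ceil(video_pixels / VIDEO_PIXELS_PER_TOKEN)


def fits_in_context(text_tokens: int, image_sizes: list[tuple[int, int]]) -> bool:
    """Check whether one input's estimated tokens stay within the context limit."""
    image_tokens = sum(estimate_image_tokens(w, h) for w, h in image_sizes)
    return text_tokens + image_tokens <= CONTEXT_TOKENS


# 1920x1080 = 2,073,600 pixels; / 560 is about 3702.9, rounded up to 3703
print(estimate_image_tokens(1920, 1080))
# Eight such images plus 50 text tokens fit (29,674 tokens); nine do not (33,377)
print(fits_in_context(50, [(1920, 1080)] * 8), fits_in_context(50, [(1920, 1080)] * 9))
```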

### Basic Multimodal Example

Use the `VoyageMultimodalEmbedder` component for multimodal embeddings. `run()` takes a list of inputs, and each input is itself a list of content items (text, images, or videos):

```python
from haystack.dataclasses import ByteStream
from haystack_integrations.components.embedders.voyage_embedders import VoyageMultimodalEmbedder

# Text-only embedding
embedder = VoyageMultimodalEmbedder(model="voyage-multimodal-3.5")
result = embedder.run(inputs=[["A sunset over the ocean"]])
print(f"Embedding dimensions: {len(result['embeddings'][0])}")

# Mixed text and image embedding
image_bytes = ByteStream.from_file_path("image.jpg")
result = embedder.run(inputs=[["Product photo for online store", image_bytes]])
print(f"Tokens used: {result['meta']['total_tokens']}")
```

### Multimodal Example with Custom Dimensions

```python
from haystack.dataclasses import ByteStream
from haystack_integrations.components.embedders.voyage_embedders import VoyageMultimodalEmbedder

# Configure output dimensions (256, 512, 1024, or 2048)
embedder = VoyageMultimodalEmbedder(
    model="voyage-multimodal-3.5",
    output_dimension=2048,  # Larger vectors can improve accuracy but need more storage
    input_type="document",  # Optimize for document retrieval
)

# Embed multiple inputs at once
image1 = ByteStream.from_file_path("doc1.jpg")
image2 = ByteStream.from_file_path("doc2.jpg")

result = embedder.run(inputs=[
    ["Document about machine learning", image1],
    ["Technical diagram", image2],
])

print(f"Number of embeddings: {len(result['embeddings'])}")
print(f"Image pixels processed: {result['meta']['image_pixels']}")
```
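
Choosing `output_dimension` is also a storage trade-off. As a back-of-the-envelope sketch, assuming each value is stored as an uncompressed 32-bit float (4 bytes) with no quantization or index overhead, raw vector storage grows linearly with the dimension. `raw_storage_bytes` is a hypothetical helper that also rejects dimensions outside the four listed for `voyage-multimodal-3.5`.

```python
ALLOWED_DIMENSIONS = (256, 512, 1024, 2048)  # output sizes listed for voyage-multimodal-3.5
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 storage


def raw_storage_bytes(num_vectors: int, output_dimension: int) -> int:
    """Bytes for num_vectors float32 embeddings, excluding any index overhead."""
    if output_dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"output_dimension must be one of {ALLOWED_DIMENSIONS}")
    return num_vectors * output_dimension * BYTES_PER_FLOAT32


# Compare raw storage for one million vectors at each supported dimension
for dim in ALLOWED_DIMENSIONS:
    gib = raw_storage_bytes(1_000_000, dim) / 1024**3
    print(f"{dim:>4} dims: {gib:.2f} GiB")
```

For one million vectors this works out to roughly 0.95 GiB at 256 dimensions and about 7.63 GiB at 2048, an eightfold difference before any index overhead.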

### Video Embedding Example

Video inputs must be loaded with the `Video` class from the `voyageai.video_utils` module:

```python
from voyageai.video_utils import Video
from haystack_integrations.components.embedders.voyage_embedders import VoyageMultimodalEmbedder

embedder = VoyageMultimodalEmbedder(model="voyage-multimodal-3.5")

# Load video using VoyageAI's Video utility
video = Video.from_path("video.mp4", model="voyage-multimodal-3.5")

# Embed video with optional text context
result = embedder.run(inputs=[["Machine learning tutorial", video]])

print(f"Embedding dimensions: {len(result['embeddings'][0])}")
print(f"Video pixels processed: {result['meta']['video_pixels']}")
print(f"Total tokens: {result['meta']['total_tokens']}")
```

### Use Cases

- Mixed-media document retrieval (PDFs, slides with images)
- Image-text similarity search
- Video content retrieval and search
- Cross-modal semantic search

For more information, see the [Multimodal Embeddings Documentation](https://docs.voyageai.com/docs/multimodal-embeddings).

## License
`voyage-embedders-haystack` is distributed under the terms of the [Apache-2.0 license](https://github.com/awinml/voyage-embedders-haystack/blob/main/LICENSE).
