[DOC] Add docs for missing embedding functions in python and typescript #5864
Add documentation pages for eight previously undocumented embedding functions

This PR introduces eight new markdown files under

Key Changes
• Added

Affected Areas
• Documentation site content (

This summary was automatically generated by @propel-code-bot
    embeddings = bedrock_ef(texts)
    ```

    You can pass in an optional `model_name` argument, which lets you choose which Amazon Bedrock embedding model to use. By default, Chroma uses `amazon.titan-embed-text-v1`.
[**Documentation**]
In several of the new documentation files, the example code explicitly sets a parameter to its default value, while the following text describes it as optional. This could be slightly confusing for users, as they might think the parameter is required. To improve clarity, consider rephrasing the explanation to acknowledge that the example shows the default being set explicitly, or remove the parameter from the example to demonstrate that it's optional.
For example, you could change this line to something like:
> The `model_name` argument is optional and defaults to `"amazon.titan-embed-text-v1"`. The example above shows how to set it explicitly, but it can be omitted to use the default.
This pattern also appears in:
- `docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/chroma-cloud-splade.md` (line 31)
- `docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/text2vec.md` (line 27)
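
To illustrate the point about optional parameters, here is a minimal Python sketch. It assumes a hypothetical stand-in class (`SketchEmbeddingFunction` is not part of Chroma) and only shows that passing the default value explicitly and omitting the argument give the same configuration.

```python
from dataclasses import dataclass

# Default model name taken from the docs text under review.
DEFAULT_MODEL = "amazon.titan-embed-text-v1"


@dataclass
class SketchEmbeddingFunction:
    """Hypothetical stand-in for an embedding function; not Chroma's class."""

    model_name: str = DEFAULT_MODEL  # optional: callers may omit it


# Passing the default explicitly, as the docs example does...
explicit = SketchEmbeddingFunction(model_name="amazon.titan-embed-text-v1")
# ...is equivalent to leaving the argument out.
implicit = SketchEmbeddingFunction()

print(explicit == implicit)
```

If the docs example keeps the explicit argument, a sentence saying that it can be omitted avoids the impression that it is required.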
File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/amazon-bedrock.md
Line: 30

    // npm install @chroma-core/chroma-bm25

    import { ChromaBm25EmbeddingFunction } from "@chroma-core/chroma-bm25";

    const embedder = new ChromaBm25EmbeddingFunction({
      k: 1.2,
      b: 0.75,
      avgDocLength: 256.0,
      tokenMaxLength: 40,
    });

    // use directly
    const sparseEmbeddings = await embedder.generate(["document1", "document2"]);

    // pass documents to query for .add and .query
    const collection = await client.createCollection({
      name: "name",
      embeddingFunction: embedder,
    });
[**Documentation**]
The TypeScript code snippet uses a `client` variable without defining it, which can be confusing for users. To make the example self-contained and runnable, it's best to include the client initialization. Additionally, using a more descriptive collection name like `"my_collection"` instead of `"name"` would make the example clearer.
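
For readers of the hunk, it may also help to see what `k`, `b`, and `avgDocLength` control. The sketch below is the standard BM25 term-frequency weight in plain Python. It is an illustration only, not Chroma's implementation: the function name `bm25_tf_weight` is hypothetical, the IDF factor is left out, and mapping `avgDocLength` to `avg_doc_len` is an assumption.

```python
def bm25_tf_weight(
    tf: float,
    doc_len: float,
    avg_doc_len: float = 256.0,
    k: float = 1.2,
    b: float = 0.75,
) -> float:
    """Standard BM25 term-frequency component (IDF factor omitted).

    k caps how much repeated occurrences of a term can add (saturation);
    b sets how strongly documents longer than avg_doc_len are penalized.
    """
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k + 1.0) / (tf + k * length_norm)


# Same term count in an average-length document and in one twice as long.
average = bm25_tf_weight(tf=3, doc_len=256)
longer = bm25_tf_weight(tf=3, doc_len=512)
print(average > longer)
```

With `b=0.0` the document length no longer affects the weight, which is one way to check how the two parameters interact.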
File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/chroma-bm25.md
Line: 60

    {% Tab label="python" %}

    This embedding function relies on the `boto3` python package, which you can install with `pip install boto3`.
[**Documentation**]
Fix capitalization: 'python' should be 'Python'.
File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/amazon-bedrock.md
Line: 14

    {% Tab label="python" %}

    This embedding function relies on the `httpx` python package, which you can install with `pip install httpx`.
[**Documentation**]
Fix capitalization: 'python' should be 'Python'.
File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/chroma-cloud-qwen.md
Line: 14

    {% Tab label="python" %}

    This embedding function relies on the `httpx` python package, which you can install with `pip install httpx`.
[**Documentation**]
Fix capitalization: 'python' should be 'Python'.
File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/chroma-cloud-splade.md
Line: 16

    {% Tab label="python" %}

    This embedding function relies on the `nomic` python package, which you can install with `pip install nomic`.
[**Documentation**]
Fix capitalization: 'python' should be 'Python'.
File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/nomic.md
Line: 14

    {% Tab label="python" %}

    This embedding function relies on several python packages:
[**Documentation**]
Fix capitalization: 'python' should be 'Python'.
File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/open-clip.md
Line: 14

    {% Tab label="python" %}

    This embedding function relies on the `sentence_transformers` python package, which you can install with `pip install sentence_transformers`.
[**Documentation**]
Fix capitalization: 'python' should be 'Python'.
File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/sentence-transformer.md
Line: 14

    {% Tab label="python" %}

    This embedding function relies on the `text2vec` python package, which you can install with `pip install text2vec`.
[**Documentation**]
Fix capitalization: 'python' should be 'Python'.
File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/text2vec.md
Line: 14
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
`pytest` for Python, `yarn test` for JS, `cargo test` for Rust
Migration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?