-
-
Notifications
You must be signed in to change notification settings - Fork 407
Embeddings enhancements #651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Embeddings enhancements #651
Conversation
o-mikhailovskii
commented
Sep 18, 2024
- Added support for Google's embedding models.
- Cohere's embeddings are currently broken since they require the inputType parameter (see the screenshot below). Now, there is a dropdown menu for making the selection.

src/constants.ts
Outdated
@@ -146,6 +155,13 @@ export const BUILTIN_EMBEDDING_MODELS: CustomModel[] = [ | |||
}, | |||
]; | |||
|
|||
export const COHEREAI_EMBEDDING_INPUT_TYPES = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a user setting or set internally? I'd like to avoid cluttering the settings and overwhelming users with options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right on one side. On the other side (at least in the current version of API), there are these four options, and that's it. Why would you need to make custom options?
However, when I dug into more details, a "best practice" usage of Cohere's API would be as follows:
- Embed documents with the
search_document
input type, - Search over the document with the
search_query
input type.
This suggests that dynamically changing the input type throughout Langchain usage would be ideal. However, it seems this would require substantial changes in plugin's codebase.
What do you think would be the best way to move forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. The use of langchain is for switching between providers easily. If it doesn't give us the convenience, it'll be counterproductive to adapt its code for individual providers. Since there's a best practice, does langchain cohere client already have something out of the box for us to use? If not, I'd rather not adapt to it here in Copilot.
That said, I'm considering providing Copilot exclusive embedding models next. The user won't need an API key, just call that embedding provider through my gateway. For that, I'm considering Cohere vs. Voyage embeddings. Right now I'm inclined to Voyage for its better MTEB ranking and goodies like rerank API with explanation. If Cohere actually performs better in practice, I'll add it there as the Copilot's default embedding model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. The issue with the inputType
parameter appears to be addressed in @langchain/cohere
versions later than 0.2.0 (rendering my corresponding commit obsolete). Updating this package forces a cascade of dependency updates, ultimately leading to the need for a more recent langchain
package, such as version 0.3.0. Consequently, this results in changes to import
statements.
…3 models" This reverts commit da7b31a.
Thanks for the update! @o-mikhailovskii |