Embeddings enhancements #651

o-mikhailovskii · 2024-09-18T11:44:14Z

Added support for Google's embedding models.
Cohere's embeddings are currently broken because the V3 models require the inputType parameter (see the error message below). This PR adds a dropdown menu for selecting it.

input_type must be provided with the model

logancyang · 2024-09-19T21:10:53Z

src/constants.ts

@@ -146,6 +155,13 @@ export const BUILTIN_EMBEDDING_MODELS: CustomModel[] = [
  },
 ];

+export const COHEREAI_EMBEDDING_INPUT_TYPES = [


Should this be a user setting or set internally? I'd like to avoid cluttering the settings and overwhelming users with options.

o-mikhailovskii · 2024-09-28

You're right on the one hand. On the other hand, at least in the current version of the API, there are only these four options. Why would you need to make custom options?

However, when I dug into more details, a "best practice" usage of Cohere's API would be as follows:

Embed documents with the search_document input type,

Search over the document with the search_query input type.

This suggests that switching the input type dynamically, depending on whether LangChain is embedding documents or a query, would be ideal. However, it seems this would require substantial changes to the plugin's codebase.
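To make the document/query split concrete, here is a minimal sketch of choosing the input type per operation. Everything in it is hypothetical and for illustration only: `EmbeddingOperation`, `inputTypeFor`, `buildEmbedRequest`, and the `EmbedRequest` shape are not part of the plugin or of LangChain, and the request fields (`model`, `texts`, `input_type`) and the model name are assumptions based on Cohere's public embed API rather than anything in this PR.

```typescript
// Hypothetical sketch: none of these names exist in the plugin or in LangChain.

// The four input types discussed in this thread (assumed to match Cohere's embed API).
type CohereInputType =
  | "search_document"
  | "search_query"
  | "classification"
  | "clustering";

// The two retrieval-related operations: indexing notes vs. searching them.
type EmbeddingOperation = "index" | "query";

// Assumed request shape; the real client may use different field names.
interface EmbedRequest {
  model: string;
  texts: string[];
  input_type: CohereInputType;
}

// Pick the input type from the operation instead of from a user setting.
function inputTypeFor(operation: EmbeddingOperation): CohereInputType {
  return operation === "index" ? "search_document" : "search_query";
}

// Build the request payload for a given operation.
function buildEmbedRequest(
  model: string,
  texts: string[],
  operation: EmbeddingOperation
): EmbedRequest {
  return { model, texts, input_type: inputTypeFor(operation) };
}

// Usage: documents are embedded at index time, the query at search time.
const indexRequest = buildEmbedRequest(
  "embed-english-v3.0", // assumed model name
  ["A note about embeddings."],
  "index"
); // indexRequest.input_type is "search_document"
const queryRequest = buildEmbedRequest(
  "embed-english-v3.0",
  ["what are embeddings?"],
  "query"
); // queryRequest.input_type is "search_query"
```

With something along these lines, the input type would never need to appear in the settings UI, at the cost of the provider-specific branching mentioned above.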

What do you think would be the best way to move forward?

logancyang · 2024-09-28

I see. The point of using LangChain is to switch between providers easily. If it doesn't give us that convenience, adapting its code for individual providers would be counterproductive. Since there's a best practice, does the LangChain Cohere client already have something out of the box for us to use? If not, I'd rather not adapt to it here in Copilot.

That said, I'm considering providing Copilot-exclusive embedding models next. The user won't need an API key; they'll just call the embedding provider through my gateway. For that, I'm weighing Cohere vs. Voyage embeddings. Right now I'm leaning toward Voyage for its better MTEB ranking and goodies like a rerank API with explanations. If Cohere actually performs better in practice, I'll add it there as Copilot's default embedding model.

o-mikhailovskii · 2024-09-29

OK. The issue with the inputType parameter appears to be addressed in @langchain/cohere versions later than 0.2.0, which makes my corresponding commit obsolete. Updating this package forces a cascade of dependency updates, ultimately requiring a more recent langchain package, such as version 0.3.0. That, in turn, means changes to import statements.


logancyang · 2024-09-30T23:00:46Z

Thanks for the update! @o-mikhailovskii

o-mikhailovskii added 2 commits September 18, 2024 13:15

Add Google embedding support

d034dfd

Add the required InputType parameter for Cohere's embedding V3 models

da7b31a

logancyang reviewed Sep 19, 2024

o-mikhailovskii added 2 commits September 29, 2024 12:52

Revert "Add the required InputType parameter for Cohere's embedding V3 models"

069b584

This reverts commit da7b31a.

Update dependencies

0130ef0

logancyang merged commit 7bb74f9 into logancyang:master Sep 30, 2024
2 checks passed
