fix(knowledgeBase): handle dimension parameter when sending embedding request #8086

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

EurFelux
Contributor

@EurFelux EurFelux commented Jul 11, 2025

What this PR does

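Adds an isAutoDimensions flag to KnowledgeBaseParams and KnowledgeBase that marks whether the embedding dimension was set automatically (that is, the dimension is the model's default). The flag is used to decide whether the dimensions parameter needs to be included in the embedding request.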

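When isAutoDimensions is false, the model is a qwen3-embedding model, and the stored dimension equals the model's default, the dimension is set to undefined so that the parameter is not sent. This handling exists for backward compatibility.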
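The rule described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the interface, the function name `resolveRequestDimensions`, and the `DEFAULT_DIMS_SKETCH` table are hypothetical stand-ins, and the single default value in the table is only an example entry.

```typescript
// Hypothetical, trimmed-down shape; the real KnowledgeBaseParams has more fields.
interface KnowledgeBaseParamsSketch {
  model: string
  dimensions?: number
  isAutoDimensions?: boolean
}

// Illustrative stand-in for the PR's EMBEDDING_MODEL_DEFAULT_DIMS table (contents assumed).
const DEFAULT_DIMS_SKETCH: Record<string, number> = {
  'qwen3-embedding-0.6b': 1024
}

// Returns the value to put in the embedding request, or undefined to omit the parameter.
function resolveRequestDimensions(params: KnowledgeBaseParamsSketch): number | undefined {
  // Dimension was detected automatically: send nothing and let the provider use its default.
  if (params.isAutoDimensions) {
    return undefined
  }
  const name = params.model.toLowerCase()
  // Backward compatibility: older knowledge bases stored the default value without a flag.
  if (name.includes('qwen3-embedding') && params.dimensions === DEFAULT_DIMS_SKETCH[name]) {
    return undefined
  }
  // Otherwise the user chose this value explicitly, so it is passed through.
  return params.dimensions
}
```

With this shape, a knowledge base created before the flag existed still avoids sending the default dimension to a qwen3-embedding model, while an explicitly chosen non-default value is sent unchanged.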

Added a hint message to the add-knowledge-base modal. The hint appears only when the embedding model belongs to the qwen3-embedding series.

image

Unified the color of info tooltips.

before after
image image

Fixes #8066
Fixes #8283
Fixes #8301

Why we need it and why it was done in this way

The following tradeoffs were made:

The following alternatives were considered:

Links to places where the discussion took place:

Breaking changes

If this PR introduces breaking changes, please describe the changes and the impact on users.

Special notes for your reviewer

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note


@EurFelux EurFelux changed the title fix(embeddings): handle dimension parameter for qwen3-embedding series models fix(knowledgeBase): handle dimension parameter for qwen3-embedding series models Jul 11, 2025
Comment on lines 52 to 58

let newDimensions: number | undefined = dimensions
const baseModelName = getLowerBaseModelName(model)
if (dimensions === EMBEDDING_MODEL_DEFAULT_DIMS[baseModelName]) {
  newDimensions = undefined
}

Collaborator

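Aren't there other embedding models besides qwen3 that don't support dimensions either?
Could we simply not send this value when the embedding dimension field is left empty?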

Contributor Author

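I considered that. In an earlier PR, the add-knowledge-base logic was changed back to always storing the embedding dimension, including automatically detected values. The main question is whether dimensions should be stored in KnowledgeBaseParams. My design here assumes that dimensions is stored. The catch is that even if we changed it back, we would still have to support knowledge bases created by earlier versions that always stored the dimensions field, so this handling cannot be dropped either way.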

Contributor Author

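Some knowledge bases already store dimensions even though the user never set an embedding dimension, so this handling is required if we want backward compatibility. Since it has to be done regardless, there is little point in changing the add-knowledge-base logic to stop storing dimensions when it is set automatically. We might as well store it, unless we give up on backward compatibility.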

Collaborator

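What if automatic setting stays as it is? After all, it is also a "setting". Then, once automatic setting is turned off, dimensions is not sent if the user enters nothing. Would that be better?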

Contributor Author

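That makes the interaction a bit cumbersome, and it is not very intuitive. Let's just add a flag that indicates whether the dimension was set automatically. If it was, the parameter is not sent.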

Collaborator

@alephpiece alephpiece Jul 16, 2025

By "implementing getDimensions", do you mean hardcoding the default embedding dimension by model type inside VoyageEmbeddings?

override async getDimensions(): Promise<number> {
  return this.configuration?.outputDimension ?? (this.configuration?.modelName === 'voyage-code-2' ? 1536 : 1024)
}

Collaborator

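The original logic seems to have left this to the user?

  • If outputDimensions is specified, pass that value
  • If the parameter is not supported, report the error to the user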

Collaborator

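My feeling is that there is currently no way to control from the UI whether the parameter is sent. There are probably many more models that don't need dimensions? We could either know for each model whether it accepts this parameter, as in #7893, or leave it entirely to the user: if a value is entered, it is sent; if not, it isn't.
As it stands, whenever someone wants to omit the parameter for some model but has no way to control it, they will open an issue asking you to change it 😂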
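The "entered means send, empty means don't send" option suggested above could look roughly like this. It is only a sketch: `parseDimensionsInput` and `buildEmbeddingRequest` are hypothetical names, and the request body assumes an OpenAI-compatible embeddings payload with `model`, `input`, and an optional `dimensions` field.

```typescript
// Assumed OpenAI-compatible request body; not taken from the PR.
interface EmbeddingRequestSketch {
  model: string
  input: string[]
  dimensions?: number
}

// Turns the raw text of the dimension field into a value, or undefined when left empty.
function parseDimensionsInput(raw: string): number | undefined {
  const trimmed = raw.trim()
  if (trimmed === '') {
    return undefined
  }
  const value = Number(trimmed)
  if (!Number.isInteger(value) || value <= 0) {
    throw new RangeError(`Invalid embedding dimension: ${raw}`)
  }
  return value
}

// Adds the dimensions key only when the user actually entered a value.
function buildEmbeddingRequest(model: string, input: string[], dimensions?: number): EmbeddingRequestSketch {
  const body: EmbeddingRequestSketch = { model, input }
  if (dimensions !== undefined) {
    body.dimensions = dimensions
  }
  return body
}
```

The key is omitted entirely rather than set to undefined or null, since some servers reject an unsupported key regardless of its value.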

Contributor Author

@EurFelux EurFelux Jul 16, 2025

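Previously, getDimensions had to read outputDimension from configuration and threw an error otherwise, so the class could not be used at all unless dimensions was set. But once it was set, the parameter was always sent, which made models that don't support the outputDimension parameter fail as well. The result was an error whether or not dimensions was provided. That is why it was changed to the current approach.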
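The difference between the two behaviors can be illustrated with a small sketch. The "before" function is reconstructed from the description in this comment rather than from the actual earlier code, and `VoyageConfigSketch` is a hypothetical simplification of the real configuration type; the "after" function mirrors the override quoted earlier in the thread.

```typescript
// Hypothetical, simplified configuration shape.
interface VoyageConfigSketch {
  modelName?: string
  outputDimension?: number
}

// Earlier behavior as described: unusable unless outputDimension is configured.
function getDimensionsBefore(config?: VoyageConfigSketch): number {
  if (config?.outputDimension === undefined) {
    throw new Error('outputDimension must be configured')
  }
  return config.outputDimension
}

// Current behavior: fall back to a per-model default when nothing is configured.
function getDimensionsAfter(config?: VoyageConfigSketch): number {
  return config?.outputDimension ?? (config?.modelName === 'voyage-code-2' ? 1536 : 1024)
}
```

With the fallback, the dimension can be known locally without forcing the outputDimension parameter into every request.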

Contributor Author

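It is tricky. The design in #7893 cannot fully solve the problem either. With something like vllm, a model that should support the dimensions setting still returns an error. So I would like to let the user decide whether dimensions is sent, and keep "whether it is sent" separate from "whether it is set".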

EurFelux added 2 commits July 14, 2025 16:56
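Add a temporary warning, shown when a qwen3 embedding model is selected, that dimensions is not supported. The warning will be removed once vllm supports qwen3.
Add an autoDims field to knowledge bases to control whether the vector dimension is set automatically
Modify the EmbeddingsFactory logic to decide whether to use the specified dimension based on autoDims
Add version migration logic that defaults autoDims to false for existing knowledge bases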
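As a rough illustration of the migration step named in the last commit message, existing records could be given an explicit false default like this. The record shape and the function name `migrateKnowledgeBases` are hypothetical; the project's actual migration code is not shown in this thread.

```typescript
// Hypothetical stored record; only the fields relevant here are included.
interface StoredKnowledgeBaseSketch {
  id: string
  dimensions?: number
  autoDims?: boolean
}

// Gives every existing knowledge base an explicit autoDims value, defaulting to false.
function migrateKnowledgeBases(
  bases: StoredKnowledgeBaseSketch[]
): Array<StoredKnowledgeBaseSketch & { autoDims: boolean }> {
  return bases.map((base) => ({
    ...base,
    // Keep a value that is already present; otherwise treat the dimension as user-set.
    autoDims: base.autoDims ?? false
  }))
}
```

Defaulting to false treats previously stored dimensions as explicit values, which is why the extra default-dimension check is still needed for older knowledge bases.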
@EurFelux EurFelux changed the title fix(knowledgeBase): handle dimension parameter for qwen3-embedding series models fix(knowledgeBase): handle dimension parameter when sending embedding request Jul 19, 2025
EurFelux added 3 commits July 20, 2025 02:13
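Refactor knowledge-base code, renaming the autoDims field to isAutoDimensions throughout
Adjust the related handling so the value stays consistent when passed between services
Remove migration code that is no longer needed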
2 participants