[Bug]: Adding a large folder to a knowledge base makes the app unresponsive and the data cannot be added #1979

Open
3 tasks done
seakymile opened this issue Feb 19, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@seakymile

Issue Checklist

  • I understand that issues are for feedback and problem solving, not for complaining in the comment section, and will provide as much information as possible to help solve the problem.
  • I've looked at pinned issues and searched for existing Open Issues and Closed Issues, no similar issue was found.
  • I've filled in short, clear headings so that developers can quickly identify a rough idea of what to expect when flipping through the list of issues. And not "a suggestion", "stuck", etc.

Platform

Windows

Version

V0.9.27

Bug Description

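OS: Windows 11 Home
App version: 0.9.27 (the app auto-updated itself to this latest version)
Steps:
1. Create a knowledge base.
2. Add a large folder to it. The folder contains several dozen subfolders, with roughly 10 GB of files in total.
3. Once the folder is added, the app shows "Not responding" and nothing further happens.
4. According to the log, it starts out indexing and generates some vector data, then stops making progress.
5. After force-closing the app and reopening it, the folder can no longer be indexed.
This feels like a fairly serious problem. Pulling documents out and uploading them one at a time is far too tedious, so I hope it can be fixed soon.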

Steps To Reproduce

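1. Create a knowledge base.
2. Add a large folder to it. The folder contains several dozen subfolders, with roughly 10 GB of files in total.
3. Once the folder is added, the app shows "Not responding" and nothing further happens.
4. According to the log, it starts out indexing and generates some vector data, then stops making progress.
5. After force-closing the app and reopening it, the folder can no longer be indexed.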

Expected Behavior

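Large folders, including ones with many subfolders, should be indexable. In addition, if the app closes unexpectedly, a small refresh button next to the folder that resumes indexing where it stopped would be very welcome.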
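The resume-after-crash behavior requested here can be sketched as a folder walk that records each finished file in an append-only progress log and skips those files on the next run. This is only an illustration of the idea, not how Cherry Studio indexes folders: `index_folder`, the progress-log file, and the `embed_file` callback are all hypothetical names, and the real embedding step is left to the caller.

```python
import os
from pathlib import Path
from typing import Callable, Iterator, Set


def iter_files(root: Path) -> Iterator[Path]:
    """Yield every file under root in a stable, sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes the traversal deterministic
        for name in sorted(filenames):
            yield Path(dirpath) / name


def load_done(progress_log: Path) -> Set[str]:
    """Read the relative paths that a previous run already finished."""
    if not progress_log.exists():
        return set()
    return set(progress_log.read_text(encoding="utf-8").splitlines())


def index_folder(root: Path, progress_log: Path,
                 embed_file: Callable[[Path], None]) -> int:
    """Index files not yet recorded in progress_log; return how many were processed.

    embed_file is a hypothetical callback standing in for the real
    chunking and embedding step.
    """
    done = load_done(progress_log)
    processed = 0
    with progress_log.open("a", encoding="utf-8") as log:
        for file in iter_files(root):
            key = file.relative_to(root).as_posix()
            if key in done:
                continue  # finished in an earlier run, so skip it
            embed_file(file)
            # Record the file only after it succeeded, and flush right away
            # so a crash loses at most the file that was in progress.
            log.write(key + "\n")
            log.flush()
            processed += 1
    return processed
```

Calling `index_folder` again with the same progress log after a crash picks up at the first unrecorded file, which is roughly what a "refresh" button next to the folder would need to do. The sketch assumes file names contain no newline characters and does not detect files that changed after they were indexed.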

Relevant Log Output

Additional Context

No response

@seakymile seakymile added the bug Something isn't working label Feb 19, 2025
@WuShichao

My library is 24 GB. I split it into many subfolders and imported them separately.
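The workaround of importing subfolders separately can be planned with a short script that groups the top-level subfolders into batches under a size cap. This is a minimal sketch under assumptions: `plan_batches` and `max_bytes` are hypothetical names, the cap is whatever size the app handles comfortably on a given machine, and files sitting directly in the root folder are not covered.

```python
import os
from pathlib import Path
from typing import List


def folder_size(folder: Path) -> int:
    """Return the total size in bytes of all files under folder."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            try:
                total += (Path(dirpath) / name).stat().st_size
            except OSError:
                pass  # unreadable file or broken link: leave it out of the total
    return total


def plan_batches(root: Path, max_bytes: int) -> List[List[Path]]:
    """Group the immediate subfolders of root into batches of at most max_bytes.

    A single subfolder larger than max_bytes still gets a batch of its own.
    """
    batches: List[List[Path]] = []
    current: List[Path] = []
    current_size = 0
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        size = folder_size(sub)
        # Start a new batch when adding this subfolder would exceed the cap.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(sub)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each returned batch is a list of subfolders to add in one go, waiting for indexing to finish before starting the next batch.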

@WuShichao

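I also raised resumable indexing (picking up where an interrupted import left off) with the developer @亢奋猫 a while ago, and he has it on record.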

@kangfenmao
Collaborator

Cherry is not yet able to handle very large knowledge bases.

@a11s

a11s commented Feb 20, 2025

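I ran into this problem too, and it crashed CS, so a whole night of work was wasted.

I've been wondering whether adding everything in one go is the right way to use it. After all, the number of reference documents returned is limited. Manually creating several knowledge bases, and assigning each one a specific "topic" and embedding model, seems likely to narrow the search scope and improve answer quality (especially across Chinese, English, images and so on, since each model has its own strengths).
Of course, dumping everything in at once is a lot less effort.
I'm still exploring this and am currently running experiments along these lines.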

4 participants