This repository was archived by the owner on Sep 10, 2025. It is now read-only.
2 files changed: +48 / -64 lines
 - Frontend UI

-![console](./console.png)
+![console](./readme_1.png)

 - Deployment
   - See the [workshop](https://catalog.us-east-1.prod.workshops.aws/workshops/158a2497-7cbe-4ba4-8bee-2307cb01c08a/en-US)
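
 - Knowledge base construction

-  + Build the OpenSearch index
-  **doc_type** can take one of four values: **['Question','Paragraph','Sentence','Abstract']**
-  Note: the `"dimension": 768` parameter must be changed to match the output dimension of the embedding model you actually use.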
-```shell
-PUT chatbot-index
-{
-  "settings": {
-    "index": {
-      "number_of_shards": 1,
-      "number_of_replicas": 0,
-      "knn": "true",
-      "knn.algo_param.ef_search": 32
-    }
-  },
-  "mappings": {
-    "properties": {
-      "publish_date": {
-        "type": "date",
-        "format": "yyyy-MM-dd HH:mm:ss"
-      },
-      "idx": {
-        "type": "integer"
-      },
-      "doc_type": {
-        "type": "keyword"
-      },
-      "doc": {
-        "type": "text",
-        "analyzer": "ik_max_word",
-        "search_analyzer": "ik_smart"
-      },
-      "content": {
-        "type": "text",
-        "analyzer": "ik_max_word",
-        "search_analyzer": "ik_smart"
-      },
-      "doc_title": {
-        "type": "keyword"
-      },
-      "doc_author": {
-        "type": "keyword"
-      },
-      "doc_category": {
-        "type": "keyword"
-      },
-      "embedding": {
-        "type": "knn_vector",
-        "dimension": 768,
-        "method": {
-          "name": "hnsw",
-          "space_type": "cosinesimil",
-          "engine": "nmslib",
-          "parameters": {
-            "ef_construction": 512,
-            "m": 32
-          }
-        }
-      }
-    }
-  }
-}
-```
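+Build the knowledge base by following [README.md](https://github.com/aws-samples/private-llm-qa-bot/blob/main/code/offline_process/aos_schema.md). Knowledge files can only be imported after the knowledge base has been built. Once it is built, you can import knowledge from the frontend page; after a successful import, the corresponding knowledge file appears in the document library.
+
+![console](./readme_2.png)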
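The removed instructions hard-coded `"dimension": 768`, warned that it must match the embedding model's output dimension, and limited **doc_type** to four values. A minimal Python sketch, assuming the index is created from code rather than typed into a console: it rebuilds the request body from the removed mapping with the dimension as a parameter, and checks a document before indexing. `build_index_body` and `check_doc` are hypothetical names, not part of the repository, and the current schema in aos_schema.md may differ from this mapping.

```python
import json

# The four doc_type values listed in the removed README text.
DOC_TYPES = {"Question", "Paragraph", "Sentence", "Abstract"}


def build_index_body(dimension: int) -> dict:
    """Rebuild the removed chatbot-index body with a configurable vector size.

    Hypothetical helper. Assumption: mirrors the mapping removed from the
    README; aos_schema.md is now the authoritative schema. `dimension` must
    equal the output size of the embedding model in use (768 in the example).
    """
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    # Shared settings for the two full-text fields analyzed with the IK plugin.
    ik_text = {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"}
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": 0,
                "knn": "true",
                "knn.algo_param.ef_search": 32,
            }
        },
        "mappings": {
            "properties": {
                "publish_date": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
                "idx": {"type": "integer"},
                "doc_type": {"type": "keyword"},
                "doc": dict(ik_text),
                "content": dict(ik_text),
                "doc_title": {"type": "keyword"},
                "doc_author": {"type": "keyword"},
                "doc_category": {"type": "keyword"},
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
                        "engine": "nmslib",
                        "parameters": {"ef_construction": 512, "m": 32},
                    },
                },
            }
        },
    }


def check_doc(doc: dict, dimension: int) -> None:
    """Hypothetical pre-index check: doc_type and embedding length must fit the index."""
    if doc.get("doc_type") not in DOC_TYPES:
        raise ValueError(f"unsupported doc_type: {doc.get('doc_type')!r}")
    if len(doc.get("embedding", [])) != dimension:
        raise ValueError("embedding length does not match the index dimension")


if __name__ == "__main__":
    # Print the JSON body that would follow `PUT chatbot-index`.
    print(json.dumps(build_index_body(768), indent=2))
```

Choosing the dimension when the body is built, rather than editing a copied JSON literal, keeps the mapping and the embedding model from drifting apart.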
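
+
+
+### Knowledge - standard intermediate format
+During knowledge construction, documents in every original format should be converted into one unified knowledge format, which simplifies later knowledge ingestion and retrieval (recall) optimization.
+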
+```json
+# schema
+{
+  "page_content": "{will include original document string and sliced document string}",
+  "metadata": {
+    "content_type": "paragraph", # paragraph/table/QA
+    "heading_hierarchy": {
+    },
+    "figure_list": [
+    ],
+    "chunk_id": "",
+    "file_path": "",
+    "keywords": [
+    ],
+    "summary": ""
+  }
+}
+
+# FAQ example
+{
+  "page_content": "{Question}=>{Answer}",
+  "metadata": {
+    "content_type": "QA",
+    "heading_hierarchy": {
+    },
+    "figure_list": [
+    ],
+    "chunk_id": "",
+    "file_path": "",
+    "keywords": [
+    ],
+    "summary": ""
+  }
+}
+```
+
+
+
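The intermediate format above maps onto a small record builder. A minimal Python sketch, assuming each record is a plain dict serialized to JSON; `make_chunk`, `make_faq_chunk`, and `split_faq` are hypothetical helpers, not repository code, and the allowed `content_type` values are taken from the schema comment (`paragraph`/`table`/`QA`).

```python
import json
from typing import Optional, Tuple

# content_type values named in the schema comment (assumption: exhaustive).
CONTENT_TYPES = {"paragraph", "table", "QA"}


def make_chunk(page_content: str, content_type: str, chunk_id: str = "",
               file_path: str = "", heading_hierarchy: Optional[dict] = None,
               figure_list: Optional[list] = None, keywords: Optional[list] = None,
               summary: str = "") -> dict:
    """Build one record in the standard intermediate format (hypothetical helper)."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content_type: {content_type!r}")
    return {
        "page_content": page_content,
        "metadata": {
            "content_type": content_type,
            # Fall back to fresh empty containers when a field is omitted.
            "heading_hierarchy": heading_hierarchy or {},
            "figure_list": figure_list or [],
            "chunk_id": chunk_id,
            "file_path": file_path,
            "keywords": keywords or [],
            "summary": summary,
        },
    }


def make_faq_chunk(question: str, answer: str, **kwargs) -> dict:
    """FAQ records store the pair as "{Question}=>{Answer}" with content_type "QA"."""
    return make_chunk(f"{question}=>{answer}", "QA", **kwargs)


def split_faq(record: dict) -> Tuple[str, str]:
    """Recover (question, answer) from a QA record by splitting on the first '=>'."""
    question, _, answer = record["page_content"].partition("=>")
    return question, answer


if __name__ == "__main__":
    rec = make_faq_chunk(
        "How do I import knowledge?",
        "From the frontend page, after the knowledge base is built.",
        file_path="faq.txt",  # hypothetical source file
    )
    print(json.dumps(rec, ensure_ascii=False))
```

Splitting on the first `=>` is an assumption: a question that itself contains `=>` would be split in the wrong place, so a real pipeline may need a separator that cannot occur in the text.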
 ### Extracting FAQs from long documents

 - Applicable scenarios
   + ./common-stock-fs.pdf

 - Usage
-
+
 The output is written to a directory with the same name as the input.

 ```shell