This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Commit c1559f1: update readme

Author: Yuanbo Li
1 parent 0ca1d43

2 files changed: +48 -64 lines changed

README.md

Lines changed: 4 additions & 63 deletions
@@ -5,7 +5,7 @@
 
 - Frontend interface
 
-![console](./console.png)
+![console](./readme_1.png)
 
 - Deployment
 - See the [workshop](https://catalog.us-east-1.prod.workshops.aws/workshops/158a2497-7cbe-4ba4-8bee-2307cb01c08a/en-US)
@@ -62,65 +62,6 @@
 
 - Building the knowledge base
 
-  + Create the OpenSearch index
-  **doc_type** can take one of the following four values: **['Question','Paragraph','Sentence','Abstract']**
-  Note: the "dimension": 768 parameter must be changed to match the output dimension of the embedding model actually in use
-```shell
-PUT chatbot-index
-{
-  "settings" : {
-    "index":{
-      "number_of_shards" : 1,
-      "number_of_replicas" : 0,
-      "knn": "true",
-      "knn.algo_param.ef_search": 32
-    }
-  },
-  "mappings": {
-    "properties": {
-      "publish_date" : {
-        "type": "date",
-        "format": "yyyy-MM-dd HH:mm:ss"
-      },
-      "idx" : {
-        "type": "integer"
-      },
-      "doc_type" : {
-        "type" : "keyword"
-      },
-      "doc": {
-        "type": "text",
-        "analyzer": "ik_max_word",
-        "search_analyzer": "ik_smart"
-      },
-      "content": {
-        "type": "text",
-        "analyzer": "ik_max_word",
-        "search_analyzer": "ik_smart"
-      },
-      "doc_title": {
-        "type": "keyword"
-      },
-      "doc_author": {
-        "type": "keyword"
-      },
-      "doc_category": {
-        "type": "keyword"
-      },
-      "embedding": {
-        "type": "knn_vector",
-        "dimension": 768,
-        "method": {
-          "name": "hnsw",
-          "space_type": "cosinesimil",
-          "engine": "nmslib",
-          "parameters": {
-            "ef_construction": 512,
-            "m": 32
-          }
-        }
-      }
-    }
-  }
-}
-```
+See [README.md](https://github.com/aws-samples/private-llm-qa-bot/blob/main/code/offline_process/aos_schema.md) to build the knowledge base. The knowledge base must be built before knowledge files can be imported. Once it is built, knowledge can be imported from the frontend page. After a successful import, the corresponding knowledge files appear in the document library.
+
+![console](./readme_2.png)
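
The index definition removed from README.md in this commit hard-codes `"dimension": 768`, with a note that the value must match the embedding model's output dimension. As a rough illustration of that note, the sketch below builds the same request body with the dimension as a parameter and checks `doc_type` against the four listed values. The names `build_index_body` and `check_doc_type` and the example dimension 1024 are assumptions made here, not part of the repository. The sketch only constructs the body; sending it to OpenSearch (for example as `PUT chatbot-index`) is not shown.

```python
import json

# Allowed doc_type values, as listed in the removed README text.
DOC_TYPES = ("Question", "Paragraph", "Sentence", "Abstract")


def build_index_body(dimension: int) -> dict:
    """Build the index body from the removed README section (hypothetical helper).

    `dimension` must equal the output size of the embedding model in use.
    """
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    # Both text fields share the same IK analyzer settings.
    text_field = {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
    }
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": 0,
                "knn": "true",
                "knn.algo_param.ef_search": 32,
            }
        },
        "mappings": {
            "properties": {
                "publish_date": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
                "idx": {"type": "integer"},
                "doc_type": {"type": "keyword"},
                "doc": dict(text_field),
                "content": dict(text_field),
                "doc_title": {"type": "keyword"},
                "doc_author": {"type": "keyword"},
                "doc_category": {"type": "keyword"},
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
                        "engine": "nmslib",
                        "parameters": {"ef_construction": 512, "m": 32},
                    },
                },
            }
        },
    }


def check_doc_type(doc_type: str) -> str:
    """Return doc_type unchanged if it is one of the four listed values."""
    if doc_type not in DOC_TYPES:
        raise ValueError(f"doc_type must be one of {DOC_TYPES}, got {doc_type!r}")
    return doc_type


if __name__ == "__main__":
    # 1024 is only an example; use the real output size of your embedding model.
    body = build_index_body(1024)
    print(json.dumps(body["mappings"]["properties"]["embedding"], indent=2))
```

Keeping the dimension as an argument makes a mismatch between the model and the index harder to miss than editing a literal inside a long JSON document.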

doc_preprocess/README.md

Lines changed: 44 additions & 1 deletion
@@ -1,3 +1,46 @@
+
+
+### Knowledge: standard intermediate format
+During knowledge construction, documents in every original format should be converted to one unified knowledge format, which simplifies later knowledge ingestion and retrieval optimization.
+
+```json
+# schema
+{
+  "page_content":"{will include original document string and sliced document string}",
+  "metadata":{
+    "content_type":"paragraph", # paragraph/table/QA
+    "heading_hierarchy":{
+    },
+    "figure_list":[
+    ],
+    "chunk_id":"",
+    "file_path":"",
+    "keywords":[
+    ],
+    "summary":""
+  }
+}
+
+# FAQ example
+{
+  "page_content":"{Question}=>{Answer}",
+  "metadata":{
+    "content_type":"QA",
+    "heading_hierarchy":{
+    },
+    "figure_list":[
+    ],
+    "chunk_id":"",
+    "file_path":"",
+    "keywords":[
+    ],
+    "summary":""
+  }
+}
+```
+
+
+
 ### Extracting FAQs from long documents
 
 - Applicable scenarios
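
The standard intermediate format added in this hunk is given as an annotated schema; its `#` comments mean the block is not strict JSON as written. As a rough illustration, the sketch below builds records of that shape as Python dicts, including the `{Question}=>{Answer}` convention for FAQ entries, and splits an FAQ record back into its two parts. The helper names `make_chunk`, `make_faq_chunk`, and `split_faq` are assumptions made here, not functions from the repository, and splitting on the first `=>` is likewise an assumption about how the separator should be read.

```python
import json

# content_type values named in the schema comment: paragraph/table/QA.
CONTENT_TYPES = ("paragraph", "table", "QA")
# Separator used by the FAQ example: "{Question}=>{Answer}".
QA_SEPARATOR = "=>"


def make_chunk(page_content, content_type="paragraph", chunk_id="", file_path=""):
    """Build one record in the intermediate format (hypothetical helper)."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"content_type must be one of {CONTENT_TYPES}")
    return {
        "page_content": page_content,
        "metadata": {
            "content_type": content_type,
            "heading_hierarchy": {},
            "figure_list": [],
            "chunk_id": chunk_id,
            "file_path": file_path,
            "keywords": [],
            "summary": "",
        },
    }


def make_faq_chunk(question, answer, chunk_id="", file_path=""):
    """Build a QA record whose page_content is '{Question}=>{Answer}'."""
    return make_chunk(
        f"{question}{QA_SEPARATOR}{answer}",
        content_type="QA",
        chunk_id=chunk_id,
        file_path=file_path,
    )


def split_faq(chunk):
    """Recover (question, answer) from a QA record, splitting on the first '=>'."""
    if chunk["metadata"]["content_type"] != "QA":
        raise ValueError("not a QA chunk")
    question, sep, answer = chunk["page_content"].partition(QA_SEPARATOR)
    if not sep:
        raise ValueError("page_content does not contain the QA separator")
    return question, answer


if __name__ == "__main__":
    faq = make_faq_chunk("What is kNN?", "Nearest-neighbour search.", file_path="faq.md")
    # Serializing with json.dumps yields strict JSON, without the '#' annotations.
    print(json.dumps(faq, ensure_ascii=False, indent=2))
    print(split_faq(faq))
```

Because the question and answer share one string, a question that itself contains `=>` would be split in the wrong place; storing them in separate metadata fields would avoid that, but it goes beyond the schema shown here.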
@@ -51,7 +94,7 @@
   + ./common-stock-fs.pdf
 
 - Usage
-
+
 The output is written to a directory with the same name as the input
 
 ```shell
