Commit bf7da4c

Merge pull request #15 from stackhpc/upgrade-0.23.1
Upgrade to latest release tag v0.23.1
2 parents 757d449 + b65a93a commit bf7da4c

1,214 files changed: +72857 additions, −596975 deletions

.github/copilot-instructions.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+Refer to [AGENTS.MD](../AGENTS.md) for all repo instructions.
```

.github/workflows/release.yml

Lines changed: 24 additions & 17 deletions
```diff
@@ -3,11 +3,18 @@ name: release
 on:
   schedule:
     - cron: '0 13 * * *' # This schedule runs every 13:00:00Z(21:00:00+08:00)
+  # https://github.com/orgs/community/discussions/26286?utm_source=chatgpt.com#discussioncomment-3251208
+  # "The create event does not support branch filter and tag filter."
   # The "create tags" trigger is specifically focused on the creation of new tags, while the "push tags" trigger is activated when tags are pushed, including both new tag creations and updates to existing tags.
-  create:
+  push:
     tags:
       - "v*.*.*" # normal release
-      - "nightly" # the only one mutable tag
+
+permissions:
+  contents: write
+  actions: read
+  checks: read
+  statuses: read
 
 # https://docs.github.com/en/actions/using-jobs/using-concurrency
 concurrency:
```
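The hunk above swaps the unfilterable `create` event for `push` with a `v*.*.*` tag filter. GitHub's filter syntax is its own glob dialect, but for simple tag names a bash glob behaves the same way; a minimal sketch with made-up tag names:

```shell
# Approximate the workflow's v*.*.* tag filter with a bash glob
# (GitHub's filter globs are similar but not identical to bash's).
is_release_tag() {
  [[ $1 == v*.*.* ]]
}
is_release_tag v0.23.1 && echo "v0.23.1 -> triggers a release"
is_release_tag nightly || echo "nightly  -> filtered out"
```

Note that `v1.0` would also be filtered out: the pattern requires two literal dots after the `v`.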
```diff
@@ -21,22 +28,22 @@ jobs:
       - name: Ensure workspace ownership
         run: echo "chown -R ${USER} ${GITHUB_WORKSPACE}" && sudo chown -R ${USER} ${GITHUB_WORKSPACE}
 
-      # https://github.com/actions/checkout/blob/v3/README.md
+      # https://github.com/actions/checkout/blob/v6/README.md
       - name: Check out code
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
         with:
           token: ${{ secrets.GITHUB_TOKEN }} # Use the secret as an environment variable
           fetch-depth: 0
           fetch-tags: true
 
       - name: Prepare release body
         run: |
-          if [[ ${GITHUB_EVENT_NAME} == "create" ]]; then
+          if [[ ${GITHUB_EVENT_NAME} != "schedule" ]]; then
             RELEASE_TAG=${GITHUB_REF#refs/tags/}
-            if [[ ${RELEASE_TAG} == "nightly" ]]; then
-              PRERELEASE=true
-            else
+            if [[ ${RELEASE_TAG} == v* ]]; then
               PRERELEASE=false
+            else
+              PRERELEASE=true
             fi
             echo "Workflow triggered by create tag: ${RELEASE_TAG}"
           else
```
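The rewritten `Prepare release body` logic boils down to one predicate: a `v`-prefixed tag is a full release, anything else (such as `nightly`) is a prerelease. A standalone sketch of that branch (the function name is ours, not the workflow's):

```shell
# Mirror of the PRERELEASE decision in the step above, as a function
prerelease_for() {
  local tag=$1
  if [[ $tag == v* ]]; then
    echo false   # v-prefixed tags are full releases
  else
    echo true    # anything else (e.g. nightly) is a prerelease
  fi
}
prerelease_for v0.23.1   # prints false
prerelease_for nightly   # prints true
```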
```diff
@@ -55,7 +62,7 @@ jobs:
           git fetch --tags
           if [[ ${GITHUB_EVENT_NAME} == "schedule" ]]; then
             # Determine if a given tag exists and matches a specific Git commit.
-            # actions/checkout@v4 fetch-tags doesn't work when triggered by schedule
+            # actions/checkout@v6 fetch-tags doesn't work when triggered by schedule
             if [ "$(git rev-parse -q --verify "refs/tags/${RELEASE_TAG}")" = "${GITHUB_SHA}" ]; then
               echo "mutable tag ${RELEASE_TAG} exists and matches ${GITHUB_SHA}"
             else
```
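The `git rev-parse -q --verify` comparison used above can be exercised locally; a sketch with a throwaway repo (the helper name is ours):

```shell
# Does refs/tags/$1 resolve to exactly the commit $2? Same check as the step above.
tag_matches_sha() {
  [ "$(git rev-parse -q --verify "refs/tags/$1")" = "$2" ]
}
```

One caveat worth knowing: for an annotated tag, `git rev-parse refs/tags/X` yields the tag object's SHA rather than the commit's, so this equality is really a lightweight-tag check, which fits a moving `nightly` tag.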
```diff
@@ -75,6 +82,14 @@ jobs:
           # The body field does not support environment variable substitution directly.
           body_path: release_body.md
 
+      - name: Build and push image
+        run: |
+          sudo docker login --username infiniflow --password-stdin <<< ${{ secrets.DOCKERHUB_TOKEN }}
+          sudo docker build --build-arg NEED_MIRROR=1 --build-arg HTTPS_PROXY=${HTTPS_PROXY} --build-arg HTTP_PROXY=${HTTP_PROXY} -t infiniflow/ragflow:${RELEASE_TAG} -f Dockerfile .
+          sudo docker tag infiniflow/ragflow:${RELEASE_TAG} infiniflow/ragflow:latest
+          sudo docker push infiniflow/ragflow:${RELEASE_TAG}
+          sudo docker push infiniflow/ragflow:latest
+
       - name: Build and push ragflow-sdk
         if: startsWith(github.ref, 'refs/tags/v')
         run: |
@@ -84,11 +99,3 @@ jobs:
         if: startsWith(github.ref, 'refs/tags/v')
         run: |
           cd admin/client && uv build && uv publish --token ${{ secrets.PYPI_API_TOKEN }}
-
-      - name: Build and push image
-        run: |
-          sudo docker login --username infiniflow --password-stdin <<< ${{ secrets.DOCKERHUB_TOKEN }}
-          sudo docker build --build-arg NEED_MIRROR=1 -t infiniflow/ragflow:${RELEASE_TAG} -f Dockerfile .
-          sudo docker tag infiniflow/ragflow:${RELEASE_TAG} infiniflow/ragflow:latest
-          sudo docker push infiniflow/ragflow:${RELEASE_TAG}
-          sudo docker push infiniflow/ragflow:latest
```

.github/workflows/tests.yml

Lines changed: 61 additions & 22 deletions
```diff
@@ -1,4 +1,6 @@
 name: tests
+permissions:
+  contents: read
 
 on:
   push:
```
```diff
@@ -12,7 +14,7 @@ on:
   # The only difference between pull_request and pull_request_target is the context in which the workflow runs:
   # — pull_request_target workflows use the workflow files from the default branch, and secrets are available.
   # — pull_request workflows use the workflow files from the pull request branch, and secrets are unavailable.
-  pull_request_target:
+  pull_request:
     types: [ synchronize, ready_for_review ]
     paths-ignore:
       - 'docs/**'
```
```diff
@@ -31,20 +33,17 @@ jobs:
     name: ragflow_tests
     # https://docs.github.com/en/actions/using-jobs/using-conditions-to-control-job-execution
     # https://github.com/orgs/community/discussions/26261
-    if: ${{ github.event_name != 'pull_request_target' || contains(github.event.pull_request.labels.*.name, 'ci') }}
+    if: ${{ github.event_name != 'pull_request' || (github.event.pull_request.draft == false && contains(github.event.pull_request.labels.*.name, 'ci')) }}
     runs-on: [ "self-hosted", "ragflow-test" ]
     steps:
-      # https://github.com/hmarr/debug-action
-      #- uses: hmarr/debug-action@v2
-
       - name: Ensure workspace ownership
         run: |
           echo "Workflow triggered by ${{ github.event_name }}"
           echo "chown -R ${USER} ${GITHUB_WORKSPACE}" && sudo chown -R ${USER} ${GITHUB_WORKSPACE}
 
       # https://github.com/actions/checkout/issues/1781
       - name: Check out code
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
         with:
           ref: ${{ (github.event_name == 'pull_request' || github.event_name == 'pull_request_target') && format('refs/pull/{0}/merge', github.event.pull_request.number) || github.sha }}
           fetch-depth: 0
```
```diff
@@ -53,7 +52,7 @@ jobs:
       - name: Check workflow duplication
         if: ${{ !cancelled() && !failure() }}
         run: |
-          if [[ ${GITHUB_EVENT_NAME} != "pull_request_target" && ${GITHUB_EVENT_NAME} != "schedule" ]]; then
+          if [[ ${GITHUB_EVENT_NAME} != "pull_request" && ${GITHUB_EVENT_NAME} != "schedule" ]]; then
             HEAD=$(git rev-parse HEAD)
             # Find a PR that introduced a given commit
             gh auth login --with-token <<< "${{ secrets.GITHUB_TOKEN }}"
```
```diff
@@ -78,7 +77,7 @@ jobs:
                 fi
               fi
             fi
-          elif [[ ${GITHUB_EVENT_NAME} == "pull_request_target" ]]; then
+          elif [[ ${GITHUB_EVENT_NAME} == "pull_request" ]]; then
             PR_NUMBER=${{ github.event.pull_request.number }}
             PR_SHA_FP=${RUNNER_WORKSPACE_PREFIX}/artifacts/${GITHUB_REPOSITORY}/PR_${PR_NUMBER}
             # Calculate the hash of the current workspace content
```
# Calculate the hash of the current workspace content
@@ -95,13 +94,53 @@ jobs:
9594
version: ">=0.11.x"
9695
args: "check"
9796

97+
- name: Check comments of changed Python files
98+
if: ${{ false }}
99+
run: |
100+
if [[ ${{ github.event_name }} == 'pull_request' || ${{ github.event_name }} == 'pull_request_target' ]]; then
101+
CHANGED_FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha }}...${{ github.event.pull_request.head.sha }} \
102+
| grep -E '\.(py)$' || true)
103+
104+
if [ -n "$CHANGED_FILES" ]; then
105+
echo "Check comments of changed Python files with check_comment_ascii.py"
106+
107+
readarray -t files <<< "$CHANGED_FILES"
108+
HAS_ERROR=0
109+
110+
for file in "${files[@]}"; do
111+
if [ -f "$file" ]; then
112+
if python3 check_comment_ascii.py "$file"; then
113+
echo "✅ $file"
114+
else
115+
echo "❌ $file"
116+
HAS_ERROR=1
117+
fi
118+
fi
119+
done
120+
121+
if [ $HAS_ERROR -ne 0 ]; then
122+
exit 1
123+
fi
124+
else
125+
echo "No Python files changed"
126+
fi
127+
fi
128+
129+
- name: Run unit test
130+
run: |
131+
uv sync --python 3.12 --group test --frozen
132+
source .venv/bin/activate
133+
which pytest || echo "pytest not in PATH"
134+
echo "Start to run unit test"
135+
python3 run_tests.py
136+
98137
- name: Build ragflow:nightly
99138
run: |
100139
RUNNER_WORKSPACE_PREFIX=${RUNNER_WORKSPACE_PREFIX:-${HOME}}
101140
RAGFLOW_IMAGE=infiniflow/ragflow:${GITHUB_RUN_ID}
102141
echo "RAGFLOW_IMAGE=${RAGFLOW_IMAGE}" >> ${GITHUB_ENV}
103142
sudo docker pull ubuntu:22.04
104-
sudo DOCKER_BUILDKIT=1 docker build --build-arg NEED_MIRROR=1 -f Dockerfile -t ${RAGFLOW_IMAGE} .
143+
sudo DOCKER_BUILDKIT=1 docker build --build-arg NEED_MIRROR=1 --build-arg HTTPS_PROXY=${HTTPS_PROXY} --build-arg HTTP_PROXY=${HTTP_PROXY} -f Dockerfile -t ${RAGFLOW_IMAGE} .
105144
if [[ ${GITHUB_EVENT_NAME} == "schedule" ]]; then
106145
export HTTP_API_TEST_LEVEL=p3
107146
else
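The new (currently disabled, `if: ${{ false }}`) step above calls the repo's `check_comment_ascii.py`, which is not reproduced in this diff. As a rough, hypothetical stand-in, a byte-level grep over `#` comment lines could look like this (an approximation only; it will also flag `#` characters inside string literals):

```shell
# Succeed only if no '#' comment on any line contains a byte outside
# printable ASCII (plus tab). Uses the C locale so grep sees raw bytes.
comments_are_ascii() {
  ! LC_ALL=C grep -q $'#.*[^\t -~]' "$1"
}
```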
```diff
@@ -161,34 +200,34 @@ jobs:
           echo "HOST_ADDRESS=http://host.docker.internal:${SVR_HTTP_PORT}" >> ${GITHUB_ENV}
 
           sudo docker compose -f docker/docker-compose.yml -p ${GITHUB_RUN_ID} up -d
-          uv sync --python 3.10 --only-group test --no-default-groups --frozen && uv pip install sdk/python
+          uv sync --python 3.12 --only-group test --no-default-groups --frozen && uv pip install sdk/python --group test
 
       - name: Run sdk tests against Elasticsearch
         run: |
           export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
-          until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
+          until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
             echo "Waiting for service to be available..."
             sleep 5
           done
-          source .venv/bin/activate && pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api
+          source .venv/bin/activate && set -o pipefail; pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api 2>&1 | tee es_sdk_test.log
 
       - name: Run frontend api tests against Elasticsearch
         run: |
           export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
-          until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
+          until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
             echo "Waiting for service to be available..."
             sleep 5
           done
-          source .venv/bin/activate && pytest -s --tb=short sdk/python/test/test_frontend_api/get_email.py sdk/python/test/test_frontend_api/test_dataset.py
+          source .venv/bin/activate && set -o pipefail; pytest -s --tb=short sdk/python/test/test_frontend_api/get_email.py sdk/python/test/test_frontend_api/test_dataset.py 2>&1 | tee es_api_test.log
 
       - name: Run http api tests against Elasticsearch
         run: |
           export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
-          until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
+          until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
             echo "Waiting for service to be available..."
             sleep 5
           done
-          source .venv/bin/activate && pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api
+          source .venv/bin/activate && set -o pipefail; pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api 2>&1 | tee es_http_api_test.log
 
       - name: Stop ragflow:nightly
         if: always() # always run this step even if previous steps failed
```
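The repeated `set -o pipefail; pytest ... | tee file.log` change matters because without `pipefail` a pipeline's exit status is that of its last command, i.e. `tee`'s (almost always 0), so a failing pytest run would be silently swallowed. A self-contained demonstration (the failing command stands in for pytest):

```shell
# Run a command that fails with status 3 through tee, and report what
# exit status the pipeline yields under the current pipefail setting.
demo() {
  local st=0
  ( echo "collected 1 item"; exit 3 ) 2>&1 | tee /dev/null > /dev/null || st=$?
  echo "$st"
}
set +o pipefail; echo "without pipefail: $(demo)"   # prints 0
set -o pipefail; echo "with pipefail: $(demo)"      # prints 3
```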
```diff
@@ -204,29 +243,29 @@ jobs:
       - name: Run sdk tests against Infinity
         run: |
           export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
-          until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
+          until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
             echo "Waiting for service to be available..."
             sleep 5
           done
-          source .venv/bin/activate && DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api
+          source .venv/bin/activate && set -o pipefail; DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_sdk_api 2>&1 | tee infinity_sdk_test.log
 
       - name: Run frontend api tests against Infinity
         run: |
           export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
-          until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
+          until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
             echo "Waiting for service to be available..."
             sleep 5
           done
-          source .venv/bin/activate && DOC_ENGINE=infinity pytest -s --tb=short sdk/python/test/test_frontend_api/get_email.py sdk/python/test/test_frontend_api/test_dataset.py
+          source .venv/bin/activate && set -o pipefail; DOC_ENGINE=infinity pytest -s --tb=short sdk/python/test/test_frontend_api/get_email.py sdk/python/test/test_frontend_api/test_dataset.py 2>&1 | tee infinity_api_test.log
 
       - name: Run http api tests against Infinity
         run: |
           export http_proxy=""; export https_proxy=""; export no_proxy=""; export HTTP_PROXY=""; export HTTPS_PROXY=""; export NO_PROXY=""
-          until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS} > /dev/null; do
+          until sudo docker exec ${RAGFLOW_CONTAINER} curl -s --connect-timeout 5 ${HOST_ADDRESS}/v1/system/ping > /dev/null; do
             echo "Waiting for service to be available..."
             sleep 5
           done
-          source .venv/bin/activate && DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api
+          source .venv/bin/activate && set -o pipefail; DOC_ENGINE=infinity pytest -s --tb=short --level=${HTTP_API_TEST_LEVEL} test/testcases/test_http_api 2>&1 | tee infinity_http_api_test.log
 
       - name: Stop ragflow:nightly
         if: always() # always run this step even if previous steps failed
```
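Every test step above opens with the same `until ... curl ${HOST_ADDRESS}/v1/system/ping` readiness loop. Factored into a helper, with a retry cap added (the helper and the cap are our additions; the workflow's loop retries forever):

```shell
# Poll a command until it succeeds, giving up after N attempts.
# The workflow's version shells into the container and curls /v1/system/ping.
wait_for() {
  local tries=$1; shift
  until "$@"; do
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] || return 1
    sleep 1
  done
}
```

Capping the retries is worth considering in CI: an endless loop against a service that never comes up only fails when the job-level timeout kills it.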

.gitignore

Lines changed: 3 additions & 0 deletions
```diff
@@ -195,3 +195,6 @@ ragflow_cli.egg-info
 
 # Default backup dir
 backup
+
+
+.hypothesis
```

AGENTS.md

Lines changed: 110 additions & 0 deletions
New file. Full contents:

# RAGFlow Project Instructions for GitHub Copilot

This file provides context, build instructions, and coding standards for the RAGFlow project.
It is structured to follow GitHub Copilot's [customization guidelines](https://docs.github.com/en/copilot/concepts/prompting/response-customization).

## 1. Project Overview
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It is a full-stack application with a Python backend and a React/TypeScript frontend.

- **Backend**: Python 3.10+ (Flask/Quart)
- **Frontend**: TypeScript, React, UmiJS
- **Architecture**: Microservices based on Docker.
  - `api/`: Backend API server.
  - `rag/`: Core RAG logic (indexing, retrieval).
  - `deepdoc/`: Document parsing and OCR.
  - `web/`: Frontend application.

## 2. Directory Structure
- `api/`: Backend API server (Flask/Quart).
  - `apps/`: API Blueprints (Knowledge Base, Chat, etc.).
  - `db/`: Database models and services.
- `rag/`: Core RAG logic.
  - `llm/`: LLM, Embedding, and Rerank model abstractions.
- `deepdoc/`: Document parsing and OCR modules.
- `agent/`: Agentic reasoning components.
- `web/`: Frontend application (React + UmiJS).
- `docker/`: Docker deployment configurations.
- `sdk/`: Python SDK.
- `test/`: Backend tests.

## 3. Build Instructions

### Backend (Python)
The project uses **uv** for dependency management.

1. **Setup Environment**:
   ```bash
   uv sync --python 3.12 --all-extras
   uv run download_deps.py
   ```

2. **Run Server**:
   - **Pre-requisite**: Start dependent services (MySQL, ES/Infinity, Redis, MinIO).
     ```bash
     docker compose -f docker/docker-compose-base.yml up -d
     ```
   - **Launch**:
     ```bash
     source .venv/bin/activate
     export PYTHONPATH=$(pwd)
     bash docker/launch_backend_service.sh
     ```

### Frontend (TypeScript/React)
Located in `web/`.

1. **Install Dependencies**:
   ```bash
   cd web
   npm install
   ```

2. **Run Dev Server**:
   ```bash
   npm run dev
   ```
   Runs on port 8000 by default.

### Docker Deployment
To run the full stack using Docker:
```bash
cd docker
docker compose -f docker-compose.yml up -d
```

## 4. Testing Instructions

### Backend Tests
- **Run All Tests**:
  ```bash
  uv run pytest
  ```
- **Run Specific Test**:
  ```bash
  uv run pytest test/test_api.py
  ```

### Frontend Tests
- **Run Tests**:
  ```bash
  cd web
  npm run test
  ```

## 5. Coding Standards & Guidelines
- **Python Formatting**: Use `ruff` for linting and formatting.
  ```bash
  ruff check
  ruff format
  ```
- **Frontend Linting**:
  ```bash
  cd web
  npm run lint
  ```
- **Pre-commit**: Ensure pre-commit hooks are installed.
  ```bash
  pre-commit install
  pre-commit run --all-files
  ```