Code forked from luguoyixiazi/test_nine. This fork only adds a Docker build configuration for convenience. It is intended for personal testing only, and usability is not guaranteed.
There are plenty of Docker installation tutorials online, so they are not repeated here. The recommended way to install Docker is:

```bash
bash <(curl -sSL https://linuxmirrors.cn/docker.sh)
```

Then clone the repository and start the service:

```bash
git clone https://github.com/kafuneri/captcha-tools.git
cd captcha-tools
docker compose up -d  # Build and run the image
```
PS: When building the image on a server in mainland China, configure a proxy or switch to a domestic package mirror as needed.
Using `docker-compose.yaml`:

```yaml
version: '3'
services:
  captcha-tools:
    image: kafuneri/captcha-tools:latest  # Use captcha-tools:arm64 for arm64 devices
    container_name: captcha-tools
    network_mode: host  # Use host network mode
    restart: always
```
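After `docker compose up -d`, the container may need a moment before it accepts requests. The sketch below waits until a TCP connection succeeds; it assumes the service listens on port 9645 on the host (the port used in the Python example in this README) and that host network mode makes it reachable on 127.0.0.1. `wait_for_port` is a hypothetical helper, not part of this repository.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 9645,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: 9645 is assumed to be the captcha-tools service port.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)  # wait a bit before the next attempt
```

If `wait_for_port()` returns `False`, `docker compose logs captcha-tools` is a reasonable first place to look.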
5. Integrate with MihoyoBBSTools
Replace `captcha.py` in your MihoyoBBSTools installation with the `captcha.py` provided in this repository.
This project is for learning and communication purposes only. Do not use it for commercial purposes. You are responsible for any consequences.
Model and V4 dataset: https://github.com/taisuii/ClassificationCaptchaOcr
API: https://github.com/ravizhan/geetest-v3-click-crack
(Optional) a. If training with PaddlePaddle, you also need to install paddlex and the image classification module. Refer to the project https://github.com/PaddlePaddle/PaddleX for installation instructions.
(* Required!) b. Create a `model` folder in the project root directory and place the model file(s) inside, named `resnet18.onnx` or `PP-HGNetV2-B4.onnx`. The default model is `PP-HGNetV2-B4.onnx`. If you use ResNet, set `use_v3_model` to `False` in the code, because the two models have different inputs and outputs (you may need to adapt the code yourself).

```bash
pip install -r requirements.txt
```
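The model-selection rule in step b can be sketched as follows. This is a hypothetical helper, not code from this repository: it assumes the files live in `model/` under the project root and that `use_v3_model=True` selects `PP-HGNetV2-B4.onnx` (the default) while `False` selects `resnet18.onnx`.

```python
from pathlib import Path

# Assumed file names, taken from step b of this README.
MODEL_FILES = {
    True: "PP-HGNetV2-B4.onnx",  # default (Paddle) model
    False: "resnet18.onnx",      # ResNet18 model
}


def resolve_model_path(use_v3_model: bool = True, model_dir: str = "model") -> Path:
    """Return the ONNX model path for the given flag, or raise if it is missing.

    Hypothetical helper: the mapping mirrors the README, not the project's code.
    """
    path = Path(model_dir) / MODEL_FILES[bool(use_v3_model)]
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; create the '{model_dir}' folder and put the model there"
        )
    return path
```

Failing early with a clear message is easier to debug than an ONNX Runtime error about a missing file deep inside the service.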
- Refer to the project referenced above for dataset details. Note that it uses a V4 dataset; there is no demo for V3, so adapt it as needed. Training for V3 on a V4 dataset without modifying the code gives poor accuracy.
- The main difference is image size. The V4 API provides two separate images, a target image and a nine-square grid, whereas V3 combines them into one image, so the target has to be cropped out, and the V3 target image has low clarity. After cropping and removing the black borders, V4 grid cells are 100x86 pixels, while V3 grid cells are 112x112. It is unclear what transformations V4 applies to the grid content relative to V3; either way, the preprocessing has to be adjusted.
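One way to adjust preprocessing is to resize every cropped grid cell to a fixed size before normalization. The sketch below is an assumption, not the project's actual pipeline: it resizes with nearest-neighbour sampling to 112x112 (the V3 cell size mentioned above) and normalizes with the common ImageNet mean/std. Check the input size and normalization your exported model actually expects.

```python
import numpy as np

# ImageNet statistics: an assumption, not taken from this project.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an HxW or HxWxC array by index sampling."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols[None, :]]


def preprocess_cell(cell: np.ndarray, size: int = 112) -> np.ndarray:
    """Turn an HxWx3 uint8 RGB cell into a 1x3xSIZExSIZE float32 tensor."""
    resized = resize_nearest(cell, size, size).astype(np.float32) / 255.0
    normalized = (resized - MEAN) / STD              # per-channel normalization
    chw = normalized.transpose(2, 0, 1)              # HWC -> CHW
    return chw[np.newaxis, ...].astype(np.float32)   # add batch dimension
```

Nearest-neighbour keeps the sketch dependency-free; a library resize with bilinear interpolation would usually be preferable for training, and random crops or color jitter are natural additions when training for V3 on V4 data.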
The model was picked more or less at random from what Paddle offers. The dataset format is shown below. If you train for V3 on a V4 dataset, consider applying more data augmentation.
```text
dataset
├─images     # Path for all images
├─label.txt  # Label file, one entry per line: <index> <space> <class_name>, e.g. 15 Globe
├─train.txt  # Training list, one entry per line: <image_path> <space> <class_index>, e.g. images/001.jpg 0
└─Validation and test sets follow the same format
```
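A minimal sketch for reading `label.txt` and a list file such as `train.txt` in the format above, useful for sanity-checking a dataset before training. The function names are hypothetical. It assumes UTF-8 files, splits label lines on the first space (so class names may contain spaces) and list lines on the last space (so image paths may contain spaces).

```python
from pathlib import Path
from typing import Dict, List, Tuple


def read_labels(path: str) -> Dict[int, str]:
    """Parse label.txt lines of the form '<index> <class_name>'."""
    labels: Dict[int, str] = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, name = line.strip().split(" ", 1)  # first space separates index and name
        labels[int(index)] = name
    return labels


def read_split(path: str) -> List[Tuple[str, int]]:
    """Parse train.txt-style lines of the form '<image_path> <class_index>'."""
    samples: List[Tuple[str, int]] = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        image_path, class_index = line.strip().rsplit(" ", 1)  # last space precedes the index
        samples.append((image_path, int(class_index)))
    return samples


def unknown_classes(samples: List[Tuple[str, int]], labels: Dict[int, str]) -> List[int]:
    """Return class indices used in a split but missing from label.txt."""
    return sorted({idx for _, idx in samples if idx not in labels})
```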
c. To crop V3 images, use `crop_image_v3` in `crop_image.py`; for V4, use `crop_image`. Write your own cropping script as needed.
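If you write your own cropping script, the core step is computing the boxes of the nine cells. The sketch below only computes coordinates and does not reproduce `crop_image_v3`; the grid's top-left offset, the cell size and the gap between cells are hypothetical parameters you need to measure on your own images (112 is the V3 cell size mentioned above). The returned `(left, upper, right, lower)` tuples follow the box order accepted by Pillow's `Image.crop`.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), Pillow's crop order


def grid_boxes(left: int, top: int, cell_w: int = 112, cell_h: int = 112,
               gap: int = 0, rows: int = 3, cols: int = 3) -> List[Box]:
    """Return crop boxes for a rows x cols grid, in row-major order.

    All parameters are assumptions: measure the offset and gap on real images.
    """
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            x = left + c * (cell_w + gap)
            y = top + r * (cell_h + gap)
            boxes.append((x, y, x + cell_w, y + cell_h))
    return boxes
```

With Pillow, `[img.crop(b) for b in grid_boxes(left, top)]` would then yield the nine cell images in reading order.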
- To train ResNet18, run `python train.py`.
- To train PP-HGNetV2-B4, run `python train_paddle.py`.
- Run `python convert.py` to export the trained model to ONNX. Edit the script to select the model you want to convert (usually the one with the lowest loss). Converting Paddle models additionally requires `paddle2onnx`; see https://www.paddlepaddle.org.cn/documentation/docs/guides/advanced/model_to_onnx_cn.html for details.
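Because ResNet18 and PP-HGNetV2-B4 have different inputs and outputs, it can help to print an exported model's signature before wiring it into `main.py`. A minimal sketch, assuming `onnxruntime` is installed; `describe_io` is a hypothetical helper that accepts any object exposing `get_inputs()` and `get_outputs()`, such as an `onnxruntime.InferenceSession`.

```python
from typing import List


def describe_io(session) -> List[str]:
    """Summarize the inputs and outputs of an ONNX Runtime session.

    Each returned line has the form '<kind> <name> <shape> <type>'.
    """
    lines = []
    for kind, nodes in (("input", session.get_inputs()),
                        ("output", session.get_outputs())):
        for node in nodes:
            # onnxruntime NodeArg objects expose .name, .shape and .type
            lines.append(f"{kind} {node.name} {node.shape} {node.type}")
    return lines
```

For example: `import onnxruntime as ort`, then `print("\n".join(describe_io(ort.InferenceSession("model/PP-HGNetV2-B4.onnx", providers=["CPUExecutionProvider"]))))`.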
Run `python main.py`. By default it uses the Paddle ONNX model; to use ResNet18 instead, edit the corresponding commented-out code.
Because of issues with the generated mouse trajectory, a challenge may be recognized correctly but still fail verification, so increasing the number of retries is recommended. The trained Paddle model achieves an accuracy above 99.9%.
Example Python call:

```python
import httpx


def game_captcha(gt: str, challenge: str):
    """Return the 'validate' string on success, or None on failure."""
    try:
        res = httpx.get(
            "http://127.0.0.1:9645/pass_nine",
            params={"gt": gt, "challenge": challenge, "use_v3_model": True},
            timeout=10,
        )
        res.raise_for_status()  # Raise an exception for 4xx/5xx status codes
        datas = res.json().get("data", {})
        if datas.get("result") == "success":
            return datas.get("validate")
    except httpx.RequestError as exc:
        # Network-level failure (connection refused, timeout, ...)
        print(f"An error occurred while requesting {exc.request.url!r}: {exc}")
    except Exception as e:
        # HTTP status errors, invalid JSON, etc.
        print(f"An unexpected error occurred: {e}")
    return None
```
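Since trajectory issues can make a correctly recognized challenge fail, the call above can be wrapped in a simple retry loop. This is a sketch, not part of the project: `solve_with_retries` is a hypothetical helper, and it assumes each attempt fetches a fresh `gt`/`challenge` pair, since whether a failed challenge can be reused is not covered here.

```python
import time
from typing import Callable, Optional


def solve_with_retries(attempt: Callable[[], Optional[str]],
                       retries: int = 3, delay: float = 1.0) -> Optional[str]:
    """Call `attempt` up to `retries` times until it returns a validate string.

    `attempt` should fetch a fresh gt/challenge pair and call the solver
    (e.g. game_captcha); it must return None on failure.
    """
    for i in range(retries):
        result = attempt()
        if result:
            return result
        if i < retries - 1:
            time.sleep(delay)  # brief pause before the next attempt
    return None
```

For example, `solve_with_retries(lambda: game_captcha(*fetch_challenge()), retries=5)`, where `fetch_challenge` is a hypothetical function returning a new `(gt, challenge)` pair from the site you are automating.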
Feel free to check out and support my other projects, meow~