SmolVLM2-500M-Video-Instruct.axera

HuggingFaceTB SmolVLM2-500M-Video-Instruct DEMO on Axera.

目前支持 Python 语言, C++ 代码在开发中.
预编译模型可以从百度网盘下载.
如需自行导出编译 VIT 模型请参考模型转换.

支持平台

AX650N
AX630C

Git Clone

首先使用如下命令 clone 本项目, 然后进入 python 文件夹:

$ git clone [email protected]:AXERA-TECH/SmolVLM2-500M-Video-Instruct.axera.git
$ cd SmolVLM2-500M-Video-Instruct.axera/python

之后在开发板上下载或安装以下支持库:

从 huggingface 下载 SmolVLM2-500M-Video-Instruct 模型.

$ git clone https://huggingface.co/HuggingFaceTB/SmolVLM2-500M-Video-Instruct

在开发板上安装配置 pyaxengine, 点击跳转下载链接. 注意板端 SDK 最低版本要求:
- AX650 SDK >= 2.18
- AX620E SDK >= 3.12
- 执行 pip3 install axengine-x.x.x-py3-none-any.whl 安装

将下载后的预编译模型解压到当前文件夹[🔔可选], 默认文件夹排布如下:

(lerobot) ➜  python git:(master) ✗ tree -L 2 .
.
├── infer_axmodel.py
├── infer.py
├── SmolVLM2-500M-Video-Instruct
│   ├── added_tokens.json
│   ├── chat_template.json
│   ├── config.json
│   ├── generation_config.json
│   ├── merges.txt
│   ├── model.safetensors
│   ├── onnx
│   ├── preprocessor_config.json
│   ├── processor_config.json
│   ├── README.md
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   ├── tokenizer.json
│   └── vocab.json
├── SmolVLM2-500M-Video-Instruct_axmodel
│   ├── llama_p128_l0_together.axmodel
│   ├── llama_p128_l10_together.axmodel
│   ├── llama_p128_l11_together.axmodel
│   ├── llama_p128_l12_together.axmodel
│   ├── llama_p128_l13_together.axmodel
│   ├── llama_p128_l14_together.axmodel
│   ├── llama_p128_l15_together.axmodel
│   ├── llama_p128_l16_together.axmodel
│   ├── llama_p128_l17_together.axmodel
│   ├── llama_p128_l18_together.axmodel
│   ├── llama_p128_l19_together.axmodel
│   ├── llama_p128_l1_together.axmodel
│   ├── llama_p128_l20_together.axmodel
│   ├── llama_p128_l21_together.axmodel
│   ├── llama_p128_l22_together.axmodel
│   ├── llama_p128_l23_together.axmodel
│   ├── llama_p128_l24_together.axmodel
│   ├── llama_p128_l25_together.axmodel
│   ├── llama_p128_l26_together.axmodel
│   ├── llama_p128_l27_together.axmodel
│   ├── llama_p128_l28_together.axmodel
│   ├── llama_p128_l29_together.axmodel
│   ├── llama_p128_l2_together.axmodel
│   ├── llama_p128_l30_together.axmodel
│   ├── llama_p128_l31_together.axmodel
│   ├── llama_p128_l3_together.axmodel
│   ├── llama_p128_l4_together.axmodel
│   ├── llama_p128_l5_together.axmodel
│   ├── llama_p128_l6_together.axmodel
│   ├── llama_p128_l7_together.axmodel
│   ├── llama_p128_l8_together.axmodel
│   ├── llama_p128_l9_together.axmodel
│   ├── llama_post.axmodel
│   └── model.embed_tokens.weight.npy
├── SmolVLMVisionEmbeddings.pkl
├── utils
│   └── infer_func.py
└── vit-models
    ├── vision_model.axmodel
    └── vision_model.onnx [可选]

上板部署

AX650N 的设备已预装 Ubuntu 22.04
以 root 权限登陆 AX650N 的板卡设备
接入互联网, 确保 AX650N 的设备能正常执行 apt install, pip install 等指令
已验证设备: AX650N DEMO Board、爱芯派Pro(AX650N)

Python API 运行

Requirements

$ mkdir /opt/site-packages
$ cd python
$ pip3 install -r requirements.txt --prefix=/opt/site-packages

添加环境变量

将以下两行添加到 /root/.bashrc(实际添加的路径需要自行检查)后, 重新连接终端或者执行 source ~/.bashrc

$ export PYTHONPATH=$PYTHONPATH:/opt/site-packages/local/lib/python3.10/dist-packages  
$ export PATH=$PATH:/opt/site-packages/local/bin

运行

在 Axera 开发板 上运行以下命令开启图像理解功能:

$ cd SmolVLM2-500M-Video-Instruct.axera/python
$ python3 infer_axmodel.py

输入图像为:

通过命令行参数可以手动指定图像路径, 模型推理结果如下:

$ python3 infer_axmodel.py -i ../assets/girl.png --vit_model vit-models/vision_model.axmodel

Model loaded successfully!
slice_indices: [0, 1, 2, 3, 4, 5, 6, 7, 8]
Slice prefill done: 0
Slice prefill done: 1
Slice prefill done: 2
Slice prefill done: 3
Slice prefill done: 4
Slice prefill done: 5
Slice prefill done: 6
Slice prefill done: 7
Slice prefill done: 8
answer >>  The image depicts a young woman with long, light gray hair, adorned with two pink flowers in her hair. She is standing on a beach, facing the camera with a neutral expression. The woman is wearing a blue off-shoulder dress that is open at the front, revealing^@ a white lace top underneath. She is also wearing a silver choker necklace and a silver bracelet on her left wrist.

The background of the image reveals a clear blue sky with fluffy white clouds, suggesting a sunny day. The ocean is visible in the distance, with gentle waves crashing onto the shore. The overall scene suggests a serene and peaceful beach setting.

The woman's attire and accessories, along with the serene ocean and clear sky, create a calm and picturesque atmosphere. The image does not contain any^@ discernible text or additional objects. The relative positions of the objects suggest that the woman is standing in the foreground, with the ocean and sky in the background. The image does not provide any information that would allow for a specific question to be answered definitively.

图像理解任务·推理耗时统计

该模型一共有 32 层 decode layer, 详细耗时信息如下:

Model	Time
ImageEncoder	1830 ms
Prefill TTFT	2892.151 ms
Decoder	27.51 ms

其中:

Prefill 阶段, 每一层的 llama_layer 最大耗时 90.3 ms.

各个子图耗时:

g1: 3.143 ms
g2: 4.909 ms
g3: 6.610 ms
g4: 8.263 ms
g5: 9.997 ms
g6: 11.819 ms
g7: 13.579 ms
g8: 15.096 ms
g9: 16.814 ms

Decoder 阶段, 每一层的 llama_layer 平均耗时 0.780 ms ms.
llama_post 耗时 2.551 ms.

模型解码速度为: 1000 / 27.51 ms = 36.35 tokens/s.

技术讨论

Github issues
QQ 群: 139953715

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
assets		assets
model_convert		model_convert
python		python
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SmolVLM2-500M-Video-Instruct.axera

支持平台

Git Clone

上板部署

Python API 运行

Requirements

添加环境变量

运行

图像理解任务·推理耗时统计

技术讨论

About

Uh oh!

Releases 1

Packages

Languages

License

AXERA-TECH/SmolVLM2-500M-Video-Instruct.axera

Folders and files

Latest commit

History

Repository files navigation

SmolVLM2-500M-Video-Instruct.axera

支持平台

Git Clone

上板部署

Python API 运行

Requirements

添加环境变量

运行

图像理解任务·推理耗时统计

技术讨论

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages