【Warm-up Check-in】FastDeploy Build Check-in #6225

## 🚀 FastDeploy Warm-up Check-in
Start with a source build and begin developing on a high-performance inference framework

**Hello, PaddlePaddle developers!**

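To help more people get started with secondary development on FastDeploy and become familiar with the project structure and build process of a large framework, the PaddlePaddle community is launching the **FastDeploy Warm-up Check-in**.
By completing one full FastDeploy build-and-package cycle yourself, you will have the basic skills needed to take part in FastDeploy development.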

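Complete the check-in and email your screenshots, following the email template in this issue, to **[ext_paddle_oss@baidu.com](mailto:ext_paddle_oss@baidu.com)** to earn community recognition and eligibility for follow-up task recommendations.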

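## 🧩 Goals
After this check-in, you will be familiar with:
- **The FastDeploy source layout**
- **How FastDeploy depends on the Paddle runtime**
- **How custom operators are compiled**
- **How the wheel is built and distributed**
- **How to speed up rebuilds and the develop/debug loop**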

Note: this check-in requires an **A800 GPU**. If you do not have one, you can [apply for AI Studio development resources](https://github.com/PaddlePaddle/community/blob/master/pfcc/call-for-contributions/README.md#飞桨线上开发环境ai-studio).

## 🧰 Environment Setup
### 1. Install PaddlePaddle

Using the GPU build as an example (the CPU build is installed the same way):

```bash
python -m pip install paddlepaddle-gpu==3.3.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
``` 
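Before moving on, it can help to confirm that the package installed and that it is a CUDA build. The following is a minimal sketch, not part of the official instructions: `paddle_status` is a hypothetical helper name, and it relies only on `paddle.__version__` and `paddle.device.is_compiled_with_cuda()`. It returns a placeholder result instead of raising when PaddlePaddle is not installed.

```python
import importlib.util


def paddle_status():
    """Return (installed, version, cuda_build) without raising if paddle is missing."""
    if importlib.util.find_spec("paddle") is None:
        return (False, None, False)
    import paddle  # imported lazily so the helper also runs on machines without paddle

    return (True, paddle.__version__, bool(paddle.device.is_compiled_with_cuda()))


if __name__ == "__main__":
    installed, version, cuda_build = paddle_status()
    print(f"installed={installed} version={version} cuda_build={cuda_build}")
```

For a fuller check that also exercises the device, PaddlePaddle provides `paddle.utils.run_check()`.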

### 2. Clone the FastDeploy source

```bash
git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy
```

### 3. Build check-in steps
> Prefix every key step with `time` to record how long it takes, and save a screenshot.

#### Step 1: Build and package FastDeploy
```bash
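# Arguments
# 1st argument: whether to build a wheel (1 = build the wheel, 0 = compile only)
# 2nd argument: Python interpreter
# 3rd argument: whether to compile the CPU BF16 operators
# 4th argument: GPU architectures (e.g. [80,90])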

time MAX_JOBS=24 bash build.sh 1 python false [80]
```
When the build finishes, the artifacts are in:
`FastDeploy/dist/`
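Because `build.sh` takes four positional arguments, it is easy to mix them up. The sketch below spells them out as named parameters and assembles the same command line used above. `build_command` is a hypothetical helper, and the `true` spelling for the third argument is an assumption (this issue only shows `false`).

```python
def build_command(build_wheel=True, python="python", cpu_bf16=False, archs=(80,), max_jobs=24):
    """Return (env_overrides, argv) for one build.sh invocation."""
    # The 4th argument is a bracketed, comma-separated list such as [80] or [80,90].
    arch_arg = "[" + ",".join(str(a) for a in archs) + "]"
    argv = [
        "bash",
        "build.sh",
        "1" if build_wheel else "0",       # 1 = build the wheel, 0 = compile only
        python,                            # Python interpreter
        "true" if cpu_bf16 else "false",   # CPU BF16 operators ("true" is assumed)
        arch_arg,
    ]
    return {"MAX_JOBS": str(max_jobs)}, argv


if __name__ == "__main__":
    env, argv = build_command()
    print(" ".join(f"{k}={v}" for k, v in env.items()), " ".join(argv))
```

Calling `build_command(build_wheel=False)` yields the compile-only form used for rebuilds.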

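#### Step 2: First build vs. rebuild

The first build takes a long time. Rebuilds are shorter because of the build cache, and for day-to-day development it is the rebuild time that determines how fast you can iterate. Measure the rebuild time after modifying each of the following files:

- The kernel_traits header: `custom_ops/gpu_ops/flash_mask_attn/kernel_traits.h`
- The transfer_output .cc file: `custom_ops/gpu_ops/transfer_output.cc`
- The Python file: `FastDeploy/custom_ops/setup_ops.py`

To rebuild: add a blank line or a space to the file, save and exit, then run `time MAX_JOBS=24 bash build.sh 0 python false [80]`.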
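The edit-then-rebuild loop can also be scripted. The sketch below is only an illustration: `append_blank_line` and `timed_run` are hypothetical helpers, and the demo runs against a temporary file and a no-op command rather than a real FastDeploy checkout. It does not replace the `time` screenshot the check-in asks for.

```python
import os
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def append_blank_line(path):
    """Append one newline so the build sees the file as modified; return the original text."""
    p = Path(path)
    original = p.read_text(encoding="utf-8")
    p.write_text(original + "\n", encoding="utf-8")
    return original


def timed_run(argv, cwd=None, env=None):
    """Run a command and return (return_code, elapsed_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(argv, cwd=cwd, env=env)
    return result.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Self-contained demo on a temporary file and a no-op command.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "demo.cc"
        src.write_text("int x;\n", encoding="utf-8")
        original = append_blank_line(src)
        code, seconds = timed_run([sys.executable, "-c", "pass"], env=dict(os.environ))
        src.write_text(original, encoding="utf-8")  # undo the edit afterwards
        print(f"exit code {code}, {seconds:.2f}s")
```

In a real checkout you would pass one of the three files listed above to `append_blank_line`, then call `timed_run(["bash", "build.sh", "0", "python", "false", "[80]"], env={**os.environ, "MAX_JOBS": "24"})` from the repository root.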

#### Step 3: Install the whl package

```bash
pip install FastDeploy/dist/xxx.whl
```
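The `xxx.whl` name above is a placeholder, since the real file name depends on the version and platform. One way to find the file to install is to pick the most recently modified wheel in the `dist` directory. The sketch below assumes that approach; `newest_wheel` is a hypothetical helper and the wheel names in the demo are made up.

```python
import os
import tempfile
from pathlib import Path


def newest_wheel(dist_dir):
    """Return the most recently modified .whl in dist_dir, or None if there is none."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


if __name__ == "__main__":
    # Demo on a throwaway directory with made-up wheel names (not real FastDeploy artifacts).
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "demo_pkg-0.1-py3-none-any.whl"
        new = Path(tmp) / "demo_pkg-0.2-py3-none-any.whl"
        old.write_bytes(b"")
        new.write_bytes(b"")
        os.utime(old, (1_000, 1_000))  # force distinct modification times
        os.utime(new, (2_000, 2_000))
        print(newest_wheel(tmp).name)
```

The returned path can then be passed to `pip install`.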

#### Step 4: Run the unit test

```bash
python tests/layers/test_ffn.py
```

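## Email format

Subject: [Hackathon-FastDeploy Warm-up Check-in]

Body:

Hello PaddlePaddle team,

【GitHub ID】: XXX

【Check-in items】: first build / rebuild / whl install / unit test run

【Screenshots】:

Example:

Subject: [Hackathon-FastDeploy Warm-up Check-in]

Body:

Hello PaddlePaddle team,

【GitHub ID】: XXX (for example: paddle-hack)

【Check-in items】: first build & rebuild & whl install & unit test run

【Screenshots】: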

| Hardware | <img width="960" height="809" alt="Image" src="https://github.com/user-attachments/assets/e48f479f-d303-45d4-ad2b-cfb1fc3247ae" />|
| ------------------ | ------------------------------------------------------------ |
| Build method | See the build documentation ([building from source](https://github.com/PaddlePaddle/FastDeploy/blob/develop/docs/zh/get_started/installation/nvidia_gpu.md#4-wheel包源码编译)) |
| First-build command and time | Command: `time MAX_JOBS=24 bash build.sh 1 python false [80]`<br>Time (the value below is only an example and does not reflect a real first-build time):<br><img width="1226" height="136" alt="Image" src="https://github.com/user-attachments/assets/0273f92e-70f6-44b5-8edd-156114f16454" />|
| Rebuild time | Time (the values below are only examples and do not reflect real rebuild times). Files modified: `custom_ops/gpu_ops/flash_mask_attn/kernel_traits.h`, `custom_ops/gpu_ops/transfer_output.cc`, `FastDeploy/custom_ops/setup_ops.py`<br><img width="960" height="860" alt="Image" src="https://github.com/user-attachments/assets/089bfb56-14b7-4c37-99ba-6fc1aababfb8" />|
| whl install | When the build finishes, the artifact is at `FastDeploy/dist/xxx.whl`<br><img width="1110" height="316" alt="Image" src="https://github.com/user-attachments/assets/e0f7c8f8-1800-4345-81f0-23f92fbf5404" /><img width="960" height="222" alt="Image" src="https://github.com/user-attachments/assets/681eb9e4-13ba-4d0a-aa46-e0b65155fdd4" />|
| Unit test run | Using `tests/layers/test_ffn.py` as an example: because of constraints in the AI Studio environment, you can move `quant_config = BlockWiseFP8Config(weight_block_size=[128, 128])` outside of `self.fd_config` (this does not affect the model-structure test).<br>`python tests/layers/test_ffn.py`<br><img width="777" height="531" alt="Image" src="https://github.com/user-attachments/assets/046dde5f-5f3f-40c1-97b8-c0b509f0fb9c" /><img width="1121" height="831" alt="Image" src="https://github.com/user-attachments/assets/d350f154-9613-4236-859a-bb1128f4fdf1" />|

## Outstanding submissions

**We hope your write-up of getting the build working, pitfalls included, becomes a tutorial for everyone.**

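## Contributing to the PaddlePaddle framework

If you have completed the check-in and picked up the basics of framework and toolkit development, you can take on the many development tasks in the PaddlePaddle community, contribute to a large open-source project, and earn recognition and rewards from the community. Start here: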

- [ ] https://github.com/orgs/PaddlePaddle/projects/7
