
@lszxb (Contributor) commented Aug 26, 2024

PR types

Performance optimization

PR changes

Others

Description

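Currently, as described here, Paddle appends a pd_op.assign_out_ op for every loop variable at the end of a while loop body, but these copies are unnecessary. During LLM decoding they cause the kv cache to be copied once for every decoded token, which slows down inference. This PR adds a PIR pass that removes the pd_op.assign_out_ ops that immediately precede the cf.yield op at the end of the loop body. I tested it with predictor.py on the llama2 model: the model still produces correct output, and processing long texts is about 10% faster.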

As a simplified example, this PIR pass transforms the following while loop body:

(%5) = "pd_op.while" (cond=%4, inputs=%2) { 
^%arg_0
    (%6) = "pd_op.full" () {dtype:(pd_op.DataType)float32,place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Double)1} : () -> builtin.tensor<1xf32>
    (%7) = "pd_op.scale" (%arg_0, %6) {bias:(Float)1,bias_after_scale:true} : (builtin.tensor<1xi64>, builtin.tensor<1xf32>) -> builtin.tensor<1xi64>
    (%8) = "pd_op.less_than" (%7, %1) {} : (builtin.tensor<1xi64>, builtin.tensor<1xi64>) -> builtin.tensor<1xb>
    (%9) = "pd_op.assign_out_" (%7, %arg_0) {} : (builtin.tensor<1xi64>, builtin.tensor<1xi64>) -> builtin.tensor<1xi64>
    (%10) = "pd_op.assign_out_" (%8, %4) {} : (builtin.tensor<1xb>, builtin.tensor<1xb>) -> builtin.tensor<1xb>
    () = "cf.yield" (%10, %9) {} : (builtin.tensor<1xb>, builtin.tensor<1xi64>) -> 
}

into:

(%5) = "pd_op.while" (cond=%4, inputs=%2) { 
^%arg_0
    (%6) = "pd_op.full" () {dtype:(pd_op.DataType)float32,place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Double)1} : () -> builtin.tensor<1xf32>
    (%7) = "pd_op.scale" (%arg_0, %6) {bias:(Float)1,bias_after_scale:true} : (builtin.tensor<1xi64>, builtin.tensor<1xf32>) -> builtin.tensor<1xi64>
    (%8) = "pd_op.less_than" (%7, %1) {} : (builtin.tensor<1xi64>, builtin.tensor<1xi64>) -> builtin.tensor<1xb>
    () = "cf.yield" (%8, %7) {} : (builtin.tensor<1xb>, builtin.tensor<1xi64>) -> 
}
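The rewrite rule can be illustrated with a minimal Python sketch on a toy IR model. This is not Paddle's PIR pass API: the `Op` class and the `remove_trailing_assign_out` function are hypothetical stand-ins, and the check that each removed pd_op.assign_out_ result is read only by cf.yield is an assumed safety condition, not necessarily the one the PR implements.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical toy stand-in for a PIR op: a name, input values, output values."""
    name: str
    operands: list[str]
    results: list[str] = field(default_factory=list)


def remove_trailing_assign_out(body: list[Op]) -> list[Op]:
    """Drop the pd_op.assign_out_ ops that directly precede the terminating cf.yield.

    Each removed op's result is replaced in the yield by the op's first operand
    (the value being copied), so the loop carries that value forward without a copy.
    """
    if not body or body[-1].name != "cf.yield":
        return list(body)
    yield_op = body[-1]
    # How many times each value is read anywhere in the loop body.
    use_count = Counter(v for op in body for v in op.operands)
    replacement: dict[str, str] = {}
    i = len(body) - 2
    # Walk backwards over the contiguous run of assign_out_ ops before the yield.
    while i >= 0:
        op = body[i]
        if op.name != "pd_op.assign_out_" or len(op.results) != 1:
            break
        out = op.results[0]
        # Assumed safety condition: the copy's result is read only by the yield.
        if use_count[out] != 1 or out not in yield_op.operands:
            break
        replacement[out] = op.operands[0]
        i -= 1
    new_yield = Op("cf.yield", [replacement.get(v, v) for v in yield_op.operands])
    return body[: i + 1] + [new_yield]


# The loop body from the example above (types and attributes omitted).
body = [
    Op("pd_op.full", [], ["%6"]),
    Op("pd_op.scale", ["%arg_0", "%6"], ["%7"]),
    Op("pd_op.less_than", ["%7", "%1"], ["%8"]),
    Op("pd_op.assign_out_", ["%7", "%arg_0"], ["%9"]),
    Op("pd_op.assign_out_", ["%8", "%4"], ["%10"]),
    Op("cf.yield", ["%10", "%9"]),
]

result = remove_trailing_assign_out(body)
for op in result:
    print(op.name, op.operands)
```

Running it leaves pd_op.full, pd_op.scale and pd_op.less_than untouched and ends with `cf.yield ['%8', '%7']`, matching the transformed body. The toy model only rewires value names; it does not reason about in-place writes or buffer aliasing, which a real pass would have to consider.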

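Profiling before the change (the light-blue D2D device-memory copies take up part of the time):
[profiling screenshot]
Profiling after the change:
[profiling screenshot]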

paddle-bot bot commented Aug 26, 2024

Thanks for your contribution!

codecov bot commented Aug 26, 2024

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.78%. Comparing base (81f5ab5) to head (a95f16e).
⚠️ Report is 792 commits behind head on develop.

❌ Your project status has failed because the head coverage (52.78%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9002      +/-   ##
===========================================
- Coverage    52.92%   52.78%   -0.15%     
===========================================
  Files          661      661              
  Lines       107069   106945     -124     
===========================================
- Hits         56670    56452     -218     
- Misses       50399    50493      +94     


@ZHUI ZHUI requested a review from DrownFish19 September 13, 2024 06:48
@ZHUI ZHUI changed the title Remove the redundant pd_op.assign_out_ op at the end of while loop [Performance] Remove the redundant pd_op.assign_out_ op at the end of while loop Sep 13, 2024
CLAassistant commented Sep 18, 2024

CLA assistant check
All committers have signed the CLA.

@github-actions

This Pull Request is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Dec 28, 2024
@paddle-bot paddle-bot bot closed this Dec 30, 2025
paddle-bot bot commented Dec 30, 2025

Automatically closed by Paddle-bot.
