[Performance] Remove the redundant pd_op.assign_out_ op at the end of while loop #9002

lszxb · 2024-08-26T05:49:35Z

PR types

Performance optimization

PR changes

Others

Description

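Currently, according to the description linked here, Paddle appends a pd_op.assign_out_ op for every loop variable at the end of a while loop, but this is unnecessary. During LLM decoding, it causes the kv cache to be copied needlessly for every decoded token, which slows down inference. This PR adds a PIR pass that removes the pd_op.assign_out_ ops immediately preceding the cf.yield op at the end of the loop. It was tested with predictor.py on the llama2 model: the model produces correct output, and there is a performance improvement of about 10% when processing long texts.

As a simplified example, this PIR pass transforms the following while loop body: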
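To get a rough feel for why a per-token kv cache copy matters, here is a back-of-the-envelope sketch. The model shape (32 layers, hidden size 4096, fp16, 4096 cached tokens, batch size 1) is an assumption chosen to resemble a 7B-class llama2 configuration; it is not taken from this PR, and the actual cache layout in the tested predictor may differ.

```python
def kv_cache_bytes(num_layers, hidden_size, seq_len, batch_size=1, bytes_per_elem=2):
    """Size of a dense kv cache: K and V each hold one hidden_size vector
    per token, per layer, per batch element."""
    return 2 * num_layers * hidden_size * seq_len * batch_size * bytes_per_elem


# Assumed shape (hypothetical, not from the PR): 32 layers, hidden size 4096,
# fp16 elements, 4096 cached tokens, batch size 1.
copied_per_token = kv_cache_bytes(num_layers=32, hidden_size=4096, seq_len=4096)
print(copied_per_token / 2**30)  # GiB copied per decoded token if the whole cache is copied
```

Under these assumptions the redundant copy moves the whole cache once per decoding step, so its cost grows with context length, which is consistent with the gain showing up on long texts.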

(%5) = "pd_op.while" (cond=%4, inputs=%2) { 
^%arg_0
    (%6) = "pd_op.full" () {dtype:(pd_op.DataType)float32,place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Double)1} : () -> builtin.tensor<1xf32>
    (%7) = "pd_op.scale" (%arg_0, %6) {bias:(Float)1,bias_after_scale:true} : (builtin.tensor<1xi64>, builtin.tensor<1xf32>) -> builtin.tensor<1xi64>
    (%8) = "pd_op.less_than" (%7, %1) {} : (builtin.tensor<1xi64>, builtin.tensor<1xi64>) -> builtin.tensor<1xb>
    (%9) = "pd_op.assign_out_" (%7, %arg_0) {} : (builtin.tensor<1xi64>, builtin.tensor<1xi64>) -> builtin.tensor<1xi64>
    (%10) = "pd_op.assign_out_" (%8, %4) {} : (builtin.tensor<1xb>, builtin.tensor<1xb>) -> builtin.tensor<1xb>
    () = "cf.yield" (%10, %9) {} : (builtin.tensor<1xb>, builtin.tensor<1xi64>) -> 
}

into

(%5) = "pd_op.while" (cond=%4, inputs=%2) { 
^%arg_0
    (%6) = "pd_op.full" () {dtype:(pd_op.DataType)float32,place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Double)1} : () -> builtin.tensor<1xf32>
    (%7) = "pd_op.scale" (%arg_0, %6) {bias:(Float)1,bias_after_scale:true} : (builtin.tensor<1xi64>, builtin.tensor<1xf32>) -> builtin.tensor<1xi64>
    (%8) = "pd_op.less_than" (%7, %1) {} : (builtin.tensor<1xi64>, builtin.tensor<1xi64>) -> builtin.tensor<1xb>
    () = "cf.yield" (%8, %7) {} : (builtin.tensor<1xb>, builtin.tensor<1xi64>) -> 
}
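The rewrite above can be modeled on a toy IR. The following is a minimal sketch only: the `Op` class and `remove_trailing_assign_out` function are hypothetical names made up for illustration, not Paddle's PIR API, and the real pass in this PR applies additional constraints that are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """Toy stand-in for a PIR op: a name plus input/output value names."""
    name: str
    inputs: List[str]
    outputs: List[str] = field(default_factory=list)


def remove_trailing_assign_out(body: List[Op]) -> List[Op]:
    """Drop assign_out_ ops that sit directly before the final cf.yield and
    feed only that yield, forwarding their source values to the yield."""
    if not body or body[-1].name != "cf.yield":
        return list(body)
    yield_op = body[-1]

    # Find the contiguous run of assign_out_ ops right before the yield.
    start = len(body) - 1
    while start > 0 and body[start - 1].name == "pd_op.assign_out_":
        start -= 1
    trailing = body[start:-1]

    forwarded = {}      # assign_out_ result -> value it copied from
    kept_trailing = []  # assign_out_ ops that are not safe to drop
    for op in trailing:
        out = op.outputs[0]
        used_elsewhere = any(out in other.inputs for other in trailing if other is not op)
        if out in yield_op.inputs and not used_elsewhere:
            forwarded[out] = op.inputs[0]
        else:
            kept_trailing.append(op)

    new_yield = Op("cf.yield", [forwarded.get(v, v) for v in yield_op.inputs])
    return body[:start] + kept_trailing + [new_yield]


# The simplified loop body from the example above.
body = [
    Op("pd_op.full", [], ["%6"]),
    Op("pd_op.scale", ["%arg_0", "%6"], ["%7"]),
    Op("pd_op.less_than", ["%7", "%1"], ["%8"]),
    Op("pd_op.assign_out_", ["%7", "%arg_0"], ["%9"]),
    Op("pd_op.assign_out_", ["%8", "%4"], ["%10"]),
    Op("cf.yield", ["%10", "%9"]),
]
optimized = remove_trailing_assign_out(body)
print([op.name for op in optimized])  # ['pd_op.full', 'pd_op.scale', 'pd_op.less_than', 'cf.yield']
print(optimized[-1].inputs)           # ['%8', '%7']
```

The sketch only looks at ops adjacent to the yield, mirroring the "immediately preceding cf.yield" condition in the description; an assign_out_ placed elsewhere in the body is left untouched.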

Profiling before the change: the light-blue D2D device memory copies take up part of the time

Profiling after the change


paddle-bot · 2024-08-26T05:49:40Z

Thanks for your contribution!

codecov · 2024-08-26T06:23:24Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.78%. Comparing base (81f5ab5) to head (a95f16e).
⚠️ Report is 792 commits behind head on develop.

❌ Your project status has failed because the head coverage (52.78%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #9002      +/-   ##
===========================================
- Coverage    52.92%   52.78%   -0.15%     
===========================================
  Files          661      661              
  Lines       107069   106945     -124     
===========================================
- Hits         56670    56452     -218     
- Misses       50399    50493      +94


CLAassistant · 2024-09-18T10:38:10Z

All committers have signed the CLA.

github-actions · 2024-12-28T00:20:29Z

This Pull Request is stale because it has been open for 60 days with no activity.

paddle-bot · 2025-12-30T14:01:36Z

Automatically closed by Paddle-bot.

add a PIR pass to remove the pd_op.assign_out_ op at the end of while loop, avoiding redundant kv cache copy

b1ebe45

lszxb mentioned this pull request Aug 28, 2024

[WeeklyReports] 2024.08.12~2024.08.25 weekly report summary PFCCLab/Camp#353

Closed

21 tasks

lszxb mentioned this pull request Sep 11, 2024

Add a inplace concat custom op based on CUDA VMM API #9126

Closed

ZHUI requested a review from DrownFish19 September 13, 2024 06:48

ZHUI changed the title ~~Remove the redundant pd_op.assign_out_ op at the end of while loop~~ [Performance] Remove the redundant pd_op.assign_out_ op at the end of while loop Sep 13, 2024

lszxb added 3 commits September 15, 2024 15:36

Merge branch 'develop' into fix_remove_assign_out_in_while_loop

21807cd

update remove_assign_out_pass, now it only trigger with cf.yield

20aa163

update remove_assign_out_pass: add more constraints to improve robustness

7058e45

Merge branch 'develop' into fix_remove_assign_out_in_while_loop

a95f16e

lszxb mentioned this pull request Oct 28, 2024

Add a inplace concat custom op based on CUDA VMM API (resubmitted) #9320

Closed

github-actions bot added the stale label Dec 28, 2024

paddle-bot bot closed this Dec 30, 2025
