byungilm
Contributor

  • Fusing Eltwise into Broadcast
  • Implemented the fused operation in the broadcast_ref kernel

Description of the issue

  • Profiling Qwen-Reranker (CVS-173218) showed that Eltwise, running on the ref kernel, took 6% of execution time.
    This cost can be optimized out by fusing Eltwise into Broadcast.

The code and line that caused this issue

  • Modified broadcast ref kernel : kernel_selector/cl_kernels/broadcast_gpu_ref.cl
  • Added condition for fusing Eltwise to Broadcast : graph_optimizer/prepare_primitive_fusing.cpp
  • Added fused post-op handling : broadcast/broadcast_kernel_base.cpp
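
For reviewers less familiar with post-op fusing, the idea behind the changes above can be sketched on the host side in plain C++. This is only an illustration under stated assumptions: the function names (broadcast_rows, eltwise_add, broadcast_rows_fused_add) are hypothetical, the Eltwise mode is assumed to be an add, and the broadcast is a simple row repeat. It is not the code in broadcast_gpu_ref.cl.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused path, step 1: Broadcast repeats `row` `rows` times and writes an
// intermediate tensor to memory.
std::vector<float> broadcast_rows(const std::vector<float>& row, std::size_t rows) {
    std::vector<float> out;
    out.reserve(rows * row.size());
    for (std::size_t r = 0; r < rows; ++r)
        out.insert(out.end(), row.begin(), row.end());
    return out;
}

// Unfused path, step 2: a separate Eltwise pass reads the intermediate back.
std::vector<float> eltwise_add(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] + b[i];
    return out;
}

// Fused path: the add is applied as a post-op at the point where Broadcast
// stores each output element, so no intermediate tensor and no second pass.
std::vector<float> broadcast_rows_fused_add(const std::vector<float>& row,
                                            std::size_t rows,
                                            const std::vector<float>& other) {
    const std::size_t cols = row.size();
    assert(other.size() == rows * cols);
    std::vector<float> out(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            out[r * cols + c] = row[c] + other[r * cols + c];
    return out;
}
```

Both paths produce the same values; the fused one skips the separate Eltwise pass and the intermediate buffer it reads from.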

Reproduction step and snapshot

  • The target model, Qwen3-Reranker-0.6B, is in openvino_notebooks (notebooks/qwen3-embedding)
  • Reproduced with benchmark_app:
    ./benchmark_app -m openvino_notebooks/notebooks/qwen3-embedding/Qwen3-Reranker-0.6B/FP16/openvino_model.xml -shape [64,256] -d GPU -hint latency -api sync -nireq 1 -niter 1

Checklist

  • Is it a proper fix?
  • Did you include test case for this fix, if necessary?
  • Did you review existing test that can be extended to cover this scenario?

Tickets:

+ Fusing Eltwise to Broadcast
+ Implemented fusing operation to broadcast_ref kernel

Signed-off-by: Min, Byungil <[email protected]>
@github-actions github-actions bot added the category: GPU OpenVINO GPU plugin label Oct 13, 2025
@byungilm byungilm marked this pull request as ready for review October 13, 2025 15:09
@byungilm byungilm requested review from a team as code owners October 13, 2025 15:09