
Setting parallel to tbb static partitioner or tbb auto partitioner by thread pool #30167


Open
wants to merge 102 commits into
base: master
from

Conversation

sunxiaoxia2022
Contributor

@sunxiaoxia2022 sunxiaoxia2022 commented Apr 16, 2025

Details:

  • Set parallel execution to use the TBB STATIC partitioner or the TBB AUTO partitioner through a thread pool
  • ...

Tickets:

@sunxiaoxia2022 sunxiaoxia2022 requested review from a team as code owners April 16, 2025 03:43
@github-actions github-actions bot added category: Core OpenVINO Core (aka ngraph) category: CPU OpenVINO CPU plugin category: build OpenVINO cmake script / infra category: CPP API OpenVINO CPP API bindings labels Apr 16, 2025
@mlukasze mlukasze requested a review from ilya-lavrenov April 17, 2025 07:04
@maxnick maxnick self-assigned this Apr 22, 2025
@github-actions github-actions bot removed category: Core OpenVINO Core (aka ngraph) category: CPP API OpenVINO CPP API bindings labels Apr 30, 2025
  const std::vector<impl_desc_type>& implPriorities)
- : m_stream(dnnl::stream(engine)),
+ : m_stream(dnnl::threadpool_interop::make_stream(engine, threadPool)),
Contributor

Just a couple of thoughts regarding the threadPool.

  1. In fact, the only difference between ThreadPool and CpuParallel is the interface. However, CpuParallel should not inherit a dnnl-specific interface, even though the functionality can be reused. It therefore seems better to reimplement ThreadPool as a wrapper around CpuParallel that keeps the dnnl-specific interface and redirects all related calls to the underlying CpuParallel object.
  2. With ThreadPool being a wrapper over CpuParallel, we don't really need to store it in the context. It can instead be instantiated on the fly from the CpuParallel object available in the context, since it only stores a pointer to that object.
  3. How are we going to maintain the OMP build?
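
The wrapper idea from points 1 and 2 might look roughly like the sketch below. It is only an illustration under stated assumptions: `ThreadPoolIface` is a local stand-in that mirrors the general shape of oneDNN's `dnnl::threadpool_interop::threadpool_iface` so the example builds without oneDNN headers, and the `CpuParallel` shown here (including its `num_threads` and `parallel_simple` members) is a hypothetical serial placeholder, not the class from this PR.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the plugin's CpuParallel class. The real class
// dispatches to TBB or OMP; this placeholder runs the work items serially.
class CpuParallel {
public:
    explicit CpuParallel(int num_threads) : m_num_threads(num_threads) {}

    int num_threads() const { return m_num_threads; }

    // Calls fn(i, n) once for every work item i in [0, n).
    void parallel_simple(int n, const std::function<void(int, int)>& fn) const {
        for (int i = 0; i < n; ++i) {
            fn(i, n);
        }
    }

private:
    int m_num_threads;
};

// Local mirror of the shape of a dnnl-style thread pool interface, declared
// here only so the sketch compiles on its own (assumption, not oneDNN code).
struct ThreadPoolIface {
    virtual ~ThreadPoolIface() = default;
    virtual int get_num_threads() const = 0;
    virtual bool get_in_parallel() const = 0;
    virtual void parallel_for(int n, const std::function<void(int, int)>& fn) = 0;
    virtual std::uint64_t get_flags() const = 0;
};

// Thin wrapper: it keeps the dnnl-style interface, owns nothing, and forwards
// every call to the CpuParallel object it points to.
class ThreadPool final : public ThreadPoolIface {
public:
    explicit ThreadPool(const CpuParallel* parallel) : m_parallel(parallel) {}

    int get_num_threads() const override { return m_parallel->num_threads(); }
    bool get_in_parallel() const override { return false; }
    void parallel_for(int n, const std::function<void(int, int)>& fn) override {
        m_parallel->parallel_simple(n, fn);
    }
    std::uint64_t get_flags() const override { return 0; }  // synchronous

private:
    const CpuParallel* m_parallel;  // non-owning
};
```

Because the wrapper holds only a non-owning pointer, constructing one per call site is cheap, which is what makes the "instantiate on the fly" option in point 2 plausible.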

Contributor Author

Thanks for the suggestions.

  1. Updated.
  2. The ThreadPool pointer is used for a check in the oneDNN API activate_threadpool, so I still store it in the context.
  3. Updated; the OMP build is supported now.

Please have a look, thank you!

@maxnick maxnick requested a review from Copilot July 25, 2025 08:08
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces TBB static and auto partitioner support for the Intel CPU plugin through a new thread pool implementation. The changes enable users to select between static and auto partitioning strategies for TBB-based threading operations, with automatic partitioner selection based on model characteristics and hardware configurations.

Key changes:

  • Add new TbbPartitioner property for controlling TBB partitioning strategy
  • Implement thread pool infrastructure with partitioner-aware parallel execution
  • Update configuration management to handle partitioner selection based on model analysis
  • Integrate thread pool usage throughout the codebase for consistent partitioning behavior
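
For readers unfamiliar with the two strategies: in TBB the choice corresponds to passing `tbb::static_partitioner` or `tbb::auto_partitioner` to `tbb::parallel_for`. The static partitioner distributes iterations among threads as uniformly as possible with no further load balancing, while the auto partitioner adapts chunk sizes at run time. The sketch below illustrates only the static case, without depending on TBB. The helper name `static_chunks` is hypothetical and does not come from this PR.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Static partitioning sketch: split [0, total) into nthr contiguous
// half-open chunks [begin, end) whose sizes differ by at most one.
// The assignment is fixed up front, so no work moves between threads later.
std::vector<std::pair<std::size_t, std::size_t>> static_chunks(std::size_t total,
                                                               std::size_t nthr) {
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    if (nthr == 0) {
        return chunks;
    }
    const std::size_t base = total / nthr;   // minimum chunk size
    const std::size_t extra = total % nthr;  // first `extra` chunks get one more
    std::size_t begin = 0;
    for (std::size_t t = 0; t < nthr; ++t) {
        const std::size_t len = base + (t < extra ? 1 : 0);
        chunks.emplace_back(begin, begin + len);
        begin += len;
    }
    return chunks;
}
```

A fixed split like this tends to suit workloads whose iterations cost about the same, whereas adaptive splitting can help when iteration costs vary.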

Reviewed Changes

Copilot reviewed 93 out of 93 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/plugins/intel_cpu/src/thread_pool_imp.hpp | New thread pool interface with partitioner-aware parallel execution |
| src/plugins/intel_cpu/src/cpu_parallel.hpp | CpuParallel class implementing TBB partitioner selection logic |
| src/plugins/intel_cpu/src/config.h | Added TbbPartitioner configuration property |
| src/plugins/intel_cpu/src/cpu_streams_calculation.cpp | Enhanced model analysis for automatic partitioner selection |
| src/plugins/intel_cpu/src/plugin.cpp | Updated property registration and caching for TBB partitioner |

@@ -1050,7 +1049,7 @@ void DeformableConvolution::DefConvExecutor::prepareSamplingWeights(const float*
}
};

- parallel_nd(MB, DG, OH, OW, [&](dim_t mb, dim_t dg, dim_t oh, dim_t ow) {
+ parallel_for4d(MB, DG, OH, OW, [&](dim_t mb, dim_t dg, dim_t oh, dim_t ow) {

Copilot AI Jul 25, 2025

The function call 'parallel_for4d' is used but there's no corresponding 'cpu_parallel' context available. This should likely use 'cpu_parallel->parallel_for4d' instead.
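
To make the distinction concrete, here is a minimal sketch of what a member-style `parallel_for4d` could look like. It assumes a hypothetical `CpuParallel` class with a serial body, so it is not the PR's implementation; the point is only that the call goes through a context object, which is where a partitioner choice could be held.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical stand-in for the PR's CpuParallel. A real implementation would
// hand the flattened range to TBB or OMP; this one iterates serially.
class CpuParallel {
public:
    using Body4d = std::function<void(std::size_t, std::size_t, std::size_t, std::size_t)>;

    // Collapses the 4D index space D0 x D1 x D2 x D3 into one flat range and
    // recovers the four indices for each flat position (last index fastest).
    void parallel_for4d(std::size_t D0, std::size_t D1, std::size_t D2, std::size_t D3,
                        const Body4d& fn) const {
        const std::size_t total = D0 * D1 * D2 * D3;
        for (std::size_t idx = 0; idx < total; ++idx) {
            std::size_t rest = idx;
            const std::size_t d3 = rest % D3;
            rest /= D3;
            const std::size_t d2 = rest % D2;
            rest /= D2;
            const std::size_t d1 = rest % D1;
            rest /= D1;
            const std::size_t d0 = rest;
            fn(d0, d1, d2, d3);
        }
    }
};
```

With such a class, a call site would read `cpu_parallel->parallel_for4d(MB, DG, OH, OW, body)`, where `cpu_parallel` is a pointer obtained from the surrounding context.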

@@ -1229,7 +1228,7 @@ void DeformableConvolution::DefConvRefExecutor::exec(const float* src,
return d;
};

- parallel_nd(G, MB, OC, OH, OW, [&](dnnl_dim_t g, dnnl_dim_t mb, dnnl_dim_t oc, dnnl_dim_t oh, dnnl_dim_t ow) {
+ parallel_for5d(G, MB, OC, OH, OW, [&](dnnl_dim_t g, dnnl_dim_t mb, dnnl_dim_t oc, dnnl_dim_t oh, dnnl_dim_t ow) {

Copilot AI Jul 25, 2025

The function call 'parallel_for5d' is used but there's no corresponding 'cpu_parallel' context available. This should likely use 'cpu_parallel->parallel_for5d' instead.

Suggested change
- parallel_for5d(G, MB, OC, OH, OW, [&](dnnl_dim_t g, dnnl_dim_t mb, dnnl_dim_t oc, dnnl_dim_t oh, dnnl_dim_t ow) {
+ cpu_parallel->parallel_for5d(G, MB, OC, OH, OW, [&](dnnl_dim_t g, dnnl_dim_t mb, dnnl_dim_t oc, dnnl_dim_t oh, dnnl_dim_t ow) {

@maxnick
Contributor

maxnick commented Jul 28, 2025

@EgorDuplensky, could you please review?

@@ -248,6 +248,7 @@ CPU::CPU() {
_numa_nodes,
_sockets,
_cores,
_blocked_cores,
Contributor

Please remove this part; it has already been merged into the master branch by another PR.

Labels
category: build OpenVINO cmake script / infra category: Core OpenVINO Core (aka ngraph) category: CPP API OpenVINO CPP API bindings category: CPU OpenVINO CPU plugin category: inference OpenVINO Runtime library - Inference category: Python API OpenVINO Python bindings

7 participants