Setting parallel to tbb static partitioner or tbb auto partitioner by thread pool #30167
Conversation
```diff
 const std::vector<impl_desc_type>& implPriorities)
-    : m_stream(dnnl::stream(engine)),
+    : m_stream(dnnl::threadpool_interop::make_stream(engine, threadPool)),
```
Just a couple of thoughts regarding the `threadPool`:

1. In fact, the only difference between `ThreadPool` and `CpuParallel` is the interface. However, it's better to avoid inheriting a dnnl-specific interface in `CpuParallel`. At the same time, the functionality may be reused. Therefore, it seems better to reimplement `ThreadPool` as a wrapper over the `CpuParallel` class, redirecting all the related calls to the underlying `CpuParallel` object while maintaining the dnnl-specific interface.
2. With `ThreadPool` as a wrapper over `CpuParallel`, we don't really need to store it in the context; we can instead instantiate it on the fly (it will only store a pointer to the corresponding `CpuParallel` object) from the `CpuParallel` object available in the context.
3. How are we going to maintain the OMP build?
Thanks for the suggestions.

1. Updated.
2. The pointer to `ThreadPool` is used for the check in the oneDNN API `activate_threadpool`, so I still store it in the context.
3. Updated; OMP is supported now.

Please have a look, thank you!
…xiaoxia2022/openvino into xiaoxia/auto_tbb_thread_pool
Pull Request Overview
This PR introduces TBB static and auto partitioner support for the Intel CPU plugin through a new thread pool implementation. The changes enable users to select between static and auto partitioning strategies for TBB-based threading operations, with automatic partitioner selection based on model characteristics and hardware configurations.
Key changes:
- Add new TbbPartitioner property for controlling TBB partitioning strategy
- Implement thread pool infrastructure with partitioner-aware parallel execution
- Update configuration management to handle partitioner selection based on model analysis
- Integrate thread pool usage throughout the codebase for consistent partitioning behavior
Reviewed Changes
Copilot reviewed 93 out of 93 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/plugins/intel_cpu/src/thread_pool_imp.hpp | New thread pool interface with partitioner-aware parallel execution |
| src/plugins/intel_cpu/src/cpu_parallel.hpp | CpuParallel class implementing TBB partitioner selection logic |
| src/plugins/intel_cpu/src/config.h | Added TbbPartitioner configuration property |
| src/plugins/intel_cpu/src/cpu_streams_calculation.cpp | Enhanced model analysis for automatic partitioner selection |
| src/plugins/intel_cpu/src/plugin.cpp | Updated property registration and caching for TBB partitioner |
```diff
@@ -1050,7 +1049,7 @@ void DeformableConvolution::DefConvExecutor::prepareSamplingWeights(const float*
         }
     };

-    parallel_nd(MB, DG, OH, OW, [&](dim_t mb, dim_t dg, dim_t oh, dim_t ow) {
+    parallel_for4d(MB, DG, OH, OW, [&](dim_t mb, dim_t dg, dim_t oh, dim_t ow) {
```
The function call `parallel_for4d` is used, but there is no corresponding `cpu_parallel` context available. This should likely be `cpu_parallel->parallel_for4d` instead.
```diff
@@ -1229,7 +1228,7 @@ void DeformableConvolution::DefConvRefExecutor::exec(const float* src,
         return d;
     };

-    parallel_nd(G, MB, OC, OH, OW, [&](dnnl_dim_t g, dnnl_dim_t mb, dnnl_dim_t oc, dnnl_dim_t oh, dnnl_dim_t ow) {
+    parallel_for5d(G, MB, OC, OH, OW, [&](dnnl_dim_t g, dnnl_dim_t mb, dnnl_dim_t oc, dnnl_dim_t oh, dnnl_dim_t ow) {
```
The function call `parallel_for5d` is used, but there is no corresponding `cpu_parallel` context available. This should likely be `cpu_parallel->parallel_for5d` instead:

```diff
-    parallel_for5d(G, MB, OC, OH, OW, [&](dnnl_dim_t g, dnnl_dim_t mb, dnnl_dim_t oc, dnnl_dim_t oh, dnnl_dim_t ow) {
+    cpu_parallel->parallel_for5d(G, MB, OC, OH, OW, [&](dnnl_dim_t g, dnnl_dim_t mb, dnnl_dim_t oc, dnnl_dim_t oh, dnnl_dim_t ow) {
```
@EgorDuplensky, could you please review?
```diff
@@ -248,6 +248,7 @@ CPU::CPU() {
                 _numa_nodes,
                 _sockets,
                 _cores,
+                _blocked_cores,
```
Please remove this part; it was already merged into the master branch in another PR.
Details:
Tickets: