Conversation

CI MESSAGE: [44656169]: BUILD STARTED
CI MESSAGE: [44667089]: BUILD STARTED
CI MESSAGE: [44667089]: BUILD FAILED
CI MESSAGE: [44656169]: BUILD PASSED

Force-pushed 8fe96ee to d879927.

CI MESSAGE: [44698015]: BUILD STARTED
CI MESSAGE: [44698471]: BUILD STARTED
CI MESSAGE: [44717682]: BUILD STARTED
CI MESSAGE: [44717682]: BUILD FAILED
CI MESSAGE: [44719838]: BUILD STARTED
CI MESSAGE: [44698471]: BUILD FAILED
CI MESSAGE: [44719838]: BUILD FAILED
CI MESSAGE: [44719838]: BUILD PASSED

Force-pushed 193e754 to e485702.

CI MESSAGE: [45440403]: BUILD STARTED
Greptile Summary: This PR refactors DALI's thread pool infrastructure. Key observations from this review cycle:
Confidence Score: 4/5
Important Files Changed
Class Diagram

```mermaid
%%{init: {'theme': 'neutral'}}%%
classDiagram
  class ThisThreadIdx {
    +this_thread_idx() int
  }
  class ThreadPool {
    <<abstract interface>>
    +AddWork(void(int), priority) void
    +AddWork(void(), priority) void
    +RunAll(wait) void
    +WaitForWork() void
    +NumThreads() int
    +GetThreadIds() vector
  }
  class OldThreadPool {
    +AddWork(WorkWithThreadIdx, priority) void
    +AddWork(Work, priority) void
    +RunAll(wait) void
    +WaitForWork() void
    -WaitForWork(checkForErrors) void
    -ThreadMain(thread_id, ...) void
  }
  class ThreadPoolBase {
    +Init(num_threads, on_thread_start) void
    +AddTask(TaskFunc) void
    +NumThreads() int
    +GetThreadIds() vector
    #Shutdown(join) void
  }
  class NewThreadPool {
    +NewThreadPool(num_threads, device_id, set_affinity, name)
    -OnThreadStart(thread_idx, set_affinity) any
    -device_id_ optional~int~
    -name_ string
    -nvml_handle_ NvmlInstance
  }
  class ThreadPoolFacade {
    +AddWork(void(int), priority) void
    +AddWork(void(), priority) void
    +RunAll(wait) void
    +WaitForWork() void
    +NumThreads() int
    +GetThreadIds() vector
    -tp_* ThreadPoolBase
    -jobs_ list~Job~
  }
  class Executor2Impl {
    -old_tp_ unique_ptr~OldThreadPool~
    -new_tp_ unique_ptr~NewThreadPool~
    -thread_pool_wrappers_ vector~unique_ptr~ThreadPool~~
    +SetupThreadPool() void
  }
  ThisThreadIdx <|-- ThreadPool
  ThisThreadIdx <|-- ThreadPoolBase
  ThreadPool <|-- OldThreadPool
  ThreadPool <|-- ThreadPoolFacade
  ThreadPoolBase <|-- NewThreadPool
  ThreadPoolFacade --> ThreadPoolBase : delegates via tp_*
  Executor2Impl --> NewThreadPool : owns (DALI_USE_NEW_THREAD_POOL=1)
  Executor2Impl --> OldThreadPool : owns (default)
  Executor2Impl --> ThreadPoolFacade : owns N facades (one per CPU op)
```

Last reviewed commit: 9f0e8d7
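The ThreadPoolFacade delegation shown in the diagram can be sketched roughly as follows. All names here are assumptions for illustration, and the base pool runs tasks inline to keep the sketch single-threaded; the real facade batches work into Job objects that run on the shared pool's worker threads:

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for ThreadPoolBase; runs tasks inline for the sketch
// instead of dispatching them to worker threads.
struct ThreadPoolBaseSketch {
  void AddTask(std::function<void()> task) { task(); }
};

// Sketch of the facade: keeps the old AddWork/RunAll interface while
// collecting work into a per-operator batch that is later submitted
// to a shared underlying pool. Because each operator owns its own
// facade, operators can enqueue work independently of one another.
class ThreadPoolFacadeSketch {
 public:
  explicit ThreadPoolFacadeSketch(ThreadPoolBaseSketch *tp) : tp_(tp) {}

  void AddWork(std::function<void(int)> work, int64_t priority = 0) {
    (void)priority;  // priority handling omitted in this sketch
    pending_.push_back(std::move(work));
  }

  void RunAll() {
    // Submit this facade's batch to the shared pool; the thread index
    // is faked as 0 here since the sketch runs inline.
    for (auto &w : pending_)
      tp_->AddTask([&w] { w(0); });
    pending_.clear();
  }

 private:
  ThreadPoolBaseSketch *tp_;
  std::vector<std::function<void(int)>> pending_;
};
```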
Review comment on the following snippet:

```cpp
if (UseNewThreadPool()) {
  std::cerr << "\n!!! Forced use of NewThreadPool !!!" << std::endl;
  if (graph_info_.num_cpu > 0) {
```

Reviewer (Greptile): Debug std::cerr left in production code. This diagnostic print was presumably added during development but should not be shipped. Remove the std::cerr line entirely; it will spam every user's stderr whenever they set DALI_USE_NEW_THREAD_POOL=1. Suggested change:

```cpp
if (UseNewThreadPool()) {
  if (graph_info_.num_cpu > 0) {
```

Author: @greptileai This will be removed before merging, but is necessary to validate that the flag is properly set and used in CI.

Reviewer (Greptile): Got it! That makes sense for CI validation. Consider adding a TODO comment on that line so it's clear it needs to be removed:

```cpp
if (UseNewThreadPool()) {
  // TODO: Remove before merge - only for CI validation
  std::cerr << "\n!!! Forced use of NewThreadPool !!!" << std::endl;
  if (graph_info_.num_cpu > 0) {
```

This way it won't accidentally slip through code review.
CI MESSAGE: [45440403]: BUILD FAILED
CI MESSAGE: [45700206]: BUILD STARTED
CI MESSAGE: [45700206]: BUILD FAILED
CI MESSAGE: [45719366]: BUILD STARTED
CI MESSAGE: [45791241]: BUILD STARTED
CI MESSAGE: [45791241]: BUILD FAILED
CI MESSAGE: [45785260]: BUILD FAILED
CI MESSAGE: [45785260]: BUILD PASSED
Review comment on the following snippets:

```python
vid_files = ["sintel_trailer-720p_2.mp4"]
vid_filenames = [os.path.join(vid_dir, vid_file) for vid_file in vid_files]
```

```python
concurrency = OperatorConcurrency.FULL
```

Author: This enables testing of the parallel execution of CPU operators.

Reviewer: Is it intentional that only FULL concurrency is tested with the auto augment tests? I would expect to exercise at least the default concurrency as well.
Review comment on the new ThreadPool interface:

```cpp
virtual void AddWork(std::function<void(int)> work, int64_t priority = 0) = 0;

virtual void AddWork(std::function<void()> work, int64_t priority = 0) = 0;
```

Author: The index-free overload is for future refactoring. In the end, most of the time we don't need the thread index, and the new thread pool doesn't provide one to the callback. Passing one adds an extra layer of function wrapping, which is an avoidable cost. When the new thread pool becomes the default, we'll refactor the code to remove the thread_idx parameter from task functions.
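The extra wrapping layer mentioned above can be sketched as follows. The names WorkWithThreadIdx, Work, and WrapWork are illustrative assumptions, not DALI's actual API: a pool whose tasks take a thread index forces index-free work through an adapter lambda and one extra std::function indirection per task.

```cpp
#include <functional>
#include <utility>

// Hypothetical type aliases mirroring the two AddWork overloads.
using WorkWithThreadIdx = std::function<void(int)>;
using Work = std::function<void()>;

// Adapts index-free work to a pool that expects void(int). The
// thread index is simply discarded, yet each invocation still pays
// for an extra std::function hop through the adapter lambda.
inline WorkWithThreadIdx WrapWork(Work work) {
  return [w = std::move(work)](int /*thread_idx*/) { w(); };
}
```

Dropping the thread_idx parameter from task functions would let callers pass Work directly, eliminating this adapter.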
CI MESSAGE: [45791241]: BUILD PASSED
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <michalz@nvidia.com>
Force-pushed 01a53c0 to 9f0e8d7.
Review comment on the Workspace thread-pool member:

```cpp
/// CPU operators have default Thread Pool inside Workspace. Mixed and GPU ops don't.
std::optional<ThreadPool> thread_pool_ = std::nullopt;  // before
std::unique_ptr<ThreadPool> thread_pool_;               // after
```

Suggested change:

```cpp
std::unique_ptr<OldThreadPool> thread_pool_;
```
Category:
Refactoring (Redesign of existing code that doesn't affect functionality)
Description:
In the executor, the environment variable DALI_USE_NEW_THREAD_POOL is checked; when it is set to 1, the new thread pool is used and each operator is given its own ThreadPoolFacade. This allows all operators to execute in parallel, because they now add tasks to separate Job objects (even though the jobs are executed in the same thread pool). DALI_USE_NEW_THREAD_POOL is also checked when restricting the parallelism policy (when not set, CPU operators are never parallelized).
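The environment-variable gate described above can be sketched minimally as follows. The helper name UseNewThreadPool appears in the review snippets, but this body is an assumption for illustration, not DALI's actual implementation:

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical implementation of the gate: the new thread pool is
// enabled only when DALI_USE_NEW_THREAD_POOL is set to exactly "1".
inline bool UseNewThreadPool() {
  const char *env = std::getenv("DALI_USE_NEW_THREAD_POOL");
  return env != nullptr && std::strcmp(env, "1") == 0;
}
```

Gating on an environment variable like this lets CI exercise both executors on the same binary, without a rebuild or an API change.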
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
New qa tests script
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A