-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add implementation details information for task group dynamic dependencies #1743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add implementation details information for task group dynamic dependencies #1743
Conversation
Co-authored-by: Aleksei Fedotov <[email protected]>
Co-authored-by: Aleksei Fedotov <[email protected]>
Co-authored-by: Aleksei Fedotov <[email protected]>
Co-authored-by: Aleksei Fedotov <[email protected]>
Co-authored-by: Alexandra <[email protected]>
Co-authored-by: Alexandra <[email protected]>
Co-authored-by: Alexandra <[email protected]>
Co-authored-by: Alexandra <[email protected]>
Co-authored-by: Alexandra <[email protected]>
Co-authored-by: Alexandra <[email protected]>
Co-authored-by: Alexandra <[email protected]>
Co-authored-by: Alexandra <[email protected]>
Co-authored-by: Alexandra <[email protected]>
Co-authored-by: Alexandra <[email protected]>
Co-authored-by: Alexey Kukanov <[email protected]> Co-authored-by: Konstantin Boyarinov <[email protected]>
* Remove concrete proposals from the RFC * Apply review comments
…k_group_dynamic_dependencies
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
Only a task in the `created` state is allowed to be a recipient of successors (only `task_handle` argument | ||
is allowed). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, the task in submitted state can also be a recipient of successors as long as its predecessors are not yet completed (e.g., not submitted). However, as far as I understand, we don't have use cases for such scenario at the moment. Or is there another reason for that restriction?
Supposing we will need to add such possibility in the future. How will it look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this extension as having two additional overloads for make_edge
:
class task_group {
public:
static void make_edge(task_handle& pred, task_tracker& succ);
static void make_edge(task_tracker& pred, task_tracker& succ);
};
We will also need to strictly define the limitations for succ
. Something like, if succ
tracks the task in the state, other than created
or submitted
, or the state is concurrently changing while creating the edge, the behavior is undefined.
But I agree, we need to have a use-case for this first.
to itself and `function_stack_task` uses a reference to the user-provided body instance since | ||
it cannot be destroyed before the blocking API finishes. | ||
|
||
Due to API limitations, only a `function_task` can be owned by a `task_handle`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class task
has 48 bytes in reserve. If we re-use something from there, will it help simplifying the implementation or, perhaps, support other desired use cases (if any)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it can simplify the implementation and I don't see how the place where the additional pointer is placed can affect the covered use-cases.
Potential usage of these 48 bytes can be discussed. The only doubt I have is that we will reuse part of them only for the task_handle_task
s. What if other types of tasks need to use this space as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, we might "reserve" a void*
pointer in the layout of task
, which could be used differently in derived classes.
task_tracker(const task_handle& th) : m_state(th ? th.get_dynamic_state() : nullptr) { | ||
if (m_state) m_state->reserve(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will task_tracker
track, if corresponding task_handle
has been submitted by the time this constructor is called? task_handle
becomes empty once it is run()
, right? But previously there was an idea to be able to make predecessors from submitted tasks. If task_tracker
tracks nothing, then specifying it as a predecessor could potentially result in a data race since the task it is supposed to track may not be finished yet. Thus, this task as a predecessor and specified successor may be running simultaneously. Can you please point out where I am wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that move semantics for task_handle
protects us from such a races in some manner. What should the user expect if it concurrently copies and moves the object? The state of the copy is undefined, since copying can happen before the move or after the move. So, such actions are kind of undefined behavior since neither a valid or an empty copy cannot be guaranteed.
Even if we guarantee that the tracker
would be in one of two possible states after the constructor (which is not the case for the current implementation, since th
may become empty while calling th.get_dynamic_state
), extra emptiness check would be required on the user side, since providing an empty tracker (or handle) to make_edge
is specified as UB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That raises the question: should we maybe consider construction from an empty task handle as undefined behavior? In other words, is not it always a mistake when a tracker is constructed from an empty handle? After all, if you want an empty tracker, there is the default constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, that we can treat the construction from an empty task_handle as an error.
is not completed and its successors were not transferred. Having an "alive" successors list means new successors can be added into the current | ||
`task_dynamic_state`. | ||
* "Dead" state (value of `m_successors_list_head` equal to `~std::uintptr_t(0)`): indicates two possible scenarios: | ||
* The associated task is completed. Adding new successors to the associated task does not make any sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not say adding new successors does not make any sense. It may have a sense to run these successors immediately (if they do not have other incomplete predecessors), since its predecessor (i.e. the associated task) is completed. This is identical to the situation where completed predecessors can be added with no effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I will improve the wording
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
should be redirected to the dynamic state of the task, receiving the successors. In this case also `m_new_dynamic_state` should point to | ||
the receiving state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case also
m_new_dynamic_state
should point to the receiving state.
I wonder if having only m_successors_list_head
is enough. Instead of having "dead" state for this scenario, let it point to the corresponding list in the task_dynamic_state
in the task where the successors have been moved. Will it allow not introducing m_new_dynamic_state
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that we can make m_successors_list_head
to point to the new list instead of "dead' state. But we still need some flag to identify that transfer happened, because in two following cases, the state would be the same:
- The task with successors that did not run
transfer_successors_to
-m_successors_list_head
points to the last added successor. The task completion should traverse the list and notify the successors. - The task that did the transfer -
m_successors_list_head
also point to the successor, but task completion should not cause signaling and releasing the successors. It is now a responsibility of the recipient task.
Also, I we will need to handle the lifetime issue somehow - current dynamic state should hold a reference to the recipient's dynamic state and release it in the destructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember also that we will have transfer "chains": the task (B) that received successors from a task (A) then transfers those to another task (C). In this case, how would the dynamic state of (A) change to point to the successor list of (C)?
* The associated task has transferred its successors to another task. In this case, any new successor should be redirected to the dynamic state | ||
of the task that have received the successors during the transfer. `m_new_dynamic_state` should point to that receiving state. | ||
|
||
### Adding successors to the list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the included code snippet definitely helps, a sequence diagram describing the algorithm will make the algorithm explanation clearer, from my personal standpoint. Since this is, perhaps, the main algorithm of the whole feature, I would consider adding sequence diagrams for such important pieces in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
During the transfer, the continuation vertex is moved from `p_d_state` to `new_d_state`. | ||
All other state transitions remain the same as in the previous linearization scenario. | ||
|
||
When `new_task` completes, it retrieves the successor list containing the `SCV` and decrements the reference counter in the continuation vertex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I missed something, but it seems that the task_dynamic_state
for successor task should be instantiated with two references (the first blue rectangle): one reference representing the dependency, i.e. due to make_edge
call, and another one for the corresponding and still alive task_handle
object. Otherwise, I'm afraid the successor task might start executing whenever the new_task
completes, which might happen before the associated task_handle
is submitted. Is my understanding correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your understanding is partially correct. There are two reference counters.
One handles the lifetime of task_dynamic_state
(that is shown on the diagram). It holds references for task itself, trackers and other dynamic states that did the transfer.
The second reference counter is in the continuation vertex (it is hidden in the diagram, but mentioned in the highlighted sentence). It holds reference counters for each predecessor task and for the task_hande
owning the successor task.
And this second reference counter is initialized with two reference, as you stated, to prevent execution of the successor task before the task_handle
is submitted.
## `task_dynamic_state` in details | ||
|
||
As mentioned above, the `task_dynamic_state` class tracks a task's state (completed or not), | ||
its list of successors, and any post-transfer actions. It has the following layout: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is meant by "post-transfer" actions here? Is it about using the m_new_dynamic_state
pointer after transfer_successors_to
was made?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it refers to adding new successors to the same tracker after doing the transfer. I will improve the wording.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Co-authored-by: Aleksei Fedotov <[email protected]>
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
Co-authored-by: Alexey Kukanov <[email protected]>
### `continuation_vertex` class | ||
|
||
`continuation_vertex` is a class that represents the associated task as a successor of other tasks. | ||
It maintains an additional reference counter, which is incremented: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this class really needed? Looking at the implementation in #1685, I think it might be replaced with a pointer to reference_vertex
in the dynamic state, and with pointers to dynamic states in the successor lists. Seemingly that would result in less allocations, and maybe in less redirections as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we replace continuation vertex in task_dynamic_state
with a pointer to reference_vertex
and assume it is dynamically allocated, we will have the same amount of allocations and will need to manage the correct allocator usage. Currently task_dynamic_state
and continuation_vertex
can be allocated by different threads and therefore with different small_object_allocator
instances.
In theory, we can replace m_continuation_vertex
with the reference counter (or a special vertex if we want to support distributed wait in the future) and exclude dynamic allocations at all.
- Reference counter equal to zero would mean no dependencies. (replaces
m_continuation_vertex == nullptr
) - If
run
detects reference counter that is not equal to zero, it decreases it and if it is the last reference - spawns the task - The first reserve on the ref counter would reserve additional reference for the
task_handle
owning the task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we replace continuation vertex in
task_dynamic_state
with a pointer toreference_vertex
and assume it is dynamically allocated, we will have the same amount of allocations and will need to manage the correct allocator usage.
I missed this aspect, thanks for explaining. But then, why cannot task_dynamic_state
inherit reference_vertex
?
In theory, we can replace m_continuation_vertex with the reference counter (or a special vertex if we want to support distributed wait in the future) and exclude dynamic allocations at all.
That's also an option, but I guess the refcounting machinery is already implemented in reference_vertex
and we prefer to reuse it.
Updated: or does reference_vertex
have extra details that are redundant for this case - namely, the pointer to the "parent" vertex?
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
// Finding the last node in the list | ||
successor_list_node* last_node = list; | ||
|
||
while (last_node->get_next() != nullptr) last_node = last_node->get_next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this traversal a potential performance concern, or do we not expect the lists to be big enough for real troubles?
We could in principle add a pointer to the tail of the list to the dynamic state. If extra space is a concern, it could be put into a union with m_new_dynamic_state
(if the latter one does not need to be read for proper state determination).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all of the considered examples, the task had one or two successors to transfer. So in known cases, this should not introduce significant overhead (especially since only a single traverse is performed per transfer).
However, in the general case this could become problematic. I see two possible approaches to address it:
- Apply your proposed changes now to proactively prevent potential issues in the future use cases.
- Keep the current implementation for the preview feature and review how transfer is used in other examples. If we identify real-world scenarios where the traversal have significant cost, we can update the implementation before moving the feature from preview.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine to keep it as an open question.
…etails.md Co-authored-by: Alexey Kukanov <[email protected]>
…ps://github.com/oneapi-src/oneTBB into dev/kboyarinov/tg-dynamic-dependencies-impl-rfc
…pendencies-r2' into dev/kboyarinov/tg-dynamic-dependencies-impl-rfc
class task_dynamic_state { | ||
private: | ||
task_handle_task* m_task; | ||
std::atomic<successor_list_node*> m_successor_list_head; | ||
std::atomic<continuation_vertex*> m_continuation_vertex; | ||
std::atomic<task_dynamic_state*> m_new_dynamic_state; | ||
std::atomic<std::size_t> m_num_references; | ||
small_object_allocator m_allocator; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add a field for runtime versioning of the class, in case we need to change it in the future? If the class size is a concern, I think some space can be taken out of m_num_references
, for which 32 bits would be more than enough for any realistic number of references.
BTW, will the TBB scheduler access this class on the hot path in the presence of task dependencies? And if it will, is there risk of "false sharing" and related performance impact?
rfcs/proposed/task_group_dynamic_dependencies/implementation_details.md
Outdated
Show resolved
Hide resolved
* `completed` state (`m_successor_list_head` equals `~std::uintptr_t(0)`): indicates that the associated task is completed. | ||
Adding new successors does not add any real dependencies. Successors can proceed for execution if their other dependencies are satisfied. | ||
* `transferred` state (`m_successor_list_head` equals `std::uintptr_t(0) - 1`): indicates that the completion of the associated task was transferred | ||
to another task. In this case, any new successor should be redirected to the dynamic state of the task that has received the successors during the transfer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~0
and -1
is the same value. Did you mean ~std::uintptr_t(0) - 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, tilde was missed. Fixed.
```cpp | ||
void transfer_successors_to(task_dynamic_state* new_dynamic_state) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful to replace the remaining code blocks with the flow diagram for transitioning task completion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
…etails.md Co-authored-by: Alexey Kukanov <[email protected]>
```cpp | ||
namespace tbb::detail::d1 { | ||
d1::task* get_current_task(); | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I remember, the TBB naming convention for getters that return a value is not to use the prefix. That's what I see in other similar entry points, e.g. current_context
and execution_slot
. So, this should probably be named current_task
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was initially named current_task
, but I faced issues on Mac systems due to conflict with current_task function implemented as a macro on some versions.
I think it can be changed to something like current_task_ptr
or this_task
.
…ynamic-dependencies-impl-rfc
Description
Add RFC describing the implementation of task_group dynamic dependencies.
RFC can be found here: #1664
Implementation:
The document describes the complete implementation
Fixes # - issue number(s) if exists
Type of change
Choose one or multiple, leave empty if none of the other choices apply
Add a respective label(s) to PR if you have permissions
Tests
Documentation
Breaks backward compatibility
Notify the following users
List users with
@
to send notificationsOther information