Skip to content

Add implementation details information for task group dynamic dependencies #1743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 86 commits into
base: master
Choose a base branch
from

Conversation

kboyarinov
Copy link
Contributor

@kboyarinov kboyarinov commented May 30, 2025

Description

Add RFC describing the implementation of task_group dynamic dependencies.
RFC can be found here: #1664
Implementation:

The document describes the complete implementation

Fixes # - issue number(s) if exists

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

List users with @ to send notifications

Other information

vossmjp and others added 30 commits December 4, 2024 17:34
Co-authored-by: Alexey Kukanov <[email protected]>
Co-authored-by: Konstantin Boyarinov <[email protected]>
* Remove concrete proposals from the RFC

* Apply review comments
Comment on lines 63 to 64
Only a task in the `created` state is allowed to be a recipient of successors (only `task_handle` argument
is allowed).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, the task in submitted state can also be a recipient of successors as long as its predecessors are not yet completed (e.g., not submitted). However, as far as I understand, we don't have use cases for such scenario at the moment. Or is there another reason for that restriction?
Supposing we will need to add such possibility in the future. How will it look like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this extension as having two additional overloads for make_edge:

class task_group {
public:
    static void make_edge(task_handle& pred, task_tracker& succ);
    static void make_edge(task_tracker& pred, task_tracker& succ);
};

We will also need to strictly define the limitations for succ. Something like, if succ tracks the task in the state, other than created or submitted, or the state is concurrently changing while creating the edge, the behavior is undefined.
But I agree, we need to have a use-case for this first.

to itself and `function_stack_task` uses a reference to the user-provided body instance since
it cannot be destroyed before the blocking API finishes.

Due to API limitations, only a `function_task` can be owned by a `task_handle`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class task has 48 bytes in reserve. If we re-use something from there, will it help simplifying the implementation or, perhaps, support other desired use cases (if any)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it can simplify the implementation and I don't see how the place where the additional pointer is placed can affect the covered use-cases.
Potential usage of these 48 bytes can be discussed. The only doubt I have is that we will reuse part of them only for the task_handle_tasks. What if other types of tasks need to use this space as well?

Copy link
Contributor

@akukanov akukanov Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, we might "reserve" a void* pointer in the layout of task, which could be used differently in derived classes.

Comment on lines 234 to 236
task_tracker(const task_handle& th) : m_state(th ? th.get_dynamic_state() : nullptr) {
if (m_state) m_state->reserve();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will task_tracker track, if corresponding task_handle has been submitted by the time this constructor is called? task_handle becomes empty once it is run(), right? But previously there was an idea to be able to make predecessors from submitted tasks. If task_tracker tracks nothing, then specifying it as a predecessor could potentially result in a data race since the task it is supposed to track may not be finished yet. Thus, this task as a predecessor and specified successor may be running simultaneously. Can you please point out where I am wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that move semantics for task_handle protects us from such a races in some manner. What should the user expect if it concurrently copies and moves the object? The state of the copy is undefined, since copying can happen before the move or after the move. So, such actions are kind of undefined behavior since neither a valid or an empty copy cannot be guaranteed.

Even if we guarantee that the tracker would be in one of two possible states after the constructor (which is not the case for the current implementation, since th may become empty while calling th.get_dynamic_state), extra emptiness check would be required on the user side, since providing an empty tracker (or handle) to make_edge is specified as UB.

Copy link
Contributor

@akukanov akukanov Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That raises the question: should we maybe consider construction from an empty task handle as undefined behavior? In other words, is not it always a mistake when a tracker is constructed from an empty handle? After all, if you want an empty tracker, there is the default constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, that we can treat the construction from an empty task_handle as an error.

is not completed and its successors were not transferred. Having an "alive" successors list means new successors can be added into the current
`task_dynamic_state`.
* "Dead" state (value of `m_successors_list_head` equal to `~std::uintptr_t(0)`): indicates two possible scenarios:
* The associated task is completed. Adding new successors to the associated task does not make any sense.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not say adding new successors does not make any sense. It may have a sense to run these successors immediately (if they do not have other incomplete predecessors), since its predecessor (i.e. the associated task) is completed. This is identical to the situation where completed predecessors can be added with no effect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I will improve the wording

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 371 to 372
should be redirected to the dynamic state of the task, receiving the successors. In this case also `m_new_dynamic_state` should point to
the receiving state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case also m_new_dynamic_state should point to the receiving state.

I wonder if having only m_successors_list_head is enough. Instead of having "dead" state for this scenario, let it point to the corresponding list in the task_dynamic_state in the task where the successors have been moved. Will it allow not introducing m_new_dynamic_state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that we can make m_successors_list_head to point to the new list instead of "dead' state. But we still need some flag to identify that transfer happened, because in two following cases, the state would be the same:

  • The task with successors that did not run transfer_successors_to - m_successors_list_head points to the last added successor. The task completion should traverse the list and notify the successors.
  • The task that did the transfer - m_successors_list_head also point to the successor, but task completion should not cause signaling and releasing the successors. It is now a responsibility of the recipient task.

Also, I we will need to handle the lifetime issue somehow - current dynamic state should hold a reference to the recipient's dynamic state and release it in the destructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember also that we will have transfer "chains": the task (B) that received successors from a task (A) then transfers those to another task (C). In this case, how would the dynamic state of (A) change to point to the successor list of (C)?

* The associated task has transferred its successors to another task. In this case, any new successor should be redirected to the dynamic state
of the task that have received the successors during the transfer. `m_new_dynamic_state` should point to that receiving state.

### Adding successors to the list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the included code snippet definitely helps, a sequence diagram describing the algorithm will make the algorithm explanation clearer, from my personal standpoint. Since this is, perhaps, the main algorithm of the whole feature, I would consider adding sequence diagrams for such important pieces in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

During the transfer, the continuation vertex is moved from `p_d_state` to `new_d_state`.
All other state transitions remain the same as in the previous linearization scenario.

When `new_task` completes, it retrieves the successor list containing the `SCV` and decrements the reference counter in the continuation vertex.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I missed something, but it seems that the task_dynamic_state for successor task should be instantiated with two references (the first blue rectangle): one reference representing the dependency, i.e. due to make_edge call, and another one for the corresponding and still alive task_handle object. Otherwise, I'm afraid the successor task might start executing whenever the new_task completes, which might happen before the associated task_handle is submitted. Is my understanding correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your understanding is partially correct. There are two reference counters.
One handles the lifetime of task_dynamic_state (that is shown on the diagram). It holds references for task itself, trackers and other dynamic states that did the transfer.
The second reference counter is in the continuation vertex (it is hidden in the diagram, but mentioned in the highlighted sentence). It holds reference counters for each predecessor task and for the task_hande owning the successor task.
And this second reference counter is initialized with two reference, as you stated, to prevent execution of the successor task before the task_handle is submitted.

## `task_dynamic_state` in details

As mentioned above, the `task_dynamic_state` class tracks a task's state (completed or not),
its list of successors, and any post-transfer actions. It has the following layout:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is meant by "post-transfer" actions here? Is it about using the m_new_dynamic_state pointer after transfer_successors_to was made?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it refers to adding new successors to the same tracker after doing the transfer. I will improve the wording.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +165 to +168
### `continuation_vertex` class

`continuation_vertex` is a class that represents the associated task as a successor of other tasks.
It maintains an additional reference counter, which is incremented:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this class really needed? Looking at the implementation in #1685, I think it might be replaced with a pointer to reference_vertex in the dynamic state, and with pointers to dynamic states in the successor lists. Seemingly that would result in less allocations, and maybe in less redirections as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we replace continuation vertex in task_dynamic_state with a pointer to reference_vertex and assume it is dynamically allocated, we will have the same amount of allocations and will need to manage the correct allocator usage. Currently task_dynamic_state and continuation_vertex can be allocated by different threads and therefore with different small_object_allocator instances.

In theory, we can replace m_continuation_vertex with the reference counter (or a special vertex if we want to support distributed wait in the future) and exclude dynamic allocations at all.

  • Reference counter equal to zero would mean no dependencies. (replaces m_continuation_vertex == nullptr)
  • If run detects reference counter that is not equal to zero, it decreases it and if it is the last reference - spawns the task
  • The first reserve on the ref counter would reserve additional reference for the task_handle owning the task.

Copy link
Contributor

@akukanov akukanov Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we replace continuation vertex in task_dynamic_state with a pointer to reference_vertex and assume it is dynamically allocated, we will have the same amount of allocations and will need to manage the correct allocator usage.

I missed this aspect, thanks for explaining. But then, why cannot task_dynamic_state inherit reference_vertex?

In theory, we can replace m_continuation_vertex with the reference counter (or a special vertex if we want to support distributed wait in the future) and exclude dynamic allocations at all.

That's also an option, but I guess the refcounting machinery is already implemented in reference_vertex and we prefer to reuse it.
Updated: or does reference_vertex have extra details that are redundant for this case - namely, the pointer to the "parent" vertex?

// Finding the last node in the list
successor_list_node* last_node = list;

while (last_node->get_next() != nullptr) last_node = last_node->get_next();
Copy link
Contributor

@akukanov akukanov Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this traversal a potential performance concern, or do we not expect the lists to be big enough for real troubles?

We could in principle add a pointer to the tail of the list to the dynamic state. If extra space is a concern, it could be put into a union with m_new_dynamic_state (if the latter one does not need to be read for proper state determination).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all of the considered examples, the task had one or two successors to transfer. So in known cases, this should not introduce significant overhead (especially since only a single traverse is performed per transfer).

However, in the general case this could become problematic. I see two possible approaches to address it:

  • Apply your proposed changes now to proactively prevent potential issues in the future use cases.
  • Keep the current implementation for the preview feature and review how transfer is used in other examples. If we identify real-world scenarios where the traversal have significant cost, we can update the implementation before moving the feature from preview.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine to keep it as an open question.

Comment on lines +110 to +118
class task_dynamic_state {
private:
task_handle_task* m_task;
std::atomic<successor_list_node*> m_successor_list_head;
std::atomic<continuation_vertex*> m_continuation_vertex;
std::atomic<task_dynamic_state*> m_new_dynamic_state;
std::atomic<std::size_t> m_num_references;
small_object_allocator m_allocator;
};
Copy link
Contributor

@akukanov akukanov Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add a field for runtime versioning of the class, in case we need to change it in the future? If the class size is a concern, I think some space can be taken out of m_num_references, for which 32 bits would be more than enough for any realistic number of references.

BTW, will the TBB scheduler access this class on the hot path in the presence of task dependencies? And if it will, is there risk of "false sharing" and related performance impact?

Comment on lines 223 to 226
* `completed` state (`m_successor_list_head` equals `~std::uintptr_t(0)`): indicates that the associated task is completed.
Adding new successors does not add any real dependencies. Successors can proceed for execution if their other dependencies are satisfied.
* `transferred` state (`m_successor_list_head` equals `std::uintptr_t(0) - 1`): indicates that the completion of the associated task was transferred
to another task. In this case, any new successor should be redirected to the dynamic state of the task that has received the successors during the transfer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~0 and -1 is the same value. Did you mean ~std::uintptr_t(0) - 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, tilde was missed. Fixed.

Comment on lines 352 to 353
```cpp
void transfer_successors_to(task_dynamic_state* new_dynamic_state) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to replace the remaining code blocks with the flow diagram for transitioning task completion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Base automatically changed from dev/kboyarinov/tg-rfc-dynamic-dependencies-r2 to master July 25, 2025 11:17
Comment on lines 416 to 420
```cpp
namespace tbb::detail::d1 {
d1::task* get_current_task();
}
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I remember, the TBB naming convention for getters that return a value is not to use the prefix. That's what I see in other similar entry points, e.g. current_context and execution_slot. So, this should probably be named current_task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was initially named current_task, but I faced issues on Mac systems due to conflict with current_task function implemented as a macro on some versions.
I think it can be changed to something like current_task_ptr or this_task.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants