Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler][stream] Avoid circular dependency between partitions in execution scheduling #18217

Merged

Conversation

sogartar
Copy link
Contributor

@sogartar sogartar commented Aug 14, 2024

Multi-device programs may produce circular dependency between partitions, which is an invalid partitioning.
One such example is provided as a test case.

To fix this, here we track partitions that transitively depend (hazards) on a partition. Not just per operation.

Comment on lines 261 to 266
// Mark the op as having hazards against all other partitions.
// Why are we that conservative?
// Why we don't take the hazards for the users?
if (!builders.empty()) {
opInfo.hazards.set(0, builders.size() - 1);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benvanik, could you shed some light on what was your reasoning here when you wrote the algorithm?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to be safe than incorrect, especially with our current minimal test coverage. It's not always safe to reorder things - if anything we are unlikely to be conservative enough here - for example, if there's a stream.resource.load of a resource or a global we can't move anything that may affect that resource or global. This partitioning was designed to be conservative because debugging such issues is really difficult.

…ecution scheduling

Multi-device programs may produce circular dependency between
partitions, which is an invalid partitioning.
One such example is provided as a test case.

To fix this here we track partitions that transitively depend (hazards) on a partition.
Not just per operation.

Signed-off-by: Boian Petkantchin <[email protected]>
@sogartar sogartar force-pushed the fix-stream-exec-partitioning-with-affinties branch from be1af16 to c21f08a Compare August 14, 2024 06:20
@sogartar sogartar added the compiler/dialects Relating to the IREE compiler dialects (flow, hal, vm) label Aug 14, 2024
Comment on lines 108 to 114
// If we are cloning we care only about users that are a part of the
// candidate partition.
opHazards = &opHazardsInCandidatePartition;
for (auto user : op.getUsers()) {
if (builders[partitionOrdinal]->ops.contains(user))
opHazardsInCandidatePartition |= opInfos[user].hazards;
}
Copy link
Contributor Author

@sogartar sogartar Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, here we would need to walk further down the users if a user is also cloned into the partition. This will be useful if we have a block of cloneable ops. If left like that, other than the inefficiency, I don't think it will produce invalid partitioning.


for (auto opHazardOrdinal : opHazards->set_bits()) {
if (partitionOrdinal < opHazardOrdinal)
// Reject partition ordering that would require partition sorting.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use {} around multi-line if statements

Comment on lines 261 to 266
// Mark the op as having hazards against all other partitions.
// Why are we that conservative?
// Why we don't take the hazards for the users?
if (!builders.empty()) {
opInfo.hazards.set(0, builders.size() - 1);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to be safe than incorrect, especially with our current minimal test coverage. It's not always safe to reorder things - if anything we are unlikely to be conservative enough here - for example, if there's a stream.resource.load of a resource or a global we can't move anything that may affect that resource or global. This partitioning was designed to be conservative because debugging such issues is really difficult.

Signed-off-by: Boian Petkantchin <[email protected]>
@sogartar sogartar enabled auto-merge (squash) August 21, 2024 05:31
@sogartar sogartar merged commit 3a06719 into iree-org:main Aug 21, 2024
35 of 36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/dialects Relating to the IREE compiler dialects (flow, hal, vm)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants