[compiler][stream] Avoid circular dependency between partitions in execution scheduling #18217
Conversation
// Mark the op as having hazards against all other partitions.
// Why are we that conservative?
// Why we don't take the hazards for the users?
if (!builders.empty()) {
  opInfo.hazards.set(0, builders.size() - 1);
}
@benvanik, could you shed some light on what your reasoning was here when you wrote the algorithm?
It's better to be safe than incorrect, especially with our current minimal test coverage. It's not always safe to reorder things - if anything, we are unlikely to be conservative enough here. For example, if there's a stream.resource.load of a resource or a global, we can't move anything that may affect that resource or global. This partitioning was designed to be conservative because debugging such issues is really difficult.
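The conservative marking under discussion can be illustrated with a small standalone sketch (hypothetical, simplified types; the actual code uses llvm::BitVector and IREE's partition builders): an op whose side effects cannot be analyzed is flagged as hazarding against every partition built so far, so it can never be reordered across any of them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the per-op hazard bitvector.
struct OpInfo {
  std::vector<bool> hazards; // hazards[i]: op conflicts with partition i
};

// Conservative marking: an op with opaque side effects (e.g. one touching a
// global) is marked as hazarding against every existing partition, so it is
// pinned in place relative to all of them.
void markAllHazards(OpInfo &opInfo, std::size_t numBuilders) {
  opInfo.hazards.assign(numBuilders, true);
}
```

This trades parallelism for safety: such an op can only start a fresh partition ordered after all existing ones.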
Force-pushed from be1af16 to c21f08a:

…ecution scheduling

Multi-device programs may produce a circular dependency between partitions, which is an invalid partitioning. One such example is provided as a test case. To fix this, we track partitions that transitively depend (hazards) on a partition, not just per operation.

Signed-off-by: Boian Petkantchin <[email protected]>
// If we are cloning we care only about users that are a part of the
// candidate partition.
opHazards = &opHazardsInCandidatePartition;
for (auto user : op.getUsers()) {
  if (builders[partitionOrdinal]->ops.contains(user))
    opHazardsInCandidatePartition |= opInfos[user].hazards;
}
Actually, here we would need to walk further down the users if a user is also cloned into the partition. This would be useful if we have a block of cloneable ops. If left as is, other than the inefficiency, I don't think it will produce an invalid partitioning.
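The deeper walk suggested here could look roughly like the following worklist sketch (hypothetical names, simplified away from MLIR types): instead of taking only the direct users' hazards, keep following users that are themselves part of the candidate partition and fold in their hazards too.

```cpp
#include <deque>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified model: ops are ints, `users` maps an op to its
// users, `inPartition` holds ops already placed (or cloned) into the
// candidate partition, `hazards[op]` is that op's hazard set (partition
// ordinals it conflicts with).
using Op = int;

std::set<int> transitiveUserHazards(
    Op op, const std::map<Op, std::vector<Op>> &users,
    const std::set<Op> &inPartition,
    const std::map<Op, std::set<int>> &hazards) {
  std::set<int> result;
  std::set<Op> visited;
  std::deque<Op> worklist;
  if (auto it = users.find(op); it != users.end())
    worklist.assign(it->second.begin(), it->second.end());
  while (!worklist.empty()) {
    Op user = worklist.front();
    worklist.pop_front();
    // Only users inside the candidate partition matter; skip revisits.
    if (!inPartition.count(user) || !visited.insert(user).second)
      continue;
    if (auto it = hazards.find(user); it != hazards.end())
      result.insert(it->second.begin(), it->second.end());
    // The extra step the comment suggests: the user may itself be cloned,
    // so its own users' hazards also apply.
    if (auto it = users.find(user); it != users.end())
      worklist.insert(worklist.end(), it->second.begin(), it->second.end());
  }
  return result;
}
```

The visited set keeps the walk linear in the size of the cloneable block.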
for (auto opHazardOrdinal : opHazards->set_bits()) {
  if (partitionOrdinal < opHazardOrdinal)
    // Reject partition ordering that would require partition sorting.
Use {} around multi-line if statements.
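The rejection check quoted above boils down to an ordering rule, sketched here in isolation (hypothetical simplified signature, with the braces the reviewer asks for): an op may join a candidate partition only if none of its hazard partitions come after the candidate, since that would force partition reordering.

```cpp
#include <set>

// Simplified: partitions are kept in execution order by ordinal. Placement
// into `candidateOrdinal` is rejected if any hazard points at a
// higher-numbered partition, because honoring that hazard would require
// sorting partitions (and could introduce a circular dependency).
bool canPlaceInPartition(int candidateOrdinal, const std::set<int> &opHazards) {
  for (int hazardOrdinal : opHazards) {
    if (candidateOrdinal < hazardOrdinal) {
      // Reject partition ordering that would require partition sorting.
      return false;
    }
  }
  return true;
}
```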
Multi-device programs may produce a circular dependency between partitions, which is an invalid partitioning. One such example is provided as a test case.

To fix this, we track the partitions that transitively depend (hazard) on a partition, not just per-operation hazards.

Signed-off-by: Boian Petkantchin <[email protected]>
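The core idea of the fix described above can be sketched as follows (a minimal sketch with hypothetical names, not IREE's actual data structures): each partition accumulates the hazards of every op placed in it, including the hazard sets of the partitions those hazards point at, so the dependency relation between partitions is transitive and stays acyclic by construction.

```cpp
#include <set>
#include <vector>

// Hypothetical sketch: a partition records the ordinals of all partitions
// it (transitively) depends on.
struct Partition {
  std::set<int> hazards;
};

// Place an op with hazard set `opHazards` into partition `ordinal`,
// folding the op's hazards, and their partitions' hazards, into the
// partition's transitive set.
void placeOp(std::vector<Partition> &partitions, int ordinal,
             const std::set<int> &opHazards) {
  Partition &p = partitions[ordinal];
  p.hazards.insert(opHazards.begin(), opHazards.end());
  // Transitive closure step: absorb the hazards of each hazard partition,
  // so a later placement can see the full dependency chain and never make
  // two partitions mutually dependent.
  for (int h : opHazards) {
    p.hazards.insert(partitions[h].hazards.begin(),
                     partitions[h].hazards.end());
  }
}
```

With per-op hazards alone, partition A could hazard on B while an op later added to B hazards on A; folding hazards per partition makes that conflict visible at placement time.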