ElideAsyncCopiesPass refactoring for SCF/transfer support. #22739
This adds SCF support for tracking elidable copies across control flow, hal.tensor.import consume handling, and topology-aware transfer elision that does not fight with itself.

For SCF, the existing CF/block argument analysis is extended to track values captured by regions and carried by the RegionBranchOpInterface. We can detect when a value is passed in as a loop initial value, updated each trip, and finally returned.

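
As a rough illustration of the loop-carried pattern described above (passed in as an initial value, updated each trip, finally returned), here is a minimal Python sketch. It is a toy model, not the pass's analysis and not the MLIR API: the `Loop` class, `carried_chains`, `init_is_handed_off`, and the use-count dictionary are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    """Toy stand-in for a loop op with carried values (hypothetical, not MLIR)."""
    inits: list[str]      # values passed in as initial loop-carried operands
    iter_args: list[str]  # region arguments visible inside the loop body
    yields: list[str]     # values yielded at the end of each trip
    results: list[str]    # values produced by the loop op once it exits


def carried_chains(loop: Loop) -> list[dict[str, str]]:
    """Pair up init -> iter_arg -> yield -> result by carried position."""
    return [
        {"init": i, "iter_arg": a, "yield": y, "result": r}
        for i, a, y, r in zip(loop.inits, loop.iter_args, loop.yields, loop.results)
    ]


def init_is_handed_off(chain: dict[str, str], uses_after_loop: dict[str, int]) -> bool:
    """Assumption for this sketch: if nothing reads the init value after the
    loop, the loop may update it in place and a defensive copy is not needed."""
    return uses_after_loop.get(chain["init"], 0) == 0


loop = Loop(inits=["%init"], iter_args=["%arg"], yields=["%next"], results=["%res"])
chains = carried_chains(loop)
```

With `uses_after_loop = {"%init": 0}` the chain counts as handed off to the loop; with any later read of `%init` it does not, and a copy would have to stay.
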
hal.tensor.import has had the consume attribute on it for ages but it was never used. Now we can treat imported tensors with it set as having value semantics (as if we had originated the tensors internally) and can better elide copies by potentially performing in-place operations on the imported tensor (the COW pass will still insert the copies defensively, but ElideAsyncCopiesPass can now mostly remove them).

The existing ElideAsyncTransfersPass was added to try to support topology-aware transfer elision but tended to fight with COW/copy elision: if COW introduced copies after it ran, we'd never try to remove them. Now we are back to the straightforward flow: introduce copies for correctness -> elide copies that are not required. The new elide_async_copies_topology.mlir test covers more than our old transfer test did (including bugs I found that the old pass didn't handle correctly).
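
The "introduce copies for correctness -> elide copies that are not required" flow can be pictured with a toy two-phase rewrite over a flat op list. This is only a sketch under loud assumptions: the dict-based op format, `insert_defensive_copies`, and `elide_copies` are hypothetical and far simpler than the real COW and ElideAsyncCopiesPass logic, and the operand is assumed to have value semantics (as with a consumed import).

```python
def insert_defensive_copies(ops):
    """Phase 1 (toy COW): clone the operand of every op that mutates in place."""
    out, counter = [], 0
    for op in ops:
        if op["mutates"]:
            clone = f'{op["operand"]}_clone{counter}'
            counter += 1
            out.append({"name": "clone", "operand": op["operand"],
                        "result": clone, "mutates": False})
            out.append({**op, "operand": clone})
        else:
            out.append(op)
    return out


def elide_copies(ops):
    """Phase 2 (toy elision): drop a clone whose source is never read later."""
    out, renames = [], {}
    for index, op in enumerate(ops):
        operand = renames.get(op["operand"], op["operand"])
        if op["name"] == "clone":
            used_later = any(later["operand"] == operand for later in ops[index + 1:])
            if not used_later:
                # Forward the source to the clone's user and skip the clone.
                renames[op["result"]] = operand
                continue
        out.append({**op, "operand": operand})
    return out


# "a" is mutated and never read again: the defensive clone can be removed.
last_use = [{"name": "fill", "operand": "a", "result": "b", "mutates": True},
            {"name": "read", "operand": "b", "result": "c", "mutates": False}]
# "a" is read after the mutation: the clone has to stay.
read_after = [{"name": "fill", "operand": "a", "result": "b", "mutates": True},
              {"name": "read", "operand": "a", "result": "c", "mutates": False}]

elided = elide_copies(insert_defensive_copies(last_use))
kept = elide_copies(insert_defensive_copies(read_after))
```

The point of the ordering is that the copy-insertion phase can stay conservative, since anything it adds unnecessarily is still visible to the elision phase that runs afterwards.
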
Part of #16168 PR sequence (6/6).