@benvanik
Collaborator

This adds SCF support for tracking elidable copies across control flow, hal.tensor.import consume handling, and topology-aware transfer elision that does not fight with itself.

For SCF the existing CF/block argument analysis is extended to track values captured by regions and carried by the RegionBranchOpInterface. We can detect when a value is passed in as a loop initial value, updated each trip, and finally returned.
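To make the loop-carried case concrete, here is a toy Python model of the idea, not the actual analysis: `ForLoop`, `tied_groups`, and `updated_in_place` are hypothetical names invented for this sketch, and the body is reduced to a map from each in-place update result to its operand. It only shows how an initial value, its region argument, the per-trip yield, and the loop result can be seen as one logical value.

```python
from dataclasses import dataclass


@dataclass
class ForLoop:
    """Hypothetical stand-in for an scf.for-like op with loop-carried values."""
    inits: list[str]       # values passed in as loop initial values
    block_args: list[str]  # region arguments visible inside the body
    yields: list[str]      # values yielded at the end of each trip
    results: list[str]     # values returned by the loop op


def tied_groups(loop: ForLoop) -> list[set[str]]:
    """Group the names that carry the same logical value through the loop."""
    return [set(group) for group in
            zip(loop.inits, loop.block_args, loop.yields, loop.results)]


def updated_in_place(loop: ForLoop, index: int,
                     body_updates: dict[str, str]) -> bool:
    """Return True if the yielded value traces back to its own region argument.

    body_updates maps the result of each in-place update in the body to the
    value it updates (an assumption of this toy model).
    """
    value = loop.yields[index]
    seen = set()
    while value in body_updates and value not in seen:
        seen.add(value)
        value = body_updates[value]
    return value == loop.block_args[index]


loop = ForLoop(inits=["init"], block_args=["arg"],
               yields=["next"], results=["out"])
carried = updated_in_place(loop, 0, {"next": "tmp", "tmp": "arg"})
```

When the yielded value is derived from an unrelated value instead, `updated_in_place` returns `False`, and a conservative analysis would keep the copies for that slot.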

hal.tensor.import has had the consume attribute on it for ages but it was never used. Now we can treat imported tensors with it set as having value semantics (as if we had originated the tensors internally) and can better elide copies by potentially performing in-place operations on the imported tensor. The COW pass will still insert the copies defensively, but ElideAsyncCopiesPass can now remove most of them.

The existing ElideAsyncTransfersPass was added to try to support topology-aware transfer elision but tended to fight with COW/copy elision: if COW introduced copies after it ran, we'd never try to remove them. Now we are back to the straightforward flow of introducing copies for correctness and then eliding the copies that are not required. The new elide_async_copies_topology.mlir test covers more than our old transfer test did (including bugs I found that the old pass didn't handle correctly).
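The "introduce copies for correctness, then elide" flow and the effect of consume can be sketched with a toy Python model. This is an illustration under assumptions, not the IREE implementation: the `Op` record, the three op kinds, and both function names are hypothetical, and the elision rule is reduced to "the source has value semantics and is not used again".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    """Hypothetical op record: kind is "import", "copy", or "update"."""
    kind: str
    result: str
    operand: Optional[str] = None
    consume: bool = False  # only meaningful on "import" ops


def insert_defensive_copies(ops: list[Op]) -> list[Op]:
    """Phase 1: make every in-place update operate on a fresh copy."""
    out: list[Op] = []
    for op in ops:
        if op.kind == "update":
            tmp = f"{op.operand}_copy{len(out)}"  # unique per insertion point
            out.append(Op("copy", tmp, op.operand))
            out.append(Op("update", op.result, tmp))
        else:
            out.append(op)
    return out


def elide_copies(ops: list[Op]) -> list[Op]:
    """Phase 2: drop copies whose source is owned and has no later use."""
    # Everything except a non-consumed import is treated as having value semantics.
    owned = {op.result for op in ops if op.kind != "import" or op.consume}
    rename: dict[Optional[str], Optional[str]] = {}
    out: list[Op] = []
    for i, op in enumerate(ops):
        operand = rename.get(op.operand, op.operand)
        if op.kind == "copy":
            used_later = any(
                rename.get(later.operand, later.operand) == operand
                for later in ops[i + 1:])
            if operand in owned and not used_later:
                rename[op.result] = operand  # users now read the source directly
                continue
        out.append(Op(op.kind, op.result, operand, op.consume))
    return out


program = [Op("import", "a", consume=True), Op("update", "b", "a")]
optimized = elide_copies(insert_defensive_copies(program))
```

In this model the defensive copy in `program` is removed because `a` is a consumed import with no later use, so the update runs in place. With `consume=False` the copy stays, and with two updates of `a` only the last one can reuse the buffer.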

Part of #16168 PR sequence (6/6).

This is a fairly large capability upgrade for existing programs
that used HAL fences to interop with externally allocated or
managed buffers (kvcaches, etc): we can now track the timeline
across the HAL ops thanks to the TimelineAwareOpInterface and
fence-like support (we don't use HAL fences, we just say that
there are fence-like objects that, unlike timepoints, are not
defined by use-def chains).

SCF support is also added for several cases where we can
(today) easily identify coverage - e.g., scf.if/scf.index_switch
are the easy cases, while we have some initial conservative
handling of scf.for/scf.while. Nesting is supported and we have
test coverage for it but there may still be cases that thwart
the analysis and result in extra joins/awaits.

Part of #16168 PR sequence (5/6).
@benvanik benvanik force-pushed the users/benvanik/16168-5 branch from 07a4db6 to 3263b58 Compare November 24, 2025 16:54