Summary
When using MPIX Stream enqueue APIs with MPICH ch4:ucx in a PyTorch ProcessGroup backend, we observe correctness failures for larger allreduce sizes unless we force a cudaStreamSynchronize after each MPIX enqueue. This strongly suggests the “completion” seen by the application (via a CUDA event recorded after enqueue) can precede the actual GPU work’s visibility/completion on the provided CUDA stream. We need an event-compatible completion mechanism so frameworks can remain non-blocking and preserve overlap.
Scenario
Use a CUDA stream S to:

- enqueue an in-place allreduce via MPIX_Allreduce_enqueue(..., stream_comm), with stream_comm created from S
- record a CUDA event E on S immediately after the enqueue
- later rely on E (or an event-wait) for completion instead of cudaStreamSynchronize(S) (see the sketch below)
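
For concreteness, here is a minimal sketch of the pattern (error checking omitted; buf and count are placeholders for the application's GPU buffer, and the stream-communicator setup uses the MPIX_Stream info hints as we understand them):

```c
#include <mpi.h>
#include <cuda_runtime.h>

/* Sketch only: buf/count stand in for the application's device buffer. */
void allreduce_on_stream(float *buf, int count)
{
    cudaStream_t S;
    cudaStreamCreate(&S);

    /* Bind CUDA stream S to an MPIX stream, then derive a stream communicator. */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "type", "cudaStream");
    MPIX_Info_set_hex(info, "value", &S, sizeof(S));

    MPIX_Stream mpi_stream;
    MPIX_Stream_create(info, &mpi_stream);
    MPI_Info_free(&info);

    MPI_Comm stream_comm;
    MPIX_Stream_comm_create(MPI_COMM_WORLD, mpi_stream, &stream_comm);

    /* Enqueue the in-place allreduce onto S through the stream communicator. */
    MPIX_Allreduce_enqueue(MPI_IN_PLACE, buf, count, MPI_FLOAT, MPI_SUM, stream_comm);

    /* Record an event on S immediately after the enqueue ... */
    cudaEvent_t E;
    cudaEventCreateWithFlags(&E, cudaEventDisableTiming);
    cudaEventRecord(E, S);

    /* ... and rely on E for completion.  Today this is where we are forced
       to call cudaStreamSynchronize(S) instead to get correct results for
       larger message sizes. */
    cudaEventSynchronize(E);
}
```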
Proposal
A CUDA event recorded after enqueue on the same stream should become “ready” only after the underlying GPU work for the collective is finished and data is visible to subsequent consumers on that stream. This allows non-blocking frameworks to:
- return immediately
- compose precise cross-stream dependencies via cudaStreamWaitEvent (illustrated below)
- avoid stream-wide synchronizations
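
As an illustration of the composition we want, the following sketch chains downstream GPU work off the completion event E with no host-side blocking (consumer_stream, host_out, and the memcpy are hypothetical stand-ins for whatever consumes the reduced data):

```c
#include <cuda_runtime.h>

/* E was recorded on stream S right after MPIX_Allreduce_enqueue (see above). */
void chain_consumer(cudaStream_t consumer_stream, cudaEvent_t E,
                    float *host_out, const float *buf, size_t bytes)
{
    /* Nothing enqueued on consumer_stream after this point runs until E is
       ready.  This is only correct if E becomes ready no earlier than the
       allreduce's GPU work finishing and its results being visible. */
    cudaStreamWaitEvent(consumer_stream, E, 0);

    /* Stand-in for downstream GPU work that reads the allreduce result. */
    cudaMemcpyAsync(host_out, buf, bytes, cudaMemcpyDeviceToHost, consumer_stream);
}
```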
Why this matters
PyTorch, NCCL, and other GPU-accelerated frameworks rely on CUDA events to coordinate completion without stalling streams or the CPU.
For process-group backends, we need the comm library to either:
- execute the GPU work on the provided CUDA stream and let us record a reliable end-event, or
- provide an API to record/obtain a completion event itself (one hypothetical shape is sketched below).
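
Purely as an illustration of the second option, and not an existing MPICH API, such an interface might look like:

```c
#include <mpi.h>
#include <cuda_runtime.h>

/* HYPOTHETICAL -- does not exist in MPICH today.  One possible shape for an
   enqueue variant that hands back a CUDA event which becomes ready only once
   the collective's GPU work has completed on the stream communicator's stream. */
int MPIX_Allreduce_enqueue_event(const void *sendbuf, void *recvbuf, int count,
                                 MPI_Datatype datatype, MPI_Op op, MPI_Comm stream_comm,
                                 cudaEvent_t *completion_event);
```

With either option, a ProcessGroup backend could wrap the event in its Work object and implement wait()/isCompleted() via cudaStreamWaitEvent/cudaEventQuery instead of blocking the stream or the host.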
Is this doable? Please let me know your thoughts!