Currently we busy-wait in a loop that alternates `MPI.Iprobe` and `yield()`, which consumes CPU cycles unnecessarily while no message is pending.
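To illustrate the pattern being described, here is a minimal sketch that does not depend on MPI. The `busy_wait` function and the `probe` closure are hypothetical names: `probe` stands in for a non-blocking check such as `MPI.Iprobe`, and the "message" is simulated with a timer. This is not the actual implementation, only the shape of the loop.

```julia
# Hypothetical stand-in for the polling loop. `probe` plays the role of a
# non-blocking check (such as MPI.Iprobe) and must return a Bool.
function busy_wait(probe)
    spins = 0
    while !probe()
        spins += 1
        yield()  # lets other Julia tasks run, but the OS thread never sleeps
    end
    return spins
end

# Simulate a message that "arrives" after roughly 50 ms.
arrived = Ref(false)
producer = @async (sleep(0.05); arrived[] = true)

spins = busy_wait(() -> arrived[])
wait(producer)
println("polled ", spins, " times before the message arrived")
```

The poll count depends on the machine, but the loop keeps re-checking for the entire wait, which is the wasted work in question: `yield()` only hands control to other Julia tasks, so the thread stays busy when nothing else is runnable.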