Description
Problem Description
When training Grok1 with Megatron-LM + TE on 8 nodes (8N, 64 ranks) of MI355 GPUs, the AITER JIT alone takes 30 minutes to compile the aiter attention kernel. The timeline in the log shows that compilation proceeds one node at a time, serialized by a file lock.
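As a rough way to quantify this from a log like the one that follows, the sketch below counts, per module, how many builds completed, the summed reported build cost, and how many "waiting for baton release" messages were printed. The function name `summarize` and the returned dictionary layout are hypothetical and not part of aiter; the regular expressions only assume the `[aiter]` message formats visible in this log.

```python
import re
from collections import defaultdict

# Patterns mirror the "[aiter]" messages as they appear in the log.
FINISH = re.compile(r"\[aiter\] finish build \[(\w+)\], cost ([\d.]+)s")
WAIT = re.compile(r"\[aiter\] waiting for baton release at \S*/lock_(\w+)")


def summarize(lines):
    """Hypothetical helper: per-module build and lock-wait counts.

    Returns {module: {"builds": int, "build_seconds": float, "waits": int}}.
    build_seconds is the sum of reported costs; builds that overlap in time
    are counted separately, so it is not wall-clock time.
    """
    stats = defaultdict(lambda: {"builds": 0, "build_seconds": 0.0, "waits": 0})
    for line in lines:
        m = FINISH.search(line)
        if m:
            stats[m.group(1)]["builds"] += 1
            stats[m.group(1)]["build_seconds"] += float(m.group(2))
            continue
        m = WAIT.search(line)
        if m:
            stats[m.group(1)]["waits"] += 1
    return dict(stats)


# Three lines copied from the log, used as a small sample.
sample = [
    "[aiter] waiting for baton release at /opt/venv/lib/python3.10/"
    "site-packages/aiter/jit/build/lock_module_rope_general_fwd",
    "[aiter] finish build [module_rope_general_fwd], cost 98.55709770s",
    "[aiter] finish build [module_rope_general_fwd], cost 98.74455975s",
]
result = summarize(sample)
```

Running `summarize(open(path))` over the full training log (with `path` pointing at wherever the log was saved) would show how many times the same module was rebuilt across nodes and how many ranks sat waiting on the lock.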
`[20251011 03:21:48][rank-0/64][DEBUG] [-----------utils.py:364] : [before the start of training step] datetime: 2025-10-11 03:21:48
[rank45]:[W1011 03:21:49.825479906 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank19]:[W1011 03:21:49.403827295 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank36]:[W1011 03:21:49.195641327 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank48]:[W1011 03:21:49.419414029 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank57]:[W1011 03:21:49.353152148 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank49]:[W1011 03:21:49.420466645 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank32]:[W1011 03:21:49.198830156 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank41]:[W1011 03:21:49.835922768 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank26]:[W1011 03:21:49.412158512 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank27]:[W1011 03:21:49.413206989 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank58]:[W1011 03:21:49.362519055 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank44]:[W1011 03:21:49.843861914 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank35]:[W1011 03:21:49.210528386 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank39]:[W1011 03:21:49.211385598 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank40]:[W1011 03:21:49.847988069 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank37]:[W1011 03:21:49.213424502 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank61]:[W1011 03:21:49.370906902 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank47]:[W1011 03:21:49.849648183 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank23]:[W1011 03:21:49.426366925 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank28]:[W1011 03:21:49.425293983 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank33]:[W1011 03:21:49.215084564 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank38]:[W1011 03:21:49.215271872 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank30]:[W1011 03:21:49.425788437 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank31]:[W1011 03:21:49.425795728 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank34]:[W1011 03:21:49.215498017 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank21]:[W1011 03:21:49.427881667 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank20]:[W1011 03:21:49.428033673 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank56]:[W1011 03:21:49.373584256 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank24]:[W1011 03:21:49.427589342 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank43]:[W1011 03:21:49.853419852 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank25]:[W1011 03:21:49.428863194 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank52]:[W1011 03:21:49.441812871 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank60]:[W1011 03:21:49.375764513 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank59]:[W1011 03:21:49.375816370 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank29]:[W1011 03:21:49.429712326 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank63]:[W1011 03:21:49.376999382 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank55]:[W1011 03:21:49.443722372 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank42]:[W1011 03:21:49.855750972 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank22]:[W1011 03:21:49.432856505 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank50]:[W1011 03:21:49.444312169 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank62]:[W1011 03:21:49.378643756 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank54]:[W1011 03:21:49.447710355 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank53]:[W1011 03:21:49.448056209 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank18]:[W1011 03:21:49.436574965 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank51]:[W1011 03:21:49.448241805 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank46]:[W1011 03:21:49.860450198 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank17]:[W1011 03:21:49.438012473 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank16]:[W1011 03:21:49.443368158 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] start build [module_rope_general_fwd] under /opt/venv/lib/python3.10/site-packages/aiter/jit/build/module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] start build [module_rope_general_fwd] under /opt/venv/lib/python3.10/site-packages/aiter/jit/build/module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[20251011 03:22:05][rank-7/64][DEBUG] [--hipify_python.py:1346] : Successfully preprocessed all matching files.
[20251011 03:22:05][rank-14/64][DEBUG] [--hipify_python.py:1346] : Successfully preprocessed all matching files.
[aiter] finish build [module_rope_general_fwd], cost 98.55709770s
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] finish build [module_rope_general_fwd], cost 98.74455975s
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[rank13]:[W1011 03:24:43.122697467 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank8]:[W1011 03:24:43.151166924 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank9]:[W1011 03:24:43.168776174 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank11]:[W1011 03:24:43.170256062 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank10]:[W1011 03:24:43.176922552 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank14]:[W1011 03:24:43.254269222 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank15]:[W1011 03:24:43.269884661 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank12]:[W1011 03:24:44.324608746 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank5]:[W1011 03:24:44.717754882 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank2]:[W1011 03:24:44.803421008 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank3]:[W1011 03:24:44.865496693 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank7]:[W1011 03:24:44.920774080 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank4]:[W1011 03:24:44.926722218 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank0]:[W1011 03:24:44.935994704 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank6]:[W1011 03:24:44.132406917 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank1]:[W1011 03:24:44.247031083 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] start build [module_rope_general_fwd] under /opt/venv/lib/python3.10/site-packages/aiter/jit/build/module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] start build [module_rope_general_fwd] under /opt/venv/lib/python3.10/site-packages/aiter/jit/build/module_rope_general_fwd
[20251011 03:24:59][rank-26/64][DEBUG] [--hipify_python.py:1346] : Successfully preprocessed all matching files.
[20251011 03:25:00][rank-23/64][DEBUG] [--hipify_python.py:1346] : Successfully preprocessed all matching files.
[aiter] finish build [module_rope_general_fwd], cost 98.72654024s
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] finish build [module_rope_general_fwd], cost 99.13293187s
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[rank30]:[W1011 03:27:37.705656762 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank29]:[W1011 03:27:37.726992353 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank26]:[W1011 03:27:37.741880690 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank27]:[W1011 03:27:37.745839541 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank31]:[W1011 03:27:37.747819188 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank28]:[W1011 03:27:37.779992898 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank24]:[W1011 03:27:37.800243115 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank25]:[W1011 03:27:37.827435753 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank17]:[W1011 03:27:38.912193592 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank22]:[W1011 03:27:38.952662744 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank20]:[W1011 03:27:38.967104982 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank21]:[W1011 03:27:38.053293579 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank16]:[W1011 03:27:38.061201705 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank11]:[W1011 03:27:38.977364139 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank10]:[W1011 03:27:38.977391689 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank8]:[W1011 03:27:38.977423206 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank13]:[W1011 03:27:38.977429025 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank9]:[W1011 03:27:38.977426431 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank15]:[W1011 03:27:38.977462805 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank14]:[W1011 03:27:38.977490176 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank12]:[W1011 03:27:38.977582793 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank23]:[W1011 03:27:38.135200916 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank18]:[W1011 03:27:38.213807862 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank19]:[W1011 03:27:38.223750555 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] start build [module_rope_general_fwd] under /opt/venv/lib/python3.10/site-packages/aiter/jit/build/module_rope_general_fwd
[rank2]:[W1011 03:27:40.432805324 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank0]:[W1011 03:27:40.432836130 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank1]:[W1011 03:27:40.432854086 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank3]:[W1011 03:27:40.432861858 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank4]:[W1011 03:27:40.432996217 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank7]:[W1011 03:27:40.433006562 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank6]:[W1011 03:27:40.433007624 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank5]:[W1011 03:27:40.433018430 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] start build [module_rope_general_fwd] under /opt/venv/lib/python3.10/site-packages/aiter/jit/build/module_rope_general_fwd
[20251011 03:27:52][rank-42/64][DEBUG] [--hipify_python.py:1346] : Successfully preprocessed all matching files.
[20251011 03:27:54][rank-36/64][DEBUG] [--hipify_python.py:1346] : Successfully preprocessed all matching files.
[aiter] finish build [module_rope_general_fwd], cost 98.41263403s
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] finish build [module_rope_general_fwd], cost 98.63961805s
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[rank45]:[W1011 03:30:30.815244929 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank40]:[W1011 03:30:30.822367960 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank47]:[W1011 03:30:30.826863306 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank43]:[W1011 03:30:30.860405698 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank42]:[W1011 03:30:30.875114867 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank46]:[W1011 03:30:30.889556169 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank41]:[W1011 03:30:30.916744090 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank44]:[W1011 03:30:30.946847093 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank32]:[W1011 03:30:32.399770578 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank35]:[W1011 03:30:32.432044261 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank34]:[W1011 03:30:32.461039575 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank33]:[W1011 03:30:32.475086714 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank39]:[W1011 03:30:32.485691281 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank38]:[W1011 03:30:32.532511724 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank37]:[W1011 03:30:32.592309050 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank36]:[W1011 03:30:32.599835765 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] start build [module_rope_general_fwd] under /opt/venv/lib/python3.10/site-packages/aiter/jit/build/module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] start build [module_rope_general_fwd] under /opt/venv/lib/python3.10/site-packages/aiter/jit/build/module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_fwd
[20251011 03:30:47][rank-60/64][DEBUG] [--hipify_python.py:1346] : Successfully preprocessed all matching files.
[20251011 03:30:49][rank-48/64][DEBUG] [--hipify_python.py:1346] : Successfully preprocessed all matching files.
[aiter] finish build [module_rope_general_fwd], cost 98.69664127s
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] finish build [module_rope_general_fwd], cost 98.52099328s
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_fwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[rank57]:[W1011 03:33:24.175809335 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank56]:[W1011 03:33:24.179584404 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank58]:[W1011 03:33:24.271360441 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank60]:[W1011 03:33:24.294050332 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank59]:[W1011 03:33:24.294428432 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank61]:[W1011 03:33:25.331219829 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank63]:[W1011 03:33:25.339720173 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank62]:[W1011 03:33:25.450298006 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank49]:[W1011 03:33:27.987642100 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank50]:[W1011 03:33:27.001556164 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank48]:[W1011 03:33:27.019513552 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank51]:[W1011 03:33:27.096777424 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank55]:[W1011 03:33:27.178333440 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank54]:[W1011 03:33:27.208795094 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank53]:[W1011 03:33:27.229690191 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[rank52]:[W1011 03:33:27.244682449 ProcessGroupNCCL.cpp:3996] Warning: An unbatched P2P op (send/recv) was called on this ProcessGroup with size 4. In lazy initialization mode, this will result in a new 2-rank NCCL communicator to be created. (function operator())
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] start build [module_rope_general_bwd] under /opt/venv/lib/python3.10/site-packages/aiter/jit/build/module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] start build [module_rope_general_bwd] under /opt/venv/lib/python3.10/site-packages/aiter/jit/build/module_rope_general_bwd
[20251011 03:34:54][rank-50/64][DEBUG] [--hipify_python.py:1346] : Successfully preprocessed all matching files.
[20251011 03:34:54][rank-56/64][DEBUG] [--hipify_python.py:1346] : Successfully preprocessed all matching files.
[aiter] finish build [module_rope_general_bwd], cost 105.03271872s
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] finish build [module_rope_general_bwd], cost 105.10509183s
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] start build [module_rope_general_bwd] under /opt/venv/lib/python3.10/site-packages/aiter/jit/build/module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[aiter] start build [module_rope_general_bwd] under /opt/venv/lib/python3.10/site-packages/aiter/jit/build/module_rope_general_bwd
[aiter] waiting for baton release at /opt/venv/lib/python3.10/site-packages/aiter/jit/build/lock_module_rope_general_bwd
[20251011 03:38:24][rank-47/64][DEBUG] [--hipify_python.py:1346] : Successfully preprocessed all matching files.
[20251011 03:38:37][rank-32/64][DEBUG] [--hipify_python.py:1346] : Successfully preprocessed all matching files.
[aiter] finish build [module_rope_general_bwd], cost 91.93398986s
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
[aiter] type hints mismatch, override to --> rope_bwd_impl(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: int, arg4: bool, arg5: bool) -> None
`
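To quantify how much time goes into the serialized JIT builds, the `[aiter] finish build [...]` lines in a log like the one above can be tallied per module. The following is only a minimal sketch: the regular expression is inferred from the log lines shown in this issue, and `summarize_builds` and `SAMPLE` are hypothetical names introduced for illustration, not part of AITER.

```python
import re
from collections import defaultdict

# Pattern inferred from log lines in this issue, e.g.
# [aiter] finish build [module_rope_general_fwd], cost 98.41263403s
FINISH_RE = re.compile(
    r"\[aiter\] finish build \[(?P<module>[^\]]+)\], cost (?P<secs>[0-9.]+)s"
)

# Two lines copied from the log above, used as a small demo input.
SAMPLE = """\
[aiter] finish build [module_rope_general_fwd], cost 98.41263403s
[aiter] finish build [module_rope_general_bwd], cost 105.03271872s
"""


def summarize_builds(log_text):
    """Return {module name: [build durations in seconds]} found in log_text."""
    builds = defaultdict(list)
    for match in FINISH_RE.finditer(log_text):
        builds[match.group("module")].append(float(match.group("secs")))
    return dict(builds)


if __name__ == "__main__":
    # Print the number of builds and the summed build cost per module.
    for module, costs in summarize_builds(SAMPLE).items():
        print(f"{module}: {len(costs)} builds, {sum(costs):.1f}s summed cost")
```

The summed cost overstates wall-clock time when builds overlap (the log shows two builds of the same module finishing close together), so it is best read alongside the timestamps of the surrounding NCCL warnings.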
Operating System
Ubuntu 22.04
CPU
AMD EPYC 9575F
GPU
AMD MI355X
ROCm Version
ROCm 7.0
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response