
Conversation

rockerBOO
Contributor

This can improve cases where we move multiple tensors to the GPU before processing them.

We need to synchronize (torch.cuda.synchronize) before processing them to be sure all the transfers have completed.

This code is mostly a prototype that converts the transfers to use non_blocking. It still needs testing and validation, because without synchronization it will appear to "work" while reading tensors whose copies may not have finished.

With this I am getting 8-10% faster training throughput.

https://docs.pytorch.org/tutorials/intermediate/pinmem_nonblock.html
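For illustration, here is a minimal sketch of the pattern described above (the function name and shapes are just examples, not code from this PR): pin the CPU tensors, issue non_blocking copies, then synchronize once before compute.

```python
import torch

def move_batch_to_device(tensors, device):
    # Pin the CPU tensors so the copies can be truly asynchronous,
    # then queue the transfers with non_blocking=True.
    moved = [t.pin_memory().to(device, non_blocking=True) for t in tensors]

    # Synchronize once before any computation that reads the tensors,
    # so every queued copy has finished.
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    return moved

# Example usage
device = torch.device("cuda")
batch = [torch.randn(1024, 1024) for _ in range(4)]
on_gpu = move_batch_to_device(batch, device)
```

The key point is that the synchronization cost is paid once per batch of transfers instead of once per tensor, which is where the throughput gain comes from.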

@kohya-ss
Owner

Thank you, this has the potential to improve overall performance.

Regarding stream synchronization, this project also supports mps and xpu, so I would appreciate it if you could use device_utils.synchronize_device.
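A device-agnostic helper along these lines would dispatch on the device type; this is only a sketch of the idea, and the actual implementation in the project's device_utils may differ:

```python
import torch

def synchronize_device(device: torch.device):
    # Wait for all queued work on the given accelerator to finish.
    # Sketch only: dispatches per backend instead of assuming CUDA.
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    elif device.type == "xpu":
        torch.xpu.synchronize(device)
    elif device.type == "mps":
        torch.mps.synchronize()
```

Calling such a helper instead of torch.cuda.synchronize keeps the non_blocking transfers correct on mps and xpu as well.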
