I just tested FLUX Fine Tuning on Windows (RTX 5090) and Linux (RunPod RTX 5090 and Massed Compute RTX 6000 PRO)
The thing is, on Linux the training speed is at least 25% faster than on Windows.
I am using the Adafactor optimizer and full bf16 training.
How could that be?
No block swap was used in any of the tests, since everything fits into 32 GB of VRAM.
I am using Torch 2.8 and CUDA 12.9, with exactly the same libraries on both platforms.
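
For what it's worth, here is the kind of parity check I can run on both machines to confirm the stacks really match (plain PyTorch introspection, nothing specific to the training scripts):

```python
import torch

# Print the library versions and backend flags that most often differ between installs
print("torch:", torch.__version__)
print("CUDA (torch build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("GPU:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))
print("TF32 matmul allowed:", torch.backends.cuda.matmul.allow_tf32)
print("cudnn.benchmark:", torch.backends.cudnn.benchmark)
```

If any of these differ between the two machines (especially the cuDNN version or the backend flags), that alone could explain a per-step speed gap.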
There wasn't this much of a difference before.
Moreover, I was getting about 8.5 seconds/it on an RTX 3090 Ti on Windows before, and now the exact same config runs at around 11 seconds/it on Windows.
How do you think we can debug this? What could be the culprit? @kohya-ss
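
One way I could narrow it down: run a raw bf16 matmul micro-benchmark outside the training script on each machine. If the ~25% gap shows up there too, it points at the driver/kernel level (Windows' WDDM driver model is a common suspect) rather than at the training code or data loading. This is just a generic PyTorch timing sketch, not tied to the actual training config:

```python
import torch

assert torch.cuda.is_available(), "CUDA GPU required"

# Large bf16 matmul -- roughly the kind of work that dominates a transformer training step
n = 8192
a = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
b = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)

# Warmup so clocks and cuBLAS heuristics settle before timing
for _ in range(10):
    torch.matmul(a, b)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

iters = 100
start.record()
for _ in range(iters):
    torch.matmul(a, b)
end.record()
torch.cuda.synchronize()

ms_per_iter = start.elapsed_time(end) / iters  # elapsed_time() returns milliseconds
tflops = (2 * n**3) / (ms_per_iter / 1e3) / 1e12
print(f"{ms_per_iter:.3f} ms/iter, ~{tflops:.1f} TFLOPS (bf16 matmul)")
```

If this number comes out identical on both OSes while the training step does not, the next suspects would be data loading, latent/text-encoder caching, or per-step host overhead.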