Conversation

@urlesistiana commented Sep 30, 2025

This PR adds two features to speed up Lumina 2 training:

Checkpointing + torch.compile on sub-modules

Currently, when using gradient checkpointing, torch.compile will skip all frames (modules) inside the checkpointed model, so those sub-modules have to be compiled first. (I don't know torch.compile very well, but this seems to be expected behavior.)

I added an env var "SDSCRIPTS_SELECTIVE_TORCH_COMPILE"; setting it to 1 will compile those sub-modules.
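
For reference, a minimal sketch of what the env var does (the `layers` attribute and the function name are illustrative, not the actual sd-scripts code):

```python
import os
import torch

def selective_compile(model: torch.nn.Module) -> None:
    # Only active when the env var from this PR is set.
    if os.environ.get("SDSCRIPTS_SELECTIVE_TORCH_COMPILE") != "1":
        return
    # Compile each transformer block individually so the compiled code is
    # still used inside gradient-checkpointed regions instead of being skipped.
    # "layers" is an assumed attribute name; the real Lumina 2 module may differ.
    for i, block in enumerate(model.layers):
        model.layers[i] = torch.compile(block)
```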

In my setup (training a rank 16 LoRA with gradient checkpointing, batch size 6, and resolution 1024), this reduces training time from 9.2 s/it to 6.6 s/it (1.5 s/img to 1.1 s/img).

Compiling only those core modules also significantly reduces compile time, from ~220 s (with "--torch_compile") to ~5 s.

torch.compile + Memory Budget API

In PyTorch 2.4 there is a new feature called the "Memory Budget API", which not only does checkpointing automatically but also recomputes only cheap operations, so it is faster than the traditional checkpointing method.

ref: https://pytorch.org/blog/activation-checkpointing-techniques/

I added an env var "SDSCRIPTS_TORCH_COMPILE_ACTIVATION_MEMORY_BUDGET" to set the budget.
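
A minimal sketch of how the budget is applied, assuming the `torch._functorch.config.activation_memory_budget` knob described in the blog post above (this is experimental, so the exact config path may change):

```python
import os
import torch._functorch.config as functorch_config

def apply_activation_memory_budget() -> None:
    # Budget from the env var added in this PR; expected range is 0.0-1.0.
    budget = os.environ.get("SDSCRIPTS_TORCH_COMPILE_ACTIVATION_MEMORY_BUDGET")
    if budget is None:
        return
    # 1.0 behaves like plain torch.compile (keep all activations);
    # values closer to 0.0 recompute more, like full activation checkpointing.
    functorch_config.activation_memory_budget = float(budget)

# Call once before torch.compile(model); the partitioner then decides what to
# recompute so that peak activation memory stays within the budget.
```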

In my setup (training a rank 16 LoRA with batch size 2, resolution 1024, the budget set to "0.5", without "--gradient_checkpointing" but with "SDSCRIPTS_SELECTIVE_TORCH_COMPILE"), this reduces training time from 3.03 s/it to 1.75 s/it (1.5 s/img to 0.88 s/img), about 70% faster.

Note: Training Lumina 2 without gradient checkpointing will OOM (>24 GB VRAM) even with batch size 1.


Tested on the latest sd3 branch Lumina 2 training: Torch 2.9 nightly (9/29/2025), CUDA 13.0, Python 3.12, Nvidia 4090.

I'm not sure how to expose these settings properly, and I have only tested Lumina 2, so I set them via env vars for now to keep the changes minimal. But "torch.compile + Memory Budget API" seems very powerful and could apply to any model by just changing a global setting.

Opened as a draft for suggestions.

@urlesistiana (Author) commented

Added the "--activation_memory_budget" argument. In theory, any model that can use torch.compile can benefit from this setting.

@urlesistiana marked this pull request as ready for review September 30, 2025 14:08
don't compile funcs with complex ops

simplify FeedForward to avoid "cache line invalidated" error
@kohya-ss (Owner) commented Oct 1, 2025

Thank you, this looks very promising. Please give me some time to investigate gradient checkpointing, torch.compile, and the Memory Budget API.
Did you test it on a Linux environment? Please let me know if you know about compatibility on Windows.

@urlesistiana (Author) commented

Yes, I only tested on Linux. I can't test on Windows because I don't have such a setup, but I expect it should be fine there.

We already have the "--torch_compile" option, which compiles the whole model. For Lumina 2, if the existing --torch_compile works, then "SDSCRIPTS_SELECTIVE_TORCH_COMPILE" should also work, since it only compiles some sub-modules. No breaking changes.

For the global --activation_memory_budget option: if it works, great. If not, just don't use it and it won't affect anything. No breaking changes.

One thing that does concern me: this "Memory Budget API" doesn't have any official documentation. Only the blog post above mentions it, calling it an "experimental feature", and I can't find it in the release notes or feature list, so I don't know whether it is a "Beta", "Prototype", or "Stable" feature. If it is not stable, maybe an env setting would be better.
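
Since it is undocumented, one option (just an idea, not something this PR does) would be to probe for the knob before setting it:

```python
import torch._functorch.config as functorch_config

# Guard against the experimental knob being renamed or removed in a future
# PyTorch release; skip the optimization if it is not available.
if hasattr(functorch_config, "activation_memory_budget"):
    functorch_config.activation_memory_budget = 0.5
else:
    print("activation_memory_budget not available in this PyTorch build; skipping")
```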

@FurkanGozukara commented

How does --torch_compile work during training? Or is this just for inference?

@urlesistiana (Author) commented

@FurkanGozukara Same as inference: it captures and builds graphs, fuses them, and speeds things up.

@FurkanGozukara commented

@urlesistiana looks excellent

So what are the negatives? Will gradient checkpointing no longer be used? Or block swap?

@urlesistiana (Author) commented

No negatives. If it works, great: a free speed-up. If it doesn't, because the model has unsupported or unusual code or operations, just don't use it.

I'm not sure whether block swap works with torch.compile; I haven't tested it.

But it can replace traditional gradient checkpointing with a smarter method that uses more VRAM for more speed. That's a positive.
