Initial NaFlex ViT model and training support #2466


Merged — 34 commits merged into main on Jun 5, 2025
Conversation

rwightman
Collaborator

@rwightman rwightman commented Apr 8, 2025

Working:

  • 'flex' ViT w/ NaFlex position embedding resize, pre-patchified input, and attention padding masks
  • Single-node train.py works with a custom NaFlex data pipeline: a dataset wrapper handles random seq-len & batch-size selection and constrains images to the seq-len budget while keeping aspect ratio (with randomizations)
  • A much faster patch embed kernel resample, torch only, that can be used in forward()
  • A 4-GPU distributed training run completed with decent results
  • NaFlex patch-mode compatible mixup & cutmix and a random erasing implementation
  • Randomization of the patch_size along with seq_len
  • SigLIP-2 NaFlex vision encoder weight port (tested with un-pushed OpenCLIP mods), matches expected results
  • Weight loading / translation for existing ViTs (all but 2 existing vision_transformer.py ViTs load with the use_naflex=True flag in create_model())
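Two of the pieces above — per-sample position-embedding resize and attention padding masks for variable sequence lengths — can be sketched in plain PyTorch. This is a minimal illustration, not the PR's actual implementation; `resize_pos_embed` and `build_padding_mask` are hypothetical helper names:

```python
import torch
import torch.nn.functional as F


def resize_pos_embed(pos_embed: torch.Tensor, new_hw: tuple) -> torch.Tensor:
    """Bilinearly resample a learned (1, H*W, C) pos-embed grid to new_hw."""
    _, n, c = pos_embed.shape
    h = w = int(n ** 0.5)  # assumes a square source grid
    grid = pos_embed.reshape(1, h, w, c).permute(0, 3, 1, 2)  # (1, C, H, W)
    grid = F.interpolate(grid, size=new_hw, mode='bilinear', align_corners=False)
    return grid.permute(0, 2, 3, 1).reshape(1, -1, c)


def build_padding_mask(seq_lens: torch.Tensor, max_len: int) -> torch.Tensor:
    """Boolean (B, L) mask, True where a token is real (not padding)."""
    return torch.arange(max_len)[None, :] < seq_lens[:, None]
```

A (B, L) mask like this can be expanded to (B, 1, 1, L) and passed as `attn_mask` to `F.scaled_dot_product_attention` so attention ignores padded tokens.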

Not tested / not completed:

  • bigger distributed runs needed
  • dataset wrapper for iterable datasets (wds, tfds, iterable hfds) needs to be added
  • more model definitions
  • Integration of naflex data pipeline components into OpenCLIP

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@rwightman rwightman marked this pull request as draft April 8, 2025 04:39
rwightman added 26 commits April 8, 2025 07:59
… add, classic vit weight loading for naflex model
…king loader based patch compatible RandomErasing for NaFlex mode.
…w. Remove subregion mode, not going to be worth it.
… embeds and 'aspect preserving mode' to Flex Embeds. Some more docstrings and typing.
… creating classic vits as naflex. Cleanup, improvements.
rwightman added 2 commits June 4, 2025 17:03
…ndling from train.py onwards. Add docstrings and type annotations (thanks Claude).
@rwightman rwightman marked this pull request as ready for review June 5, 2025 03:56
@stas-sl

stas-sl commented Jun 5, 2025

Hi Ross,

I've been following your progress a bit, as I'm also interested in the FlexiViT/Navit/Naflex architectures. I'm still trying to wrap my head around all the details, but I had a question regarding patch embedding resizing methods.

How essential do you think the PI variant of patch embedding resizing is, compared to the other variants mentioned in the FlexiViT paper (Appendix A.1), such as Vanilla, Token-LN, Image-LN, or Untied?

From my understanding, if you're using pretrained patch embeddings with a new patch size without introducing additional layers or operations in the transformer, then yes, PI resizing seems to work best. However, if you're training from scratch, their figure suggests that simple bilinear resizing performs quite well - except for very large patch sizes (like 48). So I'm wondering: is the added complexity of the PI method really worth it in that case, or can we just use bilinear resizing?

[Figure from the FlexiViT paper (Appendix A.1): accuracy vs. patch size for the different patch-embedding resize methods]

I also noticed you recently added the PatchEmbedInterpolator class, which I assume implements the PI method. Do you think it’s worth supporting other resizing methods (like plain bilinear), or is that already possible and I’m just missing something?
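For reference, the "vanilla" variant in question amounts to treating the patch-embed kernel as a small image and resizing it directly — a minimal sketch with an illustrative function name, not anything from the PR:

```python
import torch
import torch.nn.functional as F


def bilinear_resize_kernel(w: torch.Tensor, new_size: int) -> torch.Tensor:
    """Naively resize a (C_out, C_in, P, P) patch-embed kernel to new_size.

    Simple to apply, but unlike PI-resize it does not preserve the token
    values produced on correspondingly resized input patches, so token
    magnitudes drift as the patch size changes.
    """
    return F.interpolate(
        w, size=(new_size, new_size), mode='bilinear', align_corners=False)
```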

@rwightman
Collaborator Author

rwightman commented Jun 5, 2025

@stas-sl If you train/fine-tune at a different patch size using basic interpolation then, as you say, yeah, I imagine it will be fine. And if you train while resizing to different patch sizes with the simple interpolation and don't get crazy with the range of sizes covered, I'd expect it to be robust at inference time to sizes in that range. But I haven't tried this extensively.

However, using the simple resize on existing model weights yields pretty poor results compared to the PI method. The PI implementation I originally based on the JAX impl was damned slow, but I completely redid it with native torch tensors and a WAY faster basis vector computation, and it runs quite nicely at train time, so that's why I decided to just support the PI mode. I was testing this yesterday and the NaFlex pipeline appears to work well when randomizing both sequence length AND patch size at train time, neat.

@rwightman rwightman merged commit a5e551b into main Jun 5, 2025
26 checks passed