Initial NaFlex ViT model and training support #2466
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
- …t if we bash h,w key into an int or str
- … add, classic vit weight loading for naflex model
- …king loader based patch compatible RandomErasing for NaFlex mode.
- …w. Remove subregion mode, not going to be worth it.
- … embeds and 'aspect preserving mode' to Flex Embeds. Some more docstrings and typing.
- … creating classic vits as naflex. Cleanup, improvements.
- …ndling from train.py onwards. Add docstrings and type annotations (thanks Claude).
@stas-sl If you train/fine-tune with a different patch size using basic interpolation as you say, I imagine it will be fine. If you train while resizing to different patch sizes with simple interpolation and don't get too extreme in the range of sizes covered, I'd expect the model to be robust at inference time to sizes within that range, though I haven't tried this extensively. However, applying the simple resize to existing model weights yields pretty poor results compared to the PI (pseudo-inverse) method. The PI implementation I originally based on the JAX impl was damned slow, but I completely redid it with native torch tensors and a WAY faster basis-vector computation, and it now runs quite nicely at train time, so that's why I decided to just support the PI mode. I was testing this yesterday and the NaFlex pipeline appears to be working well when randomizing both sequence length AND patch size at train time, neat.
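For readers unfamiliar with the PI method mentioned above: the idea (from the FlexiViT paper) is to resample patch-embedding weights with the pseudo-inverse of the resize operator, so that a resized weight responding to a resized patch matches the original response. A rough self-contained sketch follows; the function name `pi_resize_patch_embed` and the exact basis construction are illustrative, not timm's actual implementation:

```python
import torch
import torch.nn.functional as F


def pi_resize_patch_embed(w: torch.Tensor, new_size: tuple) -> torch.Tensor:
    """Resample conv patch-embed weights (C_out, C_in, H, W) to a new patch
    size via the pseudo-inverse of the bilinear-resize operator (FlexiViT-style).

    Illustrative sketch only; timm's version uses a faster basis computation.
    """
    c_out, c_in, h, w_in = w.shape
    nh, nw = new_size

    # Build the resize matrix: resize each unit-impulse patch, giving a
    # (old_dim, new_dim) matrix whose row i is the resized basis vector e_i.
    eye = torch.eye(h * w_in)
    basis = F.interpolate(
        eye.reshape(h * w_in, 1, h, w_in),
        size=(nh, nw),
        mode="bilinear",
        align_corners=False,
    ).reshape(h * w_in, nh * nw)

    # Pseudo-inverse maps old weights to new ones such that
    # <w_new, resize(x)> == <w_old, x> for patches x (when rank permits).
    resize_mat = torch.linalg.pinv(basis)  # (new_dim, old_dim)

    w_flat = w.reshape(c_out * c_in, h * w_in)
    w_new = w_flat @ resize_mat.T  # (c_out * c_in, new_dim)
    return w_new.reshape(c_out, c_in, nh, nw)
```

A quick sanity check of the defining property: embed a patch with the original weights, then embed the bilinearly-resized patch with the PI-resized weights, and the two responses should agree (exactly, up to float error, when upsizing).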
Working:
- `use_naflex=True` flag in `create_model()`

Not tested / not completed: