
Add TimmWrapper #34564

Draft · wants to merge 40 commits into main
Conversation

@qubvel (Member) commented Nov 1, 2024

What does this PR do?

Adds a TimmWrapper set of classes so that timm models can be loaded and used as transformers models within the library.

Continuation of a previous PR.

General Usage

import torch
from urllib.request import urlopen
from PIL import Image
from transformers import AutoConfig, AutoModelForImageClassification, AutoImageProcessor

checkpoint = "timm/resnet50.a1_in1k"
img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

image_processor = AutoImageProcessor.from_pretrained(checkpoint)
inputs = image_processor(img, return_tensors="pt")
model = AutoModelForImageClassification.from_pretrained(checkpoint)

with torch.no_grad():
    logits = model(**inputs).logits

top5_probabilities, top5_class_indices = torch.topk(logits.softmax(dim=1) * 100, k=5)
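
As a small follow-up, a minimal sketch of mapping the top-5 indices back to label names (it assumes the wrapper populates config.id2label for this ImageNet checkpoint):

# Map the top-5 indices back to human-readable labels (assumes config.id2label is filled in)
for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    print(f"{model.config.id2label[idx.item()]}: {prob.item():.1f}%")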

Pipeline

Timm models can now be used in the image-classification pipeline (for classification models) and in the image-feature-extraction pipeline.

import torch
from urllib.request import urlopen
from PIL import Image

from transformers import pipeline

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
pipe = pipeline("image-classification", model="timm/resnet18.a1_in1k")
print(pipe(img))
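
The feature-extraction side can be exercised the same way; a minimal sketch (the nesting of the returned lists depends on the model's feature shape):

# Image feature extraction with the same timm checkpoint
feature_pipe = pipeline("image-feature-extraction", model="timm/resnet18.a1_in1k")
features = feature_pipe(img)
print(type(features), len(features))  # nested lists of floats; shape depends on the model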

Trainer

Timm models can now be loaded and trained with the Trainer class.

Example model trained with the Trainer by running the command below:
https://huggingface.co/qubvel-hf/vit-base-beans

python run_image_classification.py \
    --dataset_name beans \
    --output_dir ./beans_outputs/ \
    --remove_unused_columns False \
    --label_column_name labels \
    --do_train \
    --do_eval \
    --push_to_hub \
    --push_to_hub_model_id vit-base-beans \
    --learning_rate 2e-5 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --logging_strategy steps \
    --logging_steps 10 \
    --eval_strategy epoch \
    --save_strategy epoch \
    --load_best_model_at_end True \
    --save_total_limit 3 \
    --seed 1337 \
    --model_name_or_path timm/resnet18.a1_in1k \
    --ignore_mismatched_sizes

Other features enabled

  • Device map:
    model = TimmWrapperForImageClassification.from_pretrained(checkpoint, device_map="auto")
  • Torch dtype:
    model = TimmWrapperForImageClassification.from_pretrained(checkpoint, torch_dtype="bfloat16")
  • Quantization:
    model = TimmWrapperForImageClassification.from_pretrained(checkpoint, load_in_4bit=True)
  • Intermediate hidden states: output_hidden_states=True or output_hidden_states=[1, 2, 3] (to select specific hidden states); see the sketch after this list:
    model = TimmWrapperForImageClassification.from_pretrained(checkpoint)
    output = model(**inputs, output_hidden_states=True)
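
For illustration, a small sketch continuing from the snippet above (it assumes the output exposes the standard hidden_states field and reuses inputs from the usage example):

import torch

# Inspect the features returned above
print(len(output.hidden_states))        # number of intermediate feature maps exposed by timm
print(output.hidden_states[-1].shape)   # shape of the last feature map

# Request only a subset of the intermediate states by index
with torch.no_grad():
    output = model(**inputs, output_hidden_states=[1, 2, 3])
print([h.shape for h in output.hidden_states])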

TODO

  • Gamma/beta renaming issue
  • Update timm in CI 0.9.6 -> 1.0.11 to enable output_hidden_states tests
  • Weights are loaded by transformers instead of timm; which architectures are affected?
  • Tests for image processor

@qubvel marked this pull request as draft November 1, 2024 15:52
@qubvel added the run-slow label Nov 1, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qubvel (Member Author):

There are 2 changes in this file:

  1. state_dict key renaming moved into a separate method so it can be overridden for TimmWrapper (disable gamma/beta renaming + add prefix); see the sketch after this list
  2. metadata is None for timm checkpoints -> assuming these are pytorch checkpoints
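
A rough sketch of the kind of key fix point 1 refers to (illustrative only; the exact signature in the PR may differ, and the "timm_model." prefix is assumed from the discussion below):

def _fix_state_dict_key(key: str) -> str:
    # Illustrative only: skip the legacy gamma/beta -> weight/bias renaming the base
    # implementation applies, and prepend the wrapper's submodule prefix instead so that
    # original timm checkpoint keys map onto the wrapped module.
    prefix = "timm_model."  # assumed prefix
    return key if key.startswith(prefix) else prefix + key

print(_fix_state_dict_key("layer1.0.conv1.weight"))  # -> "timm_model.layer1.0.conv1.weight"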

Comment on lines +417 to +420
default_image_processor_filename = (
    "config.json" if is_timm_checkpoint(pretrained_model_name_or_path) else IMAGE_PROCESSOR_NAME
)
kwargs["image_processor_filename"] = kwargs.get("image_processor_filename", default_image_processor_filename)
qubvel (Member Author):

timm checkpoints store image processor config in config.json

@qubvel (Member Author) commented Nov 4, 2024

@rwightman could you please make the first review in case you have bandwidth

@qubvel requested a review from rwightman November 5, 2024 00:00

@rwightman commented Nov 5, 2024

@qubvel I'm starting to work through it now, wanted to get eval working to check some familiar models and just wasted way too much time realizing I needed --remove_unused_columns False for anything to work at all :/ ... that's a really poor setup when most datasets have an 'image' column and not a 'pixel_values' column (realize that's nothing to do with this PR, heh) :/

Annnyways, first pass of the code things looked sane but need to spend some time looking closer at the details and testing some cases.

@rwightman

A few high level q...

  • Does 'Wrapper' add any worthwhile value/info in the name vs

    • TimmPreTrainedModel(PreTrainedModel)
    • TimmModel(TimmPretrainedModel)
    • TimmModelForImageClassification(..)
  • Is there a reason TimmWrapperModel has .timm_model, instead of something more generic like .model?

  • Any reservations in changing TimmWrapperModelForImageClassification to not use TimmWrapperModel? There are a few issues with the handling details for classifier, possible optimizations for forward call sequence and it'd probably be a bit cleaner to just duplicate a bit of redundant code and keep the two implementations separate and a bit different.

  • I thought we were going to set pretrained=True so timm can load the weights; there are a number of weight adaptation / translation things that don't run if this isn't used, cannot change num_classes cleanly for instance.

  • What happens if we try to push these models to the hub? Do they get uploaded/written in a form that timm can read?

@qubvel (Member Author) commented Nov 5, 2024

@rwightman thanks for the review! Indeed there are some default params I'm also confused about; there are even more for object detection 🥲

Does 'Wrapper' add any worthwhile value/info in the name vs
TimmPreTrainedModel(PreTrainedModel)
TimmModel(TimmPretrainedModel)
TimmModelForImageClassification(..)

I left it as it was in the previous PR; however, TimmModelForImageClassification sounds better to me, so I can rename it.

Is there a reason TimmWrapperModel has .timm_model, instead of something more generic like .model?

The prefix "timm_model" is unique and is used in certain tests to identify when weights come from a timm model. It's also utilized in the _fix_state_dict_key method to determine whether to add the prefix when loading weights from the original checkpoint. For these reasons, I would prefer to keep it as "timm_model"

Any reservations in changing TimmWrapperModelForImageClassification to not use TimmWrapperModel? There are a few issues with the handling details for classifier, possible optimizations for forward call sequence and it'd probably be a bit cleaner to just duplicate a bit of redundant code and keep the two implementations separate and a bit different.

Originally, this was implemented without TimmWrapperModel in TimmWrapperModelForImageClassification, but I introduced it to reduce code repetition. This approach also aligns better with common patterns in the transformers repo. Could you provide more details on the issues with the classifier? In any case, the code will remain the same for both models if we aim to maintain output_hidden_states functionality across them.

I thought we were going to set pretrained=True so timm can load the weights; there are a number of weight adaptation / translation things that don't run if this isn't used, cannot change num_classes cleanly for instance.

I left a comment in the thread about this. We use transformers for weight loading to leverage features like device_map, torch_dtype, and quantization. I’m also unsure how to disable weight loading through transformers if it’s handled by timm, as I haven’t seen any examples of this in the repo. I can dig into it further if needed.

Do you have an estimate of how many models involve weight renaming? Is there a way to update checkpoints without breaking older versions of timm? Alternatively, could we manage similar weight renaming directly in transformers? (Though I think this approach may be less robust.)

What happens if we try to push these models to the hub? Do they get uploaded/written in a form that timm can read?

I’m not sure if it's currently compatible, but it would be great to enable it! The config should be compatible, however, the weights state dict will have the "model.timm_model." prefix. I can look into removing this prefix before saving. I will try to enable it and add a separate test for a few checkpoints. Thanks for bringing this up!

@rwightman commented Nov 5, 2024

Originally, this was implemented without TimmWrapperModel in TimmWrapperModelForImageClassification, but I introduced it to reduce code repetition. This approach also aligns better with common patterns in the transformers repo. Could you provide more details on the issues with the classifier? In any case, the code will remain the same for both models if we aim to maintain output_hidden_states functionality across them.

I feel in this case there is a difference in alignment with other models, because both ImageClassification and base model wrap timm model instances that differ, instead of adding their own head to the same timm model. I see some options, depending on the mix of hidden state flags, head vs no head where it'd be appropriate to go through different forward calls, where more than just one argument might be appropriate to change on creation, etc.

Also, thinking about the future and other tasks: I feel there is a high probability that, say, for supporting native timm object detection, more flexibility will be desired, so it'd be safer to have them uncoupled.

That, and there's a resistance to significant changes in transformers after something is in there, so I feel it's better to leave them uncoupled, to have additional flexibility and avoid being stuck with tricky decisions that might impact timm moving forward.

@rwightman commented Nov 5, 2024

I thought we were going to set pretrained=True so timm can load the weights; there are a number of weight adaptation / translation things that don't run if this isn't used, cannot change num_classes cleanly for instance.

I left a comment in the thread about this. We use transformers for weight loading to leverage features like device_map, torch_dtype, and quantization. I’m also unsure how to disable weight loading through transformers if it’s handled by timm, as I haven’t seen any examples of this in the repo. I can dig into it further if needed.

Do you have an estimate of how many models involve weight renaming? Is there a way to update checkpoints without breaking older versions of timm? Alternatively, could we manage similar weight renaming directly in transformers? (Though I think this approach may be less robust.)

Aside from renaming, it just doesn't work right now, you can't load weights for a model if the head size changes for a new classification task. The wrapper only works if you use the imagenet classifier.

I realize timm doesn't have the dtype or lazy-init features, but it's better to have it work, I feel. Can potentially look at supporting some of that through timm. It hasn't been a priority, as there aren't too many very large models in timm.

If there's no way to use pretrained=True on creation, then we will probably need to figure out how to add & call a method in timm once transformers has the state_dict and before it's loaded into the model. Not sure if there's a spot for such a call in transformers?

    Empty init weights function to ensure compatibility of the class in the library.
    """
    if isinstance(module, (nn.Linear)):
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)


It's not clear why this is here or what it's attempting to do. timm has model-specific init fns, though they aren't separately callable right now; doing something like this, which could overwrite timm defaults, would change model behaviour.

qubvel (Member Author):

Added this to initialize the classifier; without it, weights are not properly initialized, probably due to how the model is created in transformers:

from transformers import TimmWrapperForImageClassification

# --------------
# With init
# --------------

model = TimmWrapperForImageClassification.from_pretrained("timm/resnet18.a1_in1k", num_labels=10, ignore_mismatched_sizes=True)
print(model.timm_model.fc.weight[:3, :3])
# tensor([[-0.2117, -0.2422, -0.2540],
#         [-0.1106, -0.1856, -0.0152],
#         [-0.3430, -0.6446, -0.0530]], grad_fn=<SliceBackward0>)


# --------------
# Without init
# --------------

# patch with empty init weight function to check
def empty_init(self, module):
    pass
TimmWrapperForImageClassification._init_weights = empty_init

model = TimmWrapperForImageClassification.from_pretrained("timm/resnet18.a1_in1k", num_labels=10, ignore_mismatched_sizes=True)
print(model.timm_model.fc.weight[:3, :3])
# tensor([[0., 0., 0.],
#         [0., 0., 0.],
#         [0., 0., 0.]], grad_fn=<SliceBackward0>)

Ideally, we should get rid of this, but it's not common for transformers to load external models, so it might require more code changes. For now, it's a simple fix to enable model loading with an initialized classifier when shapes are mismatched.


if is_dir and os.path.exists(os.path.join(pretrained_model_name_or_path, IMAGE_PROCESSOR_NAME)):
    # timm models don't have a preprocessor_config.json file saved out
    return False


I don't think the absence of a file is a good check.

@qubvel (Member Author) commented Nov 5, 2024

Aside from renaming, it just doesn't work right now, you can't load weights for a model if the head size changes for a new classification task. The wrapper only works if you use the imagenet classifier.

Fixed!

@qubvel (Member Author) commented Nov 5, 2024

If there's no way to use pretrained=True on creation, then we will probably need to figure out how to add & call a method in timm once transformers has the state_dict and before it's loaded into the model. Not sure if there's a spot for such a call in transformers?

If there were any timm model/class-specific function that fixes the state_dict, I suppose we could call it before loading weights, similar to _fix_state_dict_key in the current implementation. However, it would be supported only for newer timm versions.

@rwightman

Thinking about this a bit more, there is another issue with the hub / transformers-first weight loading. timm wasn't originally hub-first, so the library itself is still the primary source of truth for some models, and doing a hub-based load won't work.

For example, this model https://huggingface.co/laion/CLIP-ViT-B-16-laion2B-s34B-b88K is an OpenCLIP-first model, but timm can load it if you use the model name 'vit_base_patch16_clip_224.laion2b'.

Indeed, I feel that the wrapper should support all timm model names that work in timm, but right now, if the model isn't on the hub with a timm primary config, it isn't usable. Ideally it should work with both a hub model name OR any timm model name. The timm model name would require timm to do the pretrained loading, to resolve any translation to another hub name or weight source.

Some very popular examples of this

https://github.com/huggingface/pytorch-image-models/blob/51ac8d2efb926c6b7c34eeb1dc52bcf57999e2de/timm/models/vision_transformer.py#L1580-L1716

@rwightman

Aside from renaming, it just doesn't work right now, you can't load weights for a model if the head size changes for a new classification task. The wrapper only works if you use the imagenet classifier.

Fixed!

Yay, I was able to run a fine-tune using run_image_classification.py after this fix. One observation: the output files for that script aren't directly usable for pushing to the hub.

There is no config matching the timm format or name; there is a config output by the image preprocessor save process, under a different filename, that's a jumble of the timm config.

Also, the state dict for the model has the timm_model prefix, so it's not loadable in timm. Is there any way to remove that prefix in the checkpoints? This would also make local dir loads more seamless if someone already had timm weights and config files checked out.
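
A hypothetical workaround sketch for the prefix question (the checkpoint filename and the exact prefix are assumptions based on this thread; a saved checkpoint may also be in safetensors format):

import torch

# Strip the wrapper's submodule prefix ("timm_model." / "model.timm_model." are both
# mentioned in this thread) so the remaining keys match timm's own naming.
state_dict = torch.load("beans_outputs/pytorch_model.bin", map_location="cpu")
timm_state_dict = {
    key.split("timm_model.", 1)[-1]: value
    for key, value in state_dict.items()
    if "timm_model." in key
}
torch.save(timm_state_dict, "beans_outputs/timm_compatible.bin")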
