Add position encoding interpolation to DeiT #41528
Open
+89
−49
Description
Adds position encoding interpolation to DeiT models, enabling the use of pretrained checkpoints on images with different resolutions than those used during training.
Contributes to #30579
Changes
- interpolate_pos_encoding parameter to DeiTModel, DeiTForImageClassification, DeiTForImageClassificationWithTeacher, and DeiTForMaskedImageModeling
- interpolate_pos_encoding() method in DeiTEmbeddings class
- test_model_with_different_image_size
Implementation Details
DeiT has two special tokens (CLS and distillation), unlike ViT, which has only one (CLS). The interpolation method:
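To illustrate the two-token handling, here is a minimal sketch of how such an interpolation could work. It is not the code from this PR: the standalone function, its signature, the `num_special_tokens` argument, and the choice of bicubic resampling are all assumptions made for illustration. The idea is to set aside the first two position embeddings (CLS and distillation), resize only the patch grid, and concatenate the pieces again.

```python
import math

import torch
import torch.nn.functional as F


def interpolate_pos_encoding(pos_embed, height, width, patch_size, num_special_tokens=2):
    """Resize patch position embeddings to a new image size (illustrative sketch).

    pos_embed: tensor of shape (1, num_special_tokens + N, dim), where N is the
    number of patches seen during pretraining and is assumed to be a perfect square.
    """
    # Hypothetical helper: the special tokens have no spatial position, so they
    # are kept unchanged and only the patch embeddings are resampled.
    special = pos_embed[:, :num_special_tokens]
    patches = pos_embed[:, num_special_tokens:]
    dim = pos_embed.shape[-1]

    old_side = math.isqrt(patches.shape[1])
    if old_side * old_side != patches.shape[1]:
        raise ValueError("expected a square grid of patch position embeddings")

    new_h, new_w = height // patch_size, width // patch_size
    if (new_h, new_w) == (old_side, old_side):
        return pos_embed  # same resolution as pretraining, nothing to do

    # (1, N, dim) -> (1, dim, old_side, old_side) so F.interpolate treats it as an image
    grid = patches.reshape(1, old_side, old_side, dim).permute(0, 3, 1, 2)
    # Bicubic mode is an assumption of this sketch
    grid = F.interpolate(grid, size=(new_h, new_w), mode="bicubic", align_corners=False)
    # Back to (1, new_h * new_w, dim)
    patches = grid.permute(0, 2, 3, 1).reshape(1, new_h * new_w, dim)

    return torch.cat([special, patches], dim=1)
```

With a 224x224 pretraining resolution and 16x16 patches, the table holds 2 + 14*14 entries; calling the sketch with `height=480, width=480` would yield 2 + 30*30 entries, with the first two rows left untouched.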
Testing