KaparthyReddy

Add position encoding interpolation to DeiT

Description

Adds position encoding interpolation to DeiT models, so that pretrained checkpoints can be used on images whose resolution differs from the one used during training.

Contributes to #30579

Changes

  • ✅ Added interpolate_pos_encoding parameter to DeiTModel, DeiTForImageClassification, DeiTForImageClassificationWithTeacher, and DeiTForMaskedImageModeling
  • ✅ Implemented interpolate_pos_encoding() method in DeiTEmbeddings class
  • ✅ Properly handles both CLS and distillation tokens (unique to DeiT)
  • ✅ Added comprehensive test test_model_with_different_image_size
  • ✅ All tests passing

Implementation Details

DeiT has two special tokens (CLS + distillation), unlike ViT, which has only one (CLS). The interpolation method:

  1. Separates the CLS/distillation position embeddings from the patch position embeddings
  2. Interpolates the patch position embeddings to match the patch grid of the new image size
  3. Recombines the unchanged CLS/distillation position embeddings with the interpolated patch embeddings
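
For readers who want to see the three steps concretely, here is a minimal standalone sketch. It is not the code from this PR: the function name `interpolate_deit_pos_embed`, its parameters, and the choice of bicubic interpolation are assumptions made for illustration, and it assumes the pretrained patch grid is square.

```python
import math

import torch
import torch.nn.functional as F


def interpolate_deit_pos_embed(pos_embed, new_grid, num_special_tokens=2):
    """Hypothetical helper: resize patch position embeddings to new_grid.

    pos_embed: tensor of shape (1, num_special_tokens + N, dim), where N is
               the number of patches in the (assumed square) pretrained grid.
    new_grid:  (rows, cols) of the patch grid for the new image size.
    """
    dim = pos_embed.shape[-1]

    # Step 1: split off the CLS and distillation position embeddings.
    special = pos_embed[:, :num_special_tokens]
    patches = pos_embed[:, num_special_tokens:]

    old_side = math.isqrt(patches.shape[1])
    if old_side * old_side != patches.shape[1]:
        raise ValueError("expected a square grid of patch embeddings")

    # Step 2: reshape (1, N, dim) -> (1, dim, side, side) and interpolate.
    # Bicubic mode is an assumption of this sketch.
    grid = patches.reshape(1, old_side, old_side, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=new_grid, mode="bicubic", align_corners=False)
    patches = grid.permute(0, 2, 3, 1).reshape(1, -1, dim)

    # Step 3: put the untouched special-token embeddings back in front.
    return torch.cat([special, patches], dim=1)


# 224x224 with 16x16 patches gives a 14x14 grid; 480x480 gives 30x30.
pos_embed = torch.randn(1, 2 + 14 * 14, 768)
resized = interpolate_deit_pos_embed(pos_embed, (30, 30))
print(resized.shape)  # torch.Size([1, 902, 768])
```

The two special-token embeddings pass through unchanged, so only the 196 patch positions are resampled to 900.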

Testing

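# Example usage
import torch
from transformers import DeiTModel
model = DeiTModel.from_pretrained("facebook/deit-base-distilled-patch16-224")
# Use with 480x480 images instead of 224x224
large_images = torch.randn(1, 3, 480, 480)  # placeholder batch; use real preprocessed pixel values in practice
outputs = model(large_images, interpolate_pos_encoding=True)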

- Add interpolate_pos_encoding parameter to DeiTModel forward methods
- Implement interpolate_pos_encoding method in DeiTEmbeddings
- Handle both CLS and distillation tokens in interpolation
- Add test for dynamic resolution input
- Enables using pretrained DeiT checkpoints on different image sizes

Contributes to huggingface#30579
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: deit

@Rocketknight1
Member

This code agent PR has a lot of unnecessary changes (note the head_mask lines, etc.). It's hard to review until it's made more compact! (Please don't just ask your code agent to fix that, lol, they tend to make it worse.)
