I am confused about the use of `pretrained_window_size` in the `WindowAttention` class.
My understanding is that in SwinV2 the window size can be changed freely. So if I pre-train my model with an image size of 256 and a window size of 8, and then fine-tune on a downstream task with an image size of 512 and a window size of 32, there should be no need to handle the continuous relative position bias specially, the way we interpolate position embeddings in ViT.
But we are still doing this:
```python
if pretrained_window_size[0] > 0:
    relative_coords_table[:, :, :, 0] /= (pretrained_window_size[0] - 1)
    relative_coords_table[:, :, :, 1] /= (pretrained_window_size[1] - 1)
else:
    relative_coords_table[:, :, :, 0] /= (self.window_size[0] - 1)
    relative_coords_table[:, :, :, 1] /= (self.window_size[1] - 1)
```
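For context, here is a condensed, self-contained sketch of where that division sits in the log-spaced continuous position bias computation. The helper name `build_log_spaced_coords` is mine, and the meshgrid/log-spacing lines are paraphrased from the repo, so treat this as an illustration rather than the exact implementation:

```python
import math
import torch

def build_log_spaced_coords(window_size, pretrained_window_size=(0, 0)):
    # Relative offsets along each axis: -(Wh - 1) ... (Wh - 1)
    relative_coords_h = torch.arange(-(window_size[0] - 1), window_size[0], dtype=torch.float32)
    relative_coords_w = torch.arange(-(window_size[1] - 1), window_size[1], dtype=torch.float32)
    table = torch.stack(
        torch.meshgrid(relative_coords_h, relative_coords_w, indexing="ij")
    ).permute(1, 2, 0).contiguous().unsqueeze(0)  # 1, 2*Wh-1, 2*Ww-1, 2

    # The lines in question: rescale offsets by the *pretraining* window size
    # when it is known, otherwise by the current window size.
    if pretrained_window_size[0] > 0:
        table[:, :, :, 0] /= pretrained_window_size[0] - 1
        table[:, :, :, 1] /= pretrained_window_size[1] - 1
    else:
        table[:, :, :, 0] /= window_size[0] - 1
        table[:, :, :, 1] /= window_size[1] - 1

    # Log-spaced coordinates as in the paper: multiply by 8, then apply a
    # sign-preserving log2 (the repo comments call this "normalize to -8, 8").
    table *= 8
    table = torch.sign(table) * torch.log2(torch.abs(table) + 1.0) / math.log2(8)
    return table  # fed to the small CPB MLP to produce per-head biases
```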
Also, this is not done in torchvision's implementation of SwinV2 (see https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py#L351-L365), and there is no `pretrained_window_size` parameter there either.
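To make the difference concrete, here is a small, hypothetical comparison using the `build_log_spaced_coords` sketch above for the 8 → 32 example: normalizing by the pretrained window versus by the current window only, which is what torchvision appears to do. The printed values are my own rough calculations, not outputs from either repo:

```python
# Pretrain with window 8, then fine-tune with window 32.
with_pretrained = build_log_spaced_coords((32, 32), pretrained_window_size=(8, 8))
current_only = build_log_spaced_coords((32, 32))  # no pretrained size, current window only

# With the pretrained size, offsets larger than the pretraining window
# extrapolate beyond the log-spaced range seen at pretraining (about 1.06);
# normalizing by the current window always squeezes inputs back into that range.
print(with_pretrained.abs().max().item())  # ~1.73
print(current_only.abs().max().item())     # ~1.06
```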