
Use of pretrained_window_size #385

@abhiagwl4262

Description

I am confused about the use of pretrained_window_size in the WindowAttention class.

I got the sense that in SwinV2 the window size can be changed freely. So if I pre-train my model with an image size of 256 and a window size of 8, and then fine-tune on a downstream task with an image size of 512 and a window size of 32, there should be no need for any special handling of the continuous relative position bias, the way we interpolate position embeddings in ViT.
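
For reference, the continuous relative position bias in the SwinV2 paper computes the bias with a small meta-network $\mathcal{G}$ (a 2-layer MLP) applied to log-spaced relative coordinates, rather than indexing a learned table, which is why it can in principle handle unseen window sizes:

$$B(\Delta x, \Delta y) = \mathcal{G}(\Delta x, \Delta y), \qquad \widehat{\Delta x} = \operatorname{sign}(\Delta x) \cdot \log(1 + |\Delta x|)$$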

But we still do this:

        if pretrained_window_size[0] > 0:
            relative_coords_table[:, :, :, 0] /= (pretrained_window_size[0] - 1)
            relative_coords_table[:, :, :, 1] /= (pretrained_window_size[1] - 1)
        else:
            relative_coords_table[:, :, :, 0] /= (self.window_size[0] - 1)
            relative_coords_table[:, :, :, 1] /= (self.window_size[1] - 1)
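
For context, here is a minimal, self-contained sketch of where that division sits, with the surrounding lines following the official repo's swin_transformer_v2.py and the window sizes taken from my example above:

    import torch

    window_size = (32, 32)           # fine-tuning window size (illustrative)
    pretrained_window_size = (8, 8)  # pre-training window size (illustrative)

    # Table of every possible relative offset: 1 x (2*Wh-1) x (2*Ww-1) x 2
    coords_h = torch.arange(-(window_size[0] - 1), window_size[0], dtype=torch.float32)
    coords_w = torch.arange(-(window_size[1] - 1), window_size[1], dtype=torch.float32)
    relative_coords_table = torch.stack(
        torch.meshgrid(coords_h, coords_w, indexing="ij"), dim=-1
    ).unsqueeze(0)

    # The quoted normalization: offsets that already existed at pre-training
    # time map to exactly the values the CPB meta-network saw during
    # pre-training.
    if pretrained_window_size[0] > 0:
        relative_coords_table[:, :, :, 0] /= pretrained_window_size[0] - 1
        relative_coords_table[:, :, :, 1] /= pretrained_window_size[1] - 1
    else:
        relative_coords_table[:, :, :, 0] /= window_size[0] - 1
        relative_coords_table[:, :, :, 1] /= window_size[1] - 1

    # Log-spacing: larger, unseen offsets are compressed, so extrapolating to
    # a bigger window keeps the MLP inputs close to the pre-training range.
    relative_coords_table *= 8
    relative_coords_table = torch.sign(relative_coords_table) * torch.log2(
        torch.abs(relative_coords_table) + 1.0
    ) / torch.log2(torch.tensor(8.0))

    print(relative_coords_table.abs().max())
    # ~1.73 here; ~1.06 when window_size == pretrained_window_size

With pretrained_window_size = (8, 8) and window_size = (32, 32), every offset up to ±7 maps to exactly the input the CPB MLP saw during pre-training, and the larger offsets extrapolate only mildly because of the log-spacing.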

Also, this is not done in torchvision's implementation of SwinV2 (see https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py#L351-L365), and there is no pretrained_window_size parameter there at all.
