About Normalization in the PatchEmbed for Swin Transformer #1667
Replies: 2 comments
-
@CharlesLeeeee that's not my decision, it was done by the authors of the Swin paper. It's not unique to Swin either. The ViT variant used in CLIP has a norm layer after the patch + pos_emb/token, there are others like Swin with one right after the patch embed conv, and there is the dual patch-norm paper that puts one both BEFORE the patch embed and after, like Swin: https://arxiv.org/abs/2302.01327 ... although they didn't credit any existing model archs that already added one after...
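To make those placements concrete, here is a rough sketch of the three variants in PyTorch. The variable names, shapes, and the zero tensors standing in for the class token and position embedding are illustrative assumptions, not the actual timm, Swin, or CLIP code:

```python
import torch
import torch.nn as nn

# Rough sketch of the norm placements mentioned above. Shapes, names, and the
# zero cls token / position embedding are illustrative assumptions only.
B, C, H, W, P, D = 2, 3, 224, 224, 16, 768
x = torch.randn(B, C, H, W)
proj = nn.Conv2d(C, D, kernel_size=P, stride=P)
tokens = proj(x).flatten(2).transpose(1, 2)            # (B, N, D) patch tokens

# 1) Swin-style: LayerNorm directly after the patch embed conv
tokens_swin = nn.LayerNorm(D)(tokens)

# 2) CLIP-style ViT: norm after patches + class token + position embedding
N = tokens.shape[1]
cls_token = torch.zeros(B, 1, D)
pos_embed = torch.zeros(1, N + 1, D)
tokens_clip = nn.LayerNorm(D)(torch.cat([cls_token, tokens], dim=1) + pos_embed)

# 3) Dual patch-norm (arXiv:2302.01327): LayerNorm on the raw flattened patch
#    pixels BEFORE the projection, and another LayerNorm after it
patches = x.unfold(2, P, P).unfold(3, P, P)                           # (B, C, H/P, W/P, P, P)
patches = patches.permute(0, 2, 3, 1, 4, 5).flatten(1, 2).flatten(2)  # (B, N, C*P*P)
patches = nn.LayerNorm(C * P * P)(patches)
tokens_dpn = nn.LayerNorm(D)(nn.Linear(C * P * P, D)(patches))
```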
-
@CharlesLeeeee moving to a discussion as a better forum for these sorts of questions (and others can see it after I close the issue)
-
Hi,
For a normal ViT, there is no normalization layer after the nn.Conv2d in the patch embedding. However, for Swin Transformer, there is a normalization layer after the nn.Conv2d in the patch embedding.
Why did you decide to add normalization after that nn.Conv2d? Have you tried training Swin without a normalization layer after the nn.Conv2d in the patch embedding, to see if it performs better?
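For context, here is a minimal sketch of the difference being asked about (assumed names and shapes, not the actual timm implementation): a plain ViT patch embed is just the strided nn.Conv2d projection, while Swin passes the projected tokens through an nn.LayerNorm right after it.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Patch embedding sketch with an optional norm after the projection."""
    def __init__(self, patch_size=4, in_chans=3, embed_dim=96, norm_layer=None):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        # ViT: no norm here (Identity); Swin: LayerNorm right after the conv
        self.norm = norm_layer(embed_dim) if norm_layer is not None else nn.Identity()

    def forward(self, x):                     # (B, C, H, W)
        x = self.proj(x)                      # (B, embed_dim, H/patch, W/patch)
        x = x.flatten(2).transpose(1, 2)      # (B, num_patches, embed_dim)
        return self.norm(x)

vit_style = PatchEmbed(patch_size=16, embed_dim=768, norm_layer=None)
swin_style = PatchEmbed(patch_size=4, embed_dim=96, norm_layer=nn.LayerNorm)
tokens = swin_style(torch.randn(1, 3, 224, 224))   # (1, 3136, 96)
```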