About Normalization in the PatchEmbed for Swin Transformer #1667
Replies: 2 comments
-
@CharlesLeeeee that's not my decision, it was done by the authors of the Swin paper. It's not unique to Swin either. The ViT variant used in CLIP has a norm layer after the patch + pos_emb/token, there are others like Swin with one right after the patch embed conv, and there is the dual patch-norm paper that puts one both BEFORE the patch embed and after, like Swin: https://arxiv.org/abs/2302.01327 ... although they didn't credit any existing model archs that already added one after...
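To make those placements concrete, here is a rough sketch of the three variants in PyTorch. The variable names, shapes, and the zero tensors standing in for the class token and position embedding are illustrative assumptions, not the actual timm, Swin, or CLIP code:

```python
import torch
import torch.nn as nn

# Rough sketch of the norm placements mentioned above. Shapes, names, and the
# zero cls token / position embedding are illustrative assumptions only.
B, C, H, W, P, D = 2, 3, 224, 224, 16, 768
x = torch.randn(B, C, H, W)
proj = nn.Conv2d(C, D, kernel_size=P, stride=P)
tokens = proj(x).flatten(2).transpose(1, 2)            # (B, N, D) patch tokens

# 1) Swin-style: LayerNorm directly after the patch embed conv
tokens_swin = nn.LayerNorm(D)(tokens)

# 2) CLIP-style ViT: norm after patches + class token + position embedding
N = tokens.shape[1]
cls_token = torch.zeros(B, 1, D)
pos_embed = torch.zeros(1, N + 1, D)
tokens_clip = nn.LayerNorm(D)(torch.cat([cls_token, tokens], dim=1) + pos_embed)

# 3) Dual patch-norm (arXiv:2302.01327): LayerNorm on the raw flattened patch
#    pixels BEFORE the projection, and another LayerNorm after it
patches = x.unfold(2, P, P).unfold(3, P, P)                           # (B, C, H/P, W/P, P, P)
patches = patches.permute(0, 2, 3, 1, 4, 5).flatten(1, 2).flatten(2)  # (B, N, C*P*P)
patches = nn.LayerNorm(C * P * P)(patches)
tokens_dpn = nn.LayerNorm(D)(nn.Linear(C * P * P, D)(patches))
```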
-
@CharlesLeeeee moving to a discussion as a better forum for these sorts of questions (and others can see it after I close the issue)
-
Hi,
For a normal ViT, there is no normalization layer after the nn.Conv2d in the patch embedding. However, for Swin Transformer, there is a normalization layer after the nn.Conv2d in the patch embedding.
Why did you decide to add normalization after that nn.Conv2d? Have you tried training Swin without a normalization layer after the nn.Conv2d in the patch embedding, to see if it performs better?
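For context, here is a minimal sketch of the difference being asked about (assumed names and shapes, not the actual timm implementation): a plain ViT patch embed is just the strided nn.Conv2d projection, while Swin passes the projected tokens through an nn.LayerNorm right after it.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Patch embedding sketch with an optional norm after the projection."""
    def __init__(self, patch_size=4, in_chans=3, embed_dim=96, norm_layer=None):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        # ViT: no norm here (Identity); Swin: LayerNorm right after the conv
        self.norm = norm_layer(embed_dim) if norm_layer is not None else nn.Identity()

    def forward(self, x):                     # (B, C, H, W)
        x = self.proj(x)                      # (B, embed_dim, H/patch, W/patch)
        x = x.flatten(2).transpose(1, 2)      # (B, num_patches, embed_dim)
        return self.norm(x)

vit_style = PatchEmbed(patch_size=16, embed_dim=768, norm_layer=None)
swin_style = PatchEmbed(patch_size=4, embed_dim=96, norm_layer=nn.LayerNorm)
tokens = swin_style(torch.randn(1, 3, 224, 224))   # (1, 3136, 96)
```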