Question on Default init_values in SwinTransformerV2CrBlock #2233

@zhaohm14 The norm sits at the end of the residual path, so the norm's weight is the last scaling applied before the branch is merged with the shortcut. That makes it similar to layer-scale, skip-init, and ResNet zero-init BN, all of which scale the residual by a single scalar or one value per channel and typically start at 0 or a very small value.
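
A minimal, framework-free sketch of that point, in case it helps to see the arithmetic. It is not the timm implementation: `layer_norm`, `post_norm_residual`, and `toy_branch` are hypothetical helpers written only for illustration, and the block is assumed to have the form `x + norm(branch(x))`. With the norm weight (and bias) at 0, the residual branch contributes nothing and the block starts as an identity mapping; with a very small weight it contributes only a tiny perturbation.

```python
import math


def layer_norm(x, weight, bias, eps=1e-5):
    """Normalize a vector, then apply a per-channel affine (weight, bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [w * (v - mean) * inv_std + b for v, w, b in zip(x, weight, bias)]


def post_norm_residual(x, branch, norm_weight, norm_bias):
    """Compute x + norm(branch(x)); the norm is the last op on the residual path."""
    y = layer_norm(branch(x), norm_weight, norm_bias)
    return [xi + yi for xi, yi in zip(x, y)]


def toy_branch(x):
    # Hypothetical stand-in for the attention or MLP sub-layer.
    return [3.0 * v + 1.0 for v in x]


x = [0.5, -1.0, 2.0, 4.0]
dim = len(x)
zero_bias = [0.0] * dim

# Norm weight 0: the branch is scaled away, so the block is an identity.
identity_out = post_norm_residual(x, toy_branch, [0.0] * dim, zero_bias)

# Norm weight 1e-5: the branch adds only a tiny perturbation to the shortcut.
small_out = post_norm_residual(x, toy_branch, [1e-5] * dim, zero_bias)

# Norm weight 1 (the usual LayerNorm default): the branch contributes fully.
default_out = post_norm_residual(x, toy_branch, [1.0] * dim, zero_bias)

print(identity_out == x)
print(max(abs(a - b) for a, b in zip(small_out, x)))
print(max(abs(a - b) for a, b in zip(default_out, x)))
```

The norm weight plays the same role here as the learnable per-channel scale in layer-scale: it is the only factor between the branch output and the shortcut, so its initial value sets how much the branch perturbs the input at the start of training.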

Answer selected by zhaohm14