cosine similarity between hidden outputs #3
Comments
Hi, could you share your command? The problem seems to be caused by the use of talking-head attention, and I am reproducing the results.
@ChengyueGongR
Similar problem here. Would you like to share the DeiT-B24 model so we can reproduce the result?
Hi. P.S. Our DeiT-B24 reported in the paper is trained with CutMix instead of Mixup + CutMix as in DeiT; we will add this detail in a later version of our draft. I further tested a DeiT-B24 model trained with Mixup + CutMix; its cosine similarity is [0.43, 0.45, 0.35, 0.27, 0.27, 0.27, 0.28, 0.29, 0.31, 0.31, 0.32, 0.32, 0.35, 0.35, 0.37, 0.37, 0.39, 0.41, 0.48, 0.52, 0.61, 0.61, 0.77, 0.80], not as large as the previous one. I'm checking this further now. Thanks again for your suggestions! @Andy1621 @freeman-1995
I have met this problem too. I calculated the cosine similarity in the same way as @freeman-1995 on DeiT-base with the official pretrained weights, and the cosine similarity is:
Hi, would you like to share your code for computing the cosine similarity? I can't reproduce the results. @freeman-1995 @ChuanyangZheng
Hi, I trained a vit_base_patch32 model at resolution 224 on ImageNet; the validation accuracy reaches 73.38%. I then dumped all transformer blocks' outputs and calculated the cosine similarity between them as mentioned in the paper, but I cannot reproduce the result from the paper. Here is mine: the left column (0-11) is the layer depth, and the right column is the cosine similarity value.
0 ---> 0.5428178586547845
1 ---> 0.6069238793659271
2 ---> 0.30199843167793006
3 ---> 0.26388993740273486
4 ---> 0.26132955026320265
5 ---> 0.24258930458215844
6 ---> 0.20970458839482967
7 ---> 0.21119057468517677
8 ---> 0.22155304307901189
9 ---> 0.23545575648548187
10 ---> 0.2329663067175004
11 ---> 0.22496230679589768
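For reference, a minimal sketch of one way such per-layer numbers could be computed: hook every transformer block of a timm ViT, take each block's output tokens, and average the pairwise cosine similarity between patch tokens over a batch. This is not necessarily the paper's or the commenters' exact script; the model name, batch content, and the choice to drop the CLS token are assumptions.

```python
import torch
import torch.nn.functional as F
import timm

# Assumption: timm's pretrained ViT; swap in your own checkpoint as needed.
model = timm.create_model('vit_base_patch32_224', pretrained=True).eval()

# Collect each block's output via forward hooks.
block_outputs = []
hooks = [blk.register_forward_hook(lambda m, inp, out: block_outputs.append(out.detach()))
         for blk in model.blocks]

x = torch.randn(8, 3, 224, 224)  # placeholder; use real ImageNet validation batches
with torch.no_grad():
    model(x)
for h in hooks:
    h.remove()

for depth, out in enumerate(block_outputs):
    tokens = out[:, 1:, :]                 # drop the CLS token (assumption)
    tokens = F.normalize(tokens, dim=-1)   # unit-normalize each token
    sim = tokens @ tokens.transpose(1, 2)  # (B, N, N) pairwise cosine similarities
    n = sim.shape[-1]
    # mean over i != j (the diagonal of a normalized similarity matrix sums to n)
    off_diag = (sim.sum(dim=(1, 2)) - n) / (n * (n - 1))
    print(f'{depth} ---> {off_diag.mean().item():.4f}')
```

Details such as whether the CLS token is included, whether the similarity is measured before or after the final LayerNorm, and how many images are averaged over can noticeably shift these values, which may explain part of the discrepancy discussed above.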