Hi, thank you for your great work!
I'm reproducing the MSRVTT captioning results using the fine-tuned weights you provided in the repo (mPLUG2_MSRVTT_Caption.pth, downloaded from the link), but I cannot get the result reported in the paper, and there is a huge gap. What could the problem be? Thanks!
My results:
{'Bleu_1': 0.2391483871053033, 'Bleu_2': 0.1397145198812077, 'Bleu_3': 0.08582614908051771, 'Bleu_4': 0.0554141450685924, 'CIDEr': 0.6409439525382706}
More information:
- using the checkpoint mPLUG2_MSRVTT_Caption.pth downloaded from the link
- using the language_evaluation package from https://github.com/bckim92/language-evaluation (see the sketch after this list)
- using the MSRVTT test-1kA split (also called the JSFusion split), which is the same split used for the text-to-video retrieval task
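
For reference, this is roughly how I compute the scores with that package. It is only a minimal sketch: the two caption strings are copied from my eval log as placeholders, and the per-video grouping of the 1k test set is omitted.

```python
import language_evaluation

# Placeholder predictions and references (taken from my eval log below).
predicts = ['a boy is fixing a computer', 'a man is talking about a car']
answers  = ['a person is connecting something to system',
            'a man is giving a review on a vehicle']

# COCO-style caption metrics (BLEU-1..4, CIDEr, ...).
evaluator = language_evaluation.CocoEvaluator()
scores = evaluator.run_evaluation(predicts, answers)
print(scores)  # e.g. {'Bleu_1': ..., 'Bleu_4': ..., 'CIDEr': ...}
```

In my actual evaluation each video's generated caption is scored against its reference captions from the MSRVTT test annotations; the sketch above only shows one reference per prediction to keep it short.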
My eval logs:
| distributed init (rank 3): env://
| distributed init (rank 1): env://
| distributed init (rank 2): env://
| distributed init (rank 0): env://
Creating video caption datasets
Creating model
use_checkpoint: True
_IncompatibleKeys(missing_keys=['visual.transformer.resblocks.0.lmhra1.ln.weight', 'visual.transformer.resblocks.0.lmhra1.ln.bias', 'visual.transformer.resblocks.0.lmhra1.down_proj.weight', 'visual.transformer.resblocks.0.lmhra1.down_proj.bias', 'visual.transformer.resblocks.0.lmhra1.conv.weight', 'visual.transformer.resblocks.0.lmhra1.conv.bias', 'visual.transformer.resblocks.0.lmhra1.up_proj.weight', 'visual.transformer.resblocks.0.lmhra1.up_proj.bias', 'visual.transformer.resblocks.0.lmhra2.ln.weight', 'visual.transformer.resblocks.0.lmhra2.ln.bias', 'visual.transformer.resblocks.0.lmhra2.down_proj.weight', 'visual.transformer.resblocks.0.lmhra2.down_proj.bias', 'visual.transformer.resblocks.0.lmhra2.conv.weight', 'visual.transformer.resblocks.0.lmhra2.conv.bias', 'visual.transformer.resblocks.0.lmhra2.up_proj.weight', 'visual.transformer.resblocks.0.lmhra2.up_proj.bias', 'visual.transformer.resblocks.1.lmhra1.ln.weight', 'visual.transformer.resblocks.1.lmhra1.ln.bias', 'visual.transformer.resblocks.1.lmhra1.down_proj.weight', 'visual.transformer.resblocks.1.lmhra1.down_proj.bias', 'visual.transformer.resblocks.1.lmhra1.conv.weight', 'visual.transformer.resblocks.1.lmhra1.conv.bias', 'visual.transformer.resblocks.1.lmhra1.up_proj.weight', 'visual.transformer.resblocks.1.lmhra1.up_proj.bias', 'visual.transformer.resblocks.1.lmhra2.ln.weight', 'visual.transformer.resblocks.1.lmhra2.ln.bias', 'visual.transformer.resblocks.1.lmhra2.down_proj.weight', 'visual.transformer.resblocks.1.lmhra2.down_proj.bias', 'visual.transformer.resblocks.1.lmhra2.conv.weight', 'visual.transformer.resblocks.1.lmhra2.conv.bias', 'visual.transformer.resblocks.1.lmhra2.up_proj.weight', 'visual.transformer.resblocks.1.lmhra2.up_proj.bias', 'visual.transformer.resblocks.2.lmhra1.ln.weight', 'visual.transformer.resblocks.2.lmhra1.ln.bias', 'visual.transformer.resblocks.2.lmhra1.down_proj.weight', 'visual.transformer.resblocks.2.lmhra1.down_proj.bias', 'visual.transformer.resblocks.2.lmhra1.conv.weight', 'visual.transformer.resblocks.2.lmhra1.conv.bias', 'visual.transformer.resblocks.2.lmhra1.up_proj.weight', 'visual.transformer.resblocks.2.lmhra1.up_proj.bias', 'visual.transformer.resblocks.2.lmhra2.ln.weight', 'visual.transformer.resblocks.2.lmhra2.ln.bias', 'visual.transformer.resblocks.2.lmhra2.down_proj.weight', 'visual.transformer.resblocks.2.lmhra2.down_proj.bias', 'visual.transformer.resblocks.2.lmhra2.conv.weight', 'visual.transformer.resblocks.2.lmhra2.conv.bias', 'visual.transformer.resblocks.2.lmhra2.up_proj.weight', 'visual.transformer.resblocks.2.lmhra2.up_proj.bias', 'visual.transformer.resblocks.3.lmhra1.ln.weight', 'visual.transformer.resblocks.3.lmhra1.ln.bias', 'visual.transformer.resblocks.3.lmhra1.down_proj.weight', 'visual.transformer.resblocks.3.lmhra1.down_proj.bias', 'visual.transformer.resblocks.3.lmhra1.conv.weight', 'visual.transformer.resblocks.3.lmhra1.conv.bias', 'visual.transformer.resblocks.3.lmhra1.up_proj.weight', 'visual.transformer.resblocks.3.lmhra1.up_proj.bias', 'visual.transformer.resblocks.3.lmhra2.ln.weight', 'visual.transformer.resblocks.3.lmhra2.ln.bias', 'visual.transformer.resblocks.3.lmhra2.down_proj.weight', 'visual.transformer.resblocks.3.lmhra2.down_proj.bias', 'visual.transformer.resblocks.3.lmhra2.conv.weight', 'visual.transformer.resblocks.3.lmhra2.conv.bias', 'visual.transformer.resblocks.3.lmhra2.up_proj.weight', 'visual.transformer.resblocks.3.lmhra2.up_proj.bias', 'visual.transformer.resblocks.4.lmhra1.ln.weight', 
'visual.transformer.resblocks.4.lmhra1.ln.bias', 'visual.transformer.resblocks.4.lmhra1.down_proj.weight', 'visual.transformer.resblocks.4.lmhra1.down_proj.bias', 'visual.transformer.resblocks.4.lmhra1.conv.weight', 'visual.transformer.resblocks.4.lmhra1.conv.bias', 'visual.transformer.resblocks.4.lmhra1.up_proj.weight', 'visual.transformer.resblocks.4.lmhra1.up_proj.bias', 'visual.transformer.resblocks.4.lmhra2.ln.weight', 'visual.transformer.resblocks.4.lmhra2.ln.bias', 'visual.transformer.resblocks.4.lmhra2.down_proj.weight', 'visual.transformer.resblocks.4.lmhra2.down_proj.bias', 'visual.transformer.resblocks.4.lmhra2.conv.weight', 'visual.transformer.resblocks.4.lmhra2.conv.bias', 'visual.transformer.resblocks.4.lmhra2.up_proj.weight', 'visual.transformer.resblocks.4.lmhra2.up_proj.bias', 'visual.transformer.resblocks.5.lmhra1.ln.weight', 'visual.transformer.resblocks.5.lmhra1.ln.bias', 'visual.transformer.resblocks.5.lmhra1.down_proj.weight', 'visual.transformer.resblocks.5.lmhra1.down_proj.bias', 'visual.transformer.resblocks.5.lmhra1.conv.weight', 'visual.transformer.resblocks.5.lmhra1.conv.bias', 'visual.transformer.resblocks.5.lmhra1.up_proj.weight', 'visual.transformer.resblocks.5.lmhra1.up_proj.bias', 'visual.transformer.resblocks.5.lmhra2.ln.weight', 'visual.transformer.resblocks.5.lmhra2.ln.bias', 'visual.transformer.resblocks.5.lmhra2.down_proj.weight', 'visual.transformer.resblocks.5.lmhra2.down_proj.bias', 'visual.transformer.resblocks.5.lmhra2.conv.weight', 'visual.transformer.resblocks.5.lmhra2.conv.bias', 'visual.transformer.resblocks.5.lmhra2.up_proj.weight', 'visual.transformer.resblocks.5.lmhra2.up_proj.bias', 'visual.transformer.resblocks.6.lmhra1.ln.weight', 'visual.transformer.resblocks.6.lmhra1.ln.bias', 'visual.transformer.resblocks.6.lmhra1.down_proj.weight', 'visual.transformer.resblocks.6.lmhra1.down_proj.bias', 'visual.transformer.resblocks.6.lmhra1.conv.weight', 'visual.transformer.resblocks.6.lmhra1.conv.bias', 'visual.transformer.resblocks.6.lmhra1.up_proj.weight', 'visual.transformer.resblocks.6.lmhra1.up_proj.bias', 'visual.transformer.resblocks.6.lmhra2.ln.weight', 'visual.transformer.resblocks.6.lmhra2.ln.bias', 'visual.transformer.resblocks.6.lmhra2.down_proj.weight', 'visual.transformer.resblocks.6.lmhra2.down_proj.bias', 'visual.transformer.resblocks.6.lmhra2.conv.weight', 'visual.transformer.resblocks.6.lmhra2.conv.bias', 'visual.transformer.resblocks.6.lmhra2.up_proj.weight', 'visual.transformer.resblocks.6.lmhra2.up_proj.bias', 'visual.transformer.resblocks.7.lmhra1.ln.weight', 'visual.transformer.resblocks.7.lmhra1.ln.bias', 'visual.transformer.resblocks.7.lmhra1.down_proj.weight', 'visual.transformer.resblocks.7.lmhra1.down_proj.bias', 'visual.transformer.resblocks.7.lmhra1.conv.weight', 'visual.transformer.resblocks.7.lmhra1.conv.bias', 'visual.transformer.resblocks.7.lmhra1.up_proj.weight', 'visual.transformer.resblocks.7.lmhra1.up_proj.bias', 'visual.transformer.resblocks.7.lmhra2.ln.weight', 'visual.transformer.resblocks.7.lmhra2.ln.bias', 'visual.transformer.resblocks.7.lmhra2.down_proj.weight', 'visual.transformer.resblocks.7.lmhra2.down_proj.bias', 'visual.transformer.resblocks.7.lmhra2.conv.weight', 'visual.transformer.resblocks.7.lmhra2.conv.bias', 'visual.transformer.resblocks.7.lmhra2.up_proj.weight', 'visual.transformer.resblocks.7.lmhra2.up_proj.bias', 'visual.transformer.resblocks.8.lmhra1.ln.weight', 'visual.transformer.resblocks.8.lmhra1.ln.bias', 'visual.transformer.resblocks.8.lmhra1.down_proj.weight', 
'visual.transformer.resblocks.8.lmhra1.down_proj.bias', 'visual.transformer.resblocks.8.lmhra1.conv.weight', 'visual.transformer.resblocks.8.lmhra1.conv.bias', 'visual.transformer.resblocks.8.lmhra1.up_proj.weight', 'visual.transformer.resblocks.8.lmhra1.up_proj.bias', 'visual.transformer.resblocks.8.lmhra2.ln.weight', 'visual.transformer.resblocks.8.lmhra2.ln.bias', 'visual.transformer.resblocks.8.lmhra2.down_proj.weight', 'visual.transformer.resblocks.8.lmhra2.down_proj.bias', 'visual.transformer.resblocks.8.lmhra2.conv.weight', 'visual.transformer.resblocks.8.lmhra2.conv.bias', 'visual.transformer.resblocks.8.lmhra2.up_proj.weight', 'visual.transformer.resblocks.8.lmhra2.up_proj.bias', 'visual.transformer.resblocks.9.lmhra1.ln.weight', 'visual.transformer.resblocks.9.lmhra1.ln.bias', 'visual.transformer.resblocks.9.lmhra1.down_proj.weight', 'visual.transformer.resblocks.9.lmhra1.down_proj.bias', 'visual.transformer.resblocks.9.lmhra1.conv.weight', 'visual.transformer.resblocks.9.lmhra1.conv.bias', 'visual.transformer.resblocks.9.lmhra1.up_proj.weight', 'visual.transformer.resblocks.9.lmhra1.up_proj.bias', 'visual.transformer.resblocks.9.lmhra2.ln.weight', 'visual.transformer.resblocks.9.lmhra2.ln.bias', 'visual.transformer.resblocks.9.lmhra2.down_proj.weight', 'visual.transformer.resblocks.9.lmhra2.down_proj.bias', 'visual.transformer.resblocks.9.lmhra2.conv.weight', 'visual.transformer.resblocks.9.lmhra2.conv.bias', 'visual.transformer.resblocks.9.lmhra2.up_proj.weight', 'visual.transformer.resblocks.9.lmhra2.up_proj.bias', 'visual.transformer.resblocks.10.lmhra1.ln.weight', 'visual.transformer.resblocks.10.lmhra1.ln.bias', 'visual.transformer.resblocks.10.lmhra1.down_proj.weight', 'visual.transformer.resblocks.10.lmhra1.down_proj.bias', 'visual.transformer.resblocks.10.lmhra1.conv.weight', 'visual.transformer.resblocks.10.lmhra1.conv.bias', 'visual.transformer.resblocks.10.lmhra1.up_proj.weight', 'visual.transformer.resblocks.10.lmhra1.up_proj.bias', 'visual.transformer.resblocks.10.lmhra2.ln.weight', 'visual.transformer.resblocks.10.lmhra2.ln.bias', 'visual.transformer.resblocks.10.lmhra2.down_proj.weight', 'visual.transformer.resblocks.10.lmhra2.down_proj.bias', 'visual.transformer.resblocks.10.lmhra2.conv.weight', 'visual.transformer.resblocks.10.lmhra2.conv.bias', 'visual.transformer.resblocks.10.lmhra2.up_proj.weight', 'visual.transformer.resblocks.10.lmhra2.up_proj.bias', 'visual.transformer.resblocks.11.lmhra1.ln.weight', 'visual.transformer.resblocks.11.lmhra1.ln.bias', 'visual.transformer.resblocks.11.lmhra1.down_proj.weight', 'visual.transformer.resblocks.11.lmhra1.down_proj.bias', 'visual.transformer.resblocks.11.lmhra1.conv.weight', 'visual.transformer.resblocks.11.lmhra1.conv.bias', 'visual.transformer.resblocks.11.lmhra1.up_proj.weight', 'visual.transformer.resblocks.11.lmhra1.up_proj.bias', 'visual.transformer.resblocks.11.lmhra2.ln.weight', 'visual.transformer.resblocks.11.lmhra2.ln.bias', 'visual.transformer.resblocks.11.lmhra2.down_proj.weight', 'visual.transformer.resblocks.11.lmhra2.down_proj.bias', 'visual.transformer.resblocks.11.lmhra2.conv.weight', 'visual.transformer.resblocks.11.lmhra2.conv.bias', 'visual.transformer.resblocks.11.lmhra2.up_proj.weight', 'visual.transformer.resblocks.11.lmhra2.up_proj.bias', 'visual.transformer.resblocks.12.lmhra1.ln.weight', 'visual.transformer.resblocks.12.lmhra1.ln.bias', 'visual.transformer.resblocks.12.lmhra1.down_proj.weight', 'visual.transformer.resblocks.12.lmhra1.down_proj.bias', 
'visual.transformer.resblocks.12.lmhra1.conv.weight', 'visual.transformer.resblocks.12.lmhra1.conv.bias', 'visual.transformer.resblocks.12.lmhra1.up_proj.weight', 'visual.transformer.resblocks.12.lmhra1.up_proj.bias', 'visual.transformer.resblocks.12.lmhra2.ln.weight', 'visual.transformer.resblocks.12.lmhra2.ln.bias', 'visual.transformer.resblocks.12.lmhra2.down_proj.weight', 'visual.transformer.resblocks.12.lmhra2.down_proj.bias', 'visual.transformer.resblocks.12.lmhra2.conv.weight', 'visual.transformer.resblocks.12.lmhra2.conv.bias', 'visual.transformer.resblocks.12.lmhra2.up_proj.weight', 'visual.transformer.resblocks.12.lmhra2.up_proj.bias', 'visual.transformer.resblocks.13.lmhra1.ln.weight', 'visual.transformer.resblocks.13.lmhra1.ln.bias', 'visual.transformer.resblocks.13.lmhra1.down_proj.weight', 'visual.transformer.resblocks.13.lmhra1.down_proj.bias', 'visual.transformer.resblocks.13.lmhra1.conv.weight', 'visual.transformer.resblocks.13.lmhra1.conv.bias', 'visual.transformer.resblocks.13.lmhra1.up_proj.weight', 'visual.transformer.resblocks.13.lmhra1.up_proj.bias', 'visual.transformer.resblocks.13.lmhra2.ln.weight', 'visual.transformer.resblocks.13.lmhra2.ln.bias', 'visual.transformer.resblocks.13.lmhra2.down_proj.weight', 'visual.transformer.resblocks.13.lmhra2.down_proj.bias', 'visual.transformer.resblocks.13.lmhra2.conv.weight', 'visual.transformer.resblocks.13.lmhra2.conv.bias', 'visual.transformer.resblocks.13.lmhra2.up_proj.weight', 'visual.transformer.resblocks.13.lmhra2.up_proj.bias', 'visual.transformer.resblocks.14.lmhra1.ln.weight', 'visual.transformer.resblocks.14.lmhra1.ln.bias', 'visual.transformer.resblocks.14.lmhra1.down_proj.weight', 'visual.transformer.resblocks.14.lmhra1.down_proj.bias', 'visual.transformer.resblocks.14.lmhra1.conv.weight', 'visual.transformer.resblocks.14.lmhra1.conv.bias', 'visual.transformer.resblocks.14.lmhra1.up_proj.weight', 'visual.transformer.resblocks.14.lmhra1.up_proj.bias', 'visual.transformer.resblocks.14.lmhra2.ln.weight', 'visual.transformer.resblocks.14.lmhra2.ln.bias', 'visual.transformer.resblocks.14.lmhra2.down_proj.weight', 'visual.transformer.resblocks.14.lmhra2.down_proj.bias', 'visual.transformer.resblocks.14.lmhra2.conv.weight', 'visual.transformer.resblocks.14.lmhra2.conv.bias', 'visual.transformer.resblocks.14.lmhra2.up_proj.weight', 'visual.transformer.resblocks.14.lmhra2.up_proj.bias', 'visual.transformer.resblocks.15.lmhra1.ln.weight', 'visual.transformer.resblocks.15.lmhra1.ln.bias', 'visual.transformer.resblocks.15.lmhra1.down_proj.weight', 'visual.transformer.resblocks.15.lmhra1.down_proj.bias', 'visual.transformer.resblocks.15.lmhra1.conv.weight', 'visual.transformer.resblocks.15.lmhra1.conv.bias', 'visual.transformer.resblocks.15.lmhra1.up_proj.weight', 'visual.transformer.resblocks.15.lmhra1.up_proj.bias', 'visual.transformer.resblocks.15.lmhra2.ln.weight', 'visual.transformer.resblocks.15.lmhra2.ln.bias', 'visual.transformer.resblocks.15.lmhra2.down_proj.weight', 'visual.transformer.resblocks.15.lmhra2.down_proj.bias', 'visual.transformer.resblocks.15.lmhra2.conv.weight', 'visual.transformer.resblocks.15.lmhra2.conv.bias', 'visual.transformer.resblocks.15.lmhra2.up_proj.weight', 'visual.transformer.resblocks.15.lmhra2.up_proj.bias', 'visual.transformer.resblocks.16.lmhra1.ln.weight', 'visual.transformer.resblocks.16.lmhra1.ln.bias', 'visual.transformer.resblocks.16.lmhra1.down_proj.weight', 'visual.transformer.resblocks.16.lmhra1.down_proj.bias', 'visual.transformer.resblocks.16.lmhra1.conv.weight', 
'visual.transformer.resblocks.16.lmhra1.conv.bias', 'visual.transformer.resblocks.16.lmhra1.up_proj.weight', 'visual.transformer.resblocks.16.lmhra1.up_proj.bias', 'visual.transformer.resblocks.16.lmhra2.ln.weight', 'visual.transformer.resblocks.16.lmhra2.ln.bias', 'visual.transformer.resblocks.16.lmhra2.down_proj.weight', 'visual.transformer.resblocks.16.lmhra2.down_proj.bias', 'visual.transformer.resblocks.16.lmhra2.conv.weight', 'visual.transformer.resblocks.16.lmhra2.conv.bias', 'visual.transformer.resblocks.16.lmhra2.up_proj.weight', 'visual.transformer.resblocks.16.lmhra2.up_proj.bias', 'visual.transformer.resblocks.17.lmhra1.ln.weight', 'visual.transformer.resblocks.17.lmhra1.ln.bias', 'visual.transformer.resblocks.17.lmhra1.down_proj.weight', 'visual.transformer.resblocks.17.lmhra1.down_proj.bias', 'visual.transformer.resblocks.17.lmhra1.conv.weight', 'visual.transformer.resblocks.17.lmhra1.conv.bias', 'visual.transformer.resblocks.17.lmhra1.up_proj.weight', 'visual.transformer.resblocks.17.lmhra1.up_proj.bias', 'visual.transformer.resblocks.17.lmhra2.ln.weight', 'visual.transformer.resblocks.17.lmhra2.ln.bias', 'visual.transformer.resblocks.17.lmhra2.down_proj.weight', 'visual.transformer.resblocks.17.lmhra2.down_proj.bias', 'visual.transformer.resblocks.17.lmhra2.conv.weight', 'visual.transformer.resblocks.17.lmhra2.conv.bias', 'visual.transformer.resblocks.17.lmhra2.up_proj.weight', 'visual.transformer.resblocks.17.lmhra2.up_proj.bias', 'visual.transformer.resblocks.18.lmhra1.ln.weight', 'visual.transformer.resblocks.18.lmhra1.ln.bias', 'visual.transformer.resblocks.18.lmhra1.down_proj.weight', 'visual.transformer.resblocks.18.lmhra1.down_proj.bias', 'visual.transformer.resblocks.18.lmhra1.conv.weight', 'visual.transformer.resblocks.18.lmhra1.conv.bias', 'visual.transformer.resblocks.18.lmhra1.up_proj.weight', 'visual.transformer.resblocks.18.lmhra1.up_proj.bias', 'visual.transformer.resblocks.18.lmhra2.ln.weight', 'visual.transformer.resblocks.18.lmhra2.ln.bias', 'visual.transformer.resblocks.18.lmhra2.down_proj.weight', 'visual.transformer.resblocks.18.lmhra2.down_proj.bias', 'visual.transformer.resblocks.18.lmhra2.conv.weight', 'visual.transformer.resblocks.18.lmhra2.conv.bias', 'visual.transformer.resblocks.18.lmhra2.up_proj.weight', 'visual.transformer.resblocks.18.lmhra2.up_proj.bias', 'visual.transformer.resblocks.19.lmhra1.ln.weight', 'visual.transformer.resblocks.19.lmhra1.ln.bias', 'visual.transformer.resblocks.19.lmhra1.down_proj.weight', 'visual.transformer.resblocks.19.lmhra1.down_proj.bias', 'visual.transformer.resblocks.19.lmhra1.conv.weight', 'visual.transformer.resblocks.19.lmhra1.conv.bias', 'visual.transformer.resblocks.19.lmhra1.up_proj.weight', 'visual.transformer.resblocks.19.lmhra1.up_proj.bias', 'visual.transformer.resblocks.19.lmhra2.ln.weight', 'visual.transformer.resblocks.19.lmhra2.ln.bias', 'visual.transformer.resblocks.19.lmhra2.down_proj.weight', 'visual.transformer.resblocks.19.lmhra2.down_proj.bias', 'visual.transformer.resblocks.19.lmhra2.conv.weight', 'visual.transformer.resblocks.19.lmhra2.conv.bias', 'visual.transformer.resblocks.19.lmhra2.up_proj.weight', 'visual.transformer.resblocks.19.lmhra2.up_proj.bias', 'visual.transformer.resblocks.20.lmhra1.ln.weight', 'visual.transformer.resblocks.20.lmhra1.ln.bias', 'visual.transformer.resblocks.20.lmhra1.down_proj.weight', 'visual.transformer.resblocks.20.lmhra1.down_proj.bias', 'visual.transformer.resblocks.20.lmhra1.conv.weight', 'visual.transformer.resblocks.20.lmhra1.conv.bias', 
'visual.transformer.resblocks.20.lmhra1.up_proj.weight', 'visual.transformer.resblocks.20.lmhra1.up_proj.bias', 'visual.transformer.resblocks.20.lmhra2.ln.weight', 'visual.transformer.resblocks.20.lmhra2.ln.bias', 'visual.transformer.resblocks.20.lmhra2.down_proj.weight', 'visual.transformer.resblocks.20.lmhra2.down_proj.bias', 'visual.transformer.resblocks.20.lmhra2.conv.weight', 'visual.transformer.resblocks.20.lmhra2.conv.bias', 'visual.transformer.resblocks.20.lmhra2.up_proj.weight', 'visual.transformer.resblocks.20.lmhra2.up_proj.bias', 'visual.transformer.resblocks.21.lmhra1.ln.weight', 'visual.transformer.resblocks.21.lmhra1.ln.bias', 'visual.transformer.resblocks.21.lmhra1.down_proj.weight', 'visual.transformer.resblocks.21.lmhra1.down_proj.bias', 'visual.transformer.resblocks.21.lmhra1.conv.weight', 'visual.transformer.resblocks.21.lmhra1.conv.bias', 'visual.transformer.resblocks.21.lmhra1.up_proj.weight', 'visual.transformer.resblocks.21.lmhra1.up_proj.bias', 'visual.transformer.resblocks.21.lmhra2.ln.weight', 'visual.transformer.resblocks.21.lmhra2.ln.bias', 'visual.transformer.resblocks.21.lmhra2.down_proj.weight', 'visual.transformer.resblocks.21.lmhra2.down_proj.bias', 'visual.transformer.resblocks.21.lmhra2.conv.weight', 'visual.transformer.resblocks.21.lmhra2.conv.bias', 'visual.transformer.resblocks.21.lmhra2.up_proj.weight', 'visual.transformer.resblocks.21.lmhra2.up_proj.bias', 'visual.transformer.resblocks.22.lmhra1.ln.weight', 'visual.transformer.resblocks.22.lmhra1.ln.bias', 'visual.transformer.resblocks.22.lmhra1.down_proj.weight', 'visual.transformer.resblocks.22.lmhra1.down_proj.bias', 'visual.transformer.resblocks.22.lmhra1.conv.weight', 'visual.transformer.resblocks.22.lmhra1.conv.bias', 'visual.transformer.resblocks.22.lmhra1.up_proj.weight', 'visual.transformer.resblocks.22.lmhra1.up_proj.bias', 'visual.transformer.resblocks.22.lmhra2.ln.weight', 'visual.transformer.resblocks.22.lmhra2.ln.bias', 'visual.transformer.resblocks.22.lmhra2.down_proj.weight', 'visual.transformer.resblocks.22.lmhra2.down_proj.bias', 'visual.transformer.resblocks.22.lmhra2.conv.weight', 'visual.transformer.resblocks.22.lmhra2.conv.bias', 'visual.transformer.resblocks.22.lmhra2.up_proj.weight', 'visual.transformer.resblocks.22.lmhra2.up_proj.bias', 'visual.transformer.resblocks.23.lmhra1.ln.weight', 'visual.transformer.resblocks.23.lmhra1.ln.bias', 'visual.transformer.resblocks.23.lmhra1.down_proj.weight', 'visual.transformer.resblocks.23.lmhra1.down_proj.bias', 'visual.transformer.resblocks.23.lmhra1.conv.weight', 'visual.transformer.resblocks.23.lmhra1.conv.bias', 'visual.transformer.resblocks.23.lmhra1.up_proj.weight', 'visual.transformer.resblocks.23.lmhra1.up_proj.bias', 'visual.transformer.resblocks.23.lmhra2.ln.weight', 'visual.transformer.resblocks.23.lmhra2.ln.bias', 'visual.transformer.resblocks.23.lmhra2.down_proj.weight', 'visual.transformer.resblocks.23.lmhra2.down_proj.bias', 'visual.transformer.resblocks.23.lmhra2.conv.weight', 'visual.transformer.resblocks.23.lmhra2.conv.bias', 'visual.transformer.resblocks.23.lmhra2.up_proj.weight', 'visual.transformer.resblocks.23.lmhra2.up_proj.bias'], unexpected_keys=[])
train_step_per_epoch: 11250
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
load checkpoint from /home/lbx/MyHome/pretrained_model_weights/mPLUG-2/mPLUG2_MSRVTT_Caption.pth
<All keys matched successfully>
Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
Start training
[{'video_id': 'video9770', 'pred_caption': 'a boy is fixing a computer', 'gold_caption': 'a person is connecting something to system'}, {'video_id': 'video7026', 'pred_caption': 'a man is talking about a car', 'gold_caption': 'a man is giving a review on a vehicle'}, {'video_id': 'video9778', 'pred_caption': 'a boy is performing on the voice', 'gold_caption': 'a little boy singing in front of judges and crowd'}, {'video_id': 'video9772', 'pred_caption': 'a cartoon character is flying', 'gold_caption': 'some cartoon characters are moving around an area'}]
Generate Caption test result: [ 0/63] eta: 0:08:24 time: 8.0067 data: 5.3014 max mem: 16941
Generate Caption test result: [50/63] eta: 0:00:17 time: 1.1697 data: 0.0001 max mem: 17413
Generate Caption test result: [62/63] eta: 0:00:01 time: 1.1289 data: 0.0001 max mem: 17413
Generate Caption test result: Total time: 0:01:21 (1.2926 s / it)
result file saved to output/videocaption_msrvtt_4/result/caption_result_zeroshot.json
1000 {'Bleu_1': 0.2391483871053033, 'Bleu_2': 0.1397145198812077, 'Bleu_3': 0.08582614908051771, 'Bleu_4': 0.0554141450685924, 'CIDEr': 0.6409439525382706}
Training time 0:01:23