Hi, thanks for sharing the code.
I have two questions:
- I extracted video features using the pre-trained model (resnet-34-kinetics-cpu.pth) and checked the outputs: the extracted feature for each segment (16 frames) has 512 dimensions. However, according to your paper, the feature for this model should have 512/2 = 256 dimensions after global average pooling. Please correct me if I am wrong.
- Among the pre-trained models you provide are "resnet-34-kinetics-cpu.pth" and "resnext-101-kinetics.pth". Why is the file size of the latter smaller than that of the former? To my understanding, the latter model should have more trainable parameters (more filters/feature channels).
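
For context on the second question, here is a minimal sketch of how 3D convolution parameter counts scale with kernel size and grouping. It is only arithmetic, not a reading of the released checkpoints: the channel widths (512, 2048, 1024) and the group count (32) are assumptions chosen for illustration, the function names are hypothetical, and batch-norm and fully connected parameters are ignored.

```python
def conv3d_params(c_in, c_out, kernel, groups=1):
    """Weight count of a bias-free 3D convolution with a cubic kernel.

    Each output channel only sees c_in / groups input channels,
    so grouping divides the weight count by `groups`.
    """
    return (c_in // groups) * c_out * kernel ** 3


def basic_block_params(channels):
    """Two 3x3x3 convolutions at full width (basic residual block)."""
    return 2 * conv3d_params(channels, channels, 3)


def grouped_bottleneck_params(c_outer, c_mid, groups):
    """1x1x1 reduce, grouped 3x3x3, 1x1x1 expand (ResNeXt-style bottleneck)."""
    return (
        conv3d_params(c_outer, c_mid, 1)
        + conv3d_params(c_mid, c_mid, 3, groups=groups)
        + conv3d_params(c_mid, c_outer, 1)
    )


# Assumed, illustrative widths for the last stage of each network.
print(basic_block_params(512))                    # 14155776
print(grouped_bottleneck_params(2048, 1024, 32))  # 5079040
```

Under these assumed widths, a single grouped bottleneck block holds fewer weights than a single basic block, because only the grouped convolution uses the costly 3x3x3 kernel. So a deeper network is not necessarily a larger file, but whether this fully explains the size difference between the two checkpoints would need to be confirmed against the actual model definitions.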
Looking forward to your reply. Thanks in advance.