The paper states: "For both classification tasks, we leverage clips of 16 frames, spanning temporally for 1 second, with a spatial dimension of 224×398 pixels. Specifically, the clips contain 8 frames before the foul and 8 frames after the foul." However, the data annotations do not include timestamps indicating when each foul occurred. Could you please share the details of how the 1-second clips were extracted?
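To make the question concrete, here is a minimal sketch of how I read the quoted description. It is only my assumption, not the authors' method: `clip_frame_indices` and `foul_frame` are hypothetical names, the foul frame index is exactly the value missing from the annotations, the foul frame itself is counted among the 8 "after" frames, and the video is assumed to be already sampled at 16 fps so that 16 frames span 1 second.

```python
def clip_frame_indices(foul_frame, frames_before=8, frames_after=8):
    """Return the frame indices of a clip centred on a foul.

    foul_frame is a hypothetical input: the index of the frame where the
    foul occurs. The annotations do not appear to provide it.
    """
    start = foul_frame - frames_before
    if start < 0:
        raise ValueError("not enough frames before the foul")
    # Assumption: the foul frame is the first of the "after" frames,
    # so the clip covers [foul_frame - 8, foul_frame + 8).
    return list(range(start, foul_frame + frames_after))


# Example: a foul at frame 75 of a video sampled at 16 fps.
indices = clip_frame_indices(75)
print(len(indices), indices[0], indices[-1])
```

If the clips were built differently (for example, a fixed offset within each video, or a different frame rate with subsampling), knowing that would answer the question.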