Hi authors, I appreciate your amazing work. I am using InternVideo2-1B as the backbone for my work and want to train a CLIP model on our customized dataset. Could you provide the scripts for fine-tuning a CLIP model from the InternVideo2-1B checkpoint, or any instructions on how to do that? Thank you so much.