long videos inference error

Hello, when loading the model for inference, it was found that the inference results on short videos meet expectations, but on long videos (the example provided in the project), the inference results produce a series of special characters.

this is my example

![image](https://github.com/user-attachments/assets/02d684e3-b1ce-4d4b-8eb0-9060e330c6af)