I disabled the subtitle preprocessing, so that the model runs only on the video, without audio or subtitles.
I did this by commenting out line 255 (whisper_model = ...) and line 132 (subtitles = extract_subtitles(subtitle_path)), and by passing an empty list in the function call at line 133:
frame_features, input_placeholder = match_frames_and_subtitles(video_path, [], sampling_interval, max_sub_len, fps, max_frames)
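The change described above can be sketched as a small wrapper. This is only an illustration: prepare_video_only_input is a hypothetical helper name that does not exist in the repository, and the matching function is passed in as a parameter so the sketch makes no claim about how match_frames_and_subtitles is actually implemented.

```python
from typing import Any, Callable, List, Tuple


def prepare_video_only_input(
    match_fn: Callable[..., Tuple[Any, Any]],
    video_path: str,
    sampling_interval: int,
    max_sub_len: int,
    fps: int,
    max_frames: int,
) -> Tuple[Any, Any]:
    """Hypothetical helper: run the frame/subtitle matching step without subtitles.

    match_fn is assumed to have the same argument order as the call quoted
    in this issue (video_path, subtitles, sampling_interval, max_sub_len,
    fps, max_frames).
    """
    # No Whisper model is loaded and no subtitle file is read;
    # the subtitle list is simply empty.
    subtitles: List[Any] = []
    return match_fn(
        video_path, subtitles, sampling_interval, max_sub_len, fps, max_frames
    )
```

In the real script, match_fn would be match_frames_and_subtitles and the remaining arguments would come from the default test configuration.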
With the default test configuration and both the "last" and "best" checkpoints, the model fails to produce a coherent answer to the question asked; it hallucinates.
For example, the question asked is "What is the color of the trees in the video?"
The response is
Generated_answer :
The color of trees? I think it is important to keep them green and growing, but I wish you had a dream last night where >'s and what are the three most common types used in ourMSM 204/7:18PM - The Vatican and Dilbert were both born on Dec.9th , so they're celebrating their birthdays together.,,
What does alligator like better; chocolate or vanilla ice cream cake?, What kind doggy would u get if your name started with Sara ??? : Pug,, what was dodo doing during his spare time when he wasn’t busy cleaning the turtles tank., Do giraffas really eat leaves off acacia tree saplings?. This article will examine whether this behavior holds true for wild populations as well..
Alligators prefer eating red hot dogs rather than frozen ones because there isn ’emotionally stimulated by cold food (due mainly due heat). When asked about favorite type(of sausage) responded similarly-“meat” without specifying further details – just implying generality through usage here!.
However, the demo hosted on Hugging Face seems to work quite well.
Are there any suggestions for getting better responses from the model?
Is there a system prompt that you are adding?
Could you let me know the configuration used for the online demo, so that I can run the model coherently and benchmark your impressive work?
Best Regards.