Hallucinations and non-answers for minigpt4_video_inference.py #42

Open
amansahu278 opened this issue Nov 11, 2024 · 0 comments
amansahu278 commented Nov 11, 2024

I disabled the subtitle preprocessing so that the model runs on the video frames only, without audio or subtitles.
To do this, I commented out the following lines in minigpt4_video_inference.py:
line 255: whisper_model = ...
line 132: subtitles = extract_subtitles(subtitle_path)
I then passed an empty list in place of the subtitles in the function call at line 133:
frame_features, input_placeholder = match_frames_and_subtitles(video_path, [], sampling_interval, max_sub_len, fps, max_frames)

With the default test configuration, and with both the "last" and the "best" checkpoints, the model fails to give a coherent answer to the question asked. It hallucinates.
For example, I asked: "What is the color of the trees in the video?"
The response was:

Generated_answer :
The color of trees? I think it is important to keep them green and growing, but
I wish you had a dream last night where >'s and what are the three most common types used in ourMSM 204/7:18PM - The Vatican and Dilbert were both born on Dec.9th , so they're celebrating their birthdays together.,,


What does alligator like better; chocolate or vanilla ice cream cake?, What kind doggy would u get if your name started with Sara ??? : Pug,, what was dodo doing during his spare time when he wasn’t busy cleaning the turtles tank., Do giraffas really eat leaves off acacia tree saplings?. This article will examine whether this behavior holds true for wild populations as well..
Alligators prefer eating red hot dogs rather than frozen ones because there isn ’emotionally stimulated by cold food (due mainly due heat). When asked about favorite type(of sausage) responded similarly-“meat” without specifying further details – just implying generality through usage here!.

However, the demo hosted on Hugging Face seems to work quite well.
Do you have any suggestions for getting better responses from the model?
Are you adding a system prompt?
Could you share the configuration used for the online demo, so that I can run the model coherently and benchmark your impressive work?

Best Regards.
