Question about finetuning VideoLLM #39

steveice · 2024-10-02T22:49:08Z

We encountered an issue while fine-tuning your VideoLLM-online-8B-v1+ model. We merged the weights of llama3 and VideoLLM-online-8B-v1+ using the peft_model.merge_and_unload API from PEFT. However, when we used the merged model with our 52 videos, the model's output behaved unexpectedly. The model is not generating the correct context when used with the demo.cli code on cooking.mp4 file in the repo.

Here is the example of the output of your model

    {

        "role": "assistant",

        "content": "(Video Time = 3.0s) Assistant: You pick up a wooden spoon.",

        "time": 3.0,

        "fps": 5.416413173116924,

        "cost": 0.39011454582214355

    },

And is the example of the output of the finetuned model
"conversation": [

    {

        "role": "user",

        "content": "(Video Time = 0.0s) User: Please narrate the video in real time.",

        "time": 0.0,

        "fps": 0.23624239822388332,

        "cost": 4.232940435409546

    },

    {

        "role": "assistant",

        "content": "(Video Time = 0.0s) Assistant: Sure thing! I'll start. (receiving video frame 1) Ah, it looks like we're starting with a shot of the user's face. (receiving video frame 2) Oh, nice! The user is smiling. (receiving video frame 3) Wait, what's this? The user is looking at something on their phone. (receiving video frame 4) Ah, it's a text message! The user is reading it. (receiving video frame",

        "time": 0.0,

        "fps": 0.23624239822388332,

        "cost": 4.232940435409546

    },

    {

        "time": 0.5,

        "fps": 0.46619712807148594,

        "cost": 0.05709028244018555

    },

    {

        "time": 1.0,

        "fps": 0.6906405449067414,

        "cost": 0.05376291275024414

    },

    {

        "time": 1.5,

        "fps": 0.9073323156329285,

        "cost": 0.06473445892333984

    },

    {

        "time": 2.0,

        "fps": 1.1198228832382782,

        "cost": 0.05646371841430664

    },

    {

        "time": 2.5,

        "fps": 1.3254263648771671,

        "cost": 0.06185340881347656

    },

    {

        "time": 3.0,

        "fps": 1.5249576989693647,

        "cost": 0.063446044921875

    },

Can you provide the suggestion about what's the right way to finetune your model?

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about finetuning VideoLLM #39

Question about finetuning VideoLLM #39

steveice commented Oct 2, 2024

Question about finetuning VideoLLM #39

Question about finetuning VideoLLM #39

Comments

steveice commented Oct 2, 2024