[FEATURE REQUEST] Enable Video Training #305

@simplaj

Is your feature request related to a problem? Please describe.
I have been actively using this repository for multimodal training involving images and text. It has been incredibly helpful for my research and development. However, I am interested in expanding the capabilities to include video-based multimodal training. Currently, the repository does not support video inputs, which limits the scope of applications that can be developed.

Describe the workflow you want to enable.
I would like to enable a workflow where video data can be seamlessly integrated into the existing multimodal training pipeline. This would involve handling video frames as sequential data and allowing the model to learn from both visual and textual information extracted from videos.

Describe your proposed solution.
To address this, I propose extending the current data handling pipeline so that it accepts video inputs and processes their frames alongside the existing image and text data.
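A minimal sketch of one possible first step, assuming each video is reduced to a fixed number of evenly spaced frames before it enters the pipeline. The function name `sample_frame_indices` and its parameters are hypothetical and not part of the existing codebase. Decoding the frames and feeding them to the model are left open here.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick frame indices spread evenly across a clip (hypothetical helper)."""
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    # Short clips: keep every frame rather than repeating any.
    if num_samples >= total_frames:
        return list(range(total_frames))
    # Split the clip into num_samples equal segments and take each midpoint.
    segment = total_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


if __name__ == "__main__":
    # A 100-frame clip reduced to 4 frames.
    print(sample_frame_indices(100, 4))
```

Midpoint sampling keeps the temporal order of the frames and avoids always picking the first and last frame, which are often fades or title cards. Other strategies, such as random sampling within each segment, would fit the same interface.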

Describe alternatives you've considered
An alternative solution could be to preprocess videos externally into a sequence of images and then feed these images into the existing image-based pipeline. However, this approach may not fully leverage the temporal information present in videos, and the preprocessing step could introduce additional complexity.
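To illustrate this alternative, here is a rough sketch that groups externally extracted frame files back into per-video sequences in temporal order. It assumes a hypothetical layout of one directory per clip with a frame number at the end of each file name, for example `clip01/frame_12.jpg`. The helper name `group_frames_by_clip` is also an assumption.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath


def group_frames_by_clip(paths: list[str]) -> dict[str, list[str]]:
    """Group frame paths by parent directory, ordered by frame number."""
    clips: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        # Assumed naming scheme: the file stem ends with the frame number.
        match = re.search(r"(\d+)$", path.stem)
        if match is None:
            continue  # Skip files that do not look like frames.
        clips[path.parent.name].append((int(match.group(1)), p))
    # Sort numerically so that frame_2 comes before frame_10.
    return {clip: [p for _, p in sorted(frames)] for clip, frames in clips.items()}


if __name__ == "__main__":
    frames = ["a/frame_10.jpg", "a/frame_2.jpg", "b/frame_1.jpg"]
    print(group_frames_by_clip(frames))
```

Even with the ordering preserved, each frame would still pass through the image pipeline as an independent image, which is the limitation described above.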

Additional context
Supporting video inputs could significantly enhance the repository's utility for a wider range of applications, such as video captioning, action recognition, and video question answering.

Are you willing to help implement this feature?
Yes, I am very keen to contribute to this feature. I have experience in handling video data and training multimodal models. I expect it might take a few weeks to implement and test the feature, depending on the complexity. I would appreciate any guidance or support from the OpenFlamingo team to ensure seamless integration with the existing codebase.
