[FEATURE REQUEST] Enable Video Training

**Is your feature request related to a problem? Please describe.**
I have been actively using this repository for multimodal training involving images and text. It has been incredibly helpful for my research and development. However, I am interested in expanding the capabilities to include video-based multimodal training. Currently, the repository does not support video inputs, which limits the scope of applications that can be developed.

**Describe the workflow you want to enable.**
I would like to enable a workflow where video data can be seamlessly integrated into the existing multimodal training pipeline. This would involve handling video frames as sequential data and allowing the model to learn from both visual and textual information extracted from videos.

**Describe your proposed solution.**
I propose extending the current data handling pipeline so that it accepts video inputs and processes them as ordered sequences of frames, rather than as independent images.
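As a rough illustration of the frame-handling step, here is a minimal sketch of uniform temporal sampling, which picks a fixed number of frames spread evenly across a clip. The function name `sample_frame_indices` and the sampling strategy are my own assumptions for discussion, not something the repository currently provides or prescribes.

```python
from typing import List


def sample_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick `num_samples` frame indices spread evenly over a clip.

    Hypothetical helper: the clip is split into `num_samples` equal-length
    segments and the centre frame of each segment is selected.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    # Short clips: keep every frame instead of repeating any of them.
    if num_samples >= num_frames:
        return list(range(num_frames))
    segment = num_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


if __name__ == "__main__":
    print(sample_frame_indices(100, 4))
```

The selected frames could then be decoded and stacked in order along a frame dimension before being passed to the vision encoder. How that dimension maps onto the existing model inputs is something I would want to confirm with the maintainers.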

**Describe alternatives you've considered**
An alternative solution could be to preprocess videos externally into a sequence of images and then feed these images into the existing image-based pipeline. However, this approach may not fully leverage the temporal information present in videos, and the preprocessing step could introduce additional complexity.
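For reference, the external preprocessing route might look like the sketch below, which only builds an `ffmpeg` command that dumps frames at a fixed rate. The helper name `build_frame_extraction_command`, the output file pattern, and the default rate are assumptions chosen for illustration.

```python
from pathlib import Path
from typing import List


def build_frame_extraction_command(
    video_path: str, out_dir: str, fps: float = 1.0
) -> List[str]:
    """Build an ffmpeg command that writes frames of `video_path` as JPEGs.

    Hypothetical helper: frames are sampled at `fps` frames per second and
    written to `out_dir` as frame_00001.jpg, frame_00002.jpg, and so on.
    """
    pattern = str(Path(out_dir) / "frame_%05d.jpg")
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", pattern]


if __name__ == "__main__":
    print(build_frame_extraction_command("clip.mp4", "frames"))
```

The command could be executed with `subprocess.run(cmd, check=True)` once `ffmpeg` is installed and the output directory exists. Every video then becomes a folder of loose images, and the frame order and timing have to be tracked separately, which is the added complexity mentioned above.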

**Additional context**
Supporting video inputs could significantly enhance the repository's utility for a wider range of applications, such as video captioning, action recognition, and video question answering. 

**Are you willing to help implement this feature?**
Yes, I am very keen to contribute to this feature. I have experience in handling video data and training multimodal models. I expect it might take a few weeks to implement and test the feature, depending on the complexity. I would appreciate any guidance or support from the OpenFlamingo team to ensure seamless integration with the existing codebase.


[FEATURE REQUEST] Enable Video Training #305
