This repository contains training code for the Gemamba multimodal language model.
Gemamba is the first multimodal LLM to combine a Mamba-based video encoder with the performant and flexible Gemma transformer LLM in a LLaVA-style architecture.
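At a high level, the video encoder's features are projected into the LLM's embedding space and consumed as ordinary tokens. The sketch below illustrates this LLaVA-style wiring only; the class name, attribute names, two-layer MLP projector, and dimensions are illustrative assumptions, not the repository's actual modules (see `llava/model` for the real implementation).

```python
# LLaVA-style wiring sketch. GemambaSketch, the attribute names, the MLP
# projector, and the default dimensions are illustrative assumptions; see
# llava/model in this repository for the real implementation.
import torch
import torch.nn as nn

class GemambaSketch(nn.Module):
    def __init__(self, video_encoder, llm, vision_dim=576, llm_dim=2048):
        super().__init__()
        self.video_encoder = video_encoder  # Mamba-based video encoder (VideoMamba)
        # Projector mapping visual features into the LLM embedding space.
        self.projector = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        self.llm = llm  # Gemma decoder-only transformer

    def forward(self, video_frames, text_embeds):
        # video_frames: (batch, frames, channels, height, width)
        visual_feats = self.video_encoder(video_frames)  # (batch, tokens, vision_dim)
        visual_tokens = self.projector(visual_feats)     # (batch, tokens, llm_dim)
        # Visual tokens are spliced into the text embedding sequence
        # (prepended here for brevity) and processed by the LLM as ordinary tokens.
        inputs_embeds = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs_embeds)
```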
We recommend using Dev Containers to create the environment from the pre-made configuration.
- Install PyTorch.
- Install Python dependencies:
  ```bash
  pip3 install -r requirements.txt
  ```
- Install VideoMamba dependencies:
  ```bash
  pip3 install -e llava/model/multimodal_encoder/videomamba/causal-conv1d
  pip3 install -e llava/model/multimodal_encoder/videomamba/mamba
  ```
- [optional] Update transformers to get Phi3 support:
  ```bash
  pip3 install git+https://github.com/huggingface/transformers
  ```
- Download pretrained weights for VideoMamba:
  ```bash
  wget https://huggingface.co/OpenGVLab/VideoMamba/resolve/main/videomamba_m16_25M_f8_res224.pth
  ```
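If you want to verify the download, a minimal sanity check such as the one below can be used. The assumed checkpoint layout (a raw state dict, or a dict wrapping one under a `model` key) is a guess, not something specified by the repository.

```python
# Optional sanity check of the downloaded encoder weights. The layout of the
# checkpoint (raw state dict vs. a dict with a "model" key) is an assumption.
import torch

ckpt = torch.load("videomamba_m16_25M_f8_res224.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"loaded {len(state_dict)} tensors, e.g. {next(iter(state_dict))}")
```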
- Refer to `run_finetune.ipynb` to learn how to load a checkpoint and run inference; a rough outline of the flow is also sketched below.
A pretrained checkpoint for the model can be found here: HF 🤗.
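For orientation only, the outline below shows the general shape of such an inference call. The `videos=` keyword, the frame-tensor shape, and the generation settings are hypothetical placeholders; `run_finetune.ipynb` shows the actual API.

```python
# Rough outline of checkpoint loading and inference. The `videos=` keyword and
# the preprocessing shape are hypothetical; run_finetune.ipynb shows the real calls.
import torch

@torch.no_grad()
def answer(model, tokenizer, video_frames, prompt, device="cuda"):
    # video_frames: preprocessed frame tensor, e.g. (1, num_frames, 3, H, W)
    model = model.to(device).eval()
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(
        **inputs,
        videos=video_frames.to(device),  # hypothetical keyword for the visual input
        max_new_tokens=256,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```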
- The model's projector has been pretrained for 1 epoch on the Valley dataset.
- The LLM and the projector have been jointly fine-tuned using the Video-ChatGPT dataset.
We inherit most of the training workflow from the original LLaVA. Please refer to `scripts/train` to see the configurations used for training the model. See `scripts/eval` for the scripts used to calculate benchmark scores.
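As a rough illustration of the two stages above, the sketch below keeps only the projector trainable in stage 1 and unfreezes the LLM for the joint stage 2 fine-tuning. The attribute prefixes and the assumption that the video encoder stays frozen (as in standard LLaVA training) are ours; the actual freezing logic and hyperparameters are defined by the configurations in `scripts/train`.

```python
# Hypothetical illustration of the two-stage schedule; the real freezing logic
# and hyperparameters are defined by the configurations in scripts/train.
def configure_stage(model, stage):
    if stage == "pretrain_projector":   # stage 1: Valley, projector only
        trainable_prefixes = ("projector",)
    elif stage == "finetune":           # stage 2: Video-ChatGPT, projector + LLM
        trainable_prefixes = ("projector", "llm")
    else:
        raise ValueError(f"unknown stage: {stage}")
    # The video encoder is assumed to stay frozen in both stages, as in LLaVA.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefixes)
```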