The official implementation of Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models (CVPR 2025)
git clone https://github.com/lntzm/HICom.git
cd HICom
conda create -n hicom python==3.10
conda activate hicom
conda install pytorch==2.4.1 torchvision==0.19.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install numpy==1.26.4
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
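After installation, a quick sanity check (a minimal sketch, assuming the environment created above) confirms that the CUDA build of PyTorch and flash-attn are importable:
# Verify PyTorch sees a GPU and flash-attn was built correctly
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"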
We put all our training and evaluation data and models under the playground folder. The structure is as follows (a setup sketch follows the tree):
playground
├── data
│   ├── eval_image -> /.../LLaVA/playground/data/eval  # Link LLaVA eval folder here
│   ├── eval_video
│   │   ├── Activitynet_Zero_Shot_QA
│   │   ├── EgoSchema
│   │   ├── MLVU
│   │   ├── MSRVTT_Zero_Shot_QA
│   │   ├── MSVD_Zero_Shot_QA
│   │   ├── MVBench
│   │   ├── Video-ChatGPT-eval
│   │   └── Video-MME
│   ├── Ins-VL
│   ├── LLaVA-Instruct-150K
│   ├── LLaVA-Pretrain
│   └── Video_Mix_Instruct
│       ├── Charades
│       ├── CLEVER
│       ├── LLaVA-Hound
│       ├── LLaVA-Video-178K
│       ├── m4_instruct_videos
│       ├── mit_action
│       ├── NTU-RGB-D
│       ├── ssv2-cls
│       ├── TVQA
│       └── Video-ChatGPT-0525
└── models
    ├── Qwen2.5-0.5B-Instruct
    ├── Qwen2.5-1.5B-Instruct
    ├── Qwen2.5-7B-Instruct
    └── siglip-so400m-patch14-384
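The commands below are a minimal sketch of one way to prepare this layout. The LLaVA eval path is a placeholder; only the two base-model repo IDs (Qwen/Qwen2.5-7B-Instruct and google/siglip-so400m-patch14-384) are real Hugging Face identifiers. Point the data folders at wherever your datasets actually live.
# Create the expected skeleton
mkdir -p playground/data/eval_video playground/data/Video_Mix_Instruct playground/models
# Link an existing LLaVA eval folder (placeholder path)
ln -s /path/to/LLaVA/playground/data/eval playground/data/eval_image
# Download the base models into playground/models (requires the huggingface_hub CLI)
huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir playground/models/Qwen2.5-7B-Instruct
huggingface-cli download google/siglip-so400m-patch14-384 --local-dir playground/models/siglip-so400m-patch14-384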
Training scripts are under the scripts/qwen2.5_7B folder.
bash scripts/qwen2.5_7B/release/directg_local43_global32.sh
We release our trained checkpoint on Hugging Face. It performs slightly better than the reported results, as we re-organized the code, fixed some bugs, upgraded the environment, and re-trained the model with the text encoder unfrozen.
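As a sketch, the checkpoint can be fetched with the Hugging Face CLI; the repo ID below is a placeholder, so substitute the actual one from our Hugging Face page.
# Placeholder repo ID -- replace with the released HICom checkpoint
CKPT_REPO=your-org/HICom-qwen2.5-7B
huggingface-cli download $CKPT_REPO --local-dir playground/models/HICom-qwen2.5-7B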
Video evaluation scripts are under the scripts/eval/video folder.
# videomme
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/eval/video/eval_video_mcqa_videomme.sh CKPT_PATH
# mvbench
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/eval/video/eval_video_mcqa_mvbench.sh CKPT_PATH
# egoschema
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/eval/video/eval_video_mcqa_egoschema.sh CKPT_PATH
# mlvu
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/eval/video/eval_video_mcqa_mlvu.sh CKPT_PATH
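To run all four benchmarks in one pass, a simple loop over the same scripts works; this is a sketch assuming CKPT_PATH points at your trained checkpoint directory.
CKPT_PATH=playground/models/HICom-qwen2.5-7B  # placeholder; set this to your checkpoint
for task in videomme mvbench egoschema mlvu; do
    CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/eval/video/eval_video_mcqa_${task}.sh $CKPT_PATH
done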
If you find our work useful for your research and applications, please cite using this BibTeX:
@article{liu2025hybrid,
  title={Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models},
  author={Liu, Zhihang and Xie, Chen-Wei and Li, Pandeng and Zhao, Liming and Tang, Longxiang and Zheng, Yun and Liu, Chuanbin and Xie, Hongtao},
  journal={arXiv preprint arXiv:2503.16036},
  year={2025}
}
The codebase of HICom is adapted from VideoLLaMA 2 and LLaVA-OneVision. We are also grateful to the following projects that HICom builds upon: Qwen2.5, SigLIP, and Panda-70M.
This project is released under the Apache 2.0 license as found in the LICENSE file. The service is a research preview intended for non-commercial use ONLY, subject to the model Licenses of LLaMA and Mistral, Terms of Use of the data generated by OpenAI, and Privacy Practices of ShareGPT. Please get in touch with us if you find any potential violations.