
Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval

Chunxu Liu, Jiyuan Yang, Ruopeng Gao, Yuhan Zhu, Feng Zhu, Rui Zhao, Limin Wang
Nanjing University, SenseTime Research

Overview

TL;DR. We introduce the Reasoning Guided Embeddings (RGE) model, which leverages an MLLM's structured reasoning during embedding extraction: generated rationales are combined with contrastive training to produce more context-aware representations and improve embedding quality.
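As a rough illustration of the contrastive training mentioned above, the following is a minimal, self-contained sketch of an InfoNCE-style objective, where each query's positive target sits at the same batch index. All names, shapes, and the temperature value are illustrative only; the actual RGE training code may differ.

```python
import numpy as np

def info_nce_loss(query_emb, target_emb, temperature=0.05):
    """One-directional InfoNCE: the positive for query i is target i."""
    # L2-normalize so dot products are cosine similarities
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    t = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
    logits = q @ t.T / temperature  # (B, B) similarity matrix
    # numerically stable log-softmax over each row;
    # the diagonal holds the positive pairs
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
queries = rng.standard_normal((4, 8))
# near-duplicates of the queries stand in for matched targets
targets = queries + 0.01 * rng.standard_normal((4, 8))
loss = info_nce_loss(queries, targets)
```

With well-aligned positives the diagonal dominates each row, so the loss approaches zero; mismatched pairs drive it up.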


We propose Reasoning Guided Embeddings (RGE), which explicitly incorporates a model’s generative reasoning into the embedding extraction process. RGE first prompts the MLLM to produce structured rationales conditioned on the instruction. After the reasoning unfolds, the model extracts embeddings enriched with context-dependent inference signals. Experiments on the MMEB benchmark show that this reasoning guided approach significantly boosts retrieval performance.

Environment Setup

Please make sure the installed transformers version is compatible (see requirements.txt).

conda create -n sorce python=3.11
conda activate sorce
pip install -r requirements.txt

Evaluation Dataset Preparation

Please download the MMEB evaluation dataset from Hugging Face.

mkdir datasets
huggingface-cli download --repo-type dataset --resume-download TIGER-Lab/MMEB-eval --local-dir ./datasets/MMEB-eval

Evaluation

To evaluate the model, please first download the MMEB evaluation dataset and our pretrained model from 🤗 Hugging Face:

huggingface-cli download --resume-download lcrocks/RGE --local-dir ./models/RGE
huggingface-cli download --repo-type dataset --resume-download TIGER-Lab/MMEB-eval --local-dir ./datasets/MMEB-eval

After downloading the evaluation dataset and model, please run the following command for evaluation.

bash scripts/evaluate.sh

Experiment Results

(Results figure: see the repository for the full MMEB benchmark comparison.)

Citation

If you find this project helpful in your research or applications, please feel free to leave a star ⭐️ and cite our paper:


@misc{liu2025reasoningguidedembeddingsleveraging,
      title={Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval}, 
      author={Chunxu Liu and Jiyuan Yang and Ruopeng Gao and Yuhan Zhu and Feng Zhu and Rui Zhao and Limin Wang},
      year={2025},
      eprint={2511.16150},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.16150}, 
}

License and Acknowledgement

This project is released under the Apache 2.0 license. The code is based on MoCa; please also follow its license. Thanks for their awesome work!

We also thank all contributors to this project and Qingyun for the valuable discussions!
