Chunxu Liu,
Jiyuan Yang,
Ruopeng Gao,
Yuhan Zhu,
Feng Zhu,
Rui Zhao,
Limin Wang,
Nanjing University, SenseTime Research
TL;DR. We introduce Reasoning Guided Embeddings (RGE), a model that takes advantage of MLLMs’ structured reasoning during embedding extraction: generated rationales are combined with contrastive training to produce more context-aware representations and improve embedding quality.
We propose Reasoning Guided Embeddings (RGE), which explicitly incorporates a model’s generative reasoning into the embedding extraction process. RGE first prompts the MLLM to produce structured rationales conditioned on the instruction. After the reasoning unfolds, the model extracts embeddings enriched with context-dependent inference signals. Experiments on the MMEB benchmark show that this reasoning-guided approach significantly boosts retrieval performance.
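As a rough illustration of the idea, the sketch below first lets the model generate a rationale conditioned on the instruction and query, then pools a hidden state from the sequence that includes that rationale. The backbone name, prompt template, and last-token pooling are placeholder assumptions for illustration only, not the exact recipe used by RGE.

# Minimal sketch of reasoning-guided embedding extraction (illustrative only).
# The backbone, prompt template, and pooling strategy are assumptions,
# not the actual RGE configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder text-only backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def reasoning_guided_embed(instruction: str, query: str) -> torch.Tensor:
    # 1) Generate a structured rationale conditioned on the retrieval instruction.
    prompt = f"Instruction: {instruction}\nQuery: {query}\nReason step by step, then summarize:"
    inputs = tokenizer(prompt, return_tensors="pt")
    generated = model.generate(**inputs, max_new_tokens=64, do_sample=False)

    # 2) Re-encode prompt + rationale and pool the final hidden state, so the
    #    embedding is conditioned on the generated reasoning.
    outputs = model(generated, output_hidden_states=True)
    embedding = outputs.hidden_states[-1][:, -1, :]  # last-token pooling (assumed)
    return torch.nn.functional.normalize(embedding, dim=-1)

emb = reasoning_guided_embed("Retrieve an image that matches the caption.",
                             "A dog catching a frisbee in the park.")
print(emb.shape)  # (1, hidden_size)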
Please make sure your installed transformers version is compatible with requirements.txt.
conda create -n sorce python=3.11
conda activate sorce
pip install -r requirements.txt
Please download the MMEB evaluation dataset from Hugging Face.
mkdir datasets
huggingface-cli download --repo-type dataset --resume-download TIGER-Lab/MMEB-eval --local-dir ./datasets/MMEB-eval
To evaluate the model, please first download the MMEB evaluation dataset and our pretrained model from 🤗 Hugging Face:
huggingface-cli download --resume-download lcrocks/RGE --local-dir ./models/RGE
huggingface-cli download --repo-type dataset --resume-download TIGER-Lab/MMEB-eval --local-dir ./datasets/MMEB-eval
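Equivalently, if you prefer to script the downloads, the huggingface_hub Python API can fetch the same repositories (the local paths below mirror the commands above):

# Optional: download the model and dataset with huggingface_hub instead of the CLI.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="lcrocks/RGE", local_dir="./models/RGE")
snapshot_download(repo_id="TIGER-Lab/MMEB-eval", repo_type="dataset", local_dir="./datasets/MMEB-eval")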
After downloading the evaluation dataset and model, please run the following command for evaluation.
bash scripts/evaluate.sh
If you find this project helpful for your research or applications, please feel free to leave a star ⭐️ and cite our paper:
@misc{liu2025reasoningguidedembeddingsleveraging,
  title={Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval},
  author={Chunxu Liu and Jiyuan Yang and Ruopeng Gao and Yuhan Zhu and Feng Zhu and Rui Zhao and Limin Wang},
  year={2025},
  eprint={2511.16150},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.16150},
}
This project is released under the Apache 2.0 license. The code is based on MoCa; please also follow their license. Thanks for their awesome work!
We also thank all contributors to this project and Qingyun for the valuable discussions!

