Chunxu Liu,
Jiyuan Yang,
Ruopeng Gao,
Yuhan Zhu,
Feng Zhu,
Rui Zhao,
Limin Wang,
Nanjing University, SenseTime Research
TL;DR. We introduce Reasoning Guided Embeddings (RGE), a model that takes advantage of MLLMs’ structured reasoning during embedding extraction: generated rationales are combined with contrastive training to produce more context-aware representations and improve embedding quality.
We propose Reasoning Guided Embeddings (RGE), which explicitly incorporates a model’s generative reasoning into the embedding extraction process. RGE first prompts the MLLM to produce structured rationales conditioned on the instruction. After the reasoning unfolds, the model extracts embeddings enriched with context-dependent inference signals. Experiments on the MMEB benchmark show that this reasoning-guided approach significantly boosts retrieval performance.
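As a rough illustration of the idea, the sketch below first lets the model generate a rationale conditioned on the instruction and query, then pools a hidden state from the sequence that includes that rationale. The backbone name, prompt template, and last-token pooling are placeholder assumptions for illustration only, not the exact recipe used by RGE.

# Minimal sketch of reasoning-guided embedding extraction (illustrative only).
# The backbone, prompt template, and pooling strategy are assumptions,
# not the actual RGE configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder text-only backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def reasoning_guided_embed(instruction: str, query: str) -> torch.Tensor:
    # 1) Generate a structured rationale conditioned on the retrieval instruction.
    prompt = f"Instruction: {instruction}\nQuery: {query}\nReason step by step, then summarize:"
    inputs = tokenizer(prompt, return_tensors="pt")
    generated = model.generate(**inputs, max_new_tokens=64, do_sample=False)

    # 2) Re-encode prompt + rationale and pool the final hidden state, so the
    #    embedding is conditioned on the generated reasoning.
    outputs = model(generated, output_hidden_states=True)
    embedding = outputs.hidden_states[-1][:, -1, :]  # last-token pooling (assumed)
    return torch.nn.functional.normalize(embedding, dim=-1)

emb = reasoning_guided_embed("Retrieve an image that matches the caption.",
                             "A dog catching a frisbee in the park.")
print(emb.shape)  # (1, hidden_size)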
Please make sure your installed transformers version is compatible with requirements.txt.
conda create -n sorce python=3.11
conda activate sorce
pip install -r requirements.txt
Please download the MMEB evaluation dataset from Hugging Face.
mkdir datasets
huggingface-cli download --repo-type dataset --resume-download TIGER-Lab/MMEB-eval --local-dir ./datasets/MMEB-eval
To evaluate the model, please first download the MMEB evaluation dataset and our pretrained model from 🤗 Hugging Face:
huggingface-cli download --resume-download lcrocks/RGE --local-dir ./models/RGE
huggingface-cli download --repo-type dataset --resume-download TIGER-Lab/MMEB-eval --local-dir ./datasets/MMEB-eval
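Equivalently, if you prefer to script the downloads, the huggingface_hub Python API can fetch the same repositories (the local paths below mirror the commands above):

# Optional: download the model and dataset with huggingface_hub instead of the CLI.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="lcrocks/RGE", local_dir="./models/RGE")
snapshot_download(repo_id="TIGER-Lab/MMEB-eval", repo_type="dataset", local_dir="./datasets/MMEB-eval")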
After downloading the evaluation dataset and model, please run the following command for evaluation.
bash scripts/evaluate.sh
If you find this project helpful for your research or applications, please feel free to leave a star ⭐️ and cite our paper:
@misc{liu2025reasoningguidedembeddingsleveraging,
  title={Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval},
  author={Chunxu Liu and Jiyuan Yang and Ruopeng Gao and Yuhan Zhu and Feng Zhu and Rui Zhao and Limin Wang},
  year={2025},
  eprint={2511.16150},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.16150},
}
This project is released under the Apache 2.0 license. The code is based on MoCa; please also follow their license. Thanks for their awesome work!
We also thank all contributors to this project and Qingyun for the valuable discussions!

