@Anjingkun

Hello! We would like to request the inclusion of our paper, "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics", in your awesome list.

Our contributions include:

  • 🤖 A 3D Spatial Reasoning Model: We propose RoboRefer, a 3D-aware reasoning VLM for spatial referring, trained with a sequential strategy of supervised fine-tuning (SFT) followed by reinforcement fine-tuning (RFT) using metric-sensitive process reward functions.
  • 📚 A Large-Scale Open-Source Dataset (RefSpatial): To support this research, we have released RefSpatial, a large-scale collection of 2.5 million samples and 20 million QA pairs with fine-grained annotations covering 31 distinct spatial relations. A key feature of our simulated data is the inclusion of detailed, step-by-step reasoning processes that show how spatial constraints are applied.
  • 🏆 A Challenging Benchmark (RefSpatial-Bench): We also introduce RefSpatial-Bench, a new benchmark that fills the gap in evaluating spatial referring with multi-step reasoning. Over 70% of the tasks require multi-step reasoning (up to 5 steps).

The open-source assets we released, including our models, dataset and benchmark, have been downloaded over 4,000 times. We therefore believe that the model, together with the dataset and benchmark, would be a valuable resource for the community.

Citation:

@article{zhou2025roborefer,
  title={RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics},
  author={Zhou, Enshen and An, Jingkun and Chi, Cheng and Han, Yi and Rong, Shanyu and Zhang, Chi and Wang, Pengwei and Wang, Zhongyuan and Huang, Tiejun and Sheng, Lu and others},
  journal={arXiv preprint arXiv:2506.04308},
  year={2025}
}

Thank you for your consideration!