This curated list collects key methods, datasets, and benchmarks for spatial intelligence in vision-language models (VLMs).

As multimodal models mature, evaluating and improving their spatial intelligence has become an active research frontier. This list gives researchers and engineers a quick index for tracking the latest advances in the field.

Contributions are welcome! If you find a good resource, please submit it via Pull Request.

| Title | Introduction | Date | Code |
|---|---|---|---|
| Do Multimodal Language Models Really Understand Direction? A Benchmark for Compass Direction Reasoning | - | 2024-12 | - |
| GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning | - | 2024-07 | - |