FlySearch: Exploring how vision-language models explore

A benchmark for evaluating vision-language models in simulated 3D, outdoor, photorealistic environments. Easy for humans, hard for state-of-the-art VLMs / MLLMs.

Leaderboard & Docs | Paper

Abstract

The real world is messy and unstructured. Uncovering critical information often requires active, goal-driven exploration. It remains to be seen whether Vision-Language Models (VLMs), which recently emerged as a popular zero-shot tool in many difficult tasks, can operate effectively in such conditions. In this paper, we answer this question by introducing FlySearch, a 3D, outdoor, photorealistic environment for searching and navigating to objects in complex scenes. We define three sets of scenarios with varying difficulty and observe that state-of-the-art VLMs cannot reliably solve even the simplest exploration tasks, with the gap to human performance increasing as the tasks get harder. We identify a set of central causes, ranging from vision hallucination, through context misunderstanding, to task planning failures, and we show that some of them can be addressed by finetuning. We publicly release the benchmark, scenarios, and the underlying codebase.

Preview

Objective: locate a fire, environment: city, model: GPT-4o.

flyserach-slow.hevc.mp4

Dependencies

Hardware

We recommend using a machine with at least 32GB of RAM and a modern CPU (e.g. AMD Ryzen 7 5800X3D or Intel i7 13700K). A ray-tracing capable GPU is required to run the Unreal Engine 5 (UE5) binaries. We've tested the benchmark on NVIDIA RTX 4060 and 4080 GPUs, as well NVIDIA A100. Vulkan drivers need to be installed for the GPU to work with UE5. Make sure you have at least 60GB free storage space (preferable SSD or RAM-cached HDD).

Operating system

We've verified that the benchmark works on Ubuntu 22.04, Archlinux (2025), and Rocky Linux 9.6, but it should work on any modern Linux distribution.

Unreal Engine 5 supports Windows and macOS as well, but we haven't tested the benchmark on these operating systems, nor provide compiled binaries for them. You will need to compile the UE5 environments yourself if you want to run the benchmark on Windows or macOS.

Python environment

We suggest you use Python 3.12 and then install dependencies using uv (https://docs.astral.sh/uv/) - it will automatically install all requirements when first running FlySearch.

`.env` file

Before proceeding, you need to create a .env file in the root directory of this repository. We've provided a template for it in the file .env-example. In other words, you should run:

cp .env-example .env

You will need to edit the .env file so that it contains your local variables (e.g. API keys and URLs).

Benchmark binaries

FlySearch will automatically download Unreal Engine binaries on Linux. We do not provide pre-compiled simulator for other platforms, so you will need to build it yourself (see documentation).

Running FlySearch

You can run FlySearch using

uv run flysearch.py --model-backend <name of model backend> --model-name <model name string> benchmark <scenario template set>

Scenario sets are located in the run_templates directory

See uv run flyserach.py --help for a list of all options.

Result analysis

To read how to analyse logs from FlySearch, please see the documentation.

Notes

UE5 binary crashes

The UE5 binary can sometimes spontaneously crash, usually when generating new scenarios. The code is designed to handle this in most situations (we've modified UnrealCV's code to do so). In rare case the entire benchmark crashes you just need to remove failed episode logs and restart the script. Furthermore, in case where your code was terminated by UnrealDiedException please open an issue here with a stack trace (or email us with it).

License and legal

Our benchmark code is released under the MIT License.

FlySearch uses Unreal® Engine. Unreal® is a trademark or registered trademark of Epic Games, Inc. in the United States of America and elsewhere. See Unreal Engine EULA for more information https://www.unrealengine.com/en-US/eula/unreal.

Citation

If you use FlySearch in your research, please cite the following paper:

@inproceedings{pardyl2025flysearch,
  title        = {{FlySearch: Exploring how vision-language models explore}},
  author       = {Pardyl, Adam and Matuszek, Dominik and Przebieracz, Mateusz and Cygan, Marek and Zieliński, Bartosz and Wołczyk, Maciej},
  year         = 2025,
  booktitle    = {{Advances in Neural Information Processing Systems (NeurIPS)}},
  volume       = 39
}

Name		Name	Last commit message	Last commit date
Latest commit History 646 Commits
.github/workflows		.github/workflows
all_logs		all_logs
analysis		analysis
conversation		conversation
docs		docs
glimpse_generators		glimpse_generators
misc		misc
prompts		prompts
response_parsers		response_parsers
rl		rl
run_templates		run_templates
scenarios		scenarios
web		web
.env-example		.env-example
.gitignore		.gitignore
.python-version		.python-version
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
flysearch.py		flysearch.py
locations_city.csv		locations_city.csv
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

FlySearch: Exploring how vision-language models explore

Leaderboard & Docs | Paper

Abstract

Preview

Dependencies

Hardware

Operating system

Python environment

`.env` file

Benchmark binaries

Running FlySearch

Result analysis

Notes

UE5 binary crashes

License and legal

Citation

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors 4

Uh oh!

Languages

License

gmum/FlySearch

Folders and files

Latest commit

History

Repository files navigation

FlySearch: Exploring how vision-language models explore

Leaderboard & Docs | Paper

Abstract

Preview

Dependencies

Hardware

Operating system

Python environment

.env file

Benchmark binaries

Running FlySearch

Result analysis

Notes

UE5 binary crashes

License and legal

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors 4

Uh oh!

Languages

`.env` file

Packages