LLMs are increasingly deployed as agents: systems that plan, reason, and dynamically call external tools. In visual reasoning, however, prior approaches remain largely constrained by predefined workflows and static toolsets. In this report, we present PyVision, an interactive, multi-turn framework that enables MLLMs to autonomously generate, execute, and refine Python-based tools tailored to the task at hand, unlocking flexible and interpretable problem-solving. We develop a taxonomy of the tools created by PyVision and analyze their usage across a diverse set of benchmarks. Quantitatively, PyVision delivers consistent performance gains, boosting GPT-4.1 by +7.8% on V* and Claude-4.0-Sonnet by +31.1% on VLMsAreBlind-mini. These results point to a broader shift: dynamic tooling lets models not just use tools but invent them, advancing toward more agentic visual reasoning.
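To make the idea concrete, here is a minimal, hypothetical sketch of the kind of on-the-fly tool the backbone MLLM might write during a visual-search turn (the function name, crop box, and scale factor are illustrative, not taken from the report):

from PIL import Image

def zoom_in(image_path: str, box: tuple[int, int, int, int], scale: int = 4) -> Image.Image:
    """Crop a region of interest and upsample it so fine details become legible."""
    img = Image.open(image_path)
    crop = img.crop(box)  # (left, upper, right, lower) in pixels
    return crop.resize((crop.width * scale, crop.height * scale), Image.LANCZOS)

# The model would choose the box itself after inspecting the full image,
# then reason over the returned crop in the next turn.
zoomed = zoom_in("./test_data/one_image_demo.png", box=(120, 80, 360, 260))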
- [2025-7-8] 🚀🚀🚀 We are excited to release PyVision, including the technical report, code, and online demo.
Set up the running environment, both for the main process and for the code-execution runtime.
git clone https://github.com/agents-x-project/PyVision.git
cd PyVision
conda create -n pyvision python=3.10
conda activate pyvision
pip install -r requirements.txt
Before running PyVision, you first need to set up an API config file containing the key and the base_url. We provide three types of clients: OpenAI, Azure, and vLLM.
# ./api_config_files/api_config_openai.json
{
"api_key": [
"sk-xxx"
],
"base_url": "xxx"
}
# ./api_config_files/api_config_azure.json
{
"azure_openai_api_key": [
"xxx"
],
"azure_openai_endpoint": "xxx"
}
# ./api_config_files/api_config_vllm.json
{
"api_key": [
"xxx"
],
"base_url": "xxx"
}
If you have set up the OpenAI API config file, you can run the run.sh file.
# openai client
python main.py \
--image_path ./test_data/one_image_demo.png \
--question "What is the color of the liquid contained in the glass on the table?" \
--api_config ./api_config_files/api_config_openai.json \
--client_type openai \
--prompt_template ./prompt_template/prompt_template_vis.json \
--prompt vistool_with_img_info_v2 \
--exe_code \
--max_tokens 10000 \
--temperature 0.6 \
--output_dir ./test_data \
--save_messages
After running run.sh, the generated message file is stored at ./test_data/test_message.json.
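If you want to inspect the saved trace locally before uploading it, the file can be read like any JSON file. A small sketch, assuming it stores an OpenAI-style list of role/content messages (the exact schema may differ):

import json

# Load the saved multi-turn trace and print a compact role/preview view of each message.
with open("./test_data/test_message.json") as f:
    messages = json.load(f)

for m in messages:
    content = m.get("content", "")
    preview = content if isinstance(content, str) else str(content)
    print(f'{m.get("role", "?"):>10}: {preview[:80]}')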
Upload the message file to our hosted Hugging Face visualization space: visualization demo.
@article{zhao2025pyvision,
title={PyVision: Agentic Vision with Dynamic Tooling},
author={Zhao, Shitian and Zhang, Haoquan and Lin, Shaoheng and Li, Ming and Wu, Qilong and Zhang, Kaipeng and Wei, Chen},
journal={arXiv preprint arXiv:2507.07998},
year={2025},
}