This repository contains the code for running the model from the paper TinyClick: Single-Turn Agent for Empowering GUI Automation.
We present a single-turn agent for graphical user interface (GUI) interaction tasks, built on the Vision-Language Model Florence-2-Base. The agent's main goal is to click on the desired UI element based on a screenshot and a user command. It demonstrates strong performance on Screenspot and OmniAct while maintaining a compact size of 0.27B parameters and minimal latency.
Before running, set up the environment and install the required packages:
pip install -r requirements.txt
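If you want to keep the dependencies isolated, one option is to install them inside a Python virtual environment, for example:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt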
To run an example inference with TinyClick, use this command:
python3 main.py --image-path "<PATH>" --text "<COMMAND>"
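You can also run inference programmatically. The snippet below is a minimal sketch, assuming the checkpoint is available on Hugging Face under the ID Samsung/TinyClick and that it follows the standard Florence-2 processor and generate interface; the model ID, prompt handling, and output parsing are assumptions here, so refer to main.py for the exact pipeline.

```python
# Minimal programmatic inference sketch. Assumptions (not taken from this repo):
# the checkpoint is published on Hugging Face as "Samsung/TinyClick" and uses the
# standard Florence-2 processor interface. See main.py for the exact prompt
# construction and output parsing.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "Samsung/TinyClick"  # assumed model ID
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

image = Image.open("screenshot.png").convert("RGB")  # the GUI screenshot
command = "click on the search button"               # the user command

# Florence-2-style models take the text prompt and the image in one processor call.
inputs = processor(text=command, images=image, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

# The decoded text encodes the predicted action (e.g. a click and its target location);
# the exact output format and coordinate parsing depend on the model, see main.py.
print(processor.batch_decode(output_ids, skip_special_tokens=False)[0])
```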
If you find TinyClick useful, please cite the paper:
@misc{pawlowski2024tinyclicksingleturnagentempowering,
      title={TinyClick: Single-Turn Agent for Empowering GUI Automation},
      author={Pawel Pawlowski and Krystian Zawistowski and Wojciech Lapacz and Marcin Skorupa and Adam Wiacek and Sebastien Postansque and Jakub Hoscilowicz},
      year={2024},
      eprint={2410.11871},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
      url={https://arxiv.org/abs/2410.11871},
}
This code is released under the MIT license. See the LICENSE file in this repository for details.