Implement OmniParser #894

@abrichr

Feature request

We want to integrate OmniParser (https://huggingface.co/microsoft/OmniParser) into a ReplayStrategy (e.g. #888).

Motivation

OmniParser is designed to convert an unstructured screenshot image into a structured list of elements, including the locations of interactable regions and captions describing each icon's potential functionality.
OmniParser is intended for settings where users are already trained in responsible analytic approaches and critical reasoning is expected. It can extract information from a screenshot, but human judgement is still needed to evaluate its output.
OmniParser is intended to be used on a variety of screenshots, from both PC and phone, and across a variety of applications.
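
To make the "structured list of elements" concrete, here is a minimal sketch of how a ReplayStrategy might consume such a list. Everything in it is an assumption for illustration: the `ScreenElement` schema, the normalized bounding boxes, and the `find_element` and `click_point` helpers are hypothetical and do not reflect OmniParser's actual output format or any existing API in this repository.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ScreenElement:
    """One parsed element (hypothetical schema, not OmniParser's real output)."""

    # (left, top, right, bottom), assumed here to be normalized to [0, 1]
    bbox: Tuple[float, float, float, float]
    caption: str
    interactable: bool


def find_element(
    elements: List[ScreenElement], keyword: str
) -> Optional[ScreenElement]:
    """Return the first interactable element whose caption contains keyword."""
    needle = keyword.lower()
    for element in elements:
        if element.interactable and needle in element.caption.lower():
            return element
    return None


def click_point(element: ScreenElement, width: int, height: int) -> Tuple[int, int]:
    """Convert the center of a normalized bounding box to pixel coordinates."""
    left, top, right, bottom = element.bbox
    return (
        round((left + right) / 2 * width),
        round((top + bottom) / 2 * height),
    )


# Hand-written example data standing in for a parsed screenshot.
elements = [
    ScreenElement(bbox=(0.0, 0.0, 1.0, 0.25), caption="Window title bar", interactable=False),
    ScreenElement(bbox=(0.25, 0.5, 0.75, 1.0), caption="Save document icon", interactable=True),
]

target = find_element(elements, "save")
if target is not None:
    x, y = click_point(target, width=1000, height=800)
    print(f"Would click at ({x}, {y})")
```

A strategy built this way would still need a human (or a separate verification step) to confirm that the matched element is the intended one, in line with the caveat above about human judgement.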

Metadata

Labels: enhancement (New feature or request)
