Description
I want to feed the model a screenshot of a webpage/app and have it detect UI components in the image, such as icons, textboxes, etc., but it almost never succeeds. The only case that works is prompt="icon", where the model detects a reasonable number of icons/images. With any other prompt, almost nothing is detected. I'd like to know whether the model simply hasn't been trained on this kind of data, or whether my prompts are poorly written. One case stands out: with prompt="icon. textbox", adding just one more word (textbox) causes almost nothing to be detected, even though "icon" alone works.
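
A minimal sketch of the kind of zero-shot detection call described above, assuming the Hugging Face Grounding DINO interface purely for illustration (the checkpoint name, image path, and thresholds are placeholders, not necessarily what was actually run):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

# Placeholder checkpoint and image path for illustration only.
checkpoint = "IDEA-Research/grounding-dino-tiny"
image = Image.open("screenshot.png").convert("RGB")

processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModelForZeroShotObjectDetection.from_pretrained(checkpoint)

# Period-separated class prompt, e.g. "icon." alone vs. "icon. textbox."
text_prompt = "icon. textbox."

inputs = processor(images=image, text=text_prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Thresholds are guesses; lowering them may surface more boxes.
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.35,
    text_threshold=0.25,
    target_sizes=[image.size[::-1]],
)

for box, score, label in zip(results[0]["boxes"], results[0]["scores"], results[0]["labels"]):
    print(label, round(score.item(), 3), [round(v) for v in box.tolist()])
```

With prompt "icon." alone some boxes come back, but with "icon. textbox." almost nothing does.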