
InterfaceAgent: a versatile framework designed to create system and interface agents capable of managing mobile and desktop applications and features.

francedot/Interface-Agent

TypeScript Node 20 LTS MIT License

InterfaceAgent Screenshot

🤔 What is InterfaceAgent?

Welcome to InterfaceAgent, a versatile framework designed to create system and interface agents capable of managing mobile and desktop applications and features.

Here are the key capabilities of InterfaceAgent:

  • Planning & Goal Refinement: The agent builds multi-step plans that span several applications to fulfill a user request, and it refines those plans based on user feedback during the evaluation phase.

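  • Action Prediction (Pure Visual / Textual / Set-of-Mark Visual Prompting): To predict the next action more accurately, InterfaceAgent can use a pure visual approach that predicts screen coordinates, a pure textual analysis of the DOM, or set-of-mark visual prompting, in which interactive elements are labeled on the screenshot so the model can refer to them by mark.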

  • Mixture of Models: InterfaceAgent works with both GPT-4V and Claude models, which can determine the next step directly from page screenshots.

  • Resilient Error Handling: Errors are an inherent part of AI agents, so InterfaceAgent retries failed operations with exponential backoff. This lets the agent recover from temporary failures and keep making progress.
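Set-of-mark prompting is easier to picture with the bookkeeping spelled out. The following is a minimal TypeScript sketch, not InterfaceAgent's implementation: it assumes the interactive elements and their bounding boxes have already been detected, numbers them, builds a text legend to send alongside the marked screenshot, and maps a model reply such as `CLICK [2]` back to a click point. The `UIElement` and `Mark` types, the function names, and the `CLICK [n]` reply format are all hypothetical, and drawing the marks onto the image is omitted.

```typescript
// Hypothetical shape of an interactive element already detected on screen.
interface UIElement {
  role: string;   // e.g. "button", "textbox"
  name: string;   // accessible name or visible text
  x: number;      // left edge in screen pixels
  y: number;      // top edge in screen pixels
  width: number;
  height: number;
}

// A numeric mark attached to one element (hypothetical type).
interface Mark {
  id: number;
  element: UIElement;
}

// Number the elements 1..n in the order they were detected.
function assignMarks(elements: UIElement[]): Mark[] {
  return elements.map((element, i) => ({ id: i + 1, element }));
}

// Text legend sent with the marked screenshot, one line per mark.
function buildLegend(marks: Mark[]): string {
  return marks
    .map((m) => `[${m.id}] ${m.element.role} "${m.element.name}"`)
    .join("\n");
}

// Parse an assumed reply format like "CLICK [2]" and return the center
// of the marked element, or null if the reply or mark is not recognized.
function resolveClick(reply: string, marks: Mark[]): { x: number; y: number } | null {
  const match = /CLICK\s*\[(\d+)\]/i.exec(reply);
  if (!match) return null;
  const mark = marks.find((m) => m.id === Number(match[1]));
  if (!mark) return null;
  const { x, y, width, height } = mark.element;
  return { x: x + width / 2, y: y + height / 2 };
}

// Usage with two hand-written elements.
const marks = assignMarks([
  { role: "button", name: "Get", x: 100, y: 200, width: 80, height: 40 },
  { role: "textbox", name: "Search", x: 20, y: 50, width: 300, height: 30 },
]);
console.log(buildLegend(marks));
// [1] button "Get"
// [2] textbox "Search"
console.log(resolveClick("CLICK [2]", marks)); // { x: 170, y: 65 }
```

Referring to elements by short numeric labels spares the model from predicting raw pixel coordinates, which is the accuracy benefit the set-of-mark approach aims for.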
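Retrying with exponential backoff can be sketched in a few lines of TypeScript, assuming the operation being retried (for example, a model call or a UI action) is an async function. `withRetry`, `RetryOptions`, `backoffDelay`, and the delay values are hypothetical names and numbers for illustration, not InterfaceAgent's API.

```typescript
// Hypothetical retry settings; values in the usage below are illustrative.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // cap on any single delay
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before retry number `retry` (0-based): baseDelayMs * 2^retry, capped.
function backoffDelay(retry: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** retry, opts.maxDelayMs);
}

// Run `op`, retrying on failure with exponentially growing delays.
// Rethrows the last error once all attempts are exhausted.
async function withRetry<T>(op: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Only wait if another attempt will follow.
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}

// Usage: an operation that fails twice with a transient error, then succeeds.
let calls = 0;
const flaky = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error("transient failure");
  return "ok";
};

withRetry(flaky, { maxAttempts: 5, baseDelayMs: 10, maxDelayMs: 100 })
  .then((result) => console.log(result, calls)); // ok 3
```

Doubling the delay gives a transient problem, such as a rate limit, progressively more time to clear, while `maxDelayMs` bounds the worst-case wait. Production code often adds random jitter to each delay so that many clients do not retry in lockstep.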

InterfaceAgent's OS-specific agents extend the core toolkit with automation for their target platform:

  • Preview of iOS Agents: Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your iOS device.
  • Preview of Windows Agents: Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your Windows 11 device.
  • Preview of Appium Android Agents (Coming soon): Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your Android device.
  • Playwright-based Web Agents (Coming soon): Learn how to build Web AI Agent Companions.

💻 Getting Started

To install InterfaceAgent, either clone the repository or use npm, yarn, or pnpm.

🎬 Demos

Windows

1) User Query: Help me download an app named EdgeTile

EdgeTile demo

2) User Query: Dropshipping products on TikTok

TikTok demo

iOS

User Query: Help me prepare for a 30 days of fitness challenge

30 days of fitness demo

🚀 Challenges and Focus

InterfaceAgent still struggles with long-horizon planning and with the accuracy of selector inference. Current work focuses on making InterfaceAgent agents more stable.

🤓 Contributing

We welcome contributions. Please follow the standard fork-and-pull-request workflow.

🛂 License

InterfaceAgent is licensed under the MIT License.

🚑 Support

For support, questions, or feature requests, open an issue in the GitHub repository.
