Welcome to InterfaceAgent, a framework for building system and interface agents that operate mobile and desktop applications and their features.
Here are the key capabilities of InterfaceAgent:
- Planning & Goal Refinement: The agent builds multi-step plans that span multiple applications to fulfill a user request, and it refines those plans based on user feedback during the evaluation phase.
- Action Prediction (Pure Visual / Textual / Set-of-Mark Visual Prompting): InterfaceAgent predicts the next action using a visual, coordinate-based approach, purely textual DOM analysis, or set-of-mark prompting.
- Mixture of Models: InterfaceAgent works with both GPT-4V and Claude models, which determine the next step directly from page screenshots.
- Resilient Error Handling: Errors are an inherent part of AI agents, so InterfaceAgent includes a retry mechanism with exponential backoff. This lets the agent recover from temporary failures and keep making progress.
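To illustrate the retry-with-exponential-backoff idea from the list above, here is a minimal TypeScript sketch. It is not InterfaceAgent's actual implementation: the names `backoffDelayMs`, `sleep`, and `withRetry`, and the default values for attempts and delays, are hypothetical and chosen only for illustration.

```typescript
// Hypothetical helpers for illustration; not part of InterfaceAgent's documented API.

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

/** Resolve after `ms` milliseconds. */
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

/**
 * Run `action`, retrying on failure with exponentially growing delays.
 * Rethrows the last error once `maxAttempts` attempts have failed.
 */
async function withRetry<T>(
  action: () => Promise<T>,
  maxAttempts: number = 4, // assumed default
  baseMs: number = 200,    // assumed default
  maxMs: number = 5000,    // assumed default
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt will follow.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError;
}
```

With these assumed defaults, a flaky step such as a tap or click would be retried after roughly 200 ms, 400 ms, and 800 ms before the error is surfaced. Capping the delay at `maxMs` keeps a long run of failures from stalling the agent indefinitely; adding random jitter is a common refinement that this sketch leaves out.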
InterfaceAgent OS-specific agents extend the core toolkit with advanced automation for the target platform:
- Preview of iOS Agents: Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your iOS device.
- Preview of Windows Agents: Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your Windows 11 device.
- Preview of Appium Android Agents (Coming soon): Explore how your AI Agents can gain access to the ecosystem of apps and functionalities on your Android device.
- Playwright-based Web Agents (Coming soon): Learn how to build Web AI Agent Companions.
You can install InterfaceAgent either by cloning the repository or with npm, yarn, or pnpm.
- For Core, see installation steps.
- For iOS, see installation steps.
- For Windows, see installation steps.
1) User Query: Help me download an app named EdgeTile
2) User Query: Dropshipping products on Tiktok
3) User Query: Help me prepare for a 30 days of fitness challenge
InterfaceAgent still struggles with long-horizon planning and with accurate selector inference. Current work focuses on making its agents more stable.
We welcome contributions. Please follow the standard fork-and-pull-request workflow.
InterfaceAgent is licensed under the MIT License.
For support, questions, or feature requests, open an issue in the GitHub repository.