Super simple early version of a Manus-like agent.
WinterAI can perform tasks generally on your computer.
- Agent: Core agent logic (
Agent/Agent.py
) for decision-making and task execution. - Tools: A collection of tools (
Tools/
) for specific actions like web searching (Tools/BrowserAgent.py
), shell commands (Tools/ShellTool.py
), note-taking (Tools/Notes.py
), and checklist management (Tools/Checklist.py
). - Gemini API: Integration with the Gemini API (
Agent/gemini_api.py
). - Context Management: Handling context and requests/responses (`Agent/Context.py').
To set up and run this project, follow these steps:
-
Rename Secrets File:
- Navigate to the
Agent/
directory. - Rename
secrets.example.txt
tosecrets.txt
.
cd Agent mv secrets.example.txt secrets.txt cd ..
- Navigate to the
-
Add API Keys:
- Open the
Agent/secrets.txt
file. - Add your Gemini API key.
# Example secrets.txt GEMINI_API_KEY=YOUR_GEMINI_API_KEY
- Open the
-
Add Browser Path:
- Open the
Agent/secrets.txt
file. - Add your browser path.
# Example secrets.txt BROWSER_LOCATION=/usr/bin/google-chrome
- Common paths include:
Linux:
/usr/bin/google-chrome
Windows:
C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe
MacOS:
/Applications/Google Chrome.app/Contents/MacOS/Google Chrome
- Open the
-
Install Dependencies:
- Use
pip
to install required dependencies with version ranges specified inrequirements.txt
.
pip install -r requirements.txt
- Use
-
Run the Agent:
- To run the autonomous agent, execute the main script (
main.py
).
python main.py
- to execute with a new agent, run with the -n flag
python main.py -n
- To run the autonomous agent, execute the main script (
After providing an initial prompt, the prompt will be expanded to include many smaller steps. These steps will then be executed by an agent that is capable of using shell commands on your machine, searching the internet, and taking notes.
PLEASE NOTE: Shell commands are inherently unsafe. The program will prompt you to approve (y/n) every shell command that is attempted.
The Agent can also message you to ask for further input.
You also must approve new browser agents.
NOTE: Currently you must have all chrome instances closed before using the program. When the agent asks to start a new research agent, you must close the previous chrome window that was updated.
You can use the bash -n
(new) flag to create a new agent.
You can use the bash -e
(expand) flag to generate a more in depth prompt from your initial prompt.
browser_use used for web agents
@software{browser_use2024,
author = {Müller, Magnus and Žunič, Gregor},
title = {Browser Use: Enable AI to control your browser},
year = {2024},
publisher = {GitHub},
url = {https://github.com/browser-use/browser-use}
}
This project is licensed under the MIT License - see the LICENSE.md file for details.