In this folder, there may exist multiple implementations of Agent
that will be used by the framework.
For example, openhands/agenthub/codeact_agent
, etc.
Contributors from different backgrounds and interests can choose to contribute to any (or all!) of these directions.
The abstraction for an agent can be found here.
Agents are run inside of a loop. At each iteration, agent.step()
is called with a
State input, and the agent must output an Action.
Every agent also has a self.llm
which it can use to interact with the LLM configured by the user.
See the LiteLLM docs for self.llm.completion
.
The state
represents the running state of an agent in the OpenHands system. The class handles saving and restoring the agent session. It is serialized in a pickle.
The State object stores information about:
- Multi-agent state / delegates:
- the 'root task' (conversation between the agent and the user)
- the subtask (conversation between an agent and the user or another agent)
- global and local iterations
- delegate levels for multi-agent interactions
- almost stuck state
- Running state of an agent:
- current agent state (e.g., LOADING, RUNNING, PAUSED)
- traffic control state for rate limiting
- confirmation mode
- the last error encountered
- History:
- start and end IDs for events in agent's history. This allows to retrieve the actions taken by the agent, and observations (e.g. file content, command output) from the current or past sessions.
- Metrics:
- global metrics for the current task
- local metrics for the current subtask
- Extra data:
- additional task-specific data
The agent can add and modify subtasks through the AddTaskAction
and ModifyTaskAction
Here is a list of available Actions, which can be returned by agent.step()
:
CmdRunAction
- Runs a command inside a sandboxed terminalIPythonRunCellAction
- Execute a block of Python code interactively (in Jupyter notebook) and receivesCmdOutputObservation
. Requires setting upjupyter
plugin as a requirement.FileReadAction
- Reads the content of a fileFileWriteAction
- Writes new content to a fileBrowseURLAction
- Gets the content of a URLAddTaskAction
- Adds a subtask to the planModifyTaskAction
- Changes the state of a subtask.AgentFinishAction
- Stops the control loop, allowing the user/delegator agent to enter a new taskAgentRejectAction
- Stops the control loop, allowing the user/delegator agent to enter a new taskAgentFinishAction
- Stops the control loop, allowing the user to enter a new taskMessageAction
- Represents a message from an agent or the user
To serialize and deserialize an action, you can use:
action.to_dict()
to serialize the action to a dictionary to be sent to the UI, including a user-friendly string representation of the messageaction.to_memory()
to serialize the action to a dictionary to be sent to the LLM. It may include raw information, such as the underlying exceptions that occurred during the action execution.action_from_dict(action_dict)
to deserialize the action from a dictionary.
There are also several types of Observations. These are typically available in the step following the corresponding Action. But they may also appear as a result of asynchronous events (e.g. a message from the user).
Here is a list of available Observations:
CmdOutputObservation
BrowserOutputObservation
FileReadObservation
FileWriteObservation
ErrorObservation
SuccessObservation
You can use observation.to_dict()
and observation_from_dict
to serialize and deserialize observations.
Every agent must implement the following methods:
def step(self, state: "State") -> "Action"
step
moves the agent forward one step towards its goal. This probably means
sending a prompt to the LLM, then parsing the response into an Action
.
OpenHands is a multi-agentic system. Agents can delegate tasks to other agents, whether
prompted by the user, or when the agent decides to ask another agent for help. For example,
the CodeActAgent
might delegate to the BrowsingAgent
to answer questions that involve browsing
the web. The Delegator Agent forwards tasks to micro-agents, such as 'RepoStudyAgent' to study a repo,
or 'VerifierAgent' to verify a task completion.
A task
is an end-to-end conversation between OpenHands (the whole system) and the user,
which might involve one or more inputs from the user. It starts with an initial input
(typically a task statement) from the user, and ends with either an AgentFinishAction
initiated by the agent, a stop initiated by the user, or an error.
A subtask
is an end-to-end conversation between an agent and the user, or
another agent. If a task
is conducted by a single agent, then it's also a subtask
itself. Otherwise, a task
consists of multiple subtasks
, each executed by
one agent.
For example, considering a task from the user: tell me how many GitHub stars OpenHands repo has
. Let's assume the default agent is CodeActAgent.
-- TASK STARTS (SUBTASK 0 STARTS) --
DELEGATE_LEVEL 0, ITERATION 0, LOCAL_ITERATION 0
CodeActAgent: I should request help from BrowsingAgent
-- DELEGATE STARTS (SUBTASK 1 STARTS) --
DELEGATE_LEVEL 1, ITERATION 1, LOCAL_ITERATION 0
BrowsingAgent: Let me find the answer on GitHub
DELEGATE_LEVEL 1, ITERATION 2, LOCAL_ITERATION 1
BrowsingAgent: I found the answer, let me convey the result and finish
-- DELEGATE ENDS (SUBTASK 1 ENDS) --
DELEGATE_LEVEL 0, ITERATION 3, LOCAL_ITERATION 1
CodeActAgent: I got the answer from BrowsingAgent, let me convey the result
and finish
-- TASK ENDS (SUBTASK 0 ENDS) --
Note how ITERATION counter is shared across agents, while LOCAL_ITERATION is local to each subtask.