Observations are mapped to each agent but what about each agent's actions?  #15

Open
@PBarde

I have trouble understanding where the list of per-agent action vectors (the one you pass to the MujocoMulti env) is reassembled into the single-agent Mujoco env action vector so that each value reaches the correct actuator. For example, from the line

flat_actions = np.concatenate([actions[i][:self.action_space[i].low.shape[0]] for i in range(self.n_agents)])

it seems that the multi-agent action list is simply flattened and then passed to the Mujoco single-agent env. I do not see how this could handle both the 2-Agent Ant and the 2-Agent Ant Diag setups. Looking at Figures 4H and 4I of the FACMAC paper, we have:

2-Agent Ant (Figure 4 H):

MA action list = [blue agent, green agent] = [[a1, a2, a5, a6], [a3, a4, a7, a8]]

Flattened single agent action = [a1, a2, a5, a6, a3, a4, a7, a8]

2-Agent Ant Diag (Figure 4 I):

MA action list = [blue agent, green agent] = [[a3, a4, a5, a6], [a1, a2, a7, a8]]

Flattened single agent action = [a3, a4, a5, a6, a1, a2, a7, a8]

The two flattened vectors put different actuators at the same positions, so the action vector passed to the single-agent Mujoco env does not correspond to the same actuators in both setups.

I think this means that an agent could end up observing one limb but controlling another.
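To make the concern concrete, here is a minimal sketch. It assumes the single-agent env expects its action vector in actuator order a1..a8, and that the partitions are exactly the ones listed above. `PARTITIONS`, `flatten_actions` and `scatter_actions` are hypothetical names used only for illustration; they are not taken from the repository.

```python
import numpy as np

# Hypothetical partitions copied from the two examples above
# (actuator labels a1..a8 converted to 0-based indices).
PARTITIONS = {
    "2-Agent Ant": [[0, 1, 4, 5], [2, 3, 6, 7]],
    "2-Agent Ant Diag": [[2, 3, 4, 5], [0, 1, 6, 7]],
}


def flatten_actions(actions):
    """Plain concatenation, mirroring the flat_actions line quoted above."""
    return np.concatenate(actions)


def scatter_actions(actions, partition, n_actuators=8):
    """Hypothetical alternative: write each agent's values at its actuator indices."""
    full = np.zeros(n_actuators)
    for agent_actions, idxs in zip(actions, partition):
        full[idxs] = agent_actions
    return full


for name, partition in PARTITIONS.items():
    # Use the actuator label itself as the action value so positions are easy to read.
    actions = [np.array(idxs, dtype=float) + 1 for idxs in partition]
    print(name)
    print("  flattened:", flatten_actions(actions))
    print("  scattered:", scatter_actions(actions, partition))
```

Under these assumptions, only the scattered vector has a1..a8 in the same order for both setups; the flattened one depends on how the agents are partitioned.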

Am I missing something here?
