Observations are mapped to each agent but what about each agent's actions? #15

I have trouble understanding where the list of per-agent action vectors (the one you pass to the MujocoMulti env) is reassembled into the single-agent MuJoCo env action vector so that each entry reaches the correct actuator. For example, at this line: https://github.com/schroederdewitt/multiagent_mujoco/blob/97eab01fcff0313f1a1c275115c10616988145a3/src/multiagent_mujoco/mujoco_multi.py#L111

it seems that the multi-agent action list is simply flattened and then passed to the single-agent MuJoCo env. I do not see how this could handle both the 2-Agent Ant and 2-Agent Ant Diag setups. Looking at Figure 4 (H and I) of the FACMAC paper, we have:


2-Agent Ant (Figure 4 H):

MA action list = [blue agent, green agent] = [[a1, a2, a5, a6], [a3, a4, a7, a8]]

Flattened single agent action =  [a1, a2, a5, a6, a3, a4, a7, a8]


2-Agent Ant Diag (Figure 4 I):

MA action list = [blue agent, green agent] = [[a3, a4, a5, a6], [a1, a2, a7, a8]]

Flattened single agent action =  [a3, a4, a5, a6, a1, a2, a7, a8]


We see that the two flattened vectors order the actuators differently, so the same position in the vector passed to the single-agent MuJoCo env does not correspond to the same actuator in both setups.

If that is the case, I think agents end up observing one limb but controlling another.
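To make the concern concrete, here is a minimal sketch of the two behaviours I am comparing. It is not taken from the repository: `PARTITIONS`, `flatten_actions`, `scatter_actions`, and `demo` are hypothetical names, the actuator indices are my reading of Figure 4 (a1 as index 0 through a8 as index 7), and I am assuming the single-agent env expects actions in the order a1..a8.

```python
import numpy as np

# Hypothetical partitions: 0-based actuator indices owned by each agent,
# as I read them from Figure 4 H and I (a1 -> 0, ..., a8 -> 7).
PARTITIONS = {
    "2-Agent Ant": [[0, 1, 4, 5], [2, 3, 6, 7]],
    "2-Agent Ant Diag": [[2, 3, 4, 5], [0, 1, 6, 7]],
}


def flatten_actions(agent_actions):
    """Concatenate per-agent actions in agent order (what the linked line appears to do)."""
    return np.concatenate(agent_actions)


def scatter_actions(agent_actions, partition, n_actuators=8):
    """Write each agent's actions to the actuator indices that agent owns."""
    full = np.zeros(n_actuators)
    for actions, indices in zip(agent_actions, partition):
        full[indices] = actions
    return full


def demo(name):
    """Return (flattened, scattered) vectors for one setup.

    Each action value is set to its actuator number (1..8) so that the
    resulting order is easy to read off.
    """
    partition = PARTITIONS[name]
    agent_actions = [
        np.array([i + 1 for i in indices], dtype=float) for indices in partition
    ]
    return flatten_actions(agent_actions), scatter_actions(agent_actions, partition)


if __name__ == "__main__":
    for name in PARTITIONS:
        flat, scattered = demo(name)
        print(name, "flattened:", flat, "scattered:", scattered)
```

Under these assumptions, only the scattered version puts every action back at its own actuator index in both setups, which is the step I cannot find in the code.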


Am I missing something here? 
