Reliably Re-Acting to Partner's Actions with the Social Intrinsic Motivation of Transfer Empowerment
This repo contains the code accompanying the paper Reliably Re-Acting to Partner's Actions with the Social Intrinsic Motivation of Transfer Empowerment. It builds on the MADDPG algorithm and uses the simulator from particle-env. One of the scenarios extends the single-agent OpenAI Gym car environment to multiple agents.
We consider multi-agent reinforcement learning (MARL) for cooperative communication and coordination tasks. MARL agents can be brittle because they can overfit their training partners' policies. This overfitting can produce agents whose policies assume that other agents will act in a certain way rather than react to their actual actions. Our objective is to bias the learning process towards strategies that are reactive to other agents' behaviors. Our method, transfer empowerment, measures the potential influence between agents' actions. Results from three simulated cooperation scenarios support our hypothesis that transfer empowerment improves MARL performance. We discuss how transfer empowerment could be a useful principle to guide multi-agent coordination by ensuring reactiveness to one's partner.
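Intuitively, transfer empowerment is the channel capacity from one agent's actions to another agent's future state. The sketch below is only for intuition: it assumes a small discrete setting where the conditional distribution p(s'_j | a_i) is available as a table and uses the Blahut-Arimoto algorithm, whereas the paper works with variational estimators; all names here are illustrative.

```python
import numpy as np

def _mutual_information(p_a, p_s_given_a):
    """I(A; S') for action distribution p_a and tabular channel p_s_given_a."""
    p_s = p_a @ p_s_given_a
    ratio = np.where(p_s_given_a > 0, p_s_given_a / np.maximum(p_s, 1e-12), 1.0)
    return float((p_a[:, None] * p_s_given_a * np.log(ratio)).sum())

def transfer_empowerment(p_s_given_a, n_iter=200, tol=1e-10):
    """Channel capacity max_{p(a_i)} I(A_i; S'_j) via Blahut-Arimoto.

    p_s_given_a: array of shape (n_actions, n_states); row k is p(s'_j | a_i = k).
    Returns the empowerment value in nats.
    """
    n_actions = p_s_given_a.shape[0]
    p_a = np.full(n_actions, 1.0 / n_actions)  # start from a uniform action distribution
    for _ in range(n_iter):
        p_s = p_a @ p_s_given_a  # marginal over the partner's next state
        ratio = np.where(p_s_given_a > 0, p_s_given_a / np.maximum(p_s, 1e-12), 1.0)
        # Blahut-Arimoto update: reweight each action by exp of its KL to the marginal
        d = (p_s_given_a * np.log(ratio)).sum(axis=1)
        p_a_new = p_a * np.exp(d)
        p_a_new /= p_a_new.sum()
        if np.abs(p_a_new - p_a).max() < tol:
            p_a = p_a_new
            break
        p_a = p_a_new
    return _mutual_information(p_a, p_s_given_a)

# Two actions that lead to two distinct partner states: capacity = log(2) nats.
print(transfer_empowerment(np.array([[1.0, 0.0], [0.0, 1.0]])))  # ~0.693
```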
Install the package in editable mode:

pip install -e .
All training code is contained within main.py. To view the available options, simply run:
python main.py --help
If you want to inspect the training loss on TensorBoard, activate the virtual environment and use:
tensorboard --logdir models/model_name
If you want to train the methods from the paper on the 'simple_order' scenario:
python main.py simple_order si --social_influence
python main.py simple_order te --variational_transfer_all_action_pi_empowerment
python main.py simple_order je --variational_joint_empowerment
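All three variants add a social intrinsic bonus on top of the environment reward. Below is a minimal sketch of how such a bonus could be mixed into the per-agent reward, assuming a weighting coefficient beta; the function and parameter names are hypothetical and not part of main.py's actual interface.

```python
# Illustrative only: mixing a per-agent social intrinsic bonus into the task
# reward before the MADDPG update. `beta` and `shaped_rewards` are
# hypothetical names, not part of main.py.
def shaped_rewards(env_rewards, intrinsic_rewards, beta=0.1):
    """env_rewards, intrinsic_rewards: one float per agent."""
    return [r + beta * b for r, b in zip(env_rewards, intrinsic_rewards)]

print(shaped_rewards([-1.2, -0.8], [0.3, 0.5]))  # ~[-1.17, -0.75]
```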
If you want to train the methods from the paper on the 'cars' scenario:
The moving agent needs to go to a landmark with a particular color. However, it is blind, and another agent sends messages that help it navigate. Since there are more landmarks than communication channels, the speaking agent cannot simply output a symbol corresponding to a particular color. If the listening agent is not receptive to the messages, the speaker will output random signals, which in turn forces the listener to ignore them. With empowerment, agents remain reactive to one another.
Visual comparison of DDPG, MADDPG and EMADDPG.
python main.py simple_speaker_listener3 maddpg+ve --recurrent --variational_transfer_empowerment
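The speaker-listener task itself comes from particle-env (openai/multiagent-particle-envs). The sketch below loads it through that library's standard interface, assuming the stock simple_speaker_listener scenario; the repo's simple_speaker_listener3 variant and the wrapper in main.py may differ.

```python
# Sketch: constructing the speaker-listener environment with particle-env.
from multiagent.environment import MultiAgentEnv
import multiagent.scenarios as scenarios

# Load the scenario module and build the world it describes.
scenario = scenarios.load("simple_speaker_listener.py").Scenario()
world = scenario.make_world()
env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation)

obs_n = env.reset()
print(env.n, [o.shape for o in obs_n])  # number of agents and per-agent observation shapes
```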
In this simple task, agents need to cover all landmarks. The MADDPG agents are trained by self-play, which causes them to agree upon a fixed rule. For example, agent 1 goes to the red, agent 2 to the green and agent 3 to the blue landmark. At test time, agent 1 is paired with agents 2 and 3 from a different run, so the former rule does not necessarily result in the most efficient landmark selection. In contrast, EMADDPG uses empowerment, which results in each agent picking the landmark closest to it (see the assignment sketch below the training command).
Visual comparison of MADDPG and EMADDPG.
python main.py maddpg+ve --recurrent --variational_joint_empowerment
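To illustrate the "closest landmark" behaviour described above, here is a small, hypothetical sketch that greedily assigns each agent to the nearest still-unclaimed landmark. It only illustrates the coordination outcome and is not code from the repo.

```python
import numpy as np

def nearest_assignment(agent_pos, landmark_pos):
    """Greedily assign each agent the nearest landmark not yet taken.

    agent_pos, landmark_pos: arrays of shape (n, 2). Returns one landmark index per agent.
    """
    remaining = list(range(len(landmark_pos)))
    assignment = []
    for p in agent_pos:
        dists = [np.linalg.norm(p - landmark_pos[j]) for j in remaining]
        assignment.append(remaining.pop(int(np.argmin(dists))))
    return assignment

agents = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.0]])
landmarks = np.array([[0.1, 0.0], [1.0, 0.9], [-1.1, 0.2]])
print(nearest_assignment(agents, landmarks))  # [0, 1, 2]
```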
Cars need to stay on the road and avoid collisions. Agents only receive a small top-view image and their own state, such as orientation and velocity.
Visual inputs: the top-view images seen by the red agent and by the green agent.
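Each car's observation thus combines a small image with a low-dimensional ego state. Below is a hypothetical sketch of an encoder that fuses the two, written in PyTorch; the layer sizes, input shapes and class name are illustrative and not the repo's actual architecture.

```python
import torch
import torch.nn as nn

class CarObsEncoder(nn.Module):
    """Fuse a small top-view image with the ego state (e.g. orientation, velocity)."""

    def __init__(self, img_channels=3, state_dim=4, out_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(img_channels, 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> 32 image features
        )
        self.mlp = nn.Sequential(nn.Linear(32 + state_dim, out_dim), nn.ReLU())

    def forward(self, image, state):
        return self.mlp(torch.cat([self.cnn(image), state], dim=-1))

enc = CarObsEncoder()
features = enc(torch.zeros(1, 3, 24, 24), torch.zeros(1, 4))
print(features.shape)  # torch.Size([1, 64])
```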
Visual comparison of DDPG and MADDPG on the overtaking, obstacle avoidance and junction scenarios.
Method | Average dist. | Collisions % |
---|---|---|
MADDPG | 1.767 | 20.9 |
EMADDPG | 0.180 | 2.01 |
Average distance to a landmark (lower is better) and percentage of collisions between agents.
Method | Target reached % | Average distance | Obstacle hits % |
---|---|---|---|
MADDPG | 84.0 | 2.233 | 53.5 |
EMADDPG | 98.8 | 0.012 | 1.90 |
The target counts as reached if the agent is within 0.1 of the target landmark (higher is better).