Reliably Re-Acting to Partner's Actions with the Social Intrinsic Motivation of Transfer Empowerment

This repo contains the code accompanying the paper Reliably Re-Acting to Partner's Actions with the Social Intrinsic Motivation of Transfer Empowerment. It builds on the MADDPG algorithm and uses the simulator from particle-env. One of the scenarios extends the single-agent OpenAI Gym Car environment to multiple agents.

We consider multi-agent reinforcement learning (MARL) for cooperative communication and coordination tasks. MARL agents can be brittle because they can overfit their training partners' policies. This overfitting produces agents that act under the expectation that other agents will behave in a certain way, rather than reacting to their actions. Our objective is to bias the learning process towards finding reactive strategies, i.e. strategies that respond to other agents' behaviors. Our method, transfer empowerment, measures the potential influence between agents' actions. Results from three simulated cooperation scenarios support our hypothesis that transfer empowerment improves MARL performance. We discuss how transfer empowerment could be a useful principle to guide multi-agent coordination by ensuring reactiveness to one's partner.
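To make the idea of transfer empowerment more concrete, below is a minimal sketch (not the repo's implementation) of a variational lower bound on the mutual information between one agent's action and its partner's next observation. The class names, network sizes, and the assumption of discrete actions are illustrative only.

```python
# Sketch: variational lower bound on transfer empowerment,
# I(a_i ; o'_j | o) >= E[ log q(a_i | o, o'_j) - log pi(a_i | o) ].
import torch
import torch.nn as nn

class SourcePolicy(nn.Module):
    """Proposes agent i's actions given the current observation (pi in the bound)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, act_dim))
    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class InverseModel(nn.Module):
    """Variational posterior q(a_i | o, o'_j) that tries to recover agent i's action
    from the partner's next observation."""
    def __init__(self, obs_dim, next_obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + next_obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, act_dim))
    def forward(self, obs, next_obs_j):
        return torch.distributions.Categorical(
            logits=self.net(torch.cat([obs, next_obs_j], dim=-1)))

def transfer_empowerment_bonus(source, inverse, obs, act_i, next_obs_j):
    """Per-sample lower-bound term, usable as an intrinsic reward signal."""
    log_q = inverse(obs, next_obs_j).log_prob(act_i)
    log_pi = source(obs).log_prob(act_i)
    return (log_q - log_pi).detach()
```

Intuitively, the bonus is high when the partner's next observation carries information about the agent's action, i.e. when the partner is actually reacting to it.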

Requirements

pip install -e .

How to Run

All training code is contained within main.py. To view the available options, run:

python main.py --help

If you want to check the training loss on TensorBoard, activate the virtual environment and use:

tensorboard --logdir models/model_name

If you want to train the methods from the paper on the 'simple_order' scenario:

python main.py simple_order si --social_influence
python main.py simple_order te --variational_transfer_all_action_pi_empowerment
python main.py simple_order je --variational_joint_empowerment

If you want to train the methods from the paper on the 'cars' scenario:

Simulation Videos

Cooperative Communication

The moving agent needs to go to a landmark of a particular color. However, it is blind, and another agent sends messages that help it navigate. Since there are more landmarks than communication channels, the speaking agent cannot simply output a symbol corresponding to a particular color. If the listening agent is not receptive to the messages, the speaker will output random signals, which in turn forces the listener to ignore them. With empowerment, the agents remain reactive to one another.

Videos (left to right): DDPG, MADDPG, EMADDPG.
python main.py simple_speaker_listener3 maddpg+ve --recurrent --variational_transfer_empowerment
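As a rough illustration of how such an empowerment signal could enter training (an assumption about the setup, not the repo's exact API), the per-step task reward can be mixed with a scaled intrinsic bonus before the transition is used for MADDPG updates:

```python
# Hypothetical helper: mixes the environment reward with an intrinsic empowerment
# bonus; beta controls how strongly reactiveness is encouraged.
def shaped_reward(task_reward: float, empowerment_bonus: float, beta: float = 0.1) -> float:
    return task_reward + beta * empowerment_bonus

# Example: small task reward, positive bonus because the listener reacted to the message.
r = shaped_reward(task_reward=-0.5, empowerment_bonus=0.8)
```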

Cooperative Coordination

In this simple task, agents need to cover all landmarks. The MADDPG agents are trained by self-play, which causes them to agree upon a fixed rule. For example, agent 1 goes to the red landmark, agent 2 to the green and agent 3 to the blue one. At test time, agent 1 is paired with agents 2 and 3 from a different training run, so the former rule does not necessarily result in the most efficient landmark selection. In contrast, EMADDPG uses empowerment, which results in each agent picking the landmark closest to it.

Videos (left to right): MADDPG, EMADDPG.
python main.py maddpg+ve --recurrent --variational_joint_empowerment

Cooperative Driving

Cars need to stay on the road and avoid collisions. Each agent only obtains a small top-view image and its own state, such as orientation and velocity.
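A minimal sketch, assuming PyTorch, of how these two input modalities (a top-view image plus the agent's own orientation and velocity) could be encoded into a single feature vector; the class name and layer sizes are illustrative assumptions, not the repo's architecture.

```python
import torch
import torch.nn as nn

class CarObservationEncoder(nn.Module):
    """Encodes a small top-view image and a low-dimensional state vector."""
    def __init__(self, img_channels=3, state_dim=4, embed_dim=64):
        super().__init__()
        # Convolutional trunk for the top-view image.
        self.cnn = nn.Sequential(
            nn.Conv2d(img_channels, 16, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Small MLP branch for orientation, velocity, etc.
        self.state_fc = nn.Linear(state_dim, 32)
        # Lazy layer infers the flattened CNN size on the first forward pass.
        self.head = nn.LazyLinear(embed_dim)

    def forward(self, image, state):
        z_img = self.cnn(image)                     # (B, flattened_features)
        z_state = torch.relu(self.state_fc(state))  # (B, 32)
        return torch.relu(self.head(torch.cat([z_img, z_state], dim=-1)))
```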

Visual inputs of the Red Agent and the Green Agent.

Simulation videos compare DDPG and MADDPG in three situations: overtaking, obstacle avoidance, and junctions.

Cooperative Coordination

| Agent   | Average dist. | Collisions % |
|---------|---------------|--------------|
| MADDPG  | 1.767         | 20.9         |
| EMADDPG | 0.180         | 2.01         |

The average distance to a landmark (lower is better) and the percentage of collisions between agents.

Cooperative Communication

| Agent   | Target reach % | Average distance | Obstacle hits % |
|---------|----------------|------------------|-----------------|
| MADDPG  | 84.0           | 2.233            | 53.5            |
| EMADDPG | 98.8           | 0.012            | 1.90            |

The target is reached if the agent is within a distance of 0.1 of the target landmark (higher is better).