This repository contains the code for my paper *Social navigation with human empowerment driven reinforcement learning*; please refer to the paper for more details.
The next generation of mobile robots needs to be socially compliant to be accepted by humans. As simple as this task may seem, defining compliance formally is not trivial. Yet, classical reinforcement learning (RL) relies upon hard-coded reward signals. In this work, we go beyond this approach and provide the agent with intrinsic motivation using empowerment. Empowerment maximizes the influence of an agent on its near future. It has been shown to be a good model for biological behaviors and has also been used to teach artificial agents complicated and generalized actions. Whereas self-empowerment maximizes an agent's influence on its own future, our robot instead strives for the empowerment of the people in its environment, so that they are not disturbed by the robot while pursuing their goals. We show that our robot has a positive influence on humans, as it minimizes their travel time and distance while moving efficiently to its own goal. The method can be used in any multi-agent system that requires a robot to solve a particular task involving human interactions.
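For background, empowerment is commonly formalized (following Klyubin et al.) as the channel capacity between an agent's actions and its resulting future state. This is the general definition from the empowerment literature, not a formula specific to this repository:

$$\mathcal{E}(s) = \max_{\omega(a \mid s)} I(A;\, S' \mid S = s),$$

where $\omega$ is a distribution over actions $A$ and $S'$ is the successor state. Here, the robot estimates this quantity for the surrounding humans rather than for itself.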
The robot uses the states of all its neighbors to compute their empowerment, which is used in addition to state-value estimates. In this way, SCR learns not to collide with humans (from the state values) while preserving their ability to pursue their goals (from the empowerment estimates).
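As a rough illustration only (the function and network names, and the weighting factor `beta`, are hypothetical and not this repository's exact objective), combining the two estimates might look like:

```python
def score_action(next_robot_state, next_human_states, value_net, empowerment_net, beta=1.0):
    """Hypothetical sketch: rank a candidate robot action by the robot's
    own state value plus the empowerment of nearby humans."""
    # Collision avoidance and goal progress come from the state-value estimate.
    state_value = value_net(next_robot_state)
    # Humans' ability to keep pursuing their goals comes from their
    # empowerment, estimated from each human's state.
    human_empowerment = sum(empowerment_net(s) for s in next_human_states)
    return state_value + beta * human_empowerment
```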
The human states are occupancy maps centered around each human. These maps provide enough information to judge whether an action has influence, because occupied areas block movement. Empowerment is computed from these maps and the humans' actions, which are two continuous values (dx and dy movements) sampled from normal distributions.
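A minimal sketch of these two ingredients, assuming an illustrative grid size and resolution (none of these names or parameters are taken from the repository):

```python
import numpy as np

def local_occupancy_map(human_pos, neighbor_positions, size=4, resolution=1.0):
    """Hypothetical sketch: build a (size x size) occupancy grid centered
    on a human, marking cells blocked by other agents."""
    grid = np.zeros((size, size))
    for p in neighbor_positions:
        # Neighbor position relative to the map center, in grid cells.
        rel = (np.asarray(p, dtype=float) - np.asarray(human_pos, dtype=float)) / resolution
        ix, iy = int(rel[0] + size / 2), int(rel[1] + size / 2)
        if 0 <= ix < size and 0 <= iy < size:
            grid[ix, iy] = 1.0
    return grid

def sample_action(mu=(0.0, 0.0), sigma=(1.0, 1.0)):
    """Actions are two continuous values (dx, dy) drawn from normal distributions."""
    return np.random.normal(mu, sigma)  # shape (2,)
```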
- Install the Python-RVO2 library.
- Install crowd_sim and crowd_nav as editable pip packages:

```
pip install -e .
```
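After installation, a quick smoke test might look like the following (the import names follow the package names above; `rvo2` is the module provided by Python-RVO2):

```python
# Hypothetical smoke test: verify that the core dependencies import.
import rvo2       # module provided by the Python-RVO2 library
import crowd_sim  # simulation environment package
import crowd_nav  # training and testing package
print('Installation looks OK.')
```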
This repository is organized in two parts: the gym_crowd/ folder contains the simulation environment, and the crowd_nav/ folder contains the code for training and testing the policies. Details of the simulation framework can be found here. Below are the instructions for training and testing policies; they should be executed inside the crowd_nav/ folder.
- Train a policy:

```
python train.py --policy scr
```
- Test policies with 500 test cases:

```
python acceptance_test_score.py --policy orca --phase test
python acceptance_test_score.py --policy scr --model_dir data/output --phase test
```
- Visualize a test case and optionally save a plot or a video:

```
python acceptance_test_visualisation.py --policy scr --model_dir data/output --phase test --visualize --test_case 0 --plot_file data/output/plot.png
python acceptance_test_visualisation.py --policy scr --model_dir data/output --phase test --visualize --test_case 0 --video_file data/output/video.mp4
```
- Plot the training curve:

```
python utils/plot.py data/output/output.log
```
| CADRL | LSTM-RL |
| --- | --- |
| SARL | OM-SARL |
The policies can also be tested by manually controlling the position of the human:

```
python acceptance_test_interaction.py --policy scr
```