This repository contains implementations of transfer learning algorithms described in the following papers:
Learning Fast Adaptation with Meta Strategy Optimization, ICRA 2020
Policy Transfer with Strategy Optimization, ICLR 2019
Prepare for the Unknown: Learning a Universal Policy with Online System Identification, RSS 2017
To use this code, you need to install OpenAI Baselines, Dart, and PyDart2.
Detailed instructions for installing OpenAI Baselines can be found here. To install Dart and PyDart2, follow the installation instructions here.
Note that the environments also depend on OpenAI Gym; however, Gym should already be installed along with Baselines.
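Before running any of the scripts, you may want to confirm that the dependencies are importable. The sketch below is an illustrative helper, not part of the repository. It assumes that baselines, gym, and pydart2 are the packages' top-level import names; adjust them if your installation differs.

```python
import importlib.util

# Assumed top-level module names for each dependency; adjust if your setup differs.
REQUIRED_MODULES = {
    "OpenAI Baselines": "baselines",
    "OpenAI Gym": "gym",
    "PyDart2": "pydart2",
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a mapping from dependency name to whether its module can be found."""
    status = {}
    for name, module in modules.items():
        # find_spec returns None (without importing anything) when a top-level module is absent.
        status[name] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```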
To install this package in editable mode, run the following command from the project directory:
pip install -e .
SO-CMA has two stages: training a universal policy and strategy optimization.
To train a universal policy, use the code in ppo. For the strategy optimization stage, use the code in test_socma.
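Conceptually, strategy optimization keeps the trained universal policy fixed and searches over its model-parameter input μ for the value that yields the highest return in the target environment; SO-CMA uses CMA-ES for this search. The sketch below illustrates the idea only and is not the code in test_socma. It swaps CMA-ES for a simpler elite-averaging evolution strategy without covariance adaptation, assumes μ is normalized to [0, 1], and uses a toy quadratic objective in place of a hypothetical evaluate_return(mu) that would roll out the policy in the target environment.

```python
import numpy as np


def strategy_optimization(evaluate_return, dim, iterations=50, popsize=16,
                          elite_frac=0.25, sigma=0.3, seed=0):
    """Search the universal policy's input mu in [0, 1]^dim (assumed normalized range).

    Simplified evolution strategy (elite averaging, no covariance adaptation),
    used here as a stand-in for CMA-ES. evaluate_return(mu) should return the
    average episode return of the policy conditioned on mu in the target environment.
    """
    rng = np.random.default_rng(seed)
    mean = np.full(dim, 0.5)  # start at the center of the normalized parameter box
    n_elite = max(1, int(popsize * elite_frac))
    best_mu, best_ret = mean.copy(), evaluate_return(mean)
    for _ in range(iterations):
        # Sample candidate strategies around the current mean, clipped to the valid range.
        candidates = np.clip(mean + sigma * rng.standard_normal((popsize, dim)), 0.0, 1.0)
        returns = np.array([evaluate_return(mu) for mu in candidates])
        elite = candidates[np.argsort(returns)[-n_elite:]]  # candidates with the highest returns
        mean = elite.mean(axis=0)
        sigma *= 0.95  # shrink the search radius over time
        if returns.max() > best_ret:
            best_ret = returns.max()
            best_mu = candidates[returns.argmax()].copy()
    return best_mu, best_ret


if __name__ == "__main__":
    target = np.array([0.2, 0.7, 0.5, 0.9, 0.1])  # toy optimum standing in for the best strategy
    best_mu, best_ret = strategy_optimization(lambda mu: -float(np.sum((mu - target) ** 2)), dim=5)
    print("best mu:", np.round(best_mu, 3), "return:", round(float(best_ret), 5))
```

In a real transfer setting each call to evaluate_return means running episodes in the target environment, so the total number of evaluations (roughly popsize × iterations) is the main cost to budget for.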
An example of transferring a policy from the Dart hopper to the MuJoCo hopper can be found in examples. To train the universal policy, run:
examples/socma_hopper_5d_train.sh
The training results will be saved to data/.
To perform strategy optimization, run:
examples/socma_hopper_5d_test.sh
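The two example scripts are meant to be run in order. If you prefer to drive them from Python, a small helper like the one below (an illustrative sketch, not part of the repository) runs each script with bash from the project directory and stops at the first step that fails. It assumes the scripts can be invoked through bash; pass dry_run=True to print the commands without executing them.

```python
import subprocess
from pathlib import Path

# Example scripts named in this README, in order: train the universal policy,
# then run strategy optimization.
SOCMA_HOPPER_STEPS = [
    "examples/socma_hopper_5d_train.sh",
    "examples/socma_hopper_5d_test.sh",
]


def run_steps(scripts, project_dir=".", dry_run=False):
    """Run each script with bash from project_dir, stopping at the first failure.

    Returns the list of commands (each a list of arguments) that were, or would be, run.
    """
    commands = [["bash", script] for script in scripts]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
            continue
        # check=True raises subprocess.CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, cwd=Path(project_dir), check=True)
    return commands


if __name__ == "__main__":
    run_steps(SOCMA_HOPPER_STEPS, dry_run=True)
```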
You can also use test_policy.py to test individual policies.
Training UP-OSI involves two steps: training a universal policy and training an online system identification model.
To train a universal policy, use the code in ppo. To train the online system identification model, use the code in train_osi.
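At its core, the OSI model is a supervised regressor: the dynamics are randomized during training, and the model learns to map a short history of states and actions to the model parameters μ that generated it. The sketch below illustrates that idea on a hypothetical 1-D system x' = x + μ·a and is not the code in train_osi. A least-squares fit on degree-2 polynomial features of the history stands in for a learned neural network, and the history length, noise level, and μ range are arbitrary choices.

```python
import numpy as np

HISTORY = 5  # number of past transitions the estimator sees (hypothetical choice)


def rollout_history(mu, rng, noise=0.05):
    """Simulate the toy system x' = x + mu * a + noise with random +/-1 actions.

    Returns the flattened history [x_0..x_H, a_0..a_{H-1}] used as the OSI input.
    """
    x = np.empty(HISTORY + 1)
    a = rng.choice([-1.0, 1.0], size=HISTORY)
    x[0] = rng.uniform(-1.0, 1.0)
    for t in range(HISTORY):
        x[t + 1] = x[t] + mu * a[t] + noise * rng.standard_normal()
    return np.concatenate([x, a])


def features(history):
    """Degree-2 polynomial features: bias, raw history, and all pairwise products."""
    i, j = np.triu_indices(len(history))
    return np.concatenate([[1.0], history, history[i] * history[j]])


def fit_linear_osi(n_samples=2000, mu_range=(0.5, 2.0), seed=0):
    """Fit a least-squares map from history features to the model parameter mu."""
    rng = np.random.default_rng(seed)
    mus = rng.uniform(*mu_range, size=n_samples)  # randomize the dynamics for every sample
    X = np.stack([features(rollout_history(mu, rng)) for mu in mus])
    weights, *_ = np.linalg.lstsq(X, mus, rcond=None)
    return weights


def estimate_mu(weights, history):
    """Predict mu from a single flattened history."""
    return float(features(history) @ weights)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    weights = fit_linear_osi()
    for true_mu in (0.6, 1.2, 1.8):
        est = estimate_mu(weights, rollout_history(true_mu, rng))
        print(f"true mu = {true_mu:.2f}, estimated mu = {est:.2f}")
```

Because the actions are ±1, μ equals (up to noise) the average of (x_{t+1} - x_t)·a_t, and those products are among the features, so a linear fit suffices here; real dynamics generally need a more expressive model.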
An example training script for the hopper environment is available in examples. To run it, use the following command:
examples/uposi_hopper_2d_train.sh
The training results will be saved to data/.
To test the resulting controller, run:
examples/uposi_hopper_2d_test.sh
and follow the prompts in the terminal. After each rollout, a plot of the estimated and true model parameters is shown.
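At test time, UP-OSI runs its two components in a loop: the OSI model estimates the model parameters from recent history, and the universal policy acts on the current state together with that estimate. The sketch below is a toy illustration of that closed loop, not the repository's test code. It assumes a hypothetical 1-D system x' = x + μ·a, a hand-written controller standing in for the universal policy, and a windowed least-squares estimator standing in for the learned OSI model. It records the estimated and true parameter at every step, which is the kind of data the plot compares.

```python
from collections import deque

import numpy as np


def universal_policy(x, mu_est, goal):
    """Hypothetical UP: choose the action that would reach goal if mu_est were correct."""
    return float(np.clip((goal - x) / max(mu_est, 1e-3), -1.0, 1.0))


def osi_estimate(history, prior=1.0):
    """Hypothetical OSI: least-squares estimate of mu from recent (x, a, x_next) transitions."""
    num = sum((x_next - x) * a for x, a, x_next in history)
    den = sum(a * a for _, a, _ in history)
    return num / den if den > 1e-6 else prior  # fall back to the prior without informative data


def closed_loop_rollout(true_mu, steps=40, window=5, noise=0.005, seed=0):
    """Alternate OSI estimation and UP control; return per-step (estimated mu, true mu) pairs."""
    rng = np.random.default_rng(seed)
    history = deque(maxlen=window)  # only the most recent transitions feed the OSI
    x, log = 0.0, []
    for t in range(steps):
        mu_est = osi_estimate(history)
        # A moving goal keeps the actions non-zero, so the history stays informative.
        a = universal_policy(x, mu_est, goal=np.sin(0.5 * t))
        x_next = x + true_mu * a + noise * rng.standard_normal()
        history.append((x, a, x_next))
        log.append((mu_est, true_mu))
        x = x_next
    return log


if __name__ == "__main__":
    for t, (est, true_val) in enumerate(closed_loop_rollout(true_mu=1.5)):
        print(f"step {t:2d}: estimated mu = {est:.3f}, true mu = {true_val:.3f}")
```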
If you see errors like the following:
ODE INTERNAL ERROR 1: assertion "d[i] != dReal(0.0)" failed in _dLDLTRemove()
try downloading lcp.cpp and using it to replace the file of the same name in dart/external/odelcpsolver/. Afterward, recompile Dart and PyDart2, and the issue should be resolved.
Please contact Wenhao Yu ([email protected]) if you have any feedback or questions about this work.