VincentYu68/policy_transfer

Learning Transferrable and Adaptive Control Policies

This repository contains implementations of transfer learning algorithms described in the following papers:

Learning Fast Adaptation with Meta Strategy Optimization, ICRA 2020

Policy Transfer with Strategy Optimization, ICLR 2019

Prepare for the Unknown: Learning a Universal Policy with Online System Identification, RSS 2017

Prerequisites

To use this code you need to install OpenAI Baselines, Dart and PyDart2.

You can find detailed instructions for installing OpenAI Baselines here. For installing Dart and PyDart2, you can follow the installation instructions here.

Note that the environments also depend on OpenAI Gym, which should be installed along with Baselines.
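Before running the examples, it can help to confirm that the prerequisites are visible to your Python interpreter. The snippet below is a minimal sketch, not part of this repository; it assumes the packages are importable as baselines, gym and pydart2, so adjust the names if your installation differs.

```python
import importlib.util

# Import names assumed for the prerequisites listed above.
REQUIRED_MODULES = ("baselines", "gym", "pydart2")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites are importable.")
```

The check only looks the modules up and does not import them, so it will not catch a broken build of Dart or PyDart2.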

Installation

Run the following command from the project directory:

pip install -e .

How to use

SO-CMA

SO-CMA has two stages: training a universal policy and strategy optimization.

To train a universal policy, use the code in ppo. For the strategy optimization part, use the code in test_socma.

An example of Dart hopper transferred to MuJoCo hopper can be found in examples:

examples/socma_hopper_5d_train.sh

The training results will be saved to data/.

To perform strategy optimization, run:

examples/socma_hopper_5d_test.sh

You can also use test_policy.py to test individual policies.
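To give a rough intuition for the strategy optimization stage, the toy sketch below searches for the strategy vector that maximizes a return function while the policy itself stays fixed. It is not the code in test_socma: it uses a simple elite-averaging evolution strategy as a stand-in for CMA-ES, and optimize_strategy and toy_return are hypothetical names introduced only for this illustration. In practice the return would come from rolling out the trained universal policy in the target environment.

```python
import random


def optimize_strategy(evaluate_return, dim, iterations=30, population=16,
                      elite=4, sigma=0.5, seed=0):
    """Search for the strategy vector that maximizes evaluate_return.

    Simple elite-averaging evolution strategy (a simplified stand-in for
    CMA-ES): only the mean and a scalar step size are adapted.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    best, best_score = list(mean), evaluate_return(mean)
    for _ in range(iterations):
        # Sample candidate strategies around the current mean.
        samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                   for _ in range(population)]
        scored = sorted(((evaluate_return(s), s) for s in samples),
                        key=lambda pair: pair[0], reverse=True)
        # Move the mean to the average of the best candidates.
        elites = [s for _, s in scored[:elite]]
        mean = [sum(col) / elite for col in zip(*elites)]
        if scored[0][0] > best_score:
            best_score, best = scored[0][0], scored[0][1]
        sigma *= 0.9  # shrink the search radius over time
    return best, best_score


def toy_return(strategy, target=(0.3, -0.2)):
    """Hypothetical return: highest when the strategy matches `target`."""
    return -sum((s - t) ** 2 for s, t in zip(strategy, target))


if __name__ == "__main__":
    strategy, score = optimize_strategy(toy_return, dim=2)
    print("best strategy:", strategy, "return:", score)
```

The search only needs episode returns, not gradients, which is why a sampling-based optimizer fits this stage.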

UP-OSI

Training UP-OSI involves two steps: training a universal policy and training an online system identification model.

To train a universal policy, use the code in ppo. To train the online system identification model, use the code in train_osi.

An example training script for the hopper environment is available in examples. Run it with the following command:

examples/uposi_hopper_2d_train.sh

The training results will be saved to data/.

To test the resulting controller, run:

examples/uposi_hopper_2d_test.sh

and follow the prompt in the terminal. After each rollout a plot of the estimated model parameters and true model parameters is shown.

ODE Internal Error

If you see errors like: ODE INTERNAL ERROR 1: assertion "d[i] != dReal(0.0)" failed in _dLDLTRemove(), try downloading lcp.cpp and replacing the one in dart/external/odelcpsolver/ with it. Recompile Dart and PyDart2 afterward and the issue should be resolved.
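When swapping in the downloaded file, keeping a copy of the original makes it easy to revert. The helper below is a minimal sketch; replace_with_backup is a hypothetical name, and the paths in the usage comment assume the downloaded lcp.cpp is in the current directory and that the command is run from the parent of your Dart source tree.

```python
import shutil
from pathlib import Path


def replace_with_backup(new_file, target_file):
    """Copy new_file over target_file, keeping the original as <name>.bak."""
    new_file, target_file = Path(new_file), Path(target_file)
    if not new_file.is_file():
        raise FileNotFoundError(f"replacement file not found: {new_file}")
    backup = target_file.with_name(target_file.name + ".bak")
    # Only back up once, so a second run does not overwrite the true original.
    if target_file.exists() and not backup.exists():
        shutil.copy2(target_file, backup)
    shutil.copy2(new_file, target_file)
    return backup


# Example usage (paths are assumptions; adjust to your setup):
# replace_with_backup("lcp.cpp", "dart/external/odelcpsolver/lcp.cpp")
```

Recompiling Dart and PyDart2 is still required after the file has been replaced.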

Additional feedback

Please contact Wenhao Yu ([email protected]) if you have any feedback or questions about this work.
