
RL-GPT: Open-Sourced RL Training

RL-GPT combines LLMs and RL: the LLM reasons about the agent's behavior to solve subtasks and generates higher-level actions, improving RL's sample efficiency.

Contents

  • Install
  • PPO-Training
  • Results
  • TODO
  • Citation
  • Acknowledgement
  • License

Install

  • Resource link: https://drive.google.com/file/d/1IdBEGZJh1r4MOn4vMDOZ3CfKn9YAvOzZ/view?usp=sharing

  • The official documentation of the MineDojo environment: https://docs.minedojo.org/sections/getting_started/install.html#prerequisites

  • Create a Python 3.9 environment in Anaconda.

  • Install JDK 8u171, otherwise you may see errors with Malmo. The package jdk-8u171-linux-x64.tar.gz is in the resource link.

    • sudo tar -xzvf jdk-8u171-linux-x64.tar.gz -C /usr/local
    • export JAVA_HOME=/usr/local/jdk1.8.0_171
  • Install dependencies

    • Ubuntu: sudo apt install xvfb xserver-xephyr python-opengl ffmpeg
    • CentOS: sudo yum install xorg-x11-server-Xvfb xorg-x11-server-Xephyr ffmpeg
  • Install OpenGL (CentOS)

    • sudo yum install mesa*
    • sudo yum install freeglut*
  • Download our repo https://github.com/PKU-RL/MCEnv and run python setup.py install.

  • For different tasks, carefully check our fast_reset option.

  • To verify the installation, run MINEDOJO_HEADLESS=1 python validate_install.py. A minimal smoke-test sketch is also given after this list.

  • Install MineCLIP: pip install git+https://github.com/MineDojo/MineCLIP, or use the package in the resource link.

  • Use PyTorch>=1.8.1. x-transformers==0.27.1 is required, otherwise the CLIP model cannot be loaded.

  • Check the arguments in train.py. Download the pretrained MineCLIP model adjust.pth from the resource link.
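If the environment is set up correctly, a short Python smoke test along these lines should also run. This is a minimal sketch assuming the standard MineDojo API; the task id is simply the milk task used later in this README, and validate_install.py remains the official check.

    # smoke_test.py -- illustrative environment check only (not part of the repo).
    # Run with: MINEDOJO_HEADLESS=1 python smoke_test.py
    import minedojo

    # Build a programmatic harvesting task; adjust task_id if your setup expects a different name.
    env = minedojo.make(
        task_id="harvest_milk_with_empty_bucket_and_cow",
        image_size=(160, 256),
    )
    obs = env.reset()
    for _ in range(20):
        # Step the environment with a no-op action just to confirm the simulator runs.
        obs, reward, done, info = env.step(env.action_space.no_op())
        if done:
            obs = env.reset()
    env.close()
    print("MineDojo environment stepped successfully.")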

PPO-Training

  • For PPO, run MINEDOJO_HEADLESS=1 python train.py.
    --task: the programmatic task name.

    --exp-name: prefix of the directory name for saved logs and models.

    --save-path: the log directory where models and GIFs are saved.

    Models, GIF videos, and experience are saved in checkpoint/. Training configs and logs are saved in data/.

  • To draw training curves, find the training log file progress.txt in data/, move vis.py into the same directory, and run it (a minimal plotting sketch follows below).
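As a starting point, a minimal plotting script in the spirit of vis.py could look like the following. The separator and column names of progress.txt are assumptions; check the file's header and adjust accordingly.

    # plot_progress.py -- sketch of a training-curve plot from progress.txt.
    import pandas as pd
    import matplotlib.pyplot as plt

    # progress.txt is assumed to be tab-separated; change sep if it is comma-separated.
    df = pd.read_csv("progress.txt", sep="\t")

    # Column names below are guesses; print(df.columns) to see the real ones.
    x = df["total_timesteps"] if "total_timesteps" in df.columns else df.index
    y_col = "episode_reward_mean" if "episode_reward_mean" in df.columns else df.columns[-1]

    plt.plot(x, df[y_col])
    plt.xlabel("environment steps")
    plt.ylabel(y_col)
    plt.title("PPO training curve")
    plt.savefig("training_curve.png", dpi=150)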

Results

  • For the milk and wool tasks, --task is harvest_milk_with_empty_bucket_and_cow and harvest_wool_with_shears_and_sheep, respectively. fig/ shows our training results.

  • For other tasks, refer to the paper and modify the environment file minecraft.py to specify the simulation setup and the reward function (a hypothetical reward-shaping sketch is given after the figure below).

Training curves for the milk and wool tasks (images in fig/).
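The sketch below illustrates one common way to add a shaped reward on top of a sparse task reward. It is a hypothetical gym-style wrapper: the info key, coefficient, and class name are placeholders for illustration, not the repository's actual API in minecraft.py.

    # Hypothetical reward-shaping wrapper (illustration only; the real hooks live in minecraft.py).
    import gym


    class ShapedRewardWrapper(gym.Wrapper):
        """Adds a small dense bonus for approaching the target entity."""

        def __init__(self, env, coef=0.01):
            super().__init__(env)
            self.coef = coef
            self.prev_dist = None

        def reset(self, **kwargs):
            self.prev_dist = None
            return self.env.reset(**kwargs)

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            dist = info.get("distance_to_target")  # placeholder key for illustration
            if dist is not None:
                if self.prev_dist is not None:
                    # Bonus proportional to the distance reduced on this step.
                    reward += self.coef * max(0.0, self.prev_dist - dist)
                self.prev_dist = dist
            return obs, reward, done, info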

TODO

  • Open-source RL training framework

  • Fix environment issues across different systems

  • The temporal abstraction technique

  • More applications

Citation

@article{liu2024rlgpt,
  title={{RL-GPT}: Integrating Reinforcement Learning and Code-as-policy},
  author={Liu, Shaoteng and Yuan, Haoqi and Hu, Minda and Li, Yanwei and Chen, Yukang and Liu, Shu and Lu, Zongqing and Jia, Jiaya},
  journal={arXiv preprint arXiv:2402.19299},
  year={2024}
}

Acknowledgement

  • Plan4MC: a multi-task agent in Minecraft.
  • Voyager: the first LLM-powered lifelong learning agent in Minecraft.
  • DEPS: many practical prompts and tools.

License

This codebase is released under the MIT License.
