
Commit 2de5c26

Vincent Moens (vmoens) authored and committed
[Algorithm] GRPO scripts
ghstack-source-id: a5aa8e1
Pull-Request-resolved: #2970
1 parent 023c965 commit 2de5c26


23 files changed, +1272 -242 lines changed


README.md

Lines changed: 44 additions & 177 deletions
@@ -826,201 +826,68 @@ If you're using TorchRL, please refer to this BibTeX entry to cite this work:
 
 ## Installation
 
-Create a conda environment where the packages will be installed.
-
-```
-conda create --name torch_rl python=3.9
-conda activate torch_rl
-```
-
-**PyTorch**
-
-Depending on the use of functorch that you want to make, you may want to
-install the latest (nightly) PyTorch release or the latest stable version of PyTorch.
-See [here](https://pytorch.org/get-started/locally/) for a detailed list of commands,
-including `pip3` or other special installation instructions.
-
-**Torchrl**
-
-You can install the **latest stable release** by using
+1. Create a new virtual environment:
 ```bash
-pip3 install torchrl
+python -m venv venv
+source venv/bin/activate # On Windows use: venv\Scripts\activate
 ```
-This should work on linux, Windows 10 and OsX (Intel or Silicon chips).
-On certain Windows machines (Windows 11), one should install the library locally (see below).
 
-For AArch64 machines, the binaries are not yet stored on PyPI so you will need to download them directly from
-the [release page](https://github.com/pytorch/rl/releases/) or install the library via
-```
-pip3 install git+https://github.com/pytorch/[email protected]
-```
-
-The **nightly build** can be installed via
+2. Install dependencies:
 ```bash
-pip3 install tensordict-nightly torchrl-nightly
+pip install -r requirements.txt
 ```
-which we currently only ship for Linux machines.
-Importantly, the nightly builds require the nightly builds of PyTorch too.
 
-To install extra dependencies, call
+3. Set required environment variables:
 ```bash
-pip3 install "torchrl[atari,dm_control,gym_continuous,rendering,tests,utils,marl,open_spiel,checkpointing]"
+export VLLM_USE_V1=0 # Required for vLLM compatibility
 ```
-or a subset of these.
 
-To install torchrl with the latest pytorch, use
-```bash
-pip3 install "torchrl[replay_buffer]"
-```
-since some features in the replay buffer require PyTorch 2.7.0 or above.
-
-One may also desire to install the library locally. Three main reasons can motivate this:
-- the nightly/stable release isn't available for one's platform (eg, Windows 11, nightlies for Apple Silicon etc.);
-- contributing to the code;
-- install torchrl with a previous version of PyTorch (any version >= 2.1) (note that this should also be doable via a regular install followed
-by a downgrade to a previous pytorch version -- but the C++ binaries will not be available so some feature will not work,
-such as prioritized replay buffers and the like.)
-
-**Disclaimer**: As of today, TorchRL is roughly compatible with any pytorch version >= 2.1 and installing it will not
-directly require a newer version of pytorch to be installed. Indirectly though, tensordict still requires the latest
-PyTorch to be installed and we are working hard to loosen that requirement.
-The C++ binaries of TorchRL (mainly for prioritized replay buffers) will only work with PyTorch 2.7.0 and above.
-Some features (e.g., working with nested jagged tensors) may also
-be limited with older versions of pytorch. It is recommended to use the latest TorchRL with the latest PyTorch version
-unless there is a strong reason not to do so.
-
-To install the library locally, start by cloning the repo:
-```bash
-git clone https://github.com/pytorch/rl
-```
-and don't forget to check out the branch or tag you want to use for the build:
-```bash
-git checkout v0.8.0
-```
+## Usage
 
-Go to the directory where you have cloned the torchrl repo and install it (after
-installing `ninja`)
-```bash
-cd /path/to/torchrl/
-pip3 install ninja -U
-python setup.py develop
-```
+The main training script supports various datasets and models:
 
-One can also build the wheels to distribute to co-workers using
-```bash
-python setup.py bdist_wheel
-```
-Your wheels will be stored there `./dist/torchrl<name>.whl` and installable via
 ```bash
-pip install torchrl<name>.whl
-```
-
-**Warning**: Unfortunately, `pip3 install -e .` does not currently work. Contributions to help fix this are welcome!
-
-On M1 machines, this should work out-of-the-box with the nightly build of PyTorch.
-If the generation of this artifact in MacOs M1 doesn't work correctly or in the execution the message
-`(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))` appears, then try
-
-```
-ARCHFLAGS="-arch arm64" python setup.py develop
-```
-
-To run a quick sanity check, leave that directory (e.g. by executing `cd ~/`)
-and try to import the library.
-```
-python -c "import torchrl"
+python sota-implementations/llm/grpo.py \
+--dataset gsm8k \
+--model_name Qwen/Qwen2.5-3B \
+--num_envs 8 \
+--steps_per_batch 64 \
+--optim_batch_size 4 \
+--epochs 1 \
+--repeats 16 \
+--lr 1e-5 \
+--kl_coef 0.01
 ```
-This should not return any warning or error.
-
-**Optional dependencies**
-
-The following libraries can be installed depending on the usage one wants to
-make of torchrl:
-```
-# diverse
-pip3 install tqdm tensorboard "hydra-core>=1.1" hydra-submitit-launcher
-
-# rendering
-pip3 install "moviepy<2.0.0"
-
-# deepmind control suite
-pip3 install dm_control
-
-# gym, atari games
-pip3 install "gym[atari]" "gym[accept-rom-license]" pygame
-
-# tests
-pip3 install pytest pyyaml pytest-instafail
-
-# tensorboard
-pip3 install tensorboard
-
-# wandb
-pip3 install wandb
-```
-
-**Troubleshooting**
-
-If a `ModuleNotFoundError: No module named ‘torchrl._torchrl` errors occurs (or
-a warning indicating that the C++ binaries could not be loaded),
-it means that the C++ extensions were not installed or not found.
-
-- One common reason might be that you are trying to import torchrl from within the
-git repo location. The following code snippet should return an error if
-torchrl has not been installed in `develop` mode:
-```
-cd ~/path/to/rl/repo
-python -c 'from torchrl.envs.libs.gym import GymEnv'
-```
-If this is the case, consider executing torchrl from another location.
-- If you're not importing torchrl from within its repo location, it could be
-caused by a problem during the local installation. Check the log after the
-`python setup.py develop`. One common cause is a g++/C++ version discrepancy
-and/or a problem with the `ninja` library.
-- If the problem persists, feel free to open an issue on the topic in the repo,
-we'll make our best to help!
-- On **MacOs**, we recommend installing XCode first.
-With Apple Silicon M1 chips, make sure you are using the arm64-built python
-(e.g. [here](https://betterprogramming.pub/how-to-install-pytorch-on-apple-m1-series-512b3ad9bc6)).
-Running the following lines of code
-```
-wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
-python collect_env.py
-```
-should display
-```
-OS: macOS *** (arm64)
-```
-and not
-```
-OS: macOS **** (x86_64)
-```
-
-Versioning issues can cause error message of the type ```undefined symbol```
-and such. For these, refer to the [versioning issues document](https://github.com/pytorch/rl/blob/main/knowledge_base/VERSIONING_ISSUES.md)
-for a complete explanation and proposed workarounds.
-
-## Asking a question
-
-If you spot a bug in the library, please raise an issue in this repo.
 
-If you have a more generic question regarding RL in PyTorch, post it on
-the [PyTorch forum](https://discuss.pytorch.org/c/reinforcement-learning/6).
+### Key Parameters
 
-## Contributing
+- `--dataset`: Currently supports 'gsm8k' and 'ifeval'
+- `--model_name`: Any HuggingFace model name
+- `--num_envs`: Number of parallel environments
+- `--steps_per_batch`: Steps to collect per batch
+- `--optim_batch_size`: Batch size for optimization
+- `--epochs`: Number of epochs per batch collection
+- `--repeats`: Number of action repeats for GRPO
+- `--lr`: Learning rate
+- `--kl_coef`: KL penalty coefficient
+- `--compile`: Enable torch.compile() for the loss function
+- `--clip_grad_norm`: Gradient norm clipping value
+- `--gpu_memory_utilization`: GPU memory utilization for vLLM
 
-Internal collaborations to torchrl are welcome! Feel free to fork, submit issues and PRs.
-You can checkout the detailed contribution guide [here](https://github.com/pytorch/rl/blob/main/CONTRIBUTING.md).
-As mentioned above, a list of open contributions can be found in [here](https://github.com/pytorch/rl/issues/509).
+## Hardware Requirements
 
-Contributors are recommended to install [pre-commit hooks](https://pre-commit.com/) (using `pre-commit install`). pre-commit will check for linting related issues when the code is committed locally. You can disable th check by appending `-n` to your commit command: `git commit -m <commit message> -n`
+- CUDA-capable GPU with at least 8GB VRAM
+- For multi-GPU setups, the script automatically manages device allocation
 
+## Monitoring
 
-## Disclaimer
+The training progress is logged to Weights & Biases. Key metrics include:
+- Reward
+- Advantage
+- KL penalty
+- Sequence length
+- Loss metrics (ESS, objective, clip fraction, etc.)
 
-This library is released as a PyTorch beta feature.
-BC-breaking changes are likely to happen but they will be introduced with a deprecation
-warranty after a few release cycles.
+## License
 
-# License
-TorchRL is licensed under the MIT License. See [LICENSE](https://github.com/pytorch/rl/blob/main/LICENSE) for details.
+This project is licensed under the MIT License - see the LICENSE file for details.
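
The flags listed under `### Key Parameters` in the added README text map naturally onto a command-line parser. The following is a minimal sketch of that mapping, not the actual parser in `sota-implementations/llm/grpo.py`. The `build_parser` helper is hypothetical, the defaults are simply the values from the usage example in the diff, and the types chosen for `--clip_grad_norm` and `--gpu_memory_utilization` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in the README diff."""
    parser = argparse.ArgumentParser(description="GRPO training flags (illustrative sketch)")
    # The README names exactly these two datasets as currently supported.
    parser.add_argument("--dataset", choices=["gsm8k", "ifeval"], default="gsm8k")
    parser.add_argument("--model_name", default="Qwen/Qwen2.5-3B")
    # Defaults below are copied from the usage example, not from the real script.
    parser.add_argument("--num_envs", type=int, default=8)
    parser.add_argument("--steps_per_batch", type=int, default=64)
    parser.add_argument("--optim_batch_size", type=int, default=4)
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--repeats", type=int, default=16)
    parser.add_argument("--lr", type=float, default=1e-5)
    parser.add_argument("--kl_coef", type=float, default=0.01)
    # Assumed types: a boolean switch and two optional floats.
    parser.add_argument("--compile", action="store_true")
    parser.add_argument("--clip_grad_norm", type=float, default=None)
    parser.add_argument("--gpu_memory_utilization", type=float, default=None)
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--dataset", "ifeval", "--lr", "3e-6"])
print(args.dataset, args.lr, args.num_envs)
```

Passing an explicit list to `parse_args` keeps the sketch testable; a real entry point would call `parse_args()` with no arguments to read `sys.argv`.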

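
Several of the metrics named under `## Monitoring` have standard textbook definitions that are easy to state in code. The sketch below illustrates them under assumptions and is not TorchRL's implementation. It uses the group-relative advantage from the GRPO formulation (reward minus group mean, divided by group standard deviation), Kish's effective sample size (ESS) for importance weights, and the fraction of importance ratios that fall outside a PPO-style clipping range. All function names, `eps`, and `clip_eps=0.2` are hypothetical choices, and a "group" is presumed to be the `--repeats` responses sampled for one prompt.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


def effective_sample_size(weights):
    """Kish's ESS: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def clip_fraction(ratios, clip_eps=0.2):
    """Fraction of importance ratios lying outside [1 - clip_eps, 1 + clip_eps]."""
    outside = sum(1 for r in ratios if abs(r - 1.0) > clip_eps)
    return outside / len(ratios)


# Toy values for illustration only; these are not results from any training run.
rewards = [0.0, 1.0, 1.0, 0.0]
ratios = [1.0, 1.5, 0.5, 1.1]
print(group_relative_advantages(rewards))
print(effective_sample_size(ratios))
print(clip_fraction(ratios))
```

With uniform weights the ESS equals the number of samples, and it shrinks as a few weights come to dominate, which is why it is a useful health check for off-policy updates.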