Note
Check out our blog post for more detail and benchmarks!
git clone https://github.com/axolotl-ai-cloud/grpo_code.git
cd grpo_code
pip install -e .
pip install "axolotl[vllm,flash-attn]==0.8.0"
The following environment variables can be used to modify the behaviour of the reward functions:
- WASM_FUEL - Controls the amount of fuel (computation resources) allocated to the WASM environment (default: 10000000000)
- WASM_PATH - Path to the Python WASM runtime file (default: "./wasm/python-3.12.0.wasm")
- TIMEOUT - Maximum execution time in seconds for code evaluation (default: 1)
- MAX_WORKERS - Number of parallel workers for multiprocessing reward functions (default: 1)
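As a rough illustration of how these variables and their defaults fit together, here is a minimal sketch that reads them from the environment. The `RewardEnvConfig` class and `load_reward_env_config` function are hypothetical names made up for this example; they are not part of the grpo_code package, and the actual reward functions may parse the variables differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardEnvConfig:
    """Hypothetical container for the four settings (not part of grpo_code)."""

    wasm_fuel: int
    wasm_path: str
    timeout: float
    max_workers: int


def load_reward_env_config(environ=None) -> RewardEnvConfig:
    """Read the variables, falling back to the defaults documented above."""
    env = os.environ if environ is None else environ
    return RewardEnvConfig(
        wasm_fuel=int(env.get("WASM_FUEL", "10000000000")),
        wasm_path=env.get("WASM_PATH", "./wasm/python-3.12.0.wasm"),
        timeout=float(env.get("TIMEOUT", "1")),
        max_workers=int(env.get("MAX_WORKERS", "1")),
    )


if __name__ == "__main__":
    # Prints the values that would apply in the current shell.
    print(load_reward_env_config())
```

Running the script with, say, `MAX_WORKERS=64` set in the shell is a quick way to confirm that an override is picked up before starting a long training run.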
First, spin up a vLLM instance:
CUDA_VISIBLE_DEVICES=2,3 axolotl vllm-serve r1_acecode.yaml
Then, in another terminal, kick off the training process:
CUDA_VISIBLE_DEVICES=0,1 MAX_WORKERS=64 axolotl train r1_acecode.yaml --num-processes 2
This example uses 4 A100 GPUs. Adjust CUDA_VISIBLE_DEVICES, MAX_WORKERS, cfg.micro_batch_size, and cfg.gradient_accumulation_steps as necessary to match your hardware.
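When changing the number of training GPUs or the micro batch size, it can help to keep the effective batch size constant. The sketch below uses the usual data-parallel relation (micro batch size × gradient accumulation steps × number of training processes). The numbers are illustrative only and are not taken from r1_acecode.yaml, and the trainer may impose further constraints that are not covered here.

```python
def effective_batch_size(micro_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_processes: int) -> int:
    """Samples contributing to one optimizer step across all training processes."""
    return micro_batch_size * gradient_accumulation_steps * num_processes


def accumulation_steps_for(target_batch_size: int,
                           micro_batch_size: int,
                           num_processes: int) -> int:
    """Accumulation steps needed to reach target_batch_size on new hardware."""
    per_step = micro_batch_size * num_processes
    if target_batch_size % per_step != 0:
        raise ValueError(
            f"{target_batch_size} is not divisible by "
            f"micro_batch_size * num_processes = {per_step}"
        )
    return target_batch_size // per_step


if __name__ == "__main__":
    # Illustrative values (assumptions, not the repository's settings).
    target = effective_batch_size(micro_batch_size=4,
                                  gradient_accumulation_steps=8,
                                  num_processes=2)
    print("effective batch size:", target)
    # Halving the micro batch size to fit smaller GPUs:
    print("accumulation steps needed:",
          accumulation_steps_for(target, micro_batch_size=2, num_processes=2))
```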
This project uses Python 3.12.0 compiled to WebAssembly from VMware Labs.
If you already have the WASM file and want to verify its integrity:
- Ensure you have both python-3.12.0.wasm and python-3.12.0.wasm.sha256sum in the wasm directory.
- Run the verification command:
Linux/macOS:
sha256sum -c ./wasm/python-3.12.0.wasm.sha256sum
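On systems where `sha256sum` is not available, the same check can be approximated with Python's standard library. This is a minimal sketch that assumes the checksum file follows the usual `sha256sum` layout (hex digest first, then the file name). The `sha256_of` and `verify` helpers are hypothetical names for this example and do not come from the project.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so the runtime is never loaded into memory whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(wasm_file: Path, checksum_file: Path) -> bool:
    """Compare the file's digest with the first token of the checksum file."""
    # Assumed format: "<hex digest>  <filename>"; only the digest is used.
    expected = checksum_file.read_text().split()[0].lower()
    return sha256_of(wasm_file) == expected


if __name__ == "__main__":
    wasm = Path("./wasm/python-3.12.0.wasm")
    checksum = Path("./wasm/python-3.12.0.wasm.sha256sum")
    if wasm.exists() and checksum.exists():
        print("OK" if verify(wasm, checksum) else "MISMATCH")
    else:
        print("runtime or checksum file not found")
```

Unlike `sha256sum -c`, this sketch ignores the file name recorded in the checksum file, so it works regardless of the directory it is run from.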
To download the runtime files yourself:
- Download the Python WASM runtime:
curl -L https://github.com/vmware-labs/webassembly-language-runtimes/releases/download/python%2F3.12.0%2B20231211-040d5a6/python-3.12.0.wasm -o ./wasm/python-3.12.0.wasm
- Download the SHA256 checksum file:
curl -L https://github.com/vmware-labs/webassembly-language-runtimes/releases/download/python%2F3.12.0%2B20231211-040d5a6/python-3.12.0.wasm.sha256sum -o ./wasm/python-3.12.0.wasm.sha256sum
- Verify the download:
sha256sum -c ./wasm/python-3.12.0.wasm.sha256sum
- Place both files in your project directory or specify the path in your configuration (for example, via the WASM_PATH environment variable).