A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback.

axolotl-ai-cloud/grpo_code

Note

Check out our blog post for more details and benchmarks!

Installation

git clone https://github.com/axolotl-ai-cloud/grpo_code.git
cd grpo_code
pip install -e .
pip install "axolotl[vllm,flash-attn]==0.8.0"

Training

The following environment variables can be used to modify the behaviour of the reward functions:

  • WASM_FUEL - Controls the amount of fuel (computation resources) allocated to the WASM environment (default: 10000000000)
  • WASM_PATH - Path to the Python WASM runtime file (default: "./wasm/python-3.12.0.wasm")
  • TIMEOUT - Maximum execution time in seconds for code evaluation (default: 1)
  • MAX_WORKERS - Number of parallel workers for multiprocessing reward functions (default: 1)
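A minimal sketch of how the variables above could be read, falling back to the documented defaults when a variable is unset. `RewardEnv` and `load_reward_env` are hypothetical names, not the project's API, and parsing TIMEOUT as a float (so fractional seconds work) is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RewardEnv:
    """Reward-function settings; field names mirror the README's variables."""
    wasm_fuel: int
    wasm_path: str
    timeout: float
    max_workers: int


def load_reward_env(environ: Optional[Mapping[str, str]] = None) -> RewardEnv:
    # Read from the real environment unless an explicit mapping is given.
    env = os.environ if environ is None else environ
    # Defaults below are the ones listed in the README.
    return RewardEnv(
        wasm_fuel=int(env.get("WASM_FUEL", "10000000000")),
        wasm_path=env.get("WASM_PATH", "./wasm/python-3.12.0.wasm"),
        timeout=float(env.get("TIMEOUT", "1")),  # assumption: float seconds
        max_workers=int(env.get("MAX_WORKERS", "1")),
    )
```

Passing an explicit mapping, for example `load_reward_env({"MAX_WORKERS": "64"})`, mirrors the training command below without touching the real environment.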

First, spin up a vLLM instance:

CUDA_VISIBLE_DEVICES=2,3 axolotl vllm-serve r1_acecode.yaml

Then, in another terminal, kick off the training process:

CUDA_VISIBLE_DEVICES=0,1 MAX_WORKERS=64 axolotl train r1_acecode.yaml --num-processes 2

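This example uses 4 A100 GPUs: GPUs 2 and 3 run the vLLM server, and GPUs 0 and 1 run training (so --num-processes is 2). Adjust CUDA_VISIBLE_DEVICES, --num-processes, MAX_WORKERS, cfg.micro_batch_size and cfg.gradient_accumulation_steps as necessary to match your hardware.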
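When changing the number of training GPUs, it can help to keep the number of samples per optimizer step constant. A rough sketch, assuming the usual data-parallel convention that each training process handles `micro_batch_size` samples per pass and accumulates over `gradient_accumulation_steps` passes; `effective_batch_size` is a hypothetical helper and the numbers are illustrative, not the values in r1_acecode.yaml.

```python
def effective_batch_size(micro_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_processes: int) -> int:
    """Samples contributing to one optimizer step across all training processes."""
    for name, value in (("micro_batch_size", micro_batch_size),
                        ("gradient_accumulation_steps", gradient_accumulation_steps),
                        ("num_processes", num_processes)):
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    return micro_batch_size * gradient_accumulation_steps * num_processes


# Hypothetical values with 2 training GPUs, as in the example command.
print(effective_batch_size(4, 2, 2))  # 16
# With 4 training GPUs, halving accumulation keeps the same step size.
print(effective_batch_size(4, 1, 4))  # 16
```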

Python WASM Runtime

This project uses the Python 3.12.0 interpreter compiled to WebAssembly by VMware Labs (webassembly-language-runtimes).
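The reward functions' source is not reproduced here. As a minimal sketch of the general idea, assuming the `wasmtime` Python package (a recent release exposing `Store.set_fuel`; older releases used `add_fuel`) and that the runtime is a WASI module accepting CPython's usual `-c` flag, a snippet could be run under a fuel budget like this. `run_python_in_wasm` and `WasmResult` are hypothetical names, not the project's API.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class WasmResult:
    """Outcome of one sandboxed run (hypothetical helper type)."""
    stdout: str
    stderr: str
    exit_code: Optional[int]  # None when execution trapped, e.g. fuel ran out


def run_python_in_wasm(code: str,
                       wasm_path: str = "./wasm/python-3.12.0.wasm",
                       fuel: int = 10_000_000_000) -> WasmResult:
    """Run `code` with the WASI Python interpreter under a fuel budget."""
    # Imported lazily so this module can be imported without wasmtime installed.
    from wasmtime import (Config, Engine, ExitTrap, Linker, Module, Store,
                          Trap, WasiConfig, WasmtimeError)

    config = Config()
    config.consume_fuel = True  # executed instructions draw down the budget
    engine = Engine(config)
    # Compiling the interpreter is slow; real code should cache engine/module.
    module = Module.from_file(engine, wasm_path)
    linker = Linker(engine)
    linker.define_wasi()

    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "stdout.txt")
        err_path = os.path.join(tmp, "stderr.txt")

        wasi = WasiConfig()
        wasi.argv = ("python", "-c", code)  # argv[0] is the program name
        wasi.stdout_file = out_path
        wasi.stderr_file = err_path

        store = Store(engine)
        store.set_fuel(fuel)
        store.set_wasi(wasi)

        exit_code: Optional[int] = 0
        try:
            instance = linker.instantiate(store, module)
            instance.exports(store)["_start"](store)
        except ExitTrap as exc:  # guest called proc_exit (e.g. nonzero status)
            exit_code = exc.code
        except (Trap, WasmtimeError):  # fuel exhausted or another trap
            exit_code = None

        with open(out_path, encoding="utf-8", errors="replace") as f:
            stdout = f.read()
        with open(err_path, encoding="utf-8", errors="replace") as f:
            stderr = f.read()
    return WasmResult(stdout, stderr, exit_code)
```

With the runtime file in place, `run_python_in_wasm("print(sum(range(10)))")` should yield stdout `45` and exit code 0, while an infinite loop should exhaust the fuel and come back with `exit_code=None` instead of hanging the trainer.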

Verify an Existing Download

If you already have the WASM file and want to verify its integrity:

  1. Ensure you have both python-3.12.0.wasm and python-3.12.0.wasm.sha256sum in the wasm directory.
  2. Run the verification command:

Linux/macOS:

sha256sum -c ./wasm/python-3.12.0.wasm.sha256sum
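On a platform without `sha256sum` (for example Windows), the same check can be sketched with Python's `hashlib`. `verify_sha256sum_file` is a hypothetical helper; it assumes the checksum file uses the standard `<hex digest>  <filename>` format and, as step 1 requires, that both files sit in the same directory.

```python
import hashlib
from pathlib import Path


def verify_sha256sum_file(checksum_path: str) -> bool:
    """Check every '<hex digest>  <filename>' entry in a sha256sum-style file."""
    checksum_file = Path(checksum_path)
    all_ok = True
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # '*' marks binary mode in sha256sum output
        # Design choice: look for the file next to the checksum file, using
        # only its base name; `sha256sum -c` resolves names from the CWD.
        target = checksum_file.parent / Path(name).name
        digest = hashlib.sha256()
        with open(target, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
                digest.update(chunk)
        matched = digest.hexdigest() == expected.lower()
        print(f"{name}: {'OK' if matched else 'FAILED'}")
        all_ok = all_ok and matched
    return all_ok
```

Calling `verify_sha256sum_file("./wasm/python-3.12.0.wasm.sha256sum")` prints one OK or FAILED line per entry, in the style of `sha256sum -c`, and returns True only if every entry matches.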

Manual Download

To download the runtime files yourself:

  1. Download the Python WASM runtime:

    curl -L -o ./wasm/python-3.12.0.wasm https://github.com/vmware-labs/webassembly-language-runtimes/releases/download/python%2F3.12.0%2B20231211-040d5a6/python-3.12.0.wasm
  2. Download the SHA256 checksum file:

    curl -L -o ./wasm/python-3.12.0.wasm.sha256sum https://github.com/vmware-labs/webassembly-language-runtimes/releases/download/python%2F3.12.0%2B20231211-040d5a6/python-3.12.0.wasm.sha256sum
  3. Verify the download:

    sha256sum -c ./wasm/python-3.12.0.wasm.sha256sum
  4. Keep both files in the ./wasm directory (the default location), or set WASM_PATH to the .wasm file's location if you store it elsewhere.
