Training a Multi-Turn GSM8K Math Agent in AReaL

The files in this folder present an example that trains a multi-turn GSM8K math agent from Qwen/Qwen2.5-1.5B-Instruct, using the ArealOpenAI API and its concat mode to organize the training data and discount the reward.

To run the example

python3 examples/multi_turn_math/gsm8k_rl_mt.py \
    --config examples/multi_turn_math/gsm8k_grpo_mt.yaml \
    scheduler.type=ray \
    experiment_name=gsm8k-grpo-multiturn trial_name=trial0

Only the following config options are added relative to the original gsm8k_grpo.yaml config:

export_style: concat
agent_run_args:
  max_turns: 2
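
As a rough illustration of what concatenating turns and discounting the reward could look like, here is a minimal, self-contained sketch. It is not AReaL's implementation and does not use the ArealOpenAI API: the `Turn` class, `concat_turns`, `discounted_reward`, the loss-mask convention, and the `discount` value of 0.9 are all hypothetical, and the assumption that a later successful turn earns a smaller reward is only one possible discounting scheme.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One conversation turn (hypothetical structure, not an AReaL type)."""
    prompt_tokens: list[int]      # tokens supplied by the user or environment
    completion_tokens: list[int]  # tokens generated by the model


def concat_turns(turns: list[Turn]) -> tuple[list[int], list[int]]:
    """Concatenate all turns into one token sequence plus a loss mask.

    The mask is 1 for model-generated tokens and 0 for prompt tokens,
    so only the model's own outputs would contribute to the loss.
    """
    tokens: list[int] = []
    loss_mask: list[int] = []
    for turn in turns:
        tokens += turn.prompt_tokens
        loss_mask += [0] * len(turn.prompt_tokens)
        tokens += turn.completion_tokens
        loss_mask += [1] * len(turn.completion_tokens)
    return tokens, loss_mask


def discounted_reward(final_reward: float, turns_used: int, discount: float = 0.9) -> float:
    """Scale the final reward by discount ** (turns_used - 1).

    Assumed scheme: an answer that is correct on the first turn keeps the
    full reward, and each extra turn multiplies it by `discount`.
    """
    if turns_used < 1:
        raise ValueError("turns_used must be at least 1")
    return final_reward * discount ** (turns_used - 1)


# Two turns, matching max_turns: 2 in the config above.
turns = [
    Turn(prompt_tokens=[1, 2, 3], completion_tokens=[4, 5]),
    Turn(prompt_tokens=[6], completion_tokens=[7, 8, 9]),
]
tokens, loss_mask = concat_turns(turns)
reward = discounted_reward(1.0, turns_used=len(turns), discount=0.9)
```

With this scheme, an answer that only becomes correct on the second turn is still rewarded, but less than one that is correct immediately.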

Reward Curve

[Figure: reward_curve.png]
