
OLMO + RL #424

Closed
wants to merge 62 commits into from

Conversation

@vwxyzjn (Collaborator) commented Nov 8, 2024

I put the code here. To reproduce my work, `pip install ai2_olmo` and run:

for beta in 0.05
do
for lr in 3e-7
do
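# note: --image is passed twice below; assuming argparse-style last-wins parsing
# in mason.py, the costah/open_instruct_ppo_ray_olmo image is the one that takes effect.
# GPU layout on the single 8-GPU node: --actor_num_gpus_per_node 7 assigns 7 GPUs
# to the DeepSpeed policy trainer, leaving 1 for the vLLM generation engine
# (--vllm_tensor_parallel_size 1).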
python mason.py \
    --cluster ai2/augusta-google-1 --image nathanl/open_instruct_auto --pure_docker_mode \
    --workspace ai2/tulu-3-dev \
    --priority high \
    --preemptible \
    --num_nodes 1 \
    --image costah/open_instruct_ppo_ray_olmo \
    --budget ai2/allennlp \
    --gpus 8 -- pip install --upgrade transformers \&\& python open_instruct/ppo_vllm_thread_ray_gtrl_olmo.py \
    --exp_name "ppo_olmo_rm_init_one_epoch_beta_${beta}_lr_${lr}" \
    --beta $beta \
    --learning_rate $lr \
    --dataset_mixer "{\"ai2-adapt-dev/gsm8k_ground_truth\": 1.0}" \
    --dataset_train_splits train \
    --dataset_eval_mixer "{\"ai2-adapt-dev/gsm8k_math_ground_truth\": 1.0}" \
    --dataset_eval_splits test \
    --max_token_length 2048 \
    --max_prompt_token_length 2048 \
    --response_length 1024 \
    --model_name_or_path allenai/open_instruct_dev \
    --model_revision olmo_7b_soup_anneal_v3.9_4_DPO___model__42__1730863426 \
    --reward_model_path allenai/open_instruct_dev \
    --reward_model_revision reward_modeling__1__1730930663 \
    --non_stop_penalty \
    --stop_token eos \
    --temperature 1.0 \
    --ground_truths_key ground_truth \
    --chat_template tulu \
    --sft_messages_key messages \
    --total_episodes 200000 \
    --penalty_reward_value -10.0 \
    --deepspeed_stage 3 \
    --per_device_train_batch_size 4 \
    --local_rollout_forward_batch_size 8 \
    --local_mini_batch_size 32 \
    --local_rollout_batch_size 32 \
    --actor_num_gpus_per_node 7 \
    --vllm_tensor_parallel_size 1 \
    --num_epochs 1 \
    --apply_verifiable_reward true \
    --output_dir /output \
    --seed 3 \
    --num_evals 3 \
    --reward_model_multiplier 0.0 \
    --no_try_launch_beaker_eval_jobs \
    --gradient_checkpointing \
    --with_tracking
done
done
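
For context on what the flags above select: with --reward_model_multiplier 0.0 the reward model contributes nothing to the score, so the training signal comes entirely from the verifiable reward, i.e. checking each sampled completion against the ground_truth field of the GSM8K examples (--apply_verifiable_reward true, --ground_truths_key ground_truth). A minimal Python sketch of that idea, with hypothetical names and reward values; this is not the open_instruct implementation:

import re

# Hypothetical constants for illustration; the real values and shape of the
# verifiable reward live in open_instruct, not here.
VERIFIABLE_REWARD = 10.0   # reward when the final answer matches the ground truth
PENALTY_REWARD = -10.0     # mirrors --penalty_reward_value for non-stopping output

def extract_final_number(text: str) -> str | None:
    """Return the last number in the text, if any (GSM8K answers are numeric)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

def numbers_equal(a: str | None, b: str | None) -> bool:
    if a is None or b is None:
        return False
    try:
        return float(a) == float(b)
    except ValueError:
        return a == b

def verifiable_reward(response: str, ground_truth: str, stopped: bool) -> float:
    # --non_stop_penalty: completions that never emit the stop token get the penalty
    if not stopped:
        return PENALTY_REWARD
    if numbers_equal(extract_final_number(response), extract_final_number(ground_truth)):
        return VERIFIABLE_REWARD
    return 0.0

# Example:
# verifiable_reward("48 + 24 = 72. The answer is 72.", "#### 72", stopped=True)
# -> VERIFIABLE_REWARD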

@vwxyzjn mentioned this pull request Nov 8, 2024
@vwxyzjn changed the base branch from main to olmo_again November 8, 2024 17:35
@hamishivi (Collaborator) commented:

note: this PR closes #467

@vwxyzjn changed the base branch from olmo_again to main January 8, 2025 14:32
@vwxyzjn (Collaborator, Author) commented Jan 26, 2025

Closed by #525

@vwxyzjn closed this Jan 26, 2025