BigScience Eval Harness #291

Open · wants to merge 75 commits into main

Commits (75)
71728c2
Add functionality for running the evaluation harness on single gpu
DanielHesslow Nov 28, 2021
97a2339
Add support for pipelining
DanielHesslow Nov 29, 2021
e72a18f
support tensor parallel
DanielHesslow Nov 29, 2021
0a82965
save the results
DanielHesslow Nov 29, 2021
ceddfc5
Minor cleanup
DanielHesslow Nov 29, 2021
c1e8022
Experimental Deepspeed support
DanielHesslow Nov 29, 2021
0f8c8c0
Proper deepspeed integration, now working on combined tp and pp
DanielHesslow Dec 1, 2021
37e6962
Update model loading and clean up code.
DanielHesslow Dec 5, 2021
ee0a1a9
Add some options
DanielHesslow Dec 5, 2021
cacc58f
Fix pipelining + fp32 evaluation.
DanielHesslow Dec 8, 2021
778f251
Remove dummy paths in examples/run_evalharness.sh
DanielHesslow Dec 9, 2021
3d90b18
Simplify offline loading with export HF_DATASETS_OFFLINE=1
DanielHesslow Dec 9, 2021
2bb61ac
Remove accidental copy-paste.
DanielHesslow Dec 14, 2021
a362da3
Experimental deepspeed evaluation-path
DanielHesslow Dec 15, 2021
9899be0
make it work with deepspeed; add instructions
stas00 Jan 8, 2022
7ef5ba7
improve
stas00 Jan 8, 2022
9527ad3
make adaptive_seq_len work with deepspeed
stas00 Jan 8, 2022
d4dacbe
move to slurm
stas00 Jan 8, 2022
151e91a
fixes
stas00 Jan 9, 2022
92123d0
cleanup
stas00 Jan 9, 2022
a6fab1f
add instructions on how to import data into the spreadsheet
stas00 Jan 10, 2022
dedf111
not tracking ppl/em
stas00 Jan 11, 2022
a12af5c
add task version
stas00 Jan 11, 2022
dbecf81
make compatible with lm-eval@master
stas00 Jan 12, 2022
c04e3d0
switch to 16gb slurm; simplify; improve instructions
stas00 Jan 13, 2022
e6e4800
Deepspeed model loading hack
DanielHesslow Jan 13, 2022
5e611bf
Restore correct zero state.
DanielHesslow Jan 13, 2022
7937eab
fix conversion script
stas00 Jan 14, 2022
afd3814
simpler config
stas00 Jan 14, 2022
b1a54f3
Merge remote-tracking branch 'origin/main' into eval_harness
stas00 Jan 15, 2022
9c60079
corrections
stas00 Jan 18, 2022
d861137
add logiqa
stas00 Jan 18, 2022
7158790
dealing with custom tokenizers
stas00 Jan 19, 2022
f0da71d
fix
stas00 Jan 21, 2022
1e06f41
Update examples/run_evalharness_deepspeed.md
stas00 Feb 18, 2022
a9221ac
Merge branch 'main' into eval_harness
stas00 Mar 18, 2022
a722259
Merge remote-tracking branch 'origin/main' into eval_harness
stas00 Apr 25, 2022
9ac9fad
check that the checkpoint path is valid
stas00 Apr 26, 2022
8ef9018
skip --abort_on_unmet_fused_kernel_constraints during eval
stas00 Apr 26, 2022
a798d69
disable sanity check on layers-2%pp==0
stas00 Apr 26, 2022
5884dcf
sort skip_keys
stas00 Apr 26, 2022
45bd9c6
make the default path unique to avoid overwrite
stas00 May 11, 2022
f75e232
Add bootstrap_iters arg
Muennighoff May 13, 2022
7bf75b9
Explain bootstrap_iters flag
Muennighoff May 13, 2022
3f18e7b
Intermediate results flag
Muennighoff May 14, 2022
213317f
Add backup file
Muennighoff May 15, 2022
1c11b10
Add arg to reduce bubble for pipeline parallel
Muennighoff May 15, 2022
f330705
Fix adaptive_seq_len via resetting activation shape
Muennighoff May 15, 2022
5082035
Extract args.load prior to load_ds_checkpoint_and_setup_megatron
Muennighoff May 15, 2022
db203cc
Parse args prior to loading function to get load_path
Muennighoff May 15, 2022
1d6c630
Add run_evalharness-tr11-176b-ml slurm script
Muennighoff May 16, 2022
7244745
Add bseval_harness fork compatibility
Muennighoff Jun 29, 2022
6fd4646
Remove superfluous script
Muennighoff Jun 29, 2022
e81615e
Merge branch 'main' into bseval_harness
Muennighoff Jun 29, 2022
0214bb7
Remove duplicates
Muennighoff Jun 29, 2022
2ce9ff6
Remove superfluous string
Muennighoff Jun 29, 2022
1fa0618
Add emission & example file
Muennighoff Jun 30, 2022
9af3e02
Add downloading
Muennighoff Jul 5, 2022
f75af1f
Offload to CPU earlier & increase number of bs in pipeline parallelism
Muennighoff Jul 5, 2022
9cf7ffd
Add offload arg
Muennighoff Jul 5, 2022
40cf869
add offload arg to slurm scripts
Muennighoff Jul 5, 2022
9313466
Fix setup_example_logger
thomasw21 Jul 6, 2022
d0b2efa
Add torch barrier
thomasw21 Jul 6, 2022
01dc62a
Add torch barrier
thomasw21 Jul 6, 2022
c193ffc
Improvement
thomasw21 Jul 7, 2022
79cb569
Be very careful of random states
thomasw21 Jul 7, 2022
bd31b62
Woops
thomasw21 Jul 7, 2022
c6f7602
This is already done correctly
thomasw21 Jul 7, 2022
6105fe4
Filter out generative tasks
thomasw21 Jul 7, 2022
43936d9
There's no BOS for bloom
thomasw21 Jul 7, 2022
280f1dc
Remove codecarbon
thomasw21 Jul 7, 2022
b466009
Merge branch 'main' into bseval_harness
Muennighoff Jul 16, 2022
9a2277c
Add small model scripts
Muennighoff Jul 16, 2022
02961ea
Merge branch 'bseval_harness' of https://github.com/bigscience-worksh…
Muennighoff Jul 16, 2022
472045e
merge main (#331)
Muennighoff Aug 17, 2022
122 changes: 122 additions & 0 deletions examples/evalharness/run_bsevalharness_tr11-176b-ml.slurm
@@ -0,0 +1,122 @@
#!/bin/bash
#SBATCH --job-name=run_bsevalharness-tr11-176b-ml
#SBATCH --partition=gpu_p5
#SBATCH --constraint=a100
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1          # crucial - only 1 task per node; torch.distributed.run spawns the per-GPU processes
#SBATCH --cpus-per-task=64           # number of cores per task
#SBATCH --hint=nomultithread # we get physical cores not logical
#SBATCH --gres=gpu:8 # number of gpus
#SBATCH --time 20:00:00 # maximum execution time (HH:MM:SS)
#SBATCH --output=%x-%j.out # output file name
#SBATCH --account=six@a100
#SBATCH --reservation=hug


set -x -e

source $six_ALL_CCFRWORK/start-muennighofflmeval

echo "START TIME: $(date)"

# a unique identifier for the current eval, ideally corresponding to the model name
VARIANT="tr11-176b-ml-bsevalharness"


CHECKPOINT_PATH=$six_ALL_CCFRSCRATCH/checkpoints/tr11-176B-ml/checkpoints/main/global_step90000
MEGATRON_DEEPSPEED_REPO=$six_ALL_CCFRSCRATCH/commun/experiments/muennighoff/megdsbslmeval/Megatron-DeepSpeed
export HF_DATASETS_OFFLINE=1
export TRANSFORMERS_OFFLINE=1

export TRANSFORMERS_CACHE=$six_ALL_CCFRWORK/models
export HF_DATASETS_CACHE=$six_ALL_CCFRWORK/datasets
export HF_MODULES_CACHE=$six_ALL_CCFRWORK/modules
export HF_METRICS_CACHE=$six_ALL_CCFRWORK/metrics

cd $MEGATRON_DEEPSPEED_REPO

TOKENIZER_NAME_OR_PATH=bigscience-catalogue-data-dev/byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles

PP_SIZE=8
TP_SIZE=1
SEQ_LEN=2048

# different from the training MICRO_BATCH_SIZE - there is no optimizer memory, so a bigger BS fits
# make it as big as fits on the GPU w/o OOM, but not too close to 100%
EVAL_MICRO_BATCH_SIZE=1

# dummy arguments to make Megatron happy.
MEGATRON_REQUIRED_ARGS=" \
--num-layers -1 \
--hidden-size -1 \
--num-attention-heads -1 \
--seq-length -1 \
--max-position-embeddings -1 \
"


ZERO_STAGE=0

config_json="./ds_config.json"

# DeepSpeed figures out GAS (gradient accumulation steps) dynamically from the dynamic GBS (global batch size) via set_train_batch_size()
cat <<EOT > $config_json
{
"train_micro_batch_size_per_gpu": 1,
"train_batch_size": 1,
"gradient_clipping": 1.0,
"zero_optimization": {
"stage": $ZERO_STAGE
},
"bf16": {
"enabled": true
},
"steps_per_print": 2000,
"wall_clock_breakdown": false
}
EOT
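# Illustrative arithmetic (an editor's note, not taken from the config itself): DeepSpeed resolves
#   gradient_accumulation_steps = train_batch_size / (train_micro_batch_size_per_gpu * data_parallel_size)
# Here DP = 8 GPUs / (TP=1 * PP=8) = 1, so GAS = 1 / (1 * 1) = 1; evaluation only runs forward
# passes, so these batch-size entries are effectively placeholders.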


CMD="./tasks/eval_harness/evaluate_bsevalharness.py \
--load $CHECKPOINT_PATH \
--results_path $VARIANT-results.json \
--tensor-model-parallel-size $TP_SIZE \
--pipeline-model-parallel-size $PP_SIZE \
--tokenizer-type PretrainedFromHF \
--tokenizer-name-or-path $TOKENIZER_NAME_OR_PATH \
--micro-batch-size $EVAL_MICRO_BATCH_SIZE \
--no-load-optim \
--no-load-rng \
--bf16 \
--inference \
--seq-length $SEQ_LEN \
--task_list wnli \
--deepspeed \
--deepspeed_config ds_config.json \
--intermed_results \
--adaptive_seq_len \
--micro_bs_multiplier 16 \
--offloadearly \
$MEGATRON_REQUIRED_ARGS \
"

GPUS_PER_NODE=8
NNODES=$SLURM_NNODES
MASTER_ADDR=$(scontrol show hostnames $SLURM_JOB_NODELIST | head -n 1)
MASTER_PORT=6000
export LAUNCHER="python -u -m torch.distributed.run \
--nproc_per_node $GPUS_PER_NODE \
--nnodes $NNODES \
--rdzv_endpoint $MASTER_ADDR:$MASTER_PORT \
--rdzv_backend c10d \
--max_restarts 0 \
--tee 3 \
"

export CUDA_LAUNCH_BLOCKING=1

echo $LAUNCHER $CMD

export PYTHONPATH=$MEGATRON_DEEPSPEED_REPO

$LAUNCHER $CMD 2>&1 | tee $VARIANT-eval-harness.log
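
A usage sketch (assuming the script is saved at examples/evalharness/run_bsevalharness_tr11-176b-ml.slurm as in this diff, and that your Slurm account/reservation match the #SBATCH headers above):

# submit the 176B eval job
sbatch examples/evalharness/run_bsevalharness_tr11-176b-ml.slurm

# follow the Slurm log; --output=%x-%j.out expands to <job-name>-<job-id>.out
tail -f run_bsevalharness-tr11-176b-ml-<job-id>.out

# per-task scores land in $MEGATRON_DEEPSPEED_REPO/tr11-176b-ml-bsevalharness-results.json
# (the script cd's into the repo before launching), next to tr11-176b-ml-bsevalharness-eval-harness.log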
122 changes: 122 additions & 0 deletions examples/evalharness/run_bsevalharness_tr11b-1b3-ml.slurm
@@ -0,0 +1,122 @@
#!/bin/bash
#SBATCH --job-name=run_bsevalharness-tr11b-1b3-ml
#SBATCH --partition=gpu_p5
#SBATCH --constraint=a100
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1          # crucial - only 1 task per node; torch.distributed.run spawns the per-GPU processes
#SBATCH --cpus-per-task=8            # number of cores per task
#SBATCH --hint=nomultithread # we get physical cores not logical
#SBATCH --gres=gpu:1 # number of gpus
#SBATCH --time 20:00:00 # maximum execution time (HH:MM:SS)
#SBATCH --output=%x-%j.out # output file name
#SBATCH --account=six@a100
#SBATCH --reservation=hug


set -x -e

source $six_ALL_CCFRWORK/start-muennighofflmeval

echo "START TIME: $(date)"

# a unique identifier for the current eval, ideally corresponding to the model name
VARIANT="tr11b-1b3-ml-bsevalharness"


CHECKPOINT_PATH=$six_ALL_CCFRSCRATCH/checkpoints/tr11b-1B3-ml/checkpoints/main/global_step340500
MEGATRON_DEEPSPEED_REPO=$six_ALL_CCFRSCRATCH/commun/experiments/muennighoff/megdsbslmeval/Megatron-DeepSpeed
export HF_DATASETS_OFFLINE=1
export TRANSFORMERS_OFFLINE=1

export TRANSFORMERS_CACHE=$six_ALL_CCFRWORK/models
export HF_DATASETS_CACHE=$six_ALL_CCFRWORK/datasetseval
export HF_MODULES_CACHE=$six_ALL_CCFRWORK/modules
export HF_METRICS_CACHE=$six_ALL_CCFRWORK/metrics
export TOKENIZERS_PARALLELISM=false

cd $MEGATRON_DEEPSPEED_REPO

TOKENIZER_NAME_OR_PATH=bigscience-catalogue-data-dev/byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles

PP_SIZE=1
TP_SIZE=1
SEQ_LEN=2048

# different from the training MICRO_BATCH_SIZE - there is no optimizer memory, so a bigger BS fits
# make it as big as fits on the GPU w/o OOM, but not too close to 100%
EVAL_MICRO_BATCH_SIZE=1

# dummy arguments to make Megatron happy.
MEGATRON_REQUIRED_ARGS=" \
--num-layers -1 \
--hidden-size -1 \
--num-attention-heads -1 \
--seq-length -1 \
--max-position-embeddings -1 \
"


ZERO_STAGE=0

config_json="./ds_config.json"

# DeepSpeed figures out GAS (gradient accumulation steps) dynamically from the dynamic GBS (global batch size) via set_train_batch_size()
cat <<EOT > $config_json
{
"train_micro_batch_size_per_gpu": 1,
"train_batch_size": 1,
"gradient_clipping": 1.0,
"zero_optimization": {
"stage": $ZERO_STAGE
},
"bf16": {
"enabled": false
},
"steps_per_print": 2000,
"wall_clock_breakdown": false
}
EOT


CMD="./tasks/eval_harness/evaluate_bsevalharness.py \
--load $CHECKPOINT_PATH \
--results_path $VARIANT-results.json \
--tensor-model-parallel-size $TP_SIZE \
--pipeline-model-parallel-size $PP_SIZE \
--tokenizer-type PretrainedFromHF \
--tokenizer-name-or-path $TOKENIZER_NAME_OR_PATH \
--micro-batch-size $EVAL_MICRO_BATCH_SIZE \
--no-load-optim \
--no-load-rng \
--inference \
--seq-length $SEQ_LEN \
--task_list axb,axg,boolq,cb,cola,copa,crows_pairs_english,crows_pairs_french,diabla,e2e_nlg_cleaned,mnli,mnli_mismatched,multirc,piaf,qqp,rte,sst,tydiqa_primary,tydiqa_secondary,wic,wsc,wnli,wino_bias_type1_anti,wino_bias_type1_pro,wino_bias_type2_anti,wino_bias_type2_pro,xquad_ar,xquad_en,gsarti/flores_101_afr,gsarti/flores_101_amh,gsarti/flores_101_ara,gsarti/flores_101_hye,gsarti/flores_101_asm,gsarti/flores_101_ast,gsarti/flores_101_azj,gsarti/flores_101_bel,gsarti/flores_101_ben,gsarti/flores_101_bos,gsarti/flores_101_bul,gsarti/flores_101_mya,gsarti/flores_101_cat,gsarti/flores_101_ceb,gsarti/flores_101_zho_simpl,gsarti/flores_101_zho_trad,gsarti/flores_101_hrv,gsarti/flores_101_ces,gsarti/flores_101_dan,gsarti/flores_101_nld,gsarti/flores_101_eng,gsarti/flores_101_est,gsarti/flores_101_tgl,gsarti/flores_101_fin,gsarti/flores_101_fra,gsarti/flores_101_ful,gsarti/flores_101_glg,gsarti/flores_101_lug,gsarti/flores_101_kat,gsarti/flores_101_deu,gsarti/flores_101_ell,gsarti/flores_101_guj,gsarti/flores_101_hau,gsarti/flores_101_heb,gsarti/flores_101_hin,gsarti/flores_101_hun,gsarti/flores_101_isl,gsarti/flores_101_ibo,gsarti/flores_101_ind,gsarti/flores_101_gle,gsarti/flores_101_ita,gsarti/flores_101_jpn,gsarti/flores_101_jav,gsarti/flores_101_kea,gsarti/flores_101_kam,gsarti/flores_101_kan,gsarti/flores_101_kaz,gsarti/flores_101_khm,gsarti/flores_101_kor,gsarti/flores_101_kir,gsarti/flores_101_lao,gsarti/flores_101_lav,gsarti/flores_101_lin,gsarti/flores_101_lit,gsarti/flores_101_luo,gsarti/flores_101_ltz,gsarti/flores_101_mkd,gsarti/flores_101_msa,gsarti/flores_101_mal,gsarti/flores_101_mlt,gsarti/flores_101_mri,gsarti/flores_101_mar,gsarti/flores_101_mon,gsarti/flores_101_npi,gsarti/flores_101_nso,gsarti/flores_101_nob,gsarti/flores_101_nya,gsarti/flores_101_oci,gsarti/flores_101_ory,gsarti/flores_101_orm,gsarti/flores_101_pus,gsarti/flores_101_fas,gsarti/flores_101_pol,gsarti/flores_101_por,gsarti/flores_101_pan,gsarti/flores_101_ron,gsarti/flores_101_rus,gsarti/flores_101_srp,gsarti/flores_101_sna,gsarti/flores_101_snd,gsarti/flores_101_slk,gsarti/flores_101_slv,gsarti/flores_101_som,gsarti/flores_101_ckb,gsarti/flores_101_spa,gsarti/flores_101_swh,gsarti/flores_101_swe,gsarti/flores_101_tgk,gsarti/flores_101_tam,gsarti/flores_101_tel,gsarti/flores_101_tha,gsarti/flores_101_tur,gsarti/flores_101_ukr,gsarti/flores_101_umb,gsarti/flores_101_urd,gsarti/flores_101_uzb,gsarti/flores_101_vie,gsarti/flores_101_cym,gsarti/flores_101_wol,gsarti/flores_101_xho,gsarti/flores_101_yor,gsarti/flores_101_zul \
--eval_fp32 \
--deepspeed \
--deepspeed_config ds_config.json \
--intermed_results \
--adaptive_seq_len \
--micro_bs_multiplier 8 \
$MEGATRON_REQUIRED_ARGS \
"

GPUS_PER_NODE=1
NNODES=$SLURM_NNODES
MASTER_ADDR=$(scontrol show hostnames $SLURM_JOB_NODELIST | head -n 1)
MASTER_PORT=6000
export LAUNCHER="python -u -m torch.distributed.run \
--nproc_per_node $GPUS_PER_NODE \
--nnodes $NNODES \
--rdzv_endpoint $MASTER_ADDR:$MASTER_PORT \
--rdzv_backend c10d \
--max_restarts 0 \
--tee 3 \
"

export CUDA_LAUNCH_BLOCKING=1

echo $LAUNCHER $CMD

export PYTHONPATH=$MEGATRON_DEEPSPEED_REPO

$LAUNCHER $CMD 2>&1 | tee $VARIANT-eval-harness.log
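
Once a run finishes, the results JSON can be inspected from the shell; a minimal sketch with jq, assuming the usual lm-eval-harness layout with top-level "results" and "versions" keys:

# list the evaluated tasks
jq -r '.results | keys[]' tr11b-1b3-ml-bsevalharness-results.json

# dump task / metric / value triples as TSV
jq -r '.results | to_entries[] | .key as $t | .value | to_entries[] | [$t, .key, (.value|tostring)] | @tsv' \
  tr11b-1b3-ml-bsevalharness-results.json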
121 changes: 121 additions & 0 deletions examples/evalharness/run_bsevalharness_tr11c-2b5-ml.slurm
@@ -0,0 +1,121 @@
#!/bin/bash
#SBATCH --job-name=run_bsevalharness-tr11c-2b5-ml
#SBATCH --partition=gpu_p5
#SBATCH --constraint=a100
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1          # crucial - only 1 task per node; torch.distributed.run spawns the per-GPU processes
#SBATCH --cpus-per-task=8            # number of cores per task
#SBATCH --hint=nomultithread # we get physical cores not logical
#SBATCH --gres=gpu:1 # number of gpus
#SBATCH --time 20:00:00 # maximum execution time (HH:MM:SS)
#SBATCH --output=%x-%j.out # output file name
#SBATCH --account=six@a100
#SBATCH --reservation=hug


set -x -e

source $six_ALL_CCFRWORK/start-muennighofflmeval

echo "START TIME: $(date)"

# a unique identifier for the current eval, ideally corresponding to the model name
VARIANT="tr11c-2b5-ml-bsevalharness"


CHECKPOINT_PATH=$six_ALL_CCFRSCRATCH/checkpoints/tr11c-2B5-ml/checkpoints/main/global_step337250
MEGATRON_DEEPSPEED_REPO=$six_ALL_CCFRSCRATCH/commun/experiments/muennighoff/megdsbslmeval/Megatron-DeepSpeed
export HF_DATASETS_OFFLINE=1
export TRANSFORMERS_OFFLINE=1

export TRANSFORMERS_CACHE=$six_ALL_CCFRWORK/models
export HF_DATASETS_CACHE=$six_ALL_CCFRWORK/datasetseval
export HF_MODULES_CACHE=$six_ALL_CCFRWORK/modules
export HF_METRICS_CACHE=$six_ALL_CCFRWORK/metrics
export TOKENIZERS_PARALLELISM=false

cd $MEGATRON_DEEPSPEED_REPO

TOKENIZER_NAME_OR_PATH=bigscience-catalogue-data-dev/byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles

PP_SIZE=1
TP_SIZE=1
SEQ_LEN=2048

# different from the training MICRO_BATCH_SIZE - there is no optimizer memory, so a bigger BS fits
# make it as big as fits on the GPU w/o OOM, but not too close to 100%
EVAL_MICRO_BATCH_SIZE=1

# dummy arguments to make Megatron happy.
MEGATRON_REQUIRED_ARGS=" \
--num-layers -1 \
--hidden-size -1 \
--num-attention-heads -1 \
--seq-length -1 \
--max-position-embeddings -1 \
"


ZERO_STAGE=0

config_json="./ds_config.json"

# DeepSpeed figures out GAS (gradient accumulation steps) dynamically from the dynamic GBS (global batch size) via set_train_batch_size()
cat <<EOT > $config_json
{
"train_micro_batch_size_per_gpu": 1,
"train_batch_size": 1,
"gradient_clipping": 1.0,
"zero_optimization": {
"stage": $ZERO_STAGE
},
"bf16": {
"enabled": false
},
"steps_per_print": 2000,
"wall_clock_breakdown": false
}
EOT

CMD="./tasks/eval_harness/evaluate_bsevalharness.py \
--load $CHECKPOINT_PATH \
--results_path $VARIANT-results.json \
--tensor-model-parallel-size $TP_SIZE \
--pipeline-model-parallel-size $PP_SIZE \
--tokenizer-type PretrainedFromHF \
--tokenizer-name-or-path $TOKENIZER_NAME_OR_PATH \
--micro-batch-size $EVAL_MICRO_BATCH_SIZE \
--no-load-optim \
--no-load-rng \
--inference \
--seq-length $SEQ_LEN \
--task_list axb,axg,boolq,cb,cola,copa,crows_pairs_english,crows_pairs_french,diabla,e2e_nlg_cleaned,mnli,mnli_mismatched,multirc,piaf,qqp,rte,sst,tydiqa_primary,tydiqa_secondary,wic,wsc,wnli,wino_bias_type1_anti,wino_bias_type1_pro,wino_bias_type2_anti,wino_bias_type2_pro,xquad_ar,xquad_en,gsarti/flores_101_afr,gsarti/flores_101_amh,gsarti/flores_101_ara,gsarti/flores_101_hye,gsarti/flores_101_asm,gsarti/flores_101_ast,gsarti/flores_101_azj,gsarti/flores_101_bel,gsarti/flores_101_ben,gsarti/flores_101_bos,gsarti/flores_101_bul,gsarti/flores_101_mya,gsarti/flores_101_cat,gsarti/flores_101_ceb,gsarti/flores_101_zho_simpl,gsarti/flores_101_zho_trad,gsarti/flores_101_hrv,gsarti/flores_101_ces,gsarti/flores_101_dan,gsarti/flores_101_nld,gsarti/flores_101_eng,gsarti/flores_101_est,gsarti/flores_101_tgl,gsarti/flores_101_fin,gsarti/flores_101_fra,gsarti/flores_101_ful,gsarti/flores_101_glg,gsarti/flores_101_lug,gsarti/flores_101_kat,gsarti/flores_101_deu,gsarti/flores_101_ell,gsarti/flores_101_guj,gsarti/flores_101_hau,gsarti/flores_101_heb,gsarti/flores_101_hin,gsarti/flores_101_hun,gsarti/flores_101_isl,gsarti/flores_101_ibo,gsarti/flores_101_ind,gsarti/flores_101_gle,gsarti/flores_101_ita,gsarti/flores_101_jpn,gsarti/flores_101_jav,gsarti/flores_101_kea,gsarti/flores_101_kam,gsarti/flores_101_kan,gsarti/flores_101_kaz,gsarti/flores_101_khm,gsarti/flores_101_kor,gsarti/flores_101_kir,gsarti/flores_101_lao,gsarti/flores_101_lav,gsarti/flores_101_lin,gsarti/flores_101_lit,gsarti/flores_101_luo,gsarti/flores_101_ltz,gsarti/flores_101_mkd,gsarti/flores_101_msa,gsarti/flores_101_mal,gsarti/flores_101_mlt,gsarti/flores_101_mri,gsarti/flores_101_mar,gsarti/flores_101_mon,gsarti/flores_101_npi,gsarti/flores_101_nso,gsarti/flores_101_nob,gsarti/flores_101_nya,gsarti/flores_101_oci,gsarti/flores_101_ory,gsarti/flores_101_orm,gsarti/flores_101_pus,gsarti/flores_101_fas,gsarti/flores_101_pol,gsarti/flores_101_por,gsarti/flores_101_pan,gsarti/flores_101_ron,gsarti/flores_101_rus,gsarti/flores_101_srp,gsarti/flores_101_sna,gsarti/flores_101_snd,gsarti/flores_101_slk,gsarti/flores_101_slv,gsarti/flores_101_som,gsarti/flores_101_ckb,gsarti/flores_101_spa,gsarti/flores_101_swh,gsarti/flores_101_swe,gsarti/flores_101_tgk,gsarti/flores_101_tam,gsarti/flores_101_tel,gsarti/flores_101_tha,gsarti/flores_101_tur,gsarti/flores_101_ukr,gsarti/flores_101_umb,gsarti/flores_101_urd,gsarti/flores_101_uzb,gsarti/flores_101_vie,gsarti/flores_101_cym,gsarti/flores_101_wol,gsarti/flores_101_xho,gsarti/flores_101_yor,gsarti/flores_101_zul \
--eval_fp32 \
--deepspeed \
--deepspeed_config ds_config.json \
--intermed_results \
--adaptive_seq_len \
--micro_bs_multiplier 8 \
$MEGATRON_REQUIRED_ARGS \
"

GPUS_PER_NODE=1
NNODES=$SLURM_NNODES
MASTER_ADDR=$(scontrol show hostnames $SLURM_JOB_NODELIST | head -n 1)
MASTER_PORT=6000
export LAUNCHER="python -u -m torch.distributed.run \
--nproc_per_node $GPUS_PER_NODE \
--nnodes $NNODES \
--rdzv_endpoint $MASTER_ADDR:$MASTER_PORT \
--rdzv_backend c10d \
--max_restarts 0 \
--tee 3 \
"

export CUDA_LAUNCH_BLOCKING=1

echo $LAUNCHER $CMD

export PYTHONPATH=$MEGATRON_DEEPSPEED_REPO

$LAUNCHER $CMD 2>&1 | tee $VARIANT-eval-harness.log
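
To compare the same task across the three variants once all jobs have written their JSON files (same layout assumption as above), something like:

# print the wnli entry for each variant; wnli appears in every --task_list in this PR
for v in tr11-176b-ml tr11b-1b3-ml tr11c-2b5-ml; do
  echo "$v: $(jq -c '.results.wnli // empty' ${v}-bsevalharness-results.json)"
done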