Description
What happened?
I'm seeing a major drop in evaluation accuracy when using distributed accelerate launch
with multiple GPUs vs single-GPU evaluation, with no changes to the config file.
Here is the config file I used (oumi/qwen7b/eval_config_base.yml):
model:
  model_name: "Qwen/Qwen2.5-7B-Instruct"
  trust_remote_code: True
  shard_for_eval: True

tasks:
  - evaluation_backend: lm_harness
    task_name: mmlu_pro
    num_samples: 5

inference_engine: NATIVE

output_dir: "eval_results/qwen7b_base"
When I run the evaluation on a single GPU, I get reasonable accuracy:
CUDA_VISIBLE_DEVICES=0 oumi evaluate -c oumi/qwen7b/eval_config_base.yml
# Accuracy: 35%
However, when I try using multiple GPUs with distributed accelerate launch, the accuracy drops drastically:
CUDA_VISIBLE_DEVICES=0,1 oumi distributed accelerate launch -m oumi evaluate -c oumi/qwen7b/eval_config_base.yml
# Accuracy: 5%
I was not able to find any error message or log line that would explain why this is happening.
Nothing was changed in the config or environment besides switching the command to enable multi-GPU. Could this be an issue with how the dataset is sharded across ranks, or with how the results are aggregated in distributed mode?
Let me know what further details I can provide to help debug.
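To make the suspicion concrete (this is only an illustration of the arithmetic I have in mind, not Oumi's or lm-eval's actual code): if each rank scores only its own shard but the final metric is gathered incorrectly, the reported accuracy can collapse even though every rank's local predictions are fine. A minimal sketch:

# Hypothetical sketch of a sharding/aggregation bug -- not Oumi's or lm-eval's code.
full_dataset = list(range(1000))      # stand-in for the MMLU-Pro documents
world_size = 2                        # two GPUs / two ranks

def shard(dataset, rank, world_size):
    """Round-robin shard: each rank sees every world_size-th document."""
    return dataset[rank::world_size]

# Assume every rank answers ~35% of its own shard correctly.
per_rank_correct = [round(0.35 * len(shard(full_dataset, r, world_size)))
                    for r in range(world_size)]

# Correct aggregation: sum per-rank correct counts over the full dataset size.
print(sum(per_rank_correct) / len(full_dataset))   # ~0.35

# Broken aggregation: only one rank's results make it into the final report,
# while the denominator still covers the whole dataset.
print(per_rank_correct[0] / len(full_dataset))     # ~0.175

Something along these lines (or documents being dropped entirely during the gather) would match the symptom of a big accuracy drop with no error in the logs.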
Steps to reproduce the bug
Create a file eval.yml with the following contents:
model:
  model_name: "Qwen/Qwen2.5-7B-Instruct"
  trust_remote_code: True
  shard_for_eval: True

tasks:
  - evaluation_backend: lm_harness
    task_name: mmlu_pro
    num_samples: 5

inference_engine: NATIVE

output_dir: "eval_results/qwen7b_base"
conda create -n test-oumi python=3.11
conda activate test-oumi
pip install "oumi[gpu]"
CUDA_VISIBLE_DEVICES=0,1 oumi distributed accelerate launch -m oumi evaluate -c eval.yml
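For comparison, the single-GPU run in the same environment with the same config produces the expected accuracy:

CUDA_VISIBLE_DEVICES=0 oumi evaluate -c eval.yml
# Accuracy: 35%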
System Info
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Oumi environment information:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌────────────────┬───────────────────────────────────────────────┐
│ Oumi version │ 0.1.12 │
│ Python version │ 3.11.11 │
│ Platform │ Linux-5.15.0-84-generic-x86_64-with-glibc2.31 │
└────────────────┴───────────────────────────────────────────────┘
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Installed dependencies:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ PACKAGE ┃ VERSION ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ accelerate │ 1.2.1 │
│ aiohttp │ 3.11.18 │
│ bitsandbytes │ 0.45.5 │
│ datasets │ 3.2.0 │
│ diffusers │ <not installed> │
│ einops │ 0.8.1 │
│ jsonlines │ 4.0.0 │
│ liger-kernel │ 0.5.9 │
│ llama-cpp-python │ <not installed> │
│ lm-eval │ 0.4.8 │
│ mlflow │ 2.21.3 │
│ numpy │ 1.26.4 │
│ nvidia-ml-py │ 12.560.30 │
│ omegaconf │ 2.4.0.dev3 │
│ open_clip_torch │ <not installed> │
│ pandas │ 2.2.3 │
│ peft │ 0.14.0 │
│ pexpect │ 4.8.0 │
│ pillow │ 11.1.0 │
│ pydantic │ 2.9.2 │
│ responses │ 0.25.7 │
│ sglang │ <not installed> │
│ skypilot │ 0.7.0 │
│ tensorboard │ 2.18.0 │
│ timm │ <not installed> │
│ torch │ 2.5.1 │
│ torchdata │ 0.9.0 │
│ torchvision │ 0.20.1 │
│ tqdm │ 4.67.1 │
│ transformers │ 4.51.3 │
│ trl │ 0.16.1 │
│ typer │ 0.15.3 │
│ vllm │ 0.7.3 │
│ wandb │ 0.19.11 │
└──────────────────┴─────────────────┘
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Environment variables:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ VARIABLE ┃ VALUE ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ ACCELERATE_DYNAMO_BACKEND │ <not set> │
│ ACCELERATE_DYNAMO_MODE │ <not set> │
│ ACCELERATE_DYNAMO_USE_DYNAMIC │ <not set> │
│ ACCELERATE_DYNAMO_USE_FULLGRAPH │ <not set> │
│ ACCELERATE_USE_FSDP │ <not set> │
│ CUDA_VISIBLE_DEVICES │ 0,1 │
│ LOCAL_RANK │ <not set> │
│ LOCAL_WORLD_SIZE │ <not set> │
│ OUMI_EXTRA_DEPS_FILE │ <not set> │
│ OUMI_FORCE_EDITABLE_INSTALL │ <not set> │
│ OUMI_SLURM_CONNECTIONS │ <not set> │
│ OUMI_USE_SPOT_VM │ <not set> │
│ RANK │ <not set> │
│ WORLD_SIZE │ <not set> │
└─────────────────────────────────┴───────────┘
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PyTorch information:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌────────────────┬───────────────────────┐
│ CUDA available │ True │
│ CUDA version │ 12.4 │
│ cuDNN version │ 90.1.0 │
│ Number of GPUs │ 2 │
│ GPU type │ NVIDIA A100 80GB PCIe │
│ GPU memory │ 79.2GB │
└────────────────┴───────────────────────┘